@clawhub-lesliepie-e0aab784d6
Browser automation with human-like behavior. Use when: sites require JavaScript, login sessions, form filling, clicking, scrolling, or complex interactions....
---
name: browseragent
description: "Browser automation with human-like behavior. Use when: sites require JavaScript, login sessions, form filling, clicking, scrolling, or complex interactions. Simulates real user behavior to avoid bot detection."
homepage: https://playwright.dev
metadata: { "openclaw": { "emoji": "🤖", "requires": { "bins": ["node", "npx"], "npm": ["playwright"] } } }
---
# BrowserAgent Skill
Automate browser interactions with human-like behavior for complex web tasks.
## When to Use
✅ **USE this skill when:**
- Sites require JavaScript rendering
- Login/authentication needed
- Form filling and submission
- Clicking buttons, navigating menus
- Scrolling to load content (infinite scroll)
- Taking screenshots of web pages
- Downloading files from web
- Complex multi-step workflows
- Sites with bot detection (with human simulation)
- Testing web applications
- Data extraction from dynamic sites
## When NOT to Use
❌ **DON'T use this skill when:**
- Simple static content (use WebScraper)
- Single URL fetch (use web_fetch tool)
- High-volume scraping (respect site limits)
- Sites explicitly blocking automation
- Tasks that violate terms of service
## Installation
```bash
# Install Playwright
npm install -g playwright
# Install browsers (Chrome, Firefox, Safari)
npx playwright install
# Install Chrome extension for stealth (optional)
npm install -g playwright-extra playwright-extra-plugin-stealth
```
## Core Commands
### Basic Page Navigation
```javascript
const { chromium } = require('playwright');
(async () => {
const browser = await chromium.launch({
headless: false, // Set true for background execution
slowMo: 100 // Slow down by 100ms (human-like)
});
const page = await browser.newPage({
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
});
await page.goto('https://example.com', { waitUntil: 'networkidle' });
await browser.close();
})();
```
### Human-Like Behavior
```javascript
// Random delay between actions (human thinking time)
async function humanDelay(min = 1000, max = 3000) {
const delay = Math.floor(Math.random() * (max - min + 1)) + min;
await page.waitForTimeout(delay);
}
// Random mouse movement simulation
async function humanMove(element) {
const box = await element.boundingBox();
const x = box.x + Math.random() * box.width;
const y = box.y + Math.random() * box.height;
await page.mouse.move(x, y, { steps: 10 }); // Multiple steps = smoother
}
// Click with human behavior
async function humanClick(selector) {
await humanDelay(500, 1500); // Think before clicking
const element = await page.$(selector);
await humanMove(element);
await element.click();
await humanDelay(1000, 2000); // Wait after clicking
}
// Type with random speed (like real typing)
async function humanType(selector, text) {
await humanDelay(500, 1000);
await page.click(selector);
await page.waitForTimeout(300);
for (const char of text) {
await page.keyboard.type(char, { delay: Math.random() * 50 + 50 });
}
}
```
### Login Session Management
```javascript
// Save login state
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://example.com/login');
await humanType('#email', '[email protected]');
await humanType('#password', 'secret123');
await humanClick('button[type="submit"]');
// Wait for login to complete
await page.waitForNavigation({ waitUntil: 'networkidle' });
// Save cookies/storage for reuse
await page.context().storageState({ path: './session.json' });
await browser.close();
// Reuse session later
const browser2 = await chromium.launch({ headless: false });
const context = await browser2.newContext({
storageState: './session.json'
});
const page2 = await context.newPage();
await page2.goto('https://example.com'); // Already logged in!
```
### Screenshot & Recording
```javascript
// Full page screenshot
await page.screenshot({
path: 'fullpage.png',
fullPage: true
});
// Element screenshot
const element = await page.$('.product-card');
await element.screenshot({ path: 'product.png' });
// Start screen recording (video)
const browser = await chromium.launch({
recordVideo: { dir: 'videos/', size: { width: 1920, height: 1080 } }
});
```
### Data Extraction
```javascript
// Extract structured data
const data = await page.evaluate(() => {
const products = [];
document.querySelectorAll('.product-item').forEach(item => {
products.push({
name: item.querySelector('.name')?.textContent.trim(),
price: item.querySelector('.price')?.textContent.trim(),
image: item.querySelector('img')?.src,
url: item.querySelector('a')?.href
});
});
return products;
});
console.log(JSON.stringify(data, null, 2));
```
### Form Filling
```javascript
// Fill form fields
await humanType('#firstName', 'John');
await humanType('#lastName', 'Doe');
await humanType('#email', '[email protected]');
// Select dropdown
await page.selectOption('select#country', 'US');
// Check checkbox/radio
await page.check('#terms');
await page.check('input[value="premium"]');
// File upload
await page.setInputFiles('input[type="file"]', '/path/to/file.pdf');
// Submit form
await humanClick('button[type="submit"]');
await page.waitForNavigation({ waitUntil: 'networkidle' });
```
### Scrolling & Infinite Scroll
```javascript
// Scroll to bottom
await page.evaluate(() => {
window.scrollTo(0, document.body.scrollHeight);
});
// Incremental scroll (human-like)
async function scrollPage() {
const scrollStep = 300;
const delay = 500;
let position = 0;
while (position < document.body.scrollHeight) {
window.scrollTo(0, position);
position += scrollStep;
await new Promise(r => setTimeout(r, delay));
}
}
await page.evaluate(scrollPage);
// Wait for lazy-loaded content
await page.waitForSelector('.lazy-loaded-item');
```
### Handling Popups & Dialogs
```javascript
// Handle alert/confirm
page.on('dialog', async dialog => {
console.log(`Dialog message: dialog.message()`);
await dialog.accept(); // or dialog.dismiss()
});
// Handle new tab popup
const [newPage] = await Promise.all([
context.waitForEvent('page'),
page.click('a[target="_blank"]')
]);
await newPage.bringToFront();
// Close popup
await page.click('.popup-close');
```
### Wait Strategies
```javascript
// Wait for element
await page.waitForSelector('.content', { timeout: 10000 });
// Wait for element to be visible
await page.waitForSelector('.button', { state: 'visible' });
// Wait for navigation
await page.waitForNavigation({ waitUntil: 'networkidle' });
// Wait for specific URL
await page.waitForURL('**/dashboard');
// Wait for text content
await page.waitForFunction(
() => document.body.innerText.includes('Success'),
{ timeout: 5000 }
);
// Custom wait condition
await page.waitForFunction(
() => document.querySelectorAll('.item').length > 10,
{ timeout: 10000 }
);
```
## Advanced: Stealth Mode
```javascript
const playwright = require('playwright-extra');
const stealth = require('playwright-extra-plugin-stealth');
playwright.use(stealth());
const browser = await playwright.chromium.launch({
headless: false,
args: [
'--disable-blink-features=AutomationControlled',
'--no-sandbox',
'--disable-dev-shm-usage'
]
});
const page = await browser.newPage();
// Additional stealth
await page.evaluateOnNewDocument(() => {
// Override navigator.webdriver
Object.defineProperty(navigator, 'webdriver', { get: () => false });
// Override plugins
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5]
});
// Override languages
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en']
});
});
```
## Common Workflows
### 1. Login + Data Extraction
```javascript
async function loginAndScrape() {
const browser = await chromium.launch({ headless: false, slowMo: 100 });
const page = await browser.newPage();
// Login
await page.goto('https://example.com/login');
await humanType('#username', 'myuser');
await humanType('#password', 'mypass');
await humanClick('button[type="submit"]');
await page.waitForNavigation({ waitUntil: 'networkidle' });
// Navigate to data page
await page.goto('https://example.com/data');
await humanDelay(2000, 3000);
// Extract data
const data = await page.evaluate(() => {
// ... extraction logic
});
await browser.close();
return data;
}
```
### 2. Multi-Page Scraping
```javascript
async function scrapeMultiplePages() {
const browser = await chromium.launch({ headless: false, slowMo: 100 });
const page = await browser.newPage();
const results = [];
for (let i = 1; i <= 5; i++) {
await page.goto(`https://example.com/products?page=i`);
await humanDelay(2000, 4000); // Human-like delay between pages
const items = await page.evaluate(() => {
// ... extract items
});
results.push(...items);
// Click next page
if (i < 5) {
await humanClick('.next-page');
await page.waitForNavigation({ waitUntil: 'networkidle' });
}
}
await browser.close();
return results;
}
```
### 3. Form Submission with Verification
```javascript
async function submitForm() {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://example.com/contact');
// Fill form
await humanType('#name', 'John Doe');
await humanType('#email', '[email protected]');
await humanType('#message', 'Hello!');
// Submit
await humanClick('button[type="submit"]');
// Verify submission
await page.waitForSelector('.success-message', { timeout: 5000 });
const success = await page.$('.success-message') !== null;
await browser.close();
return success;
}
```
## Error Handling
```javascript
async function robustScrape(url, retries = 3) {
for (let i = 0; i < retries; i++) {
try {
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage();
await page.goto(url, {
waitUntil: 'networkidle',
timeout: 30000
});
// ... scraping logic
await browser.close();
return result;
} catch (error) {
console.log(`Attempt i + 1 failed: error.message`);
if (i === retries - 1) throw error;
await new Promise(r => setTimeout(r, 2000 * (i + 1))); // Exponential backoff
}
}
}
```
## Human Behavior Patterns
### Mouse Movement
```javascript
// Bezier curve mouse movement (natural looking)
async function smoothMove(startX, startY, endX, endY, duration = 1000) {
const steps = 20;
const startTime = Date.now();
for (let i = 0; i <= steps; i++) {
const progress = i / steps;
const t = progress * Math.PI;
// Ease in-out
const ease = (1 - Math.cos(t)) / 2;
const x = startX + (endX - startX) * ease;
const y = startY + (endY - startY) * ease;
await page.mouse.move(x, y);
await page.waitForTimeout(duration / steps);
}
}
```
### Typing Patterns
```javascript
// Realistic typing with errors and corrections
async function realisticType(selector, text) {
await page.click(selector);
const words = text.split(' ');
for (const word of words) {
// Occasional typo (5% chance)
if (Math.random() < 0.05 && word.length > 3) {
const pos = Math.floor(Math.random() * word.length);
const typo = word.slice(0, pos) + 'x' + word.slice(pos + 1);
await page.keyboard.type(typo, { delay: 80 });
// Backspace and correct
await page.keyboard.press('Backspace');
await page.waitForTimeout(200);
await page.keyboard.type(word[pos], { delay: 80 });
} else {
await page.keyboard.type(word + ' ', { delay: 60 });
}
// Random pause between words
await page.waitForTimeout(Math.random() * 300 + 100);
}
}
```
### Scroll Patterns
```javascript
// Human-like scrolling (variable speed, occasional pauses)
async function humanScroll() {
let position = 0;
const maxScroll = await page.evaluate(() => document.body.scrollHeight - window.innerHeight);
while (position < maxScroll) {
const scrollAmount = Math.random() * 200 + 100;
position = Math.min(position + scrollAmount, maxScroll);
await page.evaluate(y => window.scrollTo(0, y), position);
// Random pause
await page.waitForTimeout(Math.random() * 1000 + 500);
// Occasional scroll back up (reading behavior)
if (Math.random() < 0.1) {
const backScroll = Math.random() * 100;
await page.evaluate(y => window.scrollBy(0, -y), backScroll);
await page.waitForTimeout(500);
}
}
}
```
## Configuration Reference
### Browser Launch Options
```javascript
{
headless: false, // false = visible browser
slowMo: 100, // Slow operations by 100ms
channel: 'chrome', // Use Chrome instead of Chromium
executablePath: '/path/to/chrome', // Custom browser path
args: [ // Browser arguments
'--disable-blink-features=AutomationControlled',
'--no-sandbox',
'--disable-dev-shm-usage',
'--disable-gpu'
]
}
```
### Page Context Options
```javascript
{
viewport: { width: 1920, height: 1080 },
userAgent: 'Mozilla/5.0 ...',
locale: 'en-US',
timezoneId: 'America/New_York',
geolocation: { longitude: -74.0, latitude: 40.7 },
permissions: ['geolocation', 'notifications'],
colorScheme: 'light', // or 'dark'
deviceScaleFactor: 1,
isMobile: false,
hasTouch: false
}
```
## Troubleshooting
| Problem | Solution |
|---------|----------|
| Site detects bot | Use stealth plugin, add human delays |
| Element not found | Increase wait time, check selector |
| Timeout errors | Increase timeout, check network |
| Captcha appears | Use real session, reduce request rate |
| Memory leak | Close browser, dispose contexts |
| Slow performance | Use headless mode, reduce slowMo |
## Security & Ethics
⚠️ **Important Guidelines:**
1. **Respect robots.txt** - Check site's scraping policy
2. **Rate limiting** - Add delays (2-5s between actions)
3. **Terms of Service** - Don't violate site ToS
4. **Personal data** - Don't scrape PII without consent
5. **Authentication** - Only use your own credentials
6. **Copyright** - Respect intellectual property
7. **Server load** - Don't overwhelm servers
## Related Skills
- **WebScraper** - For simple static content extraction
- **web_search** - For finding URLs to scrape
- **coding-agent** - For processing extracted data
## Quick Reference
```bash
# Install
npm install -g playwright
npx playwright install
# Run script
node browser-script.js
# Debug with inspector
PWDEBUG=1 node browser-script.js
# Record interactions
npx playwright codegen https://example.com
```
FILE:package.json
{
"name": "browseragent",
"version": "1.0.0",
"description": "Browser automation with human-like behavior. Use when: sites require JavaScript, login sessions, form filling, clicking, scrolling, or complex interactions.",
"homepage": "https://playwright.dev",
"author": "OpenClaw Community",
"license": "MIT",
"keywords": ["openclaw", "skill", "browser", "automation", "playwright"],
"openclaw": {
"emoji": "🤖",
"requires": {
"bins": ["node", "npx"],
"npm": ["playwright"]
}
}
}
Extract readable content from web pages. Use when: user wants to read article content, fetch documentation, grab product info, or get text from URLs. NOT for...
---
name: webscraper
description: "Extract readable content from web pages. Use when: user wants to read article content, fetch documentation, grab product info, or get text from URLs. NOT for: interactive sites, login-required pages, or complex JavaScript-rendered content."
homepage: https://docs.openclaw.ai/tools/web
metadata: { "openclaw": { "emoji": "🕸️", "requires": { "bins": ["curl", "node"] } } }
---
# WebScraper Skill
Extract and parse content from web pages into readable markdown or plain text.
## When to Use
✅ **USE this skill when:**
- "Read this article: [URL]"
- "What does this page say?"
- "Get the content from [URL]"
- Fetch documentation, blog posts, news articles
- Extract product information from e-commerce sites
- Grab API documentation or tutorials
- Summarize web page content
## When NOT to Use
❌ **DON'T use this skill when:**
- Login-required pages (use BrowserAgent with session)
- Heavy JavaScript-rendered content (use BrowserAgent)
- Interactive web apps (dashboards, SPAs)
- CAPTCHA-protected sites
- Sites with strict anti-bot measures
- Real-time data (stock tickers, live scores)
## Commands
### Fetch URL Content
```bash
# Using OpenClaw web_fetch tool (recommended)
# Called via tool, not direct CLI
# Basic fetch (markdown output)
web_fetch(url: "https://example.com/article")
# Text-only mode (no markdown)
web_fetch(url: "https://example.com/article", extractMode: "text")
# Limit content length
web_fetch(url: "https://example.com/article", maxChars: 5000)
```
### Using curl (fallback)
```bash
# Simple HTML fetch
curl -s "https://example.com" | html2text -width 80
# With user-agent (avoid bot detection)
curl -s -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "https://example.com"
# Fetch and extract main content (requires readability-cli)
curl -s "https://example.com" | readability
# Get just the title
curl -s "https://example.com" | grep -oP '(?<=<title>).*?(?=</title>)'
```
### Using Node.js (advanced)
```bash
# Install cheerio for HTML parsing
npm install -g cheerio
# Parse HTML with Node
node -e "
const cheerio = require('cheerio');
const html = \`\$(curl -s 'https://example.com')\`;
const \$ = cheerio.load(html);
console.log(\$('article').text());
"
```
## Response Format
When fetching content, structure responses as:
```markdown
## 📄 [Page Title]
**Source:** [URL](https://...)
**Fetched:** 2026-03-20
### Content
[Extracted content here...]
---
*Summary: [1-2 sentence summary if helpful]*
```
## Best Practices
### 1. Respect Rate Limits
```bash
# Add delay between requests
sleep 2 && curl "https://example.com/page1"
sleep 2 && curl "https://example.com/page2"
```
### 2. Use Proper User-Agent
```bash
# Desktop Chrome
curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
# Mobile Safari
curl -A "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
```
### 3. Handle Errors
```bash
# Check HTTP status
curl -s -o /dev/null -w "%{http_code}" "https://example.com"
# Timeout after 10 seconds
curl -s --max-time 10 "https://example.com"
# Retry on failure
curl -s --retry 3 "https://example.com"
```
### 4. Extract Specific Content
```bash
# Get all links
curl -s "https://example.com" | grep -oP 'href="\K[^"]+' | head -20
# Get images
curl -s "https://example.com" | grep -oP 'src="\K[^"]+\.(jpg|png|webp)'
# Get meta description
curl -s "https://example.com" | grep -oP '(?<=<meta name="description" content=")[^"]+'
```
## Integration with OpenClaw
### Using web_fetch Tool
```javascript
// In your agent code
const content = await web_fetch({
url: "https://example.com/article",
extractMode: "markdown", // or "text"
maxChars: 10000
});
```
### Batch Processing
For multiple URLs, process sequentially with delays:
```
URL1 → fetch → wait 2s → URL2 → fetch → wait 2s → URL3 → fetch
```
## Common Use Cases
### 1. Article Summarization
```
1. Fetch article content
2. Extract main text (remove nav, footer, ads)
3. Generate summary
4. Return with source attribution
```
### 2. Product Information
```
1. Fetch product page
2. Extract: name, price, description, specs
3. Format as structured data
4. Return comparison-ready format
```
### 3. Documentation Lookup
```
1. Fetch docs page
2. Extract relevant section
3. Search for specific topic
4. Return code examples + explanations
```
## Troubleshooting
| Problem | Solution |
|---------|----------|
| Content empty/missing | Site uses JS rendering → use BrowserAgent |
| Blocked by site | Add User-Agent, add delay, use proxy |
| Timeout | Increase timeout, check URL validity |
| Garbled text | Check charset, try text mode |
| Login required | Use BrowserAgent with session cookies |
## Related Skills
- **BrowserAgent** - For interactive/JS-heavy sites
- **web_search** - For finding URLs before fetching
- **coding-agent** - For processing extracted data
## Security Notes
⚠️ **Important:**
- Respect robots.txt
- Don't scrape personal data
- Honor copyright/terms of service
- Add delays between requests (2-5s)
- Don't overload servers
- Use official APIs when available
FILE:package.json
{
"name": "webscraper",
"version": "1.0.0",
"description": "Extract readable content from web pages. Use when: user wants to read article content, fetch documentation, grab product info, or get text from URLs.",
"homepage": "https://docs.openclaw.ai/tools/web",
"author": "OpenClaw Community",
"license": "MIT",
"keywords": ["openclaw", "skill", "web", "scraper", "fetch"],
"openclaw": {
"emoji": "🕸️",
"requires": {
"bins": ["curl", "node"]
}
}
}