@clawhub-lgy2020-29d29aeaec
---
name: daily-news-collector
description: Daily tech news collection and distribution system. Automated methodology for collecting, curating, and distributing industry news via scheduled cron jobs. Use when setting up automated daily news digests, creating tech news roundups, or building scheduled content delivery workflows. Triggers on "daily news", "news digest", "tech roundup", "industry news collection", "automated newsletter", "news cron".
---
# Daily News Collector
Automated methodology for collecting, curating, and distributing industry news.
## Architecture: Collect → Cache → Distribute
Separate collection (slow) from distribution (fast) using a file cache.
```
07:00 Collect news → Write to weekly file (slow, ~3-5 min)
08:00 Read file → Push to chat (instant, <10 sec)
```
**Key insight**: Collection and distribution happen the same morning. News is at most ~1 hour old when delivered, rather than the 9+ hours it would be if collected the previous evening.
## Setup Steps
### 1. Create Weekly File
Create a markdown file for the current week: `weekly-news-YYYY-WNN.md`
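The file name can be derived from the date. The sketch below assumes `WNN` means the ISO-8601 week number (weeks start on Monday, week 1 contains the first Thursday of the year) and that dates are evaluated in UTC; the source does not state either, and `isoWeek` and `weeklyFileName` are illustrative names.

```javascript
// Return the ISO-8601 week number and week-year for a date (UTC-based).
function isoWeek(date) {
  // Work on a copy at UTC midnight so the caller's Date is not mutated.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  // Move to the Thursday of the same ISO week; Sunday (0) counts as day 7.
  const day = d.getUTCDay() || 7;
  d.setUTCDate(d.getUTCDate() + 4 - day);
  const yearStart = new Date(Date.UTC(d.getUTCFullYear(), 0, 1));
  const week = Math.ceil(((d - yearStart) / 86400000 + 1) / 7);
  return { year: d.getUTCFullYear(), week };
}

// Build the weekly file name, e.g. weekly-news-2026-W12.md.
function weeklyFileName(date) {
  const { year, week } = isoWeek(date);
  return `weekly-news-${year}-W${String(week).padStart(2, '0')}.md`;
}

console.log(weeklyFileName(new Date(Date.UTC(2026, 2, 17)))); // weekly-news-2026-W12.md
```

Days at the turn of the year can belong to the neighboring week-year, which is why the function returns the year alongside the week.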
### 2. Collection Cron (daily, 07:00)
Schedule an isolated `agentTurn` cron job that:
1. Checks if today's report already exists in the weekly file (anti-duplication)
2. Searches news sources via two methods:
   - **Tavily API** (~80% weight): Broad search across multiple keyword groups
   - **web_fetch** (~20% weight): Deep crawl of core technical blogs
3. Selects top 10-15 stories, grouped by category
4. Writes formatted report to weekly file with today's date
### 3. Distribution Cron (daily, 08:00)
Schedule an isolated `agentTurn` cron job that:
1. Reads the weekly file
2. Finds today's report by date header
3. Pushes content to chat
4. If not found, notifies user instead of generating new content
**Timing**: 1 hour gap between collection and distribution ensures collection completes before push.
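The distribution step's "find today's report by date header" can be sketched as a pure function over the weekly file's text. It assumes each report starts with a `## YYYY.M.D` heading without zero padding and runs until the next level-2 heading; `dateHeader` and `extractReport` are hypothetical helper names.

```javascript
// Build the date header used by the report title: "## YYYY.M.D" (no zero padding).
function dateHeader(date) {
  return `## ${date.getFullYear()}.${date.getMonth() + 1}.${date.getDate()}`;
}

// Return the report section for `date` from the weekly file text, or null if absent.
function extractReport(weeklyText, date) {
  const lines = weeklyText.split('\n');
  const header = dateHeader(date);
  // Require a space or end of line after the date so 2026.3.1 does not match 2026.3.17.
  const start = lines.findIndex((l) => l === header || l.startsWith(header + ' '));
  if (start === -1) return null;
  // The report ends at the next level-2 heading (the next day's report) or at end of file.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith('## ')) { end = i; break; }
  }
  return lines.slice(start, end).join('\n').trim();
}
```

A `null` result corresponds to step 4: notify the user rather than generate new content.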
## Search Strategy
### Layer 1: AI Search API (Broad Discovery, ~80% weight)
Use Tavily API (or similar) with keyword groups tailored to your domain. Example for browser/AI news (7 groups, ~27 candidates):
- Group 1: `browser Chrome Firefox Safari Edge news 2026` (5 results, topic: news)
- Group 2: `AI machine learning LLM technology news March 2026` (5 results, topic: news)
- Group 3: Local language keywords for regional coverage — e.g. `中国科技 AI 浏览器 最新消息` (5 results, topic: news)
- Group 4: `Web standards W3C WHATWG V8 JavaScript new features 2026` (3 results, topic: news)
- Group 5: Platform-specific keywords — e.g. `Android Chrome mobile browser development 2026` (3 results, topic: news)
- Group 6: Chinese AI media keywords — e.g. `APPSO 机器之心 量子位 AI 人工智能 最新` (3 results, topic: news)
- Group 7: Chinese tech industry keywords — e.g. `虎嗅 雷科技 科技行业 消费电子` (3 results, topic: news)
See [references/tavily-setup.md](references/tavily-setup.md) for Tavily API setup.
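The seven groups can be kept as plain data, which makes the candidate count and the monthly query budget easy to check. This is a sketch: the queries and result counts are copied from the list above, while the object shape and variable names are illustrative.

```javascript
// Keyword groups from the example above; every group uses topic "news".
const groups = [
  { query: 'browser Chrome Firefox Safari Edge news 2026', maxResults: 5 },
  { query: 'AI machine learning LLM technology news March 2026', maxResults: 5 },
  { query: '中国科技 AI 浏览器 最新消息', maxResults: 5 },
  { query: 'Web standards W3C WHATWG V8 JavaScript new features 2026', maxResults: 3 },
  { query: 'Android Chrome mobile browser development 2026', maxResults: 3 },
  { query: 'APPSO 机器之心 量子位 AI 人工智能 最新', maxResults: 3 },
  { query: '虎嗅 雷科技 科技行业 消费电子', maxResults: 3 },
];

// Upper bound on candidates per run, before deduplication.
const totalCandidates = groups.reduce((sum, g) => sum + g.maxResults, 0);

// One API call per group per day, over a 30-day month.
const monthlyQueries = groups.length * 30;

// Argument lists for tavily-search.js, one per group.
const commands = groups.map((g) => ['node', 'tavily-search.js', g.query, String(g.maxResults), 'news']);

console.log(totalCandidates, monthlyQueries); // 27 210
```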
### Layer 2: Core Blog Crawl (Deep Coverage, ~20% weight)
Use `web_fetch` to directly crawl authoritative blogs. These guarantee coverage of domain-specific news that AI search might miss.
Example sources for browser/AI domain:
- WebKit Blog: `https://webkit.org/blog/`
- V8 Blog: `https://v8.dev/blog`
- Mozilla Hacks: `https://hacks.mozilla.org/`
- Chromium Blog: `https://blog.chromium.org/`
### Layer 3: Aggregator Check (Community Pulse)
Check community aggregators for trending discussions:
- Hacker News: `https://news.ycombinator.com`
### Three-Layer Information Source Model
| Layer | Weight | Purpose | Speed | Depth |
|-------|--------|---------|-------|-------|
| AI Search API | ~80% | Broad discovery | Fast (1-3s/query) | Medium |
| Core blogs | ~20% | Domain authority | Slow (5-10s/source) | Deep |
| Aggregators | Optional | Community trends | Fast | Shallow |
Each layer should contribute 2-3 stories minimum to ensure balanced coverage.
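One way to honor the per-layer minimum is to reserve slots for each layer before filling the rest by score. The sketch below assumes every candidate carries a `layer` label and a numeric `score`; Tavily results have a score, but a score for crawled blog posts would have to be assigned by the curating agent, so that part is an assumption.

```javascript
// Pick up to `total` stories: reserve the top `minPerLayer` from each layer,
// then fill the remaining slots by score across all layers.
function selectStories(candidates, total = 10, minPerLayer = 2) {
  const byScore = (a, b) => b.score - a.score;
  const picked = new Set();
  const layers = [...new Set(candidates.map((c) => c.layer))];
  for (const layer of layers) {
    candidates
      .filter((c) => c.layer === layer)
      .sort(byScore)
      .slice(0, minPerLayer)
      .forEach((c) => picked.add(c));
  }
  for (const c of [...candidates].sort(byScore)) {
    if (picked.size >= total) break;
    picked.add(c); // adding an already reserved story is a no-op
  }
  return [...picked].sort(byScore).slice(0, total);
}
```

Without the reservation step, a layer with lower scores (typically the blog crawl) could be crowded out entirely by search results.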
## Report Format
### Title
```
## YYYY.M.D Report Title | Day N
```
### Categories (ordered by priority)
Use domain-specific categories. Examples:
- `### 🔧 Browser Engine & Web Standards`
- `### 🦊 Firefox / Mozilla`
- `### 🤖 AI & Browser Tech`
- `### 🇨🇳 Regional Tech`
- `### 📱 Mobile / Web Dev`
### Story Format
```
N. emoji **Title** — Description (2-3 sentences with specific details like version numbers, data, impact)
- Source: full clickable URL
```
### Insights Section
```
#### 💡 Analyst Insights
💡 **Insight Title** — Analysis (2-3 sentences with actionable perspective)
```
### Footer
```
*Sources: Source1 · Source2 · Source3*
*Collected: YYYY-MM-DD HH:MM TZ*
```
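The story and footer templates translate directly into string builders. This is a minimal sketch; the field names on `story` are illustrative, and the collection time is passed in already formatted so that time-zone handling stays out of scope.

```javascript
// Render one story in the "N. emoji **Title** — Description" layout with its source line.
function formatStory(n, story) {
  return `${n}. ${story.emoji} **${story.title}** — ${story.description}\n- Source: ${story.url}`;
}

// Render the two-line footer; `collectedAt` is expected as "YYYY-MM-DD HH:MM TZ".
function formatFooter(sources, collectedAt) {
  return `*Sources: ${sources.join(' · ')}*\n*Collected: ${collectedAt}*`;
}
```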
## Anti-Duplication Rules
**Critical**: Multiple cron sessions may run simultaneously and cause conflicts.
1. **Collection cron**: Before writing, scan file for today's date header (`## YYYY.M.D`). If found, output "Report exists, skipping" and exit. Do NOT overwrite.
2. **Distribution cron**: Read-only. Never search, never write. If report missing, only notify user.
3. **Strict division**: Collection writes, distribution reads. Never cross.
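Rule 1 amounts to a check-then-append guard on the weekly file. The sketch below is one possible shape, with `appendReportOnce` as a hypothetical helper. The check and the append are two separate steps, so this narrows the window for duplicate writes but does not make it impossible if two collection runs overlap exactly.

```javascript
const fs = require('fs');

// Append a report unless a "## YYYY.M.D" header for that date already exists.
// Returns 'skipped' or 'written' so the caller can log the outcome.
function appendReportOnce(filePath, date, title, body) {
  const header = `## ${date.getFullYear()}.${date.getMonth() + 1}.${date.getDate()}`;
  const existing = fs.existsSync(filePath) ? fs.readFileSync(filePath, 'utf8') : '';
  // Match the header at the start of a line, and not as a prefix of a longer
  // date (so "## 2026.3.1" does not match "## 2026.3.17").
  const escaped = header.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const alreadyThere = new RegExp(`^${escaped}(?!\\d)`, 'm').test(existing);
  if (alreadyThere) return 'skipped';
  fs.appendFileSync(filePath, `\n${header} ${title}\n\n${body}\n`);
  return 'written';
}
```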
## Quality Control
- Select for technical depth and impact, not quantity
- "Better 8 great stories than 15 mediocre ones"
- Cross-reference: prefer the original source when story appears in multiple feeds
- Each story must have a clickable URL
- Insights must add analysis, not just repeat the news
## Tavily Search Script (tavily-search.js)
Save this as `scripts/tavily-search.js` (the path the reference docs use) and run with: `node scripts/tavily-search.js "query" [max_results] [topic] [search_depth]`
Requires `TAVILY_API_KEY` environment variable.
```javascript
#!/usr/bin/env node
/**
 * Tavily Search — AI-optimized search API wrapper for news collection
 * Usage: node scripts/tavily-search.js "query" [max_results] [topic] [search_depth]
 *
 * Env: TAVILY_API_KEY required
 * Output: JSON with results array (title, url, content, score)
 */
const https = require('https');

const API_KEY = process.env.TAVILY_API_KEY;
if (!API_KEY) {
  console.error('Error: TAVILY_API_KEY environment variable not set');
  process.exit(1);
}

// Positional arguments, with defaults for everything except the query.
const query = process.argv[2];
const maxResults = parseInt(process.argv[3], 10) || 5;
const topic = process.argv[4] || 'general';
const searchDepth = process.argv[5] || 'basic';

if (!query) {
  console.error('Usage: node tavily-search.js "query" [max_results] [topic] [search_depth]');
  process.exit(1);
}

const payload = JSON.stringify({
  query,
  max_results: maxResults,
  topic,
  search_depth: searchDepth,
  include_answer: true,
  include_raw_content: false,
});

const req = https.request({
  hostname: 'api.tavily.com',
  path: '/search',
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(payload),
    // Interpolate the key; the literal string "API_KEY" would be rejected.
    'Authorization': `Bearer ${API_KEY}`,
  },
}, (res) => {
  let body = '';
  res.on('data', (chunk) => body += chunk);
  res.on('end', () => {
    // Surface HTTP errors (bad key, quota exceeded) instead of printing empty results.
    if (res.statusCode < 200 || res.statusCode >= 300) {
      console.error(`HTTP ${res.statusCode}:`, body.substring(0, 500));
      process.exit(1);
    }
    try {
      const data = JSON.parse(body);
      // Keep only the fields the collector needs; trim content to 500 characters.
      const output = {
        query: data.query,
        answer: data.answer || null,
        results: (data.results || []).map(r => ({
          title: r.title,
          url: r.url,
          content: r.content?.substring(0, 500),
          score: r.score,
        })),
      };
      console.log(JSON.stringify(output, null, 2));
    } catch (e) {
      console.error('Parse error:', e.message);
      console.error('Raw:', body.substring(0, 500));
      process.exit(1);
    }
  });
});

req.on('error', (e) => {
  console.error('Request error:', e.message);
  process.exit(1);
});

req.write(payload);
req.end();
```
## References
- [references/usage-guide.md](references/usage-guide.md) — Beginner-friendly usage guide with FAQ
- [references/tavily-setup.md](references/tavily-setup.md) — Tavily API setup and configuration
- [references/sources.md](references/sources.md) — Curated information source list by domain
FILE:references/sources.md
# Information Source Reference
## Source Selection Criteria
Sources must be:
1. **Authoritative** — Original publisher, not reposted content
2. **Regularly updated** — At least weekly
3. **Machine-readable** — Accessible via web_fetch (not JS-rendered only)
## Source Tiers
### Tier 1: Official Technical Blogs (Highest Authority)
| Source | URL | Coverage | Update Freq |
|--------|-----|----------|-------------|
| WebKit Blog | webkit.org/blog | Safari/WebKit engine updates | Weekly |
| V8 Blog | v8.dev/blog | Chrome JS engine, performance | Weekly |
| Mozilla Hacks | hacks.mozilla.org | Firefox, web standards, Wasm | Weekly |
| Web.dev | web.dev | Google web dev guidance | Weekly |
| Chromium Blog | blog.chromium.org | Chrome browser updates | Weekly |
| Android Developers | android-developers.googleblog.com | Android WebView, mobile web | Weekly |
### Tier 2: Technical Media (Curated Analysis)
| Source | URL | Coverage | Update Freq |
|--------|-----|----------|-------------|
| InfoQ | infoq.cn | Enterprise tech, QCon talks | Daily |
| 36氪 | 36kr.com | Chinese tech industry, funding | Daily |
| APPSO | apps.sina.cn | AI product news, tools | Daily |
| 机器之心 | jiqizhixin.com | AI deep coverage | Daily |
| 量子位 | qbitai.com | AI industry dynamics | Daily |
| 虎嗅 | huxiu.com | Tech industry analysis | Daily |
| 雷科技 | leikeji.com | Consumer electronics, tech | Daily |
| The Register | theregister.com | Tech industry news, analysis | Daily |
| Ars Technica | arstechnica.com | Deep tech coverage | Daily |
### Tier 3: Community Aggregators (Trend Signal)
| Source | URL | Coverage | Update Freq |
|--------|-----|----------|-------------|
| Hacker News | news.ycombinator.com | Global tech community | Real-time |
| Lobste.rs | lobste.rs | Curated tech links | Daily |
| Slashdot | slashdot.org | Tech news discussion | Daily |
## Source → Category Mapping
| Category | Primary Sources | Secondary Sources |
|----------|----------------|-------------------|
| Browser Engine | WebKit Blog, V8 Blog, Chromium Blog | Hacker News |
| Web Standards | Web.dev, Mozilla Hacks, WHATWG | Hacker News |
| Firefox | Mozilla Hacks, Mozilla Blog | Hacker News |
| AI & Browser | InfoQ, 36氪, Hacker News | WebKit Blog, V8 Blog |
| Mobile/Android | Android Developers Blog | InfoQ |
| Regional Tech | 36氪, InfoQ | — |
## Handling JS-Rendered Sources
Some major Chinese tech sites (机器之心, some 36kr pages) are JS-rendered and cannot be fetched with simple HTTP requests. For these:
1. Try RSS feed if available
2. Use Tavily API which handles JS rendering
3. Skip if both fail — don't waste time
## Source Health Check
Periodically verify sources are still active:
- Check last post date (flag if >30 days old)
- Verify URL still resolves
- Check for domain changes or redirects
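The 30-day staleness rule is simple date arithmetic. The sketch below assumes the last post date of each source has already been extracted (for example, from a feed); `flagStaleSources` and the record shape are illustrative.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Return the names of sources whose last post is older than `maxAgeDays`.
function flagStaleSources(sources, now, maxAgeDays = 30) {
  return sources
    .filter((s) => (now - s.lastPost) / DAY_MS > maxAgeDays)
    .map((s) => s.name);
}

const now = new Date('2026-03-17T00:00:00Z');
console.log(flagStaleSources([
  { name: 'Fresh Blog', lastPost: new Date('2026-03-10T00:00:00Z') },
  { name: 'Quiet Blog', lastPost: new Date('2026-01-01T00:00:00Z') },
], now)); // [ 'Quiet Blog' ]
```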
FILE:references/tavily-setup.md
# Tavily API Setup
## What is Tavily
Tavily is an AI-optimized search API that returns structured results with relevance scores. It's faster than web_fetch for broad discovery and returns results in a consistent format.
## Setup
1. Sign up at https://tavily.com
2. Get API key from dashboard
3. Set environment variable: `TAVILY_API_KEY=tvly-dev-xxxxx`
## Usage
```bash
node scripts/tavily-search.js "query" [max_results] [topic] [search_depth]
```
- `query`: Search query (required)
- `max_results`: Number of results (default: 5)
- `topic`: `general` or `news` (default: general)
- `search_depth`: `basic` or `advanced` (default: basic)
## Output
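Returns a JSON object with `query`, `answer`, and a `results` array; each result has `title`, `url`, `content` (truncated to 500 characters), and `score`. The script as written does not pass through `published_date`.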
## Best Practices
- Use `topic: news` for time-sensitive daily collection
- Use `search_depth: basic` for speed (reserve `advanced` for deep research)
- Group 3-5 keyword queries to cover different aspects of your domain
- Typical response time: 1-3 seconds per query
## Integration Pattern
Run multiple queries in sequence, then deduplicate by URL:
```bash
node scripts/tavily-search.js "browser news 2026" 5 news
node scripts/tavily-search.js "AI technology news" 5 news
node scripts/tavily-search.js "Web standards new features" 3 news
```
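Deduplicating by URL works better with light normalization, since the same article often appears with tracking parameters or a trailing slash. The sketch below is one possible set of rules (drop the fragment, drop `utm_*` parameters, ignore trailing slashes, ignore the scheme); these rules are assumptions rather than anything Tavily prescribes.

```javascript
// Reduce a URL to a comparison key: host + path + remaining query string.
function normalizeUrl(raw) {
  const u = new URL(raw);
  u.hash = '';
  // Copy the keys first so deleting while iterating is safe.
  for (const key of [...u.searchParams.keys()]) {
    if (key.startsWith('utm_')) u.searchParams.delete(key);
  }
  const path = u.pathname.replace(/\/+$/, '');
  return `${u.host}${path}${u.search}`;
}

// Keep one result per normalized URL, preferring the higher relevance score.
function dedupeByUrl(results) {
  const best = new Map();
  for (const r of results) {
    const key = normalizeUrl(r.url);
    const seen = best.get(key);
    if (!seen || r.score > seen.score) best.set(key, r);
  }
  return [...best.values()];
}
```

Feed it the concatenated `results` arrays from all queries of one collection run.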
## Fallback
If Tavily is unavailable, fall back to `web_fetch` on core blogs. Always maintain a list of authoritative blogs as backup.
FILE:references/usage-guide.md
# Beginner Usage Guide
## What This Skill Does
Teaches your AI assistant to automatically collect, curate, and deliver a daily news digest for any topic/domain.
## Quick Start (4 Steps)
### Step 1: Get a Tavily API Key (Free)
1. Go to tavily.com and sign up with your email (1 minute)
2. Copy your API Key from the Dashboard
3. Free tier: 1000 searches/month — enough for daily use
### Step 2: Send the Skill to Your AI
Send `daily-news-collector.skill` to your AI assistant (OpenClaw, Claude, ChatGPT, etc.) and say:
> "Extract this skill and set up a daily [YOUR TOPIC] news digest for me."
Replace `[YOUR TOPIC]` with your domain:
- "frontend development" / "web development"
- "AI and machine learning"
- "blockchain and crypto"
- "gaming industry"
- "automotive tech"
- Any topic you're interested in
### Step 3: Your AI Will Configure Everything
The AI will:
1. Set up your Tavily API Key as an environment variable
2. Create two scheduled tasks (cron jobs):
   - **Morning collection**: Search news → Save to file
   - **Delivery**: Read file → Send to your chat
3. Customize keywords and sources for your domain
### Step 4: Receive Daily Digests ✅
That's it. Your AI will automatically deliver a daily digest every morning.
## For OpenClaw Users
Tell your AI:
> "Read daily-news-collector.skill, then create two cron jobs:
> 1. Collect [YOUR TOPIC] news daily at 7:00 AM
> 2. Push the digest to this chat at 8:00 AM"
The AI will:
- Read SKILL.md for the methodology
- Use scripts/tavily-search.js for searching
- Reference sources.md for information sources
- Set up cron jobs automatically
## FAQ
**Q: Can I skip Tavily and use something else?**
A: Yes. SKILL.md includes a pure web_fetch approach. It's slower but works without any API key.
**Q: Is the free tier enough?**
A: Yes. 7 search groups × 30 days = 210 searches/month. Free tier gives 1000.
**Q: Can I use this for non-tech topics?**
A: Absolutely. Just change the keywords. Works for parenting, fitness, investing, cooking, anything.
**Q: How good are the search results?**
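A: Each result carries a relevance score between 0 and 1. Results scoring 0.85-0.99 are typically strong matches for the query, so sort or filter by score when selecting stories.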
**Q: What if I already have another search API?**
A: You can replace Tavily with Brave Search, SerpAPI, or any search API. The methodology stays the same — just swap the search layer.
**Q: Can I get more than 10 stories per day?**
A: Yes. Change the selection count in the cron prompt. But we recommend quality over quantity.
## Core Design Principles
### Three-Layer Source Model (No Single Point of Failure)
- **AI Search (80%)** — Fast, broad coverage
- **Technical Blogs (20%)** — Deep, authoritative content
- **Community Aggregators** — Trend sensing
### Collection-Distribution Separation
- **07:00**: Slow search → Save to file
- **08:00**: Read file → Instant push
- **User experience**: Immediate delivery
### Anti-Duplication
- Collection checks if today's report already exists before writing
- Distribution is read-only
- Two tasks have strict division of labor
## Customization Tips
### Changing Search Frequency
Edit the cron schedule in your AI's configuration:
- Every 12 hours: `0 7,19 * * *`
- Weekdays only: `0 7 * * 1-5`
- Three times daily: `0 7,12,18 * * *`
### Adding/Removing Sources
Edit `references/sources.md` to add or remove information sources for your domain.
### Adjusting Output Length
In the collection cron prompt, change:
- `精选最优质 10 条` ("select the 10 best stories") → `精选最优质 15 条` (more stories)
- `2-3句详细描述` ("2-3 sentences of detailed description") → `1-2句简短描述` (shorter descriptions)
---
name: openclaw-keepalive
description: |
Keep OpenClaw gateway running 24/7 on a laptop or workstation. Use when: (1) user reports gateway disconnects or crashes, (2) user asks how to make OpenClaw run persistently, (3) user wants auto-start on boot, (4) user needs process monitoring or auto-restart, (5) user mentions "keep alive", "daemon", "service", "background", "startup", "24/7". Covers Windows/Linux/macOS with cross-platform scripts and configuration.
---
# OpenClaw Gateway Keepalive
Ensure the OpenClaw gateway process stays running across reboots, crashes, sleep, and network interruptions.
## Quick Start (Recommended)
Register as a system service with one command (all platforms):
```bash
openclaw gateway install
```
Effects:
- ✅ Auto-start on boot — no terminal needed
- ✅ Runs in background — survives terminal close
- ✅ Survives screen lock — independent of user session
- ✅ Auto-restart on crash — OS-level process supervision
Verify:
```bash
openclaw gateway status
```
Expected output (example from Windows, where the service is a Scheduled Task):
- `Service: Scheduled Task (registered)` ✅
- `RPC probe: ok` ✅
- `Port: [127.0.0.1:18789]` ✅
## Commands
| Command | Description |
|---------|-------------|
| `openclaw gateway install` | Register as system service (one-time) |
| `openclaw gateway uninstall` | Remove system service |
| `openclaw gateway start` | Start gateway manually |
| `openclaw gateway stop` | Stop gateway |
| `openclaw gateway restart` | Restart gateway |
| `openclaw gateway status` | Check gateway status |
| `openclaw logs --follow` | Tail real-time logs |
## Platform Differences
### Windows
- Registers via **Task Scheduler** (schtasks)
- Service name: `OpenClaw Gateway`
- Survives screen lock and user session disconnect
### macOS
- Registers via **launchd** plist
- Supports `KeepAlive=true` for auto-restart
### Linux
- Registers via **systemd** service
- Supports `Restart=on-failure` policy
## Prevent Sleep (Critical)
The gateway requires the computer to stay awake. Configure power settings:
### Windows
```powershell
# List current power plans
powercfg /list
# Never sleep when plugged in
powercfg /change standby-timeout-ac 0
# Allow display off but keep system awake
powercfg /change monitor-timeout-ac 10
```
### macOS
```bash
# Prevent system sleep
sudo pmset -a sleep 0
# Prevent disk sleep
sudo pmset -a disksleep 0
# Allow display off but keep system running
sudo pmset -a displaysleep 10
```
### Linux
```bash
# Disable suspend
sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
```
## Network Recovery
The gateway's WebSocket connection auto-reconnects after network interruptions using exponential backoff (1s → 2s → 4s → ...).
Manual restart if needed:
```bash
openclaw gateway restart
```
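The 1s → 2s → 4s sequence is a doubling schedule. The sketch below shows the arithmetic; the 60-second cap is an assumption added for illustration, since the source does not say whether or where the gateway caps its delay.

```javascript
// Delay before each reconnect attempt: baseMs * 2^i, capped at maxMs (cap is assumed).
function backoffDelays(attempts, baseMs = 1000, maxMs = 60000) {
  return Array.from({ length: attempts }, (_, i) => Math.min(baseMs * 2 ** i, maxMs));
}

console.log(backoffDelays(5)); // [ 1000, 2000, 4000, 8000, 16000 ]
```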
## Troubleshooting
### Gateway Unresponsive
```bash
openclaw gateway status # Check process status
openclaw logs --follow # View real-time logs
openclaw gateway restart # Attempt restart
```
### Service Not Registered
```bash
openclaw gateway install # Re-register
openclaw gateway status # Verify
```
### Frequent Crashes
1. Check logs with `openclaw logs --follow` for crash cause
2. Ensure sufficient memory (Node.js process needs at least 512MB free)
3. Check for port conflicts (default: 18789)
4. Update to latest version
## Advanced: External Process Supervisors
If `gateway install` is not stable enough, use external tools:
### Option A: pm2 (Recommended, Cross-Platform)
```bash
npm install -g pm2
pm2 start openclaw -- gateway start
pm2 save
pm2 startup # Print the boot-time startup command (Linux/macOS init systems; Windows needs a separate helper)
```
### Option B: NSSM (Windows Only)
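```cmd
REM Point NSSM at the openclaw launcher and pass "gateway start" as its arguments.
REM The path below is a placeholder; locate the real one with: where openclaw
nssm install OpenClaw "C:\path\to\openclaw.cmd" gateway start
nssm set OpenClaw AppExit Default Restart
nssm start OpenClaw
```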
### Option C: Windows Task Scheduler (Manual)
```powershell
# Create a task that starts the gateway at boot (this command sets no restart-on-failure policy)
schtasks /create /tn "OpenClaw Gateway" /tr "openclaw gateway start" /sc onstart /rl highest /f
# Or use Task Scheduler GUI for finer restart policies
```
## Healthcheck Scripts
Schedule these with cron or Task Scheduler for periodic health checks with auto-restart.
### Windows (healthcheck.cmd)
```cmd
@echo off
REM OpenClaw Gateway Healthcheck Script (Windows)
REM Run periodically via Task Scheduler
REM Exit code 0 = healthy or recovered, 1 = failed to recover
REM "call" returns control to this script even if openclaw is a .cmd wrapper.
REM Each errorlevel test sits on its own top-level line: inside a nested
REM parenthesized block, %errorlevel% would be expanded before the commands run.
echo [%date% %time%] Checking OpenClaw Gateway status...
call openclaw gateway status >nul 2>&1
if %errorlevel% equ 0 (
    echo [%date% %time%] Gateway is running normally.
    exit /b 0
)
echo [%date% %time%] Gateway not responding. Attempting restart...
call openclaw gateway restart >nul 2>&1
timeout /t 5 >nul
call openclaw gateway status >nul 2>&1
if %errorlevel% neq 0 (
    echo [%date% %time%] CRITICAL: Gateway failed to restart!
    exit /b 1
)
echo [%date% %time%] Gateway restarted successfully.
exit /b 0
```
### Linux/macOS (healthcheck.sh)
```bash
#!/bin/bash
# OpenClaw Gateway Healthcheck Script (Linux/macOS)
# Usage: bash healthcheck.sh - can be scheduled via cron
# Example cron: */5 * * * * /path/to/healthcheck.sh >> /var/log/openclaw-healthcheck.log 2>&1
# Exit code 0 = healthy or recovered, 1 = failed to recover

# Stamp each message with the time it is logged, not the script start time.
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

log "Checking OpenClaw Gateway status..."
if openclaw gateway status > /dev/null 2>&1; then
    log "Gateway is running normally."
    exit 0
fi

log "Gateway not responding. Attempting restart..."
openclaw gateway restart > /dev/null 2>&1
sleep 5
if openclaw gateway status > /dev/null 2>&1; then
    log "Gateway restarted successfully."
    exit 0
fi

log "CRITICAL: Gateway failed to restart!"
exit 1
```
Save to a location of your choice and schedule:
- **Windows**: Task Scheduler → Run every 5 minutes
- **Linux/macOS**: `*/5 * * * * /path/to/healthcheck.sh >> /var/log/openclaw-healthcheck.log 2>&1`
---
name: context-persistence
description: Solve cross-session context storage and sync problems. Use when (1) isolated sessions (cron/subagent/heartbeat) lack context from main session, (2) long-running tasks need progress tracking across sessions, (3) multiple sessions need shared state, (4) users report "agent doesn't remember what happened", (5) designing memory/progress systems for AI agents. Triggers on "context sync", "session memory", "progress tracking", "cross-session state", "memory mechanism", "persist progress".
---
# Context Persistence & Cross-Session Sync
Design and implement persistent context systems that survive session boundaries.
## Core Problem
OpenClaw has multiple session types with different context access:
| Session | Memory Files | History | Cron Context |
|---------|-------------|---------|-------------|
| Main DM | ✅ All injected | ✅ Full | N/A |
| Group Chat | ❌ Not loaded | ✅ Partial | N/A |
| Cron (isolated) | ❌ None | ❌ None | ✅ Payload only |
| Heartbeat | ❌ None | ✅ Partial | N/A |
| Subagent | ❌ None | ❌ None | ✅ Task only |
**Result**: State created in one session is invisible to others unless persisted to files.
## Architecture: Three-Layer Memory System
### Layer 1: Long-Term Memory (MEMORY.md)
- **What**: Curated facts, decisions, lessons, key state
- **Who writes**: Main session only
- **Who reads**: Main session only (injected via AGENTS.md)
- **Update frequency**: On significant events, periodic review during heartbeats
- **Size limit**: < 200 lines (context budget)
### Layer 2: Daily Logs (memory/YYYY-MM-DD.md)
- **What**: Raw chronological notes, conversations, decisions
- **Who writes**: Any session that has something to record
- **Who reads**: Main session (at startup), heartbeats (for review)
- **Update frequency**: Real-time as events happen
- **Size limit**: Unbounded (not injected into context)
### Layer 3: Task Progress Files (memory/<task>-progress.md)
- **What**: Structured progress for long-running work
- **Who writes**: Any session doing the task
- **Who reads**: Any session continuing the task
- **Update frequency**: At task boundaries (session end, checkpoints)
- **Size limit**: < 300 lines
### The Key Insight
> **Files are the only cross-session communication channel.**
> In-memory state dies with the session. Files survive.
## Pattern 1: Progress Tracking
For tasks spanning multiple sessions (source code reading, data analysis, etc.)
See [references/progress-tracking.md](references/progress-tracking.md) for full template.
Essential elements:
```markdown
# <Task> Progress
- Total: X items
- Completed: Y items
- Progress: Z%
## Completed List (dedup)
## Current Position / Next Steps
## Key Findings
```
## Pattern 2: Cron Job Context Injection
Isolated cron sessions get NO workspace memory injected automatically; they see only the payload message and whatever files they explicitly read. Solutions:
1. **Embed context in payload message** (for <1KB state)
2. **Read from progress files** (task loads its own context)
3. **Shared state file** (coordination between sessions)
See [references/cross-session-sync.md](references/cross-session-sync.md) for patterns.
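The choice between embedding and file loading can be made mechanically from the size of the state. The sketch below applies the 1KB threshold from option 1; `buildCronMessage` and the message wording are illustrative, not an OpenClaw API.

```javascript
const MAX_INLINE_BYTES = 1024; // the "<1KB" threshold from option 1

// Inline small state into the payload message; otherwise point the job at a file.
function buildCronMessage(instructions, state, stateFile) {
  const inline = JSON.stringify(state);
  if (Buffer.byteLength(inline, 'utf8') < MAX_INLINE_BYTES) {
    return `${instructions}\n\nContext (inline JSON):\n${inline}`;
  }
  return `${instructions}\n\nContext is too large to inline. Read ${stateFile} first; ` +
    'if it is missing, reply that configuration is needed.';
}
```

The byte length is measured in UTF-8, so payloads with non-ASCII text are counted at their real size.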
## Pattern 3: Main Session Initialization
The AGENTS.md startup sequence ensures context loading:
```
1. Read SOUL.md (persona)
2. Read USER.md (who you help)
3. Read memory/YYYY-MM-DD.md (today + yesterday)
4. If main session: also read MEMORY.md
```
This is the ONLY automated context loading. Everything else must be explicit.
## Quick Checklist
When designing context for a new task:
- [ ] Can this span multiple sessions? → Create progress file
- [ ] Does cron/subagent need this? → Embed in payload or file
- [ ] Is this a fact to remember? → Update MEMORY.md
- [ ] Is this a raw event? → Append to daily log
- [ ] Should future sessions know this? → Write it DOWN, never rely on memory
FILE:references/cross-session-sync.md
# Cross-Session Context Sync
## The Problem
OpenClaw session types and their context access:
```
┌─────────────┐  FILES  ┌──────────────┐
│ Main DM     │ ──────→ │ MEMORY.md    │ ← Only main session reads
│ (full ctx)  │ ←────── │ daily logs   │ ← Main + heartbeat read
└─────────────┘         │ progress.md  │ ← Any session can read/write
       ↓                └──────────────┘
 HISTORY (dies)                 ↓
                        DISK (survives)
       ↑                        ↓
┌─────────────┐         ┌──────────────┐
│ Cron Job    │ ──────→ │ payload msg  │ ← Only context source
│ (no ctx)    │ nothing │ (embedded)   │
└─────────────┘ injected└──────────────┘
```
## Solutions by Scenario
### Scenario 1: Cron Needs Main Session Context
**Problem**: Cron job runs in isolation, doesn't know user preferences, current state, etc.
**Solution A: Embed in Payload** (for small context)
```
payload.message = "You are the daily-digest delivery agent. Rules:\n" +
  "1. Read weekly-ai-browser-news-2026-W1.md\n" +
  "2. Find the report for today's date\n" +
  "3. Push it to chat:oc_xxx\n" +
  "Current report format: see memory/daily-news-format.md"
```
**Solution B: Self-Loading from Files** (for larger context)
```
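payload.message = "You are the daily-digest delivery agent.\n" +
  "1. Read memory/daily-news-rules.md for the full rules\n" +
  "2. Follow those rules\n" +
  "3. If the rules file does not exist, reply that configuration is needed"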
```
**Solution C: Shared State File** (for dynamic state)
- Main session writes state to `memory/shared-state.json`
- Cron reads the file at runtime
- Both sessions update as needed
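When two sessions both update a shared state file, writing through a temporary file and renaming it into place keeps readers from ever seeing a half-written file. This is a sketch; `writeStateAtomic` and `readState` are hypothetical helpers, and the rename is only atomic when the temporary file is on the same file system as the target (typically the case for a sibling file on POSIX systems).

```javascript
const fs = require('fs');
const path = require('path');

// Write JSON to a sibling temp file, then rename it over the target in one step.
function writeStateAtomic(filePath, state) {
  const tmp = path.join(path.dirname(filePath), `.${path.basename(filePath)}.${process.pid}.tmp`);
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
  fs.renameSync(tmp, filePath);
}

// Read the state file, returning `fallback` if it does not exist yet.
function readState(filePath, fallback = {}) {
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
  } catch (err) {
    if (err.code === 'ENOENT') return fallback;
    throw err;
  }
}
```

This protects against torn reads, not lost updates: if both sessions read, modify, and write at the same moment, the last writer still wins.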
### Scenario 2: Subagent Needs Progress Context
**Problem**: Subagent spawned for continuing a task doesn't know where to resume.
**Solution**: Progress file pattern
```
sessions_spawn(
  task: "Continue reading the Chromium source. First read memory/chromium-extensions-progress.md to see the progress, then resume from the next step",
  ...
)
```
### Scenario 3: Heartbeat Needs Task State
**Problem**: Heartbeat needs to check if tasks need attention.
**Solution**: Structured state files, for example `memory/heartbeat-state.json`:
```json
{
  "lastChecks": {
    "email": 1703275200,
    "calendar": 1703260800
  },
  "activeTasks": {
    "chromium-reading": {
      "progress": 7.01,
      "lastSession": "2026-03-17T00:19:00+08:00",
      "status": "in-progress"
    }
  }
}
```
Heartbeat reads this → knows what to check → updates after checking.
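The read, check, update cycle can be sketched over the `lastChecks` map, treating its values as Unix timestamps in seconds as in the example. The per-check interval is an assumption (the source does not specify how often each check should run), and both function names are illustrative.

```javascript
// Names of checks that have never run (null) or last ran at least intervalSec ago.
function dueChecks(state, nowSec, intervalSec) {
  return Object.entries(state.lastChecks)
    .filter(([, last]) => last === null || nowSec - last >= intervalSec)
    .map(([name]) => name);
}

// Return a new state with one check's timestamp updated (the input is not mutated).
function markChecked(state, name, nowSec) {
  return { ...state, lastChecks: { ...state.lastChecks, [name]: nowSec } };
}
```

After running a due check, the heartbeat would persist the result of `markChecked` back to the state file.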
### Scenario 4: Multi-Channel User (DM + Group)
**Problem**: User talks in DM and group, context split across both.
**Solution**:
- MEMORY.md (main session only) = source of truth
- Daily logs = raw records from all channels
- Key rule: "DM and group pushes must carry identical content, read from one shared file"
## Coordination Patterns
### Pattern: File-Based Producer-Consumer
```
Producer Session (e.g., nightly cron):
1. Generate content
2. Write to shared file (e.g., weekly-report.md)
3. Done
Consumer Session (e.g., morning cron):
1. Read shared file
2. Deliver content
3. Done
```
**Critical**: Both must agree on file format. Document it in MEMORY.md or a shared rules file.
### Pattern: Progress Checkpointing
```
Session starts:
Read(progress-file) → state = {completed: [...], next: X}
Session work:
Process items X..Y
state.completed += [X..Y]
state.next = Y+1
Session ends:
Write(progress-file, state)
```
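The checkpointing pseudocode maps onto a small pure function: take the saved state, process one batch, and return the state to write back. This is a sketch; loading and saving the progress file is left to the caller, and `batchSize` stands in for whatever limit ends a session.

```javascript
// state = { completed: [...], next: index of the first unprocessed item }
function runSession(state, items, batchSize, processItem) {
  const end = Math.min(state.next + batchSize, items.length);
  const done = [];
  for (let i = state.next; i < end; i++) {
    processItem(items[i]);
    done.push(items[i]);
  }
  // Return full state (not a delta) so the file can simply be overwritten.
  return { completed: [...state.completed, ...done], next: end };
}
```

Calling it repeatedly with the previously returned state walks through `items` exactly once, regardless of how many sessions that takes.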
### Pattern: State Lease
For preventing concurrent modification:
```json
{
  "lock": {
    "holder": "session-abc123",
    "acquiredAt": "2026-03-17T10:00:00Z",
    "expiresAt": "2026-03-17T11:00:00Z"
  }
}
```
Usually overkill for OpenClaw (rare concurrent access), but useful for cron race conditions.
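The acquire logic for such a lease is: take it if it is free, expired, or already yours. The sketch below works on the parsed state object only; `tryAcquireLease` is a hypothetical helper, and because reading and rewriting the file are separate steps, this reduces collisions rather than ruling them out.

```javascript
// Try to take the lease for `holder`. Returns { acquired, state } without mutating the input.
function tryAcquireLease(state, holder, now, ttlMs) {
  const lock = state.lock;
  const heldByOther = lock && lock.holder !== holder && new Date(lock.expiresAt) > now;
  if (heldByOther) return { acquired: false, state };
  return {
    acquired: true,
    state: {
      ...state,
      lock: {
        holder,
        acquiredAt: now.toISOString(),
        expiresAt: new Date(now.getTime() + ttlMs).toISOString(),
      },
    },
  };
}
```

The expiry time is what lets a later session recover when the previous holder crashed without releasing the lease.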
## Common Pitfalls
1. **"I wrote it to MEMORY.md but cron can't see it"**
→ MEMORY.md only loads in main session. Use payload embedding or separate file.
2. **"Subagent doesn't remember what I told it"**
→ Subagents have no history. Put everything in the task message or a file.
3. **"Two sessions overwrote each other's progress"**
→ Use append-only logs or atomic writes. For progress files, include full state (not deltas).
4. **"Cron keeps failing but I don't know why"**
→ Check `openclaw cron runs <jobId>` for error logs. Add verbose logging to payload.
## Debugging Cross-Session Issues
```bash
# Check cron execution history
openclaw cron list
openclaw cron runs <jobId>
# Check session context usage
# In-session: /status shows context %
# Check what files are being read
# Look at daily logs for file access records
```
FILE:references/memory-patterns.md
# Memory File Patterns
## File Hierarchy
```
workspace/
├── MEMORY.md ← Long-term curated (main session only)
├── SOUL.md ← Persona (always loaded)
├── USER.md ← User info (always loaded)
├── AGENTS.md ← Startup sequence (always loaded)
├── TOOLS.md ← Local tool notes (always loaded)
├── HEARTBEAT.md ← Periodic tasks (heartbeat reads)
├── memory/
│ ├── 2026-03-17.md ← Daily raw log (main + heartbeat)
│ ├── 2026-03-16.md ← Previous daily log
│ ├── <task>-progress.md ← Task progress (any session)
│ ├── heartbeat-state.json ← Heartbeat tracking
│ └── shared-state.json ← Cross-session shared state
└── context-persistence/ ← This skill
```
## MEMORY.md Structure
Curated long-term memory. Maximum ~200 lines.
```markdown
# MEMORY.md - Long-Term Memory
## About <User>
- Key facts, preferences, context
## Core Principles (inviolable rules)
- Privacy rules
- Behavioral constraints
## Project State
- Active projects and their status
- Key decisions made
## Technical Environment
- System details
- Tools and configs
## Lessons Learned
- Mistakes to avoid
- Patterns that work
## <Topic Sections>
- Organized by relevance
```
**Rules**:
- Only load in main session (security: don't leak to groups)
- Review and prune during heartbeats
- Extract from daily logs, don't duplicate
- Remove outdated info proactively
## Daily Log Structure (memory/YYYY-MM-DD.md)
Raw chronological notes. No size limit.
```markdown
# 2026-03-17 Daily Notes
## HH:MM - Event Title
What happened, decisions made, context captured
## HH:MM - Another Event
Details...
## Pending
- [ ] Todo items from today
- [ ] Carry forward items
```
**Rules**:
- Create at first event of the day
- Append, don't edit (append-only log)
- Include timestamps
- Note pending items at end
## AGENTS.md Startup Sequence
This is the critical context loading. Without this, sessions are blind.
```markdown
## Every Session
Before doing anything else:
1. Read SOUL.md — persona
2. Read USER.md — who you're helping
3. Read memory/YYYY-MM-DD.md (today + yesterday)
4. If MAIN SESSION: also read MEMORY.md
```
## Heartbeat State (memory/heartbeat-state.json)
```json
{
  "lastChecks": {
    "email": 1703275200,
    "calendar": 1703260800,
    "weather": null
  },
  "lastMemoryReview": "2026-03-16",
  "activeReminders": []
}
```
Heartbeat reads this → skips recent checks → updates after checking.
## Progress File Structure
See [progress-tracking.md](progress-tracking.md) for full details.
Minimal structure:
```markdown
# <Task> Progress
- **Total**: N
- **Done**: M (X%)
## Completed (dedup)
## Next Steps
## Key Findings
```
## Evolution Pattern
Memory files should evolve:
```
Raw Event → Daily Log → Extract → MEMORY.md
↓
Progress File (if task-oriented)
↓
Key Findings → MEMORY.md
```
## Cross-Reference Pattern
Link between files instead of duplicating:
- MEMORY.md: "For detailed sources see memory/2026-03-16.md"
- Progress file: "For related decisions see MEMORY.md#project-state"
- Daily log: "Extracted to MEMORY.md; the raw record stays here"
FILE:references/progress-tracking.md
# Progress Tracking for Long-Running Tasks
## When to Use
Tasks that:
- Have many items to process (reading files, processing data)
- Span multiple sessions
- Need deduplication (don't re-do completed work)
- Accumulate insights over time
## Template
```markdown
# <Task Name> Progress
- **Start Date**: YYYY-MM-DD
- **Total Items**: N
- **Completed**: M
- **Progress**: X%
- **Last Session**: YYYY-MM-DD HH:MM
## Completed Items (dedup list)
1. ✅ `path/to/item1` — one-line summary
2. ✅ `path/to/item2` — one-line summary
## Current Position
- Last completed: item N
- Next: item N+1
## Key Findings (accumulated insights)
### Finding 1: Title
Details...
### Finding 2: Title
Details...
## Next Steps
- [ ] Do this
- [ ] Then that
```
## Rules
1. **Always use the same file** — don't create session-specific progress files
2. **Update at session end** — write progress before the session dies
3. **Include dedup list** — so next session knows what's done
4. **Log current position** — so next session knows where to resume
5. **Extract key findings** — raw notes → curated insights over time
## Multi-Session Workflow
### Session A (starts task)
1. Read progress file (if exists, resume; if not, create)
2. Process items from current position
3. Update completed list + position
4. Write key findings
5. Save file
### Session B (continues)
1. Read progress file → sees completed items + position
2. Resume from last position
3. Repeat update cycle
## Example: Source Code Reading
```markdown
# Chromium Extensions Source Progress
- **Total Files**: 3,137
- **Completed**: 220
- **Progress**: 7.01%
- **Last Session**: 2026-03-17 00:19
## Completed Files (dedup)
1. ✅ `README.md` — extension system overview
2. ✅ `BUILD.gn` — build config
...
## Architecture Notes
### Layer 1: Browser Process
- EventRouter: event dispatch
- ExtensionFunction: API implementation
...
### Layer 2: Renderer Process
- Dispatcher: IPC message handling
- ModuleSystem: require() pattern
...
## Next Files to Read
1. `browser/event_router.cc` — EventRouter implementation
2. `browser/process_manager.cc` — ProcessManager implementation
...
```
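The dedup list and the progress figure in the example can both be computed rather than maintained by hand. The sketch below uses illustrative helper names; the sample numbers are the ones from the example above (220 of 3,137 files).

```javascript
// Items still to process: everything not already in the completed (dedup) list.
function pendingItems(allItems, completed) {
  const done = new Set(completed);
  return allItems.filter((item) => !done.has(item));
}

// Progress as a percentage rounded to two decimals.
function progressPercent(completedCount, total) {
  return total === 0 ? 0 : Number(((completedCount / total) * 100).toFixed(2));
}

console.log(progressPercent(220, 3137)); // 7.01
```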
## Anti-Patterns
❌ **Don't**: Put progress in daily log files (hard to aggregate)
❌ **Don't**: Use multiple progress files for one task (fragmentation)
❌ **Don't**: Skip dedup info (next session re-does work)
❌ **Don't**: Forget to save at session end (progress lost)
✅ **Do**: One progress file per long-running task
✅ **Do**: Clear completed/pending state
✅ **Do**: Save at natural boundaries