@clawhub-lunarcache-3275db02be
---
name: skill-for-ragflow
description: Operate RAGFlow v0.25.x deployments through the bundled Node CLI and API client. Use when the user needs to manage RAGFlow datasets, documents, uploads, parsing, chunks, retrieval, chat assistants, chat sessions, agents, agent sessions, metadata filters, model discovery, system settings, or API diagnostics. Also use when the user asks about knowledge bases, document chunking, vector retrieval, or RAG workflows and the current context explicitly involves a RAGFlow server or deployment.
version: 1.0.0
metadata:
  openclaw:
    requires:
      bins:
        - node
      env:
        - RAGFLOW_URL
        - RAGFLOW_API_KEY
    primaryEnv: RAGFLOW_API_KEY
homepage: https://github.com/LunarCache/ragflow-skill
---
# RAGFlow Skill
Use this skill to operate RAGFlow through `scripts/ragflow.js`. The CLI wraps the full v0.25.x REST API - every action goes through `node {baseDir}/scripts/ragflow.js <command> [options]`. Prefer `--json` on any command when the output will be parsed or chained into another step.
## Requirements
- Set `RAGFLOW_URL` and `RAGFLOW_API_KEY` in the environment or this skill's `.env`.
- Use Node.js to run bundled scripts.
- Set `RAGFLOW_WEB_TOKEN` only when `list-models` needs a web-session token for `/v1/llm/my_llms`.
- Tune chunk deletion retries only when needed with `RAGFLOW_DELETE_CHUNK_RETRIES` and `RAGFLOW_DELETE_CHUNK_RETRY_DELAY_MS`.
- Tune the chunk deletion diagnostic script only when needed with `RAGFLOW_REPRO_TIMEOUT_MS`, `RAGFLOW_REPRO_DELETE_RETRIES`, `RAGFLOW_REPRO_DELETE_RETRY_DELAY_MS`, and `RAGFLOW_REPRO_EMBEDDING_MODEL`.
## Quick Command Reference
| Scenario | Commands |
|----------|----------|
| **Knowledge base setup** | `create-dataset`, `list-datasets`, `get-dataset`, `update-dataset`, `delete-datasets` |
| **Document ingestion** | `upload-documents`, `list-documents`, `get-document`, `update-document`, `delete-documents`, `metadata-summary` |
| **Parsing & chunking** | `start-parsing`, `stop-parsing`, `wait-parsing`, `list-chunks`, `add-chunk`, `update-chunk`, `delete-chunks` |
| **Direct retrieval** | `retrieve` |
| **Chat assistant** | `create-chat`, `list-chats`, `get-chat`, `update-chat`, `patch-chat`, `delete-chats` |
| **Chat sessions** | `create-session`, `list-sessions`, `delete-sessions`, `chat`, `chat-session` |
| **Agent** | `create-agent`, `list-agents`, `get-agent`, `update-agent`, `delete-agents` |
| **Agent sessions** | `create-agent-session`, `list-agent-sessions`, `delete-agent-sessions`, `agent-chat` |
| **Model discovery** | `list-models` |
| **System** | `system-version`, `get-log-levels`, `set-log-level` |
## Common Workflows
### Full RAG pipeline (upload -> parse -> retrieve)
1. `create-dataset --name "My KB" --chunk-method naive`
2. `upload-documents --dataset <id> --files ./doc1.pdf ./doc2.txt`
3. `start-parsing --dataset <id> --doc-ids <doc_id1> <doc_id2>`
4. `wait-parsing --dataset <id> --doc-ids <doc_id1> <doc_id2>`
5. `retrieve --question "What is X?" --datasets <id>`
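The five steps above map onto `RagflowClient` methods roughly as follows. This is a minimal sketch rather than the CLI's implementation: `runPipeline` and `stubClient` are hypothetical, and the sketch assumes `createDataset` resolves to an object with `id` and `uploadDocuments` resolves to an array of objects with `id`. Check the real response shapes against your server before relying on it.

```javascript
// Hypothetical helper: chains the five pipeline steps against any object
// that exposes the RagflowClient methods used below.
async function runPipeline(client, { name, files, question }) {
  // 1. Create the dataset (assumed to resolve to { id, ... }).
  const dataset = await client.createDataset({ name, chunk_method: "naive" });
  // 2. Upload files (assumed to resolve to [{ id, ... }, ...]).
  const uploaded = await client.uploadDocuments(dataset.id, files);
  const docIds = uploaded.map((doc) => doc.id);
  // 3 and 4. Parsing is a separate, explicit step; then wait for DONE or FAIL.
  await client.startParsing(dataset.id, docIds);
  const parsed = await client.waitForParsing(dataset.id, docIds);
  const failed = parsed.filter((doc) => doc.run === "FAIL").map((doc) => doc.id);
  if (failed.length) throw new Error(`Parsing failed for: ${failed.join(",")}`);
  // 5. Retrieve against the freshly parsed dataset.
  return client.retrieve({ question, dataset_ids: [dataset.id] });
}

// Stub client for illustration only; it records the call order.
const calls = [];
const stubClient = {
  async createDataset() { calls.push("createDataset"); return { id: "ds1" }; },
  async uploadDocuments() { calls.push("uploadDocuments"); return [{ id: "doc1" }]; },
  async startParsing() { calls.push("startParsing"); return {}; },
  async waitForParsing(datasetId, ids) {
    calls.push("waitForParsing");
    return ids.map((id) => ({ id, run: "DONE" }));
  },
  async retrieve(params) {
    calls.push("retrieve");
    return { chunks: [], dataset_ids: params.dataset_ids };
  },
};

const pipelineResult = runPipeline(stubClient, {
  name: "My KB",
  files: ["./doc1.pdf"],
  question: "What is X?",
});
pipelineResult.then((result) => console.log(calls.join(" -> "), result));
```

Swapping `stubClient` for a client from `createClient()` would run the same sequence against a real server, subject to the response-shape assumptions above.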
### Chat assistant with sessions
1. `create-chat --name "Q&A" --datasets <id> --llm-id <model>`
2. `create-session --chat <chat_id>`
3. `chat-session --chat <chat_id> --session <session_id> --question "Hello"`
### Agent workflow
1. `create-agent --title "Assistant" --dsl @agent_dsl.json`
2. `create-agent-session --agent <agent_id>`
3. `agent-chat --agent <agent_id> --session <session_id> --question "Hello"`
## Workflow Decision Guide
The first step in any RAGFlow operation is resolving the target resource ID. After that, choose the right path:
1. **Need CLI syntax or option details?** -> Read [references/COMMANDS.md](references/COMMANDS.md) - it's organized by workflow scenario with full option tables.
2. **Editing client code or checking request/response shapes?** -> Read [references/API.md](references/API.md) - it has code examples for every `RagflowClient` method.
3. **A command failed?** -> Read [references/TROUBLESHOOTING.md](references/TROUBLESHOOTING.md) - common errors with causes and fixes.
4. **Formatting output for the user?** -> Read [references/REFERENCE.md](references/REFERENCE.md) - consistent response templates and status labels.
## Key Constraints
- **Destructive deletes need confirmation.** RAGFlow deletes are immediate and irreversible. Confirm before running `delete-datasets`, `delete-documents`, `delete-chunks`, `delete-chats`, `delete-sessions`, or `delete-agents` - unless the resource is a temporary artifact you created in the same workflow and the user asked you to clean up.
- **Upload and parsing are separate steps.** RAGFlow does not auto-parse on upload because different documents may need different chunk methods. Upload first, adjust config if needed, then start parsing explicitly.
- **Use v0.25.x route shapes from the references.** The RAGFlow API has changed between versions. The routes and payloads in the reference docs match v0.25.x - inventing fallback payloads will produce errors on real servers.
- **Tenant model identifiers use the `model@provider` format.** When creating datasets with `--embedding-model`, the server expects the full identifier, for example `text-embedding-v4@Tongyi-Qianwen`, not just the model name. Use `list-models` to discover the correct identifiers.
- **Chat sessions use the API-key SDK route.** `chat-session` posts to `/api/v1/chats/{chat_id}/completions` with `session_id` in the body. This is the v0.25.x API-key route - the login-session frontend route is intentionally avoided.
- **Agent DSL requires specific top-level fields.** RAGFlow agents need `components`, `history`, `path`, `retrieval`, `globals`, and `graph` in the DSL. Missing fields cause `KeyError` at creation time.
- **Chunk deletion may need retries.** The v0.25.0 server can return `rm_chunk deleted chunks 0, expect N` due to document-store refresh lag even when the chunk exists. The CLI handles this automatically - it retries after confirming the chunk is still visible via exact ID lookup. If retries still fail, run `scripts/repro-delete-chunks.js` for a clean diagnosis.
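The Agent DSL constraint above can be checked locally before calling `create-agent`. The following is a minimal sketch under stated assumptions: `missingDslFields` is a hypothetical helper, it only checks that the six top-level keys are present, and the empty values in `draft` are placeholders rather than a valid agent definition.

```javascript
// Top-level DSL fields named in the constraint above.
const REQUIRED_DSL_FIELDS = ["components", "history", "path", "retrieval", "globals", "graph"];

// Hypothetical pre-flight check: returns the names of the missing fields.
function missingDslFields(dsl) {
  if (dsl === null || typeof dsl !== "object" || Array.isArray(dsl)) {
    return [...REQUIRED_DSL_FIELDS]; // not an object, so everything is missing
  }
  return REQUIRED_DSL_FIELDS.filter((field) => !(field in dsl));
}

// Placeholder draft that lacks two of the required keys.
const draft = { components: {}, history: [], path: [], graph: {} };
console.log(missingDslFields(draft)); // [ 'retrieval', 'globals' ]
```

A presence check like this only avoids the `KeyError` described above; it does not validate the contents of each field.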
## Output Format
When presenting results to the user, follow the templates in [references/REFERENCE.md](references/REFERENCE.md). Key conventions:
- **3+ items with attributes** -> Table, abbreviating long IDs
- **Sequential steps** -> Numbered list
- **Parsing status** -> Use labels: `UNSTART`, `RUNNING`, `CANCEL`, `DONE`, `FAIL`
- **Search results** -> Table with similarity scores, content as quote blocks
- **Errors** -> Show code and human-readable message
FILE:agents/openai.yaml
interface:
  display_name: "RAGFlow Skill"
  short_description: "Operate RAGFlow v0.25.x deployments via bundled CLI for datasets, documents, retrieval, chat assistants, agents, and diagnostics."
  default_prompt: "Use $ragflow-skill to manage RAGFlow deployments: datasets, document uploads and parsing, chunks, retrieval, chat assistants and sessions, agents, model discovery, system settings, and API diagnostics."
FILE:lib/api.js
const https = require("https");
const http = require("http");
const fs = require("fs");
const path = require("path");
const DEFAULT_TIMEOUT = 30000;
const MAX_RETRIES = 2;
const RETRY_DELAY = 1000;
const DELETE_CHUNK_RETRIES = 3;
const DELETE_CHUNK_RETRY_DELAY = 1000;
class RagflowClient {
constructor(baseUrl, apiKey, options = {}) {
if (!baseUrl) throw new Error("RAGFLOW_URL is required");
if (!apiKey) throw new Error("RAGFLOW_API_KEY is required");
this.baseUrl = baseUrl.replace(/\/+$/, "");
this.apiKey = apiKey;
this.apiPrefix = "/api/v1";
this.timeout = options.timeout || DEFAULT_TIMEOUT;
this.maxRetries = options.maxRetries !== undefined ? options.maxRetries : MAX_RETRIES;
this.webToken = options.webToken || process.env.RAGFLOW_WEB_TOKEN || "";
}
async request(method, endpoint, options = {}) {
const isMultipart = options.files && options.files.length > 0;
const headers = {
Authorization: `Bearer ${options.authToken || this.apiKey}`,
};
let body;
if (isMultipart) {
const boundary = "----FormBoundary" + Math.random().toString(36).slice(2);
headers["Content-Type"] = `multipart/form-data; boundary=${boundary}`;
body = this._buildMultipart(options.files, options.json || {}, boundary);
} else if (options.json) {
headers["Content-Type"] = "application/json";
body = JSON.stringify(options.json);
}
if (body) {
headers["Content-Length"] = Buffer.byteLength(body);
}
let lastError;
const attempts = this.maxRetries + 1;
for (let attempt = 1; attempt <= attempts; attempt++) {
try {
return await this._doRequest(method, endpoint, headers, body, options.timeout, options.apiPrefix);
} catch (err) {
lastError = err;
if (this._isRetryable(err) && attempt < attempts) {
await this._delay(RETRY_DELAY * attempt);
continue;
}
throw err;
}
}
throw lastError;
}
_isRetryable(err) {
if (err.code === "ECONNRESET" || err.code === "ECONNREFUSED" || err.code === "ETIMEDOUT") return true;
if (err.message && (err.message.includes("socket hang up") || err.message.includes("network"))) return true;
if (err.code && err.code >= 500) return true;
return false;
}
_delay(ms) {
return new Promise((resolve) => setTimeout(resolve, ms));
}
_isDeleteChunkVisibilityError(err) {
return err && /rm_chunk deleted chunks 0, expect \d+/.test(err.message || "");
}
_isNotFoundError(err) {
return err && (err.code === 404 || err.status === 404 || /not found/i.test(err.message || ""));
}
_decorateDeleteChunkError(err, details) {
err.delete_chunk_details = details;
const existing = details.existing_chunk_ids || [];
const missing = details.missing_chunk_ids || [];
const parts = [];
if (existing.length) parts.push(`existing: ${existing.join(",")}`);
if (missing.length) parts.push(`missing: ${missing.join(",")}`);
if (parts.length) err.message = `${err.message} (${parts.join("; ")})`;
return err;
}
_doRequest(method, endpoint, headers, body, timeoutOverride, apiPrefix = this.apiPrefix) {
const url = this._buildUrl(endpoint, apiPrefix);
const timeout = timeoutOverride || this.timeout;
return new Promise((resolve, reject) => {
const mod = url.protocol === "https:" ? https : http;
const req = mod.request(url, { method, headers }, (res) => {
const chunks = [];
res.on("data", (c) => chunks.push(c));
res.on("end", () => {
const raw = Buffer.concat(chunks).toString("utf-8");
try {
const data = JSON.parse(raw);
if (data.code === 0) {
resolve(data.data !== undefined ? data.data : {});
} else {
const err = new Error(data.message || `API error code ${data.code}`);
err.code = data.code;
err.status = res.statusCode;
reject(err);
}
} catch {
const err = new Error(`Invalid JSON response: ${raw.slice(0, 200)}`);
err.status = res.statusCode;
reject(err);
}
});
});
req.setTimeout(timeout, () => {
req.destroy(new Error(`Request timed out after ${timeout}ms`));
});
req.on("error", (err) => {
err.message = `Request failed: ${err.message}`;
reject(err);
});
if (body) req.write(body);
req.end();
});
}
async validateConnection() {
try {
await this.listDatasets({ page: 1, page_size: 1 });
return true;
} catch {
return false;
}
}
_buildMultipart(files, fields, boundary) {
const parts = [];
for (const [key, value] of Object.entries(fields)) {
parts.push(Buffer.from(
`--${boundary}\r\nContent-Disposition: form-data; name="${key}"\r\n\r\n${value}`
));
parts.push(Buffer.from("\r\n"));
}
for (const file of files) {
const basename = path.basename(file);
const content = fs.readFileSync(file);
const header = `--${boundary}\r\nContent-Disposition: form-data; name="file"; filename="${basename}"\r\nContent-Type: application/octet-stream\r\n\r\n`;
parts.push(Buffer.from(header, "utf-8"));
parts.push(content);
parts.push(Buffer.from("\r\n"));
}
parts.push(Buffer.from(`--${boundary}--\r\n`, "utf-8"));
return Buffer.concat(parts);
}
async _streamRequest(method, endpoint, json, timeoutOverride) {
const url = this._buildUrl(endpoint, this.apiPrefix);
const body = JSON.stringify(json);
const timeout = timeoutOverride || this.timeout * 3;
const headers = {
Authorization: `Bearer ${this.apiKey}`,
"Content-Type": "application/json",
"Content-Length": Buffer.byteLength(body),
};
return new Promise((resolve, reject) => {
const mod = url.protocol === "https:" ? https : http;
const req = mod.request(url, { method, headers }, (res) => {
const chunks = [];
res.on("data", (c) => chunks.push(c));
res.on("end", () => {
const raw = Buffer.concat(chunks).toString("utf-8");
const lines = raw.split("\n");
let lastAnswer = "";
let reference = null;
for (const line of lines) {
if (line.startsWith("data:")) {
const payload = line.slice(5).trim();
if (!payload || payload === "[DONE]") continue;
try {
const data = JSON.parse(payload);
if (data.event) {
if (data.event === "message" && data.data?.content !== undefined) {
lastAnswer += data.data.content;
}
if ((data.event === "message_end" || data.event === "done") && data.data?.reference !== undefined) {
reference = data.data.reference;
}
continue;
}
if (data.code === 0) {
if (data.data?.answer !== undefined) lastAnswer = data.data.answer;
if (data.data?.content !== undefined) lastAnswer += data.data.content;
if (data.data?.reference) reference = data.data.reference;
} else {
reject(new Error(data.message || data.data?.message || `API error code ${data.code}`));
return;
}
} catch {
// Skip invalid JSON lines
}
}
}
resolve({ answer: lastAnswer, reference });
});
});
req.setTimeout(timeout, () => {
req.destroy(new Error(`Chat request timed out after ${timeout}ms`));
});
req.on("error", (err) => {
err.message = `Request failed: ${err.message}`;
reject(err);
});
req.write(body);
req.end();
});
}
// ── Dataset ──
async listDatasets(params = {}) {
const query = this._buildQuery(params);
return this.request("GET", `/datasets?${query.toString()}`);
}
async getDataset(datasetId) {
const result = await this.listDatasets({ id: datasetId });
if (!result || result.length === 0) {
throw new Error(`Dataset ${datasetId} not found`);
}
return result[0];
}
async createDataset(data) {
return this.request("POST", "/datasets", { json: data });
}
async updateDataset(datasetId, data) {
return this.request("PUT", `/datasets/${datasetId}`, { json: data });
}
async deleteDatasets(ids) {
return this.request("DELETE", "/datasets", { json: { ids } });
}
// ── Document ──
async uploadDocuments(datasetId, files, params = {}) {
return this.request("POST", `/datasets/${datasetId}/documents`, {
files,
json: params,
});
}
async listDocuments(datasetId, params = {}) {
const query = this._buildQuery(params);
return this.request("GET", `/datasets/${datasetId}/documents?${query.toString()}`);
}
async deleteDocuments(datasetId, ids) {
return this.request("DELETE", `/datasets/${datasetId}/documents`, { json: { ids } });
}
async getDocument(datasetId, documentId) {
const result = await this.listDocuments(datasetId, { id: documentId });
if (!result || !result.docs || result.docs.length === 0) {
throw new Error(`Document ${documentId} not found in dataset ${datasetId}`);
}
return result.docs[0];
}
async updateDocument(datasetId, documentId, data) {
return this.request("PATCH", `/datasets/${datasetId}/documents/${documentId}`, { json: data });
}
// ── Chunk / Parsing ──
async startParsing(datasetId, documentIds) {
return this.request("POST", `/datasets/${datasetId}/chunks`, {
json: { document_ids: documentIds },
});
}
async stopParsing(datasetId, documentIds) {
return this.request("DELETE", `/datasets/${datasetId}/chunks`, {
json: { document_ids: documentIds },
});
}
async waitForParsing(datasetId, documentIds, options = {}) {
const interval = options.interval || 3000;
const maxWait = options.maxWait || 120000;
const start = Date.now();
while (Date.now() - start < maxWait) {
const docs = await this.listDocuments(datasetId);
const targets = (docs.docs || docs || []).filter((d) => documentIds.includes(d.id));
const allDone = targets.every((d) => d.run === "DONE" || d.run === "FAIL");
if (allDone) return targets;
await this._delay(interval);
}
throw new Error(`Parsing timed out after ${maxWait}ms`);
}
async listChunks(datasetId, documentId, params = {}) {
const query = this._buildQuery(params);
return this.request(
"GET",
`/datasets/${datasetId}/documents/${documentId}/chunks?${query.toString()}`
);
}
async getChunk(datasetId, documentId, chunkId) {
const result = await this.listChunks(datasetId, documentId, { id: chunkId });
const chunks = result?.chunks || (Array.isArray(result) ? result : []);
const chunk = chunks.find((item) => item.id === chunkId) || chunks[0];
if (!chunk) throw new Error(`Chunk not found: ${datasetId}/${chunkId}`);
return chunk;
}
async _existingChunkIds(datasetId, documentId, chunkIds) {
const existing = [];
const missing = [];
for (const chunkId of chunkIds) {
try {
await this.getChunk(datasetId, documentId, chunkId);
existing.push(chunkId);
} catch (err) {
if (this._isNotFoundError(err)) {
missing.push(chunkId);
continue;
}
throw err;
}
}
return { existing, missing };
}
async addChunk(datasetId, documentId, data) {
return this.request("POST", `/datasets/${datasetId}/documents/${documentId}/chunks`, {
json: data,
});
}
async deleteChunks(datasetId, documentId, chunkIds, options = {}) {
const uniqueChunkIds = [...new Set(chunkIds)].filter(Boolean);
const maxRetries = options.maxRetries !== undefined
? options.maxRetries
: Number(process.env.RAGFLOW_DELETE_CHUNK_RETRIES || DELETE_CHUNK_RETRIES);
const retryDelay = options.retryDelay !== undefined
? options.retryDelay
: Number(process.env.RAGFLOW_DELETE_CHUNK_RETRY_DELAY_MS || DELETE_CHUNK_RETRY_DELAY);
let lastError;
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await this.request("DELETE", `/datasets/${datasetId}/documents/${documentId}/chunks`, {
json: { chunk_ids: uniqueChunkIds },
});
} catch (err) {
lastError = err;
if (!this._isDeleteChunkVisibilityError(err)) {
throw err;
}
const { existing, missing } = await this._existingChunkIds(datasetId, documentId, uniqueChunkIds);
const details = {
attempt,
max_retries: maxRetries,
existing_chunk_ids: existing,
missing_chunk_ids: missing,
};
if (existing.length === 0 || attempt >= maxRetries) {
throw this._decorateDeleteChunkError(err, details);
}
if (typeof options.onRetry === "function") {
options.onRetry(details);
}
await this._delay(retryDelay);
}
}
throw lastError;
}
async updateChunk(datasetId, documentId, chunkId, data) {
return this.request(
"PUT",
`/datasets/${datasetId}/documents/${documentId}/chunks/${chunkId}`,
{ json: data }
);
}
// ── Retrieval ──
async retrieve(params) {
return this.request("POST", "/retrieval", { json: params });
}
// ── Chat Assistant ──
async listChatAssistants(params = {}) {
const query = this._buildQuery(params);
return this.request("GET", `/chats?${query.toString()}`);
}
async createChatAssistant(data) {
return this.request("POST", "/chats", { json: data });
}
async updateChatAssistant(chatId, data) {
return this.request("PUT", `/chats/${chatId}`, { json: data });
}
async patchChatAssistant(chatId, data) {
return this.request("PATCH", `/chats/${chatId}`, { json: data });
}
async deleteChatAssistants(ids) {
return this.request("DELETE", "/chats", { json: { ids } });
}
async getChatAssistant(chatId) {
return this.request("GET", `/chats/${chatId}`);
}
// ── Session ──
async listSessions(chatId, params = {}) {
const query = this._buildQuery(params);
return this.request("GET", `/chats/${chatId}/sessions?${query.toString()}`);
}
async createSession(chatId, data = {}) {
return this.request("POST", `/chats/${chatId}/sessions`, { json: data });
}
async deleteSessions(chatId, ids) {
return this.request("DELETE", `/chats/${chatId}/sessions`, { json: { ids } });
}
// ── Chat (Conversation) ──
async chat(chatId, sessionId, question, params = {}) {
return this._streamRequest(
"POST", `/chats/${chatId}/completions`,
{ question, session_id: sessionId, ...params }
);
}
async chatSession(chatId, sessionId, data = {}) {
const payload = { ...data, session_id: sessionId };
if (!payload.question && payload.messages) {
const userMessages = Array.isArray(payload.messages)
? payload.messages.filter((message) => message && message.role === "user" && message.content)
: [];
const lastUserMessage = userMessages[userMessages.length - 1];
if (lastUserMessage) payload.question = lastUserMessage.content;
}
delete payload.messages;
if (!payload.question) {
throw new Error("chatSession requires question or messages with a user message");
}
if (data.stream === false || data.stream === "false") {
return this.request("POST", `/chats/${chatId}/completions`, { json: payload });
}
return this._streamRequest("POST", `/chats/${chatId}/completions`, payload);
}
// ── Agent ──
async listAgents(params = {}) {
const query = this._buildQuery(params);
return this.request("GET", `/agents?${query.toString()}`);
}
async createAgent(data) {
return this.request("POST", "/agents", { json: data });
}
async updateAgent(agentId, data) {
return this.request("PUT", `/agents/${agentId}`, { json: data });
}
async deleteAgents(ids) {
return Promise.all(ids.map((id) => this.request("DELETE", `/agents/${id}`)));
}
async getAgent(agentId) {
const result = await this.listAgents({ id: agentId });
if (!result || result.length === 0) {
throw new Error(`Agent ${agentId} not found`);
}
return result[0];
}
// ── Agent Session ──
async listAgentSessions(agentId, params = {}) {
const query = this._buildQuery(params);
return this.request("GET", `/agents/${agentId}/sessions?${query.toString()}`);
}
async createAgentSession(agentId, data = {}) {
return this.request("POST", `/agents/${agentId}/sessions`, { json: data });
}
async deleteAgentSessions(agentId, ids) {
return this.request("DELETE", `/agents/${agentId}/sessions`, { json: { ids } });
}
// ── Agent Chat ──
async agentChat(agentId, sessionId, question, params = {}) {
return this._streamRequest(
"POST", `/agents/${agentId}/completions`,
{ question, session_id: sessionId, ...params }
);
}
// ── LLM Models ──
async listModels(params = {}) {
const query = this._buildQuery(params);
const authToken = this.webToken || this.apiKey;
return this.request("GET", `/llm/my_llms?${query.toString()}`, {
apiPrefix: "/v1",
authToken,
});
}
// ── Helpers ──
async metadataSummary(datasetId, docIds = []) {
const query = new URLSearchParams();
if (docIds.length) {
query.set("doc_ids", docIds.join(","));
}
const suffix = query.toString();
return this.request("GET", `/datasets/${datasetId}/metadata/summary${suffix ? `?${suffix}` : ""}`);
}
async getSystemVersion() {
return this.request("GET", "/system/version");
}
async getLogLevels() {
return this.request("GET", "/system/config/log");
}
async setLogLevel(pkgName, level) {
return this.request("PUT", "/system/config/log", { json: { pkg_name: pkgName, level } });
}
_buildUrl(endpoint, apiPrefix = this.apiPrefix) {
const prefix = apiPrefix.replace(/\/+$/, "");
const suffix = endpoint.startsWith("/") ? endpoint : `/${endpoint}`;
return new URL(prefix + suffix, this.baseUrl);
}
_buildQuery(params) {
const query = new URLSearchParams();
for (const [k, v] of Object.entries(params)) {
if (v === undefined || v === null) continue;
if (Array.isArray(v)) {
for (const item of v) {
if (item !== undefined && item !== null) query.append(k, String(item));
}
} else {
query.set(k, String(v));
}
}
return query;
}
}
function loadEnv() {
const envPath = path.resolve(__dirname, "..", ".env");
if (fs.existsSync(envPath)) {
const content = fs.readFileSync(envPath, "utf-8");
for (const line of content.split("\n")) {
const trimmed = line.trim();
if (!trimmed || trimmed.startsWith("#")) continue;
const eq = trimmed.indexOf("=");
if (eq > 0) {
const key = trimmed.slice(0, eq);
if (process.env[key] === undefined || process.env[key] === "") {
process.env[key] = trimmed.slice(eq + 1);
}
}
}
}
}
function createClient(options = {}) {
loadEnv();
return new RagflowClient(
process.env.RAGFLOW_URL,
process.env.RAGFLOW_API_KEY,
options
);
}
module.exports = { RagflowClient, createClient };
FILE:references/API.md
# Programmatic API and Configuration
## Table of Contents
- [Setup](#setup)
- [Dataset](#dataset)
- [Document](#document)
- [Parsing](#parsing)
- [Chunk](#chunk)
- [Retrieval](#retrieval)
- [Chat Assistant](#chat-assistant)
- [Session](#session)
- [Chat Conversation](#chat-conversation)
- [Agent](#agent)
- [Agent Session](#agent-session)
- [Agent Chat](#agent-chat)
- [LLM Models](#llm-models)
- [System](#system)
- [Utility](#utility)
- [Configuration](#configuration)
## Setup
```javascript
const { createClient } = require("{baseDir}/lib/api.js");
const client = createClient();
```
`createClient()` reads `RAGFLOW_URL` and `RAGFLOW_API_KEY` from the environment and then fills missing values from the bundled `.env` file. Existing environment variables take precedence. See [Configuration](#configuration) below.
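The precedence rule can be illustrated with a pure function that mirrors the bundled loader's merge logic. This is a sketch: `mergeEnvFile` is a hypothetical name, and unlike the real loader it returns a new object instead of mutating `process.env`.

```javascript
// Hypothetical pure version of the .env merge: existing non-empty values win.
function mergeEnvFile(content, env) {
  const merged = { ...env };
  for (const line of content.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq <= 0) continue; // no key before "="
    const key = trimmed.slice(0, eq);
    // Only fill values that are unset or empty in the environment.
    if (merged[key] === undefined || merged[key] === "") {
      merged[key] = trimmed.slice(eq + 1);
    }
  }
  return merged;
}

const envFileContent = [
  "# local defaults",
  "RAGFLOW_URL=http://127.0.0.1:9380",
  "RAGFLOW_API_KEY=ragflow-from-file",
].join("\n");

const mergedEnv = mergeEnvFile(envFileContent, { RAGFLOW_API_KEY: "ragflow-from-shell" });
console.log(mergedEnv.RAGFLOW_URL); // http://127.0.0.1:9380
console.log(mergedEnv.RAGFLOW_API_KEY); // ragflow-from-shell
```

Values are taken verbatim after the first `=`, so surrounding quotes in a `.env` line would become part of the value.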
## Dataset
```javascript
// List datasets (supports pagination: page, page_size, id, name)
const datasets = await client.listDatasets({ page: 1, page_size: 10 });
// Get a single dataset by ID
const dataset = await client.getDataset("<dataset_id>");
// Create a dataset
const newDataset = await client.createDataset({
name: "Tech Docs",
chunk_method: "naive",
});
// Update a dataset
await client.updateDataset("<dataset_id>", { name: "New Name" });
// Delete datasets by IDs
await client.deleteDatasets(["<id1>", "<id2>"]);
```
## Document
```javascript
// Upload documents
await client.uploadDocuments("<dataset_id>", ["./report.pdf", "./notes.txt"]);
// List documents (supports page, page_size, id, name, orderby, desc, keywords, suffix, types, run, metadata, metadata_condition, return_empty_metadata)
const docs = await client.listDocuments("<dataset_id>");
// Get a single document by ID
const doc = await client.getDocument("<dataset_id>", "<doc_id>");
// Update a document
await client.updateDocument("<dataset_id>", "<doc_id>", {
name: "Renamed",
parser_config: { pages: [[1, 2]] },
chunk_method: "knowledge_graph",
enabled: 1,
meta_fields: { author: "Alice" },
});
// Delete documents by IDs
await client.deleteDocuments("<dataset_id>", ["<doc_id1>", "<doc_id2>"]);
```
RAGFlow v0.25.0 defines document updates as `PATCH /api/v1/datasets/{dataset_id}/documents/{document_id}`. `updateDocument()` sends that request directly.
You can also filter documents by metadata:
```javascript
const docs = await client.listDocuments("<dataset_id>", {
metadata_condition: JSON.stringify({
logic: "and",
conditions: [{ name: "status", comparison_operator: "=", value: "published" }],
}),
});
```
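The condition is passed as a JSON string because query parameters are flat strings; the client serializes them with `URLSearchParams`. The sketch below shows that round trip in isolation. The `page_size` value is illustrative, and how the server interprets the decoded condition is outside what this example demonstrates.

```javascript
// Sketch of how a metadata filter ends up in the query string.
const condition = {
  logic: "and",
  conditions: [{ name: "status", comparison_operator: "=", value: "published" }],
};

const query = new URLSearchParams();
query.set("metadata_condition", JSON.stringify(condition)); // nested object -> one string
query.set("page_size", String(10));
const encoded = query.toString(); // percent-encoded, safe to append after "?"

// Round trip: decoding the query string recovers the original object.
const decoded = JSON.parse(new URLSearchParams(encoded).get("metadata_condition"));
console.log(decoded.conditions[0].value); // published
```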
You can also summarize metadata across documents:
```javascript
const summary = await client.metadataSummary("<dataset_id>", ["<doc_id1>", "<doc_id2>"]);
// Returns: { summary: [...] }
```
## Parsing
```javascript
// Start parsing (returns immediately)
await client.startParsing("<dataset_id>", ["<doc_id1>"]);
// Stop parsing
await client.stopParsing("<dataset_id>", ["<doc_id1>"]);
// Wait for parsing to complete (polls until DONE or FAIL)
// Documents stuck in CANCEL keep polling until timeout.
const results = await client.waitForParsing("<dataset_id>", ["<doc_id1>"], {
interval: 3000, // poll interval in ms (default: 3000)
maxWait: 120000, // max wait in ms (default: 120000)
});
```
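Because `waitForParsing()` only treats `DONE` and `FAIL` as finished, a caller that wants to stop early on cancelled documents could poll `listDocuments()` with its own predicate. The following is a hypothetical variant, not the bundled client's behavior: `parsingSettled` treats `CANCEL` as terminal and also requires every requested ID to appear in the listing.

```javascript
// Status labels come from the parsing status list; treating CANCEL as
// terminal is this sketch's choice, not the bundled client's behavior.
const TERMINAL_RUN_STATES = new Set(["DONE", "FAIL", "CANCEL"]);

// Hypothetical predicate: true only when every requested document ID was
// found in `docs` and has reached a terminal state.
function parsingSettled(docs, documentIds) {
  const byId = new Map(docs.map((doc) => [doc.id, doc]));
  return documentIds.every(
    (id) => byId.has(id) && TERMINAL_RUN_STATES.has(byId.get(id).run)
  );
}

const listing = [
  { id: "doc1", run: "DONE" },
  { id: "doc2", run: "CANCEL" },
  { id: "doc3", run: "RUNNING" },
];
console.log(parsingSettled(listing, ["doc1", "doc2"])); // true
console.log(parsingSettled(listing, ["doc1", "doc3"])); // false
console.log(parsingSettled(listing, ["doc9"])); // false (not in the listing)
```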
## Chunk
```javascript
// List chunks (supports pagination: page, page_size, keywords)
const chunks = await client.listChunks("<dataset_id>", "<doc_id>");
// Exact chunk lookup by ID
const chunk = await client.getChunk("<dataset_id>", "<doc_id>", "<chunk_id>");
// Add a chunk
await client.addChunk("<dataset_id>", "<doc_id>", {
content: "Custom chunk text",
important_keywords: ["keyword1", "keyword2"],
});
// Update a chunk
await client.updateChunk("<dataset_id>", "<doc_id>", "<chunk_id>", {
content: "Updated content",
important_keywords: ["new_keyword"],
});
// Delete chunks by IDs
await client.deleteChunks("<dataset_id>", "<doc_id>", ["<chunk_id1>"]);
```
`deleteChunks()` retries the v0.25.0 transient `rm_chunk deleted chunks 0, expect N` response only after `getChunk()` confirms the target chunk still exists. This distinguishes document-store refresh delay from a genuinely missing chunk. Override with:
```javascript
await client.deleteChunks("<dataset_id>", "<doc_id>", ["<chunk_id1>"], {
maxRetries: 0,
retryDelay: 1000,
});
```
When the CLI is run with `--json`, `delete-chunks` wraps the server result with diagnostic fields that pipelines can consume directly:
```json
{
"result": {},
"requested_chunk_ids": ["<chunk_id1>"],
"existing_chunk_ids": ["<chunk_id1>"],
"missing_chunk_ids": [],
"visibility_checked": true,
"retry_count": 1,
"retries": [
{
"attempt": 0,
"next_attempt": 2,
"max_retries": 3,
"existing_chunk_ids": ["<chunk_id1>"],
"missing_chunk_ids": []
}
]
}
```
On a final delete visibility failure, the CLI exits non-zero and emits JSON with `error`, `requested_chunk_ids`, `existing_chunk_ids`, `missing_chunk_ids`, `retry_count`, `retries`, and `delete_chunk_diagnostics`.
All CLI command failures in `--json` mode use the same top-level error envelope:
```json
{
"error": {
"message": "API Error: Unauthorized",
"raw_message": "Unauthorized",
"code": 401,
"status": 401,
"command": "list-models"
}
}
```
Command-specific diagnostics, such as delete chunk visibility checks, are added as extra top-level fields alongside `error`.
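A pipeline step that consumes `--json` output might separate the error envelope from the extra diagnostic fields like this. It is a sketch under stated assumptions: `summarizeCliOutput` is a hypothetical helper, and it assumes a top-level `error` key appears only on failure.

```javascript
// Hypothetical consumer of `--json` output from the CLI.
function summarizeCliOutput(stdout) {
  const parsed = JSON.parse(stdout);
  if (parsed && typeof parsed === "object" && parsed.error) {
    // Everything next to `error` is treated as command-specific diagnostics.
    const { error, ...diagnostics } = parsed;
    return {
      ok: false,
      message: error.message,
      code: error.code,
      command: error.command,
      diagnostics,
    };
  }
  return { ok: true, data: parsed };
}

const failureOutput = JSON.stringify({
  error: {
    message: "API Error: Unauthorized",
    raw_message: "Unauthorized",
    code: 401,
    status: 401,
    command: "list-models",
  },
});
const summary = summarizeCliOutput(failureOutput);
console.log(summary.ok, summary.code, summary.command); // false 401 list-models
```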
## Retrieval
```javascript
const results = await client.retrieve({
question: "What is deep learning?",
dataset_ids: ["<dataset_id>"],
similarity_threshold: 0.3,
page_size: 5,
top_k: 1024,
vector_similarity_weight: 0.7,
keyword: true,
use_kg: false,
rerank_id: "<rerank_model_id>",
});
```
## Chat Assistant
```javascript
// List chat assistants (supports pagination)
const chats = await client.listChatAssistants({ page: 1, page_size: 10 });
// Get a single chat assistant by ID
const chat = await client.getChatAssistant("<chat_id>");
// Create a chat assistant
const newChat = await client.createChatAssistant({
name: "Tech Q&A",
dataset_ids: ["<dataset_id>"],
llm_id: "<model_name>",
prompt_config: { system: "You are a helpful assistant." },
similarity_threshold: 0.3,
top_n: 5,
});
// Update a chat assistant
await client.updateChatAssistant("<chat_id>", { name: "New Name" });
// Patch a chat assistant
await client.patchChatAssistant("<chat_id>", { prompt_config: { system: "Use the dataset" } });
// Delete chat assistants by IDs
await client.deleteChatAssistants(["<chat_id1>", "<chat_id2>"]);
```
## Session
```javascript
// List sessions for a chat assistant
const sessions = await client.listSessions("<chat_id>", { page: 1 });
// Create a session
const session = await client.createSession("<chat_id>", { name: "Q&A Session" });
// Delete sessions by IDs
await client.deleteSessions("<chat_id>", ["<session_id1>"]);
```
## Chat Conversation
```javascript
// Chat with an assistant (streaming SSE, returns final answer + references)
const answer = await client.chat("<chat_id>", "<session_id>", "What is RAG?");
// Returns: { answer: "...", reference: { ... } }
// Chat with a session (question payload)
const sessionAnswer = await client.chatSession("<chat_id>", "<session_id>", {
question: "Summarize the policy.",
});
// Convenience form: the last user message becomes `question`
const sessionAnswerFromMessages = await client.chatSession("<chat_id>", "<session_id>", {
messages: [
{ role: "system", content: "Follow the dataset." },
{ role: "user", content: "Summarize the policy." },
],
});
```
`chatSession()` uses `POST /api/v1/chats/{chat_id}/completions` with `session_id` in the JSON body. This is the API-key SDK route in v0.25.0. The login-session frontend route `/api/v1/chats/{chat_id}/sessions/{session_id}/completions` is intentionally not used here.
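The request body that results from the `messages` convenience form can be sketched as a pure function mirroring the client's normalization. `buildCompletionPayload` is a hypothetical name used for illustration only.

```javascript
// Hypothetical pure version of the payload normalization in chatSession().
function buildCompletionPayload(sessionId, data = {}) {
  const payload = { ...data, session_id: sessionId };
  if (!payload.question && Array.isArray(payload.messages)) {
    // The last user message with content becomes the question.
    const userMessages = payload.messages.filter(
      (message) => message && message.role === "user" && message.content
    );
    const lastUserMessage = userMessages[userMessages.length - 1];
    if (lastUserMessage) payload.question = lastUserMessage.content;
  }
  delete payload.messages; // the messages array itself is never sent
  if (!payload.question) {
    throw new Error("question or a user message is required");
  }
  return payload;
}

const body = buildCompletionPayload("s1", {
  messages: [
    { role: "system", content: "Follow the dataset." },
    { role: "user", content: "Summarize the policy." },
  ],
});
console.log(body); // { session_id: 's1', question: 'Summarize the policy.' }
```

Note that the system message is dropped in this form; only the derived `question` and `session_id` reach the server.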
## Agent
```javascript
// List agents (supports pagination)
const agents = await client.listAgents({ page: 1 });
// Get a single agent by ID
const agent = await client.getAgent("<agent_id>");
// Create an agent (requires DSL payload with components, history, path, retrieval, globals, and graph)
const agent = await client.createAgent({ title: "My Agent", dsl: { ... } });
// Update an agent
await client.updateAgent("<agent_id>", { title: "Updated Agent" });
// Delete agents by IDs
await client.deleteAgents(["<agent_id1>"]);
```
## Agent Session
```javascript
// List agent sessions
const sessions = await client.listAgentSessions("<agent_id>", { page: 1 });
// Create an agent session
const session = await client.createAgentSession("<agent_id>", { name: "Session 1" });
// Delete agent sessions by IDs
await client.deleteAgentSessions("<agent_id>", ["<session_id1>"]);
```
## Agent Chat
```javascript
// Chat with an agent (streaming SSE, returns final answer + references)
const answer = await client.agentChat("<agent_id>", "<session_id>", "Analyze the data");
// Returns: { answer: "...", reference: { ... } }
```
## LLM Models
```javascript
// List available models
const models = await client.listModels({ include_details: true });
// Returns: { groups: [...], total: <n> }
```
RAGFlow v0.25.0 exposes model discovery at `/v1/llm/my_llms`. If the endpoint requires web-session authentication, provide `RAGFLOW_WEB_TOKEN`.
## System
```javascript
// Get the server version
const version = await client.getSystemVersion();
// Inspect and update log levels
const levels = await client.getLogLevels();
await client.setLogLevel("ragflow", "DEBUG");
```
## Utility
```javascript
// Validate connection to RAGFlow server
const ok = await client.validateConnection();
// Returns: true | false
```
## Configuration
Set the following environment variables to configure the API client:
```bash
export RAGFLOW_URL=https://your-ragflow-instance.com
export RAGFLOW_API_KEY=ragflow-xxxxx
```
`RAGFLOW_URL` should be the server root, for example `http://127.0.0.1:9380`; the client adds `/api/v1` for REST endpoints and `/v1` for model discovery.
Optional environment variables:
```bash
export RAGFLOW_WEB_TOKEN=<web-session-token>
```
- `RAGFLOW_WEB_TOKEN` is used only for `/v1/llm/my_llms` model discovery when the deployment requires web-session authentication.
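As a rough sketch of how these variables could be resolved into base URLs: `resolveConfig` and the field names it returns are hypothetical and are not taken from `lib/api.js`. Only the `/api/v1` and `/v1` prefixes and the variable names come from the text above.

```javascript
// Hypothetical helper (assumption): resolves environment variables into client settings.
function resolveConfig(env = process.env) {
  if (!env.RAGFLOW_URL) throw new Error("RAGFLOW_URL is required");
  if (!env.RAGFLOW_API_KEY) throw new Error("RAGFLOW_API_KEY is required");

  // RAGFLOW_URL is the server root; trailing slashes are dropped before appending prefixes.
  const root = env.RAGFLOW_URL.replace(/\/+$/, "");
  return {
    restBase: `${root}/api/v1`, // REST endpoints
    modelBase: `${root}/v1`, // model discovery (/v1/llm/my_llms)
    apiKey: env.RAGFLOW_API_KEY,
    webToken: env.RAGFLOW_WEB_TOKEN || null, // optional
  };
}
```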
FILE:references/COMMANDS.md
# Command Reference
Full CLI reference for `scripts/ragflow.js`, organized by workflow scenario rather than resource type.
Use `--json` on any command to suppress status text and print only machine-readable JSON.
JSON-valued options such as `--parser-config`, `--prompt-config`, and `--dsl` accept either inline JSON or `@path/to/file.json`.
On command failure with `--json`, the CLI exits non-zero and prints a structured error envelope:
```json
{
"error": {
"message": "API Error: Unauthorized",
"raw_message": "Unauthorized",
"code": 401,
"status": 401,
"command": "list-models"
}
}
```
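A caller that chains commands might consume this envelope roughly as follows. `parseCliJson` is a hypothetical helper, and the sketch assumes the CLI's stdout and exit code have already been captured.

```javascript
// Hypothetical helper (assumption): turns captured CLI output into a tagged result.
function parseCliJson(stdout, exitCode) {
  let parsed;
  try {
    parsed = JSON.parse(stdout);
  } catch (err) {
    return { ok: false, error: { message: `Unparseable CLI output: ${err.message}` } };
  }
  // The error envelope is an object with a top-level `error` key.
  const envelopeError = parsed !== null && typeof parsed === "object" ? parsed.error : undefined;
  if (exitCode !== 0 || envelopeError) {
    return { ok: false, error: envelopeError || { message: `CLI exited with code ${exitCode}` } };
  }
  return { ok: true, data: parsed };
}
```

Checking both the exit code and the `error` key keeps the caller safe even if only one of the two signals is present.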
## Table of Contents
- [Scenario Map](#scenario-map)
- [Knowledge Base Setup](#knowledge-base-setup)
- [Document Ingestion](#document-ingestion)
- [Parsing and Chunking](#parsing-and-chunking)
- [Information Retrieval](#information-retrieval)
- [RAG Assistant Operation](#rag-assistant-operation)
- [Agent Operation](#agent-operation)
- [Discovery and Configuration](#discovery-and-configuration)
- [System Operations](#system-operations)
## Scenario Map
| Scenario | Use it for |
|---|---|
| [Knowledge Base Setup](#knowledge-base-setup) | Create and maintain datasets before ingesting files |
| [Document Ingestion](#document-ingestion) | Upload, inspect, update, and remove source documents |
| [Parsing and Chunking](#parsing-and-chunking) | Turn documents into searchable chunks and manage chunk content |
| [Information Retrieval](#information-retrieval) | Query datasets directly without creating a chat assistant |
| [RAG Assistant Operation](#rag-assistant-operation) | Create chat assistants, manage sessions, and run Q&A |
| [Agent Operation](#agent-operation) | Create tool-capable agents, manage sessions, and run agent chat |
| [Discovery and Configuration](#discovery-and-configuration) | Inspect available LLM models before choosing downstream workflows |
| [System Operations](#system-operations) | Read version and log-level settings |
## Knowledge Base Setup
Use this section when the user is creating or maintaining the dataset container that everything else depends on.
```bash
node {baseDir}/scripts/ragflow.js create-dataset --name "Tech Docs" --chunk-method naive
node {baseDir}/scripts/ragflow.js create-dataset --name "Tech Docs" --embedding-model "text-embedding-v4@Tongyi-Qianwen"
node {baseDir}/scripts/ragflow.js list-datasets
node {baseDir}/scripts/ragflow.js get-dataset --id <id>
node {baseDir}/scripts/ragflow.js update-dataset --id <id> --name "New Name"
node {baseDir}/scripts/ragflow.js delete-datasets --ids <id1> <id2>
```
When you provide `--embedding-model` to a real v0.25.0 server, use the tenant model identifier format `<model_name>@<provider>`, for example `text-embedding-v4@Tongyi-Qianwen`. Use `list-models` to discover available model/provider pairs.
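A client-side pre-check for the identifier format could look like the sketch below. `parseEmbeddingModelId` is a hypothetical helper. Splitting on the last `@` is an assumption that provider names never contain that character.

```javascript
// Hypothetical helper (assumption): validates the <model_name>@<provider> shape.
function parseEmbeddingModelId(identifier) {
  const text = String(identifier || "").trim();
  const at = text.lastIndexOf("@");
  // Reject a missing "@", an empty model name, or an empty provider.
  if (at <= 0 || at === text.length - 1) {
    throw new Error(`Embedding model must look like <model_name>@<provider>, got "${text}"`);
  }
  return { modelName: text.slice(0, at), provider: text.slice(at + 1) };
}
```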
Typical flow:
1. `create-dataset`
2. `list-datasets` or `get-dataset`
3. `update-dataset` if metadata or chunk method needs adjustment
4. `delete-datasets` only after explicit confirmation
## Document Ingestion
Use this section when the user needs to get files into a dataset or inspect document-level metadata.
```bash
node {baseDir}/scripts/ragflow.js upload-documents --dataset <id> --files ./doc1.pdf ./doc2.txt
node {baseDir}/scripts/ragflow.js list-documents --dataset <id> --metadata-condition @metadata_condition.json
node {baseDir}/scripts/ragflow.js get-document --dataset <id> --id <doc_id>
node {baseDir}/scripts/ragflow.js update-document --dataset <id> --id <doc_id> --name "New Name"
node {baseDir}/scripts/ragflow.js update-document --dataset <id> --id <doc_id> --parser-config @parser_config.json --meta-fields @meta_fields.json
node {baseDir}/scripts/ragflow.js metadata-summary --dataset <id> --doc-ids <doc_id1> <doc_id2>
node {baseDir}/scripts/ragflow.js delete-documents --dataset <id> --ids <doc_id1>
```
`update-document` follows the v0.25.0 RAGFlow source and sends `PATCH /api/v1/datasets/{dataset_id}/documents/{document_id}`. It accepts `name`, `parser_config`, `chunk_method`, `enabled`, and `meta_fields`.
`list-documents` supports `metadata`, `metadata_condition`, `return_empty_metadata`, `orderby`, `desc`, `suffix`, `types`, and `run`.
Use this when you need to:
- upload raw source files
- inspect a document before parsing
- rename or adjust a document record
- delete a document by explicit ID
## Parsing and Chunking
Use this section after document upload, or when the user wants to control chunk generation directly.
### Parsing workflow
```bash
node {baseDir}/scripts/ragflow.js start-parsing --dataset <id> --doc-ids <doc_id1>
node {baseDir}/scripts/ragflow.js stop-parsing --dataset <id> --doc-ids <doc_id1>
node {baseDir}/scripts/ragflow.js wait-parsing --dataset <id> --doc-ids <doc_id1> --timeout 120
```
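The same upload-then-parse sequence can be sketched against the API client. The method names match those the CLI calls (`uploadDocuments`, `startParsing`, `waitForParsing`). `ingestAndParse` itself is hypothetical, and the sketch assumes `uploadDocuments` resolves to an array of documents that each carry an `id`.

```javascript
// Hypothetical helper (assumption): chains upload, start-parsing, and wait-parsing.
async function ingestAndParse(client, datasetId, files, { maxWait = 120000 } = {}) {
  // Assumption: the upload result is an array of document records with `id`.
  const uploaded = await client.uploadDocuments(datasetId, files);
  const docIds = uploaded.map((doc) => doc.id);

  await client.startParsing(datasetId, docIds);
  const docs = await client.waitForParsing(datasetId, docIds, { maxWait });

  // Split the outcome by the `run` status label.
  return {
    done: docs.filter((d) => d.run === "DONE").map((d) => d.id),
    failed: docs.filter((d) => d.run === "FAIL").map((d) => d.id),
  };
}
```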
Parsing status is observable through `list-documents` by inspecting the `run` field: `UNSTART`, `RUNNING`, `CANCEL`, `DONE`, `FAIL`.
The `run` filter accepts either numeric values (`0`-`4`) or these text labels.
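One way to normalize either form to a label is sketched below. `normalizeRun` is a hypothetical helper, and the mapping assumes the numeric codes `0`-`4` follow the same order as the labels listed above.

```javascript
// Assumption: codes 0-4 map to the labels in the order listed above.
const RUN_LABELS = ["UNSTART", "RUNNING", "CANCEL", "DONE", "FAIL"];

// Hypothetical helper: accepts "3", 3, "done", or "DONE" and returns "DONE".
function normalizeRun(value) {
  const text = String(value).trim().toUpperCase();
  if (/^[0-4]$/.test(text)) return RUN_LABELS[Number(text)];
  if (RUN_LABELS.includes(text)) return text;
  throw new Error(`Unknown run status: ${value}`);
}
```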
### Chunk operations
```bash
node {baseDir}/scripts/ragflow.js list-chunks --dataset <id> --document <doc_id>
node {baseDir}/scripts/ragflow.js list-chunks --dataset <id> --document <doc_id> --id <chunk_id>
node {baseDir}/scripts/ragflow.js add-chunk --dataset <id> --document <doc_id> --content "chunk content"
node {baseDir}/scripts/ragflow.js update-chunk --dataset <id> --document <doc_id> --chunk <chunk_id> --content "updated content"
node {baseDir}/scripts/ragflow.js delete-chunks --dataset <id> --document <doc_id> --chunk-ids <id1>
node {baseDir}/scripts/repro-delete-chunks.js
```
`add-chunk` writes directly to the document store and returns the generated chunk ID immediately. On Elasticsearch/OpenSearch-style stores, an exact `GET` by ID can see a new chunk before search or delete-by-query can, because inserted chunks only become searchable after the store's next refresh. `delete-chunks` handles this by retrying the v0.25.0 transient response `rm_chunk deleted chunks 0, expect N`, but only after an exact ID lookup confirms that the target chunk still exists. Tune this with `RAGFLOW_DELETE_CHUNK_RETRIES` and `RAGFLOW_DELETE_CHUNK_RETRY_DELAY_MS`.
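The retry behaviour can be sketched as a loop with an exact-ID visibility check between attempts. This is an illustration under stated assumptions, not the bundled implementation: `deleteWithVisibilityRetry`, its injected `deleteChunks` and `findExisting` callbacks, and the default retry count and delay are all hypothetical.

```javascript
// Hypothetical sketch: retry the transient rm_chunk error only while the chunk is still visible by ID.
async function deleteWithVisibilityRetry({
  deleteChunks, // async (ids) => server result; throws on API error
  findExisting, // async (ids) => subset of ids confirmed by exact-ID lookup
  chunkIds,
  maxRetries = 3, // assumed default
  delayMs = 500, // assumed default
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
}) {
  const transient = /rm_chunk deleted chunks 0, expect \d+/;
  const retries = [];
  for (let attempt = 0; ; attempt += 1) {
    try {
      const result = await deleteChunks(chunkIds);
      return { result, retries };
    } catch (err) {
      // Non-transient errors and exhausted retries propagate unchanged.
      if (!transient.test(String(err.message)) || attempt >= maxRetries) throw err;
      const existing = await findExisting(chunkIds);
      const missing = chunkIds.filter((id) => !existing.includes(id));
      if (missing.length > 0) {
        // A chunk that is really missing is a hard failure, not a visibility lag.
        err.missing_chunk_ids = missing;
        throw err;
      }
      retries.push({ attempt, existing_chunk_ids: existing, missing_chunk_ids: missing });
      await sleep(delayMs);
    }
  }
}
```

Checking visibility before sleeping avoids waiting on a chunk that will never appear.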
With `--json`, `delete-chunks` returns a structured envelope instead of the bare server result:
```json
{
"result": {},
"requested_chunk_ids": ["<chunk_id1>"],
"existing_chunk_ids": ["<chunk_id1>"],
"missing_chunk_ids": [],
"visibility_checked": true,
"retry_count": 1,
"retries": [
{
"attempt": 0,
"next_attempt": 2,
"max_retries": 3,
"existing_chunk_ids": ["<chunk_id1>"],
"missing_chunk_ids": []
}
]
}
```
If exact-ID checks prove that a target chunk is missing, the command exits non-zero and emits JSON containing `error`, `requested_chunk_ids`, `existing_chunk_ids`, `missing_chunk_ids`, `retry_count`, `retries`, and `delete_chunk_diagnostics`.
If a real server still returns `rm_chunk deleted chunks 0, expect 1` after retries, run `scripts/repro-delete-chunks.js`. The repro creates temporary resources, tries immediate deletion and retry/backoff without the client-side retry wrapper, prints a JSON diagnosis, and removes its dataset.
### Chunk methods
| Method | Use Case |
|--------|----------|
| `naive` | General chunking (default) |
| `manual` | Manual documents |
| `qna` | Q&A pairs |
| `table` | Table data |
| `paper` | Academic papers |
| `book` | Books |
| `laws` | Legal documents |
| `presentation` | Presentations |
| `picture` | Image OCR |
| `one` | Whole document as one chunk |
## Information Retrieval
Use this section when the user wants retrieval results directly instead of creating a chat assistant or agent.
```bash
# Basic retrieval
node {baseDir}/scripts/ragflow.js retrieve --question "What is RAG?" --datasets <id>
# Advanced retrieval with keyword + knowledge graph
node {baseDir}/scripts/ragflow.js retrieve \
--question "machine learning algorithms" \
--datasets <id1> <id2> \
--similarity 0.3 \
--top-n 10 \
--rerank <rerank_model_id> \
--keyword \
--kg
```
### Retrieval parameters
| Parameter | Short | Default | Description |
|-----------|-------|---------|-------------|
| `--question` | `-q` | - | Search question (required) |
| `--datasets` | `-d` | - | Dataset IDs |
| `--similarity` | `-s` | 0.2 | Similarity threshold (0-1) |
| `--top-n` | `-n` | 5 | Number of retrieved chunks; sent as RAGFlow `page_size` |
| `--top-k` | `-k` | 1024 | Number of candidates |
| `--vector-weight` | `-w` | 0.3 | Vector similarity weight (0-1) |
| `--rerank` | `-r` | - | Rerank model ID |
| `--keyword` | | false | Enable keyword search |
| `--kg` | | false | Enable knowledge graph; sent as RAGFlow `use_kg` |
| `--cross-langs` | | - | Cross-language targets |
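The table's defaults can be read as a request-body mapping, sketched below. `buildRetrievalBody` is a hypothetical helper. The `page_size` and `use_kg` names come from the table. The other field names (`dataset_ids`, `similarity_threshold`, `top_k`, `vector_similarity_weight`, `rerank_id`, `keyword`) are assumptions about the retrieval payload.

```javascript
// Hypothetical helper (assumption): maps parsed CLI options to a retrieval request body.
function buildRetrievalBody(opts) {
  if (!opts.question) throw new Error("Missing required option: --question");
  const body = {
    question: opts.question,
    similarity_threshold: Number(opts.similarity ?? 0.2),
    page_size: Number(opts.topN ?? 5), // --top-n is sent as page_size
    top_k: Number(opts.topK ?? 1024),
    vector_similarity_weight: Number(opts.vectorWeight ?? 0.3),
    keyword: Boolean(opts.keyword),
    use_kg: Boolean(opts.kg), // --kg is sent as use_kg
  };
  // Optional fields are only included when provided.
  if (opts.datasets) body.dataset_ids = [].concat(opts.datasets);
  if (opts.rerank) body.rerank_id = opts.rerank;
  return body;
}
```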
## RAG Assistant Operation
Use this section when the user wants a dataset-backed chat assistant with reusable sessions.
### Assistant lifecycle
```bash
node {baseDir}/scripts/ragflow.js list-chats
node {baseDir}/scripts/ragflow.js create-chat --name "Tech Q&A" --datasets <id1> <id2> --llm-id <model_id>
node {baseDir}/scripts/ragflow.js get-chat --id <chat_id>
node {baseDir}/scripts/ragflow.js update-chat --id <chat_id> --name "New Name"
node {baseDir}/scripts/ragflow.js update-chat --id <chat_id> --prompt-config @prompt_config.json
node {baseDir}/scripts/ragflow.js patch-chat --id <chat_id> --prompt "Use the dataset"
node {baseDir}/scripts/ragflow.js delete-chats --ids <id1> <id2>
```
### Session management
```bash
node {baseDir}/scripts/ragflow.js list-sessions --chat <chat_id>
node {baseDir}/scripts/ragflow.js create-session --chat <chat_id> --name "New Session"
node {baseDir}/scripts/ragflow.js delete-sessions --chat <chat_id> --ids <session_id1>
```
### Ask the assistant
```bash
node {baseDir}/scripts/ragflow.js chat --chat <chat_id> --session <session_id> --question "Hello"
node {baseDir}/scripts/ragflow.js chat-session --chat <chat_id> --session <session_id> --messages @session_messages.json
node {baseDir}/scripts/ragflow.js chat-session --chat <chat_id> --session <session_id> --question "Hello"
```
`chat-session` uses the API-key SDK route `POST /api/v1/chats/{chat_id}/completions` with `session_id` in the body. The v0.25.0 login-session route `POST /api/v1/chats/{chat_id}/sessions/{session_id}/completions` is not used by this CLI. When `--messages` is provided, the CLI extracts the last `role: "user"` message as `question`; use `--question` when you already have a single user prompt.
Use this path when the user wants multi-turn Q&A over documents without building a full agent workflow.
## Agent Operation
Use this section when the user wants a more autonomous workflow built around an agent DSL and agent sessions.
### Agent lifecycle
```bash
node {baseDir}/scripts/ragflow.js list-agents
node {baseDir}/scripts/ragflow.js create-agent --title "Assistant" --dsl '<dsl_json>'
node {baseDir}/scripts/ragflow.js create-agent --title "Assistant" --dsl @agent_dsl.json
node {baseDir}/scripts/ragflow.js get-agent --id <agent_id>
node {baseDir}/scripts/ragflow.js update-agent --id <agent_id> --title "New Name"
node {baseDir}/scripts/ragflow.js delete-agents --ids <id1> <id2>
```
Agents require a DSL workflow definition. A minimal DSL:
```json
{
"components": {
"begin": {
"obj": {
"component_name": "Begin",
"params": {
"mode": "conversational",
"prologue": "Hello"
}
},
"downstream": [],
"upstream": []
}
},
"history": [],
"path": [],
"retrieval": [],
"globals": {
"sys.query": "",
"sys.user_id": "",
"sys.conversation_turns": 0,
"sys.files": [],
"sys.history": []
},
"graph": {
"edges": [],
"nodes": [
{
"id": "begin",
"data": { "name": "Begin" },
"position": { "x": 0, "y": 0 },
"type": "begin"
}
]
}
}
```
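Before calling `create-agent`, a DSL object can be checked for the top-level fields shown in the minimal example. `findMissingDslFields` is a hypothetical helper. It only checks that the fields are present and does not validate the workflow.

```javascript
// Hypothetical helper (assumption): reports which fields of the minimal DSL shape are absent.
function findMissingDslFields(dsl) {
  const missing = [];
  for (const key of ["components", "history", "path", "retrieval", "globals", "graph"]) {
    if (dsl == null || dsl[key] === undefined) missing.push(key);
  }
  // At least one graph node should carry a data.name entry.
  const nodes = dsl && dsl.graph && Array.isArray(dsl.graph.nodes) ? dsl.graph.nodes : [];
  const hasNamedNode = nodes.some(
    (node) => Boolean(node && node.data && typeof node.data.name === "string" && node.data.name)
  );
  if (!hasNamedNode) missing.push("graph.nodes[].data.name");
  return missing;
}
```

An empty result means the shape matches the minimal example. The server may still reject the DSL for other reasons.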
### Agent session management
```bash
node {baseDir}/scripts/ragflow.js list-agent-sessions --agent <agent_id>
node {baseDir}/scripts/ragflow.js create-agent-session --agent <agent_id>
node {baseDir}/scripts/ragflow.js delete-agent-sessions --agent <agent_id> --ids <session_id1>
```
### Ask the agent
```bash
node {baseDir}/scripts/ragflow.js agent-chat --agent <agent_id> --session <session_id> --question "Hello"
```
Use this path when the user explicitly wants an agent workflow instead of a simple retrieval assistant.
## Discovery and Configuration
Use this section when the user needs to inspect available models before creating datasets, assistants, or agents.
```bash
node {baseDir}/scripts/ragflow.js list-models
node {baseDir}/scripts/ragflow.js list-models --include-details
node {baseDir}/scripts/ragflow.js list-models --group-by factory
node {baseDir}/scripts/ragflow.js list-models --all
```
This is usually the first stop when the user is troubleshooting model availability or deciding which model to use downstream.
RAGFlow v0.25.0 exposes model discovery at `/v1/llm/my_llms`. If the endpoint requires web-session authentication, provide `RAGFLOW_WEB_TOKEN`.
## System Operations
Use this section when the user needs version or log-level configuration.
```bash
node {baseDir}/scripts/ragflow.js system-version
node {baseDir}/scripts/ragflow.js get-log-levels
node {baseDir}/scripts/ragflow.js set-log-level --pkg-name ragflow --level INFO
```
FILE:references/REFERENCE.md
# Output Format Reference
Style guide for consistent RAGFlow skill responses.
Apply this reference to all user-facing output for this skill.
## Format Decision Matrix
| Information Type | Format | Use Case |
|------------------|--------|----------|
| Multiple items (3+) with attributes | Table | Datasets list, search results, model list |
| Sequential steps | Numbered List | Upload workflow, procedures |
| Features/options | Bullet List | Capability overview, model features |
| Structured data | JSON Code Block | API responses, DSL definitions |
| Document content | Quote Block | Retrieved chunks |
| Single object properties | Definition List | Dataset details, model details |
| Status | Status marker + text | Parsing tables use `UNSTART` / `RUNNING` / `CANCEL` / `DONE` / `FAIL`; generic success can use `OK` |
## Common Formats
### Tables (3+ items)
```markdown
| Dataset | Docs | Chunks | Status |
|---------|------|--------|--------|
| delete | 4 | 53 | OK |
```
- Abbreviate long IDs: `abc123...`
- Use consistent status labels: `UNSTART`, `RUNNING`, `CANCEL`, `DONE`, `FAIL` for parsing; `OK` for generic success; `WARN` for warnings
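These table conventions can be sketched as two small formatting helpers. `abbreviateId` and `renderTable` are hypothetical names, and the six-character prefix length is an assumption rather than a rule of this guide.

```javascript
// Hypothetical helper: shortens long IDs to a prefix followed by "...".
function abbreviateId(id, keep = 6) {
  const text = String(id);
  return text.length > keep ? `${text.slice(0, keep)}...` : text;
}

// Hypothetical helper: renders a Markdown table from headers and row arrays.
function renderTable(headers, rows) {
  const line = (cells) => `| ${cells.join(" | ")} |`;
  return [line(headers), line(headers.map(() => "---")), ...rows.map(line)].join("\n");
}
```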
### Bullet Lists
```markdown
- Upload documents to dataset
- Start parsing to generate chunks
```
- Start with verbs for actions
- Keep nesting shallow
### Numbered Lists
```markdown
1. Create dataset
2. Upload files
3. Start parsing
```
- Use for sequential procedures
### Status Labels
| Label | Meaning |
|-------|---------|
| `UNSTART` | Not started |
| `RUNNING` | In progress |
| `CANCEL` | Cancelled |
| `DONE` | Finished successfully |
| `FAIL` | Failed / Unavailable |
| `OK` | Success / Available |
| `WARN` | Warning |
| `EMPTY` | Empty |
## Response Templates
**List operations:**
```markdown
**Datasets** (3 total)
| Name | ID | Status | Chunks |
|------|-----|--------|--------|
| docs | abc... | OK | 152 |
```
**Search results:**
```markdown
**Results** (2 found)
| # | Source | Similarity | Content |
|---|--------|------------|---------|
| 1 | doc.pdf | 85% | excerpt... |
```
**Object details:**
```markdown
**Dataset Details**
**ID:** `1ce917df20e411f191a984ba59bc54d9`
**Name:** delete
**Chunks:** 53
```
**Model list:**
```markdown
**Available Models** (12 in 3 groups)
| Group | Model | Type | Factory |
|-------|-------|------|---------|
| chat | gpt-4 | chat | OpenAI |
| chat | gpt-3.5 | chat | OpenAI |
| embedding | text-embedding-3 | embedding | OpenAI |
```
**Parsing status:**
```markdown
**Parsing Status**
| Document | Status | Chunks |
|----------|--------|--------|
| report.pdf | DONE | 45 |
| notes.md | RUNNING | - |
| data.xlsx | FAIL | 0 |
```
**Chat/Agent conversation:**
```markdown
**Response**
> The answer to your question...
**Sources:** 3 chunks from 2 documents
- doc1.pdf (similarity: 0.85)
- doc2.md (similarity: 0.72)
```
**Chunk operations:**
```markdown
**Chunks** (15 total)
| ID | Content Preview | Keywords |
|----|-----------------|----------|
| abc... | First 50 chars... | term1, term2 |
```
**Session operations:**
```markdown
**Sessions** (3 total)
| ID | Name | Messages | Created |
|----|------|----------|---------|
| ses_abc... | Q&A Session | 12 | 2024-01-15 |
```
**Single resource details:**
```markdown
**Document Details**
**ID:** `doc_abc123...`
**Name:** report.pdf
**Status:** DONE
**Chunks:** 45
**Created:** 2024-01-15
```
## Error Response Format
**API errors:**
```markdown
**Error**
**Code:** `123`
**Message:** Dataset not found
```
**Connection errors:**
```markdown
**Connection Failed**
Cannot connect to RAGFlow server at `http://xxx/xx`
Check the RAGFLOW_URL environment variable and server availability
```
**Validation errors:**
```markdown
**Validation Error**
Missing required parameter: `--dataset`
Run with `--help` for usage information
```
FILE:references/TROUBLESHOOTING.md
# Troubleshooting
| Problem | Cause | Solution |
|---------|-------|----------|
| "Model not authorized" | Requested model is not configured for this tenant, or the model/factory name does not match | Verify the model name, factory suffix, and tenant model settings; use a configured model from `list-models` |
| "Embedding model identifier must follow `<model_name>@<provider>` format" | `create-dataset --embedding-model` used only the model name | Use a full identifier from `list-models`, for example `text-embedding-v4@Tongyi-Qianwen` |
| "Malformed JSON syntax" | The request body is not valid JSON | Fix the JSON payload or file contents before retrying |
| "Can't stop parsing" | The document is already done or has not started yet | Only running documents can be stopped |
| "No DSL data in request" | Agent creation omitted the DSL payload | Pass `--dsl` with a valid JSON object |
| "Invalid DSL JSON string." | The DSL payload is not valid JSON | Pass a JSON object or `@file.json` that can be normalized by the agent parser |
| `KeyError('path')` from `create-agent-session` | Agent DSL is missing runtime fields required by RAGFlow Canvas | Include top-level `history`, `path`, `retrieval`, `globals`, and a `graph.nodes[].data.name` entry; see `COMMANDS.md` |
| "Dataset doesn't own parsed file" | The dataset has no parsed documents yet | Upload files and start parsing before creating a chat assistant |
| "Chunk not found" | Chunk ID does not exist or belongs to another document | Verify the chunk ID with `list-chunks` before `update-chunk` or `delete-chunks` |
| `rm_chunk deleted chunks 0, expect 1` | The RAGFlow server accepted the chunk ID but document-store search/delete visibility lagged behind exact ID visibility | `delete-chunks` retries only after exact ID lookup confirms the chunk exists; with `--json`, consume `existing_chunk_ids` and `missing_chunk_ids`; tune with `RAGFLOW_DELETE_CHUNK_RETRIES` and `RAGFLOW_DELETE_CHUNK_RETRY_DELAY_MS`, or run `node scripts/repro-delete-chunks.js` for a clean diagnosis |
| "`content` is required" | Empty content was submitted to chunk update or set | Provide non-empty content; omitting `--content` on the CLI keeps the existing chunk text |
| `chat-session` returns Not Found for `/sessions/<session_id>/completions` | That route is the login-session frontend route, not the API-key SDK route | Use the current CLI or client, which posts to `/api/v1/chats/<chat_id>/completions` with `session_id` in the body |
| `list-models` returns Unauthorized | The `/v1/llm/my_llms` endpoint needs a web-session token in some deployments | Set `RAGFLOW_WEB_TOKEN` |
| `update-document` gets Method Not Allowed | The server does not match the v0.25.0 source expected by this skill | Use a v0.25.0-compatible server; document updates are sent with `PATCH` |
| Connection refused | `RAGFLOW_URL` is wrong or the server is down | Verify the URL and that the RAGFlow server is running |
In `--json` mode, command failures are emitted on stdout as `{ "error": { "message", "raw_message", "code", "status", "command" } }` and exit non-zero. `delete-chunks` may also include `existing_chunk_ids`, `missing_chunk_ids`, `retry_count`, `retries`, and `delete_chunk_diagnostics`.
FILE:scripts/ragflow.js
#!/usr/bin/env node
const fs = require("fs");
const path = require("path");
const { createClient } = require("../lib/api.js");
const args = process.argv.slice(2);
const command = args[0];
const outputMode = { jsonOnly: false };
// ── Output helpers ──
const C = {
reset: "\x1b[0m",
bold: "\x1b[1m",
dim: "\x1b[2m",
red: "\x1b[31m",
green: "\x1b[32m",
yellow: "\x1b[33m",
blue: "\x1b[34m",
cyan: "\x1b[36m",
};
function ok(msg) {
if (outputMode.jsonOnly) return;
console.log(`${C.green}OK${C.reset} ${msg}`);
}
function warn(msg) {
if (outputMode.jsonOnly) return;
console.log(`${C.yellow}WARN${C.reset} ${msg}`);
}
function fail(msg) {
console.error(`${C.red}ERROR${C.reset} ${msg}`);
}
function info(msg) {
if (outputMode.jsonOnly) return;
console.log(`${C.cyan}INFO${C.reset} ${msg}`);
}
function json(data) {
console.log(JSON.stringify(data, null, 2));
}
function cliErrorMessage(err) {
const message = err.message || String(err);
if (err.code === "ECONNREFUSED") {
return "Cannot connect to RAGFlow server. Check RAGFLOW_URL in .env";
}
if (err.code === "ECONNRESET") {
return "Connection reset by RAGFlow server. The server may be restarting";
}
if (message.includes("timed out")) {
return "Request timed out. The server may be slow or unreachable";
}
if (message.includes("RAGFLOW_URL is required") || message.includes("RAGFLOW_API_KEY is required")) {
return `${message}. Configure .env file with RAGFLOW_URL and RAGFLOW_API_KEY`;
}
if (err.code) {
return `API Error: ${message}`;
}
return message;
}
function uniqueList(value) {
return [...new Set(listValue(value))];
}
function cloneDeleteChunkDetails(details) {
return {
attempt: details.attempt,
next_attempt: details.attempt + 2,
max_retries: details.max_retries,
existing_chunk_ids: details.existing_chunk_ids || [],
missing_chunk_ids: details.missing_chunk_ids || [],
};
}
function deleteChunkJsonPayload(result, requestedChunkIds, retryDetails) {
const latest = retryDetails[retryDetails.length - 1] || {};
return {
result,
requested_chunk_ids: uniqueList(requestedChunkIds),
existing_chunk_ids: latest.existing_chunk_ids || [],
missing_chunk_ids: latest.missing_chunk_ids || [],
visibility_checked: retryDetails.length > 0,
retry_count: retryDetails.length,
retries: retryDetails,
};
}
function commandErrorJsonPayload(err) {
const details = err.delete_chunk_details || {};
const retries = err.delete_chunk_retries || [];
const payload = {
error: {
message: cliErrorMessage(err),
raw_message: err.message,
code: err.code,
status: err.status,
command,
},
};
if (err.delete_chunk_details) {
payload.requested_chunk_ids = err.delete_chunk_requested_chunk_ids || [];
payload.existing_chunk_ids = details.existing_chunk_ids || [];
payload.missing_chunk_ids = details.missing_chunk_ids || [];
payload.visibility_checked = Array.isArray(details.existing_chunk_ids) || Array.isArray(details.missing_chunk_ids);
payload.retry_count = retries.length;
payload.retries = retries;
payload.delete_chunk_diagnostics = details;
}
return payload;
}
function requireOpt(opts, name) {
if (!opts[name]) {
throw new Error(`Missing required option: --${name.replace(/([A-Z])/g, "-$1").toLowerCase()}`);
}
return opts[name];
}
// ── Arg parser ──
function parseArgs(argv) {
const opts = { _: [] };
let i = 0;
const aliases = {
d: "datasets",
h: "help",
k: "topK",
n: "topN",
q: "question",
r: "rerank",
s: "similarity",
w: "vectorWeight",
};
const multiKeys = new Set(["files", "ids", "docIds", "chunkIds", "datasets", "suffix", "types", "run"]);
while (i < argv.length) {
if (argv[i].startsWith("-") && argv[i] !== "-") {
const isLong = argv[i].startsWith("--");
const rawKey = isLong ? argv[i].replace(/^--/, "") : argv[i].replace(/^-/, "");
const key = aliases[rawKey] || rawKey.replace(/-([a-z])/g, (_, c) => c.toUpperCase());
if (multiKeys.has(key)) {
const values = [];
let j = i + 1;
while (j < argv.length && !(argv[j].startsWith("-") && argv[j] !== "-")) {
values.push(argv[j]);
j++;
}
opts[key] = values;
i = j;
} else if (i + 1 < argv.length && !argv[i + 1].startsWith("--")) {
opts[key] = argv[i + 1];
i += 2;
} else {
opts[key] = true;
i += 1;
}
} else {
opts._.push(argv[i]);
i += 1;
}
}
return opts;
}
function listValue(value) {
const values = Array.isArray(value) ? value : String(value).split(",");
return values
.flatMap((item) => String(item).split(","))
.map((item) => item.trim())
.filter(Boolean);
}
function jsonOption(value, optionName) {
const source = String(value);
const raw = source.startsWith("@")
? fs.readFileSync(path.resolve(process.cwd(), source.slice(1)), "utf-8")
: source;
try {
return JSON.parse(raw);
} catch (err) {
throw new Error(`Invalid JSON for ${optionName}: ${err.message}`);
}
}
function jsonStringOption(value, optionName) {
return JSON.stringify(jsonOption(value, optionName));
}
function questionFromMessages(messages) {
if (!Array.isArray(messages)) {
throw new Error("--messages must be a JSON array");
}
const userMessages = messages.filter((message) => message && message.role === "user" && message.content);
const lastUserMessage = userMessages[userMessages.length - 1];
return lastUserMessage ? String(lastUserMessage.content) : "";
}
function applyChatOptions(data, opts) {
if (opts.datasets) data.dataset_ids = listValue(opts.datasets);
if (opts.llm || opts.llmId) data.llm_id = opts.llmId || opts.llm;
if (opts.promptConfig) data.prompt_config = jsonOption(opts.promptConfig, "--prompt-config");
if (opts.prompt) data.prompt_config = { ...(data.prompt_config || {}), system: opts.prompt };
if (opts.similarityThreshold) data.similarity_threshold = Number(opts.similarityThreshold);
if (opts.topN) data.top_n = Number(opts.topN);
if (opts.topK) data.top_k = Number(opts.topK);
if (opts.vectorWeight) data.vector_similarity_weight = Number(opts.vectorWeight);
if (opts.rerank) data.rerank_id = opts.rerank;
}
function buildParams(opts, map) {
const params = {};
for (const [optKey, paramKey, transform] of map) {
if (opts[optKey] !== undefined) {
params[paramKey] = transform ? transform(opts[optKey]) : opts[optKey];
}
}
return params;
}
// ── Dataset ──
async function createDataset(opts) {
const client = createClient();
const name = requireOpt(opts, "name");
const data = { name };
if (opts.chunkMethod) data.chunk_method = opts.chunkMethod;
if (opts.embeddingModel) data.embedding_model = opts.embeddingModel;
if (opts.permission) data.permission = opts.permission;
if (opts.description) data.description = opts.description;
info(`Creating dataset "${name}"...`);
const result = await client.createDataset(data);
ok(`Dataset created: ${result.id}`);
json(result);
}
async function listDatasets(opts) {
const client = createClient();
const params = buildParams(opts, [
["page", "page", Number],
["pageSize", "page_size", Number],
["name", "name"],
["id", "id"],
]);
const result = await client.listDatasets(params);
if (Array.isArray(result) && result.length === 0) {
warn("No datasets found");
if (!outputMode.jsonOnly) return;
} else {
ok(Array.isArray(result) ? `Found ${result.length} dataset(s)` : "Found datasets");
}
json(result);
}
async function getDataset(opts) {
const client = createClient();
const id = requireOpt(opts, "id");
info(`Fetching dataset ${id}...`);
const result = await client.getDataset(id);
ok(`Dataset: ${result.name}`);
json(result);
}
async function updateDataset(opts) {
const client = createClient();
const id = requireOpt(opts, "id");
const data = {};
if (opts.name) data.name = opts.name;
if (opts.chunkMethod) data.chunk_method = opts.chunkMethod;
if (opts.permission) data.permission = opts.permission;
if (opts.description) data.description = opts.description;
info(`Updating dataset ${id}...`);
const result = await client.updateDataset(id, data);
ok("Dataset updated");
json(result);
}
async function deleteDatasets(opts) {
const client = createClient();
const ids = requireOpt(opts, "ids");
info(`Deleting ${ids.length} dataset(s)...`);
const result = await client.deleteDatasets(ids);
ok("Datasets deleted");
json(result);
}
// ── Document ──
async function uploadDocuments(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const files = requireOpt(opts, "files");
info(`Uploading ${files.length} file(s) to dataset ${dataset}...`);
const result = await client.uploadDocuments(dataset, files);
ok(`Uploaded ${files.length} file(s)`);
json(result);
}
async function listDocuments(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const params = buildParams(opts, [
["page", "page", Number],
["pageSize", "page_size", Number],
["orderby", "orderby"],
["desc", "desc"],
["keywords", "keywords"],
["id", "id"],
["name", "name"],
["suffix", "suffix", listValue],
["types", "types", listValue],
["run", "run", listValue],
["createTimeFrom", "create_time_from", Number],
["createTimeTo", "create_time_to", Number],
]);
if (opts.metadataCondition !== undefined) params.metadata_condition = jsonStringOption(opts.metadataCondition, "--metadata-condition");
if (opts.metadata !== undefined) params.metadata = jsonStringOption(opts.metadata, "--metadata");
if (opts.returnEmptyMetadata !== undefined) params.return_empty_metadata = opts.returnEmptyMetadata !== "false" && opts.returnEmptyMetadata !== false;
const result = await client.listDocuments(dataset, params);
if (Array.isArray(result) && result.length === 0) {
warn("No documents found");
if (!outputMode.jsonOnly) return;
} else {
ok(Array.isArray(result) ? `Found ${result.length} document(s)` : "Found documents");
}
json(result);
}
async function deleteDocuments(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const ids = requireOpt(opts, "ids");
info(`Deleting ${ids.length} document(s)...`);
const result = await client.deleteDocuments(dataset, ids);
ok("Documents deleted");
json(result);
}
async function getDocument(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const id = requireOpt(opts, "id");
info(`Fetching document ${id}...`);
const result = await client.getDocument(dataset, id);
ok(`Document: ${result.name}`);
json(result);
}
async function updateDocument(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const id = requireOpt(opts, "id");
const data = {};
if (opts.name) data.name = opts.name;
if (opts.parserConfig) data.parser_config = jsonOption(opts.parserConfig, "--parser-config");
if (opts.chunkMethod) data.chunk_method = opts.chunkMethod;
if (opts.enabled !== undefined) data.enabled = Number(opts.enabled);
if (opts.metaFields) data.meta_fields = jsonOption(opts.metaFields, "--meta-fields");
info(`Updating document ${id} in dataset ${dataset}...`);
const result = await client.updateDocument(dataset, id, data);
ok("Document updated");
json(result);
}
// ── Parsing ──
async function startParsing(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const docIds = requireOpt(opts, "docIds");
info(`Starting parsing for ${docIds.length} document(s)...`);
const result = await client.startParsing(dataset, docIds);
ok("Parsing started");
json(result);
}
async function stopParsing(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const docIds = requireOpt(opts, "docIds");
info(`Stopping parsing for ${docIds.length} document(s)...`);
const result = await client.stopParsing(dataset, docIds);
ok("Parsing stopped");
json(result);
}
async function waitParsing(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const docIds = requireOpt(opts, "docIds");
const maxWait = opts.timeout ? Number(opts.timeout) * 1000 : 120000;
info(`Waiting for ${docIds.length} document(s) to finish parsing (timeout: ${maxWait / 1000}s)...`);
const result = await client.waitForParsing(dataset, docIds, { maxWait });
const failed = result.filter((d) => d.run === "FAIL");
if (failed.length > 0) {
warn(`${failed.length} document(s) failed parsing`);
} else {
ok("All documents parsed successfully");
}
json(result.map((d) => ({ id: d.id, name: d.name, run: d.run, chunk_count: d.chunk_count })));
}
// ── Chunk ──
async function listChunks(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const document = requireOpt(opts, "document");
const params = buildParams(opts, [
["page", "page", Number],
["pageSize", "page_size", Number],
["keywords", "keywords"],
["id", "id"],
]);
const result = await client.listChunks(dataset, document, params);
if (Array.isArray(result) && result.length === 0) {
warn("No chunks found");
if (!outputMode.jsonOnly) return;
} else {
ok(`Found "chunks"`);
}
json(result);
}
async function addChunk(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const document = requireOpt(opts, "document");
const content = requireOpt(opts, "content");
const data = { content };
if (opts.keywords) data.important_keywords = listValue(opts.keywords);
info("Adding chunk...");
const result = await client.addChunk(dataset, document, data);
ok("Chunk added");
json(result);
}
async function deleteChunks(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const document = requireOpt(opts, "document");
const chunkIds = requireOpt(opts, "chunkIds");
info(`Deleting ${chunkIds.length} chunk(s)...`);
const retries = [];
let result;
try {
result = await client.deleteChunks(dataset, document, chunkIds, {
onRetry(details) {
const retry = cloneDeleteChunkDetails(details);
retries.push(retry);
warn(
`delete-chunks returned 0 deletions, but exact ID lookup still found ${retry.existing_chunk_ids.length} chunk(s): ${retry.existing_chunk_ids.join(", ")}. Retrying (${retry.next_attempt}/${retry.max_retries + 1})...`
);
},
});
} catch (err) {
if (err.delete_chunk_details) {
err.delete_chunk_requested_chunk_ids = uniqueList(chunkIds);
err.delete_chunk_retries = retries;
}
throw err;
}
ok("Chunks deleted");
json(outputMode.jsonOnly ? deleteChunkJsonPayload(result, chunkIds, retries) : result);
}
async function updateChunk(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const document = requireOpt(opts, "document");
const chunk = requireOpt(opts, "chunk");
const data = {};
if (opts.content) data.content = opts.content;
if (opts.keywords) data.important_keywords = listValue(opts.keywords);
info(`Updating chunk ${chunk}...`);
const result = await client.updateChunk(dataset, document, chunk, data);
ok("Chunk updated");
json(result);
}
// ── Retrieval ──
async function retrieve(opts) {
const client = createClient();
const question = opts.question;
if (!question) {
throw new Error("Missing required option: --question");
}
const params = { question };
if (opts.datasets) {
params.dataset_ids = listValue(opts.datasets);
}
if (opts.similarity) params.similarity_threshold = Number(opts.similarity);
if (opts.topN) params.page_size = Number(opts.topN);
if (opts.topK) params.top_k = Number(opts.topK);
if (opts.vectorWeight) params.vector_similarity_weight = Number(opts.vectorWeight);
if (opts.rerank) params.rerank_id = opts.rerank;
if (opts.keyword) params.keyword = true;
if (opts.kg) params.use_kg = true;
if (opts.crossLangs) params.cross_languages = opts.crossLangs.split(",");
info(`Searching: "${question}"`);
const result = await client.retrieve(params);
const count = Array.isArray(result) ? result.length : 0;
ok(`Found ${count} result(s)`);
json(result);
}
// ── Chat Assistant ──
async function listChatAssistants(opts) {
const client = createClient();
const params = buildParams(opts, [
["page", "page", Number],
["pageSize", "page_size", Number],
["name", "name"],
["id", "id"],
]);
const result = await client.listChatAssistants(params);
if (Array.isArray(result) && result.length === 0) {
warn("No chat assistants found");
if (!outputMode.jsonOnly) return;
} else {
ok(`Found "assistants"`);
}
json(result);
}
async function createChatAssistant(opts) {
const client = createClient();
const name = requireOpt(opts, "name");
const data = { name };
applyChatOptions(data, opts);
info(`Creating chat assistant "${name}"...`);
const result = await client.createChatAssistant(data);
ok(`Chat assistant created: ${result.id}`);
json(result);
}
async function updateChatAssistant(opts) {
const client = createClient();
const id = requireOpt(opts, "id");
const data = {};
if (opts.name) data.name = opts.name;
applyChatOptions(data, opts);
info(`Updating chat assistant ${id}...`);
const result = await client.updateChatAssistant(id, data);
ok("Chat assistant updated");
json(result);
}
async function patchChatAssistant(opts) {
const client = createClient();
const id = requireOpt(opts, "id");
const data = {};
if (opts.name) data.name = opts.name;
applyChatOptions(data, opts);
info(`Patching chat assistant ${id}...`);
const result = await client.patchChatAssistant(id, data);
ok("Chat assistant patched");
json(result);
}
async function deleteChatAssistants(opts) {
const client = createClient();
const ids = requireOpt(opts, "ids");
info(`Deleting ${ids.length} chat assistant(s)...`);
const result = await client.deleteChatAssistants(ids);
ok("Chat assistants deleted");
json(result);
}
async function getChatAssistant(opts) {
const client = createClient();
const id = requireOpt(opts, "id");
info(`Fetching chat assistant ${id}...`);
const result = await client.getChatAssistant(id);
ok(`Chat assistant: ${result.name}`);
json(result);
}
// ── Session ──
async function listSessions(opts) {
const client = createClient();
const chat = requireOpt(opts, "chat");
const params = buildParams(opts, [
["page", "page", Number],
["pageSize", "page_size", Number],
]);
const result = await client.listSessions(chat, params);
ok(`Found "sessions"`);
json(result);
}
async function createSession(opts) {
const client = createClient();
const chat = requireOpt(opts, "chat");
const data = {};
if (opts.name) data.name = opts.name;
info("Creating session...");
const result = await client.createSession(chat, data);
ok(`Session created: ${result.id}`);
json(result);
}
async function deleteSessions(opts) {
const client = createClient();
const chat = requireOpt(opts, "chat");
const ids = requireOpt(opts, "ids");
info(`Deleting ${ids.length} session(s)...`);
const result = await client.deleteSessions(chat, ids);
ok("Sessions deleted");
json(result);
}
// ── Chat ──
async function chat(opts) {
const client = createClient();
const chatId = requireOpt(opts, "chat");
const session = requireOpt(opts, "session");
const question = opts.question;
if (!question) {
throw new Error("Missing required option: --question");
}
const params = {};
if (opts.stream) params.stream = true;
if (opts.topN) params.top_n = Number(opts.topN);
info(`Asking: "${question}"`);
const result = await client.chat(chatId, session, question, params);
ok("Response received");
json(result);
}
async function chatSession(opts) {
const client = createClient();
const chatId = requireOpt(opts, "chat");
const session = requireOpt(opts, "session");
const data = {};
if (opts.messages) {
const messages = jsonOption(opts.messages, "--messages");
data.question = questionFromMessages(messages);
}
if (opts.question) data.question = opts.question;
if (!data.question) {
throw new Error("Missing required option: --question or --messages with a user message");
}
if (opts.llmId || opts.llm) data.llm_id = opts.llmId || opts.llm;
if (opts.temperature !== undefined) data.temperature = Number(opts.temperature);
if (opts.topP !== undefined) data.top_p = Number(opts.topP);
if (opts.frequencyPenalty !== undefined) data.frequency_penalty = Number(opts.frequencyPenalty);
if (opts.presencePenalty !== undefined) data.presence_penalty = Number(opts.presencePenalty);
if (opts.maxTokens !== undefined) data.max_tokens = Number(opts.maxTokens);
if (opts.stream !== undefined) data.stream = opts.stream !== "false" && opts.stream !== false;
info(`Asking session: ${session}...`);
const result = await client.chatSession(chatId, session, data);
ok("Session response received");
json(result);
}
// ── Agent ──
async function listAgents(opts) {
const client = createClient();
const params = buildParams(opts, [
["page", "page", Number],
["pageSize", "page_size", Number],
["name", "title"],
["title", "title"],
["id", "id"],
]);
const result = await client.listAgents(params);
if (Array.isArray(result) && result.length === 0) {
warn("No agents found");
if (!outputMode.jsonOnly) return;
} else {
ok(`Found "agents"`);
}
json(result);
}
async function createAgent(opts) {
const client = createClient();
const title = requireOpt(opts, "title");
const dsl = requireOpt(opts, "dsl");
const data = { title, dsl: jsonOption(dsl, "--dsl") };
if (opts.description) data.description = opts.description;
info(`Creating agent "${title}"...`);
const result = await client.createAgent(data);
ok(`Agent created`);
json(result);
}
async function updateAgent(opts) {
const client = createClient();
const id = requireOpt(opts, "id");
const data = {};
if (opts.title) data.title = opts.title;
if (opts.dsl) data.dsl = jsonOption(opts.dsl, "--dsl");
if (opts.description) data.description = opts.description;
info(`Updating agent ${id}...`);
const result = await client.updateAgent(id, data);
ok("Agent updated");
json(result);
}
async function deleteAgents(opts) {
const client = createClient();
const ids = requireOpt(opts, "ids");
info(`Deleting ${ids.length} agent(s)...`);
const result = await client.deleteAgents(ids);
ok("Agents deleted");
json(result);
}
async function getAgent(opts) {
const client = createClient();
const id = requireOpt(opts, "id");
info(`Fetching agent ${id}...`);
const result = await client.getAgent(id);
ok(`Agent: ${result.title || result.id}`);
json(result);
}
// ── Agent Session ──
async function listAgentSessions(opts) {
const client = createClient();
const agent = requireOpt(opts, "agent");
const params = buildParams(opts, [
["page", "page", Number],
["pageSize", "page_size", Number],
]);
const result = await client.listAgentSessions(agent, params);
ok(`Found "sessions"`);
json(result);
}
async function createAgentSession(opts) {
const client = createClient();
const agent = requireOpt(opts, "agent");
const data = {};
if (opts.name) data.name = opts.name;
info("Creating agent session...");
const result = await client.createAgentSession(agent, data);
ok(`Agent session created: ${result.id}`);
json(result);
}
async function deleteAgentSessions(opts) {
const client = createClient();
const agent = requireOpt(opts, "agent");
const ids = requireOpt(opts, "ids");
info(`Deleting ${ids.length} agent session(s)...`);
const result = await client.deleteAgentSessions(agent, ids);
ok("Agent sessions deleted");
json(result);
}
// ── Agent Chat ──
async function agentChat(opts) {
const client = createClient();
const agentId = requireOpt(opts, "agent");
const session = requireOpt(opts, "session");
const question = opts.question;
if (!question) {
throw new Error("Missing required option: --question");
}
info(`Asking agent: "${question}"`);
const result = await client.agentChat(agentId, session, question);
ok("Agent response received");
json(result);
}
// ── LLM Models ──
async function listModels(opts) {
const client = createClient();
const params = {};
if (opts.includeDetails) params.include_details = true;
info("Fetching available LLM models...");
let result;
try {
result = await client.listModels(params);
} catch (err) {
if (err.status === 401 || err.code === 401 || /unauthor/i.test(err.message)) {
err.message = `${err.message}. Set RAGFLOW_WEB_TOKEN from a web login session for /v1/llm/my_llms.`;
}
throw err;
}
// Normalize and group models
const factories = result || {};
const groups = [];
const groupBy = opts.groupBy || "type";
const includeUnavailable = opts.all;
for (const [factoryName, factoryPayload] of Object.entries(factories)) {
if (factoryName.startsWith("__")) continue;
if (!factoryPayload || !factoryPayload.llm) continue;
const llms = factoryPayload.llm || [];
for (const llm of llms) {
const status = llm.status;
const isAvailable = status === 1 || status === "1" || status === true;
if (!includeUnavailable && !isAvailable) continue;
const key = groupBy === "factory" ? factoryName : (llm.type || "unknown");
let group = groups.find(g => g.name === key);
if (!group) {
group = { name: key, models: [] };
groups.push(group);
}
const model = {
id: llm.id,
name: llm.name,
type: llm.type,
factory: factoryName,
status: isAvailable ? "available" : "unavailable",
};
if (opts.includeDetails) {
model.used_token = llm.used_token;
if (llm.api_base) model.api_base = llm.api_base;
if (llm.max_tokens) model.max_tokens = llm.max_tokens;
}
group.models.push(model);
}
}
groups.sort((a, b) => a.name.localeCompare(b.name));
for (const group of groups) {
group.models.sort((a, b) => (a.name || "").localeCompare(b.name || ""));
}
const totalModels = groups.reduce((sum, g) => sum + g.models.length, 0);
ok(`Found ${totalModels} model(s) in ${groups.length} group(s)`);
json({ groups, total: totalModels });
}
// ── Metadata / System ──
async function metadataSummary(opts) {
const client = createClient();
const dataset = requireOpt(opts, "dataset");
const docIds = opts.docIds ? listValue(opts.docIds) : [];
info(`Fetching metadata summary for dataset ${dataset}...`);
const result = await client.metadataSummary(dataset, docIds);
ok("Metadata summary fetched");
json(result);
}
async function systemVersion() {
const client = createClient();
const result = await client.getSystemVersion();
ok("System version fetched");
json(result);
}
async function getLogLevels() {
const client = createClient();
const result = await client.getLogLevels();
ok("Log levels fetched");
json(result);
}
async function setLogLevel(opts) {
const client = createClient();
const pkgName = requireOpt(opts, "pkgName");
const level = requireOpt(opts, "level");
info(`Setting log level for ${pkgName}...`);
const result = await client.setLogLevel(pkgName, level);
ok("Log level updated");
json(result);
}
const COMMANDS = {
// Dataset
"create-dataset": { fn: createDataset, group: "Dataset", desc: "Create a dataset" },
"list-datasets": { fn: listDatasets, group: "Dataset", desc: "List all datasets" },
"get-dataset": { fn: getDataset, group: "Dataset", desc: "Get dataset details" },
"update-dataset": { fn: updateDataset, group: "Dataset", desc: "Update a dataset" },
"delete-datasets": { fn: deleteDatasets, group: "Dataset", desc: "Delete datasets" },
// Document
"upload-documents": { fn: uploadDocuments, group: "Document", desc: "Upload documents" },
"list-documents": { fn: listDocuments, group: "Document", desc: "List documents" },
"get-document": { fn: getDocument, group: "Document", desc: "Get document details" },
"update-document": { fn: updateDocument, group: "Document", desc: "Update a document" },
"delete-documents": { fn: deleteDocuments, group: "Document", desc: "Delete documents" },
// Parsing
"start-parsing": { fn: startParsing, group: "Parsing", desc: "Start document parsing" },
"stop-parsing": { fn: stopParsing, group: "Parsing", desc: "Stop document parsing" },
"wait-parsing": { fn: waitParsing, group: "Parsing", desc: "Wait for parsing to complete" },
// Chunk
"list-chunks": { fn: listChunks, group: "Chunk", desc: "List chunks" },
"add-chunk": { fn: addChunk, group: "Chunk", desc: "Add a chunk" },
"update-chunk": { fn: updateChunk, group: "Chunk", desc: "Update a chunk" },
"delete-chunks": { fn: deleteChunks, group: "Chunk", desc: "Delete chunks" },
// Retrieval
"retrieve": { fn: retrieve, group: "Retrieval", desc: "Retrieve from datasets" },
// Chat Assistant
"list-chats": { fn: listChatAssistants, group: "Chat", desc: "List chat assistants" },
"create-chat": { fn: createChatAssistant, group: "Chat", desc: "Create a chat assistant" },
"get-chat": { fn: getChatAssistant, group: "Chat", desc: "Get chat assistant details" },
"update-chat": { fn: updateChatAssistant, group: "Chat", desc: "Update a chat assistant" },
"patch-chat": { fn: patchChatAssistant, group: "Chat", desc: "Patch a chat assistant" },
"delete-chats": { fn: deleteChatAssistants, group: "Chat", desc: "Delete chat assistants" },
// Session
"list-sessions": { fn: listSessions, group: "Session", desc: "List chat sessions" },
"create-session": { fn: createSession, group: "Session", desc: "Create a chat session" },
"delete-sessions": { fn: deleteSessions, group: "Session", desc: "Delete chat sessions" },
// Chat conversation
"chat": { fn: chat, group: "Chat", desc: "Chat with an assistant" },
"chat-session": { fn: chatSession, group: "Chat", desc: "Chat with a session" },
// Agent
"list-agents": { fn: listAgents, group: "Agent", desc: "List agents" },
"create-agent": { fn: createAgent, group: "Agent", desc: "Create an agent" },
"get-agent": { fn: getAgent, group: "Agent", desc: "Get agent details" },
"update-agent": { fn: updateAgent, group: "Agent", desc: "Update an agent" },
"delete-agents": { fn: deleteAgents, group: "Agent", desc: "Delete agents" },
// Agent Session
"list-agent-sessions": { fn: listAgentSessions, group: "Agent", desc: "List agent sessions" },
"create-agent-session": { fn: createAgentSession, group: "Agent", desc: "Create an agent session" },
"delete-agent-sessions": { fn: deleteAgentSessions, group: "Agent", desc: "Delete agent sessions" },
// Agent Chat
"agent-chat": { fn: agentChat, group: "Agent", desc: "Chat with an agent" },
// LLM Models
"list-models": { fn: listModels, group: "Models", desc: "List available LLM models" },
// Metadata / System
"metadata-summary": { fn: metadataSummary, group: "Document", desc: "Summarize document metadata" },
"system-version": { fn: systemVersion, group: "System", desc: "Get system version" },
"get-log-levels": { fn: getLogLevels, group: "System", desc: "Get log levels" },
"set-log-level": { fn: setLogLevel, group: "System", desc: "Set a log level" },
};
function printHelp() {
const groups = {};
for (const [cmd, { group, desc }] of Object.entries(COMMANDS)) {
if (!groups[group]) groups[group] = [];
groups[group].push({ cmd, desc });
}
let out = `${C.bold}Usage:${C.reset} node ragflow.js <command> [options]\n`;
for (const [group, cmds] of Object.entries(groups)) {
out += `\n${C.bold}${C.cyan} ${group}${C.reset}\n`;
for (const { cmd, desc } of cmds) {
out += ` ${C.green}${cmd.padEnd(22)}${C.reset} ${desc}\n`;
}
}
out += `
${C.bold}Common Options:${C.reset}
--name Name
--id ID
--ids IDs (multiple values)
--dataset Dataset ID
--files File paths (multiple values)
--doc-ids Document IDs (multiple values)
--document Document ID
--content Chunk content
--chunk-ids Chunk IDs (multiple values)
--messages Messages JSON (for session chat)
--chat Chat assistant ID
--agent Agent ID
--session Session ID
--llm-id LLM model ID
--question, -q Question (for retrieve/chat)
--datasets, -d Dataset IDs for retrieval
--metadata Metadata filter JSON
--metadata-condition Metadata condition JSON
--meta-fields Document metadata JSON
--similarity, -s Similarity threshold (0-1)
--top-n, -n Number of results
--top-k, -k Number of candidates
--top-p Top-p
--vector-weight, -w Vector similarity weight (0-1)
--temperature Sampling temperature
--frequency-penalty Frequency penalty
--presence-penalty Presence penalty
--max-tokens Max tokens
--stream Stream completion
--rerank, -r Rerank model ID
--keyword Enable keyword search
--kg Enable knowledge graph
--cross-langs Cross-language targets (comma-separated)
--page Page number
--page-size Page size
--orderby Order by field
--desc Sort descending
--return-empty-metadata Return docs with empty metadata
--include-details Include detailed model info
--group-by Group models by type/factory
--all Include unavailable models
--parser-config Parser configuration (JSON)
--prompt-config Chat prompt configuration (JSON or @file)
--pkg-name Log package name
--level Log level
--json Print machine-readable JSON only
`;
console.log(out);
}
// ── Main ──
async function main() {
const opts = parseArgs(args.slice(1));
outputMode.jsonOnly = Boolean(opts.json);
if (!command || command === "help" || command === "--help" || command === "-h" || opts.help) {
printHelp();
process.exit(0);
}
const cmd = COMMANDS[command];
if (!cmd) {
if (outputMode.jsonOnly && command) {
json({
error: {
message: `Unknown command: ${command}`,
raw_message: `Unknown command: ${command}`,
command,
},
});
process.exit(1);
}
printHelp();
process.exit(command ? 1 : 0);
}
try {
await cmd.fn(opts);
} catch (err) {
if (outputMode.jsonOnly) {
json(commandErrorJsonPayload(err));
process.exit(1);
}
fail(cliErrorMessage(err));
process.exit(1);
}
}
main();
FILE:scripts/repro-delete-chunks.js
#!/usr/bin/env node
const fs = require("fs");
const os = require("os");
const path = require("path");
const { createClient } = require("../lib/api.js");
const DEFAULT_RETRIES = 3;
const DEFAULT_RETRY_DELAY_MS = 1000;
function delay(ms) {
return new Promise((resolve) => setTimeout(resolve, ms));
}
function normalizeError(err) {
return {
message: err.message,
code: err.code,
status: err.status,
};
}
function chunkListIds(payload) {
const chunks = Array.isArray(payload) ? payload : payload?.chunks || [];
return chunks.map((chunk) => chunk.id).filter(Boolean);
}
function chunkIdsFromExactLookup(payload) {
return chunkListIds(payload);
}
function firstDocumentId(upload) {
if (Array.isArray(upload)) return upload[0]?.id || "";
if (Array.isArray(upload?.docs)) return upload.docs[0]?.id || "";
if (upload?.docs?.id) return upload.docs.id;
if (upload?.document?.id) return upload.document.id;
return upload?.id || "";
}
async function main() {
const client = createClient({ timeout: Number(process.env.RAGFLOW_REPRO_TIMEOUT_MS || 60000) });
const retries = Number(process.env.RAGFLOW_REPRO_DELETE_RETRIES || DEFAULT_RETRIES);
const retryDelayMs = Number(process.env.RAGFLOW_REPRO_DELETE_RETRY_DELAY_MS || DEFAULT_RETRY_DELAY_MS);
const embeddingModel = process.env.RAGFLOW_REPRO_EMBEDDING_MODEL || "text-embedding-v4@Tongyi-Qianwen";
const marker = `RAGFLOW_DELETE_CHUNK_REPRO_${Date.now()}`;
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), "ragflow-delete-chunks-"));
const filePath = path.join(tempDir, "delete-chunks-repro.md");
const result = {
marker,
config: { embedding_model: embeddingModel, retries, retry_delay_ms: retryDelayMs },
steps: [],
attempts: [],
cleanup: {},
};
let datasetId = "";
let documentId = "";
try {
fs.writeFileSync(
filePath,
[
"# RAGFlow delete-chunks repro",
"",
`Marker: ${marker}`,
"This file is created by ragflow-skill to reproduce manual chunk deletion.",
"",
].join("\n"),
"utf8"
);
const datasetPayload = {
name: `ragflow-delete-chunks-repro-${Date.now()}`,
chunk_method: "naive",
permission: "me",
description: "Temporary dataset for ragflow-skill delete-chunks reproduction",
};
if (embeddingModel) datasetPayload.embedding_model = embeddingModel;
const dataset = await client.createDataset(datasetPayload);
datasetId = dataset.id;
result.steps.push({ step: "create-dataset", dataset_id: datasetId });
const upload = await client.uploadDocuments(datasetId, [filePath]);
result.steps.push({ step: "upload-response-shape", shape: Array.isArray(upload) ? "array" : Object.keys(upload || {}) });
documentId = firstDocumentId(upload);
if (!documentId) throw new Error("Upload did not return a document id");
result.steps.push({ step: "upload-documents", document_id: documentId });
await client.startParsing(datasetId, [documentId]);
const parsed = await client.waitForParsing(datasetId, [documentId], { maxWait: 120000, interval: 3000 });
result.steps.push({ step: "wait-parsing", documents: parsed.map((doc) => ({ id: doc.id, run: doc.run, chunk_count: doc.chunk_count })) });
const added = await client.addChunk(datasetId, documentId, {
content: `Manual chunk for delete reproduction. ${marker}`,
important_keywords: ["delete", "repro"],
});
const chunkId = added?.chunk?.id || added?.id;
if (!chunkId) throw new Error("addChunk did not return a chunk id");
result.steps.push({ step: "add-chunk", chunk_id: chunkId });
const beforeDelete = await client.listChunks(datasetId, documentId);
result.steps.push({ step: "list-before-delete", chunk_ids: chunkListIds(beforeDelete), contains_added_chunk: chunkListIds(beforeDelete).includes(chunkId) });
try {
const exactBeforeDelete = await client.listChunks(datasetId, documentId, { id: chunkId });
result.steps.push({
step: "exact-id-before-delete",
chunk_ids: chunkIdsFromExactLookup(exactBeforeDelete),
contains_added_chunk: chunkIdsFromExactLookup(exactBeforeDelete).includes(chunkId),
});
} catch (err) {
result.steps.push({ step: "exact-id-before-delete", error: normalizeError(err) });
}
for (let attempt = 0; attempt <= retries; attempt++) {
if (attempt > 0) await delay(retryDelayMs);
const entry = { attempt, delay_ms: attempt === 0 ? 0 : retryDelayMs };
try {
entry.response = await client.deleteChunks(datasetId, documentId, [chunkId], { maxRetries: 0 });
entry.ok = true;
result.attempts.push(entry);
break;
} catch (err) {
entry.ok = false;
entry.error = normalizeError(err);
if (err.delete_chunk_details) entry.delete_chunk_details = err.delete_chunk_details;
try {
const listed = await client.listChunks(datasetId, documentId);
entry.chunk_ids_after_failure = chunkListIds(listed);
entry.chunk_still_listed = entry.chunk_ids_after_failure.includes(chunkId);
} catch (listErr) {
entry.list_error = normalizeError(listErr);
}
result.attempts.push(entry);
}
}
const firstOk = result.attempts[0]?.ok;
const laterOk = result.attempts.some((attempt, index) => index > 0 && attempt.ok);
const anyOk = result.attempts.some((attempt) => attempt.ok);
if (firstOk) {
result.conclusion = "delete-chunks succeeded immediately; no retry workaround is indicated by this run.";
} else if (laterOk) {
result.conclusion = "delete-chunks succeeded only after retry; exact ID lookup and search/delete visibility are temporarily inconsistent after manual chunk insert.";
} else if (!anyOk) {
result.conclusion = "delete-chunks failed after retries; this points to a RAGFlow server/doc-store deletion issue rather than a transient CLI timing issue.";
}
} catch (err) {
result.error = normalizeError(err);
} finally {
if (datasetId) {
try {
result.cleanup.delete_dataset = await client.deleteDatasets([datasetId]);
} catch (err) {
result.cleanup.delete_dataset_error = normalizeError(err);
}
}
fs.rmSync(tempDir, { recursive: true, force: true });
}
console.log(JSON.stringify(result, null, 2));
if (result.error) process.exitCode = 1;
}
main().catch((err) => {
console.error(JSON.stringify({ error: normalizeError(err) }, null, 2));
process.exit(1);
});