@clawhub-sdk-team-83914865ba
---
name: alibabacloud-ecs-install-extension
description: |
  Alibaba Cloud ECS extension installation skill. Supports querying available extension lists, checking if a specific extension is available,
  and one-click installation of extensions (e.g., OpenClaw, BT Panel, Python environments, etc.). Extensions are officially provided by Alibaba Cloud
  with verified installation packages and scripts.
  Triggers: "extension", "install", "BT Panel",
  "OpenClaw", "Python", "Node.js", "package", "one-click install"
---
# ECS Extension Installation Skill
You are a professional cloud operations assistant responsible for helping users query, verify, and install Alibaba Cloud ECS extensions (OOS Packages). Follow the scenario-based workflow strictly.
## Scenario Description
This skill queries and installs ECS extensions through Alibaba Cloud OOS (Operation Orchestration Service). Users can browse available extensions, check whether a specific extension is supported, and install extensions on one or more ECS instances with a single click.
**Architecture**: ECS + OOS (Operation Orchestration Service) + Cloud Assistant
**Use Cases**:
- Query available extensions (BT Panel, OpenClaw, Node.js, Python, etc.)
- Check if a specific extension is supported
- Install extensions on single or multiple ECS instances
- Deploy development environments (Python, Node.js, Java, etc.)
- Install server management panels (BT Panel, etc.)
## Prerequisites
> **Pre-check: Aliyun CLI >= 3.3.1 required**
> Run `aliyun version` to verify >= 3.3.1. If not installed or version too low,
> see `references/cli-installation-guide.md` for installation instructions.
> Then [MUST] run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
>
> **Plugin Installation Fallback:** If automatic plugin installation fails (e.g., network errors like `connect: bad file descriptor`), manually install the OOS plugin:
> ```bash
> # Check if oos plugin is installed
> aliyun plugin list
> # If not listed, download and install manually:
> # 1. Visit https://github.com/aliyun/alibaba-cloud-cli/releases to find the oos plugin package
> # 2. Download the .tar.gz for your platform
> # 3. Install: aliyun plugin install --file <path-to-downloaded-plugin.tar.gz>
> # 4. Verify: aliyun plugin list (should show oos plugin)
> ```
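
The `>= 3.3.1` pre-check above can be automated with a small comparison helper. This is a minimal sketch: `version_ge` is a hypothetical helper name, and the commented usage assumes that `aliyun version` prints a dotted `x.y.z` string somewhere in its output.

```bash
# version_ge A B: succeeds when dotted version A >= B (numeric fields only).
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for i in 0 1 2; do
    x=${a[i]:-0}
    y=${b[i]:-0}
    if (( 10#$x > 10#$y )); then return 0; fi
    if (( 10#$x < 10#$y )); then return 1; fi
  done
  return 0
}

# Usage (assumption: the version appears as x.y.z in the command output):
# installed="$(aliyun version | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"
# version_ge "$installed" "3.3.1" || echo "Aliyun CLI too old: $installed"
```

Comparing field by field avoids the string-comparison trap where `3.10.0` would sort before `3.3.1`.
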
> **Pre-check: Alibaba Cloud Credentials Required**
>
> **Security Rules:**
> - **NEVER** read, echo, or print AK/SK values (e.g., `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` is FORBIDDEN)
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **NEVER** use `aliyun configure set` with literal credential values
> - **ONLY** use `aliyun configure list` to check credential status
>
> ```bash
> aliyun configure list
> ```
> Check the output for a valid profile (AK, STS, or OAuth identity).
>
> **If no valid profile exists, STOP here.**
> 1. Obtain credentials from [Alibaba Cloud Console](https://ram.console.aliyun.com/manage/ak)
> 2. Configure credentials **outside of this session** (via `aliyun configure` in terminal or environment variables in shell profile)
> 3. Return and re-run after `aliyun configure list` shows a valid profile
> **Endpoint Note (Plugin Mode)**: In plugin mode, the `--endpoint` flag is typically not needed. The OOS plugin resolves endpoints automatically based on `--biz-region-id`. If endpoint resolution fails, check that the `--biz-region-id` value is a valid Alibaba Cloud region ID (e.g., `cn-hangzhou`).
---
## AI-Mode & Plugin Update
> **[MUST]** Before executing any `aliyun` CLI command in this workflow, run the following initialization commands:
>
> ```bash
> aliyun configure ai-mode enable
> aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension"
> aliyun plugin update
> ```
>
> After the entire workflow is complete (all scenarios finished), disable AI-Mode:
>
> ```bash
> aliyun configure ai-mode disable
> ```
## CLI Command Standards
> **[MUST]** Before executing any CLI command, read `references/related-commands.md` for command format standards.
>
> **Key Rules:**
> - **ALL `aliyun` CLI commands** must use plugin mode (lowercase-hyphenated) for both operation names and flags. This applies to **every cloud service**, not just OOS. **Only lowercase-hyphenated format is allowed** — any other format will cause `unknown flag` or `unknown command` errors.
> - OOS commands: `list-templates`, `get-template`, `start-execution`, `list-executions` with flags `--biz-region-id`, `--template-type`, `--template-name`, etc.
> - ECS commands: `describe-instances`, `describe-regions`, `run-command`, `describe-invocations`, `describe-invocation-results`, `describe-cloud-assistant-status` with flags `--region-id`, `--instance-id`, `--command-content`, etc.
>
> **[RECOMMENDED] Flag Verification:** Run `aliyun <service> <action> --help` (e.g., `aliyun ecs run-command --help`) to confirm the exact flags supported by the installed plugin version.
## Required Permissions
This skill requires the following RAM permissions:
- `bss:DescribeOrderDetail` (query order details for billing verification)
- `ecs:DescribeCloudAssistantStatus` (check Cloud Assistant status)
- `ecs:DescribeInstances` (instance information verification)
- `ecs:DescribeInvocations` (list Cloud Assistant command invocations)
- `ecs:DescribeInvocationResults` (view command execution results)
- `ecs:RunCommand` (Cloud Assistant command execution during installation)
- `oos:GetApplicationGroup` (get OOS application group information)
- `oos:GetTemplate` (get OOS template details)
- `oos:ListInstancePackageStates` (query instance extension package status)
- `oos:ListTemplates` (list available extension packages)
- `oos:StartExecution` (start OOS execution for installation)
- `oos:UpdateInstancePackageState` (update instance package state)
- `oss:GetObject` (download extension package files from OSS)
See `references/ram-policies.md` for detailed policy configuration.
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Use `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
## Parameter Confirmation
> **IMPORTANT: Parameter Confirmation** — Before executing any installation command,
> ALL user-customizable parameters MUST be confirmed with the user. Do NOT assume or use default
> values without explicit user approval.
| Parameter Name | Required/Optional | Description | Default Value |
|----------------|-------------------|-------------|---------------|
| `RegionId` | Required | Region where the target instances are located | N/A |
| `InstanceId` | Required | One or more ECS instance IDs to install the extension on | N/A |
| `PackageName` | Required | Extension package name (e.g., `ACS-Extension-BaoTaPanelFree-One-Click-1853370294850618`) | N/A |
| `Parameters` | Optional | Installation parameters specific to the extension (version, etc.) | Determined by template |
### Input Validation Rules
> **[MUST]** Before assembling any CLI command, validate ALL user-provided input values. Reject invalid input immediately and prompt the user to correct it. **Never** pass unvalidated user input into shell command strings.
| Parameter | Validation Rule | Example |
|-----------|----------------|---------|
| `InstanceId` | Must match regex `^i-[a-zA-Z0-9]{10,30}$`. Each ID in the array must pass validation. | `i-bp12z30vh0wadpyv3jo3` |
| `RegionId` | Must be a valid Alibaba Cloud region ID. Validate by calling `aliyun ecs describe-regions` and checking against the returned region list. | `cn-hangzhou`, `us-east-1` |
| `PackageName` | Must match regex `^[a-zA-Z0-9][a-zA-Z0-9\-]*$` (only alphanumeric characters and hyphens, must start with alphanumeric). | `ACS-Extension-node-1853370294850618` |
| `ResourceIds` array | Maximum length: **50** instances per execution. | — |
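
The rules in the table above can be expressed as small shell predicates. This is a sketch only: the function names are hypothetical, and `RegionId` validation is left out because it depends on a live `describe-regions` call.

```bash
# Each validator returns 0 when the value is acceptable.
validate_instance_id() {
  [[ "$1" =~ ^i-[a-zA-Z0-9]{10,30}$ ]]
}

validate_package_name() {
  [[ "$1" =~ ^[a-zA-Z0-9][a-zA-Z0-9-]*$ ]]
}

# validate_targets ID...: every ID must be valid; 1 to 50 IDs are allowed.
validate_targets() {
  local id
  if (( $# < 1 || $# > 50 )); then
    echo "expected 1-50 instance IDs, got $#" >&2
    return 1
  fi
  for id in "$@"; do
    if ! validate_instance_id "$id"; then
      echo "invalid instance ID: $id" >&2
      return 1
    fi
  done
  return 0
}
```

Because the anchored patterns reject spaces, quotes, `$`, and `;`, a value that passes cannot carry shell metacharacters into a command string.
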
> **Special Character Escaping:** After validation, all user-provided string values must be properly JSON-escaped (e.g., quotes, backslashes) before embedding into the `--parameters` JSON string. Use `jq` or equivalent tools to construct the JSON payload programmatically rather than manual string concatenation when possible.
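
One way to build the payload programmatically is to let `jq` do all the quoting. The sketch below assumes `jq` is installed; `build_params` is a hypothetical helper, and the field layout simply mirrors the `start-execution` parameters JSON used in Scenario 3.

```bash
# build_params REGION PACKAGE EXT_PARAMS_JSON ID...
# Prints the start-execution parameters object; jq handles all escaping.
build_params() {
  local region="$1" package="$2" ext_params="$3"
  shift 3
  # Turn the remaining arguments into a JSON array of strings.
  local ids_json
  ids_json="$(printf '%s\n' "$@" | jq -R . | jq -cs .)"
  jq -cn \
    --arg region "$region" \
    --arg package "$package" \
    --argjson ids "$ids_json" \
    --argjson ext "$ext_params" \
    '{regionId: $region,
      OOSAssumeRole: "",
      targets: {ResourceIds: $ids, RegionId: $region, Type: "ResourceIds"},
      rateControl: {Mode: "Concurrency", Concurrency: 1, MaxErrors: 0},
      action: "install",
      packageName: $package,
      parameters: $ext}'
}

# Usage: build_params cn-hangzhou "$PACKAGE_NAME" '{"version":"v22.13.1"}' "$ID1" "$ID2"
```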
---
## Scenario-Based Routing
> **IMPORTANT: Before starting installation, identify the user's intent and follow the appropriate workflow.**
Based on the user's request, route to the appropriate scenario:
| User Intent | Trigger Keywords | Handling Method |
|-------------|------------------|-----------------|
| **Query Available Extensions** | "what extensions", "list", "available extensions", "show me" | Execute **Scenario 1** |
| **Query Extension Support** | "can I install", "is it supported", "do you have", "support" | Execute **Scenario 2** |
| **Install Extension** | "install", "deploy", "one-click install", "set up" | Execute **Scenario 3** |
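
The routing table can be approximated with an ordered keyword match. This is a rough sketch, not the skill's actual routing logic: `route_scenario` is a hypothetical name, and substring matching is only a first pass (for example, "list" also matches inside unrelated words). Support questions are checked first because "can I install" also contains "install".

```bash
# route_scenario TEXT: print 1, 2, 3, or "unknown" for the user's request.
route_scenario() {
  # Lowercase the request so matching is case-insensitive.
  local text
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$text" in
    *"can i install"*|*"is it supported"*|*"do you have"*|*"support"*) echo 2 ;;
    *"what extensions"*|*"available extensions"*|*"show me"*|*"list"*) echo 1 ;;
    *"install"*|*"deploy"*|*"set up"*) echo 3 ;;
    *) echo unknown ;;
  esac
}
```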
---
## Scenario 1: Query Available Extensions List
When the user asks "What extensions are available?" or similar, follow these steps:
### Step 1: List Templates
Call `list-templates` to get all available public extension packages:
```bash
aliyun oos list-templates \
--biz-region-id cn-hangzhou \
--template-type Package \
--share-type Public \
--max-results 100 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
### Step 2: Parse and Display Results
Parse the response and present the results in a table format to the user:
| Extension Name | Description | Category |
|----------------|-------------|----------|
| (from TemplateName, prefer `name-zh-cn` from parsed Description JSON) | (from `zh-cn` or `en` in parsed Description JSON) | (from `categories` in parsed Description JSON) |
> **Note:** The `Description` field is a JSON string containing metadata. Parse it to extract:
> - `name-zh-cn`: Chinese display name (preferred for display)
> - `name-en`: English display name
> - `zh-cn`: Chinese description
> - `en`: English description
> - `categories`: Category tags array
> - `doc-zh-cn`: Chinese documentation link
> - `doc-en`: English documentation link
> - `image`: Icon URL
>
> Example `Description` value:
> ```json
> "Description": "{\"categories\":[\"application\"],\"en\":\"BaoTa Panel free edition one-click installation\",\"zh-cn\":\"BaoTa Panel free edition one-click installation\",\"name-en\":\"BaoTaPanelFree-One-Click\",\"name-zh-cn\":\"BaoTaPanelFree-One-Click\",\"image\":\"https://oos-public-template.oss-cn-beijing.aliyuncs.com/BaoTaPanelFree/icon.png\"}"
> ```
> **Note:** The `--biz-region-id` in the command is used for API endpoint routing. The returned public templates are available across all regions.
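
Parsing the embedded `Description` JSON might look like the sketch below. It assumes the `list-templates` response was saved to a file and that templates sit in a top-level `Templates` array; `summarize_templates` is a hypothetical helper name.

```bash
# summarize_templates FILE: print "TemplateName<TAB>display name<TAB>categories".
summarize_templates() {
  jq -r '
    .Templates[]
    | (.Description | fromjson? // {}) as $d
    | [ .TemplateName,
        ($d["name-zh-cn"] // $d["name-en"] // .TemplateName),
        (($d.categories // []) | join(",")) ]
    | @tsv
  ' "$1"
}

# Usage: summarize_templates /tmp/oos-templates.json
# A Description that is not valid JSON falls back to the raw TemplateName.
```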
---
## Scenario 2: Query if a Specific Extension is Supported
When the user asks "Can I install XXX?" or similar, follow these steps:
### Step 1: List and Search
Call `list-templates` (same as Scenario 1) and search for the extension by keyword:
```bash
aliyun oos list-templates \
--biz-region-id cn-hangzhou \
--template-type Package \
--share-type Public \
--max-results 100 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
### Step 2: Match Results
- If matched: return the extension details (name, description, supported OS, etc.)
- If not matched: inform the user that the extension is not currently supported, and suggest similar alternatives or Scenario 1 to browse the full list
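
The keyword search in Step 1 could be sketched as a case-insensitive substring match over the template name and raw description. As before, the top-level `Templates` array and the helper name `search_templates` are assumptions. Empty output means no match, in which case fall back to Scenario 1.

```bash
# search_templates FILE KEYWORD: print the names of matching templates.
search_templates() {
  jq -r --arg kw "$2" '
    ($kw | ascii_downcase) as $k
    | .Templates[]
    | select(((.TemplateName // "") + " " + (.Description // ""))
             | ascii_downcase
             | contains($k))
    | .TemplateName
  ' "$1"
}
```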
---
## Scenario 3: Install Extension
This is the core workflow. Follow these steps in strict order:
### Step 1: Confirm Extension Name
Confirm the exact extension name the user wants to install.
- If the user is unsure, execute **Scenario 1** or **Scenario 2** first to help them find the correct extension.
- If the user provides a vague name (e.g., "BT Panel"), search and confirm the exact `TemplateName` (e.g., `ACS-Extension-BaoTaPanelFree-One-Click-1853370294850618`).
### Step 2: Get Template Details
Call `get-template` to retrieve the extension template details. **Redirect output to a temporary file** to avoid terminal truncation (the `Content` field is usually very large):
```bash
aliyun oos get-template \
--biz-region-id cn-hangzhou \
--template-name "【Extension-Name】" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension > /tmp/oos-template.json
```
Then extract the `Parameters` from the template content:
```bash
jq -r '(.Content | fromjson | .Parameters)' /tmp/oos-template.json
```
> **[IMPORTANT] Output Truncation Warning**: `get-template` returns a `Content` field that is typically very large (contains full installation scripts). Always redirect command output to a temporary file (`> /tmp/oos-template.json`) first, then use `jq` or file read tools to parse. Do **not** rely on terminal output directly — truncated JSON will cause parsing errors.
The `Content` field (JSON string) includes:
- `Parameters`: defines the installation parameters required (e.g., version number, installation path, etc.)
- `Description`: extension description
- `TemplateVersion`: template version
Parse `Content.Parameters` and extract all required and optional parameters.
### Step 3: Guide User to Provide Parameters
Based on the `Parameters` parsed in Step 2, guide the user to provide necessary values:
- **Required parameters**: must obtain user input
- **Optional parameters**: inform the user of the template defaults; if the user does not provide a value, propose the default and have it confirmed in Step 4
> **[IMPORTANT]** Only extract parameters from `Content.Parameters`. Do **not** infer parameters from `InstallScript` or other template content — shell variables inside scripts are internal implementation details, not user-configurable parameters.
Common parameter examples:
| Parameter | Type | Description |
|-----------|------|-------------|
| `version` | String | Software version number (e.g., `v22.13.1` for Node.js) |
| `packageVersion` | String | Extension package version (e.g., `v27`) |
> **Note:** Do not fabricate parameter values. Every value must come from the user or from the template defaults.
### Step 4: Confirm All Parameters
> **[MUST]** Before executing the installation, you MUST output a parameter confirmation table to the user containing ALL of the following items and explicitly ask **"Please confirm the above parameters are correct before I proceed with installation."** You MUST NOT proceed to Step 5 until the user provides an affirmative response. Even if the user has already provided all parameters in their initial request, the confirmation step is still mandatory.
| Item | Value |
|------|-------|
| RegionId | (User provided) |
| InstanceId(s) | (User provided, supports multiple) |
| Extension Name (PackageName) | (Confirmed in Step 1) |
| Installation Parameters | (From Step 2/3, including version and any default values being used) |
> **[MUST] Instance Count Verification:** Verify that the number of InstanceIds matches the user's request. If the user mentions N instances but provides fewer IDs, ask for the missing instance IDs before proceeding.
>
> **[MUST]** Installation operations will modify instance state. Must obtain explicit user confirmation before execution. Do NOT skip this step under any circumstances.
### Step 5: Execute Installation
> **[MUST] Idempotency Check:** Before executing, query whether a running execution already exists for the same extension and target instances:
>
> ```bash
> aliyun oos list-executions \
> --biz-region-id "【User-Provided-Region】" \
> --template-name "ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL" \
> --status Running \
> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
> ```
>
> If a running execution with the same `packageName` and `targets` is found:
> 1. Inform the user about the existing execution
> 2. Ask the user whether to wait for it or create a new execution
> 3. **If the user does not respond or confirms to proceed, you MUST still call `start-execution` to create a new execution — do NOT skip `start-execution` under any circumstances**
>
> **The `start-execution` call is the mandatory core action of this step and must always be executed unless the user explicitly requests to wait for the existing execution.**
>
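> **[RECOMMENDED] ClientToken:** Generate a `ClientToken` once per installation request and reuse it on every retry to prevent duplicate submissions. The `ClientToken` must be a string of 1-64 ASCII characters.
>
> ```bash
> # Generate the ClientToken once and save it for reuse.
> # REGION_ID and PACKAGE_NAME are placeholders for the confirmed values.
> # Hashing keeps the token within 64 characters regardless of package name length
> # (on macOS, use `shasum -a 256` instead of `sha256sum`).
> CLIENT_TOKEN="$(printf '%s' "${REGION_ID}-${PACKAGE_NAME}-$(date +%Y%m%d%H%M)" | sha256sum | cut -c1-32)"
>
> # All subsequent retries reuse the same token, ensuring idempotency
> aliyun oos start-execution \
> ... \
> --client-token "$CLIENT_TOKEN"
> ```
>
> Because the token embeds a timestamp, regenerating it in a later minute yields a different value. Idempotency holds only while retries reuse the saved `CLIENT_TOKEN`.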
**[MUST]** Call `start-execution` to execute the installation task (this call must NOT be skipped):
**[MUST] Parameter Recording:** Before executing `start-execution`, save the complete `--parameters` JSON to a file for traceability, then use the file content for the command:
```bash
# Save parameters to file for traceability
cat > /tmp/oos-start-params.json << 'PARAMS_EOF'
{"regionId":"【User-Provided-Region】","OOSAssumeRole":"","targets":{"ResourceIds":["【User-Provided-InstanceId】"],"RegionId":"【User-Provided-Region】","Type":"ResourceIds"},"rateControl":{"Mode":"Concurrency","Concurrency":1,"MaxErrors":0},"action":"install","packageName":"【User-Specified-Package】","parameters":【User-Provided-Parameters】}
PARAMS_EOF
# Execute with parameters from file
aliyun oos start-execution \
--biz-region-id "【User-Provided-Region】" \
--template-name "ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL" \
--mode "Automatic" \
--tags "{}" \
--parameters "$(cat /tmp/oos-start-params.json)" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
**[MUST]** After executing, log the key parameter values that were passed:
```
Parameters passed to OOS:
- packageName: <actual value>
- packageVersion: <actual value, if applicable>
- parameters.version: <actual value, if applicable>
- targets.ResourceIds: <actual value>
```
Include the complete parameters JSON (from `/tmp/oos-start-params.json`) in the Installation Report's "Installation Parameters" field.
**Parameter Description:**
| Parameter | Description |
|-----------|-------------|
| `regionId` | Must be consistent with `--biz-region-id` |
| `targets.ResourceIds` | Array of instance IDs to install on |
| `targets.RegionId` | Must be consistent with `--biz-region-id` |
| `targets.Type` | Fixed value `ResourceIds` |
| `rateControl.Concurrency` | Number of concurrent installations, default 1 |
| `rateControl.MaxErrors` | Maximum number of errors allowed, default 0 |
| `action` | Fixed value `install` |
| `packageName` | Extension package name |
| `packageVersion` | Extension package version (e.g., `v27`), if applicable |
| `parameters` | Extension-specific installation parameters (JSON object) |
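
The consistency rules in this table can be checked before calling `start-execution`. The following is a hedged sketch that assumes `jq` is available; `check_params` is a hypothetical helper that reads the saved parameters file.

```bash
# check_params FILE BIZ_REGION_ID: succeeds when the saved parameters are consistent.
check_params() {
  jq -e --arg region "$2" '
    .regionId == $region
    and .targets.RegionId == $region
    and .targets.Type == "ResourceIds"
    and (.targets.ResourceIds | type == "array" and length >= 1 and length <= 50)
    and .action == "install"
  ' "$1" >/dev/null
}

# Usage: check_params /tmp/oos-start-params.json cn-hangzhou || echo "parameter mismatch"
```

`jq -e` sets a non-zero exit status when the expression evaluates to `false`, so the function can be used directly in an `if` or `||` chain.
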
**Example:**
```bash
aliyun oos start-execution \
--biz-region-id cn-hangzhou \
--template-name "ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL" \
--mode "Automatic" \
--tags "{}" \
--parameters "{\"regionId\":\"cn-hangzhou\",\"OOSAssumeRole\":\"\",\"targets\":{\"ResourceIds\":[\"i-bp12z30vh0xxxxxxxxxx\"],\"RegionId\":\"cn-hangzhou\",\"Type\":\"ResourceIds\"},\"rateControl\":{\"Mode\":\"Concurrency\",\"Concurrency\":1,\"MaxErrors\":0},\"action\":\"install\",\"packageName\":\"ACS-Extension-node-1853370294850618\",\"packageVersion\":\"v27\",\"parameters\":{\"version\":\"v22.13.1\"}}" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
### Step 6: Check Execution Result and Verify
After the command returns, extract `ExecutionId` from the response and poll the execution status:
```bash
aliyun oos list-executions \
--biz-region-id "【User-Provided-Region】" \
--execution-id "【ExecutionId-from-Response】" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
> **Polling Strategy**: Check execution status **every 20 seconds**. If the status is still `Running`, wait 20 seconds and check again. **Maximum wait time is 20 minutes** (60 checks).
>
> **[MUST] Terminal Status Requirement:** You MUST continue polling until the execution reaches a **terminal status** (`Success`, `Failed`, or `Cancelled`). While the status is `Running`, it is **absolutely forbidden** to generate the Installation Report. You may ONLY stop polling and generate a report in these two cases:
> 1. The execution has reached a terminal status (`Success`, `Failed`, or `Cancelled`)
> 2. You have polled for the full 20 minutes (60 checks at 20-second intervals) and the status is still `Running` — in this case, output a **PENDING** report with Execution Status set to `Pending (timed out after 20 minutes)` and include in Result Details: "Installation is still in progress, exceeded the 20-minute maximum wait time. Please check status manually using: `aliyun oos list-executions --biz-region-id <region> --execution-id <exec-id> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension`"
>
> **Any other situation (e.g., polling fewer than 60 times while status is still `Running`) absolutely forbids generating a report. You must keep polling.**
Installation status explanation:
| Status | Description |
|--------|-------------|
| `Running` | Installation in progress — wait 20 seconds and check again. **Do NOT output the report yet.** |
| `Success` | Installation successful — proceed to generate the report |
| `Failed` | Installation failed — view `Outputs` or `Tasks` for error details, then generate the report |
| `Cancelled` | Installation cancelled — generate the report |
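
The polling rules above (20-second interval, 60 checks, stop only on a terminal status) can be sketched as a loop around any status-reporting command. `poll_until_terminal` is a hypothetical helper. The commented `oos_status` example assumes that the `list-executions` response carries an `Executions` array whose first entry has a `Status` field, and `REGION_ID` and `EXECUTION_ID` are placeholders.

```bash
# poll_until_terminal STATUS_CMD [INTERVAL_SECONDS] [MAX_CHECKS]
# STATUS_CMD is any command or function that prints the current status.
# Prints the terminal status, or prints "Pending" and returns 2 on timeout.
poll_until_terminal() {
  local status_cmd="$1" interval="${2:-20}" max_checks="${3:-60}"
  local i status
  for (( i = 1; i <= max_checks; i++ )); do
    status="$("$status_cmd")"
    case "$status" in
      Success|Failed|Cancelled)
        echo "$status"
        return 0
        ;;
    esac
    # Not terminal yet: sleep unless this was the final check.
    if (( i < max_checks )); then sleep "$interval"; fi
  done
  echo "Pending"
  return 2
}

# Example status command (response shape is an assumption, see above):
# oos_status() {
#   aliyun oos list-executions --biz-region-id "$REGION_ID" \
#     --execution-id "$EXECUTION_ID" \
#     --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension \
#     | jq -r '.Executions[0].Status'
# }
# poll_until_terminal oos_status 20 60
```

Passing the status command as an argument keeps the loop testable with a stub and leaves the CLI call in one place.
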
> **[MUST] Post-Installation Version Verification:** When the execution status is `Success`, you MUST verify the actual installed/existing software version by executing the appropriate version check command via Cloud Assistant (using `aliyun ecs run-command` or the OOS_RunCommand MCP tool). This applies regardless of whether the output indicates the software was freshly installed or already existed.
>
> **Example** (verifying Node.js version via Cloud Assistant — note: ALL flags use kebab-case):
> ```bash
> aliyun ecs run-command \
> --region-id "<region>" \
> --instance-id '["<instance-id>"]' \
> --type RunShellScript \
> --command-content "node -v" \
> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
> ```
>
> Standard version check commands:
> | Software | Command |
> |----------|---------|
> | Node.js | `node -v` |
> | Python | `python3 --version` |
> | Java | `java -version` |
>
> **[MUST] Version Information Reporting Rules:**
> 1. Extract the complete version number from the version check command output (e.g., `v22.13.1`, `3.10.12`, `21.0.7`)
> 2. In the Installation Report's Result Details field, include version information in this exact format:
> ```
> Requested version: <version parameter specified by user>
> Actual installed/existing version: <version extracted from check command>
> Version verification: <Matches requirement / Does not match / Unable to verify>
> ```
> 3. If the actual version does not match the requested version, add a warning in Follow-up Suggestions
> 4. **All version numbers in the report MUST come from the version check command output. Do NOT infer or guess version numbers from descriptive log text. Multiple inconsistent version numbers in a single report are forbidden.**
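
Extracting and comparing versions from the check command output can be sketched as follows. Both helper names are hypothetical, and the leading `v` is dropped for comparison only; the report should still quote the check command output as printed.

```bash
# extract_version TEXT: print the first dotted numeric version found in TEXT.
extract_version() {
  printf '%s\n' "$1" | { grep -oE '[0-9]+(\.[0-9]+)+' || true; } | head -n1
}

# verify_version REQUESTED CHECK_OUTPUT: print the verification verdict.
verify_version() {
  local requested actual
  requested="$(extract_version "$1")"
  actual="$(extract_version "$2")"
  if [ -z "$requested" ] || [ -z "$actual" ]; then
    echo "Unable to verify"
  elif [ "$requested" = "$actual" ]; then
    echo "Matches requirement"
  else
    echo "Does not match"
  fi
}
```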
---
## Installation Report Output Format
> **[MUST]** Only generate this report when one of the following conditions is met:
> 1. The execution has reached a terminal status (`Success`, `Failed`, `Cancelled`)
> 2. You have polled for the full 20 minutes (60 checks) and the status is still `Running` (report as `Pending (timed out after 20 minutes)`)
>
> **It is absolutely forbidden to generate this report if polling has not reached 60 checks and the status is still `Running`.** You must keep polling.
```
================== ECS Extension Installation Report ==================
【Extension Name】 : (Extension package name)
【Installation Target】 : (List of instance IDs)
【Installation Parameters】: (JSON-formatted installation parameters)
【Execution ID】 : (OOS ExecutionId)
【Execution Status】 : (Success / Failed / Cancelled / Pending-timed out)
【Completion Time】 : (Execution end time, or "N/A — still running" if timed out)
【Result Details】 : (Execution output or error information)
【Follow-up Suggestions】 :
1. (Suggestion 1, e.g., verify service status)
2. (Suggestion 2, e.g., security group port opening)
3. (Suggestion 3, e.g., check installation logs)
=======================================================================
```
## Best Practices
1. **Confirm parameters before installation** — Extension installation will modify the instance environment; must confirm all parameters with the user before execution
2. **Check instance status** — Ensure the target instance is in the `Running` state before installation
3. **Choose the correct version** — Version parameters vary by extension; obtain the correct version number from the user
4. **Multiple instances supported** — `ResourceIds` supports arrays; can install the same extension on multiple instances at once
5. **Security awareness** — Never expose AK/SK in commands or reports
## Reference Links
| Document | Description |
|----------|-------------|
| [Related Commands](references/related-commands.md) | **CLI command standards and all commands reference** |
| [RAM Policies](references/ram-policies.md) | Required RAM permissions list |
| [CLI Installation Guide](references/cli-installation-guide.md) | Aliyun CLI installation instructions |
## Notes
1. Extension installation may take several minutes; wait patiently and regularly query execution status
2. On API failure, read error messages, check permissions, and retry
3. Sensitive information (AccessKey, passwords) must never appear in reports or commands
4. Some extensions may require specific operating system versions; confirm OS compatibility in `get-template` response
5. Extension installation failures are usually caused by: instance not running, network issues, incompatible OS versions, or insufficient disk space
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```
**Using Binary**
```bash
# Download (curl ships with macOS; wget does not)
curl -LO https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
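# Add to the machine PATH (requires admin privileges), then refresh the current session
$machinePath = [Environment]::GetEnvironmentVariable("Path", [System.EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\aliyun-cli", [System.EnvironmentVariableTarget]::Machine)
$env:Path += ";C:\aliyun-cli"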
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
  "current": "default",
  "profiles": [
    {
      "name": "default",
      "mode": "AK",
      "access_key_id": "LTAI5tXXXXXXXX",
      "access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
      "region_id": "cn-hangzhou",
      "output_format": "json",
      "language": "en"
    }
  ]
}
```
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (tokens typically expire after 15 minutes to 12 hours, depending on the role's maximum session duration).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use RSA key pair for authentication (generate key pair in Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
Environment credentials take precedence over the configuration file, but not over an explicitly selected profile (see Credential Priority).
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
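# Option 1: pass the profile on a single command
aliyun ecs describe-instances --profile projectA
# Option 2: set the profile for the whole shell session
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA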
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
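
As a rough illustration of the order above, the following sketch reports which source would win in the current environment. The function name `credential_source` is hypothetical, and the logic only mirrors this list; it is not the CLI's actual resolver, and it assumes the default config path `~/.aliyun/config.json`.

```bash
# Hypothetical helper: mirrors the documented priority list, not the CLI internals.
# Usage: credential_source [profile-passed-via---profile]
credential_source() {
  local profile_flag="${1:-}"
  if [ -n "$profile_flag" ]; then
    # 1. An explicit --profile flag always wins
    echo "flag:$profile_flag"
  elif [ -n "${ALIBABA_CLOUD_PROFILE:-}" ]; then
    # 2. Profile selected through the environment
    echo "env-profile:$ALIBABA_CLOUD_PROFILE"
  elif [ -n "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" ] && [ -n "${ALIBABA_CLOUD_ACCESS_KEY_SECRET:-}" ]; then
    # 3. Raw credentials exported in the environment
    echo "env-credentials"
  elif [ -f "$HOME/.aliyun/config.json" ]; then
    # 4. Current profile in the configuration file
    echo "config-file"
  else
    # 5. Fall back to the ECS instance RAM role, if one is attached
    echo "ecs-ram-role"
  fi
}
```

Calling `credential_source` with no argument before a run can make it easier to spot a stray `ALIBABA_CLOUD_PROFILE` export that silently overrides the profile you expected.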
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON object containing a list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "China East 1 (Hangzhou)"
},
...
]
},
"RequestId": "..."
}
```
**If failed**, you'll see error messages:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
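
The mapping from error code to next step can be scripted. The sketch below is a hypothetical helper (`explain_auth_error` is not part of the CLI) that matches the four codes listed above in captured output; the usage line assumes the CLI exits non-zero on failure.

```bash
# Hypothetical helper: translate a captured error message into a suggested action.
explain_auth_error() {
  case "$1" in
    *InvalidAccessKeyId.NotFound*)  echo "Check the Access Key ID" ;;
    *SignatureDoesNotMatch*)        echo "Check the Access Key Secret" ;;
    *InvalidSecurityToken.Expired*) echo "Request a new STS token and reconfigure" ;;
    *Forbidden.RAM*)                echo "Attach the missing RAM policy" ;;
    *)                              echo "Unrecognized error: rerun with debug logging" ;;
  esac
}
```

A possible call site: `output="$(aliyun ecs describe-regions 2>&1)" || explain_auth_error "$output"`.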
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
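# ~/.aliyun/config.json lives outside the repository, so .gitignore cannot cover it;
# ignore any copy of the config directory placed inside the project instead
echo ".aliyun/" >> .gitignore
# Use environment variables in CI/CD instead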
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
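
To check the result rather than assume it, a small script can read the file mode back. This is a minimal sketch: `file_mode` and `check_config_permissions` are hypothetical names, and the helper tries the GNU `stat` form first and falls back to the BSD/macOS form.

```bash
# Print the octal permission bits of a file (GNU stat first, then BSD/macOS stat).
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null
}

# Warn when the credentials file is accessible to anyone but its owner.
# Returns 0 if the mode is acceptable, 1 if too permissive, 2 if the file cannot be read.
check_config_permissions() {
  local path="${1:-$HOME/.aliyun/config.json}"
  local mode
  mode="$(file_mode "$path")" || return 2
  case "$mode" in
    600|400)
      echo "ok: $path has mode $mode"
      ;;
    *)
      echo "warning: $path has mode $mode, expected 600"
      return 1
      ;;
  esac
}
```

Running `check_config_permissions` without arguments inspects the default config path.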
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.1+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/ram-policies.md
# RAM Policies for ECS Extension Installation
Required RAM permissions for the ECS Extension Installation skill.
## Permission List
| Permission | Access Type | Description |
|------------|--------|-------------|
| `bss:DescribeOrderDetail` | Query | Query order details for extension billing verification |
| `ecs:DescribeCloudAssistantStatus` | Query | Check Cloud Assistant status on target instances |
| `ecs:DescribeInstances` | Query | Verify instance information (status, region, etc.) |
| `ecs:DescribeInvocations` | Query | List Cloud Assistant command invocations |
| `ecs:DescribeInvocationResults` | Query | View Cloud Assistant command execution results |
| `ecs:RunCommand` | Write | Execute Cloud Assistant commands during installation |
| `oos:GetApplicationGroup` | Query | Get OOS application group information |
| `oos:GetTemplate` | Query | Get detailed information of a specific OOS template |
| `oos:ListInstancePackageStates` | Query | Query instance extension package installation status |
| `oos:ListTemplates` | Query | List available OOS templates (extension packages) |
| `oos:StartExecution` | Write | Start an OOS execution to install the extension |
| `oos:UpdateInstancePackageState` | Write | Update instance extension package state |
| `oss:GetObject` | Read | Download extension package files from OSS |
## Minimum Permission Policy
Use this policy when you only need extension installation functionality:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bss:DescribeOrderDetail",
"ecs:DescribeCloudAssistantStatus",
"ecs:DescribeInstances",
"ecs:DescribeInvocations",
"ecs:DescribeInvocationResults",
"ecs:RunCommand",
"oos:GetApplicationGroup",
"oos:GetTemplate",
"oos:ListInstancePackageStates",
"oos:ListTemplates",
"oos:StartExecution",
"oos:UpdateInstancePackageState",
"oss:GetObject"
],
"Resource": "*"
}
]
}
```
## Full Permission Policy (Recommended)
Recommended for production use with additional query and monitoring permissions:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bss:DescribeOrderDetail",
"ecs:DescribeCloudAssistantStatus",
"ecs:DescribeInstances",
"ecs:DescribeInvocations",
"ecs:DescribeInvocationResults",
"ecs:RunCommand",
"oos:GetApplicationGroup",
"oos:GetTemplate",
"oos:ListExecutions",
"oos:ListInstancePackageStates",
"oos:ListTemplates",
"oos:StartExecution",
"oos:UpdateInstancePackageState",
"oss:GetObject"
],
"Resource": "*"
}
]
}
```
> **Note:**
> - `oos:ListExecutions` queries execution status and history, which helps track installation progress.
> - `ecs:DescribeInvocationResults` views Cloud Assistant command execution results.
> - `ecs:DescribeCloudAssistantStatus` checks whether Cloud Assistant is installed and running on the instance.
> - `oos:ListInstancePackageStates` and `oos:UpdateInstancePackageState` manage extension package states on instances.
> - `oss:GetObject` is required when the extension package must be downloaded from OSS.
> - `bss:DescribeOrderDetail` is used for billing and order verification when installing paid extensions.
## Permission Verification Command
After attaching the policy, verify permissions:
```bash
# Verify OOS template query permission
aliyun oos list-templates \
--biz-region-id cn-hangzhou \
--template-type Package \
--share-type Public \
--max-results 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
# Verify ECS instance query permission
aliyun ecs describe-instances \
--region-id cn-hangzhou \
--max-results 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
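If both commands return data, the `oos:ListTemplates` and `ecs:DescribeInstances` permissions are in place. The write permissions (`oos:StartExecution`, `ecs:RunCommand`) are only exercised during an actual installation.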
## Common Permission Errors and Troubleshooting
### Error: `Forbidden.RAM` / `NoPermission`
**Cause:** The RAM user does not have the required permissions.
**Solution:**
1. Log in to [RAM Console](https://ram.console.aliyun.com/)
2. Find the target RAM user
3. Click "Add Permissions"
4. Select "Custom Policy" and paste the minimum permission policy JSON above
5. Or select system policies: `AliyunOOSFullAccess` + `AliyunECSFullAccess` (broader permissions)
### Error: `Forbidden` on `oos:StartExecution`
**Cause:** Missing OOS execution permission.
**Solution:** Ensure the policy includes `oos:StartExecution` action.
### Error: `Forbidden` on `ecs:RunCommand`
**Cause:** Cloud Assistant command execution permission is missing.
**Solution:** Ensure the policy includes `ecs:RunCommand` action. The extension installation process requires Cloud Assistant to execute installation scripts on the instance.
### Error: `InvalidAccount.NotFound`
**Cause:** Incorrect AccessKey or the account does not exist.
**Solution:**
- Check if AccessKey ID is correct
- Verify if the AccessKey is active in the RAM console
- Reconfigure credentials outside of this session using `aliyun configure` interactively or via environment variables
### Using Predefined System Policies
If custom policies are not convenient, you can directly attach the following system policies:
| System Policy | Description |
|---------------|-------------|
| `AliyunOOSFullAccess` | Full OOS permissions (includes ListTemplates, GetTemplate, StartExecution, etc.) |
| `AliyunECSFullAccess` | Full ECS permissions (includes RunCommand, DescribeInstances, etc.) |
Attach method:
```bash
# Attach through RAM console or CLI
aliyun ram attach-policy-to-user \
--policy-type System \
--policy-name AliyunOOSFullAccess \
--user-name <your-ram-username> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
> **Security Recommendation:** For production environments, use custom minimum permission policies instead of full-access system policies to follow the principle of least privilege.
FILE:references/related-commands.md
# OOS Related Commands Reference
CLI command reference for ECS Extension Installation skill.
## Command Format Standards
- For OOS commands, use plugin mode (lowercase-hyphenated) operation names: `list-templates`, `get-template`, `start-execution`, `list-executions`
- All OOS plugin flags use kebab-case: `--biz-region-id`, `--template-type`, `--share-type`, `--max-results`, `--template-name`, `--execution-id`, etc.
- Always include `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension`
- OOS command format: `aliyun oos <action> --biz-region-id <region> [parameters]`
> **[RECOMMENDED] Flag Verification:** Run `aliyun oos <action> --help` to confirm exact flag names for the installed plugin version.
---
## list-templates
Query available OOS templates (extension packages).
### Command
```bash
aliyun oos list-templates \
--biz-region-id <region-id> \
--template-type Package \
--share-type Public \
--max-results <max-results> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
### Parameters
| Parameter | Required | Type | Description |
|-----------|----------|------|-------------|
| `--biz-region-id` | Yes | String | Region ID, e.g., `cn-hangzhou` |
| `--template-type` | No | String | Template type, `Package` for extension packages |
| `--share-type` | No | String | Share type, `Public` for public templates |
| `--max-results` | No | Integer | Maximum number of results, range 1-100 |
| `--next-token` | No | String | Pagination token |
| `--user-agent` | Yes | String | Fixed value `AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension` |
### Output Example
```json
{
"Templates": [
{
"TemplateId": "t-xxxxxxxxxxxxxxxx",
"TemplateName": "ACS-Extension-BaoTaPanelFree-One-Click-1853370294850618",
"TemplateVersion": "v1",
"Description": "{\"categories\":[\"application\"],\"en\":\"BaoTa Panel free edition one-click installation\",\"zh-cn\":\"BaoTa Panel free edition one-click installation\",\"name-en\":\"BaoTaPanelFree-One-Click\",\"name-zh-cn\":\"BaoTaPanelFree-One-Click\",\"image\":\"https://oos-public-template.oss-cn-beijing.aliyuncs.com/BaoTaPanelFree/icon.png\"}",
"ShareType": "Public",
"TemplateType": "Package",
"CreatedDate": "2024-01-15T08:00:00Z",
"UpdatedDate": "2024-06-01T10:00:00Z"
},
{
"TemplateId": "t-yyyyyyyyyyyyyyyy",
"TemplateName": "ACS-Extension-node-1853370294850618",
"TemplateVersion": "v27",
"Description": "{\"categories\":[\"application\"],\"en\":\"Node.js environment one-click installation\",\"zh-cn\":\"Node.js environment one-click installation\",\"name-en\":\"Node.js\",\"name-zh-cn\":\"Node.js\",\"image\":\"https://oos-public-template.oss-cn-beijing.aliyuncs.com/Nodejs/icon.png\"}",
"ShareType": "Public",
"TemplateType": "Package",
"CreatedDate": "2024-03-10T06:00:00Z",
"UpdatedDate": "2024-07-15T12:00:00Z"
}
],
"MaxResults": 100,
"TotalCount": 2,
"RequestId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```
### Output Field Description
| Field | Description |
|-------|-------------|
| `Templates` | Array of template information |
| `TemplateId` | Unique template ID |
| `TemplateName` | Template name (used as extension package name) |
| `TemplateVersion` | Template version |
| `Description` | Template description (JSON string, see parsing notes below) |
| `ShareType` | Share type: `Public` or `Private` |
| `TemplateType` | Template type: `Package` or `Automation` |
| `TotalCount` | Total number of templates |
| `RequestId` | Request ID (for troubleshooting) |
> **Description Field Parsing:** The `Description` field is a JSON string containing localized metadata. Parse it to extract:
> - `name-zh-cn`: Chinese display name (preferred for display)
> - `name-en`: English display name
> - `zh-cn`: Chinese description
> - `en`: English description
> - `categories`: Category tags array
> - `doc-zh-cn`: Chinese documentation link
> - `doc-en`: English documentation link
> - `image`: Icon URL
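
The parsing described above can be sketched with `jq`, which this reference already uses for `get-template`. The function name `list_extensions` is hypothetical; it reads a `list-templates` response on stdin and prints one tab-separated row per template, falling back to `-` when a metadata key is missing.

```bash
# Hypothetical helper: summarize a list-templates response read from stdin.
# Output columns: TemplateName, TemplateVersion, English display name, English description.
list_extensions() {
  jq -r '
    .Templates[]
    | (.Description | fromjson) as $d      # Description is a JSON string, so decode it first
    | [ .TemplateName,
        .TemplateVersion,
        ($d["name-en"] // "-"),
        ($d.en // "-") ]
    | @tsv
  '
}
```

For example, the command output could be piped straight in: `aliyun oos list-templates ... | list_extensions`. A template whose `Description` is not valid JSON would make `fromjson` fail, so real input may need an extra guard.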
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `unknown endpoint for oos/<region>` | Automatic endpoint resolution failed (network issue or location service unreachable) | Verify that the `--biz-region-id` value is correct; if it still fails, check network connectivity |
| `unknown flag: --RegionId` | Using PascalCase flag instead of kebab-case | Use `--biz-region-id` instead of `--RegionId` |
| `Forbidden.RAM` | Insufficient permissions | Ensure required RAM permissions are granted (see SKILL.md Required Permissions section) |
---
## get-template
Get detailed information of a specific OOS template.
### Command
**Recommended: redirect output to a temporary file** (the `Content` field is usually very large and will be truncated in terminal):
```bash
aliyun oos get-template \
--biz-region-id <region-id> \
--template-name <template-name> \
[--template-version <version>] \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension > /tmp/oos-template.json
```
Then extract parameters using `jq`:
```bash
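# Extract installation parameters
# (Content may sit at the top level or under Template, depending on the response shape)
jq -r '(.Content // .Template.Content) | fromjson | .Parameters' /tmp/oos-template.json
# Extract template description
jq -r '.Template.Description // .Description' /tmp/oos-template.json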
```
> **[IMPORTANT] Output Truncation Warning**: `get-template` returns a `Content` field that contains full installation scripts and can be extremely large. Always redirect to a file first, then parse with `jq` or file read tools. Do **not** rely on terminal output directly.
### Parameters
| Parameter | Required | Type | Description |
|-----------|----------|------|-------------|
| `--biz-region-id` | Yes | String | Region ID, e.g., `cn-hangzhou` |
| `--template-name` | Yes | String | Template name |
| `--template-version` | No | String | Template version, defaults to latest if not specified |
| `--user-agent` | Yes | String | Fixed value `AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension` |
### Output Example
```json
{
"Template": {
"TemplateId": "t-xxxxxxxxxxxxxxxx",
"TemplateName": "ACS-Extension-node-1853370294850618",
"TemplateVersion": "v27",
"Description": "{\"categories\":[\"application\"],\"en\":\"Node.js environment one-click installation\",\"zh-cn\":\"Node.js environment one-click installation\",\"name-en\":\"Node.js\",\"name-zh-cn\":\"Node.js\",\"image\":\"https://oos-public-template.oss-cn-beijing.aliyuncs.com/Nodejs/icon.png\"}",
"Content": "{\"FormatVersion\":\"OOS-2019-06-01\",\"Description\":\"Node.js environment installation\",\"Parameters\":{\"version\":{\"Type\":\"String\",\"Description\":\"Node.js version number\",\"Default\":\"v22.13.1\"}},\"Tasks\":[...]}",
"CreatedDate": "2024-03-10T06:00:00Z",
"UpdatedDate": "2024-07-15T12:00:00Z"
},
"RequestId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```
### Content Field Parsing
The `Content` field is a JSON string containing the complete template definition. Key fields:
```json
{
"FormatVersion": "OOS-2019-06-01",
"Description": "Template description",
"Parameters": {
"version": {
"Type": "String",
"Description": "Parameter description",
"Default": "default value",
"AllowedValues": ["v1", "v2"]
}
},
"Tasks": [...]
}
```
| Field | Description |
|-------|-------------|
| `Parameters` | Template parameters, defines installation options |
| `Parameters.{name}.Type` | Parameter type: `String`, `Integer`, `Boolean`, etc. |
| `Parameters.{name}.Description` | Parameter description |
| `Parameters.{name}.Default` | Default value |
| `Parameters.{name}.AllowedValues` | List of allowed values |
| `Tasks` | Execution task definitions |
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `TemplateNotFound` | Template name does not exist | Check if the template name is correct, use `list-templates` to query |
| `MissingTemplateName` | Missing `--template-name` parameter | Add `--template-name` parameter |
---
## start-execution
Start an OOS execution to install the extension.
### Command
```bash
aliyun oos start-execution \
--biz-region-id <region-id> \
--template-name "ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL" \
--mode "Automatic" \
--tags "{}" \
--parameters '<json-parameters>' \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
### Parameters
| Parameter | Required | Type | Description |
|-----------|----------|------|-------------|
| `--biz-region-id` | Yes | String | Region ID, must match the target instance region |
| `--template-name` | Yes | String | Fixed value `ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL` |
| `--mode` | Yes | String | Execution mode, `Automatic` for automatic execution |
| `--tags` | No | String | Tags, JSON format string, e.g., `"{}"` |
| `--parameters` | Yes | String | Execution parameters, JSON format string |
| `--user-agent` | Yes | String | Fixed value `AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension` |
### Parameters Field Structure
```json
{
"regionId": "cn-hangzhou",
"OOSAssumeRole": "",
"targets": {
"ResourceIds": ["i-bp12z30vh0wadpyv3jo3"],
"RegionId": "cn-hangzhou",
"Type": "ResourceIds"
},
"rateControl": {
"Mode": "Concurrency",
"Concurrency": 1,
"MaxErrors": 0
},
"action": "install",
"packageName": "ACS-Extension-node-1853370294850618",
"packageVersion": "v27",
"parameters": {
"version": "v22.13.1"
}
}
```
| Parameter | Required | Description |
|-----------|----------|-------------|
| `regionId` | Yes | Region ID, must be consistent with `--biz-region-id` |
| `OOSAssumeRole` | No | RAM role assumed by OOS, leave empty to use default |
| `targets.ResourceIds` | Yes | Array of target instance IDs |
| `targets.RegionId` | Yes | Region ID of target instances |
| `targets.Type` | Yes | Fixed value `ResourceIds` |
| `rateControl.Mode` | Yes | Rate control mode, `Concurrency` or `Batch` |
| `rateControl.Concurrency` | Yes | Number of concurrent executions |
| `rateControl.MaxErrors` | Yes | Maximum number of errors allowed |
| `action` | Yes | Fixed value `install` |
| `packageName` | Yes | Extension package name |
| `packageVersion` | No | Extension package version |
| `parameters` | No | Extension-specific parameters (JSON object) |
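
Hand-writing this structure for several instances is error-prone, so it can help to generate it. The sketch below assumes `jq` 1.6 or later (for `--args` and `$ARGS.positional`); `build_install_params` is a hypothetical name, and the rate-control values are simply the ones from the example above.

```bash
# Hypothetical helper: build the --parameters JSON for one package and any number of instances.
# Usage: build_install_params <region-id> <package-name> <instance-id>...
build_install_params() {
  local region="$1" package="$2"
  shift 2
  jq -cn --arg region "$region" --arg package "$package" '
    {
      regionId: $region,
      OOSAssumeRole: "",
      targets: {
        ResourceIds: $ARGS.positional,   # remaining arguments become the instance list
        RegionId: $region,
        Type: "ResourceIds"
      },
      rateControl: { Mode: "Concurrency", Concurrency: 1, MaxErrors: 0 },
      action: "install",
      packageName: $package
    }
  ' --args "$@"
}
```

The result can be passed as `--parameters "$(build_install_params cn-hangzhou <package-name> <instance-id>)"`. Because `jq` handles quoting, instance IDs and package names never need manual escaping; `packageVersion` and extension-specific `parameters` are left out here and would need to be added for packages that require them.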
### Output Example
```json
{
"Execution": {
"ExecutionId": "exec-xxxxxxxxxxxxxxxx",
"TemplateName": "ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL",
"Status": "Running",
"CreateDate": "2024-08-01T10:00:00Z",
"UpdateDate": "2024-08-01T10:00:00Z",
"Parameters": {...}
},
"RequestId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```
### Output Field Description
| Field | Description |
|-------|-------------|
| `ExecutionId` | Unique execution ID, used to query execution status |
| `TemplateName` | Template name |
| `Status` | Execution status: `Pending`, `Running`, `Success`, `Failed`, `Cancelled` |
| `CreateDate` | Execution creation time |
| `UpdateDate` | Execution update time |
| `Parameters` | Execution parameters |
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `InvalidParameter` | Parameter format error | Check if `--parameters` is a valid JSON string |
| `TemplateNotFound` | Template does not exist | Check if `packageName` is correct |
| `EntityNotExists.Instance` | Instance does not exist | Check the instance IDs in `targets.ResourceIds` and confirm they belong to the specified region |
| `InvalidInstance.NotRunning` | Instance is not in running state | Start the instance first |
| `Forbidden.RAM` | Insufficient permissions | Ensure required RAM permissions are granted (see SKILL.md Required Permissions section) |
| `RateLimit` | API rate limit exceeded | Wait a moment and retry |
---
## list-executions (Auxiliary Command)
Query OOS execution status and results.
### Command
```bash
aliyun oos list-executions \
--biz-region-id <region-id> \
--execution-id <execution-id> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
### Parameters
| Parameter | Required | Type | Description |
|-----------|----------|------|-------------|
| `--biz-region-id` | Yes | String | Region ID |
| `--execution-id` | Yes | String | Execution ID returned by `start-execution` |
| `--user-agent` | Yes | String | Fixed value `AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension` |
### Output Example
```json
{
"Executions": [
{
"ExecutionId": "exec-xxxxxxxxxxxxxxxx",
"TemplateName": "ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL",
"Status": "Success",
"StatusReason": "Execution completed successfully",
"CreateDate": "2024-08-01T10:00:00Z",
"UpdateDate": "2024-08-01T10:05:00Z",
"Outputs": {
"result": "Installation completed"
},
"Tasks": [
{
"TaskName": "installPackage",
"Status": "Success",
"StatusReason": "Task completed"
}
]
}
],
"TotalCount": 1,
"RequestId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
```
### Execution Status Description
| Status | Description |
|--------|-------------|
| `Running` | Execution in progress |
| `Success` | Execution successful |
| `Failed` | Execution failed |
| `Cancelled` | Execution cancelled |
| `Pending` | Waiting to execute |
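
Since `Running` and `Pending` are transient, a caller usually polls until one of the other three states appears. The following is a minimal sketch of such a loop; `is_terminal_status` and `wait_for_execution` are hypothetical names, and the status command is passed in by the caller so the loop itself stays independent of the CLI.

```bash
# Succeeds when the status is final according to the table above.
is_terminal_status() {
  case "$1" in
    Success|Failed|Cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: wait_for_execution <max-attempts> <sleep-seconds> <status-command> [args...]
# Prints the final status. Returns 0 on Success, 1 on Failed/Cancelled, 2 on timeout.
wait_for_execution() {
  local max="$1" delay="$2" attempt=1 status
  shift 2
  while [ "$attempt" -le "$max" ]; do
    status="$("$@")"                      # run the caller-supplied status command
    if is_terminal_status "$status"; then
      echo "$status"
      if [ "$status" = "Success" ]; then return 0; else return 1; fi
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "Timeout"
  return 2
}
```

One way to supply the status command is a small wrapper that runs `list-executions` for the execution ID and extracts `.Executions[0].Status` with `jq`, following the output example above.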
### Common Errors
| Error | Cause | Solution |
|-------|-------|----------|
| `ExecutionNotFound` | Execution ID does not exist | Check if the execution ID is correct |
---
## JSON Parameter Escaping Notes
When passing JSON parameters via the command line, pay attention to escaping:
### Bash
```bash
# Use single quotes to wrap the entire JSON to avoid shell escaping issues
aliyun oos start-execution \
--biz-region-id cn-hangzhou \
--template-name "ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL" \
--mode "Automatic" \
--parameters '{"regionId":"cn-hangzhou","targets":{"ResourceIds":["i-xxx"],"RegionId":"cn-hangzhou","Type":"ResourceIds"},"rateControl":{"Mode":"Concurrency","Concurrency":1,"MaxErrors":0},"action":"install","packageName":"ACS-Extension-node-1853370294850618","parameters":{"version":"v22.13.1"}}' \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
### Complex Parameters
For complex parameters, it is recommended to write them to a file first:
```bash
# Write parameters to file
cat > /tmp/oos-params.json << 'EOF'
{
"regionId": "cn-hangzhou",
"OOSAssumeRole": "",
"targets": {
"ResourceIds": ["i-bp12z30vh0wadpyv3jo3"],
"RegionId": "cn-hangzhou",
"Type": "ResourceIds"
},
"rateControl": {
"Mode": "Concurrency",
"Concurrency": 1,
"MaxErrors": 0
},
"action": "install",
"packageName": "ACS-Extension-node-1853370294850618",
"packageVersion": "v27",
"parameters": {
"version": "v22.13.1"
}
}
EOF
# Read from file
aliyun oos start-execution \
--biz-region-id cn-hangzhou \
--template-name "ACS-ECS-BulkyConfigureOOSPackageWithTemporaryURL" \
--mode "Automatic" \
--parameters "$(cat /tmp/oos-params.json)" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-ecs-install-extension
```
## Error Handling Best Practices
1. **API Failure Retry**: On `RateLimit` or network errors, wait 5-10 seconds and retry
2. **Permission Error**: Confirm that the required RAM permissions are granted; if the error persists, use the `ram-permission-diagnose` skill
3. **Parameter Error**: Carefully check JSON format and required fields
4. **Instance Error**: Confirm instance status is `Running` and the instance is in the correct region
5. **Execution Failure**: Use `list-executions` to query detailed error information; check `StatusReason` and `Tasks` fields
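
The retry advice in item 1 can be wrapped in a small helper. This is a sketch under simple assumptions: `retry_with_delay` is a hypothetical name, it retries on any non-zero exit rather than inspecting the error code, and the delay is fixed rather than randomized.

```bash
# Usage: retry_with_delay <max-attempts> <sleep-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 once all attempts have failed.
retry_with_delay() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1                            # give up after the last allowed attempt
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry_with_delay 3 5 aliyun oos list-templates ...` would try up to three times with a 5-second pause. Restricting retries to `RateLimit` would require capturing the output and checking the error code before looping.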
Diagnose ECS instance reboot or crash issues. First checks for abnormal maintenance events, then uses Cloud Assistant to check for internal restarts or kerne...
--- name: alibabacloud-ecs-reboot-or-crash-diagnosis description: Diagnose ECS instance reboot or crash issues. First checks for abnormal maintenance events, then uses Cloud Assistant to check for internal restarts or kernel panics. Use this skill when users report ECS instance unexpected reboot, crash, abnormal shutdown, kernel panic, or OOM. Supports vmcore file analysis, kdump configuration, system log analysis, and Windows crash dump analysis. metadata: pattern: pipeline steps: "5" required_params: "instance_id, region_id" domain: aiops owner: ecs-team contact: [email protected] ai-mode: disabled --- # ECS Instance Reboot/Crash Diagnosis Diagnose root cause of ECS instance unexpected reboot or crash. Uses standard workflow: check platform maintenance events first, then check internal system logs. Supports both Linux and Windows systems. ## Required Parameters Before starting diagnosis, **must** obtain the following parameters from user: | Parameter | Description | Example | |------|------|------| | `INSTANCE_ID` | ECS instance ID | `i-bp1a2b3c4d5e6f7g8h9j` | | `REGION_ID` | Region ID | `cn-hangzhou` | **If user does not provide any of the above parameters, must ask user first. Do not start diagnosis.** ## Mandatory Execution Rules 1. **Must obtain parameters first** — Instance ID and Region ID are required. Must ask user if missing. 2. **Standard workflow cannot be skipped** — Must execute in order: Maintenance Event Check → OSType Detection → System Log Check 3. **Must check Cloud Assistant status before diagnostics** — Before executing Step 3A/3B, must verify Cloud Assistant is running via `DescribeCloudAssistantStatus`. If not running, provide alternative diagnostic approaches. 4. **All diagnostic conclusions must be based on actual data** — No fabrication, speculation, or assumptions 5. 
**Output format must be strictly followed** — After diagnosis, **must read the complete template in `references/output-format.md`**, output strictly according to template structure. No free-form output, no omitted sections, no changed hierarchy. Every placeholder `{...}` in the template must be filled with actual data. --- ## Prerequisites ### CLI Tools - **aliyun-cli 3.3.3+** (required) — For calling Alibaba Cloud API - Installation & configuration: see [CLI Installation Guide](../../cli-installation-guide.md) ### AI-Mode Configuration (Required) Before using aliyun CLI commands, must configure AI-Mode: ```bash # Enable AI-Mode aliyun configure ai-mode enable # Set user-agent for skill identification aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-ecs-reboot-or-crash-diagnosis" # Update plugins aliyun plugin update ``` **After diagnosis complete, disable AI-Mode:** ```bash aliyun configure ai-mode disable ``` ### Alibaba Cloud Credentials Credentials must be pre-configured **outside of agent session**. Agent only verifies: ```bash aliyun configure list ``` ### Instance Requirements - **Cloud Assistant client must be installed and running** on the instance - Alibaba Cloud Linux: Pre-installed by default - Ubuntu/CentOS/Other: May require manual installation, check with `DescribeCloudAssistantStatus` API - Installation guide: https://help.aliyun.com/document_detail/64930.html - Instance status must be Running - **Note:** If Cloud Assistant is not running, diagnostic commands cannot be executed remotely. Must provide manual diagnostic steps to user. --- ## Required RAM Permissions See **[RAM Policies](references/ram-policies.md)** for the complete permission list and custom policy example. 
--- ## Step 1: Confirm Instance Information (Cannot Skip) **Verify instance exists and get basic information:** ```bash aliyun ecs describe-instances \ --biz-region-id <REGION_ID> \ --region <REGION_ID> \ --instance-ids '["<INSTANCE_ID>"]' ``` Confirm from returned JSON: - `RegionId` — Region ID (matches user provided) - `Status` — Instance status (Running/Stopped) - `InstanceName` — Instance name - `OSType` — Operating system type (**windows / linux**) **Record OSType for Step 3 branch selection.** --- ## Step 2: Check ECS Maintenance Events **Query instance historical system events to determine if platform maintenance caused reboot:** ```bash aliyun ecs describe-instance-history-events \ --biz-region-id <REGION_ID> \ --region <REGION_ID> \ --instance-id <INSTANCE_ID> \ --event-cycle-status Executed ``` **Event Analysis:** | Event Type | Meaning | Determination | Next Step | |---|---|---|---| | `SystemMaintenance.Reboot` | Reboot caused by system maintenance | Platform-initiated maintenance | Inform user, no further investigation needed | | `SystemFailure.Reboot` | Reboot caused by underlying hardware/system failure | Platform infrastructure failure | Suggest instance migration or contact support | | `InstanceFailure.Reboot` | Reboot caused by instance-level failure | **Instance internal issue detected by platform** | **Must continue to Step 3 for system log check** | | `InstanceExpiration.Stop` | Instance stopped due to expiration | Billing issue | Need renewal, no further investigation | | No relevant events | No platform maintenance events found | Not platform-initiated | Continue to Step 3 | **Important Notes for InstanceFailure.Reboot:** - This event indicates the platform detected an instance-level anomaly and triggered automatic recovery - Common causes: kernel panic, OOM, system hang, critical process failure - **Must execute Step 3** to check system logs for root cause - Even if no obvious errors in logs, the instance may have been unresponsive at kernel 
level

**If a maintenance event is found (other than `InstanceFailure.Reboot`, which always continues to Step 3):**

- Clearly inform the user of the reboot cause (event type, time, reason)
- Provide handling suggestions
- End the diagnosis flow

**If no maintenance event is found:**

- Continue to Step 3 and check internal system logs based on OSType

---

## Step 3A: Linux System Diagnosis (Execute when OSType is linux)

### Step 3A.1: Check Cloud Assistant Status (Mandatory)

**Before executing diagnostic commands, verify Cloud Assistant is running:**

```bash
aliyun ecs describe-cloud-assistant-status \
  --biz-region-id <REGION_ID> \
  --region <REGION_ID> \
  --instance-id <INSTANCE_ID>
```

**Check the response:**

```json
{
  "InstanceCloudAssistantStatusSet": {
    "InstanceCloudAssistantStatus": [
      {
        "InstanceId": "i-xxx",
        "RegionId": "cn-xxx",
        "CloudAssistantStatus": "true",
        "LastHeartbeatTime": "2026-04-09T07:26:58Z"
      }
    ]
  }
}
```

**Important Notes:**

- `CloudAssistantStatus` is a **string** ("true"/"false"), not a boolean
- Check `LastHeartbeatTime` to ensure it is recent (within the last few minutes)
- Even if the status is "true", RunCommand may still fail if the service is unstable
- **Always check the RunCommand execution result** and handle failures gracefully
- **Ubuntu vs RHEL differences:**
  - RHEL/CentOS/Alibaba Cloud Linux: service name is `kdump`, crash files are named `vmcore-*`
  - Ubuntu/Debian: service name is `kdump-tools`, crash files are named `dump.*` and `dmesg.*`
  - The diagnostic script checks both service names and all crash file types

**If CloudAssistantStatus is false or the command fails:**

- Cloud Assistant is not installed or not running on the instance
- **Cannot proceed with remote diagnostic commands**
- **Alternative approaches:**
  1. Guide the user to SSH into the instance and check logs manually
  2. Provide manual diagnostic commands for the user to execute
  3. Suggest installing Cloud Assistant: [Installation Guide](https://help.aliyun.com/document_detail/64930.html)
  4. Check instance monitoring data via CloudMonitor API

**If CloudAssistantStatus is true:**

- Proceed to Step 3A.2

### Step 3A.2: Execute Linux Diagnostic Script

Execute the Linux diagnostic script via Cloud Assistant to check:

- System reboot records (`last reboot`, `/var/log/messages` or `/var/log/syslog`)
- Kernel Panic records (`dmesg`)
- OOM records and `vm.panic_on_oom` configuration
- Kdump configuration and crash dump file status
- **Crash dump files**: `vmcore` (RHEL/CentOS) or `dump.*`/`dmesg.*` (Ubuntu/Debian)

**Complete diagnostic commands: see [diagnostic-commands.md](references/diagnostic-commands.md#linux-system-diagnosis)**

**Linux Result Analysis:**

| Finding | Possible Cause | Suggestion |
|---|---|---|
| Kernel Panic + crash dump (`vmcore`/`dump.*`) | Kernel crash, dump file generated | Read the `dmesg.*` file for the panic reason, contact Alibaba Cloud technical support for deep analysis |
| Kernel Panic + no crash dump | Kernel crash, but kdump not configured or not working | **Proceed to Step 5**: Recommend Kdump configuration for future crash capture |
| OOM + panic_on_oom=1 | OOM triggered kernel panic | Disable panic_on_oom or increase memory |
| OOM Killer | Insufficient memory caused processes to be killed | Optimize memory usage or upgrade instance type |
| SysRq triggered crash | Manual crash trigger via `/proc/sysrq-trigger` | Check if intentional test, review bash history and audit logs |
| Normal reboot records | User or program triggered reboot | Check cron jobs or ops scripts |
| No abnormal records | No system-level issues found | May be external factors, suggest monitoring |

---

## Step 3B: Windows System Diagnosis (Execute when OSType is windows)

### Step 3B.1: Check Cloud Assistant Status (Mandatory)

**Before executing diagnostic commands, verify Cloud Assistant is running:**

```bash
aliyun ecs describe-cloud-assistant-status \
  --biz-region-id <REGION_ID> \
  --region <REGION_ID> \
  --instance-id <INSTANCE_ID>
```

**Check the response:**

- `CloudAssistantStatus:
true`: Cloud Assistant is running, proceed to Step 3B.2
- `CloudAssistantStatus: false`: Cloud Assistant is not running
  - **Cannot proceed with remote diagnostic commands**
  - Guide user to SSH/RDP into instance and run diagnostics manually
  - Suggest reinstalling Cloud Assistant: [Windows Installation Guide](https://help.aliyun.com/document_detail/64930.html)

### Step 3B.2: Execute Windows Diagnostic Script

Execute the Windows diagnostic script via Cloud Assistant to check:

- System uptime and unexpected shutdown events (Event ID 41, 1074, 6008, 6006)
- Memory dump configuration and pagefile settings
- MEMORY.DMP and minidump files existence
- BSOD events and application crashes

**Complete diagnostic commands: see [diagnostic-commands.md](references/diagnostic-commands.md#windows-system-diagnosis)**

**Windows Result Analysis:**

| Finding | Possible Cause | Suggestion |
|---|---|---|
| Event 41 (Kernel-Power) | Unexpected shutdown/crash | Check for BSOD, dump files |
| Dump configured + dump file exists | System crashed and captured dump | Contact Alibaba Cloud technical support for dump file analysis |
| Dump configured + no dump file | Crash occurred but no dump captured | Check pagefile and disk space |
| Dump not configured | Crash dumps disabled | Enable memory dump for diagnosis |
| BSOD events found | Blue screen crash occurred | Check bug check code in dump |
| No abnormal events | No system-level crash records | May be power issue or external factor |

---

## Step 3.5: Get Cloud Assistant Command Output (Required after Step 3)

After executing the diagnostic script via `RunCommand`, query the execution result:

```bash
aliyun ecs describe-invocations \
  --biz-region-id <REGION_ID> \
  --region <REGION_ID> \
  --instance-id <INSTANCE_ID> \
  --invoke-id <INVOKE_ID>
```

**Important Notes:**

- Use `--instance-id` (not `--instance-id.1`) for describe-invocations API
- The `InvokeId` is returned by the `RunCommand` API call
- Decode the `Output` field from Base64 to get
diagnostic results
- Check `InvokeStatus` to ensure command execution completed successfully

---

## Step 4: Analyze Crash Dump Files

If Step 3 found crash dump files (vmcore on Linux, MEMORY.DMP/minidump on Windows), perform preliminary analysis.

**Complete analysis commands: see [diagnostic-commands.md](references/diagnostic-commands.md#crash-dump-analysis)**

> **Important:** If Linux vmcore files need deep analysis or Windows dump files (MEMORY.DMP/minidump) are found, recommend the user contact Alibaba Cloud technical support team for professional crash dump analysis assistance.

---

## Step 5: Recommend Kdump Configuration (If Not Configured)

**If Step 3A found Kernel Panic records but no vmcore files, must advise user to configure Kdump.**

### When to Recommend Kdump Configuration

- Kernel panic records found in dmesg or system logs, but `/var/crash` has no vmcore files
- Kdump service status shows `inactive` or `failed`
- `/proc/cmdline` does not contain `crashkernel=` parameter

### Key Points to Communicate

1. **Why Kdump is needed**: Without Kdump, kernel crashes will not generate vmcore files, making root cause analysis impossible.
2. **Configuration requirements**:
   - Reserve memory for crash kernel via `crashkernel=` kernel parameter
   - Enable and start the kdump (RHEL/CentOS) or kdump-tools (Ubuntu/Debian) service
   - Ensure sufficient disk space in `/var/crash` (or configured path)
3. **Configuration reference**: Provide guidance from [diagnostic-commands.md](references/diagnostic-commands.md#kdump-configuration-recommendations)

### Kdump Configuration Steps Summary

**RHEL/CentOS/Alibaba Cloud Linux:**

1. Install: `yum install -y kexec-tools`
2. Add `crashkernel=auto` to kernel parameters in `/etc/default/grub`
3. Run `grub2-mkconfig -o /boot/grub2/grub.cfg`
4. Reboot the instance
5. Enable: `systemctl enable --now kdump`

**Ubuntu/Debian:**

1. Install: `apt-get install -y kdump-tools`
2. Set `USE_KDUMP=1` in `/etc/default/kdump-tools`
3. Run `update-grub` (crashkernel parameter usually auto-added)
4. Reboot the instance
5. Verify: `systemctl status kdump-tools`

### Windows Memory Dump Configuration

If Step 3B found BSOD events but no dump files:

1. Verify pagefile is configured and has sufficient size
2. Enable memory dump: System Properties → Advanced → Startup and Recovery → Settings
3. Select "Automatic memory dump" or "Kernel memory dump"
4. Ensure `CrashDumpEnabled` registry value is not 0

---

## Final Output (Must execute after diagnosis complete)

**After all diagnostic steps complete, must do both of the following:**

1. **Read `references/output-format.md`**: Get complete output format template
2. **Output strictly according to template structure**: Choose corresponding template based on actual result

---

## References

- **[Output Format](references/output-format.md)**: Diagnostic result output template
- **[Common Scenarios](references/scenarios.md)**: Typical problem diagnosis examples
- **[Diagnostic Commands](references/diagnostic-commands.md)**: Complete diagnostic scripts and analysis commands

FILE:references/diagnostic-commands.md

# Diagnostic Commands Reference

This document provides diagnostic check items and command references. **The Agent should dynamically generate diagnostic commands adapted to the actual operating system type and version.**

> **Key principles:**
> - Service names, log paths, and tool commands may differ across Linux distributions
> - First obtain OSType and OSName via `DescribeInstances`, then generate the adapted commands
> - Commands should include error handling so that a missing path or unavailable command does not abort the run

---

## Linux System Diagnosis

### Checklist

| Check Item | Purpose | Reference Command |
|--------|------|----------|
| System reboot records | View historical reboot times and sources | `last reboot`, `who -b` |
| Reboot/shutdown records in system logs | Identify normal/abnormal shutdowns | `grep -i -e reboot -e shutdown /var/log/messages` or `/var/log/syslog` |
| Kernel Panic records | Detect kernel crashes | `dmesg \| grep -i panic`, log file search |
| OOM Killer records | Detect process termination caused by insufficient memory | Search log files for "Out of memory", "oom", "Kill process" |
| OOM panic configuration | Determine whether OOM triggers a system reboot | `sysctl -n vm.panic_on_oom` |
| Kdump service status | Verify that crash dump is configured and enabled | `systemctl status kdump` or `systemctl status kdump-tools` |
| Kdump configuration file | Get the dump file storage path | `/etc/kdump.conf` or `/etc/default/kdump-tools` |
| Crash dump files | Check whether vmcore/dump files exist | Check the configured path or the default `/var/crash` |

### OS Differences

| Item | RHEL/CentOS/Alibaba Cloud Linux | Ubuntu/Debian |
|------|--------------------------------|---------------|
| System log path | `/var/log/messages` | `/var/log/syslog` |
| Kdump service name | `kdump` | `kdump-tools` |
| Kdump configuration file | `/etc/kdump.conf` | `/etc/default/kdump-tools` |
| Crash dump file name | `vmcore` (directory: `127.0.0.1-date-time/`) | `dump.*` + `dmesg.*` |
| Default dump path | `/var/crash` | `/var/crash` |

### Command Examples

The following commands are for reference only; the Agent should adjust them for the actual operating system.

#### 1. Get system information

```bash
# OS version
cat /etc/os-release

# Kernel version
uname -r

# System uptime
uptime
```

#### 2. System reboot history

```bash
# Reboot history
last reboot | head -10

# Most recent boot time
who -b
```

#### 3. System log check

**RHEL/CentOS/Alibaba Cloud Linux:**

```bash
# Reboot/shutdown related logs
grep -i "reboot\|shutdown\|restart" /var/log/messages | tail -20

# Kernel Panic records
dmesg | grep -i "panic\|oops" | tail -20
grep -i "kernel panic" /var/log/messages | tail -10

# OOM records
grep -i "out of memory\|oom\|kill process" /var/log/messages | tail -20
```

**Ubuntu/Debian:**

```bash
# Reboot/shutdown related logs
grep -i "reboot\|shutdown\|restart" /var/log/syslog | tail -20

# Kernel Panic records
dmesg | grep -i "panic\|oops" | tail -20
grep -i "kernel panic" /var/log/syslog | tail -10

# OOM records
grep -i "out of memory\|oom\|kill process" /var/log/syslog | tail -20
```

#### 4. OOM panic configuration

```bash
# Check whether OOM triggers a panic
sysctl -n vm.panic_on_oom

# Return values:
# 0 - OOM only kills processes; the system keeps running
# 1 - OOM triggers a kernel panic, leading to a system reboot
```

#### 5. Kdump status check

**RHEL/CentOS/Alibaba Cloud Linux:**

```bash
# Check kdump service status
systemctl status kdump

# Check whether it is configured
cat /etc/kdump.conf

# Get the dump path (default /var/crash)
grep "^path" /etc/kdump.conf
```

**Ubuntu/Debian:**

```bash
# Check kdump-tools service status
systemctl status kdump-tools

# Check whether it is configured
cat /etc/default/kdump-tools

# Check the kernel crashkernel parameter
grep crashkernel /proc/cmdline
```

#### 6. Crash dump file check

```bash
# Check the dump directory
ls -la /var/crash/

# RHEL/CentOS: find vmcore files
find /var/crash -name "vmcore*" -type f -exec ls -lh {} \;

# Ubuntu/Debian: find dump and dmesg files
find /var/crash -name "dump.*" -type f -exec ls -lh {} \;
find /var/crash -name "dmesg.*" -type f -exec ls -lh {} \;

# Check dump files from the last 7 days
find /var/crash -type f \( -name "vmcore*" -o -name "dump.*" -o -name "dmesg.*" \) -mtime -7
```

---

## Kdump Configuration Recommendations

### When to Recommend Configuring Kdump

Recommend that the user configure Kdump in the following cases:

1. **Signs of a kernel panic were detected but there is no vmcore file**
   - dmesg or the system logs contain panic records
   - But the `/var/crash` directory is empty or does not exist

2. **The Kdump service is not running**
   - `systemctl status kdump` shows inactive/failed
   - `systemctl status kdump-tools` shows inactive/failed

3. **The kernel has no crashkernel parameter configured**
   - `/proc/cmdline` contains no `crashkernel=` parameter

### Kdump Configuration Reference

**RHEL/CentOS/Alibaba Cloud Linux:**

```bash
# 1. Install kexec-tools (if not already installed)
yum install -y kexec-tools

# 2. Configure /etc/kdump.conf
# The default configuration usually works. Key settings:
# path /var/crash  # dump file storage path
# core_collector makedumpfile -l --message-level 1 -d 31  # compressed dump

# 3. Add the crashkernel parameter to GRUB_CMDLINE_LINUX in /etc/default/grub
# crashkernel=auto or crashkernel=128M

# 4. Update the grub configuration
grub2-mkconfig -o /boot/grub2/grub.cfg

# 5. Reboot so that crashkernel takes effect
reboot

# 6. After the reboot, enable and start the kdump service
systemctl enable kdump
systemctl start kdump
systemctl status kdump
```

**Ubuntu/Debian:**

```bash
# 1. Install kdump-tools
apt-get install -y kdump-tools

# 2. Configure /etc/default/kdump-tools
# USE_KDUMP=1

# 3. Update the grub configuration (installation usually adds the crashkernel parameter automatically)
update-grub

# 4. Reboot the system
reboot

# 5. Verify the service status
systemctl status kdump-tools
```

### Kdump Configuration Verification

```bash
# Verify that the crashkernel parameter is in effect
grep crashkernel /proc/cmdline

# Verify the kdump service status
systemctl status kdump        # RHEL/CentOS
systemctl status kdump-tools  # Ubuntu/Debian
```

---

## Windows System Diagnosis

### Checklist

| Check Item | Purpose | Reference PowerShell Command |
|--------|------|----------------------|
| System information | Get the Windows version and hostname | `Get-ComputerInfo` |
| System uptime | Determine whether the system rebooted recently | `[WMI]'\\.\root\cimv2:Win32_OperatingSystem'` |
| Unexpected shutdown events | Detect abnormal shutdowns | Event ID 41, 6008, 6006, 1074 |
| Memory dump configuration | Verify that crash dump is configured | Registry key `CrashControl` |
| Pagefile configuration | Dump files require pagefile support | `Get-CimInstance Win32_PageFileUsage` |
| MEMORY.DMP file | Check for a full memory dump file | `Test-Path C:\Windows\MEMORY.DMP` |
| Minidump files | Check for small dump files | `Get-ChildItem C:\Windows\Minidump` |
| BSOD events | Detect blue screen error reports | WER event log |

### Event ID Reference

| Event ID | Source | Meaning |
|----------|------|------|
| 41 | Kernel-Power | System rebooted unexpectedly (no clean shutdown) |
| 1074 | User32 | Normal shutdown/restart, with the reason recorded |
| 6008 | EventLog | The previous shutdown was unexpected |
| 6006 | EventLog | Event Log service stopped (normal shutdown) |

### Memory Dump Types

| CrashDumpEnabled | Type | Description |
|------------------|------|------|
| 0 | None | Memory dump disabled |
| 1 | Complete | Complete memory dump (largest, roughly the size of RAM) |
| 2 | Kernel | Kernel memory dump (medium size) |
| 3 | Small | Small memory dump (64KB, Minidump) |
| 7 | Automatic | Automatic memory dump (recommended) |

### PowerShell Command Examples

```powershell
# System information
Get-ComputerInfo | Select-Object WindowsProductName, WindowsVersion, OsArchitecture, CsName

# System uptime
$os = Get-CimInstance Win32_OperatingSystem
$uptime = (Get-Date) - $os.LastBootUpTime
Write-Host "Last boot: $($os.LastBootUpTime)"
Write-Host "Uptime: $($uptime.Days) days, $($uptime.Hours) hours"

# Unexpected shutdown events
Get-WinEvent -FilterHashtable @{LogName="System"; ID=41,1074,6008,6006} -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message -First 10

# Memory dump configuration
$crashControl = Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl"
Write-Host "CrashDumpEnabled: $($crashControl.CrashDumpEnabled)"
Write-Host "DumpFile: $($crashControl.DumpFile)"
Write-Host "MinidumpDir: $($crashControl.MinidumpDir)"

# Pagefile configuration
Get-CimInstance Win32_PageFileUsage | Select-Object Name, AllocatedBaseSize, CurrentUsage

# Check MEMORY.DMP
$dumpFile = $crashControl.DumpFile
if (-not $dumpFile) { $dumpFile = "C:\Windows\MEMORY.DMP" }
if (Test-Path $dumpFile) {
    $fileInfo = Get-Item $dumpFile
    Write-Host "MEMORY.DMP found: Size=$([math]::Round($fileInfo.Length/1GB,2)) GB, Modified=$($fileInfo.LastWriteTime)"
}

# Check Minidump files
$minidumpDir = $crashControl.MinidumpDir
if (-not $minidumpDir) { $minidumpDir = "C:\Windows\Minidump" }
if (Test-Path $minidumpDir) {
    Get-ChildItem -Path $minidumpDir -Filter "*.dmp" |
        Sort-Object LastWriteTime -Descending |
        Select-Object -First 5
}

# BSOD events
Get-WinEvent -FilterHashtable @{LogName="System"; ProviderName="Microsoft-Windows-WER-SystemErrorReporting"} -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message -First 10
```

### Windows Memory Dump Configuration Recommendations

When BSOD events are detected but no dump file exists, recommend configuring memory dump:

1. **Configure via System Properties**:
   - Right-click "This PC" → Properties → Advanced system settings
   - Startup and Recovery → Settings
   - Select "Automatic memory dump" or "Kernel memory dump"
   - Make sure the pagefile is large enough (at least RAM size + 1MB)

2. **Configure via PowerShell**:

   ```powershell
   # Set automatic memory dump
   Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl" -Name "CrashDumpEnabled" -Value 7

   # Make sure the pagefile exists and is large enough
   # It is usually managed by the system automatically. To check:
   $cs = Get-CimInstance Win32_ComputerSystem
   if ($cs.AutomaticManagedPagefile) {
       Write-Host "Pagefile is automatically managed"
   } else {
       # Configure the pagefile manually
       # A reboot is required for the change to take effect
   }
   ```

---

## Crash Dump Analysis

### Linux vmcore Analysis

If a vmcore file is found, read `vmcore-dmesg.txt` for a preliminary analysis:

```bash
# View the contents of vmcore-dmesg.txt
cat /var/crash/127.0.0.1-*/vmcore-dmesg.txt

# Search for key information
grep -i "kernel panic" /var/crash/*/vmcore-dmesg.txt
grep -i "RIP:" /var/crash/*/vmcore-dmesg.txt
grep -i "Call Trace" /var/crash/*/vmcore-dmesg.txt
```

**How to read the key information**:

| Keyword | Meaning |
|--------|------|
| `Kernel panic - not syncing: VFS` | File system related problem |
| `Kernel panic - not syncing: Attempted to kill init` | The init process crashed |
| `Kernel panic - not syncing: Out of memory` | Crash caused by OOM |
| `RIP: 0010:` | Instruction location at the time of the crash |
| `Call Trace:` | Call stack |
| `MCE` / `Machine Check Exception` | Hardware error |

> **Note**: Deep vmcore analysis requires the `crash` tool and debug symbol packages. Recommend contacting Alibaba Cloud technical support for professional analysis.

### Windows Dump File Analysis

Analyze with WinDbg or BlueScreenView:

```
# WinDbg commands
!analyze -v    # Automatically analyze the crash cause
k              # Show the call stack
.bugcheck      # Show the bugcheck code
```

> **Note**: For deep dump analysis, recommend contacting Alibaba Cloud technical support.

---

## Executing Commands via Cloud Assistant

Diagnostic commands are executed remotely through Alibaba Cloud Cloud Assistant:

### Linux Command Execution

```bash
aliyun ecs run-command \
  --biz-region-id <REGION_ID> \
  --region <REGION_ID> \
  --type RunShellScript \
  --instance-id <INSTANCE_ID> \
  --timeout 3600 \
  --command-content '<SCRIPT_CONTENT>'
```

### Windows Command Execution

```bash
aliyun ecs run-command \
  --biz-region-id <REGION_ID> \
  --region <REGION_ID> \
  --type RunPowerShellScript \
  --instance-id <INSTANCE_ID> \
  --timeout 3600 \
  --command-content '<SCRIPT_CONTENT>'
```

### Getting the Command Execution Result

```bash
aliyun ecs describe-invocations \
  --biz-region-id <REGION_ID> \
  --region <REGION_ID> \
  --instance-id <INSTANCE_ID> \
  --invoke-id <INVOKE_ID>
```

**Note**: The `Output` field is Base64 encoded and must be decoded before viewing.

FILE:references/output-format.md

# Output Format Requirements

After diagnosis is complete, output results
according to the following structure.

## Table of Contents

1. [Linux Diagnosis Result](#linux-diagnosis-result)
2. [Windows Diagnosis Result](#windows-diagnosis-result)

---

## Linux Diagnosis Result

````markdown
## Diagnostic Progress

### Step 1: Confirm Instance Information

> First need to confirm instance basic info and region.

- Instance ID: {instance_id}
- Region: {region_id}
- OS Type: Linux
- Current Status: {status}

### Step 2: Check Maintenance Events

> Check if platform maintenance events caused reboot.

**Findings:**
- {event_query_result}

### Step 3A: Linux System Diagnosis (if needed)

> No maintenance events found, checking internal restart or panic records.

**Cloud Assistant Status Check:**
- Cloud Assistant Running: {yes/no}
- If no: {explain why cannot proceed and provide alternative approaches}

**Findings:**
- {cloud_assistant_execution_result}

**Kdump Configuration Status:**
- Service Status: {active/inactive/active (kdump-tools)}
- Service Type: {kdump (RHEL/CentOS) / kdump-tools (Ubuntu/Debian)}
- Crash Dump Path: {configured_path}

**OOM Panic Configuration:**
- vm.panic_on_oom: {0/1}
- Impact: {OOM kills process only / OOM triggers kernel panic and reboot}

**Crash Dump File Check:**
- Found crash dumps: {yes/no}
- Dump type: {vmcore (RHEL) / dump.*+dmesg.* (Ubuntu)}
- Latest dump: {file_path, size, time}
- Panic reason (from dmesg): {panic_message_if_available}

**Alternative Diagnostic Approaches (if Cloud Assistant not available):**

```bash
# Provide these commands to user for manual execution via SSH
ssh root@{instance_public_ip}

# Check reboot history
last reboot

# Check system logs
grep -i "reboot\|shutdown\|panic\|oom" /var/log/syslog | tail -50

# Check dmesg for errors
dmesg | grep -i "panic\|oom\|error" | tail -20

# Check kdump
systemctl status kdump
ls -lh /var/crash/
```

### Step 4A: vmcore-dmesg.txt Analysis (if vmcore found)

> Found vmcore file, reading vmcore-dmesg.txt for preliminary analysis.

**Panic Reason:**
- {panic_specific_reason}

**Crash Location:**
- RIP: {function_and_address_at_crash}
- Involved Modules: {related_kernel_modules}

**Call Stack:**

```
{key_call_stack_fragment}
```

### Step 5: Kdump Configuration Recommendation (if no vmcore and kdump not configured)

> Kernel panic detected but no crash dump file found. Kdump is not properly configured.

**Current Kdump Status:**
- Service Status: {kdump_service_status}
- crashkernel Parameter: {present/absent in /proc/cmdline}
- Config File Exists: {yes/no}

**Why Kdump is Needed:**
Without Kdump configured, kernel crashes will not generate vmcore files, making root cause analysis impossible for future occurrences.

**Configuration Steps for {OS_Type}:**
{configuration_steps_based_on_os}

---

## Diagnostic Conclusion

- **Root Cause Analysis**: {root_cause}
- **Impact Scope**: {impact_scope}

---

## Recommendations

1. {recommendation_1}
2. {recommendation_2}
````

---

## Windows Diagnosis Result

```markdown
## Diagnostic Progress

### Step 1: Confirm Instance Information

> First need to confirm instance basic info and region.

- Instance ID: {instance_id}
- Region: {region_id}
- OS Type: Windows
- Current Status: {status}

### Step 2: Check Maintenance Events

> Check if platform maintenance events caused reboot.

**Findings:**
- {event_query_result}

### Step 3B: Windows System Diagnosis (if needed)

> No maintenance events found, checking Windows crash dump and event logs.

**System Uptime:**
- Last Boot Time: {last_boot_time}
- Uptime: {days} days, {hours} hours

**Unexpected Shutdown Events:**
- {shutdown_event_summary}

**Memory Dump Configuration:**
- CrashDumpEnabled: {0/1/2/3/7}
- Dump Type: {None/Complete/Kernel/Small/Automatic}
- Dump File Path: {dump_file_path}
- Pagefile: {configured/not configured}

**Memory Dump File Check:**
- Memory dump file: {found/not found}
- File size: {size}
- Last modified: {timestamp}

**Minidump Files:**
- Count: {count}
- Latest: {filename, timestamp}

**BSOD Events:**
- {bsod_event_summary}

---

## Diagnostic Conclusion

- **Root Cause Analysis**: {root_cause}
- **Impact Scope**: {impact_scope}

---

## Recommendations

1. {recommendation_1}
2. {recommendation_2}
```

FILE:references/ram-policies.md

# RAM Permission List

RAM permissions required to run this Skill (principle of least privilege):

## Required Permissions

- `ecs:DescribeInstances`: Confirm the instance exists and get its basic information (status, name, operating system type)
- `ecs:DescribeInstanceAttribute`: Get detailed instance attributes, used for operating system type detection and branch selection
- `ecs:DescribeInstanceHistoryEvents`: Query historical maintenance events to determine whether the reboot was platform-triggered
- `ecs:DescribeCloudAssistantStatus`: Verify that Cloud Assistant is running so remote diagnostic commands can be executed
- `ecs:RunCommand`: Execute diagnostic scripts via Cloud Assistant (Linux Shell or Windows PowerShell)
- `ecs:DescribeInvocations`: Get Cloud Assistant command execution results and extract the diagnostic output

## Permission Notes

- **Scope**: Only the read-only and command-execution permissions needed for diagnosis
- **Write operations**: None (this Skill does not modify instance configuration)
- **Wildcards**: No action wildcards are used (follows the principle of least privilege)

## Custom Policy Example

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeInstances",
        "ecs:DescribeInstanceAttribute",
        "ecs:DescribeInstanceHistoryEvents",
        "ecs:DescribeCloudAssistantStatus",
        "ecs:RunCommand",
        "ecs:DescribeInvocations"
      ],
      "Resource": "*"
    }
  ]
}
```

## Use Cases

This permission set applies to ECS instance fault diagnosis scenarios:

- Checking platform maintenance events
- Executing diagnostic commands remotely via Cloud Assistant
- Getting system logs and crash dump file information
- Analyzing the root cause of a reboot or crash

FILE:references/scenarios.md

# Common Diagnostic Scenarios

This document lists common diagnostic scenarios and expected outputs for the ecs-reboot-or-crash skill.
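
Every scenario in this document starts from the Step 2 event query, so the first branch can be sketched as a small lookup. This is a minimal sketch, not part of the skill itself: the function name `next_step_for_event` and the returned labels are hypothetical, and only the event type strings come from the Step 2 event table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a platform event type (from the Step 2 table)
# to the next diagnostic action. The labels are illustrative only.
next_step_for_event() {
  case "$1" in
    SystemMaintenance.Reboot) echo "inform-user" ;;
    SystemFailure.Reboot)     echo "suggest-migration-or-support" ;;
    InstanceFailure.Reboot)   echo "continue-to-step-3" ;;
    InstanceExpiration.Stop)  echo "renew-instance" ;;
    # No relevant event (or an unknown type): keep diagnosing inside the OS.
    *)                        echo "continue-to-step-3" ;;
  esac
}

next_step_for_event "SystemMaintenance.Reboot"
```

Treating unknown or missing event types the same as "no relevant events" keeps the flow conservative: when in doubt, the diagnosis continues into the operating system rather than stopping early.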
---

## Linux Scenarios

### Scenario 1: System Maintenance Reboot

```
Diagnosis Result:
- Found event: SystemMaintenance.Reboot
- Event time: 2025-03-20 10:00:00
- Reason: Planned system maintenance

Conclusion: Instance reboot was caused by Alibaba Cloud platform maintenance, which is normal ops activity.
```

---

### Scenario 2: Kernel Panic + vmcore Available

```
Diagnosis Result:
- No maintenance events found
- Cloud Assistant found: "Kernel panic - not syncing" record in dmesg
- Kdump service status: active
- Found vmcore: /var/crash/127.0.0.1-2025-03-20-10:30:00/vmcore (2.5G)
- vmcore time: 2025-03-20 10:30:00

vmcore-dmesg.txt analysis:
- Panic reason: Kernel panic - not syncing: Fatal exception in interrupt
- Crash location: RIP: 0010:nvme_queue_rq+0x1a2/0x4d0 [nvme]
- Involved module: nvme driver
- Call stack: nvme_queue_rq -> blk_mq_dispatch_rq_list -> ...

Conclusion: Instance rebooted due to NVMe driver abnormality causing kernel crash, vmcore dump file generated.

Suggestion: Check NVMe driver version, upgrade driver or kernel if needed. Use crash tool for deeper vmcore analysis.
```

---

### Scenario 3: Kernel Panic + No vmcore

```
Diagnosis Result:
- No maintenance events found
- Cloud Assistant found: "Kernel panic" record in dmesg
- Kdump service status: inactive
- Found vmcore: No

Conclusion: Instance rebooted due to kernel crash, but kdump not configured or not working, unable to capture vmcore.

Suggestion: Configure kdump service to capture vmcore on next crash for root cause analysis.
```

---

### Scenario 4: OOM with panic_on_oom Enabled

```
Diagnosis Result:
- No maintenance events found
- Cloud Assistant found: "Out of memory: Kill process" record in /var/log/messages
- vm.panic_on_oom: 1 (OOM triggers kernel panic)
- Kdump service status: active
- Found vmcore: Yes

Conclusion: OOM event triggered kernel panic because vm.panic_on_oom=1, causing system reboot.

Suggestions:
1. Disable panic_on_oom: sysctl -w vm.panic_on_oom=0 (add to /etc/sysctl.conf for persistence)
2. Optimize application memory usage or upgrade instance type
3. Review OOM killed processes to identify memory-hungry applications
```

---

### Scenario 5: OOM Killer Only

```
Diagnosis Result:
- No maintenance events found
- Cloud Assistant found: "Out of memory: Kill process" record in /var/log/messages
- vm.panic_on_oom: 0 (OOM only kills process, no panic)
- No kernel panic records found

Conclusion: Instance triggered OOM Killer due to insufficient memory, some processes were terminated but system continued running.

Suggestion: Optimize application memory usage, or upgrade instance type.
```

---

## Windows Scenarios

### Scenario 6: BSOD with Memory Dump

```
Diagnosis Result:
- No maintenance events found
- Unexpected shutdown event: Event ID 41 (Kernel-Power)
- BSOD events found in WER logs
- Memory dump configuration: Automatic memory dump (CrashDumpEnabled=7)
- Memory dump file found: C:\Windows\MEMORY.DMP (4.2 GB)
- Dump time: 2025-03-20 10:30:00

Conclusion: Windows BSOD crash occurred, memory dump captured.

Suggestion: Download MEMORY.DMP and analyze with WinDbg:
1. Install Windows Debugging Tools
2. Open dump file in WinDbg
3. Run: !analyze -v
```

---

### Scenario 7: BSOD without Dump (Not Configured)

```
Diagnosis Result:
- No maintenance events found
- Unexpected shutdown event: Event ID 41 (Kernel-Power)
- BSOD events found in WER logs
- Memory dump configuration: None (CrashDumpEnabled=0)
- Memory dump file: Not found

Conclusion: Windows BSOD crash occurred but memory dump was not configured.

Suggestions:
1. Enable memory dump: System Properties > Advanced > Startup and Recovery > Settings
2. Select "Automatic memory dump" or "Kernel memory dump"
3. Ensure pagefile is configured and has sufficient space
```

---

### Scenario 8: BSOD without Dump (Pagefile Issue)

```
Diagnosis Result:
- No maintenance events found
- Unexpected shutdown event: Event ID 41 (Kernel-Power)
- Memory dump configuration: Automatic memory dump (CrashDumpEnabled=7)
- Pagefile: Not configured
- Memory dump file: Not found

Conclusion: Windows BSOD crash occurred but memory dump was not captured because pagefile is not configured.

Suggestions:
1. Configure pagefile: System Properties > Advanced > Performance > Settings > Advanced > Virtual memory
2. Set pagefile size to at least RAM size + 1MB
3. Reboot for pagefile changes to take effect
```

---

### Scenario 9: Minidump Available

```
Diagnosis Result:
- No maintenance events found
- Unexpected shutdown event: Event ID 41 (Kernel-Power)
- Memory dump configuration: Small memory dump (CrashDumpEnabled=3)
- Minidump files found in C:\Windows\Minidump:
  - 032025-12345-01.dmp (128 KB, 2025-03-20 10:30:00)

Conclusion: Windows crash occurred, minidump captured.

Suggestion: Analyze minidump with WinDbg Preview (Microsoft Store) or BlueScreenView tool.
```

---

### Scenario 10: Application Crash Causing Instability

```
Diagnosis Result:
- No maintenance events found
- No unexpected shutdown events
- Application crash events found: Multiple crashes of {application_name}.exe
- No system crash dump files

Conclusion: Application crashes detected but no system-level crash. System remained running.

Suggestions:
1. Check application logs for crash details
2. Verify application compatibility with Windows version
3. Check for application updates or known issues
```
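
The Windows crash scenarios above (6 to 9) differ in only a few findings, so the choice between them can be sketched as a short decision function. This is a rough illustration under stated assumptions: the function names `dump_type_name` and `classify_windows_crash`, the argument order, and the yes/no encoding are all hypothetical, and the sketch assumes an unexpected shutdown or BSOD has already been confirmed. The `CrashDumpEnabled` values are those listed in the memory dump types table of the diagnostic commands reference.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate the CrashDumpEnabled registry value
# into the dump type names used in the output template.
dump_type_name() {
  case "$1" in
    0) echo "None" ;;
    1) echo "Complete" ;;
    2) echo "Kernel" ;;
    3) echo "Small" ;;
    7) echo "Automatic" ;;
    *) echo "Unknown" ;;
  esac
}

# Hypothetical helper: pick the matching Windows scenario.
# Arguments: CrashDumpEnabled value, pagefile configured (yes/no),
#            MEMORY.DMP found (yes/no), minidump found (yes/no)
classify_windows_crash() {
  local enabled="$1" pagefile="$2" memory_dmp="$3" minidump="$4"
  if [ "$memory_dmp" = "yes" ]; then
    echo "Scenario 6: BSOD with memory dump"
  elif [ "$minidump" = "yes" ]; then
    echo "Scenario 9: minidump available"
  elif [ "$enabled" = "0" ]; then
    echo "Scenario 7: dump not configured"
  elif [ "$pagefile" = "no" ]; then
    echo "Scenario 8: pagefile issue"
  else
    echo "No matching scenario: dump configured but no dump file found"
  fi
}

echo "Dump type: $(dump_type_name 7)"
classify_windows_crash 7 no no no
```

Checking for existing dump files first means a captured dump always wins over configuration findings, which mirrors the order in which the scenarios become actionable.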
Quick BI-SmartQ skill with multiple data analysis capabilities: 1. **File Q&A**: Upload Excel/CSV files for intelligent analysis via Quick BI API 2. **Datase...
---
name: alibabacloud-quickbi-smartq
description: |
Quick BI-SmartQ skill with multiple data analysis capabilities:
1. **File Q&A**: Upload Excel/CSV files for intelligent analysis via Quick BI API
2. **Dataset Q&A**: Natural language queries on Quick BI platform datasets, with automatic intelligent table selection and matching
3. **Document Parsing**: Parse PDF/Word/Excel/CSV/images, extract text, and support extracting key fields to generate structured Excel
4. **Dashboard Skill Generation**: Auto-convert QuickBI dashboards into data query skills
5. **Data Insight**: Deep data insight analysis on Quick BI datasets
6. **Data Report**: Auto-generate professional data reports based on analysis results
Use when users mention data analysis, smart Q&A, querying data, file analysis,
document parsing, dashboard skills, data insight, or data reports.
compatibility: "tools: [python3, pip, browser], runtime: [requests, pyyaml, matplotlib, numpy]"
metadata:
label: Quick BI-SmartQ
version: "1.3.0"
---
# Quick BI-SmartQ — QuickBI Data Analysis Assistant
One entry point covering all Quick BI data analysis capabilities. It automatically routes each request to the corresponding module based on user intent; no manual selection is required.
## Scope
**Does:**
- Automatically identify user intent and route to the corresponding data analysis module
- Perform natural-language analysis on user-uploaded Excel/CSV files via the Quick BI API (File Q&A)
- Perform natural-language query analysis on Quick BI platform datasets, with automatic intelligent table selection and matching (Dataset Q&A)
- Parse PDF/Word/Excel/CSV/images, extract text, and support extracting key fields to generate structured Excel (Document Parsing)
- Auto-convert QuickBI dashboards into data query skills (Dashboard Skill Generation)
- Perform deep insight analysis on datasets (Data Insight)
- Auto-generate professional data reports based on analysis results (Data Report)
**Does NOT:**
- Use pandas/openpyxl/csv or similar libraries to read files locally for analysis in Q&A scenarios
- Require users to manually choose a module or provide internal parameters such as cubeId
- Perform tasks unrelated to QuickBI data analysis
## Task Routing
Automatically determine intent based on user input and route to the corresponding module for execution.
### Routing Decision Table
| User Intent | Routed Module | Reference Document |
|------------------------|--------------------------|------------------------------|
| Uploaded Excel/CSV file, wants to query specific metrics or answer specific data questions (e.g. TOP N, comparison, filtering) | File Q&A | [module-chat-file.md](references/chat/module-chat-file.md) |
| No file uploaded, wants to query/analyze specific metrics in platform datasets | Dataset Q&A | [module-chat-dataset.md](references/chat/module-chat-dataset.md) |
| Uploaded multiple files (PDF/Word/images etc.) or selected a folder, wants to query specific data questions (e.g. TOP N, comparison, filtering) | Document Parsing → File Q&A | [module-document-parser.md](references/document/module-document-parser.md) → [module-chat-file.md](references/chat/module-chat-file.md) |
| 上传了多个文件(PDF/word/图片等),或者选择文件夹,要查询具体数据问题(如 TOP N、对比、筛选) | Document Parsing → File Q&A | [module-document-parser.md](references/document/module-document-parser.md) → [module-chat-file.md](references/chat/module-chat-file.md) |
| Uploaded PDF/Word/images or other unstructured documents, or selected a folder, wants to parse all file contents or extract fields | Document Parsing | [module-document-parser.md](references/document/module-document-parser.md) |
| 上传了 PDF/Word/图片等非结构化文档,或者选择文件夹,要解析所有文件内容或提取字段 | Document Parsing | [module-document-parser.md](references/document/module-document-parser.md) |
| Provided a QuickBI dashboard URL, wants to generate a query skill | Dashboard Skill Generation | [module-dashboard.md](references/dashboard/module-dashboard.md) |
| 提供了 QuickBI 仪表板 URL,要生成查询技能 | Dashboard Skill Generation | [module-dashboard.md](references/dashboard/module-dashboard.md) |
| Uploaded Excel file, wants deep interpretation/insight/trend analysis of data (not generating a report document) | Data Insight | [module-data-insight.md](references/insight/module-data-insight.md) |
| 上传了 Excel 文件,要对数据进行深度解读/洞察/趋势分析(不生成报告文档) | Data Insight | [module-data-insight.md](references/insight/module-data-insight.md) |
| Uploaded multiple files (PDF/Word/images etc.) or selected a folder, wants deep interpretation/insight/trend analysis of data (not generating a report document) | Document Parsing → Data Insight | [module-document-parser.md](references/document/module-document-parser.md) → [module-data-insight.md](references/insight/module-data-insight.md) |
| 上传了多个文件(PDF/word/图片等),或者选择文件夹,要对数据进行深度解读/洞察/趋势分析(不生成报告文档) | Document Parsing → Data Insight | [module-document-parser.md](references/document/module-document-parser.md) → [module-data-insight.md](references/insight/module-data-insight.md) |
| Wants to generate a report/analysis report/review report, regardless of whether files are uploaded | Data Report | [module-data-report.md](references/report/module-data-report.md) |
| 要生成报告/分析报告/复盘报告,无论是否上传文件 | Data Report | [module-data-report.md](references/report/module-data-report.md) |
### Routing Priority Rules
When user intent may match multiple modules, determine by the following priority:
1. **"Report" keyword takes priority** (「报告」关键词优先): When user intent contains keywords like "report", "review", "summary report", "analysis report" (报告/复盘/总结报告/分析报告), **ALWAYS route to the Data Report module**, regardless of whether files are uploaded. Data Report module has higher priority than File Q&A and Data Insight.
2. **"Interpret", "insight", "trend" keywords** (「解读」「洞察」「趋势」关键词): When user wants to understand data meaning, discover trends, or gain insights (解读/洞察/趋势), route to the Data Insight module.
3. **Specific data query** (具体数据查询): When user wants to query specific metrics (TOP N, sum, comparison, etc.), route to the Q&A module (choose Dataset Q&A or File Q&A based on whether files are present).
4. **Dashboard URL** (仪表板 URL): When user provides a dashboard link, route to Dashboard Skill Generation.
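The priority order above can be sketched as a small keyword router. This is a minimal illustration, not the skill's actual matcher: the keyword tuples, the URL pattern, and the `route` function are assumptions made for the example. Because rule 3 acts as the catch-all here, the dashboard URL check runs before it.

```python
import re

# Keyword lists are illustrative assumptions drawn from the rules above.
REPORT_WORDS = ("report", "review", "报告", "复盘")
INSIGHT_WORDS = ("interpret", "insight", "trend", "解读", "洞察", "趋势")
URL_PATTERN = re.compile(r"https?://\S+")


def route(user_text: str, has_file: bool) -> str:
    """Return the module name chosen by the priority rules."""
    text = user_text.lower()
    if any(word in text for word in REPORT_WORDS):
        return "Data Report"                 # rule 1: report keywords always win
    if any(word in text for word in INSIGHT_WORDS):
        return "Data Insight"                # rule 2: interpretation / insight / trend
    if URL_PATTERN.search(text):
        return "Dashboard Skill Generation"  # rule 4: dashboard link
    # rule 3 (catch-all in this sketch): specific query, split by file presence
    return "File Q&A" if has_file else "Dataset Q&A"
```

A real router would also need the multi-file preprocessing step and would rely on the Agent's judgment of intent rather than substring matching.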
### Routing Examples
| User Input | Routing Result | Reasoning |
|-------------------------------------|--------------------------------------------------------------|-------------------|
| "Help me find the product with the highest sales in this data" + uploaded file | → File Q&A (module-chat-file) | Querying specific metric, has file |
| "帮我查一下这份数据中销售额最高的产品" + 上传文件 | → File Q&A (module-chat-file) | 查具体指标,有文件 |
| "Help me analyze this Excel data, TOP 10 headcount by department" + uploaded file | → File Q&A (module-chat-file) | Querying specific metric (TOP N), has file |
| "帮我分析这份Excel数据,各部门人数分布TOP10" + 上传文件 | → File Q&A (module-chat-file) | 查具体指标(TOP N),有文件 |
| "Top 3 regions with the highest sales" | → Dataset Q&A (module-chat-dataset) | Querying specific metric, no file |
| "销量最高的地区TOP3" | → Dataset Q&A (module-chat-dataset) | 查具体指标,无文件 |
| "Parse these contracts and summarize the information" + folder | → Document Parsing (module-document-parser) | |
| "解析这些合同并汇总信息" + 文件夹 | → Document Parsing (module-document-parser) | |
| "Convert this dashboard into a query skill" + URL | → Dashboard Skill Generation (module-dashboard) | Provided dashboard URL |
| "把这个仪表板转化为查询技能" + URL | → Dashboard Skill Generation (module-dashboard) | 提供了仪表板 URL |
| "Help me interpret the trend in sales data" + uploaded file | → Data Insight (module-data-insight) | Requests interpretation/insight, not a report |
| "帮我解读一下销售数据的趋势" + 上传文件 | → Data Insight (module-data-insight) | 要求解读/洞察,非报告 |
| "Any patterns and insights in this data" + uploaded file | → Data Insight (module-data-insight) | Requests insight analysis |
| "这份数据有什么规律和洞察" + 上传文件 | → Data Insight (module-data-insight) | 要求洞察分析 |
| "Generate a sales data report for this month" | → Data Report (module-data-report) | Contains "report" keyword |
| "生成一份本月销售数据报告" | → Data Report (module-data-report) | 含「报告」关键词 |
| "Help me generate an analysis report based on this Excel" + uploaded file | → Data Report (module-data-report) | Contains "report" keyword, file used as reference |
| "帮我基于这份Excel生成一份分析报告" + 上传文件 | → Data Report (module-data-report) | 含「报告」关键词,文件作为参考资料 |
| "Summarize these data, write a review report" + uploaded files | → Data Report (module-data-report) | Contains "review report" keyword |
| "汇总这几份数据,写一份复盘报告" + 上传文件 | → Data Report (module-data-report) | 含「复盘报告」关键词 |
| "Combine these files to generate a data analysis report" + uploaded files | → Data Report (module-data-report) | Contains "report" keyword |
| "结合这些文件生成数据分析报告" + 上传文件 | → Data Report (module-data-report) | 含「报告」关键词 |
| "Parse these 10 invoice PDFs, extract fields and generate Excel" + multiple files | → Document Parsing (module-document-parser) | Contains "extract fields" related keywords |
| "解析这10个发票PDF,提取字段生成Excel" + 多文件 | → Document Parsing (module-document-parser) | 含"提取字段"等相关关键字 |
| "Help me find the product with the highest sales in this data" + multiple files or folder | → Document Parsing → File Q&A (module-chat-file) | Querying specific metric, has multiple files |
| "帮我查一下这份数据中销售额最高的产品" + 多个文件或者文件夹 | → Document Parsing → File Q&A (module-chat-file) | 查具体指标,有多个文件 |
| "Any patterns and insights in these files" + multiple files or folder | → Document Parsing → Data Insight (module-data-insight) | Requests insight analysis |
| "这些文件中的数据有什么规律和洞察" + 多个文件或者文件夹 | → Document Parsing → Data Insight (module-data-insight) | 要求洞察分析 |
| "Summarize these data, write a review report" + ≤5 files | → Data Report (module-data-report) | Contains "review report" keyword |
| "汇总这几份数据,写一份复盘报告" + 5个以内文件 | → Data Report (module-data-report) | 含「复盘报告」关键词 |
| "Summarize these data, write a review report" + >5 files | → Document Parsing → Data Report (module-data-report) | Contains "review report" keyword |
| "汇总这几份数据,写一份复盘报告" + >5个文件 | → Document Parsing → Data Report (module-data-report) | 含「复盘报告」关键词 |
### Fallback Rules
- When intent is unclear, **default to Dataset Q&A** (module-chat-dataset) (当意图不明确时,默认路由到数据集问数)
- If the user involves multiple modules at the same time (e.g. "analyze data and generate a report" ("分析数据并生成报告")), execute them in sequence
- **Special scenario — multi-file preprocessing before Q&A** (特殊场景 — 多文件问数前置处理):
  - When user uploads ≥5 unstructured documents (PDF/Word/images, etc.) and requests analysis
  - **MUST execute Document Parsing first** (generate structured Excel)
  - **Then route to the corresponding functional module based on question intent** (perform intelligent analysis on the generated Excel)
  - Example: "Analyze these invoice data" + 10 PDFs → Document Parsing (generate Excel) → File Q&A (analyze Excel)
  - 示例:"分析这些发票数据" + 10个PDF → 文档解析(生成Excel) → 文件问数(分析Excel)
- If routing is incorrect, allow the user to manually specify the module (路由错误时允许用户手动指定模块)
## Configuration
This skill uses a **layered configuration** architecture, separating user configuration from the skill package. **Skill package updates will NOT overwrite user configuration**.
> **`<workspace-dir>` convention**: In this document, `<workspace-dir>` refers to the absolute path of the folder the user currently has open in the IDE / file manager. The Agent MUST confirm this path before the first operation by running a Python script that reads `os.getenv('CODE_AGENT_CURRENT_SESSION_WORK_DIR')`. If the script returns nothing or an empty value, use the absolute path of the folder selected by the user. The Agent MUST NOT infer the path from `$PWD`, `$CWD`, `Path.cwd()`, or similar runtime values.
>
> **`<skill-package-dir>` convention**: In this document, `<skill-package-dir>` refers to the root directory of this skill after installation (i.e. the directory containing this `SKILL.md` file). The Agent can infer it from the path of this file.
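A minimal sketch of the `<workspace-dir>` lookup described above. The helper name `resolve_workspace_dir` and its `user_selected` parameter are hypothetical; only the environment variable name comes from this document. The sketch rejects relative paths so that the process working directory never leaks into the result.

```python
import os
from typing import Optional


def resolve_workspace_dir(user_selected: Optional[str] = None) -> str:
    """Resolve <workspace-dir> without consulting the current working directory."""
    candidates = (os.getenv("CODE_AGENT_CURRENT_SESSION_WORK_DIR"), user_selected)
    for candidate in candidates:
        candidate = (candidate or "").strip()
        if not candidate:
            continue  # empty or missing: try the next source
        if not os.path.isabs(candidate):
            raise ValueError(f"workspace dir must be absolute: {candidate!r}")
        return os.path.normpath(candidate)
    raise RuntimeError("workspace dir unknown; ask the user to select a folder")
```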
### Configuration Loading Priority (higher overrides lower)
1. **Environment variable** `ACCESS_TOKEN` (highest priority, suitable for container deployment)
2. **Workspace-level configuration** `<workspace-dir>/.qbi/smartq-chat/config.yaml`
3. **QBI global configuration** `~/.qbi/config.yaml` (shared by all skills)
4. **Default configuration** `default_config.yaml` inside the skill package (package defaults, updated with the package)
`server_domain`, `api_key`, `api_secret`, and `user_token` can be placed in the workspace-level configuration or the global configuration. When both exist, the workspace-level configuration takes priority.
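The loading priority can be pictured as a merge in which later layers override earlier ones. The sketch below assumes each layer has already been loaded into a plain dictionary, and it assumes that empty values do not override lower layers; `merge_layers` is a hypothetical helper, not the skill's loader.

```python
from typing import Dict, Mapping, Optional


def merge_layers(default: Mapping[str, str],
                 global_cfg: Mapping[str, str],
                 workspace_cfg: Mapping[str, str],
                 env_cfg: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge layers from lowest to highest priority."""
    merged: Dict[str, str] = {}
    # Order matters: default < global < workspace < environment variable.
    for layer in (default, global_cfg, workspace_cfg, env_cfg or {}):
        for key, value in layer.items():
            if value not in (None, ""):  # assumption: blanks never override
                merged[key] = value
    return merged
```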
### Configuration Item Descriptions
- **`server_domain`**: Quick BI service domain
- **`api_key`** / **`api_secret`**: OpenAPI authentication key pair (if not configured, built-in defaults are used for trial mode)
- **`user_token`**: Quick BI platform user ID; the Q&A interface requires `userId` (if not configured, it is registered automatically and written back)
If `use_env_property: true` is enabled, the configuration can be overridden through the `qbi_api_key`, `qbi_api_secret`, `qbi_server_domain`, and `qbi_user_token` fields in the `ACCESS_TOKEN` environment variable JSON.
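As an illustration of the override just described, the sketch below reads the `ACCESS_TOKEN` JSON and maps the four `qbi_*` fields onto configuration keys. The function name `env_overrides` and the decision to ignore malformed JSON are assumptions for the example.

```python
import json
import os

# Field names come from the paragraph above; the mapping direction is the obvious one.
ENV_FIELD_MAP = {
    "qbi_api_key": "api_key",
    "qbi_api_secret": "api_secret",
    "qbi_server_domain": "server_domain",
    "qbi_user_token": "user_token",
}


def env_overrides(use_env_property: bool) -> dict:
    """Return configuration overrides taken from the ACCESS_TOKEN JSON, if enabled."""
    if not use_env_property:
        return {}
    raw = os.getenv("ACCESS_TOKEN", "")
    try:
        payload = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        return {}  # assumption: a malformed value is ignored rather than fatal
    if not isinstance(payload, dict):
        return {}
    return {target: payload[source]
            for source, target in ENV_FIELD_MAP.items()
            if payload.get(source)}
```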
### Automatic Trial Credential Registration
When neither `api_key` nor `api_secret` is configured (regardless of whether `user_token` exists), the script will:
1. If `user_token` is also not configured, print a friendly message informing the user that trial credentials will be registered automatically and trial mode will begin
2. Use built-in default credentials to populate `api_key` and `api_secret`
3. Automatically register a user based on the device's unique identifier and write the userId to the global configuration `~/.qbi/config.yaml` (not affected by skill package updates)
> Note: `user_token` existing alone in the global configuration (from automatic trial registration) will NOT prevent trial credential population. ONLY when `api_key` or `api_secret` exists in an external configuration will the trial flow be skipped.
Trial expiration is controlled by the server-side interface through error code `AE0579100004` — no local tracking is required.
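The trial conditions above reduce to two small predicates. This is a sketch under the assumption that `external_cfg` holds only values from user-supplied configuration (workspace, global, or environment), not package defaults; both function names are hypothetical.

```python
def should_use_trial(external_cfg: dict) -> bool:
    """Trial flow runs only when neither api_key nor api_secret is configured."""
    return not external_cfg.get("api_key") and not external_cfg.get("api_secret")


def should_print_trial_notice(external_cfg: dict) -> bool:
    """The friendly notice is printed only when user_token is missing as well."""
    return should_use_trial(external_cfg) and not external_cfg.get("user_token")
```

Note that a lone `user_token` (for example one written back by an earlier trial registration) leaves `should_use_trial` true, which matches the note above.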
### Custom Configuration Guidance
When users want to use their own Quick BI account credentials (rather than trial credentials), they should sign in to the Quick BI console, click their avatar, and choose **"Copy skill configuration with one click"**, as shown below:
> Show the configuration screenshot to the user based on current locale:
> - zh_CN: 
> - en_US: 
After copying, paste the configuration to the Agent. The Agent will automatically write `server_domain`, `api_key`, `api_secret`, and `user_token` into the workspace-level configuration `<workspace-dir>/.qbi/smartq-chat/config.yaml` (and decide whether to sync to the global configuration based on the `save_global_property` switch).
## Agent Configuration Update Rules (Required Reading)
**Zero-configuration initialization for new users**: If the user says "initialize configuration", "I am a new user", or similar, but **has NOT provided any specific configuration values**, there is no need to manually write anything to any configuration file. Tell the user to run Q&A directly — the system will automatically complete trial registration (see the **Automatic Trial Credential Registration** section above).
ONLY apply the following write rules when the user **explicitly provides** specific configuration values.
**Existing configuration protection rule**: Before writing, the Agent **MUST** first check whether the workspace-level configuration file `<workspace-dir>/.qbi/smartq-chat/config.yaml` already exists and contains valid configuration. If the file already exists and is non-empty, the Agent **MUST NOT** modify or overwrite any configuration items on its own, unless the user **explicitly expresses intent to update** (e.g. "update my configuration", "replace with this configuration", "change api_key to xxx", etc.). When existing configuration is found, inform the user that configuration already exists and ask whether to confirm overwriting.
When the user provides any one or more of `api_key`, `api_secret`, `user_token`, or `server_domain`, and the above protection rule is satisfied, the Agent **MUST** use a file editing tool to directly modify the corresponding user configuration file and write the provided values into the matching fields.
**Write location rules**:
- `server_domain`, `api_key`, `api_secret`, `user_token` → **ALWAYS** write to the **workspace-level configuration** `<workspace-dir>/.qbi/smartq-chat/config.yaml`
- Global configuration read/write is controlled by the `save_global_property` switch (default `true`):
  - If the switch is `false` → **MUST NOT read or write global configuration under any circumstance**, skip the global-configuration-related steps below
  - If the switch is `true` and the **global configuration** `~/.qbi/config.yaml` is empty or does not exist → also write to the global configuration
  - If the switch is `true` and the global configuration already contains content → write ONLY to the workspace-level configuration, then ask the user "Global configuration already exists. Do you want to sync the update?" and decide whether to write based on the user's reply
**Procedure**:
1. Extract configuration key-value pairs from the user message (support common formats such as `key: value`, `key:value`, and `key=value`)
2. Use a file editing tool (such as search_replace) to write the configuration into the workspace-level configuration file
3. Read the `save_global_property` value in the configuration; if it is `false`, skip to step 5
4. Check whether the global configuration `~/.qbi/config.yaml` exists and is non-empty:
   - If it is empty or does not exist → also write to the global configuration
   - If it already contains content → ask the user "Global configuration already exists. Do you want to sync the update?" and decide whether to write based on the user's reply
5. After the update, confirm to the user which configuration items were written and where they were written
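Steps 1, 3, and 4 of the procedure can be sketched as follows. The regular expression and both helper names are assumptions for illustration; the sketch handles the ASCII `:` and `=` separators only and does not write any file.

```python
import re

ALLOWED_KEYS = ("server_domain", "api_key", "api_secret", "user_token")
# Matches `key: value`, `key:value`, and `key=value`, with optional quotes.
PAIR_PATTERN = re.compile(
    r"\b(" + "|".join(ALLOWED_KEYS) + r")\s*[:=]\s*[\"']?([^\s\"',;]+)"
)


def extract_config_pairs(message: str) -> dict:
    """Step 1: pull recognised key-value pairs out of a user message."""
    return {key: value for key, value in PAIR_PATTERN.findall(message)}


def global_write_action(save_global_property: bool, global_has_content: bool) -> str:
    """Steps 3-4: decide what to do with ~/.qbi/config.yaml."""
    if not save_global_property:
        return "skip"  # never read or write the global configuration
    return "ask_user" if global_has_content else "write"
```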
**Prohibited actions**:
- ❌ MUST NOT refuse to modify configuration citing reasons such as "limited permissions" or "unable to modify files inside the skill package"
- ❌ MUST NOT suggest workarounds such as using environment variables or manually copying files
- ❌ MUST NOT only output the configuration content and ask the user to modify it themselves
## Prerequisites
- Python dependencies MUST be installed: `pip install requests pyyaml matplotlib numpy`
- Browser automation capability is required (Dashboard Skill Generation module ONLY)
- Dataset Q&A: user MUST have **Q&A permission** for the target dataset
- File Q&A: file formats limited to `xls`, `xlsx`, `csv`; single file size ≤ 10MB
- **Document Parsing**:
- System dependency: `brew install tesseract tesseract-lang` (required for local parsing ONLY)
- Supported formats: PDF, Word (.doc/.docx), Excel (.xls/.xlsx), CSV, images (.png/.jpg/.jpeg)
- Single file size ≤ 10MB (remote OCR limit)
- **Error handling**:
- Local parsing failure → automatically falls back to remote OCR
- Remote OCR still fails → classified as "parse failure", retaining original filename and error message
- Unknown document type → extract 5+ generic fields, MUST obtain user confirmation before generating Excel
- Detailed documentation: [module-document-parser.md](references/document/module-document-parser.md)
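The format and size limits above lend themselves to a quick pre-flight check before any upload. A minimal sketch, assuming 10MB means 10 × 1024 × 1024 bytes (the document does not say which convention the server uses); `check_file` is a hypothetical helper.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # assumption: binary megabytes
QA_EXTENSIONS = {".xls", ".xlsx", ".csv"}
PARSER_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx",
                     ".csv", ".png", ".jpg", ".jpeg"}


def check_file(name: str, size_bytes: int, allowed: set) -> list:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    ext = os.path.splitext(name)[1].lower()
    if ext not in allowed:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 10MB")
    return problems
```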
## Script Calling Convention (Required Reading)
When calling any Python script:
1. The script path MUST use the **absolute path of the installed skill package directory** (i.e. `<skill-package-dir>/scripts/...`); MUST NOT use relative paths
2. **MUST** pass the absolute path of `<workspace-dir>` via the `--workspace-dir` parameter (see the conventions in the **Configuration** section above for how to obtain it)
3. Wrap path parameter values in quotes (to prevent shell tokenization issues caused by Chinese characters, spaces, or other special characters)
4. `smartq_stream_query.py`, `file_stream_query.py`, `q_insights.py`, `create_chat.py`, `generate_report.py` MUST include the `--locale` parameter — see **User Locale Determination Rules** below
**Invocation examples**:
```bash
# File upload
python '<skill-package-dir>/scripts/chat/upload_file.py' '/path/to/data.xlsx' --workspace-dir '<workspace-dir>'
# File Q&A
python '<skill-package-dir>/scripts/chat/file_stream_query.py' <fileId> "各部门人数分布" --locale zh_CN --workspace-dir '<workspace-dir>'
# Dataset Q&A
python '<skill-package-dir>/scripts/chat/smartq_stream_query.py' "TOP 3 regions by sales" --locale en_US --workspace-dir '<workspace-dir>'
# Dataset Q&A (with dataset name hint — enables name lookup, exact match skips intelligent table selection)
python '<skill-package-dir>/scripts/chat/smartq_stream_query.py' "Based on 'Order Sales Details', what is the sales share by platform in Q1?" --cube-name 'Order Sales Details' --locale en_US --workspace-dir '<workspace-dir>'
# Document Parsing - local
python '<skill-package-dir>/scripts/document/document_local_parse.py' '/path/to/folder/' --json --workspace-dir '<workspace-dir>'
# Document Parsing - remote OCR
python '<skill-package-dir>/scripts/document/document_remote_ocr.py' '/path/to/folder/' --workspace-dir '<workspace-dir>'
# Excel generation
python '<skill-package-dir>/scripts/document/generate_excel.py' '<json-path>' --workspace-dir '<workspace-dir>'
# Data Insight
python '<skill-package-dir>/scripts/insight/q_insights.py' "这个报表有什么异常?" --excel-file '/path/to/data.xlsx' --locale zh_CN --workspace-dir '<workspace-dir>'
# Report generation
python '<skill-package-dir>/scripts/report/generate_report.py' "本月销售分析" --locale zh_CN --workspace-dir '<workspace-dir>'
```
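When the commands are launched from Python rather than typed into a shell, the calling convention can be enforced in code. The sketch below is one possible approach, not part of the skill: `build_command` is a hypothetical helper that validates the four rules and returns an argument list, which avoids shell quoting problems entirely when passed to `subprocess.run`.

```python
import os
import shlex

# Scripts that require --locale, as listed in rule 4.
LOCALE_SCRIPTS = {"smartq_stream_query.py", "file_stream_query.py",
                  "q_insights.py", "create_chat.py", "generate_report.py"}


def build_command(script_path, args, workspace_dir, locale=None):
    """Build an argv list that follows the script calling convention."""
    if not os.path.isabs(script_path):
        raise ValueError("script path must be absolute")
    if not os.path.isabs(workspace_dir):
        raise ValueError("--workspace-dir must be absolute")
    needs_locale = os.path.basename(script_path) in LOCALE_SCRIPTS
    if needs_locale and locale not in ("zh_CN", "en_US"):
        raise ValueError("--locale must be zh_CN or en_US for this script")
    argv = ["python", script_path, *args]
    if locale:
        argv += ["--locale", locale]
    argv += ["--workspace-dir", workspace_dir]
    return argv
```

If a single shell string is needed for logging, `shlex.join(argv)` quotes paths that contain spaces or non-ASCII characters.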
### User Locale Determination Rules
> **Core principle**: `--locale` MUST be determined **SOLELY based on the user's input text**, NOT influenced by any other source.
Valid values: `zh_CN` or `en_US` only.
**Determination method**:
- Examine the **user's original input message** (the question or instruction the user typed)
- Identify the **question/instruction language** — the language of the sentence structure, verbs, and functional words (not embedded proper nouns)
- Chinese question language → `zh_CN`; English or other question language → `en_US`
**Mixed-language handling** (critical):
- When user input contains both Chinese and English, determine locale by the **question framing language**, NOT by embedded entity names (dataset names, field names, table names, etc.)
- Entity names (dataset names, field names, etc.) embedded in the question are **proper nouns / references** — they do NOT indicate the user's language preference
- Rule of thumb: strip out quoted names or recognizable entity references, then judge the language of the remaining sentence structure
**What counts as "user input text"**:
- ✅ The text the user typed in the current conversation turn
- ✅ The user's original question when performing follow-up queries in the same session
**What MUST NOT influence locale determination**:
- ❌ The Agent's own reply language (Agent may reply in a language different from the user's input)
- ❌ API response content or error messages (these are always in a fixed language regardless of the user's language)
- ❌ Script console output text
- ❌ Dataset names, field names, or other metadata — whether returned by the platform OR embedded in the user's question as references
- ❌ System prompt language or Agent configuration language
**Example**:
- User input: "帮我分析销售数据" → `--locale zh_CN` (question language is Chinese)
- User input: "Analyze sales data" → `--locale en_US` (question language is English)
- User input: "Analyze the 销售数据集" → `--locale en_US` (question language is English; "销售数据集" is a dataset name reference, not the question language)
- User input: "Show me data from 2024年度报表" → `--locale en_US` (question language is English; "2024年度报表" is a dataset name)
- User input: "帮我查一下 Sales Dataset 的数据" → `--locale zh_CN` (question language is Chinese; "Sales Dataset" is a dataset name)
- User input: "帮我分析销售数据", but API returns English error message → `--locale zh_CN` (locale is determined by user input, NOT by API response)
- Previous Agent reply was in English, user then types "查询TOP3" → `--locale zh_CN` (locale is determined by user input, NOT by Agent's previous reply)
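For orientation only, here is a deliberately crude heuristic that reproduces the examples above: strip double-quoted or bracketed names, then look at the script of the first remaining letter. Both the heuristic and the name `guess_locale` are assumptions; it does not replace the Agent's judgment of the question framing language and will misclassify sentences that open with an unquoted entity name.

```python
import re

# Double-quoted or CJK-bracketed spans are treated as entity references.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”|「[^」]*」|『[^』]*』')


def is_cjk(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def guess_locale(user_text: str) -> str:
    """Guess zh_CN or en_US from the first letter outside quoted names."""
    stripped = QUOTED.sub(" ", user_text)
    for ch in stripped:
        if is_cjk(ch):
            return "zh_CN"
        if ch.isalpha():
            return "en_US"
    return "en_US"  # English or other question language falls back to en_US
```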
**Prohibited actions**:
- ❌ MUST NOT omit the `--workspace-dir` parameter when calling scripts
- ❌ MUST NOT use relative paths to call scripts (e.g. `python3 scripts/chat/...`)
- ❌ MUST NOT use hard-coded paths or guessed paths
- ❌ MUST NOT omit the `--locale` parameter when calling scripts that require it
- ❌ MUST NOT determine `--locale` based on Agent's own output language or API/script return content
FILE:references/chat/module-chat.md
# Q&A Module (Chat Module)
> For configuration details, see the **Configuration** section of the main file.
## Scope
**Does:**
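- Perform natural-language query analysis on authorized Quick BI platform datasets (Dataset Q&A)
- Perform natural-language analysis on user-uploaded Excel/CSV files via the Quick BI API (File Q&A)
- Automatically select the best-matching dataset through intelligent table selection, without requiring the user to provide a cubeId
- Render matplotlib charts and output the visualization together with the analysis conclusions
**Does NOT:**
- Use pandas/openpyxl/csv or similar libraries to read files directly for local analysis in Q&A scenarios
- Require users to manually provide a cubeId or other internal parameters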
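## Skill Triggers and Mode Selection
### Mode A: Dataset Q&A (no file uploaded)
- The user has not uploaded a file and wants to query platform datasets → **Dataset Q&A**
- Example trigger phrases: "问数" (Q&A), "小Q问数" (SmartQ Q&A), "查下xx数据集" (look up dataset xx), "数据集提问" (ask a dataset), "自然语言查询" (natural-language query)
### Mode B: File Q&A (file uploaded)
- The user has uploaded an Excel/CSV file and asks questions about the data → **File Q&A**
- Example trigger phrases: "帮我分析这份数据" (help me analyze this data), "查询xx最多的TOP10" (find the TOP 10 by xx), "各部门销售额对比" (compare sales by department), "分析下这个文件" (analyze this file), "文件问数" (File Q&A)
- **Execution**: Strictly follow the two-step script sequence (upload_file.py → file_stream_query.py); MUST NOT read or analyze the file in any other way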
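## Prerequisites
- Python dependencies MUST be installed: `pip install requests pyyaml matplotlib numpy`
- Dataset Q&A: the user MUST have **Q&A permission** for the target dataset
- File Q&A: file formats limited to `xls`, `xlsx`, `csv`; single file size ≤ 10MB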
---
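## Mode A — Dataset Q&A
Run natural-language queries against authorized datasets on the Quick BI platform.
### Workflow
Single-step execution: the script internally completes the full Q&A → data retrieval → rendering flow: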
```mermaid
flowchart LR
input["User question"] --> hasCubeId{"cubeId specified?"}
hasCubeId -- Yes --> streamQuery["SSE streaming Q&A"]
hasCubeId -- No --> queryCubes["Query datasets the user can access"]
queryCubes --> tableSearch["Intelligent table selection POST /tableSearch\n(with cubeIds parameter)"]
tableSearch -- Matched --> streamQuery
tableSearch -- No match --> relevance["Pick the most relevant dataset by text relevance"]
relevance --> streamQuery
streamQuery --> parseSSE["Parse SSE events in real time"]
parseSSE --> reasoning["Output reasoning process"]
parseSSE --> olapResult["olapResult event\n(query result inlined)"]
olapResult --> chart["matplotlib chart or Markdown table"]
parseSSE --> conclusion["Output conclusion"]
```
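> **Expected wait time**: Q&A analysis usually takes 15 to 60 seconds; complex queries may take longer. Tell the user that analysis is in progress before starting the query.
### Execution Commands
**Default usage (automatic intelligent table selection, no cubeId needed)**: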
```bash
python scripts/chat/smartq_stream_query.py "分析销售数据集中销量最高的地区TOP3"
```
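> **cubeId is optional**: the script automatically queries the datasets the user can access and picks the best match through intelligent table selection, so the user does not need to provide one.
Optional: when the target dataset ID is known, specify it directly (skips intelligent table selection):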
```bash
python scripts/chat/smartq_stream_query.py "总销售额是多少" --cube-id "dcbb0f94-4cee-4ba2-9950-927918bdd498"
```
Optional: provide a list of candidate datasets to assist intelligent table selection:
```bash
python scripts/chat/smartq_stream_query.py "总销售额是多少" --cube-ids "cubeId1,cubeId2,cubeId3"
```
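### Internal Processing Flow
1. **Intelligent table selection** (triggered automatically when `--cube-id` is not specified):
   - Call `GET /openapi/v2/smartq/query/llmCubeWithThemeList` to list the datasets the user has permission for
   - Pre-sort all permitted datasets by text relevance between the user question and the dataset name
   - Call `POST /openapi/v2/smartq/tableSearch` with an **adaptive fallback strategy** for intelligent table selection:
     - Try batch sizes `[30, 10]` in order (configurable), taking the top N most relevant datasets for the current batch size
     - If the interface returns the error `"cubeIds can not be empty or over limit"`, fall back automatically to the next batch size
     - Request parameters: `userQuestion`, `userId`, `llmNameForInference` (default `SYSTEM_deepseek-r1-0528`), `cubeIds`
     - As soon as any batch matches, return the first cubeId and stop trying
     - If no batch yields a match, pick the most relevant permitted dataset by text relevance
2. **Call the streaming Q&A interface**: `POST /openapi/v2/smartq/queryByQuestionStream`; the request body is JSON (`userQuestion`, `cubeId`, `userId`, etc.) and the response is an SSE event stream
3. **Parse SSE events in real time** (event format: `event:message\ndata:{"data":"xxx","type":"xxx","subType":"xxx"}`):
   - `relatedInfo` → output related knowledge (dataset name, business definitions, etc.)
   - `reasoning` → output the reasoning process (subType `MODEL_REASONING` indicates model reasoning)
   - `text` / `sql` → output text and SQL statements
   - `olapResult` → **core step**; the query result is inlined in the event stream
   - `summary` → output the data interpretation (subType `MODEL_REASONING` indicates model reasoning)
   - `conclusion` → output the analysis conclusion
   - `check` → validation error information
   - `error` → exception error information
   - `finish` → Q&A finished
4. **olapResult event handling**:
   - Parse the query result JSON from the event `data`; it contains `values` (row data), `chartType` (chart type enum), `metaType` (field metadata), and `logicSql` (query SQL)
   - In `metaType`, the `t` field marks a dimension or a measure, the `type` field marks row/column, and in multi-dimension scenarios `colorLegend` marks the color legend dimension
   - `chartType` enum: `NEW_TABLE` (cross table) / `BAR` (bar chart) / `LINE` (line chart) / `PIE` (pie chart) / `SCATTER_NEW` (scatter plot) / `INDICATOR_CARD` (indicator card) / `RANKING_LIST` (ranking list) / `DETAIL_TABLE` (detail table) / `MAP_COLOR_NEW` (colored map) / `PROGRESS_NEW` (progress bar) / `FUNNEL_NEW` (funnel chart)
   - Convert the data to the chart_renderer format and render the chart with matplotlib (output to the `$WORKSPACE_DIR/output/` directory)
   - Fall back to a Markdown table when matplotlib is unavailable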
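The adaptive fallback in step 1 can be sketched as a loop over batch sizes. Everything here is an illustrative assumption: `table_search` stands for a hypothetical callable that wraps `POST /openapi/v2/smartq/tableSearch`, returns the matched cubeIds, and raises `CubeLimitError` when the interface reports the limit error.

```python
from typing import Callable, List, Optional, Sequence

LIMIT_ERROR = "cubeIds can not be empty or over limit"


class CubeLimitError(Exception):
    """Raised by the hypothetical table_search wrapper on LIMIT_ERROR."""


def select_cube(ranked_cube_ids: Sequence[str],
                table_search: Callable[[List[str]], List[str]],
                batch_sizes: Sequence[int] = (30, 10)) -> Optional[str]:
    """Pick a cubeId; ranked_cube_ids must already be sorted by relevance."""
    if not ranked_cube_ids:
        return None
    for size in batch_sizes:
        batch = list(ranked_cube_ids[:size])  # top N most relevant datasets
        try:
            matched = table_search(batch)
        except CubeLimitError:
            continue  # fall back to the next, smaller batch
        if matched:
            return matched[0]  # first match wins, stop trying
    # No batch matched: use the most relevant dataset by text relevance.
    return ranked_cube_ids[0]
```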
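### Output Description
While running, the script prints the following in real time:
- `[关联知识]` (related knowledge): matched datasets and business definitions
- `[推理过程]` (reasoning): the AI's analytical reasoning
- `[SQL]`: the generated query SQL
- `[取数结果]` (query result): chart type and retrieval status
- **Chart image or Markdown table**: depends on chart type and rendering conditions (see **Display Rules** below)
- **`[图表数据]`** (chart data): structured data for all charts (field information, data rows, chart type, etc.) is saved to a JSON file, and the file path is printed to the console
- `[结论]` (conclusion): the final analysis conclusion
- `[数据解读]` (data interpretation): further interpretation of the data
- `[Trace]`: request trace ID (providing this ID when reporting an issue speeds up troubleshooting)
- `[完成]` (done): Q&A finished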
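### Display Rules
> Not every Q&A result produces a chart image. The script chooses between an **image** and a **Markdown table** based on chart type and rendering conditions; the Agent should reply according to the script's actual output format.
**When there is an image**: if the script output contains `![...](...)`, the chart has been rendered as a PNG.
**When there is no image**: in the following cases the script outputs only a Markdown table or a plain-text conclusion, with no `![...](...)` image:
- Chart type is cross table (`NEW_TABLE`) or detail table (`DETAIL_TABLE`) → a Markdown table is output directly
- matplotlib is not installed or rendering fails → falls back to a Markdown table
- Query result is empty (`values` has no data) → only `[结论]` and `[数据解读]` are output
- Query validation fails or an error occurs → only `[校验]` or `[错误]` information is output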
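The four cases above can be summarized as a decision function. This is only a reading aid under the assumption that the cases are checked in this order; the function name and its return labels are hypothetical and do not appear in the scripts.

```python
# Chart types that the rules above say are shown as Markdown tables.
TABLE_CHART_TYPES = {"NEW_TABLE", "DETAIL_TABLE"}


def expected_output(chart_type: str, row_count: int,
                    matplotlib_ok: bool, failed: bool = False) -> str:
    """Predict which kind of output the Agent should expect from the script."""
    if failed:
        return "message"      # validation failure or error text only
    if row_count == 0:
        return "conclusion"   # empty result: conclusion and interpretation only
    if chart_type in TABLE_CHART_TYPES or not matplotlib_ok:
        return "table"        # Markdown table
    return "image"            # rendered PNG referenced by Markdown image syntax
```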
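#### When There Is an Image (Mandatory)
> **MUST**: When the script output contains a `![...](...)` image reference, the Agent's reply **MUST include that Markdown image syntax verbatim**; otherwise the user cannot see the chart. This is a hard requirement and cannot be omitted.
1. **Copy** the `![...](...)` image reference **verbatim** into the body of the reply so the user sees the visualization directly
2. Immediately below the image, note the chart file path, e.g. `> Chart path: $WORKSPACE_DIR/output/chart_xxx.png`
3. Do **NOT** add mechanical lead-in text above the chart such as "饼图如下" ("the pie chart is as follows") or "脚本输出路径" ("script output path"); let the analysis conclusions flow naturally
4. If there are multiple charts, show them inline one by one in script output order
#### When There Is No Image
1. If the script output a Markdown table, **show the table directly** and summarize using `[结论]` and `[数据解读]`
2. If the script output neither an image nor a table (empty result, query failure, etc.), explain the result to the user based on the `[结论]` / `[错误]` / `[校验]` information
3. **MUST NOT** fabricate `![...](...)` image syntax or placeholder tables when no image was output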
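**Example A — script output contains an image**:
Suppose the script output contains:
```
![...](/path/output/chart_1744123456_1.png)
```
The Agent's reply should be:
```
根据分析结果,销量最高的三个地区如下:
![...](/path/output/chart_1744123456_1.png)
> 图表路径:/path/output/chart_1744123456_1.png
从图表可以看出,华东地区以 XX 万的销量位居第一……
```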
**Example B — script output is a Markdown table (no image)**:
Suppose the script output contains:
```
[取数结果] 图表类型: table (交叉表), 字段数: 3, 数据行数: 5
| 地区 | 销量 | 占比 |
|------|------|------|
| 华东 | 1200 | 35% |
| 华南 | 980 | 28% |
| 华北 | 750 | 22% |
| 西南 | 320 | 9% |
| 其他 | 210 | 6% |
[结论] 华东地区销量最高,占总销量的 35%
[数据解读] 华东和华南两个地区合计占比超过 60%,是主要销售区域……
```
The Agent's reply should be:
```
根据数据集的查询结果:
| 地区 | 销量 | 占比 |
|------|------|------|
| 华东 | 1200 | 35% |
| 华南 | 980 | 28% |
| 华北 | 750 | 22% |
| 西南 | 320 | 9% |
| 其他 | 210 | 6% |
华东地区销量最高,占总销量的 35%。华东和华南两个地区合计占比超过 60%,是主要销售区域……
```
**Example C — query result is empty or the query fails**:
Suppose the script outputs:
```
[取数结果] 查询结果(无数据)
[结论] 未查询到符合条件的数据,建议调整查询条件后重试
```
The Agent's reply should be:
```
本次查询未返回数据,可能是筛选条件过于严格或数据集中暂无匹配记录。建议您调整查询条件后重试。
```
---
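## Mode B — File Q&A
Perform intelligent analysis on user-uploaded structured Excel/CSV data files through the streaming Q&A interface.
### Workflow
Execute strictly in two steps; each step runs independently and outputs its complete result. The output of step 1 (`fileId`) is the input of step 2.
**Error handling principle**: Business-logic errors (insufficient permission, trial expired, unsupported format, etc.) **MUST terminate the whole flow immediately**; transient network errors (timeout, connection interrupted) may be retried 1 to 2 times (exponential backoff recommended), and the flow terminates if the retries still fail.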
```mermaid
flowchart LR
file["User data file\n(xls/xlsx/csv)"] --> autoReg{"user_token\nconfigured?"}
autoReg -- No --> register["Auto-register user\nwrite back to ~/.qbi/config.yaml"]
autoReg -- Yes --> upload
register -- Failure --> abort["Terminate flow\nexplain the reason"]
register -- Success --> upload["Step 1: upload_file.py\nPOST /copilot/parse"]
upload -- Failure --> abort
upload -- Success --> fileId["Output fileId\n+ file structure details"]
fileId --> stream["Step 2: file_stream_query.py\nPOST /smartq/queryByQuestionStreamByFile"]
stream -- Failure --> abort
stream -- Success --> parseSSE["Parse the SSE event stream in real time"]
parseSSE --> reasoning["Output thinking/reasoning process"]
parseSSE --> code["code event → assemble the complete Python code"]
parseSSE --> result["result event → parse structured data"]
result --> renderChart["Render chart PNG with matplotlib\nsave to $WORKSPACE_DIR/output/"]
parseSSE --> reporter["reporter event → analysis report"]
parseSSE --> conclusion["Output conclusion/data interpretation"]
parseSSE --> finish["finish event → end"]
```
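The error handling principle for this workflow (abort on business errors, retry transient ones with exponential backoff) can be sketched as a small wrapper. The two exception classes and `run_step` are hypothetical names for illustration; how the real scripts classify errors is not shown in this document.

```python
import time


class BusinessError(Exception):
    """Permission denied, trial expired, unsupported format, and similar."""


class TransientError(Exception):
    """Timeouts and interrupted connections."""


def run_step(step, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Run one workflow step, retrying only transient failures."""
    attempt = 0
    while True:
        try:
            return step()
        except BusinessError:
            raise  # terminate the whole flow immediately
        except TransientError:
            if attempt >= max_retries:
                raise  # retries exhausted: terminate
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, ...
            attempt += 1
```

Passing `sleep` as a parameter keeps the wrapper easy to exercise without real delays.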
### Step 1 — Upload the File to Obtain a fileId
```bash
python scripts/chat/upload_file.py /path/to/data.xlsx
```
| Item | Description |
|------|------|
| Interface | `POST /openapi/v2/copilot/parse` |
| Content-Type | `multipart/form-data` |
| Function | Upload the file and parse the structure details of each sheet |
#### Request Parameters
| Parameter | Type | Description |
|------|------|------|
| `file` | File | The uploaded data file (multipart file field) |
| `fileName` | String | File name (e.g. `sales_data.xlsx`) |
| `tableConfigs[0].tableName` | String | Table name (defaults to the file name without its extension) |
| `tableConfigs[0].tableType` | String | `excel` or `csv` |
| `isSave` | String | Always `false` |
| `fileId` | String | Leave empty (first upload) |
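To make the parameter table concrete, the sketch below builds the non-file form fields for a given path. `build_upload_fields` is a hypothetical helper; authentication, request signing, and the actual HTTP call made by `upload_file.py` are outside this sketch.

```python
import os


def build_upload_fields(file_path: str) -> dict:
    """Build the text fields that accompany the multipart `file` part."""
    file_name = os.path.basename(file_path)
    stem, ext = os.path.splitext(file_name)
    ext = ext.lower()
    if ext not in (".xls", ".xlsx", ".csv"):
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    return {
        "fileName": file_name,
        "tableConfigs[0].tableName": stem,  # file name without extension
        "tableConfigs[0].tableType": "csv" if ext == ".csv" else "excel",
        "isSave": "false",                  # always false
        "fileId": "",                       # empty on first upload
    }
```

With the `requests` library, such a dictionary would typically be passed as `data=` alongside `files={"file": open_file_handle}`.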
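#### Output
The script prints upload progress messages, the `fileId`, and the complete response JSON (including file structure details and the column names and types of each sheet).
> **Key output**: extract the `fileId` value from the output and pass it as the first argument of step 2.
#### Error Handling
If any of the following occurs in step 1, **terminate the whole flow immediately and do NOT proceed to step 2**:
- Automatic user registration fails
- File upload fails (unsupported format, size limit exceeded, server-side parsing error)
- The script exits with a non-zero exit code
> **Expected wait time**: File Q&A usually takes 15 to 60 seconds; complex analysis may take several minutes (the timeout is 10 minutes). Tell the user in advance to wait patiently.
### Step 2 — Start Streaming Q&A with the fileId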
```bash
python scripts/chat/file_stream_query.py <fileId> "<user question>"
```
Example:
```bash
python scripts/chat/file_stream_query.py "abc123-def456" "各部门的销售额对比"
```
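| Item | Description |
|------|------|
| Interface | `POST /openapi/v2/smartq/queryByQuestionStreamByFile` |
| Content-Type | `application/json` |
| Response format | SSE (Server-Sent Events) event stream |
| Timeout | 10 minutes (600 seconds) |
#### Core SSE Event Types
| Event type | Output tag | Handling |
|----------|----------|----------|
| `text` | (printed directly) | Append and print text content in real time |
| `reasoning` | (printed directly) | Print the AI's reasoning process in real time |
| `code` | (collected silently) | Append silently → save to `$WORKSPACE_DIR/.qbi/smartq-chat/output/` after the stream ends |
| `result` | `[取数结果]` | Parse structured data → render chart PNG with matplotlib |
| `reporter` | (printed directly) | Append analysis report text in real time |
| `html` | `[HTML 图表]` | Only save the raw HTML to `$WORKSPACE_DIR/.qbi/smartq-chat/output/` |
| `html_result` | `[图表数据]` | Parse structured data and render the chart |
| `sql` | `[SQL]` | Print the generated SQL statement |
| `conclusion` | `[结论]` | Print the final analysis conclusion |
| `summary` | `[数据解读]` | Print the data interpretation |
| `trace` | `[Trace]` | Request trace ID; providing it when reporting an issue speeds up troubleshooting |
| `finish` | `[完成]` | Marks the end of the event stream (terminal event) |
| `error` | `[错误]` | Print the error message (terminal event) |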
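A minimal sketch of consuming such a stream, assuming the events are framed as `data:{"data":...,"type":...}` lines (the framing documented for the dataset interface) and that `finish` and `error` end the stream. The function names are hypothetical, and the real script additionally renders charts and saves files.

```python
import json
from typing import Dict, Iterable, Iterator, List

TERMINAL_TYPES = {"finish", "error"}


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded event payloads until a terminal event is seen."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip `event:message` lines and blank separators
        try:
            event = json.loads(line[len("data:"):])
        except json.JSONDecodeError:
            continue  # ignore malformed payloads in this sketch
        if not isinstance(event, dict):
            continue
        yield event
        if event.get("type") in TERMINAL_TYPES:
            return


def collect_by_type(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group the `data` fragments of a stream by event type."""
    collected: Dict[str, List[str]] = {}
    for event in iter_events(lines):
        collected.setdefault(event.get("type", ""), []).append(event.get("data", ""))
    return collected
```

Fragments of the same type (for example `text` or `code`) can then be joined with `"".join(...)` once the stream has ended.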
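### Display Rules
> Not every File Q&A result produces a chart image. The script chooses between an **image** and a **Markdown table** based on rendering conditions; the Agent should reply according to the script's actual output format.
#### When There Is an Image (Mandatory)
> **MUST**: When the script output contains a `![...](...)` image reference, the Agent's reply **MUST include that Markdown image syntax verbatim**; otherwise the user cannot see the chart. This is a hard requirement and cannot be omitted.
1. **Copy** the `![...](...)` image reference **verbatim** into the body of the reply
2. Immediately below the image, note the chart file path
3. Do **NOT** add mechanical lead-in text above the chart
4. If there are multiple charts, show them inline one by one in order
#### When There Is No Image
1. If the script output a Markdown table, show the table directly
2. If matplotlib is unavailable, output a Markdown table based on the `result` event data
3. If there is neither an image nor a table, compose the reply from the `conclusion` / `summary` / `reporter` content
4. **MUST NOT** fabricate `![...](...)` image syntax or placeholder tables when no image was output
### Result Summary Requirements
When replying to the user, the Agent MUST satisfy both of the following:
1. **Inline chart (first)**: if the script output contains a `![...](...)` image reference, it **MUST be copied verbatim first** into the reply body (see **Display Rules › When There Is an Image** above) so the user can see the visualization
2. **Text summary**: based on the `conclusion` and `summary` (data interpretation) content in the script output, together with the `reporter` (analysis report) text, **reorganize and summarize** the analysis results
**MUST NOT** show analysis code or code file paths to the user (see item 10 under Important Notes).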
---
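## Exception Handling (Required Reading)
The script has built-in detection for the following three exceptions and prints the corresponding message to the console automatically. The Agent should convey the message to the user following the wording in `../common/error_messages.md`; the wording may be adapted to context, but the core information (links, suggested actions) MUST NOT be omitted. When any of these exceptions is detected, **terminate the flow immediately**.
### 1. No Dataset Permission
**Trigger**: in Dataset Q&A mode, the script output contains 「您当前没有可用的问数数据集」 ("you currently have no datasets available for Q&A")
**Detection location**: the permission query in `scripts/chat/cube_resolver.py`
**Handling**: tell the user that no dataset is available and suggest trying File Q&A or activating the service. See [error_messages.md](../common/error_messages.md) for the detailed wording
**Additional rule**: show this **only once** in the whole reply; do not repeat it. The Agent may naturally ask whether the user wants to switch to File Q&A.
### 2. Trial Expired
**Trigger**: error code `AE0579100004` appears in the script output or API response of any step
**Detection location**: `check_trial_expired()` in `scripts/common/utils.py`
**Handling**: tell the user that the trial has expired and guide them to activate the official service. See [error_messages.md](../common/error_messages.md) for the detailed wording
### 3. Data File Parsing Failed
**Trigger**: in File Q&A mode, the script output contains 「数据文件解析失败」 ("data file parsing failed")
**Detection location**: the `_on_error` method in `scripts/chat/file_stream_query.py`
**Handling**: ask the user to check the file format and content, then retry. See [error_messages.md](../common/error_messages.md) for the detailed wording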
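On the Agent side, the three conditions can be recognized by scanning the captured script output for the markers quoted in this section. The sketch below is an illustration only: `classify_output`, its `mode` argument, and the returned labels are assumptions, while the marker strings and the error code come from the text above.

```python
TRIAL_EXPIRED_CODE = "AE0579100004"
NO_DATASET_MARKER = "您当前没有可用的问数数据集"
PARSE_FAILED_MARKER = "数据文件解析失败"


def classify_output(output: str, mode: str):
    """Return a label for the detected exception, or None if nothing matched.

    `mode` is "dataset" or "file" (hypothetical labels for the two Q&A modes).
    """
    if TRIAL_EXPIRED_CODE in output:
        return "trial_expired"            # applies to any step
    if mode == "dataset" and NO_DATASET_MARKER in output:
        return "no_dataset_permission"
    if mode == "file" and PARSE_FAILED_MARKER in output:
        return "file_parse_failed"
    return None
```

Any non-`None` result means the flow should stop and the matching wording from `error_messages.md` should be relayed.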
---
## Key Interface Summary
| Interface | Method | Content-Type | Mode | Description |
|------|------|-------------|------|------|
| `/openapi/v2/smartq/tableSearch` | POST | application/json | A | Intelligent table selection; returns the list of matching cubeIds |
| `/openapi/v2/smartq/query/llmCubeWithThemeList` | GET | - | A | List the Q&A datasets the user has permission for |
| `/openapi/v2/smartq/queryByQuestionStream` | POST | application/json | A | Dataset Q&A streaming interface; returns SSE (the olapResult event carries the query result directly) |
| `/openapi/v2/copilot/parse` | POST | multipart/form-data | B | Upload the file and parse its structure; returns the fileId |
| `/openapi/v2/smartq/queryByQuestionStreamByFile` | POST | application/json | B | File Q&A streaming interface (SSE) |
| `/openapi/v2/organization/user/queryByAccount` | GET | - | Common | Check by accountName whether the user is in the organization |
| `/openapi/v2/organization/user/addSuer` | POST | application/json | Common | Add a user to the organization |
---
## 重要提示
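1. **File querying must go through the API**: see the core constraints at the top; analyzing user-uploaded files directly with libraries such as pandas/openpyxl is forbidden
2. **Mode selection**: choose dataset querying or file querying automatically, depending on whether the user uploaded a file
3. **Dataset querying needs no cubeId**: for dataset querying, **run the script directly** without `--cube-id`; the script selects the table automatically. **Never** ask the user for a cubeId or state that cubeId is a required parameter
4. **File querying must run in two steps**: first run step 1 to upload the file and obtain `fileId`, then run step 2 with that `fileId`; the steps cannot be skipped or merged
5. **Error handling**: business-logic errors (insufficient permission, trial expired, etc.) must terminate the whole flow immediately; transient network errors (timeout, dropped connection) may be retried 1 to 2 times before terminating. Explain the cause clearly to the user and add: 「如需进一步帮助,请联系 Quick BI 产品服务同学获取支持。」
6. **Streaming timeout**: the default timeout is 10 minutes (600 seconds); complex queries can take a long time
7. **File format limits**: only `xls`, `xlsx`, and `csv` are supported, and a single file must not exceed 10MB
8. **Automatic userId handling**: when `user_token` is not configured, the script generates an accountId from the device's unique identifier at startup, checks and registers the user through the organization user endpoints, and on success writes the userId back to the global config `~/.qbi/config.yaml`, so later calls do not register again
9. **Chart display (mandatory)**: PNG files are saved under `$WORKSPACE_DIR/output/`, and the script prints them in `![...](...)` format. The Agent **must** copy the `![...](...)` printed by the script verbatim into the reply; this is the only way the user can see the chart and must not be omitted
10. **No code display**: Python code from the `code` event in file querying is saved silently; **never show code content or code file paths to the user**
11. **No fabricated placeholder tables**: the Agent must **never** construct Markdown tables containing placeholders such as 「(数据见下方图表)」, or any other empty-shell tables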
---
## Examples
**Example 1: 数据集问数(自动智能选表)**
Input:
```
用户: "销量最高的地区TOP3是哪些"
```
Expected:
```bash
python scripts/chat/smartq_stream_query.py "销量最高的地区TOP3是哪些"
```
脚本自动智能选表匹配数据集,输出推理过程、图表(或 Markdown 表格)和分析结论。
Agent 回复示例(脚本输出含图片时,必须包含图片 Markdown):
```
根据销售数据集的分析,销量最高的三个地区为:
![chart](/path/output/chart_1744123456_1.png)
> 图表路径:/path/output/chart_1744123456_1.png
从图表可以看出:1. XX地区销量最高……
```
**Example 2: 文件问数(上传 Excel 分析)**
Input:
```
用户: 上传了 sales_data.xlsx,提问"各部门的销售额对比"
```
Expected:
```bash
# 步骤 1:上传文件获取 fileId
python scripts/chat/upload_file.py /path/to/sales_data.xlsx
# 输出 fileId=abc123-def456
# 步骤 2:基于 fileId 发起问数
python scripts/chat/file_stream_query.py "abc123-def456" "各部门的销售额对比"
```
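The flow runs in two steps, and the Agent summarizes the analysis from the conclusion and the data interpretation. If the script printed a `![...](...)` image, it **must be copied verbatim into the reply**; if it printed a Markdown table, show the table directly; if there is no chart data, reply based on the conclusion text. Do not show code.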
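Before step 1, an agent could pre-check the file against the stated limits (only `xls`, `xlsx`, `csv`; at most 10MB per file). The sketch below is one way to do that. The function name is hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption; the server may count differently.

```python
import os

ALLOWED_EXTENSIONS = {"xls", "xlsx", "csv"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 10MB read as 10 MiB


def validate_upload(path, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 10MB")
    return problems
```

In practice the size would come from `os.path.getsize(path)`; it is passed in here so the check stays a pure function.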
**Example 3: 数据集问数(结果为交叉表,无图片)**
Input:
```
用户: "各月份的销售明细数据"
```
Expected:
```bash
python scripts/chat/smartq_stream_query.py "各月份的销售明细数据"
```
脚本输出交叉表类型数据时,直接输出 Markdown 表格而非图片。
Agent 回复示例(无图片时基于表格和结论回复):
```
以下是各月份的销售明细:
| 月份 | 销售额 | 订单数 |
|------|--------|--------|
| 1月 | 150万 | 320 |
| 2月 | 128万 | 280 |
| 3月 | 175万 | 390 |
从数据来看,3月的销售额和订单数均为最高……
```
FILE:references/common/error_messages.md
# 异常提示文案
以下为各异常场景的用户提示文案。Agent 输出时**禁止**使用 `[text](url)` 链接语法,所有 URL 直接以纯文本形式内嵌在文案中。
## 1. 无数据集权限
> 您当前没有可用的问数数据集。
>
> 📂 **试试「文件问数」**
> 无需任何权限配置,上传 Excel/CSV 文件即可直接分析。
>
> 🚀 **0 元体验,限时加码**
> 现在上阿里云,将额外赠送 30 天全功能体验,解锁企业级安全管控与深度分析引擎,让 AI 洞察更准、更稳。点击下方链接,领取试用:
> https://www.aliyun.com/product/quickbi-smart?utm_content=g_1000411205
>
> 💬 点击下方链接,进入交流群获取最新资讯:
> https://at.umtrack.com/r4Tnme
## 2. 试用到期
> 小 Q 超级分析助理已陪伴您一周,我们看到您在通过 AI 寻找数据背后的真相,这很了不起。
>
> 🕙 **试用模式已结束**
> 授权到期后,动态分析将暂告一段落。
>
> 💡 **其实,您可以更轻松**
> 目前的"文件模式"仍需您手动搬运数据。让 AI 直连企业存量数据资产,实现分析结果自动更新?立即体验完整功能。
>
> 🚀 **0 元体验,限时加码**
> 现在上阿里云,将额外赠送 30 天全功能体验,解锁企业级安全管控与深度分析引擎,让 AI 洞察更准、更稳。点击下方链接,领取试用:
> https://www.aliyun.com/product/quickbi-smart?utm_content=g_1000411205
>
> 💬 点击下方链接,进入交流群获取最新资讯:
> https://at.umtrack.com/r4Tnme
## 3. 数据文件解析失败
> ⚠️ **数据文件解析失败**
> 当前问数的数据文件可能存在格式或内容问题,服务端多次重试执行均未成功。
>
> 💡 **建议排查**
> 请检查文件是否为标准的 Excel/CSV 格式,确认数据内容完整无损后重新上传。
>
> 💬 如仍无法解决,点击下方链接,进入交流群联系 Quick BI 产品服务同学获取支持:
> https://at.umtrack.com/r4Tnme
FILE:references/dashboard/module-dashboard-reference.md
# QuickBI 仪表板技能生成器 - 参考文档
> 本文档包含 SKILL.md 的详细参考内容,供深入了解使用。
## 分析框架匹配规则
### 框架匹配规则表
综合**指标语义 + 布局模式 + 联动关系**,匹配最适合的分析框架。
| 匹配规则(基于真实字段名称) | 分析框架 | 适用场景 | 核心公式/方法 |
|---------------------------|---------|---------|--------------|
| 包含"销售额/收入"+"成本"+"利润"+"毛利率/利润率" | **杜邦分析** | 财务指标分解,盈利能力分析 | ROE = 利润率 × 资产周转率 × 权益乘数 |
| 包含"获客/新增"+"激活"+"留存"+"转化"+"收入/付费" | **AARRR 海盗模型** | 互联网产品增长漏斗分析 | 各环节转化率优化 |
| 包含"最近购买时间"+"购买频次"+"消费金额" | **RFM 客户分析** | 客户价值分群,精准营销 | R×F×M 评分矩阵 |
| 包含"产品/商品"维度 + "市场份额/增长率" | **波士顿矩阵** | 产品组合策略分析 | 明星/现金牛/问题/瘦狗分类 |
| 包含"步骤/阶段/环节"维度 + "转化率/流失率" | **漏斗分析** | 流程优化,定位流失环节 | 各环节转化率 = 下一步/上一步 |
| 包含"目标值/计划值" + "实际值/完成值" | **目标达成分析** | KPI 完成度监控 | 达成率 = 实际值/目标值 × 100% |
| 包含"同期/去年同期" + "当期/本期" | **同环比分析** | 时间对比趋势分析 | 同比 = (本期-同期)/同期 × 100% |
| 包含"预算" + "实际/执行" | **预实对比分析** | 预算执行监控 | 预算执行率 = 实际/预算 × 100% |
| 包含"库存/存货" + "周转/动销" | **库存分析** | 库存健康度监控 | 周转率 = 销售成本/平均库存 |
| 包含"客单价" + "客户数/用户数" + "销售额" | **客户价值分析** | 客户贡献度分析 | 销售额 = 客户数 × 客单价 |
| 包含"曝光/展示" + "点击" + "转化/成交" | **营销漏斗分析** | 广告投放效果分析 | CTR/CVR 等转化指标 |
| 包含"人力/人数" + "产出/效率" | **人效分析** | 人力资源效能分析 | 人均产出 = 总产出/人数 |
| 以上都不匹配 | **L1-L4 金字塔** | 通用层级分析框架 | 概览→趋势→分解→明细 |
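The field-name part of this table can be read as "every keyword group needs at least one hit". The sketch below encodes only four rows and uses plain substring matching, which is an assumption; the real matching also weighs layout mode and linkage, which this sketch ignores. `match_framework` is a hypothetical name.

```python
# Each rule: (framework, keyword groups). A rule matches when every group
# has at least one keyword contained in some field name.
RULES = [
    ("杜邦分析", [["销售额", "收入"], ["成本"], ["利润"], ["毛利率", "利润率"]]),
    ("RFM 客户分析", [["最近购买时间"], ["购买频次"], ["消费金额"]]),
    ("目标达成分析", [["目标值", "计划值"], ["实际值", "完成值"]]),
    ("预实对比分析", [["预算"], ["实际", "执行"]]),
]
FALLBACK = "L1-L4 金字塔"


def match_framework(field_names):
    """Return the first framework whose keyword groups are all satisfied."""
    for framework, groups in RULES:
        if all(
            any(keyword in name for keyword in group for name in field_names)
            for group in groups
        ):
            return framework
    return FALLBACK
```

Rules are tried in table order, so an earlier row wins when several match.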
---
## 布局模式分析规则
### 布局模式识别
基于 `tileLayout` 位置信息推断仪表板的整体分析模式。
| 布局特征 | 布局模式 | 典型特点 | 推断的仪表板类型 |
|---------|---------|---------|----------------|
| 第一行有多个 indicator-card 类型组件 | **指标矩阵型** | 顶部密集指标卡阵列 | 监控型仪表板(强调 L1 概览) |
| 存在 line/bar 等趋势图表 | **核心图表型** | 有主次之分的焦点布局 | 分析型仪表板(强调 L2/L3) |
| 底部存在 common-table | **明细导向型** | 底部有明细表 | 运营型仪表板(强调 L4 追溯) |
| 同一行有多个相同类型组件 | **对比分析型** | 并列布局便于对比 | 多维对比分析 |
| 组件数量少(≤4) | **聚焦分析型** | 少而精的核心图表 | 专题分析仪表板 |
### 布局模式与分析框架的关联
| 布局模式 | 倾向的分析框架 | 置信度提升依据 |
|---------|--------------|---------------|
| 指标矩阵型 | 目标达成分析、同环比分析 | 多指标并列 → 关注指标对比 |
| 核心图表型 | 趋势分析、漏斗分析 | 大图表为主 → 关注过程变化 |
| 明细导向型 | L1-L4 金字塔 | 有明细表 → 需要追溯能力 |
| 对比分析型 | 杜邦分析、客户价值分析 | 并列布局 → 关注维度对比 |
---
## 层级归类规则
### L1-L4 层级判断
| 层级 | 类型特征 | 位置特征 |
|-----|---------|----------|
| **L1** | indicator-card/kpi/gauge | y ≤ 20(顶部)|
| **L2** | line/area/indicator-trend | 20 < y < 50,含 datetime 维度 |
| **L3** | bar/pie/ranking-list | 30 < y < 70,含分类维度 |
| **L4** | common-table | y > 50(底部)|
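Because the `y` ranges in the table overlap (for example, `y = 40` fits both L2 and L3), one workable reading is to let the component type decide first and fall back to position only for unknown types. The sketch below follows that reading. The fallback thresholds are an assumption chosen to resolve the overlaps, and `classify_level` is a hypothetical name.

```python
LEVEL_BY_TYPE = {
    "indicator-card": "L1", "kpi": "L1", "gauge": "L1",
    "line": "L2", "area": "L2", "indicator-trend": "L2",
    "bar": "L3", "pie": "L3", "ranking-list": "L3",
    "common-table": "L4",
}


def classify_level(chart_type, y):
    """Classify by component type; use the y position only for unknown types."""
    if chart_type in LEVEL_BY_TYPE:
        return LEVEL_BY_TYPE[chart_type]
    # Position fallback (assumed cut-offs, not taken from the table verbatim)
    if y <= 20:
        return "L1"
    if y < 50:
        return "L2"
    if y < 70:
        return "L3"
    return "L4"
```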
### 分析主题推断(基于图表类型)
| 图表类型 | 分析主题模式 |
|---------|-------------|
| indicator-card/kpi | "{度量}指标展示" |
| line/area | "{度量}时序趋势" |
| pie | "{维度}分布/占比" |
| bar | "{维度}对比分析" |
| ranking-list | "{维度}排行榜" |
| common-table | "{主题}明细查询" |
---
## 意图路由规则
### 用户问法模式匹配
| 用户问法模式 | 提取的意图 | 匹配目标 |
|-------------|-----------|----------|
| "XX是多少/有多少" | 查询单一指标 | L1 指标卡 |
| "XX趋势/走势/变化" | 趋势分析 | L2 折线图/趋势图 |
| "XX排行/TOP/最高/最低" | 排序分析 | L3 排行榜 |
| "XX分布/占比/构成" | 结构分析 | L3 饼图/柱图 |
| "各XX的YY" | 维度分解 | L3 分组图表 |
| "XX明细/详情/列表" | 明细查询 | L4 明细表 |
| "为什么XX下降/上升" | 归因分析 | L1→L2→L3 联合 |
---
## 业务逻辑推断规则
### 指标组合推断公式
| 指标组合 | 推断公式 |
|---------|----------|
| 销售额 + 成本 + 利润 | 利润 = 销售额 - 成本 |
| 销售额 + 销量 | 客单价 = 销售额 / 销量 |
| 目标值 + 实际值 | 达成率 = 实际 / 目标 × 100% |
| 本期 + 同期 | 同比增长率 = (本期-同期)/同期 × 100% |
| 本期 + 上期 | 环比增长率 = (本期-上期)/上期 × 100% |
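As a worked instance of the formulas in the table, the following sketch computes a growth rate (the same expression serves 同比 and 环比, with a different base period) and an achievement rate. The function names are illustrative, and raising on a zero base is a design choice of this sketch.

```python
def growth_rate(current, base):
    """(current - base) / base × 100%, for year-over-year or period-over-period."""
    if base == 0:
        raise ValueError("base period value must be non-zero")
    return (current - base) * 100 / base


def achievement_rate(actual, target):
    """actual / target × 100%."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return actual * 100 / target
```

For example, a current value of 120 against a base of 100 gives a growth rate of 20 percent.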
---
## 工具函数说明
### quickbi_openapi.py 函数清单
| 函数名 | 用途 | 使用阶段 |
|--------|------|----------|
| `load_config(config_path=None)` | 加载配置(优先级:环境变量 > 工作目录级 > 全局 > 包内默认) | Step 1.0, Step 2.1 |
| `is_dataportal_url(url)` | 判断是否为数据门户 URL | Step 1.0 |
| `extract_dataportal_ids(url)` | 从数据门户 URL 提取 productId 和 menuId | Step 1.0 |
| `get_dataportal_page_id(...)` | 通过 OpenAPI 获取数据门户关联的仪表板 pageId | Step 1.0 |
| `extract_page_id(url)` | 从仪表板 URL 提取 pageId | Step 1.0 |
| `validate_and_prepare_dashboard(...)` | 仪表板预校验及预处理 | Step 1.0 |
| `get_dashboard_json(...)` | 获取仪表板完整 JSON 数据 | Step 2.1 |
| `query_openapi(...)` | 调用 SmartQ 查询接口 | 生成的 skill 查询阶段 |
| `get_dashboard_update_time(...)` | 查询仪表板更新时间 | 生成的 skill 启动校验 |
### get_dashboard_json.js 函数清单
| 函数名 | 用途 |
|--------|------|
| `parseDashboardJson(json)` | 解析仪表板原始 JSON,提取组件结构 |
| `analyzeLayout(charts)` | 基于 tileLayout 分析图表布局 |
### config_loader.py 函数清单
| 函数名 | 用途 |
|--------|------|
| `load_config()` | 四层配置加载(优先级:环境变量 > 工作目录级 > 全局 > 包内默认) |
| `persist_to_global_config(key, value)` | 写入全局配置 `~/.qbi/config.yaml` |
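The four-layer precedence (environment variable > workspace level > global > packaged default) can be pictured as a dictionary merge applied from lowest to highest priority. This is a sketch of the idea only, not the code in `config_loader.py`. `merge_config_layers` is a hypothetical name, and skipping empty values so that blank template fields do not mask lower layers is an assumption.

```python
def merge_config_layers(env, workspace, global_cfg, packaged):
    """Merge four config dicts; later (higher-priority) layers override earlier ones."""
    merged = {}
    for layer in (packaged, global_cfg, workspace, env):  # lowest priority first
        for key, value in layer.items():
            if value not in (None, ""):  # assumption: blanks do not override
                merged[key] = value
    return merged
```

With this ordering, a key set in the workspace config wins over the same key in `~/.qbi/config.yaml`, and an environment value wins over both.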
---
## 错误码参考
### 预校验接口错误码
| 错误码 | 含义 | 处理建议 |
|--------|------|----------|
| `AE0510000005` | 用户不在组织中 | 检查 user_token 是否正确 |
| `AE0510150002` | 没有仪表板访问权限 | 检查用户是否有该仪表板的访问权限 |
| `AE0510200000` | 没有数据集管理或者授权的权限 | 检查是否有数据集管理和问数配置权限 |
| `AE0581030022` | 未购买问数功能 | 确认已购买 SmartQ 问数功能 |
| `OE10010106` | API 未授权 | 检查 api_key/api_secret 配置 |
| `CONNECTION_ERROR` | 网络连接失败 | 检查网络和 server_domain 配置 |
### 数据门户接口错误码
| 错误码 | 含义 | 处理建议 |
|--------|------|----------|
| `NO_PAGE_ID` | 数据门户菜单未关联仪表板 | 检查门户菜单是否正确配置了仪表板页面 |
| `CONNECTION_ERROR` | 网络连接失败 | 检查网络和 server_domain 配置 |
---
## dashboardData 数据结构
The complete parsed data structure, returned by Step 2.1 in `result["dashboardData"]`:
```typescript
{
success: boolean;
basicInfo: {
name: string; // 仪表板名称
pageId: string; // 页面ID
workspaceId: string; // 工作空间ID
gmtModified: number; // 最后修改时间(毫秒级时间戳),用于 skill_generated_at
};
queryControls: Array<{ // 查询控件列表
componentId: string;
internalId: string;
needManualQuery: boolean;
fields: Array<{
labelName: string;
componentType: string; // datetime / enumSelect
relatedGraphIds: string[];
}>;
}>;
chartComponents: Array<{ // 图表组件列表
componentId: string;
componentName: string;
sourceId: string; // 数据集ID - 问数调用的关键
dimensions: Array<{caption: string; pathId: string}>;
measures: Array<{caption: string; aggregateType: string}>;
drillFields: Array<{caption: string}>;
tabInfo: object | null; // Tab 从属关系
}>;
tabComponents: Array<{ // Tab 组件列表
componentId: string;
tabs: Array<{id: string; title: string}>;
}>;
richTextComponents: Array<{textContent: string}>;
layoutAnalysis: {rows: Array}; // 布局分析
}
```
FILE:references/dashboard/module-dashboard.md
---
name: quickbi-smartq-dashboard
description: >
根据 QuickBI 仪表板生成专用查询技能。
当用户提供仪表板 URL 并希望创建查询技能时使用。
触发关键词:生成技能、仪表板转 Skill。
---
# QuickBI 仪表板技能生成器
通过 OpenAPI 获取 QuickBI 仪表板数据,发现其图表组件、字段配置、查询控件、布局关系,提炼分析思路,生成一份可用于数据查询的 SKILL.md 文件。
## Scope
**Does:**
- 接收 QuickBI 仪表板 URL 或数据门户 URL,解析出 pageId
- 调用 OpenAPI 获取仪表板完整 JSON 结构
- 解析图表组件、查询控件、数据集、字段配置
- 分析布局模式,匹配适用的分析框架(L1-L4 金字塔或专业框架)
- 生成完整的查询技能 SKILL.md 文件并安装到技能中心
**Does NOT:**
- 不执行实际的数据查询(查询由生成的子 skill 负责)
- 不支持非 QuickBI 平台的仪表板
- 不处理需要特殊权限的仪表板(会在预校验阶段提示错误)
- **不在 `fetch_dashboard_data` 失败时尝试任何替代方案**(必须终止流程,禁止绕行)
## 触发场景
当用户提出以下类型的请求时使用此 Skill:
- "帮我把这个 QuickBI 仪表板转化为一个查询 Skill"
- "把这个看板变成一个可以查询数据的 Skill"
- "生成这个仪表板的查询技能"
- "提取这个仪表板的分析思路,生成 Skill"
- "为这个仪表板生成技能:{URL}"
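- The user provides a dashboard URL of the form `https://bi.aliyun.com/dashboard/view/pc.htm?pageId=XXXXXXX` and wants to create a query capability
- The user provides a data portal page URL (of the form `https://bi.aliyun.com/product/view.htm?module=dashboard&productId=xxx&menuId=yyy`) and wants to create a query capability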
### 支持的 URL 格式
| URL 类型 | 路径特征 | 关键参数 | 处理方式 |
|----------|---------|---------|----------|
| **仪表板页面** | `/dashboard/view/pc.htm` | `pageId` | 直接提取 pageId |
| **数据门户页面** | `/product/view.htm` | `productId`, `menuId` | 通过 OpenAPI 获取关联的 pageId |
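The two URL shapes in the table can be told apart with the standard library alone. The sketch below is illustrative and is not the implementation of `extract_page_id` or `is_dataportal_url`; `classify_quickbi_url` and the returned dictionary layout are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def classify_quickbi_url(url):
    """Classify a URL as a dashboard page, a data portal page, or unknown."""
    parsed = urlparse(url)
    # parse_qs returns lists; keep the first value of each parameter
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    if parsed.path.endswith("/dashboard/view/pc.htm") and "pageId" in params:
        return {"type": "dashboard", "pageId": params["pageId"]}
    if parsed.path.endswith("/product/view.htm") and all(
        key in params for key in ("productId", "menuId")
    ):
        return {
            "type": "dataportal",
            "productId": params["productId"],
            "menuId": params["menuId"],
        }
    return {"type": "unknown"}
```

For a data portal URL the result carries no `pageId`; that value still has to be resolved through OpenAPI, and `productId` must not be used in its place.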
## 前置条件
- 需要有效的 API 凭证(用于调用 OpenAPI 获取仪表板数据)
- 配置说明请参见主文件的「配置」章节
---
## Phase 1: 输入收集与验证
### Step 1.0: 获取用户输入
从用户消息中提取:
1. **页面地址**(必需):QuickBI 仪表板链接或数据门户链接
2. **技能名称**(可选):生成的 skill 目录名(kebab-case 格式)
- 如果用户指定了技能名称,直接使用(会覆盖同名技能)
- 如果未指定,将在 Phase 2 发现仪表板标题后自动推导
---
## Phase 2: 仪表板数据获取与解析
### Step 2.1: 一站式获取仪表板数据
> **⚠️ 强制约束**:必须使用封装脚本,禁止自行拆分执行。
使用 `scripts/fetch_dashboard_data.py` 一站式完成:配置加载 → URL 解析 → 预校验 → 获取 JSON → 解析结构 → 获取数据集名称。
**[强制规则] 失败立即终止,禁止任何绕行**:
> ⛔ **绝对禁止**:当 `fetch_dashboard_data` 返回失败时,**禁止尝试任何替代方案**,包括但不限于:
> - ❌ 直接调用底层 API(如 `get_dashboard_json`)
> - ❌ 跳过预校验步骤
> - ❌ 尝试"其他方法"获取数据
> - ❌ 继续执行后续任何步骤
>
> **唯一正确的行为**:输出错误信息 → 终止流程 → 等待用户修正后重新触发
- 如果获取失败(`result["success"] == False`),**必须立即终止整个流程**
- **失败原因已在 `result["error"]` 中说明**,直接展示给用户即可
- **不要尝试"智能"地绕过错误**——预校验失败说明前置条件不满足,绕行只会导致后续步骤全部失败
```python
from dashboard.fetch_dashboard_data import fetch_dashboard_data
# 一站式获取(自动处理:配置加载、URL解析、预校验、获取JSON、解析、数据集名称)
result = fetch_dashboard_data(user_input_url)
if not result["success"]:
print(f"获取失败: {result['error']}")
# ⛔ 必须立即终止!禁止尝试其他方法,禁止继续执行任何后续步骤
return # 流程到此结束,等待用户修正后重新触发
# 提取结果
dashboardData = result["dashboardData"] # 解析后的仪表板结构
datasetNameMap = result["datasetNameMap"] # cubeId -> cubeName 映射
page_id = result["pageId"] # 仪表板 pageId
dashboard_url = result["dashboardUrl"] # 标准仪表板预览页地址(用于生成 skill)
print(f"获取成功: {dashboardData['basicInfo']['name']}")
```
**脚本位置**:[scripts/fetch_dashboard_data.py](scripts/fetch_dashboard_data.py)
**执行后必须输出**(确认数据已获取):
```markdown
---
## Step 2.1 执行结果
**执行状态**:{成功/失败}
**仪表板名称**:{dashboardData.basicInfo.name}
**仪表板URL**:{dashboard_url}(用于仪表板知识库的 URL 字段)
**pageId**:{page_id}
**gmtModified**:{dashboardData.basicInfo.gmtModified}(用于 SKILL_METADATA.skill_generated_at)
**图表组件数**:{dashboardData.chartComponents.length} 个
**查询控件数**:{dashboardData.queryControls.length} 个
**Tab组件数**:{dashboardData.tabComponents.length} 个
> ⚠️ **pageId 校验**:上方 pageId 值来自脚本返回的 `result["pageId"]`。
> 如果用户传入的是数据门户 URL(含 productId),pageId 与 productId **一定不同**。
> 后续 Phase 3 生成技能文件时,所有需要 pageId 的地方**必须使用此值**,禁止使用 URL 中的 productId。
**数据集清单**(去重):
| 数据集名称 | 数据集ID |
|-----------|----------|
| {datasetNameMap[cubeId]} | {cubeId} |
> 数据已存储到 `dashboardData`、`datasetNameMap` 和 `dashboard_url`,继续执行 Step 2.2
---
```
**失败处理**(⛔ 禁止绕行):
- 如果 `result["success"] == False`,**立即终止整个流程**
- 输出错误信息 `result["error"]`,告知用户失败原因
- **禁止**尝试直接调用 `get_dashboard_json` 或任何其他方法
- 提示用户检查配置或仪表板权限后重新触发
**返回数据结构**:`dashboardData` 包含 `basicInfo`、`queryControls`、`chartComponents`、`tabComponents`、`richTextComponents`、`layoutAnalysis` 等字段,完整定义见 [reference.md - dashboardData 数据结构](./reference.md#dashboarddata-数据结构)。
### Step 2.2: 数据验证与补充
**目的**:验证解析结果的完整性,必要时补充信息。
#### 2.2.1 Tab 结构验证
如果 `dashboardData.tabComponents.length > 0`:
1. 列出所有 Tab 及其标题
2. 确认每个 Tab 下包含的图表组件
3. 记录 Tab 与图表的从属关系
#### 2.2.2 图表标题验证
1. 检查 `dashboardData.chartComponents[].componentName` 是否有意义
2. 如果为空或无意义,根据度量/维度字段推断主题
3. 记录调整后的图表标题
#### 2.2.3 富文本内容提取
1. 从 `dashboardData.richTextComponents[].textContent` 提取纯文本
2. 用于理解仪表板的业务背景和使用说明
### Step 2.3: 提炼分析思路与匹配分析框架
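> **[Mandatory step]** Complete all five sub-steps (2.3.1 to 2.3.5) plus the mandatory output (2.3.2-OUTPUT). This is the core step that turns dashboard data into a usable analysis framework, so none of it may be skipped, even under time pressure.
> **Core principle**: everything must be inferred from `dashboardData` (the data obtained in Step 2.1); nothing may be made up.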
#### 2.3.1 数据提取
从 `dashboardData.chartComponents` 中提取:
**数据集清单**:收集所有图表的 `sourceId`,去重后建立清单:
| 数据集ID | 关联图表 | 可用维度 | 可用度量 |
|----------|---------|---------|----------|
| {sourceId} | {图表名列表} | {维度字段} | {度量字段(聚合方式)} |
**指标体系**:遍历 `chartComponents[].measures`,按 `caption` 去重。
**维度体系**:遍历 `chartComponents[].dimensions`,按 `itemType` 分类:
- `datetime` → 时间维度 | `geographic` → 地理维度 | `dimension` → 分类维度
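The dataset inventory described in 2.3.1 is a group-by on `sourceId` with de-duplicated captions. The sketch below assumes each chart component is a dict with the `componentName`, `sourceId`, `dimensions`, and `measures` fields from the `dashboardData` structure. `build_dataset_inventory` is a hypothetical helper, not a function in the skill's scripts.

```python
def build_dataset_inventory(chart_components):
    """Group charts by sourceId and collect unique dimension and measure labels."""
    inventory = {}
    for chart in chart_components:
        entry = inventory.setdefault(
            chart["sourceId"], {"charts": [], "dimensions": [], "measures": []}
        )
        entry["charts"].append(chart["componentName"])
        for dim in chart.get("dimensions", []):
            if dim["caption"] not in entry["dimensions"]:
                entry["dimensions"].append(dim["caption"])
        for measure in chart.get("measures", []):
            label = f'{measure["caption"]}({measure["aggregateType"]})'
            if label not in entry["measures"]:
                entry["measures"].append(label)
    return inventory
```

Each key of the result corresponds to one row of the inventory table, with insertion order preserved.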
#### 2.3.2 分析框架匹配
综合**指标语义 + 布局模式 + 联动关系**,匹配最适合的分析框架。
**框架匹配规则**:详见 [reference.md - 分析框架匹配规则](reference.md#分析框架匹配规则)
常用框架:杜邦分析、AARRR 海盗模型、RFM 客户分析、漏斗分析、目标达成分析、同环比分析等。无法匹配时使用 **L1-L4 金字塔**(概览→趋势→分解→明细)。
#### 2.3.2-OUTPUT: 【强制】输出分析框架匹配结果
> **不可跳过**:完成 2.3.1-2.3.2 后,**必须输出**以下格式:
```markdown
---
## 【分析框架匹配结果】
### 提取到的真实字段
**度量字段**:{measures 列表}
**维度字段**:时间({datetime}) | 地理({geographic}) | 分类({dimension})
### 布局模式
**布局特征**:第一行{N}个{类型}组件,总{M}行,底部{有/无}明细表
**布局模式**:{指标矩阵型/核心图表型/明细导向型/对比分析型/聚焦分析型}
### 框架匹配
**匹配框架**:{框架名称}
**置信度**:{high/medium/default}
**匹配依据**:{指标特征} + {布局特征} + {联动特征}
### 层级预览(仅 L1-L4 框架)
| 层级 | 图表数 | 典型图表 | 归类依据 |
|-----|-------|---------|----------|
| L1 | {N} | {示例} | {类型+位置} |
---
```
#### 2.3.3 布局模式分析与仪表板类型推断
基于 `tileLayout` 位置信息推断仪表板的整体分析模式。详见 [reference.md - 布局模式分析规则](reference.md#布局模式分析规则)
**布局模式类型**:指标矩阵型、核心图表型、明细导向型、对比分析型、聚焦分析型。
#### 2.3.4 自动匹配分析框架(多维度综合推断)
> **综合指标语义 + 布局模式 + 联动关系,匹配最适合的分析框架**
>
> **重要**:这是一个**思考推断步骤**,综合多个维度的信息来推断分析框架。
**层级归类规则**:详见 [reference.md - 层级归类规则](reference.md#层级归类规则)
- **L1**(整体监控):indicator-card/kpi/gauge,顶部位置
- **L2**(趋势分析):line/area/indicator-trend,含 datetime 维度
- **L3**(维度分解):bar/pie/ranking-list,含分类维度
- **L4**(明细追踪):common-table,底部位置
**匹配思考流程**:
1. **列出所有真实指标**:从 `uniqueMeasures` 中列出所有指标名称
2. **列出所有真实维度**:从 `uniqueDimensions` 中列出所有维度名称
3. **识别布局模式**:根据布局特征判断仪表板类型
4. **分析联动关系**:哪些图表共享筛选器?暗示它们在同一分析路径上
5. **分析下钻方向**:`drillFields` 的维度类型暗示分析深入的方向
6. **综合语义分析**:结合以上信息,理解仪表板的整体分析意图
7. **框架匹配**:根据综合特征判断最匹配的分析框架
8. **记录匹配依据**:说明是基于哪些特征(指标+布局+联动)匹配到该框架
#### 2.3.5 业务逻辑推断
**业务逻辑推断(基于指标组合)**:
| 指标组合 | 推断公式 |
|---------|----------|
| 销售额 + 成本 + 利润 | 利润 = 销售额 - 成本 |
| 销售额 + 销量 | 客单价 = 销售额 / 销量 |
| 目标值 + 实际值 | 达成率 = 实际 / 目标 × 100% |
更多推断规则详见 [reference.md - 业务逻辑推断规则](reference.md#业务逻辑推断规则)
### Step 2.4: 推断意图路由矩阵
> **核心目标**:建立用户问题 → 目标图表 → 数据集ID 的精准映射
**意图关键词提取规则**:
| 用户问法模式 | 提取的意图 | 匹配目标 |
|-------------|-----------|----------|
| "XX是多少/有多少" | 查询单一指标 | L1 指标卡 |
| "XX趋势/走势/变化" | 趋势分析 | L2 折线图/趋势图 |
| "XX排行/TOP/最高/最低" | 排序分析 | L3 排行榜 |
| "XX分布/占比/构成" | 结构分析 | L3 饼图/柱图 |
| "各XX的YY" | 维度分解 | L3 分组图表 |
| "XX明细/详情/列表" | 明细查询 | L4 明细表 |
| "为什么XX下降/上升" | 归因分析 | L1→L2→L3 联合 |
更多规则详见 [reference.md - 意图路由规则](reference.md#意图路由规则)
### Step 2.5: 汇总探索结果
> **【前置检查】确认 Step 2.1 和 Step 2.3.2-OUTPUT 已输出分析框架匹配结果**
整理所有探查结果,按以下结构输出(详细格式见 Phase 3.2 模板),必须确保:
- 所有图表组件都被列出(包含数据集ID、字段列表、分析主题、层级归属)
- 分析框架匹配结果基于真实提取的指标和维度
- 业务背景综合仪表板标题、图表标题、字段名称、富文本内容
```
## 探索结果汇总
├── 基本信息(名称/pageId/URL)
├── 业务背景与统计口径
├── 分析框架匹配结果
├── 核心指标体系
├── 维度体系
├── 业务逻辑推断
├── 层级结构(A/B/C 形式)
├── 数据集清单
├── 查询控件
├── 图表组件完整列表(12列)
├── 意图路由矩阵
├── 联动与下钻路径
├── 下钻字段配置
└── 适用场景与查询路径
```
**关键要求**:
- 所有图表组件必须完整列出,每个都包含数据集ID、字段列表、分析主题
- 分析框架匹配结果基于真实提取的指标和维度
- 层级归属基于图表类型和布局位置
---
## Phase 3: 技能文件生成
根据探索结果,组装并写入 SKILL.md 和 config.yaml 文件。
> ⚠️ **关键变量确认**:生成技能文件前,确认以下值的来源:
> - **pageId** = `result["pageId"]`(Step 2.1 脚本返回值)—— **不是**用户输入 URL 中的 `productId`
> - **dashboard_url** = `result["dashboardUrl"]`(Step 2.1 脚本返回值,已包含正确 pageId)
> - **skill_generated_at** = `dashboardData.basicInfo.gmtModified`
>
> 数据门户 URL 中的 `productId` 是门户 ID,与仪表板 `pageId` 是完全不同的值,禁止混用。
### Step 3.1: 确定技能名称
如果用户提供了技能名称,直接使用(会覆盖同名技能)。否则:
1. 取仪表板标题
2. 转换为 kebab-case 格式(中文用拼音或有意义的英文缩写)
3. 添加 `qbi-` 前缀
4. **追加 pageId 前8位确保唯一性**
5. 例如:"好美家零售数据" (pageId: `ab12cd34-xxxx`) → `qbi-retail-query-ab12cd34`
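The naming steps can be sketched as a small helper. It assumes the dashboard title has already been turned into an English or pinyin phrase, since that transliteration is a judgment call the code does not attempt. `build_skill_name` is a hypothetical name.

```python
import re


def build_skill_name(title_slug, page_id):
    """Return qbi-<kebab-case title>-<first 8 characters of pageId>."""
    kebab = re.sub(r"[^a-z0-9]+", "-", title_slug.lower()).strip("-")
    return f"qbi-{kebab}-{page_id[:8]}"
```

With the title rendered as "Retail Query" and the pageId `ab12cd34-xxxx`, this yields `qbi-retail-query-ab12cd34`, matching the example in step 5.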
### Step 3.2: 组装 SKILL.md 内容
按以下模板生成技能文件。**{占位符}** 表示用探索结果填充的内容。
#### 3.2.1-3.2.7 内容模板
> **【必须执行】读取模板文件**
>
> 使用 `read_file` 工具读取 `./templates/output_skill_template.md` 文件,获取以下内容的完整模板:
> - **YAML Frontmatter + 技能元数据**(3.2.1):name/description 格式、SKILL_METADATA 注释块
> - **标题和触发场景**(3.2.2-3.2.3):5-8 个自然语言查询示例
> - **前置条件**(3.2.4):配置引导流程(与本 skill 的前置条件逻辑一致)
> - **仪表板知识库**(3.2.5):基本信息、业务背景、数据集清单、查询控件、图表组件完整列表、意图路由矩阵
> - **仪表板分析思路**(3.2.6):分析框架匹配、层级结构、核心指标、维度体系、业务逻辑、联动路径
> - **工作流程**(3.2.7):问题理解→拆解→构建查询→调用 SmartQ→汇总结果→错误处理
>
> 按模板格式生成对应内容,填充 `dashboardData` 中提取的真实数据。
**关键要求**:
- description 必须保留 `INSTEAD OF generic quickbi-smartq-chat` 优先级声明
- 所有图表组件必须完整列出,不遗漏
- 指标和维度必须来自 `dashboardData`,不可臆造
### Step 3.3: 生成 config.yaml
复制配置模板,敏感字段置空:
```yaml
# Quick BI 文件问数配置文件
# QBI 域名
server_domain: https://bi.aliyun.com
# OpenAPI 认证配置
api_key:
api_secret:
# 用户令牌
user_token:
# 是否从环境变量读取认证信息
use_env_property: false
```
### Step 3.4: 写入文件
1. **确定输出目录**:与当前技能所在目录保持一致
- 获取当前技能的目录路径(即本 SKILL.md 文件所在的父目录的父目录)
- 在该目录下创建 `{skill-name}/` 子目录
- 例如:如果当前技能在 `.qoderwork/skills/quickbi-smartq-chat/`,则新技能应在 `.qoderwork/skills/{skill-name}/`
2. 创建目录(如不存在)
3. 写入 SKILL.md 文件
4. 写入 config.yaml 模板
5. **复制脚本文件到生成的 skill 的 scripts 目录**(需调整 import):
- 复制 `scripts/dashboard/quickbi_openapi.py` → `{skill-name}/scripts/quickbi_openapi.py`
- **必须调整 import**:移除 `sys.path.insert(0, ...)` 行,将 `from common.config_loader import load_config as _load_config_from_loader` 改为 `from config_loader import load_config as _load_config_from_loader`(注意:不要加点号前缀,因为 scripts 目录不是 Python 包,脚本通过 `sys.path.insert` 方式加载,只能使用绝对导入)
- 复制 `scripts/common/config_loader.py` → `{skill-name}/scripts/config_loader.py`
- **必须调整**:将 `DEFAULT_CONFIG_PATH = BASE_DIR.parent.parent / "default_config.yaml"` 改为 `DEFAULT_CONFIG_PATH = BASE_DIR.parent / "config.yaml"`(扁平结构下 `BASE_DIR` 指向 `scripts/`,`BASE_DIR.parent` 即 skill 根目录)
6. **复制 `references/common/copy_skill_config.png`** 到生成的 skill 的 `example/` 目录,用于首次配置引导
7. 告知用户:
- 技能文件已生成
- 首次使用时会引导配置 API 凭证(config.yaml)
- 生成的技能如何使用
8. **将生成的技能安装到技能中心**(必须执行):
执行以下命令将技能注册到技能中心:
```bash
skills install local --json '{"sourcePath": "<生成的 skill 目录绝对路径>"}'
```
**生成的 Skill 目录结构**:
```
./skills/{skill-name}/
├── SKILL.md # 技能文件
├── config.yaml # API 配置(首次使用时会引导用户配置)
├── example/
│ └── copy_skill_config.png # 首次配置引导图片(来源:common/copy_skill_config.png)
└── scripts/
├── quickbi_openapi.py # OpenAPI 调用工具函数
└── config_loader.py # 配置加载器(全局配置存在即用)
```
---
## Examples
### Example 1: 从仪表板 URL 生成查询技能
**Input:**
```
用户:帮我把这个仪表板转成查询技能
https://bi.aliyun.com/dashboard/view/pc.htm?pageId=ab12cd34-5678-90ef-ghij-klmnopqrstuv
```
**Expected Output:**
1. 执行预校验,确认用户有访问权限
2. 获取仪表板 JSON 并解析组件结构
3. 输出分析框架匹配结果(如 L1-L4 金字塔)
4. 生成 `skills/qbi-xxx-ab12cd34/SKILL.md`
5. 自动安装到技能中心
### Example 2: 从数据门户 URL 生成查询技能
**Input:**
```
用户:这是我们的数据门户,生成一个可以查数据的 skill
https://bi.aliyun.com/product/view.htm?module=dashboard&productId=abc123&menuId=menu456
```
**Expected Output:**
1. 识别为数据门户 URL,调用 `get_dataportal_page_id` 获取关联的仪表板 pageId
2. 执行后续标准流程(同 Example 1)
---
## 重要注意事项
1. **API 凭证安全**:config.yaml 中的 AccessKey 是敏感信息,提醒用户妥善保管
2. **数据集ID是关键**:问数查询依赖正确的 `sourceId`(数据集ID),必须从 JSON 中准确提取
3. **意图路由准确性**:意图路由矩阵决定了用户问题能否正确匹配到数据集,需要仔细推断
4. **分析框架基于真实数据**:所有分析框架的指标和维度必须来自 `dashboardData`(Step 2.1 获取的数据),不可臆造
5. **分析框架必须输出**:Step 2.3.2-OUTPUT 是强制步骤,必须在汇总探索结果前输出分析框架匹配结果。如果跳过此步骤,生成的 Skill 将缺少核心分析能力
---
## 附录 C: 工具函数
### 目录结构
```
quickbi-smartq-chat/
├── SKILL.md # 统一入口技能
├── default_config.yaml # 默认配置
├── references/
│ ├── dashboard/
│ │ ├── module-dashboard.md # 本文档
│ │ ├── module-dashboard-reference.md # 详细参考文档
│ │ └── templates/
│ │ └── output_skill_template.md # 生成模板
│ └── common/
│ └── copy_skill_config.png # 配置引导图片
└── scripts/
├── common/
│ └── config_loader.py # 配置加载器(四层配置优先级)
└── dashboard/
├── fetch_dashboard_data.py # 一站式仪表板数据获取
├── get_dashboard_json.js # JSON 解析脚本
└── quickbi_openapi.py # OpenAPI 工具函数
```
### 核心函数
#### fetch_dashboard_data.py
| 函数名 | 用途 | 使用阶段 |
|--------|------|----------|
| `fetch_dashboard_data(url, config=None)` | 一站式获取仪表板数据(配置加载+URL解析+预校验+获取JSON+解析+数据集名称) | Step 2.1 |
#### quickbi_openapi.py
| 函数名 | 用途 | 使用阶段 |
|--------|------|----------|
| `load_config(config_path=None)` | 加载配置(优先级:环境变量 > 工作目录级 > 全局 > 包内默认) | Step 1.0 |
| `is_dataportal_url(url)` | 判断是否为数据门户 URL | Step 1.0 |
| `extract_dataportal_ids(url)` | 从数据门户 URL 提取 productId 和 menuId | Step 1.0 |
| `get_dataportal_page_id(...)` | 通过 OpenAPI 获取数据门户关联的仪表板 pageId | Step 1.0 |
| `extract_page_id(url)` | 从仪表板 URL 提取 pageId | Step 1.0 |
| `validate_and_prepare_dashboard(...)` | 仪表板预校验及预处理 | Step 1.0 |
| `get_dashboard_json(...)` | 获取仪表板完整 JSON 数据 | 内部调用 |
| `batch_get_dataset_schema(...)` | 批量获取数据集详情(名称等) | 内部调用 |
| `query_openapi(...)` | 调用 SmartQ 查询接口 | 生成的 skill 查询阶段 |
| `get_dashboard_update_time(...)` | 查询仪表板更新时间 | 生成的 skill 启动校验 |
#### get_dashboard_json.js
| 函数名 | 用途 |
|--------|------|
| `parseDashboardJson(json)` | 解析仪表板原始 JSON,提取组件结构 |
| `analyzeLayout(charts)` | 基于 tileLayout 分析图表布局 |
#### config_loader.py(scripts/common/)
| 函数名 | 用途 |
|--------|------|
| `load_config()` | 四层配置加载(优先级:环境变量 > 工作目录级 > 全局 > 包内默认) |
| `check_trial_expired(result)` | 检查 API 返回结果是否为试用过期错误 |
| `get_server_domain(config=None)` | 获取 server_domain(可选传入已加载的 config) |
| `persist_to_global_config(key, value)` | 写入全局配置 `~/.qbi/config.yaml` |
| `persist_to_skill_config(key, value)` | 写入工作目录级配置 `$WORKSPACE_DIR/.qbi/smartq-chat/config.yaml` |
### 预校验接口错误码
| 错误码 | 含义 | 处理建议 |
|--------|------|----------|
| `AE0510000005` | 用户不在组织中 | 检查 user_token 是否正确 |
| `AE0510150002` | 没有仪表板访问权限 | 检查用户是否有该仪表板的访问权限 |
| `AE0510200000` | 没有数据集管理或者授权的权限 | 检查是否有数据集管理和问数配置权限 |
| `AE0581030022` | 未购买问数功能 | 确认已购买 SmartQ 问数功能 |
| `OE10010106` | API 未授权 | 检查 api_key/api_secret 配置 |
| `CONNECTION_ERROR` | 网络连接失败 | 检查网络和 server_domain 配置 |
### 数据门户接口错误码
| 错误码 | 含义 | 处理建议 |
|--------|------|----------|
| `NO_PAGE_ID` | 数据门户菜单未关联仪表板 | 检查门户菜单是否正确配置了仪表板页面 |
| `CONNECTION_ERROR` | 网络连接失败 | 检查网络和 server_domain 配置 |
FILE:references/dashboard/templates/output_skill_template.md
# 输出 SKILL.md 模板
> 本文件包含生成 SKILL.md 的完整模板,供 Step 3.2 引用。
## YAML Frontmatter
```yaml
---
name: {skill-name}
description: >
**专用查询技能** for "{仪表板标题}" QuickBI dashboard.
Use this skill INSTEAD OF generic quickbi-smartq-chat when user asks about:
{5-8个核心度量字段,逗号分隔}.
Trigger keywords: {仪表板标题/简称}, {3-5个业务主题词}.
---
```
> **生成要点**:
> - 只保留**高区分度**的关键词,不要列出所有字段
> - 核心度量:选 5-8 个用户最常问的指标(如销售额、订单数、库存数量)
> - 业务主题:选 3-5 个该仪表板独有的主题词(如销售分析、供应链、客户服务)
> - `INSTEAD OF generic quickbi-smartq-chat` 是优先级声明,必须保留
## 技能元数据
在 YAML frontmatter 后面添加元数据块,用于支持自动更新校验:
```markdown
<!-- SKILL_METADATA
dashboard_page_id: {pageId}
skill_generated_at: {生成时的 gmtModified 值}
dashboard_name: {仪表板标题}
generator_skill: quickbi-smartq-chat
-->
```
> **字段说明**:
> - `dashboard_page_id`:仪表板的 pageId,用于查询更新时间
> - `skill_generated_at`:**直接取 `dashboardData.basicInfo.gmtModified`(Step 2.1 已返回)**
> - `dashboard_name`:仪表板名称,用于用户提示
> - `generator_skill`:生成此技能的元技能名称,用于触发重新生成
## 标题和描述
```markdown
# {仪表板标题} - 数据查询
针对「{仪表板标题}」仪表板的专用查询技能,支持自然语言查询仪表板相关的数据。
```
## 入口检查
> **【关键】此部分位于 SKILL.md 开头,确保 LLM 在处理任何用户问题前先执行更新检测**
```markdown
## ⚠️ 入口检查(每次查询必须执行)
> **强制约束**:当用户提出任何问题时,**必须首先执行此检查**,然后才能执行后续步骤。
**执行步骤**:
1. 运行以下代码检查仪表板是否有更新:
\`\`\`python
import sys, os
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath('__file__')), 'scripts'))
from config_loader import load_config
from quickbi_openapi import get_dashboard_update_time
config = load_config()
result = get_dashboard_update_time(host=config["server_domain"], access_id=config["api_key"], access_key=config["api_secret"], page_id="{pageId}", user_id=config["user_token"])
has_update = result["success"] and result["data"]["last_modified"] > {skill_generated_at}
print(f"has_update: {has_update}")
\`\`\`
> **注意**:`'__file__'` 占位符在生成时必须替换为生成的 SKILL.md 所在目录的绝对路径(即 `<技能包目录>`)
2. **根据结果处理**:
- `has_update == False` → 静默继续,执行后续查询
- `has_update == True` → **立即终止当前 skill**,**原样输出**以下提示(URL 用反引号包裹避免渲染为链接):
检测到「{仪表板名称}」仪表板已更新,当前技能的数据结构可能已过时。
请按以下步骤重新生成技能:(复制下面内容到对话框执行)
第 1 步:删除当前技能
删除 {skill-name} 技能
第 2 步:重新生成技能
为这个仪表板生成技能:`{仪表板URL}`
> **LLM 执行约束**:输出上述提示后立即终止,**禁止**自动删除技能或自动重新生成,必须等待用户手动执行。
---
```
> **生成要点**:
> - `{pageId}` 替换为 Step 2.1 返回的 `result["pageId"]`
> - ⛔ **禁止**使用用户输入 URL 中的 `productId`(门户 ID ≠ 仪表板 pageId)
> - 可通过 Step 2.1 输出的「pageId 校验」区块确认正确值
> - `{skill_generated_at}` 替换为 `dashboardData.basicInfo.gmtModified`(数值,不带引号)
> - `{仪表板名称}` 替换为 `dashboardData.basicInfo.name`
> - `{仪表板URL}` / `{url}` 替换为 `dashboard_url`(Step 2.1 返回的标准仪表板预览页地址,已包含正确 pageId)
> - Python 代码压缩为单行式,避免 LLM "理解"但不"执行"
## 触发场景
根据仪表板分析思路以及仪表板知识库生成 5-8 个自然语言查询示例:
```markdown
## 触发场景
当用户提出以下类型的问题时使用此 Skill:
- "查询某时间段的销售额"
- "配送方式分布情况"
- "商品子类别销售排行"
- "销售额和利润的趋势"
- "查看明细数据"
```
## 前置条件
```markdown
## 前置条件
- 内置 `scripts/quickbi_openapi.py` 和 `scripts/config_loader.py` 工具函数
- 需要有效的 API 凭证配置(四层配置加载,优先级:环境变量 > 工作目录级 > 全局 > 包内默认)
- 支持仪表板更新自动检测(通过元数据中的 `skill_generated_at` 与当前 `gmtModified` 对比)
### 配置加载优先级
**读取逻辑**:四层配置加载,优先级从高到低:`ACCESS_TOKEN` 环境变量 > 工作目录级 `$WORKSPACE_DIR/.qbi/smartq-chat/config.yaml` > 全局 `~/.qbi/config.yaml` > 包内 `config.yaml`
**写入逻辑**:始终写入工作目录级配置;全局配置写入受 `save_global_property` 开关控制
### 首次使用配置指南
**配置检查流程**(必须按此逻辑执行):
1. **调用 `load_config()` 加载配置**:
\`\`\`python
import sys, os
sys.path.insert(0, os.path.join('<技能包目录>', 'scripts'))
from config_loader import load_config
config = load_config() # 四层配置加载
\`\`\`
> **注意**:`<技能包目录>` 在生成时替换为实际路径
2. **检查配置是否完整**:
- 检查 `config.get("api_key")`、`config.get("api_secret")`、`config.get("user_token")` 三个字段
- **配置完整**(三个字段均非空):**不输出任何提示**,直接继续后续流程
- **配置不完整**(任一字段为空):执行步骤 3 的引导流程
3. **仅当配置不完整时**,执行以下引导步骤:
1. 使用 read_file 工具读取文件:`./example/copy_skill_config.png`,记录读取到的文件路径
> 注:该图片由仪表板技能生成器从父 skill 的 `common/copy_skill_config.png` 复制而来
2. 以 `` 的 Markdown 图片语法将图片展示给用户(路径以实际读取路径为准,不要写死)
3. 在图片下方输出引导语:「请登录 Quick BI 控制台,点击头像处的「**一键复制 skill 配置**」(如上图所示),然后将复制的配置粘贴给我。」
4. 等待用户粘贴配置,解析后写入工作目录级配置;若 `save_global_property` 为 `true`,同时写入全局配置 `~/.qbi/config.yaml`
```
## 仪表板知识库
> **【重要】图表组件必须完整列出**
>
> 生成的 Skill 必须包含仪表板中**所有**图表组件(除容器类组件外),不能只列出"主要"图表。
> 每个图表必须包含:数据集ID(sourceId)、字段配置、分析主题、分析层级。
> 这是用户问题路由到正确数据集的关键依据。
```markdown
## 仪表板知识库
### 基本信息
| 属性 | 值 |
|------|---|
| 名称 | {title} |
| URL | `{url}` |
| pageId | `{pageId}` |
> ⚠️ 上方 pageId 和 URL 均取自 Step 2.1 脚本返回值(`result["pageId"]` / `result["dashboardUrl"]`),非用户输入的原始 URL 参数。
### 业务背景与统计口径
> **【重要】此部分信息在问数查询时必须参考,特别是统计口径和过滤条件**
{综合仪表板标题、图表标题、字段名称、富文本生成的业务背景说明,包含:}
- **业务主题**:{从仪表板标题推断的业务领域,如"销售管理分析"、"财务报表"等}
- **核心指标**:{从图表标题和度量字段提取的关键指标,如"销售额、利润、订单数"等}
- **分析维度**:{从维度字段提取的关键维度,如"时间、区域、产品线"等}
- **统计口径**:{从富文本识别的指标定义,如无则写"无特殊说明"}
- **过滤条件**:{从富文本识别的数据过滤规则,如"不含退货"、"仅已完成订单",如无则写"无"}
### 数据集清单
> 汇总所有图表关联的数据集ID,这是 `query_openapi` 查询的核心入参
| 数据集ID | 关联图表数 | 主要用途 |
|----------|-----------|----------|
| `{sourceId1}` | {N}个 | {销售分析/库存管理等} |
| `{sourceId2}` | {N}个 | {客户分析等} |
### 查询控件 ({N} 个)
> **查询控件影响图表数据的筛选范围,理解控件配置有助于正确解读数据**
| 控件名 | 类型 | 时间粒度 | 默认值 | 需手动触发 | 关联图表 |
|--------|------|---------|--------|-----------|---------|
| {labelName} | {componentType} | {timeGranularity或-} | {defaultValue或无} | {needManualQuery: 是/否} | {关联图表列表} |
**查询控件字段说明**:
- **控件名**:`fieldConfigs[].labelName`
- **类型**:`componentType`(datetime=时间选择器, enumSelect=枚举选择器等)
- **时间粒度**:`config.timeGranularity`(day/week/month/quarter/year),影响时间筛选精度
- **默认值**:`defaultValue`,图表首次加载时的预设筛选值
- **需手动触发**:`needManualQuery`,若为"是"则用户需点击"查询"按钮才生效
- **关联图表**:该筛选器影响哪些图表的数据
### 图表组件(完整列表)
> **必须列出所有图表组件**,每个图表的数据集ID是问数查询的关键
| 图表名 | 组件类型 | 数据集ID | 维度字段 | 度量字段 | 过滤条件 | 下钻字段 | 关联筛选器 | 分析主题 | 分析层级 | 位置 | 所属Tab |
|--------|---------|---------|---------|---------|---------|---------|---------|---------|---------|------|--------|
| {图表名} | {customComponentId} | `{sourceId}` | {维度列表} | {度量(聚合)} | {filters列表或无} | {drillFields或无} | {关联的查询控件或无} | {分析主题} | L1/2/3/4 | ({x},{y}) | {Tab或-} |
**图表组件字段说明**:
- **图表名**:从 `componentName` 或 `attribute.caption` 提取
- **组件类型**:`customComponentId` 的值(如 indicator-card, ranking-list, pie 等)
- **数据集ID**:`queryInput.sourceId`,**必填项**,是 `query_openapi` 查询的核心参数(cube_id)
- **维度字段**:从 `queryInput.area` 中 `itemType` 为 dimension/datetime/geographic 的字段
- **度量字段**:从 `queryInput.area` 中 `itemType` 为 measure 的字段,标注聚合类型
- **过滤条件**:从 `queryInput.area` 中 `id` 为 `filters` 的字段,以及 `defaultFilters` 预设过滤(理解图表数据范围的关键)
- **下钻字段**:从 `queryInput.area` 中 `id` 为 `drill` 的 `columnList` 提取,支持用户下钻分析
- **关联筛选器**:从 `relatedQueryControls` 提取,标注该图表受哪些查询控件影响
- **分析主题**:根据图表类型和字段推断的业务分析场景(如"销售趋势分析"、"区域分布对比"等)
- **分析层级**:L1(整体监控)/L2(趋势分析)/L3(维度分解)/L4(明细追踪)
- **位置**:从 `componentContent.tileLayout` 提取的栅格 x/y 坐标
- **所属Tab**:如果图表属于某个Tab页,标注Tab名称
### 意图路由矩阵
> 用户问题 → 目标图表 → 数据集ID 的映射关系
| 用户意图关键词 | 目标图表 | 组件ID | 数据集ID | 分析主题 | 分析层级 | 路由说明 |
|---------------|---------|--------|---------|---------|---------|----------|
| {关键词1}, {关键词2} | {图表名} | `{componentId}` | `{sourceId}` | {分析主题} | L1/L2/L3/L4 | {为什么匹配到这个图表} |
**路由规则**:
1. 优先匹配度量字段名称(如"销售额"→ 包含销售额度量的图表)
2. 其次匹配维度字段名称(如"按区域"→ 包含区域维度的图表)
3. 再次匹配图表类型特征(如"排行"→ ranking-list 类型图表)
4. 最后匹配分析主题(如"趋势"→ 分析主题包含趋势的图表)
5. 无法匹配时,使用所有数据集ID让 `query_openapi` 自行判断
```
## 仪表板分析思路
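> **[Core] This section must be generated entirely from `dashboardData` (the data obtained in Step 2.1)**
> - Every metric name must come from a chart's `measures` field
> - Every dimension name must come from a chart's `dimensions` field
> - Framework matching must be based on real metric combinations
> - Do not invent metrics or dimensions that do not exist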
```markdown
## 仪表板分析思路
### 一、报表主题
- **报表名称**:{从 basicInfo.name 获取}
- **核心主题**:{基于图表标题和指标推断的业务主题}
- **分析目标**:{解决什么业务问题}
- **目标用户**:{推断的使用者角色,如"运营人员"、"财务分析师"等}
### 二、分析框架匹配
> **基于仪表板真实指标组合自动匹配最适合的分析框架**
常用框架:杜邦分析、AARRR 海盗模型、RFM 客户分析、漏斗分析、目标达成分析、同环比分析等。无法匹配时使用 **L1-L4 金字塔**(概览→趋势→分解→明细)。
#### 本仪表板匹配结果
- **匹配的分析框架**:{根据真实指标匹配的框架名称}
- **匹配置信度**:{high/medium/default}
- **匹配依据**:基于以下真实字段匹配 - {列出匹配到的关键字段}
### 三、层级结构
> **根据仪表板复杂度选择合适的形式**
**形式 A:简单仪表板(1-2 个组件)**
- 直接列出组件即可,不强行分层
- 示例:`组件1: 销售额指标卡, 组件2: 销售趋势图`
**形式 B:标准层级仪表板(3-8 个组件,有明确层级)**
- 使用 L1-L4 金字塔描述:
```
L1 整体监控:{indicator-card/gauge 类型图表}
└─ 核心指标:{度量字段}
L2 趋势分析:{line/area/indicator-trend 类型图表}
└─ 时间维度 + 关键指标
L3 维度分解:{bar/pie/ranking-list 类型图表}
└─ 分析维度 + 分析指标
L4 明细追踪:{common-table 类型图表}
└─ 明细字段列表
```
**形式 C:多主题仪表板(Tab 页签或独立板块)**
- 按主题/Tab 分组描述:
```
主题1: {Tab名称或板块名}
└─ 组件列表及层级
主题2: {Tab名称或板块名}
└─ 组件列表及层级
```
### 四、核心指标体系
> **从 `dashboardData.chartComponents[].measures` 提取,不可臆造**
| 指标类型 | 指标名称 | 聚合方式 | 来源图表 | 分析层级 | 分析价值 |
|---------|---------|---------|---------|---------|---------|
| 结果指标 | {measure.caption} | {aggregateType} | {图表名} | L1 | {推断的分析价值,如"核心业绩指标"} |
| 过程指标 | {measure.caption} | {aggregateType} | {图表名} | L2/L3 | {推断的分析价值,如"过程监控指标"} |
| 效率指标 | {measure.caption} | {aggregateType} | {图表名} | L1/L2 | {推断的分析价值,如"效率评估指标"} |
**指标分类说明**:
- **结果指标**:反映最终业务成果,如销售额、利润、订单数等
- **过程指标**:反映业务过程状态,如转化率、完成率、增长率等
- **效率指标**:反映资源利用效率,如人均产出、周转率、客单价等
### 五、维度体系
> **从 `dashboardData.chartComponents[].dimensions` 提取,不可臆造**
| 维度层级 | 维度名称 | 维度类型 | 来源图表 | 分析用途 |
|---------|---------|---------|---------|---------|
| 时间维度 | {dimension.caption} | datetime | {图表名} | 趋势分析、同环比对比 |
| 地理维度 | {dimension.caption} | geographic | {图表名} | 区域分布、地域下钻 |
| 分类维度 | {dimension.caption} | dimension | {图表名} | 结构分析、归因定位 |
**维度层级说明**:
- **时间维度**:用于趋势分析和周期对比
- **地理维度**:用于区域分布和地域下钻
- **分类维度**:用于结构分析和问题归因
### 六、业务逻辑
> **基于真实指标字段名称推断可能的计算关系**
**指标关系**(基于字段名称推断):
- {推断的计算公式1,如:利润 = 销售额 - 成本}
- {推断的计算公式2,如:毛利率 = 利润 / 销售额 × 100%}
- {推断的计算公式3,如:客单价 = 销售额 / 订单数}
**业务逻辑链**:
{指标间的因果关系链,如:}
- 销售额 ← 客户数 × 客单价
- 利润 ← 销售额 - 成本
- 毛利率 ← 利润 / 销售额
**分析路径**:
```
发现问题(L1 指标异常)
→ 趋势定位(L2 确定异常时间点)
→ 维度归因(L3 定位问题维度)
→ 明细追溯(L4 查看具体记录)
```
### 七、联动与下钻路径
| 源层级 | 源图表 | 联动动作 | 目标层级 | 目标图表 | 分析用途 |
|-------|-------|---------|---------|---------|---------|
| L1 | {指标卡图表名} | 点击/筛选 | L2 | {趋势图表名} | 从总览下钻到趋势,确定异常时间点 |
| L2 | {趋势图表名} | 点击时间点 | L3 | {分布图表名} | 从趋势下钻到维度,定位问题分类 |
| L3 | {分布图表名} | 点击维度值 | L4 | {明细表名} | 从维度下钻到明细,追溯具体记录 |
### 八、适用场景与查询路径
| 业务问题类型 | 典型问题示例 | 分析路径 | 涉及层级 | 涉及指标/维度 |
|-------------|-------------|---------|---------|--------------|
| 现状查询 | "当前{指标名}是多少" | L1 直接查询 | L1 | {指标名} |
| 趋势分析 | "{指标名}趋势如何" | L1→L2 | L1, L2 | {时间维度} + {指标名} |
| 对比分析 | "各{维度}的{指标名}对比" | L1→L3 | L1, L3 | {分类维度} + {指标名} |
| 问题诊断 | "为什么{指标名}下降" | L1→L2→L3 | L1, L2, L3 | {相关指标链} |
| 归因定位 | "哪个{维度}有问题" | L3 深入 | L3 | {分类维度} + {多个指标} |
| 明细追溯 | "查看{主题}明细" | L3→L4 | L3, L4 | {维度} + {明细字段} |
| 排行查询 | "{维度}排行TOP N" | L3 排序 | L3 | {分类维度} + {指标名} |
```
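The L1-L4 pyramid in the template above ties each layer to a set of chart types. A minimal sketch of that mapping follows. It assumes each chart component is a dict with hypothetical `name` and `type` keys; the real shape of `dashboardData.chartComponents[]` may differ.

```python
from collections import defaultdict

# Chart-type -> layer mapping, copied from the L1-L4 pyramid in the template.
LAYER_BY_CHART_TYPE = {
    "indicator-card": "L1", "gauge": "L1",
    "line": "L2", "area": "L2", "indicator-trend": "L2",
    "bar": "L3", "pie": "L3", "ranking-list": "L3",
    "common-table": "L4",
}

def group_by_layer(chart_components):
    """Group chart names by pyramid layer; unknown chart types go under 'unmapped'."""
    layers = defaultdict(list)
    for chart in chart_components:
        # `type` and `name` are assumed keys, used here for illustration only.
        layer = LAYER_BY_CHART_TYPE.get(chart["type"], "unmapped")
        layers[layer].append(chart["name"])
    return dict(layers)

charts = [
    {"name": "销售额指标卡", "type": "indicator-card"},
    {"name": "销售趋势图", "type": "line"},
    {"name": "订单明细", "type": "common-table"},
]
print(group_by_layer(charts))
```

Chart types outside the pyramid are kept under `unmapped` rather than forced into a layer, in line with the rule that nothing may be invented.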
## 工作流程
```markdown
## 工作流程
### Step 1: 理解用户问题
参照「仪表板知识库」和「分析思路」以及 「分析路径」理解用户问题,**必须输出理解结果**:
1. **解析用户问题**:提取关键词(指标、维度、时间范围、筛选条件)
2. **参照知识库理解**:
- 对照「图表组件」表格,识别涉及的图表和数据集
- 对照「分析思路」,匹配适用的分析场景
- 对照「业务逻辑」,理解指标间的关联关系
3. **【必须输出】理解结果**:
```markdown
---
## 问题理解
- **用户意图**:{用户想要了解什么}
- **匹配分析场景**:{从分析思路中匹配的场景,如"趋势分析"、"对比分析";多 Tab 仪表板需全面考虑各 Tab 的分析场景}
- **涉及指标**:{相关的度量字段}
- **涉及维度**:{相关的维度字段}
---
```
### Step 2: 判断问题分类并拆解
根据 Step 1 的理解结果,判断问题类型并输出分析计划:
#### 分类规则
| 类型 | 判断条件 | 处理方式 |
|------|----------|----------|
| **简单问题** | 用户问题与某图表名高度吻合 | 无需拆解,直接查询该图表对应的维度+指标 |
| **复合问题** | 需要多维度分析或归因 | 拆解为 2-3 个子问题 |
#### 子问题拆解规范(仅复合问题需要)
1. **三要素**:每个子问题必须明确 `分析维度` + `待查指标` + `目的`
2. **单表原则**:每个子问题应能让 SmartQ 直接返回一张表,避免多维度交叉
3. **查询顺序**:按 L1→L2→L3 逻辑排列(先总体 → 再维度分解 → 最后细分定位)
4. **动态补充**:根据查询结果动态决定是否需要追加问题(非固定数量)
5. **比率处理**:若问题涉及比率(如增长率、转化率),可先尝试直接查询;如SmartQ无法识别或结果异常,再拆解为基础指标(分子/分母)自行计算
#### 输出格式(此步骤输出给用户查看)
```
## 分析计划
**用户问题**:{原始问题}
**分析层级**:{L1/L2/L3/L4} - {简要说明,如"基于L1整体销售额,下钻到L2门店维度"}
| # | 子问题 | 维度 | 指标 | 目的 |
|---|--------|------|------|------|
| 1 | {问题描述} | {维度} | {指标} | {目的} |
| 2 | {问题描述} | {维度} | {指标} | {目的} |
```
### Step 3: 构建查询
根据问题分类构建查询:
**简单问题**:
```markdown
- **查询问题**:{用户原始问题或转换后的查询语句}
- **数据集ID**:{从匹配图表获取的 sourceId}
```
**Compound question** (split into 2-3 sub-questions):
```markdown
---
## 子问题清单
| 序号 | 子问题 | 数据集ID | 分析目的 |
|------|--------|---------|----------|
| 1 | {子问题1} | `{sourceId}` | {这个子问题要解答什么} |
| 2 | {子问题2} | `{sourceId}` | {这个子问题要解答什么} |
| 3 | {子问题3} | `{sourceId}` | {这个子问题要解答什么} |
---
```
**兜底处理**:如果无法匹配到具体图表,收集所有图表组件的 `数据集ID`,去重后作为多数据集入参。
### Step 4: 调用 SmartQ
使用 Step 3 构建的查询,调用内置的 `scripts/quickbi_openapi.py` 中的 `query_openapi` 函数:
```python
import sys, os
sys.path.insert(0, os.path.join('<技能包目录>', 'scripts'))
from quickbi_openapi import query_openapi
from config_loader import load_config
# 加载配置(优先级:环境变量 > 工作目录级 > 全局 > 包内默认)
config = load_config()
# 简单问题:单次调用
result = query_openapi(
endpoint=config["server_domain"],
access_key_id=config["api_key"],
access_key_secret=config["api_secret"],
question=query_question,
user_id=config["user_token"],
cube_id=dataset_id # 单个数据集ID 或 逗号分隔的多个ID
)
# 复杂问题:循环调用每个子问题
results = []
for sub_question in sub_questions:
result = query_openapi(
endpoint=config["server_domain"],
access_key_id=config["api_key"],
access_key_secret=config["api_secret"],
question=sub_question["question"],
user_id=config["user_token"],
cube_id=sub_question["cube_id"]
)
results.append({"sub_question": sub_question, "result": result})
# 根据结果动态决定是否补充查询
# 如发现异常值,可动态添加第3个问题
```
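**Rules for multiple calls**:
- Call `query_openapi` once per sub-question, **3 calls** at most (keep it minimal)
- Execute the sub-questions one by one, in the order produced by the Step 2 decomposition
- **Dynamic follow-up**: if a result reveals an anomaly (for example, a dimension value that is unusually high or low), add the next question to dig deeper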
**重要约束**:
- **数据正常返回时不尝试其他方式**:当 API 正常返回数据(包括空数据、全0数据等),直接展示
- **仅异常时降级到浏览器**:只有当 API 返回异常(接口抛错、网络不通、权限错误等)时,才尝试打开仪表板
- **不添加个人判断**:不对数据的合理性进行判断,直接展示 API 返回的原始结果
### Step 5: 汇总结果展示
根据问题分类展示结果:
**简单问题**:
```markdown
## 查询结果
**智能总结**: {ConclusionText,如有}
| 列1 | 列2 | ... |
|-----|-----|-----|
| 数据... |
*共 N 条数据*
```
**复杂问题**:
```markdown
## 分析结果
### 子问题 1: {子问题描述}
{该子问题的查询结果表格}
### 子问题 2: {子问题描述}
{该子问题的查询结果表格}
...
### 综合分析
{基于各子问题结果的综合分析,回答用户原始问题}
**来源**:[仪表板名称]({URL})
```
**来源链接说明**:
- URL 从「仪表板知识库 - 基本信息 - URL」字段获取
- 如果问题匹配到了具体的图表组件,则在 URL 末尾追加 `&componentId={componentId}&highlight=true`
## 错误处理
| 场景 | 检测方式 | 处理策略 |
|------|---------|----------|
| 问题理解失败 | 无法提取有效关键词 | 询问用户具体想查询什么数据 |
| 意图匹配失败 | 无法匹配到图表组件 | 使用所有数据集ID进行兜底查询 |
| API 调用失败 | 返回错误码 | 根据错误码提示用户(权限、实例过期等)|
| 部分子问题失败 | 部分 API 调用失败 | 展示成功的结果,标注失败的子问题 |
**注意**:API 成功返回的数据(包括空数据、全0数据等)直接展示给用户,不进行二次查询或修正。
```
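Two small rules from the workflow above are easy to get wrong: the fallback that passes every dataset ID (deduplicated, comma-separated) as `cube_id`, and the source link that appends the component parameters. The helpers below are a hedged sketch. The function names are hypothetical, and the URL helper assumes the base URL already carries a query string, because the rule appends with `&`.

```python
def build_cube_id(dataset_ids):
    """Deduplicate dataset IDs (first occurrence wins) and join them with commas."""
    return ",".join(dict.fromkeys(dataset_ids))

def build_source_url(base_url, component_id=None):
    """Append the highlight parameters only when a specific chart component matched."""
    if component_id is None:
        return base_url
    return f"{base_url}&componentId={component_id}&highlight=true"

print(build_cube_id(["ds-1", "ds-2", "ds-1"]))
print(build_source_url("https://example.com/dashboard?id=42", "chart-7"))
```

`dict.fromkeys` is used for deduplication because it preserves the original order of the IDs, which a plain `set` would not.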
FILE:references/document/document_classification.md
# 文档分类体系 — 详细字段定义(V2.0)
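This file defines **10 category groups and 46 document subtypes** together with their standard extraction fields. The agent must consult this file when extracting text fields and merging them.
**Field naming rule**: field names are English `snake_case`; Excel headers use the Chinese name (the English field name is given in parentheses). In each subtype's field list, `filename` (the source file name) is the implicit first column and is not defined here.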
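The naming rule can be checked mechanically. The sketch below is an assumption-laden illustration: `build_headers_cn` is a hypothetical helper, it takes `(field_name, chinese_header)` pairs, and it emits plain Chinese headers with `源文件名` as the label for the implicit `filename` column.

```python
import re

# snake_case: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def build_headers_cn(fields):
    """Validate field names and return the Excel header row.

    `fields` is a list of (field_name, chinese_header) tuples.
    """
    for name, _ in fields:
        if not SNAKE_CASE.match(name):
            raise ValueError(f"field name is not snake_case: {name!r}")
    # `filename` is implicit, so its header is always prepended.
    return ["源文件名"] + [header for _, header in fields]

print(build_headers_cn([("invoice_type", "发票类型"), ("invoice_code", "发票代码")]))
```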
---
## A. 财务与税务类 (finance-tax)
### A1. 增值税发票 (vat-invoice)
*(Sheet: `增值税发票`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| invoice_type | 发票类型 | 专用/普通/电子/红字 |
| invoice_code | 发票代码 | 10-12位 |
| invoice_number | 发票号码 | 8位 |
| date | 开票日期 | YYYY-MM-DD |
| buyer_name | 购买方名称 | |
| buyer_tax_id | 购买方税号 | |
| seller_name | 销售方名称 | |
| seller_tax_id | 销售方税号 | |
| items | 货物/服务名称 | 多项分号分隔 |
| amount_before_tax | 金额(不含税) | 数值,2位小数 |
| tax_rate | 税率 | 如 13%、9%、6% |
| tax_amount | 税额 | 数值 |
| total_amount | 价税合计 | 数值 |
| remarks | 备注 | |
### A2. 银行回单 (bank-receipt)
*(Sheet: `银行回单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| bank_name | 银行名称 | |
| serial_number | 流水号/凭证号 | |
| transaction_date | 交易日期 | YYYY-MM-DD |
| payer_name | 付款人名称 | |
| payer_account | 付款人账号 | |
| payee_name | 收款人名称 | |
| payee_account | 收款人账号 | |
| amount | 交易金额 | 数值 |
| currency | 币种 | 默认 CNY |
| transaction_type | 交易类型 | 转账/汇款/代付等 |
| purpose | 用途/摘要 | |
### A3. 银行对账单 (bank-statement)
*(Sheet: `银行对账单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| bank_name | 银行名称 | |
| account_number | 账号 | |
| account_name | 户名 | |
| statement_period | 对账周期 | 如 2026-01 至 2026-03 |
| opening_balance | 期初余额 | 数值 |
| closing_balance | 期末余额 | 数值 |
| total_debit | 借方合计 | 数值 |
| total_credit | 贷方合计 | 数值 |
| transaction_count | 交易笔数 | 整数 |
### A4. 费用报销单 (expense-claim)
*(Sheet: `费用报销单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| applicant | 报销人 | |
| department | 部门 | |
| claim_date | 报销日期 | YYYY-MM-DD |
| expense_items | 费用项目 | 分号分隔 |
| total_amount | 报销金额 | 数值 |
| payment_method | 支付方式 | 现金/转账等 |
| approver | 审批人 | |
| remarks | 备注 | |
### A5. 合同/协议 (contract)
*(Sheet: `合同协议`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| contract_number | 合同编号 | |
| contract_title | 合同名称 | |
| party_a | 甲方 | |
| party_b | 乙方 | |
| contract_type | 合同类型 | 采购/服务/租赁/其他 |
| effective_date | 生效日期 | YYYY-MM-DD |
| expiry_date | 到期日期 | YYYY-MM-DD |
| contract_value | 合同金额 | 数值 |
| currency | 币种 | 默认 CNY |
| key_terms | 关键条款 | 摘要≤150字 |
| signature_date | 签署日期 | YYYY-MM-DD |
### A6. 税务申报表 (tax-return)
*(Sheet: `税务申报表`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| taxpayer_name | 纳税人名称 | |
| tax_id | 纳税人识别号 | |
| declaration_period | 申报所属期 | YYYY-MM |
| tax_type | 税种 | 增值税/企业所得税等 |
| taxable_amount | 计税依据 | 数值 |
| tax_payable | 应纳税额 | 数值 |
| tax_paid | 已缴税额 | 数值 |
| deduction_amount | 减免税额 | 数值 |
| declaration_status | 申报状态 | 已申报/未申报/更正 |
### A7. 财务报表 (financial-statement)
*(Sheet: `财务报表`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| company_name | 企业名称 | |
| report_period | 报表期间 | 如 2025-Q3 |
| total_assets | 资产总额 | 数值 |
| total_liabilities | 负债总额 | 数值 |
| equity | 所有者权益 | 数值 |
| revenue | 营业收入 | 数值 |
| net_profit | 净利润 | 数值 |
| cash_flow_operating | 经营现金流 | 数值 |
| audit_opinion | 审计意见 | 无保留/保留/否定等 |
### A8. 收据/付款凭证 (receipt-payment)
*(Sheet: `收据付款凭证`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| receipt_number | 收据编号 | |
| date | 收款日期 | YYYY-MM-DD |
| payer | 付款方 | |
| payee | 收款方 | |
| amount | 金额 | 数值 |
| payment_method | 支付方式 | 现金/转账/支票等 |
| purpose | 款项用途 | |
| issuer_signature | 开具人签字 | 文本或“有/无” |
---
## B. 人力资源类 (hr)
### B1. 简历 (resume)
*(Sheet: `简历`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| name | 姓名 | |
| gender | 性别 | |
| birth_date | 出生日期 | YYYY-MM-DD 或年龄 |
| phone | 手机号 | |
| email | 邮箱 | |
| highest_education | 最高学历 | 本科/硕士/博士等 |
| school | 毕业院校 | 最高学历院校 |
| major | 专业 | |
| work_years | 工作年限 | |
| latest_company | 最近工作单位 | |
| latest_position | 最近职位 | |
| skills | 技能关键词 | 用分号分隔 |
| expected_salary | 期望薪资 | 如有 |
### B2. 劳动合同 (labor-contract)
*(Sheet: `劳动合同`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| employee_name | 员工姓名 | |
| id_number | 身份证号 | |
| employer_name | 用人单位 | |
| contract_type | 合同类型 | 固定期限/无固定期限/完成一定工作 |
| start_date | 合同起始日 | YYYY-MM-DD |
| end_date | 合同终止日 | YYYY-MM-DD |
| position | 岗位 | |
| work_location | 工作地点 | |
| salary | 薪酬 | 数值或描述 |
| probation_period | 试用期 | 如 3个月 |
| probation_salary | 试用期薪酬 | |
| signature_date | 签订日期 | YYYY-MM-DD |
### B3. 离职证明 (resignation-cert)
*(Sheet: `离职证明`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| employee_name | 员工姓名 | |
| id_number | 身份证号 | |
| company_name | 公司名称 | |
| position | 职位 | |
| entry_date | 入职日期 | YYYY-MM-DD |
| leave_date | 离职日期 | YYYY-MM-DD |
| leave_reason | 离职原因 | 如有 |
| issue_date | 开具日期 | YYYY-MM-DD |
### B4. 工资条 (payslip)
*(Sheet: `工资条`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| employee_name | 员工姓名 | |
| employee_id | 工号 | |
| pay_period | 工资周期 | 如 2026-03 |
| base_salary | 基本工资 | 数值 |
| allowances | 津贴/补贴 | 数值 |
| overtime_pay | 加班费 | 数值 |
| bonus | 奖金 | 数值 |
| gross_pay | 应发合计 | 数值 |
| social_insurance | 社保扣除 | 数值 |
| housing_fund | 公积金扣除 | 数值 |
| tax | 个税扣除 | 数值 |
| other_deductions | 其他扣款 | 数值 |
| net_pay | 实发工资 | 数值 |
### B5. 考勤记录 (attendance-record)
*(Sheet: `考勤记录`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| employee_id | 员工ID | |
| date | 日期 | YYYY-MM-DD |
| clock_in | 打卡时间(入) | HH:MM |
| clock_out | 打卡时间(出) | HH:MM |
| work_hours | 工时 | 数值 |
| status | 异常状态 | 正常/迟到/早退/缺卡/请假 |
| department | 部门 | |
### B6. 培训证书 (training-cert)
*(Sheet: `培训证书`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| trainee_name | 学员姓名 | |
| course_name | 课程名称 | |
| training_org | 培训机构 | |
| issue_date | 颁发日期 | YYYY-MM-DD |
| valid_until | 有效期至 | YYYY-MM-DD |
| certificate_no | 证书编号 | |
| credits_hours | 学分/课时 | 数值 |
### B7. 绩效考核表 (performance-review)
*(Sheet: `绩效考核表`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| employee_id | 员工ID | |
| review_period | 考核周期 | 如 2025-Q3 |
| kpi_scores | KPI得分 | 指标:得分,分号分隔 |
| overall_rating | 综合评级 | S/A/B/C/D |
| manager_comments | 主管评语 | 摘要 |
| next_goals | 下期目标 | 摘要 |
| review_date | 考核日期 | YYYY-MM-DD |
---
## C. 供应链与采购类 (supply-chain)
### C1. 采购订单 (purchase-order)
*(Sheet: `采购订单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| po_number | 订单编号 | |
| date | 下单日期 | YYYY-MM-DD |
| supplier | 供应商 | |
| buyer | 采购方 | |
| items | 物料/商品 | 多项用分号分隔 |
| quantities | 数量 | 对应 items |
| unit_prices | 单价 | 对应 items |
| total_amount | 总金额 | 数值 |
| currency | 币种 | 默认 CNY |
| delivery_date | 交货日期 | YYYY-MM-DD |
| payment_terms | 付款条件 | |
### C2. 送货单 (delivery-note)
*(Sheet: `送货单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| delivery_number | 送货单号 | |
| date | 送货日期 | YYYY-MM-DD |
| supplier | 供应商/发货方 | |
| receiver | 收货方 | |
| items | 货物名称 | 多项用分号分隔 |
| quantities | 数量 | 对应 items |
| delivery_address | 送货地址 | |
| receiver_name | 签收人 | |
| related_po | 关联采购订单号 | 如有 |
### C3. 入库单 (warehouse-receipt)
*(Sheet: `入库单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| receipt_number | 入库单号 | |
| date | 入库日期 | YYYY-MM-DD |
| supplier | 供应商 | |
| warehouse | 入库仓库 | |
| items | 物料名称 | 多项用分号分隔 |
| quantities | 数量 | 对应 items |
| inspector | 验收人 | |
| related_po | 关联采购订单号 | 如有 |
| remarks | 备注 | |
### C4. 质检报告 (quality-report)
*(Sheet: `质检报告`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| report_number | 报告编号 | |
| date | 检测日期 | YYYY-MM-DD |
| product_name | 产品名称 | |
| batch_number | 批号 | |
| specification | 规格型号 | |
| test_items | 检测项目 | 多项用分号分隔 |
| test_results | 检测结果 | 对应 test_items |
| conclusion | 结论 | 合格/不合格 |
| inspector | 检验员 | |
| issuing_org | 出具机构 | |
### C5. 供应商评估表 (supplier-evaluation)
*(Sheet: `供应商评估表`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| supplier_name | 供应商名称 | |
| evaluation_period | 评估周期 | 如 2025-Q3 |
| quality_score | 质量评分 | 1-10分 |
| delivery_score | 交期评分 | 1-10分 |
| cost_score | 价格评分 | 1-10分 |
| service_score | 服务评分 | 1-10分 |
| overall_rating | 综合评级 | A/B/C/D |
| risk_level | 风险等级 | 高/中/低 |
| evaluator | 评估人 | |
### C6. 库存盘点表 (inventory-count)
*(Sheet: `库存盘点表`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| warehouse | 仓库名称 | |
| location_code | 库位编码 | |
| item_code | 物料编码 | |
| book_qty | 账面数量 | 数值 |
| actual_qty | 实盘数量 | 数值 |
| variance_qty | 差异数量 | 数值 |
| variance_reason | 差异原因 | 摘要 |
| count_date | 盘点日期 | YYYY-MM-DD |
| counter | 盘点人 | |
---
## D. 行政与法务类 (admin-legal)
### D1. 营业执照 (business-license)
*(Sheet: `营业执照`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| company_name | 企业名称 | |
| unified_credit_code | 统一社会信用代码 | 18位 |
| legal_representative | 法定代表人 | |
| company_type | 企业类型 | 有限责任/股份有限等 |
| registered_capital | 注册资本 | |
| establishment_date | 成立日期 | YYYY-MM-DD |
| business_scope | 经营范围 | 摘要,200字以内 |
| address | 住所 | |
| valid_period | 营业期限 | |
### D2. 身份证 (id-card)
*(Sheet: `身份证`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| name | 姓名 | |
| gender | 性别 | 男/女 |
| ethnicity | 民族 | |
| birth_date | 出生日期 | YYYY-MM-DD |
| address | 住址 | |
| id_number | 身份证号码 | 18位 |
| issuing_authority | 签发机关 | |
| valid_period | 有效期限 | 起止日期 |
### D3. 护照 (passport)
*(Sheet: `护照`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| name_cn | 中文姓名 | |
| name_en | 英文姓名 | |
| nationality | 国籍 | |
| gender | 性别 | |
| birth_date | 出生日期 | YYYY-MM-DD |
| birth_place | 出生地 | |
| passport_number | 护照号码 | |
| issue_date | 签发日期 | YYYY-MM-DD |
| expiry_date | 有效期至 | YYYY-MM-DD |
| issuing_authority | 签发机关 | |
### D4. 保密协议 (nda)
*(Sheet: `保密协议`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| party_a | 甲方 | |
| party_b | 乙方 | |
| sign_date | 签署日期 | YYYY-MM-DD |
| confidential_period | 保密期限 | 如 3年/永久 |
| scope_summary | 保密范围摘要 | ≤150字 |
| penalty_clause | 违约赔偿条款 | 摘要 |
| governing_law | 管辖法律 | |
### D5. 资质证书 (qualification-license)
*(Sheet: `资质证书`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| cert_number | 证书编号 | |
| holder_name | 持证主体 | |
| issuing_authority | 发证机关 | |
| issue_date | 发证日期 | YYYY-MM-DD |
| expiry_date | 有效期至 | YYYY-MM-DD |
| license_scope | 许可范围 | 摘要 |
| status | 状态 | 有效/吊销/过期 |
### D6. 公文/通知 (official-notice)
*(Sheet: `公文通知`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| doc_number | 文号 | |
| issuing_dept | 发文机关 | |
| title | 标题 | |
| publish_date | 发布日期 | YYYY-MM-DD |
| target_audience | 主送/抄送单位 | |
| content_summary | 正文摘要 | ≤200字 |
| urgency | 紧急程度 | 特急/加急/平件 |
| confidential_level | 密级 | 绝密/机密/秘密/公开 |
---
## E. 医疗类 (medical)
### E1. 病历 (medical-record)
*(Sheet: `病历`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| patient_name | 患者姓名 | |
| gender | 性别 | |
| age | 年龄 | |
| visit_date | 就诊日期 | YYYY-MM-DD |
| hospital | 医疗机构 | |
| department | 科室 | |
| doctor | 主治医生 | |
| chief_complaint | 主诉 | 简述,100字以内 |
| diagnosis | 诊断 | |
| treatment_plan | 治疗方案 | 摘要 |
| medications | 用药 | 多项用分号分隔 |
### E2. 处方单 (prescription)
*(Sheet: `处方单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| patient_name | 患者姓名 | |
| gender | 性别 | |
| age | 年龄 | |
| prescription_date | 开具日期 | YYYY-MM-DD |
| hospital | 医疗机构 | |
| department | 科室 | |
| doctor | 处方医生 | |
| diagnosis | 临床诊断 | |
| medications | 药品名称 | 多项用分号分隔 |
| dosage | 剂量 | 对应 medications |
| usage | 用法 | 如 口服 tid |
| duration | 疗程 | 如 7天 |
### E3. 检验/检查报告 (lab-report)
*(Sheet: `检验检查报告`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| report_no | 报告编号 | |
| patient_id | 患者ID | |
| test_date | 检查日期 | YYYY-MM-DD |
| test_items | 检测项目 | 分号分隔 |
| results | 结果值 | 分号分隔,对应项目 |
| reference_ranges | 参考范围 | 分号分隔 |
| abnormal_flags | 异常标识 | ↑/↓/正常 |
| critical_value | 危急值标记 | 是/否 |
| reviewer | 审核医师 | |
### E4. 体检报告 (health-checkup)
*(Sheet: `体检报告`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| checkup_no | 体检编号 | |
| check_date | 体检日期 | YYYY-MM-DD |
| dept_items | 科室/项目 | 分号分隔 |
| indicators | 指标值 | 分号分隔 |
| normal_abnormal | 正常/异常 | 分号分隔 |
| health_advice | 健康建议 | 摘要 |
| overall_risk | 总体风险等级 | 高/中/低 |
| doctor | 总检医师 | |
---
## F. 保险类 (insurance)
### F1. 保单 (insurance-policy)
*(Sheet: `保单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| policy_number | 保单号 | |
| insurance_company | 保险公司 | |
| insurance_type | 险种 | 寿险/车险/财产险等 |
| insured_name | 被保险人 | |
| insured_id | 被保险人证件号 | |
| policyholder | 投保人 | |
| coverage_amount | 保额 | 数值 |
| premium | 保费 | 数值 |
| payment_frequency | 缴费方式 | 年缴/月缴/趸缴 |
| effective_date | 生效日期 | YYYY-MM-DD |
| expiry_date | 到期日期 | YYYY-MM-DD |
| beneficiary | 受益人 | |
### F2. 理赔申请 (insurance-claim)
*(Sheet: `理赔申请`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| claim_number | 理赔编号 | |
| policy_number | 关联保单号 | |
| claimant | 申请人 | |
| incident_date | 出险日期 | YYYY-MM-DD |
| incident_type | 出险类型 | 意外/疾病/车祸等 |
| incident_description | 事故描述 | 摘要,150字以内 |
| claim_amount | 申请金额 | 数值 |
| hospital | 就诊医院 | 如适用 |
| supporting_docs | 附件材料 | 列出提交的证明文件 |
| claim_date | 申请日期 | YYYY-MM-DD |
### F3. 理赔结案通知 (claim-settlement)
*(Sheet: `理赔结案通知`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| claim_no | 理赔编号 | |
| policy_no | 关联保单号 | |
| settlement_date | 结案日期 | YYYY-MM-DD |
| approved_amount | 核定赔付金额 | 数值 |
| deductible | 免赔额 | 数值 |
| payment_method | 支付方式 | 银行转账/支票等 |
| closure_reason | 结案原因 | 赔付/拒赔/撤诉 |
| insurer_signature | 保险公司签章 | 文本或“有/无” |
---
## G. 物流类 (logistics)
### G1. 运单 (waybill)
*(Sheet: `运单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| waybill_number | 运单号 | |
| carrier | 承运方 | 物流公司名称 |
| shipper | 发货人 | |
| shipper_address | 发货地址 | |
| consignee | 收货人 | |
| consignee_address | 收货地址 | |
| goods_description | 货物描述 | |
| quantity | 件数 | 整数 |
| weight | 重量(kg) | 数值 |
| freight | 运费 | 数值 |
| shipment_date | 发货日期 | YYYY-MM-DD |
| delivery_date | 预计到达日期 | YYYY-MM-DD |
### G2. 提单 (bill-of-lading)
*(Sheet: `提单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| bl_number | 提单号 | |
| bl_type | 提单类型 | 正本/副本/电放 |
| shipper | 托运人 | |
| consignee | 收货人 | |
| notify_party | 通知方 | |
| vessel_name | 船名 | |
| voyage | 航次 | |
| port_of_loading | 装货港 | |
| port_of_discharge | 卸货港 | |
| container_number | 集装箱号 | 多个用分号分隔 |
| seal_number | 铅封号 | |
| goods_description | 货物描述 | |
| gross_weight | 毛重(kg) | 数值 |
| measurement | 体积(CBM) | 数值 |
| issue_date | 签发日期 | YYYY-MM-DD |
| number_of_originals | 正本份数 | 通常 3 |
### G3. 报关单 (customs-declaration)
*(Sheet: `报关单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| declaration_no | 报关单号 | |
| customs_code | 海关编码 | |
| importer | 进口商 | |
| exporter | 出口商 | |
| trade_mode | 贸易方式 | 一般贸易/加工贸易等 |
| goods_name | 货物名称 | |
| hs_code | HS编码 | |
| declared_value | 申报价值 | 数值 |
| currency | 币种 | USD/CNY等 |
| port | 口岸 | |
| declaration_date | 申报日期 | YYYY-MM-DD |
---
## H. 技术与运维类 (tech-ops) 🆕
### H1. 系统日志 (system-log)
*(Sheet: `系统日志`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| timestamp | 时间戳 | YYYY-MM-DD HH:MM:SS |
| log_level | 日志级别 | INFO/WARN/ERROR/FATAL |
| service_name | 服务/模块名 | |
| trace_id | 追踪ID | 用于链路追踪 |
| error_code | 错误码 | 如有 |
| ip_address | IP地址 | 来源或目标 |
| message_summary | 消息摘要 | ≤200字 |
| stack_trace | 堆栈关键行 | 截取关键报错 |
### H2. 漏洞扫描报告 (vulnerability-report)
*(Sheet: `漏洞扫描报告`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| report_id | 报告编号 | |
| scan_date | 扫描时间 | YYYY-MM-DD |
| target_system | 目标系统/IP | |
| vuln_name | 漏洞名称 | |
| cve_id | CVE编号 | 如有 |
| risk_level | 风险等级 | 高/中/低/信息 |
| affected_component | 影响组件 | |
| remediation | 修复建议 | 摘要 |
| status | 修复状态 | 未修复/已修复/忽略 |
### H3. 服务器监控报表 (server-monitoring)
*(Sheet: `服务器监控报表`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| host_name | 主机名/IP | |
| monitor_time | 监控时间 | YYYY-MM-DD HH:MM |
| cpu_usage | CPU使用率(%) | 数值 |
| mem_usage | 内存使用率(%) | 数值 |
| disk_io | 磁盘IO(MB/s) | 数值 |
| network_traffic | 网络流量(Mbps) | 数值 |
| alert_events | 告警事件数 | 整数 |
| sla_availability | 可用性SLA(%) | 数值 |
---
## I. 客服与销售类 (sales-service) 🆕
### I1. 客诉/工单记录 (customer-ticket)
*(Sheet: `客诉工单记录`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| ticket_id | 工单号 | |
| customer_id | 客户ID | |
| create_time | 创建时间 | YYYY-MM-DD HH:MM |
| category | 问题分类 | 产品/物流/售后/账单等 |
| priority | 紧急程度 | P1/P2/P3/P4 |
| handler | 处理人 | |
| sla_hours | 解决时长(小时) | 数值 |
| csat_score | 满意度评分 | 1-5分 |
| status | 工单状态 | 待处理/处理中/已关闭 |
### I2. 销售报价单 (sales-quotation)
*(Sheet: `销售报价单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| quote_no | 报价单号 | |
| quote_date | 报价日期 | YYYY-MM-DD |
| customer_name | 客户名称 | |
| sales_rep | 销售代表 | |
| items | 产品/服务 | 分号分隔 |
| unit_price | 单价 | 对应items |
| quantity | 数量 | 对应items |
| total_amount | 报价总额 | 数值 |
| valid_until | 报价有效期 | YYYY-MM-DD |
| terms | 商务条款 | 摘要 |
### I3. 售后退换货单 (return-exchange)
*(Sheet: `售后退换货单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| return_no | 退换单号 | |
| original_order | 原订单号 | |
| apply_date | 申请日期 | YYYY-MM-DD |
| reason | 退换原因 | 质量/错发/七天无理由等 |
| items | 退换商品 | 分号分隔 |
| refund_amount | 退款金额 | 数值 |
| logistics_no | 退回物流单号 | |
| approval_status | 审批状态 | 待审/通过/驳回 |
---
## J. 政务与合规类 (gov-compliance) 🆕
### J1. 招投标文件 (bidding-doc)
*(Sheet: `招投标文件`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| project_no | 项目编号 | |
| tenderer | 招标方 | |
| bidder | 投标方 | |
| bid_date | 开标日期 | YYYY-MM-DD |
| bid_price | 投标报价 | 数值 |
| tech_summary | 技术方案摘要 | ≤200字 |
| qualification | 资质证明 | 列出核心资质 |
| evaluation_score | 评标得分 | 数值 |
| result | 中标状态 | 中标/未中标/流标 |
### J2. 政务审批单 (gov-approval)
*(Sheet: `政务审批单`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| approval_no | 审批单号 | |
| applicant | 申请人/企业 | |
| matter | 申请事项 | |
| submit_date | 提交日期 | YYYY-MM-DD |
| process_nodes | 审批环节 | 分号分隔 |
| result | 批复结果 | 通过/驳回/补正 |
| valid_period | 有效期 | YYYY-MM-DD 至 YYYY-MM-DD |
| attachments | 附件清单 | 分号分隔 |
### J3. 合规审计报告 (compliance-audit)
*(Sheet: `合规审计报告`)*
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| audit_target | 审计对象 | |
| audit_period | 审计期间 | |
| auditor | 审计机构 | |
| high_issues | 高风险问题数 | 整数 |
| medium_issues | 中风险问题数 | 整数 |
| low_issues | 低风险问题数 | 整数 |
| rectification_status | 整改状态 | 未开始/进行中/已完成 |
| conclusion | 审计结论 | 摘要 |
| rating | 合规评级 | A/B/C/D |
---
## 未识别类 (unrecognized)
Sheet 名:`未识别`
无法匹配任何预定义子类型的文档归入此类。
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| raw_text_preview | 内容预览 | parsedText 前 300 字符 |
| possible_type | 疑似类型 | agent 的最佳猜测(如有) |
| confidence | 置信度 | 高/中/低 |
| reason | 未识别原因 | 内容模糊/类型未覆盖/OCR质量差 |
## 解析失败类 (parse-failed)
Sheet 名:`解析失败`
document-parser API 返回 status=failed 的文件。
| 字段名 | 中文表头 | 说明 |
|--------|---------|------|
| error_message | 错误信息 | API 返回的 errorMessage |
| file_type | 文件类型 | 文件扩展名 |
| file_size | 文件大小 | KB/MB |
---
## 📐 分类决策指引(V2.0 更新版)
### 核心优先级规则
1. **标题/抬头优先**:发票/回单/对账单/报销单 → A组;合同/协议 → 按内容判定归属(劳动→B2,采购→C1,其他→A5/D4)
2. **金额单据路由**:含税额/税号 → A1;含流水号/账户 → A2/A3;含报销明细 → A4;无税号仅金额 → A8
3. **证照/资质**:营业执照/身份证/护照/许可证 → D组,不混入财务
4. **医疗 vs 保险**:病历/处方/检验/体检 → E组;保单/理赔/结案 → F组
5. **技术/运维**:日志/漏洞/监控 → H组;客服/销售/退换 → I组;招投标/审批/审计 → J组
### 边界 Case 处理
6. **收据 vs 发票**:无税号/无税额 → A8;有完整税务信息 → A1
7. **红字/负数单据**:仍归原类,金额填负值,类型字段标注“红字/冲正”
8. **多页复合文档**:按首页/核心业务分类,在 `remarks` 或 `key_terms` 标注“含多类型附件”
9. **OCR 质量低/置信度不足**:强制归入 `unrecognized`,禁止强行提取
10. **空白/无文本/纯图片**:归入 `parse-failed` 或 `unrecognized`(视解析API返回状态)
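Rule 2 (routing of amount-bearing documents) and rule 6 (receipt vs. invoice) can be read as an ordered keyword check. The sketch below is a simplification under stated assumptions: it matches literal substrings, whereas real documents may label the same information differently, and `route_amount_document` is a hypothetical name.

```python
def route_amount_document(text):
    """Route an amount-bearing document by keyword, in the order given by rule 2."""
    if "税额" in text or "税号" in text:
        return "A1"      # tax information present: VAT invoice
    if "流水号" in text or "账户" in text:
        return "A2/A3"   # bank receipt or statement; told apart by detail vs. summary
    if "报销" in text:
        return "A4"      # expense claim
    return "A8"          # amount only, no tax ID: receipt / payment voucher

print(route_amount_document("购买方税号 91330000XXXXXXXXXX 税额 13.00"))
print(route_amount_document("今收到货款 500 元"))
```

The order of the checks matters: tax information is tested first so that an invoice that also mentions an account number still lands in A1.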
### 多语言映射(新增)
| 英文/外文 | 对应分类 | 备注 |
|-----------|----------|------|
| Commercial Invoice / Tax Invoice | A1 | 国际发票结构类似 |
| Bank Statement / Advice | A2/A3 | 按明细或汇总区分 |
| Payslip / Salary Advice | B4 | |
| Purchase Order (PO) / Sales Order (SO) | C1 / I2 | SO 归报价单或订单 |
| Delivery Note / Packing List | C2 | |
| Certificate of Analysis (COA) | C4 | |
| Bill of Lading (B/L) / AWB | G2 / G1 | |
| System Log / Error Log | H1 | |
| Vulnerability Scan Report | H2 | |
| Customer Ticket / Case | I1 | |
| Bidding Document / Tender | J1 | |
| Audit Report / Compliance Review | J3 | |
### 动态扩展机制
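If a document clearly does not belong to any of the existing 46 subtypes but its nature can still be identified:
1. Create a new subtype with a Sheet name of at most 31 characters
2. Define at least 5 core fields (including 1 analytical field of the date, amount, or status kind)
3. Submit the field definitions to the business owner for confirmation before adding them to the routing table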
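The first two extension rules are mechanical and can be checked before anything is submitted for confirmation. A minimal sketch, assuming each proposed field is a `(field_name, kind)` tuple where `kind` is a hypothetical tag such as `"date"`, `"amount"`, `"status"`, or `"text"`:

```python
MAX_SHEET_NAME_LENGTH = 31   # limit from rule 1; Excel itself caps sheet names at 31 characters
MIN_CORE_FIELDS = 5          # rule 2
ANALYTIC_KINDS = {"date", "amount", "status"}

def validate_new_subtype(sheet_name, fields):
    """Return a list of rule violations; an empty list means the proposal passes."""
    problems = []
    if len(sheet_name) > MAX_SHEET_NAME_LENGTH:
        problems.append("sheet name longer than 31 characters")
    if len(fields) < MIN_CORE_FIELDS:
        problems.append("fewer than 5 core fields")
    if not any(kind in ANALYTIC_KINDS for _, kind in fields):
        problems.append("no date/amount/status field")
    return problems

proposal = [
    ("report_no", "text"), ("test_date", "date"), ("sample_name", "text"),
    ("test_items", "text"), ("results", "text"),
]
print(validate_new_subtype("自定义_检测报告", proposal))
```

Rule 3 (confirmation by the business owner) is a human step and is deliberately left out of the check.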
---
FILE:references/document/module-document-parser.md
---
name: quickbi-document-parser
description: >
文档智能解析与结构化提取工具。当用户需要识别 PDF、Word、Excel、CSV、图片等文档内容,
或提取关键字段并生成结构化 Excel 报表时使用。支持单文件、批量文件和文件夹递归处理。
---
# QuickBI 文档解析工具
**核心能力**:
1. 📄 **文档内容识别**: 解析 PDF、Word、Excel、CSV、图片等非结构化文件为可读取的文本内容
2. 📊 **字段提取与汇总**: 从文档中智能提取核心字段,自动生成带格式的多 Sheet Excel 报表
## Scope
**Does:**
- 识别 PDF、Word(.doc/.docx)、Excel(.xls/.xlsx)、CSV、图片(.png/.jpg/.jpeg) 等文档内容
- 支持单文件、多文件批量处理、文件夹递归扫描
- 优先本地提取文本,失败后自动降级到远程 OCR
- Extracts core fields according to the predefined classification system (10 category groups, 46 subtypes)
- 支持未知文档的动态结构化提取(5+ 字段需用户确认)
- 生成带格式的多 Sheet Excel 报表(汇总统计 + 分类数据)
**Does NOT:**
- 不支持修改原始文档内容
- 不支持在线编辑 Excel
- 不支持非文档类文件(如视频、音频、可执行文件)
- 严禁杜撰或编造任何提取数据
## Instructions
本技能提供 **2 种使用模式**,根据用户意图自动选择:
### ⚠️ 模式判定规则(重要)
**严格按以下规则判断使用哪个模式**:
| 用户意图关键词 | 使用模式 | 说明 |
|--------------|---------|------|
| 识别、读取、提取文本、转成文本、查看内容 | **模式 A** | 仅需文档内容,不需要结构化 |
| 提取字段、生成 Excel、汇总报表、结构化、分类提取 | **模式 B** | 需要字段提取和 Excel 输出 |
| 解析 + 无后续说明 | **模式 A** | 默认仅识别内容 |
| 解析 + 明确提到字段/Excel/汇总 | **模式 B** | 需要完整流程 |
**关键原则**:
- 📌 **"解析"、"识别"、"读取" 等动词默认指向模式 A**
- 📌 **只有用户明确要求"提取字段"、"生成 Excel"、"汇总报表"时才使用模式 B**
- 📌 **不确定时,优先使用模式 A,然后询问用户是否需要生成 Excel**
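The mode rules above reduce to "Mode B only on an explicit signal, otherwise Mode A". A hedged sketch of that default follows. The keyword tuple is condensed from the rule table, and plain substring matching is an assumption that will misfire on negated phrasing such as a request saying structuring is not needed.

```python
# Signals for Mode B (field extraction + Excel), condensed from the rule table.
MODE_B_KEYWORDS = ("字段", "excel", "汇总", "结构化", "分类提取")

def choose_mode(request: str) -> str:
    """Return "B" when the request contains a Mode B keyword, else the default "A"."""
    text = request.lower()  # makes the "Excel" check case-insensitive
    return "B" if any(keyword in text for keyword in MODE_B_KEYWORDS) else "A"

print(choose_mode("帮我读取这个 PDF 的内容"))                # A
print(choose_mode("解析这些发票,提取关键字段并生成 Excel"))  # B
```

When the function returns `"A"` for an ambiguous request, the third principle still applies: finish Mode A, then ask whether an Excel report is wanted.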
---
### 模式 A: 文档内容识别
**适用场景**: 用户仅需读取文档中的文本内容,无需结构化提取
**处理流程**: Step 1 (文本识别)
**示例**:
- "帮我读取这个 PDF 的内容"
- "解析这个 PDF 的内容"
- "提取这些 Word 文档的文本"
- "扫描文件夹,把所有文档转成文本"
---
### 模式 B: 字段提取与 Excel 汇总
**适用场景**: 用户需要从众多文档中,提取核心字段并生成结构化的Excel
**处理流程**: Step 1 (文本识别) → Step 2 (字段提取) → Step 3 (生成 Excel)
**示例**:
- "解析这些发票,提取关键字段并生成 Excel"
- "扫描合同文件夹,汇总所有合同信息到Excel"
- "批量处理文档,按分类提取字段并导出"
---
### 工作流程概览
```
用户上传文件/文件夹
↓
┌─────────────────────────────────────┐
│ Step 1: 文本识别 │
│ 本地解析优先 → 失败降级远程 OCR │
│ 输出: JSON (file + parsedText) │
└─────────────────────────────────────┘
↓
[模式 A: 到此结束,返回文本内容]
↓
┌─────────────────────────────────────┐
│ Step 2: 字段提取 │
│ 智能分类 → 提取核心字段 │
│ 输出: JSON (分类 + 字段数据) │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 3: 生成 Excel 报表 │
│ 多 Sheet + 格式化 + 汇总统计 │
│ 输出: .xlsx 文件 │
└─────────────────────────────────────┘
↓
输出总结 + Excel 交付物
```
### Step 1: 文档内容识别
**目标**: 提取文档中的原始文本内容,生成 JSON 文件
**核心能力**: 📄 支持 PDF、Word、Excel、CSV、图片等多种格式的智能识别
**执行逻辑**:
1. **优先调用本地解析** (`document_local_parse.py`)
```bash
# 单文件
python scripts/document/document_local_parse.py <文件路径> --json
# 多文件
python scripts/document/document_local_parse.py <文件1> <文件2> <文件3> --json
# 文件夹(递归扫描)
python scripts/document/document_local_parse.py <文件夹路径> --json
```
2. **如果本地解析失败**,尝试远程 OCR (`document_remote_ocr.py`)
```bash
# 文件夹扫描
python scripts/document/document_remote_ocr.py <文件夹路径>
# 多文件
python scripts/document/document_remote_ocr.py --files <文件1> <文件2>
```
3. **输出格式**:
```json
[
{
"file": "filename.pdf",
"parsedText": "提取的完整文本内容..."
}
]
```
**注意事项**:
- 本地解析支持: PDF(PyMuPDF)、Word(python-docx)、Excel(openpyxl)、CSV(pandas)、图片(Tesseract OCR)
- 远程 OCR 支持: PDF、图片、Word、Excel、PPT(通过 QuickBI API)
- 单文件最大 10MB
- 默认输出到 `output/` 目录,带时间戳
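The local-first, remote-fallback order of Step 1 can be sketched as a thin wrapper around the two documented command lines. One assumption is labeled in the code: that both scripts signal failure through a non-zero exit code, which this document does not state. The `run` parameter exists only so the logic can be exercised without the scripts.

```python
import subprocess
import sys

LOCAL_SCRIPT = "scripts/document/document_local_parse.py"
REMOTE_SCRIPT = "scripts/document/document_remote_ocr.py"

def local_parse_cmd(paths):
    """Command line for local parsing of files or a folder, with JSON output."""
    return [sys.executable, LOCAL_SCRIPT, *paths, "--json"]

def remote_ocr_cmd(paths):
    """Command line for remote OCR over an explicit file list."""
    return [sys.executable, REMOTE_SCRIPT, "--files", *paths]

def parse_with_fallback(paths, run=subprocess.run):
    """Try local parsing first and fall back to remote OCR.

    Assumption: a non-zero exit code means the script failed.
    """
    if run(local_parse_cmd(paths)).returncode == 0:
        return "local"
    if run(remote_ocr_cmd(paths)).returncode == 0:
        return "remote"
    return "failed"
```

A call such as `parse_with_fallback(["invoice.pdf"])` would return `"local"`, `"remote"`, or `"failed"`, which the caller can report in the processing summary.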
### Step 2: 字段提取与智能分类
**目标**: 根据文档分类体系,从原始文本中提取核心字段
**核心能力**: 📊 智能分类 + 动态提取 + 用户确认机制
**执行逻辑**:
1. **Load the classification system**: see `references/document/document_classification.md`
- 10 category groups: A.财务与税务、B.人力资源、C.供应链与采购、D.行政与法务、E.医疗、F.保险、G.物流、H.技术与运维、I.客服与销售、J.政务与合规
- 46 subtypes: each subtype has explicit field definitions and Chinese headers.
2. **文档分类与字段提取**: 按照以下优先级策略处理
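**Priority 1: match the predefined classification system**
- Classify against `references/document/document_classification.md`
- Match on the title or letterhead first (e.g. "增值税发票", "银行回单")
- Route on key fields (e.g. a tax ID → A1; a serial number → A2/A3)
- Once matched, extract data strictly according to the field definitions of that subtype
**Priority 2: dynamic structured extraction**
- If none of the 46 predefined subtypes match, assess whether the document is worth extracting in structured form
- Criterion: whether **at least 5 valid fields** can be identified and extracted from the text
- If 5 or more fields can be extracted:
- Identify the field names and their values
- **The AskUserQuestion tool must be used to have the user confirm the field definitions**
- After confirmation, extract according to the confirmed field structure
- Create a temporary Sheet name for the new type (format: `自定义_{类型名}`)
**Priority 3: file under unrecognized**
- Applies when the document matches no predefined subtype **and** 5 or more fields cannot be extracted
- File it under "未识别" and record the content preview and the suspected type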
3. **字段提取**: 严格按照分类体系定义的字段提取
- 字段命名: 英文 `snake_case`
- Excel 表头: 中文名(括号内英文字段名)
- 每个子类型的隐含首列: `filename`(源文件名)
4. **组装 JSON**:
```json
{
"scan_time": "2026-04-07 15:00:00",
"total_files": 10,
"extraction_data": {
"增值税发票": {
"headers_cn": ["源文件名", "发票类型", "发票代码", "发票号码", "开票日期", "购买方名称", "销售方名称", "价税合计"],
"rows": [
["invoice_001.pdf", "专用", "033002100511", "03933249", "2023-05-14", "购买方公司", "销售方公司", "118.00"]
]
},
"未识别": {
"headers_cn": ["源文件名", "内容预览", "疑似类型", "置信度"],
"rows": [
["unknown.pdf", "这是一段文本...", "合同", "中"]
]
}
}
}
```
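The JSON layout above groups rows by Sheet name. A minimal sketch of that assembly step follows, assuming each classified document arrives as a dict with hypothetical `sheet`, `headers_cn`, and `row` keys, and that missing values are already empty strings.

```python
from datetime import datetime

def assemble_result(documents):
    """Group per-document rows by Sheet name into the Step 2 JSON structure."""
    extraction_data = {}
    for doc in documents:
        # The first document of a Sheet fixes its header row; later ones only add rows.
        sheet = extraction_data.setdefault(
            doc["sheet"], {"headers_cn": doc["headers_cn"], "rows": []}
        )
        sheet["rows"].append(doc["row"])
    return {
        "scan_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "total_files": len(documents),
        "extraction_data": extraction_data,
    }

docs = [
    {"sheet": "增值税发票", "headers_cn": ["源文件名", "发票代码"], "row": ["a.pdf", "033002100511"]},
    {"sheet": "增值税发票", "headers_cn": ["源文件名", "发票代码"], "row": ["b.pdf", ""]},
]
result = assemble_result(docs)
print(result["total_files"], list(result["extraction_data"]))
```

The second row keeps an empty string for the missing invoice code instead of a guessed value.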
**⚠️ 核心原则: 严禁杜撰数据**
- ✅ **允许**: 从 `parsedText` 中提取存在的字段值
- ✅ **允许**: 字段缺失时留空(空字符串)
- ❌ **禁止**: 编造不存在的字段和字段值,禁止杜撰数据
- ❌ **禁止**: 根据上下文推测或补全数据
- ❌ **禁止**: 修改原始文本内容
- ❌ **禁止**: 填充默认值(除非分类体系明确说明,如"币种默认 CNY")
**提取示例**:
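```python
import re

def extract_value(text: str, label: str) -> str:
    """Illustrative helper: return the value after `label`, or "" if the label is absent."""
    # Assumed pattern: label, a colon (full- or half-width), then a run of non-space characters.
    match = re.search(rf"{re.escape(label)}\s*[::]\s*(\S+)", text)
    return match.group(1) if match else ""

text = "发票号码:03933249"  # sample parsedText that contains no invoice code
# ✅ Correct: extract only what the text contains; a missing field stays empty
invoice_code = extract_value(text, "发票代码")  # empty string here, nothing is invented
# ❌ Wrong: fabricated data
# invoice_code = "1234567890"  # not in the text, so this is forbidden
```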
### Step 3: 生成 Excel 汇总报表
**目标**: 将提取的字段数据生成结构化、带格式的 Excel 报表
**核心能力**: 📈 多 Sheet 自动化 + 格式化 + 汇总统计
**执行命令**:
```bash
# 默认输出到 output/doc_scan_result_{timestamp}.xlsx
python scripts/document/generate_excel.py <Step2的JSON路径>
# 自定义输出路径
python scripts/document/generate_excel.py <Step2的JSON路径> /path/to/output.xlsx
```
**Excel 结构**:
- **excel名称** :`{category名称}_{timestamp}.xlsx`
- **汇总 Sheet**(首页): 统计各分类组的文件数量和提取字段
- **数据 Sheet**(每个子类型一个): 带格式的表格数据
- 蓝色表头(`#4472C4`)+ 白色粗体
- 自动筛选 + 冻结首行
- 自动列宽 + 单元格换行
### 最终交付
在窗口中输出:
1. **处理总结**:
```
文档解析完成
文件总数: 10
成功识别: 9
识别失败: 1
分类统计:
- A.财务与税务: 5 个文件(增值税发票 3, 银行回单 2)
- B.人力资源: 2 个文件(简历 1, 劳动合同 1)
- 未识别: 1 个文件
提取字段: 45 个
```
2. **Excel 交付物路径**:
```
✓ Excel 已生成: /path/to/output/invoice_20260407_150000.xlsx
```
## Examples
### 模式 A 示例
**Example 1: 解析单个文件内容**
Input:
```
请帮我读取这个 PDF 的内容: /Users/user/document.pdf
```
Expected output:
```
[Step 1] 本地解析 document.pdf...
[PDF提取] 成功提取 2350 字符
[保存] JSON 结果已保存到: output/extract_results_1775575200.json
文档解析完成
文件总数: 1
成功识别: 1
提取文本: 2350 字符
✓ 文本内容已保存: output/extract_results_1775575200.json
```
**Example 2: 批量解析文件夹**
Input:
```
扫描并解析 /Users/user/documents/ 下的所有文档,提取文本内容
```
Expected output:
```
[Step 1] 扫描文件夹...
[扫描] 在 /Users/user/documents/ 中找到 15 个支持的文件
[并行提取] 开始处理 15 个文件 (最大并行数: 10)
...
文档解析完成
文件总数: 15
成功识别: 14
识别失败: 1
总文本量: 45,230 字符
✓ 文本内容已保存: output/extract_results_1775576400.json
```
---
### 模式 B 示例
**Example 3: 解析发票并生成 Excel 表格**
Input:
```
请解析这些发票文件,提取关键字段并生成 Excel 报表: /Users/user/invoices/
```
Expected output:
```
[Step 1] 本地解析 invoices/ 文件夹...
[扫描] 找到 10 个支持的文件
[并行提取] 开始处理 10 个文件 (最大并行数: 10)
...
[Step 2] 智能分类与字段提取...
- 增值税发票: 5 个文件 (提取 13 个字段/文件)
- 银行回单: 3 个文件 (提取 11 个字段/文件)
- 未识别: 2 个文件
[Step 3] 生成 Excel 汇总报表...
[格式化] 应用蓝色表头 + 自动筛选 + 冻结首行
[保存] ✓ Excel 结果已保存到: output/doc_scan_result_20260407_150000.xlsx
文档解析完成
文件总数: 10
成功识别: 8
未识别: 2
分类统计:
- A.财务与税务: 8 个文件 (增值税发票 5, 银行回单 3)
- 未识别: 2 个文件
提取字段: 98 个
✓ Excel 报表已生成: output/doc_scan_result_20260407_150000.xlsx
```
**Example 4: 本地解析失败,降级到远程 OCR**
Input:
```
解析这个扫描件 PDF 并提取字段: /Users/user/scanned_invoice.pdf
```
Expected output:
```
[Step 1] 本地解析 scanned_invoice.pdf...
[PDF提取] 警告: 本地提取文本较少 (12 字符),可能是扫描件,尝试 OCR...
[PDF提取] 降级到 Tesseract OCR 识别...
[OCR降级] OCR 识别质量不佳,尝试远程 OCR...
[远程 OCR] 上传 scanned_invoice.pdf...
[上传] ✓ scanned_invoice.pdf -> taskId: abc123
[轮询] ✓ 任务解析成功 (850 字符)
[Step 2] 智能分类: 增值税发票 (vat-invoice)
提取字段: 发票代码、发票号码、开票日期、购买方名称... (从 OCR 文本中提取)
[Step 3] 生成 Excel 汇总报表...
[保存] ✓ Excel 结果已保存到: output/doc_scan_result_20260407_160000.xlsx
文档解析完成
文件总数: 1
成功识别: 1 (远程 OCR)
分类统计:
- A.财务与税务: 1 个文件 (增值税发票 1)
提取字段: 13 个
✓ Excel 报表已生成: output/doc_scan_result_20260407_160000.xlsx
```
**Example 5: 未知文档动态提取(需用户确认)**
Input:
```
解析这个自定义文档并提取字段: /Users/user/custom_report.pdf
```
Expected output:
```
[Step 1] 本地解析 custom_report.pdf...
[PDF提取] 成功提取 1580 字符
[Step 2] 智能分类...
⚠️ 无法匹配预定义的 46 个标准分类
🔍 评估文档结构化提取价值...
✓ 识别到 8 个潜在字段: 报告编号、检测日期、样品名称、检测项目、结果值、检测员、审核人、检测机构
[AskUserQuestion] 检测到未知文档类型,确认识别字段:
┌─────────────────────────────────────┐
│ 文档类型: 检测报告 (自定义) │
│ 识别字段: │
│ 1. 报告编号 (report_no) │
│ 2. 检测日期 (test_date) │
│ 3. 样品名称 (sample_name) │
│ 4. 检测项目 (test_items) │
│ 5. 结果值 (results) │
│ 6. 检测员 (inspector) │
│ 7. 审核人 (reviewer) │
│ 8. 检测机构 (testing_org) │
│ │
│ 是否确认按此结构提取? │
└─────────────────────────────────────┘
用户确认: ✓ 是
[Step 2] 按确认结构提取字段...
[提取] 成功提取 8 个字段
[Step 3] 生成 Excel 汇总报表...
[创建 Sheet] 自定义_检测报告
[保存] ✓ Excel 结果已保存到: output/doc_scan_result_20260407_170000.xlsx
============================================================
文档解析完成
============================================================
文件总数: 1
成功识别: 1 (自定义类型)
分类统计:
- 自定义_检测报告: 1 个文件
提取字段: 8 个
============================================================
✓ Excel 报表已生成: output/doc_scan_result_20260407_170000.xlsx
```
## Additional Resources
- **Detailed classification taxonomy**: [document_classification.md](./document_classification.md)
## Script Interface Reference
### 1. Local parsing script (`document_local_parse.py`)
**Purpose**: Extracts text entirely on the local machine from PDF, Word, Excel, CSV, and image files, with no dependency on external APIs.
**Supported formats**:
- PDF (.pdf), Word (.doc/.docx), Excel (.xls/.xlsx), CSV (.csv)
- Images (.png/.jpg/.jpeg/.bmp/.tiff/.webp), recognized with Tesseract OCR
**Command-line usage**:
```bash
# Single file
python scripts/document/document_local_parse.py <file-path> --json
# Multiple files
python scripts/document/document_local_parse.py <file1> <file2> <file3> --json
# Recursive folder scan
python scripts/document/document_local_parse.py <folder-path> --json
# Custom output directory
python scripts/document/document_local_parse.py <path> --json --output-dir /custom/output/
# Disable the OCR fallback
python scripts/document/document_local_parse.py <file-path> --json --no-ocr
```
**Key parameters**:
| Parameter | Description | Default |
|------|------|-------|
| `--json` | Save the result as JSON | False |
| `--output-dir` | Output directory for the JSON file | `output/` |
| `--no-ocr` | Disable the OCR fallback | False |
**Output format**:
```json
[
  {"file": "filename.pdf", "parsedText": "extracted text..."}
]
```
**System dependencies**:
```bash
# macOS
brew install tesseract tesseract-lang
# Ubuntu/Debian
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-eng
```
---
### 2. Remote OCR script (`document_remote_ocr.py`)
**Purpose**: Remote OCR through the Quick BI API, with concurrent batch processing.
**Supported formats**:
- PDF, images (.png/.jpg/.jpeg/.webp/.bmp/.gif/.jp2)
- Word (.doc/.docx), PPT (.ppt/.pptx), Excel (.xls/.xlsx/.csv)
- **File size limit**: 10 MB or less per file
**Command-line usage**:
```bash
# Folder scan (recursive)
python scripts/document/document_remote_ocr.py <folder-path>
# Multiple files
python scripts/document/document_remote_ocr.py --files <file1> <file2> <file3>
# Custom output path
python scripts/document/document_remote_ocr.py <path> --output /custom/result.json
# JSON mode (JSON only, no logs)
python scripts/document/document_remote_ocr.py <path> --json
# Adjust concurrency
python scripts/document/document_remote_ocr.py <path> --upload-workers 5 --poll-workers 10
```
**Key parameters**:
| Parameter | Type | Default | Description |
|------|------|--------|------|
| `directory` | optional | - | Directory path (scanned recursively) |
| `--files` | optional | - | File list (mutually exclusive with `directory`) |
| `--upload-workers` | int | 5 | Upload concurrency (maximum 10) |
| `--poll-workers` | int | 10 | Polling concurrency (maximum 10) |
| `--output` | str | - | Output JSON path |
| `--json` | flag | false | Emit JSON only (no logs) |
**Output format**:
```json
[
  {"file": "filename.pdf", "parsedText": "recognized text..."},
  {"file": "error.pdf", "parsedText": null, "error": "error message"}
]
```
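A caller usually needs to separate recognized files from failed ones before field extraction. The sketch below shows one way to do that with the output shape above. `split_ocr_results` is a hypothetical helper, not part of the scripts, and treating an entry with an empty `parsedText` as failed is an assumption.

```python
import json
from typing import List, Tuple


def split_ocr_results(raw_json: str) -> Tuple[List[dict], List[dict]]:
    """Split the script's JSON array into (succeeded, failed) entries."""
    succeeded, failed = [], []
    for item in json.loads(raw_json):
        # Assumption: an entry is failed if it has an "error" value or no text.
        if item.get("error") or not item.get("parsedText"):
            failed.append(item)
        else:
            succeeded.append(item)
    return succeeded, failed
```

Only the succeeded entries would then be passed on to classification and field extraction.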
**Configuration**: See the 「配置」 (Configuration) section of the main file.
---
### 3. Excel generation script (`generate_excel.py`)
**Purpose**: Converts the classified, extracted JSON data into a multi-sheet Excel workbook.
**Command-line usage**:
```bash
# Default output: output/doc_scan_result_{timestamp}.xlsx
python scripts/document/generate_excel.py <json-path>
# Custom output path
python scripts/document/generate_excel.py <json-path> /path/to/output.xlsx
```
**Input JSON format**:
```json
{
  "scan_time": "2026-04-07 15:00:00",
  "total_files": 10,
  "extraction_data": {
    "增值税发票": {
      "headers_cn": ["源文件名", "发票类型", "发票代码", "..."],
      "rows": [["file.pdf", "专用", "033002100511", "..."]]
    },
    "未识别": {
      "headers_cn": ["源文件名", "内容预览", "疑似类型", "置信度"],
      "rows": [["unknown.pdf", "文本...", "合同", "中"]]
    }
  }
}
```
**Excel structure**:
- **Summary sheet** (first sheet): file counts and extracted-field counts per classification group
- **Data sheets** (one per subtype): blue header row, auto-filter, frozen first row, auto-fitted column widths
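The input JSON can be assembled with a small helper before calling `generate_excel.py`. This is a minimal sketch under assumptions: `build_excel_payload` is a hypothetical name, and the check that every row has as many cells as `headers_cn` is an added safeguard rather than a documented requirement of the script.

```python
import json
from datetime import datetime
from typing import Dict, Optional


def build_excel_payload(
    sheets: Dict[str, dict],
    total_files: int,
    scan_time: Optional[str] = None,
) -> str:
    """Serialize extraction results into the input shape shown above."""
    for name, sheet in sheets.items():
        width = len(sheet["headers_cn"])
        for row in sheet["rows"]:
            # Assumed safeguard: each row should line up with the header row.
            if len(row) != width:
                raise ValueError(f"{name}: row has {len(row)} cells, expected {width}")
    payload = {
        "scan_time": scan_time or datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "total_files": total_files,
        "extraction_data": sheets,
    }
    # ensure_ascii=False keeps Chinese sheet names and headers readable.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

The returned string can be written to a file whose path is then passed as `<json-path>`.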
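## Installing Dependencies
```bash
# Install all Python dependencies (requirements.txt lives in the scripts directory)
pip install -r <project-root>/scripts/requirements.txt
# System dependencies (needed only for local parsing)
# macOS
brew install tesseract tesseract-lang
# Ubuntu/Debian
sudo apt-get install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-eng
```
**Core Python dependencies**:
- Local parsing: `PyMuPDF`, `python-docx`, `openpyxl`, `xlrd`, `pandas`, `pytesseract`, `Pillow`
- Remote OCR: `requests`, `pyyaml`
- Excel generation: `openpyxl>=3.1.0`
## Notes
1. **Mode selection priority**: Decide strictly by the 「模式判定规则」 (mode selection rules) table. When unsure, **prefer Mode A**, then ask the user whether an Excel workbook is needed.
2. **Data fidelity**: Field extraction in Step 2 must never fabricate values; every value must come from the `parsedText` produced in Step 1.
3. **Missing fields**: If a field does not appear in the text, leave it empty (`""`); do not invent a value.
4. **Classification fallback**: For documents that match no predefined category, first attempt dynamic extraction (5 or more fields); if that fails, file them under 「未识别」 (unrecognized).
5. **Confirming dynamic extraction**: When 5 or more fields are extracted from an unknown document, you **must** use AskUserQuestion to have the user confirm them.
6. **Output paths**: All output files go to the `output/` directory by default, with a timestamp in the name to avoid overwriting.
7. **Concurrency limits**: Remote OCR handles at most 10 files concurrently; local parsing runs at most 10 files in parallel.
8. **File size**: At most 10 MB per file (a remote OCR limit).
9. **OCR fallback strategy**: When local parsing extracts fewer than 50 characters from a PDF, it falls back to Tesseract OCR automatically; if that also fails, remote OCR is attempted.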
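The fallback order in note 9 can be sketched as follows. This is an illustration, not the scripts' actual code: `extract_with_fallback` and the three extractor callables are hypothetical, and reusing the 50-character threshold to judge the Tesseract result is an assumption (the note only says "if that also fails").

```python
from typing import Callable, Optional, Tuple

MIN_TEXT_CHARS = 50  # threshold stated in note 9


def extract_with_fallback(
    path: str,
    local_extract: Callable[[str], Optional[str]],
    tesseract_ocr: Callable[[str], Optional[str]],
    remote_ocr: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (text, source), trying local -> Tesseract -> remote in order."""
    text = local_extract(path) or ""
    if len(text.strip()) >= MIN_TEXT_CHARS:
        return text, "local"
    # Too little text: the PDF is probably a scan, so try local OCR next.
    ocr_text = tesseract_ocr(path) or ""
    if len(ocr_text.strip()) >= MIN_TEXT_CHARS:  # assumed success criterion
        return ocr_text, "tesseract"
    # Last resort: remote OCR.
    remote_text = remote_ocr(path)
    if remote_text:
        return remote_text, "remote"
    return text, "unrecognized"
```

Passing the extractors in as callables keeps the decision logic testable without Tesseract or network access.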
FILE:references/insight/module-data-insight.md
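# Data Insight Module
This module uses the Quick BI 小Q (Xiao Q) insight open API: it parses an uploaded Excel file (.xls / .xlsx) into Markdown tables, base64-encodes the result, and sends it to the streaming data insight endpoint to produce an in-depth analysis.
> For configuration, see the 「配置」 (Configuration) section of the main file.
## Environment Dependencies
Install command:
```bash
pip install requests pyyaml openpyxl xlrd
```
| Package | Required? | Purpose |
|--------|--------|------|
| `requests` | **Required** | HTTP requests (OpenAPI calls, SSE streaming requests) |
| `pyyaml` | **Required** | Reads the `config.yaml` configuration file |
| `openpyxl` | **Required** (for `.xlsx` files) | Parses Excel files in Office Open XML format |
| `xlrd` | **Required** (for `.xls` files) | Parses legacy Excel 97-2003 files |
## Prerequisites
- The user provides an Excel data file in `.xls` or `.xlsx` format
- When the data exceeds the size limit or there are multiple files, the Agent must preprocess the data first (see "Data Filtering" below)
## Data Filtering
**If any of the following holds, the Agent must filter the data before calling the script; otherwise it may call the script directly and skip filtering:**
- The user supplies **multiple Excel files**
- A single Excel file **may exceed 100,000 characters** of data (a large file)
- The user's question contains an **explicit filter condition** (a specific region, a time range, and so on)
> When the data exceeds 100,000 characters the script exits with an error (it does not truncate), and the Agent must filter the data and call it again.
Filtering steps:
1. Read the Excel file with Python (pandas / openpyxl)
2. Keep only the rows and columns relevant to the user's question
3. Save the filtered data as a new Excel file
4. Call this script with the new file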
```python
# Example: the user asks about sales in East China (华东), so filter to those rows first
import pandas as pd
df = pd.read_excel("/path/to/data.xlsx")
df_filtered = df[df["地区"] == "华东"]
df_filtered.to_excel("/tmp/filtered_data.xlsx", index=False)
# Then call this script with filtered_data.xlsx
```
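> If the data exceeds the limit but the user's question is a generic "interpret this data" with no explicit filter condition, keep all columns and take only the first N rows (so that the Markdown conversion stays within 100,000 characters), then save the truncated data as a new file and call the script with it.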
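Choosing N can be done by accumulating the rendered size row by row. The sketch below is a pure-Python illustration under assumptions: `markdown_row` and `max_rows_within_limit` are hypothetical helpers, and the pipe-delimited layout is only an approximation of how the script renders Markdown, so leave some headroom in practice.

```python
from typing import List, Sequence

CHAR_LIMIT = 100_000  # per-call limit described above


def markdown_row(cells: Sequence[object]) -> str:
    """Render one table row as '| a | b |' (assumed layout)."""
    return "| " + " | ".join(str(c) for c in cells) + " |"


def max_rows_within_limit(
    header: Sequence[object],
    rows: List[Sequence[object]],
    limit: int = CHAR_LIMIT,
) -> int:
    """Return how many leading rows fit within `limit` characters."""
    # Header line and separator line, each followed by a newline.
    used = len(markdown_row(header)) + 1
    used += len(markdown_row(["---"] * len(header))) + 1
    kept = 0
    for row in rows:
        cost = len(markdown_row(row)) + 1
        if used + cost > limit:
            break
        used += cost
        kept += 1
    return kept
```

With pandas, the result could then be applied as `df.head(n)` before saving the truncated file.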
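## Data Volume Handling Strategy
A single script call accepts **one Excel file only** and at most **100,000 characters** of Markdown data. **Above the limit the script exits with an error and does not truncate.** When the data exceeds the limit or the user supplies several Excel files, **the caller (the Agent) must finish preprocessing before calling the script**.
> **User-visibility constraints:**
> - When merging multiple files, tell the user: "正在整合多份数据并进行综合分析..." (merging the data sets for a combined analysis)
> - When filtering data, tell the user: "正在根据您的问题进行数据过滤,缩小分析范围..." (filtering the data to narrow the scope)
> - When calling in batches, tell the user: "正在对数据进行分批解读并汇总结果..." (interpreting the data in batches and consolidating the results)
> - **Never** expose technical details such as data volume limits, character caps, or failed calls to the user
### Handling multiple files (mandatory when the user supplies several Excel files)
The script accepts a single `--excel-file`, so multi-file scenarios must be merged first. Steps:
1. Read each Excel file with Python
2. Filter each data set by row and column according to the user's question, keeping only relevant data
3. Merge the filtered data sets into one Excel file
4. Call this skill with the merged file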
```python
# Example: the user supplies 3 files and asks about sales in East China (华东)
import pandas as pd
files = ["/path/to/sales_2023.xlsx", "/path/to/sales_2024.xlsx", "/path/to/sales_2025.xlsx"]
dfs = []
for f in files:
    df = pd.read_excel(f)
    # Keep only the rows relevant to the user's question
    if "地区" in df.columns:
        df = df[df["地区"] == "华东"]
    dfs.append(df)
merged = pd.concat(dfs, ignore_index=True)
merged.to_excel("/tmp/merged_data.xlsx", index=False)
# Then call this skill with merged_data.xlsx
```
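### Strategy 1: Fine-grained filtering (preferred)
If the data still exceeds the limit after the initial filtering, narrow it further: keep fewer columns, add filter conditions, restrict the time range, and so on.
### Strategy 2: Batched calls with consolidation
If filtering cannot shrink the data enough, split the Excel file by rows into several parts (each within 100,000 characters), call this skill on each part, and merge the per-batch results into one complete report.
Steps:
1. Read the Excel file with Python and split it by row count into sub-files (each keeps the original header row)
2. For each sub-file in turn, run `python scripts/insight/q_insights.py "<question>" --excel-file "/tmp/part_N.xlsx"`
3. Collect the results of every batch
4. Consolidate them into a single complete, coherent analysis report, removing duplicated content while keeping all key data and conclusions
5. Show the user only the final merged report; never show intermediate per-part results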
```python
# Example: split a large file into several parts
import math
import pandas as pd
df = pd.read_excel("/path/to/big_data.xlsx")
ROWS_PER_CHUNK = 500  # Tune to the column count so each part stays within 100,000 characters as Markdown
total_chunks = math.ceil(len(df) / ROWS_PER_CHUNK)
for i in range(total_chunks):
    chunk = df.iloc[i * ROWS_PER_CHUNK : (i + 1) * ROWS_PER_CHUNK]
    chunk.to_excel(f"/tmp/part_{i+1}.xlsx", index=False)
# Then call this skill on each part and consolidate the results
```
## Workflow
```mermaid
flowchart TB
input["Question + Excel file"] --> needFilter{"Multiple files / large file / explicit filter condition?"}
needFilter -- Yes --> filter["Agent filters data based on the user's question"]
needFilter -- No --> parseExcel["Parse Excel → Markdown table"]
filter --> saveFile["Save as a new Excel file"]
saveFile --> parseExcel
parseExcel --> checkLimit{"Data ≤ 100,000 characters?"}
checkLimit -- Yes --> encode["base64-encode the Markdown text"]
checkLimit -- No --> errorExit["Exit with error (filter or batch first)"]
encode --> interpret["POST to the streaming data insight endpoint"]
interpret --> parseSSE["Parse SSE events in real time"]
parseSSE --> reasoning["Output reasoning"]
parseSSE --> text["Output insight result"]
```
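### Run command
```bash
# Interpret via an Excel file (.xls and .xlsx are supported)
python scripts/insight/q_insights.py "各分公司业绩有什么趋势?" --excel-file "/path/to/data.xlsx"
python scripts/insight/q_insights.py "这个报表有什么异常?" --excel-file "/path/to/data.xls"
```
### Internal processing flow
1. **Parse the Excel file**: The parsing library is chosen from the file extension (`.xlsx` → `openpyxl`, `.xls` → `xlrd`). Multiple sheets are supported; the first row of each sheet is treated as the header, and each sheet is converted to Markdown table text.
2. **Encode the data**: The Markdown table text is encoded as UTF-8 bytes and then base64-encoded.
3. **Call the streaming data insight endpoint**: `POST /openapi/v2/smartq/dataInterpretationStream`. The request body is JSON (`stringData` (base64-encoded), `userQuestion`, `modelCode`), and the response is an SSE event stream.
**SSE event handling:**
- `reasoning` → output the reasoning process
- `text` / `summary` → output the insight result
- `finish` → the interpretation has ended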
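## Output
While running, the script prints the following in real time:
- `[Excel]` Excel file parsing status
- `[推理过程]` the AI's analysis and reasoning
- `[解读结果]` the final data insight content
- `[完成]` the interpretation has finished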
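The mapping from SSE event types to console tags can be sketched as a small dispatcher. This is an illustration only: `render_events` is a hypothetical function, and it takes already-parsed `(event_type, payload)` pairs because the wire format of the stream is not described here.

```python
from typing import Iterable, List, Tuple


def render_events(events: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (event_type, payload) pairs into tagged console lines."""
    lines: List[str] = []
    for event_type, payload in events:
        if event_type == "reasoning":
            lines.append(f"[推理过程] {payload}")
        elif event_type in ("text", "summary"):
            lines.append(f"[解读结果] {payload}")
        elif event_type == "finish":
            lines.append("[完成]")
            break  # nothing after `finish` is processed
        # Other event types are ignored in this sketch.
    return lines
```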
## Key Endpoint
| Endpoint | Method | Content-Type | Description |
|------|------|-------------|------|
| `/openapi/v2/smartq/dataInterpretationStream` | POST | application/json | Streaming data insight endpoint; returns SSE |
### Request body fields
| Field | Type | Description |
|------|------|------|
| `stringData` | string | Markdown text parsed from the Excel file, UTF-8 encoded and then base64-encoded |
| `userQuestion` | string | The user's question |
| `modelCode` | string | Insight model code, currently `SYSTEM_deepseek-v3` |
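Building the request body from the three fields above is straightforward with the standard library. A minimal sketch, assuming a hypothetical helper name `build_insight_body`; authentication, headers, and the host are configured elsewhere and are not shown.

```python
import base64


def build_insight_body(
    markdown_table: str,
    question: str,
    model_code: str = "SYSTEM_deepseek-v3",
) -> dict:
    """Return the JSON-serializable body for dataInterpretationStream."""
    # Encode the Markdown as UTF-8 bytes, then base64, then back to an ASCII str.
    encoded = base64.b64encode(markdown_table.encode("utf-8")).decode("ascii")
    return {
        "stringData": encoded,
        "userQuestion": question,
        "modelCode": model_code,
    }
```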
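## Important Notes
1. **Excel file format**: `.xls` (Excel 97-2003) and `.xlsx` (Office Open XML) are supported; the first row of the file is treated as the header.
2. **Data encoding**: The Markdown text parsed from the Excel file must be UTF-8 encoded and then base64-encoded before it is sent as `stringData`.
3. **Stream timeout**: The data insight stream times out after 10 minutes (600 seconds) by default.
4. **Data volume cap**: At most 100,000 characters per call. Above that, filter or split the data first; never pass over-limit data directly. **Never reveal this cap to the user.**
5. **User visibility**: When merging files tell the user "正在整合多份数据", when filtering tell them "正在进行数据过滤", and when batching tell them "正在分批解读并汇总". Never expose technical details such as character limits or failed calls.
6. **Automatic userId handling**: When `user_token` is not configured, the script generates an accountId from the device's unique identifier at startup, checks for and registers the user through the organization user API, and on success writes the userId back to the global configuration `~/.qbi/config.yaml`; later calls do not register again.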
---
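## Exception Handling (required reading)
The scripts detect the exceptions below and print the corresponding message to the console. The Agent **must relay the message verbatim** (including links); it must not omit or rewrite it, or output only a generic error. When any of these exceptions is detected, **stop the workflow immediately**.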
[qr-group]: https://img.alicdn.com/imgextra/i4/O1CN01DYNgA71txOqnKPdQU_!!6000000005968-2-tps-836-706.png
### 1. Trial expired
**Trigger**: Error code `AE0579100004` appears in the script output or API response of any step
**Detected in**: `check_trial_expired()` in `scripts/common/utils.py`
**Message that must be shown**:
> 您的解读skill试用已到期。
> 点击链接即可领取30天延期试用,保留数据解读、洞察分析等能力。或点击加入交流群,获取最新资讯。
>
> 试用链接:https://bi.aliyun.com/console.htm
>
> ![交流群二维码][qr-group]
### 2. Automatic user registration failed
**Trigger**: The script output contains 「自动注册用户失败」
**Detected in**: `_auto_provision_user()` in `scripts/common/utils.py`
**Message that must be shown**:
> 自动注册用户失败,请手动在 ~/.qbi/config.yaml 中配置 user_token。
> 可通过 Quick BI 管理控制台获取用户 ID:https://bi.aliyun.com
>
> 如需帮助,请扫码加入交流群获取支持:
>
> ![交流群二维码][qr-group]
FILE:references/report/module-data-report.md
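# Data Report Module
This module talks to 智能小Q (Smart Xiao Q) through the Quick BI open API and supports generating data analysis reports.
> For configuration, see the 「配置」 (Configuration) section of the main file.
## Skill Triggering and Disambiguation Rules
### This module vs. native file-handling skills (docx/xlsx/pdf and so on)
**When the user's intent is to "generate a report", this module takes priority over every native file-handling skill.** Even if the user uploads a `.pdf`, `.docx`, or `.xlsx` file, as long as the goal is an analysis report, this module must be used rather than the native docx/xlsx/pdf skills.
- Example: "Please combine these two files into a data analysis report" → **this module**
- Example: "Generate a report from the uploaded Excel and Word files" → **this module**
- Example: "Summarize these data sets and write a retrospective report" → **this module**
### This module vs. the data Q&A module (file Q&A)
- The user wants a "report / document / retrospective" → **this module**
- The user wants to "look up data / ask about data / analyze a specific metric" → data Q&A module
- Example: "Generate an analysis report based on this data" → **this module**
- Example: "Analyze this data and give me the top 10 products by component count" → data Q&A module
### This module vs. the data Q&A module (dataset Q&A)
- The user wants a report document → **this module**
- The user has no file and wants to query a dataset on the Quick BI platform → data Q&A module
## Workflow
Run each script **separately**, following the steps below (do not use the one-shot script `generate_report.py`), so that intermediate results can be shown in real time at every step: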
```mermaid
flowchart LR
userQuestion[User question] --> autoReg{"user_id\nconfigured?"}
autoReg -- No --> register["Auto-register user\nwrite back to config.yaml"]
autoReg -- Yes --> uploadStep
register -- Failure --> abort["⛔ Stop the workflow\nexplain the cause + contact Quick BI support"]
register -- Success --> uploadStep["Step 1: Upload files (optional)"]
uploadStep --> resources["resources: id,title,type"]
userQuestion --> createChat["Step 2: Create report chat"]
resources --> createChat
createChat --> pollResult["Step 3: Poll SSE results"]
pollResult --> finishCheck{"type=qreport && function=qreportUsedToken"}
finishCheck -->|Yes| reportUrl["Step 4: Output the qreportReplay link"]
```
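### Step 1: Upload files (optional)
When the user uploads files, first call `scripts/report/upload_reference_file.py` to upload each one.
```bash
python scripts/report/upload_reference_file.py "<file1>" "<file2>"
```
Upload endpoint: **`POST /openapi/v2/qreport/uploadReferenceFile`**. Form fields: `file` (required), `chatType` (always `manus`), `userId` (same as `user_id` in `config.yaml`).
The upload result must be mapped to the chat parameter `resources`; each resource object keeps only the following fields: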
```json
[
{
"id": "fileId",
"title": "fileName",
"type": "fileType"
}
]
```
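The mapping can be written as a short function. This is a sketch under a loud assumption: the upload result is taken to be a dict with `fileId`, `fileName`, and `fileType` keys, which is inferred from the placeholder values above and not confirmed by the endpoint's documentation. `to_resources` is a hypothetical helper name.

```python
from typing import Iterable, List


def to_resources(upload_results: Iterable[dict]) -> List[dict]:
    """Keep only id/title/type for each uploaded file."""
    resources = []
    for result in upload_results:
        resources.append(
            {
                "id": result["fileId"],       # assumed key in the upload result
                "title": result["fileName"],  # assumed key in the upload result
                "type": result["fileType"],   # assumed key in the upload result
            }
        )
    return resources
```

The list can be serialized with `json.dumps` and passed to `create_chat.py` through `--resources-json`.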
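Supported file formats are `doc`, `docx`, `xls`, `xlsx`, and `csv`; each file must be no larger than `10MB` (consistent with the Quick BI open API documentation).
### Step 2: Create the report chat
```bash
python scripts/report/create_chat.py "<user-question>"
```
The script prints `chatId` and `messageId` (no replay link). Record `chatId` for polling in the next step. **`reportUrl` is printed only when polling completes normally** (a `qreportUsedToken` event appears and there is no `error`), and it is printed by `query_report_result.py`.
If files were uploaded in Step 1, pass the resources through the `--resources-json` parameter:
```bash
python scripts/report/create_chat.py "<user-question>" --resources-json '<resources JSON>'
```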
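**Notes:**
- Chat creation endpoint: **`POST /openapi/v2/smartq/createQreportChat`**. The request body is JSON, and the response body is the `chatId` string itself.
- The request body always contains `"resources": []` and `"interruptFeedback": ""` (send the empty array and empty string even when no file is uploaded); after an upload, `resources` is replaced by the actual file list.
- **`attachment`** (required): a JSON string shaped as `{"resource": {"files": [...], "pages": [], "cubes": [], "dashboardFiles": []}, "useOnlineSearch": true}`. `useOnlineSearch` is always `true`; `pages`/`cubes`/`dashboardFiles` are always empty arrays; if files were uploaded, each object in `files` contains `fileId`/`fileType`/`iconType`/`file.name`/`fileName`, otherwise `files` is an empty array. See `example/qreport_input_with_attachment.json` for a complete example.
- **`bizArgs`** (required): an object containing at least the `qbiHost` field, whose value is `server_domain` from `config.yaml`.
- Complete request samples: `example/qreport_input.json` (no files) and `example/qreport_input_with_attachment.json` (with files).
- `chatId` is the key value for subsequent polling and is also the `caseId` of the final replay page.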
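The fixed parts of the request body described above can be assembled as follows. This is a partial sketch, not the script's implementation: `build_chat_fields` is a hypothetical helper, the field that carries the user's question is not named in this section and is therefore left out, and the `files` entries are passed through unchanged because their exact shape is only shown in the example files.

```python
import json
from typing import List, Optional


def build_chat_fields(
    server_domain: str,
    resources: Optional[List[dict]] = None,
    files: Optional[List[dict]] = None,
) -> dict:
    """Return the always-present fields of the createQreportChat body."""
    attachment = {
        "resource": {
            "files": files or [],
            "pages": [],
            "cubes": [],
            "dashboardFiles": [],
        },
        "useOnlineSearch": True,
    }
    return {
        "resources": resources or [],
        "interruptFeedback": "",
        # `attachment` is sent as a JSON string, not a nested object.
        "attachment": json.dumps(attachment, ensure_ascii=False),
        "bizArgs": {"qbiHost": server_domain},
    }
```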
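### Step 3: Poll for the result
Start polling with the `chatId` returned in Step 2; the script prints incremental content in real time:
```bash
python scripts/report/query_report_result.py "<chatId>"
```
Polling endpoint: **`GET /openapi/v2/smartq/qreportChatData`**. Query parameters: `chatId` (chat UUID) and `userId` (same as `user_id` in `config.yaml`).
The polling endpoint returns a **JSON array** whose elements have the form `{"data":"...", "type":"..."}`. The result model is in `example/output_model.txt`, and a complete normal output sample is in `example/qreport_output_data.json`. The script parses the array and keeps printing new content. Watch for the following event types:
| type | Description |
|------|------|
| `trace` | Trace ID (for example a UUID); the script prints `[trace] ...` |
| `heartbeat` / `check` | Heartbeat and flow control; silently skipped by the script |
| `error` | Report failure: **polling stops immediately** and the error message and trace are printed; the script also prints "当前报告生成失败,请联系产品服务同学排查问题。" (sample in `example/qreport_output_error.json`) |
| `plan` | Planning phase: `learn` (file learning), `thinking` (thinking), `mainText` (plan steps), `refuse` (refusal), `interrupt` (confirmation) |
| `schedule` | Task scheduling analysis |
| `step` | Execution step: contains `id`/`title`/`desc` |
| `actionThinking` | Thinking during execution (markdown) |
| `subStep` | Sub-step: `onlineSearchResult` (web search), `knowledgeBaseResult` (knowledge base), `reasoning` (reasoning), `structuredChart`/`unStructuredChart` (charts), `sql`/`dsl`/`learn`/`rewrite`/`answer`, and others |
| `qreport` | Report generation phase: the script **does not print** the report body, charts, or search results, only 「正在生成报告...」; `qreportUsedToken` marks completion |
| `finish` / `time` | End-of-stream markers; silently skipped by the script |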
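**Termination conditions:**
- `type=qreport` with `function=qreportUsedToken`: the report finished normally; **only in this case** does the script print the report link
- `type=error`: the report terminated abnormally. **Stop all subsequent steps immediately**, print the error message and trace, and tell the user "当前报告生成失败,请联系产品服务同学排查问题。"; the report link is **not** printed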
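The two termination conditions can be expressed as a classifier over one batch of polled events. This sketch rests on an assumption: the section does not say where the `function` marker lives, so `_function_of` (a hypothetical helper) looks for a top-level `function` key first and then for one inside `data` parsed as JSON. Errors take precedence, matching "`qreportUsedToken` and no `error`".

```python
import json
from typing import Iterable


def _function_of(event: dict) -> str:
    """Best-effort lookup of the `function` marker (its location is assumed)."""
    if "function" in event:
        return str(event["function"])
    try:
        data = json.loads(event.get("data") or "")
    except (TypeError, ValueError):
        return ""
    return str(data.get("function", "")) if isinstance(data, dict) else ""


def classify_events(events: Iterable[dict]) -> str:
    """Return "error", "done", or "running" for one batch of polled events."""
    batch = list(events)
    # An error anywhere in the batch wins over a completion marker.
    if any(ev.get("type") == "error" for ev in batch):
        return "error"
    for ev in batch:
        if ev.get("type") == "qreport" and _function_of(ev) == "qreportUsedToken":
            return "done"
    return "running"
```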
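### Step 4: Output the report link (normal completion only)
Only when the condition from the previous section holds (`qreportUsedToken` and no `error`) does the polling script print: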
```text
📊 报告链接
• 在线数据报告:
点击查看完整报告: {server_domain}/copilot/qreportReplay?caseId=<chatId>
```
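Here `{server_domain}` is the `server_domain` value from the configuration file and `<chatId>` is the UUID generated in Step 2 (identical to `caseId`). If the run ends with `type=error`, the script prints no report link; do not assemble or show a replay link yourself.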
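For reference, the link format above corresponds to the following string construction. `build_report_url` is a hypothetical helper shown only to make the pattern explicit; per the rule above, the agent relies on the link printed by the script rather than building one itself. Stripping a trailing slash from `server_domain` is an added convenience, not documented behavior.

```python
def build_report_url(server_domain: str, chat_id: str) -> str:
    """Compose the replay URL: {server_domain}/copilot/qreportReplay?caseId=<chatId>."""
    return f"{server_domain.rstrip('/')}/copilot/qreportReplay?caseId={chat_id}"
```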
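## Trial Expiry Handling (required reading)
When error code **`AE0579100004`** appears in the script output or API response of **any step**, the current user's trial of the analysis skill has expired. **Stop the entire workflow immediately** and show the user the following message **in full** (do not omit, rewrite, or truncate it):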
> 您的分析skill试用已到期。
> 点击链接即可领取30天延期试用,保留自动分析、报告生成等能力。或点击加入交流群,获取最新资讯。
>
> 试用链接:https://bi.aliyun.com/console.htm
> 交流群链接:https://img.alicdn.com/imgextra/i4/O1CN01JNZBQn1YhGxAi2R9B_!!6000000003090-2-tps-872-836.png
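The scripts detect this error code (`check_trial_expired()` in `scripts/common/utils.py`) and print the message above to the console. When replying to the user, the agent **must relay this message verbatim** (including both links), not just the raw error.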
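## Important Notes
1. **Trial expiry first**: When error code `AE0579100004` is detected, show the trial-expiry message first (see the trial-expiry section above); do not output only a generic error.
2. **Step-by-step execution**: Run the scripts in order, Step 1 → Step 2 → Step 3 → Step 4. **Do not use the one-shot `generate_report.py` script**; it blocks and cannot show intermediate results in real time.
3. **Do not parse files yourself**: When the user's input includes files (Excel/CSV/Word and so on), you **must upload them through the Step 1 endpoint as the workflow requires**, so that the 小Q report backend parses and analyzes them. **Never** read, parse, or analyze file contents yourself (for example reading Excel with pandas or parsing CSV with Python); all file processing is done by the Quick BI backend.
4. **One report per conversation** (critical): Within one conversation, **call `create_chat.py` only once**. Once you have a `chatId`, no matter how many follow-up questions the user asks and whether polling times out or fails, you **must reuse that `chatId`** and call `query_report_result.py` to keep polling. **Never call `create_chat.py` again to create a new chat.**
5. **Automatic switch to a running task** (strict): When the `createQreportChat` endpoint returns a message of the form 「当前用户已有运行中的任务,自动切换至该任务结果输出,问题:%s,chatId:%s」, the user already has a report task running. In that case you **must**:
   - **Show the returned message in full**: present the complete response to the user as-is (including the question and chatId), without omitting or rewriting anything
   - Use the `chatId` from the response (not the chatId passed in the request) for subsequent polling
   - The script already implements this and prints the switch notice; the agent must relay it to the user **in full**
6. **Polling interval**: By default `qreportChatData` is requested **every 10 seconds** (`utils.DEFAULT_POLL_INTERVAL_SECONDS = 10.0`); adjust with `python scripts/report/query_report_result.py "<chatId>" --poll-interval 5`.
7. **Timeout**: If polling runs for more than 30 minutes in total without a result, treat it as a failure.
8. **Error handling**: When `type=error` appears in the polling result, the script stops automatically, prints the error message and trace, and shows "当前报告生成失败,请联系产品服务同学排查问题。". The agent **must stop all subsequent steps immediately**, show the error message and trace to the user, **not** provide a report link, and **not** retry or continue.
9. **Automatic userId handling**: When `user_token` is not configured, the script generates an accountId from the device's unique identifier at startup, checks for and registers the user through the organization user API, and on success writes the userId back to the global configuration `~/.qbi/config.yaml`; later calls do not register again.
10. **Stop on any error**: If any step (user registration, file upload, chat creation, result polling) fails, stop the entire workflow immediately, explain the cause clearly to the user, and remind them: 「如需进一步帮助,请联系 Quick BI 产品服务同学获取支持。」 Do not skip the error and continue with later steps.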
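Notes 6 and 7 together describe a polling loop with a fixed interval and an overall deadline. The sketch below illustrates that shape only; it is not the code in `query_report_result.py`. `poll_until_done`, `fetch`, and `classify` are hypothetical names, and the injectable `sleep` and `clock` parameters exist so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, List

POLL_INTERVAL_SECONDS = 10.0   # default interval from note 6
POLL_TIMEOUT_SECONDS = 1800.0  # 30-minute limit from note 7


def poll_until_done(
    fetch: Callable[[], List[dict]],
    classify: Callable[[List[dict]], str],
    interval: float = POLL_INTERVAL_SECONDS,
    timeout: float = POLL_TIMEOUT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until `classify` stops returning "running" or the deadline passes."""
    start = clock()
    while clock() - start < timeout:
        state = classify(fetch())
        if state != "running":
            return state  # e.g. "done" or "error"; the caller stops either way
        sleep(interval)
    return "timeout"
```

A `"timeout"` or `"error"` result is terminal: per the notes above, the agent reports the failure and does not create a new chat.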
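## Output Guidance
- After creating the chat, output only `chatId` and `messageId`; do not show a replay link in advance
- During polling, output intermediate results such as thinking, plan steps, and web search in real time; the report body and chart content are **not printed**, only 「正在生成报告...」
- **Only on normal completion** does the script print the report link (with URL), which the agent uses to invite the user to open the full report; the script no longer prints the result JSON
- On failure, do not invent or assemble a replay address; show the error message and trace printed by the script, and tell the user "当前报告生成失败,请联系产品服务同学排查问题。"
- If files were uploaded, just report that the upload finished; the `resources` mapping does not need to be shown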
FILE:scripts/__init__.py
FILE:scripts/chat/__init__.py
FILE:scripts/chat/chart_renderer.py
# -*- coding: utf-8 -*-
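"""
Chart renderer: generates matplotlib charts from the structured data in result events.
Supported chart types (matching the Quick BI front end):
    Indicator card | Line | Combo
    Column | Stacked column | Percent column
    Bar | Stacked bar | Percent bar
    Ranking | Pie | Funnel
    Scatter | Bubble
    Grouped column (default for multiple metrics)
"""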
from __future__ import annotations
import glob
import os
import platform
import sys
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
try:
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    import matplotlib.patches as mpatches
    import matplotlib.font_manager as fm
    import numpy as np
    HAS_MPL = True
except ImportError:
    HAS_MPL = False
# ── 调色板 ──────────────────────────────────────────────────
PALETTE = [
"#5B8FF9", "#5AD8A6", "#F6BD16", "#E86452",
"#6DC8EC", "#945FB9", "#FF9845", "#1E9493",
"#FF99C3", "#269A99",
]
# ── chartType 映射 ──────────────────────────────────────────
_CHART_TYPE_MAP: Dict[str, str] = {
"indicator": "indicator",
"indicator-card": "indicator",
"line": "line",
"combo": "combo",
"combination": "combo",
"bar": "bar",
"column": "bar",
"grouped_bar": "grouped_bar",
"stacked_bar": "stacked_bar",
"stackedBar": "stacked_bar",
"percent_bar": "percent_bar",
"percentBar": "percent_bar",
"horizontal_bar": "horizontal_bar",
"horizontalBar": "horizontal_bar",
"strip": "horizontal_bar",
"stacked_horizontal_bar": "stacked_horizontal_bar",
"stackedHorizontalBar": "stacked_horizontal_bar",
"percent_horizontal_bar": "percent_horizontal_bar",
"percentHorizontalBar": "percent_horizontal_bar",
"ranking": "ranking",
"rank": "ranking",
"ranking-list": "ranking",
"pie": "pie",
"funnel": "funnel",
"scatter": "scatter",
"bubble": "bubble",
# table 类型不映射为 bar,保留原值让 render_chart 函数处理
"table": "table",
"wordCloud": "bar",
}
# ── 密集数据阈值 ────────────────────────────────────────────
MAX_DISPLAY_ITEMS = 20
MAX_DISPLAY_ITEMS_HORIZONTAL = 30
# ── 公共工具 ────────────────────────────────────────────────
def _format_value(v: float) -> str:
    abs_v = abs(v)
    if abs_v >= 1e8:
        return f"{v / 1e8:.2f}亿"
    if abs_v >= 1e4:
        return f"{v / 1e4:.2f}万"
    if abs_v == int(abs_v):
        return str(int(v))
    return f"{v:.2f}"
# ── 中文字体候选列表(按优先级排列) ────────────────────────
_CJK_FONT_CANDIDATES = [
"Hiragino Sans GB", "STHeiti", "Heiti TC",
"PingFang HK", "Songti SC", "Arial Unicode MS",
"PingFang SC", "Microsoft YaHei", "SimHei",
"Noto Sans CJK SC", "Noto Sans SC",
"WenQuanYi Micro Hei", "WenQuanYi Zen Hei",
"Droid Sans Fallback", "Source Han Sans SC",
"Source Han Sans CN",
]
# 系统字体目录:用于扫描未被 matplotlib 自动发现的 .ttf/.otf 文件
_EXTRA_FONT_DIRS = [
"/usr/share/fonts",
"/usr/local/share/fonts",
os.path.expanduser("~/.local/share/fonts"),
os.path.expanduser("~/.fonts"),
"/System/Library/Fonts",
"/Library/Fonts",
os.path.expanduser("~/Library/Fonts"),
# Windows
os.path.expandvars(r"%WINDIR%\Fonts"),
os.path.expandvars(r"%LOCALAPPDATA%\Microsoft\Windows\Fonts"),
]
_font_setup_done = False
def _scan_and_register_fonts():
"""扫描系统字体目录,将未被 matplotlib 发现的字体强制注册。"""
registered = 0
for font_dir in _EXTRA_FONT_DIRS:
if not os.path.isdir(font_dir):
continue
for ext in ("**/*.ttf", "**/*.otf", "**/*.ttc"):
for font_path in glob.glob(os.path.join(font_dir, ext), recursive=True):
try:
fm.fontManager.addfont(font_path)
registered += 1
except Exception:
pass
return registered
def _find_available_cjk_font() -> Optional[str]:
"""从候选列表中找到第一个 matplotlib 可用的中文字体名称。"""
available = {f.name for f in fm.fontManager.ttflist}
for name in _CJK_FONT_CANDIDATES:
if name in available:
return name
return None
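def _test_cjk_render(font_name: str) -> bool:
    """Check that *font_name* resolves to a real font file on disk.

    This is a lightweight existence check. It does not inspect glyph coverage,
    so a font that lacks CJK glyphs can still pass.
    """
    try:
        prop = fm.FontProperties(family=font_name)
        font_path = fm.findfont(prop, fallback_to_default=False)
        if font_path and os.path.exists(font_path):
            return True
    except Exception:
        pass
    return False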
def _setup_style():
global _font_setup_done
# 第一步:尝试从候选列表直接匹配
chosen_font = _find_available_cjk_font()
# 第二步:若无可用字体,强制扫描系统字体目录并重新匹配
if not chosen_font:
count = _scan_and_register_fonts()
if count > 0:
chosen_font = _find_available_cjk_font()
# 第三步:若仍无匹配,尝试从已注册字体中找任意含 CJK 关键词的字体
if not chosen_font:
cjk_keywords = [
"CJK", "Noto", "Han", "Hei", "Song", "Ming", "Fang",
"YaHei", "Gothic", "Droid", "WenQuan",
]
for f in fm.fontManager.ttflist:
if any(kw.lower() in f.name.lower() for kw in cjk_keywords):
if _test_cjk_render(f.name):
chosen_font = f.name
break
# 构建最终字体列表:已验证可用的字体放最前面
if chosen_font:
font_list = [chosen_font] + [
f for f in _CJK_FONT_CANDIDATES if f != chosen_font
] + ["sans-serif"]
else:
font_list = _CJK_FONT_CANDIDATES + ["sans-serif"]
if not _font_setup_done:
if chosen_font:
print(f"[chart_renderer] 使用中文字体: {chosen_font}", file=sys.stderr)
else:
_sys_name = platform.system()
if _sys_name == "Windows":
_install_hint = (
"建议安装: 在 Windows 设置 → 时间和语言 → 语言 → "
"中文(简体)→ 安装语言包,或手动安装 SimHei / Microsoft YaHei 字体"
)
elif _sys_name == "Darwin":
_install_hint = "建议安装: macOS 通常自带 PingFang SC,若缺失请通过字体册安装中文字体"
else:
_install_hint = (
"建议安装: yum install -y google-noto-sans-cjk-sc-fonts && fc-cache -fv"
)
print(
"[chart_renderer] 警告: 未找到可用中文字体,图表中文可能显示异常。"
f"{_install_hint}",
file=sys.stderr,
)
_font_setup_done = True
plt.rcParams.update({
"font.sans-serif": font_list,
"axes.unicode_minus": False,
"figure.dpi": 150,
"savefig.dpi": 150,
"savefig.bbox": "tight",
"savefig.pad_inches": 0.3,
})
def _safe_float(v: Any) -> Optional[float]:
if v is None or v == "-" or v == "":
return None
try:
return float(v)
except (ValueError, TypeError):
return None
def _build_labels(rows: List[dict], dims: List[dict]) -> List[str]:
dim_names = [d["fieldName"] for d in dims]
labels = []
for row in rows:
parts = [str(row.get(dn, "")) for dn in dim_names]
labels.append(" / ".join(parts) if len(parts) > 1 else parts[0])
return labels
def _extract_metric_values(rows: List[dict], metric_name: str) -> List[Optional[float]]:
return [_safe_float(r.get(metric_name)) for r in rows]
def _truncate_data(
    rows: List[dict],
    metrics: List[dict],
    limit: int,
) -> Tuple[List[dict], int]:
    """Truncate rows to *limit*, sorting by the first metric descending.

    Returns (truncated_rows, original_total).
    """
    total = len(rows)
    if total <= limit:
        return rows, total
    metric_name = metrics[0]["fieldName"]
    sorted_rows = sorted(rows, key=lambda r: _safe_float(r.get(metric_name)) or 0, reverse=True)
    return sorted_rows[:limit], total
def _apply_common_style(ax, title: str = "", ylabel: str = "", show_grid_y=True):
if title:
ax.set_title(title, fontsize=13, fontweight="bold", pad=12, color="#1e293b")
if ylabel:
ax.set_ylabel(ylabel, fontsize=9, color="#64748b")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_color("#e2e8f0")
ax.spines["bottom"].set_color("#e2e8f0")
ax.tick_params(colors="#94a3b8")
ax.set_axisbelow(True)
if show_grid_y:
ax.grid(axis="y", color="#f1f5f9", linewidth=0.8)
def _add_truncation_note(ax, shown: int, total: int, position: str = "bottom"):
if shown >= total:
return
note = f"显示前 {shown} 条(共 {total} 条)"
if position == "bottom":
ax.annotate(
note, xy=(0.5, -0.02), xycoords="axes fraction",
ha="center", va="top", fontsize=8, color="#94a3b8",
fontstyle="italic",
)
else:
ax.annotate(
note, xy=(0.98, 0.98), xycoords="axes fraction",
ha="right", va="top", fontsize=8, color="#94a3b8",
fontstyle="italic",
)
def _save_and_close(fig, path: Path):
fig.patch.set_facecolor("white")
fig.tight_layout()
fig.savefig(path)
plt.close(fig)
# ── 图表类型自动推断 ────────────────────────────────────────
# 仅支持单指标的图表类型,多指标时需回退到自动推断
_SINGLE_METRIC_TYPES = {"indicator", "bar", "horizontal_bar", "ranking", "pie", "funnel"}
def _infer_chart_type(
dims: List[dict],
metrics: List[dict],
chart_type_hint: str,
) -> str:
mapped = _CHART_TYPE_MAP.get(chart_type_hint, "")
n_dims = len(dims)
n_metrics = len(metrics)
# 服务端推荐 table 类型时,直接返回 table(后续会渲染为 Markdown 表格)
if mapped == "table":
return "table"
# 服务端推荐的类型仅支持单指标,但实际有多指标时,回退到自动推断
if mapped and not (n_metrics > 1 and mapped in _SINGLE_METRIC_TYPES):
return mapped
if n_dims == 0 and n_metrics >= 1:
return "indicator"
if n_dims >= 1 and n_metrics == 1:
return "bar"
if n_dims >= 1 and n_metrics > 1:
return "grouped_bar"
return "bar"
# ── 主入口 ──────────────────────────────────────────────────
def render_result_charts(
result_data: dict,
output_dir: str | Path,
*,
prefix: str = "chart",
) -> List[str]:
if not HAS_MPL:
return []
_setup_style()
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
data_list = result_data.get("dataList", [])
if not data_list:
return []
ts = int(time.time())
chart_paths: List[str] = []
for idx, dataset in enumerate(data_list, 1):
rows = dataset.get("data", [])
field_info = dataset.get("fieldInfo", [])
title = dataset.get("title") or ""
chart_type_hint = dataset.get("chartType") or ""
if not rows or not field_info:
continue
dims = [f for f in field_info if f.get("role") == "dimension"]
metrics = [f for f in field_info if f.get("role") == "metric"]
if not metrics:
continue
ctype = _infer_chart_type(dims, metrics, chart_type_hint)
path = output_dir / f"{prefix}_{ts}_{idx}.png"
# table 类型不渲染图片,直接跳过(由调用方使用 Markdown 表格 fallback)
if ctype == "table":
continue
renderer = _RENDERERS.get(ctype, _render_bar_chart)
try:
renderer(rows, dims, metrics, title, path)
except Exception:
_render_bar_chart(rows, dims, metrics, title, path)
if path.exists():
chart_paths.append(str(path))
return chart_paths
# =====================================================================
# 各图表类型渲染函数
# 统一签名: (rows, dims, metrics, title, path)
# =====================================================================
# ── 1. 指标看板 ─────────────────────────────────────────────
def _render_indicator(rows, dims, metrics, title, path):
metric_name = metrics[0]["fieldName"]
value = _safe_float(rows[0].get(metric_name))
if value is None:
return
display_title = title or metric_name
display_value = _format_value(value)
fig, ax = plt.subplots(figsize=(5, 3))
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axis("off")
ax.text(0.5, 0.62, display_value, ha="center", va="center",
fontsize=36, fontweight="bold", color="#1e293b")
ax.text(0.5, 0.28, display_title, ha="center", va="center",
fontsize=14, color="#64748b")
_save_and_close(fig, path)
# ── 2. 柱图 ─────────────────────────────────────────────────
def _render_bar_chart(rows, dims, metrics, title, path):
if not dims:
_render_indicator(rows, dims, metrics, title, path)
return
total = len(rows)
if total > MAX_DISPLAY_ITEMS:
rows, total = _truncate_data(rows, metrics, MAX_DISPLAY_ITEMS_HORIZONTAL)
return _render_bar_as_horizontal(rows, dims, metrics, title, path, total)
metric = metrics[0]
metric_name = metric["fieldName"]
labels = _build_labels(rows, dims)
values = _extract_metric_values(rows, metric_name)
valid = [(l, v) for l, v in zip(labels, values) if v is not None]
if not valid:
return
labels, values = zip(*valid)
fig_w = min(16, max(6, len(labels) * 0.8))
fig, ax = plt.subplots(figsize=(fig_w, 5))
bars = ax.bar(range(len(labels)), values, color=PALETTE[0], width=0.6, edgecolor="white")
show_val_labels = len(labels) <= 15
if show_val_labels:
for bar, val in zip(bars, values):
ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
_format_value(val), ha="center", va="bottom", fontsize=8, color="#475569")
_set_x_labels(ax, labels)
_apply_common_style(ax, title or metric_name, metric_name)
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
_save_and_close(fig, path)
def _render_bar_as_horizontal(rows, dims, metrics, title, path, original_total):
    """Render a horizontal bar chart for dense data (auto-switched from vertical bar)."""
    metric = metrics[0]
    metric_name = metric["fieldName"]
    labels = _build_labels(rows, dims)
    values = _extract_metric_values(rows, metric_name)
    valid = [(l, v) for l, v in zip(labels, values) if v is not None]
    if not valid:
        return
    labels, values = zip(*valid)
    # Reverse so the first row ends up at the top of the chart.
    labels = list(reversed(labels))
    values = list(reversed(values))
    n = len(labels)
    fig_h = max(5, n * 0.38)
    fig, ax = plt.subplots(figsize=(10, fig_h))
    max_val = max(values) if values else 1
    # All bars share one color; the three topmost bars are fully opaque and
    # the rest are slightly transparent.
    alphas = [1.0 if i >= n - 3 else 0.75 for i in range(n)]
    for i, (label, val) in enumerate(zip(labels, values)):
        ax.barh(i, val, height=0.6, color=PALETTE[0], alpha=alphas[i], edgecolor="white")
        ax.text(val + max_val * 0.01, i, _format_value(val),
                ha="left", va="center", fontsize=8, color="#475569")
    ax.set_yticks(range(n))
    ax.set_yticklabels(labels, fontsize=8)
    _apply_common_style(ax, title or metric_name, show_grid_y=False)
    ax.grid(axis="x", color="#f1f5f9", linewidth=0.8)
    ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
    _add_truncation_note(ax, n, original_total, position="top")
    _save_and_close(fig, path)
# ── 3. 分组柱状图 ───────────────────────────────────────────
def _render_grouped_bar(rows, dims, metrics, title, path):
if not dims:
_render_indicator(rows, dims, metrics, title, path)
return
original_total = len(rows)
if original_total > MAX_DISPLAY_ITEMS:
rows, original_total = _truncate_data(rows, metrics, MAX_DISPLAY_ITEMS)
labels = _build_labels(rows, dims)
metric_names = [m["fieldName"] for m in metrics]
data_matrix = [[_safe_float(r.get(mn)) for r in rows] for mn in metric_names]
n_g = len(labels)
n_m = len(metric_names)
bw = 0.7 / n_m
x = np.arange(n_g)
fig_w = min(16, max(7, n_g * 1.2))
fig, ax = plt.subplots(figsize=(fig_w, 5.5))
show_val_labels = n_g <= 12
for i, (mn, vals) in enumerate(zip(metric_names, data_matrix)):
offset = (i - n_m / 2 + 0.5) * bw
pv = [v if v is not None else 0 for v in vals]
bars = ax.bar(x + offset, pv, bw, label=mn, color=PALETTE[i % len(PALETTE)], edgecolor="white")
if show_val_labels:
for bar, val in zip(bars, vals):
if val is not None:
ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
_format_value(val), ha="center", va="bottom", fontsize=7, color="#475569")
_set_x_labels(ax, labels)
_apply_common_style(ax, title or "数据对比")
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
ax.legend(fontsize=8, loc="upper right", framealpha=0.9)
_add_truncation_note(ax, n_g, original_total)
_save_and_close(fig, path)
# ── 4. 堆积柱 ───────────────────────────────────────────────
def _render_stacked_bar(rows, dims, metrics, title, path):
if not dims:
return _render_indicator(rows, dims, metrics, title, path)
original_total = len(rows)
if original_total > MAX_DISPLAY_ITEMS:
rows, original_total = _truncate_data(rows, metrics, MAX_DISPLAY_ITEMS)
labels = _build_labels(rows, dims)
metric_names = [m["fieldName"] for m in metrics]
data_matrix = [[v if v is not None else 0 for v in _extract_metric_values(rows, mn)] for mn in metric_names]
n = len(labels)
x = np.arange(n)
fig_w = min(16, max(6, n * 0.9))
fig, ax = plt.subplots(figsize=(fig_w, 5.5))
show_val_labels = n <= 15
bottom = np.zeros(n)
for i, (mn, vals) in enumerate(zip(metric_names, data_matrix)):
arr = np.array(vals)
ax.bar(x, arr, 0.6, bottom=bottom, label=mn, color=PALETTE[i % len(PALETTE)], edgecolor="white")
if show_val_labels:
for j, (xi, v) in enumerate(zip(x, arr)):
if v != 0:
ax.text(xi, bottom[j] + v / 2, _format_value(v), ha="center", va="center", fontsize=7, color="white")
bottom += arr
_set_x_labels(ax, labels)
_apply_common_style(ax, title or "堆积柱状图")
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
ax.legend(fontsize=8, loc="upper right", framealpha=0.9)
_add_truncation_note(ax, n, original_total)
_save_and_close(fig, path)
# ── 5. Percent bar ──────────────────────────────────────────
def _render_percent_bar(rows, dims, metrics, title, path):
if not dims:
return _render_indicator(rows, dims, metrics, title, path)
original_total = len(rows)
if original_total > MAX_DISPLAY_ITEMS:
rows, original_total = _truncate_data(rows, metrics, MAX_DISPLAY_ITEMS)
labels = _build_labels(rows, dims)
metric_names = [m["fieldName"] for m in metrics]
data_matrix = [np.array([v if v is not None else 0 for v in _extract_metric_values(rows, mn)]) for mn in metric_names]
totals = sum(data_matrix)
totals[totals == 0] = 1
pct_matrix = [arr / totals * 100 for arr in data_matrix]
n = len(labels)
x = np.arange(n)
fig_w = min(16, max(6, n * 0.9))
fig, ax = plt.subplots(figsize=(fig_w, 5.5))
show_val_labels = n <= 15
bottom = np.zeros(n)
for i, (mn, pcts) in enumerate(zip(metric_names, pct_matrix)):
ax.bar(x, pcts, 0.6, bottom=bottom, label=mn, color=PALETTE[i % len(PALETTE)], edgecolor="white")
if show_val_labels:
for j, (xi, p) in enumerate(zip(x, pcts)):
if p > 5:
ax.text(xi, bottom[j] + p / 2, f"{p:.0f}%", ha="center", va="center", fontsize=7, color="white")
bottom += pcts
ax.set_ylim(0, 100)
_set_x_labels(ax, labels)
_apply_common_style(ax, title or "百分比柱状图")
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{x:.0f}%"))
ax.legend(fontsize=8, loc="upper right", framealpha=0.9)
_add_truncation_note(ax, n, original_total)
_save_and_close(fig, path)
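# A minimal sketch of the percent normalisation used by the percent-bar
# renderer, written in plain Python instead of NumPy. `to_percent_columns` is a
# hypothetical helper name, not part of this module, and it assumes every
# metric series has the same length.

```python
def to_percent_columns(series):
    """Convert parallel metric series into per-category percentages."""
    if not series:
        return []
    n = len(series[0])
    totals = [sum(s[j] for s in series) for j in range(n)]
    # Same idea as `totals[totals == 0] = 1`: never divide by zero.
    totals = [t if t != 0 else 1 for t in totals]
    # Multiply before dividing to keep rounding error small in this sketch.
    return [[100 * s[j] / totals[j] for j in range(n)] for s in series]


pct = to_percent_columns([[30, 0, 5], [70, 0, 15]])
```

# Each column of `pct` sums to 100, except an all-zero column, which stays at 0.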
# ── 6. Bar chart (horizontal) ───────────────────────────────
def _render_horizontal_bar(rows, dims, metrics, title, path):
if not dims:
return _render_indicator(rows, dims, metrics, title, path)
original_total = len(rows)
if original_total > MAX_DISPLAY_ITEMS_HORIZONTAL:
rows, original_total = _truncate_data(rows, metrics, MAX_DISPLAY_ITEMS_HORIZONTAL)
metric = metrics[0]
metric_name = metric["fieldName"]
labels = _build_labels(rows, dims)
values = _extract_metric_values(rows, metric_name)
valid = [(l, v) for l, v in zip(labels, values) if v is not None]
if not valid:
return
labels, values = zip(*valid)
labels = list(reversed(labels))
values = list(reversed(values))
n = len(labels)
fig_h = max(4, n * 0.38)
fig, ax = plt.subplots(figsize=(10, fig_h))
y = range(n)
bars = ax.barh(y, values, color=PALETTE[0], height=0.6, edgecolor="white")
for bar, val in zip(bars, values):
ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2,
" " + _format_value(val), ha="left", va="center", fontsize=8, color="#475569")
ax.set_yticks(y)
ax.set_yticklabels(labels, fontsize=8)
_apply_common_style(ax, title or metric_name, show_grid_y=False)
ax.grid(axis="x", color="#f1f5f9", linewidth=0.8)
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
_add_truncation_note(ax, n, original_total, position="top")
_save_and_close(fig, path)
# ── 7. Stacked bar (horizontal) ─────────────────────────────
def _render_stacked_horizontal_bar(rows, dims, metrics, title, path):
if not dims:
return _render_indicator(rows, dims, metrics, title, path)
labels = list(reversed(_build_labels(rows, dims)))
metric_names = [m["fieldName"] for m in metrics]
data_matrix = [np.array(list(reversed([v if v is not None else 0 for v in _extract_metric_values(rows, mn)]))) for mn in metric_names]
y = np.arange(len(labels))
fig, ax = plt.subplots(figsize=(8, max(4, len(labels) * 0.45)))
left = np.zeros(len(labels))
for i, (mn, vals) in enumerate(zip(metric_names, data_matrix)):
ax.barh(y, vals, 0.6, left=left, label=mn, color=PALETTE[i % len(PALETTE)], edgecolor="white")
for j, (yi, v) in enumerate(zip(y, vals)):
if v != 0:
ax.text(left[j] + v / 2, yi, _format_value(v), ha="center", va="center", fontsize=7, color="white")
left += vals
ax.set_yticks(y)
ax.set_yticklabels(labels, fontsize=9)
_apply_common_style(ax, title or "堆积条形图", show_grid_y=False)
ax.grid(axis="x", color="#f1f5f9", linewidth=0.8)
ax.legend(fontsize=8, loc="lower right", framealpha=0.9)
_save_and_close(fig, path)
# ── 8. Percent bar (horizontal) ─────────────────────────────
def _render_percent_horizontal_bar(rows, dims, metrics, title, path):
if not dims:
return _render_indicator(rows, dims, metrics, title, path)
labels = list(reversed(_build_labels(rows, dims)))
metric_names = [m["fieldName"] for m in metrics]
data_matrix = [np.array(list(reversed([v if v is not None else 0 for v in _extract_metric_values(rows, mn)]))) for mn in metric_names]
totals = sum(data_matrix)
totals[totals == 0] = 1
pct_matrix = [arr / totals * 100 for arr in data_matrix]
y = np.arange(len(labels))
fig, ax = plt.subplots(figsize=(8, max(4, len(labels) * 0.45)))
left = np.zeros(len(labels))
for i, (mn, pcts) in enumerate(zip(metric_names, pct_matrix)):
ax.barh(y, pcts, 0.6, left=left, label=mn, color=PALETTE[i % len(PALETTE)], edgecolor="white")
for j, (yi, p) in enumerate(zip(y, pcts)):
if p > 5:
ax.text(left[j] + p / 2, yi, f"{p:.0f}%", ha="center", va="center", fontsize=7, color="white")
left += pcts
ax.set_xlim(0, 100)
ax.set_yticks(y)
ax.set_yticklabels(labels, fontsize=9)
_apply_common_style(ax, title or "百分比条形图", show_grid_y=False)
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{x:.0f}%"))
ax.legend(fontsize=8, loc="lower right", framealpha=0.9)
_save_and_close(fig, path)
# ── 9. Ranking ──────────────────────────────────────────────
def _render_ranking(rows, dims, metrics, title, path):
if not dims:
return _render_indicator(rows, dims, metrics, title, path)
original_total = len(rows)
if original_total > MAX_DISPLAY_ITEMS_HORIZONTAL:
rows, original_total = _truncate_data(rows, metrics, MAX_DISPLAY_ITEMS_HORIZONTAL)
metric = metrics[0]
metric_name = metric["fieldName"]
labels = _build_labels(rows, dims)
values = _extract_metric_values(rows, metric_name)
valid = [(l, v) for l, v in zip(labels, values) if v is not None]
if not valid:
return
labels, values = zip(*valid)
labels = list(reversed(labels))
values = list(reversed(values))
n = len(labels)
max_val = max(values) or 1  # values is non-empty here; avoid a zero-width x-range when every value is 0
fig_h = max(4, n * 0.42)
fig, ax = plt.subplots(figsize=(10, fig_h))
y = range(n)
for i, (label, val) in enumerate(zip(labels, values)):
rank = n - i
color = PALETTE[0] if rank <= 3 else "#cbd5e1"
bar_alpha = 1.0 if rank <= 3 else 0.6
ax.barh(i, val, height=0.55, color=color, alpha=bar_alpha, edgecolor="white")
ax.text(-max_val * 0.02, i, f"{rank}", ha="right", va="center",
fontsize=10, fontweight="bold", color=PALETTE[0] if rank <= 3 else "#94a3b8")
ax.text(val + max_val * 0.01, i, _format_value(val), ha="left", va="center",
fontsize=8, color="#475569")
ax.set_yticks(y)
ax.set_yticklabels(labels, fontsize=8)
ax.set_xlim(-max_val * 0.05, max_val * 1.18)
_apply_common_style(ax, title or metric_name, show_grid_y=False)
ax.spines["left"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.tick_params(left=False, bottom=False, labelbottom=False)
ax.grid(False)
_add_truncation_note(ax, n, original_total, position="top")
_save_and_close(fig, path)
# ── 10. Line chart ──────────────────────────────────────────
def _render_line(rows, dims, metrics, title, path):
if not dims:
return _render_indicator(rows, dims, metrics, title, path)
original_total = len(rows)
if original_total > MAX_DISPLAY_ITEMS * 2:
rows = rows[:MAX_DISPLAY_ITEMS * 2]
labels = _build_labels(rows, dims)
n = len(labels)
x = np.arange(n)
fig_w = min(18, max(6, n * 0.6))
fig, ax = plt.subplots(figsize=(fig_w, 5))
show_val_labels = n <= 15
marker_size = 5 if n <= 20 else (3 if n <= 30 else 0)
for i, m in enumerate(metrics):
mn = m["fieldName"]
values = _extract_metric_values(rows, mn)
plot_vals = [v if v is not None else float("nan") for v in values]
color = PALETTE[i % len(PALETTE)]
ax.plot(x, plot_vals, marker="o" if marker_size > 0 else None,
markersize=marker_size, linewidth=2, color=color, label=mn)
if show_val_labels:
for xi, val in zip(x, values):
if val is not None:
ax.text(xi, val, _format_value(val), ha="center", va="bottom", fontsize=7, color=color)
_set_x_labels(ax, labels)
_apply_common_style(ax, title or metrics[0]["fieldName"])
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
if len(metrics) > 1:
ax.legend(fontsize=8, loc="upper right", framealpha=0.9)
_add_truncation_note(ax, n, original_total)
_save_and_close(fig, path)
# ── 11. Combo chart (first metric as bars, the rest as lines) ──
def _render_combo(rows, dims, metrics, title, path):
if not dims or len(metrics) < 2:
return _render_line(rows, dims, metrics, title, path)
original_total = len(rows)
if original_total > MAX_DISPLAY_ITEMS:
rows, original_total = _truncate_data(rows, metrics, MAX_DISPLAY_ITEMS)
labels = _build_labels(rows, dims)
n = len(labels)
x = np.arange(n)
fig_w = min(16, max(7, n * 0.9))
fig, ax1 = plt.subplots(figsize=(fig_w, 5.5))
bar_metric = metrics[0]
bar_vals = [v if v is not None else 0 for v in _extract_metric_values(rows, bar_metric["fieldName"])]
ax1.bar(x, bar_vals, 0.5, color=PALETTE[0], alpha=0.7, label=bar_metric["fieldName"], edgecolor="white")
ax1.set_ylabel(bar_metric["fieldName"], fontsize=9, color=PALETTE[0])
ax1.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
marker_size = 4 if n <= 20 else 0
ax2 = ax1.twinx()
for i, m in enumerate(metrics[1:], 1):
mn = m["fieldName"]
vals = [v if v is not None else float("nan") for v in _extract_metric_values(rows, mn)]
color = PALETTE[i % len(PALETTE)]
ax2.plot(x, vals, marker="o" if marker_size > 0 else None,
markersize=marker_size, linewidth=2, color=color, label=mn)
ax2.set_ylabel(metrics[1]["fieldName"], fontsize=9, color=PALETTE[1])
ax2.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
_set_x_labels(ax1, labels)
_apply_common_style(ax1, title or "组合图")
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, fontsize=8, loc="upper right", framealpha=0.9)
ax2.spines["top"].set_visible(False)
_add_truncation_note(ax1, n, original_total)
_save_and_close(fig, path)
# ── 12. Pie chart ───────────────────────────────────────────
_MAX_PIE_SLICES = 10
def _render_pie(rows, dims, metrics, title, path):
metric = metrics[0]
metric_name = metric["fieldName"]
if dims:
labels = _build_labels(rows, dims)
else:
labels = [metric_name]
values = _extract_metric_values(rows, metric_name)
valid = [(l, v) for l, v in zip(labels, values) if v is not None and v > 0]
if not valid:
return
labels, values = zip(*valid)
if len(labels) > _MAX_PIE_SLICES:
pairs = sorted(zip(labels, values), key=lambda x: x[1], reverse=True)
top = pairs[:_MAX_PIE_SLICES - 1]
others_sum = sum(v for _, v in pairs[_MAX_PIE_SLICES - 1:])
labels = [l for l, _ in top] + ["其他"]
values = [v for _, v in top] + [others_sum]
colors = [PALETTE[i % len(PALETTE)] for i in range(len(labels))]
fig, ax = plt.subplots(figsize=(7, 5.5))
wedges, texts, autotexts = ax.pie(
values,
labels=None,
autopct=lambda pct: f"{pct:.1f}%" if pct > 3 else "",
colors=colors,
startangle=90,
pctdistance=0.75,
wedgeprops={"edgecolor": "white", "linewidth": 1.5},
)
for t in autotexts:
t.set_fontsize(8)
t.set_color("white")
t.set_fontweight("bold")
legend_labels = [f"{l} {_format_value(v)}" for l, v in zip(labels, values)]
ax.legend(wedges, legend_labels, loc="center left", bbox_to_anchor=(1.05, 0.5),
fontsize=8, frameon=False)
display_title = title or metric_name
ax.set_title(display_title, fontsize=13, fontweight="bold", pad=12, color="#1e293b")
_save_and_close(fig, path)
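# The slice-capping step in the pie renderer (keep the largest entries, sum the
# rest into a single "other" slice) can be sketched on its own as follows.
# `collapse_small_slices` and its parameters are hypothetical names used only
# for illustration; the renderer above labels the combined slice "其他".

```python
def collapse_small_slices(labels, values, max_slices=10, other_label="Other"):
    """Keep the largest max_slices - 1 entries and sum the rest into one slice."""
    if len(labels) <= max_slices:
        return list(labels), list(values)
    pairs = sorted(zip(labels, values), key=lambda p: p[1], reverse=True)
    top = pairs[:max_slices - 1]
    rest_total = sum(v for _, v in pairs[max_slices - 1:])
    return [l for l, _ in top] + [other_label], [v for _, v in top] + [rest_total]


labels, values = collapse_small_slices(list("abcde"), [5, 1, 4, 2, 3], max_slices=3)
```

# With max_slices=3 the two largest entries survive and the remaining three are
# merged, so the total of `values` is unchanged.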
# ── 13. Funnel chart ────────────────────────────────────────
def _render_funnel(rows, dims, metrics, title, path):
metric = metrics[0]
metric_name = metric["fieldName"]
if dims:
labels = _build_labels(rows, dims)
else:
labels = [str(i + 1) for i in range(len(rows))]
values = _extract_metric_values(rows, metric_name)
valid = [(l, v) for l, v in zip(labels, values) if v is not None and v > 0]
if not valid:
return
labels, values = zip(*valid)
max_val = max(values)
n = len(labels)
fig, ax = plt.subplots(figsize=(8, max(4, n * 0.7)))
ax.set_xlim(0, 1)
ax.set_ylim(0, n)
ax.axis("off")
for i, (label, val) in enumerate(zip(labels, values)):
width = val / max_val * 0.8
left = (1 - width) / 2
y_bottom = n - i - 1
color = PALETTE[i % len(PALETTE)]
rect = mpatches.FancyBboxPatch(
(left, y_bottom + 0.1), width, 0.75,
boxstyle="round,pad=0.02", facecolor=color, edgecolor="white", linewidth=1.5,
)
ax.add_patch(rect)
ax.text(0.5, y_bottom + 0.48, f"{label} {_format_value(val)}",
ha="center", va="center", fontsize=10, color="white", fontweight="bold")
display_title = title or metric_name
ax.set_title(display_title, fontsize=13, fontweight="bold", pad=8, color="#1e293b")
_save_and_close(fig, path)
# ── 14. Scatter plot ────────────────────────────────────────
def _render_scatter(rows, dims, metrics, title, path):
if len(metrics) < 2:
return _render_bar_chart(rows, dims, metrics, title, path)
x_metric = metrics[0]["fieldName"]
y_metric = metrics[1]["fieldName"]
x_vals = _extract_metric_values(rows, x_metric)
y_vals = _extract_metric_values(rows, y_metric)
labels = _build_labels(rows, dims) if dims else [None] * len(rows)
valid = [(xv, yv, l) for xv, yv, l in zip(x_vals, y_vals, labels) if xv is not None and yv is not None]
if not valid:
return
xs, ys, ls = zip(*valid)
fig, ax = plt.subplots(figsize=(7, 5.5))
ax.scatter(xs, ys, s=60, c=PALETTE[0], alpha=0.7, edgecolors="white", linewidths=0.5)
for xv, yv, l in zip(xs, ys, ls):
if l:
ax.annotate(l, (xv, yv), textcoords="offset points", xytext=(5, 5), fontsize=7, color="#64748b")
ax.set_xlabel(x_metric, fontsize=10, color="#475569")
ax.set_ylabel(y_metric, fontsize=10, color="#475569")
_apply_common_style(ax, title or "散点图")
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
_save_and_close(fig, path)
# ── 15. Bubble chart ────────────────────────────────────────
def _render_bubble(rows, dims, metrics, title, path):
if len(metrics) < 2:
return _render_scatter(rows, dims, metrics, title, path)
x_metric = metrics[0]["fieldName"]
y_metric = metrics[1]["fieldName"]
size_metric = metrics[2]["fieldName"] if len(metrics) >= 3 else None
x_vals = _extract_metric_values(rows, x_metric)
y_vals = _extract_metric_values(rows, y_metric)
size_vals = _extract_metric_values(rows, size_metric) if size_metric else [None] * len(rows)
labels = _build_labels(rows, dims) if dims else [None] * len(rows)
valid = [(xv, yv, sv, l) for xv, yv, sv, l in zip(x_vals, y_vals, size_vals, labels)
if xv is not None and yv is not None]
if not valid:
return
xs, ys, ss, ls = zip(*valid)
if any(s is not None for s in ss):
s_arr = np.array([s if s is not None else 0 for s in ss])
max_s = s_arr.max() if s_arr.max() > 0 else 1
sizes = (s_arr / max_s) * 800 + 30
else:
sizes = [120] * len(xs)
fig, ax = plt.subplots(figsize=(7, 5.5))
ax.scatter(xs, ys, s=sizes, c=PALETTE[0], alpha=0.5, edgecolors=PALETTE[0], linewidths=1)
for xv, yv, l in zip(xs, ys, ls):
if l:
ax.annotate(l, (xv, yv), textcoords="offset points", xytext=(5, 5), fontsize=7, color="#64748b")
ax.set_xlabel(x_metric, fontsize=10, color="#475569")
ax.set_ylabel(y_metric, fontsize=10, color="#475569")
_apply_common_style(ax, title or "气泡图")
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: _format_value(x)))
_save_and_close(fig, path)
# ── X-axis label helper ─────────────────────────────────────
def _set_x_labels(ax, labels):
n = len(labels)
ax.set_xticks(range(n))
if n <= 6:
ax.set_xticklabels(labels, fontsize=9, rotation=0, ha="center")
elif n <= 15:
ax.set_xticklabels(labels, fontsize=8, rotation=30, ha="right")
elif n <= MAX_DISPLAY_ITEMS:
ax.set_xticklabels(labels, fontsize=7, rotation=45, ha="right")
else:
step = max(1, n // 15)
display = [labels[i] if i % step == 0 else "" for i in range(n)]
ax.set_xticklabels(display, fontsize=7, rotation=45, ha="right")
# ── Renderer registry ───────────────────────────────────────
_RENDERERS = {
"indicator": _render_indicator,
"bar": _render_bar_chart,
"grouped_bar": _render_grouped_bar,
"stacked_bar": _render_stacked_bar,
"percent_bar": _render_percent_bar,
"horizontal_bar": _render_horizontal_bar,
"stacked_horizontal_bar": _render_stacked_horizontal_bar,
"percent_horizontal_bar": _render_percent_horizontal_bar,
"ranking": _render_ranking,
"line": _render_line,
"combo": _render_combo,
"pie": _render_pie,
"funnel": _render_funnel,
"scatter": _render_scatter,
"bubble": _render_bubble,
}
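# A registry like the one above is typically consumed with a dictionary lookup
# plus a default. The sketch below is self-contained and uses stand-in
# renderers; `dispatch`, `render_number`, `render_bars` and the choice of "bar"
# as the fallback are assumptions, since the real dispatch code is not shown in
# this section.

```python
def render_number(data):
    return f"number:{data}"


def render_bars(data):
    return f"bars:{data}"


RENDERERS = {"indicator": render_number, "bar": render_bars}


def dispatch(chart_type, data, default="bar"):
    """Look up a renderer by chart type, falling back to a default renderer."""
    renderer = RENDERERS.get(chart_type, RENDERERS[default])
    return renderer(data)


known = dispatch("indicator", 42)
fallback = dispatch("radar", [1, 2])
```

# Unknown chart types fall through to the default instead of raising KeyError.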
# =====================================================================
# Single-chart entry point (used by dataset Q&A / bichart mode)
# =====================================================================
def render_chart(chart_data: dict, output_dir: str | Path = ".") -> Optional[str]:
    """Render a single chart payload to PNG and return the file path, or None on failure.
    ``chart_data`` has the same shape as one element of the ``dataList`` passed to
    ``render_result_charts``: ``{data, fieldInfo, chartType, title, id?}``.
    """
    if not HAS_MPL:
        return None
    output_dir = Path(output_dir)
    wrapper = {"dataList": [chart_data]}
    # ``id`` is optional and may be None or a non-string, so normalise it before slicing.
    chart_id = str(chart_data.get("id") or "out")
    prefix = f"chart_{chart_id[:8]}"
    paths = render_result_charts(wrapper, output_dir, prefix=prefix)
    return paths[0] if paths else None
# =====================================================================
# Markdown table fallback (used when matplotlib is unavailable)
# =====================================================================
def chart_data_to_markdown(chart_data: dict) -> str:
    """Convert chart data into Markdown table text."""
    data = chart_data.get("data", [])
    field_info = chart_data.get("fieldInfo", [])
    title = chart_data.get("title", "")
    if not data or not field_info:
        # Skip the bold wrapper when there is no title; otherwise it renders as "****".
        return f"**{title}**\n\n(无数据)" if title else "(无数据)"
    headers = [f.get("fieldName", f"列{i}") for i, f in enumerate(field_info)]
    lines: List[str] = [f"**{title}**\n"] if title else []
    lines.append("| " + " | ".join(headers) + " |")
    lines.append("|" + "|".join([" --- "] * len(headers)) + "|")
    for row in data[:50]:
        # Escape pipes so a cell value cannot break the table layout.
        row_vals = [str(row.get(h, "")).replace("|", "\\|") for h in headers]
        lines.append("| " + " | ".join(row_vals) + " |")
    if len(data) > 50:
        lines.append(f"\n(共 {len(data)} 行,仅显示前 50 行)")
    return "\n".join(lines)
FILE:scripts/chat/cube_resolver.py
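# -*- coding: utf-8 -*-
"""
Dataset resolution module: smart table selection, user permission lookup, and dataset relevance ranking.
When the user does not specify a cubeId, this module picks the best-matching dataset automatically.
"""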
from __future__ import annotations
import json
import sys
from difflib import SequenceMatcher
from pathlib import Path
from typing import Any, Dict, List, Optional
sys.path.insert(0, str(Path(__file__).parent.parent))
from common.utils import read_config, require_user_id, request_openapi
# ---------------------------------------------------------------------------
# User permission lookup
# ---------------------------------------------------------------------------
def query_accessible_cubes(*, config: Optional[dict] = None) -> List[dict]:
    """
    Call GET /openapi/v2/smartq/query/llmCubeWithThemeList to list the Q&A datasets
    the user has permission to query.
    Returns a list of the form [{"cubeId": "xxx", "cubeName": "yyy"}, ...].
    """
    config = config or read_config()
    user_id = require_user_id(config)
    resp = request_openapi(
        "GET",
        "/openapi/v2/smartq/query/llmCubeWithThemeList",
        params={"userId": user_id, "runningBySkill": "true"},
        config=config,
    )
    result = resp.json()
    if not result.get("success", False):
        raise RuntimeError(
            f"查询问数数据集权限失败: code={result.get('code')}, message={result.get('message')}"
        )
    data = result.get("data") or {}
    # The API may return ``data`` as a JSON-encoded string instead of an object.
    if isinstance(data, str):
        data = json.loads(data)
    cube_ids_map = (data.get("cubeIds") if isinstance(data, dict) else {}) or {}
    return [
        {"cubeId": cid, "cubeName": cname}
        for cid, cname in cube_ids_map.items()
    ]
# ---------------------------------------------------------------------------
# Dataset relevance ranking
# ---------------------------------------------------------------------------
def rank_cubes_by_relevance(
    question: str, cubes: List[dict], top_n: int = 2
) -> List[dict]:
    """
    Rank datasets by textual relevance between the question and each dataset name,
    and return the top_n most relevant ones.
    Scoring:
    1. cubeName is a substring of question: +100; otherwise, question is a substring of cubeName: +80
    2. Longest contiguous match between question and cubeName: up to +50, scaled by match length / cubeName length
    3. Share of cubeName's distinct characters that also appear in question: up to +30
    Datasets without a name score 0.
    """
    scored: List[tuple] = []
    q_lower = question.lower()
    for cube in cubes:
        name = cube.get("cubeName", "")
        if not name:
            scored.append((0.0, cube))
            continue
        n_lower = name.lower()
        score = 0.0
        if n_lower in q_lower:
            score += 100.0
        elif q_lower in n_lower:
            score += 80.0
        matcher = SequenceMatcher(None, q_lower, n_lower)
        longest = matcher.find_longest_match(0, len(q_lower), 0, len(n_lower))
        if longest.size > 0:
            score += (longest.size / max(len(n_lower), 1)) * 50.0
        common_chars = set(q_lower) & set(n_lower)
        name_chars = set(n_lower)
        if name_chars:
            score += (len(common_chars) / len(name_chars)) * 30.0
        scored.append((score, cube))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [item[1] for item in scored[:top_n]]
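# A worked example of the three-part score, as a standalone sketch.
# `relevance_score` is a hypothetical helper that mirrors the per-dataset
# arithmetic in rank_cubes_by_relevance; the English question and dataset names
# are made up for illustration.

```python
from difflib import SequenceMatcher


def relevance_score(question, name):
    """Score one dataset name against a question (substring + longest match + shared chars)."""
    q, n = question.lower(), name.lower()
    if not n:
        return 0.0
    score = 0.0
    if n in q:
        score += 100.0  # name appears verbatim in the question
    elif q in n:
        score += 80.0   # question appears verbatim in the name
    match = SequenceMatcher(None, q, n).find_longest_match(0, len(q), 0, len(n))
    score += (match.size / len(n)) * 50.0
    score += (len(set(q) & set(n)) / len(set(n))) * 30.0
    return score


scores = {
    name: relevance_score("monthly sales by region", name)
    for name in ("sales", "inventory")
}
```

# "sales" collects all three components (100 + 50 + 30), while "inventory" only
# earns partial credit from a short common run and shared letters. The third
# component is fairly permissive, because common letters alone add points.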
# ---------------------------------------------------------------------------
# Smart table selection
# ---------------------------------------------------------------------------
def call_table_search(
    question: str,
    *,
    cube_ids: Optional[List[str]] = None,
    config: Optional[dict] = None,
) -> List[str]:
    """
    Call POST /openapi/v2/smartq/tableSearch to run smart table selection.
    When the user has not specified a cubeId, the service matches the most suitable datasets to the question.
    Returns the list of matched cubeIds.
    """
    config = config or read_config()
    user_id = require_user_id(config)
    payload: Dict[str, Any] = {
        "userId": user_id,
        "userQuestion": question,
        "llmNameForInference": "SYSTEM_deepseek-r1-0528",
        "runningBySkill": True,
    }
    if cube_ids:
        payload["cubeIds"] = cube_ids
    resp = request_openapi(
        "POST",
        "/openapi/v2/smartq/tableSearch",
        json_body=payload,
        config=config,
    )
    body = resp.json()
    # The response is either a bare list or an envelope whose ``data`` is a list or a JSON string.
    if isinstance(body, list):
        return body
    if isinstance(body, dict):
        if str(body.get("success", "")).lower() != "true":
            raise RuntimeError(f"tableSearch 失败: [{body.get('code')}] {body.get('message')}")
        data = body.get("data")
        if isinstance(data, list):
            return data
        if isinstance(data, str) and data != "null":
            try:
                parsed = json.loads(data)
                if isinstance(parsed, list):
                    return parsed
            except (json.JSONDecodeError, TypeError):
                pass
    return []
# ---------------------------------------------------------------------------
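# Constants
# ---------------------------------------------------------------------------
# Fallback sequence of batch sizes for smart table selection: tried from largest to smallest to stay within the API's cubeIds limit.
# When the API returns "cubeIds can not be empty or over limit", the next smaller size is tried automatically.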
TABLE_SEARCH_BATCH_SIZES = [30, 10]
# ---------------------------------------------------------------------------
# Orchestration: dataset resolution
# ---------------------------------------------------------------------------
def resolve_cube_id(
    question: str,
    *,
    cube_ids: Optional[List[str]] = None,
    config: Optional[dict] = None,
) -> Optional[str]:
    """
    Full dataset resolution flow:
    1. Query the Q&A datasets the user has permission to use.
    2. Merge those cubeIds with any caller-supplied candidates and pre-rank them by textual relevance.
    3. Call smart table selection with an adaptive fallback over batch sizes (TABLE_SEARCH_BATCH_SIZES, currently [30, 10]).
    4. If smart table selection returns nothing, pick the most textually relevant permitted dataset.
    Returns the resolved cubeId, or None when the permission lookup fails or the user has no datasets.
    """
    config = config or read_config()
    print(f"{'=' * 60}", flush=True)
    print(f"[智能选表] 未指定数据集,正在根据问题自动匹配 ...", flush=True)
    print(f"[智能选表] 问题: {question}", flush=True)
    print(f"{'=' * 60}", flush=True)
    # Step 1: query the datasets the user has permission to use
    try:
        accessible = query_accessible_cubes(config=config)
    except Exception as e:
        print(f"[权限查询失败] GET /openapi/v2/smartq/query/llmCubeWithThemeList 调用异常:\n {e}", flush=True)
        return None
    if not accessible:
        print("[权限查询] 该用户没有任何数据集的问数权限", flush=True)
        print(
            "\n============================================================\n"
            "您当前没有可用的问数数据集。\n\n"
            "📂 试试「文件问数」\n"
            "无需任何权限配置,上传 Excel/CSV 文件即可直接分析。\n\n"
            "🚀 0 元体验,限时加码\n"
            "现在上阿里云,将额外赠送 30 天全功能体验,解锁企业级安全管控与深度分析引擎,\n"
            "让 AI 洞察更准、更稳。点击下方链接,领取试用:\n"
            "https://www.aliyun.com/product/quickbi-smart?utm_content=g_1000411205\n\n"
            "💬 点击下方链接,进入交流群获取最新资讯:\n"
            "https://at.umtrack.com/r4Tnme\n"
            "============================================================",
            flush=True,
        )
        return None
    print(f"[权限查询] 用户共有 {len(accessible)} 个可用数据集:", flush=True)
    for item in accessible[:10]:
        print(f" - {item['cubeId']} {item['cubeName']}", flush=True)
    if len(accessible) > 10:
        print(f" ... 共 {len(accessible)} 个", flush=True)
    # Step 2: merge the permitted cubeIds with the caller-supplied candidates
    accessible_ids = [item["cubeId"] for item in accessible]
    if cube_ids:
        merged = list(dict.fromkeys(cube_ids + accessible_ids))
    else:
        merged = accessible_ids
    # Step 3: call smart table selection with the adaptive fallback strategy
    matched_cube_ids: List[str] = []
    # Pre-filter: rank every permitted dataset by textual relevance
    ranked_all = rank_cubes_by_relevance(question, accessible, top_n=len(accessible))
    # Order the merged IDs by relevance. Ranking needs cube names, so only IDs present in
    # `accessible` survive this filter; caller-supplied IDs the user cannot access are dropped.
    merged_set = set(merged)
    ranked_merged_ids = [cube["cubeId"] for cube in ranked_all if cube["cubeId"] in merged_set]
    for batch_size in TABLE_SEARCH_BATCH_SIZES:
        # Take the candidate IDs for the current batch size
        candidates = ranked_merged_ids[:batch_size]
        if not candidates:
            continue
        print(f"[智能选表] 尝试使用 top {len(candidates)} 个相关数据集进行匹配...", flush=True)
        try:
            matched_cube_ids = call_table_search(question, cube_ids=candidates, config=config)
            if matched_cube_ids:
                print(f"[智能选表] 匹配成功(批次大小 {len(candidates)})", flush=True)
                break  # match found, stop early
        except Exception as e2:
            error_msg = str(e2)
            if "cubeIds can not be empty or over limit" in error_msg:
                print(f"[智能选表] 候选数量 {len(candidates)} 超出接口限制,降级到下一批次...", flush=True)
                continue  # try a smaller batch
            # Any other error: log it and stop retrying; the relevance fallback below takes over
            print(f"[智能选表失败] POST /openapi/v2/smartq/tableSearch 调用异常:\n {e2}", flush=True)
            matched_cube_ids = []
            break
    if matched_cube_ids:
        cube_id = matched_cube_ids[0]
        print(f"[智能选表] 匹配到数据集: {cube_id}", flush=True)
        if len(matched_cube_ids) > 1:
            print(f"[智能选表] 其他候选: {matched_cube_ids[1:]}", flush=True)
        return cube_id
    # Step 4: smart table selection found nothing, so choose from the permitted datasets by textual relevance
    ranked = rank_cubes_by_relevance(question, accessible, top_n=2)
    cube_id = ranked[0]["cubeId"]
    print(f"[相关性匹配] 智能选表未返回结果,根据问题与数据集名称相关性选择:", flush=True)
    for i, rc in enumerate(ranked):
        tag = "→ 选定" if i == 0 else " 候选"
        print(f" {tag}: {rc['cubeId']} {rc['cubeName']}", flush=True)
    return cube_id
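# The adaptive batch-size fallback in Step 3 can be isolated as the sketch
# below. `search_with_fallback` and `fake_search` are hypothetical names, and
# the 20-item limit in the stand-in search function is invented for the demo;
# the real limit of the tableSearch API is not stated in this file. The sketch
# also catches only RuntimeError, whereas resolve_cube_id catches Exception.

```python
OVER_LIMIT = "cubeIds can not be empty or over limit"


def search_with_fallback(ranked_ids, search, batch_sizes=(30, 10)):
    """Try progressively smaller candidate batches until one yields matches."""
    for size in batch_sizes:
        candidates = ranked_ids[:size]
        if not candidates:
            continue
        try:
            matched = search(candidates)
        except RuntimeError as exc:
            if OVER_LIMIT in str(exc):
                continue  # too many candidates: retry with the next size
            return []  # any other failure: stop retrying
        if matched:
            return matched
    return []


def fake_search(candidates, limit=20):
    # Stand-in for the remote call: rejects oversized batches.
    if len(candidates) > limit:
        raise RuntimeError(OVER_LIMIT)
    return [c for c in candidates if c == "cube-3"]


result = search_with_fallback([f"cube-{i}" for i in range(40)], fake_search)
```

# The 30-item batch is rejected as over the limit, so the 10-item batch is
# tried next and returns the match. Because candidates are pre-ranked, smaller
# batches keep the most relevant datasets.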
FILE:scripts/chat/file_stream_query.py
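# -*- coding: utf-8 -*-
"""
Streaming query script for file-based Q&A (step 2).
Takes the fileId returned by step 1 (upload_file.py), starts a streaming query,
parses the SSE event stream in real time, and prints the results.
Core strategy:
- code events → concatenated into complete Python code and saved
- result events → parsed as structured data and rendered to chart PNGs with matplotlib
- reporter events → concatenated into the analysis report text
- html events → raw HTML is saved only (no screenshot)
Usage: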
python scripts/chat/file_stream_query.py <fileId> "各部门人数分布"
"""
from __future__ import annotations
import argparse
import json
import sys
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
sys.path.insert(0, str(Path(__file__).parent.parent))
from common.utils import (
read_config,
require_user_id,
request_openapi_stream,
parse_sse_event,
check_trial_expired,
)
from chat.chart_renderer import render_result_charts, HAS_MPL
from common.config_loader import get_skill_output_dir, get_image_output_dir
OUTPUT_DIR = None  # Deprecated; the functions below call get_skill_output_dir() directly
STREAM_URI = "/openapi/v2/smartq/queryByQuestionStreamByFile"
TERMINAL_EVENTS = {"finish", "error", "check"}
SKIP_EVENTS = {"heartbeat", "timestamp", "locale", "feedback", "message", "token"}
def _save_code_file(code: str, ts: int) -> str:
    """Save the assembled Python code to the output/ directory."""
    output_dir = get_skill_output_dir()
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"analysis_code_{ts}.py"
    path.write_text(code, encoding="utf-8")
    return str(path)
def _save_html_raw(html_content: str, question: str, index: int) -> str:
    """Save the raw HTML to output/ (no screenshot is taken)."""
    output_dir = get_skill_output_dir()
    output_dir.mkdir(parents=True, exist_ok=True)
    ts = int(time.time())
    filepath = output_dir / f"chart_html_{ts}_{index}.html"
    # Wrap bare fragments in a minimal HTML document that declares the UTF-8 charset.
    head = html_content.strip().lower()
    if not (head.startswith("<!doctype") or head.startswith("<html")):
        html_content = (
            '<!DOCTYPE html><html lang="zh-CN"><head><meta charset="utf-8"/></head>'
            f"<body>{html_content}</body></html>"
        )
    filepath.write_text(html_content, encoding="utf-8")
    return str(filepath)
# ---------------------------------------------------------------------------
# Event stream handling
# ---------------------------------------------------------------------------
class StreamSession:
"""Manages the streaming session state for one file-based Q&A request."""
def __init__(self, question: str):
self.question = question
self.ts = int(time.time())
# Core outputs
self.code_parts: List[str] = []
self.result_data: Optional[dict] = None
self.chart_images: List[str] = []
self.reporter_parts: List[str] = []
# Auxiliary state
self.text_parts: List[str] = []
self.reasoning_parts: List[str] = []
self.answer_parts: List[str] = []
self.plan_text = ""
self.related_info_parts: List[str] = []
self.html_files: List[str] = []
self.html_chart_index = 0
self.sql = ""
self.conclusion = ""
self.summary = ""
self.finish_msg = ""
self.error_msg = ""
self.react_event_start_count = 0
self.trace_id = ""
# ----- Public methods -----
def handle_event(self, event: Dict[str, Any]):
"""Handle a single SSE event by dispatching to the matching _on_<type> method."""
event_type = event.get("type", "")
data = event.get("data", "")
sub_type = event.get("subType", "")
if event_type in SKIP_EVENTS:
return
if event_type == "react" and sub_type == "EVENT_START":
self.react_event_start_count += 1
handler = getattr(self, f"_on_{event_type}", None)
if handler:
handler(data)
else:
self._on_unknown(event_type, data)
def finalize(self):
"""Wrap up after the stream ends: save the code and render the charts."""
if self.code_parts:
full_code = "".join(self.code_parts).strip()
if full_code:
path = _save_code_file(full_code, self.ts)
print(f"\n[代码] 分析代码已生成", flush=True)
if self.result_data and HAS_MPL:
charts = render_result_charts(
self.result_data,
get_image_output_dir(),
prefix="chart",
)
if charts:
self.chart_images.extend(charts)
for img in charts:
print(f"\n[图表] 已生成 → {img}", flush=True)
print(f"", flush=True)
def get_result_summary(self) -> str:
"""Return a brief result summary (metadata only, to avoid repeating the streamed output)."""
parts = []
if self.result_data:
data_list = self.result_data.get("dataList", [])
parts.append(f"取数结果:共 {len(data_list)} 组数据")
for i, ds in enumerate(data_list, 1):
rows = ds.get("data", [])
fields = [f.get("fieldName", "") for f in ds.get("fieldInfo", [])]
parts.append(f" 数据集{i}: {len(rows)} 行, 字段={fields}")
if self.chart_images:
parts.append(f"生成图表 {len(self.chart_images)} 张:")
for img in self.chart_images:
parts.append(f" - {img}")
if self.error_msg:
parts.append(f"错误:{self.error_msg}")
return "\n".join(parts) if parts else "未获取到有效结果"
# ----- code events: concatenate Python code -----
def _on_code(self, data):
if data:
self.code_parts.append(str(data))
# ----- result events: structured query results + chart rendering -----
def _on_result(self, data):
parsed = data
if isinstance(data, str):
try:
parsed = json.loads(data)
except (json.JSONDecodeError, TypeError):
parsed = None
if isinstance(parsed, dict) and "dataList" in parsed:
self.result_data = parsed
data_list = parsed.get("dataList", [])
print(f"\n\n[取数结果] 共 {len(data_list)} 组数据", flush=True)
for i, ds in enumerate(data_list, 1):
rows = ds.get("data", [])
fields = [f.get("fieldName", "") for f in ds.get("fieldInfo", [])]
print(f" 数据集{i}: {len(rows)} 行, 字段={fields}", flush=True)
else:
self.text_parts.append(str(data))
print(f"\n[执行结果] {str(data)[:500]}", flush=True)
# ----- reporter events: analysis report -----
def _on_reporter(self, data):
if data:
self.reporter_parts.append(str(data))
print(str(data), end="", flush=True)
# ----- plan / question / relatedInfo -----
def _on_plan(self, data):
self.plan_text = str(data)
print(f"\n[分析规划]\n{data}", flush=True)
def _on_question(self, data):
parsed = data
if isinstance(data, str):
try:
parsed = json.loads(data)
except (json.JSONDecodeError, TypeError):
parsed = data
if isinstance(parsed, dict):
title = parsed.get("title", "")
desc = parsed.get("desc", "")
print(f"\n[{title}]\n{desc}", flush=True)
else:
print(f"\n[子问题] {data}", flush=True)
def _on_relatedInfo(self, data):
if data:
self.related_info_parts.append(str(data))
# ----- text / reasoning / answer -----
def _on_text(self, data):
if data:
self.text_parts.append(str(data))
print(str(data), end="", flush=True)
def _on_reasoning(self, data):
if data:
self.reasoning_parts.append(str(data))
print(str(data), end="", flush=True)
def _on_answer(self, data):
if data:
self.answer_parts.append(str(data))
print(str(data), end="", flush=True)
# ----- html events (saved only, no screenshot) -----
def _on_html(self, data):
if data:
self.html_chart_index += 1
path = _save_html_raw(str(data), self.question, self.html_chart_index)
self.html_files.append(path)
print(f"\n[HTML 图表] 已保存 → {path}(仅供参考,图表由 result 数据生成)", flush=True)
def _on_html_result(self, data):
parsed = data
if isinstance(data, str):
try:
parsed = json.loads(data)
except (json.JSONDecodeError, TypeError):
parsed = data
if isinstance(parsed, dict) and "dataList" in parsed:
self.result_data = parsed
data_list = parsed.get("dataList", [])
print(f"\n[图表数据] 共 {len(data_list)} 组数据", flush=True)
else:
print(f"\n[图表数据] {str(data)[:300]}", flush=True)
def _on_unStructuredChart(self, data):
if data:
self.html_chart_index += 1
path = _save_html_raw(str(data), self.question, self.html_chart_index)
self.html_files.append(path)
print(f"\n[非结构化图表] 已保存 → {path}", flush=True)
# ----- SQL / conclusion -----
def _on_sql(self, data):
self.sql = str(data)
print(f"\n[SQL]\n{data}", flush=True)
def _on_python(self, data):
if isinstance(data, dict):
code = data.get("code", "")
result = data.get("result", "")
if code:
self.code_parts.append(code)
if result:
print(f"\n[执行结果]\n{result}", flush=True)
else:
print(f"\n[Python] {data}", flush=True)
def _on_conclusion(self, data):
self.conclusion = str(data)
print(f"\n[结论] {data}", flush=True)
def _on_summary(self, data):
self.summary = str(data)
print(f"\n[数据解读] {data}", flush=True)
# ----- terminal events -----
def _on_trace(self, data):
self.trace_id = str(data)
print(f"[Trace] {self.trace_id}", flush=True)
def _on_finish(self, data):
self.finish_msg = str(data) if data else ""
if self.finish_msg:
print(f"\n[完成] {self.finish_msg}", flush=True)
else:
print("\n[完成]", flush=True)
if self.trace_id:
print(f"[Trace ID] {self.trace_id}(问题反馈时请提供此 ID)", flush=True)
def _on_error(self, data):
self.error_msg = str(data)
print(f"\n[错误] {data}", flush=True)
check_trial_expired(data if isinstance(data, dict) else str(data))
if self.react_event_start_count >= 2:
print(
"\n============================================================\n"
"⚠️ 数据文件解析失败\n"
"当前问数的数据文件可能存在格式或内容问题,服务端多次重试执行均未成功。\n\n"
"💡 建议排查\n"
"请检查文件是否为标准的 Excel/CSV 格式,确认数据内容完整无损后重新上传。\n\n"
"💬 如仍无法解决,点击下方链接,进入交流群联系 Quick BI 产品服务同学获取支持:\n"
"https://at.umtrack.com/r4Tnme\n"
"============================================================",
flush=True,
)
def _on_check(self, data):
print(f"\n[校验] {data}", flush=True)
def _on_reject(self, data):
print(f"\n[拒识] {data}", flush=True)
# ----- auxiliary events -----
def _on_step(self, data):
print(f"\n[步骤] {data}", flush=True)
def _on_subStep(self, data):
print(f"\n[子步骤] {data}", flush=True)
def _on_rewrite(self, data):
print(f"\n[问题改写] {data}", flush=True)
def _on_python_error(self, data):
print(f"\n[Python 错误] {data}", flush=True)
def _on_olapResult(self, data):
if isinstance(data, dict):
print(f"\n[OLAP 结果] 行数={len(data.get('data', []))}", flush=True)
def _on_onlineSearchResult(self, data):
print(f"\n[联网搜索] {str(data)[:200]}", flush=True)
def _on_actionThinking(self, data):
print(f"\n[思考] {data}", flush=True)
def _on_schedule(self, data):
print(f"\n[调度] {data}", flush=True)
def _on_selector(self, data):
print(f"\n[选表] {data}", flush=True)
def _on_systemSelector(self, data):
print(f"\n[系统选表] {data}", flush=True)
def _on_react(self, data):
if data:
print(f"\n[重试代码] {data}", flush=True)
def _on_table_retrieve(self, data):
print(f"\n[表召回] {data}", flush=True)
def _on_schema_retrieve(self, data):
print(f"\n[Schema 召回] {data}", flush=True)
def _on_adaptation(self, data):
print(f"\n[问题改写] {data}", flush=True)
def _on_resource_info(self, data):
print(f"\n[资源信息] {data}", flush=True)
def _on_unknown(self, event_type: str, data):
if data:
print(f"\n[{event_type}] {str(data)[:200]}", flush=True)
def main():
parser = argparse.ArgumentParser(description="文件问数:基于 fileId 发起流式问数")
parser.add_argument("file_id", help="步骤 1(upload_file.py)返回的 fileId")
parser.add_argument("question", help="要问的问题")
parser.add_argument("--verbose", action="store_true", help="启用详细调试输出")
parser.add_argument("--workspace-dir", default=None, help="用户工作目录路径")
args = parser.parse_args()
if args.workspace_dir:
from common.config_loader import set_workspace_dir
set_workspace_dir(args.workspace_dir)
try:
config = read_config()
user_id = require_user_id(config)
print(f"[文件问数] fileId={args.file_id}", flush=True)
print(f"[文件问数] userId={user_id}", flush=True)
print(f"[文件问数] 问题: {args.question}", flush=True)
print("=" * 60, flush=True)
body = {
"fileId": args.file_id,
"userId": user_id,
"userQuestion": args.question,
"runningBySkill": True,
}
session = StreamSession(args.question)
for raw_event in request_openapi_stream(STREAM_URI, json_body=body, config=config):
if args.verbose:
print(f"\n--- RAW SSE ---\n{raw_event}\n--- END SSE ---", flush=True)
event = parse_sse_event(raw_event)
if not event:
continue
if args.verbose:
print(f"[PARSED] type={event.get('type', '')}, data={json.dumps(event, ensure_ascii=False, default=str)[:500]}", flush=True)
session.handle_event(event)
event_type = event.get("type", "")
if event_type in TERMINAL_EVENTS:
break
session.finalize()
print("\n" + "=" * 60, flush=True)
print(session.get_result_summary(), flush=True)
except Exception as e:
print(f"\n[错误] {e}", flush=True)
check_trial_expired(str(e))
sys.exit(1)
if __name__ == "__main__":
main()
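
The `_on_<type>` handlers above suggest that each SSE event is routed by its `type` field, with `_on_unknown` as the fallback. The dispatch method itself is not shown in this section, so the following is only a minimal sketch of how such routing might look. `MiniSession` and its `handle_event` body are hypothetical, not the script's actual `StreamSession`.

```python
from typing import Any, Dict, List


class MiniSession:
    """Hypothetical, stripped-down session that routes events to _on_<type> methods."""

    def __init__(self) -> None:
        self.text_parts: List[str] = []
        self.unknown_types: List[str] = []

    def handle_event(self, event: Dict[str, Any]) -> None:
        event_type = str(event.get("type", ""))
        data = event.get("data")
        # Look up a handler named after the event type. "unknown" is excluded
        # because _on_unknown takes a different signature (event_type, data).
        handler = getattr(self, f"_on_{event_type}", None)
        if callable(handler) and event_type != "unknown":
            handler(data)
        else:
            self._on_unknown(event_type, data)

    def _on_text(self, data: Any) -> None:
        if data:
            self.text_parts.append(str(data))

    def _on_unknown(self, event_type: str, data: Any) -> None:
        self.unknown_types.append(event_type)


session = MiniSession()
session.handle_event({"type": "text", "data": "hello"})
session.handle_event({"type": "mystery", "data": 1})
```

With name-based lookup, supporting a new event type only requires adding a method; unrecognized types fall through to one place instead of raising.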
FILE:scripts/chat/html_snapshot.py
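# -*- coding: utf-8 -*-
"""
HTML → PNG screenshot utility (no extra dependencies).

Uses the headless mode of a Chrome / Chromium / Edge browser already installed on the system to render an HTML file to a PNG.
No Playwright, Selenium, or any pip/npm package is required.

Usage:
    from html_snapshot import html_to_png
    png_path = html_to_png("/path/to/chart.html")
    # Returns the PNG path, or None on failure
"""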
from __future__ import annotations
import os
import platform
import shutil
import subprocess
from pathlib import Path
from typing import Optional
_BROWSER_PATHS: dict[str, list[str]] = {
"Darwin": [
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary",
"/Applications/Chromium.app/Contents/MacOS/Chromium",
"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
"/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
],
"Linux": [
"google-chrome",
"google-chrome-stable",
"chromium-browser",
"chromium",
"microsoft-edge",
],
"Windows": [
os.path.expandvars(r"%ProgramFiles%\Google\Chrome\Application\chrome.exe"),
os.path.expandvars(r"%ProgramFiles(x86)%\Google\Chrome\Application\chrome.exe"),
os.path.expandvars(r"%LocalAppData%\Google\Chrome\Application\chrome.exe"),
os.path.expandvars(r"%ProgramFiles%\Microsoft\Edge\Application\msedge.exe"),
os.path.expandvars(r"%ProgramFiles(x86)%\Microsoft\Edge\Application\msedge.exe"),
],
}
def _find_browser() -> Optional[str]:
    """Find the path of an available Chrome/Chromium/Edge browser on this system."""
    sys_name = platform.system()
    candidates = _BROWSER_PATHS.get(sys_name, [])
    for path in candidates:
        if sys_name in ("Darwin", "Windows"):
            # macOS and Windows entries are absolute paths: check that they exist.
            if Path(path).exists():
                return path
        else:
            # Linux entries are command names: resolve them on PATH with
            # shutil.which() instead of spawning a `which` subprocess.
            found = shutil.which(path)
            if found:
                return found
    return None
def html_to_png(
    html_path: str,
    *,
    output_path: Optional[str] = None,
    width: int = 800,
    height: int = 600,
) -> Optional[str]:
    """Render an HTML file to a PNG screenshot.

    Args:
        html_path: Absolute path to the HTML file.
        output_path: PNG output path; defaults to the same directory and name with a .png suffix.
        width: Viewport width.
        height: Viewport height.

    Returns:
        The PNG file path, or None if no browser is available or the screenshot fails.
    """
    browser = _find_browser()
    if not browser:
        return None
    html_path = str(Path(html_path).resolve())
    if output_path is None:
        output_path = str(Path(html_path).with_suffix(".png"))
    # Use Path.as_uri() to build a cross-platform file:// URL
    file_url = Path(html_path).as_uri()
    cmd = [
        browser,
        "--headless=new",
        "--disable-gpu",
        "--no-sandbox",
        "--disable-software-rasterizer",
        f"--window-size={width},{height}",
        f"--screenshot={output_path}",
        "--hide-scrollbars",
        file_url,
    ]
    try:
        # The return value is not needed: success is judged by the output file.
        subprocess.run(cmd, capture_output=True, text=True, timeout=15)
        if Path(output_path).exists() and Path(output_path).stat().st_size > 0:
            return output_path
    except Exception:
        # Best effort: a launch failure or timeout is reported as None.
        pass
    return None
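
To make the command construction in `html_to_png` easier to inspect, the sketch below assembles a comparable argument list without launching a browser. `build_screenshot_cmd` is a hypothetical helper introduced only for illustration, and it uses a subset of the flags from the function above.

```python
from pathlib import Path
from typing import List


def build_screenshot_cmd(
    browser: str,
    html_path: str,
    output_path: str,
    width: int = 800,
    height: int = 600,
) -> List[str]:
    """Assemble (but do not run) a headless-browser screenshot command."""
    # Path.as_uri() requires an absolute path, so resolve first.
    file_url = Path(html_path).resolve().as_uri()
    return [
        browser,
        "--headless=new",
        f"--window-size={width},{height}",
        f"--screenshot={output_path}",
        "--hide-scrollbars",
        file_url,
    ]


cmd = build_screenshot_cmd("chromium", "chart.html", "chart.png", width=1024, height=768)
```

Passing the command as a list rather than a shell string means paths containing spaces (such as the macOS application paths above) need no quoting.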
FILE:scripts/chat/smartq_stream_query.py
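# -*- coding: utf-8 -*-
"""
Quick BI SmartQ (小Q问数): main entry point for streaming queries.

Parses and orchestrates the SSE event stream, delegating subtasks to dedicated modules:
- cube_resolver: dataset resolution (smart table selection / permission lookup / relevance fallback)
- chart_renderer: chart rendering + Markdown table fallback

The olapResult event in the SSE stream already carries the query result, so no separate OLAP API call is needed.

Usage:
    python scripts/chat/smartq_stream_query.py "What was the total sales revenue in 2023?" --cube-id "dcbb0f94-..."
"""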
from __future__ import annotations
import argparse
import json
import sys
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
sys.path.insert(0, str(Path(__file__).parent.parent))
from common.utils import (
read_config,
require_user_id,
request_openapi_stream,
parse_sse_event,
check_trial_expired,
)
from chat.cube_resolver import resolve_cube_id
from chat.chart_renderer import render_chart, chart_data_to_markdown
from common.config_loader import get_skill_output_dir, get_image_output_dir
# Maps the chartType enum in olapResult events to the chart types that chart_renderer recognizes
OLAP_CHART_TYPE_MAP = {
"NEW_TABLE": "table",
"BAR": "bar",
"LINE": "line",
"PIE": "pie",
"SCATTER_NEW": "scatter",
"INDICATOR_CARD": "indicator-card",
"RANKING_LIST": "ranking-list",
"DETAIL_TABLE": "table",
"MAP_COLOR_NEW": "bar",
"PROGRESS_NEW": "horizontal_bar",
"FUNNEL_NEW": "funnel",
}
# Maps the data types in olapResult metaType to the standard fieldInfo types
DATA_TYPE_MAP = {
"number": "numerical",
"string": "string",
"date": "time",
"datetime": "time",
"boolean": "string",
}
# ---------------------------------------------------------------------------
# SSE olapResult event handling (the query result is inlined in the stream, so no OLAP API call is needed)
# ---------------------------------------------------------------------------
def handle_olap_result_event(
event_data: dict,
*,
question: str = "",
) -> Optional[dict]:
"""
Handle a type=olapResult event: the SSE stream returns the query result directly,
so convert it into the chart_data format that chart_renderer expects and render the chart.

olapResult data format::

    {
        "values": [{"row": ["val1", "val2"]}, ...],
        "chartType": "RANKING_LIST",
        "metaType": [{"v": "string", "k": "<field name>", "type": "row", "t": "dimension"}, ...],
        "logicSql": "SELECT ...",
        "conclusionText": "..."
    }
"""
data_str = event_data.get("data", "")
try:
olap_result = json.loads(data_str) if isinstance(data_str, str) else data_str
except json.JSONDecodeError:
print(" [olapResult] 无法解析 data JSON", flush=True)
return None
values = olap_result.get("values") or []
meta_type_list = olap_result.get("metaType") or []
chart_type_raw = olap_result.get("chartType", "")
logic_sql = olap_result.get("logicSql", "")
chart_type = OLAP_CHART_TYPE_MAP.get(chart_type_raw, "bar")
print(f"\n{'=' * 60}", flush=True)
print(f"[取数结果] 图表类型: {chart_type} ({chart_type_raw})", flush=True)
print(f"[取数结果] 数据行数: {len(values)}", flush=True)
print(f"{'=' * 60}", flush=True)
if logic_sql:
print(f"\n[SQL] {logic_sql}", flush=True)
if not values or not meta_type_list:
print(f"\n**{question or '查询结果'}**\n\n(无数据)", flush=True)
return None
field_info: List[Dict[str, Any]] = []
for meta in meta_type_list:
meta_v = meta.get("v", "string")
meta_t = meta.get("t", "")
meta_type_val = meta.get("type", "")
if meta_t:
role = "metric" if meta_t.lower() == "measure" else "dimension"
else:
role = "metric" if meta_type_val == "column" else "dimension"
field_info.append({
"fieldName": meta.get("k", ""),
"type": DATA_TYPE_MAP.get(meta_v, "string"),
"role": role,
})
data_rows: List[Dict[str, Any]] = []
for val_item in values:
row_values = val_item.get("row") or []
row_dict: Dict[str, Any] = {}
for i, field in enumerate(field_info):
if i < len(row_values):
raw_val = row_values[i]
if field["type"] == "numerical" and raw_val is not None:
try:
row_dict[field["fieldName"]] = float(raw_val)
except (ValueError, TypeError):
row_dict[field["fieldName"]] = raw_val
else:
row_dict[field["fieldName"]] = raw_val
else:
row_dict[field["fieldName"]] = ""
data_rows.append(row_dict)
chart_data: Dict[str, Any] = {
"data": data_rows,
"fieldInfo": field_info,
"chartType": chart_type,
"title": question,
"id": f"olap_{int(time.time())}",
}
print(f" 字段数: {len(field_info)}, 数据行数: {len(data_rows)}", flush=True)
# 统计度量列数量
metric_count = sum(1 for f in field_info if f.get("role") == "metric")
if chart_type == "table" and metric_count == 1:
# table type with exactly one metric column: output both a chart image and a Markdown table
output_dir = str(get_image_output_dir())
# Temporarily switch chartType to bar so that an image can be rendered
chart_data_for_render = {**chart_data, "chartType": "bar"}
chart_path = render_chart(chart_data_for_render, output_dir=output_dir)
if chart_path:
chart_title = chart_data.get("title", "图表")
print(f"\n![{chart_title}]({chart_path})", flush=True)
md_table = chart_data_to_markdown(chart_data)
print(f"\n{md_table}", flush=True)
elif chart_type == "table":
# table type with several metric columns: output only a Markdown table
md_table = chart_data_to_markdown(chart_data)
print(f"\n{md_table}", flush=True)
else:
output_dir = str(get_image_output_dir())
chart_path = render_chart(chart_data, output_dir=output_dir)
if chart_path:
chart_title = chart_data.get("title", "图表")
print(f"\n![{chart_title}]({chart_path})", flush=True)
else:
md_table = chart_data_to_markdown(chart_data)
print(f"\n{md_table}", flush=True)
return chart_data
# ---------------------------------------------------------------------------
# Main streaming query flow
# ---------------------------------------------------------------------------
def run_stream_query(
question: str,
cube_id: Optional[str] = None,
*,
cube_ids: Optional[List[str]] = None,
config: Optional[dict] = None,
):
"""
Run the full streaming query flow.
When cube_id is not provided, delegate to cube_resolver.resolve_cube_id to resolve the dataset automatically.
"""
config = config or read_config()
user_id = require_user_id(config)
if not cube_id:
cube_id = resolve_cube_id(question, cube_ids=cube_ids, config=config)
if not cube_id:
print("\n[数据集问数终止] 未能匹配到可用数据集,请参考上方提示选择其他方式。", flush=True)
return []
print(flush=True)
payload: Dict[str, Any] = {
"userQuestion": question,
"userId": user_id,
"cubeId": cube_id,
"llmNameForData": "nvl",
"llmNameForInference": "nvl",
"runningBySkill": True,
}
print(f"{'=' * 60}", flush=True)
print(f"[小Q问数] 问题: {question}", flush=True)
print(f"[小Q问数] 数据集: {cube_id}", flush=True)
print(f"{'=' * 60}\n", flush=True)
uri = "/openapi/v2/smartq/queryByQuestionStream"
related_info_buf: List[str] = []
reasoning_buf: List[str] = []
sql_buf: List[str] = []
summary_buf: List[str] = []
chart_results: List[dict] = []
trace_id: Optional[str] = None
def _flush_buffers():
"""输出已缓冲的关联知识、推理过程和 SQL。"""
nonlocal related_info_buf, reasoning_buf, sql_buf
if related_info_buf:
print(f"\n[关联知识] {''.join(related_info_buf)}", flush=True)
related_info_buf.clear()
if reasoning_buf:
print(f"\n[推理过程] {''.join(reasoning_buf)}", flush=True)
reasoning_buf.clear()
if sql_buf:
print(f"\n[SQL] {''.join(sql_buf)}", flush=True)
sql_buf.clear()
try:
for raw_event in request_openapi_stream(uri, json_body=payload, config=config, timeout=600):
event_data = parse_sse_event(raw_event)
if not event_data:
continue
event_type = event_data.get("type", "")
data = event_data.get("data", "")
if event_type in ("heartbeat", "locale", "message"):
continue
elif event_type == "trace":
trace_id = str(data)
print(f"[Trace] {trace_id}", flush=True)
elif event_type == "relatedInfo":
related_info_buf.append(str(data))
elif event_type == "reasoning":
reasoning_buf.append(str(data))
elif event_type == "text":
print(f"[文本] {data}", flush=True)
elif event_type == "sql":
sql_buf.append(str(data))
elif event_type == "olapResult":
_flush_buffers()
chart_data = handle_olap_result_event(
event_data, question=question,
)
if chart_data:
chart_results.append(chart_data)
elif event_type == "summary":
summary_buf.append(str(data))
elif event_type == "conclusion":
print(f"\n[结论] {data}", flush=True)
elif event_type == "check":
print(f"\n[校验] {data}", flush=True)
elif event_type == "error":
print(f"\n[错误] {data}", flush=True)
check_trial_expired(str(data))
elif event_type in ("dsl", "answer", "param", "feedback", "python"):
pass
elif event_type == "finish":
_flush_buffers()
if summary_buf:
print(f"\n[数据解读] {''.join(summary_buf)}", flush=True)
print(f"\n[完成] {data}", flush=True)
break
except Exception as e:
print(f"\n[问数流式请求失败] POST {uri} 调用异常:\n {e}", flush=True)
check_trial_expired(str(e))
return chart_results
print(f"\n{'=' * 60}", flush=True)
print(f"[问数结束] 共生成 {len(chart_results)} 个图表", flush=True)
if trace_id:
print(f"[Trace ID] {trace_id}(问题反馈时请提供此 ID)", flush=True)
# 将图表数据保存到 JSON 文件
if chart_results:
output_dir = get_skill_output_dir()
output_dir.mkdir(parents=True, exist_ok=True)
chart_data_file = output_dir / f"chart_results_{int(time.time())}.json"
try:
with open(chart_data_file, 'w', encoding='utf-8') as f:
json.dump(chart_results, f, ensure_ascii=False, indent=2)
print(f"[图表数据] 已保存到: {chart_data_file}", flush=True)
except Exception as e:
print(f"[警告] 图表数据保存失败: {e}", flush=True)
print(f"{'=' * 60}", flush=True)
return chart_results
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
parser = argparse.ArgumentParser(description="Quick BI 小Q问数")
parser.add_argument("question", help="用户问题")
parser.add_argument("--cube-id", dest="cube_id", default=None, help="数据集 ID(不指定时自动智能选表)")
parser.add_argument("--cube-ids", dest="cube_ids", default=None, help="候选数据集 ID 列表,逗号分隔(仅智能选表时使用)")
parser.add_argument("--workspace-dir", default=None, help="用户工作目录路径")
args = parser.parse_args()
if args.workspace_dir:
from common.config_loader import set_workspace_dir
set_workspace_dir(args.workspace_dir)
cube_ids = args.cube_ids.split(",") if args.cube_ids else None
chart_results = run_stream_query(args.question, args.cube_id, cube_ids=cube_ids)
# In CLI mode the results have already been saved to a file and the path printed to the console
# Use the chart_results variable for any further processing
if __name__ == "__main__":
main()
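
The conversion performed in `handle_olap_result_event` (field metadata plus positional `row` arrays turned into per-row dicts) can be sketched in isolation as follows. This is a simplified illustration, not the script's code: `olap_to_rows` and `TYPE_MAP` are hypothetical names, and the sample payload is made up.

```python
from typing import Any, Dict, List

# Same idea as DATA_TYPE_MAP in the script, reduced to two entries for brevity.
TYPE_MAP = {"number": "numerical", "string": "string"}


def olap_to_rows(olap_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn an olapResult-style payload into a list of {field name: value} dicts."""
    fields = [
        (meta.get("k", ""), TYPE_MAP.get(meta.get("v", "string"), "string"))
        for meta in olap_result.get("metaType") or []
    ]
    rows: List[Dict[str, Any]] = []
    for item in olap_result.get("values") or []:
        raw = item.get("row") or []
        row: Dict[str, Any] = {}
        for i, (name, kind) in enumerate(fields):
            # Short rows are padded with an empty string.
            value = raw[i] if i < len(raw) else ""
            if kind == "numerical" and value not in (None, ""):
                try:
                    value = float(value)
                except (ValueError, TypeError):
                    pass  # keep the raw value when it is not numeric
            row[name] = value
        rows.append(row)
    return rows


# Made-up sample payload in the shape documented above.
sample = {
    "metaType": [
        {"k": "region", "v": "string", "t": "dimension"},
        {"k": "sales", "v": "number", "t": "measure"},
    ],
    "values": [{"row": ["East", "120.5"]}, {"row": ["West"]}],
}
rows = olap_to_rows(sample)
```

Keying each row by field name lets a renderer or Markdown table builder pick columns by name rather than by position.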
FILE:scripts/chat/upload_file.py
# -*- coding: utf-8 -*-
"""
File upload script (step 1): upload an Excel/CSV file to Quick BI and obtain its fileId.

Usage:
    python scripts/chat/upload_file.py /path/to/data.xlsx
"""
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
from typing import Dict, Optional
sys.path.insert(0, str(Path(__file__).parent.parent))
import requests
from common.utils import read_config, require_user_id, get_server_domain, build_request_headers, check_trial_expired, _should_skip_ssl
ALLOWED_EXTENSIONS = {"xls", "xlsx", "csv"}
MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB
MIME_MAP = {
".xls": "application/vnd.ms-excel",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
".csv": "text/csv",
}
UPLOAD_URI = "/openapi/v2/copilot/parse"
def validate_file(file_path: str):
    """Validate the file's format and size."""
    p = Path(file_path)
    if not p.exists():
        raise FileNotFoundError(f"File not found: {file_path}")
    ext = p.suffix.lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        # sorted() keeps the message stable; set iteration order is arbitrary.
        raise ValueError(
            f"Unsupported file format .{ext}; supported formats: {', '.join(sorted(ALLOWED_EXTENSIONS))}"
        )
    size = p.stat().st_size
    if size > MAX_FILE_SIZE:
        raise ValueError(f"File size {size / 1024 / 1024:.1f}MB exceeds the 10MB limit")
def upload_file(file_path: str, *, config: Optional[dict] = None) -> dict:
"""
Call POST /openapi/v2/copilot/parse to upload the file and parse its structure.
Returns the response JSON (which contains the fileId).
"""
config = config or read_config()
validate_file(file_path)
p = Path(file_path)
file_name = p.name
ext = p.suffix.lower()
content_type = MIME_MAP.get(ext, "application/octet-stream")
server_domain = get_server_domain(config)
url = server_domain + UPLOAD_URI
user_id = require_user_id(config)
form_data: Dict[str, str] = {
"fileName": file_name,
"isSave": "false",
"fileId": "",
"oApiUserId": user_id,
"runningBySkill": "true",
}
headers = build_request_headers("POST", UPLOAD_URI, None, config=config)
headers["origin"] = server_domain
file_size = p.stat().st_size
print(f"[文件上传] 请求: POST {url}", flush=True)
print(f"[文件上传] 表单参数: {json.dumps(form_data, ensure_ascii=False)}", flush=True)
print(f"[文件上传] 文件: name={file_name}, size={file_size / 1024:.1f}KB, contentType={content_type}", flush=True)
with open(file_path, "rb") as f:
files = {"file": (file_name, f, content_type)}
resp = requests.post(url, headers=headers, data=form_data, files=files, timeout=120,
verify=not _should_skip_ssl(url))
if not resp.ok:
body = ""
try:
body = resp.text[:2000]
except Exception:
pass
raise requests.HTTPError(
f"HTTP {resp.status_code} {resp.reason} for POST {UPLOAD_URI}\n响应体: {body}",
response=resp,
)
return resp.json()
def _is_success(result: dict) -> bool:
val = result.get("success")
if isinstance(val, bool):
return val
return str(val).lower() == "true"
def main():
parser = argparse.ArgumentParser(description="上传 Excel/CSV 文件并获取 fileId")
parser.add_argument("file", help="要上传的文件路径(支持 xls/xlsx/csv,≤10MB)")
parser.add_argument("--workspace-dir", default=None, help="用户工作目录路径")
args = parser.parse_args()
if args.workspace_dir:
from common.config_loader import set_workspace_dir
set_workspace_dir(args.workspace_dir)
config = read_config()
user_id = require_user_id(config)
print(f"[文件上传] userId={user_id}", flush=True)
print(f"[文件上传] 正在上传: {args.file}", flush=True)
result = upload_file(args.file, config=config)
if _is_success(result):
data = result.get("data", {})
if isinstance(data, str):
try:
data = json.loads(data)
except json.JSONDecodeError:
data = {}
file_id = data.get("fileId", "") if isinstance(data, dict) else ""
print(f"[文件上传] 上传成功,fileId={file_id}", flush=True)
print(json.dumps(result, indent=2, ensure_ascii=False), flush=True)
else:
print(f"[文件上传] 上传失败: {result.get('message', '未知错误')}", flush=True)
print(f"[文件上传] 错误详情:", flush=True)
print(json.dumps(result, indent=2, ensure_ascii=False), flush=True)
check_trial_expired(result)
sys.exit(1)
if __name__ == "__main__":
main()
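
The pre-upload checks in `validate_file` can be exercised without touching the network. The sketch below mirrors the same three checks (existence, extension, size) against temporary files. `check_upload`, `ALLOWED`, and `MAX_BYTES` are hypothetical stand-ins for the script's own names.

```python
import tempfile
from pathlib import Path

ALLOWED = {"xls", "xlsx", "csv"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB, mirroring MAX_FILE_SIZE above


def check_upload(path: str) -> str:
    """Return the normalized extension, or raise if the file is not uploadable."""
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(path)
    ext = p.suffix.lstrip(".").lower()
    if ext not in ALLOWED:
        raise ValueError(f"unsupported extension: .{ext}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("file exceeds 10 MB")
    return ext


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "data.CSV"  # upper-case suffix is accepted after lower()
    good.write_text("a,b\n1,2\n", encoding="utf-8")
    bad = Path(tmp) / "notes.txt"
    bad.write_text("hello", encoding="utf-8")

    good_ext = check_upload(str(good))
    try:
        check_upload(str(bad))
        bad_rejected = False
    except ValueError:
        bad_rejected = True
```

Validating locally before the POST avoids a 120-second-timeout round trip for a file the server would reject anyway.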
FILE:scripts/common/__init__.py
FILE:scripts/common/config_loader.py
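# -*- coding: utf-8 -*-
"""
Configuration loader for QBI SmartQ (小Q问数).

Loads configuration in four layers so that user settings survive skill package updates.

Load order (lowest to highest priority):
1. default_config.yaml: packaged defaults, shipped with the skill package
2. ~/.qbi/config.yaml: QBI global configuration, shared by all skills
3. $WORKSPACE_DIR/.qbi/smartq-chat/config.yaml: workspace-level configuration (the workspace is given by the --workspace-dir argument or the WORKSPACE_DIR environment variable and must be passed explicitly)
4. ACCESS_TOKEN environment variable: highest priority, suited to container deployments (applied only when use_env_property is true)
"""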
from __future__ import annotations
import base64
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional
import yaml
# ---------------------------------------------------------------------------
# Path constants
# ---------------------------------------------------------------------------
BASE_DIR = Path(__file__).resolve().parent
DEFAULT_CONFIG_PATH = BASE_DIR.parent.parent / "default_config.yaml"
QBI_HOME = Path.home() / ".qbi"
GLOBAL_CONFIG_PATH = QBI_HOME / "config.yaml"
SKILL_NAME = "smartq-chat"
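# Set via the CLI --workspace-dir argument; takes precedence over the WORKSPACE_DIR environment variable
_workspace_dir_override: Optional[str] = None


def set_workspace_dir(path: str):
    """Set the workspace directory (called by script entry points for the --workspace-dir argument)."""
    global _workspace_dir_override
    _workspace_dir_override = path


def _resolve_work_dir() -> Path:
    """Return the user's actual workspace directory.

    Priority: CLI --workspace-dir argument > WORKSPACE_DIR environment variable.
    If neither is set, raise an error. Silently falling back to the HOME directory
    is not allowed because it would read the wrong configuration.
    """
    # 1. CLI argument (highest priority)
    if _workspace_dir_override:
        work_dir = Path(_workspace_dir_override)
        print(f"[Config] Workspace directory: {work_dir} (source=--workspace-dir argument)", flush=True)
        return work_dir
    # 2. Environment variable
    env_cwd = os.environ.get("WORKSPACE_DIR")
    if env_cwd:
        work_dir = Path(env_cwd)
        print(f"[Config] Workspace directory: {work_dir} (source=WORKSPACE_DIR environment variable)", flush=True)
        return work_dir
    # 3. Not set: raise and require the agent to pass it explicitly
    raise RuntimeError(
        "[Config error] Workspace directory is not set: neither the --workspace-dir argument "
        "nor the WORKSPACE_DIR environment variable was provided.\n"
        "Pass the absolute path of the user's actual workspace directory via --workspace-dir and run the script again.\n"
        "Example: python3 script.py ... --workspace-dir '/path/to/workspace'"
    )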
def get_skill_work_home() -> Path:
    """$WORKSPACE_DIR/.qbi (workspace-level QBI root directory)"""
    return _resolve_work_dir() / ".qbi"


def get_skill_config_dir() -> Path:
    """$WORKSPACE_DIR/.qbi/smartq-chat/"""
    return get_skill_work_home() / SKILL_NAME


def get_skill_config_path() -> Path:
    """$WORKSPACE_DIR/.qbi/smartq-chat/config.yaml"""
    return get_skill_config_dir() / "config.yaml"


def get_skill_output_dir() -> Path:
    """$WORKSPACE_DIR/.qbi/smartq-chat/output/"""
    return get_skill_config_dir() / "output"


def get_image_output_dir() -> Path:
    """$WORKSPACE_DIR/output/ (image artifacts only, so users can open them directly)"""
    return _resolve_work_dir() / "output"
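# ---------------------------------------------------------------------------
# Trial credential constants and decoder
# ---------------------------------------------------------------------------
# Note: the values below are base64-encoded trial credentials. The encoding only avoids showing them in plain text; it is not encryption.
# Anyone can recover the original values by base64-decoding them. Use your own credentials in production.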
_R0 = "__BI_TRAIL_AK__"
_R1 = "__BI_TRAIL_SK__"
def _rv(v: str) -> str:
return base64.b64decode(v).decode("utf-8")
# ---------------------------------------------------------------------------
# Trial welcome message
# ---------------------------------------------------------------------------
_TRIAL_WELCOME_MSG = """
============================================================
您的超级数据分析师已就位!
只需自然语言提问,即可智能匹配并分析您的 Excel 或 Quick BI 数据集,
将洞察即时呈现。让复杂分析,从未如此简单。
检测到您尚未配置凭证,我们将自动为您注册试用凭证并进入试用期。
试用到期后,请前往 Quick BI 控制台获取正式凭证:
https://www.aliyun.com/product/quickbi-smart?utm_content=g_1000411205
如需帮助,请扫码加入交流群获取最新资讯:
https://at.umtrack.com/r4Tnme
============================================================
""".strip()
def _print_trial_welcome():
print(f"\n{_TRIAL_WELCOME_MSG}\n", flush=True)
# ---------------------------------------------------------------------------
# Trial expiry detection
# ---------------------------------------------------------------------------
TRIAL_EXPIRED_CODE = "AE0579100004"
_TRIAL_EXPIRED_MESSAGE = """
============================================================
小 Q 超级分析助理已陪伴您一周,我们看到您在通过 AI 寻找数据背后的真相,这很了不起。
🕙 试用模式已结束
授权到期后,动态分析将暂告一段落。
💡 其实,您可以更轻松
目前的"文件模式"仍需您手动搬运数据。让 AI 直连企业存量数据资产,实现分析结果自动更新?立即体验完整功能。
🚀 0 元体验,限时加码
现在上阿里云,将额外赠送 30 天全功能体验,解锁企业级安全管控与深度分析引擎,让 AI 洞察更准、更稳。点击下方链接,领取试用:
https://www.aliyun.com/product/quickbi-smart?utm_content=g_1000411205
💬 点击下方链接,进入交流群获取最新资讯:
https://at.umtrack.com/r4Tnme
============================================================
""".strip()
def check_trial_expired(result) -> bool:
    """Check whether an API response carries the trial-expired error code, and print a notice if so.

    Args:
        result: The API response as a dict, or the raw response text as a str.

    Returns:
        True if trial expiry was detected, False otherwise.
    """
    code = None
    if isinstance(result, dict):
        code = str(result.get("code", ""))
    elif isinstance(result, str):
        # For raw text, a substring match on the error code is sufficient.
        if TRIAL_EXPIRED_CODE in result:
            code = TRIAL_EXPIRED_CODE
    if code == TRIAL_EXPIRED_CODE:
        print(f"\n{_TRIAL_EXPIRED_MESSAGE}", flush=True)
        return True
    return False
# ---------------------------------------------------------------------------
# Configuration loading
# ---------------------------------------------------------------------------
def _load_yaml(path: Path) -> dict:
    """Safely load a YAML file; return an empty dict if it is missing or fails to parse."""
    if not path.exists():
        return {}
    try:
        with open(path, "r", encoding="utf-8") as f:
            return yaml.safe_load(f) or {}
    except Exception:
        return {}


def _merge_config(base: dict, override: dict) -> dict:
    """Merge the non-empty values of override into base (base is modified in place and returned)."""
    for key, value in override.items():
        # None and blank strings never overwrite a value from a lower layer.
        if value is not None and str(value).strip() != "":
            base[key] = value
    return base
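def load_config() -> dict:
"""Load configuration in four layers.
Layers are listed from lowest to highest priority; each layer overrides the ones before it:
1. default_config.yaml (packaged defaults)
2. ~/.qbi/config.yaml (QBI global configuration)
3. $WORKSPACE_DIR/.qbi/smartq-chat/config.yaml (workspace-level configuration)
4. ACCESS_TOKEN environment variable (highest priority)
"""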
# --- 第 1 层:包内默认配置 ---
print(
f"[配置] 包内默认配置路径: {DEFAULT_CONFIG_PATH}"
f" (存在={DEFAULT_CONFIG_PATH.exists()})",
flush=True,
)
config = _load_yaml(DEFAULT_CONFIG_PATH)
# --- 提前读取工作目录级配置,用于判断 save_global_property 开关 ---
skill_config_path = get_skill_config_path()
print(
f"[配置] 工作目录级配置路径: {skill_config_path}"
f" (存在={skill_config_path.exists()})",
flush=True,
)
skill_config = _load_yaml(skill_config_path)
# 判断 save_global_property 开关(仅从工作目录级 > 默认配置两层取值)
_global_enabled = True
for _cfg in (skill_config, config):
_val = _cfg.get("save_global_property")
if _val is not None:
_global_enabled = bool(_val)
break
# --- 第 2 层:QBI 全局配置(受 save_global_property 开关控制) ---
if _global_enabled:
print(
f"[配置] 全局配置路径: {GLOBAL_CONFIG_PATH}"
f" (存在={GLOBAL_CONFIG_PATH.exists()})",
flush=True,
)
global_config = _load_yaml(GLOBAL_CONFIG_PATH)
_merge_config(config, global_config)
else:
print(
f"[配置] save_global_property 为 false,跳过全局配置读取"
f" (路径={GLOBAL_CONFIG_PATH})",
flush=True,
)
# --- 第 3 层:工作目录级配置(已提前读取,直接合并) ---
_merge_config(config, skill_config)
# --- 第 4 层:环境变量覆盖(最高优先级) ---
if config.get("use_env_property"):
access_token = os.environ.get("ACCESS_TOKEN")
if not access_token:
raise ValueError("use_env_property 为 true 时,必须设置 ACCESS_TOKEN 环境变量")
try:
token_data = json.loads(access_token)
except json.JSONDecodeError as exc:
raise ValueError(f"ACCESS_TOKEN 解析失败:{exc}") from exc
env_mapping = {
"qbi_api_key": "api_key",
"qbi_api_secret": "api_secret",
"qbi_server_domain": "server_domain",
"qbi_user_token": "user_token",
}
for env_key, config_key in env_mapping.items():
env_val = token_data.get(env_key)
if env_val:
config[config_key] = env_val
# --- 试用凭证兜底 ---
# 先检查全局配置和工作目录级配置原始文件中是否已有 api_key / api_secret
# (不受 save_global_property 开关影响,避免已配置用户误入试用链路)
# 注意:仅检查 api_key / api_secret,不检查 user_token。
# user_token 可能来自试用自动注册(_persist_user_id force=True),
# 单独存在 user_token 不代表用户已有自有凭证,不应阻止试用凭证填充。
_raw_global_cfg = global_config if _global_enabled else _load_yaml(GLOBAL_CONFIG_PATH)
_has_external_api_creds = (
_raw_global_cfg.get("api_key") or _raw_global_cfg.get("api_secret")
or skill_config.get("api_key") or skill_config.get("api_secret")
)
if _has_external_api_creds:
# 全局配置或工作目录级配置已有 api_key/api_secret,不进入试用链路
print("[配置] 检测到全局配置或工作目录级配置已有 API 凭证,跳过试用凭证填充", flush=True)
else:
missing_key = not config.get("api_key")
missing_secret = not config.get("api_secret")
missing_token = not config.get("user_token")
if missing_key and missing_secret and missing_token:
_print_trial_welcome()
if missing_key:
config["api_key"] = _rv(_R0)
if missing_secret:
config["api_secret"] = _rv(_R1)
return config
# ---------------------------------------------------------------------------
# Server domain lookup
# ---------------------------------------------------------------------------
def get_server_domain(config: Optional[dict] = None) -> str:
config = config or load_config()
return str(config["server_domain"]).rstrip("/")
# ---------------------------------------------------------------------------
# Configuration persistence
# ---------------------------------------------------------------------------
def persist_to_skill_config(key: str, value: str):
"""将单个配置项写入工作目录级配置文件。
写入路径:$WORKSPACE_DIR/.qbi/smartq-chat/config.yaml
"""
config_dir = get_skill_config_dir()
config_path = get_skill_config_path()
_persist_to_yaml(
config_dir,
config_path,
key,
value,
header=(
"# Quick BI 用户配置(此文件不受技能包更新影响)\n"
"# 配置优先级:此文件 > ~/.qbi/config.yaml > 包内 default_config.yaml\n"
f"# 路径:{config_path}\n\n"
),
)
def is_global_save_enabled() -> bool:
"""检查 save_global_property 开关是否开启。
仅从工作目录级配置和包内默认配置中读取,不依赖全局配置本身。
默认为 True。
"""
skill_cfg = _load_yaml(get_skill_config_path())
default_cfg = _load_yaml(DEFAULT_CONFIG_PATH)
# 按优先级:工作目录级 > 默认
for cfg in (skill_cfg, default_cfg):
val = cfg.get("save_global_property")
if val is not None:
return bool(val)
return True
def persist_to_global_config(key: str, value: str, *, force: bool = False):
"""将单个配置项写入 QBI 全局配置文件。
写入路径:~/.qbi/config.yaml
当 save_global_property 为 false 且 force=False 时,跳过写入并打印提示。
试用凭证自动注册场景应使用 force=True 强制写入。
"""
if not force and not is_global_save_enabled():
print(
f"[配置] save_global_property 为 false,跳过全局配置写入: {key}",
flush=True,
)
return
_persist_to_yaml(
QBI_HOME,
GLOBAL_CONFIG_PATH,
key,
value,
header=(
"# Quick BI 全局配置(所有 skill 共享,不受技能包更新影响)\n"
"# 所有配置(server_domain、api_key、api_secret、user_token 等)建议放在此文件\n\n"
),
)
def _persist_to_yaml(config_dir: Path, config_path: Path, key: str, value: str, header: str):
    """Write a single key/value pair to the given YAML config file."""
    config_dir.mkdir(parents=True, exist_ok=True)
    if config_path.exists():
        with open(config_path, "r", encoding="utf-8") as f:
            lines = f.readlines()
    else:
        # New file: start with the explanatory header comment.
        lines = [header]
    found = False
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith(f"{key}:"):
            # Replace the first existing entry for this key in place.
            lines[i] = f"{key}: {value}\n"
            found = True
            break
    if not found:
        lines.append(f"{key}: {value}\n")
    with open(config_path, "w", encoding="utf-8") as f:
        f.writelines(lines)
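
The layered override rule used by `load_config` and `_merge_config` (a later layer wins, but only for non-empty values) can be illustrated with plain dicts. This is a minimal sketch: `merge_layers` is a hypothetical helper, and the keys and values are made-up sample data.

```python
from typing import Any, Dict


def merge_layers(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge config layers left to right; later non-empty values win."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            # Skip None and blank strings so an empty override cannot erase a lower layer.
            if value is not None and str(value).strip() != "":
                merged[key] = value
    return merged


# Made-up sample layers, ordered from lowest to highest priority.
defaults = {"server_domain": "https://default.example", "api_key": ""}
global_cfg = {"api_key": "global-key", "user_token": "u-1"}
workspace_cfg = {"api_key": "workspace-key", "user_token": ""}

config = merge_layers(defaults, global_cfg, workspace_cfg)
```

Here the workspace layer overrides `api_key`, while its blank `user_token` leaves the global value in place, which is why a config file with empty placeholder keys does not wipe out credentials set elsewhere.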
FILE:scripts/common/device_id.py
# -*- coding: utf-8 -*-
"""
Provides a stable, unique device identifier, based on a persisted UUID plus plain reads of platform files.

Usage:
    from device_id import get_device_id, get_device_account_id, get_device_hostname
    device_id = get_device_id()
    account_id = get_device_account_id()
    hostname = get_device_hostname()
"""
from __future__ import annotations
import hashlib
import os
import platform
import random
import string
import uuid
from pathlib import Path
from typing import Optional
# ---------------------------------------------------------------------------
# Public interface
# ---------------------------------------------------------------------------
def get_device_id() -> str:
    """Return the device ID.

    Sources are tried in priority order:
    1. The persisted file ~/.qbi/device_id (reused if present)
    2. Linux /etc/machine-id (plain file read, no external commands)
    3. A UUID generated on first use and persisted
    """
    # Prefer the already-persisted device ID (works on every platform)
    device_id = _read_persisted_device_id()
    if device_id:
        return device_id
    # On Linux, try reading machine-id (plain file read)
    if platform.system() == "Linux":
        device_id = _read_linux_machine_id()
        if device_id:
            # Persist the machine-id as well so later calls stay consistent
            _write_persisted_device_id(device_id)
            return device_id
    # Generate a new UUID and persist it
    return _create_persisted_device_id()
def get_device_account_id() -> str:
    """Return the MD5 hex digest of the device ID, usable directly as accountId."""
    device_id = get_device_id()
    # MD5 is used here only to derive a fixed-length identifier, not for security
    account_id = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    print(
        f"[Device ID] platform={platform.system()}, "
        f"device_id={device_id[:8]}..., accountId(md5)={account_id}",
        flush=True,
    )
    return account_id
def get_device_hostname() -> str:
"""Return the current device's hostname; on failure, return a placeholder name with a random suffix."""
try:
name = platform.node()
if name:
return name
except Exception:
pass
suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=5))
return f"host_{suffix}"
# ---------------------------------------------------------------------------
# Linux machine-id (plain file read, no external commands)
# ---------------------------------------------------------------------------
def _read_linux_machine_id() -> Optional[str]:
"""Read the Linux machine-id (systemd systems first, then legacy dbus systems)."""
for path_str in ("/etc/machine-id", "/var/lib/dbus/machine-id"):
try:
p = Path(path_str)
if p.exists():
content = p.read_text().strip()
if content:
return content
except Exception:
pass
return None
# ---------------------------------------------------------------------------
# Persisted device ID (cross-platform approach)
# ---------------------------------------------------------------------------
_QBI_HOME = Path.home() / ".qbi"
_DEVICE_ID_FILE = _QBI_HOME / "device_id"
def _read_persisted_device_id() -> Optional[str]:
"""Read the device ID from the local persisted file."""
try:
if _DEVICE_ID_FILE.exists():
content = _DEVICE_ID_FILE.read_text().strip()
if content:
return content
except Exception:
pass
return None
def _write_persisted_device_id(device_id: str) -> None:
"""Write the device ID to the persisted file."""
try:
_QBI_HOME.mkdir(parents=True, exist_ok=True)
_DEVICE_ID_FILE.write_text(device_id, encoding="utf-8")
except Exception:
pass
def _create_persisted_device_id() -> str:
"""Generate a UUID, persist it, and return it as the device identifier."""
device_id = str(uuid.uuid4())
_write_persisted_device_id(device_id)
return device_id
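
The lookup order in `device_id.py` (persisted file, then machine-id, then a fresh UUID) can be exercised without touching the real home directory. This is an illustrative sketch under that assumption; `load_or_create_device_id` and `account_id_for` are hypothetical names, and the file paths are passed in rather than fixed to `~/.qbi/device_id` and `/etc/machine-id`.

```python
import hashlib
import tempfile
import uuid
from pathlib import Path
from typing import Optional


def load_or_create_device_id(id_file: Path, machine_id_file: Optional[Path] = None) -> str:
    """Return a stable device ID, creating and persisting one if needed."""
    # 1. Reuse a previously persisted ID when it exists and is non-empty.
    if id_file.exists():
        cached = id_file.read_text(encoding="utf-8").strip()
        if cached:
            return cached
    # 2. Otherwise try a machine-id style file, if one was supplied.
    device_id = ""
    if machine_id_file is not None and machine_id_file.exists():
        device_id = machine_id_file.read_text(encoding="utf-8").strip()
    # 3. Fall back to a random UUID.
    if not device_id:
        device_id = str(uuid.uuid4())
    id_file.parent.mkdir(parents=True, exist_ok=True)
    id_file.write_text(device_id, encoding="utf-8")
    return device_id


def account_id_for(device_id: str) -> str:
    """Derive a fixed-length account ID as the MD5 hex digest of the device ID."""
    return hashlib.md5(device_id.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        id_file = Path(tmp) / "device_id"
        first = load_or_create_device_id(id_file)
        second = load_or_create_device_id(id_file)
        print(first == second, account_id_for(first))
```

Persisting whichever value was chosen is what keeps the derived account ID stable even if the machine-id file later changes or disappears.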
FILE:scripts/common/utils.py
# -*- coding: utf-8 -*-
"""
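Shared utilities for QBI SmartQ (unified version).
Provides config loading, OpenAPI signing, HTTP requests (including SSE streaming), SSE event parsing,
automatic user registration, trial notices, and multipart file upload.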
"""
from __future__ import annotations
import base64
import hashlib
import hmac
import json
import os
import random
import re
import string
import time
import uuid
from html import unescape
from pathlib import Path
from typing import Any, Dict, Generator, List, Optional, Sequence, Tuple
from urllib import parse
import requests
from .config_loader import (
load_config as read_config, # backward compatibility: other scripts use `from utils import read_config`
persist_to_skill_config,
persist_to_global_config,
is_global_save_enabled,
set_workspace_dir,
get_server_domain,
get_skill_config_path,
get_skill_output_dir,
check_trial_expired,
TRIAL_EXPIRED_CODE,
DEFAULT_CONFIG_PATH,
GLOBAL_CONFIG_PATH,
)
BASE_DIR = Path(__file__).resolve().parent
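def _should_skip_ssl(url: str) -> bool:
    """Skip SSL certificate verification when the request hostname contains 'test'."""
    # Match on the hostname only, as the docstring describes, so that a 'test'
    # in the path or query string does not disable verification. Fall back to
    # the raw string when the URL has no scheme and no hostname can be parsed.
    hostname = parse.urlparse(url).hostname or url
    return "test" in hostname.lower()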
def require_user_id(config: dict) -> str:
"""Return the userId, in priority order: external config, then automatic registration."""
user_id = config.get("user_token")
if user_id is None or str(user_id).strip() == "":
user_id = _auto_provision_user(config)
config["user_token"] = user_id
else:
user_id = str(user_id).strip()
config["user_token"] = user_id
# user_token already present (from global config, skill config, or environment variables); no need to persist it again
return user_id
# ---------------------------------------------------------------------------
# Automatic user registration
# ---------------------------------------------------------------------------
from .device_id import get_device_account_id as _get_device_account_id # noqa: E402
from .device_id import get_device_hostname as _get_device_hostname # noqa: E402
_ALREADY_IN_ORG_CODE = "AE0150100022"
_NICK_EXISTS_CODE = "AE0150100010"
_last_add_user_code: Optional[str] = None
def _add_user_to_org(account_id: str, hostname: str, config: dict) -> Optional[str]:
"""
Call POST /openapi/v2/organization/user/addSuer to add the user to the default organization.
Return the system-assigned userId, or None on failure.
"""
uri = "/openapi/v2/organization/user/addSuer"
body: Dict[str, Any] = {
"accountId": account_id,
"accountName": hostname,
"nickName": hostname
}
print(f"[用户注册][添加用户] 请求: POST {uri}", flush=True)
print(f"[用户注册][添加用户] 入参: {json.dumps(body, ensure_ascii=False)}", flush=True)
global _last_add_user_code
try:
resp = request_openapi(
"POST",
uri,
json_body=body,
config=config,
)
result = resp.json()
print(f"[用户注册][添加用户] 响应: {json.dumps(result, ensure_ascii=False)}", flush=True)
_last_add_user_code = str(result.get("code", ""))
if result.get("success") is True and isinstance(result.get("data"), dict):
user_id = result["data"].get("userId")
if user_id:
return user_id
except Exception as e:
print(f"[用户注册][添加用户] 异常: {e}", flush=True)
return None
def _query_user_by_account(account_name: str, config: dict) -> Optional[str]:
"""
Look up an existing user's userId via GET /openapi/v2/organization/user/queryByAccount.
"""
uri = "/openapi/v2/organization/user/queryByAccount"
params = {"account": account_name}
print(f"[用户注册][查询用户] 请求: GET {uri}?account={account_name}", flush=True)
try:
resp = request_openapi("GET", uri, params=params, config=config)
result = resp.json()
print(f"[用户注册][查询用户] 响应: {json.dumps(result, ensure_ascii=False)}", flush=True)
if result.get("success") and isinstance(result.get("data"), dict):
return result["data"].get("userId")
except Exception as e:
print(f"[用户注册][查询用户] 异常: {e}", flush=True)
return None
def _persist_user_id(user_id: str):
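"""Persist the user_token produced by automatic registration to the global config ~/.qbi/config.yaml.
It is written to the global config rather than the bundled default_config.yaml, because the latter is overwritten on skill package updates.
A user_token produced by automatic trial-credential registration is written with force=True,
regardless of the save_global_property switch.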
"""
try:
persist_to_global_config("user_token", user_id, force=True)
print(f"[用户注册] user_token 已写入 {GLOBAL_CONFIG_PATH}", flush=True)
except Exception as e:
print(f"[用户注册] 警告:无法将 user_token 写入 {GLOBAL_CONFIG_PATH}: {e}", flush=True)
def _auto_provision_user(config: dict) -> str:
"""
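Automatic registration flow when user_token is not configured:
1. Generate accountId (MD5 of the device ID) and accountName (hostname)
2. Query by accountName first; if the user is already in the organization, reuse its userId (keeps legacy users working)
3. Otherwise call addUser to add the user to the organization
4. Write the userId to the global config ~/.qbi/config.yaml (not affected by skill package updates)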
"""
account_id = _get_device_account_id()
hostname = _get_device_hostname()
print(f"[用户注册] 未配置 user_token,开始自动注册 (accountId={account_id}, accountName={hostname})", flush=True)
existing_uid = _query_user_by_account(hostname, config)
if existing_uid:
print(f"[用户注册] 通过 accountName 查询到已有用户,userId={existing_uid}", flush=True)
_persist_user_id(existing_uid)
return existing_uid
print(f"[用户注册] 未查询到已有用户,正在添加 (accountName={hostname}) ...", flush=True)
uid = _add_user_to_org(account_id, hostname, config)
if uid:
print(f"[用户注册] 添加成功,userId={uid}", flush=True)
_persist_user_id(uid)
return uid
if _last_add_user_code in (_ALREADY_IN_ORG_CODE, _NICK_EXISTS_CODE):
print(f"[用户注册] 添加返回已存在(错误码={_last_add_user_code}),重新查询 userId ...", flush=True)
queried_uid = _query_user_by_account(hostname, config)
if queried_uid:
print(f"[用户注册] 查询成功,userId={queried_uid}", flush=True)
_persist_user_id(queried_uid)
return queried_uid
suffixed_name = f"{hostname}_{''.join(random.choices(string.ascii_lowercase + string.digits, k=5))}"
print(f"[用户注册] 使用带后缀名称重试 (accountName={suffixed_name}) ...", flush=True)
uid = _add_user_to_org(account_id, suffixed_name, config)
if uid:
print(f"[用户注册] 重试添加成功,userId={uid}", flush=True)
_persist_user_id(uid)
return uid
raise ValueError(
"自动注册用户失败,请手动在 ~/.qbi/config.yaml 中配置 user_token。"
"可通过 Quick BI 管理控制台获取用户 ID。"
)
# ---------------------------------------------------------------------------
# OpenAPI signing
# ---------------------------------------------------------------------------
def build_signature(
    method: str,
    uri: str,
    params: Optional[Dict[str, Any]],
    access_id: str,
    access_key: str,
    nonce: str,
    timestamp: str,
) -> str:
    """Build the Base64-encoded HMAC-SHA256 signature for an OpenAPI request."""
    if not params:
        request_query_string = ""
    else:
        # Sort parameters by key and skip None or empty values
        parts: List[str] = []
        for key in sorted(params):
            value = params[key]
            if value is None or value == "":
                continue
            parts.append(f"{key}={value}")
        request_query_string = ("\n" + "&".join(parts)) if parts else ""
    request_headers = (
        "\nX-Gw-AccessId:" + access_id
        + "\nX-Gw-Nonce:" + nonce
        + "\nX-Gw-Timestamp:" + timestamp
    )
    string_to_sign = method.upper() + "\n" + uri + request_query_string + request_headers
    # URL-encode the whole string (no characters treated as safe) before signing
    encoded_string = parse.quote(string_to_sign, "")
    digest = hmac.new(
        access_key.encode("utf-8"),
        encoded_string.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("utf-8")
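
To see what `build_signature` actually signs, it helps to print the canonical string for a small request. The sketch below mirrors the steps above with made-up inputs; `canonical_string` and `sign` are hypothetical helper names, and the access ID, secret, nonce, timestamp, and URI are placeholder values, not real credentials or endpoints.

```python
import base64
import hashlib
import hmac
from typing import Any, Dict, Optional
from urllib import parse


def canonical_string(
    method: str,
    uri: str,
    params: Optional[Dict[str, Any]],
    access_id: str,
    nonce: str,
    timestamp: str,
) -> str:
    """Assemble method, URI, sorted non-empty params, and the X-Gw-* header lines."""
    params = params or {}
    parts = [f"{k}={params[k]}" for k in sorted(params) if params[k] is not None and params[k] != ""]
    query = ("\n" + "&".join(parts)) if parts else ""
    headers = f"\nX-Gw-AccessId:{access_id}\nX-Gw-Nonce:{nonce}\nX-Gw-Timestamp:{timestamp}"
    return method.upper() + "\n" + uri + query + headers


def sign(string_to_sign: str, access_key: str) -> str:
    """URL-encode the canonical string, then HMAC-SHA256 it and Base64 the digest."""
    encoded = parse.quote(string_to_sign, "")
    digest = hmac.new(access_key.encode("utf-8"), encoded.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("utf-8")


if __name__ == "__main__":
    s = canonical_string(
        "get", "/openapi/v2/demo", {"b": "2", "a": "1", "c": ""},
        access_id="demo-id", nonce="demo-nonce", timestamp="1700000000000",
    )
    print(s)
    print(sign(s, "demo-secret"))
```

Note that the empty parameter `c` is dropped and the remaining keys are sorted, so the signature does not depend on the order in which the caller supplies parameters.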
def build_request_headers(
method: str,
uri: str,
params: Optional[Dict[str, Any]],
*,
content_type: Optional[str] = None,
config: Optional[dict] = None,
) -> Dict[str, str]:
config = config or read_config()
access_id = str(config["api_key"])
access_key = str(config["api_secret"])
nonce = str(uuid.uuid4())
timestamp = str(int(time.time() * 1000))
signature = build_signature(method, uri, params, access_id, access_key, nonce, timestamp)
headers = {
"X-Gw-AccessId": access_id,
"X-Gw-Nonce": nonce,
"X-Gw-Timestamp": timestamp,
"X-Gw-Signature": signature,
"X-Gw-Debug": "true",
}
if content_type:
headers["Content-Type"] = content_type
return headers
# ---------------------------------------------------------------------------
# HTTP requests
# ---------------------------------------------------------------------------
def request_openapi(
    method: str,
    uri: str,
    *,
    params: Optional[Dict[str, Any]] = None,
    json_body: Optional[Dict[str, Any]] = None,
    form_data: Optional[Dict[str, Any]] = None,
    files: Optional[Dict[str, Any]] = None,
    sign_params: Optional[Dict[str, Any]] = None,
    timeout: int = 60,
    config: Optional[dict] = None,
    quiet: bool = False,
) -> requests.Response:
    """Call the Quick BI OpenAPI (non-streaming).

    Args:
        files: multipart upload dict (same shape as the `files` argument of requests).
        quiet: when True, the request and response log lines are skipped
            (intended for high-frequency polling).
    """
    config = config or read_config()
    server_domain = get_server_domain(config)
    method = method.upper()
    url = server_domain + uri
    # Default the signed parameters: query params for GET, otherwise form data
    if sign_params is None and method == "GET":
        sign_params = params
    if sign_params is None and form_data is not None:
        sign_params = form_data
    content_type: Optional[str] = None
    if json_body is not None:
        content_type = "application/json"
    elif files is not None:
        content_type = None  # requests sets the multipart boundary automatically
    elif form_data is not None:
        content_type = "application/x-www-form-urlencoded"
    headers = build_request_headers(method, uri, sign_params, content_type=content_type, config=config)
    headers["origin"] = server_domain
    kwargs: Dict[str, Any] = {"method": method, "url": url, "headers": headers, "timeout": timeout}
    if method == "GET":
        kwargs["params"] = params
    elif files is not None:
        kwargs["data"] = params
        kwargs["files"] = files
    elif json_body is not None:
        kwargs["json"] = json_body
    elif form_data is not None:
        kwargs["data"] = form_data
    else:
        kwargs["params"] = params
    if not quiet:
        print(f"\n>>> API Request: {method} {uri}", flush=True)
    if _should_skip_ssl(url):
        kwargs["verify"] = False
    resp = requests.request(**kwargs)
    if not quiet and not resp.ok:
        print(f"\n<<< API Response: {resp.status_code}", flush=True)
    if not resp.ok:
        body = ""
        try:
            body = resp.text[:2000]
        except Exception:
            pass
        check_trial_expired(body)
        raise requests.HTTPError(
            f"HTTP {resp.status_code} {resp.reason} for {method} {uri}\nResponse body: {body}",
            response=resp,
        )
    return resp
def request_openapi_stream(
uri: str,
*,
json_body: Dict[str, Any],
config: Optional[dict] = None,
timeout: int = 600,
) -> Generator[str, None, None]:
"""
Streaming POST request; returns a generator of SSE event text blocks.
Each yield is one complete SSE event block (delimited by ``\\n\\n``).
"""
config = config or read_config()
server_domain = get_server_domain(config)
url = server_domain + uri
headers = build_request_headers("POST", uri, None, content_type="application/json", config=config)
headers["origin"] = server_domain
headers["Accept"] = "text/event-stream"
headers["Accept-Encoding"] = "identity"
headers["Cache-Control"] = "no-cache"
verify = not _should_skip_ssl(url)
with requests.post(url, json=json_body, headers=headers, stream=True, timeout=timeout, verify=verify) as resp:
if not resp.ok:
body = ""
try:
body = resp.text[:2000]
except Exception:
pass
check_trial_expired(body)
raise requests.HTTPError(
f"HTTP {resp.status_code} {resp.reason} for POST {uri}\n响应体: {body}",
response=resp,
)
resp.encoding = "utf-8"
buffer = ""
for line in resp.iter_lines(decode_unicode=True):
if line is None:
continue
buffer += line + "\n"
# SSE events are separated by a blank line (two consecutive newlines)
while "\n\n" in buffer:
event_block, buffer = buffer.split("\n\n", 1)
event_block = event_block.strip()
if event_block:
yield event_block
if buffer.strip():
yield buffer.strip()
# ---------------------------------------------------------------------------
# SSE event parsing
# ---------------------------------------------------------------------------
def parse_sse_event(raw_event: str) -> Dict[str, Any]:
    """
    Parse a single SSE event block and return the JSON dict carried in its data line.

    Example event format::

        event:message
        data:{"data":"xxx","type":"reasoning"}
    """
    lines = raw_event.strip().split("\n")
    data_content = ""
    # Only the first data: line is used
    for line in lines:
        if line.startswith("data:"):
            data_content = line[len("data:"):]
            break
    if not data_content:
        return {}
    try:
        return json.loads(data_content)
    except json.JSONDecodeError:
        try:
            # Retry after undoing one level of backslash escaping
            repaired = data_content.replace('\\"', '"').replace('\\\\', '\\')
            return json.loads(repaired)
        except json.JSONDecodeError:
            return {"raw": data_content}
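
A quick way to check the parsing rules above is to feed a few hand-written event blocks through a reduced version of the parser. This sketch is a simplified stand-in named `parse_sse_data` (a hypothetical name); it omits the escape-repair retry and only shows the three outcomes: parsed JSON, an empty dict when no `data:` line exists, and a `raw` fallback.

```python
import json
from typing import Any, Dict


def parse_sse_data(raw_event: str) -> Dict[str, Any]:
    """Return the JSON payload of the first data: line in an SSE event block."""
    for line in raw_event.strip().split("\n"):
        if line.startswith("data:"):
            payload = line[len("data:"):]
            try:
                return json.loads(payload)
            except json.JSONDecodeError:
                # Keep the unparsed text so the caller can still inspect it.
                return {"raw": payload}
    return {}


if __name__ == "__main__":
    event = 'event:message\ndata:{"data":"xxx","type":"reasoning"}'
    print(parse_sse_data(event))
    print(parse_sse_data("event:heartbeat"))
    print(parse_sse_data("data:not json"))
```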
# ---------------------------------------------------------------------------
# Shared report-module utilities (migrated from quickbi-smartq-data-report)
# ---------------------------------------------------------------------------
REPORT_URL_PATH_TEMPLATE = "/copilot/qreportReplay?caseId={chat_id}"
UPLOAD_CHAT_TYPE = "manus"
DEFAULT_POLL_INTERVAL_SECONDS = 10.0
DEFAULT_MAX_POLL_SECONDS = 30 * 60
SUPPORTED_UPLOAD_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".csv"}
MAX_UPLOAD_SIZE_BYTES = 10 * 1024 * 1024
def validate_upload_file(file_path: str) -> Path:
    """Validate the type and size of a file to upload."""
    path = Path(file_path).expanduser().resolve()
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    if not path.is_file():
        raise ValueError(f"Not a regular file: {path}")
    if path.suffix.lower() not in SUPPORTED_UPLOAD_SUFFIXES:
        raise ValueError(
            f"Unsupported file type: {path.suffix}; only doc/docx/xls/xlsx/csv are supported"
        )
    file_size = path.stat().st_size
    if file_size > MAX_UPLOAD_SIZE_BYTES:
        raise ValueError(
            f"File exceeds 10MB: {path.name}, current size {file_size} bytes"
        )
    return path
def upload_reference_file(
file_path: str,
*,
config: Optional[dict] = None,
) -> Dict[str, Any]:
"""Upload a single file and return the raw file metadata."""
config = config or read_config()
path = validate_upload_file(file_path)
user_id = require_user_id(config)
params = {"chatType": UPLOAD_CHAT_TYPE, "userId": user_id}
with path.open("rb") as file_handle:
upload_files_dict = {"file": (path.name, file_handle)}
response = request_openapi(
"POST",
"/openapi/v2/qreport/uploadReferenceFile",
params=params,
files=upload_files_dict,
sign_params=None,
timeout=60,
config=config,
)
try:
result = response.json()
except json.JSONDecodeError as exc:
raise ValueError(f"上传文件接口返回了非 JSON 响应:{response.text}") from exc
if not isinstance(result, dict):
raise ValueError(f"上传文件接口返回结构异常:{result}")
data = result.get("data", result)
if isinstance(data, dict) and data.get("fileId"):
result = data
if not result.get("fileId") or not result.get("fileName") or not result.get("fileType"):
raise ValueError(f"上传文件接口未返回完整文件信息:{result}")
return result
def resource_from_reference_file(reference_file: Dict[str, Any]) -> Dict[str, str]:
"""Map an upload result to the resources structure a chat session expects."""
return {
"id": str(reference_file.get("fileId", "")),
"title": str(reference_file.get("fileName", "")),
"type": str(reference_file.get("fileType", "")),
}
def build_resources(reference_files: Sequence[Dict[str, Any]]) -> List[Dict[str, str]]:
"""Map a batch of upload results to a resources list."""
return [resource_from_reference_file(item) for item in reference_files]
def normalize_resources(resources: Optional[Sequence[Dict[str, Any]]]) -> List[Dict[str, str]]:
"""Accept both raw upload results and already-mapped resources as input."""
normalized: List[Dict[str, str]] = []
for item in resources or []:
if {"id", "title", "type"}.issubset(item.keys()):
normalized.append(
{
"id": str(item["id"]),
"title": str(item["title"]),
"type": str(item["type"]),
}
)
elif {"fileId", "fileName", "fileType"}.issubset(item.keys()):
normalized.append(resource_from_reference_file(item))
else:
raise ValueError(f"资源格式不正确:{item}")
return normalized
def _parse_running_task_response(text: str) -> Optional[Dict[str, str]]:
"""Parse the response format returned when a task is already running."""
pattern = r"当前用户已有运行中的任务.*问题[::]\s*(.+?)[,,]\s*chatId[::]\s*([a-zA-Z0-9\-]+)"
match = re.search(pattern, text)
if match:
return {
"question": match.group(1).strip(),
"chatId": match.group(2).strip(),
"message": text,
}
return None
def normalize_string_response(response_text: str, fallback: str) -> str:
"""Normalize an API response to a plain string; non-string results fall back to `fallback`."""
text = response_text.strip()
if not text:
return fallback
try:
parsed = json.loads(text)
except json.JSONDecodeError:
normalized = text.strip('"').strip("'")
return normalized if normalized else fallback
if isinstance(parsed, str):
normalized = parsed.strip()
return normalized if normalized else fallback
return fallback
_FILE_TYPE_ICON_MAP = {
"doc": "word",
"docx": "word",
"xls": "excel",
"xlsx": "excel",
"csv": "excel",
"pdf": "pdf",
}
def _build_attachment(
resources: List[Dict[str, str]],
upload_results: Optional[Sequence[Dict[str, Any]]] = None,
) -> str:
"""Build the attachment JSON string from upload results."""
files_list: List[Dict[str, Any]] = []
if upload_results:
for item in upload_results:
file_id = str(item.get("fileId", ""))
file_type = str(item.get("fileType", ""))
file_name = str(item.get("fileName", ""))
icon_type = _FILE_TYPE_ICON_MAP.get(file_type, file_type)
ext = f".{file_type}" if file_type else ""
files_list.append({
"fileId": file_id,
"fileType": file_type,
"iconType": icon_type,
"file": {"name": f"{file_name}{ext}"},
"fileName": file_name,
})
elif resources:
for res in resources:
file_id = res.get("id", "")
file_type = res.get("type", "")
file_name = res.get("title", "")
icon_type = _FILE_TYPE_ICON_MAP.get(file_type, file_type)
ext = f".{file_type}" if file_type else ""
files_list.append({
"fileId": file_id,
"fileType": file_type,
"iconType": icon_type,
"file": {"name": f"{file_name}{ext}"},
"fileName": file_name,
})
attachment_obj = {
"resource": {
"files": files_list,
"pages": [],
"cubes": [],
"dashboardFiles": [],
},
"useOnlineSearch": True,
}
return json.dumps(attachment_obj, ensure_ascii=False)
def format_report_url(chat_id: str, config: Optional[dict] = None) -> str:
"""Build the final replay link; the domain is read from config.server_domain at call time."""
config = config or read_config()
server_domain = get_server_domain(config)
return server_domain + REPORT_URL_PATH_TEMPLATE.format(chat_id=chat_id)
def create_report_chat(
question: str,
*,
resources: Optional[Sequence[Dict[str, Any]]] = None,
upload_results: Optional[Sequence[Dict[str, Any]]] = None,
chat_id: Optional[str] = None,
message_id: Optional[str] = None,
config: Optional[dict] = None,
) -> Dict[str, Any]:
"""Create a SmartQ report chat session.
When the API reports that the current user already has a running task, switch to that task and return its chatId.
"""
config = config or read_config()
chat_id = chat_id or str(uuid.uuid4())
message_id = message_id or str(uuid.uuid4())
oapi_user_id = require_user_id(config)
server_domain = get_server_domain(config)
normalized_resources = normalize_resources(resources)
payload: Dict[str, Any] = {
"async": True,
"chatId": chat_id,
"messageId": message_id,
"oapiUserId": oapi_user_id,
"reGenerate": True,
"needReplay": True,
"userQuestion": question,
"needWebSearch": True,
"autoAcceptedPlan": True,
"runningBySkill": True,
"resources": normalized_resources if normalized_resources else [],
"interruptFeedback": "",
"messages": [
{
"role": "user",
"content": question,
}
],
"attachment": _build_attachment(normalized_resources, upload_results),
"bizArgs": {
"qbiHost": server_domain,
},
}
response = request_openapi(
"POST",
"/openapi/v2/smartq/createQreportChat",
json_body=payload,
sign_params=None,
timeout=60,
config=config,
)
response_text = response.text.strip()
running_task = _parse_running_task_response(response_text)
if running_task:
existing_chat_id = running_task["chatId"]
existing_question = running_task["question"]
print(f"\n{'=' * 60}", flush=True)
print(f"[运行中任务] 检测到当前用户已有运行中的任务", flush=True)
print(f"[运行中任务] 已有任务问题:{existing_question}", flush=True)
print(f"[运行中任务] 已有任务chatId:{existing_chat_id}", flush=True)
print(f"[运行中任务] 将使用上述 chatId 进行后续轮询", flush=True)
print(f"{'=' * 60}\n", flush=True)
return {
"chatId": existing_chat_id,
"messageId": message_id,
"reportUrl": format_report_url(existing_chat_id, config),
"request": payload,
"response": response_text,
"statusCode": response.status_code,
"runningTask": True,
"runningTaskInfo": running_task,
}
final_chat_id = normalize_string_response(response_text, chat_id)
return {
"chatId": final_chat_id,
"messageId": message_id,
"reportUrl": format_report_url(final_chat_id, config),
"request": payload,
"response": final_chat_id,
"statusCode": response.status_code,
}
def fetch_report_result(chat_id: str, *, config: Optional[dict] = None) -> str:
"""Fetch the accumulated result for the given chatId (silent, for polling)."""
config = config or read_config()
user_id = require_user_id(config)
query_params = {"chatId": chat_id, "userId": user_id}
response = request_openapi(
"GET",
"/openapi/v2/smartq/qreportChatData",
params=query_params,
sign_params=query_params,
timeout=60,
config=config,
quiet=True,
)
return response.text
def _decode_data_field(item: Dict[str, Any]) -> Dict[str, Any]:
"""Parse a single event's data field as JSON a second time and store the result in parsedData."""
event = dict(item)
data_val = event.get("data")
if isinstance(data_val, str) and data_val.strip():
try:
event["parsedData"] = json.loads(data_val)
except json.JSONDecodeError:
event["parsedData"] = data_val
else:
event["parsedData"] = data_val
return event
def _parse_json_array_events(arr: list) -> List[Dict[str, Any]]:
"""Decode each element of a JSON array into an event."""
return [_decode_data_field(item) for item in arr if isinstance(item, dict)]
def parse_report_events(raw_text: str) -> List[Dict[str, Any]]:
"""Parse the response returned by qreportChatData."""
if not raw_text or not raw_text.strip():
return []
text = raw_text.strip()
try:
parsed = json.loads(text)
if isinstance(parsed, dict):
inner = parsed.get("data")
if isinstance(inner, str) and inner.strip():
try:
inner_parsed = json.loads(inner)
if isinstance(inner_parsed, list):
return _parse_json_array_events(inner_parsed)
except json.JSONDecodeError:
pass
elif isinstance(inner, list):
return _parse_json_array_events(inner)
if isinstance(parsed, list):
return _parse_json_array_events(parsed)
except json.JSONDecodeError:
pass
events: List[Dict[str, Any]] = []
blocks = re.split(r"\r?\n\r?\n", text)
for block in blocks:
lines = [line for line in block.splitlines() if line.strip()]
if not lines:
continue
data_lines: List[str] = []
for line in lines:
if line.startswith("data:"):
data_lines.append(line[len("data:"):].lstrip())
if not data_lines:
continue
raw_data = "\n".join(data_lines)
try:
outer = json.loads(raw_data)
except json.JSONDecodeError:
events.append({"data": raw_data, "type": "unknown", "parsedData": raw_data})
continue
if isinstance(outer, dict):
events.append(_decode_data_field(outer))
return events
def clean_text_fragment(text: str) -> str:
"""Convert an HTML fragment into plain text that reads well in a terminal."""
if not text:
return ""
cleaned = unescape(text)
cleaned = re.sub(r"<br\s*/?>", "\n", cleaned, flags=re.IGNORECASE)
cleaned = re.sub(r"</p\s*>", "\n", cleaned, flags=re.IGNORECASE)
cleaned = re.sub(r"<p[^>]*>", "", cleaned, flags=re.IGNORECASE)
cleaned = re.sub(r"<li[^>]*>", "- ", cleaned, flags=re.IGNORECASE)
cleaned = re.sub(r"</li>", "\n", cleaned, flags=re.IGNORECASE)
cleaned = re.sub(r"<[^>]+>", "", cleaned)
cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
return cleaned.strip()
def truncate_text(text: str, limit: int = 800) -> str:
"""Truncate long text so terminal output stays short."""
if len(text) <= limit:
return text
return text[:limit].rstrip() + "…"
_FUNCTION_LABEL = {
"thinking": "思考中...",
"learn": "文件学习",
"refuse": "拒识",
"mainText": "规划步骤",
"interrupt": "等待确认",
}
_SKIP_TYPES = {"heartbeat", "locale", "check", "time"}
def _normalize_streaming_text(text: str) -> str:
"""Repair line breaks introduced by SSE streaming."""
if not text:
return text
text = re.sub(r'\*\n\*', '**', text)
text = re.sub(
r'(?<=[^。.!!??\n::;;\))\]】])\n(?=[^\n\s*#\-\d((【\[])',
'',
text,
)
return text
def _streaming_content(events: List[Dict[str, Any]]) -> str:
"""Extract streamed text content from events and repair SSE line breaks."""
return _normalize_streaming_text(clean_text_fragment(_collect_content(events)))
def _event_group_key(event: Dict[str, Any]) -> Tuple[str, str]:
"""Return (type, function), used as the grouping key."""
etype = event.get("type", "")
pd = event.get("parsedData")
func = pd.get("function", "") if isinstance(pd, dict) else ""
return (etype, func)
def _collect_content(events: List[Dict[str, Any]]) -> str:
"""Extract and concatenate the content field from a group of events."""
parts: List[str] = []
for e in events:
pd = e.get("parsedData")
if isinstance(pd, dict):
c = pd.get("content", "")
if c:
parts.append(c)
elif isinstance(pd, str) and pd:
parts.append(pd)
else:
raw = e.get("data", "")
if isinstance(raw, str) and raw:
parts.append(raw)
return "".join(parts)
def group_events(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Merge consecutive events that share the same (type, function) into groups."""
if not events:
return []
groups: List[Dict[str, Any]] = []
cur_key = _event_group_key(events[0])
cur_events: List[Dict[str, Any]] = [events[0]]
for event in events[1:]:
key = _event_group_key(event)
if key == cur_key:
cur_events.append(event)
else:
groups.append({
"type": cur_key[0],
"function": cur_key[1],
"events": cur_events,
})
cur_key = key
cur_events = [event]
groups.append({
"type": cur_key[0],
"function": cur_key[1],
"events": cur_events,
})
return groups
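
The grouping rule in `group_events` (merge only adjacent events with the same key) is the behavior of `itertools.groupby`. The following sketch shows the same idea in a compact form; `group_key` and `group_consecutive` are hypothetical names, and the sample events are invented for illustration.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple


def group_key(event: Dict[str, Any]) -> Tuple[str, str]:
    """Return (type, function); function is read from parsedData when it is a dict."""
    parsed = event.get("parsedData")
    func = parsed.get("function", "") if isinstance(parsed, dict) else ""
    return (event.get("type", ""), func)


def group_consecutive(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge runs of adjacent events that share the same (type, function) key."""
    return [
        {"type": key[0], "function": key[1], "events": list(run)}
        for key, run in groupby(events, key=group_key)
    ]


if __name__ == "__main__":
    sample = [
        {"type": "plan", "parsedData": {"function": "thinking", "content": "a"}},
        {"type": "plan", "parsedData": {"function": "thinking", "content": "b"}},
        {"type": "step", "parsedData": {"function": "", "title": "Load data"}},
        {"type": "plan", "parsedData": {"function": "thinking", "content": "c"}},
    ]
    for group in group_consecutive(sample):
        print(group["type"], group["function"], len(group["events"]))
```

Because only adjacent events are merged, the final `plan` event forms its own group instead of joining the first two, which preserves the order in which the stream produced them.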
def summarize_structured_chart(parsed_payload: Dict[str, Any]) -> List[str]:
"""Build a terminal summary of a structured chart."""
chart = parsed_payload.get("chart") or {}
if isinstance(chart, list):
data_list = chart
elif isinstance(chart, dict):
data_list = chart.get("dataList") or []
else:
data_list = []
if not data_list:
return ["[structuredChart] 空结果"]
lines: List[str] = []
for index, item in enumerate(data_list, start=1):
title = item.get("title") or item.get("chartType") or item.get("id") or f"chart-{index}"
rows = item.get("data") or []
lines.append(f"[structuredChart] {index}. {title} | rows={len(rows)}")
if rows:
sample = json.dumps(rows[0], ensure_ascii=False)
lines.append(f" sample: {truncate_text(sample, 500)}")
return lines
def summarize_unstructured_chart(parsed_payload: Dict[str, Any]) -> List[str]:
"""Build a terminal summary of an unstructured chart or outline."""
chart = parsed_payload.get("chart") or []
if not chart:
return ["[unStructuredChart] 空结果"]
lines: List[str] = []
for index, item in enumerate(chart, start=1):
content = item.get("data") or item.get("content") or ""
title = item.get("title") or item.get("purpose") or item.get("id") or f"chunk-{index}"
lines.append(f"[unStructuredChart] {index}. {title}")
if content:
lines.append(truncate_text(clean_text_fragment(str(content)), 800))
return lines
def _render_token_info(parsed_data: Dict[str, Any], label: str) -> Tuple[List[str], Dict[str, Any]]:
"""Extract token usage and format it for output."""
info = {
"promptTokens": parsed_data.get("promptTokens"),
"totalTokens": parsed_data.get("totalTokens"),
"completionTokens": parsed_data.get("completionTokens"),
}
line = (
f"[{label}] "
f"prompt={info['promptTokens']} total={info['totalTokens']} "
f"completion={info['completionTokens']}"
)
return [line], info
_SUBSTEP_LABEL = {
"onlineSearchResult": "联网搜索结果",
"knowledgeBaseResult": "知识库结果",
}
def _render_substep(parsed_data: Dict[str, Any]) -> List[str]:
"""Render a subStep event."""
if not isinstance(parsed_data, dict):
return [f" [subStep] {truncate_text(str(parsed_data), 1000)}"]
function_name = parsed_data.get("function", "subStep")
label = _SUBSTEP_LABEL.get(function_name, function_name)
prefix = f"[subStep:{label}]" if label else "[subStep]"
raw_lines: List[str] = []
if function_name == "onlineSearchResult":
web_items = parsed_data.get("webItems") or []
raw_lines.append(f"{prefix} {len(web_items)} 条网页结果")
for idx, item in enumerate(web_items[:3], 1):
title = item.get("title", "")
link = item.get("link", "")
host = item.get("hostName", "")
display_title = truncate_text(title, 100)
if link:
raw_lines.append(f" {idx}. [{display_title}]({link}) — {host}")
elif host:
raw_lines.append(f" {idx}. {display_title} — {host}")
else:
raw_lines.append(f" {idx}. {display_title}")
elif function_name == "knowledgeBaseResult":
kb_items = parsed_data.get("knowledgeItems") or []
raw_lines.append(f"{prefix} {len(kb_items)} 条知识库结果")
for idx, item in enumerate(kb_items[:3], 1):
name = item.get("resourceName", "")
raw_lines.append(f" {idx}. {truncate_text(name, 120)}")
elif function_name == "structuredChart":
raw_lines.extend(summarize_structured_chart(parsed_data))
elif function_name == "unStructuredChart":
raw_lines.extend(summarize_unstructured_chart(parsed_data))
elif function_name == "usedToken":
return []
elif function_name == "text":
content = parsed_data.get("content", "")
if content:
cleaned = truncate_text(_normalize_streaming_text(clean_text_fragment(str(content))), 1500)
raw_lines.append(f"{prefix} {cleaned}")
else:
raw_lines.append(prefix)
else:
content = parsed_data.get("content", "")
if content:
raw_lines.append(prefix)
raw_lines.append(truncate_text(clean_text_fragment(str(content)), 1500))
else:
raw_lines.append(prefix)
return [f" {line}" for line in raw_lines]
def render_event_group(
group: Dict[str, Any],
continuation: bool = False,
) -> Tuple[List[str], bool, Optional[Dict[str, Any]], Optional[str]]:
"""Render one event group as terminal output.
Returns: (lines, finished, token_info, error_msg)
"""
etype = group["type"]
func = group["function"]
events = group["events"]
lines: List[str] = []
finished = False
token_info: Optional[Dict[str, Any]] = None
error_msg: Optional[str] = None
if etype in _SKIP_TYPES:
return lines, finished, token_info, error_msg
if etype == "trace":
text = str(_collect_content(events)).strip()
if text and not continuation:
lines.append(f"[trace] {text}")
return lines, finished, token_info, error_msg
if etype == "error":
error_msg = clean_text_fragment(_collect_content(events))
if not error_msg:
error_msg = "未知错误"
lines.append(f"[error] {error_msg}")
finished = True
return lines, finished, token_info, error_msg
if etype == "plan":
if func == "usedToken":
return lines, finished, token_info, error_msg
label = _FUNCTION_LABEL.get(func, func or "plan")
content = _streaming_content(events)
if content:
if not continuation:
lines.append(label)
lines.append(content)
return lines, finished, token_info, error_msg
if etype == "schedule":
if func == "usedToken":
pass
else:
content = _streaming_content(events)
if content:
lines.append(f"[schedule] {truncate_text(content, 1500)}")
return lines, finished, token_info, error_msg
if etype == "step":
for e in events:
pd = e.get("parsedData")
if not isinstance(pd, dict):
continue
f = pd.get("function", "")
if f == "usedToken":
continue
else:
title = pd.get("title", "")
desc = pd.get("desc", "")
lines.append(f"Step: {title}" if title else "Step")
if desc and desc != title:
lines.append(f" {desc}")
return lines, finished, token_info, error_msg
if etype == "actionThinking":
content = _streaming_content(events)
if content:
if not continuation:
lines.append("思考中...")
lines.append(content)
return lines, finished, token_info, error_msg
if etype == "subStep":
for e in events:
pd = e.get("parsedData")
if isinstance(pd, dict):
lines.extend(_render_substep(pd))
return lines, finished, token_info, error_msg
if etype == "qreport":
if func in ("qreportUsedToken", "usedToken"):
pd = events[-1].get("parsedData", {})
if isinstance(pd, dict):
_, token_info = _render_token_info(pd, func)
if func == "qreportUsedToken":
finished = True
return lines, finished, token_info, error_msg
if func == "onlineSearchResult":
for e in events:
pd = e.get("parsedData")
if isinstance(pd, dict):
web_items = pd.get("webItems") or []
lines.append(f"[qreport:联网搜索结果] {len(web_items)} 条网页结果")
for idx, item in enumerate(web_items[:3], 1):
title = item.get("title", "")
link = item.get("link", "")
host = item.get("hostName", "")
display_title = truncate_text(title, 100)
if link:
lines.append(f" {idx}. [{display_title}]({link}) — {host}")
elif host:
lines.append(f" {idx}. {display_title} — {host}")
else:
lines.append(f" {idx}. {display_title}")
return lines, finished, token_info, error_msg
if func == "structuredChart":
for e in events:
pd = e.get("parsedData")
if isinstance(pd, dict):
lines.extend(summarize_structured_chart(pd))
return lines, finished, token_info, error_msg
if func == "unStructuredChart":
for e in events:
pd = e.get("parsedData")
if isinstance(pd, dict):
lines.extend(summarize_unstructured_chart(pd))
return lines, finished, token_info, error_msg
content = _streaming_content(events)
if content and not continuation:
label = func or "qreport"
lines.append(f"[qreport:{label}] {truncate_text(content, 2000)}")
return lines, finished, token_info, error_msg
if etype == "finish":
return lines, finished, token_info, error_msg
content = _streaming_content(events)
if content:
lines.append(f"[{etype or 'message'}] {truncate_text(content, 1000)}")
return lines, finished, token_info, error_msg
def poll_report_result(
chat_id: str,
*,
poll_interval: float = DEFAULT_POLL_INTERVAL_SECONDS,
max_wait_seconds: int = DEFAULT_MAX_POLL_SECONDS,
show_progress: bool = True,
config: Optional[dict] = None,
) -> Dict[str, Any]:
"""Poll the Xiao Q report result until a qreportUsedToken or error event appears."""
config = config or read_config()
start_time = time.time()
processed_events = 0
result: Dict[str, Any] = {
"chatId": chat_id,
"reportUrl": None,
"finished": False,
"error": None,
"eventCount": 0,
"tokenInfo": None,
}
prev_group_key: Tuple[str, str] = ("", "")
prev_output_key: Tuple[str, str] = ("", "")
last_new_event_time = time.time()
idle_hint_printed = False
_IDLE_HINT_SECONDS = 9.0
while True:
elapsed = time.time() - start_time
if elapsed > max_wait_seconds:
raise TimeoutError(f"轮询超时:已等待 {max_wait_seconds // 60} 分钟")
raw_text = fetch_report_result(chat_id, config=config)
events = parse_report_events(raw_text)
if processed_events > len(events):
processed_events = 0
new_events = events[processed_events:]
if not new_events:
if (
show_progress
and not idle_hint_printed
and time.time() - last_new_event_time >= _IDLE_HINT_SECONDS
):
print(flush=True)
print("结果生成中,请耐心等待", flush=True)
print(flush=True)
idle_hint_printed = True
time.sleep(poll_interval)
continue
last_new_event_time = time.time()
idle_hint_printed = False
groups = group_events(new_events)
batch_finished = False
saw_qreport_used_token = False
for idx, grp in enumerate(groups):
grp_key = (grp["type"], grp["function"])
is_continuation = (idx == 0 and grp_key == prev_group_key)
lines, finished, token_info, error_msg = render_event_group(grp, continuation=is_continuation)
if show_progress and lines:
curr_key = (grp["type"], grp["function"])
if prev_output_key[0] and curr_key != prev_output_key:
if not (prev_output_key[0] == "step" and curr_key[0] == "subStep"):
if not (prev_output_key[0] == "subStep" and curr_key[0] == "subStep"):
print(flush=True)
for line in lines:
print(line, flush=True)
prev_output_key = curr_key
result["eventCount"] += len(grp["events"])
if token_info:
result["tokenInfo"] = token_info
if error_msg:
result["error"] = error_msg
if finished:
batch_finished = True
if (
not error_msg
and grp["type"] == "qreport"
and grp["function"] == "qreportUsedToken"
):
saw_qreport_used_token = True
if groups:
prev_group_key = (groups[-1]["type"], groups[-1]["function"])
processed_events = len(events)
if batch_finished:
result["finished"] = True
success_with_url = saw_qreport_used_token and result.get("error") is None
if success_with_url:
result["reportUrl"] = format_report_url(chat_id, config)
else:
result["reportUrl"] = None
if show_progress:
print(flush=True)
if result.get("error"):
print(f"报告生成失败:{result['error']}", flush=True)
elif success_with_url and result.get("reportUrl"):
url = result["reportUrl"]
print("📊 报告链接", flush=True)
print(f"在线数据报告:[点击查看完整报告]({url})", flush=True)
return result
time.sleep(poll_interval)
def upload_files(
    file_paths: Sequence[str],
    *,
    config: Optional[dict] = None,
) -> Dict[str, Any]:
    """Upload files in bulk and return the upload results plus the resources list."""
    config = config or read_config()
    upload_results = [upload_reference_file(path, config=config) for path in file_paths]
    resources = build_resources(upload_results)
    return {
        "uploadResults": upload_results,
        "resources": resources,
    }
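The polling loop in `poll_report_result` re-fetches the full event list on every iteration, renders only the tail it has not seen yet, and resets its cursor when the list comes back shorter than before. A minimal sketch of that cursor handling follows; the helper name `take_new_events` is hypothetical and is not part of the original module.

```python
from typing import Any, Dict, List, Tuple


def take_new_events(
    events: List[Dict[str, Any]], processed: int
) -> Tuple[List[Dict[str, Any]], int]:
    """Return the unseen tail of ``events`` and the updated cursor.

    Hypothetical helper for illustration only: it mirrors the cursor logic
    in poll_report_result, where a shrinking list resets the cursor to 0.
    """
    if processed > len(events):
        # Fewer events than already processed: treat the list as restarted.
        processed = 0
    return events[processed:], len(events)


if __name__ == "__main__":
    cursor = 0
    batch, cursor = take_new_events([{"type": "plan"}, {"type": "step"}], cursor)
    print(len(batch), cursor)
    batch, cursor = take_new_events(
        [{"type": "plan"}, {"type": "step"}, {"type": "qreport"}], cursor
    )
    print(len(batch), cursor)
```

Keeping the cursor as a plain count works here because the endpoint is assumed to return events in a stable, append-only order between polls.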
FILE:scripts/dashboard/__init__.py
FILE:scripts/dashboard/dataportal_resolver.py
# -*- coding: utf-8 -*-
"""
Resolver from a data portal URL to a dashboard pageId.
Fetches the menu tree through the /openapi/v2/dataportal/query endpoint,
then locates the dashboard pageId by menuId or by the home-menu fallback.
"""
from __future__ import annotations
import re
from typing import Optional, Dict, Any, List, Tuple
from .quickbi_openapi import call_quickbi_api
# ---------------------------------------------------------------------------
# Menu type definitions (mirrors find_menu.ts)
# ---------------------------------------------------------------------------
# Supported values of content.type
CONTENT_TYPE_REPORT = 'report'      # dashboard
CONTENT_TYPE_EXCEL = 'excel'        # spreadsheet
CONTENT_TYPE_URL = 'url'            # external link
CONTENT_TYPE_PAGE = 'page'          # page
CONTENT_TYPE_FORM = 'form'          # data entry form
CONTENT_TYPE_DOWNLOAD = 'download'  # self-service data download
CONTENT_TYPE_CUBE = 'Cube'          # dataset
CONTENT_TYPE_ANALYSIS = 'analysis'  # ad hoc analysis
CONTENT_TYPE_SCREEN = 'screen'      # data screen
# Content types that can yield a pageId (only report)
SUPPORTED_CONTENT_TYPES = {CONTENT_TYPE_REPORT}
# Display names for content types (Chinese, used in user-facing messages)
CONTENT_TYPE_NAMES = {
CONTENT_TYPE_REPORT: '仪表板',
CONTENT_TYPE_EXCEL: '电子表格',
CONTENT_TYPE_URL: '外部链接',
CONTENT_TYPE_PAGE: '页面',
CONTENT_TYPE_FORM: '数据填报',
CONTENT_TYPE_DOWNLOAD: '自助取数',
CONTENT_TYPE_CUBE: '数据集',
CONTENT_TYPE_ANALYSIS: '即席分析',
CONTENT_TYPE_SCREEN: '数据大屏',
}
# ---------------------------------------------------------------------------
# URL parsing
# ---------------------------------------------------------------------------
def extract_dataportal_info(url: str) -> Dict[str, Optional[str]]:
    """
    Extract productId and menuId from a data portal URL.
    Args:
        url: data portal URL
    Returns:
        {"productId": "xxx", "menuId": "yyy" or None}
    Raises:
        ValueError: if productId cannot be extracted
    """
    product_match = re.search(r'productId=([a-zA-Z0-9-]+)', url)
    menu_match = re.search(r'menuId=([a-zA-Z0-9-]+)', url)
    if not product_match:
        raise ValueError(f"无法从数据门户 URL 中提取 productId: {url}")
    return {
        "productId": product_match.group(1),
        "menuId": menu_match.group(1) if menu_match else None
    }
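The regular expressions in `extract_dataportal_info` match `productId=` anywhere in the string. As an alternative sketch, assuming both IDs always arrive as ordinary query parameters (not inside a `#` fragment), the same extraction can be done with the standard library URL parser. The function name `extract_ids_with_urlparse` is hypothetical, and unlike the regex version it does not restrict the characters allowed in an ID.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlparse


def extract_ids_with_urlparse(url: str) -> Dict[str, Optional[str]]:
    """Hypothetical variant: read productId and menuId from the query string."""
    # parse_qs maps each parameter name to a list of values and
    # drops parameters with blank values by default.
    query = parse_qs(urlparse(url).query)
    product_ids = query.get("productId")
    if not product_ids:
        raise ValueError(f"productId not found in data portal URL: {url}")
    menu_ids = query.get("menuId")
    return {
        "productId": product_ids[0],
        "menuId": menu_ids[0] if menu_ids else None,
    }


if __name__ == "__main__":
    sample = "https://bi.aliyun.com/product/view.htm?productId=abc-123&menuId=m-1"
    print(extract_ids_with_urlparse(sample))
```

The regex approach is more forgiving when the parameters sit in a fragment or a nested redirect URL, which may be why the original uses it.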
# ---------------------------------------------------------------------------
# Menu tree traversal (ported from find_menu.ts)
# ---------------------------------------------------------------------------
def is_hide_menu(menu: Dict[str, Any], template_name: str = 'default') -> bool:
    """Return True if the menu is hidden for the given template."""
    config_key = 'mobileConfig' if template_name == 'mobile' else 'pcConfig'
    config = menu.get(config_key, {})
    return config.get('isHide', False) if config else False
def find_first_child(
    menu: Dict[str, Any],
    template_name: str = 'default',
    with_content: bool = True
) -> Optional[Dict[str, Any]]:
    """
    Recursively find the first valid leaf menu node (ported from find_menu.ts).
    A node qualifies when it:
    - is not empty (isEmpty = False)
    - is not hidden
    - has no children (leaf node)
    - has content whose type is not url (only checked when with_content is True)
    Args:
        menu: menu node
        template_name: template name, 'default' or 'mobile'
        with_content: whether the node must carry content
    Returns:
        The matching menu node, or None
    """
    is_hide = is_hide_menu(menu, template_name)
    children = menu.get('children', [])
    content = menu.get('content', [])
    is_empty = menu.get('isEmpty', False)
    # Valid leaf node
    if (
        not is_empty
        and not is_hide
        and not children
    ):
        if with_content:
            # Requires content whose type is set and is not url
            if content and content[0].get('type') and content[0].get('type') != 'url':
                return menu
        else:
            return menu
    # A hidden node, or one without children, ends the search on this branch
    if is_hide or not children:
        return None
    # Recurse into the children
    for child_menu in children:
        result = find_first_child(child_menu, template_name, with_content)
        if result:
            return result
    return None
def get_menu_by(
    menu_root: List[Dict[str, Any]],
    predicate: callable
) -> Optional[Dict[str, Any]]:
    """
    Find the first menu node in the tree that satisfies a predicate (depth-first).
    Args:
        menu_root: list of root menu nodes
        predicate: function that takes a menu node and returns a truthy value on match
    Returns:
        The matching menu node, or None
    """
    def visit(menus: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
        for menu in menus:
            if predicate(menu):
                return menu
            children = menu.get('children', [])
            if children:
                result = visit(children)
                if result:
                    return result
        return None
    return visit(menu_root)
def get_selected_menu(
    menus: List[Dict[str, Any]],
    menu_id: Optional[str] = None,
    template_name: str = 'default'
) -> Optional[Dict[str, Any]]:
    """
    Get the selected menu (ported from the getSelectedMenuId logic in find_menu.ts).
    Logic:
    1. If menuId is given, look that menu up directly
    2. Without menuId, select the home menu (isHome=True)
    3. If there is no usable home menu, select the first non-empty node of the tree
    Args:
        menus: menu list
        menu_id: optional menu ID
        template_name: template name
    Returns:
        The selected menu node, or None
    """
    # 1. menuId given: direct lookup
    if menu_id:
        return get_menu_by(menus, lambda m: m.get('id') == menu_id)
    # 2. No menuId: look for the home menu
    home_menu = get_menu_by(menus, lambda m: m.get('isHome', False))
    if home_menu:
        # Use the home menu itself if it is a valid leaf, otherwise its first valid descendant
        first_child = find_first_child(home_menu, template_name)
        if first_child:
            return first_child
    # 3. No home menu, or nothing usable under it: first non-empty node of the whole tree
    virtual_root = {'isEmpty': True, 'children': menus}
    first_menu = find_first_child(virtual_root, template_name)
    return first_menu
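To make the fallback in step 3 concrete, here is a small worked example on a made-up menu tree. `first_leaf` is a simplified stand-in for `find_first_child` (PC template only, `with_content=True`), and the sample IDs are invented for illustration.

```python
from typing import Any, Dict, List, Optional

Menu = Dict[str, Any]


def first_leaf(menu: Menu) -> Optional[Menu]:
    """Simplified stand-in for find_first_child (PC template, content required)."""
    hidden = (menu.get("pcConfig") or {}).get("isHide", False)
    children = menu.get("children") or []
    content = menu.get("content") or []
    if not menu.get("isEmpty", False) and not hidden and not children:
        # Leaf node: accept it only if it has non-url content.
        if content and content[0].get("type") and content[0]["type"] != "url":
            return menu
    if hidden or not children:
        return None
    for child in children:
        found = first_leaf(child)
        if found:
            return found
    return None


# Invented sample data: one empty folder with three leaves.
SAMPLE_MENUS: List[Menu] = [
    {"id": "m1", "isEmpty": True, "children": [
        {"id": "m1-1", "pcConfig": {"isHide": True},
         "content": [{"type": "report", "id": "p-hidden"}]},
        {"id": "m1-2", "content": [{"type": "url", "id": "ext"}]},
        {"id": "m1-3", "content": [{"type": "report", "id": "p-1"}]},
    ]},
]

if __name__ == "__main__":
    # Same trick as get_selected_menu: wrap the list in an empty virtual root.
    virtual_root = {"isEmpty": True, "children": SAMPLE_MENUS}
    leaf = first_leaf(virtual_root)
    print(leaf["id"] if leaf else None)
```

The hidden leaf and the external-link leaf are both skipped, so the walk settles on `m1-3`, the first visible leaf that carries non-url content.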
def extract_page_id_from_menu(menu: Dict[str, Any]) -> Tuple[bool, str, Optional[str]]:
    """
    Extract the pageId from a menu node.
    Args:
        menu: menu node
    Returns:
        (success, message, page_id)
        - success: whether extraction succeeded
        - message: success or error message
        - page_id: dashboard pageId (set only on success)
    """
    content = menu.get('content', [])
    if not content:
        return False, "菜单没有关联任何内容", None
    first_content = content[0]
    content_type = first_content.get('type')
    content_id = first_content.get('id')
    content_name = first_content.get('name', '未命名')
    if content_type not in SUPPORTED_CONTENT_TYPES:
        type_name = CONTENT_TYPE_NAMES.get(content_type, content_type)
        return (
            False,
            f"该菜单关联的是「{type_name}」类型({content_name}),"
            f"不是仪表板。请提供指向仪表板的数据门户链接。",
            None
        )
    if not content_id:
        return False, f"菜单「{content_name}」未关联有效的仪表板 ID", None
    return True, f"成功获取仪表板「{content_name}」", content_id
# ---------------------------------------------------------------------------
# OpenAPI calls
# ---------------------------------------------------------------------------
def query_dataportal_menus(
host: str,
access_id: str,
access_key: str,
dataportal_id: str
) -> Dict[str, Any]:
"""
Query the menu tree of a data portal.
Calls the /openapi/v2/dataportal/query endpoint to fetch the menu structure.
Args:
host: QuickBI service domain
access_id: API Key
access_key: API Secret
dataportal_id: data portal ID (the productId in the URL)
Returns:
{
"success": True,
"menus": [...menu tree...],
"dataportal_name": "data portal name"
}
or
{
"success": False,
"error_code": "error code",
"error_message": "error message"
}
"""
uri = "/openapi/v2/dataportal/query"
form_params = {
"dataPortalId": dataportal_id
}
try:
result = call_quickbi_api(
host=host,
uri=uri,
access_id=access_id,
access_key=access_key,
method="GET",
form_params=form_params
)
success_val = result.get("success")
is_success = success_val is True or success_val == "true"
if is_success:
# data may be missing or null in the response
data = result.get("data") or {}
# The menu tree is under data.menu or data.menus
menus = data.get("menu") or data.get("menus") or []
if isinstance(menus, dict):
# A single menu object is wrapped into a list
menus = [menus]
return {
"success": True,
"menus": menus,
"dataportal_name": data.get("name", "")
}
else:
return {
"success": False,
"error_code": str(result.get("errorCode", result.get("code", "UNKNOWN"))),
"error_message": result.get("errorMsg", result.get("message", "未知错误"))
}
except Exception as e:
return {
"success": False,
"error_code": "CONNECTION_ERROR",
"error_message": f"查询数据门户失败: {str(e)}"
}
def resolve_dataportal_page_id(
host: str,
access_id: str,
access_key: str,
dataportal_url: str
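) -> Dict[str, Any]:
"""
Resolve a dashboard pageId from a data portal URL.
Full flow:
1. Extract productId and menuId from the URL
2. Call /openapi/v2/dataportal/query to fetch the menu tree
3. Locate the target menu by menuId or by the home-menu fallback
4. Check that the menu content type is report and extract the pageId
Args:
host: QuickBI service domain
access_id: API Key
access_key: API Secret
dataportal_url: data portal URL
Returns:
{
"success": True,
"page_id": "dashboard pageId",
"dashboard_name": "dashboard name",
"dataportal_name": "data portal name",
"menu_id": "ID of the resolved menu",
"menu_title": "title of the resolved menu"
}
or
{
"success": False,
"error_code": "error code",
"error_message": "error message"
}
"""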
# 1. Extract parameters from the URL
try:
url_info = extract_dataportal_info(dataportal_url)
product_id = url_info["productId"]
menu_id = url_info["menuId"]
except ValueError as e:
return {
"success": False,
"error_code": "INVALID_URL",
"error_message": str(e)
}
# 2. Query the menu tree
query_result = query_dataportal_menus(host, access_id, access_key, product_id)
if not query_result["success"]:
return query_result
menus = query_result["menus"]
dataportal_name = query_result.get("dataportal_name", "")
if not menus:
return {
"success": False,
"error_code": "EMPTY_MENU",
"error_message": "数据门户没有配置任何菜单"
}
# 3. Locate the target menu
target_menu = get_selected_menu(menus, menu_id)
if not target_menu:
if menu_id:
return {
"success": False,
"error_code": "MENU_NOT_FOUND",
"error_message": f"在数据门户中找不到 menuId={menu_id} 对应的菜单"
}
else:
return {
"success": False,
"error_code": "NO_DEFAULT_MENU",
"error_message": "数据门户没有可用的默认菜单(首页或第一个非空菜单)"
}
# 4. Extract the pageId from the menu
success, message, page_id = extract_page_id_from_menu(target_menu)
if not success:
return {
"success": False,
"error_code": "INVALID_CONTENT_TYPE",
"error_message": message
}
# Dashboard name
content = target_menu.get('content', [])
dashboard_name = content[0].get('name', '') if content else ''
return {
"success": True,
"page_id": page_id,
"dashboard_name": dashboard_name,
"dataportal_name": dataportal_name,
"menu_id": target_menu.get('id'),
"menu_title": target_menu.get('title', '')
}
# ---------------------------------------------------------------------------
# Test entry point
# ---------------------------------------------------------------------------
if __name__ == "__main__":
import argparse as _argparse
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
from common.config_loader import load_config, set_workspace_dir
_parser = _argparse.ArgumentParser(description="数据门户 URL 解析器")
_parser.add_argument("--url", default="https://bi.aliyun.com/product/view.htm?productId=xxx&menuId=yyy",
help="数据门户 URL")
_parser.add_argument("--workspace-dir", default=None, help="用户工作目录路径")
_args = _parser.parse_args()
if _args.workspace_dir:
set_workspace_dir(_args.workspace_dir)
config = load_config()
# Check required configuration
_api_key = config.get("api_key")
_api_secret = config.get("api_secret")
if not _api_key or not _api_secret:
print("缺少必要配置项: api_key / api_secret,请检查配置文件")
sys.exit(1)
# URL under test
test_url = _args.url
result = resolve_dataportal_page_id(
host=config.get("server_domain", "https://quickbi-public.cn-hangzhou.aliyuncs.com"),
access_id=_api_key,
access_key=_api_secret,
dataportal_url=test_url
)
print(result)
FILE:scripts/dashboard/fetch_dashboard_data.py
#!/usr/bin/env python3
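# -*- coding: utf-8 -*-
"""
One-stop script for fetching dashboard data.
Wraps the full flow:
1. Load configuration
2. Parse the URL to get the pageId (dashboard URLs and data portal URLs are both supported)
3. Pre-validate the dashboard
4. Call the OpenAPI to fetch the full dashboard JSON
5. Fetch dataset schemas and run the Node.js script to parse the JSON structure
6. Build the dataset name map
Usage:
from scripts.dashboard.fetch_dashboard_data import fetch_dashboard_data
result = fetch_dashboard_data(user_input_url)
if result["success"]:
dashboardData = result["dashboardData"]
datasetNameMap = result["datasetNameMap"]
"""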
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent))
from common.config_loader import load_config, set_workspace_dir
from dashboard.quickbi_openapi import (
batch_get_dataset_schema,
extract_dataportal_ids,
extract_page_id,
get_dashboard_json,
get_dataportal_page_id,
is_dataportal_url,
validate_and_prepare_dashboard,
)
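def fetch_dashboard_data(url: str, config: dict = None) -> dict:
"""
Fetch dashboard data in one call (parsed structure plus dataset names).
Args:
url: dashboard URL or data portal URL
config: optional configuration dict; loaded automatically when omitted
Returns:
{
"success": True,
"dashboardData": {...}, # parsed dashboard structure
"datasetNameMap": {...}, # cubeId -> cubeName map
"pageId": "xxx", # dashboard pageId
"dashboardUrl": "xxx", # standard dashboard preview URL
"preparedUrl": "xxx", # URL returned by pre-validation
"error": None
}
or
{
"success": False,
"error": "error message",
"error_code": "error code"
}
"""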
# 1. 加载配置
if config is None:
config = load_config()
if not config:
return {
"success": False,
"error": "配置加载失败,请检查工作目录级配置或全局配置 ~/.qbi/config.yaml",
"error_code": "CONFIG_LOAD_ERROR"
}
# 检查必要配置项
required_keys = ["server_domain", "api_key", "api_secret", "user_token"]
missing_keys = [k for k in required_keys if not config.get(k)]
if missing_keys:
return {
"success": False,
"error": f"配置缺失: {', '.join(missing_keys)}",
"error_code": "CONFIG_INCOMPLETE"
}
# 2. 解析 URL 获取 pageId
try:
if is_dataportal_url(url):
# 数据门户 URL:需要先获取关联的仪表板 pageId
portal_ids = extract_dataportal_ids(url)
portal_result = get_dataportal_page_id(
host=config["server_domain"],
access_id=config["api_key"],
access_key=config["api_secret"],
dataportal_id=portal_ids["productId"],
menu_id=portal_ids["menuId"]
)
if not portal_result["success"]:
return {
"success": False,
"error": f"获取数据门户关联仪表板失败: {portal_result['error_message']}",
"error_code": portal_result.get("error_code", "PORTAL_ERROR")
}
page_id = portal_result["page_id"]
else:
# 普通仪表板 URL:直接提取 pageId
page_id = extract_page_id(url)
except ValueError as e:
return {
"success": False,
"error": str(e),
"error_code": "URL_PARSE_ERROR"
}
# 3. 预校验
validate_result = validate_and_prepare_dashboard(
host=config["server_domain"],
access_id=config["api_key"],
access_key=config["api_secret"],
page_id=page_id,
user_id=config["user_token"]
)
if validate_result.get("success") in (False, "false"):
return {
"success": False,
"error": f"预校验失败: {validate_result.get('error_message', '未知错误')}",
"error_code": validate_result.get("error_code", "VALIDATE_ERROR")
}
prepared_url = validate_result.get("url", url)
# 4. 获取仪表板大 JSON
dashboard_result = get_dashboard_json(
host=config["server_domain"],
access_id=config["api_key"],
access_key=config["api_secret"],
page_id=page_id,
user_id=config["user_token"]
)
if not dashboard_result["success"]:
return {
"success": False,
"error": f"获取仪表板数据失败: {dashboard_result.get('error_message', '未知错误')}",
"error_code": dashboard_result.get("error_code", "DASHBOARD_ERROR")
}
raw_dashboard_json = dashboard_result["data"]
# 5. 调用 Node.js 脚本解析
script_dir = Path(__file__).parent
js_script = script_dir / "get_dashboard_json.js"
if not js_script.exists():
return {
"success": False,
"error": f"解析脚本不存在: {js_script}",
"error_code": "SCRIPT_NOT_FOUND"
}
# Fetch dataset schemas first; they are needed to resolve field aliases
# Dataset IDs come from each component's dataFromId field
cube_ids = list({
comp.get("dataFromId")
for comp in raw_dashboard_json.get("components", [])
if comp.get("dataFromId")
})
dataset_schema_map = {}
if cube_ids:
dataset_result = batch_get_dataset_schema(
host=config["server_domain"],
access_id=config["api_key"],
access_key=config["api_secret"],
cube_ids=cube_ids
)
if dataset_result["success"]:
for cube_id, info in dataset_result["data"].items():
cube_schema = info.get("data", {}).get("cubeSchema", {})
fields = cube_schema.get("fields", [])
dataset_schema_map[cube_id] = {
"cubeName": cube_schema.get("caption", cube_id),
"fields": fields
}
# 准备输入数据:包含仪表板JSON和schema信息
input_data = {
"dashboardJson": raw_dashboard_json,
"datasetSchemaMap": dataset_schema_map
}
with tempfile.NamedTemporaryFile(mode='w', suffix='.json', delete=False, encoding='utf-8') as f:
json.dump(input_data, f)
temp_file = f.name
try:
# Use a context manager so the file handle is closed properly; on Windows a lingering lock would make the later delete fail
with open(temp_file, 'r', encoding='utf-8') as stdin_f:
result = subprocess.run(
['node', str(js_script)],
stdin=stdin_f,
capture_output=True,
text=True,
encoding='utf-8',
cwd=str(script_dir.parent)
)
if result.returncode != 0:
return {
"success": False,
"error": f"解析脚本执行失败: {result.stderr}",
"error_code": "SCRIPT_EXEC_ERROR"
}
dashboard_data = json.loads(result.stdout)
if not dashboard_data.get('success'):
return {
"success": False,
"error": f"解析失败: {dashboard_data.get('error', '未知错误')}",
"error_code": "PARSE_ERROR"
}
except json.JSONDecodeError as e:
return {
"success": False,
"error": f"解析结果格式错误: {str(e)}",
"error_code": "JSON_DECODE_ERROR"
}
finally:
try:
os.unlink(temp_file)
except OSError:
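pass # On Windows the file may still be locked; ignore the failed delete
# 6. Extract the dataset name map from the parse result
# datasetSchemaMap was used by the Node.js script during parsing and is kept in its output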
dataset_name_map = {}
if dashboard_data.get("datasetSchemaMap"):
for cube_id, schema_info in dashboard_data["datasetSchemaMap"].items():
dataset_name_map[cube_id] = schema_info.get("cubeName", cube_id)
# Build the standard dashboard preview URL
dashboard_url = f"{config['server_domain']}/dashboard/view/pc.htm?pageId={page_id}"
return {
"success": True,
"dashboardData": dashboard_data,
"datasetNameMap": dataset_name_map,
"pageId": page_id,
"dashboardUrl": dashboard_url, # 标准仪表板预览页地址
"preparedUrl": prepared_url, # 原始预处理 URL(保留兼容)
"error": None
}
if __name__ == "__main__":
import argparse as _argparse
_parser = _argparse.ArgumentParser(description="仪表板数据一站式获取")
_parser.add_argument("url", help="仪表板 URL 或数据门户 URL")
_parser.add_argument("--workspace-dir", default=None, help="用户工作目录路径")
_args = _parser.parse_args()
if _args.workspace_dir:
set_workspace_dir(_args.workspace_dir)
result = fetch_dashboard_data(_args.url)
print(json.dumps(result, ensure_ascii=False, indent=2))
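`fetch_dashboard_data` writes the payload to a temporary file and hands that file to the Node.js parser as stdin. Below is a sketch of the same hand-off without a temporary file, assuming the payload is small enough to hold in memory. A Python child process stands in for `node get_dashboard_json.js` so the example runs without Node.js, and the names `run_json_filter` and `ECHO_CHILD` are hypothetical.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def run_json_filter(cmd: List[str], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send ``payload`` as JSON on stdin and parse the child's stdout as JSON."""
    proc = subprocess.run(
        cmd,
        input=json.dumps(payload),  # ASCII-only by default, safe for any stdin encoding
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    if proc.returncode != 0:
        return {"success": False, "error": proc.stderr,
                "error_code": "SCRIPT_EXEC_ERROR"}
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        return {"success": False, "error": str(exc),
                "error_code": "JSON_DECODE_ERROR"}


# Stand-in child process: reads JSON from stdin and reports its top-level keys.
ECHO_CHILD = [
    sys.executable, "-c",
    "import json, sys; d = json.load(sys.stdin); "
    "print(json.dumps({'success': True, 'keys': sorted(d)}))",
]

if __name__ == "__main__":
    result = run_json_filter(
        ECHO_CHILD, {"dashboardJson": {}, "datasetSchemaMap": {}}
    )
    print(result)
```

Passing `input=` avoids the cleanup problem with locked files on Windows, at the cost of keeping the whole JSON payload in memory.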
FILE:scripts/dashboard/get_dashboard_json.js
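/**
* Parse dashboard JSON data.
*
* Usage:
* - Called from the Python script: node get_dashboard_json.js < dashboard.json
* - Or imported from Node.js code
*
* Input: raw dashboard JSON fetched through the OpenAPI
* Output: structured dashboard data containing:
* - basicInfo: basic information (name, page ID, workspace ID, modification time)
* - queryControls: list of query controls
* - chartComponents: list of chart components
* - tabComponents: list of Tab components
* - richTextComponents: list of rich text components
* - layoutAnalysis: layout analysis result
* - datasetSchemaMap: dataset schema information (used to resolve field aliases)
*/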
// Layout analysis helper, based on the tileLayout x/y coordinates
function analyzeLayout(charts) {
if (charts.length === 0) return { rows: [], source: 'none' };
// Group by the tileLayout y coordinate (charts on one row share the same or a nearby y)
const rowMap = {};
charts.forEach(chart => {
const y = chart.position.y || 0;
// Reuse an existing row whose y differs by less than 5
let targetY = y;
for (const existingY of Object.keys(rowMap)) {
if (Math.abs(parseInt(existingY, 10) - y) < 5) {
targetY = parseInt(existingY, 10);
break;
}
}
if (!rowMap[targetY]) {
rowMap[targetY] = { y: targetY, items: [] };
}
rowMap[targetY].items.push({
chart: chart,
x: chart.position.x || 0,
w: chart.position.w || 1,
h: chart.position.h || 1
});
});
// Sort rows by y coordinate
const sortedRows = Object.values(rowMap).sort((a, b) => a.y - b.y);
// Sort the charts within each row by x coordinate
sortedRows.forEach(row => {
row.items.sort((a, b) => a.x - b.x);
row.charts = row.items.map(item => item.chart);
row.gridInfo = row.items.map(item => ({
internalId: item.chart.internalId,
x: item.x,
y: row.y,
w: item.w,
h: item.h
}));
delete row.items;
});
return { rows: sortedRows, source: 'tileLayout' };
}
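/**
 * Check whether a string looks like a garbage value (a 10-character hexadecimal string).
 * @param {string} str - the string to check
 * @returns {boolean} true if the string is missing, not a string, or garbage
 */
function isGarbageString(str) {
  if (!str || typeof str !== 'string') return true;
  // A 10-character hexadecimal string such as "bcdcf35a53"
  return /^[a-f0-9]{10}$/i.test(str);
}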
/**
 * Look up a field's caption in the dataset schema.
 * @param {string} pathId - the field's pathId
 * @param {Array} fields - the fields array of the dataset schema
 * @returns {string|null} the field caption, or null if not found
 */
function findFieldCaptionFromSchema(pathId, fields) {
  if (!pathId || !fields || !Array.isArray(fields)) return null;
  for (const field of fields) {
    // Check the field itself
    if (field.uniqueId === pathId || field.name === pathId) {
      return field.caption || null;
    }
    // Check the sub-fields under attributes
    if (field.attributes && Array.isArray(field.attributes)) {
      for (const attr of field.attributes) {
        if (attr.uniqueId === pathId || attr.id === pathId) {
          return attr.caption || null;
        }
      }
    }
  }
  return null;
}
/**
* Parse the dashboard JSON.
* @param {Object} json - raw dashboard JSON fetched through the OpenAPI
* @param {Object} datasetSchemaMap - optional dataset schema information, used to resolve field aliases
* @returns {Object} the parsed structured data
*/
function parseDashboardJson(json, datasetSchemaMap = {}) {
try {
// 1. 基本信息
const basicInfo = {
name: json.name, // 仪表板名称
pageId: json.treeId, // 页面ID
workspaceId: json.workspaceId, // 工作空间ID
gmtModified: json.gmtModified,
};
// 2. 组件容器
const queryControls = []; // 查询控件
const chartComponents = []; // 图表组件
const tabComponents = []; // Tab组件
const richTextComponents = []; // 富文本组件
// 内部ID到组件的映射
const componentMap = {};
// 确保 components 存在
const components = json.components || [];
components.forEach(comp => {
const content = JSON.parse(comp.componentContent || '{}');
const queryInput = JSON.parse(comp.queryInput || '{}');
const internalId = content.id;
// 从 tileLayout 提取位置信息(w/h/x/y)
const tileLayout = content.tileLayout || {};
const position = {
x: tileLayout.x || 0, // 栅格 x 坐标
y: tileLayout.y || 0, // 栅格 y 坐标
w: tileLayout.w || 1, // 栅格宽度
h: tileLayout.h || 1, // 栅格高度
minW: tileLayout.minW || 1, // 最小宽度
minH: tileLayout.minH || 1 // 最小高度
};
componentMap[internalId] = {
componentId: comp.componentId,
componentName: comp.componentName,
customComponentId: comp.customComponentId,
internalId: internalId,
position: position
};
if (comp.customComponentId === 'query2') {
// === 查询控件 ===
const modelConfig = content.modelConfig || {};
const fieldConfigs = modelConfig.fieldConfigs || [];
const styleConfig = content.styleConfig || {};
const needManualQuery = Array.isArray(styleConfig.buttons) && styleConfig.buttons.includes('query');
// Coerce to a real boolean; the bare && expression would yield the parentId string
const isSingleComponent = Boolean(content.queryType === 'qbi-inside-chart' && content.parentId);
const parentId = content.parentId || null;
queryControls.push({
componentId: comp.componentId,
internalId: internalId,
needManualQuery: needManualQuery,
isSingleComponent: isSingleComponent,
parentId: parentId,
position: position,
fields: fieldConfigs.map(f => ({
id: f.id,
labelName: f.labelName,
componentType: f.componentType, // datetime / enumSelect
enumType: f.config?.enumType || 'single',
isRequired: f.isRequired || false,
defaultValue: f.defaultValue || null,
// 时间控件的粒度
timeGranularity: f.config?.timeGranularity || null,
relatedGraphIds: (f.graphMappings || []).map(g => g.graphId)
}))
});
} else if (comp.customComponentId === 'tab') {
// === Tab 组件 ===
const attr = content.attribute || {};
const tabs = attr.tabs || content.items || [];
tabComponents.push({
componentId: comp.componentId,
internalId: internalId,
componentName: content.caption || 'Tab',
activeId: content.activeId,
position: position,
tabs: tabs.map(tab => ({
id: tab.id,
title: tab.title || tab.text
}))
});
} else if (comp.customComponentId === 'text') {
// === 文本组件 ===
const attr = content.attribute || {};
// 提取纯文本内容(去除HTML标签)
const htmlContent = attr.content || '';
const textContent = htmlContent.replace(/<[^>]*>/g, '').trim();
if (textContent) {
richTextComponents.push({
componentId: comp.componentId,
internalId: internalId,
position: position,
htmlContent: htmlContent,
textContent: textContent
});
}
} else {
// === 图表组件(非 query2、非 tab、非富文本)===
const attr = content.attribute || {};
const caption = attr.caption || comp.componentName || '';
const parentId = content.parentId || null;
// 获取当前图表的数据集ID
const sourceId = queryInput.sourceId;
const datasetSchema = sourceId ? datasetSchemaMap[sourceId] : null;
const schemaFields = datasetSchema ? datasetSchema.fields : null;
// 获取字段别名映射(funcSettingMap)
const funcSettingMap = attr.funcSettingMap || {};
// Helper: resolve the display name of a field
// Priority: 1. funcSettingMap aliasName (unless garbage) 2. cubeSchema.fields caption 3. col.caption 4. col.name
const getFieldCaption = (col) => {
const uuid = col.uuid;
const pathId = col.pathId;
// 1. 尝试从 funcSettingMap 获取别名
if (uuid && funcSettingMap[uuid] && funcSettingMap[uuid].aliasName) {
const aliasName = funcSettingMap[uuid].aliasName;
// 检查别名是否为乱码(10位十六进制字符串)
if (!isGarbageString(aliasName)) {
return aliasName;
}
}
// 2. 尝试从 cubeSchema.fields 中匹配 pathId 获取 caption
if (pathId && schemaFields) {
const schemaCaption = findFieldCaptionFromSchema(pathId, schemaFields);
if (schemaCaption) {
return schemaCaption;
}
}
// 3. 降级到 col.caption 或 col.name
return col.caption || col.name;
};
// 解析 queryInput.area 获取字段配置
const areas = queryInput.area || [];
const dimensions = []; // 维度字段
const measures = []; // 度量字段
const filters = []; // 过滤器字段
const drillFields = []; // 下钻字段
areas.forEach(area => {
// 提取下钻字段(id 为 drill 的 area)
if (area.id === 'drill') {
(area.columnList || []).forEach(col => {
drillFields.push({
caption: getFieldCaption(col),
pathId: col.pathId,
itemType: col.itemType,
key: col.key,
uuid: col.uuid,
isDrillEnabled: col.isDrillEnabled || false
});
});
return; // drill area 处理完毕,跳过后续逻辑
}
// 处理其他 area(维度、度量、过滤器)
(area.columnList || []).forEach(col => {
const fieldInfo = {
caption: getFieldCaption(col),
pathId: col.pathId,
itemType: col.itemType, // dimension / measure / datetime / geographic
key: col.key,
uuid: col.uuid,
aggregateType: col.aggregator || col.aggregateType || null,
isDrillEnabled: col.isDrillEnabled || false
};
if (area.id === 'filters' || area.queryAxis === 'filters') {
filters.push(fieldInfo);
} else if (col.itemType === 'measure') {
measures.push(fieldInfo);
} else {
dimensions.push(fieldInfo);
}
});
});
// 提取默认过滤器配置
const filterConfigs = content.filterConfigs || queryInput.filter || [];
chartComponents.push({
componentId: comp.componentId,
internalId: internalId,
componentName: caption,
componentType: comp.componentType,
customComponentId: comp.customComponentId,
sourceId: queryInput.sourceId, // 数据集ID - 问数调用的关键
position: position, // 位置信息(来自 tileLayout)
dimensions: dimensions, // 维度字段
measures: measures, // 度量字段
filters: filters, // 过滤器字段
drillFields: drillFields, // 下钻字段
defaultFilters: filterConfigs, // 默认过滤器
parentId: parentId
});
}
});
// 3. 建立图表与 Tab 的从属关系
chartComponents.forEach(chart => {
chart.tabInfo = null;
if (chart.parentId) {
for (const tab of tabComponents) {
for (const tabItem of tab.tabs) {
const expectedParentId = tab.internalId + tabItem.id;
if (chart.parentId === expectedParentId) {
chart.tabInfo = {
tabComponentId: tab.componentId,
tabInternalId: tab.internalId,
tabItemId: tabItem.id,
tabItemTitle: tabItem.title
};
break;
}
}
if (chart.tabInfo) break;
}
}
});
// 4. 建立查询控件与图表的关联关系
chartComponents.forEach(chart => {
chart.relatedQueryControls = [];
queryControls.forEach(qc => {
if (qc.isSingleComponent && qc.parentId === chart.internalId) {
chart.relatedQueryControls.push({
componentId: qc.componentId,
type: 'single-component',
fields: qc.fields.map(f => f.labelName)
});
} else if (!qc.isSingleComponent) {
const matchedFields = qc.fields.filter(f =>
f.relatedGraphIds.includes(chart.internalId)
);
if (matchedFields.length > 0) {
chart.relatedQueryControls.push({
componentId: qc.componentId,
type: 'global',
fields: matchedFields.map(f => f.labelName)
});
}
}
});
});
// 5. 为查询控件字段添加关联图表名称
queryControls.forEach(qc => {
qc.fields.forEach(f => {
f.relatedCharts = f.relatedGraphIds.map(gid => {
const chart = chartComponents.find(c => c.internalId === gid);
return chart ? { internalId: gid, name: chart.componentName } : { internalId: gid, name: '未知图表' };
});
});
});
// 6. 布局分析 - 基于 tileLayout 的 x/y 坐标
const layoutAnalysis = analyzeLayout(chartComponents);
return {
success: true,
basicInfo,
queryControls,
chartComponents,
tabComponents,
richTextComponents,
layoutAnalysis,
datasetSchemaMap // 保留数据集schema信息,供后续使用
};
} catch (e) {
return { success: false, error: e.message };
}
}
// 如果作为命令行脚本运行
if (typeof require !== 'undefined' && require.main === module) {
let inputData = '';
process.stdin.setEncoding('utf8');
process.stdin.on('readable', () => {
let chunk;
while ((chunk = process.stdin.read()) !== null) {
inputData += chunk;
}
});
process.stdin.on('end', () => {
try {
const parsedInput = JSON.parse(inputData);
// Two input formats are supported:
// 1. New format: { dashboardJson: {...}, datasetSchemaMap: {...} }
// 2. Old format: the dashboard JSON itself {...}
let dashboardJson;
let datasetSchemaMap = {};
if (parsedInput.dashboardJson && parsedInput.datasetSchemaMap !== undefined) {
// 新格式
dashboardJson = parsedInput.dashboardJson;
datasetSchemaMap = parsedInput.datasetSchemaMap || {};
} else {
// 旧格式(向后兼容)
dashboardJson = parsedInput;
}
const result = parseDashboardJson(dashboardJson, datasetSchemaMap);
console.log(JSON.stringify(result, null, 2));
} catch (e) {
console.log(JSON.stringify({ success: false, error: e.message }));
process.exit(1);
}
});
}
// 导出函数供 Node.js 使用
if (typeof module !== 'undefined' && module.exports) {
module.exports = { parseDashboardJson, analyzeLayout };
}
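The row grouping in `analyzeLayout` can be easier to follow on a tiny input. The Python sketch below ports just that step, assuming integer grid coordinates; the function name `group_rows` and the sample charts are hypothetical, and the output shape is simplified compared with the JavaScript version (no `gridInfo`).

```python
from typing import Any, Dict, List


def group_rows(charts: List[Dict[str, Any]], tolerance: int = 5) -> List[Dict[str, Any]]:
    """Group charts into rows by y (within ``tolerance``), then sort each row by x."""
    rows: Dict[int, List[Dict[str, Any]]] = {}
    for chart in charts:
        y = chart["position"].get("y") or 0
        target = y
        # Integer-like object keys are enumerated in ascending order in JavaScript,
        # so the existing rows are scanned in sorted order here as well.
        for existing in sorted(rows):
            if abs(existing - y) < tolerance:
                target = existing
                break
        rows.setdefault(target, []).append(chart)
    return [
        {"y": y, "charts": sorted(items, key=lambda c: c["position"].get("x") or 0)}
        for y, items in sorted(rows.items())
    ]


if __name__ == "__main__":
    sample = [
        {"internalId": "A", "position": {"x": 6, "y": 0}},
        {"internalId": "B", "position": {"x": 0, "y": 2}},
        {"internalId": "C", "position": {"x": 0, "y": 10}},
    ]
    for row in group_rows(sample):
        print(row["y"], [c["internalId"] for c in row["charts"]])
```

Chart B sits at y=2, less than 5 away from the row opened by chart A at y=0, so both land in the same row and are then ordered by x.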
FILE:scripts/dashboard/quickbi_openapi.py
"""
Helper functions for calling the QuickBI OpenAPI over HTTP.
Requests are signed with HMAC-SHA256, so no SDK dependency is required.
"""
import base64
import datetime
import hmac
import sys
import time
import uuid
from pathlib import Path
from urllib import parse
import requests
import yaml
sys.path.insert(0, str(Path(__file__).parent.parent))
# Import the config loader (four-layer config loading; see config_loader.py)
from common.config_loader import load_config as _load_config_from_loader
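def load_config(config_path: str = None) -> dict:
"""Load the configuration.
Four-layer config loading (higher-numbered layers override lower-numbered ones):
1. default_config.yaml (package defaults)
2. ~/.qbi/config.yaml (global QBI config, gated by the save_global_property switch)
3. $WORKSPACE_DIR/.qbi/smartq-chat/config.yaml (workspace-level config)
4. ACCESS_TOKEN environment variable (highest priority)
Delegates to the four-layer loader implemented in config_loader.py.
Args:
config_path: Optional path to a config file (kept for backward compatibility; not recommended)
Returns:
The configuration dict
"""
if config_path:
# Backward compatibility: load from the given path
with open(config_path, 'r', encoding='utf-8') as f:
return yaml.safe_load(f)
# Use the config loader (four-layer config loading)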
return _load_config_from_loader()
def hash_hmac(key: str, code: str, algorithm: str = 'sha256') -> str:
"""Return the Base64-encoded HMAC digest of code (SHA-256 by default)."""
hmac_code = hmac.new(key.encode('UTF-8'), code.encode('UTF-8'), algorithm).digest()
return base64.b64encode(hmac_code).decode()
def build_signature(
method: str,
uri: str,
params: dict,
access_id: str,
access_key: str,
nonce: str,
timestamp: str
) -> str:
"""
Build the request signature.
StringToSign = HTTP_METHOD + "\n" + URI +
["\n" + QueryString, only when there are non-None params, sorted by key] +
"\nX-Gw-AccessId:" + AccessID +
"\nX-Gw-Nonce:" + UUID +
"\nX-Gw-Timestamp:" + Timestamp
Signature = Base64(HMAC-SHA256(AccessKey, URL_Encode(StringToSign)))
"""
# Join the request parameters (sorted by key)
if not params:
request_query_string = ''
else:
sorted_keys = sorted(params.keys())
query_parts = [f"{key}={params[key]}" for key in sorted_keys if params[key] is not None]
request_query_string = '\n' + '&'.join(query_parts) if query_parts else ''
# Join the signed request headers
request_headers = f'\nX-Gw-AccessId:{access_id}\nX-Gw-Nonce:{nonce}\nX-Gw-Timestamp:{timestamp}'
# String to sign
string_to_sign = method.upper() + '\n' + uri + request_query_string + request_headers
# URL-encode, then compute the signature
encode_string = parse.quote(string_to_sign, '')
sign = hash_hmac(access_key, encode_string)
return sign
def call_quickbi_api(
host: str,
uri: str,
access_id: str,
access_key: str,
method: str = "POST",
json_param: dict = None,
form_params: dict = None,
content_type: str = "application/json"
) -> dict:
"""
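Call the QuickBI OpenAPI.
Args:
host: QuickBI service host
uri: API path
access_id: AccessKey ID
access_key: AccessKey Secret
method: HTTP method, defaults to POST
json_param: Request body, sent as JSON
form_params: Form parameters (sent as URL query parameters and included in the signature)
content_type: Content-Type, defaults to application/json
Returns:
The response body parsed as JSON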
"""
url = host + uri
nonce = str(uuid.uuid1())
timestamp = str(round(time.time() * 1000))
signature = build_signature(method, uri, form_params, access_id, access_key, nonce, timestamp)
headers = {
'X-Gw-AccessId': str(access_id),
'X-Gw-Nonce': nonce,
'X-Gw-Timestamp': timestamp,
'X-Gw-Signature': signature,
'X-Gw-Debug': 'true',
'Content-Type': content_type
}
response = requests.request(
method=method,
url=url,
headers=headers,
params=form_params,
json=json_param,
verify="test" not in url.lower(),
timeout=300
)
return response.json()
def query_openapi(
endpoint: str,
access_key_id: str,
access_key_secret: str,
question: str,
user_id: str = None,
cube_id: str = None
) -> dict:
"""
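Call the QuickBI SmartQ query endpoint.
The arguments match those of the SDK's SmartqQueryAbility interface.
Args:
endpoint: QuickBI endpoint
access_key_id: AccessKey ID
access_key_secret: AccessKey Secret
question: Natural-language question
user_id: User ID (optional)
cube_id: Dataset ID (optional; separate multiple IDs with commas)
Returns:
The query result as JSON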
"""
uri = "/openapi/v2/smartq/queryByQuestion"
json_param = {"userQuestion": question}
if user_id:
json_param["userId"] = user_id
# Handle cube_id (single-dataset vs. multi-dataset)
if cube_id:
if ',' in cube_id:
json_param["multipleCubeIds"] = cube_id # multiple datasets
else:
json_param["cubeId"] = cube_id # single dataset
return call_quickbi_api(
host=endpoint,
uri=uri,
access_id=access_key_id,
access_key=access_key_secret,
method="POST",
json_param=json_param
)
def is_dataportal_url(url: str) -> bool:
"""
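Check whether a URL points to a data portal page.
A data portal URL has a path containing /product/view.htm and normally
carries productId and menuId query parameters. Only the path is checked here.
Args:
url: The URL to check
Returns:
True if it is a data portal URL, otherwise False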
"""
return '/product/view.htm' in url
def extract_dataportal_ids(url: str) -> dict:
"""
Extract productId and menuId from a data portal URL.
Supported formats:
- https://bi.aliyun.com/product/view.htm?module=dashboard&productId=xxx&menuId=yyy
- https://bi.aliyun.com/product/view.htm?productId=xxx (no menuId)
Args:
url: Data portal URL
Returns:
{"productId": "xxx", "menuId": "yyy" or None}
Raises:
ValueError: If productId cannot be extracted
"""
import re
product_match = re.search(r'productId=([a-zA-Z0-9-]+)', url)
menu_match = re.search(r'menuId=([a-zA-Z0-9-]+)', url)
if not product_match:
raise ValueError(f"无法从数据门户 URL 中提取 productId: {url}")
return {
"productId": product_match.group(1),
"menuId": menu_match.group(1) if menu_match else None
}
def get_dataportal_page_id(
host: str,
access_id: str,
access_key: str,
dataportal_id: str,
menu_id: str = None
) -> dict:
"""
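Resolve the real dashboard pageId through the data portal API.
Calls /openapi/v2/dataportal/query to fetch the menu tree, then locates
the dashboard pageId from menuId or, failing that, the homeMenu logic.
Args:
host: QuickBI service host
access_id: API Key
access_key: API Secret
dataportal_id: Data portal ID (the productId in the URL)
menu_id: Menu ID (optional; if omitted, homeMenu or the first valid menu is used)
Returns:
{
"success": True,
"page_id": "dashboard pageId",
"dashboard_name": "dashboard name",
"dataportal_name": "data portal name",
"menu_id": "ID of the menu actually used",
"menu_title": "menu title"
}
or
{
"success": False,
"error_code": "error code",
"error_message": "error message"
}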
"""
from .dataportal_resolver import (
query_dataportal_menus,
get_selected_menu,
extract_page_id_from_menu
)
# 1. Query the menu tree
query_result = query_dataportal_menus(host, access_id, access_key, dataportal_id)
if not query_result["success"]:
return query_result
menus = query_result["menus"]
dataportal_name = query_result.get("dataportal_name", "")
if not menus:
return {
"success": False,
"error_code": "EMPTY_MENU",
"error_message": "数据门户没有配置任何菜单"
}
# 2. Find the target menu
target_menu = get_selected_menu(menus, menu_id)
if not target_menu:
if menu_id:
return {
"success": False,
"error_code": "MENU_NOT_FOUND",
"error_message": f"在数据门户中找不到 menuId={menu_id} 对应的菜单"
}
else:
return {
"success": False,
"error_code": "NO_DEFAULT_MENU",
"error_message": "数据门户没有可用的默认菜单(首页或第一个非空菜单)"
}
# 3. Extract the pageId from the menu
success, message, page_id = extract_page_id_from_menu(target_menu)
if not success:
return {
"success": False,
"error_code": "INVALID_CONTENT_TYPE",
"error_message": message
}
# Get the dashboard name
content = target_menu.get('content', [])
dashboard_name = content[0].get('name', '') if content else ''
return {
"success": True,
"page_id": page_id,
"dashboard_name": dashboard_name,
"dataportal_name": dataportal_name,
"menu_id": target_menu.get('id'),
"menu_title": target_menu.get('title', '')
}
def extract_page_id(url: str) -> str:
"""
Extract the pageId from a dashboard URL.
Supported formats:
- https://bi.aliyun.com/dashboard/view/pc.htm?pageId=XXXXXXX
- https://bi.aliyun.com/token3rd/dashboard/view/pc.htm?pageId=XXXXXXX&accessToken=...
Note: this function only supports dashboard URLs that contain pageId directly.
For data portal URLs (/product/view.htm), check with is_dataportal_url() first,
then use extract_dataportal_ids() and get_dataportal_page_id() to obtain the pageId.
Args:
url: Dashboard URL
Returns:
The pageId string
Raises:
ValueError: If the pageId cannot be extracted
"""
import re
match = re.search(r'pageId=([a-zA-Z0-9-]+)', url)
if match:
return match.group(1)
raise ValueError(f"无法从 URL 中提取 pageId: {url}")
def validate_and_prepare_dashboard(
host: str,
access_id: str,
access_key: str,
page_id: str,
user_id: str
) -> dict:
"""
Pre-validate and pre-process a dashboard before conversion.
Args:
host: QuickBI service host
access_id: API Key
access_key: API Secret
page_id: Dashboard pageId
user_id: User token, sent as userId
Returns:
{
"success": True,
"url": "URL of the pre-processed dashboard"
}
or
{
"success": False,
"error_code": "error code",
"error_message": "error message"
}
"""
uri = "/openapi/v2/skills/dashboard/handle"
json_param = {
"id": page_id,
"userId": user_id,
"runningBySkill": True,
}
try:
result = call_quickbi_api(
host=host,
uri=uri,
access_id=access_id,
access_key=access_key,
method="POST",
json_param=json_param
)
# success may be a boolean or the string "true"/"false"
success_val = result.get("success")
is_success = success_val == True or success_val == "true"
if is_success:
# The URL is in the data field
return {
"success": True,
"url": result.get("data")
}
else:
return {
"success": False,
"error_code": str(result.get("errorCode", result.get("code", "UNKNOWN"))),
"error_message": result.get("errorMsg", result.get("message", "未知错误"))
}
except Exception as e:
return {
"success": False,
"error_code": "CONNECTION_ERROR",
"error_message": f"连接失败: {str(e)}"
}
def get_dashboard_update_time(
host: str,
access_id: str,
access_key: str,
page_id: str,
user_id: str
) -> dict:
"""
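Query the last update time of a dashboard.
Fetches the full dashboard JSON through /openapi/v2/dashboard/query
and reads its gmtModified field as the last update time.
A generated query skill uses this at startup to check whether the dashboard has changed;
if it has, the user is prompted to regenerate the skill.
Args:
host: QuickBI service host
access_id: API Key
access_key: API Secret
page_id: Dashboard pageId (i.e. dashboardId)
user_id: User token (not sent with this request)
Returns:
{
"success": True,
"data": {
"page_id": "xxx",
"last_modified": xxx, # raw gmtModified value, used to detect changes
"dashboard_name": "dashboard name"
}
}
or
{
"success": False,
"error_code": "error code",
"error_message": "error message"
}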
"""
uri = "/openapi/v2/dashboard/query"
# Use form_params as the query parameters
form_params = {
"dashboardId": page_id,
"viewType": "view",
"queryFavorite": "true"
}
try:
result = call_quickbi_api(
host=host,
uri=uri,
access_id=access_id,
access_key=access_key,
method="GET",
form_params=form_params
)
success_val = result.get("success")
is_success = success_val == True or success_val == "true"
if is_success:
data = result.get("data", {})
# Extract the update time from the full dashboard JSON (raw gmtModified value, used to detect changes)
gmt_modified = data.get("gmtModified")
dashboard_name = data.get("name", "")
return {
"success": True,
"data": {
"page_id": page_id,
"last_modified": gmt_modified,
"dashboard_name": dashboard_name
}
}
else:
return {
"success": False,
"error_code": str(result.get("errorCode", result.get("code", "UNKNOWN"))),
"error_message": result.get("errorMsg", result.get("message", "未知错误"))
}
except Exception as e:
return {
"success": False,
"error_code": "CONNECTION_ERROR",
"error_message": f"查询仪表板更新时间失败: {str(e)}"
}
def get_dashboard_json(
host: str,
access_id: str,
access_key: str,
page_id: str,
user_id: str
) -> dict:
"""
Fetch the full JSON of a dashboard.
Fetches the full dashboard JSON through /openapi/v2/dashboard/query,
which is then used to parse the dashboard structure.
Args:
host: QuickBI service host
access_id: API Key
access_key: API Secret
page_id: Dashboard pageId (i.e. dashboardId)
user_id: User token (not sent with this request)
Returns:
{
"success": True,
"data": { ... full dashboard JSON ... }
}
or
{
"success": False,
"error_code": "error code",
"error_message": "error message"
}
"""
uri = "/openapi/v2/dashboard/query"
form_params = {
"dashboardId": page_id,
"viewType": "view",
"queryFavorite": "true"
}
try:
result = call_quickbi_api(
host=host,
uri=uri,
access_id=access_id,
access_key=access_key,
method="GET",
form_params=form_params
)
success_val = result.get("success")
is_success = success_val == True or success_val == "true"
if is_success:
return {
"success": True,
"data": result.get("data", {})
}
else:
return {
"success": False,
"error_code": str(result.get("errorCode", result.get("code", "UNKNOWN"))),
"error_message": result.get("errorMsg", result.get("message", "未知错误"))
}
except Exception as e:
return {
"success": False,
"error_code": "CONNECTION_ERROR",
"error_message": f"获取仪表板数据失败: {str(e)}"
}
def batch_get_dataset_schema(
host: str,
access_id: str,
access_key: str,
cube_ids: list
) -> dict:
"""
Fetch dataset details in batch (including dataset names).
Calls /openapi/v2/dataset/batchGetSchema to fetch dataset schema info in batch,
mainly to obtain metadata such as dataset names for friendlier display.
Args:
host: QuickBI service host
access_id: API Key
access_key: API Secret
cube_ids: List of dataset IDs
Returns:
{
"success": True,
"data": {
"cube_id_1": {"cubeId": "...", "cubeName": "dataset name", ...},
"cube_id_2": {"cubeId": "...", "cubeName": "dataset name", ...}
}
}
or
{
"success": False,
"error_code": "error code",
"error_message": "error message"
}
"""
if not cube_ids:
return {
"success": True,
"data": {}
}
uri = "/openapi/v2/dataset/batchGetSchema"
json_param = {
"cubeIds": cube_ids
}
try:
result = call_quickbi_api(
host=host,
uri=uri,
access_id=access_id,
access_key=access_key,
method="POST",
json_param=json_param
)
success_val = result.get("success")
is_success = success_val == True or success_val == "true"
if is_success:
# Convert the returned data into a cubeId -> schema mapping
data_list = result.get("data", [])
data_map = {}
if isinstance(data_list, list):
for item in data_list:
cube_id = item.get("cubeId")
if cube_id:
data_map[cube_id] = item
elif isinstance(data_list, dict):
# If the response is already a dict, use it as is
data_map = data_list
return {
"success": True,
"data": data_map
}
else:
return {
"success": False,
"error_code": str(result.get("errorCode", result.get("code", "UNKNOWN"))),
"error_message": result.get("errorMsg", result.get("message", "未知错误"))
}
except Exception as e:
return {
"success": False,
"error_code": "CONNECTION_ERROR",
"error_message": f"批量获取数据集详情失败: {str(e)}"
}
def validate_api_credentials(config: dict) -> dict:
"""
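Verify that the API credentials are valid.
Args:
config: Config dict; api_key and api_secret are required, server_domain falls back to the public endpoint
Returns:
Validation result {"success": bool, "error": str, "error_code": str}
"""
try:
# Call a simple endpoint to verify the credentials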
result = call_quickbi_api(
host=config.get("server_domain", "https://quickbi-public.cn-hangzhou.aliyuncs.com"),
uri="/openapi/v2/workspace/list",
access_id=config["api_key"],
access_key=config["api_secret"],
method="GET"
)
if result.get("success", False) or result.get("code") == 200:
return {"success": True}
else:
return {
"success": False,
"error": result.get("message", "API 验证失败"),
"error_code": str(result.get("code", "UNKNOWN"))
}
except Exception as e:
return {
"success": False,
"error": f"连接失败: {str(e)}",
"error_code": "CONNECTION_ERROR"
}
# Usage example
if __name__ == "__main__":
# Load the config
config = load_config()
# Check the required settings
_server_domain = config.get("server_domain")
_api_key = config.get("api_key")
_api_secret = config.get("api_secret")
if not _server_domain or not _api_key or not _api_secret:
print("缺少必要配置项: server_domain / api_key / api_secret,请检查配置文件")
sys.exit(1)
# Verify the credentials
validation = validate_api_credentials(config)
if not validation["success"]:
print(f"验证失败: {validation['error']}")
sys.exit(1)
# SmartQ query
result = query_openapi(
endpoint=_server_domain,
access_key_id=_api_key,
access_key_secret=_api_secret,
question="查询销售额排名前五的商品",
cube_id="your-cube-id"
)
print(result)
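The signing scheme described in the `build_signature` docstring can be sketched end to end with the standard library alone. This is a minimal sketch that mirrors the functions above rather than an official reference: `sign_request` is a hypothetical name, and the credentials, nonce, timestamp, and query parameters below are made-up placeholders, not real QuickBI values.

```python
import base64
import hashlib
import hmac
from urllib import parse


def sign_request(method, uri, params, access_id, access_key, nonce, timestamp):
    """Hypothetical mirror of build_signature; returns (string_to_sign, signature)."""
    # Sorted key=value pairs, skipping None values, joined with '&'
    pairs = [f"{k}={params[k]}" for k in sorted(params or {}) if params[k] is not None]
    query = "\n" + "&".join(pairs) if pairs else ""
    headers = (
        f"\nX-Gw-AccessId:{access_id}"
        f"\nX-Gw-Nonce:{nonce}"
        f"\nX-Gw-Timestamp:{timestamp}"
    )
    string_to_sign = method.upper() + "\n" + uri + query + headers
    # URL-encode the whole string (safe="" also encodes '/'), then HMAC-SHA256 and Base64
    encoded = parse.quote(string_to_sign, safe="")
    digest = hmac.new(
        access_key.encode("utf-8"), encoded.encode("utf-8"), hashlib.sha256
    ).digest()
    return string_to_sign, base64.b64encode(digest).decode()


# Placeholder inputs, for illustration only
string_to_sign, signature = sign_request(
    "get",
    "/openapi/v2/dashboard/query",
    {"viewType": "view", "dashboardId": "demo-id", "unused": None},
    "demo-access-id",
    "demo-secret",
    "demo-nonce",
    "1700000000000",
)
print(string_to_sign)
print(signature)
```

Note that the parameters are signed in sorted key order and that `None` values are dropped, so the query parameters actually sent with the request need to match that set for the signature to verify.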
FILE:scripts/document/document_local_parse.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
General-purpose document text extraction tool (runs entirely locally, with no dependency on any external API).
Extracts text from PDF, DOC, DOCX, XLSX, XLS, CSV, and common image formats.
Supports single files, multiple files, and recursive folder scans, with parallel processing.
Tech stack:
- PDF: PyMuPDF extracts the text directly; scanned documents go through Tesseract OCR (each page is rendered to an image, then recognized)
- Images: local Tesseract OCR (mixed Chinese and English)
- Word: python-docx (.docx); libreoffice conversion with an OCR fallback (.doc)
- Excel: openpyxl/xlrd direct extraction
- CSV: read with pandas (tries several encodings in turn)
System dependencies:
- Tesseract OCR:
* macOS: brew install tesseract tesseract-lang
* Linux: sudo apt install tesseract-ocr tesseract-ocr-chi-sim
* Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki
- LibreOffice (optional, used to convert .doc files):
* macOS: brew install --cask libreoffice
* Linux: sudo apt install libreoffice
* Windows: download the installer from https://www.libreoffice.org
- Python packages: pytesseract, Pillow, PyMuPDF, python-docx, openpyxl, xlrd, pandas
Usage:
from document_local_parse import extract_text
# Extract a single file
text = extract_text("/path/to/file.pdf")
# Batch extraction
texts = extract_text(["/path/to/file1.pdf", "/path/to/file2.docx"])
# Scan a folder (recursively)
texts = extract_text("/path/to/folder/")
# Command-line usage
python document_local_parse.py file1.pdf file2.png --json
python document_local_parse.py /path/to/folder/ --output result.json
"""
from __future__ import annotations
import json
import os
import sys
import tempfile
import platform
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Any, Dict, List, Optional, Union
# ---------------------------------------------------------------------------
# Format categories
# ---------------------------------------------------------------------------
PDF_EXTENSIONS = {'.pdf'}
IMAGE_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.bmp', '.tiff', '.tif', '.webp'}
WORD_EXTENSIONS = {'.doc', '.docx'}
EXCEL_EXTENSIONS = {'.xls', '.xlsx'}
CSV_EXTENSIONS = {'.csv'}
ALL_SUPPORTED_EXTENSIONS = (
PDF_EXTENSIONS | IMAGE_EXTENSIONS | WORD_EXTENSIONS | EXCEL_EXTENSIONS | CSV_EXTENSIONS
)
# Output directory
OUTPUT_DIR = Path(__file__).resolve().parent.parent / "output"
# Maximum number of parallel workers
MAX_WORKERS = 10
# ---------------------------------------------------------------------------
# Cross-platform helpers
# ---------------------------------------------------------------------------
def get_libreoffice_path() -> str:
"""
Return the platform-specific path to the libreoffice executable.
Returns:
Path to the libreoffice executable
"""
system = platform.system()
if system == 'Windows':
# Common install paths on Windows
win_paths = [
Path("C:/Program Files/LibreOffice/program/soffice.exe"),
Path("C:/Program Files (x86)/LibreOffice/program/soffice.exe"),
]
for p in win_paths:
if p.exists():
return str(p)
return 'libreoffice' # fallback: assume it is on PATH
elif system == 'Darwin': # macOS
mac_path = Path('/Applications/LibreOffice.app/Contents/MacOS/soffice')
if mac_path.exists():
return str(mac_path)
return 'libreoffice' # fallback: assume it was installed via brew and is on PATH
else: # Linux
return 'libreoffice'
def check_tesseract_available() -> bool:
"""
Check whether Tesseract OCR is available.
Returns:
True if it is available, otherwise False
"""
try:
import pytesseract
pytesseract.get_tesseract_version()
return True
except Exception:
system = platform.system()
if system == 'Windows':
msg = "请从 https://github.com/UB-Mannheim/tesseract/wiki 下载并安装,添加到 PATH"
elif system == 'Darwin':
msg = "运行: brew install tesseract tesseract-lang"
else: # Linux
msg = "运行: sudo apt install tesseract-ocr tesseract-ocr-chi-sim (Debian/Ubuntu) 或 sudo yum install tesseract (CentOS)"
print(f"[OCR检查] ⚠ Tesseract 未安装或不可用: {msg}", flush=True)
return False
def safe_remove_temp_file(file_path: str):
"""
Safely delete a temporary file (handles file locking on Windows).
Args:
file_path: Path of the temporary file
"""
try:
if os.path.exists(file_path):
os.unlink(file_path)
except PermissionError:
# On Windows, defer deletion when the file is still in use
if platform.system() == 'Windows':
import atexit
print(f"[临时文件] 警告: 文件被占用,将在程序退出时删除: {file_path}", flush=True)
atexit.register(lambda: os.path.exists(file_path) and os.unlink(file_path))
else:
raise
# ---------------------------------------------------------------------------
# File scanning helpers
# ---------------------------------------------------------------------------
def scan_files(path: Union[str, Path], recursive: bool = True) -> List[Path]:
"""
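Scan a path for supported files; accepts a single file or a folder (optionally recursive).
Args:
path: File path or folder path
recursive: Whether to recurse into subfolders
Returns:
List of supported file paths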
"""
path = Path(path)
if path.is_file():
# Single file
ext = path.suffix.lower()
if ext in ALL_SUPPORTED_EXTENSIONS:
return [path]
else:
print(f"[扫描] 跳过不支持的文件: {path.name} ({ext})", flush=True)
return []
elif path.is_dir():
# Folder scan
files = []
pattern = "**/*" if recursive else "*"
for file_path in path.glob(pattern):
if file_path.is_file():
ext = file_path.suffix.lower()
if ext in ALL_SUPPORTED_EXTENSIONS:
files.append(file_path)
print(f"[扫描] 在 {path} 中找到 {len(files)} 个支持的文件", flush=True)
return files
else:
print(f"[扫描] 路径不存在: {path}", flush=True)
return []
# ---------------------------------------------------------------------------
# PDF text extraction
# ---------------------------------------------------------------------------
def extract_pdf_text(file_path: Path, use_ocr_fallback: bool = True) -> str:
"""
Extract text from a PDF. PyMuPDF is tried first; scanned documents fall back to OCR.
Args:
file_path: PDF file path
use_ocr_fallback: Whether to fall back to OCR when text extraction fails
Returns:
The extracted text
"""
import fitz # PyMuPDF
text = ""
try:
doc = fitz.open(str(file_path))
for page_num in range(len(doc)):
page = doc[page_num]
page_text = page.get_text()
text += page_text
doc.close()
# Check whether enough text was extracted.
# Very little text (< 50 characters) suggests a scanned document that needs OCR.
if len(text.strip()) < 50 and use_ocr_fallback:
print(f"[PDF提取] 警告: 本地提取文本较少 ({len(text.strip())} 字符),可能是扫描件,尝试 OCR...", flush=True)
return _ocr_fallback(file_path)
print(f"[PDF提取] 成功提取 {len(text.strip())} 字符", flush=True)
return text.strip()
except Exception as e:
print(f"[PDF提取] 本地提取失败: {e}", flush=True)
if use_ocr_fallback:
print("[PDF提取] 降级到 OCR 识别...", flush=True)
return _ocr_fallback(file_path)
raise
# ---------------------------------------------------------------------------
# Image OCR extraction
# ---------------------------------------------------------------------------
def extract_image_text(file_path: Path, use_ocr: bool = True) -> str:
"""
Extract text from an image (OCR).
Args:
file_path: Image file path
use_ocr: Whether to run OCR (only local Tesseract OCR is supported at the moment)
Returns:
The recognized text
"""
if not use_ocr:
print(f"[图片提取] 警告: 图片文件必须使用 OCR 提取,但 use_ocr=False", flush=True)
return ""
print(f"[图片提取] 使用 OCR 识别: {file_path.name}", flush=True)
return _ocr_fallback(file_path)
# ---------------------------------------------------------------------------
# Word document extraction
# ---------------------------------------------------------------------------
def extract_word_text(file_path: Path) -> str:
"""
Extract text from a Word document.
Args:
file_path: Word file path (.doc or .docx)
Returns:
The extracted text
"""
ext = file_path.suffix.lower()
if ext == '.docx':
return _extract_docx(file_path)
elif ext == '.doc':
# .doc is a legacy format: try a libreoffice conversion to .docx, then textract, then OCR
return _extract_doc(file_path)
else:
raise ValueError(f"不支持的 Word 格式: {ext}")
def _extract_docx(file_path: Path) -> str:
"""Extract text from a .docx file."""
from docx import Document
try:
doc = Document(str(file_path))
paragraphs = []
for para in doc.paragraphs:
if para.text.strip():
paragraphs.append(para.text)
# Also extract table contents
for table in doc.tables:
for row in table.rows:
row_text = " | ".join(cell.text for cell in row.cells)
if row_text.strip():
paragraphs.append(row_text)
text = "\n".join(paragraphs)
print(f"[Word提取] 成功提取 {len(text.strip())} 字符", flush=True)
return text.strip()
except Exception as e:
print(f"[Word提取] 提取失败: {e}", flush=True)
raise
def _extract_doc(file_path: Path) -> str:
"""
Extract text from a .doc file.
Strategy: convert to .docx with libreoffice, then fall back to textract, then to OCR.
"""
# Method 1: try converting to .docx with libreoffice
try:
import subprocess
libreoffice_cmd = get_libreoffice_path()
print(f"[Word提取] 使用 libreoffice: {libreoffice_cmd}", flush=True)
with tempfile.TemporaryDirectory() as tmpdir:
tmpdocx = Path(tmpdir) / f"{file_path.stem}.docx"
# Convert with libreoffice. The arguments are passed as a list, so paths
# containing spaces are handled correctly without shell=True.
result = subprocess.run(
[
libreoffice_cmd, '--headless', '--convert-to', 'docx',
'--outdir', str(tmpdir), str(file_path)
],
capture_output=True,
text=True,
timeout=30
)
if result.returncode == 0 and tmpdocx.exists():
print(f"[Word提取] 已将 .doc 转换为 .docx", flush=True)
return _extract_docx(tmpdocx)
else:
print(f"[Word提取] libreoffice 转换失败: {result.stderr}", flush=True)
except FileNotFoundError:
print("[Word提取] 警告: 未找到 libreoffice,尝试其他方法...", flush=True)
except Exception as e:
print(f"[Word提取] libreoffice 转换失败: {e}", flush=True)
# Method 2: try textract
try:
import textract
text = textract.process(str(file_path)).decode('utf-8')
print(f"[Word提取] 使用 textract 提取 {len(text.strip())} 字符", flush=True)
return text.strip()
except ImportError:
# A missing textract package raises ImportError, not FileNotFoundError
print("[Word提取] 警告: 未安装 textract,.doc 文件提取受限", flush=True)
except Exception as e:
print(f"[Word提取] textract 提取失败: {e}", flush=True)
# Method 3: fall back to OCR
print("[Word提取] 降级到 OCR 识别...", flush=True)
return _ocr_fallback(file_path)
# ---------------------------------------------------------------------------
# Excel document extraction
# ---------------------------------------------------------------------------
def extract_excel_text(file_path: Path) -> str:
"""
Extract text from an Excel document.
Args:
file_path: Excel file path (.xls or .xlsx)
Returns:
The extracted text (in tabular form)
"""
ext = file_path.suffix.lower()
if ext == '.xlsx':
return _extract_xlsx(file_path)
elif ext == '.xls':
return _extract_xls(file_path)
else:
raise ValueError(f"不支持的 Excel 格式: {ext}")
def _extract_xlsx(file_path: Path) -> str:
"""Extract text from an .xlsx file."""
import openpyxl
try:
wb = openpyxl.load_workbook(str(file_path), read_only=True, data_only=True)
sheets_text = []
for sheet_name in wb.sheetnames:
ws = wb[sheet_name]
rows_text = []
for row in ws.iter_rows(values_only=True):
# Skip empty rows
if any(cell is not None and str(cell).strip() for cell in row):
row_text = " | ".join(str(cell) if cell is not None else "" for cell in row)
rows_text.append(row_text)
if rows_text:
sheets_text.append(f"### 工作表: {sheet_name}\n" + "\n".join(rows_text))
wb.close()
text = "\n\n".join(sheets_text)
print(f"[Excel提取] 成功提取 {len(text.strip())} 字符", flush=True)
return text.strip()
except Exception as e:
print(f"[Excel提取] 提取失败: {e}", flush=True)
raise
def _extract_xls(file_path: Path) -> str:
"""Extract text from an .xls file."""
import xlrd
try:
wb = xlrd.open_workbook(str(file_path))
sheets_text = []
for sheet_name in wb.sheet_names():
ws = wb.sheet_by_name(sheet_name)
rows_text = []
for row_idx in range(ws.nrows):
row = ws.row_values(row_idx)
if any(str(cell).strip() for cell in row):
row_text = " | ".join(str(cell) for cell in row)
rows_text.append(row_text)
if rows_text:
sheets_text.append(f"### 工作表: {sheet_name}\n" + "\n".join(rows_text))
text = "\n\n".join(sheets_text)
print(f"[Excel提取] 成功提取 {len(text.strip())} 字符", flush=True)
return text.strip()
except Exception as e:
print(f"[Excel提取] 提取失败: {e}", flush=True)
raise
# ---------------------------------------------------------------------------
# CSV file extraction
# ---------------------------------------------------------------------------
def extract_csv_text(file_path: Path) -> str:
"""
Extract text from a CSV file.
Args:
file_path: CSV file path
Returns:
The extracted text
"""
import pandas as pd
try:
# Try several encodings in turn
for encoding in ['utf-8', 'gbk', 'gb2312', 'latin1']:
try:
df = pd.read_csv(str(file_path), encoding=encoding)
text = df.to_string(index=False)
print(f"[CSV提取] 成功提取 {len(text.strip())} 字符 (编码: {encoding})", flush=True)
return text.strip()
except UnicodeDecodeError:
continue
raise ValueError("无法解码 CSV 文件,尝试了 utf-8, gbk, gb2312, latin1")
except Exception as e:
print(f"[CSV提取] 提取失败: {e}", flush=True)
raise
# ---------------------------------------------------------------------------
# OCR fallback
# ---------------------------------------------------------------------------
def _ocr_fallback(file_path: Path) -> str:
"""
Run local recognition with Tesseract OCR (fully local, no external API).
Args:
file_path: File path
Returns:
The text recognized by OCR
"""
# Check that Tesseract is available
if not check_tesseract_available():
raise RuntimeError(
"Tesseract OCR 未安装,无法进行 OCR 识别。"
"请安装后重试,或设置 use_ocr_fallback=False 禁用 OCR 降级。"
)
try:
print(f"[OCR降级] 使用 Tesseract OCR 本地识别: {file_path.name}", flush=True)
# PDF files: render each page to an image with PyMuPDF, then OCR
if file_path.suffix.lower() in PDF_EXTENSIONS:
return _ocr_pdf_with_tesseract(file_path)
# Image files: run Tesseract OCR directly
elif file_path.suffix.lower() in IMAGE_EXTENSIONS:
return _ocr_image_with_tesseract(file_path)
# Other files such as .doc: convert to PDF first, then OCR
else:
return _ocr_other_file_with_tesseract(file_path)
except Exception as e:
print(f"[OCR降级] OCR 识别失败: {e}", flush=True)
raise RuntimeError(f"OCR 降级失败: {e}") from e
def _ocr_pdf_with_tesseract(file_path: Path) -> str:
"""
Recognize a PDF file with Tesseract OCR (each page is rendered to an image, then recognized).
"""
import fitz # PyMuPDF
from PIL import Image
import pytesseract
import io
print(f"[OCR-PDF] 开始逐页 OCR: {file_path.name}", flush=True)
doc = fitz.open(str(file_path))
text_parts = []
for page_num in range(len(doc)):
page = doc[page_num]
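# Render the page to a high-resolution image (300 DPI)
zoom = 300 / 72 # PDF user space is 72 units per inch, so this scale gives 300 DPI
mat = fitz.Matrix(zoom, zoom)
pix = page.get_pixmap(matrix=mat)
# Convert to a PIL Image
img_data = pix.tobytes("png")
img = Image.open(io.BytesIO(img_data))
# Run Tesseract OCR (mixed Chinese and English)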
text = pytesseract.image_to_string(img, lang='chi_sim+eng')
if text.strip():
text_parts.append(f"--- 第 {page_num + 1} 页 ---\n{text.strip()}")
print(f"[OCR-PDF] 第 {page_num + 1} 页识别成功: {len(text.strip())} 字符", flush=True)
doc.close()
full_text = "\n\n".join(text_parts)
print(f"[OCR-PDF] 总共识别 {len(text_parts)} 页,{len(full_text)} 字符", flush=True)
return full_text
def _ocr_image_with_tesseract(file_path: Path) -> str:
"""
Recognize an image file with Tesseract OCR.
"""
from PIL import Image
import pytesseract
print(f"[OCR-图片] 开始识别: {file_path.name}", flush=True)
# Open the image
img = Image.open(str(file_path))
# For .jpg files, re-save as .jpeg first (intended to improve Tesseract compatibility)
if file_path.suffix.lower() == '.jpg':
tmp_path = None
try:
with tempfile.NamedTemporaryFile(suffix='.jpeg', delete=False) as tmp:
tmp_path = tmp.name
img.save(tmp_path, 'JPEG')
img = Image.open(tmp_path)
img.load() # read the pixel data now, before the temporary file is removed
finally:
# Safely delete the temporary file
if tmp_path:
safe_remove_temp_file(tmp_path)
# Run Tesseract OCR (mixed Chinese and English)
text = pytesseract.image_to_string(img, lang='chi_sim+eng')
if text.strip():
print(f"[OCR-图片] 识别成功: {len(text.strip())} 字符", flush=True)
return text.strip()
else:
raise RuntimeError("OCR 未能识别到任何文本")
def _ocr_other_file_with_tesseract(file_path: Path) -> str:
"""
Run OCR on other file formats (such as .doc).
Strategy: try converting to PDF first, then run the PDF OCR path.
"""
import fitz
import subprocess
import tempfile
print(f"[OCR-其他] 尝试转换并识别: {file_path.name}", flush=True)
# Try converting to PDF with libreoffice
with tempfile.TemporaryDirectory() as tmpdir:
tmppdf = Path(tmpdir) / f"{file_path.stem}.pdf"
try:
libreoffice_cmd = get_libreoffice_path()
print(f"[OCR-其他] 使用 libreoffice: {libreoffice_cmd}", flush=True)
# The arguments are passed as a list, so paths containing spaces
# are handled correctly without shell=True.
result = subprocess.run(
[
libreoffice_cmd, '--headless', '--convert-to', 'pdf',
'--outdir', str(tmpdir), str(file_path)
],
capture_output=True,
text=True,
timeout=60
)
if result.returncode == 0 and tmppdf.exists():
print(f"[OCR-其他] 转换成功,开始 OCR", flush=True)
return _ocr_pdf_with_tesseract(tmppdf)
else:
print(f"[OCR-其他] 转换失败: {result.stderr}", flush=True)
except FileNotFoundError:
print("[OCR-其他] 未找到 libreoffice", flush=True)
except Exception as e:
print(f"[OCR-其他] 转换失败: {e}", flush=True)
# If the conversion failed, try opening the file directly with PyMuPDF
try:
doc = fitz.open(str(file_path))
doc.close()
print(f"[OCR-其他] PyMuPDF 可以直接打开,使用 PDF OCR", flush=True)
return _ocr_pdf_with_tesseract(file_path)
except Exception as e:
print(f"[OCR-其他] PyMuPDF 也无法打开: {e}", flush=True)
raise RuntimeError(f"无法处理文件类型: {file_path.suffix}")
# ---------------------------------------------------------------------------
# Unified entry point
# ---------------------------------------------------------------------------
def extract_text(
file_path: Union[str, Path, List[Union[str, Path]]],
use_ocr_fallback: bool = True,
output_dir: Optional[Union[str, Path]] = None,
save_json: bool = False
) -> Union[str, Dict[str, str], List[Dict[str, str]]]:
"""
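Unified text extraction interface.
Args:
file_path: A file path, a folder path, or a list of file paths
use_ocr_fallback: Whether to fall back to OCR when local extraction fails
output_dir: Output directory for the JSON file (defaults to OUTPUT_DIR)
save_json: Whether to save the results as a JSON file
Returns:
- Single file: the extracted text as a string
- Folder or list of files: a list of {"file": "fileName", "parsedText": "text"} dicts
Examples:
>>> # Single file
>>> text = extract_text("invoice.pdf")
>>> # Batch extraction
>>> texts = extract_text(["file1.pdf", "file2.docx", "image.png"])
>>> # Scan a folder
>>> results = extract_text("/path/to/folder/", save_json=True)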
"""
# Handle a single file or folder path
if isinstance(file_path, (str, Path)):
path = Path(file_path)
# If it is a folder, scan all files in it
if path.is_dir():
files = scan_files(path, recursive=True)
return _extract_files_parallel(files, use_ocr_fallback, output_dir, save_json)
# Single file
return _extract_single_file(path, use_ocr_fallback)
# Handle a list of files
return _extract_files_parallel(file_path, use_ocr_fallback, output_dir, save_json)
def _extract_single_file(file_path: Path, use_ocr_fallback: bool = True) -> str:
"""
Extract text from a single file.
Args:
file_path: File path
use_ocr_fallback: Whether to fall back to OCR
Returns:
The extracted text
"""
if not file_path.exists():
raise FileNotFoundError(f"文件不存在: {file_path}")
if not file_path.is_file():
raise ValueError(f"路径不是文件: {file_path}")
ext = file_path.suffix.lower()
if ext not in ALL_SUPPORTED_EXTENSIONS:
raise ValueError(
f"不支持的文件格式: {ext}\n"
f"支持的格式: {', '.join(sorted(ALL_SUPPORTED_EXTENSIONS))}"
)
print(f"\n{'='*60}", flush=True)
print(f"提取文件: {file_path.name}", flush=True)
print(f"{'='*60}", flush=True)
# Pick the extraction method by file type
if ext in PDF_EXTENSIONS:
return extract_pdf_text(file_path, use_ocr_fallback)
elif ext in IMAGE_EXTENSIONS:
return extract_image_text(file_path, use_ocr_fallback)
elif ext in WORD_EXTENSIONS:
return extract_word_text(file_path)
elif ext in EXCEL_EXTENSIONS:
return extract_excel_text(file_path)
elif ext in CSV_EXTENSIONS:
return extract_csv_text(file_path)
else:
raise ValueError(f"未实现的文件类型: {ext}")
def _extract_files_parallel(
files: List[Union[str, Path]],
use_ocr_fallback: bool = True,
output_dir: Optional[Union[str, Path]] = None,
save_json: bool = False
) -> List[Dict[str, str]]:
"""
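Extract text from multiple files in parallel.
Args:
files: List of file paths
use_ocr_fallback: Whether to fall back to OCR
output_dir: Output directory for the JSON file
save_json: Whether to save a JSON file
Returns:
A list of {"file": "fileName", "parsedText": "text"} dicts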
"""
results = []
total = len(files)
print(f"\n{'='*60}", flush=True)
print(f"[并行提取] 开始处理 {total} 个文件 (最大并行数: {MAX_WORKERS})", flush=True)
print(f"{'='*60}", flush=True)
def process_file(file_path: Union[str, Path]) -> Dict[str, str]:
"""处理单个文件的包装函数"""
fp = Path(file_path)
try:
text = _extract_single_file(fp, use_ocr_fallback)
return {
"file": fp.name,
"parsedText": text
}
except Exception as e:
print(f"[错误] 提取失败 {fp.name}: {e}", flush=True)
return {
"file": fp.name,
"parsedText": f"[ERROR] {e}"
}
# 使用线程池并行处理
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
# 提交所有任务
future_to_file = {executor.submit(process_file, fp): fp for fp in files}
# 收集结果
completed = 0
for future in as_completed(future_to_file):
completed += 1
result = future.result()
results.append(result)
print(f"\n[进度] {completed}/{total} 完成: {result['file']}", flush=True)
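# Sort by file name
results.sort(key=lambda x: x['file'])
print(f"\n{'='*60}", flush=True)
# Failed files carry an "[ERROR]" prefix (see process_file), so count them separately
failed = sum(1 for r in results if r['parsedText'].startswith('[ERROR]'))
print(f"\n[parallel extract] Finished: {len(results) - failed} succeeded, {failed} failed", flush=True)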
print(f"{'='*60}", flush=True)
# 保存 JSON 结果
if save_json or output_dir:
save_results_json(results, output_dir)
return results
def save_results_json(
results: List[Dict[str, str]],
output_dir: Optional[Union[str, Path]] = None
) -> Path:
"""
保存提取结果为 JSON 文件。
Args:
results: 提取结果列表
output_dir: 输出目录(默认 OUTPUT_DIR)
Returns:
JSON 文件路径
"""
import time
if output_dir is None:
output_dir = OUTPUT_DIR
else:
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
# 生成文件名(带时间戳)
timestamp = int(time.time())
json_path = output_dir / f"extract_results_{timestamp}.json"
# 写入 JSON
json_path.write_text(
json.dumps(results, ensure_ascii=False, indent=2),
encoding='utf-8'
)
print(f"\n[保存] JSON 结果已保存到: {json_path}", flush=True)
print(f"[保存] 共 {len(results)} 个文件的结果", flush=True)
return json_path
# ---------------------------------------------------------------------------
# 命令行入口
# ---------------------------------------------------------------------------
def main():
"""命令行入口:提取文件文本并输出。"""
import argparse
parser = argparse.ArgumentParser(
description="通用文档文本提取工具,支持 PDF/Word/Excel/CSV/图片",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
示例:
# 提取单个文件
python document_local_parse.py invoice.pdf
# 批量提取多个文件
python document_local_parse.py file1.pdf file2.docx image.png
# 扫描文件夹(递归)
python document_local_parse.py /path/to/folder/
# 禁用 OCR 降级(仅本地提取)
python document_local_parse.py invoice.pdf --no-ocr
# 输出 JSON 结果到默认 output 目录
python document_local_parse.py /path/to/folder/ --json
# 输出 JSON 结果到指定目录
python document_local_parse.py /path/to/folder/ --json --output-dir /custom/output/
"""
)
parser.add_argument(
"files",
nargs='+',
help="要提取的文件路径或文件夹路径(可多个)"
)
parser.add_argument(
"--no-ocr",
action="store_true",
help="禁用 OCR 降级,仅使用本地提取"
)
parser.add_argument(
"--output-dir",
type=str,
default=None,
help="JSON 输出目录(默认为 output/)"
)
parser.add_argument(
"--json",
action="store_true",
help="保存 JSON 格式结果文件"
)
parser.add_argument(
"--workspace-dir",
default=None,
help="用户工作目录路径"
)
args = parser.parse_args()
# 设置工作目录(必须在任何 read_config() 之前)
if args.workspace_dir:
from common.config_loader import set_workspace_dir
set_workspace_dir(args.workspace_dir)
use_ocr = not args.no_ocr
# 收集所有要处理的文件
all_files = []
for path_str in args.files:
path = Path(path_str)
if path.is_dir():
# 文件夹:递归扫描
files = scan_files(path, recursive=True)
all_files.extend(files)
elif path.is_file():
# 单文件
ext = path.suffix.lower()
if ext in ALL_SUPPORTED_EXTENSIONS:
all_files.append(path)
else:
print(f"[扫描] 跳过不支持的文件: {path.name} ({ext})", flush=True)
else:
print(f"[扫描] 路径不存在: {path}", flush=True)
if not all_files:
print("[错误] 没有找到可处理的文件", flush=True)
sys.exit(1)
print(f"\n[准备] 共找到 {len(all_files)} 个文件待处理", flush=True)
# 执行并行提取
results = extract_text(
all_files,
use_ocr_fallback=use_ocr,
output_dir=args.output_dir,
save_json=args.json
)
# 输出结果摘要
print(f"\n{'='*60}", flush=True)
print("[结果摘要]", flush=True)
print(f"{'='*60}", flush=True)
for result in results:
status = "成功" if not result['parsedText'].startswith('[ERROR]') else "失败"
text_len = len(result['parsedText'])
print(f" {result['file']}: {status} ({text_len} 字符)", flush=True)
print(f"\n总计: {len(results)} 个文件", flush=True)
if __name__ == "__main__":
main()
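
The batch path in `_extract_files_parallel` fans work out to a thread pool, records a failure as an `[ERROR]`-prefixed `parsedText` rather than aborting the batch, and sorts the results by file name. The sketch below is a minimal, self-contained illustration of that pattern, not the script's actual code. `fake_extract` and `extract_all` are hypothetical names introduced here, and `fake_extract` merely stands in for `_extract_single_file`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Dict, List


def fake_extract(path: Path) -> str:
    """Hypothetical stand-in for _extract_single_file: rejects unknown suffixes."""
    if path.suffix.lower() not in {".pdf", ".docx"}:
        raise ValueError(f"unsupported format: {path.suffix}")
    return f"text of {path.name}"


def extract_all(paths: List[str], max_workers: int = 4) -> List[Dict[str, str]]:
    """Run fake_extract over paths in a thread pool; one bad file never stops the batch."""

    def process(p: str) -> Dict[str, str]:
        fp = Path(p)
        try:
            return {"file": fp.name, "parsedText": fake_extract(fp)}
        except Exception as e:  # record the failure instead of raising
            return {"file": fp.name, "parsedText": f"[ERROR] {e}"}

    results: List[Dict[str, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process, p) for p in paths]
        for future in as_completed(futures):
            results.append(future.result())
    # as_completed yields in completion order, so sort for a stable output
    results.sort(key=lambda r: r["file"])
    return results


if __name__ == "__main__":
    for item in extract_all(["b.pdf", "a.docx", "c.xyz"]):
        print(item)
```

Because every worker returns a dict even on failure, callers can tell failed files apart by the `[ERROR]` prefix, which is the same check the summary loop in `main()` uses.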
FILE:scripts/document/document_remote_ocr.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
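Multimodal document OCR tool.
Uploads every supported file in a folder, or an explicit list of files, for OCR, then polls all tasks in parallel to collect the recognized text.
Usage:
# Option 1: upload every supported file in a folder
python document_remote_ocr.py <folder_path>
# Option 2: upload specific files
python document_remote_ocr.py --files <file1> <file2> <file3>
Arguments:
<folder_path> Directory containing the files to recognize
--files Files to recognize (one or more)
--upload-workers Maximum concurrent upload threads (default 5)
--poll-workers Maximum parallel polling threads (default 10)
--poll-interval Initial polling interval in seconds (default 3, with exponential backoff)
Output:
JSON array: [{"file": "xxx", "parsedText": "recognized text"}, ...]
(failed entries have "parsedText": null and an extra "error" field)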
"""
from __future__ import annotations
import argparse
import json
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Any, Dict, List, Optional
# 添加 scripts 目录到路径(以便导入 common 模块)
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from common.utils import request_openapi, require_user_id, read_config
# 支持的文件格式(根据接口文档)
SUPPORTED_EXTENSIONS = {
'.pdf', '.png', '.jpg', '.jpeg', '.jp2', '.webp', '.gif', '.bmp',
'.doc', '.docx', '.ppt', '.pptx', '.xls', '.xlsx', '.csv'
}
# 最大文件大小 10MB
MAX_FILE_SIZE = 10 * 1024 * 1024
# 最大轮询时间 120 秒
MAX_POLL_TIME = 120
def upload_document(file_path: str, config: dict) -> Dict[str, Any]:
"""
上传单个文档文件进行 OCR 识别。
Args:
file_path: 文件绝对路径
config: 配置字典
Returns:
上传响应数据,包含 taskId 等信息
"""
uri = "/openapi/v2/document/upload"
file_path = Path(file_path).resolve()
if not file_path.exists():
raise FileNotFoundError(f"文件不存在: {file_path}")
file_size = file_path.stat().st_size
if file_size > MAX_FILE_SIZE:
raise ValueError(f"文件大小 {file_size / 1024 / 1024:.2f}MB 超过限制 10MB: {file_path.name}")
print(f"[上传] 正在上传: {file_path.name} ({file_size / 1024:.1f}KB)", flush=True)
# 使用 multipart/form-data 上传文件
# 注意:签名时 sign_params 为空,不要设置 content_type(requests 会自动处理)
with open(file_path, 'rb') as f:
files = {'file': (file_path.name, f, 'application/octet-stream')}
resp = request_openapi(
"POST",
uri,
sign_params=None, # 文件上传时签名为空
config=config,
files=files,
)
result = resp.json()
print(f"[上传] 响应: {json.dumps(result, ensure_ascii=False)}", flush=True)
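# The response body may itself be a JSON-encoded string; decode it once more
if isinstance(result, str):
try:
result = json.loads(result)
except ValueError as e:
# json.JSONDecodeError is a ValueError subclass; a bare except would also swallow KeyboardInterrupt
raise RuntimeError(f"Failed to parse upload response: {result}") from e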
if not result.get("success"):
error_msg = result.get("message", "未知错误")
error_code = result.get("code", "UNKNOWN")
raise RuntimeError(f"上传失败 [{error_code}]: {error_msg}")
return result.get("data", {})
def upload_documents_batch(
file_paths: List[str],
config: dict,
max_workers: int = 5
) -> Dict[str, Dict[str, Any]]:
"""
并发批量上传文档文件。
Args:
file_paths: 文件路径列表
config: 配置字典
max_workers: 最大并发上传线程数
Returns:
{文件名: {taskId, filename, fileType, ...}} 映射
"""
upload_results = {}
print(f"\n[上传] 开始并发上传 {len(file_paths)} 个文件(最大 {max_workers} 线程)", flush=True)
# 使用线程池并发上传
with ThreadPoolExecutor(max_workers=min(max_workers, len(file_paths))) as executor:
# 提交所有上传任务
future_to_filepath = {
executor.submit(upload_document, file_path, config): file_path
for file_path in file_paths
}
# 收集结果
for future in as_completed(future_to_filepath):
file_path = future_to_filepath[future]
file_name = Path(file_path).name
try:
data = future.result()
upload_results[file_name] = data
print(f"[上传] ✓ {file_name} -> taskId: {data.get('taskId')}", flush=True)
except Exception as e:
print(f"[上传] ✗ {file_name} 失败: {e}", flush=True)
upload_results[file_name] = {
"taskId": None,
"filename": file_name,
"status": "upload_failed",
"errorMessage": str(e)
}
return upload_results
def poll_task_status(task_id: str, config: dict, poll_interval: float = 3.0) -> Dict[str, Any]:
"""
轮询单个 OCR 任务状态,直到完成或超时。
使用指数退避策略优化轮询效率。
Args:
task_id: 任务 ID
config: 配置字典
poll_interval: 初始轮询间隔(秒)
Returns:
最终任务状态数据
"""
uri = "/openapi/v2/document/task"
start_time = time.time()
# 指数退避参数
current_interval = poll_interval
min_interval = poll_interval
max_interval = 10.0 # 最大轮询间隔 10 秒
backoff_factor = 1.5 # 退避因子
print(f"[轮询] 开始轮询 taskId: {task_id}", flush=True)
while True:
elapsed = time.time() - start_time
# 检查超时
if elapsed > MAX_POLL_TIME:
print(f"[轮询] ⚠ 任务 {task_id} 超时 ({elapsed:.1f}s > {MAX_POLL_TIME}s)", flush=True)
return {
"taskId": task_id,
"status": "timeout",
"statusDesc": "处理超时",
"completed": True,
"parsedText": None,
"errorMessage": f"任务处理超时 ({elapsed:.1f}s)"
}
# 查询任务状态
params = {"taskId": task_id}
try:
resp = request_openapi("GET", uri, params=params, config=config)
result = resp.json()
if not result.get("success"):
error_msg = result.get("message", "查询失败")
print(f"[轮询] ✗ 查询失败: {error_msg}", flush=True)
return {
"taskId": task_id,
"status": "query_failed",
"completed": True,
"parsedText": None,
"errorMessage": error_msg
}
data = result.get("data", {})
status = data.get("status", "unknown")
completed = data.get("completed", False)
print(f"[轮询] 任务 {task_id[:20]}... 状态: {status} ({data.get('statusDesc', '')}) - 已耗时 {elapsed:.1f}s", flush=True)
# 任务完成
if completed:
if status == "success":
print(f"[轮询] ✓ 任务 {task_id[:20]}... 解析成功", flush=True)
elif status == "failed":
print(f"[轮询] ✗ 任务 {task_id[:20]}... 解析失败: {data.get('errorMessage', '')}", flush=True)
elif status == "not_supported":
print(f"[轮询] ⚠ 任务 {task_id[:20]}... 不支持的文件类型", flush=True)
return data
# Exponential backoff: while the task is still running, grow the polling interval
# up to max_interval (early stages tend to be fast, OCR itself is usually slower)
if status in ("processing", "pending"):
current_interval = min(current_interval * backoff_factor, max_interval)
else:
# Unknown status: fall back to the initial interval
current_interval = min_interval
# 等待下一次轮询
time.sleep(current_interval)
except Exception as e:
print(f"[轮询] ✗ 查询异常: {e}", flush=True)
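# Re-raise if the error indicates that the trial has expired
error_str = str(e)
# Assumption: check_trial_expired lives in common.config_loader, like set_workspace_dir;
# only the scripts directory is added to sys.path, so a bare "config_loader" would not resolve
from common.config_loader import check_trial_expired
if check_trial_expired(error_str):
raise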
# 其他异常,使用较短间隔重试
time.sleep(min_interval)
def poll_tasks_parallel(
tasks: Dict[str, Dict[str, Any]],
config: dict,
max_workers: int = 10,
poll_interval: float = 3.0
) -> List[Dict[str, Any]]:
"""
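Poll multiple OCR tasks in parallel.
Args:
tasks: mapping of {file name: {taskId, ...}}
config: configuration dict
max_workers: maximum number of parallel threads
poll_interval: initial polling interval in seconds
Returns:
A list of [{"file": "xxx", "parsedText": "text"}, ...]; failed entries have parsedText set to None and an extra "error" field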
"""
# 过滤出有效 taskId
valid_tasks = {
name: data for name, data in tasks.items()
if data.get("taskId")
}
if not valid_tasks:
print("[轮询] 没有有效的任务需要轮询", flush=True)
return []
print(f"\n[轮询] 开始并行轮询 {len(valid_tasks)} 个任务(最大 {max_workers} 线程)", flush=True)
results = []
# 使用线程池并行轮询
with ThreadPoolExecutor(max_workers=min(max_workers, len(valid_tasks))) as executor:
# 提交所有任务
future_to_filename = {
executor.submit(poll_task_status, data["taskId"], config, poll_interval): filename
for filename, data in valid_tasks.items()
}
# 收集结果
for future in as_completed(future_to_filename):
filename = future_to_filename[future]
try:
task_data = future.result()
# 组装结果(使用 file 字段)
result_item = {
"file": filename,
"parsedText": task_data.get("parsedText")
}
results.append(result_item)
# 输出状态
status = task_data.get("status", "unknown")
if status == "success":
text_len = len(task_data.get("parsedText", "") or "")
print(f"[结果] ✓ {filename}: 识别成功 ({text_len} 字符)", flush=True)
else:
error_msg = task_data.get("errorMessage", "未知错误")
print(f"[结果] ✗ {filename}: {status} - {error_msg}", flush=True)
result_item["parsedText"] = None # 确保失败时为 None
result_item["error"] = error_msg
except Exception as e:
print(f"[结果] ✗ {filename} 轮询异常: {e}", flush=True)
results.append({
"file": filename,
"parsedText": None,
"error": str(e)
})
return results
def collect_files_from_directory(directory: str) -> List[str]:
"""
从目录中收集所有支持的文件(递归扫描子目录)。
Args:
directory: 目录路径
Returns:
文件绝对路径列表
"""
dir_path = Path(directory).resolve()
if not dir_path.exists():
raise FileNotFoundError(f"目录不存在: {directory}")
if not dir_path.is_dir():
raise NotADirectoryError(f"路径不是目录: {directory}")
files = []
for file_path in dir_path.rglob('*'): # 递归扫描
if file_path.is_file():
ext = file_path.suffix.lower()
if ext in SUPPORTED_EXTENSIONS:
# 检查文件大小
try:
file_size = file_path.stat().st_size
if file_size <= MAX_FILE_SIZE:
files.append(str(file_path))
else:
print(f"[扫描] ⚠ 跳过超大文件: {file_path.name} ({file_size / 1024 / 1024:.2f}MB > 10MB)", flush=True)
except Exception as e:
print(f"[扫描] ⚠ 无法访问文件: {file_path.name} - {e}", flush=True)
return sorted(files)
def validate_and_collect_files(
directory: Optional[str] = None,
files: Optional[List[str]] = None
) -> List[str]:
"""
验证并收集待识别的文件列表。
Args:
directory: 目录路径(与 files 二选一)
files: 文件路径列表(与 directory 二选一)
Returns:
文件绝对路径列表
Raises:
ValueError: 参数不正确
"""
if directory and files:
raise ValueError("不能同时指定目录和文件列表,请选择其中一种方式")
if not directory and not files:
raise ValueError("必须指定目录或文件列表")
if directory:
# 从目录收集文件
return collect_files_from_directory(directory)
# 验证指定的文件
valid_files = []
for file_path_str in files:
file_path = Path(file_path_str).resolve()
if not file_path.exists():
print(f"[验证] ✗ 文件不存在: {file_path}", flush=True)
continue
if not file_path.is_file():
print(f"[验证] ✗ 路径不是文件: {file_path}", flush=True)
continue
ext = file_path.suffix.lower()
if ext not in SUPPORTED_EXTENSIONS:
print(f"[验证] ✗ 不支持的文件格式: {file_path.name} ({ext})", flush=True)
continue
file_size = file_path.stat().st_size
if file_size > MAX_FILE_SIZE:
print(f"[验证] ✗ 文件过大: {file_path.name} ({file_size / 1024 / 1024:.2f}MB > 10MB)", flush=True)
continue
valid_files.append(str(file_path))
return sorted(valid_files)
def main():
"""主函数:批量 OCR 识别文件夹内的所有文件或指定文件。"""
parser = argparse.ArgumentParser(
description="批量上传文件夹内的文件或指定文件进行 OCR 识别,并行获取识别文本",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
示例:
# 方式1:扫描文件夹内所有支持的文件
python document_remote_ocr.py /path/to/folder
# 方式2:上传指定的多个文件
python document_remote_ocr.py --files file1.pdf file2.png file3.docx
"""
)
parser.add_argument(
"directory",
nargs='?',
default=None,
help="包含待识别文件的目录路径(与 --files 二选一)"
)
parser.add_argument(
"--files",
nargs='+',
default=None,
metavar='FILE',
help="指定要识别的文件列表(可多个,与 directory 二选一)"
)
parser.add_argument(
"--upload-workers",
type=int,
default=5,
help="最大并发上传线程数(默认 5,最大 10)"
)
parser.add_argument(
"--poll-workers",
type=int,
default=10,
help="最大并行轮询线程数(默认 10,最大 10)"
)
parser.add_argument(
"--poll-interval",
type=float,
default=3.0,
help="初始轮询间隔秒数(默认 3,使用指数退避策略)"
)
parser.add_argument(
"--output",
type=str,
default=None,
help="Output JSON file path (default: output/ocr_result_{timestamp}.json under the scripts directory)"
)
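parser.add_argument(
"--json",
action="store_true",
help="Print the JSON result to stdout and suppress the banner and summary output of main(); upload and polling progress logs are still printed"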
)
parser.add_argument(
"--workspace-dir",
default=None,
help="用户工作目录路径"
)
args = parser.parse_args()
verbose = not args.json
# 设置工作目录(必须在 read_config 之前)
if args.workspace_dir:
from common.config_loader import set_workspace_dir
set_workspace_dir(args.workspace_dir)
# 加载配置
if verbose:
print("=" * 60, flush=True)
print("QuickBI 多模态文档 OCR 识别工具", flush=True)
print("=" * 60, flush=True)
config = read_config()
# 确保 user_id 已配置
try:
require_user_id(config)
except Exception as e:
if verbose:
print(f"\n[错误] 用户配置失败: {e}", flush=True)
sys.exit(1)
# 收集文件
try:
files = validate_and_collect_files(
directory=args.directory,
files=args.files
)
except Exception as e:
if verbose:
print(f"\n[错误] {e}", flush=True)
parser.print_usage()
sys.exit(1)
if not files:
if verbose:
if args.directory:
print(f"\n[警告] 目录 {args.directory} 中没有找到支持的文件", flush=True)
else:
print(f"\n[警告] 没有找到有效的文件", flush=True)
print(f"支持的文件格式: {', '.join(sorted(SUPPORTED_EXTENSIONS))}", flush=True)
sys.exit(0)
if verbose:
print(f"\n[扫描] 找到 {len(files)} 个待识别文件:", flush=True)
for f in files:
size = Path(f).stat().st_size
print(f" - {Path(f).name} ({size / 1024:.1f}KB)", flush=True)
print(flush=True)
# 限制最大并发数为 10
max_upload_workers = min(args.upload_workers, 10)
max_poll_workers = min(args.poll_workers, 10)
# Step 1: 批量上传
if verbose:
print("=" * 60, flush=True)
print("Step 1: 并发上传文件", flush=True)
print("=" * 60, flush=True)
upload_results = upload_documents_batch(
files,
config,
max_workers=max_upload_workers
)
# Step 2: 并行轮询
if verbose:
print("\n" + "=" * 60, flush=True)
print("Step 2: 并行轮询获取识别结果(指数退避策略)", flush=True)
print("=" * 60, flush=True)
results = poll_tasks_parallel(
upload_results,
config,
max_workers=max_poll_workers,
poll_interval=args.poll_interval
)
# 输出最终 JSON 结果
output_json = json.dumps(results, ensure_ascii=False, indent=2)
# 确定输出路径(默认 output/ocr_result_{timestamp}.json)
if args.output:
output_path = Path(args.output).resolve()
else:
# 默认输出到脚本同级的 output 文件夹,带时间戳
script_dir = Path(__file__).resolve().parent.parent
output_dir = script_dir / "output"
timestamp = int(time.time())
output_path = output_dir / f"ocr_result_{timestamp}.json"
# 保存到文件
output_path.parent.mkdir(parents=True, exist_ok=True)
output_path.write_text(output_json, encoding='utf-8')
if verbose:
print(f"\n[保存] ✓ JSON 结果已保存到: {output_path}", flush=True)
# 输出到 stdout(JSON 模式)
if args.json:
print(output_json, flush=True)
elif verbose:
print("\n" + "=" * 60, flush=True)
print("最终结果(JSON 格式)", flush=True)
print("=" * 60, flush=True)
print(output_json, flush=True)
# 统计信息
success_count = sum(1 for r in results if r.get("parsedText") is not None)
fail_count = len(results) - success_count
print("\n" + "=" * 60, flush=True)
print(f"统计: 总计 {len(results)} | 成功 {success_count} | 失败 {fail_count}", flush=True)
print("=" * 60, flush=True)
if __name__ == "__main__":
main()
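
The exponential backoff in `poll_task_status` can be checked in isolation. The sketch below is a minimal model of it, assuming the defaults shown in this file (initial interval 3 s, factor 1.5, cap 10 s, 120 s budget). `backoff_intervals` and `polls_within` are hypothetical helpers introduced here and are not part of the script. Like the loop it models, the interval is grown before it is slept on, so the first wait is `initial * factor` rather than `initial`.

```python
from typing import Iterator, List


def backoff_intervals(
    initial: float = 3.0, factor: float = 1.5, cap: float = 10.0
) -> Iterator[float]:
    """Yield the sleep before each re-poll while a task stays processing/pending."""
    current = initial
    while True:
        # Grow first, then sleep: same order as the polling loop
        current = min(current * factor, cap)
        yield current


def polls_within(budget: float, **kwargs: float) -> List[float]:
    """Return the sleeps that start before `budget` seconds have elapsed.

    Request latency is ignored, so the length is an upper bound on the poll count.
    """
    sleeps: List[float] = []
    elapsed = 0.0
    for interval in backoff_intervals(**kwargs):
        if elapsed > budget:
            break
        sleeps.append(interval)
        elapsed += interval
    return sleeps


if __name__ == "__main__":
    schedule = polls_within(120)
    print(schedule[:4], len(schedule))
```

Under these assumptions the waits settle at the 10-second cap after three polls, which bounds the number of status queries per task inside the `MAX_POLL_TIME` budget.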
FILE:scripts/document/generate_excel.py
#!/usr/bin/env python3
"""
Excel 生成脚本
根据提取数据 JSON 生成汇总 Excel 报表
遵循 xlsx-format.md 格式规范
输出路径: output/doc_scan_result_{timestamp}.xlsx
"""
import json
import sys
from pathlib import Path
from datetime import datetime
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils import get_column_letter
# 输出目录
OUTPUT_DIR = Path(__file__).resolve().parent.parent / "output"
# 样式常量(遵循 xlsx-format.md 规范)
HEADER_FILL = PatternFill("solid", fgColor="4472C4")
HEADER_FONT = Font(bold=True, color="FFFFFF", size=11)
THIN_BORDER = Border(
left=Side("thin"), right=Side("thin"),
top=Side("thin"), bottom=Side("thin")
)
CENTER_ALIGN = Alignment(horizontal="center", vertical="center")
WRAP_ALIGN = Alignment(wrap_text=True, vertical="top")
# Category-group mapping (based on document_classification.md V2.0; 10 groups, 46 subtypes listed below)
GROUP_MAP = {
# A. 财务与税务类 (finance-tax)
"增值税发票": "A.财务与税务",
"银行回单": "A.财务与税务",
"银行对账单": "A.财务与税务",
"费用报销单": "A.财务与税务",
"合同协议": "A.财务与税务",
"税务申报表": "A.财务与税务",
"财务报表": "A.财务与税务",
"收据付款凭证": "A.财务与税务",
# B. 人力资源类 (hr)
"简历": "B.人力资源",
"劳动合同": "B.人力资源",
"离职证明": "B.人力资源",
"工资条": "B.人力资源",
"考勤记录": "B.人力资源",
"培训证书": "B.人力资源",
"绩效考核表": "B.人力资源",
# C. 供应链与采购类 (supply-chain)
"采购订单": "C.供应链与采购",
"送货单": "C.供应链与采购",
"入库单": "C.供应链与采购",
"质检报告": "C.供应链与采购",
"供应商评估表": "C.供应链与采购",
"库存盘点表": "C.供应链与采购",
# D. 行政与法务类 (admin-legal)
"营业执照": "D.行政与法务",
"身份证": "D.行政与法务",
"护照": "D.行政与法务",
"保密协议": "D.行政与法务",
"资质证书": "D.行政与法务",
"公文通知": "D.行政与法务",
# E. 医疗类 (medical)
"病历": "E.医疗",
"处方单": "E.医疗",
"检验检查报告": "E.医疗",
"体检报告": "E.医疗",
# F. 保险类 (insurance)
"保单": "F.保险",
"理赔申请": "F.保险",
"理赔结案通知": "F.保险",
# G. 物流类 (logistics)
"运单": "G.物流",
"提单": "G.物流",
"报关单": "G.物流",
# H. 技术与运维类 (tech-ops)
"系统日志": "H.技术与运维",
"漏洞扫描报告": "H.技术与运维",
"服务器监控报表": "H.技术与运维",
# I. 客服与销售类 (sales-service)
"客诉工单记录": "I.客服与销售",
"销售报价单": "I.客服与销售",
"售后退换货单": "I.客服与销售",
# J. 政务与合规类 (gov-compliance)
"招投标文件": "J.政务与合规",
"政务审批单": "J.政务与合规",
"合规审计报告": "J.政务与合规",
# 其他
"未识别": "未识别",
"解析失败": "解析失败",
}
def display_width(s: str) -> int:
"""Estimate display width: every non-ASCII character (e.g. CJK) counts as 2 columns, ASCII as 1."""
if not s:
return 0
return sum(2 if ord(c) > 0x7F else 1 for c in str(s))
def auto_column_width(ws, headers: list, rows: list, max_width: int = 50):
"""自动调整列宽"""
for col_idx in range(1, len(headers) + 1):
widths = [display_width(headers[col_idx - 1])]
for row in rows:
if col_idx - 1 < len(row) and row[col_idx - 1]:
widths.append(display_width(str(row[col_idx - 1])))
col_letter = get_column_letter(col_idx)
ws.column_dimensions[col_letter].width = min(max(widths) + 2, max_width)
def create_data_sheet(wb: Workbook, sheet_name: str, headers: list, rows: list):
"""创建数据 Sheet"""
ws = wb.create_sheet(title=sheet_name[:31])
# 写入表头
for col, header in enumerate(headers, 1):
cell = ws.cell(row=1, column=col, value=header)
cell.fill = HEADER_FILL
cell.font = HEADER_FONT
cell.alignment = CENTER_ALIGN
cell.border = THIN_BORDER
# 写入数据行
for row_idx, row_data in enumerate(rows, 2):
for col_idx, value in enumerate(row_data, 1):
cell = ws.cell(row=row_idx, column=col_idx, value=value if value is not None else "")
cell.border = THIN_BORDER
cell.alignment = WRAP_ALIGN
# 自动列宽
auto_column_width(ws, headers, rows)
# 冻结首行
ws.freeze_panes = "A2"
# 自动筛选(仅有数据时)
if rows:
ws.auto_filter.ref = ws.dimensions
def create_summary_sheet(wb: Workbook, extraction_data: dict, scan_time: str, total_files: int):
"""创建汇总 Sheet(放在首位)"""
ws = wb.create_sheet(title="汇总", index=0)
headers = ["分类组", "子类型", "文件数量", "提取字段"]
for col, header in enumerate(headers, 1):
cell = ws.cell(row=1, column=col, value=header)
cell.fill = HEADER_FILL
cell.font = HEADER_FONT
cell.alignment = CENTER_ALIGN
cell.border = THIN_BORDER
row_idx = 2
for subtype_name, data in extraction_data.items():
group_name = GROUP_MAP.get(subtype_name, "其他")
headers_cn = data.get("headers_cn", [])
fields = ", ".join(headers_cn[1:]) if len(headers_cn) > 1 else ""
ws.cell(row_idx, 1, value=group_name).border = THIN_BORDER
ws.cell(row_idx, 2, value=subtype_name).border = THIN_BORDER
ws.cell(row_idx, 3, value=len(data.get("rows", []))).border = THIN_BORDER
ws.cell(row_idx, 4, value=fields).border = THIN_BORDER
row_idx += 1
# 元信息
row_idx += 1
ws.cell(row_idx, 1, value="扫描时间").font = Font(bold=True)
ws.cell(row_idx, 2, value=scan_time)
row_idx += 1
ws.cell(row_idx, 1, value="文件总数").font = Font(bold=True)
ws.cell(row_idx, 2, value=total_files)
# 列宽
ws.column_dimensions["A"].width = 16
ws.column_dimensions["B"].width = 18
ws.column_dimensions["C"].width = 12
ws.column_dimensions["D"].width = 60
def generate_excel(input_json_path: str, output_path: str = None) -> str:
"""
根据提取数据 JSON 生成 Excel
JSON 格式:
{
"scan_time": "2026-03-30 10:00:00",
"total_files": 10,
"extraction_data": {
"增值税发票": {
"headers_cn": ["源文件名", "发票类型", "发票代码", ...],
"rows": [["发票_001.pdf", "专用", "1234567890", ...], ...]
},
...
}
}
Args:
input_json_path: 输入 JSON 文件路径
output_path: 输出 Excel 路径(默认 output/doc_scan_result_{timestamp}.xlsx)
Returns:
输出 Excel 文件路径
"""
with open(input_json_path, "r", encoding="utf-8") as f:
data = json.load(f)
extraction_data = data.get("extraction_data", {})
scan_time = data.get("scan_time", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
total_files = data.get("total_files", sum(len(d.get("rows", [])) for d in extraction_data.values()))
wb = Workbook()
wb.remove(wb.active)
# 为每个子类型创建 Sheet
for subtype_name, subtype_data in extraction_data.items():
headers = subtype_data.get("headers_cn", [])
rows = subtype_data.get("rows", [])
if headers:
create_data_sheet(wb, subtype_name, headers, rows)
# 创建汇总 Sheet
create_summary_sheet(wb, extraction_data, scan_time, total_files)
# 输出路径(默认 output/doc_scan_result_{timestamp}.xlsx)
if output_path:
output_path = Path(output_path).resolve()
else:
# 默认输出到 output 目录
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_path = OUTPUT_DIR / f"doc_scan_result_{timestamp}.xlsx"
wb.save(str(output_path))
print(f"[保存] ✓ Excel 结果已保存到: {output_path}", flush=True)
return str(output_path)
def main():
import argparse
parser = argparse.ArgumentParser(
description="根据提取数据 JSON 生成汇总 Excel 报表"
)
parser.add_argument(
"input_path",
help="输入 JSON 文件路径"
)
parser.add_argument(
"output_path",
nargs="?",
default=None,
help="输出 Excel 路径(可选,默认 output/doc_scan_result_{timestamp}.xlsx)"
)
parser.add_argument(
"--workspace-dir",
default=None,
help="用户工作目录路径"
)
args = parser.parse_args()
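# Set the workspace directory (must happen before any read_config() call)
if args.workspace_dir:
# sys and Path are already imported at module level. Re-importing them inside
# main() would make both names function-local and raise UnboundLocalError
# below whenever --workspace-dir is omitted.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
from common.config_loader import set_workspace_dir
set_workspace_dir(args.workspace_dir)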
if not Path(args.input_path).exists():
print(json.dumps({"error": f"文件不存在: {args.input_path}"}, ensure_ascii=False))
sys.exit(1)
try:
result_path = generate_excel(args.input_path, args.output_path)
print(json.dumps({
"status": "success",
"output_path": result_path
}, ensure_ascii=False))
except Exception as e:
print(json.dumps({"error": str(e)}, ensure_ascii=False))
sys.exit(1)
if __name__ == "__main__":
main()
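
`generate_excel` expects a JSON file shaped like the example in its docstring. The sketch below builds such a payload and reproduces the fallback used when `total_files` is absent, which is the sum of the row counts across subtypes. The sample rows are invented for illustration, and `default_total_files` and `write_payload` are hypothetical helpers rather than functions of the script.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample payload; all field values are made up for illustration.
payload = {
    "scan_time": "2026-03-30 10:00:00",
    "extraction_data": {
        "增值税发票": {
            "headers_cn": ["源文件名", "发票类型", "发票代码"],
            "rows": [
                ["发票_001.pdf", "专用", "1234567890"],
                ["发票_002.pdf", "普通", "0987654321"],
            ],
        },
        "简历": {
            "headers_cn": ["源文件名", "姓名"],
            "rows": [["resume_01.pdf", "张三"]],
        },
    },
}


def default_total_files(data: dict) -> int:
    """Fallback when "total_files" is missing: sum of row counts over all subtypes."""
    return sum(len(d.get("rows", [])) for d in data.get("extraction_data", {}).values())


def write_payload(data: dict, directory: str) -> Path:
    """Write the payload as UTF-8 JSON and return its path."""
    path = Path(directory) / "extraction.json"
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        json_path = write_payload(payload, tmp)
        print(json_path.name, default_total_files(payload))
        # With openpyxl installed, generate_excel(str(json_path)) would consume this file.
```

The subtype keys are kept in Chinese because `GROUP_MAP` looks them up verbatim. A key missing from the map falls into the "其他" group on the summary sheet.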
FILE:scripts/insight/__init__.py
FILE:scripts/insight/q_insights.py
# -*- coding: utf-8 -*-
"""
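Quick BI 小Q insights: Excel file parsing / dashboard snapshot, followed by streamed data interpretation.
Usage:
# Interpret an Excel file
python scripts/insight/q_insights.py "这个报表有什么异常?" --excel-file "/path/to/data.xlsx"
# Interpret a dashboard snapshot (legacy mode)
python scripts/insight/q_insights.py "这个报表有什么异常?" --works-id "your-works-id"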
"""
from __future__ import annotations
import argparse
import base64
import json
import os
import sys
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
sys.path.insert(0, str(Path(__file__).parent.parent))
from common.utils import (
read_config,
require_user_id,
request_openapi,
request_openapi_stream,
parse_sse_event,
set_workspace_dir,
)
SNAPSHOT_URI = "/openapi/v2/snapshot/calling/shot"
INTERPRETATION_URI = "/openapi/v2/smartq/dataInterpretationStream"
POLL_INTERVAL_SECONDS = 3
MAX_POLL_COUNT = 60
MAX_MARKDOWN_CHARS = 100000
ACC_PROMPT = """【关键要求】:若输入数据中未包含用户提问的相关内容,则直接回复“不存在”,禁止随意捏造数据。若报告中包含相关数据,且用户问题有明确指向,则须在报告开头设置“用户问题解答”章节,针对问题作出直接、明确的回答,并予以标注。"""
# ---------------------------------------------------------------------------
# Excel 文件解析
# ---------------------------------------------------------------------------
def _parse_xls_to_markdown(file_path: str) -> Optional[str]:
"""解析 .xls 文件(旧版 Excel 97-2003 格式),需要 xlrd 库。"""
try:
import xlrd
except ImportError:
print("[Excel] 缺少 xlrd 依赖,请执行: pip install xlrd", flush=True)
return None
try:
wb = xlrd.open_workbook(file_path)
except Exception as e:
print(f"[Excel] 文件打开失败: {e}", flush=True)
return None
md_parts: List[str] = []
for sheet in wb.sheets():
if sheet.nrows == 0:
continue
rows: List[List[str]] = []
for row_idx in range(sheet.nrows):
rows.append([str(sheet.cell_value(row_idx, col)) for col in range(sheet.ncols)])
headers = rows[0]
col_count = len(headers)
lines: List[str] = []
lines.append(f"## {sheet.name}\n")
lines.append("| " + " | ".join(headers) + " |")
lines.append("|" + "|".join([" --- "] * col_count) + "|")
for data_row in rows[1:]:
padded = data_row + [""] * (col_count - len(data_row))
lines.append("| " + " | ".join(padded[:col_count]) + " |")
md_parts.append("\n".join(lines))
if not md_parts:
print("[Excel] 文件中无有效数据", flush=True)
return None
markdown_text = "\n\n".join(md_parts)
print(f"[Excel] 解析完成,共 {wb.nsheets} 个 Sheet,数据长度: {len(markdown_text)} 字符", flush=True)
return markdown_text
def _parse_xlsx_to_markdown(file_path: str) -> Optional[str]:
"""解析 .xlsx 文件(Office Open XML 格式),需要 openpyxl 库。"""
try:
import openpyxl
except ImportError:
print("[Excel] 缺少 openpyxl 依赖,请执行: pip install openpyxl", flush=True)
return None
try:
wb = openpyxl.load_workbook(file_path, read_only=True, data_only=True)
except Exception as e:
print(f"[Excel] 文件打开失败: {e}", flush=True)
return None
md_parts: List[str] = []
for sheet_name in wb.sheetnames:
ws = wb[sheet_name]
rows: List[List[str]] = []
for row in ws.iter_rows(values_only=True):
rows.append([str(cell) if cell is not None else "" for cell in row])
if not rows:
continue
headers = rows[0]
col_count = len(headers)
lines: List[str] = []
lines.append(f"## {sheet_name}\n")
lines.append("| " + " | ".join(headers) + " |")
lines.append("|" + "|".join([" --- "] * col_count) + "|")
for data_row in rows[1:]:
padded = data_row + [""] * (col_count - len(data_row))
lines.append("| " + " | ".join(padded[:col_count]) + " |")
md_parts.append("\n".join(lines))
wb.close()
if not md_parts:
print("[Excel] 文件中无有效数据", flush=True)
return None
markdown_text = "\n\n".join(md_parts)
print(f"[Excel] 解析完成,共 {len(wb.sheetnames)} 个 Sheet,数据长度: {len(markdown_text)} 字符", flush=True)
return markdown_text
def parse_excel_to_markdown(file_path: str) -> Optional[str]:
"""
将 Excel 文件解析为 Markdown 表格文本。
自动识别 .xls / .xlsx 格式,支持多 Sheet。
返回合并后的 Markdown 文本;解析失败时返回 None。
"""
if not os.path.isfile(file_path):
print(f"[Excel] 文件不存在: {file_path}", flush=True)
return None
print(f"[Excel] 正在解析文件: {file_path}", flush=True)
ext = os.path.splitext(file_path)[1].lower()
if ext == ".xls":
return _parse_xls_to_markdown(file_path)
elif ext == ".xlsx":
return _parse_xlsx_to_markdown(file_path)
else:
print(f"[Excel] 不支持的文件格式: {ext},仅支持 .xls 和 .xlsx", flush=True)
return None
# ---------------------------------------------------------------------------
# 仪表板快照
# ---------------------------------------------------------------------------
def call_snapshot(
works_id: str,
user_id: str,
*,
config: Optional[dict] = None,
) -> dict:
"""
调用 POST /openapi/v2/snapshot/calling/shot 拉取仪表板快照。
返回 SnapshotResultModel:
{status: "processing"|"success"|"failed", errorInfo: {...}, resultMarkdownText: "..."}
"""
payload: Dict[str, Any] = {
"worksId": works_id,
"worksType": "dashboard",
"userId": user_id,
"targetType": "excel",
}
resp = request_openapi(
"POST",
SNAPSHOT_URI,
json_body=payload,
timeout=60,
config=config,
)
return resp.json()
def poll_snapshot(
works_id: str,
user_id: str,
*,
config: Optional[dict] = None,
) -> Optional[str]:
"""
轮询快照接口直到 status 为 success 或 failed。
返回 resultMarkdownText(成功时)或 None(失败 / 超时时)。
"""
for attempt in range(1, MAX_POLL_COUNT + 1):
result = call_snapshot(works_id, user_id, config=config)
status = result.get("status", "")
print(f"[快照] 第 {attempt} 次轮询,状态: {status}", flush=True)
if status == "success":
markdown_text = result.get("resultMarkdownText", "")
if not markdown_text:
print("[快照] 状态为 success 但 resultMarkdownText 为空", flush=True)
return None
print(f"[快照] 快照数据获取成功,数据长度: {len(markdown_text)} 字符", flush=True)
return markdown_text
if status == "failed":
error_info = result.get("errorInfo", {})
print(f"[快照] 仪表板数据处理失败: {json.dumps(error_info, ensure_ascii=False)}", flush=True)
print("[快照] 仪表板数据处理失败,请联系产品服务同学处理", flush=True)
return None
if status == "processing":
time.sleep(POLL_INTERVAL_SECONDS)
continue
print(f"[快照] 未知状态: {status},原始响应: {json.dumps(result, ensure_ascii=False)}", flush=True)
return None
print(f"[快照] 轮询超时(已等待 {MAX_POLL_COUNT * POLL_INTERVAL_SECONDS} 秒)", flush=True)
return None
# ---------------------------------------------------------------------------
# Data interpretation (SSE streaming)
# ---------------------------------------------------------------------------
def run_interpretation_stream(
    string_data: str,
    user_question: str,
    *,
    config: Optional[dict] = None,
):
    """
    Call POST /openapi/v2/smartq/dataInterpretationStream to interpret the data.
    Parses SSE events in real time and prints the reasoning process and the interpretation result.
    """
    config = config or read_config()
    oapi_user_id = require_user_id(config)
    payload: Dict[str, Any] = {
        "stringData": base64.b64encode(string_data.encode("utf-8")).decode("utf-8"),
        "userQuestion": ACC_PROMPT + user_question,
        "modelCode": config.get("model_code", "SYSTEM_deepseek-v3"),
        "oapiUserId": oapi_user_id,
    }
    print(f"\n{'=' * 60}", flush=True)
    print(f"[Interpretation] Question: {user_question}", flush=True)
    print(f"{'=' * 60}\n", flush=True)
    reasoning_buf: List[str] = []
    text_buf: List[str] = []
    event_count = 0
    for raw_event in request_openapi_stream(
        INTERPRETATION_URI, json_body=payload, config=config, timeout=600
    ):
        event_count += 1
        # print(f"[DEBUG] Raw event #{event_count}: {raw_event[:300]}", flush=True)
        event_data = parse_sse_event(raw_event)
        if not event_data:
            print(f"[DEBUG] Event #{event_count} parsed as empty, skipping", flush=True)
            continue
        event_type = event_data.get("type", "")
        data = event_data.get("data", "")
        # print(f"[DEBUG] Event #{event_count}: type={event_type}, data={str(data)[:200]}", flush=True)
        if event_type in ("heartbeat", "trace", "locale"):
            continue
        elif event_type == "reasoning":
            reasoning_buf.append(str(data))
        elif event_type in ("text", "summary"):
            text_buf.append(str(data))
        elif event_type == "finish":
            break
        else:
            print(f"[SSE] Unhandled event: type={event_type}, data={json.dumps(event_data, ensure_ascii=False)[:500]}", flush=True)
    print(f"[DEBUG] SSE stream ended, {event_count} events received", flush=True)
    if reasoning_buf:
        reasoning_text = "".join(reasoning_buf)
        print(f"[Reasoning]\n{reasoning_text}\n", flush=True)
    if text_buf:
        interpretation_text = "".join(text_buf)
        print(f"[Interpretation result]\n{interpretation_text}", flush=True)
    else:
        print("[Interpretation result] No interpretation content received", flush=True)
    print("\n[Done] Data interpretation finished", flush=True)
    return "".join(text_buf)
# ---------------------------------------------------------------------------
# Full workflow
# ---------------------------------------------------------------------------
def run_insights(
    question: str,
    works_id: Optional[str] = None,
    *,
    excel_file: Optional[str] = None,
    config: Optional[dict] = None,
):
    """Run the full SmartQ insights flow: Excel parsing / snapshot polling → data interpretation."""
    config = config or read_config()
    # Always call require_user_id so that the trial-registration flow runs properly
    user_id = require_user_id(config)
    print(f"{'=' * 60}", flush=True)
    print(f"[SmartQ insights] Question: {question}", flush=True)
    # Prefer data from the Excel file
    if excel_file:
        print(f"[SmartQ insights] Data source: Excel file ({excel_file})", flush=True)
        print(f"{'=' * 60}\n", flush=True)
        markdown_text = parse_excel_to_markdown(excel_file)
    elif works_id:
        print(f"[SmartQ insights] Data source: dashboard snapshot ({works_id})", flush=True)
        print(f"{'=' * 60}\n", flush=True)
        print("[Snapshot] Fetching dashboard snapshot data...", flush=True)
        markdown_text = poll_snapshot(works_id, user_id, config=config)
    else:
        print(f"{'=' * 60}\n", flush=True)
        print("[SmartQ insights] Aborted: --excel-file or --works-id is required", flush=True)
        sys.exit(1)
    if not markdown_text:
        print(f"\n{'=' * 60}", flush=True)
        print("[SmartQ insights] Aborted: could not obtain data", flush=True)
        print(f"{'=' * 60}", flush=True)
        sys.exit(1)
    # Size check: abort with an error when over the limit and require the agent to filter the data first
    if len(markdown_text) > MAX_MARKDOWN_CHARS:
        print(f"[SmartQ insights] Data too large: {len(markdown_text)} characters, limit {MAX_MARKDOWN_CHARS} characters.", flush=True)
        print("[SmartQ insights] Filter the Excel data according to the user's question first (keep only the relevant rows and columns), save the result as a new file, and call again.", flush=True)
        print("[SmartQ insights] If the data is still over the limit after filtering, split it by rows into several files, call in batches, and merge the results at the end.", flush=True)
        sys.exit(1)
    # Steps 3 & 4: call the streaming data-interpretation endpoint
    result = run_interpretation_stream(
        string_data=markdown_text,
        user_question=question,
        config=config,
    )
    print(f"\n{'=' * 60}", flush=True)
    print("[SmartQ insights] Interpretation flow finished", flush=True)
    print(f"{'=' * 60}", flush=True)
    return result
# ---------------------------------------------------------------------------
# CLI
# ---------------------------------------------------------------------------
def main():
    parser = argparse.ArgumentParser(description="Quick BI SmartQ insights")
    parser.add_argument("question", help="The user's question to interpret")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--excel-file", dest="excel_file", help="Path to the Excel file (.xlsx)")
    # group.add_argument("--works-id", dest="works_id", help="Dashboard ID (worksId)")
    parser.add_argument("--workspace-dir", default=None, help="Path to the user's working directory")
    args = parser.parse_args()
    if args.workspace_dir:
        set_workspace_dir(args.workspace_dir)
    run_insights(args.question, excel_file=args.excel_file)


if __name__ == "__main__":
    main()
FILE:scripts/report/__init__.py
FILE:scripts/report/create_chat.py
# -*- coding: utf-8 -*-
"""
Create a SmartQ report chat.
"""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path
from typing import List, Optional

sys.path.insert(0, str(Path(__file__).parent.parent))
from common.utils import create_report_chat, read_config


def load_resources_from_args(resources_json: Optional[str], resources_file: Optional[str]) -> Optional[List[dict]]:
    """Read resources from the command-line arguments."""
    if resources_file:
        with open(resources_file, "r", encoding="utf-8") as file_handle:
            loaded = json.load(file_handle)
        if isinstance(loaded, dict):
            if "resources" in loaded:
                return loaded["resources"]
            raise ValueError("resources-file must be a resources array, or a JSON object with a resources field")
        return loaded
    if resources_json:
        loaded = json.loads(resources_json)
        if isinstance(loaded, dict):
            if "resources" in loaded:
                return loaded["resources"]
            raise ValueError("resources-json must be a resources array, or a JSON object with a resources field")
        return loaded
    return None


def parse_args() -> argparse.Namespace:
    """Parse the command-line arguments."""
    parser = argparse.ArgumentParser(description="Create a SmartQ report chat")
    parser.add_argument("question", help="The question entered by the user")
    parser.add_argument("--resources-json", default=None, help="JSON string of resources")
    parser.add_argument("--resources-file", default=None, help="JSON file containing resources")
    parser.add_argument("--chat-id", default=None, help="Custom chatId; generated automatically if omitted")
    parser.add_argument("--message-id", default=None, help="Custom messageId; generated automatically if omitted")
    parser.add_argument("--workspace-dir", default=None, help="Path to the user's working directory")
    return parser.parse_args()


def main() -> int:
    """Script entry point."""
    args = parse_args()
    if args.workspace_dir:
        from common.config_loader import set_workspace_dir

        set_workspace_dir(args.workspace_dir)
    config = read_config()
    resources = load_resources_from_args(args.resources_json, args.resources_file)
    result = create_report_chat(
        args.question,
        resources=resources,
        chat_id=args.chat_id,
        message_id=args.message_id,
        config=config,
    )
    print(f"chatId: {result['chatId']}")
    print(f"messageId: {result['messageId']}")
    print(
        json.dumps(
            {
                "chatId": result["chatId"],
                "messageId": result["messageId"],
                "statusCode": result["statusCode"],
                "response": result["response"],
            },
            ensure_ascii=False,
            indent=2,
        )
    )
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except Exception as exc:  # pragma: no cover - CLI script prints the error directly
        print(f"Failed to create chat: {exc}", file=sys.stderr)
        raise SystemExit(1)
FILE:scripts/report/generate_report.py
# -*- coding: utf-8 -*-
"""
Generate a SmartQ report in one step: upload files -> create chat -> poll for the result.
"""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path
from typing import Any, Dict, List

sys.path.insert(0, str(Path(__file__).parent.parent))
from common.utils import (
    UPLOAD_CHAT_TYPE,
    create_report_chat,
    poll_report_result,
    read_config,
    upload_files,
)


def parse_args() -> argparse.Namespace:
    """Parse the command-line arguments."""
    parser = argparse.ArgumentParser(description="Generate a SmartQ report in one step")
    parser.add_argument("question", help="The question entered by the user")
    parser.add_argument("files", nargs="*", help="Optional local file paths")
    parser.add_argument(
        "--poll-interval",
        type=float,
        default=3.0,
        help="Polling interval (seconds)",
    )
    parser.add_argument(
        "--max-wait",
        type=int,
        default=30 * 60,
        help="Maximum wait time (seconds)",
    )
    parser.add_argument("--workspace-dir", default=None, help="Path to the user's working directory")
    return parser.parse_args()


def main() -> int:
    """Script entry point."""
    args = parse_args()
    if args.workspace_dir:
        from common.config_loader import set_workspace_dir

        set_workspace_dir(args.workspace_dir)
    config = read_config()
    resources: List[Dict[str, Any]] = []
    upload_result: Dict[str, Any] = {}
    if args.files:
        upload_result = upload_files(args.files, config=config)
        resources = upload_result["resources"]
        print(
            json.dumps(
                {
                    "chatType": UPLOAD_CHAT_TYPE,
                    "fileCount": len(args.files),
                    "resources": resources,
                    "summary": f"Uploaded {len(args.files)} file(s)",
                },
                ensure_ascii=False,
                indent=2,
            ),
            flush=True,
        )
    create_result = create_report_chat(
        args.question,
        resources=resources,
        config=config,
    )
    print(f"chatId: {create_result['chatId']}", flush=True)
    print(f"messageId: {create_result['messageId']}", flush=True)
    poll_result = poll_report_result(
        create_result["chatId"],
        poll_interval=args.poll_interval,
        max_wait_seconds=args.max_wait,
        show_progress=True,
        config=config,
    )
    final_output = {
        "chatId": create_result["chatId"],
        "messageId": create_result["messageId"],
        # Consistent with poll_report_result: the replay URL is present only when polling finishes normally
        "reportUrl": poll_result.get("reportUrl"),
        "finished": poll_result["finished"],
        "error": poll_result.get("error"),
        "eventCount": poll_result["eventCount"],
        "tokenInfo": poll_result["tokenInfo"],
        "resources": resources,
        "uploadResults": upload_result.get("uploadResults", []),
        "createResponse": create_result["response"],
    }
    print(
        json.dumps(
            final_output,
            ensure_ascii=False,
            indent=2,
        ),
        flush=True,
    )
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except Exception as exc:  # pragma: no cover - CLI script prints the error directly
        print(f"Failed to generate report: {exc}", file=sys.stderr)
        raise SystemExit(1)
FILE:scripts/report/query_report_result.py
# -*- coding: utf-8 -*-
"""
Poll the SmartQ report SSE result and print the incrementally parsed content.
"""
from __future__ import annotations

import argparse
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent))
from common.utils import DEFAULT_MAX_POLL_SECONDS, DEFAULT_POLL_INTERVAL_SECONDS, poll_report_result


def parse_args() -> argparse.Namespace:
    """Parse the command-line arguments."""
    parser = argparse.ArgumentParser(description="Poll the SmartQ report SSE result")
    parser.add_argument("chat_id", help="chatId generated when the chat was created")
    parser.add_argument(
        "--poll-interval",
        type=float,
        default=DEFAULT_POLL_INTERVAL_SECONDS,
        help="Polling interval (seconds)",
    )
    parser.add_argument(
        "--max-wait",
        type=int,
        default=DEFAULT_MAX_POLL_SECONDS,
        help="Maximum wait time (seconds)",
    )
    parser.add_argument("--workspace-dir", default=None, help="Path to the user's working directory")
    return parser.parse_args()


def main() -> int:
    """Script entry point."""
    args = parse_args()
    if args.workspace_dir:
        from common.config_loader import set_workspace_dir

        set_workspace_dir(args.workspace_dir)
    result = poll_report_result(
        args.chat_id,
        poll_interval=args.poll_interval,
        max_wait_seconds=args.max_wait,
        show_progress=True,
    )
    if result.get("error"):
        return 1
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except Exception as exc:  # pragma: no cover - CLI script prints the error directly
        print(f"Polling failed: {exc}", file=sys.stderr)
        raise SystemExit(1)
FILE:scripts/report/upload_reference_file.py
# -*- coding: utf-8 -*-
"""
Upload local files to SmartQ reports and return the resources mapping that can be used to create a chat.
"""
from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent.parent))
from common.utils import UPLOAD_CHAT_TYPE, read_config, upload_files


def parse_args() -> argparse.Namespace:
    """Parse the command-line arguments."""
    parser = argparse.ArgumentParser(description="Upload local files to SmartQ reports")
    parser.add_argument("files", nargs="+", help="Paths of the local files to upload")
    parser.add_argument("--workspace-dir", default=None, help="Path to the user's working directory")
    return parser.parse_args()


def main() -> int:
    """Script entry point."""
    args = parse_args()
    if args.workspace_dir:
        from common.config_loader import set_workspace_dir

        set_workspace_dir(args.workspace_dir)
    config = read_config()
    upload_result = upload_files(args.files, config=config)
    records = []
    for source_file, raw_result in zip(args.files, upload_result["uploadResults"]):
        records.append(
            {
                "sourceFile": str(Path(source_file).expanduser().resolve()),
                "fileId": raw_result.get("fileId"),
                "fileName": raw_result.get("fileName"),
                "fileType": raw_result.get("fileType"),
            }
        )
    output = {
        "chatType": UPLOAD_CHAT_TYPE,
        "fileCount": len(records),
        "files": records,
        "resources": upload_result["resources"],
        "summary": f"Uploaded {len(records)} file(s)",
    }
    print(json.dumps(output, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except Exception as exc:  # pragma: no cover - CLI script prints the error directly
        print(f"File upload failed: {exc}", file=sys.stderr)
        raise SystemExit(1)
FILE:scripts/requirements.txt
requests
pyyaml
matplotlib
numpy
openpyxl
xlrd
PyMuPDF
python-docx
pandas
pytesseract
Pillow
Alibaba Cloud SLS (Simple Log Service) log query & analysis skill. Use this skill to help users write, explain, optimize, execute, or troubleshoot SLS index...
---
name: alibabacloud-sls-query
description: |
Alibaba Cloud SLS (Simple Log Service) log query & analysis skill. Use this skill to help users write, explain, optimize, execute, or troubleshoot SLS index search, SQL analytics, and SPL scan/pipeline statements through the aliyun CLI.
Triggers: "SLS 查询", "SLS 分析", "日志查询", "日志分析", "log query", "analyze sls logs", "aliyun log query".
---
# Alibaba Cloud SLS Query & Analysis
## Scenario Description
Use this skill when the user wants to:
- Explain, rewrite, optimize or execute an existing query
- Translate a natural-language requirement into an SLS **index query**, **SQL**, or **SPL** statement
---
## Prerequisites
### Install Aliyun CLI
Run `aliyun version` to verify that the version is >= `3.3.8`. If the CLI is not installed or is outdated, follow [references/cli-installation-guide.md](references/cli-installation-guide.md) to install or update it.
### Ensure AI Mode Enabled
Before executing any CLI commands, enable AI-Mode, set User-Agent, and update plugins:
```bash
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-sls-query"
aliyun plugin update
```
### Check Alibaba Cloud Credentials
Run `aliyun configure list` to check whether credentials are configured.
If no valid profile is shown, **STOP** here and ask the user to run `aliyun configure` outside of this session.
**Security rules:**
- **NEVER** read, echo, or print AK/SK values
- **NEVER** ask the user to paste AK/SK into the conversation
- **ONLY** use `aliyun configure list` to check credential status
---
## RAM Permission Requirements
| API | CLI | Action | Purpose |
|-----|-----|--------|---------|
| GetLogsV2 | `get-logs-v2` | `log:GetLogStoreLogs` | Run query / SQL / SPL and read results |
| GetIndex | `get-index` | `log:GetIndex` | Read index config to verify prerequisites |
For the minimum and complete RAM policy JSON, see [references/ram-policies.md](references/ram-policies.md).
> **Permission failure handling:** If a call returns an `Unauthorized` permission error, stop and surface [references/ram-policies.md](references/ram-policies.md) to the user. Do **not** retry with a different account without explicit user confirmation.
---
## Core Workflow
1. Read index configuration (GetIndex)
2. Pick query mode
3. Build statement
4. Resolve time range
5. Execute query
6. Extract data from response
7. Present CLI command and results
### Step 1: Read the Index Configuration (Mandatory)
Always call `get-index` first — the index config decides which query modes are available in Step 2.
```bash
aliyun sls get-index \
--project <project> --logstore <logstore>
```
Two sections in the response drive every later decision:
| Section | Meaning |
|---------|---------|
| `line` | **Full-text index** — absence means full-text search is disabled |
| `keys` | **Field indexes** — map of field → `{ type, doc_value, token, caseSensitive, chn, ... }`. `doc_value: true` means statistics are enabled on that field |
If the call returns `IndexConfigNotExist` (HTTP 404), or the response has neither `line` nor `keys` populated, the Logstore has no index at all — stop immediately and tell the user they must create an index before any query / SQL / SPL can run.
- **The response can be large** — extract only the fields relevant to the current query. Cache per `logstore` and reuse within the session.
For field types, tokenization, and how `get-index` maps to capabilities, see [references/related-apis.md](references/related-apis.md) and [references/query-analysis.md](references/query-analysis.md).
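The capability check described above can be sketched in Python. This is a minimal illustration, assuming the `get-index` response has already been parsed into a dict with the `line` and `keys` sections from the table; the function name `summarize_index` and the sample payload are hypothetical and not part of the skill.

```python
from typing import Any, Dict


def summarize_index(index_config: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a parsed get-index response to the facts later steps need."""
    keys = index_config.get("keys") or {}
    full_text = bool(index_config.get("line"))
    return {
        # A populated `line` section means full-text search is enabled.
        "full_text": full_text,
        # Every entry in `keys` is a field index usable in index search.
        "indexed_fields": sorted(keys),
        # Only fields with doc_value: true support SQL analytics.
        "sql_fields": sorted(
            name for name, cfg in keys.items() if cfg.get("doc_value")
        ),
        # Neither section populated: the Logstore has no index at all.
        "has_index": full_text or bool(keys),
    }


# Hypothetical sample shaped like the sections described above.
sample = {
    "line": {"token": [" ", ","]},
    "keys": {
        "status": {"type": "long", "doc_value": True},
        "message": {"type": "text", "doc_value": False},
    },
}
summary = summarize_index(sample)
print(summary)
```

Keeping only this summary per Logstore is one way to follow the advice to cache the index configuration without holding the full response.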
---
### Step 2: Pick the Query Mode (Critical)
The query statement takes one of the following forms:
| Priority | Mode | Statement Form | Use when | Requires |
|----------|------|----------------|----------|----------|
| 1 | **Index search** | `<index-search>` | Filtering raw logs; return time-ordered and paginated logs | Full-text (`line`) or any field index (`keys.<field>`) |
| 2 | **SQL** | `<index-search> \| <SQL>` | Aggregation, `GROUP BY`, sort, window, top-N, projection, and other analytical operations | Target field has `keys.<field>` with `doc_value: true` |
| 3 | **SQL scan** | `<index-search> \| <SQL scan>` | User requested | None |
| 4 | **SPL** | `<index-search> \| <SPL>` | User requested | None |
**Selection rule:**
- Always prefer **Index search**; it is the fastest mode.
- Use **Index search + SQL** when the user needs analytical operations or field projection rather than full raw-log retrieval, such as aggregation, `GROUP BY`, sorting, window analysis, top-N, or returning only the required fields/columns.
- Do **not** proactively choose **SQL scan** or **SPL**; use them only when the user explicitly requests them.
For the full decision guide, see [references/query-analysis.md](references/query-analysis.md).
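The selection rule can be written down as a small function. This is only a sketch of the rule as stated above; the function name `pick_query_mode` and the mode labels (`index_search`, `sql`, `sql_scan`, `spl`) are hypothetical names chosen for illustration.

```python
from typing import Optional


def pick_query_mode(needs_analytics: bool, user_requested: Optional[str] = None) -> str:
    """Apply the selection rule: scan and SPL only on explicit request."""
    # SQL scan and SPL are never chosen proactively.
    if user_requested in ("sql_scan", "spl"):
        return user_requested
    # Aggregation, sorting, top-N, or projection calls for SQL.
    if needs_analytics:
        return "sql"
    # Plain filtering of raw logs stays with index search.
    return "index_search"


print(pick_query_mode(needs_analytics=False))
print(pick_query_mode(needs_analytics=True))
print(pick_query_mode(needs_analytics=True, user_requested="spl"))
```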
---
### Step 3: Write the Statement
#### 3.1 Build the index-search segment first (left of `|`)
Collect every filter that can be expressed in index-search syntax and place it before the first `|`. Use `*` if no filter applies.
```text
* and "payment failed" and status: "500" and not path: "/healthz"
```
- `*` matches all; `"..."` is full-text (needs full-text index).
- `key: "value"` is a field filter (needs field index).
- Combine with `and` / `or` / `not`; group with parentheses.
- `key: *` means field exists. Range (`>`, `>=`, `[a, b]`) works only on `long` / `double`.
If the requirement can be fully answered without aggregation or row-level processing, stop here — this is already a complete index search. For full index-search syntax, see [references/query-analysis.md](references/query-analysis.md).
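The index-search segment can be assembled programmatically. The sketch below is an assumption-laden illustration: `build_index_search` is a hypothetical helper, it always quotes values, and it does not escape double quotes inside values, so it is only safe for simple inputs.

```python
from typing import Dict, Iterable, Optional


def build_index_search(
    phrases: Iterable[str] = (),
    fields: Optional[Dict[str, str]] = None,
    exclude: Optional[Dict[str, str]] = None,
) -> str:
    """Join filters with `and`, starting from `*` (match all)."""
    parts = ["*"]
    # Full-text phrases (require the full-text index).
    parts += [f'"{phrase}"' for phrase in phrases]
    # Field filters (require a field index on each key).
    parts += [f'{key}: "{value}"' for key, value in (fields or {}).items()]
    # Negated field filters.
    parts += [f'not {key}: "{value}"' for key, value in (exclude or {}).items()]
    return " and ".join(parts)


query = build_index_search(
    phrases=["payment failed"],
    fields={"status": "500"},
    exclude={"path": "/healthz"},
)
print(query)
```

With no filters the helper returns `*`, which matches the rule of using `*` when no filter applies.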
#### 3.2 Append SQL — for aggregation / analytics
```sql
status: 500 | SELECT date_trunc('minute', __time__) AS minute,
count(*) AS errors
FROM log
GROUP BY minute
ORDER BY minute
```
- Read [references/query-analysis.md](references/query-analysis.md) for Query & SQL rules
- The table name is `log`; omitting the `FROM log` clause is recommended.
- SQL respects the indexed field type from `get-index` — a `long` / `double` field can be compared directly (`status >= 500`). Cast only when a field is indexed as `text` but numeric semantics are needed (`try_cast` to suppress errors).
- Read [references/functions-guide.md](references/functions-guide.md) when choosing less common functions (aggregate, JSON, regex, datetime, IP geo …)
#### 3.3 Append SPL — for row-level processing / flexible filtering
```spl
status: 500 and service: payment
| where try_cast(latency as BIGINT) > 1000
| extend latency_ms = try_cast(latency as BIGINT)
| project service, latency_ms, message
```
For SPL syntax, pipeline commands, and field-handling rules, read [references/spl-guide.md](references/spl-guide.md).
#### 3.4 Append SQL scan — fallback when the target field has no index / statistics
Syntax follows regular SQL (see 3.2), with one difference: **every field is `varchar`**, so always `cast()` / `try_cast()` before numeric comparison or arithmetic. See [references/query-analysis.md](references/query-analysis.md) for scan semantics.
```sql
* | set session mode=scan; SELECT api, count(1) AS pv FROM log GROUP BY api
```
---
### Step 4: Resolve the Time Range
Generate `--from` / `--to` as **Unix timestamps in seconds** before building the CLI command. `--from` is inclusive and `--to` is exclusive.
Choose one of three input patterns:
1. **Relative time** — user says "recent / last N minutes|hours|days".
2. **Natural-language absolute time without timezone** — normalize to `YYYY-MM-DD HH:MM:SS`, then parse using the machine's local timezone.
3. **Absolute time with explicit timezone** — parse using the customer-provided timezone or UTC offset.
**1. Relative time**
```bash
# recent 15 minutes
FROM=$(($(date +%s) - 900))
TO=$(date +%s)
```
**2. Natural-language absolute time without timezone**
If the user gives a date/time but no timezone, use the machine's local timezone. First normalize natural language such as `2026年3月13日12点` to `2026-03-13 12:00:00`, then parse it as local time.
```bash
# Example: 2026年3月13日12点 -> 2026-03-13 12:00:00
# Linux (GNU date): local timezone
FROM=$(date -d "2026-03-13 12:00:00" +%s)
# macOS (BSD date): local timezone
FROM=$(date -j -f "%Y-%m-%d %H:%M:%S" "2026-03-13 12:00:00" +%s)
```
For a time range such as "2026年3月13日12点到13点", compute both endpoints the same way. For a single point-in-time request, infer a practical window from the user's intent; if unclear, ask for the range before executing.
**3. Absolute time with explicit timezone**
To convert a local date/time to a Unix timestamp: parse the input as UTC with `date -u`, then **subtract** the timezone's UTC offset in seconds.
Formula: `unix_ts = date_utc_parse(input) − (UTC_offset_hours × 3600)`
```bash
# Example: 2025-01-15 10:30:00 Beijing Time (UTC+8)
# Beijing is UTC+8, so subtract 8 × 3600 = 28800
# Linux (GNU date)
FROM=$(( $(date -u -d "2025-01-15 10:30:00" +%s) - 28800 ))
# macOS (BSD date)
FROM=$(( $(date -u -j -f "%Y-%m-%d %H:%M:%S" "2025-01-15 10:30:00" +%s) - 28800 ))
```
```bash
# Example: 2025-01-15 10:30:00 New York Time (UTC-5)
# New York is UTC-5, so subtract -5 × 3600 = -18000, which means adding 18000
# Linux (GNU date)
FROM=$(( $(date -u -d "2025-01-15 10:30:00" +%s) + 18000 ))
# macOS (BSD date)
FROM=$(( $(date -u -j -f "%Y-%m-%d %H:%M:%S" "2025-01-15 10:30:00" +%s) + 18000 ))
```
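Common UTC offsets (value to subtract). These are standard-time offsets; during daylight saving time London is UTC+1 and New York is UTC-4, so adjust the value accordingly: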
| Timezone | UTC offset hours | Seconds to subtract |
|------------------|------------------|---------------------|
| Beijing (UTC+8) | +8 | `28800` |
| Tokyo (UTC+9) | +9 | `32400` |
| London (UTC) | 0 | `0` |
| New York (UTC-5) | -5 | `-18000` |
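
The same conversion can be cross-checked in Python, which avoids the GNU/BSD `date` differences. This is a minimal sketch assuming a fixed UTC offset supplied by the caller; the helper name `to_unix_seconds` is hypothetical, and daylight saving time is not handled because the offset is passed in explicitly.

```python
from datetime import datetime, timedelta, timezone


def to_unix_seconds(local_time: str, utc_offset_hours: float) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' at a fixed UTC offset to Unix seconds."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    parsed = datetime.strptime(local_time, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(parsed.timestamp())


# The two examples from this step.
beijing = to_unix_seconds("2025-01-15 10:30:00", 8)
new_york = to_unix_seconds("2025-01-15 10:30:00", -5)
print(beijing, new_york)
```

The two results differ by 13 hours (46800 seconds), which is the gap between UTC+8 and UTC-5 for the same wall-clock time.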
---
### Step 5: Execute via `get-logs-v2`
Use `aliyun sls get-logs-v2` to execute queries. Run `aliyun help sls get-logs-v2` to see CLI parameter usage; read [references/related-apis.md](references/related-apis.md) for detailed API parameter descriptions.
**Required CLI flags:**
- `--project`: SLS project name
- `--logstore`: Logstore name within the project
- `--from`: Start of time range, **Unix timestamp in seconds** (inclusive)
- `--to`: End of time range, **Unix timestamp in seconds** (exclusive)
- `--query`: Statement built in Step 3
Pagination works differently depending on whether the statement has a `|`:
#### 5.1 Index-search only — paginate with `--offset` / `--line`
```bash
aliyun sls get-logs-v2 \
--project my-project --logstore my-logstore \
--from 1740000000 --to 1740003600 \
--query '* and "payment failed" and status: "500"' \
--line 100 --offset 0 --reverse true
```
- Pagination: `--line` is page size (`1–100`, required); `--offset` is the start row (optional, default `0`).
- Ordering: `--reverse true` returns newest first; default `false` is oldest first.
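For scripted pagination, the command line can be built as an argument list and handed to `subprocess.run`. The sketch below only builds the list and does not invoke the CLI; `build_index_search_argv` is a hypothetical helper, and it uses only the flags listed in this step.

```python
from typing import List


def build_index_search_argv(
    project: str,
    logstore: str,
    start: int,
    end: int,
    query: str,
    line: int = 100,
    offset: int = 0,
    reverse: bool = False,
) -> List[str]:
    """Build the argv for an index-search-only get-logs-v2 call."""
    if not 1 <= line <= 100:
        raise ValueError("--line must be between 1 and 100")
    if start >= end:
        raise ValueError("--from must be earlier than --to")
    argv = [
        "aliyun", "sls", "get-logs-v2",
        "--project", project,
        "--logstore", logstore,
        "--from", str(start),
        "--to", str(end),
        "--query", query,
        "--line", str(line),
        "--offset", str(offset),
    ]
    # Only pass --reverse when newest-first ordering is wanted.
    if reverse:
        argv += ["--reverse", "true"]
    return argv


# Second page (rows 101-200), newest first.
argv = build_index_search_argv(
    "my-project", "my-logstore", 1740000000, 1740003600,
    '* and "payment failed" and status: "500"',
    offset=100, reverse=True,
)
print(argv)
```

Passing a list instead of a shell string sidesteps quoting problems with the double quotes inside the query.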
#### 5.2 With SQL — paginate with `LIMIT` inside the statement
```bash
aliyun sls get-logs-v2 \
--project my-project --logstore my-logstore \
--from 1740000000 --to 1740003600 \
--query 'status: "500" | SELECT request_uri, count(*) AS cnt FROM log GROUP BY request_uri ORDER BY cnt DESC LIMIT 20'
```
- SQL default result cap is **100 rows**. To get more results or paginate:
- `LIMIT count` — raise the cap (e.g., `LIMIT 500` returns up to 500 rows)
- `LIMIT offset, count` — paginate (e.g., `LIMIT 20, 20` for rows 21–40; `LIMIT 40, 20` for rows 41–60). Max offset+count is 1000000.
- **Do not** use `LIMIT count OFFSET offset` syntax — it is **not supported**. Always use `LIMIT offset, count`.
- Ordering: use `ORDER BY <field> DESC/ASC` to sort.
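The `LIMIT offset, count` arithmetic can be sketched as follows. `limit_clause` is a hypothetical helper that takes a zero-based page number; the 1000000 ceiling is the one stated in the list above.

```python
def limit_clause(page: int, page_size: int) -> str:
    """Return `LIMIT offset, count` for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size
    if offset + page_size > 1_000_000:
        raise ValueError("offset + count must not exceed 1000000")
    return f"LIMIT {offset}, {page_size}"


# Pages of 20 rows: rows 1-20, 21-40, 41-60.
for page in range(3):
    print(limit_clause(page, 20))
```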
**Result completeness check:** every response contains `meta.progress`. If it is `Incomplete`, **re-issue the same request** until it returns `Complete`.
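The completeness check can be wrapped in a retry loop. In this sketch the `fetch` callable stands in for whatever issues the `get-logs-v2` request and returns the parsed JSON; the attempt limit and the delay are assumptions added for safety, since the rule above only says to re-issue the request until it is complete.

```python
import time
from typing import Any, Callable, Dict


def fetch_until_complete(
    fetch: Callable[[], Dict[str, Any]],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
) -> Dict[str, Any]:
    """Re-issue the same request until meta.progress is Complete."""
    for attempt in range(max_attempts):
        response = fetch()
        if response.get("meta", {}).get("progress") == "Complete":
            return response
        # Wait before retrying, except after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    raise RuntimeError(f"result still Incomplete after {max_attempts} attempts")


# Hypothetical stand-in for the real request: Incomplete once, then Complete.
fake_responses = iter([
    {"meta": {"progress": "Incomplete", "count": 0}, "data": []},
    {"meta": {"progress": "Complete", "count": 1}, "data": [{"status": "500"}]},
])
result = fetch_until_complete(lambda: next(fake_responses), delay_seconds=0)
print(result["meta"]["progress"])
```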
---
### Step 6: Extract Data from the Response
`get-logs-v2` returns:
```json
{
"meta": { "progress": "Complete", "count": 10, ... },
"data": [ { "field1": "value1", ... }, ... ]
}
```
| Field | Meaning |
|-------|---------|
| `meta.progress` | `Complete` or `Incomplete` (see Step 5) |
| `meta.count` | Number of rows returned |
| `data` | Array of log entries or aggregation rows; may contain `__time__` (Unix seconds, string) |
Use `jq` (preferred) or `--cli-query` (JMESPath) to extract the fields the user needs:
| Extract | `jq` | `--cli-query` (JMESPath) |
|---------|------|--------------------------|
| Data rows | `\| jq '.data'` | `--cli-query 'data'` |
| Progress | `\| jq '.meta.progress'` | `--cli-query 'meta.progress'` |
| Row count | `\| jq '.meta.count'` | `--cli-query 'meta.count'` |
| Specific fields | `\| jq '.data[] \| {LogStore, read_mb}'` | `--cli-query 'data[].{LogStore: LogStore, read_mb: read_mb}'` |
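
When the output is post-processed in a script rather than a shell pipeline, the same projection can be done in Python. This sketch assumes the response shape shown above; `project_rows` and the sample payload are hypothetical.

```python
import json
from typing import Any, Dict, List


def project_rows(raw_response: str, fields: List[str]) -> List[Dict[str, Any]]:
    """Keep only the requested fields from each row in `data`."""
    response = json.loads(raw_response)
    return [
        {name: row.get(name) for name in fields}
        for row in response.get("data", [])
    ]


# Hypothetical response shaped like the structure shown above.
raw = json.dumps({
    "meta": {"progress": "Complete", "count": 2},
    "data": [
        {"LogStore": "app", "read_mb": "12.5", "__time__": "1740000000"},
        {"LogStore": "nginx", "read_mb": "3.0", "__time__": "1740000060"},
    ],
})
rows = project_rows(raw, ["LogStore", "read_mb"])
print(rows)
```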
---
### Step 7: Present the CLI Command and Results
**CLI command** — always show the full, copy-paste-ready `aliyun sls get-logs-v2 ...` command. Redact any AK/SK. If the query was not executed (write / explain scenario), present the command the user should run.
**Results** — when a query was executed, use Step 6 to extract `data` and format according to the user's request (table, list, summary, etc.). Append one sentence explaining the query mode choice.
---
## Cleanup
**Whether operations succeed or fail, you MUST disable AI-Mode before ending the session:**
```bash
aliyun configure ai-mode disable
```
---
## Global Rules
- **Always prefer Index search for fastest raw-log retrieval, and use Index search + SQL for analysis or field projection.**
- **When the user only needs specific fields, use `SELECT` to project them** rather than fetching full raw logs — this reduces network overhead. Requires `doc_value: true` on the target fields (confirmed in Step 1).
- **Do not** hard-code `__time__` filters — pass time range via `--from` / `--to`.
- **Deprecated API**: never call `get-logs`; always use `get-logs-v2`.
---
## Troubleshooting
When the user reports "no data", "wrong result", or a CLI error, walk through the checklist in this exact order:
1. **Time range** — wrong `--from`/`--to`? Milliseconds instead of seconds? Recent writes still indexing?
2. **Index configuration** — field index missing? Full-text index off? Target field not in `keys`?
3. **Field type / statistics** — range query on a `text` field? SQL on a field without `doc_value`?
4. **Syntax** — mixed SQL and SPL? Leading `*` in fuzzy match? SPL string escaping?
5. **Mode choice** — scanning when an index-based query would do? Aggregating in SPL instead of SQL?
6. **Completeness** — `meta.progress = Incomplete`, caller did not retry (see Step 5).
7. **ProjectNotExist** — region or endpoint is wrong. See [references/regions.md](references/regions.md).
8. **Network failure** (timeout, connection refused) — try switching to internal endpoint. See [references/regions.md](references/regions.md).
For the full catalog of failure modes and error codes, see [references/troubleshooting.md](references/troubleshooting.md) and the `Common Errors` table in [references/related-apis.md](references/related-apis.md).
---
## Reference Documents
| Document | Description |
|----------|-------------|
| [references/query-analysis.md](references/query-analysis.md) | Mode decision, index-search / SQL rules, scan semantics |
| [references/spl-guide.md](references/spl-guide.md) | SPL pipeline syntax, common commands, field handling |
| [references/functions-guide.md](references/functions-guide.md) | Function categories, SQL/SPL differences, templates |
| [references/troubleshooting.md](references/troubleshooting.md) | "No data / wrong result / error" playbook |
| [references/related-apis.md](references/related-apis.md) | `GetLogsV2` and `GetIndex` API & CLI reference |
| [references/ram-policies.md](references/ram-policies.md) | Minimum and complete RAM policies |
| [references/cli-installation-guide.md](references/cli-installation-guide.md) | Aliyun CLI install, auth modes, profiles |
| [references/regions.md](references/regions.md) | Region / endpoint configuration, internal endpoint, ProjectNotExist troubleshooting |
| [references/acceptance-criteria.md](references/acceptance-criteria.md) | CLI invocation acceptance tests |
| `references/query_analysis/*.yaml` · `references/spl/*.yaml` · `references/functions/*.yaml` | Source-of-truth YAMLs bundled with this skill |
FILE:references/acceptance-criteria.md
# Acceptance Criteria: sls-query-analysis
**Scenario**: SLS Log Query & Analysis
**Purpose**: Skill testing acceptance criteria
---
## Correct CLI Invocation Patterns
### 1. Command Format — verify product and API name
#### CORRECT
```bash
aliyun sls get-logs-v2 \
--project my-project \
--logstore my-logstore \
--from 1740000000 \
--to 1740003600 \
--query '* and status: "500"' \
--line 100
```
#### INCORRECT — Wrong product name
```bash
aliyun log get-logs-v2 --project my-project --logstore my-logstore
```
**Why**: Product name is `sls`, not `log`, `logservice`, `aliyunlog`, or `aliyun-sls`.
### 2. Parameter Format
#### CORRECT — Kebab-case CLI sub-command and flags
```bash
aliyun sls get-logs-v2 \
--project my-project \
--logstore my-logstore \
--from 1740000000 \
--to 1740003600 \
--query '* | select count(*) as total from log' \
--line 100 \
--offset 0 \
--reverse true
```
#### INCORRECT — PascalCase sub-command or flags
```bash
# Sub-command in PascalCase
aliyun sls GetLogsV2 --project my-project --logstore my-logstore
aliyun sls GetIndex --project my-project --logstore my-logstore
# Flags in PascalCase
aliyun sls get-logs-v2 --Project my-project --Logstore my-logstore --From 1740000000 --To 1740003600
```
**Why**: The SLS plugin uses **kebab-case** for both sub-commands (`get-logs-v2`, `get-index`) and flags (`--project`, `--logstore`, `--from`, `--to`, `--query`).
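The relationship between the two OpenAPI Action names and their CLI sub-commands can be illustrated with a small heuristic. This is not the CLI's actual mapping logic; `action_to_subcommand` is a hypothetical helper that has only been checked against the two names used in this skill.

```python
import re


def action_to_subcommand(action: str) -> str:
    """Heuristic: split before each uppercase letter, lowercase, join with hyphens."""
    words = re.findall(r"[A-Z][a-z0-9]*", action)
    return "-".join(word.lower() for word in words)


for action in ("GetLogsV2", "GetIndex"):
    print(action, "->", action_to_subcommand(action))
```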
#### INCORRECT — Using `--region-id` instead of `--region`
```bash
aliyun sls get-logs-v2 --region-id cn-hangzhou --project p --logstore l --from 1 --to 2
```
**Why**: The CLI global flag is `--region`, not `--region-id`.
#### INCORRECT — JSON `--params` string (old SDK pattern)
```bash
aliyun sls get-logs-v2 --params '{"Project":"my-project","Logstore":"my-logstore","From":"1740000000","To":"1740003600"}'
```
**Why**: The CLI takes individual flags, not a JSON `--params` blob.
### 3. Authentication — never expose credentials
#### CORRECT — Verify credential profile via default credential chain
```bash
aliyun configure list
```
#### INCORRECT — Passing AK/SK directly in the command
```bash
aliyun sls get-logs-v2 \
--access-key-id LTAI5tXXXX \
--access-key-secret 8dXXXX \
--project p --logstore l --from 1740000000 --to 1740003600
```
**Why**: Credentials must come from the configured profile, environment variables, STS, or RAM role — never be typed into the command line.
#### INCORRECT — Reading or printing raw credentials
```bash
aliyun configure get # FORBIDDEN: may expose credential details
cat ~/.aliyun/config.json # FORBIDDEN: may expose credential details
```
#### INCORRECT — Any command that prints environment credentials
```bash
echo $ALIBABA_CLOUD_ACCESS_KEY_ID # FORBIDDEN: example of secret output
printenv | grep -i credential # FORBIDDEN: may reveal secrets
env | grep -i access_key # FORBIDDEN: may reveal secrets
```
### 4. API Names — verify exact sub-command
#### CORRECT
```
get-logs-v2 # OpenAPI Action: GetLogsV2
get-index # OpenAPI Action: GetIndex
```
#### INCORRECT
```
GetLogsV2 # PascalCase is the Action name, not the CLI sub-command
GetIndex # PascalCase is the Action name, not the CLI sub-command
getLogsV2 # Wrong casing
get_logs_v2 # Wrong separator (snake_case)
getlogsv2 # Missing separators
get-logs # Deprecated — use get-logs-v2
get-logs-2 # Wrong suffix (v2, not 2)
describe-index # Wrong verb — SLS uses get-, not describe-
get-log-index # Not a real sub-command — use get-index
```
### 5. Region Parameter
#### CORRECT
```bash
--region cn-hangzhou
--region cn-shanghai
--region ap-southeast-1
--region us-west-1
```
#### INCORRECT
```bash
--region hangzhou # Missing country prefix
--region cn-hangzhou-1 # Not a real region ID
```
**Why**: Only valid Alibaba Cloud region IDs are accepted (e.g., `cn-hangzhou`, `ap-southeast-1`). The project is region-scoped — a region mismatch returns `ProjectNotExist`.
### 6. Time Parameters
#### CORRECT — Unix timestamp in seconds
```bash
--from 1711324800 --to 1711411200
```
#### INCORRECT — Millisecond timestamps
```bash
--from 1711324800000 --to 1711411200000
```
**Why**: `--from` / `--to` are Unix **seconds**, not milliseconds.
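A quick guard against millisecond values can be sketched as below. The threshold is an assumption: any value of 10^12 or more cannot be a plausible timestamp in seconds, so it is treated as milliseconds. The helper name `is_probably_millis` is hypothetical.

```python
def is_probably_millis(timestamp: int) -> bool:
    """Heuristic: values of 10**12 or more are milliseconds, not seconds."""
    return timestamp >= 10**12


for value in (1711324800, 1711324800000):
    unit = "milliseconds" if is_probably_millis(value) else "seconds"
    print(value, "looks like", unit)
```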
#### INCORRECT — Date or ISO strings
```bash
--from "2024-03-25" --to "2024-03-26"
--from "2024-03-25T00:00:00Z" --to "2024-03-26T00:00:00Z"
```
**Why**: Only integer seconds are accepted; date strings must be converted first (e.g., `date -d "2024-03-25 00:00:00 UTC" +%s`).
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.8+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.8 or later for full plugin ecosystem coverage.
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.8)
aliyun version
```
**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
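Choosing between the amd64 and arm64 archives can be automated from `uname`. This is a sketch only: `aliyun_cli_url` is a hypothetical helper, and it maps to just the download URLs already listed in this guide.

```bash
# Hypothetical helper: map an OS and CPU architecture to the download URL
# listed in this guide. Arguments default to the current machine.
aliyun_cli_url() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  local base="https://aliyuncli.alicdn.com"
  case "$os/$arch" in
    Linux/x86_64)              echo "$base/aliyun-cli-linux-latest-amd64.tgz" ;;
    Linux/aarch64|Linux/arm64) echo "$base/aliyun-cli-linux-latest-arm64.tgz" ;;
    Darwin/x86_64)             echo "$base/aliyun-cli-macosx-latest-amd64.tgz" ;;
    *) echo "no download listed in this guide for $os/$arch" >&2; return 1 ;;
  esac
}
```

A typical use would be `wget "$(aliyun_cli_url)"`, followed by the extract and move steps shown above.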
### Windows
**Using Binary**
1. Download from: <https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip>
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open a new Command Prompt or PowerShell window
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
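# Add to PATH for the current session
$env:Path += ";C:\aliyun-cli"
# Persist to the machine PATH (requires admin privileges); read the machine value
# first so user-level PATH entries are not copied into it
$machinePath = [Environment]::GetEnvironmentVariable("Path", [System.EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\aliyun-cli", [System.EnvironmentVariableTarget]::Machine)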
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: <https://ram.console.aliyun.com/>
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI5tXXXXXXXX",
"access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (tokens expire in 1-12 hours).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use an RSA key pair for authentication (generate the key pair in the Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
Environment variables take precedence over the configuration file.
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
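The lookup order listed above can be mirrored in a small diagnostic function. This is only a sketch of the documented order, not the CLI's actual resolution code: `credential_source` is a hypothetical helper, and its output labels are made up for illustration.

```bash
# Hypothetical helper: report which credential source the order above
# would select. $1 is the value passed to --profile, if any.
credential_source() {
  if [ -n "${1:-}" ]; then
    echo "flag:$1"                                   # 1. --profile flag
  elif [ -n "${ALIBABA_CLOUD_PROFILE:-}" ]; then
    echo "env-profile:$ALIBABA_CLOUD_PROFILE"        # 2. profile env var
  elif [ -n "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" ] &&
       [ -n "${ALIBABA_CLOUD_ACCESS_KEY_SECRET:-}" ]; then
    echo "env-credentials"                           # 3. credential env vars
  elif [ -f "${HOME:-}/.aliyun/config.json" ]; then
    echo "config-file"                               # 4. current profile in config
  else
    echo "ecs-ram-role"                              # 5. fall back to instance role
  fi
}
```

Running it before a failing command can help spot an unexpected environment variable that shadows the profile you meant to use.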
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: a JSON object containing the list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "华东 1(杭州)"
},
...
]
},
"RequestId": "..."
}
```
**If it fails**, you'll see one of these error codes:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
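In a script, the error codes above can be turned into actionable hints. The sketch below uses a hypothetical helper name, `explain_auth_error`, and covers only the four codes listed in this guide.

```bash
# Hypothetical helper: map an error code from the list above to a hint.
explain_auth_error() {
  case "$1" in
    InvalidAccessKeyId.NotFound)  echo "Check the Access Key ID" ;;
    SignatureDoesNotMatch)        echo "Check the Access Key Secret" ;;
    InvalidSecurityToken.Expired) echo "Refresh the STS token" ;;
    Forbidden.RAM)                echo "Grant the required RAM permissions" ;;
    *)                            echo "Unrecognized error code: $1" ;;
  esac
}

explain_auth_error Forbidden.RAM   # prints Grant the required RAM permissions
```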
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
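# ~/.aliyun/config.json lives outside your repositories, so it is not committed by default.
# If a copy ends up inside a project, ignore it (.gitignore does not expand "~"):
echo ".aliyun/" >> .gitignore
# Use environment variables in CI/CD instead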
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
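To confirm the permissions took effect, the mode string can be inspected. This is a minimal sketch: `is_private_file` is a hypothetical helper that parses `ls -ld` output so it does not depend on GNU-specific or BSD-specific `stat` flags.

```bash
# Hypothetical helper: succeed when only the owner has any access to $1.
is_private_file() {
  # Characters 5-10 of the mode string (e.g. "-rw-------") are the
  # group and other permission bits; all dashes means owner-only access.
  [ "$(ls -ld "$1" | cut -c5-10)" = "------" ]
}

# Example (commented out because the file may not exist yet):
# is_private_file ~/.aliyun/config.json || echo "config.json is readable by others" >&2
```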
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (3.3.8+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun sls --help
aliyun fc --help
```
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: <https://help.aliyun.com/zh/cli/>
- RAM Console: <https://ram.console.aliyun.com/>
- Access Key Management: <https://ram.console.aliyun.com/manage/ak>
- Plugin Repository: <https://github.com/aliyun/aliyun-cli>
FILE:references/functions/README.md
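# SLS Function Reference
This directory contains all functions supported by SLS SQL and SPL analysis statements, organized by category.
## Directory Structure
| File | Category | Description | Support |
|------|------|------|------|
| `aggregate.yaml` | Aggregate functions | Statistical functions such as count, sum, avg, max, and min | SQL |
| `string.yaml` | String functions | Concatenation, substring extraction, case conversion, search and replace | SQL + SPL |
| `regex.yaml` | Regular expression functions | Regex matching, extraction, and replacement | SQL + SPL |
| `datetime.yaml` | Date and time functions | Time formatting, parsing, truncation, and conversion | SQL + SPL |
| `type_conversion.yaml` | Type conversion functions | Type conversion with cast and try_cast | SQL + SPL |
| `conditional.yaml` | Conditional functions | Conditional logic with if, case, and coalesce | SQL + SPL |
| `json.yaml` | JSON functions | JSON data extraction and parsing | SQL + SPL |
| `math.yaml` | Math functions | Numeric calculation, rounding, exponentiation | SQL + SPL |
| `url.yaml` | URL functions | URL parsing and parameter extraction | SQL + SPL |
| `ip_geo.yaml` | IP geolocation functions | Convert an IP to province, city, country, or latitude/longitude | SPL only |
| `encoding.yaml` | Encoding and decoding functions | URL and Base64 encoding and decoding | SQL + SPL |
| `hash.yaml` | Hash functions | MD5, SHA1, and SHA256 hashing | SQL + SPL |
## Usage
### 1. Finding a Function
- **By feature**: pick the matching category file from the table above
- **By name**: look up the function in the YAML file for its category
- **By scenario**: see the common scenario examples in `overview.yaml`
### 2. Viewing Function Details
Each YAML file contains the following information:
```yaml
functions:
- name: function name
syntax: function syntax
description: what the function does
examples:
sql: SQL example
spl: SPL example
note: caveats (optional)
```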
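### 3. Important Notes
#### Type Conversion
- Fields are VARCHAR by default
- Use `cast()` or `try_cast()` before any numeric comparison or arithmetic
- Pay particular attention to type conversion in SPL
Example:
```sql
-- ✅ Correct
* | SELECT * WHERE cast(status as BIGINT) >= 500
-- ❌ Incorrect
* | SELECT * WHERE status >= 500
```
#### SQL vs SPL
- **SQL**: uses SELECT, WHERE, and GROUP BY syntax
- **SPL**: uses extend, where, and stats syntax
- **Aggregate functions** are mainly used in SQL
- **IP geolocation functions** are available in SPL only
#### Regular Expressions
- SPL uses the RE2 regex engine
- Not supported: backreferences (`\1`), lookaround (`(?<=...)`), and similar constructs
- Backslashes do not need double escaping: write `\d` as `\d`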
## Quick Examples
### Statistical Analysis
```sql
-- Count by status code
* | SELECT status, count(*) AS pv GROUP BY status ORDER BY pv DESC
```
### Time Grouping
```sql
-- Count by hour
* | SELECT date_trunc('hour', __time__) AS hour, count(*) AS pv GROUP BY hour
```
### Regex Extraction
```sql
-- Extract error codes
* | SELECT regexp_extract(message, 'code:(\d+)', 1) AS error_code, count(*) GROUP BY error_code
```
### JSON Parsing
```sql
-- Extract a JSON field
* | SELECT json_extract_scalar(payload, '$.user.name') AS user, count(*) GROUP BY user
```
### IP Geolocation Analysis (SPL)
```spl
# Count by province
* | extend province = ip_to_province(client_ip) | stats pv = count(*) by province
```
## Related Documents
- [Function index](./overview.yaml) - Function category index and usage guide
- [SQL query syntax](../query_analysis/sql.yaml) - Complete SQL query syntax
- [SPL basic syntax](../spl/overview.yaml) - SPL query basics
- [Index search](../query_analysis/indexSearch.yaml) - Keyword search syntax
FILE:references/functions/aggregate.yaml
category: aggregate_functions
name: Aggregate functions
description: Summarize data, typically used together with GROUP BY
support:
sql: true
spl: false
functions:
- name: count
syntax: "count(*) or count(x)"
description: Count the number of log entries
example: "* | SELECT count(*) AS pv"
note: "count(*) counts all rows; count(x) counts rows where x is not NULL"
- name: sum
syntax: "sum(x)"
description: Compute the sum
example: "* | SELECT sum(cast(response_size as BIGINT)) AS total_size"
- name: avg
syntax: "avg(x)"
description: Compute the average
example: "* | SELECT avg(cast(request_time as DOUBLE)) AS avg_time"
- name: max
syntax: "max(x)"
description: Return the maximum value
example: "* | SELECT max(cast(response_time as BIGINT)) AS max_time"
- name: min
syntax: "min(x)"
description: Return the minimum value
example: "* | SELECT min(cast(response_time as BIGINT)) AS min_time"
- name: count_if
syntax: "count_if(condition)"
description: Count the log entries that satisfy the condition
example: "* | SELECT count_if(cast(status as BIGINT) >= 500) AS error_count"
- name: arbitrary
syntax: "arbitrary(x)"
description: Return an arbitrary non-null value
example: "* | SELECT status, arbitrary(request_time) GROUP BY status"
note: Retrieves the value of a non-grouped field when using GROUP BY
FILE:references/functions/approximate.yaml
category: approximate_functions
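name: Approximate functions
description: Approximate calculations that estimate results from the data or fill in missing values
support:
sql: true
spl: false
functions:
- name: approx_distinct
syntax: "approx_distinct(x)"
description: Estimate the number of distinct values using the HyperLogLog algorithm
returns: Approximate count
example: "* | SELECT approx_distinct(client_ip) AS uv"
note: Faster than count(distinct x), but the result is approximate
- name: approx_percentile
syntax: "approx_percentile(x, percentage)"
description: Compute an approximate percentile
params:
- x: column name
- percentage: percentile, in the range 0 to 1
example: "* | SELECT approx_percentile(cast(request_time as double), 0.99) AS p99"
- name: approx_percentile (with array)
syntax: "approx_percentile(x, array[p1, p2,...])"
description: Compute multiple percentiles at once
example: "* | SELECT approx_percentile(cast(request_time as double), array[0.5, 0.95, 0.99]) AS percentiles"
- name: numeric_histogram
syntax: "numeric_histogram(bucket_count, x)"
description: Compute an approximate histogram of column x with the given number of buckets
params:
- bucket_count: number of buckets
- x: numeric column
returns: Map whose keys are representative bucket values and whose values are the approximate counts per bucket
example: "* | SELECT numeric_histogram(10, cast(request_time as double))"
- name: numeric_histogram_u
syntax: "numeric_histogram_u(bucket_count, x)"
description: Compute an approximate histogram of column x with the given number of buckets, returned as multiple rows
example: "* | SELECT numeric_histogram_u(10, cast(request_time as double))"
use_cases:
- Quickly estimate UV (unique visitors)
- Compute P50, P95, and P99 for performance metrics
- Generate histograms of numeric distributions
- Fast statistics over large data volumes
important_notes:
- Approximate functions trade accuracy for performance
- approx_distinct uses the HyperLogLog algorithm, with a standard error of about 2.3%
- approx_percentile error is within 1%
- Suitable for large data volumes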
FILE:references/functions/array.yaml
category: array_functions
name: Array functions and operators
description: Add, remove, modify, query, iterate over, and transform arrays
support:
sql: true
spl: partial
functions:
- name: array_distinct
syntax: "array_distinct(x)"
description: Remove duplicate elements from an array
examples:
sql: "* | SELECT array_distinct(cast(json_parse(number) as array(bigint)))"
spl: "* | extend unique_arr = array_distinct(arr_field)"
- name: array_intersect
syntax: "array_intersect(x, y)"
description: Compute the intersection of two arrays
examples:
sql: "* | SELECT array_intersect(array[1,2,3,4,5], array[1,3,5,7])"
spl: "* | extend intersection = array_intersect(arr1, arr2)"
- name: array_union
syntax: "array_union(x, y)"
description: Compute the union of two arrays
examples:
sql: "* | SELECT array_union(array[1,2,3,4,5], array[1,3,5,7])"
note: SQL only
- name: array_except
syntax: "array_except(x, y)"
description: Compute the difference of two arrays
examples:
sql: "* | SELECT array_except(array[1,2,3,4,5], array[1,3,5,7])"
spl: "* | extend diff = array_except(arr1, arr2)"
- name: array_join
syntax: "array_join(x, delimiter [, null_replacement])"
description: Join array elements into a string using the given delimiter
params:
- x: array
- delimiter: delimiter
- null_replacement: optional, string used to replace null elements
examples:
sql: "* | SELECT array_join(array[null,'Log','Service'], ' ', 'Alicloud')"
spl: "* | extend joined = array_join(arr_field, ',')"
note: The result is limited to 1 KB; anything beyond that is truncated
- name: array_max
syntax: "array_max(x)"
description: Return the maximum value in an array
examples:
sql: "* | SELECT array_max(try_cast(json_parse(number) as array(bigint))) AS max_number"
- name: array_min
syntax: "array_min(x)"
description: Return the minimum value in an array
examples:
sql: "* | SELECT array_min(try_cast(json_parse(number) as array(bigint))) AS min_number"
- name: array_position
syntax: "array_position(x, element)"
description: Return the index of the given element (starting from 1), or 0 if it is not present
examples:
sql: "* | SELECT array_position(array[49,45,47], 45)"
- name: array_remove
syntax: "array_remove(x, element)"
description: Remove the given element from an array
examples:
sql: "* | SELECT array_remove(array[49,45,47], 45)"
- name: array_sort
syntax: "array_sort(x)"
description: Sort array elements in ascending order, with null elements last
examples:
sql: "* | SELECT array_sort(array['b','d',null,'c','a'])"
- name: cardinality
syntax: "cardinality(x)"
description: Count the number of elements in an array
examples:
sql: "* | SELECT cardinality(cast(json_parse(number) as array(bigint)))"
- name: contains
syntax: "contains(x, element)"
description: Check whether an array contains the given element
returns: boolean
examples:
sql: "* | SELECT contains(cast(json_parse(region) as array(varchar)), 'cn-beijing')"
- name: reverse
syntax: "reverse(x)"
description: Reverse the order of the elements in an array
examples:
sql: "* | SELECT reverse(array[1,2,3,4,5])"
spl: "* | extend reversed = reverse(arr_field)"
- name: slice
syntax: "slice(x, start, length)"
description: Return a subset of an array
params:
- start: starting index (negative counts from the end, positive from the beginning)
- length: number of elements in the subset
examples:
sql: "* | SELECT slice(array[1,2,4,5,6,7,7], 3, 2)"
- name: filter
syntax: "filter(x, lambda_expression)"
description: Filter array elements with a lambda expression
examples:
sql: "* | SELECT filter(array[5,-6,null,7], x -> x > 0)"
spl: "* | extend filtered = filter(arr_field, x -> x > 0)"
- name: transform
syntax: "transform(x, lambda_expression)"
description: Apply a lambda expression to each element of an array
examples:
sql: "* | SELECT transform(array[5,6], x -> x + 1)"
spl: "* | extend transformed = transform(arr_field, x -> x * 2)"
- name: reduce
syntax: "reduce(x, lambda_expression)"
description: Accumulate the array elements according to a lambda expression
examples:
sql: "* | SELECT reduce(array[5,20,50], 0, (s, x) -> s + x, s -> s)"
- name: sequence
syntax: "sequence(x, y [, step])"
description: Return an array of consecutive increasing values within the given range
params:
- x: start value
- y: end value
- step: optional, increment (default 1)
examples:
sql: "* | SELECT sequence(0, 10, 2)"
spl: "* | extend seq = sequence(1, 100)"
- name: zip
syntax: "zip(x, y...)"
description: Merge multiple arrays into a two-dimensional array
examples:
sql: "* | SELECT zip(array[1,2,3], array['1b',null,'3b'], array[1,2,3])"
important_notes:
- Array indexes start at 1
- array_join results are limited to 1 KB
- Lambda expressions enable complex array processing logic
- Combine cast and json_parse to handle array fields stored as JSON
FILE:references/functions/binary.yaml
category: binary_functions
name: Binary functions
description: Process binary data, including encoding and decoding
support:
sql: true
spl: partial
functions:
- name: from_base64
syntax: "from_base64(x)"
description: Decode a Base64-encoded string
returns: varbinary
example: "* | SELECT from_base64('aGVsbG8=')"
- name: to_base64
syntax: "to_base64(x)"
description: Encode binary data as a Base64 string
returns: varchar
example: "* | SELECT to_base64(cast('hello' as varbinary))"
- name: from_hex
syntax: "from_hex(x)"
description: Convert a hexadecimal string to binary
example: "* | SELECT from_hex('68656C6C6F')"
- name: to_hex
syntax: "to_hex(x)"
description: Convert binary data to a hexadecimal string
example: "* | SELECT to_hex(cast('hello' as varbinary))"
- name: from_big_endian_64
syntax: "from_big_endian_64(x)"
description: Convert 8-byte big-endian binary to bigint
example: "* | SELECT from_big_endian_64(from_hex('0000000000000001'))"
- name: to_big_endian_64
syntax: "to_big_endian_64(x)"
description: Convert bigint to 8-byte big-endian binary
example: "* | SELECT to_big_endian_64(1)"
- name: md5
syntax: "md5(x)"
description: Compute the MD5 hash, returned as binary
example: "* | SELECT to_hex(md5(cast('hello' as varbinary)))"
- name: sha1
syntax: "sha1(x)"
description: Compute the SHA1 hash, returned as binary
example: "* | SELECT to_hex(sha1(cast('hello' as varbinary)))"
- name: sha256
syntax: "sha256(x)"
description: Compute the SHA256 hash, returned as binary
example: "* | SELECT to_hex(sha256(cast('hello' as varbinary)))"
- name: sha512
syntax: "sha512(x)"
description: Compute the SHA512 hash, returned as binary
example: "* | SELECT to_hex(sha512(cast('hello' as varbinary)))"
use_cases:
- Base64 encoding and decoding
- Hash computation
- Binary data processing
- Data verification
important_notes:
- Use to_hex to turn binary results into readable hexadecimal
- Hash functions return binary, which usually needs to_hex conversion
- Pay attention to cast type conversion
FILE:references/functions/bitwise.yaml
category: bitwise_functions
name: Bitwise functions
description: Functions that operate directly on binary bits
support:
sql: true
spl: false
functions:
- name: bit_count
syntax: "bit_count(x, bits)"
description: Count the number of 1 bits in the binary representation
params:
- x: a bigint value
- bits: bit width (32 or 64)
example: "* | SELECT bit_count(5, 64)"
- name: bitwise_and
syntax: "bitwise_and(x, y)"
description: Bitwise AND
example: "* | SELECT bitwise_and(5, 3)"
- name: bitwise_or
syntax: "bitwise_or(x, y)"
description: Bitwise OR
example: "* | SELECT bitwise_or(5, 3)"
- name: bitwise_xor
syntax: "bitwise_xor(x, y)"
description: Bitwise XOR
example: "* | SELECT bitwise_xor(5, 3)"
- name: bitwise_not
syntax: "bitwise_not(x)"
description: Bitwise NOT
example: "* | SELECT bitwise_not(5)"
- name: bitwise_left_shift
syntax: "bitwise_left_shift(x, n)"
description: Shift left by n bits
example: "* | SELECT bitwise_left_shift(5, 2)"
- name: bitwise_right_shift
syntax: "bitwise_right_shift(x, n)"
description: Shift right by n bits
example: "* | SELECT bitwise_right_shift(5, 1)"
- name: bitwise_right_shift_arithmetic
syntax: "bitwise_right_shift_arithmetic(x, n)"
description: Arithmetic right shift by n bits (preserves the sign bit)
example: "* | SELECT bitwise_right_shift_arithmetic(-8, 2)"
use_cases:
- Permission bitmask operations
- Flag checks
- Bitmap operations
- Low-level data processing
important_notes:
- All bitwise function arguments must be bigint
- Cast values to bigint first
- Bitwise results are also bigint
FILE:references/functions/color.yaml
category: color_functions
name: Color functions
description: Color representation and conversion, used for visualization
support:
sql: true
spl: false
functions:
- name: bar
syntax: "bar(x, width [, low, high])"
description: Generate an ASCII bar chart
params:
- x: numeric value
- width: bar width
- low: minimum value (optional)
- high: maximum value (optional)
example: "* | SELECT request_time, bar(cast(request_time as double), 20) as bar"
- name: color
syntax: "color(string [, color])"
description: Add a color marker (ANSI color code) to a string
params:
- string: the string to color
- color: color name (optional)
example: "* | SELECT color('ERROR', 'red')"
- name: render
syntax: "render(x, color)"
description: Render a boolean value with the given color
example: "* | SELECT render(cast(status as bigint) >= 400, 'red')"
- name: rgb
syntax: "rgb(red, green, blue)"
description: Create a color from RGB values
params:
- red: red component (0-255)
- green: green component (0-255)
- blue: blue component (0-255)
example: "* | SELECT rgb(255, 0, 0)"
use_cases:
- Prettier console output
- Coloring by log level
- Visual markers
- ASCII charts
important_notes:
- Mainly intended for console output
- Supports standard ANSI colors
- Colors may not display in the web interface
FILE:references/functions/comparison.yaml
category: comparison_functions
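name: Year-over-year and period-over-period comparison functions
description: Compute relative change in time-series data for year-over-year and period-over-period analysis
support:
sql: true
spl: false
functions:
- name: compare
syntax: "compare(x, n)"
description: Compare the result for the current time period with the result for the period n seconds earlier
params:
- x: a computed result of type double or long
- n: time window in seconds, for example 3600 (1 hour), 86400 (1 day), 604800 (1 week)
returns: Array in the form [current result, result from n seconds earlier, ratio]
examples:
sql: "* | SELECT compare(PV, 86400) FROM (SELECT count(*) AS PV FROM log)"
note: The compared periods must match. Comparing the current hour with the same hour yesterday is supported; comparing the current hour with the previous hour is not
- name: compare (multiple)
syntax: "compare(x, n1, n2, n3...)"
description: Compare the result for the current time period with the results for multiple earlier periods
params:
- x: a computed result of type double or long
- n1, n2, n3: multiple time windows, in seconds
returns: Array containing the current result plus the result and ratio for each earlier point in time
examples:
sql: "* | SELECT status, request_method, compare(PV, 3600) FROM (SELECT status, request_method, count(*) AS PV FROM log GROUP BY status, request_method) GROUP BY status, request_method"
- name: ts_compare
syntax: "ts_compare(x, n)"
description: Compare the result for the current time period with the result for the period n seconds earlier; the query must group by the time column
params:
- x: a computed result of type double or long
- n: time window in seconds
returns: Array in the form [current result, result from n seconds earlier, ratio, Unix timestamp of n seconds earlier]
examples:
sql: "* | SELECT time, ts_compare(PV, 86400) as diff FROM (SELECT count(*) as PV, date_trunc('hour', __time__) AS time FROM log GROUP BY time) GROUP BY time ORDER BY time"
note: Must be grouped by the time column (GROUP BY); nesting is not supported
important_notes:
- compare requires the compared time periods to match
- ts_compare must be used with GROUP BY on the time column
- compare and ts_compare cannot be nested
- The result is an array; use subscripts [1], [2], [3] to get individual values
use_cases:
- Compare today's traffic with the same period yesterday
- Compute week-over-week growth
- Analyze traffic trends by time period
- Monitor period-over-period changes in key metrics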
FILE:references/functions/conditional.yaml
category: conditional_functions
name: Conditional functions
description: Conditional logic and branching
support:
sql: true
spl: true
functions:
- name: if
syntax: "if(condition, true_value, false_value)"
description: Conditional expression
examples:
sql: "* | SELECT if(cast(status as BIGINT) >= 500, 'error', 'normal') AS level"
spl: "* | extend level = if(cast(status as BIGINT) >= 500, 'error', 'normal')"
- name: case
syntax: "CASE WHEN condition THEN result [...] ELSE default END"
description: Multi-branch conditional
examples:
sql: |
* | SELECT CASE
WHEN cast(status as BIGINT) >= 500 THEN 'error'
WHEN cast(status as BIGINT) >= 400 THEN 'client_error'
ELSE 'success'
END AS status_type
spl: |
* | extend status_type = CASE
WHEN cast(status as BIGINT) >= 500 THEN 'error'
WHEN cast(status as BIGINT) >= 400 THEN 'client_error'
ELSE 'success'
END
- name: coalesce
syntax: "coalesce(value1, value2, ..., default)"
description: Return the first non-NULL value
examples:
sql: "* | SELECT coalesce(user_name, user_id, 'anonymous') AS user"
spl: "* | extend user = coalesce(user_name, user_id, 'anonymous')"
- name: nullif
syntax: "nullif(value1, value2)"
description: Return NULL if the two values are equal, otherwise return the first value
examples:
sql: "* | SELECT nullif(status, '')"
spl: "* | extend valid_status = nullif(status, '')"
FILE:references/functions/conversion.yaml
category: conversion_functions
name: Unit conversion functions
description: Convert units of data size or time intervals
support:
sql: true
spl: false
functions:
- name: convert_data_size
syntax: "convert_data_size(size, unit)"
description: Convert a data size to the given unit
params:
- size: data size (bytes)
- unit: target unit (B, KB, MB, GB, TB, PB)
returns: Converted data size (rounded to two decimal places)
example: "* | SELECT convert_data_size(cast(body_bytes_sent as double), 'MB') AS size_mb"
- name: format_duration
syntax: "format_duration(duration)"
description: Convert a number of milliseconds to a readable time format
params:
- duration: length of time (milliseconds)
returns: Formatted time string (for example 1d 2h 3m 4s)
example: "* | SELECT format_duration(cast(request_time as bigint) * 1000) AS duration"
use_cases:
- Human-readable data sizes
- Formatting time intervals
- Displaying performance metrics
- Report generation
important_notes:
- convert_data_size takes input in bytes
- format_duration takes input in milliseconds
- The return value is a string
- Suited to displaying results, not to further calculation
examples:
- desc: Convert response size from bytes to MB
sql: "* | SELECT convert_data_size(cast(body_bytes_sent as double), 'MB')"
- desc: Format the request time
sql: "* | SELECT format_duration(cast(request_time as bigint))"
FILE:references/functions/datetime.yaml
category: datetime_functions
name: Date and time functions
description: Work with dates and times
support:
sql: true
spl: true
functions:
- name: date_format
syntax: "date_format(timestamp, format)"
description: Format a timestamp
format_examples:
- "%Y-%m-%d": "2024-01-01"
- "%Y-%m-%d %H:%i:%s": "2024-01-01 12:30:45"
- "%H:%i": "12:30"
examples:
sql: "* | SELECT date_format(__time__, '%Y-%m-%d %H:%i:%s') AS time"
spl: "* | extend time_str = date_format(__time__, '%Y-%m-%d %H:%i:%s')"
note: __time__ is the log timestamp (Unix seconds)
- name: date_parse
syntax: "date_parse(string, format)"
description: Parse a time string into a timestamp
examples:
sql: "* | SELECT date_parse(time_str, '%Y-%m-%d %H:%i:%s')"
spl: "* | extend timestamp = date_parse(time_str, '%Y-%m-%d %H:%i:%s')"
- name: from_unixtime
syntax: "from_unixtime(unix_timestamp)"
description: Convert a Unix timestamp to a time object
examples:
sql: "* | SELECT from_unixtime(__time__)"
spl: "* | extend time_obj = from_unixtime(__time__)"
- name: to_unixtime
syntax: "to_unixtime(timestamp)"
description: Convert a time object to a Unix timestamp
examples:
sql: "* | SELECT to_unixtime(cast(time_str as TIMESTAMP))"
spl: "* | extend unix_ts = to_unixtime(cast(time_str as TIMESTAMP))"
- name: date_trunc
syntax: "date_trunc(unit, timestamp)"
description: Truncate a time to the given granularity
units: [second, minute, hour, day, week, month, quarter, year]
examples:
sql: "* | SELECT date_trunc('hour', __time__) AS hour"
spl: "* | extend hour = date_trunc('hour', __time__)"
note: Commonly used for grouping statistics by time
- name: current_timestamp
syntax: "current_timestamp"
description: Get the current timestamp
examples:
sql: "* | SELECT current_timestamp"
spl: "* | extend now = current_timestamp"
FILE:references/functions/encoding.yaml
category: encoding_functions
name: Encoding and decoding functions
description: Data encoding and decoding
support:
sql: true
spl: true
functions:
- name: url_encode
syntax: "url_encode(str)"
description: URL encoding
examples:
sql: "* | SELECT url_encode(query)"
spl: "* | extend encoded_query = url_encode(query)"
- name: url_decode
syntax: "url_decode(str)"
description: URL decoding
examples:
sql: "* | SELECT url_decode(encoded_param)"
spl: "* | extend decoded_param = url_decode(encoded_param)"
- name: base64_encode
syntax: "base64_encode(str)"
description: Base64 encoding
examples:
sql: "* | SELECT base64_encode(data)"
spl: "* | extend encoded_data = base64_encode(data)"
- name: base64_decode
syntax: "base64_decode(str)"
description: Base64 decoding
examples:
sql: "* | SELECT base64_decode(encoded_data)"
spl: "* | extend decoded_data = base64_decode(encoded_data)"
FILE:references/functions/geo.yaml
category: geo_functions
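name: Geo functions
description: Geolocation analysis and map calculations
support:
sql: true
spl: false
functions:
- name: geohash
syntax: "geohash(latitude, longitude)"
description: Encode latitude and longitude as a geohash string
returns: varchar
example: "* | SELECT geohash(39.9075, 116.3972)"
- name: geohash (with precision)
syntax: "geohash(latitude, longitude, precision)"
description: Encode latitude and longitude as a geohash string with the given precision
params:
- precision: precision, from 1 to 12, default 12
example: "* | SELECT geohash(39.9075, 116.3972, 8)"
- name: geohash_decode
syntax: "geohash_decode(geohash_string)"
description: Decode a geohash string into latitude and longitude
returns: Row containing latitude and longitude
example: "* | SELECT geohash_decode('wx4g0s')"
use_cases:
- Geolocation encoding
- Nearby location queries
- Geographic clustering
- Heatmap display
important_notes:
- The higher the geohash precision, the smaller the area it represents
- Common precisions are 6 characters (about ±0.61 km) and 8 characters (about ±19 m)
- geohash has a prefix property, so a shared prefix indicates nearby locations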
FILE:references/functions/geospatial.yaml
category: geospatial_functions
name: Spatial geometry functions
description: Work with spatial geometries and geolocation data
support:
sql: true
spl: false
functions:
- name: ST_Point
syntax: "ST_Point(longitude, latitude)"
description: Create a point from longitude and latitude
example: "* | SELECT ST_Point(120.13, 30.26)"
- name: ST_GeometryFromText
syntax: "ST_GeometryFromText(wkt_string)"
description: Create a geometry object from a WKT (Well-Known Text) string
example: "* | SELECT ST_GeometryFromText('POINT(120.13 30.26)')"
- name: ST_AsText
syntax: "ST_AsText(geometry)"
description: Convert a geometry object to a WKT string
example: "* | SELECT ST_AsText(ST_Point(120.13, 30.26))"
- name: ST_Contains
syntax: "ST_Contains(geometry1, geometry2)"
description: Check whether geometry1 completely contains geometry2
returns: boolean
example: "* | SELECT ST_Contains(ST_GeometryFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'), ST_Point(5, 5))"
- name: ST_Distance
syntax: "ST_Distance(geometry1, geometry2)"
description: Compute the distance between two geometry objects
example: "* | SELECT ST_Distance(ST_Point(120.13, 30.26), ST_Point(121.47, 31.23))"
- name: ST_Intersects
syntax: "ST_Intersects(geometry1, geometry2)"
description: Check whether two geometry objects intersect
returns: boolean
example: "* | SELECT ST_Intersects(ST_GeometryFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'), ST_Point(5, 5))"
- name: ST_Boundary
syntax: "ST_Boundary(geometry)"
description: Return the boundary of a geometry object
example: "* | SELECT ST_Boundary(ST_GeometryFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'))"
- name: ST_Buffer
syntax: "ST_Buffer(geometry, distance)"
description: Return the region within the given distance of a geometry object
example: "* | SELECT ST_Buffer(ST_Point(0, 0), 1.0)"
- name: ST_Area
syntax: "ST_Area(geometry)"
description: Compute the area of a geometry object
example: "* | SELECT ST_Area(ST_GeometryFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'))"
use_cases:
- Geofence checks
- Distance calculation
- Area analysis
- Spatial queries
important_notes:
- Supports geometry objects in WKT format
- The coordinate system is WGS84
- Distance units match the coordinate units
FILE:references/functions/hash.yaml
category: hash_functions
name: Hash functions
description: Compute hash values
support:
sql: true
spl: true
functions:
- name: md5
syntax: "md5(str)"
description: Compute the MD5 hash
returns: 32-character hexadecimal string
examples:
sql: "* | SELECT md5(user_id)"
spl: "* | extend user_id_hash = md5(user_id)"
- name: sha1
syntax: "sha1(str)"
description: Compute the SHA-1 hash
examples:
sql: "* | SELECT sha1(token)"
spl: "* | extend token_hash = sha1(token)"
- name: sha256
syntax: "sha256(str)"
description: Compute the SHA-256 hash
examples:
sql: "* | SELECT sha256(token)"
spl: "* | extend token_hash = sha256(token)"
FILE:references/functions/hyperloglog.yaml
category: hyperloglog_functions
name: HyperLogLog functions
description: Approximate statistics over large datasets, trading accuracy for lower memory use
support:
sql: true
spl: false
functions:
- name: approx_set
syntax: "approx_set(x)"
description: Convert the values of x into a HyperLogLog object
returns: HyperLogLog
example: "* | SELECT approx_set(client_ip)"
- name: cardinality
syntax: "cardinality(hll)"
description: Estimate the number of distinct values in a HyperLogLog object
returns: bigint
example: "* | SELECT cardinality(approx_set(client_ip)) AS uv"
- name: empty_approx_set
syntax: "empty_approx_set()"
description: Create an empty HyperLogLog object
example: "* | SELECT empty_approx_set()"
- name: merge
syntax: "merge(hll)"
description: Merge multiple HyperLogLog objects
returns: HyperLogLog
example: "* | SELECT cardinality(merge(hll)) AS uv FROM (SELECT status, approx_set(client_ip) AS hll FROM log GROUP BY status)"
use_cases:
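- UV (unique visitor) counting
- Deduplicated counting
- Multi-dimensional UV analysis
- Real-time deduplication on large datasets
important_notes:
- HyperLogLog is a probabilistic data structure
- The standard error is about 2.3%
- The memory footprint is fixed at about 12KB
- Well suited to distinct counting on large data volumes
- More efficient than count(distinct)
related:
- The approx_distinct function is a convenience wrapper around HyperLogLog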
FILE:references/functions/ip_geo.yaml
category: ip_geo_functions
name: IP geolocation functions
description: Convert IP addresses to geographic locations
support:
sql: false
spl: true
note: These functions are available only in SPL mode
functions:
- name: ip_to_province
syntax: "ip_to_province(ip)"
description: Convert an IP address to a province
returns: province name (in Chinese)
example: "* | extend province = ip_to_province(client_ip)"
- name: ip_to_city
syntax: "ip_to_city(ip)"
description: Convert an IP address to a city
returns: city name (in Chinese)
example: "* | extend city = ip_to_city(client_ip)"
- name: ip_to_country
syntax: "ip_to_country(ip)"
description: Convert an IP address to a country
returns: country name (in Chinese)
example: "* | extend country = ip_to_country(client_ip)"
- name: ip_to_geo
syntax: "ip_to_geo(ip)"
description: Convert an IP address to a latitude and longitude
returns: latitude/longitude string in the form latitude,longitude
example: "* | extend geo = ip_to_geo(client_ip)"
FILE:references/functions/json.yaml
category: json_functions
name: JSON functions
description: Process JSON data
support:
sql: true
spl: true
functions:
- name: json_extract
syntax: "json_extract(json_string, json_path)"
description: Extract a JSON value (returns JSON)
path_syntax:
- "$": root node
- "$.key": object property
- "$[index]": array index
- "$[*]": all array elements
examples:
sql: "* | SELECT json_extract(payload, '$.user.id')"
spl: "* | extend user_id = json_extract(payload, '$.user.id')"
- name: json_extract_scalar
syntax: "json_extract_scalar(json_string, json_path)"
description: Extract a JSON scalar value (returns a string)
examples:
sql: "* | SELECT json_extract_scalar(payload, '$.user.name') AS user_name"
spl: "* | extend user_name = json_extract_scalar(payload, '$.user.name')"
note: The extracted value may need a cast to another type
FILE:references/functions/lambda.yaml
category: lambda_expressions
name: Lambda expressions
description: Define a lambda expression and pass it to a function to extend what the function can express
support:
sql: true
spl: partial
functions:
- name: filter
syntax: "filter(array, lambda_expression)"
description: Filter array elements, keeping only those that satisfy the condition
lambda_syntax: "x -> condition"
example: "* | SELECT filter(array[5, -6, null, 7], x -> x > 0)"
spl_example: "* | extend filtered = filter(arr, x -> x > 0)"
- name: transform
syntax: "transform(array, lambda_expression)"
description: Apply a transformation to every array element
lambda_syntax: "x -> expression"
example: "* | SELECT transform(array[5, 6], x -> x + 1)"
spl_example: "* | extend transformed = transform(arr, x -> x * 2)"
- name: reduce
syntax: "reduce(array, initial_value, combine_function, final_function)"
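description: Fold the array elements into a single value, starting from initial_value and applying combine_function to each element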
lambda_syntax: "(accumulator, element) -> expression"
example: "* | SELECT reduce(array[5, 20, 50], 0, (s, x) -> s + x, s -> s)"
- name: any_match
syntax: "any_match(array, lambda_expression)"
description: Test whether any element of the array satisfies the condition
returns: boolean
example: "* | SELECT any_match(array[1, 2, 3], x -> x > 2)"
- name: all_match
syntax: "all_match(array, lambda_expression)"
description: Test whether all elements of the array satisfy the condition
returns: boolean
example: "* | SELECT all_match(array[1, 2, 3], x -> x > 0)"
- name: map_filter
syntax: "map_filter(map, lambda_expression)"
description: Filter the key-value pairs of a Map
lambda_syntax: "(key, value) -> condition"
example: "* | SELECT map_filter(map(array[1,2,3], array['a','b','c']), (k, v) -> k > 1)"
- name: zip_with
syntax: "zip_with(array1, array2, lambda_expression)"
description: Combine the elements of two arrays pairwise and compute a result for each pair
lambda_syntax: "(x, y) -> expression"
example: "* | SELECT zip_with(array[1, 2], array[3, 4], (x, y) -> x + y)"
lambda_syntax:
single_parameter:
format: "x -> expression"
example: "x -> x * 2"
description: Single-parameter lambda expression
multiple_parameters:
format: "(x, y, ...) -> expression"
example: "(x, y) -> x + y"
description: Multi-parameter lambda expression
complex_expression:
format: "x -> complex_expression"
example: "x -> if(x > 0, x, 0)"
description: Complex conditional expression
use_cases:
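- Filtering array elements
- Transforming array elements
- Cumulative computation
- Complex conditional tests
- Map data processing
important_notes:
- A lambda expression is an anonymous function
- Parameter names are arbitrary (for example x, y, element)
- The expression may contain conditions, arithmetic, and so on
- Used together with array and Map functions
- SPL supports lambda expressions only partially
examples:
- desc: Keep positive numbers
sql: "* | SELECT filter(array[5, -6, 7], x -> x > 0)"
- desc: Double every element
sql: "* | SELECT transform(array[1, 2, 3], x -> x * 2)"
- desc: Sum an array
sql: "* | SELECT reduce(array[1, 2, 3], 0, (s, x) -> s + x, s -> s)"
- desc: Filter a Map
sql: "* | SELECT map_filter(my_map, (k, v) -> v is not null)"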
FILE:references/functions/map.yaml
category: map_functions
name: Map functions and operators
description: Create, query, merge, and otherwise operate on key-value data structures
support:
sql: true
spl: partial
functions:
- name: map
syntax: "map() or map(x, y)"
description: Return an empty Map, or build a Map from two arrays
params:
- x: array of keys (optional)
- y: array of values (optional)
examples:
sql: "* | SELECT map()"
sql_with_arrays: "* | SELECT map(try_cast(json_parse(class) AS array(varchar)), try_cast(json_parse(number) AS array(bigint)))"
spl: "* | extend mapped = map(keys_arr, values_arr)"
- name: element_at
syntax: "element_at(x, key)"
description: Get the value for the given key in a Map
examples:
sql: "* | SELECT element_at(histogram(request_method), 'DELETE') AS count"
spl: "* | extend value = element_at(map_field, 'key1')"
- name: cardinality
syntax: "cardinality(x)"
description: Compute the size of a Map (number of key-value pairs)
examples:
sql: "* | SELECT cardinality(histogram(request_method)) AS kinds"
- name: map_keys
syntax: "map_keys(x)"
description: Extract all keys of a Map and return them as an array
examples:
sql: "* | SELECT map_keys(try_cast(json_parse(etl_context) AS map(varchar, varchar)))"
spl: "* | extend keys = map_keys(map_field)"
- name: map_values
syntax: "map_values(x)"
description: Extract all values of a Map and return them as an array
examples:
sql: "* | SELECT map_values(try_cast(json_parse(etl_context) AS map(varchar, varchar)))"
spl: "* | extend values = map_values(map_field)"
- name: map_concat
syntax: "map_concat(x, y...)"
description: Merge multiple Maps into one Map
examples:
sql: "* | SELECT map_concat(cast(json_parse(etl_context) AS map(varchar, varchar)), cast(json_parse(progress) AS map(varchar, varchar)))"
spl: "* | extend merged = map_concat(map1, map2)"
- name: map_filter
syntax: "map_filter(x, lambda_expression)"
description: Filter Map entries with a lambda expression
examples:
sql: "* | SELECT map_filter(map(array[10, 20, 30], array['a', NULL, 'c']), (k, v) -> v is not null)"
spl: "* | extend filtered = map_filter(map_field, (k, v) -> v > 100)"
- name: histogram
syntax: "histogram(x)"
description: Group the data and return a Map in JSON format
examples:
sql: "* | SELECT histogram(request_method) AS request_method"
note: Similar to GROUP BY, but returns a Map
- name: histogram_u
syntax: "histogram_u(x)"
description: Group the data and return the result as multiple rows and columns
examples:
sql: "* | SELECT histogram_u(request_method) as request_method"
note: Well suited to bar charts
- name: map_agg
syntax: "map_agg(x, y)"
description: Build a Map with x as the key and y as the value. When a key has several y values, one is chosen at random
examples:
sql: "* | SELECT map_agg(request_method, request_time)"
- name: multimap_agg
syntax: "multimap_agg(x, y)"
description: Build a Map with x as the key and an array of y values as the value. When a key has several y values, all are kept
examples:
sql: "* | SELECT multimap_agg(request_method, request_time)"
use_cases:
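- Analyze the distribution of request methods
- Count occurrences of status codes
- Merge several configuration Maps
- Extract a Map structure from a JSON field
- Filter a Map by condition
important_notes:
- The subscript operator [key] retrieves a value directly
- The histogram function suits quick grouped counts
- Combine cast and json_parse to handle Map fields stored as JSON
- All keys of a Map must share one type, and all values must share one type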
FILE:references/functions/math.yaml
category: math_functions
name: Math functions
description: Mathematical operations
support:
sql: true
spl: true
functions:
- name: abs
syntax: "abs(x)"
description: Absolute value
examples:
sql: "* | SELECT abs(cast(value1 as BIGINT) - cast(value2 as BIGINT))"
spl: "* | extend diff_abs = abs(cast(value1 as BIGINT) - cast(value2 as BIGINT))"
- name: round
syntax: "round(x [, decimals])"
description: Round to the nearest value, optionally to the given number of decimals
examples:
sql: "* | SELECT round(cast(price as DOUBLE), 2)"
spl: "* | extend price_rounded = round(cast(price as DOUBLE), 2)"
- name: floor
syntax: "floor(x)"
description: Round down to the nearest integer
examples:
sql: "* | SELECT floor(cast(value as DOUBLE))"
spl: "* | extend value_floor = floor(cast(value as DOUBLE))"
- name: ceil
syntax: "ceil(x)"
description: Round up to the nearest integer
examples:
sql: "* | SELECT ceil(cast(value as DOUBLE))"
spl: "* | extend value_ceil = ceil(cast(value as DOUBLE))"
- name: power
syntax: "power(x, y)"
description: Exponentiation x^y
examples:
sql: "* | SELECT power(2, 10)"
spl: "* | extend result = power(2, 10)"
- name: sqrt
syntax: "sqrt(x)"
description: Square root
examples:
sql: "* | SELECT sqrt(cast(value as DOUBLE))"
spl: "* | extend value_sqrt = sqrt(cast(value as DOUBLE))"
- name: mod
syntax: "mod(x, y)"
description: Modulo (remainder of x divided by y)
examples:
sql: "* | SELECT mod(cast(count as BIGINT), 2)"
spl: "* | extend remainder = mod(cast(count as BIGINT), 2)"
FILE:references/functions/mobile.yaml
category: mobile_functions
name: Phone number functions
description: Look up the home location, carrier, and similar information for Chinese mainland phone numbers
support:
sql: true
spl: false
functions:
- name: mobile_carrier
syntax: "mobile_carrier(mobile_number)"
description: Get the carrier of a mobile number
returns: varchar (中国移动, 中国联通, or 中国电信)
example: "* | SELECT mobile_carrier('13900000000')"
- name: mobile_city
syntax: "mobile_city(mobile_number)"
description: Get the home city of a mobile number
returns: varchar
example: "* | SELECT mobile_city('13900000000')"
- name: mobile_province
syntax: "mobile_province(mobile_number)"
description: Get the home province of a mobile number
returns: varchar
example: "* | SELECT mobile_province('13900000000')"
use_cases:
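- User region analysis
- Carrier distribution statistics
- Number home-location lookup
- Regional user profiling
important_notes:
- Only Chinese mainland mobile numbers are supported
- The number must be an 11-digit numeric string
- The underlying database is updated periodically
- Invalid numbers return NULL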
FILE:references/functions/operators.yaml
category: operators
name: Comparison and logical operators
description: Compare argument values and combine multiple boolean conditions
support:
sql: true
spl: true
comparison_operators:
- name: "="
description: Equal to
example: "status = 200"
- name: "!= or <>"
description: Not equal to
example: "status != 200"
- name: "<"
description: Less than
example: "cast(request_time as bigint) < 100"
- name: "<="
description: Less than or equal to
example: "cast(request_time as bigint) <= 100"
- name: ">"
description: Greater than
example: "cast(request_time as bigint) > 100"
- name: ">="
description: Greater than or equal to
example: "cast(request_time as bigint) >= 100"
- name: BETWEEN
syntax: "x BETWEEN min AND max"
description: Test whether x lies between min and max (bounds inclusive)
example: "* | SELECT * WHERE cast(status as bigint) BETWEEN 200 AND 299"
- name: IN
syntax: "x IN (value1, value2, ...)"
description: Test whether x is in the list of values
example: "* | SELECT * WHERE status IN ('200', '201', '204')"
- name: IS NULL
description: Test whether the value is NULL
example: "* | SELECT * WHERE user_name IS NULL"
- name: IS NOT NULL
description: Test whether the value is not NULL
example: "* | SELECT * WHERE user_name IS NOT NULL"
- name: LIKE
syntax: "x LIKE pattern"
description: String pattern matching, where % matches any sequence of characters and _ matches a single character
example: "* | SELECT * WHERE request_uri LIKE '/api/%'"
logical_operators:
- name: AND
description: Logical AND, true only when all conditions are true
example: "cast(status as bigint) >= 200 AND cast(status as bigint) < 300"
- name: OR
description: Logical OR, true when any condition is true
example: "status = '404' OR status = '500'"
- name: NOT
description: Logical NOT, negates a boolean value
example: "NOT status = '200'"
important_notes:
- String comparison is case-sensitive
- A comparison that involves NULL evaluates to NULL
- Use IS NULL / IS NOT NULL to test for NULL
- In SPL, fields default to VARCHAR, so cast before numeric comparison
- When combining conditions, mind operator precedence and use parentheses where needed
use_cases:
- Filter logs with specific status codes
- Select data within a specific time range
- Combine multiple query conditions
- NULL handling
FILE:references/functions/overview.yaml
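# SLS function reference index
description: |
This file is the index of the function documentation in the current directory.
The directory contains every function supported by SLS SQL/SPL queries, organized by category.
For the details of a specific function, see the corresponding YAML file.
# Important notes
important_notes:
- In SPL, fields default to VARCHAR and must be converted with cast() or try_cast() before numeric operations
- SQL statements take the form * | SELECT ... (using SELECT, WHERE, GROUP BY)
- SPL statements take the form * | extend/where/stats ... (using extend, where, stats)
- String literals must use single quotes ''; double quotes "" or no quotes denote a field name
- Backslashes in regular expressions do not need double escaping, so write \d as \d
# Function category directory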
function_files:
- file: aggregate.yaml
name: Aggregate functions
functions: [count, sum, avg, max, min, count_if, arbitrary]
support: SQL
description: Data statistics and summarization, used with GROUP BY
- file: string.yaml
name: String functions
functions: [concat, substr, lower, upper, trim, length, split, replace, position]
support: SQL + SPL
description: Text processing such as concatenation, extraction, search and replace
- file: regex.yaml
name: Regular expression functions
functions: [regexp_like, regexp_extract, regexp_replace]
support: SQL + SPL
description: Regex matching, extraction, and replacement (SPL uses the RE2 engine)
- file: datetime.yaml
name: Date and time functions
functions: [date_format, date_parse, date_trunc, from_unixtime, to_unixtime, current_timestamp]
support: SQL + SPL
description: Time formatting, parsing, truncation, and conversion
- file: type_conversion.yaml
name: Type conversion functions
functions: [cast, try_cast]
support: SQL + SPL
description: Data type conversion (values must be converted before numeric operations)
- file: conditional.yaml
name: Conditional functions
functions: [if, case, coalesce, nullif]
support: SQL + SPL
description: Conditional tests, branching logic, and NULL handling
- file: json.yaml
name: JSON functions
functions: [json_extract, json_extract_scalar]
support: SQL + SPL
description: Extract data from JSON strings
- file: math.yaml
name: Math functions
functions: [abs, round, floor, ceil, power, sqrt, mod]
support: SQL + SPL
description: Mathematical operations and calculations
- file: url.yaml
name: URL functions
functions: [url_extract_host, url_extract_path, url_extract_parameter, url_extract_protocol]
support: SQL + SPL
description: URL parsing and parameter extraction
- file: ip_geo.yaml
name: IP geolocation functions
functions: [ip_to_province, ip_to_city, ip_to_country, ip_to_geo]
support: SPL only
description: Convert IP addresses to locations (province, city, country, latitude/longitude)
- file: encoding.yaml
name: Encoding and decoding functions
functions: [url_encode, url_decode, base64_encode, base64_decode]
support: SQL + SPL
description: URL and Base64 encoding and decoding
- file: hash.yaml
name: Hash functions
functions: [md5, sha1, sha256]
support: SQL + SPL
description: Hash value computation
- file: comparison.yaml
name: Year-over-year and period-over-period functions
functions: [compare, ts_compare]
support: SQL + SPL
description: Compute the relative change of time-series data for year-over-year and period-over-period analysis
- file: array.yaml
name: Array functions and operators
functions: [array_distinct, array_intersect, array_join, array_union, reverse, slice, contains]
support: SQL + SPL
description: Create, modify, query, traverse, and transform arrays
- file: map.yaml
name: Map functions and operators
functions: [cardinality, element_at, histogram, histogram_u, map, map_keys, map_values]
support: SQL + SPL (partial)
description: Operate on key-value data structures
- file: statistical.yaml
name: Statistical functions
functions: [corr, covar_pop, covar_samp, stddev, variance, regr_intercept, regr_slope]
support: SQL
description: Distribution analysis, correlation analysis, and numeric statistics
- file: window.yaml
name: Window functions
functions: [row_number, rank, dense_rank, lag, lead, first_value, last_value]
support: SQL
description: Aggregation, ranking, and analysis over windows of data
- file: approximate.yaml
name: Approximate functions
functions: [approx_distinct, approx_percentile, numeric_histogram, numeric_histogram_u]
support: SQL
description: Approximate computation that estimates values from the data or fills in missing values
- file: binary.yaml
name: Binary functions
functions: [from_base64, to_base64, from_hex, to_hex, from_big_endian_64, to_big_endian_64]
support: SQL + SPL
description: Process binary-type data
- file: bitwise.yaml
name: Bitwise functions
functions: [bit_count, bitwise_and, bitwise_or, bitwise_xor, bitwise_not]
support: SQL
description: Operate directly on binary bits
- file: geospatial.yaml
name: Spatial geometry functions
functions: [ST_AsText, ST_GeometryFromText, ST_Point, ST_Contains, ST_Distance, ST_Intersects]
support: SQL
description: Process spatial geometries and geographic location data
- file: geo.yaml
name: Geographic functions
functions: [geohash, geohash_decode]
support: SQL
description: Geographic location analysis and map calculations
- file: color.yaml
name: Color functions
functions: [bar, color, render, rgb]
support: SQL
description: Color representation and conversion for visualization
- file: hyperloglog.yaml
name: HyperLogLog functions
functions: [approx_set, cardinality, empty_approx_set, merge]
support: SQL
description: Approximate statistics over large datasets, trading accuracy for lower memory use
- file: mobile.yaml
name: Phone number functions
functions: [mobile_carrier, mobile_city, mobile_province]
support: SQL
description: Look up the home location, carrier, and similar information for Chinese mainland phone numbers
- file: operators.yaml
name: Comparison and logical operators
functions: [AND, OR, NOT, BETWEEN, IN, LIKE, IS NULL, IS NOT NULL]
support: SQL + SPL
description: Compare argument values and combine multiple boolean conditions
- file: conversion.yaml
name: Unit conversion functions
functions: [convert_data_size, format_duration]
support: SQL
description: Convert units of data size or time intervals
- file: window_funnel.yaml
name: Window funnel function
functions: [window_funnel]
support: SQL
description: Analyze user behavior, app traffic, product conversion goals, and similar data
- file: lambda.yaml
name: Lambda expressions
functions: [filter, reduce, transform, any_match, all_match]
support: SQL + SPL (partial)
description: Define a lambda expression and pass it to a function to extend what the function can express
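# Quick lookup for common scenarios
quick_lookup:
Data statistics: aggregate.yaml
Text processing: string.yaml
Regex matching: regex.yaml
Time handling: datetime.yaml
Type conversion: type_conversion.yaml
Conditional logic: conditional.yaml
JSON parsing: json.yaml
Numeric calculation: math.yaml
URL parsing: url.yaml
IP analysis: ip_geo.yaml
Encoding and decoding: encoding.yaml
Hash computation: hash.yaml
Year-over-year and period-over-period: comparison.yaml
Array operations: array.yaml
Map operations: map.yaml
Statistical analysis: statistical.yaml
Window analysis: window.yaml
Approximate computation: approximate.yaml
Binary processing: binary.yaml
Bitwise operations: bitwise.yaml
Spatial geometry: geospatial.yaml
Geographic analysis: geo.yaml
Color handling: color.yaml
Large-scale statistics: hyperloglog.yaml
Mobile number analysis: mobile.yaml
Logical operations: operators.yaml
Unit conversion: conversion.yaml
Funnel analysis: window_funnel.yaml
Lambda expressions: lambda.yaml
# Key usage tips
usage_tips:
Type conversion: Always cast before a numeric comparison, e.g. cast(status as BIGINT) >= 500
Safe conversion: Use try_cast to avoid conversion failures, e.g. try_cast(value as DOUBLE)
Time grouping: Use date_trunc to aggregate by time granularity, e.g. date_trunc('hour', __time__)
GROUP BY: SELECT may contain only grouping fields or aggregate functions; wrap non-grouping fields in arbitrary()
NULL handling: Use coalesce to supply a default, e.g. coalesce(user_name, 'anonymous')
# Related documents
related_docs:
- ../query_analysis/sql.yaml: Full SQL query syntax
- ../spl/overview.yaml: Basic SPL query syntax
- ../query_analysis/indexSearch.yaml: Index queries and keyword search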
FILE:references/functions/regex.yaml
category: regex_functions
name: Regular expression functions
description: Process strings with regular expressions
support:
sql: true
spl: true
notes:
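- SPL uses the RE2 regex engine, which restricts some regex features. RE2 does not support backreferences (\1), lookaround such as (?<=...), or possessive quantifiers, so use RE2-compatible syntax
- Note in particular that backslashes in regular expressions do not need double escaping. For example, write \d as \d rather than \\d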
functions:
- name: regexp_like
syntax: "regexp_like(x, pattern)"
description: Test whether x matches the regex. Backslashes in the regex do not need double escaping, so write \d as \d
returns: boolean
examples:
sql: "* | SELECT * WHERE regexp_like(message, 'ERROR|FATAL')"
spl: "* | where regexp_like(message, 'ERROR|FATAL')"
- name: regexp_extract
syntax: "regexp_extract(x, pattern [, group])"
description: Extract the matched content
params:
- group: capture group index (default 0)
examples:
sql: "* | SELECT regexp_extract(message, 'code:(\d+)', 1) AS error_code"
spl: "* | extend error_code = regexp_extract(message, 'code:(\d+)', 1)"
- name: regexp_replace
syntax: "regexp_replace(x, pattern, replacement)"
description: Replace matches of the regex
examples:
sql: "* | SELECT regexp_replace(phone, '(\d{3})\d{4}(\d{4})', '$1****$2')"
spl: "* | extend phone_masked = regexp_replace(phone, '(\d{3})\d{4}(\d{4})', '$1****$2')"
FILE:references/functions/statistical.yaml
category: statistical_functions
name: Statistical functions
description: Distribution analysis, correlation analysis, and numeric statistics
support:
sql: true
spl: false
functions:
- name: corr
syntax: "corr(y, x)"
description: Compute the correlation coefficient of two columns (Pearson correlation coefficient)
returns: double, in the range [-1, 1]
example: "* | SELECT corr(request_time, request_length)"
- name: covar_pop
syntax: "covar_pop(y, x)"
description: Compute the population covariance of two columns
example: "* | SELECT covar_pop(request_time, request_length)"
- name: covar_samp
syntax: "covar_samp(y, x)"
description: Compute the sample covariance of two columns
example: "* | SELECT covar_samp(request_time, request_length)"
- name: stddev
syntax: "stddev(x)"
description: Compute the standard deviation
example: "* | SELECT stddev(cast(request_time as double))"
- name: stddev_pop
syntax: "stddev_pop(x)"
description: Compute the population standard deviation
example: "* | SELECT stddev_pop(cast(request_time as double))"
- name: stddev_samp
syntax: "stddev_samp(x)"
description: Compute the sample standard deviation
example: "* | SELECT stddev_samp(cast(request_time as double))"
- name: variance
syntax: "variance(x)"
description: Compute the variance
example: "* | SELECT variance(cast(request_time as double))"
- name: var_pop
syntax: "var_pop(x)"
description: Compute the population variance
example: "* | SELECT var_pop(cast(request_time as double))"
- name: var_samp
syntax: "var_samp(x)"
description: Compute the sample variance
example: "* | SELECT var_samp(cast(request_time as double))"
- name: regr_intercept
syntax: "regr_intercept(y, x)"
description: Compute the intercept of the linear regression
example: "* | SELECT regr_intercept(cast(request_time as double), cast(body_bytes_sent as double))"
- name: regr_slope
syntax: "regr_slope(y, x)"
description: Compute the slope of the linear regression
example: "* | SELECT regr_slope(cast(request_time as double), cast(body_bytes_sent as double))"
use_cases:
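- Analyze the correlation between request time and response size
- Measure the volatility of performance metrics
- Linear regression prediction
- Data quality assessment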
FILE:references/functions/string.yaml
category: string_functions
name: String functions
description: Process text data
support:
sql: true
spl: true
functions:
- name: concat
syntax: "concat(x, y, ...)"
description: Concatenate strings
examples:
sql: "* | SELECT concat(region, '-', status)"
spl: "* | extend full_info = concat(region, '-', status)"
- name: substr
syntax: "substr(x, start [, length])"
description: Extract a substring
params:
- start: start position (1-based)
- length: number of characters to extract (optional)
examples:
sql: "* | SELECT substr(message, 1, 100)"
spl: "* | extend short_msg = substr(message, 1, 100)"
- name: lower
syntax: "lower(x)"
description: Convert to lowercase
examples:
sql: "* | SELECT lower(method)"
spl: "* | extend method_lower = lower(method)"
- name: upper
syntax: "upper(x)"
description: Convert to uppercase
examples:
sql: "* | SELECT upper(method)"
spl: "* | extend method_upper = upper(method)"
- name: trim
syntax: "trim(x)"
description: Remove leading and trailing spaces
examples:
sql: "* | SELECT trim(user_name)"
spl: "* | extend user_clean = trim(user_name)"
- name: ltrim
syntax: "ltrim(x)"
description: Remove leading spaces
examples:
sql: "* | SELECT ltrim(user_name)"
spl: "* | extend user_clean = ltrim(user_name)"
- name: rtrim
syntax: "rtrim(x)"
description: Remove trailing spaces
examples:
sql: "* | SELECT rtrim(user_name)"
spl: "* | extend user_clean = rtrim(user_name)"
- name: length
syntax: "length(x)"
description: Return the length of the string
examples:
sql: "* | SELECT length(message) AS msg_len"
spl: "* | extend msg_len = length(message)"
- name: split
syntax: "split(x, delimiter)"
description: Split a string by a delimiter
returns: array
examples:
sql: "* | SELECT split(tags, ',')"
spl: "* | extend tags_array = split(tags, ',')"
- name: replace
syntax: "replace(x, search, replacement)"
description: Replace occurrences of a substring
examples:
sql: "* | SELECT replace(url, 'http://', 'https://')"
spl: "* | extend url_https = replace(url, 'http://', 'https://')"
- name: position
syntax: "position(substring in string)"
description: Return the position of a substring
examples:
sql: "* | SELECT position('error' in message)"
spl: not yet supported
- name: ascii_escape
syntax: "ascii_escape(x)"
description: Convert a string to ASCII codes
examples:
sql: not yet supported
spl: "* | extend message_ascii = ascii_escape(message)"
- name: ascii_unescape
syntax: "ascii_unescape(x)"
description: Unescape ASCII escape sequences, including \n, \r, \t, \b, \f, and \xXX
examples:
sql: not yet supported
spl: "* | extend message_ascii = ascii_unescape(message)"
- name: unicode_unescape
syntax: "unicode_unescape(x)"
description: Unescape Unicode escape sequences such as \uXXXX
examples:
sql: not yet supported
spl: "* | extend message_unicode = unicode_unescape(message)"
FILE:references/functions/type_conversion.yaml
category: type_conversion_functions
name: Type conversion functions
description: Data type conversion
support:
sql: true
spl: true
functions:
- name: cast
syntax: "cast(value as TYPE)"
description: Type conversion (raises an error on failure)
types: [BIGINT, INTEGER, DOUBLE, VARCHAR, BOOLEAN, TIMESTAMP]
examples:
sql: "* | SELECT cast(status as BIGINT) WHERE cast(status as BIGINT) >= 500"
spl: "* | extend status_num = cast(status as BIGINT) | where status_num >= 500"
note: Fields default to VARCHAR and must be converted before numeric operations. Pay particular attention to this in SPL
- name: try_cast
syntax: "try_cast(value as TYPE)"
description: Safe type conversion (returns NULL on failure)
examples:
sql: "* | SELECT try_cast(status as BIGINT) AS status_num"
spl: "* | extend status_num = try_cast(status as BIGINT)"
note: Recommended, because a failed conversion does not abort the query
FILE:references/functions/url.yaml
category: url_functions
name: URL functions
description: Parse URLs
support:
sql: true
spl: true
functions:
- name: url_extract_host
syntax: "url_extract_host(url)"
description: Extract the host name
examples:
sql: "* | SELECT url_extract_host(request_url)"
spl: "* | extend host = url_extract_host(request_url)"
- name: url_extract_path
syntax: "url_extract_path(url)"
description: Extract the path
examples:
sql: "* | SELECT url_extract_path(request_url)"
spl: "* | extend path = url_extract_path(request_url)"
- name: url_extract_parameter
syntax: "url_extract_parameter(url, param_name)"
description: Extract a query parameter
examples:
sql: "* | SELECT url_extract_parameter(request_url, 'user_id')"
spl: "* | extend user_id = url_extract_parameter(request_url, 'user_id')"
- name: url_extract_protocol
syntax: "url_extract_protocol(url)"
description: Extract the protocol
examples:
sql: "* | SELECT url_extract_protocol(request_url)"
spl: "* | extend protocol = url_extract_protocol(request_url)"
FILE:references/functions/window.yaml
category: window_functions
name: Window functions
description: Aggregation, ranking, and analysis over windows of data
support:
sql: true
spl: false
functions:
- name: row_number
syntax: "row_number() OVER ([PARTITION BY col] ORDER BY col)"
description: Assign a unique row number to each row
example: "* | SELECT request_time, row_number() OVER (ORDER BY request_time DESC) AS row_num"
- name: rank
syntax: "rank() OVER ([PARTITION BY col] ORDER BY col)"
description: Compute the rank. Equal values share a rank, and the ranks that follow skip ahead
example: "* | SELECT status, count(*) as cnt, rank() OVER (ORDER BY count(*) DESC) AS rank FROM log GROUP BY status"
- name: dense_rank
syntax: "dense_rank() OVER ([PARTITION BY col] ORDER BY col)"
description: Compute the rank. Equal values share a rank, and the ranks that follow stay consecutive
example: "* | SELECT status, count(*) as cnt, dense_rank() OVER (ORDER BY count(*) DESC) AS rank FROM log GROUP BY status"
- name: lag
syntax: "lag(col [, offset] [, default]) OVER ([PARTITION BY col] ORDER BY col)"
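description: Return the value from the row that is offset rows before the current row
params:
- offset: offset, default 1
- default: value returned when the offset falls outside the window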
example: "* | SELECT time, pv, lag(pv, 1, 0) OVER (ORDER BY time) AS prev_pv FROM (SELECT date_trunc('hour', __time__) as time, count(*) as pv FROM log GROUP BY time)"
- name: lead
syntax: "lead(col [, offset] [, default]) OVER ([PARTITION BY col] ORDER BY col)"
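description: Return the value from the row that is offset rows after the current row
params:
- offset: offset, default 1
- default: value returned when the offset falls outside the window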
example: "* | SELECT time, pv, lead(pv, 1, 0) OVER (ORDER BY time) AS next_pv FROM (SELECT date_trunc('hour', __time__) as time, count(*) as pv FROM log GROUP BY time)"
- name: first_value
syntax: "first_value(col) OVER ([PARTITION BY col] ORDER BY col)"
description: Return the first value in the window
example: "* | SELECT time, pv, first_value(pv) OVER (ORDER BY time) AS first_pv FROM (SELECT date_trunc('hour', __time__) as time, count(*) as pv FROM log GROUP BY time)"
- name: last_value
syntax: "last_value(col) OVER ([PARTITION BY col] ORDER BY col)"
description: Return the last value in the window
example: "* | SELECT time, pv, last_value(pv) OVER (ORDER BY time) AS last_pv FROM (SELECT date_trunc('hour', __time__) as time, count(*) as pv FROM log GROUP BY time)"
- name: ntile
syntax: "ntile(n) OVER ([PARTITION BY col] ORDER BY col)"
description: Split the ordered rows into n buckets and return the bucket number of the current row
example: "* | SELECT request_time, ntile(4) OVER (ORDER BY request_time) AS quartile"
use_cases:
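- Top N queries
- Ranking
- Period-over-period analysis (comparing adjacent time periods)
- Cumulative calculation
- Moving average
important_notes:
- Window functions must be used with an OVER clause
- PARTITION BY groups the rows and ORDER BY sorts them
- Window functions cannot be used in a WHERE clause
- Aggregate functions can be used inside window functions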
FILE:references/functions/window_funnel.yaml
category: window_funnel_function
name: Window funnel function
description: Analyze user behavior, app traffic, product conversion goals, and similar data
support:
sql: true
spl: false
functions:
- name: window_funnel
syntax: "window_funnel(window, mode, timestamp, event1, event2, ...)"
description: Analyze a sequence of user actions within the given time window and compute funnel conversion
params:
- window: time window size (seconds)
- mode: matching mode, either default or strict
- timestamp: timestamp column
- event1, event2, ...: event conditions (boolean expressions)
returns: the highest event level the user completed
example: |
* | SELECT
user_id,
window_funnel(
3600,
'default',
cast(__time__ as bigint),
action='page_view',
action='add_to_cart',
action='purchase'
) AS level
FROM log
GROUP BY user_id
use_cases:
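- E-commerce purchase conversion funnel
- App usage flow analysis
- Sign-up conversion analysis
- User behavior path analysis
mode_options:
default:
description: Default mode. Events need not be consecutive, and other events may occur in between
example: The user may do other things between viewing and adding to the cart
strict:
description: Strict mode. Events must occur consecutively, with no other events in between
example: The user must view and then add to the cart, with no other action in between
important_notes:
- The return value ranges from 0 to the number of events
- The return value is the step the user reached
- Group by user ID
- The time window is in seconds
- Events must be ordered by timestamp
examples:
- desc: Analyze a three-step purchase funnel
sql: |
* | SELECT step, count(*) AS users
FROM (
SELECT user_id,
window_funnel(1800, 'default', cast(__time__ as bigint),
page='product',
action='add_cart',
action='payment'
) AS step
FROM log
GROUP BY user_id
)
GROUP BY step
ORDER BY step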
FILE:references/functions-guide.md
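# Function Selection Guide
Pick the function category by scenario first, then read the corresponding YAML file inside the skill.
## Frequently used categories
- Data statistics: `./functions/aggregate.yaml`
- String processing: `./functions/string.yaml`
- Regex matching: `./functions/regex.yaml`
- Time handling: `./functions/datetime.yaml`
- Type conversion: `./functions/type_conversion.yaml`
- Conditional logic: `./functions/conditional.yaml`
- JSON extraction: `./functions/json.yaml`
- Numeric calculation: `./functions/math.yaml`
- URL parsing: `./functions/url.yaml`
- Array / Map: `./functions/array.yaml`, `./functions/map.yaml`
- Window analysis: `./functions/window.yaml`
- Funnel analysis: `./functions/window_funnel.yaml`
- Lambda expressions: `./functions/lambda.yaml`
## Language differences
- Supported in both SQL and SPL: most basic functions, including string, regex, time, type conversion, conditional, JSON, math, URL, encoding, hash, array, and Map
- SQL only: window functions, bitwise operations, spatial functions, HyperLogLog, statistical functions, funnel functions, and so on
- SPL only: `ip_to_province`, `ip_to_city`, `ip_to_country`, `ip_to_geo`
## Common reminders
- Always apply `cast()` or `try_cast()` before a numeric comparison
- Use `try_cast()` when a failed conversion should not make the whole statement fail
- For time grouping, prefer `date_trunc()` or `date_format()`
- For JSON fields, prefer `json_extract()` / `json_extract_scalar()`
- For escape handling in SPL, look at `ascii_escape`, `ascii_unescape`, and `unicode_unescape` first
- Some functions behave differently in SPL than in SQL; when unsure, reread the corresponding function YAML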
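
The difference between `cast()` and `try_cast()` in the reminders above can be illustrated outside SLS. The Python sketch below only mimics the two behaviors (error versus NULL); the helper names `cast_bigint` and `try_cast_bigint` and the sample values are hypothetical and are not part of any SLS SDK.

```python
from typing import Optional


def cast_bigint(value: str) -> int:
    """Analogue of cast(value as BIGINT): a bad value raises an error."""
    return int(value)


def try_cast_bigint(value: Optional[str]) -> Optional[int]:
    """Analogue of try_cast(value as BIGINT): a bad value becomes None (NULL)."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


# Hypothetical raw status values as they might arrive in log fields (all strings).
statuses = ["200", "500", "-", "503"]

# With try_cast semantics the malformed "-" is skipped instead of aborting the scan.
server_errors = [
    s for s in statuses
    if (n := try_cast_bigint(s)) is not None and n >= 500
]
print(server_errors)
```

With `cast_bigint`, the same loop would stop at the first malformed value, which mirrors why `try_cast()` is the safer default on loosely structured logs.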
## Common templates
### Type conversion
```sql
* | SELECT count(*) FROM log WHERE cast(status as BIGINT) >= 500
```
```spl
* | where try_cast(status as BIGINT) >= 500
```
### JSON extraction
```sql
* | SELECT json_extract_scalar(payload, '$.user.id') AS user_id, count(*) FROM log GROUP BY user_id
```
### Regex extraction
```sql
* | SELECT regexp_extract(message, 'code:(\d+)', 1) AS code, count(*) FROM log GROUP BY code
```
### Region analysis in SPL
```spl
* | extend province = ip_to_province(client_ip) | stats pv = count(*) by province
```
## Local source documents
- `./functions/overview.yaml`
- `./functions/README.md`
- `./functions/*.yaml`
FILE:references/query-analysis.md
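# Query and Analysis Routing
## 1. Distinguish the three capabilities first
- Query: no aggregation; the result is log-level records
  - Index queries filter logs efficiently, but their syntax is relatively limited
  - If the matched logs need further per-row processing, such as adding or removing fields, selecting fields, or parse/extend/project, that part is an SPL capability and not part of the index query itself
- Analysis: involves aggregation, typically aggregate functions, grouping, and statistical computation
- Index query: the typical query syntax, used to filter logs and return raw logs. Typical form: `status: 200 and method: GET`
- SQL analysis: the typical analysis syntax, used for aggregation, grouping, and statistics. Typical form: `status: 200 | SELECT count(*) AS pv FROM log`
- SPL: in query and analysis scenarios, mainly a complement to querying. It handles complex filtering in scan mode, per-row processing, adding or removing fields, field selection, and `stats` aggregation when needed. Typical form: `* | where status = '500' | extend latency_ms = cast(latency as BIGINT) | project latency_ms`
## 2. Selection rules
- Query scenarios: prefer index queries. They are the most efficient, but their syntax is relatively limited
- If a query scenario needs complex conditional filtering, per-row processing, or capabilities such as parse-json/parse-regexp/parse-kv/extend, use SPL as a complement to the index query
- Analysis scenarios: prefer SQL. It suits aggregate computation such as count/sum/avg/group by/topN/window analysis
- In analysis scenarios, prefer regular SQL; consider SQL SCAN only when the analyzed field has no index and regular SQL cannot do the job directly
- SPL also supports aggregation-related instructions such as `stats`, `sort`, and `limit`, but unless the user explicitly asks for SPL, or the problem is inherently a scan-query or pipeline-processing scenario, generate SQL for analysis statements by default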
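
The selection rules above can be condensed into a small decision function. This is a simplified sketch in Python, not an official routing algorithm: the function name `choose_mode`, its boolean parameters, and the returned labels are all hypothetical, and real requests may need more nuance than four flags can capture.

```python
def choose_mode(needs_aggregation: bool,
                needs_row_processing: bool = False,
                fields_indexed: bool = True,
                user_requests_spl: bool = False) -> str:
    """Pick a statement style following the selection rules (simplified)."""
    # An explicit user request for SPL overrides the defaults.
    if user_requests_spl:
        return "SPL"
    # Analysis: regular SQL first, SQL SCAN only when the field has no index.
    if needs_aggregation:
        return "SQL" if fields_indexed else "SQL SCAN"
    # Query: index query first, SPL added only for per-row processing.
    if needs_row_processing:
        return "index query + SPL"
    return "index query"


print(choose_mode(needs_aggregation=True))
print(choose_mode(needs_aggregation=False, needs_row_processing=True))
```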
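## 3. Pipeline layering rules
- Whenever a statement contains `|`, the part before the first pipe is the index query
- This index query applies to both the SPL and the SQL that follow it
- Its purpose is to filter quickly through the index and shrink the data range for later processing
- So whenever a filter condition can be expressed as an index query, move it in front of the `|`
- For SQL, this is usually the key to faster execution: do not leave a filter only in the SQL part if it can be moved forward
- The same holds for SPL: apply index filters first, then do per-row processing afterwards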
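
To make the layering rule concrete, the following Python sketch separates a statement into its index-query part and the rest at the first `|`. The helper name `split_pipeline` is hypothetical, and the sketch deliberately ignores the edge case of a `|` inside a quoted phrase in the index query.

```python
from typing import Tuple


def split_pipeline(statement: str) -> Tuple[str, str]:
    """Split a statement into (index query, SQL/SPL part) at the first pipe."""
    index_query, _, rest = statement.partition("|")
    return index_query.strip(), rest.strip()


stmt = "status: 500 and service: payment | where cast(latency as BIGINT) > 1000 | project service"
index_part, processing_part = split_pipeline(stmt)
print(index_part)
print(processing_part)
```

A statement without any `|` comes back as a pure index query with an empty second element.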
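## 4. Prerequisites
- Both query and analysis depend on the index configuration
- SQL requires statistics to be enabled on the field
- Index and statistics settings usually apply only to data written after they are configured; historical data requires reindexing
- Query-type Logstores do not support statistics and therefore do not support SQL analysis
- SCAN is not "completely index-free" either: the query part still depends on an available index, so at least the minimum index requirement must be met
## 5. Time range
- Do not put time conditions into the query / SQL by default
- Specify the time range with the `--from` / `--to` flags; the values are Unix timestamps in seconds
- `__time__` in SQL is better suited to formatting and grouping than to serving as the main filter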
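
Since `--from` and `--to` take Unix timestamps in seconds, a small helper can derive both values from a timezone-aware end time. This Python sketch only builds the flag list; the function name `time_range_flags` is hypothetical, and the command the flags are passed to is not shown because it is not specified here.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def time_range_flags(end: datetime, minutes: int) -> List[str]:
    """Build --from/--to flags covering the `minutes` before `end`."""
    if end.tzinfo is None:
        # Naive datetimes would be interpreted in local time; require an explicit zone.
        raise ValueError("end must be timezone-aware")
    start = end - timedelta(minutes=minutes)
    return ["--from", str(int(start.timestamp())), "--to", str(int(end.timestamp()))]


# Example: the 15 minutes before 2024-01-01 00:00:00 UTC.
flags = time_range_flags(datetime(2024, 1, 1, tzinfo=timezone.utc), 15)
print(" ".join(flags))
```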
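## 6. Index query: frequently used rules
- Case-insensitive
- Supports `AND` / `OR` / `NOT`
- Range queries apply only to `long` / `double` fields
- A fuzzy-query term cannot start with `*`
- Use `#"..."` for exact phrase matching
- `key: *` means the field exists and is non-empty
## 7. SQL: high-value reminders
- The syntax is `query statement | analysis statement`
- SLS SQL is based on Presto SQL
- In Logstore scenarios the table name generally defaults to `log`
- SQL does not support `LIMIT count OFFSET offset` syntax; use `LIMIT offset, count` instead for pagination (e.g., `LIMIT 20, 20` for rows 21–40)
- At most 100 rows are returned by default; use `LIMIT` for larger results
- SPL cannot follow SQL
- If a filter condition can be expressed as an index query before the `|`, move it there rather than leaving it only in the SQL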
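
The `LIMIT offset, count` form mentioned above is easy to get wrong when translating page numbers into offsets. The Python sketch below shows the arithmetic; the helper name `limit_clause` and the 1-based page convention are assumptions made for illustration.

```python
def limit_clause(page: int, page_size: int) -> str:
    """Return a `LIMIT offset, count` clause for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    return f"LIMIT {offset}, {page_size}"


# Page 2 with 20 rows per page covers rows 21 to 40.
query = "* | SELECT status, count(*) AS pv FROM log GROUP BY status " + limit_clause(2, 20)
print(query)
```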
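## 8. SCAN mode
- A fallback for analyzing fields that have no index; not the preferred option
- Query scenarios: index query first, then SPL
- Analysis scenarios: regular SQL first, then SQL SCAN
- Typical form: `* | set session mode = scan; SELECT count(1) AS pv, api FROM log GROUP BY api`
- Limitations:
  - The query part is still affected by the index
  - All fields are treated as `varchar`
  - The number of rows scanned per shard and the total number of rows scanned are limited
  - Performance is clearly lower than in index-based analysis mode
- Better suited to ad hoc analysis or remediation; not something to generate by default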
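
As a sketch of the typical form shown above, the following Python helper prepends the `set session mode = scan;` directive to an existing SQL analysis statement. The function name `to_scan_statement` is hypothetical; only the statement shape comes from the typical form in this section.

```python
def to_scan_statement(index_query: str, sql: str) -> str:
    """Combine an index query and a SQL statement into a SCAN-mode statement."""
    return f"{index_query.strip()} | set session mode = scan; {sql.strip()}"


scan_stmt = to_scan_statement("*", "SELECT count(1) AS pv, api FROM log GROUP BY api")
print(scan_stmt)
```

Because SCAN is a fallback, a helper like this would normally be applied only after regular SQL has failed for lack of a field index.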
## 9. Common answer templates
### Query logs only
```text
status: 500 and service: payment
```
### Count errors
```sql
status: 500 | SELECT count(*) AS error_count FROM log
```
### Aggregate by hour
```sql
status: 500 | SELECT date_format(from_unixtime(__time__), '%Y-%m-%d %H:00:00') AS hour, count(*) AS error_count FROM log GROUP BY hour ORDER BY hour
```
### 先前置索引过滤,再做 SPL 处理
```spl
status: 500 and service: payment | where cast(latency as BIGINT) > 1000 | extend latency_ms = cast(latency as BIGINT) | project service, latency_ms, message
```
## 本地源文档
- `./query_analysis/overview.yaml`
- `./query_analysis/indexSearch.yaml`
- `./query_analysis/indexConfig.yaml`
- `./query_analysis/sql.yaml`
FILE:references/query_analysis/indexConfig.yaml
concept:
- name: 索引
description:
- 类似数据库索引的概念,不过SLS中字段索引需要开启后才能进行查询分析
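- name: Index configuration
description:
- An index configuration consists of several settings: full-text index, field indexes, delimiters (tokenization), Chinese tokenization, and field types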
- name: 全文索引
description:
- 当需要对整个日志建立索引,并通过不指定字段的方式进行查询时候,需要开启全文索引
- name: 字段索引
description:
- 当需要对日志中的某个字段建立索引,并通过指定字段的方式进行查询和分析时候,需要开启字段索引
- 对于字段索引,有text、long、double和JSON四种类型
- name: 字段类型
description:
- 字段类型有text、long、double和JSON四种类型
- name: 分词与分词符
description:
- 分词符由一堆ASCII字符组成
- 分词的作用是将日志拆成多个term,分词的过程相当于是把内容按照预设的分词符(一组ASCII码)进行split
- 在建立索引的时候, 会根据分词符将日志拆成多个term, 然后建立term的倒排索引
- 在查询的时候,query会按照分词配置进行分词,然后匹配对应的term,然后查倒排索引,从而找到对应的日志
- 这里说的分词是指ASCII分词,中文分词有单独的开关
- name: 中文分词
description:
- 中文分词是针对中文的特殊处理,在索引配置有单独的开关
- 开启中文分词后索引的性能有所下降
- name: 开启统计
description:
- 开启统计后,才能进行SQL分析
- 目前 查询型logstore 不支持开启统计,因此也不支持跑SQL分析
- 对于 标准型logstore 开启统计分析没有额外的费用
- name: 统计字段(text)最大长度
description:
- 统计字段(text)最大长度是SQL分析时,默认截取一定长度,日志服务的默认配置为 2048 字节(2KB)。
- 可以在页面上调大到16KB, 调大索引字段最大长度没有额外的费用
- name: 重建索引
description:
- 索引配置提交后会对后面新写入的日志生效
- 如果希望对历史数据生效,需要使用重建索引功能
- name: 日志聚类(LogReduce)
description:
- 打开日志聚类开关后,日志服务在采集文本日志时会自动聚合相似度高的日志,提取共同的日志模式。
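- name: Automatic index generation
description:
- A console feature that samples part of the logs and infers an index configuration from them; users can then click to append the inferred configuration to the existing index configuration or overwrite it
- Because only a sample of logs is inspected, not every field is guaranteed to be inferred. Users can append further fields as needed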
- name: 索引流量
description:
- 索引流量决定了索引的费用,按每GB为单位收费
- 如果开启了全文索引后,索引流量等同日志流量
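- name: Fields queryable by default once indexing is enabled
description:
- Once indexing is enabled, indexes are created by default for __time__, __topic__, and __source__
- To analyze the nanosecond part of a high-precision timestamp in SQL, add a field index for it separately: `__time__ns_part__`, type bigint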
api:
- name: CreateIndex
aliyun_doc: https://help.aliyun.com/zh/sls/developer-reference/create-index
description: 注意Index的创建需要传入全量的Index配置,不支持对单个字段进行操作
- name: DeleteIndex
aliyun_doc: https://help.aliyun.com/zh/sls/developer-reference/delete-index
- name: GetIndexConfig
aliyun_doc: https://help.aliyun.com/zh/sls/developer-reference/get-index-config
- name: UpdateIndex
aliyun_doc: https://help.aliyun.com/zh/sls/developer-reference/update-index
description: 注意Index的更新需要传入全量的Index配置,不支持对单个字段进行操作
faq:
- question: 之前 logstore 没有开启索引配置,后来才开启的,可以在控制台或 GetLogs 接口查到日志吗?
answer: 由于索引配置只对配置后写入的数据生效,所以之前没有开启索引配置的日志无法被查询到。如需查询历史日志,请使用重建索引功能。
- question: 即使在控制台不加任何查询或者分析条件,如果我想在控制台查到日志也要求开索引配置吗?
answer: 是的,即使在控制台不加任何查询或者分析条件,如果想在控制台查到日志也要求开索引配置。索引是查询和分析的前提条件
- question: 调大统计分析的字段长度(默认限制 2KB)调大到最大 16KB,会有什么影响吗?
answer: 调大后对新写入的数据生效,如果历史数据要生效可以重建索引。调大字段长度没有额外的费用
- question: 如何在日志中搜索包含空格的关键字?
answer:
- 一般情况下空格是一个分词符,可以通过SPL过滤的方式来处理,比如要查的是Hello World这个词可以这样写 "Hello World" | where content like '%Hello World%'
- 在| 前面加上Hello World过滤是为了减少SPL处理的数据量,加速查询过程
- question: 为什么查询和分析时,字段值会被截断?
answer:
- 查询时,单个字段值最大长度为 512 KB,超出部分不参与查询。
- 分析时,默认支持的字段值最大长度为 2 KB,最大可调整为 16 KB。
- 可以通过修改索引配置中的"统计字段(text)最大长度"来调整,该配置修改仅对新增采集的日志数据生效。
- question: 如何分析非索引字段?
answer:
- 如果是分析新写入的日志,则直接为目标字段创建索引且开启统计功能。
- 如果是分析历史日志,则需要对历史日志重建索引且开启统计功能。
- 如果无法创建索引,可以打开扫描(Scan)模式,通过扫描分析功能分析日志。
- question: 为什么我的字段索引看起来没有生效
answer:
- 可能的原因1, 字段索引刚配置,只能对新写入的数据生效,历史数据需要重建索引
- 另一个可能的原因是日志中的字段名和索引配置的字段名大小写可能不一致,比如配置的字段索引是Status,但是日志中的字段名是status
limitation:
- 索引配置的大小上限是64KB
FILE:references/query_analysis/indexSearch.yaml
index_search:
# 适用范围
target:
- logstore
- storeview
# 前置条件
prerequisites:
- requirement: 必须配置索引
description: 索引查询依赖索引配置,索引只对配置后写入的数据生效
scope: logstore级别
- requirement: 字段索引配置
description: 需要查询的字段必须在索引配置中开启(全文索引或字段索引)
note: 未配置索引的字段无法被查询
- 关于索引配置的问题,可以参考 `indexConfig.yaml`
# 费用说明
pricing:
free: true
description: 普通索引查询功能是免费的
# 整体限制
limitations:
- concurrent_queries: 100 # 单个Project最多并发查询数
- max_keywords: 128 # 每次查询最多关键词数(除布尔逻辑符外)
- page_size: 100 # 每页最多返回结果数
- field_value_size: "512 KB" # 单个字段值最大为512KB,超出部分不参与查询
- sort_order: "按时间倒序(秒级或纳秒级)"
- fuzzy_query_limit: 100 # 模糊查询最多查询到符合条件的100个词
- Index queries do not support comments; any comment text is treated as query keywords, which can produce wrong results
# 查询语法
query_syntax_base:
description: 查询语句用于指定日志查询时的过滤规则,返回符合条件的日志
basic_rules:
- 查询条件可使用关键词、数值、数值范围、空格、* 等
- 如果为空格或 *,表示无过滤条件
- 查询语句中建议不超过 128 个条件
- 支持布尔运算符(AND、OR、NOT)、括号分组、模糊查询、短语查询等
aliyun_doc: https://help.aliyun.com/zh/sls/query-syntax
query_syntax:
# 大小写是否敏感?
- type: case_sensitivity
description: 大小写不敏感
# 布尔运算
- type: and_or_not
description: 使用AND/OR/NOT组合条件
limitations:
- 最多30个条件(除布尔逻辑符外)
examples:
- query: "status: 200 and request_method: GET"
description: 查询status为200且request_method为GET的日志
# 范围查询
- type: range_query
description: 使用范围查询, field in [min max] 或者 field in (min max)
limitations:
- 只对索引字段为long或double的日志有效
examples:
- query: "status in [200 300]"
description: 查询status为200到300的日志
# 模糊查询
- type: fuzzy_query
description: 使用模糊查询, * 表示匹配任意字符, ? 表示匹配单个字符
limitations:
- 最多匹配100个term
- 通配符*不能出现在开头
examples:
- query: "status: 200*"
description: 查询status为200开头的日志
# 关键词查询
- type: keyword_query
description: 使用关键词查询
limitations:
- 索引配置中的分词配置决定了keyword最终的分词方式
examples:
- query: "host: www.example.com"
description: 查询host为www.example.com的日志
note: 如果.是分词符号,会查询www、example、com
# 全文查询
- type: full_text_query
description: 不指定key,直接输入查询字符串
limitations:
- 全文查询会使用全文索引对查询进行分词,如果目标内容所在的key单独配置了索引,并且两者索引配置不同(如分词符不一致、是否支持中文不一致),可能会查询不到结果,这时候推荐使用关键词查询
- 当没有配置全文索引时,会使用默认分词符"! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ ] { }"对输入进行分词
examples:
- query: '"www.example.com"'
description: 如果.是分词符,查询准确包含www、example、com的日志
# 短语查询
- type: phrase_query
description: 使用短语查询,精确匹配
limitations:
- 不能在NOT语句中使用
examples:
- query: '#"www.example.com"'
description: 查询准确包含www.example.com的日志
# 关于时间范围查询
time_filter:
- description: SDK查询请在GetLogs接口中指定开始结束时间,控制台查询请直接选好具体查询时间范围。不需要也无法在query中指定时间范围
# 使用示例
usage_examples:
- query: "status: 200 and request_method: GET"
description: Logs whose status is 200 and whose request_method is GET
- query: "NOT status: 200 NOT request_method: GET"
description: Logs whose status is not 200 and whose request_method is not GET
- query: "request_method: GE* AND status: 200"
description: Logs whose request_method starts with GE (for example GET) and whose status is 200
# 高级功能
advanced_features:
- name: 字段分析(Field Analysis)
description: 提供字段分布、统计指标及 TOP5 时间序列图,帮助理解数据
features:
- 可以快速了解字段的取值分布、数量统计等信息
- 支持对数值型字段进行统计分析,包括最大值、最小值、平均值等
- 在查询结果页面,点击字段名称可以查看字段分析
aliyun_doc: https://help.aliyun.com/zh/sls/field-analysis
- name: 上下文查询(Context Query)
description: 支持查看指定日志的上下文信息,方便故障排查和问题定位
usage: 在日志详情页面,点击"上下文"按钮
aliyun_doc: https://help.aliyun.com/zh/sls/contextual-query
- name: LiveTail(实时监控)
description: 实时监控线上日志,减轻运维压力
features:
- 类似 tail -f 命令,可以实时查看最新产生的日志
- 适用于实时监控、故障诊断等场景
aliyun_doc: https://help.aliyun.com/zh/sls/livetail
FAQ:
- question: 为什么模糊查询结果不全
answer: 模糊查询默认只匹配100个term,如果需要完整匹配,可以使用SPL语句,例如
```
* | where column like '%keyword%'
```
- question: 为什么查不到数据
answer:
- 确定一下时间范围是否正确
- 确定一下查询语句是否正确
- 如果索引配置刚打开,索引配置只对新写入的数据生效,历史数据需要重建索引
- 如果查询中包含了中文,需要确认一下索引配置中是否开启了中文索引
- 可能是索引分词配置导致,需要确认一下索引分词配置是否正确
- 如果是SDK查询的话,需要确认一下offset和topic是否设置正确
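- If you use a full-text query to search for content that lives in a field, check whether the full-text index and that field's index are configured identically (for example delimiters and Chinese tokenization); see full_text_query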
- 检查数据写入延迟:日志写入SLS后,构建索引通常有几秒的延迟(取决于集群负载),刚写入的日志可能无法立即被检索到
- 检查逻辑运算符优先级:AND 的优先级高于 OR,例如 "a or b and c" 会被解析为 "a or (b and c)",如果预期是 "(a or b) and c",必须使用括号显式分组
- 检查Logtail采集状态:确认Logtail心跳正常且无解析错误(Logtail配置错误导致日志根本没发到SLS,自然查不到)
- 检查是否存在不可见字符:日志中可能包含不可见的控制字符(如颜色代码 \033 等),导致关键词匹配失败,建议先用模糊查询排查
- 检查数值类型字段:如果字段在JSON中是数值(如 age: 20),但在索引配置中配成了“文本”类型,由于构建索引失败会导致查不到
- question: 查询(Search)和 分析(Analysis)有什么区别?
answer: |
- 查询(Search):仅用于过滤和返回原始日志内容。语法如 `status:200`。
- 分析(Analysis):基于 SQL 92 语法对日志进行聚合、统计、计算。语法如 `* | SELECT method, count(1) group by method`。
- 两者通过管道符 `|` 分割,管道符左边是查询语句,右边是分析语句。
- question: 为什么数值范围查询(如 latency > 100)报错或结果不符合预期?
answer: |
- **索引类型错误**:范围查询要求该字段在索引配置中必须设置为 **数值类型(Long/Double)**。如果配置的是 **文本类型(Text)**,即便是数字内容,SLS 也会将其视为字符串处理,不支持数学比较运算。
- question: 为什么 JSON 日志无法通过 `key.subkey: value` 的方式查询?
answer: |
- **未开启 JSON 索引**:默认情况下,如果字段索引配置为“文本”,SLS 会将整个 JSON 对象视为一长串字符串。你只能通过全文搜索查询其中的片段。
- **配置层级**:要支持 `object.field: value` 这种 key-value 对齐的查询,必须在索引配置中,将该父字段的数据类型设置为 **JSON**,并开启“索引所有文本字段(针对string的叶子节点)”或手动逐个添加子字段的索引。
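- question: Why does searching `User-Agent: Chrome` return nothing, while searching `Chrome` alone finds logs?
answer: |
- **Field name mismatch (including case)**: check whether the key in the index configuration matches the key that actually appears in the logs. For example, the logs may use `user_agent` while the index configuration uses `User-Agent`.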
- question: 为什么精确搜索(例如 `id: "123-456"`)匹配到了不该匹配的日志(例如 `123-456-789`)?
answer: |
- **分词符问题**:这是最容易被忽视的配置。如果索引配置中,连字符 `-` 被设置为了分词符:
- 输入内容 `123-456` 会被切分为 `123` 和 `456`。
- 目标日志 `123-456-789` 也会被切分为 `123`, `456`, `789`。
- 因为同时包含 `123` 和 `456`,所以匹配成功。
- **解决方法**:如果需要绝对精确匹配,应从分词符列表中移除该符号,或者将该字段配置为“不分词”(只有完全相等才匹配),或者使用spl或sql。
- question: 为什么使用通配符查询(如 `*error`)报错?
answer: |
- **前缀通配符限制**:为了保障集群性能,SLS 的查询语法(Search)**禁止**通配符出现在查询词的**开头**(如 `*keyword`)。
- **后缀通配符**:支持后缀通配符(如 `keyword*`),但注意后缀通配符匹配的 Term 数量有限制(默认 100 个),如果匹配到的词过多,可能会导致部分结果丢失。
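- question: How do I query content that contains special characters such as quotes?
answer: |
- **Escaping**: if the value contains a double quote `"`, escape it with a backslash, for example `content: "error message with \"quote\""`. In a query statement, strings are wrapped in double quotes; single quotes have no special meaning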
- question: 如何查询“存在某个字段”的日志(即字段非空)?
answer: |
- 使用通配符语法:`key: *`。
- 例如 `user_id: *` 可以筛选出所有包含 `user_id` 字段且不为空的日志。
- 注意:这依赖于 `user_id` 字段必须配置了索引。
- question: 为什么我修改了索引的分词符配置,原来的日志还是查不到?
answer: |
- **不可追溯性**:这是 SLS 索引机制的核心特性。索引是写入时构建的(Write-Once)。
- **修改后果**:修改索引配置(如增加分词符、改变字段类型)**仅对修改后新写入**的日志生效。
- **解决方案**:如果必须查询历史数据,唯一的办法是使用“重建索引(Reindex)”功能或者将数据导出重导。
trouble_shooting:
- name: 怀疑索引配置相关问题导致查不到数据
description:
- 参考 `indexConfig.yaml` 文档排查索引配置相关问题
- 结合 上述索引查询相关问题给出排查结论
FILE:references/query_analysis/overview.yaml
concept:
- name: 查询
description:
- 一般通过若干关键词组合来过滤出符合条件的日志,我们叫做查询
- name: 分析
description:
- 一般带有聚合指令、函数的查询语句就是分析
- 客户有可能分不清楚查询和分析的区别,可能把分析也叫查询
- name: 索引查询
description:
- 索引查询是SLS的查询能力,通过关键词的and/or/not组合查询,过滤出符合条件的日志
- name: SQL
description:
- SLS 提供了SQL分析能力,语法符合PrestoSQL语法
- SQL一般是跟在索引查询之后,通过竖线 | 分割
- name: SPL
description:
- SPL可以在查询分析场景使用,一般跟在索引查询之后,通过竖线 | 分割
- name: 不精确
description:
- 当数据量过大的时候,索引查询或者SQL/SPL可能返回部分结果数据(部分查询结果或者部分数据的聚合结果)
- 一般情况下通过重试多次可以获得精确结果
- 控制台页面在显示日志柱状图的上方有显示"查询不精确"的提示
- SDK使用的话一般对于GetLogs的结果有isComplete字段来标识是否精确
- name: 索引配置
description:
- 需要开启索引配置后才能使用查询分析的功能
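- name: Dataset/Storeview
description:
- By default, query and analysis run against a single logstore/metricstore
- A Storeview lets you query data from multiple logstores or metricstores together
- All stores associated with a Storeview must be of the same kind: either all logstores or all metricstores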
- name: SCAN模式
description:
- SCAN场景是为了满足不建索引的情况下进行查询分析的需求,一般情况下性能会低
- SCAN场景要求至少建有__time__的索引, 也就是默认情况下开启索引即可(不开字段索引和全文索引的情况)
- SCAN模式下可以用SPL或者SQL进行查询分析, 对于| 前面的部分是查询语句(这部分依赖具体建了哪些索引),后面的部分是分析语句
- name: 上下文查询
description:
- logtail或者producer上报日志的时候会带上上下文信息,从而可以看到每个上报的日志前后的日志,这就是上下文
- 上下文信息依赖__pack_id__(这个由客户端上报)和__pack_meta__(服务端生成)这两个字段
- name: Livetail
description:
- 可以理解为tail -f命令,可以实时查看最新产生的日志,在SLS控制台可以使用
- name: 增强SQL
description:
- 对于并发能力或者单Logstore有较大数据量的情况,可以开启增强SQL来提升查询分析的性能
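- name: Fully accurate SQL
description:
- For scenarios that require fully accurate results, enable fully accurate SQL to guarantee that results are complete and exact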
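- name: Shards and query/analysis performance
description:
- A shard is a storage partition of a logstore; the number of shards also determines query and analysis concurrency
- When SQL query capacity is insufficient, splitting shards usually helps, but a split generally only benefits data written afterwards
- For historical data written before the split, use enhanced SQL to improve query and analysis performance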
- name: 日志聚类
description:
- 日志聚类是SLS的日志聚类能力,通过日志聚类可以快速发现日志的模式
- name: 外表
description:
- SQL提供了外表关联的能力,可以将外表和logstore进行关联分析
- 外表需要创建后才能在SLS中和Logstore进行Join
- 目前支持的外表有MySQL、OSS、Postgres
- 除此之外,还支持CSV数据托管,即将CSV数据通过SDK上传到SLS的external_store(当前只支持csv上传)
- name: 物化视图
description:
- 物化视图是SQL的能力,通过提供物化视图提升查询分析的性能
- name: 定时SQL
description:
- 定时SQL支持将SQL分析的结果存储到目标Logstore中
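- name: Phrase query
description:
- SLS index queries are built on an inverted index: a query is tokenized and matched against terms. Because term order is ignored after tokenization, an ordinary index query cannot match an exact phrase
- To match a keyword exactly, use the phrase query syntax #"www.example.com", that is, prefix the quoted string with #
- Phrase queries currently take effect only in and conditions, not in or / not conditions. To get the same effect with or / not, use SPL, for example * | where column like '%www.example.com%'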
- name: 大小写敏感问题
description:
- 在查询分析场景下,关键词以及字段名称都是大小写不敏感的
- 索引配置里的字段名称是大小写敏感的,如果索引配置中的字段名和日志中的不一致,会导致索引不生效
- name: 时间字段过滤
description:
- 在SLS中时间是一个特殊的内置字段(__time__), 一般情况下在查询分析的时候不需要在query中过滤时间范围,而是通过接口参数或者控制台界面指定时间范围
- 在SDK查询的时候GetLogs接口需要指定from_time和to_time参数来指定时间范围
- 在控制台查询的时候直接选择时间范围即可
related_docs:
- name: 索引配置细节参考 `indexConfig.yaml`
- name: 索引查询细节参考 `indexSearch.yaml`
- name: SQL细节参考 `sql.yaml`
faq:
- question: 为什么写入日志后查询不到日志?
answer:
- 检查已设置的分词符是否符合要求。
- 索引配置只对新增日志生效,如果要查询和分析历史数据,请使用重建索引功能。
- 确认是否已经开启索引配置。
- question: SQL的select * 为什么不返回所有字段?
answer:
- SQL的select * 返回的结果是所有索引字段的内容,而不是日志中所有字段。如果要返回所有字段需要用索引查询或者用扫描查询进行SCAN
FILE:references/query_analysis/sql.yaml
sql_query:
# 适用范围
target:
- logstore
- storeview
# 前置条件
prerequisites:
- requirement: 必须开启索引统计功能
description: SQL分析依赖索引配置,SQL中使用到的字段必须在索引配置中开启统计功能
scope: logstore级别
note: 索引只对配置后写入的数据生效,历史数据需重建索引
# 语法说明
syntax:
base: "查询语句 | 分析语句"
description: SQL分析需要跟在索引查询语句之后,使用竖线(|)分割
sql_dialect: Presto SQL
analysis_statement:
description: 分析语句对查询结果或全量数据进行计算和统计
rules:
- 分析语句必须与查询语句一起使用,查询语句和分析语句以竖线 | 分割
- 分析语句默认分析当前 Logstore 中的数据,不需要填写 FROM 子句和 WHERE 子句
- 分析语句不支持使用 offset,不区分大小写,末尾不需要加分号
- 执行分析操作后,默认最多返回 100 行数据,最大返回 100MB 的数据
aliyun_doc: https://help.aliyun.com/zh/sls/query-and-analyze-logs-in-index-mode
notes:
- 索引查询语法参考 `indexSearch.yaml`
- 支持标准SQL语法和丰富的SQL函数
- 函数列表参考 `../functions/overview.yaml`
- 在logstore中执行的时候,表名默认是log,表示当前logstore。所以一般都写from log
- sql后不能再接SPL语句,反之亦然
- sls的presto sql版本暂时不支持在lambda中使用try, 比如 transform(array[1, 2, 3], x -> try(cast(x as BIGINT))) 这样是不支持的, 可以使用try_cast代替try, 比如 transform(array[1, 2, 3], x -> try_cast(x as BIGINT))
# 时间范围选择
time_range:
description: 时间范围不在SQL中指定,而是通过接口参数或控制台界面指定
methods:
# SDK方式
- method: SDK调用
implementation: 在GetLogs接口中通过from_time和to_time参数指定时间范围
example: |
# Python示例
response = client.get_logs(
project=project_name,
logstore=logstore_name,
from_time=1609459200, # 起始时间(Unix时间戳,秒)
to_time=1609545600, # 结束时间(Unix时间戳,秒)
query="status: 200 | SELECT count(*) as pv"
)
note: 不需要在SQL的WHERE子句中再次过滤__time__字段
# 控制台方式
- method: 控制台使用
implementation: 在查询分析页面的时间选择器中直接选择时间范围
steps:
- 打开Logstore的查询分析页面
- 在页面顶部的时间选择器中选择时间范围(相对时间或绝对时间)
- 输入查询分析语句,无需在SQL中添加时间过滤条件
note: 控制台会自动将选择的时间范围传递给后端
important_notes:
- SQL中的__time__字段已经是过滤后的结果,代表日志的时间戳
- 如需在SQL中对时间进行分组、格式化等操作,使用date_format等函数
- 不建议在SQL WHERE子句中过滤__time__,这样会降低查询效率
# 费用说明
pricing:
# 普通模式
- mode: standard
name: 普通SQL分析
cost: 免费
limitations:
concurrent: 15 # 单个Project最大并发数
timeout: "55秒"
description: 默认开启,适用于小规模数据分析
# SQL独享版(增强模式)
- mode: enhanced
name: SQL独享版(增强模式)
cost: 按CPU时间计费
billing: "CPU时间(小时)× 单价(0.35元/核·小时)"
limitations:
concurrent: 100 # 单个Project最大并发数
timeout: "55秒"
description: 高性能与高并发,资源弹性伸缩
scenarios:
- 实时数据分析,对性能要求高
- 长周期数据分析(如月维度)
- 超大规模数据分析(千亿行级别)
- 高并发场景(SQL并发数大于15)
# SQL完全精确模式
- mode: accurate
name: SQL完全精确
cost: 按CPU时间计费
billing: "CPU时间(小时)× 单价(0.35元/核·小时)"
limitations:
concurrent: 5 # 单个Project最大并发数
timeout: "55秒"
description: 零误差保证,确保数据完整加载
scenarios:
- 业务监控告警(关键指标)
- 严肃分析场景(财务、营收、安全审计)
- 在线数据服务,要求结果完全精确
features:
# SCAN模式
- name: SCAN模式
description: 支持对未建索引的字段进行SQL分析
syntax: "查询语句 | set session mode = scan; SQL分析语句"
example: "* | set session mode = scan; SELECT count(1) as pv, api FROM log GROUP BY api"
limitations:
- 单Shard限制分析50万条日志
- 扫描总行数限制1000万条(查询过滤后)
- 所有字段视为varchar类型,需使用cast转换
- 查询语句部分仍依赖索引
- 不支持 *|select * 语句
- 性能相对较慢,适用于百万级别数据
pricing:
cost: 免费
notes:
- 已有索引的字段建议放在竖线前作为过滤条件
- 建议缩小时间范围减少扫描数据量
- 大规模数据建议使用索引分析模式
# SQL增强
- name: SQL增强(SQL独享版)
description: 对于数据规模较大的logstore,可开启SQL增强提升执行速度
enable_methods:
# Project级别开启
- level: project
steps:
- 在SLS控制台首页点击小房子图标,进入项目首页
- 找到"SQL独享版CU数"设置
- 设置对应的CU数(计算单元)
note: 全局生效,所有查询都会使用
# 单次查询开启
- level: query
method: 在SLS控制台查询分析框中开启SQL独享版开关
note: 仅对当前查询生效
advantages:
- 单节点处理能力达2GB
- 支持最多100个并发
- 资源弹性伸缩
- 秒级响应千亿级数据分析
# 整体限制
limitations:
common:
- item: 字段值大小
value: "默认2KB(2048字节),最大16KB(16384字节)"
note: 超出部分不参与分析,可在索引配置中调整
- item: 返回结果
value: "默认最多返回100行"
extended: "使用LIMIT最多返回100万行,最大输出数据量20GB"
- item: 超时时间
value: "55秒"
note: SQL分析最大执行时间
- item: Double类型精度
value: "最多52位"
note: 超过52位会造成精度损失
- item: 数据生效机制
value: "只对开启统计功能后写入的数据生效"
solution: 历史数据需重建索引
by_mode:
standard:
concurrent: 15
scan_data: "400MB(不含缓存)"
accuracy: "超出部分截断,标记为不精确"
enhanced:
concurrent: 100
scan_data: "2GB(不含缓存)"
accuracy: "超出部分截断,标记为不精确"
accurate:
concurrent: 5
scan_data: "无限制"
accuracy: "完全精确,零误差"
# 使用示例
examples:
- query: "status: 200 | SELECT count(*) as pv FROM log"
description: 统计状态码为200的日志数量
mode: standard
- query: "* | SELECT status, count(*) as cnt FROM log GROUP BY status ORDER BY cnt DESC LIMIT 10"
description: 统计各状态码出现次数,取前10
mode: standard
- query: "* | set session mode = scan; SELECT api, count(*) as pv FROM log GROUP BY api"
description: 使用SCAN模式对未建索引的api字段进行分析
mode: scan
- query: "status: 500 | SELECT date_format(__time__, '%Y-%m-%d %H:00:00') as time, count(*) as error_count FROM log GROUP BY time ORDER BY time"
description: 按小时统计错误日志数量(适合大数据量)
mode: enhanced
note: 建议开启SQL增强以提升性能
# 最佳实践
best_practices:
- 尽量在|前提前过滤好数据,减少需要分析的数据量
- 大规模或高并发场景开启SQL增强模式
- 关键业务指标分析使用完全精确模式
- 合理使用LIMIT控制返回结果数量
- 建议为常用查询字段配置索引并开启统计
- 时间范围通过接口参数或控制台选择器指定,不要在SQL的WHERE子句中过滤__time__
- 如需对时间进行操作,使用date_format、date_trunc等时间函数
- 正则表达式中的反斜杠不需要双重转义,比如\d 直接写 \d即可
FILE:references/ram-policies.md
# RAM Policies - SLS Query Analysis
## Required Permissions
The following RAM permissions are required to execute the APIs used by this skill:
| API | CLI Command | API Action | Resource Permission | Description |
|-----|-------------|------------|---------------------|-------------|
| GetLogsV2 | `get-logs-v2` | `log:GetLogStoreLogs` | `acs:log:{#regionId}:{#accountId}:project/{#ProjectName}/logstore/{#LogstoreName}` | Query logs from a Logstore using index query, SQL analysis, or SPL pipelines |
| GetIndex | `get-index` | `log:GetIndex` | `acs:log:{#regionId}:{#accountId}:project/{#ProjectName}/logstore/{#LogstoreName}` | Get the index configuration of a Logstore, used to verify index settings before running queries or analysis |
Placeholders in the resource ARN:
- `{#regionId}` -- SLS region, e.g. `cn-hangzhou`. Use `*` to match all regions.
- `{#accountId}` -- Alibaba Cloud account ID (UID). Use `*` to match all accounts.
- `{#ProjectName}` -- SLS Project name. Use `*` to match all projects.
- `{#LogstoreName}` -- Logstore name. Use `*` to match all Logstores under the project.
In the examples below, `regionId` and `accountId` are left as `*` for simplicity. Replace `<project-name>` and `<logstore-name>` with your real values, or use `*` to broaden the scope.
## Minimum RAM Policy
Use this policy when only `get-logs-v2` is used.
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"log:GetLogStoreLogs"
],
"Resource": "acs:log:*:*:project/<project-name>/logstore/<logstore-name>"
}
]
}
```
## Complete RAM Policy
Use this policy when both `get-logs-v2` and `get-index` are used.
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"log:GetLogStoreLogs",
"log:GetIndex"
],
"Resource": "acs:log:*:*:project/<project-name>/logstore/<logstore-name>"
}
]
}
```
**Example: Allow All Logstores Under a Project**
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"log:GetLogStoreLogs",
"log:GetIndex"
],
"Resource": "acs:log:*:*:project/my-project/logstore/*"
}
]
}
```
**Example: Allow All Projects**
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"log:GetLogStoreLogs",
"log:GetIndex"
],
"Resource": "acs:log:*:*:project/*"
}
]
}
```
## Recommended System Policy
| Policy Name | Description |
|-------------|-------------|
| `AliyunLogReadOnlyAccess` | Read-only access to all Alibaba Cloud Log Service (SLS) resources |
## Principle of Least Privilege
It is recommended to select permissions based on actual needs following the principle of least privilege:
1. **Action Scope**: Prefer the Minimum RAM Policy when your use case only involves querying logs, and grant `log:GetIndex` only when your workflow needs to read index configuration.
2. **Resource Scope**: Prefer to narrow the `Resource` field to specific projects and Logstores whenever possible.
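
The two scoping choices above can be combined in a small generator. The following is a sketch only: `build_policy` is a hypothetical helper that emits the same JSON shape as the policies in this document, with `regionId` and `accountId` left as `*` as in the examples.

```python
import json


def build_policy(project: str, logstore: str = "*", include_get_index: bool = False) -> dict:
    """Build a read-only RAM policy scoped to one project and Logstore."""
    actions = ["log:GetLogStoreLogs"]
    if include_get_index:
        actions.append("log:GetIndex")  # only when index configuration must be read
    return {
        "Version": "1",
        "Statement": [{
            "Effect": "Allow",
            "Action": actions,
            "Resource": f"acs:log:*:*:project/{project}/logstore/{logstore}",
        }],
    }


# Minimum policy for a single Logstore; names are placeholders.
policy = build_policy("my-project", "my-logstore")
print(json.dumps(policy, indent=2))
```

Defaulting `include_get_index` to `False` keeps the narrower policy as the path of least effort.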
FILE:references/regions.md
# Region & Endpoint Configuration
## Region
By default, the region is read from the active profile configured via `aliyun configure`. To override it for a single command, add `--region <region-id>`:
```bash
aliyun sls get-logs-v2 --region cn-shanghai --project my-project ...
```
## Endpoint
By default, the CLI accesses SLS through the **public endpoint**. To use the **internal (intranet) endpoint**, specify `--endpoint` to override the default endpoint generated from `region`:
```bash
aliyun sls get-logs-v2 --endpoint cn-hangzhou-intranet.log.aliyuncs.com --project my-project ...
```
SLS internal endpoint format: `<region-id>-intranet.log.aliyuncs.com`
Examples:
- `cn-hangzhou-intranet.log.aliyuncs.com`
- `cn-shanghai-intranet.log.aliyuncs.com`
- `us-west-1-intranet.log.aliyuncs.com`
**`--endpoint` takes priority over `--region`.** When `--endpoint` is set, `--region` is ignored.
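
As a minimal sketch of the rules above, the helpers below derive an internal endpoint from a region ID and pick which flag to pass. `internal_endpoint` and `target_flags` are hypothetical names introduced here for illustration; only the endpoint format and the `--region` / `--endpoint` flags come from this document.

```python
from typing import List, Optional


def internal_endpoint(region_id: str) -> str:
    """Format the SLS internal (intranet) endpoint for a region."""
    return f"{region_id}-intranet.log.aliyuncs.com"


def target_flags(region: Optional[str] = None, endpoint: Optional[str] = None) -> List[str]:
    """Return the CLI flags that select where the request is sent."""
    if endpoint:
        return ["--endpoint", endpoint]  # --region would be ignored anyway
    if region:
        return ["--region", region]
    return []  # fall back to the active profile


print(internal_endpoint("cn-hangzhou"))
print(target_flags(region="cn-shanghai", endpoint=internal_endpoint("cn-hangzhou")))
```

Emitting only one of the two flags makes the precedence explicit instead of relying on the reader to remember it.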
## When to use the internal endpoint
If the public endpoint is unreachable (timeout, connection refused), try switching to the internal endpoint. This is common when running inside an Alibaba Cloud VPC.
## Common error: ProjectNotExist
If a call returns `ProjectNotExist`, the project likely exists in a different region. Ask the user to confirm the correct region or endpoint for their project.
FILE:references/related-apis.md
# Related APIs - SLS Query & Analysis
This document is an API / CLI reference for the SLS query/analysis skill. It covers two
read-only endpoints: `GetLogsV2` (run a query or SQL/SPL analysis) and `GetIndex` (inspect
the index configuration of a Logstore). For usage patterns and worked examples, see the
other docs in this skill.
## Command List
| CLI Command | API Action | Description | Documentation |
|-------------|------------|-------------|---------------|
| `aliyun sls get-logs-v2` | GetLogsV2 | Run a query / SQL / SPL statement against a Logstore and return log rows or aggregated results | [Doc](https://help.aliyun.com/zh/sls/developer-reference/api-sls-2020-12-30-getlogsv2) |
| `aliyun sls get-index` | GetIndex | Read the current index configuration of a Logstore (full-text index, field indexes, JSON keys, TTL, log-reduce) | [Doc](https://help.aliyun.com/zh/sls/developer-reference/api-sls-2020-12-30-getindex) |
> **Deprecated — do not use**: `GetLogs` (`aliyun sls get-logs`). Use `GetLogsV2` instead.
---
## GetLogsV2
`POST /logstores/{logstore}/logs` — invoke via `aliyun sls get-logs-v2`.
### Input — Required
| Parameter | Type | CLI Flag | Description |
|-----------|------|----------|-------------|
| Project | string | `--project` | Project name. |
| Logstore | string | `--logstore` | Logstore name. |
| From | integer | `--from` | Start time, Unix timestamp in **seconds**. Inclusive. |
| To | integer | `--to` | End time, Unix timestamp in **seconds**. Exclusive. `from == to` is rejected. |
### Input — Optional
| Parameter | Type | CLI Flag | Description |
|-----------|------|----------|-------------|
| Query | string | `--query` | Query or query + SQL/SPL statement. Omit or use `*` to match all logs. |
| Topic | string | `--topic` | Log topic filter. Default empty. |
| Line | integer | `--line` | Max rows per response, `0–100`, default `100`. Query-mode only (ignored when SQL is present; use `LIMIT` instead). |
| Offset | integer | `--offset` | Starting row for pagination, default `0`. Query-mode only. |
| Reverse | boolean | `--reverse` | `true` = newest first, `false` (default) = oldest first. Query-mode only; in SQL mode use `ORDER BY`. |
| PowerSQL | boolean | `--power-sql` | Enable Dedicated SQL. Default `false`. Equivalent to prepending `set session parallel_sql=true;` to the SQL. |
| Session | string | `--session` | Session hints, e.g. `mode=scan` to force SPL / scan mode. |
| Forward | boolean | `--forward` | Scan / phrase-query paging direction. Default `false`. |
| IsAccurate | boolean | `--is-accurate` | Enable nanosecond-level ordering. |
| Accept-Encoding | string | `--accept-encoding` | Wire compression format (`lz4` / `gzip`). Affects only transport, not the decoded result; the CLI handles decompression automatically. You normally do not need to set this. |
### Output
| Field | Meaning |
|-------|---------|
| `meta.progress` | `Complete` or `Incomplete`. **Always check**: `Incomplete` means the result is partial and the same request should be retried until it returns `Complete`. |
| `meta.count` | Number of rows returned in this response. |
| `data` | Array of `{ key: value }` objects — one log row per entry. `__time__` is seconds. |
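
The "always check `meta.progress`" rule can be sketched as a retry loop. This is an illustration under stated assumptions: `fetch_until_complete` is a hypothetical helper, `fetch` stands in for whatever actually issues the request, and the response is assumed to be already parsed into a dict with the `meta` / `data` fields from the table above. The stub below returns canned responses so the example runs without network access.

```python
import time
from typing import Callable, Dict


def fetch_until_complete(fetch: Callable[[], Dict], max_attempts: int = 5,
                         delay_seconds: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Repeat the same request until meta.progress is Complete."""
    for attempt in range(1, max_attempts + 1):
        response = fetch()
        if response.get("meta", {}).get("progress") == "Complete":
            return response
        if attempt < max_attempts:
            sleep(delay_seconds)  # brief pause before retrying the same request
    raise RuntimeError(f"result still Incomplete after {max_attempts} attempts")


# Stand-in for a real call: returns Incomplete twice, then Complete.
_responses = iter([
    {"meta": {"progress": "Incomplete", "count": 1}, "data": [{"pv": "10"}]},
    {"meta": {"progress": "Incomplete", "count": 1}, "data": [{"pv": "17"}]},
    {"meta": {"progress": "Complete", "count": 1}, "data": [{"pv": "42"}]},
])
result = fetch_until_complete(lambda: next(_responses), sleep=lambda _s: None)
print(result["data"])
```

Bounding the number of attempts avoids looping forever when a query never completes within its limits; in that case narrowing the time range or the filter is usually the next step.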
---
## GetIndex
`GET /logstores/{logstore}/index` — invoke via `aliyun sls get-index`.
### Input — Required
| Parameter | Type | CLI Flag | Description |
|-----------|------|----------|-------------|
| Project | string | `--project` | Project name. |
| Logstore | string | `--logstore` | Logstore name. |
### Output
| Field | Meaning |
|-------|---------|
| `ttl` | Index TTL in days. |
| `max_text_len` | Max indexed field length (64–16384 bytes, default 2048). |
| `index_mode` | Index type, typically `v2`. |
| `line` | Full-text index configuration (`chn`, `caseSensitive`, `token`, `include_keys`, `exclude_keys`). Absence means full-text index is disabled. |
| `keys` | Field-index map: key = field name, value = `IndexKey` with `type` (`text` / `long` / `double` / `json`), `chn`, `caseSensitive`, `token`, `alias`, `doc_value` (enables SQL statistics), `json_keys` (for nested JSON fields). |
| `log_reduce` | Whether log clustering (LogReduce) is enabled. |
| `log_reduce_white_list` / `log_reduce_black_list` | Fields included/excluded from clustering. |
| `lastModifyTime` | Last index modification time (Unix seconds). |
| `storage` | Storage type, fixed to `pg`. |
## Common Errors
| HTTP Status | ErrorCode | Meaning |
|-------------|-----------|---------|
| 404 | `ProjectNotExist` | Project name is wrong or in a different region. |
| 404 | `LogStoreNotExist` | Logstore name is wrong. |
| 404 | `IndexConfigNotExist` | No index configured on this Logstore — any query/SQL will fail. |
| 500 | `InternalServerError` | Retry with backoff. |
---
## Reference Documentation
| Document | Description |
|----------|-------------|
| [GetLogsV2 API](https://help.aliyun.com/zh/sls/developer-reference/api-sls-2020-12-30-getlogsv2) | Official API reference for the query/analysis endpoint |
| [GetIndex API](https://help.aliyun.com/zh/sls/developer-reference/api-sls-2020-12-30-getindex) | Official API reference for reading index configuration |
| [Query syntax overview](https://help.aliyun.com/document_detail/43772.html) | Index query language (full-text, field filter, boolean operators) |
| [SQL analytics overview](https://help.aliyun.com/document_detail/53608.html) | SLS SQL grammar, functions, and pagination |
| [SPL overview](https://help.aliyun.com/zh/sls/user-guide/spl-overview) | Scan Processing Language for row-level transforms and scan queries |
| [General API error codes](https://help.aliyun.com/document_detail/29013.html) | Shared SLS error-code table |
| [Aliyun CLI — SLS plugin](https://github.com/aliyun/aliyun-cli) | CLI source and release notes |
FILE:references/spl/extend.yaml
name: extend
category: 结构化数据SQL计算指令
description: 通过SQL表达式计算结果产生新字段
grammar: extend <new_field1> = <expression1> [, <new_field2> = <expression2>, ...]
example: |
* | extend status_num = cast(status as BIGINT)
* | extend request_time = from_unixtime(__time__)
* | extend is_error = if(cast(status as BIGINT) >= 500, 'yes', 'no')
scenarios:
- 在所有场景中都适用
note:
- 表达式中可以使用SQL函数,参考 `../functions/overview.yaml`
- 如果新字段名与已有字段名相同,会覆盖原字段
- 支持在一条extend指令中定义多个新字段
FILE:references/spl/json_string_process.yaml
json_process:
- name: json_extract(x, json_path)
input:
- x: STRING 或者 JSON
- json_path: STRING
return: JSON
description: 从JSON对象或JSON数组中提取一组JSON值(数组或对象)。
- name: json_size(x, json_path)
input:
- x: STRING
- json_path: STRING
return: BIGINT
description: 计算JSON对象或数组中元素的数量。
- name: json_parse(x)
input:
- x:STRING
return: JSON
description: 把字符串类型转换为JSON类型。
- name: json_format(x)
input:
- x: JSON
return: STRING
description:
- 把JSON类型转换为字符串类型
- name: json_extract_scalar(x, json_path)
input:
- x: STRING
- json_path: STRING
return: STRING
description: 从JSON对象或JSON数组中提取标量值。
- name: json_array_length(x)
input:
- x: STRING
return: BIGINT
description: 计算JSON数组中元素的数量。
- name: json_array_contains(x, value)
input:
- x: STRING
- value: STRING
return: BOOLEAN
description: 判断JSON数组中是否包含指定值。
note:
- name: json_extract和json_extract_scalar的区别
description:
- json_extract函数的返回值是JSON类型,json_extract_scalar函数的返回值是varchar类型。
- json_extract函数可以解析JSON对象中任意一块子结构,json_extract_scalar函数只解析标量类型(字符串、布尔值或者整型值)的叶子节点,返回对应的字符串。
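- name: How to write json_path
description:
- Dictionary access generally uses $.key, which starts at the root node and walks down to the node whose key is key. If the key name contains a period, use brackets [] instead of the period and wrap the field name in double quotes (""), for example $["http.path"].
- Array access generally uses $[index], which extracts the element at the given index. Wildcard access with * is not supported
- The root node is written as $.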
- 例子
- 查看用户第一个订单的金额。 * | extend x=json_extract_scalar(request, '$.param.orders[0].payment')
- 查看用户第一个订单中购买的第二件商品。 * | extend x=json_extract_scalar(request, '$.param.orders[0].items[1].name')
- 取字典中的数组。 * | extend x='{"input" : {"messsage": ["a","b"]}}' | project x | extend y=json_extract(x, '$.input.messsage')
- 取数组中的第二个元素。 * | extend x='["a","b","c"]' | project x | extend y=json_extract(x, '$[1]')
- 取数组中的值,并继续取字典的值 * | extend x='["a","b",{"msg" : "hello"}]' | project x | extend y=json_extract(x, '$[2].msg')
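- name: How to analyze arrays inside JSON
description:
- Make full use of the json and array<json> types, and combine them with array functions for complex processing.
- Typically, extract the JSON object with json_extract, cast it to array<json>, and then apply transform operations.
- If needed, convert the resulting array<json> back to a JSON object and then turn it into a string with json_format
- To branch on the type of a JSON value, combine case when with try_cast; type-checking functions such as json_type and typeof are not supported
- Examples
- Count the orders in a user's request. * | extend x=json_extract(request, '$.param.orders') | extend y=cardinality(cast(x as array(json)))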
FILE:references/spl/limit.yaml
name: limit
category: 聚合指令
description: 限制查询结果返回的日志行数
grammar: limit <number>
example: |
* | project host,client_ip,status | limit 100
* | where cast(status as BIGINT) >= 500 | project host,client_ip,status | sort __time__ desc | limit 10
scenarios:
- 扫描查询
note:
- 用于控制返回数据量,防止结果过大
- 通常与sort配合使用获取TopN结果
- limit实际被当成聚合指令来执行,对于聚合指令执行的时候要求最终列确定
- 以下指令可以让列确定
- stats
- project
- 比如 `* | limit 100` 这样是不合适的,由于limit后的列不固定无法执行会报错
- 正确的写法是 `* | project host,client_ip,status | limit 100`
FILE:references/spl/overview.yaml
# 使用场景
scenarios:
- name: Logtail采集
description: Logtail采集时使用SPL指令处理日志
- name: 写入处理器
description: 写入处理器时使用SPL指令处理数据
- name: 实时消费
description: 实时消费时使用SPL指令处理数据
- name: 扫描查询
description: 扫描查询时使用SPL指令处理数据
grammar:
- name: SPL 基本语法
description: <data-source> | <spl-cmd> | <spl-cmd> | ...
note:
- SPL语句是多级数据处理语句,通过英文管道符(|)连接,以英文分号(;)作为语句结束符
- 指令(command)是 SPL 中的基本单位,每个指令可以有参数和表达式
- 在SPL中数据源一般使用星号(*),表示所有数据作为输入
- 管道前面指令的输出会作为管道后面指令的输入
- 单条语句或多语句中最后一条的结束符(;)可选
- 关于转义说明,SPL不支持转义处理,所以正则表达式中的反斜杠不需要双重转义,比如\d 直接写 \d即可
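limitations:
- About <data-source>: except in the scan query scenario, the only allowed value is *
- Scan query mode is case-insensitive; the other scenarios are case-sensitive
- An SQL statement cannot follow an SPL pipeline, and vice versa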
- name: 数据源说明
description: 数据源(data-source)包括SLS Logstore以及SPL定义的数据集
note:
- 在不同场景中数据源定义存在差异
- 除了扫描查询场景,其他场景都使用*(星号)表示全部原始数据,字段名大小写敏感
- 在扫描查询场景下 <data-source> 可以写符合 `../query_analysis/indexSearch.yaml` 中 query 语法规则的查询语句
- name: 类型说明
note:
- 字段默认为VARCHAR类型,数值运算前必须用cast()或try_cast()转换
- 特殊情况:__time__ 和 __time__ns_part__ 字段默认按照bigint类型处理,如果要做数字运算,无需cast
- name: 语法符号
description: SPL语法中的特殊符号说明
symbols:
- symbol: "*"
description: "SLS Logstore数据作为SPL输入数据时的占位符"
- symbol: "|"
description: "SPL管道符,用于引出SPL指令表达式"
- symbol: ";"
description: "SPL语句结束标识"
- symbol: "'...'"
description: "字符串常量引用符号"
- symbol: '"..."'
description: "字段名称、字段名模式引用符号。当字段名包含字母、数字、下划线以外的特殊字符时必须使用双引号包裹"
- symbol: "--"
description: "注释单行内容"
- symbol: "$"
description: "命名数据集引用符,格式为$<dataset-name>"
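# SPL command index
commands:
  # Field operation commands
  - name: project
    file: project.yaml
    category: Field operation commands
    description:
      - Keeps the fields that match the given patterns and can rename specified fields
      - Unlike SQL select, project does not accept project *; an explicit field list is required
    applicable_scenarios: [all scenarios]
  - name: project-away
    file: project-away.yaml
    category: Field operation commands
    description: Removes the fields that match the given patterns and keeps all other fields
    applicable_scenarios: [all scenarios]
  - name: project-rename
    file: project-rename.yaml
    category: Field operation commands
    description: Renames the specified fields and keeps all other fields
    applicable_scenarios: [all scenarios]
  # Structured-data computation commands
  - name: where
    file: where.yaml
    category: Structured-data SQL computation commands
    description: Filters data with an SQL expression and keeps the entries that satisfy the condition
    applicable_scenarios: [all scenarios]
  - name: extend
    file: extend.yaml
    category: Structured-data SQL computation commands
    description:
      - Creates new fields from the results of SQL expressions
      - extend can produce fields with complex structures such as ARRAY, MAP, and JSON
      - For complex computations, consider Lambda expressions
    applicable_scenarios: [all scenarios]
  # Semi-structured data extraction commands
  - name: parse-json
    file: parse-json.yaml
    category: Semi-structured data extraction commands
    description: Extracts the first-level JSON fields from the specified field
    applicable_scenarios: [all scenarios]
  - name: parse-regexp
    file: parse-regexp.yaml
    category: Semi-structured data extraction commands
    description: Extracts information from the specified field with a regular expression
    applicable_scenarios: [all scenarios]
  - name: parse-csv
    file: parse-csv.yaml
    category: Semi-structured data extraction commands
    description: Extracts CSV-formatted information from the specified field
    applicable_scenarios: [all scenarios]
  - name: parse-kv
    file: parse-kv.yaml
    category: Semi-structured data extraction commands
    description: Extracts key-value pairs from the specified field
    applicable_scenarios: [all scenarios]
  - name: pack-fields
    file: pack-fields.yaml
    category: Semi-structured data extraction commands
    description: Packs one or more fields into a single field in JSON object format
    applicable_scenarios: [rule-based consumption]
  # Aggregation commands
  - name: stats
    file: stats.yaml
    category: Aggregation commands
    description:
      - Performs statistics, grouping, and aggregation on log data. Any expression or type conversion on an aggregated field must first be computed with an extend command
      - Note that stats takes the form stats field=agg_func(field)
    example: |
      * | stats total = count(*)
      * | extend latency_d = cast(latency as double) | stats avg_latency = avg(latency_d) by status
      * | extend bytes_n = cast(bytes as BIGINT) | stats cnt = count(*), total_bytes = sum(bytes_n) by method, status
      * | stats pv = count(*) by ip
    applicable_scenarios: [scan query]
  - name: sort
    file: sort.yaml
    category: Aggregation commands
    description: Sorts query results. sort has special usage requirements; see sort.yaml for details
    applicable_scenarios: [scan query]
  - name: limit
    file: limit.yaml
    category: Aggregation commands
    description: Limits the number of log rows returned. limit has special usage requirements; see limit.yaml for details
    applicable_scenarios: [scan query]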
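best_practices:
  - name: Performance optimization
    tips:
      - Filter with where as early as possible to reduce the data processed downstream
      - Use single quotes for string constants and double quotes for field names that contain special characters
  - name: Extracting and processing JSON strings
    file: json_string_process.yaml
    tips:
      - Use the json_format function to convert a JSON object to a string
  - name: Processing order
    tips:
      - "Recommended order: where filter -> parse -> extend -> project"
      - "In scan queries: where -> parse -> extend -> stats -> sort -> limit"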
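faq:
  - name: A regular expression matches in SQL but fails to match in SPL
    answer:
      - SPL uses the RE2 regex engine, which restricts some regex features. Features that RE2 does not support include backreferences (\1), lookaround such as (?<=...), and greedy-mode modifiers. Use RE2-compatible regular expression syntax
  - name: Does SPL support escape sequences in strings
    answer:
      - SPL treats strings as literals by default, that is, without escape processing. Cases that need escaping can be handled as follows
      - To express a character by its ASCII code, use the chr function in extend/where, for example chr(10) is the character with ASCII code 10, that is, \n
      - Use functions such as ascii_escape/ascii_unescape/unicode_unescape for escaping, as described in `../functions/string.yaml`
      - If the user writes `* | extend x=split(column, '\n')`, the \n is a literal and is not unescaped, so the correct form is `* | extend x=split(column, chr(10))`
  - name: How to delete a field in SPL
    answer:
      - Set the field's value to NULL, and the field is dropped from the final output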
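special_notes:
  - name: Field name notes
    tips:
      - Every scenario except scan query treats field names as case-sensitive by default
      - In the scan query scenario, field names are case-insensitive
      - Field names that contain special characters must be wrapped in double quotes
      - Characters that require double quotes include the colon (:), dot (.), hyphen (-), space, and any other character that is not a letter, digit, or underscore
      - 'Example of a field that needs double quotes: fieldA.x contains a dot, so write it as "fieldA.x". A dotted field may actually be a JSON field, as described under "Handling field names that contain a dot" below'
      - 'Example: correct "__tag__:__path__", incorrect __tag__:__path__'
      - System fields such as __time__, __source__, and __topic__ contain no special characters and need no double quotes
      - Tag fields have the form __tag__:<key>, and a <key> without special characters can be left unquoted, as in __tag__:hostname
      - However, the complete tag field name "__tag__:hostname" contains a colon, so wrapping it in double quotes is recommended in some commands for compatibility
      - Backslashes in regular expressions do not need to be doubled, for example \d is written as \d, not \\d
  - name: Handling field names that contain a dot
    tips:
      - If a field name contains a dot, the field may actually be a JSON string, with the dot separating a sub-key inside that JSON
      - In SLS index queries, such a sub-key may be directly queryable because a JSON sub-key index was configured separately
      - In SPL, extract the key first with functions such as json_extract/json_extract_scalar and then process it. parse-json is usually not needed
      - "Example: * | where json_extract_scalar(fieldA, '$.xxx') = 'value' (equivalent to the index query fieldA.xxx: 'value')"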
FILE:references/spl/pack-fields.yaml
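name: pack-fields
category: Semi-structured data extraction commands
description: Packs one or more fields into a single field in JSON object format
grammar: pack-fields -keep -ltrim -include=<includeRegex> -exclude=<excludeRegex> as <outputField>
params:
  - name: output
    type: String
    required: "yes"
    description: Name of the output field that holds the packed data. The field value is in JSON format.
  - name: include
    type: RegExp
    required: "no"
    description: Allowlist. Fields that match the regular expression are packed. Defaults to ".*", which matches all fields. For details, see the regular expression documentation.
  - name: exclude
    type: RegExp
    required: "no"
    description: Denylist (takes precedence over the allowlist). Fields that match the regular expression are not packed. Empty by default, which means no exclusion check is performed. For details, see the regular expression documentation.
  - name: ltrim
    type: String
    required: "no"
    description: Prefix to strip from the field names in the output.
  - name: keep
    type: Bool
    required: "no"
    description: |
      Whether to keep the packed source fields after packing.
      True: the packed source fields are kept in the output.
      False (default): the packed source fields are not kept in the output.
example: |
  * | pack-fields -include='\w+' as test
  * | pack-fields -keep -include='\w+' as test
  * | pack-fields as content -- packs all fields by default when nothing is specified
scenarios:
  - rule-based consumption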
FILE:references/spl/parse-csv.yaml
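name: parse-csv
category: Semi-structured data extraction commands
description: Extracts CSV-formatted information from the specified field
grammar: parse-csv <field> [options] as <field1>, <field2>, ...
example: |
  * | parse-csv content as name, age, city
  * | parse-csv data -delimiter='|' as col1, col2, col3
scenarios:
  - Applies to all scenarios
note:
  - Use -delimiter to set the delimiter, which defaults to a comma
  - Use -quote to set the quote character, which defaults to a double quote
  - Applies to all scenarios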
FILE:references/spl/parse-json.yaml
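name: parse-json
category: Semi-structured data extraction commands
description: Extracts the first-level JSON fields from the specified field
grammar: parse-json <field>
example: |
  * | parse-json message
  * | parse-json body | extend user_id = cast(user_id as BIGINT)
scenarios:
  - Applies to all scenarios
note:
  - 'Only first-level JSON fields are extracted. Each extracted field is named after its key in the JSON object, without a prefix, for example `{"msg" : "hello"}` yields a field named msg'
  - Extracted fields are added to the existing fields
  - If JSON parsing fails, no error is raised and processing continues
  - How does parse-json differ from * | extend x=json_extract(x, '$.key')?
  - parse-json extracts every first-level field of the JSON, whereas json_extract/json_extract_scalar and similar functions extract only the specified field
  - When the user clearly needs one specific field from the JSON, prefer json_extract/json_extract_scalar and similar functions over parse-json
  - For usage of json_extract and related functions, see `json_string_process.yaml`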
FILE:references/spl/parse-kv.yaml
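name: parse-kv
category: Semi-structured data extraction commands
description: Extracts key-value pairs from the specified field
grammar: parse-kv <field> [options]
example: |
  * | parse-kv message
  * | parse-kv content -pair_separator='&' -kv_separator='='
scenarios:
  - Applies to all scenarios
note:
  - Key-value pairs are separated by a space or a tab by default
  - A key and its value are separated by an equals sign (=) by default
  - Use -pair_separator and -kv_separator to set custom separators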
FILE:references/spl/parse-regexp.yaml
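name: parse-regexp
category: Semi-structured data extraction commands
description: Extracts information from the specified field with a regular expression
grammar: parse-regexp <field>, '<regexp>' [as <field1>, <field2>, ...]
example: |
  * | parse-regexp message, '(\d+\.\d+\.\d+\.\d+).*status:(\d+)' as ip, status
  * | parse-regexp content, 'Error: (.*)' as error_msg
scenarios:
  - Applies to all scenarios
note:
  - Uses RE2 regular expression syntax
  - The regular expression must be wrapped in single quotes
  - Backslashes in the regular expression do not need to be doubled, for example \d is written as \d
  - Use parentheses () to create capture groups, where each capture group maps to one output field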
FILE:references/spl/project-away.yaml
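name: project-away
category: Field operation commands
description: Removes the fields that match the given patterns and keeps all other fields
grammar: project-away <field1>, <field2>, ...
example: |
  * | project-away "__tag__:__path__", __source__
scenarios:
  - Applies to all scenarios
note:
  - Field-name wildcards are supported
  - Field names that contain special characters (such as a colon) must be wrapped in double quotes
  - Applies to all scenarios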
FILE:references/spl/project-rename.yaml
name: project-rename
category: Field operation commands
description: Renames the specified fields and keeps all other fields
grammar: project-rename <new_field1> = <old_field1>, <new_field2> = <old_field2>, ...
example: |
  * | project-rename status_code = status, uri = request_uri
scenarios:
  - Can be used in all SPL scenarios
note:
  - Only renames fields and does not delete any field
  - Applies to all scenarios
FILE:references/spl/project.yaml
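name: project
category: Field operation commands
description: Keeps the fields that match the given patterns and can rename specified fields
grammar: project <field1>, <field2>, ... [, <new_field> = <old_field>]
example: |
  * | project status, body
  * | project status, request_uri, time = __time__
scenarios:
  - Can be used in all SPL scenarios
note:
  - All field-retention expressions are executed first, followed by the rename expressions
  - Field names that contain special characters must be wrapped in double quotes, for example project "a:b:c"
  - The wildcard * is supported, for example project "__tag__:*"
  - Applies to all scenarios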
FILE:references/spl/sort.yaml
name: sort
category: Aggregation commands
description: Sorts query results
grammar: sort [<field1> [asc|desc]] [, <field2> [asc|desc]], ...
example: |
  * | stats cnt = count(*) by status | sort cnt desc
  * | extend status_num = cast(status as BIGINT) | project host,client_ip,status_num | sort status_num asc, __time__ desc
scenarios:
  - scan query
note:
  - Ascending order (asc) is the default
  - Multiple sort fields can be specified
  - sort is executed as an aggregation command, and aggregation commands require the final set of columns to be fixed
  - The column set can be fixed with a stats or project command
  - For example, `* | sort status` fails with an error because the columns after sort are not fixed
  - The correct form is `* | project host,client_ip,status | sort status`
FILE:references/spl/stats.yaml
name: stats
category: Aggregation commands
description: Performs statistics, grouping, and aggregation on log data
grammar: stats <output1> = <aggOperator1> [, <output2> = <aggOperator2>, ...] [by <group1>, <group2>, ...]
example: |
  * | stats total = count(*)
  * | extend latency_d = cast(latency as double) | stats avg_latency = avg(latency_d) by status
  * | extend bytes_n = cast(bytes as BIGINT) | stats cnt = count(*), total_bytes = sum(bytes_n) by method, status
  * | stats pv = count(*) by ip
scenarios:
  - scan query
supportedAggregators:
  - count
  - count_if
  - min
  - max
  - sum
  - avg
  - skewness
  - kurtosis
  - approx_percentile
  - approx_distinct
  - bool_and
  - bool_or
  - every
  - arbitrary
  - array_agg
note:
  - Use the equals sign (=) to name the output field; the as keyword is not used
  - Use the by clause for grouped statistics
  - A single stats command can use several aggregate functions separated by commas
  - By default the first 100 aggregation results are returned; combine with the limit command to get more
  - Aggregate functions in stats accept only fields, not expressions; compute any expression with an extend command first
  - If an aggregated field needs a type conversion, cast it in extend first and then use it in stats
FILE:references/spl/where.yaml
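name: where
category: Structured-data SQL computation commands
description: Filters data with an SQL expression and keeps the entries that satisfy the condition
grammar: where <boolean_expression>
example: |
  * | where status = '500'
  * | where cast(status as BIGINT) >= 500
  * | where status = '200' and method = 'GET'
scenarios:
  - Can be used in all SPL scenarios
note:
  - SQL functions can be used in the boolean expression, as listed in `../functions/overview.yaml`
  - Type conversions must be explicit, using the cast function
  - Strings must be wrapped in single quotes
  - Applies to all scenarios
  - Filter with where as early as possible to improve performance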
FILE:references/spl-guide.md
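# SPL Usage Guide
## 1. Where SPL applies
- This skill covers only scan-query SPL within query and analysis
- It does not cover non-query scenarios such as Logtail collection processing, ingest processors, real-time consumption, or data transformation
## 2. Basic syntax
```text
<data-source> | <spl-cmd> | <spl-cmd> | ...
```
- In query and analysis, the usual data source is `*`
- In a scan query, `<data-source>` can also be an index query statement
- When `<data-source>` is an index query statement, it filters the data first and passes the result to the SPL commands that follow
- SQL cannot follow an SPL pipeline, and vice versa
- The trailing semicolon is optional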
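## 3. Rules to remember when writing SPL
- Fields are `VARCHAR` by default
- Apply `cast()` / `try_cast()` before numeric comparison or computation
- Use single quotes for string constants
- Use double quotes for field names that contain special characters such as `.`, `:`, `-`, or spaces
- Outside scan queries, field names are case-sensitive by default
- SPL does no escape processing; `\n` is a literal by default and is not converted to a newline
- Regular expressions use RE2, which does not support backreferences or lookaround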
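The quoting rules above can be sketched as a small Python helper. This is a minimal illustration, not part of SLS or any SDK: `quote_field`, `spl_string`, and `where_equals` are hypothetical names, and the sketch assumes a field name is safe unquoted only when it consists of letters, digits, and underscores.

```python
import re

# Hypothetical helpers; they only encode the quoting rules listed above.
_PLAIN_FIELD = re.compile(r"[A-Za-z0-9_]+")


def quote_field(name: str) -> str:
    """Wrap a field name in double quotes when it has special characters."""
    if _PLAIN_FIELD.fullmatch(name):
        return name
    return f'"{name}"'


def spl_string(value: str) -> str:
    """Wrap a string constant in single quotes (SPL does no escape processing)."""
    if "'" in value:
        # Assumption: this sketch does not try to embed single quotes.
        raise ValueError("single quotes are not handled by this sketch")
    return f"'{value}'"


def where_equals(field: str, value: str) -> str:
    """Build a `where` stage that compares a field with a string constant."""
    return f"* | where {quote_field(field)} = {spl_string(value)}"


if __name__ == "__main__":
    print(where_equals("__tag__:__path__", "/var/log/app.log"))
    print(where_equals("status", "500"))
```

Comparing against a string constant keeps the sketch consistent with the `VARCHAR` default; a numeric comparison would still need `cast()` or `try_cast()` around the field.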
## 4. Recommended pipeline order
```text
where -> parse -> extend -> project
```
Commonly used for scan aggregation:
```text
where -> parse -> extend -> stats -> sort -> limit
```
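As a rough sketch of how the two orders above might be applied when assembling a statement programmatically, the helper below sorts stages into one combined order. `STAGE_ORDER` and `build_pipeline` are hypothetical names, and placing `project` before `stats` in the combined list is an assumption of this sketch rather than something the guide states.

```python
from typing import List, Tuple

# Assumed combined order: the general order first, then the scan-only stages.
STAGE_ORDER = ["where", "parse", "extend", "project", "stats", "sort", "limit"]


def _rank(command: str) -> int:
    """Map a command such as 'parse-json' to its position in STAGE_ORDER."""
    family = "parse" if command.startswith("parse-") else command
    return STAGE_ORDER.index(family)  # raises ValueError for unknown commands


def build_pipeline(stages: List[Tuple[str, str]], source: str = "*") -> str:
    """Join (command, arguments) pairs into one SPL statement in the assumed order."""
    ordered = sorted(stages, key=lambda stage: _rank(stage[0]))  # stable sort
    return " | ".join([source] + [f"{cmd} {args}" for cmd, args in ordered])


if __name__ == "__main__":
    print(build_pipeline([
        ("stats", "pv = count(*) by ip"),
        ("where", "status = '500'"),
        ("parse-json", "body"),
    ]))
```

Reordering is only safe when the stages are independent: a `where` that filters on a field produced by `parse` or `extend` has to stay after that stage, which this sketch does not detect.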
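## 5. Frequently used commands
### `where`
Filters data with a boolean expression.
```spl
* | where status = '500'
* | where cast(status as BIGINT) >= 500
```
### `extend`
Computes new fields; evaluate complex expressions here first.
```spl
* | extend latency_ms = try_cast(latency as BIGINT)
```
### `parse-json` / `parse-regexp` / `parse-csv` / `parse-kv`
Extract structured fields from semi-structured fields.
### `stats`
Performs aggregation.
```spl
* | stats pv = count(*) by ip
* | extend latency_ms = try_cast(latency as BIGINT) | stats avg_latency = avg(latency_ms) by api
```
Key restrictions:
- Aggregate functions in `stats` can only take fields directly, not expressions
- To compute `sum(cast(bytes as BIGINT))`, derive the field in `extend` first and then aggregate it
- Although `stats` can aggregate, in query and analysis prefer SQL for analytic statements by default, unless the user explicitly asks for SPL or is already working in a scan query or pipeline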
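The extend-then-aggregate restriction can be illustrated with a small generator. This is a sketch only: `stats_over_cast` and the derived field naming scheme (`<field>_<type>`) are assumptions made for the example.

```python
def stats_over_cast(agg: str, field: str, sql_type: str, group_by: str, output: str) -> str:
    """Emit `extend` + `stats` so the aggregate function only ever sees a plain field."""
    derived = f"{field}_{sql_type.lower()}"  # hypothetical naming for the cast result
    return (
        f"* | extend {derived} = try_cast({field} as {sql_type}) "
        f"| stats {output} = {agg}({derived}) by {group_by}"
    )


if __name__ == "__main__":
    # Instead of the rejected form: stats total_bytes = sum(cast(bytes as BIGINT)) by method
    print(stats_over_cast("sum", "bytes", "BIGINT", "method", "total_bytes"))
```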
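### `sort` / `limit`
Sort and truncate aggregation results.
## 6. When to prefer scan-query SPL
- A field is not indexed, but ad hoc analysis is needed
- Pipeline processing such as `parse` / `extend` / `stats` is needed
- More flexible filtering is needed in a scan scenario as a substitute for index queries
For routine aggregation on fields that already have indexes and statistics enabled, prefer ordinary SQL and do not switch to SQL SCAN by default.
If some filter conditions can be expressed as an index query, move them in front of the first pipe and append the SPL after them.
## 7. Handling dotted field names
If a field name contains `.`, first determine whether it is the index display form of a JSON sub-key rather than a real field name.
A common alternative:
```spl
* | where json_extract_scalar(fieldA, '$.xxx') = 'value'
```
## Local source documents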
- `./spl/overview.yaml`
- `./spl/where.yaml`
- `./spl/extend.yaml`
- `./spl/parse-json.yaml`
- `./spl/parse-regexp.yaml`
- `./spl/parse-csv.yaml`
- `./spl/parse-kv.yaml`
- `./spl/stats.yaml`
- `./spl/sort.yaml`
- `./spl/limit.yaml`
FILE:references/troubleshooting.md
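# Troubleshooting Query and Analysis
When the user reports "no results", "wrong results", or "SQL/SPL error", check in this order.
## 1. Time range
- First confirm that `--from` / `--to` cover the target time range
- Freshly written logs can be subject to an index-build delay of a few seconds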
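A quick way to rule out the time range is to compute the window explicitly. The sketch below assumes that `--from` / `--to` take Unix timestamps in seconds, which this document does not confirm; `query_window` and `index_lag_seconds` are hypothetical names for the illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def query_window(
    minutes: int,
    now: Optional[datetime] = None,
    index_lag_seconds: int = 0,
) -> Tuple[int, int]:
    """Return (from, to) as Unix seconds for the last `minutes` minutes.

    `index_lag_seconds` moves the window end back slightly so that logs
    still waiting for index construction are not expected in the result.
    """
    end = (now or datetime.now(timezone.utc)) - timedelta(seconds=index_lag_seconds)
    start = end - timedelta(minutes=minutes)
    return int(start.timestamp()), int(end.timestamp())


def covers(window: Tuple[int, int], event_ts: int) -> bool:
    """Check whether a log's timestamp falls inside the query window."""
    start, end = window
    return start <= event_ts <= end


if __name__ == "__main__":
    window = query_window(15, index_lag_seconds=10)
    print(f"--from {window[0]} --to {window[1]}")
```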
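## 2. Index configuration
- Is indexing enabled
- Does the target field have a field index
- Was the full-text index used by mistake, so that its tokenization rules differ from the field index
- Do Chinese logs need Chinese tokenization
- After an index change, was it wrongly assumed that historical data picks up the change automatically
## 3. Field types and statistics
- Range queries depend on `long` / `double`
- SQL analysis requires statistics to be enabled on the field
- Text fields with statistics have a default length limit, so long content may be truncated
- Query-type Logstores do not support statistics
## 4. Syntax issues
- In index queries, `AND` has higher precedence than `OR`
- A fuzzy query cannot be written as `*error`
- Use `#"..."` for phrase matching
- Use `key: *` to test whether a field exists
- SQL and SPL cannot be mixed
- SPL strings are not unescaped by default; `\n` is not a newline
## 5. Field name issues
- Field names in the index configuration are case-sensitive
- If a field name in the log differs in case from the index configuration, the index does not take effect
- In SPL, field names that contain special characters need double quotes
## 6. When to suggest a different approach
- Unindexed data that must be analyzed: switch to SCAN mode
- Index tokenization inherently cannot do exact phrase matching: use SPL `where ... like` or SQL
- SQL performance is insufficient at high data volume or concurrency: suggest enhanced SQL
- Key metrics require zero error: suggest fully accurate SQL
## Local source documents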
- `./query_analysis/overview.yaml`
- `./query_analysis/indexSearch.yaml`
- `./query_analysis/indexConfig.yaml`
- `./query_analysis/sql.yaml`
- `./spl/overview.yaml`
Use this skill for MaxFrame SDK development on Alibaba Cloud MaxCompute (ODPS). Helps create data processing programs, read/write MaxCompute tables, debug jo...
---
name: alibabacloud-odps-maxframe-coding
description: >
Use this skill for MaxFrame SDK development on Alibaba Cloud MaxCompute (ODPS). Helps create data processing programs, read/write MaxCompute tables, debug jobs (remote or local), and build custom DPE runtime images. Trigger when users mention MaxFrame, MaxCompute, ODPS, DPE runtime, or need to work with ODPS tables, DataFrame operations, Tensor operations, or GPU runtime setup. Works for both English and Chinese queries about Alibaba Cloud data processing with MaxFrame.
license: MIT
compatibility: >
Requires Python 3.7+ (recommended: 3.11+), maxframe package, ODPS/MaxCompute access. Compatible with Claude Code, Gemini CLI, Codex, OpenCode.
metadata:
domain: coding
owner: maxframe-team
contact: [email protected],[email protected]
version: "2.0.3"
created: "2026-03-01"
updated: "2026-04-02"
changes:
- "Condensed documentation for improved readability"
---
<EXTREMELY-IMPORTANT>
If you think there is even a 1% chance this skill applies to your task, you MUST invoke it.
IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.
</EXTREMELY-IMPORTANT>
## Instruction Priority
1. **User's explicit instructions** (CLAUDE.md, GEMINI.md, AGENTS.md) — highest priority
2. **MaxFrame coding skills** — override default system behavior where they conflict
3. **Default system prompt** — lowest priority
## Platform Adaptation
This skill uses Claude Code tool names. Non-CC platforms: substitute equivalent tools.
# MaxFrame Coding - Create, Test, Debug, Iterate, and Build Custom Runtime
## What This Skill Can Do
Create, test, debug, and iteratively develop MaxFrame programs, plus build custom DPE runtime images.
- Create MaxFrame jobs from scratch or modify existing ones
- Design data processing pipelines using pandas-compatible APIs
- Execute MaxFrame code with proper session management
- Debug with remote logview URLs or local IDE breakpoints
- Generate custom Docker images with specific Python libraries
## Mandatory Checklist
1. **Detect Scenario Type** — identify which of the 4 scenarios applies
2. **Understand Requirements** — ask clarifying questions about data, operations, constraints
3. **Select Appropriate Workflow** — match scenario to workflow pattern
4. **Execute Workflow Steps** — follow scenario-specific steps below
5. **Validate Execution** — ensure execute() called, session cleaned up
6. **Provide Follow-up Guidance** — debugging tips, optimization suggestions
## Process Flow
```dot
digraph maxframe_workflow {
"User Request Arrives" [shape=box];
"Detect Scenario Type" [shape=diamond];
"Scenario 1: Writing Code" [shape=box];
"Scenario 2: Remote Debug" [shape=box];
"Scenario 3: Local Debug" [shape=box];
"Scenario 4: Custom Runtime" [shape=box];
"Understand Requirements" [shape=box];
"Operator Selection Needed?" [shape=diamond];
"Use lookup_operator.py" [shape=box];
"Confirm with User" [shape=box];
"Implement Code/Config" [shape=box];
"Add Error Handling" [shape=box];
"Validate execute() Called" [shape=box];
"Validate Session Cleanup" [shape=box];
"Provide Guidance" [shape=doublecircle];
"User Request Arrives" -> "Detect Scenario Type";
"Detect Scenario Type" -> "Scenario 1: Writing Code" [label="new pipeline"];
"Detect Scenario Type" -> "Scenario 2: Remote Debug" [label="cluster testing"];
"Detect Scenario Type" -> "Scenario 3: Local Debug" [label="IDE breakpoints"];
"Detect Scenario Type" -> "Scenario 4: Custom Runtime" [label="custom image"];
"Scenario 1: Writing Code" -> "Understand Requirements";
"Scenario 2: Remote Debug" -> "Understand Requirements";
"Scenario 3: Local Debug" -> "Understand Requirements";
"Scenario 4: Custom Runtime" -> "Understand Requirements";
"Understand Requirements" -> "Operator Selection Needed?";
"Operator Selection Needed?" -> "Use lookup_operator.py" [label="yes"];
"Operator Selection Needed?" -> "Implement Code/Config" [label="no"];
"Use lookup_operator.py" -> "Confirm with User";
"Confirm with User" -> "Implement Code/Config";
"Implement Code/Config" -> "Add Error Handling";
"Add Error Handling" -> "Validate execute() Called";
"Validate execute() Called" -> "Validate Session Cleanup";
"Validate Session Cleanup" -> "Provide Guidance";
}
```
## Scenario Detection Logic
**Scenario 1: Writing MaxFrame Code**
- User wants to create new data processing pipeline
- User mentions reading from/writing to MaxCompute tables
- User asks for complete MaxFrame program
- Keywords: "create MaxFrame", "write MaxFrame code", "build pipeline", "process data with MaxCompute"
**Scenario 2: Remote Debug Mode**
- User wants to test with actual cluster resources
- User mentions job execution errors
- User asks for logview URLs
- User wants to diagnose execution failures
- Keywords: "debug MaxFrame job", "logview", "remote test", "execution error", "cluster testing"
**Scenario 3: Local Debug Mode**
- User wants to debug UDF functions iteratively
- User mentions IDE breakpoints (VSCode, PyCharm)
- User wants to test with sample data locally
- User wants fast iteration without network
- Keywords: "local debug", "IDE breakpoints", "debug UDF locally", "VSCode/PyCharm debug"
**Scenario 4: Create Custom Runtime Image**
- User needs Python libraries not in standard runtime
- User wants GPU-enabled runtime
- User mentions building custom DPE image
- Keywords: "custom runtime", "DPE runtime image", "GPU runtime", "install custom packages", "build Docker image"
## Core Rules
### 1. Use Public APIs Only
Use APIs from: `maxframe.dataframe`, `maxframe.tensor`, `maxframe.learn`, `maxframe.session`, `maxframe.udf`, `maxframe.config`
### 2. DO NOT Read Private .env Files
Load credentials programmatically with `dotenv.load_dotenv()`. Never open `.env` files directly with the Read tool.
### 3. Lazy Execution
MaxFrame uses lazy execution: operations only build a computation graph, and nothing runs until `.execute()` is called. **Always call `.execute()`.**
### 4. Session Management
Always create the session before any operation, and destroy it in a `finally` block so that cleanup runs even when execution fails.
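One way to make the `finally` cleanup hard to forget is a small context manager. This is a sketch under stated assumptions: `managed_session` is a hypothetical helper, not a MaxFrame API, and it relies only on the session object exposing `destroy()` as the examples in this skill do.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed_session(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create a session with `factory` and always destroy it on exit."""
    session = factory()
    try:
        yield session
    finally:
        # Runs on success and on error, mirroring the try/finally rule above.
        session.destroy()


# Usage with MaxFrame (assumes credentials are already configured):
#
#     from maxframe.session import new_session
#     with managed_session(new_session) as session:
#         result.execute()
```

Passing a factory rather than a ready session keeps creation and cleanup in one place, so an exception raised between the two cannot leak the session.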
### 5. Operator Selection with User Confirmation
Before implementing processing logic, confirm operator selection with user using `scripts/lookup_operator.py`.
## Red Flags
| Thought | Reality |
|---------|---------|
| "This is just a simple MaxFrame question" | Questions are tasks. Invoke the skill. |
| "I already know the MaxFrame API" | Skills have latest patterns. Use them. |
| "Let me just write the code directly" | Operator selection is MANDATORY. |
| "I can skip operator confirmation" | User confirmation is REQUIRED. |
## Scenario 1: Writing MaxFrame Code
### Workflow Steps
1. **Understand Requirements** — source/target tables, schema, partition filters, write mode, processing logic
2. **Operator Selection (MANDATORY)** — use `python scripts/lookup_operator.py search "<operation>"`, present options, get confirmation
3. **Implement Code** — session setup, read data, process with confirmed operators, write results, add execute(), cleanup in finally
4. **Add Error Handling** — wrap execute() in try/except, print logview URL on error
5. **Validate** — ensure execute() called, session.destroy() in finally, no hardcoded credentials
### Example Code Structure
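```python
import maxframe.dataframe as md
from maxframe.session import new_session
import dotenv
# Load ODPS credentials from the environment instead of hardcoding them
dotenv.load_dotenv()
session = new_session()
try:
    df = md.read_odps_table("source_table")
    result = df.groupby('column').agg({'value': 'sum'})
    # Lazy execution: nothing runs until .execute() is called
    md.to_odps_table(result, "target_table", overwrite=True).execute()
finally:
    # Always release the session, even if execution fails
    session.destroy()
```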
**See:** `references/common-workflow.md` for complete patterns.
## Scenario 2: Remote Debug Mode
### Workflow Steps
1. **Understand Requirements** — current code state, error messages, table names
2. **Add Logview Support** — session before operations, try/except around execute(), logview URL in except
3. **Provide Debugging Guidance** — explain logview usage, common error patterns
### Example Code Structure
```python
import maxframe.dataframe as md
from maxframe.session import new_session
session = new_session()
try:
    df = md.read_odps_table("table_name")
    result = df.groupby('region').agg({'sales': 'sum'})
    result.execute()
except Exception as e:
    # Print the logview URL so the failed job can be inspected remotely
    print(f"Error: {e}")
    print(f"Logview URL: {session.get_logview_address()}")
finally:
    session.destroy()
```
### Common Error Patterns
1. **Authentication Errors** — verify environment variables
2. **Table Not Found** — check table name and permissions
3. **Timeout Errors** — check logview, optimize query
4. **Type Mismatch** — check DataFrame dtypes
5. **SQL Errors** — review generated SQL in logview
**See:** `references/remote-debug-guide.md` for detailed solutions.
## Scenario 3: Local Debug Mode
### Workflow Steps
1. **Understand Requirements** — UDF logic, sample data schema, IDE preference
2. **Create Local Debug Setup** — session with `debug=True`, sample data with `md.DataFrame(pd.DataFrame(...))`
3. **Provide IDE Setup Guidance** — breakpoint setup, execution flow
### Example Code Structure
```python
import maxframe.dataframe as md
from maxframe.session import new_session
import pandas as pd
# debug=True runs locally so IDE breakpoints inside the UDF are hit
session = new_session(debug=True)
try:
    sample_data = pd.DataFrame({
        'user_id': ['u1', 'u2', 'u3'],
        'level': ['gold', 'silver', 'bronze'],
        'amount': [1000, 500, 100]
    })
    df = md.DataFrame(sample_data)
    def calculate_discount(row):
        # Set breakpoint here in IDE
        if row['level'] == 'gold':
            return row['amount'] * 0.1
        return row['amount'] * 0.02
    result = df.apply(calculate_discount, axis=1)
    result.execute()
finally:
    session.destroy()
```
**See:** `references/local-debug-guide.md` for complete guide.
## Scenario 4: Create Custom Runtime Image
Build custom Docker images through conversational guidance using best practices from reference guides.
### When to Create Custom Runtime
**Create when:** need Python libraries not in standard DPE runtime, GPU-enabled processing, specific Python version, custom system dependencies
**NOT needed when:** standard packages suffice, no GPU requirements
### Conversational Workflow
1. **Read Best Practices Guide** — `references/runtime-image-guides/README.md`
2. **Base Image Selection** — Ubuntu 22.04 (GPU/ML workloads) or Ubuntu 24.04 (modern development)
3. **Python Version Selection** — Python 3.11 (production), 3.10-3.12 (development), or all versions
4. **GPU Configuration** — CUDA 12.4 + PyTorch 2.6.0+cu124 (if ML workloads)
5. **Iterative Package Collection** — collect required packages, note version constraints
6. **Output Directory** — confirm where to create files
7. **Build Dockerfile Section-by-Section** — header, base setup, conda setup, GPU setup, packages, env config, verification
8. **Create Support Files** — README.md, .dockerignore, requirements.txt
9. **Provide Build and Test Instructions**
10. **MaxFrame Usage Example**
### Step-by-Step Guidance
**Step 1: Base Image Selection (AskUserQuestion)**
Present Ubuntu options with trade-offs:
```
Which Ubuntu version for your custom runtime?
A. Ubuntu 22.04 (Recommended for most cases)
- Stable, production-ready
- Excellent CUDA support (12.4, 12.1, 11.8)
- Widely tested ML libraries (PyTorch, TensorFlow)
- LTS until 2027
B. Ubuntu 24.04 (Modern/latest)
- Newer system packages
- Latest LTS (until 2029)
- Better for non-GPU workloads
- Python 3.12 integration
Recommendation:
- GPU/ML workloads → Ubuntu 22.04
- Modern development → Ubuntu 24.04
```
**Step 2: Python Version Selection (AskUserQuestion)**
```
Which Python versions?
A. Python 3.11 only (Recommended for production)
- Best performance
- Smallest image (~1 GB)
- Excellent package support
B. Python 3.10, 3.11, 3.12 (Development)
- Good compatibility
- Medium size (~2 GB)
- Recent versions
C. All versions 3.7-3.12 (Maximum flexibility)
- Largest image (~3-5 GB)
- Maximum compatibility
- Testing across versions
Recommendation:
- Production → Single version (3.11)
- Development → Recent versions (3.10-3.12)
```
**Step 3: GPU Configuration (AskUserQuestion)**
If user mentions GPU or ML packages:
```
Need GPU support?
A. Yes - GPU-enabled with CUDA 12.4 (Recommended)
- Install PyTorch 2.6.0+cu124
- CUDA toolkit 12.4
- Note: Requires Ubuntu 22.04 for best compatibility
B. No - CPU only
- Standard package installation
- Smaller image size
Recommendation: For ML/AI workloads, GPU support significantly improves performance.
```
**Compatibility Handling:**
If user selected Ubuntu 24.04 earlier and now requests GPU support:
- Explain: "Ubuntu 24.04 has limited CUDA support. Ubuntu 22.04 is recommended for GPU workloads."
- AskUserQuestion: "Should I use Ubuntu 22.04 instead for better GPU compatibility?" (Yes recommended)
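The compatibility rule above can be sketched as a pure function that returns a recommendation for the assistant to confirm with the user. `BaseImageChoice` and `recommend_base_image` are hypothetical names; the function only encodes the Ubuntu 24.04 plus GPU case described here and changes nothing without confirmation.

```python
from typing import NamedTuple, Optional


class BaseImageChoice(NamedTuple):
    ubuntu: str              # Ubuntu version to propose
    warning: Optional[str]   # text to show the user, or None when no change is proposed


def recommend_base_image(requested_ubuntu: str, needs_gpu: bool) -> BaseImageChoice:
    """Propose Ubuntu 22.04 when GPU support is requested on Ubuntu 24.04."""
    if needs_gpu and requested_ubuntu == "24.04":
        return BaseImageChoice(
            "22.04",
            "Ubuntu 24.04 has limited CUDA support. "
            "Ubuntu 22.04 is recommended for GPU workloads.",
        )
    return BaseImageChoice(requested_ubuntu, None)


if __name__ == "__main__":
    choice = recommend_base_image("24.04", needs_gpu=True)
    if choice.warning:
        # The switch is a suggestion: ask the user before applying it.
        print(choice.warning)
```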
**Step 4: Build Dockerfile Section-by-Section**
For each section:
- Read pattern from best practices guide
- Explain purpose and trade-offs
- Write section with inline comments
- Accumulate into complete Dockerfile
**Sections:**
1. **Header** — Image metadata, configuration summary
2. **Base setup** — FROM, apt packages, locales, timezone
3. **Conda setup** — Miniforge installation, environment creation
4. **GPU setup** — CUDA installation, PyTorch with CUDA (if applicable)
5. **Package installation** — User packages in multi-environment loops
6. **Environment config** — MF_PYTHON_EXECUTABLE, CONDA_DEFAULT_ENV, PATH
7. **Verification** — Health checks, Python version verification
**Step 5: Provide Build and Test Instructions**
```bash
# Build
docker build -t <image-tag> <output-dir>
# Test Python
docker run --rm <image-tag> conda run -n py311 python --version
# Test GPU (if applicable)
docker run --rm --gpus all <image-tag> python -c "import torch; print(torch.cuda.is_available())"
# Test packages
docker run --rm <image-tag> conda run -n py311 python -c "import transformers; print(transformers.__version__)"
# Push to registry
docker push <image-tag>
```
**Step 6: MaxFrame Usage Example**
```python
from maxframe.session import new_session
# odps_connection is an existing ODPS connection object created elsewhere
session = new_session(
    odps=odps_connection,
    image="your-registry/your-image:v1"
)
# Your MaxFrame operations here
```
### Default Recommendations
| Component | Recommendation |
|-----------|---------------|
| Base Image | Ubuntu 22.04 (production, GPU, ML) |
| Python | 3.11 (production), 3.10-3.12 (development) |
| GPU | Ubuntu 22.04 + CUDA 12.4 + PyTorch 2.6.0+cu124 |
### Critical Notes
**MaxFrame SDK NOT in Runtime Image:** SDK and pyodps are client-side only. Custom runtime needs user-specific packages (transformers, pandas, etc.).
**MF_PYTHON_EXECUTABLE (CRITICAL):** Always set: `ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/<env_name>/bin/python`
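As a minimal sketch of the environment-config section, the helper below renders the `ENV` lines for one conda environment. Only the `MF_PYTHON_EXECUTABLE` path pattern comes from the note above; `runtime_env_lines` is a hypothetical helper, and the `CONDA_DEFAULT_ENV` and `PATH` values are assumptions to check against the runtime image guides.

```python
from typing import List


def runtime_env_lines(env_name: str, prefix: str = "/py-runtime") -> List[str]:
    """Render Dockerfile ENV lines for a single conda environment."""
    env_dir = f"{prefix}/envs/{env_name}"
    return [
        # Path pattern taken from the critical note above.
        f"ENV MF_PYTHON_EXECUTABLE={env_dir}/bin/python",
        # Assumed values for the other two variables named in the section list.
        f"ENV CONDA_DEFAULT_ENV={env_name}",
        f"ENV PATH={env_dir}/bin:$PATH",
    ]


if __name__ == "__main__":
    print("\n".join(runtime_env_lines("py311")))
```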
### Best Practices Reference
**See:** `references/runtime-image-guides/` for detailed guides on base image selection, Python environment strategy, package management, GPU/CUDA configuration, Dockerfile templates, and testing/validation.
## Operator Selection Workflow
**MANDATORY before implementing processing logic** when user mentions specific operations, asks about efficiency/performance, or you need to find appropriate MaxFrame operator.
### Workflow
1. **Identify Operations** — list required transformations
2. **Find Operators** — `python scripts/lookup_operator.py search "<operation>"`
3. **Present Options** — show operator name, description, trade-offs
4. **Get User Confirmation** — confirm operator and parameters
5. **Implement** — use confirmed operator
**See:** `references/operator-selector.md` for detailed guidance.
## Key Validation Points
Before finishing, validate:
- [ ] `.execute()` called on result DataFrame
- [ ] Session created before operations
- [ ] Session destroyed in `finally` block
- [ ] No hardcoded credentials
- [ ] Operator selection confirmed with user
- [ ] Error handling with logview URL (remote)
- [ ] `debug=True` used (local debug)
- [ ] `MF_PYTHON_EXECUTABLE` set (custom runtime)
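Part of this checklist can be checked mechanically. The sketch below uses Python's standard `ast` module to look for three of the items in a script's source; `check_maxframe_script` is a hypothetical helper, and the credential check is a simple heuristic that only flags string literals passed as `access_id` or `secret_access_key` keyword arguments.

```python
import ast
from typing import Dict


def _has_method_call(node: ast.AST, method: str) -> bool:
    """True if any `<something>.<method>(...)` call appears under `node`."""
    return any(
        isinstance(n, ast.Call)
        and isinstance(n.func, ast.Attribute)
        and n.func.attr == method
        for n in ast.walk(node)
    )


def check_maxframe_script(source: str) -> Dict[str, bool]:
    """Statically check a few validation points; it does not run the script."""
    tree = ast.parse(source)
    destroy_in_finally = any(
        _has_method_call(stmt, "destroy")
        for node in ast.walk(tree)
        if isinstance(node, ast.Try)
        for stmt in node.finalbody
    )
    hardcoded = any(
        isinstance(n, ast.keyword)
        and n.arg in {"access_id", "secret_access_key"}
        and isinstance(n.value, ast.Constant)
        and isinstance(n.value.value, str)
        for n in ast.walk(tree)
    )
    return {
        "execute_called": _has_method_call(tree, "execute"),
        "destroy_in_finally": destroy_in_finally,
        "no_hardcoded_credentials": not hardcoded,
    }
```

The remaining items, such as operator confirmation with the user, are conversational and cannot be verified from source code alone.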
## Resources
### References
- **Operator Selector**: `references/operator-selector.md`
- **Local Debug**: `references/local-debug-guide.md`
- **Remote Debug**: `references/remote-debug-guide.md`
- **Complete Workflow**: `references/common-workflow.md`
- **Runtime Guides**: `references/runtime-image-guides/`
### Examples
- **Working Examples**: `assets/examples/*.py`
### Scripts
- **Operator Lookup**: `scripts/lookup_operator.py`
FILE:assets/examples/ai_function_basic.py
"""
Example: Basic AI function usage with MaxFrame ManagedTextLLM.
This example demonstrates the minimum setup for using MaxFrame's AI functions
with the managed LLM models. It shows how to perform basic Q&A tasks using
the built-in managed models without requiring external API keys.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
"""
import os
import dotenv
import maxframe.dataframe as md
from maxframe import options
from maxframe.learn.contrib.llm.models.managed import ManagedTextLLM
from maxframe.session import new_session
from odps import ODPS
# Load environment variables from .env file
# Replace with your actual .env file path or use environment variables directly
dotenv.load_dotenv()
# Configure SQL settings for AI functions
options.sql.settings = {
    "odps.sql.split.dop": '{"*":10}',
}
# Create ODPS connection
o = ODPS(
    access_id=os.getenv("ODPS_ACCESS_ID"),
    secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
    project=os.getenv("ODPS_PROJECT"),
    endpoint=os.getenv("ODPS_ENDPOINT"),
    user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
# Create MaxFrame session
session = new_session(o)
print(f"Session created: {session.session_id}")
try:
    # Create a DataFrame with questions
    df = md.DataFrame(
        {
            "query": [
                "What is the average distance between the Earth and the Sun?",
                "In which year did the American War of Independence begin?",
                "What is the boiling point of water?",
            ]
        }
    )
    df.execute()
    # Use ManagedTextLLM for inference
    # Available managed models: qwen2.5-0.5b-instruct, qwen2.5-1.5b-instruct,
    # qwen2.5-3b-instruct, Qwen2.5-7B-instruct, DeepSeek-R1-Distill-Qwen-1.5B, etc.
    llm = ManagedTextLLM(name="qwen2.5-1.5b-instruct")
    # Define prompt template
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "{query}"},
    ]
    # Generate responses
    result = llm.generate(df, prompt_template=messages)
    result.execute()
    # Display results
    print("AI Function Results:")
    print(
        result.response.mf.flatjson(
            ["$.choices[0].message.content"],
            dtypes=["str"],
        )
        .execute()
        .fetch()
    )
finally:
    # Clean up the session even if inference fails
    session.destroy()
    print("Session destroyed")
FILE:assets/examples/complex_struct.py
"""
Example: Processing complex structured data with groupby and apply_chunk.
This example demonstrates how to process complex data structures using
groupby operations with custom chunk processing.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
"""
import os
import dotenv
import maxframe.dataframe as md
import pandas as pd
from maxframe import options
from maxframe.dataframe.utils import parse_index
from maxframe.session import new_session
from odps import ODPS
from pandas.api.types import pandas_dtype
# Load environment variables
dotenv.load_dotenv()
# Configure SQL settings
options.sql.enable_mcqa = False
options.sql.settings = {
    "odps.namespace.schema": "true",
    "odps.sql.allow.fullscan": "true",
    "odps.sql.enable.distributed.limit": "true",
    "odps.session.image": "common",
    "odps.maxframe.resolve_dlf_tables": "true",
    "odps.sql.type.system.odps2": "true",
}
options.dag.settings = {
    "engine_order": ["MCSQL"],
    "unavailable_engines": ["DPE", "SPE"],
}
# Create ODPS connection
o = ODPS(
    access_id=os.getenv("ODPS_ACCESS_ID"),
    secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
    project=os.getenv("ODPS_PROJECT"),
    endpoint=os.getenv("ODPS_ENDPOINT"),
    user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
# Sample data
data = {
    "department": ["HR", "Tech", "HR", "Tech"],
    "salary": [6000, 8000, 7000, 9000],
    "experience": [3, 5, 4, 6],
}
try:
    df = md.DataFrame(pd.DataFrame(data))
    def process_each_department(chunk):
        """Process each department group."""
        print(chunk, flush=True)
        return pd.DataFrame(
            {"salary": [chunk["salary"].mean()], "experience": [chunk["experience"].mean()]}
        )
    # Apply chunk processing with groupby
    grouped = df.groupby(["department"], group_keys=True).mf.apply_chunk(
        process_each_department,
        output_type="dataframe",
        dtypes=pd.Series(
            [pandas_dtype("float64"), pandas_dtype("float64")],
            index=["salary", "experience"],
        ),
        skip_infer=True,
        index=parse_index(pd.MultiIndex.from_product([[""], [0]])),
    )
    result = grouped.execute()
    print("Result:")
    print(result)
finally:
    # Always release the session, even if execution fails
    session.destroy()
FILE:assets/examples/complex_struct_arrow.py
"""
Example: Processing complex Arrow structures with groupby operations.
This example demonstrates how to work with complex Arrow data types
and process them using groupby with apply_chunk.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
"""
import json
import os
import dotenv
import maxframe.dataframe as md
import pandas as pd
import pyarrow as pa
from maxframe import options
from maxframe.session import new_session
from odps import ODPS
# Load environment variables
dotenv.load_dotenv()
# Configure SQL settings
options.sql.enable_mcqa = False
options.sql.settings = {
"odps.namespace.schema": "true",
"odps.sql.allow.fullscan": "true",
"odps.sql.enable.distributed.limit": "true",
"odps.session.image": "common",
"odps.maxframe.resolve_dlf_tables": "true",
"odps.sql.type.system.odps2": "true",
}
options.dag.settings = {
"engine_order": ["MCSQL", "DPE"],
}
# Create ODPS connection
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
# Define Arrow struct type
struct_type = [
pa.field("calibration_bucket", pa.string()),
pa.field("calibration_file_name", pa.string()),
pa.field("calibration_path", pa.string()),
pa.field("data_id", pa.string()),
pa.field("datanode_info", pa.string()),
pa.field("hdfs_path", pa.string()),
pa.field("meta_uuid", pa.string()),
pa.field("sensor_type", pa.string()),
pa.field("timestamp_bucket", pa.string()),
pa.field("timestamp_path", pa.string()),
]
# Sample data
data = {
"calibration_bucket": ["bucket1", "bucket1", "bucket2", "bucket2", "bucket1"],
"calibration_file_name": ["file1", "file2", "file3", "file4", "file5"],
"calibration_path": ["/path1", "/path2", "/path3", "/path4", "/path5"],
"data_id": ["id1", "id2", "id3", "id4", "id5"],
"datanode_info": ["node1", "node2", "node3", "node4", "node5"],
"hdfs_path": ["/hdfs1", "/hdfs2", "/hdfs3", "/hdfs4", "/hdfs5"],
"meta_uuid": ["uuid1", "uuid1", "uuid2", "uuid2", "uuid1"],
"sensor_type": ["type1", "type1", "type2", "type2", "type1"],
"timestamp_bucket": ["ts1", "ts1", "ts2", "ts2", "ts1"],
"timestamp_path": ["/ts1", "/ts1", "/ts2", "/ts2", "/ts1"],
}
try:
    df = md.DataFrame(pd.DataFrame(data))
    def process_group(chunk):
        """Process a group and convert records to JSON."""
        records = chunk.to_dict(orient="records")
        return pd.DataFrame({"records": [json.dumps(records)]})
    # Group by meta_uuid and process
    grouped = df.groupby("meta_uuid", group_keys=True).mf.apply_chunk(
        process_group,
        output_type="dataframe",
        dtypes=pd.Series(
            [pd.ArrowDtype(pa.string())],
            index=["records"],
        ),
        skip_infer=True,
    )
    result = grouped.execute()
    print("Result:")
    print(result)
finally:
    session.destroy()
FILE:assets/examples/dlf_table_write_basic.py
"""
Example: Writing to DLF (Data Lake Formation) external tables.
This example demonstrates how to configure MaxFrame to write to DLF external tables.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
"""
import os
import dotenv
import maxframe.dataframe as md
from maxframe import options
from maxframe.session import new_session
from odps import ODPS
# Load environment variables
dotenv.load_dotenv()
# Replace with your actual table names
input_table = "your_project.your_schema.your_input_table" # DLF external table
output_table = "your_project.your_schema.your_output_table" # DLF external table
lifecycle = 30
# Configure SQL settings for DLF support
options.sql.enable_mcqa = False
options.sql.settings = {
"odps.namespace.schema": "true",
"odps.sql.allow.fullscan": "true",
"odps.sql.enable.distributed.limit": "true", # Enable distributed limit
"odps.session.image": "maxframe_service_dpe_runtime",
"odps.maxframe.resolve_dlf_tables": "true", # Support DLF external tables
}
options.dag.settings = {
"engine_order": ["MCSQL"],
"unavailable_engines": ["DPE", "SPE"],
}
# Create ODPS connection
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
# Create session (adjust major_version if needed)
session = new_session(o)
try:
    # Read from DLF table
    df = md.read_odps_query(f"SELECT * FROM {input_table} LIMIT 100")
    # Write to DLF table
    md.to_odps_table(df, output_table, lifecycle=lifecycle, overwrite=True, index=True).execute()
finally:
    session.destroy()
FILE:assets/examples/dlf_table_write_with_pk.py
"""
Example: Writing to DLF PK (Primary Key) tables with binary data handling.
This example demonstrates how to write to DLF tables with primary keys,
including handling binary data types.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
"""
import os
import dotenv
import maxframe.dataframe as md
import pandas as pd
import pyarrow as pa
from maxframe import options
from maxframe.session import new_session
from odps import ODPS
# Load environment variables
dotenv.load_dotenv()
# Replace with your actual table names
input_table = "your_project.your_schema.your_input_table" # DLF external table
output_table = "your_project.your_schema.your_output_table" # DLF external table
pk_table = "your_project.your_schema.your_pk_table" # DLF PK table
lifecycle = 30
# Configure SQL settings for DLF and PK table support
options.sql.enable_mcqa = False
options.sql.settings = {
"odps.namespace.schema": "true",
"odps.sql.allow.fullscan": "true",
"odps.sql.enable.distributed.limit": "true", # Enable distributed limit
"odps.session.image": "maxframe_service_dpe_runtime",
"odps.maxframe.resolve_dlf_tables": "true", # Support DLF external tables
"odps.sql.type.system.odps2": "true", # Support DLF PK tables
}
options.dag.settings = {
"engine_order": ["MCSQL"],
"unavailable_engines": ["DPE", "SPE"],
}
# Create ODPS connection
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
try:
    # Write to DLF Append table (default)
    df = md.read_odps_table(input_table)
    md.to_odps_table(df, output_table, lifecycle=lifecycle, overwrite=True, index=True).execute()
    # Write to DLF PK table with binary data
    df = pd.DataFrame(
        {
            "userid": [11, 22],
            "username": ["name1", "name2"],
            "userbyte": [b"binary_data_11", b"binary_data_22"],
        }
    )
    # Fix incompatible type STRING with destination column userbyte
    # Use Arrow binary type for proper binary data handling
    new_data = md.DataFrame(df.astype({"userbyte": pd.ArrowDtype(pa.binary())}))
    new_data.to_odps_table(pk_table, overwrite=True).execute()
finally:
    session.destroy()
FILE:assets/examples/fs_mount_example.py
"""
Example: MaxFrame OSS Mount - Read Model Directory
Demonstrates how to use fs_mount to read files from OSS in a distributed manner.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
- OSS_MOUNT_PATH, OSS_MOUNT_ROLE_ARN
"""
import os
import time
import numpy as np
import pandas as pd
from dotenv import load_dotenv
from maxframe import dataframe as md
from maxframe.config import options
from maxframe.session import new_session
from maxframe.udf import with_fs_mount, with_running_options
from odps import ODPS
load_dotenv()
# MaxFrame configuration
options.sql.enable_mcqa = False
options.sql.settings = {"odps.session.image": "maxframe_service_dpe_runtime"}
options.dag.settings = {"engine_order": ["DPE", "MCSQL", "SPE"]}
# ODPS connection
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
print(f"Session: {session.session_id}")
print(f"Logview: {session.get_logview_address()}")
# Define function to read entire directory
# NOTE: memory parameter is in GIGABYTES (GB), not MB!
# memory=4 means 4 GB, NOT 4096 MB
@with_running_options(engine="dpe", cpu=2, memory=4)
@with_fs_mount(
os.getenv("OSS_MOUNT_PATH", "oss://YOUR_BUCKET/YOUR_PATH/"),
"/mnt/model",
storage_options={
"role_arn": os.getenv("OSS_MOUNT_ROLE_ARN", "acs:ram::YOUR_ACCOUNT_ID:role/YOUR_ROLE")
},
)
def read_model_directory(row):
    """Read all files in the model directory"""
    import json
    import os
    import time
    worker_id = row.get("worker_id", "UNKNOWN")
    model_dir = "/mnt/model"
    chunk_size = 4 * 1024 * 1024  # 4 MB
    start_time = time.time()
    total_bytes = 0
    files_read = 0
    file_details = []
    try:
        if not os.path.exists(model_dir):
            return {
                "worker_id": int(worker_id),
                "task_name": str(row.get("task_name", "unknown")),
                "status": "directory_not_found",
                "total_bytes": 0,
                "files_count": 0,
                "read_time": 0.0,
                "throughput_mbps": 0.0,
                "file_details": "[]",
            }
        for filename in os.listdir(model_dir):
            file_path = os.path.join(model_dir, filename)
            if not os.path.isfile(file_path):
                continue
            file_bytes = 0
            try:
                with open(file_path, "rb") as f:
                    while chunk := f.read(chunk_size):
                        file_bytes += len(chunk)
                        total_bytes += len(chunk)
                file_details.append(
                    {
                        "filename": filename,
                        "size_mb": round(file_bytes / 1024 / 1024, 2),
                    }
                )
                files_read += 1
            except Exception as e:
                file_details.append({"filename": filename, "error": str(e)})
        read_time = time.time() - start_time
        throughput_mbps = (total_bytes / 1024 / 1024) / read_time if read_time > 0 else 0
        print(
            f"[Worker {worker_id}] Read {files_read} files, {total_bytes / 1024**3:.2f} GB in {read_time:.2f}s ({throughput_mbps:.2f} MB/s)"
        )
        return {
            "worker_id": int(worker_id),
            "task_name": str(row.get("task_name", "unknown")),
            "status": "success",
            "total_bytes": int(total_bytes),
            "files_count": int(files_read),
            "read_time": float(read_time),
            "throughput_mbps": float(throughput_mbps),
            "file_details": json.dumps(file_details),
        }
    except Exception as e:
        print(f"[Worker {worker_id}] Error: {str(e)}")
        return {
            "worker_id": int(worker_id),
            "task_name": str(row.get("task_name", "unknown")),
            "status": f"error: {str(e)}",
            "total_bytes": int(total_bytes),
            "files_count": int(files_read),
            "read_time": float(time.time() - start_time),
            "throughput_mbps": 0.0,
            "file_details": "[]",
        }
# Create test tasks
num_workers = 10
print(f"\nCreating {num_workers} concurrent tasks...")
data = [{"worker_id": i, "task_name": f"read_dir_{i}"} for i in range(num_workers)]
try:
    df = md.DataFrame(pd.DataFrame(data))
    df_rebalanced = df.mf.rebalance(num_partitions=num_workers)
    # Define output types: keep the input columns and append the new ones.
    # Series.update() only overwrites labels that already exist, so the new
    # columns are appended with pd.concat instead.
    output_dtypes = pd.concat(
        [
            df.dtypes,
            pd.Series(
                {
                    "status": np.dtype("O"),
                    "total_bytes": np.dtype("int64"),
                    "files_count": np.dtype("int64"),
                    "read_time": np.dtype("float64"),
                    "throughput_mbps": np.dtype("float64"),
                    "file_details": np.dtype("O"),
                }
            ),
        ]
    )
    # Execute
    print("Starting test...")
    test_start = time.time()
    result = (
        df_rebalanced.apply(
            read_model_directory,
            axis=1,
            dtypes=output_dtypes,
            output_type="dataframe",
            result_type="expand",
        )
        .execute()
        .fetch()
    )
    total_time = time.time() - test_start
    # Statistics
    successful = result[result["status"] == "success"]
    failed = result[result["status"] != "success"]
    print(f"\n{'='*60}")
    print(f"Total tasks: {len(result)} | Success: {len(successful)} | Failed: {len(failed)}")
    if len(successful) > 0:
        normal = successful[(successful["files_count"] > 0) & (successful["read_time"] > 0)]
        stats = normal if len(normal) > 0 else successful
        print("\nWorker Performance:")
        print(f"{'Worker':<8} {'Files':<8} {'Data(GB)':<12} {'Time(s)':<10} {'MB/s':<10}")
        print("-" * 60)
        for _, row in successful.iterrows():
            print(
                f"{row['worker_id']:<8} {row['files_count']:<8} "
                f"{row['total_bytes']/1024**3:<12.2f} {row['read_time']:<10.2f} {row['throughput_mbps']:<10.2f}"
            )
        print("\nSummary:")
        print(f" Test time: {total_time:.2f}s")
        print(
            f" Avg read time: {stats['read_time'].mean():.2f}s (min: {stats['read_time'].min():.2f}s, max: {stats['read_time'].max():.2f}s)"
        )
        print(f" Avg throughput: {stats['throughput_mbps'].mean():.2f} MB/s")
        print(f" Total data: {successful['total_bytes'].sum() / 1024**3:.2f} GB")
        print(
            f" Aggregate throughput: {(successful['total_bytes'].sum() / 1024**2) / total_time:.2f} MB/s"
        )
        if len(stats) > 0:
            speedup = (stats["read_time"].mean() * len(stats)) / total_time if total_time > 0 else 0
            print(f" Speedup: {speedup:.2f}x | Efficiency: {speedup / len(stats) * 100:.1f}%")
    if len(failed) > 0:
        print(f"\nFailed tasks: {failed[['worker_id', 'status']].to_string(index=False)}")
    print("Done!")
finally:
    session.destroy()
FILE:assets/examples/gpu_unit_dpe_processing.py
"""
Example: Using GPU Units (GU) with DPE engine for accelerated processing.
This example demonstrates how to use the @with_running_options decorator
to allocate GPU Units (GU) when running operations on the DPE engine.
GU allocation enables GPU-accelerated processing for compute-intensive tasks.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
"""
import os
import dotenv
import maxframe.dataframe as md
import numpy as np
from maxframe.config import options
from maxframe.session import new_session
from maxframe.udf import with_running_options
from odps import ODPS
# Load environment variables from .env file
# Replace with your actual .env file path or use environment variables directly
dotenv.load_dotenv()
# Configure DPE engine settings
options.dag.settings = {
"engine_order": ["DPE"],
"unavailable_engines": ["MCSQL", "SPE"],
}
options.sql.settings = {"odps.session.image": "maxframe_service_dpe_runtime"}
options.local_execution.enabled = False
# Create ODPS connection
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
tunnel_endpoint=os.getenv("ODPS_TUNNEL_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
# Create MaxFrame session
session = new_session(o)
print(f"Session created: {session.get_logview_address()}")
# Define a function that uses GPU resources
# Replace 'your_gu_quota' with your actual GU quota name
# NOTE: This example uses GPU Units (GU) instead of CPU/memory
# For CPU/memory allocation, use parameters like: cpu=2, memory=4 (memory in GB!)
@with_running_options(engine="dpe", gu=1, gu_quota="your_gu_quota")
def gpu_accelerated_process(row):
    """
    Process data with GPU acceleration.
    This function will be executed on DPE engine with 1 GU allocated.
    Replace this with your actual GPU-accelerated logic.
    """
    # Example: perform some computation
    result = row.copy()
    result["C"] = result["A"] * result["B"]
    return result
# Create sample DataFrame
df_input = md.DataFrame(
{
"A": np.random.randint(1, 100, 1000),
"B": np.random.randint(1, 100, 1000),
}
)
try:
    # Apply the GPU-accelerated function
    df_result = df_input.apply(
        gpu_accelerated_process,
        axis=1,
        dtypes=df_input.dtypes,
        output_type="dataframe",
        result_type="expand",
        skip_infer=True,
    )
    # Execute and fetch results
    result = df_result.execute().fetch()
    print(f"Processing completed. Result shape: {result.shape}")
    print(f"First 5 rows:\n{result.head()}")
finally:
    # Clean up session
    session.destroy()
    print("Session destroyed")
FILE:assets/examples/groupby_batch_processing.py
"""
Example: GroupBy operations with apply_chunk for batch processing.
This example demonstrates how to use groupby with apply_chunk to process
data in batches efficiently.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
"""
import logging
import os
import dotenv
import maxframe.dataframe as md
import numpy as np
import pandas as pd
from maxframe.dataframe.utils import parse_index
from maxframe.session import new_session
from odps import ODPS
logging.basicConfig(level=logging.INFO)
# Load environment variables from .env file
# Replace with your actual .env file path or use environment variables directly
dotenv.load_dotenv()
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
try:
    # Create sample DataFrame
    df = md.read_pandas(
        pd.DataFrame(
            {
                "A": np.random.choice(["group1", "group2", "group3"], 3000),
                "B": np.random.randn(3000),
                "C": np.random.randn(3000),
                "D": np.random.randn(3000),
                "E": np.random.randn(3000),
            }
        )
    )
    df.execute()
    def process_batch(chunk):
        """Process a batch of data."""
        return chunk[["B"]]
    # Apply chunk processing with groupby
    result_df = df.groupby(["A"], group_keys=True).mf.apply_chunk(
        process_batch,
        batch_rows=1000,
        output_type="dataframe",
    )
    print(f"Result dtypes: {result_df.dtypes}")
    print(f"Result index: {result_df.index_value}")
    result_df.execute()
    def process_to_json(chunk):
        """Process chunk and convert to JSON string."""
        import json
        print(chunk, flush=True)
        print(f"Group shape: {chunk.shape}")
        list_value = json.dumps(chunk["B"].tolist(), ensure_ascii=False)
        result = pd.DataFrame({"B": [list_value]})
        print(result, flush=True)
        return result
    # Apply chunk with custom index
    result_df = df.groupby(["A"], group_keys=True).mf.apply_chunk(
        process_to_json,
        batch_rows=1000,
        output_type="dataframe",
        index=parse_index(pd.MultiIndex.from_product([[""], [0]])),
    )
    print(f"Result dtypes: {result_df.dtypes}")
    print(f"Result index: {result_df.index_value}")
    result_df.execute()
finally:
    session.destroy()
FILE:assets/examples/oss_multi_mount.py
"""
Minimal example showing how to use with_fs_mount with a single OSS mount and with multiple OSS mounts.
Environment variables required:
- ODPS_PROJECT, ODPS_ACCESS_ID, ODPS_ACCESS_KEY, ODPS_ENDPOINT
- OSS_BUCKET_NAME, OSS_ENDPOINT, OSS_ROLE_ARN
"""
import os
import dotenv
import maxframe.dataframe as md
from maxframe.config import options
from maxframe.session import new_session
from maxframe.udf import with_fs_mount, with_running_options
from odps import ODPS
# Load environment variables from .env file
dotenv.load_dotenv()
options.sql.enable_mcqa = False
options.sql.settings = {"odps.session.image": "maxframe_service_dpe_runtime"}
options.dag.settings = {"engine_order": ["DPE", "MCSQL", "SPE"]}
# Initialize ODPS and session
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o, major_version=os.getenv("ODPS_MAJOR_VERSION", "default"))
print(f"Logview: {session.get_logview_address()}")
print(f"Session ID: {session.session_id}")
# Example 1: Single OSS mount
# NOTE: memory parameter is in GIGABYTES (GB), not MB!
# memory=2 means 2 GB, NOT 2048 MB
@with_running_options(engine="dpe", cpu=1, memory=2)
@with_fs_mount(
f"oss://{os.getenv('OSS_ENDPOINT')}/{os.getenv('OSS_BUCKET_NAME')}/data/",
"/mnt/oss_data",
storage_options={"role_arn": os.getenv("OSS_ROLE_ARN")},
)
def process_with_single_mount(row):
    """Example with single OSS mount"""
    import os
    mount_path = "/mnt/oss_data"
    if os.path.exists(mount_path):
        files = os.listdir(mount_path)
        print(f"Single mount successful. Files: {files}")
    else:
        print("Single mount failed")
    return row
# Example 2: Multiple OSS mounts
# NOTE: memory parameter is in GIGABYTES (GB), not MB!
# memory=2 means 2 GB, NOT 2048 MB
@with_running_options(engine="dpe", cpu=1, memory=2)
@with_fs_mount(
f"oss://{os.getenv('OSS_ENDPOINT')}/{os.getenv('OSS_BUCKET_NAME')}/data1/",
"/mnt/oss_data1",
storage_options={"role_arn": os.getenv("OSS_ROLE_ARN")},
)
@with_fs_mount(
f"oss://{os.getenv('OSS_ENDPOINT')}/{os.getenv('OSS_BUCKET_NAME')}/data2/",
"/mnt/oss_data2",
storage_options={"role_arn": os.getenv("OSS_ROLE_ARN")},
)
@with_fs_mount(
f"oss://{os.getenv('OSS_ENDPOINT')}/{os.getenv('OSS_BUCKET_NAME')}/data3/",
"/mnt/oss_data3",
storage_options={"role_arn": os.getenv("OSS_ROLE_ARN")},
)
def process_with_multiple_mounts(row):
    """Example with multiple OSS mounts"""
    import os
    mount_paths = ["/mnt/oss_data1", "/mnt/oss_data2", "/mnt/oss_data3"]
    for mount_path in mount_paths:
        if os.path.exists(mount_path):
            files = os.listdir(mount_path)
            print(f"Mount {mount_path} successful. Files: {files}")
        else:
            print(f"Mount {mount_path} failed")
    return row
# Create a simple test dataframe
df = md.DataFrame({"id": [1, 2, 3]})
# Test single mount
print("\n=== Testing Single Mount ===")
df_single = df.apply(
process_with_single_mount,
axis=1,
dtypes=df.dtypes,
output_type="dataframe",
result_type="expand",
skip_infer=True,
)
result_single = df_single.execute().fetch()
print(result_single)
# Test multiple mounts
print("\n=== Testing Multiple Mounts ===")
df_multi = df.apply(
process_with_multiple_mounts,
axis=1,
dtypes=df.dtypes,
output_type="dataframe",
result_type="expand",
skip_infer=True,
)
result_multi = df_multi.execute().fetch()
print(result_multi)
print("\nAll tests completed!")
# Clean up
session.destroy()
FILE:references/common-workflow.md
# Common Workflow Complete Guide
Detailed guide for the complete MaxFrame development workflow with comprehensive examples.
## Session Setup Patterns
### Pattern 1: Auto-detect (DataWorks/MaxCompute Notebook)
```python
import os
import maxframe.dataframe as md
from maxframe.session import new_session
from odps import ODPS
# Auto-detect from environment (preferred in DataWorks/MaxCompute Notebook)
session = new_session()
```
### Pattern 2: Explicit ODPS Connection
```python
import os
import dotenv
import maxframe.dataframe as md
from maxframe.session import new_session
from odps import ODPS
dotenv.load_dotenv()
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
```
### Pattern 3: Production-ready Session
```python
import logging
import maxframe.dataframe as md
from maxframe.session import new_session
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
session = new_session()
try:
    logger.info(f"Session created. Logview: {session.get_logview_address()}")
    # Your operations
    ...
finally:
    session.destroy()
    logger.info("Session destroyed")
```
## Reading Data Patterns
### Pattern 1: Basic Table Read
```python
# Read from MaxCompute table
df = md.read_odps_table("table_name")
# Read with index column
df = md.read_odps_table("table_name", index_col="id")
# With column selection
df = md.read_odps_table("table_name", columns=['id', 'value', 'timestamp'])
# With partition filter
df = md.read_odps_table("table_name", partition='ds=2024-01-01')
```
### Pattern 2: SQL Query Read
```python
# Read from SQL query with filters
df = md.read_odps_query(
"SELECT * FROM table WHERE date >= '2024-01-01' AND status = 'active'"
)
# Complex SQL with joins
df = md.read_odps_query(
"SELECT a.*, b.value FROM table_a a JOIN table_b b ON a.id = b.id"
)
```
### Pattern 3: Sample Data Construction
When the user does not provide an input table name, construct a pandas DataFrame with sample data:
```python
import pandas as pd
import numpy as np
# Time series analysis example
example_pd_df = pd.DataFrame({
'timestamp': pd.date_range('2026-01-01', periods=1000, freq='H'),
'metric_name': np.random.choice(['cpu', 'memory', 'disk'], 1000),
'value': np.random.randn(1000) * 10 + 50,
'host_id': np.random.choice(['host1', 'host2', 'host3'], 1000)
})
# Load into MaxFrame
df = md.read_pandas(example_pd_df)
```
**Key guidelines for sample data:**
- Match data types and structure to job requirements
- Use realistic value ranges for the domain
- Include 100-1000 rows to demonstrate logic
- Use descriptive column names matching operations
## Operator Selection Workflow
### Step 1: Identify Required Operations
Break down the task into specific operations needed:
- Filtering
- Grouping
- Aggregation
- Transformation
- Merging
- Sorting
### Step 2: Find MaxFrame Operators
Use the operator-selector agent or the lookup script:
```bash
# Search for operators by task description
python scripts/lookup_operator.py search "time series resampling"
# Check if a specific operator exists
python scripts/lookup_operator.py info apply_chunk
# Get detailed operator information
python scripts/lookup_operator.py info groupby
```
### Step 3: Present Options to User
```
For your data aggregation task, I've identified these options:
1. `groupby().agg()` - Standard pandas-compatible approach
   - Pros: Familiar API, good for standard aggregations
   - Cons: May be slow for large datasets with custom logic
2. `mf.apply_chunk()` - For custom aggregation with large datasets
   - Pros: Efficient batch processing, custom logic support
   - Cons: More complex, requires batch size tuning
Which approach do you prefer, or would you like me to explore other options?
```
### Step 4: Get User Confirmation
**MANDATORY:** Do not proceed without user confirmation.
## Processing Patterns
### Pattern 1: Standard pandas Operations
```python
# Filter
filtered = df[df['column'] > 10]
# GroupBy and aggregate
result = df.groupby('category').agg({'value': 'sum'})
# Add columns
df['new_col'] = df['col1'] + df['col2']
# Sort
df_sorted = df.sort_values('column')
# Merge
df_merged = df1.merge(df2, on='key')
# Multiple aggregations
result = df.groupby('category').agg({
'value': ['sum', 'mean', 'count'],
'price': 'max'
})
```
### Pattern 2: Batch Processing (Large Datasets)
```python
def process_batch(chunk):
    # Custom processing logic
    return chunk * 2
result = df.mf.apply_chunk(
    process_batch,
    batch_rows=1024,  # Tune batch size for performance
    output_type='dataframe'
)
```
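To build intuition for what `batch_rows` controls, the following is a local sketch in plain Python. It is not MaxFrame's implementation: it only assumes that the chunk function is called once per batch of at most `batch_rows` rows. `iter_batches`, `apply_in_batches`, and `double` are hypothetical names used for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(rows: Sequence[int], batch_rows: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `batch_rows` items."""
    if batch_rows <= 0:
        raise ValueError("batch_rows must be positive")
    for start in range(0, len(rows), batch_rows):
        yield list(rows[start:start + batch_rows])


def apply_in_batches(
    rows: Sequence[int],
    func: Callable[[List[int]], List[int]],
    batch_rows: int,
) -> List[int]:
    """Call `func` once per batch and concatenate the per-batch results."""
    out: List[int] = []
    for batch in iter_batches(rows, batch_rows):
        out.extend(func(batch))
    return out


def double(batch: List[int]) -> List[int]:
    """Stand-in for custom processing logic (mirrors `chunk * 2`)."""
    return [value * 2 for value in batch]


batches = list(iter_batches(list(range(10)), batch_rows=4))
doubled = apply_in_batches(list(range(10)), double, batch_rows=4)
print([len(b) for b in batches])  # [4, 4, 2]
print(doubled)
```

In this sketch a smaller `batch_rows` means more calls with less data per call, and a larger value means fewer, bigger calls.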
### Pattern 3: UDF with Resource Allocation
```python
from maxframe.udf import with_running_options
@with_running_options(engine="dpe", cpu=2, memory=4)
def process_batch(batch):
    # CRITICAL: memory=4 means 4 GB, NOT 4 MB
    return batch * 2
result = df.mf.apply_chunk(process_batch)
```
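Because the `memory` value is given in gigabytes, a figure expressed in megabytes has to be converted before it is passed in. The helper below is a hypothetical sketch of that conversion. Rounding up to a whole number is a choice made here, not something the source specifies.

```python
import math


def memory_gb_from_mb(memory_mb: int) -> int:
    """Convert a megabyte figure to whole gigabytes, rounding up."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    return math.ceil(memory_mb / 1024)


print(memory_gb_from_mb(4096))  # 4
print(memory_gb_from_mb(1500))  # 2
```

The result could then be passed as `memory=memory_gb_from_mb(4096)`, which is the same as `memory=4`.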
## Writing Data Patterns
### Pattern 1: Write to MaxCompute Table
```python
# Write to MaxCompute table
md.to_odps_table(df, "output_table", overwrite=True).execute()
```
### Pattern 2: Write to DLF External Table
```python
from maxframe import options
# Enable DLF support
options.sql.settings = {
"odps.maxframe.resolve_dlf_tables": "true"
}
md.to_odps_table(df, "dlf_table").execute()
```
### Pattern 3: Multiple Output Tables
```python
try:
    md.to_odps_table(df1, "output_table1").execute()
    md.to_odps_table(df2, "output_table2").execute()
finally:
    session.destroy()
```
## Execution and Cleanup Patterns
### Pattern 1: Basic Execution
```python
# Execute operations (required for lazy execution)
result.execute()
# Destroy session when done
session.destroy()
```
### Pattern 2: Safe Cleanup (Production)
```python
try:
    # Execute operations
    result.execute()
finally:
    # Destroy session (always runs, even on error)
    session.destroy()
```
### Pattern 3: Comprehensive Cleanup
```python
import logging
logger = logging.getLogger(__name__)
try:
    result.execute()
    logger.info("Execution successful")
except Exception as e:
    logger.error(f"Execution failed: {e}")
    raise
finally:
    try:
        session.destroy()
        logger.info("Session destroyed")
    except Exception as cleanup_error:
        logger.warning(f"Cleanup error: {cleanup_error}")
```
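The cleanup patterns above can also be wrapped in a context manager so the `try`/`finally` is written once. This is a sketch, not a MaxFrame API: `managed_session` is a hypothetical helper that relies only on the session having a `destroy()` method, and `FakeSession` is a stand-in so the example runs without a MaxCompute connection.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(session):
    """Yield `session` and always call its destroy() on exit."""
    try:
        yield session
    finally:
        session.destroy()


class FakeSession:
    """Stand-in used only to demonstrate the pattern locally."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


fake = FakeSession()
try:
    with managed_session(fake) as s:
        # Simulate a job that fails part-way through.
        raise RuntimeError("simulated job failure")
except RuntimeError:
    pass

print(fake.destroyed)  # True
```

With a real session the same helper would be used as `with managed_session(new_session()) as session:`.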
## Verification Pattern
Use `py_compile` to syntax-check the generated job script before running it:
```bash
python -m py_compile your_script.py
```
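The same check can be run from Python, for example inside a test or a pre-submit hook. The sketch below uses the standard-library `py_compile.compile` with `doraise=True`. `syntax_ok` and the temporary file names are hypothetical. This only catches syntax errors. It does not import the script or connect to MaxCompute.

```python
import os
import py_compile
import tempfile


def syntax_ok(path: str) -> bool:
    """Return True if the file at `path` compiles without a syntax error."""
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good_job.py")
    bad = os.path.join(tmp, "bad_job.py")
    with open(good, "w") as f:
        f.write("x = 1\n")
    with open(bad, "w") as f:
        f.write("def broken(:\n")  # deliberately invalid syntax
    good_result = syntax_ok(good)
    bad_result = syntax_ok(bad)

print(good_result, bad_result)
```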
## Complete Example Pipeline
```python
import os
import logging
import dotenv
import maxframe.dataframe as md
from maxframe.session import new_session
dotenv.load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Setup session
session = new_session()
try:
    logger.info(f"Session created. Logview: {session.get_logview_address()}")
    # Read data
    df = md.read_odps_table("source_table", columns=['id', 'value', 'category'])
    # Process (after confirming operators with user)
    filtered = df[df['value'] > 100]
    result = filtered.groupby('category').agg({'value': 'sum'})
    # Write output
    md.to_odps_table(result, "output_table", overwrite=True).execute()
    logger.info("Job completed successfully")
    logger.info(f"Final Logview: {session.get_logview_address()}")
finally:
    session.destroy()
    logger.info("Session destroyed")
```
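Since the processing step uses pandas-compatible operations, its logic can be prototyped locally on a small pandas DataFrame before running against MaxCompute. The sketch below does this with made-up sample rows. The column names match the pipeline, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for "source_table".
local_df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "value": [50, 150, 200, 300, 90],
        "category": ["a", "a", "b", "b", "b"],
    }
)

# Same filter and aggregation as the pipeline's processing step.
filtered = local_df[local_df["value"] > 100]
result = filtered.groupby("category").agg({"value": "sum"})
print(result)
```

Only rows with `value > 100` survive the filter, so category `a` sums to 150 and category `b` to 500.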
FILE:references/installation.md
# MaxFrame Installation Guide
This guide provides step-by-step instructions for installing and configuring MaxFrame for distributed data processing on MaxCompute.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Dependencies](#dependencies)
- [Environment Configuration](#environment-configuration)
  - [Required Environment Variables](#required-environment-variables)
  - [Setting Environment Variables](#setting-environment-variables)
  - [Find Your MaxCompute Endpoint](#find-your-maxcompute-endpoint)
- [Installation Verification](#installation-verification)
- [Session Setup](#session-setup)
  - [Manual Session Creation](#manual-session-creation)
  - [Auto-Detect from Environment](#auto-detect-from-environment)
- [Troubleshooting](#troubleshooting)
  - [Common Issues](#common-issues)
  - [Getting Help](#getting-help)
- [Next Steps](#next-steps)
- [Cleanup](#cleanup)
## Prerequisites
- Python 3.7 or higher
- MaxCompute (ODPS) account with valid credentials
- Access to a MaxCompute project
## Dependencies
Install the required Python packages:
```bash
pip install maxframe -U
```
The required packages are:
- **maxframe** - MaxFrame SDK for distributed data processing
- **pyodps** - ODPS Python SDK for MaxCompute access
- **pandas** - Data manipulation library (for pandas-compatible APIs)
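To confirm which of these packages are present in the active environment, a short check with the standard-library `importlib.metadata` (Python 3.8+) can help. This is a sketch, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["maxframe", "pyodps", "pandas"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```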
## Environment Configuration
### Required Environment Variables
Configure the following environment variables to authenticate with MaxCompute:
| Variable | Description |
|----------|-------------|
| `ODPS_ACCESS_ID` | MaxCompute access ID (username) |
| `ODPS_ACCESS_KEY` | MaxCompute access key (password) |
| `ODPS_PROJECT` | MaxCompute project name |
| `ODPS_ENDPOINT` | MaxCompute endpoint URL |
### Setting Environment Variables
#### Option 1: Set in Shell
```bash
export ODPS_ACCESS_ID="your_access_id"
export ODPS_ACCESS_KEY="your_access_key"
export ODPS_PROJECT="your_project_name"
export ODPS_ENDPOINT="your_endpoint"
```
#### Option 2: Use .env File
Create a `.env` file in your project directory:
```env
ODPS_ACCESS_ID=your_access_id
ODPS_ACCESS_KEY=your_access_key
ODPS_PROJECT=your_project_name
ODPS_ENDPOINT=your_endpoint
```
Then load the environment variables in Python (this requires the `python-dotenv` package):
```python
from dotenv import load_dotenv
load_dotenv()
```
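Before creating a connection, it can be useful to fail fast when a variable is missing. The sketch below checks the four variables from the table above. `REQUIRED_VARS` and `missing_odps_vars` are hypothetical names, not part of MaxFrame or PyODPS.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("ODPS_ACCESS_ID", "ODPS_ACCESS_KEY", "ODPS_PROJECT", "ODPS_ENDPOINT")


def missing_odps_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_odps_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required ODPS variables are set.")
```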
### Find Your MaxCompute Endpoint
MaxCompute endpoints vary by region. Check the [MaxCompute documentation](https://www.alibabacloud.com/help/zh/maxcompute/user-guide/endpoints?spm=a2c63.p38356.help-menu-search-27797.d_0) for the correct endpoint for your region.
## Installation Verification
Verify your installation by running the following Python script:
```python
import os
from dotenv import load_dotenv
from odps import ODPS
from maxframe.session import new_session
# Load environment variables
load_dotenv()
# Create ODPS connection
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
# Create MaxFrame session
session = new_session(o)
print("MaxFrame installation verified successfully!")
print(f"Connected to project: {o.project}")
# Destroy session when done
session.destroy()
```
## Session Setup
### Manual Session Creation
Create a session with explicit credentials:
```python
import os
import maxframe.dataframe as md
from maxframe.session import new_session
from odps import ODPS
# Create ODPS connection
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
# Create MaxFrame session
session = new_session(o)
```
### Auto-Detect from Environment
In environments like DataWorks or MaxCompute Notebook, ODPS credentials are automatically available:
```python
from maxframe.session import new_session
# Auto-detects ODPS from environment
session = new_session()
```
## Troubleshooting
### Common Issues
#### Issue: Connection Authentication Failed
**Symptoms**: Error message indicating invalid credentials or authentication failure.
**Solutions**:
- Verify all environment variables are set correctly
- Check that your access key has not expired
- Ensure you have the correct endpoint for your region
- Verify your project name is accurate
```bash
# Test environment variables
echo $ODPS_ACCESS_ID
echo $ODPS_PROJECT
echo $ODPS_ENDPOINT
```
#### Issue: Package Installation Fails
**Symptoms**: `pip install` fails with dependency conflicts or permission errors.
**Solutions**:
- Use a virtual environment to isolate dependencies:
```bash
python -m venv maxframe_env
source maxframe_env/bin/activate # On Windows: maxframe_env\Scripts\activate
pip install maxframe pyodps pandas --prefer-binary
```
- Upgrade pip before installing:
```bash
pip install --upgrade pip
pip install maxframe pyodps pandas --prefer-binary
```
#### Issue: Session Creation Fails
**Symptoms**: `new_session()` raises an exception.
**Solutions**:
- Verify network connectivity to the MaxCompute endpoint
- Check firewall rules allow outbound connections
- Ensure your MaxCompute account has the necessary permissions
- Try the auto-detect method if available in your environment
- Use the VPC endpoint if your client runs inside a VPC network
#### Issue: Lazy Execution Not Working
**Symptoms**: Operations appear to do nothing until `.execute()` is called.
**Note**: This is expected behavior. MaxFrame uses lazy execution. Always call `.execute()` to trigger computation:
```python
# This does not execute immediately
result = df.groupby('category').sum()
# Execute the computation
result.execute()
```
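The idea behind lazy execution can be illustrated with a toy model in plain Python. This is a conceptual sketch only: the `Lazy` class is hypothetical and does not reflect how MaxFrame is implemented internally. It shows that building a chain of operations runs nothing until `execute()` is called.

```python
class Lazy:
    """Toy deferred computation: records a function and runs it on execute()."""

    def __init__(self, func):
        self._func = func

    def map(self, step):
        # Build a new deferred computation; nothing runs yet.
        return Lazy(lambda: step(self._func()))

    def execute(self):
        # Only here is the recorded chain actually evaluated.
        return self._func()


calls = []


def load():
    calls.append("load")
    return [1, 2, 3]


pipeline = Lazy(load).map(lambda xs: [x * 2 for x in xs])
print(calls)               # [] because nothing has run yet
print(pipeline.execute())  # [2, 4, 6] because the chain runs now
```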
### Getting Help
If you encounter issues not covered here:
1. Check the [MaxFrame Documentation](https://maxframe.readthedocs.io/en/latest/)
2. Review the [MaxFrame Client Repository](https://github.com/aliyun/alibabacloud-odps-maxframe-client.git)
3. Consult the sample code in `assets/examples/` for working examples
4. Contact your MaxCompute administrator for account-specific issues
## Next Steps
After successful installation:
1. Review the [MaxFrame Context Guide](maxframe-context.md) for comprehensive feature documentation
2. Explore the [sample code](../assets/examples/) for working examples
3. Start building your first MaxFrame program using the [Common Workflow](../SKILL.md#common-workflow)
## Cleanup
Destroy your session when done to free resources:
```python
session.destroy()
```
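To make sure `destroy()` runs even when the code in between raises, the call can be wrapped in a context manager. The sketch below assumes only that the session object has a `destroy()` method, as shown above; `managed_session` and `FakeSession` are hypothetical names used for illustration, and `FakeSession` stands in for a real session so that the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(session):
    """Yield the session and always call destroy() afterwards."""
    try:
        yield session
    finally:
        session.destroy()


class FakeSession:
    """Stand-in used only to demonstrate the pattern without a cluster."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


fake = FakeSession()
try:
    with managed_session(fake):
        raise RuntimeError("simulated failure inside the session")
except RuntimeError:
    pass

print(fake.destroyed)  # True: destroy() ran despite the exception
```

With a real session this would read `with managed_session(new_session(o)) as session:`.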
FILE:references/local-debug-guide.md
# MaxFrame Local Debug Mode Guide
This guide provides comprehensive instructions for using MaxFrame's local debug mode, which enables offline UDF development with full IDE debugging support.
## Overview
MaxFrame Local Debug Mode lets data development engineers debug UDFs (user-defined functions) locally without connecting to remote MaxCompute services. It supports IDE breakpoints inside functions passed to `apply()` and `apply_chunk()`.
## Core Value
| Feature | Traditional Approach | Local Debug Mode |
|---------|---------------------|------------------|
| Breakpoint Debugging | ❌ Not supported | ✅ Full IDE support |
| Remote Dependency | ❌ Requires cluster connection | ✅ Completely offline |
| Debug Cycle | ❌ Submit to remote each time | ✅ Local immediate execution |
| Code Changes | ❌ Multiple code versions | ✅ Same code for dev/prod |
### Key Benefits
1. **Zero-Configuration Startup**: Simply use `debug=True` or `debug="local"` - no additional tools or services required
2. **Completely Offline**: No dependency on network or remote cluster resources
3. **Native IDE Support**: Breakpoints, variable inspection, step-by-step execution - all debugging capabilities preserved
4. **Flexible Data Sources**: Support for in-memory data, local files, or MaxCompute tables
5. **Seamless Production Switch**: Remove `debug=True` parameter and code runs directly in production
## When to Use Local Debug Mode
Use local debug mode when:
- Developing UDF functions (`apply`, `apply_chunk`)
- Need IDE breakpoints and step-by-step debugging
- Want to debug offline without network access
- Working on complex logic that requires iterative testing
- Need to verify data transformation logic quickly
**Use remote debug mode instead when:**
- Testing with production-scale data on MaxCompute
- Need to verify execution on actual cluster
- Investigating runtime issues that require logview URLs
- Debugging distributed execution problems
## Quick Start
### Prerequisites
```bash
pip install --upgrade maxframe # Requires MaxFrame SDK 2.5.0 or later
```
### Basic Example
```python
import os

import pandas as pd
from odps import ODPS

import maxframe.dataframe as md
from maxframe import new_session

# Initialize ODPS object
# Note: In local debug mode, ODPS object is only used for schema validation
# Actual credentials are not used for execution
o = ODPS(
    access_id=os.getenv('ODPS_ACCESS_ID', 'dummy_access_id'),
    secret_access_key=os.getenv('ODPS_ACCESS_KEY', 'dummy_secret_key'),
    project=os.getenv('ODPS_PROJECT', 'dummy_project'),
    endpoint=os.getenv('ODPS_ENDPOINT', 'dummy_endpoint'),
    user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
# Enable local debug mode
session = new_session(o, debug=True)
# Prepare sample data
df = md.DataFrame(pd.DataFrame({
    "sales": [5000, 8000, 12000, 3000],
    "region": ["A", "B", "C", "D"]
}))

def calculate_commission(row):
    sales = row['sales']
    if sales > 10000:  # Set breakpoint here
        rate = 0.15
        print(rate)
    elif sales > 5000:  # Set breakpoint here
        rate = 0.10
        print(rate)
    else:
        rate = 0.05
    return sales * rate

# Execute and get results
result = df.apply(calculate_commission, axis=1).execute().fetch()
print(result)
```
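Because the UDF only indexes into `row`, its branching can also be checked with plain dictionaries before it is passed to `apply`. The sketch below repeats the tiered logic from the example without the debug prints and needs neither MaxFrame nor pandas; using dicts as stand-ins for rows is an assumption that holds only for UDFs that access values by key.

```python
def calculate_commission(row):
    """Same tiered logic as the example above, without the debug prints."""
    sales = row["sales"]
    if sales > 10000:
        rate = 0.15
    elif sales > 5000:
        rate = 0.10
    else:
        rate = 0.05
    return sales * rate


# Plain dicts stand in for DataFrame rows.
sample_rows = [{"sales": s} for s in (5000, 8000, 12000, 3000)]
for row in sample_rows:
    print(row["sales"], round(calculate_commission(row), 2))
```

Checking the boundary values (exactly 5000 and exactly 10000) this way is a quick complement to stepping through the function with breakpoints.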
## Key Features
### 1. Zero-Configuration Startup
Simply add `debug=True` or `debug="local"` when creating a session:
```python
# Local debug mode
session = new_session(o, debug=True)
# or
session = new_session(o, debug="local")
# Production mode (just remove debug parameter)
session = new_session(o)
```
### 2. IDE-Friendly Debugging
- **Supported IDEs**: PyCharm, VSCode, and other mainstream IDEs, as well as DataWorks Notebook
- **Breakpoints**: Set breakpoints anywhere in your UDF functions
- **Step-by-Step Execution**: Use F5/F6/F7/F8 to navigate through code
- **Variable Inspection**: View and modify variables during debugging
- **Debugging Experience**: Identical to local Python development
### 3. Multiple Data Sources
| Data Source Type | Access Method | Use Case |
|-----------------|---------------|----------|
| In-Memory Data | `md.DataFrame(pd.DataFrame())` | Quick logic validation |
| MaxCompute Table | `md.read_odps_table()` | Real data testing |
| Local Files | `pd.read_csv()` and other native Pandas interfaces | Offline development |
**Example with different data sources:**
```python
# 1. In-memory data (fastest for testing)
import pandas as pd
df = md.DataFrame(pd.DataFrame({
"col1": [1, 2, 3],
"col2": ["a", "b", "c"]
}))
# 2. MaxCompute table (real data)
df = md.read_odps_table("your_table_name")
# 3. Local file (offline development)
local_df = pd.read_csv("local_data.csv")
df = md.read_pandas(local_df)
```
### 4. Code Compatibility
Debugging code is identical to production code. Simply remove the `debug` parameter when deploying:
```python
# Development environment
session = new_session(o, debug=True)
# ... your code ...
# Production environment
session = new_session(o)
# ... same code ...
```
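One way to avoid editing the `new_session` call between environments is to derive the `debug` argument from an environment switch. This is a sketch under stated assumptions: `MAXFRAME_LOCAL_DEBUG` is a hypothetical variable name chosen for this example (MaxFrame itself does not read it), and `session_kwargs` is a hypothetical helper.

```python
import os


def session_kwargs(env=None):
    """Return keyword arguments for new_session() based on an env switch.

    MAXFRAME_LOCAL_DEBUG is a hypothetical variable name chosen for this
    sketch; MaxFrame itself does not read it.
    """
    env = os.environ if env is None else env
    if env.get("MAXFRAME_LOCAL_DEBUG", "").lower() in ("1", "true", "yes"):
        return {"debug": True}
    return {}


print(session_kwargs({"MAXFRAME_LOCAL_DEBUG": "1"}))  # {'debug': True}
print(session_kwargs({}))                             # {}
```

The session would then be created with `new_session(o, **session_kwargs())`, so the same script runs in local debug mode on a workstation and in production mode wherever the variable is unset.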
## Application Scenarios
| Scenario | Description |
|----------|-------------|
| UDF Logic Development | Real-time debugging and verification when writing complex business logic |
| Data Transformation Testing | Validate data cleaning and transformation rules |
| Problem Investigation | Identify root causes of UDF execution exceptions |
| Offline Development | Continue development work in environments without network access |
## Important Considerations
### 1. Performance Differences
Local debug mode is designed for development and verification. Performance characteristics differ from production environment:
- Execution happens locally, not distributed
- Performance is not representative of production cluster performance
- Best suited for small-scale sample data
### 2. Data Volume Limitations
For optimal debugging experience:
- Use small-scale sample data (recommended: 100-1000 rows)
- Large datasets may slow down local execution
- Focus on logic correctness rather than performance
### 3. Dependency Consistency
Ensure local Python environment matches production:
- Same Python version
- Same package versions (maxframe, pandas, numpy, etc.)
- Use `pip freeze > requirements.txt` to capture dependencies
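For a quick comparison of just the packages that matter here, the installed versions can also be read programmatically. The sketch below uses the standard-library `importlib.metadata` module; the helper name `environment_report` and the default package list are assumptions for illustration.

```python
import sys
from importlib import metadata


def environment_report(packages=("maxframe", "pyodps", "pandas", "numpy")):
    """Map 'python' and each package name to its installed version string."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, version in environment_report().items():
    print(f"{name}=={version}")
```

Running the same script locally and in the production environment gives two short listings that can be compared line by line.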
### 4. Sensitive Data Handling
When debugging with MaxCompute tables:
- Be aware of data permissions and access controls
- Consider data masking for sensitive information
- Use sample/partitioned data to limit exposure
- Never commit sensitive credentials to version control
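As an illustration of data masking, the function below hides all but the last few characters of a value. It is a minimal sketch with a hypothetical helper name (`mask_value`); real masking rules depend on your organization's data policies.

```python
def mask_value(value, visible=4, fill="*"):
    """Hide all but the last `visible` characters of a value."""
    text = str(value)
    if visible <= 0 or len(text) <= visible:
        return fill * len(text)
    return fill * (len(text) - visible) + text[-visible:]


print(mask_value("13800001234"))  # *******1234
print(mask_value("abc"))          # ***
```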
## Common Debugging Patterns
### Pattern 1: Breakpoint in Apply Function
```python
def process_row(row):
    # Set breakpoint on this line
    value = row['column_name']
    if value > threshold:
        # Set breakpoint here to inspect condition
        result = transform(value)
    else:
        result = default_value
    return result

df = md.DataFrame(sample_data)
result = df.apply(process_row, axis=1).execute().fetch()
```
### Pattern 2: Debugging Apply_Chunk for Batch Processing
```python
def process_batch(chunk):
    # Set breakpoint here to inspect entire chunk
    print(f"Processing batch with {len(chunk)} rows")
    # Debug data types
    print(f"Chunk dtypes:\n{chunk.dtypes}")
    # Debug transformations
    chunk['new_col'] = chunk['col1'] * 2
    # Set breakpoint here to verify results
    return chunk

result = df.mf.apply_chunk(
    process_batch,
    batch_rows=100,
    output_type='dataframe'
).execute().fetch()
```
### Pattern 3: Debugging with Print Statements
```python
def debug_function(row):
    print(f"Input row: {row.to_dict()}")
    # Step 1
    intermediate = row['col1'] + row['col2']
    print(f"After step 1: {intermediate}")
    # Step 2
    result = intermediate * 2
    print(f"Final result: {result}")
    return result

# Execute with debug output
result = df.apply(debug_function, axis=1).execute().fetch()
```
## Transitioning to Production
### Steps to Deploy
1. **Test Locally**: Develop and debug with local debug mode
2. **Verify Logic**: Ensure all transformations work correctly
3. **Remove Debug Parameter**: Change `new_session(o, debug=True)` to `new_session(o)`
4. **Test on Cluster**: Run on MaxCompute with small dataset
5. **Production Deploy**: Deploy to production environment
### Code Checklist
Before deploying to production:
- [ ] Remove `debug=True` parameter from session creation
- [ ] Verify all data source paths are correct for production
- [ ] Test with production-scale data on MaxCompute
- [ ] Remove or reduce print statements used for debugging
- [ ] Add proper error handling and logging
- [ ] Verify resource quotas and permissions
## Troubleshooting
### Issue: IDE Breakpoints Not Triggering
**Possible Causes:**
- Session created without `debug=True`
- Using incompatible IDE or debugger
- Code not actually executing through apply/apply_chunk
**Solutions:**
- Verify `debug=True` in `new_session()`
- Ensure you're using a supported IDE (PyCharm, VSCode)
- Check that `.execute()` is called to trigger execution
### Issue: Local Execution Too Slow
**Possible Causes:**
- Dataset too large for local debugging
- Complex operations not optimized for local execution
**Solutions:**
- Reduce sample data size (use `df.head(100)` or sample)
- Simplify operations for debugging purposes
- Focus on specific problematic code sections
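When the sample comes from a large local file, a fixed-size random sample can be drawn in a single pass without loading everything into memory. The following is a sketch of reservoir sampling (Algorithm R) using only the standard library; the function name and the fixed default seed are choices made for this example.

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)  # fixed seed keeps debugging runs reproducible
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


print(len(reservoir_sample(range(100000), 100)))  # 100
```

The `rows` argument can be any iterable, for example a `csv.DictReader` over a local file. The resulting list can then be turned into a pandas DataFrame for use as in-memory test data.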
### Issue: Results Differ from Production
**Possible Causes:**
- Data differences between sample and production data
- Environmental differences (Python version, package versions)
- Distributed vs. local execution semantics
**Solutions:**
- Verify data consistency between environments
- Check Python and package versions match
- Test on MaxCompute with `debug=False` to verify
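To pin down where local and cluster results diverge, the two result sets can be compared position by position once both have been fetched and converted to plain Python lists. The helper below is a hypothetical sketch, not part of MaxFrame; it treats two NaN values as equal and compares numbers with a relative tolerance.

```python
import math


def diff_results(local, remote, rel_tol=1e-9):
    """Return (index, local_value, remote_value) for every mismatching position."""
    if len(local) != len(remote):
        raise ValueError(f"length mismatch: {len(local)} vs {len(remote)}")
    mismatches = []
    for i, (a, b) in enumerate(zip(local, remote)):
        both_numbers = isinstance(a, (int, float)) and isinstance(b, (int, float))
        if both_numbers:
            # Tolerate tiny floating-point differences and matching NaNs.
            same = math.isclose(a, b, rel_tol=rel_tol) or (math.isnan(a) and math.isnan(b))
        else:
            same = a == b
        if not same:
            mismatches.append((i, a, b))
    return mismatches


print(diff_results([1.0, "a", 3.5], [1.0, "b", 3.5]))  # [(1, 'a', 'b')]
```

Both lists must be in the same row order, so sort the results by a stable key before comparing.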
## Summary
Local debug mode provides a powerful development experience for MaxFrame UDF development:
- Zero-configuration startup with `debug=True`
- Full IDE debugging support with breakpoints
- Flexible data source options
- Seamless transition to production
- Perfect for iterative UDF development
Use local debug mode during development for rapid iteration, then switch to remote debug mode for cluster-based testing and validation.
## Resources
- **MaxFrame Context Guide**: `./maxframe-context.md` - Comprehensive MaxFrame features and workflows
- **Interactive Coding Guide**: `./remote-debug-guide.md` - Remote debug mode with logview support
- **Key Modules Reference**: `./key-modules.md` - DataFrame, Tensor, and ML operations
FILE:references/maxframe-client-docs/getting_started/comparison/index.md
# Comparison with other tools
* [Comparison with PyODPS DataFrame](pyodps_df.md)
* [Object abstraction](pyodps_df.md#object-abstraction)
* [Functions](pyodps_df.md#functions)
* [Execution](pyodps_df.md#execution)
FILE:references/maxframe-client-docs/getting_started/comparison/pyodps_df.md
# Comparison with PyODPS DataFrame
[PyODPS DataFrame](https://pyodps.readthedocs.io/en/stable/df.html) is
a DataFrame-like package that MaxCompute provides as part of PyODPS.
It lets Python data analysts query MaxCompute data
with a set of operators similar to those in pandas. Despite the similar operators,
the two APIs are used quite differently, so knowledge of pandas alone
is often not enough to work effectively with PyODPS DataFrame.
Although PyODPS DataFrame is still part of PyODPS, new applications
should be built with MaxFrame to benefit from its pandas compatibility.
## Object abstraction
PyODPS DataFrame does not have indexes. This means that most pandas
APIs that rely on indexes are either unavailable or only partially supported.
For instance, arithmetic operations in pandas rely on index alignment: the
two operands are aligned on their indexes first, and then the operation is performed.
```python
>>> series1 = pd.Series([2, 1, 3], index=[1, 2, 4])
>>> series2 = pd.Series([1, 5, 6], index=[1, 3, 4])
>>> series1 + series2
1 3.0
2 NaN
3 NaN
4 9.0
dtype: float64
```
Without indexes, this kind of operation cannot be supported.
MaxFrame therefore requires every DataFrame or Series to carry an index.
If none is specified, a default RangeIndex
is added, so the statement above is supported.
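The alignment rule can be made concrete with a small model in plain Python, where each series is represented as a mapping from index label to value. This is an illustrative sketch of the semantics only, not how pandas or MaxFrame implement it; `aligned_add` is a hypothetical helper.

```python
import math


def aligned_add(left, right):
    """Add two {index: value} mappings, yielding NaN where a label is missing."""
    result = {}
    for label in sorted(set(left) | set(right)):
        if label in left and label in right:
            result[label] = float(left[label] + right[label])
        else:
            # A label present on only one side has no partner to add to.
            result[label] = math.nan
    return result


series1 = {1: 2, 2: 1, 4: 3}
series2 = {1: 1, 3: 5, 4: 6}
print(aligned_add(series1, series2))  # {1: 3.0, 2: nan, 3: nan, 4: 9.0}
```

The result matches the pandas output shown earlier: labels 1 and 4 appear in both series and are summed, while labels 2 and 3 appear in only one and become NaN.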
Another major difference between PyODPS DataFrame and MaxFrame is that in PyODPS
DataFrame the representation of data objects and operators is mixed, which
may confuse newcomers. For instance,
```python
df = o.get_table('table_name').to_df() # df is a DataFrame instance
df2 = df["col1", "col2"] # df2 is a CollectionExpr instance
```
In the second line, `df2` is an instance of `CollectionExpr`, which means
it is an expression rather than a `DataFrame` instance. Nevertheless, all
DataFrame functions can be applied directly to `df2`, and it behaves
no differently from a `DataFrame` instance.
In MaxFrame, however, data objects and operators are defined separately. The data
objects users interact with are all instances of a few data classes, namely
`DataFrame`, `Series` or `Index`. In the MaxFrame version of the example above, both
objects are `DataFrame` instances.
```python
df = md.read_odps_table('table_name') # df is a DataFrame instance
df2 = df[["col1", "col2"]] # df2 is also a DataFrame instance
```
## Functions
Functions in PyODPS DataFrame are not fully compatible with pandas, so
users need to read the documentation before they
start coding with it. MaxFrame, by contrast, aims to provide a pandas-compatible
API, which leads to API differences between PyODPS DataFrame and MaxFrame.
These differences are listed below. Methods that start with `mf.` are non-pandas
methods added to MaxFrame to ease migration from PyODPS DataFrame.
Read the API documentation for these functions before rewriting your code.
| PyODPS DataFrame API | MaxFrame API |
|-------------------------------------|-------------------------------------------------|
| DataFrame.append_id | Not needed. DataFrame index is added by default |
| DataFrame.bloom_filter | Not implemented yet |
| DataFrame.boxplot | DataFrame.plot.boxplot |
| DataFrame.concat | maxframe.dataframe.concat |
| DataFrame.describe | DataFrame.describe |
| DataFrame.distinct | DataFrame.drop_duplicates |
| DataFrame.except_ | DataFrame.merge with filter |
| DataFrame.exclude | DataFrame.drop |
| DataFrame.extract_kv | Not implemented yet |
| DataFrame.hist | DataFrame.plot.hist |
| DataFrame.inner_join | DataFrame.merge |
| DataFrame.intersect | DataFrame.merge |
| DataFrame.left_join | DataFrame.merge |
| DataFrame.limit | DataFrame.head |
| DataFrame.map_reduce | DataFrame.mf.map_reduce |
| DataFrame.minmax_scale | Not implemented yet |
| DataFrame.outer_join | DataFrame.merge |
| DataFrame.persist | DataFrame.to_odps_table |
| DataFrame.reshuffle | DataFrame.mf.reshuffle |
| DataFrame.right_join | DataFrame.merge |
| DataFrame.setdiff | DataFrame.merge |
| DataFrame.split | Not implemented yet |
| DataFrame.std_scale | Not implemented yet |
| DataFrame.sort | DataFrame.sort_values |
| DataFrame.switch | maxframe.dataframe.case_when |
| DataFrame.to_kv | Not implemented yet |
| DataFrame.union | maxframe.dataframe.concat |
| DatetimeSequenceExpr.date | Series.dt.date |
| DatetimeSequenceExpr.day | Series.dt.day |
| DatetimeSequenceExpr.dayofweek | Series.dt.dayofweek |
| DatetimeSequenceExpr.dayofyear | Series.dt.dayofyear |
| DatetimeSequenceExpr.hour | Series.dt.hour |
| DatetimeSequenceExpr.is_month_end | Series.dt.is_month_end |
| DatetimeSequenceExpr.is_month_start | Series.dt.is_month_start |
| DatetimeSequenceExpr.is_year_end | Series.dt.is_year_end |
| DatetimeSequenceExpr.is_year_start | Series.dt.is_year_start |
| DatetimeSequenceExpr.microsecond | Series.dt.microsecond |
| DatetimeSequenceExpr.min | Series.dt.min |
| DatetimeSequenceExpr.minute | Series.dt.minute |
| DatetimeSequenceExpr.month | Series.dt.month |
| DatetimeSequenceExpr.second | Series.dt.second |
| DatetimeSequenceExpr.strftime | Series.dt.strftime |
| DatetimeSequenceExpr.unix_timestamp | Not implemented yet |
| DatetimeSequenceExpr.week | Series.dt.week |
| DatetimeSequenceExpr.weekday | Series.dt.weekday |
| DatetimeSequenceExpr.weekofyear | Series.dt.weekofyear |
| DatetimeSequenceExpr.year | Series.dt.year |
| SequenceExpr.degrees | np.degrees(Series) |
| SequenceExpr.radians | np.radians(Series) |
| SequenceExpr.tolist | Series.to_numpy |
| SequenceExpr.to_datetime | maxframe.dataframe.to_datetime |
| SequenceExpr.topk | Not implemented yet |
| SequenceExpr.trunc | np.trunc(Series) |
| SequenceExpr.hll_count | Not implemented yet |
| StringSequenceExpr.capitalize | Series.str.capitalize |
| StringSequenceExpr.contains | Series.str.contains |
| StringSequenceExpr.count | Series.str.count |
| StringSequenceExpr.endswith | Series.str.endswith |
| StringSequenceExpr.find | Series.str.find |
| StringSequenceExpr.len | Series.str.len |
| StringSequenceExpr.ljust | Series.str.ljust |
| StringSequenceExpr.lower | Series.str.lower |
| StringSequenceExpr.lstrip | Series.str.lstrip |
| StringSequenceExpr.pad | Series.str.pad |
| StringSequenceExpr.repeat | Series.str.repeat |
| StringSequenceExpr.replace | Series.str.replace |
| StringSequenceExpr.rfind | Series.str.rfind |
| StringSequenceExpr.rjust | Series.str.rjust |
| StringSequenceExpr.rstrip | Series.str.rstrip |
| StringSequenceExpr.slice | Series.str.slice |
| StringSequenceExpr.startswith | Series.str.startswith |
| StringSequenceExpr.strip | Series.str.strip |
| StringSequenceExpr.swapcase | Series.str.swapcase |
| StringSequenceExpr.title | Series.str.title |
| StringSequenceExpr.translate | Series.str.translate |
| StringSequenceExpr.upper | Series.str.upper |
| StringSequenceExpr.zfill | Series.str.zfill |
| StringSequenceExpr.isalnum | Series.str.isalnum |
| StringSequenceExpr.isalpha | Series.str.isalpha |
| StringSequenceExpr.isdigit | Series.str.isdigit |
| StringSequenceExpr.isspace | Series.str.isspace |
| StringSequenceExpr.islower | Series.str.islower |
| StringSequenceExpr.isupper | Series.str.isupper |
| StringSequenceExpr.istitle | Series.str.istitle |
| StringSequenceExpr.isnumeric | Series.str.isnumeric |
| StringSequenceExpr.isdecimal | Series.str.isdecimal |
## Execution
PyODPS DataFrame and MaxFrame both use lazy execution so that code can be
optimized before it runs. However, the way jobs are invoked differs between the two.
FILE:references/maxframe-client-docs/getting_started/index.md
<a id="getting-started-index"></a>
# Getting Started
* [Access and installation](installation.md)
* [Enable MaxFrame for your MaxCompute project](installation.md#enable-maxframe-for-your-maxcompute-project)
* [Install MaxFrame client locally](installation.md#install-maxframe-client-locally)
* [Access MaxFrame with DataWorks](installation.md#access-maxframe-with-dataworks)
* [Access MaxFrame with MaxCompute Notebook](installation.md#access-maxframe-with-maxcompute-notebook)
* [Overview](overview.md)
* [Getting started tutorials](tutorials/index.md)
* [10 minutes to MaxFrame](tutorials/10min.md)
* [Comparison with other tools](comparison/index.md)
* [Comparison with PyODPS DataFrame](comparison/pyodps_df.md)
FILE:references/maxframe-client-docs/getting_started/installation.md
# Access and installation
## Enable MaxFrame for your MaxCompute project
You need to set up a MaxCompute project before using MaxFrame. See the
[MaxCompute product page](https://www.alibabacloud.com/zh/product/maxcompute) for more information.
#### NOTE
MaxFrame is currently in trial. To enable MaxFrame for your MaxCompute
project, please [fill in the form to apply for the trial](https://survey.aliyun.com/apps/zhiliao/m40AIrxhA?spm=a2c4g.11186623.0.0.a69340f2mJENKJ).
## Install MaxFrame client locally
After creating your own MaxCompute project and enabling MaxFrame, you can install
the MaxFrame client with pip:
```bash
pip install maxframe
```
Then you can create a MaxCompute table, perform some transformation with MaxFrame
and then store the result into another MaxCompute table.
```python
import os

import maxframe.dataframe as md
from odps import ODPS
from maxframe import new_session
# create MaxCompute entrance object and test table
o = ODPS(
    access_id=os.getenv('ODPS_ACCESS_ID'),
    secret_access_key=os.getenv('ODPS_ACCESS_KEY'),
    project='your-default-project',
    endpoint='your-end-point',
    user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
table = o.create_table("test_source_table", "a string, b bigint")
with table.open_writer() as writer:
    writer.write([
        ["value1", 0],
        ["value2", 1],
    ])
# create maxframe session
session = new_session(o)
# perform data transformation
df = md.read_odps_table("test_source_table")
df["a"] = "prefix_" + df["a"]
md.to_odps_table(df, "test_prefix_source_table").execute()
# destroy maxframe session
session.destroy()
```
## Access MaxFrame with DataWorks
DataWorks provides task scheduling for MaxCompute projects. You can schedule
and run MaxFrame jobs with DataWorks.
To run a MaxFrame job with DataWorks, create a PyODPS 3 node and write your code
inside it. PyODPS nodes run with embedded MaxCompute account and project information, exposed through the predefined `o` object,
so you can create your MaxFrame session directly.
```python
import maxframe.dataframe as md
from maxframe import new_session
# create maxframe session
session = new_session(o)
# perform data transformation
df = md.read_odps_table("test_source_table")
df["a"] = "prefix_" + df["a"]
md.to_odps_table(df, "test_prefix_source_table").execute()
# destroy maxframe session
session.destroy()
```
## Access MaxFrame with MaxCompute Notebook
[MaxCompute Notebook](https://help.aliyun.com/zh/maxcompute/user-guide/maxcompute-notebook-instruction)
also ships with the MaxFrame package. It exposes the MaxCompute account through environment variables
in the notebook, so account information does not need to be passed explicitly.
```python
import maxframe.dataframe as md
from maxframe import new_session
from odps import ODPS
# create MaxCompute entrance object
o = ODPS(
project='your-default-project',
endpoint='your-end-point',
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
# create maxframe session
session = new_session(o)
# perform data transformation
df = md.read_odps_table("test_source_table")
df["a"] = "prefix_" + df["a"]
md.to_odps_table(df, "test_prefix_source_table").execute()
# destroy maxframe session
session.destroy()
```
FILE:references/maxframe-client-docs/getting_started/overview.md
# Overview
MaxFrame is a framework for large-scale data computation built on MaxCompute
by Alibaba Cloud, with an API compatible with pandas. It is intended as
a drop-in replacement that lets Python users familiar with the NumPy or pandas APIs
run their code on MaxCompute in a distributed environment.
FILE:references/maxframe-client-docs/getting_started/tutorials/10min.md
# 10 minutes to MaxFrame
Here, [movielens 100K](https://grouplens.org/datasets/movielens/100k/) is used
as an example. Assume that three tables already exist: `pyodps_ml_100k_movies`
(movie-related data), `pyodps_ml_100k_users` (user-related data), and
`pyodps_ml_100k_ratings` (rating-related data).
Create a MaxFrame session object before starting the following steps:
```python
import os
from odps import ODPS
from maxframe import new_session
# Make sure the environment variable ODPS_ACCESS_ID is set to your Access Key ID
# and ODPS_ACCESS_KEY is set to your Access Key Secret.
# Hardcoding the Access Key ID or Access Key Secret in your code is not recommended.
o = ODPS(
access_id=os.getenv('ODPS_ACCESS_ID'),
secret_access_key=os.getenv('ODPS_ACCESS_KEY'),
project='**your-project**',
endpoint='**your-endpoint**',
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
```
You only need to use `read_odps_table` API to create a DataFrame object. For instance,
```python
import maxframe.dataframe as md
users = md.read_odps_table('pyodps_ml_100k_users')
```
View columns of DataFrame and the types of the columns through the `dtypes` attribute,
as shown in the following code:
```python
>>> users.dtypes
user_id int64
age int64
sex object
occupation object
zip_code object
dtype: object
```
Viewing the representation of the object automatically shows the first and last
rows of the DataFrame.
```python
>>> users
user_id age sex occupation zip_code
0 1 24 M technician 85711
1 2 53 F other 94043
2 3 23 M writer 32067
3 4 24 M technician 43537
4 5 33 F other 15213
...
5 6 42 M executive 98101
6 7 57 M administrator 91344
7 8 36 M administrator 05201
8 9 29 M student 01002
9 10 53 M lawyer 90703
```
You can use the head method to obtain the first N data records for easy and quick data
preview. For example:
```python
>>> users.head(10).execute().fetch()
user_id age sex occupation zip_code
0 1 24 M technician 85711
1 2 53 F other 94043
2 3 23 M writer 32067
3 4 24 M technician 43537
4 5 33 F other 15213
5 6 42 M executive 98101
6 7 57 M administrator 91344
7 8 36 M administrator 05201
8 9 29 M student 01002
9 10 53 M lawyer 90703
```
You can add a filter on the columns if you do not want to view all of them. For example:
```python
>>> users[['user_id', 'age']].head(5).execute().fetch()
user_id age
0 1 24
1 2 53
2 3 23
3 4 24
4 5 33
```
You can also drop several columns. For example:
```python
>>> users.drop(columns=['zip_code', 'age']).head(5)
user_id sex occupation
0 1 M technician
1 2 F other
2 3 M writer
3 4 M technician
4 5 F other
```
When excluding some columns, you may also want to derive new columns through computation.
For example, add a `sex_bool` column that is True if `sex` is `M` and False otherwise.
The result is assigned to a separate DataFrame here so that `users` keeps its original columns:
```python
>>> flagged = users.drop(columns=['zip_code', 'sex'])
>>> flagged["sex_bool"] = users.sex == "M"
>>> flagged.head(5).execute().fetch()
user_id age occupation sex_bool
0 1 24 technician True
1 2 53 other False
2 3 23 writer True
3 4 24 technician True
4 5 33 other False
```
Obtain the number of persons at age of 20 to 25, as shown in the following code:
```python
>>> users[users.age.between(20, 25)].count().execute().fetch()
195
```
Obtain the numbers of male and female users, as shown in the following code:
```python
>>> users.groupby(users.sex).user_id.size()
F 273
M 670
dtype: int64
```
Group users by occupation, then list the 10 occupations with the most users,
sorted in descending order of user count:
```python
>>> df = users.groupby("occupation").agg({"user_id": "count"})
>>> df.sort_values("user_id", ascending=False)[:10]
user_id
occupation
student 196
other 105
educator 95
administrator 79
engineer 67
programmer 66
librarian 51
writer 45
executive 32
scientist 31
```
DataFrame APIs provide the `value_counts` method to quickly achieve the same
result. An example is shown below.
```python
>>> users.occupation.value_counts()[:10]
student 196
other 105
educator 95
administrator 79
engineer 67
programmer 66
librarian 51
writer 45
executive 32
scientist 31
dtype: int64
```
Show data in a more intuitive graph, as shown in the following code:
```python
%matplotlib inline
```
Use a horizontal bar chart to visualize data, as shown in the following code:
```python
>>> users['occupation'].value_counts().plot(kind='barh', x='occupation', ylabel='profession')
<matplotlib.axes._subplots.AxesSubplot at 0x10653cfd0>
```
![Horizontal bar chart of user counts by occupation](_images/df-value-count-plot.png)
Divide ages into 30 groups and view the histogram of age distribution,
as shown in the following code:
```python
>>> users.age.hist(bins=30, title="Distribution of users' ages", xlabel='age', ylabel='count of users')
<matplotlib.axes._subplots.AxesSubplot at 0x10667a510>
```
![Histogram of users' ages in 30 bins](_images/df-age-hist.png)
Use `merge` to join the three tables and save the joined result as a new table. For example:
```python
>>> movies = md.read_odps_table('pyodps_ml_100k_movies')
>>> ratings = md.read_odps_table('pyodps_ml_100k_ratings')
>>>
>>> o.delete_table('pyodps_ml_100k_lens', if_exists=True)
>>> lens = movies.merge(ratings).merge(users)
>>> md.to_odps_table(lens, 'pyodps_ml_100k_lens').execute()
>>>
>>> lens.dtypes
movie_id               int64
title                 object
release_date          object
video_release_date    object
imdb_url              object
user_id                int64
rating                 int64
unix_timestamp         int64
age                    int64
sex                   object
occupation            object
zip_code              object
dtype: object
```
FILE:references/maxframe-client-docs/getting_started/tutorials/index.md
# Getting started tutorials
* [10 minutes to MaxFrame](10min.md)
FILE:references/maxframe-client-docs/index.md
<a id="index"></a>
# MaxFrame Documentation
MaxFrame is a framework for large-scale data computation built on MaxCompute
by Alibaba Cloud, with an API compatible with pandas. It aims to be
a drop-in replacement that lets Python users familiar with the NumPy or pandas APIs
run their code on MaxCompute in a distributed environment.
FILE:references/maxframe-client-docs/reference/dataframe/frame.md
<a id="generated-dataframe"></a>
# DataFrame
## Constructor
| [`DataFrame`](generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)([data, index, columns, dtype, ...]) | |
|-----------------------------------------------------------------------------------------------------------------------------|----|
## Attributes and underlying data
**Axes**
| [`DataFrame.index`](generated/maxframe.dataframe.DataFrame.index.md#maxframe.dataframe.DataFrame.index) | |
|---------------------------------------------------------------------------------------------------------------|----|
| [`DataFrame.columns`](generated/maxframe.dataframe.DataFrame.columns.md#maxframe.dataframe.DataFrame.columns) | |

| [`DataFrame.dtypes`](generated/maxframe.dataframe.DataFrame.dtypes.md#maxframe.dataframe.DataFrame.dtypes) | Return the dtypes in the DataFrame. |
|-----------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
| [`DataFrame.memory_usage`](generated/maxframe.dataframe.DataFrame.memory_usage.md#maxframe.dataframe.DataFrame.memory_usage)([index, deep]) | Return the memory usage of each column in bytes. |
| [`DataFrame.ndim`](generated/maxframe.dataframe.DataFrame.ndim.md#maxframe.dataframe.DataFrame.ndim) | Return an int representing the number of axes / array dimensions. |
| [`DataFrame.select_dtypes`](generated/maxframe.dataframe.DataFrame.select_dtypes.md#maxframe.dataframe.DataFrame.select_dtypes)([include, exclude]) | Return a subset of the DataFrame's columns based on the column dtypes. |
| [`DataFrame.shape`](generated/maxframe.dataframe.DataFrame.shape.md#maxframe.dataframe.DataFrame.shape) | |
## Conversion
| [`DataFrame.astype`](generated/maxframe.dataframe.DataFrame.astype.md#maxframe.dataframe.DataFrame.astype)(dtype[, copy, errors]) | Cast a pandas object to a specified dtype `dtype`. |
|----------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
| [`DataFrame.convert_dtypes`](generated/maxframe.dataframe.DataFrame.convert_dtypes.md#maxframe.dataframe.DataFrame.convert_dtypes)([infer_objects, ...]) | Convert columns to best possible dtypes using dtypes supporting `pd.NA`. |
| [`DataFrame.copy`](generated/maxframe.dataframe.DataFrame.copy.md#maxframe.dataframe.DataFrame.copy)() | |
| [`DataFrame.infer_objects`](generated/maxframe.dataframe.DataFrame.infer_objects.md#maxframe.dataframe.DataFrame.infer_objects)([copy]) | Attempt to infer better dtypes for object columns. |
## Indexing, iteration
| [`DataFrame.at`](generated/maxframe.dataframe.DataFrame.at.md#maxframe.dataframe.DataFrame.at) | Access a single value for a row/column label pair. |
|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| [`DataFrame.head`](generated/maxframe.dataframe.DataFrame.head.md#maxframe.dataframe.DataFrame.head)([n]) | Return the first n rows. |
| [`DataFrame.iat`](generated/maxframe.dataframe.DataFrame.iat.md#maxframe.dataframe.DataFrame.iat) | Access a single value for a row/column pair by integer position. |
| [`DataFrame.iloc`](generated/maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc) | Purely integer-location based indexing for selection by position. |
| [`DataFrame.insert`](generated/maxframe.dataframe.DataFrame.insert.md#maxframe.dataframe.DataFrame.insert)(loc, column, value[, ...]) | Insert column into DataFrame at specified location. |
| [`DataFrame.loc`](generated/maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc) | Access a group of rows and columns by label(s) or a boolean array. |
| [`DataFrame.mask`](generated/maxframe.dataframe.DataFrame.mask.md#maxframe.dataframe.DataFrame.mask)(cond[, other, inplace, axis, ...]) | Replace values where the condition is True. |
| [`DataFrame.pop`](generated/maxframe.dataframe.DataFrame.pop.md#maxframe.dataframe.DataFrame.pop)(item) | Return item and drop from frame. |
| [`DataFrame.query`](generated/maxframe.dataframe.DataFrame.query.md#maxframe.dataframe.DataFrame.query)(expr[, inplace]) | Query the columns of a DataFrame with a boolean expression. |
| [`DataFrame.tail`](generated/maxframe.dataframe.DataFrame.tail.md#maxframe.dataframe.DataFrame.tail)([n]) | Return the last n rows. |
| [`DataFrame.xs`](generated/maxframe.dataframe.DataFrame.xs.md#maxframe.dataframe.DataFrame.xs)(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
| [`DataFrame.where`](generated/maxframe.dataframe.DataFrame.where.md#maxframe.dataframe.DataFrame.where)(cond[, other, inplace, ...]) | Replace values where the condition is False. |
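
`where` and `mask` are easy to confuse because they are mirror images of each other. The sketch below uses plain pandas as a stand-in and assumes MaxFrame follows the same semantics. MaxFrame objects are evaluated lazily, so the equivalent MaxFrame code would also need an execution step that is not shown here.

```python
import pandas as pd

df = pd.DataFrame({"a": [-1, 2], "b": [3, -4]})

# where: keep values where the condition is True, replace the rest.
kept = df.where(df > 0, 0)

# mask: replace values where the condition is True, keep the rest.
masked = df.mask(df > 0, 0)

# loc selects by label, iloc by integer position.
by_label = df.loc[1, "a"]
by_position = df.iloc[1, 1]
```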
## Binary operator functions
| [`DataFrame.add`](generated/maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator add). |
|------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
| [`DataFrame.sub`](generated/maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator subtract). |
| [`DataFrame.mul`](generated/maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator mul). |
| [`DataFrame.div`](generated/maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| [`DataFrame.truediv`](generated/maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| [`DataFrame.floordiv`](generated/maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator floordiv). |
| [`DataFrame.mod`](generated/maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator mod). |
| [`DataFrame.pow`](generated/maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator pow). |
| [`DataFrame.dot`](generated/maxframe.dataframe.DataFrame.dot.md#maxframe.dataframe.DataFrame.dot)(other) | Compute the matrix multiplication between the DataFrame and other. |
| [`DataFrame.radd`](generated/maxframe.dataframe.DataFrame.radd.md#maxframe.dataframe.DataFrame.radd)(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator radd). |
| [`DataFrame.rsub`](generated/maxframe.dataframe.DataFrame.rsub.md#maxframe.dataframe.DataFrame.rsub)(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator rsubtract). |
| [`DataFrame.rmul`](generated/maxframe.dataframe.DataFrame.rmul.md#maxframe.dataframe.DataFrame.rmul)(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator rmul). |
| [`DataFrame.rdiv`](generated/maxframe.dataframe.DataFrame.rdiv.md#maxframe.dataframe.DataFrame.rdiv)(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| [`DataFrame.rtruediv`](generated/maxframe.dataframe.DataFrame.rtruediv.md#maxframe.dataframe.DataFrame.rtruediv)(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| [`DataFrame.rfloordiv`](generated/maxframe.dataframe.DataFrame.rfloordiv.md#maxframe.dataframe.DataFrame.rfloordiv)(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator rfloordiv). |
| [`DataFrame.rmod`](generated/maxframe.dataframe.DataFrame.rmod.md#maxframe.dataframe.DataFrame.rmod)(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator rmod). |
| [`DataFrame.rpow`](generated/maxframe.dataframe.DataFrame.rpow.md#maxframe.dataframe.DataFrame.rpow)(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator rpow). |
| [`DataFrame.lt`](generated/maxframe.dataframe.DataFrame.lt.md#maxframe.dataframe.DataFrame.lt)(other[, axis, level, fill_value]) | Get Less than of dataframe and other, element-wise (binary operator lt). |
| [`DataFrame.gt`](generated/maxframe.dataframe.DataFrame.gt.md#maxframe.dataframe.DataFrame.gt)(other[, axis, level, fill_value]) | Get Greater than of dataframe and other, element-wise (binary operator gt). |
| [`DataFrame.le`](generated/maxframe.dataframe.DataFrame.le.md#maxframe.dataframe.DataFrame.le)(other[, axis, level, fill_value]) | Get Less than or equal to of dataframe and other, element-wise (binary operator le). |
| [`DataFrame.ge`](generated/maxframe.dataframe.DataFrame.ge.md#maxframe.dataframe.DataFrame.ge)(other[, axis, level, fill_value]) | Get Greater than or equal to of dataframe and other, element-wise (binary operator ge). |
| [`DataFrame.ne`](generated/maxframe.dataframe.DataFrame.ne.md#maxframe.dataframe.DataFrame.ne)(other[, axis, level, fill_value]) | Get Not equal to of dataframe and other, element-wise (binary operator ne). |
| [`DataFrame.eq`](generated/maxframe.dataframe.DataFrame.eq.md#maxframe.dataframe.DataFrame.eq)(other[, axis, level, fill_value]) | Get Equal to of dataframe and other, element-wise (binary operator eq). |
| [`DataFrame.combine`](generated/maxframe.dataframe.DataFrame.combine.md#maxframe.dataframe.DataFrame.combine)(other, func[, fill_value, ...]) | Perform column-wise combine with another DataFrame. |
| [`DataFrame.combine_first`](generated/maxframe.dataframe.DataFrame.combine_first.md#maxframe.dataframe.DataFrame.combine_first)(other) | Update null elements with value in the same location in other. |
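
The method forms of the operators differ from the plain operators in two ways: they accept `fill_value`, and the `r`-prefixed variants swap the operands. A minimal sketch, again using plain pandas as a stand-in and assuming MaxFrame matches this behavior:

```python
import pandas as pd

a = pd.DataFrame({"x": [1.0, None]})
b = pd.DataFrame({"x": [10.0, 20.0]})

# Plain operator: a missing value on either side propagates to the result.
plain = a + b

# add() with fill_value: a value missing on one side is treated as 0.
filled = a.add(b, fill_value=0)

# rsub() reverses the operands: this computes b - a, not a - b.
reversed_sub = a.rsub(b)
```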
## Function application, GroupBy & window
| [`DataFrame.apply`](generated/maxframe.dataframe.DataFrame.apply.md#maxframe.dataframe.DataFrame.apply)(func[, axis, raw, ...]) | Apply a function along an axis of the DataFrame. |
|------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| [`DataFrame.applymap`](generated/maxframe.dataframe.DataFrame.applymap.md#maxframe.dataframe.DataFrame.applymap)(func[, na_action, ...]) | Apply a function to a Dataframe elementwise. |
| [`DataFrame.agg`](generated/maxframe.dataframe.DataFrame.agg.md#maxframe.dataframe.DataFrame.agg)([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`DataFrame.aggregate`](generated/maxframe.dataframe.DataFrame.aggregate.md#maxframe.dataframe.DataFrame.aggregate)([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`DataFrame.ewm`](generated/maxframe.dataframe.DataFrame.ewm.md#maxframe.dataframe.DataFrame.ewm)([com, span, halflife, alpha, ...]) | Provide exponential weighted functions. |
| [`DataFrame.expanding`](generated/maxframe.dataframe.DataFrame.expanding.md#maxframe.dataframe.DataFrame.expanding)([min_periods, shift, ...]) | Provide expanding transformations. |
| [`DataFrame.groupby`](generated/maxframe.dataframe.DataFrame.groupby.md#maxframe.dataframe.DataFrame.groupby)([by, level, as_index, ...]) | Group DataFrame using a mapper or by a Series of columns. |
| [`DataFrame.map`](generated/maxframe.dataframe.DataFrame.map.md#maxframe.dataframe.DataFrame.map)(func[, na_action, dtypes, ...]) | Apply a function to a Dataframe elementwise. |
| [`DataFrame.rolling`](generated/maxframe.dataframe.DataFrame.rolling.md#maxframe.dataframe.DataFrame.rolling)(window[, min_periods, ...]) | Provide rolling window calculations. |
| [`DataFrame.transform`](generated/maxframe.dataframe.DataFrame.transform.md#maxframe.dataframe.DataFrame.transform)(func[, axis, dtypes, ...]) | Call `func` on self producing a DataFrame with transformed values. |
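
The sketch below contrasts three of the entries above: `groupby(...).agg` reduces each group to one row, `transform` returns a result aligned with the input rows, and `rolling` computes over a sliding window. It uses plain pandas with hypothetical data as a stand-in. Several MaxFrame signatures above list an extra `dtypes` parameter, which the pandas versions do not have, so this is not a line-for-line MaxFrame example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "occupation": ["engineer", "artist", "engineer", "artist"],
        "rating": [4, 2, 5, 3],
    }
)

# groupby + agg: one row per group, one column per aggregation.
summary = df.groupby("occupation")["rating"].agg(["count", "mean"])

# transform: same length as the input; each row receives its group's mean.
df["group_mean"] = df.groupby("occupation")["rating"].transform("mean")

# rolling: mean over a sliding window of two consecutive rows.
rolling_mean = df["rating"].rolling(2).mean()
```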
<a id="generated-dataframe-stats"></a>
## Computations / descriptive stats
| [`DataFrame.abs`](generated/maxframe.dataframe.DataFrame.abs.md#maxframe.dataframe.DataFrame.abs)() | |
|--------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| [`DataFrame.all`](generated/maxframe.dataframe.DataFrame.all.md#maxframe.dataframe.DataFrame.all)([axis, bool_only, skipna, ...]) | |
| [`DataFrame.any`](generated/maxframe.dataframe.DataFrame.any.md#maxframe.dataframe.DataFrame.any)([axis, bool_only, skipna, ...]) | |
| [`DataFrame.clip`](generated/maxframe.dataframe.DataFrame.clip.md#maxframe.dataframe.DataFrame.clip)([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| [`DataFrame.count`](generated/maxframe.dataframe.DataFrame.count.md#maxframe.dataframe.DataFrame.count)([axis, level, numeric_only]) | |
| [`DataFrame.corr`](generated/maxframe.dataframe.DataFrame.corr.md#maxframe.dataframe.DataFrame.corr)([method, min_periods]) | Compute pairwise correlation of columns, excluding NA/null values. |
| [`DataFrame.corrwith`](generated/maxframe.dataframe.DataFrame.corrwith.md#maxframe.dataframe.DataFrame.corrwith)(other[, axis, drop, method]) | Compute pairwise correlation. |
| [`DataFrame.cov`](generated/maxframe.dataframe.DataFrame.cov.md#maxframe.dataframe.DataFrame.cov)([min_periods, ddof, numeric_only]) | Compute pairwise covariance of columns, excluding NA/null values. |
| [`DataFrame.describe`](generated/maxframe.dataframe.DataFrame.describe.md#maxframe.dataframe.DataFrame.describe)([percentiles, include, ...]) | Generate descriptive statistics. |
| [`DataFrame.diff`](generated/maxframe.dataframe.DataFrame.diff.md#maxframe.dataframe.DataFrame.diff)([periods, axis]) | First discrete difference of element. |
| [`DataFrame.eval`](generated/maxframe.dataframe.DataFrame.eval.md#maxframe.dataframe.DataFrame.eval)(expr[, inplace]) | Evaluate a string describing operations on DataFrame columns. |
| [`DataFrame.max`](generated/maxframe.dataframe.DataFrame.max.md#maxframe.dataframe.DataFrame.max)([axis, skipna, level, ...]) | |
| [`DataFrame.mean`](generated/maxframe.dataframe.DataFrame.mean.md#maxframe.dataframe.DataFrame.mean)([axis, skipna, level, ...]) | |
| [`DataFrame.median`](generated/maxframe.dataframe.DataFrame.median.md#maxframe.dataframe.DataFrame.median)([axis, skipna, level, ...]) | |
| [`DataFrame.min`](generated/maxframe.dataframe.DataFrame.min.md#maxframe.dataframe.DataFrame.min)([axis, skipna, level, ...]) | |
| [`DataFrame.mode`](generated/maxframe.dataframe.DataFrame.mode.md#maxframe.dataframe.DataFrame.mode)([axis, numeric_only, dropna, ...]) | Get the mode(s) of each element along the selected axis. |
| [`DataFrame.nunique`](generated/maxframe.dataframe.DataFrame.nunique.md#maxframe.dataframe.DataFrame.nunique)([axis, dropna]) | Count distinct observations over requested axis. |
| [`DataFrame.pct_change`](generated/maxframe.dataframe.DataFrame.pct_change.md#maxframe.dataframe.DataFrame.pct_change)([periods, fill_method, ...]) | Percentage change between the current and a prior element. |
| [`DataFrame.prod`](generated/maxframe.dataframe.DataFrame.prod.md#maxframe.dataframe.DataFrame.prod)([axis, skipna, level, ...]) | |
| [`DataFrame.product`](generated/maxframe.dataframe.DataFrame.product.md#maxframe.dataframe.DataFrame.product)([axis, skipna, level, ...]) | |
| [`DataFrame.quantile`](generated/maxframe.dataframe.DataFrame.quantile.md#maxframe.dataframe.DataFrame.quantile)([q, axis, numeric_only, ...]) | Return values at the given quantile over requested axis. |
| [`DataFrame.rank`](generated/maxframe.dataframe.DataFrame.rank.md#maxframe.dataframe.DataFrame.rank)([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| [`DataFrame.round`](generated/maxframe.dataframe.DataFrame.round.md#maxframe.dataframe.DataFrame.round)([decimals]) | Round a DataFrame to a variable number of decimal places. |
| [`DataFrame.sem`](generated/maxframe.dataframe.DataFrame.sem.md#maxframe.dataframe.DataFrame.sem)([axis, skipna, level, ddof, ...]) | |
| [`DataFrame.std`](generated/maxframe.dataframe.DataFrame.std.md#maxframe.dataframe.DataFrame.std)([axis, skipna, level, ddof, ...]) | |
| [`DataFrame.sum`](generated/maxframe.dataframe.DataFrame.sum.md#maxframe.dataframe.DataFrame.sum)([axis, skipna, level, ...]) | |
| [`DataFrame.value_counts`](generated/maxframe.dataframe.DataFrame.value_counts.md#maxframe.dataframe.DataFrame.value_counts)([subset, normalize, ...]) | |
| [`DataFrame.var`](generated/maxframe.dataframe.DataFrame.var.md#maxframe.dataframe.DataFrame.var)([axis, skipna, level, ddof, ...]) | |
## Reindexing / selection / label manipulation
| [`DataFrame.add_prefix`](generated/maxframe.dataframe.DataFrame.add_prefix.md#maxframe.dataframe.DataFrame.add_prefix)(prefix) | Prefix labels with string prefix. |
|------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| [`DataFrame.add_suffix`](generated/maxframe.dataframe.DataFrame.add_suffix.md#maxframe.dataframe.DataFrame.add_suffix)(suffix) | Suffix labels with string suffix. |
| [`DataFrame.align`](generated/maxframe.dataframe.DataFrame.align.md#maxframe.dataframe.DataFrame.align)(other[, join, axis, level, ...]) | Align two objects on their axes with the specified join method. |
| [`DataFrame.at_time`](generated/maxframe.dataframe.DataFrame.at_time.md#maxframe.dataframe.DataFrame.at_time)(time[, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| [`DataFrame.between_time`](generated/maxframe.dataframe.DataFrame.between_time.md#maxframe.dataframe.DataFrame.between_time)(start_time, end_time) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
| [`DataFrame.drop`](generated/maxframe.dataframe.DataFrame.drop.md#maxframe.dataframe.DataFrame.drop)([labels, axis, index, ...]) | Drop specified labels from rows or columns. |
| [`DataFrame.drop_duplicates`](generated/maxframe.dataframe.DataFrame.drop_duplicates.md#maxframe.dataframe.DataFrame.drop_duplicates)([subset, keep, ...]) | Return DataFrame with duplicate rows removed. |
| [`DataFrame.droplevel`](generated/maxframe.dataframe.DataFrame.droplevel.md#maxframe.dataframe.DataFrame.droplevel)(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
| [`DataFrame.duplicated`](generated/maxframe.dataframe.DataFrame.duplicated.md#maxframe.dataframe.DataFrame.duplicated)([subset, keep, method]) | Return boolean Series denoting duplicate rows. |
| [`DataFrame.filter`](generated/maxframe.dataframe.DataFrame.filter.md#maxframe.dataframe.DataFrame.filter)([items, like, regex, axis]) | Subset the dataframe rows or columns according to the specified index labels. |
| [`DataFrame.head`](generated/maxframe.dataframe.DataFrame.head.md#maxframe.dataframe.DataFrame.head)([n]) | Return the first n rows. |
| [`DataFrame.idxmax`](generated/maxframe.dataframe.DataFrame.idxmax.md#maxframe.dataframe.DataFrame.idxmax)([axis, skipna]) | Return index of first occurrence of maximum over requested axis. |
| [`DataFrame.idxmin`](generated/maxframe.dataframe.DataFrame.idxmin.md#maxframe.dataframe.DataFrame.idxmin)([axis, skipna]) | Return index of first occurrence of minimum over requested axis. |
| [`DataFrame.reindex`](generated/maxframe.dataframe.DataFrame.reindex.md#maxframe.dataframe.DataFrame.reindex)([labels, index, columns, ...]) | Conform Series/DataFrame to new index with optional filling logic. |
| [`DataFrame.reindex_like`](generated/maxframe.dataframe.DataFrame.reindex_like.md#maxframe.dataframe.DataFrame.reindex_like)(other[, method, ...]) | Return an object with matching indices as other object. |
| [`DataFrame.rename`](generated/maxframe.dataframe.DataFrame.rename.md#maxframe.dataframe.DataFrame.rename)([mapper, index, columns, ...]) | Alter axes labels. |
| [`DataFrame.rename_axis`](generated/maxframe.dataframe.DataFrame.rename_axis.md#maxframe.dataframe.DataFrame.rename_axis)([mapper, index, ...]) | Set the name of the axis for the index or columns. |
| [`DataFrame.reset_index`](generated/maxframe.dataframe.DataFrame.reset_index.md#maxframe.dataframe.DataFrame.reset_index)([level, drop, ...]) | Reset the index, or a level of it. |
| [`DataFrame.sample`](generated/maxframe.dataframe.DataFrame.sample.md#maxframe.dataframe.DataFrame.sample)([n, frac, replace, ...]) | Return a random sample of items from an axis of object. |
| [`DataFrame.set_axis`](generated/maxframe.dataframe.DataFrame.set_axis.md#maxframe.dataframe.DataFrame.set_axis)(labels[, axis, inplace]) | Assign desired index to given axis. |
| [`DataFrame.set_index`](generated/maxframe.dataframe.DataFrame.set_index.md#maxframe.dataframe.DataFrame.set_index)(keys[, drop, append, ...]) | Set the DataFrame index using existing columns. |
| [`DataFrame.take`](generated/maxframe.dataframe.DataFrame.take.md#maxframe.dataframe.DataFrame.take)(indices[, axis]) | Return the elements in the given *positional* indices along an axis. |
| [`DataFrame.truncate`](generated/maxframe.dataframe.DataFrame.truncate.md#maxframe.dataframe.DataFrame.truncate)([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
<a id="generated-dataframe-missing"></a>
## Missing data handling
| [`DataFrame.dropna`](generated/maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)([axis, how, thresh, ...]) | Remove missing values. |
|----------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------|
| [`DataFrame.fillna`](generated/maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna)([value, method, axis, ...]) | Fill NA/NaN values using the specified method. |
| [`DataFrame.isna`](generated/maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)() | Detect missing values. |
| [`DataFrame.isnull`](generated/maxframe.dataframe.DataFrame.isnull.md#maxframe.dataframe.DataFrame.isnull)() | Detect missing values. |
| [`DataFrame.notna`](generated/maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)() | Detect existing (non-missing) values. |
| [`DataFrame.notnull`](generated/maxframe.dataframe.DataFrame.notnull.md#maxframe.dataframe.DataFrame.notnull)() | Detect existing (non-missing) values. |
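
A typical missing-data pass counts the gaps, then either drops or fills them. The following is a small sketch with hypothetical columns, written in plain pandas on the assumption that MaxFrame mirrors these calls:

```python
import pandas as pd

df = pd.DataFrame({"age": [24.0, None, 31.0], "sex": ["M", "F", None]})

# isna() marks missing cells; summing counts them per column.
missing_per_column = df.isna().sum()

# dropna() keeps only the rows that have no missing values.
complete_rows = df.dropna()

# fillna() with a dict applies a different default to each column.
filled = df.fillna({"age": 0.0, "sex": "unknown"})
```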
## Reshaping, sorting, transposing
| [`DataFrame.melt`](generated/maxframe.dataframe.DataFrame.melt.md#maxframe.dataframe.DataFrame.melt)([id_vars, value_vars, ...]) | Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| [`DataFrame.nlargest`](generated/maxframe.dataframe.DataFrame.nlargest.md#maxframe.dataframe.DataFrame.nlargest)(n, columns[, keep]) | Return the first n rows ordered by columns in descending order. |
| [`DataFrame.nsmallest`](generated/maxframe.dataframe.DataFrame.nsmallest.md#maxframe.dataframe.DataFrame.nsmallest)(n, columns[, keep]) | Return the first n rows ordered by columns in ascending order. |
| [`DataFrame.pivot`](generated/maxframe.dataframe.DataFrame.pivot.md#maxframe.dataframe.DataFrame.pivot)(columns[, index, values]) | Return reshaped DataFrame organized by given index / column values. |
| [`DataFrame.pivot_table`](generated/maxframe.dataframe.DataFrame.pivot_table.md#maxframe.dataframe.DataFrame.pivot_table)([values, index, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
| [`DataFrame.reorder_levels`](generated/maxframe.dataframe.DataFrame.reorder_levels.md#maxframe.dataframe.DataFrame.reorder_levels)(order[, axis]) | Rearrange index levels using input order. |
| [`DataFrame.sort_values`](generated/maxframe.dataframe.DataFrame.sort_values.md#maxframe.dataframe.DataFrame.sort_values)(by[, axis, ascending, ...]) | Sort by the values along either axis. |
| [`DataFrame.sort_index`](generated/maxframe.dataframe.DataFrame.sort_index.md#maxframe.dataframe.DataFrame.sort_index)([axis, level, ...]) | Sort object by labels (along an axis). |
| [`DataFrame.swaplevel`](generated/maxframe.dataframe.DataFrame.swaplevel.md#maxframe.dataframe.DataFrame.swaplevel)([i, j, axis]) | Swap levels i and j in a `MultiIndex`. |
| [`DataFrame.stack`](generated/maxframe.dataframe.DataFrame.stack.md#maxframe.dataframe.DataFrame.stack)([level, dropna]) | Stack the prescribed level(s) from columns to index. |
| [`DataFrame.unstack`](generated/maxframe.dataframe.DataFrame.unstack.md#maxframe.dataframe.DataFrame.unstack)([level, fill_value]) | Unstack, also known as pivot, Series with MultiIndex to produce DataFrame. |
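
`melt` and `pivot` are inverse reshapes, which is easiest to see on a tiny frame. This sketch uses plain pandas and hypothetical column names as a stand-in for MaxFrame:

```python
import pandas as pd

wide = pd.DataFrame({"user_id": [1, 2], "math": [90, 70], "art": [60, 80]})

# melt: wide to long, one row per (user_id, subject) pair.
long_df = wide.melt(id_vars="user_id", var_name="subject", value_name="score")

# pivot: long back to wide, with one column per subject.
back = long_df.pivot(index="user_id", columns="subject", values="score")

# nlargest: the top rows by a column, in descending order.
top = long_df.nlargest(2, "score")
```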
## Combining / comparing / joining / merging
| [`DataFrame.append`](generated/maxframe.dataframe.DataFrame.append.md#maxframe.dataframe.DataFrame.append)(other[, ignore_index, ...]) | Append rows of other to the end of caller, returning a new object. |
|-------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| [`DataFrame.assign`](generated/maxframe.dataframe.DataFrame.assign.md#maxframe.dataframe.DataFrame.assign)(\*\*kwargs) | Assign new columns to a DataFrame. |
| [`DataFrame.compare`](generated/maxframe.dataframe.DataFrame.compare.md#maxframe.dataframe.DataFrame.compare)(other[, align_axis, ...]) | Compare to another DataFrame and show the differences. |
| [`DataFrame.join`](generated/maxframe.dataframe.DataFrame.join.md#maxframe.dataframe.DataFrame.join)(other[, on, how, lsuffix, ...]) | Join columns of another DataFrame. |
| [`DataFrame.merge`](generated/maxframe.dataframe.DataFrame.merge.md#maxframe.dataframe.DataFrame.merge)(right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. |
| [`DataFrame.update`](generated/maxframe.dataframe.DataFrame.update.md#maxframe.dataframe.DataFrame.update)(other[, join, overwrite, ...]) | Modify in place using non-NA values from another DataFrame. |
## Time series-related
| [`DataFrame.first_valid_index`](generated/maxframe.dataframe.DataFrame.first_valid_index.md#maxframe.dataframe.DataFrame.first_valid_index)() | Return index for first non-NA value or None, if no non-NA value is found. |
|-------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| [`DataFrame.last_valid_index`](generated/maxframe.dataframe.DataFrame.last_valid_index.md#maxframe.dataframe.DataFrame.last_valid_index)() | Return index for last non-NA value or None, if no non-NA value is found. |
| [`DataFrame.shift`](generated/maxframe.dataframe.DataFrame.shift.md#maxframe.dataframe.DataFrame.shift)([periods, freq, axis, ...]) | Shift index by desired number of periods with an optional time freq. |
| [`DataFrame.tshift`](generated/maxframe.dataframe.DataFrame.tshift.md#maxframe.dataframe.DataFrame.tshift)([periods, freq, axis]) | Shift the time index, using the index's frequency if available. |
<a id="generated-dataframe-plotting"></a>
## Plotting
`DataFrame.plot` is both a callable method and a namespace attribute for
specific plotting methods of the form `DataFrame.plot.<kind>`.
| [`DataFrame.plot`](generated/maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot) | alias of `DataFramePlotAccessor` |
|--------------------------------------------------------------------------------------------------------|------------------------------------|

| [`DataFrame.plot.area`](generated/maxframe.dataframe.DataFrame.plot.area.md#maxframe.dataframe.DataFrame.plot.area)(\*args, \*\*kwargs) | Draw a stacked area plot. |
|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
| [`DataFrame.plot.bar`](generated/maxframe.dataframe.DataFrame.plot.bar.md#maxframe.dataframe.DataFrame.plot.bar)(\*args, \*\*kwargs) | Vertical bar plot. |
| [`DataFrame.plot.barh`](generated/maxframe.dataframe.DataFrame.plot.barh.md#maxframe.dataframe.DataFrame.plot.barh)(\*args, \*\*kwargs) | Make a horizontal bar plot. |
| [`DataFrame.plot.box`](generated/maxframe.dataframe.DataFrame.plot.box.md#maxframe.dataframe.DataFrame.plot.box)(\*args, \*\*kwargs) | Make a box plot of the DataFrame columns. |
| [`DataFrame.plot.density`](generated/maxframe.dataframe.DataFrame.plot.density.md#maxframe.dataframe.DataFrame.plot.density)(\*args, \*\*kwargs) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| [`DataFrame.plot.hexbin`](generated/maxframe.dataframe.DataFrame.plot.hexbin.md#maxframe.dataframe.DataFrame.plot.hexbin)(\*args, \*\*kwargs) | Generate a hexagonal binning plot. |
| [`DataFrame.plot.hist`](generated/maxframe.dataframe.DataFrame.plot.hist.md#maxframe.dataframe.DataFrame.plot.hist)(\*args, \*\*kwargs) | Draw one histogram of the DataFrame's columns. |
| [`DataFrame.plot.kde`](generated/maxframe.dataframe.DataFrame.plot.kde.md#maxframe.dataframe.DataFrame.plot.kde)(\*args, \*\*kwargs) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| [`DataFrame.plot.line`](generated/maxframe.dataframe.DataFrame.plot.line.md#maxframe.dataframe.DataFrame.plot.line)(\*args, \*\*kwargs) | Plot Series or DataFrame as lines. |
| [`DataFrame.plot.pie`](generated/maxframe.dataframe.DataFrame.plot.pie.md#maxframe.dataframe.DataFrame.plot.pie)(\*args, \*\*kwargs) | Generate a pie plot. |
| [`DataFrame.plot.scatter`](generated/maxframe.dataframe.DataFrame.plot.scatter.md#maxframe.dataframe.DataFrame.plot.scatter)(\*args, \*\*kwargs) | Create a scatter plot with varying marker point size and color. |
<a id="generated-dataframe-io"></a>
## Serialization / IO / conversion
| [`DataFrame.from_dict`](generated/maxframe.dataframe.DataFrame.from_dict.md#maxframe.dataframe.DataFrame.from_dict)(data[, orient, dtype, ...]) | Construct DataFrame from dict of array-like or dicts. |
|----------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| [`DataFrame.from_records`](generated/maxframe.dataframe.DataFrame.from_records.md#maxframe.dataframe.DataFrame.from_records)(data[, index, ...]) | Convert structured or record ndarray to DataFrame. |
| [`DataFrame.to_clipboard`](generated/maxframe.dataframe.DataFrame.to_clipboard.md#maxframe.dataframe.DataFrame.to_clipboard)(\*[, excel, sep, ...]) | Copy object to the system clipboard. |
| [`DataFrame.to_csv`](generated/maxframe.dataframe.DataFrame.to_csv.md#maxframe.dataframe.DataFrame.to_csv)(path[, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
| [`DataFrame.to_dict`](generated/maxframe.dataframe.DataFrame.to_dict.md#maxframe.dataframe.DataFrame.to_dict)([orient, into, index, ...]) | Convert the DataFrame to a dictionary. |
| [`DataFrame.to_json`](generated/maxframe.dataframe.DataFrame.to_json.md#maxframe.dataframe.DataFrame.to_json)([path, orient, ...]) | Convert the object to a JSON string. |
| [`DataFrame.to_odps_table`](generated/maxframe.dataframe.DataFrame.to_odps_table.md#maxframe.dataframe.DataFrame.to_odps_table)(table[, partition, ...]) | Write DataFrame object into a MaxCompute (ODPS) table. |
| [`DataFrame.to_pandas`](generated/maxframe.dataframe.DataFrame.to_pandas.md#maxframe.dataframe.DataFrame.to_pandas)([session]) | |
| [`DataFrame.to_parquet`](generated/maxframe.dataframe.DataFrame.to_parquet.md#maxframe.dataframe.DataFrame.to_parquet)(path[, engine, ...]) | Write a DataFrame to the binary parquet format, each chunk will be written to a Parquet file. |
<a id="generated-dataframe-mf"></a>
## MaxFrame Extensions
| [`DataFrame.mf.apply_chunk`](generated/maxframe.dataframe.DataFrame.mf.apply_chunk.md#maxframe.dataframe.DataFrame.mf.apply_chunk)(func[, batch_rows, ...]) | Apply a function that takes pandas DataFrame and outputs pandas DataFrame/Series. |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| [`DataFrame.mf.collect_kv`](generated/maxframe.dataframe.DataFrame.mf.collect_kv.md#maxframe.dataframe.DataFrame.mf.collect_kv)([columns, kv_delim, ...]) | Merge values in specified columns into a key-value represented column. |
| [`DataFrame.mf.extract_kv`](generated/maxframe.dataframe.DataFrame.mf.extract_kv.md#maxframe.dataframe.DataFrame.mf.extract_kv)([columns, kv_delim, ...]) | Extract values in key-value represented columns into standalone columns. |
| [`DataFrame.mf.flatmap`](generated/maxframe.dataframe.DataFrame.mf.flatmap.md#maxframe.dataframe.DataFrame.mf.flatmap)(func[, dtypes, raw, args]) | Apply the given function to each row and then flatten results. |
| [`DataFrame.mf.map_reduce`](generated/maxframe.dataframe.DataFrame.mf.map_reduce.md#maxframe.dataframe.DataFrame.mf.map_reduce)([mapper, reducer, ...]) | Map-reduce API over certain DataFrames. |
| [`DataFrame.mf.rebalance`](generated/maxframe.dataframe.DataFrame.mf.rebalance.md#maxframe.dataframe.DataFrame.mf.rebalance)([axis, factor, ...]) | Make data more balanced across entire cluster. |
| [`DataFrame.mf.reshuffle`](generated/maxframe.dataframe.DataFrame.mf.reshuffle.md#maxframe.dataframe.DataFrame.mf.reshuffle)([group_by, sort_by, ...]) | Shuffle data in DataFrame or Series to make data distribution more randomized. |
`DataFrame.mf` provides methods unique to MaxFrame. These methods are collected from application
scenarios in MaxCompute and can be accessed as `DataFrame.mf.<function/property>`.
FILE:references/maxframe-client-docs/reference/dataframe/general_functions.md
<a id="generated-general-functions"></a>
# General functions
## Data manipulations
| [`concat`](generated/maxframe.dataframe.concat.md#maxframe.dataframe.concat)(objs[, axis, join, ignore_index, ...]) | Concatenate dataframe objects along a particular axis with optional set logic along the other axes. |
|------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
| [`factorize`](generated/maxframe.dataframe.factorize.md#maxframe.dataframe.factorize)(values[, sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| [`get_dummies`](generated/maxframe.dataframe.get_dummies.md#maxframe.dataframe.get_dummies)(data[, prefix, prefix_sep, ...]) | Convert categorical variable into dummy/indicator variables. |
| [`merge`](generated/maxframe.dataframe.merge.md#maxframe.dataframe.merge)(df, right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. |
## Top-level missing data
| [`isna`](generated/maxframe.dataframe.isna.md#maxframe.dataframe.isna)(obj) | Detect missing values. |
|--------------------------------------------------------------------------------------|---------------------------------------|
| [`isnull`](generated/maxframe.dataframe.isnull.md#maxframe.dataframe.isnull)(obj) | Detect missing values. |
| [`notna`](generated/maxframe.dataframe.notna.md#maxframe.dataframe.notna)(obj) | Detect existing (non-missing) values. |
| [`notnull`](generated/maxframe.dataframe.notnull.md#maxframe.dataframe.notnull)(obj) | Detect existing (non-missing) values. |
## Top-level dealing with numeric data
| [`to_numeric`](generated/maxframe.dataframe.to_numeric.md#maxframe.dataframe.to_numeric)(arg[, errors, downcast]) | Convert argument to a numeric type. |
|---------------------------------------------------------------------------------------------------------------------|---------------------------------------|
## Top-level dealing with datetimelike
| [`to_datetime`](generated/maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)(arg[, errors, dayfirst, ...]) | Convert argument to datetime. |
|--------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|
| [`date_range`](generated/maxframe.dataframe.date_range.md#maxframe.dataframe.date_range)([start, end, periods, freq, tz, ...]) | Return a fixed frequency DatetimeIndex. |
## Top-level evaluation
| [`eval`](generated/maxframe.dataframe.eval.md#maxframe.dataframe.eval)(expr[, parser, engine, local_dict, ...]) | Evaluate a Python expression as a string using various backends. |
|-------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.abs.md
# maxframe.dataframe.DataFrame.abs
#### DataFrame.abs()
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.add.md
# maxframe.dataframe.DataFrame.add
#### DataFrame.add(other, axis='columns', level=None, fill_value=None)
Get Addition of dataframe and other, element-wise (binary operator add).
Equivalent to `+`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, radd.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar using the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
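The `fill_value` rule and the index union mentioned in the Notes can be modelled without a MaxFrame session. The sketch below is plain Python, not MaxFrame code: `add_with_fill` is a hypothetical helper that treats a single column as a dict keyed by row label and uses `None` for a missing value. Labels are kept in first-seen order for readability, which is an assumption of this sketch rather than documented behaviour.

```python
from typing import Dict, Optional

Column = Dict[str, Optional[float]]


def add_with_fill(left: Column, right: Column,
                  fill_value: Optional[float] = None) -> Column:
    """Model of element-wise add for one column (hypothetical helper)."""
    result: Column = {}
    # Mismatched labels are unioned; first-seen order is kept for readability.
    labels = list(left) + [label for label in right if label not in left]
    for label in labels:
        a = left.get(label)
        b = right.get(label)
        if a is None and b is None:
            # Missing on both sides: the result stays missing.
            result[label] = None
            continue
        if a is None:
            a = fill_value
        if b is None:
            b = fill_value
        # Without a fill_value, one missing operand makes the result missing.
        result[label] = None if a is None or b is None else a + b
    return result


left = {"circle": 0.0, "triangle": 3.0, "square": None}
right = {"triangle": 10.0, "rectangle": 4.0, "square": None}

no_fill = add_with_fill(left, right)
with_fill = add_with_fill(left, right, fill_value=0.0)
print(no_fill)
print(with_fill)
```

With `fill_value=0.0`, the labels present on only one side (`circle`, `rectangle`) get a value, while `square`, which is missing on both sides, stays missing.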
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.add_prefix.md
# maxframe.dataframe.DataFrame.add_prefix
#### DataFrame.add_prefix(prefix)
Prefix labels with string prefix.
For Series, the row labels are prefixed.
For DataFrame, the column labels are prefixed.
* **Parameters:**
**prefix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The string to add before each label.
* **Returns:**
New Series or DataFrame with updated labels.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`Series.add_suffix`](maxframe.dataframe.Series.add_suffix.md#maxframe.dataframe.Series.add_suffix)
: Suffix row labels with string suffix.
[`DataFrame.add_suffix`](maxframe.dataframe.DataFrame.add_suffix.md#maxframe.dataframe.DataFrame.add_suffix)
: Suffix column labels with string suffix.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4])
>>> s.execute()
0 1
1 2
2 3
3 4
dtype: int64
```
```pycon
>>> s.add_prefix('item_').execute()
item_0 1
item_1 2
item_2 3
item_3 4
dtype: int64
```
```pycon
>>> df = md.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df.execute()
A B
0 1 3
1 2 4
2 3 5
3 4 6
```
```pycon
>>> df.add_prefix('col_').execute()
col_A col_B
0 1 3
1 2 4
2 3 5
3 4 6
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.add_suffix.md
# maxframe.dataframe.DataFrame.add_suffix
#### DataFrame.add_suffix(suffix)
Suffix labels with string suffix.
For Series, the row labels are suffixed.
For DataFrame, the column labels are suffixed.
* **Parameters:**
**suffix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The string to add after each label.
* **Returns:**
New Series or DataFrame with updated labels.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`Series.add_prefix`](maxframe.dataframe.Series.add_prefix.md#maxframe.dataframe.Series.add_prefix)
: Prefix row labels with string prefix.
[`DataFrame.add_prefix`](maxframe.dataframe.DataFrame.add_prefix.md#maxframe.dataframe.DataFrame.add_prefix)
: Prefix column labels with string prefix.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4])
>>> s.execute()
0 1
1 2
2 3
3 4
dtype: int64
```
```pycon
>>> s.add_suffix('_item').execute()
0_item 1
1_item 2
2_item 3
3_item 4
dtype: int64
```
```pycon
>>> df = md.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df.execute()
A B
0 1 3
1 2 4
2 3 5
3 4 6
```
```pycon
>>> df.add_suffix('_col').execute()
A_col B_col
0 1 3
1 2 4
2 3 5
3 4 6
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.agg.md
# maxframe.dataframe.DataFrame.agg
#### DataFrame.agg(func=None, axis=0, \*\*kw)
Aggregate using one or more operations over the specified axis.
* **Parameters:**
* **df** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Object to aggregate.
* **func** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Function to use for aggregating the data.
* **axis** ( *{0* *or* *‘index’* *,* *1* *or* *‘columns’}* *,* *default 0*) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
* **kw** – Keyword arguments to pass to func.
* **Returns:**
The return can be:
* scalar : when Series.agg is called with single function
* Series : when DataFrame.agg is called with a single function
* DataFrame : when DataFrame.agg is called with several functions
* **Return type:**
scalar, [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9],
... [np.nan, np.nan, np.nan]],
... columns=['A', 'B', 'C']).execute()
```
Aggregate these functions over the rows.
```pycon
>>> df.agg(['sum', 'min']).execute()
A B C
min 1.0 2.0 3.0
sum 12.0 15.0 18.0
```
Different aggregations per column.
```pycon
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']}).execute()
A B
max NaN 8.0
min 1.0 2.0
sum 12.0 NaN
```
Aggregate different functions over the columns and rename the index of the resulting DataFrame.
```pycon
>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean')).execute()
A B C
x 7.0 NaN NaN
y NaN 2.0 NaN
z NaN NaN 6.0
```
```pycon
>>> s = md.Series([1, 2, 3, 4])
>>> s.agg('min').execute()
1
```
```pycon
>>> s.agg(['min', 'max']).execute()
max 4
min 1
```
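The `NaN` cells in the per-column example appear because the result index is the union of every requested function name, and a cell stays empty when that function was not requested for that column. A minimal plain-Python sketch of that layout follows. It does not call MaxFrame; `agg_per_column` and the `FUNCS` table are hypothetical names, and skipping missing values before reducing is an assumption taken from the outputs above.

```python
import math
from typing import Callable, Dict, List

# Hypothetical reducer table; NaN entries are dropped before reducing.
FUNCS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "min": min,
    "max": max,
}


def agg_per_column(data: Dict[str, List[float]],
                   spec: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Return {function name: {column: value}}, NaN where not requested."""
    # Result rows: every function named anywhere in the spec, sorted.
    names = sorted({name for funcs in spec.values() for name in funcs})
    result: Dict[str, Dict[str, float]] = {}
    for name in names:
        row: Dict[str, float] = {}
        for column, funcs in spec.items():
            values = [v for v in data[column] if not math.isnan(v)]
            row[column] = FUNCS[name](values) if name in funcs else math.nan
        result[name] = row
    return result


data = {"A": [1.0, 4.0, 7.0, math.nan], "B": [2.0, 5.0, 8.0, math.nan]}
table = agg_per_column(data, {"A": ["sum", "min"], "B": ["min", "max"]})
for name, row in table.items():
    print(name, row)
```

The rows come out as `max`, `min`, `sum`, with `NaN` in the `A` cell of `max` and the `B` cell of `sum`, matching the shape of the documented output.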
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.aggregate.md
# maxframe.dataframe.DataFrame.aggregate
#### DataFrame.aggregate(func=None, axis=0, \*\*kw)
Aggregate using one or more operations over the specified axis.
* **Parameters:**
* **df** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Object to aggregate.
* **func** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Function to use for aggregating the data.
* **axis** ( *{0* *or* *‘index’* *,* *1* *or* *‘columns’}* *,* *default 0*) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
* **kw** – Keyword arguments to pass to func.
* **Returns:**
The return can be:
* scalar : when Series.agg is called with single function
* Series : when DataFrame.agg is called with a single function
* DataFrame : when DataFrame.agg is called with several functions
* **Return type:**
scalar, [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9],
... [np.nan, np.nan, np.nan]],
... columns=['A', 'B', 'C']).execute()
```
Aggregate these functions over the rows.
```pycon
>>> df.agg(['sum', 'min']).execute()
A B C
min 1.0 2.0 3.0
sum 12.0 15.0 18.0
```
Different aggregations per column.
```pycon
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']}).execute()
A B
max NaN 8.0
min 1.0 2.0
sum 12.0 NaN
```
Aggregate different functions over the columns and rename the index of the resulting DataFrame.
```pycon
>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean')).execute()
A B C
x 7.0 NaN NaN
y NaN 2.0 NaN
z NaN NaN 6.0
```
```pycon
>>> s = md.Series([1, 2, 3, 4])
>>> s.agg('min').execute()
1
```
```pycon
>>> s.agg(['min', 'max']).execute()
max 4
min 1
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.align.md
# maxframe.dataframe.DataFrame.align
#### DataFrame.align(other, join: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'outer', axis: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, level: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, copy: [bool](https://docs.python.org/3/library/functions.html#bool) = True, fill_value: [Any](https://docs.python.org/3/library/typing.html#typing.Any) = None, method: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, limit: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = None, fill_axis: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) = 0, broadcast_axis: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) = None)
Align two objects on their axes with the specified join method.
Join method is specified for each axis Index.
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series))
* **join** ( *{'outer'* *,* *'inner'* *,* *'left'* *,* *'right'}* *,* *default 'outer'*)
* **axis** (*allowed axis* *of* *the other object* *,* *default None*) – Align on index (0), columns (1), or both (None).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *level name* *,* *default None*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Always returns new objects. If copy=False and no reindexing is
required then original objects are returned.
* **fill_value** (*scalar* *,* *default np.NaN*) – Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
* **method** ( *{'backfill'* *,* *'bfill'* *,* *'pad'* *,* *'ffill'* *,* *None}* *,* *default None*) –
Method to use for filling holes in reindexed Series:
- pad / ffill: propagate last valid observation forward to next valid.
- backfill / bfill: use NEXT valid observation to fill gap.
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
* **fill_axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Filling axis, method and limit.
* **broadcast_axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default None*) – Broadcast values along this axis, if aligning two objects of
different dimensions.
### Notes
Currently argument level is not supported.
* **Returns:**
**(left, right)** – Aligned objects.
* **Return type:**
([DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame), [type](https://docs.python.org/3/library/functions.html#type) of other)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = md.DataFrame(
... [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
... columns=["A", "B", "C", "D"],
... index=[2, 3, 4],
... )
>>> df.execute()
D B E A
1 1 2 3 4
2 6 7 8 9
>>> other.execute()
A B C D
2 10 20 30 40
3 60 70 80 90
4 600 700 800 900
```
Align on columns:
```pycon
>>> left, right = df.align(other, join="outer", axis=1)
>>> left.execute()
A B C D E
1 4 2 NaN 1 3
2 9 7 NaN 6 8
>>> right.execute()
A B C D E
2 10 20 30 40 NaN
3 60 70 80 90 NaN
4 600 700 800 900 NaN
```
We can also align on the index:
```pycon
>>> left, right = df.align(other, join="outer", axis=0)
>>> left.execute()
D B E A
1 1.0 2.0 3.0 4.0
2 6.0 7.0 8.0 9.0
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
>>> right.execute()
A B C D
1 NaN NaN NaN NaN
2 10.0 20.0 30.0 40.0
3 60.0 70.0 80.0 90.0
4 600.0 700.0 800.0 900.0
```
Finally, the default axis=None will align on both index and columns:
```pycon
>>> left, right = df.align(other, join="outer", axis=None)
>>> left.execute()
A B C D E
1 4.0 2.0 NaN 1.0 3.0
2 9.0 7.0 NaN 6.0 8.0
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
>>> right.execute()
A B C D E
1 NaN NaN NaN NaN NaN
2 10.0 20.0 30.0 40.0 NaN
3 60.0 70.0 80.0 90.0 NaN
4 600.0 700.0 800.0 900.0 NaN
```
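The effect of `join` on the resulting labels can be sketched in plain Python without running MaxFrame. `join_labels` is a hypothetical helper, not part of the library. The sorted order for `'outer'` follows the outputs above, while the left-first order used for `'inner'` is an assumption of this sketch.

```python
from typing import List


def join_labels(left: List[str], right: List[str], how: str = "outer") -> List[str]:
    """Labels both aligned objects share after the join (hypothetical helper)."""
    if how == "outer":
        # Union of both label sets, sorted as in the outputs above.
        return sorted(set(left) | set(right))
    if how == "inner":
        # Intersection; keeping the left order is an assumption of this sketch.
        return [label for label in left if label in right]
    if how == "left":
        return list(left)
    if how == "right":
        return list(right)
    raise ValueError(f"unknown join method: {how!r}")


df_columns = ["D", "B", "E", "A"]
other_columns = ["A", "B", "C", "D"]

outer = join_labels(df_columns, other_columns, "outer")
inner = join_labels(df_columns, other_columns, "inner")
# Labels absent from the left frame become all-NaN columns after alignment.
added_to_left = [label for label in outer if label not in df_columns]
print(outer, inner, added_to_left)
```

For the column labels used in the example, the outer join yields `A` to `E`, and `C` is the column that has to be filled with `NaN` on the left side.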
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.all.md
# maxframe.dataframe.DataFrame.all
#### DataFrame.all(axis=0, bool_only=None, skipna=True, level=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.any.md
# maxframe.dataframe.DataFrame.any
#### DataFrame.any(axis=0, bool_only=None, skipna=True, level=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.append.md
# maxframe.dataframe.DataFrame.append
#### DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)
Append rows of other to the end of caller, returning a new object.
Columns in other that are not in the caller are added as new columns.
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* *Series/dict-like object* *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *these*) – The data to append.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **verify_integrity** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, raise ValueError on creating index with duplicates.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort columns if the columns of self and other are not aligned.
* **Returns:**
A new DataFrame consisting of the rows of caller and the rows of other.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`concat`](maxframe.dataframe.concat.md#maxframe.dataframe.concat)
: General function to concatenate DataFrame or Series objects.
### Notes
If a list of dict/series is passed and the keys are all contained in
the DataFrame’s index, the order of the columns in the resulting
DataFrame will be unchanged.
Iteratively appending rows to a DataFrame can be more computationally
intensive than a single concatenate. A better solution is to append
those rows to a list and then concatenate the list with the original
DataFrame all at once.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
>>> df.execute()
A B
x 1 2
y 3 4
>>> df2 = md.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])
>>> df.append(df2).execute()
A B
x 1 2
y 3 4
x 5 6
y 7 8
```
With ignore_index set to True:
```pycon
>>> df.append(df2, ignore_index=True).execute()
A B
0 1 2
1 3 4
2 5 6
3 7 8
```
The following, while not recommended methods for generating DataFrames,
show two ways to generate a DataFrame from multiple data sources.
Less efficient:
```pycon
>>> df = md.DataFrame(columns=['A'])
>>> for i in range(5):
...     df = df.append({'A': i}, ignore_index=True)
>>> df.execute()
A
0 0
1 1
2 2
3 3
4 4
```
More efficient:
```pycon
>>> md.concat([md.DataFrame([i], columns=['A']) for i in range(5)],
... ignore_index=True).execute()
A
0 0
1 1
2 2
3 3
4 4
```
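The advice in the Notes can be illustrated with a rough cost model. The sketch below uses plain Python lists, not MaxFrame objects, and only counts how many rows would be copied if every append built a new object holding all previous rows. Both function names are hypothetical, and the counts are not measurements of MaxFrame itself.

```python
from typing import List


def rows_copied_by_repeated_append(n_rows: int) -> int:
    """Count row copies when each append builds a new object."""
    copied = 0
    current: List[int] = []
    for i in range(n_rows):
        # The new object holds every existing row plus the appended one.
        current = current + [i]
        copied += len(current)
    return copied


def rows_copied_by_single_concat(n_rows: int) -> int:
    """Count row copies when rows are collected first and joined once."""
    collected = [[i] for i in range(n_rows)]
    combined = [row for piece in collected for row in piece]
    return len(combined)


repeated = rows_copied_by_repeated_append(5)
single = rows_copied_by_single_concat(5)
print(repeated, single)
```

Under this model the repeated approach grows quadratically with the number of rows, while the single concatenation grows linearly.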
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.apply.md
# maxframe.dataframe.DataFrame.apply
#### DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), dtypes=None, dtype=None, name=None, output_type=None, index=None, elementwise=None, skip_infer=False, \*\*kwds)
Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is
either the DataFrame’s index (`axis=0`) or the DataFrame’s columns
(`axis=1`). By default (`result_type=None`), the final return type
is inferred from the return type of the applied function. Otherwise,
it depends on the result_type argument.
* **Parameters:**
* **func** (*function*) – Function to apply to each column or row.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) –
Axis along which the function is applied:
* 0 or ‘index’: apply function to each column.
* 1 or ‘columns’: apply function to each row.
* **raw** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) –
Determines if row or column is passed as a Series or ndarray object:
* `False` : passes each row or column as a Series to the
function.
* `True` : the passed function will receive ndarray objects
instead.
If you are just applying a NumPy reduction function this will
achieve much better performance.
* **result_type** ( *{'expand'* *,* *'reduce'* *,* *'broadcast'* *,* *None}* *,* *default None*) –
These only act when `axis=1` (columns):
* ’expand’ : list-like results will be turned into columns.
* ’reduce’ : returns a Series if possible rather than expanding
list-like results. This is the opposite of ‘expand’.
* ’broadcast’ : results will be broadcast to the original shape
of the DataFrame, the original index and columns will be
retained.
The default behaviour (None) depends on the return value of the
applied function: list-like results will be returned as a Series
of those. However if the apply function returns a Series these
are expanded to columns.
* **output_type** ( *{'dataframe'* *,* *'series'}* *,* *default None*) – Specify type of returned object. See Notes for more details.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames. See Notes for more details.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of returned Series. See Notes for more details.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of returned Series. See Notes for more details.
* **index** ([*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index) *,* *default None*) – Specify index of returned object. See Notes for more details.
* **elementwise** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) –
Specify whether `func` is an elementwise function:
* `False` : The function is not elementwise. MaxFrame will try
concatenating chunks in rows (when `axis=0`) or in columns
(when `axis=1`) and then apply `func` onto the concatenated
chunk. The concatenation step can cause extra latency.
* `True` : The function is elementwise. MaxFrame will apply
`func` to original chunks. This will not introduce extra
concatenation step and reduce overhead.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when dtypes or output_type is not specified.
* **args** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple)) – Positional arguments to pass to func in addition to the
array/series.
* **\*\*kwds** – Additional keyword arguments to pass to
func.
* **Returns:**
Result of applying `func` along the given axis of the
DataFrame.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.applymap`](maxframe.dataframe.DataFrame.applymap.md#maxframe.dataframe.DataFrame.applymap)
: For elementwise operations.
[`DataFrame.aggregate`](maxframe.dataframe.DataFrame.aggregate.md#maxframe.dataframe.DataFrame.aggregate)
: Only perform aggregating type operations.
[`DataFrame.transform`](maxframe.dataframe.DataFrame.transform.md#maxframe.dataframe.DataFrame.transform)
: Only perform transforming type operations.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `func` onto a mock DataFrame, and the apply call may
fail. When this happens, you need to specify the type of apply call
(DataFrame or Series) in output_type.
* For DataFrame output, you need to specify a list or a pandas Series
as `dtypes` of output DataFrame. `index` of output can also be
specified.
* For Series output, you need to specify `dtype` and `name` of
output Series.
* Any input with data type `pandas.ArrowDtype(pyarrow.MapType)` is always
converted to a Python dict, and any output with this data type must be
returned as a Python dict as well.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
A B
0 4 9
1 4 9
2 4 9
```
Using a reducing function on either axis
```pycon
>>> df.apply(np.sum, axis=0).execute()
A 12
B 27
dtype: int64
```
```pycon
>>> df.apply(lambda row: int(np.sum(row)), axis=1).execute()
0 13
1 13
2 13
dtype: int64
```
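As a rough illustration of what `axis` means for a reducing function, the plain-Python sketch below reproduces the two results above without MaxFrame. The helper name `reduce_along` is hypothetical and exists only for this example; it is not part of the MaxFrame API.

```python
def reduce_along(rows, func, axis=0):
    """Reduce a list-of-rows table with func along the given axis (illustrative helper)."""
    if axis == 0:
        # axis=0: func sees each column, yielding one value per column
        return [func(list(column)) for column in zip(*rows)]
    # axis=1: func sees each row, yielding one value per row
    return [func(list(row)) for row in rows]


table = [[4, 9]] * 3
column_sums = reduce_along(table, sum, axis=0)
row_sums = reduce_along(table, sum, axis=1)
print(column_sums)  # [12, 27]
print(row_sums)     # [13, 13, 13]
```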
Passing `result_type='expand'` will expand list-like results
to columns of a DataFrame
```pycon
>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand').execute()
0 1
0 1 2
1 1 2
2 1 2
```
Returning a Series inside the function is similar to passing
`result_type='expand'`. The resulting column names
will be the Series index.
```pycon
>>> import pandas as pd
>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1).execute()
foo bar
0 1 2
1 1 2
2 1 2
```
Passing `result_type='broadcast'` will ensure the same shape
result, whether list-like or scalar is returned by the function,
and broadcast it along the axis. The resulting column names will
be the originals.
```pycon
>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast').execute()
A B
0 1 2
1 1 2
2 1 2
```
Create a dataframe with a map type.
```pycon
>>> import pyarrow as pa
>>> import pandas as pd
>>> from maxframe.lib.dtypes_extension import dict_
>>> col_a = pd.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> col_b = pd.Series(
... data=["A", "B", "C"],
... index=[1, 2, 3],
... )
>>> df = md.DataFrame({"A": col_a, "B": col_b})
>>> df.execute()
A B
1 [('k1', 1), ('k2', 2)] A
2 [('k1', 3)] B
3 <NA> C
```
Define a function that updates the map type with a new key-value pair.
```pycon
>>> def custom_set_item(x):
...     if x["A"] is not None:
...         x["A"]["k2"] = 10
...     return x
```
```pycon
>>> df.apply(
... custom_set_item,
... axis=1,
... output_type="dataframe",
... dtypes=df.dtypes.copy(),
... ).execute()
A B
1 [('k1', 1), ('k2', 10)] A
2 [('k1', 3), ('k2', 10)] B
3 <NA> C
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.applymap.md
# maxframe.dataframe.DataFrame.applymap
#### DataFrame.applymap(func, na_action=None, dtypes=None, dtype=None, skip_infer=False, \*\*kwargs)
Apply a function to a DataFrame elementwise.
This method applies a function that accepts and returns a scalar
to every element of a DataFrame.
* **Parameters:**
* **func** (*callable*) – Python function, returns a single value from a single value.
* **na_action** ( *{None* *,* *'ignore'}* *,* *default None*) – If ‘ignore’, propagate NaN values, without passing them to func.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames.
* **dtype** (*np.dtype* *,* *default None*) – Specify dtypes of all columns of returned DataFrames, only
effective when dtypes is not specified.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip inferring dtypes when dtypes or dtype is not specified.
* **\*\*kwargs** – Additional keyword arguments to pass as keyword arguments to
func.
* **Returns:**
Transformed DataFrame.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.apply`](maxframe.dataframe.DataFrame.apply.md#maxframe.dataframe.DataFrame.apply)
: Apply a function along input axis of DataFrame.
`DataFrame.replace`
: Replace values given in to_replace with value.
[`Series.map`](maxframe.dataframe.Series.map.md#maxframe.dataframe.Series.map)
: Apply a function elementwise on a Series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df.execute()
0 1
0 1.000 2.120
1 3.356 4.567
```
```pycon
>>> df.map(lambda x: len(str(x))).execute()
0 1
0 3 4
1 5 5
```
Like Series.map, NA values can be ignored:
```pycon
>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = md.NA
>>> df_copy.map(lambda x: len(str(x)), na_action='ignore').execute()
0 1
0 NaN 4
1 5.0 5
```
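The effect of `na_action='ignore'` can be sketched in plain Python: missing cells are passed through untouched and `func` is only called on the remaining cells. This is a minimal sketch of the semantics on a list-of-rows table, not MaxFrame's implementation; the `elementwise` helper is hypothetical and uses `None` to mark missing values.

```python
import math


def elementwise(rows, func, na_action=None):
    """Apply func to every cell of a list-of-rows table (illustrative helper)."""

    def is_na(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [
        # With na_action="ignore", missing cells are propagated without calling func
        [value if (na_action == "ignore" and is_na(value)) else func(value) for value in row]
        for row in rows
    ]


table = [[None, 2.12], [3.356, 4.567]]
lengths = elementwise(table, lambda x: len(str(x)), na_action="ignore")
print(lengths)  # [[None, 4], [5, 5]]
```

Without `na_action="ignore"`, the missing cell would be handed to `func` like any other value, so `len(str(None))` would be computed instead.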
It is also possible to use map with functions that are not
lambda functions:
```pycon
>>> df.map(round, ndigits=1).execute()
0 1
0 1.0 2.1
1 3.4 4.6
```
Note that a vectorized version of func often exists, which will
be much faster. You could square each number elementwise.
```pycon
>>> df.map(lambda x: x**2).execute()
0 1
0 1.000000 4.494400
1 11.262736 20.857489
```
But it’s better to avoid map in that case.
```pycon
>>> (df ** 2).execute()
0 1
0 1.000000 4.494400
1 11.262736 20.857489
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.assign.md
# maxframe.dataframe.DataFrame.assign
#### DataFrame.assign(\*\*kwargs)
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones.
Existing columns that are re-assigned will be overwritten.
* **Parameters:**
**\*\*kwargs** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *of* *{str: callable* *or* *Series}*) – The column names are keywords. If the values are
callable, they are computed on the DataFrame and
assigned to the new columns. The callable must not
change input DataFrame (though pandas doesn’t check it).
If the values are not callable, (e.g. a Series, scalar, or array),
they are simply assigned.
* **Returns:**
A new DataFrame with the new columns in addition to
all the existing columns.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Notes
Assigning multiple columns within the same `assign` is possible.
Later items in ‘kwargs’ may refer to newly created or modified
columns in ‘df’; items are computed and assigned into ‘df’ in order.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'temp_c': [17.0, 25.0]},
... index=['Portland', 'Berkeley'])
>>> df.execute()
temp_c
Portland 17.0
Berkeley 25.0
```
Where the value is a callable, evaluated on df:
```pycon
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32).execute()
temp_c temp_f
Portland 17.0 62.6
Berkeley 25.0 77.0
```
Alternatively, the same behavior can be achieved by directly
referencing an existing Series or sequence:
```pycon
>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32).execute()
temp_c temp_f
Portland 17.0 62.6
Berkeley 25.0 77.0
```
You can create multiple columns within the same assign where one
of the columns depends on another one defined within the same assign:
```pycon
>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
... temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9).execute()
temp_c temp_f temp_k
Portland 17.0 62.6 290.15
Berkeley 25.0 77.0 298.15
```
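The ordering rule from the Notes (later keyword arguments can see columns created by earlier ones) can be sketched with a plain dict standing in for the DataFrame. The `assign` function below is a hypothetical stand-in written for this illustration, not the MaxFrame method.

```python
def assign(columns, **new_columns):
    """Return a new column dict; callables are evaluated in keyword order (illustrative)."""
    result = dict(columns)  # copy, so the input mapping is not modified
    for name, value in new_columns.items():
        # A callable receives the partially built result, including earlier new columns
        result[name] = value(result) if callable(value) else value
    return result


df = {"temp_c": [17.0, 25.0]}
out = assign(
    df,
    temp_f=lambda d: [c * 9 / 5 + 32 for c in d["temp_c"]],
    temp_k=lambda d: [(f + 459.67) * 5 / 9 for f in d["temp_f"]],
)
print(list(out))  # ['temp_c', 'temp_f', 'temp_k']
print([round(v, 2) for v in out["temp_k"]])  # [290.15, 298.15]
```

Swapping the two keyword arguments would fail in this sketch, because `temp_f` would not exist yet when `temp_k` is computed.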
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.astype.md
# maxframe.dataframe.DataFrame.astype
#### DataFrame.astype(dtype, copy=True, errors='raise')
Cast a pandas object to a specified dtype `dtype`.
* **Parameters:**
* **dtype** (*data type* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *of* *column name -> data type*) – Use a numpy.dtype or Python type to cast entire pandas object to
the same type. Alternatively, use {col: dtype, …}, where col is a
column label and dtype is a numpy.dtype or Python type to cast one
or more of the DataFrame’s columns to column-specific types.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Return a copy when `copy=True` (be very careful setting
`copy=False` as changes to values then may propagate to other
pandas objects).
* **errors** ( *{'raise'* *,* *'ignore'}* *,* *default 'raise'*) –
Control raising of exceptions on invalid data for provided dtype.
- `raise` : allow exceptions to be raised
- `ignore` : suppress exceptions. On error return original object.
* **Returns:**
**casted**
* **Return type:**
same type as caller
#### SEE ALSO
[`to_datetime`](maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)
: Convert argument to datetime.
`to_timedelta`
: Convert argument to timedelta.
[`to_numeric`](maxframe.dataframe.to_numeric.md#maxframe.dataframe.to_numeric)
: Convert argument to a numeric type.
[`numpy.ndarray.astype`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html#numpy.ndarray.astype)
: Cast a numpy array to a specified type.
### Examples
Create a DataFrame:
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}))
>>> df.dtypes
col1 int64
col2 int64
dtype: object
```
Cast all columns to int32:
```pycon
>>> df.astype('int32').dtypes
col1 int32
col2 int32
dtype: object
```
Cast col1 to int32 using a dictionary:
```pycon
>>> df.astype({'col1': 'int32'}).dtypes
col1 int32
col2 int64
dtype: object
```
Create a series:
```pycon
>>> ser = md.Series(pd.Series([1, 2], dtype='int32'))
>>> ser.execute()
0 1
1 2
dtype: int32
>>> ser.astype('int64').execute()
0 1
1 2
dtype: int64
```
Convert to categorical type:
```pycon
>>> ser.astype('category').execute()
0 1
1 2
dtype: category
Categories (2, int64): [1, 2]
```
Convert to ordered categorical type with custom ordering:
```pycon
>>> cat_dtype = pd.api.types.CategoricalDtype(
... categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype).execute()
0 1
1 2
dtype: category
Categories (2, int64): [2 < 1]
```
Note that with `copy=False`, changing data on the new object may
propagate to the original. The example below only performs the cast and
does not modify `s2`, so `s1` is unchanged:
```pycon
>>> s1 = md.Series(pd.Series([1, 2]))
>>> s2 = s1.astype('int64', copy=False)
>>> s1.execute()
0 1
1 2
dtype: int64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.at.md
# maxframe.dataframe.DataFrame.at
#### *property* DataFrame.at
Access a single value for a row/column label pair.
Similar to `loc`, in that both provide label-based lookups. Use
`at` if you only need to get or set a single value in a DataFrame
or Series.
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If ‘label’ does not exist in DataFrame.
#### SEE ALSO
[`DataFrame.iat`](maxframe.dataframe.DataFrame.iat.md#maxframe.dataframe.DataFrame.iat)
: Access a single value for a row/column pair by integer position.
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Access a group of rows and columns by label(s).
[`Series.at`](maxframe.dataframe.Series.at.md#maxframe.dataframe.Series.at)
: Access a single value using a label.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df.execute()
A B C
4 0 2 3
5 0 4 1
6 10 20 30
```
Get value at specified row/column pair
```pycon
>>> df.at[4, 'B'].execute()
2
```
Set value at specified row/column pair (this example is commented out in the source docstring): `df.at[4, 'B'] = 10`, after which `df.at[4, 'B']` would return `10`.
Get value within a Series
```pycon
>>> df.loc[5].at['B'].execute()
4
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.at_time.md
# maxframe.dataframe.DataFrame.at_time
#### DataFrame.at_time(time, axis=0)
Select values at particular time of day (e.g., 9:30AM).
* **Parameters:**
* **time** ([*datetime.time*](https://docs.python.org/3/library/datetime.html#datetime.time) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The values to select.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – For Series this parameter is unused and defaults to 0.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – If the index is not a `DatetimeIndex`
#### SEE ALSO
[`between_time`](maxframe.dataframe.DataFrame.between_time.md#maxframe.dataframe.DataFrame.between_time)
: Select values between particular times of the day.
`first`
: Select initial periods of time series based on a date offset.
`last`
: Select final periods of time series based on a date offset.
`DatetimeIndex.indexer_at_time`
: Get just the index locations for values at particular time of the day.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> i = md.date_range('2018-04-09', periods=4, freq='12h')
>>> ts = md.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts.execute()
A
2018-04-09 00:00:00 1
2018-04-09 12:00:00 2
2018-04-10 00:00:00 3
2018-04-10 12:00:00 4
```
```pycon
>>> ts.at_time('12:00').execute()
A
2018-04-09 12:00:00 2
2018-04-10 12:00:00 4
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.between_time.md
# maxframe.dataframe.DataFrame.between_time
#### DataFrame.between_time(start_time, end_time, inclusive='both', axis=0)
Select values between particular times of the day (e.g., 9:00-9:30 AM).
By setting `start_time` to be later than `end_time`,
you can get the times that are *not* between the two times.
* **Parameters:**
* **start_time** ([*datetime.time*](https://docs.python.org/3/library/datetime.html#datetime.time) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Initial time as a time filter limit.
* **end_time** ([*datetime.time*](https://docs.python.org/3/library/datetime.html#datetime.time) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – End time as a time filter limit.
* **inclusive** ( *{"both"* *,* *"neither"* *,* *"left"* *,* *"right"}* *,* *default "both"*) – Include boundaries; whether to set each bound as closed or open.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Determine range time on index or columns value.
For Series this parameter is unused and defaults to 0.
* **Returns:**
Data from the original object filtered to the specified time range.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – If the index is not a `DatetimeIndex`
#### SEE ALSO
[`at_time`](maxframe.dataframe.DataFrame.at_time.md#maxframe.dataframe.DataFrame.at_time)
: Select values at a particular time of the day.
`first`
: Select initial periods of time series based on a date offset.
`last`
: Select final periods of time series based on a date offset.
`DatetimeIndex.indexer_between_time`
: Get just the index locations for values between particular times of the day.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> i = md.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = md.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts.execute()
A
2018-04-09 00:00:00 1
2018-04-10 00:20:00 2
2018-04-11 00:40:00 3
2018-04-12 01:00:00 4
```
```pycon
>>> ts.between_time('0:15', '0:45').execute()
A
2018-04-10 00:20:00 2
2018-04-11 00:40:00 3
```
You get the times that are *not* between two times by setting
`start_time` later than `end_time`:
```pycon
>>> ts.between_time('0:45', '0:15').execute()
A
2018-04-09 00:00:00 1
2018-04-12 01:00:00 4
```
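The wrap-around behavior can be sketched with `datetime.time` values alone. The function below is a minimal illustration of the selection rule under the assumption that a `start_time` later than `end_time` means "outside the window"; `in_time_window` is a hypothetical name and not part of MaxFrame.

```python
from datetime import time


def in_time_window(t, start, end, inclusive="both"):
    """Return True if time-of-day t is selected for the given bounds (illustrative)."""
    after_start = t >= start if inclusive in ("both", "left") else t > start
    before_end = t <= end if inclusive in ("both", "right") else t < end
    if start <= end:
        # Ordinary window: both conditions must hold
        return after_start and before_end
    # start later than end: the window wraps past midnight, so either condition selects
    return after_start or before_end


times = [time(0, 0), time(0, 20), time(0, 40), time(1, 0)]
inside = [t for t in times if in_time_window(t, time(0, 15), time(0, 45))]
outside = [t for t in times if in_time_window(t, time(0, 45), time(0, 15))]
print(inside)   # [datetime.time(0, 20), datetime.time(0, 40)]
print(outside)  # [datetime.time(0, 0), datetime.time(1, 0)]
```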
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.clip.md
# maxframe.dataframe.DataFrame.clip
#### DataFrame.clip(lower=None, upper=None, \*, axis=None, inplace=False)
Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds
can be singular values or array like, and in the latter case
the clipping is performed element-wise in the specified axis.
* **Parameters:**
* **lower** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array-like* *,* *default None*) – Minimum threshold value. All values below this
threshold will be set to it. If None, no lower clipping is performed.
* **upper** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array-like* *,* *default None*) – Maximum threshold value. All values above this
threshold will be set to it. If None, no upper clipping is performed.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *str axis name* *,* *optional*) – Align object with lower and upper along the given axis.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to perform the operation in place on the data.
* **\*args** – Additional keywords have no effect but might be accepted
for compatibility with numpy.
* **\*\*kwargs** – Additional keywords have no effect but might be accepted
for compatibility with numpy.
* **Returns:**
Same type as calling object with the values outside the
clip boundaries replaced or None if `inplace=True`.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
#### SEE ALSO
[`Series.clip`](maxframe.dataframe.Series.clip.md#maxframe.dataframe.Series.clip)
: Trim values at input threshold in series.
[`DataFrame.clip`](#maxframe.dataframe.DataFrame.clip)
: Trim values at input threshold in dataframe.
[`numpy.clip`](https://numpy.org/doc/stable/reference/generated/numpy.clip.html#numpy.clip)
: Clip (limit) the values in an array.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
>>> df = md.DataFrame(data)
>>> df.execute()
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5
```
Clips per column using lower and upper thresholds:
```pycon
>>> df.clip(lower=-4, upper=7).execute()
col_0 col_1
0 7 -2
1 -3 -4
2 0 6
3 -1 7
4 5 -4
```
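For scalar thresholds, the rule is simply "raise values below `lower`, lower values above `upper`". The plain-Python sketch below applies that rule to the same data; `clip_value` is a hypothetical helper used only for illustration.

```python
def clip_value(value, lower=None, upper=None):
    """Clamp a single value to [lower, upper]; None disables that bound (illustrative)."""
    if lower is not None and value < lower:
        return lower
    if upper is not None and value > upper:
        return upper
    return value


data = {"col_0": [9, -3, 0, -1, 5], "col_1": [-2, -7, 6, 8, -5]}
clipped = {
    name: [clip_value(v, lower=-4, upper=7) for v in values]
    for name, values in data.items()
}
print(clipped["col_0"])  # [7, -3, 0, -1, 5]
print(clipped["col_1"])  # [-2, -4, 6, 7, -4]
```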
Clips using specific lower and upper thresholds per column element:
```pycon
>>> t = md.Series([2, -4, -1, 6, 3])
>>> t.execute()
0 2
1 -4
2 -1
3 6
4 3
dtype: int64
```
```pycon
>>> df.clip(lower=t, upper=t).execute()
col_0 col_1
0 2 2
1 -3 -4
2 0 -1
3 -1 6
4 5 3
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.columns.md
# maxframe.dataframe.DataFrame.columns
#### *property* DataFrame.columns
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.combine.md
# maxframe.dataframe.DataFrame.combine
#### DataFrame.combine(other, func, fill_value=None, overwrite=True)
Perform column-wise combine with another DataFrame.
Combines a DataFrame with other DataFrame using func
to element-wise combine columns. The row and column indexes of the
resulting DataFrame will be the union of the two.
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – The DataFrame to merge column-wise.
* **func** (*function*) – Function that takes two series as inputs and returns a Series or a
scalar. Used to merge the two dataframes column by column.
* **fill_value** (*scalar value* *,* *default None*) – The value to fill NaNs with prior to passing any column to the
merge func.
* **overwrite** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, columns in self that do not exist in other will be
overwritten with NaNs.
* **Returns:**
Combination of the provided DataFrames.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.combine_first`](maxframe.dataframe.DataFrame.combine_first.md#maxframe.dataframe.DataFrame.combine_first)
: Combine two DataFrame objects and default to non-null values in frame calling the method.
### Examples
Combine using a simple function that chooses the smaller column.
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df1 = md.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = md.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
>>> df1.combine(df2, take_smaller).execute()
A B
0 0 3
1 0 3
```
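Note that `func` receives whole columns, not individual cells. A minimal plain-Python sketch of that column-by-column call pattern follows, assuming both inputs have the same columns and using lists in place of Series; `combine_columns` is a hypothetical helper, not the MaxFrame implementation.

```python
def combine_columns(left, right, func):
    """Call func once per column pair and collect the results (illustrative)."""
    return {name: func(left[name], right[name]) for name in left}


def take_smaller(s1, s2):
    # Picks the whole column with the smaller sum
    return s1 if sum(s1) < sum(s2) else s2


df1 = {"A": [0, 0], "B": [4, 4]}
df2 = {"A": [1, 1], "B": [3, 3]}
combined = combine_columns(df1, df2, take_smaller)
print(combined)  # {'A': [0, 0], 'B': [3, 3]}
```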
Example using a true element-wise combine function.
```pycon
>>> df1 = md.DataFrame({'A': [5, 0], 'B': [2, 4]})
>>> df2 = md.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, mt.minimum).execute()
A B
0 1 2
1 0 3
```
Using fill_value fills Nones prior to passing the column to the
merge function.
```pycon
>>> df1 = md.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = md.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5).execute()
A B
0 0 -5.0
1 0 4.0
```
However, if the same element in both dataframes is None, that None
is preserved
```pycon
>>> df1 = md.DataFrame({'A': [0, 0], 'B': [None, 4]})
>>> df2 = md.DataFrame({'A': [1, 1], 'B': [None, 3]})
>>> df1.combine(df2, take_smaller, fill_value=-5).execute()
A B
0 0 -5.0
1 0 3.0
```
Example that demonstrates the use of overwrite and behavior when
the axes differ between the dataframes.
```pycon
>>> df1 = md.DataFrame({'A': [0, 0], 'B': [4, 4]})
>>> df2 = md.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
>>> df1.combine(df2, take_smaller).execute()
A B C
0 NaN NaN NaN
1 NaN 3.0 -10.0
2 NaN 3.0 1.0
```
```pycon
>>> df1.combine(df2, take_smaller, overwrite=False).execute()
A B C
0 0.0 NaN NaN
1 0.0 3.0 -10.0
2 NaN 3.0 1.0
```
Demonstrating the preference of the passed in dataframe.
```pycon
>>> df2 = md.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
>>> df2.combine(df1, take_smaller).execute()
A B C
0 0.0 NaN NaN
1 0.0 3.0 NaN
2 NaN 3.0 NaN
```
```pycon
>>> df2.combine(df1, take_smaller, overwrite=False).execute()
A B C
0 0.0 NaN NaN
1 0.0 3.0 1.0
2 NaN 3.0 1.0
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.combine_first.md
# maxframe.dataframe.DataFrame.combine_first
#### DataFrame.combine_first(other)
Update null elements with value in the same location in other.
Combine two DataFrame objects by filling null values in one DataFrame
with non-null values from other DataFrame. The row and column indexes
of the resulting DataFrame will be the union of the two. When calling
`first.combine_first(second)`, the result keeps the value from `first`
wherever both `first.loc[index, col]` and `second.loc[index, col]` are
non-missing; values from `second` only fill the gaps in `first`.
* **Parameters:**
**other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Provided DataFrame to use to fill null values.
* **Returns:**
The result of combining the provided DataFrame with the other object.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.combine`](maxframe.dataframe.DataFrame.combine.md#maxframe.dataframe.DataFrame.combine)
: Perform series-wise operation on two DataFrames using a given function.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df1 = md.DataFrame({'A': [None, 0], 'B': [None, 4]})
>>> df2 = md.DataFrame({'A': [1, 1], 'B': [3, 3]})
>>> df1.combine_first(df2).execute()
A B
0 1.0 3.0
1 0.0 4.0
```
Null values still persist if the location of that null value
does not exist in other
```pycon
>>> df1 = md.DataFrame({'A': [None, 0], 'B': [4, None]})
>>> df2 = md.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
>>> df1.combine_first(df2).execute()
A B C
0 NaN 4.0 NaN
1 0.0 3.0 1.0
2 NaN 3.0 1.0
```
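The fill rule and the union of row and column labels can be sketched with nested dicts, where each table maps a column name to `{row_label: value}` and `None` marks a missing value. This is an illustrative approximation of the semantics, not MaxFrame code; the function below is a hypothetical stand-in.

```python
def combine_first(first, second):
    """Fill missing cells of first with values from second (illustrative sketch)."""
    columns = list(first) + [c for c in second if c not in first]
    rows = sorted({r for table in (first, second) for col in table.values() for r in col})
    result = {}
    for c in columns:
        result[c] = {}
        for r in rows:
            value = first.get(c, {}).get(r)
            if value is None:
                # Only fall back to second when first has nothing at this location
                value = second.get(c, {}).get(r)
            result[c][r] = value
    return result


df1 = {"A": {0: None, 1: 0}, "B": {0: 4, 1: None}}
df2 = {"B": {1: 3, 2: 3}, "C": {1: 1, 2: 1}}
filled = combine_first(df1, df2)
print(filled["A"])  # {0: None, 1: 0, 2: None}
print(filled["B"])  # {0: 4, 1: 3, 2: 3}
print(filled["C"])  # {0: None, 1: 1, 2: 1}
```

Cells that are missing in both inputs, or that exist in neither, stay missing, which matches the `NaN` entries in the output above.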
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.compare.md
# maxframe.dataframe.DataFrame.compare
#### DataFrame.compare(other, align_axis: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) = 1, keep_shape: [bool](https://docs.python.org/3/library/functions.html#bool) = False, keep_equal: [bool](https://docs.python.org/3/library/functions.html#bool) = False, result_names: [Tuple](https://docs.python.org/3/library/typing.html#typing.Tuple)[[str](https://docs.python.org/3/library/stdtypes.html#str), [str](https://docs.python.org/3/library/stdtypes.html#str)] = ('self', 'other'))
Compare to another DataFrame and show the differences.
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Object to compare with.
* **align_axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 1*) –
Determine which axis to align the comparison on.
* 0, or ‘index’
: with rows drawn alternately from self and other.
* 1, or ‘columns’
: with columns drawn alternately from self and other.
* **keep_shape** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If true, all rows and columns are kept.
Otherwise, only the ones with different values are kept.
* **keep_equal** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If true, the result keeps values that are equal.
Otherwise, equal values are shown as NaNs.
* **result_names** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *default* *(* *‘self’* *,* *‘other’* *)*) – Set the names of the DataFrames in the comparison.
* **Returns:**
DataFrame that shows the differences stacked side by side.
The resulting index will be a MultiIndex with ‘self’ and ‘other’
stacked alternately at the inner level.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – When the two DataFrames don’t have identical labels or shape.
#### SEE ALSO
[`Series.compare`](maxframe.dataframe.Series.compare.md#maxframe.dataframe.Series.compare)
: Compare with another Series and show differences.
`DataFrame.equals`
: Test whether two objects contain the same elements.
### Notes
Matching NaNs will not appear as a difference.
Only identically-labeled DataFrames
(i.e. same shape, identical row and column labels) can be compared.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... {
... "col1": ["a", "a", "b", "b", "a"],
... "col2": [1.0, 2.0, 3.0, mt.nan, 5.0],
... "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
... },
... columns=["col1", "col2", "col3"],
... )
>>> df.execute()
col1 col2 col3
0 a 1.0 1.0
1 a 2.0 2.0
2 b 3.0 3.0
3 b NaN 4.0
4 a 5.0 5.0
```
```pycon
>>> df2 = df.copy()
>>> df2.loc[0, 'col1'] = 'c'
>>> df2.loc[2, 'col3'] = 4.0
>>> df2.execute()
col1 col2 col3
0 c 1.0 1.0
1 a 2.0 2.0
2 b 3.0 4.0
3 b NaN 4.0
4 a 5.0 5.0
```
Align the differences on columns
```pycon
>>> df.compare(df2).execute()
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
```
Stack the differences on rows
```pycon
>>> df.compare(df2, align_axis=0).execute()
        col1  col3
0 self     a   NaN
  other    c   NaN
2 self   NaN   3.0
  other  NaN   4.0
```
Keep the equal values
```pycon
>>> df.compare(df2, keep_equal=True).execute()
  col1       col3
  self other self other
0    a     c  1.0   1.0
2    b     b  3.0   4.0
```
Keep all original rows and columns
```pycon
>>> df.compare(df2, keep_shape=True).execute()
  col1       col2       col3
  self other self other self other
0    a     c  NaN   NaN  NaN   NaN
1  NaN   NaN  NaN   NaN  NaN   NaN
2  NaN   NaN  NaN   NaN  3.0   4.0
3  NaN   NaN  NaN   NaN  NaN   NaN
4  NaN   NaN  NaN   NaN  NaN   NaN
```
Keep all original rows and columns and also all original values
```pycon
>>> df.compare(df2, keep_shape=True, keep_equal=True).execute()
  col1       col2       col3
  self other self other self other
0    a     c  1.0   1.0  1.0   1.0
1    a     a  2.0   2.0  2.0   2.0
2    b     b  3.0   3.0  3.0   4.0
3    b     b  NaN   NaN  4.0   4.0
4    a     a  5.0   5.0  5.0   5.0
```
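The core of the comparison, including the rule that matching NaNs do not count as a difference, can be sketched in plain Python with rows as dicts. The `same` and `compare` helpers are hypothetical and only illustrate the default behavior (`keep_shape=False`, `keep_equal=False`).

```python
import math


def same(a, b):
    """Treat two NaNs as equal, otherwise fall back to ==."""
    both_nan = (
        isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b)
    )
    return both_nan or a == b


def compare(left, right):
    """Return {row: {column: (self, other)}} for cells that differ (illustrative)."""
    diffs = {}
    for row, (l_row, r_row) in enumerate(zip(left, right)):
        changed = {
            col: (l_row[col], r_row[col]) for col in l_row if not same(l_row[col], r_row[col])
        }
        if changed:
            diffs[row] = changed
    return diffs


nan = float("nan")
df = [
    {"col1": "a", "col2": 1.0, "col3": 1.0},
    {"col1": "a", "col2": 2.0, "col3": 2.0},
    {"col1": "b", "col2": 3.0, "col3": 3.0},
    {"col1": "b", "col2": nan, "col3": 4.0},
    {"col1": "a", "col2": 5.0, "col3": 5.0},
]
df2 = [dict(row) for row in df]
df2[0]["col1"] = "c"
df2[2]["col3"] = 4.0
differences = compare(df, df2)
print(differences)  # {0: {'col1': ('a', 'c')}, 2: {'col3': (3.0, 4.0)}}
```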
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.convert_dtypes.md
# maxframe.dataframe.DataFrame.convert_dtypes
#### DataFrame.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True, dtype_backend='numpy')
Convert columns to best possible dtypes using dtypes supporting `pd.NA`.
* **Parameters:**
* **infer_objects** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether object dtypes should be converted to the best possible types.
* **convert_string** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether object dtypes should be converted to `StringDtype()`.
* **convert_integer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether, if possible, conversion can be done to integer extension types.
* **convert_boolean** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether object dtypes should be converted to `BooleanDtype()`.
* **convert_floating** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether, if possible, conversion can be done to floating extension types.
If convert_integer is also True, preference will be given to integer
dtypes if the floats can be faithfully cast to integers.
* **Returns:**
Copy of input object with new dtype.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`infer_objects`](maxframe.dataframe.DataFrame.infer_objects.md#maxframe.dataframe.DataFrame.infer_objects)
: Infer dtypes of objects.
[`to_datetime`](maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)
: Convert argument to datetime.
`to_timedelta`
: Convert argument to timedelta.
[`to_numeric`](maxframe.dataframe.to_numeric.md#maxframe.dataframe.to_numeric)
: Convert argument to a numeric type.
### Notes
By default, `convert_dtypes` will attempt to convert a Series (or each
Series in a DataFrame) to dtypes that support `pd.NA`. By using the options
`convert_string`, `convert_integer`, `convert_boolean` and
`convert_floating`, it is possible to turn off individual conversions
to `StringDtype`, the integer extension types, `BooleanDtype`
or floating extension types, respectively.
For object-dtyped columns, if `infer_objects` is `True`, use the inference
rules as during normal Series/DataFrame construction. Then, if possible,
convert to `StringDtype`, `BooleanDtype` or an appropriate integer
or floating extension type, otherwise leave as `object`.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an
appropriate integer extension type. Otherwise, convert to an
appropriate floating extension type.
#### Versionchanged
Changed in version 1.2: Starting with pandas 1.2, this method also converts float columns
to the nullable floating extension type.
In the future, as new dtypes are added that support `pd.NA`, the results
of this method will change to support those new dtypes.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... {
... "a": md.Series([1, 2, 3], dtype=mt.dtype("int32")),
... "b": md.Series(["x", "y", "z"], dtype=mt.dtype("O")),
... "c": md.Series([True, False, mt.nan], dtype=mt.dtype("O")),
... "d": md.Series(["h", "i", mt.nan], dtype=mt.dtype("O")),
... "e": md.Series([10, mt.nan, 20], dtype=mt.dtype("float")),
... "f": md.Series([mt.nan, 100.5, 200], dtype=mt.dtype("float")),
... }
... )
```
Start with a DataFrame with default dtypes.
```pycon
>>> df.execute()
a b c d e f
0 1 x True h 10.0 NaN
1 2 y False i NaN 100.5
2 3 z NaN NaN 20.0 200.0
```
```pycon
>>> df.dtypes.execute()
a int32
b object
c object
d object
e float64
f float64
dtype: object
```
Convert the DataFrame to use best possible dtypes.
```pycon
>>> dfn = df.convert_dtypes()
>>> dfn.execute()
a b c d e f
0 1 x True h 10 <NA>
1 2 y False i <NA> 100.5
2 3 z <NA> <NA> 20 200.0
```
```pycon
>>> dfn.dtypes.execute()
a Int32
b string
c boolean
d string
e Int64
f Float64
dtype: object
```
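The rule for float columns in the Notes (all-integer values become an integer extension type, anything else a floating extension type) explains why `e` becomes `Int64` while `f` becomes `Float64`. A minimal sketch of that decision follows, covering float input only; `pick_numeric_extension_dtype` is a hypothetical helper, not a MaxFrame or pandas function.

```python
import math


def pick_numeric_extension_dtype(values):
    """Choose 'Int64' if every non-missing value is integral, else 'Float64' (illustrative)."""
    present = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    if all(float(v).is_integer() for v in present):
        return "Int64"
    return "Float64"


nan = float("nan")
dtype_e = pick_numeric_extension_dtype([10, nan, 20])
dtype_f = pick_numeric_extension_dtype([nan, 100.5, 200])
print(dtype_e)  # Int64
print(dtype_f)  # Float64
```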
Start with a Series of strings and missing data represented by `mt.nan`.
```pycon
>>> s = md.Series(["a", "b", mt.nan])
>>> s.execute()
0 a
1 b
2 NaN
dtype: object
```
Obtain a Series with dtype `StringDtype`.
```pycon
>>> s.convert_dtypes().execute()
0 a
1 b
2 <NA>
dtype: string
```
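The numeric rule described above (all-integer values go to an integer extension type, anything else to a floating one) can be sketched in plain Python. This only illustrates the decision, not MaxFrame's implementation; `target_float_dtype` is a hypothetical helper and `None` stands in for a missing value.

```python
def target_float_dtype(values):
    """Pick the nullable dtype name for a float column (illustrative helper)."""
    present = [v for v in values if v is not None]
    # All non-missing values are whole numbers -> nullable integer type.
    if all(float(v).is_integer() for v in present):
        return "Int64"
    # Otherwise keep fractional values in the nullable floating type.
    return "Float64"

# Columns "e" and "f" from the example above.
print(target_float_dtype([10.0, None, 20.0]))    # Int64
print(target_float_dtype([None, 100.5, 200.0]))  # Float64
```

This matches the converted dtypes shown for columns `e` and `f` in the example output.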
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.copy.md
# maxframe.dataframe.DataFrame.copy
#### DataFrame.copy() → TileableType
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.corr.md
# maxframe.dataframe.DataFrame.corr
#### DataFrame.corr(method='pearson', min_periods=1)
Compute pairwise correlation of columns, excluding NA/null values.
* **Parameters:**
* **method** ( *{'pearson'* *,* *'kendall'* *,* *'spearman'}* *or* *callable*) –
Method of correlation:
* pearson : standard correlation coefficient
* kendall : Kendall Tau correlation coefficient
* spearman : Spearman rank correlation
* callable: callable that takes two 1d ndarrays as input
and returns a float. Note that the matrix returned from corr
will have 1 along the diagonals and will be symmetric
regardless of the callable’s behavior.
#### NOTE
kendall, spearman and callables are not supported on multiple chunks yet.
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Minimum number of observations required per pair of columns
to have a valid result. Currently only available for Pearson
and Spearman correlation.
* **Returns:**
Correlation matrix.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.corrwith`](maxframe.dataframe.DataFrame.corrwith.md#maxframe.dataframe.DataFrame.corrwith)
: Compute pairwise correlation with another DataFrame or Series.
[`Series.corr`](maxframe.dataframe.Series.corr.md#maxframe.dataframe.Series.corr)
: Compute the correlation between two Series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
... columns=['dogs', 'cats'])
>>> df.corr(method='pearson').execute()
dogs cats
dogs 1.000000 -0.851064
cats -0.851064 1.000000
```
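To make the off-diagonal value concrete, the Pearson coefficient for the two columns can be recomputed by hand. This is a plain-Python sketch that needs no MaxFrame session; `pearson` is a hypothetical helper, not part of the MaxFrame API.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

dogs = [0.2, 0.0, 0.6, 0.2]
cats = [0.3, 0.6, 0.0, 0.1]
r = pearson(dogs, cats)
print(f"{r:.6f}")  # -0.851064
```

The diagonal entries are 1 because every column is perfectly correlated with itself.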
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.corrwith.md
# maxframe.dataframe.DataFrame.corrwith
#### DataFrame.corrwith(other, axis=0, drop=False, method='pearson')
Compute pairwise correlation.
Pairwise correlation is computed between rows or columns of
DataFrame with rows or columns of Series or DataFrame. DataFrames
are first aligned along both axes before computing the
correlations.
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Object with which to compute correlations.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to use. 0 or ‘index’ to compute column-wise, 1 or ‘columns’ for
row-wise.
* **drop** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Drop missing indices from result.
* **method** ( *{'pearson'* *,* *'kendall'* *,* *'spearman'}* *or* *callable*) –
Method of correlation:
* pearson : standard correlation coefficient
* kendall : Kendall Tau correlation coefficient
* spearman : Spearman rank correlation
* callable: callable that takes two 1d ndarrays as input
and returns a float.
#### NOTE
kendall, spearman and callables are not supported on multiple chunks yet.
* **Returns:**
Pairwise correlations.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`DataFrame.corr`](maxframe.dataframe.DataFrame.corr.md#maxframe.dataframe.DataFrame.corr)
: Compute pairwise correlation of columns.
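This page has no example, so here is a rough plain-Python sketch of the column-wise case (`axis=0`) with the Pearson method. It assumes the rows of both tables are already aligned and that `drop=False` keeps unmatched labels as missing, following the pandas behaviour this method mirrors. `pearson` and `corrwith` below are hypothetical helpers working on dicts of lists, not MaxFrame calls.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def corrwith(left, right, drop=False):
    """Correlate same-named columns of two dict-of-lists tables."""
    result = {}
    for name in sorted(set(left) | set(right)):
        if name in left and name in right:
            result[name] = pearson(left[name], right[name])
        elif not drop:
            # Label present on one side only: keep it, but as missing.
            result[name] = math.nan
    return result

left = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1]}
right = {"a": [2, 4, 6, 8], "c": [1, 0, 1, 0]}
res = corrwith(left, right)
res_dropped = corrwith(left, right, drop=True)
print(sorted(res))          # ['a', 'b', 'c']
print(sorted(res_dropped))  # ['a']
```

Only column `a` exists in both tables, so it is the only label with a numeric result.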
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.count.md
# maxframe.dataframe.DataFrame.count
#### DataFrame.count(axis=0, level=None, numeric_only=False, \*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.cov.md
# maxframe.dataframe.DataFrame.cov
#### DataFrame.cov(min_periods=None, ddof=1, numeric_only=True)
Compute pairwise covariance of columns, excluding NA/null values.
Compute the pairwise covariance among the series of a DataFrame.
The returned data frame is the [covariance matrix](https://en.wikipedia.org/wiki/Covariance_matrix) of the columns
of the DataFrame.
Both NA and null values are automatically excluded from the
calculation. (See the note below about bias from missing values.)
A threshold can be set for the minimum number of
observations for each value created. Comparisons with observations
below this threshold will be returned as `NaN`.
This method is generally used for the analysis of time series data to
understand the relationship between different measures
across time.
* **Parameters:**
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Minimum number of observations required per pair of columns
to have a valid result.
* **ddof** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 1*) – Delta degrees of freedom. The divisor used in calculations
is `N - ddof`, where `N` represents the number of elements.
This argument is applicable only when no `nan` is in the dataframe.
* **numeric_only** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Include only float, int or boolean data.
* **Returns:**
The covariance matrix of the series of the DataFrame.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`Series.cov`](maxframe.dataframe.Series.cov.md#maxframe.dataframe.Series.cov)
: Compute covariance with another Series.
`core.window.ewm.ExponentialMovingWindow.cov`
: Exponential weighted sample covariance.
`core.window.expanding.Expanding.cov`
: Expanding sample covariance.
`core.window.rolling.Rolling.cov`
: Rolling sample covariance.
### Notes
Returns the covariance matrix of the DataFrame’s time series.
The covariance is normalized by N-ddof.
For DataFrames that have Series that are missing data (assuming that
data is [missing at random](https://en.wikipedia.org/wiki/Missing_data#Missing_at_random))
the returned covariance matrix will be an unbiased estimate
of the variance and covariance between the member Series.
However, for many applications this estimate may not be acceptable
because the estimated covariance matrix is not guaranteed to be positive
semi-definite. This could lead to estimated correlations having
absolute values which are greater than one, and/or a non-invertible
covariance matrix. See [Estimation of covariance matrices](https://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_matrices) for more details.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
... columns=['dogs', 'cats'])
>>> df.cov().execute()
dogs cats
dogs 0.666667 -1.000000
cats -1.000000 1.666667
```
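The matrix above can be checked by hand using the `N - ddof` normalization from the Notes. This is a plain-Python sketch; `cov` is a hypothetical helper, not the MaxFrame method.

```python
def cov(xs, ys, ddof=1):
    """Covariance of two equal-length sequences, normalized by N - ddof."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - ddof)

dogs = [1, 0, 2, 1]
cats = [2, 3, 0, 1]
print(round(cov(dogs, dogs), 6))  # 0.666667
print(round(cov(dogs, cats), 6))  # -1.0
print(round(cov(cats, cats), 6))  # 1.666667
```

With `ddof=0` the same sums are divided by `N` instead, which gives the population covariance (0.5 for `dogs` with itself).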
```pycon
>>> mt.random.seed(42)
>>> df = md.DataFrame(mt.random.randn(1000, 5),
... columns=['a', 'b', 'c', 'd', 'e'])
>>> df.cov().execute()
a b c d e
a 0.998438 -0.020161 0.059277 -0.008943 0.014144
b -0.020161 1.059352 -0.008543 -0.024738 0.009826
c 0.059277 -0.008543 1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486 0.921297 -0.013692
e 0.014144 0.009826 -0.000271 -0.013692 0.977795
```
**Minimum number of periods**
This method also supports an optional `min_periods` keyword
that specifies the required minimum number of non-NA observations for
each column pair in order to have a valid result:
```pycon
>>> mt.random.seed(42)
>>> df = md.DataFrame(mt.random.randn(20, 3),
... columns=['a', 'b', 'c'])
>>> df.loc[df.index[:5], 'a'] = mt.nan
>>> df.loc[df.index[5:10], 'b'] = mt.nan
>>> df.cov(min_periods=12).execute()
a b c
a 0.316741 NaN -0.150812
b NaN 1.248003 0.191417
c -0.150812 0.191417 0.895202
```
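The `NaN` entries follow from counting complete pairs: columns `a` and `b` share only 10 rows where both values are present, which is below `min_periods=12`. A minimal plain-Python sketch of that rule follows, with `None` standing in for a missing value. `pairwise_cov` is a hypothetical helper and the data is made up for illustration.

```python
import math

def pairwise_cov(xs, ys, min_periods=None, ddof=1):
    """Covariance over rows where both values are present (None = missing)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    # Too few complete pairs: report NaN instead of an unreliable estimate.
    if (min_periods is not None and n < min_periods) or n - ddof <= 0:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - ddof)

a = [None, None, 1.0, 2.0, 4.0]
b = [3.0, 1.0, None, 5.0, 9.0]
print(pairwise_cov(a, b))                 # 4.0, from the two complete pairs
print(pairwise_cov(a, b, min_periods=3))  # nan
```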
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.describe.md
# maxframe.dataframe.DataFrame.describe
#### DataFrame.describe(percentiles=None, include=None, exclude=None)
Generate descriptive statistics.
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset’s distribution, excluding `NaN` values.
Analyzes both numeric and object series, as well
as `DataFrame` column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
* **Parameters:**
* **percentiles** (*list-like* *of* *numbers* *,* *optional*) – The percentiles to include in the output. All should
fall between 0 and 1. The default is
`[.25, .5, .75]`, which returns the 25th, 50th, and
75th percentiles.
* **include** ( *'all'* *,* *list-like* *of* *dtypes* *or* *None* *(**default* *)* *,* *optional*) –
A white list of data types to include in the result. Ignored
for `Series`. Here are the options:
- ’all’ : All columns of the input will be included in the output.
- A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types, submit
`numpy.number`. To limit it instead to object columns, submit
the `object` data type (the `numpy.object` alias was removed in NumPy 1.24). Strings
can also be used in the style of
`select_dtypes` (e.g. `df.describe(include=['O'])`).
- None (default) : The result will include all numeric columns.
* **exclude** (*list-like* *of* *dtypes* *or* *None* *(**default* *)* *,* *optional* *,*) –
A black list of data types to omit from the result. Ignored
for `Series`. Here are the options:
- A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types, submit
`numpy.number`. To exclude object columns, submit the
`object` data type (the `numpy.object` alias was removed in NumPy 1.24). Strings can also be used in the style of
`select_dtypes` (e.g. `df.describe(exclude=['O'])`).
- None (default) : The result will exclude nothing.
* **Returns:**
Summary statistics of the Series or Dataframe provided.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.count`](maxframe.dataframe.DataFrame.count.md#maxframe.dataframe.DataFrame.count)
: Count number of non-NA/null observations.
[`DataFrame.max`](maxframe.dataframe.DataFrame.max.md#maxframe.dataframe.DataFrame.max)
: Maximum of the values in the object.
[`DataFrame.min`](maxframe.dataframe.DataFrame.min.md#maxframe.dataframe.DataFrame.min)
: Minimum of the values in the object.
[`DataFrame.mean`](maxframe.dataframe.DataFrame.mean.md#maxframe.dataframe.DataFrame.mean)
: Mean of the values.
[`DataFrame.std`](maxframe.dataframe.DataFrame.std.md#maxframe.dataframe.DataFrame.std)
: Standard deviation of the observations.
[`DataFrame.select_dtypes`](maxframe.dataframe.DataFrame.select_dtypes.md#maxframe.dataframe.DataFrame.select_dtypes)
: Subset of a DataFrame including/excluding columns based on their dtype.
### Notes
For numeric data, the result’s index will include `count`,
`mean`, `std`, `min`, `max` as well as lower, `50` and
upper percentiles. By default the lower percentile is `25` and the
upper percentile is `75`. The `50` percentile is the
same as the median.
For object data (e.g. strings or timestamps), the result’s index
will include `count`, `unique`, `top`, and `freq`. The `top`
is the most common value. The `freq` is the most common value’s
frequency. Timestamps also include the `first` and `last` items.
If multiple object values have the highest count, then the
`count` and `top` results will be arbitrarily chosen from
among those with the highest count.
For mixed data types provided via a `DataFrame`, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object data without any numeric columns, the default is to
return an analysis of object columns. If `include='all'` is provided
as an option, the result will include a union of attributes of each type.
The include and exclude parameters can be used to limit
which columns in a `DataFrame` are analyzed for the output.
The parameters are ignored when analyzing a `Series`.
### Examples
Describing a numeric `Series`.
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3])
>>> s.describe().execute()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
dtype: float64
```
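The numbers in this summary can be recomputed by hand. The plain-Python sketch below assumes that percentiles use linear interpolation between neighbouring ranks (the pandas default, which the output above is consistent with) and that `std` is the sample standard deviation. `percentile` is a hypothetical helper.

```python
import math
import statistics

def percentile(values, q):
    """Percentile by linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

data = [1, 2, 3]
summary = {
    "count": float(len(data)),
    "mean": float(statistics.mean(data)),
    "std": statistics.stdev(data),  # sample standard deviation (ddof=1)
    "min": float(min(data)),
    "25%": percentile(data, 0.25),
    "50%": percentile(data, 0.50),
    "75%": percentile(data, 0.75),
    "max": float(max(data)),
}
for name, value in summary.items():
    print(name, value)
```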
Describing a `DataFrame`. By default only numeric fields
are returned.
```pycon
>>> df = md.DataFrame({'numeric': [1, 2, 3],
... 'object': ['a', 'b', 'c']
... })
>>> df.describe().execute()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
```
Describing all columns of a `DataFrame` regardless of data type.
```pycon
>>> df.describe(include='all').execute()
numeric object
count 3.0 3
unique NaN 3
top NaN a
freq NaN 1
mean 2.0 NaN
std 1.0 NaN
min 1.0 NaN
25% 1.5 NaN
50% 2.0 NaN
75% 2.5 NaN
max 3.0 NaN
```
Describing a column from a `DataFrame` by accessing it as
an attribute.
```pycon
>>> df.numeric.describe().execute()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Name: numeric, dtype: float64
```
Including only numeric columns in a `DataFrame` description.
```pycon
>>> df.describe(include=[mt.number]).execute()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
```
Including only string columns in a `DataFrame` description.
```pycon
>>> df.describe(include=[object]).execute()
object
count 3
unique 3
top a
freq 1
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.diff.md
# maxframe.dataframe.DataFrame.diff
#### DataFrame.diff(periods=1, axis=0)
First discrete difference of element.
Calculates the difference of a DataFrame element compared with another
element in the DataFrame (default is the element in the same column
of the previous row).
* **Parameters:**
* **periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 1*) – Periods to shift for calculating difference, accepts negative
values.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Take difference over rows (0) or columns (1).
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.diff`
: First discrete difference for a Series.
[`DataFrame.pct_change`](maxframe.dataframe.DataFrame.pct_change.md#maxframe.dataframe.DataFrame.pct_change)
: Percent change over given number of periods.
[`DataFrame.shift`](maxframe.dataframe.DataFrame.shift.md#maxframe.dataframe.DataFrame.shift)
: Shift index by desired number of periods with an optional time freq.
### Notes
For boolean dtypes, this uses `operator.xor()` rather than
`operator.sub()`.
### Examples
Difference with previous row
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'a': [1, 2, 3, 4, 5, 6],
... 'b': [1, 1, 2, 3, 5, 8],
... 'c': [1, 4, 9, 16, 25, 36]})
>>> df.execute()
a b c
0 1 1 1
1 2 1 4
2 3 2 9
3 4 3 16
4 5 5 25
5 6 8 36
```
```pycon
>>> df.diff().execute()
a b c
0 NaN NaN NaN
1 1.0 0.0 3.0
2 1.0 1.0 5.0
3 1.0 1.0 7.0
4 1.0 2.0 9.0
5 1.0 3.0 11.0
```
Difference with previous column
```pycon
>>> df.diff(axis=1).execute()
a b c
0 NaN 0.0 0.0
1 NaN -1.0 3.0
2 NaN -1.0 7.0
3 NaN -1.0 13.0
4 NaN 0.0 20.0
5 NaN 2.0 28.0
```
Difference with 3rd previous row
```pycon
>>> df.diff(periods=3).execute()
a b c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 3.0 2.0 15.0
4 3.0 4.0 21.0
5 3.0 6.0 27.0
```
Difference with following row
```pycon
>>> df.diff(periods=-1).execute()
a b c
0 -1.0 0.0 -3.0
1 -1.0 -1.0 -5.0
2 -1.0 -1.0 -7.0
3 -1.0 -2.0 -9.0
4 -1.0 -3.0 -11.0
5 NaN NaN NaN
```
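The shifting rule behind these outputs, including the XOR behaviour for booleans mentioned in the Notes, can be sketched for a single column in plain Python. `diff` here is a hypothetical stand-alone function, not the MaxFrame method.

```python
import math

def diff(values, periods=1):
    """Difference between each element and the one `periods` positions before."""
    out = []
    for i, value in enumerate(values):
        j = i - periods
        if 0 <= j < len(values):
            prev = values[j]
            # Booleans are compared with XOR, everything else is subtracted.
            out.append(value ^ prev if isinstance(value, bool) else float(value - prev))
        else:
            out.append(math.nan)  # no partner element at this position
    return out

b = [1, 1, 2, 3, 5, 8]  # column "b" from the examples above
print(diff(b))              # [nan, 0.0, 1.0, 1.0, 2.0, 3.0]
print(diff(b, periods=3))   # [nan, nan, nan, 2.0, 4.0, 6.0]
print(diff(b, periods=-1))  # [0.0, -1.0, -1.0, -2.0, -3.0, nan]
print(diff([True, False, False, True])[1:])  # [True, False, True]
```

A negative `periods` pairs each element with a later one, which is why the missing value moves to the end.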
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.div.md
# maxframe.dataframe.DataFrame.div
#### DataFrame.div(other, axis='columns', level=None, fill_value=None)
Get floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to `dataframe / other`, but with support to substitute a fill_value
for missing data in one of the inputs. The reverse version is rtruediv.
Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar with the operator version, which returns the same
result as the method version.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply by a DataFrame of a different shape with the method version, substituting `fill_value` for missing entries.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
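The `fill_value` rule (fill a value missing on one side, keep the result missing when both sides are missing) can be sketched for one labelled column in plain Python. `flex_op` is a hypothetical helper, `None` stands in for a missing value, and the data is made up for illustration.

```python
import math
import operator

def flex_op(left, right, op, fill_value=None):
    """Apply `op` label by label over the union of labels (None = missing)."""
    result = {}
    for label in sorted(set(left) | set(right)):
        a, b = left.get(label), right.get(label)
        if a is None and b is None:
            result[label] = math.nan  # missing on both sides stays missing
        else:
            if fill_value is not None:
                a = fill_value if a is None else a
                b = fill_value if b is None else b
            result[label] = math.nan if a is None or b is None else op(a, b)
    return result

left = {"x": 10.0, "y": None, "z": None}
right = {"x": 4.0, "y": 5.0, "z": None}
plain = flex_op(left, right, operator.truediv)
filled = flex_op(left, right, operator.truediv, fill_value=1)
print(plain)   # {'x': 2.5, 'y': nan, 'z': nan}
print(filled)  # {'x': 2.5, 'y': 0.2, 'z': nan}
```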
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.dot.md
# maxframe.dataframe.DataFrame.dot
#### DataFrame.dot(other)
Compute the matrix multiplication between the DataFrame and other.
This method computes the matrix product between the DataFrame and the
values of another Series, DataFrame or a numpy array.
It can also be called using `self @ other` in Python >= 3.5.
* **Parameters:**
**other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* *array-like*) – The other object to compute the matrix product with.
* **Returns:**
If other is a Series, return the matrix product between self and
other as a Series. If other is a DataFrame or a numpy.array, return
the matrix product of self and other as a DataFrame.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.dot`
: Similar method for Series.
### Notes
The dimensions of DataFrame and other must be compatible in order to
compute the matrix multiplication. In addition, the column names of
DataFrame and the index of other must contain the same values, as they
will be aligned prior to the multiplication.
The dot method for Series computes the inner product, instead of the
matrix product here.
### Examples
Here we multiply a DataFrame with a Series.
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
>>> s = md.Series([1, 1, 2, 1])
>>> df.dot(s).execute()
0 -4
1 5
dtype: int64
```
Here we multiply a DataFrame with another DataFrame.
```pycon
>>> other = md.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(other).execute()
0 1
0 1 4
1 2 2
```
Note that the dot method gives the same result as `@`.
```pycon
>>> (df @ other).execute()
0 1
0 1 4
1 2 2
```
The dot method works also if other is an np.array.
```pycon
>>> arr = mt.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
>>> df.dot(arr).execute()
0 1
0 1 4
1 2 2
```
Note how shuffling of the objects does not change the result.
```pycon
>>> s2 = s.reindex([1, 0, 2, 3])
>>> df.dot(s2).execute()
0 -4
1 5
dtype: int64
```
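The alignment step explains why reordering `s` has no effect: values are matched by label, not by position. Here is a plain-Python sketch of the DataFrame-by-Series case, where `dot` is a hypothetical helper and the Series is modelled as a dict from label to value.

```python
def dot(frame_rows, columns, other):
    """Multiply rows by a labelled vector, aligning on column labels first."""
    if set(columns) != set(other):
        raise ValueError("matrices are not aligned")
    return [sum(row[i] * other[label] for i, label in enumerate(columns))
            for row in frame_rows]

rows = [[0, 1, -2, -1], [1, 1, 1, 1]]
columns = [0, 1, 2, 3]
s = {0: 1, 1: 1, 2: 2, 3: 1}
s2 = {1: 1, 0: 1, 2: 2, 3: 1}  # same values, different label order
print(dot(rows, columns, s))   # [-4, 5]
print(dot(rows, columns, s2))  # [-4, 5]
```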
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.drop.md
# maxframe.dataframe.DataFrame.drop
#### DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Drop specified labels from rows or columns.
Remove rows or columns by specifying label names and corresponding
axis, or by specifying directly index or column names. When using a
multi-index, labels on different levels can be removed by specifying
the level.
* **Parameters:**
* **labels** (*single label* *or* *list-like*) – Index or column labels to drop.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Whether to drop labels from the index (0 or ‘index’) or
columns (1 or ‘columns’).
* **index** (*single label* *or* *list-like*) – Alternative to specifying axis (`labels, axis=0`
is equivalent to `index=labels`).
* **columns** (*single label* *or* *list-like*) – Alternative to specifying axis (`labels, axis=1`
is equivalent to `columns=labels`).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *level name* *,* *optional*) – For MultiIndex, level from which the labels will be removed.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, do operation inplace and return None.
* **errors** ( *{'ignore'* *,* *'raise'}* *,* *default 'raise'*) – If ‘ignore’, suppress error and only existing labels are
dropped. Note that errors for missing indices will not raise.
* **Returns:**
DataFrame without the removed index or column labels.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If any of the labels is not found in the selected axis.
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Label-location based indexer for selection by label.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Return DataFrame with labels on given axis omitted where (all or any) data are missing.
[`DataFrame.drop_duplicates`](maxframe.dataframe.DataFrame.drop_duplicates.md#maxframe.dataframe.DataFrame.drop_duplicates)
: Return DataFrame with duplicate rows removed, optionally only considering certain columns.
[`Series.drop`](maxframe.dataframe.Series.drop.md#maxframe.dataframe.Series.drop)
: Return Series with specified index labels removed.
### Examples
```pycon
>>> import numpy as np
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(np.arange(12).reshape(3, 4),
... columns=['A', 'B', 'C', 'D'])
>>> df.execute()
A B C D
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
```
Drop columns
```pycon
>>> df.drop(['B', 'C'], axis=1).execute()
A D
0 0 3
1 4 7
2 8 11
```
```pycon
>>> df.drop(columns=['B', 'C']).execute()
A D
0 0 3
1 4 7
2 8 11
```
Drop rows by index
```pycon
>>> df.drop([0, 1]).execute()
A B C D
2 8 9 10 11
```
Drop columns and/or rows of MultiIndex DataFrame
```pycon
>>> midx = pd.MultiIndex(levels=[['lame', 'cow', 'falcon'],
... ['speed', 'weight', 'length']],
... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
... [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> df = md.DataFrame(index=midx, columns=['big', 'small'],
... data=[[45, 30], [200, 100], [1.5, 1], [30, 20],
... [250, 150], [1.5, 0.8], [320, 250],
... [1, 0.8], [0.3, 0.2]])
>>> df.execute()
big small
lame speed 45.0 30.0
weight 200.0 100.0
length 1.5 1.0
cow speed 30.0 20.0
weight 250.0 150.0
length 1.5 0.8
falcon speed 320.0 250.0
weight 1.0 0.8
length 0.3 0.2
```
```pycon
>>> df.drop(index='cow', columns='small').execute()
big
lame speed 45.0
weight 200.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
```
```pycon
>>> df.drop(index='length', level=1).execute()
big small
lame speed 45.0 30.0
weight 200.0 100.0
cow speed 30.0 20.0
weight 250.0 150.0
falcon speed 320.0 250.0
weight 1.0 0.8
```
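The documented contract of the `errors` parameter for column labels can be sketched in plain Python. `drop_columns` is a hypothetical helper operating on a dict of lists. It does not model the MaxFrame-specific note that missing index labels do not raise.

```python
def drop_columns(table, labels, errors="raise"):
    """Return a copy of a dict-of-lists table without the given columns."""
    missing = [label for label in labels if label not in table]
    if missing and errors == "raise":
        raise KeyError(f"{missing} not found in axis")
    return {name: col for name, col in table.items() if name not in labels}

table = {"A": [0, 4, 8], "B": [1, 5, 9], "C": [2, 6, 10], "D": [3, 7, 11]}
print(list(drop_columns(table, ["B", "C"])))                   # ['A', 'D']
print(list(drop_columns(table, ["B", "Z"], errors="ignore")))  # ['A', 'C', 'D']
```

With the default `errors="raise"`, the second call would raise `KeyError` because `"Z"` is not a column.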
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.drop_duplicates.md
# maxframe.dataframe.DataFrame.drop_duplicates
#### DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False, method='auto', default_index_type=None)
Return DataFrame with duplicate rows removed.
Considering certain columns is optional. Indexes, including time indexes,
are ignored.
* **Parameters:**
* **subset** (*column label* *or* *sequence* *of* *labels* *,* *optional*) – Only consider certain columns for identifying duplicates, by
default use all of the columns.
* **keep** ( *{'first'* *,* *'last'* *,* *'any'* *,* *False}* *,* *default 'first'*) – Determines which duplicates (if any) to keep.
- `first` : Drop duplicates except for the first occurrence.
- `last` : Drop duplicates except for the last occurrence.
- `any` : Drop duplicates except for a random occurrence.
- False : Drop all duplicates.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to drop duplicates in place or to return a copy.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **Returns:**
DataFrame with duplicates removed or None if `inplace=True`.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
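This page has no example, so the `keep` options are sketched below in plain Python on a list of row tuples. `drop_duplicates` is a hypothetical helper that returns `(original_index, row)` pairs. It does not model `keep='any'`, `method`, or `default_index_type`.

```python
def drop_duplicates(rows, keep="first"):
    """Remove duplicate rows from a list of tuples, keeping original positions."""
    counts = {}
    for row in rows:
        counts[row] = counts.get(row, 0) + 1
    indexed = list(enumerate(rows))
    if keep is False:
        # Drop every row that occurs more than once.
        return [(i, row) for i, row in indexed if counts[row] == 1]
    if keep == "last":
        indexed = indexed[::-1]  # scan from the end so the last copy wins
    seen, kept = set(), []
    for i, row in indexed:
        if row not in seen:
            seen.add(row)
            kept.append((i, row))
    return sorted(kept)

rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3)]
print(drop_duplicates(rows))               # [(0, ('a', 1)), (1, ('b', 2)), (3, ('c', 3))]
print(drop_duplicates(rows, keep="last"))  # [(1, ('b', 2)), (2, ('a', 1)), (3, ('c', 3))]
print(drop_duplicates(rows, keep=False))   # [(1, ('b', 2)), (3, ('c', 3))]
```

With `ignore_index=True`, the surviving rows would be relabeled 0, 1, …, n - 1 instead of keeping their original positions.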
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.droplevel.md
# maxframe.dataframe.DataFrame.droplevel
#### DataFrame.droplevel(level, axis=0)
Return Series/DataFrame with requested index / column level(s) removed.
* **Parameters:**
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *, or* *list-like*) – If a string is given, must be the name of a level.
If list-like, elements must be names or positional indexes
of levels.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) –
Axis along which the level(s) is removed:
* 0 or ‘index’: remove level(s) in column.
* 1 or ‘columns’: remove level(s) in row.
For Series this parameter is unused and defaults to 0.
* **Returns:**
Series/DataFrame with requested index / column level(s) removed.
* **Return type:**
Series/DataFrame
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([
... [1, 2, 3, 4],
... [5, 6, 7, 8],
... [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])
```
```pycon
>>> df.columns = md.MultiIndex.from_tuples([
... ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])
```
```pycon
>>> df.execute()
level_1 c d
level_2 e f
a b
1 2 3 4
5 6 7 8
9 10 11 12
```
```pycon
>>> df.droplevel('a').execute()
level_1 c d
level_2 e f
b
2 3 4
6 7 8
10 11 12
```
```pycon
>>> df.droplevel('level_2', axis=1).execute()
level_1 c d
a b
1 2 3 4
5 6 7 8
9 10 11 12
```
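Conceptually, dropping a level removes one position from every key tuple and the matching entry from the level names. The plain-Python sketch below models the row index from the example above. `droplevel` is a hypothetical helper, not the MaxFrame method.

```python
def droplevel(keys, names, level):
    """Remove one level from tuple keys; `level` is a name or a position."""
    pos = names.index(level) if isinstance(level, str) else level
    new_keys = [key[:pos] + key[pos + 1:] for key in keys]
    new_names = names[:pos] + names[pos + 1:]
    return new_keys, new_names

index_keys = [(1, 2), (5, 6), (9, 10)]
index_names = ["a", "b"]
print(droplevel(index_keys, index_names, "a"))  # ([(2,), (6,), (10,)], ['b'])
```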
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.dropna.md
# maxframe.dataframe.DataFrame.dropna
#### DataFrame.dropna(axis=0, how=<no_default>, thresh=<no_default>, subset=None, inplace=False, ignore_index=False)
Remove missing values.
See the [User Guide](https://pandas.pydata.org/docs/user_guide/missing_data.html#missing-data) for more on which values are
considered missing, and how to work with missing data.
* **Parameters:**
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) –
Determine if rows or columns which contain missing values are
removed.
* 0, or ‘index’ : Drop rows which contain missing values.
* 1, or ‘columns’ : Drop columns which contain missing value.
* **how** ( *{'any'* *,* *'all'}* *,* *default 'any'*) –
Determine if row or column is removed from DataFrame, when we have
at least one NA or all NA.
* ’any’ : If any NA values are present, drop that row or column.
* ’all’ : If all values are NA, drop that row or column.
* **thresh** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Require that many non-NA values.
* **subset** (*array-like* *,* *optional*) – Labels along other axis to consider, e.g. if you are dropping rows
these would be a list of columns to include.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, do operation inplace and return None.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **Returns:**
DataFrame with NA entries dropped from it.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.isna`](maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)
: Indicate missing values.
[`DataFrame.notna`](maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)
: Indicate existing (non-missing) values.
[`DataFrame.fillna`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna)
: Replace missing values.
[`Series.dropna`](maxframe.dataframe.Series.dropna.md#maxframe.dataframe.Series.dropna)
: Drop missing values.
[`Index.dropna`](maxframe.dataframe.Index.dropna.md#maxframe.dataframe.Index.dropna)
: Drop missing indices.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
... "toy": [np.nan, 'Batmobile', 'Bullwhip'],
... "born": [md.NaT, md.Timestamp("1940-04-25"),
... md.NaT]})
>>> df.execute()
name toy born
0 Alfred NaN NaT
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip NaT
```
Drop the rows where at least one element is missing.
```pycon
>>> df.dropna().execute()
name toy born
1 Batman Batmobile 1940-04-25
```
Drop the rows where all elements are missing.
```pycon
>>> df.dropna(how='all').execute()
name toy born
0 Alfred NaN NaT
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip NaT
```
Keep only the rows with at least 2 non-NA values.
```pycon
>>> df.dropna(thresh=2).execute()
name toy born
1 Batman Batmobile 1940-04-25
2 Catwoman Bullwhip NaT
```
Define in which columns to look for missing values.
```pycon
>>> df.dropna(subset=['name', 'born']).execute()
name toy born
1 Batman Batmobile 1940-04-25
```
Keep the DataFrame with valid entries in the same variable.
```pycon
>>> df.dropna(inplace=True)
>>> df.execute()
name toy born
1 Batman Batmobile 1940-04-25
```
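The `how` and `thresh` rules can be hard to keep apart, so here is a minimal plain-Python sketch of the row-keeping decision. It is an illustration of the documented semantics only, not MaxFrame's implementation; `is_na` and `keep_row` are hypothetical helper names, and `None` stands in for `NaN`/`NaT`.

```python
import math


def is_na(value):
    # Treat None and float NaN as missing (stand-ins for NaN / NaT).
    return value is None or (isinstance(value, float) and math.isnan(value))


def keep_row(row, how="any", thresh=None):
    # Count the non-missing values in this row.
    non_na = sum(not is_na(v) for v in row)
    if thresh is not None:
        # thresh: keep rows with at least `thresh` non-NA values.
        return non_na >= thresh
    if how == "any":
        # Drop the row if any value is missing.
        return non_na == len(row)
    if how == "all":
        # Drop the row only if every value is missing.
        return non_na > 0
    raise ValueError(f"invalid how option: {how!r}")


rows = [
    ("Alfred", None, None),
    ("Batman", "Batmobile", "1940-04-25"),
    ("Catwoman", "Bullwhip", None),
]
kept_any = [r[0] for r in rows if keep_row(r)]
kept_all = [r[0] for r in rows if keep_row(r, how="all")]
kept_thresh = [r[0] for r in rows if keep_row(r, thresh=2)]
```

The three result lists correspond to the `dropna()`, `dropna(how='all')` and `dropna(thresh=2)` examples above.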
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.dtypes.md
# maxframe.dataframe.DataFrame.dtypes
#### *property* DataFrame.dtypes
Return the dtypes in the DataFrame.
This returns a Series with the data type of each column.
The result’s index is the original DataFrame’s columns. Columns
with mixed types are stored with the `object` dtype. See
[the User Guide](https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes) for more.
* **Returns:**
The data type of each column.
* **Return type:**
[pandas.Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'float': [1.0],
... 'int': [1],
... 'datetime': [md.Timestamp('20180310')],
... 'string': ['foo']})
>>> df.dtypes
float float64
int int64
datetime datetime64[ns]
string object
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.duplicated.md
# maxframe.dataframe.DataFrame.duplicated
#### DataFrame.duplicated(subset=None, keep='first', method='auto')
Return boolean Series denoting duplicate rows.
Considering certain columns is optional.
* **Parameters:**
* **subset** (*column label* *or* *sequence* *of* *labels* *,* *optional*) – Only consider certain columns for identifying duplicates, by
default use all of the columns.
* **keep** ( *{'first'* *,* *'last'* *,* *False}* *,* *default 'first'*) –
Determines which duplicates (if any) to mark.
- `first` : Mark duplicates as `True` except for the first occurrence.
- `last` : Mark duplicates as `True` except for the last occurrence.
- False : Mark all duplicates as `True`.
* **Returns:**
Boolean Series indicating, for each row, whether it is a duplicate.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
`Index.duplicated`
: Equivalent method on index.
`Series.duplicated`
: Equivalent method on Series.
[`Series.drop_duplicates`](maxframe.dataframe.Series.drop_duplicates.md#maxframe.dataframe.Series.drop_duplicates)
: Remove duplicate values from Series.
[`DataFrame.drop_duplicates`](maxframe.dataframe.DataFrame.drop_duplicates.md#maxframe.dataframe.DataFrame.drop_duplicates)
: Remove duplicate values from DataFrame.
### Examples
Consider a dataset containing ramen ratings.
```pycon
>>> import maxframe.dataframe as md
```
```pycon
>>> df = md.DataFrame({
... 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
... 'rating': [4, 4, 3.5, 15, 5]
... })
>>> df.execute()
brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
```
By default, for each set of duplicated values, the first occurrence
is marked False and all others True.
```pycon
>>> df.duplicated().execute()
0 False
1 True
2 False
3 False
4 False
dtype: bool
```
By using ‘last’, the last occurrence of each set of duplicated values
is marked False and all others True.
```pycon
>>> df.duplicated(keep='last').execute()
0 True
1 False
2 False
3 False
4 False
dtype: bool
```
By setting `keep` to False, all duplicates are marked True.
```pycon
>>> df.duplicated(keep=False).execute()
0 True
1 True
2 False
3 False
4 False
dtype: bool
```
To find duplicates on specific column(s), use `subset`.
```pycon
>>> df.duplicated(subset=['brand']).execute()
0 False
1 True
2 False
3 True
4 True
dtype: bool
```
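The three `keep` modes can be summarised with a small plain-Python sketch. This is only an illustration of the marking rule under the assumption that rows are compared as whole tuples; the `duplicated` function below is a hypothetical stand-in, not the MaxFrame method.

```python
from collections import Counter


def duplicated(rows, keep="first"):
    keys = [tuple(r) for r in rows]
    if keep is False:
        # Every member of a repeated group is marked True.
        counts = Counter(keys)
        return [counts[k] > 1 for k in keys]
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last' or False")
    # Scan forwards for 'first', backwards for 'last'; the first row
    # met in scan order is the one left unmarked.
    n = len(keys)
    order = range(n) if keep == "first" else range(n - 1, -1, -1)
    seen = set()
    flags = [False] * n
    for i in order:
        flags[i] = keys[i] in seen
        seen.add(keys[i])
    return flags


ramen = [
    ("Yum Yum", "cup", 4.0),
    ("Yum Yum", "cup", 4.0),
    ("Indomie", "cup", 3.5),
    ("Indomie", "pack", 15.0),
    ("Indomie", "pack", 5.0),
]
first_flags = duplicated(ramen)
last_flags = duplicated(ramen, keep="last")
all_flags = duplicated(ramen, keep=False)
# Restricting the key to the brand column mimics subset=['brand'].
brand_flags = duplicated([(r[0],) for r in ramen])
```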
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.eq.md
# maxframe.dataframe.DataFrame.eq
#### DataFrame.eq(other, axis='columns', level=None, fill_value=None)
Get the equal-to comparison of dataframe and other, element-wise (binary operator eq).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to dataframe == other with support to choose axis (rows or columns)
and level for comparison.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 'columns'*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the passed
MultiIndex level.
* **Returns:**
Result of the comparison.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`DataFrame.eq`](#maxframe.dataframe.DataFrame.eq)
: Compare DataFrames for equality elementwise.
[`DataFrame.ne`](maxframe.dataframe.DataFrame.ne.md#maxframe.dataframe.DataFrame.ne)
: Compare DataFrames for inequality elementwise.
[`DataFrame.le`](maxframe.dataframe.DataFrame.le.md#maxframe.dataframe.DataFrame.le)
: Compare DataFrames for less than inequality or equality elementwise.
[`DataFrame.lt`](maxframe.dataframe.DataFrame.lt.md#maxframe.dataframe.DataFrame.lt)
: Compare DataFrames for strictly less than inequality elementwise.
[`DataFrame.ge`](maxframe.dataframe.DataFrame.ge.md#maxframe.dataframe.DataFrame.ge)
: Compare DataFrames for greater than inequality or equality elementwise.
[`DataFrame.gt`](maxframe.dataframe.DataFrame.gt.md#maxframe.dataframe.DataFrame.gt)
: Compare DataFrames for strictly greater than inequality elementwise.
### Notes
Mismatched indices will be unioned together.
NaN values are considered different (i.e. NaN != NaN).
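The NaN rule follows ordinary floating-point comparison. As a minimal plain-Python sketch (lists stand in for columns; this does not call MaxFrame), an element-wise equality test never reports two NaN values as equal, so missing values have to be detected separately:

```python
import math

nan = float("nan")
left = [250.0, nan, 100.0]
right = [250.0, nan, 300.0]

# Element-wise equality: the NaN pair compares as False.
elementwise_eq = [a == b for a, b in zip(left, right)]

# Detecting "both missing" needs an explicit NaN check.
both_missing = [math.isnan(a) and math.isnan(b) for a, b in zip(left, right)]
```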
### Examples
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df.execute()
cost revenue
A 250 100
B 150 250
C 100 300
```
Comparison with a scalar, using either the operator or method:
```pycon
>>> (df == 100).execute()
cost revenue
A False True
B False False
C True False
```
```pycon
>>> df.eq(100).execute()
cost revenue
A False True
B False False
C True False
```
When other is a [`Series`](maxframe.dataframe.Series.md#maxframe.dataframe.Series), the columns of a DataFrame are aligned
with the index of other and broadcast:
```pycon
>>> (df != pd.Series([100, 250], index=["cost", "revenue"])).execute()
cost revenue
A True True
B True False
C False True
```
Use the method to control the broadcast axis:
```pycon
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index').execute()
cost revenue
A True False
B True True
C True True
D True True
```
When comparing to an arbitrary sequence, the number of columns must
match the number of elements in other:
```pycon
>>> (df == [250, 100]).execute()
cost revenue
A True True
B False False
C False False
```
Use the method to control the axis:
```pycon
>>> df.eq([250, 250, 100], axis='index').execute()
cost revenue
A True False
B False True
C True False
```
Compare to a DataFrame of different shape.
```pycon
>>> other = md.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other.execute()
revenue
A 300
B 250
C 100
D 150
```
```pycon
>>> df.gt(other).execute()
cost revenue
A False False
B False False
C False True
D False False
```
Compare to a MultiIndex by level.
```pycon
>>> df_multindex = md.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex.execute()
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
```
```pycon
>>> df.le(df_multindex, level=1).execute()
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.eval.md
# maxframe.dataframe.DataFrame.eval
#### DataFrame.eval(expr, inplace=False, \*\*kwargs)
Evaluate a string describing operations on DataFrame columns.
Operates on columns only, not specific rows or elements. This allows
eval to run arbitrary code, which can make you vulnerable to code
injection if you pass user input to this function.
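One way to reduce the injection risk is to never place raw user input in the expression string, and instead build the expression from validated column names. The sketch below shows that idea in plain Python; `build_sum_expr` is a hypothetical helper that is not part of MaxFrame, and the validation rule (plain identifiers that name an existing column) is an assumption you may need to adapt.

```python
import re

# Hypothetical helper, not part of MaxFrame: only plain identifiers that
# name an existing column are allowed into the expression string.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_sum_expr(left, right, columns):
    for name in (left, right):
        if not _IDENTIFIER.fullmatch(name) or name not in columns:
            raise ValueError(f"unexpected column name: {name!r}")
    return f"{left} + {right}"


expr = build_sum_expr("A", "B", columns={"A", "B"})
```

The resulting string could then be passed as `expr`, while input such as `"__import__('os')"` is rejected before it reaches the evaluator.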
* **Parameters:**
* **expr** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The expression string to evaluate.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If the expression contains an assignment, whether to perform the
operation inplace and mutate the existing DataFrame. Otherwise,
a new DataFrame is returned.
* **\*\*kwargs** – See the documentation for [`eval()`](maxframe.dataframe.eval.md#maxframe.dataframe.eval) for complete details
on the keyword arguments accepted by
[`query()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query).
* **Returns:**
The result of the evaluation.
* **Return type:**
ndarray, scalar, or pandas object
#### SEE ALSO
[`DataFrame.query`](maxframe.dataframe.DataFrame.query.md#maxframe.dataframe.DataFrame.query)
: Evaluates a boolean expression to query the columns of a frame.
[`DataFrame.assign`](maxframe.dataframe.DataFrame.assign.md#maxframe.dataframe.DataFrame.assign)
: Can evaluate an expression or function to create new values for a column.
[`eval`](maxframe.dataframe.eval.md#maxframe.dataframe.eval)
: Evaluate a Python expression as a string using various backends.
### Notes
For more details see the API documentation for [`eval()`](maxframe.dataframe.eval.md#maxframe.dataframe.eval).
For detailed examples see [enhancing performance with eval](https://pandas.pydata.org/docs/user_guide/enhancingperf.html#enhancingperf-eval).
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})
>>> df.execute()
A B
0 1 10
1 2 8
2 3 6
3 4 4
4 5 2
>>> df.eval('A + B').execute()
0 11
1 10
2 9
3 8
4 7
dtype: int64
```
Assignment is allowed though by default the original DataFrame is not
modified.
```pycon
>>> df.eval('C = A + B').execute()
A B C
0 1 10 11
1 2 8 10
2 3 6 9
3 4 4 8
4 5 2 7
>>> df.execute()
A B
0 1 10
1 2 8
2 3 6
3 4 4
4 5 2
```
Use `inplace=True` to modify the original DataFrame.
```pycon
>>> df.eval('C = A + B', inplace=True)
>>> df.execute()
A B C
0 1 10 11
1 2 8 10
2 3 6 9
3 4 4 8
4 5 2 7
```
Multiple columns can be assigned to using multi-line expressions:
```pycon
>>> df.eval('''
... C = A + B
... D = A - B
... ''').execute()
A B C D
0 1 10 11 -9
1 2 8 10 -6
2 3 6 9 -3
3 4 4 8 0
4 5 2 7 3
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.ewm.md
# maxframe.dataframe.DataFrame.ewm
#### DataFrame.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)
Provide exponential weighted functions.
* **Parameters:**
* **com** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Specify decay in terms of center of mass,
$\alpha = 1 / (1 + com),\text{ for } com \geq 0$.
* **span** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Specify decay in terms of span,
$\alpha = 2 / (span + 1),\text{ for } span \geq 1$.
* **halflife** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Specify decay in terms of half-life,
$\alpha = 1 - \exp(\log(0.5) / halflife),\text{ for } halflife > 0$.
* **alpha** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Specify smoothing factor $\alpha$ directly,
$0 < \alpha \leq 1$.
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0*) – Minimum number of observations in window required to have a value
(otherwise result is NA).
* **adjust** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings
(viewing EWMA as a moving average).
* **ignore_na** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Ignore missing values when calculating weights;
specify True to reproduce pre-0.15.0 behavior.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to use. The value 0 identifies the rows, and 1
identifies the columns.
* **Returns:**
A Window sub-classed for the particular operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`rolling`](maxframe.dataframe.DataFrame.rolling.md#maxframe.dataframe.DataFrame.rolling)
: Provides rolling window calculations.
[`expanding`](maxframe.dataframe.DataFrame.expanding.md#maxframe.dataframe.DataFrame.expanding)
: Provides expanding transformations.
### Notes
Exactly one of center of mass, span, half-life, and alpha must be provided.
Allowed values and relationship between the parameters are specified in the
parameter descriptions above; see the link at the end of this section for
a detailed explanation.
When adjust is True (default), weighted averages are calculated using
weights (1-alpha)\*\*(n-1), (1-alpha)\*\*(n-2), …, 1-alpha, 1.
When adjust is False, weighted averages are calculated recursively as:
> weighted_average[0] = arg[0];
> weighted_average[i] = (1-alpha)\*weighted_average[i-1] + alpha\*arg[i].
When ignore_na is False (default), weights are based on absolute positions.
For example, the weights of x and y used in calculating the final weighted
average of [x, None, y] are (1-alpha)\*\*2 and 1 (if adjust is True), and
(1-alpha)\*\*2 and alpha (if adjust is False).
When ignore_na is True (reproducing pre-0.15.0 behavior), weights are based
on relative positions. For example, the weights of x and y used in
calculating the final weighted average of [x, None, y] are 1-alpha and 1
(if adjust is True), and 1-alpha and alpha (if adjust is False).
More details can be found at
[https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df.execute()
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
>>> df.ewm(com=0.5).mean().execute()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
```
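The numbers in this output can be reproduced by hand from the weight rule in the Notes. The following plain-Python sketch assumes `adjust=True` and `ignore_na=False` (the defaults) and derives `alpha` from `com`; `ewm_mean` is a hypothetical helper written for illustration, not MaxFrame's implementation.

```python
import math


def ewm_mean(values, com):
    alpha = 1.0 / (1.0 + com)  # decay expressed via center of mass
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for j in range(i + 1):
            x = values[j]
            if math.isnan(x):
                # Missing values contribute nothing, but positions still
                # count because weights use absolute offsets (i - j).
                continue
            w = (1.0 - alpha) ** (i - j)
            num += w * x
            den += w
        out.append(num / den if den else math.nan)
    return out


result = ewm_mean([0.0, 1.0, 2.0, math.nan, 4.0], com=0.5)
rounded = [round(v, 6) for v in result]
```

With `com=0.5` the smoothing factor is `alpha = 2/3`, and the NaN at position 3 simply carries the previous average forward.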
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.expanding.md
# maxframe.dataframe.DataFrame.expanding
#### DataFrame.expanding(min_periods=1, shift=0, reverse_range=False)
Provide expanding transformations.
* **Parameters:**
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 1*) – Minimum number of observations in window required to have a value
(otherwise result is NA).
* **center** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Set the labels at the center of the window.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 0*)
* **Return type:**
a Window sub-classed for the particular operation
#### SEE ALSO
[`rolling`](maxframe.dataframe.DataFrame.rolling.md#maxframe.dataframe.DataFrame.rolling)
: Provides rolling window calculations.
[`ewm`](maxframe.dataframe.DataFrame.ewm.md#maxframe.dataframe.DataFrame.ewm)
: Provides exponential weighted functions.
### Notes
By default, the result is set to the right edge of the window. This can be
changed to the center of the window by setting `center=True`.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df.execute()
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
>>> df.expanding(2).sum().execute()
B
0 NaN
1 1.0
2 3.0
3 3.0
4 7.0
```
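The effect of `min_periods` in this output can be sketched in plain Python. The helper below is a hypothetical illustration of an expanding sum, assuming NaN values are skipped and do not count as observations; it is not MaxFrame's implementation.

```python
import math


def expanding_sum(values, min_periods=1):
    out, total, seen = [], 0.0, 0
    for x in values:
        if not math.isnan(x):
            total += x
            seen += 1  # only non-NaN values count as observations
        # Emit a value only once enough observations have accumulated.
        out.append(total if seen >= min_periods else math.nan)
    return out


result = expanding_sum([0.0, 1.0, 2.0, math.nan, 4.0], min_periods=2)
```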
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.fillna.md
# maxframe.dataframe.DataFrame.fillna
#### DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Fill NA/NaN values using the specified method.
* **Parameters:**
* **value** (*scalar* *,* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Value to use to fill holes (e.g. 0), alternately a
dict/Series/DataFrame of values specifying which value to use for
each index (for a Series) or column (for a DataFrame). Values not
in the dict/Series/DataFrame will not be filled. This value cannot
be a list.
* **method** ( *{'backfill'* *,* *'bfill'* *,* *'pad'* *,* *'ffill'* *,* *None}* *,* *default None*) – Method to use for filling holes in reindexed Series.
pad / ffill: propagate the last valid observation forward to the next valid one.
backfill / bfill: use the next valid observation to fill the gap.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Axis along which to fill missing values.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
* **downcast** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default is None*) – A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
* **Returns:**
Object with missing values filled or None if `inplace=True`.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
#### SEE ALSO
`interpolate`
: Fill NaN values using interpolation.
[`reindex`](maxframe.dataframe.DataFrame.reindex.md#maxframe.dataframe.DataFrame.reindex)
: Conform object to new index.
`asfreq`
: Convert TimeSeries to specified frequency.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, 5],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list('ABCD'))
>>> df.execute()
A B C D
0 NaN 2.0 NaN 0
1 3.0 4.0 NaN 1
2 NaN NaN NaN 5
3 NaN 3.0 NaN 4
```
Replace all NaN elements with 0s.
```pycon
>>> df.fillna(0).execute()
A B C D
0 0.0 2.0 0.0 0
1 3.0 4.0 0.0 1
2 0.0 0.0 0.0 5
3 0.0 3.0 0.0 4
```
We can also propagate non-null values forward or backward.
```pycon
>>> df.fillna(method='ffill').execute()
A B C D
0 NaN 2.0 NaN 0
1 3.0 4.0 NaN 1
2 3.0 4.0 NaN 5
3 3.0 3.0 NaN 4
```
Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1,
2, and 3 respectively.
```pycon
>>> values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
>>> df.fillna(value=values).execute()
A B C D
0 0.0 2.0 2.0 0
1 3.0 4.0 2.0 1
2 0.0 1.0 2.0 5
3 0.0 3.0 2.0 4
```
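The interaction between `method='ffill'` and `limit` is easiest to see on a single column. This plain-Python sketch assumes the pandas-style meaning of `limit` described in the parameter list (a cap on consecutive fills); `ffill` here is a hypothetical helper, not the MaxFrame method.

```python
import math


def ffill(column, limit=None):
    out, last, run = [], None, 0
    for x in column:
        if math.isnan(x):
            run += 1  # length of the current NaN run
            if last is not None and (limit is None or run <= limit):
                out.append(last)  # propagate the last valid value
            else:
                out.append(x)  # leading NaN, or past the limit
        else:
            last, run = x, 0
            out.append(x)
    return out


col_a = [math.nan, 3.0, math.nan, math.nan]  # column 'A' from the example
filled = ffill(col_a)
filled_limited = ffill(col_a, limit=1)
```

Without a limit the result matches column `A` of the `ffill` example above; with `limit=1` the second consecutive NaN stays unfilled.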
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.filter.md
# maxframe.dataframe.DataFrame.filter
#### DataFrame.filter(items=None, like=None, regex=None, axis=None)
Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its
contents. The filter is applied to the labels of the index.
* **Parameters:**
* **items** (*list-like*) – Keep labels from axis which are in items.
* **like** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Keep labels from axis for which “like in label == True”.
* **regex** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *(**regular expression* *)*) – Keep labels from axis for which re.search(regex, label) == True.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'* *,* *None}* *,* *default None*) – The axis to filter on, expressed either as an index (int)
or axis name (str). By default this is the info axis, ‘columns’ for
DataFrame. For Series this parameter is unused and defaults to None.
* **Return type:**
same type as input object
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Access a group of rows and columns by label(s) or a boolean array.
### Notes
The `items`, `like`, and `regex` parameters are
enforced to be mutually exclusive.
`axis` defaults to the info axis that is used when indexing
with `[]`.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(mt.array(([1, 2, 3], [4, 5, 6])),
... index=['mouse', 'rabbit'],
... columns=['one', 'two', 'three'])
>>> df.execute()
one two three
mouse 1 2 3
rabbit 4 5 6
```
```pycon
>>> # select columns by name
>>> df.filter(items=['one', 'three']).execute()
one three
mouse 1 3
rabbit 4 6
```
```pycon
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1).execute()
one three
mouse 1 3
rabbit 4 6
```
```pycon
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0).execute()
one two three
rabbit 4 5 6
```
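The three selection modes operate on labels only, which a short plain-Python sketch can make explicit. `filter_labels` is a hypothetical helper that mirrors the documented rules (mutually exclusive arguments, substring match for `like`, `re.search` for `regex`); it is not MaxFrame's implementation.

```python
import re


def filter_labels(labels, items=None, like=None, regex=None):
    # The three selectors are mutually exclusive.
    if sum(arg is not None for arg in (items, like, regex)) != 1:
        raise TypeError("pass exactly one of items, like, regex")
    if items is not None:
        return [label for label in items if label in labels]
    if like is not None:
        return [label for label in labels if like in label]
    pattern = re.compile(regex)
    return [label for label in labels if pattern.search(label)]


columns = ["one", "two", "three"]
index = ["mouse", "rabbit"]
by_items = filter_labels(columns, items=["one", "three"])
by_regex = filter_labels(columns, regex="e$")
by_like = filter_labels(index, like="bbi")
```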
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.first_valid_index.md
# maxframe.dataframe.DataFrame.first_valid_index
#### DataFrame.first_valid_index()
Return index for first non-NA value or None, if no non-NA value is found.
* **Return type:**
[type](https://docs.python.org/3/library/functions.html#type) of index
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([None, 3, 4])
>>> s.first_valid_index().execute()
1
>>> s.last_valid_index().execute()
2
```
```pycon
>>> s = md.Series([None, None])
>>> print(s.first_valid_index().execute())
None
>>> print(s.last_valid_index().execute())
None
```
If all elements in Series are NA/null, returns None.
```pycon
>>> s = md.Series()
>>> print(s.first_valid_index().execute())
None
>>> print(s.last_valid_index().execute())
None
```
If Series is empty, returns None.
For DataFrame:
```pycon
>>> df = md.DataFrame({'A': [None, None, 2], 'B': [None, 3, 4]})
>>> df.execute()
A B
0 NaN NaN
1 NaN 3.0
2 2.0 4.0
>>> df.first_valid_index().execute()
1
>>> df.last_valid_index().execute()
2
```
```pycon
>>> df = md.DataFrame({'A': [None, None, None], 'B': [None, None, None]})
>>> df.execute()
A B
0 None None
1 None None
2 None None
>>> print(df.first_valid_index().execute())
None
>>> print(df.last_valid_index().execute())
None
```
If all elements in DataFrame are NA/null, returns None.
```pycon
>>> df = md.DataFrame()
>>> df.execute()
Empty DataFrame
Columns: []
Index: []
>>> print(df.first_valid_index().execute())
None
>>> print(df.last_valid_index().execute())
None
```
If DataFrame is empty, returns None.
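For a DataFrame, a row counts as valid as soon as any of its columns holds a non-NA value. The plain-Python sketch below illustrates that rule; the helper names are hypothetical and a dict of lists stands in for the DataFrame.

```python
import math


def _is_na(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def _row_is_valid(columns, pos):
    # A row is valid if at least one column is non-NA at this position.
    return any(not _is_na(col[pos]) for col in columns.values())


def first_valid_index(index, columns):
    for pos, label in enumerate(index):
        if _row_is_valid(columns, pos):
            return label
    return None  # empty, or every value is NA


def last_valid_index(index, columns):
    for pos in range(len(index) - 1, -1, -1):
        if _row_is_valid(columns, pos):
            return index[pos]
    return None


frame = {"A": [None, None, 2], "B": [None, 3, 4]}
first = first_valid_index([0, 1, 2], frame)
last = last_valid_index([0, 1, 2], frame)
all_na = first_valid_index([0, 1], {"A": [None, None]})
empty = first_valid_index([], {})
```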
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.floordiv.md
# maxframe.dataframe.DataFrame.floordiv
#### DataFrame.floordiv(other, axis='columns', level=None, fill_value=None)
Get Integer division of dataframe and other, element-wise (binary operator floordiv).
Equivalent to `//`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rfloordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar using the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
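The examples above are shared across the flexible arithmetic wrappers and do not show floor division itself. As a minimal plain-Python sketch of the documented `fill_value` rule applied to `//`, with `None` standing in for a missing value (`floordiv_fill` is a hypothetical helper, not the MaxFrame method):

```python
def floordiv_fill(a, b, fill_value=None):
    # If both inputs are missing, the result stays missing.
    if a is None and b is None:
        return None
    if fill_value is not None:
        # Substitute the fill value for the one missing side.
        a = fill_value if a is None else a
        b = fill_value if b is None else b
    if a is None or b is None:
        return None
    return a // b  # floors toward negative infinity


plain = floordiv_fill(360, 7)
negative = floordiv_fill(-7, 2)
filled = floordiv_fill(None, 2, fill_value=10)
both_missing = floordiv_fill(None, None, fill_value=1)
```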
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.from_dict.md
# maxframe.dataframe.DataFrame.from_dict
#### *static* DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)
Construct DataFrame from dict of array-like or dicts.
Creates DataFrame object from dictionary by columns or by index
allowing dtype specification.
* **Parameters:**
* **data** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Of the form {field : array-like} or {field : dict}.
* **orient** ( *{'columns'* *,* *'index'* *,* *'tight'}* *,* *default 'columns'*) – The “orientation” of the data. If the keys of the passed dict
should be the columns of the resulting DataFrame, pass ‘columns’
(default). Otherwise if the keys should be rows, pass ‘index’.
If ‘tight’, assume a dict with keys [‘index’, ‘columns’, ‘data’,
‘index_names’, ‘column_names’].
* **dtype** (*dtype* *,* *default None*) – Data type to force after DataFrame construction, otherwise infer.
* **columns** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *default None*) – Column labels to use when `orient='index'`. Raises a ValueError
if used with `orient='columns'` or `orient='tight'`.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.from_records`](maxframe.dataframe.DataFrame.from_records.md#maxframe.dataframe.DataFrame.from_records)
: DataFrame from structured ndarray, sequence of tuples or dicts, or DataFrame.
[`DataFrame`](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
: DataFrame object creation using constructor.
[`DataFrame.to_dict`](maxframe.dataframe.DataFrame.to_dict.md#maxframe.dataframe.DataFrame.to_dict)
: Convert the DataFrame to a dictionary.
### Examples
By default the keys of the dict become the DataFrame columns:
```pycon
>>> import maxframe.dataframe as md
>>> data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
>>> md.DataFrame.from_dict(data).execute()
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
```
Specify `orient='index'` to create the DataFrame using dictionary
keys as rows:
```pycon
>>> data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
>>> md.DataFrame.from_dict(data, orient='index').execute()
0 1 2 3
row_1 3 2 1 0
row_2 a b c d
```
When using the ‘index’ orientation, the column names can be
specified manually:
```pycon
>>> md.DataFrame.from_dict(data, orient='index',
... columns=['A', 'B', 'C', 'D']).execute()
A B C D
row_1 3 2 1 0
row_2 a b c d
```
Specify `orient='tight'` to create the DataFrame using a ‘tight’
format:
```pycon
>>> data = {'index': [('a', 'b'), ('a', 'c')],
... 'columns': [('x', 1), ('y', 2)],
... 'data': [[1, 3], [2, 4]],
... 'index_names': ['n1', 'n2'],
... 'column_names': ['z1', 'z2']}
>>> md.DataFrame.from_dict(data, orient='tight').execute()
z1 x y
z2 1 2
n1 n2
a b 1 3
c 2 4
```
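The difference between `orient='columns'` and `orient='index'` comes down to whether the dict keys label columns or rows. The plain-Python sketch below computes the resulting index labels, column labels and row values for those two orientations; `from_dict_layout` is a hypothetical helper for illustration and leaves out `'tight'`.

```python
def from_dict_layout(data, orient="columns", columns=None):
    if orient == "columns":
        if columns is not None:
            raise ValueError("columns is only valid with orient='index'")
        col_labels = list(data)  # dict keys become column labels
        length = len(next(iter(data.values()), []))
        index = list(range(length))
        rows = [[data[c][i] for c in col_labels] for i in index]
    elif orient == "index":
        index = list(data)  # dict keys become row labels
        rows = [list(data[k]) for k in index]
        width = len(rows[0]) if rows else 0
        col_labels = list(columns) if columns is not None else list(range(width))
    else:
        raise ValueError(f"unsupported orient in this sketch: {orient!r}")
    return index, col_labels, rows


by_columns = from_dict_layout({"col_1": [3, 2, 1, 0], "col_2": ["a", "b", "c", "d"]})
by_index = from_dict_layout(
    {"row_1": [3, 2, 1, 0], "row_2": ["a", "b", "c", "d"]},
    orient="index",
    columns=["A", "B", "C", "D"],
)
```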
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.from_records.md
# maxframe.dataframe.DataFrame.from_records
#### *static* DataFrame.from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None, gpu=None, sparse=False, \*\*kw)
Convert structured or record ndarray to DataFrame.
Creates a DataFrame object from a structured ndarray, sequence of
tuples or dicts, or DataFrame.
* **Parameters:**
* **data** (*structured ndarray* *,* *sequence* *of* *tuples* *or* *dicts* *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) –
Structured input data.
#### Deprecated
Deprecated since version 2.1.0: Passing a DataFrame is deprecated.
* **index** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *fields* *,* *array-like*) – Field of array to use as the index, alternately a specific set of
input labels to use.
* **exclude** (*sequence* *,* *default None*) – Columns or fields to exclude.
* **columns** (*sequence* *,* *default None*) – Column names to use. If the passed data do not have names
associated with them, this argument provides names for the
columns. Otherwise this argument indicates the order of the columns
in the result (any names not found in the data will become all-NA
columns).
* **coerce_float** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Attempt to convert values of non-string, non-numeric objects (like
decimal.Decimal) to floating point, useful for SQL result sets.
* **nrows** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Number of rows to read if data is an iterator.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.from_dict`](maxframe.dataframe.DataFrame.from_dict.md#maxframe.dataframe.DataFrame.from_dict)
: DataFrame from dict of array-like or dicts.
[`DataFrame`](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
: DataFrame object creation using constructor.
### Examples
Data can be provided as a structured ndarray:
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> data = mt.array([(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')],
... dtype=[('col_1', 'i4'), ('col_2', 'U1')])
>>> md.DataFrame.from_records(data).execute()
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
```
Data can be provided as a list of dicts:
```pycon
>>> data = [{'col_1': 3, 'col_2': 'a'},
... {'col_1': 2, 'col_2': 'b'},
... {'col_1': 1, 'col_2': 'c'},
... {'col_1': 0, 'col_2': 'd'}]
>>> md.DataFrame.from_records(data).execute()
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
```
Data can be provided as a list of tuples with corresponding columns:
```pycon
>>> data = [(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')]
>>> md.DataFrame.from_records(data, columns=['col_1', 'col_2']).execute()
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
```
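The two roles of the `columns` argument (naming unnamed records versus selecting and ordering named fields) can be sketched in plain Python. `records_to_columns` is a hypothetical helper, not a MaxFrame function; as a simplification it takes the field names from the first dict only and uses `None` where a real DataFrame would hold NA.

```python
def records_to_columns(records, columns=None):
    """Turn a list of dicts or tuples into a dict of column lists."""
    records = list(records)
    if records and isinstance(records[0], dict):
        # Named records: `columns` selects and orders the fields.
        # A requested name missing from the data yields an all-None column.
        names = list(columns) if columns is not None else list(records[0])
        return {name: [rec.get(name) for rec in records] for name in names}
    # Unnamed records: `columns` supplies the names, else positions are used.
    width = len(records[0]) if records else 0
    names = list(columns) if columns is not None else list(range(width))
    if records and len(names) != width:
        raise ValueError("columns must match the record width")
    return {name: [rec[i] for rec in records] for i, name in enumerate(names)}


tuples = [(3, "a"), (2, "b")]
dicts = [{"col_1": 3, "col_2": "a"}, {"col_1": 2, "col_2": "b"}]
print(records_to_columns(tuples, columns=["col_1", "col_2"]))
# {'col_1': [3, 2], 'col_2': ['a', 'b']}
print(records_to_columns(dicts, columns=["col_2", "extra"]))
# {'col_2': ['a', 'b'], 'extra': [None, None]}
```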
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.ge.md
# maxframe.dataframe.DataFrame.ge
#### DataFrame.ge(other, axis='columns', level=None, fill_value=None)
Return the element-wise greater-than-or-equal-to comparison of the DataFrame and other (binary operator ge).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to dataframe >= other with support to choose axis (rows or columns)
and level for comparison.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 'columns'*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the passed
MultiIndex level.
* **Returns:**
Result of the comparison.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`DataFrame.eq`](maxframe.dataframe.DataFrame.eq.md#maxframe.dataframe.DataFrame.eq)
: Compare DataFrames for equality elementwise.
[`DataFrame.ne`](maxframe.dataframe.DataFrame.ne.md#maxframe.dataframe.DataFrame.ne)
: Compare DataFrames for inequality elementwise.
[`DataFrame.le`](maxframe.dataframe.DataFrame.le.md#maxframe.dataframe.DataFrame.le)
: Compare DataFrames for less than inequality or equality elementwise.
[`DataFrame.lt`](maxframe.dataframe.DataFrame.lt.md#maxframe.dataframe.DataFrame.lt)
: Compare DataFrames for strictly less than inequality elementwise.
[`DataFrame.ge`](#maxframe.dataframe.DataFrame.ge)
: Compare DataFrames for greater than inequality or equality elementwise.
[`DataFrame.gt`](maxframe.dataframe.DataFrame.gt.md#maxframe.dataframe.DataFrame.gt)
: Compare DataFrames for strictly greater than inequality elementwise.
### Notes
Mismatched indices will be unioned together.
NaN values are considered different (i.e. NaN != NaN).
### Examples
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df.execute()
cost revenue
A 250 100
B 150 250
C 100 300
```
Comparison with a scalar, using either the operator or method:
```pycon
>>> (df == 100).execute()
cost revenue
A False True
B False False
C True False
```
```pycon
>>> df.eq(100).execute()
cost revenue
A False True
B False False
C True False
```
When other is a [`Series`](maxframe.dataframe.Series.md#maxframe.dataframe.Series), the columns of a DataFrame are aligned
with the index of other and broadcast:
```pycon
>>> (df != pd.Series([100, 250], index=["cost", "revenue"])).execute()
cost revenue
A True True
B True False
C False True
```
Use the method to control the broadcast axis:
```pycon
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index').execute()
cost revenue
A True False
B True True
C True True
D True True
```
When comparing to an arbitrary sequence, the number of columns must
match the number of elements in other:
```pycon
>>> (df == [250, 100]).execute()
cost revenue
A True True
B False False
C False False
```
Use the method to control the axis:
```pycon
>>> df.eq([250, 250, 100], axis='index').execute()
cost revenue
A True False
B False True
C True False
```
Compare to a DataFrame of different shape.
```pycon
>>> other = md.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other.execute()
revenue
A 300
B 250
C 100
D 150
```
```pycon
>>> df.gt(other).execute()
cost revenue
A False False
B False False
C False True
D False False
```
Compare to a MultiIndex by level.
```pycon
>>> df_multindex = md.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex.execute()
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
```
```pycon
>>> df.le(df_multindex, level=1).execute()
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
```
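How the `axis` argument aligns a flat sequence with the frame can be sketched in plain Python. This is an illustration under stated assumptions, not MaxFrame's implementation: `compare` is a hypothetical helper, and the frame is modelled as a dict of equal-length column lists.

```python
import operator


def compare(frame, other, op=operator.ge, axis="columns"):
    """Compare a dict of column lists with a flat sequence, element-wise."""
    names = list(frame)
    n_rows = len(frame[names[0]]) if names else 0
    if axis in (1, "columns"):
        # One element of `other` per column, repeated down the rows.
        if len(other) != len(names):
            raise ValueError("length of other must equal the number of columns")
        return {n: [op(v, o) for v in frame[n]] for n, o in zip(names, other)}
    if axis in (0, "index"):
        # One element of `other` per row, repeated across the columns.
        if len(other) != n_rows:
            raise ValueError("length of other must equal the number of rows")
        return {n: [op(v, o) for v, o in zip(frame[n], other)] for n in names}
    raise ValueError(f"unsupported axis: {axis!r}")


frame = {"cost": [250, 150, 100], "revenue": [100, 250, 300]}
print(compare(frame, [250, 100]))
# {'cost': [True, False, False], 'revenue': [True, True, True]}
print(compare(frame, [250, 250, 100], axis="index"))
# {'cost': [True, False, True], 'revenue': [False, True, True]}
```

Passing `operator.eq` as `op` reproduces the `eq` results shown in the examples above.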
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.groupby.md
# maxframe.dataframe.DataFrame.groupby
#### DataFrame.groupby(by=None, level=None, as_index=True, sort=True, group_keys=True)
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
* **Parameters:**
* **by** (*mapping* *,* *function* *,* *label* *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *labels*) – Used to determine the groups for the groupby.
If `by` is a function, it’s called on each value of the object’s
index. If a dict or Series is passed, the Series or dict VALUES
will be used to determine the groups (the Series’ values are first
aligned; see `.align()` method). If an ndarray is passed, the
values are used as-is to determine the groups. A label or list of
labels may be passed to group by the columns in `self`. Notice
that a tuple is interpreted as a (single) key.
* **as_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – For aggregated output, return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively “SQL-style” grouped output.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. Groupby preserves the order of rows within each group.
* **group_keys** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – When calling apply, add group keys to index to identify pieces.
* **Returns:**
A groupby object that contains information about the groups.
* **Return type:**
DataFrameGroupBy
### Notes
MaxFrame only supports groupby with axis=0.
The default value of group_keys depends on the version of the local
pandas library; it is True since pandas 2.0.
#### SEE ALSO
`resample`
: Convenience method for frequency conversion and resampling of time series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'Animal': ['Falcon', 'Falcon',
... 'Parrot', 'Parrot'],
... 'Max Speed': [380., 370., 24., 26.]})
>>> df.execute()
Animal Max Speed
0 Falcon 380.0
1 Falcon 370.0
2 Parrot 24.0
3 Parrot 26.0
>>> df.groupby(['Animal']).mean().execute()
Max Speed
Animal
Falcon 375.0
Parrot 25.0
```
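The split, apply and combine steps behind this result can be sketched in plain Python. `groupby_mean` is a hypothetical helper used only for illustration; it mimics the documented effect of `sort` on the order of group keys, while rows inside each group keep their original order.

```python
from statistics import mean


def groupby_mean(keys, values, sort=True):
    """Group values by key and return the mean of each group."""
    groups = {}
    for key, value in zip(keys, values):
        # Split: rows are appended in their original order within each group.
        groups.setdefault(key, []).append(value)
    # sort=True orders the group keys; sort=False keeps first-seen order.
    labels = sorted(groups) if sort else list(groups)
    # Apply and combine: one aggregated value per group label.
    return {label: mean(groups[label]) for label in labels}


animals = ["Falcon", "Falcon", "Parrot", "Parrot"]
speeds = [380.0, 370.0, 24.0, 26.0]
print(groupby_mean(animals, speeds))  # {'Falcon': 375.0, 'Parrot': 25.0}
```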
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.gt.md
# maxframe.dataframe.DataFrame.gt
#### DataFrame.gt(other, axis='columns', level=None, fill_value=None)
Return the element-wise greater-than comparison of the DataFrame and other (binary operator gt).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to dataframe > other with support to choose axis (rows or columns)
and level for comparison.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 'columns'*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the passed
MultiIndex level.
* **Returns:**
Result of the comparison.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`DataFrame.eq`](maxframe.dataframe.DataFrame.eq.md#maxframe.dataframe.DataFrame.eq)
: Compare DataFrames for equality elementwise.
[`DataFrame.ne`](maxframe.dataframe.DataFrame.ne.md#maxframe.dataframe.DataFrame.ne)
: Compare DataFrames for inequality elementwise.
[`DataFrame.le`](maxframe.dataframe.DataFrame.le.md#maxframe.dataframe.DataFrame.le)
: Compare DataFrames for less than inequality or equality elementwise.
[`DataFrame.lt`](maxframe.dataframe.DataFrame.lt.md#maxframe.dataframe.DataFrame.lt)
: Compare DataFrames for strictly less than inequality elementwise.
[`DataFrame.ge`](maxframe.dataframe.DataFrame.ge.md#maxframe.dataframe.DataFrame.ge)
: Compare DataFrames for greater than inequality or equality elementwise.
[`DataFrame.gt`](#maxframe.dataframe.DataFrame.gt)
: Compare DataFrames for strictly greater than inequality elementwise.
### Notes
Mismatched indices will be unioned together.
NaN values are considered different (i.e. NaN != NaN).
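Both notes can be illustrated with a small plain-Python sketch. `gt_aligned` is a hypothetical helper, not MaxFrame code; it assumes that a label missing from one operand is filled with NaN before the comparison, which is why such labels compare as False.

```python
import math


def gt_aligned(left, right):
    """Compare two label -> value mappings after aligning on the label union."""
    labels = sorted(set(left) | set(right))
    # A label missing on either side is filled with NaN before comparing.
    return {
        label: left.get(label, math.nan) > right.get(label, math.nan)
        for label in labels
    }


print(math.nan != math.nan)  # True
print(math.nan > math.nan)   # False

revenue = {"A": 100, "B": 250, "C": 300}
other = {"A": 300, "B": 250, "C": 100, "D": 150}
print(gt_aligned(revenue, other))
# {'A': False, 'B': False, 'C': True, 'D': False}
```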
### Examples
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df.execute()
cost revenue
A 250 100
B 150 250
C 100 300
```
Comparison with a scalar, using either the operator or method:
```pycon
>>> (df == 100).execute()
cost revenue
A False True
B False False
C True False
```
```pycon
>>> df.eq(100).execute()
cost revenue
A False True
B False False
C True False
```
When other is a [`Series`](maxframe.dataframe.Series.md#maxframe.dataframe.Series), the columns of a DataFrame are aligned
with the index of other and broadcast:
```pycon
>>> (df != pd.Series([100, 250], index=["cost", "revenue"])).execute()
cost revenue
A True True
B True False
C False True
```
Use the method to control the broadcast axis:
```pycon
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index').execute()
cost revenue
A True False
B True True
C True True
D True True
```
When comparing to an arbitrary sequence, the number of columns must
match the number of elements in other:
```pycon
>>> (df == [250, 100]).execute()
cost revenue
A True True
B False False
C False False
```
Use the method to control the axis:
```pycon
>>> df.eq([250, 250, 100], axis='index').execute()
cost revenue
A True False
B False True
C True False
```
Compare to a DataFrame of different shape.
```pycon
>>> other = md.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other.execute()
revenue
A 300
B 250
C 100
D 150
```
```pycon
>>> df.gt(other).execute()
cost revenue
A False False
B False False
C False True
D False False
```
Compare to a MultiIndex by level.
```pycon
>>> df_multindex = md.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex.execute()
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
```
```pycon
>>> df.le(df_multindex, level=1).execute()
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.head.md
# maxframe.dataframe.DataFrame.head
#### DataFrame.head(n=5)
Return the first n rows.
This function returns the first n rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of n, this function returns all rows except
the last |n| rows, equivalent to `df[:n]`.
* **Parameters:**
**n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 5*) – Number of rows to select.
* **Returns:**
The first n rows of the caller object.
* **Return type:**
same type as caller
#### SEE ALSO
[`DataFrame.tail`](maxframe.dataframe.DataFrame.tail.md#maxframe.dataframe.DataFrame.tail)
: Returns the last n rows.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df.execute()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
```
Viewing the first 5 lines
```pycon
>>> df.head().execute()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
```
Viewing the first n lines (three in this case)
```pycon
>>> df.head(3).execute()
animal
0 alligator
1 bee
2 falcon
```
For negative values of n
```pycon
>>> df.head(-3).execute()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
```
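Positional selection with `head` follows the same rule as Python sequence slicing with `rows[:n]`, including the case where n is negative. The sketch below uses a plain list, with `head` as a hypothetical stand-in for the DataFrame method.

```python
def head(rows, n=5):
    """Return the first n items; a negative n drops the last |n| items."""
    return rows[:n]


animals = ["alligator", "bee", "falcon", "lion", "monkey",
           "parrot", "shark", "whale", "zebra"]
print(head(animals, 3))       # ['alligator', 'bee', 'falcon']
print(len(head(animals)))     # 5
print(head(animals, -3)[-1])  # parrot
```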
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.iat.md
# maxframe.dataframe.DataFrame.iat
#### *property* DataFrame.iat
Access a single value for a row/column pair by integer position.
Similar to `iloc`, in that both provide integer-based lookups. Use
`iat` if you only need to get or set a single value in a DataFrame
or Series.
* **Raises:**
[**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – When integer position is out of bounds.
#### SEE ALSO
[`DataFrame.at`](maxframe.dataframe.DataFrame.at.md#maxframe.dataframe.DataFrame.at)
: Access a single value for a row/column label pair.
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Access a group of rows and columns by label(s).
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Access a group of rows and columns by integer position(s).
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... columns=['A', 'B', 'C'])
>>> df.execute()
A B C
0 0 2 3
1 0 4 1
2 10 20 30
```
Get value at specified row/column pair
```pycon
>>> df.iat[1, 2].execute()
1
```
Set value at specified row/column pair
```pycon
>>> df.iat[1, 2] = 10
>>> df.iat[1, 2].execute()
10
```
Get value within a series
```pycon
>>> df.loc[0].iat[1].execute()
2
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.idxmax.md
# maxframe.dataframe.DataFrame.idxmax
#### DataFrame.idxmax(axis=0, skipna=True)
Return index of first occurrence of maximum over requested axis.
NA/null values are excluded.
* **Parameters:**
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
* **skipna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
* **Returns:**
Indexes of maxima along the specified axis.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) –
* If the row/column is empty
#### SEE ALSO
[`Series.idxmax`](maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax)
: Return index of the maximum element.
### Notes
This method is the DataFrame version of `ndarray.argmax`.
### Examples
Consider a dataset containing food consumption in Argentina.
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'consumption': [10.51, 103.11, 55.48],
... 'co2_emissions': [37.2, 19.66, 1712]},
... index=['Pork', 'Wheat Products', 'Beef'])
```
```pycon
>>> df.execute()
consumption co2_emissions
Pork 10.51 37.20
Wheat Products 103.11 19.66
Beef 55.48 1712.00
```
By default, it returns the index for the maximum value in each column.
```pycon
>>> df.idxmax().execute()
consumption Wheat Products
co2_emissions Beef
dtype: object
```
To return the index for the maximum value in each row, use `axis="columns"`.
```pycon
>>> df.idxmax(axis="columns").execute()
Pork co2_emissions
Wheat Products consumption
Beef co2_emissions
dtype: object
```
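The "first occurrence of maximum" rule with `skipna=True` can be sketched for a single column in plain Python. `idxmax` here is a hypothetical helper rather than the MaxFrame method, and `None` stands in for an NA result.

```python
import math


def idxmax(labels, values):
    """Label of the first maximum, skipping NaN (as with skipna=True)."""
    if not values:
        raise ValueError("attempt to get idxmax of an empty sequence")
    best_label, best_value = None, None
    for label, value in zip(labels, values):
        if isinstance(value, float) and math.isnan(value):
            continue  # NA/null values are excluded
        # Strict ">" keeps the first occurrence when values tie.
        if best_value is None or value > best_value:
            best_label, best_value = label, value
    return best_label  # None stands in for NA when every value is NaN


foods = ["Pork", "Wheat Products", "Beef"]
print(idxmax(foods, [10.51, 103.11, 55.48]))  # Wheat Products
print(idxmax(foods, [37.2, 19.66, 1712]))     # Beef
print(idxmax(foods, [math.nan, 5.0, 5.0]))    # Wheat Products
```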
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.idxmin.md
# maxframe.dataframe.DataFrame.idxmin
#### DataFrame.idxmin(axis=0, skipna=True)
Return index of first occurrence of minimum over requested axis.
NA/null values are excluded.
* **Parameters:**
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
* **skipna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Exclude NA/null values. If an entire row/column is NA, the result
will be NA.
* **Returns:**
Indexes of minima along the specified axis.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) –
* If the row/column is empty
#### SEE ALSO
[`Series.idxmin`](maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin)
: Return index of the minimum element.
### Notes
This method is the DataFrame version of `ndarray.argmin`.
### Examples
Consider a dataset containing food consumption in Argentina.
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'consumption': [10.51, 103.11, 55.48],
... 'co2_emissions': [37.2, 19.66, 1712]},
... index=['Pork', 'Wheat Products', 'Beef'])
```
```pycon
>>> df.execute()
consumption co2_emissions
Pork 10.51 37.20
Wheat Products 103.11 19.66
Beef 55.48 1712.00
```
By default, it returns the index for the minimum value in each column.
```pycon
>>> df.idxmin().execute()
consumption Pork
co2_emissions Wheat Products
dtype: object
```
To return the index for the minimum value in each row, use `axis="columns"`.
```pycon
>>> df.idxmin(axis="columns").execute()
Pork consumption
Wheat Products co2_emissions
Beef consumption
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.iloc.md
# maxframe.dataframe.DataFrame.iloc
#### *property* DataFrame.iloc
Purely integer-location based indexing for selection by position.
`.iloc[]` is primarily integer position based (from `0` to
`length-1` of the axis), but may also be used with a boolean
array.
Allowed inputs are:
- An integer, e.g. `5`.
- A list or array of integers, e.g. `[4, 3, 0]`.
- A slice object with ints, e.g. `1:7`.
- A boolean array.
- A `callable` function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above).
This is useful in method chains, when you don’t have a reference to the
calling object, but would like to base your selection on some value.
`.iloc` will raise `IndexError` if a requested indexer is
out-of-bounds, except *slice* indexers which allow out-of-bounds
indexing (this conforms with python/numpy *slice* semantics).
See more at [Selection by Position](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-integer).
#### SEE ALSO
[`DataFrame.iat`](maxframe.dataframe.DataFrame.iat.md#maxframe.dataframe.DataFrame.iat)
: Fast integer location scalar accessor.
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Purely label-location based indexer for selection by label.
[`Series.iloc`](maxframe.dataframe.Series.iloc.md#maxframe.dataframe.Series.iloc)
: Purely integer-location based indexing for selection by position.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
... {'a': 100, 'b': 200, 'c': 300, 'd': 400},
... {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = md.DataFrame(mydict)
>>> df.execute()
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
```
**Indexing just the rows**
With a scalar integer.
```pycon
>>> type(df.iloc[0].execute().fetch())
<class 'pandas.core.series.Series'>
>>> df.iloc[0].execute()
a 1
b 2
c 3
d 4
Name: 0, dtype: int64
```
With a list of integers.
```pycon
>>> df.iloc[[0]].execute()
a b c d
0 1 2 3 4
>>> type(df.iloc[[0]].execute().fetch())
<class 'pandas.core.frame.DataFrame'>
```
```pycon
>>> df.iloc[[0, 1]].execute()
a b c d
0 1 2 3 4
1 100 200 300 400
```
With a slice object.
```pycon
>>> df.iloc[:3].execute()
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
```
With a boolean mask the same length as the index.
```pycon
>>> df.iloc[[True, False, True]].execute()
a b c d
0 1 2 3 4
2 1000 2000 3000 4000
```
With a callable, useful in method chains. The x passed
to the `lambda` is the DataFrame being sliced. This selects
the rows whose index label is even.
```pycon
>>> df.iloc[lambda x: x.index % 2 == 0].execute()
a b c d
0 1 2 3 4
2 1000 2000 3000 4000
```
**Indexing both axes**
You can mix the indexer types for the index and columns. Use `:` to
select the entire axis.
With scalar integers.
```pycon
>>> df.iloc[0, 1].execute()
2
```
With lists of integers.
```pycon
>>> df.iloc[[0, 2], [1, 3]].execute()
b d
0 2 4
2 2000 4000
```
With slice objects.
```pycon
>>> df.iloc[1:3, 0:3].execute()
a b c
1 100 200 300
2 1000 2000 3000
```
With a boolean array whose length matches the columns.
```pycon
>>> df.iloc[:, [True, False, True, False]].execute()
a c
0 1 3
1 100 300
2 1000 3000
```
With a callable function that expects the Series or DataFrame.
```pycon
>>> df.iloc[:, lambda df: [0, 2]].execute()
a c
0 1 3
1 100 300
2 1000 3000
```
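The out-of-bounds rule stated for `.iloc` follows Python's own sequence semantics, which can be checked with a plain list of rows (a stand-in for the DataFrame, used here only for illustration).

```python
rows = [[1, 2, 3, 4], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]

# A slice that runs past the end is clipped rather than rejected.
clipped = rows[1:10]
print(len(clipped))  # 2

# A scalar position past the end raises IndexError.
try:
    rows[10]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)  # True
```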
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.index.md
# maxframe.dataframe.DataFrame.index
#### *property* DataFrame.index
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.infer_objects.md
# maxframe.dataframe.DataFrame.infer_objects
#### DataFrame.infer_objects(copy=True)
Attempt to infer better dtypes for object columns.
Attempts soft conversion of object-dtyped
columns, leaving non-object and unconvertible
columns unchanged. The inference rules are the
same as during normal Series/DataFrame construction.
* **Returns:**
**converted**
* **Return type:**
same type as input object
#### SEE ALSO
[`to_datetime`](maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)
: Convert argument to datetime.
`to_timedelta`
: Convert argument to timedelta.
[`to_numeric`](maxframe.dataframe.to_numeric.md#maxframe.dataframe.to_numeric)
: Convert argument to numeric type.
[`convert_dtypes`](maxframe.dataframe.DataFrame.convert_dtypes.md#maxframe.dataframe.DataFrame.convert_dtypes)
: Convert argument to best possible dtype.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df.execute()
A
1 1
2 2
3 3
```
```pycon
>>> df.dtypes.execute()
A object
dtype: object
```
```pycon
>>> df.infer_objects().dtypes.execute()
A int64
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.insert.md
# maxframe.dataframe.DataFrame.insert
#### DataFrame.insert(loc, column, value, allow_duplicates=False)
Insert column into DataFrame at specified location.
Raises a ValueError if column is already contained in the DataFrame,
unless allow_duplicates is set to True.
* **Parameters:**
* **loc** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Insertion index. Must verify 0 <= loc <= len(columns).
* **column** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *number* *, or* *hashable object*) – Label of the inserted column.
* **value** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* *array-like*)
* **allow_duplicates** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.isna.md
# maxframe.dataframe.DataFrame.isna
#### DataFrame.isna()
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or `numpy.nan`, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.isnull`](maxframe.dataframe.DataFrame.isnull.md#maxframe.dataframe.DataFrame.isnull)
: Alias of isna.
[`DataFrame.notna`](maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)
: Boolean inverse of isna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`isna`](maxframe.dataframe.isna.md#maxframe.dataframe.isna)
: Top-level isna.
### Examples
Show which entries in a DataFrame are NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.nan],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.isna().execute()
age born name toy
0 False True False True
1 False False False False
2 True False False False
```
Show which entries in a Series are NA.
```pycon
>>> ser = md.Series([5, 6, np.nan])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.isna().execute()
0 False
1 False
2 True
dtype: bool
```
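Which scalar values count as NA under the default settings can be sketched in plain Python. `is_na` is a hypothetical helper that covers only `None` and float NaN; it does not handle `NaT` or the `use_inf_as_na` option.

```python
import math


def is_na(value):
    """True for None and float NaN; False for everything else."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


samples = [5, math.nan, None, "", math.inf, "Joker"]
print([is_na(v) for v in samples])
# [False, True, True, False, False, False]
```

Empty strings and infinities map to False, matching the description above.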
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.isnull.md
# maxframe.dataframe.DataFrame.isnull
#### DataFrame.isnull()
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or `numpy.nan`, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.isnull`](#maxframe.dataframe.DataFrame.isnull)
: Alias of isna.
[`DataFrame.notna`](maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)
: Boolean inverse of isna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`isna`](maxframe.dataframe.isna.md#maxframe.dataframe.isna)
: Top-level isna.
### Examples
Show which entries in a DataFrame are NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.nan],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.isna().execute()
age born name toy
0 False True False True
1 False False False False
2 True False False False
```
Show which entries in a Series are NA.
```pycon
>>> ser = md.Series([5, 6, np.nan])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.isna().execute()
0 False
1 False
2 True
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.join.md
# maxframe.dataframe.DataFrame.join
#### DataFrame.join(other: DataFrame | Series, on: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, how: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'left', lsuffix: [str](https://docs.python.org/3/library/stdtypes.html#str) = '', rsuffix: [str](https://docs.python.org/3/library/stdtypes.html#str) = '', sort: [bool](https://docs.python.org/3/library/functions.html#bool) = False, method: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, auto_merge: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'both', auto_merge_threshold: [int](https://docs.python.org/3/library/functions.html#int) = 8, bloom_filter: [bool](https://docs.python.org/3/library/functions.html#bool) | [Dict](https://docs.python.org/3/library/typing.html#typing.Dict) = True, bloom_filter_options: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] = None, left_hint: JoinHint = None, right_hint: JoinHint = None) → DataFrame
Join columns of another DataFrame.
Join columns with other DataFrame either on index or on a key
column. Efficiently join multiple DataFrame objects by index at once by
passing a list.
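How the four `how` options of this method choose the result labels for an index-on-index join can be sketched in plain Python. `join_index` is a hypothetical helper, not MaxFrame code; each frame is modelled as a label-to-value mapping, and `None` marks a label that one side lacks.

```python
def join_index(left, right, how="left"):
    """Join two label -> value mappings on their labels."""
    if how == "left":
        labels = list(left)                       # calling frame's index
    elif how == "right":
        labels = list(right)                      # other's index
    elif how == "inner":
        labels = [k for k in left if k in right]  # intersection, left order
    elif how == "outer":
        labels = sorted(set(left) | set(right))   # union, sorted
    else:
        raise ValueError(f"unsupported how: {how!r}")
    # None marks a label that one side does not have.
    return [(k, left.get(k), right.get(k)) for k in labels]


left = {"K0": "A0", "K2": "A2", "K1": "A1"}
right = {"K1": "B1", "K3": "B3"}
print(join_index(left, right))
# [('K0', 'A0', None), ('K2', 'A2', None), ('K1', 'A1', 'B1')]
print(join_index(left, right, how="outer"))
# [('K0', 'A0', None), ('K1', 'A1', 'B1'), ('K2', 'A2', None), ('K3', None, 'B3')]
```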
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Index should be similar to one of the columns in this one. If a
Series is passed, its name attribute must be set, and that will be
used as the column name in the resulting joined DataFrame.
* **on** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *, or* *array-like* *,* *optional*) – Column or index level name(s) in the caller to join on the index
in other, otherwise joins index-on-index. If multiple
values given, the other DataFrame must have a MultiIndex. Can
pass an array as the join key if it is not already contained in
the calling DataFrame. Like an Excel VLOOKUP operation.
* **how** ( *{'left'* *,* *'right'* *,* *'outer'* *,* *'inner'}* *,* *default 'left'*) –
How to handle the operation of the two objects.
* left: use calling frame’s index (or column if on is specified).
* right: use other’s index.
* outer: form union of calling frame’s index (or column if on is
specified) with other’s index, and sort it
lexicographically.
* inner: form intersection of calling frame’s index (or column if
on is specified) with other’s index, preserving the order
of the calling frame’s index.
* **lsuffix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default ''*) – Suffix to use from left frame’s overlapping columns.
* **rsuffix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default ''*) – Suffix to use from right frame’s overlapping columns.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Order result DataFrame lexicographically by the join key. If False,
the order of the join key depends on the join type (how keyword).
* **method** ( *{"shuffle"* *,* *"broadcast"}* *,* *default None*) – “broadcast” is recommended when one DataFrame is much smaller than the other;
otherwise, “shuffle” is the better choice. By default, the method is chosen
according to the actual data size.
* **auto_merge** ( *{"both"* *,* *"none"* *,* *"before"* *,* *"after"}* *,* *default "both"*) –
Whether to automatically merge small chunks before or after the merge.
* “both”: merge small chunks both before and after the merge.
* “none”: do not merge small chunks.
* “before”: merge small chunks only before the merge.
* “after”: merge small chunks only after the merge.
* **auto_merge_threshold** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 8*) – When how is “inner”, the merged result can be much smaller than the original DataFrame.
If the number of chunks is greater than the threshold,
small chunks are merged automatically.
* **bloom_filter** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default "auto"*) – Use a bloom filter to optimize the merge.
* **bloom_filter_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) –
* “max_elements”: maximum number of elements in the bloom filter;
the default value is the maximum size of all input chunks.
* “error_rate”: error rate, default 0.1.
* “apply_chunk_size_threshold”: minimum chunk size of input chunks for the bloom filter to be applied, default 10.
The bloom filter is applied when the chunk sizes of both left and right are greater than this threshold.
* “filter”: “large”, “small” or “both”, default “large”.
Decides whether to filter the large, the small or both DataFrames.
* **left_hint** (*JoinHint* *,* *default None*) – Join strategy to use for left frame. When data skew occurs, consider these strategies to avoid long-tail issues,
but use them cautiously to prevent OOM and unnecessary overhead.
* **right_hint** (*JoinHint* *,* *default None*) – Join strategy to use for right frame.
* **Returns:**
A dataframe containing columns from both the caller and other.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
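
The `bloom_filter` option can be pictured as a pre-filter on join keys: keys from one side are hashed into a compact bit array, and rows on the other side whose keys cannot match are dropped before the join runs. The block below is a minimal plain-Python sketch of that general technique, not MaxFrame's implementation. The `BloomFilter` class, its sizing formulas and the sample keys are assumptions made for illustration; only the names `max_elements` and `error_rate` come from the options above.

```python
import hashlib
import math


class BloomFilter:
    """Textbook bloom filter (illustrative only; not MaxFrame's internal class)."""

    def __init__(self, max_elements, error_rate=0.1):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(-max_elements * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / max_elements * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(data, digest_size=8, salt=seed.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


small_keys = ["K0", "K1", "K2"]               # join keys of the smaller frame
large_keys = [f"K{i}" for i in range(1000)]   # join keys of the larger frame

bloom = BloomFilter(max_elements=len(small_keys), error_rate=0.1)
for key in small_keys:
    bloom.add(key)

# Keys that are definitely absent from the small side are dropped early.
# Some false positives may survive; the actual join removes them.
candidates = [key for key in large_keys if key in bloom]
```

A bloom filter never reports a false negative, so every key that really matches is still present in `candidates`; a lower `error_rate` only reduces how many non-matching keys slip through, at the cost of a larger bit array.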
#### SEE ALSO
[`DataFrame.merge`](maxframe.dataframe.DataFrame.merge.md#maxframe.dataframe.DataFrame.merge)
: For column(s)-on-column(s) operations.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
```
```pycon
>>> df.execute()
key A
0 K0 A0
1 K1 A1
2 K2 A2
3 K3 A3
4 K4 A4
5 K5 A5
```
```pycon
>>> other = md.DataFrame({'key': ['K0', 'K1', 'K2'],
... 'B': ['B0', 'B1', 'B2']})
```
```pycon
>>> other.execute()
key B
0 K0 B0
1 K1 B1
2 K2 B2
```
Join DataFrames using their indexes.
```pycon
>>> df.join(other, lsuffix='_caller', rsuffix='_other').execute()
key_caller A key_other B
0 K0 A0 K0 B0
1 K1 A1 K1 B1
2 K2 A2 K2 B2
3 K3 A3 NaN NaN
4 K4 A4 NaN NaN
5 K5 A5 NaN NaN
```
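
The result above keeps every row of the caller because `how` defaults to `'left'`. As a rough model of the four `how` options, the following plain-Python sketch computes only the resulting row labels. It assumes unique, sortable labels, and the helper `joined_index` is hypothetical, not part of MaxFrame.

```python
def joined_index(left, right, how="left"):
    """Return the row labels of a join result for unique, sortable labels."""
    if how == "left":
        return list(left)                      # caller's labels, caller's order
    if how == "right":
        return list(right)                     # other's labels, other's order
    if how == "inner":
        right_labels = set(right)
        return [label for label in left if label in right_labels]
    if how == "outer":
        return sorted(set(left) | set(right))  # union, sorted
    raise ValueError(f"unsupported how: {how!r}")


caller = [0, 1, 2, 3, 4, 5]   # index of df in the example above
other = [0, 1, 2]             # index of other in the example above

print(joined_index(caller, other, "left"))    # [0, 1, 2, 3, 4, 5]
print(joined_index(caller, other, "inner"))   # [0, 1, 2]
print(joined_index([2, 0], [1, 0], "outer"))  # [0, 1, 2]
```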
If we want to join using the key columns, we need to set key to be
the index in both df and other. The joined DataFrame will have
key as its index.
```pycon
>>> df.set_index('key').join(other.set_index('key')).execute()
A B
key
K0 A0 B0
K1 A1 B1
K2 A2 B2
K3 A3 NaN
K4 A4 NaN
K5 A5 NaN
```
Another option to join using the key columns is to use the on
parameter. DataFrame.join always uses other’s index but we can use
any column in df. This method preserves the original DataFrame’s
index in the result.
```pycon
>>> df.join(other.set_index('key'), on='key').execute()
key A B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 B2
3 K3 A3 NaN
4 K4 A4 NaN
5 K5 A5 NaN
```
Using non-unique key values shows how they are matched.
```pycon
>>> df = md.DataFrame({'key': ['K0', 'K1', 'K1', 'K3', 'K0', 'K1'],
... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
```
```pycon
>>> df.execute()
key A
0 K0 A0
1 K1 A1
2 K1 A2
3 K3 A3
4 K0 A4
5 K1 A5
```
```pycon
>>> df.join(other.set_index('key'), on='key').execute()
key A B
0 K0 A0 B0
1 K1 A1 B1
2 K1 A2 B1
3 K3 A3 NaN
4 K0 A4 B0
5 K1 A5 B1
```
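
Joining with `on` behaves much like a dictionary lookup: each value in the caller's key column is looked up in other's index, so repeated keys simply repeat the matched value. The sketch below models this in plain Python. It assumes other's index is unique, uses `None` in place of NaN, and the helper `join_on` is hypothetical.

```python
def join_on(rows, on, lookup, new_column):
    """Left-join rows (a list of dicts) against lookup, a dict keyed like other's index."""
    return [{**row, new_column: lookup.get(row[on])} for row in rows]


rows = [
    {"key": key, "A": a}
    for key, a in zip(
        ["K0", "K1", "K1", "K3", "K0", "K1"],
        ["A0", "A1", "A2", "A3", "A4", "A5"],
    )
]
lookup = {"K0": "B0", "K1": "B1", "K2": "B2"}   # other.set_index('key')['B']

joined = join_on(rows, "key", lookup, "B")
b_values = [row["B"] for row in joined]
print(b_values)   # ['B0', 'B1', 'B1', None, 'B0', 'B1']
```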
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.last_valid_index.md
# maxframe.dataframe.DataFrame.last_valid_index
#### DataFrame.last_valid_index()
Return index for last non-NA value or None, if no non-NA value is found.
* **Return type:**
[type](https://docs.python.org/3/library/functions.html#type) of index
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([None, 3, 4])
>>> s.first_valid_index().execute()
1
>>> s.last_valid_index().execute()
2
```
```pycon
>>> s = md.Series([None, None])
>>> print(s.first_valid_index().execute())
None
>>> print(s.last_valid_index().execute())
None
```
If all elements in Series are NA/null, returns None.
```pycon
>>> s = md.Series()
>>> print(s.first_valid_index().execute())
None
>>> print(s.last_valid_index().execute())
None
```
If Series is empty, returns None.
For DataFrame:
```pycon
>>> df = md.DataFrame({'A': [None, None, 2], 'B': [None, 3, 4]})
>>> df.execute()
A B
0 NaN NaN
1 NaN 3.0
2 2.0 4.0
>>> df.first_valid_index().execute()
1
>>> df.last_valid_index().execute()
2
```
```pycon
>>> df = md.DataFrame({'A': [None, None, None], 'B': [None, None, None]})
>>> df.execute()
A B
0 None None
1 None None
2 None None
>>> print(df.first_valid_index().execute())
None
>>> print(df.last_valid_index().execute())
None
```
If all elements in DataFrame are NA/null, returns None.
```pycon
>>> df = md.DataFrame()
>>> df.execute()
Empty DataFrame
Columns: []
Index: []
>>> print(df.first_valid_index().execute())
None
>>> print(df.last_valid_index().execute())
None
```
If DataFrame is empty, returns None.
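
The behaviour shown in these examples can be summarised with a small plain-Python model. It assumes, consistent with the DataFrame example above, that a row counts as valid when at least one of its values is non-NA; the helper names are hypothetical and the code is not MaxFrame's implementation.

```python
import math


def _is_na(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def first_valid_index(labels, rows):
    """Return the first label whose row has at least one non-NA value, else None."""
    for label, row in zip(labels, rows):
        if not all(_is_na(v) for v in row):
            return label
    return None


def last_valid_index(labels, rows):
    """Return the last label whose row has at least one non-NA value, else None."""
    return first_valid_index(list(reversed(labels)), list(reversed(rows)))


labels = [0, 1, 2]
rows = [(None, None), (None, 3), (2, 4)]   # columns A and B from the example above

print(first_valid_index(labels, rows))   # 1
print(last_valid_index(labels, rows))    # 2
print(last_valid_index([], []))          # None
```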
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.le.md
# maxframe.dataframe.DataFrame.le
#### DataFrame.le(other, axis='columns', level=None, fill_value=None)
Get Less than or equal to of dataframe and other, element-wise (binary operator le).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to dataframe <= other with support to choose axis (rows or columns)
and level for comparison.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 'columns'*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the passed
MultiIndex level.
* **Returns:**
Result of the comparison.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`DataFrame.eq`](maxframe.dataframe.DataFrame.eq.md#maxframe.dataframe.DataFrame.eq)
: Compare DataFrames for equality elementwise.
[`DataFrame.ne`](maxframe.dataframe.DataFrame.ne.md#maxframe.dataframe.DataFrame.ne)
: Compare DataFrames for inequality elementwise.
[`DataFrame.le`](#maxframe.dataframe.DataFrame.le)
: Compare DataFrames for less than inequality or equality elementwise.
[`DataFrame.lt`](maxframe.dataframe.DataFrame.lt.md#maxframe.dataframe.DataFrame.lt)
: Compare DataFrames for strictly less than inequality elementwise.
[`DataFrame.ge`](maxframe.dataframe.DataFrame.ge.md#maxframe.dataframe.DataFrame.ge)
: Compare DataFrames for greater than inequality or equality elementwise.
[`DataFrame.gt`](maxframe.dataframe.DataFrame.gt.md#maxframe.dataframe.DataFrame.gt)
: Compare DataFrames for strictly greater than inequality elementwise.
### Notes
Mismatched indices will be unioned together.
NaN values are considered different (i.e. NaN != NaN).
### Examples
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df.execute()
cost revenue
A 250 100
B 150 250
C 100 300
```
Comparison with a scalar, using either the operator or method:
```pycon
>>> (df == 100).execute()
cost revenue
A False True
B False False
C True False
```
```pycon
>>> df.eq(100).execute()
cost revenue
A False True
B False False
C True False
```
When other is a [`Series`](maxframe.dataframe.Series.md#maxframe.dataframe.Series), the columns of a DataFrame are aligned
with the index of other and broadcast:
```pycon
>>> (df != pd.Series([100, 250], index=["cost", "revenue"])).execute()
cost revenue
A True True
B True False
C False True
```
Use the method to control the broadcast axis:
```pycon
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index').execute()
cost revenue
A True False
B True True
C True True
D True True
```
When comparing to an arbitrary sequence, the number of columns must
match the number of elements in other:
```pycon
>>> (df == [250, 100]).execute()
cost revenue
A True True
B False False
C False False
```
Use the method to control the axis:
```pycon
>>> df.eq([250, 250, 100], axis='index').execute()
cost revenue
A True False
B False True
C True False
```
Compare to a DataFrame of different shape.
```pycon
>>> other = md.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other.execute()
revenue
A 300
B 250
C 100
D 150
```
```pycon
>>> df.gt(other).execute()
cost revenue
A False False
B False False
C False True
D False False
```
Compare to a MultiIndex by level.
```pycon
>>> df_multindex = md.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex.execute()
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
```
```pycon
>>> df.le(df_multindex, level=1).execute()
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
```
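
The examples above are shared across the comparison wrappers and mostly use `eq`, `ne` and `gt`. As a sketch of how `le` itself treats a scalar versus a sequence along each axis, the plain-Python model below represents a frame as a dict of column lists. The helper `le` is hypothetical and ignores index alignment and `level`.

```python
def le(frame, other, axis="columns"):
    """Element-wise <= for a dict-of-lists frame against a scalar or a sequence."""
    result = {}
    for j, name in enumerate(frame):
        column = []
        for i, value in enumerate(frame[name]):
            if isinstance(other, (list, tuple)):
                # axis='columns': one entry per column; axis='index': one entry per row.
                rhs = other[j] if axis in (1, "columns") else other[i]
            else:
                rhs = other
            column.append(value <= rhs)
        result[name] = column
    return result


frame = {"cost": [250, 150, 100], "revenue": [100, 250, 300]}

print(le(frame, 150))
# {'cost': [False, True, True], 'revenue': [True, False, False]}
print(le(frame, [250, 100]))
# {'cost': [True, True, True], 'revenue': [True, False, False]}
print(le(frame, [250, 250, 100], axis="index"))
# {'cost': [True, True, True], 'revenue': [True, True, False]}
```

As the Notes state, NaN never compares equal or ordered: in Python, `float("nan") <= float("nan")` is `False`.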
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.loc.md
# maxframe.dataframe.DataFrame.loc
#### *property* DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.
`.loc[]` is primarily label based, but may also be used with a
boolean array.
Allowed inputs are:
- A single label, e.g. `5` or `'a'`, (note that `5` is
interpreted as a *label* of the index, and **never** as an
integer position along the index).
- A list or array of labels, e.g. `['a', 'b', 'c']`.
- A slice object with labels, e.g. `'a':'f'`.
#### WARNING
Note that, contrary to usual Python slices, **both** the
start and the stop are included.
- A boolean array of the same length as the axis being sliced,
e.g. `[True, False, True]`.
- An alignable boolean Series. The index of the key will be aligned before
masking.
- An alignable Index. The Index of the returned selection will be the input.
- A `callable` function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above)
See more at [Selection by Label](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-label).
* **Raises:**
* [**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If any items are not found.
* **IndexingError** – If an indexed key is passed and its index is unalignable to the frame index.
#### SEE ALSO
[`DataFrame.at`](maxframe.dataframe.DataFrame.at.md#maxframe.dataframe.DataFrame.at)
: Access a single value for a row/column label pair.
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Access group of rows and columns by integer position(s).
[`DataFrame.xs`](maxframe.dataframe.DataFrame.xs.md#maxframe.dataframe.DataFrame.xs)
: Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
[`Series.loc`](maxframe.dataframe.Series.loc.md#maxframe.dataframe.Series.loc)
: Access group of values using labels.
### Examples
**Getting values**
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=['cobra', 'viper', 'sidewinder'],
... columns=['max_speed', 'shield'])
>>> df.execute()
max_speed shield
cobra 1 2
viper 4 5
sidewinder 7 8
```
Single label. Note this returns the row as a Series.
```pycon
>>> df.loc['viper'].execute()
max_speed 4
shield 5
Name: viper, dtype: int64
```
List of labels. Note using `[[]]` returns a DataFrame.
```pycon
>>> df.loc[['viper', 'sidewinder']].execute()
max_speed shield
viper 4 5
sidewinder 7 8
```
Single label for row and column
```pycon
>>> df.loc['cobra', 'shield'].execute()
2
```
Slice with labels for row and single label for column. As mentioned
above, note that both the start and stop of the slice are included.
```pycon
>>> df.loc['cobra':'viper', 'max_speed'].execute()
cobra 1
viper 4
Name: max_speed, dtype: int64
```
Boolean list with the same length as the row axis
```pycon
>>> df.loc[[False, False, True]].execute()
max_speed shield
sidewinder 7 8
```
Alignable boolean Series:
```pycon
>>> df.loc[md.Series([False, True, False],
... index=['viper', 'sidewinder', 'cobra'])].execute()
max_speed shield
sidewinder 7 8
```
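
In the example above the mask is given in a different order from the frame's index, yet `sidewinder` is still selected, because the mask is matched by label rather than by position. A minimal plain-Python sketch of that alignment follows; `loc_mask` is a hypothetical helper and the mask is modelled as a dict.

```python
def loc_mask(labels, rows, mask):
    """Select rows whose label maps to True in mask (a dict of label -> bool)."""
    return [(label, row) for label, row in zip(labels, rows) if mask[label]]


labels = ["cobra", "viper", "sidewinder"]
rows = [(1, 2), (4, 5), (7, 8)]
# Same values and order as the boolean Series in the example above.
mask = {"viper": False, "sidewinder": True, "cobra": False}

selected = loc_mask(labels, rows, mask)
print(selected)   # [('sidewinder', (7, 8))]
```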
Index (same behavior as `df.reindex`)
```pycon
>>> df.loc[md.Index(["cobra", "viper"], name="foo")].execute()
max_speed shield
foo
cobra 1 2
viper 4 5
```
Conditional that returns a boolean Series
```pycon
>>> df.loc[df['shield'] > 6].execute()
max_speed shield
sidewinder 7 8
```
Conditional that returns a boolean Series with column labels specified
```pycon
>>> df.loc[df['shield'] > 6, ['max_speed']].execute()
max_speed
sidewinder 7
```
Callable that returns a boolean Series
```pycon
>>> df.loc[lambda df: df['shield'] == 8].execute()
max_speed shield
sidewinder 7 8
```
**Setting values**
Set value for all items matching the list of labels
```pycon
>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df.execute()
max_speed shield
cobra 1 2
viper 4 50
sidewinder 7 50
```
Set value for an entire row
```pycon
>>> df.loc['cobra'] = 10
>>> df.execute()
max_speed shield
cobra 10 10
viper 4 50
sidewinder 7 50
```
Set value for an entire column
```pycon
>>> df.loc[:, 'max_speed'] = 30
>>> df.execute()
max_speed shield
cobra 30 10
viper 30 50
sidewinder 30 50
```
Set value for rows matching a boolean condition
```pycon
>>> df.loc[df['shield'] > 35] = 0
>>> df.execute()
max_speed shield
cobra 30 10
viper 0 0
sidewinder 0 0
```
**Getting values on a DataFrame with an index that has integer labels**
Another example using integers for the index
```pycon
>>> df = md.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df.execute()
max_speed shield
7 1 2
8 4 5
9 7 8
```
Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.
```pycon
>>> df.loc[7:9].execute()
max_speed shield
7 1 2
8 4 5
9 7 8
```
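
The difference between a label slice and a positional slice is easiest to see with integer labels. The sketch below models the inclusive label slice in plain Python. It assumes unique labels, and `loc_slice` is a hypothetical helper rather than a MaxFrame function.

```python
def loc_slice(labels, rows, start, stop):
    """Label-based slice that includes both start and stop (labels must be unique)."""
    first = labels.index(start)
    last = labels.index(stop)
    return list(zip(labels[first:last + 1], rows[first:last + 1]))


labels = [7, 8, 9]
rows = [(1, 2), (4, 5), (7, 8)]

by_label = loc_slice(labels, rows, 7, 9)   # labels 7 through 9, stop included
by_position = rows[7:9]                    # positions 7 and 8 do not exist

print(by_label)      # [(7, (1, 2)), (8, (4, 5)), (9, (7, 8))]
print(by_position)   # []
```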
**Getting values with a MultiIndex**
A number of examples using a DataFrame with a MultiIndex
```pycon
>>> tuples = [
... ('cobra', 'mark i'), ('cobra', 'mark ii'),
... ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
... ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = md.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
... [1, 4], [7, 1], [16, 36]]
>>> df = md.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df.execute()
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
mark iii 16 36
```
Single label. Note this returns a DataFrame with a single index.
```pycon
>>> df.loc['cobra'].execute()
max_speed shield
mark i 12 2
mark ii 0 4
```
Single index tuple. Note this returns a Series.
```pycon
>>> df.loc[('cobra', 'mark ii')].execute()
max_speed 0
shield 4
Name: (cobra, mark ii), dtype: int64
```
Single label for row and column. Similar to passing in a tuple, this
returns a Series.
```pycon
>>> df.loc['cobra', 'mark i'].execute()
max_speed 12
shield 2
Name: (cobra, mark i), dtype: int64
```
Single tuple. Note using `[[]]` returns a DataFrame.
```pycon
>>> df.loc[[('cobra', 'mark ii')]].execute()
max_speed shield
cobra mark ii 0 4
```
Single tuple for the index with a single label for the column
```pycon
>>> df.loc[('cobra', 'mark i'), 'shield'].execute()
2
```
Slice from index tuple to single label
```pycon
>>> df.loc[('cobra', 'mark i'):'viper'].execute()
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
mark iii 16 36
```
Slice from index tuple to index tuple
```pycon
>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')].execute()
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.lt.md
# maxframe.dataframe.DataFrame.lt
#### DataFrame.lt(other, axis='columns', level=None, fill_value=None)
Get Less than of dataframe and other, element-wise (binary operator lt).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to dataframe < other with support to choose axis (rows or columns)
and level for comparison.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 'columns'*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the passed
MultiIndex level.
* **Returns:**
Result of the comparison.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`DataFrame.eq`](maxframe.dataframe.DataFrame.eq.md#maxframe.dataframe.DataFrame.eq)
: Compare DataFrames for equality elementwise.
[`DataFrame.ne`](maxframe.dataframe.DataFrame.ne.md#maxframe.dataframe.DataFrame.ne)
: Compare DataFrames for inequality elementwise.
[`DataFrame.le`](maxframe.dataframe.DataFrame.le.md#maxframe.dataframe.DataFrame.le)
: Compare DataFrames for less than inequality or equality elementwise.
[`DataFrame.lt`](#maxframe.dataframe.DataFrame.lt)
: Compare DataFrames for strictly less than inequality elementwise.
[`DataFrame.ge`](maxframe.dataframe.DataFrame.ge.md#maxframe.dataframe.DataFrame.ge)
: Compare DataFrames for greater than inequality or equality elementwise.
[`DataFrame.gt`](maxframe.dataframe.DataFrame.gt.md#maxframe.dataframe.DataFrame.gt)
: Compare DataFrames for strictly greater than inequality elementwise.
### Notes
Mismatched indices will be unioned together.
NaN values are considered different (i.e. NaN != NaN).
### Examples
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df.execute()
cost revenue
A 250 100
B 150 250
C 100 300
```
Comparison with a scalar, using either the operator or method:
```pycon
>>> (df == 100).execute()
cost revenue
A False True
B False False
C True False
```
```pycon
>>> df.eq(100).execute()
cost revenue
A False True
B False False
C True False
```
When other is a [`Series`](maxframe.dataframe.Series.md#maxframe.dataframe.Series), the columns of a DataFrame are aligned
with the index of other and broadcast:
```pycon
>>> (df != pd.Series([100, 250], index=["cost", "revenue"])).execute()
cost revenue
A True True
B True False
C False True
```
Use the method to control the broadcast axis:
```pycon
>>> df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index').execute()
cost revenue
A True False
B True True
C True True
D True True
```
When comparing to an arbitrary sequence, the number of columns must
match the number of elements in other:
```pycon
>>> (df == [250, 100]).execute()
cost revenue
A True True
B False False
C False False
```
Use the method to control the axis:
```pycon
>>> df.eq([250, 250, 100], axis='index').execute()
cost revenue
A True False
B False True
C True False
```
Compare to a DataFrame of different shape.
```pycon
>>> other = md.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other.execute()
revenue
A 300
B 250
C 100
D 150
```
```pycon
>>> df.gt(other).execute()
cost revenue
A False False
B False False
C False True
D False False
```
Compare to a MultiIndex by level.
```pycon
>>> df_multindex = md.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex.execute()
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
```
```pycon
>>> df.le(df_multindex, level=1).execute()
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
```
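
The Notes say that mismatched indices are unioned and that NaN never compares as equal or ordered. The plain-Python sketch below applies both rules to `lt` for a single column, using the `revenue` values from the example above. The helper `lt_aligned` is hypothetical, and the labels are sorted here only to make the output deterministic.

```python
def lt_aligned(left, right):
    """Element-wise < after aligning two label -> value dicts on the union of labels."""
    nan = float("nan")
    labels = sorted(set(left) | set(right))
    # A label missing on one side becomes NaN, and any comparison with NaN is False.
    return {label: left.get(label, nan) < right.get(label, nan) for label in labels}


revenue = {"A": 100, "B": 250, "C": 300}
other = {"A": 300, "B": 250, "C": 100, "D": 150}

result = lt_aligned(revenue, other)
print(result)   # {'A': True, 'B': False, 'C': False, 'D': False}
```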
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.map.md
# maxframe.dataframe.DataFrame.map
#### DataFrame.map(func, na_action=None, dtypes=None, dtype=None, skip_infer=False, \*\*kwargs)
Apply a function to a DataFrame elementwise.
This method applies a function that accepts and returns a scalar
to every element of a DataFrame.
* **Parameters:**
* **func** (*callable*) – Python function, returns a single value from a single value.
* **na_action** ( *{None* *,* *'ignore'}* *,* *default None*) – If ‘ignore’, propagate NaN values, without passing them to func.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames.
* **dtype** (*np.dtype* *,* *default None*) – Specify dtypes of all columns of returned DataFrames, only
effective when dtypes is not specified.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip inferring dtypes when neither dtypes nor dtype is specified.
* **\*\*kwargs** – Additional keyword arguments to pass to
func.
* **Returns:**
Transformed DataFrame.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.apply`](maxframe.dataframe.DataFrame.apply.md#maxframe.dataframe.DataFrame.apply)
: Apply a function along input axis of DataFrame.
`DataFrame.replace`
: Replace values given in to_replace with value.
[`Series.map`](maxframe.dataframe.Series.map.md#maxframe.dataframe.Series.map)
: Apply a function elementwise on a Series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df.execute()
0 1
0 1.000 2.120
1 3.356 4.567
```
```pycon
>>> df.map(lambda x: len(str(x))).execute()
0 1
0 3 4
1 5 5
```
Like Series.map, NA values can be ignored:
```pycon
>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = md.NA
>>> df_copy.map(lambda x: len(str(x)), na_action='ignore').execute()
0 1
0 NaN 4
1 5.0 5
```
It is also possible to use map with functions that are not
lambda functions:
```pycon
>>> df.map(round, ndigits=1).execute()
0 1
0 1.0 2.1
1 3.4 4.6
```
Note that a vectorized version of func often exists, which will
be much faster. You could square each number elementwise.
```pycon
>>> df.map(lambda x: x**2).execute()
0 1
0 1.000000 4.494400
1 11.262736 20.857489
```
But it’s better to avoid map in that case.
```pycon
>>> (df ** 2).execute()
0 1
0 1.000000 4.494400
1 11.262736 20.857489
```
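
To make the roles of `na_action` and `**kwargs` concrete, here is a plain-Python model of elementwise mapping over a list of rows. It is a sketch of the semantics described above, not MaxFrame's implementation: `map_elements` is a hypothetical helper and `None` stands in for NaN.

```python
import math


def map_elements(rows, func, na_action=None, **kwargs):
    """Apply func to every element; with na_action='ignore', pass missing values through."""

    def is_na(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [
        [v if na_action == "ignore" and is_na(v) else func(v, **kwargs) for v in row]
        for row in rows
    ]


rows = [[1.0, 2.12], [3.356, 4.567]]

lengths = map_elements(rows, lambda x: len(str(x)))
rounded = map_elements(rows, round, ndigits=1)   # ndigits is forwarded to round
with_na = map_elements([[None, 2.12]], lambda x: len(str(x)), na_action="ignore")

print(lengths)   # [[3, 4], [5, 5]]
print(rounded)   # [[1.0, 2.1], [3.4, 4.6]]
print(with_na)   # [[None, 4]]
```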
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mask.md
# maxframe.dataframe.DataFrame.mask
#### DataFrame.mask(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
Replace values where the condition is True.
* **Parameters:**
* **cond** (*bool Series/DataFrame* *,* *array-like* *, or* *callable*) – Where cond is False, keep the original value. Where
True, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
* **other** (*scalar* *,* *Series/DataFrame* *, or* *callable*) – Entries where cond is True are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to perform the operation in place on the data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Alignment axis if needed.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Alignment level if needed.
* **Return type:**
Same type as caller
#### SEE ALSO
[`DataFrame.where()`](maxframe.dataframe.DataFrame.where.md#maxframe.dataframe.DataFrame.where)
: Return an object of same shape as self.
### Notes
The mask method is an application of the if-then idiom. For each
element in the calling DataFrame, if `cond` is `False` the
element is used; otherwise the corresponding element from the DataFrame
`other` is used.
The signature for [`DataFrame.where()`](maxframe.dataframe.DataFrame.where.md#maxframe.dataframe.DataFrame.where) differs from
[`numpy.where()`](https://numpy.org/doc/stable/reference/generated/numpy.where.html#numpy.where). Roughly `df1.where(m, df2)` is equivalent to
`np.where(m, df1, df2)`.
For further details and examples see the `mask` documentation in
[indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-where-mask).
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(range(5))
>>> s.where(s > 0).execute()
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
```
```pycon
>>> s.mask(s > 0).execute()
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
```
```pycon
>>> s.where(s > 1, 10).execute()
0 10
1 10
2 2
3 3
4 4
dtype: int64
```
```pycon
>>> df = md.DataFrame(mt.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df.execute()
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
>>> m = df % 3 == 0
>>> df.where(m, -df).execute()
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> (df.where(m, -df) == mt.where(m, df, -df)).execute()
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> (df.where(m, -df) == df.mask(~m, -df)).execute()
A B
0 True True
1 True True
2 True True
3 True True
4 True True
```
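
The relationship between `where` and `mask` shown in the last comparison can be stated compactly: `mask(cond)` is `where` with the condition negated. The plain-Python sketch below models both on flat lists. The two helpers are hypothetical, and `None` stands in for the default `nan` replacement.

```python
def where(values, cond, other=None):
    """Keep values where cond is True; elsewhere use other (a scalar or a same-length list)."""
    others = other if isinstance(other, list) else [other] * len(values)
    return [v if c else o for v, c, o in zip(values, cond, others)]


def mask(values, cond, other=None):
    """Replace values where cond is True; equivalent to where with cond negated."""
    return where(values, [not c for c in cond], other)


values = list(range(5))
positive = [v > 0 for v in values]

print(where(values, positive))   # [None, 1, 2, 3, 4]
print(mask(values, positive))    # [0, None, None, None, None]

negated = [-v for v in values]
m = [v % 3 == 0 for v in values]
print(where(values, m, negated))   # [0, -1, -2, 3, -4]
print(where(values, m, negated) == mask(values, [not c for c in m], negated))   # True
```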
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.max.md
# maxframe.dataframe.DataFrame.max
#### DataFrame.max(axis=None, skipna=True, level=None, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.md
# maxframe.dataframe.DataFrame
### *class* maxframe.dataframe.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False, chunk_size=None, gpu=None, sparse=None, num_partitions=None)
#### \_\_init_\_(data=None, index=None, columns=None, dtype=None, copy=False, chunk_size=None, gpu=None, sparse=None, num_partitions=None)
### Methods
| [`__init__`](#maxframe.dataframe.DataFrame.__init__)([data, index, columns, dtype, ...]) | |
|-------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| [`abs`](maxframe.dataframe.DataFrame.abs.md#maxframe.dataframe.DataFrame.abs)() | |
| [`add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator add). |
| [`add_prefix`](maxframe.dataframe.DataFrame.add_prefix.md#maxframe.dataframe.DataFrame.add_prefix)(prefix) | Prefix labels with string prefix. |
| [`add_suffix`](maxframe.dataframe.DataFrame.add_suffix.md#maxframe.dataframe.DataFrame.add_suffix)(suffix) | Suffix labels with string suffix. |
| [`agg`](maxframe.dataframe.DataFrame.agg.md#maxframe.dataframe.DataFrame.agg)([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`aggregate`](maxframe.dataframe.DataFrame.aggregate.md#maxframe.dataframe.DataFrame.aggregate)([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`align`](maxframe.dataframe.DataFrame.align.md#maxframe.dataframe.DataFrame.align)(other[, join, axis, level, copy, ...]) | Align two objects on their axes with the specified join method. |
| [`all`](maxframe.dataframe.DataFrame.all.md#maxframe.dataframe.DataFrame.all)([axis, bool_only, skipna, level, method]) | |
| [`any`](maxframe.dataframe.DataFrame.any.md#maxframe.dataframe.DataFrame.any)([axis, bool_only, skipna, level, method]) | |
| [`append`](maxframe.dataframe.DataFrame.append.md#maxframe.dataframe.DataFrame.append)(other[, ignore_index, ...]) | Append rows of other to the end of caller, returning a new object. |
| [`apply`](maxframe.dataframe.DataFrame.apply.md#maxframe.dataframe.DataFrame.apply)(func[, axis, raw, result_type, args, ...]) | Apply a function along an axis of the DataFrame. |
| [`applymap`](maxframe.dataframe.DataFrame.applymap.md#maxframe.dataframe.DataFrame.applymap)(func[, na_action, dtypes, dtype, ...]) | Apply a function to a Dataframe elementwise. |
| `around`([decimals]) | Round a DataFrame to a variable number of decimal places. |
| [`assign`](maxframe.dataframe.DataFrame.assign.md#maxframe.dataframe.DataFrame.assign)(\*\*kwargs) | Assign new columns to a DataFrame. |
| [`astype`](maxframe.dataframe.DataFrame.astype.md#maxframe.dataframe.DataFrame.astype)(dtype[, copy, errors]) | Cast a pandas object to a specified dtype `dtype`. |
| [`at_time`](maxframe.dataframe.DataFrame.at_time.md#maxframe.dataframe.DataFrame.at_time)(time[, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| `backfill`([axis, inplace, limit, downcast]) | Synonym for [`DataFrame.fillna()`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna) with `method='bfill'`. |
| [`between_time`](maxframe.dataframe.DataFrame.between_time.md#maxframe.dataframe.DataFrame.between_time)(start_time, end_time[, ...]) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
| `bfill`([axis, inplace, limit, downcast]) | Synonym for [`DataFrame.fillna()`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna) with `method='bfill'`. |
| [`clip`](maxframe.dataframe.DataFrame.clip.md#maxframe.dataframe.DataFrame.clip)([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| [`combine`](maxframe.dataframe.DataFrame.combine.md#maxframe.dataframe.DataFrame.combine)(other, func[, fill_value, overwrite]) | Perform column-wise combine with another DataFrame. |
| [`combine_first`](maxframe.dataframe.DataFrame.combine_first.md#maxframe.dataframe.DataFrame.combine_first)(other) | Update null elements with value in the same location in other. |
| [`compare`](maxframe.dataframe.DataFrame.compare.md#maxframe.dataframe.DataFrame.compare)(other[, align_axis, keep_shape, ...]) | Compare to another DataFrame and show the differences. |
| [`convert_dtypes`](maxframe.dataframe.DataFrame.convert_dtypes.md#maxframe.dataframe.DataFrame.convert_dtypes)([infer_objects, ...]) | Convert columns to best possible dtypes using dtypes supporting `pd.NA`. |
| [`copy`](maxframe.dataframe.DataFrame.copy.md#maxframe.dataframe.DataFrame.copy)() | |
| `copy_from`(obj) | |
| `copy_to`(target) | |
| [`corr`](maxframe.dataframe.DataFrame.corr.md#maxframe.dataframe.DataFrame.corr)([method, min_periods]) | Compute pairwise correlation of columns, excluding NA/null values. |
| [`corrwith`](maxframe.dataframe.DataFrame.corrwith.md#maxframe.dataframe.DataFrame.corrwith)(other[, axis, drop, method]) | Compute pairwise correlation. |
| [`count`](maxframe.dataframe.DataFrame.count.md#maxframe.dataframe.DataFrame.count)([axis, level, numeric_only]) | |
| [`cov`](maxframe.dataframe.DataFrame.cov.md#maxframe.dataframe.DataFrame.cov)([min_periods, ddof, numeric_only]) | Compute pairwise covariance of columns, excluding NA/null values. |
| `cummax`([axis, skipna]) | |
| `cummin`([axis, skipna]) | |
| `cumprod`([axis, skipna]) | |
| `cumsum`([axis, skipna]) | |
| [`describe`](maxframe.dataframe.DataFrame.describe.md#maxframe.dataframe.DataFrame.describe)([percentiles, include, exclude]) | Generate descriptive statistics. |
| [`diff`](maxframe.dataframe.DataFrame.diff.md#maxframe.dataframe.DataFrame.diff)([periods, axis]) | First discrete difference of element. |
| [`div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| [`dot`](maxframe.dataframe.DataFrame.dot.md#maxframe.dataframe.DataFrame.dot)(other) | Compute the matrix multiplication between the DataFrame and other. |
| [`drop`](maxframe.dataframe.DataFrame.drop.md#maxframe.dataframe.DataFrame.drop)([labels, axis, index, columns, level, ...]) | Drop specified labels from rows or columns. |
| [`drop_duplicates`](maxframe.dataframe.DataFrame.drop_duplicates.md#maxframe.dataframe.DataFrame.drop_duplicates)([subset, keep, inplace, ...]) | Return DataFrame with duplicate rows removed. |
| [`droplevel`](maxframe.dataframe.DataFrame.droplevel.md#maxframe.dataframe.DataFrame.droplevel)(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
| [`dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)([axis, how, thresh, subset, inplace, ...]) | Remove missing values. |
| [`duplicated`](maxframe.dataframe.DataFrame.duplicated.md#maxframe.dataframe.DataFrame.duplicated)([subset, keep, method]) | Return boolean Series denoting duplicate rows. |
| [`eq`](maxframe.dataframe.DataFrame.eq.md#maxframe.dataframe.DataFrame.eq)(other[, axis, level, fill_value]) | Get Equal to of dataframe and other, element-wise (binary operator eq). |
| [`eval`](maxframe.dataframe.DataFrame.eval.md#maxframe.dataframe.DataFrame.eval)(expr[, inplace]) | Evaluate a string describing operations on DataFrame columns. |
| [`ewm`](maxframe.dataframe.DataFrame.ewm.md#maxframe.dataframe.DataFrame.ewm)([com, span, halflife, alpha, ...]) | Provide exponential weighted functions. |
| `execute`([session]) | |
| [`expanding`](maxframe.dataframe.DataFrame.expanding.md#maxframe.dataframe.DataFrame.expanding)([min_periods, shift, reverse_range]) | Provide expanding transformations. |
| `explode`(column[, ignore_index, ...]) | Transform each element of a list-like to a row, replicating index values. |
| `ffill`([axis, inplace, limit, downcast]) | Synonym for [`DataFrame.fillna()`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna) with `method='ffill'`. |
| [`fillna`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna)([value, method, axis, inplace, ...]) | Fill NA/NaN values using the specified method. |
| [`filter`](maxframe.dataframe.DataFrame.filter.md#maxframe.dataframe.DataFrame.filter)([items, like, regex, axis]) | Subset the dataframe rows or columns according to the specified index labels. |
| [`first_valid_index`](maxframe.dataframe.DataFrame.first_valid_index.md#maxframe.dataframe.DataFrame.first_valid_index)() | Return index for first non-NA value or None, if no non-NA value is found. |
| [`floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)(other[, axis, level, fill_value]) | Get Integer division of dataframe and other, element-wise (binary operator floordiv). |
| [`from_dict`](maxframe.dataframe.DataFrame.from_dict.md#maxframe.dataframe.DataFrame.from_dict)(data[, orient, dtype, columns]) | Construct DataFrame from dict of array-like or dicts. |
| [`from_records`](maxframe.dataframe.DataFrame.from_records.md#maxframe.dataframe.DataFrame.from_records)(data[, index, exclude, ...]) | Convert structured or record ndarray to DataFrame. |
| `from_tensor`(tensor[, index, columns, gpu, ...]) | |
| [`ge`](maxframe.dataframe.DataFrame.ge.md#maxframe.dataframe.DataFrame.ge)(other[, axis, level, fill_value]) | Get Greater than or equal to of dataframe and other, element-wise (binary operator ge). |
| [`groupby`](maxframe.dataframe.DataFrame.groupby.md#maxframe.dataframe.DataFrame.groupby)([by, level, as_index, sort, group_keys]) | Group DataFrame using a mapper or by a Series of columns. |
| [`gt`](maxframe.dataframe.DataFrame.gt.md#maxframe.dataframe.DataFrame.gt)(other[, axis, level, fill_value]) | Get Greater than of dataframe and other, element-wise (binary operator gt). |
| [`head`](maxframe.dataframe.DataFrame.head.md#maxframe.dataframe.DataFrame.head)([n]) | Return the first n rows. |
| [`idxmax`](maxframe.dataframe.DataFrame.idxmax.md#maxframe.dataframe.DataFrame.idxmax)([axis, skipna]) | Return index of first occurrence of maximum over requested axis. |
| [`idxmin`](maxframe.dataframe.DataFrame.idxmin.md#maxframe.dataframe.DataFrame.idxmin)([axis, skipna]) | Return index of first occurrence of minimum over requested axis. |
| [`infer_objects`](maxframe.dataframe.DataFrame.infer_objects.md#maxframe.dataframe.DataFrame.infer_objects)([copy]) | Attempt to infer better dtypes for object columns. |
| [`insert`](maxframe.dataframe.DataFrame.insert.md#maxframe.dataframe.DataFrame.insert)(loc, column, value[, allow_duplicates]) | Insert column into DataFrame at specified location. |
| `isin`(values) | Whether each element in the DataFrame is contained in values. |
| [`isna`](maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)() | Detect missing values. |
| [`isnull`](maxframe.dataframe.DataFrame.isnull.md#maxframe.dataframe.DataFrame.isnull)() | Detect missing values. |
| `iterrows`([batch_size, session]) | Iterate over DataFrame rows as (index, Series) pairs. |
| `itertuples`([index, name, batch_size, session]) | Iterate over DataFrame rows as namedtuples. |
| [`join`](maxframe.dataframe.DataFrame.join.md#maxframe.dataframe.DataFrame.join)(other[, on, how, lsuffix, rsuffix, ...]) | Join columns of another DataFrame. |
| `keys`() | Get the 'info axis' (see Indexing for more). |
| `kurt`([axis, skipna, level, numeric_only, ...]) | |
| `kurtosis`([axis, skipna, level, ...]) | |
| [`last_valid_index`](maxframe.dataframe.DataFrame.last_valid_index.md#maxframe.dataframe.DataFrame.last_valid_index)() | Return index for last non-NA value or None, if no non-NA value is found. |
| [`le`](maxframe.dataframe.DataFrame.le.md#maxframe.dataframe.DataFrame.le)(other[, axis, level, fill_value]) | Get Less than or equal to of dataframe and other, element-wise (binary operator le). |
| [`lt`](maxframe.dataframe.DataFrame.lt.md#maxframe.dataframe.DataFrame.lt)(other[, axis, level, fill_value]) | Get Less than of dataframe and other, element-wise (binary operator lt). |
| [`map`](maxframe.dataframe.DataFrame.map.md#maxframe.dataframe.DataFrame.map)(func[, na_action, dtypes, dtype, skip_infer]) | Apply a function to a Dataframe elementwise. |
| [`mask`](maxframe.dataframe.DataFrame.mask.md#maxframe.dataframe.DataFrame.mask)(cond[, other, inplace, axis, level, ...]) | Replace values where the condition is True. |
| [`max`](maxframe.dataframe.DataFrame.max.md#maxframe.dataframe.DataFrame.max)([axis, skipna, level, numeric_only, method]) | |
| [`mean`](maxframe.dataframe.DataFrame.mean.md#maxframe.dataframe.DataFrame.mean)([axis, skipna, level, numeric_only, method]) | |
| [`median`](maxframe.dataframe.DataFrame.median.md#maxframe.dataframe.DataFrame.median)([axis, skipna, level, numeric_only, ...]) | |
| [`melt`](maxframe.dataframe.DataFrame.melt.md#maxframe.dataframe.DataFrame.melt)([id_vars, value_vars, var_name, ...]) | Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. |
| [`memory_usage`](maxframe.dataframe.DataFrame.memory_usage.md#maxframe.dataframe.DataFrame.memory_usage)([index, deep]) | Return the memory usage of each column in bytes. |
| [`merge`](maxframe.dataframe.DataFrame.merge.md#maxframe.dataframe.DataFrame.merge)(right[, how, on, left_on, right_on, ...]) | Merge DataFrame or named Series objects with a database-style join. |
| [`min`](maxframe.dataframe.DataFrame.min.md#maxframe.dataframe.DataFrame.min)([axis, skipna, level, numeric_only, method]) | |
| [`mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator mod). |
| [`mode`](maxframe.dataframe.DataFrame.mode.md#maxframe.dataframe.DataFrame.mode)([axis, numeric_only, dropna, combine_size]) | Get the mode(s) of each element along the selected axis. |
| [`mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator mul). |
| `multiply`(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator mul). |
| [`ne`](maxframe.dataframe.DataFrame.ne.md#maxframe.dataframe.DataFrame.ne)(other[, axis, level, fill_value]) | Get Not equal to of dataframe and other, element-wise (binary operator ne). |
| [`nlargest`](maxframe.dataframe.DataFrame.nlargest.md#maxframe.dataframe.DataFrame.nlargest)(n, columns[, keep]) | Return the first n rows ordered by columns in descending order. |
| [`notna`](maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)() | Detect existing (non-missing) values. |
| [`notnull`](maxframe.dataframe.DataFrame.notnull.md#maxframe.dataframe.DataFrame.notnull)() | Detect existing (non-missing) values. |
| [`nsmallest`](maxframe.dataframe.DataFrame.nsmallest.md#maxframe.dataframe.DataFrame.nsmallest)(n, columns[, keep]) | Return the first n rows ordered by columns in ascending order. |
| [`nunique`](maxframe.dataframe.DataFrame.nunique.md#maxframe.dataframe.DataFrame.nunique)([axis, dropna]) | Count distinct observations over requested axis. |
| `pad`([axis, inplace, limit, downcast]) | Synonym for [`DataFrame.fillna()`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna) with `method='ffill'`. |
| [`pct_change`](maxframe.dataframe.DataFrame.pct_change.md#maxframe.dataframe.DataFrame.pct_change)([periods, fill_method, limit, freq]) | Percentage change between the current and a prior element. |
| [`pivot`](maxframe.dataframe.DataFrame.pivot.md#maxframe.dataframe.DataFrame.pivot)(columns[, index, values]) | Return reshaped DataFrame organized by given index / column values. |
| [`pivot_table`](maxframe.dataframe.DataFrame.pivot_table.md#maxframe.dataframe.DataFrame.pivot_table)([values, index, columns, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
| [`pop`](maxframe.dataframe.DataFrame.pop.md#maxframe.dataframe.DataFrame.pop)(item) | Return item and drop from frame. |
| [`pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator pow). |
| [`prod`](maxframe.dataframe.DataFrame.prod.md#maxframe.dataframe.DataFrame.prod)([axis, skipna, level, min_count, ...]) | |
| [`product`](maxframe.dataframe.DataFrame.product.md#maxframe.dataframe.DataFrame.product)([axis, skipna, level, min_count, ...]) | |
| [`quantile`](maxframe.dataframe.DataFrame.quantile.md#maxframe.dataframe.DataFrame.quantile)([q, axis, numeric_only, interpolation]) | Return values at the given quantile over requested axis. |
| [`query`](maxframe.dataframe.DataFrame.query.md#maxframe.dataframe.DataFrame.query)(expr[, inplace]) | Query the columns of a DataFrame with a boolean expression. |
| [`radd`](maxframe.dataframe.DataFrame.radd.md#maxframe.dataframe.DataFrame.radd)(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator radd). |
| [`rank`](maxframe.dataframe.DataFrame.rank.md#maxframe.dataframe.DataFrame.rank)([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| [`rdiv`](maxframe.dataframe.DataFrame.rdiv.md#maxframe.dataframe.DataFrame.rdiv)(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| `rechunk`(chunk_size[, reassign_worker]) | |
| [`reindex`](maxframe.dataframe.DataFrame.reindex.md#maxframe.dataframe.DataFrame.reindex)([labels, index, columns, axis, ...]) | Conform Series/DataFrame to new index with optional filling logic. |
| [`reindex_like`](maxframe.dataframe.DataFrame.reindex_like.md#maxframe.dataframe.DataFrame.reindex_like)(other[, method, copy, limit, ...]) | Return an object with matching indices as other object. |
| [`rename`](maxframe.dataframe.DataFrame.rename.md#maxframe.dataframe.DataFrame.rename)([mapper, index, columns, axis, copy, ...]) | Alter axes labels. |
| [`rename_axis`](maxframe.dataframe.DataFrame.rename_axis.md#maxframe.dataframe.DataFrame.rename_axis)([mapper, index, columns, axis, ...]) | Set the name of the axis for the index or columns. |
| [`reorder_levels`](maxframe.dataframe.DataFrame.reorder_levels.md#maxframe.dataframe.DataFrame.reorder_levels)(order[, axis]) | Rearrange index levels using input order. |
| `replace`([to_replace, value, inplace, limit, ...]) | Replace values given in to_replace with value. |
| [`reset_index`](maxframe.dataframe.DataFrame.reset_index.md#maxframe.dataframe.DataFrame.reset_index)([level, drop, inplace, ...]) | Reset the index, or a level of it. |
| [`rfloordiv`](maxframe.dataframe.DataFrame.rfloordiv.md#maxframe.dataframe.DataFrame.rfloordiv)(other[, axis, level, fill_value]) | Get Integer division of dataframe and other, element-wise (binary operator rfloordiv). |
| [`rmod`](maxframe.dataframe.DataFrame.rmod.md#maxframe.dataframe.DataFrame.rmod)(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator rmod). |
| [`rmul`](maxframe.dataframe.DataFrame.rmul.md#maxframe.dataframe.DataFrame.rmul)(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator rmul). |
| [`rolling`](maxframe.dataframe.DataFrame.rolling.md#maxframe.dataframe.DataFrame.rolling)(window[, min_periods, center, ...]) | Provide rolling window calculations. |
| [`round`](maxframe.dataframe.DataFrame.round.md#maxframe.dataframe.DataFrame.round)([decimals]) | Round a DataFrame to a variable number of decimal places. |
| [`rpow`](maxframe.dataframe.DataFrame.rpow.md#maxframe.dataframe.DataFrame.rpow)(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator rpow). |
| [`rsub`](maxframe.dataframe.DataFrame.rsub.md#maxframe.dataframe.DataFrame.rsub)(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator rsubtract). |
| [`rtruediv`](maxframe.dataframe.DataFrame.rtruediv.md#maxframe.dataframe.DataFrame.rtruediv)(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| [`sample`](maxframe.dataframe.DataFrame.sample.md#maxframe.dataframe.DataFrame.sample)([n, frac, replace, weights, ...]) | Return a random sample of items from an axis of object. |
| [`select_dtypes`](maxframe.dataframe.DataFrame.select_dtypes.md#maxframe.dataframe.DataFrame.select_dtypes)([include, exclude]) | Return a subset of the DataFrame's columns based on the column dtypes. |
| [`sem`](maxframe.dataframe.DataFrame.sem.md#maxframe.dataframe.DataFrame.sem)([axis, skipna, level, ddof, ...]) | |
| [`set_axis`](maxframe.dataframe.DataFrame.set_axis.md#maxframe.dataframe.DataFrame.set_axis)(labels[, axis, inplace]) | Assign desired index to given axis. |
| [`set_index`](maxframe.dataframe.DataFrame.set_index.md#maxframe.dataframe.DataFrame.set_index)(keys[, drop, append, inplace, ...]) | Set the DataFrame index using existing columns. |
| [`shift`](maxframe.dataframe.DataFrame.shift.md#maxframe.dataframe.DataFrame.shift)([periods, freq, axis, fill_value]) | Shift index by desired number of periods with an optional time freq. |
| `skew`([axis, skipna, level, numeric_only, ...]) | |
| [`sort_index`](maxframe.dataframe.DataFrame.sort_index.md#maxframe.dataframe.DataFrame.sort_index)([axis, level, ascending, ...]) | Sort object by labels (along an axis). |
| [`sort_values`](maxframe.dataframe.DataFrame.sort_values.md#maxframe.dataframe.DataFrame.sort_values)(by[, axis, ascending, inplace, ...]) | Sort by the values along either axis. |
| [`stack`](maxframe.dataframe.DataFrame.stack.md#maxframe.dataframe.DataFrame.stack)([level, dropna]) | Stack the prescribed level(s) from columns to index. |
| [`std`](maxframe.dataframe.DataFrame.std.md#maxframe.dataframe.DataFrame.std)([axis, skipna, level, ddof, ...]) | |
| [`sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator subtract). |
| [`sum`](maxframe.dataframe.DataFrame.sum.md#maxframe.dataframe.DataFrame.sum)([axis, skipna, level, min_count, ...]) | |
| [`swaplevel`](maxframe.dataframe.DataFrame.swaplevel.md#maxframe.dataframe.DataFrame.swaplevel)([i, j, axis]) | Swap levels i and j in a `MultiIndex`. |
| [`tail`](maxframe.dataframe.DataFrame.tail.md#maxframe.dataframe.DataFrame.tail)([n]) | Return the last n rows. |
| [`take`](maxframe.dataframe.DataFrame.take.md#maxframe.dataframe.DataFrame.take)(indices[, axis]) | Return the elements in the given *positional* indices along an axis. |
| [`to_clipboard`](maxframe.dataframe.DataFrame.to_clipboard.md#maxframe.dataframe.DataFrame.to_clipboard)(\*[, excel, sep, batch_size, ...]) | Copy object to the system clipboard. |
| [`to_csv`](maxframe.dataframe.DataFrame.to_csv.md#maxframe.dataframe.DataFrame.to_csv)(path[, sep, na_rep, float_format, ...]) | Write object to a comma-separated values (csv) file. |
| [`to_dict`](maxframe.dataframe.DataFrame.to_dict.md#maxframe.dataframe.DataFrame.to_dict)([orient, into, index, batch_size, ...]) | Convert the DataFrame to a dictionary. |
| [`to_json`](maxframe.dataframe.DataFrame.to_json.md#maxframe.dataframe.DataFrame.to_json)([path, orient, date_format, ...]) | Convert the object to a JSON string. |
| [`to_odps_table`](maxframe.dataframe.DataFrame.to_odps_table.md#maxframe.dataframe.DataFrame.to_odps_table)(table[, partition, ...]) | Write DataFrame object into a MaxCompute (ODPS) table. |
| [`to_pandas`](maxframe.dataframe.DataFrame.to_pandas.md#maxframe.dataframe.DataFrame.to_pandas)([session]) | |
| [`to_parquet`](maxframe.dataframe.DataFrame.to_parquet.md#maxframe.dataframe.DataFrame.to_parquet)(path[, engine, compression, ...]) | Write a DataFrame to the binary parquet format, each chunk will be written to a Parquet file. |
| `to_tensor`() | |
| [`transform`](maxframe.dataframe.DataFrame.transform.md#maxframe.dataframe.DataFrame.transform)(func[, axis, dtypes, skip_infer]) | Call `func` on self producing a DataFrame with transformed values. |
| `transpose`() | Transpose index and columns. |
| [`truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| [`truncate`](maxframe.dataframe.DataFrame.truncate.md#maxframe.dataframe.DataFrame.truncate)([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
| [`tshift`](maxframe.dataframe.DataFrame.tshift.md#maxframe.dataframe.DataFrame.tshift)([periods, freq, axis]) | Shift the time index, using the index's frequency if available. |
| [`unstack`](maxframe.dataframe.DataFrame.unstack.md#maxframe.dataframe.DataFrame.unstack)([level, fill_value]) | Unstack, also known as pivot, Series with MultiIndex to produce DataFrame. |
| [`update`](maxframe.dataframe.DataFrame.update.md#maxframe.dataframe.DataFrame.update)(other[, join, overwrite, ...]) | Modify in place using non-NA values from another DataFrame. |
| [`value_counts`](maxframe.dataframe.DataFrame.value_counts.md#maxframe.dataframe.DataFrame.value_counts)([subset, normalize, sort, ...]) | |
| [`var`](maxframe.dataframe.DataFrame.var.md#maxframe.dataframe.DataFrame.var)([axis, skipna, level, ddof, ...]) | |
| [`where`](maxframe.dataframe.DataFrame.where.md#maxframe.dataframe.DataFrame.where)(cond[, other, inplace, axis, level, ...]) | Replace values where the condition is False. |
| [`xs`](maxframe.dataframe.DataFrame.xs.md#maxframe.dataframe.DataFrame.xs)(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
### Attributes
| Attribute | Description |
|-------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| `T` | |
| [`at`](maxframe.dataframe.DataFrame.at.md#maxframe.dataframe.DataFrame.at) | Access a single value for a row/column label pair. |
| [`columns`](maxframe.dataframe.DataFrame.columns.md#maxframe.dataframe.DataFrame.columns) | |
| `data` | |
| [`dtypes`](maxframe.dataframe.DataFrame.dtypes.md#maxframe.dataframe.DataFrame.dtypes) | Return the dtypes in the DataFrame. |
| [`iat`](maxframe.dataframe.DataFrame.iat.md#maxframe.dataframe.DataFrame.iat) | Access a single value for a row/column pair by integer position. |
| [`iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc) | Purely integer-location based indexing for selection by position. |
| [`index`](maxframe.dataframe.DataFrame.index.md#maxframe.dataframe.DataFrame.index) | |
| [`loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc) | Access a group of rows and columns by label(s) or a boolean array. |
| [`ndim`](maxframe.dataframe.DataFrame.ndim.md#maxframe.dataframe.DataFrame.ndim) | Return an int representing the number of axes / array dimensions. |
| [`shape`](maxframe.dataframe.DataFrame.shape.md#maxframe.dataframe.DataFrame.shape) | |
| `size` | |
| `type_name` | |
| `values` | |
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mean.md
# maxframe.dataframe.DataFrame.mean
#### DataFrame.mean(axis=None, skipna=True, level=None, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.median.md
# maxframe.dataframe.DataFrame.median
#### DataFrame.median(axis=0, skipna=True, level=None, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.melt.md
# maxframe.dataframe.DataFrame.melt
#### DataFrame.melt(id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=False, default_index_type=None)
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (id_vars), while all other
columns, considered measured variables (value_vars), are “unpivoted” to
the row axis, leaving just two non-identifier columns, ‘variable’ and
‘value’.
* **Parameters:**
* **id_vars** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *, or* *ndarray* *,* *optional*) – Column(s) to use as identifier variables.
* **value_vars** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *, or* *ndarray* *,* *optional*) – Column(s) to unpivot. If not specified, uses all columns that
are not set as id_vars.
* **var_name** (*scalar*) – Name to use for the ‘variable’ column. If None it uses
`frame.columns.name` or ‘variable’.
* **value_name** (*scalar* *,* *default 'value'*) – Name to use for the ‘value’ column.
* **col_level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – If columns are a MultiIndex then use this level to melt.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, original index is ignored. If False, the original index
is retained. Index labels will be repeated as necessary.
* **Returns:**
Unpivoted DataFrame.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`melt`](#maxframe.dataframe.DataFrame.melt), [`pivot_table`](maxframe.dataframe.DataFrame.pivot_table.md#maxframe.dataframe.DataFrame.pivot_table), [`DataFrame.pivot`](maxframe.dataframe.DataFrame.pivot.md#maxframe.dataframe.DataFrame.pivot), [`Series.explode`](maxframe.dataframe.Series.explode.md#maxframe.dataframe.Series.explode)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df.execute()
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6
```
```pycon
>>> df.melt(id_vars=['A'], value_vars=['B']).execute()
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
```
```pycon
>>> df.melt(id_vars=['A'], value_vars=['B', 'C']).execute()
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6
```
The names of ‘variable’ and ‘value’ columns can be customized:
```pycon
>>> df.melt(id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname').execute()
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5
```
If you have multi-index columns:
```pycon
>>> df = md.DataFrame({('A', 'D'): {0: 'a', 1: 'b', 2: 'c'},
...                    ('B', 'E'): {0: 1, 1: 3, 2: 5},
...                    ('C', 'F'): {0: 2, 1: 4, 2: 6}})
>>> df.execute()
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6
```
```pycon
>>> df.melt(col_level=0, id_vars=['A'], value_vars=['B']).execute()
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
```
```pycon
>>> df.melt(id_vars=[('A', 'D')], value_vars=[('B', 'E')]).execute()
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5
```
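To see the reshaping rule without a MaxCompute session, the plain-Python sketch below reproduces the unpivot step on a list of row dictionaries. It is an illustration only, not MaxFrame's implementation: the helper name `melt_rows` and the row-dictionary representation are assumptions made for this example, and index handling is not modeled.

```python
# Illustrative sketch only: a plain-Python model of the unpivot rule,
# not MaxFrame's implementation. `melt_rows` is a hypothetical helper.

def melt_rows(rows, id_vars, value_vars, var_name="variable", value_name="value"):
    """Unpivot `value_vars` into (variable, value) pairs, repeating `id_vars`."""
    melted = []
    # Emit every row for the first value column, then for the next one,
    # which matches the row order shown in the examples above.
    for col in value_vars:
        for row in rows:
            out = {key: row[key] for key in id_vars}
            out[var_name] = col
            out[value_name] = row[col]
            melted.append(out)
    return melted


rows = [
    {"A": "a", "B": 1, "C": 2},
    {"A": "b", "B": 3, "C": 4},
    {"A": "c", "B": 5, "C": 6},
]

result = melt_rows(rows, id_vars=["A"], value_vars=["B", "C"])
for r in result:
    print(r)
```

Each input row contributes one output row per entry in `value_vars`, so three rows and two value columns yield six rows.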
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.memory_usage.md
# maxframe.dataframe.DataFrame.memory_usage
#### DataFrame.memory_usage(index=True, deep=False)
Return the memory usage of each column in bytes.
The memory usage can optionally include the contribution of
the index and elements of object dtype.
This value is displayed in DataFrame.info by default. This can be
suppressed by setting `pandas.options.display.memory_usage` to False.
* **Parameters:**
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Specifies whether to include the memory usage of the DataFrame’s
index in returned Series. If `index=True`, the memory usage of
the index is the first item in the output.
* **deep** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, introspect the data deeply by interrogating
object dtypes for system-level memory consumption, and include
it in the returned values.
* **Returns:**
A Series whose index is the original column names and whose values
are the memory usage of each column in bytes.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`numpy.ndarray.nbytes`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.nbytes.html#numpy.ndarray.nbytes)
: Total bytes consumed by the elements of an ndarray.
[`Series.memory_usage`](maxframe.dataframe.Series.memory_usage.md#maxframe.dataframe.Series.memory_usage)
: Bytes consumed by a Series.
`Categorical`
: Memory-efficient array for string values with many repeated values.
`DataFrame.info`
: Concise summary of a DataFrame.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
>>> data = dict([(t, mt.ones(shape=5000).astype(t))
...              for t in dtypes])
>>> df = md.DataFrame(data)
>>> df.head().execute()
   int64  float64            complex128  object  bool
0      1      1.0    1.000000+0.000000j       1  True
1      1      1.0    1.000000+0.000000j       1  True
2      1      1.0    1.000000+0.000000j       1  True
3      1      1.0    1.000000+0.000000j       1  True
4      1      1.0    1.000000+0.000000j       1  True
```
```pycon
>>> df.memory_usage().execute()
Index           128
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
```
```pycon
>>> df.memory_usage(index=False).execute()
int64         40000
float64       40000
complex128    80000
object        40000
bool           5000
dtype: int64
```
The memory footprint of object dtype columns is ignored by default:
```pycon
>>> df.memory_usage(deep=True).execute()
Index            128
int64          40000
float64        40000
complex128     80000
object        160000
bool            5000
dtype: int64
```
Use a Categorical for efficient storage of an object-dtype column with
many repeated values.
```pycon
>>> df['object'].astype('category').memory_usage(deep=True).execute()
5216
```
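The shallow byte counts in the examples above are the row count multiplied by the per-element size. The check below redoes that arithmetic in plain Python. The item sizes are assumptions of this sketch (the usual NumPy widths, plus an 8-byte pointer per object element on a 64-bit platform); they are not stated in the text above.

```python
# Illustrative arithmetic only. Item sizes are assumptions: standard NumPy
# widths, and an 8-byte pointer per object element on a 64-bit platform.
N_ROWS = 5000

ITEM_SIZE_BYTES = {
    "int64": 8,
    "float64": 8,
    "complex128": 16,
    "object": 8,  # pointer only; the referenced Python objects are not counted
    "bool": 1,
}

# Shallow usage per column: element size times number of rows.
shallow_usage = {name: size * N_ROWS for name, size in ITEM_SIZE_BYTES.items()}

for name, nbytes in shallow_usage.items():
    print(f"{name:<12}{nbytes:>8}")
```

With `deep=True` the object column additionally counts the objects its pointers refer to, which is why only that column changes (to 160000) in the example above.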
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.merge.md
# maxframe.dataframe.DataFrame.merge
#### DataFrame.merge(right: DataFrame | Series, how: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'inner', on: [str](https://docs.python.org/3/library/stdtypes.html#str) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] = None, left_on: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, right_on: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, left_index: [bool](https://docs.python.org/3/library/functions.html#bool) = False, right_index: [bool](https://docs.python.org/3/library/functions.html#bool) = False, sort: [bool](https://docs.python.org/3/library/functions.html#bool) = False, suffixes: [Tuple](https://docs.python.org/3/library/typing.html#typing.Tuple)[[str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None), [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None)] = ('_x', '_y'), copy: [bool](https://docs.python.org/3/library/functions.html#bool) = True, indicator: [bool](https://docs.python.org/3/library/functions.html#bool) = False, validate: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, method: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'auto', auto_merge: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'both', auto_merge_threshold: [int](https://docs.python.org/3/library/functions.html#int) = 8, bloom_filter: [bool](https://docs.python.org/3/library/functions.html#bool) | [str](https://docs.python.org/3/library/stdtypes.html#str) = 'auto', bloom_filter_options: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] = None, left_hint: JoinHint = None, right_hint: JoinHint = None) → DataFrame
Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column.
The join is done on columns or indexes. If joining columns on
columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes
on indexes or indexes on a column or columns, the index will be passed on.
When performing a cross merge, no column specifications to merge on are
allowed.
* **Parameters:**
* **right** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* *named Series*) – Object to merge with.
* **how** ( *{'left'* *,* *'right'* *,* *'outer'* *,* *'inner'}* *,* *default 'inner'*) –
Type of merge to be performed.
* left: use only keys from left frame, similar to a SQL left outer join;
preserve key order.
* right: use only keys from right frame, similar to a SQL right outer join;
preserve key order.
* outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically.
* inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys.
* **on** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list)) – Column or index level names to join on. These must be found in both
DataFrames. If on is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
* **left_on** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *, or* *array-like*) – Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
* **right_on** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *, or* *array-like*) – Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
* **left_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels.
* **right_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Use the index from the right DataFrame as the join key. Same caveats as
left_index.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword).
* **suffixes** (*list-like* *,* *default is* *(* *"_x"* *,* *"_y"* *)*) – A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
left and right respectively. Pass a value of None instead
of a string to indicate that the column name from left or
right should be left as-is, with no suffix. At least one of the
values must not be None.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If False, avoid copy if possible.
* **indicator** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default False*) – If True, adds a column to the output DataFrame called “_merge” with
information on the source of each row. The column can be given a different
name by providing a string argument. The column will have a Categorical
type with the value of “left_only” for observations whose merge key only
appears in the left DataFrame, “right_only” for observations
whose merge key only appears in the right DataFrame, and “both”
if the observation’s merge key is found in both DataFrames.
* **validate** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) –
If specified, checks if merge is of specified type.
* “one_to_one” or “1:1”: check if merge keys are unique in both
left and right datasets.
* “one_to_many” or “1:m”: check if merge keys are unique in left
dataset.
* “many_to_one” or “m:1”: check if merge keys are unique in right
dataset.
* “many_to_many” or “m:m”: allowed, but does not result in checks.
* **method** ( *{"auto"* *,* *"shuffle"* *,* *"broadcast"}* *,* *default auto*) – “broadcast” is recommended when one DataFrame is much smaller than the other;
otherwise “shuffle” is the better choice. By default, the method is chosen
according to the actual data size.
* **auto_merge** ( *{"both"* *,* *"none"* *,* *"before"* *,* *"after"}* *,* *default both*) –
Automatically merge small chunks before or after the merge.
* “both”: merge small chunks both before and after the merge
* “none”: do not merge small chunks
* “before”: only merge small chunks before the merge
* “after”: only merge small chunks after the merge
* **auto_merge_threshold** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 8*) – When how is “inner”, the merged result can be much smaller than the original DataFrames.
If the number of chunks is greater than this threshold,
small chunks are merged automatically.
* **bloom_filter** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default "auto"*) – Use a Bloom filter to optimize the merge.
* **bloom_filter_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) –
* “max_elements”: maximum number of elements in the Bloom filter;
the default is the maximum size of all input chunks.
* “error_rate”: error rate, default 0.1.
* “apply_chunk_size_threshold”: minimum chunk size of the input chunks for the Bloom filter to apply, default 10.
The Bloom filter is applied when the chunk sizes of left and right are greater than this threshold.
* “filter”: “large”, “small” or “both”, default “large”.
Decides whether to filter the large DataFrame, the small one, or both.
* **left_hint** (*JoinHint* *,* *default None*) – Join strategy to use for left frame. When data skew occurs, consider these strategies to avoid long-tail issues,
but use them cautiously to prevent OOM and unnecessary overhead.
* **right_hint** (*JoinHint* *,* *default None*) – Join strategy to use for right frame.
* **Returns:**
A DataFrame of the two merged objects.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
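The two explicit `method` values correspond to two general distributed join strategies. The following pure-Python sketch illustrates the idea only and is not MaxFrame's implementation; the helper names (`hash_join`, `broadcast_join`, `shuffle_join`) and the representation of chunks as lists of `(key, value)` tuples are assumptions made for this example.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows):
    """Inner-join two lists of (key, value) tuples on key."""
    index = defaultdict(list)
    for key, value in right_rows:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left_rows for rv in index.get(key, [])]


def broadcast_join(left_chunks, right_chunks):
    """Copy the whole (small) right side to every left chunk and join locally."""
    small = [row for chunk in right_chunks for row in chunk]
    return [hash_join(chunk, small) for chunk in left_chunks]


def shuffle_join(left_chunks, right_chunks, n_partitions=2):
    """Repartition both sides by key hash, then join matching partitions."""

    def partition(chunks):
        parts = [[] for _ in range(n_partitions)]
        for chunk in chunks:
            for row in chunk:
                parts[hash(row[0]) % n_partitions].append(row)
        return parts

    pairs = zip(partition(left_chunks), partition(right_chunks))
    return [hash_join(lpart, rpart) for lpart, rpart in pairs]


def flatten(chunks):
    """Collect the rows of all chunks into one sorted list."""
    return sorted(row for chunk in chunks for row in chunk)


left = [[("foo", 1), ("bar", 2)], [("baz", 3), ("foo", 5)]]
right = [[("foo", 5), ("bar", 6)], [("baz", 7), ("foo", 8)]]

broadcast_result = flatten(broadcast_join(left, right))
shuffle_result = flatten(shuffle_join(left, right))
```

Both strategies produce the same rows. Broadcasting avoids moving the large side at the cost of copying the small side everywhere, which is why it suits the case where one input is much smaller than the other.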
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df1 = md.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]})
>>> df2 = md.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]})
>>> df1.execute()
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
>>> df2.execute()
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
```
Merge df1 and df2 on the lkey and rkey columns. The value columns have
the default suffixes, \_x and \_y, appended.
```pycon
>>> df1.merge(df2, left_on='lkey', right_on='rkey').execute()
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
```
Merge DataFrames df1 and df2 with specified left and right suffixes
appended to any overlapping columns.
```pycon
>>> df1.merge(df2, left_on='lkey', right_on='rkey',
...           suffixes=('_left', '_right')).execute()
lkey value_left rkey value_right
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
```
Merge DataFrames df1 and df2, but raise an exception if the DataFrames have
any overlapping columns.
```pycon
>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False)).execute()
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
Index(['value'], dtype='object')
```
```pycon
>>> df1 = md.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = md.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1.execute()
a b
0 foo 1
1 bar 2
>>> df2.execute()
a c
0 foo 3
1 baz 4
```
```pycon
>>> df1.merge(df2, how='inner', on='a').execute()
a b c
0 foo 1 3
```
```pycon
>>> df1.merge(df2, how='left', on='a').execute()
a b c
0 foo 1 3.0
1 bar 2 NaN
```
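The `bloom_filter` option can be pictured as a cheap pre-filter applied before the join itself. The sketch below is a generic, minimal Bloom filter in pure Python and does not reflect MaxFrame's internals; the class `BloomFilter`, its parameters `n_bits` and `n_hashes`, and the sample data are all assumptions made for illustration. It mirrors the documented default of filtering the large side with keys taken from the small side.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive n_hashes positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


small_keys = ["foo", "baz"]
large_rows = [("foo", 1), ("bar", 2), ("baz", 3), ("qux", 4), ("foo", 5)]

bf = BloomFilter()
for key in small_keys:
    bf.add(key)

# Rows whose key is definitely absent from the small side are dropped early.
prefiltered = [row for row in large_rows if row[0] in bf]

# The exact join still runs afterwards and removes any false positives.
key_set = set(small_keys)
joined = [row for row in prefiltered if row[0] in key_set]
```

A higher error rate makes the filter smaller but lets more non-matching rows through to the join, so the setting trades memory against the amount of data shuffled.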
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mf.apply_chunk.md
# maxframe.dataframe.DataFrame.mf.apply_chunk
#### DataFrame.mf.apply_chunk(func: [str](https://docs.python.org/3/library/stdtypes.html#str) | [Callable](https://docs.python.org/3/library/typing.html#typing.Callable), batch_rows=None, dtypes=None, dtype=None, name=None, output_type=None, index=None, skip_infer=False, args=(), \*\*kwargs)
Apply a function that takes pandas DataFrame and outputs pandas DataFrame/Series.
The pandas DataFrame given to the function is a chunk of the input DataFrame, treated as a batch of rows.
The objects passed into this function are slices of the original DataFrame, containing at most batch_rows
number of rows and all columns. It is equivalent to merging multiple `df.apply` with `axis=1` inputs and then
passing them into the function for execution, thereby improving performance in specific scenarios. The function
output can be either a DataFrame or a Series. `apply_chunk` will ultimately merge the results into a new
DataFrame or Series.
Don’t expect to receive all rows of the DataFrame in the function, as it depends on the implementation
of MaxFrame and the internal running state of MaxCompute.
* **Parameters:**
* **func** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *Callable*) – Function to apply to the dataframe chunk.
* **batch_rows** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Expected number of rows in a batch, which is also the length of the DataFrame passed to the function. A batch may contain
fewer rows than this number when the remaining data is insufficient.
* **output_type** ( *{'dataframe'* *,* *'series'}* *,* *default None*) – Specify type of returned object. See Notes for more details.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames. See Notes for more details.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of returned Series. See Notes for more details.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of returned Series. See Notes for more details.
* **index** ([*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index) *,* *default None*) – Specify index of returned object. See Notes for more details.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when dtypes or output_type is not specified.
* **args** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple)) – Positional arguments to pass to `func` in addition to the
array/series.
* **\*\*kwargs** – Additional keyword arguments to pass to
`func`.
* **Returns:**
Result of applying `func` along the given chunk of the
DataFrame.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.apply`](maxframe.dataframe.DataFrame.apply.md#maxframe.dataframe.DataFrame.apply)
: For non-batching operations.
[`Series.mf.apply_chunk`](maxframe.dataframe.Series.mf.apply_chunk.md#maxframe.dataframe.Series.mf.apply_chunk)
: Apply function to Series chunk.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `func` onto a mock DataFrame, and the apply call may
fail. When this happens, you need to specify the type of apply call
(DataFrame or Series) in output_type.
* For DataFrame output, you need to specify a list or a pandas Series
as `dtypes` of output DataFrame. `index` of output can also be
specified.
* For Series output, you need to specify `dtype` and `name` of
output Series.
* For any input with data type `pandas.ArrowDtype(pyarrow.MapType)`, it will always
be converted to a Python dict. And for any output with this data type, it must be
returned as a Python dict as well.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
A B
0 4 9
1 4 9
2 4 9
```
Different values of batch_rows cause chunks of different sizes to be passed to the function.
For example, with `batch_rows=3` the function is called once 3 rows have been collected.
```pycon
>>> df.mf.apply_chunk(np.sum, batch_rows=3).execute()
A 12
B 27
dtype: int64
```
With `batch_rows=2`, the data is divided into at least two segments. If your function
alters the shape of the DataFrame, different batch sizes may produce different outputs.
```pycon
>>> df.mf.apply_chunk(np.sum, batch_rows=2).execute()
A 8
B 18
A 4
B 9
dtype: int64
```
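The difference between the two results above follows from how batches are formed. The pure-Python sketch below only models the batching idea and is not how MaxFrame executes `apply_chunk`; the helpers `apply_in_batches` and `column_sums` and the list-of-dicts row representation are assumptions for illustration, and the real batch boundaries also depend on how the data is chunked at run time.

```python
def apply_in_batches(rows, func, batch_rows):
    """Call func on consecutive slices of at most batch_rows rows
    and concatenate the outputs."""
    results = []
    for start in range(0, len(rows), batch_rows):
        results.extend(func(rows[start:start + batch_rows]))
    return results


def column_sums(batch):
    """Return one (column, sum) pair per column of the batch."""
    return [
        ("A", sum(row["A"] for row in batch)),
        ("B", sum(row["B"] for row in batch)),
    ]


rows = [{"A": 4, "B": 9}] * 3

# One batch of three rows: a single pair of sums.
three = apply_in_batches(rows, column_sums, batch_rows=3)

# Two batches (2 rows + 1 row): one pair of sums per batch.
two = apply_in_batches(rows, column_sums, batch_rows=2)
```

Because a reducing function such as a sum is applied per batch rather than over the whole DataFrame, its output depends on the batch size, whereas a row-wise function gives the same result for any `batch_rows`.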
If the function requires some parameters, you can specify them using args or kwargs.
```pycon
>>> def calc(df, x, y):
...     return df * x + y
>>> df.mf.apply_chunk(calc, args=(10,), y=20).execute()
A B
0 60 110
1 60 110
2 60 110
```
Batching benefits operations that consume a whole DataFrame, such as sklearn's predict.
You can use sklearn in MaxFrame to perform offline inference, and apply_chunk makes this process more
efficient. The `@with_python_requirements` decorator automatically packages and loads
dependencies.
When the function relies on third-party dependencies, MaxFrame may not be able to infer the return type correctly.
In that case, specify `output_type` together with `dtype` or `dtypes`.
```pycon
>>> import pandas as pd
>>> from maxframe.udf import with_python_requirements
>>> data = {
... 'A': np.random.rand(10),
... 'B': np.random.rand(10)
... }
>>> pd_df = pd.DataFrame(data)
>>> X = pd_df[['A']]
>>> y = pd_df['B']
```
```pycon
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LinearRegression
>>> model = LinearRegression()
>>> model.fit(X, y)
```
```pycon
>>> @with_python_requirements("scikit-learn")
... def predict(df):
...     predict_B = model.predict(df[["A"]])
...     return pd.Series(predict_B, index=df.A.index)
```
```pycon
>>> df.mf.apply_chunk(predict, batch_rows=3, output_type="series", dtype="float", name="predict_B").execute()
0 -0.765025
1 -0.765025
2 -0.765025
Name: predict_B, dtype: float64
```
Create a dataframe with a dict type.
```pycon
>>> import pyarrow as pa
>>> import pandas as pd
>>> from maxframe.lib.dtypes_extension import dict_
>>> col_a = pd.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> col_b = pd.Series(
... data=["A", "B", "C"],
... index=[1, 2, 3],
... )
>>> df = md.DataFrame({"A": col_a, "B": col_b})
>>> df.execute()
A B
1 [('k1', 1), ('k2', 2)] A
2 [('k1', 3)] B
3 <NA> C
```
Define a function that updates the map type with a new key-value pair in a batch.
```pycon
>>> def custom_set_item(df):
...     for name, value in df["A"].items():
...         if value is not None:
...             df["A"][name]["x"] = 100
...     return df
```
```pycon
>>> df.mf.apply_chunk(
...     custom_set_item,
...     output_type="dataframe",
...     dtypes=df.dtypes.copy(),
...     batch_rows=2,
...     skip_infer=True,
...     index=df.index,
... ).execute()
A B
1 [('k1', 1), ('k2', 2), ('x', 100)] A
2 [('k1', 3), ('x', 100)] B
3 <NA> C
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mf.collect_kv.md
# maxframe.dataframe.DataFrame.mf.collect_kv
#### DataFrame.mf.collect_kv(columns=None, kv_delim='=', item_delim=',', kv_col='kv_col')
Merge values in specified columns into a key-value represented column.
* **Parameters:**
* **columns** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *default None*) – The columns to be merged.
* **kv_delim** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '='*) – Delimiter between key and value.
* **item_delim** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '* *,* *'*) – Delimiter between key-value pairs.
* **kv_col** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 'kv_col'*) – Name of the new key-value column
* **Returns:**
converted data frame
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.mf.extract_kv`](maxframe.dataframe.DataFrame.mf.extract_kv.md#maxframe.dataframe.DataFrame.mf.extract_kv)
### Examples
```pycon
>>> from numpy import nan as NaN
>>> import maxframe.dataframe as md
```
```pycon
>>> df = md.DataFrame({"name": ["name1", "name2", "name3", "name4", "name5"],
...                    "k1": [1.0, NaN, 7.1, NaN, NaN],
...                    "k2": [3.0, 3.0, NaN, 1.2, 1.0],
...                    "k3": [NaN, 5.1, NaN, 1.5, NaN],
...                    "k5": [10.0, NaN, NaN, NaN, NaN],
...                    "k7": [NaN, NaN, 8.2, NaN, NaN],
...                    "k9": [NaN, NaN, NaN, NaN, 1.1]})
>>> df.execute()
name k1 k2 k3 k5 k7 k9
0 name1 1.0 3.0 NaN 10.0 NaN NaN
1 name2 NaN 3.0 5.1 NaN NaN NaN
2 name3 7.1 NaN NaN NaN 8.2 NaN
3 name4 NaN 1.2 1.5 NaN NaN NaN
4 name5 NaN 1.0 NaN NaN NaN 1.1
```
The fields to merge are specified by columns.
kv_delim separates each key from its value; the default is ‘=’.
item_delim separates the key-value pairs; the default is ‘,’.
The new column name is specified by kv_col; the default is ‘kv_col’.
```pycon
>>> df.mf.collect_kv(columns=['k1', 'k2', 'k3', 'k5', 'k7', 'k9']).execute()
name kv_col
0 name1 k1=1.0,k2=3.0,k5=10.0
1 name2 k2=3.0,k3=5.1
2 name3 k1=7.1,k7=8.2
3 name4 k2=1.2,k3=1.5
4 name5 k2=1.0,k9=1.1
```
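The per-row behaviour shown above can be sketched in pure Python. This is an illustration of the documented semantics rather than MaxFrame's implementation; the helper `collect_kv_row` and the dict-based row are assumptions made for the example.

```python
import math


def collect_kv_row(row, columns, kv_delim="=", item_delim=","):
    """Join the non-missing values of the given columns into one key-value string."""
    items = []
    for col in columns:
        value = row[col]
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # missing values are left out, as in the output above
        items.append(f"{col}{kv_delim}{value}")
    return item_delim.join(items)


nan = float("nan")
row = {"name": "name1", "k1": 1.0, "k2": 3.0, "k3": nan,
       "k5": 10.0, "k7": nan, "k9": nan}

kv = collect_kv_row(row, ["k1", "k2", "k3", "k5", "k7", "k9"])
custom = collect_kv_row(row, ["k1", "k2"], kv_delim=":", item_delim=";")
```

Changing `kv_delim` and `item_delim` only changes the separators, so the same delimiters must be passed to `extract_kv` to reverse the operation.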
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mf.extract_kv.md
# maxframe.dataframe.DataFrame.mf.extract_kv
#### DataFrame.mf.extract_kv(columns=None, kv_delim='=', item_delim=',', dtype='float', fill_value=None, errors='raise')
Extract values in key-value represented columns into standalone columns.
New column names will be the name of the key-value column followed by
an underscore and the key.
* **Parameters:**
* **columns** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *default None*) – The key-value columns to be extracted.
* **kv_delim** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '='*) – Delimiter between key and value.
* **item_delim** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '* *,* *'*) – Delimiter between key-value pairs.
* **dtype** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Type of value columns to generate.
* **fill_value** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *default None*) – Default value for missing key-value pairs.
* **errors** ( *{'ignore'* *,* *'raise'}* *,* *default 'raise'*) –
* If ‘raise’, then invalid parsing will raise an exception.
* If ‘ignore’, then invalid parsing will return the input.
* **Returns:**
extracted data frame
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.mf.collect_kv`](maxframe.dataframe.DataFrame.mf.collect_kv.md#maxframe.dataframe.DataFrame.mf.collect_kv)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
```
```pycon
>>> df = md.DataFrame({"name": ["name1", "name2", "name3", "name4", "name5"],
...                    "kv": ["k1=1.0,k2=3.0,k5=10.0",
...                           "k2=3.0,k3=5.1",
...                           "k1=7.1,k7=8.2",
...                           "k2=1.2,k3=1.5",
...                           "k2=1.0,k9=1.1"]})
>>> df.execute()
name kv
0 name1 k1=1.0,k2=3.0,k5=10.0
1 name2 k2=3.0,k3=5.1
2 name3 k1=7.1,k7=8.2
3 name4 k2=1.2,k3=1.5
4 name5 k2=1.0,k9=1.1
```
The fields to expand are specified by columns.
kv_delim separates each key from its value; the default is ‘=’.
item_delim separates the key-value pairs; the default is ‘,’.
Each output field name is the original field name joined to the key with “_”.
fill_value is used to fill missing values; the default is None.
```pycon
>>> df.mf.extract_kv(columns=['kv'], kv_delim='=', item_delim=',').execute()
name kv_k1 kv_k2 kv_k3 kv_k5 kv_k7 kv_k9
0 name1 1.0 3.0 NaN 10.0 NaN NaN
1 name2 NaN 3.0 5.1 NaN NaN NaN
2 name3 7.1 NaN NaN NaN 8.2 NaN
3 name4 NaN 1.2 1.5 NaN NaN NaN
4 name5 NaN 1.0 NaN NaN NaN 1.1
```
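The parsing and column naming rules can be sketched in pure Python. This is a simplified illustration, not MaxFrame's implementation; the helpers `extract_kv_row` and `extract_kv` and the list-of-dicts rows are assumptions for the example, and missing keys are filled with `fill_value` (here `None`) rather than the NaN shown in the float output above.

```python
def extract_kv_row(text, kv_delim="=", item_delim=",", dtype=float):
    """Parse 'k1=1.0,k2=3.0' into {'k1': 1.0, 'k2': 3.0}."""
    pairs = {}
    for item in text.split(item_delim):
        key, value = item.split(kv_delim, 1)
        pairs[key] = dtype(value)
    return pairs


def extract_kv(rows, column, fill_value=None):
    """Expand one key-value column into '<column>_<key>' fields for every row."""
    parsed = [extract_kv_row(row[column]) for row in rows]
    # The output columns are the union of keys seen in any row.
    keys = sorted({key for pairs in parsed for key in pairs})
    out = []
    for row, pairs in zip(rows, parsed):
        new_row = {k: v for k, v in row.items() if k != column}
        for key in keys:
            new_row[f"{column}_{key}"] = pairs.get(key, fill_value)
        out.append(new_row)
    return out


rows = [
    {"name": "name1", "kv": "k1=1.0,k2=3.0,k5=10.0"},
    {"name": "name2", "kv": "k2=3.0,k3=5.1"},
]
expanded = extract_kv(rows, "kv")
```

Because the set of output columns depends on the keys found in the data, every row has to be parsed before the result schema is known.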
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mf.flatmap.md
# maxframe.dataframe.DataFrame.mf.flatmap
#### DataFrame.mf.flatmap(func: [Callable](https://docs.python.org/3/library/typing.html#typing.Callable), dtypes=None, raw=False, args=(), \*\*kwargs)
Apply the given function to each row and then flatten results. Use this method if your transformation returns
multiple rows for each input row.
This function applies a transformation to each row of the DataFrame, where the transformation can return zero
or multiple values, effectively flattening Python generators, list-like collections, and DataFrames.
* **Parameters:**
* **func** (*Callable*) – Function to apply to each row of the DataFrame. It should accept a Series (or an array if raw=True)
representing a row and return a list or iterable of values.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list)) – Specify dtypes of returned DataFrame.
* **raw** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) –
Determines if the row is passed as a Series or as a numpy array:
* `False` : passes each row as a Series to the function.
* `True` : the passed function will receive numpy array objects instead.
* **args** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple)) – Positional arguments to pass to func.
* **\*\*kwargs** – Additional keyword arguments to pass as keywords arguments to func.
* **Returns:**
Return DataFrame with specified dtypes.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Notes
The `func` must return an iterable of values for each input row. The index of the resulting DataFrame will be
repeated based on the number of output rows generated by func.
### Examples
```pycon
>>> import numpy as np
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> df.execute()
A B
0 1 4
1 2 5
2 3 6
```
Define a function that takes a row and returns a list of two values:
```pycon
>>> def generate_values_array(row):
...     return [row['A'] * 2, row['B'] * 3]
```
Define a function that takes a row and yields two rows of two columns each:
```pycon
>>> def generate_values_in_generator(row):
...     yield [row[0] * 2, row[1] * 4]
...     yield [row[0] * 3, row[1] * 5]
```
This is equivalent to the following function, which returns a DataFrame:
```pycon
>>> def generate_values_in_dataframe(row):
...     return pd.DataFrame([[row[0] * 2, row[1] * 4], [row[0] * 3, row[1] * 5]])
```
Specify dtypes for a function that returns a list of values:
```pycon
>>> df.mf.flatmap(generate_values_array, dtypes=pd.Series({'A': 'int'})).execute()
A
0 2
0 12
1 4
1 15
2 6
2 18
```
Specify raw=True to pass input row as array:
```pycon
>>> df.mf.flatmap(generate_values_in_generator, dtypes={"A": "int", "B": "int"}, raw=True).execute()
A B
0 2 16
0 3 20
1 4 20
1 6 25
2 6 24
2 9 30
```
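The repeated index values in the outputs above follow from the flattening rule described in the Notes. The pure-Python sketch below models that rule only and is not MaxFrame's implementation; the helper `flatmap` and the `(index, row)` tuple representation are assumptions for illustration.

```python
def flatmap(rows, func):
    """Apply func to each (index, row) pair and flatten its outputs.

    Every output row keeps the index label of the input row that produced it.
    """
    out = []
    for index, row in rows:
        for produced in func(row):
            out.append((index, produced))
    return out


def two_rows(row):
    # A generator may yield zero, one, or many output rows per input row.
    yield [row[0] * 2, row[1] * 4]
    yield [row[0] * 3, row[1] * 5]


rows = [(0, [1, 4]), (1, [2, 5]), (2, [3, 6])]
flat = flatmap(rows, two_rows)
```

A function that yields nothing for some input simply contributes no rows, which is how `flatmap` can also act as a filter.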
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mf.map_reduce.md
# maxframe.dataframe.DataFrame.mf.map_reduce
#### DataFrame.mf.map_reduce(mapper: [Callable](https://docs.python.org/3/library/typing.html#typing.Callable) | [None](https://docs.python.org/3/library/constants.html#None) = None, reducer: [Callable](https://docs.python.org/3/library/typing.html#typing.Callable) | [None](https://docs.python.org/3/library/constants.html#None) = None, group_cols: [List](https://docs.python.org/3/library/typing.html#typing.List)[[Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = None, , order_cols: [List](https://docs.python.org/3/library/typing.html#typing.List)[[Any](https://docs.python.org/3/library/typing.html#typing.Any)] = None, ascending: [bool](https://docs.python.org/3/library/functions.html#bool) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[bool](https://docs.python.org/3/library/functions.html#bool)] = True, combiner: [Callable](https://docs.python.org/3/library/typing.html#typing.Callable) = None, batch_rows: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = 1024, mapper_dtypes: Series = None, mapper_index: Index = None, mapper_batch_rows: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = None, reducer_dtypes: Series = None, reducer_index: Index = None, reducer_batch_rows: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = None, ignore_index: [bool](https://docs.python.org/3/library/functions.html#bool) = False)
Map-reduce API over certain DataFrames. This function is roughly
a shortcut for
```python
df.mf.apply_chunk(mapper).groupby(group_keys).mf.apply_chunk(reducer)
```
* **Parameters:**
* **mapper** (*function* *or* [*type*](https://docs.python.org/3/library/functions.html#type)) – Mapper function or class.
* **reducer** (*function* *or* [*type*](https://docs.python.org/3/library/functions.html#type)) – Reducer function or class.
* **group_cols** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]*) – The keys to group after mapper. If absent, all columns in the mapped
DataFrame will be used.
* **order_cols** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]*) – The columns to sort after groupby.
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *[*[*bool*](https://docs.python.org/3/library/functions.html#bool) *] or* *None*) – Whether columns should be sorted in ascending order. Only effective when
order_cols is specified. If a list of booleans is passed, it gives the order of
each column in order_cols.
* **combiner** (*function* *or* *class*) – Combiner function or class. Should accept and return the same schema
as the mapper outputs.
* **batch_rows** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None*) – Rows in batches for mappers and reducers. Ignored if mapper_batch_rows
specified for mappers or reducer_batch_rows specified for reducers.
1024 by default.
* **mapper_dtypes** (*pd.Series* *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *or* *None*) – Output dtypes of mapper stage.
* **mapper_index** (*pd.Index* *or* *None*) – Index of DataFrame returned by mappers.
* **mapper_batch_rows** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None*) – Rows in batches for mappers. If specified, batch_rows will be ignored
for mappers.
* **reducer_dtypes** (*pd.Series* *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *or* *None*) – Output dtypes of reducer stage.
* **reducer_index** (*pd.Index* *or* *None*) – Index of DataFrame returned by reducers.
* **reducer_batch_rows** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None*) – Rows in batches for reducers. If specified, batch_rows will be ignored
for reducers.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – If true, indexes generated at mapper or reducer functions will be ignored.
* **Returns:**
**output** – Result DataFrame after map and reduce.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
We first define a DataFrame with a column of several words.
```pycon
>>> from collections import defaultdict
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> from maxframe.udf import with_running_options
>>> df = md.DataFrame(
...     {
...         "name": ["name key", "name", "key", "name", "key name"],
...         "id": [4, 2, 4, 3, 3],
...         "fid": [5.3, 3.5, 4.2, 2.2, 4.1],
...     }
... )
```
Then we write a mapper function which accepts batches in the DataFrame
and returns counts of words in every row.
```pycon
>>> def mapper(batch):
...     word_to_count = defaultdict(lambda: 0)
...     for words in batch["name"]:
...         for w in words.split():
...             word_to_count[w] += 1
...     return pd.DataFrame(
...         [list(tp) for tp in word_to_count.items()], columns=["word", "count"]
...     )
```
After that we write a reducer function which aggregates records with
the same word. Running options such as CPU specifications can be supplied
as well.
```pycon
>>> @with_running_options(cpu=2)
... class TestReducer:
...     def __init__(self):
...         self._word_to_count = defaultdict(lambda: 0)
...
...     def __call__(self, batch, end=False):
...         word = None
...         for _, row in batch.iterrows():
...             word = row.iloc[0]
...             self._word_to_count[row.iloc[0]] += row.iloc[1]
...         if end:
...             return pd.DataFrame(
...                 [[word, self._word_to_count[word]]], columns=["word", "count"]
...             )
...
...     def close(self):
...         # you can do several cleanups here
...         print("close")
```
Finally we can call map_reduce with mappers and reducers specified above.
```pycon
>>> res = df.mf.map_reduce(
...     mapper,
...     TestReducer,
...     group_cols=["word"],
...     mapper_dtypes={"word": "str", "count": "int"},
...     mapper_index=pd.Index([0]),
...     reducer_dtypes={"word": "str", "count": "int"},
...     reducer_index=pd.Index([0]),
...     ignore_index=True,
... )
>>> res.execute().fetch()
word count
0 key 3
1 name 4
```
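The three stages of the word-count example (map per batch, group by key, reduce per group) can be traced with a pure-Python sketch. It illustrates the general map-reduce pattern and is not MaxFrame's execution model; the function names and the plain-list data are assumptions for this example.

```python
from collections import Counter, defaultdict


def mapper(batch):
    """Count words within one batch of names; emit (word, count) records."""
    counts = Counter(word for name in batch for word in name.split())
    return list(counts.items())


def reducer(word, counts):
    """Combine all partial counts that share the same word."""
    return (word, sum(counts))


def map_reduce(records, batch_rows):
    # Map stage: run the mapper on each batch independently.
    mapped = []
    for start in range(0, len(records), batch_rows):
        mapped.extend(mapper(records[start:start + batch_rows]))
    # Shuffle stage: group mapper outputs by key (the role of group_cols).
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)
    # Reduce stage: one reducer call per group.
    return sorted(reducer(word, counts) for word, counts in groups.items())


names = ["name key", "name", "key", "name", "key name"]
result = map_reduce(names, batch_rows=2)
```

The final counts do not depend on `batch_rows`, because the reducer adds up the partial counts produced by each mapper batch.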
#### SEE ALSO
[`DataFrame.mf.apply_chunk`](maxframe.dataframe.DataFrame.mf.apply_chunk.md#maxframe.dataframe.DataFrame.mf.apply_chunk), `DataFrame.groupby.mf.apply_chunk`
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mf.rebalance.md
# maxframe.dataframe.DataFrame.mf.rebalance
#### DataFrame.mf.rebalance(axis=0, factor=None, num_partitions=None)
Make data more balanced across the entire cluster.
* **Parameters:**
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int)) – The axis to rebalance.
* **factor** ([*float*](https://docs.python.org/3/library/functions.html#float)) – Specified so that number of chunks after balance is
total number of input chunks \* factor.
* **num_partitions** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Specified so the number of chunks are at most
num_partitions.
* **Returns:**
DataFrame or Series after rebalancing.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
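What rebalancing means for skewed chunks can be sketched in pure Python. This is a conceptual illustration only, not MaxFrame's algorithm: the helper `rebalance` is hypothetical, chunks are modelled as plain lists, and `num_partitions` is treated here as an exact target, whereas the parameter above is documented as an upper bound.

```python
def rebalance(chunks, num_partitions):
    """Redistribute rows so that chunk sizes differ by at most one row."""
    rows = [row for chunk in chunks for row in chunk]
    base, extra = divmod(len(rows), num_partitions)
    out, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if i < extra else 0)
        out.append(rows[start:start + size])
        start += size
    return out


skewed = [[1, 2, 3, 4, 5, 6, 7], [8], [9, 10]]
balanced = rebalance(skewed, num_partitions=3)
```

Evening out chunk sizes helps avoid long-tail tasks in which a single worker processes most of the data.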
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mf.reshuffle.md
# maxframe.dataframe.DataFrame.mf.reshuffle
#### DataFrame.mf.reshuffle(group_by: [List](https://docs.python.org/3/library/typing.html#typing.List)[[Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = None, sort_by: [List](https://docs.python.org/3/library/typing.html#typing.List)[[Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = None, ascending: [bool](https://docs.python.org/3/library/functions.html#bool) = True, ignore_index: [bool](https://docs.python.org/3/library/functions.html#bool) = False)
Shuffle data in DataFrame or Series to make data distribution more
randomized.
* **Parameters:**
* **group_by** (*Optional* *[**List* *[**Any* *]* *]*) – Determine columns to group data while shuffling.
* **sort_by** (*Optional* *[**List* *[**Any* *]* *]*)
* **ascending**
* **ignore_index**
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.min.md
# maxframe.dataframe.DataFrame.min
#### DataFrame.min(axis=None, skipna=True, level=None, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mod.md
# maxframe.dataframe.DataFrame.mod
#### DataFrame.mod(other, axis='columns', level=None, fill_value=None)
Get Modulo of dataframe and other, element-wise (binary operator mod).
Equivalent to `%`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rmod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar with the operator version, which returns the same
results as the method version.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
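The examples above are shared across the arithmetic methods and do not call `mod` directly. As a rough illustration of the `fill_value` rule described in the parameters, here is a plain-Python sketch over label-keyed dicts. It is not MaxFrame code: `mod_aligned` is a hypothetical helper, `None` stands in for a missing value, and zero divisors are not handled.

```python
def mod_aligned(left, right, fill_value=None):
    """Element-wise left % right over the union of labels (illustrative only)."""
    result = {}
    for label in sorted(set(left) | set(right)):
        a, b = left.get(label), right.get(label)
        if a is None and b is None:
            # Missing on both sides stays missing, even with a fill_value.
            result[label] = None
            continue
        # Substitute fill_value for the one side that is missing.
        a = fill_value if a is None else a
        b = fill_value if b is None else b
        result[label] = None if a is None or b is None else a % b
    return result


degrees = {"circle": 360, "triangle": 180}
divisors = {"circle": 7, "rectangle": 4}

print(mod_aligned(degrees, divisors))
print(mod_aligned(degrees, divisors, fill_value=1))
```

Without `fill_value`, labels present on only one side produce a missing result; with `fill_value=1`, the missing side is treated as 1 before the modulo is taken.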
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mode.md
# maxframe.dataframe.DataFrame.mode
#### DataFrame.mode(axis=0, numeric_only=False, dropna=True, combine_size=None)
Get the mode(s) of each element along the selected axis.
The mode of a set of values is the value that appears most often.
It can be multiple values.
* **Parameters:**
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) –
The axis to iterate over while searching for the mode:
* 0 or ‘index’ : get mode of each column
* 1 or ‘columns’ : get mode of each row.
* **numeric_only** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, only apply to numeric columns.
* **dropna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Don’t consider counts of NaN/NaT.
* **Returns:**
The modes of each column or row.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`Series.mode`](maxframe.dataframe.Series.mode.md#maxframe.dataframe.Series.mode)
: Return the highest frequency value in a Series.
[`Series.value_counts`](maxframe.dataframe.Series.value_counts.md#maxframe.dataframe.Series.value_counts)
: Return the counts of values in a Series.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([('bird', 2, 2),
... ('mammal', 4, mt.nan),
... ('arthropod', 8, 0),
... ('bird', 2, mt.nan)],
... index=('falcon', 'horse', 'spider', 'ostrich'),
... columns=('species', 'legs', 'wings'))
>>> df.execute()
species legs wings
falcon bird 2 2.0
horse mammal 4 NaN
spider arthropod 8 0.0
ostrich bird 2 NaN
```
By default, missing values are not considered, and the modes of `wings`
are both 0 and 2. Because the resulting DataFrame has two rows,
the second row of `species` and `legs` contains `NaN`.
```pycon
>>> df.mode().execute()
species legs wings
0 bird 2.0 0.0
1 NaN NaN 2.0
```
With `dropna=False`, `NaN` values are considered and can be
the mode (as for `wings`).
```pycon
>>> df.mode(dropna=False).execute()
species legs wings
0 bird 2 NaN
```
Setting `numeric_only=True`, only the mode of numeric columns is
computed, and columns of other types are ignored.
```pycon
>>> df.mode(numeric_only=True).execute()
legs wings
0 2.0 0.0
1 NaN 2.0
```
To compute the mode over columns and not rows, use the axis parameter:
```pycon
>>> df.mode(axis='columns', numeric_only=True).execute()
0 1
falcon 2.0 NaN
horse 4.0 NaN
spider 0.0 8.0
ostrich 2.0 NaN
```
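The rule that a column can have several modes, and that `dropna` decides whether missing values compete, can be sketched in plain Python. This is only an illustration of the semantics for a single column, not MaxFrame's implementation; `modes` is a hypothetical helper and `None` represents a missing value in its output.

```python
import math
from collections import Counter


def modes(values, dropna=True):
    """Return every most-frequent value of one column, sorted (illustrative only)."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Normalise NaN/None to None so that all missing values are counted together.
    cleaned = [None if is_missing(v) else v for v in values]
    if dropna:
        cleaned = [v for v in cleaned if v is not None]
    if not cleaned:
        return []
    counts = Counter(cleaned)
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    # Sort real values first; a missing "mode" (if any) goes last.
    return sorted(winners, key=lambda v: (v is None, v))


wings = [2, math.nan, 0, math.nan]
print(modes(wings))                # both 0 and 2 appear once
print(modes(wings, dropna=False))  # the missing value appears twice
```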
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.mul.md
# maxframe.dataframe.DataFrame.mul
#### DataFrame.mul(other, axis='columns', level=None, fill_value=None)
Get Multiplication of dataframe and other, element-wise (binary operator mul).
Equivalent to `*`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rmul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar with the operator version, which returns the same
results as the method version.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.ndim.md
# maxframe.dataframe.DataFrame.ndim
#### *property* DataFrame.ndim
Return an int representing the number of axes / array dimensions.
Return 1 if Series. Otherwise return 2 if DataFrame.
#### SEE ALSO
`ndarray.ndim`
: Number of array dimensions.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.ndim
1
```
```pycon
>>> df = md.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.ndim
2
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.ne.md
# maxframe.dataframe.DataFrame.ne
#### DataFrame.ne(other, axis='columns', level=None, fill_value=None)
Get Not equal to of dataframe and other, element-wise (binary operator ne).
Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison
operators.
Equivalent to dataframe != other with support to choose axis (rows or columns)
and level for comparison.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 'columns'*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the passed
MultiIndex level.
* **Returns:**
Result of the comparison.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`DataFrame.eq`](maxframe.dataframe.DataFrame.eq.md#maxframe.dataframe.DataFrame.eq)
: Compare DataFrames for equality elementwise.
[`DataFrame.ne`](#maxframe.dataframe.DataFrame.ne)
: Compare DataFrames for inequality elementwise.
[`DataFrame.le`](maxframe.dataframe.DataFrame.le.md#maxframe.dataframe.DataFrame.le)
: Compare DataFrames for less than inequality or equality elementwise.
[`DataFrame.lt`](maxframe.dataframe.DataFrame.lt.md#maxframe.dataframe.DataFrame.lt)
: Compare DataFrames for strictly less than inequality elementwise.
[`DataFrame.ge`](maxframe.dataframe.DataFrame.ge.md#maxframe.dataframe.DataFrame.ge)
: Compare DataFrames for greater than inequality or equality elementwise.
[`DataFrame.gt`](maxframe.dataframe.DataFrame.gt.md#maxframe.dataframe.DataFrame.gt)
: Compare DataFrames for strictly greater than inequality elementwise.
### Notes
Mismatched indices will be unioned together.
NaN values are considered different (i.e. NaN != NaN).
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'cost': [250, 150, 100],
... 'revenue': [100, 250, 300]},
... index=['A', 'B', 'C'])
>>> df.execute()
cost revenue
A 250 100
B 150 250
C 100 300
```
Comparison with a scalar, using either the operator or method:
```pycon
>>> (df == 100).execute()
cost revenue
A False True
B False False
C True False
```
```pycon
>>> df.eq(100).execute()
cost revenue
A False True
B False False
C True False
```
When other is a [`Series`](maxframe.dataframe.Series.md#maxframe.dataframe.Series), the columns of a DataFrame are aligned
with the index of other and broadcast:
```pycon
>>> (df != md.Series([100, 250], index=["cost", "revenue"])).execute()
cost revenue
A True True
B True False
C False True
```
Use the method to control the broadcast axis:
```pycon
>>> df.ne(md.Series([100, 300], index=["A", "D"]), axis='index').execute()
cost revenue
A True False
B True True
C True True
D True True
```
When comparing to an arbitrary sequence, the number of columns must
match the number of elements in other:
```pycon
>>> (df == [250, 100]).execute()
cost revenue
A True True
B False False
C False False
```
Use the method to control the axis:
```pycon
>>> df.eq([250, 250, 100], axis='index').execute()
cost revenue
A True False
B False True
C True False
```
Compare to a DataFrame of different shape.
```pycon
>>> other = md.DataFrame({'revenue': [300, 250, 100, 150]},
... index=['A', 'B', 'C', 'D'])
>>> other.execute()
revenue
A 300
B 250
C 100
D 150
```
```pycon
>>> df.gt(other).execute()
cost revenue
A False False
B False False
C False True
D False False
```
Compare to a MultiIndex by level.
```pycon
>>> df_multindex = md.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
... 'revenue': [100, 250, 300, 200, 175, 225]},
... index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
... ['A', 'B', 'C', 'A', 'B', 'C']])
>>> df_multindex.execute()
cost revenue
Q1 A 250 100
B 150 250
C 100 300
Q2 A 150 200
B 300 175
C 220 225
```
```pycon
>>> df.le(df_multindex, level=1).execute()
cost revenue
Q1 A True True
B True True
C True True
Q2 A False True
B True False
C True False
```
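The two notes above (mismatched indices are unioned, and NaN never equals NaN) explain why unmatched labels always compare as "not equal". A minimal plain-Python sketch of that behaviour for one column follows. It is an illustration only, not MaxFrame code; `ne_aligned` is a hypothetical helper that works on label-keyed dicts.

```python
import math


def ne_aligned(left, right):
    """Element-wise left != right over the union of labels (illustrative only)."""
    labels = sorted(set(left) | set(right))
    # A label missing on one side is treated as NaN, and NaN != x is always True.
    return {
        label: left.get(label, math.nan) != right.get(label, math.nan)
        for label in labels
    }


revenue = {"A": 100, "B": 250, "C": 300}
other = {"A": 100, "D": 300}
print(ne_aligned(revenue, other))
```

Label `D` exists only in `other` and labels `B` and `C` only in `revenue`, so all three compare as not equal; only `A` matches on both sides.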
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.nlargest.md
# maxframe.dataframe.DataFrame.nlargest
#### DataFrame.nlargest(n, columns, keep='first')
Return the first n rows ordered by columns in descending order.
Return the first n rows with the largest values in columns, in
descending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
`df.sort_values(columns, ascending=False).head(n)`, but more
performant.
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Number of rows to return.
* **columns** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *labels*) – Column label(s) to order by.
* **keep** ( *{'first'* *,* *'last'* *,* *'all'}* *,* *default 'first'*) –
Where there are duplicate values:
- `first` : prioritize the first occurrence(s)
- `last` : prioritize the last occurrence(s)
- `all` : do not drop any duplicates, even if it means
selecting more than n items.
* **Returns:**
The first n rows ordered by the given columns in descending
order.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.nsmallest`](maxframe.dataframe.DataFrame.nsmallest.md#maxframe.dataframe.DataFrame.nsmallest)
: Return the first n rows ordered by columns in ascending order.
[`DataFrame.sort_values`](maxframe.dataframe.DataFrame.sort_values.md#maxframe.dataframe.DataFrame.sort_values)
: Sort DataFrame by the values.
[`DataFrame.head`](maxframe.dataframe.DataFrame.head.md#maxframe.dataframe.DataFrame.head)
: Return the first n rows without re-ordering.
### Notes
This function cannot be used with all column types. For example, when
specifying columns with object or category dtypes, `TypeError` is
raised.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'population': [59000000, 65000000, 434000,
... 434000, 434000, 337000, 11300,
... 11300, 11300],
... 'GDP': [1937894, 2583560 , 12011, 4520, 12128,
... 17036, 182, 38, 311],
... 'alpha-2': ["IT", "FR", "MT", "MV", "BN",
... "IS", "NR", "TV", "AI"]},
... index=["Italy", "France", "Malta",
... "Maldives", "Brunei", "Iceland",
... "Nauru", "Tuvalu", "Anguilla"])
>>> df.execute()
population GDP alpha-2
Italy 59000000 1937894 IT
France 65000000 2583560 FR
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
Iceland 337000 17036 IS
Nauru 11300 182 NR
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
```
In the following example, we will use `nlargest` to select the three
rows having the largest values in column “population”.
```pycon
>>> df.nlargest(3, 'population').execute()
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Malta 434000 12011 MT
```
When using `keep='last'`, ties are resolved in reverse order:
```pycon
>>> df.nlargest(3, 'population', keep='last').execute()
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Brunei 434000 12128 BN
```
When using `keep='all'`, all duplicate items are maintained:
```pycon
>>> df.nlargest(3, 'population', keep='all').execute()
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
```
To order by the largest values in column “population” and then “GDP”,
we can specify multiple columns like in the next example.
```pycon
>>> df.nlargest(3, ['population', 'GDP']).execute()
population GDP alpha-2
France 65000000 2583560 FR
Italy 59000000 1937894 IT
Brunei 434000 12128 BN
```
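The three `keep` modes differ only in how ties at the cut-off are handled. The sketch below reproduces that tie handling in plain Python for a single ordering column, using the population figures from the examples above. It is not how MaxFrame computes the result; `nlargest_labels` is a hypothetical helper.

```python
def nlargest_labels(rows, n, keep="first"):
    """rows is a list of (label, value) pairs in their original order."""
    indexed = list(enumerate(rows))
    # Sort by value descending; break ties by original position
    # (earliest first, or latest first when keep="last").
    tie_sign = -1 if keep == "last" else 1
    ordered = sorted(indexed, key=lambda item: (-item[1][1], tie_sign * item[0]))
    top = ordered[:n]
    if keep == "all" and top:
        # Extend the selection to every row tied with the last selected value.
        cutoff = top[-1][1][1]
        top = [item for item in ordered if item[1][1] >= cutoff]
    return [label for _, (label, _value) in top]


population = [
    ("Italy", 59000000), ("France", 65000000), ("Malta", 434000),
    ("Maldives", 434000), ("Brunei", 434000), ("Iceland", 337000),
    ("Nauru", 11300), ("Tuvalu", 11300), ("Anguilla", 11300),
]

for mode in ("first", "last", "all"):
    print(mode, nlargest_labels(population, 3, keep=mode))
```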
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.notna.md
# maxframe.dataframe.DataFrame.notna
#### DataFrame.notna()
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
NA values, such as None or `numpy.NaN`, get mapped to False
values.
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.notnull`](maxframe.dataframe.DataFrame.notnull.md#maxframe.dataframe.DataFrame.notnull)
: Alias of notna.
[`DataFrame.isna`](maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)
: Boolean inverse of notna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`notna`](maxframe.dataframe.notna.md#maxframe.dataframe.notna)
: Top-level notna.
### Examples
Show which entries in a DataFrame are not NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.nan],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
```
```pycon
>>> df.notna().execute()
age born name toy
0 True False True False
1 True True True True
2 False True True True
```
Show which entries in a Series are not NA.
```pycon
>>> ser = md.Series([5, 6, np.nan])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.notna().execute()
0 True
1 True
2 False
dtype: bool
```
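The rule for what counts as NA is easy to misread: empty strings and infinities are ordinary values, while `None` and NaN are missing. A small plain-Python sketch of that per-element rule, assuming the default setting (no `use_inf_as_na`), is shown below. `is_not_na` is a hypothetical helper, not part of MaxFrame, and it does not cover `NaT`.

```python
import math


def is_not_na(value):
    """True for existing values; False for None and NaN (illustrative only)."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return True


samples = [5.0, math.nan, None, "", math.inf, "Joker"]
print([is_not_na(v) for v in samples])
```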
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.notnull.md
# maxframe.dataframe.DataFrame.notnull
#### DataFrame.notnull()
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
NA values, such as None or `numpy.NaN`, get mapped to False
values.
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.notnull`](#maxframe.dataframe.DataFrame.notnull)
: Alias of notna.
[`DataFrame.isna`](maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)
: Boolean inverse of notna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`notna`](maxframe.dataframe.notna.md#maxframe.dataframe.notna)
: Top-level notna.
### Examples
Show which entries in a DataFrame are not NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.nan],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
```
```pycon
>>> df.notna().execute()
age born name toy
0 True False True False
1 True True True True
2 False True True True
```
Show which entries in a Series are not NA.
```pycon
>>> ser = md.Series([5, 6, np.nan])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.notna().execute()
0 True
1 True
2 False
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.nsmallest.md
# maxframe.dataframe.DataFrame.nsmallest
#### DataFrame.nsmallest(n, columns, keep='first')
Return the first n rows ordered by columns in ascending order.
Return the first n rows with the smallest values in columns, in
ascending order. The columns that are not specified are returned as
well, but not used for ordering.
This method is equivalent to
`df.sort_values(columns, ascending=True).head(n)`, but more
performant.
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Number of items to retrieve.
* **columns** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Column name or names to order by.
* **keep** ( *{'first'* *,* *'last'* *,* *'all'}* *,* *default 'first'*) –
Where there are duplicate values:
- `first` : take the first occurrence.
- `last` : take the last occurrence.
- `all` : do not drop any duplicates, even if it means
selecting more than n items.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.nlargest`](maxframe.dataframe.DataFrame.nlargest.md#maxframe.dataframe.DataFrame.nlargest)
: Return the first n rows ordered by columns in descending order.
[`DataFrame.sort_values`](maxframe.dataframe.DataFrame.sort_values.md#maxframe.dataframe.DataFrame.sort_values)
: Sort DataFrame by the values.
[`DataFrame.head`](maxframe.dataframe.DataFrame.head.md#maxframe.dataframe.DataFrame.head)
: Return the first n rows without re-ordering.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'population': [59000000, 65000000, 434000,
... 434000, 434000, 337000, 337000,
... 11300, 11300],
... 'GDP': [1937894, 2583560 , 12011, 4520, 12128,
... 17036, 182, 38, 311],
... 'alpha-2': ["IT", "FR", "MT", "MV", "BN",
... "IS", "NR", "TV", "AI"]},
... index=["Italy", "France", "Malta",
... "Maldives", "Brunei", "Iceland",
... "Nauru", "Tuvalu", "Anguilla"])
>>> df.execute()
population GDP alpha-2
Italy 59000000 1937894 IT
France 65000000 2583560 FR
Malta 434000 12011 MT
Maldives 434000 4520 MV
Brunei 434000 12128 BN
Iceland 337000 17036 IS
Nauru 337000 182 NR
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
```
In the following example, we will use `nsmallest` to select the
three rows having the smallest values in column “population”.
```pycon
>>> df.nsmallest(3, 'population').execute()
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Iceland 337000 17036 IS
```
When using `keep='last'`, ties are resolved in reverse order:
```pycon
>>> df.nsmallest(3, 'population', keep='last').execute()
population GDP alpha-2
Anguilla 11300 311 AI
Tuvalu 11300 38 TV
Nauru 337000 182 NR
```
When using `keep='all'`, all duplicate items are maintained:
```pycon
>>> df.nsmallest(3, 'population', keep='all').execute()
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Iceland 337000 17036 IS
Nauru 337000 182 NR
```
To order by the smallest values in column “population” and then “GDP”, we can
specify multiple columns like in the next example.
```pycon
>>> df.nsmallest(3, ['population', 'GDP']).execute()
population GDP alpha-2
Tuvalu 11300 38 TV
Anguilla 11300 311 AI
Nauru 337000 182 NR
```
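Ordering by several columns means later columns only break ties in earlier ones. The following plain-Python sketch shows the same selection with the standard library's `heapq.nsmallest`, which avoids fully sorting the input. It illustrates the idea only and says nothing about how MaxFrame executes the operation; `nsmallest_rows` is a hypothetical helper and the rows copy the example data above.

```python
import heapq

# (label, population, GDP), in the same order as the example DataFrame.
countries = [
    ("Italy", 59000000, 1937894), ("France", 65000000, 2583560),
    ("Malta", 434000, 12011), ("Maldives", 434000, 4520),
    ("Brunei", 434000, 12128), ("Iceland", 337000, 17036),
    ("Nauru", 337000, 182), ("Tuvalu", 11300, 38),
    ("Anguilla", 11300, 311),
]


def nsmallest_rows(rows, n):
    """Pick the n rows with the smallest (population, GDP) pairs."""
    # The tuple key compares population first and uses GDP only to break ties.
    return heapq.nsmallest(n, rows, key=lambda row: (row[1], row[2]))


print([row[0] for row in nsmallest_rows(countries, 3)])
```

Iceland and Nauru share the same population, so the smaller GDP decides which of the two takes the third place.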
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.nunique.md
# maxframe.dataframe.DataFrame.nunique
#### DataFrame.nunique(axis=0, dropna=True)
Count distinct observations over requested axis.
Return Series with number of distinct observations. Can ignore NaN
values.
* **Parameters:**
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to use. 0 or ‘index’ for row-wise, 1 or ‘columns’ for
column-wise.
* **dropna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Don’t include NaN in the counts.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.nunique`](maxframe.dataframe.Series.nunique.md#maxframe.dataframe.Series.nunique)
: Method nunique for Series.
[`DataFrame.count`](maxframe.dataframe.DataFrame.count.md#maxframe.dataframe.DataFrame.count)
: Count non-NA cells for each column or row.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': [1, 2, 3], 'B': [1, 1, 1]})
>>> df.nunique().execute()
A 3
B 1
dtype: int64
```
```pycon
>>> df.nunique(axis=1).execute()
0 1
1 2
2 2
dtype: int64
```
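To make the two axes concrete, here is a plain-Python sketch that counts distinct values per column and per row for the same data, including the effect of `dropna`. It is an illustration only; `nunique` below is a hypothetical stand-alone function, not the MaxFrame method.

```python
import math


def nunique(values, dropna=True):
    """Count distinct values; a missing value counts once when dropna=False."""
    distinct = set()
    saw_missing = False
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            saw_missing = True
        else:
            distinct.add(v)
    return len(distinct) + (1 if saw_missing and not dropna else 0)


columns = {"A": [1, 2, 3], "B": [1, 1, 1]}

# axis=0: one count per column.
per_column = {name: nunique(vals) for name, vals in columns.items()}
# axis=1: one count per row.
per_row = [nunique(row) for row in zip(*columns.values())]

print(per_column)
print(per_row)
```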
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.pct_change.md
# maxframe.dataframe.DataFrame.pct_change
#### DataFrame.pct_change(periods=1, fill_method='pad', limit=None, freq=None, \*\*kwargs)
Percentage change between the current and a prior element.
Computes the percentage change from the immediately previous row by
default. This is useful in comparing the percentage of change in a time
series of elements.
* **Parameters:**
* **periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 1*) – Periods to shift for forming percent change.
* **fill_method** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 'pad'*) – How to handle NAs before computing percent changes.
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – The number of consecutive NAs to fill before stopping.
* **freq** (*DateOffset* *,* *timedelta* *, or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Increment to use from time series API (e.g. ‘M’ or BDay()).
* **\*\*kwargs** – Additional keyword arguments are passed into
DataFrame.shift or Series.shift.
* **Returns:**
**chg** – The same type as the calling object.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.diff`
: Compute the difference of two elements in a Series.
[`DataFrame.diff`](maxframe.dataframe.DataFrame.diff.md#maxframe.dataframe.DataFrame.diff)
: Compute the difference of two elements in a DataFrame.
[`Series.shift`](maxframe.dataframe.Series.shift.md#maxframe.dataframe.Series.shift)
: Shift the index by some number of periods.
[`DataFrame.shift`](maxframe.dataframe.DataFrame.shift.md#maxframe.dataframe.DataFrame.shift)
: Shift the index by some number of periods.
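This page has no example, so here is a minimal plain-Python sketch of the underlying arithmetic: each result is `(current - previous) / previous`, where `previous` is the element `periods` positions earlier. It is an assumption-laden illustration rather than MaxFrame code: `pct_change` below is a hypothetical function, it assumes `periods >= 1`, and it ignores `fill_method`, `limit` and `freq`.

```python
def pct_change(values, periods=1):
    """Fractional change between each element and the one `periods` steps earlier."""
    changes = []
    for i, current in enumerate(values):
        if i < periods:
            # No earlier element to compare with.
            changes.append(None)
        else:
            previous = values[i - periods]
            changes.append((current - previous) / previous)
    return changes


prices = [90, 91, 85]
print(pct_change(prices))
print(pct_change(prices, periods=2))
```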
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.pivot.md
# maxframe.dataframe.DataFrame.pivot
#### DataFrame.pivot(columns, index=None, values=None)
Return reshaped DataFrame organized by given index / column values.
Reshape data (produce a “pivot” table) based on column values. Uses
unique values from specified index / columns to form axes of the
resulting DataFrame. This function does not support data
aggregation; multiple values will result in a MultiIndex in the
columns. See the [User Guide](https://pandas.pydata.org/docs/user_guide/reshaping.html#reshaping) for more on reshaping.
* **Parameters:**
* **index** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*object*](https://docs.python.org/3/library/functions.html#object) *or* *a list* *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Column to use to make new frame’s index. If None, uses
existing index.
* **columns** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*object*](https://docs.python.org/3/library/functions.html#object) *or* *a list* *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Column to use to make new frame’s columns.
* **values** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*object*](https://docs.python.org/3/library/functions.html#object) *or* *a list* *of* *the previous* *,* *optional*) – Column(s) to use for populating new frame’s values. If not
specified, all remaining columns will be used and the result will
have hierarchically indexed columns.
* **Returns:**
Returns reshaped DataFrame.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
**ValueError:** – When there are any index, columns combinations with multiple
values. Use DataFrame.pivot_table when you need to aggregate.
#### SEE ALSO
[`DataFrame.pivot_table`](maxframe.dataframe.DataFrame.pivot_table.md#maxframe.dataframe.DataFrame.pivot_table)
: Generalization of pivot that can handle duplicate values for one index/column pair.
[`DataFrame.unstack`](maxframe.dataframe.DataFrame.unstack.md#maxframe.dataframe.DataFrame.unstack)
: Pivot based on the index values instead of a column.
`wide_to_long`
: Wide panel to long format. Less flexible but more user-friendly than melt.
### Notes
For finer-tuned control, see hierarchical indexing documentation along
with the related stack/unstack methods.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
... 'two'],
... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
... 'baz': [1, 2, 3, 4, 5, 6],
... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df.execute()
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
```
```pycon
>>> df.pivot(index='foo', columns='bar', values='baz').execute()
bar A B C
foo
one 1 2 3
two 4 5 6
```
```pycon
>>> df.pivot(index='foo', columns='bar', values=['baz', 'zoo']).execute()
baz zoo
bar A B C A B C
foo
one 1 2 3 x y z
two 4 5 6 q w t
```
You could also assign a list of column names or a list of index names.
```pycon
>>> df = md.DataFrame({
... "lev1": [1, 1, 1, 2, 2, 2],
... "lev2": [1, 1, 2, 1, 1, 2],
... "lev3": [1, 2, 1, 2, 1, 2],
... "lev4": [1, 2, 3, 4, 5, 6],
... "values": [0, 1, 2, 3, 4, 5]})
>>> df.execute()
lev1 lev2 lev3 lev4 values
0 1 1 1 1 0
1 1 1 2 2 1
2 1 2 1 3 2
3 2 1 2 4 3
4 2 1 1 5 4
5 2 2 2 6 5
```
```pycon
>>> df.pivot(index="lev1", columns=["lev2", "lev3"], values="values").execute()
lev2 1 2
lev3 1 2 1 2
lev1
1 0.0 1.0 2.0 NaN
2 4.0 3.0 NaN 5.0
```
```pycon
>>> df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values").execute()
lev3 1 2
lev1 lev2
1 1 0.0 1.0
2 2.0 NaN
2 1 4.0 3.0
2 NaN 5.0
```
A ValueError is raised if there are any duplicates.
```pycon
>>> df = md.DataFrame({"foo": ['one', 'one', 'two', 'two'],
... "bar": ['A', 'A', 'B', 'C'],
... "baz": [1, 2, 3, 4]})
>>> df.execute()
foo bar baz
0 one A 1
1 one A 2
2 two B 3
3 two C 4
```
Notice that the first two rows are the same for our index
and columns arguments.
```pycon
>>> df.pivot(index='foo', columns='bar', values='baz').execute()
Traceback (most recent call last):
...
ValueError: Index contains duplicate entries, cannot reshape
```
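The reshaping rule and the duplicate check can be illustrated without a MaxFrame session. The following is a minimal pure-Python sketch over a list of row dicts; the helper name `pivot_records` is hypothetical and this is not how MaxFrame implements `pivot`, only an illustration of the documented behaviour.

```python
def pivot_records(records, index, columns, values):
    """Reshape row dicts into {index_value: {column_value: value}}.

    Hypothetical helper for illustration only. It mirrors the documented
    rule that a duplicated (index, columns) pair cannot be reshaped
    without aggregation.
    """
    table = {}
    for row in records:
        cells = table.setdefault(row[index], {})
        if row[columns] in cells:
            raise ValueError("Index contains duplicate entries, cannot reshape")
        cells[row[columns]] = row[values]
    return table


records = [
    {"foo": "one", "bar": "A", "baz": 1},
    {"foo": "one", "bar": "B", "baz": 2},
    {"foo": "two", "bar": "A", "baz": 4},
    {"foo": "two", "bar": "B", "baz": 5},
]
wide = pivot_records(records, index="foo", columns="bar", values="baz")

# Adding a second row for ("one", "A") makes the reshape ambiguous.
duplicated = records + [{"foo": "one", "bar": "A", "baz": 9}]
try:
    pivot_records(duplicated, index="foo", columns="bar", values="baz")
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
```

When duplicates are expected, an aggregating reshape such as `DataFrame.pivot_table` is the appropriate tool.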
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.pivot_table.md
# maxframe.dataframe.DataFrame.pivot_table
#### DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', sort=True)
Create a spreadsheet-style pivot table as a DataFrame.
The levels in the pivot table will be stored in MultiIndex objects
(hierarchical indexes) on the index and columns of the result DataFrame.
* **Parameters:**
* **values** (*column to aggregate* *,* *optional*)
* **index** (*column* *,* *Grouper* *,* *array* *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *the previous*) – Keys to group by on the pivot table index. If an array is passed,
it must be the same length as the data and is used in the same manner
as column values. A list can contain any of the other types (except list).
* **columns** (*column* *,* *Grouper* *,* *array* *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *the previous*) – Keys to group by on the pivot table column. If an array is passed,
it must be the same length as the data and is used in the same manner
as column values. A list can contain any of the other types (except list).
* **aggfunc** (*function* *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *functions* *,* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default numpy.mean*) – If a list of functions is passed, the resulting pivot table will have
hierarchical columns whose top level are the function names
(inferred from the function objects themselves).
If a dict is passed, the key is the column to aggregate and the value
is a function or list of functions.
* **fill_value** (*scalar* *,* *default None*) – Value to replace missing values with (in the resulting pivot table,
after aggregation).
* **margins** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Add all row / columns (e.g. for subtotal / grand totals).
* **dropna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Do not include columns whose entries are all NaN.
* **margins_name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 'All'*) – Name of the row / column that will contain the totals
when margins is True.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Specifies if the result should be sorted.
* **Returns:**
An Excel style pivot table.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.pivot`](maxframe.dataframe.DataFrame.pivot.md#maxframe.dataframe.DataFrame.pivot)
: Pivot without aggregation that can handle non-numeric data.
[`DataFrame.melt`](maxframe.dataframe.DataFrame.melt.md#maxframe.dataframe.DataFrame.melt)
: Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
`wide_to_long`
: Wide panel to long format. Less flexible but more user-friendly than melt.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
... "bar", "bar", "bar", "bar"],
... "B": ["one", "one", "one", "two", "two",
... "one", "one", "two", "two"],
... "C": ["small", "large", "large", "small",
... "small", "large", "small", "small",
... "large"],
... "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
... "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df.execute()
A B C D E
0 foo one small 1 2
1 foo one large 2 4
2 foo one large 2 5
3 foo two small 3 5
4 foo two small 3 6
5 bar one large 4 6
6 bar one small 5 8
7 bar two small 6 9
8 bar two large 7 9
```
This first example aggregates values by taking the sum.
```pycon
>>> table = md.pivot_table(df, values='D', index=['A', 'B'],
... columns=['C'], aggfunc=np.sum)
>>> table.execute()
C large small
A B
bar one 4.0 5.0
two 7.0 6.0
foo one 4.0 1.0
two NaN 6.0
```
We can also fill missing values using the fill_value parameter.
```pycon
>>> table = md.pivot_table(df, values='D', index=['A', 'B'],
... columns=['C'], aggfunc=np.sum, fill_value=0)
>>> table.execute()
C large small
A B
bar one 4 5
two 7 6
foo one 4 1
two 0 6
```
The next example aggregates by taking the mean across multiple columns.
```pycon
>>> table = md.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
... aggfunc={'D': np.mean,
... 'E': np.mean})
>>> table.execute()
D E
A C
bar large 5.500000 7.500000
small 5.500000 8.500000
foo large 2.000000 4.500000
small 2.333333 4.333333
```
We can also calculate multiple types of aggregations for any given
value column.
```pycon
>>> table = md.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
... aggfunc={'D': np.mean,
... 'E': [min, max, np.mean]})
>>> table.execute()
D E
mean max mean min
A C
bar large 5.500000 9.0 7.500000 6.0
small 5.500000 9.0 8.500000 8.0
foo large 2.000000 5.0 4.500000 4.0
small 2.333333 6.0 4.333333 2.0
```
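The grouping, aggregation and `fill_value` steps described above can be sketched in plain Python. This is an illustration under simplifying assumptions (a single index key, a single columns key, one values column); the helper `pivot_table_records` is hypothetical and is not MaxFrame's implementation. The sample rows are the `B == "two"` rows of the example frame.

```python
from collections import defaultdict
from statistics import mean


def pivot_table_records(records, values, index, columns,
                        aggfunc=mean, fill_value=None):
    """Group rows by (index, columns) and reduce each group with aggfunc."""
    groups = defaultdict(list)
    for row in records:
        groups[(row[index], row[columns])].append(row[values])

    row_keys = sorted({key[0] for key in groups})
    col_keys = sorted({key[1] for key in groups})
    # Cells with no matching rows receive fill_value instead of an aggregate.
    return {
        r: {
            c: aggfunc(groups[(r, c)]) if (r, c) in groups else fill_value
            for c in col_keys
        }
        for r in row_keys
    }


records = [
    {"A": "foo", "C": "small", "D": 3},
    {"A": "foo", "C": "small", "D": 3},
    {"A": "bar", "C": "small", "D": 6},
    {"A": "bar", "C": "large", "D": 7},
]
table = pivot_table_records(records, values="D", index="A", columns="C",
                            aggfunc=sum, fill_value=0)
```

Here the `("foo", "large")` cell has no source rows, so it takes the fill value, matching the `foo two` row of the `fill_value=0` example.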
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.area.md
# maxframe.dataframe.DataFrame.plot.area
#### DataFrame.plot.area(\*args, \*\*kwargs)
Draw a stacked area plot.
An area plot displays quantitative data visually.
This function wraps the matplotlib area function.
* **Parameters:**
* **x** (*label* *or* *position* *,* *optional*) – Coordinates for the X axis. By default uses the index.
* **y** (*label* *or* *position* *,* *optional*) – Column to plot. By default uses all columns.
* **stacked** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Area plots are stacked by default. Set to False to create an
unstacked plot.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
Area plot, or array of area plots if subplots is True.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray)
#### SEE ALSO
[`DataFrame.plot`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot)
: Make plots of DataFrame using matplotlib / pylab.
### Examples
The plots for these examples are not reproduced here. The examples cover:
- An area plot based on basic business metrics.
- An unstacked plot, produced by passing `stacked=False` (area plots are stacked by default).
- An area plot for a single column.
- An area plot with a different x.
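To make the effect of `stacked` concrete, the sketch below computes the lower and upper boundary of each column's band. It is a plain-Python illustration with made-up metric values; the helper `stacked_bounds` is hypothetical and does not draw anything.

```python
def stacked_bounds(columns, stacked=True):
    """Return {name: (lower, upper)} band boundaries for each column."""
    names = list(columns)
    lower = [0] * len(columns[names[0]])
    bounds = {}
    for name in names:
        upper = [lo + value for lo, value in zip(lower, columns[name])]
        bounds[name] = (lower, upper)
        if stacked:
            lower = upper  # the next band starts where this one ends
    return bounds


# Illustrative values only.
metrics = {"sales": [3, 2, 3], "signups": [5, 5, 6], "visits": [20, 42, 28]}
stacked_bands = stacked_bounds(metrics)
unstacked_bands = stacked_bounds(metrics, stacked=False)
```

With stacking, each band sits on top of the previous one, so the top edge of the last band is the row total. Without stacking, every band starts at zero and the bands overlap.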
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.bar.md
# maxframe.dataframe.DataFrame.plot.bar
#### DataFrame.plot.bar(\*args, \*\*kwargs)
Vertical bar plot.
A bar plot is a plot that presents categorical data with
rectangular bars with lengths proportional to the values that they
represent. A bar plot shows comparisons among discrete categories. One
axis of the plot shows the specific categories being compared, and the
other axis represents a measured value.
* **Parameters:**
* **x** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
the index of the DataFrame is used.
* **y** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
all numerical columns are used.
* **color** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *array-like* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) –
The color for each of the DataFrame’s columns. Possible values are:
- A single color string referred to by name, RGB or RGBA code,
for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA
code, which will be used for each column recursively. For
instance, with [‘green’, ‘yellow’] each column’s bar will be filled in
green or yellow, alternately. If there is only a single column to
be plotted, then only the first color from the color list will be
used.
- A dict of the form {column name : color}, so that each column will be
colored accordingly. For example, if your columns are called a and
b, then passing {‘a’: ‘green’, ‘b’: ‘red’} will color bars for
column a in green and bars for column b in red.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
An ndarray is returned with one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes)
per column when `subplots=True`.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or np.ndarray of them
#### SEE ALSO
`DataFrame.plot.barh`
: Horizontal bar plot.
[`DataFrame.plot`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot)
: Make plots of a DataFrame.
`matplotlib.pyplot.bar`
: Make a bar plot with matplotlib.
### Examples
The plots for these examples are not reproduced here. The examples cover:
- A basic plot.
- Plotting a whole DataFrame to a bar plot. Each column is assigned a
distinct color, and each row is nested in a group along the
horizontal axis.
- Plotting stacked bar charts for the DataFrame.
- Splitting the figure by column with `subplots=True` instead of nesting. In this case, a [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray) of
[`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) is returned.
- Specifying how each column should be colored if you don’t like the default colours.
- Plotting a single column.
- Plotting only selected categories for the DataFrame.
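The three accepted forms of the `color` parameter can be summarised with a small sketch. It assumes that a color sequence is cycled across columns, which is how the parameter description reads; the helper `assign_colors` is hypothetical, and its handling of a dict that omits a column (a `KeyError`) is an assumption rather than documented behaviour.

```python
from itertools import cycle


def assign_colors(columns, color):
    """Map each column name to a color for the three documented input forms."""
    if isinstance(color, str):
        # A single color string applies to every column.
        return {col: color for col in columns}
    if isinstance(color, dict):
        # A dict names the color of each column explicitly.
        return {col: color[col] for col in columns}
    # A sequence is reused in order once it runs out.
    return dict(zip(columns, cycle(color)))


by_sequence = assign_colors(["a", "b", "c"], ["green", "yellow"])
by_dict = assign_colors(["a", "b"], {"a": "green", "b": "red"})
single_column = assign_colors(["a"], ["green", "yellow"])
```

With a single column only the first color of the sequence is used, as the parameter description states.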
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.barh.md
# maxframe.dataframe.DataFrame.plot.barh
#### DataFrame.plot.barh(\*args, \*\*kwargs)
Make a horizontal bar plot.
A horizontal bar plot is a plot that presents quantitative data with
rectangular bars with lengths proportional to the values that they
represent. A bar plot shows comparisons among discrete categories. One
axis of the plot shows the specific categories being compared, and the
other axis represents a measured value.
* **Parameters:**
* **x** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
the index of the DataFrame is used.
* **y** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
all numerical columns are used.
* **color** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *array-like* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) –
The color for each of the DataFrame’s columns. Possible values are:
- A single color string referred to by name, RGB or RGBA code,
for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA
code, which will be used for each column recursively. For
instance, with [‘green’, ‘yellow’] each column’s bar will be filled in
green or yellow, alternately. If there is only a single column to
be plotted, then only the first color from the color list will be
used.
- A dict of the form {column name : color}, so that each column will be
colored accordingly. For example, if your columns are called a and
b, then passing {‘a’: ‘green’, ‘b’: ‘red’} will color bars for
column a in green and bars for column b in red.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
An ndarray is returned with one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes)
per column when `subplots=True`.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or np.ndarray of them
#### SEE ALSO
`DataFrame.plot.bar`
: Vertical bar plot.
[`DataFrame.plot`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot)
: Make plots of DataFrame using matplotlib.
`matplotlib.axes.Axes.bar`
: Plot a vertical bar plot using matplotlib.
### Examples
The plots for these examples are not reproduced here. The examples cover:
- A basic example.
- Plotting a whole DataFrame to a horizontal bar plot.
- Plotting stacked barh charts for the DataFrame.
- Specifying colors for each column.
- Plotting a column of the DataFrame to a horizontal bar plot.
- Plotting the DataFrame versus the desired column.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.box.md
# maxframe.dataframe.DataFrame.plot.box
#### DataFrame.plot.box(\*args, \*\*kwargs)
Make a box plot of the DataFrame columns.
A box plot is a method for graphically depicting groups of numerical
data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data,
with a line at the median (Q2). The whiskers extend from the edges
of the box to show the range of the data. The position of the whiskers
is set by default to 1.5\*IQR (IQR = Q3 - Q1) from the edges of the
box. Outlier points are those past the end of the whiskers.
For further details see Wikipedia’s
entry for [boxplot](https://en.wikipedia.org/wiki/Box_plot).
A consideration when using this chart is that the box and the whiskers
can overlap, which is very common when plotting small sets of data.
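The quantities a box plot draws can be computed by hand. The sketch below uses the standard library's `statistics.quantiles` with `method="inclusive"` (linear interpolation between data points) and follows the matplotlib convention that whiskers end at the most extreme data points still inside the 1.5\*IQR fences. The helper `box_stats` and the sample data are illustrative; a plotting backend may use a different quantile method.

```python
from statistics import quantiles


def box_stats(data, whis=1.5):
    """Compute quartiles, whisker ends and outliers for one column."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the last data point within each fence.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

For this sample the box spans 3.25 to 7.75, the upper fence is 14.5, and the value 30 is reported as an outlier.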
* **Parameters:**
* **by** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *sequence*) –
Column in the DataFrame to group by.
#### Versionchanged
Changed in version 1.4.0: Previously, by was silently ignored and made no groupings.
* **\*\*kwargs** – Additional keywords are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Return type:**
[`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or numpy.ndarray of them
#### SEE ALSO
`DataFrame.boxplot`
: Another method to draw a box plot.
[`Series.plot.box`](maxframe.dataframe.Series.plot.box.md#maxframe.dataframe.Series.plot.box)
: Draw a box plot from a Series object.
[`matplotlib.pyplot.boxplot`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html#matplotlib.pyplot.boxplot)
: Draw a box plot in matplotlib.
### Examples
Draw a box plot from a DataFrame with four columns of randomly
generated data.
You can also generate groupings if you specify the by parameter (which
can take a column name, or a list or tuple of column names).
#### Versionchanged
Changed in version 1.4.0.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.density.md
# maxframe.dataframe.DataFrame.plot.density
#### DataFrame.plot.density(\*args, \*\*kwargs)
Generate Kernel Density Estimate plot using Gaussian kernels.
In statistics, [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) (KDE) is a non-parametric
way to estimate the probability density function (PDF) of a random
variable. This function uses Gaussian kernels and includes automatic
bandwidth determination.
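As a rough illustration of what the estimate is, the sketch below evaluates a one-dimensional Gaussian KDE in plain Python, using Scott's rule (`n ** (-1/5)` times the sample standard deviation) when no bandwidth is given. The function name and sample values are illustrative. Note one difference in convention: here `bandwidth` is the kernel's standard deviation itself, whereas `scipy.stats.gaussian_kde` treats a scalar `bw_method` as a factor that multiplies the sample standard deviation.

```python
import math
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the PDF of one-dimensional samples."""
    n = len(samples)
    if bandwidth is None:
        # Scott's rule for one-dimensional data.
        bandwidth = stdev(samples) * n ** (-1 / 5)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def pdf(x):
        # Sum one Gaussian bump centred on each sample.
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return pdf


samples = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
pdf = gaussian_kde(samples)

# A Riemann sum over a wide grid should be close to 1 for a valid density.
step = 0.01
area = sum(pdf(-10 + i * step) for i in range(2501)) * step
```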
* **Parameters:**
* **bw_method** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *scalar* *or* *callable* *,* *optional*) – The method used to calculate the estimator bandwidth. This can be
‘scott’, ‘silverman’, a scalar constant or a callable.
If None (default), ‘scott’ is used.
See [`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde) for more information.
* **ind** (*NumPy array* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Evaluation points for the estimated PDF. If None (default),
1000 equally spaced points are used. If ind is a NumPy array, the
KDE is evaluated at the points passed. If ind is an integer,
ind number of equally spaced points are used.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray) of them
#### SEE ALSO
[`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde)
: Representation of a kernel-density estimate using Gaussian kernels. This is the function used internally to estimate the PDF.
### Examples
The plots for these examples are not reproduced here. The examples cover:
- Given a Series of points randomly sampled from an unknown
distribution, estimating its PDF using KDE with automatic
bandwidth determination and plotting the results, evaluated at
1000 equally spaced points (the default).
- Specifying a scalar bandwidth. Using a small bandwidth value can
lead to over-fitting, while using a large bandwidth value may result
in under-fitting.
- Using the ind parameter to determine the evaluation points for the
plot of the estimated PDF.
- The same three cases for a DataFrame, which works in the same way.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.hexbin.md
# maxframe.dataframe.DataFrame.plot.hexbin
#### DataFrame.plot.hexbin(\*args, \*\*kwargs)
Generate a hexagonal binning plot.
Generate a hexagonal binning plot of x versus y. If C is None
(the default), this is a histogram of the number of occurrences
of the observations at `(x[i], y[i])`.
If C is specified, it gives the values at the coordinates
`(x[i], y[i])`. These values are accumulated for each hexagonal
bin and then reduced according to reduce_C_function,
which defaults to NumPy’s mean function (`numpy.mean()`).
(If C is specified, it must also be a 1-D sequence
of the same length as x and y, or a column label.)
* **Parameters:**
* **x** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The column label or position for x points.
* **y** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The column label or position for y points.
* **C** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – The column label or position for the value of (x, y) point.
* **reduce_C_function** (callable, default np.mean) – Function of one argument that reduces all the values in a bin to
a single number (e.g. np.mean, np.max, np.sum, np.std).
* **gridsize** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *(*[*int*](https://docs.python.org/3/library/functions.html#int) *,* [*int*](https://docs.python.org/3/library/functions.html#int) *)* *,* *default 100*) – The number of hexagons in the x-direction.
The corresponding number of hexagons in the y-direction is
chosen in a way that the hexagons are approximately regular.
Alternatively, gridsize can be a tuple with two elements
specifying the number of hexagons in the x-direction and the
y-direction.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
The matplotlib `Axes` on which the hexbin is plotted.
* **Return type:**
matplotlib.AxesSubplot
#### SEE ALSO
[`DataFrame.plot`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot)
: Make plots of a DataFrame.
[`matplotlib.pyplot.hexbin`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hexbin.html#matplotlib.pyplot.hexbin)
: Hexagonal binning plot using matplotlib, the matplotlib function that is used under the hood.
### Examples
The following examples are generated with random data from
a normal distribution.
The next example uses C and np.sum as reduce_C_function.
Note that the ‘observations’ values range from 1 to 5, but the resulting
plot shows values above 25. This is because reduce_C_function
sums all the observations that fall into each bin.
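The effect of `reduce_C_function` can be reproduced on a toy data set. The sketch below is a simplification: it uses square bins rather than hexagons, and the helper `binned_reduce` and its sample points are hypothetical. The accumulate-then-reduce step is the part being illustrated.

```python
from collections import defaultdict
from statistics import mean


def binned_reduce(xs, ys, cs, bin_size, reduce_c=mean):
    """Collect C values per (square) bin, then reduce each bin to one number."""
    bins = defaultdict(list)
    for x, y, c in zip(xs, ys, cs):
        bins[(int(x // bin_size), int(y // bin_size))].append(c)
    return {cell: reduce_c(values) for cell, values in bins.items()}


xs = [0.1, 0.2, 0.3, 1.5]
ys = [0.1, 0.4, 0.2, 1.5]
observations = [5, 4, 5, 3]  # every value lies between 1 and 5

by_mean = binned_reduce(xs, ys, observations, bin_size=1)
by_sum = binned_reduce(xs, ys, observations, bin_size=1, reduce_c=sum)
```

With the mean, no bin can exceed the largest observation. With the sum, a bin holding three points already reaches 14, which is why a summed plot can show values far above the range of the inputs.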
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.hist.md
# maxframe.dataframe.DataFrame.plot.hist
#### DataFrame.plot.hist(\*args, \*\*kwargs)
Draw one histogram of the DataFrame’s columns.
A histogram is a representation of the distribution of data.
This function groups the values of all given Series in the DataFrame
into bins and draws all bins in one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes).
This is useful when the DataFrame’s Series are in a similar scale.
* **Parameters:**
* **by** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *sequence* *,* *optional*) –
Column in the DataFrame to group by.
#### Versionchanged
Changed in version 1.4.0: Previously, by was silently ignored and made no groupings.
* **bins** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 10*) – Number of histogram bins to be used.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
A histogram plot.
* **Return type:**
matplotlib.AxesSubplot
#### SEE ALSO
`DataFrame.hist`
: Draw histograms per DataFrame’s Series.
`Series.hist`
: Draw a histogram with Series’ data.
### Examples
When we roll a die 6000 times, we expect to get each value around 1000
times. But when we roll two dice and sum the result, the distribution
is going to be quite different. A histogram illustrates those
distributions.
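The dice experiment can be simulated with the standard library alone. This sketch counts outcomes and prints a text histogram instead of drawing a plot; the seed, the `text_histogram` helper and the scale are arbitrary choices for illustration.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable
n_rolls = 6000

one_die = Counter(rng.randint(1, 6) for _ in range(n_rolls))
two_dice = Counter(
    rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)
)


def text_histogram(counts, scale=50):
    """Render one row per value, with one '#' per `scale` occurrences."""
    return "\n".join(
        f"{value:>2} {'#' * (counts[value] // scale)}"
        for value in sorted(counts)
    )


print(text_histogram(one_die))
print(text_histogram(two_dice))
```

The single-die counts come out roughly flat, while the two-dice sums peak around 7 and thin out towards 2 and 12.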
A grouped histogram can be generated by providing the parameter by (which
can be a column name, or a list of column names).
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.kde.md
# maxframe.dataframe.DataFrame.plot.kde
#### DataFrame.plot.kde(\*args, \*\*kwargs)
Generate Kernel Density Estimate plot using Gaussian kernels.
In statistics, [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) (KDE) is a non-parametric
way to estimate the probability density function (PDF) of a random
variable. This function uses Gaussian kernels and includes automatic
bandwidth determination.
* **Parameters:**
* **bw_method** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *scalar* *or* *callable* *,* *optional*) – The method used to calculate the estimator bandwidth. This can be
‘scott’, ‘silverman’, a scalar constant or a callable.
If None (default), ‘scott’ is used.
See [`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde) for more information.
* **ind** (*NumPy array* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Evaluation points for the estimated PDF. If None (default),
1000 equally spaced points are used. If ind is a NumPy array, the
KDE is evaluated at the points passed. If ind is an integer,
ind number of equally spaced points are used.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray) of them
#### SEE ALSO
[`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde)
: Representation of a kernel-density estimate using Gaussian kernels. This is the function used internally to estimate the PDF.
### Examples
The plots for these examples are not reproduced here. The examples cover:
- Given a Series of points randomly sampled from an unknown
distribution, estimating its PDF using KDE with automatic
bandwidth determination and plotting the results, evaluated at
1000 equally spaced points (the default).
- Specifying a scalar bandwidth. Using a small bandwidth value can
lead to over-fitting, while using a large bandwidth value may result
in under-fitting.
- Using the ind parameter to determine the evaluation points for the
plot of the estimated PDF.
- The same three cases for a DataFrame, which works in the same way.
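The over-fitting and under-fitting behaviour of the bandwidth can be checked numerically by counting the peaks of the estimated density. The sketch below is a plain-Python illustration with made-up samples forming two clusters; `kde` and `count_peaks` are hypothetical helpers, and `bandwidth` here is the kernel's standard deviation directly rather than a scipy-style scaling factor.

```python
import math


def kde(samples, bandwidth):
    """Gaussian KDE with an explicit kernel standard deviation."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return lambda x: sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    ) / norm


def count_peaks(pdf, start, stop, step):
    """Count strict local maxima of pdf on an evenly spaced grid."""
    n = int(round((stop - start) / step))
    values = [pdf(start + i * step) for i in range(n + 1)]
    return sum(
        1 for a, b, c in zip(values, values[1:], values[2:]) if a < b > c
    )


samples = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]  # two clusters of three points
narrow_peaks = count_peaks(kde(samples, 0.1), -20.0, 30.0, 0.05)
wide_peaks = count_peaks(kde(samples, 5.0), -20.0, 30.0, 0.05)
```

A very small bandwidth produces one spike per sample (over-fitting), while a very large one merges both clusters into a single bump (under-fitting).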
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.line.md
# maxframe.dataframe.DataFrame.plot.line
#### DataFrame.plot.line(\*args, \*\*kwargs)
Plot Series or DataFrame as lines.
This function is useful to plot lines using DataFrame’s values
as coordinates.
* **Parameters:**
* **x** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
the index of the DataFrame is used.
* **y** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
all numerical columns are used.
* **color** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *array-like* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) –
The color for each of the DataFrame’s columns. Possible values are:
- A single color string referred to by name, RGB or RGBA code,
for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA
code, which will be used for each column recursively. For
instance, with [‘green’, ‘yellow’] each column’s line will be filled in
green or yellow, alternately. If there is only a single column to
be plotted, then only the first color from the color list will be
used.
- A dict of the form {column name : color}, so that each column will be
colored accordingly. For example, if your columns are called a and
b, then passing {‘a’: ‘green’, ‘b’: ‘red’} will color lines for
column a in green and lines for column b in red.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
An ndarray is returned with one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes)
per column when `subplots=True`.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or np.ndarray of them
#### SEE ALSO
`matplotlib.pyplot.plot`
: Plot y versus x as lines and/or markers.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.md
# maxframe.dataframe.DataFrame.plot
#### DataFrame.plot()
Make plots of Series or DataFrame.
Uses the backend specified by the
option `plotting.backend`. By default, matplotlib is used.
* **Parameters:**
* **data** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – The object for which the method is called.
* **x** (*label* *or* *position* *,* *default None*) – Only used if data is a DataFrame.
* **y** (*label* *,* *position* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *label* *,* *positions* *,* *default None*) – Allows plotting of one column versus another. Only used if data is a
DataFrame.
* **kind** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) –
The kind of plot to produce:
- ’line’ : line plot (default)
- ’bar’ : vertical bar plot
- ’barh’ : horizontal bar plot
- ’hist’ : histogram
- ’box’ : boxplot
- ’kde’ : Kernel Density Estimation plot
- ’density’ : same as ‘kde’
- ’area’ : area plot
- ’pie’ : pie plot
- ’scatter’ : scatter plot (DataFrame only)
- ’hexbin’ : hexbin plot (DataFrame only)
* **ax** (*matplotlib axes object* *,* *default None*) – An axes of the current figure.
* **subplots** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *sequence* *of* *iterables* *,* *default False*) –
Whether to group columns into subplots:
- `False` : No subplots will be used
- `True` : Make separate subplots for each column.
- sequence of iterables of column labels: Create a subplot for each
group of columns. For example [(‘a’, ‘c’), (‘b’, ‘d’)] will
create 2 subplots: one with columns ‘a’ and ‘c’, and one
with columns ‘b’ and ‘d’. Remaining columns that aren’t specified
will be plotted in additional subplots (one per column).
#### Versionadded
Added in version 1.5.0.
* **sharex** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True if ax is None else False*) – In case `subplots=True`, share x axis and set some x axis labels
to invisible; defaults to True if ax is None, otherwise False if
an ax is passed in. Be aware that passing in both an ax and
`sharex=True` will alter all x axis labels for all axes in a figure.
* **sharey** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – In case `subplots=True`, share y axis and set some y axis labels to invisible.
* **layout** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *optional*) – (rows, columns) for the layout of subplots.
* **figsize** (*a tuple* *(**width* *,* *height* *)* *in inches*) – Size of a figure object.
* **use_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Use index as ticks for x axis.
* **title** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list)) – Title to use for the plot. If a string is passed, print the string
at the top of the figure. If a list is passed and subplots is
True, print each item in the list above the corresponding subplot.
* **grid** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default None* *(**matlab style default* *)*) – Axis grid lines.
* **legend** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *{'reverse'}*) – Place legend on axis subplots.
* **style** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – The matplotlib line style per column.
* **logx** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *'sym'* *,* *default False*) – Use log scaling or symlog scaling on x axis.
* **logy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *'sym'* *,* *default False*) – Use log scaling or symlog scaling on y axis.
* **loglog** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *'sym'* *,* *default False*) – Use log scaling or symlog scaling on both x and y axes.
* **xticks** (*sequence*) – Values to use for the xticks.
* **yticks** (*sequence*) – Values to use for the yticks.
* **xlim** (*2-tuple/list*) – Set the x limits of the current axes.
* **ylim** (*2-tuple/list*) – Set the y limits of the current axes.
* **xlabel** (*label* *,* *optional*) –
Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the
x-column name for planar plots.
#### Versionchanged
Changed in version 2.0.0: Now applicable to histograms.
* **ylabel** (*label* *,* *optional*) –
Name to use for the ylabel on y-axis. Default will show no ylabel, or the
y-column name for planar plots.
#### Versionchanged
Changed in version 2.0.0: Now applicable to histograms.
* **rot** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default None*) – Rotation for ticks (xticks for vertical, yticks for horizontal
plots).
* **fontsize** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default None*) – Font size for xticks and yticks.
* **colormap** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *matplotlib colormap object* *,* *default None*) – Colormap to select colors from. If string, load colormap with that
name from matplotlib.
* **colorbar** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, plot colorbar (only relevant for ‘scatter’ and ‘hexbin’
plots).
* **position** ([*float*](https://docs.python.org/3/library/functions.html#float)) – Specify relative alignments for bar plot layout.
From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
(center).
* **table** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* *default False*) – If True, draw a table using the data in the DataFrame and the data
will be transposed to meet matplotlib’s default layout.
If a Series or DataFrame is passed, use passed data to draw a
table.
* **yerr** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *array-like* *,* *dict and str*) – See [Plotting with Error Bars](https://pandas.pydata.org/docs/user_guide/visualization.html#visualization-errorbars) for
detail.
* **xerr** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *array-like* *,* *dict and str*) – Equivalent to yerr.
* **stacked** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False in line and bar plots* *,* *and True in area plot*) – If True, create stacked plot.
* **secondary_y** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *sequence* *,* *default False*) – Whether to plot on the secondary y-axis. If a list/tuple is passed, it names the
columns to plot on the secondary y-axis.
* **mark_right** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – When using a secondary_y axis, automatically mark the column
labels with “(right)” in the legend.
* **include_bool** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default is False*) – If True, boolean values can be plotted.
* **backend** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Backend to use instead of the backend specified in the option
`plotting.backend`. For instance, ‘matplotlib’. Alternatively, to
specify the `plotting.backend` for the whole session, set
`pd.options.plotting.backend`.
* **\*\*kwargs** – Options to pass to matplotlib plotting method.
* **Returns:**
If the backend is not the default matplotlib one, the return value
will be the object returned by the backend.
* **Return type:**
[`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or numpy.ndarray of them
### Notes
- See matplotlib documentation online for more on this subject
- If kind = ‘bar’ or ‘barh’, you can specify relative alignments
for bar plot layout by position keyword.
From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
(center)
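The grouping rules described for the `subplots` parameter can be sketched in plain Python. The helper `resolve_subplot_groups` below is hypothetical (it is not part of MaxFrame, pandas or matplotlib) and only computes which columns would share a subplot under the documented rules; it does not draw anything.

```python
def resolve_subplot_groups(columns, subplots):
    """Return the column groups, one tuple per subplot.

    Hypothetical helper mirroring the documented `subplots` rules;
    it is an illustration, not the library's implementation.
    """
    columns = list(columns)
    if subplots is False:
        # No subplots: every column is drawn on the same axes.
        return [tuple(columns)]
    if subplots is True:
        # One subplot per column.
        return [(col,) for col in columns]
    # Sequence of iterables: each iterable becomes one subplot.
    groups = [tuple(group) for group in subplots]
    grouped = {col for group in groups for col in group}
    # Columns not named in any group get one extra subplot each.
    groups.extend((col,) for col in columns if col not in grouped)
    return groups


groups = resolve_subplot_groups(
    ["a", "b", "c", "d", "e"], [("a", "c"), ("b", "d")]
)
print(groups)
```

Here column `e` is not named in any group, so it ends up in its own additional subplot.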
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.pie.md
# maxframe.dataframe.DataFrame.plot.pie
#### DataFrame.plot.pie(\*args, \*\*kwargs)
Generate a pie plot.
A pie plot is a proportional representation of the numerical data in a
column. This function wraps `matplotlib.pyplot.pie()` for the
specified column. If no column reference is passed and
`subplots=True` a pie plot is drawn for each numerical column
independently.
* **Parameters:**
* **y** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label* *,* *optional*) – Label or position of the column to plot.
If not provided, `subplots=True` argument must be passed.
* **\*\*kwargs** – Keyword arguments to pass on to [`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
A NumPy array is returned when subplots is True.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or np.ndarray of them
#### SEE ALSO
[`Series.plot.pie`](maxframe.dataframe.Series.plot.pie.md#maxframe.dataframe.Series.plot.pie)
: Generate a pie plot for a Series.
[`DataFrame.plot`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot)
: Make plots of a DataFrame.
### Examples
For example, given a DataFrame with information about the mass and
radius of several planets, passing the ‘mass’ column to the
pie function produces a pie plot of the masses.
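To make "proportional representation" concrete, the following plain-Python sketch computes the share of the whole that each wedge of such a pie plot would represent. The helper `pie_fractions` and the planet masses are illustrative assumptions, not part of MaxFrame; no plotting backend is involved.

```python
def pie_fractions(values):
    """Map each label to its share of the total.

    Hypothetical helper: it only reproduces the arithmetic behind a
    pie plot and rejects negative values, which a pie cannot show.
    """
    if any(value < 0 for value in values.values()):
        raise ValueError("pie plots cannot represent negative values")
    total = sum(values.values())
    if total == 0:
        raise ValueError("values must not sum to zero")
    return {label: value / total for label, value in values.items()}


# Illustrative 'mass' column, indexed by planet name.
mass = {"Mercury": 0.330, "Venus": 4.87, "Earth": 5.97}
fractions = pie_fractions(mass)
for label, share in fractions.items():
    print(f"{label}: {share:.1%}")
```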
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.scatter.md
# maxframe.dataframe.DataFrame.plot.scatter
#### DataFrame.plot.scatter(\*args, \*\*kwargs)
Create a scatter plot with varying marker point size and color.
The coordinates of each point are defined by two dataframe columns and
filled circles are used to represent each point. This kind of plot is
useful to see complex correlations between two variables. Points could
be for instance natural 2D coordinates like longitude and latitude in
a map or, in general, any pair of metrics that can be plotted against
each other.
* **Parameters:**
* **x** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The column name or column position to be used as horizontal
coordinates for each point.
* **y** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The column name or column position to be used as vertical
coordinates for each point.
* **s** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *scalar* *or* *array-like* *,* *optional*) –
The size of each point. Possible values are:
- A string with the name of the column to be used for marker’s size.
- A single scalar so all points have the same size.
- A sequence of scalars, which will be used for each point’s size
cyclically. For instance, when passing [2,14] the point sizes
alternate between 2 and 14.
* **c** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*int*](https://docs.python.org/3/library/functions.html#int) *or* *array-like* *,* *optional*) –
The color of each point. Possible values are:
- A single color string referred to by name, RGB or RGBA code,
for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA
code, which will be used for each point’s color cyclically. For
instance, with ['green', 'yellow'] the points are filled alternately
in green and yellow.
- A column name or position whose values will be used to color the
marker points according to a colormap.
* **\*\*kwargs** – Keyword arguments to pass on to [`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Return type:**
[`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or numpy.ndarray of them
#### SEE ALSO
[`matplotlib.pyplot.scatter`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html#matplotlib.pyplot.scatter)
: Scatter plot using multiple input data formats.
### Examples
Let’s see how to draw a scatter plot using coordinates from the values
in a DataFrame’s columns.
And now with the color determined by a column as well.
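The way a short sequence passed as `s` or `c` is reused across points can be sketched with `itertools`. The helper `recycle` is hypothetical and only illustrates the repetition described in the parameter list; the actual assignment happens inside the plotting backend.

```python
from itertools import cycle, islice


def recycle(values, n_points):
    """Repeat `values` in order until there is one entry per point.

    Hypothetical helper illustrating how a short sequence of sizes or
    colors is spread over all points.
    """
    return list(islice(cycle(values), n_points))


sizes = recycle([2, 14], 5)
colors = recycle(["green", "yellow"], 5)
print(sizes)
print(colors)
```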
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.pop.md
# maxframe.dataframe.DataFrame.pop
#### DataFrame.pop(item)
Return item and drop from frame. Raise KeyError if not found.
* **Parameters:**
**item** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Label of column to be popped.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([('falcon', 'bird', 389.0),
... ('parrot', 'bird', 24.0),
... ('lion', 'mammal', 80.5),
... ('monkey', 'mammal', np.nan)],
... columns=('name', 'class', 'max_speed'))
>>> df.execute()
name class max_speed
0 falcon bird 389.0
1 parrot bird 24.0
2 lion mammal 80.5
3 monkey mammal NaN
```
```pycon
>>> df.pop('class').execute()
0 bird
1 bird
2 mammal
3 mammal
Name: class, dtype: object
```
```pycon
>>> df.execute()
name max_speed
0 falcon 389.0
1 parrot 24.0
2 lion 80.5
3 monkey NaN
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.pow.md
# maxframe.dataframe.DataFrame.pow
#### DataFrame.pow(other, axis='columns', level=None, fill_value=None)
Get Exponential power of dataframe and other, element-wise (binary operator pow).
Equivalent to `**`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rpow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar; the operator version and the method version return the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
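The examples above cover the sibling wrappers rather than `pow` itself. As a rough illustration of the `fill_value` and index-union rules applied to exponentiation, here is a plain-Python sketch on label-to-value dictionaries. The helper `pow_with_fill` is an assumption made for this illustration and is not how MaxFrame evaluates the operation.

```python
import math


def pow_with_fill(left, right, fill_value=None):
    """Element-wise left ** right over the union of labels.

    Hypothetical helper working on dicts of label -> float. A label
    missing on one side (or holding NaN) is replaced by `fill_value`;
    if it is missing on both sides the result stays NaN.
    """
    result = {}
    for label in sorted(set(left) | set(right)):
        a = left.get(label, math.nan)
        b = right.get(label, math.nan)
        a_missing, b_missing = math.isnan(a), math.isnan(b)
        if a_missing and b_missing:
            # Missing in both inputs: the result is missing.
            result[label] = math.nan
            continue
        if fill_value is not None:
            a = fill_value if a_missing else a
            b = fill_value if b_missing else b
        result[label] = a ** b
    return result


base = {"circle": 2.0, "triangle": 3.0, "hexagon": math.nan}
exponent = {"triangle": 2.0, "square": 3.0, "hexagon": math.nan}
filled = pow_with_fill(base, exponent, fill_value=1.0)
unfilled = pow_with_fill(base, exponent)
print(filled)
print(unfilled)
```

With `fill_value=1.0`, `circle` becomes `2.0 ** 1.0` and `square` becomes `1.0 ** 3.0`, while `hexagon` stays missing because both inputs are missing.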
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.prod.md
# maxframe.dataframe.DataFrame.prod
#### DataFrame.prod(axis=None, skipna=True, level=None, min_count=0, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.product.md
# maxframe.dataframe.DataFrame.product
#### DataFrame.product(axis=None, skipna=True, level=None, min_count=0, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.quantile.md
# maxframe.dataframe.DataFrame.quantile
#### DataFrame.quantile(q=0.5, axis=0, numeric_only=True, interpolation='linear')
Return values at the given quantile over requested axis.
* **Parameters:**
* **q** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array-like* *,* *default 0.5* *(**50% quantile* *)*) – Value between 0 <= q <= 1, the quantile(s) to compute.
* **axis** ( *{0* *,* *1* *,* *'index'* *,* *'columns'}* *(**default 0* *)*) – Equals 0 or ‘index’ for row-wise, 1 or ‘columns’ for column-wise.
* **numeric_only** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If False, the quantile of datetime and timedelta data will be
computed as well.
* **interpolation** ( *{'linear'* *,* *'lower'* *,* *'higher'* *,* *'midpoint'* *,* *'nearest'}*) –
This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points i and j:
* linear: i + (j - i) \* fraction, where fraction is the
fractional part of the index surrounded by i and j.
* lower: i.
* higher: j.
* nearest: i or j whichever is nearest.
* midpoint: (i + j) / 2.
* **Returns:**
If `q` is an array or a tensor, a DataFrame will be returned where the
index is `q`, the columns are the columns of self, and the
values are the quantiles.
If `q` is a float, a Series will be returned where the
index is the columns of self and the values are the quantiles.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`core.window.Rolling.quantile`
: Rolling quantile.
[`numpy.percentile`](https://numpy.org/doc/stable/reference/generated/numpy.percentile.html#numpy.percentile)
: Numpy function to compute the percentile.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
... columns=['a', 'b'])
>>> df.quantile(.1).execute()
a 1.3
b 3.7
Name: 0.1, dtype: float64
```
```pycon
>>> df.quantile([.1, .5]).execute()
a b
0.1 1.3 3.7
0.5 2.5 55.0
```
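The interpolation rules listed under `interpolation` can be checked by hand with a small plain-Python sketch. The function `quantile` below is a hypothetical single-column helper, not MaxFrame's implementation; in particular, its tie-breaking for `'nearest'` is an assumption and may differ from the library.

```python
import math


def quantile(values, q, interpolation="linear"):
    """Quantile of one column using the documented interpolation rules.

    Hypothetical helper for illustration only.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must satisfy 0 <= q <= 1")
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    # Fractional position of the quantile within the sorted data.
    position = q * (len(data) - 1)
    lower_idx = math.floor(position)
    upper_idx = math.ceil(position)
    i, j = data[lower_idx], data[upper_idx]
    fraction = position - lower_idx
    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        # Assumption: exact ties go to the lower point in this sketch.
        return i if fraction <= 0.5 else j
    raise ValueError(f"unknown interpolation: {interpolation!r}")


a = [1, 2, 3, 4]
b = [1, 10, 100, 100]
print(quantile(a, 0.1), quantile(b, 0.1))
print(quantile(a, 0.5), quantile(b, 0.5))
```

For column `a` and `q=0.1` the position is `0.1 * 3 = 0.3`, so the result is `1 + (2 - 1) * 0.3`, which agrees (up to floating-point rounding) with the `1.3` shown in the example above.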
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.query.md
# maxframe.dataframe.DataFrame.query
#### DataFrame.query(expr, inplace=False, \*\*kwargs)
Query the columns of a DataFrame with a boolean expression.
* **Parameters:**
* **expr** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) –
The query string to evaluate.
You can refer to variables
in the environment by prefixing them with an ‘@’ character like
`@a + b`.
You can refer to column names that contain spaces or operators by
surrounding them in backticks. This way you can also escape
names that start with a digit, or those that are a Python keyword;
in short, any name that is not a valid Python identifier. See the Notes
below for more details.
For example, if one of your columns is called `a a` and you want
to sum it with `b`, your query should be `` `a a` + b ``.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – Whether the query should modify the data in place or return
a modified copy.
* **\*\*kwargs** – See the documentation for [`eval()`](maxframe.dataframe.eval.md#maxframe.dataframe.eval) for complete details
on the keyword arguments accepted by [`DataFrame.query()`](#maxframe.dataframe.DataFrame.query).
* **Returns:**
DataFrame resulting from the provided query expression.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`eval`](maxframe.dataframe.eval.md#maxframe.dataframe.eval)
: Evaluate a string describing operations on DataFrame columns.
[`DataFrame.eval`](maxframe.dataframe.DataFrame.eval.md#maxframe.dataframe.DataFrame.eval)
: Evaluate a string describing operations on DataFrame columns.
### Notes
The result of the evaluation of this expression is first passed to
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc) and if that fails because of a
multidimensional key (e.g., a DataFrame) then the result will be passed
to `DataFrame.__getitem__()`.
This method uses the top-level [`eval()`](maxframe.dataframe.eval.md#maxframe.dataframe.eval) function to
evaluate the passed query.
The [`query()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query) method uses a slightly
modified Python syntax by default. For example, the `&` and `|`
(bitwise) operators have the precedence of their boolean cousins,
[`and`](https://docs.python.org/3/reference/expressions.html#and) and [`or`](https://docs.python.org/3/reference/expressions.html#or). This *is* syntactically valid Python,
however the semantics are different.
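The precedence difference can be seen in plain Python, independent of any DataFrame. In the sketch below the variables `a` and `b` are ordinary integers chosen for illustration: without parentheses, Python evaluates `&` before the comparisons, so the expression becomes a chained comparison.

```python
a, b = 3, 2

# Plain Python: `&` binds tighter than `>`, so this is parsed as
# a > (1 & b) > 1, i.e. 3 > 0 > 1.
python_default = a > 1 & b > 1

# Explicit parentheses give the element-wise "and" most users intend.
explicit = (a > 1) & (b > 1)

# The boolean operator `and` has the lower precedence that the query
# parser also gives to `&`.
boolean = a > 1 and b > 1

print(python_default, explicit, boolean)
```

This is why an unparenthesized query string such as `'A > 1 & B > 1'` reads naturally under the default parser, while the equivalent plain-Python indexing expression needs parentheses around each comparison.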
You can change the semantics of the expression by passing the keyword
argument `parser='python'`. This enforces the same semantics as
evaluation in Python space. Likewise, you can pass `engine='python'`
to evaluate an expression using Python itself as a backend. This is not
recommended as it is inefficient compared to using `numexpr` as the
engine.
The [`DataFrame.index`](maxframe.dataframe.DataFrame.index.md#maxframe.dataframe.DataFrame.index) and
[`DataFrame.columns`](maxframe.dataframe.DataFrame.columns.md#maxframe.dataframe.DataFrame.columns) attributes of the
[`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) instance are placed in the query namespace
by default, which allows you to treat both the index and columns of the
frame as a column in the frame.
The identifier `index` is used for the frame index; you can also
use the name of the index to identify it in a query. Please note that
Python keywords may not be used as identifiers.
For further details and examples see the `query` documentation in
[indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-query).
*Backtick quoted variables*
Backtick quoted variables are parsed as literal Python code and
are converted internally to a Python valid identifier.
This can lead to the following problems.
During parsing a number of disallowed characters inside the backtick
quoted string are replaced by strings that are allowed as a Python identifier.
These characters include all operators in Python, the space character, the
question mark, the exclamation mark, the dollar sign, and the euro sign.
For other characters that fall outside the ASCII range (U+0001..U+007F)
and those that are not further specified in PEP 3131,
the query parser will raise an error.
This excludes whitespace different than the space character,
but also the hashtag (as it is used for comments) and the backtick
itself (backtick can also not be escaped).
In a special case, quotes that make a pair around a backtick can
confuse the parser.
For example, `` `it's` > `that's` `` will raise an error,
as it forms a quoted string (`` 's > `that' ``) with a backtick inside.
See also the Python documentation about lexical analysis
([https://docs.python.org/3/reference/lexical_analysis.html](https://docs.python.org/3/reference/lexical_analysis.html))
in combination with the source code in `pandas.core.computation.parsing`.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': range(1, 6),
... 'B': range(10, 0, -2),
... 'C C': range(10, 5, -1)})
>>> df.execute()
A B C C
0 1 10 10
1 2 8 9
2 3 6 8
3 4 4 7
4 5 2 6
>>> df.query('A > B').execute()
A B C C
4 5 2 6
```
The previous expression is equivalent to
```pycon
>>> df[df.A > df.B].execute()
A B C C
4 5 2 6
```
For columns with spaces in their name, you can use backtick quoting.
```pycon
>>> df.query('B == `C C`').execute()
A B C C
0 1 10 10
```
The previous expression is equivalent to
```pycon
>>> df[df.B == df['C C']].execute()
A B C C
0 1 10 10
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.radd.md
# maxframe.dataframe.DataFrame.radd
#### DataFrame.radd(other, axis='columns', level=None, fill_value=None)
Get Addition of dataframe and other, element-wise (binary operator radd).
Equivalent to `+`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, add.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar; the operator version and the method version return the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
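The examples above use the forward methods. What distinguishes a reverse method such as `radd` or `rdiv` is only the operand order: `other` goes on the left. The plain-Python sketch below illustrates this on simple lists; `reflected` and `safe_div` are hypothetical helpers, and the zero handling in `safe_div` is an assumption that ignores the sign of a zero divisor.

```python
import math
import operator


def reflected(op, column, other):
    """Apply op(other, x) to each x in column (operands swapped)."""
    return [op(other, x) for x in column]


def safe_div(a, b):
    """Float division returning inf or NaN instead of raising on zero."""
    if b == 0:
        return math.nan if a == 0 else math.copysign(math.inf, a)
    return a / b


angles = [0, 3, 4]
rdiv_result = reflected(safe_div, angles, 10)          # 10 / x
radd_result = reflected(operator.add, ["x", "y"], "pre_")  # "pre_" + x
add_result = [x + "pre_" for x in ["x", "y"]]          # x + "pre_"
print(rdiv_result)
print(radd_result, add_result)
```

For numbers, addition is commutative, so `radd` and `add` give the same values; the string case is included only to make the operand order visible.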
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rank.md
# maxframe.dataframe.DataFrame.rank
#### DataFrame.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)
Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the
ranks of those values.
* **Parameters:**
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Index to direct ranking.
* **method** ( *{'average'* *,* *'min'* *,* *'max'* *,* *'first'* *,* *'dense'}* *,* *default 'average'*) –
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
* **numeric_only** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – For DataFrame objects, rank only numeric columns if set to True.
* **na_option** ( *{'keep'* *,* *'top'* *,* *'bottom'}* *,* *default 'keep'*) –
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign lowest rank to NaN values
* bottom: assign highest rank to NaN values
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether or not the elements should be ranked in ascending order.
* **pct** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether or not to display the returned rankings in percentile
form.
* **Returns:**
Return a Series or DataFrame with data ranks as values.
* **Return type:**
same type as caller
#### SEE ALSO
`core.groupby.GroupBy.rank`
: Rank of values within each group.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
... 'spider', 'snake'],
... 'Number_legs': [4, 2, 4, 8, mt.nan]})
>>> df.execute()
Animal Number_legs
0 cat 4.0
1 penguin 2.0
2 dog 4.0
3 spider 8.0
4 snake NaN
```
The following example shows how the method behaves with the above
parameters:
* default_rank: this is the default behaviour obtained without using
any parameter.
* max_rank: setting `method = 'max'` the records that have the
same values are ranked using the highest rank (e.g.: since ‘cat’
and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)
* NA_bottom: choosing `na_option = 'bottom'`, if there are records
with NaN values they are placed at the bottom of the ranking.
* pct_rank: when setting `pct = True`, the ranking is expressed as
percentile rank.
```pycon
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df.execute()
Animal Number_legs default_rank max_rank NA_bottom pct_rank
0 cat 4.0 2.5 3.0 2.5 0.625
1 penguin 2.0 1.0 1.0 1.0 0.250
2 dog 4.0 2.5 3.0 2.5 0.625
3 spider 8.0 4.0 4.0 4.0 1.000
4 snake NaN NaN NaN 5.0 NaN
```
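The tie-handling rules can be reproduced for a single column with a short plain-Python sketch. The function `rank` below is a hypothetical helper, not MaxFrame's implementation. It covers only `na_option='keep'` and `'bottom'`, and its `pct` handling simply divides by the number of ranked values, which is an assumption that matches the table above but may not hold for every method.

```python
import math


def rank(values, method="average", na_option="keep", ascending=True, pct=False):
    """Rank a list of floats following the documented tie rules.

    Hypothetical helper for illustration only.
    """
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method!r}")
    if na_option not in {"keep", "bottom"}:
        raise ValueError("this sketch supports na_option 'keep' or 'bottom'")
    # Sort the non-missing values; the sort is stable, so ties keep
    # their original order (needed for method='first').
    valid = sorted(
        ((v, pos) for pos, v in enumerate(values) if not math.isnan(v)),
        key=lambda item: item[0],
        reverse=not ascending,
    )
    ranks = [math.nan] * len(values)
    start, group_no = 0, 0
    while start < len(valid):
        end = start
        while end + 1 < len(valid) and valid[end + 1][0] == valid[start][0]:
            end += 1
        group_no += 1
        first, last = start + 1, end + 1  # 1-based span of the tie group
        for offset, (_, pos) in enumerate(valid[start:end + 1]):
            ranks[pos] = float({
                "average": (first + last) / 2,
                "min": first,
                "max": last,
                "first": first + offset,
                "dense": group_no,
            }[method])
        start = end + 1
    if na_option == "bottom" and len(valid) < len(values):
        # Missing values share the trailing positions (averaged here).
        nan_rank = (len(valid) + 1 + len(values)) / 2
        ranks = [nan_rank if math.isnan(r) else r for r in ranks]
    if pct:
        ranked = sum(not math.isnan(r) for r in ranks)
        if ranked:
            ranks = [r / ranked for r in ranks]
    return ranks


legs = [4.0, 2.0, 4.0, 8.0, math.nan]
print(rank(legs))
print(rank(legs, method="max"))
print(rank(legs, na_option="bottom"))
print(rank(legs, pct=True))
```

The two values of 4.0 occupy positions 2 and 3 after sorting, so `'average'` gives 2.5 and `'max'` gives 3, in line with the `default_rank` and `max_rank` columns above.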
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rdiv.md
# maxframe.dataframe.DataFrame.rdiv
#### DataFrame.rdiv(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
Equivalent to `/`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar with the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
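To make the "reverse" direction and the `fill_value` rule concrete, here is a per-cell sketch in plain Python. It is an illustration under stated assumptions, not MaxFrame's implementation: `rdiv_cell` is a hypothetical helper, `None` stands in for NaN, and division by zero is mapped to `inf`/`nan` to mirror the `inf` shown in the `df.rdiv(10)` output above.

```python
import math


def rdiv_cell(cell, other, fill_value=None):
    """Reverse true division for one cell: other / cell (illustrative helper)."""
    if cell is None and other is None:
        return None  # missing on both sides stays missing
    if fill_value is not None:
        # Substitute fill_value for whichever single side is missing.
        cell = fill_value if cell is None else cell
        other = fill_value if other is None else other
    if cell is None or other is None:
        return None
    if cell == 0:
        # Float-style division by zero instead of raising ZeroDivisionError.
        return math.nan if other == 0 else math.copysign(math.inf, other)
    return other / cell


print(rdiv_cell(0, 10))                   # the 'circle' / 'angles' cell
print(round(rdiv_cell(360, 10), 6))       # the 'circle' / 'degrees' cell
print(rdiv_cell(None, 10, fill_value=2))  # missing cell replaced before dividing
```

The operand order is the only difference from `div`: `df.div(10)` computes `cell / 10`, while `df.rdiv(10)` computes `10 / cell`.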
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.reindex.md
# maxframe.dataframe.DataFrame.reindex
#### DataFrame.reindex(labels=None, \*, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=None, limit=None, tolerance=None, enable_sparse=False)
Conform Series/DataFrame to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
`copy=False`.
* **Parameters:**
* **labels** (*array-like* *,* *optional*) – New labels / index to conform the axis specified by ‘axis’ to.
* **index** (*array-like* *,* *optional*) – New labels / index to conform to, should be specified using
keywords. Preferably an Index object to avoid duplicating data.
* **columns** (*array-like* *,* *optional*) – New labels / index to conform to, should be specified using
keywords. Preferably an Index object to avoid duplicating data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Axis to target. Can be either the axis name (‘index’, ‘columns’)
or number (0, 1).
* **method** ( *{None* *,* *'backfill'/'bfill'* *,* *'pad'/'ffill'* *,* *'nearest'}*) –
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* None (default): don’t fill gaps
* pad / ffill: Propagate last valid observation forward to next
valid.
* backfill / bfill: Use next valid observation to fill gap.
* nearest: Use nearest valid observations to fill gap.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Return a new object, even if the passed indexes are the same.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** (*scalar* *,* *default np.NaN*) – Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Maximum number of consecutive elements to forward or backward fill.
* **tolerance** (*optional*) –
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation `abs(index[indexer] - target) <= tolerance`.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
* **Return type:**
Series/DataFrame with changed index.
#### SEE ALSO
[`DataFrame.set_index`](maxframe.dataframe.DataFrame.set_index.md#maxframe.dataframe.DataFrame.set_index)
: Set row labels.
[`DataFrame.reset_index`](maxframe.dataframe.DataFrame.reset_index.md#maxframe.dataframe.DataFrame.reset_index)
: Remove row labels or move them to new columns.
[`DataFrame.reindex_like`](maxframe.dataframe.DataFrame.reindex_like.md#maxframe.dataframe.DataFrame.reindex_like)
: Change to same indices as other DataFrame.
### Examples
`DataFrame.reindex` supports two calling conventions
* `(index=index_labels, columns=column_labels, ...)`
* `(labels, axis={'index', 'columns'}, ...)`
We *highly* recommend using keyword arguments to clarify your
intent.
Create a dataframe with some fictional data.
```pycon
>>> import maxframe.dataframe as md
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = md.DataFrame({'http_status': [200, 200, 404, 404, 301],
... 'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
... index=index)
>>> df.execute()
http_status response_time
Firefox 200 0.04
Chrome 200 0.02
Safari 404 0.07
IE10 404 0.08
Konqueror 301 1.00
```
Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned `NaN`.
```pycon
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
... 'Chrome']
>>> df.reindex(new_index).execute()
http_status response_time
Safari 404.0 0.07
Iceweasel NaN NaN
Comodo Dragon NaN NaN
IE10 404.0 0.08
Chrome 200.0 0.02
```
We can fill in the missing values by passing a value to
the keyword `fill_value`. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
`method` to fill the `NaN` values.
```pycon
>>> df.reindex(new_index, fill_value=0).execute()
http_status response_time
Safari 404 0.07
Iceweasel 0 0.00
Comodo Dragon 0 0.00
IE10 404 0.08
Chrome 200 0.02
```
```pycon
>>> df.reindex(new_index, fill_value='missing').execute()
http_status response_time
Safari 404 0.07
Iceweasel missing missing
Comodo Dragon missing missing
IE10 404 0.08
Chrome 200 0.02
```
We can also reindex the columns.
```pycon
>>> df.reindex(columns=['http_status', 'user_agent']).execute()
http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN
```
Or we can use “axis-style” keyword arguments
```pycon
>>> df.reindex(['http_status', 'user_agent'], axis="columns").execute()
http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN
```
To further illustrate the filling functionality in
`reindex`, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).
```pycon
>>> import numpy as np
>>> date_index = md.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = md.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
... index=date_index)
>>> df2.execute()
prices
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
```
Suppose we decide to expand the dataframe to cover a wider
date range.
```pycon
>>> date_index2 = md.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2).execute()
prices
2009-12-29 NaN
2009-12-30 NaN
2009-12-31 NaN
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN
```
The index entries that did not have a value in the original data frame
(for example, ‘2009-12-29’) are by default filled with `NaN`.
If desired, we can fill in the missing values using one of several
options.
For example, to propagate the next valid value backward to fill the `NaN`
values, pass `bfill` as an argument to the `method` keyword.
```pycon
>>> df2.reindex(date_index2, method='bfill').execute()
prices
2009-12-29 100.0
2009-12-30 100.0
2009-12-31 100.0
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN
```
Please note that the `NaN` value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the `NaN` values present
in the original dataframe, use the `fillna()` method.
See the [user guide](https://pandas.pydata.org/docs/user_guide/basics.html#basics-reindexing) for more.
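The point that filling during a reindex compares only index labels, never values, can be shown with a small plain-Python sketch. This is not how MaxFrame implements `reindex`; the helper name `reindex`, the dict-based column, the ISO date strings used as sortable labels, and `None` as a stand-in for NaN are all assumptions for illustration.

```python
from bisect import bisect_left


def reindex(data, new_index, method=None, fill_value=None):
    """Conform a {label: value} mapping to new_index (illustrative helper).

    'bfill' and 'ffill' assume the original labels sort monotonically.
    """
    old_labels = sorted(data)
    result = {}
    for label in new_index:
        if label in data:
            # An existing label keeps its value, even if that value is missing.
            result[label] = data[label]
        elif method == "bfill":
            pos = bisect_left(old_labels, label)  # next original label
            result[label] = data[old_labels[pos]] if pos < len(old_labels) else fill_value
        elif method == "ffill":
            pos = bisect_left(old_labels, label) - 1  # previous original label
            result[label] = data[old_labels[pos]] if pos >= 0 else fill_value
        else:
            result[label] = fill_value
    return result


prices = {"2010-01-01": 100.0, "2010-01-02": 101.0, "2010-01-03": None,
          "2010-01-04": 100.0, "2010-01-05": 89.0, "2010-01-06": 88.0}
wider = ["2009-12-29", "2009-12-30", "2009-12-31", "2010-01-01", "2010-01-02",
         "2010-01-03", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"]

filled = reindex(prices, wider, method="bfill")
print(filled["2009-12-29"])  # taken from the next original label, 2010-01-01
print(filled["2010-01-03"])  # label existed, so its missing value is kept
print(filled["2010-01-07"])  # no later original label to copy from
```

Because the lookup is driven by label position alone, the gap at 2010-01-03 survives every `method` choice.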
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.reindex_like.md
# maxframe.dataframe.DataFrame.reindex_like
#### DataFrame.reindex_like(other, method=None, copy=True, limit=None, tolerance=None)
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional
filling logic, placing NaN in locations having no value
in the previous index. A new object is produced unless the
new index is equivalent to the current one and copy=False.
* **Parameters:**
* **other** (*Object* *of* *the same data type*) – Its row and column indices are used to define the new indices
of this object.
* **method** ( *{None* *,* *'backfill'/'bfill'* *,* *'pad'/'ffill'* *,* *'nearest'}*) –
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* None (default): don’t fill gaps
* pad / ffill: propagate last valid observation forward to next
valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Return a new object, even if the passed indexes are the same.
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Maximum number of consecutive labels to fill for inexact matches.
* **tolerance** (*optional*) –
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation `abs(index[indexer] - target) <= tolerance`.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
* **Returns:**
Same type as caller, but with changed indices on each axis.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.set_index`](maxframe.dataframe.DataFrame.set_index.md#maxframe.dataframe.DataFrame.set_index)
: Set row labels.
[`DataFrame.reset_index`](maxframe.dataframe.DataFrame.reset_index.md#maxframe.dataframe.DataFrame.reset_index)
: Remove row labels or move them to new columns.
[`DataFrame.reindex`](maxframe.dataframe.DataFrame.reindex.md#maxframe.dataframe.DataFrame.reindex)
: Change to new indices or expand indices.
### Notes
Same as calling
`.reindex(index=other.index, columns=other.columns,...)`.
### Examples
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df1 = md.DataFrame([[24.3, 75.7, 'high'],
... [31, 87.8, 'high'],
... [22, 71.6, 'medium'],
... [35, 95, 'medium']],
... columns=['temp_celsius', 'temp_fahrenheit',
... 'windspeed'],
... index=md.date_range(start='2014-02-12',
... end='2014-02-15', freq='D'))
```
```pycon
>>> df1.execute()
temp_celsius temp_fahrenheit windspeed
2014-02-12 24.3 75.7 high
2014-02-13 31 87.8 high
2014-02-14 22 71.6 medium
2014-02-15 35 95 medium
```
```pycon
>>> df2 = md.DataFrame([[28, 'low'],
... [30, 'low'],
... [35.1, 'medium']],
... columns=['temp_celsius', 'windspeed'],
... index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
... '2014-02-15']))
```
```pycon
>>> df2.execute()
temp_celsius windspeed
2014-02-12 28.0 low
2014-02-13 30.0 low
2014-02-15 35.1 medium
```
```pycon
>>> df2.reindex_like(df1).execute()
temp_celsius temp_fahrenheit windspeed
2014-02-12 28.0 NaN low
2014-02-13 30.0 NaN low
2014-02-14 NaN NaN NaN
2014-02-15 35.1 NaN medium
```
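The equivalence stated in the Notes, that `reindex_like` is a `reindex` onto the other object's rows and columns, can be sketched in plain Python. The frame representation (`{column: {row label: value}}`), the helper names `reindex_frame` and `reindex_like`, and `None` standing in for NaN are assumptions for illustration only.

```python
def reindex_frame(frame, index, columns):
    """Conform a {column: {row: value}} frame to the given rows and columns."""
    return {col: {row: frame.get(col, {}).get(row) for row in index}
            for col in columns}


def reindex_like(frame, other):
    """Reuse other's row and column labels (illustrative helper)."""
    other_columns = list(other)
    other_index = list(next(iter(other.values()))) if other else []
    return reindex_frame(frame, other_index, other_columns)


days = ["2014-02-12", "2014-02-13", "2014-02-14", "2014-02-15"]
df1 = {
    "temp_celsius": dict(zip(days, [24.3, 31, 22, 35])),
    "temp_fahrenheit": dict(zip(days, [75.7, 87.8, 71.6, 95])),
    "windspeed": dict(zip(days, ["high", "high", "medium", "medium"])),
}
df2 = {
    "temp_celsius": {"2014-02-12": 28.0, "2014-02-13": 30.0, "2014-02-15": 35.1},
    "windspeed": {"2014-02-12": "low", "2014-02-13": "low", "2014-02-15": "medium"},
}

result = reindex_like(df2, df1)
print(result["temp_celsius"]["2014-02-14"])     # row missing in df2
print(result["temp_fahrenheit"]["2014-02-12"])  # column missing in df2
```

Rows and columns that exist only in the other object come back empty, which is why `temp_fahrenheit` is entirely `NaN` in the output above.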
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rename.md
# maxframe.dataframe.DataFrame.rename
#### DataFrame.rename(mapper=None, index=None, columns=None, axis='index', copy=True, inplace=False, level=None, errors='ignore')
Alter axes labels.
Function / dict values must be unique (1-to-1). Labels not contained in
a dict / Series will be left as-is. Extra labels listed don’t throw an
error.
* **Parameters:**
* **mapper** (*dict-like* *or* *function*) – Dict-like or function transformations to apply to
that axis’ values. Use either `mapper` and `axis` to
specify the axis to target with `mapper`, or `index` and
`columns`.
* **index** (*dict-like* *or* *function*) – Alternative to specifying axis (`mapper, axis=0`
is equivalent to `index=mapper`).
* **columns** (*dict-like* *or* *function*) – Alternative to specifying axis (`mapper, axis=1`
is equivalent to `columns=mapper`).
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Axis to target with `mapper`. Can be either the axis name
(‘index’, ‘columns’) or number (0, 1). The default is ‘index’.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Also copy underlying data.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to modify the DataFrame in place rather than returning a new one. If True, the value of `copy` is
ignored.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *level name* *,* *default None*) – In case of a MultiIndex, only rename labels in the specified
level.
* **errors** ( *{'ignore'* *,* *'raise'}* *,* *default 'ignore'*) – If ‘raise’, raise a KeyError when a dict-like mapper, index,
or columns contains labels that are not present in the Index
being transformed.
If ‘ignore’, existing keys will be renamed and extra keys will be
ignored.
* **Returns:**
DataFrame with the renamed axis labels.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If any of the labels is not found in the selected axis and
“errors=’raise’”.
#### SEE ALSO
[`DataFrame.rename_axis`](maxframe.dataframe.DataFrame.rename_axis.md#maxframe.dataframe.DataFrame.rename_axis)
: Set the name of the axis.
### Examples
`DataFrame.rename` supports two calling conventions
* `(index=index_mapper, columns=columns_mapper, ...)`
* `(mapper, axis={'index', 'columns'}, ...)`
We *highly* recommend using keyword arguments to clarify your
intent.
Rename columns using a mapping:
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.rename(columns={"A": "a", "B": "c"}).execute()
a c
0 1 4
1 2 5
2 3 6
```
Rename index using a mapping:
```pycon
>>> df.rename(index={0: "x", 1: "y", 2: "z"}).execute()
A B
x 1 4
y 2 5
z 3 6
```
Cast index labels to a different type:
```pycon
>>> df.index.execute()
RangeIndex(start=0, stop=3, step=1)
>>> df.rename(index=str).index.execute()
Index(['0', '1', '2'], dtype='object')
```
```pycon
>>> df.rename(columns={"A": "a", "B": "b", "C": "c"}, errors="raise").execute()
Traceback (most recent call last):
KeyError: ['C'] not found in axis
```
Using axis-style parameters
```pycon
>>> df.rename(str.lower, axis='columns').execute()
a b
0 1 4
1 2 5
2 3 6
```
```pycon
>>> df.rename({1: 2, 2: 4}, axis='index').execute()
A B
0 1 4
2 2 5
4 3 6
```
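The label-mapping rules used in these examples (dict versus function mappers, and `errors='raise'`) can be summarized in a few lines of plain Python. This is a sketch of the behaviour described above, not MaxFrame's code; `rename_labels` is a hypothetical helper that works on a plain list of labels.

```python
def rename_labels(labels, mapper, errors="ignore"):
    """Apply a dict or function mapper to a list of axis labels."""
    if callable(mapper):
        # A function is applied to every label.
        return [mapper(label) for label in labels]
    if errors == "raise":
        missing = [key for key in mapper if key not in labels]
        if missing:
            raise KeyError(f"{missing} not found in axis")
    # Labels absent from the dict are left as-is; extra keys are ignored.
    return [mapper.get(label, label) for label in labels]


print(rename_labels(["A", "B"], {"A": "a", "B": "c"}))
print(rename_labels([0, 1, 2], str))
print(rename_labels([0, 1, 2], {1: 2, 2: 4}))
try:
    rename_labels(["A", "B"], {"A": "a", "B": "b", "C": "c"}, errors="raise")
except KeyError as exc:
    print("KeyError:", exc)
```

With the default `errors='ignore'`, the unknown key `"C"` would simply be skipped.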
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rename_axis.md
# maxframe.dataframe.DataFrame.rename_axis
#### DataFrame.rename_axis(mapper=<no_default>, index=<no_default>, columns=<no_default>, axis=0, copy=True, inplace=False)
Set the name of the axis for the index or columns.
* **Parameters:**
* **mapper** (*scalar* *,* *list-like* *,* *optional*) – Value to set the axis name attribute.
* **index** (*scalar* *,* *list-like* *,* *dict-like* *or* *function* *,* *optional*) –
A scalar, list-like, dict-like or function transformation to
apply to that axis’ values.
Note that the `columns` parameter is not allowed if the
object is a Series. This parameter only applies to DataFrame
type objects.
Use either `mapper` and `axis` to
specify the axis to target with `mapper`, or `index`
and/or `columns`.
* **columns** (*scalar* *,* *list-like* *,* *dict-like* *or* *function* *,* *optional*) –
A scalar, list-like, dict-like or function transformation to
apply to that axis’ values.
Note that the `columns` parameter is not allowed if the
object is a Series. This parameter only applies to DataFrame
type objects.
Use either `mapper` and `axis` to
specify the axis to target with `mapper`, or `index`
and/or `columns`.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to rename.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Also copy underlying data.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Modifies the object directly, instead of creating a new Series
or DataFrame.
* **Returns:**
The same type as the caller or None if inplace is True.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series), [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame), or None
#### SEE ALSO
[`Series.rename`](maxframe.dataframe.Series.rename.md#maxframe.dataframe.Series.rename)
: Alter Series index labels or name.
[`DataFrame.rename`](maxframe.dataframe.DataFrame.rename.md#maxframe.dataframe.DataFrame.rename)
: Alter DataFrame index labels or name.
[`Index.rename`](maxframe.dataframe.Index.rename.md#maxframe.dataframe.Index.rename)
: Set new names on index.
### Notes
`DataFrame.rename_axis` supports two calling conventions
* `(index=index_mapper, columns=columns_mapper, ...)`
* `(mapper, axis={'index', 'columns'}, ...)`
The first calling convention will only modify the names of
the index and/or the names of the Index object that is the columns.
In this case, the parameter `copy` is ignored.
The second calling convention will modify the names of the
corresponding index if mapper is a list or a scalar.
However, if mapper is dict-like or a function, it will use the
deprecated behavior of modifying the axis *labels*.
We *highly* recommend using keyword arguments to clarify your
intent.
### Examples
**Series**
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["dog", "cat", "monkey"])
>>> s.execute()
0 dog
1 cat
2 monkey
dtype: object
>>> s.rename_axis("animal").execute()
animal
0 dog
1 cat
2 monkey
dtype: object
```
**DataFrame**
```pycon
>>> df = md.DataFrame({"num_legs": [4, 4, 2],
... "num_arms": [0, 0, 2]},
... ["dog", "cat", "monkey"])
>>> df.execute()
num_legs num_arms
dog 4 0
cat 4 0
monkey 2 2
>>> df = df.rename_axis("animal")
>>> df.execute()
num_legs num_arms
animal
dog 4 0
cat 4 0
monkey 2 2
>>> df = df.rename_axis("limbs", axis="columns")
>>> df.execute()
limbs num_legs num_arms
animal
dog 4 0
cat 4 0
monkey 2 2
```
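The difference between `rename` and `rename_axis` is easy to miss: the former changes the labels on an axis, the latter changes only the axis's name. A minimal plain-Python sketch, assuming a hypothetical `Axis` record with a `labels` tuple and a `name` (neither is a MaxFrame type):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Axis:
    """Toy stand-in for an index: its labels plus an optional name."""
    labels: tuple
    name: object = None


def rename_axis(axis, name):
    """Return a copy of the axis with a new name; labels are untouched."""
    return replace(axis, name=name)


index = Axis(labels=("dog", "cat", "monkey"))
renamed = rename_axis(index, "animal")
print(renamed.name)    # the axis now carries a name
print(renamed.labels)  # the row labels are unchanged
```

This matches the DataFrame example above, where `animal` appears as a header over the index while `dog`, `cat`, and `monkey` stay the same.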
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.reorder_levels.md
# maxframe.dataframe.DataFrame.reorder_levels
#### DataFrame.reorder_levels(order, axis=0)
Rearrange index levels using input order. May not drop or duplicate levels.
* **Parameters:**
* **order** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – List representing new level order. Reference level by number
(position) or by key (label).
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Where to reorder levels.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> data = {
... "class": ["Mammals", "Mammals", "Reptiles"],
... "diet": ["Omnivore", "Carnivore", "Carnivore"],
... "species": ["Humans", "Dogs", "Snakes"],
... }
>>> df = md.DataFrame(data, columns=["class", "diet", "species"])
>>> df = df.set_index(["class", "diet"])
>>> df.execute()
                    species
class    diet
Mammals  Omnivore    Humans
         Carnivore     Dogs
Reptiles Carnivore   Snakes
```
Let’s reorder the levels of the index:
```pycon
>>> df.reorder_levels(["diet", "class"]).execute()
                    species
diet      class
Omnivore  Mammals    Humans
Carnivore Mammals      Dogs
          Reptiles   Snakes
```
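Reordering levels only permutes the components of each index entry; the rows themselves do not move. The sketch below shows that rule in plain Python, including the restriction that levels may not be dropped or duplicated. `reorder_levels` here is a hypothetical helper operating on level names and tuples, it assumes level names are strings, and it does not handle negative positions.

```python
def reorder_levels(names, tuples, order):
    """Permute MultiIndex-style tuples according to order (names or positions)."""
    positions = [names.index(level) if isinstance(level, str) else level
                 for level in order]
    if sorted(positions) != list(range(len(names))):
        raise ValueError("order must list every level exactly once")
    new_names = [names[p] for p in positions]
    new_tuples = [tuple(entry[p] for p in positions) for entry in tuples]
    return new_names, new_tuples


names = ["class", "diet"]
tuples = [("Mammals", "Omnivore"), ("Mammals", "Carnivore"), ("Reptiles", "Carnivore")]

new_names, new_tuples = reorder_levels(names, tuples, ["diet", "class"])
print(new_names)
print(new_tuples)
```

Passing `[1, 0]` instead of `["diet", "class"]` gives the same result, since levels can be referenced by position or by label.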
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.reset_index.md
# maxframe.dataframe.DataFrame.reset_index
#### DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='', names=None, default_index_type: DefaultIndexType | [str](https://docs.python.org/3/library/stdtypes.html#str) = None, \*\*kwargs)
Reset the index, or a level of it.
Reset the index of the DataFrame, and use the default one instead.
If the DataFrame has a MultiIndex, this method can remove one or more
levels.
* **Parameters:**
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *default None*) – Only remove the given levels from the index. Removes all levels by
default.
* **drop** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Do not try to insert index into dataframe columns. This resets
the index to the default integer index.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Modify the DataFrame in place (do not create a new object).
* **col_level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 0*) – If the columns have multiple levels, determines which level the
labels are inserted into. By default it is inserted into the first
level.
* **col_fill** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *default ''*) – If the columns have multiple levels, determines how the other
levels are named. If None then the index name is repeated.
* **Returns:**
DataFrame with the new index or None if `inplace=True`.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
#### SEE ALSO
[`DataFrame.set_index`](maxframe.dataframe.DataFrame.set_index.md#maxframe.dataframe.DataFrame.set_index)
: Opposite of reset_index.
[`DataFrame.reindex`](maxframe.dataframe.DataFrame.reindex.md#maxframe.dataframe.DataFrame.reindex)
: Change to new indices or expand indices.
[`DataFrame.reindex_like`](maxframe.dataframe.DataFrame.reindex_like.md#maxframe.dataframe.DataFrame.reindex_like)
: Change to same indices as other DataFrame.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([('bird', 389.0),
... ('bird', 24.0),
... ('mammal', 80.5),
... ('mammal', mt.nan)],
... index=['falcon', 'parrot', 'lion', 'monkey'],
... columns=('class', 'max_speed'))
>>> df.execute()
class max_speed
falcon bird 389.0
parrot bird 24.0
lion mammal 80.5
monkey mammal NaN
```
When we reset the index, the old index is added as a column, and a
new sequential index is used:
```pycon
>>> df.reset_index().execute()
index class max_speed
0 falcon bird 389.0
1 parrot bird 24.0
2 lion mammal 80.5
3 monkey mammal NaN
```
We can use the drop parameter to avoid the old index being added as
a column:
```pycon
>>> df.reset_index(drop=True).execute()
class max_speed
0 bird 389.0
1 bird 24.0
2 mammal 80.5
3 mammal NaN
```
You can also use reset_index with MultiIndex.
```pycon
>>> import pandas as pd
>>> index = pd.MultiIndex.from_tuples([('bird', 'falcon'),
... ('bird', 'parrot'),
... ('mammal', 'lion'),
... ('mammal', 'monkey')],
... names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
... ('species', 'type')])
>>> df = md.DataFrame([(389.0, 'fly'),
... ( 24.0, 'fly'),
... ( 80.5, 'run'),
... (mt.nan, 'jump')],
... index=index,
... columns=columns)
>>> df.execute()
speed species
max type
class name
bird falcon 389.0 fly
parrot 24.0 fly
mammal lion 80.5 run
monkey NaN jump
```
If the index has multiple levels, we can reset a subset of them:
```pycon
>>> df.reset_index(level='class').execute()
class speed species
max type
name
falcon bird 389.0 fly
parrot bird 24.0 fly
lion mammal 80.5 run
monkey mammal NaN jump
```
If we are not dropping the index, by default, it is placed in the top
level. We can place it in another level:
```pycon
>>> df.reset_index(level='class', col_level=1).execute()
speed species
class max type
name
falcon bird 389.0 fly
parrot bird 24.0 fly
lion mammal 80.5 run
monkey mammal NaN jump
```
When the index is inserted under another level, we can specify under
which one with the parameter col_fill:
```pycon
>>> df.reset_index(level='class', col_level=1, col_fill='species').execute()
species speed species
class max type
name
falcon bird 389.0 fly
parrot bird 24.0 fly
lion mammal 80.5 run
monkey mammal NaN jump
```
If we specify a nonexistent level for col_fill, it is created:
```pycon
>>> df.reset_index(level='class', col_level=1, col_fill='genus').execute()
genus speed species
class max type
name
falcon bird 389.0 fly
parrot bird 24.0 fly
lion mammal 80.5 run
monkey mammal NaN jump
```
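For the single-level case, the effect of `drop` can be sketched in plain Python: the old index either becomes the first column or is discarded, and a fresh 0..n-1 index takes its place. The helper below is hypothetical and models a frame as an index list plus a `{column: values}` dict; it does not cover MultiIndex, `col_level`, or `col_fill`.

```python
def reset_index(index, columns, drop=False, name="index"):
    """Return (new_index, new_columns) after resetting a single-level index."""
    new_index = list(range(len(index)))
    if drop:
        # Discard the old labels entirely.
        return new_index, dict(columns)
    # Otherwise the old labels become the first column.
    return new_index, {name: list(index), **columns}


index = ["falcon", "parrot", "lion", "monkey"]
columns = {"class": ["bird", "bird", "mammal", "mammal"],
           "max_speed": [389.0, 24.0, 80.5, None]}

new_index, kept = reset_index(index, columns)
print(new_index, list(kept))
_, dropped = reset_index(index, columns, drop=True)
print(list(dropped))
```

The column name `index` used for the old labels matches the header shown in the `df.reset_index()` output above for an unnamed index.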
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rfloordiv.md
# maxframe.dataframe.DataFrame.rfloordiv
#### DataFrame.rfloordiv(other, axis='columns', level=None, fill_value=None)
Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).
Equivalent to `//`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, floordiv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar with the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rmod.md
# maxframe.dataframe.DataFrame.rmod
#### DataFrame.rmod(other, axis='columns', level=None, fill_value=None)
Get Modulo of dataframe and other, element-wise (binary operator rmod).
Equivalent to `other % dataframe`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, mod.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar using the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
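The shared examples above do not exercise `rmod` itself. As a minimal plain-Python sketch (not MaxFrame code), the difference between `mod` and its reverse version comes down to operand order. The `Column` class and its data are hypothetical and only illustrate how Python dispatches `other % obj` to a reflected method:

```python
class Column:
    """Hypothetical stand-in for one DataFrame column (illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def mod(self, other):
        # Forward version: each element is the left operand (value % other).
        return Column(v % other for v in self.values)

    def rmod(self, other):
        # Reverse version: each element is the right operand (other % value).
        return Column(other % v for v in self.values)

    # Python falls back to __rmod__ for `other % column` when `other`
    # does not know how to handle a Column.
    __mod__ = mod
    __rmod__ = rmod


col = Column([3, 4, 7])
forward = col.mod(10).values       # [3 % 10, 4 % 10, 7 % 10]
reverse = col.rmod(10).values      # [10 % 3, 10 % 4, 10 % 7]
via_operator = (10 % col).values   # dispatches to Column.__rmod__
```

With these inputs the forward version leaves the values unchanged, while the reverse version yields the remainders of 10 divided by each element.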
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rmul.md
# maxframe.dataframe.DataFrame.rmul
#### DataFrame.rmul(other, axis='columns', level=None, fill_value=None)
Get Multiplication of dataframe and other, element-wise (binary operator rmul).
Equivalent to `other * dataframe`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, mul.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar using the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
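Because multiplication is commutative, the part of `rmul` most worth illustrating is the `fill_value` rule from the parameter description: a value missing on one side is filled before the computation, while a value missing on both sides stays missing. The following is a plain-Python sketch of that rule only. The function name `rmul_with_fill` and the use of `None` for missing data are assumptions made for illustration, not MaxFrame internals:

```python
def rmul_with_fill(left, right, fill_value=None):
    """Element-wise right * left over two equal-length lists (None = missing)."""
    out = []
    for a, b in zip(left, right):
        if a is None and b is None:
            # Missing in both inputs: the result stays missing.
            out.append(None)
            continue
        if fill_value is not None:
            # Missing in exactly one input: substitute fill_value first.
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        out.append(None if a is None or b is None else b * a)
    return out


left = [1, None, 3, None]
right = [2, 5, None, None]
filled = rmul_with_fill(left, right, fill_value=1)
unfilled = rmul_with_fill(left, right)
```

Without `fill_value`, any position with a missing operand stays missing. With `fill_value=1`, only the position missing on both sides does.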
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rolling.md
# maxframe.dataframe.DataFrame.rolling
#### DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
Provide rolling window calculations.
* **Parameters:**
* **window** ([*int*](https://docs.python.org/3/library/functions.html#int) *, or* *offset*) – Size of the moving window. This is the number of observations used for
calculating the statistic. Each window will be a fixed size.
If it's an offset then this will be the time period of each window. Each
window will be variable-sized based on the observations included in
the time period. This is only valid for datetimelike indexes. This is
new in 0.19.0.
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Minimum number of observations in window required to have a value
(otherwise result is NA). For a window that is specified by an offset,
min_periods will default to 1. Otherwise, min_periods will default
to the size of the window.
* **center** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Set the labels at the center of the window.
* **win_type** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Provide a window type. If `None`, all points are evenly weighted.
See the notes below for further information.
* **on** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – For a DataFrame, a datetime-like column on which to calculate the rolling
window, rather than the DataFrame’s index. Provided integer column is
ignored and excluded from result since an integer index is not used to
calculate the rolling window.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 0*)
* **closed** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Make the interval closed on the ‘right’, ‘left’, ‘both’ or
‘neither’ endpoints.
For offset-based windows, it defaults to ‘right’.
For fixed windows, defaults to ‘both’. Remaining cases not implemented
for fixed windows.
* **Return type:**
a Window or Rolling sub-classed for the particular operation
#### SEE ALSO
[`expanding`](maxframe.dataframe.DataFrame.expanding.md#maxframe.dataframe.DataFrame.expanding)
: Provides expanding transformations.
[`ewm`](maxframe.dataframe.DataFrame.ewm.md#maxframe.dataframe.DataFrame.ewm)
: Provides exponential weighted functions.
### Notes
By default, the result is set to the right edge of the window. This can be
changed to the center of the window by setting `center=True`.
To learn more about the offsets & frequency strings, please see [this link](http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
The recognized win_types are:
* `boxcar`
* `triang`
* `blackman`
* `hamming`
* `bartlett`
* `parzen`
* `bohman`
* `blackmanharris`
* `nuttall`
* `barthann`
* `kaiser` (needs beta)
* `gaussian` (needs std)
* `general_gaussian` (needs power, width)
* `slepian` (needs width)
* `exponential` (needs tau), center is set to None.
If `win_type=None` all points are evenly weighted. To learn more about
different window types see [scipy.signal window functions](https://docs.scipy.org/doc/scipy/reference/signal.html#window-functions).
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df.execute()
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
```
Rolling sum with a window length of 2, using the ‘triang’
window type.
```pycon
>>> df.rolling(2, win_type='triang').sum().execute()
B
0 NaN
1 0.5
2 1.5
3 NaN
4 NaN
```
Rolling sum with a window length of 2, min_periods defaults
to the window length.
```pycon
>>> df.rolling(2).sum().execute()
B
0 NaN
1 1.0
2 3.0
3 NaN
4 NaN
```
Same as above, but with min_periods set explicitly.
```pycon
>>> df.rolling(2, min_periods=1).sum().execute()
B
0 0.0
1 1.0
2 3.0
3 2.0
4 4.0
```
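The effect of `min_periods` on a fixed-size window can be sketched in plain Python. This is an illustrative reimplementation under simple assumptions (a list of floats, NaN as the missing marker, right-aligned windows), not how MaxFrame computes rolling sums. The name `rolling_sum` is hypothetical:

```python
import math


def rolling_sum(values, window, min_periods=None):
    """Right-aligned rolling sum that skips NaN observations."""
    if min_periods is None:
        # For a fixed-size window, min_periods defaults to the window size.
        min_periods = window
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        # Emit a value only when enough non-missing observations exist.
        out.append(float(sum(valid)) if len(valid) >= min_periods else math.nan)
    return out


data = [0.0, 1.0, 2.0, math.nan, 4.0]
strict = rolling_sum(data, 2)                  # min_periods defaults to 2
relaxed = rolling_sum(data, 2, min_periods=1)  # one observation is enough
```

With the same data as the examples above, `strict` is NaN wherever the window holds fewer than two valid observations, while `relaxed` reproduces the `min_periods=1` result.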
A ragged (meaning not-a-regular frequency), time-indexed DataFrame
```pycon
>>> df = md.DataFrame({'B': [0, 1, 2, np.nan, 4]},
...                   index=[md.Timestamp('20130101 09:00:00'),
...                          md.Timestamp('20130101 09:00:02'),
...                          md.Timestamp('20130101 09:00:03'),
...                          md.Timestamp('20130101 09:00:05'),
...                          md.Timestamp('20130101 09:00:06')])
>>> df.execute()
B
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:03 2.0
2013-01-01 09:00:05 NaN
2013-01-01 09:00:06 4.0
```
In contrast to an integer rolling window, this rolls a variable-length
window corresponding to the time period.
The default for min_periods is 1.
```pycon
>>> df.rolling('2s').sum().execute()
B
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:03 3.0
2013-01-01 09:00:05 NaN
2013-01-01 09:00:06 4.0
```
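An offset-based window can be sketched the same way. The plain-Python version below assumes the window is closed on the right only (the documented default for offset-based windows), so an observation at time `t` belongs to the window ending at `t_end` when `t_end - period < t <= t_end`. The function name `rolling_time_sum` is hypothetical and the code is illustrative rather than MaxFrame's implementation:

```python
import math
from datetime import datetime, timedelta


def rolling_time_sum(times, values, period, min_periods=1):
    """Rolling sum over a time window that is open on the left, closed on the right."""
    out = []
    for t_end in times:
        valid = [
            v
            for t, v in zip(times, values)
            if t_end - period < t <= t_end and not math.isnan(v)
        ]
        out.append(float(sum(valid)) if len(valid) >= min_periods else math.nan)
    return out


base = datetime(2013, 1, 1, 9, 0, 0)
times = [base + timedelta(seconds=s) for s in (0, 2, 3, 5, 6)]
values = [0.0, 1.0, 2.0, math.nan, 4.0]
result = rolling_time_sum(times, values, timedelta(seconds=2))
```

The number of observations per window varies with the spacing of the timestamps, which is why the row at 09:00:03 sums two values while the row at 09:00:02 sums only one.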
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.round.md
# maxframe.dataframe.DataFrame.round
#### DataFrame.round(decimals=0, \*args, \*\*kwargs)
Round a DataFrame to a variable number of decimal places.
* **Parameters:**
* **decimals** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Number of decimal places to round each column to. If an int is
given, round each column to the same number of places.
Otherwise dict and Series round to variable numbers of places.
Column names should be in the keys if decimals is a
dict-like. Any columns not included in decimals will be left
as is. Elements of decimals which are not columns of the
input will be ignored.
* **\*args** – Additional positional arguments have no effect but might be accepted for
compatibility with numpy.
* **\*\*kwargs** – Additional keywords have no effect but might be accepted for
compatibility with numpy.
* **Returns:**
A DataFrame with the affected columns rounded to the specified
number of decimal places.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`numpy.around`](https://numpy.org/doc/stable/reference/generated/numpy.around.html#numpy.around)
: Round a numpy array to the given number of decimals.
[`Series.round`](maxframe.dataframe.Series.round.md#maxframe.dataframe.Series.round)
: Round a Series to the given number of decimals.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([(.21, .32), (.01, .67), (.66, .03), (.21, .18)],
... columns=['dogs', 'cats'])
>>> df.execute()
dogs cats
0 0.21 0.32
1 0.01 0.67
2 0.66 0.03
3 0.21 0.18
```
By providing an integer, each column is rounded to the same number
of decimal places.
```pycon
>>> df.round(1).execute()
dogs cats
0 0.2 0.3
1 0.0 0.7
2 0.7 0.0
3 0.2 0.2
```
With a dict, the number of places for specific columns can be
specified with the column names as keys and the number of decimal
places as values.
```pycon
>>> df.round({'dogs': 1, 'cats': 0}).execute()
dogs cats
0 0.2 0.0
1 0.0 1.0
2 0.7 0.0
3 0.2 0.0
```
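The rules for the `decimals` argument (an int applies to every column, a dict applies per column, unlisted columns are left as is, and unknown keys are ignored) can be sketched in plain Python. The helper `round_frame` and the dict-of-lists representation are assumptions for illustration only. Python's built-in `round` stands in for the actual rounding routine:

```python
def round_frame(frame, decimals=0):
    """Round a dict-of-lists 'frame' by a single int or a per-column dict."""
    if isinstance(decimals, dict):
        return {
            # Columns missing from `decimals` are copied unchanged; keys in
            # `decimals` that are not columns are never looked up.
            col: [round(v, decimals[col]) for v in vals] if col in decimals else list(vals)
            for col, vals in frame.items()
        }
    return {col: [round(v, decimals) for v in vals] for col, vals in frame.items()}


frame = {
    "dogs": [0.21, 0.01, 0.66, 0.21],
    "cats": [0.32, 0.67, 0.03, 0.18],
}
same_places = round_frame(frame, 1)
per_column = round_frame(frame, {"dogs": 1, "cats": 0})
partial = round_frame(frame, {"dogs": 1, "birds": 2})  # 'birds' is ignored
```

In `partial`, the `cats` column keeps its original values because it has no entry in the dict.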
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rpow.md
# maxframe.dataframe.DataFrame.rpow
#### DataFrame.rpow(other, axis='columns', level=None, fill_value=None)
Get Exponential power of dataframe and other, element-wise (binary operator rpow).
Equivalent to `other ** dataframe`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, pow.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar using the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
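The shared examples above do not call `rpow`. As a rough plain-Python sketch of the operand order, the reverse version raises `other` to the power of each element, while `pow` raises each element to the power of `other`. The helper names and the dict-of-lists layout are hypothetical:

```python
def pow_sketch(frame, other):
    """Forward version: each element is the base (value ** other)."""
    return {col: [v ** other for v in vals] for col, vals in frame.items()}


def rpow_sketch(frame, other):
    """Reverse version: each element is the exponent (other ** value)."""
    return {col: [other ** v for v in vals] for col, vals in frame.items()}


frame = {"angles": [0, 3, 4]}
forward = pow_sketch(frame, 2)   # 0**2, 3**2, 4**2
reverse = rpow_sketch(frame, 2)  # 2**0, 2**3, 2**4
```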
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rsub.md
# maxframe.dataframe.DataFrame.rsub
#### DataFrame.rsub(other, axis='columns', level=None, fill_value=None)
Get Subtraction of dataframe and other, element-wise (binary operator rsub).
Equivalent to `other - dataframe`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, sub.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar using the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
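To contrast `sub` with its reverse version, the plain-Python sketch below matches a list against the columns (as with `axis='columns'`) and swaps the operand order. The helper names and the dict-of-lists layout are hypothetical and are not part of MaxFrame:

```python
def sub_by_columns(frame, per_column):
    """Forward version: value - other, with one `other` per column."""
    return {
        col: [v - other for v in vals]
        for (col, vals), other in zip(frame.items(), per_column)
    }


def rsub_by_columns(frame, per_column):
    """Reverse version: other - value, with one `other` per column."""
    return {
        col: [other - v for v in vals]
        for (col, vals), other in zip(frame.items(), per_column)
    }


frame = {"angles": [0, 3, 4], "degrees": [360, 180, 360]}
forward = sub_by_columns(frame, [1, 2])
reverse = rsub_by_columns(frame, [1, 2])
```

Here `forward` matches the `df.sub([1, 2], axis='columns')` output shown above, and `reverse` is its element-wise negation.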
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.rtruediv.md
# maxframe.dataframe.DataFrame.rtruediv
#### DataFrame.rtruediv(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator rtruediv).
Equivalent to `other / dataframe`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, truediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar using the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
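The `rdiv` output above shows `inf` where the DataFrame holds 0, because the reverse version places each element in the denominator. Plain Python raises `ZeroDivisionError` in that case, so the sketch below handles a zero divisor explicitly, assuming IEEE 754 float behaviour. The helper name `rtruediv_sketch` is hypothetical:

```python
import math


def rtruediv_sketch(values, other):
    """Reverse true division: other / value for each element."""
    out = []
    for v in values:
        if v == 0:
            # Nonzero / 0 gives a signed infinity; 0 / 0 is undefined (NaN).
            out.append(math.copysign(math.inf, other) if other != 0 else math.nan)
        else:
            out.append(other / v)
    return out


result = rtruediv_sketch([0, 3, 4], 10)
```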
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.sample.md
# maxframe.dataframe.DataFrame.sample
#### DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, always_multinomial=False)
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of items from axis to return. Cannot be used with frac.
Default = 1 if frac = None.
* **frac** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Fraction of axis items to return. Cannot be used with n.
* **replace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Allow or disallow sampling of the same row more than once.
* **weights** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *ndarray-like* *,* *optional*) – Default ‘None’ results in equal probability weighting.
If passed a Series, will align with target object on index. Index
values in weights not found in sampled object will be ignored and
index values in sampled object not in weights will be assigned
weights of zero.
If called on a DataFrame, will accept the name of a column
when axis = 0.
Unless weights are a Series, weights must be same length as axis
being sampled.
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
Infinite values not allowed.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *array-like* *,* *BitGenerator* *,* *np.random.RandomState* *,* *optional*) – If int, array-like, or BitGenerator (NumPy>=1.17), seed for
random number generator.
If np.random.RandomState, use as numpy RandomState object.
* **axis** ( *{0* *or* *‘index’* *,* *1* *or* *‘columns’* *,* *None}* *,* *default None*) – Axis to sample. Accepts axis number or name. Default is stat axis
for given data type (0 for Series and DataFrames).
* **always_multinomial** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, always treat distribution of sample counts between data chunks
as multinomial distribution. This will accelerate sampling when data
is huge, but may affect randomness of samples when number of instances
is not very large.
* **Returns:**
A new object of same type as caller containing n items randomly
sampled from the caller object.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`DataFrameGroupBy.sample`
: Generates random samples from each group of a DataFrame object.
`SeriesGroupBy.sample`
: Generates random samples from each group of a Series object.
[`numpy.random.choice`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html#numpy.random.choice)
: Generates a random sample from a given 1-D numpy array.
### Notes
If frac > 1, replacement should be set to True.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'num_legs': [2, 4, 8, 0],
... 'num_wings': [2, 0, 0, 0],
... 'num_specimen_seen': [10, 2, 1, 8]},
... index=['falcon', 'dog', 'spider', 'fish'])
>>> df.execute()
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
```
Extract 3 random elements from the `Series` `df['num_legs']`:
Note that we use random_state to ensure the reproducibility of
the examples.
```pycon
>>> df['num_legs'].sample(n=3, random_state=1).execute()
fish 0
spider 8
falcon 2
Name: num_legs, dtype: int64
```
A random 50% sample of the `DataFrame` with replacement:
```pycon
>>> df.sample(frac=0.5, replace=True, random_state=1).execute()
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
```
An upsampled `DataFrame`, drawn with replacement.
Note that the `replace` parameter must be True when the `frac` parameter is greater than 1.
```pycon
>>> df.sample(frac=2, replace=True, random_state=1).execute()
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
falcon 2 2 10
falcon 2 2 10
fish 0 0 8
dog 4 0 2
fish 0 0 8
dog 4 0 2
```
Using a DataFrame column as weights. Rows with larger value in the
num_specimen_seen column are more likely to be sampled.
```pycon
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1).execute()
num_legs num_wings num_specimen_seen
falcon 2 2 10
fish 0 0 8
```
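The weight rules listed under **weights** (normalization to sum to 1, missing values treated as zero, infinite values rejected) can be sketched in plain Python. This is only an illustration of the documented behaviour, not MaxFrame's implementation; `normalize_weights` and the sample weights are hypothetical names and values chosen for the example.

```python
import math
import random


def normalize_weights(weights):
    """Return probabilities that sum to 1 (hypothetical helper, not a MaxFrame API)."""
    cleaned = []
    for w in weights:
        if w is None or math.isnan(w):
            cleaned.append(0.0)  # missing weights are treated as zero
        elif math.isinf(w):
            raise ValueError("weights may not include infinite values")
        else:
            cleaned.append(float(w))
    total = sum(cleaned)
    if total == 0:
        raise ValueError("weights sum to zero")
    return [w / total for w in cleaned]


labels = ["falcon", "dog", "spider", "fish"]
probs = normalize_weights([10, 2, None, 8])

# Draw with replacement from a seeded generator so the run is repeatable.
rng = random.Random(1)
picked = rng.choices(labels, weights=probs, k=5)
```

Because the weight for `spider` is missing, its probability is zero and it can never appear in `picked`.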
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.select_dtypes.md
# maxframe.dataframe.DataFrame.select_dtypes
#### DataFrame.select_dtypes(include=None, exclude=None)
Return a subset of the DataFrame’s columns based on the column dtypes.
* **Parameters:**
* **include** (*scalar* *or* *list-like*) – A selection of dtypes or strings to be included/excluded. At least
one of these parameters must be supplied.
* **exclude** (*scalar* *or* *list-like*) – A selection of dtypes or strings to be included/excluded. At least
one of these parameters must be supplied.
* **Returns:**
The subset of the frame including the dtypes in `include` and
excluding the dtypes in `exclude`.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) –
* If both of `include` and `exclude` are empty
* If `include` and `exclude` have overlapping elements
* If any kind of string dtype is passed in.
#### SEE ALSO
[`DataFrame.dtypes`](maxframe.dataframe.DataFrame.dtypes.md#maxframe.dataframe.DataFrame.dtypes)
: Return Series with the data type of each column.
### Notes
* To select all *numeric* types, use `np.number` or `'number'`
* To select strings you must use the `object` dtype, but note that
this will return *all* object dtype columns
* See the [numpy dtype hierarchy](https://numpy.org/doc/stable/reference/arrays.scalars.html)
* To select datetimes, use `np.datetime64`, `'datetime'` or
`'datetime64'`
* To select timedeltas, use `np.timedelta64`, `'timedelta'` or
`'timedelta64'`
* To select Pandas categorical dtypes, use `'category'`
* To select Pandas datetimetz dtypes, use `'datetimetz'` (new in
0.20.0) or `'datetime64[ns, tz]'`
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'a': [1, 2] * 3,
... 'b': [True, False] * 3,
... 'c': [1.0, 2.0] * 3})
>>> df.execute()
a b c
0 1 True 1.0
1 2 False 2.0
2 1 True 1.0
3 2 False 2.0
4 1 True 1.0
5 2 False 2.0
```
```pycon
>>> df.select_dtypes(include='bool').execute()
b
0 True
1 False
2 True
3 False
4 True
5 False
```
```pycon
>>> df.select_dtypes(include=['float64']).execute()
c
0 1.0
1 2.0
2 1.0
3 2.0
4 1.0
5 2.0
```
```pycon
>>> df.select_dtypes(exclude=['int64']).execute()
b c
0 True 1.0
1 False 2.0
2 True 1.0
3 False 2.0
4 True 1.0
5 False 2.0
```
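The first two `ValueError` conditions and the include/exclude filtering can be sketched with plain dtype-name strings. This is a simplified illustration: `check_selection` and `select_columns` are hypothetical helpers, and the sketch matches names exactly rather than resolving the NumPy dtype hierarchy (for example `'number'`).

```python
def check_selection(include=(), exclude=()):
    """Validate the two selections the way the Raises section describes."""
    include, exclude = frozenset(include), frozenset(exclude)
    if not include and not exclude:
        raise ValueError("at least one of include or exclude must be nonempty")
    overlap = include & exclude
    if overlap:
        raise ValueError(f"include and exclude overlap on {sorted(overlap)}")
    return include, exclude


def select_columns(dtypes, include=(), exclude=()):
    """Keep column names whose dtype name is included and not excluded."""
    include, exclude = check_selection(include, exclude)
    return [
        name
        for name, dtype in dtypes.items()
        if (not include or dtype in include) and dtype not in exclude
    ]


dtypes = {"a": "int64", "b": "bool", "c": "float64"}
only_bool = select_columns(dtypes, include=["bool"])
no_int = select_columns(dtypes, exclude=["int64"])
```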
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.sem.md
# maxframe.dataframe.DataFrame.sem
#### DataFrame.sem(axis=None, skipna=True, level=None, ddof=1, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.set_axis.md
# maxframe.dataframe.DataFrame.set_axis
#### DataFrame.set_axis(labels, axis=0, inplace=False)
Assign desired index to given axis.
Indexes for column or row labels can be changed by assigning
a list-like or Index.
* **Parameters:**
* **labels** (*list-like* *,* [*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index)) – The values for the new index.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to update. The value 0 identifies the rows, and 1 identifies the columns.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, modify the DataFrame in place and return None instead of a new DataFrame instance.
* **Returns:**
**renamed** – An object of type DataFrame or None if `inplace=True`.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
#### SEE ALSO
[`DataFrame.rename_axis`](maxframe.dataframe.DataFrame.rename_axis.md#maxframe.dataframe.DataFrame.rename_axis)
: Alter the name of the index or columns.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
```
Change the row labels.
```pycon
>>> df.set_axis(['a', 'b', 'c'], axis='index').execute()
A B
a 1 4
b 2 5
c 3 6
```
Change the column labels.
```pycon
>>> df.set_axis(['I', 'II'], axis='columns').execute()
I II
0 1 4
1 2 5
2 3 6
```
Now, update the labels inplace.
```pycon
>>> df.set_axis(['i', 'ii'], axis='columns', inplace=True)
>>> df.execute()
i ii
0 1 4
1 2 5
2 3 6
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.set_index.md
# maxframe.dataframe.DataFrame.set_index
#### DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Set the DataFrame index using existing columns.
Set the DataFrame index (row labels) using one or more existing
columns. The index can replace the existing index or expand on it.
* **Parameters:**
* **keys** (*label* *or* *array-like* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *labels*) – This parameter can be either a single column key, or a list containing column keys.
* **drop** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Delete columns to be used as the new index.
* **append** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to append columns to existing index.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, modifies the DataFrame in place (does not create a new object).
* **verify_integrity** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Check the new index for duplicates. Otherwise defer the check until
necessary. Setting to False will improve the performance of this
method.
* **Returns:**
Changed row labels or None if `inplace=True`.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
#### SEE ALSO
[`DataFrame.reset_index`](maxframe.dataframe.DataFrame.reset_index.md#maxframe.dataframe.DataFrame.reset_index)
: Opposite of set_index.
[`DataFrame.reindex`](maxframe.dataframe.DataFrame.reindex.md#maxframe.dataframe.DataFrame.reindex)
: Change to new indices or expand indices.
[`DataFrame.reindex_like`](maxframe.dataframe.DataFrame.reindex_like.md#maxframe.dataframe.DataFrame.reindex_like)
: Change to same indices as other DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
```
```pycon
>>> df = md.DataFrame({'month': [1, 4, 7, 10],
... 'year': [2012, 2014, 2013, 2014],
... 'sale': [55, 40, 84, 31]})
>>> df.execute()
month year sale
0 1 2012 55
1 4 2014 40
2 7 2013 84
3 10 2014 31
```
Set the index to become the ‘month’ column:
```pycon
>>> df.set_index('month').execute()
year sale
month
1 2012 55
4 2014 40
7 2013 84
10 2014 31
```
Create a MultiIndex using columns ‘year’ and ‘month’:
```pycon
>>> df.set_index(['year', 'month']).execute()
sale
year month
2012 1 55
2014 4 40
2013 7 84
2014 10 31
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.shape.md
# maxframe.dataframe.DataFrame.shape
#### *property* DataFrame.shape
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.shift.md
# maxframe.dataframe.DataFrame.shift
#### DataFrame.shift(periods=1, freq=None, axis=0, fill_value=None)
Shift index by desired number of periods with an optional time freq.
When freq is not passed, shift the index without realigning the data.
If freq is passed (in this case, the index must be date or datetime,
or it will raise a NotImplementedError), the index will be
increased using the periods and the freq.
* **Parameters:**
* **periods** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Number of periods to shift. Can be positive or negative.
* **freq** (*DateOffset* *,* *tseries.offsets* *,* *timedelta* *, or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Offset to use from the tseries module or time rule (e.g. ‘EOM’).
If freq is specified then the index values are shifted but the
data is not realigned. That is, use freq if you would like to
extend the index when shifting and preserve the original data.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'* *,* *None}* *,* *default None*) – Shift direction.
* **fill_value** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *optional*) – The scalar value to use for newly introduced missing values.
The default depends on the dtype of self.
For numeric data, `np.nan` is used.
For datetime, timedelta, or period data, etc. `NaT` is used.
For extension dtypes, `self.dtype.na_value` is used.
* **Returns:**
Copy of input object, shifted.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
`Index.shift`
: Shift values of Index.
`DatetimeIndex.shift`
: Shift values of DatetimeIndex.
`PeriodIndex.shift`
: Shift values of PeriodIndex.
[`tshift`](maxframe.dataframe.DataFrame.tshift.md#maxframe.dataframe.DataFrame.tshift)
: Shift the time index, using the index’s frequency if available.
### Examples
```pycon
>>> import maxframe.dataframe as md
```
```pycon
>>> df = md.DataFrame({'Col1': [10, 20, 15, 30, 45],
... 'Col2': [13, 23, 18, 33, 48],
... 'Col3': [17, 27, 22, 37, 52]})
```
```pycon
>>> df.shift(periods=3).execute()
Col1 Col2 Col3
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 10.0 13.0 17.0
4 20.0 23.0 27.0
```
```pycon
>>> df.shift(periods=1, axis='columns').execute()
Col1 Col2 Col3
0 NaN 10.0 13.0
1 NaN 20.0 23.0
2 NaN 15.0 18.0
3 NaN 30.0 33.0
4 NaN 45.0 48.0
```
```pycon
>>> df.shift(periods=3, fill_value=0).execute()
Col1 Col2 Col3
0 0 0 0
1 0 0 0
2 0 0 0
3 10 13 17
4 20 23 27
```
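The effect of `periods` and `fill_value` on a single column can be sketched with a plain list. This is a simplified model of the behaviour shown above, not MaxFrame code: `shift_values` is a hypothetical helper, and it pads with `None` where the real method would use `NaN` for numeric data.

```python
def shift_values(values, periods=1, fill_value=None):
    """Shift a list by `periods` positions, padding the gap with `fill_value`."""
    n = len(values)
    if periods == 0:
        return list(values)
    if abs(periods) >= n:
        return [fill_value] * n  # everything is shifted out of range
    pad = [fill_value] * abs(periods)
    if periods > 0:
        return pad + list(values[:n - periods])  # move values down
    return list(values[-periods:]) + pad  # move values up


col1 = [10, 20, 15, 30, 45]
shifted_down = shift_values(col1, periods=3, fill_value=0)
shifted_up = shift_values(col1, periods=-2)
```

Here `shifted_down` reproduces the `Col1` column of the `fill_value=0` example, while a negative `periods` moves values toward the start and pads the end.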
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.sort_index.md
# maxframe.dataframe.DataFrame.sort_index
#### DataFrame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: [bool](https://docs.python.org/3/library/functions.html#bool) = False, parallel_kind='PSRS', psrs_kinds=None, default_index_type=None)
Sort object by labels (along an axis).
* **Parameters:**
* **a** (*Input DataFrame* *or* *Series.*)
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis along which to sort. The value 0 identifies the rows,
and 1 identifies the columns.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *level name* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *ints* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *level names*) – If not None, sort on values in specified index level(s).
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Sort ascending vs. descending.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, perform operation in-place.
* **kind** ( *{'quicksort'* *,* *'mergesort'* *,* *'heapsort'}* *,* *default 'quicksort'*) – Choice of sorting algorithm. See also `numpy.sort` for more
information. mergesort is the only stable algorithm. For
DataFrames, this option is only applied when sorting on a single
column or label.
* **na_position** ( *{'first'* *,* *'last'}* *,* *default 'last'*) – Puts NaNs at the beginning if first; last puts NaNs at the end.
Not implemented for MultiIndex.
* **sort_remaining** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **parallel_kind** ( *{'PSRS'}* *,* *optional.*) – Parallel sorting algorithm, for the details, refer to:
[http://csweb.cs.wfu.edu/bigiron/LittleFE-PSRS/build/html/PSRSalgorithm.html](http://csweb.cs.wfu.edu/bigiron/LittleFE-PSRS/build/html/PSRSalgorithm.html)
* **psrs_kinds** (*Sorting algorithms during PSRS algorithm.*)
* **Returns:**
**sorted_obj** – DataFrame with sorted index if inplace=False, None otherwise.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
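`parallel_kind='PSRS'` refers to Parallel Sorting by Regular Sampling. The sketch below simulates the scheme in a single process on a plain list, to show the steps (local sort, regular sampling, pivot selection, partition exchange, merge). It makes no claim about how MaxFrame implements it; `psrs_sort` and the worker count `p` are hypothetical names for illustration.

```python
import bisect
import heapq


def psrs_sort(data, p):
    """Sort `data` with the PSRS scheme, simulating `p` workers in one process."""
    if p < 2 or len(data) < p * p:
        return sorted(data)  # too small to benefit from partitioning

    # 1. Split into p chunks and sort each chunk locally.
    size = -(-len(data) // p)  # ceiling division
    chunks = [sorted(data[i:i + size]) for i in range(0, len(data), size)]

    # 2. Take up to p regularly spaced samples from every sorted chunk.
    samples = []
    for chunk in chunks:
        step = max(len(chunk) // p, 1)
        samples.extend(chunk[::step][:p])

    # 3. Sort the samples and pick p - 1 evenly spaced pivots.
    samples.sort()
    stride = len(samples) // p
    pivots = [samples[i * stride] for i in range(1, p)]

    # 4. Cut every chunk at the pivots; slice j goes to worker j.
    buckets = [[] for _ in range(p)]
    for chunk in chunks:
        cuts = [0] + [bisect.bisect_right(chunk, v) for v in pivots] + [len(chunk)]
        for j in range(p):
            buckets[j].append(chunk[cuts[j]:cuts[j + 1]])

    # 5. Each worker merges its sorted slices; concatenation is globally sorted.
    result = []
    for parts in buckets:
        result.extend(heapq.merge(*parts))
    return result


index_labels = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0, 11, 10, 15, 13, 12, 14, 3, 7]
sorted_labels = psrs_sort(index_labels, 3)
```

The regular samples keep the pivots close to the true quantiles, so the buckets handed to each worker stay roughly balanced.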
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.sort_values.md
# maxframe.dataframe.DataFrame.sort_values
#### DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, parallel_kind='PSRS', psrs_kinds=None, default_index_type=None)
Sort by the values along either axis.
* **Parameters:**
* **df** (*MaxFrame DataFrame*) – Input dataframe.
* **by** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Name or list of names to sort by.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Axis to be sorted.
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Sort ascending vs. descending. Specify list for multiple sort
orders. If this is a list of bools, must match the length of
the by.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, perform operation in-place.
* **kind** ( *{'quicksort'* *,* *'mergesort'* *,* *'heapsort'}* *,* *default 'quicksort'*) – Choice of sorting algorithm. See also `numpy.sort` for more
information. mergesort is the only stable algorithm. For
DataFrames, this option is only applied when sorting on a single
column or label.
* **na_position** ( *{'first'* *,* *'last'}* *,* *default 'last'*) – Puts NaNs at the beginning if first; last puts NaNs at the
end.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **parallel_kind** ( *{'PSRS'}* *,* *default 'PSRS'*) – Parallel sorting algorithm, for the details, refer to:
[http://csweb.cs.wfu.edu/bigiron/LittleFE-PSRS/build/html/PSRSalgorithm.html](http://csweb.cs.wfu.edu/bigiron/LittleFE-PSRS/build/html/PSRSalgorithm.html)
* **Returns:**
**sorted_obj** – DataFrame with sorted values if inplace=False, None otherwise.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({
... 'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],
... 'col2': [2, 1, 9, 8, 7, 4],
... 'col3': [0, 1, 9, 4, 2, 3],
... })
>>> df.execute()
col1 col2 col3
0 A 2 0
1 A 1 1
2 B 9 9
3 NaN 8 4
4 D 7 2
5 C 4 3
```
Sort by col1
```pycon
>>> df.sort_values(by=['col1']).execute()
col1 col2 col3
0 A 2 0
1 A 1 1
2 B 9 9
5 C 4 3
4 D 7 2
3 NaN 8 4
```
Sort by multiple columns
```pycon
>>> df.sort_values(by=['col1', 'col2']).execute()
col1 col2 col3
1 A 1 1
0 A 2 0
2 B 9 9
5 C 4 3
4 D 7 2
3 NaN 8 4
```
Sort Descending
```pycon
>>> df.sort_values(by='col1', ascending=False).execute()
col1 col2 col3
4 D 7 2
5 C 4 3
2 B 9 9
0 A 2 0
1 A 1 1
3 NaN 8 4
```
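The examples show that missing values stay at the end even when sorting in descending order, because `na_position` is applied independently of `ascending`. A minimal sketch of that rule on a plain list, using `None` in place of `NaN`; `sort_with_na` is a hypothetical helper and not part of MaxFrame:

```python
def sort_with_na(values, ascending=True, na_position="last"):
    """Sort present values, then place missing ones first or last."""
    present = sorted((v for v in values if v is not None), reverse=not ascending)
    missing = [v for v in values if v is None]
    if na_position == "first":
        return missing + present
    return present + missing


col1 = ["A", "A", "B", None, "D", "C"]
descending = sort_with_na(col1, ascending=False)
na_first = sort_with_na(col1, na_position="first")
```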
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.stack.md
# maxframe.dataframe.DataFrame.stack
#### DataFrame.stack(level=-1, dropna=True)
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level
index with one or more new inner-most levels compared to the current
DataFrame. The new inner-most levels are created by pivoting the
columns of the current dataframe:
> - if the columns have a single level, the output is a Series;
> - if the columns have multiple levels, the new index
> level(s) is (are) taken from the prescribed level(s) and
> the output is a DataFrame.
* **Parameters:**
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *default -1*) – Level(s) to stack from the column axis onto the index
axis, defined as one index or label, or a list of indices
or labels.
* **dropna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether to drop rows in the resulting Frame/Series with
missing values. Stacking a column level onto the index
axis can create combinations of index and column values
that are missing from the original dataframe. See Examples
section.
* **Returns:**
Stacked dataframe or series.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`DataFrame.unstack`](maxframe.dataframe.DataFrame.unstack.md#maxframe.dataframe.DataFrame.unstack)
: Unstack prescribed level(s) from index axis onto column axis.
[`DataFrame.pivot`](maxframe.dataframe.DataFrame.pivot.md#maxframe.dataframe.DataFrame.pivot)
: Reshape dataframe from long format to wide format.
[`DataFrame.pivot_table`](maxframe.dataframe.DataFrame.pivot_table.md#maxframe.dataframe.DataFrame.pivot_table)
: Create a spreadsheet-style pivot table as a DataFrame.
### Notes
The function is named by analogy with a collection of books
being reorganized from being side by side on a horizontal
position (the columns of the dataframe) to being stacked
vertically on top of each other (in the index of the
dataframe).
### Examples
**Single level columns**
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df_single_level_cols = md.DataFrame([[0, 1], [2, 3]],
... index=['cat', 'dog'],
... columns=['weight', 'height'])
```
Stacking a dataframe with a single level column axis returns a Series:
```pycon
>>> df_single_level_cols.execute()
weight height
cat 0 1
dog 2 3
>>> df_single_level_cols.stack().execute()
cat weight 0
height 1
dog weight 2
height 3
dtype: int64
```
**Multi level columns: simple case**
```pycon
>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
... ('weight', 'pounds')])
>>> df_multi_level_cols1 = md.DataFrame([[1, 2], [2, 4]],
... index=['cat', 'dog'],
... columns=multicol1)
```
Stacking a dataframe with a multi-level column axis:
```pycon
>>> df_multi_level_cols1.execute()
weight
kg pounds
cat 1 2
dog 2 4
>>> df_multi_level_cols1.stack().execute()
weight
cat kg 1
pounds 2
dog kg 2
pounds 4
```
**Missing values**
```pycon
>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
... ('height', 'm')])
>>> df_multi_level_cols2 = md.DataFrame([[1.0, 2.0], [3.0, 4.0]],
... index=['cat', 'dog'],
... columns=multicol2)
```
It is common to have missing values when stacking a dataframe
with multi-level columns, as the stacked dataframe typically
has more values than the original dataframe. Missing values
are filled with NaNs:
```pycon
>>> df_multi_level_cols2.execute()
weight height
kg m
cat 1.0 2.0
dog 3.0 4.0
>>> df_multi_level_cols2.stack().execute()
height weight
cat kg NaN 1.0
m 2.0 NaN
dog kg NaN 3.0
m 4.0 NaN
```
**Prescribing the level(s) to be stacked**
The first parameter controls which level or levels are stacked:
```pycon
>>> df_multi_level_cols2.stack(0).execute()
kg m
cat height NaN 2.0
weight 1.0 NaN
dog height NaN 4.0
weight 3.0 NaN
>>> df_multi_level_cols2.stack([0, 1]).execute()
cat height m 2.0
weight kg 1.0
dog height m 4.0
weight kg 3.0
dtype: float64
```
**Dropping missing values**
```pycon
>>> df_multi_level_cols3 = md.DataFrame([[None, 1.0], [2.0, 3.0]],
... index=['cat', 'dog'],
... columns=multicol2)
```
Note that rows where all values are missing are dropped by
default but this behaviour can be controlled via the dropna
keyword parameter:
```pycon
>>> df_multi_level_cols3.execute()
weight height
kg m
cat NaN 1.0
dog 2.0 3.0
>>> df_multi_level_cols3.stack(dropna=False).execute()
height weight
cat kg NaN NaN
m 1.0 NaN
dog kg NaN 2.0
m 3.0 NaN
>>> df_multi_level_cols3.stack(dropna=True).execute()
height weight
cat m 1.0 NaN
dog kg NaN 2.0
m 3.0 NaN
```
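For the single-level case, stacking amounts to moving each column label into the index next to the row label. The sketch below models that with nested dictionaries and `None` for missing cells. It is an illustration only: `stack` here is a hypothetical stand-alone function and does not cover multi-level columns.

```python
def stack(rows, dropna=True):
    """Turn {row: {column: value}} into {(row, column): value}."""
    stacked = {}
    for row_label, columns in rows.items():
        for col_label, value in columns.items():
            if dropna and value is None:
                continue  # skip missing cells, mirroring dropna=True
            stacked[(row_label, col_label)] = value
    return stacked


frame = {
    "cat": {"weight": 0, "height": 1},
    "dog": {"weight": 2, "height": None},
}
kept = stack(frame)
everything = stack(frame, dropna=False)
```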
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.std.md
# maxframe.dataframe.DataFrame.std
#### DataFrame.std(axis=None, skipna=True, level=None, ddof=1, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.sub.md
# maxframe.dataframe.DataFrame.sub
#### DataFrame.sub(other, axis='columns', level=None, fill_value=None)
Get Subtraction of dataframe and other, element-wise (binary operator subtract).
Equivalent to `-`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, `rsub`.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](maxframe.dataframe.DataFrame.truediv.md#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar with the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
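The interaction between index alignment and `fill_value` can be sketched for one column, using dictionaries keyed by label and `None` for missing data. This is a simplified model of the rules stated under **fill_value** and in the Notes, not MaxFrame's implementation; `sub_aligned` and the sample data are hypothetical.

```python
def sub_aligned(left, right, fill_value=None):
    """Subtract two label->value mappings over the union of their labels."""
    result = {}
    for label in sorted(set(left) | set(right)):
        a, b = left.get(label), right.get(label)
        if a is None and b is None:
            result[label] = None  # missing on both sides stays missing
        elif fill_value is None and (a is None or b is None):
            result[label] = None  # no fill value: a one-sided gap stays missing
        else:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
            result[label] = a - b
    return result


left = {"circle": 0, "triangle": 3}
right = {"triangle": 1, "rectangle": 4}
plain = sub_aligned(left, right)
filled = sub_aligned(left, right, fill_value=0)
```

Without a fill value, only `triangle` has data on both sides; with `fill_value=0`, the labels present on one side only are computed against zero.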
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.sum.md
# maxframe.dataframe.DataFrame.sum
#### DataFrame.sum(axis=None, skipna=True, level=None, min_count=0, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.swaplevel.md
# maxframe.dataframe.DataFrame.swaplevel
#### DataFrame.swaplevel(i=-2, j=-1, axis=0)
Swap levels i and j in a `MultiIndex`.
Default is to swap the two innermost levels of the index.
* **Parameters:**
* **i** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Levels of the indices to be swapped. Can pass level name as string.
* **j** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Levels of the indices to be swapped. Can pass level name as string.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to swap levels on. 0 or ‘index’ for row-wise, 1 or
‘columns’ for column-wise.
* **Returns:**
DataFrame with levels swapped in MultiIndex.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... {"Grade": ["A", "B", "A", "C"]},
... index=[
... ["Final exam", "Final exam", "Coursework", "Coursework"],
... ["History", "Geography", "History", "Geography"],
... ["January", "February", "March", "April"],
... ],
... )
>>> df.execute()
Grade
Final exam History January A
Geography February B
Coursework History March A
Geography April C
```
In the following example, we will swap the levels of the indices.
Here, we swap the levels of the row index (`axis=0`, the default); the levels of
the columns can be swapped in a similar manner with `axis=1`.
By not supplying any arguments for i and j, we swap the last and second to
last indices.
```pycon
>>> df.swaplevel().execute()
Grade
Final exam January History A
February Geography B
Coursework March History A
April Geography C
```
By supplying one argument, we can choose which index to swap the last
index with. We can for example swap the first index with the last one as
follows.
```pycon
>>> df.swaplevel(0).execute()
Grade
January History Final exam A
February Geography Final exam B
March History Coursework A
April Geography Coursework C
```
We can also define explicitly which indices we want to swap by supplying values
for both i and j. Here, we for example swap the first and second indices.
```pycon
>>> df.swaplevel(0, 1).execute()
Grade
History Final exam January A
Geography Final exam February B
History Coursework March A
Geography Coursework April C
```
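Since each `MultiIndex` entry behaves like a tuple of level values, the three calls above can be modelled as swapping two positions in every tuple. A minimal sketch under that assumption; this stand-alone `swaplevel` is hypothetical and operates on a list of tuples rather than a DataFrame:

```python
def swaplevel(keys, i=-2, j=-1):
    """Swap positions i and j inside every index tuple."""
    swapped = []
    for key in keys:
        parts = list(key)
        parts[i], parts[j] = parts[j], parts[i]
        swapped.append(tuple(parts))
    return swapped


index = [
    ("Final exam", "History", "January"),
    ("Final exam", "Geography", "February"),
    ("Coursework", "History", "March"),
    ("Coursework", "Geography", "April"),
]
default_swap = swaplevel(index)   # two innermost levels
first_last = swaplevel(index, 0)  # first level with the last one
first_second = swaplevel(index, 0, 1)
```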
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.tail.md
# maxframe.dataframe.DataFrame.tail
#### DataFrame.tail(n=5)
Return the last n rows.
This function returns last n rows from the object based on
position. It is useful for quickly verifying data, for example,
after sorting or appending rows.
For negative values of n, this function returns all rows except
the first `abs(n)` rows, equivalent to `df[abs(n):]`.
* **Parameters:**
**n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 5*) – Number of rows to select.
* **Returns:**
The last n rows of the caller object.
* **Return type:**
[type](https://docs.python.org/3/library/functions.html#type) of caller
#### SEE ALSO
[`DataFrame.head`](maxframe.dataframe.DataFrame.head.md#maxframe.dataframe.DataFrame.head)
: The first n rows of the caller object.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df.execute()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
```
Viewing the last 5 lines
```pycon
>>> df.tail().execute()
animal
4 monkey
5 parrot
6 shark
7 whale
8 zebra
```
Viewing the last n lines (three in this case)
```pycon
>>> df.tail(3).execute()
animal
6 shark
7 whale
8 zebra
```
For negative values of n
```pycon
>>> df.tail(-3).execute()
animal
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
```
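The positive and negative cases of `n` map onto ordinary slicing. A small sketch on a plain list, assuming list slicing as a stand-in for row selection; the stand-alone `tail` function is hypothetical:

```python
def tail(rows, n=5):
    """Return the last n items; for negative n, drop the first abs(n) items."""
    if n == 0:
        return []
    if n > 0:
        return rows[-n:]
    return rows[abs(n):]


animals = ["alligator", "bee", "falcon", "lion", "monkey",
           "parrot", "shark", "whale", "zebra"]
last_five = tail(animals)
last_three = tail(animals, 3)
all_but_first_three = tail(animals, -3)
```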
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.take.md
# maxframe.dataframe.DataFrame.take
#### DataFrame.take(indices, axis=0, \*\*kwargs)
Return the elements in the given *positional* indices along an axis.
This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element in the object.
* **Parameters:**
* **indices** (*array-like*) – An array of ints indicating which positions to take.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'* *,* *None}* *,* *default 0*) – The axis on which to select elements. `0` means that we are
selecting rows, `1` means that we are selecting columns.
For Series this parameter is unused and defaults to 0.
* **\*\*kwargs** – For compatibility with `numpy.take()`. Has no effect on the
output.
* **Returns:**
An array-like containing the elements taken from the object.
* **Return type:**
same type as caller
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Select a subset of a DataFrame by labels.
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Select a subset of a DataFrame by positions.
[`numpy.take`](https://numpy.org/doc/stable/reference/generated/numpy.take.html#numpy.take)
: Take elements from an array along an axis.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([('falcon', 'bird', 389.0),
... ('parrot', 'bird', 24.0),
... ('lion', 'mammal', 80.5),
... ('monkey', 'mammal', mt.nan)],
... columns=['name', 'class', 'max_speed'],
... index=[0, 2, 3, 1])
>>> df.execute()
name class max_speed
0 falcon bird 389.0
2 parrot bird 24.0
3 lion mammal 80.5
1 monkey mammal NaN
```
Take elements at positions 0 and 3 along the axis 0 (default).
Note how the actual indices selected (0 and 1) do not correspond to
our selected indices 0 and 3. That’s because we are selecting the 0th
and 3rd rows, not rows whose indices equal 0 and 3.
```pycon
>>> df.take([0, 3]).execute()
name class max_speed
0 falcon bird 389.0
1 monkey mammal NaN
```
Take elements at indices 1 and 2 along the axis 1 (column selection).
```pycon
>>> df.take([1, 2], axis=1).execute()
class max_speed
0 bird 389.0
2 bird 24.0
3 mammal 80.5
1 mammal NaN
```
We may take elements using negative integers for positive indices,
starting from the end of the object, just like with Python lists.
```pycon
>>> df.take([-1, -2]).execute()
name class max_speed
1 monkey mammal NaN
3 lion mammal 80.5
```
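The difference between positions and index labels can be sketched without any DataFrame library. The helper below is hypothetical and works on two parallel lists (labels and row values); it only mirrors the behaviour shown in the examples above.

```python
def take_sketch(labels, rows, positions):
    """Pick rows by position and return (label, row) pairs."""
    # Python list indexing already accepts negative positions,
    # counting from the end, which matches the last example above.
    return [(labels[p], rows[p]) for p in positions]


labels = [0, 2, 3, 1]                            # index labels, deliberately unordered
names = ['falcon', 'parrot', 'lion', 'monkey']   # one value per row

print(take_sketch(labels, names, [0, 3]))    # [(0, 'falcon'), (1, 'monkey')]
print(take_sketch(labels, names, [-1, -2]))  # [(1, 'monkey'), (3, 'lion')]
```

Position 3 returns the row labelled 1, because the lookup never consults the labels.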
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.to_clipboard.md
# maxframe.dataframe.DataFrame.to_clipboard
#### DataFrame.to_clipboard(\*, excel=True, sep=None, batch_size=10000, session=None, \*\*kwargs)
Copy object to the system clipboard.
Write a text representation of object to the system clipboard.
This can be pasted into Excel, for example.
* **Parameters:**
* **excel** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) –
Produce output in a csv format for easy pasting into excel.
- True, use the provided separator for csv pasting.
- False, write a string representation of the object to the clipboard.
* **sep** (str, default `'\t'`) – Field delimiter.
* **\*\*kwargs** – These parameters will be passed to DataFrame.to_csv.
#### SEE ALSO
[`DataFrame.to_csv`](maxframe.dataframe.DataFrame.to_csv.md#maxframe.dataframe.DataFrame.to_csv)
: Write a DataFrame to a comma-separated values (csv) file.
[`read_clipboard`](maxframe.dataframe.read_clipboard.md#maxframe.dataframe.read_clipboard)
: Read text from clipboard and pass to read_csv.
### Notes
Requirements for your platform.
> - Linux : xclip, or xsel (with PyQt4 modules)
> - Windows : none
> - macOS : none
This method uses the processes developed for the package pyperclip. A
solution to render any output string format is given in the examples.
### Examples
Copy the contents of a DataFrame to the clipboard.
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
```
```pycon
>>> df.to_clipboard(sep=',')
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6
```
We can omit the index by passing the keyword index and setting
it to false.
```pycon
>>> df.to_clipboard(sep=',', index=False)
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.to_csv.md
# maxframe.dataframe.DataFrame.to_csv
#### DataFrame.to_csv(path, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', lineterminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', partition_cols=None, storage_options=None, \*\*kw)
Write object to a comma-separated values (csv) file.
* **Parameters:**
* **path** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – File path.
If path is a string with a wildcard, e.g. ‘/to/path/out-\*.csv’,
to_csv will try to write multiple files; for instance,
chunk (0, 0) will write data into ‘/to/path/out-0.csv’.
If path is a string without a wildcard,
all data will be written into a single file.
* **sep** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '* *,* *'*) – String of length 1. Field delimiter for the output file.
* **na_rep** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default ''*) – Missing data representation.
* **float_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Format string for floating point numbers.
* **columns** (*sequence* *,* *optional*) – Columns to write.
* **header** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default True*) – Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Write row names (index).
* **index_label** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *sequence* *, or* *False* *,* *default None*) – Column label for index column(s) if desired. If None is given, and
header and index are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
* **mode** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Python write mode, default ‘w’.
* **encoding** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – A string representing the encoding to use in the output file,
defaults to ‘utf-8’.
* **compression** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default 'infer'*) – If str, represents compression mode. If dict, value at ‘method’ is
the compression mode. Compression mode may be any of the following
possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If
compression mode is ‘infer’ and path is path-like, then
detect compression mode from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’ or ‘.xz’. (otherwise no compression). If dict given
and mode is ‘zip’ or inferred as ‘zip’, other entries passed as
additional compression options.
* **quoting** (*optional constant from csv module*) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
* **quotechar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '"'*) – String of length 1. Character used to quote fields.
* **lineterminator** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – The newline character or character sequence to use in the output
file. Defaults to os.linesep, which depends on the OS in which
this method is called (‘\n’ for Linux, ‘\r\n’ for Windows, for example).
* **chunksize** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None*) – Rows to write at a time.
* **date_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Format string for datetime objects.
* **doublequote** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Control quoting of quotechar inside a field.
* **escapechar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – String of length 1. Character used to escape sep and quotechar
when appropriate.
* **decimal** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '.'*) – Character recognized as decimal separator. E.g. use ‘,’ for
European data.
* **partition_cols** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional* *,* *default None*) – Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
* **Returns:**
If path_or_buf is None, returns the resulting csv format as a
string. Otherwise returns None.
* **Return type:**
None or [str](https://docs.python.org/3/library/stdtypes.html#str)
#### SEE ALSO
[`read_csv`](maxframe.dataframe.read_csv.md#maxframe.dataframe.read_csv)
: Load a CSV file into a DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'name': ['Raphael', 'Donatello'],
... 'mask': ['red', 'purple'],
... 'weapon': ['sai', 'bo staff']})
>>> df.to_csv('out.csv', index=False).execute()
>>> # Write partitioned dataset
>>> df.to_csv('dataset', partition_cols=['mask']).execute()
```
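The note on `quoting` and `float_format` can be reproduced with Python's standard `csv` module alone. This is a minimal sketch rather than MaxFrame code: once a float has been formatted it is a string, so `csv.QUOTE_NONNUMERIC` quotes it.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")

value = 1.5
# The raw float is numeric, so it is written without quotes.
# The pre-formatted value is a str, so QUOTE_NONNUMERIC quotes it.
writer.writerow([value, "%.2f" % value])

csv_text = buf.getvalue()
print(csv_text)  # 1.5,"1.50"
```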
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.to_dict.md
# maxframe.dataframe.DataFrame.to_dict
#### DataFrame.to_dict(orient='dict', into=<class 'dict'>, index=True, batch_size=10000, session=None)
Convert the DataFrame to a dictionary.
The type of the key-value pairs can be customized with the parameters
(see below).
* **Parameters:**
* **orient** (*str {'dict'* *,* *'list'* *,* *'series'* *,* *'split'* *,* *'tight'* *,* *'records'* *,* *'index'}*) –
Determines the type of the values of the dictionary.
- ‘dict’ (default) : dict like {column -> {index -> value}}
- ‘list’ : dict like {column -> [values]}
- ‘series’ : dict like {column -> Series(values)}
- ‘split’ : dict like
{‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
- ‘tight’ : dict like
{‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values],
‘index_names’ -> [index.names], ‘column_names’ -> [column.names]}
- ‘records’ : list like
[{column -> value}, … , {column -> value}]
- ‘index’ : dict like {index -> {column -> value}}
* **into** (*class* *,* *default dict*) – The collections.abc.MutableMapping subclass used for all Mappings
in the return value. Can be the actual class or an empty
instance of the mapping type you want. If you want a
collections.defaultdict, you must pass it initialized.
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether to include the index item (and index_names item if orient
is ‘tight’) in the returned dictionary. Can only be `False`
when orient is ‘split’ or ‘tight’.
* **Returns:**
Return a collections.abc.MutableMapping object representing the
DataFrame. The resulting transformation depends on the orient
parameter.
* **Return type:**
[dict](https://docs.python.org/3/library/stdtypes.html#dict), [list](https://docs.python.org/3/library/stdtypes.html#list) or [collections.abc.MutableMapping](https://docs.python.org/3/library/collections.abc.html#collections.abc.MutableMapping)
#### SEE ALSO
[`DataFrame.from_dict`](maxframe.dataframe.DataFrame.from_dict.md#maxframe.dataframe.DataFrame.from_dict)
: Create a DataFrame from a dictionary.
[`DataFrame.to_json`](maxframe.dataframe.DataFrame.to_json.md#maxframe.dataframe.DataFrame.to_json)
: Convert a DataFrame to JSON format.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'col1': [1, 2],
... 'col2': [0.5, 0.75]},
... index=['row1', 'row2'])
>>> df.execute()
col1 col2
row1 1 0.50
row2 2 0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
```
You can specify the return orientation.
```pycon
>>> df.to_dict('series')
{'col1': row1 1
row2 2
Name: col1, dtype: int64,
'col2': row1 0.50
row2 0.75
Name: col2, dtype: float64}
```
```pycon
>>> df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
'data': [[1, 0.5], [2, 0.75]]}
```
```pycon
>>> df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
```
```pycon
>>> df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}
```
```pycon
>>> df.to_dict('tight')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}
```
You can also specify the mapping type.
```pycon
>>> from collections import OrderedDict, defaultdict
>>> df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
```
If you want a defaultdict, you need to initialize it:
```pycon
>>> dd = defaultdict(list)
>>> df.to_dict('records', into=dd)
[defaultdict(<class 'list'>, {'col1': 1, 'col2': 0.5}),
defaultdict(<class 'list'>, {'col1': 2, 'col2': 0.75})]
```
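The `orient` layouts can be sketched from three plain Python pieces: index labels, column names, and row-major data. The function below is a hypothetical illustration covering only some orients; it is not how MaxFrame builds the result.

```python
def to_dict_sketch(index, columns, data, orient="dict"):
    """Build the dictionary layout for a few `orient` values."""
    if orient == "dict":
        return {c: {i: row[j] for i, row in zip(index, data)}
                for j, c in enumerate(columns)}
    if orient == "list":
        return {c: [row[j] for row in data] for j, c in enumerate(columns)}
    if orient == "split":
        return {"index": list(index), "columns": list(columns),
                "data": [list(row) for row in data]}
    if orient == "records":
        return [dict(zip(columns, row)) for row in data]
    if orient == "index":
        return {i: dict(zip(columns, row)) for i, row in zip(index, data)}
    raise ValueError(f"orient {orient!r} is not covered by this sketch")


index = ['row1', 'row2']
columns = ['col1', 'col2']
data = [[1, 0.5], [2, 0.75]]

print(to_dict_sketch(index, columns, data, "records"))
# [{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]
```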
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.to_json.md
# maxframe.dataframe.DataFrame.to_json
#### DataFrame.to_json(path: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, orient: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, date_format: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, double_precision: [int](https://docs.python.org/3/library/functions.html#int) = 10, force_ascii: [bool](https://docs.python.org/3/library/functions.html#bool) = True, date_unit: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = 'ms', default_handler: callable | [None](https://docs.python.org/3/library/constants.html#None) = None, lines: [bool](https://docs.python.org/3/library/functions.html#bool) = False, compression: [str](https://docs.python.org/3/library/stdtypes.html#str) | [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = 'infer', index: [bool](https://docs.python.org/3/library/functions.html#bool) | [None](https://docs.python.org/3/library/constants.html#None) = None, indent: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = None, storage_options: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = None, partition_cols: [str](https://docs.python.org/3/library/stdtypes.html#str) | [list](https://docs.python.org/3/library/stdtypes.html#list) | [None](https://docs.python.org/3/library/constants.html#None) = None, \*\*kwargs)
Convert the object to a JSON string.
Note NaN’s and None will be converted to null and datetime objects
will be converted to UNIX timestamps.
* **Parameters:**
* **path** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *path object* *,* *file-like object* *, or* *None* *,* *default None*) – String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string.
* **orient** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) –
Indication of expected JSON string format.
* Series:
> - default is ‘index’
> - allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}.
* DataFrame:
> - default is ‘columns’
> - allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’,
> ‘values’, ‘table’}.
* The format of the JSON string:
> - ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns],
> ‘data’ -> [values]}
> - ‘records’ : list like [{column -> value}, … , {column -> value}]
> - ‘index’ : dict like {index -> {column -> value}}
> - ‘columns’ : dict like {column -> {index -> value}}
> - ‘values’ : just the values array
> - ‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}}
> Describing the data, where data component is like `orient='records'`.
* **date_format** ( *{None* *,* *'epoch'* *,* *'iso'}*) – Type of date conversion. ‘epoch’ = epoch milliseconds,
‘iso’ = ISO8601. The default depends on the orient. For
`orient='table'`, the default is ‘iso’. For all other orients,
the default is ‘epoch’.
* **double_precision** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 10*) – The number of decimal places to use when encoding
floating point numbers.
* **force_ascii** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Force encoded string to be ASCII.
* **date_unit** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 'ms'* *(**milliseconds* *)*) – The time unit to encode to, governs timestamp and ISO8601
precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond,
microsecond, and nanosecond respectively.
* **default_handler** (*callable* *,* *default None*) – Handler to call if object cannot otherwise be converted to a
suitable format for JSON. Should receive a single argument which is
the object to convert and return a serializable object.
* **lines** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If ‘orient’ is ‘records’ write out line-delimited json format. Will
throw ValueError if incorrect ‘orient’ is used.
* **compression** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default 'infer'*) – For on-the-fly compression of the output data. If str, represents
compression mode. If dict, value at ‘method’ is the compression mode.
Compression mode may be any of the following possible
values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression
mode is ‘infer’ and path is path-like, then detect
compression mode from the following extensions: ‘.gz’, ‘.bz2’,
‘.zip’ or ‘.xz’. (otherwise no compression). If dict given and
mode is one of {‘zip’, ‘xz’}, other entries passed as
additional compression options.
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default None*) – Whether to include the index values in the JSON string. Not
including the index (`index=False`) is only supported when
orient is ‘split’ or ‘table’.
* **indent** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Length of whitespace used to indent each record.
* **partition_cols** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional* *,* *default None*) – Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
#### SEE ALSO
[`read_json`](maxframe.dataframe.read_json.md#maxframe.dataframe.read_json)
: Convert a JSON string to pandas object.
### Notes
The behavior of `indent=0` varies from the stdlib, which does not
indent the output but does insert newlines. Currently, `indent=0`
and the default `indent=None` are equivalent in pandas, though this
may change in a future release.
`orient='table'` contains a ‘pandas_version’ field under ‘schema’.
This stores the version of pandas used in the latest revision of the
schema.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([['a', 'b'], ['c', 'd']],
... index=['row 1', 'row 2'],
... columns=['col 1', 'col 2'])
>>> df.to_json('data.json')
>>> # Writing to a file with orient='records'
>>> df.to_json('records.json', orient='records')
>>> # Writing in line-delimited json format
>>> df.to_json('ldjson.json', orient='records', lines=True)
>>> # Write partitioned dataset
>>> df.to_json('dataset', partition_cols=['col 1'])
```
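What `orient='records'` with and without `lines=True` looks like can be sketched with the standard `json` module. The helper below is hypothetical and only approximates the documented layout (NaN written as null, one object per line); exact whitespace and trailing newlines in real output may differ.

```python
import json
import math


def to_json_records_sketch(columns, data, lines=False):
    """Serialize row-major data as JSON records, optionally line-delimited."""
    def clean(value):
        # NaN is not valid JSON, so it is written as null.
        if isinstance(value, float) and math.isnan(value):
            return None
        return value

    records = [{c: clean(v) for c, v in zip(columns, row)} for row in data]
    if lines:
        # Line-delimited JSON: one record per line.
        return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
    return json.dumps(records, separators=(",", ":"))


columns = ['col 1', 'col 2']
data = [['a', 'b'], ['c', float('nan')]]

print(to_json_records_sketch(columns, data))
# [{"col 1":"a","col 2":"b"},{"col 1":"c","col 2":null}]
print(to_json_records_sketch(columns, data, lines=True))
```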
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.to_lance.md
# maxframe.dataframe.DataFrame.to_lance
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.to_odps_table.md
# maxframe.dataframe.DataFrame.to_odps_table
#### DataFrame.to_odps_table(table: Table | [str](https://docs.python.org/3/library/stdtypes.html#str), partition: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, partition_col: [None](https://docs.python.org/3/library/constants.html#None) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] = None, overwrite: [bool](https://docs.python.org/3/library/functions.html#bool) = False, unknown_as_string: [bool](https://docs.python.org/3/library/functions.html#bool) | [None](https://docs.python.org/3/library/constants.html#None) = True, index: [bool](https://docs.python.org/3/library/functions.html#bool) = True, index_label: [None](https://docs.python.org/3/library/constants.html#None) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] = None, lifecycle: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = None, table_properties: [dict](https://docs.python.org/3/library/stdtypes.html#dict) | [None](https://docs.python.org/3/library/constants.html#None) = None, primary_key: [None](https://docs.python.org/3/library/constants.html#None) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] = None, odps_types: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [str](https://docs.python.org/3/library/stdtypes.html#str)] | [None](https://docs.python.org/3/library/constants.html#None) = None)
Write DataFrame object into a MaxCompute (ODPS) table.
You need to provide the name of the table to write to. If you want to store
data into a specific partition of a table, the partition argument can be used.
You can also use partition_col to specify DataFrame columns as partition
columns; data in the DataFrame will then be grouped by these columns and
inserted into the partitions given by the values of these columns.
If the table does not exist, to_odps_table will create one.
Column names for indexes are determined by the index_label argument. If the
argument is absent, the names of the levels are used if they are not None;
otherwise default names are used. The default name for an index with only one
level is index, and for indexes with multiple levels the name is
level_x, where x is the index of the level.
* **Parameters:**
* **table** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Name of the table to write the DataFrame into.
* **partition** (*Optional* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]*) – Spec of the partition to write to, can be ‘pt1=xxx,pt2=yyy’
* **partition_col** (*Union* *[**None* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]*) – Name of columns in DataFrame as partition columns.
* **overwrite** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – Overwrite data if the table / partition already exists.
* **unknown_as_string** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – If True, object type in the DataFrame will be treated as strings.
Otherwise errors might be raised.
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – If True, indexes will be stored. Otherwise they are ignored.
* **index_label** (*Union* *[**None* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]*) – Specify column names for index levels. If absent, level names or default
names will be used.
* **lifecycle** (*Optional* *[*[*int*](https://docs.python.org/3/library/functions.html#int) *]*) – Specify lifecycle of the output table.
* **table_properties** (*Optional* *[*[*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *]*) – Specify properties of the output table.
* **primary_key** (*Union* *[**None* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]*) – If provided and target table does not exist, target table
will be a delta table with columns specified in this argument
as primary key.
* **Returns:**
**result** – Stub DataFrame for execution.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Notes
to_odps_table returns a stub object for execution. The result returned is
not reusable.
### Examples
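The index column naming rule described above can be sketched in plain Python. This is not MaxFrame code: `index_column_names` is a hypothetical helper, and both the per-level fallback and the zero-based `x` in `level_x` are assumptions that the text does not confirm.

```python
def index_column_names(level_names, index_label=None):
    """Resolve output column names for index levels."""
    if index_label is not None:
        # An explicit index_label takes precedence over level names.
        labels = [index_label] if isinstance(index_label, str) else list(index_label)
        if len(labels) != len(level_names):
            raise ValueError("index_label must provide one name per index level")
        return labels
    if len(level_names) == 1:
        # Single-level index: fall back to 'index' when unnamed.
        return [level_names[0] if level_names[0] is not None else "index"]
    # Multi-level index: fall back to level_x (assumed zero-based) per unnamed level.
    return [name if name is not None else f"level_{i}"
            for i, name in enumerate(level_names)]


print(index_column_names([None]))          # ['index']
print(index_column_names([None, None]))    # ['level_0', 'level_1']
print(index_column_names(['id']))          # ['id']
print(index_column_names([None], 'pk'))    # ['pk']
```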
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.to_pandas.md
# maxframe.dataframe.DataFrame.to_pandas
#### DataFrame.to_pandas(session=None, \*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.to_parquet.md
# maxframe.dataframe.DataFrame.to_parquet
#### DataFrame.to_parquet(path, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options: [dict](https://docs.python.org/3/library/stdtypes.html#dict) = None, \*\*kwargs)
Write a DataFrame to the binary parquet format, each chunk will be
written to a Parquet file.
* **Parameters:**
* **path** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *file-like object*) – If path is a string with a wildcard, e.g. ‘/to/path/out-\*.parquet’,
to_parquet will try to write multiple files; for instance,
chunk (0, 0) will write data into ‘/to/path/out-0.parquet’.
If path is a string without wildcard and partition_cols is None,
all data will be written into a single file.
If path is a string without wildcard or partition_cols is not None,
we will treat it as a directory.
* **engine** ( *{'auto'* *,* *'pyarrow'* *,* *'fastparquet'}* *,* *default 'auto'*) – Parquet library to use. The default behavior is to try ‘pyarrow’,
falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
* **compression** ( *{'snappy'* *,* *'gzip'* *,* *'brotli'* *,* *None}* *,* *default 'snappy'*) – Name of the compression to use. Use `None` for no compression.
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default None*) – If `True`, include the dataframe’s index(es) in the file output.
If `False`, they will not be written to the file.
If `None`, similar to `True` the dataframe’s index(es)
will be saved. However, instead of being saved as values,
the RangeIndex will be stored as a range in the metadata so it
doesn’t require much space and is faster. Other indexes will
be included as columns in the file output.
* **partition_cols** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional* *,* *default None*) – Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
Must be None if path is not a string.
* **\*\*kwargs** – Additional arguments passed to the parquet library.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('*.parquet.gzip',
... compression='gzip').execute()
>>> md.read_parquet('*.parquet.gzip').execute()
col1 col2
0 1 3
1 2 4
```
```pycon
>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f).execute()
>>> f.seek(0)
0
>>> content = f.read()
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.transform.md
# maxframe.dataframe.DataFrame.transform
#### DataFrame.transform(func, axis=0, \*args, dtypes=None, skip_infer=False, \*\*kwargs)
Call `func` on self producing a DataFrame with transformed values.
Produced DataFrame will have same axis length as self.
* **Parameters:**
* **func** (*function* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) –
Function to use for transforming the data. If a function, must either
work when passed a DataFrame or when passed to DataFrame.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. `[np.exp, 'sqrt']`
- dict of axis labels -> functions, function names or list of such.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – If 0 or ‘index’: apply function to each column.
If 1 or ‘columns’: apply function to each row.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames. See Notes for more details.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when dtypes or output_type is not specified.
* **\*args** – Positional arguments to pass to func.
* **\*\*kwargs** – Keyword arguments to pass to func.
* **Returns:**
A DataFrame that must have the same length as self.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
**ValueError** – If the returned DataFrame has a different length than self.
#### SEE ALSO
[`DataFrame.agg`](maxframe.dataframe.DataFrame.agg.md#maxframe.dataframe.DataFrame.agg)
: Only perform aggregating type operations.
[`DataFrame.apply`](maxframe.dataframe.DataFrame.apply.md#maxframe.dataframe.DataFrame.apply)
: Invoke function on a DataFrame.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `func` onto a mock DataFrame and the apply call may
fail. When this happens, you need to specify a list or a pandas
Series as `dtypes` of output DataFrame.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df.execute()
A B
0 0 1
1 1 2
2 2 3
>>> df.transform(lambda x: x + 1).execute()
A B
0 1 2
1 2 3
2 3 4
```
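The same-length requirement can be illustrated with a column-wise sketch over a plain dict of lists. `transform_sketch` is a hypothetical helper that mimics only the length check on `axis=0`; it is not MaxFrame's implementation and does no dtype inference.

```python
def transform_sketch(columns, func):
    """Apply func to each column and require the length to stay the same."""
    out = {}
    for name, values in columns.items():
        new_values = list(func(values))
        if len(new_values) != len(values):
            # An aggregation such as sum() would shrink the column.
            raise ValueError("transformed column must keep the same length")
        out[name] = new_values
    return out


frame = {'A': [0, 1, 2], 'B': [1, 2, 3]}

print(transform_sketch(frame, lambda col: [x + 1 for x in col]))
# {'A': [1, 2, 3], 'B': [2, 3, 4]}
```

Passing a reducing function such as `lambda col: [sum(col)]` raises `ValueError` in this sketch, which is the situation the Raises entry describes.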
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.truediv.md
# maxframe.dataframe.DataFrame.truediv
#### DataFrame.truediv(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator truediv).
Equivalent to `/`, but with support to substitute a fill_value
for missing data in one of the inputs. With reverse version, rtruediv.
Among flexible wrappers (add, sub, mul, div, mod, pow) to
arithmetic operators: +, -, \*, /, //, %, \*\*.
* **Parameters:**
* **other** (*scalar* *,* *sequence* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Any single or multiple element data structure, or list-like object.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Whether to compare by the index (0 or ‘index’) or columns
(1 or ‘columns’). For Series input, axis to match Series index on.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *None* *,* *default None*) – Fill existing missing (NaN) values, and any new element needed for
successful DataFrame alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
the result will be missing.
* **Returns:**
Result of the arithmetic operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.add`](maxframe.dataframe.DataFrame.add.md#maxframe.dataframe.DataFrame.add)
: Add DataFrames.
[`DataFrame.sub`](maxframe.dataframe.DataFrame.sub.md#maxframe.dataframe.DataFrame.sub)
: Subtract DataFrames.
[`DataFrame.mul`](maxframe.dataframe.DataFrame.mul.md#maxframe.dataframe.DataFrame.mul)
: Multiply DataFrames.
[`DataFrame.div`](maxframe.dataframe.DataFrame.div.md#maxframe.dataframe.DataFrame.div)
: Divide DataFrames (float division).
[`DataFrame.truediv`](#maxframe.dataframe.DataFrame.truediv)
: Divide DataFrames (float division).
[`DataFrame.floordiv`](maxframe.dataframe.DataFrame.floordiv.md#maxframe.dataframe.DataFrame.floordiv)
: Divide DataFrames (integer division).
[`DataFrame.mod`](maxframe.dataframe.DataFrame.mod.md#maxframe.dataframe.DataFrame.mod)
: Calculate modulo (remainder after division).
[`DataFrame.pow`](maxframe.dataframe.DataFrame.pow.md#maxframe.dataframe.DataFrame.pow)
: Calculate exponential power.
### Notes
Mismatched indices will be unioned together.
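The label union and the `fill_value` rule described above can be modelled in plain Python. This is only an illustrative sketch, not MaxFrame's implementation: the helper `truediv_aligned` and the dict used as a stand-in for a single column are hypothetical.

```python
import math


def _is_missing(value):
    """True for None or NaN, the two 'missing' markers used in this sketch."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def truediv_aligned(left, right, fill_value=None):
    """Divide two label -> value dicts over the union of their labels.

    Hypothetical helper modelling one column. Zero divisors are not
    handled here (plain Python raises ZeroDivisionError).
    """
    labels = list(left) + [label for label in right if label not in left]
    result = {}
    for label in labels:
        a = left.get(label)
        b = right.get(label)
        if _is_missing(a) and _is_missing(b):
            # Missing on both sides stays missing, even with fill_value.
            result[label] = float("nan")
            continue
        if fill_value is not None:
            a = fill_value if _is_missing(a) else a
            b = fill_value if _is_missing(b) else b
        if _is_missing(a) or _is_missing(b):
            result[label] = float("nan")
        else:
            result[label] = a / b
    return result


left = {"x": 1.0, "y": 4.0}
right = {"y": 2.0, "z": 5.0}
plain = truediv_aligned(left, right)                    # x and z become NaN
filled = truediv_aligned(left, right, fill_value=1.0)   # gaps filled with 1.0
```

Without `fill_value`, labels present on only one side produce a missing result. With it, the gap is filled before dividing.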
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'angles': [0, 3, 4],
... 'degrees': [360, 180, 360]},
... index=['circle', 'triangle', 'rectangle'])
>>> df.execute()
angles degrees
circle 0 360
triangle 3 180
rectangle 4 360
```
Add a scalar with the operator version, which returns the same
results.
```pycon
>>> (df + 1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
```pycon
>>> df.add(1).execute()
angles degrees
circle 1 361
triangle 4 181
rectangle 5 361
```
Divide by constant with reverse version.
```pycon
>>> df.div(10).execute()
angles degrees
circle 0.0 36.0
triangle 0.3 18.0
rectangle 0.4 36.0
```
```pycon
>>> df.rdiv(10).execute()
angles degrees
circle inf 0.027778
triangle 3.333333 0.055556
rectangle 2.500000 0.027778
```
Subtract a list and Series by axis with operator version.
```pycon
>>> (df - [1, 2]).execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub([1, 2], axis='columns').execute()
angles degrees
circle -1 358
triangle 2 178
rectangle 3 358
```
```pycon
>>> df.sub(md.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
... axis='index').execute()
angles degrees
circle -1 359
triangle 2 179
rectangle 3 359
```
Multiply a DataFrame of different shape with operator version.
```pycon
>>> other = md.DataFrame({'angles': [0, 3, 4]},
... index=['circle', 'triangle', 'rectangle'])
>>> other.execute()
angles
circle 0
triangle 3
rectangle 4
```
```pycon
>>> df.mul(other, fill_value=0).execute()
angles degrees
circle 0 0.0
triangle 9 0.0
rectangle 16 0.0
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.truncate.md
# maxframe.dataframe.DataFrame.truncate
#### DataFrame.truncate(before=None, after=None, axis=0, copy=None)
Truncate a Series or DataFrame before and after some index value.
This is a useful shorthand for boolean indexing based on index
values above or below certain thresholds.
* **Parameters:**
* **before** (*date* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*int*](https://docs.python.org/3/library/functions.html#int)) – Truncate all rows before this index value.
* **after** (*date* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*int*](https://docs.python.org/3/library/functions.html#int)) – Truncate all rows after this index value.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *optional*) – Axis to truncate. Truncates the index (rows) by default.
For Series this parameter is unused and defaults to 0.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default is True*) – This parameter is only kept for compatibility with pandas.
* **Returns:**
The truncated Series or DataFrame.
* **Return type:**
[type](https://docs.python.org/3/library/functions.html#type) of caller
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Select a subset of a DataFrame by label.
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Select a subset of a DataFrame by position.
### Notes
If the index being truncated contains only datetime values,
before and after may be specified as strings instead of
Timestamps.
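As a rough plain-Python model of the behaviour, truncation is an inclusive range filter on the index, and a date string without a time component is read as midnight. The helper `truncate_rows` and the list-of-pairs layout are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def truncate_rows(rows, before=None, after=None):
    """Keep (label, value) pairs with before <= label <= after.

    Hypothetical helper: both bounds are inclusive and None means unbounded.
    """
    return [
        (label, value)
        for label, value in rows
        if (before is None or label >= before)
        and (after is None or label <= after)
    ]


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
kept = truncate_rows(rows, before=2, after=4)

# A date string without a time component parses to midnight, so an entry
# later on the `after` day falls outside the kept range.
stamps = [
    (datetime(2016, 1, 9, 23, 59, 59), 1),
    (datetime(2016, 1, 10, 0, 0, 0), 1),
    (datetime(2016, 1, 10, 12, 0, 0), 1),
]
kept_stamps = truncate_rows(
    stamps,
    before=datetime.fromisoformat("2016-01-05"),
    after=datetime.fromisoformat("2016-01-10"),
)
```

Here `kept_stamps` ends at `2016-01-10 00:00:00`; the noon entry on the same day is dropped.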
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
... 'B': ['f', 'g', 'h', 'i', 'j'],
... 'C': ['k', 'l', 'm', 'n', 'o']},
... index=[1, 2, 3, 4, 5])
>>> df.execute()
A B C
1 a f k
2 b g l
3 c h m
4 d i n
5 e j o
```
```pycon
>>> df.truncate(before=2, after=4).execute()
A B C
2 b g l
3 c h m
4 d i n
```
The columns of a DataFrame can be truncated.
```pycon
>>> df.truncate(before="A", after="B", axis="columns").execute()
A B
1 a f
2 b g
3 c h
4 d i
5 e j
```
For Series, only rows can be truncated.
```pycon
>>> df['A'].truncate(before=2, after=4).execute()
2 b
3 c
4 d
Name: A, dtype: object
```
The index values in `truncate` can be datetimes or string
dates.
```pycon
>>> dates = md.date_range('2016-01-01', '2016-02-01', freq='s')
>>> df = md.DataFrame(index=dates, data={'A': 1})
>>> df.tail().execute()
A
2016-01-31 23:59:56 1
2016-01-31 23:59:57 1
2016-01-31 23:59:58 1
2016-01-31 23:59:59 1
2016-02-01 00:00:00 1
```
```pycon
>>> df.truncate(before=md.Timestamp('2016-01-05'),
... after=md.Timestamp('2016-01-10')).tail().execute()
A
2016-01-09 23:59:56 1
2016-01-09 23:59:57 1
2016-01-09 23:59:58 1
2016-01-09 23:59:59 1
2016-01-10 00:00:00 1
```
Because the index is a DatetimeIndex containing only dates, we can
specify before and after as strings. They will be coerced to
Timestamps before truncation.
```pycon
>>> df.truncate('2016-01-05', '2016-01-10').tail().execute()
A
2016-01-09 23:59:56 1
2016-01-09 23:59:57 1
2016-01-09 23:59:58 1
2016-01-09 23:59:59 1
2016-01-10 00:00:00 1
```
Note that `truncate` assumes a 0 value for any unspecified time
component (midnight). This differs from partial string slicing, which
returns any partially matching dates.
```pycon
>>> df.loc['2016-01-05':'2016-01-10', :].tail().execute()
A
2016-01-10 23:59:55 1
2016-01-10 23:59:56 1
2016-01-10 23:59:57 1
2016-01-10 23:59:58 1
2016-01-10 23:59:59 1
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.tshift.md
# maxframe.dataframe.DataFrame.tshift
#### DataFrame.tshift(periods: [int](https://docs.python.org/3/library/functions.html#int) = 1, freq=None, axis=0)
Shift the time index, using the index’s frequency if available.
* **Parameters:**
* **periods** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Number of periods to move, can be positive or negative.
* **freq** (*DateOffset* *,* *timedelta* *, or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Increment to use from the tseries module
or time rule expressed as a string (e.g. ‘EOM’).
* **axis** ( *{0* *or* *‘index’* *,* *1* *or* *‘columns’* *,* *None}* *,* *default 0*) – Corresponds to the axis that contains the Index.
* **Returns:**
**shifted**
* **Return type:**
Series/DataFrame
### Notes
If freq is not specified, the method tries to use the freq or inferred_freq
attribute of the index. If neither attribute exists, a
ValueError is raised.
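The fallback described in this note can be sketched in plain Python. The helpers `infer_step` and `tshift_index` are hypothetical and work on a list of `datetime` objects; they only model the idea of shifting the index by `periods` steps, not MaxFrame's actual code path.

```python
from datetime import datetime, timedelta


def infer_step(index):
    """Return the constant spacing of a datetime list, or None if irregular."""
    steps = {later - earlier for earlier, later in zip(index, index[1:])}
    return steps.pop() if len(steps) == 1 else None


def tshift_index(index, periods=1, freq=None):
    """Shift every timestamp by periods * freq, leaving the data untouched."""
    step = freq if freq is not None else infer_step(index)
    if step is None:
        raise ValueError("freq not given and cannot be inferred from the index")
    return [stamp + periods * step for stamp in index]


index = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(3)]
shifted = tshift_index(index, periods=2)  # daily spacing is inferred
```

An irregularly spaced index with no explicit `freq` makes this sketch raise `ValueError`, mirroring the note above.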
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.unstack.md
# maxframe.dataframe.DataFrame.unstack
#### DataFrame.unstack(level=-1, fill_value=None)
Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
* **Parameters:**
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *these* *,* *default last level*) – Level(s) to unstack, can pass level name.
* **fill_value** (*scalar value* *,* *default None*) – Value to use when replacing NaN values.
* **Returns:**
Unstacked Series.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
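To make the reshaping concrete, here is a minimal plain-Python sketch of unstacking the last index level. The helper `unstack_last` and the dict keyed by `(outer, inner)` tuples are hypothetical stand-ins for a MultiIndex Series.

```python
def unstack_last(series, fill_value=None):
    """Pivot the last key level of a {(outer, inner): value} dict into columns.

    Hypothetical helper: returns {outer: {inner: value}}, using fill_value
    for combinations that are absent from the input.
    """
    outers, inners = [], []
    for outer, inner in series:
        if outer not in outers:
            outers.append(outer)
        if inner not in inners:
            inners.append(inner)
    return {
        outer: {inner: series.get((outer, inner), fill_value) for inner in inners}
        for outer in outers
    }


s = {("one", "a"): 1, ("one", "b"): 2, ("two", "a"): 3}
table = unstack_last(s, fill_value=0)
```

The missing `("two", "b")` combination is filled with `fill_value` in this model.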
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4],
... index=md.MultiIndex.from_product([['one', 'two'],
... ['a', 'b']]))
>>> s.execute()
one a 1
b 2
two a 3
b 4
dtype: int64
```
```pycon
>>> s.unstack(level=-1).execute()
a b
one 1 2
two 3 4
```
```pycon
>>> s.unstack(level=0).execute()
one two
a 1 3
b 2 4
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.update.md
# maxframe.dataframe.DataFrame.update
#### DataFrame.update(other, join='left', overwrite=True, filter_func=None, errors='ignore')
Modify in place using non-NA values from another DataFrame.
Aligns on indices. There is no return value.
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *, or* *object coercible into a DataFrame*) – Should have at least one matching index/column label
with the original DataFrame. If a Series is passed,
its name attribute must be set, and that will be
used as the column name to align with the original DataFrame.
* **join** ( *{'left'}* *,* *default 'left'*) – Only left join is implemented, keeping the index and columns of the
original object.
* **overwrite** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) –
How to handle non-NA values for overlapping keys:
* True: overwrite original DataFrame’s values
with values from other.
* False: only update values that are NA in
the original DataFrame.
* **filter_func** (*callable* *(**1d-array* *)* *-> bool 1d-array* *,* *optional*) – Can choose to replace values other than NA. Return True for values
that should be updated.
* **errors** ( *{'raise'* *,* *'ignore'}* *,* *default 'ignore'*) – If ‘raise’, will raise a ValueError if the DataFrame and other
both contain non-NA data in the same place.
* **Returns:**
This method directly changes calling object.
* **Return type:**
None
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) –
* When errors=’raise’ and there’s overlapping non-NA data.
* When errors is neither ‘ignore’ nor ‘raise’.
* [**NotImplementedError**](https://docs.python.org/3/library/exceptions.html#NotImplementedError) –
* If join != ‘left’
#### SEE ALSO
[`dict.update`](https://docs.python.org/3/library/stdtypes.html#dict.update)
: Similar method for dictionaries.
[`DataFrame.merge`](maxframe.dataframe.DataFrame.merge.md#maxframe.dataframe.DataFrame.merge)
: For column(s)-on-column(s) operations.
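The interaction of the left join, the `overwrite` flag, and NA handling can be modelled for a single column in plain Python. The helper `update_column` is hypothetical and ignores `filter_func` and `errors`.

```python
import math


def _is_na(value):
    """True for None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def update_column(original, other, overwrite=True):
    """Update a {label: value} dict in place from another one (left join)."""
    for label in original:
        if label not in other or _is_na(other[label]):
            continue  # nothing usable to copy for this label
        if overwrite or _is_na(original[label]):
            original[label] = other[label]


col = {0: 400.0, 1: 500.0, 2: 600.0}
update_column(col, {0: 4.0, 1: float("nan"), 2: 6.0, 3: 7.0})

sparse = {0: 1.0, 1: float("nan")}
update_column(sparse, {0: 10.0, 1: 20.0}, overwrite=False)
```

Label `3` is ignored because only the original labels are kept, and with `overwrite=False` only the NA slot of `sparse` is filled.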
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': [1, 2, 3],
... 'B': [400, 500, 600]})
>>> new_df = md.DataFrame({'B': [4, 5, 6],
... 'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df.execute()
A B
0 1 4
1 2 5
2 3 6
```
The DataFrame’s length does not increase as a result of the update,
only values at matching index/column labels are updated.
```pycon
>>> df = md.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_df = md.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df.execute()
A B
0 a d
1 b e
2 c f
```
```pycon
>>> df = md.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_df = md.DataFrame({'B': ['d', 'f']}, index=[0, 2])
>>> df.update(new_df)
>>> df.execute()
A B
0 a d
1 b y
2 c f
```
For Series, its name attribute must be set.
```pycon
>>> df = md.DataFrame({'A': ['a', 'b', 'c'],
... 'B': ['x', 'y', 'z']})
>>> new_column = md.Series(['d', 'e', 'f'], name='B')
>>> df.update(new_column)
>>> df.execute()
A B
0 a d
1 b e
2 c f
```
If other contains NaNs the corresponding values are not updated
in the original dataframe.
```pycon
>>> df = md.DataFrame({'A': [1, 2, 3],
... 'B': [400., 500., 600.]})
>>> new_df = md.DataFrame({'B': [4, mt.nan, 6]})
>>> df.update(new_df)
>>> df.execute()
A B
0 1 4.0
1 2 500.0
2 3 6.0
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.value_counts.md
# maxframe.dataframe.DataFrame.value_counts
#### DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False, dropna=True, method='auto')
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.var.md
# maxframe.dataframe.DataFrame.var
#### DataFrame.var(axis=None, skipna=True, level=None, ddof=1, numeric_only=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.where.md
# maxframe.dataframe.DataFrame.where
#### DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
Replace values where the condition is False.
* **Parameters:**
* **cond** (*bool Series/DataFrame* *,* *array-like* *, or* *callable*) – Where cond is True, keep the original value. Where
False, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
* **other** (*scalar* *,* *Series/DataFrame* *, or* *callable*) – Entries where cond is False are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to perform the operation in place on the data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Alignment axis if needed.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Alignment level if needed.
* **Return type:**
Same type as caller
#### SEE ALSO
[`DataFrame.mask()`](maxframe.dataframe.DataFrame.mask.md#maxframe.dataframe.DataFrame.mask)
: Return an object of same shape as self.
### Notes
The where method is an application of the if-then idiom. For each
element in the calling DataFrame, if `cond` is `True` the
element is used; otherwise the corresponding element from the DataFrame
`other` is used.
The signature for [`DataFrame.where()`](#maxframe.dataframe.DataFrame.where) differs from
[`numpy.where()`](https://numpy.org/doc/stable/reference/generated/numpy.where.html#numpy.where). Roughly `df1.where(m, df2)` is equivalent to
`np.where(m, df1, df2)`.
For further details and examples see the `where` documentation in
[indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-where-mask).
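A plain-Python model may help keep the two directions apart: `where` keeps entries whose condition is true, and `mask` is its inverse. The list-based helpers below are hypothetical and use `None` rather than NaN as the default replacement.

```python
def where(values, cond, other=None):
    """Keep values[i] where cond[i] is True, otherwise use other."""
    return [value if keep else other for value, keep in zip(values, cond)]


def mask(values, cond, other=None):
    """Inverse of where: replace values[i] where cond[i] is True."""
    return where(values, [not flag for flag in cond], other)


values = [0, 1, 2, 3, 4]
cond = [value > 1 for value in values]
kept = where(values, cond, 10)    # [10, 10, 2, 3, 4]
hidden = mask(values, cond, 10)   # [0, 1, 10, 10, 10]
```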
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(range(5))
>>> s.where(s > 0).execute()
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
```
```pycon
>>> s.mask(s > 0).execute()
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
```
```pycon
>>> s.where(s > 1, 10).execute()
0 10
1 10
2 2
3 3
4 4
dtype: int64
```
```pycon
>>> df = md.DataFrame(mt.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df.execute()
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
>>> m = df % 3 == 0
>>> df.where(m, -df).execute()
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> (df.where(m, -df) == mt.where(m, df, -df)).execute()
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> (df.where(m, -df) == df.mask(~m, -df)).execute()
A B
0 True True
1 True True
2 True True
3 True True
4 True True
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.DataFrame.xs.md
# maxframe.dataframe.DataFrame.xs
#### DataFrame.xs(key, axis=0, level=None, drop_level=True)
Return cross-section from the Series/DataFrame.
This method takes a key argument to select data at a particular
level of a MultiIndex.
* **Parameters:**
* **key** (*label* *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *label*) – Label contained in the index, or partially in a MultiIndex.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Axis to retrieve cross-section on.
* **level** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *defaults to first n levels* *(**n=1* *or* *len* *(**key* *)* *)*) – In case of a key partially contained in a MultiIndex, indicate
which levels are used. Levels can be referred by label or position.
* **drop_level** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If False, returns object with same levels as self.
* **Returns:**
Cross-section from the original Series or DataFrame
corresponding to the selected index levels.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Access a group of rows and columns by label(s) or a boolean array.
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Purely integer-location based indexing for selection by position.
### Notes
xs cannot be used to set values.
MultiIndex Slicers is a generic way to get/set values on
any level or levels.
It is a superset of xs functionality, see
[MultiIndex Slicers](https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-mi-slicers).
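For intuition, the default behaviour (matching `key` against the leading index levels and dropping the matched levels) can be sketched in plain Python. The helper `xs` below is hypothetical; it models a frame as a dict from index tuples to row values and does not cover the `level` or `axis` arguments.

```python
def xs(frame, key, drop_level=True):
    """Select rows whose leading index levels equal key.

    Hypothetical helper: frame maps index tuples to row values and key is
    a label or a tuple of labels matched against the first levels.
    """
    key = key if isinstance(key, tuple) else (key,)
    n = len(key)
    return {
        (index[n:] if drop_level else index): row
        for index, row in frame.items()
        if index[:n] == key
    }


frame = {
    ("mammal", "cat", "walks"): (4, 0),
    ("mammal", "dog", "walks"): (4, 0),
    ("mammal", "bat", "flies"): (2, 2),
    ("bird", "penguin", "walks"): (2, 2),
}
mammals = xs(frame, "mammal")
dog = xs(frame, ("mammal", "dog"))
```

With `drop_level=False` the matched levels stay in the result keys.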
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> d = {'num_legs': [4, 4, 2, 2],
... 'num_wings': [0, 0, 2, 2],
... 'class': ['mammal', 'mammal', 'mammal', 'bird'],
... 'animal': ['cat', 'dog', 'bat', 'penguin'],
... 'locomotion': ['walks', 'walks', 'flies', 'walks']}
>>> df = md.DataFrame(data=d)
>>> df = df.set_index(['class', 'animal', 'locomotion'])
>>> df.execute()
num_legs num_wings
class animal locomotion
mammal cat walks 4 0
dog walks 4 0
bat flies 2 2
bird penguin walks 2 2
```
Get values at specified index
```pycon
>>> df.xs('mammal').execute()
num_legs num_wings
animal locomotion
cat walks 4 0
dog walks 4 0
bat flies 2 2
```
Get values at several indexes
```pycon
>>> df.xs(('mammal', 'dog')).execute()
num_legs num_wings
locomotion
walks 4 0
```
Get values at specified index and level
```pycon
>>> df.xs('cat', level=1).execute()
num_legs num_wings
class locomotion
mammal walks 4 0
```
Get values at several indexes and levels
```pycon
>>> df.xs(('bird', 'walks'),
... level=[0, 'locomotion']).execute()
num_legs num_wings
animal
penguin 2 2
```
Get values at specified column and axis
```pycon
>>> df.xs('num_wings', axis=1).execute()
class animal locomotion
mammal cat walks 0
dog walks 0
bat flies 2
bird penguin walks 2
Name: num_wings, dtype: int64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.all.md
# maxframe.dataframe.Index.all
#### Index.all()
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.any.md
# maxframe.dataframe.Index.any
#### Index.any()
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.argmax.md
# maxframe.dataframe.Index.argmax
#### Index.argmax(axis=0, skipna=True, \*args, \*\*kwargs)
Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
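The tie-breaking rule can be illustrated with a short plain-Python sketch. The helper `argmax_first` is hypothetical; it always skips NaN entries, as the default `skipna=True` does, and returns `-1` for an input with no usable values (a convention of this sketch only).

```python
import math


def argmax_first(values):
    """Position of the largest value; ties resolve to the first position."""
    best_pos = -1
    for pos, value in enumerate(values):
        if isinstance(value, float) and math.isnan(value):
            continue  # skip missing values
        # Strict '>' keeps the earliest position when values are equal.
        if best_pos == -1 or value > values[best_pos]:
            best_pos = pos
    return best_pos


calories = [100.0, 110.0, 120.0, 110.0]
top = argmax_first(calories)
tie = argmax_first([1.0, 3.0, 3.0])
with_gap = argmax_first([float("nan"), 2.0, 1.0])
```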
* **Parameters:**
* **axis** ( *{None}*) – Unused. Parameter needed for compatibility with DataFrame.
* **skipna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Exclude NA/null values when showing the result.
* **\*args** – Additional arguments and keywords for compatibility with NumPy.
* **\*\*kwargs** – Additional arguments and keywords for compatibility with NumPy.
* **Returns:**
Row position of the maximum value.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`Series.argmin`](maxframe.dataframe.Series.argmin.md#maxframe.dataframe.Series.argmin)
: Return position of the minimum value.
[`Series.argmax`](maxframe.dataframe.Series.argmax.md#maxframe.dataframe.Series.argmax)
: Return position of the maximum value.
[`maxframe.tensor.argmax`](../../tensor/generated/maxframe.tensor.argmax.md#maxframe.tensor.argmax)
: Equivalent method for tensors.
[`Series.idxmax`](maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax)
: Return index label of the maximum values.
[`Series.idxmin`](maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin)
: Return index label of the minimum values.
### Examples
Consider a dataset containing cereal calories.
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s.execute()
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
```
```pycon
>>> s.argmax().execute()
2
>>> s.argmin().execute()
0
```
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.argmin.md
# maxframe.dataframe.Index.argmin
#### Index.argmin(axis=0, skipna=True, \*args, \*\*kwargs)
Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
* **Parameters:**
* **axis** ( *{None}*) – Unused. Parameter needed for compatibility with DataFrame.
* **skipna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Exclude NA/null values when showing the result.
* **\*args** – Additional arguments and keywords for compatibility with NumPy.
* **\*\*kwargs** – Additional arguments and keywords for compatibility with NumPy.
* **Returns:**
Row position of the minimum value.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`Series.argmin`](maxframe.dataframe.Series.argmin.md#maxframe.dataframe.Series.argmin)
: Return position of the minimum value.
[`Series.argmax`](maxframe.dataframe.Series.argmax.md#maxframe.dataframe.Series.argmax)
: Return position of the maximum value.
[`maxframe.tensor.argmin`](../../tensor/generated/maxframe.tensor.argmin.md#maxframe.tensor.argmin)
: Equivalent method for tensors.
[`Series.idxmax`](maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax)
: Return index label of the maximum values.
[`Series.idxmin`](maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin)
: Return index label of the minimum values.
### Examples
Consider a dataset containing cereal calories.
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s.execute()
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
```
```pycon
>>> s.argmax().execute()
2
>>> s.argmin().execute()
0
```
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.argsort.md
# maxframe.dataframe.Index.argsort
#### Index.argsort(\*args, \*\*kwargs)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.astype.md
# maxframe.dataframe.Index.astype
#### Index.astype(dtype, copy=True)
Create an Index with values cast to dtypes.
The class of a new Index is determined by dtype. When conversion is
impossible, a ValueError exception is raised.
* **Parameters:**
* **dtype** (*numpy dtype* *or* *pandas type*) – Note that any signed integer dtype is treated as `'int64'`,
and any unsigned integer dtype is treated as `'uint64'`,
regardless of the size.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – By default, astype always returns a newly allocated object.
If copy is set to False and internal requirements on dtype are
satisfied, the original data is used to create a new Index
or the original Index is returned.
* **Returns:**
Index with values cast to specified dtype.
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.drop.md
# maxframe.dataframe.Index.drop
#### Index.drop(labels, errors='raise')
Make new Index with passed list of labels deleted.
* **Parameters:**
* **labels** (*array-like*)
* **errors** ( *{'ignore'* *,* *'raise'}* *,* *default 'raise'*) – Note that this argument is kept only for compatibility, and errors
will not raise even if `errors=='raise'`.
* **Returns:**
**dropped**
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If not all of the labels are found in the selected axis
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.drop_duplicates.md
# maxframe.dataframe.Index.drop_duplicates
#### Index.drop_duplicates(keep='first', method='auto')
Return Index with duplicate values removed.
* **Parameters:**
**keep** ({‘first’, ‘last’, ‘any’, `False`}, default ‘first’) –
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- ‘any’ : Drop duplicates except for a random occurrence.
- `False` : Drop all duplicates.
* **Returns:**
**deduplicated**
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
#### SEE ALSO
[`Series.drop_duplicates`](maxframe.dataframe.Series.drop_duplicates.md#maxframe.dataframe.Series.drop_duplicates)
: Equivalent method on Series.
[`DataFrame.drop_duplicates`](maxframe.dataframe.DataFrame.drop_duplicates.md#maxframe.dataframe.DataFrame.drop_duplicates)
: Equivalent method on DataFrame.
`Index.duplicated`
: Related method on Index, indicating duplicate Index values.
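The deterministic `keep` options can be modelled on a plain list as follows. The helper `drop_duplicates` is a hypothetical sketch, not MaxFrame's implementation, and it leaves out `keep='any'` because that option does not guarantee which occurrence survives.

```python
from collections import Counter


def drop_duplicates(labels, keep="first"):
    """Remove duplicate labels from a list according to keep."""
    if keep == "first":
        seen, result = set(), []
        for label in labels:
            if label not in seen:
                seen.add(label)
                result.append(label)
        return result
    if keep == "last":
        # Keeping the last occurrence equals keeping the first one
        # when scanning from the end.
        return drop_duplicates(labels[::-1], keep="first")[::-1]
    if keep is False:
        counts = Counter(labels)
        return [label for label in labels if counts[label] == 1]
    raise ValueError("keep must be 'first', 'last' or False in this sketch")


idx = ["lame", "cow", "lame", "beetle", "lame", "hippo"]
first = drop_duplicates(idx, keep="first")
last = drop_duplicates(idx, keep="last")
unique_only = drop_duplicates(idx, keep=False)
```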
### Examples
Generate an Index with duplicate values.
```pycon
>>> import maxframe.dataframe as md
```
```pycon
>>> idx = md.Index(['lame', 'cow', 'lame', 'beetle', 'lame', 'hippo'])
```
The keep parameter controls which duplicate values are removed.
The value ‘first’ keeps the first occurrence for each
set of duplicated entries. The default value of keep is ‘first’.
```pycon
>>> idx.drop_duplicates(keep='first').execute()
Index(['lame', 'cow', 'beetle', 'hippo'], dtype='object')
```
The value ‘last’ keeps the last occurrence for each set of duplicated
entries.
```pycon
>>> idx.drop_duplicates(keep='last').execute()
Index(['cow', 'beetle', 'lame', 'hippo'], dtype='object')
```
The value `False` discards all sets of duplicated entries.
```pycon
>>> idx.drop_duplicates(keep=False).execute()
Index(['cow', 'beetle', 'hippo'], dtype='object')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.droplevel.md
# maxframe.dataframe.Index.droplevel
#### Index.droplevel(level)
Return index with requested level(s) removed.
If resulting index has only 1 level left, the result will be
of Index type, not MultiIndex. The original index is not modified inplace.
* **Parameters:**
**level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *, or* *list-like* *,* *default 0*) – If a string is given, must be the name of a level.
If list-like, elements must be names or indexes of levels.
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) or MultiIndex
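The rule that a single remaining level collapses to a flat index can be sketched in plain Python. The helper `droplevel` is hypothetical: it takes a list of tuples plus level names, accepts non-negative positions or names, and returns the remaining values together with the remaining name(s).

```python
def droplevel(tuples, names, level=0):
    """Drop one or more levels from a list of index tuples.

    Hypothetical helper: returns (values, names). When only one level is
    left, values is a flat list and names is a single name.
    """
    levels = level if isinstance(level, (list, tuple)) else [level]
    drop = {names.index(lv) if isinstance(lv, str) else lv for lv in levels}
    keep = [pos for pos in range(len(names)) if pos not in drop]
    if len(keep) == 1:
        only = keep[0]
        return [entry[only] for entry in tuples], names[only]
    return (
        [tuple(entry[pos] for pos in keep) for entry in tuples],
        [names[pos] for pos in keep],
    )


tuples = [(1, 3, 5), (2, 4, 6)]
names = ["x", "y", "z"]
two_levels = droplevel(tuples, names)            # drops level 0 ('x')
flat = droplevel(tuples, names, ["x", "y"])      # only 'z' remains
```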
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> mi = md.MultiIndex.from_arrays(
... [[1, 2], [3, 4], [5, 6]], names=['x', 'y', 'z'])
>>> mi.execute()
MultiIndex([(1, 3, 5),
(2, 4, 6)],
names=['x', 'y', 'z'])
```
```pycon
>>> mi.droplevel().execute()
MultiIndex([(3, 5),
(4, 6)],
names=['y', 'z'])
```
```pycon
>>> mi.droplevel(2).execute()
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
```
```pycon
>>> mi.droplevel('z').execute()
MultiIndex([(1, 3),
(2, 4)],
names=['x', 'y'])
```
```pycon
>>> mi.droplevel(['x', 'y']).execute()
Index([5, 6], dtype='int64', name='z')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.dropna.md
# maxframe.dataframe.Index.dropna
#### Index.dropna(how='any')
Return Index without NA/NaN values.
* **Parameters:**
**how** ( *{'any'* *,* *'all'}* *,* *default 'any'*) – If the Index is a MultiIndex, drop the value when any or all levels
are NaN.
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.factorize.md
# maxframe.dataframe.Index.factorize
#### Index.factorize(sort=False, use_na_sentinel=True)
Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. factorize
is available as both a top-level function [`pandas.factorize()`](https://pandas.pydata.org/docs/reference/api/pandas.factorize.html#pandas.factorize),
and as a method [`Series.factorize()`](maxframe.dataframe.Series.factorize.md#maxframe.dataframe.Series.factorize) and [`Index.factorize()`](#maxframe.dataframe.Index.factorize).
* **Parameters:**
* **values** (*sequence*) – A 1-D sequence. Sequences that aren’t pandas objects are
coerced to ndarrays before factorization.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort uniques and shuffle codes to maintain the
relationship.
* **use_na_sentinel** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, the sentinel -1 will be used for NaN values. If False,
NaN values will be encoded as non-negative integers and will not drop the
NaN from the uniques of the values.
* **Returns:**
* **codes** (*ndarray*) – An integer ndarray that’s an indexer into uniques.
`uniques.take(codes)` will have the same values as values.
* **uniques** (*ndarray, Index, or Categorical*) – The unique valid values. When values is Categorical, uniques
is a Categorical. When values is some other pandas object, an
Index is returned. Otherwise, a 1-D ndarray is returned.
#### NOTE
Even if there’s a missing value in values, uniques will
*not* contain an entry for it.
#### SEE ALSO
`cut`
: Discretize continuous-valued array.
`unique`
: Find the unique value in an array.
### Notes
Reference [the user guide](https://pandas.pydata.org/docs/user_guide/reshaping.html#reshaping-factorize) for more examples.
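As a rough model of the encoding, the plain-Python sketch below assigns codes in order of first appearance and optionally sorts the uniques. The helper `factorize` is hypothetical; it always marks missing values with the sentinel `-1`, as the default `use_na_sentinel=True` does, and does not model `use_na_sentinel=False`.

```python
import math


def _is_na(value):
    """True for None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def factorize(values, sort=False):
    """Return (codes, uniques) such that uniques[code] recovers each value."""
    uniques = []
    for value in values:
        if not _is_na(value) and value not in uniques:
            uniques.append(value)
    if sort:
        uniques.sort()
    codes = [-1 if _is_na(value) else uniques.index(value) for value in values]
    return codes, uniques


codes, uniques = factorize(["b", "b", "a", "c", "b"])
sorted_codes, sorted_uniques = factorize(["b", "b", "a", "c", "b"], sort=True)
na_codes, na_uniques = factorize(["b", None, "a", "c", "b"])
```

Sorting changes the codes as well as the uniques, so that `uniques[code]` still maps back to the original value.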
### Examples
These examples all show factorize as a top-level method like
`pd.factorize(values)`. The results are identical for methods like
[`Series.factorize()`](maxframe.dataframe.Series.factorize.md#maxframe.dataframe.Series.factorize).
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> codes, uniques = md.factorize(mt.array(['b', 'b', 'a', 'c', 'b'], dtype="O"))
>>> codes.execute()
array([0, 0, 1, 2, 0])
>>> uniques.execute()
array(['b', 'a', 'c'], dtype=object)
```
With `sort=True`, the uniques will be sorted, and codes will be
shuffled so that the relationship is maintained.
```pycon
>>> codes, uniques = md.factorize(mt.array(['b', 'b', 'a', 'c', 'b'], dtype="O"),
... sort=True)
>>> codes.execute()
array([1, 1, 0, 2, 1])
>>> uniques.execute()
array(['a', 'b', 'c'], dtype=object)
```
When `use_na_sentinel=True` (the default), missing values are indicated in
the codes with the sentinel value `-1` and missing values are not
included in uniques.
```pycon
>>> codes, uniques = md.factorize(mt.array(['b', None, 'a', 'c', 'b'], dtype="O"))
>>> codes.execute()
array([ 0, -1, 1, 2, 0])
>>> uniques.execute()
array(['b', 'a', 'c'], dtype=object)
```
Thus far, we’ve only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of uniques
will differ. For Categoricals, a Categorical is returned.
```pycon
>>> cat = md.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = md.factorize(cat)
>>> codes.execute()
array([0, 0, 1])
>>> uniques.execute()
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
```
Notice that `'b'` is in `uniques.categories`, despite not being
present in `cat.values`.
For all other pandas objects, an Index of the appropriate type is
returned.
```pycon
>>> cat = md.Series(['a', 'a', 'c'])
>>> codes, uniques = md.factorize(cat)
>>> codes.execute()
array([0, 0, 1])
>>> uniques.execute()
Index(['a', 'c'], dtype='object')
```
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting `use_na_sentinel=False`.
```pycon
>>> values = mt.array([1, 2, 1, mt.nan])
>>> codes, uniques = md.factorize(values) # default: use_na_sentinel=True
>>> codes.execute()
array([ 0, 1, 0, -1])
>>> uniques.execute()
array([1., 2.])
```
```pycon
>>> codes, uniques = md.factorize(values, use_na_sentinel=False)
>>> codes.execute()
array([0, 1, 0, 2])
>>> uniques.execute()
array([ 1., 2., nan])
```
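The encoding rules shown in the examples above can be sketched in plain Python. This is only an illustration of the semantics (first-appearance order, optional sorting, and the `-1` sentinel), not MaxFrame's implementation; the names `factorize_sketch` and `is_missing` are hypothetical, and treating only `None` and float NaN as missing is an assumption of the sketch.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def factorize_sketch(values, sort=False, use_na_sentinel=True):
    """Return (codes, uniques) so that uniques[code] recovers each valid value."""
    uniques = []
    positions = {}  # value -> position in uniques, in order of first appearance
    has_missing = False
    for v in values:
        if is_missing(v):
            has_missing = True
        elif v not in positions:
            positions[v] = len(uniques)
            uniques.append(v)
    if sort:
        # Sort the uniques and remap positions so codes still point at the same values.
        uniques = sorted(uniques)
        positions = {v: i for i, v in enumerate(uniques)}
    na_code = -1
    if has_missing and not use_na_sentinel:
        # Without the sentinel, NaN gets its own non-negative code and stays in uniques.
        na_code = len(uniques)
        uniques = uniques + [math.nan]
    codes = [na_code if is_missing(v) else positions[v] for v in values]
    return codes, uniques


codes, uniques = factorize_sketch(['b', None, 'a', 'c', 'b'])
print(codes, uniques)  # [0, -1, 1, 2, 0] ['b', 'a', 'c']
```

The sketch returns plain lists; the real function returns the array, Index, or Categorical types described under **Returns**.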
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.fillna.md
# maxframe.dataframe.Index.fillna
#### Index.fillna(value=None, downcast=None)
Fill NA/NaN values with the specified value.
* **Parameters:**
* **value** (*scalar*) – Scalar value to use to fill holes (e.g. 0).
This value cannot be list-like.
* **downcast** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default is None*) – A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
#### SEE ALSO
[`DataFrame.fillna`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna)
: Fill NaN values of a DataFrame.
[`Series.fillna`](maxframe.dataframe.Series.fillna.md#maxframe.dataframe.Series.fillna)
: Fill NaN Values of a Series.
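As a rough illustration of the behaviour described above, the following plain-Python sketch replaces missing labels with a scalar and rejects list-like fill values. It is not MaxFrame code: `fillna_sketch` is a hypothetical helper, and raising `TypeError` for list-like input is an assumption modelled on pandas.

```python
import math


def fillna_sketch(labels, value):
    """Return a new list in which None/NaN labels are replaced by a scalar value."""
    if isinstance(value, (list, tuple, set, dict)):
        # Assumption: list-like fill values are rejected, as the parameter note says.
        raise TypeError("value must be a scalar, not list-like")

    def is_missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    return [value if is_missing(x) else x for x in labels]


print(fillna_sketch([1.0, None, float('nan'), 4.0], 0))  # [1.0, 0, 0, 4.0]
```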
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.get_level_values.md
# maxframe.dataframe.Index.get_level_values
#### Index.get_level_values(level)
Return vector of label values for requested level.
Length of returned vector is equal to the length of the index.
* **Parameters:**
**level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – `level` is either the integer position of the level in the
MultiIndex, or the name of the level.
* **Returns:**
**values** – Values is a level of this MultiIndex converted to
a single [`Index`](maxframe.dataframe.Index.md#maxframe.dataframe.Index) (or subclass thereof).
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
### Examples
Create a MultiIndex:
```pycon
>>> import maxframe.dataframe as md
>>> import pandas as pd
>>> mi = md.Index(pd.MultiIndex.from_arrays((list('abc'), list('def')), names=['level_1', 'level_2']))
```
Get level values by supplying level as either integer or name:
```pycon
>>> mi.get_level_values(0).execute()
Index(['a', 'b', 'c'], dtype='object', name='level_1')
>>> mi.get_level_values('level_2').execute()
Index(['d', 'e', 'f'], dtype='object', name='level_2')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.has_duplicates.md
# maxframe.dataframe.Index.has_duplicates
#### *property* Index.has_duplicates
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.hasnans.md
# maxframe.dataframe.Index.hasnans
#### *property* Index.hasnans
Return True if there are any NaNs.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.Index([1, 2, 3, None])
>>> idx.execute()
Index([1.0, 2.0, 3.0, nan], dtype='float64')
>>> idx.hasnans.execute()
True
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.insert.md
# maxframe.dataframe.Index.insert
#### Index.insert(loc, value)
Make new Index inserting new item at location.
Follows Python numpy.insert semantics for negative values.
* **Parameters:**
* **loc** ([*int*](https://docs.python.org/3/library/functions.html#int))
* **value** ([*object*](https://docs.python.org/3/library/functions.html#object))
* **Returns:**
**new_index**
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
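A small plain-Python sketch may help picture how `loc` is interpreted. It assumes that a negative `loc` counts from the end, so `-1` inserts before the last label, which is how pandas behaves; whether MaxFrame matches this in every edge case is an assumption here, and `insert_sketch` is a hypothetical helper rather than part of the API.

```python
def insert_sketch(labels, loc, value):
    """Return a new list with value inserted at position loc; the input is left unchanged."""
    new_labels = list(labels)
    new_labels.insert(loc, value)
    return new_labels


labels = ['a', 'b', 'c']
print(insert_sketch(labels, 1, 'x'))   # ['a', 'x', 'b', 'c']
print(insert_sketch(labels, -1, 'x'))  # ['a', 'b', 'x', 'c']
```

As in the real method, a new object is returned and the original labels are not modified.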
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.is_monotonic_decreasing.md
# maxframe.dataframe.Index.is_monotonic_decreasing
#### *property* Index.is_monotonic_decreasing
Return boolean scalar if values in the object are
monotonic_decreasing.
* **Return type:**
Scalar
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.is_monotonic_increasing.md
# maxframe.dataframe.Index.is_monotonic_increasing
#### *property* Index.is_monotonic_increasing
Return boolean scalar if values in the object are
monotonic_increasing.
* **Return type:**
Scalar
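The check behind this property and its counterpart `is_monotonic_decreasing` can be pictured with the plain-Python sketch below. It assumes the comparison is non-strict, so equal neighbours still count as monotonic, as in pandas; the function names are hypothetical and this is not MaxFrame's implementation.

```python
def is_monotonic_increasing_sketch(values):
    """True if every element is greater than or equal to the one before it."""
    values = list(values)
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing_sketch(values):
    """True if every element is less than or equal to the one before it."""
    values = list(values)
    return all(a >= b for a, b in zip(values, values[1:]))


print(is_monotonic_increasing_sketch([1, 2, 2, 3]))  # True
print(is_monotonic_decreasing_sketch([1, 2, 2, 3]))  # False
```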
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.is_unique.md
# maxframe.dataframe.Index.is_unique
#### *property* Index.is_unique
Return boolean if values in the index are unique.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> index = md.Index([1, 2, 3])
>>> index.is_unique.execute()
True
```
```pycon
>>> index = md.Index([1, 2, 3, 1])
>>> index.is_unique.execute()
False
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.isna.md
# maxframe.dataframe.Index.isna
#### Index.isna()
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or `numpy.NaN`, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.isnull`](maxframe.dataframe.DataFrame.isnull.md#maxframe.dataframe.DataFrame.isnull)
: Alias of isna.
[`DataFrame.notna`](maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)
: Boolean inverse of isna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`isna`](maxframe.dataframe.isna.md#maxframe.dataframe.isna)
: Top-level isna.
### Examples
Show which entries in a DataFrame are NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.NaN],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.isna().execute()
age born name toy
0 False True False True
1 False False False False
2 True False False False
```
Show which entries in a Series are NA.
```pycon
>>> ser = md.Series([5, 6, np.NaN])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.isna().execute()
0 False
1 False
2 True
dtype: bool
```
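The rule for what counts as NA can also be written out in plain Python. The sketch below is only an illustration under the default setting described above (empty strings and infinity are not NA); `isna_sketch` is a hypothetical helper and covers just `None` and float NaN, not datetime `NaT`.

```python
import math


def isna_sketch(values):
    """Return a same-sized list of booleans, True where the value is None or NaN."""
    return [v is None or (isinstance(v, float) and math.isnan(v)) for v in values]


print(isna_sketch([5, None, float('nan'), '', float('inf')]))
# [False, True, True, False, False]
```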
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.max.md
# maxframe.dataframe.Index.max
#### Index.max(axis=None, skipna=True)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.md
# maxframe.dataframe.Index
### *class* maxframe.dataframe.Index(data, \*\*\_)
#### \_\_init_\_(data=None, dtype=None, copy=False, name=None, tupleize_cols=True, chunk_size=None, gpu=None, sparse=None, names=None, num_partitions=None, store_data=False)
### Methods
| [`__init__`](#maxframe.dataframe.Index.__init__)([data, dtype, copy, name, ...]) | |
|---------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|
| `agg`([func, axis]) | Aggregate using one or more operations over the specified axis. |
| `aggregate`([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`all`](maxframe.dataframe.Index.all.md#maxframe.dataframe.Index.all)() | |
| [`any`](maxframe.dataframe.Index.any.md#maxframe.dataframe.Index.any)() | |
| [`argmax`](maxframe.dataframe.Index.argmax.md#maxframe.dataframe.Index.argmax)([axis, skipna]) | Return int position of the largest value in the Series. |
| [`argmin`](maxframe.dataframe.Index.argmin.md#maxframe.dataframe.Index.argmin)([axis, skipna]) | Return int position of the smallest value in the Series. |
| [`argsort`](maxframe.dataframe.Index.argsort.md#maxframe.dataframe.Index.argsort)(\*args, \*\*kwargs) | |
| [`astype`](maxframe.dataframe.Index.astype.md#maxframe.dataframe.Index.astype)(dtype[, copy]) | Create an Index with values cast to dtypes. |
| `check_monotonic`([decreasing, strict]) | Check if values in the object are monotonic increasing or decreasing. |
| `clip`([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| `copy`() | |
| `copy_from`(obj) | |
| `copy_to`(target) | |
| [`drop`](maxframe.dataframe.Index.drop.md#maxframe.dataframe.Index.drop)(labels[, errors]) | Make new Index with passed list of labels deleted. |
| [`drop_duplicates`](maxframe.dataframe.Index.drop_duplicates.md#maxframe.dataframe.Index.drop_duplicates)([keep, method]) | Return Index with duplicate values removed. |
| [`droplevel`](maxframe.dataframe.Index.droplevel.md#maxframe.dataframe.Index.droplevel)(level) | Return index with requested level(s) removed. |
| [`dropna`](maxframe.dataframe.Index.dropna.md#maxframe.dataframe.Index.dropna)([how]) | Return Index without NA/NaN values. |
| `duplicated`([keep]) | Indicate duplicate index values. |
| `execute`([session]) | |
| [`factorize`](maxframe.dataframe.Index.factorize.md#maxframe.dataframe.Index.factorize)([sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| [`fillna`](maxframe.dataframe.Index.fillna.md#maxframe.dataframe.Index.fillna)([value, downcast]) | Fill NA/NaN values with the specified value. |
| [`get_level_values`](maxframe.dataframe.Index.get_level_values.md#maxframe.dataframe.Index.get_level_values)(level) | Return vector of label values for requested level. |
| [`insert`](maxframe.dataframe.Index.insert.md#maxframe.dataframe.Index.insert)(loc, value) | Make new Index inserting new item at location. |
| [`isna`](maxframe.dataframe.Index.isna.md#maxframe.dataframe.Index.isna)() | Detect missing values. |
| `isnull`() | Detect missing values. |
| `map`(mapper[, na_action, dtype, ...]) | Map values using input correspondence (a dict, Series, or function). |
| [`max`](maxframe.dataframe.Index.max.md#maxframe.dataframe.Index.max)([axis, skipna]) | |
| `memory_usage`([deep]) | Memory usage of the values. |
| [`min`](maxframe.dataframe.Index.min.md#maxframe.dataframe.Index.min)([axis, skipna]) | |
| [`notna`](maxframe.dataframe.Index.notna.md#maxframe.dataframe.Index.notna)() | Detect existing (non-missing) values. |
| `notnull`() | Detect existing (non-missing) values. |
| `rechunk`(chunk_size[, reassign_worker]) | |
| [`rename`](maxframe.dataframe.Index.rename.md#maxframe.dataframe.Index.rename)(name[, inplace]) | Alter Index or MultiIndex name. |
| [`repeat`](maxframe.dataframe.Index.repeat.md#maxframe.dataframe.Index.repeat)(repeats[, axis]) | Repeat elements of an Index. |
| [`set_names`](maxframe.dataframe.Index.set_names.md#maxframe.dataframe.Index.set_names)(names[, level, inplace]) | Set Index or MultiIndex name. |
| [`to_frame`](maxframe.dataframe.Index.to_frame.md#maxframe.dataframe.Index.to_frame)([index, name]) | Create a DataFrame with a column containing the Index. |
| `to_pandas`([session]) | |
| [`to_series`](maxframe.dataframe.Index.to_series.md#maxframe.dataframe.Index.to_series)([index, name]) | Create a Series with both index and values equal to the index keys. |
| `value_counts`([normalize, sort, ascending, ...]) | Return a Series containing counts of unique values. |
### Attributes
| `T` | Return the transpose, which is by definition self. |
|-----------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| `data` | |
| [`has_duplicates`](maxframe.dataframe.Index.has_duplicates.md#maxframe.dataframe.Index.has_duplicates) | |
| [`hasnans`](maxframe.dataframe.Index.hasnans.md#maxframe.dataframe.Index.hasnans) | Return True if there are any NaNs. |
| `is_monotonic` | Return boolean scalar if values in the object are monotonic_increasing. |
| [`is_monotonic_decreasing`](maxframe.dataframe.Index.is_monotonic_decreasing.md#maxframe.dataframe.Index.is_monotonic_decreasing) | Return boolean scalar if values in the object are monotonic_decreasing. |
| [`is_monotonic_increasing`](maxframe.dataframe.Index.is_monotonic_increasing.md#maxframe.dataframe.Index.is_monotonic_increasing) | Return boolean scalar if values in the object are monotonic_increasing. |
| [`is_unique`](maxframe.dataframe.Index.is_unique.md#maxframe.dataframe.Index.is_unique) | Return boolean if values in the index are unique. |
| [`name`](maxframe.dataframe.Index.name.md#maxframe.dataframe.Index.name) | |
| [`names`](maxframe.dataframe.Index.names.md#maxframe.dataframe.Index.names) | |
| [`ndim`](maxframe.dataframe.Index.ndim.md#maxframe.dataframe.Index.ndim) | |
| `shape` | |
| [`size`](maxframe.dataframe.Index.size.md#maxframe.dataframe.Index.size) | |
| `type_name` | |
| `values` | |
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.min.md
# maxframe.dataframe.Index.min
#### Index.min(axis=None, skipna=True)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.name.md
# maxframe.dataframe.Index.name
#### *property* Index.name
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.names.md
# maxframe.dataframe.Index.names
#### *property* Index.names
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.ndim.md
# maxframe.dataframe.Index.ndim
#### *property* Index.ndim
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.notna.md
# maxframe.dataframe.Index.notna
#### Index.notna()
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
NA values, such as None or `numpy.NaN`, get mapped to False
values.
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.notnull`](maxframe.dataframe.DataFrame.notnull.md#maxframe.dataframe.DataFrame.notnull)
: Alias of notna.
[`DataFrame.isna`](maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)
: Boolean inverse of notna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`notna`](maxframe.dataframe.notna.md#maxframe.dataframe.notna)
: Top-level notna.
### Examples
Show which entries in a DataFrame are not NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.NaN],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.notna().execute()
age born name toy
0 True False True False
1 True True True True
2 False True True True
```
Show which entries in a Series are not NA.
```pycon
>>> ser = md.Series([5, 6, np.NaN])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.notna().execute()
0 True
1 True
2 False
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.rename.md
# maxframe.dataframe.Index.rename
#### Index.rename(name, inplace=False)
Alter Index or MultiIndex name.
Able to set new names without level. Defaults to returning new index.
Length of names must match number of levels in MultiIndex.
* **Parameters:**
* **name** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *labels*) – Name(s) to set.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Modifies the object directly, instead of creating a new Index or
MultiIndex.
* **Returns:**
The same type as the caller or None if inplace is True.
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
#### SEE ALSO
[`Index.set_names`](maxframe.dataframe.Index.set_names.md#maxframe.dataframe.Index.set_names)
: Able to set new names partially and by level.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.Index(['A', 'C', 'A', 'B'], name='score')
>>> idx.rename('grade').execute()
Index(['A', 'C', 'A', 'B'], dtype='object', name='grade')
```
```pycon
>>> idx = md.Index([('python', 2018),
... ('python', 2019),
... ('cobra', 2018),
... ('cobra', 2019)],
... names=['kind', 'year'])
>>> idx.execute()
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['kind', 'year'])
>>> idx.rename(['species', 'year']).execute()
MultiIndex([('python', 2018),
('python', 2019),
( 'cobra', 2018),
( 'cobra', 2019)],
names=['species', 'year'])
>>> idx.rename('species').execute()
Traceback (most recent call last):
TypeError: Must pass list-like as `names`.
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.repeat.md
# maxframe.dataframe.Index.repeat
#### Index.repeat(repeats, axis=None)
Repeat elements of an Index.
Returns a new Index where each element of the current Index
is repeated consecutively a given number of times.
* **Parameters:**
* **repeats** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array* *of* *ints*) – The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Index.
* **axis** (*None*) – Must be `None`. Has no effect but is accepted for compatibility
with numpy.
* **Returns:**
**repeated_index** – Newly created Index with repeated elements.
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
#### SEE ALSO
[`Series.repeat`](maxframe.dataframe.Series.repeat.md#maxframe.dataframe.Series.repeat)
: Equivalent function for Series.
[`numpy.repeat`](https://numpy.org/doc/stable/reference/generated/numpy.repeat.html#numpy.repeat)
: Similar method for [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray).
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.Index(['a', 'b', 'c'])
>>> idx.execute()
Index(['a', 'b', 'c'], dtype='object')
>>> idx.repeat(2).execute()
Index(['a', 'a', 'b', 'b', 'c', 'c'], dtype='object')
>>> idx.repeat([1, 2, 3]).execute()
Index(['a', 'b', 'b', 'c', 'c', 'c'], dtype='object')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.set_names.md
# maxframe.dataframe.Index.set_names
#### Index.set_names(names, level=None, inplace=False)
Set Index or MultiIndex name.
Able to set new names partially and by level.
* **Parameters:**
* **names** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *label*) – Name(s) to set.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* *label* *,* *optional*) – If the index is a MultiIndex, level(s) to set (None for all
levels). Otherwise level must be None.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Modifies the object directly, instead of creating a new Index or
MultiIndex.
* **Returns:**
The same type as the caller or None if inplace is True.
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
#### SEE ALSO
[`Index.rename`](maxframe.dataframe.Index.rename.md#maxframe.dataframe.Index.rename)
: Able to set new names without level.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.Index([1, 2, 3, 4])
>>> idx.execute()
Int64Index([1, 2, 3, 4], dtype='int64')
>>> idx.set_names('quarter').execute()
Int64Index([1, 2, 3, 4], dtype='int64', name='quarter')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.size.md
# maxframe.dataframe.Index.size
#### *property* Index.size
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.to_frame.md
# maxframe.dataframe.Index.to_frame
#### Index.to_frame(index: [bool](https://docs.python.org/3/library/functions.html#bool) = True, name=None)
Create a DataFrame with a column containing the Index.
* **Parameters:**
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Set the index of the returned DataFrame as the original Index.
* **name** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *default None*) – The passed name should substitute for the index name (if it has
one).
* **Returns:**
DataFrame containing the original Index data.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`Index.to_series`](maxframe.dataframe.Index.to_series.md#maxframe.dataframe.Index.to_series)
: Convert an Index to a Series.
[`Series.to_frame`](maxframe.dataframe.Series.to_frame.md#maxframe.dataframe.Series.to_frame)
: Convert Series to DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.Index(['Ant', 'Bear', 'Cow'], name='animal')
>>> idx.to_frame().execute()
animal
animal
Ant Ant
Bear Bear
Cow Cow
```
By default, the original Index is reused. To enforce a new Index:
```pycon
>>> idx.to_frame(index=False).execute()
animal
0 Ant
1 Bear
2 Cow
```
To override the name of the resulting column, specify name:
```pycon
>>> idx.to_frame(index=False, name='zoo').execute()
zoo
0 Ant
1 Bear
2 Cow
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Index.to_series.md
# maxframe.dataframe.Index.to_series
#### Index.to_series(index=None, name=None)
Create a Series with both index and values equal to the index keys.
Useful with map for returning an indexer based on an index.
* **Parameters:**
* **index** ([*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index) *,* *optional*) – Index of resulting Series. If None, defaults to original index.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Name of resulting Series. If None, defaults to name of original
index.
* **Returns:**
The dtype will be based on the type of the Index values.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.T.md
# maxframe.dataframe.Series.T
#### *property* Series.T
Return the transpose, which is by definition self.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.abs.md
# maxframe.dataframe.Series.abs
#### Series.abs()
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.add.md
# maxframe.dataframe.Series.add
#### Series.add(other, level=None, fill_value=None, axis=0)
Return Addition of series and other, element-wise (binary operator add).
Equivalent to `series + other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.radd`](maxframe.dataframe.Series.radd.md#maxframe.dataframe.Series.radd)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.add(b, fill_value=0).execute()
a 2.0
b 1.0
c 1.0
d 1.0
e NaN
dtype: float64
```
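The role of `fill_value` during alignment can be traced with a plain-Python sketch that models each Series as a dict from label to value. It is not MaxFrame code: `add_sketch` is a hypothetical helper, and sorting the union of labels is an assumption about how the aligned result is ordered.

```python
import math


def add_sketch(a, b, fill_value=None):
    """Add two label->value dicts; a label absent from one side counts as missing."""

    def missing(x):
        return x is None or (isinstance(x, float) and math.isnan(x))

    result = {}
    for label in sorted(set(a) | set(b)):
        left, right = a.get(label), b.get(label)
        if missing(left) and missing(right):
            # Missing on both sides stays missing, even when fill_value is given.
            result[label] = math.nan
            continue
        if fill_value is not None:
            left = fill_value if missing(left) else left
            right = fill_value if missing(right) else right
        if missing(left) or missing(right):
            result[label] = math.nan
        else:
            result[label] = left + right
    return result


a_demo = {'a': 1.0, 'b': 1.0, 'c': 1.0, 'd': math.nan}
b_demo = {'a': 1.0, 'b': math.nan, 'd': 1.0, 'e': math.nan}
print(add_sketch(a_demo, b_demo, fill_value=0))
# {'a': 2.0, 'b': 1.0, 'c': 1.0, 'd': 1.0, 'e': nan}
```

Label `'e'` stays NaN because it is missing in both inputs, which matches the example output above.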
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.add_prefix.md
# maxframe.dataframe.Series.add_prefix
#### Series.add_prefix(prefix)
Prefix labels with string prefix.
For Series, the row labels are prefixed.
For DataFrame, the column labels are prefixed.
* **Parameters:**
**prefix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The string to add before each label.
* **Returns:**
New Series or DataFrame with updated labels.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`Series.add_suffix`](maxframe.dataframe.Series.add_suffix.md#maxframe.dataframe.Series.add_suffix)
: Suffix row labels with string suffix.
[`DataFrame.add_suffix`](maxframe.dataframe.DataFrame.add_suffix.md#maxframe.dataframe.DataFrame.add_suffix)
: Suffix column labels with string suffix.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4])
>>> s.execute()
0 1
1 2
2 3
3 4
dtype: int64
```
```pycon
>>> s.add_prefix('item_').execute()
item_0 1
item_1 2
item_2 3
item_3 4
dtype: int64
```
```pycon
>>> df = md.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df.execute()
A B
0 1 3
1 2 4
2 3 5
3 4 6
```
```pycon
>>> df.add_prefix('col_').execute()
col_A col_B
0 1 3
1 2 4
2 3 5
3 4 6
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.add_suffix.md
# maxframe.dataframe.Series.add_suffix
#### Series.add_suffix(suffix)
Suffix labels with string suffix.
For Series, the row labels are suffixed.
For DataFrame, the column labels are suffixed.
* **Parameters:**
**suffix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The string to add after each label.
* **Returns:**
New Series or DataFrame with updated labels.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`Series.add_prefix`](maxframe.dataframe.Series.add_prefix.md#maxframe.dataframe.Series.add_prefix)
: Prefix row labels with string prefix.
[`DataFrame.add_prefix`](maxframe.dataframe.DataFrame.add_prefix.md#maxframe.dataframe.DataFrame.add_prefix)
: Prefix column labels with string prefix.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4])
>>> s.execute()
0 1
1 2
2 3
3 4
dtype: int64
```
```pycon
>>> s.add_suffix('_item').execute()
0_item 1
1_item 2
2_item 3
3_item 4
dtype: int64
```
```pycon
>>> df = md.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
>>> df.execute()
A B
0 1 3
1 2 4
2 3 5
3 4 6
```
```pycon
>>> df.add_suffix('_col').execute()
A_col B_col
0 1 3
1 2 4
2 3 5
3 4 6
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.agg.md
# maxframe.dataframe.Series.agg
#### Series.agg(func=None, axis=0, \*\*kw)
Aggregate using one or more operations over the specified axis.
* **Parameters:**
* **df** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Object to aggregate.
* **func** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Function to use for aggregating the data.
* **axis** ( *{0* *or* *‘index’* *,* *1* *or* *‘columns’}* *,* *default 0*) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
* **kw** – Keyword arguments to pass to func.
* **Returns:**
The return can be:
* scalar : when Series.agg is called with single function
* Series : when DataFrame.agg is called with a single function
* DataFrame : when DataFrame.agg is called with several functions
* **Return type:**
scalar, [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9],
... [np.nan, np.nan, np.nan]],
... columns=['A', 'B', 'C']).execute()
```
Aggregate these functions over the rows.
```pycon
>>> df.agg(['sum', 'min']).execute()
A B C
min 1.0 2.0 3.0
sum 12.0 15.0 18.0
```
Different aggregations per column.
```pycon
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']}).execute()
A B
max NaN 8.0
min 1.0 2.0
sum 12.0 NaN
```
Aggregate different functions over the columns and rename the index of the resulting DataFrame.
```pycon
>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean')).execute()
A B C
x 7.0 NaN NaN
y NaN 2.0 NaN
z NaN NaN 6.0
```
```pycon
>>> s = md.Series([1, 2, 3, 4])
>>> s.agg('min').execute()
1
```
```pycon
>>> s.agg(['min', 'max']).execute()
max 4
min 1
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.aggregate.md
# maxframe.dataframe.Series.aggregate
#### Series.aggregate(func=None, axis=0, \*\*kw)
Aggregate using one or more operations over the specified axis.
* **Parameters:**
* **df** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Object to aggregate.
* **func** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Function to use for aggregating the data.
* **axis** ( *{0* *or* *‘index’* *,* *1* *or* *‘columns’}* *,* *default 0*) – If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.
* **kw** – Keyword arguments to pass to func.
* **Returns:**
The return can be:
* scalar : when Series.agg is called with single function
* Series : when DataFrame.agg is called with a single function
* DataFrame : when DataFrame.agg is called with several functions
* **Return type:**
scalar, [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9],
... [np.nan, np.nan, np.nan]],
... columns=['A', 'B', 'C']).execute()
```
Aggregate these functions over the rows.
```pycon
>>> df.agg(['sum', 'min']).execute()
A B C
min 1.0 2.0 3.0
sum 12.0 15.0 18.0
```
Different aggregations per column.
```pycon
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']}).execute()
A B
max NaN 8.0
min 1.0 2.0
sum 12.0 NaN
```
Aggregate different functions over the columns and rename the index of the resulting DataFrame.
```pycon
>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean')).execute()
A B C
x 7.0 NaN NaN
y NaN 2.0 NaN
z NaN NaN 6.0
```
```pycon
>>> s = md.Series([1, 2, 3, 4])
>>> s.agg('min').execute()
1
```
```pycon
>>> s.agg(['min', 'max']).execute()
max 4
min 1
```
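The return-type rule above (one function name gives a scalar, a list of names gives one labelled result per name) can be sketched in plain Python. This is only an illustration of the semantics, not MaxFrame's implementation; `REDUCERS` and `aggregate` are hypothetical names, and skipping NaN is an assumption chosen to match the example output above.

```python
import math

# Hypothetical lookup table from function name to reducer.
REDUCERS = {"sum": sum, "min": min, "max": max}


def aggregate(values, func):
    """Return a scalar for one function name, or a dict for a list of names."""
    # Drop NaN first, mirroring the NaN-skipping behaviour seen in the examples.
    clean = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    if isinstance(func, str):
        return REDUCERS[func](clean)
    return {name: REDUCERS[name](clean) for name in func}


column_a = [1, 4, 7, float("nan")]  # column 'A' of the example frame
single = aggregate(column_a, "min")
several = aggregate(column_a, ["sum", "min"])
```

Here `single` is a bare number, while `several` keeps one entry per requested function, which is the same shape distinction the Returns list describes.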
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.align.md
# maxframe.dataframe.Series.align
#### Series.align(other, join: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'outer', axis: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, level: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, copy: [bool](https://docs.python.org/3/library/functions.html#bool) = True, fill_value: [Any](https://docs.python.org/3/library/typing.html#typing.Any) = None, method: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, limit: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = None, fill_axis: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) = 0, broadcast_axis: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) = None)
Align two objects on their axes with the specified join method.
Join method is specified for each axis Index.
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series))
* **join** ( *{'outer'* *,* *'inner'* *,* *'left'* *,* *'right'}* *,* *default 'outer'*)
* **axis** (*allowed axis* *of* *the other object* *,* *default None*) – Align on index (0), columns (1), or both (None).
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *level name* *,* *default None*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Always returns new objects. If copy=False and no reindexing is
required then original objects are returned.
* **fill_value** (*scalar* *,* *default np.NaN*) – Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
* **method** ( *{'backfill'* *,* *'bfill'* *,* *'pad'* *,* *'ffill'* *,* *None}* *,* *default None*) –
Method to use for filling holes in reindexed Series:
- pad / ffill: propagate last valid observation forward to next valid.
- backfill / bfill: use NEXT valid observation to fill gap.
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
* **fill_axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Filling axis, method and limit.
* **broadcast_axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default None*) – Broadcast values along this axis, if aligning two objects of
different dimensions.
### Notes
Currently argument level is not supported.
* **Returns:**
**(left, right)** – Aligned objects.
* **Return type:**
([DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame), [type](https://docs.python.org/3/library/functions.html#type) of other)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
... )
>>> other = md.DataFrame(
... [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
... columns=["A", "B", "C", "D"],
... index=[2, 3, 4],
... )
>>> df.execute()
D B E A
1 1 2 3 4
2 6 7 8 9
>>> other.execute()
A B C D
2 10 20 30 40
3 60 70 80 90
4 600 700 800 900
```
Align on columns:
```pycon
>>> left, right = df.align(other, join="outer", axis=1)
>>> left.execute()
A B C D E
1 4 2 NaN 1 3
2 9 7 NaN 6 8
>>> right.execute()
A B C D E
2 10 20 30 40 NaN
3 60 70 80 90 NaN
4 600 700 800 900 NaN
```
We can also align on the index:
```pycon
>>> left, right = df.align(other, join="outer", axis=0)
>>> left.execute()
D B E A
1 1.0 2.0 3.0 4.0
2 6.0 7.0 8.0 9.0
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
>>> right.execute()
A B C D
1 NaN NaN NaN NaN
2 10.0 20.0 30.0 40.0
3 60.0 70.0 80.0 90.0
4 600.0 700.0 800.0 900.0
```
Finally, the default axis=None will align on both index and columns:
```pycon
>>> left, right = df.align(other, join="outer", axis=None)
>>> left.execute()
A B C D E
1 4.0 2.0 NaN 1.0 3.0
2 9.0 7.0 NaN 6.0 8.0
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
>>> right.execute()
A B C D E
1 NaN NaN NaN NaN NaN
2 10.0 20.0 30.0 40.0 NaN
3 60.0 70.0 80.0 90.0 NaN
4 600.0 700.0 800.0 900.0 NaN
```
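As a rough mental model of `join="outer"` on the index, both objects are reindexed to the sorted union of their labels and missing entries are filled. The sketch below uses plain dicts and a hypothetical helper `align_outer`; `None` stands in for NaN, and it is not how MaxFrame performs alignment internally.

```python
def align_outer(left, right, fill_value=None):
    """Align two label -> value mappings on the sorted union of their labels."""
    labels = sorted(set(left) | set(right))
    aligned_left = {k: left.get(k, fill_value) for k in labels}
    aligned_right = {k: right.get(k, fill_value) for k in labels}
    return aligned_left, aligned_right


# Column 'D' of `df` and of `other` from the example above, keyed by index label.
left, right = align_outer({1: 1, 2: 6}, {2: 40, 3: 90, 4: 900})
```

Both results share the labels 1 to 4, which is why the aligned frames above have the same index even though the inputs did not.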
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.all.md
# maxframe.dataframe.Series.all
#### Series.all(axis=0, bool_only=None, skipna=True, level=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.any.md
# maxframe.dataframe.Series.any
#### Series.any(axis=0, bool_only=None, skipna=True, level=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.append.md
# maxframe.dataframe.Series.append
#### Series.append(other, ignore_index=False, verify_integrity=False, sort=False)
Append rows of other to the end of caller, returning a new object.
Columns in other that are not in the caller are added as new columns.
* **Parameters:**
* **other** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* *Series/dict-like object* *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *these*) – The data to append.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **verify_integrity** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, raise ValueError on creating index with duplicates.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort columns if the columns of self and other are not aligned.
* **Returns:**
A new DataFrame consisting of the rows of caller and the rows of other.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`concat`](maxframe.dataframe.concat.md#maxframe.dataframe.concat)
: General function to concatenate DataFrame or Series objects.
### Notes
If a list of dict/series is passed and the keys are all contained in
the DataFrame’s index, the order of the columns in the resulting
DataFrame will be unchanged.
Iteratively appending rows to a DataFrame can be more computationally
intensive than a single concatenate. A better solution is to append
those rows to a list and then concatenate the list with the original
DataFrame all at once.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2], [3, 4]], columns=list('AB'), index=['x', 'y'])
>>> df.execute()
A B
x 1 2
y 3 4
>>> df2 = md.DataFrame([[5, 6], [7, 8]], columns=list('AB'), index=['x', 'y'])
>>> df.append(df2).execute()
A B
x 1 2
y 3 4
x 5 6
y 7 8
```
With ignore_index set to True:
```pycon
>>> df.append(df2, ignore_index=True).execute()
A B
0 1 2
1 3 4
2 5 6
3 7 8
```
The following, while not recommended methods for generating DataFrames,
show two ways to generate a DataFrame from multiple data sources.
Less efficient:
```pycon
>>> df = md.DataFrame(columns=['A'])
>>> for i in range(5):
... df = df.append({'A': i}, ignore_index=True)
>>> df.execute()
A
0 0
1 1
2 2
3 3
4 4
```
More efficient:
```pycon
>>> md.concat([md.DataFrame([i], columns=['A']) for i in range(5)],
... ignore_index=True).execute()
A
0 0
1 1
2 2
3 3
4 4
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.apply.md
# maxframe.dataframe.Series.apply
#### Series.apply(func, convert_dtype=True, output_type=None, args=(), dtypes=None, dtype=None, name=None, index=None, skip_infer=False, \*\*kwds)
Invoke function on values of Series.
Can be ufunc (a NumPy function that applies to the entire Series)
or a Python function that only works on single values.
* **Parameters:**
* **func** (*function*) – Python function or NumPy ufunc to apply.
* **convert_dtype** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Try to find better dtype for elementwise function results. If
False, leave as dtype=object.
* **output_type** ( *{'dataframe'* *,* *'series'}* *,* *default None*) – Specify type of returned object. See Notes for more details.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames. See Notes for more details.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of returned Series. See Notes for more details.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of returned Series. See Notes for more details.
* **index** ([*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index) *,* *default None*) – Specify index of returned object. See Notes for more details.
* **args** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple)) – Positional arguments passed to func after the series value.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when dtypes or output_type is not specified.
* **\*\*kwds** – Additional keyword arguments passed to func.
* **Returns:**
If func returns a Series object the result will be a DataFrame.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`Series.map`](maxframe.dataframe.Series.map.md#maxframe.dataframe.Series.map)
: For element-wise operations.
[`Series.agg`](maxframe.dataframe.Series.agg.md#maxframe.dataframe.Series.agg)
: Only perform aggregating type operations.
[`Series.transform`](maxframe.dataframe.Series.transform.md#maxframe.dataframe.Series.transform)
: Only perform transforming type operations.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `func` onto a mock Series, and the apply call may fail.
When this happens, you need to specify the type of apply call
(DataFrame or Series) in output_type.
* For DataFrame output, you need to specify a list or a pandas Series
as `dtypes` of output DataFrame. `index` of output can also be
specified.
* For Series output, you need to specify `dtype` and `name` of
output Series.
* For any input with data type `pandas.ArrowDtype(pyarrow.MapType)`, it will always
be converted to a Python dict. And for any output with this data type, it must be
returned as a Python dict as well.
### Examples
Create a series with typical summer temperatures for each city.
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([20, 21, 12],
... index=['London', 'New York', 'Helsinki'])
>>> s.execute()
London 20
New York 21
Helsinki 12
dtype: int64
```
Square the values by defining a function and passing it as an
argument to `apply()`.
```pycon
>>> def square(x):
... return x ** 2
>>> s.apply(square).execute()
London 400
New York 441
Helsinki 144
dtype: int64
```
Square the values by passing an anonymous function as an
argument to `apply()`.
```pycon
>>> s.apply(lambda x: x ** 2).execute()
London 400
New York 441
Helsinki 144
dtype: int64
```
Define a custom function that needs additional positional
arguments and pass these additional arguments using the
`args` keyword.
```pycon
>>> def subtract_custom_value(x, custom_value):
... return x - custom_value
```
```pycon
>>> s.apply(subtract_custom_value, args=(5,)).execute()
London 15
New York 16
Helsinki 7
dtype: int64
```
Define a custom function that takes keyword arguments
and pass these arguments to `apply`.
```pycon
>>> def add_custom_values(x, **kwargs):
... for month in kwargs:
... x += kwargs[month]
... return x
```
```pycon
>>> s.apply(add_custom_values, june=30, july=20, august=25).execute()
London 95
New York 96
Helsinki 87
dtype: int64
```
Create a series with a map type.
```pycon
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import dict_
>>> s = md.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> s.execute()
1 [('k1', 1), ('k2', 2)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
Define a function that updates the map type with a new key-value pair.
```pycon
>>> def custom_set_item(x):
... if x is not None:
... x["k2"] = 10
... return x
```
```pycon
>>> s.apply(custom_set_item, output_type="series", dtype=dict_(pa.string(), pa.int64())).execute()
1 [('k1', 1), ('k2', 10)]
2 [('k1', 3), ('k2', 10)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
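The way `args` and `**kwds` reach `func` can be pictured as a call of the form `func(value, *args, **kwds)` for every element. The following plain-Python sketch assumes exactly that forwarding rule; `apply_elementwise` is a hypothetical helper operating on a dict, not part of MaxFrame.

```python
def apply_elementwise(series, func, args=(), **kwds):
    """Call func(value, *args, **kwds) for every value, keeping the labels."""
    return {label: func(value, *args, **kwds) for label, value in series.items()}


def subtract_custom_value(x, custom_value):
    return x - custom_value


def add_custom_values(x, **kwargs):
    for month in kwargs:
        x += kwargs[month]
    return x


temps = {"London": 20, "New York": 21, "Helsinki": 12}
cooler = apply_elementwise(temps, subtract_custom_value, args=(5,))
warmer = apply_elementwise(temps, add_custom_values, june=30, july=20, august=25)
```

The two results reproduce the values shown in the `args` and keyword-argument examples above.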
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.argmax.md
# maxframe.dataframe.Series.argmax
#### Series.argmax(axis=0, skipna=True, \*args, \*\*kwargs)
Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations,
the first row position is returned.
* **Parameters:**
* **axis** ( *{None}*) – Unused. Parameter needed for compatibility with DataFrame.
* **skipna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Exclude NA/null values when showing the result.
* **\*args** – Additional arguments and keywords for compatibility with NumPy.
* **\*\*kwargs** – Additional arguments and keywords for compatibility with NumPy.
* **Returns:**
Row position of the maximum value.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`Series.argmin`](maxframe.dataframe.Series.argmin.md#maxframe.dataframe.Series.argmin)
: Return position of the minimum value.
[`Series.argmax`](#maxframe.dataframe.Series.argmax)
: Return position of the maximum value.
[`maxframe.tensor.argmax`](../../tensor/generated/maxframe.tensor.argmax.md#maxframe.tensor.argmax)
: Equivalent method for tensors.
[`Series.idxmax`](maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax)
: Return index label of the maximum values.
[`Series.idxmin`](maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin)
: Return index label of the minimum values.
### Examples
Consider a dataset containing cereal calories:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s.execute()
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
```
```pycon
>>> s.argmax().execute()
2
>>> s.argmin().execute()
0
```
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
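The tie-breaking rule (first position wins) and the difference between a position and an index label can be illustrated with a small plain-Python sketch. `argmax_first` is a hypothetical helper written for this illustration; it assumes a non-empty input and ignores NA handling.

```python
def argmax_first(values):
    """Return the position of the first maximum in a non-empty sequence."""
    best = 0
    for pos, value in enumerate(values):
        if value > values[best]:  # strict '>' keeps the earliest position on ties
            best = pos
    return best


calories = {
    "Corn Flakes": 100.0,
    "Almond Delight": 110.0,
    "Cinnamon Toast Crunch": 120.0,
    "Cocoa Puff": 110.0,
}
labels = list(calories)
pos = argmax_first(list(calories.values()))  # integer position, like argmax
label = labels[pos]                          # index label, like idxmax
tie_pos = argmax_first([110.0, 120.0, 120.0])
```

`pos` corresponds to what `argmax` reports, while `label` is the kind of value `idxmax` reports for the same data.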
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.argmin.md
# maxframe.dataframe.Series.argmin
#### Series.argmin(axis=0, skipna=True, \*args, \*\*kwargs)
Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations,
the first row position is returned.
* **Parameters:**
* **axis** ( *{None}*) – Unused. Parameter needed for compatibility with DataFrame.
* **skipna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Exclude NA/null values when showing the result.
* **\*args** – Additional arguments and keywords for compatibility with NumPy.
* **\*\*kwargs** – Additional arguments and keywords for compatibility with NumPy.
* **Returns:**
Row position of the minimum value.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`Series.argmin`](#maxframe.dataframe.Series.argmin)
: Return position of the minimum value.
[`Series.argmax`](maxframe.dataframe.Series.argmax.md#maxframe.dataframe.Series.argmax)
: Return position of the maximum value.
[`maxframe.tensor.argmin`](../../tensor/generated/maxframe.tensor.argmin.md#maxframe.tensor.argmin)
: Equivalent method for tensors.
[`Series.idxmax`](maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax)
: Return index label of the maximum values.
[`Series.idxmin`](maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin)
: Return index label of the minimum values.
### Examples
Consider a dataset containing cereal calories:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s.execute()
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: float64
```
```pycon
>>> s.argmax().execute()
2
>>> s.argmin().execute()
0
```
The maximum cereal calories is the third element and
the minimum cereal calories is the first element,
since series is zero-indexed.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.argsort.md
# maxframe.dataframe.Series.argsort
#### Series.argsort(axis=0, kind='quicksort', order=None, stable=None)
Return the integer indices that would sort the Series values.
Override ndarray.argsort. Argsorts the value, omitting NA/null values,
and places the result in the same locations as the non-NA values.
* **Parameters:**
* **axis** ( *{0* *or* *'index'}*) – Unused. Parameter needed for compatibility with DataFrame.
* **kind** ( *{'mergesort'* *,* *'quicksort'* *,* *'heapsort'* *,* *'stable'}* *,* *default 'quicksort'*) – Choice of sorting algorithm. See [`numpy.sort()`](https://numpy.org/doc/stable/reference/generated/numpy.sort.html#numpy.sort) for more
information. ‘mergesort’ and ‘stable’ are the only stable algorithms.
* **order** (*None*) – Has no effect but is accepted for compatibility with numpy.
* **stable** (*None*) – Has no effect but is accepted for compatibility with numpy.
* **Returns:**
Positions of values within the sort order with -1 indicating
nan values.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)[np.intp]
#### SEE ALSO
[`maxframe.tensor.argsort`](../../tensor/generated/maxframe.tensor.argsort.md#maxframe.tensor.argsort)
: Returns the indices that would sort this array.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([3, 2, 1])
>>> s.argsort().execute()
0 2
1 1
2 0
dtype: int64
```
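To see what the returned positions mean, here is a plain-Python sketch: the result lists, in order, the positions from which to take values to obtain a sorted sequence. The helper name `argsort_positions` is hypothetical, and the NA handling described above is deliberately left out.

```python
def argsort_positions(values):
    """Return the positions that would sort `values` in ascending order."""
    # sorted() is stable, so equal values keep their original relative order.
    return sorted(range(len(values)), key=values.__getitem__)


data = [3, 2, 1]
positions = argsort_positions(data)
reordered = [data[i] for i in positions]
```

Taking the values in the order given by `positions` yields the sorted data, which is the defining property of an argsort.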
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.astype.md
# maxframe.dataframe.Series.astype
#### Series.astype(dtype, copy=True, errors='raise')
Cast a pandas object to a specified dtype `dtype`.
* **Parameters:**
* **dtype** (*data type* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *of* *column name -> data type*) – Use a numpy.dtype or Python type to cast entire pandas object to
the same type. Alternatively, use {col: dtype, …}, where col is a
column label and dtype is a numpy.dtype or Python type to cast one
or more of the DataFrame’s columns to column-specific types.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Return a copy when `copy=True` (be very careful setting
`copy=False` as changes to values then may propagate to other
pandas objects).
* **errors** ( *{'raise'* *,* *'ignore'}* *,* *default 'raise'*) –
Control raising of exceptions on invalid data for provided dtype.
- `raise` : allow exceptions to be raised
- `ignore` : suppress exceptions. On error return original object.
* **Returns:**
**casted**
* **Return type:**
same type as caller
#### SEE ALSO
[`to_datetime`](maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)
: Convert argument to datetime.
`to_timedelta`
: Convert argument to timedelta.
[`to_numeric`](maxframe.dataframe.to_numeric.md#maxframe.dataframe.to_numeric)
: Convert argument to a numeric type.
[`numpy.ndarray.astype`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html#numpy.ndarray.astype)
: Cast a numpy array to a specified type.
### Examples
Create a DataFrame:
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}))
>>> df.dtypes
col1 int64
col2 int64
dtype: object
```
Cast all columns to int32:
```pycon
>>> df.astype('int32').dtypes
col1 int32
col2 int32
dtype: object
```
Cast col1 to int32 using a dictionary:
```pycon
>>> df.astype({'col1': 'int32'}).dtypes
col1 int32
col2 int64
dtype: object
```
Create a series:
```pycon
>>> ser = md.Series(pd.Series([1, 2], dtype='int32'))
>>> ser.execute()
0 1
1 2
dtype: int32
>>> ser.astype('int64').execute()
0 1
1 2
dtype: int64
```
Convert to categorical type:
```pycon
>>> ser.astype('category').execute()
0 1
1 2
dtype: category
Categories (2, int64): [1, 2]
```
Convert to ordered categorical type with custom ordering:
```pycon
>>> cat_dtype = pd.api.types.CategoricalDtype(
... categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype).execute()
0 1
1 2
dtype: category
Categories (2, int64): [2 < 1]
```
Note that using `copy=False` and changing data on a new
pandas object may propagate changes:
```pycon
>>> s1 = md.Series(pd.Series([1, 2]))
>>> s2 = s1.astype('int64', copy=False)
>>> s1.execute()
0 1
1 2
dtype: int64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.at.md
# maxframe.dataframe.Series.at
#### *property* Series.at
Access a single value for a row/column label pair.
Similar to `loc`, in that both provide label-based lookups. Use
`at` if you only need to get or set a single value in a DataFrame
or Series.
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If ‘label’ does not exist in DataFrame.
#### SEE ALSO
[`DataFrame.iat`](maxframe.dataframe.DataFrame.iat.md#maxframe.dataframe.DataFrame.iat)
: Access a single value for a row/column pair by integer position.
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Access a group of rows and columns by label(s).
[`Series.at`](#maxframe.dataframe.Series.at)
: Access a single value using a label.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... index=[4, 5, 6], columns=['A', 'B', 'C'])
>>> df.execute()
A B C
4 0 2 3
5 0 4 1
6 10 20 30
```
Get value at specified row/column pair
```pycon
>>> df.at[4, 'B'].execute()
2
```
Get value within a Series
```pycon
>>> df.loc[5].at['B'].execute()
4
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.at_time.md
# maxframe.dataframe.Series.at_time
#### Series.at_time(time, axis=0)
Select values at particular time of day (e.g., 9:30AM).
* **Parameters:**
* **time** ([*datetime.time*](https://docs.python.org/3/library/datetime.html#datetime.time) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The values to select.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – For Series this parameter is unused and defaults to 0.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – If the index is not a `DatetimeIndex`
#### SEE ALSO
[`between_time`](maxframe.dataframe.Series.between_time.md#maxframe.dataframe.Series.between_time)
: Select values between particular times of the day.
`first`
: Select initial periods of time series based on a date offset.
`last`
: Select final periods of time series based on a date offset.
`DatetimeIndex.indexer_at_time`
: Get just the index locations for values at particular time of the day.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> i = md.date_range('2018-04-09', periods=4, freq='12h')
>>> ts = md.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts.execute()
A
2018-04-09 00:00:00 1
2018-04-09 12:00:00 2
2018-04-10 00:00:00 3
2018-04-10 12:00:00 4
```
```pycon
>>> ts.at_time('12:00').execute()
A
2018-04-09 12:00:00 2
2018-04-10 12:00:00 4
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.between.md
# maxframe.dataframe.Series.between
#### Series.between(left, right, inclusive='both')
Return boolean Series equivalent to left <= series <= right.
This function returns a boolean vector containing True wherever the
corresponding Series element is between the boundary values left and
right. NA values are treated as False.
* **Parameters:**
* **left** (*scalar* *or* *list-like*) – Left boundary.
* **right** (*scalar* *or* *list-like*) – Right boundary.
* **inclusive** ( *{"both"* *,* *"neither"* *,* *"left"* *,* *"right"}*) – Include boundaries. Whether to set each bound as closed or open.
* **Returns:**
Series representing whether each element is between left and
right (inclusive).
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.gt`](maxframe.dataframe.Series.gt.md#maxframe.dataframe.Series.gt)
: Greater than of series and other.
[`Series.lt`](maxframe.dataframe.Series.lt.md#maxframe.dataframe.Series.lt)
: Less than of series and other.
### Notes
This function is equivalent to `(left <= ser) & (ser <= right)`
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> s = md.Series([2, 0, 4, 8, np.nan])
```
Boundary values are included by default:
```pycon
>>> s.between(1, 4).execute()
0 True
1 False
2 True
3 False
4 False
dtype: bool
```
With inclusive set to `"neither"` boundary values are excluded:
```pycon
>>> s.between(1, 4, inclusive="neither").execute()
0 True
1 False
2 False
3 False
4 False
dtype: bool
```
left and right can be any scalar value:
```pycon
>>> s = md.Series(['Alice', 'Bob', 'Carol', 'Eve'])
>>> s.between('Anna', 'Daniel').execute()
0 False
1 True
2 True
3 False
dtype: bool
```
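The four `inclusive` options differ only in whether each bound uses `<=` or `<`. The plain-Python sketch below makes that mapping explicit; `_BOUNDS` and `between` are hypothetical names for illustration, not MaxFrame internals. NaN fails every comparison, so it comes out as False, consistent with the note that NA values are treated as False.

```python
import math
import operator

# (lower-bound check, upper-bound check) for each `inclusive` option.
_BOUNDS = {
    "both": (operator.le, operator.le),
    "neither": (operator.lt, operator.lt),
    "left": (operator.le, operator.lt),
    "right": (operator.lt, operator.le),
}


def between(values, left, right, inclusive="both"):
    """Return one boolean per value: is it inside the [left, right] window?"""
    lower_ok, upper_ok = _BOUNDS[inclusive]
    return [lower_ok(left, v) and upper_ok(v, right) for v in values]


data = [2, 0, 4, 8, math.nan]
both = between(data, 1, 4)
neither = between(data, 1, 4, inclusive="neither")
names = between(["Alice", "Bob", "Carol", "Eve"], "Anna", "Daniel")
```

The three results match the boolean columns in the examples above, including the string case, where ordinary lexicographic comparison applies.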
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.between_time.md
# maxframe.dataframe.Series.between_time
#### Series.between_time(start_time, end_time, inclusive='both', axis=0)
Select values between particular times of the day (e.g., 9:00-9:30 AM).
By setting `start_time` to be later than `end_time`,
you can get the times that are *not* between the two times.
* **Parameters:**
* **start_time** ([*datetime.time*](https://docs.python.org/3/library/datetime.html#datetime.time) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Initial time as a time filter limit.
* **end_time** ([*datetime.time*](https://docs.python.org/3/library/datetime.html#datetime.time) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – End time as a time filter limit.
* **inclusive** ( *{"both"* *,* *"neither"* *,* *"left"* *,* *"right"}* *,* *default "both"*) – Include boundaries; whether to set each bound as closed or open.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Determine range time on index or columns value.
For Series this parameter is unused and defaults to 0.
* **Returns:**
Data from the original object filtered to the specified dates range.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
* **Raises:**
[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – If the index is not a `DatetimeIndex`
#### SEE ALSO
[`at_time`](maxframe.dataframe.Series.at_time.md#maxframe.dataframe.Series.at_time)
: Select values at a particular time of the day.
`first`
: Select initial periods of time series based on a date offset.
`last`
: Select final periods of time series based on a date offset.
`DatetimeIndex.indexer_between_time`
: Get just the index locations for values between particular times of the day.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> i = md.date_range('2018-04-09', periods=4, freq='1D20min')
>>> ts = md.DataFrame({'A': [1, 2, 3, 4]}, index=i)
>>> ts.execute()
A
2018-04-09 00:00:00 1
2018-04-10 00:20:00 2
2018-04-11 00:40:00 3
2018-04-12 01:00:00 4
```
```pycon
>>> ts.between_time('0:15', '0:45').execute()
A
2018-04-10 00:20:00 2
2018-04-11 00:40:00 3
```
You get the times that are *not* between two times by setting
`start_time` later than `end_time`:
```pycon
>>> ts.between_time('0:45', '0:15').execute()
A
2018-04-09 00:00:00 1
2018-04-12 01:00:00 4
```
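The "start later than end" behaviour can be read as a window that wraps past midnight. A minimal standard-library sketch, assuming both bounds are inclusive (the default `inclusive='both'`); `in_time_window` is a hypothetical helper, not a MaxFrame function.

```python
from datetime import datetime, time


def in_time_window(moment, start, end):
    """True if the time of day of `moment` lies in the window, bounds included."""
    t = moment.time()
    if start <= end:
        return start <= t <= end
    # start > end: the window wraps around midnight.
    return t >= start or t <= end


stamps = [
    datetime(2018, 4, 9, 0, 0),
    datetime(2018, 4, 10, 0, 20),
    datetime(2018, 4, 11, 0, 40),
    datetime(2018, 4, 12, 1, 0),
]
inside = [s for s in stamps if in_time_window(s, time(0, 15), time(0, 45))]
outside = [s for s in stamps if in_time_window(s, time(0, 45), time(0, 15))]
```

`inside` and `outside` select the same rows as the two `between_time` calls above.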
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.case_when.md
# maxframe.dataframe.Series.case_when
#### Series.case_when(caselist)
Replace values where the conditions are True.
* **Parameters:**
**caselist** (*A list* *of* *tuples* *of* *conditions and expected replacements*) – Takes the form: `(condition0, replacement0)`,
`(condition1, replacement1)`, … .
`condition` should be a 1-D boolean array-like object
or a callable. If `condition` is a callable,
it is computed on the Series
and should return a boolean Series or array.
The callable must not change the input Series
(though pandas doesn't check it). `replacement` should be a
1-D array-like object, a scalar or a callable.
If `replacement` is a callable, it is computed on the Series
and should return a scalar or Series. The callable
must not change the input Series.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.mask`](maxframe.dataframe.Series.mask.md#maxframe.dataframe.Series.mask)
: Replace values where the condition is True.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> c = md.Series([6, 7, 8, 9], name='c')
>>> a = md.Series([0, 0, 1, 2])
>>> b = md.Series([0, 3, 4, 5])
```
```pycon
>>> c.case_when(caselist=[(a.gt(0), a), # condition, replacement
... (b.gt(0), b)]).execute()
0 6
1 3
2 1
3 2
Name: c, dtype: int64
```
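The result above follows a first-match-wins rule: each position takes the replacement from the first condition that is true there and otherwise keeps its original value. A plain-Python sketch of that rule, assuming list inputs of equal length (the helper `case_when_lists` is hypothetical and does not cover callables or scalars):

```python
def case_when_lists(values, caselist):
    """Apply (condition, replacement) pairs in order; the first true condition wins."""
    result = list(values)
    resolved = [False] * len(result)
    for condition, replacement in caselist:
        for i, flag in enumerate(condition):
            if flag and not resolved[i]:
                result[i] = replacement[i]
                resolved[i] = True
    return result


c = [6, 7, 8, 9]
a = [0, 0, 1, 2]
b = [0, 3, 4, 5]
out = case_when_lists(c, [([x > 0 for x in a], a), ([x > 0 for x in b], b)])
```

Here `out` is `[6, 3, 1, 2]`, the same values as the Series example.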
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.clip.md
# maxframe.dataframe.Series.clip
#### Series.clip(lower=None, upper=None, \*, axis=None, inplace=False)
Trim values at input threshold(s).
Assigns values outside boundary to boundary values. Thresholds
can be singular values or array like, and in the latter case
the clipping is performed element-wise in the specified axis.
* **Parameters:**
* **lower** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array-like* *,* *default None*) – Minimum threshold value. All values below this
threshold will be set to it. If None, no lower clipping is performed.
* **upper** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array-like* *,* *default None*) – Maximum threshold value. All values above this
threshold will be set to it. If None, no upper clipping is performed.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *str axis name* *,* *optional*) – Align object with lower and upper along the given axis.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to perform the operation in place on the data.
* **\*args** – Additional keywords have no effect but might be accepted
for compatibility with numpy.
* **\*\*kwargs** – Additional keywords have no effect but might be accepted
for compatibility with numpy.
* **Returns:**
Same type as calling object with the values outside the
clip boundaries replaced or None if `inplace=True`.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
#### SEE ALSO
[`Series.clip`](#maxframe.dataframe.Series.clip)
: Trim values at input threshold in series.
[`DataFrame.clip`](maxframe.dataframe.DataFrame.clip.md#maxframe.dataframe.DataFrame.clip)
: Trim values at input threshold in dataframe.
[`numpy.clip`](https://numpy.org/doc/stable/reference/generated/numpy.clip.html#numpy.clip)
: Clip (limit) the values in an array.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
>>> df = md.DataFrame(data)
>>> df.execute()
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5
```
Clips per column using lower and upper thresholds:
```pycon
>>> df.clip(lower=-4, upper=7).execute()
col_0 col_1
0 7 -2
1 -3 -4
2 0 6
3 -1 7
4 5 -4
```
Clips using specific lower and upper thresholds per column element:
```pycon
>>> t = md.Series([2, -4, -1, 6, 3])
>>> t.execute()
0 2
1 -4
2 -1
3 6
4 3
dtype: int64
```
```pycon
>>> df.clip(lower=t, upper=t).execute()
col_0 col_1
0 2 2
1 -3 -4
2 0 -1
3 -1 6
4 5 3
```
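For scalar thresholds, the clipping rule itself is simple. The following plain-Python sketch shows it for a single column; `clip_values` is a hypothetical helper, not part of MaxFrame, and it does not cover array-like thresholds or `axis`.

```python
def clip_values(values, lower=None, upper=None):
    """Replace values below lower with lower and values above upper with upper."""
    clipped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        clipped.append(v)
    return clipped


col_0 = clip_values([9, -3, 0, -1, 5], lower=-4, upper=7)
col_1 = clip_values([-2, -7, 6, 8, -5], lower=-4, upper=7)
```

The two lists reproduce the columns of the `df.clip(lower=-4, upper=7)` example.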
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.combine.md
# maxframe.dataframe.Series.combine
#### Series.combine(other, func, fill_value=None)
Combine the Series with a Series or scalar according to func.
Combine the Series and other using func to perform elementwise
selection for combined Series.
fill_value is assumed when value is missing at some index
from one of the two objects being combined.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar*) – The value(s) to be combined with the Series.
* **func** (*function*) – Function that takes two scalars as inputs and returns an element.
* **fill_value** (*scalar* *,* *optional*) – The value to assume when an index is missing from
one Series or the other. The default specifies to use the
appropriate NaN value for the underlying dtype of the Series.
* **Returns:**
The result of combining the Series with the other object.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.combine_first`](maxframe.dataframe.Series.combine_first.md#maxframe.dataframe.Series.combine_first)
: Combine Series values, choosing the calling Series’ values first.
### Examples
Consider two datasets, `s1` and `s2`, containing the
highest clocked speeds of different birds.
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series({'falcon': 330.0, 'eagle': 160.0})
>>> s1.execute()
falcon 330.0
eagle 160.0
dtype: float64
>>> s2 = md.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})
>>> s2.execute()
falcon 345.0
eagle 200.0
duck 30.0
dtype: float64
```
Now, combine the two datasets and view the highest speeds
of the birds across both:
```pycon
>>> s1.combine(s2, max).execute()
duck NaN
eagle 200.0
falcon 345.0
dtype: float64
```
In the previous example, the resulting value for duck is missing,
because the maximum of a NaN and a float is a NaN.
So, in the example, we set `fill_value=0`,
so the maximum value returned will be the value from some dataset.
```pycon
>>> s1.combine(s2, max, fill_value=0).execute()
duck 30.0
eagle 200.0
falcon 345.0
dtype: float64
```
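The effect of `fill_value` can be reproduced with plain dictionaries. This is a sketch of the semantics only, assuming the labels are aligned on the sorted union of both indexes; `combine_dicts` is a hypothetical helper.

```python
import math


def combine_dicts(left, right, func, fill_value=math.nan):
    """Apply func to aligned values, using fill_value where a label is missing."""
    combined = {}
    for key in sorted(set(left) | set(right)):
        combined[key] = func(left.get(key, fill_value), right.get(key, fill_value))
    return combined


s1 = {"falcon": 330.0, "eagle": 160.0}
s2 = {"falcon": 345.0, "eagle": 200.0, "duck": 30.0}
no_fill = combine_dicts(s1, s2, max)            # duck stays NaN
filled = combine_dicts(s1, s2, max, fill_value=0)  # duck becomes 30.0
```

Because `max(nan, 30.0)` returns NaN in Python, `no_fill["duck"]` is NaN, while `fill_value=0` lets the real speed win.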
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.combine_first.md
# maxframe.dataframe.Series.combine_first
#### Series.combine_first(other)
Update null elements with value in the same location in ‘other’.
Combine two Series objects by filling null values in one Series with
non-null values from the other Series. Result index will be the union
of the two indexes.
* **Parameters:**
**other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – The value(s) to be used for filling null values.
* **Returns:**
The result of combining the provided Series with the other object.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.combine`](maxframe.dataframe.Series.combine.md#maxframe.dataframe.Series.combine)
: Perform element-wise operation on two Series using a given function.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s1 = md.Series([1, mt.nan])
>>> s2 = md.Series([3, 4, 5])
>>> s1.combine_first(s2).execute()
0 1.0
1 4.0
2 5.0
dtype: float64
```
Null values still persist if the location of that null value
does not exist in `other`:
```pycon
>>> s1 = md.Series({'falcon': mt.nan, 'eagle': 160.0})
>>> s2 = md.Series({'eagle': 200.0, 'duck': 30.0})
>>> s1.combine_first(s2).execute()
duck 30.0
eagle 160.0
falcon NaN
dtype: float64
```
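A plain-Python sketch of the fill rule, assuming dictionaries stand in for labelled Series and NaN marks a null; `combine_first_dicts` is a hypothetical helper, not MaxFrame code.

```python
import math


def combine_first_dicts(left, right):
    """Keep left's value unless it is missing or NaN, then fall back to right."""
    merged = {}
    for key in sorted(set(left) | set(right)):
        value = left.get(key, math.nan)
        if isinstance(value, float) and math.isnan(value):
            # Null in left: take right's value, which may itself be missing.
            value = right.get(key, math.nan)
        merged[key] = value
    return merged


s1 = {"falcon": math.nan, "eagle": 160.0}
s2 = {"eagle": 200.0, "duck": 30.0}
merged = combine_first_dicts(s1, s2)
```

As in the example above, `eagle` keeps the caller's 160.0, `duck` is filled from the other object, and `falcon` stays NaN because the other object has no such label.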
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.compare.md
# maxframe.dataframe.Series.compare
#### Series.compare(other, align_axis: [int](https://docs.python.org/3/library/functions.html#int) | [str](https://docs.python.org/3/library/stdtypes.html#str) = 1, keep_shape: [bool](https://docs.python.org/3/library/functions.html#bool) = False, keep_equal: [bool](https://docs.python.org/3/library/functions.html#bool) = False, result_names: [Tuple](https://docs.python.org/3/library/typing.html#typing.Tuple)[[str](https://docs.python.org/3/library/stdtypes.html#str), [str](https://docs.python.org/3/library/stdtypes.html#str)] = ('self', 'other'))
Compare to another Series and show the differences.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Object to compare with.
* **align_axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 1*) –
Determine which axis to align the comparison on.
* 0, or ‘index’
: with rows drawn alternately from self and other.
* 1, or ‘columns’
: with columns drawn alternately from self and other.
* **keep_shape** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If true, all rows and columns are kept.
Otherwise, only the ones with different values are kept.
* **keep_equal** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If true, the result keeps values that are equal.
Otherwise, equal values are shown as NaNs.
* **result_names** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *default* *(* *‘self’* *,* *‘other’* *)*) – Names to use for the calling object and `other` in the comparison result.
* **Returns:**
If `align_axis` is 0 or ‘index’, the result will be a Series.
The resulting index will be a MultiIndex with ‘self’ and ‘other’
stacked alternately at the inner level.
If `align_axis` is 1 or ‘columns’, the result will be a DataFrame.
It will have two columns, namely ‘self’ and ‘other’.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.compare`](maxframe.dataframe.DataFrame.compare.md#maxframe.dataframe.DataFrame.compare)
: Compare with another DataFrame and show differences.
### Notes
Matching NaNs will not appear as a difference.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(["a", "b", "c", "d", "e"])
>>> s2 = md.Series(["a", "a", "c", "b", "e"])
```
Align the differences on columns
```pycon
>>> s1.compare(s2).execute()
self other
1 b a
3 d b
```
Stack the differences on indices
```pycon
>>> s1.compare(s2, align_axis=0).execute()
1 self b
other a
3 self d
other b
dtype: object
```
Keep all original rows
```pycon
>>> s1.compare(s2, keep_shape=True).execute()
self other
0 NaN NaN
1 b a
2 NaN NaN
3 d b
4 NaN NaN
```
Keep all original rows and also all original values
```pycon
>>> s1.compare(s2, keep_shape=True, keep_equal=True).execute()
self other
0 a a
1 b a
2 c c
3 d b
4 e e
```
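The interaction of `keep_shape` and `keep_equal` can be sketched with plain lists. This is an illustration under simplifying assumptions: `compare_lists` is a hypothetical helper, `None` stands in for the NaN placeholders, and matching NaNs are not treated specially.

```python
def compare_lists(left, right, keep_shape=False, keep_equal=False):
    """Return (position, self, other) rows describing where two sequences differ."""
    rows = []
    for i, (lv, rv) in enumerate(zip(left, right)):
        if lv != rv:
            rows.append((i, lv, rv))
        elif keep_shape:
            # Equal rows survive only with keep_shape; keep_equal decides
            # whether their values are shown or blanked out.
            rows.append((i, lv, rv) if keep_equal else (i, None, None))
    return rows


s1 = ["a", "b", "c", "d", "e"]
s2 = ["a", "a", "c", "b", "e"]
diff = compare_lists(s1, s2)
full = compare_lists(s1, s2, keep_shape=True)
```

`diff` holds only positions 1 and 3, while `full` keeps all five positions with the equal ones blanked.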
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.convert_dtypes.md
# maxframe.dataframe.Series.convert_dtypes
#### Series.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True, dtype_backend='numpy')
Convert columns to best possible dtypes using dtypes supporting `pd.NA`.
* **Parameters:**
* **infer_objects** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether object dtypes should be converted to the best possible types.
* **convert_string** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether object dtypes should be converted to `StringDtype()`.
* **convert_integer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether, if possible, conversion can be done to integer extension types.
* **convert_boolean** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether object dtypes should be converted to `BooleanDtype()`.
* **convert_floating** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether, if possible, conversion can be done to floating extension types.
If convert_integer is also True, preference will be given to integer
dtypes if the floats can be faithfully cast to integers.
* **Returns:**
Copy of input object with new dtype.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`infer_objects`](maxframe.dataframe.Series.infer_objects.md#maxframe.dataframe.Series.infer_objects)
: Infer dtypes of objects.
[`to_datetime`](maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)
: Convert argument to datetime.
`to_timedelta`
: Convert argument to timedelta.
[`to_numeric`](maxframe.dataframe.to_numeric.md#maxframe.dataframe.to_numeric)
: Convert argument to a numeric type.
### Notes
By default, `convert_dtypes` will attempt to convert a Series (or each
Series in a DataFrame) to dtypes that support `pd.NA`. By using the options
`convert_string`, `convert_integer`, `convert_boolean` and
`convert_floating`, it is possible to turn off individual conversions
to `StringDtype`, the integer extension types, `BooleanDtype`
or floating extension types, respectively.
For object-dtyped columns, if `infer_objects` is `True`, use the inference
rules as during normal Series/DataFrame construction. Then, if possible,
convert to `StringDtype`, `BooleanDtype` or an appropriate integer
or floating extension type, otherwise leave as `object`.
If the dtype is integer, convert to an appropriate integer extension type.
If the dtype is numeric, and consists of all integers, convert to an
appropriate integer extension type. Otherwise, convert to an
appropriate floating extension type.
#### Versionchanged
Changed in version 1.2: Starting with pandas 1.2, this method also converts float columns
to the nullable floating extension type.
In the future, as new dtypes are added that support `pd.NA`, the results
of this method will change to support those new dtypes.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... {
... "a": md.Series([1, 2, 3], dtype=mt.dtype("int32")),
... "b": md.Series(["x", "y", "z"], dtype=mt.dtype("O")),
... "c": md.Series([True, False, mt.nan], dtype=mt.dtype("O")),
... "d": md.Series(["h", "i", mt.nan], dtype=mt.dtype("O")),
... "e": md.Series([10, mt.nan, 20], dtype=mt.dtype("float")),
... "f": md.Series([mt.nan, 100.5, 200], dtype=mt.dtype("float")),
... }
... )
```
Start with a DataFrame with default dtypes.
```pycon
>>> df.execute()
a b c d e f
0 1 x True h 10.0 NaN
1 2 y False i NaN 100.5
2 3 z NaN NaN 20.0 200.0
```
```pycon
>>> df.dtypes.execute()
a int32
b object
c object
d object
e float64
f float64
dtype: object
```
Convert the DataFrame to use best possible dtypes.
```pycon
>>> dfn = df.convert_dtypes()
>>> dfn.execute()
a b c d e f
0 1 x True h 10 <NA>
1 2 y False i <NA> 100.5
2 3 z <NA> <NA> 20 200.0
```
```pycon
>>> dfn.dtypes.execute()
a Int32
b string
c boolean
d string
e Int64
f Float64
dtype: object
```
Start with a Series of strings and missing data represented by `mt.nan`.
```pycon
>>> s = md.Series(["a", "b", mt.nan])
>>> s.execute()
0 a
1 b
2 NaN
dtype: object
```
Obtain a Series with dtype `StringDtype`.
```pycon
>>> s.convert_dtypes().execute()
0 a
1 b
2 <NA>
dtype: string
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.copy.md
# maxframe.dataframe.Series.copy
#### Series.copy(deep=True)
Make a copy of this object’s indices and data.
When `deep=True` (default), a new object will be created with a
copy of the calling object’s data and indices. Modifications to
the data or indices of the copy will not be reflected in the
original object (see notes below).
When `deep=False`, a new object will be created without copying
the calling object’s data or index (only references to the data
and index are copied). Any changes to the data of the original
will be reflected in the shallow copy (and vice versa).
* **Parameters:**
**deep** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Make a deep copy, including a copy of the data and the indices.
With `deep=False` neither the indices nor the data are copied.
* **Returns:**
**copy** – Object type matches caller.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.corr.md
# maxframe.dataframe.Series.corr
#### Series.corr(other, method='pearson', min_periods=None)
Compute correlation with other Series, excluding missing values.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Series with which to compute the correlation.
* **method** ( *{'pearson'* *,* *'kendall'* *,* *'spearman'}* *or* *callable*) –
Method used to compute correlation:
- pearson : Standard correlation coefficient
- kendall : Kendall Tau correlation coefficient
- spearman : Spearman rank correlation
- callable: Callable with input two 1d ndarrays and returning a float.
#### NOTE
kendall, spearman and callables not supported on multiple chunks yet.
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Minimum number of observations needed to have a valid result.
* **Returns:**
Correlation with other.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
#### SEE ALSO
[`DataFrame.corr`](maxframe.dataframe.DataFrame.corr.md#maxframe.dataframe.DataFrame.corr)
: Compute pairwise correlation between columns.
[`DataFrame.corrwith`](maxframe.dataframe.DataFrame.corrwith.md#maxframe.dataframe.DataFrame.corrwith)
: Compute pairwise correlation with another DataFrame or Series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series([.2, .0, .6, .2])
>>> s2 = md.Series([.3, .6, .0, .1])
>>> s1.corr(s2, method='pearson').execute()
-0.8510644963469898
```
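For reference, the Pearson coefficient in the example can be checked by hand. The sketch below computes it with the standard library only; `pearson` is a hypothetical helper that assumes two equal-length sequences without missing values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: co-deviation divided by the product of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs)
    dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(dev_x * dev_y)


r = pearson([0.2, 0.0, 0.6, 0.2], [0.3, 0.6, 0.0, 0.1])
```

Up to floating-point rounding, `r` agrees with the value returned by `s1.corr(s2, method='pearson')` above.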
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.count.md
# maxframe.dataframe.Series.count
#### Series.count(level=None, \*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.cov.md
# maxframe.dataframe.Series.cov
#### Series.cov(other, min_periods=None, ddof=1)
Compute covariance with Series, excluding missing values.
The two Series objects are not required to be the same length and
will be aligned internally before the covariance is calculated.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Series with which to compute the covariance.
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Minimum number of observations needed to have a valid result.
* **ddof** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 1*) – Delta degrees of freedom. The divisor used in calculations
is `N - ddof`, where `N` represents the number of elements.
* **Returns:**
Covariance between Series and other normalized by N-1
(unbiased estimator).
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
#### SEE ALSO
[`DataFrame.cov`](maxframe.dataframe.DataFrame.cov.md#maxframe.dataframe.DataFrame.cov)
: Compute pairwise covariance of columns.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series([0.90010907, 0.13484424, 0.62036035])
>>> s2 = md.Series([0.12528585, 0.26962463, 0.51111198])
>>> s1.cov(s2).execute()
-0.01685762652715874
```
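The role of `ddof` is easiest to see in the formula itself. A minimal standard-library sketch, assuming two already aligned sequences without missing values (`covariance` is a hypothetical helper):

```python
def covariance(xs, ys, ddof=1):
    """Sum of co-deviations divided by N - ddof."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_dev / (n - ddof)


xs = [0.90010907, 0.13484424, 0.62036035]
ys = [0.12528585, 0.26962463, 0.51111198]
sample_cov = covariance(xs, ys)              # divisor N - 1
population_cov = covariance(xs, ys, ddof=0)  # divisor N
```

`sample_cov` agrees with the example output up to floating-point rounding; with `ddof=0` the same sum is divided by `N` instead of `N - 1`.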
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.describe.md
# maxframe.dataframe.Series.describe
#### Series.describe(percentiles=None, include=None, exclude=None)
Generate descriptive statistics.
Descriptive statistics include those that summarize the central
tendency, dispersion and shape of a
dataset’s distribution, excluding `NaN` values.
Analyzes both numeric and object series, as well
as `DataFrame` column sets of mixed data types. The output
will vary depending on what is provided. Refer to the notes
below for more detail.
* **Parameters:**
* **percentiles** (*list-like* *of* *numbers* *,* *optional*) – The percentiles to include in the output. All should
fall between 0 and 1. The default is
`[.25, .5, .75]`, which returns the 25th, 50th, and
75th percentiles.
* **include** ( *'all'* *,* *list-like* *of* *dtypes* *or* *None* *(**default* *)* *,* *optional*) –
A white list of data types to include in the result. Ignored
for `Series`. Here are the options:
- ’all’ : All columns of the input will be included in the output.
- A list-like of dtypes : Limits the results to the
provided data types.
To limit the result to numeric types submit
`numpy.number`. To limit it instead to object columns submit
the `numpy.object` data type. Strings
can also be used in the style of
`select_dtypes` (e.g. `df.describe(include=['O'])`).
- None (default) : The result will include all numeric columns.
* **exclude** (*list-like* *of* *dtypes* *or* *None* *(**default* *)* *,* *optional*) –
A black list of data types to omit from the result. Ignored
for `Series`. Here are the options:
- A list-like of dtypes : Excludes the provided data types
from the result. To exclude numeric types submit
`numpy.number`. To exclude object columns submit the data
type `numpy.object`. Strings can also be used in the style of
`select_dtypes` (e.g. `df.describe(exclude=['O'])`).
- None (default) : The result will exclude nothing.
* **Returns:**
Summary statistics of the Series or Dataframe provided.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.count`](maxframe.dataframe.DataFrame.count.md#maxframe.dataframe.DataFrame.count)
: Count number of non-NA/null observations.
[`DataFrame.max`](maxframe.dataframe.DataFrame.max.md#maxframe.dataframe.DataFrame.max)
: Maximum of the values in the object.
[`DataFrame.min`](maxframe.dataframe.DataFrame.min.md#maxframe.dataframe.DataFrame.min)
: Minimum of the values in the object.
[`DataFrame.mean`](maxframe.dataframe.DataFrame.mean.md#maxframe.dataframe.DataFrame.mean)
: Mean of the values.
[`DataFrame.std`](maxframe.dataframe.DataFrame.std.md#maxframe.dataframe.DataFrame.std)
: Standard deviation of the observations.
[`DataFrame.select_dtypes`](maxframe.dataframe.DataFrame.select_dtypes.md#maxframe.dataframe.DataFrame.select_dtypes)
: Subset of a DataFrame including/excluding columns based on their dtype.
### Notes
For numeric data, the result’s index will include `count`,
`mean`, `std`, `min`, `max` as well as lower, `50` and
upper percentiles. By default the lower percentile is `25` and the
upper percentile is `75`. The `50` percentile is the
same as the median.
For object data (e.g. strings or timestamps), the result’s index
will include `count`, `unique`, `top`, and `freq`. The `top`
is the most common value. The `freq` is the most common value’s
frequency. Timestamps also include the `first` and `last` items.
If multiple object values have the highest count, then the
`count` and `top` results will be arbitrarily chosen from
among those with the highest count.
For mixed data types provided via a `DataFrame`, the default is to
return only an analysis of numeric columns. If the dataframe consists
only of object data without any numeric columns, the default is to
return an analysis of object columns. If `include='all'` is provided
as an option, the result will include a union of attributes of each type.
The include and exclude parameters can be used to limit
which columns in a `DataFrame` are analyzed for the output.
The parameters are ignored when analyzing a `Series`.
### Examples
Describing a numeric `Series`.
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3])
>>> s.describe().execute()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
dtype: float64
```
Describing a `DataFrame`. By default only numeric fields
are returned.
```pycon
>>> df = md.DataFrame({'numeric': [1, 2, 3],
... 'object': ['a', 'b', 'c']
... })
>>> df.describe().execute()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
```
Describing all columns of a `DataFrame` regardless of data type.
```pycon
>>> df.describe(include='all').execute()
numeric object
count 3.0 3
unique NaN 3
top NaN a
freq NaN 1
mean 2.0 NaN
std 1.0 NaN
min 1.0 NaN
25% 1.5 NaN
50% 2.0 NaN
75% 2.5 NaN
max 3.0 NaN
```
Describing a column from a `DataFrame` by accessing it as
an attribute.
```pycon
>>> df.numeric.describe().execute()
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
Name: numeric, dtype: float64
```
Including only numeric columns in a `DataFrame` description.
```pycon
>>> df.describe(include=[mt.number]).execute()
numeric
count 3.0
mean 2.0
std 1.0
min 1.0
25% 1.5
50% 2.0
75% 2.5
max 3.0
```
Including only string columns in a `DataFrame` description.
```pycon
>>> df.describe(include=[object]).execute()
object
count 3
unique 3
top a
freq 1
```
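The numeric summary can be reproduced with the standard library, which helps when checking what each row means. This sketch assumes linear interpolation between data points for the percentiles (the `"inclusive"` method of `statistics.quantiles`) and the sample standard deviation; `describe_numeric` is a hypothetical helper.

```python
import statistics


def describe_numeric(values):
    """Summarize a numeric sequence with the same row labels as describe()."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": float(len(values)),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (N - 1)
        "min": float(min(values)),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": float(max(values)),
    }


summary = describe_numeric([1, 2, 3])
```

For `[1, 2, 3]` the dictionary holds the same eight values as the first example in this section.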
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dict.__getitem__.md
# maxframe.dataframe.Series.dict.\_\_getitem\_\_
#### Series.dict.\_\_getitem\_\_(query_key)
Get the value for the key from each dict in the Series. Raises `KeyError` if the key
is missing from a dict.
* **Parameters:**
**query_key** (*Any*) – The key to look up; must be of the same type as the dict’s keys.
* **Returns:**
A Series with the dict value’s data type. Return `None` if the dict is None.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If the key is missing from any dict.
#### SEE ALSO
[`Series.dict.get`](maxframe.dataframe.Series.dict.get.md#maxframe.dataframe.Series.dict.get)
: Get the value by the key of each dict in the Series with an optional
`default`
### Examples
Create a series with dict type data.
```pycon
>>> import maxframe.dataframe as md
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import dict_
>>> s = md.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> s.execute()
1 [('k1', 1), ('k2', 2)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
```pycon
>>> s.dict["k1"].execute()
1 1
2 3
3 <NA>
Name: k1, dtype: int64[pyarrow]
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dict.__setitem__.md
# maxframe.dataframe.Series.dict.\_\_setitem\_\_
#### Series.dict.\_\_setitem\_\_(query_key, value)
Set the value with the key to each dict of the Series.
* **Parameters:**
* **query_key** (*Any*) – The key whose value is set; must be of the same type as the dict’s keys.
* **value** (*Any*) – The value to set; must be of the same type as the dict’s values. If the `query_key`
exists, the value will be replaced. Otherwise, the value will be added. A dict
will be skipped if it’s `None`.
* **Returns:**
A Series with the same data type.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
Create a series with dict type data.
```pycon
>>> import maxframe.dataframe as md
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import dict_
>>> s = md.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> s.execute()
1 [('k1', 1), ('k2', 2)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
```pycon
>>> s.dict["k2"] = 4
>>> s.execute()
1 [('k1', 1), ('k2', 4)]
2 [('k1', 3), ('k2', 4)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dict.contains.md
# maxframe.dataframe.Series.dict.contains
#### Series.dict.contains(query_key)
Check whether the key is in each dict of the Series.
* **Parameters:**
**query_key** (*Any*) – The key to check; must be of the same type as the dict’s keys.
* **Returns:**
A Series with data type `pandas.ArrowDtype(pyarrow.bool_)`. The value will
be `True` if the key is in the dict, `False` otherwise, or `None` if the
dict is None.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
Create a series with dict type data.
```pycon
>>> import maxframe.dataframe as md
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import dict_
>>> s = md.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> s.execute()
1 [('k1', 1), ('k2', 2)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
```pycon
>>> s.dict.contains("k2").execute()
1 True
2 False
3 <NA>
dtype: bool[pyarrow]
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dict.get.md
# maxframe.dataframe.Series.dict.get
#### Series.dict.get(query_key, default_value=None)
Get the value by the key of each dict in the Series.
* **Parameters:**
* **query_key** (*Any*) – The key to look up; must be of the same type as the dict’s keys.
* **default_value** (*Any* *,* *optional*) – The value to return if the key is not in the dict, by default None.
* **Returns:**
A Series with the dict value’s data type. The value will be `default_value`
if the key is not in the dict, or `None` if the dict is None.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.dict.__getitem__`](maxframe.dataframe.Series.dict.__getitem__.md#maxframe.dataframe.Series.dict.__getitem__)
: Get the value by the key of each dict in the Series.
### Examples
Create a series with dict type data.
```pycon
>>> import maxframe.dataframe as md
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import dict_
>>> s = md.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> s.execute()
1 [('k1', 1), ('k2', 2)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
```pycon
>>> s.dict.get("k2", 9).execute()
1 2
2 9
3 <NA>
Name: k2, dtype: int64[pyarrow]
```
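The difference between a missing key and a missing dict can be sketched with ordinary Python dicts. This is only an analogy for the semantics described above; `dict_get` is a hypothetical helper, with `None` standing in for `<NA>`.

```python
def dict_get(rows, key, default=None):
    """Look up key in each dict; missing key gives default, missing dict gives None."""
    return [None if row is None else row.get(key, default) for row in rows]


rows = [{"k1": 1, "k2": 2}, {"k1": 3}, None]
values = dict_get(rows, "k2", 9)
```

`values` is `[2, 9, None]`: the second row falls back to the default, while the third stays null because there is no dict to look in.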
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dict.len.md
# maxframe.dataframe.Series.dict.len
#### Series.dict.len()
Get the length of each dict of the Series.
* **Returns:**
A Series with data type `pandas.ArrowDtype(pyarrow.int64)`. Each element
represents the length of the dict, or `None` if the dict is `None`.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
Create a series with dict type data.
```pycon
>>> import maxframe.dataframe as md
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import dict_
>>> s = md.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> s.execute()
1 [('k1', 1), ('k2', 2)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
```pycon
>>> s.dict.len().execute()
1 2
2 1
3 <NA>
dtype: int64[pyarrow]
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dict.remove.md
# maxframe.dataframe.Series.dict.remove
#### Series.dict.remove(query_key, ignore_key_error: [bool](https://docs.python.org/3/library/functions.html#bool) = False)
Remove the item by the key from each dict of the Series.
* **Parameters:**
* **query_key** (*Any*) – The key to remove; must be of the same type as the dict’s keys.
* **ignore_key_error** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional* *,* *default False*) – When the `query_key` is not in the dict, if `ignore_key_error` is `True`,
nothing will happen in the dict. If `ignore_key_error` is `False`, a
`KeyError` will be raised. If the dict is `None`, returns `None`.
* **Returns:**
A Series with the same data type. The result is `None` where the dict is `None`.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If the `query_key` is not in one dict and `ignore_key_error` is `False`.
### Examples
Create a series with dict type data.
```pycon
>>> import maxframe.dataframe as md
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import dict_
>>> s = md.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> s.execute()
1 [('k1', 1), ('k2', 2)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
```pycon
>>> s.dict.remove("k2", ignore_key_error=True).execute()
1 [('k1', 1)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
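The effect of `ignore_key_error` can be sketched with ordinary Python dicts. This is an analogy for the documented behavior, not MaxFrame's implementation; `dict_remove` is a hypothetical helper and `None` stands in for `<NA>`.

```python
def dict_remove(rows, key, ignore_key_error=False):
    """Drop key from each dict; None rows pass through unchanged."""
    result = []
    for row in rows:
        if row is None:
            result.append(None)
            continue
        if key not in row and not ignore_key_error:
            raise KeyError(key)
        result.append({k: v for k, v in row.items() if k != key})
    return result


rows = [{"k1": 1, "k2": 2}, {"k1": 3}, None]
removed = dict_remove(rows, "k2", ignore_key_error=True)
```

With `ignore_key_error=True` the second row, which has no `k2`, is returned unchanged; with the default `False` the same call raises `KeyError`.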
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.div.md
# maxframe.dataframe.Series.div
#### Series.div(other, level=None, fill_value=None, axis=0)
Return Floating division of series and other, element-wise (binary operator truediv).
Equivalent to `series / other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.rtruediv`](maxframe.dataframe.Series.rtruediv.md#maxframe.dataframe.Series.rtruediv)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.truediv(b, fill_value=0).execute()
a 1.0
b inf
c inf
d 0.0
e NaN
dtype: float64
```
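The interaction between index alignment and `fill_value` can be sketched in plain Python. The helpers `ieee_div` and `div_with_fill` are hypothetical and only model the documented rule: a value missing on one side is replaced by `fill_value`, while a label missing on both sides stays missing.

```python
import math


def ieee_div(x, y):
    """Float division that yields inf/NaN instead of raising (ignores signed zero)."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if y == 0:
        return math.nan if x == 0 else math.copysign(math.inf, x)
    return x / y


def div_with_fill(left, right, fill_value=None):
    """Model of Series.div on label -> float dicts, with NaN marking missing data."""
    result = {}
    for label in sorted(set(left) | set(right)):
        x = left.get(label, math.nan)
        y = right.get(label, math.nan)
        both_missing = math.isnan(x) and math.isnan(y)
        if fill_value is not None and not both_missing:
            x = fill_value if math.isnan(x) else x
            y = fill_value if math.isnan(y) else y
        result[label] = ieee_div(x, y)
    return result


a = {"a": 1.0, "b": 1.0, "c": 1.0, "d": math.nan}
b = {"a": 1.0, "b": math.nan, "d": 1.0, "e": math.nan}
quotient = div_with_fill(a, b, fill_value=0)
print(quotient)
```

Labels `b` and `c` divide by the filled-in zero and give `inf`, while `e` is missing on both sides and stays `NaN`, which matches the output shown above.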
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.drop.md
# maxframe.dataframe.Series.drop
#### Series.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Return Series with specified index labels removed.
Remove elements of a Series based on specifying the index labels.
When using a multi-index, labels on different levels can be removed
by specifying the level.
* **Parameters:**
* **labels** (*single label* *or* *list-like*) – Index labels to drop.
* **axis** (*0* *,* *default 0*) – Redundant for application on Series.
* **index** (*single label* *or* *list-like*) –
Redundant for application on Series, but ‘index’ can be used instead
of ‘labels’.
#### Versionadded
Added in version 0.21.0.
* **columns** (*single label* *or* *list-like*) –
No change is made to the Series; use ‘index’ or ‘labels’ instead.
#### Versionadded
Added in version 0.21.0.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *level name* *,* *optional*) – For MultiIndex, level for which the labels will be removed.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, do operation inplace and return None.
* **errors** ( *{'ignore'* *,* *'raise'}* *,* *default 'raise'*) – Note that this argument is kept only for compatibility, and errors
will not raise even if `errors=='raise'`.
* **Returns:**
Series with specified index labels removed.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If none of the labels are found in the index.
#### SEE ALSO
[`Series.reindex`](maxframe.dataframe.Series.reindex.md#maxframe.dataframe.Series.reindex)
: Return only specified index labels of Series.
[`Series.dropna`](maxframe.dataframe.Series.dropna.md#maxframe.dataframe.Series.dropna)
: Return series without null values.
[`Series.drop_duplicates`](maxframe.dataframe.Series.drop_duplicates.md#maxframe.dataframe.Series.drop_duplicates)
: Return Series with duplicate values removed.
[`DataFrame.drop`](maxframe.dataframe.DataFrame.drop.md#maxframe.dataframe.DataFrame.drop)
: Drop specified labels from rows or columns.
### Examples
```pycon
>>> import numpy as np
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> s = md.Series(data=np.arange(3), index=['A', 'B', 'C'])
>>> s.execute()
A 0
B 1
C 2
dtype: int64
```
Drop labels B and C
```pycon
>>> s.drop(labels=['B', 'C']).execute()
A 0
dtype: int64
```
Drop 2nd level label in MultiIndex Series
```pycon
>>> midx = pd.MultiIndex(levels=[['lame', 'cow', 'falcon'],
... ['speed', 'weight', 'length']],
... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
... [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = md.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
... index=midx)
>>> s.execute()
lame speed 45.0
weight 200.0
length 1.2
cow speed 30.0
weight 250.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
dtype: float64
```
```pycon
>>> s.drop(labels='weight', level=1).execute()
lame speed 45.0
length 1.2
cow speed 30.0
length 1.5
falcon speed 320.0
length 0.3
dtype: float64
```
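The effect of `level` can be illustrated with a plain-Python model in which a Series is a dict and MultiIndex labels are tuples. `drop_labels` is a hypothetical helper written for this sketch; it does not reproduce MaxFrame's error handling.

```python
def drop_labels(series, labels, level=None):
    """Model of Series.drop on a dict keyed by label (tuple keys for a MultiIndex)."""
    wanted = set(labels) if isinstance(labels, list) else {labels}
    if level is None:
        return {key: value for key, value in series.items() if key not in wanted}
    # With `level`, only that position of each tuple key is compared.
    return {key: value for key, value in series.items() if key[level] not in wanted}


flat = {"A": 0, "B": 1, "C": 2}
print(drop_labels(flat, ["B", "C"]))

multi = {
    ("lame", "speed"): 45.0, ("lame", "weight"): 200.0, ("lame", "length"): 1.2,
    ("cow", "speed"): 30.0, ("cow", "weight"): 250.0, ("cow", "length"): 1.5,
}
print(drop_labels(multi, "weight", level=1))
```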
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.drop_duplicates.md
# maxframe.dataframe.Series.drop_duplicates
#### Series.drop_duplicates(keep='first', inplace=False, ignore_index=False, method='auto', default_index_type=None)
Return Series with duplicate values removed.
* **Parameters:**
* **keep** ({‘first’, ‘last’, ‘any’, `False`}, default ‘first’) –
Method to handle dropping duplicates:
- ‘first’ : Drop duplicates except for the first occurrence.
- ‘last’ : Drop duplicates except for the last occurrence.
- ‘any’ : Drop duplicates except for a random occurrence.
- `False` : Drop all duplicates.
* **inplace** (bool, default `False`) – If `True`, performs operation inplace and returns None.
* **Returns:**
Series with duplicates dropped.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Index.drop_duplicates`](maxframe.dataframe.Index.drop_duplicates.md#maxframe.dataframe.Index.drop_duplicates)
: Equivalent method on Index.
[`DataFrame.drop_duplicates`](maxframe.dataframe.DataFrame.drop_duplicates.md#maxframe.dataframe.DataFrame.drop_duplicates)
: Equivalent method on DataFrame.
`Series.duplicated`
: Related method on Series, indicating duplicate Series values.
### Examples
Generate a Series with duplicated entries.
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['lame', 'cow', 'lame', 'beetle', 'lame', 'hippo'],
... name='animal')
>>> s.execute()
0 lame
1 cow
2 lame
3 beetle
4 lame
5 hippo
Name: animal, dtype: object
```
With the ‘keep’ parameter, the selection behaviour of duplicated values
can be changed. The value ‘first’ keeps the first occurrence for each
set of duplicated entries. The default value of keep is ‘first’.
```pycon
>>> s.drop_duplicates().execute()
0 lame
1 cow
3 beetle
5 hippo
Name: animal, dtype: object
```
The value ‘last’ for parameter ‘keep’ keeps the last occurrence for
each set of duplicated entries.
```pycon
>>> s.drop_duplicates(keep='last').execute()
1 cow
3 beetle
4 lame
5 hippo
Name: animal, dtype: object
```
The value `False` for parameter ‘keep’ discards all sets of
duplicated entries. Setting the value of ‘inplace’ to `True` performs
the operation inplace and returns `None`.
```pycon
>>> s.drop_duplicates(keep=False, inplace=True)
>>> s.execute()
1 cow
3 beetle
5 hippo
Name: animal, dtype: object
```
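The `keep` modes shown above can be compared side by side with a small plain-Python model. This is a sketch under the assumption that a Series is a list of `(index, value)` pairs; `drop_duplicates_model` is a hypothetical helper, and the ‘any’ mode is left out because its choice of survivor is not deterministic.

```python
from collections import Counter


def drop_duplicates_model(items, keep="first"):
    """Return the (index, value) pairs that survive for the given keep mode."""
    if keep is False:
        counts = Counter(value for _, value in items)
        return [(idx, value) for idx, value in items if counts[value] == 1]
    if keep not in ("first", "last"):
        raise ValueError(f"unsupported keep mode: {keep!r}")
    ordered = items if keep == "first" else list(reversed(items))
    seen, kept = set(), []
    for idx, value in ordered:
        if value not in seen:
            seen.add(value)
            kept.append((idx, value))
    # Restore the original order when scanning from the end.
    return kept if keep == "first" else kept[::-1]


animals = list(enumerate(["lame", "cow", "lame", "beetle", "lame", "hippo"]))
for mode in ("first", "last", False):
    print(mode, [idx for idx, _ in drop_duplicates_model(animals, keep=mode)])
```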
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.droplevel.md
# maxframe.dataframe.Series.droplevel
#### Series.droplevel(level, axis=0)
Return Series/DataFrame with requested index / column level(s) removed.
* **Parameters:**
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *, or* *list-like*) – If a string is given, must be the name of a level.
If list-like, elements must be names or positional indexes
of levels.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) –
Axis along which the level(s) is removed:
* 0 or ‘index’: remove level(s) from the row index.
* 1 or ‘columns’: remove level(s) from the column index.
For Series this parameter is unused and defaults to 0.
* **Returns:**
Series/DataFrame with requested index / column level(s) removed.
* **Return type:**
Series/DataFrame
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([
... [1, 2, 3, 4],
... [5, 6, 7, 8],
... [9, 10, 11, 12]
... ]).set_index([0, 1]).rename_axis(['a', 'b'])
```
```pycon
>>> df.columns = md.MultiIndex.from_tuples([
... ('c', 'e'), ('d', 'f')
... ], names=['level_1', 'level_2'])
```
```pycon
>>> df.execute()
level_1 c d
level_2 e f
a b
1 2 3 4
5 6 7 8
9 10 11 12
```
```pycon
>>> df.droplevel('a').execute()
level_1 c d
level_2 e f
b
2 3 4
6 7 8
10 11 12
```
```pycon
>>> df.droplevel('level_2', axis=1).execute()
level_1 c d
a b
1 2 3 4
5 6 7 8
9 10 11 12
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dropna.md
# maxframe.dataframe.Series.dropna
#### Series.dropna(axis=0, inplace=False, how=None, ignore_index=False)
Return a new Series with missing values removed.
See the [User Guide](https://pandas.pydata.org/docs/user_guide/missing_data.html) for more on which values are
considered missing, and how to work with missing data.
* **Parameters:**
* **axis** ( *{0* *or* *'index'}* *,* *default 0*) – There is only one axis to drop values from.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, do operation inplace and return None.
* **how** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Not in use. Kept for compatibility.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **Returns:**
Series with NA entries dropped from it.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.isna`](maxframe.dataframe.Series.isna.md#maxframe.dataframe.Series.isna)
: Indicate missing values.
[`Series.notna`](maxframe.dataframe.Series.notna.md#maxframe.dataframe.Series.notna)
: Indicate existing (non-missing) values.
[`Series.fillna`](maxframe.dataframe.Series.fillna.md#maxframe.dataframe.Series.fillna)
: Replace missing values.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Drop rows or columns which contain NA values.
[`Index.dropna`](maxframe.dataframe.Index.dropna.md#maxframe.dataframe.Index.dropna)
: Drop missing indices.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> ser = md.Series([1., 2., np.nan])
>>> ser.execute()
0 1.0
1 2.0
2 NaN
dtype: float64
```
Drop NA values from a Series.
```pycon
>>> ser.dropna().execute()
0 1.0
1 2.0
dtype: float64
```
Keep the Series with valid entries in the same variable.
```pycon
>>> ser.dropna(inplace=True)
>>> ser.execute()
0 1.0
1 2.0
dtype: float64
```
Empty strings are not considered NA values. `None` is considered an
NA value.
```pycon
>>> ser = md.Series([np.nan, '2', md.NaT, '', None, 'I stay'])
>>> ser.execute()
0 NaN
1 2
2 NaT
3
4 None
5 I stay
dtype: object
>>> ser.dropna().execute()
1 2
3
5 I stay
dtype: object
```
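Which values count as missing can be sketched in plain Python. The predicate below is an assumption that covers only `None` and float `NaN`; `NaT` has no standard-library equivalent and is omitted, and `is_missing` is a hypothetical helper.

```python
import math


def is_missing(value):
    """True for None and float NaN; empty strings are ordinary values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


values = [math.nan, "2", None, "", "I stay"]
kept = [(idx, value) for idx, value in enumerate(values) if not is_missing(value)]
print(kept)
```

The surviving entries keep their original positions as labels, which corresponds to `ignore_index=False`.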
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.ceil.md
# maxframe.dataframe.Series.dt.ceil
#### Series.dt.ceil(\*args, \*\*kwargs)
Perform ceil operation on the data to the specified freq.
* **Parameters:**
* **freq** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *Offset*) – The frequency level to ceil the index to. Must be a fixed
frequency like ‘S’ (second) not ‘ME’ (month end). See
[frequency aliases](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases) for
a list of possible freq values.
* **ambiguous** ( *'infer'* *,* *bool-ndarray* *,* *'NaT'* *,* *default 'raise'*) –
Only relevant for DatetimeIndex:
- ’infer’ will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- ’NaT’ will return NaT where there are ambiguous times
- ’raise’ will raise an AmbiguousTimeError if there are ambiguous
times.
* **nonexistent** ( *'shift_forward'* *,* *'shift_backward'* *,* *'NaT'* *,* *timedelta* *,* *default 'raise'*) –
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- ’shift_forward’ will shift the nonexistent time forward to the
closest existing time
- ’shift_backward’ will shift the nonexistent time backward to the
closest existing time
- ’NaT’ will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- ’raise’ will raise a NonExistentTimeError if there are
nonexistent times.
* **Returns:**
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
* **Return type:**
DatetimeIndex, TimedeltaIndex, or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
**ValueError** – If the freq cannot be converted.
### Notes
If the timestamps have a timezone, ceiling will take place relative to the
local (“wall”) time and re-localized to the same timezone. When ceiling
near daylight savings time, use `nonexistent` and `ambiguous` to
control the re-localization behavior.
### Examples
**DatetimeIndex**
```pycon
>>> import maxframe.dataframe as md
>>> rng = md.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng.execute()
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='min')
>>> rng.ceil('h').execute()
DatetimeIndex(['2018-01-01 12:00:00', '2018-01-01 12:00:00',
'2018-01-01 13:00:00'],
dtype='datetime64[ns]', freq=None)
```
**Series**
```pycon
>>> md.Series(rng).dt.ceil("h").execute()
0 2018-01-01 12:00:00
1 2018-01-01 12:00:00
2 2018-01-01 13:00:00
dtype: datetime64[ns]
```
When rounding near a daylight savings time transition, use `ambiguous` or
`nonexistent` to control how the timestamp should be re-localized.
```pycon
>>> rng_tz = md.DatetimeIndex(["2021-10-31 01:30:00"], tz="Europe/Amsterdam")
```
```pycon
>>> rng_tz.ceil("h", ambiguous=False).execute()
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
```pycon
>>> rng_tz.ceil("h", ambiguous=True).execute()
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
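For a fixed frequency, ceiling amounts to rounding up to the next multiple of that frequency. A minimal sketch with the standard library, assuming naive timestamps and multiples counted from the Unix epoch; `ceil_to` is a hypothetical helper and the `ambiguous` and `nonexistent` handling is deliberately left out.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ceil_to(ts, freq):
    """Round a naive datetime up to the next multiple of `freq` (a timedelta)."""
    remainder = (ts - EPOCH) % freq
    return ts if remainder == timedelta(0) else ts + (freq - remainder)


stamps = [
    datetime(2018, 1, 1, 11, 59),
    datetime(2018, 1, 1, 12, 0),
    datetime(2018, 1, 1, 12, 1),
]
ceiled = [ceil_to(ts, timedelta(hours=1)) for ts in stamps]
print(ceiled)
```

A timestamp that already sits on a boundary, such as 12:00, is returned unchanged.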
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.date.md
# maxframe.dataframe.Series.dt.date
#### Series.dt.date
Returns numpy array of python [`datetime.date`](https://docs.python.org/3/library/datetime.html#datetime.date) objects.
Namely, the date part of Timestamps without time and
timezone information.
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["1/1/2020 10:00:00+00:00", "2/1/2020 11:00:00+00:00"])
>>> s = md.to_datetime(s)
>>> s.execute()
0 2020-01-01 10:00:00+00:00
1 2020-02-01 11:00:00+00:00
dtype: datetime64[ns, UTC]
>>> s.dt.date.execute()
0 2020-01-01
1 2020-02-01
dtype: object
```
For DatetimeIndex:
```pycon
>>> idx = md.DatetimeIndex(["1/1/2020 10:00:00+00:00",
... "2/1/2020 11:00:00+00:00"])
>>> idx.date.execute()
array([datetime.date(2020, 1, 1), datetime.date(2020, 2, 1)], dtype=object)
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.day.md
# maxframe.dataframe.Series.dt.day
#### Series.dt.day
The day of the datetime.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> datetime_series = md.Series(
... md.date_range("2000-01-01", periods=3, freq="D")
... )
>>> datetime_series.execute()
0 2000-01-01
1 2000-01-02
2 2000-01-03
dtype: datetime64[ns]
>>> datetime_series.dt.day.execute()
0 1
1 2
2 3
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.day_name.md
# maxframe.dataframe.Series.dt.day_name
#### Series.dt.day_name(\*args, \*\*kwargs)
Return the day names with specified locale.
* **Parameters:**
**locale** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Locale determining the language in which to return the day name.
Default is English locale (`'en_US.utf8'`). Use the command
`locale -a` on your terminal on Unix systems to find your locale
language code.
* **Returns:**
Series or Index of day names.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(md.date_range(start='2018-01-01', freq='D', periods=3))
>>> s.execute()
0 2018-01-01
1 2018-01-02
2 2018-01-03
dtype: datetime64[ns]
>>> s.dt.day_name().execute()
0 Monday
1 Tuesday
2 Wednesday
dtype: object
```
```pycon
>>> idx = md.date_range(start='2018-01-01', freq='D', periods=3)
>>> idx.execute()
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'],
dtype='datetime64[ns]', freq='D')
>>> idx.day_name().execute()
Index(['Monday', 'Tuesday', 'Wednesday'], dtype='object')
```
Using the `locale` parameter you can set a different locale language,
for example: `idx.day_name(locale='pt_BR.utf8')` will return day
names in Brazilian Portuguese language.
```pycon
>>> idx = md.date_range(start='2018-01-01', freq='D', periods=3)
>>> idx.execute()
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'],
dtype='datetime64[ns]', freq='D')
>>> idx.day_name(locale='pt_BR.utf8').execute()
Index(['Segunda', 'Terça', 'Quarta'], dtype='object')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.dayofweek.md
# maxframe.dataframe.Series.dt.dayofweek
#### Series.dt.dayofweek
The day of the week with Monday=0, Sunday=6.
Return the day of the week. It is assumed the week starts on
Monday, which is denoted by 0 and ends on Sunday which is denoted
by 6. This method is available on both Series with datetime
values (using the dt accessor) or DatetimeIndex.
* **Returns:**
Containing integers indicating the day number.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
#### SEE ALSO
[`Series.dt.dayofweek`](#maxframe.dataframe.Series.dt.dayofweek)
: Alias.
[`Series.dt.weekday`](maxframe.dataframe.Series.dt.weekday.md#maxframe.dataframe.Series.dt.weekday)
: Alias.
[`Series.dt.day_name`](maxframe.dataframe.Series.dt.day_name.md#maxframe.dataframe.Series.dt.day_name)
: Returns the name of the day of the week.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.date_range('2016-12-31', '2017-01-08', freq='D').to_series()
>>> s.dt.dayofweek.execute()
2016-12-31 5
2017-01-01 6
2017-01-02 0
2017-01-03 1
2017-01-04 2
2017-01-05 3
2017-01-06 4
2017-01-07 5
2017-01-08 6
Freq: D, dtype: int32
```
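The Monday=0 to Sunday=6 convention is the same one used by `datetime.date.weekday()` in the standard library, so the numbers above can be cross-checked without MaxFrame. This is only a sanity check of the convention, not a statement about how MaxFrame computes the values.

```python
from datetime import date, timedelta

start = date(2016, 12, 31)
days = [start + timedelta(days=offset) for offset in range(9)]

# weekday() uses Monday=0 ... Sunday=6
codes = [day.weekday() for day in days]
print(codes)
```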
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.dayofyear.md
# maxframe.dataframe.Series.dt.dayofyear
#### Series.dt.dayofyear
The ordinal day of the year.
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["1/1/2020 10:00:00+00:00", "2/1/2020 11:00:00+00:00"])
>>> s = md.to_datetime(s)
>>> s.execute()
0 2020-01-01 10:00:00+00:00
1 2020-02-01 11:00:00+00:00
dtype: datetime64[ns, UTC]
>>> s.dt.dayofyear.execute()
0 1
1 32
dtype: int32
```
For DatetimeIndex:
```pycon
>>> idx = md.DatetimeIndex(["1/1/2020 10:00:00+00:00",
... "2/1/2020 11:00:00+00:00"])
>>> idx.dayofyear.execute()
Index([1, 32], dtype='int32')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.days_in_month.md
# maxframe.dataframe.Series.dt.days_in_month
#### Series.dt.days_in_month
The number of days in the month.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["1/1/2020 10:00:00+00:00", "2/1/2020 11:00:00+00:00"])
>>> s = md.to_datetime(s)
>>> s.execute()
0 2020-01-01 10:00:00+00:00
1 2020-02-01 11:00:00+00:00
dtype: datetime64[ns, UTC]
>>> s.dt.days_in_month.execute()
0 31
1 29
dtype: int32
```
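The same month lengths can be derived with the standard library's `calendar.monthrange`, shown here as a cross-check; `days_in_month` below is a hypothetical stand-alone helper, not the accessor itself.

```python
import calendar
from datetime import date


def days_in_month(day):
    # monthrange returns (weekday of the first day, number of days in the month)
    return calendar.monthrange(day.year, day.month)[1]


dates = [date(2020, 1, 1), date(2020, 2, 1)]
lengths = [days_in_month(day) for day in dates]
print(lengths)
```

February 2020 reports 29 days because 2020 is a leap year.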
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.daysinmonth.md
# maxframe.dataframe.Series.dt.daysinmonth
#### Series.dt.daysinmonth
The number of days in the month.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["1/1/2020 10:00:00+00:00", "2/1/2020 11:00:00+00:00"])
>>> s = md.to_datetime(s)
>>> s.execute()
0 2020-01-01 10:00:00+00:00
1 2020-02-01 11:00:00+00:00
dtype: datetime64[ns, UTC]
>>> s.dt.daysinmonth.execute()
0 31
1 29
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.floor.md
# maxframe.dataframe.Series.dt.floor
#### Series.dt.floor(\*args, \*\*kwargs)
Perform floor operation on the data to the specified freq.
* **Parameters:**
* **freq** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *Offset*) – The frequency level to floor the index to. Must be a fixed
frequency like ‘S’ (second) not ‘ME’ (month end). See
[frequency aliases](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases) for
a list of possible freq values.
* **ambiguous** ( *'infer'* *,* *bool-ndarray* *,* *'NaT'* *,* *default 'raise'*) –
Only relevant for DatetimeIndex:
- ’infer’ will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- ’NaT’ will return NaT where there are ambiguous times
- ’raise’ will raise an AmbiguousTimeError if there are ambiguous
times.
* **nonexistent** ( *'shift_forward'* *,* *'shift_backward'* *,* *'NaT'* *,* *timedelta* *,* *default 'raise'*) –
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- ’shift_forward’ will shift the nonexistent time forward to the
closest existing time
- ’shift_backward’ will shift the nonexistent time backward to the
closest existing time
- ’NaT’ will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- ’raise’ will raise a NonExistentTimeError if there are
nonexistent times.
* **Returns:**
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
* **Return type:**
DatetimeIndex, TimedeltaIndex, or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
**ValueError** – If the freq cannot be converted.
### Notes
If the timestamps have a timezone, flooring will take place relative to the
local (“wall”) time and re-localized to the same timezone. When flooring
near daylight savings time, use `nonexistent` and `ambiguous` to
control the re-localization behavior.
### Examples
**DatetimeIndex**
```pycon
>>> import maxframe.dataframe as md
>>> rng = md.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng.execute()
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='min')
>>> rng.floor('h').execute()
DatetimeIndex(['2018-01-01 11:00:00', '2018-01-01 12:00:00',
'2018-01-01 12:00:00'],
dtype='datetime64[ns]', freq=None)
```
**Series**
```pycon
>>> md.Series(rng).dt.floor("h").execute()
0 2018-01-01 11:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
dtype: datetime64[ns]
```
When rounding near a daylight savings time transition, use `ambiguous` or
`nonexistent` to control how the timestamp should be re-localized.
```pycon
>>> rng_tz = md.DatetimeIndex(["2021-10-31 03:30:00"], tz="Europe/Amsterdam")
```
```pycon
>>> rng_tz.floor("2h", ambiguous=False).execute()
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
```pycon
>>> rng_tz.floor("2h", ambiguous=True).execute()
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
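Flooring to a fixed frequency drops the remainder past the last multiple of that frequency. A minimal sketch, assuming naive wall-clock timestamps and multiples counted from the Unix epoch; `floor_to` is a hypothetical helper and does not model the re-localization step or the `ambiguous` and `nonexistent` options.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def floor_to(ts, freq):
    """Round a naive datetime down to the previous multiple of `freq`."""
    return ts - (ts - EPOCH) % freq


hourly = [
    floor_to(ts, timedelta(hours=1))
    for ts in (
        datetime(2018, 1, 1, 11, 59),
        datetime(2018, 1, 1, 12, 0),
        datetime(2018, 1, 1, 12, 1),
    )
]
two_hourly = floor_to(datetime(2021, 10, 31, 3, 30), timedelta(hours=2))
print(hourly, two_hourly)
```

The `2h` case lands on 02:00 wall time, which is the hour that occurs twice on that date in Europe/Amsterdam; that is why the example above needs `ambiguous` to pick an offset.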
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.hour.md
# maxframe.dataframe.Series.dt.hour
#### Series.dt.hour
The hours of the datetime.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> datetime_series = md.Series(
... md.date_range("2000-01-01", periods=3, freq="h")
... )
>>> datetime_series.execute()
0 2000-01-01 00:00:00
1 2000-01-01 01:00:00
2 2000-01-01 02:00:00
dtype: datetime64[ns]
>>> datetime_series.dt.hour.execute()
0 0
1 1
2 2
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.is_leap_year.md
# maxframe.dataframe.Series.dt.is_leap_year
#### Series.dt.is_leap_year
Boolean indicator if the date belongs to a leap year.
A leap year is a year, which has 366 days (instead of 365) including
29th of February as an intercalary day.
Leap years are years which are multiples of four with the exception
of years divisible by 100 but not by 400.
* **Returns:**
Booleans indicating if dates belong to a leap year.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or ndarray
### Examples
This method is available on Series with datetime values under
the `.dt` accessor, and directly on DatetimeIndex.
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.date_range("2012-01-01", "2015-01-01", freq="YE")
>>> idx.execute()
DatetimeIndex(['2012-12-31', '2013-12-31', '2014-12-31'],
dtype='datetime64[ns]', freq='YE-DEC')
>>> idx.is_leap_year.execute()
array([ True, False, False])
```
```pycon
>>> dates_series = md.Series(idx)
>>> dates_series.execute()
0 2012-12-31
1 2013-12-31
2 2014-12-31
dtype: datetime64[ns]
>>> dates_series.dt.is_leap_year.execute()
0 True
1 False
2 False
dtype: bool
```
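The leap-year rule stated above (multiples of four, except years divisible by 100 but not by 400) translates directly into a predicate. The sketch below uses a hypothetical `is_leap_year` function and cross-checks it against the standard library's `calendar.isleap`.

```python
import calendar


def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


flags = [is_leap_year(year) for year in (2012, 2013, 2014)]
print(flags)

# The hand-written rule agrees with the standard library over a wide range.
agrees = all(is_leap_year(y) == calendar.isleap(y) for y in range(1600, 2401))
print(agrees)
```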
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.is_month_end.md
# maxframe.dataframe.Series.dt.is_month_end
#### Series.dt.is_month_end
Indicates whether the date is the last day of the month.
* **Returns:**
For Series, returns a Series with boolean values.
For DatetimeIndex, returns a boolean array.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or array
#### SEE ALSO
[`is_month_start`](maxframe.dataframe.Series.dt.is_month_start.md#maxframe.dataframe.Series.dt.is_month_start)
: Return a boolean indicating whether the date is the first day of the month.
[`is_month_end`](#maxframe.dataframe.Series.dt.is_month_end)
: Return a boolean indicating whether the date is the last day of the month.
### Examples
This method is available on Series with datetime values under
the `.dt` accessor, and directly on DatetimeIndex.
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(md.date_range("2018-02-27", periods=3))
>>> s.execute()
0 2018-02-27
1 2018-02-28
2 2018-03-01
dtype: datetime64[ns]
>>> s.dt.is_month_start.execute()
0 False
1 False
2 True
dtype: bool
>>> s.dt.is_month_end.execute()
0 False
1 True
2 False
dtype: bool
```
```pycon
>>> idx = md.date_range("2018-02-27", periods=3)
>>> idx.is_month_start.execute()
array([False, False, True])
>>> idx.is_month_end.execute()
array([False, True, False])
```
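Both month-boundary indicators reduce to simple date checks. A plain-Python sketch with hypothetical helper names, intended only to illustrate the definitions:

```python
from datetime import date, timedelta


def is_month_start(day):
    return day.day == 1


def is_month_end(day):
    # The last day of a month is the one whose successor falls in another month.
    return (day + timedelta(days=1)).month != day.month


days = [date(2018, 2, 27), date(2018, 2, 28), date(2018, 3, 1)]
starts = [is_month_start(day) for day in days]
ends = [is_month_end(day) for day in days]
print(starts, ends)
```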
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.is_month_start.md
# maxframe.dataframe.Series.dt.is_month_start
#### Series.dt.is_month_start
Indicates whether the date is the first day of the month.
* **Returns:**
For Series, returns a Series with boolean values.
For DatetimeIndex, returns a boolean array.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or array
#### SEE ALSO
[`is_month_start`](#maxframe.dataframe.Series.dt.is_month_start)
: Return a boolean indicating whether the date is the first day of the month.
[`is_month_end`](maxframe.dataframe.Series.dt.is_month_end.md#maxframe.dataframe.Series.dt.is_month_end)
: Return a boolean indicating whether the date is the last day of the month.
### Examples
This method is available on Series with datetime values under
the `.dt` accessor, and directly on DatetimeIndex.
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(md.date_range("2018-02-27", periods=3))
>>> s.execute()
0 2018-02-27
1 2018-02-28
2 2018-03-01
dtype: datetime64[ns]
>>> s.dt.is_month_start.execute()
0 False
1 False
2 True
dtype: bool
>>> s.dt.is_month_end.execute()
0 False
1 True
2 False
dtype: bool
```
```pycon
>>> idx = md.date_range("2018-02-27", periods=3)
>>> idx.is_month_start.execute()
array([False, False, True])
>>> idx.is_month_end.execute()
array([False, True, False])
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.is_quarter_end.md
# maxframe.dataframe.Series.dt.is_quarter_end
#### Series.dt.is_quarter_end
Indicator for whether the date is the last day of a quarter.
* **Returns:**
**is_quarter_end** – The same type as the original data with boolean values. Series will
have the same name and index. DatetimeIndex will have the same
name.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or DatetimeIndex
#### SEE ALSO
[`quarter`](maxframe.dataframe.Series.dt.quarter.md#maxframe.dataframe.Series.dt.quarter)
: Return the quarter of the date.
[`is_quarter_start`](maxframe.dataframe.Series.dt.is_quarter_start.md#maxframe.dataframe.Series.dt.is_quarter_start)
: Similar property indicating the quarter start.
### Examples
This method is available on Series with datetime values under
the `.dt` accessor, and directly on DatetimeIndex.
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'dates': md.date_range("2017-03-30",
... periods=4)})
>>> df.assign(quarter=df.dates.dt.quarter,
... is_quarter_end=df.dates.dt.is_quarter_end).execute()
dates quarter is_quarter_end
0 2017-03-30 1 False
1 2017-03-31 1 True
2 2017-04-01 2 False
3 2017-04-02 2 False
```
```pycon
>>> idx = md.date_range('2017-03-30', periods=4)
>>> idx.execute()
DatetimeIndex(['2017-03-30', '2017-03-31', '2017-04-01', '2017-04-02'],
dtype='datetime64[ns]', freq='D')
```
```pycon
>>> idx.is_quarter_end.execute()
array([False, True, False, False])
```
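The quarter number and the quarter-end flag follow from the month and day alone. The following plain-Python sketch uses hypothetical helpers and assumes calendar quarters ending in March, June, September and December, as in the example above.

```python
from datetime import date, timedelta


def quarter(day):
    return (day.month - 1) // 3 + 1


def is_quarter_end(day):
    # Last day of a month that closes a calendar quarter.
    next_day = day + timedelta(days=1)
    return day.month % 3 == 0 and next_day.month != day.month


days = [date(2017, 3, 30) + timedelta(days=offset) for offset in range(4)]
quarters = [quarter(day) for day in days]
quarter_ends = [is_quarter_end(day) for day in days]
print(quarters, quarter_ends)
```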
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.is_quarter_start.md
# maxframe.dataframe.Series.dt.is_quarter_start
#### Series.dt.is_quarter_start
Indicator for whether the date is the first day of a quarter.
* **Returns:**
**is_quarter_start** – The same type as the original data with boolean values. Series will
have the same name and index. DatetimeIndex will have the same
name.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or DatetimeIndex
#### SEE ALSO
[`quarter`](maxframe.dataframe.Series.dt.quarter.md#maxframe.dataframe.Series.dt.quarter)
: Return the quarter of the date.
[`is_quarter_end`](maxframe.dataframe.Series.dt.is_quarter_end.md#maxframe.dataframe.Series.dt.is_quarter_end)
: Similar property for indicating the quarter end.
### Examples
This method is available on Series with datetime values under
the `.dt` accessor, and directly on DatetimeIndex.
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'dates': md.date_range("2017-03-30",
... periods=4)})
>>> df.assign(quarter=df.dates.dt.quarter,
... is_quarter_start=df.dates.dt.is_quarter_start).execute()
dates quarter is_quarter_start
0 2017-03-30 1 False
1 2017-03-31 1 False
2 2017-04-01 2 True
3 2017-04-02 2 False
```
```pycon
>>> idx = md.date_range('2017-03-30', periods=4)
>>> idx.execute()
DatetimeIndex(['2017-03-30', '2017-03-31', '2017-04-01', '2017-04-02'],
dtype='datetime64[ns]', freq='D')
```
```pycon
>>> idx.is_quarter_start.execute()
array([False, False, True, False])
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.is_year_end.md
# maxframe.dataframe.Series.dt.is_year_end
#### Series.dt.is_year_end
Indicate whether the date is the last day of the year.
* **Returns:**
The same type as the original data with boolean values. Series will
have the same name and index. DatetimeIndex will have the same
name.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or DatetimeIndex
#### SEE ALSO
[`is_year_start`](maxframe.dataframe.Series.dt.is_year_start.md#maxframe.dataframe.Series.dt.is_year_start)
: Similar property indicating the start of the year.
### Examples
This method is available on Series with datetime values under
the `.dt` accessor, and directly on DatetimeIndex.
```pycon
>>> import maxframe.dataframe as md
>>> dates = md.Series(md.date_range("2017-12-30", periods=3))
>>> dates.execute()
0 2017-12-30
1 2017-12-31
2 2018-01-01
dtype: datetime64[ns]
```
```pycon
>>> dates.dt.is_year_end.execute()
0 False
1 True
2 False
dtype: bool
```
```pycon
>>> idx = md.date_range("2017-12-30", periods=3)
>>> idx.execute()
DatetimeIndex(['2017-12-30', '2017-12-31', '2018-01-01'],
dtype='datetime64[ns]', freq='D')
```
```pycon
>>> idx.is_year_end.execute()
array([False, True, False])
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.is_year_start.md
# maxframe.dataframe.Series.dt.is_year_start
#### Series.dt.is_year_start
Indicate whether the date is the first day of a year.
* **Returns:**
The same type as the original data with boolean values. Series will
have the same name and index. DatetimeIndex will have the same
name.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or DatetimeIndex
#### SEE ALSO
[`is_year_end`](maxframe.dataframe.Series.dt.is_year_end.md#maxframe.dataframe.Series.dt.is_year_end)
: Similar property indicating the last day of the year.
### Examples
This method is available on Series with datetime values under
the `.dt` accessor, and directly on DatetimeIndex.
```pycon
>>> import maxframe.dataframe as md
>>> dates = md.Series(md.date_range("2017-12-30", periods=3))
>>> dates.execute()
0 2017-12-30
1 2017-12-31
2 2018-01-01
dtype: datetime64[ns]
```
```pycon
>>> dates.dt.is_year_start.execute()
0 False
1 False
2 True
dtype: bool
```
```pycon
>>> idx = md.date_range("2017-12-30", periods=3)
>>> idx.execute()
DatetimeIndex(['2017-12-30', '2017-12-31', '2018-01-01'],
dtype='datetime64[ns]', freq='D')
```
```pycon
>>> idx.is_year_start.execute()
array([False, False, True])
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.md
# maxframe.dataframe.Series.dt
#### Series.dt()
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.microsecond.md
# maxframe.dataframe.Series.dt.microsecond
#### Series.dt.microsecond
The microseconds of the datetime.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> datetime_series = md.Series(
... md.date_range("2000-01-01", periods=3, freq="us")
... )
>>> datetime_series.execute()
0 2000-01-01 00:00:00.000000
1 2000-01-01 00:00:00.000001
2 2000-01-01 00:00:00.000002
dtype: datetime64[ns]
>>> datetime_series.dt.microsecond.execute()
0 0
1 1
2 2
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.minute.md
# maxframe.dataframe.Series.dt.minute
#### Series.dt.minute
The minutes of the datetime.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> datetime_series = md.Series(
... md.date_range("2000-01-01", periods=3, freq="min")
... )
>>> datetime_series.execute()
0 2000-01-01 00:00:00
1 2000-01-01 00:01:00
2 2000-01-01 00:02:00
dtype: datetime64[ns]
>>> datetime_series.dt.minute.execute()
0 0
1 1
2 2
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.month.md
# maxframe.dataframe.Series.dt.month
#### Series.dt.month
The month as January=1, December=12.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> datetime_series = md.Series(
... md.date_range("2000-01-01", periods=3, freq="ME")
... )
>>> datetime_series.execute()
0 2000-01-31
1 2000-02-29
2 2000-03-31
dtype: datetime64[ns]
>>> datetime_series.dt.month.execute()
0 1
1 2
2 3
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.month_name.md
# maxframe.dataframe.Series.dt.month_name
#### Series.dt.month_name(\*args, \*\*kwargs)
Return the month names with specified locale.
* **Parameters:**
**locale** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Locale determining the language in which to return the month name.
Default is English locale (`'en_US.utf8'`). Use the command
`locale -a` on your terminal on Unix systems to find your locale
language code.
* **Returns:**
Series or Index of month names.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(md.date_range(start='2018-01', freq='ME', periods=3))
>>> s.execute()
0 2018-01-31
1 2018-02-28
2 2018-03-31
dtype: datetime64[ns]
>>> s.dt.month_name().execute()
0 January
1 February
2 March
dtype: object
```
```pycon
>>> idx = md.date_range(start='2018-01', freq='ME', periods=3)
>>> idx.execute()
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31'],
dtype='datetime64[ns]', freq='ME')
>>> idx.month_name().execute()
Index(['January', 'February', 'March'], dtype='object')
```
Using the `locale` parameter you can set a different locale language,
for example: `idx.month_name(locale='pt_BR.utf8')` will return month
names in Brazilian Portuguese language.
```pycon
>>> idx = md.date_range(start='2018-01', freq='ME', periods=3)
>>> idx.execute()
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31'],
dtype='datetime64[ns]', freq='ME')
>>> idx.month_name(locale='pt_BR.utf8').execute()
Index(['Janeiro', 'Fevereiro', 'Março'], dtype='object')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.nanosecond.md
# maxframe.dataframe.Series.dt.nanosecond
#### Series.dt.nanosecond
The nanoseconds of the datetime.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> datetime_series = md.Series(
... md.date_range("2000-01-01", periods=3, freq="ns")
... )
>>> datetime_series.execute()
0 2000-01-01 00:00:00.000000000
1 2000-01-01 00:00:00.000000001
2 2000-01-01 00:00:00.000000002
dtype: datetime64[ns]
>>> datetime_series.dt.nanosecond.execute()
0 0
1 1
2 2
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.normalize.md
# maxframe.dataframe.Series.dt.normalize
#### Series.dt.normalize(\*args, \*\*kwargs)
Convert times to midnight.
The time component of the date-time is converted to midnight i.e.
00:00:00. This is useful in cases when the time does not matter.
Length is unaltered. The timezones are unaffected.
This method is available on Series with datetime values under
the `.dt` accessor, and directly on Datetime Array/Index.
* **Returns:**
The same type as the original data. Series will have the same
name and index. DatetimeIndex will have the same name.
* **Return type:**
DatetimeArray, DatetimeIndex or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`floor`](maxframe.dataframe.Series.dt.floor.md#maxframe.dataframe.Series.dt.floor)
: Floor the datetimes to the specified freq.
[`ceil`](maxframe.dataframe.Series.dt.ceil.md#maxframe.dataframe.Series.dt.ceil)
: Ceil the datetimes to the specified freq.
[`round`](maxframe.dataframe.Series.dt.round.md#maxframe.dataframe.Series.dt.round)
: Round the datetimes to the specified freq.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.date_range(start='2014-08-01 10:00', freq='h',
... periods=3, tz='Asia/Calcutta')
>>> idx.execute()
DatetimeIndex(['2014-08-01 10:00:00+05:30',
'2014-08-01 11:00:00+05:30',
'2014-08-01 12:00:00+05:30'],
dtype='datetime64[ns, Asia/Calcutta]', freq='h')
>>> idx.normalize().execute()
DatetimeIndex(['2014-08-01 00:00:00+05:30',
'2014-08-01 00:00:00+05:30',
'2014-08-01 00:00:00+05:30'],
dtype='datetime64[ns, Asia/Calcutta]', freq=None)
```
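The same idea can be sketched for a single timestamp with the standard library. This is an illustration only: the fixed `+05:30` offset is an assumed stand-in for `Asia/Calcutta`, and resetting the time fields with `replace` is one plausible way to express "convert to midnight", not necessarily how MaxFrame does it.

```python
from datetime import datetime, timedelta, timezone

# Fixed +05:30 offset standing in for 'Asia/Calcutta' (an assumption for this sketch).
ist = timezone(timedelta(hours=5, minutes=30))
stamps = [datetime(2014, 8, 1, 10 + i, 0, tzinfo=ist) for i in range(3)]

# Reset the time fields to midnight; the tzinfo is left untouched.
normalized = [s.replace(hour=0, minute=0, second=0, microsecond=0) for s in stamps]
print(normalized[0].isoformat())
```

The length of the list is unchanged and every element keeps its original offset, as described above.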
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.quarter.md
# maxframe.dataframe.Series.dt.quarter
#### Series.dt.quarter
The quarter of the date.
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["1/1/2020 10:00:00+00:00", "4/1/2020 11:00:00+00:00"])
>>> s = md.to_datetime(s)
>>> s.execute()
0 2020-01-01 10:00:00+00:00
1 2020-04-01 11:00:00+00:00
dtype: datetime64[ns, UTC]
>>> s.dt.quarter.execute()
0 1
1 2
dtype: int32
```
For DatetimeIndex:
```pycon
>>> idx = md.DatetimeIndex(["1/1/2020 10:00:00+00:00",
... "2/1/2020 11:00:00+00:00"])
>>> idx.quarter.execute()
Index([1, 1], dtype='int32')
```
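For reference, the quarter number can be derived from the month with integer arithmetic. The sketch below is an illustration with a hypothetical helper name, assuming calendar quarters that start in January.

```python
def quarter_of(month: int) -> int:
    """Hypothetical helper: map a month number (1-12) to its calendar quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return (month - 1) // 3 + 1


# January, February, April, and December.
quarters = [quarter_of(m) for m in (1, 2, 4, 12)]
print(quarters)
```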
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.round.md
# maxframe.dataframe.Series.dt.round
#### Series.dt.round(\*args, \*\*kwargs)
Perform round operation on the data to the specified freq.
* **Parameters:**
* **freq** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *Offset*) – The frequency level to round the index to. Must be a fixed
frequency like ‘s’ (second), not ‘ME’ (month end). See
[frequency aliases](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases) for
a list of possible freq values.
* **ambiguous** ( *'infer'* *,* *bool-ndarray* *,* *'NaT'* *,* *default 'raise'*) –
Only relevant for DatetimeIndex:
- ‘infer’ will attempt to infer fall DST-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
a non-DST time (note that this flag is only applicable for
ambiguous times)
- ‘NaT’ will return NaT where there are ambiguous times
- ‘raise’ will raise an AmbiguousTimeError if there are ambiguous
times.
* **nonexistent** ( *'shift_forward'* *,* *'shift_backward'* *,* *'NaT'* *,* *timedelta* *,* *default 'raise'*) –
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- ‘shift_forward’ will shift the nonexistent time forward to the
closest existing time
- ‘shift_backward’ will shift the nonexistent time backward to the
closest existing time
- ‘NaT’ will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- ‘raise’ will raise a NonExistentTimeError if there are
nonexistent times.
* **Returns:**
Index of the same type for a DatetimeIndex or TimedeltaIndex,
or a Series with the same index for a Series.
* **Return type:**
DatetimeIndex, TimedeltaIndex, or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If the freq cannot be converted.
### Notes
If the timestamps have a timezone, rounding will take place relative to the
local (“wall”) time and re-localized to the same timezone. When rounding
near daylight savings time, use `nonexistent` and `ambiguous` to
control the re-localization behavior.
### Examples
**DatetimeIndex**
```pycon
>>> import maxframe.dataframe as md
>>> rng = md.date_range('1/1/2018 11:59:00', periods=3, freq='min')
>>> rng.execute()
DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',
'2018-01-01 12:01:00'],
dtype='datetime64[ns]', freq='min')
>>> rng.round('h').execute()
DatetimeIndex(['2018-01-01 12:00:00', '2018-01-01 12:00:00',
'2018-01-01 12:00:00'],
dtype='datetime64[ns]', freq=None)
```
**Series**
```pycon
>>> md.Series(rng).dt.round("h").execute()
0 2018-01-01 12:00:00
1 2018-01-01 12:00:00
2 2018-01-01 12:00:00
dtype: datetime64[ns]
```
When rounding near a daylight savings time transition, use `ambiguous` or
`nonexistent` to control how the timestamp should be re-localized.
```pycon
>>> rng_tz = md.DatetimeIndex(["2021-10-31 03:30:00"], tz="Europe/Amsterdam")
```
```pycon
>>> rng_tz.floor("2h", ambiguous=False).execute()
DatetimeIndex(['2021-10-31 02:00:00+01:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
```pycon
>>> rng_tz.floor("2h", ambiguous=True).execute()
DatetimeIndex(['2021-10-31 02:00:00+02:00'],
dtype='datetime64[ns, Europe/Amsterdam]', freq=None)
```
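To make the rounding arithmetic concrete, here is a standard-library sketch for naive timestamps. It is not MaxFrame's implementation: the helper name `round_to` is hypothetical, the Unix epoch is an assumed origin for the fixed-frequency grid, and sending exact ties to the even multiple is an assumption about the tie rule.

```python
from datetime import datetime, timedelta

ONE_US = timedelta(microseconds=1)


def round_to(dt: datetime, step: timedelta) -> datetime:
    """Hypothetical helper: round a naive datetime to the nearest multiple of step."""
    origin = datetime(1970, 1, 1)
    # Work in whole microseconds to avoid floating-point error.
    elapsed = (dt - origin) // ONE_US
    unit = step // ONE_US
    quotient, remainder = divmod(elapsed, unit)
    # Round up when past the midpoint; exact ties go to the even multiple (assumption).
    if remainder * 2 > unit or (remainder * 2 == unit and quotient % 2 == 1):
        quotient += 1
    return origin + timedelta(microseconds=quotient * unit)


start = datetime(2018, 1, 1, 11, 59)
rng = [start + timedelta(minutes=i) for i in range(3)]
rounded = [round_to(ts, timedelta(hours=1)) for ts in rng]
print(rounded)
```

All three inputs (11:59, 12:00, 12:01) land on 12:00, as in the `DatetimeIndex` example above.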
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.second.md
# maxframe.dataframe.Series.dt.second
#### Series.dt.second
The seconds of the datetime.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> datetime_series = md.Series(
... md.date_range("2000-01-01", periods=3, freq="s")
... )
>>> datetime_series.execute()
0 2000-01-01 00:00:00
1 2000-01-01 00:00:01
2 2000-01-01 00:00:02
dtype: datetime64[ns]
>>> datetime_series.dt.second.execute()
0 0
1 1
2 2
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.strftime.md
# maxframe.dataframe.Series.dt.strftime
#### Series.dt.strftime(\*args, \*\*kwargs)
Convert to Index using specified date_format.
Return an Index of formatted strings specified by date_format, which
supports the same string format as the python standard library. Details
of the string format can be found in [python string format
doc](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
Formats supported by the C strftime API but not by the python string format
doc (such as “%R”, “%r”) are not officially supported and should be
preferably replaced with their supported equivalents (such as “%H:%M”,
“%I:%M:%S %p”).
Note that PeriodIndex supports additional directives, detailed in
Period.strftime.
* **Parameters:**
**date_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Date format string (e.g. “%Y-%m-%d”).
* **Returns:**
NumPy ndarray of formatted strings.
* **Return type:**
ndarray[[object](https://docs.python.org/3/library/functions.html#object)]
#### SEE ALSO
[`to_datetime`](maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)
: Convert the given argument to datetime.
`DatetimeIndex.normalize`
: Return DatetimeIndex with times to midnight.
`DatetimeIndex.round`
: Round the DatetimeIndex to the specified freq.
`DatetimeIndex.floor`
: Floor the DatetimeIndex to the specified freq.
`Timestamp.strftime`
: Format a single Timestamp.
`Period.strftime`
: Format a single Period.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> rng = md.date_range(md.Timestamp("2018-03-10 09:00"),
... periods=3, freq='s')
>>> rng.strftime('%B %d, %Y, %r').execute()
Index(['March 10, 2018, 09:00:00 AM', 'March 10, 2018, 09:00:01 AM',
'March 10, 2018, 09:00:02 AM'],
dtype='object')
```
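Since `%r` is among the directives described above as not officially supported, the same output can be produced with the portable equivalent `%I:%M:%S %p`. The sketch below uses plain `datetime` objects rather than MaxFrame, and assumes the default (English) C locale for month names and the AM/PM marker.

```python
from datetime import datetime, timedelta

start = datetime(2018, 3, 10, 9, 0)
stamps = [start + timedelta(seconds=i) for i in range(3)]

# "%I:%M:%S %p" is the portable spelling of the "%r" directive used above.
formatted = [s.strftime("%B %d, %Y, %I:%M:%S %p") for s in stamps]
for line in formatted:
    print(line)
```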
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.time.md
# maxframe.dataframe.Series.dt.time
#### Series.dt.time
Returns numpy array of [`datetime.time`](https://docs.python.org/3/library/datetime.html#datetime.time) objects.
The time part of the Timestamps.
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["1/1/2020 10:00:00+00:00", "2/1/2020 11:00:00+00:00"])
>>> s = md.to_datetime(s)
>>> s.execute()
0 2020-01-01 10:00:00+00:00
1 2020-02-01 11:00:00+00:00
dtype: datetime64[ns, UTC]
>>> s.dt.time.execute()
0 10:00:00
1 11:00:00
dtype: object
```
For DatetimeIndex:
```pycon
>>> idx = md.DatetimeIndex(["1/1/2020 10:00:00+00:00",
... "2/1/2020 11:00:00+00:00"])
>>> idx.time.execute()
array([datetime.time(10, 0), datetime.time(11, 0)], dtype=object)
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.timetz.md
# maxframe.dataframe.Series.dt.timetz
#### Series.dt.timetz
Returns numpy array of [`datetime.time`](https://docs.python.org/3/library/datetime.html#datetime.time) objects with timezones.
The time part of the Timestamps.
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["1/1/2020 10:00:00+00:00", "2/1/2020 11:00:00+00:00"])
>>> s = md.to_datetime(s)
>>> s.execute()
0 2020-01-01 10:00:00+00:00
1 2020-02-01 11:00:00+00:00
dtype: datetime64[ns, UTC]
>>> s.dt.timetz.execute()
0 10:00:00+00:00
1 11:00:00+00:00
dtype: object
```
For DatetimeIndex:
```pycon
>>> idx = md.DatetimeIndex(["1/1/2020 10:00:00+00:00",
... "2/1/2020 11:00:00+00:00"])
>>> idx.timetz.execute()
array([datetime.time(10, 0, tzinfo=datetime.timezone.utc),
datetime.time(11, 0, tzinfo=datetime.timezone.utc)], dtype=object)
```
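The difference between `time` and `timetz` mirrors the two methods of the same names on the standard library's `datetime`. The following sketch shows that distinction on a single value; it illustrates the semantics only and does not use MaxFrame.

```python
from datetime import datetime, timezone

stamp = datetime(2020, 1, 1, 10, 0, tzinfo=timezone.utc)

naive_time = stamp.time()    # time of day with the time zone dropped
aware_time = stamp.timetz()  # time of day with the time zone retained

print(naive_time, aware_time)
```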
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.to_period.md
# maxframe.dataframe.Series.dt.to_period
#### Series.dt.to_period(\*args, \*\*kwargs)
Cast to PeriodArray/PeriodIndex at a particular frequency.
Converts DatetimeArray/Index to PeriodArray/PeriodIndex.
* **Parameters:**
**freq** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *Period* *,* *optional*) – One of pandas’ [period aliases](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-period-aliases)
or a Period object. Will be inferred by default.
* **Return type:**
PeriodArray/PeriodIndex
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – When converting a DatetimeArray/Index with non-regular values,
so that a frequency cannot be inferred.
#### SEE ALSO
`PeriodIndex`
: Immutable ndarray holding ordinal values.
`DatetimeIndex.to_pydatetime`
: Return DatetimeIndex as object.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({"y": [1, 2, 3]},
... index=md.to_datetime(["2000-03-31 00:00:00",
... "2000-05-31 00:00:00",
... "2000-08-31 00:00:00"]))
>>> df.index.to_period("M").execute()
PeriodIndex(['2000-03', '2000-05', '2000-08'],
dtype='period[M]')
```
Infer the daily frequency:
```pycon
>>> idx = md.date_range("2017-01-01", periods=2)
>>> idx.to_period().execute()
PeriodIndex(['2017-01-01', '2017-01-02'],
dtype='period[D]')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.to_pydatetime.md
# maxframe.dataframe.Series.dt.to_pydatetime
#### Series.dt.to_pydatetime() → [ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray)
Return the data as an array of [`datetime.datetime`](https://docs.python.org/3/library/datetime.html#datetime.datetime) objects.
#### Deprecated
Deprecated since version 2.1.0: The current behavior of dt.to_pydatetime is deprecated.
In a future version this will return a Series containing python
datetime objects instead of an ndarray.
Timezone information is retained if present.
#### WARNING
Python’s datetime uses microsecond resolution, which is lower than
pandas (nanosecond). The values are truncated.
* **Returns:**
Object dtype array containing native Python datetime objects.
* **Return type:**
[numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray)
#### SEE ALSO
[`datetime.datetime`](https://docs.python.org/3/library/datetime.html#datetime.datetime)
: Standard library value for a datetime.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(md.date_range('20180310', periods=2))
>>> s.execute()
0 2018-03-10
1 2018-03-11
dtype: datetime64[ns]
```
```pycon
>>> s.dt.to_pydatetime().execute()
array([datetime.datetime(2018, 3, 10, 0, 0),
datetime.datetime(2018, 3, 11, 0, 0)], dtype=object)
```
pandas’ nanosecond precision is truncated to microseconds.
```pycon
>>> s = md.Series(md.date_range('20180310', periods=2, freq='ns'))
>>> s.execute()
0 2018-03-10 00:00:00.000000000
1 2018-03-10 00:00:00.000000001
dtype: datetime64[ns]
```
```pycon
>>> s.dt.to_pydatetime().execute()
array([datetime.datetime(2018, 3, 10, 0, 0),
datetime.datetime(2018, 3, 10, 0, 0)], dtype=object)
```
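The truncation can be reproduced with integer arithmetic. In this sketch the helper `ns_to_datetime` is hypothetical, and representing a timestamp as integer nanoseconds since the Unix epoch is an assumption made for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ns_to_datetime(ns_since_epoch: int) -> datetime:
    """Hypothetical helper: convert integer nanoseconds since the epoch to a naive datetime."""
    # Floor division discards the sub-microsecond remainder.
    return EPOCH + timedelta(microseconds=ns_since_epoch // 1000)


# 2018-03-10 00:00:00 expressed as nanoseconds since the epoch.
base_ns = ((datetime(2018, 3, 10) - EPOCH) // timedelta(microseconds=1)) * 1000

converted = [ns_to_datetime(base_ns), ns_to_datetime(base_ns + 1)]
print(converted)
```

The second value differs from the first by one nanosecond, yet both convert to the same `datetime.datetime`, matching the output above.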
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.tz_convert.md
# maxframe.dataframe.Series.dt.tz_convert
#### Series.dt.tz_convert(\*args, \*\*kwargs)
Convert tz-aware Datetime Array/Index from one time zone to another.
* **Parameters:**
**tz** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *pytz.timezone* *,* *dateutil.tz.tzfile* *,* [*datetime.tzinfo*](https://docs.python.org/3/library/datetime.html#datetime.tzinfo) *or* *None*) – Time zone for time. Corresponding timestamps would be converted
to this time zone of the Datetime Array/Index. A tz of None will
convert to UTC and remove the timezone information.
* **Return type:**
Array or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
* **Raises:**
[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – If Datetime Array/Index is tz-naive.
#### SEE ALSO
`DatetimeIndex.tz`
: A timezone that has a variable offset from UTC.
`DatetimeIndex.tz_localize`
: Localize tz-naive DatetimeIndex to a given time zone, or remove timezone from a tz-aware DatetimeIndex.
### Examples
With the tz parameter, we can change the DatetimeIndex
to other time zones:
```pycon
>>> import maxframe.dataframe as md
>>> dti = md.date_range(start='2014-08-01 09:00',
... freq='h', periods=3, tz='Europe/Berlin')
```
```pycon
>>> dti.execute()
DatetimeIndex(['2014-08-01 09:00:00+02:00',
'2014-08-01 10:00:00+02:00',
'2014-08-01 11:00:00+02:00'],
dtype='datetime64[ns, Europe/Berlin]', freq='h')
```
```pycon
>>> dti.tz_convert('US/Central').execute()
DatetimeIndex(['2014-08-01 02:00:00-05:00',
'2014-08-01 03:00:00-05:00',
'2014-08-01 04:00:00-05:00'],
dtype='datetime64[ns, US/Central]', freq='h')
```
With `tz=None`, we can remove the timezone (after converting
to UTC if necessary):
```pycon
>>> dti = md.date_range(start='2014-08-01 09:00', freq='h',
... periods=3, tz='Europe/Berlin')
```
```pycon
>>> dti.execute()
DatetimeIndex(['2014-08-01 09:00:00+02:00',
'2014-08-01 10:00:00+02:00',
'2014-08-01 11:00:00+02:00'],
dtype='datetime64[ns, Europe/Berlin]', freq='h')
```
```pycon
>>> dti.tz_convert(None).execute()
DatetimeIndex(['2014-08-01 07:00:00',
'2014-08-01 08:00:00',
'2014-08-01 09:00:00'],
dtype='datetime64[ns]', freq='h')
```
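Both behaviours can be sketched on a single timestamp with the standard library. The fixed offsets below are assumed stand-ins for `Europe/Berlin` and `US/Central` in August; real time zones have offsets that vary over the year.

```python
from datetime import datetime, timedelta, timezone

berlin_summer = timezone(timedelta(hours=2))    # stand-in for Europe/Berlin in August
central_summer = timezone(timedelta(hours=-5))  # stand-in for US/Central in August

stamp = datetime(2014, 8, 1, 9, 0, tzinfo=berlin_summer)

# Conversion changes the wall-clock reading but not the instant in time.
converted = stamp.astimezone(central_summer)

# Equivalent of tz=None: convert to UTC, then drop the time zone information.
utc_naive = stamp.astimezone(timezone.utc).replace(tzinfo=None)

print(converted.isoformat(), utc_naive.isoformat())
```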
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.tz_localize.md
# maxframe.dataframe.Series.dt.tz_localize
#### Series.dt.tz_localize(\*args, \*\*kwargs)
Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index.
This method takes a time zone (tz) naive Datetime Array/Index object
and makes this time zone aware. It does not move the time to another
time zone.
This method can also be used to do the inverse – to create a time
zone unaware object from an aware object. To that end, pass tz=None.
* **Parameters:**
* **tz** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *pytz.timezone* *,* *dateutil.tz.tzfile* *,* [*datetime.tzinfo*](https://docs.python.org/3/library/datetime.html#datetime.tzinfo) *or* *None*) – Time zone to convert timestamps to. Passing `None` will
remove the time zone information preserving local time.
* **ambiguous** ( *'infer'* *,* *'NaT'* *,* *bool array* *,* *default 'raise'*) –
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
ambiguous parameter dictates how ambiguous times should be
handled.
- ‘infer’ will attempt to infer fall DST-transition hours based on
order
- bool-ndarray where True signifies a DST time, False signifies a
non-DST time (note that this flag is only applicable for
ambiguous times)
- ‘NaT’ will return NaT where there are ambiguous times
- ‘raise’ will raise an AmbiguousTimeError if there are ambiguous
times.
* **nonexistent** ( *'shift_forward'* *,* *'shift_backward'* *,* *'NaT'* *,* *timedelta* *,* *default 'raise'*) –
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.
- ‘shift_forward’ will shift the nonexistent time forward to the
closest existing time
- ‘shift_backward’ will shift the nonexistent time backward to the
closest existing time
- ‘NaT’ will return NaT where there are nonexistent times
- timedelta objects will shift nonexistent times by the timedelta
- ‘raise’ will raise a NonExistentTimeError if there are
nonexistent times.
* **Returns:**
Array/Index converted to the specified time zone.
* **Return type:**
Same type as self
* **Raises:**
[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – If the Datetime Array/Index is tz-aware and tz is not None.
#### SEE ALSO
`DatetimeIndex.tz_convert`
: Convert tz-aware DatetimeIndex from one time zone to another.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> tz_naive = md.date_range('2018-03-01 09:00', periods=3)
>>> tz_naive.execute()
DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',
'2018-03-03 09:00:00'],
dtype='datetime64[ns]', freq='D')
```
Localize DatetimeIndex in US/Eastern time zone:
```pycon
>>> tz_aware = tz_naive.tz_localize(tz='US/Eastern')
>>> tz_aware.execute()
DatetimeIndex(['2018-03-01 09:00:00-05:00',
'2018-03-02 09:00:00-05:00',
'2018-03-03 09:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq=None)
```
With `tz=None`, we can remove the time zone information
while keeping the local time (not converted to UTC):
```pycon
>>> tz_aware.tz_localize(None).execute()
DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',
'2018-03-03 09:00:00'],
dtype='datetime64[ns]', freq=None)
```
Be careful with DST changes. When there is sequential data, pandas can
infer the DST time:
```pycon
>>> s = md.to_datetime(md.Series(['2018-10-28 01:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 03:00:00',
... '2018-10-28 03:30:00']))
>>> s.dt.tz_localize('CET', ambiguous='infer').execute()
0 2018-10-28 01:30:00+02:00
1 2018-10-28 02:00:00+02:00
2 2018-10-28 02:30:00+02:00
3 2018-10-28 02:00:00+01:00
4 2018-10-28 02:30:00+01:00
5 2018-10-28 03:00:00+01:00
6 2018-10-28 03:30:00+01:00
dtype: datetime64[ns, CET]
```
In some cases, inferring the DST is impossible. In such cases, you can
pass an ndarray to the ambiguous parameter to set the DST explicitly:
```pycon
>>> s = md.to_datetime(md.Series(['2018-10-28 01:20:00',
... '2018-10-28 02:36:00',
... '2018-10-28 03:46:00']))
>>> s.dt.tz_localize('CET', ambiguous=mt.array([True, True, False])).execute()
0 2018-10-28 01:20:00+02:00
1 2018-10-28 02:36:00+02:00
2 2018-10-28 03:46:00+01:00
dtype: datetime64[ns, CET]
```
If the DST transition causes nonexistent times, you can shift these
dates forward or backward with a timedelta object, ‘shift_forward’,
or ‘shift_backward’.
```pycon
>>> s = md.to_datetime(md.Series(['2015-03-29 02:30:00',
... '2015-03-29 03:30:00']))
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_forward').execute()
0 2015-03-29 03:00:00+02:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
```
```pycon
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_backward').execute()
0 2015-03-29 01:59:59.999999999+01:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
```
```pycon
>>> s.dt.tz_localize('Europe/Warsaw', nonexistent=md.Timedelta('1h')).execute()
0 2015-03-29 03:30:00+02:00
1 2015-03-29 03:30:00+02:00
dtype: datetime64[ns, Europe/Warsaw]
```
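The ambiguity described for the `ambiguous` parameter can be checked by hand: the same wall-clock reading maps to two different UTC instants depending on which offset is assumed. The sketch below uses fixed offsets as assumed stand-ins for the DST (`+02:00`) and standard (`+01:00`) sides of the CET transition on 2018-10-28.

```python
from datetime import datetime, timedelta, timezone

cest = timezone(timedelta(hours=2))  # DST offset before the clocks go back
cet = timezone(timedelta(hours=1))   # standard offset after the clocks go back

wall = datetime(2018, 10, 28, 2, 30)  # wall-clock time that occurs twice

# Attaching an offset does not move the wall-clock reading.
as_dst = wall.replace(tzinfo=cest)      # the reading a True flag selects
as_standard = wall.replace(tzinfo=cet)  # the reading a False flag selects

utc_dst = as_dst.astimezone(timezone.utc)
utc_std = as_standard.astimezone(timezone.utc)
print(utc_dst.time(), utc_std.time())
```

The two interpretations are one hour apart in UTC (00:30 and 01:30), which is why the caller has to say which one is meant when it cannot be inferred.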
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.week.md
# maxframe.dataframe.Series.dt.week
#### Series.dt.week
The week ordinal of the year.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.PeriodIndex(["2023-01", "2023-02", "2023-03"], freq="M")
>>> idx.week.execute()  # It can be written `weekofyear`
Index([5, 9, 13], dtype='int64')
```
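The values above agree with ISO 8601 week numbers of the last day of each month, which can be checked with the standard library. That the week is taken from the end of each monthly period is an assumption inferred from the output, not something this page states.

```python
from datetime import date

# Last day of each monthly period in the example above (assumed reference day).
period_ends = [date(2023, 1, 31), date(2023, 2, 28), date(2023, 3, 31)]

# isocalendar() returns (ISO year, ISO week number, ISO weekday).
weeks = [d.isocalendar()[1] for d in period_ends]
print(weeks)
```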
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.weekday.md
# maxframe.dataframe.Series.dt.weekday
#### Series.dt.weekday
The day of the week with Monday=0, Sunday=6.
Return the day of the week. It is assumed the week starts on
Monday, which is denoted by 0 and ends on Sunday which is denoted
by 6. This method is available on both Series with datetime
values (using the dt accessor) or DatetimeIndex.
* **Returns:**
Containing integers indicating the day number.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
#### SEE ALSO
[`Series.dt.dayofweek`](maxframe.dataframe.Series.dt.dayofweek.md#maxframe.dataframe.Series.dt.dayofweek)
: Alias.
[`Series.dt.weekday`](#maxframe.dataframe.Series.dt.weekday)
: Alias.
[`Series.dt.day_name`](maxframe.dataframe.Series.dt.day_name.md#maxframe.dataframe.Series.dt.day_name)
: Returns the name of the day of the week.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.date_range('2016-12-31', '2017-01-08', freq='D').to_series()
>>> s.dt.dayofweek.execute()
2016-12-31 5
2017-01-01 6
2017-01-02 0
2017-01-03 1
2017-01-04 2
2017-01-05 3
2017-01-06 4
2017-01-07 5
2017-01-08 6
Freq: D, dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.weekofyear.md
# maxframe.dataframe.Series.dt.weekofyear
#### Series.dt.weekofyear
The week ordinal of the year.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> idx = md.PeriodIndex(["2023-01", "2023-02", "2023-03"], freq="M")
>>> idx.week.execute()  # It can be written `weekofyear`
Index([5, 9, 13], dtype='int64')
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dt.year.md
# maxframe.dataframe.Series.dt.year
#### Series.dt.year
The year of the datetime.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> datetime_series = md.Series(
... md.date_range("2000-01-01", periods=3, freq="YE")
... )
>>> datetime_series.execute()
0 2000-12-31
1 2001-12-31
2 2002-12-31
dtype: datetime64[ns]
>>> datetime_series.dt.year.execute()
0 2000
1 2001
2 2002
dtype: int32
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.dtype.md
# maxframe.dataframe.Series.dtype
#### *property* Series.dtype
Return the dtype object of the underlying data.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.eq.md
# maxframe.dataframe.Series.eq
#### Series.eq(other, level=None, fill_value=None, axis=0)
Return the equal-to comparison of series and other, element-wise (binary operator eq).
Equivalent to `series == other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.eq(b, fill_value=0).execute()
a True
b False
c False
d False
e False
dtype: bool
```
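The alignment and fill rules that produce this result can be spelled out in plain Python. This is a sketch of the semantics described above, using dictionaries in place of Series; the helper `eq_with_fill` is hypothetical and is not part of MaxFrame.

```python
import math


def eq_with_fill(a: dict, b: dict, fill_value: float) -> dict:
    """Hypothetical helper mimicking the alignment and fill rules described above."""

    def is_missing(series: dict, key: str) -> bool:
        value = series.get(key)
        return value is None or math.isnan(value)

    result = {}
    for key in sorted(set(a) | set(b)):
        a_missing, b_missing = is_missing(a, key), is_missing(b, key)
        if a_missing and b_missing:
            # Missing on both sides stays missing, so the comparison is False.
            result[key] = False
            continue
        left = fill_value if a_missing else a[key]
        right = fill_value if b_missing else b[key]
        result[key] = left == right
    return result


a = {"a": 1.0, "b": 1.0, "c": 1.0, "d": math.nan}
b = {"a": 1.0, "b": math.nan, "d": 1.0, "e": math.nan}
result = eq_with_fill(a, b, fill_value=0)
print(result)
```

Label `e` shows the "missing in both" case: it is absent from `a` and NaN in `b`, so no fill is applied and the comparison is False.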
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.ewm.md
# maxframe.dataframe.Series.ewm
#### Series.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)
Provide exponential weighted functions.
* **Parameters:**
* **com** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Specify decay in terms of center of mass,
$\alpha = 1 / (1 + com),\text{ for } com \geq 0$.
* **span** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Specify decay in terms of span,
$\alpha = 2 / (span + 1),\text{ for } span \geq 1$.
* **halflife** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Specify decay in terms of half-life,
$\alpha = 1 - exp(log(0.5) / halflife),\text{ for } halflife > 0$.
* **alpha** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Specify smoothing factor $\alpha$ directly,
$0 < \alpha \leq 1$.
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0*) – Minimum number of observations in window required to have a value
(otherwise result is NA).
* **adjust** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Divide by decaying adjustment factor in beginning periods to account
for imbalance in relative weightings
(viewing EWMA as a moving average).
* **ignore_na** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Ignore missing values when calculating weights;
specify True to reproduce pre-0.15.0 behavior.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis to use. The value 0 identifies the rows, and 1
identifies the columns.
* **Returns:**
A Window sub-classed for the particular operation.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`rolling`](maxframe.dataframe.Series.rolling.md#maxframe.dataframe.Series.rolling)
: Provides rolling window calculations.
[`expanding`](maxframe.dataframe.Series.expanding.md#maxframe.dataframe.Series.expanding)
: Provides expanding transformations.
### Notes
Exactly one of center of mass, span, half-life, and alpha must be provided.
Allowed values and relationship between the parameters are specified in the
parameter descriptions above; see the link at the end of this section for
a detailed explanation.
When adjust is True (default), weighted averages are calculated using
weights (1-alpha)\*\*(n-1), (1-alpha)\*\*(n-2), …, 1-alpha, 1.
When adjust is False, weighted averages are calculated recursively as:
> weighted_average[0] = arg[0];
> weighted_average[i] = (1-alpha)\*weighted_average[i-1] + alpha\*arg[i].
When ignore_na is False (default), weights are based on absolute positions.
For example, the weights of x and y used in calculating the final weighted
average of [x, None, y] are (1-alpha)\*\*2 and 1 (if adjust is True), and
(1-alpha)\*\*2 and alpha (if adjust is False).
When ignore_na is True (reproducing pre-0.15.0 behavior), weights are based
on relative positions. For example, the weights of x and y used in
calculating the final weighted average of [x, None, y] are 1-alpha and 1
(if adjust is True), and 1-alpha and alpha (if adjust is False).
More details can be found at
[https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df.execute()
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
>>> df.ewm(com=0.5).mean().execute()
B
0 0.000000
1 0.750000
2 1.615385
3 1.615385
4 3.670213
```
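The weighting rules in the Notes can be checked by hand with a small pure-Python sketch. It is an illustration only, not the library's code: `ewm_mean` is a hypothetical helper that handles the `adjust=True` case for a flat list of floats. With `com=0.5` (so $\alpha = 2/3$) it reproduces the values in the example above.

```python
import math


def ewm_mean(values, com, ignore_na=False):
    """Adjusted (adjust=True) exponentially weighted mean of a list of floats."""
    alpha = 1.0 / (1.0 + com)
    out = []
    for i in range(len(values)):
        num = den = 0.0
        age = 0  # exponent applied to (1 - alpha) for the next older slot
        for j in range(i, -1, -1):
            x = values[j]
            if math.isnan(x):
                # With ignore_na=False a gap still ages the older observations.
                if not ignore_na:
                    age += 1
                continue
            weight = (1.0 - alpha) ** age
            num += weight * x
            den += weight
            age += 1
        out.append(num / den if den else math.nan)
    return out


result = ewm_mean([0.0, 1.0, 2.0, math.nan, 4.0], com=0.5)
print([round(v, 6) for v in result])
# [0.0, 0.75, 1.615385, 1.615385, 3.670213]
```

At row 4 the weights are based on absolute positions, so the observation at row 2 is weighted by `(1 - alpha) ** 2` even though row 3 is missing. Passing `ignore_na=True` weights it by `1 - alpha` instead, which changes the last value.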
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.expanding.md
# maxframe.dataframe.Series.expanding
#### Series.expanding(min_periods=1, shift=0, reverse_range=False)
Provide expanding transformations.
* **Parameters:**
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 1*) – Minimum number of observations in window required to have a value
(otherwise result is NA).
* **center** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Set the labels at the center of the window.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 0*)
* **Return type:**
a Window sub-classed for the particular operation
#### SEE ALSO
[`rolling`](maxframe.dataframe.Series.rolling.md#maxframe.dataframe.Series.rolling)
: Provides rolling window calculations.
[`ewm`](maxframe.dataframe.Series.ewm.md#maxframe.dataframe.Series.ewm)
: Provides exponential weighted functions.
### Notes
By default, the result is set to the right edge of the window. This can be
changed to the center of the window by setting `center=True`.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df.execute()
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
>>> df.expanding(2).sum().execute()
B
0 NaN
1 1.0
2 3.0
3 3.0
4 7.0
```
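The role of `min_periods` in the example above can be sketched without MaxFrame. The helper below is hypothetical and works on a plain list: it keeps a running sum over the non-NaN observations seen so far and emits NaN until at least `min_periods` of them have been counted.

```python
import math


def expanding_sum(values, min_periods=1):
    """Running sum over all rows seen so far, skipping NaN observations."""
    out, total, count = [], 0.0, 0
    for x in values:
        if not math.isnan(x):
            total += x
            count += 1
        # A row only gets a value once enough valid observations exist.
        out.append(total if count >= min_periods else math.nan)
    return out


result = expanding_sum([0.0, 1.0, 2.0, math.nan, 4.0], min_periods=2)
print(result)  # [nan, 1.0, 3.0, 3.0, 7.0]
```

Row 3 holds NaN but still reports `3.0`, because the window already contains three valid observations.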
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.explode.md
# maxframe.dataframe.Series.explode
#### Series.explode(ignore_index=False, default_index_type=None)
Transform each element of a list-like to a row.
* **Parameters:**
**ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting index will be labeled 0, 1, …, n - 1.
* **Returns:**
Exploded lists to rows; index will be duplicated for these rows.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
`Series.str.split`
: Split string values on specified separator.
[`Series.unstack`](maxframe.dataframe.Series.unstack.md#maxframe.dataframe.Series.unstack)
: Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame.
[`DataFrame.melt`](maxframe.dataframe.DataFrame.melt.md#maxframe.dataframe.DataFrame.melt)
: Unpivot a DataFrame from wide format to long format.
`DataFrame.explode`
: Explode a DataFrame from list-like columns to long format.
### Notes
This routine will explode list-likes including lists, tuples,
Series, and np.ndarray. The result dtype of the subset rows will
be object. Scalars will be returned unchanged. Empty list-likes will
result in a np.nan for that row.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([[1, 2, 3], 'foo', [], [3, 4]])
>>> s.execute()
0 [1, 2, 3]
1 foo
2 []
3 [3, 4]
dtype: object
```
```pycon
>>> s.explode().execute()
0 1
0 2
0 3
1 foo
2 NaN
3 3
3 4
dtype: object
```
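The rules in the Notes (list-likes expand, scalars pass through, empty list-likes become NaN) can be sketched on a plain Python list. `explode` here is a hypothetical helper that returns `(index, value)` pairs; it treats only `list` and `tuple` as list-like, which is a simplification.

```python
import math


def explode(values):
    """Return (index, value) pairs with every list-like element expanded."""
    rows = []
    for idx, item in enumerate(values):
        if isinstance(item, (list, tuple)):
            if item:
                # Each element gets its own row and repeats the original index.
                rows.extend((idx, element) for element in item)
            else:
                rows.append((idx, math.nan))  # empty list-like becomes NaN
        else:
            rows.append((idx, item))  # scalars, including strings, pass through
    return rows


rows = explode([[1, 2, 3], "foo", [], [3, 4]])
print([idx for idx, _ in rows])  # [0, 0, 0, 1, 2, 3, 3]
```

The duplicated index values are what `ignore_index=True` replaces with `0, 1, …, n - 1`.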
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.factorize.md
# maxframe.dataframe.Series.factorize
#### Series.factorize(sort=False, use_na_sentinel=True)
Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. factorize
is available as both a top-level function [`pandas.factorize()`](https://pandas.pydata.org/docs/reference/api/pandas.factorize.html#pandas.factorize),
and as a method [`Series.factorize()`](#maxframe.dataframe.Series.factorize) and [`Index.factorize()`](maxframe.dataframe.Index.factorize.md#maxframe.dataframe.Index.factorize).
* **Parameters:**
* **values** (*sequence*) – A 1-D sequence. Sequences that aren’t pandas objects are
coerced to ndarrays before factorization.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort uniques and shuffle codes to maintain the
relationship.
* **use_na_sentinel** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, the sentinel -1 will be used for NaN values. If False,
NaN values will be encoded as non-negative integers and will not drop the
NaN from the uniques of the values.
* **Returns:**
* **codes** (*ndarray*) – An integer ndarray that’s an indexer into uniques.
`uniques.take(codes)` will have the same values as values.
* **uniques** (*ndarray, Index, or Categorical*) – The unique valid values. When values is Categorical, uniques
is a Categorical. When values is some other pandas object, an
Index is returned. Otherwise, a 1-D ndarray is returned.
#### NOTE
Even if there’s a missing value in values, uniques will
*not* contain an entry for it.
#### SEE ALSO
`cut`
: Discretize continuous-valued array.
[`unique`](maxframe.dataframe.Series.unique.md#maxframe.dataframe.Series.unique)
: Find the unique value in an array.
### Notes
Reference [the user guide](https://pandas.pydata.org/docs/user_guide/reshaping.html#reshaping-factorize) for more examples.
### Examples
These examples all show factorize as a top-level method like
`pd.factorize(values)`. The results are identical for methods like
[`Series.factorize()`](#maxframe.dataframe.Series.factorize).
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> codes, uniques = md.factorize(mt.array(['b', 'b', 'a', 'c', 'b'], dtype="O"))
>>> codes.execute()
array([0, 0, 1, 2, 0])
>>> uniques.execute()
array(['b', 'a', 'c'], dtype=object)
```
With `sort=True`, the uniques will be sorted, and codes will be
shuffled so that the relationship is maintained.
```pycon
>>> codes, uniques = md.factorize(mt.array(['b', 'b', 'a', 'c', 'b'], dtype="O"),
... sort=True)
>>> codes.execute()
array([1, 1, 0, 2, 1])
>>> uniques.execute()
array(['a', 'b', 'c'], dtype=object)
```
When `use_na_sentinel=True` (the default), missing values are indicated in
the codes with the sentinel value `-1` and missing values are not
included in uniques.
```pycon
>>> codes, uniques = md.factorize(mt.array(['b', None, 'a', 'c', 'b'], dtype="O"))
>>> codes.execute()
array([ 0, -1, 1, 2, 0])
>>> uniques.execute()
array(['b', 'a', 'c'], dtype=object)
```
Thus far, we’ve only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of uniques
will differ. For Categoricals, a Categorical is returned.
```pycon
>>> cat = md.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = md.factorize(cat)
>>> codes.execute()
array([0, 0, 1])
>>> uniques.execute()
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
```
Notice that `'b'` is in `uniques.categories`, despite not being
present in `cat.values`.
For all other pandas objects, an Index of the appropriate type is
returned.
```pycon
>>> cat = md.Series(['a', 'a', 'c'])
>>> codes, uniques = md.factorize(cat)
>>> codes.execute()
array([0, 0, 1])
>>> uniques.execute()
Index(['a', 'c'], dtype='object')
```
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting `use_na_sentinel=False`.
```pycon
>>> values = mt.array([1, 2, 1, mt.nan])
>>> codes, uniques = md.factorize(values) # default: use_na_sentinel=True
>>> codes.execute()
array([ 0, 1, 0, -1])
>>> uniques.execute()
array([1., 2.])
```
```pycon
>>> codes, uniques = md.factorize(values, use_na_sentinel=False)
>>> codes.execute()
array([0, 1, 0, 2])
>>> uniques.execute()
array([ 1., 2., nan])
```
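The interaction of `sort` and `use_na_sentinel` can be sketched in pure Python. This is an illustrative model, not the MaxFrame or pandas implementation: `factorize` below is a hypothetical helper over plain sequences that returns Python lists instead of arrays.

```python
import math


def _is_missing(value):
    return value is None or (isinstance(value, float) and math.isnan(value))


def factorize(values, sort=False, use_na_sentinel=True):
    """Return (codes, uniques) for a plain Python sequence."""
    uniques, seen, codes = [], {}, []
    for value in values:
        missing = _is_missing(value)
        if missing and use_na_sentinel:
            codes.append(-1)  # sentinel: missing values get no unique slot
            continue
        key = None if missing else value  # all missing values share one slot
        if key not in seen:
            seen[key] = len(uniques)
            uniques.append(value)
        codes.append(seen[key])
    if sort:
        # Sort valid uniques; a missing entry, if kept, goes last.
        order = sorted(range(len(uniques)),
                       key=lambda i: (_is_missing(uniques[i]), uniques[i]))
        remap = {old: new for new, old in enumerate(order)}
        uniques = [uniques[i] for i in order]
        codes = [c if c == -1 else remap[c] for c in codes]
    return codes, uniques


print(factorize(["b", "b", "a", "c", "b"]))
# ([0, 0, 1, 2, 0], ['b', 'a', 'c'])
print(factorize(["b", "b", "a", "c", "b"], sort=True))
# ([1, 1, 0, 2, 1], ['a', 'b', 'c'])
```

In both cases indexing `uniques` with the non-negative `codes` recovers the original values, which is the relationship that sorting has to preserve.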
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.fillna.md
# maxframe.dataframe.Series.fillna
#### Series.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Fill NA/NaN values using the specified method.
* **Parameters:**
* **value** (*scalar* *,* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Value to use to fill holes (e.g. 0), alternately a
dict/Series/DataFrame of values specifying which value to use for
each index (for a Series) or column (for a DataFrame). Values not
in the dict/Series/DataFrame will not be filled. This value cannot
be a list.
* **method** ( *{'backfill'* *,* *'bfill'* *,* *'pad'* *,* *'ffill'* *,* *None}* *,* *default None*) – Method to use for filling holes in reindexed Series
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use next valid observation to fill gap.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}*) – Axis along which to fill missing values.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, fill in-place. Note: this will modify any
other views on this object (e.g., a no-copy slice for a column in a
DataFrame).
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill. In other words, if there is
a gap with more than this number of consecutive NaNs, it will only
be partially filled. If method is not specified, this is the
maximum number of entries along the entire axis where NaNs will be
filled. Must be greater than 0 if not None.
* **downcast** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default is None*) – A dict of item->dtype of what to downcast if possible,
or the string ‘infer’ which will try to downcast to an appropriate
equal type (e.g. float64 to int64 if possible).
* **Returns:**
Object with missing values filled or None if `inplace=True`.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
#### SEE ALSO
`interpolate`
: Fill NaN values using interpolation.
[`reindex`](maxframe.dataframe.Series.reindex.md#maxframe.dataframe.Series.reindex)
: Conform object to new index.
`asfreq`
: Convert TimeSeries to specified frequency.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, 5],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list('ABCD'))
>>> df.execute()
A B C D
0 NaN 2.0 NaN 0
1 3.0 4.0 NaN 1
2 NaN NaN NaN 5
3 NaN 3.0 NaN 4
```
Replace all NaN elements with 0s.
```pycon
>>> df.fillna(0).execute()
A B C D
0 0.0 2.0 0.0 0
1 3.0 4.0 0.0 1
2 0.0 0.0 0.0 5
3 0.0 3.0 0.0 4
```
We can also propagate non-null values forward or backward.
```pycon
>>> df.fillna(method='ffill').execute()
A B C D
0 NaN 2.0 NaN 0
1 3.0 4.0 NaN 1
2 3.0 4.0 NaN 5
3 3.0 3.0 NaN 4
```
Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1,
2, and 3 respectively.
```pycon
>>> values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
>>> df.fillna(value=values).execute()
A B C D
0 0.0 2.0 2.0 0
1 3.0 4.0 2.0 1
2 0.0 1.0 2.0 5
3 0.0 3.0 2.0 4
```
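The `limit` parameter is easiest to see on a single column. The sketch below is a hypothetical pure-Python `ffill` helper, not the library's code; it forward-fills a list and stops filling once a run of consecutive NaN values exceeds `limit`.

```python
import math


def ffill(values, limit=None):
    """Forward-fill NaN entries, filling at most `limit` per gap."""
    out, last, run = [], math.nan, 0
    for x in values:
        if math.isnan(x):
            run += 1
            # Fill only if a previous valid value exists and the gap is short enough.
            can_fill = not math.isnan(last) and (limit is None or run <= limit)
            out.append(last if can_fill else math.nan)
        else:
            last, run = x, 0
            out.append(x)
    return out


column_a = [math.nan, 3.0, math.nan, math.nan]
print(ffill(column_a))           # [nan, 3.0, 3.0, 3.0]
print(ffill(column_a, limit=1))  # [nan, 3.0, 3.0, nan]
```

The leading NaN is never filled because no earlier valid observation exists, which matches column `A` in the `method='ffill'` example above.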
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.filter.md
# maxframe.dataframe.Series.filter
#### Series.filter(items=None, like=None, regex=None, axis=None)
Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its
contents. The filter is applied to the labels of the index.
* **Parameters:**
* **items** (*list-like*) – Keep labels from axis which are in items.
* **like** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Keep labels from axis for which “like in label == True”.
* **regex** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *(**regular expression* *)*) – Keep labels from axis for which re.search(regex, label) == True.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'* *,* *None}* *,* *default None*) – The axis to filter on, expressed either as an index (int)
or axis name (str). By default this is the info axis, ‘columns’ for
DataFrame. For Series this parameter is unused and defaults to None.
* **Return type:**
same type as input object
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Access a group of rows and columns by label(s) or a boolean array.
### Notes
The `items`, `like`, and `regex` parameters are
enforced to be mutually exclusive.
`axis` defaults to the info axis that is used when indexing
with `[]`.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(mt.array(([1, 2, 3], [4, 5, 6])),
... index=['mouse', 'rabbit'],
... columns=['one', 'two', 'three'])
>>> df.execute()
one two three
mouse 1 2 3
rabbit 4 5 6
```
```pycon
>>> # select columns by name
>>> df.filter(items=['one', 'three']).execute()
one three
mouse 1 3
rabbit 4 6
```
```pycon
>>> # select columns by regular expression
>>> df.filter(regex='e$', axis=1).execute()
one three
mouse 1 3
rabbit 4 6
```
```pycon
>>> # select rows containing 'bbi'
>>> df.filter(like='bbi', axis=0).execute()
one two three
rabbit 4 5 6
```
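Because `filter` looks only at labels, its three selection modes can be modelled on a list of strings. The helper below is a hypothetical sketch (the name `filter_labels` and the choice of `TypeError` are assumptions); it enforces that exactly one of `items`, `like`, and `regex` is given.

```python
import re


def filter_labels(labels, items=None, like=None, regex=None):
    """Select labels by exact membership, substring, or regular expression."""
    if sum(arg is not None for arg in (items, like, regex)) != 1:
        raise TypeError("pass exactly one of items, like, or regex")
    if items is not None:
        return [label for label in labels if label in items]
    if like is not None:
        return [label for label in labels if like in label]
    pattern = re.compile(regex)
    # re.search matches anywhere in the label, so anchors such as $ matter.
    return [label for label in labels if pattern.search(label)]


columns = ["one", "two", "three"]
print(filter_labels(columns, regex="e$"))             # ['one', 'three']
print(filter_labels(["mouse", "rabbit"], like="bbi")) # ['rabbit']
```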
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.first_valid_index.md
# maxframe.dataframe.Series.first_valid_index
#### Series.first_valid_index()
Return index for first non-NA value or None, if no non-NA value is found.
* **Return type:**
[type](https://docs.python.org/3/library/functions.html#type) of index
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([None, 3, 4])
>>> s.first_valid_index().execute()
1
>>> s.last_valid_index().execute()
2
```
```pycon
>>> s = md.Series([None, None])
>>> print(s.first_valid_index().execute())
None
>>> print(s.last_valid_index().execute())
None
```
If all elements in Series are NA/null, returns None.
```pycon
>>> s = md.Series()
>>> print(s.first_valid_index().execute())
None
>>> print(s.last_valid_index().execute())
None
```
If Series is empty, returns None.
For DataFrame:
```pycon
>>> df = md.DataFrame({'A': [None, None, 2], 'B': [None, 3, 4]})
>>> df.execute()
A B
0 NaN NaN
1 NaN 3.0
2 2.0 4.0
>>> df.first_valid_index().execute()
1
>>> df.last_valid_index().execute()
2
```
```pycon
>>> df = md.DataFrame({'A': [None, None, None], 'B': [None, None, None]})
>>> df.execute()
A B
0 None None
1 None None
2 None None
>>> print(df.first_valid_index().execute())
None
>>> print(df.last_valid_index().execute())
None
```
If all elements in DataFrame are NA/null, returns None.
```pycon
>>> df = md.DataFrame()
>>> df.execute()
Empty DataFrame
Columns: []
Index: []
>>> print(df.first_valid_index().execute())
None
>>> print(df.last_valid_index().execute())
None
```
If DataFrame is empty, returns None.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.floordiv.md
# maxframe.dataframe.Series.floordiv
#### Series.floordiv(other, level=None, fill_value=None, axis=0)
Return Integer division of series and other, element-wise (binary operator floordiv).
Equivalent to `series // other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.rfloordiv`](maxframe.dataframe.Series.rfloordiv.md#maxframe.dataframe.Series.rfloordiv)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.floordiv(b, fill_value=0).execute()
a 1.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.ge.md
# maxframe.dataframe.Series.ge
#### Series.ge(other, level=None, fill_value=None, axis=0)
Return Greater than or equal to of series and other, element-wise (binary operator ge).
Equivalent to `series >= other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.ge(b, fill_value=0).execute()
a True
b True
c True
d False
e False
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.groupby.md
# maxframe.dataframe.Series.groupby
#### Series.groupby(by=None, level=None, as_index=True, sort=True, group_keys=True)
Group DataFrame using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.
* **Parameters:**
* **by** (*mapping* *,* *function* *,* *label* *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *labels*) – Used to determine the groups for the groupby.
If `by` is a function, it’s called on each value of the object’s
index. If a dict or Series is passed, the Series or dict VALUES
will be used to determine the groups (the Series’ values are first
aligned; see `.align()` method). If an ndarray is passed, the
values are used as-is to determine the groups. A label or list of
labels may be passed to group by the columns in `self`. Notice
that a tuple is interpreted as a (single) key.
* **as_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – For aggregated output, return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively “SQL-style” grouped output.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. Groupby preserves the order of rows within each group.
* **group_keys** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – When calling apply, add group keys to index to identify pieces.
### Notes
MaxFrame only supports groupby with axis=0.
Default value of group_keys will be decided given the version of local
pandas library, which is True since pandas 2.0.
* **Returns:**
Returns a groupby object that contains information about the groups.
* **Return type:**
DataFrameGroupBy
#### SEE ALSO
`resample`
: Convenience method for frequency conversion and resampling of time series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'Animal': ['Falcon', 'Falcon',
... 'Parrot', 'Parrot'],
... 'Max Speed': [380., 370., 24., 26.]})
>>> df.execute()
Animal Max Speed
0 Falcon 380.0
1 Falcon 370.0
2 Parrot 24.0
3 Parrot 26.0
>>> df.groupby(['Animal']).mean().execute()
Max Speed
Animal
Falcon 375.0
Parrot 25.0
```
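The split, apply, and combine steps described above can be sketched with the standard library. `groupby_mean` is a hypothetical helper over two parallel lists, not a MaxFrame API; it also shows what `sort=True` does to the order of group keys.

```python
from collections import defaultdict


def groupby_mean(keys, values, sort=True):
    """Group `values` by `keys` and return the mean of each group."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):  # split: bucket rows by key
        groups[key].append(value)
    labels = sorted(groups) if sort else list(groups)
    # apply + combine: one aggregated value per group label
    return {label: sum(groups[label]) / len(groups[label]) for label in labels}


animals = ["Falcon", "Falcon", "Parrot", "Parrot"]
speeds = [380.0, 370.0, 24.0, 26.0]
print(groupby_mean(animals, speeds))  # {'Falcon': 375.0, 'Parrot': 25.0}
```

With `sort=False` the groups appear in first-seen order, and rows inside each group keep their original order either way.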
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.gt.md
# maxframe.dataframe.Series.gt
#### Series.gt(other, level=None, fill_value=None, axis=0)
Return Greater than of series and other, element-wise (binary operator gt).
Equivalent to `series > other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.gt(b, fill_value=0).execute()
a False
b True
c True
d False
e False
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.head.md
# maxframe.dataframe.Series.head
#### Series.head(n=5)
Return the first n rows.
This function returns the first n rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of n, this function returns all rows except
the last n rows, equivalent to `df[:-n]`.
* **Parameters:**
**n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 5*) – Number of rows to select.
* **Returns:**
The first n rows of the caller object.
* **Return type:**
same type as caller
#### SEE ALSO
[`DataFrame.tail`](maxframe.dataframe.DataFrame.tail.md#maxframe.dataframe.DataFrame.tail)
: Returns the last n rows.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df.execute()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
```
Viewing the first 5 lines
```pycon
>>> df.head().execute()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
```
Viewing the first n lines (three in this case)
```pycon
>>> df.head(3).execute()
animal
0 alligator
1 bee
2 falcon
```
For negative values of n
```pycon
>>> df.head(-3).execute()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
```
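The positive and negative cases follow ordinary Python slice semantics. As a minimal sketch on a plain list (the `head` function here is a hypothetical stand-in, not the MaxFrame method):

```python
def head(rows, n=5):
    """Return the first n rows; a negative n drops the last |n| rows."""
    return rows[:n]


animals = ["alligator", "bee", "falcon", "lion", "monkey",
           "parrot", "shark", "whale", "zebra"]
print(head(animals, 3))   # ['alligator', 'bee', 'falcon']
print(head(animals, -3))  # ['alligator', 'bee', 'falcon', 'lion', 'monkey', 'parrot']
```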
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.iat.md
# maxframe.dataframe.Series.iat
#### *property* Series.iat
Access a single value for a row/column pair by integer position.
Similar to `iloc`, in that both provide integer-based lookups. Use
`iat` if you only need to get or set a single value in a DataFrame
or Series.
* **Raises:**
[**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – When integer position is out of bounds.
#### SEE ALSO
[`DataFrame.at`](maxframe.dataframe.DataFrame.at.md#maxframe.dataframe.DataFrame.at)
: Access a single value for a row/column label pair.
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Access a group of rows and columns by label(s).
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Access a group of rows and columns by integer position(s).
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
... columns=['A', 'B', 'C'])
>>> df.execute()
A B C
0 0 2 3
1 0 4 1
2 10 20 30
```
Get value at specified row/column pair
```pycon
>>> df.iat[1, 2].execute()
1
```
Set value at specified row/column pair
```pycon
>>> df.iat[1, 2] = 10
>>> df.iat[1, 2].execute()
10
```
Get value within a series
```pycon
>>> df.loc[0].iat[1].execute()
2
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.idxmax.md
# maxframe.dataframe.Series.idxmax
#### Series.idxmax(axis=0, skipna=True)
Return the row label of the maximum value.
If multiple values equal the maximum, the first row label with that
value is returned.
* **Parameters:**
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0*) – For compatibility with DataFrame.idxmax. Redundant for application
on Series.
* **skipna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Exclude NA/null values. If the entire Series is NA, the result
will be NA.
* **\*args** – Additional arguments and keywords have no effect but might be
accepted for compatibility with NumPy.
* **\*\*kwargs** – Additional arguments and keywords have no effect but might be
accepted for compatibility with NumPy.
* **Returns:**
Label of the maximum value.
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If the Series is empty.
#### SEE ALSO
[`numpy.argmax`](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html#numpy.argmax)
: Return indices of the maximum values along the given axis.
[`DataFrame.idxmax`](maxframe.dataframe.DataFrame.idxmax.md#maxframe.dataframe.DataFrame.idxmax)
: Return index of first occurrence of maximum over requested axis.
[`Series.idxmin`](maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin)
: Return index *label* of the first occurrence of minimum of values.
### Notes
This method is the Series version of `ndarray.argmax`. This method
returns the label of the maximum, while `ndarray.argmax` returns
the position. To get the position, use `series.values.argmax()`.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(data=[1, None, 4, 3, 4],
... index=['A', 'B', 'C', 'D', 'E'])
>>> s.execute()
A 1.0
B NaN
C 4.0
D 3.0
E 4.0
dtype: float64
```
```pycon
>>> s.idxmax().execute()
'C'
```
If skipna is False and there is an NA value in the data,
the function returns `nan`.
```pycon
>>> s.idxmax(skipna=False).execute()
nan
```
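The difference between the label returned by `idxmax` and the position returned by `argmax`, and the effect of `skipna`, can be sketched in pure Python. The helper below is hypothetical and returns both values as a `(label, position)` pair for illustration.

```python
import math


def idxmax(values, index, skipna=True):
    """Return (label, position) of the first maximum in `values`."""
    if not values:
        raise ValueError("attempt to get idxmax of an empty sequence")
    best = None
    for pos, x in enumerate(values):
        if math.isnan(x):
            if not skipna:
                return math.nan, None  # any NA poisons the result
            continue
        # Strict comparison keeps the first occurrence on ties.
        if best is None or x > values[best]:
            best = pos
    if best is None:
        return math.nan, None  # every value was NA
    return index[best], best


values = [1.0, math.nan, 4.0, 3.0, 4.0]
labels = ["A", "B", "C", "D", "E"]
print(idxmax(values, labels))  # ('C', 2)
```

Both `'C'` and `'E'` hold the maximum `4.0`; the strict `>` comparison is what makes the first label win.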
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.idxmin.md
# maxframe.dataframe.Series.idxmin
#### Series.idxmin(axis=0, skipna=True)
Return the row label of the minimum value.
If multiple values equal the minimum, the first row label with that
value is returned.
* **Parameters:**
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0*) – For compatibility with DataFrame.idxmin. Redundant for application
on Series.
* **skipna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Exclude NA/null values. If the entire Series is NA, the result
will be NA.
* **\*args** – Additional arguments and keywords have no effect but might be
accepted for compatibility with NumPy.
* **\*\*kwargs** – Additional arguments and keywords have no effect but might be
accepted for compatibility with NumPy.
* **Returns:**
Label of the minimum value.
* **Return type:**
[Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If the Series is empty.
#### SEE ALSO
[`numpy.argmin`](https://numpy.org/doc/stable/reference/generated/numpy.argmin.html#numpy.argmin)
: Return indices of the minimum values along the given axis.
[`DataFrame.idxmin`](maxframe.dataframe.DataFrame.idxmin.md#maxframe.dataframe.DataFrame.idxmin)
: Return index of first occurrence of minimum over requested axis.
[`Series.idxmax`](maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax)
: Return index *label* of the first occurrence of maximum of values.
### Notes
This method is the Series version of `ndarray.argmin`. This method
returns the label of the minimum, while `ndarray.argmin` returns
the position. To get the position, use `series.values.argmin()`.
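The difference between a label and a position can be sketched in plain Python. This is a minimal illustration of the semantics described above, not MaxFrame's implementation; `is_na` and `idxmin_sketch` are hypothetical helpers, and treating `None` and float NaN as missing is an assumption of the sketch.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def idxmin_sketch(values, labels, skipna=True):
    """Return the label of the first minimum (hypothetical helper, not a MaxFrame API)."""
    if not values:
        raise ValueError("attempt to get idxmin of an empty sequence")
    if not skipna and any(is_na(v) for v in values):
        return math.nan
    candidates = [(v, pos) for pos, v in enumerate(values) if not is_na(v)]
    if not candidates:
        return math.nan
    # min() over (value, position) pairs picks the smallest value and,
    # on ties, the earliest position.
    _, pos = min(candidates)
    return labels[pos]


values = [1, None, 4, 3, 4]
labels = ['A', 'B', 'C', 'D', 'E']

label = idxmin_sketch(values, labels)   # label of the minimum
position = labels.index(label)          # integer position of that label
print(label, position)
```

The label is what `idxmin` reports; the position is what an `argmin`-style call would report.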
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(data=[1, None, 4, 3, 4],
... index=['A', 'B', 'C', 'D', 'E'])
>>> s.execute()
A 1.0
B NaN
C 4.0
D 3.0
E 4.0
dtype: float64
```
```pycon
>>> s.idxmin().execute()
'A'
```
If skipna is False and there is an NA value in the data,
the function returns `nan`.
```pycon
>>> s.idxmin(skipna=False).execute()
nan
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.iloc.md
# maxframe.dataframe.Series.iloc
#### *property* Series.iloc
Purely integer-location based indexing for selection by position.
`.iloc[]` is primarily integer position based (from `0` to
`length-1` of the axis), but may also be used with a boolean
array.
Allowed inputs are:
- An integer, e.g. `5`.
- A list or array of integers, e.g. `[4, 3, 0]`.
- A slice object with ints, e.g. `1:7`.
- A boolean array.
- A `callable` function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above).
This is useful in method chains, when you don’t have a reference to the
calling object, but would like to base your selection on some value.
`.iloc` will raise `IndexError` if a requested indexer is
out-of-bounds, except *slice* indexers which allow out-of-bounds
indexing (this conforms with python/numpy *slice* semantics).
See more at [Selection by Position](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-integer).
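Because the out-of-bounds rule is stated to follow Python slice semantics, a built-in list shows the same behaviour. The sketch below uses only a plain list and illustrates the rule, not MaxFrame itself.

```python
rows = ['r0', 'r1', 'r2']

# A slice that runs past the end is clipped rather than rejected.
clipped = rows[1:10]

# A scalar position past the end raises IndexError.
try:
    rows[10]
    scalar_error = None
except IndexError as exc:
    scalar_error = type(exc).__name__

print(clipped, scalar_error)
```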
#### SEE ALSO
[`DataFrame.iat`](maxframe.dataframe.DataFrame.iat.md#maxframe.dataframe.DataFrame.iat)
: Fast integer location scalar accessor.
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Purely label-location based indexer for selection by label.
[`Series.iloc`](#maxframe.dataframe.Series.iloc)
: Purely integer-location based indexing for selection by position.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
... {'a': 100, 'b': 200, 'c': 300, 'd': 400},
... {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
>>> df = md.DataFrame(mydict)
>>> df.execute()
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
```
**Indexing just the rows**
With a scalar integer.
```pycon
>>> type(df.iloc[0].execute().fetch())
<class 'pandas.core.series.Series'>
>>> df.iloc[0].execute()
a 1
b 2
c 3
d 4
Name: 0, dtype: int64
```
With a list of integers.
```pycon
>>> df.iloc[[0]].execute()
a b c d
0 1 2 3 4
>>> type(df.iloc[[0]].execute().fetch())
<class 'pandas.core.frame.DataFrame'>
```
```pycon
>>> df.iloc[[0, 1]].execute()
a b c d
0 1 2 3 4
1 100 200 300 400
```
With a slice object.
```pycon
>>> df.iloc[:3].execute()
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
```
With a boolean mask the same length as the index.
```pycon
>>> df.iloc[[True, False, True]].execute()
a b c d
0 1 2 3 4
2 1000 2000 3000 4000
```
With a callable, useful in method chains. The x passed
to the `lambda` is the DataFrame being sliced. This selects
the rows whose index label is even.
```pycon
>>> df.iloc[lambda x: x.index % 2 == 0].execute()
a b c d
0 1 2 3 4
2 1000 2000 3000 4000
```
**Indexing both axes**
You can mix the indexer types for the index and columns. Use `:` to
select the entire axis.
With scalar integers.
```pycon
>>> df.iloc[0, 1].execute()
2
```
With lists of integers.
```pycon
>>> df.iloc[[0, 2], [1, 3]].execute()
b d
0 2 4
2 2000 4000
```
With slice objects.
```pycon
>>> df.iloc[1:3, 0:3].execute()
a b c
1 100 200 300
2 1000 2000 3000
```
With a boolean array whose length matches the columns.
```pycon
>>> df.iloc[:, [True, False, True, False]].execute()
a c
0 1 3
1 100 300
2 1000 3000
```
With a callable function that expects the Series or DataFrame.
```pycon
>>> df.iloc[:, lambda df: [0, 2]].execute()
a c
0 1 3
1 100 300
2 1000 3000
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.index.md
# maxframe.dataframe.Series.index
#### *property* Series.index
The index (axis labels) of the Series.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.infer_objects.md
# maxframe.dataframe.Series.infer_objects
#### Series.infer_objects(copy=True)
Attempt to infer better dtypes for object columns.
Attempts soft conversion of object-dtyped
columns, leaving non-object and unconvertible
columns unchanged. The inference rules are the
same as during normal Series/DataFrame construction.
* **Returns:**
**converted**
* **Return type:**
same type as input object
#### SEE ALSO
[`to_datetime`](maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)
: Convert argument to datetime.
`to_timedelta`
: Convert argument to timedelta.
[`to_numeric`](maxframe.dataframe.to_numeric.md#maxframe.dataframe.to_numeric)
: Convert argument to numeric type.
[`convert_dtypes`](maxframe.dataframe.Series.convert_dtypes.md#maxframe.dataframe.Series.convert_dtypes)
: Convert argument to best possible dtype.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({"A": ["a", 1, 2, 3]})
>>> df = df.iloc[1:]
>>> df.execute()
A
1 1
2 2
3 3
```
```pycon
>>> df.dtypes.execute()
A object
dtype: object
```
```pycon
>>> df.infer_objects().dtypes.execute()
A int64
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.is_monotonic_decreasing.md
# maxframe.dataframe.Series.is_monotonic_decreasing
#### *property* Series.is_monotonic_decreasing
Return a boolean scalar indicating whether the values in the object are
monotonically decreasing.
* **Return type:**
Scalar
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.is_monotonic_increasing.md
# maxframe.dataframe.Series.is_monotonic_increasing
#### *property* Series.is_monotonic_increasing
Return a boolean scalar indicating whether the values in the object are
monotonically increasing.
* **Return type:**
Scalar
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.is_unique.md
# maxframe.dataframe.Series.is_unique
#### *property* Series.is_unique
Return boolean if values in the object are unique.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3])
>>> s.is_unique.execute()
True
```
```pycon
>>> s = md.Series([1, 2, 3, 1])
>>> s.is_unique.execute()
False
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.isin.md
# maxframe.dataframe.Series.isin
#### Series.isin(values)
Whether elements in Series are contained in values.
Return a boolean Series showing whether each element in the Series
matches an element in the passed sequence of values exactly.
* **Parameters:**
**values** ([*set*](https://docs.python.org/3/library/stdtypes.html#set) *or* *list-like*) – The sequence of values to test. Passing in a single string will
raise a `TypeError`. Instead, turn a single string into a
list of one element.
* **Returns:**
Series of booleans indicating if each element is in values.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) –
* If values is a string
#### SEE ALSO
`DataFrame.isin`
: Equivalent method on DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['lame', 'cow', 'lame', 'beetle', 'lame',
... 'hippo'], name='animal')
>>> s.isin(['cow', 'lame']).execute()
0 True
1 True
2 True
3 False
4 True
5 False
Name: animal, dtype: bool
```
Passing a single string as `s.isin('lame')` will raise an error. Use
a list of one element instead:
```pycon
>>> s.isin(['lame']).execute()
0 True
1 False
2 True
3 False
4 True
5 False
Name: animal, dtype: bool
```
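The membership test and the string check can be sketched in plain Python. `isin_sketch` is a hypothetical helper written for illustration; it is not how MaxFrame implements `isin`.

```python
def isin_sketch(elements, values):
    """Return one boolean per element (hypothetical helper, not a MaxFrame API)."""
    if isinstance(values, str):
        # A bare string is rejected so that it is not treated as a sequence of characters.
        raise TypeError("only list-like objects are allowed, got a str")
    lookup = set(values)
    return [element in lookup for element in elements]


animals = ['lame', 'cow', 'lame', 'beetle', 'lame', 'hippo']
flags = isin_sketch(animals, ['cow', 'lame'])

try:
    isin_sketch(animals, 'lame')
    string_error = None
except TypeError as exc:
    string_error = type(exc).__name__

print(flags, string_error)
```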
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.isna.md
# maxframe.dataframe.Series.isna
#### Series.isna()
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or `numpy.nan`, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
* **Returns:**
Mask of bool values for each element in Series that
indicates whether an element is an NA value.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`DataFrame.isnull`](maxframe.dataframe.DataFrame.isnull.md#maxframe.dataframe.DataFrame.isnull)
: Alias of isna.
[`DataFrame.notna`](maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)
: Boolean inverse of isna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`isna`](maxframe.dataframe.isna.md#maxframe.dataframe.isna)
: Top-level isna.
### Examples
Show which entries in a DataFrame are NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.nan],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
```
```pycon
>>> df.isna().execute()
age born name toy
0 False True False True
1 False False False False
2 True False False False
```
Show which entries in a Series are NA.
```pycon
>>> ser = md.Series([5, 6, np.nan])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.isna().execute()
0 False
1 False
2 True
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.last_valid_index.md
# maxframe.dataframe.Series.last_valid_index
#### Series.last_valid_index()
Return index for last non-NA value or None, if no non-NA value is found.
* **Return type:**
[type](https://docs.python.org/3/library/functions.html#type) of index
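The lookup can be sketched in plain Python as a scan from the end of the data. `is_na` and `last_valid_index_sketch` are hypothetical helpers, and treating `None` and float NaN as missing is an assumption of the sketch rather than a statement about MaxFrame internals.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def last_valid_index_sketch(values, labels):
    """Return the label of the last non-NA value, or None if there is none."""
    for label, value in zip(reversed(labels), reversed(values)):
        if not is_na(value):
            return label
    return None


print(last_valid_index_sketch([None, 3, 4], [0, 1, 2]))
print(last_valid_index_sketch([None, None], [0, 1]))
print(last_valid_index_sketch([], []))
```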
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([None, 3, 4])
>>> s.first_valid_index().execute()
1
>>> s.last_valid_index().execute()
2
```
```pycon
>>> s = md.Series([None, None])
>>> print(s.first_valid_index().execute())
None
>>> print(s.last_valid_index().execute())
None
```
If all elements in Series are NA/null, returns None.
```pycon
>>> s = md.Series()
>>> print(s.first_valid_index().execute())
None
>>> print(s.last_valid_index().execute())
None
```
If Series is empty, returns None.
For DataFrame:
```pycon
>>> df = md.DataFrame({'A': [None, None, 2], 'B': [None, 3, 4]})
>>> df.execute()
A B
0 NaN NaN
1 NaN 3.0
2 2.0 4.0
>>> df.first_valid_index().execute()
1
>>> df.last_valid_index().execute()
2
```
```pycon
>>> df = md.DataFrame({'A': [None, None, None], 'B': [None, None, None]})
>>> df.execute()
A B
0 None None
1 None None
2 None None
>>> print(df.first_valid_index().execute())
None
>>> print(df.last_valid_index().execute())
None
```
If all elements in DataFrame are NA/null, returns None.
```pycon
>>> df = md.DataFrame()
>>> df.execute()
Empty DataFrame
Columns: []
Index: []
>>> print(df.first_valid_index().execute())
None
>>> print(df.last_valid_index().execute())
None
```
If DataFrame is empty, returns None.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.le.md
# maxframe.dataframe.Series.le
#### Series.le(other, level=None, fill_value=None, axis=0)
Return Less than or equal to of series and other, element-wise (binary operator le).
Equivalent to `series <= other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.le(b, fill_value=0).execute()
a True
b False
c False
d True
e False
dtype: bool
```
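The `fill_value` rule stated in the parameter description can be sketched in plain Python using dicts as labelled data. This is a rough model that assumes pandas-style alignment on the union of labels; `flex_le` is a hypothetical helper, not a MaxFrame API.

```python
import math


def flex_le(left, right, fill_value=None):
    """Element-wise left <= right over the union of labels (hypothetical helper)."""
    result = {}
    for label in sorted(set(left) | set(right)):
        x = left.get(label, math.nan)
        y = right.get(label, math.nan)
        x_missing, y_missing = math.isnan(x), math.isnan(y)
        if fill_value is not None:
            # Fill only when exactly one side is missing; if both are
            # missing, the pair stays missing.
            if x_missing and not y_missing:
                x = fill_value
            elif y_missing and not x_missing:
                y = fill_value
        # Any comparison that still involves NaN evaluates to False.
        result[label] = x <= y
    return result


a = {'a': 1.0, 'b': 1.0, 'c': 1.0, 'd': math.nan}
b = {'a': 1.0, 'b': math.nan, 'd': 1.0, 'e': math.nan}
print(flex_le(a, b, fill_value=0))
```

Labels present on only one side are treated as missing on the other side before the fill is applied.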
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.list.__getitem__.md
# maxframe.dataframe.Series.list._\_getitem_\_
#### Series.list.\_\_getitem_\_(query_index)
Get the value by the index of each list in the Series. If the index
is not in the list, raise IndexError.
* **Parameters:**
**query_index** (*Any*) – The index to check, must be integer.
* **Returns:**
A Series with the list value’s data type. Return `None` if the list is None.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If the index is not in one list.
#### SEE ALSO
`Series.list.get`
: Get the value by the index of each list in the Series.
### Examples
Create a series with list type data.
```pycon
>>> import maxframe.dataframe as md
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import list_
>>> s = md.Series(
... data=[[1, 2, 3], [4, 5, 6], None],
... index=[1, 2, 3],
... dtype=list_(pa.int64()),
... )
>>> s.execute()
1 [1, 2, 3]
2 [4, 5, 6]
3 <NA>
dtype: list<int64>[pyarrow]
```
```pycon
>>> s.list.get(0).execute()
1 1
2 4
3 <NA>
dtype: int64[pyarrow]
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.list.len.md
# maxframe.dataframe.Series.list.len
#### Series.list.len()
Get the length of each list of the Series.
* **Returns:**
A Series with data type `pandas.ArrowDtype(pyarrow.int64)`. Each element
represents the length of the list, or `None` if the list is `None`.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
Create a series with list type data.
```pycon
>>> import maxframe.dataframe as md
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import list_
>>> s = md.Series(
... data=[[1, 2, 3], [4, 5, 6], None],
... index=[1, 2, 3],
... dtype=list_(pa.int64()),
... )
>>> s.execute()
1 [1, 2, 3]
2 [4, 5, 6]
3 <NA>
dtype: list<int64>[pyarrow]
```
```pycon
>>> s.list.len().execute()
1 3
2 3
3 <NA>
dtype: int64[pyarrow]
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.loc.md
# maxframe.dataframe.Series.loc
#### *property* Series.loc
Access a group of rows and columns by label(s) or a boolean array.
`.loc[]` is primarily label based, but may also be used with a
boolean array.
Allowed inputs are:
- A single label, e.g. `5` or `'a'`, (note that `5` is
interpreted as a *label* of the index, and **never** as an
integer position along the index).
- A list or array of labels, e.g. `['a', 'b', 'c']`.
- A slice object with labels, e.g. `'a':'f'`.
#### WARNING
Note that, contrary to usual Python slices, **both** the
start and the stop are included.
- A boolean array of the same length as the axis being sliced,
e.g. `[True, False, True]`.
- An alignable boolean Series. The index of the key will be aligned before
masking.
- An alignable Index. The Index of the returned selection will be the input.
- A `callable` function with one argument (the calling Series or
DataFrame) and that returns valid output for indexing (one of the above)
See more at [Selection by Label](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-label).
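The inclusive stop of a label slice is the main difference from ordinary Python slicing. The sketch below models it with plain lists; `label_slice` is a hypothetical helper for illustration and assumes unique labels.

```python
labels = ['cobra', 'viper', 'sidewinder']
values = [1, 4, 7]


def label_slice(labels, values, start, stop):
    """Select from label `start` through label `stop`, both included (hypothetical helper)."""
    first = labels.index(start)
    last = labels.index(stop)
    # The +1 is what makes the stop label part of the result.
    return dict(zip(labels[first:last + 1], values[first:last + 1]))


by_label = label_slice(labels, values, 'cobra', 'viper')
by_position = values[0:1]  # a positional slice excludes the stop
print(by_label, by_position)
```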
* **Raises:**
* [**KeyError**](https://docs.python.org/3/library/exceptions.html#KeyError) – If any items are not found.
* **IndexingError** – If an indexed key is passed and its index is unalignable to the frame index.
#### SEE ALSO
[`DataFrame.at`](maxframe.dataframe.DataFrame.at.md#maxframe.dataframe.DataFrame.at)
: Access a single value for a row/column label pair.
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Access group of rows and columns by integer position(s).
[`DataFrame.xs`](maxframe.dataframe.DataFrame.xs.md#maxframe.dataframe.DataFrame.xs)
: Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.
[`Series.loc`](#maxframe.dataframe.Series.loc)
: Access group of values using labels.
### Examples
**Getting values**
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=['cobra', 'viper', 'sidewinder'],
... columns=['max_speed', 'shield'])
>>> df.execute()
max_speed shield
cobra 1 2
viper 4 5
sidewinder 7 8
```
Single label. Note this returns the row as a Series.
```pycon
>>> df.loc['viper'].execute()
max_speed 4
shield 5
Name: viper, dtype: int64
```
List of labels. Note using `[[]]` returns a DataFrame.
```pycon
>>> df.loc[['viper', 'sidewinder']].execute()
max_speed shield
viper 4 5
sidewinder 7 8
```
Single label for row and column
```pycon
>>> df.loc['cobra', 'shield'].execute()
2
```
Slice with labels for row and single label for column. As mentioned
above, note that both the start and stop of the slice are included.
```pycon
>>> df.loc['cobra':'viper', 'max_speed'].execute()
cobra 1
viper 4
Name: max_speed, dtype: int64
```
Boolean list with the same length as the row axis
```pycon
>>> df.loc[[False, False, True]].execute()
max_speed shield
sidewinder 7 8
```
Alignable boolean Series:
```pycon
>>> df.loc[md.Series([False, True, False],
... index=['viper', 'sidewinder', 'cobra'])].execute()
max_speed shield
sidewinder 7 8
```
Index (same behavior as `df.reindex`)
```pycon
>>> df.loc[md.Index(["cobra", "viper"], name="foo")].execute()
max_speed shield
foo
cobra 1 2
viper 4 5
```
Conditional that returns a boolean Series
```pycon
>>> df.loc[df['shield'] > 6].execute()
max_speed shield
sidewinder 7 8
```
Conditional that returns a boolean Series with column labels specified
```pycon
>>> df.loc[df['shield'] > 6, ['max_speed']].execute()
max_speed
sidewinder 7
```
Callable that returns a boolean Series
```pycon
>>> df.loc[lambda df: df['shield'] == 8].execute()
max_speed shield
sidewinder 7 8
```
**Setting values**
Set value for all items matching the list of labels
```pycon
>>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
>>> df.execute()
max_speed shield
cobra 1 2
viper 4 50
sidewinder 7 50
```
Set value for an entire row
```pycon
>>> df.loc['cobra'] = 10
>>> df.execute()
max_speed shield
cobra 10 10
viper 4 50
sidewinder 7 50
```
Set value for an entire column
```pycon
>>> df.loc[:, 'max_speed'] = 30
>>> df.execute()
max_speed shield
cobra 30 10
viper 30 50
sidewinder 30 50
```
Set value for rows matching callable condition
```pycon
>>> df.loc[df['shield'] > 35] = 0
>>> df.execute()
max_speed shield
cobra 30 10
viper 0 0
sidewinder 0 0
```
**Getting values on a DataFrame with an index that has integer labels**
Another example using integers for the index
```pycon
>>> df = md.DataFrame([[1, 2], [4, 5], [7, 8]],
... index=[7, 8, 9], columns=['max_speed', 'shield'])
>>> df.execute()
max_speed shield
7 1 2
8 4 5
9 7 8
```
Slice with integer labels for rows. As mentioned above, note that both
the start and stop of the slice are included.
```pycon
>>> df.loc[7:9].execute()
max_speed shield
7 1 2
8 4 5
9 7 8
```
**Getting values with a MultiIndex**
A number of examples using a DataFrame with a MultiIndex
```pycon
>>> tuples = [
... ('cobra', 'mark i'), ('cobra', 'mark ii'),
... ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
... ('viper', 'mark ii'), ('viper', 'mark iii')
... ]
>>> index = md.MultiIndex.from_tuples(tuples)
>>> values = [[12, 2], [0, 4], [10, 20],
... [1, 4], [7, 1], [16, 36]]
>>> df = md.DataFrame(values, columns=['max_speed', 'shield'], index=index)
>>> df.execute()
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
mark iii 16 36
```
Single label. Note this returns a DataFrame with a single index.
```pycon
>>> df.loc['cobra'].execute()
max_speed shield
mark i 12 2
mark ii 0 4
```
Single index tuple. Note this returns a Series.
```pycon
>>> df.loc[('cobra', 'mark ii')].execute()
max_speed 0
shield 4
Name: (cobra, mark ii), dtype: int64
```
Single label for row and column. Similar to passing in a tuple, this
returns a Series.
```pycon
>>> df.loc['cobra', 'mark i'].execute()
max_speed 12
shield 2
Name: (cobra, mark i), dtype: int64
```
Single tuple. Note using `[[]]` returns a DataFrame.
```pycon
>>> df.loc[[('cobra', 'mark ii')]].execute()
max_speed shield
cobra mark ii 0 4
```
Single tuple for the index with a single label for the column
```pycon
>>> df.loc[('cobra', 'mark i'), 'shield'].execute()
2
```
Slice from index tuple to single label
```pycon
>>> df.loc[('cobra', 'mark i'):'viper'].execute()
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
mark iii 16 36
```
Slice from index tuple to index tuple
```pycon
>>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')].execute()
max_speed shield
cobra mark i 12 2
mark ii 0 4
sidewinder mark i 10 20
mark ii 1 4
viper mark ii 7 1
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.lt.md
# maxframe.dataframe.Series.lt
#### Series.lt(other, level=None, fill_value=None, axis=0)
Return Less than of series and other, element-wise (binary operator lt).
Equivalent to `series < other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.lt(b, fill_value=0).execute()
a False
b False
c False
d True
e False
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.map.md
# maxframe.dataframe.Series.map
#### Series.map(arg, na_action=None, dtype=None, memory_scale=None, skip_infer=False)
Map values of Series according to input correspondence.
Used for substituting each value in a Series with another value,
that may be derived from a function, a `dict` or
a [`Series`](maxframe.dataframe.Series.md#maxframe.dataframe.Series).
* **Parameters:**
* **arg** (*function* *,* *collections.abc.Mapping subclass* *or* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Mapping correspondence.
* **na_action** ( *{None* *,* *'ignore'}* *,* *default None*) – If ‘ignore’, propagate NaN values, without passing them to the
mapping correspondence.
* **dtype** (*np.dtype* *,* *default None*) – Specify return type of the function. Must be specified when
we cannot decide the return type of the function.
* **memory_scale** ([*float*](https://docs.python.org/3/library/functions.html#float)) – Specify the ratio of memory used by the function to the
input size.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when dtypes or output_type is not specified.
* **Returns:**
Same index as caller.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.apply`](maxframe.dataframe.Series.apply.md#maxframe.dataframe.Series.apply)
: For applying more complex functions on a Series.
[`DataFrame.apply`](maxframe.dataframe.DataFrame.apply.md#maxframe.dataframe.DataFrame.apply)
: Apply a function row-/column-wise.
[`DataFrame.applymap`](maxframe.dataframe.DataFrame.applymap.md#maxframe.dataframe.DataFrame.applymap)
: Apply a function elementwise on a whole DataFrame.
### Notes
When `arg` is a dictionary, values in Series that are not in the
dictionary (as keys) are converted to `NaN`. However, if the
dictionary is a `dict` subclass that defines `__missing__` (i.e.
provides a method for default values), then this default is used
rather than `NaN`.
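The effect of `__missing__` can be sketched in plain Python. `map_sketch` is a hypothetical helper that mirrors the rule described in this note; it is not MaxFrame's implementation.

```python
import math
from collections import defaultdict


def map_sketch(values, mapping):
    """Look up each value in `mapping` (hypothetical helper, not a MaxFrame API)."""
    if hasattr(mapping, '__missing__'):
        # Subscript access lets __missing__ supply the default.
        return [mapping[v] for v in values]
    # A plain dict has no __missing__, so unknown keys become NaN.
    return [mapping.get(v, math.nan) for v in values]


values = ['cat', 'dog', 'rabbit']
translations = {'cat': 'kitten', 'dog': 'puppy'}

plain = map_sketch(values, translations)
with_default = map_sketch(values, defaultdict(lambda: 'unknown', translations))
print(plain)
print(with_default)
```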
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(['cat', 'dog', mt.nan, 'rabbit'])
>>> s.execute()
0 cat
1 dog
2 NaN
3 rabbit
dtype: object
```
`map` accepts a `dict` or a `Series`. Values that are not found
in the `dict` are converted to `NaN`, unless the dict has a default
value (e.g. `defaultdict`):
```pycon
>>> s.map({'cat': 'kitten', 'dog': 'puppy'}).execute()
0 kitten
1 puppy
2 NaN
3 NaN
dtype: object
```
It also accepts a function:
```pycon
>>> s.map('I am a {}'.format).execute()
0 I am a cat
1 I am a dog
2 I am a nan
3 I am a rabbit
dtype: object
```
To avoid applying the function to missing values (and keep them as
`NaN`) `na_action='ignore'` can be used:
```pycon
>>> s.map('I am a {}'.format, na_action='ignore').execute()
0 I am a cat
1 I am a dog
2 NaN
3 I am a rabbit
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.mask.md
# maxframe.dataframe.Series.mask
#### Series.mask(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
Replace values where the condition is True.
* **Parameters:**
* **cond** (*bool Series/DataFrame* *,* *array-like* *, or* *callable*) – Where cond is False, keep the original value. Where
True, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
* **other** (*scalar* *,* *Series/DataFrame* *, or* *callable*) – Entries where cond is True are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to perform the operation in place on the data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Alignment axis if needed.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Alignment level if needed.
* **Return type:**
Same type as caller
#### SEE ALSO
[`DataFrame.where()`](maxframe.dataframe.DataFrame.where.md#maxframe.dataframe.DataFrame.where)
: Return an object of same shape as self.
### Notes
The mask method is an application of the if-then idiom. For each
element in the calling DataFrame, if `cond` is `False` the
element is used; otherwise the corresponding element from the DataFrame
`other` is used.
The signature for [`DataFrame.where()`](maxframe.dataframe.DataFrame.where.md#maxframe.dataframe.DataFrame.where) differs from
[`numpy.where()`](https://numpy.org/doc/stable/reference/generated/numpy.where.html#numpy.where). Roughly `df1.where(m, df2)` is equivalent to
`np.where(m, df1, df2)`.
For further details and examples see the `mask` documentation in
[indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-where-mask).
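The if-then idiom and the relationship between `mask` and `where` can be sketched with plain lists. `where_sketch` and `mask_sketch` are hypothetical helpers, and `None` stands in for the missing value that MaxFrame would represent as NaN.

```python
def where_sketch(values, cond, other=None):
    """Keep a value where cond is True, otherwise use `other`."""
    return [v if c else other for v, c in zip(values, cond)]


def mask_sketch(values, cond, other=None):
    """Replace a value where cond is True, otherwise keep it."""
    return [other if c else v for v, c in zip(values, cond)]


values = list(range(5))
cond = [v > 0 for v in values]
inverse = [not c for c in cond]

print(where_sketch(values, cond))
print(mask_sketch(values, cond))
```

Masking with a condition gives the same result as `where` with the inverted condition.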
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(range(5))
>>> s.where(s > 0).execute()
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
```
```pycon
>>> s.mask(s > 0).execute()
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
```
```pycon
>>> s.where(s > 1, 10).execute()
0 10
1 10
2 2
3 3
4 4
dtype: int64
```
```pycon
>>> df = md.DataFrame(mt.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df.execute()
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
>>> m = df % 3 == 0
>>> df.where(m, -df).execute()
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> (df.where(m, -df) == mt.where(m, df, -df)).execute()
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> (df.where(m, -df) == df.mask(~m, -df)).execute()
A B
0 True True
1 True True
2 True True
3 True True
4 True True
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.max.md
# maxframe.dataframe.Series.max
#### Series.max(axis=None, skipna=True, level=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.md
# maxframe.dataframe.Series
### *class* maxframe.dataframe.Series(data=None, index=None, dtype=None, name=None, copy=False, chunk_size=None, gpu=None, sparse=None, num_partitions=None)
#### \_\_init_\_(data=None, index=None, dtype=None, name=None, copy=False, chunk_size=None, gpu=None, sparse=None, num_partitions=None)
### Methods
| [`__init__`](#maxframe.dataframe.Series.__init__)([data, index, dtype, name, copy, ...]) | |
|---------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| [`abs`](maxframe.dataframe.Series.abs.md#maxframe.dataframe.Series.abs)() | |
| [`add`](maxframe.dataframe.Series.add.md#maxframe.dataframe.Series.add)(other[, level, fill_value, axis]) | Return Addition of series and other, element-wise (binary operator add). |
| [`add_prefix`](maxframe.dataframe.Series.add_prefix.md#maxframe.dataframe.Series.add_prefix)(prefix) | Prefix labels with string prefix. |
| [`add_suffix`](maxframe.dataframe.Series.add_suffix.md#maxframe.dataframe.Series.add_suffix)(suffix) | Suffix labels with string suffix. |
| [`agg`](maxframe.dataframe.Series.agg.md#maxframe.dataframe.Series.agg)([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`aggregate`](maxframe.dataframe.Series.aggregate.md#maxframe.dataframe.Series.aggregate)([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`align`](maxframe.dataframe.Series.align.md#maxframe.dataframe.Series.align)(other[, join, axis, level, copy, ...]) | Align two objects on their axes with the specified join method. |
| [`all`](maxframe.dataframe.Series.all.md#maxframe.dataframe.Series.all)([axis, bool_only, skipna, level, method]) | |
| [`any`](maxframe.dataframe.Series.any.md#maxframe.dataframe.Series.any)([axis, bool_only, skipna, level, method]) | |
| [`append`](maxframe.dataframe.Series.append.md#maxframe.dataframe.Series.append)(other[, ignore_index, ...]) | Append rows of other to the end of caller, returning a new object. |
| [`apply`](maxframe.dataframe.Series.apply.md#maxframe.dataframe.Series.apply)(func[, convert_dtype, output_type, ...]) | Invoke function on values of Series. |
| [`argmax`](maxframe.dataframe.Series.argmax.md#maxframe.dataframe.Series.argmax)([axis, skipna]) | Return int position of the largest value in the Series. |
| [`argmin`](maxframe.dataframe.Series.argmin.md#maxframe.dataframe.Series.argmin)([axis, skipna]) | Return int position of the smallest value in the Series. |
| [`argsort`](maxframe.dataframe.Series.argsort.md#maxframe.dataframe.Series.argsort)([axis, kind, order, stable]) | Return the integer indices that would sort the Series values. |
| `around`([decimals]) | Round each value in a Series to the given number of decimals. |
| [`astype`](maxframe.dataframe.Series.astype.md#maxframe.dataframe.Series.astype)(dtype[, copy, errors]) | Cast a pandas object to a specified dtype `dtype`. |
| [`at_time`](maxframe.dataframe.Series.at_time.md#maxframe.dataframe.Series.at_time)(time[, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| `autocorr`([lag]) | Compute the lag-N autocorrelation. |
| `backfill`([axis, inplace, limit, downcast]) | Synonym for [`DataFrame.fillna()`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna) with `method='bfill'`. |
| [`between`](maxframe.dataframe.Series.between.md#maxframe.dataframe.Series.between)(left, right[, inclusive]) | Return boolean Series equivalent to left <= series <= right. |
| [`between_time`](maxframe.dataframe.Series.between_time.md#maxframe.dataframe.Series.between_time)(start_time, end_time[, ...]) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
| `bfill`([axis, inplace, limit, downcast]) | Synonym for [`DataFrame.fillna()`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna) with `method='bfill'`. |
| [`case_when`](maxframe.dataframe.Series.case_when.md#maxframe.dataframe.Series.case_when)(caselist) | Replace values where the conditions are True. |
| `check_monotonic`([decreasing, strict]) | Check if values in the object are monotonic increasing or decreasing. |
| [`clip`](maxframe.dataframe.Series.clip.md#maxframe.dataframe.Series.clip)([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| [`combine`](maxframe.dataframe.Series.combine.md#maxframe.dataframe.Series.combine)(other, func[, fill_value]) | Combine the Series with a Series or scalar according to func. |
| [`combine_first`](maxframe.dataframe.Series.combine_first.md#maxframe.dataframe.Series.combine_first)(other) | Update null elements with value in the same location in 'other'. |
| [`compare`](maxframe.dataframe.Series.compare.md#maxframe.dataframe.Series.compare)(other[, align_axis, keep_shape, ...]) | Compare to another Series and show the differences. |
| [`convert_dtypes`](maxframe.dataframe.Series.convert_dtypes.md#maxframe.dataframe.Series.convert_dtypes)([infer_objects, ...]) | Convert columns to best possible dtypes using dtypes supporting `pd.NA`. |
| [`copy`](maxframe.dataframe.Series.copy.md#maxframe.dataframe.Series.copy)([deep]) | Make a copy of this object's indices and data. |
| `copy_from`(obj) | |
| `copy_to`(target) | |
| [`corr`](maxframe.dataframe.Series.corr.md#maxframe.dataframe.Series.corr)(other[, method, min_periods]) | Compute correlation with other Series, excluding missing values. |
| [`count`](maxframe.dataframe.Series.count.md#maxframe.dataframe.Series.count)([level]) | |
| [`cov`](maxframe.dataframe.Series.cov.md#maxframe.dataframe.Series.cov)(other[, min_periods, ddof]) | Compute covariance with Series, excluding missing values. |
| `cummax`([axis, skipna]) | |
| `cummin`([axis, skipna]) | |
| `cumprod`([axis, skipna]) | |
| `cumsum`([axis, skipna]) | |
| [`describe`](maxframe.dataframe.Series.describe.md#maxframe.dataframe.Series.describe)([percentiles, include, exclude]) | Generate descriptive statistics. |
| `diff`([periods]) | First discrete difference of element. |
| [`div`](maxframe.dataframe.Series.div.md#maxframe.dataframe.Series.div)(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator truediv). |
| `dot`(other) | Compute the dot product between the Series and the columns of other. |
| [`drop`](maxframe.dataframe.Series.drop.md#maxframe.dataframe.Series.drop)([labels, axis, index, columns, level, ...]) | Return Series with specified index labels removed. |
| [`drop_duplicates`](maxframe.dataframe.Series.drop_duplicates.md#maxframe.dataframe.Series.drop_duplicates)([keep, inplace, ...]) | Return Series with duplicate values removed. |
| [`droplevel`](maxframe.dataframe.Series.droplevel.md#maxframe.dataframe.Series.droplevel)(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
| [`dropna`](maxframe.dataframe.Series.dropna.md#maxframe.dataframe.Series.dropna)([axis, inplace, how, ignore_index]) | Return a new Series with missing values removed. |
| `duplicated`([keep, method]) | Indicate duplicate Series values. |
| [`eq`](maxframe.dataframe.Series.eq.md#maxframe.dataframe.Series.eq)(other[, level, fill_value, axis]) | Return Equal to of series and other, element-wise (binary operator eq). |
| [`ewm`](maxframe.dataframe.Series.ewm.md#maxframe.dataframe.Series.ewm)([com, span, halflife, alpha, ...]) | Provide exponential weighted functions. |
| `execute`([session]) | |
| [`expanding`](maxframe.dataframe.Series.expanding.md#maxframe.dataframe.Series.expanding)([min_periods, shift, reverse_range]) | Provide expanding transformations. |
| [`explode`](maxframe.dataframe.Series.explode.md#maxframe.dataframe.Series.explode)([ignore_index, default_index_type]) | Transform each element of a list-like to a row. |
| [`factorize`](maxframe.dataframe.Series.factorize.md#maxframe.dataframe.Series.factorize)([sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| `ffill`([axis, inplace, limit, downcast]) | Synonym for [`DataFrame.fillna()`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna) with `method='ffill'`. |
| [`fillna`](maxframe.dataframe.Series.fillna.md#maxframe.dataframe.Series.fillna)([value, method, axis, inplace, ...]) | Fill NA/NaN values using the specified method. |
| [`filter`](maxframe.dataframe.Series.filter.md#maxframe.dataframe.Series.filter)([items, like, regex, axis]) | Subset the dataframe rows or columns according to the specified index labels. |
| [`first_valid_index`](maxframe.dataframe.Series.first_valid_index.md#maxframe.dataframe.Series.first_valid_index)() | Return index for first non-NA value or None, if no non-NA value is found. |
| [`floordiv`](maxframe.dataframe.Series.floordiv.md#maxframe.dataframe.Series.floordiv)(other[, level, fill_value, axis]) | Return Integer division of series and other, element-wise (binary operator floordiv). |
| `from_tensor`(tensor[, index, name, dtype, ...]) | |
| [`ge`](maxframe.dataframe.Series.ge.md#maxframe.dataframe.Series.ge)(other[, level, fill_value, axis]) | Return Greater than or equal to of series and other, element-wise (binary operator ge). |
| [`groupby`](maxframe.dataframe.Series.groupby.md#maxframe.dataframe.Series.groupby)([by, level, as_index, sort, group_keys]) | Group Series using a mapper or by a Series of columns. |
| [`gt`](maxframe.dataframe.Series.gt.md#maxframe.dataframe.Series.gt)(other[, level, fill_value, axis]) | Return Greater than of series and other, element-wise (binary operator gt). |
| [`head`](maxframe.dataframe.Series.head.md#maxframe.dataframe.Series.head)([n]) | Return the first n rows. |
| [`idxmax`](maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax)([axis, skipna]) | Return the row label of the maximum value. |
| [`idxmin`](maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin)([axis, skipna]) | Return the row label of the minimum value. |
| [`infer_objects`](maxframe.dataframe.Series.infer_objects.md#maxframe.dataframe.Series.infer_objects)([copy]) | Attempt to infer better dtypes for object columns. |
| [`isin`](maxframe.dataframe.Series.isin.md#maxframe.dataframe.Series.isin)(values) | Whether elements in Series are contained in values. |
| [`isna`](maxframe.dataframe.Series.isna.md#maxframe.dataframe.Series.isna)() | Detect missing values. |
| `isnull`() | Detect missing values. |
| `items`([batch_size, session]) | Lazily iterate over (index, value) tuples. |
| `iteritems`([batch_size, session]) | Lazily iterate over (index, value) tuples. |
| `keys`() | Return alias for index. |
| `kurt`([axis, skipna, level, bias, fisher, method]) | |
| `kurtosis`([axis, skipna, level, bias, ...]) | |
| [`last_valid_index`](maxframe.dataframe.Series.last_valid_index.md#maxframe.dataframe.Series.last_valid_index)() | Return index for last non-NA value or None, if no non-NA value is found. |
| [`le`](maxframe.dataframe.Series.le.md#maxframe.dataframe.Series.le)(other[, level, fill_value, axis]) | Return Less than or equal to of series and other, element-wise (binary operator le). |
| [`lt`](maxframe.dataframe.Series.lt.md#maxframe.dataframe.Series.lt)(other[, level, fill_value, axis]) | Return Less than of series and other, element-wise (binary operator lt). |
| [`map`](maxframe.dataframe.Series.map.md#maxframe.dataframe.Series.map)(arg[, na_action, dtype, memory_scale, ...]) | Map values of Series according to input correspondence. |
| [`mask`](maxframe.dataframe.Series.mask.md#maxframe.dataframe.Series.mask)(cond[, other, inplace, axis, level, ...]) | Replace values where the condition is True. |
| [`max`](maxframe.dataframe.Series.max.md#maxframe.dataframe.Series.max)([axis, skipna, level, method]) | |
| [`mean`](maxframe.dataframe.Series.mean.md#maxframe.dataframe.Series.mean)([axis, skipna, level, method]) | |
| [`median`](maxframe.dataframe.Series.median.md#maxframe.dataframe.Series.median)([axis, skipna, level, method]) | |
| [`memory_usage`](maxframe.dataframe.Series.memory_usage.md#maxframe.dataframe.Series.memory_usage)([index, deep]) | Return the memory usage of the Series. |
| [`min`](maxframe.dataframe.Series.min.md#maxframe.dataframe.Series.min)([axis, skipna, level, method]) | |
| [`mod`](maxframe.dataframe.Series.mod.md#maxframe.dataframe.Series.mod)(other[, level, fill_value, axis]) | Return Modulo of series and other, element-wise (binary operator mod). |
| [`mode`](maxframe.dataframe.Series.mode.md#maxframe.dataframe.Series.mode)([dropna, combine_size]) | Return the mode(s) of the Series. |
| [`mul`](maxframe.dataframe.Series.mul.md#maxframe.dataframe.Series.mul)(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator mul). |
| `multiply`(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator mul). |
| [`ne`](maxframe.dataframe.Series.ne.md#maxframe.dataframe.Series.ne)(other[, level, fill_value, axis]) | Return Not equal to of series and other, element-wise (binary operator ne). |
| [`nlargest`](maxframe.dataframe.Series.nlargest.md#maxframe.dataframe.Series.nlargest)(n[, keep]) | Return the largest n elements. |
| [`notna`](maxframe.dataframe.Series.notna.md#maxframe.dataframe.Series.notna)() | Detect existing (non-missing) values. |
| `notnull`() | Detect existing (non-missing) values. |
| [`nsmallest`](maxframe.dataframe.Series.nsmallest.md#maxframe.dataframe.Series.nsmallest)(n[, keep]) | Return the smallest n elements. |
| [`nunique`](maxframe.dataframe.Series.nunique.md#maxframe.dataframe.Series.nunique)([dropna]) | Return number of unique elements in the object. |
| `pad`([axis, inplace, limit, downcast]) | Synonym for [`DataFrame.fillna()`](maxframe.dataframe.DataFrame.fillna.md#maxframe.dataframe.DataFrame.fillna) with `method='ffill'`. |
| `pct_change`([periods, fill_method, limit, freq]) | Percentage change between the current and a prior element. |
| [`pop`](maxframe.dataframe.Series.pop.md#maxframe.dataframe.Series.pop)(item) | Return item and drop it from the Series. |
| [`pow`](maxframe.dataframe.Series.pow.md#maxframe.dataframe.Series.pow)(other[, level, fill_value, axis]) | Return Exponential power of series and other, element-wise (binary operator pow). |
| [`prod`](maxframe.dataframe.Series.prod.md#maxframe.dataframe.Series.prod)([axis, skipna, level, min_count, method]) | |
| [`product`](maxframe.dataframe.Series.product.md#maxframe.dataframe.Series.product)([axis, skipna, level, min_count, method]) | |
| [`quantile`](maxframe.dataframe.Series.quantile.md#maxframe.dataframe.Series.quantile)([q, interpolation]) | Return value at the given quantile. |
| [`radd`](maxframe.dataframe.Series.radd.md#maxframe.dataframe.Series.radd)(other[, level, fill_value, axis]) | Return Addition of series and other, element-wise (binary operator radd). |
| [`rank`](maxframe.dataframe.Series.rank.md#maxframe.dataframe.Series.rank)([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| [`rdiv`](maxframe.dataframe.Series.rdiv.md#maxframe.dataframe.Series.rdiv)(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator rtruediv). |
| `rechunk`(chunk_size[, reassign_worker]) | |
| [`reindex`](maxframe.dataframe.Series.reindex.md#maxframe.dataframe.Series.reindex)([labels, index, columns, axis, ...]) | Conform Series/DataFrame to new index with optional filling logic. |
| [`reindex_like`](maxframe.dataframe.Series.reindex_like.md#maxframe.dataframe.Series.reindex_like)(other[, method, copy, limit, ...]) | Return an object with matching indices as other object. |
| [`rename`](maxframe.dataframe.Series.rename.md#maxframe.dataframe.Series.rename)([index, axis, copy, inplace, level, ...]) | Alter Series index labels or name. |
| `rename_axis`([mapper, index, columns, axis, ...]) | Set the name of the axis for the index or columns. |
| [`reorder_levels`](maxframe.dataframe.Series.reorder_levels.md#maxframe.dataframe.Series.reorder_levels)(order) | Rearrange index levels using input order. |
| [`repeat`](maxframe.dataframe.Series.repeat.md#maxframe.dataframe.Series.repeat)(repeats[, axis]) | Repeat elements of a Series. |
| `replace`([to_replace, value, inplace, limit, ...]) | Replace values given in to_replace with value. |
| [`reset_index`](maxframe.dataframe.Series.reset_index.md#maxframe.dataframe.Series.reset_index)([level, drop, name, inplace, ...]) | Generate a new DataFrame or Series with the index reset. |
| [`rfloordiv`](maxframe.dataframe.Series.rfloordiv.md#maxframe.dataframe.Series.rfloordiv)(other[, level, fill_value, axis]) | Return Integer division of series and other, element-wise (binary operator rfloordiv). |
| [`rmod`](maxframe.dataframe.Series.rmod.md#maxframe.dataframe.Series.rmod)(other[, level, fill_value, axis]) | Return Modulo of series and other, element-wise (binary operator rmod). |
| [`rmul`](maxframe.dataframe.Series.rmul.md#maxframe.dataframe.Series.rmul)(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator rmul). |
| [`rolling`](maxframe.dataframe.Series.rolling.md#maxframe.dataframe.Series.rolling)(window[, min_periods, center, ...]) | Provide rolling window calculations. |
| [`round`](maxframe.dataframe.Series.round.md#maxframe.dataframe.Series.round)([decimals]) | Round each value in a Series to the given number of decimals. |
| [`rpow`](maxframe.dataframe.Series.rpow.md#maxframe.dataframe.Series.rpow)(other[, level, fill_value, axis]) | Return Exponential power of series and other, element-wise (binary operator rpow). |
| [`rsub`](maxframe.dataframe.Series.rsub.md#maxframe.dataframe.Series.rsub)(other[, level, fill_value, axis]) | Return Subtraction of series and other, element-wise (binary operator rsubtract). |
| [`rtruediv`](maxframe.dataframe.Series.rtruediv.md#maxframe.dataframe.Series.rtruediv)(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator rtruediv). |
| [`sample`](maxframe.dataframe.Series.sample.md#maxframe.dataframe.Series.sample)([n, frac, replace, weights, ...]) | Return a random sample of items from an axis of object. |
| [`sem`](maxframe.dataframe.Series.sem.md#maxframe.dataframe.Series.sem)([axis, skipna, level, ddof, method]) | |
| [`set_axis`](maxframe.dataframe.Series.set_axis.md#maxframe.dataframe.Series.set_axis)(labels[, axis, inplace]) | Assign desired index to given axis. |
| [`shift`](maxframe.dataframe.Series.shift.md#maxframe.dataframe.Series.shift)([periods, freq, axis, fill_value]) | Shift index by desired number of periods with an optional time freq. |
| `skew`([axis, skipna, level, bias, method]) | |
| [`sort_index`](maxframe.dataframe.Series.sort_index.md#maxframe.dataframe.Series.sort_index)([axis, level, ascending, ...]) | Sort object by labels (along an axis). |
| [`sort_values`](maxframe.dataframe.Series.sort_values.md#maxframe.dataframe.Series.sort_values)([axis, ascending, inplace, ...]) | Sort by the values. |
| [`std`](maxframe.dataframe.Series.std.md#maxframe.dataframe.Series.std)([axis, skipna, level, ddof, method]) | |
| [`sub`](maxframe.dataframe.Series.sub.md#maxframe.dataframe.Series.sub)(other[, level, fill_value, axis]) | Return Subtraction of series and other, element-wise (binary operator subtract). |
| [`sum`](maxframe.dataframe.Series.sum.md#maxframe.dataframe.Series.sum)([axis, skipna, level, min_count, method]) | |
| [`swaplevel`](maxframe.dataframe.Series.swaplevel.md#maxframe.dataframe.Series.swaplevel)([i, j]) | Swap levels i and j in a `MultiIndex`. |
| `tail`([n]) | Return the last n rows. |
| [`take`](maxframe.dataframe.Series.take.md#maxframe.dataframe.Series.take)(indices[, axis]) | Return the elements in the given *positional* indices along an axis. |
| `to_clipboard`(\*[, excel, sep, batch_size, ...]) | Copy object to the system clipboard. |
| [`to_csv`](maxframe.dataframe.Series.to_csv.md#maxframe.dataframe.Series.to_csv)(path[, sep, na_rep, float_format, ...]) | Write object to a comma-separated values (csv) file. |
| [`to_dict`](maxframe.dataframe.Series.to_dict.md#maxframe.dataframe.Series.to_dict)([into, batch_size, session]) | Convert Series to {label -> value} dict or dict-like object. |
| [`to_frame`](maxframe.dataframe.Series.to_frame.md#maxframe.dataframe.Series.to_frame)([name]) | Convert Series to DataFrame. |
| [`to_json`](maxframe.dataframe.Series.to_json.md#maxframe.dataframe.Series.to_json)([path, orient, date_format, ...]) | Convert the object to a JSON string. |
| [`to_list`](maxframe.dataframe.Series.to_list.md#maxframe.dataframe.Series.to_list)([batch_size, session]) | Return a list of the values. |
| `to_pandas`([session]) | |
| `to_tensor`([dtype]) | |
| [`transform`](maxframe.dataframe.Series.transform.md#maxframe.dataframe.Series.transform)(func[, convert_dtype, axis, ...]) | Call `func` on self producing a Series with transformed values. |
| [`truediv`](maxframe.dataframe.Series.truediv.md#maxframe.dataframe.Series.truediv)(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator truediv). |
| [`truncate`](maxframe.dataframe.Series.truncate.md#maxframe.dataframe.Series.truncate)([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
| [`tshift`](maxframe.dataframe.Series.tshift.md#maxframe.dataframe.Series.tshift)([periods, freq, axis]) | Shift the time index, using the index's frequency if available. |
| [`unique`](maxframe.dataframe.Series.unique.md#maxframe.dataframe.Series.unique)([method]) | Uniques are returned in order of appearance. |
| [`unstack`](maxframe.dataframe.Series.unstack.md#maxframe.dataframe.Series.unstack)([level, fill_value]) | Unstack, also known as pivot, Series with MultiIndex to produce DataFrame. |
| [`update`](maxframe.dataframe.Series.update.md#maxframe.dataframe.Series.update)(other) | Modify Series in place using values from passed Series. |
| [`value_counts`](maxframe.dataframe.Series.value_counts.md#maxframe.dataframe.Series.value_counts)([normalize, sort, ascending, ...]) | Return a Series containing counts of unique values. |
| [`var`](maxframe.dataframe.Series.var.md#maxframe.dataframe.Series.var)([axis, skipna, level, ddof, method]) | |
| [`where`](maxframe.dataframe.Series.where.md#maxframe.dataframe.Series.where)(cond[, other, inplace, axis, level, ...]) | Replace values where the condition is False. |
| [`xs`](maxframe.dataframe.Series.xs.md#maxframe.dataframe.Series.xs)(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
### Attributes
| [`T`](maxframe.dataframe.Series.T.md#maxframe.dataframe.Series.T) | Return the transpose, which is by definition self. |
|-------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| [`at`](maxframe.dataframe.Series.at.md#maxframe.dataframe.Series.at) | Access a single value for a row/column label pair. |
| `data` | |
| [`dtype`](maxframe.dataframe.Series.dtype.md#maxframe.dataframe.Series.dtype) | Return the dtype object of the underlying data. |
| [`iat`](maxframe.dataframe.Series.iat.md#maxframe.dataframe.Series.iat) | Access a single value for a row/column pair by integer position. |
| [`iloc`](maxframe.dataframe.Series.iloc.md#maxframe.dataframe.Series.iloc) | Purely integer-location based indexing for selection by position. |
| [`index`](maxframe.dataframe.Series.index.md#maxframe.dataframe.Series.index) | The index (axis labels) of the Series. |
| `is_monotonic` | Return boolean scalar if values in the object are monotonic_increasing. |
| [`is_monotonic_decreasing`](maxframe.dataframe.Series.is_monotonic_decreasing.md#maxframe.dataframe.Series.is_monotonic_decreasing) | Return boolean scalar if values in the object are monotonic_decreasing. |
| [`is_monotonic_increasing`](maxframe.dataframe.Series.is_monotonic_increasing.md#maxframe.dataframe.Series.is_monotonic_increasing) | Return boolean scalar if values in the object are monotonic_increasing. |
| [`is_unique`](maxframe.dataframe.Series.is_unique.md#maxframe.dataframe.Series.is_unique) | Return boolean if values in the object are unique. |
| [`loc`](maxframe.dataframe.Series.loc.md#maxframe.dataframe.Series.loc) | Access a group of rows and columns by label(s) or a boolean array. |
| [`name`](maxframe.dataframe.Series.name.md#maxframe.dataframe.Series.name) | |
| [`ndim`](maxframe.dataframe.Series.ndim.md#maxframe.dataframe.Series.ndim) | Return an int representing the number of axes / array dimensions. |
| [`shape`](maxframe.dataframe.Series.shape.md#maxframe.dataframe.Series.shape) | |
| `size` | |
| `type_name` | |
| `values` | |
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.mean.md
# maxframe.dataframe.Series.mean
#### Series.mean(axis=None, skipna=True, level=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.median.md
# maxframe.dataframe.Series.median
#### Series.median(axis=None, skipna=True, level=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.memory_usage.md
# maxframe.dataframe.Series.memory_usage
#### Series.memory_usage(index=True, deep=False)
Return the memory usage of the Series.
The memory usage can optionally include the contribution of
the index and of elements of object dtype.
* **Parameters:**
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Specifies whether to include the memory usage of the Series index.
* **deep** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, introspect the data deeply by interrogating
object dtypes for system-level memory consumption, and include
it in the returned value.
* **Returns:**
Bytes of memory consumed.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`numpy.ndarray.nbytes`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.nbytes.html#numpy.ndarray.nbytes)
: Total bytes consumed by the elements of the array.
[`DataFrame.memory_usage`](maxframe.dataframe.DataFrame.memory_usage.md#maxframe.dataframe.DataFrame.memory_usage)
: Bytes consumed by a DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(range(3))
>>> s.memory_usage().execute()
152
```
Not including the index gives the size of the rest of the data, which
is necessarily smaller:
```pycon
>>> s.memory_usage(index=False).execute()
24
```
The memory footprint of object values is ignored by default:
```pycon
>>> s = md.Series(["a", "b"])
>>> s.values.execute()
array(['a', 'b'], dtype=object)
```
```pycon
>>> s.memory_usage().execute()
144
```
```pycon
>>> s.memory_usage(deep=True).execute()
260
```
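The relationship between the `index` and `deep` options can also be checked locally with plain pandas. The sketch below uses pandas only (no MaxFrame session, hence no `.execute()`) and assumes MaxFrame follows the same accounting as pandas. Exact byte counts vary across pandas versions and platforms, so only the relationships between the numbers are shown.

```python
import pandas as pd

# Plain pandas stand-in for the MaxFrame Series used above (assumption:
# MaxFrame follows the same memory accounting as pandas).
s = pd.Series(range(3))

with_index = s.memory_usage()              # values plus index
values_only = s.memory_usage(index=False)  # values only
index_bytes = s.index.memory_usage()       # index only

# Object values: deep=True also counts the Python objects the array points to.
obj = pd.Series(["a", "b"], dtype=object)
shallow = obj.memory_usage(index=False)
deep = obj.memory_usage(index=False, deep=True)

# The index accounts for the whole difference between the two calls.
print(with_index == values_only + index_bytes)
# Deep introspection adds the size of the string objects themselves.
print(deep > shallow)
```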
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.mf.apply_chunk.md
# maxframe.dataframe.Series.mf.apply_chunk
#### Series.mf.apply_chunk(func: [str](https://docs.python.org/3/library/stdtypes.html#str) | [Callable](https://docs.python.org/3/library/typing.html#typing.Callable), batch_rows=None, dtypes=None, dtype=None, name=None, output_type=None, index=None, skip_infer=False, args=(), \*\*kwargs)
Apply a function that takes a pandas Series and outputs a pandas DataFrame or Series.
The pandas Series given to the function is a chunk of the input series.
The objects passed into this function are slices of the original series, containing at most batch_rows
number of elements. The function output can be either a DataFrame or a Series.
`apply_chunk` will ultimately merge the results into a new DataFrame or Series.
Do not expect the function to receive all elements of the series in a single call: how the data
is split depends on the implementation of MaxFrame and the internal running state of MaxCompute.
`func` can be a ufunc (a NumPy function that applies to the entire Series)
or a Python function that only works on a Series.
* **Parameters:**
* **func** (*function*) – Python function or NumPy ufunc to apply.
* **batch_rows** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Expected number of elements in a batch, which is also the length of the Series passed to the function.
A batch may contain fewer elements when the remaining data is insufficient.
* **output_type** ( *{'dataframe'* *,* *'series'}* *,* *default None*) – Specify type of returned object. See Notes for more details.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames. See Notes for more details.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of returned Series. See Notes for more details.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of returned Series. See Notes for more details.
* **index** ([*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index) *,* *default None*) – Specify index of returned object. See Notes for more details.
* **args** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple)) – Positional arguments passed to func after the series value.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when dtypes or output_type is not specified.
* **\*\*kwargs** – Additional keyword arguments passed to func.
* **Returns:**
If func returns a Series object the result will be a Series, else the result will be a DataFrame.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`DataFrame.apply_chunk`
: Apply function to DataFrame chunk.
[`Series.apply`](maxframe.dataframe.Series.apply.md#maxframe.dataframe.Series.apply)
: For non-batching operations.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `func` onto a mock Series, and the apply call may fail.
When this happens, you need to specify the type of apply call
(DataFrame or Series) in output_type.
* For DataFrame output, you need to specify a list or a pandas Series
as `dtypes` of output DataFrame. `index` of output can also be
specified.
* For Series output, you need to specify `dtype` and `name` of
output Series.
* For any input with data type `pandas.ArrowDtype(pyarrow.MapType)`, it will always
be converted to a Python dict. And for any output with this data type, it must be
returned as a Python dict as well.
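The inference step described in these notes can be pictured with a small pandas-only sketch. `infer_output` is a hypothetical helper written for illustration, not MaxFrame's actual inference code, and the mock values it uses are an assumption. It shows why a function that depends on real index labels cannot be inferred from mock data and therefore needs `output_type` and `dtype` or `dtypes` supplied explicitly.

```python
import pandas as pd


def infer_output(func, input_dtype):
    """Hypothetical helper: guess the output kind by calling func on mock data."""
    mock = pd.Series([1, 2], dtype=input_dtype)  # mock values are an assumption
    try:
        out = func(mock)
    except Exception:
        # Inference failed: the caller has to pass output_type / dtype(s).
        return None
    if isinstance(out, pd.DataFrame):
        return ("dataframe", out.dtypes)
    if isinstance(out, pd.Series):
        return ("series", out.dtype)
    return None


def square(x):
    return x ** 2


def needs_real_index(x):
    # Looks up a label that exists only in the real data, not in the mock.
    return x.loc[["London"]]


kind, dtype = infer_output(square, "int64")
failed = infer_output(needs_real_index, "int64")
print(kind, dtype, failed)
```

In the failing case the fix is on the caller's side: state the result type up front instead of relying on the mock call.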
### Examples
Create a series with typical summer temperatures for each city.
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([20, 21, 12],
... index=['London', 'New York', 'Helsinki'])
>>> s.execute()
London 20
New York 21
Helsinki 12
dtype: int64
```
Square the values by defining a function and passing it as an
argument to `apply_chunk()`.
```pycon
>>> def square(x):
... return x ** 2
>>> s.mf.apply_chunk(square, batch_rows=2).execute()
London 400
New York 441
Helsinki 144
dtype: int64
```
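To make the role of `batch_rows` concrete, the following pandas-only sketch emulates the batching locally. `apply_in_batches` is a hypothetical helper, and the assumption that batches are consecutive slices taken in order is a simplification: the real split depends on MaxFrame and MaxCompute.

```python
import pandas as pd


def apply_in_batches(series, func, batch_rows):
    """Hypothetical local emulation: call func on slices of at most batch_rows rows."""
    pieces = []
    for start in range(0, len(series), batch_rows):
        batch = series.iloc[start:start + batch_rows]
        pieces.append(func(batch))
    # Merge the per-batch results back into one object.
    return pd.concat(pieces)


batch_sizes = []


def square(x):
    batch_sizes.append(len(x))  # record how many rows each call received
    return x ** 2


s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])
result = apply_in_batches(s, square, batch_rows=2)
print(result)
print(batch_sizes)
```

Element-wise functions such as `square` give the same result for any batch size. A function that aggregates over its input (for example, subtracting the batch mean) sees only one batch at a time, so its result depends on how the data is split.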
Square the values by passing an anonymous function as an
argument to `apply_chunk()`.
```pycon
>>> s.mf.apply_chunk(lambda x: x**2, batch_rows=2).execute()
London 400
New York 441
Helsinki 144
dtype: int64
```
Define a custom function that needs additional positional
arguments and pass these additional arguments using the
`args` keyword.
```pycon
>>> def subtract_custom_value(x, custom_value):
... return x - custom_value
```
```pycon
>>> s.mf.apply_chunk(subtract_custom_value, args=(5,), batch_rows=3).execute()
London 15
New York 16
Helsinki 7
dtype: int64
```
Define a custom function that takes keyword arguments
and pass these arguments to `apply_chunk`.
```pycon
>>> def add_custom_values(x, **kwargs):
... for month in kwargs:
... x += kwargs[month]
... return x
```
```pycon
>>> s.mf.apply_chunk(add_custom_values, batch_rows=2, june=30, july=20, august=25).execute()
London 95
New York 96
Helsinki 87
dtype: int64
```
If `func` returns a DataFrame, `apply_chunk` returns a DataFrame as well.
```pycon
>>> import pandas as pd
>>> def get_dataframe(x):
... return pd.concat([x, x], axis=1)
```
```pycon
>>> s.mf.apply_chunk(get_dataframe, batch_rows=2).execute()
0 1
London 20 20
New York 21 21
Helsinki 12 12
```
Provide `dtypes`, or `dtype` together with `name`, to name the output schema.
```pycon
>>> import numpy as np
>>> s.mf.apply_chunk(
... get_dataframe,
... batch_rows=2,
... dtypes={"A": np.int_, "B": np.int_},
... output_type="dataframe"
... ).execute()
A B
London 20 20
New York 21 21
Helsinki 12 12
```
Create a series with a dict type.
```pycon
>>> import pyarrow as pa
>>> from maxframe.lib.dtypes_extension import dict_
>>> s = md.Series(
... data=[[("k1", 1), ("k2", 2)], [("k1", 3)], None],
... index=[1, 2, 3],
... dtype=dict_(pa.string(), pa.int64()),
... )
>>> s.execute()
1 [('k1', 1), ('k2', 2)]
2 [('k1', 3)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
Define a function that updates the map type with a new key-value pair in a batch.
```pycon
>>> def custom_set_item(row):
... for _, value in row.items():
... if value is not None:
... value["x"] = 100
... return row
```
```pycon
>>> s.mf.apply_chunk(
... custom_set_item,
... output_type="series",
... dtype=s.dtype,
... batch_rows=2,
... skip_infer=True,
... index=s.index,
... ).execute()
1 [('k1', 1), ('k2', 2), ('x', 100)]
2 [('k1', 3), ('x', 100)]
3 <NA>
dtype: map<string, int64>[pyarrow]
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.mf.flatjson.md
# maxframe.dataframe.Series.mf.flatjson
#### Series.mf.flatjson(query_paths: [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)], dtypes=None, dtype=None, name: [str](https://docs.python.org/3/library/stdtypes.html#str) = None) → DataFrame
Flatten the JSON objects in the series into a DataFrame according to the JSON query paths.
* **Parameters:**
* **series** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – The series of json strings.
* **query_paths** (*List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *] or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The JSON query paths for each generated column. The path format should follow
[RFC 9535](https://datatracker.ietf.org/doc/rfc9535/).
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of the returned DataFrame. Cannot be combined with `dtype`.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of the returned Series. Cannot be combined with `dtypes`.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of the returned Series.
* **Returns:**
Result of DataFrame when dtypes specified, else Series.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> import pandas as pd
>>> s = md.Series(
... [
... '{"age": 24, "gender": "male", "graduated": false}',
... '{"age": 25, "gender": "female", "graduated": true}',
... ]
... )
>>> s.execute()
0 {"age": 24, "gender": "male", "graduated": false}
1 {"age": 25, "gender": "female", "graduated": true}
dtype: object
```
```pycon
>>> df = s.mf.flatjson(
... ["$.age", "$.gender", "$.graduated"],
... dtypes=pd.Series(["int32", "object", "bool"], index=["age", "gender", "graduated"]),
... )
>>> df.execute()
age gender graduated
0 24 male False
1 25 female True
```
```pycon
>>> s2 = s.mf.flatjson("$.age", name="age", dtype="int32")
>>> s2.execute()
0 24
1 25
Name: age, dtype: int32
```
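To make the mapping from query paths to output columns concrete, the sketch below performs the same extraction locally with the standard `json` module. It is not how MaxFrame evaluates paths: the helper `extract_simple_path` is hypothetical and only understands paths of the form `$.key`, a very small subset of RFC 9535.

```python
import json


def extract_simple_path(document: str, path: str):
    """Hypothetical helper: resolve a `$.key` path against one JSON string."""
    if not path.startswith("$."):
        raise ValueError(f"only '$.key' paths are handled in this sketch: {path}")
    # A key that is absent from the object yields None.
    return json.loads(document).get(path[2:])


rows = [
    '{"age": 24, "gender": "male", "graduated": false}',
    '{"age": 25, "gender": "female", "graduated": true}',
]
paths = ["$.age", "$.gender", "$.graduated"]

# One output column per query path, one output row per input string.
table = [[extract_simple_path(row, p) for p in paths] for row in rows]
for record in table:
    print(record)
```

Each query path becomes one column, which is why `dtypes` needs one entry per path when a DataFrame is requested.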
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.mf.flatmap.md
# maxframe.dataframe.Series.mf.flatmap
#### Series.mf.flatmap(func: [Callable](https://docs.python.org/3/library/typing.html#typing.Callable), dtypes=None, dtype=None, name=None, args=(), \*\*kwargs)
Apply the given function to each row and then flatten results. Use this method if your transformation returns
multiple rows for each input row.
This function applies a transformation to each element of the Series, where the transformation can return zero
or more values, effectively flattening Python generators, list-like collections, and DataFrames.
* **Parameters:**
* **func** (*Callable*) – Function to apply to each element of the Series. It should accept a scalar value
(or an array if `raw=True`) and return a list or iterable of values.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of the returned DataFrame. Cannot be combined with `dtype`.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of the returned Series. Cannot be combined with `dtypes`.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of the returned Series.
* **args** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple)) – Positional arguments to pass to `func`.
* **\*\*kwargs** – Additional keyword arguments to pass to `func`.
* **Returns:**
Result of DataFrame when dtypes specified, else Series.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Notes
The `func` must return an iterable of values for each input element. If `dtypes` is specified,
flatmap returns a DataFrame; if `dtype` and `name` are specified, it returns a Series.
The index of the resulting DataFrame/Series will be repeated based on the number of output rows generated
by `func`.
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> import pandas as pd
>>> df = md.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
>>> df.execute()
A B
0 1 4
1 2 5
2 3 6
```
Define a function that takes a number and returns a list of two numbers:
```pycon
>>> def generate_values_array(x):
... return [x * 2, x * 3]
```
Specify `dtype` when the function returns a list, so that the flattened elements are returned as a Series:
```pycon
>>> df['A'].mf.flatmap(generate_values_array, dtype="int", name="C").execute()
0 2
0 3
1 4
1 6
2 6
2 9
Name: C, dtype: int64
```
Specify `dtypes` to return multiple columns as a DataFrame:
```pycon
>>> def generate_values_in_generator(x):
... yield pd.Series([x * 2, x * 4])
... yield pd.Series([x * 3, x * 5])
```
```pycon
>>> df['A'].mf.flatmap(generate_values_in_generator, dtypes={"A": "int", "B": "int"}).execute()
A B
0 2 4
0 3 5
1 4 8
1 6 10
2 6 12
2 9 15
```
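The index repetition described in the Notes can be modelled in plain Python. This is only a local sketch of the semantics, not MaxFrame's implementation; `flatmap_model` is a hypothetical name, and the Series is represented as separate lists of index labels and values.

```python
def flatmap_model(index, values, func):
    """Hypothetical local model: apply func per element, flatten, repeat the index label."""
    out_index, out_values = [], []
    for label, value in zip(index, values):
        # func may return a list or be a generator; both are simply iterated.
        for item in func(value):
            out_index.append(label)
            out_values.append(item)
    return out_index, out_values


def generate_values_array(x):
    return [x * 2, x * 3]


idx, vals = flatmap_model([0, 1, 2], [1, 2, 3], generate_values_array)
print(list(zip(idx, vals)))
```

Every input row contributes as many output rows as `func` yields, so a function that yields nothing for some input drops that row from the result.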
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.min.md
# maxframe.dataframe.Series.min
#### Series.min(axis=None, skipna=True, level=None, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.mod.md
# maxframe.dataframe.Series.mod
#### Series.mod(other, level=None, fill_value=None, axis=0)
Return Modulo of series and other, element-wise (binary operator mod).
Equivalent to `series % other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.rmod`](maxframe.dataframe.Series.rmod.md#maxframe.dataframe.Series.rmod)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.mod(b, fill_value=0).execute()
a 0.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64
```
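The alignment and `fill_value` rule described in the parameters can be modelled with plain dictionaries. The sketch below is an assumption-laden local model, not MaxFrame code: `flex_binary_model` and `mod_nan_on_zero` are hypothetical names, and modulo by zero is mapped to NaN by hand because plain Python raises `ZeroDivisionError` where NumPy-style float arithmetic yields NaN.

```python
import math


def flex_binary_model(a: dict, b: dict, op, fill_value=None):
    """Hypothetical model of index alignment with fill_value for label -> value mappings."""
    def missing(v):
        return v is None or math.isnan(v)

    result = {}
    for label in sorted(set(a) | set(b)):
        x, y = a.get(label), b.get(label)
        if missing(x) and missing(y):
            result[label] = math.nan  # missing on both sides stays missing
            continue
        if fill_value is not None:
            x = fill_value if missing(x) else x
            y = fill_value if missing(y) else y
        result[label] = math.nan if missing(x) or missing(y) else op(x, y)
    return result


def mod_nan_on_zero(x, y):
    # Plain Python raises on modulo by zero, so return NaN explicitly.
    return math.nan if y == 0 else x % y


a = {"a": 1.0, "b": 1.0, "c": 1.0, "d": math.nan}
b = {"a": 1.0, "b": math.nan, "d": 1.0, "e": math.nan}
print(flex_binary_model(a, b, mod_nan_on_zero, fill_value=0))
```

Rows `b` and `c` become NaN because the filled divisor is zero, while row `e` stays missing because neither input has a value there.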
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.mode.md
# maxframe.dataframe.Series.mode
#### Series.mode(dropna=True, combine_size=None)
Return the mode(s) of the Series.
The mode is the value that appears most often. There can be multiple modes.
Always returns Series even if only one value is returned.
* **Parameters:**
**dropna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Don’t consider counts of NaN/NaT.
* **Returns:**
Modes of the Series in sorted order.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([2, 4, 2, 2, 4, None])
>>> s.mode().execute()
0 2.0
dtype: float64
```
More than one mode:
```pycon
>>> s = md.Series([2, 4, 8, 2, 4, None])
>>> s.mode().execute()
0 2.0
1 4.0
dtype: float64
```
With and without considering null value:
```pycon
>>> s = md.Series([2, 4, None, None, 4, None])
>>> s.mode(dropna=False).execute()
0 NaN
dtype: float64
>>> s = md.Series([2, 4, None, None, 4, None])
>>> s.mode().execute()
0 4.0
dtype: float64
```
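The effect of `dropna` on the examples above can be reproduced with a small counting model built on `collections.Counter`. This is a local illustration under simplifying assumptions (all missing values are folded into a single `None` marker); `mode_model` is a hypothetical name, not part of MaxFrame.

```python
from collections import Counter
import math


def mode_model(values, dropna=True):
    """Hypothetical local model of mode(): the most frequent value(s), sorted."""
    def is_na(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Fold every missing value into a single None key so they are counted together.
    keys = [None if is_na(v) else v for v in values]
    if dropna:
        keys = [k for k in keys if k is not None]
    counts = Counter(keys)
    if not counts:
        return []
    top = max(counts.values())
    modes = [k for k, c in counts.items() if c == top]
    # Real values in sorted order, with the missing marker (if it is a mode) last.
    return sorted(k for k in modes if k is not None) + [k for k in modes if k is None]


print(mode_model([2, 4, 2, 2, 4, None]))
print(mode_model([2, 4, 8, 2, 4, None]))
print(mode_model([2, 4, None, None, 4, None], dropna=False))
```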
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.mul.md
# maxframe.dataframe.Series.mul
#### Series.mul(other, level=None, fill_value=None, axis=0)
Return Multiplication of series and other, element-wise (binary operator mul).
Equivalent to `series * other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.rmul`](maxframe.dataframe.Series.rmul.md#maxframe.dataframe.Series.rmul)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.mul(b, fill_value=0).execute()
a 1.0
b 0.0
c 0.0
d 0.0
e NaN
dtype: float64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.name.md
# maxframe.dataframe.Series.name
#### *property* Series.name
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.ndim.md
# maxframe.dataframe.Series.ndim
#### *property* Series.ndim
Return an int representing the number of axes / array dimensions.
Return 1 if Series. Otherwise return 2 if DataFrame.
#### SEE ALSO
`ndarray.ndim`
: Number of array dimensions.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.ndim
1
```
```pycon
>>> df = md.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.ndim
2
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.ne.md
# maxframe.dataframe.Series.ne
#### Series.ne(other, level=None, fill_value=None, axis=0)
Return Not equal to of series and other, element-wise (binary operator ne).
Equivalent to `series != other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.ne(b, fill_value=0).execute()
a False
b True
c True
d True
e True
dtype: bool
```
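Row `e` in the result above is `True` even though both inputs are missing there. A plausible reading, assuming pandas-like semantics, is that positions missing on both sides are not filled and NaN compares unequal to everything, including itself. The comparison rule itself is standard IEEE 754 behaviour and can be checked in plain Python:

```python
import math

nan = math.nan

# NaN is never equal to anything, including itself,
# so "not equal" is True whenever either operand is NaN.
nan_ne_nan = nan != nan
nan_eq_nan = nan == nan
one_ne_one = 1.0 != 1.0
print(nan_ne_nan, nan_eq_nan, one_ne_one)
```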
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.nlargest.md
# maxframe.dataframe.Series.nlargest
#### Series.nlargest(n, keep='first')
Return the largest n elements.
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 5*) – Return this many descending sorted values.
* **keep** ( *{'first'* *,* *'last'* *,* *'all'}* *,* *default 'first'*) –
When there are duplicate values that cannot all fit in a
Series of n elements:
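- `first`: return the first n occurrences in order of appearance.
- `last`: return the last n occurrences in reverse order of appearance.
- `all`: keep all occurrences. This can result in a Series of size larger than n.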
* **Returns:**
The n largest values in the Series, sorted in decreasing order.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.nsmallest`](maxframe.dataframe.Series.nsmallest.md#maxframe.dataframe.Series.nsmallest)
: Get the n smallest elements.
[`Series.sort_values`](maxframe.dataframe.Series.sort_values.md#maxframe.dataframe.Series.sort_values)
: Sort Series by values.
[`Series.head`](maxframe.dataframe.Series.head.md#maxframe.dataframe.Series.head)
: Return the first n rows.
### Notes
Faster than `.sort_values(ascending=False).head(n)` for small n
relative to the size of the `Series` object.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> countries_population = {"Italy": 59000000, "France": 65000000,
... "Malta": 434000, "Maldives": 434000,
... "Brunei": 434000, "Iceland": 337000,
... "Nauru": 11300, "Tuvalu": 11300,
... "Anguilla": 11300, "Montserrat": 5200}
>>> s = md.Series(countries_population)
>>> s.execute()
Italy 59000000
France 65000000
Malta 434000
Maldives 434000
Brunei 434000
Iceland 337000
Nauru 11300
Tuvalu 11300
Anguilla 11300
Montserrat 5200
dtype: int64
```
The n largest elements where `n=5` by default.
```pycon
>>> s.nlargest().execute()
France 65000000
Italy 59000000
Malta 434000
Maldives 434000
Brunei 434000
dtype: int64
```
The n largest elements where `n=3`. Default keep value is ‘first’
so Malta will be kept.
```pycon
>>> s.nlargest(3).execute()
France 65000000
Italy 59000000
Malta 434000
dtype: int64
```
The n largest elements where `n=3` and keeping the last duplicates.
Brunei will be kept since it is the last with value 434000 based on
the index order.
```pycon
>>> s.nlargest(3, keep='last').execute()
France 65000000
Italy 59000000
Brunei 434000
dtype: int64
```
The n largest elements where `n=3` with all duplicates kept. Note
that the returned Series has five elements due to the three duplicates.
```pycon
>>> s.nlargest(3, keep='all').execute()
France 65000000
Italy 59000000
Malta 434000
Maldives 434000
Brunei 434000
dtype: int64
```
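The three `keep` modes differ only in how ties are broken. The following plain-Python sketch reproduces the results above for a list of `(label, value)` pairs; `nlargest_model` is a hypothetical helper that models the semantics and says nothing about how MaxFrame computes the result.

```python
def nlargest_model(items, n=5, keep="first"):
    """Hypothetical local model of nlargest over (label, value) pairs in order of appearance."""
    indexed = list(enumerate(items))  # remember the order of appearance
    if keep == "first":
        ranked = sorted(indexed, key=lambda t: (-t[1][1], t[0]))
    elif keep == "last":
        ranked = sorted(indexed, key=lambda t: (-t[1][1], -t[0]))
    elif keep == "all":
        ranked = sorted(indexed, key=lambda t: (-t[1][1], t[0]))
        if 0 < n < len(ranked):
            cutoff = ranked[n - 1][1][1]  # value of the n-th largest element
            return [pair for _, pair in ranked if pair[1] >= cutoff]
    else:
        raise ValueError("keep must be 'first', 'last' or 'all'")
    return [pair for _, pair in ranked[:n]]


population = [
    ("Italy", 59000000), ("France", 65000000), ("Malta", 434000),
    ("Maldives", 434000), ("Brunei", 434000), ("Iceland", 337000),
    ("Nauru", 11300), ("Tuvalu", 11300), ("Anguilla", 11300),
    ("Montserrat", 5200),
]
for mode in ("first", "last", "all"):
    print(mode, [label for label, _ in nlargest_model(population, 3, keep=mode)])
```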
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.notna.md
# maxframe.dataframe.Series.notna
#### Series.notna()
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
NA values, such as None or `numpy.nan`, get mapped to False
values.
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.notnull`](maxframe.dataframe.DataFrame.notnull.md#maxframe.dataframe.DataFrame.notnull)
: Alias of notna.
[`DataFrame.isna`](maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)
: Boolean inverse of notna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`notna`](maxframe.dataframe.notna.md#maxframe.dataframe.notna)
: Top-level notna.
### Examples
Show which entries in a DataFrame are not NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.nan],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.notna().execute()
age born name toy
0 True False True False
1 True True True True
2 False True True True
```
Show which entries in a Series are not NA.
```pycon
>>> ser = md.Series([5, 6, np.nan])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.notna().execute()
0 True
1 True
2 False
dtype: bool
```
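The rule for what counts as missing can be summarised by a scalar check. The sketch below is a simplified local model (it ignores `NaT` and the `use_inf_as_na` option); `is_not_na` is a hypothetical name.

```python
import math


def is_not_na(value) -> bool:
    """Hypothetical scalar model of notna(): only None and NaN count as missing."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return True


samples = [5, math.nan, None, "", math.inf]
flags = [is_not_na(v) for v in samples]
print(flags)
```

Empty strings and infinities are ordinary values under this rule, which matches the `name` column in the DataFrame example above.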
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.nsmallest.md
# maxframe.dataframe.Series.nsmallest
#### Series.nsmallest(n, keep='first')
Return the smallest n elements.
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 5*) – Return this many ascending sorted values.
* **keep** ( *{'first'* *,* *'last'* *,* *'all'}* *,* *default 'first'*) –
When there are duplicate values that cannot all fit in a
Series of n elements:
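- `first`: return the first n occurrences in order of appearance.
- `last`: return the last n occurrences in reverse order of appearance.
- `all`: keep all occurrences. This can result in a Series of size larger than n.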
* **Returns:**
The n smallest values in the Series, sorted in increasing order.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.nlargest`](maxframe.dataframe.Series.nlargest.md#maxframe.dataframe.Series.nlargest)
: Get the n largest elements.
[`Series.sort_values`](maxframe.dataframe.Series.sort_values.md#maxframe.dataframe.Series.sort_values)
: Sort Series by values.
[`Series.head`](maxframe.dataframe.Series.head.md#maxframe.dataframe.Series.head)
: Return the first n rows.
### Notes
Faster than `.sort_values().head(n)` for small n relative to
the size of the `Series` object.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> countries_population = {"Italy": 59000000, "France": 65000000,
... "Brunei": 434000, "Malta": 434000,
... "Maldives": 434000, "Iceland": 337000,
... "Nauru": 11300, "Tuvalu": 11300,
... "Anguilla": 11300, "Montserrat": 5200}
>>> s = md.Series(countries_population)
>>> s.execute()
Italy 59000000
France 65000000
Brunei 434000
Malta 434000
Maldives 434000
Iceland 337000
Nauru 11300
Tuvalu 11300
Anguilla 11300
Montserrat 5200
dtype: int64
```
The n smallest elements where `n=5` by default.
```pycon
>>> s.nsmallest().execute()
Montserrat 5200
Nauru 11300
Tuvalu 11300
Anguilla 11300
Iceland 337000
dtype: int64
```
The n smallest elements where `n=3`. Default keep value is
‘first’ so Nauru and Tuvalu will be kept.
```pycon
>>> s.nsmallest(3).execute()
Montserrat 5200
Nauru 11300
Tuvalu 11300
dtype: int64
```
The n smallest elements where `n=3` and keeping the last
duplicates. Anguilla and Tuvalu will be kept since they are the last
with value 11300 based on the index order.
```pycon
>>> s.nsmallest(3, keep='last').execute()
Montserrat 5200
Anguilla 11300
Tuvalu 11300
dtype: int64
```
The n smallest elements where `n=3` with all duplicates kept. Note
that the returned Series has four elements due to the three duplicates.
```pycon
>>> s.nsmallest(3, keep='all').execute()
Montserrat 5200
Nauru 11300
Tuvalu 11300
Anguilla 11300
dtype: int64
```
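The Notes compare `nsmallest` with sorting followed by `head(n)`. As an analogy only (no claim is made about MaxFrame's internals), the Python standard library has the same pair of approaches: `heapq.nsmallest` is documented as equivalent to `sorted(iterable, key=key)[:n]` and avoids a full sort when `n` is small.

```python
import heapq
from operator import itemgetter

population = {
    "Italy": 59000000, "France": 65000000, "Brunei": 434000,
    "Malta": 434000, "Maldives": 434000, "Iceland": 337000,
    "Nauru": 11300, "Tuvalu": 11300, "Anguilla": 11300,
    "Montserrat": 5200,
}

# Select the three smallest entries by value in two equivalent ways.
via_heap = heapq.nsmallest(3, population.items(), key=itemgetter(1))
via_sort = sorted(population.items(), key=itemgetter(1))[:3]
print(via_heap)
```

Both approaches break ties by order of appearance, which corresponds to `keep='first'` above.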
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.nunique.md
# maxframe.dataframe.Series.nunique
#### Series.nunique(dropna=True)
Return number of unique elements in the object.
Excludes NA values by default.
* **Parameters:**
**dropna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Don’t include NaN in the count.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`DataFrame.nunique`](maxframe.dataframe.DataFrame.nunique.md#maxframe.dataframe.DataFrame.nunique)
: Method nunique for DataFrame.
[`Series.count`](maxframe.dataframe.Series.count.md#maxframe.dataframe.Series.count)
: Count non-NA/null observations in the Series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 3, 5, 7, 7])
>>> s.execute()
0 1
1 3
2 5
3 7
4 7
dtype: int64
```
```pycon
>>> s.nunique().execute()
4
```
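The effect of `dropna` can be illustrated with a set-based count. This is a local sketch that assumes all missing values collapse into a single NA entry when `dropna=False`; `nunique_model` is a hypothetical name.

```python
import math


def nunique_model(values, dropna=True):
    """Hypothetical local model of nunique(): count distinct values, optionally counting NA once."""
    def is_na(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    distinct = {v for v in values if not is_na(v)}
    has_na = any(is_na(v) for v in values)
    return len(distinct) + (1 if has_na and not dropna else 0)


print(nunique_model([1, 3, 5, 7, 7]))
print(nunique_model([1, None, math.nan], dropna=False))
```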
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.area.md
# maxframe.dataframe.Series.plot.area
#### Series.plot.area(\*args, \*\*kwargs)
Draw a stacked area plot.
An area plot displays quantitative data visually.
This function wraps the matplotlib area function.
* **Parameters:**
* **x** (*label* *or* *position* *,* *optional*) – Coordinates for the X axis. By default uses the index.
* **y** (*label* *or* *position* *,* *optional*) – Column to plot. By default uses all columns.
* **stacked** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Area plots are stacked by default. Set to False to create an
unstacked plot.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
Area plot, or array of area plots if subplots is True.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray)
#### SEE ALSO
[`DataFrame.plot`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot)
: Make plots of DataFrame using matplotlib / pylab.
### Examples
Draw an area plot based on basic business metrics:
Area plots are stacked by default. To produce an unstacked plot,
pass `stacked=False`:
Draw an area plot for a single column:
Draw with a different x:
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.bar.md
# maxframe.dataframe.Series.plot.bar
#### Series.plot.bar(\*args, \*\*kwargs)
Vertical bar plot.
A bar plot is a plot that presents categorical data with
rectangular bars with lengths proportional to the values that they
represent. A bar plot shows comparisons among discrete categories. One
axis of the plot shows the specific categories being compared, and the
other axis represents a measured value.
* **Parameters:**
* **x** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
the index of the DataFrame is used.
* **y** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
all numerical columns are used.
* **color** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *array-like* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) –
The color for each of the DataFrame’s columns. Possible values are:
- A single color string referred to by name, RGB or RGBA code,
for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA
code, which will be used for each column recursively. For
instance, with [‘green’, ‘yellow’] each column’s bar will be filled in
green or yellow, alternately. If there is only a single column to
be plotted, then only the first color from the color list will be
used.
- A dict of the form {column name: color}, so that each column will be
colored accordingly. For example, if your columns are called a and
b, then passing {‘a’: ‘green’, ‘b’: ‘red’} will color bars for
column a in green and bars for column b in red.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
An ndarray is returned with one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes)
per column when `subplots=True`.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or np.ndarray of them
#### SEE ALSO
`DataFrame.plot.barh`
: Horizontal bar plot.
`DataFrame.plot`
: Make plots of a DataFrame.
`matplotlib.pyplot.bar`
: Make a bar plot with matplotlib.
### Examples
Basic plot.
Plot a whole dataframe to a bar plot. Each column is assigned a
distinct color, and each row is nested in a group along the
horizontal axis.
Plot stacked bar charts for the DataFrame.
Instead of nesting, the figure can be split by column with
`subplots=True`. In this case, a [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray) of
[`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) is returned.
If you don’t like the default colours, you can specify how you’d
like each column to be colored.
Plot a single column.
Plot only selected categories for the DataFrame.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.barh.md
# maxframe.dataframe.Series.plot.barh
#### Series.plot.barh(\*args, \*\*kwargs)
Make a horizontal bar plot.
A horizontal bar plot is a plot that presents quantitative data with
rectangular bars with lengths proportional to the values that they
represent. A bar plot shows comparisons among discrete categories. One
axis of the plot shows the specific categories being compared, and the
other axis represents a measured value.
* **Parameters:**
* **x** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
the index of the DataFrame is used.
* **y** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
all numerical columns are used.
* **color** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *array-like* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) –
The color for each of the DataFrame’s columns. Possible values are:
- A single color string referred to by name, RGB or RGBA code,
for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA
code, which will be used for each column recursively. For
instance, with [‘green’, ‘yellow’] each column’s bar will be filled in
green or yellow, alternately. If there is only a single column to
be plotted, then only the first color from the color list will be
used.
- A dict of the form {column name: color}, so that each column will be
colored accordingly. For example, if your columns are called a and
b, then passing {‘a’: ‘green’, ‘b’: ‘red’} will color bars for
column a in green and bars for column b in red.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
An ndarray is returned with one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes)
per column when `subplots=True`.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or np.ndarray of them
#### SEE ALSO
`DataFrame.plot.bar`
: Vertical bar plot.
`DataFrame.plot`
: Make plots of DataFrame using matplotlib.
`matplotlib.axes.Axes.bar`
: Plot a vertical bar plot using matplotlib.
### Examples
Basic example.
Plot a whole DataFrame to a horizontal bar plot.
Plot stacked barh charts for the DataFrame.
We can specify colors for each column.
Plot a column of the DataFrame to a horizontal bar plot.
Plot DataFrame versus the desired column.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.box.md
# maxframe.dataframe.Series.plot.box
#### Series.plot.box(\*args, \*\*kwargs)
Make a box plot of the DataFrame columns.
A box plot is a method for graphically depicting groups of numerical
data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data,
with a line at the median (Q2). The whiskers extend from the edges
of the box to show the range of the data. The position of the whiskers
is set by default to 1.5\*IQR (IQR = Q3 - Q1) from the edges of the
box. Outlier points are those past the end of the whiskers.
For further details see Wikipedia’s
entry for [boxplot](https://en.wikipedia.org/wiki/Box_plot).
A consideration when using this chart is that the box and the whiskers
can overlap, which is very common when plotting small sets of data.
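The quantities described above (Q1, median, Q3, IQR, and the 1.5\*IQR whisker limit) can be computed by hand with the standard library. This is a minimal sketch, assuming the linearly interpolated ("inclusive") quantile definition and whiskers drawn to the furthest data point inside the 1.5\*IQR fences; `box_stats` is a hypothetical helper, not part of MaxFrame or matplotlib.

```python
import statistics


def box_stats(data, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends and outliers for one column."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        # Whiskers end at the furthest observations still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In this sample the value 30 lies beyond Q3 + 1.5\*IQR and would be drawn as an individual outlier point.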
* **Parameters:**
* **by** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *sequence*) –
Column in the DataFrame to group by.
#### Versionchanged
Changed in version 1.4.0: Previously, `by` was silently ignored and no grouping was made.
* **\*\*kwargs** – Additional keywords are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Return type:**
[`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or numpy.ndarray of them
#### SEE ALSO
`DataFrame.boxplot`
: Another method to draw a box plot.
[`Series.plot.box`](#maxframe.dataframe.Series.plot.box)
: Draw a box plot from a Series object.
[`matplotlib.pyplot.boxplot`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html#matplotlib.pyplot.boxplot)
: Draw a box plot in matplotlib.
### Examples
Draw a box plot from a DataFrame with four columns of randomly
generated data.
You can also generate groupings if you specify the by parameter (which
can take a column name, or a list or tuple of column names):
#### Versionchanged
Changed in version 1.4.0.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.density.md
# maxframe.dataframe.Series.plot.density
#### Series.plot.density(\*args, \*\*kwargs)
Generate Kernel Density Estimate plot using Gaussian kernels.
In statistics, [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) (KDE) is a non-parametric
way to estimate the probability density function (PDF) of a random
variable. This function uses Gaussian kernels and includes automatic
bandwidth determination.
* **Parameters:**
* **bw_method** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *scalar* *or* *callable* *,* *optional*) – The method used to calculate the estimator bandwidth. This can be
‘scott’, ‘silverman’, a scalar constant or a callable.
If None (default), ‘scott’ is used.
See [`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde) for more information.
* **ind** (*NumPy array* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Evaluation points for the estimated PDF. If None (default),
1000 equally spaced points are used. If ind is a NumPy array, the
KDE is evaluated at the points passed. If ind is an integer,
ind number of equally spaced points are used.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray) of them
#### SEE ALSO
[`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde)
: Representation of a kernel-density estimate using Gaussian kernels. This is the function used internally to estimate the PDF.
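To show what `bw_method` and `ind` control, here is a pure-Python sketch of a one-dimensional Gaussian KDE. It follows the usual textbook formulas, with the bandwidth taken as a rule-of-thumb factor times the sample standard deviation, and is only an illustration: `gaussian_kde_model` is a hypothetical name, callables for `bw_method` are not handled, and the real computation is delegated to `scipy.stats.gaussian_kde`.

```python
import math
import statistics


def gaussian_kde_model(samples, bw_method=None):
    """Hypothetical 1-D KDE: bandwidth = factor * sample standard deviation."""
    n = len(samples)
    if bw_method is None or bw_method == "scott":
        factor = n ** (-1 / 5)            # Scott's rule in one dimension
    elif bw_method == "silverman":
        factor = (n * 3 / 4) ** (-1 / 5)  # Silverman's rule in one dimension
    else:
        factor = float(bw_method)         # a scalar is used directly as the factor
    h = factor * statistics.stdev(samples)
    norm = n * h * math.sqrt(2 * math.pi)

    def pdf(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

    return pdf


samples = [1, 2, 2.5, 3, 7]
ind = 1000  # number of equally spaced evaluation points
lo, hi = -15.0, 25.0
xs = [lo + i * (hi - lo) / (ind - 1) for i in range(ind)]
pdf = gaussian_kde_model(samples)
ys = [pdf(x) for x in xs]

# Trapezoidal integration: a density estimate should integrate to about 1.
area = sum((ys[i] + ys[i + 1]) / 2 * (xs[i + 1] - xs[i]) for i in range(ind - 1))
print(round(area, 3))
```

A smaller scalar `bw_method` narrows each Gaussian bump, which produces a spikier curve that follows individual samples more closely.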
### Examples
Given a Series of points randomly sampled from an unknown
distribution, estimate its PDF using KDE with automatic
bandwidth determination and plot the results, evaluating them at
1000 equally spaced points (default):
A scalar bandwidth can be specified. Using a small bandwidth value can
lead to over-fitting, while using a large bandwidth value may result
in under-fitting:
Finally, the ind parameter determines the evaluation points for the
plot of the estimated PDF:
For a DataFrame, it works in the same way: a scalar bandwidth and the
ind parameter can be passed exactly as for a Series.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.hist.md
# maxframe.dataframe.Series.plot.hist
#### Series.plot.hist(\*args, \*\*kwargs)
Draw one histogram of the DataFrame’s columns.
A histogram is a representation of the distribution of data.
This function groups the values of all given Series in the DataFrame
into bins and draws all bins in one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes).
This is useful when the DataFrame’s Series are in a similar scale.
* **Parameters:**
* **by** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *sequence* *,* *optional*) –
Column in the DataFrame to group by.
#### Versionchanged
Changed in version 1.4.0: Previously, by was silently ignored and made no groupings.
* **bins** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 10*) – Number of histogram bins to be used.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
A histogram plot.
* **Return type:**
matplotlib.AxesSubplot
#### SEE ALSO
`DataFrame.hist`
: Draw histograms per DataFrame’s Series.
`Series.hist`
: Draw a histogram with Series’ data.
### Examples
When we roll a die 6000 times, we expect to get each value around 1000
times. But when we roll two dice and sum the result, the distribution
is going to be quite different. A histogram illustrates those
distributions.
A grouped histogram can be generated by providing the parameter by (which
can be a column name, or a list of column names):
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.kde.md
# maxframe.dataframe.Series.plot.kde
#### Series.plot.kde(\*args, \*\*kwargs)
Generate Kernel Density Estimate plot using Gaussian kernels.
In statistics, [kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) (KDE) is a non-parametric
way to estimate the probability density function (PDF) of a random
variable. This function uses Gaussian kernels and includes automatic
bandwidth determination.
* **Parameters:**
* **bw_method** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *scalar* *or* *callable* *,* *optional*) – The method used to calculate the estimator bandwidth. This can be
‘scott’, ‘silverman’, a scalar constant or a callable.
If None (default), ‘scott’ is used.
See [`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde) for more information.
* **ind** (*NumPy array* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Evaluation points for the estimated PDF. If None (default),
1000 equally spaced points are used. If ind is a NumPy array, the
KDE is evaluated at the points passed. If ind is an integer,
ind number of equally spaced points are used.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray) of them
#### SEE ALSO
[`scipy.stats.gaussian_kde`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html#scipy.stats.gaussian_kde)
: Representation of a kernel-density estimate using Gaussian kernels. This is the function used internally to estimate the PDF.
### Examples
Given a Series of points randomly sampled from an unknown
distribution, estimate its PDF using KDE with automatic
bandwidth determination and plot the results, evaluating them at
1000 equally spaced points (default):
A scalar bandwidth can be specified. Using a small bandwidth value can
lead to over-fitting, while using a large bandwidth value may result
in under-fitting:
Finally, the ind parameter determines the evaluation points for the
plot of the estimated PDF:
For a DataFrame, it works in the same way: a scalar bandwidth and the
ind parameter can be passed exactly as for a Series.
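To make the effect of `bw_method` concrete, here is a minimal pure-Python sketch of a one-dimensional Gaussian KDE. It is not MaxFrame's or SciPy's implementation: `gaussian_kde_1d` and `linspace` are hypothetical helpers, the sample data and evaluation grid are made up, and a callable `bw_method` is not handled. The bandwidth factors follow Scott's and Silverman's rules as used by `scipy.stats.gaussian_kde` for one dimension.

```python
import math
import statistics


def gaussian_kde_1d(samples, bw_method=None):
    """Return a function evaluating a 1-D Gaussian KDE of `samples` (illustrative helper)."""
    n = len(samples)
    if bw_method is None or bw_method == "scott":
        factor = n ** (-1.0 / 5.0)                # Scott's rule in one dimension
    elif bw_method == "silverman":
        factor = (n * 3.0 / 4.0) ** (-1.0 / 5.0)  # Silverman's rule in one dimension
    else:
        factor = float(bw_method)                 # scalar bandwidth factor
    h = factor * statistics.stdev(samples)        # standard deviation of each kernel

    def pdf(x):
        total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
        return total / (n * h * math.sqrt(2.0 * math.pi))

    return pdf


def linspace(start, stop, num):
    """Return `num` equally spaced points from start to stop (illustrative helper)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


samples = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0]
grid = linspace(-5.0, 11.0, 1000)  # 1000 evaluation points, like the default for ind

smooth_pdf = gaussian_kde_1d(samples)                # automatic (Scott) bandwidth
spiky_pdf = gaussian_kde_1d(samples, bw_method=0.1)  # small scalar bandwidth
smooth = [smooth_pdf(x) for x in grid]
spiky = [spiky_pdf(x) for x in grid]
```

Both curves integrate to roughly 1 over the grid, but the small bandwidth produces a separate narrow peak around each sample, which is the over-fitting the text warns about.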
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.line.md
# maxframe.dataframe.Series.plot.line
#### Series.plot.line(\*args, \*\*kwargs)
Plot Series or DataFrame as lines.
This function is useful to plot lines using DataFrame’s values
as coordinates.
* **Parameters:**
* **x** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
the index of the DataFrame is used.
* **y** (*label* *or* *position* *,* *optional*) – Allows plotting of one column versus another. If not specified,
all numerical columns are used.
* **color** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *array-like* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) –
The color for each of the DataFrame’s columns. Possible values are:
- A single color string referred to by name, RGB or RGBA code,
for instance ‘red’ or ‘#a98d19’.
- A sequence of color strings referred to by name, RGB or RGBA
code, which will be used for each column recursively. For
instance, with [‘green’, ‘yellow’] each column’s line will be filled in
green or yellow, alternately. If there is only a single column to
be plotted, then only the first color from the color list will be
used.
- A dict of the form {column name: color}, so that each column will be
colored accordingly. For example, if your columns are called a and
b, then passing {‘a’: ‘green’, ‘b’: ‘red’} will color lines for
column a in green and lines for column b in red.
* **\*\*kwargs** – Additional keyword arguments are documented in
[`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
An ndarray is returned with one [`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes)
per column when `subplots=True`.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or np.ndarray of them
#### SEE ALSO
`matplotlib.pyplot.plot`
: Plot y versus x as lines and/or markers.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.md
# maxframe.dataframe.Series.plot
#### Series.plot()
Make plots of Series or DataFrame.
Uses the backend specified by the
option `plotting.backend`. By default, matplotlib is used.
* **Parameters:**
* **data** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – The object for which the method is called.
* **x** (*label* *or* *position* *,* *default None*) – Only used if data is a DataFrame.
* **y** (*label* *,* *position* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *label* *,* *positions* *,* *default None*) – Allows plotting of one column versus another. Only used if data is a
DataFrame.
* **kind** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) –
The kind of plot to produce:
- ‘line’ : line plot (default)
- ‘bar’ : vertical bar plot
- ‘barh’ : horizontal bar plot
- ‘hist’ : histogram
- ‘box’ : boxplot
- ‘kde’ : Kernel Density Estimation plot
- ‘density’ : same as ‘kde’
- ‘area’ : area plot
- ‘pie’ : pie plot
- ‘scatter’ : scatter plot (DataFrame only)
- ‘hexbin’ : hexbin plot (DataFrame only)
* **ax** (*matplotlib axes object* *,* *default None*) – An axes of the current figure.
* **subplots** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *sequence* *of* *iterables* *,* *default False*) –
Whether to group columns into subplots:
- `False` : No subplots will be used
- `True` : Make separate subplots for each column.
- sequence of iterables of column labels: Create a subplot for each
group of columns. For example [(‘a’, ‘c’), (‘b’, ‘d’)] will
create 2 subplots: one with columns ‘a’ and ‘c’, and one
with columns ‘b’ and ‘d’. Remaining columns that aren’t specified
will be plotted in additional subplots (one per column).
#### Versionadded
Added in version 1.5.0.
* **sharex** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True if ax is None else False*) – In case `subplots=True`, share x axis and set some x axis labels
to invisible; defaults to True if ax is None, otherwise False if
an ax is passed in. Be aware that passing in both an ax and
`sharex=True` will alter all x axis labels for all axes in a figure.
* **sharey** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – In case `subplots=True`, share y axis and set some y axis labels to invisible.
* **layout** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *optional*) – (rows, columns) for the layout of subplots.
* **figsize** (*a tuple* *(**width* *,* *height* *)* *in inches*) – Size of a figure object.
* **use_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Use index as ticks for x axis.
* **title** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list)) – Title to use for the plot. If a string is passed, print the string
at the top of the figure. If a list is passed and subplots is
True, print each item in the list above the corresponding subplot.
* **grid** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default None* *(**matlab style default* *)*) – Axis grid lines.
* **legend** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *{'reverse'}*) – Place legend on axis subplots.
* **style** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – The matplotlib line style per column.
* **logx** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *'sym'* *,* *default False*) – Use log scaling or symlog scaling on x axis.
* **logy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *'sym'* *,* *default False*) – Use log scaling or symlog scaling on y axis.
* **loglog** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *'sym'* *,* *default False*) – Use log scaling or symlog scaling on both x and y axes.
* **xticks** (*sequence*) – Values to use for the xticks.
* **yticks** (*sequence*) – Values to use for the yticks.
* **xlim** (*2-tuple/list*) – Set the x limits of the current axes.
* **ylim** (*2-tuple/list*) – Set the y limits of the current axes.
* **xlabel** (*label* *,* *optional*) –
Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the
x-column name for planar plots.
#### Versionchanged
Changed in version 2.0.0: Now applicable to histograms.
* **ylabel** (*label* *,* *optional*) –
Name to use for the ylabel on y-axis. Default will show no ylabel, or the
y-column name for planar plots.
#### Versionchanged
Changed in version 2.0.0: Now applicable to histograms.
* **rot** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default None*) – Rotation for ticks (xticks for vertical, yticks for horizontal
plots).
* **fontsize** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default None*) – Font size for xticks and yticks.
* **colormap** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *matplotlib colormap object* *,* *default None*) – Colormap to select colors from. If string, load colormap with that
name from matplotlib.
* **colorbar** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, plot colorbar (only relevant for ‘scatter’ and ‘hexbin’
plots).
* **position** ([*float*](https://docs.python.org/3/library/functions.html#float)) – Specify relative alignments for bar plot layout.
From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
(center).
* **table** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* *default False*) – If True, draw a table using the data in the DataFrame and the data
will be transposed to meet matplotlib’s default layout.
If a Series or DataFrame is passed, use passed data to draw a
table.
* **yerr** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *array-like* *,* *dict and str*) – See [Plotting with Error Bars](https://pandas.pydata.org/docs/user_guide/visualization.html#visualization-errorbars) for
detail.
* **xerr** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *array-like* *,* *dict and str*) – Equivalent to yerr.
* **stacked** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False in line and bar plots* *,* *and True in area plot*) – If True, create stacked plot.
* **secondary_y** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* *sequence* *,* *default False*) – Whether to plot on the secondary y-axis if a list/tuple, which
columns to plot on secondary y-axis.
* **mark_right** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – When using a secondary_y axis, automatically mark the column
labels with “(right)” in the legend.
* **include_bool** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default is False*) – If True, boolean values can be plotted.
* **backend** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Backend to use instead of the backend specified in the option
`plotting.backend`. For instance, ‘matplotlib’. Alternatively, to
specify the `plotting.backend` for the whole session, set
`pd.options.plotting.backend`.
* **\*\*kwargs** – Options to pass to matplotlib plotting method.
* **Returns:**
If the backend is not the default matplotlib one, the return value
will be the object returned by the backend.
* **Return type:**
[`matplotlib.axes.Axes`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or numpy.ndarray of them
### Notes
- See matplotlib documentation online for more on this subject
- If kind = ‘bar’ or ‘barh’, you can specify relative alignments
for bar plot layout by position keyword.
From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
(center)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.plot.pie.md
# maxframe.dataframe.Series.plot.pie
#### Series.plot.pie(\*args, \*\*kwargs)
Generate a pie plot.
A pie plot is a proportional representation of the numerical data in a
column. This function wraps `matplotlib.pyplot.pie()` for the
specified column. If no column reference is passed and
`subplots=True` a pie plot is drawn for each numerical column
independently.
* **Parameters:**
* **y** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *label* *,* *optional*) – Label or position of the column to plot.
If not provided, `subplots=True` argument must be passed.
* **\*\*kwargs** – Keyword arguments to pass on to [`DataFrame.plot()`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot).
* **Returns:**
A NumPy array is returned when subplots is True.
* **Return type:**
[matplotlib.axes.Axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.html#matplotlib.axes.Axes) or np.ndarray of them
#### SEE ALSO
[`Series.plot.pie`](#maxframe.dataframe.Series.plot.pie)
: Generate a pie plot for a Series.
[`DataFrame.plot`](maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot)
: Make plots of a DataFrame.
### Examples
In the example below we have a DataFrame with the information about
planet’s mass and radius. We pass the ‘mass’ column to the
pie function to get a pie plot.
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.pop.md
# maxframe.dataframe.Series.pop
#### Series.pop(item)
Return item and drop it from the series. Raise KeyError if not found.
* **Parameters:**
**item** (*label*) – Index of the element that needs to be removed.
* **Return type:**
Value that is popped from series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> ser = md.Series([1,2,3])
```
```pycon
>>> ser.pop(0).execute()
1
```
```pycon
>>> ser.execute()
1 2
2 3
dtype: int64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.pow.md
# maxframe.dataframe.Series.pow
#### Series.pow(other, level=None, fill_value=None, axis=0)
Return Exponential power of series and other, element-wise (binary operator pow).
Equivalent to `series ** other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.rpow`](maxframe.dataframe.Series.rpow.md#maxframe.dataframe.Series.rpow)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.pow(b, fill_value=0).execute()
a 1.0
b 1.0
c 1.0
d 0.0
e NaN
dtype: float64
```
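The alignment and `fill_value` rules described above can be sketched in plain Python, using dicts as stand-ins for labelled Series. This is only an illustration of the documented semantics, not how MaxFrame executes the operation; `flex_binary` is a hypothetical helper name.

```python
import math


def flex_binary(left, right, op, fill_value=None):
    """Apply `op` label by label to two dicts that stand in for Series."""
    result = {}
    for label in sorted(set(left) | set(right)):
        x = left.get(label, math.nan)   # a label absent from one side counts as missing
        y = right.get(label, math.nan)
        x_missing, y_missing = math.isnan(x), math.isnan(y)
        if x_missing and y_missing:
            # Missing on both sides: the result stays missing, even with fill_value.
            result[label] = math.nan
            continue
        if fill_value is not None:
            x = fill_value if x_missing else x
            y = fill_value if y_missing else y
        result[label] = op(x, y)
    return result


a = {"a": 1.0, "b": 1.0, "c": 1.0, "d": math.nan}
b = {"a": 1.0, "b": math.nan, "d": 1.0, "e": math.nan}
powered = flex_binary(a, b, lambda x, y: x ** y, fill_value=0)
```

Here `powered` holds 1.0 for `a`, `b` and `c`, 0.0 for `d` (the missing base is filled with 0), and NaN for `e`, in line with the output shown above.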
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.prod.md
# maxframe.dataframe.Series.prod
#### Series.prod(axis=None, skipna=True, level=None, min_count=0, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.product.md
# maxframe.dataframe.Series.product
#### Series.product(axis=None, skipna=True, level=None, min_count=0, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.quantile.md
# maxframe.dataframe.Series.quantile
#### Series.quantile(q=0.5, interpolation='linear')
Return value at the given quantile.
* **Parameters:**
* **q** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array-like* *,* *default 0.5* *(**50% quantile* *)*) – 0 <= q <= 1, the quantile(s) to compute.
* **interpolation** ( *{'linear'* *,* *'lower'* *,* *'higher'* *,* *'midpoint'* *,* *'nearest'}*) –
This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points i and j:
> * linear: i + (j - i) \* fraction, where fraction is the
> fractional part of the index surrounded by i and j.
> * lower: i.
> * higher: j.
> * nearest: i or j whichever is nearest.
> * midpoint: (i + j) / 2.
* **Returns:**
If `q` is an array or a tensor, a Series will be returned where the
index is `q` and the values are the quantiles, otherwise
a float will be returned.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
`core.window.Rolling.quantile`, [`numpy.percentile`](https://numpy.org/doc/stable/reference/generated/numpy.percentile.html#numpy.percentile)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4])
>>> s.quantile(.5).execute()
2.5
>>> s.quantile([.25, .5, .75]).execute()
0.25 1.75
0.50 2.50
0.75 3.25
dtype: float64
```
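The interpolation options can be illustrated with a small pure-Python sketch. It assumes the input contains no missing values and works on a plain list; `quantile` here is a hypothetical stand-alone function, not the MaxFrame method, and the tie-breaking rule for `'nearest'` is an assumption of this sketch.

```python
import math


def quantile(values, q, interpolation="linear"):
    """Return the q-th quantile (0 <= q <= 1) of a list of numbers."""
    data = sorted(values)
    pos = q * (len(data) - 1)   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    i, j = data[lo], data[hi]   # the two surrounding data points
    fraction = pos - lo
    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        # Exact ties go to i in this sketch; real libraries may break ties differently.
        return i if fraction <= 0.5 else j
    raise ValueError(f"unknown interpolation: {interpolation!r}")


values = [1, 2, 3, 4]
median = quantile(values, 0.5)                                   # 2.5
quartiles = [quantile(values, q) for q in (0.25, 0.5, 0.75)]     # [1.75, 2.5, 3.25]
lower_median = quantile(values, 0.5, interpolation="lower")      # 2
higher_median = quantile(values, 0.5, interpolation="higher")    # 3
```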
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.radd.md
# maxframe.dataframe.Series.radd
#### Series.radd(other, level=None, fill_value=None, axis=0)
Return Addition of series and other, element-wise (binary operator radd).
Equivalent to `other + series`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.add`](maxframe.dataframe.Series.add.md#maxframe.dataframe.Series.add)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.add(b, fill_value=0).execute()
a 2.0
b 1.0
c 1.0
d 1.0
e NaN
dtype: float64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rank.md
# maxframe.dataframe.Series.rank
#### Series.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)
Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the
ranks of those values.
* **Parameters:**
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Index to direct ranking.
* **method** ( *{'average'* *,* *'min'* *,* *'max'* *,* *'first'* *,* *'dense'}* *,* *default 'average'*) –
How to rank the group of records that have the same value (i.e. ties):
* average: average rank of the group
* min: lowest rank in the group
* max: highest rank in the group
* first: ranks assigned in order they appear in the array
* dense: like ‘min’, but rank always increases by 1 between groups.
* **numeric_only** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – For DataFrame objects, rank only numeric columns if set to True.
* **na_option** ( *{'keep'* *,* *'top'* *,* *'bottom'}* *,* *default 'keep'*) –
How to rank NaN values:
* keep: assign NaN rank to NaN values
* top: assign lowest rank to NaN values
* bottom: assign highest rank to NaN values
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether or not the elements should be ranked in ascending order.
* **pct** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether or not to display the returned rankings in percentile
form.
* **Returns:**
Return a Series or DataFrame with data ranks as values.
* **Return type:**
same type as caller
#### SEE ALSO
`core.groupby.GroupBy.rank`
: Rank of values within each group.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
... 'spider', 'snake'],
... 'Number_legs': [4, 2, 4, 8, mt.nan]})
>>> df.execute()
Animal Number_legs
0 cat 4.0
1 penguin 2.0
2 dog 4.0
3 spider 8.0
4 snake NaN
```
The following example shows how the method behaves with the above
parameters:
* default_rank: this is the default behaviour obtained without using
any parameter.
* max_rank: setting `method = 'max'` the records that have the
same values are ranked using the highest rank (e.g.: since ‘cat’
and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)
* NA_bottom: choosing `na_option = 'bottom'`, if there are records
with NaN values they are placed at the bottom of the ranking.
* pct_rank: when setting `pct = True`, the ranking is expressed as
percentile rank.
```pycon
>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df.execute()
Animal Number_legs default_rank max_rank NA_bottom pct_rank
0 cat 4.0 2.5 3.0 2.5 0.625
1 penguin 2.0 1.0 1.0 1.0 0.250
2 dog 4.0 2.5 3.0 2.5 0.625
3 spider 8.0 4.0 4.0 4.0 1.000
4 snake NaN NaN NaN 5.0 NaN
```
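As a rough illustration of how ties and NaN values are handled, the following pure-Python sketch reproduces the four columns above for a plain list. It covers only the `'average'`, `'min'` and `'max'` methods and the `'keep'` and `'bottom'` NaN options; `rank` is a hypothetical stand-alone function, and dividing by the number of ranked values for `pct=True` is an assumption that matches the `'keep'` case shown here.

```python
import math


def rank(values, method="average", na_option="keep", pct=False):
    """Rank a list of floats in ascending order, 1 through n."""
    valid = sorted(v for v in values if not math.isnan(v))

    def tie_rank(first, last):
        # first/last are the 1-based positions occupied by a group of equal values.
        if method == "average":
            return (first + last) / 2
        if method == "min":
            return float(first)
        if method == "max":
            return float(last)
        raise ValueError(f"unsupported method: {method!r}")

    ranks = []
    for v in values:
        if math.isnan(v):
            if na_option == "keep":
                ranks.append(math.nan)
            elif na_option == "bottom":
                # NaN values form one tied group placed after all valid values.
                ranks.append(tie_rank(len(valid) + 1, len(values)))
            else:
                raise ValueError(f"unsupported na_option: {na_option!r}")
        else:
            first = valid.index(v) + 1         # position of the first tied value
            last = first + valid.count(v) - 1  # position of the last tied value
            ranks.append(tie_rank(first, last))
    if pct:
        ranked = sum(1 for r in ranks if not math.isnan(r))
        ranks = [r / ranked for r in ranks]
    return ranks


legs = [4.0, 2.0, 4.0, 8.0, math.nan]
default_rank = rank(legs)                   # [2.5, 1.0, 2.5, 4.0, nan]
max_rank = rank(legs, method="max")         # [3.0, 1.0, 3.0, 4.0, nan]
na_bottom = rank(legs, na_option="bottom")  # [2.5, 1.0, 2.5, 4.0, 5.0]
pct_rank = rank(legs, pct=True)             # [0.625, 0.25, 0.625, 1.0, nan]
```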
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rdiv.md
# maxframe.dataframe.Series.rdiv
#### Series.rdiv(other, level=None, fill_value=None, axis=0)
Return Floating division of series and other, element-wise (binary operator rtruediv).
Equivalent to `other / series`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.truediv`](maxframe.dataframe.Series.truediv.md#maxframe.dataframe.Series.truediv)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.truediv(b, fill_value=0).execute()
a 1.0
b inf
c inf
d 0.0
e NaN
dtype: float64
```
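The example above uses `truediv`, so the operands appear in the non-reflected order. To show what the reflected form computes, here is a minimal pure-Python sketch with dicts standing in for Series. `rdiv` below is a hypothetical stand-alone function written for illustration; it assumes the reflected operation fills missing values the same way and then evaluates `other / series`, and it mirrors IEEE division by zero (inf or NaN) because plain Python raises an error instead.

```python
import math


def rdiv(series, other, fill_value=None):
    """Compute other / series label by label for two dicts of floats."""
    result = {}
    for label in sorted(set(series) | set(other)):
        s = series.get(label, math.nan)
        o = other.get(label, math.nan)
        if math.isnan(s) and math.isnan(o):
            result[label] = math.nan  # missing on both sides stays missing
            continue
        if fill_value is not None:
            s = fill_value if math.isnan(s) else s
            o = fill_value if math.isnan(o) else o
        if s == 0:
            # Emulate float division by zero instead of raising ZeroDivisionError.
            if o == 0 or math.isnan(o):
                result[label] = math.nan
            else:
                result[label] = math.copysign(math.inf, o)
        else:
            result[label] = o / s
    return result


a = {"a": 1.0, "b": 1.0, "c": 1.0, "d": math.nan}
b = {"a": 1.0, "b": math.nan, "d": 1.0, "e": math.nan}
reflected = rdiv(a, b, fill_value=0)
```

With these inputs `reflected` is 1.0 for `a`, 0.0 for `b` and `c`, inf for `d`, and NaN for `e`: the zeros and infinities swap places compared with the `truediv` output above.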
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.reindex.md
# maxframe.dataframe.Series.reindex
#### Series.reindex(labels=None, \*, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=None, limit=None, tolerance=None, enable_sparse=False)
Conform Series/DataFrame to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
`copy=False`.
* **Parameters:**
* **labels** (*array-like* *,* *optional*) – New labels / index to conform the axis specified by ‘axis’ to.
* **index** (*array-like* *,* *optional*) – New labels / index to conform to, should be specified using
keywords. Preferably an Index object to avoid duplicating data.
* **columns** (*array-like* *,* *optional*) – New labels / index to conform to, should be specified using
keywords. Preferably an Index object to avoid duplicating data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Axis to target. Can be either the axis name (‘index’, ‘columns’)
or number (0, 1).
* **method** ( *{None* *,* *'backfill'/'bfill'* *,* *'pad'/'ffill'* *,* *'nearest'}*) –
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* None (default): don’t fill gaps
* pad / ffill: Propagate last valid observation forward to next
valid.
* backfill / bfill: Use next valid observation to fill gap.
* nearest: Use nearest valid observations to fill gap.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Return a new object, even if the passed indexes are the same.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **fill_value** (*scalar* *,* *default np.NaN*) – Value to use for missing values. Defaults to NaN, but can be any
“compatible” value.
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Maximum number of consecutive elements to forward or backward fill.
* **tolerance** (*optional*) –
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation `abs(index[indexer] - target) <= tolerance`.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
* **Return type:**
Series/DataFrame with changed index.
#### SEE ALSO
[`DataFrame.set_index`](maxframe.dataframe.DataFrame.set_index.md#maxframe.dataframe.DataFrame.set_index)
: Set row labels.
[`DataFrame.reset_index`](maxframe.dataframe.DataFrame.reset_index.md#maxframe.dataframe.DataFrame.reset_index)
: Remove row labels or move them to new columns.
[`DataFrame.reindex_like`](maxframe.dataframe.DataFrame.reindex_like.md#maxframe.dataframe.DataFrame.reindex_like)
: Change to same indices as other DataFrame.
### Examples
`DataFrame.reindex` supports two calling conventions:
* `(index=index_labels, columns=column_labels, ...)`
* `(labels, axis={'index', 'columns'}, ...)`
We *highly* recommend using keyword arguments to clarify your
intent.
Create a dataframe with some fictional data.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
>>> df = md.DataFrame({'http_status': [200, 200, 404, 404, 301],
... 'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
... index=index)
>>> df.execute()
http_status response_time
Firefox 200 0.04
Chrome 200 0.02
Safari 404 0.07
IE10 404 0.08
Konqueror 301 1.00
```
Create a new index and reindex the dataframe. By default
values in the new index that do not have corresponding
records in the dataframe are assigned `NaN`.
```pycon
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
... 'Chrome']
>>> df.reindex(new_index).execute()
http_status response_time
Safari 404.0 0.07
Iceweasel NaN NaN
Comodo Dragon NaN NaN
IE10 404.0 0.08
Chrome 200.0 0.02
```
We can fill in the missing values by passing a value to
the keyword `fill_value`. Because the index is not monotonically
increasing or decreasing, we cannot use arguments to the keyword
`method` to fill the `NaN` values.
```pycon
>>> df.reindex(new_index, fill_value=0).execute()
http_status response_time
Safari 404 0.07
Iceweasel 0 0.00
Comodo Dragon 0 0.00
IE10 404 0.08
Chrome 200 0.02
```
```pycon
>>> df.reindex(new_index, fill_value='missing').execute()
http_status response_time
Safari 404 0.07
Iceweasel missing missing
Comodo Dragon missing missing
IE10 404 0.08
Chrome 200 0.02
```
We can also reindex the columns.
```pycon
>>> df.reindex(columns=['http_status', 'user_agent']).execute()
http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN
```
Or we can use “axis-style” keyword arguments
```pycon
>>> df.reindex(['http_status', 'user_agent'], axis="columns").execute()
http_status user_agent
Firefox 200 NaN
Chrome 200 NaN
Safari 404 NaN
IE10 404 NaN
Konqueror 301 NaN
```
To further illustrate the filling functionality in
`reindex`, we will create a dataframe with a
monotonically increasing index (for example, a sequence
of dates).
```pycon
>>> import numpy as np
>>> date_index = md.date_range('1/1/2010', periods=6, freq='D')
>>> df2 = md.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
... index=date_index)
>>> df2.execute()
prices
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
```
Suppose we decide to expand the dataframe to cover a wider
date range.
```pycon
>>> date_index2 = md.date_range('12/29/2009', periods=10, freq='D')
>>> df2.reindex(date_index2).execute()
prices
2009-12-29 NaN
2009-12-30 NaN
2009-12-31 NaN
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN
```
The index entries that did not have a value in the original data frame
(for example, ‘2009-12-29’) are by default filled with `NaN`.
If desired, we can fill in the missing values using one of several
options.
For example, to propagate the next valid value backward to fill the `NaN`
values, pass `bfill` as an argument to the `method` keyword.
```pycon
>>> df2.reindex(date_index2, method='bfill').execute()
prices
2009-12-29 100.0
2009-12-30 100.0
2009-12-31 100.0
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN
```
Please note that the `NaN` value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the `NaN` values present
in the original dataframe, use the `fillna()` method.
See the [user guide](https://pandas.pydata.org/docs/user_guide/basics.html#basics-reindexing) for more.
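The note above, that filling during a reindex compares only the old and new labels, can be sketched in plain Python. This is an illustrative model of `method='bfill'` under the assumption of a sorted, increasing index; the helper `reindex_bfill` is hypothetical and is not how MaxFrame implements the operation.

```python
import bisect
import math
from datetime import date, timedelta


def reindex_bfill(index, values, new_index):
    """Back-fill onto new_index by comparing labels only (index must be sorted ascending)."""
    result = []
    for label in new_index:
        # Position of the first original label that is >= the requested label.
        pos = bisect.bisect_left(index, label)
        # Copy whatever value sits there, even if it is NaN.
        # Past the end of the original index there is nothing to copy.
        result.append(values[pos] if pos < len(index) else math.nan)
    return result


index = [date(2010, 1, 1) + timedelta(days=i) for i in range(6)]
prices = [100.0, 101.0, math.nan, 100.0, 89.0, 88.0]
new_index = [date(2009, 12, 29) + timedelta(days=i) for i in range(10)]

filled = reindex_bfill(index, prices, new_index)
for label, value in zip(new_index, filled):
    print(label, value)
```

Because the lookup never inspects the values, the `NaN` stored at 2010-01-03 is copied through unchanged, while 2010-01-07 stays `NaN` only because no later label exists.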
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.reindex_like.md
# maxframe.dataframe.Series.reindex_like
#### Series.reindex_like(other, method=None, copy=True, limit=None, tolerance=None)
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional
filling logic, placing NaN in locations having no value
in the previous index. A new object is produced unless the
new index is equivalent to the current one and copy=False.
* **Parameters:**
* **other** (*Object* *of* *the same data type*) – Its row and column indices are used to define the new indices
of this object.
* **method** ( *{None* *,* *'backfill'/'bfill'* *,* *'pad'/'ffill'* *,* *'nearest'}*) –
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.
* None (default): don’t fill gaps
* pad / ffill: propagate last valid observation forward to next
valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Return a new object, even if the passed indexes are the same.
* **limit** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Maximum number of consecutive labels to fill for inexact matches.
* **tolerance** (*optional*) –
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
satisfy the equation `abs(index[indexer] - target) <= tolerance`.
Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index’s type.
* **Returns:**
Same type as caller, but with changed indices on each axis.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.set_index`](maxframe.dataframe.DataFrame.set_index.md#maxframe.dataframe.DataFrame.set_index)
: Set row labels.
[`DataFrame.reset_index`](maxframe.dataframe.DataFrame.reset_index.md#maxframe.dataframe.DataFrame.reset_index)
: Remove row labels or move them to new columns.
[`DataFrame.reindex`](maxframe.dataframe.DataFrame.reindex.md#maxframe.dataframe.DataFrame.reindex)
: Change to new indices or expand indices.
### Notes
Same as calling
`.reindex(index=other.index, columns=other.columns,...)`.
### Examples
```pycon
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> df1 = md.DataFrame([[24.3, 75.7, 'high'],
... [31, 87.8, 'high'],
... [22, 71.6, 'medium'],
... [35, 95, 'medium']],
... columns=['temp_celsius', 'temp_fahrenheit',
... 'windspeed'],
... index=md.date_range(start='2014-02-12',
... end='2014-02-15', freq='D'))
```
```pycon
>>> df1.execute()
temp_celsius temp_fahrenheit windspeed
2014-02-12 24.3 75.7 high
2014-02-13 31 87.8 high
2014-02-14 22 71.6 medium
2014-02-15 35 95 medium
```
```pycon
>>> df2 = md.DataFrame([[28, 'low'],
... [30, 'low'],
... [35.1, 'medium']],
... columns=['temp_celsius', 'windspeed'],
... index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
... '2014-02-15']))
```
```pycon
>>> df2.execute()
temp_celsius windspeed
2014-02-12 28.0 low
2014-02-13 30.0 low
2014-02-15 35.1 medium
```
```pycon
>>> df2.reindex_like(df1).execute()
temp_celsius temp_fahrenheit windspeed
2014-02-12 28.0 NaN low
2014-02-13 30.0 NaN low
2014-02-14 NaN NaN NaN
2014-02-15 35.1 NaN medium
```
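The equivalence stated in the Notes can be modelled with nested dictionaries. The sketch below is only an illustration: the helpers `reindex` and `reindex_like` are hypothetical, a frame is represented as `{column: {row label: value}}`, and `None` stands in for `NaN`.

```python
def reindex(frame, index, columns, fill_value=None):
    """Conform frame to the given row labels and columns; missing cells get fill_value."""
    return {
        col: {label: frame.get(col, {}).get(label, fill_value) for label in index}
        for col in columns
    }


def reindex_like(frame, other):
    """Same as reindex(frame, index=<rows of other>, columns=<columns of other>)."""
    columns = list(other)
    index = list(other[columns[0]]) if columns else []
    return reindex(frame, index, columns)


days = ["2014-02-12", "2014-02-13", "2014-02-14", "2014-02-15"]
df1 = {
    "temp_celsius": dict(zip(days, [24.3, 31, 22, 35])),
    "temp_fahrenheit": dict(zip(days, [75.7, 87.8, 71.6, 95])),
    "windspeed": dict(zip(days, ["high", "high", "medium", "medium"])),
}
df2 = {
    "temp_celsius": {"2014-02-12": 28.0, "2014-02-13": 30.0, "2014-02-15": 35.1},
    "windspeed": {"2014-02-12": "low", "2014-02-13": "low", "2014-02-15": "medium"},
}

result = reindex_like(df2, df1)
print(result["temp_celsius"])
```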
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rename.md
# maxframe.dataframe.Series.rename
#### Series.rename(index=None, \*, axis='index', copy=True, inplace=False, level=None, errors='ignore')
Alter Series index labels or name.
Function / dict values must be unique (1-to-1). Labels not contained in
a dict / Series will be left as-is. Extra labels listed don’t throw an
error.
Alternatively, change `Series.name` with a scalar value.
* **Parameters:**
* **axis** ( *{0* *or* *"index"}*) – Unused. Accepted for compatibility with DataFrame method only.
* **index** (*scalar* *,* *hashable sequence* *,* *dict-like* *or* *function* *,* *optional*) – Functions or dict-like are transformations to apply to
the index.
Scalar or hashable sequence-like will alter the `Series.name`
attribute.
* **\*\*kwargs** – Additional keyword arguments passed to the function. Only the
“inplace” keyword is used.
* **Returns:**
Series with index labels or name altered.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`DataFrame.rename`](maxframe.dataframe.DataFrame.rename.md#maxframe.dataframe.DataFrame.rename)
: Corresponding DataFrame method.
`Series.rename_axis`
: Set the name of the axis.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3])
>>> s.execute()
0 1
1 2
2 3
dtype: int64
>>> s.rename("my_name").execute() # scalar, changes Series.name
0 1
1 2
2 3
Name: my_name, dtype: int64
>>> s.rename({1: 3, 2: 5}).execute() # mapping, changes labels
0 1
3 2
5 3
dtype: int64
```
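The relabelling rules described above (labels missing from the mapping are left as-is, extra keys are ignored, functions are applied to every label) can be sketched in plain Python. The helper `rename_labels` is hypothetical and models only the index transformation, not the `Series.name` case.

```python
def rename_labels(index, mapper):
    """Return new index labels after applying a dict-like or a function."""
    if callable(mapper):
        return [mapper(label) for label in index]
    # Labels absent from the mapping are kept; unused mapping keys are ignored.
    return [mapper.get(label, label) for label in index]


labels = [0, 1, 2]
renamed = rename_labels(labels, {1: 3, 2: 5, 9: 10})  # key 9 matches nothing
squared = rename_labels(labels, lambda x: x ** 2)
print(renamed, squared)
```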
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.reorder_levels.md
# maxframe.dataframe.Series.reorder_levels
#### Series.reorder_levels(order)
Rearrange index levels using input order.
May not drop or duplicate levels.
* **Parameters:**
**order** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *int representing new level order*) – Reference level by number or key.
* **Return type:**
[type](https://docs.python.org/3/library/functions.html#type) of caller (new object)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> arrays = [mt.array(["dog", "dog", "cat", "cat", "bird", "bird"]),
... mt.array(["white", "black", "white", "black", "white", "black"])]
>>> s = md.Series([1, 2, 3, 3, 5, 2], index=arrays)
>>> s.execute()
dog white 1
black 2
cat white 3
black 3
bird white 5
black 2
dtype: int64
>>> s.reorder_levels([1, 0]).execute()
white dog 1
black dog 2
white cat 3
black cat 3
white bird 5
black bird 2
dtype: int64
```
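Conceptually, reordering levels permutes the entries of each index tuple. The sketch below assumes the index is a list of tuples and that levels are referenced by position; `reorder_levels` here is a hypothetical stand-in, and the validity check mirrors the rule that levels may not be dropped or duplicated.

```python
def reorder_levels(index, order):
    """Permute the levels of a list-of-tuples index according to order."""
    nlevels = len(index[0]) if index else len(order)
    if sorted(order) != list(range(nlevels)):
        raise ValueError("order must reference every level exactly once")
    return [tuple(key[i] for i in order) for key in index]


index = [("dog", "white"), ("dog", "black"), ("cat", "white")]
swapped = reorder_levels(index, [1, 0])
print(swapped)
```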
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.repeat.md
# maxframe.dataframe.Series.repeat
#### Series.repeat(repeats, axis=None)
Repeat elements of a Series.
Returns a new Series where each element of the current Series
is repeated consecutively a given number of times.
* **Parameters:**
* **repeats** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array* *of* *ints*) – The number of repetitions for each element. This should be a
non-negative integer. Repeating 0 times will return an empty
Series.
* **axis** (*None*) – Must be `None`. Has no effect but is accepted for compatibility
with numpy.
* **Returns:**
Newly created Series with repeated elements.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Index.repeat`](maxframe.dataframe.Index.repeat.md#maxframe.dataframe.Index.repeat)
: Equivalent function for Index.
[`numpy.repeat`](https://numpy.org/doc/stable/reference/generated/numpy.repeat.html#numpy.repeat)
: Similar method for [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray).
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['a', 'b', 'c'])
>>> s.execute()
0 a
1 b
2 c
dtype: object
>>> s.repeat(2).execute()
0 a
0 a
1 b
1 b
2 c
2 c
dtype: object
>>> s.repeat([1, 2, 3]).execute()
0 a
1 b
1 b
2 c
2 c
2 c
dtype: object
```
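As the output above shows, index labels are repeated together with their values. A minimal pure-Python model of that behaviour, assuming `repeats` is either a single non-negative integer or one count per element (the function `repeat` below is a hypothetical illustration):

```python
def repeat(labels, values, repeats):
    """Repeat each (label, value) pair consecutively; repeats is an int or one count per element."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must have one entry per element")
    out_labels, out_values = [], []
    for label, value, count in zip(labels, values, repeats):
        out_labels.extend([label] * count)
        out_values.extend([value] * count)
    return out_labels, out_values


labels, values = [0, 1, 2], ["a", "b", "c"]
uniform = repeat(labels, values, 2)
varying = repeat(labels, values, [1, 2, 3])
empty = repeat(labels, values, 0)
print(uniform, varying, empty)
```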
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.reset_index.md
# maxframe.dataframe.Series.reset_index
#### Series.reset_index(level=None, drop=False, name=<no_default>, inplace=False, default_index_type: ~maxframe.protocol.DefaultIndexType | str = None, \*\*kwargs)
Generate a new DataFrame or Series with the index reset.
This is useful when the index needs to be treated as a column, or
when the index is meaningless and needs to be reset to the default
before another operation.
* **Parameters:**
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *default optional*) – For a Series with a MultiIndex, only remove the specified levels
from the index. Removes all levels by default.
* **drop** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Just reset the index, without inserting it as a column in
the new DataFrame.
* **name** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *optional*) – The name to use for the column containing the original Series
values. Uses `self.name` by default. This argument is ignored
when drop is True.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Modify the Series in place (do not create a new object).
* **Returns:**
When drop is False (the default), a DataFrame is returned.
The newly created columns will come first in the DataFrame,
followed by the original Series values.
When drop is True, a Series is returned.
In either case, if `inplace=True`, no value is returned.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.reset_index`](maxframe.dataframe.DataFrame.reset_index.md#maxframe.dataframe.DataFrame.reset_index)
: Analogous function for DataFrame.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4], name='foo',
... index=md.Index(['a', 'b', 'c', 'd'], name='idx'))
```
Generate a DataFrame with default index.
```pycon
>>> s.reset_index().execute()
idx foo
0 a 1
1 b 2
2 c 3
3 d 4
```
To specify the name of the new column use name.
```pycon
>>> s.reset_index(name='values').execute()
idx values
0 a 1
1 b 2
2 c 3
3 d 4
```
To generate a new Series with the default index, set drop to True.
```pycon
>>> s.reset_index(drop=True).execute()
0 1
1 2
2 3
3 4
Name: foo, dtype: int64
```
To update the Series in place, without generating a new one
set inplace to True. Note that it also requires `drop=True`.
```pycon
>>> s.reset_index(inplace=True, drop=True)
>>> s.execute()
0 1
1 2
2 3
3 4
Name: foo, dtype: int64
```
The level parameter is interesting for Series with a multi-level
index.
```pycon
>>> import numpy as np
>>> import pandas as pd
>>> arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
... np.array(['one', 'two', 'one', 'two'])]
>>> s2 = md.Series(
... range(4), name='foo',
... index=pd.MultiIndex.from_arrays(arrays,
... names=['a', 'b']))
```
To remove a specific level from the Index, use level.
```pycon
>>> s2.reset_index(level='a').execute()
a foo
b
one bar 0
two bar 1
one baz 2
two baz 3
```
If level is not set, all levels are removed from the Index.
```pycon
>>> s2.reset_index().execute()
a b foo
0 bar one 0
1 bar two 1
2 baz one 2
3 baz two 3
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rfloordiv.md
# maxframe.dataframe.Series.rfloordiv
#### Series.rfloordiv(other, level=None, fill_value=None, axis=0)
Return Integer division of series and other, element-wise (binary operator rfloordiv).
Equivalent to `other // series`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.floordiv`](maxframe.dataframe.Series.floordiv.md#maxframe.dataframe.Series.floordiv)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.floordiv(b, fill_value=0).execute()
a 1.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rmod.md
# maxframe.dataframe.Series.rmod
#### Series.rmod(other, level=None, fill_value=None, axis=0)
Return Modulo of series and other, element-wise (binary operator rmod).
Equivalent to `other % series`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.mod`](maxframe.dataframe.Series.mod.md#maxframe.dataframe.Series.mod)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.mod(b, fill_value=0).execute()
a 0.0
b NaN
c NaN
d 0.0
e NaN
dtype: float64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rmul.md
# maxframe.dataframe.Series.rmul
#### Series.rmul(other, level=None, fill_value=None, axis=0)
Return Multiplication of series and other, element-wise (binary operator rmul).
Equivalent to `other * series`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.mul`](maxframe.dataframe.Series.mul.md#maxframe.dataframe.Series.mul)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.multiply(b, fill_value=0).execute()
a 1.0
b 0.0
c 0.0
d 0.0
e NaN
dtype: float64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rolling.md
# maxframe.dataframe.Series.rolling
#### Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
Provide rolling window calculations.
* **Parameters:**
* **window** ([*int*](https://docs.python.org/3/library/functions.html#int) *, or* *offset*) – Size of the moving window. This is the number of observations used for
calculating the statistic. Each window will be a fixed size.
If it's an offset then this will be the time period of each window. Each
window will be variable-sized, based on the observations included in
the time period. This is only valid for datetimelike indexes. This is
new in 0.19.0
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Minimum number of observations in window required to have a value
(otherwise result is NA). For a window that is specified by an offset,
min_periods will default to 1. Otherwise, min_periods will default
to the size of the window.
* **center** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Set the labels at the center of the window.
* **win_type** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Provide a window type. If `None`, all points are evenly weighted.
See the notes below for further information.
* **on** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – For a DataFrame, a datetime-like column on which to calculate the rolling
window, rather than the DataFrame’s index. Provided integer column is
ignored and excluded from result since an integer index is not used to
calculate the rolling window.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 0*)
* **closed** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Make the interval closed on the ‘right’, ‘left’, ‘both’ or
‘neither’ endpoints.
For offset-based windows, it defaults to ‘right’.
For fixed windows, defaults to ‘both’. Remaining cases not implemented
for fixed windows.
* **Return type:**
a Window or Rolling sub-classed for the particular operation
#### SEE ALSO
[`expanding`](maxframe.dataframe.Series.expanding.md#maxframe.dataframe.Series.expanding)
: Provides expanding transformations.
[`ewm`](maxframe.dataframe.Series.ewm.md#maxframe.dataframe.Series.ewm)
: Provides exponential weighted functions.
### Notes
By default, the result is set to the right edge of the window. This can be
changed to the center of the window by setting `center=True`.
To learn more about the offsets & frequency strings, please see [this link](http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
The recognized win_types are:
* `boxcar`
* `triang`
* `blackman`
* `hamming`
* `bartlett`
* `parzen`
* `bohman`
* `blackmanharris`
* `nuttall`
* `barthann`
* `kaiser` (needs beta)
* `gaussian` (needs std)
* `general_gaussian` (needs power, width)
* `slepian` (needs width)
* `exponential` (needs tau), center is set to None.
If `win_type=None` all points are evenly weighted. To learn more about
different window types see [scipy.signal window functions](https://docs.scipy.org/doc/scipy/reference/signal.html#window-functions).
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'B': [0, 1, 2, np.nan, 4]})
>>> df.execute()
B
0 0.0
1 1.0
2 2.0
3 NaN
4 4.0
```
Rolling sum with a window length of 2, using the ‘triang’
window type.
```pycon
>>> df.rolling(2, win_type='triang').sum().execute()
B
0 NaN
1 0.5
2 1.5
3 NaN
4 NaN
```
Rolling sum with a window length of 2, min_periods defaults
to the window length.
```pycon
>>> df.rolling(2).sum().execute()
B
0 NaN
1 1.0
2 3.0
3 NaN
4 NaN
```
Same as above, but explicitly set the min_periods
```pycon
>>> df.rolling(2, min_periods=1).sum().execute()
B
0 0.0
1 1.0
2 3.0
3 2.0
4 4.0
```
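The interaction between the window size, `min_periods`, and `NaN` values in the two outputs above can be reproduced with a short pure-Python sketch. The helper `rolling_sum` is hypothetical and models only an unweighted, fixed-size, right-aligned window.

```python
import math


def rolling_sum(values, window, min_periods=None):
    """Right-aligned fixed-size rolling sum that skips NaN observations."""
    if min_periods is None:
        min_periods = window  # default for fixed-size windows
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        # A result is produced only when enough non-NaN observations are present.
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out


data = [0.0, 1.0, 2.0, math.nan, 4.0]
strict = rolling_sum(data, 2)
relaxed = rolling_sum(data, 2, min_periods=1)
print(strict)
print(relaxed)
```

With the default `min_periods`, any window containing a `NaN` or fewer than two rows yields `NaN`; with `min_periods=1`, a single valid observation is enough.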
A ragged (meaning not-a-regular frequency), time-indexed DataFrame
```pycon
>>> df = md.DataFrame({'B': [0, 1, 2, np.nan, 4]},
...                   index=[md.Timestamp('20130101 09:00:00'),
...                          md.Timestamp('20130101 09:00:02'),
...                          md.Timestamp('20130101 09:00:03'),
...                          md.Timestamp('20130101 09:00:05'),
...                          md.Timestamp('20130101 09:00:06')])
>>> df.execute()
B
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:03 2.0
2013-01-01 09:00:05 NaN
2013-01-01 09:00:06 4.0
```
In contrast to an integer rolling window, this will roll a variable-length
window corresponding to the time period.
The default for min_periods is 1.
```pycon
>>> df.rolling('2s').sum().execute()
B
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:03 3.0
2013-01-01 09:00:05 NaN
2013-01-01 09:00:06 4.0
```
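The offset-based result above can be reproduced with a small sketch that treats each window as the interval `(t - offset, t]`, that is, closed on the right as described for offset-based windows. The helper `rolling_sum_offset` is hypothetical and uses the standard library's `datetime` in place of `md.Timestamp`.

```python
import math
from datetime import datetime, timedelta


def rolling_sum_offset(times, values, offset, min_periods=1):
    """Rolling sum over the time window (t - offset, t] for each timestamp t."""
    out = []
    for t in times:
        valid = [
            v for s, v in zip(times, values)
            if t - offset < s <= t and not math.isnan(v)
        ]
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out


base = datetime(2013, 1, 1, 9, 0, 0)
times = [base + timedelta(seconds=s) for s in (0, 2, 3, 5, 6)]
values = [0.0, 1.0, 2.0, math.nan, 4.0]

result = rolling_sum_offset(times, values, timedelta(seconds=2))
print(result)
```

The row at 09:00:05 is `NaN` because its window holds only the missing observation, while 09:00:06 still sums to 4.0 because one valid observation satisfies `min_periods=1`.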
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.round.md
# maxframe.dataframe.Series.round
#### Series.round(decimals=0, \*args, \*\*kwargs)
Round each value in a Series to the given number of decimals.
* **Parameters:**
**decimals** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0*) – Number of decimal places to round to. If decimals is negative,
it specifies the number of positions to the left of the decimal point.
* **Returns:**
Rounded values of the Series.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`numpy.around`](https://numpy.org/doc/stable/reference/generated/numpy.around.html#numpy.around)
: Round values of an np.array.
[`DataFrame.round`](maxframe.dataframe.DataFrame.round.md#maxframe.dataframe.DataFrame.round)
: Round values of a DataFrame.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([0.1, 1.3, 2.7])
>>> s.round().execute()
0 0.0
1 1.0
2 3.0
dtype: float64
```
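The effect of a negative `decimals` value can be illustrated with Python's built-in `round`, used here only as a stand-in for the element-wise operation; how exact ties are broken is not covered by this sketch.

```python
values = [1234.5678, 0.1, 2.7]

# Positive decimals: keep that many digits after the decimal point.
two_places = [round(v, 2) for v in values]

# Negative decimals: round to positions left of the decimal point (here, hundreds).
hundreds = [round(v, -2) for v in values]

print(two_places)
print(hundreds)
```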
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rpow.md
# maxframe.dataframe.Series.rpow
#### Series.rpow(other, level=None, fill_value=None, axis=0)
Return Exponential power of series and other, element-wise (binary operator rpow).
Equivalent to `other ** series`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.pow`](maxframe.dataframe.Series.pow.md#maxframe.dataframe.Series.pow)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.pow(b, fill_value=0).execute()
a 1.0
b 1.0
c 1.0
d 0.0
e NaN
dtype: float64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rsub.md
# maxframe.dataframe.Series.rsub
#### Series.rsub(other, level=None, fill_value=None, axis=0)
Return Subtraction of series and other, element-wise (binary operator rsub).
Equivalent to `other - series`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
`Series.subtract`
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.subtract(b, fill_value=0).execute()
a 0.0
b 1.0
c 1.0
d -1.0
e NaN
dtype: float64
```
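The example above calls the forward method `subtract`. To make the difference from the reverse operator concrete, the following pure-Python sketch models label alignment, `fill_value` substitution, and the swapped operand order. All helper names (`flex_op`, `sub`, `rsub`) are hypothetical, and a Series is represented as a plain dict from label to float.

```python
import math
import operator


def flex_op(left, right, op, fill_value=None):
    """Apply op(left[label], right[label]) over the sorted union of labels."""
    result = {}
    for label in sorted(set(left) | set(right)):
        x = left.get(label, math.nan)
        y = right.get(label, math.nan)
        x_missing, y_missing = math.isnan(x), math.isnan(y)
        if fill_value is not None and x_missing != y_missing:
            # Exactly one side is missing: substitute fill_value for it.
            x = fill_value if x_missing else x
            y = fill_value if y_missing else y
        # If both sides are missing, NaN propagates through op.
        result[label] = op(x, y)
    return result


def sub(series, other, fill_value=None):
    return flex_op(series, other, operator.sub, fill_value)


def rsub(series, other, fill_value=None):
    # Reverse operator: the operands are swapped, giving other - series.
    return flex_op(other, series, operator.sub, fill_value)


a = {"a": 1.0, "b": 1.0, "c": 1.0, "d": math.nan}
b = {"a": 1.0, "b": math.nan, "d": 1.0, "e": math.nan}

forward = sub(a, b, fill_value=0)
reverse = rsub(a, b, fill_value=0)
print(forward)
print(reverse)
```

In this model `forward` matches the output shown above, and `reverse` has the opposite sign wherever the result is non-zero; label `e` stays `NaN` in both because it is missing on both sides.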
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.rtruediv.md
# maxframe.dataframe.Series.rtruediv
#### Series.rtruediv(other, level=None, fill_value=None, axis=0)
Return Floating division of series and other, element-wise (binary operator rtruediv).
Equivalent to `other / series`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.truediv`](maxframe.dataframe.Series.truediv.md#maxframe.dataframe.Series.truediv)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.truediv(b, fill_value=0).execute()
a 1.0
b inf
c inf
d 0.0
e NaN
dtype: float64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.sample.md
# maxframe.dataframe.Series.sample
#### Series.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, always_multinomial=False)
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of items from axis to return. Cannot be used with frac.
Default = 1 if frac = None.
* **frac** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Fraction of axis items to return. Cannot be used with n.
* **replace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Allow or disallow sampling of the same row more than once.
* **weights** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *ndarray-like* *,* *optional*) – Default ‘None’ results in equal probability weighting.
If passed a Series, will align with target object on index. Index
values in weights not found in sampled object will be ignored and
index values in sampled object not in weights will be assigned
weights of zero.
If called on a DataFrame, will accept the name of a column
when axis = 0.
Unless weights are a Series, weights must be same length as axis
being sampled.
If weights do not sum to 1, they will be normalized to sum to 1.
Missing values in the weights column will be treated as zero.
Infinite values not allowed.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *array-like* *,* *BitGenerator* *,* *np.random.RandomState* *,* *optional*) – If int, array-like, or BitGenerator (NumPy>=1.17), seed for
the random number generator.
If np.random.RandomState, use as numpy RandomState object.
* **axis** ( *{0* *or* *‘index’* *,* *1* *or* *‘columns’* *,* *None}* *,* *default None*) – Axis to sample. Accepts axis number or name. Default is stat axis
for given data type (0 for Series and DataFrames).
* **always_multinomial** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, always treat distribution of sample counts between data chunks
as multinomial distribution. This will accelerate sampling when data
is huge, but may affect randomness of samples when number of instances
is not very large.
* **Returns:**
A new object of same type as caller containing n items randomly
sampled from the caller object.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`DataFrameGroupBy.sample`
: Generates random samples from each group of a DataFrame object.
`SeriesGroupBy.sample`
: Generates random samples from each group of a Series object.
[`numpy.random.choice`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html#numpy.random.choice)
: Generates a random sample from a given 1-D numpy array.
### Notes
If frac > 1, replacement should be set to True.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'num_legs': [2, 4, 8, 0],
... 'num_wings': [2, 0, 0, 0],
... 'num_specimen_seen': [10, 2, 1, 8]},
... index=['falcon', 'dog', 'spider', 'fish'])
>>> df.execute()
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
dog            4          0                  2
spider         8          0                  1
fish           0          0                  8
```
Extract 3 random elements from the `Series` `df['num_legs']`:
Note that we use random_state to ensure the reproducibility of
the examples.
```pycon
>>> df['num_legs'].sample(n=3, random_state=1).execute()
fish 0
spider 8
falcon 2
Name: num_legs, dtype: int64
```
A random 50% sample of the `DataFrame` with replacement:
```pycon
>>> df.sample(frac=0.5, replace=True, random_state=1).execute()
      num_legs  num_wings  num_specimen_seen
dog          4          0                  2
fish         0          0                  8
```
An upsampled `DataFrame` (more rows than the original), drawn with replacement:
Note that the `replace` parameter has to be True when `frac` > 1.
```pycon
>>> df.sample(frac=2, replace=True, random_state=1).execute()
        num_legs  num_wings  num_specimen_seen
dog            4          0                  2
fish           0          0                  8
falcon         2          2                 10
falcon         2          2                 10
fish           0          0                  8
dog            4          0                  2
fish           0          0                  8
dog            4          0                  2
```
Using a DataFrame column as weights. Rows with larger value in the
num_specimen_seen column are more likely to be sampled.
```pycon
>>> df.sample(n=2, weights='num_specimen_seen', random_state=1).execute()
        num_legs  num_wings  num_specimen_seen
falcon         2          2                 10
fish           0          0                  8
```
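The weight rules listed under `weights` (missing values count as zero, infinite values are rejected, and weights that do not sum to 1 are normalized) can be illustrated without MaxFrame. The sketch below uses only the standard library. `normalize_weights` is a hypothetical helper, and `random.Random.choices` always samples with replacement, so the draw at the end only mirrors the `replace=True` case.

```python
import math
import random


def normalize_weights(weights):
    """Turn raw weights into probabilities; NaN counts as zero."""
    cleaned = [0.0 if math.isnan(w) else float(w) for w in weights]
    if any(math.isinf(w) for w in cleaned):
        raise ValueError("infinite weights are not allowed")
    total = sum(cleaned)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in cleaned]


labels = ["falcon", "dog", "spider", "fish"]
num_specimen_seen = [10, 2, 1, 8]

# Probabilities are 10/21, 2/21, 1/21 and 8/21.
probs = normalize_weights(num_specimen_seen)

# Weighted draw with replacement; the seed is only for repeatability.
rng = random.Random(1)
picked = rng.choices(labels, weights=probs, k=2)
```

The rows actually returned by `sample` depend on the library's own random number generator, so this sketch is not expected to reproduce the output shown above.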
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.sem.md
# maxframe.dataframe.Series.sem
#### Series.sem(axis=None, skipna=True, level=None, ddof=1, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.set_axis.md
# maxframe.dataframe.Series.set_axis
#### Series.set_axis(labels, axis=0, inplace=False)
Assign desired index to given axis.
Indexes for row labels can be changed by assigning
a list-like or Index.
* **Parameters:**
* **labels** (*list-like* *,* [*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index)) – The values for the new index.
* **axis** ( *{0* *or* *'index'}* *,* *default 0*) – The axis to update. The value 0 identifies the rows.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, modify the Series in place and return None; otherwise return a new Series instance.
* **Returns:**
**renamed** – An object of type Series or None if `inplace=True`.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or None
#### SEE ALSO
`Series.rename_axis`
: Alter the name of the index.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3])
>>> s.execute()
0 1
1 2
2 3
dtype: int64
```
```pycon
>>> s.set_axis(['a', 'b', 'c'], axis=0).execute()
a 1
b 2
c 3
dtype: int64
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.shape.md
# maxframe.dataframe.Series.shape
#### *property* Series.shape
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.shift.md
# maxframe.dataframe.Series.shift
#### Series.shift(periods=1, freq=None, axis=0, fill_value=None)
Shift index by desired number of periods with an optional time freq.
When freq is not passed, shift the index without realigning the data.
If freq is passed (in this case, the index must be date or datetime,
or it will raise a NotImplementedError), the index will be
increased using the periods and the freq.
* **Parameters:**
* **periods** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Number of periods to shift. Can be positive or negative.
* **freq** (*DateOffset* *,* *tseries.offsets* *,* *timedelta* *, or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Offset to use from the tseries module or time rule (e.g. ‘EOM’).
If freq is specified then the index values are shifted but the
data is not realigned. That is, use freq if you would like to
extend the index when shifting and preserve the original data.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'* *,* *None}* *,* *default 0*) – Shift direction.
* **fill_value** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *optional*) – The scalar value to use for newly introduced missing values.
The default depends on the dtype of self.
For numeric data, `np.nan` is used.
For datetime, timedelta, or period data, etc. `NaT` is used.
For extension dtypes, `self.dtype.na_value` is used.
* **Returns:**
Copy of input object, shifted.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
`Index.shift`
: Shift values of Index.
`DatetimeIndex.shift`
: Shift values of DatetimeIndex.
`PeriodIndex.shift`
: Shift values of PeriodIndex.
[`tshift`](maxframe.dataframe.Series.tshift.md#maxframe.dataframe.Series.tshift)
: Shift the time index, using the index’s frequency if available.
### Examples
```pycon
>>> import maxframe.dataframe as md
```
```pycon
>>> df = md.DataFrame({'Col1': [10, 20, 15, 30, 45],
... 'Col2': [13, 23, 18, 33, 48],
... 'Col3': [17, 27, 22, 37, 52]})
```
```pycon
>>> df.shift(periods=3).execute()
   Col1  Col2  Col3
0   NaN   NaN   NaN
1   NaN   NaN   NaN
2   NaN   NaN   NaN
3  10.0  13.0  17.0
4  20.0  23.0  27.0
```
```pycon
>>> df.shift(periods=1, axis='columns').execute()
   Col1  Col2  Col3
0   NaN  10.0  13.0
1   NaN  20.0  23.0
2   NaN  15.0  18.0
3   NaN  30.0  33.0
4   NaN  45.0  48.0
```
```pycon
>>> df.shift(periods=3, fill_value=0).execute()
   Col1  Col2  Col3
0     0     0     0
1     0     0     0
2     0     0     0
3    10    13    17
4    20    23    27
```
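The effect of `periods` and `fill_value` on a single column can be sketched with a plain list. This is an illustration of the semantics only: the `shift` helper is hypothetical, and `None` stands in for the dtype-dependent default (`np.nan`, `NaT`, and so on) described above.

```python
def shift(values, periods=1, fill_value=None):
    """Shift a list by `periods` positions, padding the vacated slots."""
    n = len(values)
    k = min(abs(periods), n)          # never shift further than the length
    pad = [fill_value] * k
    if periods >= 0:
        return pad + list(values[: n - k])   # move values towards the end
    return list(values[k:]) + pad            # move values towards the start


col1 = [10, 20, 15, 30, 45]
down = shift(col1, periods=3, fill_value=0)
up = shift(col1, periods=-1)
```

`down` matches the `Col1` column of the `fill_value=0` example, and a negative `periods` moves values in the opposite direction.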
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.sort_index.md
# maxframe.dataframe.Series.sort_index
#### Series.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: [bool](https://docs.python.org/3/library/functions.html#bool) = False, parallel_kind='PSRS', psrs_kinds=None, default_index_type=None)
Sort object by labels (along an axis).
* **Parameters:**
* **a** (*Input DataFrame* *or* *Series.*)
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – The axis along which to sort. The value 0 identifies the rows,
and 1 identifies the columns.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *level name* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *ints* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *level names*) – If not None, sort on values in specified index level(s).
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Sort ascending vs. descending.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, perform operation in-place.
* **kind** ( *{'quicksort'* *,* *'mergesort'* *,* *'heapsort'}* *,* *default 'quicksort'*) – Choice of sorting algorithm. See also `numpy.sort` for more
information. mergesort is the only stable algorithm. For
DataFrames, this option is only applied when sorting on a single
column or label.
* **na_position** ( *{'first'* *,* *'last'}* *,* *default 'last'*) – Puts NaNs at the beginning if first; last puts NaNs at the end.
Not implemented for MultiIndex.
* **sort_remaining** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **parallel_kind** ( *{'PSRS'}* *,* *optional.*) – Parallel sorting algorithm, for the details, refer to:
[http://csweb.cs.wfu.edu/bigiron/LittleFE-PSRS/build/html/PSRSalgorithm.html](http://csweb.cs.wfu.edu/bigiron/LittleFE-PSRS/build/html/PSRSalgorithm.html)
* **psrs_kinds** (*Sorting algorithms during PSRS algorithm.*)
* **Returns:**
**sorted_obj** – DataFrame with sorted index if inplace=False, None otherwise.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or None
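For readers unfamiliar with the `parallel_kind='PSRS'` option, the sketch below walks through the general idea of Parallel Sorting by Regular Sampling in a single process. It is an illustration under stated assumptions, not MaxFrame's implementation: the function name `psrs_sort`, the `workers` parameter, and the simplified pivot choice are all hypothetical, and the "workers" are simulated sequentially.

```python
import bisect
import heapq
import random


def psrs_sort(data, workers=3):
    """Sort `data` with a sequential simulation of PSRS."""
    data = list(data)
    if workers < 2 or len(data) < workers * workers:
        return sorted(data)

    # 1. Split the input into roughly equal chunks and sort each one locally.
    size = -(-len(data) // workers)  # ceiling division
    chunks = [sorted(data[i:i + size]) for i in range(0, len(data), size)]
    p = len(chunks)

    # 2. Take regularly spaced samples from every sorted chunk.
    samples = []
    for chunk in chunks:
        step = max(len(chunk) // p, 1)
        samples.extend(chunk[::step][:p])
    samples.sort()

    # 3. Choose p - 1 pivots at regular positions in the sorted samples.
    pivots = [samples[i * len(samples) // p] for i in range(1, p)]

    # 4. Cut every chunk at the pivots and route slice j to worker j.
    routed = [[] for _ in range(p)]
    for chunk in chunks:
        cuts = [0] + [bisect.bisect_right(chunk, pv) for pv in pivots] + [len(chunk)]
        for j in range(p):
            routed[j].append(chunk[cuts[j]:cuts[j + 1]])

    # 5. Each worker merges its sorted slices; concatenation is globally sorted.
    result = []
    for parts in routed:
        result.extend(heapq.merge(*parts))
    return result


rng = random.Random(0)
values = [rng.randint(0, 1000) for _ in range(200)]
ordered = psrs_sort(values, workers=4)
```

Because every element routed to worker `j` is no larger than any element routed to worker `j + 1`, the workers never need to exchange data again after step 4, which is what makes the scheme attractive for chunked data.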
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.sort_values.md
# maxframe.dataframe.Series.sort_values
#### Series.sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, parallel_kind='PSRS', psrs_kinds=None)
Sort by the values.
Sort a Series in ascending or descending order by some
criterion.
* **Parameters:**
* **series** (*input Series.*)
* **axis** ( *{0* *or* *'index'}* *,* *default 0*) – Axis to direct sorting. The value ‘index’ is accepted for
compatibility with DataFrame.sort_values.
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, sort values in ascending order, otherwise descending.
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, perform operation in-place.
* **kind** ( *{'quicksort'* *,* *'mergesort'* *or* *'heapsort'}* *,* *default 'quicksort'*) – Choice of sorting algorithm. See also [`numpy.sort()`](https://numpy.org/doc/stable/reference/generated/numpy.sort.html#numpy.sort) for more
information. ‘mergesort’ is the only stable algorithm.
* **na_position** ( *{'first'* *or* *'last'}* *,* *default 'last'*) – Argument ‘first’ puts NaNs at the beginning, ‘last’ puts NaNs at
the end.
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the resulting axis will be labeled 0, 1, …, n - 1.
* **Returns:**
Series ordered by values.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import numpy as np
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> raw = pd.Series([np.nan, 1, 3, 10, 5])
>>> s = md.Series(raw)
>>> s.execute()
0 NaN
1 1.0
2 3.0
3 10.0
4 5.0
dtype: float64
```
Sort values in ascending order (default behaviour)
```pycon
>>> s.sort_values(ascending=True).execute()
1 1.0
2 3.0
4 5.0
3 10.0
0 NaN
dtype: float64
```
Sort values in descending order
```pycon
>>> s.sort_values(ascending=False).execute()
3 10.0
4 5.0
2 3.0
1 1.0
0 NaN
dtype: float64
```
Sort values in place
```pycon
>>> s.sort_values(ascending=False, inplace=True)
>>> s.execute()
3 10.0
4 5.0
2 3.0
1 1.0
0 NaN
dtype: float64
```
Sort values putting NAs first
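As a plain-Python illustration (not MaxFrame code) of what the `na_position` option means, the hypothetical helper `sort_with_na` below keeps missing values out of the comparison and then places them first or last:

```python
import math


def sort_with_na(values, ascending=True, na_position="last"):
    """Sort floats, placing NaN entries at the chosen end."""
    present = sorted((v for v in values if not math.isnan(v)), reverse=not ascending)
    missing = [v for v in values if math.isnan(v)]
    return missing + present if na_position == "first" else present + missing


raw = [math.nan, 1.0, 3.0, 10.0, 5.0]
na_first = sort_with_na(raw, na_position="first")
na_last_desc = sort_with_na(raw, ascending=False)
```

Note that the position of missing values is independent of `ascending`: in the descending examples above, NaN still appears last.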
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.std.md
# maxframe.dataframe.Series.std
#### Series.std(axis=None, skipna=True, level=None, ddof=1, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.capitalize.md
# maxframe.dataframe.Series.str.capitalize
#### Series.str.capitalize()
Convert strings in the Series/Index to be capitalized.
Equivalent to [`str.capitalize()`](https://docs.python.org/3/library/stdtypes.html#str.capitalize).
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.lower`](maxframe.dataframe.Series.str.lower.md#maxframe.dataframe.Series.str.lower)
: Converts all characters to lowercase.
[`Series.str.upper`](maxframe.dataframe.Series.str.upper.md#maxframe.dataframe.Series.str.upper)
: Converts all characters to uppercase.
[`Series.str.title`](maxframe.dataframe.Series.str.title.md#maxframe.dataframe.Series.str.title)
: Converts first character of each word to uppercase and remaining to lowercase.
[`Series.str.capitalize`](#maxframe.dataframe.Series.str.capitalize)
: Converts first character to uppercase and remaining to lowercase.
[`Series.str.swapcase`](maxframe.dataframe.Series.str.swapcase.md#maxframe.dataframe.Series.str.swapcase)
: Converts uppercase to lowercase and lowercase to uppercase.
`Series.str.casefold`
: Removes all case distinctions in the string.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s.execute()
0 lower
1 CAPITALS
2 this is a sentence
3 SwApCaSe
dtype: object
```
```pycon
>>> s.str.lower().execute()
0 lower
1 capitals
2 this is a sentence
3 swapcase
dtype: object
```
```pycon
>>> s.str.upper().execute()
0 LOWER
1 CAPITALS
2 THIS IS A SENTENCE
3 SWAPCASE
dtype: object
```
```pycon
>>> s.str.title().execute()
0 Lower
1 Capitals
2 This Is A Sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.capitalize().execute()
0 Lower
1 Capitals
2 This is a sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.swapcase().execute()
0 LOWER
1 capitals
2 THIS IS A SENTENCE
3 sWaPcAsE
dtype: object
```
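Since these methods are documented as equivalent to the built-in string methods, the expected values can be checked element by element with plain Python. This sketch uses only `str` methods from the standard library and does not involve MaxFrame:

```python
words = ['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe']

# capitalize(): first character upper, everything else lower.
capitalized = [w.capitalize() for w in words]

# title(): first character of every word upper, the rest lower.
titled = [w.title() for w in words]

# swapcase(): invert the case of every character.
swapped = [w.swapcase() for w in words]
```

The difference between `capitalize` and `title` only shows up for multi-word strings such as `'this is a sentence'`.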
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.contains.md
# maxframe.dataframe.Series.str.contains
#### Series.str.contains(pat, case: [bool](https://docs.python.org/3/library/functions.html#bool) = True, flags: [int](https://docs.python.org/3/library/functions.html#int) = 0, na=None, regex: [bool](https://docs.python.org/3/library/functions.html#bool) = True)
Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is
contained within a string of a Series or Index.
* **Parameters:**
* **pat** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Character sequence or regular expression.
* **case** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, case sensitive.
* **flags** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0* *(**no flags* *)*) – Flags to pass through to the re module, e.g. re.IGNORECASE.
* **na** (*scalar* *,* *optional*) – Fill value for missing values. The default depends on dtype of the
array. For object-dtype, `numpy.nan` is used. For `StringDtype`,
`pandas.NA` is used.
* **regex** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) –
If True, assumes the pat is a regular expression.
If False, treats the pat as a literal string.
* **Returns:**
A Series or Index of boolean values indicating whether the
given pattern is contained within the string of each element
of the Series or Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of boolean values
#### SEE ALSO
`match`
: Analogous, but stricter, relying on re.match instead of re.search.
[`Series.str.startswith`](maxframe.dataframe.Series.str.startswith.md#maxframe.dataframe.Series.str.startswith)
: Test if the start of each string element matches a pattern.
[`Series.str.endswith`](maxframe.dataframe.Series.str.endswith.md#maxframe.dataframe.Series.str.endswith)
: Same as startswith, but tests the end of string.
### Examples
Returning a Series of booleans using only a literal pattern.
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['Mouse', 'dog', 'house and parrot', '23', mt.nan])
>>> s1.str.contains('og', regex=False).execute()
0 False
1 True
2 False
3 False
4 NaN
dtype: object
```
Returning an Index of booleans using only a literal pattern.
```pycon
>>> ind = md.Index(['Mouse', 'dog', 'house and parrot', '23.0', mt.nan])
>>> ind.str.contains('23', regex=False).execute()
Index([False, False, False, True, nan], dtype='object')
```
Specifying case sensitivity using case.
```pycon
>>> s1.str.contains('oG', case=True, regex=True).execute()
0 False
1 False
2 False
3 False
4 NaN
dtype: object
```
Specifying na to be False instead of NaN replaces NaN values
with False. If Series or Index does not contain NaN values
the resultant dtype will be bool, otherwise, an object dtype.
```pycon
>>> s1.str.contains('og', na=False, regex=True).execute()
0 False
1 True
2 False
3 False
4 False
dtype: bool
```
Returning True when either ‘house’ or ‘dog’ occurs in a string.
```pycon
>>> s1.str.contains('house|dog', regex=True).execute()
0 False
1 True
2 True
3 False
4 NaN
dtype: object
```
Ignoring case sensitivity using flags with regex.
```pycon
>>> import re
>>> s1.str.contains('PARROT', flags=re.IGNORECASE, regex=True).execute()
0 False
1 False
2 True
3 False
4 NaN
dtype: object
```
Returning any digit using regular expression.
```pycon
>>> s1.str.contains('\\d', regex=True).execute()
0 False
1 False
2 False
3 True
4 NaN
dtype: object
```
Ensure pat is not a literal pattern when regex is set to True.
Note in the following example one might expect only s2[1] and s2[3] to
return True. However, ‘.0’ as a regex matches any character
followed by a 0.
```pycon
>>> s2 = md.Series(['40', '40.0', '41', '41.0', '35'])
>>> s2.str.contains('.0', regex=True).execute()
0 True
1 True
2 False
3 True
4 False
dtype: bool
```
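The difference between regex and literal matching can be reproduced for single strings with the standard `re` module. The sketch below assumes, as the SEE ALSO entry for `match` suggests, that regex matching behaves like `re.search`. The scalar helper `contains` is hypothetical and only mirrors the `case`, `flags` and `regex` parameters.

```python
import re


def contains(text, pat, case=True, flags=0, regex=True):
    """Return True if `pat` occurs anywhere in `text`."""
    if regex:
        if not case:
            flags |= re.IGNORECASE
        return re.search(pat, text, flags) is not None
    # Literal mode: plain substring test, optionally case-insensitive.
    if case:
        return pat in text
    return pat.lower() in text.lower()


values = ['40', '40.0', '41', '41.0', '35']

# '.' is a wildcard, so '40' matches as "any character followed by 0".
as_regex = [contains(v, '.0') for v in values]

# Two ways to match a literal '.0': regex=False, or escaping the pattern.
as_literal = [contains(v, '.0', regex=False) for v in values]
as_escaped = [contains(v, re.escape('.0')) for v in values]
```

Passing `regex=False` is usually the simpler choice when the pattern comes from user input, because it avoids having to escape metacharacters at all.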
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.count.md
# maxframe.dataframe.Series.str.count
#### Series.str.count(pat, flags: [int](https://docs.python.org/3/library/functions.html#int) = 0)
Count occurrences of pattern in each string of the Series/Index.
This function is used to count the number of times a particular regex
pattern is repeated in each of the string elements of the
[`Series`](https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series).
* **Parameters:**
* **pat** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Valid regular expression.
* **flags** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0* *,* *meaning no flags*) – Flags for the re module. For a complete list, [see here](https://docs.python.org/3/howto/regex.html#compilation-flags).
* **\*\*kwargs** – For compatibility with other string methods. Not used.
* **Returns:**
Same type as the calling object containing the integer counts.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
#### SEE ALSO
[`re`](https://docs.python.org/3/library/re.html#module-re)
: Standard library module for regular expressions.
[`str.count`](https://docs.python.org/3/library/stdtypes.html#str.count)
: Standard library version, without regular expression support.
### Notes
Some characters need to be escaped when passing in pat.
For example, `'$'` has a special meaning in regex and must be escaped when
finding this literal character.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(['A', 'B', 'Aaba', 'Baca', mt.nan, 'CABA', 'cat'])
>>> s.str.count('a').execute()
0 0.0
1 0.0
2 2.0
3 2.0
4 NaN
5 0.0
6 1.0
dtype: float64
```
Escape `'$'` to find the literal dollar sign.
```pycon
>>> s = md.Series(['$', 'B', 'Aab$', '$$ca', 'C$B$', 'cat'])
>>> s.str.count('\\$').execute()
0 1
1 0
2 1
3 2
4 2
5 0
dtype: int64
```
This is also available on Index.
```pycon
>>> md.Index(['A', 'A', 'Aaba', 'cat']).str.count('a').execute()
Index([0, 0, 2, 1], dtype='int64')
```
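Why the escape matters can be seen with the standard `re` module on individual strings. In this sketch `count_pattern` is a hypothetical scalar helper that counts non-overlapping regex matches. It is assumed to mirror the per-element behaviour and is not MaxFrame code.

```python
import re


def count_pattern(text, pat, flags=0):
    """Count non-overlapping regex matches of `pat` in `text`."""
    return len(re.findall(pat, text, flags))


words = ['$', 'B', 'Aab$', '$$ca', 'C$B$', 'cat']

# Escaped: counts literal dollar signs.
escaped = [count_pattern(w, r'\$') for w in words]

# Unescaped: '$' is the end-of-string anchor, which matches once per string.
unescaped = [count_pattern(w, '$') for w in words]
```

The unescaped pattern silently returns 1 for every string instead of raising an error, which is why this mistake is easy to miss.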
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.endswith.md
# maxframe.dataframe.Series.str.endswith
#### Series.str.endswith(pat: [str](https://docs.python.org/3/library/stdtypes.html#str) | [tuple](https://docs.python.org/3/library/stdtypes.html#tuple)[[str](https://docs.python.org/3/library/stdtypes.html#str), ...], na: Scalar | [None](https://docs.python.org/3/library/constants.html#None) = None) → [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) | [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
Test if the end of each string element matches a pattern.
Equivalent to [`str.endswith()`](https://docs.python.org/3/library/stdtypes.html#str.endswith).
* **Parameters:**
* **pat** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *...* *]*) – Character sequence or tuple of strings. Regular expressions are not
accepted.
* **na** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *default NaN*) – Object shown if element tested is not a string. The default depends
on dtype of the array. For object-dtype, `numpy.nan` is used.
For `StringDtype`, `pandas.NA` is used.
* **Returns:**
A Series of booleans indicating whether the given pattern matches
the end of each string element.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`str.endswith`](https://docs.python.org/3/library/stdtypes.html#str.endswith)
: Python standard library string method.
[`Series.str.startswith`](maxframe.dataframe.Series.str.startswith.md#maxframe.dataframe.Series.str.startswith)
: Same as endswith, but tests the start of string.
[`Series.str.contains`](maxframe.dataframe.Series.str.contains.md#maxframe.dataframe.Series.str.contains)
: Tests if string element contains a pattern.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(['bat', 'bear', 'caT', mt.nan])
>>> s.execute()
0 bat
1 bear
2 caT
3 NaN
dtype: object
```
```pycon
>>> s.str.endswith('t').execute()
0 True
1 False
2 False
3 NaN
dtype: object
```
```pycon
>>> s.str.endswith(('t', 'T')).execute()
0 True
1 False
2 True
3 NaN
dtype: object
```
Specifying na to be False instead of NaN.
```pycon
>>> s.str.endswith('t', na=False).execute()
0 True
1 False
2 False
3 False
dtype: bool
```
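The handling of non-string elements and of tuple patterns can be mimicked with the built-in `str.endswith`. The list-based helper below is a hypothetical sketch of the element-wise behaviour. It uses `None` as the default placeholder where the documentation above uses NaN or `pandas.NA`.

```python
import math


def endswith(values, pat, na=None):
    """Element-wise str.endswith; non-string elements yield `na`."""
    return [v.endswith(pat) if isinstance(v, str) else na for v in values]


animals = ['bat', 'bear', 'caT', math.nan]

single = endswith(animals, 't')            # case-sensitive, one suffix
either = endswith(animals, ('t', 'T'))     # tuple: any of the suffixes
filled = endswith(animals, 't', na=False)  # replace the missing result
```

A tuple of suffixes is the way to test several endings at once, since regular expressions are not accepted here.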
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.find.md
# maxframe.dataframe.Series.str.find
#### Series.str.find(sub, start: [int](https://docs.python.org/3/library/functions.html#int) = 0, end=None)
Return the lowest index of the substring in each string of the Series/Index.
Each returned index corresponds to the position where the
substring is fully contained between [start:end]. Return -1 on
failure. Equivalent to standard [`str.find()`](https://docs.python.org/3/library/stdtypes.html#str.find).
* **Parameters:**
* **sub** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Substring being searched.
* **start** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Left edge index.
* **end** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Right edge index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of int.
#### SEE ALSO
[`rfind`](maxframe.dataframe.Series.str.rfind.md#maxframe.dataframe.Series.str.rfind)
: Return the highest index in each string.
### Examples
For Series.str.find:
```pycon
>>> import maxframe.dataframe as md
>>> ser = md.Series(["cow_", "duck_", "do_ve"])
>>> ser.str.find("_").execute()
0 3
1 4
2 2
dtype: int64
```
For Series.str.rfind:
```pycon
>>> ser = md.Series(["_cow_", "duck_", "do_v_e"])
>>> ser.str.rfind("_").execute()
0 4
1 4
2 4
dtype: int64
```
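Because the method is documented as equivalent to the standard `str.find()`, the `start` and `end` bounds and the `-1` failure value can be checked directly on Python strings. This sketch uses only built-in string methods:

```python
words = ["cow_", "duck_", "do_ve"]

# Lowest index of "_" in each string.
first = [w.find("_") for w in words]

# Highest index of "_" in each string.
last = [w.rfind("_") for w in ["_cow_", "duck_", "do_v_e"]]

# -1 signals that the substring was not found.
missing = "duck_".find("x")

# start=3 skips the first underscore of "do_v_e" (at index 2).
from_start = "do_v_e".find("_", 3)

# The slice [0:2] is "do", which contains no underscore.
within_bounds = "do_v_e".find("_", 0, 2)
```

Returned positions are always relative to the full string, not to the `[start:end]` slice.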
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.isalnum.md
# maxframe.dataframe.Series.str.isalnum
#### Series.str.isalnum()
Check whether all characters in each string are alphanumeric.
This is equivalent to running the Python string method
[`str.isalnum()`](https://docs.python.org/3/library/stdtypes.html#str.isalnum) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities such as unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
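The three checks form a nested hierarchy: every decimal string is also a digit string, and every digit string is also numeric. Since each method is documented as equivalent to the Python string method of the same name, this can be verified with built-in `str` methods alone, as in the following sketch:

```python
samples = [
    '23',       # ordinary base-10 digits
    '\u00b3',   # superscript three
    '\u2155',   # vulgar fraction one fifth
    '',         # empty string: every check is False
]

# Map each sample to (isdecimal, isdigit, isnumeric).
table = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

# decimal implies digit, and digit implies numeric.
nested = all((not dec or dig) and (not dig or num) for dec, dig, num in table.values())
```

In practice `isdecimal` is the strictest choice when a string must later be parsed with `int()`.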
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks for whether all words are in title
case (whether only the first letter of each word is capitalized). Words are
assumed to be any sequence of non-numeric characters separated by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.isalpha.md
# maxframe.dataframe.Series.str.isalpha
#### Series.str.isalpha()
Check whether all characters in each string are alphabetic.
This is equivalent to running the Python string method
[`str.isalpha()`](https://docs.python.org/3/library/stdtypes.html#str.isalpha) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
Unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities, such as Unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks whether all words are in title
case (whether only the first letter of each word is capitalized). Words are
assumed to be any sequence of non-numeric characters separated by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
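Because the method is documented as equivalent to applying Python's built-in `str.isalpha()` to each element, the built-in can be used to predict results locally before running a job. The following is a minimal plain-Python sketch, not MaxFrame code; the sample values are illustrative assumptions chosen to show that non-ASCII letters count as alphabetic.

```python
# Plain-Python sketch: predicts the per-element result of an isalpha check
# using the built-in str.isalpha(). No MaxFrame session is involved, and the
# sample values are illustrative assumptions.
samples = ["one", "naïve", "中文", "a_b", "one1", ""]

# Apply the built-in check element by element.
expected = [value.isalpha() for value in samples]

for value, flag in zip(samples, expected):
    print(f"{value!r:>8} -> {flag}")
```

Letters from any script pass the check, while an underscore, a digit, or an empty string makes it fail.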
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.isdecimal.md
# maxframe.dataframe.Series.str.isdecimal
#### Series.str.isdecimal()
Check whether all characters in each string are decimal.
This is equivalent to running the Python string method
[`str.isdecimal()`](https://docs.python.org/3/library/stdtypes.html#str.isdecimal) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
Unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities, such as Unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks whether all words are in title
case (whether only the first letter of each word is capitalized). Words are
assumed to be any sequence of non-numeric characters separated by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
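The three numeric checks shown above form nested sets: decimal strings are a subset of digit strings, which are a subset of numeric strings. The sketch below illustrates this with Python's built-in string methods, which the `Series.str` methods are documented to mirror. It is plain Python for local experimentation, not MaxFrame code.

```python
# Plain-Python sketch of how the three numeric checks nest, using the
# built-in str methods rather than a MaxFrame Series.
values = ["23", "³", "⅕", ""]

# Map each value to (isdecimal, isdigit, isnumeric).
table = {
    value: (value.isdecimal(), value.isdigit(), value.isnumeric())
    for value in values
}

# Every decimal string is also a digit string, and every digit string is
# also numeric; the reverse does not hold.
for is_decimal, is_digit, is_numeric in table.values():
    assert (not is_decimal) or is_digit
    assert (not is_digit) or is_numeric
```

When only characters usable in a base-10 number should be accepted, `isdecimal` is the strictest of the three checks.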
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.isdigit.md
# maxframe.dataframe.Series.str.isdigit
#### Series.str.isdigit()
Check whether all characters in each string are digits.
This is equivalent to running the Python string method
[`str.isdigit()`](https://docs.python.org/3/library/stdtypes.html#str.isdigit) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
Unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities, such as Unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks whether all words are in title
case (whether only the first letter of each word is capitalized). Words are
assumed to be any sequence of non-numeric characters separated by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
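One practical consequence of the digit check accepting superscripts is that a `True` result does not guarantee the string can be converted with `int()`. The sketch below shows this with Python's built-in `str.isdigit()`, which the method is documented to mirror. It is plain Python rather than MaxFrame code, and `parses_as_int` is a hypothetical helper introduced only for illustration.

```python
# Plain-Python sketch: a True result from isdigit() does not guarantee that
# int() can parse the string. Superscript digits pass isdigit() but are
# rejected by int(), which accepts only decimal characters.
def parses_as_int(text: str) -> bool:
    """Hypothetical helper: report whether int(text) succeeds."""
    try:
        int(text)
    except ValueError:
        return False
    return True


# ASCII digits, SUPERSCRIPT THREE, and ARABIC-INDIC DIGIT THREE.
candidates = ["23", "\u00b3", "\u0663"]

# Map each candidate to (isdigit, isdecimal, int() succeeds).
report = {c: (c.isdigit(), c.isdecimal(), parses_as_int(c)) for c in candidates}
```

If the filtered values are going to be cast to integers afterwards, `isdecimal` is the closer match to what `int()` accepts.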
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.islower.md
# maxframe.dataframe.Series.str.islower
#### Series.str.islower()
Check whether all characters in each string are lowercase.
This is equivalent to running the Python string method
[`str.islower()`](https://docs.python.org/3/library/stdtypes.html#str.islower) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
Unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities, such as Unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks whether all words are in title
case (whether only the first letter of each word is capitalized). Words are
assumed to be any sequence of non-numeric characters separated by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
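Python's `str.islower()`, which this method is documented to mirror, considers only cased characters: digits, spaces, and punctuation are ignored, but at least one cased character must be present. The following plain-Python sketch (not MaxFrame code, with illustrative sample values) shows these edge cases.

```python
# Plain-Python sketch: islower() looks only at cased characters, so digits
# and spaces do not affect the result, but a string with no cased
# characters at all yields False.
samples = ["leopard", "leopard 2", "Leopard", "123", ""]

flags = [text.islower() for text in samples]

for text, flag in zip(samples, flags):
    print(f"{text!r:>12} -> {flag}")
```

This explains why a purely numeric string such as `'123'` is reported as not lowercase even though it contains no uppercase letters.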
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.isnumeric.md
# maxframe.dataframe.Series.str.isnumeric
#### Series.str.isnumeric()
Check whether all characters in each string are numeric.
This is equivalent to running the Python string method
[`str.isnumeric()`](https://docs.python.org/3/library/stdtypes.html#str.isnumeric) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
Unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities, such as Unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks whether all words are in title
case (whether only the first letter of each word is capitalized). Words are
assumed to be any sequence of non-numeric characters separated by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
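A common misreading is to treat the numeric check as a test for "can be parsed as a number". The sketch below contrasts Python's built-in `str.isnumeric()`, which the method is documented to mirror, with an actual parse attempt. It is plain Python rather than MaxFrame code, and `parses_as_float` is a hypothetical helper introduced only for illustration.

```python
# Plain-Python sketch: isnumeric() is a per-character check, not a test of
# whether the whole string can be parsed as a number.
def parses_as_float(text: str) -> bool:
    """Hypothetical helper: report whether float(text) succeeds."""
    try:
        float(text)
    except ValueError:
        return False
    return True


samples = ["3", "-1", "1.5", "3,000", "⅕"]

# Each entry is (isnumeric result, float() succeeds).
comparison = [(text.isnumeric(), parses_as_float(text)) for text in samples]
```

Signs and decimal points make the character check fail even though the string parses, while a Unicode fraction passes the check but does not parse.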
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.isspace.md
# maxframe.dataframe.Series.str.isspace
#### Series.str.isspace()
Check whether all characters in each string are whitespace.
This is equivalent to running the Python string method
[`str.isspace()`](https://docs.python.org/3/library/stdtypes.html#str.isspace) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
Unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities, such as Unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks whether all words are in title
case (whether only the first letter of each word is capitalized). Words are
assumed to be any sequence of non-numeric characters separated by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
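Because an empty string yields `False`, the whitespace check alone does not answer "is this value blank or empty?". The plain-Python sketch below uses the built-in `str.isspace()`, which the method is documented to mirror, and contrasts it with a `strip()`-based test. It is not MaxFrame code, and the sample values (including the no-break space `\u00a0`) are illustrative assumptions.

```python
# Plain-Python sketch: isspace() is False for the empty string, so a
# "blank or empty" test needs a different expression.
samples = [" ", "\t\r\n ", "\u00a0", " a ", ""]

# True only when the string is non-empty and entirely whitespace.
only_whitespace = [text.isspace() for text in samples]

# True when nothing remains after stripping whitespace, which also covers
# the empty string.
blank_or_empty = [text.strip() == "" for text in samples]
```

The two lists differ only for the empty string, which is the case the documented behavior singles out.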
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.istitle.md
# maxframe.dataframe.Series.str.istitle
#### Series.str.istitle()
Check whether all characters in each string are titlecase.
This is equivalent to running the Python string method
[`str.istitle()`](https://docs.python.org/3/library/stdtypes.html#str.istitle) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
Unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities, such as Unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks whether all words are in title
case (whether only the first letter of each word is capitalized). Words are
assumed to be any sequence of non-numeric characters separated by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
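The title-case rule has a few edge cases that are easy to miss. In Python's built-in `str.istitle()`, which this method is documented to mirror, an uppercase letter may only follow an uncased character and a lowercase letter may only follow a cased one, so an apostrophe starts a new "word". The following plain-Python sketch (not MaxFrame code, with illustrative sample values) shows the effect.

```python
# Plain-Python sketch of istitle() edge cases using the built-in method.
samples = [
    "Golden Eagle",   # each word starts with one uppercase letter
    "Golden eagle",   # second word starts with a lowercase letter
    "They're Here",   # lowercase 'r' follows the uncased apostrophe
    "Route 66",       # digits are uncased and do not break title case
    "SNAKE",          # uppercase letters follow a cased letter
    "",               # no cased characters at all
]

flags = [text.istitle() for text in samples]

for text, flag in zip(samples, flags):
    print(f"{text!r:>16} -> {flag}")
```

Strings with contractions or possessives are therefore reported as not title case even when they look correctly capitalized.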
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.isupper.md
# maxframe.dataframe.Series.str.isupper
#### Series.str.isupper()
Check whether all characters in each string are uppercase.
This is equivalent to running the Python string method
[`str.isupper()`](https://docs.python.org/3/library/stdtypes.html#str.isupper) for each element of the Series/Index. If a string
has zero characters, `False` is returned for that check.
* **Returns:**
Series or Index of boolean values with the same length as the original
Series/Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`Series.str.isalpha`](maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)
: Check whether all characters are alphabetic.
[`Series.str.isnumeric`](maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)
: Check whether all characters are numeric.
[`Series.str.isalnum`](maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)
: Check whether all characters are alphanumeric.
[`Series.str.isdigit`](maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)
: Check whether all characters are digits.
[`Series.str.isdecimal`](maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)
: Check whether all characters are decimal.
[`Series.str.isspace`](maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)
: Check whether all characters are whitespace.
[`Series.str.islower`](maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)
: Check whether all characters are lowercase.
[`Series.str.isupper`](#maxframe.dataframe.Series.str.isupper)
: Check whether all characters are uppercase.
[`Series.str.istitle`](maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)
: Check whether all characters are titlecase.
### Examples
**Checks for Alphabetic and Numeric Characters**
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['one', 'one1', '1', ''])
```
```pycon
>>> s1.str.isalpha().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s1.str.isnumeric().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
```pycon
>>> s1.str.isalnum().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
Note that checks against characters mixed with any additional punctuation
or whitespace will evaluate to false for an alphanumeric check.
```pycon
>>> s2 = md.Series(['A B', '1.5', '3,000'])
>>> s2.str.isalnum().execute()
0 False
1 False
2 False
dtype: bool
```
**More Detailed Checks for Numeric Characters**
There are several different but overlapping sets of numeric characters that
can be checked for.
```pycon
>>> s3 = md.Series(['23', '³', '⅕', ''])
```
The `s3.str.isdecimal` method checks for characters used to form numbers
in base 10.
```pycon
>>> s3.str.isdecimal().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
The `s3.str.isdigit` method is the same as `s3.str.isdecimal` but also
includes special digits, like superscripted and subscripted digits in
Unicode.
```pycon
>>> s3.str.isdigit().execute()
0 True
1 True
2 False
3 False
dtype: bool
```
The `s3.str.isnumeric` method is the same as `s3.str.isdigit` but also
includes other characters that can represent quantities, such as Unicode
fractions.
```pycon
>>> s3.str.isnumeric().execute()
0 True
1 True
2 True
3 False
dtype: bool
```
**Checks for Whitespace**
```pycon
>>> s4 = md.Series([' ', '\t\r\n ', ''])
>>> s4.str.isspace().execute()
0 True
1 True
2 False
dtype: bool
```
**Checks for Character Case**
```pycon
>>> s5 = md.Series(['leopard', 'Golden Eagle', 'SNAKE', ''])
```
```pycon
>>> s5.str.islower().execute()
0 True
1 False
2 False
3 False
dtype: bool
```
```pycon
>>> s5.str.isupper().execute()
0 False
1 False
2 True
3 False
dtype: bool
```
The `s5.str.istitle` method checks whether all words are in title
case (that is, whether only the first letter of each word is capitalized).
A word is taken to be any sequence of non-numeric characters delimited by
whitespace characters.
```pycon
>>> s5.str.istitle().execute()
0 False
1 True
2 False
3 False
dtype: bool
```
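As a rough cross-check, the same values can be tested with Python's built-in `str.istitle`. This is a sketch only: it assumes the per-element behaviour matches the built-in method, and the extra value `'Golden eagle'` is added here for illustration.

```python
samples = ['leopard', 'Golden Eagle', 'SNAKE', '', 'Golden eagle']

# One boolean per input string, in the same order as `samples`.
title_flags = [text.istitle() for text in samples]
print(title_flags)
```

Only `'Golden Eagle'` passes. A single word that does not start with a capital (`'Golden eagle'`) is enough to fail the check, and so is an empty string.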
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.len.md
# maxframe.dataframe.Series.str.len
#### Series.str.len()
Compute the length of each element in the Series/Index.
The element may be a sequence (such as a string, tuple or list) or a collection
(such as a dictionary).
* **Returns:**
A Series or Index of integer values indicating the length of each
element in the Series or Index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
`len`
: Python built-in function returning the length of an object.
`Series.size`
: Returns the length of the Series.
### Examples
Returns the length (number of characters) in a string. Returns the
number of entries for dictionaries, lists or tuples.
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['dog',
... '',
... 5,
... {'foo' : 'bar'},
... [2, 3, 5, 7],
... ('one', 'two', 'three')])
>>> s.execute()
0 dog
1
2 5
3 {'foo': 'bar'}
4 [2, 3, 5, 7]
5 (one, two, three)
dtype: object
>>> s.str.len().execute()
0 3.0
1 0.0
2 NaN
3 1.0
4 4.0
5 3.0
dtype: float64
```
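The result above is `float64` rather than an integer dtype because the element `5` has no length and is reported as `NaN`. The following plain-Python sketch mimics that per-element rule; `element_len` is a hypothetical helper, not part of MaxFrame.

```python
import math


def element_len(value):
    """Length of one element as a float, or NaN if it has no length."""
    try:
        return float(len(value))
    except TypeError:
        # Objects such as int do not support len().
        return math.nan


values = ['dog', '', 5, {'foo': 'bar'}, [2, 3, 5, 7], ('one', 'two', 'three')]
lengths = [element_len(v) for v in values]
print(lengths)
```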
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.ljust.md
# maxframe.dataframe.Series.str.ljust
#### Series.str.ljust(width: [int](https://docs.python.org/3/library/functions.html#int), fillchar: [str](https://docs.python.org/3/library/stdtypes.html#str) = ' ')
Pad right side of strings in the Series/Index.
Equivalent to [`str.ljust()`](https://docs.python.org/3/library/stdtypes.html#str.ljust).
* **Parameters:**
* **width** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Minimum width of resulting string; additional characters will be filled
with `fillchar`.
* **fillchar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Additional character for filling, default is whitespace.
* **Return type:**
Series/Index of objects.
### Examples
For Series.str.center:
```pycon
>>> import maxframe.dataframe as md
>>> ser = md.Series(['dog', 'bird', 'mouse'])
>>> ser.str.center(8, fillchar='.').execute()
0 ..dog...
1 ..bird..
2 .mouse..
dtype: object
```
For Series.str.ljust:
```pycon
>>> ser = md.Series(['dog', 'bird', 'mouse'])
>>> ser.str.ljust(8, fillchar='.').execute()
0 dog.....
1 bird....
2 mouse...
dtype: object
```
For Series.str.rjust:
```pycon
>>> ser = md.Series(['dog', 'bird', 'mouse'])
>>> ser.str.rjust(8, fillchar='.').execute()
0 .....dog
1 ....bird
2 ...mouse
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.lower.md
# maxframe.dataframe.Series.str.lower
#### Series.str.lower()
Convert strings in the Series/Index to lowercase.
Equivalent to [`str.lower()`](https://docs.python.org/3/library/stdtypes.html#str.lower).
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.lower`](#maxframe.dataframe.Series.str.lower)
: Converts all characters to lowercase.
[`Series.str.upper`](maxframe.dataframe.Series.str.upper.md#maxframe.dataframe.Series.str.upper)
: Converts all characters to uppercase.
[`Series.str.title`](maxframe.dataframe.Series.str.title.md#maxframe.dataframe.Series.str.title)
: Converts first character of each word to uppercase and remaining to lowercase.
[`Series.str.capitalize`](maxframe.dataframe.Series.str.capitalize.md#maxframe.dataframe.Series.str.capitalize)
: Converts first character to uppercase and remaining to lowercase.
[`Series.str.swapcase`](maxframe.dataframe.Series.str.swapcase.md#maxframe.dataframe.Series.str.swapcase)
: Converts uppercase to lowercase and lowercase to uppercase.
`Series.str.casefold`
: Removes all case distinctions in the string.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s.execute()
0 lower
1 CAPITALS
2 this is a sentence
3 SwApCaSe
dtype: object
```
```pycon
>>> s.str.lower().execute()
0 lower
1 capitals
2 this is a sentence
3 swapcase
dtype: object
```
```pycon
>>> s.str.upper().execute()
0 LOWER
1 CAPITALS
2 THIS IS A SENTENCE
3 SWAPCASE
dtype: object
```
```pycon
>>> s.str.title().execute()
0 Lower
1 Capitals
2 This Is A Sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.capitalize().execute()
0 Lower
1 Capitals
2 This is a sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.swapcase().execute()
0 LOWER
1 capitals
2 THIS IS A SENTENCE
3 sWaPcAsE
dtype: object
```
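The difference between lowercasing and case folding, mentioned under SEE ALSO, only shows up for certain non-ASCII characters. The sketch below uses Python's built-in string methods rather than MaxFrame, and the sample word is chosen here for illustration.

```python
word = 'Straße'

lowered = word.lower()     # keeps the sharp s
folded = word.casefold()   # expands the sharp s to "ss"

print(lowered, folded)
```

Case folding is the more aggressive of the two and is intended for caseless comparison rather than for display.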
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.lstrip.md
# maxframe.dataframe.Series.str.lstrip
#### Series.str.lstrip(to_strip=None)
Remove leading characters.
Strip whitespaces (including newlines) or a set of specified characters
from each string in the Series/Index from left side.
Replaces any non-strings in Series with NaNs.
Equivalent to [`str.lstrip()`](https://docs.python.org/3/library/stdtypes.html#str.lstrip).
* **Parameters:**
**to_strip** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *None* *,* *default None*) – Specifying the set of characters to be removed.
All combinations of this set of characters will be stripped.
If None then whitespaces are removed.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.strip`](maxframe.dataframe.Series.str.strip.md#maxframe.dataframe.Series.str.strip)
: Remove leading and trailing characters in Series/Index.
[`Series.str.lstrip`](#maxframe.dataframe.Series.str.lstrip)
: Remove leading characters in Series/Index.
[`Series.str.rstrip`](maxframe.dataframe.Series.str.rstrip.md#maxframe.dataframe.Series.str.rstrip)
: Remove trailing characters in Series/Index.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', mt.nan, 10, True])
>>> s.execute()
0 1. Ant.
1 2. Bee!\n
2 3. Cat?\t
3 NaN
4 10
5 True
dtype: object
```
```pycon
>>> s.str.strip().execute()
0 1. Ant.
1 2. Bee!
2 3. Cat?
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.lstrip('123.').execute()
0 Ant.
1 Bee!\n
2 Cat?\t
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.rstrip('.!? \n\t').execute()
0 1. Ant
1 2. Bee
2 3. Cat
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.strip('123.!? \n\t').execute()
0 Ant
1 Bee
2 Cat
3 NaN
4 NaN
5 NaN
dtype: object
```
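`to_strip` is a set of characters, not a prefix: stripping stops at the first character outside the set. The plain-Python sketch below illustrates this together with the non-string rule; `lstrip_element` is a hypothetical helper, and `None` stands in for the `NaN` that appears in the Series output.

```python
def lstrip_element(value, to_strip=None):
    """Strip leading characters from a string; non-strings become None."""
    if not isinstance(value, str):
        return None
    return value.lstrip(to_strip)


values = ['1. Ant. ', '2. Bee!\n', '3. Cat?\t', float('nan'), 10, True]
stripped = [lstrip_element(v, '123.') for v in values]
print(stripped)
```

The space after `1.` is not in `'123.'`, so each stripped string still begins with a space.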
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.md
# maxframe.dataframe.Series.str
#### Series.str()
Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method.
Patterned after Python’s string methods, with some inspiration from
R’s stringr package.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["A_Str_Series"])
>>> s.execute()
0 A_Str_Series
dtype: object
>>> s.str.split("_").execute()
0 [A, Str, Series]
dtype: object
>>> s.str.replace("_", "").execute()
0 AStrSeries
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.pad.md
# maxframe.dataframe.Series.str.pad
#### Series.str.pad(width: [int](https://docs.python.org/3/library/functions.html#int), side: [Literal](https://docs.python.org/3/library/typing.html#typing.Literal)['left', 'right', 'both'] = 'left', fillchar: [str](https://docs.python.org/3/library/stdtypes.html#str) = ' ')
Pad strings in the Series/Index up to width.
* **Parameters:**
* **width** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Minimum width of resulting string; additional characters will be filled
with character defined in fillchar.
* **side** ( *{'left'* *,* *'right'* *,* *'both'}* *,* *default 'left'*) – Side from which to fill resulting string.
* **fillchar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default ' '*) – Additional character for filling, default is whitespace.
* **Returns:**
Series or Index in which each string is padded to at least `width` characters.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.rjust`](maxframe.dataframe.Series.str.rjust.md#maxframe.dataframe.Series.str.rjust)
: Fills the left side of strings with an arbitrary character. Equivalent to `Series.str.pad(side='left')`.
[`Series.str.ljust`](maxframe.dataframe.Series.str.ljust.md#maxframe.dataframe.Series.str.ljust)
: Fills the right side of strings with an arbitrary character. Equivalent to `Series.str.pad(side='right')`.
`Series.str.center`
: Fills both sides of strings with an arbitrary character. Equivalent to `Series.str.pad(side='both')`.
[`Series.str.zfill`](maxframe.dataframe.Series.str.zfill.md#maxframe.dataframe.Series.str.zfill)
: Pad strings in the Series/Index by prepending ‘0’ character. Equivalent to `Series.str.pad(side='left', fillchar='0')`.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["caribou", "tiger"])
>>> s.execute()
0 caribou
1 tiger
dtype: object
```
```pycon
>>> s.str.pad(width=10).execute()
0 caribou
1 tiger
dtype: object
```
```pycon
>>> s.str.pad(width=10, side='right', fillchar='-').execute()
0 caribou---
1 tiger-----
dtype: object
```
```pycon
>>> s.str.pad(width=10, side='both', fillchar='-').execute()
0 -caribou--
1 --tiger---
dtype: object
```
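The equivalences listed under SEE ALSO can be written out for a single string with Python's built-in methods. This is an illustrative sketch, not MaxFrame's implementation; `pad_element` is a hypothetical name.

```python
def pad_element(text, width, side='left', fillchar=' '):
    """Pad one string to at least `width` characters on the given side."""
    if side == 'left':
        return text.rjust(width, fillchar)   # fill on the left
    if side == 'right':
        return text.ljust(width, fillchar)   # fill on the right
    if side == 'both':
        return text.center(width, fillchar)  # fill on both sides
    raise ValueError(f"invalid side: {side!r}")


left = pad_element('caribou', 10)
right = pad_element('tiger', 10, side='right', fillchar='-')
both = pad_element('caribou', 10, side='both', fillchar='-')
print(repr(left), repr(right), repr(both))
```

Note the naming: padding on the *left* right-justifies the text, which is why `side='left'` maps to `rjust`.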
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.repeat.md
# maxframe.dataframe.Series.str.repeat
#### Series.str.repeat(repeats)
Duplicate each string in the Series or Index.
* **Parameters:**
**repeats** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* [*int*](https://docs.python.org/3/library/functions.html#int)) – Same value for all (int) or different value per (sequence).
* **Returns:**
Series or Index of repeated string objects specified by
input parameter repeats.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [pandas.Index](https://pandas.pydata.org/docs/reference/api/pandas.Index.html#pandas.Index)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['a', 'b', 'c'])
>>> s.execute()
0 a
1 b
2 c
dtype: object
```
A single int repeats every string in the Series the same number of times:
```pycon
>>> s.str.repeat(repeats=2).execute()
0 aa
1 bb
2 cc
dtype: object
```
A sequence of ints repeats each string by the corresponding count:
```pycon
>>> s.str.repeat(repeats=[1, 2, 3]).execute()
0 a
1 bb
2 ccc
dtype: object
```
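The two forms of `repeats` can be sketched in plain Python as follows. `repeat_elements` is a hypothetical helper, and the length check is an assumption about how mismatched inputs should be treated.

```python
def repeat_elements(values, repeats):
    """Repeat each string by one shared count or by a per-element count."""
    if isinstance(repeats, int):
        repeats = [repeats] * len(values)
    if len(repeats) != len(values):
        raise ValueError("repeats must have one entry per element")
    return [text * count for text, count in zip(values, repeats)]


same = repeat_elements(['a', 'b', 'c'], 2)
varied = repeat_elements(['a', 'b', 'c'], [1, 2, 3])
print(same, varied)
```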
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.replace.md
# maxframe.dataframe.Series.str.replace
#### Series.str.replace(pat: [str](https://docs.python.org/3/library/stdtypes.html#str) | [Pattern](https://docs.python.org/3/library/re.html#re.Pattern), repl: [str](https://docs.python.org/3/library/stdtypes.html#str) | [Callable](https://docs.python.org/3/library/typing.html#typing.Callable), n: [int](https://docs.python.org/3/library/functions.html#int) = -1, case: [bool](https://docs.python.org/3/library/functions.html#bool) | [None](https://docs.python.org/3/library/constants.html#None) = None, flags: [int](https://docs.python.org/3/library/functions.html#int) = 0, regex: [bool](https://docs.python.org/3/library/functions.html#bool) = False)
Replace each occurrence of pattern/regex in the Series/Index.
Equivalent to [`str.replace()`](https://docs.python.org/3/library/stdtypes.html#str.replace) or [`re.sub()`](https://docs.python.org/3/library/re.html#re.sub), depending on
the regex value.
* **Parameters:**
* **pat** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *compiled regex*) – String can be a character sequence or regular expression.
* **repl** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *callable*) – Replacement string or a callable. The callable is passed the regex
match object and must return a replacement string to be used.
See [`re.sub()`](https://docs.python.org/3/library/re.html#re.sub).
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default -1 (all)*) – Number of replacements to make from start.
* **case** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default None*) –
Determines if replace is case sensitive:
- If True, case sensitive (the default if pat is a string)
- Set to False for case insensitive
- Cannot be set if pat is a compiled regex.
* **flags** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0 (no flags)*) – Regex module flags, e.g. re.IGNORECASE. Cannot be set if pat is a compiled
regex.
* **regex** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) –
Determines if the passed-in pattern is a regular expression:
- If True, assumes the passed-in pattern is a regular expression.
- If False, treats the pattern as a literal string
- Cannot be set to False if pat is a compiled regex or repl is
a callable.
* **Returns:**
A copy of the object with all matching occurrences of pat replaced by
repl.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) –
* if regex is False and repl is a callable or pat is a compiled
regex
* if pat is a compiled regex and case or flags is set
### Notes
When pat is a compiled regex, all flags should be included in the
compiled regex. Use of case, flags, or regex=False with a compiled
regex will raise an error.
### Examples
When pat is a string and regex is True, the given pat
is compiled as a regex. When repl is a string, it replaces matching
regex patterns as with `re.sub()`. NaN value(s) in the Series are
left as is:
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> md.Series(['foo', 'fuz', mt.nan]).str.replace('f.', 'ba', regex=True).execute()
0 bao
1 baz
2 NaN
dtype: object
```
When pat is a string and regex is False, every pat is replaced with
repl as with [`str.replace()`](https://docs.python.org/3/library/stdtypes.html#str.replace):
```pycon
>>> md.Series(['f.o', 'fuz', mt.nan]).str.replace('f.', 'ba', regex=False).execute()
0 bao
1 fuz
2 NaN
dtype: object
```
When repl is a callable, it is called on every match of pat using
[`re.sub()`](https://docs.python.org/3/library/re.html#re.sub). The callable should expect one positional argument
(a regex match object) and return a string.
To get the idea:
```pycon
>>> md.Series(['foo', 'fuz', mt.nan]).str.replace('f', repr, regex=True).execute()
0 <re.Match object; span=(0, 1), match='f'>oo
1 <re.Match object; span=(0, 1), match='f'>uz
2 NaN
dtype: object
```
Reverse every lowercase alphabetic word:
```pycon
>>> repl = lambda m: m.group(0)[::-1]
>>> ser = md.Series(['foo 123', 'bar baz', mt.nan])
>>> ser.str.replace(r'[a-z]+', repl, regex=True).execute()
0 oof 123
1 rab zab
2 NaN
dtype: object
```
Using regex groups (extract second group and swap case):
```pycon
>>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
>>> repl = lambda m: m.group('two').swapcase()
>>> ser = md.Series(['One Two Three', 'Foo Bar Baz'])
>>> ser.str.replace(pat, repl, regex=True).execute()
0 tWO
1 bAR
dtype: object
```
Using a compiled regex with flags:
```pycon
>>> import re
>>> regex_pat = re.compile(r'FUZ', flags=re.IGNORECASE)
>>> md.Series(['foo', 'fuz', mt.nan]).str.replace(regex_pat, 'bar', regex=True).execute()
0 foo
1 bar
2 NaN
dtype: object
```
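The split between literal and regex replacement can be sketched for a single string with the standard library. `replace_element` is a hypothetical helper that covers only `pat`, `repl`, `n` and `regex`; it ignores `case` and `flags`, and treating `n <= 0` as "replace all" is an assumption of this sketch.

```python
import re


def replace_element(text, pat, repl, n=-1, regex=False):
    """Replace occurrences of `pat` in one string, literally or as a regex."""
    if regex:
        # re.sub uses count=0 to mean "no limit".
        return re.sub(pat, repl, text, count=n if n > 0 else 0)
    # str.replace uses -1 to mean "no limit".
    return text.replace(pat, repl, n if n > 0 else -1)


as_regex = replace_element('foo', 'f.', 'ba', regex=True)
as_literal = replace_element('fuz', 'f.', 'ba')
reversed_words = replace_element(
    'foo 123', r'[a-z]+', lambda m: m.group(0)[::-1], regex=True
)
print(as_regex, as_literal, reversed_words)
```

With `regex=False` the dot in `'f.'` is an ordinary character, so `'fuz'` is left unchanged.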
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.rfind.md
# maxframe.dataframe.Series.str.rfind
#### Series.str.rfind(sub, start: [int](https://docs.python.org/3/library/functions.html#int) = 0, end=None)
Return the highest index at which the substring occurs in each string in the Series/Index.
Each returned index is the position of the last occurrence of the
substring that lies fully within [start:end]. Return -1 on
failure. Equivalent to standard [`str.rfind()`](https://docs.python.org/3/library/stdtypes.html#str.rfind).
* **Parameters:**
* **sub** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Substring being searched.
* **start** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Left edge index.
* **end** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Right edge index.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of int.
#### SEE ALSO
[`find`](maxframe.dataframe.Series.str.find.md#maxframe.dataframe.Series.str.find)
: Return the lowest index of the substring in each string.
### Examples
For Series.str.find:
```pycon
>>> import maxframe.dataframe as md
>>> ser = md.Series(["cow_", "duck_", "do_ve"])
>>> ser.str.find("_").execute()
0 3
1 4
2 2
dtype: int64
```
For Series.str.rfind:
```pycon
>>> ser = md.Series(["_cow_", "duck_", "do_v_e"])
>>> ser.str.rfind("_").execute()
0 4
1 4
2 4
dtype: int64
```
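The effect of `start` and `end`, and the `-1` failure value, are easiest to see on plain Python strings. This sketch uses the built-in `str.find` and `str.rfind` that the accessor is documented as equivalent to; the bounds `0` and `3` are chosen here for illustration.

```python
words = ["_cow_", "duck_", "do_v_e"]

lowest = [w.find("_") for w in words]          # first occurrence
highest = [w.rfind("_") for w in words]        # last occurrence
bounded = [w.rfind("_", 0, 3) for w in words]  # last occurrence within [0:3]

print(lowest, highest, bounded)
```

`"duck_"` has no underscore within its first three characters, so the bounded search reports `-1` for it.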
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.rjust.md
# maxframe.dataframe.Series.str.rjust
#### Series.str.rjust(width: [int](https://docs.python.org/3/library/functions.html#int), fillchar: [str](https://docs.python.org/3/library/stdtypes.html#str) = ' ')
Pad left side of strings in the Series/Index.
Equivalent to [`str.rjust()`](https://docs.python.org/3/library/stdtypes.html#str.rjust).
* **Parameters:**
* **width** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Minimum width of resulting string; additional characters will be filled
with `fillchar`.
* **fillchar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Additional character for filling, default is whitespace.
* **Return type:**
Series/Index of objects.
### Examples
For Series.str.center:
```pycon
>>> import maxframe.dataframe as md
>>> ser = md.Series(['dog', 'bird', 'mouse'])
>>> ser.str.center(8, fillchar='.').execute()
0 ..dog...
1 ..bird..
2 .mouse..
dtype: object
```
For Series.str.ljust:
```pycon
>>> ser = md.Series(['dog', 'bird', 'mouse'])
>>> ser.str.ljust(8, fillchar='.').execute()
0 dog.....
1 bird....
2 mouse...
dtype: object
```
For Series.str.rjust:
```pycon
>>> ser = md.Series(['dog', 'bird', 'mouse'])
>>> ser.str.rjust(8, fillchar='.').execute()
0 .....dog
1 ....bird
2 ...mouse
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.rstrip.md
# maxframe.dataframe.Series.str.rstrip
#### Series.str.rstrip(to_strip=None)
Remove trailing characters.
Strip whitespaces (including newlines) or a set of specified characters
from each string in the Series/Index from right side.
Replaces any non-strings in Series with NaNs.
Equivalent to [`str.rstrip()`](https://docs.python.org/3/library/stdtypes.html#str.rstrip).
* **Parameters:**
**to_strip** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *None* *,* *default None*) – Specifying the set of characters to be removed.
All combinations of this set of characters will be stripped.
If None then whitespaces are removed.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.strip`](maxframe.dataframe.Series.str.strip.md#maxframe.dataframe.Series.str.strip)
: Remove leading and trailing characters in Series/Index.
[`Series.str.lstrip`](maxframe.dataframe.Series.str.lstrip.md#maxframe.dataframe.Series.str.lstrip)
: Remove leading characters in Series/Index.
[`Series.str.rstrip`](#maxframe.dataframe.Series.str.rstrip)
: Remove trailing characters in Series/Index.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', mt.nan, 10, True])
>>> s.execute()
0 1. Ant.
1 2. Bee!\n
2 3. Cat?\t
3 NaN
4 10
5 True
dtype: object
```
```pycon
>>> s.str.strip().execute()
0 1. Ant.
1 2. Bee!
2 3. Cat?
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.lstrip('123.').execute()
0 Ant.
1 Bee!\n
2 Cat?\t
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.rstrip('.!? \n\t').execute()
0 1. Ant
1 2. Bee
2 3. Cat
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.strip('123.!? \n\t').execute()
0 Ant
1 Bee
2 Cat
3 NaN
4 NaN
5 NaN
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.slice.md
# maxframe.dataframe.Series.str.slice
#### Series.str.slice(start=None, stop=None, step=None)
Slice substrings from each element in the Series or Index.
* **Parameters:**
* **start** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Start position for slice operation.
* **stop** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Stop position for slice operation.
* **step** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Step size for slice operation.
* **Returns:**
Series or Index from sliced substring from original string object.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
`Series.str.slice_replace`
: Replace a slice with a string.
`Series.str.get`
: Return element at position. Equivalent to Series.str.slice(start=i, stop=i+1) with i being the position.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["koala", "dog", "chameleon"])
>>> s.execute()
0 koala
1 dog
2 chameleon
dtype: object
```
```pycon
>>> s.str.slice(start=1).execute()
0 oala
1 og
2 hameleon
dtype: object
```
```pycon
>>> s.str.slice(start=-1).execute()
0 a
1 g
2 n
dtype: object
```
```pycon
>>> s.str.slice(stop=2).execute()
0 ko
1 do
2 ch
dtype: object
```
```pycon
>>> s.str.slice(step=2).execute()
0 kaa
1 dg
2 caeen
dtype: object
```
```pycon
>>> s.str.slice(start=0, stop=5, step=3).execute()
0 kl
1 d
2 cm
dtype: object
```
Equivalent behaviour to:
```pycon
>>> s.str[0:5:3].execute()
0 kl
1 d
2 cm
dtype: object
```
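Per element, the three parameters behave like an ordinary Python slice. The sketch below shows this on plain strings; `slice_element` is a hypothetical helper rather than part of MaxFrame.

```python
def slice_element(text, start=None, stop=None, step=None):
    """Apply text[start:stop:step] to one string."""
    return text[slice(start, stop, step)]


words = ["koala", "dog", "chameleon"]
every_third = [slice_element(w, start=0, stop=5, step=3) for w in words]
last_char = [slice_element(w, start=-1) for w in words]
print(every_third, last_char)
```

Strings shorter than `stop` are simply cut where they end, which is why `"dog"` yields `"d"` without raising an error.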
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.startswith.md
# maxframe.dataframe.Series.str.startswith
#### Series.str.startswith(pat: [str](https://docs.python.org/3/library/stdtypes.html#str) | [tuple](https://docs.python.org/3/library/stdtypes.html#tuple)[[str](https://docs.python.org/3/library/stdtypes.html#str), ...], na: Scalar | [None](https://docs.python.org/3/library/constants.html#None) = None) → [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) | [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
Test if the start of each string element matches a pattern.
Equivalent to [`str.startswith()`](https://docs.python.org/3/library/stdtypes.html#str.startswith).
* **Parameters:**
* **pat** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *...* *]*) – Character sequence or tuple of strings. Regular expressions are not
accepted.
* **na** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *default NaN*) – Object shown if element tested is not a string. The default depends
on dtype of the array. For object-dtype, `numpy.nan` is used.
For `StringDtype`, `pandas.NA` is used.
* **Returns:**
A Series of booleans indicating whether the given pattern matches
the start of each string element.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`str.startswith`](https://docs.python.org/3/library/stdtypes.html#str.startswith)
: Python standard library string method.
[`Series.str.endswith`](maxframe.dataframe.Series.str.endswith.md#maxframe.dataframe.Series.str.endswith)
: Same as startswith, but tests the end of string.
[`Series.str.contains`](maxframe.dataframe.Series.str.contains.md#maxframe.dataframe.Series.str.contains)
: Tests if string element contains a pattern.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(['bat', 'Bear', 'cat', mt.nan])
>>> s.execute()
0 bat
1 Bear
2 cat
3 NaN
dtype: object
```
```pycon
>>> s.str.startswith('b').execute()
0 True
1 False
2 False
3 NaN
dtype: object
```
```pycon
>>> s.str.startswith(('b', 'B')).execute()
0 True
1 True
2 False
3 NaN
dtype: object
```
Specifying na to be False instead of NaN.
```pycon
>>> s.str.startswith('b', na=False).execute()
0 True
1 False
2 False
3 False
dtype: bool
```
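The change of dtype from `object` to `bool` in the last example follows from what `na` substitutes for non-string elements. The plain-Python sketch below shows the per-element rule; `startswith_element` is a hypothetical helper, and `None` stands in for the default missing value.

```python
def startswith_element(value, pat, na=None):
    """Test the prefix of one element; non-strings yield `na`."""
    if not isinstance(value, str):
        return na
    return value.startswith(pat)  # pat may be a str or a tuple of str


values = ['bat', 'Bear', 'cat', float('nan')]
with_default = [startswith_element(v, ('b', 'B')) for v in values]
with_false = [startswith_element(v, 'b', na=False) for v in values]
print(with_default, with_false)
```

With `na=False` every result is a genuine boolean, so the output can be used directly as a filter mask.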
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.strip.md
# maxframe.dataframe.Series.str.strip
#### Series.str.strip(to_strip=None)
Remove leading and trailing characters.
Strip whitespaces (including newlines) or a set of specified characters
from each string in the Series/Index from left and right sides.
Replaces any non-strings in Series with NaNs.
Equivalent to [`str.strip()`](https://docs.python.org/3/library/stdtypes.html#str.strip).
* **Parameters:**
**to_strip** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *None* *,* *default None*) – Specifying the set of characters to be removed.
All combinations of this set of characters will be stripped.
If None then whitespaces are removed.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.strip`](#maxframe.dataframe.Series.str.strip)
: Remove leading and trailing characters in Series/Index.
[`Series.str.lstrip`](maxframe.dataframe.Series.str.lstrip.md#maxframe.dataframe.Series.str.lstrip)
: Remove leading characters in Series/Index.
[`Series.str.rstrip`](maxframe.dataframe.Series.str.rstrip.md#maxframe.dataframe.Series.str.rstrip)
: Remove trailing characters in Series/Index.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', mt.nan, 10, True])
>>> s.execute()
0 1. Ant.
1 2. Bee!\n
2 3. Cat?\t
3 NaN
4 10
5 True
dtype: object
```
```pycon
>>> s.str.strip().execute()
0 1. Ant.
1 2. Bee!
2 3. Cat?
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.lstrip('123.').execute()
0 Ant.
1 Bee!\n
2 Cat?\t
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.rstrip('.!? \n\t').execute()
0 1. Ant
1 2. Bee
2 3. Cat
3 NaN
4 NaN
5 NaN
dtype: object
```
```pycon
>>> s.str.strip('123.!? \n\t').execute()
0 Ant
1 Bee
2 Cat
3 NaN
4 NaN
5 NaN
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.swapcase.md
# maxframe.dataframe.Series.str.swapcase
#### Series.str.swapcase()
Swap the case of each string in the Series/Index.
Equivalent to [`str.swapcase()`](https://docs.python.org/3/library/stdtypes.html#str.swapcase).
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.lower`](maxframe.dataframe.Series.str.lower.md#maxframe.dataframe.Series.str.lower)
: Converts all characters to lowercase.
[`Series.str.upper`](maxframe.dataframe.Series.str.upper.md#maxframe.dataframe.Series.str.upper)
: Converts all characters to uppercase.
[`Series.str.title`](maxframe.dataframe.Series.str.title.md#maxframe.dataframe.Series.str.title)
: Converts first character of each word to uppercase and remaining to lowercase.
[`Series.str.capitalize`](maxframe.dataframe.Series.str.capitalize.md#maxframe.dataframe.Series.str.capitalize)
: Converts first character to uppercase and remaining to lowercase.
[`Series.str.swapcase`](#maxframe.dataframe.Series.str.swapcase)
: Converts uppercase to lowercase and lowercase to uppercase.
`Series.str.casefold`
: Removes all case distinctions in the string.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s.execute()
0 lower
1 CAPITALS
2 this is a sentence
3 SwApCaSe
dtype: object
```
```pycon
>>> s.str.lower().execute()
0 lower
1 capitals
2 this is a sentence
3 swapcase
dtype: object
```
```pycon
>>> s.str.upper().execute()
0 LOWER
1 CAPITALS
2 THIS IS A SENTENCE
3 SWAPCASE
dtype: object
```
```pycon
>>> s.str.title().execute()
0 Lower
1 Capitals
2 This Is A Sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.capitalize().execute()
0 Lower
1 Capitals
2 This is a sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.swapcase().execute()
0 LOWER
1 capitals
2 THIS IS A SENTENCE
3 sWaPcAsE
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.title.md
# maxframe.dataframe.Series.str.title
#### Series.str.title()
Convert strings in the Series/Index to titlecase.
Equivalent to [`str.title()`](https://docs.python.org/3/library/stdtypes.html#str.title).
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.lower`](maxframe.dataframe.Series.str.lower.md#maxframe.dataframe.Series.str.lower)
: Converts all characters to lowercase.
[`Series.str.upper`](maxframe.dataframe.Series.str.upper.md#maxframe.dataframe.Series.str.upper)
: Converts all characters to uppercase.
[`Series.str.title`](#maxframe.dataframe.Series.str.title)
: Converts first character of each word to uppercase and remaining to lowercase.
[`Series.str.capitalize`](maxframe.dataframe.Series.str.capitalize.md#maxframe.dataframe.Series.str.capitalize)
: Converts first character to uppercase and remaining to lowercase.
[`Series.str.swapcase`](maxframe.dataframe.Series.str.swapcase.md#maxframe.dataframe.Series.str.swapcase)
: Converts uppercase to lowercase and lowercase to uppercase.
`Series.str.casefold`
: Removes all case distinctions in the string.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s.execute()
0 lower
1 CAPITALS
2 this is a sentence
3 SwApCaSe
dtype: object
```
```pycon
>>> s.str.lower().execute()
0 lower
1 capitals
2 this is a sentence
3 swapcase
dtype: object
```
```pycon
>>> s.str.upper().execute()
0 LOWER
1 CAPITALS
2 THIS IS A SENTENCE
3 SWAPCASE
dtype: object
```
```pycon
>>> s.str.title().execute()
0 Lower
1 Capitals
2 This Is A Sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.capitalize().execute()
0 Lower
1 Capitals
2 This is a sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.swapcase().execute()
0 LOWER
1 capitals
2 THIS IS A SENTENCE
3 sWaPcAsE
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.translate.md
# maxframe.dataframe.Series.str.translate
#### Series.str.translate(table)
Map all characters in the string through the given mapping table.
Equivalent to standard [`str.translate()`](https://docs.python.org/3/library/stdtypes.html#str.translate).
* **Parameters:**
**table** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Table is a mapping of Unicode ordinals to Unicode ordinals, strings, or
None. Unmapped characters are left untouched.
Characters mapped to None are deleted. [`str.maketrans()`](https://docs.python.org/3/library/stdtypes.html#str.maketrans) is a
helper function for making translation tables.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> ser = md.Series(["El niño", "Françoise"])
>>> mytable = str.maketrans({'ñ': 'n', 'ç': 'c'})
>>> ser.str.translate(mytable).execute()
0 El nino
1 Francoise
dtype: object
```
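The table semantics described above (unmapped characters kept, characters mapped to None deleted) can be checked with the built-in `str.translate()`, which this method is documented as equivalent to. The sketch below is plain Python applied element by element and does not call MaxFrame; the sample strings and the extra `'!'` mapping are illustrative assumptions.

```python
# Plain-Python sketch of the per-element behaviour; no MaxFrame session is used.
# str.maketrans turns single-character keys into the ordinal-keyed table
# that str.translate expects.
table = str.maketrans({"ñ": "n", "ç": "c", "!": None})  # None deletes the character

samples = ["El niño!", "Françoise"]
translated = [text.translate(table) for text in samples]
print(translated)  # ['El nino', 'Francoise']
```

Characters that do not appear in the table, such as the space in the first string, pass through unchanged.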
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.upper.md
# maxframe.dataframe.Series.str.upper
#### Series.str.upper()
Convert strings in the Series/Index to uppercase.
Equivalent to [`str.upper()`](https://docs.python.org/3/library/stdtypes.html#str.upper).
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index) of [object](https://docs.python.org/3/library/functions.html#object)
#### SEE ALSO
[`Series.str.lower`](maxframe.dataframe.Series.str.lower.md#maxframe.dataframe.Series.str.lower)
: Converts all characters to lowercase.
[`Series.str.upper`](#maxframe.dataframe.Series.str.upper)
: Converts all characters to uppercase.
[`Series.str.title`](maxframe.dataframe.Series.str.title.md#maxframe.dataframe.Series.str.title)
: Converts first character of each word to uppercase and remaining to lowercase.
[`Series.str.capitalize`](maxframe.dataframe.Series.str.capitalize.md#maxframe.dataframe.Series.str.capitalize)
: Converts first character to uppercase and remaining to lowercase.
[`Series.str.swapcase`](maxframe.dataframe.Series.str.swapcase.md#maxframe.dataframe.Series.str.swapcase)
: Converts uppercase to lowercase and lowercase to uppercase.
`Series.str.casefold`
: Removes all case distinctions in the string.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s.execute()
0 lower
1 CAPITALS
2 this is a sentence
3 SwApCaSe
dtype: object
```
```pycon
>>> s.str.lower().execute()
0 lower
1 capitals
2 this is a sentence
3 swapcase
dtype: object
```
```pycon
>>> s.str.upper().execute()
0 LOWER
1 CAPITALS
2 THIS IS A SENTENCE
3 SWAPCASE
dtype: object
```
```pycon
>>> s.str.title().execute()
0 Lower
1 Capitals
2 This Is A Sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.capitalize().execute()
0 Lower
1 Capitals
2 This is a sentence
3 Swapcase
dtype: object
```
```pycon
>>> s.str.swapcase().execute()
0 LOWER
1 capitals
2 THIS IS A SENTENCE
3 sWaPcAsE
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.str.zfill.md
# maxframe.dataframe.Series.str.zfill
#### Series.str.zfill(width: [int](https://docs.python.org/3/library/functions.html#int))
Pad strings in the Series/Index by prepending ‘0’ characters.
Strings in the Series/Index are padded with ‘0’ characters on the
left of the string to reach a total string length width. Strings
in the Series/Index with length greater or equal to width are
unchanged.
* **Parameters:**
**width** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Minimum length of resulting string; strings with length less
than width will be prepended with ‘0’ characters.
* **Return type:**
Series/Index of objects.
#### SEE ALSO
[`Series.str.rjust`](maxframe.dataframe.Series.str.rjust.md#maxframe.dataframe.Series.str.rjust)
: Fills the left side of strings with an arbitrary character.
[`Series.str.ljust`](maxframe.dataframe.Series.str.ljust.md#maxframe.dataframe.Series.str.ljust)
: Fills the right side of strings with an arbitrary character.
[`Series.str.pad`](maxframe.dataframe.Series.str.pad.md#maxframe.dataframe.Series.str.pad)
: Fills the specified sides of strings with an arbitrary character.
`Series.str.center`
: Fills both sides of strings with an arbitrary character.
### Notes
Differs from [`str.zfill()`](https://docs.python.org/3/library/stdtypes.html#str.zfill) which has special handling
for ‘+’/’-’ in the string.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(['-1', '1', '1000', 10, mt.nan])
>>> s.execute()
0 -1
1 1
2 1000
3 10
4 NaN
dtype: object
```
Note that `10` and `NaN` are not strings, therefore they are
converted to `NaN`. The minus sign in `'-1'` is treated as a
special character and the zero is inserted to the right of it,
giving `'-01'`. `1000` remains unchanged as it is longer than width.
```pycon
>>> s.str.zfill(3).execute()
0 -01
1 001
2 1000
3 NaN
4 NaN
dtype: object
```
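For comparison, the built-in `str.zfill()` can be run on the same inputs in plain Python. This sketch does not call MaxFrame: the helper `zfill_or_missing` is hypothetical, and mapping non-string values to `None` is an assumption standing in for the `NaN` entries shown above.

```python
# Built-in str.zfill applied element by element (no MaxFrame involved).
values = ["-1", "1", "1000", 10, float("nan")]

def zfill_or_missing(value, width):
    """Pad strings with leading zeros; return None for non-string values."""
    return value.zfill(width) if isinstance(value, str) else None

padded = [zfill_or_missing(v, 3) for v in values]
print(padded)  # ['-01', '001', '1000', None, None]
```

The built-in keeps a leading sign in front and inserts the zeros after it, and it leaves strings that are already at least `width` characters long untouched.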
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.struct.dtypes.md
# maxframe.dataframe.Series.struct.dtypes
#### Series.struct.dtypes()
Return the dtype object of each child field of the struct.
* **Returns:**
The data type of each child field.
* **Return type:**
[pandas.Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> import pandas as pd
>>> import pyarrow as pa
>>> s = md.Series(
... [
... {"version": 1, "project": "pandas"},
... {"version": 2, "project": "pandas"},
... {"version": 1, "project": "numpy"},
... ],
... dtype=pd.ArrowDtype(pa.struct(
... [("version", pa.int64()), ("project", pa.string())]
... ))
... )
>>> s.struct.dtypes.execute()
version int64[pyarrow]
project string[pyarrow]
dtype: object
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.struct.field.md
# maxframe.dataframe.Series.struct.field
#### Series.struct.field(name_or_index)
Extract a child field of a struct as a Series.
* **Parameters:**
**name_or_index** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *|* [*bytes*](https://docs.python.org/3/library/stdtypes.html#bytes) *|* [*int*](https://docs.python.org/3/library/functions.html#int) *|* *expression* *|* [*list*](https://docs.python.org/3/library/stdtypes.html#list)) –
Name or index of the child field to extract.
For list-like inputs, this will index into a nested
struct.
* **Returns:**
The data corresponding to the selected child field.
* **Return type:**
[pandas.Series](https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series)
#### SEE ALSO
`Series.struct.explode`
: Return all child fields as a DataFrame.
### Notes
The name of the resulting Series will be set using the following
rules:
- For string, bytes, or integer name_or_index (or a list of these, for
a nested selection), the Series name is set to the selected
field’s name.
- For a `pyarrow.compute.Expression`, this is set to
the string form of the expression.
- For list-like name_or_index, the name will be set to the
name of the final field selected.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> import pandas as pd
>>> import pyarrow as pa
>>> s = md.Series(
... [
... {"version": 1, "project": "pandas"},
... {"version": 2, "project": "pandas"},
... {"version": 1, "project": "numpy"},
... ],
... dtype=pd.ArrowDtype(pa.struct(
... [("version", pa.int64()), ("project", pa.string())]
... ))
... )
```
Extract by field name.
```pycon
>>> s.struct.field("project").execute()
0 pandas
1 pandas
2 numpy
Name: project, dtype: string[pyarrow]
```
Extract by field index.
```pycon
>>> s.struct.field(0).execute()
0 1
1 2
2 1
Name: version, dtype: int64[pyarrow]
```
For nested struct types, you can pass a list of values to index
multiple levels:
```pycon
>>> version_type = pa.struct([
... ("major", pa.int64()),
... ("minor", pa.int64()),
... ])
>>> s = md.Series(
... [
... {"version": {"major": 1, "minor": 5}, "project": "pandas"},
... {"version": {"major": 2, "minor": 1}, "project": "pandas"},
... {"version": {"major": 1, "minor": 26}, "project": "numpy"},
... ],
... dtype=pd.ArrowDtype(pa.struct(
... [("version", version_type), ("project", pa.string())]
... ))
... )
>>> s.struct.field(["version", "minor"]).execute()
0 5
1 1
2 26
Name: minor, dtype: int64[pyarrow]
>>> s.struct.field([0, 0]).execute()
0 1
1 2
2 1
Name: major, dtype: int64[pyarrow]
```
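The nested selection and naming rules above can be sketched with ordinary dictionaries. The helper `select_field` below is hypothetical and not part of MaxFrame or pyarrow; it only mirrors the idea that a list walks one level per entry, that integers select by field position, and that the result is named after the final field.

```python
from typing import Sequence, Union

Key = Union[str, int]

def select_field(record: dict, path: Union[Key, Sequence[Key]]) -> tuple:
    """Walk a nested dict by field name or position; return (final_name, value)."""
    keys = [path] if isinstance(path, (str, int)) else list(path)
    name, value = None, record
    for key in keys:
        if isinstance(key, int):
            # Integer keys select by position in field order (dicts keep insertion order).
            key = list(value)[key]
        name, value = key, value[key]
    return name, value

rows = [
    {"version": {"major": 1, "minor": 5}, "project": "pandas"},
    {"version": {"major": 2, "minor": 1}, "project": "pandas"},
    {"version": {"major": 1, "minor": 26}, "project": "numpy"},
]
by_name = [select_field(r, ["version", "minor"]) for r in rows]
by_position = [select_field(r, [0, 0]) for r in rows]
print(by_name)      # [('minor', 5), ('minor', 1), ('minor', 26)]
print(by_position)  # [('major', 1), ('major', 2), ('major', 1)]
```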
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.sub.md
# maxframe.dataframe.Series.sub
#### Series.sub(other, level=None, fill_value=None, axis=0)
Return subtraction of series and other, element-wise (binary operator subtract).
Equivalent to `series - other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
`Series.rsubtract`
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.subtract(b, fill_value=0).execute()
a 0.0
b 1.0
c 1.0
d -1.0
e NaN
dtype: float64
```
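The `fill_value` rule can be traced by hand with plain dictionaries. The function `sub_with_fill` below is a hypothetical illustration, not MaxFrame code: it aligns on the union of labels, substitutes `fill_value` where exactly one side is missing, and keeps the result missing where both sides are.

```python
import math

def sub_with_fill(left: dict, right: dict, fill_value: float) -> dict:
    """Subtract two label->value mappings after aligning on the union of labels."""
    def missing(mapping, label):
        return label not in mapping or math.isnan(mapping[label])

    result = {}
    for label in sorted(set(left) | set(right)):
        l_missing, r_missing = missing(left, label), missing(right, label)
        if l_missing and r_missing:
            result[label] = math.nan  # missing on both sides stays missing
        else:
            lhs = fill_value if l_missing else left[label]
            rhs = fill_value if r_missing else right[label]
            result[label] = lhs - rhs
    return result

a = {"a": 1.0, "b": 1.0, "c": 1.0, "d": math.nan}
b = {"a": 1.0, "b": math.nan, "d": 1.0, "e": math.nan}
diff = sub_with_fill(a, b, fill_value=0)
print(diff)  # {'a': 0.0, 'b': 1.0, 'c': 1.0, 'd': -1.0, 'e': nan}
```

Label `e` is absent from `a` and NaN in `b`, so it is the only entry that stays missing, matching the output shown above.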
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.sum.md
# maxframe.dataframe.Series.sum
#### Series.sum(axis=None, skipna=True, level=None, min_count=0, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.swaplevel.md
# maxframe.dataframe.Series.swaplevel
#### Series.swaplevel(i=-2, j=-1)
Swap levels i and j in a `MultiIndex`.
Default is to swap the two innermost levels of the index.
* **Parameters:**
* **i** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Levels of the indices to be swapped. Can pass level name as string.
* **j** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Levels of the indices to be swapped. Can pass level name as string.
* **Returns:**
Series with levels swapped in MultiIndex.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(
... ["A", "B", "A", "C"],
... index=[
... ["Final exam", "Final exam", "Coursework", "Coursework"],
... ["History", "Geography", "History", "Geography"],
... ["January", "February", "March", "April"],
... ],
... )
>>> s.execute()
Final exam History January A
Geography February B
Coursework History March A
Geography April C
dtype: object
```
In the following example, we swap levels of the index.
By not supplying any arguments for i and j, we swap the last and
second-to-last levels.
```pycon
>>> s.swaplevel().execute()
Final exam January History A
February Geography B
Coursework March History A
April Geography C
dtype: object
```
By supplying one argument, we can choose which level to swap the last
level with. For example, we can swap the first level with the last one as
follows.
```pycon
>>> s.swaplevel(0).execute()
January History Final exam A
February Geography Final exam B
March History Coursework A
April Geography Coursework C
dtype: object
```
We can also define explicitly which levels we want to swap by supplying values
for both i and j. Here, for example, we swap the first and second levels.
```pycon
>>> s.swaplevel(0, 1).execute()
History Final exam January A
Geography Final exam February B
History Coursework March A
Geography Coursework April C
dtype: object
```
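The effect of `i` and `j` can be reproduced on plain tuples. The helper `swap_levels` is a hypothetical sketch that treats each index entry as a tuple of level values; it is not how MaxFrame implements the operation.

```python
def swap_levels(keys, i=-2, j=-1):
    """Return index tuples with positions i and j exchanged in every tuple."""
    swapped = []
    for key in keys:
        items = list(key)
        items[i], items[j] = items[j], items[i]
        swapped.append(tuple(items))
    return swapped

index = [
    ("Final exam", "History", "January"),
    ("Final exam", "Geography", "February"),
    ("Coursework", "History", "March"),
    ("Coursework", "Geography", "April"),
]
print(swap_levels(index)[0])        # ('Final exam', 'January', 'History')
print(swap_levels(index, 0)[0])     # ('January', 'History', 'Final exam')
print(swap_levels(index, 0, 1)[0])  # ('History', 'Final exam', 'January')
```

The defaults `i=-2` and `j=-1` pick the two innermost levels, which is why calling it without arguments only reorders the last two entries.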
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.take.md
# maxframe.dataframe.Series.take
#### Series.take(indices, axis=0, \*\*kwargs)
Return the elements in the given *positional* indices along an axis.
This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element in the object.
* **Parameters:**
* **indices** (*array-like*) – An array of ints indicating which positions to take.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'* *,* *None}* *,* *default 0*) – The axis on which to select elements. `0` means that we are
selecting rows, `1` means that we are selecting columns.
For Series this parameter is unused and defaults to 0.
* **\*\*kwargs** – For compatibility with `numpy.take()`. Has no effect on the
output.
* **Returns:**
An array-like containing the elements taken from the object.
* **Return type:**
same type as caller
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Select a subset of a DataFrame by labels.
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Select a subset of a DataFrame by positions.
[`numpy.take`](https://numpy.org/doc/stable/reference/generated/numpy.take.html#numpy.take)
: Take elements from an array along an axis.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([('falcon', 'bird', 389.0),
... ('parrot', 'bird', 24.0),
... ('lion', 'mammal', 80.5),
... ('monkey', 'mammal', mt.nan)],
... columns=['name', 'class', 'max_speed'],
... index=[0, 2, 3, 1])
>>> df.execute()
name class max_speed
0 falcon bird 389.0
2 parrot bird 24.0
3 lion mammal 80.5
1 monkey mammal NaN
```
Take elements at positions 0 and 3 along the axis 0 (default).
Note how the actual indices selected (0 and 1) do not correspond to
our selected indices 0 and 3. That’s because we are selecting the 0th
and 3rd rows, not rows whose indices equal 0 and 3.
```pycon
>>> df.take([0, 3]).execute()
name class max_speed
0 falcon bird 389.0
1 monkey mammal NaN
```
Take elements at indices 1 and 2 along the axis 1 (column selection).
```pycon
>>> df.take([1, 2], axis=1).execute()
class max_speed
0 bird 389.0
2 bird 24.0
3 mammal 80.5
1 mammal NaN
```
We may take elements using negative integers as positional indices,
counting from the end of the object, just like with Python lists.
```pycon
>>> df.take([-1, -2]).execute()
name class max_speed
1 monkey mammal NaN
3 lion mammal 80.5
```
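The difference between positions and index labels can be seen with two parallel lists. The `take` function here is a hypothetical plain-Python stand-in, not the MaxFrame method; it reuses the labels and names from the example above.

```python
labels = [0, 2, 3, 1]
names = ["falcon", "parrot", "lion", "monkey"]

def take(labels, values, positions):
    """Select (label, value) pairs by position, allowing negative positions."""
    return [(labels[p], values[p]) for p in positions]

print(take(labels, names, [0, 3]))    # [(0, 'falcon'), (1, 'monkey')]
print(take(labels, names, [-1, -2]))  # [(1, 'monkey'), (3, 'lion')]
```

Position 3 returns the row labelled `1`, because the selection never consults the labels themselves.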
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.to_csv.md
# maxframe.dataframe.Series.to_csv
#### Series.to_csv(path, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', lineterminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', partition_cols=None, storage_options=None, \*\*kw)
Write object to a comma-separated values (csv) file.
* **Parameters:**
* **path** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – File path.
If path is a string with wildcard e.g. `'/to/path/out-*.csv'`,
to_csv will try to write multiple files, for instance,
chunk (0, 0) will write data into `'/to/path/out-0.csv'`.
If path is a string without wildcard,
all data will be written into a single file.
* **sep** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default ','*) – String of length 1. Field delimiter for the output file.
* **na_rep** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default ''*) – Missing data representation.
* **float_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Format string for floating point numbers.
* **columns** (*sequence* *,* *optional*) – Columns to write.
* **header** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default True*) – Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Write row names (index).
* **index_label** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *sequence* *, or* *False* *,* *default None*) – Column label for index column(s) if desired. If None is given, and
header and index are True, then the index names are used. A
sequence should be given if the object uses MultiIndex. If
False do not print fields for index names. Use index_label=False
for easier importing in R.
* **mode** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Python write mode, default ‘w’.
* **encoding** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – A string representing the encoding to use in the output file,
defaults to ‘utf-8’.
* **compression** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default 'infer'*) – If str, represents compression mode. If dict, value at ‘method’ is
the compression mode. Compression mode may be any of the following
possible values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If
compression mode is ‘infer’ and path is path-like, then
detect compression mode from the following extensions: ‘.gz’,
‘.bz2’, ‘.zip’ or ‘.xz’ (otherwise no compression). If dict given
and mode is ‘zip’ or inferred as ‘zip’, other entries passed as
additional compression options.
* **quoting** (*optional constant from csv module*) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format
then floats are converted to strings and thus csv.QUOTE_NONNUMERIC
will treat them as non-numeric.
* **quotechar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '"'*) – String of length 1. Character used to quote fields.
* **lineterminator** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – The newline character or character sequence to use in the output
file. Defaults to os.linesep, which depends on the OS in which
this method is called (`'\n'` for Linux, `'\r\n'` for Windows, for example).
* **chunksize** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None*) – Rows to write at a time.
* **date_format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Format string for datetime objects.
* **doublequote** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Control quoting of quotechar inside a field.
* **escapechar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – String of length 1. Character used to escape sep and quotechar
when appropriate.
* **decimal** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '.'*) – Character recognized as decimal separator. E.g. use ‘,’ for
European data.
* **partition_cols** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional* *,* *default None*) – Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
* **Returns:**
If path is None, returns the resulting csv format as a
string. Otherwise returns None.
* **Return type:**
None or [str](https://docs.python.org/3/library/stdtypes.html#str)
#### SEE ALSO
[`read_csv`](maxframe.dataframe.read_csv.md#maxframe.dataframe.read_csv)
: Load a CSV file into a DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'name': ['Raphael', 'Donatello'],
... 'mask': ['red', 'purple'],
... 'weapon': ['sai', 'bo staff']})
>>> df.to_csv('out.csv', index=False).execute()
>>> # Write partitioned dataset
>>> df.to_csv('dataset', partition_cols=['mask']).execute()
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.to_dict.md
# maxframe.dataframe.Series.to_dict
#### Series.to_dict(into=<class 'dict'>, batch_size=10000, session=None)
Convert Series to {label -> value} dict or dict-like object.
* **Parameters:**
**into** (*class* *,* *default dict*) – The collections.abc.Mapping subclass to use as the return
object. Can be the actual class or an empty
instance of the mapping type you want. If you want a
collections.defaultdict, you must pass it initialized.
* **Returns:**
Key-value representation of Series.
* **Return type:**
[collections.abc.Mapping](https://docs.python.org/3/library/collections.abc.html#collections.abc.Mapping)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4])
>>> s.to_dict()
{0: 1, 1: 2, 2: 3, 3: 4}
>>> from collections import OrderedDict, defaultdict
>>> s.to_dict(OrderedDict)
OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])
>>> dd = defaultdict(list)
>>> s.to_dict(dd)
defaultdict(<class 'list'>, {0: 1, 1: 2, 2: 3, 3: 4})
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.to_frame.md
# maxframe.dataframe.Series.to_frame
#### Series.to_frame(name=None)
Convert Series to DataFrame.
* **Parameters:**
**name** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *default None*) – The passed name should substitute for the series name (if it has
one).
* **Returns:**
DataFrame representation of Series.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(["a", "b", "c"], name="vals")
>>> s.to_frame().execute()
vals
0 a
1 b
2 c
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.to_json.md
# maxframe.dataframe.Series.to_json
#### Series.to_json(path: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, orient: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, date_format: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, double_precision: [int](https://docs.python.org/3/library/functions.html#int) = 10, force_ascii: [bool](https://docs.python.org/3/library/functions.html#bool) = True, date_unit: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = 'ms', default_handler: callable | [None](https://docs.python.org/3/library/constants.html#None) = None, lines: [bool](https://docs.python.org/3/library/functions.html#bool) = False, compression: [str](https://docs.python.org/3/library/stdtypes.html#str) | [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = 'infer', index: [bool](https://docs.python.org/3/library/functions.html#bool) | [None](https://docs.python.org/3/library/constants.html#None) = None, indent: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = None, storage_options: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] | [None](https://docs.python.org/3/library/constants.html#None) = None, partition_cols: [str](https://docs.python.org/3/library/stdtypes.html#str) | [list](https://docs.python.org/3/library/stdtypes.html#list) | [None](https://docs.python.org/3/library/constants.html#None) = None, \*\*kwargs)
Convert the object to a JSON string.
Note NaN’s and None will be converted to null and datetime objects
will be converted to UNIX timestamps.
* **Parameters:**
* **path** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *path object* *,* *file-like object* *, or* *None* *,* *default None*) – String, path object (implementing os.PathLike[str]), or file-like
object implementing a write() function. If None, the result is
returned as a string.
* **orient** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) –
Indication of expected JSON string format.
* Series:
> - default is ‘index’
> - allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}.
* DataFrame:
> - default is ‘columns’
> - allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’,
> ‘values’, ‘table’}.
* The format of the JSON string:
> - ’split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns],
> ‘data’ -> [values]}
> - ’records’ : list like [{column -> value}, … , {column -> value}]
> - ’index’ : dict like {index -> {column -> value}}
> - ’columns’ : dict like {column -> {index -> value}}
> - ’values’ : just the values array
> - ’table’ : dict like {‘schema’: {schema}, ‘data’: {data}}
> Describing the data, where data component is like `orient='records'`.
* **date_format** ( *{None* *,* *'epoch'* *,* *'iso'}*) – Type of date conversion. ‘epoch’ = epoch milliseconds,
‘iso’ = ISO8601. The default depends on the orient. For
`orient='table'`, the default is ‘iso’. For all other orients,
the default is ‘epoch’.
* **double_precision** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 10*) – The number of decimal places to use when encoding
floating point numbers.
* **force_ascii** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Force encoded string to be ASCII.
* **date_unit** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 'ms'* *(**milliseconds* *)*) – The time unit to encode to, governs timestamp and ISO8601
precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond,
microsecond, and nanosecond respectively.
* **default_handler** (*callable* *,* *default None*) – Handler to call if object cannot otherwise be converted to a
suitable format for JSON. Should receive a single argument which is
the object to convert and return a serializable object.
* **lines** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If ‘orient’ is ‘records’ write out line-delimited json format. Will
throw ValueError if incorrect ‘orient’ is used.
* **compression** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default 'infer'*) – For on-the-fly compression of the output data. If str, represents
compression mode. If dict, value at ‘method’ is the compression mode.
Compression mode may be any of the following possible
values: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression
mode is ‘infer’ and path is path-like, then detect
compression mode from the following extensions: ‘.gz’, ‘.bz2’,
‘.zip’ or ‘.xz’ (otherwise no compression). If dict given and
mode is one of {‘zip’, ‘xz’}, other entries passed as
additional compression options.
* **index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default None*) – Whether to include the index values in the JSON string. Not
including the index (`index=False`) is only supported when
orient is ‘split’ or ‘table’.
* **indent** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Length of whitespace used to indent each record.
* **partition_cols** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional* *,* *default None*) – Column names by which to partition the dataset.
Columns are partitioned in the order they are given.
#### SEE ALSO
[`read_json`](maxframe.dataframe.read_json.md#maxframe.dataframe.read_json)
: Convert a JSON string to pandas object.
### Notes
The behavior of `indent=0` varies from the stdlib, which does not
indent the output but does insert newlines. Currently, `indent=0`
and the default `indent=None` are equivalent in pandas, though this
may change in a future release.
`orient='table'` contains a ‘pandas_version’ field under ‘schema’.
This stores the version of pandas used in the latest revision of the
schema.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([['a', 'b'], ['c', 'd']],
... index=['row 1', 'row 2'],
... columns=['col 1', 'col 2'])
>>> df.to_json('data.json')
>>> # Writing to a file with orient='records'
>>> df.to_json('records.json', orient='records')
>>> # Writing in line-delimited json format
>>> df.to_json('ldjson.json', orient='records', lines=True)
>>> # Write partitioned dataset
>>> df.to_json('dataset', partition_cols=['col 1'])
```
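The `orient` layouts listed in the parameters can be sketched with plain dictionaries and the standard `json` module. This is an illustration of the documented shapes only, built from the same small table as the example above; it does not call MaxFrame, and key order or number formatting in real `to_json` output may differ. The ‘table’ layout is left out because its schema content is not spelled out here.

```python
import json

index = ["row 1", "row 2"]
columns = ["col 1", "col 2"]
data = [["a", "b"], ["c", "d"]]

layouts = {
    # {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}
    "split": {"columns": columns, "index": index, "data": data},
    # [{column -> value}, ...]
    "records": [dict(zip(columns, row)) for row in data],
    # {index -> {column -> value}}
    "index": {idx: dict(zip(columns, row)) for idx, row in zip(index, data)},
    # {column -> {index -> value}}
    "columns": {col: {idx: row[k] for idx, row in zip(index, data)}
                for k, col in enumerate(columns)},
    # just the values array
    "values": data,
}
for orient, payload in layouts.items():
    print(orient, json.dumps(payload))
```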
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.to_list.md
# maxframe.dataframe.Series.to_list
#### Series.to_list(batch_size=10000, session=None)
Return a list of the values.
These are each a scalar type, which is a Python scalar
(for str, int, float) or a pandas scalar
(for Timestamp/Timedelta/Interval/Period).
* **Return type:**
[list](https://docs.python.org/3/library/stdtypes.html#list)
#### SEE ALSO
[`numpy.ndarray.tolist`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tolist.html#numpy.ndarray.tolist)
: Return the array as an a.ndim-levels deep nested list of Python scalars.
### Examples
For Series:
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3])
>>> s.to_list()
[1, 2, 3]
```
For Index:
```pycon
>>> idx = md.Index([1, 2, 3])
>>> idx.execute()
Index([1, 2, 3], dtype='int64')
```
```pycon
>>> idx.to_list()
[1, 2, 3]
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.transform.md
# maxframe.dataframe.Series.transform
#### Series.transform(func, convert_dtype=True, axis=0, \*args, skip_infer=False, dtype=None, \*\*kwargs)
Call `func` on self producing a Series with transformed values.
Produced Series will have same axis length as self.
* **Parameters:**
* **func** (*function* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – If a function, it must either
work when passed a Series or when passed to Series.apply.
Accepted combinations are:
> - function
> - string function name
> - list of functions and/or function names, e.g. `[np.exp, 'sqrt']`
> - dict of axis labels -> functions, function names or list of such.
* **axis** ( *{0* *or* *'index'}*) – Parameter needed for compatibility with DataFrame.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify the dtype of the returned Series. See Notes for more details.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when `dtype` or the output type is not specified.
* **\*args** – Positional arguments to pass to func.
* **\*\*kwargs** – Keyword arguments to pass to func.
* **Returns:**
* *Series*
* *A Series that must have the same length as self.*
* **Raises:**
  **ValueError** – If the returned Series has a different length than self.
#### SEE ALSO
[`Series.agg`](maxframe.dataframe.Series.agg.md#maxframe.dataframe.Series.agg)
: Only perform aggregating type operations.
[`Series.apply`](maxframe.dataframe.Series.apply.md#maxframe.dataframe.Series.apply)
: Invoke function on a Series.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `func` onto a mock Series, and the transform call may
fail. When this happens, you need to specify `dtype` of output
Series.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df.execute()
A B
0 0 1
1 1 2
2 2 3
>>> df.transform(lambda x: x + 1).execute()
A B
0 1 2
1 2 3
2 3 4
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.truediv.md
# maxframe.dataframe.Series.truediv
#### Series.truediv(other, level=None, fill_value=None, axis=0)
Return floating division of series and other, element-wise (binary operator truediv).
Equivalent to `series / other`, but with support to substitute a fill_value for
missing data in one of the inputs.
* **Parameters:**
* **other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *scalar value*)
* **fill_value** (*None* *or* *float value* *,* *default None* *(**NaN* *)*) – Fill existing missing (NaN) values, and any new element needed for
successful Series alignment, with this value before computation.
If data in both corresponding Series locations is missing
the result will be missing.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *name*) – Broadcast across a level, matching Index values on the
passed MultiIndex level.
* **Returns:**
The result of the operation.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.rtruediv`](maxframe.dataframe.Series.rtruediv.md#maxframe.dataframe.Series.rtruediv)
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> a = md.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
>>> a.execute()
a 1.0
b 1.0
c 1.0
d NaN
dtype: float64
```
```pycon
>>> b = md.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
>>> b.execute()
a 1.0
b NaN
d 1.0
e NaN
dtype: float64
```
```pycon
>>> a.truediv(b, fill_value=0).execute()
a 1.0
b inf
c inf
d 0.0
e NaN
dtype: float64
```
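The `fill_value` rule above can be sketched in plain Python, without MaxFrame. This is only an illustration of the alignment semantics described in the parameter list; `truediv_with_fill` and the dict-based "series" are hypothetical stand-ins, not part of the MaxFrame API.

```python
import math


def truediv_with_fill(a, b, fill_value=None):
    """Divide two label -> value mappings element-wise (hypothetical helper)."""

    def is_missing(x):
        return isinstance(x, float) and math.isnan(x)

    result = {}
    for label in sorted(set(a) | set(b)):       # align on the union of labels
        left = a.get(label, math.nan)
        right = b.get(label, math.nan)
        if is_missing(left) and is_missing(right):
            result[label] = math.nan            # missing on both sides stays missing
            continue
        if fill_value is not None:
            left = fill_value if is_missing(left) else left
            right = fill_value if is_missing(right) else right
        if is_missing(left) or is_missing(right):
            result[label] = math.nan
        elif right == 0:
            # Mimic IEEE-754 division (x/0 -> inf, 0/0 -> NaN); plain Python would
            # raise ZeroDivisionError. The sign of the zero is ignored here.
            result[label] = math.nan if left == 0 else math.copysign(math.inf, left)
        else:
            result[label] = left / right
    return result


a = {"a": 1.0, "b": 1.0, "c": 1.0, "d": math.nan}
b = {"a": 1.0, "b": math.nan, "d": 1.0, "e": math.nan}
print(truediv_with_fill(a, b, fill_value=0))
```

Label `e` stays NaN because it is missing from `a` and NaN in `b`, so no fill is applied there.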
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.truncate.md
# maxframe.dataframe.Series.truncate
#### Series.truncate(before=None, after=None, axis=0, copy=None)
Truncate a Series or DataFrame before and after some index value.
This is a useful shorthand for boolean indexing based on index
values above or below certain thresholds.
* **Parameters:**
* **before** (*date* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*int*](https://docs.python.org/3/library/functions.html#int)) – Truncate all rows before this index value.
* **after** (*date* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*int*](https://docs.python.org/3/library/functions.html#int)) – Truncate all rows after this index value.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *optional*) – Axis to truncate. Truncates the index (rows) by default.
For Series this parameter is unused and defaults to 0.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default is True* *,*) – This parameter is only kept for compatibility with pandas.
* **Returns:**
The truncated Series or DataFrame.
* **Return type:**
[type](https://docs.python.org/3/library/functions.html#type) of caller
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Select a subset of a DataFrame by label.
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Select a subset of a DataFrame by position.
### Notes
If the index being truncated contains only datetime values,
before and after may be specified as strings instead of
Timestamps.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
... 'B': ['f', 'g', 'h', 'i', 'j'],
... 'C': ['k', 'l', 'm', 'n', 'o']},
... index=[1, 2, 3, 4, 5])
>>> df.execute()
A B C
1 a f k
2 b g l
3 c h m
4 d i n
5 e j o
```
```pycon
>>> df.truncate(before=2, after=4).execute()
A B C
2 b g l
3 c h m
4 d i n
```
The columns of a DataFrame can be truncated.
```pycon
>>> df.truncate(before="A", after="B", axis="columns").execute()
A B
1 a f
2 b g
3 c h
4 d i
5 e j
```
For Series, only rows can be truncated.
```pycon
>>> df['A'].truncate(before=2, after=4).execute()
2 b
3 c
4 d
Name: A, dtype: object
```
The index values in `truncate` can be datetimes or string
dates.
```pycon
>>> dates = md.date_range('2016-01-01', '2016-02-01', freq='s')
>>> df = md.DataFrame(index=dates, data={'A': 1})
>>> df.tail().execute()
A
2016-01-31 23:59:56 1
2016-01-31 23:59:57 1
2016-01-31 23:59:58 1
2016-01-31 23:59:59 1
2016-02-01 00:00:00 1
```
```pycon
>>> df.truncate(before=md.Timestamp('2016-01-05'),
... after=md.Timestamp('2016-01-10')).tail().execute()
A
2016-01-09 23:59:56 1
2016-01-09 23:59:57 1
2016-01-09 23:59:58 1
2016-01-09 23:59:59 1
2016-01-10 00:00:00 1
```
Because the index is a DatetimeIndex containing only dates, we can
specify before and after as strings. They will be coerced to
Timestamps before truncation.
```pycon
>>> df.truncate('2016-01-05', '2016-01-10').tail().execute()
A
2016-01-09 23:59:56 1
2016-01-09 23:59:57 1
2016-01-09 23:59:58 1
2016-01-09 23:59:59 1
2016-01-10 00:00:00 1
```
Note that `truncate` assumes a 0 value for any unspecified time
component (midnight). This differs from partial string slicing, which
returns any partially matching dates.
```pycon
>>> df.loc['2016-01-05':'2016-01-10', :].tail().execute()
A
2016-01-10 23:59:55 1
2016-01-10 23:59:56 1
2016-01-10 23:59:57 1
2016-01-10 23:59:58 1
2016-01-10 23:59:59 1
```
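The difference between the two behaviours can be sketched with the standard library alone. The helpers `truncate` and `partial_slice` below are hypothetical and operate on a list of `(timestamp, value)` pairs; they only mirror the boundary handling described above, not MaxFrame's implementation.

```python
from datetime import datetime, timedelta


def truncate(rows, before, after):
    """Keep rows in [before, after]; a bare date is read as midnight."""
    lo = datetime.fromisoformat(before)   # '2016-01-05' -> 2016-01-05 00:00:00
    hi = datetime.fromisoformat(after)    # '2016-01-10' -> 2016-01-10 00:00:00
    return [(ts, v) for ts, v in rows if lo <= ts <= hi]


def partial_slice(rows, start_day, end_day):
    """Keep rows whose date lies in [start_day, end_day], i.e. the whole end day."""
    lo = datetime.fromisoformat(start_day)
    hi = datetime.fromisoformat(end_day) + timedelta(days=1)
    return [(ts, v) for ts, v in rows if lo <= ts < hi]


# Hourly data for January 2016 (coarser than the per-second example above).
start = datetime(2016, 1, 1)
rows = [(start + timedelta(hours=h), 1) for h in range(31 * 24 + 1)]

print(truncate(rows, "2016-01-05", "2016-01-10")[-1][0])       # 2016-01-10 00:00:00
print(partial_slice(rows, "2016-01-05", "2016-01-10")[-1][0])  # 2016-01-10 23:00:00
```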
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.tshift.md
# maxframe.dataframe.Series.tshift
#### Series.tshift(periods: [int](https://docs.python.org/3/library/functions.html#int) = 1, freq=None, axis=0)
Shift the time index, using the index’s frequency if available.
* **Parameters:**
* **periods** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Number of periods to move, can be positive or negative.
* **freq** (*DateOffset* *,* *timedelta* *, or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Increment to use from the tseries module
or time rule expressed as a string (e.g. ‘EOM’).
* **axis** ( *{0* *or* *‘index’* *,* *1* *or* *‘columns’* *,* *None}* *,* *default 0*) – Corresponds to the axis that contains the Index.
* **Returns:**
**shifted**
* **Return type:**
Series/DataFrame
### Notes
If freq is not specified, the method tries to use the freq or inferred_freq
attributes of the index. If neither of those attributes exists, a
ValueError is raised.
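A rough standard-library sketch of this rule follows. The function `tshift` below is a hypothetical stand-in that works on a plain list of datetimes, and "inferring" the frequency is simplified to checking that all gaps are equal.

```python
from datetime import datetime, timedelta


def tshift(index, periods=1, freq=None):
    """Shift every timestamp by periods * freq (hypothetical helper)."""
    if freq is None:
        # Simplified inference: the index must be evenly spaced.
        steps = {later - earlier for earlier, later in zip(index, index[1:])}
        if len(steps) != 1:
            raise ValueError("freq not given and cannot be inferred from the index")
        freq = steps.pop()
    return [ts + periods * freq for ts in index]


days = [datetime(2020, 1, d) for d in (1, 2, 3)]
print(tshift(days, periods=2)[0])                          # 2020-01-03 00:00:00
print(tshift(days, periods=-1, freq=timedelta(hours=12))[0])  # 2019-12-31 12:00:00
```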
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.unique.md
# maxframe.dataframe.Series.unique
#### Series.unique(method='tree')
Uniques are returned in order of appearance. This does NOT sort.
* **Parameters:**
* **values** (*1d array-like*)
* **method** ( *'shuffle'* *or* *'tree'*) – The 'tree' method provides better performance; 'shuffle' is recommended if the number of unique values is very large.
#### SEE ALSO
`Index.unique`, [`Series.unique`](#maxframe.dataframe.Series.unique)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> import pandas as pd
>>> md.unique(md.Series([2, 1, 3, 3])).execute()
array([2, 1, 3])
```
```pycon
>>> md.unique(md.Series([2] + [1] * 5)).execute()
array([2, 1])
```
```pycon
>>> md.unique(md.Series([pd.Timestamp('20160101'),
... pd.Timestamp('20160101')])).execute()
array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
```
```pycon
>>> md.unique(md.Series([pd.Timestamp('20160101', tz='US/Eastern'),
... pd.Timestamp('20160101', tz='US/Eastern')])).execute()
array([Timestamp('2016-01-01 00:00:00-0500', tz='US/Eastern')],
dtype=object)
```
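The "order of appearance, no sorting" guarantee can be illustrated in plain Python. This sketch says nothing about how the `'tree'` or `'shuffle'` methods are implemented; `unique_in_order` is a hypothetical helper.

```python
def unique_in_order(values):
    """Return distinct values in the order they first appear."""
    # dict preserves insertion order (Python 3.7+), so first occurrences win.
    return list(dict.fromkeys(values))


print(unique_in_order([2, 1, 3, 3]))   # [2, 1, 3]
print(unique_in_order([2] + [1] * 5))  # [2, 1]
```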
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.unstack.md
# maxframe.dataframe.Series.unstack
#### Series.unstack(level=-1, fill_value=None)
Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
* **Parameters:**
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *, or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *these* *,* *default last level*) – Level(s) to unstack, can pass level name.
* **fill_value** (*scalar value* *,* *default None*) – Value to use when replacing NaN values.
* **Returns:**
Unstacked Series.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3, 4],
... index=md.MultiIndex.from_product([['one', 'two'],
... ['a', 'b']]))
>>> s.execute()
one a 1
b 2
two a 3
b 4
dtype: int64
```
```pycon
>>> s.unstack(level=-1).execute()
a b
one 1 2
two 3 4
```
```pycon
>>> s.unstack(level=0).execute()
one two
a 1 3
b 2 4
```
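As a mental model, unstacking moves one index level into the columns. The sketch below reproduces the two results above with nested dicts; `unstack` here is a hypothetical helper that accepts only a single integer level, and the dict layout is an assumption made for illustration.

```python
def unstack(series, level=-1):
    """Pivot {index_tuple: value} into {row_key: {column_key: value}}."""
    table = {}
    for index, value in series.items():
        position = level % len(index)           # normalise negative levels
        column = index[position]
        rest = [part for i, part in enumerate(index) if i != position]
        row = rest[0] if len(rest) == 1 else tuple(rest)
        table.setdefault(row, {})[column] = value
    return table


s = {("one", "a"): 1, ("one", "b"): 2, ("two", "a"): 3, ("two", "b"): 4}
print(unstack(s))           # {'one': {'a': 1, 'b': 2}, 'two': {'a': 3, 'b': 4}}
print(unstack(s, level=0))  # {'a': {'one': 1, 'two': 3}, 'b': {'one': 2, 'two': 4}}
```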
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.update.md
# maxframe.dataframe.Series.update
#### Series.update(other)
Modify Series in place using values from passed Series.
Uses non-NA values from passed Series to make updates. Aligns
on index.
* **Parameters:**
**other** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* *object coercible into Series*)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series([1, 2, 3])
>>> s.update(md.Series([4, 5, 6]))
>>> s.execute()
0 4
1 5
2 6
dtype: int64
```
```pycon
>>> s = md.Series(['a', 'b', 'c'])
>>> s.update(md.Series(['d', 'e'], index=[0, 2]))
>>> s.execute()
0 d
1 b
2 e
dtype: object
```
```pycon
>>> s = md.Series([1, 2, 3])
>>> s.update(md.Series([4, 5, 6, 7, 8]))
>>> s.execute()
0 4
1 5
2 6
dtype: int64
```
If `other` contains NaNs the corresponding values are not updated
in the original Series.
```pycon
>>> s = md.Series([1, 2, 3])
>>> s.update(md.Series([4, mt.nan, 6]))
>>> s.execute()
0 4
1 2
2 6
dtype: int64
```
`other` can also be a non-Series object that is coercible into a Series,
such as a list or a dict.
```pycon
>>> s = md.Series([1, 2, 3])
>>> s.update([4, mt.nan, 6])
>>> s.execute()
0 4
1 2
2 6
dtype: int64
```
```pycon
>>> s = md.Series([1, 2, 3])
>>> s.update({1: 9})
>>> s.execute()
0 1
1 9
2 3
dtype: int64
```
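The three rules shown in the examples (align on index, skip NaN, ignore labels that the target does not have) can be summarised in a short plain-Python sketch. `update` below is a hypothetical helper on dicts, not the MaxFrame method.

```python
import math


def update(target, other):
    """Overwrite values of `target` (label -> value) in place from `other`."""
    for label, value in other.items():
        if label not in target:
            continue    # labels missing from the target are ignored
        if isinstance(value, float) and math.isnan(value):
            continue    # NaN in `other` leaves the original value untouched
        target[label] = value


s = {0: 1, 1: 2, 2: 3}
update(s, {0: 4, 1: math.nan, 2: 6, 3: 7})
print(s)  # {0: 4, 1: 2, 2: 6}
```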
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.value_counts.md
# maxframe.dataframe.Series.value_counts
#### Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True, method='auto')
Return a Series containing counts of unique values.
The resulting object will be in descending order so that the
first element is the most frequently-occurring element.
Excludes NA values by default.
* **Parameters:**
* **normalize** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True then the object returned will contain the relative
frequencies of the unique values.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Sort by frequencies.
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort in ascending order.
* **bins** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Rather than count values, group them into half-open bins,
a convenience for `pd.cut`, only works with numeric data.
* **dropna** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Don’t include counts of NaN.
* **method** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 'auto'*) – One of ‘auto’, ‘shuffle’, or ‘tree’. The ‘tree’ method provides
better performance, while ‘shuffle’ is recommended
if the aggregated result is very large. ‘auto’ uses the
‘shuffle’ method in distributed mode and the ‘tree’ method
in local mode.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
[`Series.count`](maxframe.dataframe.Series.count.md#maxframe.dataframe.Series.count)
: Number of non-NA elements in a Series.
[`DataFrame.count`](maxframe.dataframe.DataFrame.count.md#maxframe.dataframe.DataFrame.count)
: Number of non-NA elements in a DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> import numpy as np
>>> s = md.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts().execute()
3.0 2
4.0 1
2.0 1
1.0 1
dtype: int64
```
With normalize set to True, returns the relative frequency by
dividing all values by the sum of values.
```pycon
>>> s = md.Series([3, 1, 2, 3, 4, np.nan])
>>> s.value_counts(normalize=True).execute()
3.0 0.4
4.0 0.2
2.0 0.2
1.0 0.2
dtype: float64
```
**dropna**
With dropna set to False we can also see NaN index values.
```pycon
>>> s.value_counts(dropna=False).execute()
3.0 2
NaN 1
4.0 1
2.0 1
1.0 1
dtype: int64
```
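The effect of `normalize` and `dropna` can be reproduced with `collections.Counter`. This is a sketch of the counting semantics only; `value_counts` below is a hypothetical helper, the `"NaN"` key is an illustrative choice, and the order among equally frequent values is not meant to match MaxFrame.

```python
import math
from collections import Counter


def value_counts(values, normalize=False, dropna=True):
    """Return (value, count-or-frequency) pairs, most frequent first."""

    def is_nan(v):
        return isinstance(v, float) and math.isnan(v)

    kept = [v for v in values if not (dropna and is_nan(v))]
    # NaN != NaN, so map every NaN to one shared key before counting.
    counts = Counter("NaN" if is_nan(v) else v for v in kept)
    total = sum(counts.values())
    ordered = counts.most_common()              # descending by count
    if normalize:
        return [(key, count / total) for key, count in ordered]
    return ordered


values = [3, 1, 2, 3, 4, math.nan]
print(value_counts(values))
print(value_counts(values, normalize=True))
print(value_counts(values, dropna=False))
```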
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.var.md
# maxframe.dataframe.Series.var
#### Series.var(axis=None, skipna=True, level=None, ddof=1, method=None)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.where.md
# maxframe.dataframe.Series.where
#### Series.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
Replace values where the condition is False.
* **Parameters:**
* **cond** (*bool Series/DataFrame* *,* *array-like* *, or* *callable*) – Where cond is True, keep the original value. Where
False, replace with corresponding value from other.
If cond is callable, it is computed on the Series/DataFrame and
should return boolean Series/DataFrame or array. The callable must
not change input Series/DataFrame (though pandas doesn’t check it).
* **other** (*scalar* *,* *Series/DataFrame* *, or* *callable*) – Entries where cond is False are replaced with
corresponding value from other.
If other is callable, it is computed on the Series/DataFrame and
should return scalar or Series/DataFrame. The callable must not
change input Series/DataFrame (though pandas doesn’t check it).
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to perform the operation in place on the data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Alignment axis if needed.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) – Alignment level if needed.
* **Return type:**
Same type as caller
#### SEE ALSO
[`DataFrame.mask()`](maxframe.dataframe.DataFrame.mask.md#maxframe.dataframe.DataFrame.mask)
: Return an object of same shape as self.
### Notes
The where method is an application of the if-then idiom. For each
element in the calling DataFrame, if `cond` is `True` the
element is used; otherwise the corresponding element from the DataFrame
`other` is used.
The signature for [`DataFrame.where()`](maxframe.dataframe.DataFrame.where.md#maxframe.dataframe.DataFrame.where) differs from
[`numpy.where()`](https://numpy.org/doc/stable/reference/generated/numpy.where.html#numpy.where). Roughly `df1.where(m, df2)` is equivalent to
`np.where(m, df1, df2)`.
For further details and examples see the `where` documentation in
[indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-where-mask).
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> s = md.Series(range(5))
>>> s.where(s > 0).execute()
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
dtype: float64
```
```pycon
>>> s.mask(s > 0).execute()
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
```
```pycon
>>> s.where(s > 1, 10).execute()
0 10
1 10
2 2
3 3
4 4
dtype: int64
```
```pycon
>>> df = md.DataFrame(mt.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> df.execute()
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
>>> m = df % 3 == 0
>>> df.where(m, -df).execute()
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> (df.where(m, -df) == mt.where(m, df, -df)).execute()
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> (df.where(m, -df) == df.mask(~m, -df)).execute()
A B
0 True True
1 True True
2 True True
3 True True
4 True True
```
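The relationship between `where` and `mask` seen in the examples can be written down as an element-wise if-then in plain Python. The two functions below are hypothetical list-based helpers, shown only to make the direction of the condition explicit.

```python
import math


def where(values, cond, other=math.nan):
    """Keep each value where cond is True; otherwise take `other`."""
    return [v if c else other for v, c in zip(values, cond)]


def mask(values, cond, other=math.nan):
    """Inverse of `where`: replace each value where cond is True."""
    return where(values, [not c for c in cond], other)


s = list(range(5))
print(where(s, [v > 1 for v in s], 10))  # [10, 10, 2, 3, 4]
print(mask(s, [v > 0 for v in s]))       # [0, nan, nan, nan, nan]
```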
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.Series.xs.md
# maxframe.dataframe.Series.xs
#### Series.xs(key, axis=0, level=None, drop_level=True)
Return cross-section from the Series/DataFrame.
This method takes a key argument to select data at a particular
level of a MultiIndex.
* **Parameters:**
* **key** (*label* *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *label*) – Label contained in the index, or partially in a MultiIndex.
* **axis** ( *{0* *or* *'index'* *,* *1* *or* *'columns'}* *,* *default 0*) – Axis to retrieve cross-section on.
* **level** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *defaults to first n levels* *(**n=1* *or* *len* *(**key* *)* *)*) – In case of a key partially contained in a MultiIndex, indicate
which levels are used. Levels can be referred by label or position.
* **drop_level** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If False, returns object with same levels as self.
* **Returns:**
Cross-section from the original Series or DataFrame
corresponding to the selected index levels.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.loc`](maxframe.dataframe.DataFrame.loc.md#maxframe.dataframe.DataFrame.loc)
: Access a group of rows and columns by label(s) or a boolean array.
[`DataFrame.iloc`](maxframe.dataframe.DataFrame.iloc.md#maxframe.dataframe.DataFrame.iloc)
: Purely integer-location based indexing for selection by position.
### Notes
xs can not be used to set values.
MultiIndex Slicers is a generic way to get/set values on
any level or levels.
It is a superset of xs functionality, see
[MultiIndex Slicers](https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-mi-slicers).
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> d = {'num_legs': [4, 4, 2, 2],
... 'num_wings': [0, 0, 2, 2],
... 'class': ['mammal', 'mammal', 'mammal', 'bird'],
... 'animal': ['cat', 'dog', 'bat', 'penguin'],
... 'locomotion': ['walks', 'walks', 'flies', 'walks']}
>>> df = md.DataFrame(data=d)
>>> df = df.set_index(['class', 'animal', 'locomotion'])
>>> df.execute()
num_legs num_wings
class animal locomotion
mammal cat walks 4 0
dog walks 4 0
bat flies 2 2
bird penguin walks 2 2
```
Get values at specified index
```pycon
>>> df.xs('mammal').execute()
num_legs num_wings
animal locomotion
cat walks 4 0
dog walks 4 0
bat flies 2 2
```
Get values at several indexes
```pycon
>>> df.xs(('mammal', 'dog')).execute()
num_legs num_wings
locomotion
walks 4 0
```
Get values at specified index and level
```pycon
>>> df.xs('cat', level=1).execute()
num_legs num_wings
class locomotion
mammal walks 4 0
```
Get values at several indexes and levels
```pycon
>>> df.xs(('bird', 'walks'),
... level=[0, 'locomotion']).execute()
num_legs num_wings
animal
penguin 2 2
```
Get values at specified column and axis
```pycon
>>> df.xs('num_wings', axis=1).execute()
class animal locomotion
mammal cat walks 0
dog walks 0
bat flies 2
bird penguin walks 2
Name: num_wings, dtype: int64
```
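A cross-section on one level amounts to filtering index tuples and, by default, dropping the matched level. The sketch below assumes a `{index_tuple: value}` mapping and a single integer `level`; `xs` here is a hypothetical helper and does not cover tuple keys, level names, or `axis=1`.

```python
def xs(rows, key, level=0, drop_level=True):
    """Select entries of {index_tuple: value} whose `level` equals `key`."""
    out = {}
    for index, value in rows.items():
        if index[level] != key:
            continue
        if drop_level:
            index = index[:level] + index[level + 1:]   # remove the matched level
        out[index] = value
    return out


num_legs = {
    ("mammal", "cat", "walks"): 4,
    ("mammal", "dog", "walks"): 4,
    ("mammal", "bat", "flies"): 2,
    ("bird", "penguin", "walks"): 2,
}
print(xs(num_legs, "mammal"))
print(xs(num_legs, "cat", level=1))  # {('mammal', 'walks'): 4}
```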
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.concat.md
# maxframe.dataframe.concat
### maxframe.dataframe.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True, default_index_type=None)
Concatenate dataframe objects along a particular axis with optional set logic
along the other axes.
Can also add a layer of hierarchical indexing on the concatenation axis,
which may be useful if the labels are the same (or overlapping) on
the passed axis number.
* **Parameters:**
* **objs** (*a sequence* *or* *mapping* *of* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *or* *DataFrame objects*) – If a mapping is passed, the sorted keys will be used as the keys
argument, unless it is passed, in which case the values will be
selected (see below). Any None objects will be dropped silently unless
they are all None in which case a ValueError will be raised.
* **axis** ( *{0/'index'* *,* *1/'columns'}* *,* *default 0*) – The axis to concatenate along.
* **join** ( *{'inner'* *,* *'outer'}* *,* *default 'outer'*) – How to handle indexes on other axis (or axes).
* **ignore_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, do not use the index values along the concatenation axis. The
resulting axis will be labeled 0, …, n - 1. This is useful if you are
concatenating objects where the concatenation axis does not have
meaningful indexing information. Note the index values on the other
axes are still respected in the join.
* **keys** (*sequence* *,* *default None*) – If multiple levels passed, should contain tuples. Construct
hierarchical index using the passed keys as the outermost level.
* **levels** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *sequences* *,* *default None*) – Specific levels (unique values) to use for constructing a
MultiIndex. Otherwise they will be inferred from the keys.
* **names** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *default None*) – Names for the levels in the resulting hierarchical index.
* **verify_integrity** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Check whether the new concatenated axis contains duplicates. This can
be very expensive relative to the actual data concatenation.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort non-concatenation axis if it is not already aligned when join
is ‘outer’.
This has no effect when `join='inner'`, which already preserves
the order of the non-concatenation axis.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If False, do not copy data unnecessarily.
* **Returns:**
When concatenating all `Series` along the index (axis=0), a
`Series` is returned. When `objs` contains at least one
`DataFrame`, a `DataFrame` is returned. When concatenating along
the columns (axis=1), a `DataFrame` is returned.
* **Return type:**
[object](https://docs.python.org/3/library/functions.html#object), [type](https://docs.python.org/3/library/functions.html#type) of objs
#### SEE ALSO
[`Series.append`](maxframe.dataframe.Series.append.md#maxframe.dataframe.Series.append)
: Concatenate Series.
[`DataFrame.append`](maxframe.dataframe.DataFrame.append.md#maxframe.dataframe.DataFrame.append)
: Concatenate DataFrames.
[`DataFrame.join`](maxframe.dataframe.DataFrame.join.md#maxframe.dataframe.DataFrame.join)
: Join DataFrames using indexes.
[`DataFrame.merge`](maxframe.dataframe.DataFrame.merge.md#maxframe.dataframe.DataFrame.merge)
: Merge DataFrames by indexes or columns.
### Notes
The keys, levels, and names arguments are all optional.
A walkthrough of how this method fits in with other tools for combining
pandas objects can be found [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html).
### Examples
Combine two `Series`.
```pycon
>>> import maxframe.dataframe as md
>>> s1 = md.Series(['a', 'b'])
>>> s2 = md.Series(['c', 'd'])
>>> md.concat([s1, s2]).execute()
0 a
1 b
0 c
1 d
dtype: object
```
Clear the existing index and reset it in the result
by setting the `ignore_index` option to `True`.
```pycon
>>> md.concat([s1, s2], ignore_index=True).execute()
0 a
1 b
2 c
3 d
dtype: object
```
Add a hierarchical index at the outermost level of
the data with the `keys` option.
```pycon
>>> md.concat([s1, s2], keys=['s1', 's2']).execute()
s1 0 a
1 b
s2 0 c
1 d
dtype: object
```
Label the index keys you create with the `names` option.
```pycon
>>> md.concat([s1, s2], keys=['s1', 's2'],
... names=['Series name', 'Row ID']).execute()
Series name Row ID
s1 0 a
1 b
s2 0 c
1 d
dtype: object
```
Combine two `DataFrame` objects with identical columns.
```pycon
>>> df1 = md.DataFrame([['a', 1], ['b', 2]],
... columns=['letter', 'number'])
>>> df1.execute()
letter number
0 a 1
1 b 2
>>> df2 = md.DataFrame([['c', 3], ['d', 4]],
... columns=['letter', 'number'])
>>> df2.execute()
letter number
0 c 3
1 d 4
>>> md.concat([df1, df2]).execute()
letter number
0 a 1
1 b 2
0 c 3
1 d 4
```
Combine `DataFrame` objects with overlapping columns
and return everything. Columns outside the intersection will
be filled with `NaN` values.
```pycon
>>> df3 = md.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
... columns=['letter', 'number', 'animal'])
>>> df3.execute()
letter number animal
0 c 3 cat
1 d 4 dog
>>> md.concat([df1, df3], sort=False).execute()
letter number animal
0 a 1 NaN
1 b 2 NaN
0 c 3 cat
1 d 4 dog
```
Combine `DataFrame` objects with overlapping columns
and return only those that are shared by passing `inner` to
the `join` keyword argument.
```pycon
>>> md.concat([df1, df3], join="inner").execute()
letter number
0 a 1
1 b 2
0 c 3
1 d 4
```
Combine `DataFrame` objects horizontally along the x axis by
passing in `axis=1`.
```pycon
>>> df4 = md.DataFrame([['bird', 'polly'], ['monkey', 'george']],
... columns=['animal', 'name'])
>>> md.concat([df1, df4], axis=1).execute()
letter number animal name
0 a 1 bird polly
1 b 2 monkey george
```
Prevent the result from including duplicate index values with the
`verify_integrity` option.
```pycon
>>> df5 = md.DataFrame([1], index=['a'])
>>> df5.execute()
0
a 1
>>> df6 = md.DataFrame([2], index=['a'])
>>> df6.execute()
0
a 2
```
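What `verify_integrity` checks can be sketched without MaxFrame: after concatenation, the labels on the concatenation axis must be unique. `concat_rows` below is a hypothetical helper on lists of `(label, row)` pairs, and the error message wording is illustrative only.

```python
def concat_rows(frames, ignore_index=False, verify_integrity=False):
    """Concatenate lists of (label, row) pairs along the index."""
    combined = [pair for frame in frames for pair in frame]
    if ignore_index:
        # Discard the original labels and relabel 0 .. n-1.
        combined = [(i, row) for i, (_, row) in enumerate(combined)]
    if verify_integrity:
        labels = [label for label, _ in combined]
        duplicates = sorted({lab for lab in labels if labels.count(lab) > 1})
        if duplicates:
            raise ValueError(f"index has overlapping values: {duplicates}")
    return combined


df5 = [("a", [1])]
df6 = [("a", [2])]
print(concat_rows([df5, df6]))                     # duplicates allowed by default
print(concat_rows([df5, df6], ignore_index=True,
                  verify_integrity=True))          # relabelled, so the check passes
```

The duplicate scan is the reason the parameter description calls this check expensive relative to the concatenation itself.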
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.date_range.md
# maxframe.dataframe.date_range
### maxframe.dataframe.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=<no_default>, inclusive=None, chunk_size=None, \*\*kwargs)
Return a fixed frequency DatetimeIndex.
* **Parameters:**
* **start** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *datetime-like* *,* *optional*) – Left bound for generating dates.
* **end** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *datetime-like* *,* *optional*) – Right bound for generating dates.
* **periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of periods to generate.
* **freq** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *DateOffset* *,* *default 'D'*) – Frequency strings can have multiples, e.g. ‘5H’. See
[here](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases) for a list of
frequency aliases.
* **tz** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *tzinfo* *,* *optional*) – Time zone name for returning localized DatetimeIndex, for example
‘Asia/Hong_Kong’. By default, the resulting DatetimeIndex is
timezone-naive.
* **normalize** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Normalize start/end dates to midnight before generating date range.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Name of the resulting DatetimeIndex.
* **inclusive** ( *{“both”* *,* *“neither”* *,* *“left”* *,* *“right”}* *,* *default “both”*) – Include boundaries; Whether to set each bound as closed or open.
* **\*\*kwargs** – For compatibility. Has no effect on the result.
* **Returns:**
**rng**
* **Return type:**
DatetimeIndex
#### SEE ALSO
`DatetimeIndex`
: An immutable container for datetimes.
`timedelta_range`
: Return a fixed frequency TimedeltaIndex.
`period_range`
: Return a fixed frequency PeriodIndex.
`interval_range`
: Return a fixed frequency IntervalIndex.
### Notes
Of the four parameters `start`, `end`, `periods`, and `freq`,
exactly three must be specified. If `freq` is omitted, the resulting
`DatetimeIndex` will have `periods` linearly spaced elements between
`start` and `end` (closed on both sides).
To learn more about the frequency strings, please see [this link](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
### Examples
**Specifying the values**
The next four examples generate the same DatetimeIndex, but vary
the combination of start, end and periods.
Specify start and end, with the default daily frequency.
```pycon
>>> import maxframe.dataframe as md
>>> md.date_range(start='1/1/2018', end='1/08/2018').execute()
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
dtype='datetime64[ns]', freq='D')
```
Specify start and periods, the number of periods (days).
```pycon
>>> md.date_range(start='1/1/2018', periods=8).execute()
DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
'2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
dtype='datetime64[ns]', freq='D')
```
Specify end and periods, the number of periods (days).
```pycon
>>> md.date_range(end='1/1/2018', periods=8).execute()
DatetimeIndex(['2017-12-25', '2017-12-26', '2017-12-27', '2017-12-28',
'2017-12-29', '2017-12-30', '2017-12-31', '2018-01-01'],
dtype='datetime64[ns]', freq='D')
```
Specify start, end, and periods; the frequency is generated
automatically (linearly spaced).
```pycon
>>> md.date_range(start='2018-04-24', end='2018-04-27', periods=3).execute()
DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
'2018-04-27 00:00:00'],
dtype='datetime64[ns]', freq=None)
```
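The "linearly spaced" case, where `start`, `end`, and `periods` are given and `freq` is omitted, reduces to simple arithmetic on datetimes. The helper `linspace_dates` below is hypothetical and uses only the standard library.

```python
from datetime import datetime


def linspace_dates(start, end, periods):
    """Return `periods` evenly spaced datetimes from start to end, inclusive."""
    if periods < 2:
        return [start][:max(periods, 0)]
    step = (end - start) / (periods - 1)        # timedelta between neighbours
    return [start + i * step for i in range(periods)]


for ts in linspace_dates(datetime(2018, 4, 24), datetime(2018, 4, 27), 3):
    print(ts)
# 2018-04-24 00:00:00
# 2018-04-25 12:00:00
# 2018-04-27 00:00:00
```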
**Other Parameters**
Change the freq (frequency) to `'M'` (month end frequency).
```pycon
>>> md.date_range(start='1/1/2018', periods=5, freq='M').execute()
DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
'2018-05-31'],
dtype='datetime64[ns]', freq='M')
```
Multiples are allowed
```pycon
>>> md.date_range(start='1/1/2018', periods=5, freq='3M').execute()
DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
'2019-01-31'],
dtype='datetime64[ns]', freq='3M')
```
freq can also be specified as an Offset object.
```pycon
>>> md.date_range(start='1/1/2018', periods=5, freq=md.offsets.MonthEnd(3)).execute()
DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
'2019-01-31'],
dtype='datetime64[ns]', freq='3M')
```
Specify tz to set the timezone.
```pycon
>>> md.date_range(start='1/1/2018', periods=5, tz='Asia/Tokyo').execute()
DatetimeIndex(['2018-01-01 00:00:00+09:00', '2018-01-02 00:00:00+09:00',
'2018-01-03 00:00:00+09:00', '2018-01-04 00:00:00+09:00',
'2018-01-05 00:00:00+09:00'],
dtype='datetime64[ns, Asia/Tokyo]', freq='D')
```
`inclusive` controls whether to include start and end that are on the
boundary. The default, `'both'`, includes boundary points on either end.
```pycon
>>> md.date_range(start='2017-01-01', end='2017-01-04', inclusive='both').execute()
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'],
dtype='datetime64[ns]', freq='D')
```
Use `inclusive='left'` to exclude end if it falls on the boundary.
```pycon
>>> md.date_range(start='2017-01-01', end='2017-01-04', inclusive='left').execute()
DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'],
dtype='datetime64[ns]', freq='D')
```
Use `inclusive='right'` to exclude start if it falls on the boundary,
and similarly `inclusive='neither'` will exclude both start and end.
```pycon
>>> md.date_range(start='2017-01-01', end='2017-01-04', inclusive='right').execute()
DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'],
dtype='datetime64[ns]', freq='D')
```
#### NOTE
Pandas 1.4.0 or later is required to use `inclusive='neither'`.
Otherwise an error may be raised.
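The boundary handling described above can be sketched in plain Python. This is only an illustration of the `inclusive` semantics for a daily frequency, not MaxFrame's implementation; the helper name `daily_range` is hypothetical.

```python
from datetime import date, timedelta


def daily_range(start, end, inclusive="both"):
    """Hypothetical helper: daily dates from start to end, mimicking `inclusive`."""
    if inclusive not in {"both", "left", "right", "neither"}:
        raise ValueError(f"unsupported inclusive value: {inclusive!r}")
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Drop a boundary point only when the corresponding side is excluded.
    if inclusive in ("right", "neither") and days and days[0] == start:
        days = days[1:]
    if inclusive in ("left", "neither") and days and days[-1] == end:
        days = days[:-1]
    return days


start, end = date(2017, 1, 1), date(2017, 1, 4)
print([d.isoformat() for d in daily_range(start, end, "left")])
print([d.isoformat() for d in daily_range(start, end, "neither")])
```

With `"left"` the end date is dropped, and with `"neither"` both boundary dates are dropped, which mirrors the outputs shown in the examples above.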
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.eval.md
# maxframe.dataframe.eval
### maxframe.dataframe.eval(expr, parser='maxframe', engine=None, local_dict=None, global_dict=None, resolvers=(), level=0, target=None, inplace=False)
Evaluate a Python expression as a string using various backends.
The following arithmetic operations are supported: `+`, `-`, `*`,
`/`, `**`, `%`, `//` (python engine only) along with the following
boolean operations: `|` (or), `&` (and), and `~` (not).
Additionally, the `'pandas'` parser allows the use of [`and`](https://docs.python.org/3/reference/expressions.html#and),
[`or`](https://docs.python.org/3/reference/expressions.html#or), and [`not`](https://docs.python.org/3/reference/expressions.html#not) with the same semantics as the
corresponding bitwise operators. [`Series`](https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series) and
[`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) objects are supported and behave as they would
with plain ol’ Python evaluation.
* **Parameters:**
* **expr** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The expression to evaluate. This string cannot contain any Python
[statements](https://docs.python.org/3/reference/simple_stmts.html#simple-statements),
only Python [expressions](https://docs.python.org/3/reference/simple_stmts.html#expression-statements).
* **local_dict** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *or* *None* *,* *optional*) – A dictionary of local variables, taken from locals() by default.
* **global_dict** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *or* *None* *,* *optional*) – A dictionary of global variables, taken from globals() by default.
* **resolvers** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *dict-like* *or* *None* *,* *optional*) – A list of objects implementing the `__getitem__` special method that
you can use to inject an additional collection of namespaces to use for
variable lookup. For example, this is used in the
[`query()`](maxframe.dataframe.DataFrame.query.md#maxframe.dataframe.DataFrame.query) method to inject the
`DataFrame.index` and `DataFrame.columns`
variables that refer to their respective [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame)
instance attributes.
* **level** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The number of prior stack frames to traverse and add to the current
scope. Most users will **not** need to change this parameter.
* **target** ([*object*](https://docs.python.org/3/library/functions.html#object) *,* *optional* *,* *default None*) – This is the target object for assignment. It is used when there is
variable assignment in the expression. If so, then target must
support item assignment with string keys, and if a copy is being
returned, it must also support .copy().
* **inplace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If target is provided, and the expression mutates target, whether
to modify target inplace. Otherwise, return a copy of target with
the mutation.
* **Return type:**
ndarray, numeric scalar, [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame), [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – There are many instances where such an error can be raised:
- target=None, but the expression is multiline.
- The expression is multiline, but not all of them have item assignment.
An example of such an arrangement is this:
a = b + 1
a + 2
Here, there are expressions on different lines, making it multiline,
but the last line has no variable assigned to the output of a + 2.
- inplace=True, but the expression is missing item assignment.
- Item assignment is provided, but the target does not support
string item assignment.
- Item assignment is provided and inplace=False, but the target
does not support the .copy() method
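The multiline rule above can be illustrated with the standard `ast` module. This is a rough sketch of the rule only, and an assumption about how such a check could look; it is not how MaxFrame or pandas validates expressions, and `check_multiline` is a hypothetical name.

```python
import ast


def check_multiline(expr):
    """Hypothetical check: every line of a multiline expression must assign."""
    statements = ast.parse(expr).body
    if len(statements) > 1:
        for stmt in statements:
            if not isinstance(stmt, ast.Assign):
                raise ValueError(
                    "multiline expression requires assignment on every line"
                )
    return len(statements)


check_multiline("a = b + 1\nc = a + 2")      # accepted: both lines assign
try:
    check_multiline("a = b + 1\na + 2")      # rejected: last line has no target
except ValueError as exc:
    print(exc)
```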
#### SEE ALSO
[`DataFrame.query`](maxframe.dataframe.DataFrame.query.md#maxframe.dataframe.DataFrame.query)
: Evaluates a boolean expression to query the columns of a frame.
[`DataFrame.eval`](maxframe.dataframe.DataFrame.eval.md#maxframe.dataframe.DataFrame.eval)
: Evaluate a string describing operations on DataFrame columns.
### Notes
The `dtype` of any objects involved in an arithmetic `%` operation are
recursively cast to `float64`.
See the [enhancing performance](https://pandas.pydata.org/docs/user_guide/enhancingperf.html#enhancingperf-eval) documentation for
more details.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({"animal": ["dog", "pig"], "age": [10, 20]})
>>> df.execute()
animal age
0 dog 10
1 pig 20
```
We can add a new column using `md.eval`:
```pycon
>>> md.eval("double_age = df.age * 2", target=df).execute()
animal age double_age
0 dog 10 20
1 pig 20 40
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.extensions.apply_chunk.md
# maxframe.dataframe.extensions.apply_chunk
### Functions
| `df_apply_chunk`(dataframe, func[, ...]) | Apply a function that takes pandas DataFrame and outputs pandas DataFrame/Series. |
|-------------------------------------------------|-------------------------------------------------------------------------------------|
| `get_packed_func`(df, func, \*args, \*\*kwargs) | |
| `series_apply_chunk`(dataframe_or_series, func) | Apply a function that takes pandas Series and outputs pandas DataFrame/Series. |
### Classes
| `DataFrameApplyChunkOperator` | alias of `DataFrameApplyChunk` |
|---------------------------------|----------------------------------|
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.extensions.flatmap.md
# maxframe.dataframe.extensions.flatmap
### Functions
| `df_flatmap`(dataframe, func[, dtypes, raw, args]) | Apply the given function to each row and then flatten results. |
|------------------------------------------------------|------------------------------------------------------------------|
| `series_flatmap`(series, func[, dtypes, ...]) | Apply the given function to each row and then flatten results. |
### Classes
| `DataFrameFlatMapOperator`([output_types]) | |
|----------------------------------------------|----|
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.factorize.md
# maxframe.dataframe.factorize
### maxframe.dataframe.factorize(values, sort=False, use_na_sentinel=True)
Encode the object as an enumerated type or categorical variable.
This method is useful for obtaining a numeric representation of an
array when all that matters is identifying distinct values. factorize
is available as both a top-level function [`pandas.factorize()`](https://pandas.pydata.org/docs/reference/api/pandas.factorize.html#pandas.factorize),
and as a method [`Series.factorize()`](maxframe.dataframe.Series.factorize.md#maxframe.dataframe.Series.factorize) and [`Index.factorize()`](maxframe.dataframe.Index.factorize.md#maxframe.dataframe.Index.factorize).
* **Parameters:**
* **values** (*sequence*) – A 1-D sequence. Sequences that aren’t pandas objects are
coerced to ndarrays before factorization.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort uniques and shuffle codes to maintain the
relationship.
* **use_na_sentinel** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, the sentinel -1 will be used for NaN values. If False,
NaN values will be encoded as non-negative integers and will not drop the
NaN from the uniques of the values.
* **Returns:**
* **codes** (*ndarray*) – An integer ndarray that’s an indexer into uniques.
`uniques.take(codes)` will have the same values as values.
* **uniques** (*ndarray, Index, or Categorical*) – The unique valid values. When values is Categorical, uniques
is a Categorical. When values is some other pandas object, an
Index is returned. Otherwise, a 1-D ndarray is returned.
#### NOTE
Even if there’s a missing value in values, uniques will
*not* contain an entry for it.
#### SEE ALSO
`cut`
: Discretize continuous-valued array.
`unique`
: Find the unique value in an array.
### Notes
Reference [the user guide](https://pandas.pydata.org/docs/user_guide/reshaping.html#reshaping-factorize) for more examples.
### Examples
These examples all show factorize as a top-level function like
`md.factorize(values)`. The results are identical for methods like
[`Series.factorize()`](maxframe.dataframe.Series.factorize.md#maxframe.dataframe.Series.factorize).
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> codes, uniques = md.factorize(mt.array(['b', 'b', 'a', 'c', 'b'], dtype="O"))
>>> codes.execute()
array([0, 0, 1, 2, 0])
>>> uniques.execute()
array(['b', 'a', 'c'], dtype=object)
```
With `sort=True`, the uniques will be sorted, and codes will be
shuffled so that the relationship is maintained.
```pycon
>>> codes, uniques = md.factorize(mt.array(['b', 'b', 'a', 'c', 'b'], dtype="O"),
... sort=True)
>>> codes.execute()
array([1, 1, 0, 2, 1])
>>> uniques.execute()
array(['a', 'b', 'c'], dtype=object)
```
When `use_na_sentinel=True` (the default), missing values are indicated in
the codes with the sentinel value `-1` and missing values are not
included in uniques.
```pycon
>>> codes, uniques = md.factorize(mt.array(['b', None, 'a', 'c', 'b'], dtype="O"))
>>> codes.execute()
array([ 0, -1, 1, 2, 0])
>>> uniques.execute()
array(['b', 'a', 'c'], dtype=object)
```
Thus far, we’ve only factorized lists (which are internally coerced to
NumPy arrays). When factorizing pandas objects, the type of uniques
will differ. For Categoricals, a Categorical is returned.
```pycon
>>> cat = md.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
>>> codes, uniques = md.factorize(cat)
>>> codes.execute()
array([0, 0, 1])
>>> uniques.execute()
['a', 'c']
Categories (3, object): ['a', 'b', 'c']
```
Notice that `'b'` is in `uniques.categories`, despite not being
present in `cat.values`.
For all other pandas objects, an Index of the appropriate type is
returned.
```pycon
>>> cat = md.Series(['a', 'a', 'c'])
>>> codes, uniques = md.factorize(cat)
>>> codes.execute()
array([0, 0, 1])
>>> uniques.execute()
Index(['a', 'c'], dtype='object')
```
If NaN is in the values, and we want to include NaN in the uniques of the
values, it can be achieved by setting `use_na_sentinel=False`.
```pycon
>>> values = mt.array([1, 2, 1, mt.nan])
>>> codes, uniques = md.factorize(values) # default: use_na_sentinel=True
>>> codes.execute()
array([ 0, 1, 0, -1])
>>> uniques.execute()
array([1., 2.])
```
```pycon
>>> codes, uniques = md.factorize(values, use_na_sentinel=False)
>>> codes.execute()
array([0, 1, 0, 2])
>>> uniques.execute()
array([ 1., 2., nan])
```
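To make the relationship between codes and uniques concrete, here is a minimal pure-Python sketch of the encoding. It is an illustration under simplifying assumptions (plain lists, `None` or `float('nan')` as missing values, NaN always placed last when `use_na_sentinel=False`), not the MaxFrame or pandas implementation.

```python
import math


def factorize(values, sort=False, use_na_sentinel=True):
    """Hypothetical pure-Python factorize for a list of hashable values."""
    def is_na(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    uniques = []
    for v in values:
        if not is_na(v) and v not in uniques:
            uniques.append(v)          # keep order of first appearance
    if sort:
        uniques.sort()
    position = {v: i for i, v in enumerate(uniques)}
    has_na = any(is_na(v) for v in values)
    if has_na and not use_na_sentinel:
        na_code = len(uniques)         # simplification: NaN is always placed last
        uniques.append(float("nan"))
    else:
        na_code = -1                   # sentinel for missing values
    codes = [na_code if is_na(v) else position[v] for v in values]
    return codes, uniques


codes, uniques = factorize(["b", None, "a", "c", "b"])
print(codes, uniques)  # [0, -1, 1, 2, 0] ['b', 'a', 'c']
```

For every non-missing entry, `uniques[code]` recovers the original value, which is the invariant described in the Returns section.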
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.get_dummies.md
# maxframe.dataframe.get_dummies
### maxframe.dataframe.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Convert categorical variable into dummy/indicator variables.
* **Parameters:**
* **data** (*array-like* *,* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *, or* [*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)) – Data of which to get dummy indicators.
* **prefix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – String to append DataFrame column names.
Pass a list with length equal to the number of columns
when calling get_dummies on a DataFrame. Alternatively, prefix
can be a dictionary mapping column names to prefixes.
* **prefix_sep** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '_'*) – If appending prefix, separator/delimiter to use. Or pass a
list or dictionary as with prefix.
* **dummy_na** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Add a column to indicate NaNs, if False NaNs are ignored.
* **columns** (*list-like* *,* *default None*) – Column names in the DataFrame to be encoded.
If columns is None then all the columns with
object or category dtype will be converted.
* **sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether the dummy-encoded columns should be backed by
a `SparseArray` (True) or a regular NumPy array (False).
* **drop_first** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to get k-1 dummies out of k categorical levels by removing the
first level.
* **dtype** (*dtype* *,* *default bool*) – Data type for new columns. Only a single dtype is allowed.
* **Returns:**
Dummy-coded data.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import numpy as np
>>> import pandas as pd
>>> import maxframe.dataframe as md
>>> s = md.Series(list('abca'))
```
```pycon
>>> md.get_dummies(s).execute()
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
```
```pycon
>>> s1 = ['a', 'b', np.nan]
```
```pycon
>>> md.get_dummies(s1).execute()
a b
0 1 0
1 0 1
2 0 0
```
```pycon
>>> md.get_dummies(s1, dummy_na=True).execute()
a b NaN
0 1 0 0
1 0 1 0
2 0 0 1
```
```pycon
>>> df = md.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
... 'C': [1, 2, 3]})
```
```pycon
>>> md.get_dummies(df, prefix=['col1', 'col2']).execute()
C col1_a col1_b col2_a col2_b col2_c
0 1 1 0 0 1 0
1 2 0 1 1 0 0
2 3 1 0 0 0 1
```
```pycon
>>> md.get_dummies(pd.Series(list('abcaa'))).execute()
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
4 1 0 0
```
```pycon
>>> md.get_dummies(pd.Series(list('abcaa')), drop_first=True).execute()
b c
0 0 0
1 1 0
2 0 1
3 0 0
4 0 0
```
```pycon
>>> md.get_dummies(pd.Series(list('abc')), dtype=float).execute()
a b c
0 1.0 0.0 0.0
1 0.0 1.0 0.0
2 0.0 0.0 1.0
```
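The encoding shown in these examples can be sketched in plain Python. This is a simplified illustration for a single list of values, assuming `None` marks a missing value; it is not the MaxFrame implementation, and it returns the column labels and rows separately instead of a DataFrame.

```python
def get_dummies(values, drop_first=False, dummy_na=False):
    """Hypothetical one-hot encoder for a list; None marks a missing value."""
    levels = sorted({v for v in values if v is not None})
    if dummy_na:
        levels.append(None)          # extra indicator column for missing values
    if drop_first:
        levels = levels[1:]          # k-1 columns out of k levels
    rows = [[int(v == level) for level in levels] for v in values]
    return levels, rows


columns, rows = get_dummies(list("abcaa"), drop_first=True)
print(columns)  # ['b', 'c']
for row in rows:
    print(row)
```

Without `dummy_na`, a missing value produces a row of zeros, which matches the third row of the `md.get_dummies(s1)` example above.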
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.count.md
# maxframe.dataframe.groupby.DataFrameGroupBy.count
#### DataFrameGroupBy.count(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.cummax.md
# maxframe.dataframe.groupby.DataFrameGroupBy.cummax
#### DataFrameGroupBy.cummax()
Cumulative max for each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
For SeriesGroupBy:
```pycon
>>> import maxframe.dataframe as md
>>> lst = ['a', 'a', 'b']
>>> ser = md.Series([6, 2, 0], index=lst)
>>> ser.execute()
a 6
a 2
b 0
dtype: int64
>>> ser.groupby(level=0).cummax().execute()
a 6
a 6
b 0
```
For DataFrameGroupBy:
```pycon
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = md.DataFrame(data, columns=["a", "b", "c"],
... index=["fox", "gorilla", "lion"])
>>> df.execute()
a b c
fox 1 8 2
gorilla 1 2 5
lion 2 6 9
>>> df.groupby("a").groups.execute()
{1: ['fox', 'gorilla'], 2: ['lion']}
>>> df.groupby("a").cummax().execute()
b c
fox 8 2
gorilla 8 5
lion 6 9
```
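The per-group cumulative operations (`cummax`, `cummin`, `cumprod`, `cumsum`) all follow the same pattern: keep one running value per group and update it row by row. A minimal sketch in plain Python, with the hypothetical helper `grouped_cumulative` standing in for the grouped operation:

```python
def grouped_cumulative(keys, values, combine):
    """Hypothetical helper: running `combine` within each group, in row order."""
    running = {}
    out = []
    for key, value in zip(keys, values):
        if key in running:
            running[key] = combine(running[key], value)
        else:
            running[key] = value       # first row of a group starts the run
        out.append(running[key])
    return out


keys = ["a", "a", "b"]
values = [6, 2, 0]
print(grouped_cumulative(keys, values, max))                 # [6, 6, 0]
print(grouped_cumulative(keys, values, lambda x, y: x + y))  # [6, 8, 0]
```

The output keeps the original row order, as in the SeriesGroupBy example above.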
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.cummin.md
# maxframe.dataframe.groupby.DataFrameGroupBy.cummin
#### DataFrameGroupBy.cummin()
Cumulative min for each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
For SeriesGroupBy:
```pycon
>>> import maxframe.dataframe as md
>>> lst = ['a', 'a', 'b']
>>> ser = md.Series([6, 2, 0], index=lst)
>>> ser.execute()
a 6
a 2
b 0
dtype: int64
>>> ser.groupby(level=0).cummin().execute()
a 6
a 2
b 0
```
For DataFrameGroupBy:
```pycon
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = md.DataFrame(data, columns=["a", "b", "c"],
... index=["fox", "gorilla", "lion"])
>>> df.execute()
a b c
fox 1 8 2
gorilla 1 2 5
lion 2 6 9
>>> df.groupby("a").groups.execute()
{1: ['fox', 'gorilla'], 2: ['lion']}
>>> df.groupby("a").cummin().execute()
b c
fox 8 2
gorilla 2 2
lion 6 9
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.cumprod.md
# maxframe.dataframe.groupby.DataFrameGroupBy.cumprod
#### DataFrameGroupBy.cumprod()
Cumulative prod for each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
For SeriesGroupBy:
```pycon
>>> import maxframe.dataframe as md
>>> lst = ['a', 'a', 'b']
>>> ser = md.Series([6, 2, 0], index=lst)
>>> ser.execute()
a 6
a 2
b 0
dtype: int64
>>> ser.groupby(level=0).cumprod().execute()
a 6
a 12
b 0
```
For DataFrameGroupBy:
```pycon
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = md.DataFrame(data, columns=["a", "b", "c"],
... index=["fox", "gorilla", "lion"])
>>> df.execute()
a b c
fox 1 8 2
gorilla 1 2 5
lion 2 6 9
>>> df.groupby("a").groups.execute()
{1: ['fox', 'gorilla'], 2: ['lion']}
>>> df.groupby("a").cumprod().execute()
b c
fox 8 2
gorilla 16 10
lion 6 9
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.cumsum.md
# maxframe.dataframe.groupby.DataFrameGroupBy.cumsum
#### DataFrameGroupBy.cumsum()
Cumulative sum for each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
For SeriesGroupBy:
```pycon
>>> import maxframe.dataframe as md
>>> lst = ['a', 'a', 'b']
>>> ser = md.Series([6, 2, 0], index=lst)
>>> ser.execute()
a 6
a 2
b 0
dtype: int64
>>> ser.groupby(level=0).cumsum().execute()
a 6
a 8
b 0
```
For DataFrameGroupBy:
```pycon
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = md.DataFrame(data, columns=["a", "b", "c"],
... index=["fox", "gorilla", "lion"])
>>> df.execute()
a b c
fox 1 8 2
gorilla 1 2 5
lion 2 6 9
>>> df.groupby("a").groups.execute()
{1: ['fox', 'gorilla'], 2: ['lion']}
>>> df.groupby("a").cumsum().execute()
b c
fox 8 2
gorilla 10 7
lion 6 9
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.fillna.md
# maxframe.dataframe.groupby.DataFrameGroupBy.fillna
#### DataFrameGroupBy.fillna(value=None, method=None, axis=None, limit=None, downcast=None)
Fill NA/NaN values using the specified method.
* **Parameters:**
* **value** (*scalar, dict, Series, or DataFrame*) – Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame
of values specifying which value to use for each index (for a Series) or
column (for a DataFrame). Values not in the dict/Series/DataFrame
will not be filled. This value cannot be a list.
* **method** (*{'backfill', 'bfill', 'ffill', None}, default None*)
* **axis** (*{0 or 'index', 1 or 'column'}*)
* **limit** (*int, default None*) – If method is specified, this is the maximum number of consecutive
NaN values to forward/backward fill.
* **downcast** (*dict, default None*) – A dict of item->dtype of what to downcast if possible,
or the string 'infer' which will try to downcast to an appropriate equal type.
* **Return type:**
DataFrame or None
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.idxmax.md
# maxframe.dataframe.groupby.DataFrameGroupBy.idxmax
#### DataFrameGroupBy.idxmax(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.idxmin.md
# maxframe.dataframe.groupby.DataFrameGroupBy.idxmin
#### DataFrameGroupBy.idxmin(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.mf.apply_chunk.md
# maxframe.dataframe.groupby.DataFrameGroupBy.mf.apply_chunk
#### DataFrameGroupBy.mf.apply_chunk(func: [str](https://docs.python.org/3/library/stdtypes.html#str) | [Callable](https://docs.python.org/3/library/typing.html#typing.Callable), batch_rows=None, dtypes=None, dtype=None, name=None, output_type=None, index=None, skip_infer=False, order_cols=None, ascending=True, args=(), \*\*kwargs)
Apply function func group-wise and combine the results together.
The pandas DataFrame given to the function is a chunk of the input
dataframe, considered as a batch of rows.
The function passed to apply must take a dataframe as its first
argument and return a DataFrame, Series or scalar. apply will
then take care of combining the results back together into a single
dataframe or series. apply is therefore a highly flexible
grouping method.
Don’t expect to receive all rows of the DataFrame in the function,
as it depends on the implementation of MaxFrame and the internal
running state of MaxCompute.
* **Parameters:**
* **func** (*callable*) – A callable that takes a dataframe as its first argument, and
returns a dataframe, a series or a scalar. In addition the
callable may take positional and keyword arguments.
* **batch_rows** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Expected number of rows in a batch, which is also the length of the
DataFrame passed to the function. When the remaining data is insufficient,
a batch may contain fewer rows.
* **output_type** ( *{'dataframe'* *,* *'series'}* *,* *default None*) – Specify type of returned object. See Notes for more details.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames. See Notes for more details.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of returned Series. See Notes for more details.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of returned Series. See Notes for more details.
* **index** ([*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index) *,* *default None*) – Specify index of returned object. See Notes for more details.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip inferring dtypes when dtypes or output_type is not specified.
* **args** (*tuple*) – Optional positional arguments to pass to func.
* **kwargs** (*dict*) – Optional keyword arguments to pass to func.
* **Returns:**
**applied**
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.apply`
: Apply a function to a Series.
`DataFrame.apply`
: Apply a function to each row or column of a DataFrame.
`DataFrame.mf.apply_chunk`
: Apply a function to row batches of a DataFrame.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `func` onto a mock grouped object, and the apply call
may fail. When this happens, you need to specify the type of apply
call (DataFrame or Series) in output_type.
* For DataFrame output, you need to specify a list or a pandas Series
as `dtypes` of output DataFrame. `index` of output can also be
specified.
* For Series output, you need to specify `dtype` and `name` of
output Series.
MaxFrame adopts expected behavior of pandas>=3.0 by ignoring group columns
in user function input. If you still need a group column for your function
input, try selecting it right after groupby results, for instance,
`df.groupby("A")[["A", "B", "C"]].mf.apply_chunk(func)` will pass data of
column A into `func`.
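As a mental model for `batch_rows`, the following plain-Python sketch splits each group into batches and calls the function once per batch. It is a simplification built on assumptions: the real batching depends on MaxFrame and the running state of MaxCompute, rows here are plain dicts rather than a pandas DataFrame, and the sketch does not drop the group column. The name `apply_chunk_sketch` is hypothetical.

```python
def apply_chunk_sketch(rows, key, func, batch_rows):
    """Hypothetical model: call func on batches of up to batch_rows rows per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    results = []
    for group_rows in groups.values():
        for start in range(0, len(group_rows), batch_rows):
            batch = group_rows[start:start + batch_rows]  # last batch may be shorter
            results.extend(func(batch))
    return results


rows = [{"A": 1, "B": b} for b in range(5)] + [{"A": 2, "B": 9}]
sizes = apply_chunk_sketch(rows, "A", lambda batch: [len(batch)], batch_rows=2)
print(sizes)  # [2, 2, 1, 1]
```

The point of the sketch is that `func` never sees a whole group at once unless the group fits in a single batch, so it should not rely on group-wide state.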
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.nunique.md
# maxframe.dataframe.groupby.DataFrameGroupBy.nunique
#### DataFrameGroupBy.nunique(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.rank.md
# maxframe.dataframe.groupby.DataFrameGroupBy.rank
#### DataFrameGroupBy.rank(method='average', ascending=True, na_option='keep', pct=False)
Provide the rank of values within each group.
* **Parameters:**
* **method** ( *{'average'* *,* *'min'* *,* *'max'* *,* *'first'* *,* *'dense'}* *,* *default 'average'*) –
* average: average rank of group.
* min: lowest rank in group.
* max: highest rank in group.
* first: ranks assigned in order they appear in the array.
* dense: like ‘min’, but rank always increases by 1 between groups.
* **ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – False for ranks by high (1) to low (N).
* **na_option** ( *{'keep'* *,* *'top'* *,* *'bottom'}* *,* *default 'keep'*) –
* keep: leave NA values where they are.
* top: smallest rank if ascending.
* bottom: smallest rank if descending.
* **pct** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Compute percentage rank of data within each group.
* **Return type:**
DataFrame with ranking of values within each group
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... {
... "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"],
... "value": [2, 4, 2, 3, 5, 1, 2, 4, 1, 5],
... }
... )
>>> df.execute()
group value
0 a 2
1 a 4
2 a 2
3 a 3
4 a 5
5 b 1
6 b 2
7 b 4
8 b 1
9 b 5
>>> for method in ['average', 'min', 'max', 'dense', 'first']:
... df[f'{method}_rank'] = df.groupby('group')['value'].rank(method)
>>> df.execute()
group value average_rank min_rank max_rank dense_rank first_rank
0 a 2 1.5 1.0 2.0 1.0 1.0
1 a 4 4.0 4.0 4.0 3.0 4.0
2 a 2 1.5 1.0 2.0 1.0 2.0
3 a 3 3.0 3.0 3.0 2.0 3.0
4 a 5 5.0 5.0 5.0 4.0 5.0
5 b 1 1.5 1.0 2.0 1.0 1.0
6 b 2 3.0 3.0 3.0 2.0 3.0
7 b 4 4.0 4.0 4.0 3.0 4.0
8 b 1 1.5 1.0 2.0 1.0 2.0
9 b 5 5.0 5.0 5.0 4.0 5.0
```
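The five tie-breaking methods can be reproduced for a single group with a short plain-Python sketch. It is an illustration of the ranking rules only (ascending order, no missing values, quadratic time), not MaxFrame's implementation; `rank` here is a hypothetical stand-alone function.

```python
def rank(values, method="average"):
    """Hypothetical ascending rank of a list of numbers within one group."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    distinct = sorted(set(values))
    ranks = [0.0] * len(values)
    for i, v in enumerate(values):
        # 1-based sorted positions occupied by all entries tied with v
        positions = [p + 1 for p, j in enumerate(order) if values[j] == v]
        if method == "average":
            ranks[i] = sum(positions) / len(positions)
        elif method == "min":
            ranks[i] = float(positions[0])
        elif method == "max":
            ranks[i] = float(positions[-1])
        elif method == "dense":
            ranks[i] = float(distinct.index(v) + 1)
        elif method == "first":
            ranks[i] = float(order.index(i) + 1)
        else:
            raise ValueError(f"unknown method: {method!r}")
    return ranks


group_a = [2, 4, 2, 3, 5]
for method in ["average", "min", "max", "dense", "first"]:
    print(method, rank(group_a, method))
```

Running it on group `a` reproduces the first five rows of the table above, for example `[1.5, 4.0, 1.5, 3.0, 5.0]` for `average`.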
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.DataFrameGroupBy.sample.md
# maxframe.dataframe.groupby.DataFrameGroupBy.sample
#### DataFrameGroupBy.sample(n: [int](https://docs.python.org/3/library/functions.html#int) | [None](https://docs.python.org/3/library/constants.html#None) = None, frac: [float](https://docs.python.org/3/library/functions.html#float) | [None](https://docs.python.org/3/library/constants.html#None) = None, replace: [bool](https://docs.python.org/3/library/functions.html#bool) = False, weights: [Sequence](https://docs.python.org/3/library/typing.html#typing.Sequence) | Series | [None](https://docs.python.org/3/library/constants.html#None) = None, random_state: [RandomState](https://numpy.org/doc/stable/reference/random/legacy.html#numpy.random.RandomState) | [None](https://docs.python.org/3/library/constants.html#None) = None, errors: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'ignore')
Return a random sample of items from each group.
You can use random_state for reproducibility.
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of items to return for each group. Cannot be used with
frac and must be no larger than the smallest group unless
replace is True. Default is one if frac is None.
* **frac** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Fraction of items to return. Cannot be used with n.
* **replace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Allow or disallow sampling of the same row more than once.
* **weights** (*list-like* *,* *optional*) – Default None results in equal probability weighting.
If passed a list-like then values must have the same length as
the underlying DataFrame or Series object and will be used as
sampling probabilities after normalization within each group.
Values must be non-negative with at least one positive element
within each group.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *array-like* *,* *BitGenerator* *,* *np.random.RandomState* *,* *optional*) – If int, array-like, or BitGenerator (NumPy>=1.17), seed for
random number generator
If np.random.RandomState, use as numpy RandomState object.
* **errors** ( *{'ignore'* *,* *'raise'}* *,* *default 'ignore'*) – If ignore, errors will not be raised when replace is False
and size of some group is less than n.
* **Returns:**
A new object of same type as caller containing items randomly
sampled within each group from the caller object.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`DataFrame.sample`
: Generate random samples from a DataFrame object.
[`numpy.random.choice`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html#numpy.random.choice)
: Generate a random sample from a given 1-D numpy array.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... {"a": ["red"] * 2 + ["blue"] * 2 + ["black"] * 2, "b": range(6)}
... )
>>> df.execute()
a b
0 red 0
1 red 1
2 blue 2
3 blue 3
4 black 4
5 black 5
```
Select one row at random for each distinct value in column a. The
random_state argument can be used to guarantee reproducibility:
```pycon
>>> df.groupby("a").sample(n=1, random_state=1).execute()
a b
4 black 4
2 blue 2
1 red 1
```
Set frac to sample fixed proportions rather than counts:
```pycon
>>> df.groupby("a")["b"].sample(frac=0.5, random_state=2).execute()
5 5
2 2
0 0
Name: b, dtype: int64
```
Control sample probabilities within groups by setting weights:
```pycon
>>> df.groupby("a").sample(
... n=1,
... weights=[1, 1, 1, 0, 0, 1],
... random_state=1,
... ).execute()
a b
5 black 5
2 blue 2
0 red 0
```
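The normalization of `weights` within each group can be pictured with a short plain-Python sketch. It only illustrates the documented rule (non-negative weights, at least one positive weight per group, probabilities summing to 1 inside each group) and is not MaxFrame's implementation; the helper name `group_sampling_probabilities` is hypothetical.

```python
from collections import defaultdict


def group_sampling_probabilities(keys, weights):
    """Normalize weights so they sum to 1 within each group.

    Hypothetical helper for illustration; not part of MaxFrame.
    """
    if len(keys) != len(weights):
        raise ValueError("weights must have the same length as the data")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    # Sum the weights of every group.
    totals = defaultdict(float)
    for key, weight in zip(keys, weights):
        totals[key] += weight
    if any(total == 0 for total in totals.values()):
        raise ValueError("each group needs at least one positive weight")

    # Divide each row's weight by its group's total.
    return [weight / totals[key] for key, weight in zip(keys, weights)]


keys = ["red", "red", "blue", "blue", "black", "black"]
probs = group_sampling_probabilities(keys, [1, 1, 1, 0, 0, 1])
print(probs)  # [0.5, 0.5, 1.0, 0.0, 0.0, 1.0]
```

With these weights, row 3 (blue) and row 4 (black) have probability 0, which agrees with the example above selecting rows 2 and 5 for those groups.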
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.agg.md
# maxframe.dataframe.groupby.GroupBy.agg
#### GroupBy.agg(func=None, method='auto', \*args, \*\*kwargs)
Aggregate using one or more operations on grouped data.
* **Parameters:**
* **groupby** (*MaxFrame Groupby*) – Groupby data.
* **func** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *list-like*) – Aggregation functions.
* **method** ( *{'auto'* *,* *'shuffle'* *,* *'tree'}* *,* *default 'auto'*) – The ‘tree’ method provides better performance, ‘shuffle’ is recommended
if the aggregated result is very large, and ‘auto’ uses the ‘shuffle’ method
in distributed mode and ‘tree’ in local mode.
* **Returns:**
Aggregated result.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... {
... "A": [1, 1, 2, 2],
... "B": [1, 2, 3, 4],
... "C": [0.362838, 0.227877, 1.267767, -0.562860],
... }
... )
>>> df.execute()
A B C
0 1 1 0.362838
1 1 2 0.227877
2 2 3 1.267767
3 2 4 -0.562860
```
The aggregation is for each column.
```pycon
>>> df.groupby('A').agg('min').execute()
B C
A
1 1 0.227877
2 3 -0.562860
```
Multiple aggregations.
```pycon
>>> df.groupby('A').agg(['min', 'max']).execute()
B C
min max min max
A
1 1 2 0.227877 0.362838
2 3 4 -0.562860 1.267767
```
Different aggregations per column
```pycon
>>> df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'}).execute()
B C
min max sum
A
1 1 2 0.590715
2 3 4 0.704907
```
To control the output names with different aggregations per column,
MaxFrame supports “named aggregation”.
```pycon
>>> from maxframe.dataframe import NamedAgg
>>> df.groupby("A").agg(
... b_min=NamedAgg(column="B", aggfunc="min"),
... c_sum=NamedAgg(column="C", aggfunc="sum")).execute()
b_min c_sum
A
1 1 0.590715
2 3 0.704907
```
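As a rough model of what the dict form of `agg` computes, the sketch below groups plain dict rows and applies the requested functions per column. It is a plain-Python illustration under the assumption that group keys are returned in sorted order; `group_agg` and `AGG_FUNCS` are hypothetical names, not MaxFrame API.

```python
from collections import defaultdict

# Name-to-function table for the aggregations used in this sketch.
AGG_FUNCS = {"min": min, "max": max, "sum": sum}


def group_agg(rows, by, spec):
    """Aggregate a list of dict rows per group.

    `spec` maps a column name to one aggregation name or a list of names,
    mirroring the dict form shown above. Hypothetical helper, not MaxFrame API.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)

    result = {}
    for key in sorted(groups):
        out = {}
        for column, names in spec.items():
            if isinstance(names, str):
                names = [names]
            values = [row[column] for row in groups[key]]
            for name in names:
                out[(column, name)] = AGG_FUNCS[name](values)
        result[key] = out
    return result


rows = [
    {"A": 1, "B": 1, "C": 0.362838},
    {"A": 1, "B": 2, "C": 0.227877},
    {"A": 2, "B": 3, "C": 1.267767},
    {"A": 2, "B": 4, "C": -0.562860},
]
result = group_agg(rows, "A", {"B": ["min", "max"], "C": "sum"})
for key, values in result.items():
    print(key, values)
```

The `(column, name)` tuples play the role of the two-level column header in the output above.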
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.aggregate.md
# maxframe.dataframe.groupby.GroupBy.aggregate
#### GroupBy.aggregate(func=None, method='auto', \*args, \*\*kwargs)
Aggregate using one or more operations on grouped data.
* **Parameters:**
* **groupby** (*MaxFrame Groupby*) – Groupby data.
* **func** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* *list-like*) – Aggregation functions.
* **method** ( *{'auto'* *,* *'shuffle'* *,* *'tree'}* *,* *default 'auto'*) – The ‘tree’ method provides better performance, ‘shuffle’ is recommended
if the aggregated result is very large, and ‘auto’ uses the ‘shuffle’ method
in distributed mode and ‘tree’ in local mode.
* **Returns:**
Aggregated result.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame(
... {
... "A": [1, 1, 2, 2],
... "B": [1, 2, 3, 4],
... "C": [0.362838, 0.227877, 1.267767, -0.562860],
... }
... )
>>> df.execute()
A B C
0 1 1 0.362838
1 1 2 0.227877
2 2 3 1.267767
3 2 4 -0.562860
```
The aggregation is for each column.
```pycon
>>> df.groupby('A').agg('min').execute()
B C
A
1 1 0.227877
2 3 -0.562860
```
Multiple aggregations.
```pycon
>>> df.groupby('A').agg(['min', 'max']).execute()
B C
min max min max
A
1 1 2 0.227877 0.362838
2 3 4 -0.562860 1.267767
```
Different aggregations per column
```pycon
>>> df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'}).execute()
B C
min max sum
A
1 1 2 0.590715
2 3 4 0.704907
```
To control the output names with different aggregations per column,
MaxFrame supports “named aggregation”.
```pycon
>>> from maxframe.dataframe import NamedAgg
>>> df.groupby("A").agg(
... b_min=NamedAgg(column="B", aggfunc="min"),
... c_sum=NamedAgg(column="C", aggfunc="sum")).execute()
b_min c_sum
A
1 1 0.590715
2 3 0.704907
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.all.md
# maxframe.dataframe.groupby.GroupBy.all
#### GroupBy.all(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.any.md
# maxframe.dataframe.groupby.GroupBy.any
#### GroupBy.any(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.apply.md
# maxframe.dataframe.groupby.GroupBy.apply
#### GroupBy.apply(func, \*args, output_type=None, dtypes=None, dtype=None, name=None, index=None, skip_infer=None, \*\*kwargs)
Apply function func group-wise and combine the results together.
The function passed to apply must take a dataframe as its first
argument and return a DataFrame, Series or scalar. apply will
then take care of combining the results back together into a single
dataframe or series. apply is therefore a highly flexible
grouping method.
While apply is a very flexible method, its downside is that
using it can be quite a bit slower than using more specific methods
like agg or transform. Pandas offers a wide range of methods that will
be much faster than using apply for their specific purposes, so try to
use them before reaching for apply.
* **Parameters:**
* **func** (*callable*) – A callable that takes a dataframe as its first argument, and
returns a dataframe, a series or a scalar. In addition the
callable may take positional and keyword arguments.
* **output_type** ( *{'dataframe'* *,* *'series'}* *,* *default None*) – Specify type of returned object. See Notes for more details.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames. See Notes for more details.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of returned Series. See Notes for more details.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of returned Series. See Notes for more details.
* **index** ([*Index*](maxframe.dataframe.Index.md#maxframe.dataframe.Index) *,* *default None*) – Specify index of returned object. See Notes for more details.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when dtypes or output_type is not specified.
* **args** (*tuple*) – Optional positional arguments to pass to func.
* **kwargs** (*dict*) – Optional keyword arguments to pass to func.
* **Returns:**
**applied**
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`pipe`
: Apply function to the full GroupBy object instead of to each group.
[`aggregate`](maxframe.dataframe.groupby.GroupBy.aggregate.md#maxframe.dataframe.groupby.GroupBy.aggregate)
: Apply aggregate function to the GroupBy object.
[`transform`](maxframe.dataframe.groupby.GroupBy.transform.md#maxframe.dataframe.groupby.GroupBy.transform)
: Apply function column-by-column to the GroupBy object.
`Series.apply`
: Apply a function to a Series.
`DataFrame.apply`
: Apply a function to each row or column of a DataFrame.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `func` onto a mock grouped object, and the apply call
may fail. When this happens, you need to specify the type of apply
call (DataFrame or Series) in output_type.
* For DataFrame output, you need to specify a list or a pandas Series
as `dtypes` of output DataFrame. `index` of output can also be
specified.
* For Series output, you need to specify `dtype` and `name` of
output Series.
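The group-wise behaviour of `apply` can be pictured as split, apply, combine. The sketch below is a plain-Python mental model only, with a hypothetical helper `group_apply`; it does not use or reproduce MaxFrame internals. It also shows that the shape of the combined result depends on what `func` returns, which is presumably why the output type sometimes has to be stated explicitly.

```python
from collections import defaultdict


def group_apply(rows, by, func):
    """Split rows by a key column, call func per group, collect the results.

    Hypothetical helper for illustration; not part of MaxFrame.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)  # split
    # apply + combine: one result per group key
    return {key: func(group) for key, group in groups.items()}


rows = [
    {"a": "x", "b": 1},
    {"a": "x", "b": 3},
    {"a": "y", "b": 10},
]

# func receives the whole group and may return a scalar ...
spans = group_apply(
    rows, "a", lambda g: max(r["b"] for r in g) - min(r["b"] for r in g)
)
# ... or a new collection of rows.
doubled = group_apply(rows, "a", lambda g: [{"b": 2 * r["b"]} for r in g])

print(spans)    # {'x': 2, 'y': 0}
print(doubled)  # {'x': [{'b': 2}, {'b': 6}], 'y': [{'b': 20}]}
```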
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.count.md
# maxframe.dataframe.groupby.GroupBy.count
#### GroupBy.count(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.cumcount.md
# maxframe.dataframe.groupby.GroupBy.cumcount
#### GroupBy.cumcount(ascending: [bool](https://docs.python.org/3/library/functions.html#bool) = True)
Number each item in each group from 0 to the length of that group - 1.
Essentially this is equivalent to
```python
self.apply(lambda x: pd.Series(np.arange(len(x)), x.index))
```
* **Parameters:**
**ascending** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If False, number in reverse, from length of group - 1 to 0.
* **Returns:**
Sequence number of each element within each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
#### SEE ALSO
`ngroup`
: Number the groups themselves.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']],
... columns=['A'])
>>> df.execute()
A
0 a
1 a
2 a
3 b
4 b
5 a
>>> df.groupby('A').cumcount().execute()
0 0
1 1
2 2
3 0
4 1
5 3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False).execute()
0 3
1 2
2 1
3 1
4 0
5 0
dtype: int64
```
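For readers who prefer to see the numbering rule spelled out, here is a plain-Python model of the behaviour shown above. The function is a sketch written for this page, not MaxFrame code.

```python
from collections import Counter


def cumcount(keys, ascending=True):
    """Number each item within its group, starting at 0.

    Plain-Python model of the documented behaviour; not MaxFrame code.
    """
    seen = Counter()
    forward = []
    for key in keys:
        forward.append(seen[key])  # how many earlier rows share this key
        seen[key] += 1
    if ascending:
        return forward
    # Reverse numbering: group length - 1 down to 0.
    return [seen[key] - 1 - pos for key, pos in zip(keys, forward)]


keys = ["a", "a", "a", "b", "b", "a"]
print(cumcount(keys))                   # [0, 1, 2, 0, 1, 3]
print(cumcount(keys, ascending=False))  # [3, 2, 1, 1, 0, 0]
```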
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.cummax.md
# maxframe.dataframe.groupby.GroupBy.cummax
#### GroupBy.cummax()
Cumulative max for each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
For SeriesGroupBy:
```pycon
>>> import maxframe.dataframe as md
>>> lst = ['a', 'a', 'b']
>>> ser = md.Series([6, 2, 0], index=lst)
>>> ser.execute()
a 6
a 2
b 0
dtype: int64
>>> ser.groupby(level=0).cummax().execute()
a 6
a 6
b 0
```
For DataFrameGroupBy:
```pycon
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = md.DataFrame(data, columns=["a", "b", "c"],
... index=["fox", "gorilla", "lion"])
>>> df.execute()
a b c
fox 1 8 2
gorilla 1 2 5
lion 2 6 9
>>> df.groupby("a").groups.execute()
{1: ['fox', 'gorilla'], 2: ['lion']}
>>> df.groupby("a").cummax().execute()
b c
fox 8 2
gorilla 8 5
lion 6 9
```
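The cumulative methods share one pattern: a running reduction that restarts for every group. A minimal plain-Python sketch of that pattern follows; `group_cumulative` is a hypothetical helper and says nothing about how MaxFrame executes the operation.

```python
import operator


def group_cumulative(keys, values, op):
    """Running reduction of `values` that restarts for every distinct key.

    Plain-Python model for illustration; not MaxFrame code.
    """
    running = {}
    out = []
    for key, value in zip(keys, values):
        # First row of a group starts the accumulation; later rows combine.
        running[key] = value if key not in running else op(running[key], value)
        out.append(running[key])
    return out


keys = ["a", "a", "b"]
values = [6, 2, 0]
print(group_cumulative(keys, values, max))           # [6, 6, 0]
print(group_cumulative(keys, values, min))           # [6, 2, 0]
print(group_cumulative(keys, values, operator.add))  # [6, 8, 0]
print(group_cumulative(keys, values, operator.mul))  # [6, 12, 0]
```

Passing `max` reproduces the SeriesGroupBy result above; `min`, `operator.add` and `operator.mul` correspond to `cummin`, `cumsum` and `cumprod` on the same data.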
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.cummin.md
# maxframe.dataframe.groupby.GroupBy.cummin
#### GroupBy.cummin()
Cumulative min for each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
For SeriesGroupBy:
```pycon
>>> import maxframe.dataframe as md
>>> lst = ['a', 'a', 'b']
>>> ser = md.Series([6, 2, 0], index=lst)
>>> ser.execute()
a 6
a 2
b 0
dtype: int64
>>> ser.groupby(level=0).cummin().execute()
a 6
a 2
b 0
```
For DataFrameGroupBy:
```pycon
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = md.DataFrame(data, columns=["a", "b", "c"],
... index=["fox", "gorilla", "lion"])
>>> df.execute()
a b c
fox 1 8 2
gorilla 1 2 5
lion 2 6 9
>>> df.groupby("a").groups.execute()
{1: ['fox', 'gorilla'], 2: ['lion']}
>>> df.groupby("a").cummin().execute()
b c
fox 8 2
gorilla 2 2
lion 6 9
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.cumprod.md
# maxframe.dataframe.groupby.GroupBy.cumprod
#### GroupBy.cumprod()
Cumulative prod for each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
For SeriesGroupBy:
```pycon
>>> import maxframe.dataframe as md
>>> lst = ['a', 'a', 'b']
>>> ser = md.Series([6, 2, 0], index=lst)
>>> ser.execute()
a 6
a 2
b 0
dtype: int64
>>> ser.groupby(level=0).cumprod().execute()
a 6
a 12
b 0
```
For DataFrameGroupBy:
```pycon
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = md.DataFrame(data, columns=["a", "b", "c"],
... index=["fox", "gorilla", "lion"])
>>> df.execute()
a b c
fox 1 8 2
gorilla 1 2 5
lion 2 6 9
>>> df.groupby("a").groups.execute()
{1: ['fox', 'gorilla'], 2: ['lion']}
>>> df.groupby("a").cumprod().execute()
b c
fox 8 2
gorilla 16 10
lion 6 9
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.cumsum.md
# maxframe.dataframe.groupby.GroupBy.cumsum
#### GroupBy.cumsum()
Cumulative sum for each group.
* **Return type:**
[Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) or [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
### Examples
For SeriesGroupBy:
```pycon
>>> import maxframe.dataframe as md
>>> lst = ['a', 'a', 'b']
>>> ser = md.Series([6, 2, 0], index=lst)
>>> ser.execute()
a 6
a 2
b 0
dtype: int64
>>> ser.groupby(level=0).cumsum().execute()
a 6
a 8
b 0
```
For DataFrameGroupBy:
```pycon
>>> data = [[1, 8, 2], [1, 2, 5], [2, 6, 9]]
>>> df = md.DataFrame(data, columns=["a", "b", "c"],
... index=["fox", "gorilla", "lion"])
>>> df.execute()
a b c
fox 1 8 2
gorilla 1 2 5
lion 2 6 9
>>> df.groupby("a").groups.execute()
{1: ['fox', 'gorilla'], 2: ['lion']}
>>> df.groupby("a").cumsum().execute()
b c
fox 8 2
gorilla 10 7
lion 6 9
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.expanding.md
# maxframe.dataframe.groupby.GroupBy.expanding
#### GroupBy.expanding(min_periods=1, \*, shift=0, reverse_range=False, order_cols=None, ascending=True)
Return an expanding grouper, providing expanding
functionality per group.
* **Parameters:**
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 1*) – Minimum number of observations in window required to have a value;
otherwise, result is `np.nan`.
* **shift** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0*) – If specified, the window will be shifted by shift rows (or data will be
shifted by -shift rows) before computing window function.
* **reverse_range** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, the window for current row is expanded from the last row to
the current instead of the first row.
* **Return type:**
maxframe.dataframe.groupby.ExpandingGroupby
#### SEE ALSO
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby to each row or column of a DataFrame.
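To make the role of `min_periods` concrete, the following plain-Python sketch computes an expanding sum separately for each group. It is an illustration only: `None` stands in for `np.nan`, the `shift` and `reverse_range` options are deliberately not modelled, and `group_expanding_sum` is a hypothetical name.

```python
from collections import defaultdict


def group_expanding_sum(keys, values, min_periods=1):
    """Expanding (cumulative-window) sum computed separately per group.

    Returns None where fewer than `min_periods` observations are available.
    Plain-Python model; `shift` and `reverse_range` are not modelled here.
    """
    history = defaultdict(list)
    out = []
    for key, value in zip(keys, values):
        history[key].append(value)  # window grows by one row per step
        window = history[key]
        out.append(sum(window) if len(window) >= min_periods else None)
    return out


keys = [1, 1, 1, 2, 2]
values = [1, 2, 3, 10, 20]
print(group_expanding_sum(keys, values))                 # [1, 3, 6, 10, 30]
print(group_expanding_sum(keys, values, min_periods=2))  # [None, 3, 6, None, 30]
```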
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.max.md
# maxframe.dataframe.groupby.GroupBy.max
#### GroupBy.max(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.mean.md
# maxframe.dataframe.groupby.GroupBy.mean
#### GroupBy.mean(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.median.md
# maxframe.dataframe.groupby.GroupBy.median
#### GroupBy.median(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.min.md
# maxframe.dataframe.groupby.GroupBy.min
#### GroupBy.min(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.rolling.md
# maxframe.dataframe.groupby.GroupBy.rolling
#### GroupBy.rolling(window, min_periods=None, \*, center=False, win_type=None, on=None, axis=0, closed=None, shift=0, order_cols=None, ascending=True) → RollingGroupby
Return a rolling grouper, providing rolling functionality per group.
* **Parameters:**
* **window** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *timedelta* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *offset* *, or* *BaseIndexer subclass*) –
Size of the moving window.
If an integer, the fixed number of observations used for
each window.
If a timedelta, str, or offset, the time period of each window. Each
window will be variable-sized, based on the observations included in
the time period. This is only valid for datetimelike indexes.
To learn more about the offsets & frequency strings, please see [this link](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
If a BaseIndexer subclass, the window boundaries
based on the defined `get_window_bounds` method. Additional rolling
keyword arguments, namely `min_periods`, `center`, `closed` and
`step` will be passed to `get_window_bounds`.
* **min_periods** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default None*) –
Minimum number of observations in window required to have a value;
otherwise, result is `np.nan`.
For a window that is specified by an offset,
`min_periods` will default to 1.
For a window that is specified by an integer, `min_periods` will default
to the size of the window.
* **center** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) –
If False, set the window labels as the right edge of the window index.
If True, set the window labels as the center of the window index.
* **win_type** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) –
If `None`, all points are evenly weighted.
If a string, it must be a valid [scipy.signal window function](https://docs.scipy.org/doc/scipy/reference/signal.windows.html#module-scipy.signal.windows).
Certain Scipy window types require additional parameters to be passed
in the aggregation function. The additional parameters must match
the keywords specified in the Scipy window type method signature.
* **on** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) –
For a DataFrame, a column label or Index level on which
to calculate the rolling window, rather than the DataFrame’s index.
Provided integer column is ignored and excluded from result since
an integer index is not used to calculate the rolling window.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 0*) –
If `0` or `'index'`, roll across the rows.
If `1` or `'columns'`, roll across the columns.
For Series this parameter is unused and defaults to 0.
* **closed** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) –
If `'right'`, the first point in the window is excluded from calculations.
If `'left'`, the last point in the window is excluded from calculations.
If `'both'`, no points in the window are excluded from calculations.
If `'neither'`, the first and last points in the window are excluded
from calculations.
Default `None` (`'right'`).
* **shift** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0*) – If specified, the window will be shifted by shift rows (or data will be
shifted by -shift rows) before computing window function.
* **Returns:**
Return a new grouper with our rolling appended.
* **Return type:**
maxframe.dataframe.groupby.RollingGroupby
#### SEE ALSO
`Series.rolling`
: Calling object with Series data.
`DataFrame.rolling`
: Calling object with DataFrames.
`Series.groupby`
: Apply a function groupby to a Series.
`DataFrame.groupby`
: Apply a function groupby.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A': [1, 1, 2, 2],
... 'B': [1, 2, 3, 4],
... 'C': [0.362, 0.227, 1.267, -0.562]})
>>> df.execute()
A B C
0 1 1 0.362
1 1 2 0.227
2 2 3 1.267
3 2 4 -0.562
```
```pycon
>>> df.groupby('A').rolling(2).sum().execute()
B C
A
1 0 NaN NaN
1 3.0 0.589
2 2 NaN NaN
3 7.0 0.705
```
```pycon
>>> df.groupby('A').rolling(2, min_periods=1).sum().execute()
B C
A
1 0 1.0 0.362
1 3.0 0.589
2 2 3.0 1.267
3 7.0 0.705
```
```pycon
>>> df.groupby('A').rolling(2, on='B').sum().execute()
B C
A
1 0 1 NaN
1 2 0.589
2 2 3 NaN
3 4 0.705
```
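The interaction between an integer `window` and `min_periods` in the examples above can be traced with a small plain-Python model. It is a sketch for illustration, with `None` standing in for NaN and a hypothetical helper name; it does not cover offsets, `center`, `closed` or `shift`.

```python
from collections import defaultdict


def group_rolling_sum(keys, values, window, min_periods=None):
    """Fixed-size rolling sum computed separately per group.

    None stands in for NaN. Plain-Python model; not MaxFrame code.
    """
    if min_periods is None:
        min_periods = window  # documented default for integer windows
    history = defaultdict(list)
    out = []
    for key, value in zip(keys, values):
        history[key].append(value)
        recent = history[key][-window:]  # at most `window` latest rows
        out.append(sum(recent) if len(recent) >= min_periods else None)
    return out


A = [1, 1, 2, 2]
B = [1, 2, 3, 4]
print(group_rolling_sum(A, B, 2))                 # [None, 3, None, 7]
print(group_rolling_sum(A, B, 2, min_periods=1))  # [1, 3, 3, 7]
```

Both results match column B of the first two rolling examples above.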
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.sem.md
# maxframe.dataframe.groupby.GroupBy.sem
#### GroupBy.sem(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.size.md
# maxframe.dataframe.groupby.GroupBy.size
#### GroupBy.size(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.std.md
# maxframe.dataframe.groupby.GroupBy.std
#### GroupBy.std(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.sum.md
# maxframe.dataframe.groupby.GroupBy.sum
#### GroupBy.sum(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.transform.md
# maxframe.dataframe.groupby.GroupBy.transform
#### GroupBy.transform(f, \*args, dtypes=None, dtype=None, name=None, index=None, output_types=None, skip_infer=False, \*\*kwargs)
Call a function producing a like-indexed DataFrame on each group and
return a DataFrame having the same indexes as the original object,
filled with the transformed values.
* **Parameters:**
* **f** (*function*) – Function to apply to each group.
* **dtypes** ([*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series) *,* *default None*) – Specify dtypes of returned DataFrames. See Notes for more details.
* **dtype** ([*numpy.dtype*](https://numpy.org/doc/stable/reference/generated/numpy.dtype.html#numpy.dtype) *,* *default None*) – Specify dtype of returned Series. See Notes for more details.
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – Specify name of returned Series. See Notes for more details.
* **skip_infer** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Whether to skip dtype inference when dtypes or output_types is not specified.
* **\*args** – Positional arguments to pass to f.
* **\*\*kwargs** – Keyword arguments to pass to f.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
`DataFrame.groupby.apply`, `DataFrame.groupby.aggregate`, `DataFrame.transform`
### Notes
Each group is endowed with the attribute ‘name’ in case you need to know
which group you are working on.
The current implementation imposes three requirements on f:
* f must return a value that either has the same shape as the input
subframe or can be broadcast to the shape of the input subframe.
For example, if f returns a scalar it will be broadcast to have the
same shape as the input subframe.
* if this is a DataFrame, f must support application column-by-column
in the subframe. If f also supports application to the entire subframe,
then a fast path is used starting from the second chunk.
* f must not mutate groups. Mutation is not supported and may
produce unexpected results.
### Notes
When deciding output dtypes and shape of the return value, MaxFrame will
try applying `f` onto a mock grouped object, and the transform call
may fail. When this happens, you need to specify the output dtypes
explicitly.
* For DataFrame output, you need to specify a list or a pandas Series
as `dtypes` of output DataFrame. `index` of output can also be
specified.
* For Series output, you need to specify `dtype` and `name` of
output Series.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
... 'foo', 'bar'],
... 'B' : ['one', 'one', 'two', 'three',
... 'two', 'two'],
... 'C' : [1, 5, 5, 2, 5, 5],
... 'D' : [2.0, 5., 8., 1., 2., 9.]})
>>> grouped = df.groupby('A')
>>> grouped.transform(lambda x: (x - x.mean()) / x.std()).execute()
C D
0 -1.154701 -0.577350
1 0.577350 0.000000
2 0.577350 1.154701
3 -1.154701 -1.000000
4 0.577350 -0.577350
5 0.577350 1.000000
```
Broadcast result of the transformation
```pycon
>>> grouped.transform(lambda x: x.max() - x.min()).execute()
C D
0 4 6.0
1 3 8.0
2 4 6.0
3 3 8.0
4 4 6.0
5 3 8.0
```
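The alignment and broadcasting rules described in the Notes can be modelled in a few lines of plain Python. This is only a sketch of the semantics, using a hypothetical `group_transform` helper on flat lists rather than DataFrames.

```python
from collections import defaultdict


def group_transform(keys, values, func):
    """Apply func per group and return a result aligned with the input rows.

    If func returns a scalar it is broadcast to every row of the group.
    Plain-Python model for illustration; not MaxFrame code.
    """
    positions = defaultdict(list)
    for pos, key in enumerate(keys):
        positions[key].append(pos)

    out = [None] * len(keys)
    for key, idx in positions.items():
        result = func([values[i] for i in idx])
        if not isinstance(result, list):
            result = [result] * len(idx)  # broadcast a scalar
        if len(result) != len(idx):
            raise ValueError("func must return a like-sized or scalar result")
        for i, item in zip(idx, result):
            out[i] = item  # write back at the original row position
    return out


A = ["foo", "bar", "foo", "bar", "foo", "bar"]
C = [1, 5, 5, 2, 5, 5]
print(group_transform(A, C, lambda g: max(g) - min(g)))
# [4, 3, 4, 3, 4, 3]
print(group_transform(A, C, lambda g: [v * 2 for v in g]))
# [2, 10, 10, 4, 10, 10]
```

The first call reproduces column C of the broadcast example above.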
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.groupby.GroupBy.var.md
# maxframe.dataframe.groupby.GroupBy.var
#### GroupBy.var(\*\*kw)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.isna.md
# maxframe.dataframe.isna
### maxframe.dataframe.isna(obj)
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or `numpy.nan`, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.isnull`](maxframe.dataframe.DataFrame.isnull.md#maxframe.dataframe.DataFrame.isnull)
: Alias of isna.
[`DataFrame.notna`](maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)
: Boolean inverse of isna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`isna`](#maxframe.dataframe.isna)
: Top-level isna.
### Examples
Show which entries in a DataFrame are NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.nan],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.isna().execute()
age born name toy
0 False True False True
1 False False False False
2 True False False False
```
Show which entries in a Series are NA.
```pycon
>>> ser = md.Series([5, 6, np.nan])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.isna().execute()
0 False
1 False
2 True
dtype: bool
```
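The rule for scalar values (None and NaN are NA; empty strings and infinity are not) can be checked with a tiny plain-Python predicate. This is a sketch of the documented rule, not MaxFrame code; it does not cover `NaT`, and `is_na` is a hypothetical name.

```python
import math


def is_na(value):
    """Return True for None and float NaN, False for everything else.

    Plain-Python model of the documented rule; not MaxFrame code.
    """
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


samples = [5, None, float("nan"), "", float("inf"), "Joker"]
print([is_na(v) for v in samples])
# [False, True, True, False, False, False]
```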
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.isnull.md
# maxframe.dataframe.isnull
### maxframe.dataframe.isnull(obj)
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values, such as None or `numpy.nan`, get mapped to True
values.
Everything else gets mapped to False values. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.isnull`](maxframe.dataframe.DataFrame.isnull.md#maxframe.dataframe.DataFrame.isnull)
: Alias of isna.
[`DataFrame.notna`](maxframe.dataframe.DataFrame.notna.md#maxframe.dataframe.DataFrame.notna)
: Boolean inverse of isna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`isna`](maxframe.dataframe.isna.md#maxframe.dataframe.isna)
: Top-level isna.
### Examples
Show which entries in a DataFrame are NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.nan],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.isna().execute()
age born name toy
0 False True False True
1 False False False False
2 True False False False
```
Show which entries in a Series are NA.
```pycon
>>> ser = md.Series([5, 6, np.nan])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.isna().execute()
0 False
1 False
2 True
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.merge.md
# maxframe.dataframe.merge
### maxframe.dataframe.merge(df: DataFrame | Series, right: DataFrame | Series, how: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'inner', on: [str](https://docs.python.org/3/library/stdtypes.html#str) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] = None, left_on: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, right_on: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, left_index: [bool](https://docs.python.org/3/library/functions.html#bool) = False, right_index: [bool](https://docs.python.org/3/library/functions.html#bool) = False, sort: [bool](https://docs.python.org/3/library/functions.html#bool) = False, suffixes: [Tuple](https://docs.python.org/3/library/typing.html#typing.Tuple)[[str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None), [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None)] = ('_x', '_y'), copy: [bool](https://docs.python.org/3/library/functions.html#bool) = True, indicator: [bool](https://docs.python.org/3/library/functions.html#bool) = False, validate: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, method: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'auto', auto_merge: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'both', auto_merge_threshold: [int](https://docs.python.org/3/library/functions.html#int) = 8, bloom_filter: [bool](https://docs.python.org/3/library/functions.html#bool) | [str](https://docs.python.org/3/library/stdtypes.html#str) = 'auto', bloom_filter_options: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] = None, left_hint: JoinHint = None, right_hint: JoinHint = None) → DataFrame
Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column.
The join is done on columns or indexes. If joining columns on
columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes
on indexes or indexes on a column or columns, the index will be passed on.
When performing a cross merge, no column specifications to merge on are
allowed.
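As a mental model of a database-style join on a single key column, the plain-Python sketch below joins two lists of dict rows. It is an illustration under simplifying assumptions: only inner and left joins are modelled, overlapping non-key columns (and therefore suffixes) are not handled, and `merge_rows` is a hypothetical helper rather than MaxFrame code.

```python
def merge_rows(left, right, on, how="inner"):
    """Join two lists of dict rows on one key column.

    Supports how="inner" and how="left" only. Hypothetical helper for
    illustration; not MaxFrame code.
    """
    if how not in ("inner", "left"):
        raise ValueError("this sketch only models 'inner' and 'left'")

    # Index the right rows by key so each left row finds its matches.
    by_key = {}
    for row in right:
        by_key.setdefault(row[on], []).append(row)

    right_columns = [c for c in (right[0] if right else {}) if c != on]
    out = []
    for row in left:
        matches = by_key.get(row[on], [])
        for match in matches:
            out.append({**row, **match})
        if not matches and how == "left":
            # Keep the unmatched left row; right-hand columns become None.
            out.append({**row, **{c: None for c in right_columns}})
    return out


left = [{"k": "a", "x": 1}, {"k": "b", "x": 2}]
right = [{"k": "a", "y": 10}, {"k": "c", "y": 30}]
print(merge_rows(left, right, "k"))
# [{'k': 'a', 'x': 1, 'y': 10}]
print(merge_rows(left, right, "k", how="left"))
# [{'k': 'a', 'x': 1, 'y': 10}, {'k': 'b', 'x': 2, 'y': None}]
```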
* **Parameters:**
* **right** ([*DataFrame*](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* *named Series*) – Object to merge with.
* **how** ( *{'left'* *,* *'right'* *,* *'outer'* *,* *'inner'}* *,* *default 'inner'*) –
Type of merge to be performed.
* left: use only keys from left frame, similar to a SQL left outer join;
preserve key order.
* right: use only keys from right frame, similar to a SQL right outer join;
preserve key order.
* outer: use union of keys from both frames, similar to a SQL full outer
join; sort keys lexicographically.
* inner: use intersection of keys from both frames, similar to a SQL inner
join; preserve the order of the left keys.
* **on** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list)) – Column or index level names to join on. These must be found in both
DataFrames. If on is None and not merging on indexes then this defaults
to the intersection of the columns in both DataFrames.
* **left_on** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *, or* *array-like*) – Column or index level names to join on in the left DataFrame. Can also
be an array or list of arrays of the length of the left DataFrame.
These arrays are treated as if they are columns.
* **right_on** (*label* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *, or* *array-like*) – Column or index level names to join on in the right DataFrame. Can also
be an array or list of arrays of the length of the right DataFrame.
These arrays are treated as if they are columns.
* **left_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Use the index from the left DataFrame as the join key(s). If it is a
MultiIndex, the number of keys in the other DataFrame (either the index
or a number of columns) must match the number of levels.
* **right_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Use the index from the right DataFrame as the join key. Same caveats as
left_index.
* **sort** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Sort the join keys lexicographically in the result DataFrame. If False,
the order of the join keys depends on the join type (how keyword).
* **suffixes** (*list-like* *,* *default is* *(* *"_x"* *,* *"_y"* *)*) – A length-2 sequence where each element is optionally a string
indicating the suffix to add to overlapping column names in
left and right respectively. Pass a value of None instead
of a string to indicate that the column name from left or
right should be left as-is, with no suffix. At least one of the
values must not be None.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If False, avoid copy if possible.
* **indicator** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default False*) – If True, adds a column to the output DataFrame called “_merge” with
information on the source of each row. The column can be given a different
name by providing a string argument. The column will have a Categorical
type with the value of “left_only” for observations whose merge key only
appears in the left DataFrame, “right_only” for observations
whose merge key only appears in the right DataFrame, and “both”
if the observation’s merge key is found in both DataFrames.
* **validate** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) –
If specified, checks if merge is of specified type.
* “one_to_one” or “1:1”: check if merge keys are unique in both
left and right datasets.
* “one_to_many” or “1:m”: check if merge keys are unique in left
dataset.
* “many_to_one” or “m:1”: check if merge keys are unique in right
dataset.
* “many_to_many” or “m:m”: allowed, but does not result in checks.
* **method** ( *{"auto"* *,* *"shuffle"* *,* *"broadcast"}* *,* *default auto*) – “broadcast” is recommended when one DataFrame is much smaller than the other;
otherwise, “shuffle” is usually the better choice. By default, the method is chosen
according to the actual data size.
* **auto_merge** ( *{"both"* *,* *"none"* *,* *"before"* *,* *"after"}* *,* *default both*) –
Whether to automatically merge small chunks before or after the merge.
* “both”: merge small chunks both before and after the merge
* “none”: do not merge small chunks
* “before”: merge small chunks only before the merge
* “after”: merge small chunks only after the merge
* **auto_merge_threshold** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 8*) – When how is “inner”, the merged result can be much smaller than the original DataFrames.
If the number of chunks is greater than this threshold,
small chunks are merged automatically.
* **bloom_filter** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default "auto"*) – Use bloom filter to optimize merge
* **bloom_filter_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) –
* “max_elements”: maximum number of elements in the bloom filter;
defaults to the maximum size of all input chunks.
* “error_rate”: error rate, default 0.1.
* “apply_chunk_size_threshold”: minimum chunk size of the input chunks for applying the bloom filter, default 10.
The bloom filter is applied when the chunk sizes of both left and right are greater than this threshold.
* “filter”: “large”, “small” or “both”, default “large”.
Decides whether to filter the large DataFrame, the small one, or both.
* **left_hint** (*JoinHint* *,* *default None*) – Join strategy to use for left frame. When data skew occurs, consider these strategies to avoid long-tail issues,
but use them cautiously to prevent OOM and unnecessary overhead.
* **right_hint** (*JoinHint* *,* *default None*) – Join strategy to use for right frame.
* **Returns:**
A DataFrame of the two merged objects.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
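The `bloom_filter` option above can be pictured with a small, self-contained sketch. This is not MaxFrame's implementation: the `BloomFilter` class, the `prefilter` helper, and the bit and hash counts are hypothetical names and values chosen for illustration, and the sketch assumes an inner join where the filter is built from the small side and applied to the large side (the documented default `"filter": "large"`).

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def prefilter(large_rows, small_rows, key):
    """Drop rows of the large side whose key cannot match the small side."""
    bloom = BloomFilter()
    for row in small_rows:
        bloom.add(row[key])
    return [row for row in large_rows if row[key] in bloom]


small = [{"k": "foo", "c": 3}, {"k": "baz", "c": 4}]
large = [{"k": f"key{i}", "b": i} for i in range(1000)] + [{"k": "foo", "b": -1}]

# Rows that can match are always kept; most non-matching rows are dropped
# before the (more expensive) join runs.
kept = prefilter(large, small, "k")
```

Because a Bloom filter never produces false negatives, the pre-filtered rows still yield the same inner-join result; a higher error rate only means more non-matching rows slip through to the join.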
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df1 = md.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
... 'value': [1, 2, 3, 5]})
>>> df2 = md.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
... 'value': [5, 6, 7, 8]})
>>> df1.execute()
lkey value
0 foo 1
1 bar 2
2 baz 3
3 foo 5
>>> df2.execute()
rkey value
0 foo 5
1 bar 6
2 baz 7
3 foo 8
```
Merge df1 and df2 on the lkey and rkey columns. The value columns have
the default suffixes, \_x and \_y, appended.
```pycon
>>> df1.merge(df2, left_on='lkey', right_on='rkey').execute()
lkey value_x rkey value_y
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
```
Merge DataFrames df1 and df2 with specified left and right suffixes
appended to any overlapping columns.
```pycon
>>> df1.merge(df2, left_on='lkey', right_on='rkey',
... suffixes=('_left', '_right')).execute()
lkey value_left rkey value_right
0 foo 1 foo 5
1 foo 1 foo 8
2 foo 5 foo 5
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
```
Merge DataFrames df1 and df2, but raise an exception if the DataFrames have
any overlapping columns.
```pycon
>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False)).execute()
Traceback (most recent call last):
...
ValueError: columns overlap but no suffix specified:
Index(['value'], dtype='object')
```
```pycon
>>> df1 = md.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
>>> df2 = md.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
>>> df1.execute()
a b
0 foo 1
1 bar 2
>>> df2.execute()
a c
0 foo 3
1 baz 4
```
```pycon
>>> df1.merge(df2, how='inner', on='a').execute()
a b c
0 foo 1 3
```
```pycon
>>> df1.merge(df2, how='left', on='a').execute()
a b c
0 foo 1 3.0
1 bar 2 NaN
```
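As a rough illustration of the `how` and `indicator` semantics described in the parameter list, the sketch below works on plain Python lists of keys rather than MaxFrame objects. The helper names `join_keys` and `merge_indicator` are hypothetical, and the sketch assumes unique keys on each side.

```python
def join_keys(left_keys, right_keys, how="inner"):
    """Return the join keys kept for each `how`, in the documented order."""
    left_set, right_set = set(left_keys), set(right_keys)
    if how == "left":
        return list(left_keys)  # keys from the left frame, left order
    if how == "right":
        return list(right_keys)  # keys from the right frame, right order
    if how == "inner":
        return [k for k in left_keys if k in right_set]  # intersection, left order
    if how == "outer":
        return sorted(left_set | right_set)  # union, sorted lexicographically
    raise ValueError(f"unsupported how: {how!r}")


def merge_indicator(key, left_keys, right_keys):
    """Classify a key the way the `_merge` indicator column does."""
    in_left, in_right = key in left_keys, key in right_keys
    if in_left and in_right:
        return "both"
    return "left_only" if in_left else "right_only"


left, right = ["foo", "bar"], ["foo", "baz"]

print(join_keys(left, right, "inner"))  # ['foo']
print(join_keys(left, right, "left"))   # ['foo', 'bar']
print(join_keys(left, right, "outer"))  # ['bar', 'baz', 'foo']
print({k: merge_indicator(k, left, right) for k in join_keys(left, right, "outer")})
# {'bar': 'left_only', 'baz': 'right_only', 'foo': 'both'}
```

The keys match the `a` columns of the last two examples above, so the `inner` and `left` results line up with the rows shown there.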
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.notna.md
# maxframe.dataframe.notna
### maxframe.dataframe.notna(obj)
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
NA values, such as None or `numpy.NaN`, get mapped to False
values.
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.notnull`](maxframe.dataframe.DataFrame.notnull.md#maxframe.dataframe.DataFrame.notnull)
: Alias of notna.
[`DataFrame.isna`](maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)
: Boolean inverse of notna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`notna`](#maxframe.dataframe.notna)
: Top-level notna.
### Examples
Show which entries in a DataFrame are not NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.NaN],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.notna().execute()
age born name toy
0 True False True False
1 True True True True
2 False True True True
```
Show which entries in a Series are not NA.
```pycon
>>> ser = md.Series([5, 6, np.NaN])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.notna().execute()
0 True
1 True
2 False
dtype: bool
```
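The NA rule stated above (None and NaN are missing; empty strings and infinity are not) can be sketched in plain Python. The helper names `is_na` and `notna_list` are hypothetical, and the sketch covers only `None` and float NaN, not `NaT` or other library-specific missing markers.

```python
import math


def is_na(value):
    """True for None and float NaN; empty strings and inf are not NA."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def notna_list(values):
    """Element-wise inverse of is_na, returning a same-sized list of bools."""
    return [not is_na(v) for v in values]


print(notna_list([5.0, float("nan"), None, "", float("inf")]))
# [True, False, False, True, True]
```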
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.notnull.md
# maxframe.dataframe.notnull
### maxframe.dataframe.notnull(obj)
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings `''` or `numpy.inf` are not considered NA values
(unless you set `pandas.options.mode.use_inf_as_na = True`).
NA values, such as None or `numpy.NaN`, get mapped to False
values.
* **Returns:**
Mask of bool values for each element in DataFrame that
indicates whether an element is not an NA value.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.notnull`](maxframe.dataframe.DataFrame.notnull.md#maxframe.dataframe.DataFrame.notnull)
: Alias of notna.
[`DataFrame.isna`](maxframe.dataframe.DataFrame.isna.md#maxframe.dataframe.DataFrame.isna)
: Boolean inverse of notna.
[`DataFrame.dropna`](maxframe.dataframe.DataFrame.dropna.md#maxframe.dataframe.DataFrame.dropna)
: Omit axes labels with missing values.
[`notna`](maxframe.dataframe.notna.md#maxframe.dataframe.notna)
: Top-level notna.
### Examples
Show which entries in a DataFrame are not NA.
```pycon
>>> import numpy as np
>>> import maxframe.dataframe as md
>>> df = md.DataFrame({'age': [5, 6, np.NaN],
... 'born': [md.NaT, md.Timestamp('1939-05-27'),
... md.Timestamp('1940-04-25')],
... 'name': ['Alfred', 'Batman', ''],
... 'toy': [None, 'Batmobile', 'Joker']})
>>> df.execute()
age born name toy
0 5.0 NaT Alfred None
1 6.0 1939-05-27 Batman Batmobile
2 NaN 1940-04-25 Joker
```
```pycon
>>> df.notna().execute()
age born name toy
0 True False True False
1 True True True True
2 False True True True
```
Show which entries in a Series are not NA.
```pycon
>>> ser = md.Series([5, 6, np.NaN])
>>> ser.execute()
0 5.0
1 6.0
2 NaN
dtype: float64
```
```pycon
>>> ser.notna().execute()
0 True
1 True
2 False
dtype: bool
```
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.read_clipboard.md
# maxframe.dataframe.read_clipboard
### maxframe.dataframe.read_clipboard(sep=None, \*\*kwargs)
Read text from clipboard and pass to [`read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv).
Parses clipboard contents similar to how CSV files are parsed
using [`read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv).
* **Parameters:**
* **sep** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '\s+'*) – A string or regex delimiter. The default of `'\s+'` denotes
one or more whitespace characters.
* **\*\*kwargs** – See [`read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv) for the full argument list.
* **Returns:**
A parsed [`DataFrame`](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) object.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
#### SEE ALSO
[`DataFrame.to_clipboard`](maxframe.dataframe.DataFrame.to_clipboard.md#maxframe.dataframe.DataFrame.to_clipboard)
: Copy object to the system clipboard.
[`read_csv`](maxframe.dataframe.read_csv.md#maxframe.dataframe.read_csv)
: Read a comma-separated values (csv) file into DataFrame.
`read_fwf`
: Read a table of fixed-width formatted lines into DataFrame.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> df = md.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
>>> df.to_clipboard()
>>> md.read_clipboard()
A B C
0 1 2 3
1 4 5 6
```
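To make the default `'\s+'` separator concrete, the following sketch splits clipboard-like text on runs of whitespace with the standard `re` module. It only illustrates the delimiter; it does not touch the system clipboard or call `read_clipboard`, and the sample text is made up.

```python
import re

# Text as it might be copied from a spreadsheet: mixed spaces and tabs.
clipboard_text = "A  B\tC\n1  2\t3\n4  5\t6\n"

# '\s+' treats any run of spaces or tabs as a single delimiter.
rows = [
    re.split(r"\s+", line.strip())
    for line in clipboard_text.splitlines()
    if line.strip()
]
header, records = rows[0], rows[1:]

print(header)   # ['A', 'B', 'C']
print(records)  # [['1', '2', '3'], ['4', '5', '6']]
```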
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.read_csv.md
# maxframe.dataframe.read_csv
### maxframe.dataframe.read_csv(path, \*, names=None, sep: str = ', ', index_col=None, compression='infer', header='infer', dtype=None, usecols=None, nrows=None, chunk_bytes='64M', gpu=None, head_bytes='100k', head_lines=None, default_index_type: ~maxframe.protocol.DefaultIndexType | str = None, use_nullable_dtypes: bool = <no_default>, dtype_backend: str = <no_default>, storage_options: dict = None, memory_scale: int = None, merge_small_files: bool = True, merge_small_file_options: dict = None, session=None, run_kwargs: dict = None, \*\*kwargs)
Read a comma-separated values (csv) file into DataFrame.
Also supports optionally iterating or breaking of the file
into chunks.
* **Parameters:**
* **path** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Any valid string path is acceptable. The string could be a URL. Valid
URL schemes include http, ftp, s3, and file. For file URLs, a host is
expected. A local file could be: [file://localhost/path/to/table.csv](file://localhost/path/to/table.csv),
you can also read from external resources using a URL like:
hdfs://localhost:8020/test.csv.
If you want to pass in a path object, pandas accepts any `os.PathLike`.
By file-like object, we refer to objects with a `read()` method, such as
a file handler (e.g. via builtin `open` function) or `StringIO`.
* **sep** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '* *,* *'*) – Delimiter to use. If sep is None, the C engine cannot automatically detect
the separator, but the Python parsing engine can, meaning the latter will
be used and automatically detect the separator by Python’s builtin sniffer
tool, `csv.Sniffer`. In addition, separators longer than 1 character and
different from `'\s+'` will be interpreted as regular expressions and
will also force the use of the Python parsing engine. Note that regex
delimiters are prone to ignoring quoted data. Regex example: `'\r\t'`.
* **delimiter** (str, default `None`) – Alias for sep.
* **header** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 'infer'*) – Row number(s) to use as the column names, and the start of the
data. Default behavior is to infer the column names: if no names
are passed the behavior is identical to `header=0` and column
names are inferred from the first line of the file, if column
names are passed explicitly then the behavior is identical to
`header=None`. Explicitly pass `header=0` to be able to
replace existing names. The header can be a list of integers that
specify row locations for a multi-index on the columns
e.g. [0,1,3]. Intervening rows that are not specified will be
skipped (e.g. 2 in this example is skipped). Note that this
parameter ignores commented lines and empty lines if
`skip_blank_lines=True`, so `header=0` denotes the first line of
data rather than the first line of the file.
* **names** (*array-like* *,* *optional*) – List of column names to use. If the file contains a header row,
then you should explicitly pass `header=0` to override the column names.
Duplicates in this list are not allowed.
* **index_col** (int, str, sequence of int / str, or False, default `None`) – Column(s) to use as the row labels of the `DataFrame`, either given as
string name or column index. If a sequence of int / str is given, a
MultiIndex is used.
Note: `index_col=False` can be used to force pandas to *not* use the first
column as the index, e.g. when you have a malformed file with delimiters at
the end of each line.
* **usecols** (*list-like* *or* *callable* *,* *optional*) – Return a subset of the columns. If list-like, all elements must either
be positional (i.e. integer indices into the document columns) or strings
that correspond to column names provided either by the user in names or
inferred from the document header row(s). For example, a valid list-like
usecols parameter would be `[0, 1, 2]` or `['foo', 'bar', 'baz']`.
Element order is ignored, so `usecols=[0, 1]` is the same as `[1, 0]`.
To instantiate a DataFrame from `data` with element order preserved use
`pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]` for columns
in `['foo', 'bar']` order or
`pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]`
for `['bar', 'foo']` order.
If callable, the callable function will be evaluated against the column
names, returning names where the callable function evaluates to True. An
example of a valid callable argument would be `lambda x: x.upper() in
['AAA', 'BBB', 'DDD']`. Using this parameter results in much faster
parsing time and lower memory usage.
* **prefix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …
* **mangle_dupe_cols** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Duplicate columns will be specified as ‘X’, ‘X.1’, …’X.N’, rather than
‘X’…’X’. Passing in False will cause data to be overwritten if there
are duplicate names in the columns.
* **dtype** (*Type name* *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *of* *column -> type* *,* *optional*) – Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32,
‘c’: ‘Int64’}
Use str or object together with suitable na_values settings
to preserve and not interpret dtype.
If converters are specified, they will be applied INSTEAD
of dtype conversion.
* **engine** ( *{'c'* *,* *'python'}* *,* *optional*) – Parser engine to use. The C engine is faster while the python engine is
currently more feature-complete.
* **converters** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) – Dict of functions for converting values in certain columns. Keys can either
be integers or column labels.
* **true_values** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional*) – Values to consider as True.
* **false_values** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional*) – Values to consider as False.
* **skipinitialspace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Skip spaces after delimiter.
* **skiprows** (*list-like* *,* [*int*](https://docs.python.org/3/library/functions.html#int) *or* *callable* *,* *optional*) – Line numbers to skip (0-indexed) or number of lines to skip (int)
at the start of the file.
If callable, the callable function will be evaluated against the row
indices, returning True if the row should be skipped and False otherwise.
An example of a valid callable argument would be `lambda x: x in [0, 2]`.
* **skipfooter** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default 0*) – Number of lines at bottom of file to skip (Unsupported with engine=’c’).
* **nrows** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of rows of file to read. Useful for reading pieces of large files.
* **na_values** (*scalar* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *list-like* *, or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) – Additional strings to recognize as NA/NaN. If dict passed, specific
per-column NA values. By default the following values are interpreted as
NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’,
‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’,
‘nan’, ‘null’.
* **keep_default_na** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) –
Whether or not to include the default NaN values when parsing the data.
Depending on whether na_values is passed in, the behavior is as follows:
* If keep_default_na is True, and na_values are specified, na_values
is appended to the default NaN values used for parsing.
* If keep_default_na is True, and na_values are not specified, only
the default NaN values are used for parsing.
* If keep_default_na is False, and na_values are specified, only
the NaN values specified na_values are used for parsing.
* If keep_default_na is False, and na_values are not specified, no
strings will be parsed as NaN.
Note that if na_filter is passed in as False, the keep_default_na and
na_values parameters will be ignored.
* **na_filter** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Detect missing value markers (empty strings and the value of na_values). In
data without any NAs, passing na_filter=False can improve the performance
of reading a large file.
* **verbose** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Indicate number of NA values placed in non-numeric columns.
* **skip_blank_lines** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, skip over blank lines rather than interpreting as NaN values.
* **parse_dates** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* *names* *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *lists* *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default False*) –
The behavior is as follows:
* boolean. If True -> try parsing the index.
* list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3
each as a separate date column.
* list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as
a single date column.
* dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call
result ‘foo’
If a column or index cannot be represented as an array of datetimes,
say because of an unparsable value or a mixture of timezones, the column
or index will be returned unaltered as an object data type. For
non-standard datetime parsing, use `pd.to_datetime` after
`pd.read_csv`. To parse an index or column with a mixture of timezones,
specify `date_parser` to be a partially-applied
[`pandas.to_datetime()`](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#pandas.to_datetime) with `utc=True`. See
[Parsing a CSV with mixed timezones](https://pandas.pydata.org/docs/user_guide/io.html#io-csv-mixed-timezones) for more.
Note: A fast-path exists for iso8601-formatted dates.
* **infer_datetime_format** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True and parse_dates is enabled, pandas will attempt to infer the
format of the datetime strings in the columns, and if it can be inferred,
switch to a faster method of parsing them. In some cases this can increase
the parsing speed by 5-10x.
* **keep_date_col** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True and parse_dates specifies combining multiple columns then
keep the original columns.
* **date_parser** (*function* *,* *optional*) – Function to use for converting a sequence of string columns to an array of
datetime instances. The default uses `dateutil.parser.parser` to do the
conversion. Pandas will try to call date_parser in three different ways,
advancing to the next if an exception occurs: 1) Pass one or more arrays
(as defined by parse_dates) as arguments; 2) concatenate (row-wise) the
string values from the columns defined by parse_dates into a single array
and pass that; and 3) call date_parser once for each row using one or
more strings (corresponding to the columns defined by parse_dates) as
arguments.
* **dayfirst** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – DD/MM format dates, international and European format.
* **cache_dates** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, use a cache of unique, converted dates to apply the datetime
conversion. May produce significant speed-up when parsing duplicate
date strings, especially ones with timezone offsets.
* **iterator** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Return TextFileReader object for iteration or getting chunks with
`get_chunk()`.
* **chunksize** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Return TextFileReader object for iteration.
See the [IO Tools docs](https://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking)
for more information on `iterator` and `chunksize`.
* **compression** ( *{'infer'* *,* *'gzip'* *,* *'bz2'* *,* *'zip'* *,* *'xz'* *,* *None}* *,* *default 'infer'*) – For on-the-fly decompression of on-disk data. If ‘infer’ and
filepath_or_buffer is path-like, then detect compression from the
following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no
decompression). If using ‘zip’, the ZIP file must contain only one data
file to be read in. Set to None for no decompression.
* **thousands** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Thousands separator.
* **decimal** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default '.'*) – Character to recognize as decimal point (e.g. use ‘,’ for European data).
* **lineterminator** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *(**length 1* *)* *,* *optional*) – Character to break file into lines. Only valid with C parser.
* **quotechar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *(**length 1* *)* *,* *optional*) – The character used to denote the start and end of a quoted item. Quoted
items can include the delimiter and it will be ignored.
* **quoting** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *csv.QUOTE_\* instance* *,* *default 0*) – Control field quoting behavior per `csv.QUOTE_*` constants. Use one of
QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
* **doublequote** (bool, default `True`) – When quotechar is specified and quoting is not `QUOTE_NONE`, indicate
whether or not to interpret two consecutive quotechar elements INSIDE a
field as a single `quotechar` element.
* **escapechar** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *(**length 1* *)* *,* *optional*) – One-character string used to escape other characters.
* **comment** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Indicates remainder of line should not be parsed. If found at the beginning
of a line, the line will be ignored altogether. This parameter must be a
single character. Like empty lines (as long as `skip_blank_lines=True`),
fully commented lines are ignored by the parameter header but not by
skiprows. For example, if `comment='#'`, parsing
`#empty\na,b,c\n1,2,3` with `header=0` will result in ‘a,b,c’ being
treated as the header.
* **encoding** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Encoding to use for UTF when reading/writing (ex. ‘utf-8’). [List of Python
standard encodings](https://docs.python.org/3/library/codecs.html#standard-encodings) .
* **dialect** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*csv.Dialect*](https://docs.python.org/3/library/csv.html#csv.Dialect) *,* *optional*) – If provided, this parameter will override values (default or not) for the
following parameters: delimiter, doublequote, escapechar,
skipinitialspace, quotechar, and quoting. If it is necessary to
override values, a ParserWarning will be issued. See csv.Dialect
documentation for more details.
* **error_bad_lines** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Lines with too many fields (e.g. a csv line with too many commas) will by
default cause an exception to be raised, and no DataFrame will be returned.
If False, then these “bad lines” will be dropped from the DataFrame that is
returned.
* **warn_bad_lines** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If error_bad_lines is False, and warn_bad_lines is True, a warning for each
“bad line” will be output.
* **delim_whitespace** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Specifies whether or not whitespace (e.g. `' '` or `'\t'`) will be
used as the sep. Equivalent to setting `sep='\s+'`. If this option
is set to True, nothing should be passed in for the `delimiter`
parameter.
* **low_memory** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Internally process the file in chunks, resulting in lower memory use
while parsing, but possibly mixed type inference. To ensure no mixed
types either set False, or specify the type with the dtype parameter.
Note that the entire file is read into a single DataFrame regardless,
use the chunksize or iterator parameter to return the data in chunks.
(Only valid with C parser).
* **float_precision** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Specifies which converter the C engine should use for floating-point
values. The options are None for the ordinary converter,
high for the high-precision converter, and round_trip for the
round-trip converter.
* **chunk_bytes** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*float*](https://docs.python.org/3/library/functions.html#float) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Number of bytes in each chunk.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, read the data into a cuDF DataFrame.
* **head_bytes** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*float*](https://docs.python.org/3/library/functions.html#float) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Number of bytes to use in the head of file, mainly for data inference.
* **head_lines** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of lines to use in the head of file, mainly for data inference.
* **default_index_type** ( *{None* *,* *'range'* *,* *'incremental'}* *,* *default None*) – If index_col not specified, specify type of index to generate.
If not specified, options.dataframe.default_index_type will be used.
* **dtype_backend** ( *{'numpy'* *,* *'pyarrow'}* *,* *default 'numpy'*) – Back-end data type applied to the resultant DataFrame (still experimental).
* **storage_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) – Options for storage connection.
* **merge_small_files** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether to merge small input files when reading.
* **merge_small_file_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Options for merging small files
* **Returns:**
A comma-separated values (csv) file is returned as two-dimensional
data structure with labeled axes.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
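The interaction between `na_values`, `keep_default_na` and `na_filter` described in the parameter list can be summarized as a small function. This is a sketch of the documented rules only, not the parser's code; `DEFAULT_NA` copies the default list quoted above, and `effective_na_values` is a hypothetical helper name.

```python
# Default NaN markers, as listed under `na_values` above.
DEFAULT_NA = frozenset({
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a",
    "nan", "null",
})


def effective_na_values(na_values=None, keep_default_na=True, na_filter=True):
    """Return the set of strings that would be parsed as NaN."""
    if not na_filter:
        # na_filter=False disables detection; the other two options are ignored.
        return frozenset()
    extra = frozenset(na_values or ())
    # keep_default_na=True appends user values to the defaults;
    # keep_default_na=False uses the user values alone (possibly none).
    return (DEFAULT_NA | extra) if keep_default_na else extra


print(sorted(effective_na_values(["missing"], keep_default_na=False)))  # ['missing']
print(len(effective_na_values(keep_default_na=False)))                  # 0
print("NULL" in effective_na_values(["missing"]))                       # True
```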
#### SEE ALSO
`to_csv`
: Write DataFrame to a comma-separated values (csv) file.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> md.read_csv('data.csv')
>>> # read from HDFS
>>> md.read_csv('hdfs://localhost:8020/test.csv')
>>> # read from OSS
>>> md.read_csv('oss://oss-cn-hangzhou.aliyuncs.com/bucket/test.csv',
...             storage_options={'role_arn': 'acs:ram::xxxxxx:role/aliyunodpsdefaultrole'})
```
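The `head_bytes` and `head_lines` parameters limit how much of the file is sampled for data inference. The sketch below illustrates that idea with the standard library only. `infer_column_types` is a hypothetical helper written for this illustration; it is not part of MaxFrame, and MaxFrame's actual inference logic may differ.

```python
import csv
import io
from itertools import islice


def infer_column_types(text, head_lines=100):
    """Guess a type name per column from the first `head_lines` data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Start from the narrowest guess and widen it as values are seen.
    guesses = {name: "int" for name in header}
    for row in islice(reader, head_lines):
        for name, value in zip(header, row):
            if guesses[name] == "int":
                try:
                    int(value)
                    continue
                except ValueError:
                    guesses[name] = "float"
            if guesses[name] == "float":
                try:
                    float(value)
                except ValueError:
                    guesses[name] = "str"
    return guesses


sample = "id,price,name\n1,2.5,apple\n2,3,pear\n3,oops,fig\n"
print(infer_column_types(sample, head_lines=2))
print(infer_column_types(sample, head_lines=3))
```

With two sampled rows the `price` column looks numeric; the third row changes the guess to a string. A sample that is too small can therefore yield a schema that later rows do not satisfy.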
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.read_json.md
# maxframe.dataframe.read_json
### maxframe.dataframe.read_json(path, \*, orient=None, typ='frame', dtype=None, convert_axes=None, lines=False, chunksize=None, compression='infer', index_col=None, usecols=None, chunk_bytes='64M', gpu=None, head_bytes='100k', head_lines=None, default_index_type: ~maxframe.protocol.DefaultIndexType | str = None, use_nullable_dtypes: bool = <no_default>, dtype_backend: str = <no_default>, storage_options: dict = None, memory_scale: int = None, merge_small_files: bool = True, merge_small_file_options: dict = None, session=None, run_kwargs: dict = None, \*\*kwargs)
Read a JSON file into a DataFrame.
* **Parameters:**
* **path** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *path object* *, or* *file-like object*) – Any valid string path is acceptable. The string could be a URL. Valid
URL schemes include http, ftp, s3, and file. For file URLs, a host is
expected. A local file could be: [file://localhost/path/to/table.json](file://localhost/path/to/table.json),
you can also read from external resources using a URL like:
hdfs://localhost:8020/test.json.
If you want to pass in a path object, pandas accepts any `os.PathLike`.
By file-like object, we refer to objects with a `read()` method, such as
a file handler (e.g. via builtin `open` function) or `StringIO`.
* **orient** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) –
Indication of expected JSON string format.
Compatible JSON strings can be produced by `to_json()` with a
corresponding orient value.
The set of possible orients is:
- `'split'` : dict like `{'index' -> [index], 'columns' -> [columns], 'data' -> [values]}`
- `'records'` : list like `[{column -> value}, ... , {column -> value}]`
- `'index'` : dict like `{index -> {column -> value}}`
- `'columns'` : dict like `{column -> {index -> value}}`
- `'values'` : just the values array
The allowed and default values depend on the value of the typ parameter.
* when `typ == 'series'`,
  - allowed orients are `{'split','records','index'}`
  - default is `'index'`
  - The Series index must be unique for orient `'index'`.
* when `typ == 'frame'`,
  - allowed orients are `{'split','records','index','columns','values'}`
  - default is `'columns'`
  - The DataFrame index must be unique for orients `'index'` and `'columns'`.
  - The DataFrame columns must be unique for orients `'index'`, `'columns'`,
    and `'records'`.
* **typ** ( *{'frame'* *,* *'series'}* *,* *default 'frame'*) – The type of object to recover.
* **dtype** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *default None*) – If True, infer dtypes; if a dict of column to dtype, then use those;
if False, then don’t infer dtypes at all, applies only to the data.
* **convert_axes** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default None*) – Try to convert the axes to the proper dtypes.
* **convert_dates** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default True*) – List of columns to parse for dates. If True, then try to parse datelike columns.
A column label is datelike if
- it ends with `'_at'`,
- it ends with `'_time'`,
- it begins with `'date'`, or
- it is `'datetime'`, `'timestamp'`, `'modified'`, or `'created'`.
* **keep_default_dates** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If parsing dates, then parse the default datelike columns.
* **precise_float** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Set to enable usage of higher precision (strtod) function when
decoding string to double values. Default (False) is to use fast but
less precise builtin functionality.
* **date_unit** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – The timestamp unit to detect if converting dates. The default behaviour
is to try and detect the correct precision, but if this is not desired
then pass one of ‘s’, ‘ms’, ‘us’ or ‘ns’ to force parsing only seconds,
milliseconds, microseconds or nanoseconds respectively.
* **encoding** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default is 'utf-8'*) – The encoding to use to decode py3 bytes.
* **lines** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Read the file as a json object per line.
* **chunksize** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Return JsonReader object for iteration.
See the [IO Tools docs](https://pandas.pydata.org/pandas-docs/stable/io.html#io-jsonl)
for more information on `chunksize`.
This can only be passed if lines=True.
If this is None, the file will be read into memory all at once.
* **compression** ( *{'infer'* *,* *'gzip'* *,* *'bz2'* *,* *'zip'* *,* *'xz'* *,* *None}* *,* *default 'infer'*) – For on-the-fly decompression of on-disk data. If ‘infer’ and
`path` is path-like, then detect compression from the
following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no
decompression). If using ‘zip’, the ZIP file must contain only one data
file to be read in. Set to None for no decompression.
* **index_col** (int, str, sequence of int / str, or False, default `None`) – Column(s) to use as the row labels of the `DataFrame`, either given as
string name or column index. If a sequence of int / str is given, a
MultiIndex is used.
Note: `index_col=False` can be used to force pandas to *not* use the first
column as the index, e.g. when you have a malformed file with delimiters at
the end of each line.
* **usecols** (*list-like* *or* *callable* *,* *optional*) – Return a subset of the columns. If list-like, all elements must either
be positional (i.e. integer indices into the document columns) or strings
that correspond to column names provided either by the user in names or
inferred from the document header row(s). For example, a valid list-like
usecols parameter would be `[0, 1, 2]` or `['foo', 'bar', 'baz']`.
Element order is ignored, so `usecols=[0, 1]` is the same as `[1, 0]`.
To instantiate a DataFrame from `data` with element order preserved use
`pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]` for columns
in `['foo', 'bar']` order or
`pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]`
for `['bar', 'foo']` order.
If callable, the callable function will be evaluated against the column
names, returning names where the callable function evaluates to True. An
example of a valid callable argument would be `lambda x: x.upper() in
['AAA', 'BBB', 'DDD']`. Using this parameter results in much faster
parsing time and lower memory usage.
* **chunk_bytes** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*float*](https://docs.python.org/3/library/functions.html#float) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Number of chunk bytes.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If read into cudf DataFrame.
* **head_bytes** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*float*](https://docs.python.org/3/library/functions.html#float) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Number of bytes to use in the head of file, mainly for data inference.
* **head_lines** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of lines to use in the head of file, mainly for data inference.
* **default_index_type** ( *{None* *,* *'range'* *,* *'incremental'}* *,* *default None*) – If index_col not specified, specify type of index to generate.
If not specified, options.dataframe.default_index_type will be used.
* **dtype_backend** ( *{'numpy'* *,* *'pyarrow'}* *,* *default 'numpy'*) – Back-end data type applied to the resultant DataFrame (still experimental).
* **storage_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) – Options for storage connection.
* **merge_small_files** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether to merge small files.
* **merge_small_file_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – Options for merging small files.
* **Returns:**
A JSON file is returned as a two-dimensional data structure with labeled axes.
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) or [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series)
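
The `orient` layouts listed under Parameters can be compared side by side without MaxFrame. The sketch below builds each layout by hand for one small table, using only the `json` module. The names `index`, `columns`, `values`, and `layouts` are illustrative and not part of any API.

```python
import json

index = ["r1", "r2"]
columns = ["a", "b"]
values = [[1, 2], [3, 4]]

# One payload per orient, following the shapes given in the parameter list.
layouts = {
    "split": {"index": index, "columns": columns, "data": values},
    "records": [dict(zip(columns, row)) for row in values],
    "index": {i: dict(zip(columns, row)) for i, row in zip(index, values)},
    "columns": {
        c: {i: row[j] for i, row in zip(index, values)}
        for j, c in enumerate(columns)
    },
    "values": values,
}

for orient, payload in layouts.items():
    print(f"{orient:8} {json.dumps(payload)}")
```

Note that `'records'` and `'values'` carry no index labels, so a file written in those layouts cannot round-trip a meaningful index.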
#### SEE ALSO
`to_json`
: Convert DataFrame to JSON string.
`json_normalize`
: Normalize semi-structured JSON data into a flat table.
### Examples
```pycon
>>> import maxframe.dataframe as md
>>> md.read_json('data.json')
>>> # read from HDFS
>>> md.read_json('hdfs://localhost:8020/test.json')
>>> # read from OSS
>>> md.read_json('oss://oss-cn-hangzhou.aliyuncs.com/bucket/test.json',
...              storage_options={'role_arn': 'acs:ram::xxxxxx:role/aliyunodpsdefaultrole'})
```
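The datelike column-name rule described for `convert_dates` can be written as a small predicate. This is a sketch of the rule as documented above, not MaxFrame's implementation. `is_datelike` and `DATELIKE_NAMES` are hypothetical names, and the comparison is assumed to be case-sensitive.

```python
DATELIKE_NAMES = {"datetime", "timestamp", "modified", "created"}


def is_datelike(label):
    """Apply the column-name rule documented for `convert_dates`."""
    return (
        label.endswith("_at")
        or label.endswith("_time")
        or label.startswith("date")
        or label in DATELIKE_NAMES
    )


columns = ["created_at", "login_time", "date_of_birth", "modified", "price", "updated"]
print([c for c in columns if is_datelike(c)])
# ['created_at', 'login_time', 'date_of_birth', 'modified']
```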
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.read_lance.md
# maxframe.dataframe.read_lance
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.read_odps_query.md
# maxframe.dataframe.read_odps_query
### maxframe.dataframe.read_odps_query(query: [str](https://docs.python.org/3/library/stdtypes.html#str), odps_entry: ODPS = None, index_col: [None](https://docs.python.org/3/library/constants.html#None) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] = None, string_as_binary: [bool](https://docs.python.org/3/library/functions.html#bool) = None, sql_hints: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [str](https://docs.python.org/3/library/stdtypes.html#str)] = None, anonymous_col_prefix: [str](https://docs.python.org/3/library/stdtypes.html#str) = '_anon_col_', skip_schema: [bool](https://docs.python.org/3/library/functions.html#bool) = False, dtype_backend: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, \*\*kw)
Read data from a MaxCompute (ODPS) query into DataFrame.
Supports specifying some columns as indexes. If not specified, RangeIndex
will be generated.
* **Parameters:**
* **query** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – MaxCompute SQL statement.
* **index_col** (*Union* *[**None* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]*) – Columns to be specified as indexes.
* **string_as_binary** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Whether to convert string columns to binary.
* **sql_hints** (*Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *,* *optional*) – User specified SQL hints.
* **anonymous_col_prefix** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Prefix for anonymous columns, ‘_anon_col_’ by default.
* **skip_schema** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Skip resolving output schema before execution. Once this is configured,
the output DataFrame cannot be inputs of other DataFrame operators
before execution.
* **dtype_backend** ( *{'numpy'* *,* *'pyarrow'}* *,* *default 'numpy'*) – Back-end data type applied to the resultant DataFrame (still experimental).
* **Returns:**
**result** – DataFrame read from the MaxCompute (ODPS) query
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.read_odps_table.md
# maxframe.dataframe.read_odps_table
### maxframe.dataframe.read_odps_table(table_name: [str](https://docs.python.org/3/library/stdtypes.html#str) | Table, partitions: [None](https://docs.python.org/3/library/constants.html#None) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] = None, columns: [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] | [None](https://docs.python.org/3/library/constants.html#None) = None, index_col: [None](https://docs.python.org/3/library/constants.html#None) | [str](https://docs.python.org/3/library/stdtypes.html#str) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)] = None, odps_entry: ODPS = None, string_as_binary: [bool](https://docs.python.org/3/library/functions.html#bool) = None, append_partitions: [bool](https://docs.python.org/3/library/functions.html#bool) = False, dtype_backend: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, default_index_type: DefaultIndexType = None, \*\*kw)
Read data from a MaxCompute (ODPS) table into DataFrame.
Supports specifying some columns as indexes. If not specified, RangeIndex
will be generated.
* **Parameters:**
* **table_name** (*Union* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *Table* *]*) – Name of the table to read from.
* **partitions** (*Union* *[**None* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]*) – Table partition or list of partitions to read from.
* **columns** (*Optional* *[**List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]*) – Table columns to read from. You may also specify partition columns here.
If not specified, all table columns (or include partition columns if
append_partitions is True) will be included.
* **index_col** (*Union* *[**None* *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]*) – Columns to be specified as indexes.
* **append_partitions** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – If True, will add all partition columns as selected columns when
columns is not specified.
* **dtype_backend** ( *{'numpy'* *,* *'pyarrow'}* *,* *default 'numpy'*) – Back-end data type applied to the resultant DataFrame (still experimental).
* **Returns:**
**result** – DataFrame read from MaxCompute (ODPS) table
* **Return type:**
[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
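
How `columns` and `append_partitions` interact can be sketched as a plain function. `resolve_columns` is a hypothetical helper written for this illustration and is not part of MaxFrame. Placing the partition columns after the data columns is an assumption.

```python
def resolve_columns(table_columns, partition_columns, columns=None,
                    append_partitions=False):
    """Return the columns that would be read, following the documented rule."""
    if columns is not None:
        # An explicit selection wins; it may name partition columns too.
        return list(columns)
    selected = list(table_columns)
    if append_partitions:
        # Assumed ordering: partition columns follow the data columns.
        selected += list(partition_columns)
    return selected


data_cols = ["id", "amount"]
part_cols = ["ds"]
print(resolve_columns(data_cols, part_cols))
print(resolve_columns(data_cols, part_cols, append_partitions=True))
print(resolve_columns(data_cols, part_cols, columns=["id", "ds"]))
```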
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.read_pandas.md
# maxframe.dataframe.read_pandas
### maxframe.dataframe.read_pandas(data: DataFrame | Series | Index, \*\*kwargs) → [DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) | [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series) | [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)
Create MaxFrame objects from pandas.
* **Parameters:**
* **data** (*Union* *[**pd.DataFrame* *,* *pd.Series* *,* *pd.Index* *]*) – pandas data
* **kwargs** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict)) – arguments to be passed to initializers.
* **Returns:**
**result** – result MaxFrame object
* **Return type:**
Union[[DataFrame](maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame), [Series](maxframe.dataframe.Series.md#maxframe.dataframe.Series), [Index](maxframe.dataframe.Index.md#maxframe.dataframe.Index)]
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.read_parquet.md
# maxframe.dataframe.read_parquet
### maxframe.dataframe.read_parquet(path, engine: str = 'auto', columns: list = None, groups_as_chunks: bool = False, dtype_backend: str = <no_default>, default_index_type: ~maxframe.protocol.DefaultIndexType | str = None, storage_options: dict = None, use_nullable_dtypes: bool = <no_default>, \*, dtypes: ~pandas.core.series.Series = None, index_dtypes: ~pandas.core.series.Series = None, memory_scale: int = None, merge_small_files: bool = True, merge_small_file_options: dict = None, gpu: bool = None, session=None, run_kwargs: dict = None, \*\*kwargs)
Load a parquet object from the file path, returning a DataFrame.
* **Parameters:**
* **path** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *path object* *or* *file-like object*) – Any valid string path is acceptable. The string could be a URL.
For file URLs, a host is expected. A local file could be:
`file://localhost/path/to/table.parquet`.
A file URL can also be a path to a directory that contains multiple
partitioned parquet files. Both pyarrow and fastparquet support
paths to directories as well as file URLs. A directory path could be:
`file://localhost/path/to/tables`.
By file-like object, we refer to objects with a `read()` method,
such as a file handler (e.g. via builtin `open` function)
or `StringIO`.
* **engine** ( *{'auto'* *,* *'pyarrow'}* *,* *default 'auto'*) – Parquet library to use. The default behavior is to try ‘pyarrow’.
* **storage_options** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *optional*) – Options for storage connection.
* **columns** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *default=None*) – If not None, only these columns will be read from the file.
* **groups_as_chunks** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True, each row group corresponds to a chunk.
If False, each file corresponds to a chunk.
Only available for ‘pyarrow’ engine.
* **default_index_type** ( *{None* *,* *'range'* *,* *'incremental'}* *,* *default None*) – If index_col not specified, specify type of index to generate.
If not specified, options.dataframe.default_index_type will be used.
* **dtype_backend** ( *{'numpy'* *,* *'pyarrow'}* *,* *default 'numpy'*) – Back-end data type applied to the resultant DataFrame (still experimental).
* **memory_scale** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Ratio of real memory occupation to raw file size.
* **merge_small_files** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – Whether to merge small files.
* **\*\*kwargs** – Any additional kwargs are passed to the engine.
* **Return type:**
MaxFrame DataFrame
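
The effect of `groups_as_chunks` on how work is split can be sketched as follows. `plan_chunks` and the file names are hypothetical and serve only to illustrate the documented rule (one chunk per row group versus one chunk per file). MaxFrame's real planner is not shown here.

```python
def plan_chunks(row_groups_per_file, groups_as_chunks=False):
    """Return (path, row_group) descriptors; row_group is None for whole files."""
    chunks = []
    for path, n_groups in row_groups_per_file.items():
        if groups_as_chunks:
            # One chunk per row group inside the file.
            chunks.extend((path, group) for group in range(n_groups))
        else:
            # One chunk for the whole file.
            chunks.append((path, None))
    return chunks


files = {"part-0.parquet": 3, "part-1.parquet": 1}
print(plan_chunks(files))
print(plan_chunks(files, groups_as_chunks=True))
```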
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.to_datetime.md
# maxframe.dataframe.to_datetime
### maxframe.dataframe.to_datetime(arg, errors: [str](https://docs.python.org/3/library/stdtypes.html#str) = 'raise', dayfirst: [bool](https://docs.python.org/3/library/functions.html#bool) = False, yearfirst: [bool](https://docs.python.org/3/library/functions.html#bool) = False, utc: [bool](https://docs.python.org/3/library/functions.html#bool) = None, format: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, exact: [bool](https://docs.python.org/3/library/functions.html#bool) = True, unit: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, infer_datetime_format: [bool](https://docs.python.org/3/library/functions.html#bool) = False, origin: [Any](https://docs.python.org/3/library/typing.html#typing.Any) = 'unix', cache: [bool](https://docs.python.org/3/library/functions.html#bool) = True)
Convert argument to datetime.
* **Parameters:**
* **arg** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*float*](https://docs.python.org/3/library/functions.html#float) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *datetime* *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *1-d array* *,* *Series* *,* *DataFrame/dict-like*) – The object to convert to a datetime.
* **errors** ( *{'ignore'* *,* *'raise'* *,* *'coerce'}* *,* *default 'raise'*) –
- If ‘raise’, then invalid parsing will raise an exception.
- If ‘coerce’, then invalid parsing will be set as NaT.
- If ‘ignore’, then invalid parsing will return the input.
* **dayfirst** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – Specify a date parse order if arg is str or its list-likes.
If True, parses dates with the day first, eg 10/11/12 is parsed as
2012-11-10.
Warning: dayfirst=True is not strict, but will prefer to parse
with day first (this is a known bug, based on dateutil behavior).
* **yearfirst** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) –
Specify a date parse order if arg is str or its list-likes.
- If True parses dates with the year first, eg 10/11/12 is parsed as
2010-11-12.
- If both dayfirst and yearfirst are True, yearfirst takes precedence (same
as dateutil).
Warning: yearfirst=True is not strict, but will prefer to parse
with year first (this is a known bug, based on dateutil behavior).
* **utc** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default None*) – Return UTC DatetimeIndex if True (converting any tz-aware
datetime.datetime objects as well).
* **format** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default None*) – The strftime to parse time, eg “%d/%m/%Y”, note that “%f” will parse
all the way up to nanoseconds.
See strftime documentation for more information on choices:
[https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
* **exact** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *True by default*) – Behaves as:
- If True, require an exact format match.
- If False, allow the format to match anywhere in the target string.
* **unit** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default 'ns'*) – The unit of the arg (D,s,ms,us,ns) denote the unit, which is an
integer or float number. This will be based off the origin.
Example, with unit=’ms’ and origin=’unix’ (the default), this
would calculate the number of milliseconds to the unix epoch start.
* **infer_datetime_format** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default False*) – If True and no format is given, attempt to infer the format of the
datetime strings, and if it can be inferred, switch to a faster
method of parsing them. In some cases this can increase the parsing
speed by ~5-10x.
* **origin** (*scalar* *,* *default 'unix'*) –
Define the reference date. The numeric values would be parsed as number
of units (defined by unit) since this reference date.
- If ‘unix’ (or POSIX) time; origin is set to 1970-01-01.
- If ‘julian’, unit must be ‘D’, and origin is set to beginning of
Julian Calendar. Julian day number 0 is assigned to the day starting
at noon on January 1, 4713 BC.
- If Timestamp convertible, origin is set to Timestamp identified by
origin.
* **cache** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default True*) – If True, use a cache of unique, converted dates to apply the datetime
conversion. May produce significant speed-up when parsing duplicate
date strings, especially ones with timezone offsets. The cache is only
used when there are at least 50 values. The presence of out-of-bounds
values will render the cache unusable and may slow down parsing.
* **Returns:**
If parsing succeeded.
Return type depends on input:
- list-like: DatetimeIndex
- Series: Series of datetime64 dtype
- scalar: Timestamp
In case when it is not possible to return designated types (e.g. when
any element of input is before Timestamp.min or after Timestamp.max)
return will have datetime.datetime type (or corresponding
array/Series).
* **Return type:**
datetime
#### SEE ALSO
[`DataFrame.astype`](maxframe.dataframe.DataFrame.astype.md#maxframe.dataframe.DataFrame.astype)
: Cast argument to a specified dtype.
`to_timedelta`
: Convert argument to timedelta.
`convert_dtypes`
: Convert dtypes.
### Examples
Assembling a datetime from multiple columns of a DataFrame. The keys can be
common abbreviations like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’,
‘ms’, ‘us’, ‘ns’] or plurals of the same.
```pycon
>>> import maxframe.dataframe as md
```
```pycon
>>> df = md.DataFrame({'year': [2015, 2016],
... 'month': [2, 3],
... 'day': [4, 5]})
>>> md.to_datetime(df).execute()
0 2015-02-04
1 2016-03-05
dtype: datetime64[ns]
```
If a date does not meet the [timestamp limitations](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-timestamp-limits), passing errors=’ignore’
will return the original input instead of raising any exception.
Passing errors=’coerce’ will force an out-of-bounds date to NaT,
in addition to forcing non-dates (or non-parseable dates) to NaT.
```pycon
>>> md.to_datetime('13000101', format='%Y%m%d', errors='ignore').execute()
datetime.datetime(1300, 1, 1, 0, 0)
>>> md.to_datetime('13000101', format='%Y%m%d', errors='coerce').execute()
NaT
```
Passing infer_datetime_format=True can often speed up parsing
if the input is not exactly in ISO 8601 format but is in a regular format.
```pycon
>>> s = md.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
>>> s.head().execute()
0 3/11/2000
1 3/12/2000
2 3/13/2000
3 3/11/2000
4 3/12/2000
dtype: object
```
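The `dayfirst` and `yearfirst` orderings described under Parameters can be reproduced with explicit `strptime` formats from the standard library. This sketch only shows how the same string `10/11/12` is read under each ordering. It does not call MaxFrame, and the month-first reading used when neither flag is set is an assumption.

```python
from datetime import datetime

text = "10/11/12"

# dayfirst=True reads the string as day/month/year.
day_first = datetime.strptime(text, "%d/%m/%y")
# yearfirst=True reads the string as year/month/day.
year_first = datetime.strptime(text, "%y/%m/%d")
# With neither flag set, a month/day/year reading is assumed here.
month_first = datetime.strptime(text, "%m/%d/%y")

print(day_first.date())    # 2012-11-10
print(year_first.date())   # 2010-11-12
print(month_first.date())  # 2012-10-11
```

Passing an explicit `format` removes this ambiguity altogether.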
Using a unix epoch time
```pycon
>>> md.to_datetime(1490195805, unit='s').execute()
Timestamp('2017-03-22 15:16:45')
>>> md.to_datetime(1490195805433502912, unit='ns').execute()
Timestamp('2017-03-22 15:16:45.433502912')
```
#### WARNING
For float arg, precision rounding might happen. To prevent
unexpected behavior use a fixed-width exact type.
Using a non-unix epoch origin
```pycon
>>> md.to_datetime([1, 2, 3], unit='D',
... origin=md.Timestamp('1960-01-01')).execute()
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)
```
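The arithmetic behind `unit` and `origin` is simply "origin plus a number of units". The sketch below reproduces the epoch examples above with `datetime` and `timedelta`. `from_origin`, `from_julian_day`, and `UNIT_KWARG` are hypothetical names for this illustration; nanosecond units are left out because `timedelta` has microsecond resolution.

```python
from datetime import datetime, timedelta

# Map the documented unit codes to timedelta keyword arguments.
UNIT_KWARG = {"D": "days", "s": "seconds", "ms": "milliseconds", "us": "microseconds"}
UNIX_EPOCH = datetime(1970, 1, 1)
# Julian day number of 1970-01-01 00:00 (Julian days start at noon).
JULIAN_DAY_OF_UNIX_EPOCH = 2440587.5


def from_origin(value, unit="s", origin=UNIX_EPOCH):
    """Interpret `value` as a count of `unit` since `origin`."""
    return origin + timedelta(**{UNIT_KWARG[unit]: value})


def from_julian_day(julian_day):
    """Convert a Julian day number to a datetime (origin='julian', unit='D')."""
    return UNIX_EPOCH + timedelta(days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH)


print(from_origin(1490195805, unit="s"))                      # 2017-03-22 15:16:45
print(from_origin(1, unit="D", origin=datetime(1960, 1, 1)))  # 1960-01-02 00:00:00
print(from_julian_day(2451545.0))                             # 2000-01-01 12:00:00
```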
FILE:references/maxframe-client-docs/reference/dataframe/generated/maxframe.dataframe.to_numeric.md
# maxframe.dataframe.to_numeric
### maxframe.dataframe.to_numeric(arg, errors='raise', downcast=None)
Convert argument to a numeric type.
The default return dtype is float64 or int64
depending on the data supplied. Use the downcast parameter
to obtain other dtypes.
Please note that precision loss may occur if really large numbers
are passed in. Due to the internal limitations of ndarray, if
numbers smaller than -9223372036854775808 (np.iinfo(np.int64).min)
or larger than 18446744073709551615 (np.iinfo(np.uint64).max) are
passed in, it is very likely they will be converted to float so that
they can be stored in an ndarray. These warnings apply similarly to
Series since it internally leverages ndarray.
* **Parameters:**
* **arg** (*scalar* *,* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *1-d array* *, or* [*Series*](maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Argument to be converted.
* **errors** ( *{'ignore'* *,* *'raise'* *,* *'coerce'}* *,* *default 'raise'*) –
- If ‘raise’, then invalid parsing will raise an exception.
- If ‘coerce’, then invalid parsing will be set as NaN.
- If ‘ignore’, then invalid parsing will return the input.
* **downcast** ( *{'integer'* *,* *'signed'* *,* *'unsigned'* *,* *'float'}* *,* *default None*) –
If not None, and if the data has been successfully cast to a
numerical dtype (or if the data was numeric to begin with),
downcast that resulting data to the smallest numerical dtype
possible according to the following rules:
- ‘integer’ or ‘signed’: smallest signed int dtype (min.: np.int8)
- ‘unsigned’: smallest unsigned int dtype (min.: np.uint8)
- ‘float’: smallest float dtype (min.: np.float32)
As this behaviour is separate from the core conversion to
numeric values, any errors raised during the downcasting
will be surfaced regardless of the value of the ‘errors’ input.
In addition, downcasting will only occur if the size
of the resulting data’s dtype is strictly larger than
the dtype it is to be cast to, so if none of the dtypes
checked satisfy that specification, no downcasting will be
performed on the data.
* **Returns:**
Numeric if parsing succeeded.
Return type depends on input. Series if Series, otherwise Tensor.
* **Return type:**
ret
#### SEE ALSO
[`DataFrame.astype`](maxframe.dataframe.DataFrame.astype.md#maxframe.dataframe.DataFrame.astype)
: Cast argument to a specified dtype.
[`to_datetime`](maxframe.dataframe.to_datetime.md#maxframe.dataframe.to_datetime)
: Convert argument to datetime.
`to_timedelta`
: Convert argument to timedelta.
[`numpy.ndarray.astype`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html#numpy.ndarray.astype)
: Cast a numpy array to a specified type.
[`DataFrame.convert_dtypes`](maxframe.dataframe.DataFrame.convert_dtypes.md#maxframe.dataframe.DataFrame.convert_dtypes)
: Convert dtypes.
### Examples
Take separate series and convert to numeric, coercing when told to
```pycon
>>> import maxframe.dataframe as md
>>> s = md.Series(['1.0', '2', -3])
>>> md.to_numeric(s).execute()
0 1.0
1 2.0
2 -3.0
dtype: float64
>>> md.to_numeric(s, downcast='float').execute()
0 1.0
1 2.0
2 -3.0
dtype: float32
>>> md.to_numeric(s, downcast='signed').execute()
0 1
1 2
2 -3
dtype: int8
>>> s = md.Series(['apple', '1.0', '2', -3])
>>> md.to_numeric(s, errors='ignore').execute()
0 apple
1 1.0
2 2
3 -3
dtype: object
>>> md.to_numeric(s, errors='coerce').execute()
0 NaN
1 1.0
2 2.0
3 -3.0
dtype: float64
```
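The three `errors` modes differ only in what happens when a value cannot be parsed. The sketch below mimics that behaviour on a plain list with the standard library. `to_numeric_list` is a hypothetical helper, not MaxFrame code, and it always converts to `float` for simplicity.

```python
import math


def to_numeric_list(values, errors="raise"):
    """Convert each value to float, handling failures per the `errors` mode."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    converted = []
    for value in values:
        try:
            converted.append(float(value))
        except (TypeError, ValueError):
            if errors == "raise":
                raise
            if errors == "ignore":
                # Give the whole input back unchanged.
                return list(values)
            # errors == "coerce": mark the bad value as NaN and carry on.
            converted.append(math.nan)
    return converted


data = ["apple", "1.0", "2", -3]
print(to_numeric_list(data, errors="ignore"))  # ['apple', '1.0', '2', -3]
print(to_numeric_list(data, errors="coerce"))  # [nan, 1.0, 2.0, -3.0]
```

As in the example above, `'ignore'` returns the input as-is, so the other values stay unconverted as well.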
Downcasting of nullable integer and floating dtypes is supported:
```pycon
>>> s = md.Series([1, 2, 3], dtype="int64")
>>> md.to_numeric(s, downcast="integer").execute()
0 1
1 2
2 3
dtype: int8
>>> s = md.Series([1.0, 2.1, 3.0], dtype="float64")
>>> md.to_numeric(s, downcast="float").execute()
0 1.0
1 2.1
2 3.0
dtype: float32
```
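The rule for `downcast='integer'` or `'signed'` (pick the smallest signed integer dtype that can hold every value) can be sketched without NumPy. `smallest_signed_int` is a hypothetical helper that returns the dtype name as a string; it is not MaxFrame's or NumPy's implementation.

```python
SIGNED_BITS = (8, 16, 32, 64)


def smallest_signed_int(values):
    """Return the name of the narrowest signed int dtype that fits `values`."""
    lo, hi = min(values), max(values)
    for bits in SIGNED_BITS:
        # A signed integer of this width covers [-2**(bits-1), 2**(bits-1) - 1].
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return f"int{bits}"
    raise OverflowError("values do not fit in int64")


print(smallest_signed_int([1, 2, 3]))     # int8
print(smallest_signed_int([1, -3, 200]))  # int16
print(smallest_signed_int([2 ** 31]))     # int64
```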
FILE:references/maxframe-client-docs/reference/dataframe/groupby.md
<a id="generated-groupby"></a>
# GroupBy
GroupBy objects are returned by groupby
calls: [`maxframe.dataframe.DataFrame.groupby()`](generated/maxframe.dataframe.DataFrame.groupby.md#maxframe.dataframe.DataFrame.groupby), [`maxframe.dataframe.Series.groupby()`](generated/maxframe.dataframe.Series.groupby.md#maxframe.dataframe.Series.groupby), etc.
## Indexing, iteration
## Function application
| Method | Description |
|--------|-------------|
| [`GroupBy.apply`](generated/maxframe.dataframe.groupby.GroupBy.apply.md#maxframe.dataframe.groupby.GroupBy.apply)(func, \*args[, output_type, ...]) | Apply function func group-wise and combine the results together. |
| [`GroupBy.agg`](generated/maxframe.dataframe.groupby.GroupBy.agg.md#maxframe.dataframe.groupby.GroupBy.agg)([func, method]) | Aggregate using one or more operations on grouped data. |
| [`GroupBy.aggregate`](generated/maxframe.dataframe.groupby.GroupBy.aggregate.md#maxframe.dataframe.groupby.GroupBy.aggregate)([func, method]) | Aggregate using one or more operations on grouped data. |
| [`GroupBy.transform`](generated/maxframe.dataframe.groupby.GroupBy.transform.md#maxframe.dataframe.groupby.GroupBy.transform)(f, \*args[, dtypes, dtype, ...]) | Call function producing a like-indexed DataFrame on each group and return a DataFrame having the same indexes as the original object filled with the transformed values. |
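
The difference between aggregating, transforming, and applying can be seen on plain Python lists. The sketch below uses only the standard library and hypothetical variable names. It mirrors the shape of each result (one value per group versus one value per input row) and does not call MaxFrame.

```python
from collections import defaultdict

keys = ["a", "b", "a", "b", "a"]
values = [1, 10, 2, 20, 3]

# Split: collect the values that belong to each key.
groups = defaultdict(list)
for key, value in zip(keys, values):
    groups[key].append(value)

# agg: one result per group.
agg_sum = {key: sum(vals) for key, vals in groups.items()}

# transform: one result per input row, aligned with the original order.
transform_sum = [agg_sum[key] for key in keys]

# apply: an arbitrary function per group, combined afterwards.
apply_range = {key: max(vals) - min(vals) for key, vals in groups.items()}

print(agg_sum)        # {'a': 6, 'b': 30}
print(transform_sum)  # [6, 30, 6, 30, 6]
print(apply_range)    # {'a': 2, 'b': 10}
```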
## Computations / descriptive stats
| [`GroupBy.all`](generated/maxframe.dataframe.groupby.GroupBy.all.md#maxframe.dataframe.groupby.GroupBy.all)(\*\*kw) | |
|----------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------|
| [`GroupBy.any`](generated/maxframe.dataframe.groupby.GroupBy.any.md#maxframe.dataframe.groupby.GroupBy.any)(\*\*kw) | |
| [`GroupBy.cumcount`](generated/maxframe.dataframe.groupby.GroupBy.cumcount.md#maxframe.dataframe.groupby.GroupBy.cumcount)([ascending]) | Number each item in each group from 0 to the length of that group - 1. |
| [`GroupBy.cummax`](generated/maxframe.dataframe.groupby.GroupBy.cummax.md#maxframe.dataframe.groupby.GroupBy.cummax)() | Cumulative max for each group. |
| [`GroupBy.cummin`](generated/maxframe.dataframe.groupby.GroupBy.cummin.md#maxframe.dataframe.groupby.GroupBy.cummin)() | Cumulative min for each group. |
| [`GroupBy.cumprod`](generated/maxframe.dataframe.groupby.GroupBy.cumprod.md#maxframe.dataframe.groupby.GroupBy.cumprod)() | Cumulative prod for each group. |
| [`GroupBy.cumsum`](generated/maxframe.dataframe.groupby.GroupBy.cumsum.md#maxframe.dataframe.groupby.GroupBy.cumsum)() | Cumulative sum for each group. |
| [`GroupBy.count`](generated/maxframe.dataframe.groupby.GroupBy.count.md#maxframe.dataframe.groupby.GroupBy.count)(\*\*kw) | |
| [`GroupBy.expanding`](generated/maxframe.dataframe.groupby.GroupBy.expanding.md#maxframe.dataframe.groupby.GroupBy.expanding)([min_periods, shift, ...]) | Return an expanding grouper, providing expanding functionality per group. |
| [`GroupBy.max`](generated/maxframe.dataframe.groupby.GroupBy.max.md#maxframe.dataframe.groupby.GroupBy.max)(\*\*kw) | |
| [`GroupBy.mean`](generated/maxframe.dataframe.groupby.GroupBy.mean.md#maxframe.dataframe.groupby.GroupBy.mean)(\*\*kw) | |
| [`GroupBy.median`](generated/maxframe.dataframe.groupby.GroupBy.median.md#maxframe.dataframe.groupby.GroupBy.median)(\*\*kw) | |
| [`GroupBy.min`](generated/maxframe.dataframe.groupby.GroupBy.min.md#maxframe.dataframe.groupby.GroupBy.min)(\*\*kw) | |
| [`GroupBy.rolling`](generated/maxframe.dataframe.groupby.GroupBy.rolling.md#maxframe.dataframe.groupby.GroupBy.rolling)(window[, min_periods, ...]) | Return a rolling grouper, providing rolling functionality per group. |
| [`GroupBy.size`](generated/maxframe.dataframe.groupby.GroupBy.size.md#maxframe.dataframe.groupby.GroupBy.size)(\*\*kw) | |
| [`GroupBy.sem`](generated/maxframe.dataframe.groupby.GroupBy.sem.md#maxframe.dataframe.groupby.GroupBy.sem)(\*\*kw) | |
| [`GroupBy.std`](generated/maxframe.dataframe.groupby.GroupBy.std.md#maxframe.dataframe.groupby.GroupBy.std)(\*\*kw) | |
| [`GroupBy.sum`](generated/maxframe.dataframe.groupby.GroupBy.sum.md#maxframe.dataframe.groupby.GroupBy.sum)(\*\*kw) | |
| [`GroupBy.var`](generated/maxframe.dataframe.groupby.GroupBy.var.md#maxframe.dataframe.groupby.GroupBy.var)(\*\*kw) | |
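The cumulative methods in the table above all return one value per row rather than one value per group. As an illustration of that behaviour, here is a minimal plain-Python sketch of what `cumcount` and `cumsum` compute per group. The function names `group_cumcount` and `group_cumsum` are hypothetical, lists stand in for Series, and this is not how MaxFrame evaluates these operations internally.

```python
def group_cumcount(keys):
    """Number each row within its group, starting at 0."""
    seen = {}
    out = []
    for key in keys:
        position = seen.get(key, 0)
        out.append(position)
        seen[key] = position + 1
    return out


def group_cumsum(keys, values):
    """Running sum of values, tracked separately for each group."""
    running = {}
    out = []
    for key, value in zip(keys, values):
        running[key] = running.get(key, 0) + value
        out.append(running[key])
    return out


keys = ["a", "a", "b", "a", "b"]
values = [1, 2, 3, 4, 5]

counts = group_cumcount(keys)
sums = group_cumsum(keys, values)
```

Each output list has the same length as the input, and rows from different groups never influence each other's running totals.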
The following methods are available in both `SeriesGroupBy` and
`DataFrameGroupBy` objects, but may differ slightly: the
`DataFrameGroupBy` version usually permits the specification of an
axis argument, and often an argument indicating whether to restrict
application to columns of a specific data type.
| [`DataFrameGroupBy.count`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.count.md#maxframe.dataframe.groupby.DataFrameGroupBy.count)(\*\*kw) | |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|
| [`DataFrameGroupBy.nunique`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.nunique.md#maxframe.dataframe.groupby.DataFrameGroupBy.nunique)(\*\*kw) | |
| [`DataFrameGroupBy.cummax`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.cummax.md#maxframe.dataframe.groupby.DataFrameGroupBy.cummax)() | Cumulative max for each group. |
| [`DataFrameGroupBy.cummin`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.cummin.md#maxframe.dataframe.groupby.DataFrameGroupBy.cummin)() | Cumulative min for each group. |
| [`DataFrameGroupBy.cumprod`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.cumprod.md#maxframe.dataframe.groupby.DataFrameGroupBy.cumprod)() | Cumulative prod for each group. |
| [`DataFrameGroupBy.cumsum`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.cumsum.md#maxframe.dataframe.groupby.DataFrameGroupBy.cumsum)() | Cumulative sum for each group. |
| [`DataFrameGroupBy.fillna`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.fillna.md#maxframe.dataframe.groupby.DataFrameGroupBy.fillna)([value, method, ...]) | Fill NA/NaN values using the specified method |
| [`DataFrameGroupBy.idxmax`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.idxmax.md#maxframe.dataframe.groupby.DataFrameGroupBy.idxmax)(\*\*kw) | |
| [`DataFrameGroupBy.idxmin`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.idxmin.md#maxframe.dataframe.groupby.DataFrameGroupBy.idxmin)(\*\*kw) | |
| [`DataFrameGroupBy.rank`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.rank.md#maxframe.dataframe.groupby.DataFrameGroupBy.rank)([method, ascending, ...]) | Provide the rank of values within each group. |
| [`DataFrameGroupBy.sample`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.sample.md#maxframe.dataframe.groupby.DataFrameGroupBy.sample)([n, frac, replace, ...]) | Return a random sample of items from each group. |
The following methods are available only for `SeriesGroupBy` objects.
The following methods are available only for `DataFrameGroupBy` objects.
| [`DataFrameGroupBy.mf.apply_chunk`](generated/maxframe.dataframe.groupby.DataFrameGroupBy.mf.apply_chunk.md#maxframe.dataframe.groupby.DataFrameGroupBy.mf.apply_chunk)(func[, ...]) | Apply function func group-wise and combine the results together. |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
FILE:references/maxframe-client-docs/reference/dataframe/index.md
<a id="dataframe-api"></a>
# MaxFrame DataFrame
This page gives an overview of all public MaxFrame DataFrame objects, functions and
methods. All classes and functions exposed in `maxframe.dataframe.*` namespace are public.
* [Input/output](io.md)
* [Clipboard](io.md#clipboard)
* [Flat file](io.md#flat-file)
* [JSON](io.md#json)
* [MaxCompute](io.md#maxcompute)
* [Native pandas](io.md#native-pandas)
* [Parquet](io.md#parquet)
* [General functions](general_functions.md)
* [Data manipulations](general_functions.md#data-manipulations)
* [Top-level missing data](general_functions.md#top-level-missing-data)
* [Top-level dealing with datetimelike](general_functions.md#top-level-dealing-with-datetimelike)
* [Top-level evaluation](general_functions.md#top-level-evaluation)
* [Series](series.md)
* [Constructor](series.md#constructor)
* [Attributes](series.md#attributes)
* [Conversion](series.md#conversion)
* [Index, iteration](series.md#index-iteration)
* [Binary operator functions](series.md#binary-operator-functions)
* [Function application, groupby & window](series.md#function-application-groupby-window)
* [Computations / descriptive stats](series.md#computations-descriptive-stats)
* [Reindexing / selection / label manipulation](series.md#reindexing-selection-label-manipulation)
* [Missing data handling](series.md#missing-data-handling)
* [Reshaping, sorting](series.md#reshaping-sorting)
* [Combining / comparing / joining / merging](series.md#combining-comparing-joining-merging)
* [Time Series-related](series.md#time-series-related)
* [Accessors](series.md#accessors)
* [Plotting](series.md#plotting)
* [DataFrame](frame.md)
* [Constructor](frame.md#constructor)
* [Attributes and underlying data](frame.md#attributes-and-underlying-data)
* [Conversion](frame.md#conversion)
* [Indexing, iteration](frame.md#indexing-iteration)
* [Binary operator functions](frame.md#binary-operator-functions)
* [Function application, GroupBy & window](frame.md#function-application-groupby-window)
* [Computations / descriptive stats](frame.md#computations-descriptive-stats)
* [Reindexing / selection / label manipulation](frame.md#reindexing-selection-label-manipulation)
* [Missing data handling](frame.md#missing-data-handling)
* [Reshaping, sorting, transposing](frame.md#reshaping-sorting-transposing)
* [Combining / comparing / joining / merging](frame.md#combining-comparing-joining-merging)
* [Plotting](frame.md#plotting)
* [Serialization / IO / conversion](frame.md#serialization-io-conversion)
* [MaxFrame Extensions](frame.md#maxframe-extensions)
* [Index objects](indexing.md)
* [Constructor](indexing.md#constructor)
* [Properties](indexing.md#properties)
* [Modifying and computations](indexing.md#modifying-and-computations)
* [Compatibility with MultiIndex](indexing.md#compatibility-with-multiindex)
* [Missing values](indexing.md#missing-values)
* [Conversion](indexing.md#conversion)
* [Sorting](indexing.md#sorting)
* [Selecting](indexing.md#selecting)
* [GroupBy](groupby.md)
* [Indexing, iteration](groupby.md#indexing-iteration)
* [Function application](groupby.md#function-application)
* [Computations / descriptive stats](groupby.md#computations-descriptive-stats)
FILE:references/maxframe-client-docs/reference/dataframe/indexing.md
# Index objects
## Constructor
| [`Index`](generated/maxframe.dataframe.Index.md#maxframe.dataframe.Index)(data, \*\*_) | |
|------------------------------------------------------------------------------------------|----|
## Properties
| [`Index.has_duplicates`](generated/maxframe.dataframe.Index.has_duplicates.md#maxframe.dataframe.Index.has_duplicates) | |
|---------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| [`Index.hasnans`](generated/maxframe.dataframe.Index.hasnans.md#maxframe.dataframe.Index.hasnans) | Return True if there are any NaNs. |
| [`Index.is_monotonic_decreasing`](generated/maxframe.dataframe.Index.is_monotonic_decreasing.md#maxframe.dataframe.Index.is_monotonic_decreasing) | Return boolean scalar if values in the object are monotonic_decreasing. |
| [`Index.is_monotonic_increasing`](generated/maxframe.dataframe.Index.is_monotonic_increasing.md#maxframe.dataframe.Index.is_monotonic_increasing) | Return boolean scalar if values in the object are monotonic_increasing. |
| [`Index.is_unique`](generated/maxframe.dataframe.Index.is_unique.md#maxframe.dataframe.Index.is_unique) | Return boolean if values in the index are unique. |
| [`Index.name`](generated/maxframe.dataframe.Index.name.md#maxframe.dataframe.Index.name) | |
| [`Index.names`](generated/maxframe.dataframe.Index.names.md#maxframe.dataframe.Index.names) | |
| [`Index.ndim`](generated/maxframe.dataframe.Index.ndim.md#maxframe.dataframe.Index.ndim) | |
| [`Index.size`](generated/maxframe.dataframe.Index.size.md#maxframe.dataframe.Index.size) | |
## Modifying and computations
| [`Index.all`](generated/maxframe.dataframe.Index.all.md#maxframe.dataframe.Index.all)() | |
|-------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|
| [`Index.any`](generated/maxframe.dataframe.Index.any.md#maxframe.dataframe.Index.any)() | |
| [`Index.argmax`](generated/maxframe.dataframe.Index.argmax.md#maxframe.dataframe.Index.argmax)([axis, skipna])                            | Return int position of the largest value in the Series.          |
| [`Index.argmin`](generated/maxframe.dataframe.Index.argmin.md#maxframe.dataframe.Index.argmin)([axis, skipna]) | Return int position of the smallest value in the Series. |
| [`Index.drop`](generated/maxframe.dataframe.Index.drop.md#maxframe.dataframe.Index.drop)(labels[, errors]) | Make new Index with passed list of labels deleted. |
| [`Index.drop_duplicates`](generated/maxframe.dataframe.Index.drop_duplicates.md#maxframe.dataframe.Index.drop_duplicates)([keep, method]) | Return Index with duplicate values removed. |
| [`Index.factorize`](generated/maxframe.dataframe.Index.factorize.md#maxframe.dataframe.Index.factorize)([sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| [`Index.insert`](generated/maxframe.dataframe.Index.insert.md#maxframe.dataframe.Index.insert)(loc, value) | Make new Index inserting new item at location. |
| [`Index.max`](generated/maxframe.dataframe.Index.max.md#maxframe.dataframe.Index.max)([axis, skipna]) | |
| [`Index.min`](generated/maxframe.dataframe.Index.min.md#maxframe.dataframe.Index.min)([axis, skipna]) | |
| [`Index.rename`](generated/maxframe.dataframe.Index.rename.md#maxframe.dataframe.Index.rename)(name[, inplace]) | Alter Index or MultiIndex name. |
| [`Index.repeat`](generated/maxframe.dataframe.Index.repeat.md#maxframe.dataframe.Index.repeat)(repeats[, axis]) | Repeat elements of an Index. |
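"Encode the object as an enumerated type" is terse, so the sketch below shows the idea behind `factorize` in plain Python. It assumes pandas-like behaviour: codes follow first appearance unless `sort=True`, and missing values receive the sentinel code `-1`. The function `factorize_sketch` is hypothetical, `None` stands in for NA, and this is not the MaxFrame implementation.

```python
def factorize_sketch(values, sort=False):
    """Return (codes, uniques) so that uniques[codes[i]] == values[i].

    Missing values (None here) are excluded from uniques and coded as -1.
    """
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    uniques = [v for v in dict.fromkeys(values) if v is not None]
    if sort:
        uniques = sorted(uniques)
    code_of = {value: code for code, value in enumerate(uniques)}
    codes = [code_of.get(value, -1) for value in values]
    return codes, uniques


labels = ["b", "a", None, "b", "c"]

codes, uniques = factorize_sketch(labels)
sorted_codes, sorted_uniques = factorize_sketch(labels, sort=True)
```

With the default ordering, `uniques` lists labels in the order they first occur. With `sort=True`, the uniques are sorted and the codes are renumbered to match.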
## Compatibility with MultiIndex
| [`Index.droplevel`](generated/maxframe.dataframe.Index.droplevel.md#maxframe.dataframe.Index.droplevel)(level) | Return index with requested level(s) removed. |
|----------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|
| [`Index.set_names`](generated/maxframe.dataframe.Index.set_names.md#maxframe.dataframe.Index.set_names)(names[, level, inplace]) | Set Index or MultiIndex name. |
## Missing values
| [`Index.dropna`](generated/maxframe.dataframe.Index.dropna.md#maxframe.dataframe.Index.dropna)([how]) | Return Index without NA/NaN values. |
|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------|
| [`Index.fillna`](generated/maxframe.dataframe.Index.fillna.md#maxframe.dataframe.Index.fillna)([value, downcast]) | Fill NA/NaN values with the specified value. |
| [`Index.isna`](generated/maxframe.dataframe.Index.isna.md#maxframe.dataframe.Index.isna)() | Detect missing values. |
| [`Index.notna`](generated/maxframe.dataframe.Index.notna.md#maxframe.dataframe.Index.notna)() | Detect existing (non-missing) values. |
## Conversion
| [`Index.astype`](generated/maxframe.dataframe.Index.astype.md#maxframe.dataframe.Index.astype)(dtype[, copy]) | Create an Index with values cast to dtypes. |
|------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------|
| [`Index.to_frame`](generated/maxframe.dataframe.Index.to_frame.md#maxframe.dataframe.Index.to_frame)([index, name]) | Create a DataFrame with a column containing the Index. |
| [`Index.to_series`](generated/maxframe.dataframe.Index.to_series.md#maxframe.dataframe.Index.to_series)([index, name]) | Create a Series with both index and values equal to the index keys. |
## Sorting
| [`Index.argsort`](generated/maxframe.dataframe.Index.argsort.md#maxframe.dataframe.Index.argsort)(\*args, \*\*kwargs) | |
|-------------------------------------------------------------------------------------------------------------------------|----|
## Selecting
| [`Index.get_level_values`](generated/maxframe.dataframe.Index.get_level_values.md#maxframe.dataframe.Index.get_level_values)(level) | Return vector of label values for requested level. |
|---------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------|
FILE:references/maxframe-client-docs/reference/dataframe/io.md
<a id="generated-io"></a>
# Input/output
## Clipboard
| [`read_clipboard`](generated/maxframe.dataframe.read_clipboard.md#maxframe.dataframe.read_clipboard)([sep]) | Read text from clipboard and pass to [`read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv). |
|-----------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|
| [`DataFrame.to_clipboard`](generated/maxframe.dataframe.DataFrame.to_clipboard.md#maxframe.dataframe.DataFrame.to_clipboard)(\*[, excel, sep, ...]) | Copy object to the system clipboard. |
## Flat file
| [`read_csv`](generated/maxframe.dataframe.read_csv.md#maxframe.dataframe.read_csv)(path, \*[, names, sep, index_col, ...]) | Read a comma-separated values (csv) file into DataFrame. |
|--------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| [`DataFrame.to_csv`](generated/maxframe.dataframe.DataFrame.to_csv.md#maxframe.dataframe.DataFrame.to_csv)(path[, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
## JSON
| [`read_json`](generated/maxframe.dataframe.read_json.md#maxframe.dataframe.read_json)(path, \*[, orient, typ, dtype, ...]) | Read a JSON file into a DataFrame. |
|------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|
| [`DataFrame.to_json`](generated/maxframe.dataframe.DataFrame.to_json.md#maxframe.dataframe.DataFrame.to_json)([path, orient, ...]) | Convert the object to a JSON string. |
## MaxCompute
| [`read_odps_query`](generated/maxframe.dataframe.read_odps_query.md#maxframe.dataframe.read_odps_query)(query[, odps_entry, ...]) | Read data from a MaxCompute (ODPS) query into DataFrame. |
|----------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| [`read_odps_table`](generated/maxframe.dataframe.read_odps_table.md#maxframe.dataframe.read_odps_table)(table_name[, partitions, ...]) | Read data from a MaxCompute (ODPS) table into DataFrame. |
| [`DataFrame.to_odps_table`](generated/maxframe.dataframe.DataFrame.to_odps_table.md#maxframe.dataframe.DataFrame.to_odps_table)(table[, partition, ...]) | Write DataFrame object into a MaxCompute (ODPS) table. |
## Native pandas
| [`read_pandas`](generated/maxframe.dataframe.read_pandas.md#maxframe.dataframe.read_pandas)(data, \*\*kwargs) | Create MaxFrame objects from pandas. |
|--------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|
| [`DataFrame.to_pandas`](generated/maxframe.dataframe.DataFrame.to_pandas.md#maxframe.dataframe.DataFrame.to_pandas)([session]) | |
## Parquet
| [`read_parquet`](generated/maxframe.dataframe.read_parquet.md#maxframe.dataframe.read_parquet)(path[, engine, columns, ...]) | Load a parquet object from the file path, returning a DataFrame. |
|---------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| [`DataFrame.to_parquet`](generated/maxframe.dataframe.DataFrame.to_parquet.md#maxframe.dataframe.DataFrame.to_parquet)(path[, engine, ...]) | Write a DataFrame to the binary parquet format, each chunk will be written to a Parquet file. |
FILE:references/maxframe-client-docs/reference/dataframe/series.md
# Series
## Constructor
| [`Series`](generated/maxframe.dataframe.Series.md#maxframe.dataframe.Series)([data, index, dtype, name, copy, ...]) | |
|-----------------------------------------------------------------------------------------------------------------------|----|
## Attributes
**Axes**
| [`Series.index`](generated/maxframe.dataframe.Series.index.md#maxframe.dataframe.Series.index) | The index (axis labels) of the Series. |
|--------------------------------------------------------------------------------------------------|------------------------------------------|

| [`Series.dtype`](generated/maxframe.dataframe.Series.dtype.md#maxframe.dataframe.Series.dtype)                                     | Return the dtype object of the underlying data.                   |
|------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
| [`Series.memory_usage`](generated/maxframe.dataframe.Series.memory_usage.md#maxframe.dataframe.Series.memory_usage)([index, deep]) | Return the memory usage of the Series. |
| [`Series.ndim`](generated/maxframe.dataframe.Series.ndim.md#maxframe.dataframe.Series.ndim) | Return an int representing the number of axes / array dimensions. |
| [`Series.name`](generated/maxframe.dataframe.Series.name.md#maxframe.dataframe.Series.name) | |
| [`Series.shape`](generated/maxframe.dataframe.Series.shape.md#maxframe.dataframe.Series.shape) | |
| [`Series.T`](generated/maxframe.dataframe.Series.T.md#maxframe.dataframe.Series.T) | Return the transpose, which is by definition self. |
## Conversion
| [`Series.astype`](generated/maxframe.dataframe.Series.astype.md#maxframe.dataframe.Series.astype)(dtype[, copy, errors]) | Cast a pandas object to a specified dtype `dtype`. |
|-------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
| [`Series.convert_dtypes`](generated/maxframe.dataframe.Series.convert_dtypes.md#maxframe.dataframe.Series.convert_dtypes)([infer_objects, ...]) | Convert columns to best possible dtypes using dtypes supporting `pd.NA`. |
| [`Series.copy`](generated/maxframe.dataframe.Series.copy.md#maxframe.dataframe.Series.copy)([deep]) | Make a copy of this object's indices and data. |
| [`Series.infer_objects`](generated/maxframe.dataframe.Series.infer_objects.md#maxframe.dataframe.Series.infer_objects)([copy]) | Attempt to infer better dtypes for object columns. |
| [`Series.to_frame`](generated/maxframe.dataframe.Series.to_frame.md#maxframe.dataframe.Series.to_frame)([name]) | Convert Series to DataFrame. |
## Index, iteration
| [`Series.at`](generated/maxframe.dataframe.Series.at.md#maxframe.dataframe.Series.at) | Access a single value for a row/column label pair. |
|-----------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| [`Series.iat`](generated/maxframe.dataframe.Series.iat.md#maxframe.dataframe.Series.iat) | Access a single value for a row/column pair by integer position. |
| [`Series.iloc`](generated/maxframe.dataframe.Series.iloc.md#maxframe.dataframe.Series.iloc) | Purely integer-location based indexing for selection by position. |
| [`Series.loc`](generated/maxframe.dataframe.Series.loc.md#maxframe.dataframe.Series.loc) | Access a group of rows and columns by label(s) or a boolean array. |
| [`Series.mask`](generated/maxframe.dataframe.Series.mask.md#maxframe.dataframe.Series.mask)(cond[, other, inplace, axis, ...]) | Replace values where the condition is True. |
| [`Series.pop`](generated/maxframe.dataframe.Series.pop.md#maxframe.dataframe.Series.pop)(item)                                    | Return item and drop it from the Series.                           |
| [`Series.xs`](generated/maxframe.dataframe.Series.xs.md#maxframe.dataframe.Series.xs)(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
| [`Series.where`](generated/maxframe.dataframe.Series.where.md#maxframe.dataframe.Series.where)(cond[, other, inplace, axis, ...]) | Replace values where the condition is False. |
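`Series.where` and `Series.mask` are easy to confuse because they are mirror images of each other. The plain-Python sketch below illustrates the semantics described in the table. The names `where_sketch` and `mask_sketch` are hypothetical, lists stand in for Series, and `None` stands in for the NA value that would be used when no `other` is given.

```python
def where_sketch(values, cond, other=None):
    """Keep values where cond is True; replace them where cond is False."""
    return [v if keep else other for v, keep in zip(values, cond)]


def mask_sketch(values, cond, other=None):
    """Replace values where cond is True; keep them where cond is False."""
    return [other if hide else v for v, hide in zip(values, cond)]


values = [1, 2, 3, 4]
cond = [v > 2 for v in values]

kept = where_sketch(values, cond)
hidden = mask_sketch(values, cond)
capped = mask_sketch(values, cond, other=2)
```

Applying `where_sketch` with a condition gives the same result as applying `mask_sketch` with the negated condition.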
## Binary operator functions
| [`Series.add`](generated/maxframe.dataframe.Series.add.md#maxframe.dataframe.Series.add)(other[, level, fill_value, axis]) | Return Addition of series and other, element-wise (binary operator add). |
|---------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|
| [`Series.sub`](generated/maxframe.dataframe.Series.sub.md#maxframe.dataframe.Series.sub)(other[, level, fill_value, axis]) | Return Subtraction of series and other, element-wise (binary operator subtract). |
| [`Series.mul`](generated/maxframe.dataframe.Series.mul.md#maxframe.dataframe.Series.mul)(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator mul). |
| [`Series.div`](generated/maxframe.dataframe.Series.div.md#maxframe.dataframe.Series.div)(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator truediv). |
| [`Series.truediv`](generated/maxframe.dataframe.Series.truediv.md#maxframe.dataframe.Series.truediv)(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator truediv). |
| [`Series.floordiv`](generated/maxframe.dataframe.Series.floordiv.md#maxframe.dataframe.Series.floordiv)(other[, level, fill_value, axis]) | Return Integer division of series and other, element-wise (binary operator floordiv). |
| [`Series.mod`](generated/maxframe.dataframe.Series.mod.md#maxframe.dataframe.Series.mod)(other[, level, fill_value, axis]) | Return Modulo of series and other, element-wise (binary operator mod). |
| [`Series.pow`](generated/maxframe.dataframe.Series.pow.md#maxframe.dataframe.Series.pow)(other[, level, fill_value, axis]) | Return Exponential power of series and other, element-wise (binary operator pow). |
| [`Series.radd`](generated/maxframe.dataframe.Series.radd.md#maxframe.dataframe.Series.radd)(other[, level, fill_value, axis]) | Return Addition of series and other, element-wise (binary operator radd). |
| [`Series.rsub`](generated/maxframe.dataframe.Series.rsub.md#maxframe.dataframe.Series.rsub)(other[, level, fill_value, axis]) | Return Subtraction of series and other, element-wise (binary operator rsubtract). |
| [`Series.rmul`](generated/maxframe.dataframe.Series.rmul.md#maxframe.dataframe.Series.rmul)(other[, level, fill_value, axis]) | Return Multiplication of series and other, element-wise (binary operator rmul). |
| [`Series.rdiv`](generated/maxframe.dataframe.Series.rdiv.md#maxframe.dataframe.Series.rdiv)(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator rtruediv). |
| [`Series.rtruediv`](generated/maxframe.dataframe.Series.rtruediv.md#maxframe.dataframe.Series.rtruediv)(other[, level, fill_value, axis]) | Return Floating division of series and other, element-wise (binary operator rtruediv). |
| [`Series.rfloordiv`](generated/maxframe.dataframe.Series.rfloordiv.md#maxframe.dataframe.Series.rfloordiv)(other[, level, fill_value, ...]) | Return Integer division of series and other, element-wise (binary operator rfloordiv). |
| [`Series.rmod`](generated/maxframe.dataframe.Series.rmod.md#maxframe.dataframe.Series.rmod)(other[, level, fill_value, axis]) | Return Modulo of series and other, element-wise (binary operator rmod). |
| [`Series.rpow`](generated/maxframe.dataframe.Series.rpow.md#maxframe.dataframe.Series.rpow)(other[, level, fill_value, axis]) | Return Exponential power of series and other, element-wise (binary operator rpow). |
| [`Series.lt`](generated/maxframe.dataframe.Series.lt.md#maxframe.dataframe.Series.lt)(other[, level, fill_value, axis]) | Return Less than of series and other, element-wise (binary operator lt). |
| [`Series.gt`](generated/maxframe.dataframe.Series.gt.md#maxframe.dataframe.Series.gt)(other[, level, fill_value, axis]) | Return Greater than of series and other, element-wise (binary operator gt). |
| [`Series.le`](generated/maxframe.dataframe.Series.le.md#maxframe.dataframe.Series.le)(other[, level, fill_value, axis]) | Return Less than or equal to of series and other, element-wise (binary operator le). |
| [`Series.ge`](generated/maxframe.dataframe.Series.ge.md#maxframe.dataframe.Series.ge)(other[, level, fill_value, axis]) | Return Greater than or equal to of series and other, element-wise (binary operator ge). |
| [`Series.ne`](generated/maxframe.dataframe.Series.ne.md#maxframe.dataframe.Series.ne)(other[, level, fill_value, axis]) | Return Not equal to of series and other, element-wise (binary operator ne). |
| [`Series.eq`](generated/maxframe.dataframe.Series.eq.md#maxframe.dataframe.Series.eq)(other[, level, fill_value, axis]) | Return Equal to of series and other, element-wise (binary operator eq). |
| [`Series.combine`](generated/maxframe.dataframe.Series.combine.md#maxframe.dataframe.Series.combine)(other, func[, fill_value]) | Combine the Series with a Series or scalar according to func. |
| [`Series.combine_first`](generated/maxframe.dataframe.Series.combine_first.md#maxframe.dataframe.Series.combine_first)(other) | Update null elements with value in the same location in 'other'. |
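Most binary operator methods above take a `fill_value` argument, and `combine_first` patches missing entries from another Series. The sketch below illustrates both ideas in plain Python, assuming pandas-like label alignment: the result covers the union of labels, and `fill_value` substitutes for a value missing on one side only. Dicts stand in for labelled Series, `None` stands in for NA, the names `add_aligned` and `combine_first_sketch` are hypothetical, and this is not MaxFrame's implementation.

```python
def add_aligned(left, right, fill_value=None):
    """Add two label -> value mappings over the union of their labels."""
    result = {}
    for label in sorted(set(left) | set(right)):
        a, b = left.get(label), right.get(label)
        if a is None and b is None:
            result[label] = None  # missing on both sides stays missing
        elif fill_value is None and (a is None or b is None):
            result[label] = None  # no fill value: missing propagates
        else:
            a = fill_value if a is None else a
            b = fill_value if b is None else b
            result[label] = a + b
    return result


def combine_first_sketch(left, right):
    """Take values from left, falling back to right where left is missing."""
    result = {}
    for label in sorted(set(left) | set(right)):
        a = left.get(label)
        result[label] = a if a is not None else right.get(label)
    return result


left = {"x": 1, "y": None}
right = {"y": 20, "z": 30}

plain_sum = add_aligned(left, right)
filled_sum = add_aligned(left, right, fill_value=0)
patched = combine_first_sketch(left, right)
```

Without `fill_value`, every label that is missing on either side yields a missing result. With `fill_value=0`, only labels missing on both sides would remain missing.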
## Function application, groupby & window
| [`Series.apply`](generated/maxframe.dataframe.Series.apply.md#maxframe.dataframe.Series.apply)(func[, convert_dtype, ...]) | Invoke function on values of Series. |
|----------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
| [`Series.agg`](generated/maxframe.dataframe.Series.agg.md#maxframe.dataframe.Series.agg)([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`Series.aggregate`](generated/maxframe.dataframe.Series.aggregate.md#maxframe.dataframe.Series.aggregate)([func, axis]) | Aggregate using one or more operations over the specified axis. |
| [`Series.ewm`](generated/maxframe.dataframe.Series.ewm.md#maxframe.dataframe.Series.ewm)([com, span, halflife, alpha, ...]) | Provide exponential weighted functions. |
| [`Series.expanding`](generated/maxframe.dataframe.Series.expanding.md#maxframe.dataframe.Series.expanding)([min_periods, shift, ...]) | Provide expanding transformations. |
| [`Series.groupby`](generated/maxframe.dataframe.Series.groupby.md#maxframe.dataframe.Series.groupby)([by, level, as_index, sort, ...])   | Group Series using a mapper or by a Series of columns.          |
| [`Series.map`](generated/maxframe.dataframe.Series.map.md#maxframe.dataframe.Series.map)(arg[, na_action, dtype, ...]) | Map values of Series according to input correspondence. |
| [`Series.rolling`](generated/maxframe.dataframe.Series.rolling.md#maxframe.dataframe.Series.rolling)(window[, min_periods, ...]) | Provide rolling window calculations. |
| [`Series.transform`](generated/maxframe.dataframe.Series.transform.md#maxframe.dataframe.Series.transform)(func[, convert_dtype, ...]) | Call `func` on self producing a Series with transformed values. |
<a id="generated-series-stats"></a>
## Computations / descriptive stats
| [`Series.abs`](generated/maxframe.dataframe.Series.abs.md#maxframe.dataframe.Series.abs)() | |
|------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| [`Series.all`](generated/maxframe.dataframe.Series.all.md#maxframe.dataframe.Series.all)([axis, bool_only, skipna, level, ...]) | |
| [`Series.any`](generated/maxframe.dataframe.Series.any.md#maxframe.dataframe.Series.any)([axis, bool_only, skipna, level, ...]) | |
| [`Series.between`](generated/maxframe.dataframe.Series.between.md#maxframe.dataframe.Series.between)(left, right[, inclusive]) | Return boolean Series equivalent to left <= series <= right. |
| [`Series.clip`](generated/maxframe.dataframe.Series.clip.md#maxframe.dataframe.Series.clip)([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| [`Series.corr`](generated/maxframe.dataframe.Series.corr.md#maxframe.dataframe.Series.corr)(other[, method, min_periods]) | Compute correlation with other Series, excluding missing values. |
| [`Series.count`](generated/maxframe.dataframe.Series.count.md#maxframe.dataframe.Series.count)([level]) | |
| [`Series.cov`](generated/maxframe.dataframe.Series.cov.md#maxframe.dataframe.Series.cov)(other[, min_periods, ddof]) | Compute covariance with Series, excluding missing values. |
| [`Series.describe`](generated/maxframe.dataframe.Series.describe.md#maxframe.dataframe.Series.describe)([percentiles, include, exclude]) | Generate descriptive statistics. |
| [`Series.factorize`](generated/maxframe.dataframe.Series.factorize.md#maxframe.dataframe.Series.factorize)([sort, use_na_sentinel]) | Encode the object as an enumerated type or categorical variable. |
| [`Series.is_monotonic_increasing`](generated/maxframe.dataframe.Series.is_monotonic_increasing.md#maxframe.dataframe.Series.is_monotonic_increasing) | Return boolean scalar if values in the object are monotonic_increasing. |
| [`Series.is_monotonic_decreasing`](generated/maxframe.dataframe.Series.is_monotonic_decreasing.md#maxframe.dataframe.Series.is_monotonic_decreasing) | Return boolean scalar if values in the object are monotonic_decreasing. |
| [`Series.is_unique`](generated/maxframe.dataframe.Series.is_unique.md#maxframe.dataframe.Series.is_unique) | Return boolean if values in the object are unique. |
| [`Series.max`](generated/maxframe.dataframe.Series.max.md#maxframe.dataframe.Series.max)([axis, skipna, level, method]) | |
| [`Series.mean`](generated/maxframe.dataframe.Series.mean.md#maxframe.dataframe.Series.mean)([axis, skipna, level, method]) | |
| [`Series.median`](generated/maxframe.dataframe.Series.median.md#maxframe.dataframe.Series.median)([axis, skipna, level, method]) | |
| [`Series.min`](generated/maxframe.dataframe.Series.min.md#maxframe.dataframe.Series.min)([axis, skipna, level, method]) | |
| [`Series.mode`](generated/maxframe.dataframe.Series.mode.md#maxframe.dataframe.Series.mode)([dropna, combine_size]) | Return the mode(s) of the Series. |
| [`Series.nlargest`](generated/maxframe.dataframe.Series.nlargest.md#maxframe.dataframe.Series.nlargest)(n[, keep]) | Return the largest n elements. |
| [`Series.nsmallest`](generated/maxframe.dataframe.Series.nsmallest.md#maxframe.dataframe.Series.nsmallest)(n[, keep]) | Return the smallest n elements. |
| [`Series.nunique`](generated/maxframe.dataframe.Series.nunique.md#maxframe.dataframe.Series.nunique)([dropna]) | Return number of unique elements in the object. |
| [`Series.prod`](generated/maxframe.dataframe.Series.prod.md#maxframe.dataframe.Series.prod)([axis, skipna, level, ...]) | |
| [`Series.product`](generated/maxframe.dataframe.Series.product.md#maxframe.dataframe.Series.product)([axis, skipna, level, ...]) | |
| [`Series.quantile`](generated/maxframe.dataframe.Series.quantile.md#maxframe.dataframe.Series.quantile)([q, interpolation]) | Return value at the given quantile. |
| [`Series.rank`](generated/maxframe.dataframe.Series.rank.md#maxframe.dataframe.Series.rank)([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| [`Series.round`](generated/maxframe.dataframe.Series.round.md#maxframe.dataframe.Series.round)([decimals]) | Round each value in a Series to the given number of decimals. |
| [`Series.sem`](generated/maxframe.dataframe.Series.sem.md#maxframe.dataframe.Series.sem)([axis, skipna, level, ddof, method]) | |
| [`Series.std`](generated/maxframe.dataframe.Series.std.md#maxframe.dataframe.Series.std)([axis, skipna, level, ddof, method]) | |
| [`Series.sum`](generated/maxframe.dataframe.Series.sum.md#maxframe.dataframe.Series.sum)([axis, skipna, level, min_count, ...]) | |
| [`Series.unique`](generated/maxframe.dataframe.Series.unique.md#maxframe.dataframe.Series.unique)([method]) | Uniques are returned in order of appearance. |
| [`Series.value_counts`](generated/maxframe.dataframe.Series.value_counts.md#maxframe.dataframe.Series.value_counts)([normalize, sort, ...]) | Return a Series containing counts of unique values. |
| [`Series.var`](generated/maxframe.dataframe.Series.var.md#maxframe.dataframe.Series.var)([axis, skipna, level, ddof, method]) | |
## Reindexing / selection / label manipulation
| [`Series.add_prefix`](generated/maxframe.dataframe.Series.add_prefix.md#maxframe.dataframe.Series.add_prefix)(prefix) | Prefix labels with string prefix. |
|----------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| [`Series.add_suffix`](generated/maxframe.dataframe.Series.add_suffix.md#maxframe.dataframe.Series.add_suffix)(suffix) | Suffix labels with string suffix. |
| [`Series.align`](generated/maxframe.dataframe.Series.align.md#maxframe.dataframe.Series.align)(other[, join, axis, level, ...]) | Align two objects on their axes with the specified join method. |
| [`Series.at_time`](generated/maxframe.dataframe.Series.at_time.md#maxframe.dataframe.Series.at_time)(time[, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| [`Series.between_time`](generated/maxframe.dataframe.Series.between_time.md#maxframe.dataframe.Series.between_time)(start_time, end_time[, ...]) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
| [`Series.case_when`](generated/maxframe.dataframe.Series.case_when.md#maxframe.dataframe.Series.case_when)(caselist) | Replace values where the conditions are True. |
| [`Series.drop`](generated/maxframe.dataframe.Series.drop.md#maxframe.dataframe.Series.drop)([labels, axis, index, columns, ...]) | Return Series with specified index labels removed. |
| [`Series.drop_duplicates`](generated/maxframe.dataframe.Series.drop_duplicates.md#maxframe.dataframe.Series.drop_duplicates)([keep, inplace, ...]) | Return Series with duplicate values removed. |
| [`Series.droplevel`](generated/maxframe.dataframe.Series.droplevel.md#maxframe.dataframe.Series.droplevel)(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
| [`Series.filter`](generated/maxframe.dataframe.Series.filter.md#maxframe.dataframe.Series.filter)([items, like, regex, axis]) | Subset the dataframe rows or columns according to the specified index labels. |
| [`Series.head`](generated/maxframe.dataframe.Series.head.md#maxframe.dataframe.Series.head)([n]) | Return the first n rows. |
| [`Series.idxmax`](generated/maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax)([axis, skipna]) | Return the row label of the maximum value. |
| [`Series.idxmin`](generated/maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin)([axis, skipna]) | Return the row label of the minimum value. |
| [`Series.isin`](generated/maxframe.dataframe.Series.isin.md#maxframe.dataframe.Series.isin)(values) | Whether elements in Series are contained in values. |
| [`Series.reindex`](generated/maxframe.dataframe.Series.reindex.md#maxframe.dataframe.Series.reindex)([labels, index, columns, ...]) | Conform Series/DataFrame to new index with optional filling logic. |
| [`Series.reindex_like`](generated/maxframe.dataframe.Series.reindex_like.md#maxframe.dataframe.Series.reindex_like)(other[, method, copy, ...]) | Return an object with matching indices as other object. |
| [`Series.rename`](generated/maxframe.dataframe.Series.rename.md#maxframe.dataframe.Series.rename)([index, axis, copy, inplace, ...]) | Alter Series index labels or name. |
| [`Series.reset_index`](generated/maxframe.dataframe.Series.reset_index.md#maxframe.dataframe.Series.reset_index)([level, drop, name, ...]) | Generate a new DataFrame or Series with the index reset. |
| [`Series.sample`](generated/maxframe.dataframe.Series.sample.md#maxframe.dataframe.Series.sample)([n, frac, replace, weights, ...]) | Return a random sample of items from an axis of object. |
| [`Series.set_axis`](generated/maxframe.dataframe.Series.set_axis.md#maxframe.dataframe.Series.set_axis)(labels[, axis, inplace]) | Assign desired index to given axis. |
| [`Series.take`](generated/maxframe.dataframe.Series.take.md#maxframe.dataframe.Series.take)(indices[, axis]) | Return the elements in the given *positional* indices along an axis. |
| [`Series.truncate`](generated/maxframe.dataframe.Series.truncate.md#maxframe.dataframe.Series.truncate)([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
## Missing data handling
| [`Series.dropna`](generated/maxframe.dataframe.Series.dropna.md#maxframe.dataframe.Series.dropna)([axis, inplace, how, ignore_index]) | Return a new Series with missing values removed. |
|-----------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------|
| [`Series.fillna`](generated/maxframe.dataframe.Series.fillna.md#maxframe.dataframe.Series.fillna)([value, method, axis, ...]) | Fill NA/NaN values using the specified method. |
| [`Series.isna`](generated/maxframe.dataframe.Series.isna.md#maxframe.dataframe.Series.isna)() | Detect missing values. |
| [`Series.notna`](generated/maxframe.dataframe.Series.notna.md#maxframe.dataframe.Series.notna)() | Detect existing (non-missing) values. |
## Reshaping, sorting
| [`Series.argmax`](generated/maxframe.dataframe.Series.argmax.md#maxframe.dataframe.Series.argmax)([axis, skipna]) | Return int position of the largest value in the Series. |
|----------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| [`Series.argmin`](generated/maxframe.dataframe.Series.argmin.md#maxframe.dataframe.Series.argmin)([axis, skipna]) | Return int position of the smallest value in the Series. |
| [`Series.argsort`](generated/maxframe.dataframe.Series.argsort.md#maxframe.dataframe.Series.argsort)([axis, kind, order, stable]) | Return the integer indices that would sort the Series values. |
| [`Series.explode`](generated/maxframe.dataframe.Series.explode.md#maxframe.dataframe.Series.explode)([ignore_index, ...]) | Transform each element of a list-like to a row. |
| [`Series.reorder_levels`](generated/maxframe.dataframe.Series.reorder_levels.md#maxframe.dataframe.Series.reorder_levels)(order) | Rearrange index levels using input order. |
| [`Series.repeat`](generated/maxframe.dataframe.Series.repeat.md#maxframe.dataframe.Series.repeat)(repeats[, axis]) | Repeat elements of a Series. |
| [`Series.sort_values`](generated/maxframe.dataframe.Series.sort_values.md#maxframe.dataframe.Series.sort_values)([axis, ascending, ...]) | Sort by the values. |
| [`Series.sort_index`](generated/maxframe.dataframe.Series.sort_index.md#maxframe.dataframe.Series.sort_index)([axis, level, ascending, ...]) | Sort object by labels (along an axis). |
| [`Series.swaplevel`](generated/maxframe.dataframe.Series.swaplevel.md#maxframe.dataframe.Series.swaplevel)([i, j]) | Swap levels i and j in a `MultiIndex`. |
| [`Series.unstack`](generated/maxframe.dataframe.Series.unstack.md#maxframe.dataframe.Series.unstack)([level, fill_value]) | Unstack, also known as pivot, Series with MultiIndex to produce DataFrame. |
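
The positional methods in this table (`argmax`, `argmin`, `argsort`) return integer positions, whereas `idxmax` and `idxmin` from the selection section return index labels. The sketch below uses plain pandas as a stand-in, on the assumption that the MaxFrame methods follow the pandas semantics their descriptions are taken from; the sample values are made up for illustration.

```python
import pandas as pd

# pandas is used here as a stand-in (assumption: MaxFrame mirrors these semantics).
s = pd.Series([30, 10, 20], index=["a", "b", "c"])

pos_max = int(s.argmax())          # integer position of the largest value
pos_min = int(s.argmin())          # integer position of the smallest value
label_max = s.idxmax()             # index label of the largest value
order = s.argsort().tolist()       # positions that would sort the values
sorted_vals = s.sort_values().tolist()

print(pos_max, pos_min, label_max, order, sorted_vals)
```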
## Combining / comparing / joining / merging
| [`Series.append`](generated/maxframe.dataframe.Series.append.md#maxframe.dataframe.Series.append)(other[, ignore_index, ...]) | Append rows of other to the end of caller, returning a new object. |
|---------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| [`Series.compare`](generated/maxframe.dataframe.Series.compare.md#maxframe.dataframe.Series.compare)(other[, align_axis, ...]) | Compare to another Series and show the differences. |
| [`Series.update`](generated/maxframe.dataframe.Series.update.md#maxframe.dataframe.Series.update)(other) | Modify Series in place using values from passed Series. |
## Time Series-related
| [`Series.first_valid_index`](generated/maxframe.dataframe.Series.first_valid_index.md#maxframe.dataframe.Series.first_valid_index)() | Return index for first non-NA value or None, if no non-NA value is found. |
|----------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| [`Series.last_valid_index`](generated/maxframe.dataframe.Series.last_valid_index.md#maxframe.dataframe.Series.last_valid_index)() | Return index for last non-NA value or None, if no non-NA value is found. |
| [`Series.shift`](generated/maxframe.dataframe.Series.shift.md#maxframe.dataframe.Series.shift)([periods, freq, axis, fill_value]) | Shift index by desired number of periods with an optional time freq. |
| [`Series.tshift`](generated/maxframe.dataframe.Series.tshift.md#maxframe.dataframe.Series.tshift)([periods, freq, axis]) | Shift the time index, using the index's frequency if available. |
## Accessors
MaxFrame provides dtype-specific methods under various accessors.
These are separate namespaces within [`Series`](generated/maxframe.dataframe.Series.md#maxframe.dataframe.Series) that only apply
to specific data types.
| Data Type | Accessor |
|-----------------------------|--------------------------------|
| Datetime, Timedelta, Period | [dt](#generated-series-dt) |
| String | [str](#generated-series-str) |
| Dict | [dict](#generated-series-dict) |
| List | [list](#generated-series-list) |
| Struct | struct |
<a id="generated-series-dt"></a>
### Datetimelike properties
`Series.dt` can be used to access the values of the series as
datetimelike and return several properties.
These can be accessed like `Series.dt.<property>`.
#### Datetime properties
| [`Series.dt.date`](generated/maxframe.dataframe.Series.dt.date.md#maxframe.dataframe.Series.dt.date) | Returns numpy array of python [`datetime.date`](https://docs.python.org/3/library/datetime.html#datetime.date) objects. |
|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|
| [`Series.dt.time`](generated/maxframe.dataframe.Series.dt.time.md#maxframe.dataframe.Series.dt.time) | Returns numpy array of [`datetime.time`](https://docs.python.org/3/library/datetime.html#datetime.time) objects. |
| [`Series.dt.timetz`](generated/maxframe.dataframe.Series.dt.timetz.md#maxframe.dataframe.Series.dt.timetz) | Returns numpy array of [`datetime.time`](https://docs.python.org/3/library/datetime.html#datetime.time) objects with timezones. |
| [`Series.dt.year`](generated/maxframe.dataframe.Series.dt.year.md#maxframe.dataframe.Series.dt.year) | The year of the datetime. |
| [`Series.dt.month`](generated/maxframe.dataframe.Series.dt.month.md#maxframe.dataframe.Series.dt.month) | The month as January=1, December=12. |
| [`Series.dt.day`](generated/maxframe.dataframe.Series.dt.day.md#maxframe.dataframe.Series.dt.day) | The day of the datetime. |
| [`Series.dt.hour`](generated/maxframe.dataframe.Series.dt.hour.md#maxframe.dataframe.Series.dt.hour) | The hours of the datetime. |
| [`Series.dt.minute`](generated/maxframe.dataframe.Series.dt.minute.md#maxframe.dataframe.Series.dt.minute) | The minutes of the datetime. |
| [`Series.dt.second`](generated/maxframe.dataframe.Series.dt.second.md#maxframe.dataframe.Series.dt.second) | The seconds of the datetime. |
| [`Series.dt.microsecond`](generated/maxframe.dataframe.Series.dt.microsecond.md#maxframe.dataframe.Series.dt.microsecond) | The microseconds of the datetime. |
| [`Series.dt.nanosecond`](generated/maxframe.dataframe.Series.dt.nanosecond.md#maxframe.dataframe.Series.dt.nanosecond) | The nanoseconds of the datetime. |
| [`Series.dt.week`](generated/maxframe.dataframe.Series.dt.week.md#maxframe.dataframe.Series.dt.week) | The week ordinal of the year. |
| [`Series.dt.weekofyear`](generated/maxframe.dataframe.Series.dt.weekofyear.md#maxframe.dataframe.Series.dt.weekofyear) | The week ordinal of the year. |
| [`Series.dt.dayofweek`](generated/maxframe.dataframe.Series.dt.dayofweek.md#maxframe.dataframe.Series.dt.dayofweek) | The day of the week with Monday=0, Sunday=6. |
| [`Series.dt.weekday`](generated/maxframe.dataframe.Series.dt.weekday.md#maxframe.dataframe.Series.dt.weekday) | The day of the week with Monday=0, Sunday=6. |
| [`Series.dt.dayofyear`](generated/maxframe.dataframe.Series.dt.dayofyear.md#maxframe.dataframe.Series.dt.dayofyear) | The ordinal day of the year. |
| [`Series.dt.quarter`](generated/maxframe.dataframe.Series.dt.quarter.md#maxframe.dataframe.Series.dt.quarter) | The quarter of the date. |
| [`Series.dt.is_month_start`](generated/maxframe.dataframe.Series.dt.is_month_start.md#maxframe.dataframe.Series.dt.is_month_start) | Indicates whether the date is the first day of the month. |
| [`Series.dt.is_month_end`](generated/maxframe.dataframe.Series.dt.is_month_end.md#maxframe.dataframe.Series.dt.is_month_end) | Indicates whether the date is the last day of the month. |
| [`Series.dt.is_quarter_start`](generated/maxframe.dataframe.Series.dt.is_quarter_start.md#maxframe.dataframe.Series.dt.is_quarter_start) | Indicator for whether the date is the first day of a quarter. |
| [`Series.dt.is_quarter_end`](generated/maxframe.dataframe.Series.dt.is_quarter_end.md#maxframe.dataframe.Series.dt.is_quarter_end) | Indicator for whether the date is the last day of a quarter. |
| [`Series.dt.is_year_start`](generated/maxframe.dataframe.Series.dt.is_year_start.md#maxframe.dataframe.Series.dt.is_year_start) | Indicate whether the date is the first day of a year. |
| [`Series.dt.is_year_end`](generated/maxframe.dataframe.Series.dt.is_year_end.md#maxframe.dataframe.Series.dt.is_year_end) | Indicate whether the date is the last day of the year. |
| [`Series.dt.is_leap_year`](generated/maxframe.dataframe.Series.dt.is_leap_year.md#maxframe.dataframe.Series.dt.is_leap_year) | Boolean indicator if the date belongs to a leap year. |
| [`Series.dt.daysinmonth`](generated/maxframe.dataframe.Series.dt.daysinmonth.md#maxframe.dataframe.Series.dt.daysinmonth) | The number of days in the month. |
| [`Series.dt.days_in_month`](generated/maxframe.dataframe.Series.dt.days_in_month.md#maxframe.dataframe.Series.dt.days_in_month) | The number of days in the month. |
#### Datetime methods
| [`Series.dt.to_period`](generated/maxframe.dataframe.Series.dt.to_period.md#maxframe.dataframe.Series.dt.to_period)(\*args, \*\*kwargs) | Cast to PeriodArray/PeriodIndex at a particular frequency. |
|-----------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|
| [`Series.dt.to_pydatetime`](generated/maxframe.dataframe.Series.dt.to_pydatetime.md#maxframe.dataframe.Series.dt.to_pydatetime)() | Return the data as an array of [`datetime.datetime`](https://docs.python.org/3/library/datetime.html#datetime.datetime) objects. |
| [`Series.dt.tz_localize`](generated/maxframe.dataframe.Series.dt.tz_localize.md#maxframe.dataframe.Series.dt.tz_localize)(\*args, \*\*kwargs) | Localize tz-naive Datetime Array/Index to tz-aware Datetime Array/Index. |
| [`Series.dt.tz_convert`](generated/maxframe.dataframe.Series.dt.tz_convert.md#maxframe.dataframe.Series.dt.tz_convert)(\*args, \*\*kwargs) | Convert tz-aware Datetime Array/Index from one time zone to another. |
| [`Series.dt.normalize`](generated/maxframe.dataframe.Series.dt.normalize.md#maxframe.dataframe.Series.dt.normalize)(\*args, \*\*kwargs) | Convert times to midnight. |
| [`Series.dt.strftime`](generated/maxframe.dataframe.Series.dt.strftime.md#maxframe.dataframe.Series.dt.strftime)(\*args, \*\*kwargs) | Convert to Index using specified date_format. |
| [`Series.dt.round`](generated/maxframe.dataframe.Series.dt.round.md#maxframe.dataframe.Series.dt.round)(\*args, \*\*kwargs) | Perform round operation on the data to the specified freq. |
| [`Series.dt.floor`](generated/maxframe.dataframe.Series.dt.floor.md#maxframe.dataframe.Series.dt.floor)(\*args, \*\*kwargs) | Perform floor operation on the data to the specified freq. |
| [`Series.dt.ceil`](generated/maxframe.dataframe.Series.dt.ceil.md#maxframe.dataframe.Series.dt.ceil)(\*args, \*\*kwargs) | Perform ceil operation on the data to the specified freq. |
| [`Series.dt.month_name`](generated/maxframe.dataframe.Series.dt.month_name.md#maxframe.dataframe.Series.dt.month_name)(\*args, \*\*kwargs) | Return the month names with specified locale. |
| [`Series.dt.day_name`](generated/maxframe.dataframe.Series.dt.day_name.md#maxframe.dataframe.Series.dt.day_name)(\*args, \*\*kwargs) | Return the day names with specified locale. |
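
As a rough illustration of the datetime properties and methods listed above, the following sketch uses plain pandas rather than MaxFrame. It assumes the `Series.dt` members behave like their pandas counterparts; the timestamps are invented sample data.

```python
import pandas as pd

# pandas is used here as a stand-in (assumption: the accessor semantics match).
s = pd.Series(pd.to_datetime(["2024-01-31 09:30:00", "2024-02-29 23:59:59"]))

years = s.dt.year.tolist()                   # property access: Series.dt.<property>
weekdays = s.dt.dayofweek.tolist()           # Monday=0, Sunday=6
month_end = s.dt.is_month_end.tolist()       # last day of the month?
days = s.dt.days_in_month.tolist()           # number of days in each month
labels = s.dt.strftime("%Y-%m-%d").tolist()  # method call: format as strings
midnight = s.dt.normalize()                  # set the time part to midnight

print(years, weekdays, month_end, days, labels)
```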
<a id="generated-series-str"></a>
### String handling
`Series.str` can be used to access the values of the series as
strings and apply several methods to it. These can be accessed like
`Series.str.<function/property>`.
| [`Series.str.capitalize`](generated/maxframe.dataframe.Series.str.capitalize.md#maxframe.dataframe.Series.str.capitalize)() | Convert strings in the Series/Index to be capitalized. |
|--------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| [`Series.str.contains`](generated/maxframe.dataframe.Series.str.contains.md#maxframe.dataframe.Series.str.contains)(pat[, case, flags, na, ...]) | Test if pattern or regex is contained within a string of a Series or Index. |
| [`Series.str.count`](generated/maxframe.dataframe.Series.str.count.md#maxframe.dataframe.Series.str.count)(pat[, flags]) | Count occurrences of pattern in each string of the Series/Index. |
| [`Series.str.endswith`](generated/maxframe.dataframe.Series.str.endswith.md#maxframe.dataframe.Series.str.endswith)(pat[, na]) | Test if the end of each string element matches a pattern. |
| [`Series.str.find`](generated/maxframe.dataframe.Series.str.find.md#maxframe.dataframe.Series.str.find)(sub[, start, end]) | Return lowest indexes in each string in the Series/Index. |
| [`Series.str.len`](generated/maxframe.dataframe.Series.str.len.md#maxframe.dataframe.Series.str.len)() | Compute the length of each element in the Series/Index. |
| [`Series.str.ljust`](generated/maxframe.dataframe.Series.str.ljust.md#maxframe.dataframe.Series.str.ljust)(width[, fillchar]) | Pad right side of strings in the Series/Index. |
| [`Series.str.lower`](generated/maxframe.dataframe.Series.str.lower.md#maxframe.dataframe.Series.str.lower)() | Convert strings in the Series/Index to lowercase. |
| [`Series.str.lstrip`](generated/maxframe.dataframe.Series.str.lstrip.md#maxframe.dataframe.Series.str.lstrip)([to_strip]) | Remove leading characters. |
| [`Series.str.pad`](generated/maxframe.dataframe.Series.str.pad.md#maxframe.dataframe.Series.str.pad)(width[, side, fillchar]) | Pad strings in the Series/Index up to width. |
| [`Series.str.repeat`](generated/maxframe.dataframe.Series.str.repeat.md#maxframe.dataframe.Series.str.repeat)(repeats) | Duplicate each string in the Series or Index. |
| [`Series.str.replace`](generated/maxframe.dataframe.Series.str.replace.md#maxframe.dataframe.Series.str.replace)(pat, repl[, n, case, ...]) | Replace each occurrence of pattern/regex in the Series/Index. |
| [`Series.str.rfind`](generated/maxframe.dataframe.Series.str.rfind.md#maxframe.dataframe.Series.str.rfind)(sub[, start, end]) | Return highest indexes in each string in the Series/Index. |
| [`Series.str.rjust`](generated/maxframe.dataframe.Series.str.rjust.md#maxframe.dataframe.Series.str.rjust)(width[, fillchar]) | Pad left side of strings in the Series/Index. |
| [`Series.str.rstrip`](generated/maxframe.dataframe.Series.str.rstrip.md#maxframe.dataframe.Series.str.rstrip)([to_strip]) | Remove trailing characters. |
| [`Series.str.slice`](generated/maxframe.dataframe.Series.str.slice.md#maxframe.dataframe.Series.str.slice)([start, stop, step]) | Slice substrings from each element in the Series or Index. |
| [`Series.str.startswith`](generated/maxframe.dataframe.Series.str.startswith.md#maxframe.dataframe.Series.str.startswith)(pat[, na]) | Test if the start of each string element matches a pattern. |
| [`Series.str.strip`](generated/maxframe.dataframe.Series.str.strip.md#maxframe.dataframe.Series.str.strip)([to_strip]) | Remove leading and trailing characters. |
| [`Series.str.swapcase`](generated/maxframe.dataframe.Series.str.swapcase.md#maxframe.dataframe.Series.str.swapcase)() | Convert strings in the Series/Index to be swapcased. |
| [`Series.str.title`](generated/maxframe.dataframe.Series.str.title.md#maxframe.dataframe.Series.str.title)() | Convert strings in the Series/Index to titlecase. |
| [`Series.str.translate`](generated/maxframe.dataframe.Series.str.translate.md#maxframe.dataframe.Series.str.translate)(table) | Map all characters in the string through the given mapping table. |
| [`Series.str.upper`](generated/maxframe.dataframe.Series.str.upper.md#maxframe.dataframe.Series.str.upper)() | Convert strings in the Series/Index to uppercase. |
| [`Series.str.zfill`](generated/maxframe.dataframe.Series.str.zfill.md#maxframe.dataframe.Series.str.zfill)(width) | Pad strings in the Series/Index by prepending '0' characters. |
| [`Series.str.isalnum`](generated/maxframe.dataframe.Series.str.isalnum.md#maxframe.dataframe.Series.str.isalnum)() | Check whether all characters in each string are alphanumeric. |
| [`Series.str.isalpha`](generated/maxframe.dataframe.Series.str.isalpha.md#maxframe.dataframe.Series.str.isalpha)() | Check whether all characters in each string are alphabetic. |
| [`Series.str.isdigit`](generated/maxframe.dataframe.Series.str.isdigit.md#maxframe.dataframe.Series.str.isdigit)() | Check whether all characters in each string are digits. |
| [`Series.str.isspace`](generated/maxframe.dataframe.Series.str.isspace.md#maxframe.dataframe.Series.str.isspace)() | Check whether all characters in each string are whitespace. |
| [`Series.str.islower`](generated/maxframe.dataframe.Series.str.islower.md#maxframe.dataframe.Series.str.islower)() | Check whether all characters in each string are lowercase. |
| [`Series.str.isupper`](generated/maxframe.dataframe.Series.str.isupper.md#maxframe.dataframe.Series.str.isupper)() | Check whether all characters in each string are uppercase. |
| [`Series.str.istitle`](generated/maxframe.dataframe.Series.str.istitle.md#maxframe.dataframe.Series.str.istitle)() | Check whether all characters in each string are titlecase. |
| [`Series.str.isnumeric`](generated/maxframe.dataframe.Series.str.isnumeric.md#maxframe.dataframe.Series.str.isnumeric)() | Check whether all characters in each string are numeric. |
| [`Series.str.isdecimal`](generated/maxframe.dataframe.Series.str.isdecimal.md#maxframe.dataframe.Series.str.isdecimal)() | Check whether all characters in each string are decimal. |
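
The string methods above are applied element-wise and can be chained. The sketch below shows this with plain pandas, assuming the `Series.str` methods behave like the pandas versions they are documented after; the input strings are made-up sample data.

```python
import pandas as pd

# pandas is used here as a stand-in (assumption: the accessor semantics match).
s = pd.Series(["  alpha ", "Beta", "42"])

stripped = s.str.strip()                                  # remove surrounding whitespace
upper = stripped.str.upper().tolist()                     # element-wise uppercase
has_a = stripped.str.contains("a", case=False).tolist()   # case-insensitive match
lengths = stripped.str.len().tolist()                     # length of each string
digits = stripped.str.isdigit().tolist()                  # all characters are digits?
padded = stripped.str.zfill(5).tolist()                   # left-pad with '0' to width 5

print(upper, has_a, lengths, digits, padded)
```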
<a id="generated-series-dict"></a>
### Dict properties
`Series.dict` can be used to access the methods of the series with dict values.
These can be accessed like `Series.dict.<method>`.
#### Dict methods
| [`Series.dict.__getitem__`](generated/maxframe.dataframe.Series.dict.__getitem__.md#maxframe.dataframe.Series.dict.__getitem__)(query_key) | Get the value by the key of each dict in the Series. |
|---------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|
| [`Series.dict.__setitem__`](generated/maxframe.dataframe.Series.dict.__setitem__.md#maxframe.dataframe.Series.dict.__setitem__)(query_key, value) | Set the value with the key to each dict of the Series. |
| [`Series.dict.contains`](generated/maxframe.dataframe.Series.dict.contains.md#maxframe.dataframe.Series.dict.contains)(query_key) | Check whether the key is in each dict of the Series. |
| [`Series.dict.get`](generated/maxframe.dataframe.Series.dict.get.md#maxframe.dataframe.Series.dict.get)(query_key[, default_value]) | Get the value by the key of each dict in the Series. |
| [`Series.dict.len`](generated/maxframe.dataframe.Series.dict.len.md#maxframe.dataframe.Series.dict.len)() | Get the length of each dict of the Series. |
| [`Series.dict.remove`](generated/maxframe.dataframe.Series.dict.remove.md#maxframe.dataframe.Series.dict.remove)(query_key[, ignore_key_error]) | Remove the item by the key from each dict of the Series. |
<a id="generated-series-list"></a>
### List properties
`Series.list` can be used to access the methods of the series with list values.
These can be accessed like `Series.list.<method>`.
#### List methods
| [`Series.list.__getitem__`](generated/maxframe.dataframe.Series.list.__getitem__.md#maxframe.dataframe.Series.list.__getitem__)(query_index) | Get the value by the index of each list in the Series. |
|------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|
| [`Series.list.len`](generated/maxframe.dataframe.Series.list.len.md#maxframe.dataframe.Series.list.len)() | Get the length of each list of the Series. |
### Struct properties
`Series.struct` can be used to access the methods of the series with struct values.
These can be accessed like `Series.struct.<method>`.
#### Struct methods
| [`Series.struct.dtypes`](generated/maxframe.dataframe.Series.struct.dtypes.md#maxframe.dataframe.Series.struct.dtypes) | Return the dtype object of each child field of the struct. |
|------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| [`Series.struct.field`](generated/maxframe.dataframe.Series.struct.field.md#maxframe.dataframe.Series.struct.field)(name_or_index) | Extract a child field of a struct as a Series. |
## Plotting
`Series.plot` is both a callable method and a namespace attribute for
specific plotting methods of the form `Series.plot.<kind>`.
| [`Series.plot`](generated/maxframe.dataframe.Series.plot.md#maxframe.dataframe.Series.plot) | alias of `SeriesPlotAccessor` |
|-----------------------------------------------------------------------------------------------|---------------------------------|
| [`Series.plot.area`](generated/maxframe.dataframe.Series.plot.area.md#maxframe.dataframe.Series.plot.area)(\*args, \*\*kwargs) | Draw a stacked area plot. |
| [`Series.plot.bar`](generated/maxframe.dataframe.Series.plot.bar.md#maxframe.dataframe.Series.plot.bar)(\*args, \*\*kwargs) | Vertical bar plot. |
| [`Series.plot.barh`](generated/maxframe.dataframe.Series.plot.barh.md#maxframe.dataframe.Series.plot.barh)(\*args, \*\*kwargs) | Make a horizontal bar plot. |
| [`Series.plot.box`](generated/maxframe.dataframe.Series.plot.box.md#maxframe.dataframe.Series.plot.box)(\*args, \*\*kwargs) | Make a box plot of the DataFrame columns. |
| [`Series.plot.density`](generated/maxframe.dataframe.Series.plot.density.md#maxframe.dataframe.Series.plot.density)(\*args, \*\*kwargs) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| [`Series.plot.hist`](generated/maxframe.dataframe.Series.plot.hist.md#maxframe.dataframe.Series.plot.hist)(\*args, \*\*kwargs) | Draw one histogram of the DataFrame's columns. |
| [`Series.plot.kde`](generated/maxframe.dataframe.Series.plot.kde.md#maxframe.dataframe.Series.plot.kde)(\*args, \*\*kwargs) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| [`Series.plot.line`](generated/maxframe.dataframe.Series.plot.line.md#maxframe.dataframe.Series.plot.line)(\*args, \*\*kwargs) | Plot Series or DataFrame as lines. |
| [`Series.plot.pie`](generated/maxframe.dataframe.Series.plot.pie.md#maxframe.dataframe.Series.plot.pie)(\*args, \*\*kwargs) | Generate a pie plot. |
### Serialization / IO / conversion
| [`Series.to_csv`](generated/maxframe.dataframe.Series.to_csv.md#maxframe.dataframe.Series.to_csv)(path[, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
|----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| [`Series.to_dict`](generated/maxframe.dataframe.Series.to_dict.md#maxframe.dataframe.Series.to_dict)([into, batch_size, session]) | Convert Series to {label -> value} dict or dict-like object. |
| [`Series.to_json`](generated/maxframe.dataframe.Series.to_json.md#maxframe.dataframe.Series.to_json)([path, orient, date_format, ...]) | Convert the object to a JSON string. |
| [`Series.to_list`](generated/maxframe.dataframe.Series.to_list.md#maxframe.dataframe.Series.to_list)([batch_size, session]) | Return a list of the values. |
<a id="generated-series-mf"></a>
### MaxFrame Extensions
| [`Series.mf.apply_chunk`](generated/maxframe.dataframe.Series.mf.apply_chunk.md#maxframe.dataframe.Series.mf.apply_chunk)(func[, batch_rows, ...]) | Apply a function that takes pandas Series and outputs pandas DataFrame/Series. |
|------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| [`Series.mf.flatmap`](generated/maxframe.dataframe.Series.mf.flatmap.md#maxframe.dataframe.Series.mf.flatmap)(func[, dtypes, dtype, ...]) | Apply the given function to each row and then flatten results. |
| [`Series.mf.flatjson`](generated/maxframe.dataframe.Series.mf.flatjson.md#maxframe.dataframe.Series.mf.flatjson)(query_paths[, dtypes, ...]) | Flatten JSON objects in the Series into a DataFrame according to JSON queries. |
`Series.mf` provides methods unique to MaxFrame. These methods are drawn from application
scenarios in MaxCompute and can be accessed like `Series.mf.<function/property>`.
FILE:references/maxframe-client-docs/reference/index.md
<a id="reference-index"></a>
# API reference
* [MaxFrame Tensor](tensor/routines.md)
* [Tensor Creation Routines](tensor/creation.md)
* [Tensor Indexing Routines](tensor/indexing.md)
* [Tensor Manipulation Routines](tensor/manipulation.md)
* [Binary Operations](tensor/binary.md)
* [Discrete Fourier Transform](tensor/fft.md)
* [Linear Algebra](tensor/linalg.md)
* [Logic Functions](tensor/logic.md)
* [Mathematical Functions](tensor/math.md)
* [Random Sampling](tensor/random.md)
* [Set routines](tensor/sets.md)
* [Sorting, Searching, and Counting](tensor/sorting.md)
* [Special Functions](tensor/special.md)
* [Statistics](tensor/statistics.md)
* [MaxFrame DataFrame](dataframe/index.md)
* [Input/output](dataframe/io.md)
* [General functions](dataframe/general_functions.md)
* [Series](dataframe/series.md)
* [DataFrame](dataframe/frame.md)
* [Index objects](dataframe/indexing.md)
* [GroupBy](dataframe/groupby.md)
* [MaxFrame Learn](learn/index.md)
* [Clustering](learn/cluster.md)
* [Datasets](learn/datasets.md)
* [LightGBM Integration](learn/lightgbm.md)
* [LLM Integration](learn/llm.md)
* [Metrics](learn/metrics.md)
* [Model Selection](learn/model_selection.md)
* [Preprocessing](learn/preprocessing.md)
* [Utilities](learn/utils.md)
* [XGBoost Integration](learn/xgboost.md)
FILE:references/maxframe-client-docs/reference/learn/cluster.md
<a id="learn-cluster-ref"></a>
# Clustering
## Classes
| [`cluster.KMeans`](generated/maxframe.learn.cluster.KMeans.md#maxframe.learn.cluster.KMeans)([n_clusters, init, n_init, ...]) | K-Means clustering. |
|---------------------------------------------------------------------------------------------------------------------------------|-----------------------|
## Functions
| [`cluster.k_means`](generated/maxframe.learn.cluster.k_means.md#maxframe.learn.cluster.k_means)(X, n_clusters[, ...]) | K-means clustering algorithm. |
|-------------------------------------------------------------------------------------------------------------------------|---------------------------------|
FILE:references/maxframe-client-docs/reference/learn/datasets.md
<a id="learn-datasets-ref"></a>
# Datasets
## Sample generator
| [`datasets.make_blobs`](generated/maxframe.learn.datasets.make_blobs.md#maxframe.learn.datasets.make_blobs)([n_samples, n_features, ...]) | Generate isotropic Gaussian blobs for clustering. |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| [`datasets.make_classification`](generated/maxframe.learn.datasets.make_classification.md#maxframe.learn.datasets.make_classification)([n_samples, ...]) | Generate a random n-class classification problem. |
| [`datasets.make_low_rank_matrix`](generated/maxframe.learn.datasets.make_low_rank_matrix.md#maxframe.learn.datasets.make_low_rank_matrix)([n_samples, ...]) | Generate a mostly low rank matrix with bell-shaped singular values. |
| [`datasets.make_regression`](generated/maxframe.learn.datasets.make_regression.md#maxframe.learn.datasets.make_regression)([n_samples, ...]) | Generate a random regression problem. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.cluster.KMeans.md
# maxframe.learn.cluster.KMeans
### *class* maxframe.learn.cluster.KMeans(n_clusters=8, init='k-means||', n_init=1, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto', oversampling_factor=2, init_iter=5)
K-Means clustering.
Read more in the User Guide.
* **Parameters:**
* **n_clusters** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=8*) – The number of clusters to form as well as the number of
centroids to generate.
* **init** ( *{'k-means++'* *,* *'k-means||'* *,* *'random'}* *or* *tensor* *of* *shape* *(**n_clusters* *,* *n_features* *)* *,* *default='k-means||'*) –
Method for initialization, defaults to ‘k-means||’:
’k-means++’ : selects initial cluster centers for k-means
clustering in a smart way to speed up convergence. See section
Notes in k_init for more details.
’k-means||’: scalable k-means++.
’random’: choose k observations (rows) at random from data for
the initial centroids.
If a tensor is passed, it should be of shape (n_clusters, n_features)
and gives the initial centers.
* **n_init** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=1*) – Number of times the k-means algorithm will be run with different
centroid seeds. The final results will be the best output of
n_init consecutive runs in terms of inertia.
* **max_iter** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=300*) – Maximum number of iterations of the k-means algorithm for a
single run.
* **tol** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default=1e-4*) – Relative tolerance with regard to inertia to declare convergence.
* **verbose** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=0*) – Verbosity mode.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState instance* *,* *default=None*) – Determines random number generation for centroid initialization. Use
an int to make the randomness deterministic.
See Glossary.
* **copy_x** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – When pre-computing distances it is more numerically accurate to center
the data first. If copy_x is True (default), then the original data is
not modified, ensuring X is C-contiguous. If False, the original data
is modified, and put back before the function returns, but small
numerical differences may be introduced by subtracting and then adding
the data mean, in this case it will also not ensure that data is
C-contiguous which may cause a significant slowdown.
* **algorithm** ( *{"auto"* *,* *"full"* *,* *"elkan"}* *,* *default="auto"*) – K-means algorithm to use. The classical EM-style algorithm is “full”.
The “elkan” variation is more efficient by using the triangle
inequality, but currently doesn’t support sparse data. “auto” chooses
“elkan” for dense data and “full” for sparse data.
* **oversampling_factor** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=2*) – Only takes effect for k-means||; used in each k-means|| iteration.
* **init_iter** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=5*) – Only takes effect for k-means||; indicates how many iterations are required.
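To make the `'k-means++'` option of `init` more concrete, the following is a minimal plain-Python sketch of k-means++ style seeding. It is not MaxFrame's implementation: the helper names `squared_distance` and `kmeans_plus_plus_init` are hypothetical, and the sketch assumes the input holds at least `n_clusters` distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_init(points, n_clusters, seed=0):
    """Pick n_clusters initial centers with k-means++ style seeding.

    The first center is drawn uniformly at random. Each later center is
    drawn with probability proportional to its squared distance from the
    nearest center chosen so far, so far-away points are favoured.
    Assumes `points` contains at least n_clusters distinct points.
    """
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < n_clusters:
        # Already-chosen points get weight 0 and cannot be drawn again.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers


points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
centers = kmeans_plus_plus_init(points, n_clusters=2, seed=0)
```

The `'random'` option corresponds to drawing all centers uniformly, that is, skipping the distance weighting.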
#### cluster_centers_
Coordinates of cluster centers. If the algorithm stops before fully
converging (see `tol` and `max_iter`), these will not be
consistent with `labels_`.
* **Type:**
tensor of shape (n_clusters, n_features)
#### labels_
Labels of each point.
* **Type:**
tensor of shape (n_samples,)
#### inertia_
Sum of squared distances of samples to their closest cluster center.
* **Type:**
[float](https://docs.python.org/3/library/functions.html#float)
#### n_iter_
Number of iterations run.
* **Type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
`MiniBatchKMeans`
: Alternative online implementation that does incremental updates of the centers positions using mini-batches. For large scale learning (say n_samples > 10k) MiniBatchKMeans is probably much faster than the default batch implementation.
### Notes
The k-means problem is solved using either Lloyd’s or Elkan’s algorithm.
The average complexity is given by O(k n T), where n is the number of
samples and T is the number of iterations.
The worst case complexity is given by O(n^(k+2/p)) with
n = n_samples, p = n_features. (D. Arthur and S. Vassilvitskii,
‘How slow is the k-means method?’ SoCG2006)
In practice, the k-means algorithm is very fast (one of the fastest
clustering algorithms available), but it can converge to a local minimum.
That’s why it can be useful to restart it several times.
If the algorithm stops before fully converging (because of `tol` or
`max_iter`), `labels_` and `cluster_centers_` will not be consistent,
i.e. the `cluster_centers_` will not be the means of the points in each
cluster. Also, the estimator will reassign `labels_` after the last
iteration to make `labels_` consistent with `predict` on the training
set.
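The notes above can be illustrated with a small plain-Python sketch of Lloyd's iterations. It is an illustration only, not MaxFrame's distributed implementation: the function names are hypothetical, and `tol` is applied here to the absolute squared shift of the centers, which is a simplification. Each iteration evaluates k × n distances, which is where the O(k n T) average cost comes from.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to `point` (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))


def lloyd_kmeans(points, init_centers, max_iter=300, tol=1e-4):
    """Run Lloyd's iterations; return (centers, labels, inertia, n_iter)."""
    centers = [list(map(float, c)) for c in init_centers]
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        shift = sum(squared_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # simplified stopping rule (absolute center shift)
            break
    # Reassign once more so the labels agree with the final centers.
    labels = [nearest(p, centers) for p in points]
    inertia = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia, n_iter


points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
# A reasonable start separates the two columns of points.
good_centers, good_labels, good_inertia, good_iters = lloyd_kmeans(points, [[1, 0], [10, 4]])
# A poor start (both centers in the same column) stalls in a worse local minimum.
bad_centers, bad_labels, bad_inertia, bad_iters = lloyd_kmeans(points, [[1, 4], [1, 0]])
```

Both runs converge, but the second ends with a higher inertia than the first, which is the situation that restarting with different seeds (`n_init`) is meant to mitigate.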
### Examples
```pycon
>>> from maxframe.learn.cluster import KMeans
>>> import maxframe.tensor as mt
>>> X = mt.array([[1, 2], [1, 4], [1, 0],
... [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, init='k-means++').fit(X).execute()
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
>>> kmeans.predict([[0, 0], [12, 3]]).execute()
array([1, 0], dtype=int32)
>>> kmeans.cluster_centers_
array([[10., 2.],
[ 1., 2.]])
```
#### \_\_init_\_(n_clusters=8, init='k-means||', n_init=1, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto', oversampling_factor=2, init_iter=5)
### Methods
| [`__init__`](#maxframe.learn.cluster.KMeans.__init__)([n_clusters, init, n_init, ...]) | |
|------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| `execute`([session, run_kwargs, extra_tileables]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(X[, y, sample_weight, execute, session, ...]) | Compute k-means clustering. |
| `fit_predict`(X[, y, execute, sample_weight, ...]) | Compute cluster centers and predict cluster index for each sample. |
| `fit_transform`(X[, y, sample_weight, ...]) | Compute clustering and transform X to cluster-distance space. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_params`([deep]) | Get parameters for this estimator. |
| `predict`(X[, sample_weight, execute, ...]) | Predict the closest cluster each sample in X belongs to. |
| `score`(X[, y, execute, sample_weight, ...]) | Opposite of the value of X on the K-means objective. |
| `set_fit_request`(\*[, execute, run_kwargs, ...]) | Request metadata passed to the `fit` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_predict_request`(\*[, execute, ...]) | Request metadata passed to the `predict` method. |
| `set_score_request`(\*[, execute, run_kwargs, ...]) | Request metadata passed to the `score` method. |
| `set_transform_request`(\*[, run_kwargs, session]) | Request metadata passed to the `transform` method. |
| `transform`(X[, session, run_kwargs]) | Transform X to a cluster-distance space. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.cluster.k_means.md
# maxframe.learn.cluster.k_means
### maxframe.learn.cluster.k_means(X, n_clusters, sample_weight=None, init='k-means||', n_init=10, max_iter=300, verbose=False, tol=0.0001, random_state=None, copy_x=True, algorithm='auto', oversampling_factor=2, init_iter=5, return_n_iter=False)
K-means clustering algorithm.
* **Parameters:**
* **X** (*Tensor* *,* *shape* *(**n_samples* *,* *n_features* *)*) – The observations to cluster. It must be noted that the data
will be converted to C ordering, which will cause a memory copy
if the given data is not C-contiguous.
* **n_clusters** ([*int*](https://docs.python.org/3/library/functions.html#int)) – The number of clusters to form as well as the number of
centroids to generate.
* **sample_weight** (*array-like* *,* *shape* *(**n_samples* *,* *)* *,* *optional*) – The weights for each observation in X. If None, all observations
are assigned equal weight (default: None)
* **init** ( *{'k-means++'* *,* *'k-means||'* *,* *'random'* *, or* *tensor* *, or* *a callable}* *,* *optional*) –
Method for initialization, defaults to ‘k-means||’:
’k-means++’ : selects initial cluster centers for k-means
clustering in a smart way to speed up convergence. See section
Notes in k_init for more details.
’k-means||’: scalable k-means++.
’random’: choose k observations (rows) at random from data for
the initial centroids.
If an ndarray is passed, it should be of shape (n_clusters, n_features)
and gives the initial centers.
If a callable is passed, it should take arguments X, k and
a random state and return an initialization.
* **n_init** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *,* *default: 10*) – Number of times the k-means algorithm will be run with different
centroid seeds. The final results will be the best output of
n_init consecutive runs in terms of inertia.
* **max_iter** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *,* *default 300*) – Maximum number of iterations of the k-means algorithm to run.
* **verbose** (*boolean* *,* *optional*) – Verbosity mode.
* **tol** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – The relative increment in the results before declaring convergence.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState instance* *or* *None* *(**default* *)*) – Determines random number generation for centroid initialization. Use
an int to make the randomness deterministic.
See Glossary.
* **copy_x** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – When pre-computing distances it is more numerically accurate to center
the data first. If copy_x is True (default), then the original data is
not modified, ensuring X is C-contiguous. If False, the original data
is modified, and put back before the function returns, but small
numerical differences may be introduced by subtracting and then adding
the data mean, in this case it will also not ensure that data is
C-contiguous which may cause a significant slowdown.
* **algorithm** ( *"auto"* *,* *"full"* *or* *"elkan"* *,* *default="auto"*) – K-means algorithm to use. The classical EM-style algorithm is “full”.
The “elkan” variation is more efficient by using the triangle
inequality, but currently doesn’t support sparse data. “auto” chooses
“elkan” for dense data and “full” for sparse data.
* **oversampling_factor** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=2*) – Only takes effect for k-means||; used in each k-means|| iteration.
* **init_iter** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=5*) – Only takes effect for k-means||; indicates how many iterations are required.
* **return_n_iter** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Whether or not to return the number of iterations.
* **Returns:**
* **centroid** (*float ndarray with shape (k, n_features)*) – Centroids found at the last iteration of k-means.
* **label** (*integer ndarray with shape (n_samples,)*) – label[i] is the code or index of the centroid the
i’th observation is closest to.
* **inertia** (*float*) – The final value of the inertia criterion (sum of squared distances to
the closest centroid for all observations in the training set).
* **best_n_iter** (*int*) – Number of iterations corresponding to the best results.
Returned only if return_n_iter is set to True.
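As a rough illustration of how the returned `label` and `inertia` relate to `centroid`, and of how `n_init` keeps the best of several runs, here is a plain-Python sketch. The helpers `label_and_inertia` and `best_of_runs` are hypothetical names used for illustration and are not part of MaxFrame.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def label_and_inertia(points, centroids):
    """Return (labels, inertia) for fixed centroids.

    labels[i] is the index of the centroid closest to points[i];
    inertia is the sum of squared distances to those closest centroids.
    """
    labels = []
    inertia = 0.0
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        best = min(range(len(centroids)), key=distances.__getitem__)
        labels.append(best)
        inertia += distances[best]
    return labels, inertia


def best_of_runs(points, candidate_centroids):
    """Keep the candidate with the lowest inertia, as repeated runs would."""
    scored = [(label_and_inertia(points, c), c) for c in candidate_centroids]
    (labels, inertia), centroids = min(scored, key=lambda item: item[0][1])
    return centroids, labels, inertia


points = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
labels, inertia = label_and_inertia(points, [[10, 2], [1, 2]])

# Two hypothetical outcomes of separate runs; the lower-inertia one is kept.
candidates = [[[5.5, 3], [5.5, 0]], [[10, 2], [1, 2]]]
centroid, label, best_inertia = best_of_runs(points, candidates)
```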
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.lightgbm.Dataset.md
# maxframe.learn.contrib.lightgbm.Dataset
### maxframe.learn.contrib.lightgbm.Dataset(data, label=None, reference=None, weight=None, group=None, init_score=None, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True, position=None)
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.lightgbm.LGBMClassifier.md
# maxframe.learn.contrib.lightgbm.LGBMClassifier
### *class* maxframe.learn.contrib.lightgbm.LGBMClassifier(\*args, \*\*kwargs)
#### \_\_init_\_(\*args, \*\*kwargs)
Construct a gradient boosting model.
* **Parameters:**
* **boosting_type** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional* *(**default='gbdt'* *)*) – ‘gbdt’, traditional Gradient Boosting Decision Tree.
‘dart’, Dropouts meet Multiple Additive Regression Trees.
‘goss’, Gradient-based One-Side Sampling.
‘rf’, Random Forest.
* **num_leaves** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=31* *)*) – Maximum tree leaves for base learners.
* **max_depth** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=-1* *)*) – Maximum tree depth for base learners, <=0 means no limit.
* **learning_rate** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.1* *)*) – Boosting learning rate.
You can use `callbacks` parameter of `fit` method to shrink/adapt learning rate
in training using `reset_parameter` callback.
Note, that this will ignore the `learning_rate` argument in training.
* **n_estimators** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=100* *)*) – Number of boosted trees to fit.
* **subsample_for_bin** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=200000* *)*) – Number of samples for constructing bins.
* **objective** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *callable* *or* *None* *,* *optional* *(**default=None* *)*) – Specify the learning task and the corresponding learning objective or
a custom objective function to be used (see note below).
Default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
* **class_weight** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *'balanced'* *or* *None* *,* *optional* *(**default=None* *)*) – Weights associated with classes in the form `{class_label: weight}`.
Use this parameter only for multi-class classification task;
for binary classification task you may use `is_unbalance` or `scale_pos_weight` parameters.
Note, that the usage of all these parameters will result in poor estimates of the individual class probabilities.
You may want to consider performing probability calibration
([https://scikit-learn.org/stable/modules/calibration.html](https://scikit-learn.org/stable/modules/calibration.html)) of your model.
The ‘balanced’ mode uses the values of y to automatically adjust weights
inversely proportional to class frequencies in the input data as `n_samples / (n_classes * np.bincount(y))`.
If None, all classes are supposed to have weight one.
Note, that these weights will be multiplied with `sample_weight` (passed through the `fit` method)
if `sample_weight` is specified.
* **min_split_gain** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.* *)*) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
* **min_child_weight** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=1e-3* *)*) – Minimum sum of instance weight (hessian) needed in a child (leaf).
* **min_child_samples** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=20* *)*) – Minimum number of data needed in a child (leaf).
* **subsample** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=1.* *)*) – Subsample ratio of the training instance.
* **subsample_freq** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=0* *)*) – Frequency of subsample, <=0 means disabled.
* **colsample_bytree** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=1.* *)*) – Subsample ratio of columns when constructing each tree.
* **reg_alpha** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.* *)*) – L1 regularization term on weights.
* **reg_lambda** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.* *)*) – L2 regularization term on weights.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState object* *or* *None* *,* *optional* *(**default=None* *)*) – Random number seed.
If int, this number is used to seed the C++ code.
If RandomState object (numpy), a random integer is picked based on its state to seed the C++ code.
If None, default seeds in C++ code are used.
* **n_jobs** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=-1* *)*) – Number of parallel threads.
* **silent** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional* *(**default=True* *)*) – Whether to print messages while running boosting.
* **importance_type** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional* *(**default='split'* *)*) – The type of feature importance to be filled into `feature_importances_`.
If ‘split’, result contains numbers of times the feature is used in a model.
If ‘gain’, result contains total gains of splits which use the feature.
* **\*\*kwargs** –
Other parameters for the model.
Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters.
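The `'balanced'` formula quoted for `class_weight`, `n_samples / (n_classes * np.bincount(y))`, can be worked through by hand. The sketch below uses `collections.Counter` as a standard-library stand-in for `np.bincount`; the function name `balanced_class_weights` is hypothetical and the labels are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: weight} with weight = n_samples / (n_classes * count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Three classes: class 0 appears four times, classes 1 and 2 twice each.
y = [0, 0, 0, 0, 1, 1, 2, 2]
weights = balanced_class_weights(y)

# Summing the per-sample weights gives back n_samples, so the total
# weight of the data set is unchanged; only its split across classes moves.
total_weight = sum(weights[label] for label in y)
```

Rarer classes receive proportionally larger weights, so each class contributes the same total weight.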
#### WARNING
\*\*kwargs is not supported in sklearn, it may cause unexpected issues.
#### NOTE
A custom objective function can be provided for the `objective` parameter.
In this case, it should have the signature
`objective(y_true, y_pred) -> grad, hess` or
`objective(y_true, y_pred, group) -> grad, hess`:
> y_true
> : The target values.
> y_pred
> : The predicted values.
> Predicted values are returned before any transformation,
> e.g. they are raw margin instead of probability of positive class for binary task.
> group
> : Group/query data.
> Only used in the learning-to-rank task.
> sum(group) = n_samples.
> For example, if you have a 100-document dataset with `group = [10, 20, 40, 10, 10, 10]`, that means that you have 6 groups,
> where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
> grad
> : The value of the first order derivative (gradient) of the loss
> with respect to the elements of y_pred for each sample point.
> hess
> : The value of the second order derivative (Hessian) of the loss
> with respect to the elements of y_pred for each sample point.
For multi-class tasks, y_pred is grouped by class_id first, then by row_id.
To get the y_pred of the i-th row in the j-th class, access y_pred[j \* num_data + i];
grad and hess should be grouped the same way.
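The note above can be sketched in plain Python. The squared-error loss is chosen only as an example, plain lists stand in for the array inputs a real objective would receive, and all function names are hypothetical.

```python
def squared_error_objective(y_true, y_pred):
    """Custom objective in the (y_true, y_pred) -> (grad, hess) shape.

    For the loss 0.5 * (y_pred - y_true) ** 2, the first derivative with
    respect to y_pred is (y_pred - y_true) and the second derivative is 1.
    """
    grad = [p - t for t, p in zip(y_true, y_pred)]
    hess = [1.0 for _ in y_true]
    return grad, hess


def multiclass_pred(y_pred, row, class_id, num_data):
    """Read one prediction from the flat class-major layout described above."""
    return y_pred[class_id * num_data + row]


def group_boundaries(group):
    """Turn group sizes into 0-based (start, stop) record ranges, stop exclusive."""
    boundaries = []
    start = 0
    for size in group:
        boundaries.append((start, start + size))
        start += size
    return boundaries


grad, hess = squared_error_objective([1.0, 2.0], [1.5, 1.0])

# Two rows and three classes, stored class by class: all of class 0, then 1, then 2.
flat_pred = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
value = multiclass_pred(flat_pred, row=1, class_id=2, num_data=2)

# The 100-document example: sizes sum to the number of samples.
boundaries = group_boundaries([10, 20, 40, 10, 10, 10])
```

With 0-based, stop-exclusive ranges, the second group spans `(10, 30)`, which matches records 11-30 in the 1-based wording of the note.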
### Methods
| [`__init__`](#maxframe.learn.contrib.lightgbm.LGBMClassifier.__init__)(\*args, \*\*kwargs) | Construct a gradient boosting model. |
|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|
| `execute`([session, run_kwargs]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(X, y, \*[, sample_weight, init_score, ...]) | unsupported features: 1. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_params`([deep]) | Get parameters for this estimator. |
| `predict`(X[, raw_score, start_iteration, ...]) | Return the predicted value for each sample. |
| `predict_proba`(X[, raw_score, ...]) | Return the predicted probability for each class for each sample. |
| `score`(X, y[, sample_weight]) | Return the mean accuracy on the given test data and labels. |
| `set_fit_request`(\*[, callbacks, ...]) | Request metadata passed to the `fit` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_predict_proba_request`(\*[, ...]) | Request metadata passed to the `predict_proba` method. |
| `set_predict_request`(\*[, num_iteration, ...]) | Request metadata passed to the `predict` method. |
| `set_score_request`(\*[, sample_weight]) | Request metadata passed to the `score` method. |
### Attributes
| `best_iteration_` | The best iteration of fitted model if `early_stopping()` callback has been specified. |
|------------------------|-----------------------------------------------------------------------------------------|
| `best_score_` | The best score of fitted model. |
| `booster_` | The underlying Booster of this model. |
| `classes_` | The class label array. |
| `evals_result_` | The evaluation results if validation sets have been specified. |
| `feature_importances_` | The feature importances (the higher, the more important). |
| `feature_name_` | The names of features. |
| `n_classes_` | The number of classes. |
| `n_features_` | The number of features of fitted model. |
| `n_features_in_` | The number of features of fitted model. |
| `objective_` | The concrete objective used while fitting this model. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.lightgbm.LGBMRegressor.md
# maxframe.learn.contrib.lightgbm.LGBMRegressor
### *class* maxframe.learn.contrib.lightgbm.LGBMRegressor(\*args, \*\*kwargs)
#### \_\_init_\_(\*args, \*\*kwargs)
Construct a gradient boosting model.
* **Parameters:**
* **boosting_type** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional* *(**default='gbdt'* *)*) – ‘gbdt’, traditional Gradient Boosting Decision Tree.
‘dart’, Dropouts meet Multiple Additive Regression Trees.
‘goss’, Gradient-based One-Side Sampling.
‘rf’, Random Forest.
* **num_leaves** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=31* *)*) – Maximum tree leaves for base learners.
* **max_depth** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=-1* *)*) – Maximum tree depth for base learners, <=0 means no limit.
* **learning_rate** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.1* *)*) – Boosting learning rate.
You can use `callbacks` parameter of `fit` method to shrink/adapt learning rate
in training using `reset_parameter` callback.
Note, that this will ignore the `learning_rate` argument in training.
* **n_estimators** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=100* *)*) – Number of boosted trees to fit.
* **subsample_for_bin** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=200000* *)*) – Number of samples for constructing bins.
* **objective** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *callable* *or* *None* *,* *optional* *(**default=None* *)*) – Specify the learning task and the corresponding learning objective or
a custom objective function to be used (see note below).
Default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
* **class_weight** ([*dict*](https://docs.python.org/3/library/stdtypes.html#dict) *,* *'balanced'* *or* *None* *,* *optional* *(**default=None* *)*) – Weights associated with classes in the form `{class_label: weight}`.
Use this parameter only for multi-class classification task;
for binary classification task you may use `is_unbalance` or `scale_pos_weight` parameters.
Note, that the usage of all these parameters will result in poor estimates of the individual class probabilities.
You may want to consider performing probability calibration
([https://scikit-learn.org/stable/modules/calibration.html](https://scikit-learn.org/stable/modules/calibration.html)) of your model.
The ‘balanced’ mode uses the values of y to automatically adjust weights
inversely proportional to class frequencies in the input data as `n_samples / (n_classes * np.bincount(y))`.
If None, all classes are supposed to have weight one.
Note, that these weights will be multiplied with `sample_weight` (passed through the `fit` method)
if `sample_weight` is specified.
* **min_split_gain** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.* *)*) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
* **min_child_weight** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=1e-3* *)*) – Minimum sum of instance weight (hessian) needed in a child (leaf).
* **min_child_samples** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=20* *)*) – Minimum number of data points needed in a child (leaf).
* **subsample** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=1.* *)*) – Subsample ratio of the training instance.
* **subsample_freq** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=0* *)*) – Subsampling frequency; a value <= 0 disables subsampling.
* **colsample_bytree** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=1.* *)*) – Subsample ratio of columns when constructing each tree.
* **reg_alpha** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.* *)*) – L1 regularization term on weights.
* **reg_lambda** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.* *)*) – L2 regularization term on weights.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState object* *or* *None* *,* *optional* *(**default=None* *)*) – Random number seed.
If int, this number is used to seed the C++ code.
If RandomState object (numpy), a random integer is picked based on its state to seed the C++ code.
If None, default seeds in C++ code are used.
* **n_jobs** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=-1* *)*) – Number of parallel threads.
* **silent** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional* *(**default=True* *)*) – Whether to print messages while running boosting.
* **importance_type** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional* *(**default='split'* *)*) – The type of feature importance to be filled into `feature_importances_`.
If ‘split’, the result contains the number of times the feature is used in a model.
If ‘gain’, the result contains the total gain of the splits which use the feature.
* **\*\*kwargs** –
Other parameters for the model.
Check [http://lightgbm.readthedocs.io/en/latest/Parameters.html](http://lightgbm.readthedocs.io/en/latest/Parameters.html) for more parameters.
#### WARNING
\*\*kwargs is not supported in sklearn; it may cause unexpected issues.
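The ‘balanced’ mode of `class_weight` described above can be checked by hand. The following is a minimal sketch that reimplements `n_samples / (n_classes * np.bincount(y))` in plain Python. The helper name `balanced_class_weights` is hypothetical and not part of LightGBM or MaxFrame, and the sketch assumes every class appears at least once in `y`.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: weight} using n_samples / (n_classes * count)."""
    counts = Counter(y)        # number of occurrences of each class label
    n_samples = len(y)
    n_classes = len(counts)    # assumption: every class is present in y
    return {
        label: n_samples / (n_classes * count)
        for label, count in sorted(counts.items())
    }


# Three samples of class 0 and one sample of class 1:
# the rarer class 1 receives the larger weight.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```

The weights are inversely proportional to class frequency, so a class that is three times rarer gets a weight three times larger.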
#### NOTE
A custom objective function can be provided for the `objective` parameter.
In this case, it should have the signature
`objective(y_true, y_pred) -> grad, hess` or
`objective(y_true, y_pred, group) -> grad, hess`:
> y_true
> : The target values.
> y_pred
> : The predicted values.
> Predicted values are returned before any transformation,
> e.g. they are raw margin instead of probability of positive class for binary task.
> group
> : Group/query data.
> Only used in the learning-to-rank task.
> sum(group) = n_samples.
> For example, if you have a 100-document dataset with `group = [10, 20, 40, 10, 10, 10]`, that means that you have 6 groups,
> where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.
> grad
> : The value of the first order derivative (gradient) of the loss
> with respect to the elements of y_pred for each sample point.
> hess
> : The value of the second order derivative (Hessian) of the loss
> with respect to the elements of y_pred for each sample point.
For multi-class tasks, y_pred is grouped by class_id first, then by row_id.
To get the y_pred of the i-th row in the j-th class, access y_pred[j \* num_data + i];
grad and hess should be grouped in the same way.
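As an illustration of the `objective(y_true, y_pred) -> grad, hess` signature, here is a minimal sketch of a squared-error objective written with NumPy, followed by a check of the multi-class flat layout. The function name `squared_error_objective` and the sample values are assumptions made for this example only.

```python
import numpy as np


def squared_error_objective(y_true, y_pred):
    """Custom objective for the loss L = 0.5 * (y_pred - y_true) ** 2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    grad = y_pred - y_true        # first derivative of L w.r.t. y_pred
    hess = np.ones_like(y_pred)   # second derivative of L is constant 1
    return grad, hess


grad, hess = squared_error_objective([1.0, 2.0], [0.5, 2.5])

# Multi-class layout: the flat array is ordered class by class,
# so element [j * num_data + i] is row i of class j.
num_class, num_data = 3, 2
flat = np.arange(num_class * num_data)          # stand-in for a flat y_pred
by_class = flat.reshape(num_class, num_data)    # by_class[j, i] == flat[j * num_data + i]
```

Such a callable would be passed as the `objective` parameter. Reshaping to `(num_class, num_data)` is one way to work with per-class values before flattening grad and hess back into the same order.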
### Methods
| [`__init__`](#maxframe.learn.contrib.lightgbm.LGBMRegressor.__init__)(\*args, \*\*kwargs) | Construct a gradient boosting model. |
|---------------------------------------------------------------------------------------------|------------------------------------------------------------|
| `execute`([session, run_kwargs]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(X, y, \*[, sample_weight, init_score, ...]) | unsupported features: 1. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_params`([deep]) | Get parameters for this estimator. |
| `predict`(X[, raw_score, start_iteration, ...]) | Return the predicted value for each sample. |
| `score`(X, y[, sample_weight]) | Return the coefficient of determination of the prediction. |
| `set_fit_request`(\*[, callbacks, ...]) | Request metadata passed to the `fit` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_predict_request`(\*[, num_iteration, ...]) | Request metadata passed to the `predict` method. |
| `set_score_request`(\*[, sample_weight]) | Request metadata passed to the `score` method. |
### Attributes
| `best_iteration_` | The best iteration of fitted model if `early_stopping()` callback has been specified. |
|------------------------|-----------------------------------------------------------------------------------------|
| `best_score_` | The best score of fitted model. |
| `booster_` | The underlying Booster of this model. |
| `evals_result_` | The evaluation results if validation sets have been specified. |
| `feature_importances_` | The feature importances (the higher, the more important). |
| `feature_name_` | The names of features. |
| `n_features_` | The number of features of fitted model. |
| `n_features_in_` | The number of features of fitted model. |
| `objective_` | The concrete objective used while fitting this model. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.lightgbm.callback.early_stopping.md
# maxframe.learn.contrib.lightgbm.callback.early_stopping
### maxframe.learn.contrib.lightgbm.callback.early_stopping(stopping_rounds: [int](https://docs.python.org/3/library/functions.html#int), first_metric_only: [bool](https://docs.python.org/3/library/functions.html#bool) = False, verbose: [bool](https://docs.python.org/3/library/functions.html#bool) = True, min_delta: [float](https://docs.python.org/3/library/functions.html#float) | [List](https://docs.python.org/3/library/typing.html#typing.List)[[float](https://docs.python.org/3/library/functions.html#float)] = 0.0) → EarlyStoppingCallback
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.lightgbm.callback.reset_parameter.md
# maxframe.learn.contrib.lightgbm.callback.reset_parameter
### maxframe.learn.contrib.lightgbm.callback.reset_parameter(\*\*kwargs: [list](https://docs.python.org/3/library/stdtypes.html#list) | [Callable](https://docs.python.org/3/library/typing.html#typing.Callable)) → [Callable](https://docs.python.org/3/library/typing.html#typing.Callable)
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.lightgbm.predict.md
# maxframe.learn.contrib.lightgbm.predict
### maxframe.learn.contrib.lightgbm.predict(booster, data, raw_score: [bool](https://docs.python.org/3/library/functions.html#bool) = False, start_iteration: [int](https://docs.python.org/3/library/functions.html#int) = 0, num_iteration: [int](https://docs.python.org/3/library/functions.html#int) = None, pred_leaf: [bool](https://docs.python.org/3/library/functions.html#bool) = False, pred_contrib: [bool](https://docs.python.org/3/library/functions.html#bool) = False, validate_features: [bool](https://docs.python.org/3/library/functions.html#bool) = False, \*\*kwargs)
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.lightgbm.train.md
# maxframe.learn.contrib.lightgbm.train
### maxframe.learn.contrib.lightgbm.train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, feval=None, init_model=None, keep_training_booster=False, callbacks=None, num_class=2, evals_result=None, \*\*kwargs)
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.deploy.config.ModelDeploymentConfig.md
# maxframe.learn.contrib.llm.deploy.config.ModelDeploymentConfig
### *class* maxframe.learn.contrib.llm.deploy.config.ModelDeploymentConfig(\*args, \*\*kwargs)
Model deployment configuration for extending MaxFrame with custom models.
This configuration is designed for users who need to deploy models that are not
available within MaxFrame’s built-in model offerings. It provides a way to specify
custom deployment solutions by informing each MaxFrame worker which framework to use,
which model path to load, and how to load it.
The configuration assumes that models are already set up in the container image or
mounted paths, and uses the current deploy_config to load them. Users are responsible
for ensuring the runtime environment state and compatibility.
* **Parameters:**
* **model_name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The name of the model.
* **model_file** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) –
The **local** file path of the model, e.g., `"/mnt/models/qwen/"`.
When using OSS models, this should match one of the `mount_path` values
in `fs_mounts`.
Note: OSS paths (`oss://...`) are NOT supported directly. Use `fs_mounts`
to mount OSS paths to local paths first.
* **inference_framework_type** ([*InferenceFrameworkEnum*](maxframe.learn.contrib.llm.deploy.framework.InferenceFrameworkEnum.md#maxframe.learn.contrib.llm.deploy.framework.InferenceFrameworkEnum)) – The inference framework of the model.
* **required_resource_files** (*List* *[**Union* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *Any* *]* *]*) – The required resource files of the model.
* **load_params** (*Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *Any* *]*) – The load params of the model.
* **required_cpu** ([*int*](https://docs.python.org/3/library/functions.html#int)) – The required cpu of the model.
* **required_memory** ([*int*](https://docs.python.org/3/library/functions.html#int)) – The required memory of the model.
* **required_gu** ([*int*](https://docs.python.org/3/library/functions.html#int)) – The required gu of the model.
* **required_gpu_memory** ([*int*](https://docs.python.org/3/library/functions.html#int)) – The required gpu memory of the model.
* **device** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The device of the model. One of “cpu” or “cuda”.
* **properties** (*Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *Any* *]*) – The properties of the model.
* **tags** (*List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]*) – The tags of the model.
* **fs_mounts** (*List* *[**FsMountOptions* *]*) –
File system mount configurations for mounting OSS models to local paths.
Each FsMountOptions contains:
- `path`: OSS source path, e.g., `"oss://bucket/models/qwen/"`
- `mount_path`: Local mount path, e.g., `"/mnt/qwen"`
- `storage_options`: Authentication config (role_arn or AK/SK)
This is consistent with the `with_fs_mount` decorator pattern.
The `model_file` should reference the `mount_path` from one of the mounts.
* **envs** (*Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *]*) – Custom environment variables for the inference subprocess.
Example: `{"CUDA_VISIBLE_DEVICES": "0", "HF_HOME": "/mnt/cache"}`
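The relationship between `model_file` and `fs_mounts` can be sanity-checked before building a configuration. The sketch below is not MaxFrame code: plain dicts stand in for `FsMountOptions`, the helper `model_file_is_mounted` is hypothetical, and it assumes that "matching" a mount means being equal to, or located under, one of the `mount_path` values.

```python
from pathlib import PurePosixPath


def model_file_is_mounted(model_file, mounts):
    """Return True if model_file equals or lies under one of the mount paths."""
    if model_file.startswith("oss://"):
        # OSS paths are not supported directly as model_file.
        return False
    target = PurePosixPath(model_file)
    for mount in mounts:
        mount_path = PurePosixPath(mount["mount_path"])
        if target == mount_path or mount_path in target.parents:
            return True
    return False


# Plain dicts standing in for FsMountOptions entries (assumption).
mounts = [{"path": "oss://bucket/models/qwen/", "mount_path": "/mnt/qwen"}]

print(model_file_is_mounted("/mnt/qwen", mounts))
print(model_file_is_mounted("/mnt/qwen/weights", mounts))
print(model_file_is_mounted("oss://bucket/models/qwen/", mounts))
```

A check like this only catches path mismatches on the client side; it does not verify that the files exist in the runtime environment.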
### Notes
- This is a preview version for model deployments; all fields may change in the future.
**User Responsibility Notice**: Users must have a complete understanding of what
they are computing and ensure they fully comprehend the implications of their
configuration choices. You are responsible for:
* Ensuring model compatibility with the specified inference framework
* Verifying that model files exist and are accessible in the runtime environment
* Confirming that resource requirements (CPU, memory, GPU) are adequate
* Validating that all dependencies and libraries are properly installed
* Understanding the computational behavior and characteristics of your chosen model
#### \_\_init_\_(\*args, \*\*kwargs)
### Methods
| [`__init__`](#maxframe.learn.contrib.llm.deploy.config.ModelDeploymentConfig.__init__)(\*args, \*\*kwargs) | |
|--------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|
| `check_validity`() | Validate the configuration and raise ValueError if invalid. |
| `copy`() | |
| `copy_to`(target) | |
| `is_reasoning_model`() | |
### Attributes
| `required_memory` | |
|----------------------------|----|
| `required_resource_files` | |
| `model_file` | |
| `device` | |
| `properties` | |
| `fs_mounts` | |
| `model_name` | |
| `load_params` | |
| `envs` | |
| `required_cpu` | |
| `required_gpu_memory` | |
| `tags` | |
| `required_gu` | |
| `inference_framework_type` | |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.deploy.framework.InferenceFrameworkEnum.md
# maxframe.learn.contrib.llm.deploy.framework.InferenceFrameworkEnum
### *class* maxframe.learn.contrib.llm.deploy.framework.InferenceFrameworkEnum(value)
#### \_\_init_\_(\*args, \*\*kwds)
### Methods
| `from_string`(label) | |
|------------------------|----|
### Attributes
| `LLAMA_CPP_PYTHON_TEXT` | |
|---------------------------|----|
| `LLAMA_CPP_SERVE_TEXT` | |
| `DASH_SCOPE_TEXT` | |
| `DASH_SCOPE_MULTIMODAL` | |
| `VLLM_SERVE_TEXT` | |
| `OPENAI_REMOTE_TEXT` | |
| `OTHER` | |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.models.dashscope.DashScopeMultiModalLLM.md
# maxframe.learn.contrib.llm.models.dashscope.DashScopeMultiModalLLM
### *class* maxframe.learn.contrib.llm.models.dashscope.DashScopeMultiModalLLM(name: [str](https://docs.python.org/3/library/stdtypes.html#str), api_key_resource: [str](https://docs.python.org/3/library/stdtypes.html#str))
DashScope multi-modal LLM.
#### \_\_init_\_(name: [str](https://docs.python.org/3/library/stdtypes.html#str), api_key_resource: [str](https://docs.python.org/3/library/stdtypes.html#str))
Initialize a DashScope multi-modal LLM.
* **Parameters:**
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The LLM name to use, check DashScope for [available models](https://help.aliyun.com/zh/model-studio/getting-started/models).
* **api_key_resource** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The MaxCompute resource file name containing the DashScope API key.
### Methods
| [`__init__`](#maxframe.learn.contrib.llm.models.dashscope.DashScopeMultiModalLLM.__init__)(name, api_key_resource) | Initialize a DashScope multi-modal LLM. |
|----------------------------------------------------------------------------------------------------------------------|-------------------------------------------|
| `copy`() | |
| `copy_to`(target) | |
| `generate`(data, prompt_template[, params]) | |
| `validate_params`(params) | |
### Attributes
| `api_key_resource` | |
|----------------------|----|
| `name` | |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.models.dashscope.DashScopeTextLLM.md
# maxframe.learn.contrib.llm.models.dashscope.DashScopeTextLLM
### *class* maxframe.learn.contrib.llm.models.dashscope.DashScopeTextLLM(name: [str](https://docs.python.org/3/library/stdtypes.html#str), api_key_resource: [str](https://docs.python.org/3/library/stdtypes.html#str))
DashScope text LLM.
#### \_\_init_\_(name: [str](https://docs.python.org/3/library/stdtypes.html#str), api_key_resource: [str](https://docs.python.org/3/library/stdtypes.html#str))
Initialize a DashScope text LLM.
* **Parameters:**
* **name** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The LLM name to use, check DashScope for [available models](https://help.aliyun.com/zh/model-studio/getting-started/models).
* **api_key_resource** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – The MaxCompute resource file name containing the DashScope API key.
### Methods
| [`__init__`](#maxframe.learn.contrib.llm.models.dashscope.DashScopeTextLLM.__init__)(name, api_key_resource) | Initialize a DashScope text LLM. |
|----------------------------------------------------------------------------------------------------------------|------------------------------------|
| `classify`(series, labels[, description, ...]) | |
| `copy`() | |
| `copy_to`(target) | |
| `extract`(series, schema[, description, ...]) | |
| `generate`(data, prompt_template[, params]) | |
| `summarize`(series[, index]) | |
| `translate`(series, target_language[, ...]) | |
| `validate_params`(params) | |
### Attributes
| `api_key_resource` | |
|----------------------|----|
| `name` | |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.models.managed.ManagedTextLLM.md
# maxframe.learn.contrib.llm.models.managed.ManagedTextLLM
### maxframe.learn.contrib.llm.models.managed.ManagedTextLLM
alias of `ManagedTextGenLLM`
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.multi_modal.generate.md
# maxframe.learn.contrib.llm.multi_modal.generate
### maxframe.learn.contrib.llm.multi_modal.generate(data, model: MultiModalLLM, prompt_template: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)], params: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] = None)
Generate text with a multi-modal LLM based on the given data and prompt template.
* **Parameters:**
* **data** ([*DataFrame*](../../dataframe/generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* [*Series*](../../dataframe/generated/maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Input data used for generation. Can be a MaxFrame DataFrame or Series containing the text to be processed.
* **model** (*MultiModalLLM*) – Language model instance that supports **MultiModal** inputs, used for text generation.
* **prompt_template** (*List* *[**Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *List* *[**Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]* *]* *]*) –
List of messages with column names as placeholders. Each message contains a role and content. Content is a list of dicts; each dict contains a text or an image entry whose value can reference column data from the input.
Here is an example of prompt template.
```python
[
{
"role": "<role>", # e.g. "user" or "assistant"
"content": [
{
# At least one of these fields is required
"image": "<image_data_url>", # optional
"text": "<prompt_text_template>" # optional
},
...
]
}
]
```
Where:
- `text` can be a Python format string using column names from input data as parameters (e.g. `"{column_name}"`)
- `image` should be a data URL string following the [RFC2397](https://en.wikipedia.org/wiki/Data_URI_scheme) standard, in the format:
```none
data:<mime_type>[;base64],<column_name>
```
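* **params** (*Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *Any* *]* *,* *optional*) – Additional parameters for generation configuration, by default None.
Can include settings like temperature, max_tokens, etc.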
* **Returns:**
Raw generated-text response and success status. If success is False, the generated text contains the
error message instead.
* **Return type:**
[DataFrame](../../dataframe/generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
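To make the `data:<mime_type>[;base64],<column_name>` format concrete, the following sketch uses only the Python standard library to Base64-encode image bytes (the kind of text a table column would hold) and assemble the resulting data URL. Both helper names are hypothetical, and how MaxFrame substitutes the column value internally is not documented here, so treat this as an illustration of the URL shape only.

```python
import base64


def encode_image_column_value(image_bytes):
    """Base64-encode raw image bytes into the text stored in a table column."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_data_url(mime_type, encoded):
    """Assemble an RFC 2397 data URL from a MIME type and a Base64 payload."""
    return f"data:{mime_type};base64,{encoded}"


# b"hi" is a tiny stand-in for real PNG bytes.
encoded = encode_image_column_value(b"hi")
url = build_data_url("image/png", encoded)
print(url)
```

In the prompt template itself, the payload position holds the column name rather than the encoded data, so that each row supplies its own image.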
### Notes
- The `api_key_resource` parameter should reference a text file resource in MaxCompute that contains only your DashScope API key.
- Using DashScope services requires enabling public network access for your MaxCompute project. This can be configured through the MaxCompute console by [enabling the Internet access feature](https://help.aliyun.com/zh/maxcompute/user-guide/network-connection-process) for your project. Without this configuration, the API calls to DashScope will fail due to network connectivity issues.
### Examples
You can initialize a DashScope multi-modal model (such as qwen-vl-max) by providing a model name and an `api_key_resource`.
The `api_key_resource` is a MaxCompute resource name that points to a text file containing a [DashScope](https://dashscope.aliyun.com/) API key.
```pycon
>>> from maxframe.learn.contrib.llm.models.dashscope import DashScopeMultiModalLLM
>>> import maxframe.dataframe as md
>>>
>>> model = DashScopeMultiModalLLM(
... name="qwen-vl-max",
... api_key_resource="<api-key-resource-name>"
... )
```
Multi-modal input is provided in the prompt template using the data URL scheme. The following example fills in the image from a table.
Assuming you have a MaxCompute table with two columns: `image_id` (as the index) and `encoded_image_data_base64` (containing Base64 encoded image data),
you can construct a prompt message template as follows:
```pycon
>>> df = md.read_odps_table("image_content", index_col="image_id")
```
```pycon
>>> prompt_template = [
... {
... "role": "user",
... "content": [
... {
... "image": "data:image/png;base64,encoded_image_data_base64",
... },
... {
... "text": "Analyze this image in detail",
... },
... ],
... },
... ]
>>> result = model.generate(df, prompt_template)
>>> result.execute()
```
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.text.TextLLM.md
# maxframe.learn.contrib.llm.text.TextLLM
### maxframe.learn.contrib.llm.text.TextLLM
alias of `TextGenLLM`
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.text.classify.md
# maxframe.learn.contrib.llm.text.classify
### maxframe.learn.contrib.llm.text.classify(series, model: TextGenLLM, labels: [List](https://docs.python.org/3/library/typing.html#typing.List)[[str](https://docs.python.org/3/library/stdtypes.html#str)], description: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, examples: [List](https://docs.python.org/3/library/typing.html#typing.List)[[Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [str](https://docs.python.org/3/library/stdtypes.html#str)]] = None, index=None)
Classify text content in a series with given labels using a language model.
* **Parameters:**
* **series** ([*Series*](../../dataframe/generated/maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – A maxframe Series containing text data to be classified.
Each element should be a text string.
* **model** (*TextGenLLM*) – Language model instance used for text classification.
* **labels** (*List* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *]*) – List of labels to classify the text into.
* **description** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Description of the classification task to help the model understand the context.
* **examples** (*List* *[**Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]* *,* *optional*) – Examples of the classification task, like [{“text”: “text…”, “label”: “A”, “reason”: “reason…”}],
to help the LLM better understand your classification rules.
* **index** (*array-like* *,* *optional*) – Index for the output series. By default None, in which case a new index is generated.
* **Returns:**
A DataFrame containing the generated classification results and success status.
Columns include ‘label’ (predicted label), ‘reason’ (reasoning), and ‘success’ (boolean status).
If ‘success’ is False, the ‘label’ and ‘reason’ columns will contain error information instead of the expected output.
* **Return type:**
[DataFrame](../../dataframe/generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
>>> from maxframe.learn.contrib.llm.text import classify
>>> import maxframe.dataframe as md
>>>
>>> # Initialize the model
>>> llm = ManagedTextGenLLM(name="Qwen3-0.6B")
>>>
>>> # Create sample data
>>> texts = md.Series([
... "I love this product! It's amazing!",
... "This is terrible, worst purchase ever.",
... "It's okay, nothing special."
... ])
>>>
>>> # Classify sentiment
>>> labels = ["positive", "negative", "neutral"]
>>> description = "Classify the sentiment of customer reviews"
>>> examples = [
... {"text": "Great product!", "label": "positive", "reason": "Expresses satisfaction"},
... {"text": "Poor quality", "label": "negative", "reason": "Expresses dissatisfaction"}
... ]
>>> result = classify(texts, llm, labels=labels, description=description, examples=examples)
>>> result.execute()
```
### Notes
**Preview:** This API is in preview state and may be unstable.
The interface may change in future releases.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.text.extract.md
# maxframe.learn.contrib.llm.text.extract
### maxframe.learn.contrib.llm.text.extract(series, model: TextGenLLM, schema: [Any](https://docs.python.org/3/library/typing.html#typing.Any), description: [str](https://docs.python.org/3/library/stdtypes.html#str) = None, examples: [List](https://docs.python.org/3/library/typing.html#typing.List)[[Tuple](https://docs.python.org/3/library/typing.html#typing.Tuple)[[str](https://docs.python.org/3/library/stdtypes.html#str), [str](https://docs.python.org/3/library/stdtypes.html#str)]] = None, index=None)
Extract structured information from text content in a series using a language model.
* **Parameters:**
* **series** ([*Series*](../../dataframe/generated/maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – A maxframe Series containing text data to extract information from.
Each element should be a text string.
* **model** (*TextGenLLM*) – Language model instance used for information extraction.
* **schema** (*Any*) – Schema definition for the extraction. Can be a dictionary defining the structure
or a Pydantic BaseModel class that will be converted to JSON schema.
* **description** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – Description of the extraction task to help the model understand what to extract.
* **examples** (*List* *[**Tuple* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]* *,* *optional*) – Examples of the extraction task in format [(input_text, expected_output), …],
to help the LLM better understand the extraction requirements.
* **index** (*array-like* *,* *optional*) – Index for the output series. By default None, in which case a new index is generated.
* **Returns:**
A DataFrame containing the extracted information and success status.
Columns include ‘output’ (extracted structured data) and ‘success’ (boolean status).
If ‘success’ is False, the ‘output’ column will contain error information instead of the expected output.
* **Return type:**
[DataFrame](../../dataframe/generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
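Before passing `examples` to `extract`, it can help to confirm that each expected output agrees with a dictionary schema. The sketch below is a hypothetical pre-flight helper, not part of MaxFrame. It assumes that `expected_output` is a JSON object string and that its keys should equal the keys of a dict `schema`; the documentation does not state either as a requirement.

```python
import json


def check_extract_examples(examples, schema):
    """Return a list of problems found in (input_text, expected_output) pairs."""
    problems = []
    for position, (_input_text, expected_output) in enumerate(examples):
        try:
            parsed = json.loads(expected_output)
        except json.JSONDecodeError as exc:
            problems.append(f"example {position}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(parsed, dict):
            problems.append(f"example {position}: expected a JSON object")
            continue
        missing = sorted(set(schema) - set(parsed))
        extra = sorted(set(parsed) - set(schema))
        if missing:
            problems.append(f"example {position}: missing keys {missing}")
        if extra:
            problems.append(f"example {position}: unexpected keys {extra}")
    return problems


schema = {"name": "string", "age": "integer", "job_title": "string", "company": "string"}
examples = [
    ("Bob Brown, 35, Manager at Apple",
     '{"name": "Bob Brown", "age": 35, "job_title": "Manager", "company": "Apple"}'),
]
print(check_extract_examples(examples, schema))
```

An empty list means every example parsed and used exactly the schema's keys.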
### Examples
```pycon
>>> from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
>>> from maxframe.learn.contrib.llm.text import extract
>>> import maxframe.dataframe as md
>>>
>>> # Initialize the model
>>> llm = ManagedTextGenLLM(name="Qwen3-0.6B")
>>>
>>> # Create sample data
>>> texts = md.Series([
... "John Smith, age 30, works as a Software Engineer at Google.",
... "Alice Johnson, 25 years old, is a Data Scientist at Microsoft."
... ])
>>>
>>> # Define extraction schema
>>> schema = {
... "name": "string",
... "age": "integer",
... "job_title": "string",
... "company": "string"
... }
>>>
>>> # Extract structured information
>>> description = "Extract person information from text"
>>> examples = [
... ("Bob Brown, 35, Manager at Apple", '{"name": "Bob Brown", "age": 35, "job_title": "Manager", "company": "Apple"}')
... ]
>>> result = extract(texts, llm, schema=schema, description=description, examples=examples)
>>> result.execute()
```
### Notes
**Preview:** This API is in preview state and may be unstable.
The interface may change in future releases.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.text.generate.md
# maxframe.learn.contrib.llm.text.generate
### maxframe.learn.contrib.llm.text.generate(data, model: TextGenLLM, prompt_template: [List](https://docs.python.org/3/library/typing.html#typing.List)[[Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)]], params: [Dict](https://docs.python.org/3/library/typing.html#typing.Dict)[[str](https://docs.python.org/3/library/stdtypes.html#str), [Any](https://docs.python.org/3/library/typing.html#typing.Any)] = None)
Generate text using a text language model based on given data and prompt template.
* **Parameters:**
* **data** ([*DataFrame*](../../dataframe/generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame) *or* [*Series*](../../dataframe/generated/maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – Input data used for generation. Can be a MaxFrame DataFrame or Series containing the text to be processed.
* **model** (*TextLLM*) – Language model instance used for text generation.
* **prompt_template** (*List* *[**Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *]* *]*) –
List of dictionaries containing the conversation message templates. Use `{col_name}` as a placeholder to reference
column data from the input data.
Usually in the format [{“role”: “user”, “content”: “{query}”}], the same as the OpenAI API schema.
* **params** (*Dict* *[*[*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *Any* *]* *,* *optional*) – Additional parameters for generation configuration, by default None.
Can include settings like temperature, max_tokens, etc.
* **Returns:**
Raw generated-text response and success status. If success is False, the generated text contains the
error message instead.
* **Return type:**
[DataFrame](../../dataframe/generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
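The `{col_name}` placeholder mechanism can be pictured as filling a format string once per input row. The sketch below is an illustration in plain Python, not MaxFrame's implementation: the helper `render_messages` is hypothetical, and it assumes placeholders behave like `str.format` fields, with one dict standing in for one row of the input data.

```python
def render_messages(prompt_template, row):
    """Fill `{col_name}` placeholders in each message with values from one row."""
    return [
        {**message, "content": message["content"].format(**row)}
        for message in prompt_template
    ]


messages = [
    {"role": "user", "content": "Help answer following question: {query}"},
]
row = {"query": "What is machine learning?"}   # one row, keyed by column name

rendered = render_messages(messages, row)
print(rendered[0]["content"])
```

Under this assumption, every placeholder name must correspond to a column in the input data, and the template itself is left unchanged so it can be reused for each row.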
### Examples
```pycon
>>> from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
>>> from maxframe.learn.contrib.llm.text import generate
>>> import maxframe.dataframe as md
>>>
>>> # Initialize the model
>>> llm = ManagedTextGenLLM(name="Qwen3-0.6B")
>>>
>>> # Prepare prompt template
>>> messages = [
... {
... "role": "user",
... "content": "Help answer following question: {query}",
... },
... ]
```
```pycon
>>> # Create sample data
>>> df = md.DataFrame({"query": ["What is machine learning?"]})
>>>
>>> # Generate response
>>> result = generate(df, llm, prompt_template=messages)
>>> result.execute()
```
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.text.summary.md
# maxframe.learn.contrib.llm.text.summary
### maxframe.learn.contrib.llm.text.summary(series, model: TextGenLLM, index=None)
Generate summaries for text content in a series using a language model.
* **Parameters:**
* **series** ([*Series*](../../dataframe/generated/maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – A maxframe Series containing text data to be summarized.
Each element should be a text string.
* **model** (*TextGenLLM*) – Language model instance used for text summarization.
* **index** (*array-like* *,* *optional*) – Index for the output series. By default None, in which case a new index is generated.
* **Returns:**
A DataFrame containing the generated summaries and success status.
Columns include ‘summary’ (generated summary text) and ‘success’ (boolean status).
If ‘success’ is False, the ‘summary’ column will contain error information instead of the expected output.
* **Return type:**
[DataFrame](../../dataframe/generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
>>> from maxframe.learn.contrib.llm.text import summary
>>> import maxframe.dataframe as md
>>>
>>> # Initialize the model
>>> llm = ManagedTextGenLLM(name="Qwen3-0.6B")
>>>
>>> # Create sample data
>>> texts = md.Series([
... "Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed.",
... "Deep learning uses neural networks with multiple layers to model and understand complex patterns in data."
... ])
>>>
>>> # Generate summaries
>>> result = summary(texts, llm)
>>> result.execute()
```
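Because failed rows carry error information in the `summary` column, downstream code usually needs to branch on `success`. Below is a minimal sketch on plain Python records standing in for fetched results. The records and the `split_by_success` helper are hypothetical; only the `summary` and `success` column names come from the description above.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_success(records: List[Record]) -> Tuple[List[str], List[str]]:
    """Separate usable summaries from error messages."""
    summaries = [str(r["summary"]) for r in records if r["success"]]
    errors = [str(r["summary"]) for r in records if not r["success"]]
    return summaries, errors


# Made-up rows for illustration only; this is not real model output.
fetched = [
    {"summary": "ML lets computers learn from experience.", "success": True},
    {"summary": "hypothetical error: request timed out", "success": False},
]

summaries, errors = split_by_success(fetched)
print(len(summaries), "ok,", len(errors), "failed")
```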
### Notes
**Preview:** This API is in preview state and may be unstable.
The interface may change in future releases.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.llm.text.translate.md
# maxframe.learn.contrib.llm.text.translate
### maxframe.learn.contrib.llm.text.translate(series, model: TextGenLLM, source_language: [str](https://docs.python.org/3/library/stdtypes.html#str), target_language: [str](https://docs.python.org/3/library/stdtypes.html#str), index=None)
Translate text content in a series using a language model from source language to target language.
* **Parameters:**
* **series** ([*Series*](../../dataframe/generated/maxframe.dataframe.Series.md#maxframe.dataframe.Series)) – A maxframe Series containing text data to translate.
Each element should be a text string.
* **model** (*TextGenLLM*) – Language model instance used for text translation.
* **source_language** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Source language of the text (e.g., ‘en’, ‘zh’, ‘ja’).
* **target_language** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Target language for translation (e.g., ‘en’, ‘zh’, ‘ja’).
* **index** (*array-like* *,* *optional*) – Index for the output series. Defaults to None, in which case a new index is generated.
* **Returns:**
A DataFrame containing the generated translations and success status.
Columns include ‘output’ (translated text) and ‘success’ (boolean status).
If ‘success’ is False, the ‘output’ column will contain error information instead of the expected output.
* **Return type:**
[DataFrame](../../dataframe/generated/maxframe.dataframe.DataFrame.md#maxframe.dataframe.DataFrame)
### Examples
```pycon
>>> from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
>>> from maxframe.learn.contrib.llm.text import translate
>>> import maxframe.dataframe as md
>>>
>>> # Initialize the model
>>> llm = ManagedTextGenLLM(name="Qwen3-0.6B")
>>>
>>> # Create sample data
>>> texts = md.Series([
... "Hello, how are you?",
... "Machine learning is fascinating."
... ])
>>>
>>> # Translate from English to Chinese
>>> result = translate(texts, llm, source_language="en", target_language="zh")
>>> result.execute()
```
### Notes
**Preview:** This API is in preview state and may be unstable.
The interface may change in future releases.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.xgboost.DMatrix.md
# maxframe.learn.contrib.xgboost.DMatrix
### maxframe.learn.contrib.xgboost.DMatrix(data, label=None, missing=None, weight=None, base_margin=None, feature_names=None, feature_types=None, feature_weights=None, nthread=None, group=None, qid=None, label_lower_bound=None, label_upper_bound=None, enable_categorical=None)
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.xgboost.XGBClassifier.md
# maxframe.learn.contrib.xgboost.XGBClassifier
### *class* maxframe.learn.contrib.xgboost.XGBClassifier(xgb_model: XGBClassifier | Booster = None, \*\*kwargs)
Implementation of the scikit-learn API for XGBoost classification.
#### \_\_init_\_(xgb_model: XGBClassifier | Booster = None, \*\*kwargs)
### Methods
| [`__init__`](#maxframe.learn.contrib.xgboost.XGBClassifier.__init__)([xgb_model]) | |
|-------------------------------------------------------------------------------------|-------------------------------------------------------------|
| `apply`(X[, iteration_range]) | Return the predicted leaf of every tree for each sample. |
| `evals_result`(\*\*kw) | Return the evaluation results. |
| `execute`([session, run_kwargs]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(X, y[, sample_weight, base_margin, ...]) | Fit gradient boosting model. |
| `get_booster`() | Get the underlying xgboost Booster of this model. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_num_boosting_rounds`() | Gets the number of xgboost boosting rounds. |
| `get_params`([deep]) | Get parameters. |
| `get_xgb_params`() | Get xgboost specific parameters. |
| `load_model`(fname) | Load the model from a file or bytearray. |
| `predict`(data, \*\*kw) | Predict with data. |
| `predict_proba`(data[, ntree_limit, flag]) | |
| `save_model`(fname) | Save the model to a file. |
| `score`(X, y[, sample_weight]) | Return the mean accuracy on the given test data and labels. |
| `set_fit_request`(\*[, base_margin, ...]) | Request metadata passed to the `fit` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_predict_proba_request`(\*[, data, flag, ...]) | Request metadata passed to the `predict_proba` method. |
| `set_predict_request`(\*[, data]) | Request metadata passed to the `predict` method. |
| `set_score_request`(\*[, sample_weight]) | Request metadata passed to the `score` method. |
| `to_odps_model`(model_name[, model_version, ...]) | Save trained model to MaxCompute. |
### Attributes
| `best_iteration` | The best iteration obtained by early stopping. |
|------------------------|----------------------------------------------------------------------------|
| `best_score` | The best score obtained by early stopping. |
| `classes_` | |
| `coef_` | Coefficients property |
| `feature_importances_` | Feature importances property, return depends on importance_type parameter. |
| `feature_names_in_` | Names of features seen during `fit()`. |
| `intercept_` | Intercept (bias) property |
| `n_features_in_` | Number of features seen during `fit()`. |
| `training_info_` | |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.xgboost.XGBRegressor.md
# maxframe.learn.contrib.xgboost.XGBRegressor
### *class* maxframe.learn.contrib.xgboost.XGBRegressor(xgb_model: XGBRegressor | Booster = None, \*\*kwargs)
Implementation of the scikit-learn API for XGBoost regressor.
#### \_\_init_\_(xgb_model: XGBRegressor | Booster = None, \*\*kwargs)
### Methods
| [`__init__`](#maxframe.learn.contrib.xgboost.XGBRegressor.__init__)([xgb_model]) | |
|------------------------------------------------------------------------------------|------------------------------------------------------------|
| `apply`(X[, iteration_range]) | Return the predicted leaf of every tree for each sample. |
| `evals_result`(\*\*kw) | Return the evaluation results. |
| `execute`([session, run_kwargs]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(X, y[, sample_weight, base_margin, ...]) | Fit the regressor. |
| `get_booster`() | Get the underlying xgboost Booster of this model. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_num_boosting_rounds`() | Gets the number of xgboost boosting rounds. |
| `get_params`([deep]) | Get parameters. |
| `get_xgb_params`() | Get xgboost specific parameters. |
| `load_model`(fname) | Load the model from a file or bytearray. |
| `predict`(data, \*\*kw) | Predict with data. |
| `save_model`(fname) | Save the model to a file. |
| `score`(X, y[, sample_weight]) | Return the coefficient of determination of the prediction. |
| `set_fit_request`(\*[, base_margin, ...]) | Request metadata passed to the `fit` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_predict_request`(\*[, data]) | Request metadata passed to the `predict` method. |
| `set_score_request`(\*[, sample_weight]) | Request metadata passed to the `score` method. |
| `to_odps_model`(model_name[, model_version, ...]) | Save trained model to MaxCompute. |
### Attributes
| `best_iteration` | The best iteration obtained by early stopping. |
|------------------------|----------------------------------------------------------------------------|
| `best_score` | The best score obtained by early stopping. |
| `coef_` | Coefficients property |
| `feature_importances_` | Feature importances property, return depends on importance_type parameter. |
| `feature_names_in_` | Names of features seen during `fit()`. |
| `intercept_` | Intercept (bias) property |
| `n_features_in_` | Number of features seen during `fit()`. |
| `training_info_` | |
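
The `score` method above is described as returning the coefficient of determination. For reference, a plain-Python sketch of that quantity, R^2 = 1 - SS_res / SS_tot, is shown below. `r2_score_sketch` is a hypothetical name, and the sketch ignores `sample_weight`.

```python
from typing import Sequence


def r2_score_sketch(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (unweighted)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


r2 = r2_score_sketch([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(round(r2, 4))
```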
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.xgboost.callback.EarlyStopping.md
# maxframe.learn.contrib.xgboost.callback.EarlyStopping
### *class* maxframe.learn.contrib.xgboost.callback.EarlyStopping(\*, rounds: [int](https://docs.python.org/3/library/functions.html#int), metric_name: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, data_name: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, maximize: [bool](https://docs.python.org/3/library/functions.html#bool) | [None](https://docs.python.org/3/library/constants.html#None) = None, save_best: [bool](https://docs.python.org/3/library/functions.html#bool) | [None](https://docs.python.org/3/library/constants.html#None) = False, min_delta: [float](https://docs.python.org/3/library/functions.html#float) = 0.0, \*\*kw)
#### \_\_init_\_(\*, rounds: [int](https://docs.python.org/3/library/functions.html#int), metric_name: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, data_name: [str](https://docs.python.org/3/library/stdtypes.html#str) | [None](https://docs.python.org/3/library/constants.html#None) = None, maximize: [bool](https://docs.python.org/3/library/functions.html#bool) | [None](https://docs.python.org/3/library/constants.html#None) = None, save_best: [bool](https://docs.python.org/3/library/functions.html#bool) | [None](https://docs.python.org/3/library/constants.html#None) = False, min_delta: [float](https://docs.python.org/3/library/functions.html#float) = 0.0, \*\*kw) → [None](https://docs.python.org/3/library/constants.html#None)
### Methods
| [`__init__`](#maxframe.learn.contrib.xgboost.callback.EarlyStopping.__init__)(\*, rounds[, metric_name, ...]) | |
|-----------------------------------------------------------------------------------------------------------------|----|
| `copy`() | |
| `copy_to`(target) | |
| `from_local`(callback_obj) | |
| `has_custom_code`() | |
| `remote_to_local`(remote_obj) | |
| `to_local`() | |
### Attributes
| `rounds` | |
|---------------|----|
| `maximize` | |
| `metric_name` | |
| `min_delta` | |
| `save_best` | |
| `data_name` | |
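
The signature above carries no parameter descriptions. Assuming the parameters mean the same as in xgboost's own `EarlyStopping` callback (`rounds` is the patience, `min_delta` the minimum improvement, `maximize` the direction of the metric), the stopping rule can be sketched in plain Python. `should_stop_at` is a hypothetical helper and not part of MaxFrame.

```python
from typing import Optional, Sequence


def should_stop_at(
    scores: Sequence[float],
    rounds: int,
    maximize: bool = False,
    min_delta: float = 0.0,
) -> Optional[int]:
    """Return the iteration index at which training would stop, or None.

    Training stops once `rounds` consecutive iterations fail to improve on
    the best score by more than `min_delta`.
    """
    best = None
    stale = 0  # iterations since the last improvement
    for i, score in enumerate(scores):
        if best is None:
            improved = True
        elif maximize:
            improved = score - best > min_delta
        else:
            improved = best - score > min_delta
        if improved:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= rounds:
                return i
    return None


# Hypothetical validation losses (lower is better).
losses = [0.9, 0.7, 0.65, 0.66, 0.67, 0.64]
print(should_stop_at(losses, rounds=2))
```

A larger `rounds` value tolerates longer plateaus, while a larger `min_delta` treats small gains as no improvement.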
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.xgboost.callback.LearningRateScheduler.md
# maxframe.learn.contrib.xgboost.callback.LearningRateScheduler
### *class* maxframe.learn.contrib.xgboost.callback.LearningRateScheduler(learning_rates: [Callable](https://docs.python.org/3/library/typing.html#typing.Callable)[[[int](https://docs.python.org/3/library/functions.html#int)], [float](https://docs.python.org/3/library/functions.html#float)] | [Sequence](https://docs.python.org/3/library/typing.html#typing.Sequence)[[float](https://docs.python.org/3/library/functions.html#float)], \*\*kw)
#### \_\_init_\_(learning_rates: [Callable](https://docs.python.org/3/library/typing.html#typing.Callable)[[[int](https://docs.python.org/3/library/functions.html#int)], [float](https://docs.python.org/3/library/functions.html#float)] | [Sequence](https://docs.python.org/3/library/typing.html#typing.Sequence)[[float](https://docs.python.org/3/library/functions.html#float)], \*\*kw) → [None](https://docs.python.org/3/library/constants.html#None)
### Methods
| [`__init__`](#maxframe.learn.contrib.xgboost.callback.LearningRateScheduler.__init__)(learning_rates, \*\*kw) | |
|-----------------------------------------------------------------------------------------------------------------|----|
| `copy`() | |
| `copy_to`(target) | |
| `from_local`(callback_obj) | |
| `has_custom_code`() | |
| `remote_to_local`(remote_obj) | |
| `to_local`() | |
### Attributes
| `learning_rates` | |
|--------------------|----|
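
According to the signature, `learning_rates` is either a callable mapping the boosting round (an `int`) to a rate, or a sequence of rates. Below is a minimal sketch of how a per-round rate could be resolved from either form. `rate_for_round` is a hypothetical helper, and indexing the sequence with one rate per round is an assumption.

```python
from typing import Callable, Sequence, Union

LearningRates = Union[Callable[[int], float], Sequence[float]]


def rate_for_round(learning_rates: LearningRates, epoch: int) -> float:
    """Resolve the learning rate for boosting round `epoch` (0-based)."""
    if callable(learning_rates):
        return learning_rates(epoch)
    return learning_rates[epoch]


def decay(epoch: int) -> float:
    """Example schedule: start at 0.3 and halve every round."""
    return 0.3 * 0.5 ** epoch


print([rate_for_round(decay, e) for e in range(3)])
print([rate_for_round([0.3, 0.2, 0.1], e) for e in range(3)])
```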
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.xgboost.predict.md
# maxframe.learn.contrib.xgboost.predict
### maxframe.learn.contrib.xgboost.predict(model, data, output_margin=False, pred_leaf=False, pred_contribs=False, approx_contribs=False, pred_interactions=False, validate_features=True, training=False, iteration_range=None, strict_shape=False, \*\*kwargs)
Using MaxFrame XGBoost model to predict data.
* **Parameters:**
Parameters are the same as xgboost.train. The predict() call runs in lazy-execution mode.
* **Returns:**
**results**
* **Return type:**
Booster
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.contrib.xgboost.train.md
# maxframe.learn.contrib.xgboost.train
### maxframe.learn.contrib.xgboost.train(params, dtrain, evals=None, evals_result=None, xgb_model=None, num_class=None, \*\*kwargs)
Train XGBoost model in MaxFrame manner.
* **Parameters:**
Parameters are the same as xgboost.train. Note that train is an eager-execution API if evals is passed, thus the call will be blocked until training is finished.
* **Returns:**
**results**
* **Return type:**
Booster
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.datasets.make_blobs.md
# maxframe.learn.datasets.make_blobs
### maxframe.learn.datasets.make_blobs(n_samples=100, n_features=2, centers=None, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None)
Generate isotropic Gaussian blobs for clustering.
Read more in the User Guide.
* **Parameters:**
* **n_samples** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array-like* *,* *optional* *(**default=100* *)*) – If int, it is the total number of points equally divided among
clusters.
If array-like, each element of the sequence indicates
the number of samples per cluster.
* **n_features** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=2* *)*) – The number of features for each sample.
* **centers** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array* *of* *shape* *[**n_centers* *,* *n_features* *]* *,* *optional*) – (default=None)
The number of centers to generate, or the fixed center locations.
If n_samples is an int and centers is None, 3 centers are generated.
If n_samples is array-like, centers must be
either None or an array of length equal to the length of n_samples.
* **cluster_std** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *sequence* *of* *floats* *,* *optional* *(**default=1.0* *)*) – The standard deviation of the clusters.
* **center_box** (*pair* *of* *floats* *(**min* *,* *max* *)* *,* *optional* *(**default=* *(* *-10.0* *,* *10.0* *)* *)*) – The bounding box for each cluster center when centers are
generated at random.
* **shuffle** (*boolean* *,* *optional* *(**default=True* *)*) – Shuffle the samples.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState instance* *or* *None* *(**default* *)*) – Determines random number generation for dataset creation. Pass an int
for reproducible output across multiple function calls.
See Glossary.
* **Returns:**
* **X** (*tensor of shape [n_samples, n_features]*) – The generated samples.
* **y** (*tensor of shape [n_samples]*) – The integer labels for cluster membership of each sample.
### Examples
```pycon
>>> from maxframe.learn.datasets import make_blobs
>>> X, y = make_blobs(n_samples=10, centers=3, n_features=2,
... random_state=0)
>>> print(X.shape)
(10, 2)
>>> y
array([0, 0, 1, 0, 2, 2, 2, 1, 1, 0])
>>> X, y = make_blobs(n_samples=[3, 3, 4], centers=None, n_features=2,
... random_state=0)
>>> print(X.shape)
(10, 2)
>>> y
array([0, 1, 2, 0, 2, 2, 2, 1, 1, 0])
```
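When `n_samples` is an int, the points are divided equally among the clusters. The sketch below shows one way to do that split when the total is not a multiple of the number of centers. Giving the remainder to the first clusters is an assumption (it mirrors scikit-learn's approach), and `samples_per_center` is a hypothetical helper.

```python
from typing import List, Sequence, Union


def samples_per_center(
    n_samples: Union[int, Sequence[int]], n_centers: int = 3
) -> List[int]:
    """Return the number of points generated for each cluster."""
    if isinstance(n_samples, int):
        base, remainder = divmod(n_samples, n_centers)
        # The first `remainder` clusters receive one extra point (assumption).
        return [base + (1 if i < remainder else 0) for i in range(n_centers)]
    # Array-like input already lists the per-cluster counts.
    return list(n_samples)


print(samples_per_center(10, n_centers=3))
print(samples_per_center([3, 3, 4]))
```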
#### SEE ALSO
[`make_classification`](maxframe.learn.datasets.make_classification.md#maxframe.learn.datasets.make_classification)
: a more intricate variant
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.datasets.make_classification.md
# maxframe.learn.datasets.make_classification
### maxframe.learn.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, random_state=None)
Generate a random n-class classification problem.
This initially creates clusters of points normally distributed (std=1)
about vertices of an `n_informative`-dimensional hypercube with sides of
length `2*class_sep` and assigns an equal number of clusters to each
class. It introduces interdependence between these features and adds
various types of further noise to the data.
Without shuffling, `X` horizontally stacks features in the following
order: the primary `n_informative` features, followed by `n_redundant`
linear combinations of the informative features, followed by `n_repeated`
duplicates, drawn randomly with replacement from the informative and
redundant features. The remaining features are filled with random noise.
Thus, without shuffling, all useful features are contained in the columns
`X[:, :n_informative + n_redundant + n_repeated]`.
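
The unshuffled column layout described above can be sketched as index ranges. `feature_layout` is a hypothetical helper that only does the bookkeeping, not the data generation.

```python
from typing import Dict


def feature_layout(
    n_features: int = 20,
    n_informative: int = 2,
    n_redundant: int = 2,
    n_repeated: int = 0,
) -> Dict[str, range]:
    """Map each feature kind to its column range when shuffle=False."""
    n_useful = n_informative + n_redundant + n_repeated
    if n_useful > n_features:
        raise ValueError("informative + redundant + repeated exceeds n_features")
    redundant_end = n_informative + n_redundant
    return {
        "informative": range(0, n_informative),
        "redundant": range(n_informative, redundant_end),
        "repeated": range(redundant_end, n_useful),
        "noise": range(n_useful, n_features),
    }


layout = feature_layout(n_features=20, n_informative=5, n_redundant=3, n_repeated=2)
for kind, columns in layout.items():
    print(kind, list(columns))
```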
Read more in the User Guide.
* **Parameters:**
* **n_samples** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=100* *)*) – The number of samples.
* **n_features** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=20* *)*) – The total number of features. These comprise `n_informative`
informative features, `n_redundant` redundant features,
`n_repeated` duplicated features and
`n_features-n_informative-n_redundant-n_repeated` useless features
drawn at random.
* **n_informative** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=2* *)*) – The number of informative features. Each class is composed of a number
of gaussian clusters each located around the vertices of a hypercube
in a subspace of dimension `n_informative`. For each cluster,
informative features are drawn independently from N(0, 1) and then
randomly linearly combined within each cluster in order to add
covariance. The clusters are then placed on the vertices of the
hypercube.
* **n_redundant** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=2* *)*) – The number of redundant features. These features are generated as
random linear combinations of the informative features.
* **n_repeated** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=0* *)*) – The number of duplicated features, drawn randomly from the informative
and the redundant features.
* **n_classes** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=2* *)*) – The number of classes (or labels) of the classification problem.
* **n_clusters_per_class** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=2* *)*) – The number of clusters per class.
* **weights** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *floats* *or* *None* *(**default=None* *)*) – The proportions of samples assigned to each class. If None, then
classes are balanced. Note that if `len(weights) == n_classes - 1`,
then the last class weight is automatically inferred.
More than `n_samples` samples may be returned if the sum of
`weights` exceeds 1.
* **flip_y** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=0.01* *)*) – The fraction of samples whose class is randomly exchanged. Larger
values introduce noise in the labels and make the classification
task harder.
* **class_sep** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional* *(**default=1.0* *)*) – The factor multiplying the hypercube size. Larger values spread
out the clusters/classes and make the classification task easier.
* **hypercube** (*boolean* *,* *optional* *(**default=True* *)*) – If True, the clusters are put on the vertices of a hypercube. If
False, the clusters are put on the vertices of a random polytope.
* **shift** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *array* *of* *shape* *[**n_features* *] or* *None* *,* *optional* *(**default=0.0* *)*) – Shift features by the specified value. If None, then features
are shifted by a random value drawn in [-class_sep, class_sep].
* **scale** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *array* *of* *shape* *[**n_features* *] or* *None* *,* *optional* *(**default=1.0* *)*) – Multiply features by the specified value. If None, then features
are scaled by a random value drawn in [1, 100]. Note that scaling
happens after shifting.
* **shuffle** (*boolean* *,* *optional* *(**default=True* *)*) – Shuffle the samples and the features.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState instance* *or* *None* *(**default* *)*) – Determines random number generation for dataset creation. Pass an int
for reproducible output across multiple function calls.
See Glossary.
* **Returns:**
* **X** (*tensor of shape [n_samples, n_features]*) – The generated samples.
* **y** (*tensor of shape [n_samples]*) – The integer labels for class membership of each sample.
### Notes
The algorithm is adapted from Guyon [1] and was designed to generate
the “Madelon” dataset.
### References
* <a id='id1'>**[1]**</a> I. Guyon, “Design of experiments for the NIPS 2003 variable selection benchmark”, 2003.
#### SEE ALSO
[`make_blobs`](maxframe.learn.datasets.make_blobs.md#maxframe.learn.datasets.make_blobs)
: simplified variant
`make_multilabel_classification`
: unrelated generator for multilabel tasks
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.datasets.make_low_rank_matrix.md
# maxframe.learn.datasets.make_low_rank_matrix
### maxframe.learn.datasets.make_low_rank_matrix(n_samples=100, n_features=100, effective_rank=10, tail_strength=0.5, random_state=None, chunk_size=None)
Generate a mostly low rank matrix with bell-shaped singular values.
Most of the variance can be explained by a bell-shaped curve of width
effective_rank: the low rank part of the singular values profile is:
```default
(1 - tail_strength) * exp(-1.0 * (i / effective_rank) ** 2)
```
The remaining singular values’ tail is fat, decreasing as:
```default
tail_strength * exp(-0.1 * i / effective_rank)
```
The low rank part of the profile can be considered the structured
signal part of the data while the tail can be considered the noisy
part of the data that cannot be summarized by a low number of linear
components (singular vectors).
This kind of singular profile is often seen in practice, for instance:
- gray level pictures of faces
- TF-IDF vectors of text documents crawled from the web
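
The two formulas above can be evaluated directly. The sketch below assumes the full profile is the sum of the low rank part and the tail (as in scikit-learn's implementation of the same generator) and that `i` starts at 0. `singular_profile` is a hypothetical helper.

```python
import math
from typing import List


def singular_profile(
    n: int, effective_rank: int = 10, tail_strength: float = 0.5
) -> List[float]:
    """Singular values s_0..s_{n-1}: bell-shaped low rank part plus fat tail."""
    values = []
    for i in range(n):
        low_rank = (1 - tail_strength) * math.exp(-1.0 * (i / effective_rank) ** 2)
        tail = tail_strength * math.exp(-0.1 * i / effective_rank)
        values.append(low_rank + tail)
    return values


profile = singular_profile(100)
print([round(s, 3) for s in profile[:3]])
```

With `tail_strength=0.0` only the bell-shaped part remains, and with `tail_strength=1.0` only the slowly decaying tail is left.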
Read more in the User Guide.
* **Parameters:**
* **n_samples** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=100* *)*) – The number of samples.
* **n_features** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=100* *)*) – The number of features.
* **effective_rank** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional* *(**default=10* *)*) – The approximate number of singular vectors required to explain most of
the data by linear combinations.
* **tail_strength** (*float between 0.0 and 1.0* *,* *optional* *(**default=0.5* *)*) – The relative importance of the fat noisy tail of the singular values
profile.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState instance* *or* *None* *(**default* *)*) – Determines random number generation for dataset creation. Pass an int
for reproducible output across multiple function calls.
See Glossary.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **Returns:**
**X** – The matrix.
* **Return type:**
array of shape [n_samples, n_features]
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.datasets.make_regression.md
# maxframe.learn.datasets.make_regression
### maxframe.learn.datasets.make_regression(n_samples=100, n_features=100, \*, n_informative=10, n_targets=1, bias=0.0, effective_rank=None, tail_strength=0.5, noise=0.0, shuffle=True, coef=False, random_state=None)
Generate a random regression problem.
The input set can either be well conditioned (by default) or have a low
rank-fat tail singular profile. See [`make_low_rank_matrix()`](maxframe.learn.datasets.make_low_rank_matrix.md#maxframe.learn.datasets.make_low_rank_matrix) for
more details.
The output is generated by applying a (potentially biased) random linear
regression model with n_informative nonzero regressors to the previously
generated input and some gaussian centered noise with some adjustable
scale.
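
The generation step described above amounts to `y = X @ coef + bias + noise`. A plain-Python sketch for a single target follows. The helper name `linear_targets` and the sample numbers are illustrative assumptions, and the real generator also draws `X` and the coefficients at random.

```python
import random
from typing import List, Sequence


def linear_targets(
    X: Sequence[Sequence[float]],
    coef: Sequence[float],
    bias: float = 0.0,
    noise: float = 0.0,
    seed: int = 0,
) -> List[float]:
    """Compute y = X @ coef + bias + N(0, noise) for one regression target."""
    rng = random.Random(seed)
    targets = []
    for row in X:
        value = sum(x * c for x, c in zip(row, coef)) + bias
        if noise > 0.0:
            # `noise` is the standard deviation of the centered gaussian term.
            value += rng.gauss(0.0, noise)
        targets.append(value)
    return targets


# Two informative features; the third coefficient is zero (uninformative).
X = [[1.0, 2.0, 5.0], [0.0, 1.0, -3.0]]
coef = [2.0, -1.0, 0.0]
print(linear_targets(X, coef, bias=0.5))
```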
Read more in the User Guide.
* **Parameters:**
* **n_samples** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=100*) – The number of samples.
* **n_features** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=100*) – The number of features.
* **n_informative** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=10*) – The number of informative features, i.e., the number of features used
to build the linear model used to generate the output.
* **n_targets** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=1*) – The number of regression targets, i.e., the dimension of the y output
vector associated with a sample. By default, the output is a scalar.
* **bias** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default=0.0*) – The bias term in the underlying linear model.
* **effective_rank** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=None*) – If not None, the approximate number of singular vectors required to explain most
of the input data by linear combinations. Using this kind of
singular spectrum in the input allows the generator to reproduce
the correlations often observed in practice.
If None, the input set is well conditioned, centered and gaussian with
unit variance.
* **tail_strength** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default=0.5*) – The relative importance of the fat noisy tail of the singular values
profile if effective_rank is not None. When a float, it should be
between 0 and 1.
* **noise** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default=0.0*) – The standard deviation of the gaussian noise applied to the output.
* **shuffle** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – Shuffle the samples and the features.
* **coef** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=False*) – If True, the coefficients of the underlying linear model are returned.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState instance* *or* *None* *,* *default=None*) – Determines random number generation for dataset creation. Pass an int
for reproducible output across multiple function calls.
See Glossary.
* **Returns:**
* **X** (*tensor of shape (n_samples, n_features)*) – The input samples.
* **y** (*tensor of shape (n_samples,) or (n_samples, n_targets)*) – The output values.
* **coef** (*tensor of shape (n_features,) or (n_features, n_targets)*) – The coefficient of the underlying linear model. It is returned only if
coef is True.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.accuracy_score.md
# maxframe.learn.metrics.accuracy_score
### maxframe.learn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None, execute=False, session=None, run_kwargs=None)
Accuracy classification score.
In multilabel classification, this function computes subset accuracy:
the set of labels predicted for a sample must *exactly* match the
corresponding set of labels in y_true.
Read more in the User Guide.
* **Parameters:**
* **y_true** (*1d array-like* *, or* *label indicator tensor / sparse tensor*) – Ground truth (correct) labels.
* **y_pred** (*1d array-like* *, or* *label indicator tensor / sparse tensor*) – Predicted labels, as returned by a classifier.
* **normalize** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional* *(**default=True* *)*) – If `False`, return the number of correctly classified samples.
Otherwise, return the fraction of correctly classified samples.
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **Returns:**
**score** – If `normalize == True`, return the fraction of correctly
classified samples (float), else returns the number of correctly
classified samples (int).
The best performance is 1 with `normalize == True` and the number
of samples with `normalize == False`.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
#### SEE ALSO
`jaccard_score`, `hamming_loss`, `zero_one_loss`
### Notes
In binary and multiclass classification, this function is equal
to the `jaccard_score` function.
### Examples
```pycon
>>> from maxframe.learn.metrics import accuracy_score
>>> y_pred = [0, 2, 1, 3]
>>> y_true = [0, 1, 2, 3]
>>> accuracy_score(y_true, y_pred).execute()
0.5
>>> accuracy_score(y_true, y_pred, normalize=False).execute()
2
```
In the multilabel case with binary label indicators:
```pycon
>>> import maxframe.tensor as mt
>>> accuracy_score(mt.array([[0, 1], [1, 1]]), mt.ones((2, 2))).execute()
0.5
```
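For reference, the rule behind these results can be written in a few lines of plain Python. This is a local sketch of the definition, not MaxFrame's implementation: `accuracy_sketch` is a hypothetical name, and it ignores `sample_weight`.

```python
from typing import Sequence, Union

Number = Union[int, float]


def accuracy_sketch(
    y_true: Sequence, y_pred: Sequence, normalize: bool = True
) -> Number:
    """Fraction (or count) of samples whose prediction matches exactly.

    For multilabel input each sample is a sequence of labels, and the whole
    sequence must match (subset accuracy).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    def as_key(sample):
        # Sequences of labels are compared as whole tuples.
        return tuple(sample) if isinstance(sample, (list, tuple)) else sample

    correct = sum(1 for t, p in zip(y_true, y_pred) if as_key(t) == as_key(p))
    return correct / len(y_true) if normalize else correct


print(accuracy_sketch([0, 1, 2, 3], [0, 2, 1, 3]))
print(accuracy_sketch([0, 1, 2, 3], [0, 2, 1, 3], normalize=False))
print(accuracy_sketch([[0, 1], [1, 1]], [[1, 1], [1, 1]]))
```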
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.auc.md
# maxframe.learn.metrics.auc
### maxframe.learn.metrics.auc(x, y, execute=False, session=None, run_kwargs=None)
Compute Area Under the Curve (AUC) using the trapezoidal rule.
This is a general function, given points on a curve. For computing the
area under the ROC-curve, see [`roc_auc_score()`](maxframe.learn.metrics.roc_auc_score.md#maxframe.learn.metrics.roc_auc_score). For an alternative
way to summarize a precision-recall curve, see
`average_precision_score()`.
* **Parameters:**
* **x** (*tensor* *,* *shape =* *[**n* *]*) – x coordinates. These must be either monotonic increasing or monotonic
decreasing.
* **y** (*tensor* *,* *shape =* *[**n* *]*) – y coordinates.
* **Returns:**
**auc**
* **Return type:**
tensor, with float value
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> from maxframe.learn import metrics
>>> y = mt.array([1, 1, 2, 2])
>>> pred = mt.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=2)
>>> metrics.auc(fpr, tpr).execute()
0.75
```
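The trapezoidal rule itself is short enough to sketch in plain Python. `auc_sketch` is a hypothetical helper, and the `fpr`/`tpr` lists are ROC points worked out by hand for the data in the example above, not output captured from MaxFrame.

```python
from typing import Sequence


def auc_sketch(x: Sequence[float], y: Sequence[float]) -> float:
    """Area under the curve through (x[i], y[i]) using the trapezoidal rule."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two points, with len(x) == len(y)")
    steps = [b - a for a, b in zip(x, x[1:])]
    if all(s >= 0 for s in steps):
        direction = 1.0
    elif all(s <= 0 for s in steps):
        direction = -1.0  # decreasing x: flip the sign of the signed area
    else:
        raise ValueError("x must be monotonic increasing or decreasing")
    # Each trapezoid has width dx and mean height (y0 + y1) / 2.
    area = sum(dx * (y0 + y1) / 2.0 for dx, y0, y1 in zip(steps, y, y[1:]))
    return direction * area


fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
print(auc_sketch(fpr, tpr))
```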
#### SEE ALSO
[`roc_auc_score`](maxframe.learn.metrics.roc_auc_score.md#maxframe.learn.metrics.roc_auc_score)
: Compute the area under the ROC curve
`average_precision_score`
: Compute average precision from prediction scores
`precision_recall_curve`
: Compute precision-recall pairs for different probability thresholds
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.f1_score.md
# maxframe.learn.metrics.f1_score
### maxframe.learn.metrics.f1_score(y_true, y_pred, \*, labels=None, pos_label=1, average='binary', sample_weight=None, zero_division='warn', execute=False, session=None, run_kwargs=None)
Compute the F1 score, also known as balanced F-score or F-measure
The F1 score can be interpreted as the harmonic mean of the precision and
recall, where an F1 score reaches its best value at 1 and worst score at 0.
The relative contribution of precision and recall to the F1 score are
equal. The formula for the F1 score is:
```default
F1 = 2 * (precision * recall) / (precision + recall)
```
In the multi-class and multi-label case, this is the average of
the F1 score of each class with weighting depending on the `average`
parameter.
Read more in the User Guide.
* **Parameters:**
* **y_true** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Ground truth (correct) target values.
* **y_pred** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Estimated targets as returned by a classifier.
* **labels** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional*) – The set of labels to include when `average != 'binary'`, and their
order if `average is None`. Labels present in the data can be
excluded, for example to calculate a multiclass average ignoring a
majority negative class, while labels not present in the data will
result in 0 components in a macro average. For multilabel targets,
labels are column indices. By default, all labels in `y_true` and
`y_pred` are used in sorted order.
* **pos_label** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *1 by default*) – The class to report if `average='binary'` and the data is binary.
If the data are multiclass or multilabel, this will be ignored;
setting `labels=[pos_label]` and `average != 'binary'` will report
scores for that label only.
* **average** (*string* *,* *[**None* *,* *'binary'* *(**default* *)* *,* *'micro'* *,* *'macro'* *,* *'samples'* *,* *'weighted'* *]*) –
This parameter is required for multiclass/multilabel targets.
If `None`, the scores for each class are returned. Otherwise, this
determines the type of averaging performed on the data:
`'binary'`:
: Only report results for the class specified by `pos_label`.
This is applicable only if targets (`y_{true,pred}`) are binary.
`'micro'`:
: Calculate metrics globally by counting the total true positives,
false negatives and false positives.
`'macro'`:
: Calculate metrics for each label, and find their unweighted
mean. This does not take label imbalance into account.
`'weighted'`:
: Calculate metrics for each label, and find their average weighted
by support (the number of true instances for each label). This
alters ‘macro’ to account for label imbalance; it can result in an
F-score that is not between precision and recall.
`'samples'`:
: Calculate metrics for each instance, and find their average (only
meaningful for multilabel classification where this differs from
[`accuracy_score()`](maxframe.learn.metrics.accuracy_score.md#maxframe.learn.metrics.accuracy_score)).
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **zero_division** ( *"warn"* *,* *0* *or* *1* *,* *default="warn"*) – Sets the value to return when there is a zero division, i.e. when all
predictions and labels are negative. If set to “warn”, this acts as 0,
but warnings are also raised.
* **Returns:**
**f1_score** – F1 score of the positive class in binary classification or weighted
average of the F1 scores of each class for the multiclass task.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or array of [float](https://docs.python.org/3/library/functions.html#float), shape = [n_unique_labels]
#### SEE ALSO
[`fbeta_score`](maxframe.learn.metrics.fbeta_score.md#maxframe.learn.metrics.fbeta_score), [`precision_recall_fscore_support`](maxframe.learn.metrics.precision_recall_fscore_support.md#maxframe.learn.metrics.precision_recall_fscore_support), `jaccard_score`, [`multilabel_confusion_matrix`](maxframe.learn.metrics.multilabel_confusion_matrix.md#maxframe.learn.metrics.multilabel_confusion_matrix)
### References
* <a id='id1'>**[1]**</a> [Wikipedia entry for the F1-score](https://en.wikipedia.org/wiki/F1_score)
### Examples
```pycon
>>> from maxframe.learn.metrics import f1_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> f1_score(y_true, y_pred, average='macro')
0.26...
>>> f1_score(y_true, y_pred, average='micro')
0.33...
>>> f1_score(y_true, y_pred, average='weighted')
0.26...
>>> f1_score(y_true, y_pred, average=None)
array([0.8, 0. , 0. ])
>>> y_true = [0, 0, 0, 0, 0, 0]
>>> y_pred = [0, 0, 0, 0, 0, 0]
>>> f1_score(y_true, y_pred, zero_division=1)
1.0...
```
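The averaging modes can be checked by hand on the first example data set. The sketch below is plain Python with hypothetical helper names (`per_class_counts`, `f1_from_counts`); it is not MaxFrame's implementation, and it assumes that an undefined score is treated as 0, as described for the default `zero_division`.

```python
def per_class_counts(y_true, y_pred, label):
    """Return (tp, fp, fn) for one label treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    # 2*tp / (2*tp + fp + fn) is algebraically equal to 2*P*R / (P + R).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumption: undefined acts as 0


y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
labels = sorted(set(y_true) | set(y_pred))
counts = [per_class_counts(y_true, y_pred, c) for c in labels]

per_class = [f1_from_counts(*c) for c in counts]            # average=None
macro = sum(per_class) / len(per_class)                     # unweighted mean
micro = f1_from_counts(*[sum(col) for col in zip(*counts)]) # pooled counts
support = [y_true.count(c) for c in labels]
weighted = sum(f * s for f, s in zip(per_class, support)) / sum(support)
print(per_class, macro, micro, weighted)
```

Because every class has the same support in this data, the weighted and macro averages coincide.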
### Notes
When `true positive + false positive == 0`, precision is undefined;
When `true positive + false negative == 0`, recall is undefined.
In such cases, by default the metric will be set to 0, as will f-score,
and `UndefinedMetricWarning` will be raised. This behavior can be
modified with `zero_division`.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.fbeta_score.md
# maxframe.learn.metrics.fbeta_score
### maxframe.learn.metrics.fbeta_score(y_true, y_pred, \*, beta, labels=None, pos_label=1, average='binary', sample_weight=None, zero_division='warn', execute=False, session=None, run_kwargs=None)
Compute the F-beta score
The F-beta score is the weighted harmonic mean of precision and recall,
reaching its optimal value at 1 and its worst value at 0.
The beta parameter determines the weight of recall in the combined
score. `beta < 1` lends more weight to precision, while `beta > 1`
favors recall (`beta -> 0` considers only precision, `beta -> +inf`
only recall).
Read more in the User Guide.
* **Parameters:**
* **y_true** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Ground truth (correct) target values.
* **y_pred** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Estimated targets as returned by a classifier.
* **beta** ([*float*](https://docs.python.org/3/library/functions.html#float)) – Determines the weight of recall in the combined score.
* **labels** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional*) – The set of labels to include when `average != 'binary'`, and their
order if `average is None`. Labels present in the data can be
excluded, for example to calculate a multiclass average ignoring a
majority negative class, while labels not present in the data will
result in 0 components in a macro average. For multilabel targets,
labels are column indices. By default, all labels in `y_true` and
`y_pred` are used in sorted order.
* **pos_label** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *1 by default*) – The class to report if `average='binary'` and the data is binary.
If the data are multiclass or multilabel, this will be ignored;
setting `labels=[pos_label]` and `average != 'binary'` will report
scores for that label only.
* **average** (*string* *,* *[**None* *,* *'binary'* *(**default* *)* *,* *'micro'* *,* *'macro'* *,* *'samples'* *,* *'weighted'* *]*) –
This parameter is required for multiclass/multilabel targets.
If `None`, the scores for each class are returned. Otherwise, this
determines the type of averaging performed on the data:
`'binary'`:
: Only report results for the class specified by `pos_label`.
This is applicable only if targets (`y_{true,pred}`) are binary.
`'micro'`:
: Calculate metrics globally by counting the total true positives,
false negatives and false positives.
`'macro'`:
: Calculate metrics for each label, and find their unweighted
mean. This does not take label imbalance into account.
`'weighted'`:
: Calculate metrics for each label, and find their average weighted
by support (the number of true instances for each label). This
alters ‘macro’ to account for label imbalance; it can result in an
F-score that is not between precision and recall.
`'samples'`:
: Calculate metrics for each instance, and find their average (only
meaningful for multilabel classification where this differs from
[`accuracy_score()`](maxframe.learn.metrics.accuracy_score.md#maxframe.learn.metrics.accuracy_score)).
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **zero_division** ( *"warn"* *,* *0* *or* *1* *,* *default="warn"*) – Sets the value to return when there is a zero division, i.e. when all
predictions and labels are negative. If set to “warn”, this acts as 0,
but warnings are also raised.
* **Returns:**
**fbeta_score** – F-beta score of the positive class in binary classification or weighted
average of the F-beta score of each class for the multiclass task.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) (if average is not None) or array of [float](https://docs.python.org/3/library/functions.html#float), shape = [n_unique_labels]
#### SEE ALSO
[`precision_recall_fscore_support`](maxframe.learn.metrics.precision_recall_fscore_support.md#maxframe.learn.metrics.precision_recall_fscore_support), [`multilabel_confusion_matrix`](maxframe.learn.metrics.multilabel_confusion_matrix.md#maxframe.learn.metrics.multilabel_confusion_matrix)
### References
* <a id='id1'>**[1]**</a> R. Baeza-Yates and B. Ribeiro-Neto (2011). Modern Information Retrieval. Addison Wesley, pp. 327-328.
* <a id='id2'>**[2]**</a> [Wikipedia entry for the F1-score](https://en.wikipedia.org/wiki/F1_score)
### Examples
```pycon
>>> from maxframe.learn.metrics import fbeta_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> fbeta_score(y_true, y_pred, average='macro', beta=0.5)
0.23...
>>> fbeta_score(y_true, y_pred, average='micro', beta=0.5)
0.33...
>>> fbeta_score(y_true, y_pred, average='weighted', beta=0.5)
0.23...
>>> fbeta_score(y_true, y_pred, average=None, beta=0.5)
array([0.71..., 0. , 0. ])
```
### Notes
When `true positive + false positive == 0` or
`true positive + false negative == 0`, f-score returns 0 and raises
`UndefinedMetricWarning`. This behavior can be
modified with `zero_division`.
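The influence of `beta` can be seen from the standard closed form `F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)`. The following is a minimal sketch in plain Python; the function name `fbeta` is hypothetical, and returning 0 for the undefined case is an assumption that mirrors the default `zero_division` behaviour.

```python
def fbeta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # assumption: undefined score acts as 0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Class 0 of the example data has precision 2/3 and recall 1.
precision, recall = 2 / 3, 1.0
f_half = fbeta(precision, recall, beta=0.5)  # leans towards precision
f_one = fbeta(precision, recall, beta=1.0)   # plain F1
f_two = fbeta(precision, recall, beta=2.0)   # leans towards recall
print(f_half, f_one, f_two)
```

Since recall is the larger of the two values here, the score rises as `beta` grows and more weight moves to recall.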
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.log_loss.md
# maxframe.learn.metrics.log_loss
### maxframe.learn.metrics.log_loss(y_true, y_pred, \*, eps=1e-15, normalize=True, sample_weight=None, labels=None, execute=False, session=None, run_kwargs=None)
Log loss, aka logistic loss or cross-entropy loss.
This is the loss function used in (multinomial) logistic regression
and extensions of it such as neural networks, defined as the negative
log-likelihood of a logistic model that returns `y_pred` probabilities
for its training data `y_true`.
The log loss is only defined for two or more labels.
For a single sample with true label $y \in \{0,1\}$
and a probability estimate $p = \operatorname{Pr}(y = 1)$, the log
loss is:
$$
L_{\log}(y, p) = -(y \log (p) + (1 - y) \log (1 - p))
$$
Read more in the User Guide.
* **Parameters:**
* **y_true** (*array-like* *or* *label indicator matrix*) – Ground truth (correct) labels for n_samples samples.
* **y_pred** (*array-like* *of* [*float*](https://docs.python.org/3/library/functions.html#float) *,* *shape =* *(**n_samples* *,* *n_classes* *) or* *(**n_samples* *,* *)*) – Predicted probabilities, as returned by a classifier’s
predict_proba method. If `y_pred.shape = (n_samples,)`
the probabilities provided are assumed to be that of the
positive class. The labels in `y_pred` are assumed to be
ordered alphabetically, as done by
`preprocessing.LabelBinarizer`.
* **eps** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default=1e-15*) – Log loss is undefined for p=0 or p=1, so probabilities are
clipped to max(eps, min(1 - eps, p)).
* **normalize** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – If true, return the mean loss per sample.
Otherwise, return the sum of the per-sample losses.
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **labels** (*array-like* *,* *default=None*) – If not provided, labels will be inferred from y_true. If `labels`
is `None` and `y_pred` has shape (n_samples,) the labels are
assumed to be binary and are inferred from `y_true`.
* **Returns:**
**loss**
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
### Notes
The logarithm used is the natural logarithm (base-e).
### Examples
```pycon
>>> from maxframe.learn.metrics import log_loss
>>> log_loss(["spam", "ham", "ham", "spam"],
... [[.1, .9], [.9, .1], [.8, .2], [.35, .65]])
0.21616...
```
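The value in the example can be reproduced by averaging the negative log of the probability assigned to each true label. This is a simplified sketch in plain Python: `log_loss_manual` is a hypothetical helper, it ignores `sample_weight`, and it assumes the probability columns follow the alphabetical label order described for `y_pred`.

```python
import math


def log_loss_manual(y_true, y_prob, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true label."""
    total = 0.0
    for label, row in zip(y_true, y_prob):
        p = row[labels.index(label)]
        p = max(eps, min(1 - eps, p))  # clip as described for `eps`
        total -= math.log(p)           # natural logarithm
    return total / len(y_true)


y_true = ["spam", "ham", "ham", "spam"]
y_prob = [[.1, .9], [.9, .1], [.8, .2], [.35, .65]]
labels = sorted(set(y_true))  # ['ham', 'spam'], the alphabetical column order
loss = log_loss_manual(y_true, y_prob, labels)
print(round(loss, 5))
```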
### References
C.M. Bishop (2006). Pattern Recognition and Machine Learning. Springer,
p. 209.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.multilabel_confusion_matrix.md
# maxframe.learn.metrics.multilabel_confusion_matrix
### maxframe.learn.metrics.multilabel_confusion_matrix(y_true, y_pred, \*, sample_weight=None, labels=None, samplewise=False, execute=False, session=None, run_kwargs=None)
Compute a confusion matrix for each class or sample.
Compute class-wise (default) or sample-wise (samplewise=True) multilabel
confusion matrix to evaluate the accuracy of a classification, and output
confusion matrices for each class or sample.
In multilabel confusion matrix $MCM$, the count of true negatives
is $MCM_{:,0,0}$, false negatives is $MCM_{:,1,0}$,
true positives is $MCM_{:,1,1}$ and false positives is
$MCM_{:,0,1}$.
Multiclass data will be treated as if binarized under a one-vs-rest
transformation. Returned confusion matrices will be in the order of
sorted unique labels in the union of (y_true, y_pred).
Read more in the User Guide.
* **Parameters:**
* **y_true** ( *{array-like* *,* *sparse matrix}* *of* *shape* *(**n_samples* *,* *n_outputs* *) or* *(**n_samples* *,* *)*) – Ground truth (correct) target values.
* **y_pred** ( *{array-like* *,* *sparse matrix}* *of* *shape* *(**n_samples* *,* *n_outputs* *) or* *(**n_samples* *,* *)*) – Estimated targets as returned by a classifier.
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **labels** (*array-like* *of* *shape* *(**n_classes* *,* *)* *,* *default=None*) – A list of classes or column indices to select some (or to force
inclusion of classes absent from the data).
* **samplewise** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=False*) – In the multilabel case, this calculates a confusion matrix per sample.
* **Returns:**
**multi_confusion** – A 2x2 confusion matrix corresponding to each output in the input.
When calculating class-wise multi_confusion (default), then
n_outputs = n_labels; when calculating sample-wise multi_confusion
(samplewise=True), n_outputs = n_samples. If `labels` is defined,
the results will be returned in the order specified in `labels`,
otherwise the results will be returned in sorted order by default.
* **Return type:**
ndarray of shape (n_outputs, 2, 2)
#### SEE ALSO
`confusion_matrix`
: Compute confusion matrix to evaluate the accuracy of a classifier.
### Notes
`multilabel_confusion_matrix` calculates class-wise or sample-wise
multilabel confusion matrices, and in multiclass tasks, labels are
binarized in a one-vs-rest way, whereas
`confusion_matrix()` calculates a single confusion matrix
for the confusion between every pair of classes.
### Examples
Multiclass case:
```pycon
>>> import maxframe.tensor as mt
>>> from maxframe.learn.metrics import multilabel_confusion_matrix
>>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
>>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
>>> multilabel_confusion_matrix(y_true, y_pred,
... labels=["ant", "bird", "cat"]).execute()
array([[[3, 1],
[0, 2]],
[[5, 0],
[1, 0]],
[[2, 1],
[1, 2]]])
```
Multilabel-indicator case not implemented yet.
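
The one-vs-rest binarization behind the multiclass result can be sketched in plain Python. The helper `one_vs_rest_confusion` is hypothetical and is not MaxFrame's implementation; it only counts, for each label in turn, the four cells laid out as `[[tn, fp], [fn, tp]]`.

```python
def one_vs_rest_confusion(y_true, y_pred, labels):
    """Return one [[tn, fp], [fn, tp]] matrix per label."""
    matrices = []
    for label in labels:
        tn = fp = fn = tp = 0
        for t, p in zip(y_true, y_pred):
            actual, predicted = (t == label), (p == label)
            if actual and predicted:
                tp += 1
            elif actual:
                fn += 1
            elif predicted:
                fp += 1
            else:
                tn += 1
        matrices.append([[tn, fp], [fn, tp]])
    return matrices


y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
mcm = one_vs_rest_confusion(y_true, y_pred, labels=["ant", "bird", "cat"])
print(mcm)
```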
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.pairwise.cosine_distances.md
# maxframe.learn.metrics.pairwise.cosine_distances
### maxframe.learn.metrics.pairwise.cosine_distances(X, Y=None)
Compute cosine distance between samples in X and Y.
Cosine distance is defined as 1.0 minus the cosine similarity.
Read more in the User Guide.
* **Parameters:**
* **X** (*array_like* *,* *sparse matrix*) – with shape (n_samples_X, n_features).
* **Y** (*array_like* *,* *sparse matrix* *(**optional* *)*) – with shape (n_samples_Y, n_features).
* **Returns:**
**distance matrix** – A tensor with shape (n_samples_X, n_samples_Y).
* **Return type:**
Tensor
#### SEE ALSO
[`maxframe.learn.metrics.pairwise.cosine_similarity`](maxframe.learn.metrics.pairwise.cosine_similarity.md#maxframe.learn.metrics.pairwise.cosine_similarity)
`maxframe.tensor.spatial.distance.cosine`
: dense matrices only
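
For intuition, the definition "1.0 minus the cosine similarity" can be written out in plain Python. The helper names below are hypothetical, the sketch handles dense lists only, and it assumes no row is the zero vector (which would divide by zero).

```python
import math


def cosine_similarity_pair(x, y):
    """Normalized dot product of two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def cosine_distance_matrix(X, Y=None):
    """Pairwise cosine distances; Y defaults to X."""
    Y = X if Y is None else Y
    return [[1.0 - cosine_similarity_pair(x, y) for y in Y] for x in X]


X = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
D = cosine_distance_matrix(X)  # shape (3, 3)
print(D)
```

Orthogonal rows end up at distance 1, and the length of a vector does not matter, only its direction.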
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.pairwise.cosine_similarity.md
# maxframe.learn.metrics.pairwise.cosine_similarity
### maxframe.learn.metrics.pairwise.cosine_similarity(X, Y=None, dense_output=True)
Compute cosine similarity between samples in X and Y.
Cosine similarity, or the cosine kernel, computes similarity as the
normalized dot product of X and Y:
> K(X, Y) = <X, Y> / (||X||\*||Y||)
On L2-normalized data, this function is equivalent to linear_kernel.
Read more in the User Guide.
* **Parameters:**
* **X** (*Tensor* *or* *sparse tensor* *,* *shape:* *(**n_samples_X* *,* *n_features* *)*) – Input data.
* **Y** (*Tensor* *or* *sparse tensor* *,* *shape:* *(**n_samples_Y* *,* *n_features* *)*) – Input data. If `None`, the output will be the pairwise
similarities between all samples in `X`.
* **dense_output** (*boolean* *(**optional* *)* *,* *default True*) – Whether to return dense output even when the input is sparse. If
`False`, the output is sparse if both input tensors are sparse.
* **Returns:**
**kernel matrix** – A tensor with shape (n_samples_X, n_samples_Y).
* **Return type:**
Tensor
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.pairwise.euclidean_distances.md
# maxframe.learn.metrics.pairwise.euclidean_distances
### maxframe.learn.metrics.pairwise.euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False, X_norm_squared=None)
Considering the rows of X (and Y=X) as vectors, compute the
distance matrix between each pair of vectors.
For efficiency reasons, the Euclidean distance between a pair of row
vectors x and y is computed as:
```default
dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))
```
This formulation has two advantages over other ways of computing distances.
First, it is computationally efficient when dealing with sparse data.
Second, if one argument varies but the other remains unchanged, then
dot(x, x) and/or dot(y, y) can be pre-computed.
However, this is not the most precise way of doing this computation, and
the distance matrix returned by this function may not be exactly
symmetric as required by, e.g., `scipy.spatial.distance` functions.
Read more in the User Guide.
* **Parameters:**
* **X** ( *{array-like* *,* *sparse matrix}* *,* *shape* *(**n_samples_1* *,* *n_features* *)*)
* **Y** ( *{array-like* *,* *sparse matrix}* *,* *shape* *(**n_samples_2* *,* *n_features* *)*)
* **Y_norm_squared** (*array-like* *,* *shape* *(**n_samples_2* *,* *)* *,* *optional*) – Pre-computed dot-products of vectors in Y (e.g.,
`(Y**2).sum(axis=1)`)
May be ignored in some cases, see the note below.
* **squared** (*boolean* *,* *optional*) – Return squared Euclidean distances.
* **X_norm_squared** (*array-like* *,* *shape =* *[**n_samples_1* *]* *,* *optional*) – Pre-computed dot-products of vectors in X (e.g.,
`(X**2).sum(axis=1)`)
May be ignored in some cases, see the note below.
### Notes
To achieve better accuracy, X_norm_squared and Y_norm_squared may be
unused if they are passed as `float32`.
* **Returns:**
**distances**
* **Return type:**
tensor, shape (n_samples_1, n_samples_2)
### Examples
```pycon
>>> from maxframe.learn.metrics.pairwise import euclidean_distances
>>> X = [[0, 1], [1, 1]]
>>> # distance between rows of X
>>> euclidean_distances(X, X).execute()
array([[0., 1.],
[1., 0.]])
>>> # get distance to origin
>>> euclidean_distances(X, [[0, 0]]).execute()
array([[1. ],
[1.41421356]])
```
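The expanded formula and the reuse of pre-computed squared norms can be sketched as follows. This is plain Python with hypothetical helper names, not MaxFrame's implementation; clamping at zero before the square root is an assumption added to guard against small negative values caused by rounding.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def euclidean_distances_expanded(X, Y):
    # Squared norms are computed once per row and reused for every pair.
    xx = [dot(x, x) for x in X]
    yy = [dot(y, y) for y in Y]
    return [
        # max(..., 0.0) guards against tiny negative values from rounding.
        [math.sqrt(max(xx[i] - 2 * dot(x, y) + yy[j], 0.0))
         for j, y in enumerate(Y)]
        for i, x in enumerate(X)
    ]


X = [[0, 1], [1, 1]]
within = euclidean_distances_expanded(X, X)
to_origin = euclidean_distances_expanded(X, [[0, 0]])
print(within, to_origin)
```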
#### SEE ALSO
`paired_distances`
: distances between pairs of elements of X and Y.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.pairwise.haversine_distances.md
# maxframe.learn.metrics.pairwise.haversine_distances
### maxframe.learn.metrics.pairwise.haversine_distances(X, Y=None)
Compute the Haversine distance between samples in X and Y
The Haversine (or great circle) distance is the angular distance between
two points on the surface of a sphere. The first coordinate of each point is
assumed to be the latitude, the second is the longitude, given in radians.
The dimension of the data must be 2.
$$
D(x, y) = 2\arcsin[\sqrt{\sin^2((x1 - y1) / 2)
+ \cos(x1)\cos(y1)\sin^2((x2 - y2) / 2)}]
$$
* **Parameters:**
* **X** (*array_like* *,* *shape* *(**n_samples_1* *,* *2* *)*)
* **Y** (*array_like* *,* *shape* *(**n_samples_2* *,* *2* *)* *,* *optional*)
* **Returns:**
**distance**
* **Return type:**
{Tensor}, shape (n_samples_1, n_samples_2)
### Notes
As the Earth is nearly spherical, the haversine formula provides a good
approximation of the distance between two points on the Earth's surface, with
an error of less than 1% on average.
### Examples
We want to calculate the distance between the Ezeiza Airport
(Buenos Aires, Argentina) and the Charles de Gaulle Airport (Paris, France)
```pycon
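>>> from math import radians
>>> from maxframe.learn.metrics.pairwise import haversine_distances
>>> bsas = [-34.83333, -58.5166646]
>>> paris = [49.0083899664, 2.53844117956]
>>> # the inputs must be in radians, so convert the degree coordinates first
>>> bsas_in_radians = [radians(_) for _ in bsas]
>>> paris_in_radians = [radians(_) for _ in paris]
>>> result = haversine_distances([bsas_in_radians, paris_in_radians])
>>> (result * 6371000/1000).execute() # multiply by Earth radius to get kilometers
array([[ 0. , 11099.54035582],
[11099.54035582, 0. ]])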
```
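The formula can also be evaluated term by term in plain Python, which makes the unit requirement explicit: latitude and longitude must be in radians, so degree coordinates are converted first. The helper `haversine` is hypothetical and is not MaxFrame's implementation; the Earth radius of 6371 km is the same mean value used in the example.

```python
import math


def haversine(p, q):
    """Angular distance between two (latitude, longitude) points in radians."""
    lat1, lon1 = p
    lat2, lon2 = q
    a = (math.sin((lat1 - lat2) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon1 - lon2) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


# Coordinates are published in degrees, so convert them to radians first.
bsas = [math.radians(v) for v in (-34.83333, -58.5166646)]
paris = [math.radians(v) for v in (49.0083899664, 2.53844117956)]

EARTH_RADIUS_KM = 6371.0
distance_km = haversine(bsas, paris) * EARTH_RADIUS_KM
print(distance_km)
```

With radian inputs the result is approximately 11,100 km between the two airports.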
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.pairwise.manhattan_distances.md
# maxframe.learn.metrics.pairwise.manhattan_distances
### maxframe.learn.metrics.pairwise.manhattan_distances(X, Y=None)
Compute the L1 distances between the vectors in X and Y.
Read more in the User Guide.
* **Parameters:**
* **X** (*array_like*) – A tensor with shape (n_samples_X, n_features).
* **Y** (*array_like* *,* *optional*) – A tensor with shape (n_samples_Y, n_features).
* **Returns:**
**D** – Shape is (n_samples_X, n_samples_Y) and D contains
the pairwise L1 distances.
* **Return type:**
Tensor
### Examples
```pycon
>>> from maxframe.learn.metrics.pairwise import manhattan_distances
>>> manhattan_distances([[3]], [[3]]).execute()
array([[0.]])
>>> manhattan_distances([[3]], [[2]]).execute()
array([[1.]])
>>> manhattan_distances([[2]], [[3]]).execute()
array([[1.]])
>>> manhattan_distances([[1, 2], [3, 4]], [[1, 2], [0, 3]]).execute()
array([[0., 2.],
[4., 4.]])
```
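For reference, the L1 distance is the sum of absolute coordinate differences. The short plain-Python sketch below (with the hypothetical helper `manhattan_distances_manual`) reproduces the last result above without using MaxFrame.

```python
def manhattan_distances_manual(X, Y=None):
    """Pairwise sums of absolute coordinate differences; Y defaults to X."""
    Y = X if Y is None else Y
    return [
        [float(sum(abs(a - b) for a, b in zip(x, y))) for y in Y]
        for x in X
    ]


D = manhattan_distances_manual([[1, 2], [3, 4]], [[1, 2], [0, 3]])
print(D)
```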
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.pairwise.rbf_kernel.md
# maxframe.learn.metrics.pairwise.rbf_kernel
### maxframe.learn.metrics.pairwise.rbf_kernel(X, Y=None, gamma=None)
Compute the rbf (gaussian) kernel between X and Y:
```default
K(x, y) = exp(-gamma ||x-y||^2)
```
for each pair of rows x in X and y in Y.
Read more in the User Guide.
* **Parameters:**
* **X** (*tensor* *of* *shape* *(**n_samples_X* *,* *n_features* *)*)
* **Y** (*tensor* *of* *shape* *(**n_samples_Y* *,* *n_features* *)*)
* **gamma** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *default None*) – If None, defaults to 1.0 / n_features
* **Returns:**
**kernel_matrix**
* **Return type:**
tensor of shape (n_samples_X, n_samples_Y)
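
A minimal plain-Python sketch of the kernel formula, including the documented default of `gamma = 1.0 / n_features`, might look like this. The helper `rbf_kernel_manual` is hypothetical; treating a missing `Y` as `X` is an assumption based on the `Y=None` default in the signature.

```python
import math


def rbf_kernel_manual(X, Y=None, gamma=None):
    """K(x, y) = exp(-gamma * ||x - y||^2) for every pair of rows."""
    Y = X if Y is None else Y          # assumption: Y=None means Y = X
    if gamma is None:
        gamma = 1.0 / len(X[0])        # documented default: 1 / n_features
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in Y]
        for x in X
    ]


X = [[0.0, 0.0], [1.0, 1.0]]
K = rbf_kernel_manual(X)  # gamma defaults to 0.5 for two features
print(K)
```

Identical rows give a kernel value of 1, and the value decays towards 0 as the squared distance grows.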
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.pairwise_distances.md
# maxframe.learn.metrics.pairwise_distances
### maxframe.learn.metrics.pairwise_distances(X, Y=None, metric='euclidean', \*\*kwds)
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.precision_recall_fscore_support.md
# maxframe.learn.metrics.precision_recall_fscore_support
### maxframe.learn.metrics.precision_recall_fscore_support(y_true, y_pred, \*, beta=1.0, labels=None, pos_label=1, average=None, warn_for=('precision', 'recall', 'f-score'), sample_weight=None, zero_division='warn', execute=False, session=None, run_kwargs=None)
Compute precision, recall, F-measure and support for each class
The precision is the ratio `tp / (tp + fp)` where `tp` is the number of
true positives and `fp` the number of false positives. The precision is
intuitively the ability of the classifier not to label as positive a sample
that is negative.
The recall is the ratio `tp / (tp + fn)` where `tp` is the number of
true positives and `fn` the number of false negatives. The recall is
intuitively the ability of the classifier to find all the positive samples.
The F-beta score can be interpreted as a weighted harmonic mean of
the precision and recall, where an F-beta score reaches its best
value at 1 and worst score at 0.
The F-beta score weights recall more than precision by a factor of
`beta`. `beta == 1.0` means recall and precision are equally important.
The support is the number of occurrences of each class in `y_true`.
If `pos_label is None` and in binary classification, this function
returns the average precision, recall and F-measure if `average`
is one of `'micro'`, `'macro'`, `'weighted'` or `'samples'`.
Read more in the User Guide.
* **Parameters:**
* **y_true** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Ground truth (correct) target values.
* **y_pred** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Estimated targets as returned by a classifier.
* **beta** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *1.0 by default*) – The strength of recall versus precision in the F-score.
* **labels** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional*) – The set of labels to include when `average != 'binary'`, and their
order if `average is None`. Labels present in the data can be
excluded, for example to calculate a multiclass average ignoring a
majority negative class, while labels not present in the data will
result in 0 components in a macro average. For multilabel targets,
labels are column indices. By default, all labels in `y_true` and
`y_pred` are used in sorted order.
* **pos_label** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *1 by default*) – The class to report if `average='binary'` and the data is binary.
If the data are multiclass or multilabel, this will be ignored;
setting `labels=[pos_label]` and `average != 'binary'` will report
scores for that label only.
* **average** (*string* *,* *[**None* *(**default* *)* *,* *'binary'* *,* *'micro'* *,* *'macro'* *,* *'samples'* *,* *'weighted'* *]*) –
If `None`, the scores for each class are returned. Otherwise, this
determines the type of averaging performed on the data:
`'binary'`:
: Only report results for the class specified by `pos_label`.
This is applicable only if targets (`y_{true,pred}`) are binary.
`'micro'`:
: Calculate metrics globally by counting the total true positives,
false negatives and false positives.
`'macro'`:
: Calculate metrics for each label, and find their unweighted
mean. This does not take label imbalance into account.
`'weighted'`:
: Calculate metrics for each label, and find their average weighted
by support (the number of true instances for each label). This
alters ‘macro’ to account for label imbalance; it can result in an
F-score that is not between precision and recall.
`'samples'`:
: Calculate metrics for each instance, and find their average (only
meaningful for multilabel classification where this differs from
[`accuracy_score()`](maxframe.learn.metrics.accuracy_score.md#maxframe.learn.metrics.accuracy_score)).
* **warn_for** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *or* [*set*](https://docs.python.org/3/library/stdtypes.html#set) *,* *for internal use*) – This determines which warnings will be made in the case that this
function is being used to return only one of its metrics.
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **zero_division** ( *"warn"* *,* *0* *or* *1* *,* *default="warn"*) –
Sets the value to return when there is a zero division:
: - recall: when there are no positive labels
- precision: when there are no positive predictions
- f-score: both
If set to “warn”, this acts as 0, but warnings are also raised.
* **Returns:**
* **precision** (*float (if average is not None) or array of float, shape = [n_unique_labels]*)
* **recall** (*float (if average is not None) or array of float, shape = [n_unique_labels]*)
* **fbeta_score** (*float (if average is not None) or array of float, shape = [n_unique_labels]*)
* **support** (*None (if average is not None) or array of int, shape = [n_unique_labels]*) – The number of occurrences of each label in `y_true`.
### References
* <a id='id1'>**[1]**</a> [Wikipedia entry for the Precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall)
* <a id='id2'>**[2]**</a> [Wikipedia entry for the F1-score](https://en.wikipedia.org/wiki/F1_score)
* <a id='id3'>**[3]**</a> [Discriminative Methods for Multi-labeled Classification Advances in Knowledge Discovery and Data Mining (2004), pp. 22-30 by Shantanu Godbole, Sunita Sarawagi](http://www.godbole.net/shantanu/pubs/multilabelsvm-pakdd04.pdf)
### Examples
```pycon
>>> import numpy as np
>>> from maxframe.learn.metrics import precision_recall_fscore_support
>>> y_true = np.array(['cat', 'dog', 'pig', 'cat', 'dog', 'pig'])
>>> y_pred = np.array(['cat', 'pig', 'dog', 'cat', 'cat', 'dog'])
>>> precision_recall_fscore_support(y_true, y_pred, average='macro')
(0.22..., 0.33..., 0.26..., None)
>>> precision_recall_fscore_support(y_true, y_pred, average='micro')
(0.33..., 0.33..., 0.33..., None)
>>> precision_recall_fscore_support(y_true, y_pred, average='weighted')
(0.22..., 0.33..., 0.26..., None)
```
It is possible to compute per-label precisions, recalls, F1-scores and
supports instead of averaging:
```pycon
>>> precision_recall_fscore_support(y_true, y_pred, average=None,
... labels=['pig', 'dog', 'cat'])
(array([0. , 0. , 0.66...]),
array([0., 0., 1.]), array([0. , 0. , 0.8]),
array([2, 2, 2]))
```
### Notes
When `true positive + false positive == 0`, precision is undefined;
When `true positive + false negative == 0`, recall is undefined.
In such cases, by default the metric will be set to 0, as will f-score,
and `UndefinedMetricWarning` will be raised. This behavior can be
modified with `zero_division`.
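The per-class ratios and the zero-division fallback can be worked through by hand. The sketch below is plain Python; `precision_recall` is a hypothetical helper whose `zero_division` argument only imitates the documented behaviour and emits no warnings.

```python
def precision_recall(y_true, y_pred, label, zero_division=0.0):
    """Precision and recall for one label treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else zero_division
    recall = tp / (tp + fn) if tp + fn else zero_division
    return precision, recall


y_true = ['cat', 'dog', 'pig', 'cat', 'dog', 'pig']
y_pred = ['cat', 'pig', 'dog', 'cat', 'cat', 'dog']
labels = ['cat', 'dog', 'pig']
scores = [precision_recall(y_true, y_pred, c) for c in labels]
macro_precision = sum(p for p, _ in scores) / len(labels)
macro_recall = sum(r for _, r in scores) / len(labels)

# Label 'a' is never predicted, so tp + fp == 0 and precision is undefined;
# the fallback value is returned instead.
fallback = precision_recall(['a', 'b'], ['b', 'b'], 'a', zero_division=1.0)
print(macro_precision, macro_recall, fallback)
```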
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.precision_score.md
# maxframe.learn.metrics.precision_score
### maxframe.learn.metrics.precision_score(y_true, y_pred, \*, labels=None, pos_label=1, average='binary', sample_weight=None, zero_division='warn', execute=False, session=None, run_kwargs=None)
Compute the precision
The precision is the ratio `tp / (tp + fp)` where `tp` is the number of
true positives and `fp` the number of false positives. The precision is
intuitively the ability of the classifier not to label as positive a sample
that is negative.
The best value is 1 and the worst value is 0.
Read more in the User Guide.
* **Parameters:**
* **y_true** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Ground truth (correct) target values.
* **y_pred** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Estimated targets as returned by a classifier.
* **labels** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional*) – The set of labels to include when `average != 'binary'`, and their
order if `average is None`. Labels present in the data can be
excluded, for example to calculate a multiclass average ignoring a
majority negative class, while labels not present in the data will
result in 0 components in a macro average. For multilabel targets,
labels are column indices. By default, all labels in `y_true` and
`y_pred` are used in sorted order.
* **pos_label** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *1 by default*) – The class to report if `average='binary'` and the data is binary.
If the data are multiclass or multilabel, this will be ignored;
setting `labels=[pos_label]` and `average != 'binary'` will report
scores for that label only.
* **average** (*string* *,* *[**None* *,* *'binary'* *(**default* *)* *,* *'micro'* *,* *'macro'* *,* *'samples'* *,* *'weighted'* *]*) –
This parameter is required for multiclass/multilabel targets.
If `None`, the scores for each class are returned. Otherwise, this
determines the type of averaging performed on the data:
`'binary'`:
: Only report results for the class specified by `pos_label`.
This is applicable only if targets (`y_{true,pred}`) are binary.
`'micro'`:
: Calculate metrics globally by counting the total true positives,
false negatives and false positives.
`'macro'`:
: Calculate metrics for each label, and find their unweighted
mean. This does not take label imbalance into account.
`'weighted'`:
: Calculate metrics for each label, and find their average weighted
by support (the number of true instances for each label). This
alters ‘macro’ to account for label imbalance; it can result in an
F-score that is not between precision and recall.
`'samples'`:
: Calculate metrics for each instance, and find their average (only
meaningful for multilabel classification where this differs from
[`accuracy_score()`](maxframe.learn.metrics.accuracy_score.md#maxframe.learn.metrics.accuracy_score)).
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **zero_division** ( *"warn"* *,* *0* *or* *1* *,* *default="warn"*) – Sets the value to return when there is a zero division. If set to
“warn”, this acts as 0, but warnings are also raised.
* **Returns:**
**precision** – Precision of the positive class in binary classification or weighted
average of the precision of each class for the multiclass task.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) (if average is not None) or array of [float](https://docs.python.org/3/library/functions.html#float), shape = [n_unique_labels]
#### SEE ALSO
[`precision_recall_fscore_support`](maxframe.learn.metrics.precision_recall_fscore_support.md#maxframe.learn.metrics.precision_recall_fscore_support), [`multilabel_confusion_matrix`](maxframe.learn.metrics.multilabel_confusion_matrix.md#maxframe.learn.metrics.multilabel_confusion_matrix)
### Examples
```pycon
>>> from maxframe.learn.metrics import precision_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> precision_score(y_true, y_pred, average='macro')
0.22...
>>> precision_score(y_true, y_pred, average='micro')
0.33...
>>> precision_score(y_true, y_pred, average='weighted')
0.22...
>>> precision_score(y_true, y_pred, average=None)
array([0.66..., 0. , 0. ])
>>> y_pred = [0, 0, 0, 0, 0, 0]
>>> precision_score(y_true, y_pred, average=None)
array([0.33..., 0. , 0. ])
>>> precision_score(y_true, y_pred, average=None, zero_division=1)
array([0.33..., 1. , 1. ])
```
### Notes
When `true positive + false positive == 0`, precision returns 0 and
raises `UndefinedMetricWarning`. This behavior can be
modified with `zero_division`.
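To make the difference between the `average` modes concrete, here is a small pure-Python sketch that computes macro, micro and weighted precision from raw counts. It is an illustration under stated assumptions rather than MaxFrame's code: the function name `precision_by_average` is hypothetical, inputs are assumed non-empty, and undefined per-label precision is treated as 0.

```python
def precision_by_average(y_true, y_pred):
    """Compute macro, micro and weighted precision for multiclass labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    support = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1  # p was predicted but the true label was different

    per_label = {
        c: (tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0) for c in labels
    }
    total_tp, total_fp = sum(tp.values()), sum(fp.values())
    return {
        # unweighted mean of the per-label scores
        "macro": sum(per_label.values()) / len(labels),
        # pooled counts over all labels
        "micro": total_tp / (total_tp + total_fp),
        # per-label scores weighted by the number of true instances
        "weighted": sum(per_label[c] * support[c] for c in labels)
        / sum(support.values()),
    }


scores = precision_by_average([0, 1, 2, 0, 1, 2], [0, 2, 1, 0, 0, 1])
```

With this data every label has the same support, so the macro and weighted averages coincide, while the micro average pools the counts and comes out higher.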
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.r2_score.md
# maxframe.learn.metrics.r2_score
### maxframe.learn.metrics.r2_score(y_true, y_pred, \*, sample_weight=None, multioutput='uniform_average', execute=False, session=None, run_kwargs=None)
$R^2$ (coefficient of determination) regression score function.
Best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always
predicts the expected value of y, disregarding the input features,
would get a $R^2$ score of 0.0.
Read more in the User Guide.
* **Parameters:**
* **y_true** (*array-like* *of* *shape* *(**n_samples* *,* *) or* *(**n_samples* *,* *n_outputs* *)*) – Ground truth (correct) target values.
* **y_pred** (*array-like* *of* *shape* *(**n_samples* *,* *) or* *(**n_samples* *,* *n_outputs* *)*) – Estimated target values.
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **multioutput** ( *{'raw_values'* *,* *'uniform_average'* *,* *'variance_weighted'}* *,* *array-like* *of* *shape* *(**n_outputs* *,* *) or* *None* *,* *default='uniform_average'*) –
Defines aggregating of multiple output scores.
Array-like value defines weights used to average scores.
Default is “uniform_average”.
`'raw_values'`:
: Returns a full set of scores in case of multioutput input.
`'uniform_average'`:
: Scores of all outputs are averaged with uniform weight.
`'variance_weighted'`:
: Scores of all outputs are averaged, weighted by the variances
of each individual output.
* **Returns:**
**z** – The $R^2$ score or ndarray of scores if ‘multioutput’ is
‘raw_values’.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or tensor of floats
### Notes
This is not a symmetric function.
Unlike most other scores, $R^2$ score may be negative (it need not
actually be the square of a quantity R).
This metric is not well-defined for single samples and will return a NaN
value if n_samples is less than two.
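For a single output without sample weights, the score is $1 - SS_{res} / SS_{tot}$, which explains why it can drop below zero. A minimal sketch of that formula follows; the helper name `r2` is hypothetical, and the sketch assumes `y_true` is not constant (otherwise $SS_{tot}$ is zero).

```python
def r2(y_true, y_pred):
    """Unweighted single-output R^2: 1 - SS_res / SS_tot."""
    n = len(y_true)
    if n < 2:
        return float("nan")  # not well-defined for fewer than two samples
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # assumed non-zero here
    return 1.0 - ss_res / ss_tot


perfect = r2([1, 2, 3], [1, 2, 3])    # no residual error
constant = r2([1, 2, 3], [2, 2, 2])   # always predicts the mean
reversed_ = r2([1, 2, 3], [3, 2, 1])  # worse than predicting the mean
```

The three calls correspond to the cases described above: a perfect fit, a constant model that predicts the mean, and a model that is worse than that constant baseline.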
### References
* <a id='id1'>**[1]**</a> [Wikipedia entry on the Coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination)
### Examples
```pycon
>>> from maxframe.learn.metrics import r2_score
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> r2_score(y_true, y_pred)
0.948...
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> r2_score(y_true, y_pred,
... multioutput='variance_weighted')
0.938...
>>> y_true = [1, 2, 3]
>>> y_pred = [1, 2, 3]
>>> r2_score(y_true, y_pred)
1.0
>>> y_true = [1, 2, 3]
>>> y_pred = [2, 2, 2]
>>> r2_score(y_true, y_pred)
0.0
>>> y_true = [1, 2, 3]
>>> y_pred = [3, 2, 1]
>>> r2_score(y_true, y_pred)
-3.0
```
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.recall_score.md
# maxframe.learn.metrics.recall_score
### maxframe.learn.metrics.recall_score(y_true, y_pred, \*, labels=None, pos_label=1, average='binary', sample_weight=None, zero_division='warn', execute=False, session=None, run_kwargs=None)
Compute the recall
The recall is the ratio `tp / (tp + fn)` where `tp` is the number of
true positives and `fn` the number of false negatives. The recall is
intuitively the ability of the classifier to find all the positive samples.
The best value is 1 and the worst value is 0.
Read more in the User Guide.
* **Parameters:**
* **y_true** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Ground truth (correct) target values.
* **y_pred** (*1d array-like* *, or* *label indicator array / sparse matrix*) – Estimated targets as returned by a classifier.
* **labels** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *,* *optional*) – The set of labels to include when `average != 'binary'`, and their
order if `average is None`. Labels present in the data can be
excluded, for example to calculate a multiclass average ignoring a
majority negative class, while labels not present in the data will
result in 0 components in a macro average. For multilabel targets,
labels are column indices. By default, all labels in `y_true` and
`y_pred` are used in sorted order.
* **pos_label** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *1 by default*) – The class to report if `average='binary'` and the data is binary.
If the data are multiclass or multilabel, this will be ignored;
setting `labels=[pos_label]` and `average != 'binary'` will report
scores for that label only.
* **average** (*string* *,* *[**None* *,* *'binary'* *(**default* *)* *,* *'micro'* *,* *'macro'* *,* *'samples'* *,* *'weighted'* *]*) –
This parameter is required for multiclass/multilabel targets.
If `None`, the scores for each class are returned. Otherwise, this
determines the type of averaging performed on the data:
`'binary'`:
: Only report results for the class specified by `pos_label`.
This is applicable only if targets (`y_{true,pred}`) are binary.
`'micro'`:
: Calculate metrics globally by counting the total true positives,
false negatives and false positives.
`'macro'`:
: Calculate metrics for each label, and find their unweighted
mean. This does not take label imbalance into account.
`'weighted'`:
: Calculate metrics for each label, and find their average weighted
by support (the number of true instances for each label). This
alters ‘macro’ to account for label imbalance; it can result in an
F-score that is not between precision and recall.
`'samples'`:
: Calculate metrics for each instance, and find their average (only
meaningful for multilabel classification where this differs from
[`accuracy_score()`](maxframe.learn.metrics.accuracy_score.md#maxframe.learn.metrics.accuracy_score)).
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **zero_division** ( *"warn"* *,* *0* *or* *1* *,* *default="warn"*) – Sets the value to return when there is a zero division. If set to
“warn”, this acts as 0, but warnings are also raised.
* **Returns:**
**recall** – Recall of the positive class in binary classification or weighted
average of the recall of each class for the multiclass task.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) (if average is not None) or array of [float](https://docs.python.org/3/library/functions.html#float), shape = [n_unique_labels]
#### SEE ALSO
[`precision_recall_fscore_support`](maxframe.learn.metrics.precision_recall_fscore_support.md#maxframe.learn.metrics.precision_recall_fscore_support), `balanced_accuracy_score`, [`multilabel_confusion_matrix`](maxframe.learn.metrics.multilabel_confusion_matrix.md#maxframe.learn.metrics.multilabel_confusion_matrix)
### Examples
```pycon
>>> from maxframe.learn.metrics import recall_score
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> recall_score(y_true, y_pred, average='macro')
0.33...
>>> recall_score(y_true, y_pred, average='micro')
0.33...
>>> recall_score(y_true, y_pred, average='weighted')
0.33...
>>> recall_score(y_true, y_pred, average=None)
array([1., 0., 0.])
>>> y_true = [0, 0, 0, 0, 0, 0]
>>> recall_score(y_true, y_pred, average=None)
array([0.5, 0. , 0. ])
>>> recall_score(y_true, y_pred, average=None, zero_division=1)
array([0.5, 1. , 1. ])
```
### Notes
When `true positive + false negative == 0`, recall returns 0 and raises
`UndefinedMetricWarning`. This behavior can be modified with
`zero_division`.
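The effect of `zero_division` can be sketched in a few lines of plain Python. This is an illustrative stand-in, not the library's implementation: `recall_per_label` is a hypothetical helper, and it substitutes the fallback value silently instead of issuing a warning.

```python
def recall_per_label(y_true, y_pred, labels, zero_division=0):
    """Per-label recall; labels with no true instances get `zero_division`."""
    pairs = list(zip(y_true, y_pred))
    result = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        if tp + fn == 0:
            # The label never occurs in y_true, so recall is undefined.
            result.append(float(zero_division))
        else:
            result.append(tp / (tp + fn))
    return result


y_true = [0, 0, 0, 0, 0, 0]
y_pred = [0, 2, 1, 0, 0, 1]
default = recall_per_label(y_true, y_pred, labels=[0, 1, 2])
as_one = recall_per_label(y_true, y_pred, labels=[0, 1, 2], zero_division=1)
```

Labels 1 and 2 never occur in `y_true`, so only their entries change when the fallback value changes.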
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.roc_auc_score.md
# maxframe.learn.metrics.roc_auc_score
### maxframe.learn.metrics.roc_auc_score(y_true, y_score, \*, average='macro', sample_weight=None, max_fpr=None, multi_class='raise', labels=None, execute=False, session=None, run_kwargs=None)
Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)
from prediction scores.
Note: this implementation can be used with binary, multiclass and
multilabel classification, but some restrictions apply (see Parameters).
Read more in the User Guide.
* **Parameters:**
* **y_true** (*array-like* *of* *shape* *(**n_samples* *,* *) or* *(**n_samples* *,* *n_classes* *)*) – True labels or binary label indicators. The binary and multiclass cases
expect labels with shape (n_samples,) while the multilabel case expects
binary label indicators with shape (n_samples, n_classes).
* **y_score** (*array-like* *of* *shape* *(**n_samples* *,* *) or* *(**n_samples* *,* *n_classes* *)*) –
Target scores.
* In the binary case, it corresponds to an array of shape
(n_samples,). Both probability estimates and non-thresholded
decision values can be provided. The probability estimates correspond
to the **probability of the class with the greater label**,
i.e. estimator.classes_[1] and thus
estimator.predict_proba(X, y)[:, 1]. The decision values
correspond to the output of estimator.decision_function(X, y).
See more information in the User guide;
* In the multiclass case, it corresponds to an array of shape
(n_samples, n_classes) of probability estimates provided by the
predict_proba method. The probability estimates **must**
sum to 1 across the possible classes. In addition, the order of the
class scores must correspond to the order of `labels`,
if provided, or else to the numerical or lexicographical order of
the labels in `y_true`. See more information in the
User guide;
* In the multilabel case, it corresponds to an array of shape
(n_samples, n_classes). Probability estimates are provided by the
predict_proba method and the non-thresholded decision values by
the decision_function method. The probability estimates correspond
to the **probability of the class with the greater label for each
output** of the classifier. See more information in the
User guide.
* **average** ( *{'micro'* *,* *'macro'* *,* *'samples'* *,* *'weighted'}* *or* *None* *,* *default='macro'*) –
If `None`, the scores for each class are returned. Otherwise,
this determines the type of averaging performed on the data:
Note: multiclass ROC AUC currently only handles the ‘macro’ and
‘weighted’ averages.
`'micro'`:
: Calculate metrics globally by considering each element of the label
indicator matrix as a label.
`'macro'`:
: Calculate metrics for each label, and find their unweighted
mean. This does not take label imbalance into account.
`'weighted'`:
: Calculate metrics for each label, and find their average, weighted
by support (the number of true instances for each label).
`'samples'`:
: Calculate metrics for each instance, and find their average.
Will be ignored when `y_true` is binary.
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **max_fpr** (*float > 0 and <= 1* *,* *default=None*) – If not `None`, the standardized partial AUC <sup>[2](#id6)</sup> over the range
[0, max_fpr] is returned. For the multiclass case, `max_fpr`
should be either equal to `None` or `1.0` as AUC ROC partial
computation currently is not supported for multiclass.
* **multi_class** ( *{'raise'* *,* *'ovr'* *,* *'ovo'}* *,* *default='raise'*) –
Only used for multiclass targets. Determines the type of configuration
to use. The default value raises an error, so either
`'ovr'` or `'ovo'` must be passed explicitly.
`'ovr'`:
: Stands for One-vs-rest. Computes the AUC of each class
against the rest <sup>[3](#id7)</sup> <sup>[4](#id8)</sup>. This
treats the multiclass case in the same way as the multilabel case.
Sensitive to class imbalance even when `average == 'macro'`,
because class imbalance affects the composition of each of the
‘rest’ groupings.
`'ovo'`:
: Stands for One-vs-one. Computes the average AUC of all
possible pairwise combinations of classes <sup>[5](#id9)</sup>.
Insensitive to class imbalance when
`average == 'macro'`.
* **labels** (*array-like* *of* *shape* *(**n_classes* *,* *)* *,* *default=None*) – Only used for multiclass targets. List of labels that index the
classes in `y_score`. If `None`, the numerical or lexicographical
order of the labels in `y_true` is used.
* **Returns:**
**auc**
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
### References
* <a id='id5'>**[1]**</a> [Wikipedia entry for the Receiver operating characteristic](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)
* <a id='id6'>**[2]**</a> [Analyzing a portion of the ROC curve. McClish, 1989](https://www.ncbi.nlm.nih.gov/pubmed/2668680)
* <a id='id7'>**[3]**</a> Provost, F., Domingos, P. (2000). Well-trained PETs: Improving probability estimation trees (Section 6.2), CeDER Working Paper #IS-00-04, Stern School of Business, New York University.
* <a id='id8'>**[4]**</a> [Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874.](https://www.sciencedirect.com/science/article/pii/S016786550500303X)
* <a id='id9'>**[5]**</a> [Hand, D.J., Till, R.J. (2001). A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Machine Learning, 45(2), 171-186.](http://link.springer.com/article/10.1023/A:1010920819831)
#### SEE ALSO
`average_precision_score`
: Area under the precision-recall curve.
[`roc_curve`](maxframe.learn.metrics.roc_curve.md#maxframe.learn.metrics.roc_curve)
: Compute Receiver operating characteristic (ROC) curve.
`RocCurveDisplay.from_estimator`
: Plot Receiver Operating Characteristic (ROC) curve given an estimator and some data.
`RocCurveDisplay.from_predictions`
: Plot Receiver Operating Characteristic (ROC) curve given the true and predicted values.
### Examples
Binary case:
```pycon
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import LogisticRegression
>>> from maxframe.learn.metrics import roc_auc_score
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = LogisticRegression(solver="liblinear", random_state=0).fit(X, y)
>>> roc_auc_score(y, clf.predict_proba(X)[:, 1]).execute()
0.99...
>>> roc_auc_score(y, clf.decision_function(X)).execute()
0.99...
```
Multiclass case:
```pycon
>>> from sklearn.datasets import load_iris
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(solver="liblinear").fit(X, y)
>>> roc_auc_score(y, clf.predict_proba(X), multi_class='ovr').execute()
0.99...
```
Multilabel case:
```pycon
>>> import numpy as np
>>> from sklearn.datasets import make_multilabel_classification
>>> from sklearn.multioutput import MultiOutputClassifier
>>> X, y = make_multilabel_classification(random_state=0)
>>> clf = MultiOutputClassifier(clf).fit(X, y)
>>> # get a list of n_output containing probability arrays of shape
>>> # (n_samples, n_classes)
>>> y_pred = clf.predict_proba(X)
>>> # extract the positive columns for each output
>>> y_pred = np.transpose([pred[:, 1] for pred in y_pred])
>>> roc_auc_score(y, y_pred, average=None).execute()
array([0.82..., 0.86..., 0.94..., 0.85... , 0.94...])
>>> from sklearn.linear_model import RidgeClassifierCV
>>> clf = RidgeClassifierCV().fit(X, y)
>>> roc_auc_score(y, clf.decision_function(X), average=None).execute()
array([0.81..., 0.84... , 0.93..., 0.87..., 0.94...])
```
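In the binary case, ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that equivalence to compute the value without a MaxFrame session. The helper name `pairwise_auc` is hypothetical, and the quadratic loop is meant for intuition, not for large inputs.

```python
def pairwise_auc(y_true, y_score, pos_label=1):
    """Binary ROC AUC as the fraction of correctly ordered (pos, neg) pairs."""
    pos = [s for t, s in zip(y_true, y_score) if t == pos_label]
    neg = [s for t, s in zip(y_true, y_score) if t != pos_label]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0  # positive ranked above negative
            elif sp == sn:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


auc = pairwise_auc([1, 1, 2, 2], [0.1, 0.4, 0.35, 0.8], pos_label=2)
```

Here three of the four positive/negative pairs are ordered correctly, so the sketch yields 0.75.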
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.metrics.roc_curve.md
# maxframe.learn.metrics.roc_curve
### maxframe.learn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True, execute=False, session=None, run_kwargs=None)
Compute Receiver operating characteristic (ROC)
Note: this implementation is restricted to the binary classification task.
Read more in the User Guide.
* **Parameters:**
* **y_true** (*tensor* *,* *shape =* *[**n_samples* *]*) – True binary labels. If labels are not either {-1, 1} or {0, 1}, then
pos_label should be explicitly given.
* **y_score** (*tensor* *,* *shape =* *[**n_samples* *]*) – Target scores, can either be probability estimates of the positive
class, confidence values, or non-thresholded measure of decisions
(as returned by “decision_function” on some classifiers).
* **pos_label** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *default=None*) – The label of the positive class.
When `pos_label=None`, if y_true is in {-1, 1} or {0, 1},
`pos_label` is set to 1, otherwise an error will be raised.
* **sample_weight** (*array-like* *of* *shape* *(**n_samples* *,* *)* *,* *default=None*) – Sample weights.
* **drop_intermediate** (*boolean* *,* *optional* *(**default=True* *)*) – Whether to drop some suboptimal thresholds which would not appear
on a plotted ROC curve. This is useful in order to create lighter
ROC curves.
* **Returns:**
* **fpr** (*tensor, shape = [>2]*) – Increasing false positive rates such that element i is the false
positive rate of predictions with score >= thresholds[i].
* **tpr** (*tensor, shape = [>2]*) – Increasing true positive rates such that element i is the true
positive rate of predictions with score >= thresholds[i].
* **thresholds** (*tensor, shape = [n_thresholds]*) – Decreasing thresholds on the decision function used to compute
fpr and tpr. thresholds[0] represents no instances being predicted
and is arbitrarily set to max(y_score) + 1.
#### SEE ALSO
[`roc_auc_score`](maxframe.learn.metrics.roc_auc_score.md#maxframe.learn.metrics.roc_auc_score)
: Compute the area under the ROC curve
### Notes
Since the thresholds are sorted from low to high values, they
are reversed upon returning them to ensure they correspond to both `fpr`
and `tpr`, which are sorted in reversed order during their calculation.
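The relationship between `thresholds`, `fpr` and `tpr` can be illustrated with a direct, unoptimized computation. This sketch is not how MaxFrame computes the curve: `roc_points` is a hypothetical helper, it keeps every distinct score as a threshold (no `drop_intermediate` pruning), and it assumes both classes are present.

```python
def roc_points(y_true, y_score, pos_label):
    """Return (fpr, tpr, thresholds) with thresholds in decreasing order."""
    n_pos = sum(1 for t in y_true if t == pos_label)
    n_neg = len(y_true) - n_pos
    distinct = sorted(set(y_score), reverse=True)
    # The first threshold sits above every score, so nothing is predicted positive.
    thresholds = [distinct[0] + 1] + distinct
    fpr, tpr = [], []
    for thr in thresholds:
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= thr and t == pos_label)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= thr and t != pos_label)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr, thresholds


fpr, tpr, thresholds = roc_points([1, 1, 2, 2], [0.1, 0.4, 0.35, 0.8], pos_label=2)
```

Each step down in threshold admits more samples as positive, so both rates are non-decreasing along the returned lists.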
### References
* <a id='id1'>**[1]**</a> [Wikipedia entry for the Receiver operating characteristic](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)
* <a id='id2'>**[2]**</a> Fawcett T. An introduction to ROC analysis[J]. Pattern Recognition Letters, 2006, 27(8):861-874.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> from maxframe.learn import metrics
>>> y = mt.array([1, 1, 2, 2])
>>> scores = mt.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([0. , 0. , 0.5, 0.5, 1. ])
>>> tpr
array([0. , 0.5, 0.5, 1. , 1. ])
>>> thresholds
array([1.8 , 0.8 , 0.4 , 0.35, 0.1 ])
```
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.model_selection.KFold.md
# maxframe.learn.model_selection.KFold
### *class* maxframe.learn.model_selection.KFold(n_splits=5, \*, shuffle=False, random_state=None)
K-Folds cross-validator
Provides train/test indices to split data in train/test sets. Split
dataset into k consecutive folds (without shuffling by default).
Each fold is then used once as a validation while the k - 1 remaining
folds form the training set.
* **Parameters:**
* **n_splits** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=5*) –
Number of folds. Must be at least 2.
#### Versionchanged
Changed in version 0.22: `n_splits` default value changed from 3 to 5.
* **shuffle** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=False*) – Whether to shuffle the data before splitting into batches.
Note that the samples within each split will not be shuffled.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState instance* *or* *None* *,* *default=None*) – When shuffle is True, random_state affects the ordering of the
indices, which controls the randomness of each fold. Otherwise, this
parameter has no effect.
Pass an int for reproducible output across multiple function calls.
See Glossary.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> from maxframe.learn.model_selection import KFold
>>> X = mt.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = mt.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
... print("TRAIN:", train_index, "TEST:", test_index)
... X_train, X_test = X[train_index], X[test_index]
... y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
```
### Notes
The first `n_samples % n_splits` folds have size
`n_samples // n_splits + 1`, other folds have size
`n_samples // n_splits`, where `n_samples` is the number of samples.
Randomized CV splitters may return different results for each call of
split. You can make the results identical by setting random_state
to an integer.
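The fold-size rule can be checked with a short pure-Python sketch of the unshuffled case. `kfold_indices` is a hypothetical helper that works on plain integers and lists, not on MaxFrame tensors.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for consecutive, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    # The first `n_samples % n_splits` folds get one extra sample.
    sizes = [base + 1 if i < extra else base for i in range(n_splits)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(kfold_indices(4, 2))
sizes = [len(test) for _, test in kfold_indices(10, 3)]
```

With 10 samples and 3 splits, the first fold holds 4 samples and the other two hold 3 each.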
#### SEE ALSO
`StratifiedKFold`
: Takes class information into account to avoid building folds with imbalanced class distributions (for binary or multiclass classification tasks).
`GroupKFold`
: K-fold iterator variant with non-overlapping groups.
`RepeatedKFold`
: Repeats K-Fold n times.
#### \_\_init_\_(n_splits=5, \*, shuffle=False, random_state=None)
### Methods
| [`__init__`](#maxframe.learn.model_selection.KFold.__init__)([n_splits, shuffle, random_state]) | |
|---------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
| `get_n_splits`([X, y, groups]) | Returns the number of splitting iterations in the cross-validator |
| `split`(X[, y, groups]) | Generate indices to split data into training and test set. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.model_selection.train_test_split.md
# maxframe.learn.model_selection.train_test_split
### maxframe.learn.model_selection.train_test_split(\*arrays, \*\*options)
Split arrays or matrices into random train and test subsets
* **Parameters:**
* **\*arrays** (*sequence* *of* *indexables with same length / shape* *[**0* *]*) – Allowed inputs are lists, numpy arrays, scipy-sparse
matrices or pandas dataframes.
* **test_size** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* [*int*](https://docs.python.org/3/library/functions.html#int) *or* *None* *,* *optional* *(**default=None* *)*) – If float, should be between 0.0 and 1.0 and represent the proportion
of the dataset to include in the test split. If int, represents the
absolute number of test samples. If None, the value is set to the
complement of the train size. If `train_size` is also None, it will
be set to 0.25.
* **train_size** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* [*int*](https://docs.python.org/3/library/functions.html#int) *, or* *None* *,* *(**default=None* *)*) – If float, should be between 0.0 and 1.0 and represent the
proportion of the dataset to include in the train split. If
int, represents the absolute number of train samples. If None,
the value is automatically set to the complement of the test size.
* **random_state** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *RandomState instance* *or* *None* *,* *optional* *(**default=None* *)*) – If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by np.random.
* **shuffle** (*boolean* *,* *optional* *(**default=True* *)*) – Whether or not to shuffle the data before splitting. If shuffle=False
then stratify must be None. CURRENTLY only shuffle=False is supported.
* **stratify** (*array-like* *or* *None* *(**default=None* *)*) – If not None, data is split in a stratified fashion, using this as
the class labels.
* **Returns:**
**splitting** – List containing train-test split of inputs.
* **Return type:**
[list](https://docs.python.org/3/library/stdtypes.html#list), length=2 \* len(arrays)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> from maxframe.learn.model_selection import train_test_split
>>> X, y = mt.arange(10).reshape((5, 2)), range(5)
>>> X.execute()
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
```
```pycon
>>> X_train, X_test, y_train, y_test = train_test_split(
... X, y, test_size=0.33, random_state=42)
...
>>> X_train.execute()
array([[8, 9],
[0, 1],
[4, 5]])
>>> y_train.execute()
array([4, 0, 2])
>>> X_test.execute()
array([[2, 3],
[6, 7]])
>>> y_test.execute()
array([1, 3])
```
```pycon
>>> train_test_split(y, shuffle=False)
[array([0, 1, 2]), array([3, 4])]
```
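How `test_size` and `train_size` translate into sample counts can be sketched as follows. The rounding used here (ceiling for a fractional test size, floor for a fractional train size) follows scikit-learn's convention. Assuming MaxFrame does the same is a guess that happens to match the split sizes in the examples above. `resolve_split_sizes` is a hypothetical helper, not part of the MaxFrame API.

```python
import math


def resolve_split_sizes(n_samples, test_size=None, train_size=None):
    """Return (n_train, n_test) for the given size arguments."""
    if test_size is None and train_size is None:
        test_size = 0.25  # documented default when both are None

    n_test = n_train = None
    if isinstance(test_size, float):
        n_test = math.ceil(test_size * n_samples)    # assumed rounding
    elif test_size is not None:
        n_test = int(test_size)
    if isinstance(train_size, float):
        n_train = math.floor(train_size * n_samples)  # assumed rounding
    elif train_size is not None:
        n_train = int(train_size)

    # Whichever side was left unspecified becomes the complement.
    if n_train is None:
        n_train = n_samples - n_test
    if n_test is None:
        n_test = n_samples - n_train
    return n_train, n_test


sizes = resolve_split_sizes(5, test_size=0.33)
```

For five samples, both `test_size=0.33` and the default of 0.25 give three training samples and two test samples under this rounding.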
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.preprocessing.LabelBinarizer.md
# maxframe.learn.preprocessing.LabelBinarizer
### *class* maxframe.learn.preprocessing.LabelBinarizer(\*, neg_label=0, pos_label=1, sparse_output=False)
Binarize labels in a one-vs-all fashion.
Several regression and binary classification algorithms are
available in scikit-learn. A simple way to extend these algorithms
to the multi-class classification case is to use the so-called
one-vs-all scheme.
At learning time, this simply consists in learning one regressor
or binary classifier per class. In doing so, one needs to convert
multi-class labels to binary labels (belong or does not belong
to the class). LabelBinarizer makes this process easy with the
transform method.
At prediction time, one assigns the class for which the corresponding
model gave the greatest confidence. LabelBinarizer makes this easy
with the inverse_transform method.
Read more in the User Guide.
* **Parameters:**
* **neg_label** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=0*) – Value with which negative labels must be encoded.
* **pos_label** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=1*) – Value with which positive labels must be encoded.
* **sparse_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=False*) – True if the returned array from transform is desired to be in sparse
CSR format.
#### classes_
Holds the label for each class.
* **Type:**
ndarray of shape (n_classes,)
#### y_type_
Represents the type of the target data as evaluated by
utils.multiclass.type_of_target. Possible types are ‘continuous’,
‘continuous-multioutput’, ‘binary’, ‘multiclass’,
‘multiclass-multioutput’, ‘multilabel-indicator’, and ‘unknown’.
* **Type:**
[str](https://docs.python.org/3/library/stdtypes.html#str)
#### sparse_input_
True if the input data to transform is given as a sparse matrix, False
otherwise.
* **Type:**
[bool](https://docs.python.org/3/library/functions.html#bool)
### Examples
```pycon
>>> from maxframe.learn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer()
>>> lb.classes_.execute()
array([1, 2, 4, 6])
>>> lb.transform([1, 6]).execute()
array([[1, 0, 0, 0],
[0, 0, 0, 1]])
```
Binary targets transform to a column vector
```pycon
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes']).execute()
array([[1],
[0],
[0],
[1]])
```
Passing a 2D matrix for multilabel classification
```pycon
>>> import numpy as np
>>> lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
LabelBinarizer()
>>> lb.classes_.execute()
array([0, 1, 2])
>>> lb.transform([0, 1, 2, 1]).execute()
array([[1, 0, 0],
[0, 1, 0],
[0, 0, 1],
[0, 1, 0]])
```
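The one-vs-all encoding and its inverse can be sketched in plain Python to show what `transform` and `inverse_transform` do conceptually. The helpers `binarize` and `inverse_binarize` are hypothetical, operate on ordinary lists, and cover only 1D label input. The inverse handles only the multiclass case, by picking the column with the greatest score.

```python
def binarize(labels, classes, neg_label=0, pos_label=1):
    """Encode each label as a one-vs-all row over the sorted classes."""
    classes = sorted(set(classes))
    if len(classes) == 2:
        # Binary targets collapse to a single column for the greater class.
        return [[pos_label if y == classes[1] else neg_label] for y in labels]
    return [[pos_label if y == c else neg_label for c in classes] for y in labels]


def inverse_binarize(scores, classes):
    """Map each row of per-class scores back to the highest-scoring class."""
    classes = sorted(set(classes))
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in scores]


encoded = binarize([1, 6], classes=[1, 2, 6, 4, 2])
decoded = inverse_binarize([[0.1, 0.7, 0.2, 0.0]], classes=[1, 2, 6, 4, 2])
```

Because the inverse takes the arg-max of each row, it also works on real-valued confidence scores from per-class models, which is the prediction-time use described above.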
#### SEE ALSO
[`label_binarize`](maxframe.learn.preprocessing.label_binarize.md#maxframe.learn.preprocessing.label_binarize)
: Function to perform the transform operation of LabelBinarizer with fixed classes.
`OneHotEncoder`
: Encode categorical features using a one-hot aka one-of-K scheme.
#### \_\_init_\_(\*, neg_label=0, pos_label=1, sparse_output=False)
### Methods
| [`__init__`](#maxframe.learn.preprocessing.LabelBinarizer.__init__)(\*[, neg_label, pos_label, ...]) | |
|--------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
| `execute`([session, run_kwargs, extra_tileables]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(y[, check, execute, session, run_kwargs]) | Fit label binarizer. |
| `fit_transform`(y[, check, execute, session, ...]) | Fit label binarizer and transform multi-class labels to binary labels. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_params`([deep]) | Get parameters for this estimator. |
| `inverse_transform`(Y[, threshold]) | Transform binary labels back to multi-class labels. |
| `set_fit_request`(\*[, check, execute, ...]) | Request metadata passed to the `fit` method. |
| `set_inverse_transform_request`(\*[, threshold]) | Request metadata passed to the `inverse_transform` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_transform_request`(\*[, check, execute, ...]) | Request metadata passed to the `transform` method. |
| `transform`(y[, check, execute, session, ...]) | Transform multi-class labels to binary labels. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.preprocessing.LabelEncoder.md
# maxframe.learn.preprocessing.LabelEncoder
### *class* maxframe.learn.preprocessing.LabelEncoder
Encode target labels with value between 0 and n_classes-1.
This transformer should be used to encode target values, *i.e.* y, and
not the input X.
Read more in the User Guide.
#### classes_
Holds the label for each class.
* **Type:**
ndarray of shape (n_classes,)
#### SEE ALSO
`OrdinalEncoder`
: Encode categorical features using an ordinal encoding scheme.
`OneHotEncoder`
: Encode categorical features as a one-hot numeric array.
### Examples
LabelEncoder can be used to normalize labels.
```pycon
>>> from maxframe.learn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6]).execute()
LabelEncoder()
>>> le.classes_.to_numpy()
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6]).to_numpy()
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2]).to_numpy()
array([1, 1, 2, 6])
```
It can also be used to transform non-numerical labels (as long as they are
hashable and comparable) to numerical labels.
```pycon
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"]).execute()
LabelEncoder()
>>> list(le.classes_.to_numpy())
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"]).to_numpy()
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]).to_numpy())
['tokyo', 'tokyo', 'paris']
```
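The encoding is easy to reason about without a cluster. The following is a minimal pure-Python sketch of the idea (sorted unique labels mapped to their positions), not MaxFrame's implementation. The helper names `fit_classes`, `encode`, and `decode` are hypothetical and exist only for illustration.

```python
from bisect import bisect_left

def fit_classes(labels):
    """Return the sorted unique labels (the role played by ``classes_``)."""
    return sorted(set(labels))

def encode(classes, labels):
    """Map each label to its index in the sorted class list."""
    codes = []
    for label in labels:
        i = bisect_left(classes, label)
        if i == len(classes) or classes[i] != label:
            raise ValueError(f"unseen label: {label!r}")
        codes.append(i)
    return codes

def decode(classes, codes):
    """Map integer codes back to the original labels."""
    return [classes[c] for c in codes]

classes = fit_classes(["paris", "paris", "tokyo", "amsterdam"])
codes = encode(classes, ["tokyo", "tokyo", "paris"])
print(classes)                 # ['amsterdam', 'paris', 'tokyo']
print(codes)                   # [2, 2, 1]
print(decode(classes, codes))  # ['tokyo', 'tokyo', 'paris']
```

Because the class list is sorted, the labels only need to be hashable and comparable, which is the same requirement stated above.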
#### \_\_init_\_()
### Methods
| [`__init__`](#maxframe.learn.preprocessing.LabelEncoder.__init__)() | |
|-----------------------------------------------------------------------|------------------------------------------------------------|
| `execute`([session, run_kwargs, extra_tileables]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(y[, execute, session, run_kwargs]) | Fit label encoder. |
| `fit_transform`(y[, execute, session, run_kwargs]) | Fit label encoder and return encoded labels. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_params`([deep]) | Get parameters for this estimator. |
| `inverse_transform`(y[, execute, session, ...]) | Transform labels back to original encoding. |
| `set_fit_request`(\*[, execute, run_kwargs, ...]) | Request metadata passed to the `fit` method. |
| `set_inverse_transform_request`(\*[, execute, ...]) | Request metadata passed to the `inverse_transform` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_transform_request`(\*[, execute, ...]) | Request metadata passed to the `transform` method. |
| `transform`(y[, execute, session, run_kwargs]) | Transform labels to normalized encoding. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.preprocessing.MinMaxScaler.md
# maxframe.learn.preprocessing.MinMaxScaler
### *class* maxframe.learn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True, clip=False, validate=True)
Transform features by scaling each feature to a given range.
This estimator scales and translates each feature individually such
that it is in the given range on the training set, e.g. between
zero and one.
The transformation is given by:
```default
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
```
where min, max = feature_range.
This transformation is often used as an alternative to zero mean,
unit variance scaling.
Read more in the User Guide.
* **Parameters:**
* **feature_range** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *(**min* *,* *max* *)* *,* *default=* *(**0* *,* *1* *)*) – Desired range of transformed data.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – Set to False to perform in-place scaling and avoid a
copy (if the input is already a numpy array).
* **clip** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=False*) – Set to True to clip transformed values of held-out data to
provided feature range.
#### min_
Per feature adjustment for minimum. Equivalent to
`min - X.min(axis=0) * self.scale_`
* **Type:**
Tensor of shape (n_features,)
#### scale_
Per feature relative scaling of the data. Equivalent to
`(max - min) / (X.max(axis=0) - X.min(axis=0))`
* **Type:**
Tensor of shape (n_features,)
#### data_min_
Per feature minimum seen in the data
* **Type:**
ndarray of shape (n_features,)
#### data_max_
Per feature maximum seen in the data
* **Type:**
ndarray of shape (n_features,)
#### data_range_
Per feature range `(data_max_ - data_min_)` seen in the data
* **Type:**
ndarray of shape (n_features,)
#### n_samples_seen_
The number of samples processed by the estimator.
It will be reset on new calls to fit, but increments across
`partial_fit` calls.
* **Type:**
[int](https://docs.python.org/3/library/functions.html#int)
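The relationship between `fit`, `partial_fit`, and `n_samples_seen_` can be pictured with a small pure-Python sketch. It illustrates online min/max tracking under the assumption that each batch is a list of equal-length rows. The class `RunningMinMax` is hypothetical and is not part of MaxFrame.

```python
class RunningMinMax:
    """Track per-feature min/max across batches (illustrative only)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.data_min = None
        self.data_max = None
        self.n_samples_seen = 0

    def partial_fit(self, batch):
        # Accumulate statistics without forgetting earlier batches.
        for row in batch:
            if self.data_min is None:
                self.data_min = list(row)
                self.data_max = list(row)
            else:
                self.data_min = [min(a, b) for a, b in zip(self.data_min, row)]
                self.data_max = [max(a, b) for a, b in zip(self.data_max, row)]
            self.n_samples_seen += 1
        return self

    def fit(self, batch):
        # A fresh fit discards everything seen before.
        self.reset()
        return self.partial_fit(batch)

tracker = RunningMinMax()
tracker.partial_fit([[-1, 2], [-0.5, 6]])
tracker.partial_fit([[0, 10], [1, 18]])
print(tracker.data_min, tracker.data_max, tracker.n_samples_seen)
# [-1, 2] [1, 18] 4

tracker.fit([[0, 10]])
print(tracker.n_samples_seen)  # 1
```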
### Examples
```pycon
>>> from maxframe.learn.preprocessing import MinMaxScaler
>>> data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
>>> scaler = MinMaxScaler()
>>> print(scaler.fit(data))
MinMaxScaler()
>>> print(scaler.data_max_)
[ 1. 18.]
>>> print(scaler.transform(data))
[[0. 0. ]
[0.25 0.25]
[0.5 0.5 ]
[1. 1. ]]
>>> print(scaler.transform([[2, 2]]))
[[1.5 0. ]]
```
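To make the formula concrete, here is a minimal pure-Python sketch that reproduces the numbers in the example above. It is an illustration of the arithmetic only, not MaxFrame's implementation. The helpers `fit_min_max` and `min_max_transform` are hypothetical, and the sketch assumes no feature is constant (so the denominator is never zero).

```python
def fit_min_max(rows):
    """Per-feature minimum and maximum over the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def min_max_transform(rows, data_min, data_max, feature_range=(0, 1), clip=False):
    """Apply X_std * (max - min) + min feature by feature."""
    lo, hi = map(float, feature_range)
    out = []
    for row in rows:
        scaled = []
        for x, x_min, x_max in zip(row, data_min, data_max):
            x_std = (x - x_min) / (x_max - x_min)  # assumes x_max != x_min
            value = x_std * (hi - lo) + lo
            if clip:
                # Held-out values outside the training range are clamped.
                value = min(max(value, lo), hi)
            scaled.append(value)
        out.append(scaled)
    return out

data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
data_min, data_max = fit_min_max(data)
print(min_max_transform(data, data_min, data_max))
# [[0.0, 0.0], [0.25, 0.25], [0.5, 0.5], [1.0, 1.0]]
print(min_max_transform([[2, 2]], data_min, data_max))             # [[1.5, 0.0]]
print(min_max_transform([[2, 2]], data_min, data_max, clip=True))  # [[1.0, 0.0]]
```

The last two calls show why `clip` matters: a held-out value larger than the training maximum maps outside `feature_range` unless it is clipped.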
#### SEE ALSO
[`minmax_scale`](maxframe.learn.preprocessing.minmax_scale.md#maxframe.learn.preprocessing.minmax_scale)
: Equivalent function without the estimator API.
### Notes
NaNs are treated as missing values: disregarded in fit, and maintained in
transform.
For a comparison of the different scalers, transformers, and normalizers,
see examples/preprocessing/plot_all_scaling.py.
#### \_\_init_\_(feature_range=(0, 1), copy=True, clip=False, validate=True)
### Methods
| [`__init__`](#maxframe.learn.preprocessing.MinMaxScaler.__init__)([feature_range, copy, clip, validate]) | |
|------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| `execute`([session, run_kwargs, extra_tileables]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(X[, y, execute, session, run_kwargs]) | Compute the minimum and maximum to be used for later scaling. |
| `fit_transform`(X[, y]) | Fit to data, then transform it. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_params`([deep]) | Get parameters for this estimator. |
| `inverse_transform`(X[, execute, session, ...]) | Undo the scaling of X according to feature_range. |
| `partial_fit`(X[, y, execute, session, run_kwargs]) | Online computation of min and max on X for later scaling. |
| `set_fit_request`(\*[, execute, run_kwargs, ...]) | Request metadata passed to the `fit` method. |
| `set_inverse_transform_request`(\*[, execute, ...]) | Request metadata passed to the `inverse_transform` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_partial_fit_request`(\*[, execute, ...]) | Request metadata passed to the `partial_fit` method. |
| `set_transform_request`(\*[, execute, ...]) | Request metadata passed to the `transform` method. |
| `transform`(X[, execute, session, run_kwargs]) | Scale features of X according to feature_range. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.preprocessing.StandardScaler.md
# maxframe.learn.preprocessing.StandardScaler
### *class* maxframe.learn.preprocessing.StandardScaler(\*, copy=True, with_mean=True, with_std=True, validate=True)
Standardize features by removing the mean and scaling to unit variance.
The standard score of a sample x is calculated as:
```text
z = (x - u) / s
```
where u is the mean of the training samples or zero if with_mean=False,
and s is the standard deviation of the training samples or one if
with_std=False.
Centering and scaling happen independently on each feature by computing
the relevant statistics on the samples in the training set. Mean and
standard deviation are then stored to be used on later data using
`transform()`.
Standardization of a dataset is a common requirement for many
machine learning estimators: they might behave badly if the
individual features do not more or less look like standard normally
distributed data (e.g. Gaussian with 0 mean and unit variance).
For instance many elements used in the objective function of
a learning algorithm (such as the RBF kernel of Support Vector
Machines or the L1 and L2 regularizers of linear models) assume that
all features are centered around 0 and have variance in the same
order. If a feature has a variance that is orders of magnitude larger
than others, it might dominate the objective function and make the
estimator unable to learn from other features correctly as expected.
StandardScaler is sensitive to outliers, and the features may scale
differently from each other in the presence of outliers. For an example
visualization, refer to Compare StandardScaler with other scalers.
This scaler can also be applied to sparse CSR or CSC matrices by passing
with_mean=False to avoid breaking the sparsity structure of the data.
Read more in the User Guide.
* **Parameters:**
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – If False, try to avoid a copy and do inplace scaling instead.
This is not guaranteed to always work inplace; e.g. if the data is
not a NumPy array or scipy.sparse CSR matrix, a copy may still be
returned.
* **with_mean** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – If True, center the data before scaling.
This does not work (and will raise an exception) when attempted on
sparse matrices, because centering them entails building a dense
matrix which in common use cases is likely to be too large to fit in
memory.
* **with_std** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – If True, scale the data to unit variance (or equivalently,
unit standard deviation).
#### scale_
Per feature relative scaling of the data to achieve zero mean and unit
variance. Generally this is calculated using np.sqrt(var_). If a
variance is zero, we can’t achieve unit variance, and the data is left
as-is, giving a scaling factor of 1. scale_ is equal to None
when with_std=False.
* **Type:**
ndarray of shape (n_features,) or None
#### mean_
The mean value for each feature in the training set.
Equal to `None` when `with_mean=False` and `with_std=False`.
* **Type:**
ndarray of shape (n_features,) or None
#### var_
The variance for each feature in the training set. Used to compute
scale_. Equal to `None` when `with_mean=False` and
`with_std=False`.
* **Type:**
ndarray of shape (n_features,) or None
#### n_features_in_
Number of features seen during fit.
* **Type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### feature_names_in_
Names of features seen during fit. Defined only when X
has feature names that are all strings.
* **Type:**
ndarray of shape (n_features_in_,)
#### n_samples_seen_
The number of samples processed by the estimator for each feature.
If there are no missing samples, `n_samples_seen_` will be an
integer, otherwise it will be an array of dtype int. If
sample_weights are used it will be a float (if no missing data)
or an array of dtype float that sums the weights seen so far.
Will be reset on new calls to fit, but increments across
`partial_fit` calls.
* **Type:**
[int](https://docs.python.org/3/library/functions.html#int) or ndarray of shape (n_features,)
#### SEE ALSO
[`scale`](maxframe.learn.preprocessing.scale.md#maxframe.learn.preprocessing.scale)
: Equivalent function without the estimator API.
`PCA`
: Further removes the linear correlation across features with ‘whiten=True’.
### Notes
NaNs are treated as missing values: disregarded in fit, and maintained in
transform.
We use a biased estimator for the standard deviation, equivalent to
numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to
affect model performance.
### Examples
```pycon
>>> from maxframe.learn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler()
>>> print(scaler.mean_.execute())
[0.5 0.5]
>>> print(scaler.transform(data).execute())
[[-1. -1.]
[-1. -1.]
[ 1. 1.]
[ 1. 1.]]
>>> print(scaler.transform([[2, 2]]).execute())
[[3. 3.]]
```
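The same numbers can be derived by hand. The sketch below is a pure-Python illustration of the standard score `z = (x - u) / s` using the population standard deviation (`ddof=0`), as described in the Notes. The helpers `fit_standard` and `standardize` are hypothetical names and are not part of MaxFrame.

```python
from statistics import fmean, pstdev

def fit_standard(rows):
    """Per-feature mean and population standard deviation (ddof=0)."""
    columns = list(zip(*rows))
    means = [fmean(c) for c in columns]
    # Zero-variance features keep a scale of 1, so the data is left as-is.
    scales = [pstdev(c) or 1.0 for c in columns]
    return means, scales

def standardize(rows, means, scales):
    """Apply z = (x - u) / s to every feature of every row."""
    return [[(x - u) / s for x, u, s in zip(row, means, scales)] for row in rows]

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
means, scales = fit_standard(data)
print(means)                                 # [0.5, 0.5]
print(standardize(data, means, scales))
# [[-1.0, -1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, 1.0]]
print(standardize([[2, 2]], means, scales))  # [[3.0, 3.0]]
```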
#### \_\_init_\_(\*, copy=True, with_mean=True, with_std=True, validate=True)
### Methods
| [`__init__`](#maxframe.learn.preprocessing.StandardScaler.__init__)(\*[, copy, with_mean, with_std, ...]) | |
|-------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| `execute`([session, run_kwargs, extra_tileables]) | |
| `fetch`([session, run_kwargs]) | |
| `fit`(X[, y, sample_weight, execute, session, ...]) | Compute the mean and std to be used for later scaling. |
| `fit_transform`(X[, y]) | Fit to data, then transform it. |
| `get_metadata_routing`() | Get metadata routing of this object. |
| `get_params`([deep]) | Get parameters for this estimator. |
| `inverse_transform`(X[, copy, execute, ...]) | Scale back the data to the original representation. |
| `partial_fit`(X[, y, sample_weight, execute, ...]) | Online computation of mean and std on X for later scaling. |
| `set_fit_request`(\*[, execute, run_kwargs, ...]) | Request metadata passed to the `fit` method. |
| `set_inverse_transform_request`(\*[, copy, ...]) | Request metadata passed to the `inverse_transform` method. |
| `set_params`(\*\*params) | Set the parameters of this estimator. |
| `set_partial_fit_request`(\*[, execute, ...]) | Request metadata passed to the `partial_fit` method. |
| `set_transform_request`(\*[, copy, execute, ...]) | Request metadata passed to the `transform` method. |
| `transform`(X[, copy, execute, session, ...]) | Perform standardization by centering and scaling. |
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.preprocessing.label_binarize.md
# maxframe.learn.preprocessing.label_binarize
### maxframe.learn.preprocessing.label_binarize(y, \*, classes, neg_label=0, pos_label=1, sparse_output=False, execute=False, session=None, run_kwargs=None)
Binarize labels in a one-vs-all fashion.
Several regression and binary classification algorithms are
available in scikit-learn. A simple way to extend these algorithms
to the multi-class classification case is to use the so-called
one-vs-all scheme.
This function makes it possible to compute this transformation for a
fixed set of class labels known ahead of time.
* **Parameters:**
* **y** (*array-like*) – Sequence of integer labels or multilabel data to encode.
* **classes** (*array-like* *of* *shape* *(**n_classes* *,* *)*) – Uniquely holds the label for each class.
* **neg_label** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=0*) – Value with which negative labels must be encoded.
* **pos_label** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=1*) – Value with which positive labels must be encoded.
* **sparse_output** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=False*) – Set to true if output binary array is desired in CSR sparse format.
* **Returns:**
**Y** – Shape will be (n_samples, 1) for binary problems.
* **Return type:**
{tensor, sparse tensor} of shape (n_samples, n_classes)
### Examples
```pycon
>>> from maxframe.learn.preprocessing import label_binarize
>>> label_binarize([1, 6], classes=[1, 2, 4, 6])
array([[1, 0, 0, 0],
[0, 0, 0, 1]])
```
The class ordering is preserved:
```pycon
>>> label_binarize([1, 6], classes=[1, 6, 4, 2])
array([[1, 0, 0, 0],
[0, 1, 0, 0]])
```
Binary targets transform to a column vector
```pycon
>>> label_binarize(['yes', 'no', 'no', 'yes'], classes=['no', 'yes'])
array([[1],
[0],
[0],
[1]])
```
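The three results above follow from a simple rule, sketched here in pure Python. This is an illustration of one-vs-all encoding against a fixed class list, not MaxFrame's implementation. The function `binarize_labels` is hypothetical, and the sketch assumes every label in `y` appears in `classes` and treats a two-element class list as the binary case.

```python
def binarize_labels(y, classes, neg_label=0, pos_label=1):
    """One-vs-all encoding of ``y`` against a fixed, ordered class list."""
    index = {label: i for i, label in enumerate(classes)}
    if len(classes) == 2:
        # Binary targets collapse to one column: positive means classes[1].
        return [[pos_label if index[label] == 1 else neg_label] for label in y]
    rows = []
    for label in y:
        row = [neg_label] * len(classes)
        row[index[label]] = pos_label  # column order follows ``classes``
        rows.append(row)
    return rows

print(binarize_labels([1, 6], classes=[1, 2, 4, 6]))
# [[1, 0, 0, 0], [0, 0, 0, 1]]
print(binarize_labels([1, 6], classes=[1, 6, 4, 2]))
# [[1, 0, 0, 0], [0, 1, 0, 0]]
print(binarize_labels(["yes", "no", "no", "yes"], classes=["no", "yes"]))
# [[1], [0], [0], [1]]
```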
#### SEE ALSO
[`LabelBinarizer`](maxframe.learn.preprocessing.LabelBinarizer.md#maxframe.learn.preprocessing.LabelBinarizer)
: Class used to wrap the functionality of label_binarize and allow for fitting to classes independently of the transform operation.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.preprocessing.minmax_scale.md
# maxframe.learn.preprocessing.minmax_scale
### maxframe.learn.preprocessing.minmax_scale(X, feature_range=(0, 1), \*, axis=0, copy=True, validate=True, execute=False, session=None, run_kwargs=None)
Transform features by scaling each feature to a given range.
This estimator scales and translates each feature individually such
that it is in the given range on the training set, e.g. between
zero and one.
The transformation is given by (when `axis=0`):
```default
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
```
where min, max = feature_range.
The transformation is calculated as (when `axis=0`):
```default
X_scaled = scale * X + min - X.min(axis=0) * scale
where scale = (max - min) / (X.max(axis=0) - X.min(axis=0))
```
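The two formulations are algebraically identical: expanding `X_std * (max - min) + min` gives `scale * X + min - X.min(axis=0) * scale`. A short pure-Python check on a single feature column, with hypothetical helper names and an arbitrary `feature_range` of `(-1, 1)`, might look like this:

```python
import math

def scaled_two_step(x, x_min, x_max, lo, hi):
    """First form: standardize to [0, 1], then stretch to [lo, hi]."""
    x_std = (x - x_min) / (x_max - x_min)
    return x_std * (hi - lo) + lo

def scaled_affine(x, x_min, x_max, lo, hi):
    """Second form: a single multiply-add with a precomputed scale."""
    scale = (hi - lo) / (x_max - x_min)
    return scale * x + lo - x_min * scale

column = [2.0, 6.0, 10.0, 18.0]
x_min, x_max = min(column), max(column)
two_step = [scaled_two_step(x, x_min, x_max, -1.0, 1.0) for x in column]
affine = [scaled_affine(x, x_min, x_max, -1.0, 1.0) for x in column]

print(two_step)  # [-1.0, -0.5, 0.0, 1.0]
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(two_step, affine)))  # True
```

The affine form is the one that maps onto the fitted `scale_` and `min_` attributes of `MinMaxScaler`.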
This transformation is often used as an alternative to zero mean,
unit variance scaling.
Read more in the User Guide.
#### Versionadded
Added in version 0.17: *minmax_scale* function interface
to `MinMaxScaler`.
* **Parameters:**
* **X** (*array-like* *of* *shape* *(**n_samples* *,* *n_features* *)*) – The data.
* **feature_range** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *(**min* *,* *max* *)* *,* *default=* *(**0* *,* *1* *)*) – Desired range of transformed data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *default=0*) – Axis used to scale along. If 0, independently scale each feature,
otherwise (if 1) scale each sample.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – Set to False to perform inplace scaling and avoid a copy (if the input
is already a numpy array).
* **Returns:**
**X_tr** – The transformed data.
* **Return type:**
ndarray of shape (n_samples, n_features)
#### WARNING
Risk of data leak
Do not use `minmax_scale()` unless you know
what you are doing. A common mistake is to apply it to the entire data
*before* splitting into training and test sets. This will bias the
model evaluation because information would have leaked from the test
set to the training set.
In general, we recommend using
`MinMaxScaler` within a
Pipeline in order to prevent most risks of data
leaking: pipe = make_pipeline(MinMaxScaler(), LogisticRegression()).
#### SEE ALSO
[`MinMaxScaler`](maxframe.learn.preprocessing.MinMaxScaler.md#maxframe.learn.preprocessing.MinMaxScaler)
: Performs scaling to a given range using the Transformer API (e.g. as part of a preprocessing `Pipeline`).
### Notes
For a comparison of the different scalers, transformers, and normalizers,
see examples/preprocessing/plot_all_scaling.py.
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.preprocessing.normalize.md
# maxframe.learn.preprocessing.normalize
### maxframe.learn.preprocessing.normalize(X, norm='l2', axis=1, copy=True, return_norm=False)
Scale input vectors individually to unit norm (vector length).
* **Parameters:**
* **X** ( *{array-like* *,* *sparse matrix}* *,* *shape* *[**n_samples* *,* *n_features* *]*) – The data to normalize, element by element.
scipy.sparse matrices should be in CSR format to avoid an
unnecessary copy.
* **norm** ( *'l1'* *,* *'l2'* *, or* *'max'* *,* *optional* *(* *'l2' by default* *)*) – The norm to use to normalize each non-zero sample (or each non-zero
feature if axis is 0).
* **axis** (*0* *or* *1* *,* *optional* *(**1 by default* *)*) – axis used to normalize the data along. If 1, independently normalize
each sample, otherwise (if 0) normalize each feature.
* **copy** (*boolean* *,* *optional* *,* *default True*) – set to False to perform inplace row normalization and avoid a
copy (if the input is already a tensor and if axis is 1).
* **return_norm** (*boolean* *,* *default False*) – whether to return the computed norms
* **Returns:**
* **X** ( *{array-like, sparse matrix}, shape [n_samples, n_features]*) – Normalized input X.
* **norms** (*Tensor, shape [n_samples] if axis=1 else [n_features]*) – A tensor of norms along given axis for X.
When X is sparse, a NotImplementedError will be raised
for norm ‘l1’ or ‘l2’.
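As a rough illustration of what the three norms do when `axis=1`, the pure-Python sketch below scales each row to unit norm. It is not MaxFrame's implementation. The helpers `row_norm` and `normalize_rows` are hypothetical, and leaving all-zero rows unchanged is an assumption made here to avoid division by zero.

```python
import math

def row_norm(row, norm="l2"):
    """Return the l1, l2, or max norm of one sample."""
    if norm == "l1":
        return sum(abs(x) for x in row)
    if norm == "l2":
        return math.sqrt(sum(x * x for x in row))
    if norm == "max":
        return max(abs(x) for x in row)
    raise ValueError(f"unsupported norm: {norm!r}")

def normalize_rows(rows, norm="l2"):
    """Scale each row to unit norm; all-zero rows are left unchanged."""
    norms = [row_norm(r, norm) for r in rows]
    scaled = [[x / (n or 1.0) for x in r] for r, n in zip(rows, norms)]
    return scaled, norms

X = [[3.0, 4.0], [1.0, -1.0], [0.0, 0.0]]
scaled, norms = normalize_rows(X, norm="l2")
print(norms[0], scaled[0])             # 5.0 [0.6, 0.8]
print(normalize_rows(X, "l1")[0][1])   # [0.5, -0.5]
print(normalize_rows(X, "max")[0][0])  # [0.75, 1.0]
```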
#### SEE ALSO
`Normalizer`
: Performs normalization using the `Transformer` API (e.g. as part of a preprocessing `maxframe.learn.pipeline.Pipeline`).
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.preprocessing.scale.md
# maxframe.learn.preprocessing.scale
### maxframe.learn.preprocessing.scale(X, \*, axis=0, with_mean=True, with_std=True, copy=True, validate=True)
Standardize a dataset along any axis.
Center to the mean and component wise scale to unit variance.
Read more in the User Guide.
* **Parameters:**
* **X** ( *{array-like* *,* *sparse matrix}* *of* *shape* *(**n_samples* *,* *n_features* *)*) – The data to center and scale.
* **axis** ( *{0* *,* *1}* *,* *default=0*) – Axis used to compute the means and standard deviations along. If 0,
independently standardize each feature, otherwise (if 1) standardize
each sample.
* **with_mean** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – If True, center the data before scaling.
* **with_std** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – If True, scale the data to unit variance (or equivalently,
unit standard deviation).
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *default=True*) – If False, try to avoid a copy and scale in place.
This is not guaranteed to always work in place; e.g. if the data is
a numpy array with an int dtype, a copy will be returned even with
copy=False.
* **Returns:**
**X_tr** – The transformed data.
* **Return type:**
{ndarray, sparse matrix} of shape (n_samples, n_features)
#### SEE ALSO
[`StandardScaler`](maxframe.learn.preprocessing.StandardScaler.md#maxframe.learn.preprocessing.StandardScaler)
: Performs scaling to unit variance using the Transformer API (e.g. as part of a preprocessing `Pipeline`).
### Notes
This implementation will refuse to center scipy.sparse matrices
since it would make them non-sparse and would potentially crash the
program with memory exhaustion problems.
Instead the caller is expected to either set explicitly
with_mean=False (in that case, only variance scaling will be
performed on the features of the CSC matrix) or to call X.toarray()
if the materialized dense array is expected to fit in memory.
To avoid memory copy the caller should pass a CSC matrix.
NaNs are treated as missing values: disregarded to compute the statistics,
and maintained during the data transformation.
We use a biased estimator for the standard deviation, equivalent to
numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to
affect model performance.
For a comparison of the different scalers, transformers, and normalizers,
see examples/preprocessing/plot_all_scaling.py.
#### WARNING
Risk of data leak
Do not use `scale()` unless you know
what you are doing. A common mistake is to apply it to the entire data
*before* splitting into training and test sets. This will bias the
model evaluation because information would have leaked from the test
set to the training set.
In general, we recommend using
`StandardScaler` within a
Pipeline in order to prevent most risks of data
leaking: pipe = make_pipeline(StandardScaler(), LogisticRegression()).
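The leak described above can be seen on a single feature with plain Python. The sketch below is illustrative only: the helpers `fit_stats` and `apply_stats` and the toy `train`/`test` values are hypothetical. It contrasts statistics computed over all data with statistics computed on the training split and then reused for the test split.

```python
from statistics import fmean, pstdev

def fit_stats(values):
    """Mean and population standard deviation of one feature."""
    return fmean(values), pstdev(values)

def apply_stats(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]

train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Leaky: the test values influence the statistics used for training data.
leaky_mean, leaky_std = fit_stats(train + test)

# Safe: statistics come from the training split only, then are reused.
train_mean, train_std = fit_stats(train)
train_scaled = apply_stats(train, train_mean, train_std)
test_scaled = apply_stats(test, train_mean, train_std)

print(train_mean, round(leaky_mean, 3))  # 2.5 5.333
```

With the leaky statistics the training mean shifts from 2.5 to about 5.333, so the scaled training data already encodes information about the test set.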
### Examples
```pycon
>>> from maxframe.learn.preprocessing import scale
>>> X = [[-2, 1, 2], [-1, 0, 1]]
>>> scale(X, axis=0).execute() # scaling each column independently
array([[-1., 1., 1.],
[ 1., -1., -1.]])
>>> scale(X, axis=1).execute() # scaling each row independently
array([[-1.37..., 0.39..., 0.98...],
[-1.22..., 0. , 1.22...]])
```
FILE:references/maxframe-client-docs/reference/learn/generated/maxframe.learn.utils.check_consistent_length.md
# maxframe.learn.utils.check_consistent_length
### maxframe.learn.utils.check_consistent_length(\*arrays, ref_length=None)
Check that all arrays have consistent first dimensions.
Checks whether all objects in arrays have the same shape or length.
* **Parameters:**
**\*arrays** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *input objects.*) – Objects that will be checked for consistent length.
FILE:references/maxframe-client-docs/reference/learn/index.md
<a id="learn-api"></a>
# MaxFrame Learn
This document provides a comprehensive reference for MaxFrame’s machine learning capabilities, including:
- Machine learning algorithms
- Large Language Model (LLM) operations
- Related utility functions and classes
The following APIs are available through the `maxframe.learn.*` namespace.
Note that some experimental features may undergo changes in future releases.
* [Clustering](cluster.md)
* [Classes](cluster.md#classes)
* [Functions](cluster.md#functions)
* [Datasets](datasets.md)
* [Sample generator](datasets.md#sample-generator)
* [LightGBM Integration](lightgbm.md)
* [Data Structure](lightgbm.md#data-structure)
* [Training](lightgbm.md#training)
* [Callbacks](lightgbm.md#callbacks)
* [Scikit-learn API](lightgbm.md#scikit-learn-api)
* [LLM Integration](llm.md)
* [LLM Models](llm.md#module-maxframe.learn.contrib.llm.models)
* [Custom Model Configuration](llm.md#module-maxframe.learn.contrib.llm.deploy)
* [Text Generate Functions](llm.md#module-maxframe.learn.contrib.llm)
* [Metrics](metrics.md)
* [Classification Metrics](metrics.md#classification-metrics)
* [Regression Metrics](metrics.md#regression-metrics)
* [Pairwise metrics](metrics.md#pairwise-metrics)
* [Model Selection](model_selection.md)
* [Splitter Classes](model_selection.md#splitter-classes)
* [Splitter Functions](model_selection.md#splitter-functions)
* [Preprocessing](preprocessing.md)
* [Transform Classes](preprocessing.md#transform-classes)
* [Transform Functions](preprocessing.md#transform-functions)
* [Utilities](utils.md)
* [maxframe.learn.utils.check_consistent_length](generated/maxframe.learn.utils.check_consistent_length.md)
* [XGBoost Integration](xgboost.md)
* [Data Structure](xgboost.md#data-structure)
* [Training](xgboost.md#training)
* [Callbacks](xgboost.md#callbacks)
* [Scikit-learn API](xgboost.md#scikit-learn-api)
FILE:references/maxframe-client-docs/reference/learn/lightgbm.md
<a id="learn-lightgbm-ref"></a>
# LightGBM Integration
## Data Structure
| [`lightgbm.Dataset`](generated/maxframe.learn.contrib.lightgbm.Dataset.md#maxframe.learn.contrib.lightgbm.Dataset)(data[, label, reference, ...]) | |
|-----------------------------------------------------------------------------------------------------------------------------------------------------|----|
## Training
| [`lightgbm.predict`](generated/maxframe.learn.contrib.lightgbm.predict.md#maxframe.learn.contrib.lightgbm.predict)(booster, data[, raw_score, ...]) | |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| [`lightgbm.train`](generated/maxframe.learn.contrib.lightgbm.train.md#maxframe.learn.contrib.lightgbm.train)(params, train_set[, ...]) | |
## Callbacks
| [`lightgbm.callback.early_stopping`](generated/maxframe.learn.contrib.lightgbm.callback.early_stopping.md#maxframe.learn.contrib.lightgbm.callback.early_stopping)(stopping_rounds) | |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| [`lightgbm.callback.reset_parameter`](generated/maxframe.learn.contrib.lightgbm.callback.reset_parameter.md#maxframe.learn.contrib.lightgbm.callback.reset_parameter)(\*\*kwargs) | |
## Scikit-learn API
| [`lightgbm.LGBMClassifier`](generated/maxframe.learn.contrib.lightgbm.LGBMClassifier.md#maxframe.learn.contrib.lightgbm.LGBMClassifier)(\*args, \*\*kwargs) | |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| [`lightgbm.LGBMRegressor`](generated/maxframe.learn.contrib.lightgbm.LGBMRegressor.md#maxframe.learn.contrib.lightgbm.LGBMRegressor)(\*args, \*\*kwargs) | |
FILE:references/maxframe-client-docs/reference/learn/llm.md
<a id="learn-llm-ref"></a>
# LLM Integration
## LLM Models
| [`dashscope.DashScopeMultiModalLLM`](generated/maxframe.learn.contrib.llm.models.dashscope.DashScopeMultiModalLLM.md#maxframe.learn.contrib.llm.models.dashscope.DashScopeMultiModalLLM)(name, ...) | DashScope multi-modal LLM. |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------|
| [`dashscope.DashScopeTextLLM`](generated/maxframe.learn.contrib.llm.models.dashscope.DashScopeTextLLM.md#maxframe.learn.contrib.llm.models.dashscope.DashScopeTextLLM)(name, ...) | DashScope text LLM. |
| [`managed.ManagedTextLLM`](generated/maxframe.learn.contrib.llm.models.managed.ManagedTextLLM.md#maxframe.learn.contrib.llm.models.managed.ManagedTextLLM) | alias of `ManagedTextGenLLM` |
## Custom Model Configuration
| [`config.ModelDeploymentConfig`](generated/maxframe.learn.contrib.llm.deploy.config.ModelDeploymentConfig.md#maxframe.learn.contrib.llm.deploy.config.ModelDeploymentConfig)(\*args, \*\*kwargs) | Model deployment configuration for extending MaxFrame with custom models. |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
<a id="module-0"></a>
| [`framework.InferenceFrameworkEnum`](generated/maxframe.learn.contrib.llm.deploy.framework.InferenceFrameworkEnum.md#maxframe.learn.contrib.llm.deploy.framework.InferenceFrameworkEnum)(value) | |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
## Text Generate Functions
| [`multi_modal.generate`](generated/maxframe.learn.contrib.llm.multi_modal.generate.md#maxframe.learn.contrib.llm.multi_modal.generate)(data, model, ...[, params]) | Generate text with multi-modal LLM based on given data and prompt template. |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| [`text.extract`](generated/maxframe.learn.contrib.llm.text.extract.md#maxframe.learn.contrib.llm.text.extract)(series, model, schema[, ...]) | Extract structured information from text content in a series using a language model. |
| [`text.generate`](generated/maxframe.learn.contrib.llm.text.generate.md#maxframe.learn.contrib.llm.text.generate)(data, model, prompt_template) | Generate text using a text language model based on given data and prompt template. |
| [`text.translate`](generated/maxframe.learn.contrib.llm.text.translate.md#maxframe.learn.contrib.llm.text.translate)(series, model, ...[, index]) | Translate text content in a series using a language model from source language to target language. |
FILE:references/maxframe-client-docs/reference/learn/metrics.md
<a id="learn-metrics-ref"></a>
# Metrics
## Classification Metrics
| [`metrics.accuracy_score`](generated/maxframe.learn.metrics.accuracy_score.md#maxframe.learn.metrics.accuracy_score)(y_true, y_pred[, ...]) | Accuracy classification score. |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| [`metrics.auc`](generated/maxframe.learn.metrics.auc.md#maxframe.learn.metrics.auc)(x, y[, execute, session, run_kwargs]) | Compute Area Under the Curve (AUC) using the trapezoidal rule |
| [`metrics.f1_score`](generated/maxframe.learn.metrics.f1_score.md#maxframe.learn.metrics.f1_score)(y_true, y_pred, \*[, ...]) | Compute the F1 score, also known as balanced F-score or F-measure |
| [`metrics.fbeta_score`](generated/maxframe.learn.metrics.fbeta_score.md#maxframe.learn.metrics.fbeta_score)(y_true, y_pred, \*, beta) | Compute the F-beta score |
| [`metrics.log_loss`](generated/maxframe.learn.metrics.log_loss.md#maxframe.learn.metrics.log_loss)(y_true, y_pred, \*[, eps, ...]) | Log loss, aka logistic loss or cross-entropy loss. |
| [`metrics.multilabel_confusion_matrix`](generated/maxframe.learn.metrics.multilabel_confusion_matrix.md#maxframe.learn.metrics.multilabel_confusion_matrix)(y_true, ...) | Compute a confusion matrix for each class or sample. |
| [`metrics.precision_recall_fscore_support`](generated/maxframe.learn.metrics.precision_recall_fscore_support.md#maxframe.learn.metrics.precision_recall_fscore_support)(...) | Compute precision, recall, F-measure and support for each class |
| [`metrics.precision_score`](generated/maxframe.learn.metrics.precision_score.md#maxframe.learn.metrics.precision_score)(y_true, y_pred, \*[, ...]) | Compute the precision |
| [`metrics.recall_score`](generated/maxframe.learn.metrics.recall_score.md#maxframe.learn.metrics.recall_score)(y_true, y_pred, \*[, ...]) | Compute the recall |
| [`metrics.roc_auc_score`](generated/maxframe.learn.metrics.roc_auc_score.md#maxframe.learn.metrics.roc_auc_score)(y_true, y_score, \*[, ...]) | Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores. |
| [`metrics.roc_curve`](generated/maxframe.learn.metrics.roc_curve.md#maxframe.learn.metrics.roc_curve)(y_true, y_score[, ...]) | Compute Receiver operating characteristic (ROC) |
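
The precision, recall and F-beta entries above follow the standard textbook definitions. The block below is a minimal pure-Python sketch of those definitions for the binary case, not MaxFrame's implementation; the helper name `precision_recall_fbeta` and the sample labels are hypothetical.

```python
def precision_recall_fbeta(y_true, y_pred, beta=1.0, positive=1):
    """Binary precision, recall and F-beta computed from raw counts (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    # F-beta weights recall beta times as much as precision; beta=1 gives F1.
    fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical sample labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]
precision, recall, f1 = precision_recall_fbeta(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With these labels, precision, recall and F1 all come out to 0.75 and accuracy to 4/6.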
## Regression Metrics
| [`metrics.r2_score`](generated/maxframe.learn.metrics.r2_score.md#maxframe.learn.metrics.r2_score)(y_true, y_pred, \*[, ...]) | $R^2$ (coefficient of determination) regression score function. |
|---------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|
## Pairwise Metrics
| [`metrics.pairwise.cosine_distances`](generated/maxframe.learn.metrics.pairwise.cosine_distances.md#maxframe.learn.metrics.pairwise.cosine_distances)(X[, Y]) | Compute cosine distance between samples in X and Y. |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| [`metrics.pairwise.cosine_similarity`](generated/maxframe.learn.metrics.pairwise.cosine_similarity.md#maxframe.learn.metrics.pairwise.cosine_similarity)(X[, Y, ...]) | Compute cosine similarity between samples in X and Y. |
| [`metrics.pairwise.euclidean_distances`](generated/maxframe.learn.metrics.pairwise.euclidean_distances.md#maxframe.learn.metrics.pairwise.euclidean_distances)(X[, Y, ...]) | Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors. |
| [`metrics.pairwise.haversine_distances`](generated/maxframe.learn.metrics.pairwise.haversine_distances.md#maxframe.learn.metrics.pairwise.haversine_distances)(X[, Y]) | Compute the Haversine distance between samples in X and Y |
| [`metrics.pairwise.manhattan_distances`](generated/maxframe.learn.metrics.pairwise.manhattan_distances.md#maxframe.learn.metrics.pairwise.manhattan_distances)(X[, Y]) | Compute the L1 distances between the vectors in X and Y. |
| [`metrics.pairwise.rbf_kernel`](generated/maxframe.learn.metrics.pairwise.rbf_kernel.md#maxframe.learn.metrics.pairwise.rbf_kernel)(X[, Y, gamma]) | Compute the rbf (gaussian) kernel between X and Y. |
| [`metrics.pairwise_distances`](generated/maxframe.learn.metrics.pairwise_distances.md#maxframe.learn.metrics.pairwise_distances)(X[, Y, metric]) | |
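
As a rough illustration of what a pairwise metric computes, the sketch below builds the full sample-by-sample matrix in pure Python. It assumes the common conventions that `Y` defaults to `X` and that cosine distance is `1 - cosine similarity`; the helper names are hypothetical and this is not MaxFrame's code.

```python
import math


def cosine_similarity_pair(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def manhattan_pair(u, v):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def pairwise(metric, X, Y=None):
    """Apply `metric` to every (row of X, row of Y) pair; Y defaults to X."""
    Y = X if Y is None else Y
    return [[metric(x, y) for y in Y] for x in X]


X = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
sim = pairwise(cosine_similarity_pair, X)
cosine_dist = [[1.0 - s for s in row] for row in sim]
l1 = pairwise(manhattan_pair, X)
```

Each result is a 3 x 3 nested list whose entry `[i][j]` compares row `i` of `X` with row `j` of `Y`.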
FILE:references/maxframe-client-docs/reference/learn/model_selection.md
<a id="learn-model-selection-ref"></a>
# Model Selection
## Splitter Classes
| [`model_selection.KFold`](generated/maxframe.learn.model_selection.KFold.md#maxframe.learn.model_selection.KFold)([n_splits, shuffle, ...]) | K-Folds cross-validator |
|-----------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|
## Splitter Functions
| [`model_selection.train_test_split`](generated/maxframe.learn.model_selection.train_test_split.md#maxframe.learn.model_selection.train_test_split)(\*arrays, ...) | Split arrays or matrices into random train and test subsets |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
FILE:references/maxframe-client-docs/reference/learn/preprocessing.md
<a id="learn-preprocessing-ref"></a>
# Preprocessing
## Transform Classes
| [`preprocessing.LabelBinarizer`](generated/maxframe.learn.preprocessing.LabelBinarizer.md#maxframe.learn.preprocessing.LabelBinarizer)(\*[, neg_label, ...]) | Binarize labels in a one-vs-all fashion. |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| [`preprocessing.LabelEncoder`](generated/maxframe.learn.preprocessing.LabelEncoder.md#maxframe.learn.preprocessing.LabelEncoder)() | Encode target labels with value between 0 and n_classes-1. |
| [`preprocessing.MinMaxScaler`](generated/maxframe.learn.preprocessing.MinMaxScaler.md#maxframe.learn.preprocessing.MinMaxScaler)([feature_range, ...]) | Transform features by scaling each feature to a given range. |
| [`preprocessing.StandardScaler`](generated/maxframe.learn.preprocessing.StandardScaler.md#maxframe.learn.preprocessing.StandardScaler)(\*[, copy, ...]) | Standardize features by removing the mean and scaling to unit variance. |
## Transform Functions
| [`preprocessing.label_binarize`](generated/maxframe.learn.preprocessing.label_binarize.md#maxframe.learn.preprocessing.label_binarize)(y, \*, classes) | Binarize labels in a one-vs-all fashion. |
|----------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------|
| [`preprocessing.minmax_scale`](generated/maxframe.learn.preprocessing.minmax_scale.md#maxframe.learn.preprocessing.minmax_scale)(X[, ...]) | Transform features by scaling each feature to a given range. |
| [`preprocessing.normalize`](generated/maxframe.learn.preprocessing.normalize.md#maxframe.learn.preprocessing.normalize)(X[, norm, axis, ...]) | Scale input vectors individually to unit norm (vector length). |
| [`preprocessing.scale`](generated/maxframe.learn.preprocessing.scale.md#maxframe.learn.preprocessing.scale)(X, \*[, axis, with_mean, ...]) | Standardize a dataset along any axis. |
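
The two scaling ideas listed above can be sketched for a single feature column in pure Python. This assumes the usual definitions (min-max mapping onto a target range; standardization with the population standard deviation) and uses hypothetical helper names; it is not the library's implementation and does not handle constant columns.

```python
import math


def minmax_scale_sketch(column, feature_range=(0.0, 1.0)):
    """Linearly map the column so its min and max land on feature_range."""
    lo, hi = min(column), max(column)
    a, b = feature_range
    # Note: a constant column (hi == lo) would divide by zero in this sketch.
    return [a + (x - lo) * (b - a) / (hi - lo) for x in column]


def standard_scale_sketch(column):
    """Subtract the mean and divide by the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]


column = [2.0, 4.0, 6.0, 8.0]
scaled_01 = minmax_scale_sketch(column)
standardized = standard_scale_sketch(column)
```

After standardization the column has mean 0 and unit variance, while min-max scaling pins the smallest value to 0 and the largest to 1.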
FILE:references/maxframe-client-docs/reference/learn/utils.md
<a id="learn-utils-ref"></a>
# Utilities
| [`utils.check_consistent_length`](generated/maxframe.learn.utils.check_consistent_length.md#maxframe.learn.utils.check_consistent_length)(\*arrays[, ...]) | Check that all arrays have consistent first dimensions. |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------|
FILE:references/maxframe-client-docs/reference/learn/xgboost.md
<a id="learn-xgboost-ref"></a>
# XGBoost Integration
## Data Structure
| [`xgboost.DMatrix`](generated/maxframe.learn.contrib.xgboost.DMatrix.md#maxframe.learn.contrib.xgboost.DMatrix)(data[, label, missing, ...]) | |
|------------------------------------------------------------------------------------------------------------------------------------------------|----|
## Training
| [`xgboost.predict`](generated/maxframe.learn.contrib.xgboost.predict.md#maxframe.learn.contrib.xgboost.predict)(model, data[, ...]) | Using MaxFrame XGBoost model to predict data. |
|-----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|
| [`xgboost.train`](generated/maxframe.learn.contrib.xgboost.train.md#maxframe.learn.contrib.xgboost.train)(params, dtrain[, evals, ...]) | Train XGBoost model in MaxFrame manner. |
## Callbacks
| [`xgboost.callback.EarlyStopping`](generated/maxframe.learn.contrib.xgboost.callback.EarlyStopping.md#maxframe.learn.contrib.xgboost.callback.EarlyStopping)(\*, rounds[, ...]) | |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| [`xgboost.callback.LearningRateScheduler`](generated/maxframe.learn.contrib.xgboost.callback.LearningRateScheduler.md#maxframe.learn.contrib.xgboost.callback.LearningRateScheduler)(...) | |
## Scikit-learn API
| [`xgboost.XGBClassifier`](generated/maxframe.learn.contrib.xgboost.XGBClassifier.md#maxframe.learn.contrib.xgboost.XGBClassifier)([xgb_model]) | Implementation of the scikit-learn API for XGBoost classification. |
|--------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| [`xgboost.XGBRegressor`](generated/maxframe.learn.contrib.xgboost.XGBRegressor.md#maxframe.learn.contrib.xgboost.XGBRegressor)([xgb_model]) | Implementation of the scikit-learn API for XGBoost regressor. |
FILE:references/maxframe-client-docs/reference/tensor/binary.md
# Binary Operations
## Elementwise bit operations
| [`maxframe.tensor.bitwise_and`](generated/maxframe.tensor.bitwise_and.md#maxframe.tensor.bitwise_and) | Compute the bit-wise AND of two tensors element-wise. |
|---------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| [`maxframe.tensor.bitwise_or`](generated/maxframe.tensor.bitwise_or.md#maxframe.tensor.bitwise_or) | Compute the bit-wise OR of two tensors element-wise. |
| [`maxframe.tensor.bitwise_xor`](generated/maxframe.tensor.bitwise_xor.md#maxframe.tensor.bitwise_xor) | Compute the bit-wise XOR of two tensors element-wise. |
| [`maxframe.tensor.invert`](generated/maxframe.tensor.invert.md#maxframe.tensor.invert) | Compute bit-wise inversion, or bit-wise NOT, element-wise. |
| [`maxframe.tensor.left_shift`](generated/maxframe.tensor.left_shift.md#maxframe.tensor.left_shift) | Shift the bits of an integer to the left. |
| [`maxframe.tensor.right_shift`](generated/maxframe.tensor.right_shift.md#maxframe.tensor.right_shift) | Shift the bits of an integer to the right. |
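
For readers unfamiliar with the operations in this table, the sketch below shows the same bit manipulations on plain Python integers, applied element-wise with a list comprehension. It only illustrates the semantics and does not call MaxFrame; the variable names are arbitrary.

```python
a, b = 0b1100, 0b1010  # 12 and 10

and_result = a & b     # bits set in both operands
or_result = a | b      # bits set in either operand
xor_result = a ^ b     # bits set in exactly one operand
inverted = ~a          # two's-complement NOT: -(a + 1)
shifted_left = a << 2  # multiply by 2**2
shifted_right = a >> 2 # floor-divide by 2**2

# Element-wise application over two equal-length sequences.
xs, ys = [12, 5, 255], [10, 3, 15]
elementwise_and = [x & y for x, y in zip(xs, ys)]
```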
FILE:references/maxframe-client-docs/reference/tensor/creation.md
<a id="tensor-creation"></a>
# Tensor Creation Routines
## From shape or value
| [`maxframe.tensor.ones`](generated/maxframe.tensor.ones.md#maxframe.tensor.ones) | Return a new tensor of given shape and type, filled with ones. |
|----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| [`maxframe.tensor.zeros`](generated/maxframe.tensor.zeros.md#maxframe.tensor.zeros) | Return a new tensor of given shape and type, filled with zeros. |
| [`maxframe.tensor.empty`](generated/maxframe.tensor.empty.md#maxframe.tensor.empty) | Return a new tensor of given shape and type, without initializing entries. |
| [`maxframe.tensor.empty_like`](generated/maxframe.tensor.empty_like.md#maxframe.tensor.empty_like) | Return a new tensor with the same shape and type as a given tensor. |
| [`maxframe.tensor.full`](generated/maxframe.tensor.full.md#maxframe.tensor.full) | Return a new tensor of given shape and type, filled with fill_value. |
| [`maxframe.tensor.full_like`](generated/maxframe.tensor.full_like.md#maxframe.tensor.full_like) | Return a full tensor with the same shape and type as a given tensor. |
## From existing data
| [`maxframe.tensor.tensor`](generated/maxframe.tensor.tensor.md#maxframe.tensor.tensor) | |
|-------------------------------------------------------------------------------------------|--------------------------------|
| [`maxframe.tensor.array`](generated/maxframe.tensor.array.md#maxframe.tensor.array) | Create a tensor. |
| [`maxframe.tensor.asarray`](generated/maxframe.tensor.asarray.md#maxframe.tensor.asarray) | Convert the input to an array. |
## Building matrices
| [`maxframe.tensor.diag`](generated/maxframe.tensor.diag.md#maxframe.tensor.diag) | Extract a diagonal or construct a diagonal tensor. |
|----------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| [`maxframe.tensor.diagflat`](generated/maxframe.tensor.diagflat.md#maxframe.tensor.diagflat) | Create a two-dimensional tensor with the flattened input as a diagonal. |
| [`maxframe.tensor.tril`](generated/maxframe.tensor.tril.md#maxframe.tensor.tril) | Lower triangle of a tensor. |
| [`maxframe.tensor.triu`](generated/maxframe.tensor.triu.md#maxframe.tensor.triu) | Upper triangle of a tensor. |
## Numerical ranges
| [`maxframe.tensor.arange`](generated/maxframe.tensor.arange.md#maxframe.tensor.arange) | Return evenly spaced values within a given interval. |
|----------------------------------------------------------------------------------------------|---------------------------------------------------------|
| [`maxframe.tensor.linspace`](generated/maxframe.tensor.linspace.md#maxframe.tensor.linspace) | Return evenly spaced numbers over a specified interval. |
| [`maxframe.tensor.meshgrid`](generated/maxframe.tensor.meshgrid.md#maxframe.tensor.meshgrid) | Return coordinate matrices from coordinate vectors. |
| [`maxframe.tensor.mgrid`](generated/maxframe.tensor.mgrid.md#maxframe.tensor.mgrid) | Construct a multi-dimensional "meshgrid". |
| [`maxframe.tensor.ogrid`](generated/maxframe.tensor.ogrid.md#maxframe.tensor.ogrid) | Construct a multi-dimensional "meshgrid". |
FILE:references/maxframe-client-docs/reference/tensor/fft.md
<a id="module-maxframe.tensor.fft"></a>
# Discrete Fourier Transform
## Standard FFTs
| [`maxframe.tensor.fft.fft`](generated/maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft) | Compute the one-dimensional discrete Fourier Transform. |
|-------------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
| [`maxframe.tensor.fft.ifft`](generated/maxframe.tensor.fft.ifft.md#maxframe.tensor.fft.ifft) | Compute the one-dimensional inverse discrete Fourier Transform. |
| [`maxframe.tensor.fft.fft2`](generated/maxframe.tensor.fft.fft2.md#maxframe.tensor.fft.fft2) | Compute the 2-dimensional discrete Fourier Transform |
| [`maxframe.tensor.fft.ifft2`](generated/maxframe.tensor.fft.ifft2.md#maxframe.tensor.fft.ifft2) | Compute the 2-dimensional inverse discrete Fourier Transform. |
| [`maxframe.tensor.fft.fftn`](generated/maxframe.tensor.fft.fftn.md#maxframe.tensor.fft.fftn) | Compute the N-dimensional discrete Fourier Transform. |
| [`maxframe.tensor.fft.ifftn`](generated/maxframe.tensor.fft.ifftn.md#maxframe.tensor.fft.ifftn) | Compute the N-dimensional inverse discrete Fourier Transform. |
## Real FFTs
| [`maxframe.tensor.fft.rfft`](generated/maxframe.tensor.fft.rfft.md#maxframe.tensor.fft.rfft) | Compute the one-dimensional discrete Fourier Transform for real input. |
|----------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
| [`maxframe.tensor.fft.irfft`](generated/maxframe.tensor.fft.irfft.md#maxframe.tensor.fft.irfft) | Compute the inverse of the n-point DFT for real input. |
| [`maxframe.tensor.fft.rfft2`](generated/maxframe.tensor.fft.rfft2.md#maxframe.tensor.fft.rfft2) | Compute the 2-dimensional FFT of a real tensor. |
| [`maxframe.tensor.fft.irfft2`](generated/maxframe.tensor.fft.irfft2.md#maxframe.tensor.fft.irfft2) | Compute the 2-dimensional inverse FFT of a real array. |
| [`maxframe.tensor.fft.rfftn`](generated/maxframe.tensor.fft.rfftn.md#maxframe.tensor.fft.rfftn) | Compute the N-dimensional discrete Fourier Transform for real input. |
| [`maxframe.tensor.fft.irfftn`](generated/maxframe.tensor.fft.irfftn.md#maxframe.tensor.fft.irfftn) | Compute the inverse of the N-dimensional FFT of real input. |
## Hermitian FFTs
| [`maxframe.tensor.fft.hfft`](generated/maxframe.tensor.fft.hfft.md#maxframe.tensor.fft.hfft) | Compute the FFT of a signal that has Hermitian symmetry, i.e., a real spectrum. |
|-------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| [`maxframe.tensor.fft.ihfft`](generated/maxframe.tensor.fft.ihfft.md#maxframe.tensor.fft.ihfft) | Compute the inverse FFT of a signal that has Hermitian symmetry. |
## Helper routines
| [`maxframe.tensor.fft.fftfreq`](generated/maxframe.tensor.fft.fftfreq.md#maxframe.tensor.fft.fftfreq) | Return the Discrete Fourier Transform sample frequencies. |
|-------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|
| [`maxframe.tensor.fft.rfftfreq`](generated/maxframe.tensor.fft.rfftfreq.md#maxframe.tensor.fft.rfftfreq) | Return the Discrete Fourier Transform sample frequencies (for usage with rfft, irfft). |
| [`maxframe.tensor.fft.fftshift`](generated/maxframe.tensor.fft.fftshift.md#maxframe.tensor.fft.fftshift) | Shift the zero-frequency component to the center of the spectrum. |
| [`maxframe.tensor.fft.ifftshift`](generated/maxframe.tensor.fft.ifftshift.md#maxframe.tensor.fft.ifftshift) | The inverse of fftshift. |
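
The helper routines are easiest to understand by looking at the frequency layout they produce. The following pure-Python sketch assumes the NumPy convention for sample frequencies (non-negative bins first, then negative bins, all divided by `n * d`); the `_sketch` function names are hypothetical and only handle one-dimensional lists.

```python
def fftfreq_sketch(n, d=1.0):
    """Sample frequencies in FFT order: 0, 1, ..., then the negative bins."""
    positive = list(range(0, (n - 1) // 2 + 1))
    negative = list(range(-(n // 2), 0))
    return [k / (n * d) for k in positive + negative]


def fftshift_sketch(values):
    """Rotate so the zero-frequency bin sits in the middle."""
    shift = len(values) // 2
    return values[-shift:] + values[:-shift] if shift else list(values)


def ifftshift_sketch(values):
    """Undo fftshift_sketch (matters for odd lengths)."""
    shift = len(values) // 2
    return values[shift:] + values[:shift]


freqs = fftfreq_sketch(8, d=1.0)
centered = fftshift_sketch(freqs)
restored = ifftshift_sketch(centered)
```

For `n = 8` and `d = 1.0` the unshifted order starts at 0.0 and jumps to -0.5 halfway through; after shifting, the frequencies increase monotonically from -0.5 to 0.375.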
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.absolute.md
# maxframe.tensor.absolute
### maxframe.tensor.absolute(x, out=None, where=None, \*\*kwargs)
Calculate the absolute value element-wise.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**absolute** – A tensor containing the absolute value of
each element in x. For complex input, `a + ib`, the
absolute value is $\sqrt{ a^2 + b^2 }$.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([-1.2, 1.2])
>>> mt.absolute(x).execute()
array([ 1.2, 1.2])
>>> mt.absolute(1.2 + 1j).execute()
1.5620499351813308
```
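The complex case in the last example can be checked by hand with the formula $\sqrt{a^2 + b^2}$. The sketch below does this with the standard library only; `absolute_sketch` is a hypothetical helper, not part of MaxFrame.

```python
import math


def absolute_sketch(value):
    """Magnitude of a real or complex scalar."""
    if isinstance(value, complex):
        # sqrt(a**2 + b**2) for value = a + ib
        return math.hypot(value.real, value.imag)
    return abs(value)


magnitude = absolute_sketch(1.2 + 1j)
reals = [absolute_sketch(v) for v in (-1.2, 1.2)]
```

Here `magnitude` is the square root of `1.44 + 1.0`, which agrees with the value printed in the example above.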
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.add.md
# maxframe.tensor.add
### maxframe.tensor.add(x1, x2, out=None, where=None, \*\*kwargs)
Add arguments element-wise.
* **Parameters:**
* **x1** (*array_like*) – The tensors to be added. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **x2** (*array_like*) – The tensors to be added. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**add** – The sum of x1 and x2, element-wise. Returns a scalar if
both x1 and x2 are scalars.
* **Return type:**
Tensor or scalar
### Notes
Equivalent to x1 + x2 in terms of tensor broadcasting.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.add(1.0, 4.0).execute()
5.0
>>> x1 = mt.arange(9.0).reshape((3, 3))
>>> x2 = mt.arange(3.0)
>>> mt.add(x1, x2).execute()
array([[ 0.,  2.,  4.],
       [ 3.,  5.,  7.],
       [ 6.,  8., 10.]])
```
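The second example relies on broadcasting: the length-3 tensor `x2` is added to each row of the 3 x 3 tensor `x1`. A minimal pure-Python sketch of that one broadcasting case, using a hypothetical helper and nested lists instead of tensors, might look like this:

```python
def add_broadcast_rows(matrix, row):
    """Add a length-n row to every row of an m x n nested list."""
    if any(len(r) != len(row) for r in matrix):
        raise ValueError("shapes are not broadcastable in this sketch")
    return [[x + y for x, y in zip(r, row)] for r in matrix]


x1 = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
x2 = [0.0, 1.0, 2.0]
result = add_broadcast_rows(x1, x2)
```

The real broadcasting rules cover many more shape combinations; this sketch handles only the matrix-plus-row case from the example.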
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.all.md
# maxframe.tensor.all
### maxframe.tensor.all(a, axis=None, out=None, keepdims=None)
Test whether all array elements along a given axis evaluate to True.
* **Parameters:**
* **a** (*array_like*) – Input tensor or object that can be converted to a tensor.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) –
Axis or axes along which a logical AND reduction is performed.
The default (axis = None) is to perform a logical AND over all
the dimensions of the input array. axis may be negative, in
which case it counts from the last to the first axis.
If this is a tuple of ints, a reduction is performed on multiple
axes, instead of a single axis or all the axes as before.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result.
It must have the same shape as the expected output and its
type is preserved (e.g., if `dtype(out)` is float, the result
will consist of 0.0’s and 1.0’s). See doc.ufuncs (Section
“Output arguments”) for more details.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input tensor.
If the default value is passed, then keepdims will not be
passed through to the all method of sub-classes of
ndarray; however, any non-default value will be. If the
sub-class's method does not implement keepdims, any
exceptions will be raised.
* **Returns:**
**all** – A new boolean or tensor is returned unless out is specified,
in which case a reference to out is returned.
* **Return type:**
Tensor, [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
`Tensor.all`
: equivalent method
[`any`](maxframe.tensor.any.md#maxframe.tensor.any)
: Test whether any element along a given axis evaluates to True.
### Notes
Not a Number (NaN), positive infinity and negative infinity
evaluate to True because these are not equal to zero.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.all([[True,False],[True,True]]).execute()
False
```
```pycon
>>> mt.all([[True,False],[True,True]], axis=0).execute()
array([ True, False])
```
```pycon
>>> mt.all([-1, 4, 5]).execute()
True
```
```pycon
>>> mt.all([1.0, mt.nan]).execute()
True
```
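The axis behaviour and the NaN note can be reproduced with Python's built-in `all` on nested lists. This is only an illustration of the semantics under the assumption that truthiness follows "not equal to zero"; it does not use MaxFrame.

```python
nan = float("nan")
rows = [[True, False], [True, True]]

# axis=None: reduce over every element.
all_total = all(value for row in rows for value in row)

# axis=0: reduce down each column.
all_axis0 = [all(column) for column in zip(*rows)]

# axis=1: reduce across each row.
all_axis1 = [all(row) for row in rows]

# NaN and infinities are non-zero, so they count as True.
all_with_nan = all([1.0, nan, float("inf"), float("-inf")])
```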
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.allclose.md
# maxframe.tensor.allclose
### maxframe.tensor.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
Returns True if two tensors are element-wise equal within a tolerance.
The tolerance values are positive, typically very small numbers. The
relative difference (rtol \* abs(b)) and the absolute difference
atol are added together to compare against the absolute difference
between a and b.
If either array contains one or more NaNs, False is returned.
Infs are treated as equal if they are in the same place and of the same
sign in both tensors.
* **Parameters:**
* **a** (*array_like*) – Input tensors to compare.
* **b** (*array_like*) – Input tensors to compare.
* **rtol** ([*float*](https://docs.python.org/3/library/functions.html#float)) – The relative tolerance parameter (see Notes).
* **atol** ([*float*](https://docs.python.org/3/library/functions.html#float)) – The absolute tolerance parameter (see Notes).
* **equal_nan** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – Whether to compare NaN’s as equal. If True, NaN’s in a will be
considered equal to NaN’s in b in the output tensor.
* **Returns:**
**allclose** – Returns True if the two tensors are equal within the given
tolerance; False otherwise.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`isclose`](maxframe.tensor.isclose.md#maxframe.tensor.isclose), [`all`](maxframe.tensor.all.md#maxframe.tensor.all), [`any`](maxframe.tensor.any.md#maxframe.tensor.any), [`equal`](maxframe.tensor.equal.md#maxframe.tensor.equal)
### Notes
If the following equation is element-wise True, then allclose returns
True.
> absolute(a - b) <= (atol + rtol \* absolute(b))
The above equation is not symmetric in a and b, so that
`allclose(a, b)` might be different from `allclose(b, a)` in
some rare cases.
The comparison of a and b uses standard broadcasting, which
means that a and b need not have the same shape in order for
`allclose(a, b)` to evaluate to True. The same is true for
equal but not array_equal.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.allclose([1e10,1e-7], [1.00001e10,1e-8]).execute()
False
>>> mt.allclose([1e10,1e-8], [1.00001e10,1e-9]).execute()
True
>>> mt.allclose([1e10,1e-8], [1.0001e10,1e-9]).execute()
False
>>> mt.allclose([1.0, mt.nan], [1.0, mt.nan]).execute()
False
>>> mt.allclose([1.0, mt.nan], [1.0, mt.nan], equal_nan=True).execute()
True
```
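The tolerance test from the Notes, `absolute(a - b) <= (atol + rtol * absolute(b))`, can be written out directly. The sketch below is a pure-Python approximation for equal-length lists (no broadcasting) with hypothetical helper names; it also shows why the comparison is not symmetric in `a` and `b`.

```python
import math


def is_close_sketch(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Scalar version of the tolerance test described in the Notes."""
    if math.isnan(a) or math.isnan(b):
        return equal_nan and math.isnan(a) and math.isnan(b)
    if math.isinf(a) or math.isinf(b):
        return a == b  # infinities match only with the same sign
    return abs(a - b) <= atol + rtol * abs(b)


def allclose_sketch(xs, ys, **kwargs):
    """True when every pair of elements passes the tolerance test."""
    return all(is_close_sketch(x, y, **kwargs) for x, y in zip(xs, ys))


# The tolerance scales with |b|, so swapping the arguments can change the answer.
forward = is_close_sketch(10.0, 13.0, rtol=0.25, atol=0.0)   # 3 <= 0.25 * 13
backward = is_close_sketch(13.0, 10.0, rtol=0.25, atol=0.0)  # 3 <= 0.25 * 10
```

In the asymmetry demonstration, the difference of 3 is within 25% of 13 but not within 25% of 10, so `forward` and `backward` disagree.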
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.angle.md
# maxframe.tensor.angle
### maxframe.tensor.angle(z, deg=False, \*\*kwargs)
Return the angle of the complex argument.
* **Parameters:**
* **z** (*array_like*) – A complex number or sequence of complex numbers.
* **deg** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Return angle in degrees if True, radians if False (default).
* **Returns:**
**angle** – The counterclockwise angle from the positive real axis on
the complex plane, with dtype as numpy.float64.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`arctan2`](maxframe.tensor.arctan2.md#maxframe.tensor.arctan2), [`absolute`](maxframe.tensor.absolute.md#maxframe.tensor.absolute)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.angle([1.0, 1.0j, 1+1j]).execute() # in radians
array([ 0. , 1.57079633, 0.78539816])
>>> mt.angle(1+1j, deg=True).execute() # in degrees
45.0
```
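The returned angle is the two-argument arctangent of the imaginary and real parts. A small standard-library sketch (with the hypothetical name `angle_sketch`) that mirrors the `deg` switch:

```python
import math


def angle_sketch(z, deg=False):
    """Counterclockwise angle of z from the positive real axis."""
    z = complex(z)
    radians = math.atan2(z.imag, z.real)
    return math.degrees(radians) if deg else radians


angles = [angle_sketch(z) for z in (1.0, 1.0j, 1 + 1j)]
degrees_45 = angle_sketch(1 + 1j, deg=True)
```

The three radian values correspond to 0, pi/2 and pi/4, matching the example output above.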
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.any.md
# maxframe.tensor.any
### maxframe.tensor.any(a, axis=None, out=None, keepdims=None)
Test whether any tensor element along a given axis evaluates to True.
Returns a single boolean unless axis is not `None`.
* **Parameters:**
* **a** (*array_like*) – Input tensor or object that can be converted to an array.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) –
Axis or axes along which a logical OR reduction is performed.
The default (axis = None) is to perform a logical OR over all
the dimensions of the input array. axis may be negative, in
which case it counts from the last to the first axis.
If this is a tuple of ints, a reduction is performed on multiple
axes, instead of a single axis or all the axes as before.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result. It must have
the same shape as the expected output and its type is preserved
(e.g., if it is of type float, then it will remain so, returning
1.0 for True and 0.0 for False, regardless of the type of a).
See doc.ufuncs (Section “Output arguments”) for details.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input tensor.
If the default value is passed, then keepdims will not be
passed through to the any method of sub-classes of
Tensor; however, any non-default value will be. If the
sub-class's method does not implement keepdims, any
exceptions will be raised.
* **Returns:**
**any** – A new boolean or Tensor is returned unless out is specified,
in which case a reference to out is returned.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool) or Tensor
#### SEE ALSO
`Tensor.any`
: equivalent method
[`all`](maxframe.tensor.all.md#maxframe.tensor.all)
: Test whether all elements along a given axis evaluate to True.
### Notes
Not a Number (NaN), positive infinity and negative infinity evaluate
to True because these are not equal to zero.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.any([[True, False], [True, True]]).execute()
True
```
```pycon
>>> mt.any([[True, False], [False, False]], axis=0).execute()
array([ True, False])
```
```pycon
>>> mt.any([-1, 0, 5]).execute()
True
```
```pycon
>>> mt.any(mt.nan).execute()
True
```
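The effect of `keepdims` is easiest to see from the shape of the result. The sketch below emulates a logical OR reduction over a two-dimensional nested list; `any_axis_sketch` is a hypothetical helper that supports only axis 0 or 1 and is not MaxFrame code.

```python
def any_axis_sketch(rows, axis, keepdims=False):
    """Logical OR reduction of a 2-D nested list along one axis."""
    if axis == 0:
        reduced = [any(column) for column in zip(*rows)]
        # keepdims leaves a size-one leading axis: shape (1, n)
        return [reduced] if keepdims else reduced
    if axis == 1:
        reduced = [any(row) for row in rows]
        # keepdims leaves a size-one trailing axis: shape (m, 1)
        return [[value] for value in reduced] if keepdims else reduced
    raise ValueError("this sketch only handles axis 0 or 1")


rows = [[True, False], [False, False]]
by_column = any_axis_sketch(rows, axis=0)
by_column_kept = any_axis_sketch(rows, axis=0, keepdims=True)
by_row_kept = any_axis_sketch(rows, axis=1, keepdims=True)
```

Keeping the reduced axis with size one is what lets the result broadcast against the original input.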
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.arange.md
# maxframe.tensor.arange
### maxframe.tensor.arange(\*args, \*\*kwargs)
Return evenly spaced values within a given interval.
Values are generated within the half-open interval `[start, stop)`
(in other words, the interval including start but excluding stop).
For integer arguments the function is equivalent to the Python built-in
[range](http://docs.python.org/lib/built-in-funcs.html) function,
but returns a tensor rather than a list.
When using a non-integer step, such as 0.1, the results will often not
be consistent. It is better to use `linspace` for these cases.
* **Parameters:**
* **start** (*number* *,* *optional*) – Start of interval. The interval includes this value. The default
start value is 0.
* **stop** (*number*) – End of interval. The interval does not include this value, except
in some cases where step is not an integer and floating point
round-off affects the length of out.
* **step** (*number* *,* *optional*) – Spacing between values. For any output out, this is the distance
between two adjacent values, `out[i+1] - out[i]`. The default
step size is 1. If step is specified as a positional argument,
start must also be given.
* **dtype** (*dtype*) – The type of the output tensor. If dtype is not given, infer the data
type from the other input arguments.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **Returns:**
**arange** – Tensor of evenly spaced values.
For floating point arguments, the length of the result is
`ceil((stop - start)/step)`. Because of floating point round-off,
this rule may result in the last element of out being greater
than stop.
* **Return type:**
Tensor
#### SEE ALSO
[`linspace`](maxframe.tensor.linspace.md#maxframe.tensor.linspace)
: Evenly spaced numbers with careful handling of endpoints.
[`ogrid`](maxframe.tensor.ogrid.md#maxframe.tensor.ogrid)
: Tensors of evenly spaced numbers in N-dimensions.
[`mgrid`](maxframe.tensor.mgrid.md#maxframe.tensor.mgrid)
: Grid-shaped tensors of evenly spaced numbers in N-dimensions.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.arange(3).execute()
array([0, 1, 2])
>>> mt.arange(3.0).execute()
array([ 0., 1., 2.])
>>> mt.arange(3,7).execute()
array([3, 4, 5, 6])
>>> mt.arange(3,7,2).execute()
array([3, 5])
```
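The length rule `ceil((stop - start)/step)` and its floating point caveat can be checked with plain Python arithmetic. This is a sketch of the rule only, assuming MaxFrame follows the formula as stated in the Returns section; `arange_length` is a hypothetical helper, not part of the library.

```python
import math


def arange_length(start, stop, step):
    """Number of elements implied by ceil((stop - start) / step)."""
    return max(0, math.ceil((stop - start) / step))


# A non-integer step: round-off pushes the quotient just above 3.
start, stop, step = 1.0, 1.3, 0.1
length = arange_length(start, stop, step)
values = [start + i * step for i in range(length)]

# Integer arguments behave like the built-in range.
int_values = [3 + i * 2 for i in range(arange_length(3, 7, 2))]
```

Here the rule yields one more element than the interval `[1.0, 1.3)` suggests, and the last value is not below `stop`. Cases like this are why `linspace` is recommended for non-integer steps.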
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.arccos.md
# maxframe.tensor.arccos
### maxframe.tensor.arccos(x, out=None, where=None, \*\*kwargs)
Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if `y = cos(x)`, then `x = arccos(y)`.
* **Parameters:**
* **x** (*array_like*) – x-coordinate on the unit circle.
For real arguments, the domain is [-1, 1].
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**angle** – The angle of the ray intersecting the unit circle at the given
x-coordinate in radians [0, pi]. If x is a scalar then a
scalar is returned, otherwise an array of the same shape as x
is returned.
* **Return type:**
Tensor
#### SEE ALSO
[`cos`](maxframe.tensor.cos.md#maxframe.tensor.cos), [`arctan`](maxframe.tensor.arctan.md#maxframe.tensor.arctan), [`arcsin`](maxframe.tensor.arcsin.md#maxframe.tensor.arcsin)
### Notes
arccos is a multivalued function: for each x there are infinitely
many numbers z such that cos(z) = x. The convention is to return
the angle z whose real part lies in [0, pi].
For real-valued input data types, arccos always returns real output.
For each value that cannot be expressed as a real number or infinity,
it yields `nan` and sets the invalid floating point error flag.
For complex-valued input, arccos is a complex analytic function that
has branch cuts [-inf, -1] and [1, inf] and is continuous from
above on the former and from below on the latter.
The inverse cos is also known as acos or cos^-1.
### References
M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”,
10th printing, 1964, pp. 79. [http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
### Examples
We expect the arccos of 1 to be 0, and of -1 to be pi:
```pycon
>>> import maxframe.tensor as mt
>>> mt.arccos([1, -1]).execute()
array([ 0. , 3.14159265])
```
Plot arccos:
```pycon
>>> import matplotlib.pyplot as plt
>>> x = mt.linspace(-1, 1, num=100)
>>> plt.plot(x.execute(), mt.arccos(x).execute())
>>> plt.axis('tight')
>>> plt.show()
```
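The principal-value convention from the Notes can be illustrated with the standard library's `math.acos`, used here as a scalar stand-in for the element-wise tensor function. Note one difference: `math.acos` raises `ValueError` outside `[-1, 1]`, whereas the text above describes a `nan` result.

```python
import math

# Round trip on the principal branch: x in [0, pi] survives cos -> acos.
angles = [0.0, 0.5, math.pi / 2, 2.0, math.pi]
round_trip = [math.acos(math.cos(x)) for x in angles]

# Outside [0, pi], arccos returns the principal angle with the same cosine,
# so the original angle is not recovered.
principal = math.acos(math.cos(-0.5))
```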
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.arccosh.md
# maxframe.tensor.arccosh
### maxframe.tensor.arccosh(x, out=None, where=None, \*\*kwargs)
Inverse hyperbolic cosine, element-wise.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**arccosh** – Array of the same shape as x.
* **Return type:**
Tensor
#### SEE ALSO
[`cosh`](maxframe.tensor.cosh.md#maxframe.tensor.cosh), [`arcsinh`](maxframe.tensor.arcsinh.md#maxframe.tensor.arcsinh), [`sinh`](maxframe.tensor.sinh.md#maxframe.tensor.sinh), [`arctanh`](maxframe.tensor.arctanh.md#maxframe.tensor.arctanh), [`tanh`](maxframe.tensor.tanh.md#maxframe.tensor.tanh)
### Notes
arccosh is a multivalued function: for each x there are infinitely
many numbers z such that cosh(z) = x. The convention is to return the
z whose imaginary part lies in [-pi, pi] and the real part in
`[0, inf]`.
For real-valued input data types, arccosh always returns real output.
For each value that cannot be expressed as a real number or infinity, it
yields `nan` and sets the invalid floating point error flag.
For complex-valued input, arccosh is a complex analytical function that
has a branch cut [-inf, 1] and is continuous from above on it.
### References
* <a id='id1'>**[1]**</a> M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 86. [http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
* <a id='id2'>**[2]**</a> Wikipedia, “Inverse hyperbolic function”, [http://en.wikipedia.org/wiki/Arccosh](http://en.wikipedia.org/wiki/Arccosh)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.arccosh([mt.e, 10.0]).execute()
array([ 1.65745445, 2.99322285])
>>> mt.arccosh(1).execute()
0.0
```
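For real inputs `x >= 1`, the principal value can be cross-checked against the closed form `ln(x + sqrt(x**2 - 1))`. The sketch below uses the standard library rather than MaxFrame, and `arccosh_via_log` is a hypothetical helper introduced only for the comparison.

```python
import math


def arccosh_via_log(x):
    """Principal value for real x >= 1: ln(x + sqrt(x**2 - 1))."""
    return math.log(x + math.sqrt(x * x - 1.0))


samples = [1.0, math.e, 10.0]
# Each pair holds (closed form, library value) for one sample.
pairs = [(arccosh_via_log(x), math.acosh(x)) for x in samples]
```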
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.arcsin.md
# maxframe.tensor.arcsin
### maxframe.tensor.arcsin(x, out=None, where=None, \*\*kwargs)
Inverse sine, element-wise.
* **Parameters:**
* **x** (*array_like*) – y-coordinate on the unit circle.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**angle** – The inverse sine of each element in x, in radians and in the
closed interval `[-pi/2, pi/2]`. If x is a scalar, a scalar
is returned, otherwise a tensor.
* **Return type:**
Tensor
#### SEE ALSO
[`sin`](maxframe.tensor.sin.md#maxframe.tensor.sin), [`cos`](maxframe.tensor.cos.md#maxframe.tensor.cos), [`arccos`](maxframe.tensor.arccos.md#maxframe.tensor.arccos), [`tan`](maxframe.tensor.tan.md#maxframe.tensor.tan), [`arctan`](maxframe.tensor.arctan.md#maxframe.tensor.arctan), [`arctan2`](maxframe.tensor.arctan2.md#maxframe.tensor.arctan2), `emath.arcsin`
### Notes
arcsin is a multivalued function: for each x there are infinitely
many numbers z such that sin(z) = x. The convention is to
return the angle z whose real part lies in [-pi/2, pi/2].
For real-valued input data types, *arcsin* always returns real output.
For each value that cannot be expressed as a real number or infinity,
it yields `nan` and sets the invalid floating point error flag.
For complex-valued input, arcsin is a complex analytic function that
has, by convention, the branch cuts [-inf, -1] and [1, inf] and is
continuous from above on the former and from below on the latter.
The inverse sine is also known as asin or sin^{-1}.
### References
Abramowitz, M. and Stegun, I. A., *Handbook of Mathematical Functions*,
10th printing, New York: Dover, 1964, pp. 79ff.
[http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.arcsin(1).execute() # pi/2
1.5707963267948966
>>> mt.arcsin(-1).execute() # -pi/2
-1.5707963267948966
>>> mt.arcsin(0).execute()
0.0
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.arcsinh.md
# maxframe.tensor.arcsinh
### maxframe.tensor.arcsinh(x, out=None, where=None, \*\*kwargs)
Inverse hyperbolic sine element-wise.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Tensor of the same shape as x.
* **Return type:**
Tensor
### Notes
arcsinh is a multivalued function: for each x there are infinitely
many numbers z such that sinh(z) = x. The convention is to return the
z whose imaginary part lies in [-pi/2, pi/2].
For real-valued input data types, arcsinh always returns real output.
For each value that cannot be expressed as a real number or infinity, it
returns `nan` and sets the invalid floating point error flag.
For complex-valued input, arcsinh is a complex analytical function that
has branch cuts [1j, infj] and [-1j, -infj] and is continuous from
the right on the former and from the left on the latter.
The inverse hyperbolic sine is also known as asinh or `sinh^-1`.
### References
* <a id='id1'>**[1]**</a> M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 86. [http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
* <a id='id2'>**[2]**</a> Wikipedia, “Inverse hyperbolic function”, [http://en.wikipedia.org/wiki/Arcsinh](http://en.wikipedia.org/wiki/Arcsinh)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.arcsinh(mt.array([mt.e, 10.0])).execute()
array([ 1.72538256, 2.99822295])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.arctan.md
# maxframe.tensor.arctan
### maxframe.tensor.arctan(x, out=None, where=None, \*\*kwargs)
Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if `y = tan(x)` then `x = arctan(y)`.
* **Parameters:**
* **x** (*array_like*)
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Out has the same shape as x. Its real part is in
`[-pi/2, pi/2]` (`arctan(+/-inf)` returns `+/-pi/2`).
It is a scalar if x is a scalar.
* **Return type:**
Tensor
#### SEE ALSO
[`arctan2`](maxframe.tensor.arctan2.md#maxframe.tensor.arctan2)
: The “four quadrant” arctan of the angle formed by (x, y) and the positive x-axis.
[`angle`](maxframe.tensor.angle.md#maxframe.tensor.angle)
: Argument of complex values.
### Notes
arctan is a multi-valued function: for each x there are infinitely
many numbers z such that tan(z) = x. The convention is to return
the angle z whose real part lies in [-pi/2, pi/2].
For real-valued input data types, arctan always returns real output.
For each value that cannot be expressed as a real number or infinity,
it yields `nan` and sets the invalid floating point error flag.
For complex-valued input, arctan is a complex analytic function that
has [1j, infj] and [-1j, -infj] as branch cuts, and is continuous
from the left on the former and from the right on the latter.
The inverse tangent is also known as atan or tan^{-1}.
### References
Abramowitz, M. and Stegun, I. A., *Handbook of Mathematical Functions*,
10th printing, New York: Dover, 1964, pp. 79.
[http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
### Examples
We expect the arctan of 0 to be 0, and of 1 to be pi/4:
```pycon
>>> import maxframe.tensor as mt
>>> mt.arctan([0, 1]).execute()
array([ 0. , 0.78539816])
```
```pycon
>>> mt.pi/4
0.78539816339744828
```
Plot arctan:
```pycon
>>> import matplotlib.pyplot as plt
>>> x = mt.linspace(-10, 10)
>>> plt.plot(x.execute(), mt.arctan(x).execute())
>>> plt.axis('tight')
>>> plt.show()
```
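The limits mentioned in the Returns section (`arctan(+/-inf)` returns `+/-pi/2`) can be confirmed on scalars with the standard library. This is a stand-in check, assuming the tensor function agrees with the C library's `atan` element-wise.

```python
import math

quarter_pi = math.atan(1.0)        # expected: pi/4
limit_pos = math.atan(math.inf)    # expected: +pi/2
limit_neg = math.atan(-math.inf)   # expected: -pi/2
```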
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.arctan2.md
# maxframe.tensor.arctan2
### maxframe.tensor.arctan2(x1, x2, out=None, where=None, \*\*kwargs)
Element-wise arc tangent of `x1/x2` choosing the quadrant correctly.
The quadrant (i.e., branch) is chosen so that `arctan2(x1, x2)` is
the signed angle in radians between the ray ending at the origin and
passing through the point (1,0), and the ray ending at the origin and
passing through the point (x2, x1). (Note the role reversal: the
“y-coordinate” is the first function parameter, the “x-coordinate”
is the second.) By IEEE convention, this function is defined for
x2 = +/-0 and for either or both of x1 and x2 = +/-inf (see
Notes for specific values).
This function is not defined for complex-valued arguments; for the
so-called argument of complex values, use angle.
* **Parameters:**
* **x1** (*array_like* *,* *real-valued*) – y-coordinates.
* **x2** (*array_like* *,* *real-valued*) – x-coordinates. x2 must be broadcastable to match the shape of
x1 or vice versa.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**angle** – Array of angles in radians, in the range `[-pi, pi]`.
* **Return type:**
Tensor
#### SEE ALSO
[`arctan`](maxframe.tensor.arctan.md#maxframe.tensor.arctan), [`tan`](maxframe.tensor.tan.md#maxframe.tensor.tan), [`angle`](maxframe.tensor.angle.md#maxframe.tensor.angle)
### Notes
*arctan2* is identical to the atan2 function of the underlying
C library. The following special values are defined in the C
standard: <sup>[1](#id2)</sup>
| x1 | x2 | arctan2(x1,x2) |
|--------|--------|------------------|
| +/- 0 | +0 | +/- 0 |
| +/- 0 | -0 | +/- pi |
| > 0 | +/-inf | +0 / +pi |
| < 0 | +/-inf | -0 / -pi |
| +/-inf | +inf | +/- (pi/4) |
| +/-inf | -inf | +/- (3\*pi/4) |
Note that +0 and -0 are distinct floating point numbers, as are +inf
and -inf.
### References
* <a id='id2'>**[1]**</a> ISO/IEC standard 9899:1999, “Programming language C.”
### Examples
Consider four points in different quadrants:
```pycon
>>> import maxframe.tensor as mt
>>> x = mt.array([-1, +1, +1, -1])
>>> y = mt.array([-1, -1, +1, +1])
>>> (mt.arctan2(y, x) * 180 / mt.pi).execute()
array([-135., -45., 45., 135.])
```
Note the order of the parameters. arctan2 is defined also when x2 = 0
and at several other special points, obtaining values in
the range `[-pi, pi]`:
```pycon
>>> mt.arctan2([1., -1.], [0., 0.]).execute()
array([ 1.57079633, -1.57079633])
>>> mt.arctan2([0., 0., mt.inf], [+0., -0., mt.inf]).execute()
array([ 0. , 3.14159265, 0.78539816])
```
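Because `arctan2` is described as identical to the C library's `atan2`, the quadrant handling and a few rows of the special-value table can be reproduced with Python's `math.atan2`. This is a scalar sketch with the standard library, not a MaxFrame call. The argument order is `(y, x)`, as in the text above.

```python
import math

# (y, x) pairs for the four quadrants, matching the example above.
points = [(-1.0, -1.0), (-1.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
degrees = [math.degrees(math.atan2(y, x)) for y, x in points]

# Special values from the table: signed zeros and infinities.
special = {
    "(+0, -0)": math.atan2(0.0, -0.0),                # +pi
    "(-0, -0)": math.atan2(-0.0, -0.0),               # -pi
    "(+inf, +inf)": math.atan2(math.inf, math.inf),   # +pi/4
    "(+inf, -inf)": math.atan2(math.inf, -math.inf),  # +3*pi/4
}
```

The signed-zero rows show why `+0` and `-0` must be treated as distinct inputs: they select opposite ends of the `[-pi, pi]` range.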
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.arctanh.md
# maxframe.tensor.arctanh
### maxframe.tensor.arctanh(x, out=None, where=None, \*\*kwargs)
Inverse hyperbolic tangent element-wise.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Array of the same shape as x.
* **Return type:**
Tensor
### Notes
arctanh is a multivalued function: for each x there are infinitely
many numbers z such that tanh(z) = x. The convention is to return
the z whose imaginary part lies in [-pi/2, pi/2].
For real-valued input data types, arctanh always returns real output.
For each value that cannot be expressed as a real number or infinity,
it yields `nan` and sets the invalid floating point error flag.
For complex-valued input, arctanh is a complex analytical function
that has branch cuts [-1, -inf] and [1, inf] and is continuous from
above on the former and from below on the latter.
The inverse hyperbolic tangent is also known as atanh or `tanh^-1`.
### References
* <a id='id1'>**[1]**</a> M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 86. [http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
* <a id='id2'>**[2]**</a> Wikipedia, “Inverse hyperbolic function”, [http://en.wikipedia.org/wiki/Arctanh](http://en.wikipedia.org/wiki/Arctanh)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.arctanh([0, -0.5]).execute()
array([ 0. , -0.54930614])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.argmax.md
# maxframe.tensor.argmax
### maxframe.tensor.argmax(a, axis=None, out=None)
Returns the indices of the maximum values along an axis.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – By default, the index is into the flattened tensor, otherwise
along the specified axis.
* **out** (*Tensor* *,* *optional*) – If provided, the result will be inserted into this tensor. It should
be of the appropriate shape and dtype.
* **Returns:**
**index_array** – Tensor of indices into the tensor. It has the same shape as a.shape
with the dimension along axis removed.
* **Return type:**
Tensor of ints
#### SEE ALSO
`Tensor.argmax`, [`argmin`](maxframe.tensor.argmin.md#maxframe.tensor.argmin)
`amax`
: The maximum value along a given axis.
[`unravel_index`](maxframe.tensor.unravel_index.md#maxframe.tensor.unravel_index)
: Convert a flat index into an index tuple.
### Notes
In case of multiple occurrences of the maximum values, the indices
corresponding to the first occurrence are returned.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.arange(6).reshape(2,3)
>>> a.execute()
array([[0, 1, 2],
[3, 4, 5]])
>>> mt.argmax(a).execute()
5
>>> mt.argmax(a, axis=0).execute()
array([1, 1, 1])
>>> mt.argmax(a, axis=1).execute()
array([2, 2])
```
Indices of the maximal elements of an N-dimensional tensor:
```pycon
>>> ind = mt.unravel_index(mt.argmax(a, axis=None), a.shape)
>>> ind.execute()
(1, 2)
>>> a[ind].execute()
```
```pycon
>>> b = mt.arange(6)
>>> b[1] = 5
>>> b.execute()
array([0, 5, 2, 3, 4, 5])
>>> mt.argmax(b).execute() # Only the first occurrence is returned.
1
```
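The first-occurrence rule and the flat-index convention can be sketched in plain Python. Both helpers are hypothetical and only mirror the behaviour described above for a row-major 2-D input; they are not MaxFrame functions.

```python
def argmax_first(values):
    """Index of the first occurrence of the maximum value."""
    # max() returns the first maximal item when several compare equal.
    return max(range(len(values)), key=values.__getitem__)


def unravel(flat_index, n_cols):
    """Convert a flat index into (row, col) for a row-major 2-D shape."""
    return divmod(flat_index, n_cols)


b = [0, 5, 2, 3, 4, 5]
first_max = argmax_first(b)  # the maximum 5 appears at indices 1 and 5

a = [[0, 1, 2], [3, 4, 5]]
flat = [v for row in a for v in row]
row, col = unravel(argmax_first(flat), n_cols=3)
```

The default `axis=None` corresponds to working on `flat`, and the `unravel` step plays the role of `unravel_index` in recovering the position in the original shape.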
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.argmin.md
# maxframe.tensor.argmin
### maxframe.tensor.argmin(a, axis=None, out=None)
Returns the indices of the minimum values along an axis.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – By default, the index is into the flattened tensor, otherwise
along the specified axis.
* **out** (*Tensor* *,* *optional*) – If provided, the result will be inserted into this tensor. It should
be of the appropriate shape and dtype.
* **Returns:**
**index_array** – Tensor of indices into the tensor. It has the same shape as a.shape
with the dimension along axis removed.
* **Return type:**
Tensor of ints
#### SEE ALSO
`Tensor.argmin`, [`argmax`](maxframe.tensor.argmax.md#maxframe.tensor.argmax)
`amin`
: The minimum value along a given axis.
[`unravel_index`](maxframe.tensor.unravel_index.md#maxframe.tensor.unravel_index)
: Convert a flat index into an index tuple.
### Notes
In case of multiple occurrences of the minimum values, the indices
corresponding to the first occurrence are returned.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.arange(6).reshape(2,3)
>>> a.execute()
array([[0, 1, 2],
[3, 4, 5]])
>>> mt.argmin(a).execute()
0
>>> mt.argmin(a, axis=0).execute()
array([0, 0, 0])
>>> mt.argmin(a, axis=1).execute()
array([0, 0])
```
Indices of the minimum elements of an N-dimensional tensor:
```pycon
>>> ind = mt.unravel_index(mt.argmin(a, axis=None), a.shape)
>>> ind.execute()
(0, 0)
>>> a[ind]
```
```pycon
>>> b = mt.arange(6)
>>> b[4] = 0
>>> b.execute()
array([0, 1, 2, 3, 0, 5])
>>> mt.argmin(b).execute() # Only the first occurrence is returned.
0
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.argpartition.md
# maxframe.tensor.argpartition
### maxframe.tensor.argpartition(a, kth, axis=-1, kind='introselect', order=None, \*\*kw)
Perform an indirect partition along the given axis using the
algorithm specified by the kind keyword. It returns an array of
indices of the same shape as a that index data along the given
axis in partitioned order.
#### Versionadded
Added in version 1.8.0.
* **Parameters:**
* **a** (*array_like*) – Tensor to sort.
* **kth** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* *ints*) – Element index to partition by. The k-th element will be in its
final sorted position and all smaller elements will be moved
before it and all larger elements behind it. The order of all
elements in the partitions is undefined. If provided with a
sequence of k-th it will partition all of them into their sorted
position at once.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None* *,* *optional*) – Axis along which to sort. The default is -1 (the last axis). If
None, the flattened tensor is used.
* **kind** ( *{'introselect'}* *,* *optional*) – Selection algorithm. Default is ‘introselect’
* **order** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – When a is a tensor with fields defined, this argument
specifies which fields to compare first, second, etc. A single
field can be specified as a string, and not all fields need be
specified, but unspecified fields will still be used, in the
order in which they come up in the dtype, to break ties.
* **Returns:**
**index_tensor** – Tensor of indices that partition a along the specified axis.
If a is one-dimensional, `a[index_tensor]` yields a partitioned a.
More generally, `np.take_along_axis(a, index_tensor, axis=axis)` always
yields the partitioned a, irrespective of dimensionality.
* **Return type:**
Tensor, [int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`partition`](maxframe.tensor.partition.md#maxframe.tensor.partition)
: Describes partition algorithms used.
`Tensor.partition`
: Inplace partition.
[`argsort`](maxframe.tensor.argsort.md#maxframe.tensor.argsort)
: Full indirect sort
### Notes
See partition for notes on the different selection algorithms.
### Examples
One dimensional tensor:
```pycon
>>> import maxframe.tensor as mt
>>> x = mt.array([3, 4, 2, 1])
>>> x[mt.argpartition(x, 3)].execute()
array([2, 1, 3, 4])
>>> x[mt.argpartition(x, (1, 3))].execute()
array([1, 2, 3, 4])
```
```pycon
>>> x = [3, 4, 2, 1]
>>> mt.array(x)[mt.argpartition(x, 3)].execute()
array([2, 1, 3, 4])
```
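The partition guarantee for `kth` can be written as a small check in plain Python. The helper `is_partitioned` is hypothetical and encodes only the property stated in the parameter description. The order inside each partition is left unspecified, so several outputs can be valid for the same input.

```python
def is_partitioned(values, kth):
    """True if values[kth] is in its sorted position, with no larger
    element before it and no smaller element after it."""
    pivot = values[kth]
    return (
        pivot == sorted(values)[kth]
        and all(v <= pivot for v in values[:kth])
        and all(v >= pivot for v in values[kth + 1:])
    )


x = [3, 4, 2, 1]
# A full argsort is one valid (if wasteful) argpartition for any kth.
index = sorted(range(len(x)), key=x.__getitem__)
partitioned = [x[i] for i in index]

doc_result_ok = is_partitioned([2, 1, 3, 4], kth=3)  # output shown above
unpartitioned_ok = is_partitioned(x, kth=3)          # raw input fails
```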
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.argsort.md
# maxframe.tensor.argsort
### maxframe.tensor.argsort(a, axis=-1, kind=None, order=None, \*, stable=None, parallel_kind=None, psrs_kinds=None)
Returns the indices that would sort a tensor.
Perform an indirect sort along the given axis using the algorithm specified
by the kind keyword. It returns a tensor of indices of the same shape as
a that index data along the given axis in sorted order.
* **Parameters:**
* **a** (*array_like*) – Tensor to sort.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None* *,* *optional*) – Axis along which to sort. The default is -1 (the last axis). If None,
the flattened tensor is used.
* **kind** ( *{'quicksort'* *,* *'mergesort'* *,* *'heapsort'* *,* *'stable'}* *,* *optional*) –
Sorting algorithm. The default is ‘quicksort’. Note that both ‘stable’
and ‘mergesort’ use timsort under the covers and, in general, the
actual implementation will vary with data type. The ‘mergesort’ option
is retained for backwards compatibility.
#### Versionchanged
Changed in version 1.15.0: The ‘stable’ option was added.
* **order** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – When a is a tensor with fields defined, this argument specifies
which fields to compare first, second, etc. A single field can
be specified as a string, and not all fields need be specified,
but unspecified fields will still be used, in the order in which
they come up in the dtype, to break ties.
* **stable** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Sort stability. If True, the returned array will maintain the relative
order of a values which compare as equal. If False or None, this
is not guaranteed. Internally, this option selects kind=’stable’.
Default: None.
* **Returns:**
**index_tensor** – Tensor of indices that sort a along the specified axis.
If a is one-dimensional, `a[index_tensor]` yields a sorted a.
More generally, `np.take_along_axis(a, index_tensor, axis=axis)`
always yields the sorted a, irrespective of dimensionality.
* **Return type:**
Tensor, [int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`sort`](maxframe.tensor.sort.md#maxframe.tensor.sort)
: Describes sorting algorithms used.
`lexsort`
: Indirect stable sort with multiple keys.
`Tensor.sort`
: Inplace sort.
[`argpartition`](maxframe.tensor.argpartition.md#maxframe.tensor.argpartition)
: Indirect partial sort.
### Notes
See sort for notes on the different sorting algorithms.
### Examples
One dimensional tensor:
```pycon
>>> import maxframe.tensor as mt
>>> x = mt.array([3, 1, 2])
>>> mt.argsort(x).execute()
array([1, 2, 0])
```
Two-dimensional tensor:
```pycon
>>> x = mt.array([[0, 3], [2, 2]])
>>> x.execute()
array([[0, 3],
[2, 2]])
```
```pycon
>>> ind = mt.argsort(x, axis=0) # sorts along first axis (down)
>>> ind.execute()
array([[0, 1],
[1, 0]])
#>>> mt.take_along_axis(x, ind, axis=0).execute() # same as np.sort(x, axis=0)
#array([[0, 2],
# [2, 3]])
```
```pycon
>>> ind = mt.argsort(x, axis=1) # sorts along last axis (across)
>>> ind.execute()
array([[0, 1],
[0, 1]])
#>>> mt.take_along_axis(x, ind, axis=1).execute() # same as np.sort(x, axis=1)
#array([[0, 3],
# [2, 2]])
```
Indices of the sorted elements of an N-dimensional tensor:
```pycon
>>> ind = mt.unravel_index(mt.argsort(x, axis=None), x.shape)
>>> ind.execute()
(array([0, 1, 1, 0]), array([0, 0, 1, 1]))
>>> x[ind].execute() # same as np.sort(x, axis=None)
array([0, 2, 2, 3])
```
Sorting with keys:
```pycon
>>> x = mt.array([(1, 0), (0, 1)], dtype=[('x', '<i4'), ('y', '<i4')])
>>> x.execute()
array([(1, 0), (0, 1)],
dtype=[('x', '<i4'), ('y', '<i4')])
```
```pycon
>>> mt.argsort(x, order=('x','y')).execute()
array([1, 0])
```
```pycon
>>> mt.argsort(x, order=('y','x')).execute()
array([0, 1])
```
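The indirect-sort idea, including `axis=0` and field ordering, can be reproduced on plain lists. This is a sketch under the assumption that MaxFrame matches the NumPy results shown above. The local `argsort` helper is hypothetical, and dictionaries stand in for structured dtypes.

```python
def argsort(values, key=None):
    """Stable indirect sort: indices that would put values in order."""
    if key is None:
        key = lambda item: item
    # sorted() is stable, so equal values keep their relative order.
    return sorted(range(len(values)), key=lambda i: key(values[i]))


one_d = argsort([3, 1, 2])

# axis=0 on a 2-D input: argsort each column independently.
x = [[0, 3], [2, 2]]
columns = list(zip(*x))  # [(0, 2), (3, 2)]
per_column = [argsort(list(col)) for col in columns]
axis0 = [list(row) for row in zip(*per_column)]

# `order` on structured data: compare the named fields in sequence.
records = [{"x": 1, "y": 0}, {"x": 0, "y": 1}]
by_x_then_y = argsort(records, key=lambda r: (r["x"], r["y"]))
by_y_then_x = argsort(records, key=lambda r: (r["y"], r["x"]))
```

Using a tuple key mirrors how `order=('x', 'y')` compares the first field and falls back to the second only to break ties.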
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.argwhere.md
# maxframe.tensor.argwhere
### maxframe.tensor.argwhere(a)
Find the indices of tensor elements that are non-zero, grouped by element.
* **Parameters:**
**a** (*array_like*) – Input data.
* **Returns:**
**index_tensor** – Indices of elements that are non-zero. Indices are grouped by element.
* **Return type:**
Tensor
#### SEE ALSO
[`where`](maxframe.tensor.where.md#maxframe.tensor.where), [`nonzero`](maxframe.tensor.nonzero.md#maxframe.tensor.nonzero)
### Notes
`mt.argwhere(a)` is the same as `mt.transpose(mt.nonzero(a))`.
The output of `argwhere` is not suitable for indexing tensors.
For this purpose use `nonzero(a)` instead.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(6).reshape(2,3)
>>> x.execute()
array([[0, 1, 2],
[3, 4, 5]])
>>> mt.argwhere(x>1).execute()
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])
```
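The relationship between `argwhere` and `nonzero` stated in the Notes can be shown with nested lists. The names below are plain Python variables standing in for the tensor functions, so this illustrates the two layouts rather than the MaxFrame API itself.

```python
x = [[0, 1, 2], [3, 4, 5]]

# argwhere: one [row, col] pair per element that satisfies the condition.
argwhere = [
    [i, j]
    for i, row in enumerate(x)
    for j, v in enumerate(row)
    if v > 1
]

# nonzero: one list per dimension, which is the transpose of argwhere.
nonzero = tuple(list(axis) for axis in zip(*argwhere))

# The per-dimension layout is the one that lines up with indexing.
selected = [x[i][j] for i, j in zip(*nonzero)]
```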
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.around.md
# maxframe.tensor.around
### maxframe.tensor.around(a, decimals=0, out=None)
Evenly round to the given number of decimals.
* **Parameters:**
* **a** (*array_like*) – Input data.
* **decimals** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of decimal places to round to (default: 0). If
decimals is negative, it specifies the number of positions to
the left of the decimal point.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must have
the same shape as the expected output, but the type of the output
values will be cast if necessary.
* **Returns:**
**rounded_array** – A tensor of the same type as a, containing the rounded values.
Unless out was specified, a new tensor is created. A reference to
the result is returned.
The real and imaginary parts of complex numbers are rounded
separately. The result of rounding a float is a float.
* **Return type:**
Tensor
#### SEE ALSO
`Tensor.round`
: equivalent method
[`ceil`](maxframe.tensor.ceil.md#maxframe.tensor.ceil), [`fix`](maxframe.tensor.fix.md#maxframe.tensor.fix), [`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor), [`rint`](maxframe.tensor.rint.md#maxframe.tensor.rint), [`trunc`](maxframe.tensor.trunc.md#maxframe.tensor.trunc)
### Notes
For values exactly halfway between rounded decimal values, NumPy
rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0,
-0.5 and 0.5 round to 0.0, etc. Results may also be surprising due
to the inexact representation of decimal fractions in the IEEE
floating point standard <sup>[1](#id2)</sup> and errors introduced when scaling
by powers of ten.
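Python's built-in `round` follows the same round-half-to-even convention, so it can serve as a quick illustration of the behaviour described in these notes. This is a stdlib-only sketch and makes no claim about how MaxFrame performs the rounding internally.

```python
from decimal import Decimal

# Exact halves go to the nearest even integer.
halves = [0.5, 1.5, 2.5, 3.5, 4.5]
rounded = [round(v) for v in halves]
print(rounded)  # [0, 2, 2, 4, 4]

# A negative number of decimals rounds to the left of the decimal point.
tens = [round(v, -1) for v in [1, 2, 3, 11]]
print(tens)  # [0, 0, 0, 10]

# 2.675 has no exact binary representation: the stored double is slightly
# smaller than the decimal literal, so it is not a true "half" case.
stored = Decimal(2.675)
print(stored < Decimal("2.675"))  # True
print(round(2.675, 2))            # 2.67
```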
### References
* <a id='id2'>**[1]**</a> “Lecture Notes on the Status of IEEE 754”, William Kahan, [http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF](http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF)
* <a id='id3'>**[2]**</a> “How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?”, William Kahan, [http://www.cs.berkeley.edu/~wkahan/Mindless.pdf](http://www.cs.berkeley.edu/~wkahan/Mindless.pdf)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.around([0.37, 1.64]).execute()
array([ 0., 2.])
>>> mt.around([0.37, 1.64], decimals=1).execute()
array([ 0.4, 1.6])
>>> mt.around([.5, 1.5, 2.5, 3.5, 4.5]).execute() # rounds to nearest even value
array([ 0., 2., 2., 4., 4.])
>>> mt.around([1,2,3,11], decimals=1).execute() # tensor of ints is returned
array([ 1, 2, 3, 11])
>>> mt.around([1,2,3,11], decimals=-1).execute()
array([ 0, 0, 0, 10])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.array.md
# maxframe.tensor.array
### maxframe.tensor.array(x, dtype=None, copy=True, order='K', ndmin=None, chunk_size=None)
Create a tensor.
* **Parameters:**
* **object** (*array_like*) – An array, any object exposing the array interface, an object whose
\_\_array_\_ method returns an array, or any (nested) sequence.
* **dtype** (*data-type* *,* *optional*) – The desired data-type for the array. If not given, then the type will
be determined as the minimum type required to hold the objects in the
sequence. This argument can only be used to ‘upcast’ the array. For
downcasting, use the .astype(t) method.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If true (default), then the object is copied. Otherwise, a copy will
only be made if \_\_array_\_ returns a copy, if obj is a nested sequence,
or if a copy is needed to satisfy any of the other requirements
(dtype, order, etc.).
* **order** ( *{'K'* *,* *'A'* *,* *'C'* *,* *'F'}* *,* *optional*) –
Specify the memory layout of the array. If object is not an array, the
newly created array will be in C order (row major) unless ‘F’ is
specified, in which case it will be in Fortran order (column major).
If object is an array the following holds.
| order | no copy | copy=True |
|---------|-----------|-----------------------------------------------------|
| ’K’ | unchanged | F & C order preserved, otherwise most similar order |
| ’A’ | unchanged | F order if input is F and not C, otherwise C order |
| ’C’ | C order | C order |
| ’F’ | F order | F order |
When `copy=False` and a copy is made for other reasons, the result is
the same as if `copy=True`, with some exceptions for A, see the
Notes section. The default order is ‘K’.
* **ndmin** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Specifies the minimum number of dimensions that the resulting
array should have. Ones will be prepended to the shape as
needed to meet this requirement.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *optional*) – Specifies chunk size for each dimension.
* **Returns:**
**out** – A tensor object satisfying the specified requirements.
* **Return type:**
Tensor
#### SEE ALSO
[`empty`](maxframe.tensor.empty.md#maxframe.tensor.empty), [`empty_like`](maxframe.tensor.empty_like.md#maxframe.tensor.empty_like), [`zeros`](maxframe.tensor.zeros.md#maxframe.tensor.zeros), `zeros_like`, [`ones`](maxframe.tensor.ones.md#maxframe.tensor.ones), `ones_like`, [`full`](maxframe.tensor.full.md#maxframe.tensor.full), [`full_like`](maxframe.tensor.full_like.md#maxframe.tensor.full_like)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.array([1, 2, 3]).execute()
array([1, 2, 3])
```
Upcasting:
```pycon
>>> mt.array([1, 2, 3.0]).execute()
array([ 1., 2., 3.])
```
More than one dimension:
```pycon
>>> mt.array([[1, 2], [3, 4]]).execute()
array([[1, 2],
[3, 4]])
```
Minimum dimensions 2:
```pycon
>>> mt.array([1, 2, 3], ndmin=2).execute()
array([[1, 2, 3]])
```
Type provided:
```pycon
>>> mt.array([1, 2, 3], dtype=complex).execute()
array([ 1.+0.j, 2.+0.j, 3.+0.j])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.array_equal.md
# maxframe.tensor.array_equal
### maxframe.tensor.array_equal(a1, a2)
True if two tensors have the same shape and elements, False otherwise.
* **Parameters:**
* **a1** (*array_like*) – Input arrays.
* **a2** (*array_like*) – Input arrays.
* **Returns:**
**b** – Returns True if the tensors are equal.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`allclose`](maxframe.tensor.allclose.md#maxframe.tensor.allclose)
: Returns True if two tensors are element-wise equal within a tolerance.
`array_equiv`
: Returns True if input tensors are shape consistent and all elements equal.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.array_equal([1, 2], [1, 2]).execute()
True
>>> mt.array_equal(mt.array([1, 2]), mt.array([1, 2])).execute()
True
>>> mt.array_equal([1, 2], [1, 2, 3]).execute()
False
>>> mt.array_equal([1, 2], [1, 4]).execute()
False
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.array_split.md
# maxframe.tensor.array_split
### maxframe.tensor.array_split(a, indices_or_sections, axis=0)
Split a tensor into multiple sub-tensors.
Please refer to the `split` documentation. The only difference
between these functions is that `array_split` allows
indices_or_sections to be an integer that does *not* equally
divide the axis. For a tensor of length l that should be split
into n sections, it returns l % n sub-arrays of size l//n + 1
and the rest of size l//n.
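The size rule above can be checked with a few lines of plain Python. This is an illustrative sketch, not MaxFrame code; `split_sizes` and `split_list` are hypothetical helpers, and placing the larger pieces first is an assumption taken from the examples in this entry.

```python
def split_sizes(length, sections):
    """Sizes of the pieces when `length` items are split into `sections` parts."""
    base, extra = divmod(length, sections)
    # `extra` (= length % sections) pieces get one additional element.
    return [base + 1] * extra + [base] * (sections - extra)


def split_list(items, sections):
    """Split a list into consecutive pieces using the sizes above."""
    pieces, start = [], 0
    for size in split_sizes(len(items), sections):
        pieces.append(items[start:start + size])
        start += size
    return pieces


print(split_sizes(8, 3))              # [3, 3, 2]
print(split_sizes(7, 3))              # [3, 2, 2]
print(split_list(list(range(7)), 3))  # [[0, 1, 2], [3, 4], [5, 6]]
```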
#### SEE ALSO
[`split`](maxframe.tensor.split.md#maxframe.tensor.split)
: Split tensor into multiple sub-tensors of equal size.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(8.0)
>>> mt.array_split(x, 3).execute()
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
```
```pycon
>>> x = mt.arange(7.0)
>>> mt.array_split(x, 3).execute()
[array([ 0., 1., 2.]), array([ 3., 4.]), array([ 5., 6.])]
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.asarray.md
# maxframe.tensor.asarray
### maxframe.tensor.asarray(x, dtype=None, order=None, chunk_size=None)
Convert the input to an array.
* **Parameters:**
* **a** (*array_like*) – Input data, in any form that can be converted to a tensor. This
includes lists, lists of tuples, tuples, tuples of tuples, tuples
of lists and tensors.
* **dtype** (*data-type* *,* *optional*) – By default, the data-type is inferred from the input data.
* **order** ( *{'C'* *,* *'F'}* *,* *optional*) – Whether to use row-major (C-style) or
column-major (Fortran-style) memory representation.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *optional*) – Specifies chunk size for each dimension.
* **Returns:**
**out** – Tensor interpretation of a. No copy is performed if the input
is already an ndarray with matching dtype and order. If a is a
subclass of ndarray, a base class ndarray is returned.
* **Return type:**
Tensor
#### SEE ALSO
`ascontiguousarray`
: Convert input to a contiguous tensor.
`asfortranarray`
: Convert input to a tensor with column-major memory order.
### Examples
Convert a list into a tensor:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = [1, 2]
>>> mt.asarray(a).execute()
array([1, 2])
```
Existing arrays are not copied:
```pycon
>>> a = mt.array([1, 2])
>>> mt.asarray(a) is a
True
```
If dtype is set, array is copied only if dtype does not match:
```pycon
>>> a = mt.array([1, 2], dtype=mt.float32)
>>> mt.asarray(a, dtype=mt.float32) is a
True
>>> mt.asarray(a, dtype=mt.float64) is a
False
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.atleast_1d.md
# maxframe.tensor.atleast_1d
### maxframe.tensor.atleast_1d(\*tensors)
Convert inputs to tensors with at least one dimension.
Scalar inputs are converted to 1-dimensional tensors, whilst
higher-dimensional inputs are preserved.
* **Parameters:**
* **tensors1** (*array_like*) – One or more input tensors.
* **tensors2** (*array_like*) – One or more input tensors.
* **...** (*array_like*) – One or more input tensors.
* **Returns:**
**ret** – A tensor, or list of tensors, each with `a.ndim >= 1`.
Copies are made only if necessary.
* **Return type:**
Tensor
#### SEE ALSO
[`atleast_2d`](maxframe.tensor.atleast_2d.md#maxframe.tensor.atleast_2d), [`atleast_3d`](maxframe.tensor.atleast_3d.md#maxframe.tensor.atleast_3d)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.atleast_1d(1.0).execute()
array([ 1.])
```
```pycon
>>> x = mt.arange(9.0).reshape(3,3)
>>> mt.atleast_1d(x).execute()
array([[ 0., 1., 2.],
[ 3., 4., 5.],
[ 6., 7., 8.]])
>>> mt.atleast_1d(x) is x
True
```
```pycon
>>> mt.atleast_1d(1, [3, 4]).execute()
[array([1]), array([3, 4])]
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.atleast_2d.md
# maxframe.tensor.atleast_2d
### maxframe.tensor.atleast_2d(\*tensors)
View inputs as tensors with at least two dimensions.
* **Parameters:**
* **tensors1** (*array_like*) – One or more array-like sequences. Non-tensor inputs are converted
to tensors. Tensors that already have two or more dimensions are
preserved.
* **tensors2** (*array_like*) – One or more array-like sequences. Non-tensor inputs are converted
to tensors. Tensors that already have two or more dimensions are
preserved.
* **...** (*array_like*) – One or more array-like sequences. Non-tensor inputs are converted
to tensors. Tensors that already have two or more dimensions are
preserved.
* **Returns:**
**res, res2, …** – A tensor, or list of tensors, each with `a.ndim >= 2`.
Copies are avoided where possible, and views with two or more
dimensions are returned.
* **Return type:**
Tensor
#### SEE ALSO
[`atleast_1d`](maxframe.tensor.atleast_1d.md#maxframe.tensor.atleast_1d), [`atleast_3d`](maxframe.tensor.atleast_3d.md#maxframe.tensor.atleast_3d)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.atleast_2d(3.0).execute()
array([[ 3.]])
```
```pycon
>>> x = mt.arange(3.0)
>>> mt.atleast_2d(x).execute()
array([[ 0., 1., 2.]])
```
```pycon
>>> mt.atleast_2d(1, [1, 2], [[1, 2]]).execute()
[array([[1]]), array([[1, 2]]), array([[1, 2]])]
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.atleast_3d.md
# maxframe.tensor.atleast_3d
### maxframe.tensor.atleast_3d(\*tensors)
View inputs as tensors with at least three dimensions.
* **Parameters:**
* **tensors1** (*array_like*) – One or more tensor-like sequences. Non-tensor inputs are converted to
tensors. Tensors that already have three or more dimensions are
preserved.
* **tensors2** (*array_like*) – One or more tensor-like sequences. Non-tensor inputs are converted to
tensors. Tensors that already have three or more dimensions are
preserved.
* **...** (*array_like*) – One or more tensor-like sequences. Non-tensor inputs are converted to
tensors. Tensors that already have three or more dimensions are
preserved.
* **Returns:**
**res1, res2, …** – A tensor, or list of tensors, each with `a.ndim >= 3`. Copies are
avoided where possible, and views with three or more dimensions are
returned. For example, a 1-D tensor of shape `(N,)` becomes a view
of shape `(1, N, 1)`, and a 2-D tensor of shape `(M, N)` becomes a
view of shape `(M, N, 1)`.
* **Return type:**
Tensor
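The shape rule described under **Returns** can be written down as a small pure-Python function. This is only a sketch of the documented shapes; `atleast_3d_shape` is a hypothetical helper and does not touch any tensor data.

```python
def atleast_3d_shape(shape):
    """Shape an input of the given shape has after promotion to three dimensions."""
    shape = tuple(shape)
    if len(shape) == 0:
        return (1, 1, 1)                # scalar
    if len(shape) == 1:
        return (1, shape[0], 1)         # (N,)   -> (1, N, 1)
    if len(shape) == 2:
        return (shape[0], shape[1], 1)  # (M, N) -> (M, N, 1)
    return shape                        # three or more dimensions are preserved


for s in [(), (3,), (4, 3), (1, 1, 2)]:
    print(s, "->", atleast_3d_shape(s))
```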
#### SEE ALSO
[`atleast_1d`](maxframe.tensor.atleast_1d.md#maxframe.tensor.atleast_1d), [`atleast_2d`](maxframe.tensor.atleast_2d.md#maxframe.tensor.atleast_2d)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.atleast_3d(3.0).execute()
array([[[ 3.]]])
```
```pycon
>>> x = mt.arange(3.0)
>>> mt.atleast_3d(x).shape
(1, 3, 1)
```
```pycon
>>> x = mt.arange(12.0).reshape(4,3)
>>> mt.atleast_3d(x).shape
(4, 3, 1)
```
```pycon
>>> for arr in mt.atleast_3d([1, 2], [[1, 2]], [[[1, 2]]]).execute():
... print(arr, arr.shape)
...
[[[1]
[2]]] (1, 2, 1)
[[[1]
[2]]] (1, 2, 1)
[[[1 2]]] (1, 1, 2)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.average.md
# maxframe.tensor.average
### maxframe.tensor.average(a, axis=None, weights=None, returned=False)
Compute the weighted average along the specified axis.
* **Parameters:**
* **a** (*array_like*) – Tensor containing data to be averaged. If a is not a tensor, a
conversion is attempted.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) –
Axis or axes along which to average a. The default,
axis=None, will average over all of the elements of the input tensor.
If axis is negative it counts from the last to the first axis.
If axis is a tuple of ints, averaging is performed on all of the axes
specified in the tuple instead of a single axis or all the axes as
before.
* **weights** (*array_like* *,* *optional*) – A tensor of weights associated with the values in a. Each value in
a contributes to the average according to its associated weight.
The weights tensor can either be 1-D (in which case its length must be
the size of a along the given axis) or of the same shape as a.
If weights=None, then all data in a are assumed to have a
weight equal to one.
* **returned** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Default is False. If True, the tuple (average, sum_of_weights)
is returned, otherwise only the average is returned.
If weights=None, sum_of_weights is equivalent to the number of
elements over which the average is taken.
* **Returns:**
**average, [sum_of_weights]** – Return the average along the specified axis. When returned is True,
return a tuple with the average as the first element and the sum
of the weights as the second element. The return type is Float
if a is of integer type, otherwise it is of the same type as a.
sum_of_weights is of the same type as average.
* **Return type:**
tensor_type or double
* **Raises:**
* [**ZeroDivisionError**](https://docs.python.org/3/library/exceptions.html#ZeroDivisionError) – When all weights along axis are zero. See numpy.ma.average for a
version robust to this type of error.
* [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – When the length of 1D weights is not the same as the shape of a
along axis.
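For a 1-D input, the computation described above reduces to a weighted sum divided by the sum of the weights. The sketch below is plain Python, not MaxFrame's implementation; `weighted_average` is a hypothetical helper that mirrors the `weights` and `returned` parameters and the two documented error cases.

```python
def weighted_average(values, weights=None, returned=False):
    """Weighted mean of a 1-D sequence; optionally also return the sum of weights."""
    values = list(values)
    # With no weights, every value counts with weight one.
    weights = [1.0] * len(values) if weights is None else list(weights)
    if len(weights) != len(values):
        raise TypeError("weights must have the same length as values")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    avg = sum(v * w for v, w in zip(values, weights)) / total_weight
    return (avg, total_weight) if returned else avg


print(weighted_average([1, 2, 3, 4]))                           # 2.5
print(weighted_average(range(1, 11), weights=range(10, 0, -1)))  # 4.0
print(weighted_average([1, 2, 3, 4], returned=True))            # (2.5, 4.0)

# Averaging along axis=1 of a 3x2 input applies the same weights to each row.
rows = [[0, 1], [2, 3], [4, 5]]
print([weighted_average(row, [0.25, 0.75]) for row in rows])    # [0.75, 2.75, 4.75]
```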
#### SEE ALSO
[`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> data = list(range(1,5))
>>> data
[1, 2, 3, 4]
>>> mt.average(data).execute()
2.5
>>> mt.average(range(1,11), weights=range(10,0,-1)).execute()
4.0
```
```pycon
>>> data = mt.arange(6).reshape((3,2))
>>> data.execute()
array([[0, 1],
[2, 3],
[4, 5]])
>>> mt.average(data, axis=1, weights=[1./4, 3./4]).execute()
array([ 0.75, 2.75, 4.75])
>>> mt.average(data, weights=[1./4, 3./4]).execute()
Traceback (most recent call last):
...
TypeError: Axis must be specified when shapes of a and weights differ.
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.bincount.md
# maxframe.tensor.bincount
### maxframe.tensor.bincount(x, weights=None, minlength=0, chunk_size_limit=None)
Count number of occurrences of each value in array of non-negative ints.
The number of bins (of size 1) is one larger than the largest value in
x. If minlength is specified, there will be at least this number
of bins in the output array (though it will be longer if necessary,
depending on the contents of x).
Each bin gives the number of occurrences of its index value in x.
If weights is specified the input array is weighted by it, i.e. if a
value `n` is found at position `i`, `out[n] += weight[i]` instead
of `out[n] += 1`.
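The counting rule can be sketched in plain Python as follows. This is an illustration of the semantics described above, not MaxFrame's implementation; the function below is a hypothetical stand-in that works on lists and ignores chunking.

```python
import math


def bincount(x, weights=None, minlength=0):
    """Count (or weight-sum) occurrences of each non-negative integer in x."""
    if any(v < 0 for v in x):
        raise ValueError("x must contain non-negative integers")
    # One bin per value from 0 up to max(x), padded up to minlength if needed.
    size = max(max(x, default=-1) + 1, minlength)
    out = [0.0] * size if weights is not None else [0] * size
    for i, n in enumerate(x):
        out[n] += weights[i] if weights is not None else 1
    return out


print(bincount([0, 1, 1, 3, 2, 1, 7]))  # [1, 3, 1, 1, 0, 0, 0, 1]
print(bincount([0, 1], minlength=4))    # [1, 1, 0, 0]

w = [0.3, 0.5, 0.2, 0.7, 1.0, -0.6]
sums = bincount([0, 1, 1, 2, 2, 2], weights=w)
# Compare with a tolerance because the sums are floating-point values.
print(all(math.isclose(a, b) for a, b in zip(sums, [0.3, 0.7, 1.1])))  # True
```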
* **Parameters:**
* **x** (*tensor* *or* *array_like* *,* *1 dimension* *,* *nonnegative ints*) – Input array.
* **weights** (*tensor* *or* *array_like* *,* *optional*) – Weights, array of the same shape as x.
* **minlength** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – A minimum number of bins for the output array.
* **Returns:**
**out** – The result of binning the input array.
The length of out is equal to `np.amax(x)+1`.
* **Return type:**
tensor of ints
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If the input is not 1-dimensional, or contains elements with negative
values, or if minlength is negative.
* [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – If the type of the input is float or complex.
#### SEE ALSO
[`histogram`](maxframe.tensor.histogram.md#maxframe.tensor.histogram), [`digitize`](maxframe.tensor.digitize.md#maxframe.tensor.digitize), [`unique`](maxframe.tensor.unique.md#maxframe.tensor.unique)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.bincount(mt.arange(5)).execute()
array([1, 1, 1, 1, 1])
>>> mt.bincount(mt.tensor([0, 1, 1, 3, 2, 1, 7])).execute()
array([1, 3, 1, 1, 0, 0, 0, 1])
```
The input array needs to be of integer dtype, otherwise a
TypeError is raised:
```pycon
>>> mt.bincount(mt.arange(5, dtype=float)).execute()
Traceback (most recent call last):
....execute()
TypeError: Cannot cast array data from dtype('float64') to dtype('int64')
according to the rule 'safe'
```
A possible use of `bincount` is to perform sums over
variable-size chunks of an array, using the `weights` keyword.
```pycon
>>> w = mt.array([0.3, 0.5, 0.2, 0.7, 1., -0.6]) # weights
>>> x = mt.array([0, 1, 1, 2, 2, 2])
>>> mt.bincount(x, weights=w).execute()
array([ 0.3, 0.7, 1.1])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.bitwise_and.md
# maxframe.tensor.bitwise_and
### maxframe.tensor.bitwise_and(x1, x2, out=None, where=None, \*\*kwargs)
Compute the bit-wise AND of two tensors element-wise.
Computes the bit-wise AND of the underlying binary representation of
the integers in the input arrays. This ufunc implements the C/Python
operator `&`.
* **Parameters:**
* **x1** (*array_like*) – Only integer and boolean types are handled.
* **x2** (*array_like*) – Only integer and boolean types are handled.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Result.
* **Return type:**
array_like
#### SEE ALSO
[`logical_and`](maxframe.tensor.logical_and.md#maxframe.tensor.logical_and), [`bitwise_or`](maxframe.tensor.bitwise_or.md#maxframe.tensor.bitwise_or), [`bitwise_xor`](maxframe.tensor.bitwise_xor.md#maxframe.tensor.bitwise_xor)
### Examples
The number 13 is represented by `00001101`. Likewise, 17 is
represented by `00010001`. The bit-wise AND of 13 and 17 is
therefore `00000001`, or 1:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.bitwise_and(13, 17).execute()
1
```
```pycon
>>> mt.bitwise_and(14, 13).execute()
12
>>> mt.bitwise_and([14,3], 13).execute()
array([12, 1])
```
```pycon
>>> mt.bitwise_and([11,7], [4,25]).execute()
array([0, 1])
>>> mt.bitwise_and(mt.array([2,5,255]), mt.array([3,14,16])).execute()
array([ 2, 4, 16])
>>> mt.bitwise_and([True, True], [False, True]).execute()
array([False, True])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.bitwise_or.md
# maxframe.tensor.bitwise_or
### maxframe.tensor.bitwise_or(x1, x2, out=None, where=None, \*\*kwargs)
Compute the bit-wise OR of two tensors element-wise.
Computes the bit-wise OR of the underlying binary representation of
the integers in the input arrays. This ufunc implements the C/Python
operator `|`.
* **Parameters:**
* **x1** (*array_like*) – Only integer and boolean types are handled.
* **x2** (*array_like*) – Only integer and boolean types are handled.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Result.
* **Return type:**
array_like
#### SEE ALSO
[`logical_or`](maxframe.tensor.logical_or.md#maxframe.tensor.logical_or), [`bitwise_and`](maxframe.tensor.bitwise_and.md#maxframe.tensor.bitwise_and), [`bitwise_xor`](maxframe.tensor.bitwise_xor.md#maxframe.tensor.bitwise_xor)
`binary_repr`
: Return the binary representation of the input number as a string.
### Examples
The number 13 has the binary representation `00001101`. Likewise,
16 is represented by `00010000`. The bit-wise OR of 13 and 16 is
then `00011101`, or 29:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.bitwise_or(13, 16).execute()
29
```
```pycon
>>> mt.bitwise_or(32, 2).execute()
34
>>> mt.bitwise_or([33, 4], 1).execute()
array([33, 5])
>>> mt.bitwise_or([33, 4], [1, 2]).execute()
array([33, 6])
```
```pycon
>>> mt.bitwise_or(mt.array([2, 5, 255]), mt.array([4, 4, 4])).execute()
array([ 6, 5, 255])
>>> (mt.array([2, 5, 255]) | mt.array([4, 4, 4])).execute()
array([ 6, 5, 255])
>>> mt.bitwise_or(mt.array([2, 5, 255, 2147483647], dtype=mt.int32),
... mt.array([4, 4, 4, 2147483647], dtype=mt.int32)).execute()
array([ 6, 5, 255, 2147483647])
>>> mt.bitwise_or([True, True], [False, True]).execute()
array([ True, True])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.bitwise_xor.md
# maxframe.tensor.bitwise_xor
### maxframe.tensor.bitwise_xor(x1, x2, out=None, where=None, \*\*kwargs)
Compute the bit-wise XOR of two arrays element-wise.
Computes the bit-wise XOR of the underlying binary representation of
the integers in the input arrays. This ufunc implements the C/Python
operator `^`.
* **Parameters:**
* **x1** (*array_like*) – Only integer and boolean types are handled.
* **x2** (*array_like*) – Only integer and boolean types are handled.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Result.
* **Return type:**
array_like
#### SEE ALSO
[`logical_xor`](maxframe.tensor.logical_xor.md#maxframe.tensor.logical_xor), [`bitwise_and`](maxframe.tensor.bitwise_and.md#maxframe.tensor.bitwise_and), [`bitwise_or`](maxframe.tensor.bitwise_or.md#maxframe.tensor.bitwise_or)
`binary_repr`
: Return the binary representation of the input number as a string.
### Examples
The number 13 is represented by `00001101`. Likewise, 17 is
represented by `00010001`. The bit-wise XOR of 13 and 17 is
therefore `00011100`, or 28:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.bitwise_xor(13, 17).execute()
28
```
```pycon
>>> mt.bitwise_xor(31, 5).execute()
26
>>> mt.bitwise_xor([31,3], 5).execute()
array([26, 6])
```
```pycon
>>> mt.bitwise_xor([31,3], [5,6]).execute()
array([26, 5])
>>> mt.bitwise_xor([True, True], [False, True]).execute()
array([ True, False])
```
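The binary arithmetic quoted in the `bitwise_and`, `bitwise_or`, and `bitwise_xor` examples can be verified with Python's integer operators `&`, `|`, and `^`, which these functions are documented to implement. The helper `bits` is hypothetical and only formats an integer as an 8-bit string.

```python
def bits(n, width=8):
    """Zero-padded binary representation of a non-negative integer."""
    return format(n, f"0{width}b")


print(bits(13), "&", bits(17), "=", bits(13 & 17), "->", 13 & 17)  # ... 00000001 -> 1
print(bits(13), "|", bits(16), "=", bits(13 | 16), "->", 13 | 16)  # ... 00011101 -> 29
print(bits(13), "^", bits(17), "=", bits(13 ^ 17), "->", 13 ^ 17)  # ... 00011100 -> 28
```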
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.broadcast_arrays.md
# maxframe.tensor.broadcast_arrays
### maxframe.tensor.broadcast_arrays(\*args, \*\*kwargs)
Broadcast any number of arrays against each other.
* **Parameters:**
**\*args** (*array_likes*) – The tensors to broadcast.
* **Returns:**
**broadcasted**
* **Return type:**
[list](https://docs.python.org/3/library/stdtypes.html#list) of tensors
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([[1,2,3]])
>>> y = mt.array([[1],[2],[3]])
>>> mt.broadcast_arrays(x, y).execute()
[array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]), array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])]
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.broadcast_to.md
# maxframe.tensor.broadcast_to
### maxframe.tensor.broadcast_to(tensor, shape)
Broadcast a tensor to a new shape.
* **Parameters:**
* **tensor** (*array_like*) – The tensor to broadcast.
* **shape** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple)) – The shape of the desired array.
* **Returns:**
**broadcast**
* **Return type:**
Tensor
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If the tensor is not compatible with the new shape according to MaxFrame’s
broadcasting rules.
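The compatibility check behind the `ValueError` can be sketched on shapes alone. The code below assumes that MaxFrame's broadcasting rules match NumPy's (shapes are aligned on the right, and each pair of dimensions must be equal or contain a 1); that equivalence is an assumption, and `broadcast_shape` and `can_broadcast_to` are hypothetical helpers.

```python
def broadcast_shape(*shapes):
    """Common shape of the given shapes, or ValueError if they are incompatible."""
    ndim = max((len(s) for s in shapes), default=0)
    result = []
    for axis in range(ndim):
        dim = 1
        for shape in shapes:
            # Align shapes on the right; missing leading axes count as length 1.
            idx = axis - (ndim - len(shape))
            size = shape[idx] if idx >= 0 else 1
            if size != 1:
                if dim != 1 and dim != size:
                    raise ValueError(f"shapes {shapes} are not broadcast-compatible")
                dim = size
        result.append(dim)
    return tuple(result)


def can_broadcast_to(shape, target):
    """True if a tensor of `shape` could be broadcast to exactly `target`."""
    try:
        return broadcast_shape(shape, target) == tuple(target)
    except ValueError:
        return False


print(broadcast_shape((1, 3), (3, 1)))  # (3, 3)
print(can_broadcast_to((3,), (3, 3)))   # True
print(can_broadcast_to((3,), (3, 4)))   # False
```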
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([1, 2, 3])
>>> mt.broadcast_to(x, (3, 3)).execute()
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.c_.md
# maxframe.tensor.c_
### maxframe.tensor.c_ *= <maxframe.tensor.lib.index_tricks.CClass object>*
Translates slice objects to concatenation along the second axis.
This is short-hand for `mt.r_['-1,2,0', index expression]`, which is
useful because of its common occurrence. In particular, tensors will be
stacked along their last axis after being upgraded to at least 2-D with
1’s post-pended to the shape (column vectors made out of 1-D tensors).
#### SEE ALSO
`column_stack`
: Stack 1-D tensors as columns into a 2-D tensor.
[`r_`](maxframe.tensor.r_.md#maxframe.tensor.r_)
: For more detailed documentation.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.c_[mt.array([1,2,3]), mt.array([4,5,6])].execute()
array([[1, 4],
[2, 5],
[3, 6]])
>>> mt.c_[mt.array([[1,2,3]]), 0, 0, mt.array([[4,5,6]])].execute()
array([[1, 2, 3, ..., 4, 5, 6]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.cbrt.md
# maxframe.tensor.cbrt
### maxframe.tensor.cbrt(x, out=None, where=None, \*\*kwargs)
Return the cube-root of a tensor, element-wise.
* **Parameters:**
* **x** (*array_like*) – The values whose cube-roots are required.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – A tensor of the same shape as x, containing the
cube-root of each element in x.
If out was provided, y is a reference to it.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.cbrt([1,8,27]).execute()
array([ 1., 2., 3.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.ceil.md
# maxframe.tensor.ceil
### maxframe.tensor.ceil(x, out=None, where=None, \*\*kwargs)
Return the ceiling of the input, element-wise.
The ceil of the scalar x is the smallest integer i, such that
i >= x. It is often denoted as $\lceil x \rceil$.
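The scalar definition can be tried out with `math.ceil` from the standard library. This is only an illustration of the definition: `math.ceil` returns Python integers, whereas this function is documented to return values with a float dtype.

```python
import math

values = [-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0]
ceilings = [math.ceil(v) for v in values]
print(ceilings)  # [-1, -1, 0, 1, 2, 2, 2]

# Defining property: ceil(x) is the smallest integer i with i >= x,
# so x lies in the half-open interval (i - 1, i].
print(all(c >= v > c - 1 for c, v in zip(ceilings, values)))  # True
```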
* **Parameters:**
* **x** (*array_like*) – Input data.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The ceiling of each element in x, with float dtype.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor), [`trunc`](maxframe.tensor.trunc.md#maxframe.tensor.trunc), [`rint`](maxframe.tensor.rint.md#maxframe.tensor.rint)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
>>> mt.ceil(a).execute()
array([-1., -1., -0., 1., 2., 2., 2.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.choose.md
# maxframe.tensor.choose
### maxframe.tensor.choose(a, choices, out=None, mode='raise')
Construct a tensor from an index tensor and a set of tensors to choose from.
First of all, if confused or uncertain, definitely look at the Examples -
in its full generality, this function is less simple than it might
seem from the following code description (below ndi =
mt.lib.index_tricks):
`mt.choose(a,c) == mt.array([c[a[I]][I] for I in ndi.ndindex(a.shape)])`.
But this omits some subtleties. Here is a fully general summary:
Given an “index” tensor (a) of integers and a sequence of n tensors
(choices), a and each choice tensor are first broadcast, as necessary,
to tensors of a common shape; calling these *Ba* and *Bchoices[i], i =
0,…,n-1* we have that, necessarily, `Ba.shape == Bchoices[i].shape`
for each i. Then, a new array with shape `Ba.shape` is created as
follows:
* if `mode=raise` (the default), then, first of all, each element of
a (and thus Ba) must be in the range [0, n-1]; now, suppose that
i (in that range) is the value at the (j0, j1, …, jm) position
in Ba - then the value at the same position in the new array is the
value in Bchoices[i] at that same position;
* if `mode=wrap`, values in a (and thus Ba) may be any (signed)
integer; modular arithmetic is used to map integers outside the range
[0, n-1] back into that range; and then the new array is constructed
as above;
* if `mode=clip`, values in a (and thus Ba) may be any (signed)
integer; negative integers are mapped to 0; values greater than n-1
are mapped to n-1; and then the new tensor is constructed as above.
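For the common case where `a` and every choice are 1-D sequences of the same length (so no broadcasting is needed), the three modes can be sketched in plain Python. This is an illustration of the rules above, not MaxFrame's implementation; the function below is a hypothetical list-based stand-in.

```python
def choose(a, choices, mode="raise"):
    """Pick choices[a[pos]][pos] for every position, handling out-of-range indices."""
    if mode not in ("raise", "wrap", "clip"):
        raise ValueError(f"unknown mode: {mode!r}")
    n = len(choices)
    result = []
    for pos, i in enumerate(a):
        if mode == "wrap":
            i %= n                     # modular arithmetic maps i into [0, n-1]
        elif mode == "clip":
            i = min(max(i, 0), n - 1)  # negatives -> 0, values above n-1 -> n-1
        elif not 0 <= i < n:
            raise ValueError(f"index {i} is outside [0, {n - 1}]")
        result.append(choices[i][pos])
    return result


choices = [[0, 1, 2, 3], [10, 11, 12, 13],
           [20, 21, 22, 23], [30, 31, 32, 33]]

print(choose([2, 3, 1, 0], choices))               # [20, 31, 12, 3]
print(choose([2, 4, 1, 0], choices, mode="clip"))  # [20, 31, 12, 3]
print(choose([2, 4, 1, 0], choices, mode="wrap"))  # [20, 1, 12, 3]
```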
* **Parameters:**
* **a** (*int tensor*) – This tensor must contain integers in [0, n-1], where n is the number
of choices, unless `mode=wrap` or `mode=clip`, in which cases any
integers are permissible.
* **choices** (*sequence* *of* *tensors*) – Choice tensors. a and all of the choices must be broadcastable to the
same shape. If choices is itself a tensor (not recommended), then
its outermost dimension (i.e., the one corresponding to
`choices.shape[0]`) is taken as defining the “sequence”.
* **out** (*tensor* *,* *optional*) – If provided, the result will be inserted into this tensor. It should
be of the appropriate shape and dtype.
* **mode** ( *{'raise'* *(**default* *)* *,* *'wrap'* *,* *'clip'}* *,* *optional*) –
Specifies how indices outside [0, n-1] will be treated:
> * ’raise’ : an exception is raised
> * ’wrap’ : value becomes value mod n
> * ’clip’ : values < 0 are mapped to 0, values > n-1 are mapped to n-1
* **Returns:**
**merged_array** – The merged result.
* **Return type:**
Tensor
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – shape mismatch: If a and each choice tensor are not all broadcastable to the same
shape.
#### SEE ALSO
`Tensor.choose`
: equivalent method
### Notes
To reduce the chance of misinterpretation, even though the following
“abuse” is nominally supported, choices should neither be, nor be
thought of as, a single tensor, i.e., the outermost sequence-like container
should be either a list or a tuple.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> choices = [[0, 1, 2, 3], [10, 11, 12, 13],
... [20, 21, 22, 23], [30, 31, 32, 33]]
>>> mt.choose([2, 3, 1, 0], choices
... # the first element of the result will be the first element of the
... # third (2+1) "array" in choices, namely, 20; the second element
... # will be the second element of the fourth (3+1) choice array, i.e.,
... # 31, etc.
... ).execute()
array([20, 31, 12, 3])
>>> mt.choose([2, 4, 1, 0], choices, mode='clip').execute() # 4 goes to 3 (4-1)
array([20, 31, 12, 3])
>>> # because there are 4 choice arrays
>>> mt.choose([2, 4, 1, 0], choices, mode='wrap').execute() # 4 goes to (4 mod 4)
array([20, 1, 12, 3])
>>> # i.e., 0
```
A couple of examples illustrating how choose broadcasts:
```pycon
>>> a = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
>>> choices = [-10, 10]
>>> mt.choose(a, choices).execute()
array([[ 10, -10, 10],
[-10, 10, -10],
[ 10, -10, 10]])
```
```pycon
>>> # With thanks to Anne Archibald
>>> a = mt.array([0, 1]).reshape((2,1,1))
>>> c1 = mt.array([1, 2, 3]).reshape((1,3,1))
>>> c2 = mt.array([-1, -2, -3, -4, -5]).reshape((1,1,5))
>>> mt.choose(a, (c1, c2)).execute() # result is 2x3x5, res[0,:,:]=c1, res[1,:,:]=c2
array([[[ 1, 1, 1, 1, 1],
[ 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3]],
[[-1, -2, -3, -4, -5],
[-1, -2, -3, -4, -5],
[-1, -2, -3, -4, -5]]])
```
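The three `mode` settings described above can be sketched in plain Python. This is a minimal illustration of the index handling only, assuming equal-length 1-D inputs (no broadcasting); the helper name `choose_1d` is hypothetical and is not part of MaxFrame.

```python
def choose_1d(index, choices, mode="raise"):
    """Pure-Python sketch of choose for equal-length 1-D inputs (no broadcasting)."""
    n = len(choices)
    result = []
    for pos, i in enumerate(index):
        if mode == "raise":
            if not 0 <= i < n:
                raise ValueError(f"index {i} is outside [0, {n - 1}]")
        elif mode == "wrap":
            i %= n  # Python's % maps any integer into [0, n-1] for n > 0
        elif mode == "clip":
            i = min(max(i, 0), n - 1)  # negatives go to 0, large values to n-1
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # Pick the element at the same position from the selected choice.
        result.append(choices[i][pos])
    return result


choices = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
print(choose_1d([2, 3, 1, 0], choices))               # [20, 31, 12, 3]
print(choose_1d([2, 4, 1, 0], choices, mode="clip"))  # [20, 31, 12, 3]
print(choose_1d([2, 4, 1, 0], choices, mode="wrap"))  # [20, 1, 12, 3]
```

The sketch reproduces the first three results shown in the examples; the real function additionally broadcasts `a` and every choice to a common shape first.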
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.compress.md
# maxframe.tensor.compress
### maxframe.tensor.compress(condition, a, axis=None, out=None)
Return selected slices of a tensor along given axis.
When working along a given axis, a slice along that axis is returned in
output for each index where condition evaluates to True. When
working on a 1-D array, compress is equivalent to extract.
* **Parameters:**
* **condition** (*1-D tensor* *of* *bools*) – Tensor that selects which entries to return. If len(condition)
is less than the size of a along the given axis, then output is
truncated to the length of the condition tensor.
* **a** (*array_like*) – Tensor from which to extract a part.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which to take slices. If None (default), work on the
flattened tensor.
* **out** (*Tensor* *,* *optional*) – Output tensor. Its type is preserved and it must be of the right
shape to hold the output.
* **Returns:**
**compressed_array** – A copy of a without the slices along axis for which condition
is false.
* **Return type:**
Tensor
#### SEE ALSO
[`take`](maxframe.tensor.take.md#maxframe.tensor.take), [`choose`](maxframe.tensor.choose.md#maxframe.tensor.choose), [`diag`](maxframe.tensor.diag.md#maxframe.tensor.diag), `diagonal`, `select`
`Tensor.compress`
: Equivalent method in ndarray
`mt.extract`
: Equivalent method when working on 1-D arrays
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, 2], [3, 4], [5, 6]])
>>> a.execute()
array([[1, 2],
[3, 4],
[5, 6]])
>>> mt.compress([0, 1], a, axis=0).execute()
array([[3, 4]])
>>> mt.compress([False, True, True], a, axis=0).execute()
array([[3, 4],
[5, 6]])
>>> mt.compress([False, True], a, axis=1).execute()
array([[2],
[4],
[6]])
```
Working on the flattened tensor does not return slices along an axis but
selects elements.
```pycon
>>> mt.compress([False, True], a).execute()
array([2])
```
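The truncation rule (a condition shorter than the axis only selects among the leading slices) and the flattened case can be illustrated with a small pure-Python sketch. The helper names `compress_rows` and `compress_flat` are hypothetical and only model a 2-D nested list.

```python
def compress_rows(condition, rows):
    """Keep rows[i] where condition[i] is truthy (axis=0).

    zip() stops at the shorter input, which models the truncation to
    len(condition) described above.
    """
    return [row for keep, row in zip(condition, rows) if keep]


def compress_flat(condition, rows):
    """axis=None: flatten first, then select individual elements."""
    flat = [value for row in rows for value in row]
    return [value for keep, value in zip(condition, flat) if keep]


a = [[1, 2], [3, 4], [5, 6]]
print(compress_rows([0, 1], a))               # [[3, 4]]
print(compress_rows([False, True, True], a))  # [[3, 4], [5, 6]]
print(compress_flat([False, True], a))        # [2]
```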
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.concatenate.md
# maxframe.tensor.concatenate
### maxframe.tensor.concatenate(tensors, axis=0)
Join a sequence of arrays along an existing axis.
* **Parameters:**
* **tensors** (*sequence* *of* *array_like*) – The tensors must have the same shape, except in the dimension
corresponding to axis (the first, by default).
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis along which the tensors will be joined. Default is 0.
* **Returns:**
**res** – The concatenated tensor.
* **Return type:**
Tensor
#### SEE ALSO
`stack`
: Stack a sequence of tensors along a new axis.
[`vstack`](maxframe.tensor.vstack.md#maxframe.tensor.vstack)
: Stack tensors in sequence vertically (row wise)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, 2], [3, 4]])
>>> b = mt.array([[5, 6]])
>>> mt.concatenate((a, b), axis=0).execute()
array([[1, 2],
[3, 4],
[5, 6]])
>>> mt.concatenate((a, b.T), axis=1).execute()
array([[1, 2, 5],
[3, 4, 6]])
```
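The shape rule (all dimensions must match except the one being joined) can be sketched for 2-D nested lists as follows. The function `concatenate_2d` is a hypothetical stand-in written for illustration, not MaxFrame code.

```python
def concatenate_2d(blocks, axis=0):
    """Join 2-D nested lists along an existing axis, checking the other dimension."""
    if axis == 0:
        # Rows are stacked, so every row must have the same width.
        widths = {len(row) for block in blocks for row in block}
        if len(widths) != 1:
            raise ValueError("all inputs must have the same number of columns")
        return [list(row) for block in blocks for row in block]
    if axis == 1:
        # Rows are extended, so every block must have the same height.
        heights = {len(block) for block in blocks}
        if len(heights) != 1:
            raise ValueError("all inputs must have the same number of rows")
        return [[v for block in blocks for v in block[i]]
                for i in range(heights.pop())]
    raise ValueError("this sketch only supports axis 0 or 1")


a = [[1, 2], [3, 4]]
b = [[5, 6]]
b_t = [[5], [6]]  # transpose of b
print(concatenate_2d((a, b), axis=0))    # [[1, 2], [3, 4], [5, 6]]
print(concatenate_2d((a, b_t), axis=1))  # [[1, 2, 5], [3, 4, 6]]
```

Joining `a` and `b` along `axis=1` fails in this sketch because the row counts differ (2 versus 1), which is why the example above transposes `b` first.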
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.conj.md
# maxframe.tensor.conj
### maxframe.tensor.conj(x, out=None, where=None, \*\*kwargs)
Return the complex conjugate, element-wise.
The complex conjugate of a complex number is obtained by changing the
sign of its imaginary part.
* **Parameters:**
* **x** (*array_like*) – Input value.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The complex conjugate of x, with the same dtype as x.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.conjugate(1+2j).execute()
(1-2j)
```
```pycon
>>> x = mt.eye(2) + 1j * mt.eye(2)
>>> mt.conjugate(x).execute()
array([[ 1.-1.j, 0.-0.j],
[ 0.-0.j, 1.-1.j]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.copysign.md
# maxframe.tensor.copysign
### maxframe.tensor.copysign(x1, x2, out=None, where=None, \*\*kwargs)
Change the sign of x1 to that of x2, element-wise.
If both arguments are arrays or sequences, they have to be of the same
length. If x2 is a scalar, its sign will be copied to all elements of
x1.
* **Parameters:**
* **x1** (*array_like*) – Values to change the sign of.
* **x2** (*array_like*) – The sign of x2 is copied to x1.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – The values of x1 with the sign of x2.
* **Return type:**
array_like
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.copysign(1.3, -1).execute()
-1.3
>>> (1/mt.copysign(0, 1)).execute()
inf
>>> (1/mt.copysign(0, -1)).execute()
-inf
```
```pycon
>>> mt.copysign([-1, 0, 1], -1.1).execute()
array([-1., -0., -1.])
>>> mt.copysign([-1, 0, 1], mt.arange(3)-1).execute()
array([-1., 0., 1.])
```
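The `-0.` entries in the output above are IEEE 754 negative zeros: they compare equal to zero but carry a sign bit. A short sketch using Python's standard `math.copysign` (not MaxFrame) shows how to observe that sign; the helper `sign_bit_is_set` is a hypothetical name.

```python
import math


def sign_bit_is_set(value):
    """Return True when value carries a negative sign, including -0.0."""
    return math.copysign(1.0, value) < 0


neg_zero = math.copysign(0.0, -1.0)
print(neg_zero == 0.0)                           # True: equal to zero...
print(sign_bit_is_set(neg_zero))                 # True: ...but the sign is kept
print(sign_bit_is_set(math.copysign(0.0, 1.0)))  # False
print(math.copysign(1.3, -1))                    # -1.3
```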
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.copyto.md
# maxframe.tensor.copyto
### maxframe.tensor.copyto(dst, src, casting='same_kind', where=True)
Copies values from one array to another, broadcasting as necessary.
Raises a TypeError if the casting rule is violated, and if
where is provided, it selects which elements to copy.
* **Parameters:**
* **dst** (*Tensor*) – The tensor into which values are copied.
* **src** (*array_like*) – The tensor from which values are copied.
* **casting** ( *{'no'* *,* *'equiv'* *,* *'safe'* *,* *'same_kind'* *,* *'unsafe'}* *,* *optional*) –
Controls what kind of data casting may occur when copying.
> * ’no’ means the data types should not be cast at all.
> * ’equiv’ means only byte-order changes are allowed.
> * ’safe’ means only casts which can preserve values are allowed.
> * ’same_kind’ means only safe casts or casts within a kind,
> like float64 to float32, are allowed.
> * ’unsafe’ means any data conversions may be done.
* **where** (*array_like* *of* [*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – A boolean tensor which is broadcast to match the dimensions
of dst, and selects elements to copy from src to dst
wherever it contains the value True.
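The effect of `where` can be sketched on plain Python lists. This is a simplified 1-D model under stated assumptions: the hypothetical helper `copyto_1d` mutates a list in place and only broadcasts scalars, whereas the real function operates on tensors and applies the casting rule.

```python
def copyto_1d(dst, src, where=True):
    """Copy src into dst wherever the mask is truthy (1-D sketch, in place)."""
    n = len(dst)
    # Broadcast scalar src / where to the length of dst.
    src_b = src if isinstance(src, list) else [src] * n
    mask = where if isinstance(where, list) else [where] * n
    for i in range(n):
        if mask[i]:
            dst[i] = src_b[i]


dst = [1, 2, 3, 4]
copyto_1d(dst, 0, where=[True, False, True, False])
print(dst)  # [0, 2, 0, 4]
```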
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.core.Tensor.T.md
# maxframe.tensor.core.Tensor.T
#### *property* Tensor.T
Same as self.transpose(), except that self is returned if
self.ndim < 2.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([[1.,2.],[3.,4.]])
>>> x.execute()
array([[ 1., 2.],
[ 3., 4.]])
>>> x.T.execute()
array([[ 1., 3.],
[ 2., 4.]])
>>> x = mt.array([1.,2.,3.,4.])
>>> x.execute()
array([ 1., 2., 3., 4.])
>>> x.T.execute()
array([ 1., 2., 3., 4.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.core.Tensor.flatten.md
# maxframe.tensor.core.Tensor.flatten
#### Tensor.flatten(order='C')
Return a copy of the tensor collapsed into one dimension.
* **Parameters:**
**order** ( *{'C'* *,* *'F'* *,* *'A'* *,* *'K'}* *,* *optional*) – ‘C’ means to flatten in row-major (C-style) order.
‘F’ means to flatten in column-major (Fortran-style)
order. ‘A’ means to flatten in column-major
order if a is Fortran *contiguous* in memory,
row-major order otherwise. ‘K’ means to flatten
a in the order the elements occur in memory.
The default is ‘C’.
* **Returns:**
**y** – A copy of the input tensor, flattened to one dimension.
* **Return type:**
Tensor
#### SEE ALSO
`ravel`
: Return a flattened tensor.
`flat`
: A 1-D flat iterator over the tensor.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1,2], [3,4]])
>>> a.flatten().execute()
array([1, 2, 3, 4])
```
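The difference between the `'C'` and `'F'` orders can be seen in a small pure-Python sketch for a 2-D nested list. The helper `flatten_2d` is hypothetical and does not model the memory-layout-dependent `'A'` and `'K'` options.

```python
def flatten_2d(rows, order="C"):
    """Flatten a 2-D nested list in row-major ('C') or column-major ('F') order."""
    if order == "C":
        return [value for row in rows for value in row]
    if order == "F":
        return [row[j] for j in range(len(rows[0])) for row in rows]
    raise ValueError("this sketch only supports order 'C' or 'F'")


a = [[1, 2], [3, 4]]
print(flatten_2d(a))             # [1, 2, 3, 4]
print(flatten_2d(a, order="F"))  # [1, 3, 2, 4]
```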
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.corrcoef.md
# maxframe.tensor.corrcoef
### maxframe.tensor.corrcoef(x, y=None, rowvar=True)
Return Pearson product-moment correlation coefficients.
Please refer to the documentation for cov for more detail. The
relationship between the correlation coefficient matrix, R, and the
covariance matrix, C, is
$$
R_{ij} = \frac{ C_{ij} } { \sqrt{ C_{ii} * C_{jj} } }
$$
The values of R are between -1 and 1, inclusive.
* **Parameters:**
* **x** (*array_like*) – A 1-D or 2-D array containing multiple variables and observations.
Each row of x represents a variable, and each column a single
observation of all those variables. Also see rowvar below.
* **y** (*array_like* *,* *optional*) – An additional set of variables and observations. y has the same
shape as x.
* **rowvar** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If rowvar is True (default), then each row represents a
variable, with observations in the columns. Otherwise, the relationship
is transposed: each column represents a variable, while the rows
contain observations.
* **Returns:**
**R** – The correlation coefficient matrix of the variables.
* **Return type:**
Tensor
#### SEE ALSO
[`cov`](maxframe.tensor.cov.md#maxframe.tensor.cov)
: Covariance matrix
### Notes
Due to floating point rounding the resulting tensor may not be Hermitian,
the diagonal elements may not be 1, and the elements may not satisfy the
inequality abs(a) <= 1. The real and imaginary parts are clipped to the
interval [-1, 1] in an attempt to improve on that situation, but this is not
much help in the complex case.
This function accepts but discards arguments bias and ddof. This is
for backwards compatibility with previous versions of this function. These
arguments had no effect on the return values of the function and can be
safely ignored in this and previous versions of numpy.
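The relationship between R and C given above can be checked with a few lines of plain Python. This is only a sketch of the formula for a real-valued covariance matrix held as a nested list; `corrcoef_from_cov` is a hypothetical helper name.

```python
import math


def corrcoef_from_cov(c):
    """Apply R_ij = C_ij / sqrt(C_ii * C_jj) to a square nested list."""
    n = len(c)
    return [[c[i][j] / math.sqrt(c[i][i] * c[j][j]) for j in range(n)]
            for i in range(n)]


# Covariance of two perfectly anti-correlated variables.
print(corrcoef_from_cov([[1.0, -1.0], [-1.0, 1.0]]))  # [[1.0, -1.0], [-1.0, 1.0]]

# Variances 4 and 9 with covariance 2 give a correlation of 2 / (2 * 3).
r = corrcoef_from_cov([[4.0, 2.0], [2.0, 9.0]])
print(math.isclose(r[0][1], 1 / 3))  # True
```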
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.cos.md
# maxframe.tensor.cos
### maxframe.tensor.cos(x, out=None, where=None, \*\*kwargs)
Cosine element-wise.
* **Parameters:**
* **x** (*array_like*) – Input tensor in radians.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated array is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The corresponding cosine values.
* **Return type:**
Tensor
### Notes
If out is provided, the function writes the result into it,
and returns a reference to out. (See Examples)
### References
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions.
New York, NY: Dover, 1972.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.cos(mt.array([0, mt.pi/2, mt.pi])).execute()
array([ 1.00000000e+00, 6.12303177e-17, -1.00000000e+00])
>>>
>>> # Example of providing the optional output parameter
>>> out1 = mt.empty(1)
>>> out2 = mt.cos([0.1], out1)
>>> out2 is out1
True
>>>
>>> # Example of ValueError due to provision of shape mis-matched `out`
>>> mt.cos(mt.zeros((3,3)),mt.zeros((2,2)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operators could not be broadcast together with shapes (3,3) (2,2)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.cosh.md
# maxframe.tensor.cosh
### maxframe.tensor.cosh(x, out=None, where=None, \*\*kwargs)
Hyperbolic cosine, element-wise.
Equivalent to `1/2 * (mt.exp(x) + mt.exp(-x))` and `mt.cos(1j*x)`.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Output array of same shape as x.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.cosh(0).execute()
1.0
```
The hyperbolic cosine describes the shape of a hanging cable:
```pycon
>>> import matplotlib.pyplot as plt
>>> x = mt.linspace(-4, 4, 1000)
>>> plt.plot(x.execute(), mt.cosh(x).execute())
>>> plt.show()
```
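The two equivalences stated in the description can be verified numerically with Python's standard `math` and `cmath` modules. This sketch checks scalars only and does not use MaxFrame; `cosh_from_exp` is a hypothetical helper name.

```python
import cmath
import math


def cosh_from_exp(x):
    """cosh(x) written as 1/2 * (exp(x) + exp(-x))."""
    return 0.5 * (math.exp(x) + math.exp(-x))


x = 1.5
print(math.isclose(cosh_from_exp(x), math.cosh(x)))        # True
print(math.isclose(cmath.cos(1j * x).real, math.cosh(x)))  # True
```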
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.count_nonzero.md
# maxframe.tensor.count_nonzero
### maxframe.tensor.count_nonzero(a, axis=None)
Counts the number of non-zero values in the tensor `a`.
The word “non-zero” is in reference to the Python 2.x
built-in method `__nonzero__()` (renamed `__bool__()`
in Python 3.x) of Python objects that tests an object’s
“truthfulness”. For example, any number is considered
truthful if it is nonzero, whereas any string is considered
truthful if it is not the empty string. Thus, this function
(recursively) counts how many elements in `a` (and in
sub-tensors thereof) have their `__nonzero__()` or `__bool__()`
method evaluated to `True`.
* **Parameters:**
* **a** (*array_like*) – The tensor for which to count non-zeros.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *,* *optional*) – Axis or tuple of axes along which to count non-zeros.
Default is None, meaning that non-zeros will be counted
along a flattened version of `a`.
* **Returns:**
**count** – Number of non-zero values in the array along a given axis.
Otherwise, the total number of non-zero values in the tensor
is returned.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int) or tensor of [int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`nonzero`](maxframe.tensor.nonzero.md#maxframe.tensor.nonzero)
: Return the coordinates of all the non-zero values.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.count_nonzero(mt.eye(4)).execute()
4
>>> mt.count_nonzero([[0,1,7,0,0],[3,0,0,2,19]]).execute()
5
>>> mt.count_nonzero([[0,1,7,0,0],[3,0,0,2,19]], axis=0).execute()
array([1, 1, 1, 1, 1])
>>> mt.count_nonzero([[0,1,7,0,0],[3,0,0,2,19]], axis=1).execute()
array([2, 3])
```
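The truthiness rule described above, and the meaning of `axis`, can be sketched with plain Python lists. The helpers below are hypothetical names written for illustration; they rely on Python's `bool()` and handle only nested lists.

```python
def count_truthy(obj):
    """Recursively count elements whose bool() is True (axis=None)."""
    if isinstance(obj, (list, tuple)):
        return sum(count_truthy(item) for item in obj)
    return 1 if bool(obj) else 0


def count_truthy_axis0(rows):
    """Count down each column of a 2-D nested list (axis=0)."""
    return [sum(1 for row in rows if row[j]) for j in range(len(rows[0]))]


def count_truthy_axis1(rows):
    """Count along each row of a 2-D nested list (axis=1)."""
    return [sum(1 for value in row if value) for row in rows]


data = [[0, 1, 7, 0, 0], [3, 0, 0, 2, 19]]
print(count_truthy(data))          # 5
print(count_truthy_axis0(data))    # [1, 1, 1, 1, 1]
print(count_truthy_axis1(data))    # [2, 3]
print(count_truthy(["", "a", 0]))  # 1: only the non-empty string is truthy
```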
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.cov.md
# maxframe.tensor.cov
### maxframe.tensor.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)
Estimate a covariance matrix, given data and weights.
Covariance indicates the level to which two variables vary together.
If we examine N-dimensional samples, $X = [x_1, x_2, ... x_N]^T$,
then the covariance matrix element $C_{ij}$ is the covariance of
$x_i$ and $x_j$. The element $C_{ii}$ is the variance
of $x_i$.
See the notes for an outline of the algorithm.
* **Parameters:**
* **m** (*array_like*) – A 1-D or 2-D array containing multiple variables and observations.
Each row of m represents a variable, and each column a single
observation of all those variables. Also see rowvar below.
* **y** (*array_like* *,* *optional*) – An additional set of variables and observations. y has the same form
as that of m.
* **rowvar** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If rowvar is True (default), then each row represents a
variable, with observations in the columns. Otherwise, the relationship
is transposed: each column represents a variable, while the rows
contain observations.
* **bias** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Default normalization (False) is by `(N - 1)`, where `N` is the
number of observations given (unbiased estimate). If bias is True,
then normalization is by `N`. These values can be overridden by using
the keyword `ddof` in numpy versions >= 1.5.
* **ddof** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – If not `None` the default value implied by bias is overridden.
Note that `ddof=1` will return the unbiased estimate, even if both
fweights and aweights are specified, and `ddof=0` will return
the simple average. See the notes for the details. The default value
is `None`.
* **fweights** (*array_like* *,* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – 1-D tensor of integer frequency weights; the number of times each
observation vector should be repeated.
* **aweights** (*array_like* *,* *optional*) – 1-D tensor of observation vector weights. These relative weights are
typically large for observations considered “important” and smaller for
observations considered less “important”. If `ddof=0` the array of
weights can be used to assign probabilities to observation vectors.
* **Returns:**
**out** – The covariance matrix of the variables.
* **Return type:**
Tensor
#### SEE ALSO
[`corrcoef`](maxframe.tensor.corrcoef.md#maxframe.tensor.corrcoef)
: Normalized covariance matrix
### Notes
Assume that the observations are in the columns of the observation
array m and let `f = fweights` and `a = aweights` for brevity. The
steps to compute the weighted covariance are as follows:
```pycon
>>> w = f * a
>>> v1 = mt.sum(w)
>>> v2 = mt.sum(w * a)
>>> m -= mt.sum(m * w, axis=1, keepdims=True) / v1
>>> cov = mt.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)
```
Note that when `a == 1`, the normalization factor
`v1 / (v1**2 - ddof * v2)` goes over to `1 / (mt.sum(f) - ddof)`
as it should.
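The steps above can be followed end to end in plain Python. The sketch below is a minimal illustration on nested lists, assuming variables in rows and observations in columns; `weighted_cov` is a hypothetical helper, and its default `ddof=1` is an assumption chosen to match the unbiased estimate.

```python
def weighted_cov(m, fweights=None, aweights=None, ddof=1):
    """Weighted covariance of the rows of m, following the steps in the Notes."""
    n_obs = len(m[0])
    f = fweights if fweights is not None else [1] * n_obs
    a = aweights if aweights is not None else [1.0] * n_obs
    w = [fi * ai for fi, ai in zip(f, a)]          # w = f * a
    v1 = sum(w)                                    # v1 = sum(w)
    v2 = sum(wi * ai for wi, ai in zip(w, a))      # v2 = sum(w * a)
    centered = []
    for row in m:                                  # subtract the weighted mean
        mean = sum(x * wi for x, wi in zip(row, w)) / v1
        centered.append([x - mean for x in row])
    norm = v1 / (v1 ** 2 - ddof * v2)
    return [[norm * sum(xi * wi * xj for xi, wi, xj in zip(ri, w, rj))
             for rj in centered] for ri in centered]


x = [[0, 1, 2], [2, 1, 0]]
print(weighted_cov(x))  # [[1.0, -1.0], [-1.0, 1.0]]
```

With integer `fweights`, the result should agree with physically repeating the corresponding observations, which gives a quick sanity check of the normalization factor.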
### Examples
Consider two variables, $x_0$ and $x_1$, which
correlate perfectly, but in opposite directions:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([[0, 2], [1, 1], [2, 0]]).T
>>> x.execute()
array([[0, 1, 2],
[2, 1, 0]])
```
Note how $x_0$ increases while $x_1$ decreases. The covariance
matrix shows this clearly:
```pycon
>>> mt.cov(x).execute()
array([[ 1., -1.],
[-1., 1.]])
```
Note that element $C_{0,1}$, which shows the correlation between
$x_0$ and $x_1$, is negative.
Further, note how x and y are combined:
```pycon
>>> x = [-2.1, -1, 4.3]
>>> y = [3, 1.1, 0.12]
>>> X = mt.stack((x, y), axis=0)
>>> print(mt.cov(X).execute())
[[ 11.71 -4.286 ]
[ -4.286 2.14413333]]
>>> print(mt.cov(x, y).execute())
[[ 11.71 -4.286 ]
[ -4.286 2.14413333]]
>>> print(mt.cov(x).execute())
11.71
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.cumprod.md
# maxframe.tensor.cumprod
### maxframe.tensor.cumprod(a, axis=None, dtype=None, out=None)
Return the cumulative product of elements along a given axis.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the cumulative product is computed. By default
the input is flattened.
* **dtype** (*dtype* *,* *optional*) – Type of the returned tensor, as well as of the accumulator in which
the elements are multiplied. If *dtype* is not specified, it
defaults to the dtype of a, unless a has an integer dtype with
a precision less than that of the default platform integer. In
that case, the default platform integer is used instead.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must
have the same shape and buffer length as the expected output
but the type of the resulting values will be cast if necessary.
* **Returns:**
**cumprod** – A new tensor holding the result is returned unless out is
specified, in which case a reference to out is returned.
* **Return type:**
Tensor
#### SEE ALSO
`numpy.doc.ufuncs`
: Section “Output arguments”
### Notes
Arithmetic is modular when using integer types, and no error is
raised on overflow.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([1,2,3])
>>> mt.cumprod(a).execute() # intermediate results 1, 1*2
... # total product 1*2*3 = 6
array([1, 2, 6])
>>> a = mt.array([[1, 2, 3], [4, 5, 6]])
>>> mt.cumprod(a, dtype=float).execute() # specify type of output
array([ 1., 2., 6., 24., 120., 720.])
```
The cumulative product for each column (i.e., over the rows) of a:
```pycon
>>> mt.cumprod(a, axis=0).execute()
array([[ 1, 2, 3],
[ 4, 10, 18]])
```
The cumulative product for each row (i.e. over the columns) of a:
```pycon
>>> mt.cumprod(a,axis=1).execute()
array([[ 1, 2, 6],
[ 4, 20, 120]])
```
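The note on modular arithmetic means that an integer product that exceeds the range of its fixed-width dtype wraps around silently. The sketch below models that behavior in plain Python, assuming two's-complement signed integers; `wrap_signed` and `cumprod_wrapped` are hypothetical helpers, and the 8-bit width is chosen only to keep the numbers small.

```python
def wrap_signed(value, bits=64):
    """Reduce an integer to the range of a signed two's-complement type."""
    modulus = 1 << bits
    value %= modulus
    return value - modulus if value >= modulus >> 1 else value


def cumprod_wrapped(values, bits=64):
    """Cumulative product with wrap-around on overflow, as for fixed-width ints."""
    result, running = [], 1
    for v in values:
        running = wrap_signed(running * v, bits)
        result.append(running)
    return result


print(cumprod_wrapped([1, 2, 3]))         # [1, 2, 6]
print(cumprod_wrapped([100, 2], bits=8))  # [100, -56]: 200 wraps to 200 - 256
```

Passing a wider `dtype` (or a floating-point one, as in the `dtype=float` example above) is the usual way to avoid such wrap-around.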
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.cumsum.md
# maxframe.tensor.cumsum
### maxframe.tensor.cumsum(a, axis=None, dtype=None, out=None)
Return the cumulative sum of the elements along a given axis.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the cumulative sum is computed. The default
(None) is to compute the cumsum over the flattened tensor.
* **dtype** (*dtype* *,* *optional*) – Type of the returned tensor and of the accumulator in which the
elements are summed. If dtype is not specified, it defaults
to the dtype of a, unless a has an integer dtype with a
precision less than that of the default platform integer. In
that case, the default platform integer is used.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must
have the same shape and buffer length as the expected output
but the type will be cast if necessary. See doc.ufuncs
(Section “Output arguments”) for more details.
* **Returns:**
**cumsum_along_axis** – A new tensor holding the result is returned unless out is
specified, in which case a reference to out is returned. The
result has the same size as a, and the same shape as a if
axis is not None or a is a 1-d tensor.
* **Return type:**
Tensor
#### SEE ALSO
[`sum`](maxframe.tensor.sum.md#maxframe.tensor.sum)
: Sum tensor elements.
`trapz`
: Integration of tensor values using the composite trapezoidal rule.
[`diff`](maxframe.tensor.diff.md#maxframe.tensor.diff)
: Calculate the n-th discrete difference along given axis.
### Notes
Arithmetic is modular when using integer types, and no error is
raised on overflow.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1,2,3], [4,5,6]])
>>> a.execute()
array([[1, 2, 3],
[4, 5, 6]])
>>> mt.cumsum(a).execute()
array([ 1, 3, 6, 10, 15, 21])
>>> mt.cumsum(a, dtype=float).execute() # specifies type of output value(s)
array([ 1., 3., 6., 10., 15., 21.])
```
```pycon
>>> mt.cumsum(a,axis=0).execute() # sum over rows for each of the 3 columns
array([[1, 2, 3],
[5, 7, 9]])
>>> mt.cumsum(a,axis=1).execute() # sum over columns for each of the 2 rows
array([[ 1, 3, 6],
[ 4, 9, 15]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.deg2rad.md
# maxframe.tensor.deg2rad
### maxframe.tensor.deg2rad(x, out=None, where=None, \*\*kwargs)
Convert angles from degrees to radians.
* **Parameters:**
* **x** (*array_like*) – Angles in degrees.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The corresponding angle in radians.
* **Return type:**
Tensor
#### SEE ALSO
[`rad2deg`](maxframe.tensor.rad2deg.md#maxframe.tensor.rad2deg)
: Convert angles from radians to degrees.
`unwrap`
: Remove large jumps in angle by wrapping.
### Notes
`deg2rad(x)` is `x * pi / 180`.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.deg2rad(180).execute()
3.1415926535897931
```
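The formula in the Notes can be checked against Python's standard `math.radians`. This is a scalar sketch that does not use MaxFrame; the local function `deg2rad` is written here only to spell out `x * pi / 180`.

```python
import math


def deg2rad(x):
    """Degrees to radians, written out as x * pi / 180."""
    return x * math.pi / 180


print(math.isclose(deg2rad(180), math.pi))          # True
print(math.isclose(deg2rad(90), math.radians(90)))  # True
```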
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.degrees.md
# maxframe.tensor.degrees
### maxframe.tensor.degrees(x, out=None, where=None, \*\*kwargs)
Convert angles from radians to degrees.
* **Parameters:**
* **x** (*array_like*) – Input tensor in radians.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The corresponding degree values; if out was supplied this is a
reference to it.
* **Return type:**
Tensor of floats
#### SEE ALSO
[`rad2deg`](maxframe.tensor.rad2deg.md#maxframe.tensor.rad2deg)
: equivalent function
### Examples
Convert a radian array to degrees
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> rad = mt.arange(12.)*mt.pi/6
>>> mt.degrees(rad).execute()
array([ 0., 30., 60., 90., 120., 150., 180., 210., 240.,
270., 300., 330.])
```
```pycon
>>> out = mt.zeros((rad.shape))
>>> r = mt.degrees(rad, out)
>>> mt.all(r == out).execute()
True
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.delete.md
# maxframe.tensor.delete
### maxframe.tensor.delete(arr, obj, axis=None)
Return a new array with sub-arrays along an axis deleted. For a one
dimensional array, this returns those entries not returned by
arr[obj].
* **Parameters:**
* **arr** (*array_like*) – Input array.
* **obj** ([*slice*](https://docs.python.org/3/library/functions.html#slice) *,* [*int*](https://docs.python.org/3/library/functions.html#int) *or* *array* *of* *ints*) – Indicate indices of sub-arrays to remove along the specified axis.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis along which to delete the subarray defined by obj.
If axis is None, obj is applied to the flattened array.
* **Returns:**
**out** – A copy of arr with the elements specified by obj removed. Note
that delete does not occur in-place. If axis is None, out is
a flattened array.
* **Return type:**
maxframe.tensor.Tensor
### Examples
```pycon
>>> import numpy as np
>>> import maxframe.tensor as mt
>>> arr = mt.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
>>> arr.execute()
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> mt.delete(arr, 1, 0).execute()
array([[ 1, 2, 3, 4],
[ 9, 10, 11, 12]])
>>> mt.delete(arr, np.s_[::2], 1).execute()
array([[ 2, 4],
[ 6, 8],
[10, 12]])
>>> mt.delete(arr, [1,3,5], None).execute()
array([ 1, 3, 5, 7, 8, 9, 10, 11, 12])
```
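For the one-dimensional (or flattened) case, "those entries not returned by arr[obj]" can be sketched in plain Python. The helper `delete_flat` is hypothetical and accepts an int, a slice, or a list of ints; it returns a new list and leaves the input untouched, mirroring the note that deletion does not occur in place.

```python
def delete_flat(values, obj):
    """Return a new list without the positions selected by obj."""
    positions = range(len(values))
    if isinstance(obj, slice):
        removed = set(positions[obj])
    elif isinstance(obj, int):
        removed = {positions[obj]}  # raises IndexError when out of range
    else:
        removed = {positions[i] for i in obj}
    return [v for i, v in enumerate(values) if i not in removed]


flat = list(range(1, 13))  # the flattened 3x4 example tensor
print(delete_flat(flat, [1, 3, 5]))                      # [1, 3, 5, 7, 8, 9, 10, 11, 12]
print(delete_flat([1, 2, 3, 4], slice(None, None, 2)))   # [2, 4]
```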
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.diag.md
# maxframe.tensor.diag
### maxframe.tensor.diag(v, k=0, sparse=None, gpu=None, chunk_size=None)
Extract a diagonal or construct a diagonal tensor.
See the more detailed documentation for `mt.diagonal` if you use this
function to extract a diagonal and wish to write to the resulting tensor.
* **Parameters:**
* **v** (*array_like*) – If v is a 2-D tensor, return its k-th diagonal.
If v is a 1-D tensor, return a 2-D tensor with v on the k-th
diagonal.
* **k** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Diagonal in question. The default is 0. Use k>0 for diagonals
above the main diagonal, and k<0 for diagonals below the main
diagonal.
* **sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Create sparse tensor if True, False as default
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **Returns:**
**out** – The extracted diagonal or constructed diagonal tensor.
* **Return type:**
Tensor
#### SEE ALSO
`diagonal`
: Return specified diagonals.
[`diagflat`](maxframe.tensor.diagflat.md#maxframe.tensor.diagflat)
: Create a 2-D array with the flattened input as a diagonal.
`trace`
: Sum along diagonals.
[`triu`](maxframe.tensor.triu.md#maxframe.tensor.triu)
: Upper triangle of a tensor.
[`tril`](maxframe.tensor.tril.md#maxframe.tensor.tril)
: Lower triangle of a tensor.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(9).reshape((3,3))
>>> x.execute()
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
```
```pycon
>>> mt.diag(x).execute()
array([0, 4, 8])
>>> mt.diag(x, k=1).execute()
array([1, 5])
>>> mt.diag(x, k=-1).execute()
array([3, 7])
```
```pycon
>>> mt.diag(mt.diag(x)).execute()
array([[0, 0, 0],
[0, 4, 0],
[0, 0, 8]])
```
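How `k` selects a diagonal of a 2-D input can be sketched with nested lists. The helper name `diag_of` is hypothetical, and the code only illustrates the indexing rule (element `[i][i + k]`), not how MaxFrame computes it.

```python
def diag_of(matrix, k=0):
    """Return the k-th diagonal of a 2-D nested list."""
    n_cols = len(matrix[0])
    # Row i contributes column i + k, when that column exists.
    return [row[i + k] for i, row in enumerate(matrix) if 0 <= i + k < n_cols]


x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(diag_of(x))        # [0, 4, 8]
print(diag_of(x, k=1))   # [1, 5]
print(diag_of(x, k=-1))  # [3, 7]
```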
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.diagflat.md
# maxframe.tensor.diagflat
### maxframe.tensor.diagflat(v, k=0, sparse=None, gpu=None, chunk_size=None)
Create a two-dimensional tensor with the flattened input as a diagonal.
* **Parameters:**
* **v** (*array_like*) – Input data, which is flattened and set as the k-th
diagonal of the output.
* **k** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Diagonal to set; 0, the default, corresponds to the “main” diagonal,
a positive (negative) k giving the number of the diagonal above
(below) the main.
* **sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Create sparse tensor if True, False as default
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **Returns:**
**out** – The 2-D output tensor.
* **Return type:**
Tensor
#### SEE ALSO
[`diag`](maxframe.tensor.diag.md#maxframe.tensor.diag)
: MATLAB work-alike for 1-D and 2-D tensors.
`diagonal`
: Return specified diagonals.
`trace`
: Sum along diagonals.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.diagflat([[1,2], [3,4]]).execute()
array([[1, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 4]])
```
```pycon
>>> mt.diagflat([1,2], 1).execute()
array([[0, 1, 0],
[0, 0, 2],
[0, 0, 0]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.diff.md
# maxframe.tensor.diff
### maxframe.tensor.diff(a, n=1, axis=-1)
Calculate the n-th discrete difference along the given axis.
The first difference is given by `out[i] = a[i+1] - a[i]` along
the given axis; higher differences are calculated by using diff
recursively.
* **Parameters:**
* **a** (*array_like*) – Input tensor
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The number of times values are differenced. If zero, the input
is returned as-is.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis along which the difference is taken, default is the
last axis.
* **Returns:**
**diff** – The n-th differences. The shape of the output is the same as a
except along axis where the dimension is smaller by n. The
type of the output is the same as the type of the difference
between any two elements of a. This is the same as the type of
a in most cases. A notable exception is datetime64, which
results in a timedelta64 output tensor.
* **Return type:**
Tensor
#### SEE ALSO
`gradient`, [`ediff1d`](maxframe.tensor.ediff1d.md#maxframe.tensor.ediff1d), [`cumsum`](maxframe.tensor.cumsum.md#maxframe.tensor.cumsum)
### Notes
Type is preserved for boolean tensors, so the result will contain
False when consecutive elements are the same and True when they
differ.
For unsigned integer tensors, the results will also be unsigned. This
should not be surprising, as the result is consistent with
calculating the difference directly:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> u8_arr = mt.array([1, 0], dtype=mt.uint8)
>>> mt.diff(u8_arr).execute()
array([255], dtype=uint8)
>>> (u8_arr[1,...] - u8_arr[0,...]).execute()
255
```
If this is not desirable, then the array should be cast to a larger
integer type first:
```pycon
>>> i16_arr = u8_arr.astype(mt.int16)
>>> mt.diff(i16_arr).execute()
array([-1], dtype=int16)
```
### Examples
```pycon
>>> x = mt.array([1, 2, 4, 7, 0])
>>> mt.diff(x).execute()
array([ 1, 2, 3, -7])
>>> mt.diff(x, n=2).execute()
array([ 1, 1, -10])
```
```pycon
>>> x = mt.array([[1, 3, 6, 10], [0, 5, 6, 8]])
>>> mt.diff(x).execute()
array([[2, 3, 4],
[5, 1, 2]])
>>> mt.diff(x, axis=0).execute()
array([[-1, 2, 0, -2]])
```
```pycon
>>> x = mt.arange('1066-10-13', '1066-10-16', dtype=mt.datetime64)
>>> mt.diff(x).execute()
array([1, 1], dtype='timedelta64[D]')
```
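The recursive definition and the unsigned wraparound described in the Notes can be reproduced with plain Python integers. This is a sketch for 1-D input only; the `modulus` argument is an assumption added here to emulate a fixed-width unsigned type (256 for `uint8`) and is not a parameter of `mt.diff`.

```python
def diff(values, n=1, modulus=None):
    """n-th discrete difference of a 1-D list of integers."""
    out = list(values)
    for _ in range(n):
        # One pass computes out[i] = a[i+1] - a[i].
        out = [b - a for a, b in zip(out, out[1:])]
        if modulus is not None:
            # Emulate unsigned wraparound, e.g. -1 becomes 255 for uint8.
            out = [v % modulus for v in out]
    return out


print(diff([1, 2, 4, 7, 0]))       # [1, 2, 3, -7]
print(diff([1, 2, 4, 7, 0], n=2))  # [1, 1, -10]
print(diff([1, 0], modulus=256))   # [255]
```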
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.digitize.md
# maxframe.tensor.digitize
### maxframe.tensor.digitize(x, bins, right=False)
Return the indices of the bins to which each value in input tensor belongs.
Each index `i` returned is such that `bins[i-1] <= x < bins[i]` if
bins is monotonically increasing, or `bins[i-1] > x >= bins[i]` if
bins is monotonically decreasing. If values in x are beyond the
bounds of bins, 0 or `len(bins)` is returned as appropriate. If right
is True, then the right bin is closed so that the index `i` is such
that `bins[i-1] < x <= bins[i]` or `bins[i-1] >= x > bins[i]` if bins
is monotonically increasing or decreasing, respectively.
* **Parameters:**
* **x** (*array_like*) – Input tensor to be binned.
* **bins** (*array_like*) – Array of bins. It has to be 1-dimensional and monotonic.
* **right** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Indicating whether the intervals include the right or the left bin
edge. Default behavior is (right==False) indicating that the interval
does not include the right edge. The left bin edge is closed in this
case, i.e., `bins[i-1] <= x < bins[i]` is the default behavior for
monotonically increasing bins.
* **Returns:**
**out** – Output tensor of indices, of same shape as x.
* **Return type:**
Tensor of ints
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If bins is not monotonic.
* [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – If the type of the input is complex.
#### SEE ALSO
[`bincount`](maxframe.tensor.bincount.md#maxframe.tensor.bincount), [`histogram`](maxframe.tensor.histogram.md#maxframe.tensor.histogram), [`unique`](maxframe.tensor.unique.md#maxframe.tensor.unique), `searchsorted`
### Notes
If values in x are such that they fall outside the bin range,
attempting to index bins with the indices that digitize returns
will result in an IndexError.
mt.digitize is implemented in terms of mt.searchsorted. This means
that a binary search is used to bin the values, which scales much better
for larger number of bins than the previous linear search. It also removes
the requirement for the input array to be 1-dimensional.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([0.2, 6.4, 3.0, 1.6])
>>> bins = mt.array([0.0, 1.0, 2.5, 4.0, 10.0])
>>> inds = mt.digitize(x, bins)
>>> inds.execute()
array([1, 4, 3, 2])
```
```pycon
>>> x = mt.array([1.2, 10.0, 12.4, 15.5, 20.])
>>> bins = mt.array([0, 5, 10, 15, 20])
>>> mt.digitize(x,bins,right=True).execute()
array([1, 2, 3, 4, 4])
>>> mt.digitize(x,bins,right=False).execute()
array([1, 3, 3, 4, 5])
```
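Since the Notes describe digitize in terms of a binary search, the two `right` settings can be sketched with the standard-library `bisect` module. This sketch assumes monotonically increasing bins and 1-D input; it illustrates the interval rules and is not MaxFrame's implementation.

```python
from bisect import bisect_left, bisect_right


def digitize(xs, bins, right=False):
    """Bin indices for increasing bins, using a binary search per value."""
    # right=False: bins[i-1] <= x < bins[i]  -> insertion point after equal edges
    # right=True:  bins[i-1] <  x <= bins[i] -> insertion point before equal edges
    find = bisect_left if right else bisect_right
    return [find(bins, x) for x in xs]


x = [1.2, 10.0, 12.4, 15.5, 20.0]
bins = [0, 5, 10, 15, 20]
print(digitize(x, bins, right=True))   # [1, 2, 3, 4, 4]
print(digitize(x, bins, right=False))  # [1, 3, 3, 4, 5]
```

Values that sit exactly on a bin edge (10.0 and 20.0 here) are the only ones whose index changes between the two modes.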
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.divide.md
# maxframe.tensor.divide
### maxframe.tensor.divide(x1, x2, out=None, where=None, \*\*kwargs)
Divide arguments element-wise.
* **Parameters:**
* **x1** (*array_like*) – Dividend tensor.
* **x2** (*array_like*) – Divisor tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated array is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – The quotient x1/x2, element-wise. Returns a scalar if both x1 and x2 are scalars.
* **Return type:**
Tensor
### Notes
Equivalent to x1 / x2 in terms of array-broadcasting.
Behavior on division by zero can be changed using seterr.
In Python 2, when both x1 and x2 are of an integer type, divide will behave like floor_divide.
In Python 3, it behaves like true_divide.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.divide(2.0, 4.0).execute()
0.5
>>> x1 = mt.arange(9.0).reshape((3, 3))
>>> x2 = mt.arange(3.0)
>>> mt.divide(x1, x2).execute()
array([[nan, 1. , 1. ],
       [inf, 4. , 2.5],
       [inf, 7. , 4. ]])
```
Note the behavior with integer types (Python 2 only):
```pycon
>>> mt.divide(2, 4).execute()
0
>>> mt.divide(2, 4.).execute()
0.5
```
Division by zero always yields zero in integer arithmetic (again, Python 2 only),
and does not raise an exception or a warning:
```pycon
>>> mt.divide(mt.array([0, 1], dtype=int), mt.array([0, 0], dtype=int)).execute()
array([0, 0])
```
Division by zero can, however, be caught using seterr:
```pycon
>>> old_err_state = mt.seterr(divide='raise')
>>> mt.divide(1, 0).execute()
Traceback (most recent call last):
...
FloatingPointError: divide by zero encountered in divide
>>> ignored_states = mt.seterr(**old_err_state)
>>> mt.divide(1, 0).execute()
0
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.dot.md
# maxframe.tensor.dot
### maxframe.tensor.dot(a, b, out=None, sparse=None)
Dot product of two arrays. Specifically,
- If both a and b are 1-D arrays, it is inner product of vectors
(without complex conjugation).
- If both a and b are 2-D arrays, it is matrix multiplication,
but using [`matmul()`](maxframe.tensor.matmul.md#maxframe.tensor.matmul) or `a @ b` is preferred.
- If either a or b is 0-D (scalar), it is equivalent to [`multiply()`](maxframe.tensor.multiply.md#maxframe.tensor.multiply)
and using `mt.multiply(a, b)` or `a * b` is preferred.
- If a is an N-D array and b is a 1-D array, it is a sum product over
the last axis of a and b.
- If a is an N-D array and b is an M-D array (where `M>=2`), it is a
sum product over the last axis of a and the second-to-last axis of b:
```default
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
```
* **Parameters:**
* **a** (*array_like*) – First argument.
* **b** (*array_like*) – Second argument.
* **out** (*Tensor* *,* *optional*) – Output argument. This must have the exact kind that would be returned
if it was not used. In particular, it must have the right type, must be
C-contiguous, and its dtype must be the dtype that would be returned
for dot(a,b). This is a performance feature. Therefore, if these
conditions are not met, an exception is raised, instead of attempting
to be flexible.
* **Returns:**
**output** – Returns the dot product of a and b. If a and b are both
scalars or both 1-D arrays then a scalar is returned; otherwise
a tensor is returned.
If out is given, then it is returned.
* **Return type:**
Tensor
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If the last dimension of a is not the same size as
the second-to-last dimension of b.
#### SEE ALSO
[`vdot`](maxframe.tensor.vdot.md#maxframe.tensor.vdot)
: Complex-conjugating dot product.
[`tensordot`](maxframe.tensor.tensordot.md#maxframe.tensor.tensordot)
: Sum products over arbitrary axes.
[`einsum`](maxframe.tensor.einsum.md#maxframe.tensor.einsum)
: Einstein summation convention.
[`matmul`](maxframe.tensor.matmul.md#maxframe.tensor.matmul)
: ‘@’ operator as method with out parameter.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.dot(3, 4).execute()
12
```
Neither argument is complex-conjugated:
```pycon
>>> mt.dot([2j, 3j], [2j, 3j]).execute()
(-13+0j)
```
For 2-D arrays it is the matrix product:
```pycon
>>> a = [[1, 0], [0, 1]]
>>> b = [[4, 1], [2, 2]]
>>> mt.dot(a, b).execute()
array([[4, 1],
[2, 2]])
```
```pycon
>>> a = mt.arange(3*4*5*6).reshape((3,4,5,6))
>>> b = mt.arange(3*4*5*6)[::-1].reshape((5,4,6,3))
>>> mt.dot(a, b)[2,3,2,1,2,2].execute()
499128
>>> mt.sum(a[2,3,2,:] * b[1,2,:,2]).execute()
499128
```
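The 1-D and 2-D rules listed at the top of this page can be written out with nested lists. The helper names `inner_1d` and `dot_2d` are hypothetical and only illustrate the sum-product definition; they are not part of MaxFrame.

```python
def inner_1d(a, b):
    """Inner product of two 1-D sequences, without complex conjugation."""
    return sum(x * y for x, y in zip(a, b))


def dot_2d(a, b):
    """Matrix product: sum over the last axis of a and the first axis of b."""
    columns = list(zip(*b))
    return [[inner_1d(row, col) for col in columns] for row in a]


print(inner_1d([2j, 3j], [2j, 3j]))              # (-13+0j)
print(dot_2d([[1, 0], [0, 1]], [[4, 1], [2, 2]]))  # [[4, 1], [2, 2]]
```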
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.dsplit.md
# maxframe.tensor.dsplit
### maxframe.tensor.dsplit(a, indices_or_sections)
Split tensor into multiple sub-tensors along the 3rd axis (depth).
Please refer to the split documentation. dsplit is equivalent
to split with `axis=2`; the tensor is always split along the third
axis, provided the tensor dimension is greater than or equal to 3.
#### SEE ALSO
[`split`](maxframe.tensor.split.md#maxframe.tensor.split)
: Split a tensor into multiple sub-arrays of equal size.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(16.0).reshape(2, 2, 4)
>>> x.execute()
array([[[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.]],
[[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.]]])
>>> mt.dsplit(x, 2).execute()
[array([[[ 0., 1.],
[ 4., 5.]],
[[ 8., 9.],
[ 12., 13.]]]),
array([[[ 2., 3.],
[ 6., 7.]],
[[ 10., 11.],
[ 14., 15.]]])]
>>> mt.dsplit(x, mt.array([3, 6])).execute()
[array([[[ 0., 1., 2.],
[ 4., 5., 6.]],
[[ 8., 9., 10.],
[ 12., 13., 14.]]]),
array([[[ 3.],
[ 7.]],
[[ 11.],
[ 15.]]]),
array([], dtype=float64)]
```
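Splitting at explicit indices, including an index past the end of the axis, can be sketched with nested lists. `dsplit_lists` is a hypothetical name, and the sketch covers only the list-of-indices form, not the integer `sections` form.

```python
def dsplit_lists(x, indices):
    """Split a 3-D nested list along its last axis at the given cut points."""
    depth = len(x[0][0])
    bounds = [0] + list(indices) + [depth]
    # Each consecutive pair of bounds becomes one slice of the last axis.
    return [
        [[row[lo:hi] for row in plane] for plane in x]
        for lo, hi in zip(bounds, bounds[1:])
    ]


x = [[[0, 1, 2, 3], [4, 5, 6, 7]], [[8, 9, 10, 11], [12, 13, 14, 15]]]
pieces = dsplit_lists(x, [3, 6])
print(len(pieces))  # 3
print(pieces[1])    # [[[3], [7]], [[11], [15]]]
```

Because the cut at 6 lies beyond the axis length of 4, the last piece has no elements along the third axis, which corresponds to the empty array at the end of the example output above.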
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.ediff1d.md
# maxframe.tensor.ediff1d
### maxframe.tensor.ediff1d(a, to_end=None, to_begin=None)
The differences between consecutive elements of a tensor.
* **Parameters:**
* **a** (*array_like*) – If necessary, will be flattened before the differences are taken.
* **to_end** (*array_like* *,* *optional*) – Number(s) to append at the end of the returned differences.
* **to_begin** (*array_like* *,* *optional*) – Number(s) to prepend at the beginning of the returned differences.
* **Returns:**
**ediff1d** – The differences. Loosely, this is `a.flat[1:] - a.flat[:-1]`.
* **Return type:**
Tensor
#### SEE ALSO
[`diff`](maxframe.tensor.diff.md#maxframe.tensor.diff), `gradient`
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([1, 2, 4, 7, 0])
>>> mt.ediff1d(x).execute()
array([ 1, 2, 3, -7])
```
```pycon
>>> mt.ediff1d(x, to_begin=-99, to_end=mt.array([88, 99])).execute()
array([-99, 1, 2, 3, -7, 88, 99])
```
The returned tensor is always 1D.
```pycon
>>> y = [[1, 2, 4], [1, 6, 24]]
>>> mt.ediff1d(y).execute()
array([ 1, 2, -3, 5, 18])
```
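The "flatten, then difference" behavior, together with `to_begin` and `to_end`, can be sketched in plain Python. `ediff1d_lists` is a hypothetical helper that accepts a 2-D nested list; it illustrates the semantics only.

```python
def ediff1d_lists(rows, to_begin=(), to_end=()):
    """Differences between consecutive elements of a flattened 2-D list."""
    flat = [value for row in rows for value in row]
    diffs = [b - a for a, b in zip(flat, flat[1:])]
    # Prepend and append the extra values around the differences.
    return list(to_begin) + diffs + list(to_end)


print(ediff1d_lists([[1, 2, 4], [1, 6, 24]]))
# [1, 2, -3, 5, 18]
print(ediff1d_lists([[1, 2, 4, 7, 0]], to_begin=[-99], to_end=[88, 99]))
# [-99, 1, 2, 3, -7, 88, 99]
```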
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.einsum.md
# maxframe.tensor.einsum
### maxframe.tensor.einsum(subscripts, \*operands, dtype=None, order='K', casting='safe', optimize=False)
Evaluates the Einstein summation convention on the operands.
Using the Einstein summation convention, many common multi-dimensional,
linear algebraic array operations can be represented in a simple fashion.
In *implicit* mode einsum computes these values.
In *explicit* mode, einsum provides further flexibility to compute
other array operations that might not be considered classical Einstein
summation operations, by disabling, or forcing summation over specified
subscript labels.
See the notes and examples for clarification.
* **Parameters:**
* **subscripts** ([*str*](https://docs.python.org/3/library/stdtypes.html#str)) – Specifies the subscripts for summation as comma separated list of
subscript labels. An implicit (classical Einstein summation)
calculation is performed unless the explicit indicator ‘->’ is
included as well as subscript labels of the precise output form.
* **operands** ([*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *array_like*) – These are the arrays for the operation.
* **dtype** ( *{data-type* *,* *None}* *,* *optional*) – If provided, forces the calculation to use the data type specified.
Note that you may have to also give a more liberal casting
parameter to allow the conversions. Default is None.
* **order** ( *{'C'* *,* *'F'* *,* *'A'* *,* *'K'}* *,* *optional*) – Controls the memory layout of the output. ‘C’ means it should
be C contiguous. ‘F’ means it should be Fortran contiguous,
‘A’ means it should be ‘F’ if the inputs are all ‘F’, ‘C’ otherwise.
‘K’ means it should be as close to the layout of the inputs as
is possible, including arbitrarily permuted axes.
Default is ‘K’.
* **casting** ( *{'no'* *,* *'equiv'* *,* *'safe'* *,* *'same_kind'* *,* *'unsafe'}* *,* *optional*) –
Controls what kind of data casting may occur. Setting this to
‘unsafe’ is not recommended, as it can adversely affect accumulations.
  * ‘no’ means the data types should not be cast at all.
  * ‘equiv’ means only byte-order changes are allowed.
  * ‘safe’ means only casts which can preserve values are allowed.
  * ‘same_kind’ means only safe casts or casts within a kind,
    like float64 to float32, are allowed.
  * ‘unsafe’ means any data conversions may be done.

  Default is ‘safe’.
* **optimize** ( *{False* *,* *True* *,* *'greedy'* *,* *'optimal'}* *,* *optional*) – Controls if intermediate optimization should occur. No optimization
will occur if False and True will default to the ‘greedy’ algorithm.
Also accepts an explicit contraction list from the `np.einsum_path`
function. See `np.einsum_path` for more details. Defaults to False.
* **Returns:**
**output** – The calculation based on the Einstein summation convention.
* **Return type:**
maxframe.tensor.Tensor
### Notes
The Einstein summation convention can be used to compute
many multi-dimensional, linear algebraic array operations. einsum
provides a succinct way of representing these.

A non-exhaustive list of these operations,
which can be computed by einsum, is shown below along with examples:

* Trace of an array, [`numpy.trace()`](https://numpy.org/doc/stable/reference/generated/numpy.trace.html#numpy.trace).
* Return a diagonal, [`numpy.diag()`](https://numpy.org/doc/stable/reference/generated/numpy.diag.html#numpy.diag).
* Array axis summations, [`numpy.sum()`](https://numpy.org/doc/stable/reference/generated/numpy.sum.html#numpy.sum).
* Transpositions and permutations, [`numpy.transpose()`](https://numpy.org/doc/stable/reference/generated/numpy.transpose.html#numpy.transpose).
* Matrix multiplication and dot product, `numpy.matmul()` [`numpy.dot()`](https://numpy.org/doc/stable/reference/generated/numpy.dot.html#numpy.dot).
* Vector inner and outer products, [`numpy.inner()`](https://numpy.org/doc/stable/reference/generated/numpy.inner.html#numpy.inner) [`numpy.outer()`](https://numpy.org/doc/stable/reference/generated/numpy.outer.html#numpy.outer).
* Broadcasting, element-wise and scalar multiplication, `numpy.multiply()`.
* Tensor contractions, [`numpy.tensordot()`](https://numpy.org/doc/stable/reference/generated/numpy.tensordot.html#numpy.tensordot).
* Chained array operations, in efficient calculation order, [`numpy.einsum_path()`](https://numpy.org/doc/stable/reference/generated/numpy.einsum_path.html#numpy.einsum_path).

The subscripts string is a comma-separated list of subscript labels,
where each label refers to a dimension of the corresponding operand.
Whenever a label is repeated it is summed, so `mt.einsum('i,i', a, b)`
is equivalent to [`mt.inner(a,b)`](maxframe.tensor.inner.md#maxframe.tensor.inner). If a label
appears only once, it is not summed, so `mt.einsum('i', a)` produces a
view of `a` with no changes. A further example `mt.einsum('ij,jk', a, b)`
describes traditional matrix multiplication and is equivalent to
[`mt.matmul(a,b)`](maxframe.tensor.matmul.md#maxframe.tensor.matmul).

In *implicit mode*, the chosen subscripts are important
since the axes of the output are reordered alphabetically. This
means that `mt.einsum('ij', a)` doesn’t affect a 2D array, while
`mt.einsum('ji', a)` takes its transpose. Additionally,
`mt.einsum('ij,jk', a, b)` returns a matrix multiplication, while
`mt.einsum('ij,jh', a, b)` returns the transpose of the
multiplication since subscript ‘h’ precedes subscript ‘i’.

In *explicit mode* the output can be directly controlled by
specifying output subscript labels. This requires the
identifier ‘->’ as well as the list of output subscript labels.
This feature increases the flexibility of the function since
summing can be disabled or forced when required. The call
`mt.einsum('i->', a)` is like [`mt.sum(a, axis=-1)`](maxframe.tensor.sum.md#maxframe.tensor.sum),
and `mt.einsum('ii->i', a)` is like [`mt.diag(a)`](maxframe.tensor.diag.md#maxframe.tensor.diag).
The difference is that einsum does not allow broadcasting by default.
Additionally `mt.einsum('ij,jh->ih', a, b)` directly specifies the
order of the output subscript labels and therefore returns matrix
multiplication, unlike the example above in implicit mode.

To enable and control broadcasting, use an ellipsis. Default
NumPy-style broadcasting is done by adding an ellipsis
to the left of each term, like `mt.einsum('...ii->...i', a)`.
To take the trace along the first and last axes,
you can do `mt.einsum('i...i', a)`, or to do a matrix-matrix
product with the left-most indices instead of rightmost, one can do
`mt.einsum('ij...,jk...->ik...', a, b)`.

When there is only one operand, no axes are summed, and no output
parameter is provided, a view into the operand is returned instead
of a new array. Thus, taking the diagonal as `mt.einsum('ii->i', a)`
produces a view (changed in NumPy version 1.10.0).

einsum also provides an alternative way to provide the subscripts
and operands as `einsum(op0, sublist0, op1, sublist1, ..., [sublistout])`.
If the output shape is not provided in this format einsum will be
calculated in implicit mode, otherwise it will be performed explicitly.
The examples below have corresponding einsum calls with the two
parameter methods.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.arange(25).reshape(5,5)
>>> b = mt.arange(5)
>>> c = mt.arange(6).reshape(2,3)
```
Trace of a matrix:
```pycon
>>> mt.einsum('ii', a).execute()
60
>>> mt.einsum(a, [0,0]).execute()
60
```
Extract the diagonal (requires explicit form):
```pycon
>>> mt.einsum('ii->i', a).execute()
array([ 0, 6, 12, 18, 24])
>>> mt.einsum(a, [0,0], [0]).execute()
array([ 0, 6, 12, 18, 24])
>>> mt.diag(a).execute()
array([ 0, 6, 12, 18, 24])
```
Sum over an axis (requires explicit form):
```pycon
>>> mt.einsum('ij->i', a).execute()
array([ 10, 35, 60, 85, 110])
>>> mt.einsum(a, [0,1], [0]).execute()
array([ 10, 35, 60, 85, 110])
>>> mt.sum(a, axis=1).execute()
array([ 10, 35, 60, 85, 110])
```
For higher dimensional arrays summing a single axis can be done with ellipsis:
```pycon
>>> mt.einsum('...j->...', a).execute()
array([ 10, 35, 60, 85, 110])
>>> mt.einsum(a, [Ellipsis,1], [Ellipsis]).execute()
array([ 10, 35, 60, 85, 110])
```
Compute a matrix transpose, or reorder any number of axes:
```pycon
>>> mt.einsum('ji', c).execute()
array([[0, 3],
[1, 4],
[2, 5]])
>>> mt.einsum('ij->ji', c).execute()
array([[0, 3],
[1, 4],
[2, 5]])
>>> mt.einsum(c, [1,0]).execute()
array([[0, 3],
[1, 4],
[2, 5]])
>>> mt.transpose(c).execute()
array([[0, 3],
[1, 4],
[2, 5]])
```
Vector inner products:
```pycon
>>> mt.einsum('i,i', b, b).execute()
30
>>> mt.einsum(b, [0], b, [0]).execute()
30
>>> mt.inner(b,b).execute()
30
```
Matrix vector multiplication:
```pycon
>>> mt.einsum('ij,j', a, b).execute()
array([ 30, 80, 130, 180, 230])
>>> mt.einsum(a, [0,1], b, [1]).execute()
array([ 30, 80, 130, 180, 230])
>>> mt.dot(a, b).execute()
array([ 30, 80, 130, 180, 230])
>>> mt.einsum('...j,j', a, b).execute()
array([ 30, 80, 130, 180, 230])
```
Broadcasting and scalar multiplication:
```pycon
>>> mt.einsum('..., ...', 3, c).execute()
array([[ 0, 3, 6],
[ 9, 12, 15]])
>>> mt.einsum(',ij', 3, c).execute()
array([[ 0, 3, 6],
[ 9, 12, 15]])
>>> mt.einsum(3, [Ellipsis], c, [Ellipsis]).execute()
array([[ 0, 3, 6],
[ 9, 12, 15]])
>>> mt.multiply(3, c).execute()
array([[ 0, 3, 6],
[ 9, 12, 15]])
```
Vector outer product:
```pycon
>>> mt.einsum('i,j', mt.arange(2)+1, b).execute()
array([[0, 1, 2, 3, 4],
[0, 2, 4, 6, 8]])
>>> mt.einsum(mt.arange(2)+1, [0], b, [1]).execute()
array([[0, 1, 2, 3, 4],
[0, 2, 4, 6, 8]])
>>> mt.outer(mt.arange(2)+1, b).execute()
array([[0, 1, 2, 3, 4],
[0, 2, 4, 6, 8]])
```
Tensor contraction:
```pycon
>>> a = mt.arange(60.).reshape(3,4,5)
>>> b = mt.arange(24.).reshape(4,3,2)
>>> mt.einsum('ijk,jil->kl', a, b).execute()
array([[4400., 4730.],
[4532., 4874.],
[4664., 5018.],
[4796., 5162.],
[4928., 5306.]])
>>> mt.einsum(a, [0,1,2], b, [1,0,3], [2,3]).execute()
array([[4400., 4730.],
[4532., 4874.],
[4664., 5018.],
[4796., 5162.],
[4928., 5306.]])
>>> mt.tensordot(a,b, axes=([1,0],[0,1])).execute()
array([[4400., 4730.],
[4532., 4874.],
[4664., 5018.],
[4796., 5162.],
[4928., 5306.]])
```
Writeable returned arrays (since NumPy version 1.10.0):
```pycon
>>> a = mt.zeros((3, 3))
>>> mt.einsum('ii->i', a)[:] = 1
>>> a.execute()
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
```
Example of ellipsis use:
```pycon
>>> a = mt.arange(6).reshape((3,2))
>>> b = mt.arange(12).reshape((4,3))
>>> mt.einsum('ki,jk->ij', a, b).execute()
array([[10, 28, 46, 64],
[13, 40, 67, 94]])
>>> mt.einsum('ki,...k->i...', a, b).execute()
array([[10, 28, 46, 64],
[13, 40, 67, 94]])
>>> mt.einsum('k...,jk', a, b).execute()
array([[10, 28, 46, 64],
[13, 40, 67, 94]])
```
Chained array operations. For more complicated contractions, speed ups
might be achieved by repeatedly computing a 'greedy' path or pre-computing the
'optimal' path and repeatedly applying it, using an
`einsum_path` insertion (since NumPy version 1.12.0). Performance improvements can be
particularly significant with larger arrays:
```pycon
>>> a = mt.ones(64).reshape(2,4,8)
```
Basic `einsum`: ~1520ms (benchmarked on 3.1GHz Intel i5.)
```pycon
>>> for iteration in range(500):
...     _ = mt.einsum('ijk,ilm,njm,nlk,abc->',a,a,a,a,a)
```
Sub-optimal `einsum` (due to repeated path calculation time): ~330ms
```pycon
>>> for iteration in range(500):
...     _ = mt.einsum('ijk,ilm,njm,nlk,abc->',a,a,a,a,a, optimize='optimal')
```
Greedy `einsum` (faster optimal path approximation): ~160ms
```pycon
>>> for iteration in range(500):
...     _ = mt.einsum('ijk,ilm,njm,nlk,abc->',a,a,a,a,a, optimize='greedy')
```
Optimal `einsum` (best usage pattern in some use cases): ~110ms
```pycon
>>> path = mt.einsum_path('ijk,ilm,njm,nlk,abc->',a,a,a,a,a, optimize='optimal')[0]
>>> for iteration in range(500):
...     _ = mt.einsum('ijk,ilm,njm,nlk,abc->',a,a,a,a,a, optimize=path)
```
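The difference between implicit and explicit mode is easiest to see on a tiny case. The following is a deliberately limited sketch in plain Python: `einsum_2d` is a hypothetical helper that handles exactly two 2-D operands, a 2-D result, and no repeated label within one operand. It is not how MaxFrame or NumPy implement einsum.

```python
from itertools import product


def einsum_2d(subscripts, a, b):
    """Tiny einsum for two 2-D nested lists producing a 2-D result."""
    inputs, arrow, output = subscripts.partition("->")
    left, right = inputs.split(",")
    if not arrow:
        # Implicit mode: keep labels that occur once, in alphabetical order.
        labels = left + right
        output = "".join(sorted(c for c in set(labels) if labels.count(c) == 1))
    sizes = {}
    for spec, operand in ((left, a), (right, b)):
        sizes[spec[0]] = len(operand)
        sizes[spec[1]] = len(operand[0])
    # Labels missing from the output are summed over.
    summed = [c for c in sorted(sizes) if c not in output]

    def cell(out_index):
        total = 0
        for combo in product(*(range(sizes[c]) for c in summed)):
            idx = dict(out_index, **dict(zip(summed, combo)))
            total += a[idx[left[0]]][idx[left[1]]] * b[idx[right[0]]][idx[right[1]]]
        return total

    rows, cols = output
    return [[cell({rows: r, cols: c}) for c in range(sizes[cols])]
            for r in range(sizes[rows])]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(einsum_2d("ij,jk", a, b))      # [[19, 22], [43, 50]]
print(einsum_2d("ij,jh", a, b))      # [[19, 43], [22, 50]]
print(einsum_2d("ij,jh->ih", a, b))  # [[19, 22], [43, 50]]
```

In the second call the implicit output is ordered as `'hi'` because ‘h’ sorts before ‘i’, so the product comes back transposed; the explicit `->ih` form restores the usual orientation.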
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.empty.md
# maxframe.tensor.empty
### maxframe.tensor.empty(shape, dtype=None, chunk_size=None, gpu=None, order='C')
Return a new tensor of given shape and type, without initializing entries.
* **Parameters:**
* **shape** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int)) – Shape of the empty tensor
* **dtype** (*data-type* *,* *optional*) – Desired output data-type.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **order** ( *{'C'* *,* *'F'}* *,* *optional* *,* *default: 'C'*) – Whether to store multi-dimensional data in row-major
(C-style) or column-major (Fortran-style) order in
memory.
* **Returns:**
**out** – Tensor of uninitialized (arbitrary) data of the given shape, dtype, and
order. Object arrays will be initialized to None.
* **Return type:**
Tensor
#### SEE ALSO
[`empty_like`](maxframe.tensor.empty_like.md#maxframe.tensor.empty_like), [`zeros`](maxframe.tensor.zeros.md#maxframe.tensor.zeros), [`ones`](maxframe.tensor.ones.md#maxframe.tensor.ones)
### Notes
empty, unlike zeros, does not set the array values to zero,
and may therefore be marginally faster. On the other hand, it requires
the user to manually set all the values in the array, and should be
used with caution.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.empty([2, 2]).execute()
array([[ -9.74499359e+001, 6.69583040e-309],
[ 2.13182611e-314, 3.06959433e-309]]) #random
>>> mt.empty([2, 2], dtype=int).execute()
array([[-1073741821, -1067949133],
[ 496041986, 19249760]]) #random
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.empty_like.md
# maxframe.tensor.empty_like
### maxframe.tensor.empty_like(a, dtype=None, gpu=None, order='K')
Return a new tensor with the same shape and type as a given tensor.
* **Parameters:**
* **a** (*array_like*) – The shape and data-type of a define these same attributes of the
returned tensor.
* **dtype** (*data-type* *,* *optional*) – Overrides the data type of the result.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, None as default
* **order** ( *{'C'* *,* *'F'* *,* *'A'* *, or* *'K'}* *,* *optional*) – Overrides the memory layout of the result. ‘C’ means C-order,
‘F’ means F-order, ‘A’ means ‘F’ if `a` is Fortran
contiguous, ‘C’ otherwise. ‘K’ means match the layout of `a`
as closely as possible.
* **Returns:**
**out** – Array of uninitialized (arbitrary) data with the same
shape and type as a.
* **Return type:**
Tensor
#### SEE ALSO
`ones_like`
: Return a tensor of ones with shape and type of input.
`zeros_like`
: Return a tensor of zeros with shape and type of input.
[`empty`](maxframe.tensor.empty.md#maxframe.tensor.empty)
: Return a new uninitialized tensor.
[`ones`](maxframe.tensor.ones.md#maxframe.tensor.ones)
: Return a new tensor setting values to one.
[`zeros`](maxframe.tensor.zeros.md#maxframe.tensor.zeros)
: Return a new tensor setting values to zero.
### Notes
This function does *not* initialize the returned tensor; to do that use
zeros_like or ones_like instead. It may be marginally faster than
the functions that do set the array values.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = ([1,2,3], [4,5,6]) # a is array-like
>>> mt.empty_like(a).execute()
array([[-1073741821, -1073741821, 3], #random
[ 0, 0, -1073741821]])
>>> a = mt.array([[1., 2., 3.],[4.,5.,6.]])
>>> mt.empty_like(a).execute()
array([[ -2.00000715e+000, 1.48219694e-323, -2.00000572e+000],#random
[ 4.38791518e-305, -2.00000715e+000, 4.17269252e-309]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.equal.md
# maxframe.tensor.equal
### maxframe.tensor.equal(x1, x2, out=None, where=None, \*\*kwargs)
Return (x1 == x2) element-wise.
* **Parameters:**
* **x1** (*array_like*) – Input tensors of the same shape.
* **x2** (*array_like*) – Input tensors of the same shape.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated array is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs** – For other keyword-only arguments, see the
[ufunc docs](https://numpy.org/doc/stable/reference/ufuncs.html#ufuncs-kwargs).
* **Returns:**
**out** – Output tensor of bools, or a single bool if x1 and x2 are scalars.
* **Return type:**
Tensor or [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`not_equal`](maxframe.tensor.not_equal.md#maxframe.tensor.not_equal), [`greater_equal`](maxframe.tensor.greater_equal.md#maxframe.tensor.greater_equal), [`less_equal`](maxframe.tensor.less_equal.md#maxframe.tensor.less_equal), [`greater`](maxframe.tensor.greater.md#maxframe.tensor.greater), [`less`](maxframe.tensor.less.md#maxframe.tensor.less)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.equal([0, 1, 3], mt.arange(3)).execute()
array([ True, True, False])
```
What is compared are values, not types. So an int (1) and a tensor of
length one can evaluate as True:
```pycon
>>> mt.equal(1, mt.ones(1)).execute()
array([ True])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.exp.md
# maxframe.tensor.exp
### maxframe.tensor.exp(x, out=None, where=None, \*\*kwargs)
Calculate the exponential of all elements in the input tensor.
* **Parameters:**
* **x** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs** – For other keyword-only arguments, see the
[ufunc docs](https://numpy.org/doc/stable/reference/ufuncs.html#ufuncs-kwargs).
* **Returns:**
**out** – Output tensor, element-wise exponential of x.
* **Return type:**
Tensor
#### SEE ALSO
[`expm1`](maxframe.tensor.expm1.md#maxframe.tensor.expm1)
: Calculate `exp(x) - 1` for all elements in the array.
[`exp2`](maxframe.tensor.exp2.md#maxframe.tensor.exp2)
: Calculate `2**x` for all elements in the array.
### Notes
The irrational number `e` is also known as Euler’s number. It is
approximately 2.718281, and is the base of the natural logarithm,
`ln` (this means that, if $x = \ln y = \log_e y$,
then $e^x = y$). For real input, `exp(x)` is always positive.
For complex arguments, `x = a + ib`, we can write
$e^x = e^a e^{ib}$. The first term, $e^a$, is already
known (it is the real argument, described above). The second term,
$e^{ib}$, is $\cos b + i \sin b$, a function with
magnitude 1 and a periodic phase.
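The decomposition above can be checked numerically. The following is a minimal sketch that uses Python's standard `cmath` and `math` modules instead of MaxFrame, an assumption made so that it runs without a session; the values of `a` and `b` are arbitrary.

```python
import cmath
import math

# Arbitrary complex argument x = a + ib (illustrative values).
a, b = 0.5, 2.0
z = complex(a, b)

# Left-hand side: e^x computed directly.
lhs = cmath.exp(z)

# Right-hand side: e^a * (cos b + i sin b).
rhs = math.exp(a) * complex(math.cos(b), math.sin(b))

# The magnitude depends only on a; the phase depends only on b (modulo 2*pi).
magnitude = abs(lhs)
phase = cmath.phase(lhs)
```

Here `magnitude` agrees with `math.exp(a)` and `phase` agrees with `b`, because `b` lies inside the principal range of `cmath.phase`.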
### References
* <a id='id1'>**[1]**</a> Wikipedia, “Exponential function”, [http://en.wikipedia.org/wiki/Exponential_function](http://en.wikipedia.org/wiki/Exponential_function)
* <a id='id2'>**[2]**</a> M. Abramowitz and I. A. Stegun, “Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables,” Dover, 1964, p. 69, [http://www.math.sfu.ca/~cbm/aands/page_69.htm](http://www.math.sfu.ca/~cbm/aands/page_69.htm)
### Examples
Plot the magnitude and phase of `exp(x)` in the complex plane:
```pycon
>>> import maxframe.tensor as mt
>>> import matplotlib.pyplot as plt
```
```pycon
>>> x = mt.linspace(-2*mt.pi, 2*mt.pi, 100)
>>> xx = x + 1j * x[:, mt.newaxis] # a + ib over complex plane
>>> out = mt.exp(xx)
```
```pycon
>>> plt.subplot(121)
>>> plt.imshow(mt.abs(out).execute(),
... extent=[-2*mt.pi, 2*mt.pi, -2*mt.pi, 2*mt.pi], cmap='gray')
>>> plt.title('Magnitude of exp(x)')
```
```pycon
>>> plt.subplot(122)
>>> plt.imshow(mt.angle(out).execute(),
... extent=[-2*mt.pi, 2*mt.pi, -2*mt.pi, 2*mt.pi], cmap='hsv')
>>> plt.title('Phase (angle) of exp(x)')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.exp2.md
# maxframe.tensor.exp2
### maxframe.tensor.exp2(x, out=None, where=None, \*\*kwargs)
Calculate 2\*\*p for all p in the input tensor.
* **Parameters:**
* **x** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Element-wise 2 to the power x.
* **Return type:**
Tensor
#### SEE ALSO
[`power`](maxframe.tensor.power.md#maxframe.tensor.power)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.exp2([2, 3]).execute()
array([ 4., 8.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.expand_dims.md
# maxframe.tensor.expand_dims
### maxframe.tensor.expand_dims(a, axis)
Expand the shape of a tensor.
Insert a new axis that will appear at the axis position in the expanded
array shape.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Position in the expanded axes where the new axis is placed.
* **Returns:**
**res** – Output tensor. The number of dimensions is one greater than that of
the input tensor.
* **Return type:**
Tensor
#### SEE ALSO
[`squeeze`](maxframe.tensor.squeeze.md#maxframe.tensor.squeeze)
: The inverse operation, removing singleton dimensions
[`reshape`](maxframe.tensor.reshape.md#maxframe.tensor.reshape)
: Insert, remove, and combine dimensions, and resize existing ones
`doc.indexing`, [`atleast_1d`](maxframe.tensor.atleast_1d.md#maxframe.tensor.atleast_1d), [`atleast_2d`](maxframe.tensor.atleast_2d.md#maxframe.tensor.atleast_2d), [`atleast_3d`](maxframe.tensor.atleast_3d.md#maxframe.tensor.atleast_3d)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([1,2])
>>> x.shape
(2,)
```
The following is equivalent to `x[mt.newaxis,:]` or `x[mt.newaxis]`:
```pycon
>>> y = mt.expand_dims(x, axis=0)
>>> y.execute()
array([[1, 2]])
>>> y.shape
(1, 2)
```
```pycon
>>> y = mt.expand_dims(x, axis=1) # Equivalent to x[:,mt.newaxis]
>>> y.execute()
array([[1],
[2]])
>>> y.shape
(2, 1)
```
Note that some examples may use `None` instead of `mt.newaxis`. These
are the same objects:
```pycon
>>> mt.newaxis is None
True
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.expm1.md
# maxframe.tensor.expm1
### maxframe.tensor.expm1(x, out=None, where=None, \*\*kwargs)
Calculate `exp(x) - 1` for all elements in the tensor.
* **Parameters:**
* **x** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Element-wise exponential minus one: `out = exp(x) - 1`.
* **Return type:**
Tensor
#### SEE ALSO
[`log1p`](maxframe.tensor.log1p.md#maxframe.tensor.log1p)
: `log(1 + x)`, the inverse of expm1.
### Notes
This function provides greater precision than `exp(x) - 1`
for small values of `x`.
### Examples
The true value of `exp(1e-10) - 1` is `1.00000000005e-10` to
about 32 significant digits. This example shows the superiority of
expm1 in this case.
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.expm1(1e-10).execute()
1.00000000005e-10
>>> (mt.exp(1e-10) - 1).execute()
1.000000082740371e-10
```
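The loss of precision comes from cancellation: `exp(x)` is rounded to a value very close to 1 before the subtraction. A minimal sketch with the standard-library `math` module (used here in place of MaxFrame so that it runs locally) compares both routes against a short Taylor-series reference; the name `reference` and the three-term truncation are choices made for this illustration.

```python
import math

x = 1e-10

# Reference from the Taylor series x + x**2/2 + x**3/6;
# the omitted terms are far below double precision at this x.
reference = x + x**2 / 2 + x**3 / 6

accurate = math.expm1(x)    # dedicated routine, no cancellation
naive = math.exp(x) - 1     # exp(x) is rounded near 1.0 before subtracting

rel_err_accurate = abs(accurate - reference) / reference
rel_err_naive = abs(naive - reference) / reference
```

On IEEE double precision, `rel_err_naive` is many orders of magnitude larger than `rel_err_accurate`.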
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.fft.md
# maxframe.tensor.fft.fft
### maxframe.tensor.fft.fft(a, n=None, axis=-1, norm=None)
Compute the one-dimensional discrete Fourier Transform.
This function computes the one-dimensional *n*-point discrete Fourier
Transform (DFT) with the efficient Fast Fourier Transform (FFT)
algorithm [CT].
* **Parameters:**
* **a** (*array_like*) – Input tensor, can be complex.
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Length of the transformed axis of the output.
If n is smaller than the length of the input, the input is cropped.
If it is larger, the input is padded with zeros. If n is not given,
the length of the input along the axis specified by axis is used.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis over which to compute the FFT. If not given, the last axis is
used.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified.
* **Return type:**
complex Tensor
* **Raises:**
[**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If axis is larger than the last axis of a.
#### SEE ALSO
`mt.fft`
: for definition of the DFT and conventions used.
[`ifft`](maxframe.tensor.fft.ifft.md#maxframe.tensor.fft.ifft)
: The inverse of fft.
[`fft2`](maxframe.tensor.fft.fft2.md#maxframe.tensor.fft.fft2)
: The two-dimensional FFT.
[`fftn`](maxframe.tensor.fft.fftn.md#maxframe.tensor.fft.fftn)
: The *n*-dimensional FFT.
[`rfftn`](maxframe.tensor.fft.rfftn.md#maxframe.tensor.fft.rfftn)
: The *n*-dimensional FFT of real input.
[`fftfreq`](maxframe.tensor.fft.fftfreq.md#maxframe.tensor.fft.fftfreq)
: Frequency bins for given FFT parameters.
### Notes
FFT (Fast Fourier Transform) refers to a way the discrete Fourier
Transform (DFT) can be calculated efficiently, by using symmetries in the
calculated terms. The symmetry is highest when n is a power of 2, and
the transform is therefore most efficient for these sizes.
The DFT is defined, with the conventions used in this implementation, in
the documentation for the numpy.fft module.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.fft.fft(mt.exp(2j * mt.pi * mt.arange(8) / 8)).execute()
array([-2.33486982e-16+1.14423775e-17j, 8.00000000e+00-6.89018570e-16j,
2.33486982e-16+2.33486982e-16j, 0.00000000e+00+0.00000000e+00j,
-1.14423775e-17+2.33486982e-16j, 0.00000000e+00+1.99159850e-16j,
1.14423775e-17+1.14423775e-17j, 0.00000000e+00+0.00000000e+00j])
```
In this example, real input has an FFT which is Hermitian, i.e., symmetric
in the real part and anti-symmetric in the imaginary part, as described in
the numpy.fft documentation:
```pycon
>>> import matplotlib.pyplot as plt
>>> t = mt.arange(256)
>>> sp = mt.fft.fft(mt.sin(t))
>>> freq = mt.fft.fftfreq(t.shape[-1])
>>> plt.plot(freq.execute(), sp.real.execute(), freq.execute(), sp.imag.execute())
[<matplotlib.lines.Line2D object at 0x...>, <matplotlib.lines.Line2D object at 0x...>]
>>> plt.show()
```
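For reference, the quantity that the FFT evaluates can be written as a direct O(n²) sum. The sketch below uses only the standard `cmath` module and follows the numpy.fft sign convention (negative exponent in the forward transform); `naive_dft` is a hypothetical helper written for this illustration, not part of MaxFrame.

```python
import cmath

def naive_dft(x):
    """Direct DFT: X[k] = sum_m x[m] * exp(-2j*pi*m*k/n)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

# One full cycle of a complex exponential over 8 samples:
# all of the energy lands in frequency bin 1.
signal = [cmath.exp(2j * cmath.pi * m / 8) for m in range(8)]
spectrum = naive_dft(signal)

# Real input: the spectrum is Hermitian, X[n-k] == conj(X[k]).
real_spectrum = naive_dft([1.0, 2.0, 0.0, -1.0, 3.0])
```

An FFT produces the same values up to rounding, but in O(n log n) operations.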
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.fft2.md
# maxframe.tensor.fft.fft2
### maxframe.tensor.fft.fft2(a, s=None, axes=(-2, -1), norm=None)
Compute the 2-dimensional discrete Fourier Transform.
This function computes the *n*-dimensional discrete Fourier Transform
over any axes in an *M*-dimensional array by means of the
Fast Fourier Transform (FFT). By default, the transform is computed over
the last two axes of the input array, i.e., a 2-dimensional FFT.
* **Parameters:**
* **a** (*array_like*) – Input tensor, can be complex
* **s** (*sequence* *of* *ints* *,* *optional*) – Shape (length of each transformed axis) of the output
(`s[0]` refers to axis 0, `s[1]` to axis 1, etc.).
This corresponds to `n` for `fft(x, n)`.
Along each axis, if the given shape is smaller than that of the input,
the input is cropped. If it is larger, the input is padded with zeros.
If s is not given, the shape of the input along the axes specified
by axes is used.
* **axes** (*sequence* *of* *ints* *,* *optional*) – Axes over which to compute the FFT. If not given, the last two
axes are used. A repeated index in axes means the transform over
that axis is performed multiple times. A one-element sequence means
that a one-dimensional FFT is performed.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axes
indicated by axes, or the last two axes if axes is not given.
* **Return type:**
complex Tensor
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If s and axes have different length, or axes not given and
`len(s) != 2`.
* [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If an element of axes is larger than the number of axes of a.
#### SEE ALSO
`mt.fft`
: Overall view of discrete Fourier transforms, with definitions and conventions used.
[`ifft2`](maxframe.tensor.fft.ifft2.md#maxframe.tensor.fft.ifft2)
: The inverse two-dimensional FFT.
[`fft`](maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft)
: The one-dimensional FFT.
[`fftn`](maxframe.tensor.fft.fftn.md#maxframe.tensor.fft.fftn)
: The *n*-dimensional FFT.
[`fftshift`](maxframe.tensor.fft.fftshift.md#maxframe.tensor.fft.fftshift)
: Shifts zero-frequency terms to the center of the array. For two-dimensional input, swaps first and third quadrants, and second and fourth quadrants.
### Notes
fft2 is just fftn with a different default for axes.
The output, analogously to fft, contains the term for zero frequency in
the low-order corner of the transformed axes, the positive frequency terms
in the first half of these axes, the term for the Nyquist frequency in the
middle of the axes and the negative frequency terms in the second half of
the axes, in order of decreasingly negative frequency.
See fftn for details and a plotting example, and mt.fft for
definitions and conventions used.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.mgrid[:5, :5][0]
>>> mt.fft.fft2(a).execute()
array([[ 50.0 +0.j , 0.0 +0.j , 0.0 +0.j ,
0.0 +0.j , 0.0 +0.j ],
[-12.5+17.20477401j, 0.0 +0.j , 0.0 +0.j ,
0.0 +0.j , 0.0 +0.j ],
[-12.5 +4.0614962j , 0.0 +0.j , 0.0 +0.j ,
0.0 +0.j , 0.0 +0.j ],
[-12.5 -4.0614962j , 0.0 +0.j , 0.0 +0.j ,
0.0 +0.j , 0.0 +0.j ],
[-12.5-17.20477401j, 0.0 +0.j , 0.0 +0.j ,
0.0 +0.j , 0.0 +0.j ]])
```
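A multi-dimensional DFT is separable: it can be computed as one-dimensional transforms along each axis in turn. The sketch below illustrates this in pure Python for the 5×5 input used above; `dft` and `dft2` are hypothetical helpers for this illustration and say nothing about how MaxFrame evaluates the transform internally.

```python
import cmath

def dft(x):
    """Direct 1-D DFT with the numpy.fft sign convention."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

def dft2(a):
    """2-D DFT of a list of rows: transform each row, then each column."""
    rows = [dft(row) for row in a]
    cols = [dft(list(col)) for col in zip(*rows)]  # cols[j][i]
    return [list(row) for row in zip(*cols)]       # back to out[i][j]

# Same values as mt.mgrid[:5, :5][0]: element [i][j] equals i.
a = [[float(i)] * 5 for i in range(5)]
out = dft2(a)
```

Every row of `a` is constant, so only the first column of `out` is non-zero, which matches the output shown above.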
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.fftfreq.md
# maxframe.tensor.fft.fftfreq
### maxframe.tensor.fft.fftfreq(n, d=1.0, gpu=None, chunk_size=None)
Return the Discrete Fourier Transform sample frequencies.
The returned float tensor f contains the frequency bin centers in cycles
per unit of the sample spacing (with zero at the start). For instance, if
the sample spacing is in seconds, then the frequency unit is cycles/second.
Given a window length n and a sample spacing d:
```default
f = [0, 1, ..., n/2-1, -n/2, ..., -1] / (d*n) if n is even
f = [0, 1, ..., (n-1)/2, -(n-1)/2, ..., -1] / (d*n) if n is odd
```
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Window length.
* **d** (*scalar* *,* *optional*) – Sample spacing (inverse of the sampling rate). Defaults to 1.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, None as default
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **Returns:**
**f** – Array of length n containing the sample frequencies.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> signal = mt.array([-2, 8, 6, 4, 1, 0, 3, 5], dtype=float)
>>> fourier = mt.fft.fft(signal)
>>> n = signal.size
>>> timestep = 0.1
>>> freq = mt.fft.fftfreq(n, d=timestep)
>>> freq.execute()
array([ 0. , 1.25, 2.5 , 3.75, -5. , -3.75, -2.5 , -1.25])
```
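The even and odd cases of the formula above collapse into a single rule: the non-negative bins run from 0 to `(n - 1) // 2` and the negative bins from `-(n // 2)` to -1. A minimal pure-Python sketch follows; `fftfreq_list` is a hypothetical helper written for this illustration.

```python
def fftfreq_list(n, d=1.0):
    """Sample frequencies for window length n and sample spacing d."""
    non_negative = list(range(0, (n - 1) // 2 + 1))
    negative = list(range(-(n // 2), 0))
    return [k / (d * n) for k in non_negative + negative]

even = fftfreq_list(8, d=0.1)   # same arguments as the example above
odd = fftfreq_list(9, d=1.0 / 9)
```

For even `n` the bin at index `n // 2` is reported with a negative sign, which is why `-5.` appears in the output above rather than `5.`.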
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.fftn.md
# maxframe.tensor.fft.fftn
### maxframe.tensor.fft.fftn(a, s=None, axes=None, norm=None)
Compute the N-dimensional discrete Fourier Transform.
This function computes the *N*-dimensional discrete Fourier Transform over
any number of axes in an *M*-dimensional tensor by means of the Fast Fourier
Transform (FFT).
* **Parameters:**
* **a** (*array_like*) – Input tensor, can be complex.
* **s** (*sequence* *of* *ints* *,* *optional*) – Shape (length of each transformed axis) of the output
(`s[0]` refers to axis 0, `s[1]` to axis 1, etc.).
This corresponds to `n` for `fft(x, n)`.
Along any axis, if the given shape is smaller than that of the input,
the input is cropped. If it is larger, the input is padded with zeros.
If s is not given, the shape of the input along the axes specified
by axes is used.
* **axes** (*sequence* *of* *ints* *,* *optional*) – Axes over which to compute the FFT. If not given, the last `len(s)`
axes are used, or all axes if s is also not specified.
Repeated indices in axes mean that the transform over that axis is
performed multiple times.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axes
indicated by axes, or by a combination of s and a,
as explained in the parameters section above.
* **Return type:**
complex Tensor
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If s and axes have different length.
* [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If an element of axes is larger than the number of axes of a.
#### SEE ALSO
`mt.fft`
: Overall view of discrete Fourier transforms, with definitions and conventions used.
[`ifftn`](maxframe.tensor.fft.ifftn.md#maxframe.tensor.fft.ifftn)
: The inverse of fftn, the inverse *n*-dimensional FFT.
[`fft`](maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft)
: The one-dimensional FFT, with definitions and conventions used.
[`rfftn`](maxframe.tensor.fft.rfftn.md#maxframe.tensor.fft.rfftn)
: The *n*-dimensional FFT of real input.
[`fft2`](maxframe.tensor.fft.fft2.md#maxframe.tensor.fft.fft2)
: The two-dimensional FFT.
[`fftshift`](maxframe.tensor.fft.fftshift.md#maxframe.tensor.fft.fftshift)
: Shifts zero-frequency terms to centre of tensor
### Notes
The output, analogously to fft, contains the term for zero frequency in
the low-order corner of all axes, the positive frequency terms in the
first half of all axes, the term for the Nyquist frequency in the middle
of all axes and the negative frequency terms in the second half of all
axes, in order of decreasingly negative frequency.
See mt.fft for details, definitions and conventions used.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.mgrid[:3, :3, :3][0]
>>> mt.fft.fftn(a, axes=(1, 2)).execute()
array([[[ 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j]],
[[ 9.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j]],
[[ 18.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j]]])
>>> mt.fft.fftn(a, (2, 2), axes=(0, 1)).execute()
array([[[ 2.+0.j, 2.+0.j, 2.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j]],
[[-2.+0.j, -2.+0.j, -2.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j]]])
```
```pycon
>>> import matplotlib.pyplot as plt
>>> [X, Y] = mt.meshgrid(2 * mt.pi * mt.arange(200) / 12,
... 2 * mt.pi * mt.arange(200) / 34)
>>> S = mt.sin(X) + mt.cos(Y) + mt.random.uniform(0, 1, X.shape)
>>> FS = mt.fft.fftn(S)
>>> plt.imshow(mt.log(mt.abs(mt.fft.fftshift(FS))**2).execute())
<matplotlib.image.AxesImage object at 0x...>
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.fftshift.md
# maxframe.tensor.fft.fftshift
### maxframe.tensor.fft.fftshift(x, axes=None)
Shift the zero-frequency component to the center of the spectrum.
This function swaps half-spaces for all axes listed (defaults to all).
Note that `y[0]` is the Nyquist component only if `len(x)` is even.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **axes** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *shape tuple* *,* *optional*) – Axes over which to shift. Default is None, which shifts all axes.
* **Returns:**
**y** – The shifted tensor.
* **Return type:**
Tensor
#### SEE ALSO
[`ifftshift`](maxframe.tensor.fft.ifftshift.md#maxframe.tensor.fft.ifftshift)
: The inverse of fftshift.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> freqs = mt.fft.fftfreq(10, 0.1)
>>> freqs.execute()
array([ 0., 1., 2., 3., 4., -5., -4., -3., -2., -1.])
>>> mt.fft.fftshift(freqs).execute()
array([-5., -4., -3., -2., -1., 0., 1., 2., 3., 4.])
```
Shift the zero-frequency component only along the second axis:
```pycon
>>> freqs = mt.fft.fftfreq(9, d=1./9).reshape(3, 3)
>>> freqs.execute()
array([[ 0., 1., 2.],
[ 3., 4., -4.],
[-3., -2., -1.]])
>>> mt.fft.fftshift(freqs, axes=(1,)).execute()
array([[ 2., 0., 1.],
[-4., 3., 4.],
[-1., -3., -2.]])
```
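Along a single axis, the shift is a circular rotation by `n // 2` positions. The sketch below shows this for plain Python lists; `fftshift_1d` and `ifftshift_1d` are hypothetical helpers for this illustration. It also shows why a separate inverse exists: for odd lengths, applying the forward shift twice does not restore the input.

```python
def fftshift_1d(x):
    """Rotate right by len(x) // 2 so the zero-frequency bin is centered."""
    n = len(x)
    k = n // 2
    return [x[(i - k) % n] for i in range(n)]

def ifftshift_1d(x):
    """Undo fftshift_1d: rotate left by len(x) // 2."""
    n = len(x)
    k = n // 2
    return [x[(i + k) % n] for i in range(n)]

even = [0, 1, 2, 3, 4, -5, -4, -3, -2, -1]
odd = [0, 1, 2]

shifted_even = fftshift_1d(even)
shifted_odd = fftshift_1d(odd)
restored_odd = ifftshift_1d(shifted_odd)
twice_odd = fftshift_1d(shifted_odd)
```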
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.hfft.md
# maxframe.tensor.fft.hfft
### maxframe.tensor.fft.hfft(a, n=None, axis=-1, norm=None)
Compute the FFT of a signal that has Hermitian symmetry, i.e., a real
spectrum.
* **Parameters:**
* **a** (*array_like*) – The input tensor.
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Length of the transformed axis of the output. For n output
points, `n//2 + 1` input points are necessary. If the input is
longer than this, it is cropped. If it is shorter than this, it is
padded with zeros. If n is not given, it is determined from the
length of the input along the axis specified by axis.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis over which to compute the FFT. If not given, the last
axis is used.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified.
The length of the transformed axis is n, or, if n is not given,
`2*m - 2` where `m` is the length of the transformed axis of
the input. To get an odd number of output points, n must be
specified, for instance as `2*m - 1` in the typical case.
* **Return type:**
Tensor
* **Raises:**
[**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If axis is larger than the last axis of a.
#### SEE ALSO
[`rfft`](maxframe.tensor.fft.rfft.md#maxframe.tensor.fft.rfft)
: Compute the one-dimensional FFT for real input.
[`ihfft`](maxframe.tensor.fft.ihfft.md#maxframe.tensor.fft.ihfft)
: The inverse of hfft.
### Notes
hfft/ihfft are a pair analogous to rfft/irfft, but for the
opposite case: here the signal has Hermitian symmetry in the time
domain and is real in the frequency domain. So here it’s hfft for
which you must supply the length of the result if it is to be odd.
* even: `ihfft(hfft(a, 2*len(a) - 2)) == a`, within roundoff error,
* odd: `ihfft(hfft(a, 2*len(a) - 1)) == a`, within roundoff error.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> signal = mt.array([1, 2, 3, 4, 3, 2])
>>> mt.fft.fft(signal).execute()
array([ 15.+0.j, -4.+0.j, 0.+0.j, -1.-0.j, 0.+0.j, -4.+0.j])
>>> mt.fft.hfft(signal[:4]).execute() # Input first half of signal
array([ 15., -4., 0., -1., 0., -4.])
>>> mt.fft.hfft(signal, 6).execute() # Input entire signal and truncate
array([ 15., -4., 0., -1., 0., -4.])
```
```pycon
>>> signal = mt.array([[1, 1.j], [-1.j, 2]])
>>> (mt.conj(signal.T) - signal).execute() # check Hermitian symmetry
array([[ 0.-0.j, 0.+0.j],
[ 0.+0.j, 0.-0.j]])
>>> freq_spectrum = mt.fft.hfft(signal)
>>> freq_spectrum.execute()
array([[ 1., 1.],
[ 2., -2.]])
```
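The first example can be reproduced by hand: extending the first `m` points to a Hermitian-symmetric signal of length `2*m - 2` and taking an ordinary DFT yields a spectrum whose imaginary part vanishes. This is a pure-Python sketch; `dft` and `hermitian_extend` are hypothetical helpers for this illustration, not MaxFrame functions.

```python
import cmath

def dft(x):
    """Direct 1-D DFT with the numpy.fft sign convention."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

def hermitian_extend(half):
    """Rebuild the length-(2*m - 2) signal from its first m points."""
    mirrored = [complex(v).conjugate() for v in half[-2:0:-1]]
    return list(half) + mirrored

half = [1, 2, 3, 4]
full = hermitian_extend(half)   # same values as [1, 2, 3, 4, 3, 2]
spectrum = dft(full)
real_part = [v.real for v in spectrum]
```

Because the result is real, `hfft` can return a real tensor and needs only the first half of the signal as input.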
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.ifft.md
# maxframe.tensor.fft.ifft
### maxframe.tensor.fft.ifft(a, n=None, axis=-1, norm=None)
Compute the one-dimensional inverse discrete Fourier Transform.
This function computes the inverse of the one-dimensional *n*-point
discrete Fourier transform computed by fft. In other words,
`ifft(fft(a)) == a` to within numerical accuracy.
For a general description of the algorithm and definitions,
see mt.fft.
The input should be ordered in the same way as is returned by fft,
i.e.,
* `a[0]` should contain the zero frequency term,
* `a[1:n//2]` should contain the positive-frequency terms,
* `a[n//2 + 1:]` should contain the negative-frequency terms, in
increasing order starting from the most negative frequency.
For an even number of input points, `a[n//2]` represents the sum of
the values at the positive and negative Nyquist frequencies, as the two
are aliased together. See numpy.fft for details.
* **Parameters:**
* **a** (*array_like*) – Input tensor, can be complex.
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Length of the transformed axis of the output.
If n is smaller than the length of the input, the input is cropped.
If it is larger, the input is padded with zeros. If n is not given,
the length of the input along the axis specified by axis is used.
See notes about padding issues.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis over which to compute the inverse DFT. If not given, the last
axis is used.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see numpy.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified.
* **Return type:**
complex Tensor
* **Raises:**
[**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If axis is larger than the last axis of a.
#### SEE ALSO
`mt.fft`
: An introduction, with definitions and general explanations.
[`fft`](maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft)
: The one-dimensional (forward) FFT, of which ifft is the inverse
[`ifft2`](maxframe.tensor.fft.ifft2.md#maxframe.tensor.fft.ifft2)
: The two-dimensional inverse FFT.
[`ifftn`](maxframe.tensor.fft.ifftn.md#maxframe.tensor.fft.ifftn)
: The n-dimensional inverse FFT.
### Notes
If the input parameter n is larger than the size of the input, the input
is padded by appending zeros at the end. Even though this is the common
approach, it might lead to surprising results. If a different padding is
desired, it must be performed before calling ifft.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.fft.ifft([0, 4, 0, 0]).execute()
array([ 1.+0.j, 0.+1.j, -1.+0.j, 0.-1.j])
```
Create and plot a band-limited signal with random phases:
```pycon
>>> import matplotlib.pyplot as plt
>>> t = mt.arange(400)
>>> n = mt.zeros((400,), dtype=complex)
>>> n[40:60] = mt.exp(1j*mt.random.uniform(0, 2*mt.pi, (20,)))
>>> s = mt.fft.ifft(n)
>>> plt.plot(t.execute(), s.real.execute(), 'b-', t.execute(), s.imag.execute(), 'r--')
...
>>> plt.legend(('real', 'imaginary'))
...
>>> plt.show()
```
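The inverse transform differs from the forward one only in the sign of the exponent and a factor of `1/n` (with the default `norm=None`, following the numpy.fft convention). The sketch below writes both as direct sums in pure Python and checks the round trip; `dft` and `idft` are hypothetical helpers for this illustration.

```python
import cmath

def dft(x):
    """Forward DFT: X[k] = sum_m x[m] * exp(-2j*pi*m*k/n)."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

def idft(X):
    """Inverse DFT: x[m] = (1/n) * sum_k X[k] * exp(+2j*pi*m*k/n)."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * m * k / n) for k in range(n)) / n
        for m in range(n)
    ]

# Same input as the first example above.
x = idft([0, 4, 0, 0])

# Round trip on arbitrary data recovers the input up to rounding.
original = [1.0, -2.0, 0.5, 3.0, 7.0]
roundtrip = idft(dft(original))
```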
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.ifft2.md
# maxframe.tensor.fft.ifft2
### maxframe.tensor.fft.ifft2(a, s=None, axes=(-2, -1), norm=None)
Compute the 2-dimensional inverse discrete Fourier Transform.
This function computes the inverse of the 2-dimensional discrete Fourier
Transform over any number of axes in an M-dimensional array by means of
the Fast Fourier Transform (FFT). In other words, `ifft2(fft2(a)) == a`
to within numerical accuracy. By default, the inverse transform is
computed over the last two axes of the input array.
The input, analogously to ifft, should be ordered in the same way as is
returned by fft2, i.e. it should have the term for zero frequency
in the low-order corner of the two axes, the positive frequency terms in
the first half of these axes, the term for the Nyquist frequency in the
middle of the axes and the negative frequency terms in the second half of
both axes, in order of decreasingly negative frequency.
* **Parameters:**
* **a** (*array_like*) – Input tensor, can be complex.
* **s** (*sequence* *of* *ints* *,* *optional*) – Shape (length of each axis) of the output (`s[0]` refers to axis 0,
`s[1]` to axis 1, etc.). This corresponds to n for `ifft(x, n)`.
Along each axis, if the given shape is smaller than that of the input,
the input is cropped. If it is larger, the input is padded with zeros.
If s is not given, the shape of the input along the axes specified
by axes is used. See notes for issue on ifft zero padding.
* **axes** (*sequence* *of* *ints* *,* *optional*) – Axes over which to compute the FFT. If not given, the last two
axes are used. A repeated index in axes means the transform over
that axis is performed multiple times. A one-element sequence means
that a one-dimensional FFT is performed.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axes
indicated by axes, or the last two axes if axes is not given.
* **Return type:**
complex Tensor
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If s and axes have different length, or axes not given and
`len(s) != 2`.
* [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If an element of axes is larger than the number of axes of a.
#### SEE ALSO
`mt.fft`
: Overall view of discrete Fourier transforms, with definitions and conventions used.
[`fft2`](maxframe.tensor.fft.fft2.md#maxframe.tensor.fft.fft2)
: The forward 2-dimensional FFT, of which ifft2 is the inverse.
[`ifftn`](maxframe.tensor.fft.ifftn.md#maxframe.tensor.fft.ifftn)
: The inverse of the *n*-dimensional FFT.
[`fft`](maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft)
: The one-dimensional FFT.
[`ifft`](maxframe.tensor.fft.ifft.md#maxframe.tensor.fft.ifft)
: The one-dimensional inverse FFT.
### Notes
ifft2 is just ifftn with a different default for axes.
See ifftn for details and a plotting example, and numpy.fft for
definition and conventions used.
Zero-padding, analogously with ifft, is performed by appending zeros to
the input along the specified dimension. Although this is the common
approach, it might lead to surprising results. If another form of zero
padding is desired, it must be performed before ifft2 is called.
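The statement that ifft2 is ifftn with a different default for axes can be checked numerically. The sketch below uses NumPy directly so that it runs without a MaxFrame session; it assumes, as these pages suggest but do not guarantee, that `mt.fft` follows the `numpy.fft` conventions.

```python
import numpy as np

# A stack of three 4x4 complex arrays; ifft2 should transform each one
# independently over the last two axes.
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4, 4)) + 1j * rng.standard_normal((3, 4, 4))

via_ifft2 = np.fft.ifft2(a)
via_ifftn = np.fft.ifftn(a, axes=(-2, -1))

# With no axes argument, ifftn transforms all three axes instead.
all_axes = np.fft.ifftn(a)

print(np.allclose(via_ifft2, via_ifftn))
print(np.allclose(via_ifft2, all_axes))
```

The first comparison should agree and the second should not, because only the choice of axes differs between the two functions.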
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = 4 * mt.eye(4)
>>> mt.fft.ifft2(a).execute()
array([[ 1.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 1.+0.j],
[ 0.+0.j, 0.+0.j, 1.+0.j, 0.+0.j],
[ 0.+0.j, 1.+0.j, 0.+0.j, 0.+0.j]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.ifftn.md
# maxframe.tensor.fft.ifftn
### maxframe.tensor.fft.ifftn(a, s=None, axes=None, norm=None)
Compute the N-dimensional inverse discrete Fourier Transform.
This function computes the inverse of the N-dimensional discrete
Fourier Transform over any number of axes in an M-dimensional tensor by
means of the Fast Fourier Transform (FFT). In other words,
`ifftn(fftn(a)) == a` to within numerical accuracy.
For a description of the definitions and conventions used, see mt.fft.
The input, analogously to ifft, should be ordered in the same way as is
returned by fftn, i.e. it should have the term for zero frequency
in all axes in the low-order corner, the positive frequency terms in the
first half of all axes, the term for the Nyquist frequency in the middle
of all axes and the negative frequency terms in the second half of all
axes, in order of decreasingly negative frequency.
* **Parameters:**
* **a** (*array_like*) – Input tensor, can be complex.
* **s** (*sequence* *of* *ints* *,* *optional*) – Shape (length of each transformed axis) of the output
(`s[0]` refers to axis 0, `s[1]` to axis 1, etc.).
This corresponds to `n` for `ifft(x, n)`.
Along any axis, if the given shape is smaller than that of the input,
the input is cropped. If it is larger, the input is padded with zeros.
If s is not given, the shape of the input along the axes specified
by axes is used. See the Notes for an issue with ifft zero padding.
* **axes** (*sequence* *of* *ints* *,* *optional*) – Axes over which to compute the IFFT. If not given, the last `len(s)`
axes are used, or all axes if s is also not specified.
Repeated indices in axes means that the inverse transform over that
axis is performed multiple times.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axes
indicated by axes, or by a combination of s or a,
as explained in the parameters section above.
* **Return type:**
complex Tensor
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If s and axes have different length.
* [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If an element of axes is larger than the number of axes of a.
#### SEE ALSO
`mt.fft`
: Overall view of discrete Fourier transforms, with definitions and conventions used.
[`fftn`](maxframe.tensor.fft.fftn.md#maxframe.tensor.fft.fftn)
: The forward *n*-dimensional FFT, of which ifftn is the inverse.
[`ifft`](maxframe.tensor.fft.ifft.md#maxframe.tensor.fft.ifft)
: The one-dimensional inverse FFT.
[`ifft2`](maxframe.tensor.fft.ifft2.md#maxframe.tensor.fft.ifft2)
: The two-dimensional inverse FFT.
[`ifftshift`](maxframe.tensor.fft.ifftshift.md#maxframe.tensor.fft.ifftshift)
: Undoes fftshift, shifts zero-frequency terms to beginning of tensor.
### Notes
See mt.fft for definitions and conventions used.
Zero-padding, analogously with ifft, is performed by appending zeros to
the input along the specified dimension. Although this is the common
approach, it might lead to surprising results. If another form of zero
padding is desired, it must be performed before ifftn is called.
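To make the zero-padding behaviour concrete, the following sketch compares padding requested through `s` with padding done by hand. It is written against NumPy on the assumption that `mt.fft.ifftn` pads the same way; the array values are arbitrary.

```python
import numpy as np

a = np.arange(1, 5, dtype=complex).reshape(2, 2)

# Requesting a larger output shape through `s` ...
padded_by_s = np.fft.ifftn(a, s=(4, 4))

# ... matches appending zeros at the end of each axis by hand.
manual = np.fft.ifftn(np.pad(a, ((0, 2), (0, 2))))

# Padding symmetrically around the data is a different input and
# therefore gives a different result.
centered = np.fft.ifftn(np.pad(a, ((1, 1), (1, 1))))

print(np.allclose(padded_by_s, manual))
print(np.allclose(padded_by_s, centered))
```

If centered padding is what the application needs, it has to be applied to the input before the transform, as in the last call.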
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.eye(4)
>>> mt.fft.ifftn(mt.fft.fftn(a, axes=(0,)), axes=(1,)).execute()
array([[ 1.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 1.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 1.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 1.+0.j]])
```
Create and plot an image with band-limited frequency content:
```pycon
>>> import matplotlib.pyplot as plt
>>> n = mt.zeros((200,200), dtype=complex)
>>> n[60:80, 20:40] = mt.exp(1j*mt.random.uniform(0, 2*mt.pi, (20, 20)))
>>> im = mt.fft.ifftn(n).real
>>> plt.imshow(im.execute())
<matplotlib.image.AxesImage object at 0x...>
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.ifftshift.md
# maxframe.tensor.fft.ifftshift
### maxframe.tensor.fft.ifftshift(x, axes=None)
The inverse of fftshift. Although identical for even-length x, the
functions differ by one sample for odd-length x.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **axes** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *shape tuple* *,* *optional*) – Axes over which to calculate. Defaults to None, which shifts all axes.
* **Returns:**
**y** – The shifted tensor.
* **Return type:**
Tensor
#### SEE ALSO
[`fftshift`](maxframe.tensor.fft.fftshift.md#maxframe.tensor.fft.fftshift)
: Shift zero-frequency component to the center of the spectrum.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> freqs = mt.fft.fftfreq(9, d=1./9).reshape(3, 3)
>>> freqs.execute()
array([[ 0., 1., 2.],
[ 3., 4., -4.],
[-3., -2., -1.]])
>>> mt.fft.ifftshift(mt.fft.fftshift(freqs)).execute()
array([[ 0., 1., 2.],
[ 3., 4., -4.],
[-3., -2., -1.]])
```
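The example above uses odd-length axes, which is exactly the case where fftshift and ifftshift differ. A one-dimensional sketch makes the one-sample difference visible; it uses NumPy and assumes the MaxFrame functions shift by the same amounts.

```python
import numpy as np

odd = np.arange(5)
even = np.arange(4)

# Even length: both functions roll by the same amount.
same_for_even = np.array_equal(np.fft.fftshift(even), np.fft.ifftshift(even))

# Odd length: the results differ by one sample ...
shifted = np.fft.fftshift(odd)      # [3 4 0 1 2]
unshifted = np.fft.ifftshift(odd)   # [2 3 4 0 1]

# ... so only ifftshift undoes fftshift.
round_trip = np.fft.ifftshift(np.fft.fftshift(odd))

print(same_for_even, shifted, unshifted, round_trip)
```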
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.ihfft.md
# maxframe.tensor.fft.ihfft
### maxframe.tensor.fft.ihfft(a, n=None, axis=-1, norm=None)
Compute the inverse FFT of a signal that has Hermitian symmetry.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Length of the inverse FFT, the number of points along
transformation axis in the input to use. If n is smaller than
the length of the input, the input is cropped. If it is larger,
the input is padded with zeros. If n is not given, the length of
the input along the axis specified by axis is used.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis over which to compute the inverse FFT. If not given, the last
axis is used.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see numpy.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified.
The length of the transformed axis is `n//2 + 1`.
* **Return type:**
complex Tensor
#### SEE ALSO
[`hfft`](maxframe.tensor.fft.hfft.md#maxframe.tensor.fft.hfft), [`irfft`](maxframe.tensor.fft.irfft.md#maxframe.tensor.fft.irfft)
### Notes
hfft/ihfft are a pair analogous to rfft/irfft, but for the
opposite case: here the signal has Hermitian symmetry in the time
domain and is real in the frequency domain. So here it’s hfft for
which you must supply the length of the result if it is to be odd:
* even: `ihfft(hfft(a, 2*len(a) - 2)) == a`, within roundoff error,
* odd: `ihfft(hfft(a, 2*len(a) - 1)) == a`, within roundoff error.
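Both round trips can be verified with a short sketch. It uses NumPy's `hfft` and `ihfft`, assuming the MaxFrame versions follow the same conventions. The input is chosen so that its first and last elements are real, which the even-length case requires.

```python
import numpy as np

# Half of a Hermitian-symmetric signal: the first element is real, and
# the last element is real as well so the even-length case round-trips.
a = np.array([1.0, 2.0 + 1.0j, 3.0 - 1.0j, 4.0])

even_len = 2 * len(a) - 2   # 6
odd_len = 2 * len(a) - 1    # 7

even_ok = np.allclose(np.fft.ihfft(np.fft.hfft(a, even_len)), a)
odd_ok = np.allclose(np.fft.ihfft(np.fft.hfft(a, odd_len)), a)

print(even_ok, odd_ok)
```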
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> spectrum = mt.array([ 15, -4, 0, -1, 0, -4])
>>> mt.fft.ifft(spectrum).execute()
array([ 1.+0.j, 2.-0.j, 3.+0.j, 4.+0.j, 3.+0.j, 2.-0.j])
>>> mt.fft.ihfft(spectrum).execute()
array([ 1.-0.j, 2.-0.j, 3.-0.j, 4.-0.j])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.irfft.md
# maxframe.tensor.fft.irfft
### maxframe.tensor.fft.irfft(a, n=None, axis=-1, norm=None)
Compute the inverse of the n-point DFT for real input.
This function computes the inverse of the one-dimensional *n*-point
discrete Fourier Transform of real input computed by rfft.
In other words, `irfft(rfft(a), len(a)) == a` to within numerical
accuracy. (See Notes below for why `len(a)` is necessary here.)
The input is expected to be in the form returned by rfft, i.e. the
real zero-frequency term followed by the complex positive frequency terms
in order of increasing frequency. Since the discrete Fourier Transform of
real input is Hermitian-symmetric, the negative frequency terms are taken
to be the complex conjugates of the corresponding positive frequency terms.
* **Parameters:**
* **a** (*array_like*) – The input tensor.
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Length of the transformed axis of the output.
For n output points, `n//2+1` input points are necessary. If the
input is longer than this, it is cropped. If it is shorter than this,
it is padded with zeros. If n is not given, it is determined from
the length of the input along the axis specified by axis.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis over which to compute the inverse FFT. If not given, the last
axis is used.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified.
The length of the transformed axis is n, or, if n is not given,
`2*(m-1)` where `m` is the length of the transformed axis of the
input. To get an odd number of output points, n must be specified.
* **Return type:**
Tensor
* **Raises:**
[**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If axis is larger than the last axis of a.
#### SEE ALSO
`mt.fft`
: For definition of the DFT and conventions used.
[`rfft`](maxframe.tensor.fft.rfft.md#maxframe.tensor.fft.rfft)
: The one-dimensional FFT of real input, of which irfft is inverse.
[`fft`](maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft)
: The one-dimensional FFT.
[`irfft2`](maxframe.tensor.fft.irfft2.md#maxframe.tensor.fft.irfft2)
: The inverse of the two-dimensional FFT of real input.
[`irfftn`](maxframe.tensor.fft.irfftn.md#maxframe.tensor.fft.irfftn)
: The inverse of the *n*-dimensional FFT of real input.
### Notes
Returns the real valued n-point inverse discrete Fourier transform
of a, where a contains the non-negative frequency terms of a
Hermitian-symmetric sequence. n is the length of the result, not the
input.
If you specify an n such that a must be zero-padded or truncated, the
extra/removed values will be added/removed at high frequencies. One can
thus resample a series to m points via Fourier interpolation by:
`a_resamp = irfft(rfft(a), m)`.
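A small sketch of this resampling recipe, written with NumPy on the assumption that `mt.fft` uses the same default normalization. Because the inverse transform divides by the output length m, the sketch rescales by `m / n` to preserve the original amplitude; that rescaling is an addition for illustration and is not part of the formula above.

```python
import numpy as np

n, m = 8, 16
k = np.arange(n)
a = np.cos(2 * np.pi * k / n)          # one cycle sampled at 8 points

# irfft normalizes by 1/m, so rescale by m/n to keep the amplitude.
a_resamp = np.fft.irfft(np.fft.rfft(a), m) * (m / n)

# The same cosine sampled directly at 16 points.
expected = np.cos(2 * np.pi * np.arange(m) / m)

print(np.allclose(a_resamp, expected))
```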
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.fft.ifft([1, -1j, -1, 1j]).execute()
array([ 0.+0.j, 1.+0.j, 0.+0.j, 0.+0.j])
>>> mt.fft.irfft([1, -1j, -1]).execute()
array([ 0., 1., 0., 0.])
```
Notice how the last term in the input to the ordinary ifft is the
complex conjugate of the second term, and the output has zero imaginary
part everywhere. When calling irfft, the negative frequencies are not
specified, and the output array is purely real.
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.irfft2.md
# maxframe.tensor.fft.irfft2
### maxframe.tensor.fft.irfft2(a, s=None, axes=(-2, -1), norm=None)
Compute the 2-dimensional inverse FFT of a real array.
* **Parameters:**
* **a** (*array_like*) – The input tensor
* **s** (*sequence* *of* *ints* *,* *optional*) – Shape of the inverse FFT.
* **axes** (*sequence* *of* *ints* *,* *optional*) – The axes over which to compute the inverse fft.
Default is the last two axes.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The result of the inverse real 2-D FFT.
* **Return type:**
Tensor
#### SEE ALSO
[`irfftn`](maxframe.tensor.fft.irfftn.md#maxframe.tensor.fft.irfftn)
: Compute the inverse of the N-dimensional FFT of real input.
### Notes
This is really irfftn with different defaults.
For more details see irfftn.
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.irfftn.md
# maxframe.tensor.fft.irfftn
### maxframe.tensor.fft.irfftn(a, s=None, axes=None, norm=None)
Compute the inverse of the N-dimensional FFT of real input.
This function computes the inverse of the N-dimensional discrete
Fourier Transform for real input over any number of axes in an
M-dimensional tensor by means of the Fast Fourier Transform (FFT). In
other words, `irfftn(rfftn(a), a.shape) == a` to within numerical
accuracy. (The `a.shape` is necessary like `len(a)` is for irfft,
and for the same reason.)
The input should be ordered in the same way as is returned by rfftn,
i.e. as for irfft for the final transformation axis, and as for ifftn
along all the other axes.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **s** (*sequence* *of* *ints* *,* *optional*) – Shape (length of each transformed axis) of the output
(`s[0]` refers to axis 0, `s[1]` to axis 1, etc.). s is also the
number of input points used along this axis, except for the last axis,
where `s[-1]//2+1` points of the input are used.
Along any axis, if the shape indicated by s is smaller than that of
the input, the input is cropped. If it is larger, the input is padded
with zeros. If s is not given, the shape of the input along the
axes specified by axes is used.
* **axes** (*sequence* *of* *ints* *,* *optional*) – Axes over which to compute the inverse FFT. If not given, the last
len(s) axes are used, or all axes if s is also not specified.
Repeated indices in axes means that the inverse transform over that
axis is performed multiple times.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axes
indicated by axes, or by a combination of s or a,
as explained in the parameters section above.
The length of each transformed axis is as given by the corresponding
element of s, or the length of the input in every axis except for the
last one if s is not given. In the final transformed axis the length
of the output when s is not given is `2*(m-1)` where `m` is the
length of the final transformed axis of the input. To get an odd
number of output points in the final axis, s must be specified.
* **Return type:**
Tensor
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If s and axes have different length.
* [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If an element of axes is larger than the number of axes of a.
#### SEE ALSO
[`rfftn`](maxframe.tensor.fft.rfftn.md#maxframe.tensor.fft.rfftn)
: The forward n-dimensional FFT of real input, of which irfftn is the inverse.
[`fft`](maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft)
: The one-dimensional FFT, with definitions and conventions used.
[`irfft`](maxframe.tensor.fft.irfft.md#maxframe.tensor.fft.irfft)
: The inverse of the one-dimensional FFT of real input.
[`irfft2`](maxframe.tensor.fft.irfft2.md#maxframe.tensor.fft.irfft2)
: The inverse of the two-dimensional FFT of real input.
### Notes
See fft for definitions and conventions used.
See rfft for definitions and conventions used for real input.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.zeros((3, 2, 2))
>>> a[0, 0, 0] = 3 * 2 * 2
>>> mt.fft.irfftn(a).execute()
array([[[ 1., 1.],
[ 1., 1.]],
[[ 1., 1.],
[ 1., 1.]],
[[ 1., 1.],
[ 1., 1.]]])
```
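The need to pass `a.shape` becomes visible when the last axis has odd length. The sketch below uses NumPy, assuming `mt.fft.rfftn` and `mt.fft.irfftn` infer shapes the same way.

```python
import numpy as np

a = np.arange(15, dtype=float).reshape(3, 5)   # odd-length last axis
spectrum = np.fft.rfftn(a)                      # shape (3, 3)

# Without `s`, the last axis is assumed to have even length 2*(3-1) = 4.
default_shape = np.fft.irfftn(spectrum).shape

# Passing the original shape recovers the input.
restored = np.fft.irfftn(spectrum, s=a.shape)

print(spectrum.shape, default_shape, np.allclose(restored, a))
```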
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.rfft.md
# maxframe.tensor.fft.rfft
### maxframe.tensor.fft.rfft(a, n=None, axis=-1, norm=None)
Compute the one-dimensional discrete Fourier Transform for real input.
This function computes the one-dimensional *n*-point discrete Fourier
Transform (DFT) of a real-valued array by means of an efficient algorithm
called the Fast Fourier Transform (FFT).
* **Parameters:**
* **a** (*array_like*) – Input tensor
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of points along transformation axis in the input to use.
If n is smaller than the length of the input, the input is cropped.
If it is larger, the input is padded with zeros. If n is not given,
the length of the input along the axis specified by axis is used.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis over which to compute the FFT. If not given, the last axis is
used.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified.
If n is even, the length of the transformed axis is `(n/2)+1`.
If n is odd, the length is `(n+1)/2`.
* **Return type:**
complex Tensor
* **Raises:**
[**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If axis is larger than the last axis of a.
#### SEE ALSO
`mt.fft`
: For definition of the DFT and conventions used.
[`irfft`](maxframe.tensor.fft.irfft.md#maxframe.tensor.fft.irfft)
: The inverse of rfft.
[`fft`](maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft)
: The one-dimensional FFT of general (complex) input.
[`fftn`](maxframe.tensor.fft.fftn.md#maxframe.tensor.fft.fftn)
: The *n*-dimensional FFT.
[`rfftn`](maxframe.tensor.fft.rfftn.md#maxframe.tensor.fft.rfftn)
: The *n*-dimensional FFT of real input.
### Notes
When the DFT is computed for purely real input, the output is
Hermitian-symmetric, i.e. the negative frequency terms are just the complex
conjugates of the corresponding positive-frequency terms, and the
negative-frequency terms are therefore redundant. This function does not
compute the negative frequency terms, and the length of the transformed
axis of the output is therefore `n//2 + 1`.
When `A = rfft(a)` and fs is the sampling frequency, `A[0]` contains
the zero-frequency term 0\*fs, which is real due to Hermitian symmetry.
If n is even, `A[-1]` contains the term representing both positive
and negative Nyquist frequency (+fs/2 and -fs/2), and must also be purely
real. If n is odd, there is no term at fs/2; `A[-1]` contains
the largest positive frequency (fs/2\*(n-1)/n), and is complex in the
general case.
If the input a contains an imaginary part, it is silently discarded.
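The properties listed in these notes can be checked on a small input. This is a NumPy sketch under the assumption that `mt.fft.rfft` returns the same values; the sample data are arbitrary.

```python
import numpy as np

even = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0])   # n = 6
odd = even[:5]                                      # n = 5

A_even = np.fft.rfft(even)
A_odd = np.fft.rfft(odd)

# Output length is n//2 + 1 in both cases.
lengths = (len(A_even), len(A_odd))

# The zero-frequency term is the plain sum of the input, hence real.
dc_is_real = np.isclose(A_even[0].imag, 0.0) and np.isclose(A_even[0].real, even.sum())

# For even n the last term is the Nyquist term and is real;
# for odd n the last term is generally complex.
nyquist_is_real = np.isclose(A_even[-1].imag, 0.0)
last_odd_is_complex = not np.isclose(A_odd[-1].imag, 0.0)

print(lengths, dc_is_real, nyquist_is_real, last_odd_is_complex)
```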
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.fft.fft([0, 1, 0, 0]).execute()
array([ 1.+0.j, 0.-1.j, -1.+0.j, 0.+1.j])
>>> mt.fft.rfft([0, 1, 0, 0]).execute()
array([ 1.+0.j, 0.-1.j, -1.+0.j])
```
Notice how the final element of the fft output is the complex conjugate
of the second element, for real input. For rfft, this symmetry is
exploited to compute only the non-negative frequency terms.
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.rfft2.md
# maxframe.tensor.fft.rfft2
### maxframe.tensor.fft.rfft2(a, s=None, axes=(-2, -1), norm=None)
Compute the 2-dimensional FFT of a real tensor.
* **Parameters:**
* **a** (*array_like*) – Input tensor, taken to be real.
* **s** (*sequence* *of* *ints* *,* *optional*) – Shape of the FFT.
* **axes** (*sequence* *of* *ints* *,* *optional*) – Axes over which to compute the FFT.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The result of the real 2-D FFT.
* **Return type:**
Tensor
#### SEE ALSO
[`rfftn`](maxframe.tensor.fft.rfftn.md#maxframe.tensor.fft.rfftn)
: Compute the N-dimensional discrete Fourier Transform for real input.
### Notes
This is really just rfftn with different default behavior.
For more details see rfftn.
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.rfftfreq.md
# maxframe.tensor.fft.rfftfreq
### maxframe.tensor.fft.rfftfreq(n, d=1.0, gpu=None, chunk_size=None)
Return the Discrete Fourier Transform sample frequencies
(for usage with rfft, irfft).
The returned float tensor f contains the frequency bin centers in cycles
per unit of the sample spacing (with zero at the start). For instance, if
the sample spacing is in seconds, then the frequency unit is cycles/second.
Given a window length n and a sample spacing d:
```default
f = [0, 1, ..., n/2-1, n/2] / (d*n) if n is even
f = [0, 1, ..., (n-1)/2-1, (n-1)/2] / (d*n) if n is odd
```
Unlike fftfreq (but like scipy.fftpack.rfftfreq)
the Nyquist frequency component is considered to be positive.
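Both branches of the formula above reduce to the integers `0 .. n//2` divided by `d*n`. The sketch below spells that out with a hypothetical helper, `rfftfreq_by_formula`, and compares it with NumPy's `rfftfreq`, which `mt.fft.rfftfreq` is assumed to match.

```python
import numpy as np

def rfftfreq_by_formula(n, d=1.0):
    """Sample frequencies 0, 1, ..., n//2 divided by d*n (hypothetical helper)."""
    return np.arange(n // 2 + 1) / (d * n)

# Compare against the library routine for an odd and an even window length.
matches = [
    np.allclose(rfftfreq_by_formula(n, d=0.01), np.fft.rfftfreq(n, d=0.01))
    for n in (9, 10)
]

print(matches)
print(rfftfreq_by_formula(10, d=0.01))
```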
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Window length.
* **d** (*scalar* *,* *optional*) – Sample spacing (inverse of the sampling rate). Defaults to 1.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **Returns:**
**f** – Tensor of length `n//2 + 1` containing the sample frequencies.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> signal = mt.array([-2, 8, 6, 4, 1, 0, 3, 5, -3, 4], dtype=float)
>>> fourier = mt.fft.rfft(signal)
>>> n = signal.size
>>> sample_rate = 100
>>> freq = mt.fft.fftfreq(n, d=1./sample_rate)
>>> freq.execute()
array([ 0., 10., 20., 30., 40., -50., -40., -30., -20., -10.])
>>> freq = mt.fft.rfftfreq(n, d=1./sample_rate)
>>> freq.execute()
array([ 0., 10., 20., 30., 40., 50.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fft.rfftn.md
# maxframe.tensor.fft.rfftn
### maxframe.tensor.fft.rfftn(a, s=None, axes=None, norm=None)
Compute the N-dimensional discrete Fourier Transform for real input.
This function computes the N-dimensional discrete Fourier Transform over
any number of axes in an M-dimensional real tensor by means of the Fast
Fourier Transform (FFT). By default, all axes are transformed, with the
real transform performed over the last axis, while the remaining
transforms are complex.
* **Parameters:**
* **a** (*array_like*) – Input tensor, taken to be real.
* **s** (*sequence* *of* *ints* *,* *optional*) – Shape (length along each transformed axis) to use from the input.
(`s[0]` refers to axis 0, `s[1]` to axis 1, etc.).
The final element of s corresponds to n for `rfft(x, n)`, while
for the remaining axes, it corresponds to n for `fft(x, n)`.
Along any axis, if the given shape is smaller than that of the input,
the input is cropped. If it is larger, the input is padded with zeros.
If s is not given, the shape of the input along the axes specified
by axes is used.
* **axes** (*sequence* *of* *ints* *,* *optional*) – Axes over which to compute the FFT. If not given, the last `len(s)`
axes are used, or all axes if s is also not specified.
* **norm** ( *{None* *,* *"ortho"}* *,* *optional*) – Normalization mode (see mt.fft). Default is None.
* **Returns:**
**out** – The truncated or zero-padded input, transformed along the axes
indicated by axes, or by a combination of s and a,
as explained in the parameters section above.
The length of the last axis transformed will be `s[-1]//2+1`,
while the remaining transformed axes will have lengths according to
s, or unchanged from the input.
* **Return type:**
complex Tensor
* **Raises:**
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If s and axes have different length.
* [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError) – If an element of axes is larger than the number of axes of a.
#### SEE ALSO
[`irfftn`](maxframe.tensor.fft.irfftn.md#maxframe.tensor.fft.irfftn)
: The inverse of rfftn, i.e. the inverse of the n-dimensional FFT of real input.
[`fft`](maxframe.tensor.fft.fft.md#maxframe.tensor.fft.fft)
: The one-dimensional FFT, with definitions and conventions used.
[`rfft`](maxframe.tensor.fft.rfft.md#maxframe.tensor.fft.rfft)
: The one-dimensional FFT of real input.
[`fftn`](maxframe.tensor.fft.fftn.md#maxframe.tensor.fft.fftn)
: The n-dimensional FFT.
[`rfft2`](maxframe.tensor.fft.rfft2.md#maxframe.tensor.fft.rfft2)
: The two-dimensional FFT of real input.
### Notes
The transform for real input is performed over the last transformation
axis, as by rfft, then the transform over the remaining axes is
performed as by fftn. The order of the output is as for rfft for the
final transformation axis, and as for fftn for the remaining
transformation axes.
See fft for details, definitions and conventions used.
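The two-stage description in these notes can be reproduced by hand for a 2-D input. The sketch uses NumPy and assumes `mt.fft.rfftn` composes the transforms in the same order.

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(3, 4)

# Step 1: real transform over the last axis (4 -> 4//2 + 1 = 3 columns).
step1 = np.fft.rfft(a, axis=-1)

# Step 2: ordinary complex transform over the remaining axis.
step2 = np.fft.fft(step1, axis=0)

direct = np.fft.rfftn(a)

print(direct.shape, np.allclose(step2, direct))
```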
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.ones((2, 2, 2))
>>> mt.fft.rfftn(a).execute()
array([[[ 8.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j]],
[[ 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j]]])
```
```pycon
>>> mt.fft.rfftn(a, axes=(2, 0)).execute()
array([[[ 4.+0.j, 0.+0.j],
[ 4.+0.j, 0.+0.j]],
[[ 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j]]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fill_diagonal.md
# maxframe.tensor.fill_diagonal
### maxframe.tensor.fill_diagonal(a, val, wrap=False)
Fill the main diagonal of the given tensor of any dimensionality.
For a tensor a with `a.ndim >= 2`, the diagonal is the list of
locations with indices `a[i, ..., i]` all identical. This function
modifies the input tensor in-place, it does not return a value.
* **Parameters:**
* **a** (*Tensor* *,* *at least 2-D.*) – Tensor whose diagonal is to be filled, it gets modified in-place.
* **val** (*scalar*) – Value to be written on the diagonal, its type must be compatible with
that of the tensor a.
* **wrap** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – For tall matrices in NumPy version up to 1.6.2, the
diagonal “wrapped” after N columns. You can have this behavior
with this option. This affects only tall matrices.
#### SEE ALSO
`diag_indices`, `diag_indices_from`
### Notes
This functionality can be obtained via diag_indices, but internally
this version uses a much faster implementation that never constructs the
indices and uses simple slicing.
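The slicing idea can be sketched for the 2-D case as follows. `fill_diagonal_2d` is a hypothetical helper written against NumPy arrays; it mirrors the strided flat-slice approach used by NumPy's reference `fill_diagonal` and is not taken from MaxFrame's source.

```python
import numpy as np

def fill_diagonal_2d(a, val, wrap=False):
    """Fill the main diagonal of a 2-D array in place via a strided flat slice.

    Hypothetical helper for illustration; it is not MaxFrame's implementation.
    """
    cols = a.shape[1]
    step = cols + 1                       # flat distance between diagonal cells
    end = None if wrap else cols * cols   # stop after `cols` rows unless wrapping
    a.flat[:end:step] = val

tall = np.zeros((5, 3), int)
fill_diagonal_2d(tall, 4)

tall_wrapped = np.zeros((5, 3), int)
fill_diagonal_2d(tall_wrapped, 4, wrap=True)

print(tall)
print(tall_wrapped)
```

No index arrays are built: a single slice over the flattened data touches every diagonal cell, and the `end` bound is what distinguishes the wrapped from the unwrapped case for tall matrices.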
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.zeros((3, 3), int)
>>> mt.fill_diagonal(a, 5)
>>> a.execute()
array([[5, 0, 0],
[0, 5, 0],
[0, 0, 5]])
```
The same function can operate on a 4-D tensor:
```pycon
>>> a = mt.zeros((3, 3, 3, 3), int)
>>> mt.fill_diagonal(a, 4)
```
We only show a few blocks for clarity:
```pycon
>>> a[0, 0].execute()
array([[4, 0, 0],
[0, 0, 0],
[0, 0, 0]])
>>> a[1, 1].execute()
array([[0, 0, 0],
[0, 4, 0],
[0, 0, 0]])
>>> a[2, 2].execute()
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 4]])
```
The wrap option affects only tall matrices:
```pycon
>>> # tall matrices no wrap
>>> a = mt.zeros((5, 3), int)
>>> mt.fill_diagonal(a, 4)
>>> a.execute()
array([[4, 0, 0],
[0, 4, 0],
[0, 0, 4],
[0, 0, 0],
[0, 0, 0]])
```
```pycon
>>> # tall matrices wrap
>>> a = mt.zeros((5, 3), int)
>>> mt.fill_diagonal(a, 4, wrap=True)
>>> a.execute()
array([[4, 0, 0],
[0, 4, 0],
[0, 0, 4],
[0, 0, 0],
[4, 0, 0]])
```
```pycon
>>> # wide matrices
>>> a = mt.zeros((3, 5), int)
>>> mt.fill_diagonal(a, 4, wrap=True)
>>> a.execute()
array([[4, 0, 0, 0, 0],
[0, 4, 0, 0, 0],
[0, 0, 4, 0, 0]])
```
The anti-diagonal can be filled by reversing the order of elements
using either `mt.flipud` or `mt.fliplr`.
```pycon
>>> a = mt.zeros((3, 3), int)
>>> mt.fill_diagonal(mt.fliplr(a), [1,2,3]) # Horizontal flip
>>> a.execute()
array([[0, 0, 1],
[0, 2, 0],
[3, 0, 0]])
>>> mt.fill_diagonal(mt.flipud(a), [1,2,3]) # Vertical flip
>>> a.execute()
array([[0, 0, 3],
[0, 2, 0],
[1, 0, 0]])
```
Note that the order in which the diagonal is filled varies depending
on the flip function.
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fix.md
# maxframe.tensor.fix
### maxframe.tensor.fix(x, out=None, \*\*kwargs)
Round to nearest integer towards zero.
Round a tensor of floats element-wise to nearest integer towards zero.
The rounded values are returned as floats.
* **Parameters:**
* **x** (*array_like*) – A tensor of floats to be rounded
* **out** (*Tensor* *,* *optional*) – Output tensor
* **Returns:**
**out** – The array of rounded numbers
* **Return type:**
Tensor of floats
#### SEE ALSO
[`trunc`](maxframe.tensor.trunc.md#maxframe.tensor.trunc), [`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor), [`ceil`](maxframe.tensor.ceil.md#maxframe.tensor.ceil)
[`around`](maxframe.tensor.around.md#maxframe.tensor.around)
: Round to given number of decimals
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.fix(3.14).execute()
3.0
>>> mt.fix(3).execute()
3.0
>>> mt.fix([2.1, 2.9, -2.1, -2.9]).execute()
array([ 2., 2., -2., -2.])
```
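Rounding towards zero differs from floor and ceil only in which one it picks for each sign. The sketch below shows this with NumPy, using `np.trunc` (listed under See Also as `trunc`) as the reference; it assumes `mt.fix` applies the same rule to float input, as the example above indicates.

```python
import numpy as np

x = np.array([2.1, 2.9, -2.1, -2.9])

toward_zero = np.trunc(x)        # [ 2.  2. -2. -2.]
toward_minus_inf = np.floor(x)   # [ 2.  2. -3. -3.]
toward_plus_inf = np.ceil(x)     # [ 3.  3. -2. -2.]

# Rounding towards zero picks floor for non-negative values
# and ceil for negative ones.
by_hand = np.where(x >= 0, np.floor(x), np.ceil(x))

print(toward_zero, by_hand)
```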
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.flatnonzero.md
# maxframe.tensor.flatnonzero
### maxframe.tensor.flatnonzero(a)
Return indices that are non-zero in the flattened version of a.
This is equivalent to a.ravel().nonzero()[0].
* **Parameters:**
**a** (*Tensor*) – Input tensor.
* **Returns:**
**res** – Output tensor, containing the indices of the elements of a.ravel()
that are non-zero.
* **Return type:**
Tensor
#### SEE ALSO
[`nonzero`](maxframe.tensor.nonzero.md#maxframe.tensor.nonzero)
: Return the indices of the non-zero elements of the input tensor.
[`ravel`](maxframe.tensor.ravel.md#maxframe.tensor.ravel)
: Return a 1-D tensor containing the elements of the input tensor.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(-2, 3)
>>> x.execute()
array([-2, -1, 0, 1, 2])
>>> mt.flatnonzero(x).execute()
array([0, 1, 3, 4])
```
Use the indices of the non-zero elements as an index array to extract
these elements:
```pycon
>>> x.ravel()[mt.flatnonzero(x)].execute()
array([-2, -1,  1,  2])
```
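For a multi-dimensional input the returned indices refer to the flattened tensor, not to rows and columns. A NumPy sketch of the equivalence with `a.ravel().nonzero()[0]`, assuming `mt.flatnonzero` flattens in the same row-major order:

```python
import numpy as np

a = np.array([[0, 3, 0],
              [7, 0, 9]])

flat_idx = np.flatnonzero(a)            # indices into a.ravel()
same_idx = a.ravel().nonzero()[0]       # the equivalent spelled out
values = a.ravel()[flat_idx]            # the non-zero elements themselves

print(flat_idx, values)
```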
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.flip.md
# maxframe.tensor.flip
### maxframe.tensor.flip(m, axis)
Reverse the order of elements in a tensor along the given axis.
The shape of the array is preserved, but the elements are reordered.
* **Parameters:**
* **m** (*array_like*) – Input tensor.
* **axis** (*integer*) – Axis in tensor, which entries are reversed.
* **Returns:**
**out** – A view of m with the entries of axis reversed. Since a view is
returned, this operation is done in constant time.
* **Return type:**
array_like
#### SEE ALSO
[`flipud`](maxframe.tensor.flipud.md#maxframe.tensor.flipud)
: Flip a tensor vertically (axis=0).
[`fliplr`](maxframe.tensor.fliplr.md#maxframe.tensor.fliplr)
: Flip a tensor horizontally (axis=1).
### Notes
flip(m, 0) is equivalent to flipud(m).
flip(m, 1) is equivalent to fliplr(m).
flip(m, n) corresponds to `m[...,::-1,...]` with `::-1` at position n.
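The slicing equivalence in the last note can be written as a small helper. `flip_by_slicing` is a hypothetical name used only for this sketch, which runs against NumPy arrays and assumes `mt.flip` orders elements the same way.

```python
import numpy as np

def flip_by_slicing(m, axis):
    """Reverse `m` along `axis` with basic slicing (hypothetical helper)."""
    index = [slice(None)] * m.ndim
    index[axis] = slice(None, None, -1)   # the `::-1` at position `axis`
    return m[tuple(index)]

A = np.arange(24).reshape(2, 3, 4)

# The helper should agree with np.flip on every axis.
checks = [
    np.array_equal(flip_by_slicing(A, ax), np.flip(A, ax))
    for ax in range(A.ndim)
]

print(checks)
```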
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> A = mt.arange(8).reshape((2,2,2))
>>> A.execute()
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
```
```pycon
>>> mt.flip(A, 0).execute()
array([[[4, 5],
        [6, 7]],

       [[0, 1],
        [2, 3]]])
```
```pycon
>>> mt.flip(A, 1).execute()
array([[[2, 3],
        [0, 1]],

       [[6, 7],
        [4, 5]]])
```
```pycon
>>> A = mt.random.randn(3,4,5)
>>> mt.all(mt.flip(A,2) == A[:,:,::-1,...]).execute()
True
```
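The slicing equivalence stated in the Notes can be spelled out with an explicit index tuple. This is a minimal sketch using NumPy as a local stand-in (assuming `maxframe.tensor` follows NumPy here); `flip_by_slicing` is a hypothetical helper written for illustration, not part of either library.

```python
import numpy as np  # assumption: NumPy stands in for maxframe.tensor


def flip_by_slicing(m, n):
    """Reverse axis n of m by placing ::-1 at position n of the index."""
    index = [slice(None)] * m.ndim
    index[n] = slice(None, None, -1)
    return m[tuple(index)]


A = np.arange(8).reshape((2, 2, 2))
same_axis0 = np.array_equal(flip_by_slicing(A, 0), np.flipud(A))
same_axis1 = np.array_equal(flip_by_slicing(A, 1), np.fliplr(A))
same_axis2 = np.array_equal(flip_by_slicing(A, 2), np.flip(A, 2))
```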
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fliplr.md
# maxframe.tensor.fliplr
### maxframe.tensor.fliplr(m)
Flip tensor in the left/right direction.
Flip the entries in each row in the left/right direction.
Columns are preserved, but appear in a different order than before.
* **Parameters:**
**m** (*array_like*) – Input tensor, must be at least 2-D.
* **Returns:**
**f** – A view of m with the columns reversed. Since a view
is returned, this operation is $\mathcal O(1)$.
* **Return type:**
Tensor
#### SEE ALSO
[`flipud`](maxframe.tensor.flipud.md#maxframe.tensor.flipud)
: Flip array in the up/down direction.
`rot90`
: Rotate array counterclockwise.
### Notes
Equivalent to m[:,::-1]. Requires the tensor to be at least 2-D.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> A = mt.diag([1.,2.,3.])
>>> A.execute()
array([[ 1., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 3.]])
>>> mt.fliplr(A).execute()
array([[ 0., 0., 1.],
[ 0., 2., 0.],
[ 3., 0., 0.]])
```
```pycon
>>> A = mt.random.randn(2,3,5)
>>> mt.all(mt.fliplr(A) == A[:,::-1,...]).execute()
True
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.flipud.md
# maxframe.tensor.flipud
### maxframe.tensor.flipud(m)
Flip tensor in the up/down direction.
Flip the entries in each column in the up/down direction.
Rows are preserved, but appear in a different order than before.
* **Parameters:**
**m** (*array_like*) – Input tensor.
* **Returns:**
**out** – A view of m with the rows reversed. Since a view is
returned, this operation is $\mathcal O(1)$.
* **Return type:**
array_like
#### SEE ALSO
[`fliplr`](maxframe.tensor.fliplr.md#maxframe.tensor.fliplr)
: Flip tensor in the left/right direction.
`rot90`
: Rotate tensor counterclockwise.
### Notes
Equivalent to `m[::-1,...]`.
Does not require the tensor to be two-dimensional.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> A = mt.diag([1.0, 2, 3])
>>> A.execute()
array([[ 1., 0., 0.],
[ 0., 2., 0.],
[ 0., 0., 3.]])
>>> mt.flipud(A).execute()
array([[ 0., 0., 3.],
[ 0., 2., 0.],
[ 1., 0., 0.]])
```
```pycon
>>> A = mt.random.randn(2,3,5)
>>> mt.all(mt.flipud(A) == A[::-1,...]).execute()
True
```
```pycon
>>> mt.flipud([1,2]).execute()
array([2, 1])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.float_power.md
# maxframe.tensor.float_power
### maxframe.tensor.float_power(x1, x2, out=None, where=None, \*\*kwargs)
First tensor elements raised to powers from second tensor, element-wise.
Raise each base in x1 to the positionally-corresponding power in x2.
x1 and x2 must be broadcastable to the same shape. This differs from
the power function in that integers, float16, and float32 are promoted to
floats with a minimum precision of float64 so that the result is always
inexact. The intent is that the function will return a usable result for
negative powers and seldom overflow for positive powers.
* **Parameters:**
* **x1** (*array_like*) – The bases.
* **x2** (*array_like*) – The exponents.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The bases in x1 raised to the exponents in x2.
* **Return type:**
Tensor
#### SEE ALSO
[`power`](maxframe.tensor.power.md#maxframe.tensor.power)
: power function that preserves type
### Examples
Cube each element in a list.
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x1 = list(range(6))
>>> x1
[0, 1, 2, 3, 4, 5]
>>> mt.float_power(x1, 3).execute()
array([ 0., 1., 8., 27., 64., 125.])
```
Raise the bases to different exponents.
```pycon
>>> x2 = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
>>> mt.float_power(x1, x2).execute()
array([ 0., 1., 8., 27., 16., 5.])
```
The effect of broadcasting.
```pycon
>>> x2 = mt.array([[1, 2, 3, 3, 2, 1], [1, 2, 3, 3, 2, 1]])
>>> x2.execute()
array([[1, 2, 3, 3, 2, 1],
[1, 2, 3, 3, 2, 1]])
>>> mt.float_power(x1, x2).execute()
array([[ 0., 1., 8., 27., 16., 5.],
[ 0., 1., 8., 27., 16., 5.]])
```
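The difference from `power` for negative exponents can be seen in a short local sketch. It uses NumPy as a stand-in, assuming `maxframe.tensor` follows NumPy's promotion rules here.

```python
import numpy as np  # assumption: NumPy stands in for maxframe.tensor

bases = np.array([1, 2, 4])

# float_power promotes the integer bases to float64, so negative
# exponents give a usable result.
inverse = np.float_power(bases, -1)

# power keeps the integer dtype and rejects negative integer exponents.
try:
    np.power(bases, -1)
    power_failed = False
except ValueError:
    power_failed = True
```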
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.floor.md
# maxframe.tensor.floor
### maxframe.tensor.floor(x, out=None, where=None, \*\*kwargs)
Return the floor of the input, element-wise.
The floor of the scalar x is the largest integer i, such that
i <= x. It is often denoted as $\lfloor x \rfloor$.
* **Parameters:**
* **x** (*array_like*) – Input data.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The floor of each element in x.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`ceil`](maxframe.tensor.ceil.md#maxframe.tensor.ceil), [`trunc`](maxframe.tensor.trunc.md#maxframe.tensor.trunc), [`rint`](maxframe.tensor.rint.md#maxframe.tensor.rint)
### Notes
Some spreadsheet programs calculate the “floor-towards-zero”, in other
words `floor(-2.5) == -2`. NumPy instead uses the definition of
floor where `floor(-2.5) == -3`.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
>>> mt.floor(a).execute()
array([-2., -2., -1., 0., 1., 1., 2.])
```
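The two conventions mentioned in the Notes differ only for negative non-integers. The sketch below contrasts them using NumPy as a local stand-in (assuming `maxframe.tensor` matches NumPy here); `trunc` plays the role of "floor-towards-zero".

```python
import numpy as np  # assumption: NumPy stands in for maxframe.tensor

a = np.array([-2.5, -0.2, 0.2, 2.5])
floored = np.floor(a)    # rounds toward minus infinity
truncated = np.trunc(a)  # rounds toward zero
```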
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.floor_divide.md
# maxframe.tensor.floor_divide
### maxframe.tensor.floor_divide(x1, x2, out=None, where=None, \*\*kwargs)
Return the largest integer smaller than or equal to the division of the inputs.
It is equivalent to the Python `//` operator and pairs with the
Python `%` (remainder) function so that `a = a % b + b * (a // b)`
up to roundoff.
* **Parameters:**
* **x1** (*array_like*) – Numerator.
* **x2** (*array_like*) – Denominator.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – y = floor(x1/x2)
* **Return type:**
Tensor
#### SEE ALSO
[`remainder`](maxframe.tensor.remainder.md#maxframe.tensor.remainder)
: Remainder complementary to floor_divide.
[`divmod`](https://docs.python.org/3/library/functions.html#divmod)
: Simultaneous floor division and remainder.
[`divide`](maxframe.tensor.divide.md#maxframe.tensor.divide)
: Standard division.
[`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor)
: Round a number to the nearest integer toward minus infinity.
[`ceil`](maxframe.tensor.ceil.md#maxframe.tensor.ceil)
: Round a number to the nearest integer toward infinity.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.floor_divide(7,3).execute()
2
>>> mt.floor_divide([1., 2., 3., 4.], 2.5).execute()
array([ 0., 0., 1., 1.])
```
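The pairing between floor division and remainder can be verified for mixed signs. This sketch uses NumPy as a local stand-in, assuming `maxframe.tensor` follows the same convention.

```python
import numpy as np  # assumption: NumPy stands in for maxframe.tensor

a = np.array([7, -7, 7, -7])
b = np.array([3, 3, -3, -3])
q = np.floor_divide(a, b)   # same as a // b
r = np.remainder(a, b)      # same as a % b
reconstructed = r + b * q   # should reproduce a
```

Because the quotient is rounded toward minus infinity, the remainder takes the sign of the divisor, which is what makes the identity hold for negative operands.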
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fmax.md
# maxframe.tensor.fmax
### maxframe.tensor.fmax(x1, x2, out=None, where=None, \*\*kwargs)
Element-wise maximum of array elements.
Compares two tensors and returns a new tensor containing the element-wise
maxima. If one of the elements being compared is a NaN, then the
non-nan element is returned. If both elements are NaNs then the first
is returned. The latter distinction is important for complex NaNs,
which are defined as at least one of the real or imaginary parts being
a NaN. The net effect is that NaNs are ignored when possible.
* **Parameters:**
* **x1** (*array_like*) – The tensors holding the elements to be compared. They must have
the same shape.
* **x2** (*array_like*) – The tensors holding the elements to be compared. They must have
the same shape.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The maximum of x1 and x2, element-wise. Returns scalar if
both x1 and x2 are scalars.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`fmin`](maxframe.tensor.fmin.md#maxframe.tensor.fmin)
: Element-wise minimum of two tensors, ignores NaNs.
[`maximum`](maxframe.tensor.maximum.md#maxframe.tensor.maximum)
: Element-wise maximum of two tensors, propagates NaNs.
`amax`
: The maximum value of a tensor along a given axis, propagates NaNs.
[`nanmax`](maxframe.tensor.nanmax.md#maxframe.tensor.nanmax)
: The maximum value of a tensor along a given axis, ignores NaNs.
[`minimum`](maxframe.tensor.minimum.md#maxframe.tensor.minimum), `amin`, [`nanmin`](maxframe.tensor.nanmin.md#maxframe.tensor.nanmin)
### Notes
The fmax is equivalent to `mt.where(x1 >= x2, x1, x2)` when neither
x1 nor x2 are NaNs, but it is faster and does proper broadcasting.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.fmax([2, 3, 4], [1, 5, 2]).execute()
array([2, 5, 4])
```
```pycon
>>> mt.fmax(mt.eye(2), [0.5, 2]).execute()
array([[ 1. , 2. ],
[ 0.5, 2. ]])
```
```pycon
>>> mt.fmax([mt.nan, 0, mt.nan],[0, mt.nan, mt.nan]).execute()
array([ 0., 0., nan])
```
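The difference between ignoring and propagating NaNs is easiest to see side by side. The sketch below uses NumPy as a local stand-in, assuming `maxframe.tensor` has the same NaN handling for `fmax` and `maximum`.

```python
import numpy as np  # assumption: NumPy stands in for maxframe.tensor

x1 = np.array([np.nan, 0.0, np.nan])
x2 = np.array([0.0, np.nan, np.nan])
ignoring = np.fmax(x1, x2)        # [0., 0., nan]: NaN only when both are NaN
propagating = np.maximum(x1, x2)  # [nan, nan, nan]: any NaN wins
```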
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fmin.md
# maxframe.tensor.fmin
### maxframe.tensor.fmin(x1, x2, out=None, where=None, \*\*kwargs)
Element-wise minimum of array elements.
Compares two tensors and returns a new tensor containing the element-wise
minima. If one of the elements being compared is a NaN, then the
non-nan element is returned. If both elements are NaNs then the first
is returned. The latter distinction is important for complex NaNs,
which are defined as at least one of the real or imaginary parts being
a NaN. The net effect is that NaNs are ignored when possible.
* **Parameters:**
* **x1** (*array_like*) – The tensors holding the elements to be compared. They must have
the same shape.
* **x2** (*array_like*) – The tensors holding the elements to be compared. They must have
the same shape.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The minimum of x1 and x2, element-wise. Returns scalar if
both x1 and x2 are scalars.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`fmax`](maxframe.tensor.fmax.md#maxframe.tensor.fmax)
: Element-wise maximum of two tensors, ignores NaNs.
[`minimum`](maxframe.tensor.minimum.md#maxframe.tensor.minimum)
: Element-wise minimum of two tensors, propagates NaNs.
`amin`
: The minimum value of a tensor along a given axis, propagates NaNs.
[`nanmin`](maxframe.tensor.nanmin.md#maxframe.tensor.nanmin)
: The minimum value of a tensor along a given axis, ignores NaNs.
[`maximum`](maxframe.tensor.maximum.md#maxframe.tensor.maximum), `amax`, [`nanmax`](maxframe.tensor.nanmax.md#maxframe.tensor.nanmax)
### Notes
The fmin is equivalent to `mt.where(x1 <= x2, x1, x2)` when neither
x1 nor x2 are NaNs, but it is faster and does proper broadcasting.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.fmin([2, 3, 4], [1, 5, 2]).execute()
array([1, 3, 2])
```
```pycon
>>> mt.fmin(mt.eye(2), [0.5, 2]).execute()
array([[ 0.5, 0. ],
[ 0. , 1. ]])
```
```pycon
>>> mt.fmin([mt.nan, 0, mt.nan],[0, mt.nan, mt.nan]).execute()
array([ 0., 0., nan])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.fmod.md
# maxframe.tensor.fmod
### maxframe.tensor.fmod(x1, x2, out=None, where=None, \*\*kwargs)
Return the element-wise remainder of division.
This is the NumPy implementation of the C library function fmod; the
remainder has the same sign as the dividend x1. It is equivalent to
the Matlab(TM) `rem` function and should not be confused with the
Python modulus operator `x1 % x2`.
* **Parameters:**
* **x1** (*array_like*) – Dividend.
* **x2** (*array_like*) – Divisor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs** – For other keyword-only arguments, see the
[ufunc docs](https://numpy.org/doc/stable/reference/ufuncs.html#ufuncs-kwargs).
* **Returns:**
**y** – The remainder of the division of x1 by x2.
* **Return type:**
Tensor_like
#### SEE ALSO
[`remainder`](maxframe.tensor.remainder.md#maxframe.tensor.remainder)
: Equivalent to the Python `%` operator.
[`divide`](maxframe.tensor.divide.md#maxframe.tensor.divide)
### Notes
The result of the modulo operation for negative dividend and divisors
is bound by conventions. For fmod, the sign of result is the sign of
the dividend, while for remainder the sign of the result is the sign
of the divisor. The fmod function is equivalent to the Matlab(TM)
`rem` function.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.fmod([-3, -2, -1, 1, 2, 3], 2).execute()
array([-1, 0, -1, 1, 0, 1])
>>> mt.remainder([-3, -2, -1, 1, 2, 3], 2).execute()
array([1, 0, 1, 1, 0, 1])
```
```pycon
>>> mt.fmod([5, 3], [2, 2.]).execute()
array([ 1., 1.])
>>> a = mt.arange(-3, 3).reshape(3, 2)
>>> a.execute()
array([[-3, -2],
[-1, 0],
[ 1, 2]])
>>> mt.fmod(a, [2,2]).execute()
array([[-1, 0],
[-1, 0],
[ 1, 0]])
```
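The sign conventions from the Notes can also be reproduced with the Python standard library alone: `math.fmod` wraps the C library `fmod`, while `%` is the Python modulus operator. This is an illustrative sketch, not MaxFrame code.

```python
import math

pairs = [(-3, 2), (3, -2), (-3, -2)]

# C-style fmod: the result takes the sign of the dividend.
fmod_results = [math.fmod(x, y) for x, y in pairs]

# Python %: the result takes the sign of the divisor.
percent_results = [x % y for x, y in pairs]
```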
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.frexp.md
# maxframe.tensor.frexp
### maxframe.tensor.frexp(x, out1=None, out2=None, out=None, where=None, \*\*kwargs)
Decompose the elements of x into mantissa and twos exponent.
Returns (mantissa, exponent), where `x = mantissa * 2**exponent`.
The mantissa lies in the open interval (-1, 1), while the twos
exponent is a signed integer.
* **Parameters:**
* **x** (*array_like*) – Tensor of numbers to be decomposed.
* **out1** (*Tensor* *,* *optional*) – Output tensor for the mantissa. Must have the same shape as x.
* **out2** (*Tensor* *,* *optional*) – Output tensor for the exponent. Must have the same shape as x.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**(mantissa, exponent)** – mantissa is a float array with values between -1 and 1.
exponent is an int array which represents the exponent of 2.
* **Return type:**
[tuple](https://docs.python.org/3/library/stdtypes.html#tuple) of tensors, ([float](https://docs.python.org/3/library/functions.html#float), [int](https://docs.python.org/3/library/functions.html#int))
#### SEE ALSO
[`ldexp`](maxframe.tensor.ldexp.md#maxframe.tensor.ldexp)
: Compute `y = x1 * 2**x2`, the inverse of frexp.
### Notes
Complex dtypes are not supported; they will raise a TypeError.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(9)
>>> y1, y2 = mt.frexp(x)
```
```pycon
>>> y1_result, y2_result = mt.ExecutableTuple([y1, y2]).execute()
>>> y1_result
array([ 0. , 0.5 , 0.5 , 0.75 , 0.5 , 0.625, 0.75 , 0.875,
0.5 ])
>>> y2_result
array([0, 1, 2, 2, 3, 3, 3, 3, 4])
>>> (y1 * 2**y2).execute()
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8.])
```
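The decomposition and its inverse can be checked on scalars with the standard library, whose `math.frexp` and `math.ldexp` follow the same definition. This is a local sketch for illustration only, including a negative input that the example above does not cover.

```python
import math

values = [0.0, 1.0, 3.0, 8.0, -5.0]

# Each value splits into (mantissa, exponent) with value == mantissa * 2**exponent.
parts = [math.frexp(v) for v in values]

# ldexp is the inverse: it rebuilds the value from the two parts.
rebuilt = [math.ldexp(m, e) for m, e in parts]
```

For non-zero inputs the mantissa magnitude falls in `[0.5, 1)`, which is why it always stays inside the open interval (-1, 1).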
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.full.md
# maxframe.tensor.full
### maxframe.tensor.full(shape, fill_value, dtype=None, chunk_size=None, gpu=None, order='C')
Return a new tensor of given shape and type, filled with fill_value.
* **Parameters:**
* **shape** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* *ints*) – Shape of the new tensor, e.g., `(2, 3)` or `2`.
* **fill_value** (*scalar*) – Fill value.
* **dtype** (*data-type* *,* *optional*) – The desired data-type for the tensor. The default, None, means
`np.array(fill_value).dtype`.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **order** ( *{'C'* *,* *'F'}* *,* *optional*) – Whether to store multidimensional data in C- or Fortran-contiguous
(row- or column-wise) order in memory.
* **Returns:**
**out** – Tensor of fill_value with the given shape, dtype, and order.
* **Return type:**
Tensor
#### SEE ALSO
`zeros_like`
: Return a tensor of zeros with shape and type of input.
`ones_like`
: Return a tensor of ones with shape and type of input.
[`empty_like`](maxframe.tensor.empty_like.md#maxframe.tensor.empty_like)
: Return an empty tensor with shape and type of input.
[`full_like`](maxframe.tensor.full_like.md#maxframe.tensor.full_like)
: Fill a tensor with shape and type of input.
[`zeros`](maxframe.tensor.zeros.md#maxframe.tensor.zeros)
: Return a new tensor setting values to zero.
[`ones`](maxframe.tensor.ones.md#maxframe.tensor.ones)
: Return a new tensor setting values to one.
[`empty`](maxframe.tensor.empty.md#maxframe.tensor.empty)
: Return a new uninitialized tensor.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.full((2, 2), mt.inf).execute()
array([[ inf, inf],
[ inf, inf]])
>>> mt.full((2, 2), 10).execute()
array([[10, 10],
[10, 10]])
```
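The default `dtype` rule, `np.array(fill_value).dtype`, explains why the two results above differ in type. The sketch below shows the rule with NumPy as a local stand-in, assuming `maxframe.tensor.full` infers the dtype the same way.

```python
import numpy as np  # assumption: NumPy stands in for maxframe.tensor

int_filled = np.full((2, 2), 10)                # dtype inferred from the int fill value
float_filled = np.full((2, 2), np.inf)          # dtype inferred from the float fill value
forced = np.full((2, 2), 10, dtype=np.float32)  # an explicit dtype overrides inference
```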
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.full_like.md
# maxframe.tensor.full_like
### maxframe.tensor.full_like(a, fill_value, dtype=None, gpu=None, order='K')
Return a full tensor with the same shape and type as a given tensor.
* **Parameters:**
* **a** (*array_like*) – The shape and data-type of a define these same attributes of
the returned tensor.
* **fill_value** (*scalar*) – Fill value.
* **dtype** (*data-type* *,* *optional*) – Overrides the data type of the result.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, None as default
* **order** ( *{'C'* *,* *'F'* *,* *'A'* *, or* *'K'}* *,* *optional*) – Overrides the memory layout of the result. ‘C’ means C-order,
‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous,
‘C’ otherwise. ‘K’ means match the layout of a as closely
as possible.
* **Returns:**
**out** – Tensor of fill_value with the same shape and type as a.
* **Return type:**
Tensor
#### SEE ALSO
[`empty_like`](maxframe.tensor.empty_like.md#maxframe.tensor.empty_like)
: Return an empty tensor with shape and type of input.
`ones_like`
: Return a tensor of ones with shape and type of input.
`zeros_like`
: Return a tensor of zeros with shape and type of input.
[`full`](maxframe.tensor.full.md#maxframe.tensor.full)
: Return a new tensor of given shape filled with value.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> x = mt.arange(6, dtype=int)
>>> mt.full_like(x, 1).execute()
array([1, 1, 1, 1, 1, 1])
>>> mt.full_like(x, 0.1).execute()
array([0, 0, 0, 0, 0, 0])
>>> mt.full_like(x, 0.1, dtype=mt.double).execute()
array([ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
>>> mt.full_like(x, mt.nan, dtype=mt.double).execute()
array([ nan, nan, nan, nan, nan, nan])
```
```pycon
>>> y = mt.arange(6, dtype=mt.double)
>>> mt.full_like(y, 0.1).execute()
array([ 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.greater.md
# maxframe.tensor.greater
### maxframe.tensor.greater(x1, x2, out=None, where=None, \*\*kwargs)
Return the truth value of (x1 > x2) element-wise.
* **Parameters:**
* **x1** (*array_like*) – Input tensors. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **x2** (*array_like*) – Input tensors. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Array of bools, or a single bool if x1 and x2 are scalars.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool) or Tensor of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`greater_equal`](maxframe.tensor.greater_equal.md#maxframe.tensor.greater_equal), [`less`](maxframe.tensor.less.md#maxframe.tensor.less), [`less_equal`](maxframe.tensor.less_equal.md#maxframe.tensor.less_equal), [`equal`](maxframe.tensor.equal.md#maxframe.tensor.equal), [`not_equal`](maxframe.tensor.not_equal.md#maxframe.tensor.not_equal)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.greater([4,2],[2,2]).execute()
array([ True, False])
```
If the inputs are tensors, then `mt.greater` is equivalent to `>`.
```pycon
>>> a = mt.array([4,2])
>>> b = mt.array([2,2])
>>> (a > b).execute()
array([ True, False])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.greater_equal.md
# maxframe.tensor.greater_equal
### maxframe.tensor.greater_equal(x1, x2, out=None, where=None, \*\*kwargs)
Return the truth value of (x1 >= x2) element-wise.
* **Parameters:**
* **x1** (*array_like*) – Input tensors. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **x2** (*array_like*) – Input tensors. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Array of bools, or a single bool if x1 and x2 are scalars.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool) or Tensor of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`greater`](maxframe.tensor.greater.md#maxframe.tensor.greater), [`less`](maxframe.tensor.less.md#maxframe.tensor.less), [`less_equal`](maxframe.tensor.less_equal.md#maxframe.tensor.less_equal), [`equal`](maxframe.tensor.equal.md#maxframe.tensor.equal), [`not_equal`](maxframe.tensor.not_equal.md#maxframe.tensor.not_equal)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.greater_equal([4, 2, 1], [2, 2, 2]).execute()
array([ True, True, False])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.histogram.md
# maxframe.tensor.histogram
### maxframe.tensor.histogram(a, bins=10, range=None, weights=None, density=None)
Compute the histogram of a set of data.
* **Parameters:**
* **a** (*array_like*) – Input data. The histogram is computed over the flattened tensor.
* **bins** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* *scalars* *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) –
If bins is an int, it defines the number of equal-width
bins in the given range (10, by default). If bins is a
sequence, it defines a monotonically increasing tensor of bin edges,
including the rightmost edge, allowing for non-uniform bin widths.
If bins is a string, it defines the method used to calculate the
optimal bin width, as defined by histogram_bin_edges.
* **range** ( *(*[*float*](https://docs.python.org/3/library/functions.html#float) *,* [*float*](https://docs.python.org/3/library/functions.html#float) *)* *,* *optional*) – The lower and upper range of the bins. If not provided, range
is simply `(a.min(), a.max())`. Values outside the range are
ignored. The first element of the range must be less than or
equal to the second. range affects the automatic bin
computation as well. While bin width is computed to be optimal
based on the actual data within range, the bin count will fill
the entire range including portions containing no data.
* **weights** (*array_like* *,* *optional*) – A tensor of weights, of the same shape as a. Each value in
a only contributes its associated weight towards the bin count
(instead of 1). If density is True, the weights are
normalized, so that the integral of the density over the range
remains 1.
* **density** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If `False`, the result will contain the number of samples in
each bin. If `True`, the result is the value of the
probability *density* function at the bin, normalized such that
the *integral* over the range is 1. Note that the sum of the
histogram values will not be equal to 1 unless bins of unity
width are chosen; it is not a probability *mass* function.
Overrides the `normed` keyword if given.
* **Returns:**
* **hist** (*tensor*) – The values of the histogram. See density and weights for a
description of the possible semantics.
* **bin_edges** (*tensor of dtype float*) – Return the bin edges `(length(hist)+1)`.
#### SEE ALSO
`histogramdd`, [`bincount`](maxframe.tensor.bincount.md#maxframe.tensor.bincount), `searchsorted`, [`digitize`](maxframe.tensor.digitize.md#maxframe.tensor.digitize), [`histogram_bin_edges`](maxframe.tensor.histogram_bin_edges.md#maxframe.tensor.histogram_bin_edges)
### Notes
All but the last (righthand-most) bin is half-open. In other words,
if bins is:
```default
[1, 2, 3, 4]
```
then the first bin is `[1, 2)` (including 1, but excluding 2) and
the second `[2, 3)`. The last bin, however, is `[3, 4]`, which
*includes* 4.
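The half-open rule can be seen by placing one value on every edge. This sketch uses NumPy as a local stand-in, assuming `maxframe.tensor.histogram` bins values the same way.

```python
import numpy as np  # assumption: NumPy stands in for maxframe.tensor

# One sample on each edge: 1 -> [1, 2), 2 -> [2, 3), 3 and 4 -> [3, 4].
counts, edges = np.histogram([1, 2, 3, 4], bins=[1, 2, 3, 4])
```

The last bin collects two samples because it is the only one that includes its right edge.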
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.histogram([1, 2, 1], bins=[0, 1, 2, 3]).execute()
(array([0, 2, 1]), array([0, 1, 2, 3]))
>>> mt.histogram(mt.arange(4), bins=mt.arange(5), density=True).execute()
(array([0.25, 0.25, 0.25, 0.25]), array([0, 1, 2, 3, 4]))
>>> mt.histogram([[1, 2, 1], [1, 0, 1]], bins=[0,1,2,3]).execute()
(array([1, 4, 1]), array([0, 1, 2, 3]))
```
```pycon
>>> a = mt.arange(5)
>>> hist, bin_edges = mt.histogram(a, density=True)
>>> hist.execute()
array([0.5, 0. , 0.5, 0. , 0. , 0.5, 0. , 0.5, 0. , 0.5])
>>> hist.sum().execute()
2.4999999999999996
>>> mt.sum(hist * mt.diff(bin_edges)).execute()
1.0
```
Automated Bin Selection Methods example, using 2 peak random data
with 2000 points:
```pycon
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> rng = mt.random.RandomState(10) # deterministic random data
>>> a = mt.hstack((rng.normal(size=1000),
... rng.normal(loc=5, scale=2, size=1000)))
>>> _ = plt.hist(np.asarray(a), bins='auto') # arguments are passed to np.histogram
>>> plt.title("Histogram with 'auto' bins")
Text(0.5, 1.0, "Histogram with 'auto' bins")
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.histogram_bin_edges.md
# maxframe.tensor.histogram_bin_edges
### maxframe.tensor.histogram_bin_edges(a, bins=10, range=None, weights=None)
Function to calculate only the edges of the bins used by the histogram
function.
* **Parameters:**
* **a** (*array_like*) – Input data. The histogram is computed over the flattened tensor.
* **bins** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* *scalars* *or* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) –
If bins is an int, it defines the number of equal-width
bins in the given range (10, by default). If bins is a
sequence, it defines the bin edges, including the rightmost
edge, allowing for non-uniform bin widths.
If bins is a string from the list below, histogram_bin_edges will use
the method chosen to calculate the optimal bin width and
consequently the number of bins (see Notes for more detail on
the estimators) from the data that falls within the requested
range. While the bin width will be optimal for the actual data
in the range, the number of bins will be computed to fill the
entire range, including the empty portions. For visualisation,
using the ‘auto’ option is suggested. Weighted data is not
supported for automated bin size selection.
’auto’
: Maximum of the ‘sturges’ and ‘fd’ estimators. Provides good
all around performance.
’fd’ (Freedman Diaconis Estimator)
: Robust (resilient to outliers) estimator that takes into
account data variability and data size.
’doane’
: An improved version of Sturges’ estimator that works better
with non-normal datasets.
’scott’
: Less robust estimator that takes into account data
variability and data size.
’stone’
: Estimator based on leave-one-out cross-validation estimate of
the integrated squared error. Can be regarded as a generalization
of Scott’s rule.
’rice’
: Estimator does not take variability into account, only data
size. Commonly overestimates number of bins required.
’sturges’
: R’s default method, only accounts for data size. Only
optimal for gaussian data and underestimates number of bins
for large non-gaussian datasets.
’sqrt’
: Square root (of data size) estimator, used by Excel and
other programs for its speed and simplicity.
* **range** ( *(*[*float*](https://docs.python.org/3/library/functions.html#float) *,* [*float*](https://docs.python.org/3/library/functions.html#float) *)* *,* *optional*) – The lower and upper range of the bins. If not provided, range
is simply `(a.min(), a.max())`. Values outside the range are
ignored. The first element of the range must be less than or
equal to the second. range affects the automatic bin
computation as well. While bin width is computed to be optimal
based on the actual data within range, the bin count will fill
the entire range including portions containing no data.
* **weights** (*array_like* *,* *optional*) – A tensor of weights, of the same shape as a. Each value in
a only contributes its associated weight towards the bin count
(instead of 1). This is currently not used by any of the bin estimators,
but may be in the future.
* **Returns:**
**bin_edges** – The edges to pass into histogram
* **Return type:**
tensor of dtype float
#### SEE ALSO
[`histogram`](maxframe.tensor.histogram.md#maxframe.tensor.histogram)
### Notes
The methods to estimate the optimal number of bins are well founded
in literature, and are inspired by the choices R provides for
histogram visualisation. Note that having the number of bins
proportional to $n^{1/3}$ is asymptotically optimal, which is
why it appears in most estimators. These are simply plug-in methods
that give good starting points for number of bins. In the equations
below, $h$ is the binwidth and $n_h$ is the number of
bins. All estimators that compute bin counts are recast to bin width
using the ptp of the data. The final bin count is obtained from
`np.round(np.ceil(range / h))`.
‘auto’ (maximum of the ‘sturges’ and ‘fd’ estimators)
: A compromise to get a good value. For small datasets the Sturges
value will usually be chosen, while larger datasets will usually
default to FD. Avoids the overly conservative behaviour of FD
and Sturges for small and large datasets respectively.
Switchover point is usually $a.size \approx 1000$.
‘fd’ (Freedman Diaconis Estimator)
: $$
h = 2 \frac{IQR}{n^{1/3}}
$$
The binwidth is proportional to the interquartile range (IQR)
and inversely proportional to cube root of a.size. Can be too
conservative for small datasets, but is quite good for large
datasets. The IQR is very robust to outliers.
‘scott’
: $$
h = \sigma \sqrt[3]{\frac{24 \sqrt{\pi}}{n}}
$$
The binwidth is proportional to the standard deviation of the
data and inversely proportional to cube root of `a.size`. Can
be too conservative for small datasets, but is quite good for
large datasets. The standard deviation is not very robust to
outliers. Values are very similar to the Freedman-Diaconis
estimator in the absence of outliers.
‘rice’
: $$
n_h = 2n^{1/3}
$$
The number of bins is only proportional to cube root of
`a.size`. It tends to overestimate the number of bins and it
does not take into account data variability.
‘sturges’
: $$
n_h = \log_{2} n + 1
$$
The number of bins is the base 2 log of `a.size`, plus one. This
estimator assumes normality of data and is too conservative for
larger, non-normal datasets. This is the default method in R’s
`hist` method.
‘doane’
: $$
\begin{aligned}
n_h &= 1 + \log_{2}(n) + \log_{2}\left(1 + \frac{|g_1|}{\sigma_{g_1}}\right) \\
g_1 &= \operatorname{mean}\left[\left(\frac{x - \mu}{\sigma}\right)^3\right] \\
\sigma_{g_1} &= \sqrt{\frac{6(n - 2)}{(n + 1)(n + 3)}}
\end{aligned}
$$
An improved version of Sturges’ formula that produces better
estimates for non-normal datasets. This estimator attempts to
account for the skew of the data.
‘sqrt’
: $$
n_h = \sqrt n
$$
The simplest and fastest estimator. Only takes into account the
data size.
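The plug-in rules above are simple enough to check by hand. The following is a minimal sketch in plain Python (standard library only, not MaxFrame code) of how the ‘sturges’ and ‘fd’ widths could be combined in the spirit of ‘auto’. The function name `auto_bin_count` is hypothetical, the quartiles are taken with linear interpolation (an assumption about how the IQR is computed), and the data is assumed to have a non-zero range.

```python
import math
import statistics


def auto_bin_count(data):
    """Sketch of the 'auto' rule: use the narrower of the Sturges and FD widths."""
    n = len(data)
    ptp = max(data) - min(data)  # peak-to-peak range, assumed to be > 0

    # Sturges gives a bin count, n_h = log2(n) + 1; recast it to a bin width.
    sturges_width = ptp / (math.log2(n) + 1)

    # Freedman-Diaconis gives a width directly, h = 2 * IQR / n^(1/3).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    fd_width = 2 * (q3 - q1) / n ** (1 / 3)

    # A zero IQR would give a zero width, so fall back to Sturges in that case.
    width = min(sturges_width, fd_width) if fd_width > 0 else sturges_width

    # The final bin count fills the whole range with bins of that width.
    return math.ceil(ptp / width)


print(auto_bin_count([0, 0, 0, 1, 2, 3, 3, 4, 5]))
```

For the nine-point array used in the Examples, this sketch yields 5 bins, which is consistent with the six edges shown there for `bins='auto'`.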
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> arr = mt.array([0, 0, 0, 1, 2, 3, 3, 4, 5])
>>> mt.histogram_bin_edges(arr, bins='auto', range=(0, 1)).execute()
array([0. , 0.25, 0.5 , 0.75, 1. ])
>>> mt.histogram_bin_edges(arr, bins=2).execute()
array([0. , 2.5, 5. ])
```
For consistency with histogram, a tensor of pre-computed bins is
passed through unmodified:
```pycon
>>> mt.histogram_bin_edges(arr, [1, 2]).execute()
array([1, 2])
```
This function allows one set of bins to be computed, and reused across
multiple histograms:
```pycon
>>> shared_bins = mt.histogram_bin_edges(arr, bins='auto')
>>> shared_bins.execute()
array([0., 1., 2., 3., 4., 5.])
```
```pycon
>>> group_id = mt.array([0, 1, 1, 0, 1, 1, 0, 1, 1])
>>> a = arr[group_id == 0]
>>> a.execute()
array([0, 1, 3])
>>> hist_0, _ = mt.histogram(a, bins=shared_bins).execute()
>>> b = arr[group_id == 1]
>>> b.execute()
array([0, 0, 2, 3, 4, 5])
>>> hist_1, _ = mt.histogram(b, bins=shared_bins).execute()
```
```pycon
>>> hist_0; hist_1
array([1, 1, 0, 1, 0])
array([2, 0, 1, 1, 2])
```
This gives more easily comparable results than using separate bins for
each histogram:
```pycon
>>> hist_0, bins_0 = mt.histogram(a, bins='auto').execute()
>>> hist_1, bins_1 = mt.histogram(b, bins='auto').execute()
>>> hist_0; hist_1
array([1, 1, 1])
array([2, 1, 1, 2])
>>> bins_0; bins_1
array([0., 1., 2., 3.])
array([0. , 1.25, 2.5 , 3.75, 5. ])
```
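To make the shared-bin counts above easier to verify, here is a small sketch in plain Python (not MaxFrame code) that counts values against a fixed list of edges. The helper name `count_in_bins` is hypothetical; it assumes sorted edges, half-open bins, and a last bin that also includes its right edge.

```python
from bisect import bisect_right


def count_in_bins(values, edges):
    """Count values per bin; every bin is [left, right) except the last, which is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the overall range are ignored
        # bisect_right finds the bin index; clamp so the rightmost edge lands in the last bin
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


shared_edges = [0, 1, 2, 3, 4, 5]
print(count_in_bins([0, 1, 3], shared_edges))
print(count_in_bins([0, 0, 2, 3, 4, 5], shared_edges))
```

Because both groups are counted against the same edges, the two count lists have the same length and can be compared bin by bin.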
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.hsplit.md
# maxframe.tensor.hsplit
### maxframe.tensor.hsplit(a, indices_or_sections)
Split a tensor into multiple sub-tensors horizontally (column-wise).
Please refer to the split documentation. hsplit is equivalent
to split with `axis=1`: the tensor is always split along the second
axis regardless of the tensor dimension.
#### SEE ALSO
[`split`](maxframe.tensor.split.md#maxframe.tensor.split)
: Split an array into multiple sub-arrays of equal size.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(16.0).reshape(4, 4)
>>> x.execute()
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.]])
>>> mt.hsplit(x, 2).execute()
[array([[ 0., 1.],
[ 4., 5.],
[ 8., 9.],
[ 12., 13.]]),
array([[ 2., 3.],
[ 6., 7.],
[ 10., 11.],
[ 14., 15.]])]
>>> mt.hsplit(x, mt.array([3, 6])).execute()
[array([[ 0., 1., 2.],
[ 4., 5., 6.],
[ 8., 9., 10.],
[ 12., 13., 14.]]),
array([[ 3.],
[ 7.],
[ 11.],
[ 15.]]),
array([], dtype=float64)]
```
With a higher dimensional array the split is still along the second axis.
```pycon
>>> x = mt.arange(8.0).reshape(2, 2, 2)
>>> x.execute()
array([[[ 0., 1.],
[ 2., 3.]],
[[ 4., 5.],
[ 6., 7.]]])
>>> mt.hsplit(x, 2).execute()
[array([[[ 0., 1.]],
[[ 4., 5.]]]),
array([[[ 2., 3.]],
[[ 6., 7.]]])]
```
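As an illustration of what an equal column-wise split means for a 2-D input, the sketch below does the same thing with nested Python lists. It is plain Python rather than MaxFrame code, the name `hsplit_rows` is hypothetical, and only the integer `sections` form is covered.

```python
def hsplit_rows(rows, sections):
    """Split a 2-D nested list into `sections` equal column blocks."""
    ncols = len(rows[0])
    if ncols % sections != 0:
        raise ValueError("columns cannot be divided into equal sections")
    step = ncols // sections
    # Each block keeps every row but only a contiguous slice of its columns.
    return [[row[start:start + step] for row in rows]
            for start in range(0, ncols, step)]


left, right = hsplit_rows([[0, 1, 2, 3], [4, 5, 6, 7]], 2)
print(left)
print(right)
```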
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.hypot.md
# maxframe.tensor.hypot
### maxframe.tensor.hypot(x1, x2, out=None, where=None, \*\*kwargs)
Given the “legs” of a right triangle, return its hypotenuse.
Equivalent to `sqrt(x1**2 + x2**2)`, element-wise. If x1 or
x2 is scalar_like (i.e., unambiguously cast-able to a scalar type),
it is broadcast for use with each element of the other argument.
(See Examples)
* **Parameters:**
* **x1** (*array_like*) – Leg of the triangle(s).
* **x2** (*array_like*) – Leg of the triangle(s).
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated array is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**z** – The hypotenuse of the triangle(s).
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.hypot(3*mt.ones((3, 3)), 4*mt.ones((3, 3))).execute()
array([[ 5., 5., 5.],
[ 5., 5., 5.],
[ 5., 5., 5.]])
```
Example showing broadcast of scalar_like argument:
```pycon
>>> mt.hypot(3*mt.ones((3, 3)), [4]).execute()
array([[ 5., 5., 5.],
[ 5., 5., 5.],
[ 5., 5., 5.]])
```
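The broadcasting of a scalar-like leg can be pictured with the standard library alone. The sketch below is plain Python, not MaxFrame code, and the helper name `hypot_broadcast` is hypothetical: it pairs one fixed leg with every element of a list.

```python
import math


def hypot_broadcast(legs, other_leg):
    """Pair a single scalar leg with each element of `legs`."""
    return [math.hypot(x, other_leg) for x in legs]


print(hypot_broadcast([3.0, 3.0, 3.0], 4.0))
```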
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.i0.md
# maxframe.tensor.i0
### maxframe.tensor.i0(x, \*\*kwargs)
Modified Bessel function of the first kind, order 0.
Usually denoted $I_0$. This function does broadcast, but will *not*
“up-cast” int dtype arguments unless accompanied by at least one float or
complex dtype argument (see Raises below).
* **Parameters:**
**x** (*array_like* *,* *dtype float* *or* [*complex*](https://docs.python.org/3/library/functions.html#complex)) – Argument of the Bessel function.
* **Returns:**
**out** – The modified Bessel function evaluated at each of the elements of x.
* **Return type:**
Tensor, shape = x.shape, dtype = x.dtype
* **Raises:**
[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError) – array cannot be safely cast to required type: If argument consists exclusively of int dtypes.
#### SEE ALSO
[`scipy.special.iv`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.iv.html#scipy.special.iv), [`scipy.special.ive`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.ive.html#scipy.special.ive)
### Notes
We use the algorithm published by Clenshaw <sup>[1](#id4)</sup> and referenced by
Abramowitz and Stegun <sup>[2](#id5)</sup>, for which the function domain is
partitioned into the two intervals [0,8] and (8,inf), and Chebyshev
polynomial expansions are employed in each interval. Relative error on
the domain [0,30] using IEEE arithmetic is documented <sup>[3](#id6)</sup> as having a
peak of 5.8e-16 with an rms of 1.4e-16 (n = 30000).
### References
* <a id='id4'>**[1]**</a> C. W. Clenshaw, “Chebyshev series for mathematical functions”, in *National Physical Laboratory Mathematical Tables*, vol. 5, London: Her Majesty’s Stationery Office, 1962.
* <a id='id5'>**[2]**</a> M. Abramowitz and I. A. Stegun, *Handbook of Mathematical Functions*, 10th printing, New York: Dover, 1964, p. 379. [http://www.math.sfu.ca/~cbm/aands/page_379.htm](http://www.math.sfu.ca/~cbm/aands/page_379.htm)
* <a id='id6'>**[3]**</a> [http://kobesearch.cpan.org/htdocs/Math-Cephes/Math/Cephes.html](http://kobesearch.cpan.org/htdocs/Math-Cephes/Math/Cephes.html)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.i0([0.]).execute()
array([1.])
>>> mt.i0([0., 1. + 2j]).execute()
array([ 1.00000000+0.j , 0.18785373+0.64616944j])
```
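For intuition about the values above, $I_0$ can also be evaluated from its power series, $I_0(x) = \sum_{k \ge 0} (x/2)^{2k} / (k!)^2$. The sketch below is plain Python and is not the Chebyshev-based algorithm described in the Notes; the name `i0_series` and the fixed number of terms are assumptions chosen for illustration, adequate only for arguments of moderate size.

```python
def i0_series(x, terms=40):
    """Sum the first `terms` terms of the power series for I_0(x)."""
    quarter_x2 = (x / 2) ** 2
    term = 1.0   # the k = 0 term
    total = term
    for k in range(1, terms):
        # Consecutive terms differ by the factor (x/2)^2 / k^2.
        term = term * quarter_x2 / (k * k)
        total += term
    return total


print(i0_series(0.0))
print(i0_series(1 + 2j))
```

The series accepts complex arguments as well, so it can be used to cross-check the complex value shown in the example.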
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.imag.md
# maxframe.tensor.imag
### maxframe.tensor.imag(val, \*\*kwargs)
Return the imaginary part of the complex argument.
* **Parameters:**
**val** (*array_like*) – Input tensor.
* **Returns:**
**out** – The imaginary component of the complex argument. If val is real,
the type of val is used for the output. If val has complex
elements, the returned type is float.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`real`](maxframe.tensor.real.md#maxframe.tensor.real), [`angle`](maxframe.tensor.angle.md#maxframe.tensor.angle), `real_if_close`
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([1+2j, 3+4j, 5+6j])
>>> a.imag.execute()
array([ 2., 4., 6.])
>>> a.imag = mt.array([8, 10, 12])
>>> a.execute()
array([ 1. +8.j, 3.+10.j, 5.+12.j])
>>> mt.imag(1 + 1j).execute()
1.0
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.in1d.md
# maxframe.tensor.in1d
### maxframe.tensor.in1d(ar1: TileableType | [ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray), ar2: TileableType | [ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray) | [list](https://docs.python.org/3/library/stdtypes.html#list), assume_unique: [bool](https://docs.python.org/3/library/functions.html#bool) = False, invert: [bool](https://docs.python.org/3/library/functions.html#bool) = False)
Test whether each element of a 1-D tensor is also present in a second tensor.
Returns a boolean tensor the same length as ar1 that is True
where an element of ar1 is in ar2 and False otherwise.
We recommend using [`isin()`](maxframe.tensor.isin.md#maxframe.tensor.isin) instead of in1d for new code.
* **Parameters:**
* **ar1** ( *(**M* *,* *)* *Tensor*) – Input tensor.
* **ar2** (*array_like*) – The values against which to test each value of ar1.
* **assume_unique** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, the input tensors are both assumed to be unique, which
can speed up the calculation. Default is False.
* **invert** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, the values in the returned tensor are inverted (that is,
False where an element of ar1 is in ar2 and True otherwise).
Default is False. `mt.in1d(a, b, invert=True)` is equivalent
to (but is faster than) `mt.invert(mt.in1d(a, b))`.
* **Returns:**
**in1d** – The values ar1[in1d] are in ar2.
* **Return type:**
(M,) Tensor, [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`isin`](maxframe.tensor.isin.md#maxframe.tensor.isin)
: Version of this function that preserves the shape of ar1.
`numpy.lib.arraysetops`
: Module with a number of other functions for performing set operations on arrays.
### Notes
in1d can be considered as an element-wise function version of the
python keyword in, for 1-D sequences. `in1d(a, b)` is roughly
equivalent to `mt.array([item in b for item in a])`.
However, this idea fails if ar2 is a set, or similar (non-sequence)
container: As `ar2` is converted to a tensor, in those cases
`asarray(ar2)` is an object tensor rather than the expected tensor of
contained values.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> test = mt.array([0, 1, 2, 5, 0])
>>> states = [0, 2]
>>> mask = mt.in1d(test, states)
>>> mask.execute()
array([ True, False, True, False, True])
>>> test[mask].execute()
array([0, 2, 0])
>>> mask = mt.in1d(test, states, invert=True)
>>> mask.execute()
array([False, True, False, True, False])
>>> test[mask].execute()
array([1, 5])
```
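The Notes describe in1d as roughly an element-wise `in`. A plain-Python sketch of that reading, including the `invert` flag, is shown below; the name `in1d_list` is hypothetical and the elements are assumed to be hashable.

```python
def in1d_list(ar1, ar2, invert=False):
    """Element-wise membership test of ar1 in ar2 for 1-D sequences."""
    lookup = set(ar2)  # a set gives constant-time membership checks
    # `!= invert` flips every result when invert is True.
    return [(item in lookup) != invert for item in ar1]


test = [0, 1, 2, 5, 0]
print(in1d_list(test, [0, 2]))
print(in1d_list(test, [0, 2], invert=True))
```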
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.indices.md
# maxframe.tensor.indices
### maxframe.tensor.indices(dimensions, dtype=<class 'int'>, chunk_size=None)
Return a tensor representing the indices of a grid.
Compute a tensor where the subtensors contain index values 0,1,…
varying only along the corresponding axis.
* **Parameters:**
* **dimensions** (*sequence* *of* *ints*) – The shape of the grid.
* **dtype** (*dtype* *,* *optional*) – Data type of the result.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **Returns:**
**grid** – The tensor of grid indices,
`grid.shape = (len(dimensions),) + tuple(dimensions)`.
* **Return type:**
Tensor
#### SEE ALSO
[`mgrid`](maxframe.tensor.mgrid.md#maxframe.tensor.mgrid), [`meshgrid`](maxframe.tensor.meshgrid.md#maxframe.tensor.meshgrid)
### Notes
The output shape is obtained by prepending the number of dimensions
in front of the tuple of dimensions, i.e. if dimensions is a tuple
`(r0, ..., rN-1)` of length `N`, the output shape is
`(N,r0,...,rN-1)`.
The subtensor `grid[k]` contains the N-D array of indices along the
`k-th` axis. Explicitly:
```default
grid[k,i0,i1,...,iN-1] = ik
```
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> grid = mt.indices((2, 3))
>>> grid.shape
(2, 2, 3)
>>> grid[0].execute() # row indices
array([[0, 0, 0],
[1, 1, 1]])
>>> grid[1].execute() # column indices
array([[0, 1, 2],
[0, 1, 2]])
```
The indices can be used as an index into a tensor.
```pycon
>>> x = mt.arange(20).reshape(5, 4)
>>> row, col = mt.indices((2, 3))
>>> # x[row, col]
```
Note that it would be more straightforward in the above example to
extract the required elements directly with `x[:2, :3]`.
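The defining relation `grid[k,i0,i1,...,iN-1] = ik` can be spelled out for the two-dimensional case with nested lists. This is a plain-Python sketch rather than MaxFrame code, and the name `indices_2d` is hypothetical.

```python
def indices_2d(nrows, ncols):
    """Return [row_indices, col_indices] for an nrows x ncols grid."""
    row_idx = [[i for _ in range(ncols)] for i in range(nrows)]  # grid[0][i][j] = i
    col_idx = [[j for j in range(ncols)] for _ in range(nrows)]  # grid[1][i][j] = j
    return [row_idx, col_idx]


grid = indices_2d(2, 3)
print(grid[0])
print(grid[1])
```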
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.inner.md
# maxframe.tensor.inner
### maxframe.tensor.inner(a, b, sparse=None)
Returns the inner product of a and b for arrays of floating point types.
Like the generic NumPy equivalent, the product sum is over the last dimension
of a and b. The first argument is not conjugated.
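A small plain-Python sketch may help make "sum over the last dimension" and "not conjugated" concrete. It is not MaxFrame code; the helper names `inner_1d` and `inner_rows` are hypothetical.

```python
def inner_1d(a, b):
    """Sum of element-wise products of two 1-D sequences (no conjugation)."""
    return sum(x * y for x, y in zip(a, b))


def inner_rows(a, b):
    """Inner product of each row of a 2-D nested list with a 1-D sequence."""
    return [inner_1d(row, b) for row in a]


print(inner_1d([1, 2, 3], [0, 1, 0]))
print(inner_rows([[1, 2], [3, 4]], [1, 1]))
print(inner_1d([1j], [1j]))  # no conjugation: 1j * 1j
```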
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.insert.md
# maxframe.tensor.insert
### maxframe.tensor.insert(arr, obj, values, axis=None)
Insert values along the given axis before the given indices.
* **Parameters:**
* **arr** (*array like*) – Input array.
* **obj** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* [*slice*](https://docs.python.org/3/library/functions.html#slice) *or* *sequence* *of* *ints*) – Object that defines the index or indices before which values is
inserted.
* **values** (*array_like*) – Values to insert into arr. If the type of values is different
from that of arr, values is converted to the type of arr.
values should be shaped so that `arr[...,obj,...] = values`
is legal.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which to insert values. If axis is None then arr
is flattened first.
* **Returns:**
**out** – A copy of arr with values inserted. Note that insert
does not occur in-place: a new array is returned. If
axis is None, out is a flattened array.
* **Return type:**
ndarray
#### SEE ALSO
`append`
: Append elements at the end of an array.
[`concatenate`](maxframe.tensor.concatenate.md#maxframe.tensor.concatenate)
: Join a sequence of arrays along an existing axis.
[`delete`](maxframe.tensor.delete.md#maxframe.tensor.delete)
: Delete elements from an array.
### Notes
Note that for higher dimensional inserts `obj=0` behaves very differently
from `obj=[0]`, just like `arr[:,0,:] = values` is different from
`arr[:,[0],:] = values`.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([[1, 1], [2, 2], [3, 3]])
>>> a.execute()
array([[1, 1],
[2, 2],
[3, 3]])
>>> mt.insert(a, 1, 5).execute()
array([1, 5, 1, ..., 2, 3, 3])
>>> mt.insert(a, 1, 5, axis=1).execute()
array([[1, 5, 1],
[2, 5, 2],
[3, 5, 3]])
>>> # Difference between sequence and scalars:
>>> mt.insert(a, [1], [[1],[2],[3]], axis=1).execute()
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> b = a.flatten()
>>> b.execute()
array([1, 1, 2, 2, 3, 3])
>>> mt.insert(b, [2, 2], [5, 6]).execute()
array([1, 1, 5, ..., 2, 3, 3])
>>> mt.insert(b, slice(2, 4), [5, 6]).execute()
array([1, 1, 5, ..., 2, 3, 3])
>>> mt.insert(b, [2, 2], [7.13, False]).execute() # type casting
array([1, 1, 7, ..., 2, 3, 3])
>>> x = mt.arange(8).reshape(2, 4)
>>> idx = (1, 3)
>>> mt.insert(x, idx, 999, axis=1).execute()
array([[ 0, 999, 1, 2, 999, 3],
[ 4, 999, 5, 6, 999, 7]])
```
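The rule that values go before the given indices of the original sequence can be sketched for the flattened (1-D) case in plain Python. The helper name `insert_flat` is hypothetical, and the sketch covers only explicit index lists, not slices or the `axis` argument.

```python
def insert_flat(seq, positions, values):
    """Insert each value before the matching position of the original sequence."""
    out = list(seq)
    # A stable sort keeps the given order for values that share a position.
    pairs = sorted(zip(positions, values), key=lambda pair: pair[0])
    for offset, (pos, val) in enumerate(pairs):
        # Every earlier insertion shifts later positions right by one.
        out.insert(pos + offset, val)
    return out


print(insert_flat([1, 1, 2, 2, 3, 3], [2, 2], [5, 6]))
print(insert_flat([0, 1, 2, 3], [1, 3], [999, 999]))
```

The second call reproduces one row of the `axis=1` example above, where the same positions are applied to every row.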
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.invert.md
# maxframe.tensor.invert
### maxframe.tensor.invert(x, out=None, where=None, \*\*kwargs)
Compute bit-wise inversion, or bit-wise NOT, element-wise.
Computes the bit-wise NOT of the underlying binary representation of
the integers in the input tensors. This ufunc implements the C/Python
operator `~`.
For signed integer inputs, the two’s complement is returned. In a
two’s-complement system negative numbers are represented by the two’s
complement of the absolute value. This is the most common method of
representing signed integers on computers <sup>[1](#id2)</sup>. An N-bit
two’s-complement system can represent every integer in the range
$-2^{N-1}$ to $+2^{N-1}-1$.
* **Parameters:**
* **x** (*array_like*) – Only integer and boolean types are handled.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Result.
* **Return type:**
array_like
#### SEE ALSO
[`bitwise_and`](maxframe.tensor.bitwise_and.md#maxframe.tensor.bitwise_and), [`bitwise_or`](maxframe.tensor.bitwise_or.md#maxframe.tensor.bitwise_or), [`bitwise_xor`](maxframe.tensor.bitwise_xor.md#maxframe.tensor.bitwise_xor), [`logical_not`](maxframe.tensor.logical_not.md#maxframe.tensor.logical_not)
### Notes
bitwise_not is an alias for invert:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.bitwise_not is mt.invert
True
```
### References
* <a id='id2'>**[1]**</a> Wikipedia, “Two’s complement”, [http://en.wikipedia.org/wiki/Two's_complement](http://en.wikipedia.org/wiki/Two's_complement)
### Examples
The number 13 is represented by `00001101` in binary.
The invert or bit-wise NOT of 13 is then:
```pycon
>>> mt.invert(mt.array([13], dtype=mt.uint8)).execute()
array([242], dtype=uint8)
```
The result depends on the bit-width:
```pycon
>>> mt.invert(mt.array([13], dtype=mt.uint16)).execute()
array([65522], dtype=uint16)
```
When using signed integer types the result is the two’s complement of
the result for the unsigned type:
```pycon
>>> mt.invert(mt.array([13], dtype=mt.int8)).execute()
array([-14], dtype=int8)
```
Booleans are accepted as well:
```pycon
>>> mt.invert(mt.array([True, False])).execute()
array([False, True])
```
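The dependence on bit width and signedness can be reproduced with Python integers and an explicit mask. This is a plain-Python sketch, not MaxFrame code; the helper names `invert_unsigned` and `invert_signed` are hypothetical.

```python
def invert_unsigned(value, bits):
    """Bit-wise NOT of `value` within an unsigned field of `bits` bits."""
    mask = (1 << bits) - 1
    return ~value & mask


def invert_signed(value, bits):
    """Bit-wise NOT of `value`, reading the result as two's complement."""
    raw = invert_unsigned(value & ((1 << bits) - 1), bits)
    # Results with the top bit set represent negative numbers.
    return raw - (1 << bits) if raw >= 1 << (bits - 1) else raw


print(invert_unsigned(13, 8), invert_unsigned(13, 16), invert_signed(13, 8))
```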
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.isclose.md
# maxframe.tensor.isclose
### maxframe.tensor.isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)
Returns a boolean tensor where two tensors are element-wise equal within a
tolerance.
The tolerance values are positive, typically very small numbers. The
relative difference (rtol \* abs(b)) and the absolute difference
atol are added together to compare against the absolute difference
between a and b.
* **Parameters:**
* **a** (*array_like*) – Input tensors to compare.
* **b** (*array_like*) – Input tensors to compare.
* **rtol** ([*float*](https://docs.python.org/3/library/functions.html#float)) – The relative tolerance parameter (see Notes).
* **atol** ([*float*](https://docs.python.org/3/library/functions.html#float)) – The absolute tolerance parameter (see Notes).
* **equal_nan** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – Whether to compare NaN’s as equal. If True, NaN’s in a will be
considered equal to NaN’s in b in the output tensor.
* **Returns:**
**y** – Returns a boolean tensor of where a and b are equal within the
given tolerance. If both a and b are scalars, returns a single
boolean value.
* **Return type:**
array_like
#### SEE ALSO
[`allclose`](maxframe.tensor.allclose.md#maxframe.tensor.allclose)
### Notes
For finite values, isclose uses the following equation to test whether
two floating point values are equivalent.
> absolute(a - b) <= (atol + rtol \* absolute(b))
The above equation is not symmetric in a and b, so that
isclose(a, b) might be different from isclose(b, a) in
some rare cases.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.isclose([1e10,1e-7], [1.00001e10,1e-8]).execute()
array([True, False])
>>> mt.isclose([1e10,1e-8], [1.00001e10,1e-9]).execute()
array([True, True])
>>> mt.isclose([1e10,1e-8], [1.0001e10,1e-9]).execute()
array([False, True])
>>> mt.isclose([1.0, mt.nan], [1.0, mt.nan]).execute()
array([True, False])
>>> mt.isclose([1.0, mt.nan], [1.0, mt.nan], equal_nan=True).execute()
array([True, True])
```
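The tolerance test from the Notes, and its asymmetry in a and b, can be checked with a few lines of plain Python. The helper name `isclose_scalar` is hypothetical, it handles finite scalars only, and the asymmetric inputs are chosen for illustration.

```python
def isclose_scalar(a, b, rtol=1e-05, atol=1e-08):
    """absolute(a - b) <= atol + rtol * absolute(b), for finite scalars."""
    return abs(a - b) <= atol + rtol * abs(b)


print(isclose_scalar(1e10, 1.00001e10), isclose_scalar(1e-7, 1e-8))

# The relative term scales with b only, so swapping the arguments can change the answer.
print(isclose_scalar(10.0, 12.0, rtol=0.18, atol=0.0))
print(isclose_scalar(12.0, 10.0, rtol=0.18, atol=0.0))
```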
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.iscomplex.md
# maxframe.tensor.iscomplex
### maxframe.tensor.iscomplex(x, \*\*kwargs)
Returns a bool tensor that is True where the input element is complex.
What is tested is whether the input has a non-zero imaginary part, not if
the input type is complex.
* **Parameters:**
**x** (*array_like*) – Input tensor.
* **Returns:**
**out** – Output tensor.
* **Return type:**
Tensor of bools
#### SEE ALSO
[`isreal`](maxframe.tensor.isreal.md#maxframe.tensor.isreal)
`iscomplexobj`
: Return True if x is a complex type or an array of complex numbers.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.iscomplex([1+1j, 1+0j, 4.5, 3, 2, 2j]).execute()
array([ True, False, False, False, False, True])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.isfinite.md
# maxframe.tensor.isfinite
### maxframe.tensor.isfinite(x, out=None, where=None, \*\*kwargs)
Test element-wise for finiteness (neither infinity nor Not a Number).
The result is returned as a boolean tensor.
* **Parameters:**
* **x** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – For scalar input, the result is a new boolean with value True
if the input is finite; otherwise the value is False (input is
either positive infinity, negative infinity or Not a Number).
For array input, the result is a boolean array with the same
dimensions as the input and the values are True if the
corresponding element of the input is finite; otherwise the values
are False (element is either positive infinity, negative infinity
or Not a Number).
* **Return type:**
Tensor, [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`isinf`](maxframe.tensor.isinf.md#maxframe.tensor.isinf), `isneginf`, `isposinf`, [`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan)
### Notes
Not a Number, positive infinity and negative infinity are considered
to be non-finite.
MaxFrame uses the IEEE Standard for Binary Floating-Point Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity,
and that positive infinity is not equivalent to negative infinity, but
infinity is equivalent to positive infinity. Errors result if the
second argument is also supplied when x is a scalar input, or if
first and second arguments have different shapes.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.isfinite(1).execute()
True
>>> mt.isfinite(0).execute()
True
>>> mt.isfinite(mt.nan).execute()
False
>>> mt.isfinite(mt.inf).execute()
False
>>> mt.isfinite(mt.NINF).execute()
False
>>> mt.isfinite([mt.log(-1.).execute(),1.,mt.log(0).execute()]).execute()
array([False, True, False])
```
```pycon
>>> x = mt.array([-mt.inf, 0., mt.inf])
>>> y = mt.array([2, 2, 2])
>>> mt.isfinite(x, y).execute()
array([0, 1, 0])
>>> y.execute()
array([0, 1, 0])
```
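The IEEE 754 statements in the Notes can be checked directly with Python floats. This is a standard-library sketch, not MaxFrame code; the helper name `isfinite_list` is hypothetical.

```python
import math

nan, inf = float("nan"), float("inf")


def isfinite_list(values):
    """Element-wise finiteness test for a list of floats."""
    return [math.isfinite(v) for v in values]


print(isfinite_list([nan, 1.0, -inf]))
print(nan == inf, inf == -inf, inf == +inf)
```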
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.isin.md
# maxframe.tensor.isin
### maxframe.tensor.isin(element: TileableType | [ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray), test_elements: TileableType | [ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray) | [list](https://docs.python.org/3/library/stdtypes.html#list), assume_unique: [bool](https://docs.python.org/3/library/functions.html#bool) = False, invert: [bool](https://docs.python.org/3/library/functions.html#bool) = False)
Calculates element in test_elements, broadcasting over element only.
Returns a boolean array of the same shape as element that is True
where an element of element is in test_elements and False otherwise.
* **Parameters:**
* **element** (*array_like*) – Input tensor.
* **test_elements** (*array_like*) – The values against which to test each value of element.
This argument is flattened if it is a tensor or array_like.
See notes for behavior with non-array-like parameters.
* **assume_unique** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, the input tensors are both assumed to be unique, which
can speed up the calculation. Default is False.
* **invert** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, the values in the returned tensor are inverted, as if
calculating element not in test_elements. Default is False.
`mt.isin(a, b, invert=True)` is equivalent to (but faster
than) `mt.invert(mt.isin(a, b))`.
* **Returns:**
**isin** – Has the same shape as element. The values element[isin]
are in test_elements.
* **Return type:**
Tensor, [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`in1d`](maxframe.tensor.in1d.md#maxframe.tensor.in1d)
: Flattened version of this function.
### Notes
isin is an element-wise function version of the python keyword in.
`isin(a, b)` is roughly equivalent to
`mt.array([item in b for item in a])` if a and b are 1-D sequences.
element and test_elements are converted to tensors if they are not
already. If test_elements is a set (or other non-sequence collection)
it will be converted to an object tensor with one element, rather than a
tensor of the values contained in test_elements. This is a consequence
of the tensor constructor’s way of handling non-sequence collections.
Converting the set to a list usually gives the desired behavior.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> element = 2*mt.arange(4).reshape((2, 2))
>>> element.execute()
array([[0, 2],
[4, 6]])
>>> test_elements = [1, 2, 4, 8]
>>> mask = mt.isin(element, test_elements)
>>> mask.execute()
array([[ False, True],
[ True, False]])
>>> element[mask].execute()
array([2, 4])
>>> mask = mt.isin(element, test_elements, invert=True)
>>> mask.execute()
array([[ True, False],
[ False, True]])
>>> element[mask].execute()
array([0, 6])
```
Because of how array handles sets, the following does not
work as expected:
```pycon
>>> test_set = {1, 2, 4, 8}
>>> mt.isin(element, test_set).execute()
array([[ False, False],
[ False, False]])
```
Casting the set to a list gives the expected result:
```pycon
>>> mt.isin(element, list(test_set)).execute()
array([[ False, True],
[ True, False]])
```
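The shape-preserving behavior can be pictured with nested lists: the test runs on every leaf and the nesting is kept. This is a plain-Python sketch, not MaxFrame code; the name `isin_nested` is hypothetical and the elements are assumed to be hashable.

```python
def isin_nested(element, test_elements, invert=False):
    """Membership test that keeps the nesting (shape) of `element`."""
    lookup = set(test_elements)

    def visit(node):
        if isinstance(node, list):
            return [visit(child) for child in node]
        return (node in lookup) != invert

    return visit(element)


print(isin_nested([[0, 2], [4, 6]], [1, 2, 4, 8]))
print(isin_nested([[0, 2], [4, 6]], [1, 2, 4, 8], invert=True))
```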
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.isinf.md
# maxframe.tensor.isinf
### maxframe.tensor.isinf(x, out=None, where=None, \*\*kwargs)
Test element-wise for positive or negative infinity.
Returns a boolean array of the same shape as x, True where
`x == +/-inf`, otherwise False.
* **Parameters:**
* **x** (*array_like*) – Input values
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – For scalar input, the result is a new boolean with value True if
the input is positive or negative infinity; otherwise the value is
False.
For tensor input, the result is a boolean tensor with the same shape
as the input and the values are True where the corresponding
element of the input is positive or negative infinity; elsewhere
the values are False. If a second argument was supplied the result
is stored there. If the type of that array is a numeric type the
result is represented as zeros and ones, if the type is boolean
then as False and True, respectively. The return value y is then
a reference to that tensor.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool) (scalar) or boolean Tensor
#### SEE ALSO
`isneginf`, `isposinf`, [`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan), [`isfinite`](maxframe.tensor.isfinite.md#maxframe.tensor.isfinite)
### Notes
MaxFrame uses the IEEE Standard for Binary Floating-Point Arithmetic
(IEEE 754).
Errors result if the second argument is supplied when the first
argument is a scalar, or if the first and second arguments have
different shapes.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.isinf(mt.inf).execute()
True
>>> mt.isinf(mt.nan).execute()
False
>>> mt.isinf(mt.NINF).execute()
True
>>> mt.isinf([mt.inf, -mt.inf, 1.0, mt.nan]).execute()
array([ True, True, False, False])
```
```pycon
>>> x = mt.array([-mt.inf, 0., mt.inf])
>>> y = mt.array([2, 2, 2])
>>> mt.isinf(x, y).execute()
array([1, 0, 1])
>>> y.execute()
array([1, 0, 1])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.isnan.md
# maxframe.tensor.isnan
### maxframe.tensor.isnan(x, out=None, where=None, \*\*kwargs)
Test element-wise for NaN and return result as a boolean tensor.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – For scalar input, the result is a new boolean with value True if
the input is NaN; otherwise the value is False.
For array input, the result is a boolean tensor of the same
dimensions as the input and the values are True if the
corresponding element of the input is NaN; otherwise the values are
False.
* **Return type:**
Tensor or [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`isinf`](maxframe.tensor.isinf.md#maxframe.tensor.isinf), `isneginf`, `isposinf`, [`isfinite`](maxframe.tensor.isfinite.md#maxframe.tensor.isfinite), `isnat`
### Notes
MaxFrame uses the IEEE Standard for Binary Floating-Point Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.isnan(mt.nan).execute()
True
>>> mt.isnan(mt.inf).execute()
False
>>> mt.isnan([mt.log(-1.).execute(),1.,mt.log(0).execute()]).execute()
array([ True, False, False])
```
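The IEEE 754 distinction between NaN and infinity mentioned in the notes can be checked with the Python standard library alone. This is only a sketch of the floating-point semantics, using `math.isnan` and `math.isinf` rather than any MaxFrame call.

```python
import math

nan = float("nan")
inf = float("inf")
values = [nan, 1.0, inf, -inf]

# NaN and infinity are disjoint categories under IEEE 754
nan_mask = [math.isnan(v) for v in values]
inf_mask = [math.isinf(v) for v in values]

# NaN never compares equal to anything, including itself,
# which is why a dedicated test function is needed
nan_equals_itself = nan == nan
```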
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.isreal.md
# maxframe.tensor.isreal
### maxframe.tensor.isreal(x, \*\*kwargs)
Returns a bool tensor that is True where the input element is real.
If an element has complex type with zero imaginary part, the return value
for that element is True.
* **Parameters:**
**x** (*array_like*) – Input tensor.
* **Returns:**
**out** – Boolean tensor of same shape as x.
* **Return type:**
Tensor, [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`iscomplex`](maxframe.tensor.iscomplex.md#maxframe.tensor.iscomplex)
`isrealobj`
: Return True if x is not a complex type.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.isreal([1+1j, 1+0j, 4.5, 3, 2, 2j]).execute()
array([False, True, True, True, True, False])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.ldexp.md
# maxframe.tensor.ldexp
### maxframe.tensor.ldexp(x1, x2, out=None, where=None, \*\*kwargs)
Returns x1 \* 2\*\*x2, element-wise.
The mantissas x1 and twos exponents x2 are used to construct
floating point numbers `x1 * 2**x2`.
* **Parameters:**
* **x1** (*array_like*) – Tensor of multipliers.
* **x2** (*array_like* *,* [*int*](https://docs.python.org/3/library/functions.html#int)) – Tensor of twos exponents.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The result of `x1 * 2**x2`.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`frexp`](maxframe.tensor.frexp.md#maxframe.tensor.frexp)
: Return (y1, y2) from `x = y1 * 2**y2`, inverse to ldexp.
### Notes
Complex dtypes are not supported; they will raise a TypeError.
ldexp is useful as the inverse of frexp. If used by itself, it is
clearer to simply use the expression `x1 * 2**x2`.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.ldexp(5, mt.arange(4)).execute()
array([ 5., 10., 20., 40.], dtype=float32)
```
```pycon
>>> x = mt.arange(6)
>>> mt.ldexp(*mt.frexp(x)).execute()
array([ 0., 1., 2., 3., 4., 5.])
```
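The inverse relationship between frexp and ldexp can be sketched on scalars with the standard library functions `math.frexp` and `math.ldexp`. This illustrates the arithmetic only and does not call MaxFrame.

```python
import math

values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# frexp splits each value into (mantissa, exponent) with value == mantissa * 2**exponent
pairs = [math.frexp(v) for v in values]

# ldexp reassembles the value from the two parts
rebuilt = [math.ldexp(mantissa, exponent) for mantissa, exponent in pairs]

# ldexp(5, e) is the same as 5 * 2**e
scaled = [math.ldexp(5, e) for e in range(4)]
```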
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.left_shift.md
# maxframe.tensor.left_shift
### maxframe.tensor.left_shift(x1, x2, out=None, where=None, \*\*kwargs)
Shift the bits of an integer to the left.
Bits are shifted to the left by appending x2 0s at the right of x1.
Since the internal representation of numbers is in binary format, this
operation is equivalent to multiplying x1 by `2**x2`.
* **Parameters:**
* **x1** (*array_like* *of* *integer type*) – Input values.
* **x2** (*array_like* *of* *integer type*) – Number of zeros to append to x1. Has to be non-negative.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Return x1 with bits shifted x2 times to the left.
* **Return type:**
tensor of integer type
#### SEE ALSO
[`right_shift`](maxframe.tensor.right_shift.md#maxframe.tensor.right_shift)
: Shift the bits of an integer to the right.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.left_shift(5, 2).execute()
20
```
```pycon
>>> mt.left_shift(5, [1,2,3]).execute()
array([10, 20, 40])
```
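The equivalence between shifting left and multiplying by a power of two can be verified with Python's built-in `<<` operator on plain integers. This is a sketch of the bit arithmetic, not of the MaxFrame implementation.

```python
x1 = 5
shifts = [1, 2, 3]

shifted = [x1 << s for s in shifts]          # append s zero bits on the right
multiplied = [x1 * 2 ** s for s in shifts]   # same values via multiplication
binary = [bin(v) for v in shifted]           # binary strings show the appended zeros
```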
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.less.md
# maxframe.tensor.less
### maxframe.tensor.less(x1, x2, out=None, where=None, \*\*kwargs)
Return the truth value of (x1 < x2) element-wise.
* **Parameters:**
* **x1** (*array_like*) – Input tensors. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **x2** (*array_like*) – Input tensors. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Array of bools, or a single bool if x1 and x2 are scalars.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool) or Tensor of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`greater`](maxframe.tensor.greater.md#maxframe.tensor.greater), [`less_equal`](maxframe.tensor.less_equal.md#maxframe.tensor.less_equal), [`greater_equal`](maxframe.tensor.greater_equal.md#maxframe.tensor.greater_equal), [`equal`](maxframe.tensor.equal.md#maxframe.tensor.equal), [`not_equal`](maxframe.tensor.not_equal.md#maxframe.tensor.not_equal)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.less([1, 2], [2, 2]).execute()
array([ True, False])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.less_equal.md
# maxframe.tensor.less_equal
### maxframe.tensor.less_equal(x1, x2, out=None, where=None, \*\*kwargs)
Return the truth value of (x1 <= x2) element-wise.
* **Parameters:**
* **x1** (*array_like*) – Input tensors. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **x2** (*array_like*) – Input tensors. If `x1.shape != x2.shape`, they must be
broadcastable to a common shape (which may be the shape of one or
the other).
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Array of bools, or a single bool if x1 and x2 are scalars.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool) or tensor of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`greater`](maxframe.tensor.greater.md#maxframe.tensor.greater), [`less`](maxframe.tensor.less.md#maxframe.tensor.less), [`greater_equal`](maxframe.tensor.greater_equal.md#maxframe.tensor.greater_equal), [`equal`](maxframe.tensor.equal.md#maxframe.tensor.equal), [`not_equal`](maxframe.tensor.not_equal.md#maxframe.tensor.not_equal)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.less_equal([4, 2, 1], [2, 2, 2]).execute()
array([False, True, True])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.cholesky.md
# maxframe.tensor.linalg.cholesky
### maxframe.tensor.linalg.cholesky(a, lower=False)
Cholesky decomposition.
Return the Cholesky decomposition, L \* L.H, of the square matrix a,
where L is lower-triangular and .H is the conjugate transpose operator
(which is the ordinary transpose if a is real-valued). a must be
Hermitian (symmetric if real-valued) and positive-definite. Only L is
actually returned.
* **Parameters:**
* **a** ( *(* *...* *,* *M* *,* *M* *)* *array_like*) – Hermitian (symmetric if all elements are real), positive-definite
input matrix.
* **lower** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – Whether to compute the upper or lower triangular Cholesky
factorization. Default is upper-triangular.
* **Returns:**
**L** – Upper or lower-triangular Cholesky factor of a.
* **Return type:**
(…, M, M) array_like
* **Raises:**
**LinAlgError** – If the decomposition fails, for example, if a is not
positive-definite.
### Notes
Broadcasting rules apply; see the mt.linalg documentation for
details.
The Cholesky decomposition is often used as a fast way of solving
$$
A \mathbf{x} = \mathbf{b}
$$
(when A is both Hermitian/symmetric and positive-definite).
First, we solve for $\mathbf{y}$ in
$$
L \mathbf{y} = \mathbf{b},
$$
and then for $\mathbf{x}$ in
$$
L^{H} \mathbf{x} = \mathbf{y}.
$$
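The two-step solve above can be sketched in plain Python for a small real symmetric positive-definite matrix. The helper names (`cholesky_lower`, `forward_sub`, `back_sub_transpose`) are hypothetical and the code is a textbook illustration, not the MaxFrame implementation. For a real matrix, L.H is simply the transpose of L.

```python
import math


def cholesky_lower(a):
    """Lower-triangular Cholesky factor of a real SPD matrix (textbook algorithm)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def back_sub_transpose(L, y):
    """Solve L^T x = y, reading the upper-triangular L^T out of L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / L[i][i]
    return x


A = [[4.0, 2.0], [2.0, 5.0]]
b = [8.0, 14.0]

L = cholesky_lower(A)
y = forward_sub(L, b)         # first step: L y = b
x = back_sub_transpose(L, y)  # second step: L^T x = y
```

Both substitution passes cost O(M²), which is why reusing one factorization for many right-hand sides is attractive.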
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> A = mt.array([[1,-2j],[2j,5]])
>>> A.execute()
array([[ 1.+0.j, 0.-2.j],
[ 0.+2.j, 5.+0.j]])
>>> L = mt.linalg.cholesky(A, lower=True)
>>> L.execute()
array([[ 1.+0.j, 0.+0.j],
[ 0.+2.j, 1.+0.j]])
>>> mt.dot(L, L.T.conj()).execute() # verify that L * L.H = A
array([[ 1.+0.j, 0.-2.j],
[ 0.+2.j, 5.+0.j]])
>>> A = [[1,-2j],[2j,5]] # what happens if A is only array_like?
>>> mt.linalg.cholesky(A, lower=True).execute()
array([[ 1.+0.j, 0.+0.j],
[ 0.+2.j, 1.+0.j]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.inv.md
# maxframe.tensor.linalg.inv
### maxframe.tensor.linalg.inv(a, sparse=None)
Compute the (multiplicative) inverse of a matrix.
Given a square matrix a, return the matrix ainv satisfying
`dot(a, ainv) = dot(ainv, a) = eye(a.shape[0])`.
* **Parameters:**
* **a** ( *(* *...* *,* *M* *,* *M* *)* *array_like*) – Matrix to be inverted.
* **sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Return sparse value or not.
* **Returns:**
**ainv** – (Multiplicative) inverse of the matrix a.
* **Return type:**
(…, M, M) ndarray or matrix
* **Raises:**
**LinAlgError** – If a is not square or inversion fails.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([[1., 2.], [3., 4.]])
>>> ainv = mt.linalg.inv(a)
>>> mt.allclose(mt.dot(a, ainv), mt.eye(2)).execute()
True
```
```pycon
>>> mt.allclose(mt.dot(ainv, a), mt.eye(2)).execute()
True
```
```pycon
>>> ainv.execute()
array([[ -2. , 1. ],
[ 1.5, -0.5]])
```
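The defining property `dot(a, ainv) = dot(ainv, a) = eye(n)` can be checked exactly for the same 2x2 matrix with rational arithmetic. The sketch below uses the closed-form 2x2 inverse; `inverse_2x2` and `matmul` are hypothetical helpers, not MaxFrame functions.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix using exact fractions."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Naive matrix product of two nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


a = [[1, 2], [3, 4]]
ainv = inverse_2x2(a)
left = matmul(a, ainv)
right = matmul(ainv, a)
```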
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.lstsq.md
# maxframe.tensor.linalg.lstsq
### maxframe.tensor.linalg.lstsq(a, b, rcond=None)
Return the least-squares solution to a linear matrix equation.
Computes the vector x that approximately solves the equation
`a @ x = b`. The equation may be under-, well-, or over-determined
(i.e., the number of linearly independent rows of a can be less than,
equal to, or greater than its number of linearly independent columns).
If a is square and of full rank, then x (but for round-off error)
is the “exact” solution of the equation. Else, x minimizes the
Euclidean 2-norm $||b - ax||$. If there are multiple minimizing
solutions, the one with the smallest 2-norm $||x||$ is returned.
* **Parameters:**
* **a** ( *(**M* *,* *N* *)* *array_like*) – “Coefficient” matrix.
* **b** ( *{* *(**M* *,* *)* *,* *(**M* *,* *K* *)* *} array_like*) – Ordinate or “dependent variable” values. If b is two-dimensional,
the least-squares solution is calculated for each of the K columns
of b.
* **rcond** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Cut-off ratio for small singular values of a.
For the purposes of rank determination, singular values are treated
as zero if they are smaller than rcond times the largest singular
value of a.
The default uses the machine precision times `max(M, N)`. Passing
`-1` will use machine precision.
* **Returns:**
* **x** ( *{(N,), (N, K)} ndarray*) – Least-squares solution. If b is two-dimensional,
the solutions are in the K columns of x.
* **residuals** ( *{(1,), (K,), (0,)} ndarray*) – Sums of squared residuals: Squared Euclidean 2-norm for each column in
`b - a @ x`.
If the rank of a is < N or M <= N, this is an empty array.
If b is 1-dimensional, this is a (1,) shape array.
Otherwise the shape is (K,).
* **rank** (*int*) – Rank of matrix a.
* **s** ( *(min(M, N),) ndarray*) – Singular values of a.
* **Raises:**
**LinAlgError** – If computation does not converge.
### Notes
If b is a matrix, then all array results are returned as matrices.
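To make the over-determined case concrete, the sketch below fits a line `y = slope * x + intercept` to four points by solving the 2x2 normal equations with exact fractions. This is an assumption-laden illustration: a library routine would typically use an SVD- or QR-based method instead of normal equations, and `fit_line` is a hypothetical helper. The coefficient matrix here has columns `[x, 1]`, so M = 4 and N = 2.

```python
from fractions import Fraction


def fit_line(xs, ys):
    """Least-squares slope and intercept via the 2x2 normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n
    return slope, intercept


xs = [Fraction(v) for v in (0, 1, 2, 3)]
ys = [Fraction(v) for v in ("-1", "0.2", "0.9", "2.1")]

slope, intercept = fit_line(xs, ys)

# Sum of squared residuals, the quantity reported as `residuals`
residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
```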
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.lu.md
# maxframe.tensor.linalg.lu
### maxframe.tensor.linalg.lu(a)
LU decomposition
The decomposition is `A = P L U`,
where P is a permutation matrix, L is lower triangular with unit diagonal elements,
and U is upper triangular.
* **Parameters:**
**a** ( *(**M* *,* *N* *)* *array_like*) – Array to decompose
* **Returns:**
* **p** ( *(M, M) ndarray*) – Permutation matrix
* **l** ( *(M, K) ndarray*) – Lower triangular or trapezoidal matrix with unit diagonal.
K = min(M, N)
* **u** ( *(K, N) ndarray*) – Upper triangular or trapezoidal matrix
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> A = mt.array([[1,2],[2,3]])
>>> A.execute()
array([[ 1, 2],
[ 2, 3]])
>>> P, L, U = mt.linalg.lu(A)
>>> P.execute()
array([[ 0, 1],
[ 1, 0]])
>>> L.execute()
array([[ 1, 0],
[ 0.5, 1]])
>>> U.execute()
array([[ 2, 3],
[ 0, 0.5]])
>>> mt.dot(P.dot(L), U).execute() # verify that PL * U = A
array([[ 1, 2],
[ 2, 3]])
```
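The factors shown in the example can be multiplied back together in plain Python to confirm `A = P L U`. This sketch only checks the arithmetic of the printed matrices; `matmul` is a hypothetical helper.

```python
def matmul(x, y):
    """Naive matrix product of two nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


P = [[0, 1], [1, 0]]      # permutation: swaps the two rows
L = [[1, 0], [0.5, 1]]    # unit lower triangular
U = [[2, 3], [0, 0.5]]    # upper triangular

product = matmul(matmul(P, L), U)
```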
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.matrix_norm.md
# maxframe.tensor.linalg.matrix_norm
### maxframe.tensor.linalg.matrix_norm(x, \*, keepdims=False, ord='fro')
Computes the matrix norm of a matrix (or a stack of matrices) `x`.
This function is Array API compatible.
* **Parameters:**
* **x** (*array_like*) – Input array having shape (…, M, N) and whose two innermost
dimensions form `MxN` matrices.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If this is set to True, the axes which are normed over are left in
the result as dimensions with size one. Default: False.
* **ord** ( *{1* *,* *-1* *,* *2* *,* *-2* *,* *inf* *,* *-inf* *,* *'fro'* *,* *'nuc'}* *,* *optional*) – The order of the norm. For details see the table under `Notes`
in numpy.linalg.norm.
#### SEE ALSO
[`numpy.linalg.norm`](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html#numpy.linalg.norm)
: Generic norm function
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> from maxframe.tensor import linalg as LA
>>> a = mt.arange(9) - 4
>>> a.execute()
array([-4, -3, -2, ..., 2, 3, 4])
>>> b = a.reshape((3, 3))
>>> b.execute()
array([[-4, -3, -2],
[-1, 0, 1],
[ 2, 3, 4]])
```
```pycon
>>> LA.matrix_norm(b).execute()
7.745966692414834
>>> LA.matrix_norm(b, ord='fro').execute()
7.745966692414834
>>> LA.matrix_norm(b, ord=mt.inf).execute()
9.0
>>> LA.matrix_norm(b, ord=-mt.inf).execute()
2.0
```
```pycon
>>> LA.matrix_norm(b, ord=1).execute()
7.0
>>> LA.matrix_norm(b, ord=-1).execute()
6.0
>>> LA.matrix_norm(b, ord=2).execute()
7.3484692283495345
>>> LA.matrix_norm(b, ord=-2).execute()
1.8570331885190563e-016 # may vary
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.norm.md
# maxframe.tensor.linalg.norm
### maxframe.tensor.linalg.norm(x, ord=None, axis=None, keepdims=False)
Matrix or vector norm.
This function is able to return one of eight different matrix norms,
or one of an infinite number of vector norms (described below), depending
on the value of the `ord` parameter.
* **Parameters:**
* **x** (*array_like*) – Input tensor. If axis is None, x must be 1-D or 2-D.
* **ord** ( *{non-zero int* *,* *inf* *,* *-inf* *,* *'fro'* *,* *'nuc'}* *,* *optional*) – Order of the norm (see table under `Notes`). inf means maxframe tensor’s
inf object.
* **axis** ( *{int* *,* *2-tuple* *of* *ints* *,* *None}* *,* *optional*) – If axis is an integer, it specifies the axis of x along which to
compute the vector norms. If axis is a 2-tuple, it specifies the
axes that hold 2-D matrices, and the matrix norms of these matrices
are computed. If axis is None then either a vector norm (when x
is 1-D) or a matrix norm (when x is 2-D) is returned.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If this is set to True, the axes which are normed over are left in the
result as dimensions with size one. With this option the result will
broadcast correctly against the original x.
* **Returns:**
**n** – Norm of the matrix or vector(s).
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or Tensor
### Notes
For values of `ord <= 0`, the result is, strictly speaking, not a
mathematical ‘norm’, but it may still be useful for various numerical
purposes.
The following norms can be calculated:
| ord | norm for matrices | norm for vectors |
|-------|------------------------------|--------------------------------|
| None | Frobenius norm | 2-norm |
| ‘fro’ | Frobenius norm | – |
| ‘nuc’ | nuclear norm | – |
| inf | max(sum(abs(x), axis=1)) | max(abs(x)) |
| -inf | min(sum(abs(x), axis=1)) | min(abs(x)) |
| 0 | – | sum(x != 0) |
| 1 | max(sum(abs(x), axis=0)) | as below |
| -1 | min(sum(abs(x), axis=0)) | as below |
| 2 | 2-norm (largest sing. value) | as below |
| -2 | smallest singular value | as below |
| other | – | sum(abs(x)\*\*ord)\*\*(1./ord) |
The Frobenius norm is given by <sup>[1](#id2)</sup>:
> $||A||_F = [\sum_{i,j} abs(a_{i,j})^2]^{1/2}$
The nuclear norm is the sum of the singular values.
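Several rows of the table can be reproduced with a few lines of plain Python. The sketch below is an illustration of the formulas only: `vector_norm` and `matrix_norm` are hypothetical helpers written for this example, the singular-value based orders (2, -2, 'nuc') are omitted, and negative finite vector orders are left out because they divide by zero when an element is 0.

```python
import math


def vector_norm(x, ord=2):
    """Vector norm for the orders in the table; general case limited to ord > 0."""
    mags = [abs(v) for v in x]
    if ord == math.inf:
        return max(mags)
    if ord == -math.inf:
        return min(mags)
    if ord == 0:
        return sum(1 for m in mags if m != 0)
    if ord < 0:
        raise ValueError("negative finite orders are left out of this sketch")
    return sum(m ** ord for m in mags) ** (1.0 / ord)


def matrix_norm(m, ord="fro"):
    """Matrix norms that need only row sums, column sums, or squared entries."""
    row_sums = [sum(abs(v) for v in row) for row in m]
    col_sums = [sum(abs(row[j]) for row in m) for j in range(len(m[0]))]
    table = {
        "fro": math.sqrt(sum(abs(v) ** 2 for row in m for v in row)),
        math.inf: max(row_sums),
        -math.inf: min(row_sums),
        1: max(col_sums),
        -1: min(col_sums),
    }
    return table[ord]


a = list(range(-4, 5))          # [-4, -3, ..., 4]
b = [a[0:3], a[3:6], a[6:9]]    # the same values as a 3x3 matrix
```

With these inputs the vector 2-norm of `a` and the Frobenius norm of `b` coincide, since both sum the same squared entries.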
### References
* <a id='id2'>**[1]**</a> G. H. Golub and C. F. Van Loan, *Matrix Computations*, Baltimore, MD, Johns Hopkins University Press, 1985, pg. 15
### Examples
```pycon
>>> from maxframe.tensor import linalg as LA
>>> import maxframe.tensor as mt
>>> a = mt.arange(9) - 4
>>> a.execute()
array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
>>> b = a.reshape((3, 3))
>>> b.execute()
array([[-4, -3, -2],
[-1, 0, 1],
[ 2, 3, 4]])
```
```pycon
>>> LA.norm(a).execute()
7.745966692414834
>>> LA.norm(b).execute()
7.745966692414834
>>> LA.norm(b, 'fro').execute()
7.745966692414834
>>> LA.norm(a, mt.inf).execute()
4.0
>>> LA.norm(b, mt.inf).execute()
9.0
>>> LA.norm(a, -mt.inf).execute()
0.0
>>> LA.norm(b, -mt.inf).execute()
2.0
```
```pycon
>>> LA.norm(a, 1).execute()
20.0
>>> LA.norm(b, 1).execute()
7.0
>>> LA.norm(a, -1).execute()
0.0
>>> LA.norm(b, -1).execute()
6.0
>>> LA.norm(a, 2).execute()
7.745966692414834
>>> LA.norm(b, 2).execute()
7.3484692283495345
```
```pycon
>>> LA.norm(a, -2).execute()
0.0
>>> LA.norm(b, -2).execute()
4.351066026358965e-18
>>> LA.norm(a, 3).execute()
5.8480354764257312
>>> LA.norm(a, -3).execute()
0.0
```
Using the axis argument to compute vector norms:
```pycon
>>> c = mt.array([[ 1, 2, 3],
... [-1, 1, 4]])
>>> LA.norm(c, axis=0).execute()
array([ 1.41421356, 2.23606798, 5. ])
>>> LA.norm(c, axis=1).execute()
array([ 3.74165739, 4.24264069])
>>> LA.norm(c, ord=1, axis=1).execute()
array([ 6., 6.])
```
Using the axis argument to compute matrix norms:
```pycon
>>> m = mt.arange(8).reshape(2,2,2)
>>> LA.norm(m, axis=(1,2)).execute()
array([ 3.74165739, 11.22497216])
>>> LA.norm(m[0, :, :]).execute(), LA.norm(m[1, :, :]).execute()
(3.7416573867739413, 11.224972160321824)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.qr.md
# maxframe.tensor.linalg.qr
### maxframe.tensor.linalg.qr(a, method='tsqr')
Compute the qr factorization of a matrix.
Factor the matrix a as *qr*, where q is orthonormal and r is
upper-triangular.
* **Parameters:**
* **a** (*array_like* *,* *shape* *(**M* *,* *N* *)*) – Matrix to be factored.
* **method** ( *{'tsqr'* *,* *'sfqr'}* *,* *optional*) –
method to calculate qr factorization, tsqr as default
TSQR is presented in:
> A. Benson, D. Gleich, and J. Demmel.
> Direct QR factorizations for tall-and-skinny matrices in
> MapReduce architectures.
> IEEE International Conference on Big Data, 2013.
> [http://arxiv.org/abs/1301.1071](http://arxiv.org/abs/1301.1071)
SFQR is a QR decomposition for short-and-fat matrices.
Split A = [A1, A2, A3, …] into column blocks and decompose the first block as A1 = Q1 \* R1.
Then A = Q \* R with Q = Q1 and R = [R1, R2, R3, …], where A2 = Q1 \* R2, A3 = Q1 \* R3, …
* **Returns:**
* **q** (*Tensor of float or complex, optional*) – A matrix with orthonormal columns. When mode = ‘complete’ the
result is an orthogonal/unitary matrix depending on whether or not
a is real/complex. The determinant may be either +/- 1 in that
case.
* **r** (*Tensor of float or complex, optional*) – The upper-triangular matrix.
* **Raises:**
**LinAlgError** – If factoring fails.
### Notes
For more information on the qr factorization, see for example:
[http://en.wikipedia.org/wiki/QR_factorization](http://en.wikipedia.org/wiki/QR_factorization)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.random.randn(9, 6)
>>> q, r = mt.linalg.qr(a)
>>> mt.allclose(a, mt.dot(q, r)).execute() # a does equal qr
True
```
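The block scheme for short-and-fat matrices described under the `method` parameter can be sketched in plain Python. The code below is an illustration under stated assumptions: it factors the square leading block with classical Gram-Schmidt (chosen for brevity, not because MaxFrame uses it), then obtains the remaining blocks of R as `Q1^T * A2`. All helper names are hypothetical.

```python
import math


def matmul(x, y):
    """Naive matrix product of two nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def gram_schmidt_qr(a):
    """QR of a square, full-rank matrix via classical Gram-Schmidt."""
    n = len(a)
    cols = transpose(a)
    q_cols, r = [], [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = list(v)
        for i, q in enumerate(q_cols):
            r[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            w = [wk - r[i][j] * qk for wk, qk in zip(w, q)]
        r[j][j] = math.sqrt(sum(wk * wk for wk in w))
        q_cols.append([wk / r[j][j] for wk in w])
    return transpose(q_cols), r


a1 = [[3.0, 0.0], [4.0, 5.0]]   # square leading block A1
a2 = [[5.0], [10.0]]            # remaining columns A2

q1, r1 = gram_schmidt_qr(a1)    # A1 = Q1 * R1
r2 = matmul(transpose(q1), a2)  # A2 = Q1 * R2  =>  R2 = Q1^T * A2

a = [row1 + row2 for row1, row2 in zip(a1, a2)]   # A = [A1, A2]
r = [row1 + row2 for row1, row2 in zip(r1, r2)]   # R = [R1, R2]
rebuilt = matmul(q1, r)                           # should reproduce A
```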
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.solve.md
# maxframe.tensor.linalg.solve
### maxframe.tensor.linalg.solve(a, b, sym_pos=False, sparse=None)
Solve the equation `a x = b` for `x`.
* **Parameters:**
* **a** ( *(**M* *,* *M* *)* *array_like*) – A square matrix.
* **b** ( *(**M* *,* *) or* *(**M* *,* *N* *)* *array_like*) – Right-hand side matrix in `a x = b`.
* **sym_pos** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – Assume a is symmetric and positive definite. If `True`, use Cholesky
decomposition.
* **sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Return sparse value or not.
* **Returns:**
**x** – Solution to the system `a x = b`. Shape of the return matches the
shape of b.
* **Return type:**
(M,) or (M, N) ndarray
* **Raises:**
**LinAlgError** – If a is singular.
### Examples
Given a and b, solve for x:
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([[3, 2, 0], [1, -1, 0], [0, 5, 1]])
>>> b = mt.array([2, 4, -1])
>>> x = mt.linalg.solve(a, b)
>>> x.execute()
array([ 2., -2., 9.])
```
```pycon
>>> mt.dot(a, x).execute() # Check the result
array([ 2., 4., -1.])
```
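The same system can be solved exactly in plain Python with Gauss-Jordan elimination over fractions, which confirms the solution shown above. This is a textbook sketch with a hypothetical helper name (`solve_exact`); it says nothing about which algorithm MaxFrame uses internally.

```python
from fractions import Fraction


def solve_exact(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    # Build the augmented matrix [a | b]
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as pivot
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n] for row in m]


a = [[3, 2, 0], [1, -1, 0], [0, 5, 1]]
b = [2, 4, -1]
x = solve_exact(a, b)
```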
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.solve_triangular.md
# maxframe.tensor.linalg.solve_triangular
### maxframe.tensor.linalg.solve_triangular(a, b, lower=False, sparse=None)
Solve the equation a x = b for x, assuming a is a triangular matrix.
* **Parameters:**
* **a** ( *(**M* *,* *M* *)* *array_like*) – A triangular matrix
* **b** ( *(**M* *,* *) or* *(**M* *,* *N* *)* *array_like*) – Right-hand side matrix in a x = b
* **lower** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Use only data contained in the lower triangle of a.
Default is to use upper triangle.
* **sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Return sparse value or not.
* **Returns:**
**x** – Solution to the system a x = b. Shape of return matches b.
* **Return type:**
(M,) or (M, N) ndarray
### Examples
Solve the lower triangular system a x = b, where
$$
a = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 4 \\ 2 \\ 4 \\ 2 \end{bmatrix}
$$
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([[3, 0, 0, 0], [2, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]])
>>> b = mt.array([4, 2, 4, 2])
>>> x = mt.linalg.solve_triangular(a, b, lower=True)
>>> x.execute()
array([ 1.33333333, -0.66666667, 2.66666667, -1.33333333])
```
```pycon
>>> a.dot(x).execute() # Check the result
array([ 4., 2., 4., 2.])
```
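For a lower triangular matrix the solve reduces to forward substitution: each unknown depends only on the ones already computed. The sketch below applies it to the same system with exact fractions; `solve_lower` is a hypothetical helper, not the MaxFrame implementation.

```python
from fractions import Fraction


def solve_lower(a, b):
    """Forward substitution for a lower-triangular system a x = b."""
    x = []
    for i, row in enumerate(a):
        # Contribution of the unknowns that are already solved
        known = sum(row[k] * x[k] for k in range(i))
        x.append(Fraction(b[i] - known) / row[i])
    return x


a = [[3, 0, 0, 0], [2, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
b = [4, 2, 4, 2]
x = solve_lower(a, b)
```

The exact result 4/3, -2/3, 8/3, -4/3 matches the rounded values in the example output.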
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.svd.md
# maxframe.tensor.linalg.svd
### maxframe.tensor.linalg.svd(a, method='tsqr')
Singular Value Decomposition.
When a is a 2D tensor, it is factorized as `u @ np.diag(s) @ vh
= (u * s) @ vh`, where u and vh are 2D unitary tensors and s is a 1D
tensor of a’s singular values. When a is higher-dimensional, SVD is
applied in stacked mode as explained below.
* **Parameters:**
* **a** ( *(* *...* *,* *M* *,* *N* *)* *array_like*) – A real or complex tensor with `a.ndim >= 2`.
* **method** ( *{'tsqr'}* *,* *optional*) –
method to calculate qr factorization, tsqr as default
TSQR is presented in:
> A. Benson, D. Gleich, and J. Demmel.
> Direct QR factorizations for tall-and-skinny matrices in
> MapReduce architectures.
> IEEE International Conference on Big Data, 2013.
> [http://arxiv.org/abs/1301.1071](http://arxiv.org/abs/1301.1071)
* **Returns:**
* **u** ( *{ (…, M, M), (…, M, K) } tensor*) – Unitary tensor(s). The first `a.ndim - 2` dimensions have the same
size as those of the input a. The size of the last two dimensions
depends on the value of full_matrices. Only returned when
compute_uv is True.
* **s** ( *(…, K) tensor*) – Vector(s) with the singular values, within each vector sorted in
descending order. The first `a.ndim - 2` dimensions have the same
size as those of the input a.
* **vh** ( *{ (…, N, N), (…, K, N) } tensor*) – Unitary tensor(s). The first `a.ndim - 2` dimensions have the same
size as those of the input a. The size of the last two dimensions
depends on the value of full_matrices. Only returned when
compute_uv is True.
* **Raises:**
**LinAlgError** – If SVD computation does not converge.
### Notes
SVD is usually described for the factorization of a 2D matrix $A$.
The higher-dimensional case will be discussed below. In the 2D case, SVD is
written as $A = U S V^H$, where $A = a$, $U= u$,
$S= \mathtt{np.diag}(s)$ and $V^H = vh$. The 1D tensor s
contains the singular values of a and u and vh are unitary. The rows
of vh are the eigenvectors of $A^H A$ and the columns of u are
the eigenvectors of $A A^H$. In both cases the corresponding
(possibly non-zero) eigenvalues are given by `s**2`.
If a has more than two dimensions, then broadcasting rules apply, as
explained in [Linear algebra on several matrices at once](https://numpy.org/doc/stable/reference/routines.linalg.html#routines-linalg-broadcasting). This means that SVD is
working in “stacked” mode: it iterates over all indices of the first
`a.ndim - 2` dimensions and for each combination SVD is applied to the
last two indices. The matrix a can be reconstructed from the
decomposition with either `(u * s[..., None, :]) @ vh` or
`u @ (s[..., None] * vh)`. (The `@` operator can be replaced by the
function `mt.matmul` for Python versions below 3.5.)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.random.randn(9, 6) + 1j*mt.random.randn(9, 6)
>>> b = mt.random.randn(2, 7, 8, 3) + 1j*mt.random.randn(2, 7, 8, 3)
```
Reconstruction based on reduced SVD, 2D case:
```pycon
>>> u, s, vh = mt.linalg.svd(a)
>>> u.shape, s.shape, vh.shape
((9, 6), (6,), (6, 6))
>>> mt.allclose(a, mt.dot(u * s, vh)).execute()
True
>>> smat = mt.diag(s)
>>> mt.allclose(a, mt.dot(u, mt.dot(smat, vh))).execute()
True
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linalg.vector_norm.md
# maxframe.tensor.linalg.vector_norm
### maxframe.tensor.linalg.vector_norm(x, \*, axis=None, keepdims=False, ord=2)
Computes the vector norm of a vector (or batch of vectors) `x`.
This function is Array API compatible.
* **Parameters:**
* **x** (*array_like*) – Input array.
* **axis** ( *{None* *,* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *2-tuple* *of* *ints}* *,* *optional*) – If an integer, `axis` specifies the axis (dimension) along which
to compute vector norms. If an n-tuple, `axis` specifies the axes
(dimensions) along which to compute batched vector norms. If `None`,
the vector norm must be computed over all array values (i.e.,
equivalent to computing the vector norm of a flattened array).
Default: `None`.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If this is set to True, the axes which are normed over are left in
the result as dimensions with size one. Default: False.
* **ord** ( *{int* *,* [*float*](https://docs.python.org/3/library/functions.html#float) *,* *inf* *,* *-inf}* *,* *optional*) – The order of the norm. For details see the table under `Notes`
in numpy.linalg.norm.
#### SEE ALSO
[`numpy.linalg.norm`](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html#numpy.linalg.norm)
: Generic norm function
### Examples
```pycon
>>> import numpy as np
>>> from numpy import linalg as LA
>>> a = np.arange(9) + 1
>>> a
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = a.reshape((3, 3))
>>> b
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
```
```pycon
>>> LA.vector_norm(b)
16.881943016134134
>>> LA.vector_norm(b, ord=np.inf)
9.0
>>> LA.vector_norm(b, ord=-np.inf)
1.0
```
```pycon
>>> LA.vector_norm(b, ord=0)
9.0
>>> LA.vector_norm(b, ord=1)
45.0
>>> LA.vector_norm(b, ord=-1)
0.3534857623790153
>>> LA.vector_norm(b, ord=2)
16.881943016134134
>>> LA.vector_norm(b, ord=-2)
0.8058837395885292
```
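The `ord` values used above follow the usual p-norm definitions. The plain-Python sketch below spells them out for an already flattened input. `p_norm` is a hypothetical helper written for illustration; it is not part of MaxFrame or NumPy and makes no claim about how either library computes the result.

```python
import math


def p_norm(values, ord=2):
    """Vector norm of a flat sequence of numbers (illustrative only)."""
    mags = [abs(float(v)) for v in values]
    if ord == math.inf:
        return max(mags)                      # largest magnitude
    if ord == -math.inf:
        return min(mags)                      # smallest magnitude
    if ord == 0:
        return float(sum(1 for m in mags if m != 0))  # count of non-zeros
    # General case: (sum |x_i|**ord) ** (1 / ord).
    # Negative ord divides by each magnitude, so zeros are not supported here.
    return sum(m ** ord for m in mags) ** (1.0 / ord)


flat = list(range(1, 10))  # the values 1..9 used in the examples above
print(p_norm(flat))             # Euclidean norm, sqrt(285)
print(p_norm(flat, ord=1))      # sum of magnitudes
print(p_norm(flat, ord=math.inf))
```

For a multi-dimensional input with `axis=None`, the documented behaviour corresponds to applying such a formula to the flattened values.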
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.linspace.md
# maxframe.tensor.linspace
### maxframe.tensor.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, gpu=None, chunk_size=None)
Return evenly spaced numbers over a specified interval.
Returns num evenly spaced samples, calculated over the
interval [start, stop].
The endpoint of the interval can optionally be excluded.
* **Parameters:**
* **start** (*scalar*) – The starting value of the sequence.
* **stop** (*scalar*) – The end value of the sequence, unless endpoint is set to False.
In that case, the sequence consists of all but the last of `num + 1`
evenly spaced samples, so that stop is excluded. Note that the step
size changes when endpoint is False.
* **num** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of samples to generate. Default is 50. Must be non-negative.
* **endpoint** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, stop is the last sample. Otherwise, it is not included.
Default is True.
* **retstep** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, return (samples, step), where step is the spacing
between samples.
* **dtype** (*dtype* *,* *optional*) – The type of the output tensor. If dtype is not given, infer the data
type from the other input arguments.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **Returns:**
* **samples** (*Tensor*) – There are num equally spaced samples in the closed interval
`[start, stop]` or the half-open interval `[start, stop)`
(depending on whether endpoint is True or False).
* **step** (*float, optional*) – Only returned if retstep is True
Size of spacing between samples.
#### SEE ALSO
[`arange`](maxframe.tensor.arange.md#maxframe.tensor.arange)
: Similar to linspace, but uses a step size (instead of the number of samples).
`logspace`
: Samples uniformly distributed in log space.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.linspace(2.0, 3.0, num=5).execute()
array([ 2. , 2.25, 2.5 , 2.75, 3. ])
>>> mt.linspace(2.0, 3.0, num=5, endpoint=False).execute()
array([ 2. , 2.2, 2.4, 2.6, 2.8])
>>> mt.linspace(2.0, 3.0, num=5, retstep=True).execute()
(array([ 2. , 2.25, 2.5 , 2.75, 3. ]), 0.25)
```
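The effect of `endpoint` on the step size can be made explicit with a small plain-Python sketch. `linspace_sketch` is a hypothetical helper, not MaxFrame's implementation; returning `nan` as the step for degenerate inputs is an assumption made here for illustration.

```python
def linspace_sketch(start, stop, num=50, endpoint=True):
    """Return (samples, step) for num evenly spaced values (illustrative only)."""
    if num < 0:
        raise ValueError("num must be non-negative")
    # With the endpoint included there are num - 1 gaps, otherwise num gaps.
    divisions = num - 1 if endpoint else num
    if divisions <= 0:
        # num == 0, or a single sample with endpoint=True: no meaningful step.
        return [float(start)] * num, float("nan")
    step = (stop - start) / divisions
    samples = [start + i * step for i in range(num)]
    if endpoint:
        samples[-1] = float(stop)  # avoid round-off in the last sample
    return samples, step


closed, closed_step = linspace_sketch(2.0, 3.0, num=5)
half_open, half_open_step = linspace_sketch(2.0, 3.0, num=5, endpoint=False)
print(closed, closed_step)        # [2.0, 2.25, 2.5, 2.75, 3.0] 0.25
print(half_open, half_open_step)  # step is (3.0 - 2.0) / 5; stop is excluded
```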
Graphical illustration:
```pycon
>>> import matplotlib.pyplot as plt
>>> N = 8
>>> y = mt.zeros(N)
>>> x1 = mt.linspace(0, 10, N, endpoint=True)
>>> x2 = mt.linspace(0, 10, N, endpoint=False)
>>> plt.plot(x1.execute(), y.execute(), 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.plot(x2.execute(), y.execute() + 0.5, 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.ylim([-0.5, 1])
(-0.5, 1)
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.log.md
# maxframe.tensor.log
### maxframe.tensor.log(x, out=None, where=None, \*\*kwargs)
Natural logarithm, element-wise.
The natural logarithm log is the inverse of the exponential function,
so that log(exp(x)) = x. The natural logarithm is the logarithm to base
e.
* **Parameters:**
* **x** (*array_like*) – Input value.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The natural logarithm of x, element-wise.
* **Return type:**
Tensor
#### SEE ALSO
[`log10`](maxframe.tensor.log10.md#maxframe.tensor.log10), [`log2`](maxframe.tensor.log2.md#maxframe.tensor.log2), [`log1p`](maxframe.tensor.log1p.md#maxframe.tensor.log1p)
### Notes
Logarithm is a multivalued function: for each x there is an infinite
number of z such that exp(z) = x. The convention is to return the
z whose imaginary part lies in [-pi, pi].
For real-valued input data types, log always returns real output. For
each value that cannot be expressed as a real number or infinity, it
yields `nan` and sets the invalid floating point error flag.
For complex-valued input, log is a complex analytical function that
has a branch cut [-inf, 0] and is continuous from above on it. log
handles the floating-point negative zero as an infinitesimal negative
number, conforming to the C99 standard.
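The branch-cut convention can be observed with Python's standard `cmath` module, which follows the same C99 rules for signed zeros. This is only an illustration of the convention described above; it does not call MaxFrame, and it is an assumption that `mt.log` returns the same values for these inputs.

```python
import cmath
import math

# The same point on the negative real axis, approached from either side of the cut.
above = cmath.log(complex(-1.0, 0.0))    # imaginary part +0.0: on the cut, "from above"
below = cmath.log(complex(-1.0, -0.0))   # imaginary part -0.0: treated as just below

print(above.imag)   # +pi
print(below.imag)   # -pi
print(math.isclose(above.real, below.real))  # the real parts agree
```

The sign of the zero imaginary part selects which side of the cut is used, which is why negative zero behaves like an infinitesimal negative number.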
### References
* <a id='id1'>**[1]**</a> M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 67. [http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
* <a id='id2'>**[2]**</a> Wikipedia, “Logarithm”. [http://en.wikipedia.org/wiki/Logarithm](http://en.wikipedia.org/wiki/Logarithm)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.log([1, mt.e, mt.e**2, 0]).execute()
array([ 0., 1., 2., -Inf])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.log10.md
# maxframe.tensor.log10
### maxframe.tensor.log10(x, out=None, where=None, \*\*kwargs)
Return the base 10 logarithm of the input tensor, element-wise.
* **Parameters:**
* **x** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The logarithm to the base 10 of x, element-wise. NaNs are
returned where x is negative.
* **Return type:**
Tensor
### Notes
Logarithm is a multivalued function: for each x there is an infinite
number of z such that 10\*\*z = x. The convention is to return the
z whose imaginary part lies in [-pi, pi].
For real-valued input data types, log10 always returns real output.
For each value that cannot be expressed as a real number or infinity,
it yields `nan` and sets the invalid floating point error flag.
For complex-valued input, log10 is a complex analytical function that
has a branch cut [-inf, 0] and is continuous from above on it.
log10 handles the floating-point negative zero as an infinitesimal
negative number, conforming to the C99 standard.
### References
* <a id='id1'>**[1]**</a> M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 67. [http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
* <a id='id2'>**[2]**</a> Wikipedia, “Logarithm”. [http://en.wikipedia.org/wiki/Logarithm](http://en.wikipedia.org/wiki/Logarithm)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.log10([1e-15, -3.]).execute()
array([-15., NaN])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.log1p.md
# maxframe.tensor.log1p
### maxframe.tensor.log1p(x, out=None, where=None, \*\*kwargs)
Return the natural logarithm of one plus the input tensor, element-wise.
Calculates `log(1 + x)`.
* **Parameters:**
* **x** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – Natural logarithm of 1 + x, element-wise.
* **Return type:**
Tensor
#### SEE ALSO
[`expm1`](maxframe.tensor.expm1.md#maxframe.tensor.expm1)
: `exp(x) - 1`, the inverse of log1p.
### Notes
For real-valued input, log1p is accurate also for x so small
that 1 + x == 1 in floating-point accuracy.
Logarithm is a multivalued function: for each x there is an infinite
number of z such that exp(z) = 1 + x. The convention is to return
the z whose imaginary part lies in [-pi, pi].
For real-valued input data types, log1p always returns real output.
For each value that cannot be expressed as a real number or infinity,
it yields `nan` and sets the invalid floating point error flag.
For complex-valued input, log1p is a complex analytical function that
has a branch cut [-inf, -1] and is continuous from above on it.
log1p handles the floating-point negative zero as an infinitesimal
negative number, conforming to the C99 standard.
### References
* <a id='id1'>**[1]**</a> M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 67. [http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
* <a id='id2'>**[2]**</a> Wikipedia, “Logarithm”. [http://en.wikipedia.org/wiki/Logarithm](http://en.wikipedia.org/wiki/Logarithm)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.log1p(1e-99).execute()
1e-99
>>> mt.log(1 + 1e-99).execute()
0.0
```
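The same effect is visible with less extreme inputs. The sketch below uses the standard-library `math` functions rather than MaxFrame, on the assumption that both follow the usual IEEE double-precision behaviour, and compares the relative error of the two approaches for `x = 1e-10`.

```python
import math

x = 1e-10

naive = math.log(1.0 + x)     # 1.0 + x is rounded before the logarithm is taken
accurate = math.log1p(x)      # works on x directly, no rounding of 1 + x

# For tiny x, log(1 + x) is approximately x - x**2 / 2, so x is a good reference.
naive_rel_err = abs(naive - x) / x
accurate_rel_err = abs(accurate - x) / x
print(naive_rel_err, accurate_rel_err)

# expm1 is the matching inverse and recovers x from the accurate result.
print(math.isclose(math.expm1(accurate), x))
```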
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.log2.md
# maxframe.tensor.log2
### maxframe.tensor.log2(x, out=None, where=None, \*\*kwargs)
Base-2 logarithm of x.
* **Parameters:**
* **x** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – Base-2 logarithm of x.
* **Return type:**
Tensor
#### SEE ALSO
[`log`](maxframe.tensor.log.md#maxframe.tensor.log), [`log10`](maxframe.tensor.log10.md#maxframe.tensor.log10), [`log1p`](maxframe.tensor.log1p.md#maxframe.tensor.log1p)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([0, 1, 2, 2**4])
>>> mt.log2(x).execute()
array([-Inf, 0., 1., 4.])
```
```pycon
>>> xi = mt.array([0+1.j, 1, 2+0.j, 4.j])
>>> mt.log2(xi).execute()
array([ 0.+2.26618007j, 0.+0.j , 1.+0.j , 2.+2.26618007j])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.logaddexp.md
# maxframe.tensor.logaddexp
### maxframe.tensor.logaddexp(x1, x2, out=None, where=None, \*\*kwargs)
Logarithm of the sum of exponentiations of the inputs.
Calculates `log(exp(x1) + exp(x2))`. This function is useful in
statistics where the calculated probabilities of events may be so small
as to exceed the range of normal floating point numbers. In such cases
the logarithm of the calculated probability is stored. This function
allows adding probabilities stored in such a fashion.
* **Parameters:**
* **x1** (*array_like*) – Input values.
* **x2** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs** – For other keyword-only arguments, see the
[ufunc docs](https://numpy.org/doc/stable/reference/ufuncs.html#ufuncs-kwargs).
* **Returns:**
**result** – Logarithm of `exp(x1) + exp(x2)`.
* **Return type:**
Tensor
#### SEE ALSO
[`logaddexp2`](maxframe.tensor.logaddexp2.md#maxframe.tensor.logaddexp2)
: Logarithm of the sum of exponentiations of inputs in base 2.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> prob1 = mt.log(1e-50)
>>> prob2 = mt.log(2.5e-50)
>>> prob12 = mt.logaddexp(prob1, prob2)
>>> prob12.execute()
-113.87649168120691
>>> mt.exp(prob12).execute()
3.5000000000000057e-50
```
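One common way to evaluate `log(exp(x1) + exp(x2))` without underflow is to factor out the larger argument. The plain-Python sketch below shows that idea; `logaddexp_sketch` is a hypothetical helper, and whether MaxFrame uses this exact formulation internally is not stated in this document.

```python
import math


def logaddexp_sketch(x1, x2):
    """log(exp(x1) + exp(x2)) for real inputs that are not +inf (illustrative only)."""
    hi, lo = max(x1, x2), min(x1, x2)
    if hi == -math.inf:
        return -math.inf            # both probabilities are exactly zero
    # log(e^hi + e^lo) = hi + log(1 + e^(lo - hi)), and lo - hi <= 0 cannot overflow.
    return hi + math.log1p(math.exp(lo - hi))


prob1 = math.log(1e-50)
prob2 = math.log(2.5e-50)
print(logaddexp_sketch(prob1, prob2))       # log(3.5e-50)

# exp(-1000) underflows to 0.0, so the direct formula would take log(0).
print(math.exp(-1000.0))
print(logaddexp_sketch(-1000.0, -1000.0))   # -1000 + log(2)
```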
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.logaddexp2.md
# maxframe.tensor.logaddexp2
### maxframe.tensor.logaddexp2(x1, x2, out=None, where=None, \*\*kwargs)
Logarithm of the sum of exponentiations of the inputs in base-2.
Calculates `log2(2**x1 + 2**x2)`. This function is useful in machine
learning when the calculated probabilities of events may be so small as
to exceed the range of normal floating point numbers. In such cases
the base-2 logarithm of the calculated probability can be used instead.
This function allows adding probabilities stored in such a fashion.
* **Parameters:**
* **x1** (*array_like*) – Input values.
* **x2** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**result** – Base-2 logarithm of `2**x1 + 2**x2`.
* **Return type:**
Tensor
#### SEE ALSO
[`logaddexp`](maxframe.tensor.logaddexp.md#maxframe.tensor.logaddexp)
: Logarithm of the sum of exponentiations of the inputs.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> prob1 = mt.log2(1e-50)
>>> prob2 = mt.log2(2.5e-50)
>>> prob12 = mt.logaddexp2(prob1, prob2)
>>> prob1.execute(), prob2.execute(), prob12.execute()
(-166.09640474436813, -164.77447664948076, -164.28904982231052)
>>> (2**prob12).execute()
3.4999999999999914e-50
```
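The base-2 variant can be written the same way as the natural-log version, by factoring out the larger exponent. `logaddexp2_sketch` below is a hypothetical plain-Python helper for illustration, not MaxFrame's implementation.

```python
import math


def logaddexp2_sketch(x1, x2):
    """log2(2**x1 + 2**x2) for real inputs that are not +inf (illustrative only)."""
    hi, lo = max(x1, x2), min(x1, x2)
    if hi == -math.inf:
        return -math.inf
    # log2(2^hi + 2^lo) = hi + log2(1 + 2^(lo - hi)), with lo - hi <= 0.
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


prob1 = math.log2(1e-50)
prob2 = math.log2(2.5e-50)
prob12 = logaddexp2_sketch(prob1, prob2)
print(prob12)
print(2.0 ** prob12)   # close to 3.5e-50
```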
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.logical_and.md
# maxframe.tensor.logical_and
### maxframe.tensor.logical_and(x1, x2, out=None, where=None, \*\*kwargs)
Compute the truth value of x1 AND x2 element-wise.
* **Parameters:**
* **x1** (*array_like*) – Input tensors. x1 and x2 must be of the same shape.
* **x2** (*array_like*) – Input tensors. x1 and x2 must be of the same shape.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – Boolean result with the same shape as x1 and x2 of the logical
AND operation on corresponding elements of x1 and x2.
* **Return type:**
Tensor or [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`logical_or`](maxframe.tensor.logical_or.md#maxframe.tensor.logical_or), [`logical_not`](maxframe.tensor.logical_not.md#maxframe.tensor.logical_not), [`logical_xor`](maxframe.tensor.logical_xor.md#maxframe.tensor.logical_xor), [`bitwise_and`](maxframe.tensor.bitwise_and.md#maxframe.tensor.bitwise_and)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.logical_and(True, False).execute()
False
>>> mt.logical_and([True, False], [False, False]).execute()
array([False, False])
```
```pycon
>>> x = mt.arange(5)
>>> mt.logical_and(x>1, x<4).execute()
array([False, False, True, True, False])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.logical_not.md
# maxframe.tensor.logical_not
### maxframe.tensor.logical_not(x, out=None, where=None, \*\*kwargs)
Compute the truth value of NOT x element-wise.
* **Parameters:**
* **x** (*array_like*) – Logical NOT is applied to the elements of x.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – Boolean result with the same shape as x of the NOT operation
on elements of x.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool) or Tensor of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`logical_and`](maxframe.tensor.logical_and.md#maxframe.tensor.logical_and), [`logical_or`](maxframe.tensor.logical_or.md#maxframe.tensor.logical_or), [`logical_xor`](maxframe.tensor.logical_xor.md#maxframe.tensor.logical_xor)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.logical_not(3).execute()
False
>>> mt.logical_not([True, False, 0, 1]).execute()
array([False, True, True, False])
```
```pycon
>>> x = mt.arange(5)
>>> mt.logical_not(x<3).execute()
array([False, False, False, True, True])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.logical_or.md
# maxframe.tensor.logical_or
### maxframe.tensor.logical_or(x1, x2, out=None, where=None, \*\*kwargs)
Compute the truth value of x1 OR x2 element-wise.
* **Parameters:**
* **x1** (*array_like*) – Logical OR is applied to the elements of x1 and x2.
They have to be of the same shape.
* **x2** (*array_like*) – Logical OR is applied to the elements of x1 and x2.
They have to be of the same shape.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – Boolean result with the same shape as x1 and x2 of the logical
OR operation on elements of x1 and x2.
* **Return type:**
Tensor or [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`logical_and`](maxframe.tensor.logical_and.md#maxframe.tensor.logical_and), [`logical_not`](maxframe.tensor.logical_not.md#maxframe.tensor.logical_not), [`logical_xor`](maxframe.tensor.logical_xor.md#maxframe.tensor.logical_xor), [`bitwise_or`](maxframe.tensor.bitwise_or.md#maxframe.tensor.bitwise_or)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.logical_or(True, False).execute()
True
>>> mt.logical_or([True, False], [False, False]).execute()
array([ True, False])
```
```pycon
>>> x = mt.arange(5)
>>> mt.logical_or(x < 1, x > 3).execute()
array([ True, False, False, False, True])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.logical_xor.md
# maxframe.tensor.logical_xor
### maxframe.tensor.logical_xor(x1, x2, out=None, where=None, \*\*kwargs)
Compute the truth value of x1 XOR x2, element-wise.
* **Parameters:**
* **x1** (*array_like*) – Logical XOR is applied to the elements of x1 and x2. They must
be broadcastable to the same shape.
* **x2** (*array_like*) – Logical XOR is applied to the elements of x1 and x2. They must
be broadcastable to the same shape.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – Boolean result of the logical XOR operation applied to the elements
of x1 and x2; the shape is determined by whether or not
broadcasting of one or both arrays was required.
* **Return type:**
[bool](https://docs.python.org/3/library/functions.html#bool) or Tensor of [bool](https://docs.python.org/3/library/functions.html#bool)
#### SEE ALSO
[`logical_and`](maxframe.tensor.logical_and.md#maxframe.tensor.logical_and), [`logical_or`](maxframe.tensor.logical_or.md#maxframe.tensor.logical_or), [`logical_not`](maxframe.tensor.logical_not.md#maxframe.tensor.logical_not), [`bitwise_xor`](maxframe.tensor.bitwise_xor.md#maxframe.tensor.bitwise_xor)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.logical_xor(True, False).execute()
True
>>> mt.logical_xor([True, True, False, False], [True, False, True, False]).execute()
array([False, True, True, False])
```
```pycon
>>> x = mt.arange(5)
>>> mt.logical_xor(x < 1, x > 3).execute()
array([ True, False, False, False, True])
```
Simple example showing support of broadcasting:
```pycon
>>> mt.logical_xor(0, mt.eye(2)).execute()
array([[ True, False],
[False, True]])
```
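Element-wise XOR acts on the truth value of each element, so non-boolean inputs are first reduced to `True` or `False`. The plain-Python sketch below shows that rule for two equal-length sequences; `logical_xor_sketch` is a hypothetical helper and does not implement broadcasting.

```python
def logical_xor_sketch(x1, x2):
    """Element-wise XOR of the truth values of two equal-length sequences."""
    # Two truth values differ exactly when one, but not both, is true.
    return [bool(a) != bool(b) for a, b in zip(x1, x2)]


print(logical_xor_sketch([True, True, False, False], [True, False, True, False]))
# [False, True, True, False]

print(logical_xor_sketch([0, 2, 5], [3, 0, 7]))
# [True, True, False]  (non-zero numbers count as True)
```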
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.matmul.md
# maxframe.tensor.matmul
### maxframe.tensor.matmul(a, b, sparse=None, out=None, \*\*kw)
Matrix product of two tensors.
The behavior depends on the arguments in the following way.
- If both arguments are 2-D they are multiplied like conventional
matrices.
- If either argument is N-D, N > 2, it is treated as a stack of
matrices residing in the last two indexes and broadcast accordingly.
- If the first argument is 1-D, it is promoted to a matrix by
prepending a 1 to its dimensions. After matrix multiplication
the prepended 1 is removed.
- If the second argument is 1-D, it is promoted to a matrix by
appending a 1 to its dimensions. After matrix multiplication
the appended 1 is removed.
Multiplication by a scalar is not allowed, use `*` instead. Note that
multiplying a stack of matrices with a vector will result in a stack of
vectors, but matmul will not recognize it as such.
`matmul` differs from `dot` in two important ways.
- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices
were elements.
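The promotion rules for 1-D arguments listed above can be sketched in plain Python on nested lists. `matmul_sketch` is a hypothetical helper written for illustration; it covers only 1-D and 2-D inputs (no stacks of matrices) and is not how MaxFrame evaluates the product.

```python
def _matmul_2d(a, b):
    """Conventional matrix product of two 2-D nested lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("last dimension of a must match second-to-last of b")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def matmul_sketch(a, b):
    """matmul for 1-D / 2-D list inputs, following the promotion rules above."""
    if not isinstance(a, list) or not isinstance(b, list):
        raise ValueError("Scalar operands are not allowed, use '*' instead")
    a_is_vec = not isinstance(a[0], list)
    b_is_vec = not isinstance(b[0], list)
    a2 = [a] if a_is_vec else a                  # prepend a 1: (n,) -> (1, n)
    b2 = [[v] for v in b] if b_is_vec else b     # append a 1:  (n,) -> (n, 1)
    out = _matmul_2d(a2, b2)
    if b_is_vec:
        out = [row[0] for row in out]            # remove the appended 1
    if a_is_vec:
        out = out[0]                             # remove the prepended 1
    return out


identity = [[1, 0], [0, 1]]
print(matmul_sketch(identity, [[4, 1], [2, 2]]))  # [[4, 1], [2, 2]]
print(matmul_sketch(identity, [1, 2]))            # [1, 2]
print(matmul_sketch([1, 2], identity))            # [1, 2]
print(matmul_sketch([2j, 3j], [2j, 3j]))          # (-13+0j)
```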
* **Parameters:**
* **a** (*array_like*) – First argument.
* **b** (*array_like*) – Second argument.
* **out** (*Tensor* *,* *optional*) – Output argument. This must have the exact kind that would be returned
if it was not used. In particular, it must have the right type,
and its dtype must be the dtype that would be returned
for dot(a,b). This is a performance feature. Therefore, if these
conditions are not met, an exception is raised, instead of attempting
to be flexible.
* **Returns:**
**output** – Returns the dot product of a and b. If a and b are both
1-D arrays then a scalar is returned; otherwise an array is
returned. If out is given, then it is returned.
* **Return type:**
Tensor
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If the last dimension of a is not the same size as
the second-to-last dimension of b.
If scalar value is passed.
#### SEE ALSO
[`vdot`](maxframe.tensor.vdot.md#maxframe.tensor.vdot)
: Complex-conjugating dot product.
[`tensordot`](maxframe.tensor.tensordot.md#maxframe.tensor.tensordot)
: Sum products over arbitrary axes.
[`dot`](maxframe.tensor.dot.md#maxframe.tensor.dot)
: alternative matrix product with different broadcasting rules.
### Notes
The matmul function implements the semantics of the @ operator introduced
in Python 3.5 following PEP465.
### Examples
For 2-D arrays it is the matrix product:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = [[1, 0], [0, 1]]
>>> b = [[4, 1], [2, 2]]
>>> mt.matmul(a, b).execute()
array([[4, 1],
[2, 2]])
```
For 2-D mixed with 1-D, the result is the usual.
```pycon
>>> a = [[1, 0], [0, 1]]
>>> b = [1, 2]
>>> mt.matmul(a, b).execute()
array([1, 2])
>>> mt.matmul(b, a).execute()
array([1, 2])
```
Broadcasting is conventional for stacks of arrays:
```pycon
>>> a = mt.arange(2*2*4).reshape((2,2,4))
>>> b = mt.arange(2*2*4).reshape((2,4,2))
>>> mt.matmul(a,b).shape
(2, 2, 2)
>>> mt.matmul(a,b)[0,1,1].execute()
98
>>> mt.sum(a[0,1,:] * b[0,:,1]).execute()
98
```
Vector, vector returns the scalar inner product, but neither argument
is complex-conjugated:
```pycon
>>> mt.matmul([2j, 3j], [2j, 3j]).execute()
(-13+0j)
```
Scalar multiplication raises an error.
```pycon
>>> mt.matmul([1,2], 3)
Traceback (most recent call last):
...
ValueError: Scalar operands are not allowed, use '*' instead
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.maximum.md
# maxframe.tensor.maximum
### maxframe.tensor.maximum(x1, x2, out=None, where=None, \*\*kwargs)
Element-wise maximum of tensor elements.
Compares two tensors and returns a new tensor containing the element-wise
maxima. If one of the elements being compared is a NaN, then that
element is returned. If both elements are NaNs then the first is
returned. The latter distinction is important for complex NaNs, which
are defined as at least one of the real or imaginary parts being a NaN.
The net effect is that NaNs are propagated.
* **Parameters:**
* **x1** (*array_like*) – The tensors holding the elements to be compared. They must have
the same shape, or shapes that can be broadcast to a single shape.
* **x2** (*array_like*) – The tensors holding the elements to be compared. They must have
the same shape, or shapes that can be broadcast to a single shape.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The maximum of x1 and x2, element-wise. Returns scalar if
both x1 and x2 are scalars.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`minimum`](maxframe.tensor.minimum.md#maxframe.tensor.minimum)
: Element-wise minimum of two tensors, propagates NaNs.
[`fmax`](maxframe.tensor.fmax.md#maxframe.tensor.fmax)
: Element-wise maximum of two tensors, ignores NaNs.
`amax`
: The maximum value of a tensor along a given axis, propagates NaNs.
[`nanmax`](maxframe.tensor.nanmax.md#maxframe.tensor.nanmax)
: The maximum value of a tensor along a given axis, ignores NaNs.
[`fmin`](maxframe.tensor.fmin.md#maxframe.tensor.fmin), `amin`, [`nanmin`](maxframe.tensor.nanmin.md#maxframe.tensor.nanmin)
### Notes
The maximum is equivalent to `mt.where(x1 >= x2, x1, x2)` when
neither x1 nor x2 are nans, but it is faster and does proper
broadcasting.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.maximum([2, 3, 4], [1, 5, 2]).execute()
array([2, 5, 4])
```
```pycon
>>> mt.maximum(mt.eye(2), [0.5, 2]).execute() # broadcasting
array([[ 1. , 2. ],
[ 0.5, 2. ]])
```
```pycon
>>> mt.maximum([mt.nan, 0, mt.nan], [0, mt.nan, mt.nan]).execute()
array([ NaN, NaN, NaN])
>>> mt.maximum(mt.Inf, 1).execute()
inf
```
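The NaN-propagating rule described above differs from Python's built-in `max`, whose result depends on argument order when a NaN is present. The sketch below contrasts the two for real scalars; `maximum_scalar` is a hypothetical helper for illustration and does not cover complex values or broadcasting.

```python
import math


def maximum_scalar(x1, x2):
    """NaN-propagating maximum of two real scalars (illustrative only)."""
    if math.isnan(x1):
        return x1          # a NaN in the first operand is returned as-is
    if math.isnan(x2):
        return x2
    return x1 if x1 >= x2 else x2


nan = float("nan")
print(maximum_scalar(0.0, nan), maximum_scalar(nan, 0.0))  # nan nan
print(max(0.0, nan), max(nan, 0.0))                        # 0.0 nan
```

Because every comparison with NaN is false, the built-in `max` simply keeps its first argument, which is why an explicit NaN check is needed to propagate it.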
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.mean.md
# maxframe.tensor.mean
### maxframe.tensor.mean(a, axis=None, dtype=None, out=None, keepdims=None)
Compute the arithmetic mean along the specified axis.
Returns the average of the array elements. The average is taken over
the flattened tensor by default, otherwise over the specified axis.
float64 intermediate and return values are used for integer inputs.
* **Parameters:**
* **a** (*array_like*) – Tensor containing numbers whose mean is desired. If a is not a
tensor, a conversion is attempted.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) –
Axis or axes along which the means are computed. The default is to
compute the mean of the flattened array.
If this is a tuple of ints, a mean is performed over multiple axes,
instead of a single axis or all the axes as before.
* **dtype** (*data-type* *,* *optional*) – Type to use in computing the mean. For integer inputs, the default
is float64; for floating point inputs, it is the same as the
input dtype.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result. The default
is `None`; if provided, it must have the same shape as the
expected output, but the type will be cast if necessary.
See doc.ufuncs for details.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input tensor.
If the default value is passed, then keepdims will not be
passed through to the mean method of sub-classes of
Tensor, however any non-default value will be. If the
sub-class's method does not implement keepdims, an
exception will be raised.
* **Returns:**
**m** – If out=None, returns a new tensor containing the mean values,
otherwise a reference to the output array is returned.
* **Return type:**
Tensor, see dtype parameter above
#### SEE ALSO
[`average`](maxframe.tensor.average.md#maxframe.tensor.average)
: Weighted average
[`std`](maxframe.tensor.std.md#maxframe.tensor.std), [`var`](maxframe.tensor.var.md#maxframe.tensor.var), [`nanmean`](maxframe.tensor.nanmean.md#maxframe.tensor.nanmean), [`nanstd`](maxframe.tensor.nanstd.md#maxframe.tensor.nanstd), [`nanvar`](maxframe.tensor.nanvar.md#maxframe.tensor.nanvar)
### Notes
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.
Note that for floating-point input, the mean is computed using the
same precision the input has. Depending on the input data, this can
cause the results to be inaccurate, especially for float32 (see
example below). Specifying a higher-precision accumulator using the
dtype keyword can alleviate this issue.
By default, float16 results are computed using float32 intermediates
for extra precision.
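The precision issue can be reproduced without any tensor library. The sketch below is an illustration only, not MaxFrame's implementation: it assumes a naive left-to-right accumulation, and the helper names `to_f32`, `mean_f32`, and `mean_f64` are hypothetical. It rounds every intermediate sum to float32 with the standard `struct` module and compares the result against accumulating the same float32 inputs in float64.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mean_f32(values):
    """Mean with every intermediate sum rounded to float32 (naive, left to right)."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return to_f32(total / len(values))


def mean_f64(values):
    """Mean of the same float32 inputs, accumulated in float64."""
    return sum(to_f32(v) for v in values) / len(values)


# Assumption: a smaller input than the 512*512 rows of the doc example, to keep this quick.
n = 1 << 14
data = [1.0] * n + [0.1] * n

err32 = abs(mean_f32(data) - 0.55)
err64 = abs(mean_f64(data) - 0.55)
print(err32 > err64)  # True: the float32 accumulator is further from 0.55
```

The float64 accumulator still carries the rounding of `0.1` to float32 in its inputs, which is why its result is close to, but not exactly, 0.55.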
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, 2], [3, 4]])
>>> mt.mean(a).execute()
2.5
>>> mt.mean(a, axis=0).execute()
array([ 2., 3.])
>>> mt.mean(a, axis=1).execute()
array([ 1.5, 3.5])
```
In single precision, mean can be inaccurate:
```pycon
>>> a = mt.zeros((2, 512*512), dtype=mt.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> mt.mean(a).execute()
0.54999924
```
Computing the mean in float64 is more accurate:
```pycon
>>> mt.mean(a, dtype=mt.float64).execute()
0.55000000074505806
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.median.md
# maxframe.tensor.median
### maxframe.tensor.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)
Compute the median along the specified axis.
Returns the median of the tensor elements.
* **Parameters:**
* **a** (*array_like*) – Input tensor or object that can be converted to a tensor.
* **axis** ( *{int* *,* *sequence* *of* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *None}* *,* *optional*) – Axis or axes along which the medians are computed. The default
is to compute the median along a flattened version of the tensor.
A sequence of axes is supported since version 1.9.0.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must
have the same shape and buffer length as the expected output,
but the type (of the output) will be cast if necessary.
* **overwrite_input** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Accepted only for compatibility with NumPy; it has no effect.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original tensor a.
* **Returns:**
**median** – A new tensor holding the result. If the input contains integers
or floats smaller than `float64`, then the output data-type is
`np.float64`. Otherwise, the data-type of the output is the
same as that of the input. If out is specified, that tensor is
returned instead.
* **Return type:**
Tensor
#### SEE ALSO
[`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean), [`percentile`](maxframe.tensor.percentile.md#maxframe.tensor.percentile)
### Notes
Given a vector `V` of length `N`, the median of `V` is the
middle value of a sorted copy of `V`, `V_sorted`, i.e.,
`V_sorted[(N-1)/2]`, when `N` is odd, and the average of the
two middle values of `V_sorted` when `N` is even.
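The definition above translates directly into a few lines of plain Python. This is a minimal sketch for a 1-D sequence; the helper name `median_1d` is hypothetical and the code is not how MaxFrame computes medians on distributed tensors.

```python
def median_1d(values):
    """Median of a non-empty 1-D sequence, following the definition in the Notes."""
    v_sorted = sorted(values)
    n = len(v_sorted)
    mid = (n - 1) // 2
    if n % 2 == 1:
        # Odd length: the single middle element V_sorted[(N-1)/2].
        return float(v_sorted[mid])
    # Even length: the average of the two middle elements.
    return (v_sorted[mid] + v_sorted[mid + 1]) / 2


print(median_1d([10, 7, 4, 3, 2, 1]))  # 3.5
print(median_1d([10, 7, 4]))           # 7.0
```

Applying the helper to each column or row of a 2-D input corresponds to `axis=0` or `axis=1`, respectively.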
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([[10, 7, 4], [3, 2, 1]])
>>> a.execute()
array([[10, 7, 4],
[ 3, 2, 1]])
>>> mt.median(a).execute()
3.5
>>> mt.median(a, axis=0).execute()
array([6.5, 4.5, 2.5])
>>> mt.median(a, axis=1).execute()
array([7., 2.])
>>> m = mt.median(a, axis=0)
>>> out = mt.zeros_like(m)
>>> mt.median(a, axis=0, out=m).execute()
array([6.5, 4.5, 2.5])
>>> m.execute()
array([6.5, 4.5, 2.5])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.meshgrid.md
# maxframe.tensor.meshgrid
### maxframe.tensor.meshgrid(\*xi, \*\*kwargs)
Return coordinate matrices from coordinate vectors.
Make N-D coordinate arrays for vectorized evaluations of
N-D scalar/vector fields over N-D grids, given
one-dimensional coordinate tensors x1, x2,…, xn.
* **Parameters:**
* **x1** (*array_like*) – 1-D arrays representing the coordinates of a grid.
* **x2** (*array_like*) – 1-D arrays representing the coordinates of a grid.
* **...** (*array_like*) – 1-D arrays representing the coordinates of a grid.
* **xn** (*array_like*) – 1-D arrays representing the coordinates of a grid.
* **indexing** ( *{'xy'* *,* *'ij'}* *,* *optional*) – Cartesian (‘xy’, default) or matrix (‘ij’) indexing of output.
See Notes for more details.
* **sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True a sparse grid is returned in order to conserve memory.
Default is False.
* **Returns:**
**X1, X2,…, XN** – For vectors x1, x2,…, ‘xn’ with lengths `Ni=len(xi)` ,
return `(N1, N2, N3,...Nn)` shaped tensors if indexing=’ij’
or `(N2, N1, N3,...Nn)` shaped tensors if indexing=’xy’
with the elements of xi repeated to fill the matrix along
the first dimension for x1, the second for x2 and so on.
* **Return type:**
Tensor
### Notes
This function supports both indexing conventions through the indexing
keyword argument. Giving the string ‘ij’ returns a meshgrid with
matrix indexing, while ‘xy’ returns a meshgrid with Cartesian indexing.
In the 2-D case with inputs of length M and N, the outputs are of shape
(N, M) for ‘xy’ indexing and (M, N) for ‘ij’ indexing. In the 3-D case
with inputs of length M, N and P, outputs are of shape (N, M, P) for
‘xy’ indexing and (M, N, P) for ‘ij’ indexing. The difference is
illustrated by the following code snippet:
```default
xv, yv = mt.meshgrid(x, y, sparse=False, indexing='ij')
for i in range(nx):
    for j in range(ny):
        # treat xv[i,j], yv[i,j]
xv, yv = mt.meshgrid(x, y, sparse=False, indexing='xy')
for i in range(nx):
    for j in range(ny):
        # treat xv[j,i], yv[j,i]
```
In the 1-D and 0-D case, the indexing and sparse keywords have no effect.
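The two indexing conventions can be made concrete with nested lists. The following is a rough sketch for the dense 2-D case only; `meshgrid_2d` is a hypothetical helper written for illustration and does not use or mirror MaxFrame internals.

```python
def meshgrid_2d(x, y, indexing="xy"):
    """Dense 2-D coordinate grids as nested lists (illustrative only)."""
    if indexing == "xy":
        # Shape (len(y), len(x)): rows follow y, columns follow x.
        xv = [[xi for xi in x] for _ in y]
        yv = [[yj for _ in x] for yj in y]
    elif indexing == "ij":
        # Shape (len(x), len(y)): rows follow x, columns follow y.
        xv = [[xi for _ in y] for xi in x]
        yv = [[yj for yj in y] for _ in x]
    else:
        raise ValueError("indexing must be 'xy' or 'ij'")
    return xv, yv


x, y = [0.0, 0.5, 1.0], [0.0, 1.0]
xv, yv = meshgrid_2d(x, y, indexing="xy")
print(len(xv), len(xv[0]))        # 2 3
xv_ij, yv_ij = meshgrid_2d(x, y, indexing="ij")
print(len(xv_ij), len(xv_ij[0]))  # 3 2
```

With inputs of length M = 3 and N = 2, the `'xy'` grids have shape (N, M) and the `'ij'` grids have shape (M, N); one is the transpose of the other.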
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> nx, ny = (3, 2)
>>> x = mt.linspace(0, 1, nx)
>>> y = mt.linspace(0, 1, ny)
>>> xv, yv = mt.meshgrid(x, y)
>>> xv.execute()
array([[ 0. , 0.5, 1. ],
       [ 0. , 0.5, 1. ]])
>>> yv.execute()
array([[ 0., 0., 0.],
       [ 1., 1., 1.]])
>>> xv, yv = mt.meshgrid(x, y, sparse=True) # make sparse output arrays
>>> xv.execute()
array([[ 0. , 0.5, 1. ]])
>>> yv.execute()
array([[ 0.],
       [ 1.]])
```
meshgrid is very useful to evaluate functions on a grid.
```pycon
>>> import matplotlib.pyplot as plt
>>> x = mt.arange(-5, 5, 0.1)
>>> y = mt.arange(-5, 5, 0.1)
>>> xx, yy = mt.meshgrid(x, y, sparse=True)
>>> z = mt.sin(xx**2 + yy**2) / (xx**2 + yy**2)
>>> h = plt.contourf(x,y,z)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.mgrid.md
# maxframe.tensor.mgrid
### maxframe.tensor.mgrid *= <maxframe.tensor.lib.index_tricks.nd_grid object>*
Construct a multi-dimensional “meshgrid”.
`grid = nd_grid()` creates an instance which will return a mesh-grid
when indexed. The dimension and number of the output arrays are equal
to the number of indexing dimensions. If the step length is not a
complex number, then the stop is not inclusive.
However, if the step length is a **complex number** (e.g. 5j), then the
integer part of its magnitude is interpreted as specifying the
number of points to create between the start and stop values, where
the stop value **is inclusive**.
If instantiated with an argument of `sparse=True`, the mesh-grid is
open (or not fleshed out) so that only one-dimension of each returned
argument is greater than 1.
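The difference between a real and a complex step is easiest to see for a single dimension. The sketch below is an assumption-laden illustration in plain Python: `grid_points` is a hypothetical helper, it handles only positive real steps, and it does not reproduce the `nd_grid` implementation.

```python
def grid_points(start, stop, step=1):
    """Expand one slice-like (start, stop, step) triple into a list of points."""
    if isinstance(step, complex):
        # Complex step: the integer part of its magnitude is the number of
        # points, and the stop value is included.
        count = int(abs(step))
        if count == 1:
            return [float(start)]
        spacing = (stop - start) / (count - 1)
        return [start + i * spacing for i in range(count)]
    if step <= 0:
        raise ValueError("this sketch only handles positive real steps")
    # Real step: the stop value is excluded.
    points = []
    value = start
    while value < stop:
        points.append(value)
        value += step
    return points


print(grid_points(-1, 1, 5j))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(grid_points(0, 5))       # [0, 1, 2, 3, 4]
```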
* **Parameters:**
**sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Whether the grid is sparse or not. Default is False.
### Notes
Two instances of nd_grid are made available in the maxframe.tensor namespace,
mgrid and ogrid:
```default
mgrid = nd_grid(sparse=False)
ogrid = nd_grid(sparse=True)
```
Users should use these pre-defined instances instead of using nd_grid
directly.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mgrid = mt.lib.index_tricks.nd_grid()
>>> mgrid[0:5,0:5]
array([[[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2],
        [3, 3, 3, 3, 3],
        [4, 4, 4, 4, 4]],
       [[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]]])
>>> mgrid[-1:1:5j]
array([-1. , -0.5, 0. , 0.5, 1. ])
```
```pycon
>>> ogrid = mt.lib.index_tricks.nd_grid(sparse=True)
>>> ogrid[0:5,0:5]
[array([[0],
        [1],
        [2],
        [3],
        [4]]), array([[0, 1, 2, 3, 4]])]
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.minimum.md
# maxframe.tensor.minimum
### maxframe.tensor.minimum(x1, x2, out=None, where=None, \*\*kwargs)
Element-wise minimum of tensor elements.
Compares two tensors and returns a new tensor containing the element-wise
minima. If one of the elements being compared is a NaN, then that
element is returned. If both elements are NaNs then the first is
returned. The latter distinction is important for complex NaNs, which
are defined as at least one of the real or imaginary parts being a NaN.
The net effect is that NaNs are propagated.
* **Parameters:**
* **x1** (*array_like*) – The tensors holding the elements to be compared. They must have
the same shape, or shapes that can be broadcast to a single shape.
* **x2** (*array_like*) – The tensors holding the elements to be compared. They must have
the same shape, or shapes that can be broadcast to a single shape.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The minimum of x1 and x2, element-wise. Returns scalar if
both x1 and x2 are scalars.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`maximum`](maxframe.tensor.maximum.md#maxframe.tensor.maximum)
: Element-wise maximum of two tensors, propagates NaNs.
[`fmin`](maxframe.tensor.fmin.md#maxframe.tensor.fmin)
: Element-wise minimum of two tensors, ignores NaNs.
`amin`
: The minimum value of a tensor along a given axis, propagates NaNs.
[`nanmin`](maxframe.tensor.nanmin.md#maxframe.tensor.nanmin)
: The minimum value of a tensor along a given axis, ignores NaNs.
[`fmax`](maxframe.tensor.fmax.md#maxframe.tensor.fmax), `amax`, [`nanmax`](maxframe.tensor.nanmax.md#maxframe.tensor.nanmax)
### Notes
The minimum is equivalent to `mt.where(x1 <= x2, x1, x2)` when
neither x1 nor x2 are NaNs, but it is faster and does proper
broadcasting.
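For a single pair of real scalars, the NaN-propagation rule and the `where(x1 <= x2, x1, x2)` equivalence can be sketched in plain Python. The helper `scalar_minimum` is hypothetical and only illustrates the documented behaviour; it is not MaxFrame's implementation and does no broadcasting.

```python
import math


def scalar_minimum(x1: float, x2: float) -> float:
    """Illustrative scalar version of an element-wise, NaN-propagating minimum."""
    # If either operand is NaN, that NaN is the result (the first one wins
    # when both are NaN).
    if math.isnan(x1):
        return x1
    if math.isnan(x2):
        return x2
    # Without NaNs this is the same as where(x1 <= x2, x1, x2).
    return x1 if x1 <= x2 else x2


print([scalar_minimum(a, b) for a, b in zip([2, 3, 4], [1, 5, 2])])  # [1, 3, 2]
```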
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.minimum([2, 3, 4], [1, 5, 2]).execute()
array([1, 3, 2])
```
```pycon
>>> mt.minimum(mt.eye(2), [0.5, 2]).execute() # broadcasting
array([[ 0.5, 0. ],
[ 0. , 1. ]])
```
```pycon
>>> mt.minimum([mt.nan, 0, mt.nan],[0, mt.nan, mt.nan]).execute()
array([ NaN, NaN, NaN])
>>> mt.minimum(-mt.Inf, 1).execute()
-inf
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.mod.md
# maxframe.tensor.mod
### maxframe.tensor.mod(x1, x2, out=None, where=None, \*\*kwargs)
Return element-wise remainder of division.
Computes the remainder complementary to the floor_divide function. It is
equivalent to the Python modulus operator `x1 % x2` and has the same sign
as the divisor x2. The MATLAB function equivalent to `np.remainder`
is `mod`.
#### WARNING
This should not be confused with:
* Python 3.7’s `math.remainder` and C’s `remainder`, which
compute the IEEE remainder, the complement to
`round(x1 / x2)`.
* The MATLAB `rem` function and the C `%` operator, which are the
complement to `int(x1 / x2)`.
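The three conventions named in this warning all exist in the Python standard library, so their differences can be checked directly. This is a small illustration with scalars; the helper `three_remainders` is hypothetical and the snippet does not call MaxFrame.

```python
import math


def three_remainders(x1, x2):
    """Return (floor-based, truncation-based, IEEE) remainders of x1 / x2."""
    floor_mod = x1 % x2                # complement to floor division; sign of x2
    trunc_mod = math.fmod(x1, x2)      # complement to int(x1 / x2); sign of x1
    ieee_rem = math.remainder(x1, x2)  # complement to round(x1 / x2)
    return floor_mod, trunc_mod, ieee_rem


print(three_remainders(-7, 3))  # (2, -1.0, -1.0)
print(three_remainders(5, 3))   # (2, 2.0, -1.0)
print(three_remainders(7, -3))  # (-2, 1.0, 1.0)
```

Only the first value of each tuple follows the sign of the divisor, which is the behaviour this function documents.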
* **Parameters:**
* **x1** (*array_like*) – Dividend array.
* **x2** (*array_like*) – Divisor array.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The element-wise remainder of the quotient `floor_divide(x1, x2)`.
Returns a scalar if both x1 and x2 are scalars.
* **Return type:**
Tensor
#### SEE ALSO
[`floor_divide`](maxframe.tensor.floor_divide.md#maxframe.tensor.floor_divide)
: Equivalent of Python `//` operator.
[`divmod`](https://docs.python.org/3/library/functions.html#divmod)
: Simultaneous floor division and remainder.
[`fmod`](maxframe.tensor.fmod.md#maxframe.tensor.fmod)
: Equivalent of the MATLAB `rem` function.
[`divide`](maxframe.tensor.divide.md#maxframe.tensor.divide), [`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor)
### Notes
Returns 0 when x2 is 0 and both x1 and x2 are (tensors of)
integers.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.remainder([4, 7], [2, 3]).execute()
array([0, 1])
>>> mt.remainder(mt.arange(7), 5).execute()
array([0, 1, 2, 3, 4, 0, 1])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.modf.md
# maxframe.tensor.modf
### maxframe.tensor.modf(x, out1=None, out2=None, out=None, where=None, \*\*kwargs)
Return the fractional and integral parts of a tensor, element-wise.
The fractional and integral parts are negative if the given number is
negative.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
* **y1** (*Tensor*) – Fractional part of x.
* **y2** (*Tensor*) – Integral part of x.
### Notes
For integer input the return values are floats.
#### SEE ALSO
[`divmod`](https://docs.python.org/3/library/functions.html#divmod)
: `divmod(x, 1)` is equivalent to `modf` with the return values switched, except it always has a positive remainder.
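The relationship between `modf` and `divmod(x, 1)` can be checked on a scalar with the standard library's `math.modf`, used here as a stand-in for the element-wise behaviour described above (an illustration, not a MaxFrame call).

```python
import math

# modf: both parts carry the sign of the input, fractional part first.
frac, whole = math.modf(-3.5)
print(frac, whole)    # -0.5 -3.0

# divmod(x, 1): integral part first, and the remainder is never negative.
quotient, rem = divmod(-3.5, 1)
print(quotient, rem)  # -4.0 0.5
```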
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.modf([0, 3.5]).execute()
(array([ 0. , 0.5]), array([ 0., 3.]))
>>> mt.modf(-0.5).execute()
(-0.5, -0)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.moveaxis.md
# maxframe.tensor.moveaxis
### maxframe.tensor.moveaxis(a, source, destination)
Move axes of a tensor to new positions.
Other axes remain in their original order.
* **Parameters:**
* **a** (*Tensor*) – The tensor whose axes should be reordered.
* **source** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* [*int*](https://docs.python.org/3/library/functions.html#int)) – Original positions of the axes to move. These must be unique.
* **destination** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* [*int*](https://docs.python.org/3/library/functions.html#int)) – Destination positions for each of the original axes. These must also be
unique.
* **Returns:**
**result** – Array with moved axes. This tensor is a view of the input tensor.
* **Return type:**
Tensor
#### SEE ALSO
[`transpose`](maxframe.tensor.transpose.md#maxframe.tensor.transpose)
: Permute the dimensions of an array.
[`swapaxes`](maxframe.tensor.swapaxes.md#maxframe.tensor.swapaxes)
: Interchange two axes of an array.
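Which shape results from a given `source` and `destination` can be worked out by computing the axis permutation by hand. The sketch below is one way to do that in plain Python; `moveaxis_order` and `moved_shape` are hypothetical helpers, and the algorithm is an assumption about how such a permutation can be built rather than a description of MaxFrame's code.

```python
def moveaxis_order(ndim, source, destination):
    """Return the axis permutation that moves `source` axes to `destination`."""
    src = [source] if isinstance(source, int) else list(source)
    dst = [destination] if isinstance(destination, int) else list(destination)
    if len(src) != len(dst):
        raise ValueError("source and destination must have the same length")
    # Normalize negative axis numbers (no range checking in this sketch).
    src = [s % ndim for s in src]
    dst = [d % ndim for d in dst]
    if len(set(src)) != len(src) or len(set(dst)) != len(dst):
        raise ValueError("axes must be unique")
    # Axes that are not moved keep their original relative order ...
    order = [axis for axis in range(ndim) if axis not in src]
    # ... and the moved axes are inserted at their destinations.
    for d, s in sorted(zip(dst, src)):
        order.insert(d, s)
    return order


def moved_shape(shape, source, destination):
    """Shape of a tensor of the given shape after the axes are moved."""
    order = moveaxis_order(len(shape), source, destination)
    return tuple(shape[axis] for axis in order)


print(moved_shape((3, 4, 5), 0, -1))  # (4, 5, 3)
print(moved_shape((3, 4, 5), -1, 0))  # (5, 3, 4)
```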
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.zeros((3, 4, 5))
>>> mt.moveaxis(x, 0, -1).shape
(4, 5, 3)
>>> mt.moveaxis(x, -1, 0).shape
(5, 3, 4)
```
These all achieve the same result:
```pycon
>>> mt.transpose(x).shape
(5, 4, 3)
>>> mt.swapaxes(x, 0, -1).shape
(5, 4, 3)
>>> mt.moveaxis(x, [0, 1], [-1, -2]).shape
(5, 4, 3)
>>> mt.moveaxis(x, [0, 1, 2], [-1, -2, -3]).shape
(5, 4, 3)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.multiply.md
# maxframe.tensor.multiply
### maxframe.tensor.multiply(x1, x2, out=None, where=None, \*\*kwargs)
Multiply arguments element-wise.
* **Parameters:**
* **x1** (*array_like*) – Input arrays to be multiplied.
* **x2** (*array_like*) – Input arrays to be multiplied.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The product of x1 and x2, element-wise. Returns a scalar if
both x1 and x2 are scalars.
* **Return type:**
Tensor
### Notes
Equivalent to x1 \* x2 in terms of array broadcasting.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.multiply(2.0, 4.0).execute()
8.0
```
```pycon
>>> x1 = mt.arange(9.0).reshape((3, 3))
>>> x2 = mt.arange(3.0)
>>> mt.multiply(x1, x2).execute()
array([[ 0., 1., 4.],
       [ 0., 4., 10.],
       [ 0., 7., 16.]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nan_to_num.md
# maxframe.tensor.nan_to_num
### maxframe.tensor.nan_to_num(x, copy=True, \*\*kwargs)
Replace nan with zero and inf with large finite numbers.
If x is inexact, NaN is replaced by zero, and infinity and -infinity are
replaced by the largest and most negative finite floating-point values
representable by `x.dtype`, respectively.
For complex dtypes, the above is applied to each of the real and
imaginary components of x separately.
If x is not inexact, then no replacements are made.
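For a single float64 value, the replacement rule can be sketched with the standard library. `nan_to_num_scalar` is a hypothetical helper that assumes Python floats (float64) only; it does not cover complex or integer dtypes and is not MaxFrame's implementation.

```python
import math
import sys


def nan_to_num_scalar(x: float) -> float:
    """Replace NaN with 0.0 and +/-inf with the largest finite float64 magnitude."""
    if math.isnan(x):
        return 0.0
    if math.isinf(x):
        # Keep the sign of the infinity.
        return math.copysign(sys.float_info.max, x)
    return float(x)


values = [math.inf, -math.inf, math.nan, -128, 128]
print([nan_to_num_scalar(v) for v in values])
```

Here `sys.float_info.max` is about 1.79769313e+308, the largest finite float64 value.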
* **Parameters:**
* **x** (*array_like*) – Input data.
* **copy** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Whether to create a copy of x (True) or to replace values
in-place (False). The in-place operation only occurs if
casting to an array does not require a copy.
Default is True.
* **Returns:**
**out** – x, with the non-finite values replaced. If copy is False, this may
be x itself.
* **Return type:**
Tensor
#### SEE ALSO
[`isinf`](maxframe.tensor.isinf.md#maxframe.tensor.isinf)
: Shows which elements are positive or negative infinity.
`isneginf`
: Shows which elements are negative infinity.
`isposinf`
: Shows which elements are positive infinity.
[`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan)
: Shows which elements are Not a Number (NaN).
[`isfinite`](maxframe.tensor.isfinite.md#maxframe.tensor.isfinite)
: Shows which elements are finite (not NaN, not infinity)
### Notes
MaxFrame uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([mt.inf, -mt.inf, mt.nan, -128, 128])
>>> mt.nan_to_num(x).execute()
array([ 1.79769313e+308, -1.79769313e+308, 0.00000000e+000,
-1.28000000e+002, 1.28000000e+002])
>>> y = mt.array([complex(mt.inf, mt.nan), mt.nan, complex(mt.nan, mt.inf)])
>>> mt.nan_to_num(y).execute()
array([ 1.79769313e+308 +0.00000000e+000j,
0.00000000e+000 +0.00000000e+000j,
0.00000000e+000 +1.79769313e+308j])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nanargmax.md
# maxframe.tensor.nanargmax
### maxframe.tensor.nanargmax(a, axis=None, out=None)
Return the indices of the maximum values in the specified axis ignoring
NaNs. For all-NaN slices `ValueError` is raised. Warning: the
results cannot be trusted if a slice contains only NaNs and -Infs.
* **Parameters:**
* **a** (*array_like*) – Input data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which to operate. By default flattened input is used.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result. The default
is `None`; if provided, it must have the same shape as the
expected output, but the type will be cast if necessary.
See doc.ufuncs for details.
* **Returns:**
**index_array** – A tensor of indices or a single index value.
* **Return type:**
Tensor
#### SEE ALSO
[`argmax`](maxframe.tensor.argmax.md#maxframe.tensor.argmax), [`nanargmin`](maxframe.tensor.nanargmin.md#maxframe.tensor.nanargmin)
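The NaN-skipping index search, including the `ValueError` for an all-NaN slice, can be sketched for a flat sequence as follows. The helper `nanargmax_1d` is hypothetical and is shown only to make the documented behaviour concrete.

```python
import math


def nanargmax_1d(values):
    """Index of the largest non-NaN value; ValueError if every value is NaN."""
    best_index = None
    for index, value in enumerate(values):
        if math.isnan(value):
            continue  # NaNs are ignored rather than propagated
        if best_index is None or value > values[best_index]:
            best_index = index
    if best_index is None:
        raise ValueError("All-NaN slice encountered")
    return best_index


flat = [math.nan, 4, 2, 3]  # flattened [[nan, 4], [2, 3]]
print(nanargmax_1d(flat))   # 1
```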
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[mt.nan, 4], [2, 3]])
>>> mt.argmax(a).execute()
0
>>> mt.nanargmax(a).execute()
1
>>> mt.nanargmax(a, axis=0).execute()
array([1, 0])
>>> mt.nanargmax(a, axis=1).execute()
array([1, 1])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nanargmin.md
# maxframe.tensor.nanargmin
### maxframe.tensor.nanargmin(a, axis=None, out=None)
Return the indices of the minimum values in the specified axis ignoring
NaNs. For all-NaN slices `ValueError` is raised. Warning: the results
cannot be trusted if a slice contains only NaNs and Infs.
* **Parameters:**
* **a** (*array_like*) – Input data.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which to operate. By default flattened input is used.
* **Returns:**
**index_array** – A tensor of indices or a single index value.
* **Return type:**
Tensor
#### SEE ALSO
[`argmin`](maxframe.tensor.argmin.md#maxframe.tensor.argmin), [`nanargmax`](maxframe.tensor.nanargmax.md#maxframe.tensor.nanargmax)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[mt.nan, 4], [2, 3]])
>>> mt.argmin(a).execute()
0
>>> mt.nanargmin(a).execute()
2
>>> mt.nanargmin(a, axis=0).execute()
array([1, 1])
>>> mt.nanargmin(a, axis=1).execute()
array([1, 0])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nancumprod.md
# maxframe.tensor.nancumprod
### maxframe.tensor.nancumprod(a, axis=None, dtype=None, out=None)
Return the cumulative product of tensor elements over a given axis treating Not a
Numbers (NaNs) as one. The cumulative product does not change when NaNs are
encountered and leading NaNs are replaced by ones.
Ones are returned for slices that are all-NaN or empty.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the cumulative product is computed. By default
the input is flattened.
* **dtype** (*dtype* *,* *optional*) – Type of the returned tensor, as well as of the accumulator in which
the elements are multiplied. If *dtype* is not specified, it
defaults to the dtype of a, unless a has an integer dtype with
a precision less than that of the default platform integer. In
that case, the default platform integer is used instead.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must
have the same shape and buffer length as the expected output
but the type of the resulting values will be cast if necessary.
* **Returns:**
**nancumprod** – A new array holding the result is returned unless out is
specified, in which case it is returned.
* **Return type:**
Tensor
#### SEE ALSO
`mt.cumprod`
: Cumulative product across array propagating NaNs.
[`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan)
: Show which elements are NaN.
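Treating NaNs as one amounts to substituting 1 before a running product is taken. A minimal 1-D sketch with `itertools.accumulate` is shown below; `nancumprod_1d` is a hypothetical helper, not the MaxFrame implementation.

```python
import math
from itertools import accumulate
from operator import mul


def nancumprod_1d(values):
    """Cumulative product of a 1-D sequence, treating NaN as 1."""
    cleaned = [1.0 if math.isnan(v) else float(v) for v in values]
    return list(accumulate(cleaned, mul))


print(nancumprod_1d([1, 2, 3, math.nan]))  # [1.0, 2.0, 6.0, 6.0]
print(nancumprod_1d([math.nan, 2, 3]))     # [1.0, 2.0, 6.0]
```

The second call shows a leading NaN being replaced by one.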
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.nancumprod(1).execute()
array([1])
>>> mt.nancumprod([1]).execute()
array([1])
>>> mt.nancumprod([1, mt.nan]).execute()
array([ 1., 1.])
>>> a = mt.array([[1, 2], [3, mt.nan]])
>>> mt.nancumprod(a).execute()
array([ 1., 2., 6., 6.])
>>> mt.nancumprod(a, axis=0).execute()
array([[ 1., 2.],
[ 3., 2.]])
>>> mt.nancumprod(a, axis=1).execute()
array([[ 1., 2.],
[ 3., 3.]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nancumsum.md
# maxframe.tensor.nancumsum
### maxframe.tensor.nancumsum(a, axis=None, dtype=None, out=None)
Return the cumulative sum of tensor elements over a given axis treating Not a
Numbers (NaNs) as zero. The cumulative sum does not change when NaNs are
encountered and leading NaNs are replaced by zeros.
Zeros are returned for slices that are all-NaN or empty.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the cumulative sum is computed. The default
(None) is to compute the cumsum over the flattened tensor.
* **dtype** (*dtype* *,* *optional*) – Type of the returned tensor and of the accumulator in which the
elements are summed. If dtype is not specified, it defaults
to the dtype of a, unless a has an integer dtype with a
precision less than that of the default platform integer. In
that case, the default platform integer is used.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must
have the same shape and buffer length as the expected output
but the type will be cast if necessary. See doc.ufuncs
(Section “Output arguments”) for more details.
* **Returns:**
**nancumsum** – A new tensor holding the result is returned unless out is
specified, in which case it is returned. The result has the same
size as a, and the same shape as a if axis is not None
or a is a 1-d tensor.
* **Return type:**
Tensor.
#### SEE ALSO
[`numpy.cumsum`](https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html#numpy.cumsum)
: Cumulative sum across tensor propagating NaNs.
[`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan)
: Show which elements are NaN.
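How the `axis` argument changes a NaN-aware cumulative sum can be illustrated on a small nested list. The sketch assumes a 2-D input, and `nancumsum_2d` is a hypothetical helper written for this illustration only.

```python
import math
from itertools import accumulate


def nancumsum_2d(rows, axis):
    """Cumulative sum of a 2-D nested list along `axis`, treating NaN as 0."""
    cleaned = [[0.0 if math.isnan(v) else float(v) for v in row] for row in rows]
    if axis == 1:
        # Accumulate within each row.
        return [list(accumulate(row)) for row in cleaned]
    if axis == 0:
        # Accumulate down each column, then transpose back to rows.
        columns = [list(accumulate(col)) for col in zip(*cleaned)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("axis must be 0 or 1 for a 2-D input")


a = [[1, 2], [3, math.nan]]
print(nancumsum_2d(a, axis=0))  # [[1.0, 2.0], [4.0, 2.0]]
print(nancumsum_2d(a, axis=1))  # [[1.0, 3.0], [3.0, 3.0]]
```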
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.nancumsum(1).execute()
array([1])
>>> mt.nancumsum([1]).execute()
array([1])
>>> mt.nancumsum([1, mt.nan]).execute()
array([ 1., 1.])
>>> a = mt.array([[1, 2], [3, mt.nan]])
>>> mt.nancumsum(a).execute()
array([ 1., 3., 6., 6.])
>>> mt.nancumsum(a, axis=0).execute()
array([[ 1., 2.],
[ 4., 2.]])
>>> mt.nancumsum(a, axis=1).execute()
array([[ 1., 3.],
[ 3., 3.]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nanmax.md
# maxframe.tensor.nanmax
### maxframe.tensor.nanmax(a, axis=None, out=None, keepdims=None)
Return the maximum of an array or maximum along an axis, ignoring any
NaNs. When all-NaN slices are encountered a `RuntimeWarning` is
raised and NaN is returned for that slice.
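The all-NaN case can be sketched in plain Python with the standard `warnings` module. `nanmax_1d` is a hypothetical helper for a flat sequence; it only mirrors the behaviour described above and is not MaxFrame's implementation.

```python
import math
import warnings


def nanmax_1d(values):
    """Maximum of the non-NaN values; warns and returns NaN if there are none."""
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        warnings.warn("All-NaN slice encountered", RuntimeWarning)
        return math.nan
    return float(max(kept))


print(nanmax_1d([1, 2, math.nan, -math.inf]))  # 2.0
print(nanmax_1d([1, 2, math.nan, math.inf]))   # inf
```

Infinities take part in the comparison like any other value; only NaNs are dropped.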
* **Parameters:**
* **a** (*array_like*) – Tensor containing numbers whose maximum is desired. If a is not a
tensor, a conversion is attempted.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the maximum is computed. The default is to compute
the maximum of the flattened tensor.
* **out** (*ndarray* *,* *optional*) – Alternate output array in which to place the result. The default
is `None`; if provided, it must have the same shape as the
expected output, but the type will be cast if necessary. See
doc.ufuncs for details.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original a.
If the value is anything but the default, then
keepdims will be passed through to the max method
of sub-classes of Tensor. If the sub-class's method
does not implement keepdims, an exception will be raised.
* **Returns:**
**nanmax** – A tensor with the same shape as a, with the specified axis removed.
If a is a 0-d tensor, or if axis is None, a Tensor scalar is
returned. The same dtype as a is returned.
* **Return type:**
Tensor
#### SEE ALSO
[`nanmin`](maxframe.tensor.nanmin.md#maxframe.tensor.nanmin)
: The minimum value of a tensor along a given axis, ignoring any NaNs.
`amax`
: The maximum value of a tensor along a given axis, propagating any NaNs.
[`fmax`](maxframe.tensor.fmax.md#maxframe.tensor.fmax)
: Element-wise maximum of two tensors, ignoring any NaNs.
[`maximum`](maxframe.tensor.maximum.md#maxframe.tensor.maximum)
: Element-wise maximum of two tensors, propagating any NaNs.
[`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan)
: Shows which elements are Not a Number (NaN).
[`isfinite`](maxframe.tensor.isfinite.md#maxframe.tensor.isfinite)
: Shows which elements are neither NaN nor infinity.
`amin`, [`fmin`](maxframe.tensor.fmin.md#maxframe.tensor.fmin), [`minimum`](maxframe.tensor.minimum.md#maxframe.tensor.minimum)
### Notes
MaxFrame uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
Positive infinity is treated as a very large number and negative
infinity is treated as a very small (i.e. negative) number.
If the input has an integer type the function is equivalent to np.max.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, 2], [3, mt.nan]])
>>> mt.nanmax(a).execute()
3.0
>>> mt.nanmax(a, axis=0).execute()
array([ 3., 2.])
>>> mt.nanmax(a, axis=1).execute()
array([ 2., 3.])
```
When positive infinity and negative infinity are present:
```pycon
>>> mt.nanmax([1, 2, mt.nan, mt.NINF]).execute()
2.0
>>> mt.nanmax([1, 2, mt.nan, mt.inf]).execute()
inf
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nanmean.md
# maxframe.tensor.nanmean
### maxframe.tensor.nanmean(a, axis=None, dtype=None, out=None, keepdims=None)
Compute the arithmetic mean along the specified axis, ignoring NaNs.
Returns the average of the tensor elements. The average is taken over
the flattened tensor by default, otherwise over the specified axis.
float64 intermediate and return values are used for integer inputs.
For all-NaN slices, NaN is returned and a RuntimeWarning is raised.
* **Parameters:**
* **a** (*array_like*) – Tensor containing numbers whose mean is desired. If a is not a
tensor, a conversion is attempted.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the means are computed. The default is to compute
the mean of the flattened tensor.
* **dtype** (*data-type* *,* *optional*) – Type to use in computing the mean. For integer inputs, the default
is float64; for inexact inputs, it is the same as the input
dtype.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result. The default
is `None`; if provided, it must have the same shape as the
expected output, but the type will be cast if necessary. See
doc.ufuncs for details.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original a.
If the value is anything but the default, then
keepdims will be passed through to the mean or sum methods
of sub-classes of Tensor. If the sub-class methods
do not implement keepdims, an exception will be raised.
* **Returns:**
**m** – If out=None, returns a new tensor containing the mean values,
otherwise a reference to the output tensor is returned. NaN is
returned for slices that contain only NaNs.
* **Return type:**
Tensor, see dtype parameter above
#### SEE ALSO
[`average`](maxframe.tensor.average.md#maxframe.tensor.average)
: Weighted average
[`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean)
: Arithmetic mean taken while not ignoring NaNs
[`var`](maxframe.tensor.var.md#maxframe.tensor.var), [`nanvar`](maxframe.tensor.nanvar.md#maxframe.tensor.nanvar)
### Notes
The arithmetic mean is the sum of the non-NaN elements along the axis
divided by the number of non-NaN elements.
Note that for floating-point input, the mean is computed using the same
precision the input has. Depending on the input data, this can cause
the results to be inaccurate, especially for float32. Specifying a
higher-precision accumulator using the dtype keyword can alleviate
this issue.
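The definition in these notes (sum of the non-NaN elements divided by their count) can be written out in plain Python. This is a minimal sketch for a flat sequence, assuming Python floats (double precision); `nanmean_flat` is a hypothetical name, not part of MaxFrame.

```python
import math


def nanmean_flat(values):
    """Mean of the non-NaN entries of a flat sequence; NaN if none remain."""
    kept = [float(v) for v in values if not math.isnan(v)]
    if not kept:
        return math.nan  # all-NaN (or empty) input has no defined mean
    return sum(kept) / len(kept)


# Same data as the flattened tensor [[1, nan], [3, 4]].
print(nanmean_flat([1, math.nan, 3, 4]))  # 2.6666666666666665
```

The divisor is the number of values that survive the NaN filter (3 here), not the length of the input (4).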
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, mt.nan], [3, 4]])
>>> mt.nanmean(a).execute()
2.6666666666666665
>>> mt.nanmean(a, axis=0).execute()
array([ 2., 4.])
>>> mt.nanmean(a, axis=1).execute()
array([ 1., 3.5])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nanmin.md
# maxframe.tensor.nanmin
### maxframe.tensor.nanmin(a, axis=None, out=None, keepdims=None)
Return minimum of a tensor or minimum along an axis, ignoring any NaNs.
When all-NaN slices are encountered a `RuntimeWarning` is raised and
NaN is returned for that slice.
* **Parameters:**
* **a** (*array_like*) – Tensor containing numbers whose minimum is desired. If a is not a
tensor, a conversion is attempted.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the minimum is computed. The default is to compute
the minimum of the flattened tensor.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result. The default
is `None`; if provided, it must have the same shape as the
expected output, but the type will be cast if necessary. See
doc.ufuncs for details.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original a.
If the value is anything but the default, then
keepdims will be passed through to the min method
of sub-classes of Tensor. If the sub-class methods
do not implement keepdims, an exception will be raised.
* **Returns:**
**nanmin** – A tensor with the same shape as a, with the specified axis
removed. If a is a 0-d tensor, or if axis is None, a tensor
scalar is returned. The same dtype as a is returned.
* **Return type:**
Tensor
#### SEE ALSO
[`nanmax`](maxframe.tensor.nanmax.md#maxframe.tensor.nanmax)
: The maximum value of an array along a given axis, ignoring any NaNs.
`amin`
: The minimum value of an array along a given axis, propagating any NaNs.
[`fmin`](maxframe.tensor.fmin.md#maxframe.tensor.fmin)
: Element-wise minimum of two arrays, ignoring any NaNs.
[`minimum`](maxframe.tensor.minimum.md#maxframe.tensor.minimum)
: Element-wise minimum of two arrays, propagating any NaNs.
[`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan)
: Shows which elements are Not a Number (NaN).
[`isfinite`](maxframe.tensor.isfinite.md#maxframe.tensor.isfinite)
: Shows which elements are neither NaN nor infinity.
`amax`, [`fmax`](maxframe.tensor.fmax.md#maxframe.tensor.fmax), [`maximum`](maxframe.tensor.maximum.md#maxframe.tensor.maximum)
### Notes
MaxFrame uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
Positive infinity is treated as a very large number and negative
infinity is treated as a very small (i.e. negative) number.
If the input has an integer type, the function is equivalent to mt.min.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, 2], [3, mt.nan]])
>>> mt.nanmin(a).execute()
1.0
>>> mt.nanmin(a, axis=0).execute()
array([ 1., 2.])
>>> mt.nanmin(a, axis=1).execute()
array([ 1., 3.])
```
When positive infinity and negative infinity are present:
```pycon
>>> mt.nanmin([1, 2, mt.nan, mt.inf]).execute()
1.0
>>> mt.nanmin([1, 2, mt.nan, mt.NINF]).execute()
-inf
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nanprod.md
# maxframe.tensor.nanprod
### maxframe.tensor.nanprod(a, axis=None, dtype=None, out=None, keepdims=None)
Return the product of array elements over a given axis treating Not a
Numbers (NaNs) as ones.
One is returned for slices that are all-NaN or empty.
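The "NaN counts as one" rule can be illustrated with a plain-Python sketch. It assumes Python 3.8+ for `math.prod`, and the helper name `nanprod_flat` is hypothetical; it is not how MaxFrame computes the result.

```python
import math


def nanprod_flat(values):
    """Product of a flat sequence with every NaN replaced by 1.0."""
    # math.prod returns 1 for an empty iterable, matching the empty-slice rule.
    return math.prod(1.0 if math.isnan(v) else v for v in values)


print(nanprod_flat([1, 2, 3, math.nan]))   # 6.0
print(nanprod_flat([math.nan, math.nan]))  # 1.0
print(nanprod_flat([]))                    # 1
```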
* **Parameters:**
* **a** (*array_like*) – Tensor containing numbers whose product is desired. If a is not a
tensor, a conversion is attempted.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the product is computed. The default is to compute
the product of the flattened tensor.
* **dtype** (*data-type* *,* *optional*) – The type of the returned tensor and of the accumulator in which the
elements are multiplied. By default, the dtype of a is used. An
exception is when a has an integer type with less precision than
the platform (u)intp. In that case, the default will be either
(u)int32 or (u)int64 depending on whether the platform is 32 or 64
bits. For inexact inputs, dtype must be inexact.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result. The default
is `None`. If provided, it must have the same shape as the
expected output, but the type will be cast if necessary. See
doc.ufuncs for details. The casting of NaN to integer can yield
unexpected results.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, the axes which are reduced are left in the result as
dimensions with size one. With this option, the result will
broadcast correctly against the original a.
* **Returns:**
**nanprod** – A new tensor holding the result is returned unless out is
specified, in which case it is returned.
* **Return type:**
Tensor
#### SEE ALSO
`mt.prod`
: Product across array propagating NaNs.
[`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan)
: Show which elements are NaN.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.nanprod(1).execute()
1
>>> mt.nanprod([1]).execute()
1
>>> mt.nanprod([1, mt.nan]).execute()
1.0
>>> a = mt.array([[1, 2], [3, mt.nan]])
>>> mt.nanprod(a).execute()
6.0
>>> mt.nanprod(a, axis=0).execute()
array([ 3., 2.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nanstd.md
# maxframe.tensor.nanstd
### maxframe.tensor.nanstd(a, axis=None, dtype=None, out=None, ddof=0, keepdims=None)
Compute the standard deviation along the specified axis, while
ignoring NaNs.
Returns the standard deviation, a measure of the spread of a
distribution, of the non-NaN tensor elements. The standard deviation is
computed for the flattened tensor by default, otherwise over the
specified axis.
For all-NaN slices or slices with zero degrees of freedom, NaN is
returned and a RuntimeWarning is raised.
* **Parameters:**
* **a** (*array_like*) – Calculate the standard deviation of the non-NaN values.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the standard deviation is computed. The default is
to compute the standard deviation of the flattened tensor.
* **dtype** (*dtype* *,* *optional*) – Type to use in computing the standard deviation. For tensors of
integer type the default is float64, for tensors of float types it
is the same as the tensor type.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must have
the same shape as the expected output but the type (of the
calculated values) will be cast if necessary.
* **ddof** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Means Delta Degrees of Freedom. The divisor used in calculations
is `N - ddof`, where `N` represents the number of non-NaN
elements. By default ddof is zero.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original a.
If this value is anything but the default it is passed through
as-is to the relevant functions of the sub-classes. If these
functions do not have a keepdims kwarg, a RuntimeError will
be raised.
* **Returns:**
**standard_deviation** – If out is None, return a new tensor containing the standard
deviation, otherwise return a reference to the output tensor. If
ddof is >= the number of non-NaN elements in a slice or the slice
contains only NaNs, then the result for that slice is NaN.
* **Return type:**
Tensor, see dtype parameter above.
#### SEE ALSO
[`var`](maxframe.tensor.var.md#maxframe.tensor.var), [`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean), [`std`](maxframe.tensor.std.md#maxframe.tensor.std), [`nanvar`](maxframe.tensor.nanvar.md#maxframe.tensor.nanvar), [`nanmean`](maxframe.tensor.nanmean.md#maxframe.tensor.nanmean)
### Notes
The standard deviation is the square root of the average of the squared
deviations from the mean: `std = sqrt(mean(abs(x - x.mean())**2))`.
The average squared deviation is normally calculated as
`x.sum() / N`, where `N = len(x)`. If, however, ddof is
specified, the divisor `N - ddof` is used instead. In standard
statistical practice, `ddof=1` provides an unbiased estimator of the
variance of the infinite population. `ddof=0` provides a maximum
likelihood estimate of the variance for normally distributed variables.
The standard deviation computed in this function is the square root of
the estimated variance, so even with `ddof=1`, it will not be an
unbiased estimate of the standard deviation per se.
Note that, for complex numbers, std takes the absolute value before
squaring, so that the result is always real and nonnegative.
For floating-point input, the *std* is computed using the same
precision the input has. Depending on the input data, this can cause
the results to be inaccurate, especially for float32. Specifying a
higher-accuracy accumulator using the dtype keyword can alleviate
this issue.
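The formula in these notes, including the `N - ddof` divisor, can be sketched in plain Python for a flat sequence. The helper name `nanstd_flat` is hypothetical and the code only mirrors the documented semantics; it is not MaxFrame's implementation.

```python
import math


def nanstd_flat(values, ddof=0):
    """Standard deviation of the non-NaN entries, with divisor N - ddof."""
    kept = [float(v) for v in values if not math.isnan(v)]
    n = len(kept)  # N counts only the non-NaN elements
    if n - ddof <= 0:
        return math.nan  # all-NaN slice or zero degrees of freedom
    mean = sum(kept) / n
    squared = sum(abs(v - mean) ** 2 for v in kept)
    return math.sqrt(squared / (n - ddof))


print(nanstd_flat([3, 4]))             # 0.5
print(nanstd_flat([4], ddof=1))        # nan
print(nanstd_flat([1, math.nan, 3, 4]))
```

With `ddof=1` a single remaining value leaves zero degrees of freedom, which is why the second call yields NaN.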
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, mt.nan], [3, 4]])
>>> mt.nanstd(a).execute()
1.247219128924647
>>> mt.nanstd(a, axis=0).execute()
array([ 1., 0.])
>>> mt.nanstd(a, axis=1).execute()
array([ 0., 0.5])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nansum.md
# maxframe.tensor.nansum
### maxframe.tensor.nansum(a, axis=None, dtype=None, out=None, keepdims=None)
Return the sum of array elements over a given axis treating Not a
Numbers (NaNs) as zero.
Zero is returned for slices that are all-NaN or
empty.
* **Parameters:**
* **a** (*array_like*) – Tensor containing numbers whose sum is desired. If a is not a
tensor, a conversion is attempted.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the sum is computed. The default is to compute the
sum of the flattened array.
* **dtype** (*data-type* *,* *optional*) – The type of the returned tensor and of the accumulator in which the
elements are summed. By default, the dtype of a is used. An
exception is when a has an integer type with less precision than
the platform (u)intp. In that case, the default will be either
(u)int32 or (u)int64 depending on whether the platform is 32 or 64
bits. For inexact inputs, dtype must be inexact.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result. The default
is `None`. If provided, it must have the same shape as the
expected output, but the type will be cast if necessary. See
doc.ufuncs for details. The casting of NaN to integer can yield
unexpected results.
* **keepdims** – If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original a.
* **Returns:**
**nansum** – A new tensor holding the result is returned unless out is
specified, in which case it is returned. The result has the same
size as a, and the same shape as a if axis is not None
or a is a 1-d tensor.
* **Return type:**
Tensor.
#### SEE ALSO
`mt.sum`
: Sum across tensor propagating NaNs.
[`isnan`](maxframe.tensor.isnan.md#maxframe.tensor.isnan)
: Show which elements are NaN.
[`isfinite`](maxframe.tensor.isfinite.md#maxframe.tensor.isfinite)
: Show which elements are not NaN or +/-inf.
### Notes
If both positive and negative infinity are present, the sum will be Not
A Number (NaN).
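The interaction between the NaN-as-zero rule and infinities can be reproduced with ordinary Python floats, which follow IEEE 754. This is a minimal sketch with a hypothetical helper `nansum_flat`, not MaxFrame code.

```python
import math


def nansum_flat(values):
    """Sum of a flat sequence with every NaN replaced by 0.0."""
    return sum(0.0 if math.isnan(v) else v for v in values)


print(nansum_flat([1, math.nan]))                       # 1.0
print(nansum_flat([1, math.nan, math.inf]))             # inf
print(nansum_flat([1, math.nan, math.inf, -math.inf]))  # nan
```

The NaN in the last result does not come from the input NaN, which was replaced by zero. It comes from adding `inf` and `-inf`, which has no defined value under IEEE 754.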
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.nansum(1).execute()
1
>>> mt.nansum([1]).execute()
1
>>> mt.nansum([1, mt.nan]).execute()
1.0
>>> a = mt.array([[1, 1], [1, mt.nan]])
>>> mt.nansum(a).execute()
3.0
>>> mt.nansum(a, axis=0).execute()
array([ 2., 1.])
>>> mt.nansum([1, mt.nan, mt.inf]).execute()
inf
>>> mt.nansum([1, mt.nan, mt.NINF]).execute()
-inf
>>> mt.nansum([1, mt.nan, mt.inf, -mt.inf]).execute() # both +/- infinity present
nan
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nanvar.md
# maxframe.tensor.nanvar
### maxframe.tensor.nanvar(a, axis=None, dtype=None, out=None, ddof=0, keepdims=None)
Compute the variance along the specified axis, while ignoring NaNs.
Returns the variance of the tensor elements, a measure of the spread of
a distribution. The variance is computed for the flattened tensor by
default, otherwise over the specified axis.
For all-NaN slices or slices with zero degrees of freedom, NaN is
returned and a RuntimeWarning is raised.
* **Parameters:**
* **a** (*array_like*) – Tensor containing numbers whose variance is desired. If a is not a
tensor, a conversion is attempted.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which the variance is computed. The default is to compute
the variance of the flattened array.
* **dtype** (*data-type* *,* *optional*) – Type to use in computing the variance. For tensors of integer type
the default is float64; for tensors of float types it is the same as
the tensor type.
* **out** (*Tensor* *,* *optional*) – Alternate output tensor in which to place the result. It must have
the same shape as the expected output, but the type is cast if
necessary.
* **ddof** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – “Delta Degrees of Freedom”: the divisor used in the calculation is
`N - ddof`, where `N` represents the number of non-NaN
elements. By default ddof is zero.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the original a.
* **Returns:**
**variance** – If out is None, return a new tensor containing the variance,
otherwise return a reference to the output tensor. If ddof is >= the
number of non-NaN elements in a slice or the slice contains only
NaNs, then the result for that slice is NaN.
* **Return type:**
Tensor, see dtype parameter above
#### SEE ALSO
[`std`](maxframe.tensor.std.md#maxframe.tensor.std)
: Standard deviation
[`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean)
: Average
[`var`](maxframe.tensor.var.md#maxframe.tensor.var)
: Variance while not ignoring NaNs
[`nanstd`](maxframe.tensor.nanstd.md#maxframe.tensor.nanstd), [`nanmean`](maxframe.tensor.nanmean.md#maxframe.tensor.nanmean)
### Notes
The variance is the average of the squared deviations from the mean,
i.e., `var = mean(abs(x - x.mean())**2)`.
The mean is normally calculated as `x.sum() / N`, where `N = len(x)`.
If, however, ddof is specified, the divisor `N - ddof` is used
instead. In standard statistical practice, `ddof=1` provides an
unbiased estimator of the variance of a hypothetical infinite
population. `ddof=0` provides a maximum likelihood estimate of the
variance for normally distributed variables.
Note that for complex numbers, the absolute value is taken before
squaring, so that the result is always real and nonnegative.
For floating-point input, the variance is computed using the same
precision the input has. Depending on the input data, this can cause
the results to be inaccurate, especially for float32. Specifying a
higher-accuracy accumulator using the `dtype` keyword can alleviate
this issue.
For this function to work on sub-classes of Tensor, they must define
sum with the kwarg keepdims.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, mt.nan], [3, 4]])
>>> mt.nanvar(a).execute()
1.5555555555555554
>>> mt.nanvar(a, axis=0).execute()
array([ 1., 0.])
>>> mt.nanvar(a, axis=1).execute()
array([ 0., 0.25])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.ndim.md
# maxframe.tensor.ndim
### maxframe.tensor.ndim(a)
Return the number of dimensions of a tensor.
* **Parameters:**
**a** (*array_like*) – Input tensor. If it is not already a tensor, a conversion is
attempted.
* **Returns:**
**number_of_dimensions** – The number of dimensions in a. Scalars are zero-dimensional.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
`ndarray.ndim`
: equivalent method
[`shape`](maxframe.tensor.shape.md#maxframe.tensor.shape)
: dimensions of tensor
`Tensor.shape`
: dimensions of tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.ndim([[1,2,3],[4,5,6]])
2
>>> mt.ndim(mt.array([[1,2,3],[4,5,6]]))
2
>>> mt.ndim(1)
0
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.negative.md
# maxframe.tensor.negative
### maxframe.tensor.negative(x, out=None, where=None, \*\*kwargs)
Numerical negative, element-wise.
* **Parameters:**
* **x** (*array_like* *or* *scalar*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs** – For other keyword-only arguments, see the
[ufunc docs](https://numpy.org/doc/stable/reference/ufuncs.html#ufuncs-kwargs).
* **Returns:**
**y** – Returned array or scalar: y = -x.
* **Return type:**
Tensor or scalar
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.negative([1.,-1.]).execute()
array([-1., 1.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nextafter.md
# maxframe.tensor.nextafter
### maxframe.tensor.nextafter(x1, x2, out=None, where=None, \*\*kwargs)
Return the next floating-point value after x1 towards x2, element-wise.
* **Parameters:**
* **x1** (*array_like*) – Values to find the next representable value of.
* **x2** (*array_like*) – The direction where to look for the next representable value of x1.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – The next representable values of x1 in the direction of x2.
* **Return type:**
array_like
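For scalars, the same idea is available in the Python standard library as `math.nextafter` (Python 3.9 or later). The sketch below uses it only to illustrate what "next representable value" means for double precision; it does not call MaxFrame.

```python
import math
import sys

eps = sys.float_info.epsilon     # gap between 1.0 and the next larger double

up = math.nextafter(1.0, 2.0)    # one step from 1.0 towards 2.0
down = math.nextafter(2.0, 1.0)  # one step from 2.0 towards 1.0

print(up == 1.0 + eps)    # True
print(down == 2.0 - eps)  # True
```

The second argument only sets the direction of the step; the size of the step depends on the magnitude of the first argument.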
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> eps = mt.finfo(mt.float64).eps
>>> (mt.nextafter(1, 2) == eps + 1).execute()
True
>>> (mt.nextafter([1, 2], [2, 1]) == [eps + 1, 2 - eps]).execute()
array([ True, True])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.nonzero.md
# maxframe.tensor.nonzero
### maxframe.tensor.nonzero(a)
Return the indices of the elements that are non-zero.
Returns a tuple of tensors, one for each dimension of a,
containing the indices of the non-zero elements in that
dimension. The values in a are always tested and returned.
The corresponding non-zero
values can be obtained with:
```default
a[nonzero(a)]
```
To group the indices by element, rather than dimension, use:
```default
transpose(nonzero(a))
```
The result of this is always a 2-D array, with a row for
each non-zero element.
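The two index layouts described above (grouped by dimension, or grouped by element) can be sketched for a 2-D nested list in plain Python. The helper `nonzero_2d` is hypothetical and exists only to show the shape of the result; it is not MaxFrame's implementation.

```python
def nonzero_2d(rows):
    """Return (row_indices, col_indices) of the non-zero entries of a 2-D list."""
    hits = [
        (i, j)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if value != 0
    ]
    # One index list per dimension, as in the tuple returned by nonzero.
    return [i for i, _ in hits], [j for _, j in hits]


x = [[1, 0, 0], [0, 2, 0], [1, 1, 0]]
row_idx, col_idx = nonzero_2d(x)

values = [x[i][j] for i, j in zip(row_idx, col_idx)]  # analogue of a[nonzero(a)]
pairs = list(zip(row_idx, col_idx))                   # analogue of transpose(nonzero(a))

print(row_idx, col_idx)  # [0, 1, 2, 2] [0, 1, 0, 1]
print(values)            # [1, 2, 1, 1]
print(pairs)             # [(0, 0), (1, 1), (2, 0), (2, 1)]
```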
* **Parameters:**
**a** (*array_like*) – Input tensor.
* **Returns:**
**tuple_of_arrays** – Indices of elements that are non-zero.
* **Return type:**
[tuple](https://docs.python.org/3/library/stdtypes.html#tuple)
#### SEE ALSO
[`flatnonzero`](maxframe.tensor.flatnonzero.md#maxframe.tensor.flatnonzero)
: Return indices that are non-zero in the flattened version of the input tensor.
`Tensor.nonzero`
: Equivalent tensor method.
[`count_nonzero`](maxframe.tensor.count_nonzero.md#maxframe.tensor.count_nonzero)
: Counts the number of non-zero elements in the input tensor.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([[1,0,0], [0,2,0], [1,1,0]])
>>> x.execute()
array([[1, 0, 0],
[0, 2, 0],
[1, 1, 0]])
>>> mt.nonzero(x).execute()
(array([0, 1, 2, 2]), array([0, 1, 0, 1]))
```
```pycon
>>> x[mt.nonzero(x)].execute()
```
```pycon
>>> mt.transpose(mt.nonzero(x)).execute()
```
A common use for `nonzero` is to find the indices of a tensor where
a condition is True. Given a tensor a, the condition a > 3 is a
boolean tensor, and since False is interpreted as 0, mt.nonzero(a > 3)
yields the indices of a where the condition is true.
```pycon
>>> a = mt.array([[1,2,3],[4,5,6],[7,8,9]])
>>> (a > 3).execute()
array([[False, False, False],
[ True, True, True],
[ True, True, True]])
>>> mt.nonzero(a > 3).execute()
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
```
The `nonzero` method of the boolean array can also be called.
```pycon
>>> (a > 3).nonzero().execute()
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.not_equal.md
# maxframe.tensor.not_equal
### maxframe.tensor.not_equal(x1, x2, out=None, where=None, \*\*kwargs)
Return (x1 != x2) element-wise.
* **Parameters:**
* **x1** (*array_like*) – Input tensors.
* **x2** (*array_like*) – Input tensors.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**not_equal** – For each element in x1, x2, return True if x1 is not equal
to x2 and False otherwise.
* **Return type:**
tensor bool, scalar bool
#### SEE ALSO
[`equal`](maxframe.tensor.equal.md#maxframe.tensor.equal), [`greater`](maxframe.tensor.greater.md#maxframe.tensor.greater), [`greater_equal`](maxframe.tensor.greater_equal.md#maxframe.tensor.greater_equal), [`less`](maxframe.tensor.less.md#maxframe.tensor.less), [`less_equal`](maxframe.tensor.less_equal.md#maxframe.tensor.less_equal)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.not_equal([1.,2.], [1., 3.]).execute()
array([False, True])
>>> mt.not_equal([1, 2], [[1, 3],[1, 4]]).execute()
array([[False, True],
[False, True]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.ogrid.md
# maxframe.tensor.ogrid
### maxframe.tensor.ogrid *= <maxframe.tensor.lib.index_tricks.nd_grid object>*
Construct a multi-dimensional “meshgrid”.
`grid = nd_grid()` creates an instance which will return a mesh-grid
when indexed. The dimension and number of the output arrays are equal
to the number of indexing dimensions. If the step length is not a
complex number, then the stop is not inclusive.
However, if the step length is a **complex number** (e.g. 5j), then the
integer part of its magnitude is interpreted as specifying the
number of points to create between the start and stop values, where
the stop value **is inclusive**.
If instantiated with an argument of `sparse=True`, the mesh-grid is
open (or not fleshed out) so that only one dimension of each returned
argument is greater than 1.
* **Parameters:**
**sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Whether the grid is sparse or not. Default is False.
### Notes
Two instances of nd_grid are made available in the maxframe.tensor namespace,
mgrid and ogrid:
```default
mgrid = nd_grid(sparse=False)
ogrid = nd_grid(sparse=True)
```
Users should use these pre-defined instances instead of using nd_grid
directly.
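The slice rules for real and complex steps can be illustrated with a plain-Python sketch. The helper `axis_points` is hypothetical, handles only positive real steps, and is not how `nd_grid` is implemented; it just expands one `start:stop:step` slice the way the description reads.

```python
def axis_points(start, stop, step=1):
    """Expand one start:stop:step slice following the documented rules."""
    if isinstance(step, complex):
        count = int(abs(step))  # integer part of the magnitude = number of points
        if count == 1:
            return [float(start)]
        width = (stop - start) / (count - 1)
        return [start + i * width for i in range(count)]  # stop is inclusive
    points = []
    value = start
    while value < stop:  # real step: stop is exclusive
        points.append(value)
        value += step
    return points


print(axis_points(0, 5))       # [0, 1, 2, 3, 4]
print(axis_points(-1, 1, 5j))  # [-1.0, -0.5, 0.0, 0.5, 1.0]

# Open (sparse) layout for two axes: a column vector and a row vector.
axis = axis_points(0, 5)
open_grid = ([[i] for i in axis], [axis])
print(open_grid)
```

In the open layout each returned piece varies along exactly one dimension, which is what distinguishes `ogrid` from the fully expanded `mgrid`.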
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mgrid = mt.lib.index_tricks.nd_grid()
>>> mgrid[0:5,0:5]
array([[[0, 0, 0, 0, 0],
[1, 1, 1, 1, 1],
[2, 2, 2, 2, 2],
[3, 3, 3, 3, 3],
[4, 4, 4, 4, 4]],
[[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]]])
>>> mgrid[-1:1:5j]
array([-1. , -0.5, 0. , 0.5, 1. ])
```
```pycon
>>> ogrid = mt.lib.index_tricks.nd_grid(sparse=True)
>>> ogrid[0:5,0:5]
[array([[0],
[1],
[2],
[3],
[4]]), array([[0, 1, 2, 3, 4]])]
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.ones.md
# maxframe.tensor.ones
### maxframe.tensor.ones(shape, dtype=None, chunk_size=None, gpu=None, order='C')
Return a new tensor of given shape and type, filled with ones.
* **Parameters:**
* **shape** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* *ints*) – Shape of the new tensor, e.g., `(2, 3)` or `2`.
* **dtype** (*data-type* *,* *optional*) – The desired data-type for the tensor, e.g., mt.int8. Default is
mt.float64.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **order** ( *{'C'* *,* *'F'}* *,* *optional* *,* *default: C*) – Whether to store multi-dimensional data in row-major
(C-style) or column-major (Fortran-style) order in
memory.
* **Returns:**
**out** – Tensor of ones with the given shape, dtype, and order.
* **Return type:**
Tensor
#### SEE ALSO
[`zeros`](maxframe.tensor.zeros.md#maxframe.tensor.zeros), `ones_like`
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.ones(5).execute()
array([ 1., 1., 1., 1., 1.])
```
```pycon
>>> mt.ones((5,), dtype=int).execute()
array([1, 1, 1, 1, 1])
```
```pycon
>>> mt.ones((2, 1)).execute()
array([[ 1.],
[ 1.]])
```
```pycon
>>> s = (2,2)
>>> mt.ones(s).execute()
array([[ 1., 1.],
[ 1., 1.]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.partition.md
# maxframe.tensor.partition
### maxframe.tensor.partition(a, kth, axis=-1, kind='introselect', order=None, \*\*kw)
Return a partitioned copy of a tensor.
Creates a copy of the tensor with its elements rearranged in such a
way that the value of the element in k-th position is in the
position it would be in a sorted tensor. All elements smaller than
the k-th element are moved before this element and all equal or
greater are moved behind it. The ordering of the elements in the two
partitions is undefined.
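The guarantee described above can be expressed as a small check in plain Python. The helper `is_partitioned` is hypothetical; it accepts ties on the left side as well, which is slightly looser than the wording above, and it says nothing about how MaxFrame performs the partitioning.

```python
def is_partitioned(values, kth):
    """Check that values[kth] sits where it would be in a sorted sequence."""
    pivot = values[kth]
    return (
        pivot == sorted(values)[kth]                       # k-th element is final
        and all(v <= pivot for v in values[:kth])          # nothing larger before it
        and all(v >= pivot for v in values[kth + 1:])      # nothing smaller after it
    )


print(is_partitioned([2, 1, 3, 4], 3))  # True
print(is_partitioned([3, 4, 2, 1], 3))  # False
```

Note that `[2, 1, 3, 4]` passes even though its first two elements are out of order: only the k-th position is fixed, and the order inside each partition is left undefined.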
* **Parameters:**
* **a** (*array_like*) – Tensor to be sorted.
* **kth** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* *ints*) – Element index to partition by. The k-th value of the element
will be in its final sorted position and all smaller elements
will be moved before it and all equal or greater elements behind
it. The order of all elements in the partitions is undefined. If
provided with a sequence of k-th it will partition all elements
indexed by k-th of them into their sorted position at once.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None* *,* *optional*) – Axis along which to sort. If None, the tensor is flattened before
sorting. The default is -1, which sorts along the last axis.
* **kind** ( *{'introselect'}* *,* *optional*) – Selection algorithm. Default is ‘introselect’.
* **order** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – When a is a tensor with fields defined, this argument
specifies which fields to compare first, second, etc. A single
field can be specified as a string. Not all fields need be
specified, but unspecified fields will still be used, in the
order in which they come up in the dtype, to break ties.
* **Returns:**
**partitioned_tensor** – Tensor of the same type and shape as a.
* **Return type:**
Tensor
#### SEE ALSO
`Tensor.partition`
: Method to sort a tensor in-place.
[`argpartition`](maxframe.tensor.argpartition.md#maxframe.tensor.argpartition)
: Indirect partition.
[`sort`](maxframe.tensor.sort.md#maxframe.tensor.sort)
: Full sorting
### Notes
The various selection algorithms are characterized by their average
speed, worst case performance, work space size, and whether they are
stable. A stable sort keeps items with the same key in the same
relative order. The available algorithms have the following
properties:
| kind | speed | worst case | work space | stable |
|---------------|---------|--------------|--------------|----------|
| ‘introselect’ | 1 | O(n) | 0 | no |
All the partition algorithms make temporary copies of the data when
partitioning along any but the last axis. Consequently,
partitioning along the last axis is faster and uses less space than
partitioning along any other axis.
The sort order for complex numbers is lexicographic. If both the
real and imaginary parts are non-nan then the order is determined by
the real parts except when they are equal, in which case the order
is determined by the imaginary parts.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([3, 4, 2, 1])
>>> mt.partition(a, 3).execute()
array([2, 1, 3, 4])
```
```pycon
>>> mt.partition(a, (1, 3)).execute()
array([1, 2, 3, 4])
```
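The postcondition described at the top of this page can be checked without a MaxFrame session. The following is a minimal pure-Python sketch; `is_partitioned` is a hypothetical helper for illustration and is not part of MaxFrame. It is applied to the outputs shown in the examples above.

```python
def is_partitioned(values, kth):
    """Return True if values[kth] sits where a full sort would place it.

    Elements before position kth must not exceed the pivot and elements
    after it must not be smaller; the order inside each side is ignored.
    """
    pivot = values[kth]
    left_ok = all(v <= pivot for v in values[:kth])
    right_ok = all(v >= pivot for v in values[kth + 1:])
    return left_ok and right_ok


print(is_partitioned([2, 1, 3, 4], 3))  # True: result of partition(a, 3) above
print(is_partitioned([1, 2, 3, 4], 1) and is_partitioned([1, 2, 3, 4], 3))  # True
print(is_partitioned([3, 4, 2, 1], 3))  # False: the unpartitioned input
```

Note that `[2, 1, 3, 4]` passes even though its first two elements are out of order, which is exactly the "ordering within the partitions is undefined" guarantee.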
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.percentile.md
# maxframe.tensor.percentile
### maxframe.tensor.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
Compute the q-th percentile of the data along the specified axis.
Returns the q-th percentile(s) of the array elements.
* **Parameters:**
* **a** (*array_like*) – Input tensor or object that can be converted to a tensor.
* **q** (*array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float)) – Percentile or sequence of percentiles to compute, which must be between
0 and 100 inclusive.
* **axis** ( *{int* *,* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *None}* *,* *optional*) – Axis or axes along which the percentiles are computed. The
default is to compute the percentile(s) along a flattened
version of the tensor.
* **out** (*ndarray* *,* *optional*) – Alternative output array in which to place the result. It must
have the same shape and buffer length as the expected output,
but the type (of the output) will be cast if necessary.
* **overwrite_input** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Accepted only for compatibility with NumPy; it has no effect.
* **interpolation** ( *{'linear'* *,* *'lower'* *,* *'higher'* *,* *'midpoint'* *,* *'nearest'}*) –
This optional parameter specifies the interpolation method to
use when the desired percentile lies between two data points
`i < j`:
* 'linear': `i + (j - i) * fraction`, where `fraction`
is the fractional part of the index surrounded by `i`
and `j`.
* 'lower': `i`.
* 'higher': `j`.
* 'nearest': `i` or `j`, whichever is nearest.
* 'midpoint': `(i + j) / 2`.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If this is set to True, the axes which are reduced are left in
the result as dimensions with size one. With this option, the
result will broadcast correctly against the original array a.
* **Returns:**
**percentile** – If q is a single percentile and axis=None, then the result
is a scalar. If multiple percentiles are given, first axis of
the result corresponds to the percentiles. The other axes are
the axes that remain after the reduction of a. If the input
contains integers or floats smaller than `float64`, the output
data-type is `float64`. Otherwise, the output data-type is the
same as that of the input. If out is specified, that array is
returned instead.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean)
[`median`](maxframe.tensor.median.md#maxframe.tensor.median)
: equivalent to `percentile(..., 50)`
`nanpercentile`
[`quantile`](maxframe.tensor.quantile.md#maxframe.tensor.quantile)
: equivalent to percentile, except with q in the range [0, 1].
### Notes
Given a vector `V` of length `N`, the q-th percentile of
`V` is the value `q/100` of the way from the minimum to the
maximum in a sorted copy of `V`. The values and distances of
the two nearest neighbors as well as the interpolation parameter
will determine the percentile if the normalized ranking does not
match the location of `q` exactly. This function is the same as
the median if `q=50`, the same as the minimum if `q=0` and the
same as the maximum if `q=100`.
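The interpolation rules can be made concrete with a small pure-Python sketch for a flat sequence. `percentile_1d` is a hypothetical helper, not the MaxFrame implementation, and its tie-breaking for `'nearest'` (ties go to `j`) is an assumption of this sketch that may differ from the library.

```python
import math


def percentile_1d(values, q, interpolation="linear"):
    """Percentile of a flat sequence, following the rules described above."""
    data = sorted(values)
    # Fractional index of the q-th percentile in the sorted copy.
    pos = (q / 100) * (len(data) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    i, j = data[lo], data[hi]
    fraction = pos - lo
    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        # Assumption: an exact tie is resolved toward j in this sketch.
        return i if fraction < 0.5 else j
    raise ValueError(f"unknown interpolation: {interpolation!r}")


flat = [10, 7, 4, 3, 2, 1]
print(percentile_1d(flat, 50))            # 3.5
print(percentile_1d(flat, 50, "lower"))   # 3
print(percentile_1d(flat, 50, "higher"))  # 4
```

With `q=0` and `q=100` the fractional index lands exactly on the first and last sorted element, which is why those cases reduce to the minimum and maximum.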
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([[10, 7, 4], [3, 2, 1]])
>>> a.execute()
array([[10, 7, 4],
[ 3, 2, 1]])
>>> mt.percentile(a, 50).execute()
3.5
>>> mt.percentile(a, 50, axis=0).execute()
array([6.5, 4.5, 2.5])
>>> mt.percentile(a, 50, axis=1).execute()
array([7., 2.])
>>> mt.percentile(a, 50, axis=1, keepdims=True).execute()
array([[7.],
[2.]])
```
```pycon
>>> m = mt.percentile(a, 50, axis=0)
>>> out = mt.zeros_like(m)
>>> mt.percentile(a, 50, axis=0, out=out).execute()
array([6.5, 4.5, 2.5])
>>> m.execute()
array([6.5, 4.5, 2.5])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.positive.md
# maxframe.tensor.positive
### maxframe.tensor.positive(x, out=None, where=None, \*\*kwargs)
Numerical positive, element-wise.
* **Parameters:**
**x** (*array_like* *or* *scalar*) – Input tensor.
* **Returns:**
**y** – Returned array or scalar: y = +x.
* **Return type:**
Tensor or scalar
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.power.md
# maxframe.tensor.power
### maxframe.tensor.power(x1, x2, out=None, where=None, \*\*kwargs)
First tensor elements raised to powers from second tensor, element-wise.
Raise each base in x1 to the positionally-corresponding power in
x2. x1 and x2 must be broadcastable to the same shape. Note that an
integer type raised to a negative integer power will raise a ValueError.
* **Parameters:**
* **x1** (*array_like*) – The bases.
* **x2** (*array_like*) – The exponents.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The bases in x1 raised to the exponents in x2.
* **Return type:**
Tensor
#### SEE ALSO
[`float_power`](maxframe.tensor.float_power.md#maxframe.tensor.float_power)
: power function that promotes integers to float
### Examples
Cube each element in a list.
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x1 = list(range(6))
>>> x1
[0, 1, 2, 3, 4, 5]
>>> mt.power(x1, 3).execute()
array([ 0, 1, 8, 27, 64, 125])
```
Raise the bases to different exponents.
```pycon
>>> x2 = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
>>> mt.power(x1, x2).execute()
array([ 0., 1., 8., 27., 16., 5.])
```
The effect of broadcasting.
```pycon
>>> x2 = mt.array([[1, 2, 3, 3, 2, 1], [1, 2, 3, 3, 2, 1]])
>>> x2.execute()
array([[1, 2, 3, 3, 2, 1],
[1, 2, 3, 3, 2, 1]])
>>> mt.power(x1, x2).execute()
array([[ 0, 1, 8, 27, 16, 5],
[ 0, 1, 8, 27, 16, 5]])
```
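The broadcasting case can be reproduced in plain Python to see which base meets which exponent. This is only an illustrative sketch for a 1-D list of bases against a 2-D list of exponents; `power_broadcast_rows` is a hypothetical helper and does not cover general broadcasting.

```python
def power_broadcast_rows(bases, exponent_rows):
    """Raise a 1-D list of bases to each row of a 2-D list of exponents."""
    for row in exponent_rows:
        if len(row) != len(bases):
            raise ValueError("each exponent row must match the number of bases")
    # The same row of bases is reused for every row of exponents.
    return [[b ** e for b, e in zip(bases, row)] for row in exponent_rows]


x1 = list(range(6))
x2 = [[1, 2, 3, 3, 2, 1], [1, 2, 3, 3, 2, 1]]
print(power_broadcast_rows(x1, x2))
# [[0, 1, 8, 27, 16, 5], [0, 1, 8, 27, 16, 5]]
```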
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.prod.md
# maxframe.tensor.prod
### maxframe.tensor.prod(a, axis=None, dtype=None, out=None, keepdims=None)
Return the product of tensor elements over a given axis.
* **Parameters:**
* **a** (*array_like*) – Input data.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) –
Axis or axes along which a product is performed. The default,
axis=None, will calculate the product of all the elements in the
input tensor. If axis is negative it counts from the last to the
first axis.
If axis is a tuple of ints, a product is performed on all of the
axes specified in the tuple instead of a single axis or all the
axes as before.
* **dtype** (*dtype* *,* *optional*) – The type of the returned tensor, as well as of the accumulator in
which the elements are multiplied. The dtype of a is used by
default unless a has an integer dtype of less precision than the
default platform integer. In that case, if a is signed then the
platform integer is used while if a is unsigned then an unsigned
integer of the same precision as the platform integer is used.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must have
the same shape as the expected output, but the type of the output
values will be cast if necessary.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left in the
result as dimensions with size one. With this option, the result
will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be
passed through to the prod method of sub-classes of
Tensor; any non-default value will be. If the
sub-class's prod method does not implement keepdims, an
exception will be raised.
* **Returns:**
**product_along_axis** – A tensor shaped as a but with the specified axis removed.
Returns a reference to out if specified.
* **Return type:**
Tensor, see dtype parameter above.
#### SEE ALSO
`Tensor.prod`
: equivalent method
### Notes
Arithmetic is modular when using integer types, and no error is
raised on overflow. That means that, on a 32-bit platform:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([536870910, 536870910, 536870910, 536870910])
>>> mt.prod(x).execute() # random
16
```
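The value 16 follows from 32-bit wraparound and can be reproduced with Python's arbitrary-precision integers. The sketch below assumes signed 32-bit two's-complement arithmetic; `wrap_int32` and `prod_int32` are hypothetical helpers, not MaxFrame functions.

```python
def wrap_int32(value):
    """Reduce a Python int to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def prod_int32(values):
    """Multiply values, wrapping to 32 bits after every step."""
    result = 1
    for v in values:
        result = wrap_int32(result * v)
    return result


print(prod_int32([536870910] * 4))  # 16
```

Since 536870910 = 2 * (2**28 - 1), its fourth power is 16 * (2**28 - 1)**4, and (2**28 - 1)**4 leaves remainder 1 modulo 2**28, so only the factor 16 survives modulo 2**32.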
The product of an empty array is the neutral element 1:
```pycon
>>> mt.prod([]).execute()
1.0
```
### Examples
By default, calculate the product of all elements:
```pycon
>>> mt.prod([1.,2.]).execute()
2.0
```
Even when the input array is two-dimensional:
```pycon
>>> mt.prod([[1.,2.],[3.,4.]]).execute()
24.0
```
But we can also specify the axis over which to multiply:
```pycon
>>> mt.prod([[1.,2.],[3.,4.]], axis=1).execute()
array([ 2., 12.])
```
If the type of x is unsigned, then the output type is
the unsigned platform integer:
```pycon
>>> x = mt.array([1, 2, 3], dtype=mt.uint8)
>>> mt.prod(x).dtype == mt.uint
True
```
If x is of a signed integer type, then the output type
is the default platform integer:
```pycon
>>> x = mt.array([1, 2, 3], dtype=mt.int8)
>>> mt.prod(x).dtype == int
True
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.ptp.md
# maxframe.tensor.ptp
### maxframe.tensor.ptp(a, axis=None, out=None, keepdims=None)
Range of values (maximum - minimum) along an axis.
The name of the function comes from the acronym for ‘peak to peak’.
* **Parameters:**
* **a** (*array_like*) – Input values.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Axis along which to find the peaks. By default, flatten the
array.
* **out** (*array_like*) – Alternative output tensor in which to place the result. It must
have the same shape and buffer length as the expected output,
but the type of the output values will be cast if necessary.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be
passed through to the ptp method of sub-classes of
Tensor; any non-default value will be. If the
sub-class's method does not implement keepdims, an
exception will be raised.
* **Returns:**
**ptp** – A new tensor holding the result, unless out was
specified, in which case a reference to out is returned.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(4).reshape((2,2))
>>> x.execute()
array([[0, 1],
[2, 3]])
```
```pycon
>>> mt.ptp(x, axis=0).execute()
array([2, 2])
```
```pycon
>>> mt.ptp(x, axis=1).execute()
array([1, 1])
```
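For reference, the same peak-to-peak results can be derived by hand. This is a minimal pure-Python sketch for a 2-D nested list; `ptp_2d` is a hypothetical helper for illustration only.

```python
def ptp_2d(rows, axis=None):
    """Maximum minus minimum of a 2-D nested list along the given axis."""
    if axis is None:
        flat = [v for row in rows for v in row]
        return max(flat) - min(flat)
    if axis == 0:
        # zip(*rows) iterates over columns.
        return [max(col) - min(col) for col in zip(*rows)]
    if axis == 1:
        return [max(row) - min(row) for row in rows]
    raise ValueError("axis must be None, 0 or 1 for a 2-D input")


x = [[0, 1], [2, 3]]
print(ptp_2d(x))     # 3
print(ptp_2d(x, 0))  # [2, 2]
print(ptp_2d(x, 1))  # [1, 1]
```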
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.quantile.md
# maxframe.tensor.quantile
### maxframe.tensor.quantile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False, \*\*kw)
Compute the q-th quantile of the data along the specified axis.
* **Parameters:**
* **a** (*array_like*) – Input tensor or object that can be converted to a tensor.
* **q** (*array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float)) – Quantile or sequence of quantiles to compute, which must be between
0 and 1 inclusive.
* **axis** ( *{int* *,* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *,* *None}* *,* *optional*) – Axis or axes along which the quantiles are computed. The
default is to compute the quantile(s) along a flattened
version of the tensor.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must
have the same shape and buffer length as the expected output,
but the type (of the output) will be cast if necessary.
* **overwrite_input** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Accepted only for compatibility with NumPy; it has no effect.
* **interpolation** ( *{'linear'* *,* *'lower'* *,* *'higher'* *,* *'midpoint'* *,* *'nearest'}*) –
This optional parameter specifies the interpolation method to
use when the desired quantile lies between two data points
`i < j`:
* 'linear': `i + (j - i) * fraction`, where `fraction`
is the fractional part of the index surrounded by `i`
and `j`.
* 'lower': `i`.
* 'higher': `j`.
* 'nearest': `i` or `j`, whichever is nearest.
* 'midpoint': `(i + j) / 2`.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If this is set to True, the axes which are reduced are left in
the result as dimensions with size one. With this option, the
result will broadcast correctly against the original tensor a.
* **Returns:**
**quantile** – If q is a single quantile and axis=None, then the result
is a scalar. If multiple quantiles are given, first axis of
the result corresponds to the quantiles. The other axes are
the axes that remain after the reduction of a. If the input
contains integers or floats smaller than `float64`, the output
data-type is `float64`. Otherwise, the output data-type is the
same as that of the input. If out is specified, that tensor is
returned instead.
* **Return type:**
scalar or Tensor
#### SEE ALSO
[`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean)
[`percentile`](maxframe.tensor.percentile.md#maxframe.tensor.percentile)
: equivalent to quantile, but with q in the range [0, 100].
[`median`](maxframe.tensor.median.md#maxframe.tensor.median)
: equivalent to `quantile(..., 0.5)`
`nanquantile`
### Notes
Given a vector `V` of length `N`, the q-th quantile of
`V` is the value `q` of the way from the minimum to the
maximum in a sorted copy of `V`. The values and distances of
the two nearest neighbors as well as the interpolation parameter
will determine the quantile if the normalized ranking does not
match the location of `q` exactly. This function is the same as
the median if `q=0.5`, the same as the minimum if `q=0.0` and the
same as the maximum if `q=1.0`.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([[10, 7, 4], [3, 2, 1]])
>>> a.execute()
array([[10, 7, 4],
[ 3, 2, 1]])
>>> mt.quantile(a, 0.5).execute()
3.5
>>> mt.quantile(a, 0.5, axis=0).execute()
array([6.5, 4.5, 2.5])
>>> mt.quantile(a, 0.5, axis=1).execute()
array([7., 2.])
>>> mt.quantile(a, 0.5, axis=1, keepdims=True).execute()
array([[7.],
[2.]])
>>> m = mt.quantile(a, 0.5, axis=0)
>>> out = mt.zeros_like(m)
>>> mt.quantile(a, 0.5, axis=0, out=out).execute()
array([6.5, 4.5, 2.5])
>>> m.execute()
array([6.5, 4.5, 2.5])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.r_.md
# maxframe.tensor.r_
### maxframe.tensor.r_ *= <maxframe.tensor.lib.index_tricks.RClass object>*
Translates slice objects to concatenation along the first axis.
This is a simple way to build up tensors quickly. There are two use cases.
1. If the index expression contains comma separated tensors, then stack
them along their first axis.
2. If the index expression contains slice notation or scalars then create
a 1-D tensor with a range indicated by the slice notation.
If slice notation is used, the syntax `start:stop:step` is equivalent
to `mt.arange(start, stop, step)` inside of the brackets. However, if
`step` is an imaginary number (i.e. 100j) then its integer portion is
interpreted as a number-of-points desired and the start and stop are
inclusive. In other words `start:stop:stepj` is interpreted as
`mt.linspace(start, stop, step, endpoint=1)` inside of the brackets.
After expansion of slice notation, all comma separated sequences are
concatenated together.
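The way an indexing expression can carry both ranges and point counts is easiest to see in plain Python, where `obj[a:b:c]` passes a `slice` object to `__getitem__`. The class below is a hypothetical stand-in that mimics the slice rules described above on flat lists; it is not how MaxFrame implements `r_`.

```python
class SliceExpander:
    """Toy object that expands slices and concatenates the pieces."""

    def __getitem__(self, key):
        items = key if isinstance(key, tuple) else (key,)
        out = []
        for item in items:
            if isinstance(item, slice):
                out.extend(self._expand(item))
            elif isinstance(item, (list, tuple)):
                out.extend(item)
            else:
                out.append(item)
        return out

    @staticmethod
    def _expand(s):
        start = 0 if s.start is None else s.start
        if isinstance(s.step, complex):
            # Imaginary step: its magnitude is a point count, stop is inclusive.
            num = int(abs(s.step))
            if num == 1:
                return [float(start)]
            width = (s.stop - start) / (num - 1)
            return [start + i * width for i in range(num)]
        # Real step: half-open range, like arange(start, stop, step).
        step = 1 if s.step is None else s.step
        values, current = [], start
        while (step > 0 and current < s.stop) or (step < 0 and current > s.stop):
            values.append(current)
            current += step
        return values


r = SliceExpander()
print(r[0:6:2])  # [0, 2, 4]
print([round(v, 1) for v in r[-1:1:6j, [0] * 3, 5, 6]])
```

The second call mirrors the `mt.r_[-1:1:6j, [0]*3, 5, 6]` example: the `6j` step yields six evenly spaced points from -1 to 1 inclusive, and the remaining entries are appended in order.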
Optional character strings placed as the first element of the index
expression can be used to change the output. The strings ‘r’ or ‘c’ result
in matrix output. If the result is 1-D and ‘r’ is specified a 1 x N (row)
matrix is produced. If the result is 1-D and ‘c’ is specified, then a N x 1
(column) matrix is produced. If the result is 2-D then both provide the
same matrix result.
A string integer specifies which axis to stack multiple comma separated
tensors along. A string of two comma-separated integers allows indication
of the minimum number of dimensions to force each entry into as the
second integer (the axis to concatenate along is still the first integer).
A string with three comma-separated integers allows specification of the
axis to concatenate along, the minimum number of dimensions to force the
entries to, and which axis should contain the start of the tensors which
are less than the specified number of dimensions. In other words the third
integer allows you to specify where the 1’s should be placed in the shape
of the tensors that have their shapes upgraded. By default, they are placed
in the front of the shape tuple. The third argument allows you to specify
where the start of the tensor should be instead. Thus, a third argument of
‘0’ would place the 1’s at the end of the tensor shape. Negative integers
specify where in the new shape tuple the last dimension of upgraded tensors
should be placed, so the default is ‘-1’.
* **Parameters:**
Not a function, so takes no parameters.
* **Return type:**
A concatenated tensor or matrix.
#### SEE ALSO
[`concatenate`](maxframe.tensor.concatenate.md#maxframe.tensor.concatenate)
: Join a sequence of tensors along an existing axis.
[`c_`](maxframe.tensor.c_.md#maxframe.tensor.c_)
: Translates slice objects to concatenation along the second axis.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.r_[mt.array([1,2,3]), 0, 0, mt.array([4,5,6])].execute()
array([1, 2, 3, 0, 0, 4, 5, 6])
>>> mt.r_[-1:1:6j, [0]*3, 5, 6].execute()
array([-1. , -0.6, -0.2, 0.2, 0.6, 1. , 0. , 0. , 0. , 5. , 6. ])
```
String integers specify the axis to concatenate along or the minimum
number of dimensions to force entries into.
```pycon
>>> a = mt.array([[0, 1, 2], [3, 4, 5]])
>>> mt.r_['-1', a, a].execute() # concatenate along last axis
array([[0, 1, 2, 0, 1, 2],
[3, 4, 5, 3, 4, 5]])
>>> mt.r_['0,2', [1,2,3], [4,5,6]].execute() # concatenate along first axis, dim>=2
array([[1, 2, 3],
[4, 5, 6]])
```
```pycon
>>> mt.r_['0,2,0', [1,2,3], [4,5,6]].execute()
array([[1],
[2],
[3],
[4],
[5],
[6]])
>>> mt.r_['1,2,0', [1,2,3], [4,5,6]].execute()
array([[1, 4],
[2, 5],
[3, 6]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.rad2deg.md
# maxframe.tensor.rad2deg
### maxframe.tensor.rad2deg(x, out=None, where=None, \*\*kwargs)
Convert angles from radians to degrees.
* **Parameters:**
* **x** (*array_like*) – Angle in radians.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The corresponding angle in degrees.
* **Return type:**
Tensor
#### SEE ALSO
[`deg2rad`](maxframe.tensor.deg2rad.md#maxframe.tensor.deg2rad)
: Convert angles from degrees to radians.
### Notes
rad2deg(x) is `180 * x / pi`.
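The formula can be checked against the standard library. This is a scalar-only sketch; the local `rad2deg` function is a hypothetical re-implementation of the formula, not the MaxFrame function.

```python
import math


def rad2deg(x):
    """Scalar version of the formula in the note above."""
    return 180 * x / math.pi


# Compare with the standard-library conversion for a quarter turn.
print(math.isclose(rad2deg(math.pi / 2), math.degrees(math.pi / 2)))  # True
```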
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.rad2deg(mt.pi/2).execute()
90.0
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.radians.md
# maxframe.tensor.radians
### maxframe.tensor.radians(x, out=None, where=None, \*\*kwargs)
Convert angles from degrees to radians.
* **Parameters:**
* **x** (*array_like*) – Input tensor in degrees.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The corresponding radian values.
* **Return type:**
Tensor
#### SEE ALSO
[`deg2rad`](maxframe.tensor.deg2rad.md#maxframe.tensor.deg2rad)
: equivalent function
### Examples
Convert a degree array to radians
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> deg = mt.arange(12.) * 30.
>>> mt.radians(deg).execute()
array([ 0. , 0.52359878, 1.04719755, 1.57079633, 2.0943951 ,
2.61799388, 3.14159265, 3.66519143, 4.1887902 , 4.71238898,
5.23598776, 5.75958653])
```
```pycon
>>> out = mt.zeros((deg.shape))
>>> ret = mt.radians(deg, out)
>>> ret is out
True
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.RandomState.md
# maxframe.tensor.random.RandomState
### *class* maxframe.tensor.random.RandomState(seed=None)
#### \_\_init_\_(seed=None)
### Methods
| Method                                                               | Description                                                                                                           |
|----------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
| [`__init__`](#maxframe.tensor.random.RandomState.__init__)([seed])   |                                                                                                                       |
| `beta`(a, b[, size, chunk_size, gpu, dtype]) | Draw samples from a Beta distribution. |
| `binomial`(n, p[, size, chunk_size, gpu, dtype]) | Draw samples from a binomial distribution. |
| `bytes`(length) | Return random bytes. |
| `chisquare`(df[, size, chunk_size, gpu, dtype]) | Draw samples from a chi-square distribution. |
| `choice`(a[, size, replace, p, chunk_size, gpu]) | Generates a random sample from a given 1-D array |
| `dirichlet`(alpha[, size, chunk_size, gpu, dtype]) | Draw samples from the Dirichlet distribution. |
| `exponential`([scale, size, chunk_size, gpu, ...]) | Draw samples from an exponential distribution. |
| `f`(dfnum, dfden[, size, chunk_size, gpu, dtype]) | Draw samples from an F distribution. |
| `from_numpy`(np_random_state) | |
| `gamma`(shape[, scale, size, chunk_size, gpu, ...]) | Draw samples from a Gamma distribution. |
| `geometric`(p[, size, chunk_size, gpu, dtype]) | Draw samples from the geometric distribution. |
| `gumbel`([loc, scale, size, chunk_size, gpu, ...]) | Draw samples from a Gumbel distribution. |
| `hypergeometric`(ngood, nbad, nsample[, size, ...]) | Draw samples from a Hypergeometric distribution. |
| `laplace`([loc, scale, size, chunk_size, gpu, ...]) | Draw samples from the Laplace or double exponential distribution with specified location (or mean) and scale (decay). |
| `logistic`([loc, scale, size, chunk_size, ...]) | Draw samples from a logistic distribution. |
| `lognormal`([mean, sigma, size, chunk_size, ...]) | Draw samples from a log-normal distribution. |
| `logseries`(p[, size, chunk_size, gpu, dtype]) | Draw samples from a logarithmic series distribution. |
| `multinomial`(n, pvals[, size, chunk_size, ...]) | Draw samples from a multinomial distribution. |
| `multivariate_normal`(mean, cov[, size, ...]) | Draw random samples from a multivariate normal distribution. |
| `negative_binomial`(n, p[, size, chunk_size, ...]) | Draw samples from a negative binomial distribution. |
| `noncentral_chisquare`(df, nonc[, size, ...]) | Draw samples from a noncentral chi-square distribution. |
| `noncentral_f`(dfnum, dfden, nonc[, size, ...]) | Draw samples from the noncentral F distribution. |
| `normal`([loc, scale, size, chunk_size, gpu, ...]) | Draw random samples from a normal (Gaussian) distribution. |
| `pareto`(a[, size, chunk_size, gpu, dtype]) | Draw samples from a Pareto II or Lomax distribution with specified shape. |
| `permutation`(x[, axis, chunk_size]) | Randomly permute a sequence, or return a permuted range. |
| `poisson`([lam, size, chunk_size, gpu, dtype]) | Draw samples from a Poisson distribution. |
| `power`(a[, size, chunk_size, gpu, dtype]) | Draws samples in [0, 1] from a power distribution with positive exponent a - 1. |
| `rand`(\*dn, \*\*kw) | Random values in a given shape. |
| `randint`(low[, high, size, dtype, density, ...]) | Return random integers from low (inclusive) to high (exclusive). |
| `randn`(\*dn, \*\*kw) | Return a sample (or samples) from the "standard normal" distribution. |
| `random`([size, chunk_size, gpu, dtype]) | Return random floats in the half-open interval [0.0, 1.0). |
| `random_integers`(low[, high, size, ...]) | Random integers of type mt.int between low and high, inclusive. |
| `random_sample`([size, chunk_size, gpu, dtype]) | Return random floats in the half-open interval [0.0, 1.0). |
| `ranf`([size, chunk_size, gpu, dtype]) | Return random floats in the half-open interval [0.0, 1.0). |
| `rayleigh`([scale, size, chunk_size, gpu, dtype]) | Draw samples from a Rayleigh distribution. |
| `sample`([size, chunk_size, gpu, dtype]) | Return random floats in the half-open interval [0.0, 1.0). |
| `seed`([seed]) | Seed the generator. |
| `shuffle`(x[, axis]) | Modify a sequence in-place by shuffling its contents. |
| `standard_cauchy`([size, chunk_size, gpu, dtype]) | Draw samples from a standard Cauchy distribution with mode = 0. |
| `standard_exponential`([size, chunk_size, ...]) | Draw samples from the standard exponential distribution. |
| `standard_gamma`(shape[, size, chunk_size, ...]) | Draw samples from a standard Gamma distribution. |
| `standard_normal`([size, chunk_size, gpu, dtype]) | Draw samples from a standard Normal distribution (mean=0, stdev=1). |
| `standard_t`(df[, size, chunk_size, gpu, dtype]) | Draw samples from a standard Student's t distribution with df degrees of freedom. |
| `to_numpy`() | |
| `triangular`(left, mode, right[, size, ...]) | Draw samples from the triangular distribution over the interval `[left, right]`. |
| `uniform`([low, high, size, chunk_size, gpu, ...]) | Draw samples from a uniform distribution. |
| `vonmises`(mu, kappa[, size, chunk_size, gpu, ...]) | Draw samples from a von Mises distribution. |
| `wald`(mean, scale[, size, chunk_size, gpu, dtype]) | Draw samples from a Wald, or inverse Gaussian, distribution. |
| `weibull`(a[, size, chunk_size, gpu, dtype]) | Draw samples from a Weibull distribution. |
| `zipf`(a[, size, chunk_size, gpu, dtype]) | Draw samples from a Zipf distribution. |
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.beta.md
# maxframe.tensor.random.beta
### maxframe.tensor.random.beta(a, b, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Beta distribution.
The Beta distribution is a special case of the Dirichlet distribution,
and is related to the Gamma distribution. It has the probability
distribution function
$$
f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1}
(1 - x)^{\beta - 1},
$$
where the normalisation, B, is the beta function,
$$
B(\alpha, \beta) = \int_0^1 t^{\alpha - 1}
(1 - t)^{\beta - 1} dt.
$$
It is often seen in Bayesian inference and order statistics.
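The relation to the Gamma distribution can be illustrated with the standard library: if X and Y are independent Gamma variates with shapes alpha and beta and a common scale, then X / (X + Y) follows a Beta(alpha, beta) distribution. The sketch below uses Python's `random` module rather than MaxFrame, and `beta_from_gammas` is a hypothetical helper.

```python
import random


def beta_from_gammas(alpha, beta, rng):
    """Draw one Beta(alpha, beta) sample as a ratio of two Gamma draws."""
    x = rng.gammavariate(alpha, 1.0)
    y = rng.gammavariate(beta, 1.0)
    return x / (x + y)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [beta_from_gammas(2.0, 5.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)

# The sample mean should be close to alpha / (alpha + beta) = 2/7.
print(mean, 2.0 / (2.0 + 5.0))
```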
* **Parameters:**
* **a** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Alpha, non-negative.
* **b** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Beta, non-negative.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `a` and `b` are both scalars.
Otherwise, `mt.broadcast(a, b).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized beta distribution.
* **Return type:**
Tensor or scalar
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.binomial.md
# maxframe.tensor.random.binomial
### maxframe.tensor.random.binomial(n, p, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a binomial distribution.
Samples are drawn from a binomial distribution with specified
parameters, n trials and p probability of success, where
n is an integer >= 0 and p is in the interval [0, 1]. (n may be
given as a float, but it is truncated to an integer in use.)
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array_like* *of* *ints*) – Parameter of the distribution, >= 0. Floats are also accepted,
but they will be truncated to integers.
* **p** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Parameter of the distribution, >= 0 and <=1.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `n` and `p` are both scalars.
Otherwise, `mt.broadcast(n, p).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized binomial distribution, where
each sample is equal to the number of successes over the n trials.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.binom`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html#scipy.stats.binom)
: probability density function, distribution or cumulative density function, etc.
### Notes
The probability mass function of the binomial distribution is
$$
P(N) = \binom{n}{N}p^N(1-p)^{n-N},
$$
where $n$ is the number of trials, $p$ is the probability
of success, and $N$ is the number of successes.
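The formula can be evaluated directly with the Python standard library. The following is a minimal sketch, not part of MaxFrame; the helper name `binomial_pmf` is an illustrative assumption.

```python
import math


def binomial_pmf(successes: int, n: int, p: float) -> float:
    """P(N = successes) for n independent trials with success probability p."""
    return math.comb(n, successes) * p**successes * (1 - p) ** (n - successes)


# The probabilities over all possible outcomes 0..n must sum to 1.
n, p = 10, 0.5
pmf_values = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf_values)
```

Here `total` should equal 1 up to floating-point rounding, which is a quick sanity check on the formula.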
When estimating the standard error of a proportion in a population by
using a random sample, the normal distribution works well unless the
product p\*n <= 5, where p = population proportion estimate, and n =
number of samples, in which case the binomial distribution is used
instead. For example, a sample of 15 people shows 4 who are left
handed, and 11 who are right handed. Then p = 4/15 = 27%. Since
0.27\*15 ≈ 4 is below 5, the binomial distribution should be used in
this case.
### References
* <a id='id1'>**[1]**</a> Dalgaard, Peter, “Introductory Statistics with R”, Springer-Verlag, 2002.
* <a id='id2'>**[2]**</a> Glantz, Stanton A. “Primer of Biostatistics.”, McGraw-Hill, Fifth Edition, 2002.
* <a id='id3'>**[3]**</a> Lentner, Marvin, “Elementary Applied Statistics”, Bogden and Quigley, 1972.
* <a id='id4'>**[4]**</a> Weisstein, Eric W. “Binomial Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/BinomialDistribution.html](http://mathworld.wolfram.com/BinomialDistribution.html)
* <a id='id5'>**[5]**</a> Wikipedia, “Binomial distribution”, [http://en.wikipedia.org/wiki/Binomial_distribution](http://en.wikipedia.org/wiki/Binomial_distribution)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> n, p = 10, .5 # number of trials, probability of each trial
>>> s = mt.random.binomial(n, p, 1000).execute()
# result of flipping a coin 10 times, tested 1000 times.
```
A real world example. A company drills 9 wild-cat oil exploration
wells, each with an estimated probability of success of 0.1. All nine
wells fail. What is the probability of that happening?
Let’s do 20,000 trials of the model, and count the number that
generate zero positive results.
```pycon
>>> (mt.sum(mt.random.binomial(9, 0.1, 20000) == 0)/20000.).execute()
# answer = 0.38885, or about 39%.
```
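The simulated figure can be cross-checked against the closed form: nine independent wells all fail with probability (1 - 0.1)^9. The sketch below uses only the standard library `random` module rather than MaxFrame, with a fixed seed chosen arbitrarily for repeatability.

```python
import random

# Closed form: probability that all 9 independent wells fail.
exact = (1 - 0.1) ** 9

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 20000
all_fail = sum(
    # Each well succeeds with probability 0.1, so it fails when the draw is >= 0.1.
    all(rng.random() >= 0.1 for _ in range(9))
    for _ in range(trials)
)
estimate = all_fail / trials
```

`exact` is about 0.3874, and `estimate` should land close to it, differing only by sampling noise.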
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.bytes.md
# maxframe.tensor.random.bytes
### maxframe.tensor.random.bytes(length)
Return random bytes.
* **Parameters:**
* **length** – Number of random bytes.
* **Returns:**
**out** – String of length `length`.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.bytes(10)
' eh\x85\x022SZ\xbf\xa4' #random
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.chisquare.md
# maxframe.tensor.random.chisquare
### maxframe.tensor.random.chisquare(df, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a chi-square distribution.
When df independent random variables, each with standard normal
distributions (mean 0, variance 1), are squared and summed, the
resulting distribution is chi-square (see Notes). This distribution
is often used in hypothesis testing.
* **Parameters:**
* **df** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Number of degrees of freedom, should be > 0.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `df` is a scalar. Otherwise,
`mt.array(df).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized chi-square distribution.
* **Return type:**
Tensor or scalar
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – When df <= 0 or when an inappropriate size (e.g. `size=-1`)
is given.
### Notes
The variable obtained by summing the squares of df independent,
standard normally distributed random variables:
$$
Q = \sum_{i=1}^{\mathtt{df}} X^2_i
$$
is chi-square distributed, denoted
$$
Q \sim \chi^2_k,
$$
where $k$ = df is the number of degrees of freedom.
The probability density function of the chi-squared distribution is
$$
p(x) = \frac{(1/2)^{k/2}}{\Gamma(k/2)}
x^{k/2 - 1} e^{-x/2},
$$
where $\Gamma$ is the gamma function,
$$
\Gamma(x) = \int_0^{\infty} t^{x - 1} e^{-t} dt.
$$
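Both the sum-of-squares construction and the density can be sketched with the standard library. This is an illustration only, not MaxFrame code; `chi2_pdf` and `chi2_sample` are hypothetical helper names.

```python
import math
import random


def chi2_pdf(x: float, k: float) -> float:
    """Chi-square density with k degrees of freedom, valid for x > 0."""
    return 0.5 ** (k / 2) / math.gamma(k / 2) * x ** (k / 2 - 1) * math.exp(-x / 2)


def chi2_sample(k: int, rng: random.Random) -> float:
    """Sum of squares of k independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)  # fixed seed for repeatability
k = 4
samples = [chi2_sample(k, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)  # the mean of a chi-square variable is k
```

For `k = 2` the density reduces to `0.5 * exp(-x / 2)`, which gives a simple check on `chi2_pdf`.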
### References
* <a id='id1'>**[1]**</a> NIST “Engineering Statistics Handbook” [http://www.itl.nist.gov/div898/handbook/eda/section3/eda3666.htm](http://www.itl.nist.gov/div898/handbook/eda/section3/eda3666.htm)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.chisquare(2,4).execute()
array([ 1.89920014, 9.00867716, 3.13710533, 5.62318272])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.choice.md
# maxframe.tensor.random.choice
### maxframe.tensor.random.choice(a, size=None, replace=True, p=None, chunk_size=None, gpu=None)
Generates a random sample from a given 1-D array.
* **Parameters:**
* **a** (*1-D array-like* *or* [*int*](https://docs.python.org/3/library/functions.html#int)) – If a tensor, a random sample is generated from its elements.
If an int, the random sample is generated as if a were `mt.arange(a)`.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **replace** (*boolean* *,* *optional*) – Whether to sample with replacement (True, the default) or without replacement (False).
* **p** (*1-D array-like* *,* *optional*) – The probabilities associated with each entry in a.
If not given the sample assumes a uniform distribution over all
entries in a.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **Returns:**
**samples** – The generated random samples
* **Return type:**
single item or tensor
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If a is an int and less than zero, if a or p are not 1-dimensional,
if a is an array-like of size 0, if p is not a vector of
probabilities, if a and p have different lengths, or if
replace=False and the sample size is greater than the population
size
#### SEE ALSO
[`randint`](maxframe.tensor.random.randint.md#maxframe.tensor.random.randint), `shuffle`, [`permutation`](maxframe.tensor.random.permutation.md#maxframe.tensor.random.permutation)
### Examples
Generate a uniform random sample from mt.arange(5) of size 3:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.choice(5, 3).execute()
array([0, 3, 4])
>>> #This is equivalent to mt.random.randint(0,5,3)
```
Generate a non-uniform random sample from mt.arange(5) of size 3:
```pycon
>>> mt.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0]).execute()
array([3, 3, 0])
```
Generate a uniform random sample from mt.arange(5) of size 3 without
replacement:
```pycon
>>> mt.random.choice(5, 3, replace=False).execute()
array([3, 1, 0])
>>> #This is equivalent to mt.random.permutation(mt.arange(5))[:3]
```
Generate a non-uniform random sample from mt.arange(5) of size
3 without replacement:
```pycon
>>> mt.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0]).execute()
array([2, 3, 0])
```
Any of the above can be repeated with an arbitrary array-like
instead of just integers. For instance, with NumPy:
```pycon
>>> import numpy as np
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],
      dtype='<U11')
```
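The meaning of `replace` and `p` can be illustrated without MaxFrame by using the standard library `random` module. This is only a sketch of the sampling semantics, not of how MaxFrame implements `choice`.

```python
import random

rng = random.Random(0)             # fixed seed for repeatability
population = list(range(5))        # plays the role of mt.arange(5)
p = [0.1, 0.0, 0.3, 0.6, 0.0]      # same weights as in the examples above

# With replacement: entries may repeat, and zero-weight entries never appear.
with_replacement = rng.choices(population, weights=p, k=3)

# Uniform, without replacement: three distinct entries.
without_replacement = rng.sample(population, 3)
```

Sampling without replacement requires the sample size to be no larger than the population, which matches the `ValueError` condition listed above.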
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.dirichlet.md
# maxframe.tensor.random.dirichlet
### maxframe.tensor.random.dirichlet(alpha, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from the Dirichlet distribution.
Draw size samples of dimension k from a Dirichlet distribution. A
Dirichlet-distributed random variable can be seen as a multivariate
generalization of a Beta distribution. Dirichlet pdf is the conjugate
prior of a multinomial in Bayesian inference.
* **Parameters:**
* **alpha** (*array*) – Parameter of the distribution (k dimension for sample of
dimension k).
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**samples** – The drawn samples, of shape (size, k), where k is the length of alpha.
* **Return type:**
Tensor
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If any value in alpha is less than or equal to zero
### Notes
The probability density of a Dirichlet-distributed random vector is
proportional to
$$
p(x) \propto \prod_{i=1}^{k}{x^{\alpha_i-1}_i}
$$
Uses the following property for computation: for each dimension,
draw a random sample y_i from a standard gamma generator of shape
alpha_i, then
$X = \frac{1}{\sum_{i=1}^k{y_i}} (y_1, \ldots, y_k)$ is
Dirichlet distributed.
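The gamma-normalization property can be sketched with the standard library. The function name `dirichlet_sample` is an illustrative assumption, and this is not the MaxFrame implementation.

```python
import random


def dirichlet_sample(alpha, rng):
    """One Dirichlet draw: normalize independent standard-gamma variates."""
    y = [rng.gammavariate(a, 1.0) for a in alpha]  # shape alpha_i, scale 1
    total = sum(y)
    return [v / total for v in y]


rng = random.Random(0)  # fixed seed for repeatability
x = dirichlet_sample((10, 5, 3), rng)
```

Each draw has one component per entry of `alpha`, every component is positive, and the components sum to 1.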
### References
* <a id='id1'>**[1]**</a> David MacKay, “Information Theory, Inference and Learning Algorithms,” chapter 23, [http://www.inference.phy.cam.ac.uk/mackay/](http://www.inference.phy.cam.ac.uk/mackay/)
* <a id='id2'>**[2]**</a> Wikipedia, “Dirichlet distribution”, [http://en.wikipedia.org/wiki/Dirichlet_distribution](http://en.wikipedia.org/wiki/Dirichlet_distribution)
### Examples
Taking an example cited in Wikipedia, this distribution can be used if
one wanted to cut strings (each of initial length 1.0) into K pieces
with different lengths, where each piece had a designated average
length, but allowing some variation in the relative sizes of the
pieces.
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> s = mt.random.dirichlet((10, 5, 3), 20).transpose()
```
```pycon
>>> import matplotlib.pyplot as plt
```
```pycon
>>> plt.barh(range(20), s[0].execute())
>>> plt.barh(range(20), s[1].execute(), left=s[0].execute(), color='g')
>>> plt.barh(range(20), s[2].execute(), left=(s[0]+s[1]).execute(), color='r')
>>> plt.title("Lengths of Strings")
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.exponential.md
# maxframe.tensor.random.exponential
### maxframe.tensor.random.exponential(scale=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from an exponential distribution.
Its probability density function is
$$
f(x; \frac{1}{\beta}) = \frac{1}{\beta} \exp(-\frac{x}{\beta}),
$$
for `x > 0` and 0 elsewhere. $\beta$ is the scale parameter,
which is the inverse of the rate parameter $\lambda = 1/\beta$.
The rate parameter is an alternative, widely used parameterization
of the exponential distribution <sup>[3](#id6)</sup>.
The exponential distribution is a continuous analogue of the
geometric distribution. It describes many common situations, such as
the size of raindrops measured over many rainstorms <sup>[1](#id4)</sup>, or the time
between page requests to Wikipedia <sup>[2](#id5)</sup>.
* **Parameters:**
* **scale** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – The scale parameter, $\beta = 1/\lambda$.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `scale` is a scalar. Otherwise,
`np.array(scale).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized exponential distribution.
* **Return type:**
Tensor or scalar
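### Notes
The difference between the scale and rate parameterizations is easy to get wrong. The sketch below uses the standard library `random.expovariate`, which takes the rate $\lambda$, whereas `scale` above is $\beta = 1/\lambda$. It is an illustration only and does not call MaxFrame.

```python
import random

scale = 2.0          # beta, which is also the mean of the distribution
rate = 1.0 / scale   # lambda, which is what random.expovariate expects

rng = random.Random(0)  # fixed seed for repeatability
samples = [rng.expovariate(rate) for _ in range(100000)]
sample_mean = sum(samples) / len(samples)  # should be close to scale
```

Passing the scale where a rate is expected (or the reverse) would give a sample mean near 0.5 here instead of 2.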
### References
* <a id='id4'>**[1]**</a> Peyton Z. Peebles Jr., “Probability, Random Variables and Random Signal Principles”, 4th ed, 2001, p. 57.
* <a id='id5'>**[2]**</a> Wikipedia, “Poisson process”, [http://en.wikipedia.org/wiki/Poisson_process](http://en.wikipedia.org/wiki/Poisson_process)
* <a id='id6'>**[3]**</a> Wikipedia, “Exponential distribution”, [http://en.wikipedia.org/wiki/Exponential_distribution](http://en.wikipedia.org/wiki/Exponential_distribution)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.f.md
# maxframe.tensor.random.f
### maxframe.tensor.random.f(dfnum, dfden, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from an F distribution.
Samples are drawn from an F distribution with specified parameters,
dfnum (degrees of freedom in numerator) and dfden (degrees of
freedom in denominator), where both parameters should be greater than
zero.
The random variate of the F distribution (also known as the
Fisher distribution) is a continuous probability distribution
that arises in ANOVA tests, and is the ratio of two chi-square
variates.
* **Parameters:**
* **dfnum** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Degrees of freedom in numerator, should be > 0.
* **dfden** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float)) – Degrees of freedom in denominator, should be > 0.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `dfnum` and `dfden` are both scalars.
Otherwise, `np.broadcast(dfnum, dfden).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Fisher distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.f`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f.html#scipy.stats.f)
: probability density function, distribution or cumulative density function, etc.
### Notes
The F statistic is used to compare in-group variances to between-group
variances. Calculating the distribution depends on the sampling, and
so it is a function of the respective degrees of freedom in the
problem. The variable dfnum is the number of groups (samples) minus
one, the between-groups degrees of freedom, while dfden is the
within-groups degrees of freedom, the total number of observations
across all groups minus the number of groups.
### References
* <a id='id1'>**[1]**</a> Glantz, Stanton A. “Primer of Biostatistics.”, McGraw-Hill, Fifth Edition, 2002.
* <a id='id2'>**[2]**</a> Wikipedia, “F-distribution”, [http://en.wikipedia.org/wiki/F-distribution](http://en.wikipedia.org/wiki/F-distribution)
### Examples
An example from Glantz[1], pp 47-40:
Two groups, children of diabetics (25 people) and children from people
without diabetes (25 controls). Fasting blood glucose was measured,
case group had a mean value of 86.1, controls had a mean value of
82.2. Standard deviations were 2.09 and 2.49 respectively. Are these
data consistent with the null hypothesis that the parents' diabetic
status does not affect their children’s blood glucose levels?
Calculating the F statistic from the data gives a value of 36.01.
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> dfnum = 1. # between group degrees of freedom
>>> dfden = 48. # within groups degrees of freedom
>>> s = mt.random.f(dfnum, dfden, 1000).execute()
```
The lower bound for the top 1% of the samples is:
```pycon
>>> sorted(s)[-10]
7.61988120985
```
So there is about a 1% chance that the F statistic will exceed 7.62.
The measured value is 36, so the null hypothesis is rejected at the 1%
level.
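The degrees of freedom and the F statistic of roughly 36 can be approximately reproduced from the summary figures quoted in the example. The sketch assumes the usual one-way ANOVA formulas with equal group sizes; the small gap from the quoted 36.01 is presumably due to rounding in the reported means and standard deviations.

```python
n_per_group = 25
means = [86.1, 82.2]   # group means from the example
stds = [2.09, 2.49]    # group standard deviations from the example
groups = len(means)

dfnum = groups - 1                      # between-groups degrees of freedom
dfden = groups * n_per_group - groups   # within-groups degrees of freedom

grand_mean = sum(means) / groups        # valid because group sizes are equal
# Mean square between groups and mean square within groups.
ms_between = n_per_group * sum((m - grand_mean) ** 2 for m in means) / dfnum
ms_within = sum((n_per_group - 1) * s**2 for s in stds) / dfden
f_stat = ms_between / ms_within         # roughly 36
```

This gives `dfnum = 1` and `dfden = 48`, the values passed to `mt.random.f` above.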
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.gamma.md
# maxframe.tensor.random.gamma
### maxframe.tensor.random.gamma(shape, scale=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Gamma distribution.
Samples are drawn from a Gamma distribution with specified parameters,
shape (sometimes designated “k”) and scale (sometimes designated
“theta”), where both parameters are > 0.
* **Parameters:**
* **shape** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – The shape of the gamma distribution. Should be greater than zero.
* **scale** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – The scale of the gamma distribution. Should be greater than zero.
Default is equal to 1.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `shape` and `scale` are both scalars.
Otherwise, `np.broadcast(shape, scale).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized gamma distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.gamma`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html#scipy.stats.gamma)
: probability density function, distribution or cumulative density function, etc.
### Notes
The probability density for the Gamma distribution is
$$
p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},
$$
where $k$ is the shape and $\theta$ the scale,
and $\Gamma$ is the Gamma function.
The Gamma distribution is often used to model the times to failure of
electronic components, and arises naturally in processes for which the
waiting times between Poisson distributed events are relevant.
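As a quick check of the density and of the shape/scale convention, the sketch below evaluates the formula and draws samples with the standard library `random.gammavariate`, whose second argument is also a scale. The helper name `gamma_pdf` is an illustrative assumption, and no MaxFrame call is involved.

```python
import math
import random


def gamma_pdf(x: float, k: float, theta: float) -> float:
    """Gamma density with shape k and scale theta, valid for x > 0."""
    return x ** (k - 1) * math.exp(-x / theta) / (theta**k * math.gamma(k))


shape, scale = 2.0, 2.0
rng = random.Random(0)  # fixed seed for repeatability
samples = [rng.gammavariate(shape, scale) for _ in range(100000)]
sample_mean = sum(samples) / len(samples)  # the mean is shape * scale = 4
```

With these parameters the distribution has mean `shape * scale` and variance `shape * scale**2`.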
### References
* <a id='id1'>**[1]**</a> Weisstein, Eric W. “Gamma Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/GammaDistribution.html](http://mathworld.wolfram.com/GammaDistribution.html)
* <a id='id2'>**[2]**</a> Wikipedia, “Gamma distribution”, [http://en.wikipedia.org/wiki/Gamma_distribution](http://en.wikipedia.org/wiki/Gamma_distribution)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> shape, scale = 2., 2. # mean=4, std=2*sqrt(2)
>>> s = mt.random.gamma(shape, scale, 1000).execute()
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> import numpy as np
>>> count, bins, ignored = plt.hist(s, 50, density=True)
>>> y = bins**(shape-1)*(np.exp(-bins/scale) /
... (sps.gamma(shape)*scale**shape))
>>> plt.plot(bins, y, linewidth=2, color='r')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.geometric.md
# maxframe.tensor.random.geometric
### maxframe.tensor.random.geometric(p, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from the geometric distribution.
Bernoulli trials are experiments with one of two outcomes:
success or failure (an example of such an experiment is flipping
a coin). The geometric distribution models the number of trials
that must be run in order to achieve success. It is therefore
supported on the positive integers, `k = 1, 2, ...`.
The probability mass function of the geometric distribution is
$$
f(k) = (1 - p)^{k - 1} p
$$
where p is the probability of success of an individual trial.
* **Parameters:**
* **p** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – The probability of success of an individual trial.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `p` is a scalar. Otherwise,
`mt.array(p).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized geometric distribution.
* **Return type:**
Tensor or scalar
### Examples
Draw ten thousand values from the geometric distribution,
with the probability of an individual success equal to 0.35:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> z = mt.random.geometric(p=0.35, size=10000)
```
How many trials succeeded after a single run?
```pycon
>>> ((z == 1).sum() / 10000.).execute()
0.34889999999999999 #random
```
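The result is close to 0.35 because the probability mass at `k = 1` is exactly `p`. A standard-library sketch of the mass function and of inverse-transform sampling follows; the helper names are illustrative assumptions, and this is not necessarily how MaxFrame draws its samples.

```python
import math
import random


def geometric_pmf(k: int, p: float) -> float:
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p


def geometric_sample(p: float, rng: random.Random) -> int:
    """Inverse-transform sampling from a uniform draw on [0, 1)."""
    u = rng.random()
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))


rng = random.Random(0)  # fixed seed for repeatability
p = 0.35
draws = [geometric_sample(p, rng) for _ in range(10000)]
fraction_first_try = sum(d == 1 for d in draws) / len(draws)
```

`fraction_first_try` should be near `geometric_pmf(1, p)`, which equals `p`.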
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.gumbel.md
# maxframe.tensor.random.gumbel
### maxframe.tensor.random.gumbel(loc=0.0, scale=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Gumbel distribution.
Draw samples from a Gumbel distribution with specified location and
scale. For more information on the Gumbel distribution, see
Notes and References below.
* **Parameters:**
* **loc** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – The location of the mode of the distribution. Default is 0.
* **scale** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – The scale parameter of the distribution. Default is 1.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `loc` and `scale` are both scalars.
Otherwise, `np.broadcast(loc, scale).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Gumbel distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.gumbel_l`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gumbel_l.html#scipy.stats.gumbel_l), [`scipy.stats.gumbel_r`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gumbel_r.html#scipy.stats.gumbel_r), [`scipy.stats.genextreme`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.genextreme.html#scipy.stats.genextreme), [`weibull`](maxframe.tensor.random.weibull.md#maxframe.tensor.random.weibull)
### Notes
The Gumbel (or Smallest Extreme Value (SEV) or the Smallest Extreme
Value Type I) distribution is one of a class of Generalized Extreme
Value (GEV) distributions used in modeling extreme value problems.
The Gumbel is a special case of the Extreme Value Type I distribution
for maximums from distributions with “exponential-like” tails.
The probability density for the Gumbel distribution is
$$
p(x) = \frac{e^{-(x - \mu)/ \beta}}{\beta} e^{ -e^{-(x - \mu)/
\beta}},
$$
where $\mu$ is the mode, a location parameter, and
$\beta$ is the scale parameter.
The Gumbel (named for German mathematician Emil Julius Gumbel) was used
very early in the hydrology literature, for modeling the occurrence of
flood events. It is also used for modeling maximum wind speed and
rainfall rates. It is a “fat-tailed” distribution - the probability of
an event in the tail of the distribution is larger than if one used a
Gaussian, hence the surprisingly frequent occurrence of 100-year
floods. Floods were initially modeled as a Gaussian process, which
underestimated the frequency of extreme events.
It is one of a class of extreme value distributions, the Generalized
Extreme Value (GEV) distributions, which also includes the Weibull and
Frechet.
The function has a mean of $\mu + 0.57721\beta$ and a variance
of $\frac{\pi^2}{6}\beta^2$.
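The stated mean and variance can be checked empirically. The sketch below draws Gumbel variates by inverting the cumulative distribution function $\exp(-e^{-(x-\mu)/\beta})$ using only the standard library; `gumbel_sample` is a hypothetical helper and not a MaxFrame function.

```python
import math
import random


def gumbel_sample(mu: float, beta: float, rng: random.Random) -> float:
    """Inverse-transform draw: x = mu - beta * ln(-ln(u)) for u in (0, 1)."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return mu - beta * math.log(-math.log(u))


mu, beta = 0.0, 0.1
rng = random.Random(0)  # fixed seed for repeatability
samples = [gumbel_sample(mu, beta, rng) for _ in range(100000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

expected_mean = mu + 0.57721 * beta       # mu plus Euler's constant times beta
expected_var = math.pi**2 / 6 * beta**2
```

The sample statistics should agree with `expected_mean` and `expected_var` up to sampling noise.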
### References
* <a id='id1'>**[1]**</a> Gumbel, E. J., “Statistics of Extremes,” New York: Columbia University Press, 1958.
* <a id='id2'>**[2]**</a> Reiss, R.-D. and Thomas, M., “Statistical Analysis of Extreme Values from Insurance, Finance, Hydrology and Other Fields,” Basel: Birkhauser Verlag, 2001.
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mu, beta = 0, 0.1 # location and scale
>>> s = mt.random.gumbel(mu, beta, 1000).execute()
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> plt.plot(bins, (1/beta)*np.exp(-(bins - mu)/beta)
... * np.exp( -np.exp( -(bins - mu) /beta) ),
... linewidth=2, color='r')
>>> plt.show()
```
Show how an extreme value distribution can arise from a Gaussian process
and compare to a Gaussian:
```pycon
>>> means = []
>>> maxima = []
>>> for i in range(0, 1000):
...     a = mt.random.normal(mu, beta, 1000)
...     means.append(a.mean().execute())
...     maxima.append(a.max().execute())
>>> count, bins, ignored = plt.hist(maxima, 30, density=True)
>>> beta = mt.std(maxima) * mt.sqrt(6) / mt.pi
>>> mu = mt.mean(maxima) - 0.57721*beta
>>> plt.plot(bins, ((1/beta)*mt.exp(-(bins - mu)/beta)
... * mt.exp(-mt.exp(-(bins - mu)/beta))).execute(),
... linewidth=2, color='r')
>>> plt.plot(bins, (1/(beta * mt.sqrt(2 * mt.pi))
... * mt.exp(-(bins - mu)**2 / (2 * beta**2))).execute(),
... linewidth=2, color='g')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.hypergeometric.md
# maxframe.tensor.random.hypergeometric
### maxframe.tensor.random.hypergeometric(ngood, nbad, nsample, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Hypergeometric distribution.
Samples are drawn from a hypergeometric distribution with specified
parameters, ngood (ways to make a good selection), nbad (ways to make
a bad selection), and nsample = number of items sampled, which is less
than or equal to the sum ngood + nbad.
* **Parameters:**
* **ngood** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array_like* *of* *ints*) – Number of ways to make a good selection. Must be nonnegative.
* **nbad** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array_like* *of* *ints*) – Number of ways to make a bad selection. Must be nonnegative.
* **nsample** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array_like* *of* *ints*) – Number of items sampled. Must be at least 1 and at most
`ngood + nbad`.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `ngood`, `nbad`, and `nsample`
are all scalars. Otherwise, `np.broadcast(ngood, nbad, nsample).size`
samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized hypergeometric distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.hypergeom`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.hypergeom.html#scipy.stats.hypergeom)
: probability density function, distribution or cumulative density function, etc.
### Notes
The probability mass function for the Hypergeometric distribution is
$$
P(x) = \frac{\binom{g}{x}\binom{b}{n-x}}{\binom{g+b}{n}},
$$
where $0 \le x \le n$ and $n-b \le x \le g$
for P(x) the probability of x good results in the drawn sample,
g = ngood, b = nbad, and n = nsample.
Consider an urn with black and white marbles in it, ngood of them
black and nbad of them white. If you draw nsample marbles without
replacement, then the hypergeometric distribution describes the
distribution of black marbles in the drawn sample.
Note that this distribution is very similar to the binomial
distribution, except that in this case, samples are drawn without
replacement, whereas in the Binomial case samples are drawn with
replacement (or the sample space is infinite). As the sample space
becomes large, this distribution approaches the binomial.
### References
* <a id='id1'>**[1]**</a> Lentner, Marvin, “Elementary Applied Statistics”, Bogden and Quigley, 1972.
* <a id='id2'>**[2]**</a> Weisstein, Eric W. “Hypergeometric Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/HypergeometricDistribution.html](http://mathworld.wolfram.com/HypergeometricDistribution.html)
* <a id='id3'>**[3]**</a> Wikipedia, “Hypergeometric distribution”, [http://en.wikipedia.org/wiki/Hypergeometric_distribution](http://en.wikipedia.org/wiki/Hypergeometric_distribution)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> import matplotlib.pyplot as plt
>>> # number of good, number of bad, and number of samples
>>> ngood, nbad, nsamp = 100, 2, 10
>>> s = mt.random.hypergeometric(ngood, nbad, nsamp, 1000)
>>> plt.hist(s.execute())
>>> # note that it is very unlikely to grab both bad items
```
Suppose you have an urn with 15 white and 15 black marbles.
If you pull 15 marbles at random, how likely is it that
12 or more of them are one color?
```pycon
>>> s = mt.random.hypergeometric(15, 15, 15, 100000)
>>> (mt.sum(s>=12)/100000. + mt.sum(s<=3)/100000.).execute()
>>> # answer = 0.003 ... pretty unlikely!
```
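The simulated answer can be cross-checked against the exact probability mass function. The sketch below is plain Python built on `math.comb`, not a `maxframe` call; `hypergeom_pmf` is an illustrative helper name, and `x` counts the good items in the drawn sample.

```python
from math import comb


def hypergeom_pmf(x, ngood, nbad, nsample):
    """P(exactly x good items when drawing nsample items without replacement)."""
    if x < max(0, nsample - nbad) or x > min(nsample, ngood):
        return 0.0
    return comb(ngood, x) * comb(nbad, nsample - x) / comb(ngood + nbad, nsample)


# Urn with 15 white and 15 black marbles, 15 drawn:
# probability that 12 or more of them are one color.
tail = sum(hypergeom_pmf(x, 15, 15, 15) for x in (0, 1, 2, 3, 12, 13, 14, 15))
print(round(tail, 4))  # 0.0028
```

The same helper gives the chance of grabbing both bad items in the first example as `hypergeom_pmf(8, 100, 2, 10)`, a little under 1%.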
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.laplace.md
# maxframe.tensor.random.laplace
### maxframe.tensor.random.laplace(loc=0.0, scale=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from the Laplace or double exponential distribution with
specified location (or mean) and scale (decay).
The Laplace distribution is similar to the Gaussian/normal distribution,
but is sharper at the peak and has fatter tails. It represents the
difference between two independent, identically distributed exponential
random variables.
* **Parameters:**
* **loc** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – The position, $\mu$, of the distribution peak. Default is 0.
* **scale** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – $\lambda$, the exponential decay. Default is 1.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `loc` and `scale` are both scalars.
Otherwise, `np.broadcast(loc, scale).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Laplace distribution.
* **Return type:**
Tensor or scalar
### Notes
It has the probability density function
$$
f(x; \mu, \lambda) = \frac{1}{2\lambda}
\exp\left(-\frac{|x - \mu|}{\lambda}\right).
$$
The first law of Laplace, from 1774, states that the frequency
of an error can be expressed as an exponential function of the
absolute magnitude of the error, which leads to the Laplace
distribution. For many problems in economics and health
sciences, this distribution seems to model the data better
than the standard Gaussian distribution.
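The description above characterises the Laplace distribution as the difference of two independent, identically distributed exponential variables. A small plain-Python sketch of that construction follows (standard library only, not the `maxframe` sampler); `sample_laplace` is a hypothetical helper name.

```python
import random
import statistics


def sample_laplace(loc, scale, n, rng):
    """Laplace(loc, scale) variates as loc + (E1 - E2), with E1, E2 ~ Exp(mean=scale)."""
    rate = 1.0 / scale  # expovariate takes the rate, i.e. 1 / mean
    return [loc + rng.expovariate(rate) - rng.expovariate(rate) for _ in range(n)]


rng = random.Random(0)
draws = sample_laplace(0.0, 1.0, 100_000, rng)
# Theory: mean = loc, variance = 2 * scale**2.
print(statistics.fmean(draws), statistics.pvariance(draws))
```

The printed mean and variance should sit near 0 and 2, the theoretical values for `loc = 0` and `scale = 1`.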
### References
* <a id='id1'>**[1]**</a> Abramowitz, M. and Stegun, I. A. (Eds.). “Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing,” New York: Dover, 1972.
* <a id='id2'>**[2]**</a> Kotz, Samuel, et al. “The Laplace Distribution and Generalizations,” Birkhauser, 2001.
* <a id='id3'>**[3]**</a> Weisstein, Eric W. “Laplace Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/LaplaceDistribution.html](http://mathworld.wolfram.com/LaplaceDistribution.html)
* <a id='id4'>**[4]**</a> Wikipedia, “Laplace distribution”, [http://en.wikipedia.org/wiki/Laplace_distribution](http://en.wikipedia.org/wiki/Laplace_distribution)
### Examples
Draw samples from the distribution
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> loc, scale = 0., 1.
>>> s = mt.random.laplace(loc, scale, 1000)
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s.execute(), 30, density=True)
>>> x = mt.arange(-8., 8., .01)
>>> pdf = mt.exp(-abs(x-loc)/scale)/(2.*scale)
>>> plt.plot(x.execute(), pdf.execute())
```
Plot Gaussian for comparison:
```pycon
>>> g = (1/(scale * mt.sqrt(2 * mt.pi)) *
... mt.exp(-(x - loc)**2 / (2 * scale**2)))
>>> plt.plot(x.execute(),g.execute())
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.lognormal.md
# maxframe.tensor.random.lognormal
### maxframe.tensor.random.lognormal(mean=0.0, sigma=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a log-normal distribution.
Draw samples from a log-normal distribution with specified mean,
standard deviation, and array shape. Note that the mean and standard
deviation are not the values for the distribution itself, but of the
underlying normal distribution it is derived from.
* **Parameters:**
* **mean** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – Mean value of the underlying normal distribution. Default is 0.
* **sigma** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – Standard deviation of the underlying normal distribution. Should
be greater than zero. Default is 1.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `mean` and `sigma` are both scalars.
Otherwise, `np.broadcast(mean, sigma).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized log-normal distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.lognorm`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html#scipy.stats.lognorm)
: probability density function, distribution, cumulative density function, etc.
### Notes
A variable x has a log-normal distribution if log(x) is normally
distributed. The probability density function for the log-normal
distribution is:
$$
p(x) = \frac{1}{\sigma x \sqrt{2\pi}}
e^{-\frac{(\ln(x)-\mu)^2}{2\sigma^2}}
$$
where $\mu$ is the mean and $\sigma$ is the standard
deviation of the normally distributed logarithm of the variable.
A log-normal distribution results if a random variable is the *product*
of a large number of independent, identically-distributed variables in
the same way that a normal distribution results if the variable is the
*sum* of a large number of independent, identically-distributed
variables.
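The product argument can be checked numerically: the logarithm of a product is a sum of logarithms, so the central limit theorem applies to it. The sketch below uses only the Python standard library (not `maxframe`), and the choice of `Uniform(10, 11)` factors is an assumption made for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)

# Each sample is the product of 100 i.i.d. Uniform(10, 11) values, so its
# logarithm is a sum of 100 i.i.d. terms and is approximately normal.
log_products = [
    math.log(math.prod(rng.uniform(10.0, 11.0) for _ in range(100)))
    for _ in range(2_000)
]

# E[ln U] for U ~ Uniform(10, 11), from the antiderivative u*ln(u) - u.
expected_log = (11 * math.log(11) - 11) - (10 * math.log(10) - 10)

mu_hat = statistics.fmean(log_products)      # close to 100 * expected_log
sigma_hat = statistics.pstdev(log_products)  # sigma of the fitted log-normal
print(mu_hat, 100 * expected_log, sigma_hat)
```

Here `mu_hat` and `sigma_hat` play the role of the `mean` and `sigma` parameters of the log-normal that approximates the distribution of the products.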
### References
* <a id='id1'>**[1]**</a> Limpert, E., Stahel, W. A., and Abbt, M., “Log-normal Distributions across the Sciences: Keys and Clues,” BioScience, Vol. 51, No. 5, May, 2001. [http://stat.ethz.ch/~stahel/lognormal/bioscience.pdf](http://stat.ethz.ch/~stahel/lognormal/bioscience.pdf)
* <a id='id2'>**[2]**</a> Reiss, R.D. and Thomas, M., “Statistical Analysis of Extreme Values,” Basel: Birkhauser Verlag, 2001, pp. 31-32.
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mu, sigma = 3., 1. # mean and standard deviation
>>> s = mt.random.lognormal(mu, sigma, 1000)
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s.execute(), 100, density=True, align='mid')
```
```pycon
>>> x = mt.linspace(min(bins), max(bins), 10000)
>>> pdf = (mt.exp(-(mt.log(x) - mu)**2 / (2 * sigma**2))
... / (x * sigma * mt.sqrt(2 * mt.pi)))
```
```pycon
>>> plt.plot(x.execute(), pdf.execute(), linewidth=2, color='r')
>>> plt.axis('tight')
>>> plt.show()
```
Demonstrate that products of random samples from a uniform
distribution are fit well by a log-normal probability density
function.
```pycon
>>> # Generate a thousand samples: each is the product of 100 random
>>> # values, drawn from a uniform distribution.
>>> b = []
>>> for i in range(1000):
...     a = 10. + mt.random.random(100)
...     b.append(mt.prod(a).execute())
```
```pycon
>>> b = mt.array(b) / mt.min(b) # scale values to be positive
>>> count, bins, ignored = plt.hist(b.execute(), 100, density=True, align='mid')
>>> sigma = mt.std(mt.log(b))
>>> mu = mt.mean(mt.log(b))
```
```pycon
>>> x = mt.linspace(min(bins), max(bins), 10000)
>>> pdf = (mt.exp(-(mt.log(x) - mu)**2 / (2 * sigma**2))
... / (x * sigma * mt.sqrt(2 * mt.pi)))
```
```pycon
>>> plt.plot(x.execute(), pdf.execute(), color='r', linewidth=2)
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.logseries.md
# maxframe.tensor.random.logseries
### maxframe.tensor.random.logseries(p, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a logarithmic series distribution.
Samples are drawn from a log series distribution with specified
shape parameter, 0 < `p` < 1.
* **Parameters:**
* **p** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Shape parameter for the distribution. Must be in the range (0, 1).
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `p` is a scalar. Otherwise,
`np.array(p).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized logarithmic series distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.logser`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.logser.html#scipy.stats.logser)
: probability density function, distribution or cumulative density function, etc.
### Notes
The probability density for the Log Series distribution is
$$
P(k) = \frac{-p^k}{k \ln(1-p)},
$$
where p = probability.
The log series distribution is frequently used to represent species
richness and occurrence, first proposed by Fisher, Corbet, and
Williams in 1943 [2]. It may also be used to model the numbers of
occupants seen in cars [3].
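As a quick sanity check of the formula above, the probabilities should sum to 1 over $k = 1, 2, \ldots$, and the mean has the closed form $-p / ((1-p)\ln(1-p))$. The sketch below is plain Python, independent of `maxframe`; `logseries_pmf` is an illustrative helper name.

```python
import math


def logseries_pmf(k, p):
    """P(k) = -p**k / (k * ln(1 - p)) for k = 1, 2, ... and 0 < p < 1."""
    return -(p ** k) / (k * math.log1p(-p))


p = 0.6
ks = range(1, 400)  # the tail beyond this is negligible for p = 0.6
total = sum(logseries_pmf(k, p) for k in ks)
mean = sum(k * logseries_pmf(k, p) for k in ks)
closed_form_mean = -p / ((1 - p) * math.log1p(-p))
print(total, mean, closed_form_mean)
```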
### References
* <a id='id1'>**[1]**</a> Buzas, Martin A.; Culver, Stephen J., Understanding regional species diversity through the log series distribution of occurrences: BIODIVERSITY RESEARCH Diversity & Distributions, Volume 5, Number 5, September 1999 , pp. 187-195(9).
* <a id='id2'>**[2]**</a> Fisher, R.A., A.S. Corbet, and C.B. Williams. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology, 12:42-58.
* <a id='id3'>**[3]**</a> D. J. Hand, F. Daly, D. Lunn, E. Ostrowski, A Handbook of Small Data Sets, CRC Press, 1994.
* <a id='id4'>**[4]**</a> Wikipedia, “Logarithmic distribution”, [http://en.wikipedia.org/wiki/Logarithmic_distribution](http://en.wikipedia.org/wiki/Logarithmic_distribution)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
>>> import matplotlib.pyplot as plt
```
```pycon
>>> a = .6
>>> s = mt.random.logseries(a, 10000)
>>> count, bins, ignored = plt.hist(s.execute())
```
Plot against the distribution:
```pycon
>>> def logseries(k, p):
...     return -p**k/(k*mt.log(1-p))
>>> plt.plot(bins, (logseries(bins, a)*count.max()/
... logseries(bins, a).max()).execute(), 'r')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.multinomial.md
# maxframe.tensor.random.multinomial
### maxframe.tensor.random.multinomial(n, pvals, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a multinomial distribution.
The multinomial distribution is a multivariate generalisation of the
binomial distribution. Take an experiment with one of `p`
possible outcomes. An example of such an experiment is throwing a die,
where the outcome can be 1 through 6. Each sample drawn from the
distribution represents n such experiments. Its values,
`X_i = [X_0, X_1, ..., X_p]`, represent the number of times the
outcome was `i`.
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Number of experiments.
* **pvals** (*sequence* *of* *floats* *,* *length p*) – Probabilities of each of the `p` different outcomes. These
should sum to 1 (however, the last element is always assumed to
account for the remaining probability, as long as
`sum(pvals[:-1]) <= 1`).
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – The drawn samples, of shape *size*, if that was provided. If not,
the shape is `(N,)`.
In other words, each entry `out[i,j,...,:]` is an N-dimensional
value drawn from the distribution.
* **Return type:**
Tensor
### Examples
Throw a die 20 times:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.multinomial(20, [1/6.]*6, size=1).execute()
array([[4, 1, 7, 5, 2, 1]])
```
It landed 4 times on 1, once on 2, etc.
Now, throw the die 20 times, and 20 times again:
```pycon
>>> mt.random.multinomial(20, [1/6.]*6, size=2).execute()
array([[3, 4, 3, 3, 4, 3],
[2, 4, 3, 4, 0, 7]])
```
For the first run, we threw 3 times 1, 4 times 2, etc. For the second,
we threw 2 times 1, 4 times 2, etc.
A loaded die is more likely to land on number 6:
```pycon
>>> mt.random.multinomial(100, [1/7.]*5 + [2/7.]).execute()
array([11, 16, 14, 17, 16, 26])
```
The probability inputs should be normalized. As an implementation
detail, the value of the last entry is ignored and assumed to take
up any leftover probability mass, but this should not be relied on.
A biased coin which has twice as much weight on one side as on the
other should be sampled like so:
```pycon
>>> mt.random.multinomial(100, [1.0 / 3, 2.0 / 3]).execute() # RIGHT
array([38, 62])
```
not like:
```pycon
>>> mt.random.multinomial(100, [1.0, 2.0]).execute() # WRONG
array([100, 0])
```
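One way to avoid the mistake above is to normalize raw weights before passing them as `pvals`. The sketch below is plain Python using the standard library rather than `maxframe`; `normalize` and `multinomial_counts` are hypothetical helpers, with the second one mimicking a single multinomial draw by tallying independent trials.

```python
import random
from collections import Counter


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [w / total for w in weights]


def multinomial_counts(n, pvals, rng):
    """Count how often each outcome index occurs in n independent trials."""
    outcomes = rng.choices(range(len(pvals)), weights=pvals, k=n)
    tally = Counter(outcomes)
    return [tally[i] for i in range(len(pvals))]


pvals = normalize([1.0, 2.0])  # biased coin: [1/3, 2/3]
counts = multinomial_counts(100, pvals, random.Random(0))
print(pvals, counts)
```

The normalized list can then be handed to `mt.random.multinomial` in place of the raw weights.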
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.multivariate_normal.md
# maxframe.tensor.random.multivariate_normal
### maxframe.tensor.random.multivariate_normal(mean, cov, size=None, check_valid=None, tol=None, chunk_size=None, gpu=None, dtype=None)
Draw random samples from a multivariate normal distribution.
The multivariate normal, multinormal or Gaussian distribution is a
generalization of the one-dimensional normal distribution to higher
dimensions. Such a distribution is specified by its mean and
covariance matrix. These parameters are analogous to the mean
(average or “center”) and variance (standard deviation, or “width,”
squared) of the one-dimensional normal distribution.
* **Parameters:**
* **mean** (*1-D array_like* *, of* *length N*) – Mean of the N-dimensional distribution.
* **cov** (*2-D array_like* *, of* *shape* *(**N* *,* *N* *)*) – Covariance matrix of the distribution. It must be symmetric and
positive-semidefinite for proper sampling.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Given a shape of, for example, `(m,n,k)`, `m*n*k` samples are
generated, and packed in an m-by-n-by-k arrangement. Because
each sample is N-dimensional, the output shape is `(m,n,k,N)`.
If no shape is specified, a single (N-D) sample is returned.
* **check_valid** ( *{ 'warn'* *,* *'raise'* *,* *'ignore' }* *,* *optional*) – Behavior when the covariance matrix is not positive semidefinite.
* **tol** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – Tolerance when checking the singular values in covariance matrix.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – The drawn samples, of shape *size*, if that was provided. If not,
the shape is `(N,)`.
In other words, each entry `out[i,j,...,:]` is an N-dimensional
value drawn from the distribution.
* **Return type:**
Tensor
### Notes
The mean is a coordinate in N-dimensional space, which represents the
location where samples are most likely to be generated. This is
analogous to the peak of the bell curve for the one-dimensional or
univariate normal distribution.
Covariance indicates the level to which two variables vary together.
From the multivariate normal distribution, we draw N-dimensional
samples, $X = [x_1, x_2, ... x_N]$. The covariance matrix
element $C_{ij}$ is the covariance of $x_i$ and $x_j$.
The element $C_{ii}$ is the variance of $x_i$ (i.e. its
“spread”).
Instead of specifying the full covariance matrix, popular
approximations include:
> - Spherical covariance (cov is a multiple of the identity matrix)
> - Diagonal covariance (cov has non-negative elements, and only on
> the diagonal)
This geometrical property can be seen in two dimensions by plotting
generated data-points:
```pycon
>>> mean = [0, 0]
>>> cov = [[1, 0], [0, 100]] # diagonal covariance
```
Diagonal covariance means that points are oriented along x or y-axis:
```pycon
>>> import matplotlib.pyplot as plt
>>> import maxframe.tensor as mt
>>> x, y = mt.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x.execute(), y.execute(), 'x')
>>> plt.axis('equal')
>>> plt.show()
```
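For a diagonal covariance matrix the coordinates are independent, so each axis can be sampled on its own and the covariance matrix recovered from the data. A minimal sketch in plain Python (standard library only, not the `maxframe` sampler) follows; `sample_diagonal_normal` and `sample_covariance` are illustrative names.

```python
import math
import random
import statistics


def sample_diagonal_normal(mean, variances, n, rng):
    """Draw n points whose coordinates are independent normals."""
    return [
        [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, variances)]
        for _ in range(n)
    ]


def sample_covariance(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(0)
points = sample_diagonal_normal([0.0, 0.0], [1.0, 100.0], 20_000, rng)
xs, ys = zip(*points)
cov_hat = [
    [sample_covariance(xs, xs), sample_covariance(xs, ys)],
    [sample_covariance(ys, xs), sample_covariance(ys, ys)],
]
print(cov_hat)  # close to [[1, 0], [0, 100]]
```

The near-zero off-diagonal entries are what make the point cloud line up with the coordinate axes.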
Note that the covariance matrix must be positive semidefinite (a.k.a.
nonnegative-definite). Otherwise, the behavior of this method is
undefined and backwards compatibility is not guaranteed.
### References
* <a id='id1'>**[1]**</a> Papoulis, A., “Probability, Random Variables, and Stochastic Processes,” 3rd ed., New York: McGraw-Hill, 1991.
* <a id='id2'>**[2]**</a> Duda, R. O., Hart, P. E., and Stork, D. G., “Pattern Classification,” 2nd ed., New York: Wiley, 2001.
### Examples
```pycon
>>> mean = (1, 2)
>>> cov = [[1, 0], [0, 1]]
>>> x = mt.random.multivariate_normal(mean, cov, (3, 3))
>>> x.shape
(3, 3, 2)
```
Each component falls below its mean plus 0.6 (0.6 standard deviations
here) with probability of about 0.73, so the following holds for roughly
half of all draws:
```pycon
>>> list(((x[0,0,:] - mean) < 0.6).execute())
[True, True]
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.negative_binomial.md
# maxframe.tensor.random.negative_binomial
### maxframe.tensor.random.negative_binomial(n, p, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a negative binomial distribution.
Samples are drawn from a negative binomial distribution with specified
parameters, n successes and p probability of success where n is an
integer > 0 and p is in the interval [0, 1].
* **Parameters:**
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array_like* *of* *ints*) – Parameter of the distribution, > 0. Floats are also accepted,
but they will be truncated to integers.
* **p** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Parameter of the distribution, >= 0 and <=1.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `n` and `p` are both scalars.
Otherwise, `np.broadcast(n, p).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized negative binomial distribution,
where each sample is equal to N, the number of failures that occurred
before a total of n successes was reached.
* **Return type:**
Tensor or scalar
### Notes
The probability density for the negative binomial distribution is
$$
P(N;n,p) = \binom{N+n-1}{n-1}p^{n}(1-p)^{N},
$$
where $n$ is the number of successes, $p$ is the
probability of success, and $N+n$ is the total number of trials.
The negative binomial distribution gives the probability of n-1
successes and N failures in N+n-1 trials, and success on the (N+n)th
trial.
If one throws a die repeatedly until the third time a “1” appears,
then the probability distribution of the number of non-“1”s that
appear before the third “1” is a negative binomial distribution.
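The die example can be simulated directly by running trials until the required number of successes is reached. This is a plain-Python sketch using only the standard library, not the `maxframe` sampler; `failures_before_nth_success` is a made-up helper name.

```python
import random
import statistics


def failures_before_nth_success(n, p, rng):
    """Run Bernoulli(p) trials until the n-th success; return the failure count."""
    successes = failures = 0
    while successes < n:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


rng = random.Random(0)
# Throw a die until the third "1" appears; count the non-"1" throws.
draws = [failures_before_nth_success(3, 1 / 6, rng) for _ in range(20_000)]
print(statistics.fmean(draws))  # theory: n * (1 - p) / p = 15
```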
### References
* <a id='id1'>**[1]**</a> Weisstein, Eric W. “Negative Binomial Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/NegativeBinomialDistribution.html](http://mathworld.wolfram.com/NegativeBinomialDistribution.html)
* <a id='id2'>**[2]**</a> Wikipedia, “Negative binomial distribution”, [http://en.wikipedia.org/wiki/Negative_binomial_distribution](http://en.wikipedia.org/wiki/Negative_binomial_distribution)
### Examples
Draw samples from the distribution:
A real world example. A company drills wild-cat oil
exploration wells, each with an estimated probability of
success of 0.1. What is the probability of having one success
for each successive well, that is what is the probability of a
single success after drilling 5 wells, after 6 wells, etc.?
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> s = mt.random.negative_binomial(1, 0.1, 100000)
>>> for i in range(1, 11):
...     probability = (mt.sum(s<i) / 100000.).execute()
...     print(i, "wells drilled, probability of one success =", probability)
```
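The simulated probabilities can be compared with exact values. With a single required success, fewer than `i` failures before that success means at least one of the first `i` wells succeeds, so the exact value is `1 - (1 - p)**i`. The sketch below is plain Python built on `math.comb`; `negative_binomial_pmf` is an illustrative helper, not part of `maxframe`.

```python
from math import comb


def negative_binomial_pmf(failures, n, p):
    """P(exactly `failures` failures before the n-th success)."""
    return comb(failures + n - 1, n - 1) * p ** n * (1 - p) ** failures


p = 0.1
for wells in range(1, 11):
    # Fewer than `wells` failures before the first success means
    # at least one success among the first `wells` wells.
    exact = sum(negative_binomial_pmf(k, 1, p) for k in range(wells))
    print(wells, "wells drilled, probability of one success =", round(exact, 4))
```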
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.noncentral_chisquare.md
# maxframe.tensor.random.noncentral_chisquare
### maxframe.tensor.random.noncentral_chisquare(df, nonc, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a noncentral chi-square distribution.
The noncentral $\chi^2$ distribution is a generalisation of
the $\chi^2$ distribution.
* **Parameters:**
* **df** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Degrees of freedom, should be > 0.
* **nonc** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Non-centrality, should be non-negative.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `df` and `nonc` are both scalars.
Otherwise, `np.broadcast(df, nonc).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized noncentral chi-square distribution.
* **Return type:**
Tensor or scalar
### Notes
The probability density function for the noncentral Chi-square
distribution is
$$
P(x;df,nonc) = \sum^{\infty}_{i=0}
\frac{e^{-nonc/2}(nonc/2)^{i}}{i!}
P_{Y_{df+2i}}(x),
$$
where $Y_{q}$ is the Chi-square with q degrees of freedom.
In [1], it is noted that the noncentral chi-square is
useful in bombing and coverage problems, where the probability of
killing a point target is given by the noncentral chi-squared
distribution.
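For integer degrees of freedom, a noncentral chi-square variate can also be built as a sum of `df` squared unit-variance normal variables whose squared means add up to `nonc`. The sketch below illustrates that construction in plain Python (standard library only); it is not how `maxframe` samples, and `sample_noncentral_chisquare` is a hypothetical helper.

```python
import math
import random
import statistics


def sample_noncentral_chisquare(df, nonc, n, rng):
    """Sum of df squared unit-variance normals whose squared means total nonc.

    Only valid for integer df >= 1; the whole offset is put on the first term.
    """
    shift = math.sqrt(nonc)
    return [
        rng.gauss(shift, 1.0) ** 2
        + sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df - 1))
        for _ in range(n)
    ]


rng = random.Random(0)
draws = sample_noncentral_chisquare(3, 20.0, 20_000, rng)
# Theory: mean = df + nonc, variance = 2 * (df + 2 * nonc).
print(statistics.fmean(draws), statistics.pvariance(draws))
```

For `df = 3` and `nonc = 20` the theoretical mean and variance are 23 and 86.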
### References
* <a id='id1'>**[1]**</a> Delhi, M.S. Holla, “On a noncentral chi-square distribution in the analysis of weapon systems effectiveness”, Metrika, Volume 15, Number 1 / December, 1970.
* <a id='id2'>**[2]**</a> Wikipedia, “Noncentral chi-square distribution” [http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution](http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution)
### Examples
Draw values from the distribution and plot the histogram
```pycon
>>> import matplotlib.pyplot as plt
>>> import maxframe.tensor as mt
>>> values = plt.hist(mt.random.noncentral_chisquare(3, 20, 100000).execute(),
...                   bins=200, density=True)
>>> plt.show()
```
Draw values from a noncentral chisquare with very small noncentrality,
and compare to a chisquare.
```pycon
>>> plt.figure()
>>> values = plt.hist(mt.random.noncentral_chisquare(3, .0000001, 100000).execute(),
...                   bins=mt.arange(0., 25, .1).execute(), density=True)
>>> values2 = plt.hist(mt.random.chisquare(3, 100000).execute(),
...                    bins=mt.arange(0., 25, .1).execute(), density=True)
>>> plt.plot(values[1][0:-1], values[0]-values2[0], 'ob')
>>> plt.show()
```
Demonstrate how large values of non-centrality lead to a more symmetric
distribution.
```pycon
>>> plt.figure()
>>> values = plt.hist(mt.random.noncentral_chisquare(3, 20, 100000).execute(),
...                   bins=200, density=True)
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.noncentral_f.md
# maxframe.tensor.random.noncentral_f
### maxframe.tensor.random.noncentral_f(dfnum, dfden, nonc, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from the noncentral F distribution.
Samples are drawn from an F distribution with specified parameters,
dfnum (degrees of freedom in numerator) and dfden (degrees of
freedom in denominator), where both parameters > 0.
nonc is the non-centrality parameter.
* **Parameters:**
* **dfnum** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Numerator degrees of freedom, should be > 0.
* **dfden** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Denominator degrees of freedom, should be > 0.
* **nonc** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Non-centrality parameter, the sum of the squares of the numerator
means, should be >= 0.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `dfnum`, `dfden`, and `nonc`
are all scalars. Otherwise, `np.broadcast(dfnum, dfden, nonc).size`
samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized noncentral Fisher distribution.
* **Return type:**
Tensor or scalar
### Notes
When calculating the power of an experiment (power = probability of
rejecting the null hypothesis when a specific alternative is true) the
non-central F statistic becomes important. When the null hypothesis is
true, the F statistic follows a central F distribution. When the null
hypothesis is not true, it follows a non-central F distribution.
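As a rough illustration of where the non-centrality parameter enters, the sketch below builds non-central F draws from their textbook definition: a non-central chi-square divided by its degrees of freedom, over a central chi-square divided by its degrees of freedom. It uses only Python's standard `random` module, assumes an integer `dfnum`, and is not how `mt.random.noncentral_f` is implemented; the helper name `noncentral_f_sample` is hypothetical.

```python
import random

def noncentral_f_sample(dfnum, dfden, nonc, rng):
    """One draw, assuming an integer dfnum so the numerator is a sum of squares."""
    # Non-central chi-square: put the whole non-centrality on the first normal mean,
    # so the sum of squared means equals nonc.
    shift = nonc ** 0.5
    numerator = (rng.gauss(0.0, 1.0) + shift) ** 2
    numerator += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dfnum - 1))
    # Central chi-square with dfden degrees of freedom is Gamma(dfden / 2, scale=2).
    denominator = rng.gammavariate(dfden / 2.0, 2.0)
    return (numerator / dfnum) / (denominator / dfden)

rng = random.Random(0)
dfnum, dfden, nonc = 3, 20, 3.0
draws = [noncentral_f_sample(dfnum, dfden, nonc, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
# Theoretical mean of the non-central F distribution, valid for dfden > 2.
expected_mean = dfden * (dfnum + nonc) / (dfnum * (dfden - 2))
print(sample_mean, expected_mean)
```

With `nonc = 0` the shift vanishes and the same construction yields the central F distribution, which is the null-hypothesis case described above.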
### References
* <a id='id1'>**[1]**</a> Weisstein, Eric W. “Noncentral F-Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/NoncentralF-Distribution.html](http://mathworld.wolfram.com/NoncentralF-Distribution.html)
* <a id='id2'>**[2]**</a> Wikipedia, “Noncentral F-distribution”, [http://en.wikipedia.org/wiki/Noncentral_F-distribution](http://en.wikipedia.org/wiki/Noncentral_F-distribution)
### Examples
In a study, testing for a specific alternative to the null hypothesis
requires use of the Noncentral F distribution. We need to calculate the
area in the tail of the distribution that exceeds the value of the F
distribution for the null hypothesis. We’ll plot the two probability
distributions for comparison.
```pycon
>>> import numpy as np
>>> import maxframe.tensor as mt
>>> import matplotlib.pyplot as plt
```
```pycon
>>> dfnum = 3 # between group deg of freedom
>>> dfden = 20 # within groups degrees of freedom
>>> nonc = 3.0
>>> nc_vals = mt.random.noncentral_f(dfnum, dfden, nonc, 1000000)
>>> NF = np.histogram(nc_vals.execute(), bins=50, density=True)
>>> c_vals = mt.random.f(dfnum, dfden, 1000000)
>>> F = np.histogram(c_vals.execute(), bins=50, density=True)
>>> plt.plot(F[1][1:], F[0])
>>> plt.plot(NF[1][1:], NF[0])
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.normal.md
# maxframe.tensor.random.normal
### maxframe.tensor.random.normal(loc=0.0, scale=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw random samples from a normal (Gaussian) distribution.
The probability density function of the normal distribution, first
derived by De Moivre and 200 years later by both Gauss and Laplace
independently <sup>[2](#id5)</sup>, is often called the bell curve because of
its characteristic shape (see the example below).
The normal distribution occurs often in nature. For example, it
describes the commonly occurring distribution of samples influenced
by a large number of tiny, random disturbances, each with its own
unique distribution <sup>[2](#id5)</sup>.
* **Parameters:**
* **loc** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Mean (“centre”) of the distribution.
* **scale** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Standard deviation (spread or “width”) of the distribution.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `loc` and `scale` are both scalars.
Otherwise, `mt.broadcast(loc, scale).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized normal distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.norm`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm)
: probability density function, distribution or cumulative density function, etc.
### Notes
The probability density for the Gaussian distribution is
$$
p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }}
e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },
$$
where $\mu$ is the mean and $\sigma$ the standard
deviation. The square of the standard deviation, $\sigma^2$,
is called the variance.
The function has its peak at the mean, and its “spread” increases with
the standard deviation (the function reaches 0.607 times its maximum at
$\mu + \sigma$ and $\mu - \sigma$ <sup>[2](#id5)</sup>). This implies that
`mt.random.normal` is more likely to return samples lying close to
the mean, rather than those far away.
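The 0.607 factor can be checked directly from the density formula. The following sketch evaluates $p(x)$ with Python's `math` module only; `normal_pdf` is a hypothetical helper written for this check, not part of MaxFrame.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density p(x) as given in the formula above."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu, sigma = 0.0, 0.1
peak = normal_pdf(mu, mu, sigma)                        # maximum, reached at the mean
ratio_plus = normal_pdf(mu + sigma, mu, sigma) / peak   # one sigma above the mean
ratio_minus = normal_pdf(mu - sigma, mu, sigma) / peak  # one sigma below the mean
print(round(ratio_plus, 3), round(ratio_minus, 3))      # 0.607 0.607
```

The ratio equals $e^{-1/2}$ regardless of the values chosen for `mu` and `sigma`.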
### References
* <a id='id4'>**[1]**</a> Wikipedia, “Normal distribution”, [http://en.wikipedia.org/wiki/Normal_distribution](http://en.wikipedia.org/wiki/Normal_distribution)
* <a id='id5'>**[2]**</a> P. R. Peebles Jr., “Central Limit Theorem” in “Probability, Random Variables and Random Signal Principles”, 4th ed., 2001, pp. 51, 51, 125.
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> s = mt.random.normal(mu, sigma, 1000)
```
Verify the mean and the variance:
```pycon
>>> (abs(mu - mt.mean(s)) < 0.01).execute()
True
```
```pycon
>>> (abs(sigma - mt.std(s, ddof=1)) < 0.01).execute()
True
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s.execute(), 30, density=True)
>>> plt.plot(bins, (1/(sigma * mt.sqrt(2 * mt.pi)) *
... mt.exp( - (bins - mu)**2 / (2 * sigma**2) )).execute(),
... linewidth=2, color='r')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.pareto.md
# maxframe.tensor.random.pareto
### maxframe.tensor.random.pareto(a, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Pareto II or Lomax distribution with
specified shape.
The Lomax or Pareto II distribution is a shifted Pareto
distribution. The classical Pareto distribution can be
obtained from the Lomax distribution by adding 1 and
multiplying by the scale parameter `m` (see Notes). The
smallest value of the Lomax distribution is zero while for the
classical Pareto distribution it is `mu`, where the standard
Pareto distribution has location `mu = 1`. Lomax can also
be considered as a simplified version of the Generalized
Pareto distribution (available in SciPy), with the scale set
to one and the location set to zero.
The Pareto distribution must be greater than zero, and is
unbounded above. It is also known as the “80-20 rule”. In
this distribution, 80 percent of the weights are in the lowest
20 percent of the range, while the other 20 percent fill the
remaining 80 percent of the range.
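The relationship between the Lomax and classical Pareto distributions described above can be sketched with inverse-CDF sampling. This is a standard-library illustration under the assumption that the Lomax CDF is $F(x) = 1 - (1 + x)^{-a}$; the helper names are hypothetical and this is not MaxFrame's implementation.

```python
import random

def lomax_sample(a, rng):
    """Inverse-CDF draw from Lomax (Pareto II) with shape a; values start at 0."""
    u = rng.random()  # uniform on [0, 1)
    return (1.0 - u) ** (-1.0 / a) - 1.0

def classical_pareto_sample(a, m, rng):
    """Shift a Lomax draw by 1 and scale by m to get a classical Pareto draw."""
    return (lomax_sample(a, rng) + 1.0) * m

rng = random.Random(0)
a, m = 3.0, 2.0
lomax = [lomax_sample(a, rng) for _ in range(100_000)]
classical = [classical_pareto_sample(a, m, rng) for _ in range(100_000)]

sample_mean = sum(classical) / len(classical)
expected_mean = a * m / (a - 1.0)  # mean of the classical Pareto, valid for a > 1
print(min(lomax), min(classical), sample_mean, expected_mean)
```

The smallest Lomax draw sits near zero, while the smallest classical Pareto draw sits near `m`, matching the description of the two supports.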
* **Parameters:**
* **a** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Shape of the distribution. Should be greater than zero.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `a` is a scalar. Otherwise,
`mt.array(a).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Pareto distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.lomax`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lomax.html#scipy.stats.lomax)
: probability density function, distribution or cumulative density function, etc.
[`scipy.stats.genpareto`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.genpareto.html#scipy.stats.genpareto)
: probability density function, distribution or cumulative density function, etc.
### Notes
The probability density for the Pareto distribution is
$$
p(x) = \frac{am^a}{x^{a+1}}
$$
where $a$ is the shape and $m$ the scale.
The Pareto distribution, named after the Italian economist
Vilfredo Pareto, is a power law probability distribution
useful in many real world problems. Outside the field of
economics it is generally referred to as the Bradford
distribution. Pareto developed the distribution to describe
the distribution of wealth in an economy. It has also found
use in insurance, web page access statistics, oil field sizes,
and many other problems, including the download frequency for
projects in Sourceforge <sup>[1](#id2)</sup>. It is one of the so-called
“fat-tailed” distributions.
### References
* <a id='id2'>**[1]**</a> Francis Hunt and Paul Johnson, On the Pareto Distribution of Sourceforge projects.
* <a id='id3'>**[2]**</a> Pareto, V. (1896). Course of Political Economy. Lausanne.
* <a id='id4'>**[3]**</a> Reiss, R.D., Thomas, M.(2001), Statistical Analysis of Extreme Values, Birkhauser Verlag, Basel, pp 23-30.
* <a id='id5'>**[4]**</a> Wikipedia, “Pareto distribution”, [http://en.wikipedia.org/wiki/Pareto_distribution](http://en.wikipedia.org/wiki/Pareto_distribution)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a, m = 3., 2. # shape and mode
>>> s = (mt.random.pareto(a, 1000) + 1) * m
```
Display the histogram of the samples, along with the probability
density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s.execute(), 100, density=True)
>>> fit = a*m**a / bins**(a+1)
>>> plt.plot(bins, max(count)*fit/max(fit), linewidth=2, color='r')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.permutation.md
# maxframe.tensor.random.permutation
### maxframe.tensor.random.permutation(x, axis=0, chunk_size=None)
Randomly permute a sequence, or return a permuted range.
* **Parameters:**
* **x** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *array_like*) – If x is an integer, randomly permute `mt.arange(x)`.
If x is an array, make a copy and shuffle the elements
randomly.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis which x is shuffled along. Default is 0.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **Returns:**
**out** – Permuted sequence or tensor range.
* **Return type:**
Tensor
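The two input modes described for `x` (an integer versus a sequence) can be sketched for the one-dimensional case with the standard library. `permutation_sketch` is a hypothetical helper for illustration only; it ignores `axis` and `chunk_size` and returns a plain list rather than a tensor.

```python
import random

def permutation_sketch(x, rng):
    """Return a shuffled copy of x, or a shuffled range(x) when x is an int."""
    if isinstance(x, int):
        items = list(range(x))
    else:
        items = list(x)  # copy, so the caller's sequence is left untouched
    rng.shuffle(items)
    return items

rng = random.Random(42)
original = [1, 4, 9, 12, 15]
shuffled = permutation_sketch(original, rng)
from_int = permutation_sketch(10, rng)
print(shuffled, from_int, original)
```

The key property is that the input is copied before shuffling, so `original` keeps its order.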
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> rng = mt.random.RandomState()
>>> rng.permutation(10).execute()
array([1, 2, 3, 7, 9, 8, 0, 6, 4, 5]) # random
>>> rng.permutation([1, 4, 9, 12, 15]).execute()
array([ 9, 4, 12, 1, 15]) # random
>>> arr = mt.arange(9).reshape((3, 3))
>>> rng.permutation(arr).execute()
array([[3, 4, 5], # random
[6, 7, 8],
[0, 1, 2]])
>>> rng.permutation("abc")
Traceback (most recent call last):
...
numpy.AxisError: x must be an integer or at least 1-dimensional
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.poisson.md
# maxframe.tensor.random.poisson
### maxframe.tensor.random.poisson(lam=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Poisson distribution.
The Poisson distribution is the limit of the binomial distribution
for large N.
* **Parameters:**
* **lam** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Expectation of interval, should be >= 0. A sequence of expectation
intervals must be broadcastable over the requested size.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `lam` is a scalar. Otherwise,
`mt.array(lam).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Poisson distribution.
* **Return type:**
Tensor or scalar
### Notes
The Poisson distribution
$$
f(k; \lambda)=\frac{\lambda^k e^{-\lambda}}{k!}
$$
For events with an expected separation $\lambda$ the Poisson
distribution $f(k; \lambda)$ describes the probability of
$k$ events occurring within the observed
interval $\lambda$.
Because the output is limited to the range of the C long type, a
ValueError is raised when lam is within 10 sigma of the maximum
representable value.
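To make the formula and the binomial limit concrete, the sketch below evaluates the Poisson probability mass function and compares it with binomial probabilities for growing `n` at fixed mean `n * p = lam`. It uses only `math`; the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """f(k; lam) = lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

lam = 5.0
gaps = {}
for n in (10, 100, 10_000):
    # Largest pointwise difference over k = 0..10 with p chosen so that n * p = lam.
    gaps[n] = max(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(11)
    )
    print(n, gaps[n])

total = sum(poisson_pmf(k, lam) for k in range(60))  # should be very close to 1
```

The gap shrinks as `n` grows, which is the sense in which the Poisson distribution is the large-N limit of the binomial.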
### References
* <a id='id1'>**[1]**</a> Weisstein, Eric W. “Poisson Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/PoissonDistribution.html](http://mathworld.wolfram.com/PoissonDistribution.html)
* <a id='id2'>**[2]**</a> Wikipedia, “Poisson distribution”, [http://en.wikipedia.org/wiki/Poisson_distribution](http://en.wikipedia.org/wiki/Poisson_distribution)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
>>> s = mt.random.poisson(5, 10000)
```
Display histogram of the sample:
```pycon
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s.execute(), 14, density=True)
>>> plt.show()
```
Draw 100 values each for lambda 100 and 500:
```pycon
>>> s = mt.random.poisson(lam=(100., 500.), size=(100, 2))
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.power.md
# maxframe.tensor.random.power
### maxframe.tensor.random.power(a, size=None, chunk_size=None, gpu=None, dtype=None)
Draws samples in [0, 1] from a power distribution with positive
exponent a - 1.
Also known as the power function distribution.
* **Parameters:**
* **a** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Parameter of the distribution. Should be greater than zero.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `a` is a scalar. Otherwise,
`mt.array(a).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized power distribution.
* **Return type:**
Tensor or scalar
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If a <= 0.
### Notes
The probability density function is
$$
P(x; a) = ax^{a-1}, 0 \le x \le 1, a>0.
$$
The power function distribution is just the inverse of the Pareto
distribution. It may also be seen as a special case of the Beta
distribution.
It is used, for example, in modeling the over-reporting of insurance
claims.
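The link to the Pareto distribution can be illustrated with the standard library. Since the CDF implied by the density above is $F(x) = x^a$ on $[0, 1]$, a draw can be obtained as $U^{1/a}$ for uniform $U$. The sketch assumes that `random.paretovariate(a)` returns a classical Pareto variate with minimum 1 (as documented for Python's `random` module) and compares its reciprocal with direct power-distribution draws.

```python
import random

rng = random.Random(1)
a = 5.0
n = 100_000

# Inverse-CDF draw: F(x) = x**a on [0, 1], so X = U**(1/a).
power_draws = [rng.random() ** (1.0 / a) for _ in range(n)]

# Reciprocal of a classical Pareto variate (minimum 1), i.e. 1 / (1 + Lomax).
inverse_pareto_draws = [1.0 / rng.paretovariate(a) for _ in range(n)]

expected_mean = a / (a + 1.0)  # mean of the power function distribution
mean_power = sum(power_draws) / n
mean_inverse = sum(inverse_pareto_draws) / n
print(mean_power, mean_inverse, expected_mean)
```

Both sample means land close to `a / (a + 1)`, consistent with the two constructions sharing one distribution.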
### References
* <a id='id1'>**[1]**</a> Christian Kleiber, Samuel Kotz, “Statistical size distributions in economics and actuarial sciences”, Wiley, 2003.
* <a id='id2'>**[2]**</a> Heckert, N. A. and Filliben, James J. “NIST Handbook 148: Dataplot Reference Manual, Volume 2: Let Subcommands and Library Functions”, National Institute of Standards and Technology Handbook Series, June 2003. [http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/powpdf.pdf](http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/powpdf.pdf)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = 5. # shape
>>> samples = 1000
>>> s = mt.random.power(a, samples)
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s.execute(), bins=30)
>>> x = mt.linspace(0, 1, 100)
>>> y = a*x**(a-1.)
>>> normed_y = samples*mt.diff(bins)[0]*y
>>> plt.plot(x.execute(), normed_y.execute())
>>> plt.show()
```
Compare the power function distribution to the inverse of the Pareto.
```pycon
>>> from scipy import stats
>>> rvs = mt.random.power(5, 1000000)
>>> rvsp = mt.random.pareto(5, 1000000)
>>> xx = mt.linspace(0,1,100)
>>> powpdf = stats.powerlaw.pdf(xx.execute(),5)
```
```pycon
>>> plt.figure()
>>> plt.hist(rvs.execute(), bins=50, density=True)
>>> plt.plot(xx.execute(),powpdf,'r-')
>>> plt.title('np.random.power(5)')
```
```pycon
>>> plt.figure()
>>> plt.hist((1./(1.+rvsp)).execute(), bins=50, density=True)
>>> plt.plot(xx.execute(),powpdf,'r-')
>>> plt.title('inverse of 1 + np.random.pareto(5)')
```
```pycon
>>> plt.figure()
>>> plt.hist((1./(1.+rvsp)).execute(), bins=50, density=True)
>>> plt.plot(xx.execute(),powpdf,'r-')
>>> plt.title('inverse of stats.pareto(5)')
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.rand.md
# maxframe.tensor.random.rand
### maxframe.tensor.random.rand(\*dn, \*\*kw)
Random values in a given shape.
Create a tensor of the given shape and populate it with
random samples from a uniform distribution
over `[0, 1)`.
* **Parameters:**
* **d0** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The dimensions of the returned tensor, should all be positive.
If no argument is given a single Python float is returned.
* **d1** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The dimensions of the returned tensor, should all be positive.
If no argument is given a single Python float is returned.
* **...** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The dimensions of the returned tensor, should all be positive.
If no argument is given a single Python float is returned.
* **dn** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The dimensions of the returned tensor, should all be positive.
If no argument is given a single Python float is returned.
* **Returns:**
**out** – Random values.
* **Return type:**
Tensor, shape `(d0, d1, ..., dn)`
#### SEE ALSO
[`random`](maxframe.tensor.random.random.md#maxframe.tensor.random.random)
### Notes
This is a convenience function. If you want an interface that
takes a shape-tuple as the first argument, refer to
`mt.random.random_sample`.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.rand(3, 2).execute()
array([[ 0.14022471, 0.96360618], #random
[ 0.37601032, 0.25528411], #random
[ 0.49313049, 0.94909878]]) #random
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.randint.md
# maxframe.tensor.random.randint
### maxframe.tensor.random.randint(low, high=None, size=None, dtype='l', density=None, chunk_size=None, gpu=None)
Return random integers from low (inclusive) to high (exclusive).
Return random integers from the “discrete uniform” distribution of
the specified dtype in the “half-open” interval [low, high). If
high is None (the default), then results are from [0, low).
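The half-open interval convention, including the `high=None` case, can be sketched with `random.randrange`, which follows the same convention. `randint_sketch` is a hypothetical helper; it supports only an integer `size` and returns plain Python values rather than a tensor.

```python
import random

def randint_sketch(low, high=None, size=None, rng=random):
    """Integers from [low, high); from [0, low) when high is None."""
    if high is None:
        low, high = 0, low  # a single bound is treated as the exclusive upper end
    if size is None:
        return rng.randrange(low, high)
    return [rng.randrange(low, high) for _ in range(size)]

rng = random.Random(7)
coin_flips = randint_sketch(2, size=10, rng=rng)    # values in {0, 1}
always_zero = randint_sketch(1, size=10, rng=rng)   # [0, 1) contains only 0
shifted = randint_sketch(3, 6, size=1000, rng=rng)  # values in {3, 4, 5}
print(coin_flips, always_zero, sorted(set(shifted)))
```

Note that the upper bound never appears in the output: `6` is absent from `shifted`.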
* **Parameters:**
* **low** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Lowest (signed) integer to be drawn from the distribution (unless
`high=None`, in which case this parameter is one above the
*highest* such integer).
* **high** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – If provided, one above the largest (signed) integer to be drawn
from the distribution (see above for behavior if `high=None`).
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **dtype** (*data-type* *,* *optional*) – Desired dtype of the result. All dtypes are determined by their
name, i.e., ‘int64’, ‘int’, etc, so byteorder is not available
and a specific precision may have different C types depending
on the platform. The default value is ‘np.int’.
* **density** ([*float*](https://docs.python.org/3/library/functions.html#float) *,* *optional*) – if density specified, a sparse tensor will be created
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **Returns:**
**out** – size-shaped tensor of random integers from the appropriate
distribution, or a single such random int if size not provided.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int) or Tensor of ints
#### SEE ALSO
`random.random_integers`
: similar to randint, only for the closed interval [low, high], and 1 is the lowest value if high is omitted. In particular, this other one is the one to use to generate uniformly distributed discrete non-integers.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.randint(2, size=10).execute()
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> mt.random.randint(1, size=10).execute()
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
```
Generate a 2 x 4 tensor of ints between 0 and 4, inclusive:
```pycon
>>> mt.random.randint(5, size=(2, 4)).execute()
array([[4, 0, 2, 1],
[3, 2, 2, 0]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.randn.md
# maxframe.tensor.random.randn
### maxframe.tensor.random.randn(\*dn, \*\*kw)
Return a sample (or samples) from the “standard normal” distribution.
If positive, int_like or int-convertible arguments are provided,
randn generates an array of shape `(d0, d1, ..., dn)`, filled
with random floats sampled from a univariate “normal” (Gaussian)
distribution of mean 0 and variance 1 (if any of the $d_i$ are
floats, they are first converted to integers by truncation). A single
float randomly sampled from the distribution is returned if no
argument is provided.
This is a convenience function. If you want an interface that takes a
tuple as the first argument, use numpy.random.standard_normal instead.
* **Parameters:**
* **d0** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The dimensions of the returned tensor, should be all positive.
If no argument is given a single Python float is returned.
* **d1** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The dimensions of the returned tensor, should be all positive.
If no argument is given a single Python float is returned.
* **...** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The dimensions of the returned tensor, should be all positive.
If no argument is given a single Python float is returned.
* **dn** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The dimensions of the returned tensor, should be all positive.
If no argument is given a single Python float is returned.
* **Returns:**
**Z** – A `(d0, d1, ..., dn)`-shaped array of floating-point samples from
the standard normal distribution, or a single such float if
no parameters were supplied.
* **Return type:**
Tensor or [float](https://docs.python.org/3/library/functions.html#float)
#### SEE ALSO
`random.standard_normal`
: Similar, but takes a tuple as its argument.
### Notes
For random samples from $N(\mu, \sigma^2)$, use:
`sigma * mt.random.randn(...) + mu`
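The scaling rule can be checked numerically. The sketch below uses `random.gauss` from the standard library as a stand-in for standard normal draws (an assumption made so the example runs without MaxFrame) and confirms that `sigma * z + mu` has the intended mean and standard deviation.

```python
import math
import random

rng = random.Random(3)
mu, sigma = 3.0, 2.5  # N(3, 6.25): the variance is sigma**2
n = 100_000

standard = [rng.gauss(0.0, 1.0) for _ in range(n)]  # draws from N(0, 1)
scaled = [sigma * z + mu for z in standard]         # draws from N(mu, sigma**2)

sample_mean = sum(scaled) / n
sample_std = math.sqrt(sum((x - sample_mean) ** 2 for x in scaled) / (n - 1))
print(sample_mean, sample_std)
```

Note that the multiplier is the standard deviation, not the variance: for $N(3, 6.25)$ the factor is 2.5.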
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.randn().execute()
2.1923875335537315 #random
```
Two-by-four tensor of samples from N(3, 6.25):
```pycon
>>> (2.5 * mt.random.randn(2, 4) + 3).execute()
array([[-4.49401501, 4.00950034, -1.81814867, 7.29718677], #random
[ 0.39924804, 4.68456316, 4.99394529, 4.84057254]]) #random
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.random.md
# maxframe.tensor.random.random
### maxframe.tensor.random.random(size=None, chunk_size=None, gpu=None, dtype=None)
Return random floats in the half-open interval [0.0, 1.0).
Results are from the “continuous uniform” distribution over the
stated interval. To sample $Unif[a, b), b > a$ multiply
the output of random_sample by (b-a) and add a:
```default
(b - a) * random_sample() + a
```
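A minimal sketch of this rescaling, using `random.random` from the standard library in place of `random_sample` (an assumption so the example is self-contained); `uniform_half_open` is a hypothetical helper name.

```python
import random

def uniform_half_open(a, b, rng):
    """Map a draw from [0, 1) onto [a, b), assuming b > a."""
    return (b - a) * rng.random() + a

rng = random.Random(11)
draws = [uniform_half_open(-5.0, 0.0, rng) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)
print(min(draws), max(draws), sample_mean)
```

The sample mean should sit near the midpoint `(a + b) / 2`, here -2.5.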
* **Parameters:**
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Array of random floats of shape size (unless `size=None`, in which
case a single float is returned).
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or Tensor of floats
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.random_sample().execute()
0.47108547995356098
>>> type(mt.random.random_sample().execute())
<type 'float'>
>>> mt.random.random_sample((5,)).execute()
array([ 0.30220482, 0.86820401, 0.1654503 , 0.11659149, 0.54323428])
```
Three-by-two array of random numbers from [-5, 0):
```pycon
>>> (5 * mt.random.random_sample((3, 2)) - 5).execute()
array([[-3.99149989, -0.52338984],
[-2.99091858, -0.79479508],
[-1.23204345, -1.75224494]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.random_integers.md
# maxframe.tensor.random.random_integers
### maxframe.tensor.random.random_integers(low, high=None, size=None, chunk_size=None, gpu=None)
Random integers of type mt.int between low and high, inclusive.
Return random integers of type mt.int from the “discrete uniform”
distribution in the closed interval [low, high]. If high is
None (the default), then results are from [1, low]. The np.int
type translates to the C long type used by Python 2 for “short”
integers and its precision is platform dependent.
This function has been deprecated. Use randint instead.
* **Parameters:**
* **low** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Lowest (signed) integer to be drawn from the distribution (unless
`high=None`, in which case this parameter is the *highest* such
integer).
* **high** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – If provided, the largest (signed) integer to be drawn from the
distribution (see above for behavior if `high=None`).
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **Returns:**
**out** – size-shaped array of random integers from the appropriate
distribution, or a single such random int if size not provided.
* **Return type:**
[int](https://docs.python.org/3/library/functions.html#int) or Tensor of ints
#### SEE ALSO
[`random.randint`](maxframe.tensor.random.randint.md#maxframe.tensor.random.randint)
: Similar to random_integers, only for the half-open interval [low, high), and 0 is the lowest value if high is omitted.
### Notes
To sample from N evenly spaced floating-point numbers between a and b,
use:
```default
a + (b - a) * (np.random.random_integers(N) - 1) / (N - 1.)
```
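The formula can be tried out with the standard library, where `random.randint(1, n)` draws from the closed interval $[1, n]$ in the same way `random_integers(n)` is described above. `evenly_spaced_draw` is a hypothetical helper used only for this illustration.

```python
import random

def evenly_spaced_draw(a, b, n, rng):
    """Pick one of n evenly spaced floats between a and b, endpoints included."""
    k = rng.randint(1, n)  # closed interval [1, n]
    return a + (b - a) * (k - 1) / (n - 1.0)

rng = random.Random(5)
grid = [0.0, 0.625, 1.25, 1.875, 2.5]  # five evenly spaced points on [0, 2.5]
draws = [evenly_spaced_draw(0.0, 2.5, 5, rng) for _ in range(1000)]
print(sorted(set(draws)))
```

Subtracting 1 maps the smallest integer to `a`, and dividing by `n - 1` maps the largest to `b`, so both endpoints are reachable.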
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.random_integers(5).execute()
4
>>> type(mt.random.random_integers(5).execute())
<type 'int'>
>>> mt.random.random_integers(5, size=(3,2)).execute()
array([[5, 4],
[3, 3],
[4, 5]])
```
Choose five random numbers from the set of five evenly-spaced
numbers between 0 and 2.5, inclusive (*i.e.*, from the set
$0, 5/8, 10/8, 15/8, 20/8$):
```pycon
>>> (2.5 * (mt.random.random_integers(5, size=(5,)) - 1) / 4.).execute()
array([ 0.625, 1.25 , 0.625, 0.625, 2.5 ])
```
Roll two six sided dice 1000 times and sum the results:
```pycon
>>> d1 = mt.random.random_integers(1, 6, 1000)
>>> d2 = mt.random.random_integers(1, 6, 1000)
>>> dsums = d1 + d2
```
Display results as a histogram:
```pycon
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(dsums.execute(), 11, density=True)
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.random_sample.md
# maxframe.tensor.random.random_sample
### maxframe.tensor.random.random_sample(size=None, chunk_size=None, gpu=None, dtype=None)
Return random floats in the half-open interval [0.0, 1.0).
Results are from the “continuous uniform” distribution over the
stated interval. To sample $Unif[a, b), b > a$ multiply
the output of random_sample by (b-a) and add a:
```default
(b - a) * random_sample() + a
```
* **Parameters:**
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Array of random floats of shape size (unless `size=None`, in which
case a single float is returned).
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or Tensor of floats
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.random_sample().execute()
0.47108547995356098
>>> type(mt.random.random_sample().execute())
<type 'float'>
>>> mt.random.random_sample((5,)).execute()
array([ 0.30220482, 0.86820401, 0.1654503 , 0.11659149, 0.54323428])
```
Three-by-two array of random numbers from [-5, 0):
```pycon
>>> (5 * mt.random.random_sample((3, 2)) - 5).execute()
array([[-3.99149989, -0.52338984],
[-2.99091858, -0.79479508],
[-1.23204345, -1.75224494]])
```
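The scaling rule `(b - a) * random_sample() + a` can be sketched without a MaxFrame session by substituting the standard-library `random.random()`, which also returns floats in `[0.0, 1.0)`. The helper name `uniform_ab` is an assumption made for illustration.

```python
import random

def uniform_ab(a, b, rng):
    """Map a draw from [0.0, 1.0) onto the half-open interval [a, b)."""
    return (b - a) * rng.random() + a

rng = random.Random(42)
# Same interval as the example above: [-5, 0).
samples = [uniform_ab(-5.0, 0.0, rng) for _ in range(10000)]
```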
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.ranf.md
# maxframe.tensor.random.ranf
### maxframe.tensor.random.ranf(size=None, chunk_size=None, gpu=None, dtype=None)
Return random floats in the half-open interval [0.0, 1.0).
Results are from the “continuous uniform” distribution over the
stated interval. To sample $Unif[a, b), b > a$ multiply
the output of random_sample by (b-a) and add a:
```default
(b - a) * random_sample() + a
```
* **Parameters:**
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Array of random floats of shape size (unless `size=None`, in which
case a single float is returned).
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or Tensor of floats
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.random_sample().execute()
0.47108547995356098
>>> type(mt.random.random_sample().execute())
<type 'float'>
>>> mt.random.random_sample((5,)).execute()
array([ 0.30220482, 0.86820401, 0.1654503 , 0.11659149, 0.54323428])
```
Three-by-two array of random numbers from [-5, 0):
```pycon
>>> (5 * mt.random.random_sample((3, 2)) - 5).execute()
array([[-3.99149989, -0.52338984],
[-2.99091858, -0.79479508],
[-1.23204345, -1.75224494]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.rayleigh.md
# maxframe.tensor.random.rayleigh
### maxframe.tensor.random.rayleigh(scale=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Rayleigh distribution.
The $\chi$ and Weibull distributions are generalizations of the
Rayleigh.
* **Parameters:**
* **scale** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – Scale, also equals the mode. Should be >= 0. Default is 1.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `scale` is a scalar. Otherwise,
`mt.array(scale).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Rayleigh distribution.
* **Return type:**
Tensor or scalar
### Notes
The probability density function for the Rayleigh distribution is
$$
P(x;scale) = \frac{x}{scale^2}e^{\frac{-x^2}{2 \cdotp scale^2}}
$$
The Rayleigh distribution would arise, for example, if the East
and North components of the wind velocity had identical zero-mean
Gaussian distributions. Then the wind speed would have a Rayleigh
distribution.
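The wind-speed statement can be checked numerically. The sketch below is plain standard-library Python rather than MaxFrame, and the value of `scale` is an arbitrary assumption: it draws two independent zero-mean Gaussian components and compares the mean speed with the analytic Rayleigh mean, `scale * sqrt(pi / 2)`.

```python
import math
import random

rng = random.Random(1)
scale = 3.0   # standard deviation of each velocity component (assumed value)
n = 100000

# Wind speed is the magnitude of the (east, north) velocity vector.
speeds = [
    math.hypot(rng.gauss(0.0, scale), rng.gauss(0.0, scale))
    for _ in range(n)
]

empirical_mean = sum(speeds) / n
rayleigh_mean = scale * math.sqrt(math.pi / 2)   # analytic mean of Rayleigh(scale)
```

With this many samples the two means should agree to within a few hundredths.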
### References
* <a id='id1'>**[1]**</a> Brighton Webs Ltd., “Rayleigh Distribution,” [http://www.brighton-webs.co.uk/distributions/rayleigh.asp](http://www.brighton-webs.co.uk/distributions/rayleigh.asp)
* <a id='id2'>**[2]**</a> Wikipedia, “Rayleigh distribution” [http://en.wikipedia.org/wiki/Rayleigh_distribution](http://en.wikipedia.org/wiki/Rayleigh_distribution)
### Examples
Draw values from the distribution and plot the histogram
```pycon
>>> import matplotlib.pyplot as plt
>>> import maxframe.tensor as mt
```
```pycon
>>> values = plt.hist(mt.random.rayleigh(3, 100000).execute(), bins=200, density=True)
```
Wave heights tend to follow a Rayleigh distribution. If the mean wave
height is 1 meter, what fraction of waves are likely to be larger than 3
meters?
```pycon
>>> meanvalue = 1
>>> modevalue = mt.sqrt(2 / mt.pi) * meanvalue
>>> s = mt.random.rayleigh(modevalue, 1000000)
```
The percentage of waves larger than 3 meters is:
```pycon
>>> (100.*mt.sum(s>3)/1000000.).execute()
0.087300000000000003
```
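The simulated percentage can be cross-checked in closed form. This is a sketch under the assumption that the Rayleigh survival function `P(X > x) = exp(-x**2 / (2 * scale**2))` is used directly. It needs only the standard library.

```python
import math

mean_height = 1.0   # mean wave height in meters
threshold = 3.0     # wave height of interest in meters

# The Rayleigh mean is scale * sqrt(pi / 2); invert it to get the scale (mode).
scale = math.sqrt(2 / math.pi) * mean_height

# Survival function of the Rayleigh distribution, expressed as a percentage.
tail_percent = 100.0 * math.exp(-threshold**2 / (2 * scale**2))
```

`tail_percent` comes out at roughly 0.085, which is close to the simulated value shown above, allowing for sampling noise.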
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.sample.md
# maxframe.tensor.random.sample
### maxframe.tensor.random.sample(size=None, chunk_size=None, gpu=None, dtype=None)
Return random floats in the half-open interval [0.0, 1.0).
Results are from the “continuous uniform” distribution over the
stated interval. To sample $Unif[a, b), b > a$ multiply
the output of random_sample by (b-a) and add a:
```default
(b - a) * random_sample() + a
```
* **Parameters:**
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Array of random floats of shape size (unless `size=None`, in which
case a single float is returned).
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or Tensor of floats
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.random.random_sample().execute()
0.47108547995356098
>>> type(mt.random.random_sample().execute())
<type 'float'>
>>> mt.random.random_sample((5,)).execute()
array([ 0.30220482, 0.86820401, 0.1654503 , 0.11659149, 0.54323428])
```
Three-by-two array of random numbers from [-5, 0):
```pycon
>>> (5 * mt.random.random_sample((3, 2)) - 5).execute()
array([[-3.99149989, -0.52338984],
[-2.99091858, -0.79479508],
[-1.23204345, -1.75224494]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.seed.md
# maxframe.tensor.random.seed
### maxframe.tensor.random.seed(seed=None)
Seed the generator.
This method is called when RandomState is initialized. It can be
called again to re-seed the generator. For details, see RandomState.
* **Parameters:**
**seed** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *1-d array_like* *,* *optional*) – Seed for RandomState.
Must be convertible to 32 bit unsigned integers.
#### SEE ALSO
[`RandomState`](maxframe.tensor.random.RandomState.md#maxframe.tensor.random.RandomState)
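Re-seeding can be illustrated with the standard-library `random` module, used here only as an analogy. Whether MaxFrame reproduces identical values across different chunk layouts is not stated in this reference, so treat the sketch as showing the general idea rather than MaxFrame's behavior.

```python
import random

random.seed(1234)
first = [random.random() for _ in range(3)]

random.seed(1234)   # re-seed with the same value
second = [random.random() for _ in range(3)]

random.seed(4321)   # a different seed starts a different sequence
third = [random.random() for _ in range(3)]
```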
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.standard_cauchy.md
# maxframe.tensor.random.standard_cauchy
### maxframe.tensor.random.standard_cauchy(size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a standard Cauchy distribution with mode = 0.
Also known as the Lorentz distribution.
* **Parameters:**
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**samples** – The drawn samples.
* **Return type:**
Tensor or scalar
### Notes
The probability density function for the full Cauchy distribution is
$$
P(x; x_0, \gamma) = \frac{1}{\pi \gamma \bigl[ 1+
(\frac{x-x_0}{\gamma})^2 \bigr] }
$$
and the Standard Cauchy distribution just sets $x_0=0$ and
$\gamma=1$
The Cauchy distribution arises in the solution to the driven harmonic
oscillator problem, and also describes spectral line broadening. It
also describes the distribution of values at which a line tilted at
a random angle will cut the x axis.
When studying hypothesis tests that assume normality, seeing how the
tests perform on data from a Cauchy distribution is a good indicator of
their sensitivity to a heavy-tailed distribution, since the Cauchy looks
very much like a Gaussian distribution, but with heavier tails.
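The tilted-line description gives a direct way to generate standard Cauchy values. The following standard-library sketch (not MaxFrame code) assumes a line through the point (0, 1) whose angle from the vertical is uniform on (-pi/2, pi/2). Such a line crosses the x axis at the tangent of that angle.

```python
import math
import random

rng = random.Random(7)
n = 100000

# Crossing point of each randomly tilted line with the x axis.
crossings = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

# The standard Cauchy has quartiles at -1 and +1, so about half of the
# crossings should fall inside [-1, 1].
inside = sum(1 for x in crossings if -1.0 <= x <= 1.0) / n
```

The heavy tails show up as occasional crossings that are very far from zero, which is why the plotting example truncates the samples before drawing the histogram.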
### References
* <a id='id1'>**[1]**</a> NIST/SEMATECH e-Handbook of Statistical Methods, “Cauchy Distribution”, [http://www.itl.nist.gov/div898/handbook/eda/section3/eda3663.htm](http://www.itl.nist.gov/div898/handbook/eda/section3/eda3663.htm)
* <a id='id2'>**[2]**</a> Weisstein, Eric W. “Cauchy Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/CauchyDistribution.html](http://mathworld.wolfram.com/CauchyDistribution.html)
* <a id='id3'>**[3]**</a> Wikipedia, “Cauchy distribution” [http://en.wikipedia.org/wiki/Cauchy_distribution](http://en.wikipedia.org/wiki/Cauchy_distribution)
### Examples
Draw samples and plot the distribution:
```pycon
>>> import maxframe.tensor as mt
>>> import matplotlib.pyplot as plt
```
```pycon
>>> s = mt.random.standard_cauchy(1000000)
>>> s = s[(s>-25) & (s<25)] # truncate distribution so it plots well
>>> plt.hist(s.execute(), bins=100)
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.standard_exponential.md
# maxframe.tensor.random.standard_exponential
### maxframe.tensor.random.standard_exponential(size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from the standard exponential distribution.
standard_exponential is identical to the exponential distribution
with a scale parameter of 1.
* **Parameters:**
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or Tensor
### Examples
Output a 3x8000 tensor:
```pycon
>>> import maxframe.tensor as mt
>>> n = mt.random.standard_exponential((3, 8000))
```
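For intuition about the unit-scale exponential, here is a minimal inverse-transform sketch in standard-library Python. This is an illustrative technique and an assumption on our part, not necessarily how MaxFrame generates its samples.

```python
import math
import random

rng = random.Random(5)
n = 100000

# If u is uniform on [0, 1), then -log(1 - u) is exponential with scale 1.
samples = [-math.log(1.0 - rng.random()) for _ in range(n)]

sample_mean = sum(samples) / n   # should be close to the scale, i.e. 1
```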
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.standard_gamma.md
# maxframe.tensor.random.standard_gamma
### maxframe.tensor.random.standard_gamma(shape, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a standard Gamma distribution.
Samples are drawn from a Gamma distribution with specified parameters,
shape (sometimes designated “k”) and scale=1.
* **Parameters:**
* **shape** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Parameter, should be > 0.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `shape` is a scalar. Otherwise,
`mt.array(shape).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized standard gamma distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.gamma`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html#scipy.stats.gamma)
: probability density function, distribution or cumulative density function, etc.
### Notes
The probability density for the Gamma distribution is
$$
p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},
$$
where $k$ is the shape and $\theta$ the scale,
and $\Gamma$ is the Gamma function.
The Gamma distribution is often used to model the times to failure of
electronic components, and arises naturally in processes for which the
waiting times between Poisson distributed events are relevant.
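The link to Poisson waiting times can be sketched as follows, assuming an integer shape `k` and a unit event rate: the time until the k-th event is a sum of `k` independent unit-rate exponential gaps, which is Gamma distributed with shape `k` and scale 1. The code uses only the standard library.

```python
import random

rng = random.Random(3)
k = 2        # integer shape parameter (assumed for this sketch)
n = 50000

# Waiting time until the k-th event of a unit-rate Poisson process.
waits = [sum(rng.expovariate(1.0) for _ in range(k)) for _ in range(n)]

mean_wait = sum(waits) / n                                 # Gamma(k, 1) has mean k
var_wait = sum((w - mean_wait) ** 2 for w in waits) / n    # and variance k
```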
### References
* <a id='id1'>**[1]**</a> Weisstein, Eric W. “Gamma Distribution.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/GammaDistribution.html](http://mathworld.wolfram.com/GammaDistribution.html)
* <a id='id2'>**[2]**</a> Wikipedia, “Gamma distribution”, [http://en.wikipedia.org/wiki/Gamma_distribution](http://en.wikipedia.org/wiki/Gamma_distribution)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> shape, scale = 2., 1. # shape and scale
>>> s = mt.random.standard_gamma(shape, 1000000)
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> count, bins, ignored = plt.hist(s.execute(), 50, density=True)
>>> y = bins**(shape-1) * ((mt.exp(-bins/scale))/ \
... (sps.gamma(shape) * scale**shape))
>>> plt.plot(bins, y.execute(), linewidth=2, color='r')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.standard_normal.md
# maxframe.tensor.random.standard_normal
### maxframe.tensor.random.standard_normal(size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a standard Normal distribution (mean=0, stdev=1).
* **Parameters:**
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. Default is None, in which case a
single value is returned.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float) or Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> s = mt.random.standard_normal(8000)
>>> s.execute()
array([ 0.6888893 , 0.78096262, -0.89086505, ..., 0.49876311, #random
-0.38672696, -0.4685006 ]) #random
>>> s.shape
(8000,)
>>> s = mt.random.standard_normal(size=(3, 4, 2))
>>> s.shape
(3, 4, 2)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.standard_t.md
# maxframe.tensor.random.standard_t
### maxframe.tensor.random.standard_t(df, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a standard Student’s t distribution with df degrees
of freedom.
A special case of the hyperbolic distribution. As df gets
large, the result resembles that of the standard normal
distribution (standard_normal).
* **Parameters:**
* **df** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Degrees of freedom, should be > 0.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `df` is a scalar. Otherwise,
`mt.array(df).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized standard Student’s t distribution.
* **Return type:**
Tensor or scalar
### Notes
The probability density function for the t distribution is
$$
P(x, df) = \frac{\Gamma(\frac{df+1}{2})}{\sqrt{\pi df}
\Gamma(\frac{df}{2})}\Bigl( 1+\frac{x^2}{df} \Bigr)^{-(df+1)/2}
$$
The t test is based on an assumption that the data come from a
Normal distribution. The t test provides a way to test whether
the sample mean (that is the mean calculated from the data) is
a good estimate of the true mean.
The derivation of the t-distribution was first published in
1908 by William Gosset while working for the Guinness Brewery
in Dublin. Due to proprietary issues, he had to publish under
a pseudonym, and so he used the name Student.
### References
* <a id='id1'>**[1]**</a> Dalgaard, Peter, “Introductory Statistics With R”, Springer, 2002.
* <a id='id2'>**[2]**</a> Wikipedia, “Student’s t-distribution” [http://en.wikipedia.org/wiki/Student’s_t-distribution](http://en.wikipedia.org/wiki/Student's_t-distribution)
### Examples
From Dalgaard page 83 <sup>[1](#id1)</sup>, suppose the daily energy intake for 11
women in kJ is:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> intake = mt.array([5260., 5470, 5640, 6180, 6390, 6515, 6805, 7515, \
... 7515, 8230, 8770])
```
Does their energy intake deviate systematically from the recommended
value of 7725 kJ?
We have 10 degrees of freedom, so is the sample mean within 95% of the
recommended value?
```pycon
>>> s = mt.random.standard_t(10, size=100000)
>>> mt.mean(intake).execute()
6753.636363636364
>>> intake.std(ddof=1).execute()
1142.1232221373727
```
Calculate the t statistic, setting the ddof parameter to the unbiased
value so the divisor in the standard deviation will be degrees of
freedom, N-1.
```pycon
>>> t = (mt.mean(intake)-7725)/(intake.std(ddof=1)/mt.sqrt(len(intake)))
>>> import matplotlib.pyplot as plt
>>> h = plt.hist(s.execute(), bins=100, density=True)
```
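The same t statistic can be reproduced with the standard library alone, which may help when checking the arithmetic outside a MaxFrame session. In this sketch `statistics.stdev` plays the role of `std(ddof=1)` because it also divides by N-1.

```python
import math
import statistics

intake = [5260.0, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
recommended = 7725

sample_mean = statistics.mean(intake)
sample_std = statistics.stdev(intake)   # divides by N - 1, like ddof=1

# One-sample t statistic: distance of the sample mean from the reference
# value, measured in standard errors of the mean.
t_stat = (sample_mean - recommended) / (sample_std / math.sqrt(len(intake)))
```

The statistic is negative because the sample mean lies below the recommended value.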
For a one-sided t-test, how far out in the distribution does the t
statistic appear?
```pycon
>>> (mt.sum(s<t) / float(len(s))).execute()
0.0090699999999999999 #random
```
So the p-value is about 0.009: if the null hypothesis were true, a t
statistic at least this far below zero would occur only about 0.9% of the
time, which is evidence against the null hypothesis.
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.triangular.md
# maxframe.tensor.random.triangular
### maxframe.tensor.random.triangular(left, mode, right, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from the triangular distribution over the
interval `[left, right]`.
The triangular distribution is a continuous probability
distribution with lower limit left, peak at mode, and upper
limit right. Unlike the other distributions, these parameters
directly define the shape of the pdf.
* **Parameters:**
* **left** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Lower limit.
* **mode** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – The value where the peak of the distribution occurs.
The value should fulfill the condition `left <= mode <= right`.
* **right** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Upper limit, should be larger than left.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `left`, `mode`, and `right`
are all scalars. Otherwise, `mt.broadcast(left, mode, right).size`
samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized triangular distribution.
* **Return type:**
Tensor or scalar
### Notes
The probability density function for the triangular distribution is
$$
P(x;l, m, r) = \begin{cases}
\frac{2(x-l)}{(r-l)(m-l)}& \text{for $l \leq x \leq m$},\\
\frac{2(r-x)}{(r-l)(r-m)}& \text{for $m \leq x \leq r$},\\
0& \text{otherwise}.
\end{cases}
$$
The triangular distribution is often used in ill-defined
problems where the underlying distribution is not known, but
some knowledge of the limits and mode exists. Often it is used
in simulations.
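The piecewise density above translates directly into code. The sketch below is plain Python with a hypothetical helper name, `triangular_pdf`, and it assumes `left < mode < right` so that neither denominator is zero. A midpoint-rule sum is used to confirm that the density integrates to 1.

```python
def triangular_pdf(x, left, mode, right):
    """Density of the triangular distribution; assumes left < mode < right."""
    if left <= x <= mode:
        return 2.0 * (x - left) / ((right - left) * (mode - left))
    if mode < x <= right:
        return 2.0 * (right - x) / ((right - left) * (right - mode))
    return 0.0

left, mode, right = -3.0, 0.0, 8.0   # same parameters as the example below
steps = 10000
width = (right - left) / steps

# Midpoint-rule approximation of the integral over [left, right].
area = sum(
    triangular_pdf(left + (i + 0.5) * width, left, mode, right)
    for i in range(steps)
) * width

peak = triangular_pdf(mode, left, mode, right)   # equals 2 / (right - left)
```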
### References
* <a id='id1'>**[1]**</a> Wikipedia, “Triangular distribution” [http://en.wikipedia.org/wiki/Triangular_distribution](http://en.wikipedia.org/wiki/Triangular_distribution)
### Examples
Draw values from the distribution and plot the histogram:
```pycon
>>> import matplotlib.pyplot as plt
>>> import maxframe.tensor as mt
>>> h = plt.hist(mt.random.triangular(-3, 0, 8, 100000).execute(), bins=200,
... density=True)
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.uniform.md
# maxframe.tensor.random.uniform
### maxframe.tensor.random.uniform(low=0.0, high=1.0, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
`[low, high)` (includes low, but excludes high). In other words,
any value within the given interval is equally likely to be drawn
by uniform.
* **Parameters:**
* **low** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats* *,* *optional*) – Lower boundary of the output interval. All values generated will be
greater than or equal to low. The default value is 0.
* **high** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Upper boundary of the output interval. All values generated will be
less than high. The default value is 1.0.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `low` and `high` are both scalars.
Otherwise, `mt.broadcast(low, high).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized uniform distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`randint`](maxframe.tensor.random.randint.md#maxframe.tensor.random.randint)
: Discrete uniform distribution, yielding integers.
[`random_integers`](maxframe.tensor.random.random_integers.md#maxframe.tensor.random.random_integers)
: Discrete uniform distribution over the closed interval `[low, high]`.
[`random_sample`](maxframe.tensor.random.random_sample.md#maxframe.tensor.random.random_sample)
: Floats uniformly distributed over `[0, 1)`.
[`random`](maxframe.tensor.random.random.md#maxframe.tensor.random.random)
: Alias for random_sample.
[`rand`](maxframe.tensor.random.rand.md#maxframe.tensor.random.rand)
: Convenience function that accepts dimensions as input, e.g., `rand(2,2)` would generate a 2-by-2 array of floats, uniformly distributed over `[0, 1)`.
### Notes
The probability density function of the uniform distribution is
$$
p(x) = \frac{1}{b - a}
$$
anywhere within the interval `[a, b)`, and zero elsewhere.
When `high` == `low`, values of `low` will be returned.
If `high` < `low`, the results are officially undefined
and may eventually raise an error, i.e. do not rely on this
function to behave when passed arguments satisfying that
inequality condition.
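A constant density of `1 / (b - a)` means that any sub-interval of width `w` inside `[a, b)` should capture about `w / (b - a)` of the samples. The following standard-library sketch checks this for an arbitrarily chosen sub-interval. It is an illustration and does not use MaxFrame.

```python
import random

rng = random.Random(11)
low, high = -1.0, 0.0
n = 100000
samples = [low + (high - low) * rng.random() for _ in range(n)]

# Fraction of samples landing in an assumed sub-interval [-0.75, -0.5).
sub_lo, sub_hi = -0.75, -0.5
fraction = sum(1 for s in samples if sub_lo <= s < sub_hi) / n
expected = (sub_hi - sub_lo) / (high - low)   # width times the density
```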
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> s = mt.random.uniform(-1,0,1000)
```
All values are within the given interval:
```pycon
>>> mt.all(s >= -1).execute()
True
>>> mt.all(s < 0).execute()
True
```
Display the histogram of the samples, along with the
probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s.execute(), 15, density=True)
>>> plt.plot(bins, mt.ones_like(bins).execute(), linewidth=2, color='r')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.vonmises.md
# maxframe.tensor.random.vonmises
### maxframe.tensor.random.vonmises(mu, kappa, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a von Mises distribution.
Samples are drawn from a von Mises distribution with specified mode
(mu) and dispersion (kappa), on the interval [-pi, pi].
The von Mises distribution (also known as the circular normal
distribution) is a continuous probability distribution on the unit
circle. It may be thought of as the circular analogue of the normal
distribution.
* **Parameters:**
* **mu** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Mode (“center”) of the distribution.
* **kappa** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Dispersion of the distribution, has to be >=0.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `mu` and `kappa` are both scalars.
Otherwise, `np.broadcast(mu, kappa).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized von Mises distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.vonmises`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.vonmises.html#scipy.stats.vonmises)
: probability density function, distribution, or cumulative distribution function, etc.
### Notes
The probability density for the von Mises distribution is
$$
p(x) = \frac{e^{\kappa \cos(x-\mu)}}{2\pi I_0(\kappa)},
$$
where $\mu$ is the mode and $\kappa$ the dispersion,
and $I_0(\kappa)$ is the modified Bessel function of order 0.
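The density above can be sanity-checked without MaxFrame or SciPy. The following is a minimal sketch in plain Python: `bessel_i0`, `vonmises_pdf`, and `integrate_over_circle` are hypothetical helper names, and $I_0$ is approximated by its power series rather than by a library routine. It only confirms that the formula integrates to 1 over $[-\pi, \pi)$.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function of order 0, summed from its power series."""
    total = 0.0
    term = 1.0  # term equals (kappa / 2)**(2k) / (k!)**2, starting at k = 0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def vonmises_pdf(x, mu, kappa):
    """Density p(x) = exp(kappa * cos(x - mu)) / (2 * pi * I0(kappa))."""
    return math.exp(kappa * math.cos(x - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def integrate_over_circle(mu, kappa, steps=2000):
    """Equal-width Riemann sum of the density over one full period."""
    h = 2.0 * math.pi / steps
    return h * sum(vonmises_pdf(-math.pi + i * h, mu, kappa) for i in range(steps))


total_mass = integrate_over_circle(0.0, 4.0)
```

With `mu = 0.0` and `kappa = 4.0` (the values used in the example below), `total_mass` should agree with 1 to within floating-point error, and the density is largest at `x = mu`.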
The von Mises distribution is named for Richard Edler von Mises, who was born in
Austria-Hungary, in what is now Ukraine. He fled to the United
States in 1939 and became a professor at Harvard. He worked in
probability theory, aerodynamics, fluid mechanics, and philosophy of
science.
### References
* <a id='id1'>**[1]**</a> Abramowitz, M. and Stegun, I. A. (Eds.). “Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing,” New York: Dover, 1972.
* <a id='id2'>**[2]**</a> von Mises, R., “Mathematical Theory of Probability and Statistics”, New York: Academic Press, 1964.
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mu, kappa = 0.0, 4.0 # mean and dispersion
>>> s = mt.random.vonmises(mu, kappa, 1000)
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> from scipy.special import i0
>>> plt.hist(s.execute(), 50, density=True)
>>> x = mt.linspace(-mt.pi, mt.pi, num=51)
>>> y = mt.exp(kappa*mt.cos(x-mu))/(2*mt.pi*i0(kappa))
>>> plt.plot(x.execute(), y.execute(), linewidth=2, color='r')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.wald.md
# maxframe.tensor.random.wald
### maxframe.tensor.random.wald(mean, scale, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Wald, or inverse Gaussian, distribution.
As the scale approaches infinity, the distribution becomes more like a
Gaussian. Some references claim that the Wald is an inverse Gaussian
with mean equal to 1, but this is by no means universal.
The inverse Gaussian distribution was first studied in relationship to
Brownian motion. In 1956 M.C.K. Tweedie used the name inverse Gaussian
because there is an inverse relationship between the time to cover a
unit distance and distance covered in unit time.
* **Parameters:**
* **mean** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Distribution mean, should be > 0.
* **scale** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Scale parameter, should be > 0.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `mean` and `scale` are both scalars.
Otherwise, `np.broadcast(mean, scale).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Wald distribution.
* **Return type:**
Tensor or scalar
### Notes
The probability density function for the Wald distribution is
$$
P(x;mean,scale) = \sqrt{\frac{scale}{2\pi x^3}}\,e^{\frac{-scale(x-mean)^2}{2\cdot mean^2 x}}
$$
As noted above, the inverse Gaussian distribution first arose
from attempts to model Brownian motion. It is also a
competitor to the Weibull for use in reliability modeling and
modeling stock returns and interest rate processes.
### References
* <a id='id1'>**[1]**</a> Brighton Webs Ltd., Wald Distribution, [http://www.brighton-webs.co.uk/distributions/wald.asp](http://www.brighton-webs.co.uk/distributions/wald.asp)
* <a id='id2'>**[2]**</a> Chhikara, Raj S., and Folks, J. Leroy, “The Inverse Gaussian Distribution: Theory, Methodology, and Applications”, CRC Press, 1988.
* <a id='id3'>**[3]**</a> Wikipedia, “Wald distribution” [http://en.wikipedia.org/wiki/Wald_distribution](http://en.wikipedia.org/wiki/Wald_distribution)
### Examples
Draw values from the distribution and plot the histogram:
```pycon
>>> import matplotlib.pyplot as plt
>>> import maxframe.tensor as mt
>>> h = plt.hist(mt.random.wald(3, 2, 100000).execute(), bins=200, density=True)
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.weibull.md
# maxframe.tensor.random.weibull
### maxframe.tensor.random.weibull(a, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Weibull distribution.
Draw samples from a 1-parameter Weibull distribution with the given
shape parameter a.
$$
X = (-\ln(U))^{1/a}
$$
Here, U is drawn from the uniform distribution over (0,1].
The more common 2-parameter Weibull, including a scale parameter
$\lambda$, is just $X = \lambda(-\ln(U))^{1/a}$.
* **Parameters:**
* **a** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Shape of the distribution. Should be greater than zero.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `a` is a scalar. Otherwise,
`mt.array(a).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Weibull distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.weibull_max`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.weibull_max.html#scipy.stats.weibull_max), [`scipy.stats.weibull_min`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.weibull_min.html#scipy.stats.weibull_min), [`scipy.stats.genextreme`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.genextreme.html#scipy.stats.genextreme), [`gumbel`](maxframe.tensor.random.gumbel.md#maxframe.tensor.random.gumbel)
### Notes
The Weibull (or Type III asymptotic extreme value distribution
for smallest values, SEV Type III, or Rosin-Rammler
distribution) is one of a class of Generalized Extreme Value
(GEV) distributions used in modeling extreme value problems.
This class includes the Gumbel and Frechet distributions.
The probability density for the Weibull distribution is
$$
p(x) = \frac{a}{\lambda}\left(\frac{x}{\lambda}\right)^{a-1}e^{-(x/\lambda)^a},
$$
where $a$ is the shape and $\lambda$ the scale.
The function has its peak (the mode) at
$\lambda(\frac{a-1}{a})^{1/a}$.
When `a = 1`, the Weibull distribution reduces to the exponential
distribution.
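The transform $X = \lambda(-\ln(U))^{1/a}$ can be tried out directly. The following is a minimal sketch using only the Python standard library, not MaxFrame's implementation: `weibull_from_uniform` and `draw_weibull` are hypothetical names, and the uniform values come from `random.Random` rather than from a tensor.

```python
import math
import random


def weibull_from_uniform(u, a, lam=1.0):
    """Map u in (0, 1] to a Weibull sample via X = lam * (-ln u)**(1 / a)."""
    return lam * (-math.log(u)) ** (1.0 / a)


def draw_weibull(a, n, lam=1.0, seed=0):
    """Draw n samples by inverse-transform sampling with a seeded generator."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and log(0) is avoided.
    return [weibull_from_uniform(1.0 - rng.random(), a, lam) for _ in range(n)]


samples = draw_weibull(a=5.0, n=20000)
sample_mean = sum(samples) / len(samples)
```

For the unit-scale case the sample mean should land close to $\Gamma(1 + 1/a)$, and with `a = 1` the transform reduces to `-ln(u)`, the exponential case mentioned above.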
### References
* <a id='id1'>**[1]**</a> Waloddi Weibull, Royal Technical University, Stockholm, 1939 “A Statistical Theory Of The Strength Of Materials”, Ingeniorsvetenskapsakademiens Handlingar Nr 151, 1939, Generalstabens Litografiska Anstalts Forlag, Stockholm.
* <a id='id2'>**[2]**</a> Waloddi Weibull, “A Statistical Distribution Function of Wide Applicability”, Journal Of Applied Mechanics ASME Paper 1951.
* <a id='id3'>**[3]**</a> Wikipedia, “Weibull distribution”, [http://en.wikipedia.org/wiki/Weibull_distribution](http://en.wikipedia.org/wiki/Weibull_distribution)
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = 5. # shape
>>> s = mt.random.weibull(a, 1000)
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> x = mt.arange(1,100.)/50.
>>> def weib(x,n,a):
...     return (a / n) * (x / n)**(a - 1) * mt.exp(-(x / n)**a)
```
```pycon
>>> count, bins, ignored = plt.hist(mt.random.weibull(5.,1000).execute())
>>> x = mt.arange(1,100.)/50.
>>> scale = count.max()/weib(x, 1., 5.).max()
>>> plt.plot(x.execute(), (weib(x, 1., 5.)*scale).execute())
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.random.zipf.md
# maxframe.tensor.random.zipf
### maxframe.tensor.random.zipf(a, size=None, chunk_size=None, gpu=None, dtype=None)
Draw samples from a Zipf distribution.
Samples are drawn from a Zipf distribution with specified parameter
a > 1.
The Zipf distribution (also known as the zeta distribution) is a
discrete probability distribution that satisfies Zipf’s law: the
frequency of an item is inversely proportional to its rank in a
frequency table.
* **Parameters:**
* **a** ([*float*](https://docs.python.org/3/library/functions.html#float) *or* *array_like* *of* *floats*) – Distribution parameter. Should be greater than 1.
* **size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Output shape. If the given shape is, e.g., `(m, n, k)`, then
`m * n * k` samples are drawn. If size is `None` (default),
a single value is returned if `a` is a scalar. Otherwise,
`mt.array(a).size` samples are drawn.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **dtype** (*data-type* *,* *optional*) – Data-type of the returned tensor.
* **Returns:**
**out** – Drawn samples from the parameterized Zipf distribution.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`scipy.stats.zipf`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zipf.html#scipy.stats.zipf)
: probability mass function, distribution, or cumulative distribution function, etc.
### Notes
The probability mass function for the Zipf distribution is
$$
p(x) = \frac{x^{-a}}{\zeta(a)},
$$
where $\zeta$ is the Riemann Zeta function.
It is named for the American linguist George Kingsley Zipf, who noted
that the frequency of any word in a sample of a language is inversely
proportional to its rank in the frequency table.
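To make the normalisation by $\zeta(a)$ concrete, here is a small sketch in plain Python. It is not how MaxFrame or SciPy evaluates the zeta function: `zeta` and `zipf_pmf` are hypothetical helpers, and the tail of the series is approximated with a first-order Euler-Maclaurin correction, which is an assumption of this sketch.

```python
import math


def zeta(a, n=1000):
    """Approximate the Riemann zeta function for a > 1.

    Sums the first n terms exactly and estimates the remaining tail with
    the integral of x**-a from n to infinity minus half of the n-th term.
    """
    head = sum(k ** -a for k in range(1, n + 1))
    return head + n ** (1.0 - a) / (a - 1.0) - 0.5 * n ** -a


def zipf_pmf(k, a):
    """Probability of the positive integer k: k**-a / zeta(a)."""
    return k ** -a / zeta(a)


p1 = zipf_pmf(1, 2.0)
```

For `a = 2` the exact value is $\zeta(2) = \pi^2/6$, so `p1` should be close to $6/\pi^2$. Note that the probabilities are defined only on the positive integers.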
### References
* <a id='id1'>**[1]**</a> Zipf, G. K., “Selected Studies of the Principle of Relative Frequency in Language,” Cambridge, MA: Harvard Univ. Press, 1932.
### Examples
Draw samples from the distribution:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = 2. # parameter
>>> s = mt.random.zipf(a, 1000)
```
Display the histogram of the samples, along with
the probability density function:
```pycon
>>> import matplotlib.pyplot as plt
>>> from scipy import special
```
Truncate s values at 50 so plot is interesting:
```pycon
>>> count, bins, ignored = plt.hist(s[s<50].execute(), 50, density=True)
>>> x = mt.arange(1., 50.)
>>> y = x**(-a) / special.zeta(a)
>>> plt.plot(x.execute(), (y/mt.max(y)).execute(), linewidth=2, color='r')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.ravel.md
# maxframe.tensor.ravel
### maxframe.tensor.ravel(a, order='C')
Return a contiguous flattened tensor.
A 1-D tensor, containing the elements of the input, is returned. A copy is
made only if needed.
* **Parameters:**
* **a** (*array_like*) – Input tensor. The elements in a are packed as a 1-D tensor.
* **order** ( *{'C'* *,* *'F'* *,* *'A'* *,* *'K'}* *,* *optional*) – The elements of a are read using this index order. ‘C’ means
to index the elements in row-major, C-style order,
with the last axis index changing fastest, back to the first
axis index changing slowest. ‘F’ means to index the elements
in column-major, Fortran-style order, with the
first index changing fastest, and the last index changing
slowest. Note that the ‘C’ and ‘F’ options take no account of
the memory layout of the underlying array, and only refer to
the order of axis indexing. ‘A’ means to read the elements in
Fortran-like index order if a is Fortran *contiguous* in
memory, C-like order otherwise. ‘K’ means to read the
elements in the order they occur in memory, except for
reversing the data when strides are negative. By default, ‘C’
index order is used.
* **Returns:**
**y** – If a is a matrix, y is a 1-D tensor, otherwise y is a tensor of
the same subtype as a. The shape of the returned array is
`(a.size,)`. Matrices are special cased for backward
compatibility.
* **Return type:**
array_like
#### SEE ALSO
`Tensor.flat`
: 1-D iterator over an array.
`Tensor.flatten`
: 1-D array copy of the elements of an array in row-major order.
`Tensor.reshape`
: Change the shape of an array without changing its data.
### Examples
It is equivalent to `reshape(-1)`.
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([[1, 2, 3], [4, 5, 6]])
>>> print(mt.ravel(x).execute())
[1 2 3 4 5 6]
```
```pycon
>>> print(x.reshape(-1).execute())
[1 2 3 4 5 6]
```
```pycon
>>> print(mt.ravel(x.T).execute())
[1 4 2 5 3 6]
```
```pycon
>>> a = mt.arange(12).reshape(2,3,2).swapaxes(1,2); a.execute()
array([[[ 0,  2,  4],
        [ 1,  3,  5]],
       [[ 6,  8, 10],
        [ 7,  9, 11]]])
>>> a.ravel().execute()
array([ 0, 2, 4, 1, 3, 5, 6, 8, 10, 7, 9, 11])
```
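The difference between the `'C'` and `'F'` index orders can be illustrated without tensors at all. The following is a minimal sketch on nested Python lists; `ravel_2d` is a hypothetical helper that handles only the two-dimensional case and only these two orders.

```python
def ravel_2d(rows, order="C"):
    """Flatten a rectangular list of lists in C (row-major) or F (column-major) order."""
    if order == "C":
        # Last index changes fastest: walk each row in turn.
        return [value for row in rows for value in row]
    if order == "F":
        # First index changes fastest: walk each column in turn.
        return [row[j] for j in range(len(rows[0])) for row in rows]
    raise ValueError("order must be 'C' or 'F' in this sketch")


x = [[1, 2, 3], [4, 5, 6]]
c_order = ravel_2d(x, "C")
f_order = ravel_2d(x, "F")
```

Reading `x` in `'F'` order gives the same sequence as reading its transpose in `'C'` order, which is why the `mt.ravel(x.T)` example above prints `[1 4 2 5 3 6]`.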
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.real.md
# maxframe.tensor.real
### maxframe.tensor.real(val, \*\*kwargs)
Return the real part of the complex argument.
* **Parameters:**
**val** (*array_like*) – Input tensor.
* **Returns:**
**out** – The real component of the complex argument. If val is real, the type
of val is used for the output. If val has complex elements, the
returned type is float.
* **Return type:**
Tensor or scalar
#### SEE ALSO
`real_if_close`, [`imag`](maxframe.tensor.imag.md#maxframe.tensor.imag), [`angle`](maxframe.tensor.angle.md#maxframe.tensor.angle)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([1+2j, 3+4j, 5+6j])
>>> a.real.execute()
array([ 1., 3., 5.])
>>> a.real = 9
>>> a.execute()
array([ 9.+2.j, 9.+4.j, 9.+6.j])
>>> a.real = mt.array([9, 8, 7])
>>> a.execute()
array([ 9.+2.j, 8.+4.j, 7.+6.j])
>>> mt.real(1 + 1j).execute()
1.0
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.reciprocal.md
# maxframe.tensor.reciprocal
### maxframe.tensor.reciprocal(x, out=None, where=None, \*\*kwargs)
Return the reciprocal of the argument, element-wise.
Calculates `1/x`.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – Return tensor.
* **Return type:**
Tensor
### Notes
#### NOTE
This function is not designed to work with integers.
For integer arguments with absolute value larger than 1 the result is
always zero because of the way Python handles integer division. For
integer zero the result is an overflow.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.reciprocal(2.).execute()
0.5
>>> mt.reciprocal([1, 2., 3.33]).execute()
array([ 1. , 0.5 , 0.3003003])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.remainder.md
# maxframe.tensor.remainder
### maxframe.tensor.remainder(x1, x2, out=None, where=None, \*\*kwargs)
Return element-wise remainder of division.
Computes the remainder complementary to the floor_divide function. It is
equivalent to the Python modulus operator `x1 % x2` and has the same sign
as the divisor x2. The MATLAB function equivalent to `np.remainder`
is `mod`.
#### WARNING
This should not be confused with:
* Python 3.7’s `math.remainder` and C’s `remainder`, which
compute the IEEE remainder, the complement to
`round(x1 / x2)`.
* The MATLAB `rem` function and the C `%` operator, which are the
complement to `int(x1 / x2)`.
* **Parameters:**
* **x1** (*array_like*) – Dividend array.
* **x2** (*array_like*) – Divisor array.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The element-wise remainder of the quotient `floor_divide(x1, x2)`.
Returns a scalar if both x1 and x2 are scalars.
* **Return type:**
Tensor
#### SEE ALSO
[`floor_divide`](maxframe.tensor.floor_divide.md#maxframe.tensor.floor_divide)
: Equivalent of Python `//` operator.
[`divmod`](https://docs.python.org/3/library/functions.html#divmod)
: Simultaneous floor division and remainder.
[`fmod`](maxframe.tensor.fmod.md#maxframe.tensor.fmod)
: Equivalent of the MATLAB `rem` function.
[`divide`](maxframe.tensor.divide.md#maxframe.tensor.divide), [`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor)
### Notes
Returns 0 when x2 is 0 and both x1 and x2 are (tensors of)
integers.
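The three conventions distinguished in the warning can be compared side by side with the Python standard library. This is a sketch on scalars only; `compare_remainders` is a hypothetical helper, and `math.fmod` stands in for the truncation-based remainder (the C `%` / MATLAB `rem` behaviour).

```python
import math


def compare_remainders(x1, x2):
    """Return the floor-based, truncation-based, and IEEE round-based remainders."""
    floor_based = x1 % x2                 # complement of floor(x1 / x2); sign of x2
    trunc_based = math.fmod(x1, x2)       # complement of int(x1 / x2); sign of x1
    ieee_based = math.remainder(x1, x2)   # complement of round(x1 / x2)
    return floor_based, trunc_based, ieee_based


positive_case = compare_remainders(5, 3)
negative_case = compare_remainders(-5, 3)
```

For `(5, 3)` the floor-based and truncation-based results agree and the IEEE remainder differs. For `(-5, 3)` the floor-based result takes the sign of the divisor while the truncation-based one takes the sign of the dividend.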
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.remainder([4, 7], [2, 3]).execute()
array([0, 1])
>>> mt.remainder(mt.arange(7), 5).execute()
array([0, 1, 2, 3, 4, 0, 1])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.repeat.md
# maxframe.tensor.repeat
### maxframe.tensor.repeat(a, repeats, axis=None)
Repeat elements of a tensor.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **repeats** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *tensor* *of* *ints*) – The number of repetitions for each element. repeats is broadcasted
to fit the shape of the given axis.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis along which to repeat values. By default, use the
flattened input tensor, and return a flat output tensor.
* **Returns:**
**repeated_tensor** – Output array which has the same shape as a, except along
the given axis.
* **Return type:**
Tensor
#### SEE ALSO
[`tile`](maxframe.tensor.tile.md#maxframe.tensor.tile)
: Tile a tensor.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.repeat(3, 4).execute()
array([3, 3, 3, 3])
>>> x = mt.array([[1,2],[3,4]])
>>> mt.repeat(x, 2).execute()
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> mt.repeat(x, 3, axis=1).execute()
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4]])
>>> mt.repeat(x, [1, 2], axis=0).execute()
array([[1, 2],
       [3, 4],
       [3, 4]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.reshape.md
# maxframe.tensor.reshape
### maxframe.tensor.reshape(a, newshape, order='C')
Gives a new shape to a tensor without changing its data.
* **Parameters:**
* **a** (*array_like*) – Tensor to be reshaped.
* **newshape** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints*) – The new shape should be compatible with the original shape. If
an integer, then the result will be a 1-D tensor of that length.
One shape dimension can be -1. In this case, the value is
inferred from the length of the tensor and remaining dimensions.
* **order** ( *{'C'* *,* *'F'* *,* *'A'}* *,* *optional*) – Read the elements of a using this index order, and place the
elements into the reshaped array using this index order. ‘C’
means to read / write the elements using C-like index order,
with the last axis index changing fastest, back to the first
axis index changing slowest. ‘F’ means to read / write the
elements using Fortran-like index order, with the first index
changing fastest, and the last index changing slowest. Note that
the ‘C’ and ‘F’ options take no account of the memory layout of
the underlying array, and only refer to the order of indexing.
‘A’ means to read / write the elements in Fortran-like index
order if a is Fortran *contiguous* in memory, C-like order
otherwise.
* **Returns:**
**reshaped_array** – This will be a new view object if possible; otherwise, it will
be a copy.
* **Return type:**
Tensor
#### SEE ALSO
`Tensor.reshape`
: Equivalent method.
### Notes
It is not always possible to change the shape of a tensor without
copying the data. If you want an error to be raised when the data is copied,
you should assign the new shape to the shape attribute of the array:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.arange(6).reshape((3, 2))
>>> a.execute()
array([[0, 1],
       [2, 3],
       [4, 5]])
```
You can think of reshaping as first raveling the tensor (using the given
index order), then inserting the elements from the raveled tensor into the
new tensor using the same kind of index ordering as was used for the
raveling.
```pycon
>>> mt.reshape(a, (2, 3)).execute()
array([[0, 1, 2],
       [3, 4, 5]])
>>> mt.reshape(mt.ravel(a), (2, 3)).execute()
array([[0, 1, 2],
       [3, 4, 5]])
```
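The "ravel, then refill" view of reshaping described above can be written out on nested Python lists. This is a minimal sketch, not MaxFrame's implementation: `reshape_2d` is a hypothetical helper that supports only C order, only two-dimensional targets, and at most one `-1` entry.

```python
def reshape_2d(rows, new_shape):
    """Reshape a rectangular list of lists by raveling in C order, then refilling."""
    flat = [value for row in rows for value in row]
    n_rows, n_cols = new_shape
    # A single -1 is inferred from the total element count.
    if n_rows == -1:
        n_rows = len(flat) // n_cols
    elif n_cols == -1:
        n_cols = len(flat) // n_rows
    if n_rows * n_cols != len(flat):
        raise ValueError("cannot reshape %d elements into %r" % (len(flat), new_shape))
    return [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


wide = reshape_2d([[0, 1], [2, 3], [4, 5]], (2, 3))
inferred = reshape_2d([[1, 2, 3], [4, 5, 6]], (3, -1))
```

Here `wide` reproduces the `(3, 2)` to `(2, 3)` case shown above, and `inferred` shows the unspecified dimension being resolved to 2.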
### Examples
```pycon
>>> a = mt.array([[1,2,3], [4,5,6]])
>>> mt.reshape(a, 6).execute()
array([1, 2, 3, 4, 5, 6])
```
```pycon
>>> mt.reshape(a, (3,-1)).execute() # the unspecified value is inferred to be 2
array([[1, 2],
       [3, 4],
       [5, 6]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.right_shift.md
# maxframe.tensor.right_shift
### maxframe.tensor.right_shift(x1, x2, out=None, where=None, \*\*kwargs)
Shift the bits of an integer to the right.
Bits are shifted to the right by x2 positions. Because the internal
representation of numbers is in binary format, this operation is
equivalent to dividing x1 by `2**x2`.
* **Parameters:**
* **x1** (*array_like* *,* [*int*](https://docs.python.org/3/library/functions.html#int)) – Input values.
* **x2** (*array_like* *,* [*int*](https://docs.python.org/3/library/functions.html#int)) – Number of bits to remove at the right of x1.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Return x1 with bits shifted x2 times to the right.
* **Return type:**
Tensor, [int](https://docs.python.org/3/library/functions.html#int)
#### SEE ALSO
[`left_shift`](maxframe.tensor.left_shift.md#maxframe.tensor.left_shift)
: Shift the bits of an integer to the left.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.right_shift(10, 1).execute()
5
```
```pycon
>>> mt.right_shift(10, [1,2,3]).execute()
array([5, 2, 1])
```
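The equivalence between a right shift and division by `2**x2` can be checked on plain Python integers. This is a sketch under the assumption that integer tensors follow the same rule; `shift_matches_floor_division` is a hypothetical helper, and the division in question is floor division, which matters for negative inputs.

```python
def shift_matches_floor_division(x, n):
    """Check that x >> n equals x // 2**n for Python integers."""
    return (x >> n) == (x // 2 ** n)


shifted = [10 >> n for n in (1, 2, 3)]
all_match = all(
    shift_matches_floor_division(x, n)
    for x in range(-64, 65)
    for n in range(8)
)
```

`shifted` matches the array shown in the example above, and `all_match` covers negative values as well, where the shift rounds toward negative infinity rather than toward zero.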
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.rint.md
# maxframe.tensor.rint
### maxframe.tensor.rint(x, out=None, where=None, \*\*kwargs)
Round elements of the tensor to the nearest integer.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Output array is same shape and type as x.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`ceil`](maxframe.tensor.ceil.md#maxframe.tensor.ceil), [`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor), [`trunc`](maxframe.tensor.trunc.md#maxframe.tensor.trunc)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
>>> mt.rint(a).execute()
array([-2., -2., -0., 0., 2., 2., 2.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.roll.md
# maxframe.tensor.roll
### maxframe.tensor.roll(a, shift, axis=None)
Roll tensor elements along a given axis.
Elements that roll beyond the last position are re-introduced at
the first.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **shift** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints*) – The number of places by which elements are shifted. If a tuple,
then axis must be a tuple of the same size, and each of the
given axes is shifted by the corresponding number. If an int
while axis is a tuple of ints, then the same value is used for
all given axes.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Axis or axes along which elements are shifted. By default, the
tensor is flattened before shifting, after which the original
shape is restored.
* **Returns:**
**res** – Output tensor, with the same shape as a.
* **Return type:**
Tensor
#### SEE ALSO
[`rollaxis`](maxframe.tensor.rollaxis.md#maxframe.tensor.rollaxis)
: Roll the specified axis backwards, until it lies in a given position.
### Notes
Supports rolling over multiple dimensions simultaneously.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(10)
>>> mt.roll(x, 2).execute()
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
```
```pycon
>>> x2 = mt.reshape(x, (2,5))
>>> x2.execute()
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> mt.roll(x2, 1).execute()
array([[9, 0, 1, 2, 3],
       [4, 5, 6, 7, 8]])
>>> mt.roll(x2, 1, axis=0).execute()
array([[5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4]])
>>> mt.roll(x2, 1, axis=1).execute()
array([[4, 0, 1, 2, 3],
       [9, 5, 6, 7, 8]])
```
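The wrap-around behaviour can be reproduced on a plain Python list with two slices. This is a minimal sketch of the one-dimensional case only; `roll_flat` is a hypothetical helper, and rolling along an axis of a nested list is emulated by applying it to each row.

```python
def roll_flat(values, shift):
    """Shift elements right by `shift`, wrapping the overflow to the front."""
    values = list(values)
    if not values:
        return values
    k = shift % len(values)  # also normalises negative and oversized shifts
    return values[-k:] + values[:-k] if k else values


rolled = roll_flat(range(10), 2)
rolled_back = roll_flat(range(10), -2)
rows_rolled = [roll_flat(row, 1) for row in [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]]
```

`rolled` matches the first example above, and `rows_rolled` matches the `axis=1` case, where each row is rolled independently.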
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.rollaxis.md
# maxframe.tensor.rollaxis
### maxframe.tensor.rollaxis(tensor, axis, start=0)
Roll the specified axis backwards, until it lies in a given position.
This function continues to be supported for backward compatibility, but you
should prefer `moveaxis`.
* **Parameters:**
* **tensor** (*Tensor*) – Input tensor.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int)) – The axis to roll backwards. The positions of the other axes do not
change relative to one another.
* **start** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis is rolled until it lies before this position. The default,
0, results in a “complete” roll.
* **Returns:**
**res** – A view of the input tensor is always returned.
* **Return type:**
Tensor
#### SEE ALSO
[`moveaxis`](maxframe.tensor.moveaxis.md#maxframe.tensor.moveaxis)
: Move array axes to new positions.
[`roll`](maxframe.tensor.roll.md#maxframe.tensor.roll)
: Roll the elements of an array by a number of positions along a given axis.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.ones((3,4,5,6))
>>> mt.rollaxis(a, 3, 1).shape
(3, 6, 4, 5)
>>> mt.rollaxis(a, 2).shape
(5, 3, 4, 6)
>>> mt.rollaxis(a, 1, 4).shape
(3, 5, 6, 4)
```
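The shapes in this example can be predicted by tracking how the axis order is permuted. The sketch below works on shape tuples only and assumes a non-negative `axis` and `0 <= start <= ndim`; `rollaxis_shape` is a hypothetical helper written for illustration, not part of MaxFrame.

```python
def rollaxis_shape(shape, axis, start=0):
    """Return the shape that results from rolling `axis` to sit before `start`."""
    axes = list(range(len(shape)))
    if axis < start:
        start -= 1  # removing `axis` shifts every later position down by one
    axes.remove(axis)
    axes.insert(start, axis)
    return tuple(shape[i] for i in axes)


shape = (3, 4, 5, 6)
results = [
    rollaxis_shape(shape, 3, 1),
    rollaxis_shape(shape, 2),
    rollaxis_shape(shape, 1, 4),
]
```

The three entries of `results` correspond, in order, to the three `rollaxis` calls in the example above.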
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.round_.md
# maxframe.tensor.round_
### maxframe.tensor.round_(a, decimals=0, out=None)
Evenly round to the given number of decimals.
* **Parameters:**
* **a** (*array_like*) – Input data.
* **decimals** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Number of decimal places to round to (default: 0). If
decimals is negative, it specifies the number of positions to
the left of the decimal point.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must have
the same shape as the expected output, but the type of the output
values will be cast if necessary.
* **Returns:**
**rounded_array** – A tensor of the same type as a, containing the rounded values.
Unless out was specified, a new tensor is created. A reference to
the result is returned.
The real and imaginary parts of complex numbers are rounded
separately. The result of rounding a float is a float.
* **Return type:**
Tensor
#### SEE ALSO
`Tensor.round`
: equivalent method
[`ceil`](maxframe.tensor.ceil.md#maxframe.tensor.ceil), [`fix`](maxframe.tensor.fix.md#maxframe.tensor.fix), [`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor), [`rint`](maxframe.tensor.rint.md#maxframe.tensor.rint), [`trunc`](maxframe.tensor.trunc.md#maxframe.tensor.trunc)
### Notes
For values exactly halfway between rounded decimal values, NumPy
rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0,
-0.5 and 0.5 round to 0.0, etc. Results may also be surprising due
to the inexact representation of decimal fractions in the IEEE
floating point standard <sup>[1](#id2)</sup> and errors introduced when scaling
by powers of ten.
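The half-to-even rule and the representation effect described above can be reproduced without a MaxFrame session. The sketch below uses Python's built-in `round` as a stand-in, on the assumption that it applies the same tie-breaking rule as `mt.around`; it is an illustration, not the library's implementation.

```python
# Built-in round() is used here as a stand-in for mt.around (an assumption);
# for floats it also rounds ties to the nearest even value.
halves = [0.5, 1.5, 2.5, 3.5, 4.5]
rounded_halves = [round(v) for v in halves]

# 2.675 has no exact binary representation; the stored value lies slightly
# below 2.675, so rounding to two decimals goes down rather than up.
surprising = round(2.675, 2)

# A negative digit count rounds to positions left of the decimal point.
tens = [round(v, -1) for v in [1, 2, 3, 11]]

print(rounded_halves, surprising, tens)
```

The last list mirrors the `decimals=-1` example in this entry.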
### References
* <a id='id2'>**[1]**</a> “Lecture Notes on the Status of IEEE 754”, William Kahan, [http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF](http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF)
* <a id='id3'>**[2]**</a> “How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?”, William Kahan, [http://www.cs.berkeley.edu/~wkahan/Mindless.pdf](http://www.cs.berkeley.edu/~wkahan/Mindless.pdf)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.around([0.37, 1.64]).execute()
array([ 0., 2.])
>>> mt.around([0.37, 1.64], decimals=1).execute()
array([ 0.4, 1.6])
>>> mt.around([.5, 1.5, 2.5, 3.5, 4.5]).execute() # rounds to nearest even value
array([ 0., 2., 2., 4., 4.])
>>> mt.around([1,2,3,11], decimals=1).execute() # tensor of ints is returned
array([ 1, 2, 3, 11])
>>> mt.around([1,2,3,11], decimals=-1).execute()
array([ 0, 0, 0, 10])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.setdiff1d.md
# maxframe.tensor.setdiff1d
### maxframe.tensor.setdiff1d(ar1, ar2, assume_unique=False)
Find the set difference of two tensors.
Return the unique values in ar1 that are not in ar2.
* **Parameters:**
* **ar1** (*array_like*) – Input tensor.
* **ar2** (*array_like*) – Input comparison tensor.
* **assume_unique** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – If True, the input tensors are both assumed to be unique, which
can speed up the calculation. Default is False.
* **Returns:**
**setdiff1d** – 1D tensor of values in ar1 that are not in ar2. The result
is sorted when assume_unique=False, but otherwise only sorted
if the input is sorted.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([1, 2, 3, 2, 4, 1])
>>> b = mt.array([3, 4, 5, 6])
>>> mt.setdiff1d(a, b).execute()
array([1, 2])
```
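For intuition, the same result can be sketched with plain Python sets. The helper name `setdiff1d_sketch` is hypothetical, and the sketch only mirrors the documented behaviour for `assume_unique=False` (unique, sorted output); it says nothing about how MaxFrame computes the result.

```python
def setdiff1d_sketch(ar1, ar2):
    """Unique values of ar1 that do not occur in ar2, sorted ascending."""
    exclude = set(ar2)
    return sorted(set(ar1) - exclude)

# Same inputs as the example above.
result = setdiff1d_sketch([1, 2, 3, 2, 4, 1], [3, 4, 5, 6])
print(result)
```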
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.shape.md
# maxframe.tensor.shape
### maxframe.tensor.shape(a)
Return the shape of a tensor.
* **Parameters:**
**a** (*array_like*) – Input tensor.
* **Returns:**
**shape** – The elements of the shape tuple give the lengths of the
corresponding array dimensions.
* **Return type:**
ExecutableTuple of tensors
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.shape(mt.eye(3)).execute()
(3, 3)
>>> mt.shape([[1, 2]]).execute()
(1, 2)
>>> mt.shape([0]).execute()
(1,)
>>> mt.shape(0).execute()
()
```
```pycon
>>> a = mt.array([(1, 2), (3, 4)], dtype=[('x', 'i4'), ('y', 'i4')])
>>> mt.shape(a).execute()
(2,)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.sign.md
# maxframe.tensor.sign
### maxframe.tensor.sign(x, out=None, where=None, \*\*kwargs)
Returns an element-wise indication of the sign of a number.
The sign function returns `-1 if x < 0, 0 if x==0, 1 if x > 0`. nan
is returned for nan inputs.
For complex inputs, the sign function returns
`sign(x.real) + 0j if x.real != 0 else sign(x.imag) + 0j`.
complex(nan, 0) is returned for complex nan inputs.
* **Parameters:**
* **x** (*array_like*) – Input values.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The sign of x.
* **Return type:**
Tensor
### Notes
There is more than one definition of sign in common use for complex
numbers. The definition used here is equivalent to $x/\sqrt{x*x}$
which is different from a common alternative, $x/|x|$.
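The complex rule and the equivalent form mentioned in the note can be compared with the standard library. This is a minimal sketch with hypothetical helper names; it assumes non-zero inputs and the principal branch of the square root.

```python
import cmath

def real_sign(v):
    """-1, 0 or 1 according to the sign of a real number."""
    return (v > 0) - (v < 0)

def complex_sign(x):
    """Rule quoted in this entry: use the real part unless it is zero."""
    part = x.real if x.real != 0 else x.imag
    return complex(real_sign(part), 0)

def sign_via_sqrt(x):
    """Equivalent form x / sqrt(x*x) for non-zero x."""
    return x / cmath.sqrt(x * x)

samples = [5 - 2j, -3 + 4j, 2j]
pairs = [(complex_sign(x), sign_via_sqrt(x)) for x in samples]

# The common alternative x / |x| gives a different answer for 5-2j.
alternative = (5 - 2j) / abs(5 - 2j)
print(pairs, alternative)
```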
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.sign([-5., 4.5]).execute()
array([-1., 1.])
>>> mt.sign(0).execute()
0
>>> mt.sign(5-2j).execute()
(1+0j)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.signbit.md
# maxframe.tensor.signbit
### maxframe.tensor.signbit(x, out=None, where=None, \*\*kwargs)
Returns element-wise True where signbit is set (less than zero).
* **Parameters:**
* **x** (*array_like*) – The input value(s).
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**result** – Output tensor, or reference to out if that was supplied.
* **Return type:**
Tensor of [bool](https://docs.python.org/3/library/functions.html#bool)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.signbit(-1.2).execute()
True
>>> mt.signbit(mt.array([1, -2.3, 2.1])).execute()
array([False, True, False])
```
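"Less than zero" is a slight simplification: the sign bit is a property of the floating-point encoding, so negative zero also has it set. The sketch below shows this with `math.copysign`; it assumes MaxFrame follows the usual IEEE 754 behaviour here, and `signbit_sketch` is a hypothetical helper name.

```python
import math

def signbit_sketch(v):
    """True when the IEEE 754 sign bit of a float is set."""
    return math.copysign(1.0, v) < 0

flags = [signbit_sketch(v) for v in (1.0, -2.3, 2.1)]

# -0.0 compares equal to 0.0, yet its sign bit is set.
negative_zero_flag = signbit_sketch(-0.0)
print(flags, negative_zero_flag)
```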
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.sin.md
# maxframe.tensor.sin
### maxframe.tensor.sin(x, out=None, where=None, \*\*kwargs)
Trigonometric sine, element-wise.
* **Parameters:**
* **x** (*array_like*) – Angle, in radians ($2 \pi$ rad equals 360 degrees).
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The sine of each element of x.
* **Return type:**
array_like
#### SEE ALSO
[`arcsin`](maxframe.tensor.arcsin.md#maxframe.tensor.arcsin), [`sinh`](maxframe.tensor.sinh.md#maxframe.tensor.sinh), [`cos`](maxframe.tensor.cos.md#maxframe.tensor.cos)
### Notes
The sine is one of the fundamental functions of trigonometry (the
mathematical study of triangles). Consider a circle of radius 1
centered on the origin. A ray comes in from the $+x$ axis, makes
an angle at the origin (measured counter-clockwise from that axis), and
departs from the origin. The $y$ coordinate of the outgoing
ray’s intersection with the unit circle is the sine of that angle. It
ranges from -1 for $x=3\pi / 2$ to +1 for $\pi / 2.$ The
function has zeroes where the angle is a multiple of $\pi$.
Sines of angles between $\pi$ and $2\pi$ are negative.
The numerous properties of the sine and related functions are included
in any standard trigonometry text.
### Examples
Print sine of one angle:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.sin(mt.pi/2.).execute()
1.0
```
Print sines of an array of angles given in degrees:
```pycon
>>> mt.sin(mt.array((0., 30., 45., 60., 90.)) * mt.pi / 180. ).execute()
array([ 0. , 0.5 , 0.70710678, 0.8660254 , 1. ])
```
Plot the sine function:
```pycon
>>> import matplotlib.pylab as plt
>>> x = mt.linspace(-mt.pi, mt.pi, 201)
>>> plt.plot(x.execute(), mt.sin(x).execute())
>>> plt.xlabel('Angle [rad]')
>>> plt.ylabel('sin(x)')
>>> plt.axis('tight')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.sinc.md
# maxframe.tensor.sinc
### maxframe.tensor.sinc(x, \*\*kwargs)
Return the sinc function.
The sinc function is $\sin(\pi x)/(\pi x)$.
* **Parameters:**
**x** (*Tensor*) – Tensor (possibly multi-dimensional) of values for which to
calculate `sinc(x)`.
* **Returns:**
**out** – `sinc(x)`, which has the same shape as the input.
* **Return type:**
Tensor
### Notes
`sinc(0)` is the limit value 1.
The name sinc is short for “sine cardinal” or “sinus cardinalis”.
The sinc function is used in various signal processing applications,
including in anti-aliasing, in the construction of a Lanczos resampling
filter, and in interpolation.
For bandlimited interpolation of discrete-time signals, the ideal
interpolation kernel is proportional to the sinc function.
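The definition and the limit value at zero can be written out as a scalar sketch. The function name `sinc_sketch` is hypothetical and the code only illustrates the formula; it is not how MaxFrame evaluates `sinc` on tensors.

```python
import math

def sinc_sketch(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with the limit value 1 at x == 0."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

values = [sinc_sketch(v) for v in (-1.0, -0.5, 0.0, 0.5, 1.0)]
print(values)
```

At non-zero integers the result is zero only up to rounding, which is why the example output in this entry shows values of order `1e-17` rather than exact zeros.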
### References
* <a id='id1'>**[1]**</a> Weisstein, Eric W. “Sinc Function.” From MathWorld–A Wolfram Web Resource. [http://mathworld.wolfram.com/SincFunction.html](http://mathworld.wolfram.com/SincFunction.html)
* <a id='id2'>**[2]**</a> Wikipedia, “Sinc function”, [http://en.wikipedia.org/wiki/Sinc_function](http://en.wikipedia.org/wiki/Sinc_function)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.linspace(-4, 4, 41)
>>> mt.sinc(x).execute()
array([ -3.89804309e-17, -4.92362781e-02, -8.40918587e-02,
-8.90384387e-02, -5.84680802e-02, 3.89804309e-17,
6.68206631e-02, 1.16434881e-01, 1.26137788e-01,
8.50444803e-02, -3.89804309e-17, -1.03943254e-01,
-1.89206682e-01, -2.16236208e-01, -1.55914881e-01,
3.89804309e-17, 2.33872321e-01, 5.04551152e-01,
7.56826729e-01, 9.35489284e-01, 1.00000000e+00,
9.35489284e-01, 7.56826729e-01, 5.04551152e-01,
2.33872321e-01, 3.89804309e-17, -1.55914881e-01,
-2.16236208e-01, -1.89206682e-01, -1.03943254e-01,
-3.89804309e-17, 8.50444803e-02, 1.26137788e-01,
1.16434881e-01, 6.68206631e-02, 3.89804309e-17,
-5.84680802e-02, -8.90384387e-02, -8.40918587e-02,
-4.92362781e-02, -3.89804309e-17])
```
```pycon
>>> import matplotlib.pyplot as plt
>>> plt.plot(x.execute(), mt.sinc(x).execute())
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.title("Sinc Function")
<matplotlib.text.Text object at 0x...>
>>> plt.ylabel("Amplitude")
<matplotlib.text.Text object at 0x...>
>>> plt.xlabel("X")
<matplotlib.text.Text object at 0x...>
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.sinh.md
# maxframe.tensor.sinh
### maxframe.tensor.sinh(x, out=None, where=None, \*\*kwargs)
Hyperbolic sine, element-wise.
Equivalent to `1/2 * (mt.exp(x) - mt.exp(-x))` or
`-1j * mt.sin(1j*x)`.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The corresponding hyperbolic sine values.
* **Return type:**
Tensor
### Notes
If out is provided, the function writes the result into it,
and returns a reference to out. (See Examples)
### References
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions.
New York, NY: Dover, 1972, pg. 83.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.sinh(0).execute()
0.0
>>> mt.sinh(mt.pi*1j/2).execute()
1j
>>> mt.sinh(mt.pi*1j).execute() # (exact value is 0)
1.2246063538223773e-016j
>>> # Discrepancy due to vagaries of floating point arithmetic.
```
```pycon
>>> # Example of providing the optional output parameter
>>> out1 = mt.zeros(1)
>>> out2 = mt.sinh([0.1], out1)
>>> out2 is out1
True
```
```pycon
>>> # Example of ValueError due to provision of shape mis-matched `out`
>>> mt.sinh(mt.zeros((3,3)),mt.zeros((2,2))).execute()
Traceback (most recent call last):
...
ValueError: operators could not be broadcast together with shapes (3,3) (2,2)
```
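The two equivalent expressions given at the top of this entry can be checked numerically with the standard library. This is a sketch with hypothetical helper names, using `cmath` in place of MaxFrame tensors.

```python
import cmath
import math

def sinh_from_exp(x):
    """sinh written as 1/2 * (exp(x) - exp(-x))."""
    return 0.5 * (cmath.exp(x) - cmath.exp(-x))

def sinh_from_sin(x):
    """sinh written as -1j * sin(1j * x)."""
    return -1j * cmath.sin(1j * x)

x = 0.75
reference = math.sinh(x)
errors = [abs(f(x) - reference) for f in (sinh_from_exp, sinh_from_sin)]
print(errors)
```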
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.sort.md
# maxframe.tensor.sort
### maxframe.tensor.sort(a, axis=-1, kind=None, order=None, \*, stable=None, parallel_kind=None, psrs_kinds=None, return_index=False, \*\*kw)
Return a sorted copy of a tensor.
* **Parameters:**
* **a** (*array_like*) – Tensor to be sorted.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None* *,* *optional*) – Axis along which to sort. If None, the tensor is flattened before
sorting. The default is -1, which sorts along the last axis.
* **kind** ( *{'quicksort'* *,* *'mergesort'* *,* *'heapsort'* *,* *'stable'}* *,* *optional*) – Sorting algorithm. The default is ‘quicksort’. Note that both ‘stable’
and ‘mergesort’ use timsort or radix sort under the covers and, in general,
the actual implementation will vary with data type. The ‘mergesort’ option
is retained for backwards compatibility.
Note that this argument would not take effect if a has more than
1 chunk on the sorting axis.
* **order** ([*str*](https://docs.python.org/3/library/stdtypes.html#str) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* [*str*](https://docs.python.org/3/library/stdtypes.html#str) *,* *optional*) – When a is a tensor with fields defined, this argument specifies
which fields to compare first, second, etc. A single field can
be specified as a string, and not all fields need be specified,
but unspecified fields will still be used, in the order in which
they come up in the dtype, to break ties.
* **stable** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Sort stability. If True, the returned array will maintain the relative
order of a values which compare as equal. If False or None, this
is not guaranteed. Internally, this option selects kind=’stable’.
Default: None.
* **parallel_kind** ( *{'PSRS'}* *,* *optional*) – Parallel sorting algorithm. For details, see:
[http://csweb.cs.wfu.edu/bigiron/LittleFE-PSRS/build/html/PSRSalgorithm.html](http://csweb.cs.wfu.edu/bigiron/LittleFE-PSRS/build/html/PSRSalgorithm.html)
* **psrs_kinds** (*list with 3 elements* *,* *optional*) – Sorting algorithms during PSRS algorithm.
* **return_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool)) – Return indices as well if True.
* **Returns:**
**sorted_tensor** – Tensor of the same type and shape as a.
* **Return type:**
Tensor
#### SEE ALSO
`Tensor.sort`
: Method to sort a tensor in-place.
[`argsort`](maxframe.tensor.argsort.md#maxframe.tensor.argsort)
: Indirect sort.
`lexsort`
: Indirect stable sort on multiple keys.
`searchsorted`
: Find elements in a sorted tensor.
[`partition`](maxframe.tensor.partition.md#maxframe.tensor.partition)
: Partial sort.
### Notes
The various sorting algorithms are characterized by their average speed,
worst case performance, work space size, and whether they are stable. A
stable sort keeps items with the same key in the same relative
order. The four algorithms implemented in NumPy have the following
properties:
| kind | speed | worst case | work space | stable |
|-------------|---------|--------------|--------------|----------|
| ‘quicksort’ | 1 | O(n^2) | 0 | no |
| ‘heapsort’ | 3 | O(n\*log(n)) | 0 | no |
| ‘mergesort’ | 2 | O(n\*log(n)) | ~n/2 | yes |
| ‘timsort’ | 2 | O(n\*log(n)) | ~n/2 | yes |
#### NOTE
The datatype determines which of ‘mergesort’ or ‘timsort’
is actually used, even if ‘mergesort’ is specified. User selection
at a finer scale is not currently available.
All the sort algorithms make temporary copies of the data when
sorting along any but the last axis. Consequently, sorting along
the last axis is faster and uses less space than sorting along
any other axis.
The sort order for complex numbers is lexicographic. If both the real
and imaginary parts are non-nan then the order is determined by the
real parts except when they are equal, in which case the order is
determined by the imaginary parts.
quicksort has been changed to an introsort which will switch to
heapsort when it does not make enough progress. This makes its
worst case O(n\*log(n)).
‘stable’ automatically chooses the best stable sorting algorithm
for the data type being sorted. It, along with ‘mergesort’ is
currently mapped to timsort or radix sort depending on the
data type. API forward compatibility currently limits the
ability to select the implementation and it is hardwired for the different
data types.
Timsort is added for better performance on already or nearly
sorted data. On random data timsort is almost identical to
mergesort. It is now used for stable sort while quicksort is still the
default sort if none is chosen. For details of timsort, refer to
[CPython listsort.txt](https://github.com/python/cpython/blob/3.7/Objects/listsort.txt).
‘mergesort’ and ‘stable’ are mapped to radix sort for integer data types. Radix sort is an
O(n) sort instead of O(n log n).
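What "stable" means in practice can be seen with Python's built-in `sorted`, which is a stable timsort. The records below are illustrative, and the sketch shows the property only; it makes no claim about which algorithm MaxFrame selects.

```python
# (name, height, age) records; illustrative data.
knights = [("Arthur", 1.8, 41), ("Lancelot", 1.9, 38), ("Galahad", 1.7, 38)]

# A stable sort keeps records with equal keys in their input order,
# so Lancelot stays ahead of Galahad when sorting by age alone.
by_age = sorted(knights, key=lambda rec: rec[2])

# Sorting on (age, height) breaks the tie explicitly, which corresponds to
# passing several field names through the order argument.
by_age_then_height = sorted(knights, key=lambda rec: (rec[2], rec[1]))

print([rec[0] for rec in by_age])
print([rec[0] for rec in by_age_then_height])
```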
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = mt.array([[1,4],[3,1]])
>>> mt.sort(a).execute() # sort along the last axis
array([[1, 4],
[1, 3]])
>>> mt.sort(a, axis=None).execute() # sort the flattened tensor
array([1, 1, 3, 4])
>>> mt.sort(a, axis=0).execute() # sort along the first axis
array([[1, 1],
[3, 4]])
```
Use the order keyword to specify a field to use when sorting a
structured array:
```pycon
>>> dtype = [('name', 'S10'), ('height', float), ('age', int)]
>>> values = [('Arthur', 1.8, 41), ('Lancelot', 1.9, 38),
... ('Galahad', 1.7, 38)]
>>> a = mt.array(values, dtype=dtype) # create a structured tensor
>>> mt.sort(a, order='height').execute()
array([('Galahad', 1.7, 38), ('Arthur', 1.8, 41),
('Lancelot', 1.8999999999999999, 38)],
dtype=[('name', '|S10'), ('height', '<f8'), ('age', '<i4')])
```
Sort by age, then height if ages are equal:
```pycon
>>> mt.sort(a, order=['age', 'height']).execute()
array([('Galahad', 1.7, 38), ('Lancelot', 1.8999999999999999, 38),
('Arthur', 1.8, 41)],
dtype=[('name', '|S10'), ('height', '<f8'), ('age', '<i4')])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.spacing.md
# maxframe.tensor.spacing
### maxframe.tensor.spacing(x, out=None, where=None, \*\*kwargs)
Return the distance between x and the nearest adjacent number.
* **Parameters:**
* **x** (*array_like*) – Values to find the spacing of.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – The spacing of values of x.
* **Return type:**
array_like
### Notes
It can be considered as a generalization of EPS:
`spacing(mt.float64(1)) == mt.finfo(mt.float64).eps`, and there
should not be any representable number between `x + spacing(x)` and
x for any finite x.
Spacing of +- inf and NaN is NaN.
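The relationship to machine epsilon can be illustrated with the standard library, assuming Python 3.9 or later for `math.ulp` and `math.nextafter`. These functions are stand-ins for `mt.spacing` on finite double-precision values and may differ from it at infinities and NaN.

```python
import math
import sys

# Gap between 1.0 and the next representable double.
one_spacing = math.ulp(1.0)

# The smallest double strictly greater than 1.0.
next_after_one = math.nextafter(1.0, math.inf)

# The spacing grows with the magnitude of the value.
big_spacing = math.ulp(1.0e16)

print(one_spacing, next_after_one - 1.0, big_spacing)
```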
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> (mt.spacing(1) == mt.finfo(mt.float64).eps).execute()
True
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.airy.md
# maxframe.tensor.special.airy
### maxframe.tensor.special.airy(z, out=None)
Airy functions and their derivatives.
* **Parameters:**
* **z** (*array_like*) – Real or complex argument.
* **out** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ndarray* *,* *optional*) – Optional output arrays for the function values
* **Returns:**
**Ai, Aip, Bi, Bip** – Airy functions Ai and Bi, and their derivatives Aip and Bip.
* **Return type:**
4-tuple of scalar or ndarray
#### SEE ALSO
[`airye`](maxframe.tensor.special.airye.md#maxframe.tensor.special.airye)
: exponentially scaled Airy functions.
### Notes
The Airy functions Ai and Bi are two independent solutions of
$$
y''(x) = x y(x).
$$
For real z in [-10, 10], the computation is carried out by calling
the Cephes <sup>[1](#id3)</sup> airy routine, which uses power series summation
for small z and rational minimax approximations for large z.
Outside this range, the AMOS <sup>[2](#id4)</sup> zairy and zbiry routines are
employed. They are computed using power series for $|z| < 1$ and
the following relations to modified Bessel functions for larger z
(where $t \equiv 2 z^{3/2}/3$):
$$
\begin{aligned}
Ai(z) &= \frac{1}{\pi} \sqrt{\frac{z}{3}} K_{1/3}(t) \\
Ai'(z) &= -\frac{z}{\pi \sqrt{3}} K_{2/3}(t) \\
Bi(z) &= \sqrt{\frac{z}{3}} \left(I_{-1/3}(t) + I_{1/3}(t) \right) \\
Bi'(z) &= \frac{z}{\sqrt{3}} \left(I_{-2/3}(t) + I_{2/3}(t)\right)
\end{aligned}
$$
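To make the power-series idea concrete, the sketch below sums the two Maclaurin series that solve $y'' = x y$ and combines them into Ai for small real arguments. It is an illustration under stated assumptions, not the Cephes or AMOS routine: the function name, the fixed term count, and the restriction to small real `x` are all choices made here.

```python
import math

# Ai(0) and -Ai'(0), expressed through the gamma function.
C1 = 1.0 / (3.0 ** (2.0 / 3.0) * math.gamma(2.0 / 3.0))
C2 = 1.0 / (3.0 ** (1.0 / 3.0) * math.gamma(1.0 / 3.0))

def airy_ai_series(x, terms=40):
    """Ai(x) = C1*f(x) - C2*g(x), where f and g are series solutions of y'' = x*y."""
    f_term, g_term = 1.0, x          # leading terms: f starts at 1, g starts at x
    f_sum, g_sum = f_term, g_term
    for k in range(terms):
        # Substituting a power series into y'' = x*y gives
        # a_{n+3} = a_n / ((n + 2) * (n + 3)).
        f_term *= x ** 3 / ((3 * k + 2) * (3 * k + 3))
        g_term *= x ** 3 / ((3 * k + 3) * (3 * k + 4))
        f_sum += f_term
        g_sum += g_term
    return C1 * f_sum - C2 * g_sum

ai_at_zero = airy_ai_series(0.0)
ai_at_one = airy_ai_series(1.0)
print(ai_at_zero, ai_at_one)
```

A series like this loses accuracy for large `|x|` because of cancellation, which is one reason production routines switch to other approximations away from the origin.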
### References
* <a id='id3'>**[1]**</a> Cephes Mathematical Functions Library, [http://www.netlib.org/cephes/](http://www.netlib.org/cephes/)
* <a id='id4'>**[2]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.airye.md
# maxframe.tensor.special.airye
### maxframe.tensor.special.airye(z, out=None)
Exponentially scaled Airy functions and their derivatives.
Scaling:
```default
eAi = Ai * exp(2.0/3.0*z*sqrt(z))
eAip = Aip * exp(2.0/3.0*z*sqrt(z))
eBi = Bi * exp(-abs(2.0/3.0*(z*sqrt(z)).real))
eBip = Bip * exp(-abs(2.0/3.0*(z*sqrt(z)).real))
```
* **Parameters:**
* **z** (*array_like*) – Real or complex argument.
* **out** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ndarray* *,* *optional*) – Optional output arrays for the function values
* **Returns:**
**eAi, eAip, eBi, eBip** – Exponentially scaled Airy functions eAi and eBi, and their derivatives
eAip and eBip
* **Return type:**
4-tuple of scalar or ndarray
#### SEE ALSO
[`airy`](maxframe.tensor.special.airy.md#maxframe.tensor.special.airy)
### Notes
Wrapper for the AMOS <sup>[1](#id2)</sup> routines zairy and zbiry.
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.beta.md
# maxframe.tensor.special.beta
### maxframe.tensor.special.beta(a, b, out=None, \*\*kwargs)
Beta function.
This function is defined in <sup>[1](#id2)</sup> as
$$
B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}dt
= \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)},
$$
where $\Gamma$ is the gamma function.
* **Parameters:**
* **a** (*array_like*) – Real-valued arguments
* **b** (*array_like*) – Real-valued arguments
* **out** (*ndarray* *,* *optional*) – Optional output array for the function result
* **Returns:**
Value of the beta function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gamma`](maxframe.tensor.special.gamma.md#maxframe.tensor.special.gamma)
: the gamma function
[`betainc`](maxframe.tensor.special.betainc.md#maxframe.tensor.special.betainc)
: the regularized incomplete beta function
[`betaln`](maxframe.tensor.special.betaln.md#maxframe.tensor.special.betaln)
: the natural logarithm of the absolute value of the beta function
### References
* <a id='id2'>**[1]**</a> NIST Digital Library of Mathematical Functions, Eq. 5.12.1. [https://dlmf.nist.gov/5.12](https://dlmf.nist.gov/5.12)
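
The two sides of the defining identity can be compared numerically. The sketch below evaluates the gamma-function form with `math.gamma` and estimates the integral with a midpoint rule; both helper names are hypothetical, and the code is a check of the formula rather than the library's method.

```python
import math

def beta_sketch(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b) for positive a and b."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_by_midpoint(a, b, steps=100_000):
    """Midpoint-rule estimate of the defining integral over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1.0 - t) ** (b - 1)
    return total * h

closed_form = beta_sketch(2.0, 3.0)   # Gamma(2)*Gamma(3)/Gamma(5) = 2/24
numeric = beta_by_midpoint(2.0, 3.0)
print(closed_form, numeric)
```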
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.betainc.md
# maxframe.tensor.special.betainc
### maxframe.tensor.special.betainc(a, b, x, out=None, \*\*kwargs)
Regularized incomplete beta function.
Computes the regularized incomplete beta function, defined as <sup>[1](#id2)</sup>:
$$
I_x(a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \int_0^x
t^{a-1}(1-t)^{b-1}dt,
$$
for $0 \leq x \leq 1$.
This function is the cumulative distribution function for the beta
distribution; its range is [0, 1].
* **Parameters:**
* **a** (*array_like*) – Positive, real-valued parameters
* **b** (*array_like*) – Positive, real-valued parameters
* **x** (*array_like*) – Real-valued such that $0 \leq x \leq 1$,
the upper limit of integration
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Value of the regularized incomplete beta function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`beta`](maxframe.tensor.special.beta.md#maxframe.tensor.special.beta)
: beta function
[`betaincinv`](maxframe.tensor.special.betaincinv.md#maxframe.tensor.special.betaincinv)
: inverse of the regularized incomplete beta function
`betaincc`
: complement of the regularized incomplete beta function
[`scipy.stats.beta`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html#scipy.stats.beta)
: beta distribution
### Notes
The term *regularized* in the name of this function refers to the
scaling of the function by the gamma function terms shown in the
formula. When not qualified as *regularized*, the name *incomplete
beta function* often refers to just the integral expression,
without the gamma terms. One can use the function beta from
scipy.special to get this “nonregularized” incomplete beta
function by multiplying the result of `betainc(a, b, x)` by
`beta(a, b)`.
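The distinction between the regularized and nonregularized forms can be sketched with a simple quadrature. The code below is a rough illustration that assumes `a >= 1` and `b >= 1` (so the integrand stays finite at the end points) and uses hypothetical helper names; it is not the algorithm used by the library.

```python
import math

def beta_fn(a, b):
    """Complete beta function via gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def incomplete_beta_integral(a, b, x, steps=2000):
    """Composite Simpson estimate of the integral of t^(a-1) (1-t)^(b-1) over [0, x]."""
    f = lambda t: t ** (a - 1) * (1.0 - t) ** (b - 1)
    h = x / steps
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def betainc_sketch(a, b, x):
    """Regularized value: the integral divided by B(a, b)."""
    return incomplete_beta_integral(a, b, x) / beta_fn(a, b)

regularized = betainc_sketch(2.0, 3.0, 0.5)
# Multiplying by B(a, b) recovers the plain integral, as the note describes.
nonregularized = regularized * beta_fn(2.0, 3.0)
print(regularized, nonregularized)
```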
### References
* <a id='id2'>**[1]**</a> NIST Digital Library of Mathematical Functions [https://dlmf.nist.gov/8.17](https://dlmf.nist.gov/8.17)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.betaincinv.md
# maxframe.tensor.special.betaincinv
### maxframe.tensor.special.betaincinv(a, b, y, out=None, \*\*kwargs)
Inverse of the regularized incomplete beta function.
Computes $x$ such that:
$$
y = I_x(a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}
\int_0^x t^{a-1}(1-t)^{b-1}dt,
$$
where $I_x$ is the normalized incomplete beta function betainc
and $\Gamma$ is the gamma function <sup>[1](#id2)</sup>.
* **Parameters:**
* **a** (*array_like*) – Positive, real-valued parameters
* **b** (*array_like*) – Positive, real-valued parameters
* **y** (*array_like*) – Real-valued input
* **out** (*ndarray* *,* *optional*) – Optional output array for function values
* **Returns:**
Value of the inverse of the regularized incomplete beta function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`betainc`](maxframe.tensor.special.betainc.md#maxframe.tensor.special.betainc)
: regularized incomplete beta function
[`gamma`](maxframe.tensor.special.gamma.md#maxframe.tensor.special.gamma)
: gamma function
### References
* <a id='id2'>**[1]**</a> NIST Digital Library of Mathematical Functions [https://dlmf.nist.gov/8.17](https://dlmf.nist.gov/8.17)
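
Because $I_x(a, b)$ increases monotonically with $x$ on $[0, 1]$, the inverse can be sketched as a bisection on a forward evaluation. The code below is a slow illustration under the assumption `a >= 1`, `b >= 1`, with hypothetical helper names; the library routine is expected to be far more efficient.

```python
import math

def betainc_sketch(a, b, x, steps=400):
    """Simpson estimate of I_x(a, b); assumes a >= 1 and b >= 1."""
    f = lambda t: t ** (a - 1) * (1.0 - t) ** (b - 1)
    h = x / steps
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * total * h / 3.0

def betaincinv_sketch(a, b, y, iterations=60):
    """Bisection on [0, 1] for the x that satisfies I_x(a, b) = y."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if betainc_sketch(a, b, mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = betaincinv_sketch(2.0, 3.0, 0.6875)
round_trip = betainc_sketch(2.0, 3.0, x)
print(x, round_trip)
```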
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.betaln.md
# maxframe.tensor.special.betaln
### maxframe.tensor.special.betaln(a, b, out=None, \*\*kwargs)
Natural logarithm of absolute value of beta function.
Computes `ln(abs(beta(a, b)))`.
* **Parameters:**
* **a** (*array_like*) – Positive, real-valued parameters
* **b** (*array_like*) – Positive, real-valued parameters
* **out** (*ndarray* *,* *optional*) – Optional output array for function values
* **Returns:**
Value of the betaln function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gamma`](maxframe.tensor.special.gamma.md#maxframe.tensor.special.gamma)
: the gamma function
[`betainc`](maxframe.tensor.special.betainc.md#maxframe.tensor.special.betainc)
: the regularized incomplete beta function
[`beta`](maxframe.tensor.special.beta.md#maxframe.tensor.special.beta)
: the beta function
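
A logarithmic form is useful when the gamma factors are too large to represent. The sketch below, for positive arguments only, builds the value from `math.lgamma`; the helper name is hypothetical and the code illustrates the identity rather than the library's implementation.

```python
import math

def betaln_sketch(a, b):
    """ln|B(a, b)| for positive a and b, computed from log-gamma values."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

small = betaln_sketch(2.0, 3.0)
large = betaln_sketch(500.0, 500.0)

# Gamma(500) itself does not fit in a double, so the direct quotient
# Gamma(a)*Gamma(b)/Gamma(a+b) cannot be evaluated for these arguments.
try:
    math.gamma(500.0)
    gamma_overflows = False
except OverflowError:
    gamma_overflows = True

print(small, large, gamma_overflows)
```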
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.dawsn.md
# maxframe.tensor.special.dawsn
### maxframe.tensor.special.dawsn(x, out=None, where=None, \*\*kwargs)
Dawson’s integral.
Computes:
```default
exp(-x**2) * integral(exp(t**2), t=0..x).
```
* **Parameters:**
* **x** (*array_like*) – Function parameter.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**y** – Value of the integral.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`wofz`](maxframe.tensor.special.wofz.md#maxframe.tensor.special.wofz), [`erf`](maxframe.tensor.special.erf.md#maxframe.tensor.special.erf), [`erfc`](maxframe.tensor.special.erfc.md#maxframe.tensor.special.erfc), [`erfcx`](maxframe.tensor.special.erfcx.md#maxframe.tensor.special.erfcx), [`erfi`](maxframe.tensor.special.erfi.md#maxframe.tensor.special.erfi)
### References
* <a id='id1'>**[1]**</a> Steven G. Johnson, Faddeeva W function implementation. [http://ab-initio.mit.edu/Faddeeva](http://ab-initio.mit.edu/Faddeeva)
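
The defining expression can be evaluated directly with a quadrature rule for moderate real arguments. The sketch below uses composite Simpson integration with a hypothetical helper name; it is a check of the formula, not the Faddeeva-based implementation cited above, and it would overflow for large `|x|`.

```python
import math

def dawsn_sketch(x, steps=2000):
    """exp(-x**2) * integral of exp(t**2) over [0, x], by composite Simpson."""
    f = lambda t: math.exp(t * t)
    h = x / steps
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return math.exp(-x * x) * total * h / 3.0

at_one = dawsn_sketch(1.0)
at_minus_one = dawsn_sketch(-1.0)   # Dawson's integral is an odd function
print(at_one, at_minus_one)
```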
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.digamma.md
# maxframe.tensor.special.digamma
### maxframe.tensor.special.digamma(x, out=None, \*\*kwargs)
psi(z, out=None)
The digamma function.
The logarithmic derivative of the gamma function evaluated at `z`.
* **Parameters:**
* **z** (*array_like*) – Real or complex argument.
* **out** (*ndarray* *,* *optional*) – Array for the computed values of `psi`.
* **Returns:**
**digamma** – Computed values of `psi`.
* **Return type:**
scalar or ndarray
### Notes
For large values not close to the negative real axis, `psi` is
computed using the asymptotic series (5.11.2) from <sup>[1](#id5)</sup>. For small
arguments not close to the negative real axis, the recurrence
relation (5.5.2) from <sup>[1](#id5)</sup> is used until the argument is large
enough to use the asymptotic series. For values close to the
negative real axis, the reflection formula (5.5.4) from <sup>[1](#id5)</sup> is
used first. Note that `psi` has a family of zeros on the
negative real axis which occur between the poles at nonpositive
integers. Around the zeros the reflection formula suffers from
cancellation and the implementation loses precision. The sole
positive zero and the first negative zero, however, are handled
separately by precomputing series expansions using <sup>[2](#id6)</sup>, so the
function should maintain full accuracy around the origin.
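The phrase "logarithmic derivative of the gamma function" and the recurrence relation mentioned above can be illustrated with a finite difference of `math.lgamma`. This is a low-accuracy sketch for real `x > 0` with a hypothetical helper name; it is unrelated to the asymptotic-series method the note describes.

```python
import math

def digamma_sketch(x, h=1e-5):
    """Central-difference derivative of log-gamma; real x > 0 only."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

# psi(1) equals minus the Euler-Mascheroni constant.
EULER_GAMMA = 0.5772156649015329

psi_one = digamma_sketch(1.0)
# Recurrence: psi(z + 1) - psi(z) = 1 / z, checked here at z = 1.
recurrence_gap = digamma_sketch(2.0) - digamma_sketch(1.0)
print(psi_one, recurrence_gap)
```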
### References
* <a id='id5'>**[1]**</a> NIST Digital Library of Mathematical Functions [https://dlmf.nist.gov/5](https://dlmf.nist.gov/5)
* <a id='id6'>**[2]**</a> Fredrik Johansson and others. “mpmath: a Python library for arbitrary-precision floating-point arithmetic” (Version 0.19) [http://mpmath.org/](http://mpmath.org/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ellip_harm.md
# maxframe.tensor.special.ellip_harm
### maxframe.tensor.special.ellip_harm(h2, k2, n, p, s, signm=1, signn=1, \*\*kwargs)
Ellipsoidal harmonic functions $E^p_n(l)$
These are also known as Lamé functions of the first kind, and are
solutions to the Lamé equation:
$$
(s^2 - h^2)(s^2 - k^2)E''(s)
+ s(2s^2 - h^2 - k^2)E'(s) + (a - q s^2)E(s) = 0
$$
where $q = (n+1)n$ and $a$ is the eigenvalue (not
returned) corresponding to the solutions.
* **Parameters:**
* **h2** ([*float*](https://docs.python.org/3/library/functions.html#float)) – `h**2`
* **k2** ([*float*](https://docs.python.org/3/library/functions.html#float)) – `k**2`; should be larger than `h**2`
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Degree
* **s** ([*float*](https://docs.python.org/3/library/functions.html#float)) – Coordinate
* **p** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Order, can range between [1,2n+1]
* **signm** ( *{1* *,* *-1}* *,* *optional*) – Sign of prefactor of functions. Can be +/-1. See Notes.
* **signn** ( *{1* *,* *-1}* *,* *optional*) – Sign of prefactor of functions. Can be +/-1. See Notes.
* **Returns:**
**E** – the harmonic $E^p_n(s)$
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
#### SEE ALSO
[`ellip_harm_2`](maxframe.tensor.special.ellip_harm_2.md#maxframe.tensor.special.ellip_harm_2), [`ellip_normal`](maxframe.tensor.special.ellip_normal.md#maxframe.tensor.special.ellip_normal)
### Notes
The geometric interpretation of the ellipsoidal functions is
explained in <sup>[2](#id5)</sup>, <sup>[3](#id8)</sup>, <sup>[4](#id9)</sup>. The signm and signn arguments control the
sign of prefactors for functions according to their type:
```default
K : +1
L : signm
M : signn
N : signm*signn
```
### References
* <a id='id4'>**[1]**</a> Digital Library of Mathematical Functions 29.12 [https://dlmf.nist.gov/29.12](https://dlmf.nist.gov/29.12)
* <a id='id5'>**[2]**</a> Bardhan and Knepley, “Computational science and re-discovery: open-source implementations of ellipsoidal harmonics for problems in potential theory”, Comput. Sci. Disc. 5, 014006 (2012) [https://doi.org/10.1088/1749-4699/5/1/014006](https://doi.org/10.1088/1749-4699/5/1/014006).
* <a id='id8'>**[3]**</a> David J. and Dechambre P., “Computation of Ellipsoidal Gravity Field Harmonics for small solar system bodies”, pp. 30-36, 2000
* <a id='id9'>**[4]**</a> George Dassios, “Ellipsoidal Harmonics: Theory and Applications” pp. 418, 2012
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ellip_harm_2.md
# maxframe.tensor.special.ellip_harm_2
### maxframe.tensor.special.ellip_harm_2(h2, k2, n, p, s, \*\*kwargs)
Ellipsoidal harmonic functions $F^p_n(l)$
These are also known as Lamé functions of the second kind, and are
solutions to the Lamé equation:
$$
(s^2 - h^2)(s^2 - k^2)F''(s)
+ s(2s^2 - h^2 - k^2)F'(s) + (a - q s^2)F(s) = 0
$$
where $q = (n+1)n$ and $a$ is the eigenvalue (not
returned) corresponding to the solutions.
* **Parameters:**
* **h2** ([*float*](https://docs.python.org/3/library/functions.html#float)) – `h**2`
* **k2** ([*float*](https://docs.python.org/3/library/functions.html#float)) – `k**2`; should be larger than `h**2`
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Degree.
* **p** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Order, can range between [1,2n+1].
* **s** ([*float*](https://docs.python.org/3/library/functions.html#float)) – Coordinate
* **Returns:**
**F** – The harmonic $F^p_n(s)$
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
#### SEE ALSO
[`ellip_harm`](maxframe.tensor.special.ellip_harm.md#maxframe.tensor.special.ellip_harm), [`ellip_normal`](maxframe.tensor.special.ellip_normal.md#maxframe.tensor.special.ellip_normal)
### Notes
Lamé functions of the second kind are related to the functions of the first kind:
$$
F^p_n(s)=(2n + 1)E^p_n(s)\int_{0}^{1/s}
\frac{du}{(E^p_n(1/u))^2\sqrt{(1-u^2k^2)(1-u^2h^2)}}
$$
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ellip_normal.md
# maxframe.tensor.special.ellip_normal
### maxframe.tensor.special.ellip_normal(h2, k2, n, p, \*\*kwargs)
Ellipsoidal harmonic normalization constants $\gamma^p_n$
The normalization constant is defined as
$$
\gamma^p_n=8\int_{0}^{h}dx\int_{h}^{k}dy
\frac{(y^2-x^2)(E^p_n(y)E^p_n(x))^2}{\sqrt{(k^2-y^2)(y^2-h^2)(h^2-x^2)(k^2-x^2)}}
$$
* **Parameters:**
* **h2** ([*float*](https://docs.python.org/3/library/functions.html#float)) – `h**2`
* **k2** ([*float*](https://docs.python.org/3/library/functions.html#float)) – `k**2`; should be larger than `h**2`
* **n** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Degree.
* **p** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Order, can range between [1,2n+1].
* **Returns:**
**gamma** – The normalization constant $\gamma^p_n$
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
#### SEE ALSO
[`ellip_harm`](maxframe.tensor.special.ellip_harm.md#maxframe.tensor.special.ellip_harm), [`ellip_harm_2`](maxframe.tensor.special.ellip_harm_2.md#maxframe.tensor.special.ellip_harm_2)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ellipe.md
# maxframe.tensor.special.ellipe
### maxframe.tensor.special.ellipe(x, \*\*kwargs)
Complete elliptic integral of the second kind
This function is defined as
$$
E(m) = \int_0^{\pi/2} [1 - m \sin(t)^2]^{1/2} dt
$$
* **Parameters:**
* **m** (*array_like*) – Defines the parameter of the elliptic integral.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**E** – Value of the elliptic integral.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`ellipkm1`](maxframe.tensor.special.ellipkm1.md#maxframe.tensor.special.ellipkm1)
: Complete elliptic integral of the first kind, near m = 1
[`ellipk`](maxframe.tensor.special.ellipk.md#maxframe.tensor.special.ellipk)
: Complete elliptic integral of the first kind
[`ellipkinc`](maxframe.tensor.special.ellipkinc.md#maxframe.tensor.special.ellipkinc)
: Incomplete elliptic integral of the first kind
[`ellipeinc`](maxframe.tensor.special.ellipeinc.md#maxframe.tensor.special.ellipeinc)
: Incomplete elliptic integral of the second kind
`elliprd`
: Symmetric elliptic integral of the second kind.
[`elliprg`](maxframe.tensor.special.elliprg.md#maxframe.tensor.special.elliprg)
: Completely-symmetric elliptic integral of the second kind.
### Notes
Wrapper for the Cephes <sup>[1](#id4)</sup> routine ellpe.
For m > 0 the computation uses the approximation,
$$
E(m) \approx P(1-m) - (1-m) \log(1-m) Q(1-m),
$$
where $P$ and $Q$ are tenth-order polynomials. For
m < 0, the relation
$$
E(m) = E(m/(m - 1)) \sqrt{1-m}
$$
is used.
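The relation for negative m can be checked numerically without any special-function library. The sketch below is not the Cephes routine: it evaluates the defining integral with composite Simpson quadrature (the helper name `ellipe_quad` is made up for this illustration) and compares both sides of the identity at m = -1.

```python
import math


def ellipe_quad(m, n=2000):
    """Approximate E(m) by composite Simpson quadrature of the defining integral.

    Illustrative helper only; n must be even.
    """
    h = (math.pi / 2) / n

    def integrand(t):
        return math.sqrt(1.0 - m * math.sin(t) ** 2)

    total = integrand(0.0) + integrand(math.pi / 2)
    for i in range(1, n):
        # Simpson weights: 4 on odd nodes, 2 on even interior nodes
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


m = -1.0
lhs = ellipe_quad(m)                                    # E(m) directly
rhs = ellipe_quad(m / (m - 1.0)) * math.sqrt(1.0 - m)   # E(m/(m-1)) * sqrt(1-m)
print(lhs, rhs)
```

The transformed argument m/(m - 1) always lies in [0, 1) for m < 0, which is why an approximation tuned for that interval is enough.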
The parameterization in terms of $m$ follows that of section
17.2 in <sup>[2](#id5)</sup>. Other parameterizations in terms of the
complementary parameter $1 - m$, modular angle
$\sin^2(\alpha) = m$, or modulus $k^2 = m$ are also
used, so be careful that you choose the correct parameter.
The Legendre E integral is related to Carlson’s symmetric R_D or R_G
functions in multiple ways <sup>[3](#id6)</sup>. For example,
$$
E(m) = 2 R_G(0, 1-k^2, 1) .
$$
### References
* <a id='id4'>**[1]**</a> Cephes Mathematical Functions Library, [http://www.netlib.org/cephes/](http://www.netlib.org/cephes/)
* <a id='id5'>**[2]**</a> Milton Abramowitz and Irene A. Stegun, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover, 1972.
* <a id='id6'>**[3]**</a> NIST Digital Library of Mathematical Functions. [http://dlmf.nist.gov/](http://dlmf.nist.gov/), Release 1.0.28 of 2020-09-15. See Sec. 19.25(i) [https://dlmf.nist.gov/19.25#i](https://dlmf.nist.gov/19.25#i)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ellipeinc.md
# maxframe.tensor.special.ellipeinc
### maxframe.tensor.special.ellipeinc(phi, m, \*\*kwargs)
Incomplete elliptic integral of the second kind
This function is defined as
$$
E(\phi, m) = \int_0^{\phi} [1 - m \sin(t)^2]^{1/2} dt
$$
* **Parameters:**
* **phi** (*array_like*) – amplitude of the elliptic integral.
* **m** (*array_like*) – parameter of the elliptic integral.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**E** – Value of the elliptic integral.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`ellipkm1`](maxframe.tensor.special.ellipkm1.md#maxframe.tensor.special.ellipkm1)
: Complete elliptic integral of the first kind, near m = 1
[`ellipk`](maxframe.tensor.special.ellipk.md#maxframe.tensor.special.ellipk)
: Complete elliptic integral of the first kind
[`ellipkinc`](maxframe.tensor.special.ellipkinc.md#maxframe.tensor.special.ellipkinc)
: Incomplete elliptic integral of the first kind
[`ellipe`](maxframe.tensor.special.ellipe.md#maxframe.tensor.special.ellipe)
: Complete elliptic integral of the second kind
`elliprd`
: Symmetric elliptic integral of the second kind.
[`elliprf`](maxframe.tensor.special.elliprf.md#maxframe.tensor.special.elliprf)
: Completely-symmetric elliptic integral of the first kind.
[`elliprg`](maxframe.tensor.special.elliprg.md#maxframe.tensor.special.elliprg)
: Completely-symmetric elliptic integral of the second kind.
### Notes
Wrapper for the Cephes <sup>[1](#id4)</sup> routine ellie.
The computation uses the arithmetic-geometric mean algorithm.
The parameterization in terms of $m$ follows that of section
17.2 in <sup>[2](#id5)</sup>. Other parameterizations in terms of the
complementary parameter $1 - m$, modular angle
$\sin^2(\alpha) = m$, or modulus $k^2 = m$ are also
used, so be careful that you choose the correct parameter.
The Legendre E incomplete integral can be related to combinations
of Carlson’s symmetric integrals R_D, R_F, and R_G in multiple
ways <sup>[3](#id6)</sup>. For example, with $c = \csc^2\phi$,
$$
E(\phi, m) = R_F(c-1, c-k^2, c)
- \frac{1}{3} k^2 R_D(c-1, c-k^2, c) .
$$
### References
* <a id='id4'>**[1]**</a> Cephes Mathematical Functions Library, [http://www.netlib.org/cephes/](http://www.netlib.org/cephes/)
* <a id='id5'>**[2]**</a> Milton Abramowitz and Irene A. Stegun, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover, 1972.
* <a id='id6'>**[3]**</a> NIST Digital Library of Mathematical Functions. [http://dlmf.nist.gov/](http://dlmf.nist.gov/), Release 1.0.28 of 2020-09-15. See Sec. 19.25(i) [https://dlmf.nist.gov/19.25#i](https://dlmf.nist.gov/19.25#i)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ellipk.md
# maxframe.tensor.special.ellipk
### maxframe.tensor.special.ellipk(x, \*\*kwargs)
Complete elliptic integral of the first kind.
This function is defined as
$$
K(m) = \int_0^{\pi/2} [1 - m \sin(t)^2]^{-1/2} dt
$$
* **Parameters:**
* **m** (*array_like*) – The parameter of the elliptic integral.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**K** – Value of the elliptic integral.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`ellipkm1`](maxframe.tensor.special.ellipkm1.md#maxframe.tensor.special.ellipkm1)
: Complete elliptic integral of the first kind around m = 1
[`ellipkinc`](maxframe.tensor.special.ellipkinc.md#maxframe.tensor.special.ellipkinc)
: Incomplete elliptic integral of the first kind
[`ellipe`](maxframe.tensor.special.ellipe.md#maxframe.tensor.special.ellipe)
: Complete elliptic integral of the second kind
[`ellipeinc`](maxframe.tensor.special.ellipeinc.md#maxframe.tensor.special.ellipeinc)
: Incomplete elliptic integral of the second kind
[`elliprf`](maxframe.tensor.special.elliprf.md#maxframe.tensor.special.elliprf)
: Completely-symmetric elliptic integral of the first kind.
### Notes
For more precision around the point m = 1, use `ellipkm1`, which this
function calls.
The parameterization in terms of $m$ follows that of section
17.2 in <sup>[1](#id3)</sup>. Other parameterizations in terms of the
complementary parameter $1 - m$, modular angle
$\sin^2(\alpha) = m$, or modulus $k^2 = m$ are also
used, so be careful that you choose the correct parameter.
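Because several conventions are in use, it helps to convert to m explicitly before calling the function. The sketch below uses a plain-Python stand-in for K(m) based on the classical identity K(m) = π / (2 AGM(1, sqrt(1 - m))). The helper name `ellipk_agm` is hypothetical and is not how this function is implemented; it only shows what goes wrong when the modulus k is passed where the parameter m = k² is expected.

```python
import math


def ellipk_agm(m):
    """Stand-in for K(m), m < 1, via the arithmetic-geometric mean (illustration only)."""
    a, b = 1.0, math.sqrt(1.0 - m)
    for _ in range(64):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


k = 0.6                 # elliptic modulus
alpha = math.asin(k)    # modular angle, sin(alpha) = k
m = k ** 2              # parameter in the m-convention used here

from_modulus = ellipk_agm(m)                     # correct: m = k**2
from_angle = ellipk_agm(math.sin(alpha) ** 2)    # correct: m = sin(alpha)**2
mistaken = ellipk_agm(k)                         # wrong: modulus passed as m
print(from_modulus, from_angle, mistaken)
```

The first two values agree, while the third is noticeably larger because K grows with its parameter and k > k² for 0 < k < 1.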
The Legendre K integral is related to Carlson’s symmetric R_F
function by <sup>[2](#id4)</sup>:
$$
K(m) = R_F(0, 1-k^2, 1) .
$$
### References
* <a id='id3'>**[1]**</a> Milton Abramowitz and Irene A. Stegun, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover, 1972.
* <a id='id4'>**[2]**</a> NIST Digital Library of Mathematical Functions. [http://dlmf.nist.gov/](http://dlmf.nist.gov/), Release 1.0.28 of 2020-09-15. See Sec. 19.25(i) [https://dlmf.nist.gov/19.25#i](https://dlmf.nist.gov/19.25#i)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ellipkinc.md
# maxframe.tensor.special.ellipkinc
### maxframe.tensor.special.ellipkinc(phi, m, \*\*kwargs)
Incomplete elliptic integral of the first kind
This function is defined as
$$
K(\phi, m) = \int_0^{\phi} [1 - m \sin(t)^2]^{-1/2} dt
$$
This function is also called $F(\phi, m)$.
* **Parameters:**
* **phi** (*array_like*) – amplitude of the elliptic integral
* **m** (*array_like*) – parameter of the elliptic integral
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**K** – Value of the elliptic integral
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`ellipkm1`](maxframe.tensor.special.ellipkm1.md#maxframe.tensor.special.ellipkm1)
: Complete elliptic integral of the first kind, near m = 1
[`ellipk`](maxframe.tensor.special.ellipk.md#maxframe.tensor.special.ellipk)
: Complete elliptic integral of the first kind
[`ellipe`](maxframe.tensor.special.ellipe.md#maxframe.tensor.special.ellipe)
: Complete elliptic integral of the second kind
[`ellipeinc`](maxframe.tensor.special.ellipeinc.md#maxframe.tensor.special.ellipeinc)
: Incomplete elliptic integral of the second kind
[`elliprf`](maxframe.tensor.special.elliprf.md#maxframe.tensor.special.elliprf)
: Completely-symmetric elliptic integral of the first kind.
### Notes
Wrapper for the Cephes <sup>[1](#id4)</sup> routine ellik. The computation is
carried out using the arithmetic-geometric mean algorithm.
The parameterization in terms of $m$ follows that of section
17.2 in <sup>[2](#id5)</sup>. Other parameterizations in terms of the
complementary parameter $1 - m$, modular angle
$\sin^2(\alpha) = m$, or modulus $k^2 = m$ are also
used, so be careful that you choose the correct parameter.
The Legendre K incomplete integral (or F integral) is related to
Carlson’s symmetric R_F function <sup>[3](#id6)</sup>.
Setting $c = \csc^2\phi$,
$$
F(\phi, m) = R_F(c-1, c-k^2, c) .
$$
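The relation to R_F can be sanity-checked in plain Python. The sketch below is an assumption-laden illustration, not the Cephes `ellik` routine: `ellipkinc_quad` integrates the definition with Simpson's rule, and `rf_real` is a simplified real-argument version of Carlson's duplication iteration that runs until the arguments agree to about 1e-8 instead of applying a series correction. Both helper names are hypothetical, and m plays the role of k² in the formula.

```python
import math


def ellipkinc_quad(phi, m, n=2000):
    """F(phi, m) by composite Simpson quadrature of the defining integral (n even)."""
    h = phi / n

    def integrand(t):
        return 1.0 / math.sqrt(1.0 - m * math.sin(t) ** 2)

    total = integrand(0.0) + integrand(phi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def rf_real(x, y, z, tol=1e-8):
    """R_F for real non-negative arguments, at most one of them zero."""
    while True:
        lam = math.sqrt(x * y) + math.sqrt(y * z) + math.sqrt(z * x)
        x, y, z = (x + lam) / 4, (y + lam) / 4, (z + lam) / 4
        mu = (x + y + z) / 3
        if max(abs(x - mu), abs(y - mu), abs(z - mu)) < tol * mu:
            return 1.0 / math.sqrt(mu)   # R_F(mu, mu, mu) = mu**-0.5


phi, m = math.pi / 3, 0.5
c = 1.0 / math.sin(phi) ** 2             # c = csc(phi)**2
lhs = ellipkinc_quad(phi, m)
rhs = rf_real(c - 1.0, c - m, c)
print(lhs, rhs)
```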
### References
* <a id='id4'>**[1]**</a> Cephes Mathematical Functions Library, [http://www.netlib.org/cephes/](http://www.netlib.org/cephes/)
* <a id='id5'>**[2]**</a> Milton Abramowitz and Irene A. Stegun, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover, 1972.
* <a id='id6'>**[3]**</a> NIST Digital Library of Mathematical Functions. [http://dlmf.nist.gov/](http://dlmf.nist.gov/), Release 1.0.28 of 2020-09-15. See Sec. 19.25(i) [https://dlmf.nist.gov/19.25#i](https://dlmf.nist.gov/19.25#i)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ellipkm1.md
# maxframe.tensor.special.ellipkm1
### maxframe.tensor.special.ellipkm1(x, \*\*kwargs)
Complete elliptic integral of the first kind around m = 1
This function is defined as
$$
K(p) = \int_0^{\pi/2} [1 - m \sin(t)^2]^{-1/2} dt
$$
where m = 1 - p.
* **Parameters:**
* **p** (*array_like*) – Defines the parameter of the elliptic integral as m = 1 - p.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**K** – Value of the elliptic integral.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`ellipk`](maxframe.tensor.special.ellipk.md#maxframe.tensor.special.ellipk)
: Complete elliptic integral of the first kind
[`ellipkinc`](maxframe.tensor.special.ellipkinc.md#maxframe.tensor.special.ellipkinc)
: Incomplete elliptic integral of the first kind
[`ellipe`](maxframe.tensor.special.ellipe.md#maxframe.tensor.special.ellipe)
: Complete elliptic integral of the second kind
[`ellipeinc`](maxframe.tensor.special.ellipeinc.md#maxframe.tensor.special.ellipeinc)
: Incomplete elliptic integral of the second kind
[`elliprf`](maxframe.tensor.special.elliprf.md#maxframe.tensor.special.elliprf)
: Completely-symmetric elliptic integral of the first kind.
### Notes
Wrapper for the Cephes <sup>[1](#id2)</sup> routine ellpk.
For p <= 1, computation uses the approximation,
$$
K(p) \approx P(p) - \log(p) Q(p),
$$
where $P$ and $Q$ are tenth-order polynomials. The
argument p is used internally rather than m so that the logarithmic
singularity at m = 1 will be shifted to the origin; this preserves
maximum accuracy. For p > 1, the identity
$$
K(p) = K(1/p)/\sqrt{p}
$$
is used.
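The reason for working with p rather than m can be seen in plain double-precision arithmetic. The sketch below uses a hypothetical helper `ellipkm1_agm`, built on the arithmetic-geometric mean rather than the Cephes polynomial approximation, to show that forming m = 1 - p discards a tiny p, and to check the identity used for p > 1.

```python
import math


def ellipkm1_agm(p):
    """K as a function of p = 1 - m, for p > 0 (AGM-based illustration only)."""
    a, b = 1.0, math.sqrt(p)
    for _ in range(64):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


p_small = 1e-20
m = 1.0 - p_small             # rounds to exactly 1.0 in double precision
print(m == 1.0)               # True: the distance to the singularity is lost
near_singularity = ellipkm1_agm(p_small)   # still finite when p is passed directly
print(near_singularity)

p_large = 4.0
direct = ellipkm1_agm(p_large)
via_identity = ellipkm1_agm(1.0 / p_large) / math.sqrt(p_large)
print(direct, via_identity)
```

For very small p the result is close to log(4 / sqrt(p)), which reflects the logarithmic singularity mentioned above.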
### References
* <a id='id2'>**[1]**</a> Cephes Mathematical Functions Library, [http://www.netlib.org/cephes/](http://www.netlib.org/cephes/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.elliprc.md
# maxframe.tensor.special.elliprc
### maxframe.tensor.special.elliprc(x, y, \*\*kwargs)
Degenerate symmetric elliptic integral.
The function RC is defined as <sup>[1](#id3)</sup>
$$
R_{\mathrm{C}}(x, y) =
\frac{1}{2} \int_0^{+\infty} (t + x)^{-1/2} (t + y)^{-1} dt
= R_{\mathrm{F}}(x, y, y)
$$
* **Parameters:**
* **x** (*array_like*) – Real or complex input parameter. x can be any number in the
complex plane cut along the negative real axis.
* **y** (*array_like*) – Real or complex input parameter. y must be non-zero.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**R** – Value of the integral. If y is real and negative, the Cauchy
principal value is returned. If both of x and y are real, the
return value is real. Otherwise, the return value is complex.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`elliprf`](maxframe.tensor.special.elliprf.md#maxframe.tensor.special.elliprf)
: Completely-symmetric elliptic integral of the first kind.
`elliprd`
: Symmetric elliptic integral of the second kind.
[`elliprg`](maxframe.tensor.special.elliprg.md#maxframe.tensor.special.elliprg)
: Completely-symmetric elliptic integral of the second kind.
[`elliprj`](maxframe.tensor.special.elliprj.md#maxframe.tensor.special.elliprj)
: Symmetric elliptic integral of the third kind.
### Notes
RC is a degenerate case of the symmetric integral RF: `elliprc(x, y) ==
elliprf(x, y, y)`. It is an elementary function rather than an elliptic
integral.
The code implements Carlson’s algorithm based on the duplication theorems
and series expansion up to the 7th order. <sup>[2](#id4)</sup>
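To illustrate the remark that RC is elementary, the sketch below evaluates it for real arguments with x >= 0 and y > 0 using the standard inverse-trigonometric and inverse-hyperbolic closed forms. The helper name `elliprc_real` is hypothetical, and this is not the duplication-based algorithm described above; it does not cover complex inputs or negative y.

```python
import math


def elliprc_real(x, y):
    """R_C(x, y) for real x >= 0 and y > 0 via elementary closed forms."""
    if x == y:
        return 1.0 / math.sqrt(x)
    if x < y:
        # circular case
        return math.acos(math.sqrt(x / y)) / math.sqrt(y - x)
    # hyperbolic case, x > y > 0
    return math.acosh(math.sqrt(x / y)) / math.sqrt(x - y)


print(elliprc_real(0.0, 0.25))   # equals pi
print(elliprc_real(2.25, 2.0))   # equals log(2)
print(elliprc_real(1.0, 2.0))    # equals pi / 4
```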
### References
* <a id='id3'>**[1]**</a> B. C. Carlson, ed., Chapter 19 in “Digital Library of Mathematical Functions,” NIST, US Dept. of Commerce. [https://dlmf.nist.gov/19.16.E6](https://dlmf.nist.gov/19.16.E6)
* <a id='id4'>**[2]**</a> B. C. Carlson, “Numerical computation of real or complex elliptic integrals,” Numer. Algorithm, vol. 10, no. 1, pp. 13-26, 1995. [https://arxiv.org/abs/math/9409227](https://arxiv.org/abs/math/9409227) [https://doi.org/10.1007/BF02198293](https://doi.org/10.1007/BF02198293)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.elliprf.md
# maxframe.tensor.special.elliprf
### maxframe.tensor.special.elliprf(x, y, z, \*\*kwargs)
Completely-symmetric elliptic integral of the first kind.
The function RF is defined as <sup>[1](#id3)</sup>
$$
R_{\mathrm{F}}(x, y, z) =
\frac{1}{2} \int_0^{+\infty} [(t + x) (t + y) (t + z)]^{-1/2} dt
$$
* **Parameters:**
* **x** (*array_like*) – Real or complex input parameters. x, y, or z can be any number in
the complex plane cut along the negative real axis, but at most one of
them can be zero.
* **y** (*array_like*) – Real or complex input parameters. x, y, or z can be any number in
the complex plane cut along the negative real axis, but at most one of
them can be zero.
* **z** (*array_like*) – Real or complex input parameters. x, y, or z can be any number in
the complex plane cut along the negative real axis, but at most one of
them can be zero.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**R** – Value of the integral. If all of x, y, and z are real, the return
value is real. Otherwise, the return value is complex.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`elliprc`](maxframe.tensor.special.elliprc.md#maxframe.tensor.special.elliprc)
: Degenerate symmetric integral.
`elliprd`
: Symmetric elliptic integral of the second kind.
[`elliprg`](maxframe.tensor.special.elliprg.md#maxframe.tensor.special.elliprg)
: Completely-symmetric elliptic integral of the second kind.
[`elliprj`](maxframe.tensor.special.elliprj.md#maxframe.tensor.special.elliprj)
: Symmetric elliptic integral of the third kind.
### Notes
The code implements Carlson’s algorithm based on the duplication theorems
and series expansion up to the 7th order (cf.:
[https://dlmf.nist.gov/19.36.i](https://dlmf.nist.gov/19.36.i)) and the AGM algorithm for the complete
integral. <sup>[2](#id4)</sup>
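The duplication idea can be sketched in a few lines for real, non-negative arguments. This is a simplified illustration under stated assumptions, not the implementation described above: the hypothetical helper `elliprf_real` repeats the duplication step until the three arguments agree to about 1e-8 and then uses R_F(μ, μ, μ) = μ^(-1/2), skipping the series correction that lets a production routine stop much earlier.

```python
import math


def elliprf_real(x, y, z, tol=1e-8):
    """R_F for real non-negative x, y, z with at most one of them zero."""
    while True:
        # Duplication theorem: R_F is unchanged when each argument is
        # replaced by (argument + lam) / 4.
        lam = math.sqrt(x * y) + math.sqrt(y * z) + math.sqrt(z * x)
        x, y, z = (x + lam) / 4, (y + lam) / 4, (z + lam) / 4
        mu = (x + y + z) / 3
        if max(abs(x - mu), abs(y - mu), abs(z - mu)) < tol * mu:
            return 1.0 / math.sqrt(mu)


print(elliprf_real(0.0, 1.0, 2.0))
print(elliprf_real(3.0, 3.0, 3.0), 1.0 / math.sqrt(3.0))
print(elliprf_real(0.0, 0.5, 1.0))   # complete integral K(m) at m = 0.5
```

Each step shrinks the differences between the arguments by a factor of four, so the loop ends after roughly fifteen iterations for inputs of order one.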
### References
* <a id='id3'>**[1]**</a> B. C. Carlson, ed., Chapter 19 in “Digital Library of Mathematical Functions,” NIST, US Dept. of Commerce. [https://dlmf.nist.gov/19.16.E1](https://dlmf.nist.gov/19.16.E1)
* <a id='id4'>**[2]**</a> B. C. Carlson, “Numerical computation of real or complex elliptic integrals,” Numer. Algorithm, vol. 10, no. 1, pp. 13-26, 1995. [https://arxiv.org/abs/math/9409227](https://arxiv.org/abs/math/9409227) [https://doi.org/10.1007/BF02198293](https://doi.org/10.1007/BF02198293)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.elliprg.md
# maxframe.tensor.special.elliprg
### maxframe.tensor.special.elliprg(x, y, z, \*\*kwargs)
Completely-symmetric elliptic integral of the second kind.
The function RG is defined as <sup>[1](#id4)</sup>
$$
R_{\mathrm{G}}(x, y, z) =
\frac{1}{4} \int_0^{+\infty} [(t + x) (t + y) (t + z)]^{-1/2}
\left(\frac{x}{t + x} + \frac{y}{t + y} + \frac{z}{t + z}\right) t
dt
$$
* **Parameters:**
* **x** (*array_like*) – Real or complex input parameters. x, y, or z can be any number in
the complex plane cut along the negative real axis.
* **y** (*array_like*) – Real or complex input parameters. x, y, or z can be any number in
the complex plane cut along the negative real axis.
* **z** (*array_like*) – Real or complex input parameters. x, y, or z can be any number in
the complex plane cut along the negative real axis.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**R** – Value of the integral. If all of x, y, and z are real, the return
value is real. Otherwise, the return value is complex.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`elliprc`](maxframe.tensor.special.elliprc.md#maxframe.tensor.special.elliprc)
: Degenerate symmetric integral.
`elliprd`
: Symmetric elliptic integral of the second kind.
[`elliprf`](maxframe.tensor.special.elliprf.md#maxframe.tensor.special.elliprf)
: Completely-symmetric elliptic integral of the first kind.
[`elliprj`](maxframe.tensor.special.elliprj.md#maxframe.tensor.special.elliprj)
: Symmetric elliptic integral of the third kind.
### Notes
The implementation uses the relation <sup>[1](#id4)</sup>
$$
2 R_{\mathrm{G}}(x, y, z) =
z R_{\mathrm{F}}(x, y, z) -
\frac{1}{3} (x - z) (y - z) R_{\mathrm{D}}(x, y, z) +
\sqrt{\frac{x y}{z}}
$$
and the symmetry of x, y, z when at least one non-zero parameter can
be chosen as the pivot. When one of the arguments is close to zero, the AGM
method is applied instead. Other special cases are computed following Ref.
<sup>[2](#id5)</sup>
### References
* <a id='id4'>**[1]**</a> B. C. Carlson, “Numerical computation of real or complex elliptic integrals,” Numer. Algorithm, vol. 10, no. 1, pp. 13-26, 1995. [https://arxiv.org/abs/math/9409227](https://arxiv.org/abs/math/9409227) [https://doi.org/10.1007/BF02198293](https://doi.org/10.1007/BF02198293)
* <a id='id5'>**[2]**</a> B. C. Carlson, ed., Chapter 19 in “Digital Library of Mathematical Functions,” NIST, US Dept. of Commerce. [https://dlmf.nist.gov/19.16.E1](https://dlmf.nist.gov/19.16.E1) [https://dlmf.nist.gov/19.20.ii](https://dlmf.nist.gov/19.20.ii)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.elliprj.md
# maxframe.tensor.special.elliprj
### maxframe.tensor.special.elliprj(x, y, z, p, \*\*kwargs)
Symmetric elliptic integral of the third kind.
The function RJ is defined as <sup>[1](#id10)</sup>
$$
R_{\mathrm{J}}(x, y, z, p) =
\frac{3}{2} \int_0^{+\infty} [(t + x) (t + y) (t + z)]^{-1/2}
(t + p)^{-1} dt
$$
#### WARNING
This function should be considered experimental when the inputs are
unbalanced. Check correctness with another independent implementation.
* **Parameters:**
* **x** (*array_like*) – Real or complex input parameters. x, y, or z are numbers in
the complex plane cut along the negative real axis (subject to further
constraints, see Notes), and at most one of them can be zero. p must
be non-zero.
* **y** (*array_like*) – Real or complex input parameters. x, y, or z are numbers in
the complex plane cut along the negative real axis (subject to further
constraints, see Notes), and at most one of them can be zero. p must
be non-zero.
* **z** (*array_like*) – Real or complex input parameters. x, y, or z are numbers in
the complex plane cut along the negative real axis (subject to further
constraints, see Notes), and at most one of them can be zero. p must
be non-zero.
* **p** (*array_like*) – Real or complex input parameters. x, y, or z are numbers in
the complex plane cut along the negative real axis (subject to further
constraints, see Notes), and at most one of them can be zero. p must
be non-zero.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**R** – Value of the integral. If all of x, y, z, and p are real, the
return value is real. Otherwise, the return value is complex.
If p is real and negative, while x, y, and z are real,
non-negative, and at most one of them is zero, the Cauchy principal
value is returned. <sup>[1](#id10)</sup> <sup>[2](#id11)</sup>
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`elliprc`](maxframe.tensor.special.elliprc.md#maxframe.tensor.special.elliprc)
: Degenerate symmetric integral.
`elliprd`
: Symmetric elliptic integral of the second kind.
[`elliprf`](maxframe.tensor.special.elliprf.md#maxframe.tensor.special.elliprf)
: Completely-symmetric elliptic integral of the first kind.
[`elliprg`](maxframe.tensor.special.elliprg.md#maxframe.tensor.special.elliprg)
: Completely-symmetric elliptic integral of the second kind.
### Notes
The code implements Carlson’s algorithm based on the duplication theorems
and series expansion up to the 7th order. <sup>[3](#id12)</sup> The algorithm is slightly
different from its earlier incarnation as it appears in <sup>[1](#id10)</sup>, in that the
call to elliprc (or `atan`/`atanh`, see <sup>[4](#id13)</sup>) is no longer needed in
the inner loop. Asymptotic approximations are used where arguments differ
widely in the order of magnitude. <sup>[5](#id14)</sup>
The input values are subject to certain sufficient but not necessary
constraints when input arguments are complex. Notably, `x`, `y`, and
`z` must have non-negative real parts, unless two of them are
non-negative and complex-conjugates to each other while the other is a real
non-negative number. <sup>[1](#id10)</sup> If the inputs do not satisfy the sufficient
condition described in Ref. <sup>[1](#id10)</sup> they are rejected outright with the output
set to NaN.
In the case where one of `x`, `y`, and `z` is equal to `p`, the
function `elliprd` should be preferred because of its less restrictive
domain.
### References
* <a id='id10'>**[1]**</a> B. C. Carlson, “Numerical computation of real or complex elliptic integrals,” Numer. Algorithm, vol. 10, no. 1, pp. 13-26, 1995. [https://arxiv.org/abs/math/9409227](https://arxiv.org/abs/math/9409227) [https://doi.org/10.1007/BF02198293](https://doi.org/10.1007/BF02198293)
* <a id='id11'>**[2]**</a> B. C. Carlson, ed., Chapter 19 in “Digital Library of Mathematical Functions,” NIST, US Dept. of Commerce. [https://dlmf.nist.gov/19.20.iii](https://dlmf.nist.gov/19.20.iii)
* <a id='id12'>**[3]**</a> B. C. Carlson, J. FitzSimmons, “Reduction Theorems for Elliptic Integrands with the Square Root of Two Quadratic Factors,” J. Comput. Appl. Math., vol. 118, nos. 1-2, pp. 71-85, 2000. [https://doi.org/10.1016/S0377-0427(00)00282-X](https://doi.org/10.1016/S0377-0427(00)00282-X)
* <a id='id13'>**[4]**</a> F. Johansson, “Numerical Evaluation of Elliptic Functions, Elliptic Integrals and Modular Forms,” in J. Blumlein, C. Schneider, P. Paule, eds., “Elliptic Integrals, Elliptic Functions and Modular Forms in Quantum Field Theory,” pp. 269-293, 2019 (Cham, Switzerland: Springer Nature Switzerland) [https://arxiv.org/abs/1806.06725](https://arxiv.org/abs/1806.06725) [https://doi.org/10.1007/978-3-030-04480-0](https://doi.org/10.1007/978-3-030-04480-0)
* <a id='id14'>**[5]**</a> B. C. Carlson, J. L. Gustafson, “Asymptotic Approximations for Symmetric Elliptic Integrals,” SIAM J. Math. Anal., vol. 25, no. 2, pp. 288-303, 1994. [https://arxiv.org/abs/math/9310223](https://arxiv.org/abs/math/9310223) [https://doi.org/10.1137/S0036141092228477](https://doi.org/10.1137/S0036141092228477)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.entr.md
# maxframe.tensor.special.entr
### maxframe.tensor.special.entr(x, out=None, where=None, \*\*kwargs)
Elementwise function for computing entropy.
$$
\text{entr}(x) = \begin{cases} - x \log(x) & x > 0 \\ 0 & x = 0 \\ -\infty & \text{otherwise} \end{cases}
$$
* **Parameters:**
**x** (*Tensor*) – Input tensor.
* **Returns:**
**res** – The value of the elementwise entropy function at the given points x.
* **Return type:**
Tensor
#### SEE ALSO
`kl_div`, [`rel_entr`](maxframe.tensor.special.rel_entr.md#maxframe.tensor.special.rel_entr)
### Notes
This function is concave.
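The piecewise definition can be written out for a single scalar as below. This is a plain-Python sketch of the formula, not the tensor implementation; the local function `entr` mirrors the three cases, and summing it over a probability vector gives the Shannon entropy in nats.

```python
import math


def entr(x):
    """Scalar version of the piecewise definition of entr."""
    if x > 0:
        return -x * math.log(x)
    if x == 0:
        return 0.0
    return -math.inf


probs = [0.5, 0.25, 0.25]
entropy = sum(entr(p) for p in probs)   # Shannon entropy in nats
print(entropy)
print(entr(0.0), entr(-1.0))
```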
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.erf.md
# maxframe.tensor.special.erf
### maxframe.tensor.special.erf(x, out=None, where=None, \*\*kwargs)
Returns the error function of a complex argument.
It is defined as `2/sqrt(pi)*integral(exp(-t**2), t=0..z)`.
* **Parameters:**
**x** (*Tensor*) – Input tensor.
* **Returns:**
**res** – The values of the error function at the given points x.
* **Return type:**
Tensor
#### SEE ALSO
[`erfc`](maxframe.tensor.special.erfc.md#maxframe.tensor.special.erfc), [`erfinv`](maxframe.tensor.special.erfinv.md#maxframe.tensor.special.erfinv), [`erfcinv`](maxframe.tensor.special.erfcinv.md#maxframe.tensor.special.erfcinv), [`wofz`](maxframe.tensor.special.wofz.md#maxframe.tensor.special.wofz), [`erfcx`](maxframe.tensor.special.erfcx.md#maxframe.tensor.special.erfcx), [`erfi`](maxframe.tensor.special.erfi.md#maxframe.tensor.special.erfi)
### Notes
The cumulative distribution function of the standard normal distribution is given by
`Phi(z) = 1/2[1 + erf(z/sqrt(2))]`.
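The formula for `Phi` translates directly into code. The sketch below uses the standard library's `math.erf` on scalars instead of the tensor function, so it runs without a session; the helper name `normal_cdf` is chosen for this illustration.

```python
import math


def normal_cdf(z):
    """Phi(z) = 1/2 * [1 + erf(z / sqrt(2))] for a scalar z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(normal_cdf(0.0))                      # 0.5
print(normal_cdf(1.96))                     # close to 0.975
print(normal_cdf(1.0) + normal_cdf(-1.0))   # symmetry: sums to 1
```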
### References
* <a id='id1'>**[1]**</a> [https://en.wikipedia.org/wiki/Error_function](https://en.wikipedia.org/wiki/Error_function)
* <a id='id2'>**[2]**</a> Milton Abramowitz and Irene A. Stegun, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover, 1972. [http://www.math.sfu.ca/~cbm/aands/page_297.htm](http://www.math.sfu.ca/~cbm/aands/page_297.htm)
* <a id='id3'>**[3]**</a> Steven G. Johnson, Faddeeva W function implementation. [http://ab-initio.mit.edu/Faddeeva](http://ab-initio.mit.edu/Faddeeva)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> from maxframe.tensor import special
>>> import matplotlib.pyplot as plt
>>> x = mt.linspace(-3, 3)
>>> plt.plot(x, special.erf(x))
>>> plt.xlabel('$x$')
>>> plt.ylabel('$erf(x)$')
>>> plt.show()
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.erfc.md
# maxframe.tensor.special.erfc
### maxframe.tensor.special.erfc(x, out=None, where=None, \*\*kwargs)
Complementary error function, `1 - erf(x)`.
* **Parameters:**
* **x** (*array_like*) – Real or complex valued argument
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
Values of the complementary error function
* **Return type:**
scalar or ndarray
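A dedicated complementary function matters for large arguments, where forming `1 - erf(x)` cancels completely in floating point. The scalar sketch below uses the standard library's `math.erf` and `math.erfc` to show the effect; it does not call the tensor function.

```python
import math

x = 10.0
naive = 1.0 - math.erf(x)   # erf(10) rounds to 1.0, so the difference is 0.0
direct = math.erfc(x)       # keeps the tiny tail value
print(naive, direct)
```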
#### SEE ALSO
[`erf`](maxframe.tensor.special.erf.md#maxframe.tensor.special.erf), [`erfi`](maxframe.tensor.special.erfi.md#maxframe.tensor.special.erfi), [`erfcx`](maxframe.tensor.special.erfcx.md#maxframe.tensor.special.erfcx), [`dawsn`](maxframe.tensor.special.dawsn.md#maxframe.tensor.special.dawsn), [`wofz`](maxframe.tensor.special.wofz.md#maxframe.tensor.special.wofz)
### References
* <a id='id1'>**[1]**</a> Steven G. Johnson, Faddeeva W function implementation. [http://ab-initio.mit.edu/Faddeeva](http://ab-initio.mit.edu/Faddeeva)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.erfcinv.md
# maxframe.tensor.special.erfcinv
### maxframe.tensor.special.erfcinv(x, out=None, where=None, \*\*kwargs)
Inverse of the complementary error function.
In the complex domain, there is no unique complex number w satisfying
erfc(w)=z, so a true inverse function would be multivalued.
When the domain is restricted to the reals, 0 < x < 2, there is a unique real
number w = erfcinv(x) satisfying erfc(erfcinv(x)) = x.
It is related to the inverse of the error function by erfcinv(1-x) = erfinv(x).
* **Parameters:**
* **x** (*ndarray*) – Argument at which to evaluate. Domain: [0, 2]
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**erfcinv** – The inverse of erfc of x, element-wise
* **Return type:**
scalar or ndarray
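Because erfc is strictly decreasing on the real line, the real inverse can be illustrated with simple bisection. This is a sketch using only the standard library. The name `erfcinv_bisect` and the bracket endpoints are assumptions made for illustration, not how MaxFrame computes the function.

```python
import math


def erfcinv_bisect(y: float, iterations: int = 100) -> float:
    """Invert erfc for 0 < y < 2 by bisection (erfc is strictly decreasing)."""
    if not 0.0 < y < 2.0:
        raise ValueError("y must lie strictly between 0 and 2")
    lo, hi = -6.0, 27.0  # erfc(-6) is about 2, erfc(27) is about 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > y:
            lo = mid  # erfc(mid) is too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


w = erfcinv_bisect(0.3)
print(w, math.erfc(w))
```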
#### SEE ALSO
[`erf`](maxframe.tensor.special.erf.md#maxframe.tensor.special.erf)
: Error function of a complex argument
[`erfc`](maxframe.tensor.special.erfc.md#maxframe.tensor.special.erfc)
: Complementary error function, `1 - erf(x)`
[`erfinv`](maxframe.tensor.special.erfinv.md#maxframe.tensor.special.erfinv)
: Inverse of the error function
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.erfcx.md
# maxframe.tensor.special.erfcx
### maxframe.tensor.special.erfcx(x, out=None, where=None, \*\*kwargs)
Scaled complementary error function, `exp(x**2) * erfc(x)`.
* **Parameters:**
* **x** (*array_like*) – Real or complex valued argument
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
Values of the scaled complementary error function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`erf`](maxframe.tensor.special.erf.md#maxframe.tensor.special.erf), [`erfc`](maxframe.tensor.special.erfc.md#maxframe.tensor.special.erfc), [`erfi`](maxframe.tensor.special.erfi.md#maxframe.tensor.special.erfi), [`dawsn`](maxframe.tensor.special.dawsn.md#maxframe.tensor.special.dawsn), [`wofz`](maxframe.tensor.special.wofz.md#maxframe.tensor.special.wofz)
### References
* <a id='id1'>**[1]**</a> Steven G. Johnson, Faddeeva W function implementation. [http://ab-initio.mit.edu/Faddeeva](http://ab-initio.mit.edu/Faddeeva)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.erfi.md
# maxframe.tensor.special.erfi
### maxframe.tensor.special.erfi(x, out=None, where=None, \*\*kwargs)
Imaginary error function, `-i erf(i z)`.
* **Parameters:**
* **z** (*array_like*) – Real or complex valued argument
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
Values of the imaginary error function
* **Return type:**
scalar or ndarray
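For a real argument the definition reduces to the power series erfi(x) = 2/sqrt(pi) * sum over n of x^(2n+1) / (n! (2n+1)). The sketch below sums that series in plain Python. The helper `erfi_series` is hypothetical and only meant for moderate |x|.

```python
import math


def erfi_series(x: float, terms: int = 60) -> float:
    """Imaginary error function for real x via its Maclaurin series."""
    total = 0.0
    power = x    # holds x^(2n+1)
    fact = 1.0   # holds n!
    for n in range(terms):
        total += power / (fact * (2 * n + 1))
        power *= x * x
        fact *= n + 1
    return 2.0 / math.sqrt(math.pi) * total


print(erfi_series(1.0))
```

The derivative of erfi is `2/sqrt(pi) * exp(x**2)`, which gives a convenient independent check of the series.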
#### SEE ALSO
[`erf`](maxframe.tensor.special.erf.md#maxframe.tensor.special.erf), [`erfc`](maxframe.tensor.special.erfc.md#maxframe.tensor.special.erfc), [`erfcx`](maxframe.tensor.special.erfcx.md#maxframe.tensor.special.erfcx), [`dawsn`](maxframe.tensor.special.dawsn.md#maxframe.tensor.special.dawsn), [`wofz`](maxframe.tensor.special.wofz.md#maxframe.tensor.special.wofz)
### References
* <a id='id1'>**[1]**</a> Steven G. Johnson, Faddeeva W function implementation. [http://ab-initio.mit.edu/Faddeeva](http://ab-initio.mit.edu/Faddeeva)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.erfinv.md
# maxframe.tensor.special.erfinv
### maxframe.tensor.special.erfinv(x, out=None, where=None, \*\*kwargs)
Inverse of the error function.
In the complex domain, there is no unique complex number w satisfying
erf(w)=z, so a true inverse function would be multivalued.
When the domain is restricted to the reals, -1 < x < 1, there is a unique real
number w = erfinv(x) satisfying erf(erfinv(x)) = x.
* **Parameters:**
* **x** (*ndarray*) – Argument at which to evaluate. Domain: [-1, 1]
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**erfinv** – The inverse of erf of x, element-wise
* **Return type:**
scalar or ndarray
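The inverse error function is closely tied to the standard normal quantile function. The following sketch uses `statistics.NormalDist` from the standard library. The helper name `erfinv_via_normal` is an assumption for illustration and is not part of MaxFrame.

```python
import math
from statistics import NormalDist


def erfinv_via_normal(y: float) -> float:
    """erfinv(y) for -1 < y < 1 via the standard normal quantile function."""
    # erf(w) = 2 * Phi(w * sqrt(2)) - 1, hence w = Phi^{-1}((y + 1) / 2) / sqrt(2).
    return NormalDist().inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)


w = erfinv_via_normal(0.5)
print(w, math.erf(w))  # the second value should reproduce 0.5
```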
#### SEE ALSO
[`erf`](maxframe.tensor.special.erf.md#maxframe.tensor.special.erf)
: Error function of a complex argument
[`erfc`](maxframe.tensor.special.erfc.md#maxframe.tensor.special.erfc)
: Complementary error function, `1 - erf(x)`
[`erfcinv`](maxframe.tensor.special.erfcinv.md#maxframe.tensor.special.erfcinv)
: Inverse of the complementary error function
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.expit.md
# maxframe.tensor.special.expit
### maxframe.tensor.special.expit(x, out=None)
Expit (a.k.a. logistic sigmoid) ufunc for ndarrays.
The expit function, also known as the logistic sigmoid function, is
defined as `expit(x) = 1/(1+exp(-x))`. It is the inverse of the
logit function.
* **Parameters:**
* **x** (*ndarray*) – The ndarray to apply expit to element-wise.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
An ndarray of the same shape as x. Its entries
are expit of the corresponding entry of x.
* **Return type:**
scalar or ndarray
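A scalar sketch of the definition, written in plain Python so that it runs without a MaxFrame session. The two-branch form is one common way to avoid overflow in `exp` for large negative inputs. It is an illustrative choice, not necessarily what MaxFrame does internally.

```python
import math


def expit(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Inverse of expit on 0 < p < 1."""
    return math.log(p / (1.0 - p))


print(logit(expit(2.0)))  # round trip should recover 2.0 up to rounding
```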
#### SEE ALSO
[`logit`](maxframe.tensor.special.logit.md#maxframe.tensor.special.logit)
### Notes
As a ufunc expit takes a number of optional
keyword arguments. For more information
see [ufuncs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.fresnel.md
# maxframe.tensor.special.fresnel
### maxframe.tensor.special.fresnel(x, out=None, \*\*kwargs)
Fresnel integrals.
The Fresnel integrals are defined as
$$
\begin{aligned}
S(z) &= \int_0^z \sin(\pi t^2 /2) dt \\
C(z) &= \int_0^z \cos(\pi t^2 /2) dt.
\end{aligned}
$$
See [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf) for details.
* **Parameters:**
* **z** (*array_like*) – Real or complex valued argument
* **out** (*2-tuple* *of* *ndarrays* *,* *optional*) – Optional output arrays for the function results
* **Returns:**
**S, C** – Values of the Fresnel integrals
* **Return type:**
2-tuple of scalar or ndarray
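The integral definitions can be evaluated directly for a real argument with composite Simpson quadrature. This is only a sketch with hypothetical helper names. Production implementations use series and asymptotic expansions instead. The result follows the documented order `(S, C)`.

```python
import math


def _simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def fresnel_quad(z: float):
    """Return (S(z), C(z)) for real z by numerical integration."""
    s = _simpson(lambda t: math.sin(math.pi * t * t / 2.0), 0.0, z)
    c = _simpson(lambda t: math.cos(math.pi * t * t / 2.0), 0.0, z)
    return s, c


print(fresnel_quad(1.0))
```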
#### SEE ALSO
`fresnel_zeros`
: zeros of the Fresnel integrals
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.gamma.md
# maxframe.tensor.special.gamma
### maxframe.tensor.special.gamma(z, out=None)
Gamma function.
The gamma function is defined as
$$
\Gamma(z) = \int_0^\infty t^{z-1} e^{-t} dt
$$
for $\Re(z) > 0$ and is extended to the rest of the complex
plane by analytic continuation. See [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf) for more details.
* **Parameters:**
* **z** (*array_like*) – Real or complex valued argument
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the gamma function
* **Return type:**
scalar or ndarray
### Notes
The gamma function is often referred to as the generalized
factorial since $\Gamma(n + 1) = n!$ for natural numbers
$n$. More generally it satisfies the recurrence relation
$\Gamma(z + 1) = z \cdot \Gamma(z)$ for complex $z$,
which, combined with the fact that $\Gamma(1) = 1$, implies
the above identity for $z = n$.
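Both properties in these notes can be checked numerically. The sketch uses `math.gamma` from the standard library in place of the MaxFrame function, so it runs without a session.

```python
import math

# Gamma(n + 1) equals n! for natural numbers n.
for n in range(6):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# Recurrence Gamma(z + 1) = z * Gamma(z) at a non-integer point.
z = 2.5
lhs = math.gamma(z + 1.0)
rhs = z * math.gamma(z)
print(lhs, rhs)
```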
### References
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.gammainc.md
# maxframe.tensor.special.gammainc
### maxframe.tensor.special.gammainc(a, b, \*\*kwargs)
Regularized lower incomplete gamma function.
It is defined as
$$
P(a, x) = \frac{1}{\Gamma(a)} \int_0^x t^{a - 1}e^{-t} dt
$$
for $a > 0$ and $x \geq 0$. See [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf) for details.
* **Parameters:**
* **a** (*array_like*) – Positive parameter
* **x** (*array_like*) – Nonnegative argument
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the lower incomplete gamma function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gammaincc`](maxframe.tensor.special.gammaincc.md#maxframe.tensor.special.gammaincc)
: regularized upper incomplete gamma function
[`gammaincinv`](maxframe.tensor.special.gammaincinv.md#maxframe.tensor.special.gammaincinv)
: inverse of the regularized lower incomplete gamma function
[`gammainccinv`](maxframe.tensor.special.gammainccinv.md#maxframe.tensor.special.gammainccinv)
: inverse of the regularized upper incomplete gamma function
### Notes
The function satisfies the relation `gammainc(a, x) +
gammaincc(a, x) = 1` where gammaincc is the regularized upper
incomplete gamma function.
The implementation largely follows that of [[boost]](maxframe.tensor.special.gammaincc.md#boost).
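For small and moderate arguments, P(a, x) can be sketched with the power series P(a, x) = x^a e^(-x) * sum over n of x^n / Gamma(a + n + 1). The helper `gammainc_series` below is hypothetical and far simpler than the Boost-style implementation mentioned above. It uses only the standard library.

```python
import math


def gammainc_series(a: float, x: float, max_terms: int = 500) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if a <= 0 or x < 0:
        raise ValueError("requires a > 0 and x >= 0")
    if x == 0:
        return 0.0
    # First term x^a * exp(-x) / Gamma(a + 1), computed in log space.
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)  # ratio of consecutive series terms
        total += term
        if term < 1e-17 * total:
            break
    return total


print(gammainc_series(1.0, 0.7), 1.0 - math.exp(-0.7))
```

Two closed forms make handy checks: P(1, x) = 1 - exp(-x) and P(1/2, x) = erf(sqrt(x)).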
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.gammaincc.md
# maxframe.tensor.special.gammaincc
### maxframe.tensor.special.gammaincc(a, b, \*\*kwargs)
Regularized upper incomplete gamma function.
It is defined as
$$
Q(a, x) = \frac{1}{\Gamma(a)} \int_x^\infty t^{a - 1}e^{-t} dt
$$
for $a > 0$ and $x \geq 0$. See [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf) for details.
* **Parameters:**
* **a** (*array_like*) – Positive parameter
* **x** (*array_like*) – Nonnegative argument
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the upper incomplete gamma function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gammainc`](maxframe.tensor.special.gammainc.md#maxframe.tensor.special.gammainc)
: regularized lower incomplete gamma function
[`gammaincinv`](maxframe.tensor.special.gammaincinv.md#maxframe.tensor.special.gammaincinv)
: inverse of the regularized lower incomplete gamma function
[`gammainccinv`](maxframe.tensor.special.gammainccinv.md#maxframe.tensor.special.gammainccinv)
: inverse of the regularized upper incomplete gamma function
### Notes
The function satisfies the relation `gammainc(a, x) +
gammaincc(a, x) = 1` where gammainc is the regularized lower
incomplete gamma function.
The implementation largely follows that of [[boost]](#boost).
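For a positive integer first argument, the regularized upper incomplete gamma function has the closed form Q(n, x) = exp(-x) * sum for k < n of x^k / k!. The plain-Python sketch below uses it as a small reference. The helper name `gammaincc_int` is hypothetical and the approach does not cover non-integer a.

```python
import math


def gammaincc_int(n: int, x: float) -> float:
    """Q(n, x) for positive integer n: exp(-x) * sum_{k<n} x^k / k!."""
    term, total = 1.0, 0.0
    for k in range(n):
        total += term
        term *= x / (k + 1)  # turns x^k / k! into x^(k+1) / (k+1)!
    return math.exp(-x) * total


# Complement of P(1, x) = 1 - exp(-x): the two should sum to 1.
x = 1.5
print(gammaincc_int(1, x) + (1.0 - math.exp(-x)))
```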
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.gammainccinv.md
# maxframe.tensor.special.gammainccinv
### maxframe.tensor.special.gammainccinv(a, b, \*\*kwargs)
Inverse of the regularized upper incomplete gamma function.
Given an input $y$ between 0 and 1, returns $x$ such
that $y = Q(a, x)$. Here $Q$ is the regularized upper
incomplete gamma function; see gammaincc. This is well-defined
because the upper incomplete gamma function is monotonic as can
be seen from its definition in [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf).
* **Parameters:**
* **a** (*array_like*) – Positive parameter
* **y** (*array_like*) – Argument between 0 and 1, inclusive
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the inverse of the upper incomplete gamma function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gammaincc`](maxframe.tensor.special.gammaincc.md#maxframe.tensor.special.gammaincc)
: regularized upper incomplete gamma function
[`gammainc`](maxframe.tensor.special.gammainc.md#maxframe.tensor.special.gammainc)
: regularized lower incomplete gamma function
[`gammaincinv`](maxframe.tensor.special.gammaincinv.md#maxframe.tensor.special.gammaincinv)
: inverse of the regularized lower incomplete gamma function
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.gammaincinv.md
# maxframe.tensor.special.gammaincinv
### maxframe.tensor.special.gammaincinv(a, b, \*\*kwargs)
Inverse to the regularized lower incomplete gamma function.
Given an input $y$ between 0 and 1, returns $x$ such
that $y = P(a, x)$. Here $P$ is the regularized lower
incomplete gamma function; see gammainc. This is well-defined
because the lower incomplete gamma function is monotonic as can be
seen from its definition in [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf).
* **Parameters:**
* **a** (*array_like*) – Positive parameter
* **y** (*array_like*) – Parameter between 0 and 1, inclusive
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the inverse of the lower incomplete gamma function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gammainc`](maxframe.tensor.special.gammainc.md#maxframe.tensor.special.gammainc)
: regularized lower incomplete gamma function
[`gammaincc`](maxframe.tensor.special.gammaincc.md#maxframe.tensor.special.gammaincc)
: regularized upper incomplete gamma function
[`gammainccinv`](maxframe.tensor.special.gammainccinv.md#maxframe.tensor.special.gammainccinv)
: inverse of the regularized upper incomplete gamma function
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.gammaln.md
# maxframe.tensor.special.gammaln
### maxframe.tensor.special.gammaln(x, out=None, where=None, \*\*kwargs)
Logarithm of the absolute value of the Gamma function.
* **Parameters:**
* **x** (*array-like*) – Values on the real line at which to compute `gammaln`
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**gammaln** – Values of `gammaln` at x.
* **Return type:**
Tensor
#### SEE ALSO
[`gammasgn`](maxframe.tensor.special.gammasgn.md#maxframe.tensor.special.gammasgn)
: sign of the gamma function
[`loggamma`](maxframe.tensor.special.loggamma.md#maxframe.tensor.special.loggamma)
: principal branch of the logarithm of the gamma function
### Notes
When used in conjunction with gammasgn, this function is useful
for working in logspace on the real axis without having to deal with
complex numbers, via the relation `exp(gammaln(x)) = gammasgn(x)*gamma(x)`.
For complex-valued log-gamma, use loggamma instead of gammaln.
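Working in log space is what makes large combinatorial quantities tractable. The sketch below uses `math.lgamma`, the standard-library counterpart of `gammaln` for scalars, to compute a log binomial coefficient whose direct evaluation through the gamma function would overflow. `log_binomial` is a hypothetical helper.

```python
import math


def log_binomial(n: float, k: float) -> float:
    """log C(n, k), computed from log-gamma values to avoid overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


# Gamma(1001) overflows a double, but its logarithm is unproblematic.
print(log_binomial(1000, 500))

# At a negative point, exp(lgamma(x)) recovers |Gamma(x)|.
x = -0.5
print(math.exp(math.lgamma(x)), abs(math.gamma(x)))
```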
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.gammasgn.md
# maxframe.tensor.special.gammasgn
### maxframe.tensor.special.gammasgn(x, \*\*kwargs)
Sign of the gamma function.
It is defined as
$$
\text{gammasgn}(x) =
\begin{cases}
+1 & \Gamma(x) > 0 \\
-1 & \Gamma(x) < 0
\end{cases}
$$
where $\Gamma$ is the gamma function; see gamma. This
definition is complete since the gamma function is never zero;
see the discussion after [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf).
* **Parameters:**
* **x** (*array_like*) – Real argument
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Sign of the gamma function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gamma`](maxframe.tensor.special.gamma.md#maxframe.tensor.special.gamma)
: the gamma function
[`gammaln`](maxframe.tensor.special.gammaln.md#maxframe.tensor.special.gammaln)
: log of the absolute value of the gamma function
[`loggamma`](maxframe.tensor.special.loggamma.md#maxframe.tensor.special.loggamma)
: analytic continuation of the log of the gamma function
### Notes
The gamma function can be computed as `gammasgn(x) *
np.exp(gammaln(x))`.
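On the negative real axis the sign of the gamma function alternates between consecutive poles, which gives a simple scalar rule. The following sketch is a plain-Python illustration. Raising at the poles is a choice made here, not a statement about MaxFrame's behaviour there.

```python
import math


def gammasgn(x: float) -> float:
    """Sign of Gamma(x) for real x away from the poles."""
    if x > 0:
        return 1.0
    if x == math.floor(x):
        raise ValueError("Gamma has poles at zero and the negative integers")
    # Gamma is negative on (-1, 0), positive on (-2, -1), and so on.
    return 1.0 if math.floor(x) % 2 == 0 else -1.0


x = -2.5
print(gammasgn(x) * math.exp(math.lgamma(x)), math.gamma(x))
```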
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.hankel1.md
# maxframe.tensor.special.hankel1
### maxframe.tensor.special.hankel1(v, z, out=None)
Hankel function of the first kind
* **Parameters:**
* **v** (*array_like*) – Order (float).
* **z** (*array_like*) – Argument (float or complex).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the Hankel function of the first kind.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`hankel1e`](maxframe.tensor.special.hankel1e.md#maxframe.tensor.special.hankel1e)
: This function with leading exponential behavior stripped off.
### Notes
A wrapper for the AMOS <sup>[1](#id2)</sup> routine zbesh, which carries out the
computation using the relation,
$$
H^{(1)}_v(z) =
\frac{2}{\imath\pi} \exp(-\imath \pi v/2) K_v(z \exp(-\imath\pi/2))
$$
where $K_v$ is the modified Bessel function of the second kind.
For negative orders, the relation
$$
H^{(1)}_{-v}(z) = H^{(1)}_v(z) \exp(\imath\pi v)
$$
is used.
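The negative-order relation can be checked at order 1/2, where the Bessel functions have elementary closed forms and H^(1) = J + iY. This is a plain-Python sketch with a hypothetical helper `hankel1_half`. It does not call the AMOS routine.

```python
import cmath
import math


def hankel1_half(z: complex, negative_order: bool = False) -> complex:
    """H^(1) of order +1/2 or -1/2, assembled as J + iY from closed forms."""
    amp = cmath.sqrt(2.0 / (math.pi * z))
    if negative_order:
        j, y = amp * cmath.cos(z), amp * cmath.sin(z)   # J_{-1/2}, Y_{-1/2}
    else:
        j, y = amp * cmath.sin(z), -amp * cmath.cos(z)  # J_{1/2}, Y_{1/2}
    return j + 1j * y


z, v = 2.0 + 0.5j, 0.5
lhs = hankel1_half(z, negative_order=True)
rhs = hankel1_half(z) * cmath.exp(1j * math.pi * v)
print(lhs, rhs)
```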
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.hankel1e.md
# maxframe.tensor.special.hankel1e
### maxframe.tensor.special.hankel1e(v, z, out=None)
Exponentially scaled Hankel function of the first kind
Defined as:
```default
hankel1e(v, z) = hankel1(v, z) * exp(-1j * z)
```
* **Parameters:**
* **v** (*array_like*) – Order (float).
* **z** (*array_like*) – Argument (float or complex).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the exponentially scaled Hankel function.
* **Return type:**
scalar or ndarray
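The benefit of the scaling is easiest to see at order 1/2, where H^(1)(z) = -i sqrt(2/(pi z)) exp(iz) in closed form, so the scaled function is just the prefactor. The sketch below is plain Python with hypothetical helper names. It shows the unscaled value overflowing while the scaled one stays finite.

```python
import cmath
import math


def hankel1_half(z: complex) -> complex:
    """Unscaled H^(1)_{1/2}(z) from its closed form."""
    return -1j * cmath.sqrt(2.0 / (math.pi * z)) * cmath.exp(1j * z)


def hankel1e_half(z: complex) -> complex:
    """Scaled variant: hankel1(1/2, z) * exp(-1j * z), simplified analytically."""
    return -1j * cmath.sqrt(2.0 / (math.pi * z))


z_big = -800j  # exp(1j * z_big) = exp(800) overflows a double
try:
    hankel1_half(z_big)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed, hankel1e_half(z_big))
```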
### Notes
A wrapper for the AMOS <sup>[1](#id2)</sup> routine zbesh, which carries out the
computation using the relation,
$$
H^{(1)}_v(z) =
\frac{2}{\imath\pi} \exp(-\imath \pi v/2) K_v(z \exp(-\imath\pi/2))
$$
where $K_v$ is the modified Bessel function of the second kind.
For negative orders, the relation
$$
H^{(1)}_{-v}(z) = H^{(1)}_v(z) \exp(\imath\pi v)
$$
is used.
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.hankel2.md
# maxframe.tensor.special.hankel2
### maxframe.tensor.special.hankel2(v, z, out=None)
Hankel function of the second kind
* **Parameters:**
* **v** (*array_like*) – Order (float).
* **z** (*array_like*) – Argument (float or complex).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the Hankel function of the second kind.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`hankel2e`](maxframe.tensor.special.hankel2e.md#maxframe.tensor.special.hankel2e)
: this function with leading exponential behavior stripped off.
### Notes
A wrapper for the AMOS <sup>[1](#id2)</sup> routine zbesh, which carries out the
computation using the relation,
$$
H^{(2)}_v(z) =
-\frac{2}{\imath\pi} \exp(\imath \pi v/2) K_v(z \exp(\imath\pi/2))
$$
where $K_v$ is the modified Bessel function of the second kind.
For negative orders, the relation
$$
H^{(2)}_{-v}(z) = H^{(2)}_v(z) \exp(-\imath\pi v)
$$
is used.
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.hankel2e.md
# maxframe.tensor.special.hankel2e
### maxframe.tensor.special.hankel2e(v, z, out=None)
Exponentially scaled Hankel function of the second kind
Defined as:
```default
hankel2e(v, z) = hankel2(v, z) * exp(1j * z)
```
* **Parameters:**
* **v** (*array_like*) – Order (float).
* **z** (*array_like*) – Argument (float or complex).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the exponentially scaled Hankel function of the second kind.
* **Return type:**
scalar or ndarray
### Notes
A wrapper for the AMOS <sup>[1](#id2)</sup> routine zbesh, which carries out the
computation using the relation,
$$
H^{(2)}_v(z) = -\frac{2}{\imath\pi}
\exp(\frac{\imath \pi v}{2}) K_v(z \exp(\frac{\imath\pi}{2}))
$$
where $K_v$ is the modified Bessel function of the second kind.
For negative orders, the relation
$$
H^{(2)}_{-v}(z) = H^{(2)}_v(z) \exp(-\imath\pi v)
$$
is used.
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.itairy.md
# maxframe.tensor.special.itairy
### maxframe.tensor.special.itairy(x, out=None)
Integrals of Airy functions
Calculates the integrals of Airy functions from 0 to x.
* **Parameters:**
* **x** (*array_like*) – Upper limit of integration (float).
* **out** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ndarray* *,* *optional*) – Optional output arrays for the function values
* **Returns:**
* **Apt** (*scalar or ndarray*) – Integral of Ai(t) from 0 to x.
* **Bpt** (*scalar or ndarray*) – Integral of Bi(t) from 0 to x.
* **Ant** (*scalar or ndarray*) – Integral of Ai(-t) from 0 to x.
* **Bnt** (*scalar or ndarray*) – Integral of Bi(-t) from 0 to x.
### Notes
Wrapper for a Fortran routine created by Shanjie Zhang and Jianming
Jin <sup>[1](#id2)</sup>.
### References
* <a id='id2'>**[1]**</a> Zhang, Shanjie and Jin, Jianming. “Computation of Special Functions”, John Wiley and Sons, 1996. [https://people.sc.fsu.edu/~jburkardt/f_src/special_functions/special_functions.html](https://people.sc.fsu.edu/~jburkardt/f_src/special_functions/special_functions.html)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.iv.md
# maxframe.tensor.special.iv
### maxframe.tensor.special.iv(v, z, out=None)
Modified Bessel function of the first kind of real order.
* **Parameters:**
* **v** (*array_like*) – Order. If z is of real type and negative, v must be integer
valued.
* **z** (*array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float) *or* [*complex*](https://docs.python.org/3/library/functions.html#complex)) – Argument.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the modified Bessel function.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`ive`](maxframe.tensor.special.ive.md#maxframe.tensor.special.ive)
: This function with leading exponential behavior stripped off.
`i0`
: Faster version of this function for order 0.
`i1`
: Faster version of this function for order 1.
### Notes
For real z and $v \in [-50, 50]$, the evaluation is carried out
using Temme’s method <sup>[1](#id3)</sup>. For larger orders, uniform asymptotic
expansions are applied.
For complex z and positive v, the AMOS <sup>[2](#id4)</sup> zbesi routine is
called. It uses a power series for small z, the asymptotic expansion
for large abs(z), the Miller algorithm normalized by the Wronskian
and a Neumann series for intermediate magnitudes, and the uniform
asymptotic expansions for $I_v(z)$ and $J_v(z)$ for large
orders. Backward recurrence is used to generate sequences or reduce
orders when necessary.
The calculations above are done in the right half plane and continued
into the left half plane by the formula,
$$
I_v(z \exp(\pm\imath\pi)) = \exp(\pm\imath\pi v) I_v(z)
$$
(valid when the real part of z is positive). For negative v, the
formula
$$
I_{-v}(z) = I_v(z) + \frac{2}{\pi} \sin(\pi v) K_v(z)
$$
is used, where $K_v(z)$ is the modified Bessel function of the
second kind, evaluated using the AMOS routine zbesk.
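The negative-order formula can be verified at v = 1/2 with the ascending power series I_v(x) = sum over k of (x/2)^(2k+v) / (k! Gamma(k+v+1)) and the closed form K_{1/2}(x) = sqrt(pi/(2x)) exp(-x). This is a plain-Python sketch for real x > 0 and non-integer v. `iv_series` is a hypothetical helper, not the Temme or AMOS algorithm.

```python
import math


def iv_series(v: float, x: float, terms: int = 60) -> float:
    """I_v(x) for x > 0 and non-integer or non-negative v via its power series."""
    total = 0.0
    for k in range(terms):
        total += (x / 2.0) ** (2 * k + v) / (
            math.factorial(k) * math.gamma(k + v + 1.0)
        )
    return total


x, v = 1.7, 0.5
k_half = math.sqrt(math.pi / (2.0 * x)) * math.exp(-x)  # K_{1/2}(x)
lhs = iv_series(-v, x)
rhs = iv_series(v, x) + 2.0 / math.pi * math.sin(math.pi * v) * k_half
print(lhs, rhs)
```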
### References
* <a id='id3'>**[1]**</a> Temme, Journal of Computational Physics, vol 21, 343 (1976)
* <a id='id4'>**[2]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.ive.md
# maxframe.tensor.special.ive
### maxframe.tensor.special.ive(v, z, out=None)
Exponentially scaled modified Bessel function of the first kind.
Defined as:
```default
ive(v, z) = iv(v, z) * exp(-abs(z.real))
```
For purely imaginary arguments the scaling factor equals 1, so ive returns
the same value as the unscaled modified Bessel function of the first kind iv.
* **Parameters:**
* **v** (*array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float)) – Order.
* **z** (*array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float) *or* [*complex*](https://docs.python.org/3/library/functions.html#complex)) – Argument.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
Values of the exponentially scaled modified Bessel function.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`iv`](maxframe.tensor.special.iv.md#maxframe.tensor.special.iv)
: Modified Bessel function of the first kind
`i0e`
: Faster implementation of this function for order 0
`i1e`
: Faster implementation of this function for order 1
### Notes
For positive v, the AMOS <sup>[1](#id2)</sup> zbesi routine is called. It uses a
power series for small z, the asymptotic expansion for large
abs(z), the Miller algorithm normalized by the Wronskian and a
Neumann series for intermediate magnitudes, and the uniform asymptotic
expansions for $I_v(z)$ and $J_v(z)$ for large orders.
Backward recurrence is used to generate sequences or reduce orders when
necessary.
The calculations above are done in the right half plane and continued
into the left half plane by the formula,
$$
I_v(z \exp(\pm\imath\pi)) = \exp(\pm\imath\pi v) I_v(z)
$$
(valid when the real part of z is positive). For negative v, the
formula
$$
I_{-v}(z) = I_v(z) + \frac{2}{\pi} \sin(\pi v) K_v(z)
$$
is used, where $K_v(z)$ is the modified Bessel function of the
second kind, evaluated using the AMOS routine zbesk.
ive is useful for large arguments z: for these, iv easily overflows,
while ive does not due to the exponential scaling.
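The overflow behaviour can be reproduced at order 1/2, where I_{1/2}(x) = sqrt(2/(pi x)) sinh(x) for x > 0. The sketch below is plain Python with hypothetical helper names. It folds the factor exp(-x) into the expression analytically so that the scaled value never forms a large intermediate.

```python
import math


def iv_half(x: float) -> float:
    """Unscaled I_{1/2}(x) for x > 0."""
    return math.sqrt(2.0 / (math.pi * x)) * math.sinh(x)


def ive_half(x: float) -> float:
    """I_{1/2}(x) * exp(-x), using sinh(x) * exp(-x) = (1 - exp(-2x)) / 2."""
    return math.sqrt(2.0 / (math.pi * x)) * 0.5 * (1.0 - math.exp(-2.0 * x))


try:
    iv_half(800.0)  # sinh(800) exceeds the double-precision range
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed, ive_half(800.0))
```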
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.jv.md
# maxframe.tensor.special.jv
### maxframe.tensor.special.jv(v, z, out=None)
Bessel function of the first kind of real order and complex argument.
* **Parameters:**
* **v** (*array_like*) – Order (float).
* **z** (*array_like*) – Argument (float or complex).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**J** – Value of the Bessel function, $J_v(z)$.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`jve`](maxframe.tensor.special.jve.md#maxframe.tensor.special.jve)
: $J_v$ with leading exponential behavior stripped off.
`spherical_jn`
: spherical Bessel functions.
`j0`
: faster version of this function for order 0.
`j1`
: faster version of this function for order 1.
### Notes
For positive v values, the computation is carried out using the AMOS
<sup>[1](#id2)</sup> zbesj routine, which exploits the connection to the modified
Bessel function $I_v$,
$$
\begin{aligned}
J_v(z) &= \exp(v\pi\imath/2) I_v(-\imath z)\qquad (\Im z > 0) \\
J_v(z) &= \exp(-v\pi\imath/2) I_v(\imath z)\qquad (\Im z < 0)
\end{aligned}
$$
For negative v values the formula,
$$
J_{-v}(z) = J_v(z) \cos(\pi v) - Y_v(z) \sin(\pi v)
$$
is used, where $Y_v(z)$ is the Bessel function of the second
kind, computed using the AMOS routine zbesy. Note that the second
term is exactly zero for integer v; to improve accuracy the second
term is explicitly omitted for v values such that v = floor(v).
Not to be confused with the spherical Bessel functions (see spherical_jn).
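For integer order the sine term in the negative-order formula vanishes, leaving J_{-n}(x) = (-1)^n J_n(x). The sketch below checks this with Bessel's integral J_n(x) = (1/pi) * integral from 0 to pi of cos(n t - x sin t) dt, evaluated by the trapezoidal rule. It is plain Python with a hypothetical helper `jn_integral` and is restricted to integer n and real x.

```python
import math


def jn_integral(n: int, x: float, samples: int = 200) -> float:
    """J_n(x) for integer n via Bessel's integral and the trapezoidal rule."""
    h = math.pi / samples

    def f(t: float) -> float:
        return math.cos(n * t - x * math.sin(t))

    total = 0.5 * (f(0.0) + f(math.pi))
    for i in range(1, samples):
        total += f(i * h)
    return total * h / math.pi


x = 2.5
print(jn_integral(-3, x), -jn_integral(3, x))  # odd order: signs flip
print(jn_integral(-2, x), jn_integral(2, x))   # even order: values agree
```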
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.jve.md
# maxframe.tensor.special.jve
### maxframe.tensor.special.jve(v, z, out=None)
Exponentially scaled Bessel function of the first kind of order v.
Defined as:
```default
jve(v, z) = jv(v, z) * exp(-abs(z.imag))
```
* **Parameters:**
* **v** (*array_like*) – Order (float).
* **z** (*array_like*) – Argument (float or complex).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function values
* **Returns:**
**J** – Value of the exponentially scaled Bessel function.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`jv`](maxframe.tensor.special.jv.md#maxframe.tensor.special.jv)
: Unscaled Bessel function of the first kind
### Notes
For positive v values, the computation is carried out using the AMOS
<sup>[1](#id2)</sup> zbesj routine, which exploits the connection to the modified
Bessel function $I_v$,
$$
\begin{aligned}
J_v(z) &= \exp(v\pi\imath/2) I_v(-\imath z)\qquad (\Im z > 0) \\
J_v(z) &= \exp(-v\pi\imath/2) I_v(\imath z)\qquad (\Im z < 0)
\end{aligned}
$$
For negative v values the formula,
$$
J_{-v}(z) = J_v(z) \cos(\pi v) - Y_v(z) \sin(\pi v)
$$
is used, where $Y_v(z)$ is the Bessel function of the second
kind, computed using the AMOS routine zbesy. Note that the second
term is exactly zero for integer v; to improve accuracy the second
term is explicitly omitted for v values such that v = floor(v).
Exponentially scaled Bessel functions are useful for large arguments z:
for these, the unscaled Bessel functions can easily underflow or overflow.
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.kn.md
# maxframe.tensor.special.kn
### maxframe.tensor.special.kn(n, x, \*\*kwargs)
Modified Bessel function of the second kind of integer order n.
Returns the modified Bessel function of the second kind for integer order
n at real x.
These are also sometimes called functions of the third kind, Basset
functions, or Macdonald functions.
* **Parameters:**
* **n** (*array_like* *of* [*int*](https://docs.python.org/3/library/functions.html#int)) – Order of Bessel functions (floats will truncate with a warning)
* **x** (*array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float)) – Argument at which to evaluate the Bessel functions
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results.
* **Returns:**
Value of the Modified Bessel function of the second kind,
$K_n(x)$.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`kv`](maxframe.tensor.special.kv.md#maxframe.tensor.special.kv)
: Same function, but accepts real order and complex argument
`kvp`
: Derivative of this function
### Notes
Wrapper for AMOS <sup>[1](#id3)</sup> routine zbesk. For a discussion of the
algorithm used, see <sup>[2](#id4)</sup> and the references therein.
### References
* <a id='id3'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
* <a id='id4'>**[2]**</a> Donald E. Amos, “Algorithm 644: A portable package for Bessel functions of a complex argument and nonnegative order”, ACM TOMS Vol. 12 Issue 3, Sept. 1986, p. 265
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.kv.md
# maxframe.tensor.special.kv
### maxframe.tensor.special.kv(v, z, out=None)
Modified Bessel function of the second kind of real order v
Returns the modified Bessel function of the second kind for real order
v at complex z.
These are also sometimes called functions of the third kind, Basset
functions, or Macdonald functions. They are defined as those solutions
of the modified Bessel equation for which,
$$
K_v(x) \sim \sqrt{\pi/(2x)} \exp(-x)
$$
as $x \to \infty$ <sup>[3](#id6)</sup>.
* **Parameters:**
* **v** (*array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float)) – Order of Bessel functions
* **z** (*array_like* *of* [*complex*](https://docs.python.org/3/library/functions.html#complex)) – Argument at which to evaluate the Bessel functions
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
The results. Note that input must be of complex type to get complex
output, e.g. `kv(3, -2+0j)` instead of `kv(3, -2)`.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`kve`](maxframe.tensor.special.kve.md#maxframe.tensor.special.kve)
: This function with leading exponential behavior stripped off.
`kvp`
: Derivative of this function
### Notes
Wrapper for AMOS <sup>[1](#id4)</sup> routine zbesk. For a discussion of the
algorithm used, see <sup>[2](#id5)</sup> and the references therein.
### References
* <a id='id4'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
* <a id='id5'>**[2]**</a> Donald E. Amos, “Algorithm 644: A portable package for Bessel functions of a complex argument and nonnegative order”, ACM TOMS Vol. 12 Issue 3, Sept. 1986, p. 265
* <a id='id6'>**[3]**</a> NIST Digital Library of Mathematical Functions, Eq. 10.25.E3. [https://dlmf.nist.gov/10.25.E3](https://dlmf.nist.gov/10.25.E3)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.kve.md
# maxframe.tensor.special.kve
### maxframe.tensor.special.kve(v, z, out=None)
Exponentially scaled modified Bessel function of the second kind.
Returns the exponentially scaled, modified Bessel function of the
second kind (sometimes called the third kind) for real order v at
complex z:
```default
kve(v, z) = kv(v, z) * exp(z)
```
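As a rough consistency check (a sketch that assumes the large-argument behavior quoted in the `kv` documentation applies for real, positive arguments), multiplying the asymptotic form of $K_v$ by $e^{x}$ suggests why the scaled function stays well-behaved where `kv` itself would underflow:

$$
\mathrm{kve}(v, x) = e^{x} K_v(x) \sim e^{x} \sqrt{\pi/(2x)}\, e^{-x} = \sqrt{\pi/(2x)} \quad \text{as } x \to \infty
$$

The exponential decay cancels, leaving only algebraic decay in $x$.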
* **Parameters:**
* **v** (*array_like* *of* [*float*](https://docs.python.org/3/library/functions.html#float)) – Order of Bessel functions
* **z** (*array_like* *of* [*complex*](https://docs.python.org/3/library/functions.html#complex)) – Argument at which to evaluate the Bessel functions
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
The exponentially scaled modified Bessel function of the second kind.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`kv`](maxframe.tensor.special.kv.md#maxframe.tensor.special.kv)
: This function without exponential scaling.
`k0e`
: Faster version of this function for order 0.
`k1e`
: Faster version of this function for order 1.
### Notes
Wrapper for AMOS <sup>[1](#id3)</sup> routine zbesk. For a discussion of the
algorithm used, see <sup>[2](#id4)</sup> and the references therein.
### References
* <a id='id3'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
* <a id='id4'>**[2]**</a> Donald E. Amos, “Algorithm 644: A portable package for Bessel functions of a complex argument and nonnegative order”, ACM TOMS Vol. 12 Issue 3, Sept. 1986, p. 265
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.log_expit.md
# maxframe.tensor.special.log_expit
### maxframe.tensor.special.log_expit(x, out=None)
Logarithm of the logistic sigmoid function.
The SciPy implementation of the logistic sigmoid function is
scipy.special.expit, so this function is called `log_expit`.
The function is mathematically equivalent to `log(expit(x))`, but
is formulated to avoid loss of precision for inputs with large
(positive or negative) magnitude.
* **Parameters:**
* **x** (*array_like*) – The values to apply `log_expit` to element-wise.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
**out** – The computed values, an ndarray of the same shape as `x`.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`expit`](maxframe.tensor.special.expit.md#maxframe.tensor.special.expit)
### Notes
As a ufunc, `log_expit` takes a number of optional keyword arguments.
For more information see
[ufuncs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)
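The precision issue mentioned above can be illustrated with a scalar, standard-library sketch. This is not the maxframe or SciPy implementation; the helper name `log_expit` is reused here only for illustration, and the branch-on-sign formulation is one common way to stay stable for large magnitudes.

```python
import math


def log_expit(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x))) for a single float."""
    if x >= 0:
        # exp(-x) <= 1 here, so nothing can overflow.
        return -math.log1p(math.exp(-x))
    # For negative x, rewrite as x - log(1 + exp(x)); exp(x) <= 1.
    return x - math.log1p(math.exp(x))


print(log_expit(0.0))      # equals -log(2)
print(log_expit(-1000.0))  # -1000.0; the naive form overflows in exp(1000)
```

Evaluating `math.log(1 / (1 + math.exp(1000.0)))` directly raises `OverflowError`, which is the kind of failure the reformulation avoids.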
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.loggamma.md
# maxframe.tensor.special.loggamma
### maxframe.tensor.special.loggamma(z, out=None)
Principal branch of the logarithm of the gamma function.
Defined to be $\log(\Gamma(x))$ for $x > 0$ and
extended to the complex plane by analytic continuation. The
function has a single branch cut on the negative real axis.
* **Parameters:**
* **z** (*array_like*) – Values in the complex plane at which to compute `loggamma`
* **out** (*ndarray* *,* *optional*) – Output array for computed values of `loggamma`
* **Returns:**
**loggamma** – Values of `loggamma` at z.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gammaln`](maxframe.tensor.special.gammaln.md#maxframe.tensor.special.gammaln)
: logarithm of the absolute value of the gamma function
[`gammasgn`](maxframe.tensor.special.gammasgn.md#maxframe.tensor.special.gammasgn)
: sign of the gamma function
### Notes
It is not generally true that $\log\Gamma(z) =
\log(\Gamma(z))$, though the real parts of the functions do
agree. The benefit of not defining loggamma as
$\log(\Gamma(z))$ is that the latter function has a
complicated branch cut structure whereas loggamma is analytic
except for on the negative real axis.
The identities
$$
\begin{aligned}
\exp(\log\Gamma(z)) &= \Gamma(z) \\
\log\Gamma(z + 1) &= \log(z) + \log\Gamma(z)
\end{aligned}
$$
make loggamma useful for working in complex logspace.
On the real line loggamma is related to gammaln via
`exp(loggamma(x + 0j)) = gammasgn(x)*exp(gammaln(x))`, up to
rounding error.
The implementation here is based on [[hare1997]](#hare1997).
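The two identities above can be spot-checked on the positive real axis with the standard library. This is only a sketch: `math.lgamma` returns $\log|\Gamma(x)|$, which coincides with `loggamma` for $x > 0$, and the helper name `check_loggamma_identities` is hypothetical.

```python
import math


def check_loggamma_identities(x: float) -> tuple[float, float]:
    """Return the residuals of both identities at a positive real x."""
    # exp(log Gamma(x)) = Gamma(x), expressed as a relative residual.
    r1 = math.exp(math.lgamma(x)) / math.gamma(x) - 1.0
    # log Gamma(x + 1) = log(x) + log Gamma(x).
    r2 = math.lgamma(x + 1.0) - (math.log(x) + math.lgamma(x))
    return r1, r2


residuals = check_loggamma_identities(4.5)
print(residuals)  # both entries should be at rounding-error level
```

For complex arguments the standard library has no equivalent, so this check does not exercise the branch-cut behavior described above.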
### References
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.logit.md
# maxframe.tensor.special.logit
### maxframe.tensor.special.logit(x, \*\*kwargs)
Logit ufunc for ndarrays.
The logit function is defined as logit(p) = log(p/(1-p)).
Note that logit(0) = -inf, logit(1) = inf, and logit(p)
for p<0 or p>1 yields nan.
* **Parameters:**
* **x** (*ndarray*) – The ndarray to apply logit to element-wise.
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
An ndarray of the same shape as x. Its entries
are logit of the corresponding entry of x.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`expit`](maxframe.tensor.special.expit.md#maxframe.tensor.special.expit)
### Notes
As a ufunc logit takes a number of optional
keyword arguments. For more information
see [ufuncs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)
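A scalar sketch of the definition and its edge cases, using only the standard library. It mirrors the rules stated above (`-inf` at 0, `inf` at 1, `nan` outside `[0, 1]`) and is not the maxframe implementation; the simple `expit` helper is included only to show the round trip and is adequate for moderate arguments.

```python
import math


def logit(p: float) -> float:
    """log(p / (1 - p)) with the documented edge cases handled explicitly."""
    if math.isnan(p) or p < 0.0 or p > 1.0:
        return math.nan
    if p == 0.0:
        return -math.inf
    if p == 1.0:
        return math.inf
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    """Plain logistic sigmoid; the inverse of logit on (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


print(logit(0.5))          # 0.0
print(expit(logit(0.25)))  # close to 0.25
```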
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.modfresnelm.md
# maxframe.tensor.special.modfresnelm
### maxframe.tensor.special.modfresnelm(x, out=None)
Modified Fresnel negative integrals
* **Parameters:**
* **x** (*array_like*) – Function argument
* **out** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ndarray* *,* *optional*) – Optional output arrays for the function results
* **Returns:**
* **fm** (*scalar or ndarray*) – Integral `F_-(x)`: `integral(exp(-1j*t*t), t=x..inf)`
* **km** (*scalar or ndarray*) – Integral `K_-(x)`: `1/sqrt(pi)*exp(1j*(x*x+pi/4))*fm`
#### SEE ALSO
[`modfresnelp`](maxframe.tensor.special.modfresnelp.md#maxframe.tensor.special.modfresnelp)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.modfresnelp.md
# maxframe.tensor.special.modfresnelp
### maxframe.tensor.special.modfresnelp(x, out=None)
Modified Fresnel positive integrals
* **Parameters:**
* **x** (*array_like*) – Function argument
* **out** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ndarray* *,* *optional*) – Optional output arrays for the function results
* **Returns:**
* **fp** (*scalar or ndarray*) – Integral `F_+(x)`: `integral(exp(1j*t*t), t=x..inf)`
* **kp** (*scalar or ndarray*) – Integral `K_+(x)`: `1/sqrt(pi)*exp(-1j*(x*x+pi/4))*fp`
#### SEE ALSO
[`modfresnelm`](maxframe.tensor.special.modfresnelm.md#maxframe.tensor.special.modfresnelm)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.multigammaln.md
# maxframe.tensor.special.multigammaln
### maxframe.tensor.special.multigammaln(a, b, \*\*kwargs)
Returns the log of multivariate gamma, also sometimes called the
generalized gamma.
* **Parameters:**
* **a** (*ndarray*) – The multivariate gamma is computed for each item of a.
* **d** ([*int*](https://docs.python.org/3/library/functions.html#int)) – The dimension of the space of integration (the second positional argument, shown as `b` in the signature above).
* **Returns:**
**res** – The values of the log multivariate gamma at the given points a.
* **Return type:**
ndarray
### Notes
The formal definition of the multivariate gamma of dimension d for a real
a is
$$
\Gamma_d(a) = \int_{A>0} e^{-\operatorname{tr}(A)} |A|^{a - (d+1)/2} \, dA
$$
with the condition $a > (d-1)/2$, where $A > 0$ denotes the set of
all positive definite matrices of dimension d. Note that a is a
scalar: only the integrand is multivariate, not the argument (the
function is defined over a subset of the real line).
This can be proven to be equal to the much friendlier equation
$$
\Gamma_d(a) = \pi^{d(d-1)/4} \prod_{i=1}^{d} \Gamma(a - (i-1)/2).
$$
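Taking logarithms of the product formula gives a direct way to evaluate the function for a scalar `a`. The sketch below uses only the standard library and is not the maxframe implementation; the function name is reused for illustration and the argument check reflects the condition $a > (d-1)/2$ stated above.

```python
import math


def multigammaln(a: float, d: int) -> float:
    """Log of the multivariate gamma function via the product formula."""
    if a <= 0.5 * (d - 1):
        raise ValueError("a must be greater than (d - 1) / 2")
    # log of pi^{d(d-1)/4}
    log_pi_term = d * (d - 1) / 4.0 * math.log(math.pi)
    # sum of log Gamma(a - (i - 1) / 2) for i = 1..d
    log_gamma_sum = sum(math.lgamma(a - (i - 1) / 2.0) for i in range(1, d + 1))
    return log_pi_term + log_gamma_sum


print(multigammaln(2.5, 1))  # reduces to lgamma(2.5) when d = 1
print(multigammaln(1.5, 2))  # equals log(pi / 2) up to rounding
```

The second value follows from $\Gamma(3/2) = \sqrt{\pi}/2$ and $\Gamma(1) = 1$, so $\Gamma_2(3/2) = \pi^{1/2} \cdot \sqrt{\pi}/2 = \pi/2$.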
### References
R. J. Muirhead, Aspects of multivariate statistical theory (Wiley Series in
probability and mathematical statistics).
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.poch.md
# maxframe.tensor.special.poch
### maxframe.tensor.special.poch(a, b, \*\*kwargs)
Pochhammer symbol.
The Pochhammer symbol (rising factorial) is defined as
$$
(z)_m = \frac{\Gamma(z + m)}{\Gamma(z)}
$$
For positive integer m it reads
$$
(z)_m = z (z + 1) \cdots (z + m - 1)
$$
See [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf) for more details.
* **Parameters:**
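* **z** (*array_like*) – Real-valued arguments (first positional argument, shown as `a` in the signature above).
* **m** (*array_like*) – Real-valued arguments (second positional argument, shown as `b` in the signature above).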
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
The value of the function.
* **Return type:**
scalar or ndarray
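The two forms of the definition can be compared for a positive integer `m` with a short standard-library sketch. The helper names `poch_gamma` and `poch_product` are hypothetical and exist only for this illustration.

```python
import math


def poch_gamma(z: float, m: float) -> float:
    """Pochhammer symbol as a ratio of gamma functions."""
    return math.gamma(z + m) / math.gamma(z)


def poch_product(z: float, m: int) -> float:
    """Rising factorial z (z + 1) ... (z + m - 1) for a positive integer m."""
    result = 1.0
    for k in range(m):
        result *= z + k
    return result


print(poch_product(3.0, 4))  # 3 * 4 * 5 * 6 = 360.0
print(poch_gamma(3.0, 4))    # Gamma(7) / Gamma(3), the same value up to rounding
```

The gamma-ratio form also accepts non-integer `m`, which is why it serves as the general definition.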
### References
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.polygamma.md
# maxframe.tensor.special.polygamma
### maxframe.tensor.special.polygamma(a, b, \*\*kwargs)
Polygamma functions.
Defined as $\psi^{(n)}(x)$ where $\psi$ is the
digamma function. See [[dlmf]](maxframe.tensor.special.rgamma.md#dlmf) for details.
* **Parameters:**
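* **n** (*array_like*) – The order of the derivative of the digamma function; must be
an integer (first positional argument, shown as `a` in the signature above)
* **x** (*array_like*) – Real valued input (second positional argument, shown as `b` in the signature above)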
* **Returns:**
Function results
* **Return type:**
ndarray
#### SEE ALSO
[`digamma`](maxframe.tensor.special.digamma.md#maxframe.tensor.special.digamma)
### References
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.psi.md
# maxframe.tensor.special.psi
### maxframe.tensor.special.psi(z, out=None)
The digamma function.
The logarithmic derivative of the gamma function evaluated at `z`.
* **Parameters:**
* **z** (*array_like*) – Real or complex argument.
* **out** (*ndarray* *,* *optional*) – Array for the computed values of `psi`.
* **Returns:**
**digamma** – Computed values of `psi`.
* **Return type:**
scalar or ndarray
### Notes
For large values not close to the negative real axis, `psi` is
computed using the asymptotic series (5.11.2) from <sup>[1](#id5)</sup>. For small
arguments not close to the negative real axis, the recurrence
relation (5.5.2) from <sup>[1](#id5)</sup> is used until the argument is large
enough to use the asymptotic series. For values close to the
negative real axis, the reflection formula (5.5.4) from <sup>[1](#id5)</sup> is
used first. Note that `psi` has a family of zeros on the
negative real axis which occur between the poles at nonpositive
integers. Around the zeros the reflection formula suffers from
cancellation and the implementation loses precision. The sole
positive zero and the first negative zero, however, are handled
separately by precomputing series expansions using <sup>[2](#id6)</sup>, so the
function should maintain full accuracy around the origin.
### References
* <a id='id5'>**[1]**</a> NIST Digital Library of Mathematical Functions [https://dlmf.nist.gov/5](https://dlmf.nist.gov/5)
* <a id='id6'>**[2]**</a> Fredrik Johansson and others. “mpmath: a Python library for arbitrary-precision floating-point arithmetic” (Version 0.19) [http://mpmath.org/](http://mpmath.org/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.rel_entr.md
# maxframe.tensor.special.rel_entr
### maxframe.tensor.special.rel_entr(x, y, out=None, where=None, \*\*kwargs)
Elementwise function for computing relative entropy.
$$
\mathrm{rel\_entr}(x, y) =
\begin{cases}
x \log(x / y) & x > 0, y > 0 \\
0 & x = 0, y \ge 0 \\
\infty & \text{otherwise}
\end{cases}
$$
* **Parameters:**
* **x** (*array_like*) – Input arrays
* **y** (*array_like*) – Input arrays
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
Relative entropy of the inputs
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`entr`](maxframe.tensor.special.entr.md#maxframe.tensor.special.entr), `kl_div`
### Notes
This function is jointly convex in x and y.
The origin of this function is in convex programming; see
<sup>[1](#id3)</sup>. Given two discrete probability distributions $p_1,
\ldots, p_n$ and $q_1, \ldots, q_n$, the relative entropy in the
statistical sense is the sum
$$
\sum_{i = 1}^n \mathrm{rel\_entr}(p_i, q_i).
$$
See <sup>[2](#id4)</sup> for details.
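The piecewise definition and the sum above translate into a few lines of plain Python. This scalar sketch is not the maxframe implementation, it ignores `nan` inputs, and `kl_divergence` is a hypothetical helper name.

```python
import math


def rel_entr(x: float, y: float) -> float:
    """Elementwise relative entropy following the piecewise definition."""
    if x > 0 and y > 0:
        return x * math.log(x / y)
    if x == 0 and y >= 0:
        return 0.0
    return math.inf


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Sum of rel_entr over two discrete distributions of equal length."""
    return sum(rel_entr(pi, qi) for pi, qi in zip(p, q))


p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_divergence(p, q))  # 0.5 * log(4 / 3) up to rounding
print(kl_divergence(p, p))  # 0.0
```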
### References
* <a id='id3'>**[1]**</a> Grant, Boyd, and Ye, “CVX: Matlab Software for Disciplined Convex Programming”, [http://cvxr.com/cvx/](http://cvxr.com/cvx/)
* <a id='id4'>**[2]**</a> Kullback-Leibler divergence, [https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.rgamma.md
# maxframe.tensor.special.rgamma
### maxframe.tensor.special.rgamma(z, out=None)
Reciprocal of the gamma function.
Defined as $1 / \Gamma(z)$, where $\Gamma$ is the
gamma function. For more on the gamma function see [`gamma`](maxframe.tensor.special.gamma.md#maxframe.tensor.special.gamma).
* **Parameters:**
* **z** (*array_like*) – Real or complex valued input
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
Function results
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`gamma`](maxframe.tensor.special.gamma.md#maxframe.tensor.special.gamma), [`gammaln`](maxframe.tensor.special.gammaln.md#maxframe.tensor.special.gammaln), [`loggamma`](maxframe.tensor.special.loggamma.md#maxframe.tensor.special.loggamma)
### Notes
The gamma function has no zeros and has simple poles at
nonpositive integers, so rgamma is an entire function with zeros
at the nonpositive integers. See the discussion in [[dlmf]](#dlmf) for
more details.
### References
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.softmax.md
# maxframe.tensor.special.softmax
### maxframe.tensor.special.softmax(x, axis=None)
Compute the softmax function.
The softmax function transforms each element of a collection by
computing the exponential of each element divided by the sum of the
exponentials of all the elements. That is, if x is a one-dimensional
numpy array:
```default
softmax(x) = np.exp(x)/sum(np.exp(x))
```
* **Parameters:**
* **x** (*array_like*) – Input array.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Axis to compute values along. Default is None and softmax will be
computed over the entire array x.
* **Returns:**
**s** – An array the same shape as x. The result will sum to 1 along the
specified axis.
* **Return type:**
ndarray
### Notes
The formula for the softmax function $\sigma(x)$ for a vector
$x = \{x_0, x_1, ..., x_{n-1}\}$ is
$$
\sigma(x)_j = \frac{e^{x_j}}{\sum_k e^{x_k}}
$$
The softmax function is the gradient of logsumexp.
The implementation uses shifting to avoid overflow. See <sup>[1](#id2)</sup> for more
details.
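The shifting trick can be sketched for a one-dimensional list with the standard library. Subtracting the maximum leaves the result unchanged mathematically, since the common factor cancels in the ratio, but keeps every exponent at or below zero. This illustrates the idea only and is not the maxframe implementation.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Softmax of a 1-D list, shifted by the maximum to avoid overflow."""
    shift = max(xs)
    exps = [math.exp(x - shift) for x in xs]  # every exponent is <= 0
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1000.0, 1000.0]))     # [0.5, 0.5]; math.exp(1000.0) alone would overflow
print(softmax([1.0, 0.5, 0.2, 3.0]))
```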
### References
* <a id='id2'>**[1]**</a> P. Blanchard, D.J. Higham, N.J. Higham, “Accurately computing the log-sum-exp and softmax functions”, IMA Journal of Numerical Analysis, Vol.41(4), [doi:10.1093/imanum/draa038](https://doi.org/10.1093/imanum/draa038).
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> from maxframe.tensor.special import softmax
```
```pycon
>>> x = mt.array([[1, 0.5, 0.2, 3],
... [1, -1, 7, 3],
... [2, 12, 13, 3]])
...
```
Compute the softmax transformation over the entire array.
```pycon
>>> m = softmax(x)
>>> m.to_numpy()
array([[ 4.48309e-06, 2.71913e-06, 2.01438e-06, 3.31258e-05],
[ 4.48309e-06, 6.06720e-07, 1.80861e-03, 3.31258e-05],
[ 1.21863e-05, 2.68421e-01, 7.29644e-01, 3.31258e-05]])
```
```pycon
>>> m.sum().to_numpy()
1.0
```
Compute the softmax transformation along the first axis (i.e., the
columns).
```pycon
>>> m = softmax(x, axis=0)
>>> m.to_numpy()
array([[ 2.11942e-01, 1.01300e-05, 2.75394e-06, 3.33333e-01],
[ 2.11942e-01, 2.26030e-06, 2.47262e-03, 3.33333e-01],
[ 5.76117e-01, 9.99988e-01, 9.97525e-01, 3.33333e-01]])
>>> m.sum(axis=0).to_numpy()
array([ 1., 1., 1., 1.])
```
Compute the softmax transformation along the second axis (i.e., the rows).
```pycon
>>> m = softmax(x, axis=1)
>>> m.to_numpy()
array([[ 1.05877e-01, 6.42177e-02, 4.75736e-02, 7.82332e-01],
[ 2.42746e-03, 3.28521e-04, 9.79307e-01, 1.79366e-02],
[ 1.22094e-05, 2.68929e-01, 7.31025e-01, 3.31885e-05]])
>>> m.sum(axis=1).to_numpy()
array([ 1., 1., 1.])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.softplus.md
# maxframe.tensor.special.softplus
### maxframe.tensor.special.softplus(x, \*\*kwargs)
Compute the softplus function element-wise.
The softplus function is defined as: `softplus(x) = log(1 + exp(x))`.
It is a smooth approximation of the rectifier function (ReLU).
* **Parameters:**
* **x** (*array_like*) – Input value.
* **\*\*kwargs** – For other keyword-only arguments, see the
[ufunc docs](https://numpy.org/doc/stable/reference/ufuncs.html).
* **Returns:**
**softplus** – Logarithm of `exp(0) + exp(x)`.
* **Return type:**
ndarray
### Examples
```pycon
>>> from maxframe.tensor import special
```
```pycon
>>> special.softplus(0).to_numpy()
0.6931471805599453
```
```pycon
>>> special.softplus([-1, 0, 1]).to_numpy()
array([0.31326169, 0.69314718, 1.31326169])
```
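Evaluating `log(1 + exp(x))` literally overflows for large positive `x`. One common rearrangement, shown here as a scalar standard-library sketch rather than as the maxframe implementation, is `max(x, 0) + log1p(exp(-|x|))`, which is algebraically identical but never exponentiates a positive number.

```python
import math


def softplus(x: float) -> float:
    """Stable log(1 + exp(x)) for a single float."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


print(softplus(0.0))     # log(2)
print(softplus(1000.0))  # 1000.0; the literal formula would overflow
```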
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.wofz.md
# maxframe.tensor.special.wofz
### maxframe.tensor.special.wofz(x, out=None, where=None, \*\*kwargs)
Faddeeva function
Returns the value of the Faddeeva function for complex argument:
```default
exp(-z**2) * erfc(-i*z)
```
* **Parameters:**
* **x** (*array_like*) – Complex argument (`z` in the formula above)
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
Value of the Faddeeva function
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`dawsn`](maxframe.tensor.special.dawsn.md#maxframe.tensor.special.dawsn), [`erf`](maxframe.tensor.special.erf.md#maxframe.tensor.special.erf), [`erfc`](maxframe.tensor.special.erfc.md#maxframe.tensor.special.erfc), [`erfcx`](maxframe.tensor.special.erfcx.md#maxframe.tensor.special.erfcx), [`erfi`](maxframe.tensor.special.erfi.md#maxframe.tensor.special.erfi)
### References
* <a id='id1'>**[1]**</a> Steven G. Johnson, Faddeeva W function implementation. [http://ab-initio.mit.edu/Faddeeva](http://ab-initio.mit.edu/Faddeeva)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.xlogy.md
# maxframe.tensor.special.xlogy
### maxframe.tensor.special.xlogy(x1, x2, out=None, where=None, \*\*kwargs)
Compute `x*log(y)` so that the result is 0 if `x = 0`.
* **Parameters:**
* **x1** (*array_like*) – Multiplier (`x` in `x*log(y)`)
* **x2** (*array_like*) – Argument (`y` in `x*log(y)`)
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
**z** – Computed x\*log(y)
* **Return type:**
scalar or ndarray
### Notes
The log function used in the computation is the natural log.
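The special case matters because `0 * log(0)` evaluates to `nan` in floating-point arithmetic, while the convention here returns 0. A scalar sketch with the standard library follows. It is not the maxframe implementation, and the choice to let a `nan` in `y` fall through to the general branch is an assumption about the intended convention.

```python
import math


def xlogy(x: float, y: float) -> float:
    """x * log(y), defined as 0 when x == 0 (unless y is nan)."""
    if x == 0 and not math.isnan(y):
        return 0.0
    # Note: math.log raises ValueError for y <= 0, unlike array libraries,
    # which would typically return -inf or nan instead.
    return x * math.log(y)


print(xlogy(0.0, 0.0))     # 0.0
print(xlogy(2.0, math.e))  # 2.0 up to rounding
```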
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.yn.md
# maxframe.tensor.special.yn
### maxframe.tensor.special.yn(n, x, \*\*kwargs)
Bessel function of the second kind of integer order and real argument.
* **Parameters:**
* **n** (*array_like*) – Order (integer).
* **x** (*array_like*) – Argument (float).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
**Y** – Value of the Bessel function, $Y_n(x)$.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`yv`](maxframe.tensor.special.yv.md#maxframe.tensor.special.yv)
: For real order and real or complex argument.
`y0`
: faster implementation of this function for order 0
`y1`
: faster implementation of this function for order 1
### Notes
Wrapper for the Cephes <sup>[1](#id2)</sup> routine yn.
The function is evaluated by forward recurrence on n, starting with
values computed by the Cephes routines y0 and y1. If n = 0 or 1,
the routine for y0 or y1 is called directly.
### References
* <a id='id2'>**[1]**</a> Cephes Mathematical Functions Library, [http://www.netlib.org/cephes/](http://www.netlib.org/cephes/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.yv.md
# maxframe.tensor.special.yv
### maxframe.tensor.special.yv(v, z, out=None)
Bessel function of the second kind of real order and complex argument.
* **Parameters:**
* **v** (*array_like*) – Order (float).
* **z** (*array_like*) – Argument (float or complex).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
**Y** – Value of the Bessel function of the second kind, $Y_v(z)$.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`yve`](maxframe.tensor.special.yve.md#maxframe.tensor.special.yve)
: $Y_v$ with leading exponential behavior stripped off.
`y0`
: faster implementation of this function for order 0
`y1`
: faster implementation of this function for order 1
### Notes
For positive v values, the computation is carried out using the
AMOS <sup>[1](#id2)</sup> zbesy routine, which exploits the connection to the Hankel
Bessel functions $H_v^{(1)}$ and $H_v^{(2)}$,
$$
Y_v(z) = \frac{1}{2\imath} (H_v^{(1)} - H_v^{(2)}).
$$
For negative v values the formula,
$$
Y_{-v}(z) = Y_v(z) \cos(\pi v) + J_v(z) \sin(\pi v)
$$
is used, where $J_v(z)$ is the Bessel function of the first kind,
computed using the AMOS routine zbesj. Note that the second term is
exactly zero for integer v; to improve accuracy the second term is
explicitly omitted for v values such that v = floor(v).
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.special.yve.md
# maxframe.tensor.special.yve
### maxframe.tensor.special.yve(v, z, out=None)
Exponentially scaled Bessel function of the second kind of real order.
Returns the exponentially scaled Bessel function of the second
kind of real order v at complex z:
```default
yve(v, z) = yv(v, z) * exp(-abs(z.imag))
```
* **Parameters:**
* **v** (*array_like*) – Order (float).
* **z** (*array_like*) – Argument (float or complex).
* **out** (*ndarray* *,* *optional*) – Optional output array for the function results
* **Returns:**
**Y** – Value of the exponentially scaled Bessel function.
* **Return type:**
scalar or ndarray
#### SEE ALSO
[`yv`](maxframe.tensor.special.yv.md#maxframe.tensor.special.yv)
: Unscaled Bessel function of the second kind of real order.
### Notes
For positive v values, the computation is carried out using the
AMOS <sup>[1](#id2)</sup> zbesy routine, which exploits the connection to the Hankel
Bessel functions $H_v^{(1)}$ and $H_v^{(2)}$,
$$
Y_v(z) = \frac{1}{2\imath} (H_v^{(1)} - H_v^{(2)}).
$$
For negative v values the formula,
$$
Y_{-v}(z) = Y_v(z) \cos(\pi v) + J_v(z) \sin(\pi v)
$$
is used, where $J_v(z)$ is the Bessel function of the first kind,
computed using the AMOS routine zbesj. Note that the second term is
exactly zero for integer v; to improve accuracy the second term is
explicitly omitted for v values such that v = floor(v).
Exponentially scaled Bessel functions are useful for large z:
for these, the unscaled Bessel functions can easily underflow or overflow.
### References
* <a id='id2'>**[1]**</a> Donald E. Amos, “AMOS, A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order”, [http://netlib.org/amos/](http://netlib.org/amos/)
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.split.md
# maxframe.tensor.split
### maxframe.tensor.split(ary, indices_or_sections, axis=0)
Split a tensor into multiple sub-tensors.
* **Parameters:**
* **ary** (*Tensor*) – Tensor to be divided into sub-tensors.
* **indices_or_sections** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *1-D tensor*) –
If indices_or_sections is an integer, N, the array will be divided
into N equal tensors along axis. If such a split is not possible,
an error is raised.
If indices_or_sections is a 1-D tensor of sorted integers, the entries
indicate where along axis the array is split. For example,
`[2, 3]` would, for `axis=0`, result in
> - ary[:2]
> - ary[2:3]
> - ary[3:]
If an index exceeds the dimension of the tensor along axis,
an empty sub-tensor is returned correspondingly.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis along which to split, default is 0.
* **Returns:**
**sub-tensors** – A list of sub-tensors.
* **Return type:**
[list](https://docs.python.org/3/library/stdtypes.html#list) of Tensors
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If indices_or_sections is given as an integer, but
a split does not result in equal division.
#### SEE ALSO
[`array_split`](maxframe.tensor.array_split.md#maxframe.tensor.array_split)
: Split a tensor into multiple sub-tensors of equal or near-equal size. Does not raise an exception if an equal division cannot be made.
[`hsplit`](maxframe.tensor.hsplit.md#maxframe.tensor.hsplit)
: Split into multiple sub-arrays horizontally (column-wise).
[`vsplit`](maxframe.tensor.vsplit.md#maxframe.tensor.vsplit)
: Split tensor into multiple sub-tensors vertically (row wise).
[`dsplit`](maxframe.tensor.dsplit.md#maxframe.tensor.dsplit)
: Split tensor into multiple sub-tensors along the 3rd axis (depth).
[`concatenate`](maxframe.tensor.concatenate.md#maxframe.tensor.concatenate)
: Join a sequence of tensors along an existing axis.
`stack`
: Join a sequence of tensors along a new axis.
`hstack`
: Stack tensors in sequence horizontally (column wise).
[`vstack`](maxframe.tensor.vstack.md#maxframe.tensor.vstack)
: Stack tensors in sequence vertically (row wise).
`dstack`
: Stack tensors in sequence depth wise (along third dimension).
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(9.0)
>>> mt.split(x, 3).execute()
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7., 8.])]
```
```pycon
>>> x = mt.arange(8.0)
>>> mt.split(x, [3, 5, 6, 10]).execute()
[array([ 0., 1., 2.]),
array([ 3., 4.]),
array([ 5.]),
array([ 6., 7.]),
array([], dtype=float64)]
```
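The index-based behavior in the second example, including the trailing empty piece, can be mimicked with plain list slicing. The helper `split_at` below is hypothetical and only models how the split points map to slices; it does not operate on tensors.

```python
def split_at(seq: list, indices: list[int]) -> list[list]:
    """Cut seq at the given sorted indices, like split with a 1-D index list."""
    bounds = [0, *indices, len(seq)]
    # Consecutive pairs of bounds define each piece; an index past the end
    # yields an empty slice rather than an error.
    return [seq[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


pieces = split_at(list(range(8)), [3, 5, 6, 10])
print(pieces)  # [[0, 1, 2], [3, 4], [5], [6, 7], []]
```

The last bound pair is `(10, 8)`, which slices to an empty list and corresponds to the empty sub-tensor in the example above.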
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.sqrt.md
# maxframe.tensor.sqrt
### maxframe.tensor.sqrt(x, out=None, where=None, \*\*kwargs)
Return the positive square-root of a tensor, element-wise.
* **Parameters:**
* **x** (*array_like*) – The values whose square-roots are required.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – A tensor of the same shape as x, containing the positive
square-root of each element in x. If any element in x is
complex, a complex tensor is returned (and the square-roots of
negative reals are calculated). If all of the elements in x
are real, so is y, with negative elements returning `nan`.
If out was provided, y is a reference to it.
* **Return type:**
Tensor
### Notes
Consistent with common convention, *sqrt* has the real “interval” [-inf, 0)
as its branch cut and is continuous from above on it.
A branch cut is a curve in the complex plane across which a given
complex function fails to be continuous.
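The discontinuity across the cut can be observed with Python's standard `cmath` module, which follows the same convention and uses the sign of a zero imaginary part to decide which side of the cut an input lies on. This sketch illustrates the branch cut itself and assumes nothing about how maxframe evaluates it.

```python
import cmath

# Same point on the negative real axis, approached from either side of the cut.
above = cmath.sqrt(complex(-4.0, 0.0))   # imaginary part +0.0: on the cut, from above
below = cmath.sqrt(complex(-4.0, -0.0))  # imaginary part -0.0: just below the cut

print(above)  # imaginary part is +2.0
print(below)  # imaginary part is -2.0
```

The jump from `+2j` to `-2j` between two inputs that compare equal is exactly the failure of continuity described above.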
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.sqrt([1,4,9]).execute()
array([ 1., 2., 3.])
```
```pycon
>>> mt.sqrt([4, -1, -3+4J]).execute()
array([ 2.+0.j, 0.+1.j, 1.+2.j])
```
```pycon
>>> mt.sqrt([4, -1, mt.inf]).execute()
array([ 2., NaN, Inf])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.square.md
# maxframe.tensor.square
### maxframe.tensor.square(x, out=None, where=None, \*\*kwargs)
Return the element-wise square of the input.
* **Parameters:**
* **x** (*array_like*) – Input data.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated array is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Element-wise x\*x, of the same shape and dtype as x.
Returns scalar if x is a scalar.
* **Return type:**
Tensor
#### SEE ALSO
[`sqrt`](maxframe.tensor.sqrt.md#maxframe.tensor.sqrt), [`power`](maxframe.tensor.power.md#maxframe.tensor.power)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.square([-1j, 1]).execute()
array([-1.-0.j, 1.+0.j])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.squeeze.md
# maxframe.tensor.squeeze
### maxframe.tensor.squeeze(a, axis=None)
Remove single-dimensional entries from the shape of a tensor.
* **Parameters:**
* **a** (*array_like*) – Input data.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Selects a subset of the single-dimensional entries in the
shape. If an axis is selected with shape entry greater than
one, an error is raised.
* **Returns:**
**squeezed** – The input tensor, but with all or a subset of the
dimensions of length 1 removed. This is always `a` itself
or a view into `a`.
* **Return type:**
Tensor
* **Raises:**
[**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError) – If axis is not None, and an axis being squeezed is not of length 1
#### SEE ALSO
[`expand_dims`](maxframe.tensor.expand_dims.md#maxframe.tensor.expand_dims)
: The inverse operation, adding singleton dimensions
[`reshape`](maxframe.tensor.reshape.md#maxframe.tensor.reshape)
: Insert, remove, and combine dimensions, and resize existing ones
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> mt.squeeze(x).shape
(3,)
>>> mt.squeeze(x, axis=0).shape
(3, 1)
>>> mt.squeeze(x, axis=1).shape
Traceback (most recent call last):
...
ValueError: cannot select an axis to squeeze out which has size not equal to one
>>> mt.squeeze(x, axis=2).shape
(1, 3)
```
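The shape rule described above can be sketched in plain Python. This only illustrates the documented behaviour and is not MaxFrame's implementation; `squeezed_shape` is a hypothetical helper that works on shape tuples rather than tensors.

```python
def squeezed_shape(shape, axis=None):
    """Return the shape left after removing length-1 dimensions.

    Hypothetical helper: it mirrors the documented rule on plain tuples.
    """
    if axis is None:
        # Drop every dimension of length 1.
        return tuple(n for n in shape if n != 1)
    axes = (axis,) if isinstance(axis, int) else tuple(axis)
    # Normalise negative axes so that -1 means the last dimension.
    axes = {ax % len(shape) for ax in axes}
    for ax in axes:
        if shape[ax] != 1:
            raise ValueError(
                "cannot select an axis to squeeze out "
                "which has size not equal to one"
            )
    return tuple(n for i, n in enumerate(shape) if i not in axes)


print(squeezed_shape((1, 3, 1)))          # every length-1 axis removed
print(squeezed_shape((1, 3, 1), axis=0))  # only the first axis removed
```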
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.std.md
# maxframe.tensor.std
### maxframe.tensor.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=None)
Compute the standard deviation along the specified axis.
Returns the standard deviation, a measure of the spread of a distribution,
of the tensor elements. The standard deviation is computed for the
flattened tensor by default, otherwise over the specified axis.
* **Parameters:**
* **a** (*array_like*) – Calculate the standard deviation of these values.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) –
Axis or axes along which the standard deviation is computed. The
default is to compute the standard deviation of the flattened tensor.
If this is a tuple of ints, a standard deviation is performed over
multiple axes, instead of a single axis or all the axes as before.
* **dtype** (*dtype* *,* *optional*) – Type to use in computing the standard deviation. For tensors of
integer type the default is float64, for tensors of float types it is
the same as the array type.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must have
the same shape as the expected output but the type (of the calculated
values) will be cast if necessary.
* **ddof** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Means Delta Degrees of Freedom. The divisor used in calculations
is `N - ddof`, where `N` represents the number of elements.
By default ddof is zero.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input tensor.
If the default value is passed, then keepdims will not be
passed through to the std method of sub-classes of
Tensor, however any non-default value will be. If the
sub-class's std method does not implement keepdims, any
exceptions will be raised.
* **Returns:**
**standard_deviation** – If out is None, return a new tensor containing the standard deviation,
otherwise return a reference to the output array.
* **Return type:**
Tensor, see dtype parameter above.
#### SEE ALSO
[`var`](maxframe.tensor.var.md#maxframe.tensor.var), [`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean), [`nanmean`](maxframe.tensor.nanmean.md#maxframe.tensor.nanmean), [`nanstd`](maxframe.tensor.nanstd.md#maxframe.tensor.nanstd), [`nanvar`](maxframe.tensor.nanvar.md#maxframe.tensor.nanvar)
### Notes
The standard deviation is the square root of the average of the squared
deviations from the mean, i.e., `std = sqrt(mean(abs(x - x.mean())**2))`.
The average squared deviation is normally calculated as
`x.sum() / N`, where `x` holds the squared deviations and `N = len(x)`. If, however, ddof is specified,
the divisor `N - ddof` is used instead. In standard statistical
practice, `ddof=1` provides an unbiased estimator of the variance
of the infinite population. `ddof=0` provides a maximum likelihood
estimate of the variance for normally distributed variables. The
standard deviation computed in this function is the square root of
the estimated variance, so even with `ddof=1`, it will not be an
unbiased estimate of the standard deviation per se.
Note that, for complex numbers, std takes the absolute
value before squaring, so that the result is always real and nonnegative.
For floating-point input, the *std* is computed using the same
precision the input has. Depending on the input data, this can cause
the results to be inaccurate, especially for float32 (see example below).
Specifying a higher-accuracy accumulator using the dtype keyword can
alleviate this issue.
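As a rough illustration of the formula and of the `N - ddof` divisor, here is a minimal pure-Python sketch. The helper name `std_sketch` is hypothetical, and the code works on a flat list rather than on a tensor.

```python
import math


def std_sketch(values, ddof=0):
    """Standard deviation of a flat sequence using the divisor N - ddof."""
    n = len(values)
    mean = sum(values) / n
    # abs() keeps the squared deviation real, also for complex input.
    squared = [abs(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared) / (n - ddof))


data = [1, 2, 3, 4]
population = std_sketch(data)      # divisor N
sample = std_sketch(data, ddof=1)  # divisor N - 1
print(population, sample)
```

With `ddof=1` the divisor shrinks from 4 to 3, so the second value is larger than the first.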
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, 2], [3, 4]])
>>> mt.std(a).execute()
1.1180339887498949
>>> mt.std(a, axis=0).execute()
array([ 1., 1.])
>>> mt.std(a, axis=1).execute()
array([ 0.5, 0.5])
```
In single precision, std() can be inaccurate:
```pycon
>>> a = mt.zeros((2, 512*512), dtype=mt.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> mt.std(a).execute()
0.45000005
```
Computing the standard deviation in float64 is more accurate:
```pycon
>>> mt.std(a, dtype=mt.float64).execute()
0.44999999925494177
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.subtract.md
# maxframe.tensor.subtract
### maxframe.tensor.subtract(x1, x2, out=None, where=None, \*\*kwargs)
Subtract arguments, element-wise.
* **Parameters:**
* **x1** (*array_like*) – The tensors to be subtracted from each other.
* **x2** (*array_like*) – The tensors to be subtracted from each other.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The difference of x1 and x2, element-wise. Returns a scalar if
both x1 and x2 are scalars.
* **Return type:**
Tensor
### Notes
Equivalent to `x1 - x2` in terms of tensor broadcasting.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.subtract(1.0, 4.0).execute()
-3.0
```
```pycon
>>> x1 = mt.arange(9.0).reshape((3, 3))
>>> x2 = mt.arange(3.0)
>>> mt.subtract(x1, x2).execute()
array([[ 0., 0., 0.],
[ 3., 3., 3.],
[ 6., 6., 6.]])
```
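The broadcasting in the last example can be reproduced with nested lists. This is a plain-Python sketch of the arithmetic only: the one-dimensional operand is reused for every row of the two-dimensional one.

```python
x1 = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
x2 = [0.0, 1.0, 2.0]

# x2 is lined up against the last axis of x1 and reused for each row.
difference = [[a - b for a, b in zip(row, x2)] for row in x1]
print(difference)
```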
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.sum.md
# maxframe.tensor.sum
### maxframe.tensor.sum(a, axis=None, dtype=None, out=None, keepdims=None)
Sum of tensor elements over a given axis.
* **Parameters:**
* **a** (*array_like*) – Elements to sum.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) –
Axis or axes along which a sum is performed. The default,
axis=None, will sum all of the elements of the input tensor. If
axis is negative it counts from the last to the first axis.
If axis is a tuple of ints, a sum is performed on all of the axes
specified in the tuple instead of a single axis or all the axes as
before.
* **dtype** (*dtype* *,* *optional*) – The type of the returned tensor and of the accumulator in which the
elements are summed. The dtype of a is used by default unless a
has an integer dtype of less precision than the default platform
integer. In that case, if a is signed then the platform integer
is used while if a is unsigned then an unsigned integer of the
same precision as the platform integer is used.
* **out** (*Tensor* *,* *optional*) – Alternative output tensor in which to place the result. It must have
the same shape as the expected output, but the type of the output
values will be cast if necessary.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input tensor.
If the default value is passed, then keepdims will not be
passed through to the sum method of sub-classes of
Tensor, however any non-default value will be. If the
sub-class's sum method does not implement keepdims, any
exceptions will be raised.
* **Returns:**
**sum_along_axis** – An array with the same shape as a, with the specified
axis removed. If a is a 0-d tensor, or if axis is None, a scalar
is returned. If an output array is specified, a reference to
out is returned.
* **Return type:**
Tensor
#### SEE ALSO
`Tensor.sum`
: Equivalent method.
[`cumsum`](maxframe.tensor.cumsum.md#maxframe.tensor.cumsum)
: Cumulative sum of tensor elements.
`trapz`
: Integration of tensor values using the composite trapezoidal rule.
[`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean), [`average`](maxframe.tensor.average.md#maxframe.tensor.average)
### Notes
Arithmetic is modular when using integer types, and no error is
raised on overflow.
The sum of an empty array is the neutral element 0:
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.sum([]).execute()
0.0
```
### Examples
```pycon
>>> mt.sum([0.5, 1.5]).execute()
2.0
>>> mt.sum([0.5, 0.7, 0.2, 1.5], dtype=mt.int32).execute()
1
>>> mt.sum([[0, 1], [0, 5]]).execute()
6
>>> mt.sum([[0, 1], [0, 5]], axis=0).execute()
array([0, 6])
>>> mt.sum([[0, 1], [0, 5]], axis=1).execute()
array([1, 5])
```
If the accumulator is too small, overflow occurs:
```pycon
>>> mt.ones(128, dtype=mt.int8).sum(dtype=mt.int8).execute()
-128
```
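The wrap-around in the overflow example follows from modular arithmetic on signed 8-bit integers. Here is a small sketch using plain Python integers, where `wrap_int8` is a hypothetical helper that emulates the int8 range:

```python
def wrap_int8(value):
    """Wrap a Python int into the signed 8-bit range [-128, 127]."""
    return (value + 128) % 256 - 128


total = 0
for _ in range(128):
    # Each partial sum is forced back into the int8 range.
    total = wrap_int8(total + 1)
print(total)
```

The first 127 additions stay in range; the 128th step exceeds 127 and wraps to the most negative value.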
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.swapaxes.md
# maxframe.tensor.swapaxes
### maxframe.tensor.swapaxes(a, axis1, axis2)
Interchange two axes of a tensor.
* **Parameters:**
* **a** (*array_like*) – Input tensor.
* **axis1** ([*int*](https://docs.python.org/3/library/functions.html#int)) – First axis.
* **axis2** ([*int*](https://docs.python.org/3/library/functions.html#int)) – Second axis.
* **Returns:**
**a_swapped** – If a is a Tensor, then a view of a is
returned; otherwise a new tensor is created.
* **Return type:**
Tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.array([[1,2,3]])
>>> mt.swapaxes(x,0,1).execute()
array([[1],
[2],
[3]])
```
```pycon
>>> x = mt.array([[[0,1],[2,3]],[[4,5],[6,7]]])
>>> x.execute()
array([[[0, 1],
[2, 3]],
[[4, 5],
[6, 7]]])
```
```pycon
>>> mt.swapaxes(x,0,2).execute()
array([[[0, 4],
[2, 6]],
[[1, 5],
[3, 7]]])
```
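Swapping axes 0 and 2 means that element `[i][j][k]` of the input becomes element `[k][j][i]` of the result. This can be checked with nested lists, as a plain-Python sketch of the index mapping only:

```python
x = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
n0, n1, n2 = len(x), len(x[0]), len(x[0][0])

# swapped[k][j][i] == x[i][j][k]
swapped = [
    [[x[i][j][k] for i in range(n0)] for j in range(n1)]
    for k in range(n2)
]
print(swapped)
```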
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.take.md
# maxframe.tensor.take
### maxframe.tensor.take(a, indices, axis=None, out=None)
Take elements from a tensor along an axis.
When axis is not None, this function does the same thing as “fancy”
indexing (indexing arrays using tensors); however, it can be easier to use
if you need elements along a given axis. A call such as
`mt.take(arr, indices, axis=3)` is equivalent to
`arr[:,:,:,indices,...]`.
Explained without fancy indexing, this is equivalent to the following use
of ndindex, which sets each of `ii`, `jj`, and `kk` to a tuple of
indices:
```default
Ni, Nk = a.shape[:axis], a.shape[axis+1:]
Nj = indices.shape
for ii in ndindex(Ni):
    for jj in ndindex(Nj):
        for kk in ndindex(Nk):
            out[ii + jj + kk] = a[ii + (indices[jj],) + kk]
```
* **Parameters:**
* **a** (*array_like* *(**Ni* *...* *,* *M* *,* *Nk* *...* *)*) – The source tensor.
* **indices** (*array_like* *(**Nj* *...* *)*) –
The indices of the values to extract.
Also allow scalars for indices.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis over which to select values. By default, the flattened
input tensor is used.
* **out** (*Tensor* *,* *optional* *(**Ni* *...* *,* *Nj* *...* *,* *Nk* *...* *)*) – If provided, the result will be placed in this tensor. It should
be of the appropriate shape and dtype.
* **mode** ( *{'raise'* *,* *'wrap'* *,* *'clip'}* *,* *optional*) –
Specifies how out-of-bounds indices will behave.
* ’raise’ – raise an error (default)
* ’wrap’ – wrap around
* ’clip’ – clip to the range
’clip’ mode means that all indices that are too large are replaced
by the index that addresses the last element along that axis. Note
that this disables indexing with negative numbers.
* **Returns:**
**out** – The returned tensor has the same type as a.
* **Return type:**
Tensor (Ni…, Nj…, Nk…)
#### SEE ALSO
[`compress`](maxframe.tensor.compress.md#maxframe.tensor.compress)
: Take elements using a boolean mask
`Tensor.take`
: equivalent method
### Notes
By eliminating the inner loop in the description above, and using s_ to
build simple slice objects, take can be expressed in terms of applying
fancy indexing to each 1-d slice:
```default
Ni, Nk = a.shape[:axis], a.shape[axis+1:]
for ii in ndindex(Ni):
    for kk in ndindex(Nk):
        out[ii + s_[...,] + kk] = a[ii + s_[:,] + kk][indices]
```
For this reason, it is equivalent to (but faster than) the following use
of apply_along_axis:
```default
out = mt.apply_along_axis(lambda a_1d: a_1d[indices], axis, a)
```
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> a = [4, 3, 5, 7, 6, 8]
>>> indices = [0, 1, 4]
>>> mt.take(a, indices).execute()
array([4, 3, 6])
```
In this example if a is a tensor, “fancy” indexing can be used.
```pycon
>>> a = mt.array(a)
>>> a[indices].execute()
array([4, 3, 6])
```
If indices is not one dimensional, the output also has these dimensions.
```pycon
>>> mt.take(a, [[0, 1], [2, 3]]).execute()
array([[4, 3],
[5, 7]])
```
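The three out-of-bounds modes listed under **mode** can be sketched for the one-dimensional case. The helper `take_1d` is hypothetical and only mirrors the semantics described in the parameter list. The signature at the top of this page does not list `mode`, so whether `mt.take` accepts it is not confirmed here.

```python
def take_1d(values, indices, mode="raise"):
    """Pick values[i] for each i, handling out-of-range i according to mode."""
    n = len(values)
    out = []
    for i in indices:
        if mode == "wrap":
            i %= n                     # wrap around in both directions
        elif mode == "clip":
            i = min(max(i, 0), n - 1)  # negative indices are clipped to 0
        elif not -n <= i < n:
            raise IndexError(f"index {i} is out of bounds for size {n}")
        out.append(values[i])
    return out


a = [4, 3, 5, 7, 6, 8]
print(take_1d(a, [0, 1, 4]))
print(take_1d(a, [6, -1], mode="wrap"))
print(take_1d(a, [10, -3], mode="clip"))
```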
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.tan.md
# maxframe.tensor.tan
### maxframe.tensor.tan(x, out=None, where=None, \*\*kwargs)
Compute tangent element-wise.
Equivalent to `mt.sin(x)/mt.cos(x)` element-wise.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The corresponding tangent values.
* **Return type:**
Tensor
### Notes
If out is provided, the function writes the result into it,
and returns a reference to out. (See Examples)
### References
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions.
New York, NY: Dover, 1972.
### Examples
```pycon
>>> from math import pi
>>> import maxframe.tensor as mt
>>> mt.tan(mt.array([-pi,pi/2,pi])).execute()
array([ 1.22460635e-16, 1.63317787e+16, -1.22460635e-16])
>>>
>>> # Example of providing the optional output parameter illustrating
>>> # that what is returned is a reference to said parameter
>>> out1 = mt.zeros(1)
>>> out2 = mt.tan([0.1], out1)
>>> out2 is out1
True
>>>
>>> # Example of ValueError due to provision of shape mis-matched `out`
>>> mt.tan(mt.zeros((3,3)),mt.zeros((2,2)))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid return array shape
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.tanh.md
# maxframe.tensor.tanh
### maxframe.tensor.tanh(x, out=None, where=None, \*\*kwargs)
Compute hyperbolic tangent element-wise.
Equivalent to `mt.sinh(x)/mt.cosh(x)` or `-1j * mt.tan(1j*x)`.
* **Parameters:**
* **x** (*array_like*) – Input tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The corresponding hyperbolic tangent values.
* **Return type:**
Tensor
### Notes
If out is provided, the function writes the result into it,
and returns a reference to out. (See Examples)
### References
* <a id='id1'>**[1]**</a> M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York, NY: Dover, 1972, pg. 83. [http://www.math.sfu.ca/~cbm/aands/](http://www.math.sfu.ca/~cbm/aands/)
* <a id='id2'>**[2]**</a> Wikipedia, “Hyperbolic function”, [http://en.wikipedia.org/wiki/Hyperbolic_function](http://en.wikipedia.org/wiki/Hyperbolic_function)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.tanh((0, mt.pi*1j, mt.pi*1j/2)).execute()
array([ 0. +0.00000000e+00j, 0. -1.22460635e-16j, 0. +1.63317787e+16j])
```
```pycon
>>> # Example of providing the optional output parameter illustrating
>>> # that what is returned is a reference to said parameter
>>> out1 = mt.zeros(1)
>>> out2 = mt.tanh([0.1], out1)
>>> out2 is out1
True
```
```pycon
>>> # Example of ValueError due to provision of shape mis-matched `out`
>>> mt.tanh(mt.zeros((3,3)),mt.zeros((2,2)))
Traceback (most recent call last):
...
ValueError: operators could not be broadcast together with shapes (3,3) (2,2)
```
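The two equivalent forms given in the description can be checked numerically for a scalar with the standard library. This sketch uses `math` and `cmath` rather than MaxFrame:

```python
import cmath
import math

x = 0.5
via_ratio = math.sinh(x) / math.cosh(x)  # sinh(x) / cosh(x)
via_tan = -1j * cmath.tan(1j * x)        # -1j * tan(1j * x), a complex number

print(via_ratio, via_tan, math.tanh(x))
```

Both expressions agree with `math.tanh(x)` up to floating-point rounding. The second one is returned as a complex number whose imaginary part is zero or negligibly small.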
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.tensor.md
# maxframe.tensor.tensor
### maxframe.tensor.tensor(data=None, dtype=None, order='K', chunk_size=None, gpu=None, sparse=False) → Tensor
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.tensordot.md
# maxframe.tensor.tensordot
### maxframe.tensor.tensordot(a, b, axes=2, sparse=None)
Compute tensor dot product along specified axes for tensors >= 1-D.
Given two tensors (arrays of dimension greater than or equal to one),
a and b, and an array_like object containing two array_like
objects, `(a_axes, b_axes)`, sum the products of a’s and b’s
elements (components) over the axes specified by `a_axes` and
`b_axes`. The third argument can be a single non-negative
integer_like scalar, `N`; if it is such, then the last `N`
dimensions of a and the first `N` dimensions of b are summed
over.
* **Parameters:**
* **a** (*array_like* *,* *len* *(**shape* *)* *>= 1*) – Tensors to “dot”.
* **b** (*array_like* *,* *len* *(**shape* *)* *>= 1*) – Tensors to “dot”.
* **axes** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *(**2* *,* *)* *array_like*) –
* integer_like
If an int N, sum over the last N axes of a and the first N axes
of b in order. The sizes of the corresponding axes must match.
* (2,) array_like
Or, a list of axes to be summed over, first sequence applying to a,
second to b. Both elements array_like must be of the same length.
#### SEE ALSO
[`dot`](maxframe.tensor.dot.md#maxframe.tensor.dot), [`einsum`](maxframe.tensor.einsum.md#maxframe.tensor.einsum)
### Notes
Three common use cases are:
> * `axes = 0` : tensor product $a\otimes b$
> * `axes = 1` : tensor dot product $a\cdot b$
> * `axes = 2` : (default) tensor double contraction $a:b$
When axes is integer_like, the sequence for evaluation will be: first
the -Nth axis in a and 0th axis in b, and the -1th axis in a and
Nth axis in b last.
When there is more than one axis to sum over - and they are not the last
(first) axes of a (b) - the argument axes should consist of
two sequences of the same length, with the first axis to sum over given
first in both sequences, the second axis second, and so forth.
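For two small 2-D inputs, the three common cases can be written as explicit sums. The sketch below uses nested Python lists; it illustrates which index pairs are contracted and is not how MaxFrame evaluates the operation.

```python
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
n = 2

# axes=0: tensor product, no axis is summed (result is 4-dimensional).
outer = [[[[a[i][j] * b[k][l] for l in range(n)] for k in range(n)]
          for j in range(n)] for i in range(n)]

# axes=1: last axis of a against first axis of b (matrix product).
dot = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
       for i in range(n)]

# axes=2: last two axes of a against first two axes of b (a scalar here).
double = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))

print(dot, double)
```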
### Examples
```pycon
>>> import maxframe.tensor as mt
```
A “traditional” example:
```pycon
>>> a = mt.arange(60.).reshape(3,4,5)
>>> b = mt.arange(24.).reshape(4,3,2)
>>> c = mt.tensordot(a,b, axes=([1,0],[0,1]))
>>> c.shape
(5, 2)
```
```pycon
>>> r = c.execute()
>>> r
array([[ 4400., 4730.],
[ 4532., 4874.],
[ 4664., 5018.],
[ 4796., 5162.],
[ 4928., 5306.]])
```
```pycon
>>> # A slower but equivalent way of computing the same...
>>> import numpy as np
>>> ra = np.arange(60.).reshape(3,4,5)
>>> rb = np.arange(24.).reshape(4,3,2)
>>> d = np.zeros((5,2))
>>> for i in range(5):
...     for j in range(2):
...         for k in range(3):
...             for n in range(4):
...                 d[i,j] += ra[k,n,i] * rb[n,k,j]
>>> r == d
array([[ True, True],
[ True, True],
[ True, True],
[ True, True],
[ True, True]], dtype=bool)
```
An extended example taking advantage of the overloading of + and \*:
```pycon
>>> a = mt.array(range(1, 9))
>>> a.shape = (2, 2, 2)
>>> A = mt.array(('a', 'b', 'c', 'd'), dtype=object)
>>> A.shape = (2, 2)
>>> a.execute(); A.execute()
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
array([[a, b],
[c, d]], dtype=object)
```
```pycon
>>> mt.tensordot(a, A).execute() # third argument default is 2 for double-contraction
array([abbcccdddd, aaaaabbbbbbcccccccdddddddd], dtype=object)
```
```pycon
>>> mt.tensordot(a, A, 1).execute()
array([[[acc, bdd],
[aaacccc, bbbdddd]],
[[aaaaacccccc, bbbbbdddddd],
[aaaaaaacccccccc, bbbbbbbdddddddd]]], dtype=object)
```
```pycon
>>> mt.tensordot(a, A, 0).execute() # tensor product (result too long to incl.)
array([[[[[a, b],
[c, d]],
...
```
```pycon
>>> mt.tensordot(a, A, (0, 1)).execute()
array([[[abbbbb, cddddd],
[aabbbbbb, ccdddddd]],
[[aaabbbbbbb, cccddddddd],
[aaaabbbbbbbb, ccccdddddddd]]], dtype=object)
```
```pycon
>>> mt.tensordot(a, A, (2, 1)).execute()
array([[[abb, cdd],
[aaabbbb, cccdddd]],
[[aaaaabbbbbb, cccccdddddd],
[aaaaaaabbbbbbbb, cccccccdddddddd]]], dtype=object)
```
```pycon
>>> mt.tensordot(a, A, ((0, 1), (0, 1))).execute()
array([abbbcccccddddddd, aabbbbccccccdddddddd], dtype=object)
```
```pycon
>>> mt.tensordot(a, A, ((2, 1), (1, 0))).execute()
array([acccbbdddd, aaaaacccccccbbbbbbdddddddd], dtype=object)
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.tile.md
# maxframe.tensor.tile
### maxframe.tensor.tile(A, reps)
Construct a tensor by repeating A the number of times given by reps.
If reps has length `d`, the result will have dimension of
`max(d, A.ndim)`.
If `A.ndim < d`, A is promoted to be d-dimensional by prepending new
axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication,
or shape (1, 1, 3) for 3-D replication. If this is not the desired
behavior, promote A to d-dimensions manually before calling this
function.
If `A.ndim > d`, reps is promoted to A.ndim by prepending 1’s to it.
Thus for an A of shape (2, 3, 4, 5), a reps of (2, 2) is treated as
(1, 1, 2, 2).
Note : Although tile may be used for broadcasting, it is strongly
recommended to use MaxFrame’s broadcasting operations and functions.
* **Parameters:**
* **A** (*array_like*) – The input tensor.
* **reps** (*array_like*) – The number of repetitions of A along each axis.
* **Returns:**
**c** – The tiled output tensor.
* **Return type:**
Tensor
#### SEE ALSO
[`repeat`](maxframe.tensor.repeat.md#maxframe.tensor.repeat)
: Repeat elements of a tensor.
[`broadcast_to`](maxframe.tensor.broadcast_to.md#maxframe.tensor.broadcast_to)
: Broadcast a tensor to a new shape
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([0, 1, 2])
>>> mt.tile(a, 2).execute()
array([0, 1, 2, 0, 1, 2])
>>> mt.tile(a, (2, 2)).execute()
array([[0, 1, 2, 0, 1, 2],
[0, 1, 2, 0, 1, 2]])
>>> mt.tile(a, (2, 1, 2)).execute()
array([[[0, 1, 2, 0, 1, 2]],
[[0, 1, 2, 0, 1, 2]]])
```
```pycon
>>> b = mt.array([[1, 2], [3, 4]])
>>> mt.tile(b, 2).execute()
array([[1, 2, 1, 2],
[3, 4, 3, 4]])
>>> mt.tile(b, (2, 1)).execute()
array([[1, 2],
[3, 4],
[1, 2],
[3, 4]])
```
```pycon
>>> c = mt.array([1,2,3,4])
>>> mt.tile(c,(4,1)).execute()
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
```
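The promotion rules for `A` and `reps` determine the output shape. A minimal sketch of that rule on shape tuples follows; `tiled_shape` is a hypothetical helper, not part of MaxFrame.

```python
def tiled_shape(shape, reps):
    """Shape produced by tiling an input of the given shape with reps."""
    reps = (reps,) if isinstance(reps, int) else tuple(reps)
    d = max(len(shape), len(reps))
    # Whichever is shorter is padded on the left with ones.
    shape = (1,) * (d - len(shape)) + tuple(shape)
    reps = (1,) * (d - len(reps)) + reps
    return tuple(s * r for s, r in zip(shape, reps))


print(tiled_shape((3,), (2, 1, 2)))     # A promoted to shape (1, 1, 3)
print(tiled_shape((2, 3, 4, 5), (2, 2)))  # reps promoted to (1, 1, 2, 2)
```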
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.transpose.md
# maxframe.tensor.transpose
### maxframe.tensor.transpose(a, axes=None)
Returns an array with axes transposed.
For a 1-D array, this returns an unchanged view of the original array, as a
transposed vector is simply the same vector.
To convert a 1-D array into a 2-D column vector, an additional dimension
must be added, e.g., `mt.atleast_2d(a).T` achieves this, as does
`a[:, mt.newaxis]`.
For a 2-D array, this is the standard matrix transpose.
For an n-D array, if axes are given, their order indicates how the
axes are permuted (see Examples). If axes are not provided, then
`transpose(a).shape == a.shape[::-1]`.
* **Parameters:**
* **a** (*array_like*) – Input array.
* **axes** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *or* [*list*](https://docs.python.org/3/library/stdtypes.html#list) *of* *ints* *,* *optional*) – If specified, it must be a tuple or list which contains a permutation
of [0,1,…,N-1] where N is the number of axes of a. The i’th axis
of the returned array will correspond to the axis numbered `axes[i]`
of the input. If not specified, defaults to `range(a.ndim)[::-1]`,
which reverses the order of the axes.
* **Returns:**
**p** – a with its axes permuted. A view is returned whenever possible.
* **Return type:**
Tensor
### Notes
Use `transpose(a, argsort(axes))` to invert the transposition of tensors
when using the axes keyword argument.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(4).reshape((2,2))
>>> x.execute()
array([[0, 1],
[2, 3]])
```
```pycon
>>> mt.transpose(x).execute()
array([[0, 2],
[1, 3]])
```
```pycon
>>> x = mt.ones((1, 2, 3))
>>> mt.transpose(x, (1, 0, 2)).shape
(2, 1, 3)
```
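The effect of `axes` on the shape, and the use of `argsort(axes)` to undo a transposition, can be sketched on shape tuples. Both helper names below are hypothetical.

```python
def transposed_shape(shape, axes=None):
    """Shape after permuting axes; the default reverses the axis order."""
    if axes is None:
        axes = range(len(shape))[::-1]
    # Axis i of the result corresponds to axis axes[i] of the input.
    return tuple(shape[ax] for ax in axes)


def inverse_axes(axes):
    """Pure-Python argsort: the permutation that undoes `axes`."""
    return tuple(sorted(range(len(axes)), key=axes.__getitem__))


axes = (2, 0, 1)
forward = transposed_shape((1, 2, 3), axes)
restored = transposed_shape(forward, inverse_axes(axes))
print(forward, inverse_axes(axes), restored)
```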
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.trapezoid.md
# maxframe.tensor.trapezoid
### maxframe.tensor.trapezoid(y, x=None, dx=1.0, axis=-1)
Integrate along the given axis using the composite trapezoidal rule.
Integrate y (x) along given axis.
* **Parameters:**
* **y** (*array_like*) – Input tensor to integrate.
* **x** (*array_like* *,* *optional*) – The sample points corresponding to the y values. If x is None,
the sample points are assumed to be evenly spaced dx apart. The
default is None.
* **dx** (*scalar* *,* *optional*) – The spacing between sample points when x is None. The default is 1.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – The axis along which to integrate.
* **Returns:**
**trapezoid** – Definite integral as approximated by trapezoidal rule.
* **Return type:**
[float](https://docs.python.org/3/library/functions.html#float)
#### SEE ALSO
[`sum`](maxframe.tensor.sum.md#maxframe.tensor.sum), [`cumsum`](maxframe.tensor.cumsum.md#maxframe.tensor.cumsum)
### Notes
Image <sup>[2](#id3)</sup> illustrates trapezoidal rule – y-axis locations of points
will be taken from y tensor, by default x-axis distances between
points will be 1.0, alternatively they can be provided with x tensor
or with dx scalar. Return value will be equal to combined area under
the red lines.
### References
* <a id='id2'>**[1]**</a> Wikipedia page: [https://en.wikipedia.org/wiki/Trapezoidal_rule](https://en.wikipedia.org/wiki/Trapezoidal_rule)
* <a id='id3'>**[2]**</a> Illustration image: [https://en.wikipedia.org/wiki/File:Composite_trapezoidal_rule_illustration.png](https://en.wikipedia.org/wiki/File:Composite_trapezoidal_rule_illustration.png)
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.trapezoid([1,2,3]).execute()
4.0
>>> mt.trapezoid([1,2,3], x=[4,6,8]).execute()
8.0
>>> mt.trapezoid([1,2,3], dx=2).execute()
8.0
>>> a = mt.arange(6).reshape(2, 3)
>>> a.execute()
array([[0, 1, 2],
[3, 4, 5]])
>>> mt.trapezoid(a, axis=0).execute()
array([1.5, 2.5, 3.5])
>>> mt.trapezoid(a, axis=1).execute()
array([2., 8.])
```
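For one-dimensional input, the composite trapezoidal rule reduces to a sum of trapezoid areas. The sketch below is a plain-Python version of that sum; `trapezoid_1d` is a hypothetical name and the function handles flat sequences only.

```python
def trapezoid_1d(y, x=None, dx=1.0):
    """Sum of trapezoid areas between consecutive sample points."""
    if x is None:
        widths = [dx] * (len(y) - 1)  # evenly spaced samples
    else:
        widths = [x[i + 1] - x[i] for i in range(len(y) - 1)]
    return sum(w * (y[i] + y[i + 1]) / 2 for i, w in enumerate(widths))


print(trapezoid_1d([1, 2, 3]))
print(trapezoid_1d([1, 2, 3], x=[4, 6, 8]))
print(trapezoid_1d([1, 2, 3], dx=2))
```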
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.tril.md
# maxframe.tensor.tril
### maxframe.tensor.tril(m, k=0, gpu=None)
Lower triangle of a tensor.
Return a copy of a tensor with elements above the k-th diagonal zeroed.
* **Parameters:**
* **m** (*array_like* *,* *shape* *(**M* *,* *N* *)*) – Input tensor.
* **k** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – Diagonal above which to zero elements. k = 0 (the default) is the
main diagonal, k < 0 is below it and k > 0 is above.
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, None as default
* **Returns:**
**tril** – Lower triangle of m, of same shape and data-type as m.
* **Return type:**
Tensor, shape (M, N)
#### SEE ALSO
[`triu`](maxframe.tensor.triu.md#maxframe.tensor.triu)
: same thing, only for the upper triangle
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.tril([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], -1).execute()
array([[ 0, 0, 0],
[ 4, 0, 0],
[ 7, 8, 0],
[10, 11, 12]])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.triu.md
# maxframe.tensor.triu
### maxframe.tensor.triu(m, k=0, gpu=None)
Upper triangle of a tensor.
Return a copy of a matrix with the elements below the k-th diagonal
zeroed.
Please refer to the documentation for tril for further details.
#### SEE ALSO
[`tril`](maxframe.tensor.tril.md#maxframe.tensor.tril)
: lower triangle of a tensor
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.triu([[1,2,3],[4,5,6],[7,8,9],[10,11,12]], -1).execute()
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 0, 8, 9],
[ 0, 0, 12]])
```
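Both `tril` and `triu` select elements by comparing the column offset `j - i` with `k`. Here is a minimal sketch on nested lists, where `tril_sketch` and `triu_sketch` are hypothetical helpers:

```python
def tril_sketch(m, k=0):
    """Keep elements on or below the k-th diagonal, zero the rest."""
    return [[v if j - i <= k else 0 for j, v in enumerate(row)]
            for i, row in enumerate(m)]


def triu_sketch(m, k=0):
    """Keep elements on or above the k-th diagonal, zero the rest."""
    return [[v if j - i >= k else 0 for j, v in enumerate(row)]
            for i, row in enumerate(m)]


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(tril_sketch(m, -1))
print(triu_sketch(m, -1))
```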
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.true_divide.md
# maxframe.tensor.true_divide
### maxframe.tensor.true_divide(x1, x2, out=None, where=None, \*\*kwargs)
Returns a true division of the inputs, element-wise.
Instead of the Python traditional ‘floor division’, this returns a true
division. True division adjusts the output type to present the best
answer, regardless of input types.
* **Parameters:**
* **x1** (*array_like*) – Dividend tensor.
* **x2** (*array_like*) – Divisor tensor.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position, values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**out** – Result is scalar if both inputs are scalar, tensor otherwise.
* **Return type:**
Tensor
### Notes
The floor division operator `//` was added in Python 2.2 making
`//` and `/` equivalent operators. The default floor division
operation of `/` can be replaced by true division with `from
__future__ import division`.
In Python 3.0, `//` is the floor division operator and `/` the
true division operator. The `true_divide(x1, x2)` function is
equivalent to true division in Python.
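The difference between the two operators in Python 3 can be seen with plain integers. The sketch uses built-in arithmetic only, no MaxFrame calls.

```python
values = [0, 1, 2, 3, 4]

true_quotients = [v / 4 for v in values]    # always floats
floor_quotients = [v // 4 for v in values]  # rounded toward negative infinity

print(true_quotients)
print(floor_quotients)
print(-7 / 2, -7 // 2)
```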
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(5)
>>> mt.true_divide(x, 4).execute()
array([ 0. , 0.25, 0.5 , 0.75, 1. ])
>>> # for python 2
>>> (x/4).execute()
array([0, 0, 0, 0, 1])
>>> (x//4).execute()
array([0, 0, 0, 0, 1])
```
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.trunc.md
# maxframe.tensor.trunc
### maxframe.tensor.trunc(x, out=None, where=None, \*\*kwargs)
Return the truncated value of the input, element-wise.
The truncated value of the scalar x is the nearest integer i which
is closer to zero than x is. In short, the fractional part of the
signed number x is discarded.
* **Parameters:**
* **x** (*array_like*) – Input data.
* **out** (*Tensor* *,* *None* *, or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *Tensor and None* *,* *optional*) – A location into which the result is stored. If provided, it must have
a shape that the inputs broadcast to. If not provided or None,
a freshly-allocated tensor is returned. A tuple (possible only as a
keyword argument) must have length equal to the number of outputs.
* **where** (*array_like* *,* *optional*) – Values of True indicate to calculate the ufunc at that position; values
of False indicate to leave the value in the output alone.
* **\*\*kwargs**
* **Returns:**
**y** – The truncated value of each element in x.
* **Return type:**
Tensor or scalar
#### SEE ALSO
[`ceil`](maxframe.tensor.ceil.md#maxframe.tensor.ceil), [`floor`](maxframe.tensor.floor.md#maxframe.tensor.floor), [`rint`](maxframe.tensor.rint.md#maxframe.tensor.rint)
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
>>> mt.trunc(a).execute()
array([-1., -1., -0., 0., 1., 1., 2.])
```
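To see how truncation differs from `floor` and `ceil` for negative inputs, here is a small sketch that uses the Python standard library instead of MaxFrame. It assumes that scalar `math.trunc` follows the same "discard the fractional part" rule described above; the sample values are arbitrary.

```python
import math

samples = [-1.7, -0.2, 0.2, 1.7]

# trunc drops the fractional part, so it rounds towards zero.
truncated = [math.trunc(v) for v in samples]
# floor rounds towards negative infinity, ceil towards positive infinity.
floored = [math.floor(v) for v in samples]
ceiled = [math.ceil(v) for v in samples]

print(truncated)  # [-1, 0, 0, 1]
print(floored)    # [-2, -1, 0, 1]
print(ceiled)     # [-1, 0, 1, 2]
```

Truncation agrees with `ceil` for negative inputs and with `floor` for positive ones.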
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.unique.md
# maxframe.tensor.unique
### maxframe.tensor.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None, method='auto', aggregate_size=None, sort=True, use_na_sentinel=False, na_position=None)
Find the unique elements of a tensor.
Returns the sorted unique elements of a tensor. There are three optional
outputs in addition to the unique elements:
* the indices of the input tensor that give the unique values
* the indices of the unique tensor that reconstruct the input tensor
* the number of times each unique value comes up in the input tensor
* **Parameters:**
* **ar** (*array_like*) – Input tensor. Unless axis is specified, this will be flattened if it
is not already 1-D.
* **return_index** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, also return the indices of ar (along the specified axis,
if provided, or in the flattened tensor) that result in the unique tensor.
* **return_inverse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, also return the indices of the unique tensor (for the specified
axis, if provided) that can be used to reconstruct ar.
* **return_counts** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – If True, also return the number of times each unique item appears
in ar.
* **axis** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *None* *,* *optional*) – The axis to operate on. If None, ar will be flattened. If an integer,
the subarrays indexed by the given axis will be flattened and treated
as the elements of a 1-D tensor with the dimension of the given axis,
see the notes for more details. Object tensors or structured tensors
that contain objects are not supported if the axis kwarg is used. The
default is None.
* **Returns:**
* **unique** (*Tensor*) – The sorted unique values.
* **unique_indices** (*Tensor, optional*) – The indices of the first occurrences of the unique values in the
original tensor. Only provided if return_index is True.
* **unique_inverse** (*Tensor, optional*) – The indices to reconstruct the original tensor from the
unique tensor. Only provided if return_inverse is True.
* **unique_counts** (*Tensor, optional*) – The number of times each of the unique values comes up in the
original tensor. Only provided if return_counts is True.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.unique([1, 1, 2, 2, 3, 3]).execute()
array([1, 2, 3])
>>> a = mt.array([[1, 1], [2, 3]])
>>> mt.unique(a).execute()
array([1, 2, 3])
```
Return the unique rows of a 2-D tensor:
```pycon
>>> a = mt.array([[1, 0, 0], [1, 0, 0], [2, 3, 4]])
>>> mt.unique(a, axis=0).execute()
array([[1, 0, 0], [2, 3, 4]])
```
Return the indices of the original tensor that give the unique values:
```pycon
>>> a = mt.array(['a', 'b', 'b', 'c', 'a'])
>>> u, indices = mt.unique(a, return_index=True)
>>> u.execute()
array(['a', 'b', 'c'],
dtype='|S1')
>>> indices.execute()
array([0, 1, 3])
>>> a[indices].execute()
array(['a', 'b', 'c'],
dtype='|S1')
```
Reconstruct the input array from the unique values:
```pycon
>>> a = mt.array([1, 2, 6, 4, 2, 3, 2])
>>> u, indices = mt.unique(a, return_inverse=True)
>>> u.execute()
array([1, 2, 3, 4, 6])
>>> indices.execute()
array([0, 1, 4, 3, 1, 2, 1])
>>> u[indices].execute()
array([1, 2, 6, 4, 2, 3, 2])
```
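The relationship between the three optional outputs can be sketched in plain Python for the 1-D case. The helper `unique_with_extras` below is hypothetical and is not how MaxFrame computes its result; it only illustrates what each output means.

```python
def unique_with_extras(values):
    """Return (unique, first_index, inverse, counts) for a 1-D list."""
    uniques = sorted(set(values))
    position = {v: i for i, v in enumerate(uniques)}
    # Index of the first occurrence of each unique value in the input.
    first_index = [values.index(v) for v in uniques]
    # For each input element, its position in the unique list.
    inverse = [position[v] for v in values]
    # How often each unique value appears in the input.
    counts = [values.count(v) for v in uniques]
    return uniques, first_index, inverse, counts


data = [1, 2, 6, 4, 2, 3, 2]
u, idx, inv, cnt = unique_with_extras(data)

print(u)    # [1, 2, 3, 4, 6]
print(idx)  # [0, 1, 5, 3, 2]
print(inv)  # [0, 1, 4, 3, 1, 2, 1]
print(cnt)  # [1, 3, 1, 1, 1]

# Indexing the unique values with the inverse rebuilds the input.
rebuilt = [u[i] for i in inv]
```

Here `rebuilt` equals `data`, which mirrors the `u[indices]` reconstruction shown in the last example above.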
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.unravel_index.md
# maxframe.tensor.unravel_index
### maxframe.tensor.unravel_index(indices, dims, order='C')
Converts a flat index or tensor of flat indices into a tuple
of coordinate tensors.
* **Parameters:**
* **indices** (*array_like*) – An integer tensor whose elements are indices into the flattened
version of a tensor of dimensions `dims`.
* **dims** ([*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints*) – The shape of the tensor to use for unraveling `indices`.
* **order** ( *{'C'* *,* *'F'}* *,* *optional*) – Determines whether the indices should be viewed as indexing in
row-major (C-style) or column-major (Fortran-style) order.
* **Returns:**
**unraveled_coords** – Each tensor in the tuple has the same shape as the `indices`
tensor.
* **Return type:**
[tuple](https://docs.python.org/3/library/stdtypes.html#tuple) of Tensor
#### SEE ALSO
`ravel_multi_index`
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.unravel_index([22, 41, 37], (7,6)).execute()
(array([3, 6, 6]), array([4, 5, 1]))
```
```pycon
>>> mt.unravel_index(1621, (6,7,8,9)).execute()
(3, 1, 4, 1)
```
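The arithmetic behind unraveling a single flat index can be written as repeated `divmod` calls. The function `unravel` below is a hypothetical plain-Python sketch for one scalar index, not the MaxFrame implementation.

```python
def unravel(flat_index, dims, order="C"):
    """Convert one flat index into a coordinate tuple for shape `dims`."""
    coords = []
    # In C order the last axis varies fastest; in F order the first does.
    sizes = reversed(dims) if order == "C" else dims
    for size in sizes:
        flat_index, remainder = divmod(flat_index, size)
        coords.append(remainder)
    return tuple(reversed(coords)) if order == "C" else tuple(coords)


print(unravel(22, (7, 6)))               # (3, 4)
print(unravel(1621, (6, 7, 8, 9)))       # (3, 1, 4, 1)
print(unravel(22, (7, 6), order="F"))    # (1, 3)
```

The first two results agree with the documented outputs above: flat index 22 in a `(7, 6)` tensor is row 3, column 4, because `3 * 6 + 4 == 22`.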
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.var.md
# maxframe.tensor.var
### maxframe.tensor.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=None)
Compute the variance along the specified axis.
Returns the variance of the tensor elements, a measure of the spread of a
distribution. The variance is computed for the flattened tensor by
default, otherwise over the specified axis.
* **Parameters:**
* **a** (*array_like*) – Tensor containing numbers whose variance is desired. If a is not a
tensor, a conversion is attempted.
* **axis** (*None* *or* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) –
Axis or axes along which the variance is computed. The default is to
compute the variance of the flattened array.
If this is a tuple of ints, a variance is performed over multiple axes,
instead of a single axis or all the axes as before.
* **dtype** (*data-type* *,* *optional*) – Type to use in computing the variance. For arrays of integer type
the default is float32; for tensors of float types it is the same as
the tensor type.
* **out** (*Tensor* *,* *optional*) – Alternate output array in which to place the result. It must have
the same shape as the expected output, but the type is cast if
necessary.
* **ddof** ([*int*](https://docs.python.org/3/library/functions.html#int) *,* *optional*) – “Delta Degrees of Freedom”: the divisor used in the calculation is
`N - ddof`, where `N` represents the number of elements. By
default ddof is zero.
* **keepdims** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) –
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input tensor.
If the default value is passed, then keepdims will not be
passed through to the var method of sub-classes of
Tensor; however, any non-default value will be. If the
sub-class's var method does not implement keepdims, an
exception will be raised.
* **Returns:**
**variance** – If `out=None`, returns a new tensor containing the variance;
otherwise, a reference to the output tensor is returned.
* **Return type:**
Tensor, see dtype parameter above
#### SEE ALSO
[`std`](maxframe.tensor.std.md#maxframe.tensor.std), [`mean`](maxframe.tensor.mean.md#maxframe.tensor.mean), [`nanmean`](maxframe.tensor.nanmean.md#maxframe.tensor.nanmean), [`nanstd`](maxframe.tensor.nanstd.md#maxframe.tensor.nanstd), [`nanvar`](maxframe.tensor.nanvar.md#maxframe.tensor.nanvar)
### Notes
The variance is the average of the squared deviations from the mean,
i.e., `var = mean(abs(x - x.mean())**2)`.
The mean is normally calculated as `x.sum() / N`, where `N = len(x)`.
If, however, ddof is specified, the divisor `N - ddof` is used
instead. In standard statistical practice, `ddof=1` provides an
unbiased estimator of the variance of a hypothetical infinite population.
`ddof=0` provides a maximum likelihood estimate of the variance for
normally distributed variables.
Note that for complex numbers, the absolute value is taken before
squaring, so that the result is always real and nonnegative.
For floating-point input, the variance is computed using the same
precision the input has. Depending on the input data, this can cause
the results to be inaccurate, especially for float32 (see example
below). Specifying a higher-accuracy accumulator using the `dtype`
keyword can alleviate this issue.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([[1, 2], [3, 4]])
>>> mt.var(a).execute()
1.25
>>> mt.var(a, axis=0).execute()
array([ 1., 1.])
>>> mt.var(a, axis=1).execute()
array([ 0.25, 0.25])
```
In single precision, var() can be inaccurate:
```pycon
>>> a = mt.zeros((2, 512*512), dtype=mt.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> mt.var(a).execute()
0.20250003
```
Computing the variance in float64 is more accurate:
```pycon
>>> mt.var(a, dtype=mt.float64).execute()
0.20249999932944759
>>> ((1-0.55)**2 + (0.1-0.55)**2)/2
0.2025
```
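The effect of `ddof` described in the Notes can be illustrated with a short plain-Python sketch. The helper `variance` is hypothetical and works on flat lists only; it follows the formula `mean(abs(x - x.mean())**2)` with the divisor `N - ddof`.

```python
def variance(values, ddof=0):
    """Variance of a flat list, dividing by N - ddof."""
    n = len(values)
    mean = sum(values) / n
    # abs() makes the result real and nonnegative for complex input too.
    return sum(abs(v - mean) ** 2 for v in values) / (n - ddof)


data = [1, 2, 3, 4]
population = variance(data)        # divisor N = 4
sample = variance(data, ddof=1)    # divisor N - 1 = 3
complex_var = variance([1 + 1j, 1 - 1j])

print(population)   # 1.25
print(sample)       # roughly 1.667
print(complex_var)  # 1.0
```

The value 1.25 matches `mt.var(a)` for `a = [[1, 2], [3, 4]]` in the first example, since the default is to compute over the flattened tensor.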
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.vdot.md
# maxframe.tensor.vdot
### maxframe.tensor.vdot(a, b)
Return the dot product of two vectors.
The vdot(a, b) function handles complex numbers differently than
dot(a, b). If the first argument is complex the complex conjugate
of the first argument is used for the calculation of the dot product.
Note that vdot handles multidimensional tensors differently than dot:
it does *not* perform a matrix product, but flattens input arguments
to 1-D vectors first. Consequently, it should only be used for vectors.
* **Parameters:**
* **a** (*array_like*) – If a is complex the complex conjugate is taken before calculation
of the dot product.
* **b** (*array_like*) – Second argument to the dot product.
* **Returns:**
**output** – Dot product of a and b. Can be an int, float, or
complex depending on the types of a and b.
* **Return type:**
Tensor
#### SEE ALSO
[`dot`](maxframe.tensor.dot.md#maxframe.tensor.dot)
: Return the dot product without using the complex conjugate of the first argument.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([1+2j,3+4j])
>>> b = mt.array([5+6j,7+8j])
>>> mt.vdot(a, b).execute()
(70-8j)
>>> mt.vdot(b, a).execute()
(70+8j)
```
Note that higher-dimensional arrays are flattened!
```pycon
>>> a = mt.array([[1, 4], [5, 6]])
>>> b = mt.array([[4, 1], [2, 2]])
>>> mt.vdot(a, b).execute()
30
>>> mt.vdot(b, a).execute()
30
>>> 1*4 + 4*1 + 5*2 + 6*2
30
```
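The conjugation rule that distinguishes `vdot` from `dot` can be reproduced with Python's built-in complex numbers. The helpers `vdot_1d` and `dot_1d` below are hypothetical plain-Python sketches for 1-D sequences, not MaxFrame functions.

```python
def vdot_1d(a, b):
    """Dot product that conjugates the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))


def dot_1d(a, b):
    """Plain dot product without conjugation, for comparison."""
    return sum(x * y for x, y in zip(a, b))


a = [1 + 2j, 3 + 4j]
b = [5 + 6j, 7 + 8j]

print(vdot_1d(a, b))  # (70-8j)
print(vdot_1d(b, a))  # (70+8j)
print(dot_1d(a, b))   # (-18+68j)
```

Swapping the arguments of `vdot_1d` yields the complex conjugate of the result, as in the documented example, while the unconjugated product is symmetric in its arguments.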
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.vsplit.md
# maxframe.tensor.vsplit
### maxframe.tensor.vsplit(a, indices_or_sections)
Split a tensor into multiple sub-tensors vertically (row-wise).
Please refer to the `split` documentation. `vsplit` is equivalent
to `split` with axis=0 (default); the tensor is always split along the
first axis regardless of the tensor dimension.
#### SEE ALSO
[`split`](maxframe.tensor.split.md#maxframe.tensor.split)
: Split a tensor into multiple sub-tensors of equal size.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> x = mt.arange(16.0).reshape(4, 4)
>>> x.execute()
array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.]])
>>> mt.vsplit(x, 2).execute()
[array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.]]),
array([[ 8., 9., 10., 11.],
[ 12., 13., 14., 15.]])]
>>> mt.vsplit(x, mt.array([3, 6])).execute()
[array([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]]),
array([[ 12., 13., 14., 15.]]),
array([], dtype=float64)]
```
With a higher dimensional tensor the split is still along the first axis.
```pycon
>>> x = mt.arange(8.0).reshape(2, 2, 2)
>>> x.execute()
array([[[ 0., 1.],
[ 2., 3.]],
[[ 4., 5.],
[ 6., 7.]]])
>>> mt.vsplit(x, 2).execute()
[array([[[ 0., 1.],
[ 2., 3.]]]),
array([[[ 4., 5.],
[ 6., 7.]]])]
```
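The two forms of `indices_or_sections` used above (an integer number of sections, or a list of row indices to cut at) can be sketched with list slicing. The helper `vsplit_rows` is hypothetical and operates on a plain list of rows rather than a tensor.

```python
def vsplit_rows(rows, indices_or_sections):
    """Split a list of rows along the first axis."""
    if isinstance(indices_or_sections, int):
        if len(rows) % indices_or_sections:
            raise ValueError("rows cannot be divided into equal sections")
        step = len(rows) // indices_or_sections
        cuts = list(range(step, len(rows), step))
    else:
        cuts = list(indices_or_sections)
    # Consecutive boundaries delimit each part; a cut past the end
    # produces an empty part, as in the documented [3, 6] example.
    bounds = [0] + cuts + [len(rows)]
    return [rows[start:stop] for start, stop in zip(bounds, bounds[1:])]


rows = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
halves = vsplit_rows(rows, 2)
by_index = vsplit_rows(rows, [3, 6])

print([len(part) for part in halves])    # [2, 2]
print([len(part) for part in by_index])  # [3, 1, 0]
```

Splitting at `[3, 6]` gives three parts with 3, 1 and 0 rows, which corresponds to the three arrays in the example output above.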
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.vstack.md
# maxframe.tensor.vstack
### maxframe.tensor.vstack(tup)
Stack tensors in sequence vertically (row wise).
This is equivalent to concatenation along the first axis after 1-D tensors
of shape (N,) have been reshaped to (1,N). Rebuilds tensors divided by
vsplit.
This function makes most sense for tensors with up to 3 dimensions. For
instance, for pixel-data with a height (first axis), width (second axis),
and r/g/b channels (third axis). The functions concatenate, stack and
block provide more general stacking and concatenation operations.
* **Parameters:**
**tup** (*sequence* *of* *tensors*) – The tensors must have the same shape along all but the first axis.
1-D tensors must have the same length.
* **Returns:**
**stacked** – The tensor formed by stacking the given tensors; it will be at least 2-D.
* **Return type:**
Tensor
#### SEE ALSO
`stack`
: Join a sequence of tensors along a new axis.
[`concatenate`](maxframe.tensor.concatenate.md#maxframe.tensor.concatenate)
: Join a sequence of tensors along an existing axis.
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> a = mt.array([1, 2, 3])
>>> b = mt.array([2, 3, 4])
>>> mt.vstack((a,b)).execute()
array([[1, 2, 3],
[2, 3, 4]])
```
```pycon
>>> a = mt.array([[1], [2], [3]])
>>> b = mt.array([[2], [3], [4]])
>>> mt.vstack((a,b)).execute()
array([[1],
[2],
[3],
[2],
[3],
[4]])
```
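The rule that 1-D inputs are first treated as a single row of shape (1, N) and then concatenated along the first axis can be sketched with nested lists. The helper `vstack_lists` is hypothetical; it handles only 1-D and 2-D list inputs and performs no shape checking.

```python
def vstack_lists(tensors):
    """Stack 1-D or 2-D nested lists row-wise."""
    stacked = []
    for tensor in tensors:
        # A 1-D input (a list of scalars) becomes a single row.
        is_1d = not tensor or not isinstance(tensor[0], list)
        rows = [tensor] if is_1d else tensor
        stacked.extend(rows)
    return stacked


flat = vstack_lists([[1, 2, 3], [2, 3, 4]])
columns = vstack_lists([[[1], [2], [3]], [[2], [3], [4]]])

print(flat)     # [[1, 2, 3], [2, 3, 4]]
print(columns)  # [[1], [2], [3], [2], [3], [4]]
```

Both results have the same nesting as the two documented examples: the 1-D inputs produce a 2 x 3 result, and the 3 x 1 inputs produce a 6 x 1 result.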
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.where.md
# maxframe.tensor.where
### maxframe.tensor.where(condition, x=None, y=None)
Return elements, either from x or y, depending on condition.
If only condition is given, return `condition.nonzero()`.
* **Parameters:**
* **condition** (*array_like* *,* [*bool*](https://docs.python.org/3/library/functions.html#bool)) – When True, yield x, otherwise yield y.
* **x** (*array_like* *,* *optional*) – Values from which to choose. x, y and condition need to be
broadcastable to some shape.
* **y** (*array_like* *,* *optional*) – Values from which to choose. x, y and condition need to be
broadcastable to some shape.
* **Returns:**
**out** – If both x and y are specified, the output tensor contains
elements of x where condition is True, and elements from
y elsewhere.
If only condition is given, return the tuple
`condition.nonzero()`, the indices where condition is True.
* **Return type:**
Tensor or [tuple](https://docs.python.org/3/library/stdtypes.html#tuple) of Tensors
#### SEE ALSO
[`nonzero`](maxframe.tensor.nonzero.md#maxframe.tensor.nonzero), [`choose`](maxframe.tensor.choose.md#maxframe.tensor.choose)
### Notes
If x and y are given and the input arrays are 1-D, `where` is
equivalent to:
```default
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
```
### Examples
```pycon
>>> import maxframe.tensor as mt
```
```pycon
>>> mt.where([[True, False], [True, True]],
... [[1, 2], [3, 4]],
... [[9, 8], [7, 6]]).execute()
array([[1, 8],
[3, 4]])
```
```pycon
>>> mt.where([[0, 1], [1, 0]]).execute()
(array([0, 1]), array([1, 0]))
```
```pycon
>>> x = mt.arange(9.).reshape(3, 3)
>>> mt.where( x > 5 ).execute()
(array([2, 2, 2]), array([0, 1, 2]))
>>> mt.where(x < 5, x, -1).execute() # Note: broadcasting.
array([[ 0., 1., 2.],
[ 3., 4., -1.],
[-1., -1., -1.]])
```
Find the indices of elements of x that are in goodvalues.
```pycon
>>> goodvalues = [3, 4, 7]
>>> ix = mt.isin(x, goodvalues)
>>> ix.execute()
array([[False, False, False],
[ True, True, False],
[False, True, False]])
>>> mt.where(ix).execute()
(array([1, 1, 2]), array([0, 1, 1]))
```
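Both calling conventions can be mimicked in plain Python. The helpers `where_1d` and `nonzero_2d` below are hypothetical sketches: the first wraps the list comprehension from the Notes, and the second shows what the condition-only form returns for a 2-D input, namely one index list per axis.

```python
def where_1d(condition, x, y):
    """Pick from x where condition is true, otherwise from y."""
    return [xv if c else yv for c, xv, yv in zip(condition, x, y)]


def nonzero_2d(condition):
    """Return (row_indices, column_indices) of the truthy entries."""
    rows, cols = [], []
    for i, row in enumerate(condition):
        for j, value in enumerate(row):
            if value:
                rows.append(i)
                cols.append(j)
    return rows, cols


picked = where_1d([True, False, True], [1, 2, 3], [9, 8, 7])
indices = nonzero_2d([[0, 1], [1, 0]])

print(picked)   # [1, 8, 3]
print(indices)  # ([0, 1], [1, 0])
```

The second result corresponds to the `(array([0, 1]), array([1, 0]))` output of the condition-only example above: the true entries sit at positions (0, 1) and (1, 0).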
FILE:references/maxframe-client-docs/reference/tensor/generated/maxframe.tensor.zeros.md
# maxframe.tensor.zeros
### maxframe.tensor.zeros(shape, dtype=None, chunk_size=None, gpu=None, sparse=False, order='C')
Return a new tensor of given shape and type, filled with zeros.
* **Parameters:**
* **shape** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* *sequence* *of* *ints*) – Shape of the new tensor, e.g., `(2, 3)` or `2`.
* **dtype** (*data-type* *,* *optional*) – The desired data-type for the array, e.g., mt.int8. Default is
mt.float64.
* **chunk_size** ([*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* [*int*](https://docs.python.org/3/library/functions.html#int) *or* [*tuple*](https://docs.python.org/3/library/stdtypes.html#tuple) *of* *ints* *,* *optional*) – Desired chunk size on each dimension
* **gpu** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Allocate the tensor on GPU if True, False as default
* **sparse** ([*bool*](https://docs.python.org/3/library/functions.html#bool) *,* *optional*) – Create sparse tensor if True, False as default
* **order** ( *{'C'* *,* *'F'}* *,* *optional* *,* *default: 'C'*) – Whether to store multi-dimensional data in row-major
(C-style) or column-major (Fortran-style) order in
memory.
* **Returns:**
**out** – Tensor of zeros with the given shape, dtype, and order.
* **Return type:**
Tensor
#### SEE ALSO
`zeros_like`
: Return a tensor of zeros with shape and type of input.
`ones_like`
: Return a tensor of ones with shape and type of input.
[`empty_like`](maxframe.tensor.empty_like.md#maxframe.tensor.empty_like)
: Return an empty tensor with shape and type of input.
[`ones`](maxframe.tensor.ones.md#maxframe.tensor.ones)
: Return a new tensor setting values to one.
[`empty`](maxframe.tensor.empty.md#maxframe.tensor.empty)
: Return a new uninitialized tensor.
### Examples
```pycon
>>> import maxframe.tensor as mt
>>> mt.zeros(5).execute()
array([ 0., 0., 0., 0., 0.])
```
```pycon
>>> mt.zeros((5,), dtype=int).execute()
array([0, 0, 0, 0, 0])
```
```pycon
>>> mt.zeros((2, 1)).execute()
array([[ 0.],
[ 0.]])
```
```pycon
>>> s = (2,2)
>>> mt.zeros(s).execute()
array([[ 0., 0.],
[ 0., 0.]])
```
```pycon
>>> mt.zeros((2,), dtype=[('x', 'i4'), ('y', 'i4')]).execute() # custom dtype
array([(0, 0), (0, 0)],
dtype=[('x', '<i4'), ('y', '<i4')])
```
FILE:references/maxframe-client-docs/reference/tensor/indexing.md
<a id="tensor-indexing"></a>
# Tensor Indexing Routines
## Generating index arrays
| [`maxframe.tensor.c_`](generated/maxframe.tensor.c_.md#maxframe.tensor.c_) | Translates slice objects to concatenation along the second axis. |
|-------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| [`maxframe.tensor.r_`](generated/maxframe.tensor.r_.md#maxframe.tensor.r_) | Translates slice objects to concatenation along the first axis. |
| [`maxframe.tensor.nonzero`](generated/maxframe.tensor.nonzero.md#maxframe.tensor.nonzero) | Return the indices of the elements that are non-zero. |
| [`maxframe.tensor.where`](generated/maxframe.tensor.where.md#maxframe.tensor.where) | Return elements, either from x or y, depending on condition. |
| [`maxframe.tensor.indices`](generated/maxframe.tensor.indices.md#maxframe.tensor.indices) | Return a tensor representing the indices of a grid. |
| [`maxframe.tensor.ogrid`](generated/maxframe.tensor.ogrid.md#maxframe.tensor.ogrid) | Construct a multi-dimensional "meshgrid". |
| [`maxframe.tensor.unravel_index`](generated/maxframe.tensor.unravel_index.md#maxframe.tensor.unravel_index) | Converts a flat index or tensor of flat indices into a tuple of coordinate tensors. |
## Indexing-like operations
| [`maxframe.tensor.take`](generated/maxframe.tensor.take.md#maxframe.tensor.take) | Take elements from a tensor along an axis. |
|----------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
| [`maxframe.tensor.choose`](generated/maxframe.tensor.choose.md#maxframe.tensor.choose) | Construct a tensor from an index tensor and a set of tensors to choose from. |
| [`maxframe.tensor.compress`](generated/maxframe.tensor.compress.md#maxframe.tensor.compress) | Return selected slices of a tensor along given axis. |
| [`maxframe.tensor.diag`](generated/maxframe.tensor.diag.md#maxframe.tensor.diag) | Extract a diagonal or construct a diagonal tensor. |
## Inserting data into arrays
| [`maxframe.tensor.fill_diagonal`](generated/maxframe.tensor.fill_diagonal.md#maxframe.tensor.fill_diagonal) | Fill the main diagonal of the given tensor of any dimensionality. |
|---------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------|
FILE:references/maxframe-client-docs/reference/tensor/linalg.md
# Linear Algebra
## Matrix and vector products
| [`maxframe.tensor.dot`](generated/maxframe.tensor.dot.md#maxframe.tensor.dot) | Dot product of two arrays. |
|-------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
| [`maxframe.tensor.vdot`](generated/maxframe.tensor.vdot.md#maxframe.tensor.vdot) | Return the dot product of two vectors. |
| [`maxframe.tensor.inner`](generated/maxframe.tensor.inner.md#maxframe.tensor.inner) | Returns the inner product of a and b for arrays of floating point types. |
| [`maxframe.tensor.matmul`](generated/maxframe.tensor.matmul.md#maxframe.tensor.matmul) | Matrix product of two tensors. |
| [`maxframe.tensor.tensordot`](generated/maxframe.tensor.tensordot.md#maxframe.tensor.tensordot) | Compute tensor dot product along specified axes for tensors >= 1-D. |
| [`maxframe.tensor.einsum`](generated/maxframe.tensor.einsum.md#maxframe.tensor.einsum) | Evaluates the Einstein summation convention on the operands. |
## Decompositions
| [`maxframe.tensor.linalg.cholesky`](generated/maxframe.tensor.linalg.cholesky.md#maxframe.tensor.linalg.cholesky) | Cholesky decomposition. |
|---------------------------------------------------------------------------------------------------------------------|-------------------------------------------|
| [`maxframe.tensor.linalg.lu`](generated/maxframe.tensor.linalg.lu.md#maxframe.tensor.linalg.lu) | LU decomposition |
| [`maxframe.tensor.linalg.qr`](generated/maxframe.tensor.linalg.qr.md#maxframe.tensor.linalg.qr) | Compute the qr factorization of a matrix. |
| [`maxframe.tensor.linalg.svd`](generated/maxframe.tensor.linalg.svd.md#maxframe.tensor.linalg.svd) | Singular Value Decomposition. |
## Norms and other numbers
| [`maxframe.tensor.linalg.matrix_norm`](generated/maxframe.tensor.linalg.matrix_norm.md#maxframe.tensor.linalg.matrix_norm) | Computes the matrix norm of a matrix (or a stack of matrices) `x`. |
|------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| [`maxframe.tensor.linalg.norm`](generated/maxframe.tensor.linalg.norm.md#maxframe.tensor.linalg.norm) | Matrix or vector norm. |
| [`maxframe.tensor.linalg.vector_norm`](generated/maxframe.tensor.linalg.vector_norm.md#maxframe.tensor.linalg.vector_norm) | Computes the vector norm of a vector (or batch of vectors) `x`. |
## Solving equations and inverting matrices
| [`maxframe.tensor.linalg.inv`](generated/maxframe.tensor.linalg.inv.md#maxframe.tensor.linalg.inv) | Compute the (multiplicative) inverse of a matrix. |
|-------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| [`maxframe.tensor.linalg.lstsq`](generated/maxframe.tensor.linalg.lstsq.md#maxframe.tensor.linalg.lstsq) | Return the least-squares solution to a linear matrix equation. |
| [`maxframe.tensor.linalg.solve`](generated/maxframe.tensor.linalg.solve.md#maxframe.tensor.linalg.solve) | Solve the equation `a x = b` for `x`. |
| [`maxframe.tensor.linalg.solve_triangular`](generated/maxframe.tensor.linalg.solve_triangular.md#maxframe.tensor.linalg.solve_triangular) | Solve the equation a x = b for x, assuming a is a triangular matrix. |
FILE:references/maxframe-client-docs/reference/tensor/logic.md
# Logic Functions
## Truth value testing
| [`maxframe.tensor.all`](generated/maxframe.tensor.all.md#maxframe.tensor.all) | Test whether all array elements along a given axis evaluate to True. |
|---------------------------------------------------------------------------------|------------------------------------------------------------------------|
| [`maxframe.tensor.any`](generated/maxframe.tensor.any.md#maxframe.tensor.any) | Test whether any tensor element along a given axis evaluates to True. |
## Array contents
| [`maxframe.tensor.isfinite`](generated/maxframe.tensor.isfinite.md#maxframe.tensor.isfinite) | Test element-wise for finiteness (not infinity and not Not a Number). |
|------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
| [`maxframe.tensor.isinf`](generated/maxframe.tensor.isinf.md#maxframe.tensor.isinf) | Test element-wise for positive or negative infinity. |
| [`maxframe.tensor.isnan`](generated/maxframe.tensor.isnan.md#maxframe.tensor.isnan) | Test element-wise for NaN and return result as a boolean tensor. |
## Array type testing
| [`maxframe.tensor.iscomplex`](generated/maxframe.tensor.iscomplex.md#maxframe.tensor.iscomplex) | Returns a bool tensor, where True if input element is complex. |
|---------------------------------------------------------------------------------------------------|------------------------------------------------------------------|
| [`maxframe.tensor.isreal`](generated/maxframe.tensor.isreal.md#maxframe.tensor.isreal) | Returns a bool tensor, where True if input element is real. |
## Logic operations
| [`maxframe.tensor.logical_and`](generated/maxframe.tensor.logical_and.md#maxframe.tensor.logical_and) | Compute the truth value of x1 AND x2 element-wise. |
|---------------------------------------------------------------------------------------------------------|------------------------------------------------------|
| [`maxframe.tensor.logical_or`](generated/maxframe.tensor.logical_or.md#maxframe.tensor.logical_or) | Compute the truth value of x1 OR x2 element-wise. |
| [`maxframe.tensor.logical_not`](generated/maxframe.tensor.logical_not.md#maxframe.tensor.logical_not) | Compute the truth value of NOT x element-wise. |
| [`maxframe.tensor.logical_xor`](generated/maxframe.tensor.logical_xor.md#maxframe.tensor.logical_xor) | Compute the truth value of x1 XOR x2, element-wise. |
## Comparison
| [`maxframe.tensor.allclose`](generated/maxframe.tensor.allclose.md#maxframe.tensor.allclose) | Returns True if two tensors are element-wise equal within a tolerance. |
|-------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|
| [`maxframe.tensor.isclose`](generated/maxframe.tensor.isclose.md#maxframe.tensor.isclose) | Returns a boolean tensor where two tensors are element-wise equal within a tolerance. |
| [`maxframe.tensor.array_equal`](generated/maxframe.tensor.array_equal.md#maxframe.tensor.array_equal) | True if two tensors have the same shape and elements, False otherwise. |
| [`maxframe.tensor.greater`](generated/maxframe.tensor.greater.md#maxframe.tensor.greater) | Return the truth value of (x1 > x2) element-wise. |
| [`maxframe.tensor.greater_equal`](generated/maxframe.tensor.greater_equal.md#maxframe.tensor.greater_equal) | Return the truth value of (x1 >= x2) element-wise. |
| [`maxframe.tensor.less`](generated/maxframe.tensor.less.md#maxframe.tensor.less) | Return the truth value of (x1 < x2) element-wise. |
| [`maxframe.tensor.less_equal`](generated/maxframe.tensor.less_equal.md#maxframe.tensor.less_equal) | Return the truth value of (x1 <= x2) element-wise. |
| [`maxframe.tensor.equal`](generated/maxframe.tensor.equal.md#maxframe.tensor.equal) | Return (x1 == x2) element-wise. |
| [`maxframe.tensor.not_equal`](generated/maxframe.tensor.not_equal.md#maxframe.tensor.not_equal) | Return (x1 != x2) element-wise. |
FILE:references/maxframe-client-docs/reference/tensor/manipulation.md
# Tensor Manipulation Routines
## Basic operations
| [`maxframe.tensor.copyto`](generated/maxframe.tensor.copyto.md#maxframe.tensor.copyto) | Copies values from one array to another, broadcasting as necessary. |
|------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|
| [`maxframe.tensor.ndim`](generated/maxframe.tensor.ndim.md#maxframe.tensor.ndim) | Return the number of dimensions of a tensor. |
| [`maxframe.tensor.shape`](generated/maxframe.tensor.shape.md#maxframe.tensor.shape) | Return the shape of a tensor. |
## Changing array shape
| [`maxframe.tensor.reshape`](generated/maxframe.tensor.reshape.md#maxframe.tensor.reshape) | Gives a new shape to a tensor without changing its data. |
|-------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| [`maxframe.tensor.ravel`](generated/maxframe.tensor.ravel.md#maxframe.tensor.ravel) | Return a contiguous flattened tensor. |
| [`maxframe.tensor.core.Tensor.flatten`](generated/maxframe.tensor.core.Tensor.flatten.md#maxframe.tensor.core.Tensor.flatten) | Return a copy of the tensor collapsed into one dimension. |
## Transpose-like operations
| [`maxframe.tensor.moveaxis`](generated/maxframe.tensor.moveaxis.md#maxframe.tensor.moveaxis) | Move axes of a tensor to new positions. |
|-------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
| [`maxframe.tensor.rollaxis`](generated/maxframe.tensor.rollaxis.md#maxframe.tensor.rollaxis) | Roll the specified axis backwards, until it lies in a given position. |
| [`maxframe.tensor.swapaxes`](generated/maxframe.tensor.swapaxes.md#maxframe.tensor.swapaxes) | Interchange two axes of a tensor. |
| [`maxframe.tensor.core.Tensor.T`](generated/maxframe.tensor.core.Tensor.T.md#maxframe.tensor.core.Tensor.T) | Same as self.transpose(), except that self is returned if self.ndim < 2. |
| [`maxframe.tensor.transpose`](generated/maxframe.tensor.transpose.md#maxframe.tensor.transpose) | Returns an array with axes transposed. |
## Changing number of dimensions
| [`maxframe.tensor.atleast_1d`](generated/maxframe.tensor.atleast_1d.md#maxframe.tensor.atleast_1d) | Convert inputs to tensors with at least one dimension. |
|----------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| [`maxframe.tensor.atleast_2d`](generated/maxframe.tensor.atleast_2d.md#maxframe.tensor.atleast_2d) | View inputs as tensors with at least two dimensions. |
| [`maxframe.tensor.atleast_3d`](generated/maxframe.tensor.atleast_3d.md#maxframe.tensor.atleast_3d) | View inputs as tensors with at least three dimensions. |
| [`maxframe.tensor.broadcast_to`](generated/maxframe.tensor.broadcast_to.md#maxframe.tensor.broadcast_to) | Broadcast a tensor to a new shape. |
| [`maxframe.tensor.broadcast_arrays`](generated/maxframe.tensor.broadcast_arrays.md#maxframe.tensor.broadcast_arrays) | Broadcast any number of arrays against each other. |
| [`maxframe.tensor.expand_dims`](generated/maxframe.tensor.expand_dims.md#maxframe.tensor.expand_dims) | Expand the shape of a tensor. |
| [`maxframe.tensor.squeeze`](generated/maxframe.tensor.squeeze.md#maxframe.tensor.squeeze) | Remove single-dimensional entries from the shape of a tensor. |
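
The broadcasting routines above follow a shape rule that can be sketched in plain Python. This is a minimal illustration, assuming MaxFrame applies the same rule as NumPy (trailing dimensions are aligned, and a size-1 or missing dimension stretches). The helper `broadcast_shape` is hypothetical and is not part of `maxframe.tensor`.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Return the result shape under NumPy-style broadcasting rules."""
    result = []
    # Compare dimensions from the trailing axis backwards; a missing
    # leading dimension behaves like a dimension of size 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} cannot be broadcast together")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


print(broadcast_shape((3, 1), (4,)))       # (3, 4)
print(broadcast_shape((2, 1, 5), (3, 1)))  # (2, 3, 5)
```
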
## Joining tensors
| [`maxframe.tensor.concatenate`](generated/maxframe.tensor.concatenate.md#maxframe.tensor.concatenate) | Join a sequence of arrays along an existing axis. |
|---------------------------------------------------------------------------------------------------------|-----------------------------------------------------|
| [`maxframe.tensor.vstack`](generated/maxframe.tensor.vstack.md#maxframe.tensor.vstack) | Stack tensors in sequence vertically (row wise). |
## Splitting arrays
| [`maxframe.tensor.split`](generated/maxframe.tensor.split.md#maxframe.tensor.split) | Split a tensor into multiple sub-tensors. |
|-------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| [`maxframe.tensor.array_split`](generated/maxframe.tensor.array_split.md#maxframe.tensor.array_split) | Split a tensor into multiple sub-tensors. |
| [`maxframe.tensor.dsplit`](generated/maxframe.tensor.dsplit.md#maxframe.tensor.dsplit) | Split tensor into multiple sub-tensors along the 3rd axis (depth). |
| [`maxframe.tensor.hsplit`](generated/maxframe.tensor.hsplit.md#maxframe.tensor.hsplit) | Split a tensor into multiple sub-tensors horizontally (column-wise). |
| [`maxframe.tensor.vsplit`](generated/maxframe.tensor.vsplit.md#maxframe.tensor.vsplit) | Split a tensor into multiple sub-tensors vertically (row-wise). |
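
`split` and `array_split` carry the same one-line description above. In NumPy the difference is that `split` requires an equal division while `array_split` allows uneven parts. The sketch below shows that rule on a plain list, assuming MaxFrame keeps NumPy's behaviour. `array_split_1d` and `split_1d` are hypothetical helpers, not MaxFrame functions.

```python
def array_split_1d(seq, sections):
    """Split seq into `sections` parts; the first len(seq) % sections parts get one extra item."""
    base, extra = divmod(len(seq), sections)
    parts, start = [], 0
    for i in range(sections):
        size = base + 1 if i < extra else base
        parts.append(seq[start:start + size])
        start += size
    return parts


def split_1d(seq, sections):
    """Like array_split_1d, but refuse a division that is not exact."""
    if len(seq) % sections != 0:
        raise ValueError("split does not result in an equal division")
    return array_split_1d(seq, sections)


print(array_split_1d(list(range(7)), 3))  # [[0, 1, 2], [3, 4], [5, 6]]
print(split_1d(list(range(6)), 3))        # [[0, 1], [2, 3], [4, 5]]
```
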
## Tiling arrays
| [`maxframe.tensor.tile`](generated/maxframe.tensor.tile.md#maxframe.tensor.tile) | Construct a tensor by repeating A the number of times given by reps. |
|----------------------------------------------------------------------------------------|------------------------------------------------------------------------|
| [`maxframe.tensor.repeat`](generated/maxframe.tensor.repeat.md#maxframe.tensor.repeat) | Repeat elements of a tensor. |
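
The difference between `tile` and `repeat` is easy to mix up. The one-dimensional sketch below uses plain lists and assumes NumPy-compatible semantics: `tile` repeats the whole sequence, `repeat` repeats each element in place. Both helper names are hypothetical.

```python
def tile_1d(a, reps):
    """Repeat the whole sequence `reps` times."""
    return list(a) * reps


def repeat_1d(a, repeats):
    """Repeat each element `repeats` times, keeping element order."""
    return [x for x in a for _ in range(repeats)]


print(tile_1d([1, 2, 3], 2))    # [1, 2, 3, 1, 2, 3]
print(repeat_1d([1, 2, 3], 2))  # [1, 1, 2, 2, 3, 3]
```
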
## Adding and removing elements
| [`maxframe.tensor.delete`](generated/maxframe.tensor.delete.md#maxframe.tensor.delete) | Return a new array with sub-arrays along an axis deleted. |
|------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| [`maxframe.tensor.insert`](generated/maxframe.tensor.insert.md#maxframe.tensor.insert) | Insert values along the given axis before the given indices. |
## Rearranging elements
| [`maxframe.tensor.flip`](generated/maxframe.tensor.flip.md#maxframe.tensor.flip) | Reverse the order of elements in a tensor along the given axis. |
|----------------------------------------------------------------------------------------|-------------------------------------------------------------------|
| [`maxframe.tensor.fliplr`](generated/maxframe.tensor.fliplr.md#maxframe.tensor.fliplr) | Flip tensor in the left/right direction. |
| [`maxframe.tensor.flipud`](generated/maxframe.tensor.flipud.md#maxframe.tensor.flipud) | Flip tensor in the up/down direction. |
| [`maxframe.tensor.roll`](generated/maxframe.tensor.roll.md#maxframe.tensor.roll) | Roll tensor elements along a given axis. |
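
As a rough illustration of `roll` and `flip` on a one-dimensional input, the following plain-Python sketch assumes NumPy's convention that elements shifted past the end wrap around to the start. The helpers are hypothetical stand-ins, not MaxFrame calls.

```python
def roll_1d(a, shift):
    """Shift elements by `shift` positions, wrapping around at the ends."""
    if not a:
        return []
    k = shift % len(a)
    return a[-k:] + a[:-k] if k else list(a)


def flip_1d(a):
    """Reverse the order of elements."""
    return a[::-1]


print(roll_1d([0, 1, 2, 3, 4], 2))   # [3, 4, 0, 1, 2]
print(roll_1d([0, 1, 2, 3, 4], -1))  # [1, 2, 3, 4, 0]
print(flip_1d([0, 1, 2]))            # [2, 1, 0]
```
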
FILE:references/maxframe-client-docs/reference/tensor/math.md
# Mathematical Functions
## Trigonometric functions
| [`maxframe.tensor.sin`](generated/maxframe.tensor.sin.md#maxframe.tensor.sin) | Trigonometric sine, element-wise. |
|-------------------------------------------------------------------------------------------|----------------------------------------------------------------------|
| [`maxframe.tensor.cos`](generated/maxframe.tensor.cos.md#maxframe.tensor.cos) | Cosine element-wise. |
| [`maxframe.tensor.tan`](generated/maxframe.tensor.tan.md#maxframe.tensor.tan) | Compute tangent element-wise. |
| [`maxframe.tensor.arcsin`](generated/maxframe.tensor.arcsin.md#maxframe.tensor.arcsin) | Inverse sine, element-wise. |
| [`maxframe.tensor.arccos`](generated/maxframe.tensor.arccos.md#maxframe.tensor.arccos) | Trigonometric inverse cosine, element-wise. |
| [`maxframe.tensor.arctan`](generated/maxframe.tensor.arctan.md#maxframe.tensor.arctan) | Trigonometric inverse tangent, element-wise. |
| [`maxframe.tensor.hypot`](generated/maxframe.tensor.hypot.md#maxframe.tensor.hypot) | Given the "legs" of a right triangle, return its hypotenuse. |
| [`maxframe.tensor.arctan2`](generated/maxframe.tensor.arctan2.md#maxframe.tensor.arctan2) | Element-wise arc tangent of `x1/x2` choosing the quadrant correctly. |
| [`maxframe.tensor.degrees`](generated/maxframe.tensor.degrees.md#maxframe.tensor.degrees) | Convert angles from radians to degrees. |
| [`maxframe.tensor.radians`](generated/maxframe.tensor.radians.md#maxframe.tensor.radians) | Convert angles from degrees to radians. |
| [`maxframe.tensor.deg2rad`](generated/maxframe.tensor.deg2rad.md#maxframe.tensor.deg2rad) | Convert angles from degrees to radians. |
| [`maxframe.tensor.rad2deg`](generated/maxframe.tensor.rad2deg.md#maxframe.tensor.rad2deg) | Convert angles from radians to degrees. |
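
The phrase "choosing the quadrant correctly" for `arctan2` can be seen with Python's standard `math` module, which is used here only as a scalar stand-in. The sketch assumes the MaxFrame functions behave element-wise like their NumPy counterparts.

```python
import math

y, x = 1.0, -1.0                 # a point in the second quadrant

naive = math.atan(y / x)         # loses the quadrant: about -0.785 rad (-45 degrees)
angle = math.atan2(y, x)         # keeps it: about 2.356 rad (135 degrees)

print(math.degrees(naive), math.degrees(angle))
print(math.hypot(3.0, 4.0))      # 5.0
```
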
## Hyperbolic functions
| [`maxframe.tensor.sinh`](generated/maxframe.tensor.sinh.md#maxframe.tensor.sinh) | Hyperbolic sine, element-wise. |
|-------------------------------------------------------------------------------------------|------------------------------------------|
| [`maxframe.tensor.cosh`](generated/maxframe.tensor.cosh.md#maxframe.tensor.cosh) | Hyperbolic cosine, element-wise. |
| [`maxframe.tensor.tanh`](generated/maxframe.tensor.tanh.md#maxframe.tensor.tanh) | Compute hyperbolic tangent element-wise. |
| [`maxframe.tensor.arcsinh`](generated/maxframe.tensor.arcsinh.md#maxframe.tensor.arcsinh) | Inverse hyperbolic sine element-wise. |
| [`maxframe.tensor.arccosh`](generated/maxframe.tensor.arccosh.md#maxframe.tensor.arccosh) | Inverse hyperbolic cosine, element-wise. |
| [`maxframe.tensor.arctanh`](generated/maxframe.tensor.arctanh.md#maxframe.tensor.arctanh) | Inverse hyperbolic tangent element-wise. |
## Rounding
| [`maxframe.tensor.around`](generated/maxframe.tensor.around.md#maxframe.tensor.around) | Evenly round to the given number of decimals. |
|------------------------------------------------------------------------------------------|--------------------------------------------------------|
| [`maxframe.tensor.round_`](generated/maxframe.tensor.round_.md#maxframe.tensor.round_) | Evenly round to the given number of decimals. |
| [`maxframe.tensor.rint`](generated/maxframe.tensor.rint.md#maxframe.tensor.rint) | Round elements of the tensor to the nearest integer. |
| [`maxframe.tensor.fix`](generated/maxframe.tensor.fix.md#maxframe.tensor.fix) | Round to nearest integer towards zero. |
| [`maxframe.tensor.floor`](generated/maxframe.tensor.floor.md#maxframe.tensor.floor) | Return the floor of the input, element-wise. |
| [`maxframe.tensor.ceil`](generated/maxframe.tensor.ceil.md#maxframe.tensor.ceil) | Return the ceiling of the input, element-wise. |
| [`maxframe.tensor.trunc`](generated/maxframe.tensor.trunc.md#maxframe.tensor.trunc) | Return the truncated value of the input, element-wise. |
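
"Evenly round" means ties go to the nearest even value. Python's built-in `round` uses the same tie rule, so it serves as a scalar illustration here. This is a sketch under the assumption that MaxFrame matches NumPy's rounding behaviour.

```python
import math

# Ties are rounded to the nearest even integer, not always upwards.
ties = [round(v) for v in (0.5, 1.5, 2.5)]
print(ties)                      # [0, 2, 2]

x = -1.7
print(math.floor(x))             # -2  (towards negative infinity)
print(math.ceil(x))              # -1  (towards positive infinity)
print(math.trunc(x))             # -1  (towards zero, as `fix` and `trunc` do)
```
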
## Sums, products, differences
| [`maxframe.tensor.prod`](generated/maxframe.tensor.prod.md#maxframe.tensor.prod) | Return the product of tensor elements over a given axis. |
|----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| [`maxframe.tensor.sum`](generated/maxframe.tensor.sum.md#maxframe.tensor.sum) | Sum of tensor elements over a given axis. |
| [`maxframe.tensor.nanprod`](generated/maxframe.tensor.nanprod.md#maxframe.tensor.nanprod) | Return the product of array elements over a given axis treating Not a Numbers (NaNs) as ones. |
| [`maxframe.tensor.nansum`](generated/maxframe.tensor.nansum.md#maxframe.tensor.nansum) | Return the sum of array elements over a given axis treating Not a Numbers (NaNs) as zero. |
| [`maxframe.tensor.cumprod`](generated/maxframe.tensor.cumprod.md#maxframe.tensor.cumprod) | Return the cumulative product of elements along a given axis. |
| [`maxframe.tensor.cumsum`](generated/maxframe.tensor.cumsum.md#maxframe.tensor.cumsum) | Return the cumulative sum of the elements along a given axis. |
| [`maxframe.tensor.nancumprod`](generated/maxframe.tensor.nancumprod.md#maxframe.tensor.nancumprod) | Return the cumulative product of tensor elements over a given axis treating Not a Numbers (NaNs) as one. |
| [`maxframe.tensor.nancumsum`](generated/maxframe.tensor.nancumsum.md#maxframe.tensor.nancumsum) | Return the cumulative sum of tensor elements over a given axis treating Not a Numbers (NaNs) as zero. |
| [`maxframe.tensor.diff`](generated/maxframe.tensor.diff.md#maxframe.tensor.diff) | Calculate the n-th discrete difference along the given axis. |
| [`maxframe.tensor.ediff1d`](generated/maxframe.tensor.ediff1d.md#maxframe.tensor.ediff1d) | The differences between consecutive elements of a tensor. |
| [`maxframe.tensor.trapezoid`](generated/maxframe.tensor.trapezoid.md#maxframe.tensor.trapezoid) | Integrate along the given axis using the composite trapezoidal rule. |
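
A few of the reductions above are sketched below on plain lists to make their definitions concrete. The helper names mirror the table but are hypothetical pure-Python stand-ins; NumPy-compatible semantics are assumed.

```python
import math
from itertools import accumulate


def diff(a):
    """First discrete difference: a[i + 1] - a[i]."""
    return [nxt - cur for cur, nxt in zip(a, a[1:])]


def nansum(a):
    """Sum that treats NaN entries as zero."""
    return sum(x for x in a if not math.isnan(x))


def trapezoid(y, dx=1.0):
    """Composite trapezoidal rule with uniform sample spacing dx."""
    return sum((y0 + y1) * dx / 2 for y0, y1 in zip(y, y[1:]))


print(diff([1, 2, 4, 7, 0]))               # [1, 2, 3, -7]
print(list(accumulate([1, 2, 3, 4])))      # cumulative sum: [1, 3, 6, 10]
print(nansum([1.0, math.nan, 2.0]))        # 3.0
print(trapezoid([1.0, 2.0, 3.0]))          # 4.0
```
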
## Exponential and logarithms
| [`maxframe.tensor.exp`](generated/maxframe.tensor.exp.md#maxframe.tensor.exp) | Calculate the exponential of all elements in the input tensor. |
|----------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|
| [`maxframe.tensor.expm1`](generated/maxframe.tensor.expm1.md#maxframe.tensor.expm1) | Calculate `exp(x) - 1` for all elements in the tensor. |
| [`maxframe.tensor.exp2`](generated/maxframe.tensor.exp2.md#maxframe.tensor.exp2) | Calculate `2**p` for all `p` in the input tensor. |
| [`maxframe.tensor.log`](generated/maxframe.tensor.log.md#maxframe.tensor.log) | Natural logarithm, element-wise. |
| [`maxframe.tensor.log10`](generated/maxframe.tensor.log10.md#maxframe.tensor.log10) | Return the base 10 logarithm of the input tensor, element-wise. |
| [`maxframe.tensor.log2`](generated/maxframe.tensor.log2.md#maxframe.tensor.log2) | Base-2 logarithm of x. |
| [`maxframe.tensor.log1p`](generated/maxframe.tensor.log1p.md#maxframe.tensor.log1p) | Return the natural logarithm of one plus the input tensor, element-wise. |
| [`maxframe.tensor.logaddexp`](generated/maxframe.tensor.logaddexp.md#maxframe.tensor.logaddexp) | Logarithm of the sum of exponentiations of the inputs. |
| [`maxframe.tensor.logaddexp2`](generated/maxframe.tensor.logaddexp2.md#maxframe.tensor.logaddexp2) | Logarithm of the sum of exponentiations of the inputs in base-2. |
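
`logaddexp` and `log1p` exist because the direct formulas lose precision or underflow. The scalar sketch below shows one common stable formulation of `log(exp(x1) + exp(x2))`. It illustrates the idea and is not a claim about how MaxFrame implements it; it also does not handle the case where both inputs are infinite.

```python
import math


def logaddexp(x1, x2):
    """Compute log(exp(x1) + exp(x2)) without underflowing exp()."""
    hi, lo = max(x1, x2), min(x1, x2)
    # Factor out exp(hi): log(exp(hi) * (1 + exp(lo - hi))).
    return hi + math.log1p(math.exp(lo - hi))


print(math.exp(-1000.0))            # 0.0, so log(exp(-1000) + exp(-1000)) would fail
print(logaddexp(-1000.0, -1000.0))  # about -999.307, i.e. -1000 + log(2)

# log1p keeps precision for tiny arguments where log(1 + x) rounds to 0.
print(math.log(1 + 1e-20), math.log1p(1e-20))  # 0.0 1e-20
```
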
## Other special functions
| [`maxframe.tensor.i0`](generated/maxframe.tensor.i0.md#maxframe.tensor.i0) | Modified Bessel function of the first kind, order 0. |
|----------------------------------------------------------------------------------|--------------------------------------------------------|
| [`maxframe.tensor.sinc`](generated/maxframe.tensor.sinc.md#maxframe.tensor.sinc) | Return the sinc function. |
## Floating point routines
| [`maxframe.tensor.signbit`](generated/maxframe.tensor.signbit.md#maxframe.tensor.signbit) | Returns element-wise True where signbit is set (less than zero). |
|-------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
| [`maxframe.tensor.copysign`](generated/maxframe.tensor.copysign.md#maxframe.tensor.copysign) | Change the sign of x1 to that of x2, element-wise. |
| [`maxframe.tensor.frexp`](generated/maxframe.tensor.frexp.md#maxframe.tensor.frexp) | Decompose the elements of x into mantissa and twos exponent. |
| [`maxframe.tensor.ldexp`](generated/maxframe.tensor.ldexp.md#maxframe.tensor.ldexp) | Returns `x1 * 2**x2`, element-wise. |
| [`maxframe.tensor.nextafter`](generated/maxframe.tensor.nextafter.md#maxframe.tensor.nextafter) | Return the next floating-point value after x1 towards x2, element-wise. |
| [`maxframe.tensor.spacing`](generated/maxframe.tensor.spacing.md#maxframe.tensor.spacing) | Return the distance between x and the nearest adjacent number. |
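
The floating point routines have scalar counterparts in Python's `math` module, which makes their contracts easy to check. The sketch assumes Python 3.9 or later (for `math.nextafter` and `math.ulp`) and NumPy-compatible semantics on the MaxFrame side. `signbit` is a hypothetical helper defined only for this example.

```python
import math

mantissa, exponent = math.frexp(8.0)     # 8.0 == 0.5 * 2**4
print(mantissa, exponent)                # 0.5 4
print(math.ldexp(mantissa, exponent))    # 8.0 (ldexp inverts frexp)

print(math.copysign(3.0, -0.0))          # -3.0 (the sign is taken even from -0.0)


def signbit(x):
    """True when the sign bit of x is set."""
    return math.copysign(1.0, x) < 0


# The sign bit of -0.0 is set although -0.0 is not less than zero.
print(signbit(-0.0), -0.0 < 0)           # True False

# The gap from 1.0 to the next float equals the spacing (ULP) at 1.0.
print(math.nextafter(1.0, 2.0) - 1.0 == math.ulp(1.0))  # True
```
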
## Arithmetic operations
| [`maxframe.tensor.add`](generated/maxframe.tensor.add.md#maxframe.tensor.add) | Add arguments element-wise. |
|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------|
| [`maxframe.tensor.reciprocal`](generated/maxframe.tensor.reciprocal.md#maxframe.tensor.reciprocal) | Return the reciprocal of the argument, element-wise. |
| [`maxframe.tensor.positive`](generated/maxframe.tensor.positive.md#maxframe.tensor.positive) | Numerical positive, element-wise. |
| [`maxframe.tensor.negative`](generated/maxframe.tensor.negative.md#maxframe.tensor.negative) | Numerical negative, element-wise. |
| [`maxframe.tensor.multiply`](generated/maxframe.tensor.multiply.md#maxframe.tensor.multiply) | Multiply arguments element-wise. |
| [`maxframe.tensor.divide`](generated/maxframe.tensor.divide.md#maxframe.tensor.divide) | Divide arguments element-wise. |
| [`maxframe.tensor.power`](generated/maxframe.tensor.power.md#maxframe.tensor.power) | First tensor elements raised to powers from second tensor, element-wise. |
| [`maxframe.tensor.subtract`](generated/maxframe.tensor.subtract.md#maxframe.tensor.subtract) | Subtract arguments, element-wise. |
| [`maxframe.tensor.true_divide`](generated/maxframe.tensor.true_divide.md#maxframe.tensor.true_divide) | Returns a true division of the inputs, element-wise. |
| [`maxframe.tensor.floor_divide`](generated/maxframe.tensor.floor_divide.md#maxframe.tensor.floor_divide) | Return the largest integer smaller or equal to the division of the inputs. |
| [`maxframe.tensor.float_power`](generated/maxframe.tensor.float_power.md#maxframe.tensor.float_power) | First tensor elements raised to powers from second array, element-wise. |
| [`maxframe.tensor.fmod`](generated/maxframe.tensor.fmod.md#maxframe.tensor.fmod) | Return the element-wise remainder of division. |
| [`maxframe.tensor.mod`](generated/maxframe.tensor.mod.md#maxframe.tensor.mod) | Return element-wise remainder of division. |
| [`maxframe.tensor.modf`](generated/maxframe.tensor.modf.md#maxframe.tensor.modf) | Return the fractional and integral parts of a tensor, element-wise. |
| [`maxframe.tensor.remainder`](generated/maxframe.tensor.remainder.md#maxframe.tensor.remainder) | Return element-wise remainder of division. |
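
`fmod`, `mod`, and `remainder` all describe a "remainder of division" but differ for negative operands. In NumPy, `mod` and `remainder` take the sign of the divisor (like Python's `%`), while `fmod` takes the sign of the dividend (like C's `fmod`). The scalar sketch below assumes MaxFrame keeps that distinction.

```python
import math

a, b = -7.0, 3.0

print(a % b)             # 2.0   sign follows the divisor (mod / remainder)
print(math.fmod(a, b))   # -1.0  sign follows the dividend (fmod)
print(a // b)            # -3.0  floor division rounds towards negative infinity

# modf returns the fractional part first, then the integral part.
print(math.modf(-3.5))   # (-0.5, -3.0)
```
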
## Handling complex numbers
| [`maxframe.tensor.angle`](generated/maxframe.tensor.angle.md#maxframe.tensor.angle) | Return the angle of the complex argument. |
|---------------------------------------------------------------------------------------|----------------------------------------------------|
| [`maxframe.tensor.real`](generated/maxframe.tensor.real.md#maxframe.tensor.real) | Return the real part of the complex argument. |
| [`maxframe.tensor.imag`](generated/maxframe.tensor.imag.md#maxframe.tensor.imag) | Return the imaginary part of the complex argument. |
| [`maxframe.tensor.conj`](generated/maxframe.tensor.conj.md#maxframe.tensor.conj) | Return the complex conjugate, element-wise. |
## Miscellaneous
| [`maxframe.tensor.sqrt`](generated/maxframe.tensor.sqrt.md#maxframe.tensor.sqrt) | Return the positive square-root of a tensor, element-wise. |
|----------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| [`maxframe.tensor.cbrt`](generated/maxframe.tensor.cbrt.md#maxframe.tensor.cbrt) | Return the cube-root of a tensor, element-wise. |
| [`maxframe.tensor.square`](generated/maxframe.tensor.square.md#maxframe.tensor.square) | Return the element-wise square of the input. |
| [`maxframe.tensor.absolute`](generated/maxframe.tensor.absolute.md#maxframe.tensor.absolute) | Calculate the absolute value element-wise. |
| [`maxframe.tensor.sign`](generated/maxframe.tensor.sign.md#maxframe.tensor.sign) | Returns an element-wise indication of the sign of a number. |
| [`maxframe.tensor.maximum`](generated/maxframe.tensor.maximum.md#maxframe.tensor.maximum) | Element-wise maximum of tensor elements. |
| [`maxframe.tensor.minimum`](generated/maxframe.tensor.minimum.md#maxframe.tensor.minimum) | Element-wise minimum of tensor elements. |
| [`maxframe.tensor.fmax`](generated/maxframe.tensor.fmax.md#maxframe.tensor.fmax) | Element-wise maximum of array elements. |
| [`maxframe.tensor.fmin`](generated/maxframe.tensor.fmin.md#maxframe.tensor.fmin) | Element-wise minimum of array elements. |
| [`maxframe.tensor.nan_to_num`](generated/maxframe.tensor.nan_to_num.md#maxframe.tensor.nan_to_num) | Replace nan with zero and inf with large finite numbers. |
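
`maximum` and `fmax` (and likewise `minimum` and `fmin`) share a description but differ in NaN handling in NumPy: `maximum` propagates NaN, while `fmax` ignores it when the other operand is a number. The scalar helpers below are hypothetical and only illustrate that rule, assuming MaxFrame follows it.

```python
import math


def maximum(a, b):
    """NaN-propagating maximum: any NaN operand yields NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)


def fmax(a, b):
    """NaN-ignoring maximum: NaN is returned only if both operands are NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


print(maximum(1.0, math.nan))  # nan
print(fmax(1.0, math.nan))     # 1.0
```
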
FILE:references/maxframe-client-docs/reference/tensor/random.md
<a id="tensor-random"></a>
<a id="module-maxframe.tensor.random"></a>
# Random Sampling
## Sample random data
| [`maxframe.tensor.random.bytes`](generated/maxframe.tensor.random.bytes.md#maxframe.tensor.random.bytes) | Return random bytes. |
|----------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|
| [`maxframe.tensor.random.choice`](generated/maxframe.tensor.random.choice.md#maxframe.tensor.random.choice) | Generates a random sample from a given 1-D array. |
| [`maxframe.tensor.random.permutation`](generated/maxframe.tensor.random.permutation.md#maxframe.tensor.random.permutation) | Randomly permute a sequence, or return a permuted range. |
| [`maxframe.tensor.random.rand`](generated/maxframe.tensor.random.rand.md#maxframe.tensor.random.rand) | Random values in a given shape. |
| [`maxframe.tensor.random.randint`](generated/maxframe.tensor.random.randint.md#maxframe.tensor.random.randint) | Return random integers from low (inclusive) to high (exclusive). |
| [`maxframe.tensor.random.randn`](generated/maxframe.tensor.random.randn.md#maxframe.tensor.random.randn) | Return a sample (or samples) from the "standard normal" distribution. |
| [`maxframe.tensor.random.random_integers`](generated/maxframe.tensor.random.random_integers.md#maxframe.tensor.random.random_integers) | Random integers of type mt.int between low and high, inclusive. |
| [`maxframe.tensor.random.random_sample`](generated/maxframe.tensor.random.random_sample.md#maxframe.tensor.random.random_sample) | Return random floats in the half-open interval [0.0, 1.0). |
| [`maxframe.tensor.random.random`](generated/maxframe.tensor.random.random.md#maxframe.tensor.random.random) | Return random floats in the half-open interval [0.0, 1.0). |
| [`maxframe.tensor.random.ranf`](generated/maxframe.tensor.random.ranf.md#maxframe.tensor.random.ranf) | Return random floats in the half-open interval [0.0, 1.0). |
| [`maxframe.tensor.random.sample`](generated/maxframe.tensor.random.sample.md#maxframe.tensor.random.sample) | Return random floats in the half-open interval [0.0, 1.0). |
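
The table distinguishes `randint` (upper bound exclusive) from `random_integers` (upper bound inclusive). Note that Python's standard library uses the opposite naming: `random.randrange` is half-open and `random.randint` is closed. The sketch below uses only the standard library to show the two interval conventions; it does not call MaxFrame, and the seed value is arbitrary.

```python
import random

rng = random.Random(42)          # seeded generator, so the run is repeatable

low, high = 0, 3
# Half-open interval [low, high): matches the `randint` row in the table.
half_open = [rng.randrange(low, high) for _ in range(1000)]
# Closed interval [low, high]: matches the `random_integers` row.
closed = [rng.randint(low, high) for _ in range(1000)]

print(sorted(set(half_open)))    # never contains `high`
print(sorted(set(closed)))       # may contain `high`
```
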
## Distributions
| [`maxframe.tensor.random.beta`](generated/maxframe.tensor.random.beta.md#maxframe.tensor.random.beta) | Draw samples from a Beta distribution. |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
| [`maxframe.tensor.random.binomial`](generated/maxframe.tensor.random.binomial.md#maxframe.tensor.random.binomial) | Draw samples from a binomial distribution. |
| [`maxframe.tensor.random.chisquare`](generated/maxframe.tensor.random.chisquare.md#maxframe.tensor.random.chisquare) | Draw samples from a chi-square distribution. |
| [`maxframe.tensor.random.dirichlet`](generated/maxframe.tensor.random.dirichlet.md#maxframe.tensor.random.dirichlet) | Draw samples from the Dirichlet distribution. |
| [`maxframe.tensor.random.exponential`](generated/maxframe.tensor.random.exponential.md#maxframe.tensor.random.exponential) | Draw samples from an exponential distribution. |
| [`maxframe.tensor.random.f`](generated/maxframe.tensor.random.f.md#maxframe.tensor.random.f) | Draw samples from an F distribution. |
| [`maxframe.tensor.random.gamma`](generated/maxframe.tensor.random.gamma.md#maxframe.tensor.random.gamma) | Draw samples from a Gamma distribution. |
| [`maxframe.tensor.random.geometric`](generated/maxframe.tensor.random.geometric.md#maxframe.tensor.random.geometric) | Draw samples from the geometric distribution. |
| [`maxframe.tensor.random.gumbel`](generated/maxframe.tensor.random.gumbel.md#maxframe.tensor.random.gumbel) | Draw samples from a Gumbel distribution. |
| [`maxframe.tensor.random.hypergeometric`](generated/maxframe.tensor.random.hypergeometric.md#maxframe.tensor.random.hypergeometric) | Draw samples from a Hypergeometric distribution. |
| [`maxframe.tensor.random.laplace`](generated/maxframe.tensor.random.laplace.md#maxframe.tensor.random.laplace) | Draw samples from the Laplace or double exponential distribution with specified location (or mean) and scale (decay). |
| [`maxframe.tensor.random.lognormal`](generated/maxframe.tensor.random.lognormal.md#maxframe.tensor.random.lognormal) | Draw samples from a log-normal distribution. |
| [`maxframe.tensor.random.logseries`](generated/maxframe.tensor.random.logseries.md#maxframe.tensor.random.logseries) | Draw samples from a logarithmic series distribution. |
| [`maxframe.tensor.random.multinomial`](generated/maxframe.tensor.random.multinomial.md#maxframe.tensor.random.multinomial) | Draw samples from a multinomial distribution. |
| [`maxframe.tensor.random.multivariate_normal`](generated/maxframe.tensor.random.multivariate_normal.md#maxframe.tensor.random.multivariate_normal) | Draw random samples from a multivariate normal distribution. |
| [`maxframe.tensor.random.negative_binomial`](generated/maxframe.tensor.random.negative_binomial.md#maxframe.tensor.random.negative_binomial) | Draw samples from a negative binomial distribution. |
| [`maxframe.tensor.random.noncentral_chisquare`](generated/maxframe.tensor.random.noncentral_chisquare.md#maxframe.tensor.random.noncentral_chisquare) | Draw samples from a noncentral chi-square distribution. |
| [`maxframe.tensor.random.noncentral_f`](generated/maxframe.tensor.random.noncentral_f.md#maxframe.tensor.random.noncentral_f) | Draw samples from the noncentral F distribution. |
| [`maxframe.tensor.random.normal`](generated/maxframe.tensor.random.normal.md#maxframe.tensor.random.normal) | Draw random samples from a normal (Gaussian) distribution. |
| [`maxframe.tensor.random.pareto`](generated/maxframe.tensor.random.pareto.md#maxframe.tensor.random.pareto) | Draw samples from a Pareto II or Lomax distribution with specified shape. |
| [`maxframe.tensor.random.poisson`](generated/maxframe.tensor.random.poisson.md#maxframe.tensor.random.poisson) | Draw samples from a Poisson distribution. |
| [`maxframe.tensor.random.power`](generated/maxframe.tensor.random.power.md#maxframe.tensor.random.power) | Draws samples in [0, 1] from a power distribution with positive exponent a - 1. |
| [`maxframe.tensor.random.rayleigh`](generated/maxframe.tensor.random.rayleigh.md#maxframe.tensor.random.rayleigh) | Draw samples from a Rayleigh distribution. |
| [`maxframe.tensor.random.standard_cauchy`](generated/maxframe.tensor.random.standard_cauchy.md#maxframe.tensor.random.standard_cauchy) | Draw samples from a standard Cauchy distribution with mode = 0. |
| [`maxframe.tensor.random.standard_exponential`](generated/maxframe.tensor.random.standard_exponential.md#maxframe.tensor.random.standard_exponential) | Draw samples from the standard exponential distribution. |
| [`maxframe.tensor.random.standard_gamma`](generated/maxframe.tensor.random.standard_gamma.md#maxframe.tensor.random.standard_gamma) | Draw samples from a standard Gamma distribution. |
| [`maxframe.tensor.random.standard_normal`](generated/maxframe.tensor.random.standard_normal.md#maxframe.tensor.random.standard_normal) | Draw samples from a standard Normal distribution (mean=0, stdev=1). |
| [`maxframe.tensor.random.standard_t`](generated/maxframe.tensor.random.standard_t.md#maxframe.tensor.random.standard_t) | Draw samples from a standard Student's t distribution with df degrees of freedom. |
| [`maxframe.tensor.random.triangular`](generated/maxframe.tensor.random.triangular.md#maxframe.tensor.random.triangular) | Draw samples from the triangular distribution over the interval `[left, right]`. |
| [`maxframe.tensor.random.uniform`](generated/maxframe.tensor.random.uniform.md#maxframe.tensor.random.uniform) | Draw samples from a uniform distribution. |
| [`maxframe.tensor.random.vonmises`](generated/maxframe.tensor.random.vonmises.md#maxframe.tensor.random.vonmises) | Draw samples from a von Mises distribution. |
| [`maxframe.tensor.random.wald`](generated/maxframe.tensor.random.wald.md#maxframe.tensor.random.wald) | Draw samples from a Wald, or inverse Gaussian, distribution. |
| [`maxframe.tensor.random.weibull`](generated/maxframe.tensor.random.weibull.md#maxframe.tensor.random.weibull) | Draw samples from a Weibull distribution. |
| [`maxframe.tensor.random.zipf`](generated/maxframe.tensor.random.zipf.md#maxframe.tensor.random.zipf) | Draw samples from a Zipf distribution. |
## Random number generator
| [`maxframe.tensor.random.seed`](generated/maxframe.tensor.random.seed.md#maxframe.tensor.random.seed) | Seed the generator. |
|----------------------------------------------------------------------------------------------------------------------------|-----------------------|
| [`maxframe.tensor.random.RandomState`](generated/maxframe.tensor.random.RandomState.md#maxframe.tensor.random.RandomState) | |
FILE:references/maxframe-client-docs/reference/tensor/routines.md
<a id="tensor-api"></a>
# MaxFrame Tensor
The following pages describe NumPy-compatible routines. These functions cover a subset of
[NumPy routines](https://docs.scipy.org/doc/numpy/reference/routines.html).
<a id="module-maxframe.tensor"></a>
* [Tensor Creation Routines](creation.md)
* [From shape or value](creation.md#from-shape-or-value)
* [From existing data](creation.md#from-existing-data)
* [Building matrices](creation.md#building-matrices)
* [Numerical ranges](creation.md#numerical-ranges)
* [Tensor Indexing Routines](indexing.md)
* [Generating index arrays](indexing.md#generating-index-arrays)
* [Indexing-like operations](indexing.md#indexing-like-operations)
* [Inserting data into arrays](indexing.md#inserting-data-into-arrays)
* [Tensor Manipulation Routines](manipulation.md)
* [Basic operations](manipulation.md#basic-operations)
* [Changing array shape](manipulation.md#changing-array-shape)
* [Transpose-like operations](manipulation.md#transpose-like-operations)
* [Changing number of dimensions](manipulation.md#changing-number-of-dimensions)
* [Joining tensors](manipulation.md#joining-tensors)
* [Splitting arrays](manipulation.md#splitting-arrays)
* [Tiling arrays](manipulation.md#tiling-arrays)
* [Adding and removing elements](manipulation.md#adding-and-removing-elements)
* [Rearranging elements](manipulation.md#rearranging-elements)
* [Binary Operations](binary.md)
* [Elementwise bit operations](binary.md#elementwise-bit-operations)
* [Discrete Fourier Transform](fft.md)
* [Standard FFTs](fft.md#standard-ffts)
* [Real FFTs](fft.md#real-ffts)
* [Hermitian FFTs](fft.md#hermitian-ffts)
* [Helper routines](fft.md#helper-routines)
* [Linear Algebra](linalg.md)
* [Matrix and vector products](linalg.md#matrix-and-vector-products)
* [Decompositions](linalg.md#decompositions)
* [Norms and other numbers](linalg.md#norms-and-other-numbers)
* [Solving equations and inverting matrices](linalg.md#solving-equations-and-inverting-matrices)
* [Logic Functions](logic.md)
* [Truth value testing](logic.md#truth-value-testing)
* [Array contents](logic.md#array-contents)
* [Array type testing](logic.md#array-type-testing)
* [Logic operations](logic.md#logic-operations)
* [Comparison](logic.md#comparison)
* [Mathematical Functions](math.md)
* [Trigonometric functions](math.md#trigonometric-functions)
* [Hyperbolic functions](math.md#hyperbolic-functions)
* [Rounding](math.md#rounding)
* [Sums, products, differences](math.md#sums-products-differences)
* [Exponential and logarithms](math.md#exponential-and-logarithms)
* [Other special functions](math.md#other-special-functions)
* [Floating point routines](math.md#floating-point-routines)
* [Arithmetic operations](math.md#arithmetic-operations)
* [Handling complex numbers](math.md#handling-complex-numbers)
* [Miscellaneous](math.md#miscellaneous)
* [Random Sampling](random.md)
* [Sample random data](random.md#sample-random-data)
* [Distributions](random.md#distributions)
* [Random number generator](random.md#random-number-generator)
* [Set routines](sets.md)
* [Making proper sets](sets.md#making-proper-sets)
* [Boolean operations](sets.md#boolean-operations)
* [Sorting, Searching, and Counting](sorting.md)
* [Sorting](sorting.md#sorting)
* [Searching](sorting.md#searching)
* [Counting](sorting.md#counting)
* [Special Functions](special.md)
* [Airy functions](special.md#airy-functions)
* [Information Theory functions](special.md#information-theory-functions)
* [Bessel functions](special.md#bessel-functions)
* [Error functions and fresnel integrals](special.md#error-functions-and-fresnel-integrals)
* [Ellipsoidal harmonics](special.md#ellipsoidal-harmonics)
* [Elliptic functions and integrals](special.md#elliptic-functions-and-integrals)
* [Gamma and related functions](special.md#gamma-and-related-functions)
* [Sigmoidal functions](special.md#sigmoidal-functions)
* [Other special functions](special.md#other-special-functions)
* [Convenience functions](special.md#convenience-functions)
* [Statistics](statistics.md)
* [Order statistics](statistics.md#order-statistics)
* [Average and variances](statistics.md#average-and-variances)
* [Correlating](statistics.md#correlating)
* [Histograms](statistics.md#histograms)
FILE:references/maxframe-client-docs/reference/tensor/sets.md
<a id="tensor-sets"></a>
# Set routines
## Making proper sets
| [`maxframe.tensor.unique`](generated/maxframe.tensor.unique.md#maxframe.tensor.unique) | Find the unique elements of a tensor. |
|------------------------------------------------------------------------------------------|-----------------------------------------|
## Boolean operations
| [`maxframe.tensor.in1d`](generated/maxframe.tensor.in1d.md#maxframe.tensor.in1d) | Test whether each element of a 1-D tensor is also present in a second tensor. |
|-------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|
| [`maxframe.tensor.isin`](generated/maxframe.tensor.isin.md#maxframe.tensor.isin) | Calculates `element in test_elements`, broadcasting over `element` only. |
| [`maxframe.tensor.setdiff1d`](generated/maxframe.tensor.setdiff1d.md#maxframe.tensor.setdiff1d) | Find the set difference of two tensors. |
FILE:references/maxframe-client-docs/reference/tensor/sorting.md
# Sorting, Searching, and Counting
## Sorting
| [`maxframe.tensor.sort`](generated/maxframe.tensor.sort.md#maxframe.tensor.sort) | Return a sorted copy of a tensor. |
|----------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
| [`maxframe.tensor.argsort`](generated/maxframe.tensor.argsort.md#maxframe.tensor.argsort) | Returns the indices that would sort a tensor. |
| [`maxframe.tensor.partition`](generated/maxframe.tensor.partition.md#maxframe.tensor.partition) | Return a partitioned copy of a tensor. |
| [`maxframe.tensor.argpartition`](generated/maxframe.tensor.argpartition.md#maxframe.tensor.argpartition) | Perform an indirect partition along the given axis using the algorithm specified by the kind keyword. |
## Searching
| [`maxframe.tensor.argmax`](generated/maxframe.tensor.argmax.md#maxframe.tensor.argmax) | Returns the indices of the maximum values along an axis. |
|-------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| [`maxframe.tensor.nanargmax`](generated/maxframe.tensor.nanargmax.md#maxframe.tensor.nanargmax) | Return the indices of the maximum values in the specified axis ignoring NaNs. |
| [`maxframe.tensor.argmin`](generated/maxframe.tensor.argmin.md#maxframe.tensor.argmin) | Returns the indices of the minimum values along an axis. |
| [`maxframe.tensor.nanargmin`](generated/maxframe.tensor.nanargmin.md#maxframe.tensor.nanargmin) | Return the indices of the minimum values in the specified axis ignoring NaNs. |
| [`maxframe.tensor.nonzero`](generated/maxframe.tensor.nonzero.md#maxframe.tensor.nonzero) | Return the indices of the elements that are non-zero. |
| [`maxframe.tensor.flatnonzero`](generated/maxframe.tensor.flatnonzero.md#maxframe.tensor.flatnonzero) | Return indices that are non-zero in the flattened version of a. |
## Counting
| [`maxframe.tensor.argwhere`](generated/maxframe.tensor.argwhere.md#maxframe.tensor.argwhere) | Find the indices of tensor elements that are non-zero, grouped by element. |
|-------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
| [`maxframe.tensor.count_nonzero`](generated/maxframe.tensor.count_nonzero.md#maxframe.tensor.count_nonzero) | Counts the number of non-zero values in the tensor `a`. |
FILE:references/maxframe-client-docs/reference/tensor/special.md
# Special Functions
## Airy functions
| [`maxframe.tensor.special.airy`](generated/maxframe.tensor.special.airy.md#maxframe.tensor.special.airy) | airy(z, out=None) |
|----------------------------------------------------------------------------------------------------------------|---------------------|
| [`maxframe.tensor.special.airye`](generated/maxframe.tensor.special.airye.md#maxframe.tensor.special.airye) | airye(z, out=None) |
| [`maxframe.tensor.special.itairy`](generated/maxframe.tensor.special.itairy.md#maxframe.tensor.special.itairy) | itairy(x, out=None) |
## Information Theory functions
| [`maxframe.tensor.special.entr`](generated/maxframe.tensor.special.entr.md#maxframe.tensor.special.entr) | Elementwise function for computing entropy. |
|----------------------------------------------------------------------------------------------------------------------|------------------------------------------------------|
| [`maxframe.tensor.special.rel_entr`](generated/maxframe.tensor.special.rel_entr.md#maxframe.tensor.special.rel_entr) | Elementwise function for computing relative entropy. |
## Bessel functions
| [`maxframe.tensor.special.jv`](generated/maxframe.tensor.special.jv.md#maxframe.tensor.special.jv) | jv(v, z, out=None) |
|----------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
| [`maxframe.tensor.special.jve`](generated/maxframe.tensor.special.jve.md#maxframe.tensor.special.jve) | jve(v, z, out=None) |
| [`maxframe.tensor.special.yn`](generated/maxframe.tensor.special.yn.md#maxframe.tensor.special.yn) | Bessel function of the second kind of integer order and real argument. |
| [`maxframe.tensor.special.yv`](generated/maxframe.tensor.special.yv.md#maxframe.tensor.special.yv) | yv(v, z, out=None) |
| [`maxframe.tensor.special.yve`](generated/maxframe.tensor.special.yve.md#maxframe.tensor.special.yve) | yve(v, z, out=None) |
| [`maxframe.tensor.special.kn`](generated/maxframe.tensor.special.kn.md#maxframe.tensor.special.kn) | Modified Bessel function of the second kind of integer order n |
| [`maxframe.tensor.special.kv`](generated/maxframe.tensor.special.kv.md#maxframe.tensor.special.kv) | kv(v, z, out=None) |
| [`maxframe.tensor.special.kve`](generated/maxframe.tensor.special.kve.md#maxframe.tensor.special.kve) | kve(v, z, out=None) |
| [`maxframe.tensor.special.iv`](generated/maxframe.tensor.special.iv.md#maxframe.tensor.special.iv) | iv(v, z, out=None) |
| [`maxframe.tensor.special.ive`](generated/maxframe.tensor.special.ive.md#maxframe.tensor.special.ive) | ive(v, z, out=None) |
| [`maxframe.tensor.special.hankel1`](generated/maxframe.tensor.special.hankel1.md#maxframe.tensor.special.hankel1) | hankel1(v, z, out=None) |
| [`maxframe.tensor.special.hankel1e`](generated/maxframe.tensor.special.hankel1e.md#maxframe.tensor.special.hankel1e) | hankel1e(v, z, out=None) |
| [`maxframe.tensor.special.hankel2`](generated/maxframe.tensor.special.hankel2.md#maxframe.tensor.special.hankel2) | hankel2(v, z, out=None) |
| [`maxframe.tensor.special.hankel2e`](generated/maxframe.tensor.special.hankel2e.md#maxframe.tensor.special.hankel2e) | hankel2e(v, z, out=None) |
## Error functions and fresnel integrals
| [`maxframe.tensor.special.erf`](generated/maxframe.tensor.special.erf.md#maxframe.tensor.special.erf) | Returns the error function of complex argument. |
|-------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|
| [`maxframe.tensor.special.erfc`](generated/maxframe.tensor.special.erfc.md#maxframe.tensor.special.erfc) | Complementary error function, `1 - erf(x)`. |
| [`maxframe.tensor.special.erfcx`](generated/maxframe.tensor.special.erfcx.md#maxframe.tensor.special.erfcx) | Scaled complementary error function, `exp(x**2) * erfc(x)`. |
| [`maxframe.tensor.special.erfi`](generated/maxframe.tensor.special.erfi.md#maxframe.tensor.special.erfi) | Imaginary error function, `-i erf(i z)`. |
| [`maxframe.tensor.special.erfinv`](generated/maxframe.tensor.special.erfinv.md#maxframe.tensor.special.erfinv) | Inverse of the error function. |
| [`maxframe.tensor.special.erfcinv`](generated/maxframe.tensor.special.erfcinv.md#maxframe.tensor.special.erfcinv) | Inverse of the complementary error function. |
| [`maxframe.tensor.special.wofz`](generated/maxframe.tensor.special.wofz.md#maxframe.tensor.special.wofz) | Faddeeva function |
| [`maxframe.tensor.special.dawsn`](generated/maxframe.tensor.special.dawsn.md#maxframe.tensor.special.dawsn) | Dawson's integral. |
| [`maxframe.tensor.special.fresnel`](generated/maxframe.tensor.special.fresnel.md#maxframe.tensor.special.fresnel) | Fresnel integrals. |
| [`maxframe.tensor.special.modfresnelp`](generated/maxframe.tensor.special.modfresnelp.md#maxframe.tensor.special.modfresnelp) | modfresnelp(x, out=None) |
| [`maxframe.tensor.special.modfresnelm`](generated/maxframe.tensor.special.modfresnelm.md#maxframe.tensor.special.modfresnelm) | modfresnelm(x, out=None) |
## Ellipsoidal harmonics
| [`maxframe.tensor.special.ellip_harm`](generated/maxframe.tensor.special.ellip_harm.md#maxframe.tensor.special.ellip_harm) | Ellipsoidal harmonic functions E^p_n(l) |
|----------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|
| [`maxframe.tensor.special.ellip_harm_2`](generated/maxframe.tensor.special.ellip_harm_2.md#maxframe.tensor.special.ellip_harm_2) | Ellipsoidal harmonic functions F^p_n(l) |
| [`maxframe.tensor.special.ellip_normal`](generated/maxframe.tensor.special.ellip_normal.md#maxframe.tensor.special.ellip_normal) | Ellipsoidal harmonic normalization constants gamma^p_n |
## Elliptic functions and integrals
| [`maxframe.tensor.special.ellipk`](generated/maxframe.tensor.special.ellipk.md#maxframe.tensor.special.ellipk) | Complete elliptic integral of the first kind. |
|-------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------|
| [`maxframe.tensor.special.ellipkm1`](generated/maxframe.tensor.special.ellipkm1.md#maxframe.tensor.special.ellipkm1) | Complete elliptic integral of the first kind around m = 1 |
| [`maxframe.tensor.special.ellipkinc`](generated/maxframe.tensor.special.ellipkinc.md#maxframe.tensor.special.ellipkinc) | Incomplete elliptic integral of the first kind |
| [`maxframe.tensor.special.ellipe`](generated/maxframe.tensor.special.ellipe.md#maxframe.tensor.special.ellipe) | Complete elliptic integral of the second kind |
| [`maxframe.tensor.special.ellipeinc`](generated/maxframe.tensor.special.ellipeinc.md#maxframe.tensor.special.ellipeinc) | Incomplete elliptic integral of the second kind |
| [`maxframe.tensor.special.elliprc`](generated/maxframe.tensor.special.elliprc.md#maxframe.tensor.special.elliprc) | Degenerate symmetric elliptic integral. |
| [`maxframe.tensor.special.elliprf`](generated/maxframe.tensor.special.elliprf.md#maxframe.tensor.special.elliprf) | Completely-symmetric elliptic integral of the first kind. |
| [`maxframe.tensor.special.elliprg`](generated/maxframe.tensor.special.elliprg.md#maxframe.tensor.special.elliprg) | Completely-symmetric elliptic integral of the second kind. |
| [`maxframe.tensor.special.elliprj`](generated/maxframe.tensor.special.elliprj.md#maxframe.tensor.special.elliprj) | Symmetric elliptic integral of the third kind. |
## Gamma and related functions
| [`maxframe.tensor.special.gamma`](generated/maxframe.tensor.special.gamma.md#maxframe.tensor.special.gamma) | gamma(z, out=None) |
|----------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| [`maxframe.tensor.special.gammaln`](generated/maxframe.tensor.special.gammaln.md#maxframe.tensor.special.gammaln) | Logarithm of the absolute value of the Gamma function. |
| [`maxframe.tensor.special.loggamma`](generated/maxframe.tensor.special.loggamma.md#maxframe.tensor.special.loggamma) | loggamma(z, out=None) |
| [`maxframe.tensor.special.gammasgn`](generated/maxframe.tensor.special.gammasgn.md#maxframe.tensor.special.gammasgn) | Sign of the gamma function. |
| [`maxframe.tensor.special.gammainc`](generated/maxframe.tensor.special.gammainc.md#maxframe.tensor.special.gammainc) | Regularized lower incomplete gamma function. |
| [`maxframe.tensor.special.gammaincinv`](generated/maxframe.tensor.special.gammaincinv.md#maxframe.tensor.special.gammaincinv) | Inverse to the regularized lower incomplete gamma function. |
| [`maxframe.tensor.special.gammaincc`](generated/maxframe.tensor.special.gammaincc.md#maxframe.tensor.special.gammaincc) | Regularized upper incomplete gamma function. |
| [`maxframe.tensor.special.gammainccinv`](generated/maxframe.tensor.special.gammainccinv.md#maxframe.tensor.special.gammainccinv) | Inverse of the regularized upper incomplete gamma function. |
| [`maxframe.tensor.special.beta`](generated/maxframe.tensor.special.beta.md#maxframe.tensor.special.beta) | Beta function. |
| [`maxframe.tensor.special.betaln`](generated/maxframe.tensor.special.betaln.md#maxframe.tensor.special.betaln) | Natural logarithm of absolute value of beta function. |
| [`maxframe.tensor.special.betainc`](generated/maxframe.tensor.special.betainc.md#maxframe.tensor.special.betainc) | Regularized incomplete beta function. |
| [`maxframe.tensor.special.betaincinv`](generated/maxframe.tensor.special.betaincinv.md#maxframe.tensor.special.betaincinv) | Inverse of the regularized incomplete beta function. |
| [`maxframe.tensor.special.psi`](generated/maxframe.tensor.special.psi.md#maxframe.tensor.special.psi) | psi(z, out=None) |
| [`maxframe.tensor.special.rgamma`](generated/maxframe.tensor.special.rgamma.md#maxframe.tensor.special.rgamma) | rgamma(z, out=None) |
| [`maxframe.tensor.special.polygamma`](generated/maxframe.tensor.special.polygamma.md#maxframe.tensor.special.polygamma) | Polygamma functions. |
| [`maxframe.tensor.special.multigammaln`](generated/maxframe.tensor.special.multigammaln.md#maxframe.tensor.special.multigammaln) | Returns the log of multivariate gamma, also sometimes called the generalized gamma. |
| [`maxframe.tensor.special.digamma`](generated/maxframe.tensor.special.digamma.md#maxframe.tensor.special.digamma) | psi(z, out=None) |
| [`maxframe.tensor.special.poch`](generated/maxframe.tensor.special.poch.md#maxframe.tensor.special.poch) | Pochhammer symbol. |
## Sigmoidal functions
| [`maxframe.tensor.special.expit`](generated/maxframe.tensor.special.expit.md#maxframe.tensor.special.expit) | expit(x, out=None) |
|-------------------------------------------------------------------------------------------------------------------------|------------------------|
| [`maxframe.tensor.special.log_expit`](generated/maxframe.tensor.special.log_expit.md#maxframe.tensor.special.log_expit) | log_expit(x, out=None) |
| [`maxframe.tensor.special.logit`](generated/maxframe.tensor.special.logit.md#maxframe.tensor.special.logit) | |
## Other special functions
| [`maxframe.tensor.special.softmax`](generated/maxframe.tensor.special.softmax.md#maxframe.tensor.special.softmax) | Compute the softmax function. The softmax function transforms each element of a collection by computing the exponential of each element divided by the sum of the exponentials of all the elements. |
|----------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [`maxframe.tensor.special.softplus`](generated/maxframe.tensor.special.softplus.md#maxframe.tensor.special.softplus) | Compute the softplus function element-wise. |
## Convenience functions
| [`maxframe.tensor.special.xlogy`](generated/maxframe.tensor.special.xlogy.md#maxframe.tensor.special.xlogy) | Compute `x*log(y)` so that the result is 0 if `x = 0`. |
|---------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|
FILE:references/maxframe-client-docs/reference/tensor/statistics.md
# Statistics
## Order statistics
| [`maxframe.tensor.ptp`](generated/maxframe.tensor.ptp.md#maxframe.tensor.ptp) | Range of values (maximum - minimum) along an axis. |
|----------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| [`maxframe.tensor.percentile`](generated/maxframe.tensor.percentile.md#maxframe.tensor.percentile) | Compute the q-th percentile of the data along the specified axis. |
| [`maxframe.tensor.quantile`](generated/maxframe.tensor.quantile.md#maxframe.tensor.quantile) | Compute the q-th quantile of the data along the specified axis. |
| [`maxframe.tensor.nanmin`](generated/maxframe.tensor.nanmin.md#maxframe.tensor.nanmin) | Return minimum of a tensor or minimum along an axis, ignoring any NaNs. |
| [`maxframe.tensor.nanmax`](generated/maxframe.tensor.nanmax.md#maxframe.tensor.nanmax) | Return the maximum of a tensor or maximum along an axis, ignoring any NaNs. |
## Average and variances
| [`maxframe.tensor.median`](generated/maxframe.tensor.median.md#maxframe.tensor.median) | Compute the median along the specified axis. |
|-------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|
| [`maxframe.tensor.average`](generated/maxframe.tensor.average.md#maxframe.tensor.average) | Compute the weighted average along the specified axis. |
| [`maxframe.tensor.mean`](generated/maxframe.tensor.mean.md#maxframe.tensor.mean) | Compute the arithmetic mean along the specified axis. |
| [`maxframe.tensor.std`](generated/maxframe.tensor.std.md#maxframe.tensor.std) | Compute the standard deviation along the specified axis. |
| [`maxframe.tensor.var`](generated/maxframe.tensor.var.md#maxframe.tensor.var) | Compute the variance along the specified axis. |
| [`maxframe.tensor.nanmean`](generated/maxframe.tensor.nanmean.md#maxframe.tensor.nanmean) | Compute the arithmetic mean along the specified axis, ignoring NaNs. |
| [`maxframe.tensor.nanstd`](generated/maxframe.tensor.nanstd.md#maxframe.tensor.nanstd) | Compute the standard deviation along the specified axis, while ignoring NaNs. |
| [`maxframe.tensor.nanvar`](generated/maxframe.tensor.nanvar.md#maxframe.tensor.nanvar) | Compute the variance along the specified axis, while ignoring NaNs. |
## Correlating
| [`maxframe.tensor.corrcoef`](generated/maxframe.tensor.corrcoef.md#maxframe.tensor.corrcoef) | Return Pearson product-moment correlation coefficients. |
|------------------------------------------------------------------------------------------------|-----------------------------------------------------------|
| [`maxframe.tensor.cov`](generated/maxframe.tensor.cov.md#maxframe.tensor.cov) | Estimate a covariance matrix, given data and weights. |
## Histograms
| [`maxframe.tensor.histogram`](generated/maxframe.tensor.histogram.md#maxframe.tensor.histogram) | Compute the histogram of a set of data. |
|-------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| [`maxframe.tensor.bincount`](generated/maxframe.tensor.bincount.md#maxframe.tensor.bincount) | Count number of occurrences of each value in array of non-negative ints. |
| [`maxframe.tensor.histogram_bin_edges`](generated/maxframe.tensor.histogram_bin_edges.md#maxframe.tensor.histogram_bin_edges) | Function to calculate only the edges of the bins used by the histogram function. |
| [`maxframe.tensor.digitize`](generated/maxframe.tensor.digitize.md#maxframe.tensor.digitize) | Return the indices of the bins to which each value in input tensor belongs. |
FILE:references/maxframe-client-docs/user_guide/dataframe/index.md
# DataFrame
* [Data input and output](io.md)
* [MaxCompute](io.md#maxcompute)
* [pandas objects](io.md#pandas-objects)
* [Supported pandas APIs](supported_pd_apis.md)
* [Series API](supported_pd_apis.md#series-api)
* [DataFrame API](supported_pd_apis.md#dataframe-api)
* [Index API](supported_pd_apis.md#index-api)
FILE:references/maxframe-client-docs/user_guide/dataframe/io.md
# Data input and output
## MaxCompute
### Read and convert to tables
Users can create MaxFrame DataFrame objects from MaxCompute tables with [`read_odps_table()`](../../reference/dataframe/generated/maxframe.dataframe.read_odps_table.md#maxframe.dataframe.read_odps_table)
and store computed results into MaxCompute tables with [`DataFrame.to_odps_table()`](../../reference/dataframe/generated/maxframe.dataframe.DataFrame.to_odps_table.md#maxframe.dataframe.DataFrame.to_odps_table).
For instance, to read data from the non-partitioned table `test_odps_table`,
transform it with MaxFrame, and store the result in another partitioned table
`test_processed_odps_table`, you can use [`read_odps_table()`](../../reference/dataframe/generated/maxframe.dataframe.read_odps_table.md#maxframe.dataframe.read_odps_table) as shown below.
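```python
import maxframe.dataframe as md
# Reference the source table.
df = md.read_odps_table("test_odps_table")
# Keep only the rows where column A exceeds 10.
processed_df = df[df.A > 10]
# Write the result; execute() triggers the computation, as in the examples further below.
processed_df.to_odps_table("test_processed_odps_table").execute()
```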
If the table is partitioned, `read_odps_table` reads data from all partitions, which
should be avoided when the table has many partitions. You can select one
or more partitions with the `partitions` argument.
```python
df = md.read_odps_table(
"parted_odps_table", partitions=["pt1=20240119,pt2=10", "pt1=20240119,pt2=11"]
)
```
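The partition strings above use a comma-separated `name=value` form. The following is a minimal sketch of building such strings from dictionaries. The helpers `partition_spec` and `partition_specs` are hypothetical names introduced for illustration and are not part of MaxFrame.

```python
from typing import List, Mapping, Sequence


def partition_spec(values: Mapping[str, object]) -> str:
    """Join partition column/value pairs into a 'k1=v1,k2=v2' string.

    Hypothetical helper for illustration; not a MaxFrame API.
    Dict insertion order determines the order of the partition levels.
    """
    return ",".join(f"{name}={value}" for name, value in values.items())


def partition_specs(rows: Sequence[Mapping[str, object]]) -> List[str]:
    """Build one partition string per mapping."""
    return [partition_spec(row) for row in rows]


specs = partition_specs(
    [
        {"pt1": "20240119", "pt2": 10},
        {"pt1": "20240119", "pt2": 11},
    ]
)
print(specs)  # ['pt1=20240119,pt2=10', 'pt1=20240119,pt2=11']
```

The resulting list has the same shape as the value passed to `partitions` in the example above.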
Values of partition columns are not included in results by default. If you need these values,
you may specify `append_partitions=True`.
```python
df = md.read_odps_table(
"parted_odps_table", partitions=["pt1=20240119,pt2=10"], append_partitions=True
)
```
The resulting DataFrame has a RangeIndex by default. You can use the `index_col`
argument to specify existing columns as indexes.
```python
df = md.read_odps_table(
"parted_odps_table", partitions=["pt1=20240119,pt2=10"], index_col=["idx_col"]
)
```
If you want to store the preprocessed `df` into a MaxCompute table, you can use `to_odps_table()`
as shown below.
```python
df.to_odps_table("output_table_name").execute()
```
You can control how the index is written via the `index` and `index_label` arguments.
By default, the index is written to the table. If you do not want to write the index, set the
`index` argument to `False`.
```python
df.to_odps_table("output_table_name", index=False).execute()
```
By default, the index columns are named after the indexes. If an index has no name,
the name `index` is used when the index has only one level; otherwise
`level_x` is used, where `x` is the integer index of the level.
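The naming rule above can be sketched in plain Python. This is an illustration only: `index_column_names` is a hypothetical helper and not a MaxFrame API. It assumes that level numbering starts at 0, as in pandas, and that named levels of a multi-level index keep their names.

```python
from typing import List, Optional, Sequence


def index_column_names(index_names: Sequence[Optional[str]]) -> List[str]:
    """Return the default column name for each index level.

    `index_names` holds one entry per index level; None marks an unnamed level.
    Hypothetical helper; zero-based level numbering is an assumption.
    """
    single_level = len(index_names) == 1
    result = []
    for level, name in enumerate(index_names):
        if name is not None:
            # A named index level keeps its own name.
            result.append(name)
        elif single_level:
            # An unnamed single-level index falls back to "index".
            result.append("index")
        else:
            # An unnamed level of a multi-level index uses its position.
            result.append(f"level_{level}")
    return result


print(index_column_names([None]))          # ['index']
print(index_column_names(["idx_col"]))     # ['idx_col']
print(index_column_names([None, None]))    # ['level_0', 'level_1']
```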
Data can be stored in partitioned tables. Use the `partition` argument to specify the partition
to write to.
```python
df.to_odps_table("parted_table", partition="pt=20240121,h=12").execute()
```
You can also specify columns as partition columns. The values in these columns determine
dynamically which partition each row is written to.
```python
df.to_odps_table("parted_table", partition_col=["pt_col"]).execute()
```
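To illustrate the idea of dynamic partitioning, the sketch below groups rows in plain Python by the values of their partition columns. It is a conceptual model only and does not reflect how MaxFrame or MaxCompute implement the write. The function `route_rows` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def route_rows(rows: Sequence[dict], partition_cols: Sequence[str]) -> Dict[str, List[dict]]:
    """Group rows by the partition selected by their partition-column values.

    Conceptual illustration only; not a MaxFrame API.
    """
    routed = defaultdict(list)
    for row in rows:
        # Each row's own values decide which partition it belongs to.
        spec = ",".join(f"{col}={row[col]}" for col in partition_cols)
        routed[spec].append(row)
    return dict(routed)


rows = [
    {"A": 1, "pt_col": "20240121"},
    {"A": 2, "pt_col": "20240122"},
    {"A": 3, "pt_col": "20240121"},
]
routed = route_rows(rows, ["pt_col"])
print(sorted(routed))  # ['pt_col=20240121', 'pt_col=20240122']
```

Rows that share a value in `pt_col` end up in the same group, which mirrors how a single write can populate several partitions.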
### Read from SQL queries
Users can create MaxFrame DataFrame objects from MaxCompute queries with [`read_odps_query()`](../../reference/dataframe/generated/maxframe.dataframe.read_odps_query.md#maxframe.dataframe.read_odps_query).
MaxFrame retrieves the DataFrame schema with an [EXPLAIN](https://www.alibabacloud.com/help/zh/maxcompute/user-guide/explain) statement and executes
the queries in MaxCompute.
For instance, you can create a DataFrame with the SQL statement below.
```python
md_df = md.read_odps_query(
"SELECT a.shop_name AS ashop, b.shop_name AS bshop FROM sale_detail_jt a "
"RIGHT OUTER JOIN sale_detail b ON a.shop_name=b.shop_name"
)
```
## pandas objects
Users can convert between local pandas objects and DataFrames with [`read_pandas()`](../../reference/dataframe/generated/maxframe.dataframe.read_pandas.md#maxframe.dataframe.read_pandas)
and [`DataFrame.to_pandas()`](../../reference/dataframe/generated/maxframe.dataframe.DataFrame.to_pandas.md#maxframe.dataframe.DataFrame.to_pandas).
When `read_pandas` is called, these pandas objects will be uploaded to MaxCompute and
be used in the cluster.
```python
md_df = md.read_pandas(pd_df)
```
After the transformation is done in MaxFrame, data can be downloaded to the client with `to_pandas`.
```python
pd_df = md_df.to_pandas()
```
FILE:references/maxframe-client-docs/user_guide/dataframe/supported_pd_apis.md
# Supported pandas APIs
The tables below show which pandas APIs MaxFrame implements on each engine.
If an API is not fully supported, the unsupported items are listed in the Details
column.
## Series API
| API | SQL Engine | DPE | SPE | Details |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|-------|-------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| [`add()`](../../reference/dataframe/generated/maxframe.dataframe.Series.add.md#maxframe.dataframe.Series.add), [`radd()`](../../reference/dataframe/generated/maxframe.dataframe.Series.radd.md#maxframe.dataframe.Series.radd) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`add_prefix()`](../../reference/dataframe/generated/maxframe.dataframe.Series.add_prefix.md#maxframe.dataframe.Series.add_prefix) | Y | Y | Y | |
| [`add_suffix()`](../../reference/dataframe/generated/maxframe.dataframe.Series.add_suffix.md#maxframe.dataframe.Series.add_suffix) | Y | Y | Y | |
| [`agg()`](../../reference/dataframe/generated/maxframe.dataframe.Series.agg.md#maxframe.dataframe.Series.agg) | P | Y | Y | SQL engine: customized aggregation not supported. |
| [`align()`](../../reference/dataframe/generated/maxframe.dataframe.Series.align.md#maxframe.dataframe.Series.align) | N | Y | Y | |
| [`all()`](../../reference/dataframe/generated/maxframe.dataframe.Series.all.md#maxframe.dataframe.Series.all) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`any()`](../../reference/dataframe/generated/maxframe.dataframe.Series.any.md#maxframe.dataframe.Series.any) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`append()`](../../reference/dataframe/generated/maxframe.dataframe.Series.append.md#maxframe.dataframe.Series.append) | P | Y | Y | |
| [`apply()`](../../reference/dataframe/generated/maxframe.dataframe.Series.apply.md#maxframe.dataframe.Series.apply) | P | Y | Y | |
| [`argmax()`](../../reference/dataframe/generated/maxframe.dataframe.Series.argmax.md#maxframe.dataframe.Series.argmax) | N | Y | N | |
| [`argmin()`](../../reference/dataframe/generated/maxframe.dataframe.Series.argmin.md#maxframe.dataframe.Series.argmin) | N | Y | N | |
| [`argsort()`](../../reference/dataframe/generated/maxframe.dataframe.Series.argsort.md#maxframe.dataframe.Series.argsort) | N | Y | N | |
| [`astype()`](../../reference/dataframe/generated/maxframe.dataframe.Series.astype.md#maxframe.dataframe.Series.astype) | P | Y | Y | SQL engine: converting to categorical types not supported. |
| `autocorr()` | N | P | Y | DPE engine: only Pearson correlation coefficient is supported. |
| [`between()`](../../reference/dataframe/generated/maxframe.dataframe.Series.between.md#maxframe.dataframe.Series.between) | Y | Y | Y | |
| [`case_when()`](../../reference/dataframe/generated/maxframe.dataframe.Series.case_when.md#maxframe.dataframe.Series.case_when) | Y | N | Y | |
| [`clip()`](../../reference/dataframe/generated/maxframe.dataframe.Series.clip.md#maxframe.dataframe.Series.clip) | N | Y | Y | |
| [`compare()`](../../reference/dataframe/generated/maxframe.dataframe.Series.compare.md#maxframe.dataframe.Series.compare) | N | Y | Y | |
| [`corr()`](../../reference/dataframe/generated/maxframe.dataframe.Series.corr.md#maxframe.dataframe.Series.corr) | N | P | Y | DPE engine: only Pearson correlation coefficient is supported. |
| [`count()`](../../reference/dataframe/generated/maxframe.dataframe.Series.count.md#maxframe.dataframe.Series.count) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`cov()`](../../reference/dataframe/generated/maxframe.dataframe.Series.cov.md#maxframe.dataframe.Series.cov) | N | Y | Y | |
| `cummax()` | N | Y | Y | |
| `cummin()` | N | Y | Y | |
| `cumprod()` | N | Y | Y | |
| `cumsum()` | N | Y | Y | |
| [`describe()`](../../reference/dataframe/generated/maxframe.dataframe.Series.describe.md#maxframe.dataframe.Series.describe) | Y | P | Y | DPE engine: medians and percentiles not yet supported. |
| `diff()` | N | Y | Y | |
| [`div()`](../../reference/dataframe/generated/maxframe.dataframe.Series.div.md#maxframe.dataframe.Series.div), [`rdiv()`](../../reference/dataframe/generated/maxframe.dataframe.Series.rdiv.md#maxframe.dataframe.Series.rdiv) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| `dot()` | N | Y | Y | |
| [`drop()`](../../reference/dataframe/generated/maxframe.dataframe.Series.drop.md#maxframe.dataframe.Series.drop) | Y | Y | Y | |
| [`drop_duplicates()`](../../reference/dataframe/generated/maxframe.dataframe.Series.drop_duplicates.md#maxframe.dataframe.Series.drop_duplicates) | P | Y | Y | SQL engine: maintaining original order of data not supported. |
| [`droplevel()`](../../reference/dataframe/generated/maxframe.dataframe.Series.droplevel.md#maxframe.dataframe.Series.droplevel) | N | Y | Y | |
| [`dropna()`](../../reference/dataframe/generated/maxframe.dataframe.Series.dropna.md#maxframe.dataframe.Series.dropna) | Y | Y | Y | |
| `duplicated()` | N | Y | Y | |
| `empty()` | Y | Y | Y | |
| [`eq()`](../../reference/dataframe/generated/maxframe.dataframe.Series.eq.md#maxframe.dataframe.Series.eq), [`ne()`](../../reference/dataframe/generated/maxframe.dataframe.Series.ne.md#maxframe.dataframe.Series.ne) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`explode()`](../../reference/dataframe/generated/maxframe.dataframe.Series.explode.md#maxframe.dataframe.Series.explode) | Y | Y | Y | |
| [`fillna()`](../../reference/dataframe/generated/maxframe.dataframe.Series.fillna.md#maxframe.dataframe.Series.fillna) | P | Y | Y | SQL engine: argument `downcast`, `limit` and `method` not supported. |
| [`filter()`](../../reference/dataframe/generated/maxframe.dataframe.Series.filter.md#maxframe.dataframe.Series.filter) | N | Y | Y | |
| [`first_valid_index()`](../../reference/dataframe/generated/maxframe.dataframe.Series.first_valid_index.md#maxframe.dataframe.Series.first_valid_index) | N | Y | Y | |
| [`floordiv()`](../../reference/dataframe/generated/maxframe.dataframe.Series.floordiv.md#maxframe.dataframe.Series.floordiv), [`rfloordiv()`](../../reference/dataframe/generated/maxframe.dataframe.Series.rfloordiv.md#maxframe.dataframe.Series.rfloordiv) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`ge()`](../../reference/dataframe/generated/maxframe.dataframe.Series.ge.md#maxframe.dataframe.Series.ge), [`gt()`](../../reference/dataframe/generated/maxframe.dataframe.Series.gt.md#maxframe.dataframe.Series.gt) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| `series[item]` (or `__getitem__`) | N | Y | Y | |
| `hasnans()` | N | Y | Y | |
| [`head()`](../../reference/dataframe/generated/maxframe.dataframe.Series.head.md#maxframe.dataframe.Series.head) | Y | Y | Y | |
| `hist()` | Y | Y | Y | |
| [`iat()`](../../reference/dataframe/generated/maxframe.dataframe.Series.iat.md#maxframe.dataframe.Series.iat) | N | Y | Y | |
| [`idxmax()`](../../reference/dataframe/generated/maxframe.dataframe.Series.idxmax.md#maxframe.dataframe.Series.idxmax) | N | Y | Y | |
| [`idxmin()`](../../reference/dataframe/generated/maxframe.dataframe.Series.idxmin.md#maxframe.dataframe.Series.idxmin) | N | Y | Y | |
| [`iloc()`](../../reference/dataframe/generated/maxframe.dataframe.Series.iloc.md#maxframe.dataframe.Series.iloc) | P | Y | Y | SQL engine: Non-continuous indexes or negative indexes (for instance, `df.iloc[[1, 3]]`, `df.iloc[1:10:2]` or `df.iloc[-3:]`) not supported. |
| `is_monotonic()`, [`is_monotonic_decreasing()`](../../reference/dataframe/generated/maxframe.dataframe.Series.is_monotonic_decreasing.md#maxframe.dataframe.Series.is_monotonic_decreasing), [`is_monotonic_increasing()`](../../reference/dataframe/generated/maxframe.dataframe.Series.is_monotonic_increasing.md#maxframe.dataframe.Series.is_monotonic_increasing) | N | Y | Y | |
| [`is_unique()`](../../reference/dataframe/generated/maxframe.dataframe.Series.is_unique.md#maxframe.dataframe.Series.is_unique) | N | Y | Y | |
| [`isin()`](../../reference/dataframe/generated/maxframe.dataframe.Series.isin.md#maxframe.dataframe.Series.isin) | P | Y | Y | SQL engine: index input not supported. |
| [`isna()`](../../reference/dataframe/generated/maxframe.dataframe.Series.isna.md#maxframe.dataframe.Series.isna), [`notna()`](../../reference/dataframe/generated/maxframe.dataframe.Series.notna.md#maxframe.dataframe.Series.notna) | Y | Y | Y | |
| `isnull()`, `notnull()` | Y | Y | Y | |
| `items()` | Y | Y | Y | |
| `kurtosis()` | N | Y | Y | |
| [`last_valid_index()`](../../reference/dataframe/generated/maxframe.dataframe.Series.last_valid_index.md#maxframe.dataframe.Series.last_valid_index) | N | Y | Y | |
| [`le()`](../../reference/dataframe/generated/maxframe.dataframe.Series.le.md#maxframe.dataframe.Series.le), [`lt()`](../../reference/dataframe/generated/maxframe.dataframe.Series.lt.md#maxframe.dataframe.Series.lt) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`loc()`](../../reference/dataframe/generated/maxframe.dataframe.Series.loc.md#maxframe.dataframe.Series.loc) | N | Y | Y | |
| [`map()`](../../reference/dataframe/generated/maxframe.dataframe.Series.map.md#maxframe.dataframe.Series.map) | P | Y | Y | SQL engine: argument `arg` only supports functions and non-derivative dicts with simple scalars. |
| [`mask()`](../../reference/dataframe/generated/maxframe.dataframe.Series.mask.md#maxframe.dataframe.Series.mask) | N | Y | Y | |
| [`max()`](../../reference/dataframe/generated/maxframe.dataframe.Series.max.md#maxframe.dataframe.Series.max) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`mean()`](../../reference/dataframe/generated/maxframe.dataframe.Series.mean.md#maxframe.dataframe.Series.mean) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`memory_usage()`](../../reference/dataframe/generated/maxframe.dataframe.Series.memory_usage.md#maxframe.dataframe.Series.memory_usage) | N | Y | Y | |
| [`min()`](../../reference/dataframe/generated/maxframe.dataframe.Series.min.md#maxframe.dataframe.Series.min) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`mod()`](../../reference/dataframe/generated/maxframe.dataframe.Series.mod.md#maxframe.dataframe.Series.mod), [`rmod()`](../../reference/dataframe/generated/maxframe.dataframe.Series.rmod.md#maxframe.dataframe.Series.rmod) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`mul()`](../../reference/dataframe/generated/maxframe.dataframe.Series.mul.md#maxframe.dataframe.Series.mul), [`rmul()`](../../reference/dataframe/generated/maxframe.dataframe.Series.rmul.md#maxframe.dataframe.Series.rmul) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`nunique()`](../../reference/dataframe/generated/maxframe.dataframe.Series.nunique.md#maxframe.dataframe.Series.nunique) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. `dropna==False` not supported. |
| `pct_change()` | N | Y | Y | |
| [`plot()`](../../reference/dataframe/generated/maxframe.dataframe.Series.plot.md#maxframe.dataframe.Series.plot) | Y | Y | Y | |
| [`pow()`](../../reference/dataframe/generated/maxframe.dataframe.Series.pow.md#maxframe.dataframe.Series.pow), [`rpow()`](../../reference/dataframe/generated/maxframe.dataframe.Series.rpow.md#maxframe.dataframe.Series.rpow) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`prod()`](../../reference/dataframe/generated/maxframe.dataframe.Series.prod.md#maxframe.dataframe.Series.prod) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`quantile()`](../../reference/dataframe/generated/maxframe.dataframe.Series.quantile.md#maxframe.dataframe.Series.quantile) | Y | Y | Y | |
| [`reindex()`](../../reference/dataframe/generated/maxframe.dataframe.Series.reindex.md#maxframe.dataframe.Series.reindex) | P | Y | Y | |
| [`reindex_like()`](../../reference/dataframe/generated/maxframe.dataframe.Series.reindex_like.md#maxframe.dataframe.Series.reindex_like) | P | Y | Y | |
| [`rename()`](../../reference/dataframe/generated/maxframe.dataframe.Series.rename.md#maxframe.dataframe.Series.rename) | P | Y | Y | SQL engine: renaming indexes not supported. |
| `rename_axis()` | Y | Y | Y | |
| [`reorder_levels()`](../../reference/dataframe/generated/maxframe.dataframe.Series.reorder_levels.md#maxframe.dataframe.Series.reorder_levels) | N | Y | Y | |
| `replace()` | P | Y | Y | SQL engine: when the argument `regex` is True, `list` or `dict` typed `to_replace` not supported. `list` or `dict` typed `regex` argument not supported. |
| [`reset_index()`](../../reference/dataframe/generated/maxframe.dataframe.Series.reset_index.md#maxframe.dataframe.Series.reset_index) | P | Y | Y | |
| [`round()`](../../reference/dataframe/generated/maxframe.dataframe.Series.round.md#maxframe.dataframe.Series.round) | Y | Y | Y | |
| [`sample()`](../../reference/dataframe/generated/maxframe.dataframe.Series.sample.md#maxframe.dataframe.Series.sample) | P | Y | Y | SQL engine: argument `replace` and `weights` not supported. `frac>1` not supported. |
| [`sem()`](../../reference/dataframe/generated/maxframe.dataframe.Series.sem.md#maxframe.dataframe.Series.sem) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`set_axis()`](../../reference/dataframe/generated/maxframe.dataframe.Series.set_axis.md#maxframe.dataframe.Series.set_axis) | N | Y | Y | |
| `series[item] = value` (or `__setitem__`) | P | Y | Y | SQL engine: not supported when `item` is callable, when `item` is a DataFrame / Series, or when the indexes of `series` and `item` differ. |
| [`shift()`](../../reference/dataframe/generated/maxframe.dataframe.Series.shift.md#maxframe.dataframe.Series.shift) | N | Y | Y | |
| `size()` | P | Y | Y | |
| `skew()` | N | Y | Y | |
| [`sort_index()`](../../reference/dataframe/generated/maxframe.dataframe.Series.sort_index.md#maxframe.dataframe.Series.sort_index) | P | Y | Y | SQL engine: `na_position=='last'` not supported. |
| [`sort_values()`](../../reference/dataframe/generated/maxframe.dataframe.Series.sort_values.md#maxframe.dataframe.Series.sort_values) | P | Y | Y | SQL engine: `na_position=='last'` not supported. |
| [`std()`](../../reference/dataframe/generated/maxframe.dataframe.Series.std.md#maxframe.dataframe.Series.std) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`sub()`](../../reference/dataframe/generated/maxframe.dataframe.Series.sub.md#maxframe.dataframe.Series.sub), [`rsub()`](../../reference/dataframe/generated/maxframe.dataframe.Series.rsub.md#maxframe.dataframe.Series.rsub) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`sum()`](../../reference/dataframe/generated/maxframe.dataframe.Series.sum.md#maxframe.dataframe.Series.sum) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`swaplevel()`](../../reference/dataframe/generated/maxframe.dataframe.Series.swaplevel.md#maxframe.dataframe.Series.swaplevel) | N | Y | Y | |
| `tail()` | N | Y | Y | |
| [`take()`](../../reference/dataframe/generated/maxframe.dataframe.Series.take.md#maxframe.dataframe.Series.take) | N | Y | Y | |
| [`to_frame()`](../../reference/dataframe/generated/maxframe.dataframe.Series.to_frame.md#maxframe.dataframe.Series.to_frame) | Y | Y | Y | |
| [`transform()`](../../reference/dataframe/generated/maxframe.dataframe.Series.transform.md#maxframe.dataframe.Series.transform) | P | Y | Y | SQL engine: not supported when `func` is dict-like or list-like. |
| [`truediv()`](../../reference/dataframe/generated/maxframe.dataframe.Series.truediv.md#maxframe.dataframe.Series.truediv), [`rtruediv()`](../../reference/dataframe/generated/maxframe.dataframe.Series.rtruediv.md#maxframe.dataframe.Series.rtruediv) | P | Y | Y | SQL engine: argument `level` and `fill_value` not supported. |
| [`tshift()`](../../reference/dataframe/generated/maxframe.dataframe.Series.tshift.md#maxframe.dataframe.Series.tshift) | N | Y | Y | |
| [`unique()`](../../reference/dataframe/generated/maxframe.dataframe.Series.unique.md#maxframe.dataframe.Series.unique) | P | Y | Y | |
| [`unstack()`](../../reference/dataframe/generated/maxframe.dataframe.Series.unstack.md#maxframe.dataframe.Series.unstack) | N | Y | Y | |
| [`value_counts()`](../../reference/dataframe/generated/maxframe.dataframe.Series.value_counts.md#maxframe.dataframe.Series.value_counts) | Y | Y | Y | |
| [`var()`](../../reference/dataframe/generated/maxframe.dataframe.Series.var.md#maxframe.dataframe.Series.var) | P | Y | Y | SQL engine: argument `skipna`, `level` and `min_count` not supported. |
| [`where()`](../../reference/dataframe/generated/maxframe.dataframe.Series.where.md#maxframe.dataframe.Series.where) | N | Y | Y | |
| [`xs()`](../../reference/dataframe/generated/maxframe.dataframe.Series.xs.md#maxframe.dataframe.Series.xs) | N | Y | Y | |
## DataFrame API
## Index API
| API | SQL Engine | DPE | SPE | Details |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|-------|-------|----------------------------------------------------------------------------------------------------------------------------------------------|
| [`all()`](../../reference/dataframe/generated/maxframe.dataframe.Index.all.md#maxframe.dataframe.Index.all) | N | Y | Y | |
| [`any()`](../../reference/dataframe/generated/maxframe.dataframe.Index.any.md#maxframe.dataframe.Index.any) | N | Y | Y | |
| `append()` | Y | Y | Y | |
| [`astype()`](../../reference/dataframe/generated/maxframe.dataframe.Index.astype.md#maxframe.dataframe.Index.astype) | P | Y | Y | SQL engine: converting to categorical types not supported. |
| `count()` | N | Y | Y | |
| [`drop()`](../../reference/dataframe/generated/maxframe.dataframe.Index.drop.md#maxframe.dataframe.Index.drop) | Y | Y | Y | |
| [`drop_duplicates()`](../../reference/dataframe/generated/maxframe.dataframe.Index.drop_duplicates.md#maxframe.dataframe.Index.drop_duplicates) | P | Y | Y | SQL engine: maintaining original order of data not supported. |
| [`droplevel()`](../../reference/dataframe/generated/maxframe.dataframe.Index.droplevel.md#maxframe.dataframe.Index.droplevel) | N | Y | Y | |
| [`dropna()`](../../reference/dataframe/generated/maxframe.dataframe.Index.dropna.md#maxframe.dataframe.Index.dropna) | Y | Y | Y | |
| `duplicated()` | N | Y | Y | |
| `empty()` | Y | Y | Y | |
| [`fillna()`](../../reference/dataframe/generated/maxframe.dataframe.Index.fillna.md#maxframe.dataframe.Index.fillna) | P | Y | Y | SQL engine: argument `downcast`, `limit` and `method` not supported. |
| `index[item]` (or `__getitem__`) | N | Y | Y | |
| [`get_level_values()`](../../reference/dataframe/generated/maxframe.dataframe.Index.get_level_values.md#maxframe.dataframe.Index.get_level_values) | N | Y | Y | |
| `iloc()` | P | Y | Y | SQL engine: Non-continuous indexes or negative indexes (for instance, `df.iloc[[1, 3]]`, `df.iloc[1:10:2]` or `df.iloc[-3:]`) not supported. |
| `isin()` | N | Y | Y | |
| [`isna()`](../../reference/dataframe/generated/maxframe.dataframe.Index.isna.md#maxframe.dataframe.Index.isna), [`notna()`](../../reference/dataframe/generated/maxframe.dataframe.Index.notna.md#maxframe.dataframe.Index.notna) | Y | Y | Y | |
| `isnull()`, `notnull()` | Y | Y | Y | |
| `map()` | P | Y | Y | SQL engine: argument `arg` only supports functions and non-derivative dicts with simple scalars. |
| [`max()`](../../reference/dataframe/generated/maxframe.dataframe.Index.max.md#maxframe.dataframe.Index.max) | N | Y | Y | |
| `memory_usage()` | N | Y | Y | |
| `mean()` | N | Y | Y | |
| [`min()`](../../reference/dataframe/generated/maxframe.dataframe.Index.min.md#maxframe.dataframe.Index.min) | N | Y | Y | |
| `nunique()` | N | Y | Y | |
| `reindex()` | N | Y | Y | |
| [`rename()`](../../reference/dataframe/generated/maxframe.dataframe.Index.rename.md#maxframe.dataframe.Index.rename) | Y | Y | Y | |
| [`set_names()`](../../reference/dataframe/generated/maxframe.dataframe.Index.set_names.md#maxframe.dataframe.Index.set_names) | Y | Y | Y | |
| `shift()` | N | Y | Y | |
| `sort_values()` | N | Y | Y | |
| [`to_frame()`](../../reference/dataframe/generated/maxframe.dataframe.Index.to_frame.md#maxframe.dataframe.Index.to_frame) | Y | Y | Y | |
| [`to_series()`](../../reference/dataframe/generated/maxframe.dataframe.Index.to_series.md#maxframe.dataframe.Index.to_series) | Y | Y | Y | |
| `unique()` | N | Y | Y | |
| `value_counts()` | N | Y | Y | |
| `where()` | N | Y | Y | |
FILE:references/maxframe-client-docs/user_guide/general/execution.md
# Executing and getting results
## Lazy execution
MaxFrame evaluates expressions lazily so that it can optimize the whole computation
globally. An expression is not executed until you request execution explicitly or
its result is needed locally on the client. For instance,
```python
>>> df.head(3)
DataFrame <op=DataFrameILoc, key=182b756be8a9f15c937a04223f11ffba>
```
is not executed, while
```python
>>> df.head(3).execute()
0 1 2
0 0.167771 0.568741 0.877450
1 0.037518 0.796745 0.072169
2 0.052900 0.936048 0.307194
```
will trigger execution. The following actions trigger it:
* Direct `execute()` calls.
* [`maxframe.dataframe.DataFrame.to_pandas()`](../../reference/dataframe/generated/maxframe.dataframe.DataFrame.to_pandas.md#maxframe.dataframe.DataFrame.to_pandas) calls.
* All plot functions for DataFrame and Series, including [`maxframe.dataframe.DataFrame.plot()`](../../reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.md#maxframe.dataframe.DataFrame.plot),
[`maxframe.dataframe.DataFrame.plot.bar()`](../../reference/dataframe/generated/maxframe.dataframe.DataFrame.plot.bar.md#maxframe.dataframe.DataFrame.plot.bar) and so on.
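
The difference between building an expression and executing it can be pictured with a toy model in plain Python. This is only a sketch of the idea: `LazyValue` and its methods are hypothetical names invented here and are not part of MaxFrame.

```python
class LazyValue:
    """Toy stand-in for a lazily evaluated expression (not a MaxFrame class)."""

    def __init__(self, compute, description):
        self._compute = compute          # zero-argument callable doing the real work
        self.description = description   # the recorded plan
        self.executed = False

    def map(self, func, name):
        # Building a new node records the step but runs nothing yet.
        return LazyValue(
            lambda: [func(v) for v in self.execute()],
            f"{self.description} -> {name}",
        )

    def execute(self):
        self.executed = True
        return self._compute()

    def __repr__(self):
        return f"LazyValue <{self.description}>"


source = LazyValue(lambda: [1, 2, 3], "source")
doubled = source.map(lambda v: v * 2, "double")

print(repr(doubled))      # only the recorded plan is shown
print(doubled.executed)   # False: nothing has run so far
print(doubled.execute())  # the work happens here
```

Deferring the work until `execute()` is what gives an engine the chance to inspect the whole chain of steps before running any of them.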
<a id="asynchrous-execution"></a>
## Asynchronous execution
Specifying `wait=False` makes execution asynchronous. A [Future object](https://docs.python.org/3/library/concurrent.futures.html#future-objects) is returned.
```python
>>> fut = df.head(3).execute(wait=False)
>>> fut.wait()
>>> fut.result()
0 1 2
0 0.167771 0.568741 0.877450
1 0.037518 0.796745 0.072169
2 0.052900 0.936048 0.307194
```
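
The submit-then-collect pattern behind a Future can be tried with the standard library alone. The sketch below uses `concurrent.futures` and a hypothetical `slow_head` function standing in for a remote job; it does not show which Future methods the object returned by MaxFrame exposes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_head(rows, n):
    """Pretend to be a remote job that returns the first n rows."""
    time.sleep(0.05)
    return rows[:n]


rows = [[0.1, 0.5], [0.0, 0.7], [0.9, 0.3], [0.4, 0.2]]

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(slow_head, rows, 3)   # returns immediately with a Future
    # ... other client-side work could run here ...
    first_rows = fut.result()               # blocks until the job has finished

print(first_rows)
```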
## Obtaining results
Use the `fetch()` method to retrieve the result of an executed object. The fetched
data is a local Python object (a pandas object or a NumPy array) that local Python
libraries can work with.
```python
>>> df.execute().fetch()
0 1 2
0 0.167771 0.568741 0.877450
1 0.037518 0.796745 0.072169
2 0.052900 0.936048 0.307194
```
Note that `fetch()` retrieves all the data behind the MaxFrame object. To preview only
a few rows, use the `repr()` function or call the `execute()` method in an interactive
Python environment such as IPython or JupyterLab; MaxFrame then shows only the first and last rows.
```python
>>> repr(df.execute()) # or simply df.execute() if in an interactive environment
0 1 2
0 0.167771 0.568741 0.877450
1 0.037518 0.796745 0.072169
2 0.052900 0.936048 0.307194
...
97 0.167771 0.568741 0.877450
98 0.037518 0.796745 0.072169
99 0.052900 0.936048 0.307194
```
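
The head-and-tail preview can be imitated on a plain list to show what is being kept and what is elided. `preview` is a hypothetical helper written for this illustration, and the edge size of 3 is taken from the output shown in this section rather than from any documented setting.

```python
def preview(rows, edge=3):
    """Return the first and last `edge` rows, with '...' marking the gap."""
    if len(rows) <= 2 * edge:
        return list(rows)
    return list(rows[:edge]) + ["..."] + list(rows[-edge:])


rows = list(range(100))
print(preview(rows))
```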
FILE:references/maxframe-client-docs/user_guide/general/index.md
# General topics
* [Executing and getting results](execution.md)
  * [Lazy execution](execution.md#lazy-execution)
  * [Asynchronous execution](execution.md#asynchrous-execution)
  * [Obtaining results](execution.md#obtaining-results)
FILE:references/maxframe-client-docs/user_guide/index.md
<a id="user-guide-index"></a>
# User Guide
* [General topics](general/index.md)
  * [Executing and getting results](general/execution.md)
* [DataFrame](dataframe/index.md)
  * [Data input and output](dataframe/io.md)
  * [Supported pandas APIs](dataframe/supported_pd_apis.md)
FILE:references/operators-and-modules/advanced-topics-complementary.md
# MaxFrame Advanced Topics and Complementary Reference
This document contains advanced topics, configuration options, and detailed API compatibility information not covered in `key-modules.md`. For basic DataFrame, Tensor, and ML operations, see `key-modules.md`.
## Table of Contents
- [Configuration and Options](#configuration-and-options)
- [Remote Execution](#remote-execution)
- [User-Defined Functions (UDFs)](#user-defined-functions-udfs)
- [Best Practices](#best-practices)
- [Error Handling](#error-handling)
- [Complete Example](#complete-example)
- [API Compatibility](#api-compatibility)
- [Additional API Usage Examples](#additional-api-usage-examples)
---
## Configuration and Options
MaxFrame provides a configuration system for customizing behavior.
```python
from maxframe import options
# Execution mode
options.execution_mode = 'trigger' # or 'eager'
# Progress display
options.show_progress = True # or False, 'auto'
# Chunk size
options.chunk_size = 1024
# Session configuration
options.session.max_alive_seconds = 3600
options.session.max_idle_seconds = 600
options.session.quota_name = 'your_quota'
# DataFrame backend
options.dataframe.dtype_backend = 'numpy' # or 'pyarrow'
# Retry configuration
options.retry_times = 4
options.retry_delay = 0.1
```
### Option Context
```python
from maxframe.config import option_context
# Temporary options
with option_context({'show_progress': True, 'chunk_size': 2048}):
    # Operations with custom options
    df.execute()
```
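
The point of an option context is that overrides apply only inside the `with` block and the previous values come back afterwards. The following standard-library sketch imitates that behaviour with a plain dictionary; `settings` and `temporary_settings` are hypothetical names and this is not how `maxframe.config` is implemented.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a global options store.
settings = {"show_progress": "auto", "chunk_size": None}


@contextmanager
def temporary_settings(overrides):
    """Apply overrides inside the block, then restore the previous values."""
    previous = {key: settings[key] for key in overrides}
    settings.update(overrides)
    try:
        yield settings
    finally:
        # Runs even if the block raises, so overrides never leak out.
        settings.update(previous)


with temporary_settings({"show_progress": True, "chunk_size": 2048}):
    inside = dict(settings)

print(inside)    # overridden values
print(settings)  # original values restored
```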
### Common Options
| Option | Default | Description |
|--------|---------|-------------|
| `show_progress` | `'auto'` | Show progress bar during execution |
| `execution_mode` | `'trigger'` | Execution mode: 'trigger' or 'eager' |
| `chunk_size` | `None` | Size of data chunks |
| `session.max_alive_seconds` | `259200` | Maximum session lifetime (3 days) |
| `session.max_idle_seconds` | `3600` | Maximum idle time (1 hour) |
| `session.quota_name` | `None` | MaxCompute quota to use |
| `dataframe.dtype_backend` | `'numpy'` | Backend for dtypes: 'numpy' or 'pyarrow' |
| `retry_times` | `4` | Number of retry attempts |
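
The session limits are given in seconds, which is easy to misread. A quick check with `datetime.timedelta` confirms the durations quoted in the table and shows how to derive a value for a custom limit; the `defaults` dictionary simply copies the two table rows.

```python
from datetime import timedelta

# Values copied from the Common Options table.
defaults = {
    "session.max_alive_seconds": 259200,
    "session.max_idle_seconds": 3600,
}

for name, seconds in defaults.items():
    print(name, "=", timedelta(seconds=seconds))

# Converting a human-friendly duration back to seconds.
two_hours = int(timedelta(hours=2).total_seconds())
print(two_hours)
```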
---
## Remote Execution
Execute custom Python code remotely on MaxCompute.
```python
from maxframe import remote
# Define remote function
@remote.spawn
def my_function(x, y):
    return x + y
# Execute remotely
result = my_function(10, 20)
result.execute()
```
### Run Scripts Remotely
```python
from maxframe.remote import run_script
# Execute Python script on cluster
result = run_script(
    script_path='path/to/script.py',
    args=['arg1', 'arg2']
)
```
---
## User-Defined Functions (UDFs)
MaxFrame supports custom Python functions with dependency management.
```python
from maxframe.udf import with_python_requirements
# Function with dependencies
@with_python_requirements("numpy", "scipy")
def process_data(value):
    # Series.apply passes one element at a time, so `value` is a scalar
    import numpy as np
    return np.sqrt(value)
# Apply UDF
result = df['value'].apply(process_data)
```
For resource allocation (CPU, memory, GPU) in UDFs, see `key-modules.md` → UDF Resource Allocation.
---
## Best Practices
### Performance Optimization
1. **Use Lazy Execution**: Build computation graphs before execution
2. **Batch Operations**: Use `apply_chunk` for batch processing
3. **Appropriate Chunk Size**: Configure based on data size
4. **Minimize Data Movement**: Read and write to MaxCompute tables directly
5. **Use Built-in Functions**: Prefer MaxFrame functions over custom UDFs
6. **Partitioning**: Read only necessary partitions from tables
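
For the partitioning advice, a small helper can generate the partition specs for a date range so that only those partitions are requested. This is a sketch: `partition_specs` is a hypothetical helper, and the `pt=YYYYMMDD` form is assumed from the partition strings used elsewhere in this document.

```python
from datetime import date, timedelta


def partition_specs(start, end, column="pt"):
    """Build one 'column=YYYYMMDD' spec per day from start to end, inclusive."""
    days = (end - start).days
    return [
        f"{column}={(start + timedelta(days=offset)).strftime('%Y%m%d')}"
        for offset in range(days + 1)
    ]


specs = partition_specs(date(2024, 1, 1), date(2024, 1, 3))
print(specs)
```

A list built this way has the same shape as the `partitions=[...]` argument passed to `md.read_odps_table` in this document.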
### Common Patterns
**ETL Pipeline:**
```python
# Read -> Transform -> Write
df = md.read_odps_table("source_table", partitions=["pt=20240101"])
transformed = df[df['status'] == 'active'].groupby('category').agg({'value': 'sum'})
transformed.to_odps_table("target_table", partition="pt=20240101").execute()
```
**Data Quality Check:**
```python
# Check for nulls
null_counts = df.isnull().sum()
# Check duplicates
duplicates = df.duplicated().sum()
# Value ranges
summary = df.describe()
```
**Incremental Processing:**
```python
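# Process new data only
from datetime import date
# Assumption: partition values use the YYYYMMDD form, e.g. dt=20240101
today = date.today().strftime("%Y%m%d")
new_data = md.read_odps_table(
    "source_table",
    partitions=[f"dt={today}"]
)
processed = new_data.groupby('key').sum()
processed.to_odps_table(
    "result_table",
    partition=f"dt={today}"
).execute()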
```
---
## Error Handling
### Common Issues
**Session Timeout:**
```python
# Raise the session lifetime if it was set too low for the job
# (the Common Options table lists a default of 259200 seconds)
from maxframe import options
options.session.max_alive_seconds = 7200 # 2 hours
```
**Memory Issues:**
```python
# Reduce chunk size
options.chunk_size = 512
```
**Connection Issues:**
```python
# Increase retry attempts
options.retry_times = 10
options.retry_delay = 1.0
```
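
How a retry count and a retry delay interact can be shown with a generic retry loop. This is a sketch of the general pattern only: `call_with_retry` and `flaky` are hypothetical, and whether MaxFrame counts `retry_times` as total attempts or as retries after the first attempt is not stated here.

```python
import time


def call_with_retry(func, retry_times=4, retry_delay=0.1):
    """Call func(); on ConnectionError wait retry_delay seconds and try again.

    Makes at most retry_times attempts and re-raises the last error.
    """
    for attempt in range(1, retry_times + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == retry_times:
                raise
            time.sleep(retry_delay)


attempts = []


def flaky():
    """Fail twice with a transient error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


outcome = call_with_retry(flaky, retry_times=4, retry_delay=0.01)
print(outcome, len(attempts))
```

A larger delay gives a struggling service more time to recover between attempts, at the cost of a slower failure when the problem is permanent.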
---
## Complete Example
### End-to-End Example
```python
import os
import maxframe.dataframe as md
from maxframe import new_session, options
from odps import ODPS
# Setup
o = ODPS(
    access_id=os.getenv('ODPS_ACCESS_ID'),
    secret_access_key=os.getenv('ODPS_ACCESS_KEY'),
    project='my_project',
    endpoint='http://service.odps.aliyun.com/api',
    user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
# Configure options
options.show_progress = True
options.session.quota_name = 'my_quota'
# Create session
with new_session(o) as session:
    # Read data
    users = md.read_odps_table('users')
    orders = md.read_odps_table('orders', partitions=['dt=20240101'])
    # Transform
    # Filter active users
    active_users = users[users['status'] == 'active']
    # Join with orders
    user_orders = orders.merge(
        active_users[['user_id', 'name', 'email']],
        on='user_id'
    )
    # Aggregate
    daily_summary = user_orders.groupby('user_id').agg({
        'order_id': 'count',
        'amount': ['sum', 'mean']
    })
    # Flatten column names
    daily_summary.columns = ['order_count', 'total_amount', 'avg_amount']
    # Filter high-value users
    high_value = daily_summary[daily_summary['total_amount'] > 1000]
    # Write results
    high_value.to_odps_table(
        'high_value_users',
        partition='dt=20240101'
    ).execute()
    print("Processing complete!")
```
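
To make the aggregate and filter steps concrete, the same computation can be traced by hand on a few in-memory records. The sketch below uses plain Python with made-up sample orders; it mirrors the `order_count`, `total_amount` and `avg_amount` columns and the `> 1000` threshold from the example, and involves no MaxFrame calls.

```python
from collections import defaultdict

# Made-up sample data, for illustration only.
orders = [
    {"user_id": 1, "order_id": "a", "amount": 800.0},
    {"user_id": 1, "order_id": "b", "amount": 400.0},
    {"user_id": 2, "order_id": "c", "amount": 150.0},
]

# Group the order amounts by user.
amounts_by_user = defaultdict(list)
for order in orders:
    amounts_by_user[order["user_id"]].append(order["amount"])

# One summary row per user: order_count, total_amount, avg_amount.
summary = {
    user_id: {
        "order_count": len(amounts),
        "total_amount": sum(amounts),
        "avg_amount": sum(amounts) / len(amounts),
    }
    for user_id, amounts in amounts_by_user.items()
}

# Keep only users whose total exceeds the threshold.
high_value = {u: row for u, row in summary.items() if row["total_amount"] > 1000}
print(high_value)
```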
---
## API Compatibility
### Pandas API Support
MaxFrame aims for high compatibility with pandas APIs. Most common pandas operations are supported:
**Fully Supported:**
- DataFrame/Series creation and basic operations
- Indexing and selection (loc, iloc, boolean indexing)
- GroupBy operations and aggregations
- Merge, join, concat operations
- String and datetime accessors
- Missing data handling
- Sorting and ranking
- Mathematical operations (mean, median, std, var, min, max, sum)
- Array operations (rank, unique, duplicated, factorize)
- Eval and query expressions
- Rolling and expanding window operations
- Categorical operations
**Partially Supported:**
- Some advanced groupby operations (nth, interpolation not available)
- Complex multi-index operations
- Certain time series operations
- Eval/query with custom parser/engine parameters
**Not Supported:**
- Operations requiring full data in memory on client
- Some specialized statistical functions
- Direct matplotlib integration (use `.plot()` methods)
- Pandas internal APIs (pd.core.algorithms, pd.core.nanops)
- Numexpr/Numba configuration (MaxFrame uses its own engine)
- Parser/engine parameters for eval (only default 'maxframe' parser)
---
### MaxFrame-Specific Patterns and Workarounds
#### Index Creation Pattern
MaxFrame's `md.Index` wraps pandas Index objects rather than creating them directly. When you need specific Index types (RangeIndex, CategoricalIndex, DatetimeIndex, MultiIndex), create them with pandas first, then wrap with `md.Index()`:
```python
import pandas as pd
import maxframe.dataframe as md
# Create pandas Index objects first
pd_range_index = pd.RangeIndex(0, 10000, 1, name="range_index")
pd_categorical_index = pd.CategoricalIndex(['A', 'B', 'C'] * 100, name="cat_idx")
pd_datetime_index = pd.DatetimeIndex(pd.date_range('2024-01-01', periods=100))
pd_multi_index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1)])
# Wrap with MaxFrame Index
mf_range_index = md.Index(pd_range_index)
mf_categorical_index = md.Index(pd_categorical_index)
mf_datetime_index = md.Index(pd_datetime_index)
mf_multi_index = md.Index(pd_multi_index)
# Use in DataFrame (the data length must match the index length, 10000 here)
df = md.DataFrame({'value': range(10000)}, index=mf_range_index)
```
#### Eval and Query Limitations
MaxFrame supports `eval()` and `query()` methods but with some limitations:
```python
# ✅ Supported: DataFrame.eval() for column assignments
df.eval("new_col = col1 + col2")
df.eval("""
col3 = col1 * col2
col4 = col1 / col2
""")
# ✅ Supported: query() for boolean filtering
df.query("age > 25 and salary > 50000")
# ✅ Supported: local_dict parameter
x = 10
md.eval("df.A + x", local_dict={"df": df, "x": x})
# ✅ Supported: target parameter
md.eval("new_col = df.A + df.B", target=df)
# ❌ Not Supported: parser parameter (raises NotImplementedError)
# df.eval("A + B", parser='python') # Will fail
# ❌ Not Supported: engine parameter (raises NotImplementedError)
# df.eval("A + B", engine='numexpr') # Will fail
# Default behavior: MaxFrame uses its own parser='maxframe'
```
#### GroupBy Operations
Most groupby operations work as expected, with a few exceptions:
```python
# ✅ Supported: Standard aggregations
df.groupby('category').mean()
df.groupby('category').agg({'value': ['sum', 'mean', 'std']})
# ✅ Supported: Apply operations (use skip_infer for complex functions)
df.groupby('category').apply(
    lambda x: md.Series({'max': x['value'].max(), 'min': x['value'].min()}),
    skip_infer=True
)
# ✅ Supported: Parameters
df.groupby('category').mean(numeric_only=True)
df.groupby('category').sum(min_count=1)
df.groupby('category').std(ddof=0)
# ❌ Not Supported: nth() method
# df.groupby('category').nth(0) # Not available
# ❌ Not Supported: quantile with interpolation parameter
# df.groupby('category').quantile(0.5, interpolation='linear') # Not available
```
#### Statistical Operations
MaxFrame handles NaN values automatically in statistical operations:
```python
# All these operations handle NaN automatically (skipna=True by default)
df['column'].mean() # Equivalent to pandas nanmean
df['column'].std() # Equivalent to pandas nanstd
df['column'].var() # Equivalent to pandas nanvar
df['column'].sum() # Equivalent to pandas nansum
df['column'].median() # Equivalent to pandas nanmedian
# Correlation and covariance
df.corr() # Handles NaN automatically
df.cov() # Handles NaN automatically
# No need to use pandas internal APIs like pd.core.nanops
# MaxFrame's public API methods handle NaN values correctly
```
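
What `skipna=True` means for an aggregation can be seen with standard-library arithmetic: NaN entries are dropped before the statistic is computed, whereas a naive computation lets NaN propagate. This is a plain-Python illustration of the semantics, not MaxFrame code.

```python
import math
from statistics import fmean

values = [1.0, float("nan"), 3.0, 5.0]

# skipna=True behaviour: drop NaN entries before aggregating.
valid = [v for v in values if not math.isnan(v)]
mean_skipna = fmean(valid)

# Without skipping, NaN propagates through the arithmetic.
mean_with_nan = sum(values) / len(values)

print(mean_skipna)
print(math.isnan(mean_with_nan))
```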
#### Apply Operations with skip_infer
For complex apply operations, use `skip_infer=True` to improve performance:
```python
# For complex transformations that return Series
result = df.groupby('category').apply(
    lambda x: md.Series({
        'total': x['value'].sum(),
        'avg': x['value'].mean(),
        'count': len(x)
    }),
    skip_infer=True  # Skip type inference for better performance
)
# For Series apply with custom functions
# (complex_function is a placeholder for your own function)
result = df['column'].apply(
    lambda x: complex_function(x),
    skip_infer=True
)
```
#### Lazy Execution Best Practices
Always remember to call `.execute()` on MaxFrame objects:
```python
# ❌ Wrong: Result is not computed
result = df[df['age'] > 25].groupby('city').mean()
print(result) # Shows computation graph, not data
# ✅ Correct: Execute to get results
result = df[df['age'] > 25].groupby('city').mean()
result = result.execute() # Now result contains actual data
print(result) # Shows the data
# ✅ Chain operations before executing
result = (df[df['age'] > 25]
          .groupby('city')
          .agg({'salary': ['mean', 'count']})
          .execute())
```
#### Working with Different Data Types
```python
# ✅ Supported: Basic dtype conversions
df['int_col'].astype('float64')
df['string_col'].astype('category')
df['int_col'].astype('string')
# ✅ Supported: Categorical operations
df['cat_col'].cat.categories
df['cat_col'].cat.codes
df['cat_col'].cat.add_categories(['new_cat'])
df['cat_col'].cat.remove_categories(['old_cat'])
# ⚠️ Limited: Nullable dtypes (Int64, Float64, boolean, string)
# These work but may have limited support compared to pandas
df['col'].astype('Int64')
df['col'].astype('Float64')
df['col'].astype('boolean')
```
#### Array Operations
MaxFrame supports common array operations through its public API:
```python
# ✅ Supported: Rank operations
df['column'].rank(method='average')
df['column'].rank(method='min')
# ✅ Supported: Unique and duplicated
df['column'].unique()
df['column'].duplicated()
df.drop_duplicates()
# ✅ Supported: Factorize
codes, uniques = md.factorize(df['column'])
# ❌ Don't use: Internal pandas APIs
# from pandas.core.algorithms import rank # Don't use internal APIs
# Use MaxFrame's public methods instead
```
#### Performance Tips
1. **Batch Operations**: Use `.mf.apply_chunk()` for processing large datasets in batches
2. **Skip Infer**: Use `skip_infer=True` in apply operations when you know the output type
3. **Minimize Executions**: Build computation graphs and execute once rather than multiple times
4. **Use Built-in Functions**: MaxFrame's built-in aggregations are optimized for distributed execution
5. **Avoid Internal APIs**: Always use MaxFrame's public API methods
```python
# ✅ Good: Single execution of complex pipeline
result = (df
          .query("age > 25")
          .groupby('category')
          .agg({'value': ['sum', 'mean', 'std']})
          .execute())
# ❌ Bad: Multiple executions
filtered = df.query("age > 25").execute()
grouped = filtered.groupby('category').execute()
result = grouped.agg({'value': 'sum'}).execute()
```
---
## Additional API Usage Examples
### Index Operations
```python
# Creating different types of indexes
import pandas as pd
import maxframe.dataframe as md
# RangeIndex
pd_range_index = pd.RangeIndex(0, 10000, 1, name="range_index")
range_index = md.Index(pd_range_index)
# CategoricalIndex
categories = ["A", "B", "C"]
cat_data = ["A", "B", "C", "A", "B"] # example data
pd_cat_index = pd.CategoricalIndex(cat_data, categories=categories, name="cat_index")
cat_index = md.Index(pd_cat_index)
# DatetimeIndex
dates = pd.date_range("2020-01-01", periods=1000, freq="D")
pd_datetime_index = pd.DatetimeIndex(dates, name="datetime_index")
datetime_index = md.Index(pd_datetime_index)
# MultiIndex
level1 = ["A", "B"] * 500
level2 = list(range(1000))
pd_multi_index = pd.MultiIndex.from_arrays([level1, level2], names=["first", "second"])
multi_index = md.Index(pd_multi_index)
# Common index operations (shown on one of the indexes created above)
index = datetime_index
idx_unique = index.unique()
idx_sorted = index.sort_values()
idx_dropped = index.drop_duplicates()
idx_to_frame = index.to_frame()
# Index to dataframe conversion
df_from_index = index.to_frame(name="custom_name")
```
### Series Operations
```python
# Series creation
series = md.Series([1, 2, 3, 4, 5], name="values")
# Basic operations
result = series + 10
result = result * 2
result = result / 5
# Aggregations
mean_val = series.mean()
sum_val = series.sum()
min_val = series.min()
max_val = series.max()
# Statistical operations
median_val = series.median()
std_val = series.std()
var_val = series.var()
q25 = series.quantile(0.25)
q75 = series.quantile(0.75)
# Apply and map
applied = series.apply(lambda x: x ** 2)
mapped = series.map({1: "one", 2: "two", 3: "three"})
# String operations on series
string_series = md.Series(["hello", "world", "test"])
upper_strings = string_series.str.upper()
string_lengths = string_series.str.len()
# Value operations
value_counts = series.value_counts()
unique_vals = series.unique()
# Series to dataframe conversion
series_df = series.to_frame()
named_series_df = series.to_frame(name="custom_name")
# Groupby operations on series
grouped = series.groupby(level=0)
grouped_mean = grouped.mean()
# Rolling and expanding operations
rolling_mean = series.rolling(window=10).mean()
expanding_sum = series.expanding().sum()
# Sorting
sorted_series = series.sort_values()
sorted_by_index = series.sort_index()
# Missing value operations
missing_mask = series.isna()
not_missing_mask = series.notna()
filled_series = series.fillna(0)
dropped_na = series.dropna()
```
### DataFrame Operations
```python
# Create a sample dataframe
df = md.DataFrame({
"int_col": [1, 2, 3, 4, 5],
"float_col": [1.1, 2.2, 3.3, 4.4, 5.5],
"category_col": ["A", "B", "A", "C", "B"],
"group_col": ["Group1", "Group1", "Group2", "Group2", "Group1"]
})
# Describe operations
description = df.describe()
series_description = df["int_col"].describe()
# Apply operations
df_result = df.apply(lambda x: x["int_col"] * x["float_col"], axis=1)
# GroupBy operations
grouped = df.groupby("group_col")
agg_result = grouped.agg({"int_col": ["mean", "sum", "count"]})
transform_result = grouped["int_col"].transform(lambda x: x - x.mean())
# Merge operations
df2 = md.DataFrame({
"int_col": [1, 2, 3],
"additional_col": [10, 20, 30]
})
merged = md.merge(df, df2, on="int_col", how="inner")
left_joined = md.merge(df, df2, on="int_col", how="left")
# Head and tail operations
head_result = df.head(10)
tail_result = df.tail(10)
# Sample operations (n must not exceed the row count unless replace=True)
sampled = df.sample(n=3, random_state=42)
# Query operations
filtered = df.query("int_col > 2 and float_col < 4.0")
complex_filtered = df.query("int_col > 1 and float_col < 5.0 and group_col == 'Group1'")
# Cross-tabulation and pivot tables
crosstab_result = md.crosstab(df["category_col"], df["group_col"])
pivot_result = md.pivot_table(
df,
values="float_col",
index="category_col",
columns="group_col",
aggfunc="mean"
)
# Melt operations
melted = md.melt(df, id_vars=["group_col"], value_vars=["int_col", "float_col"])
```
### Utility Functions
```python
# Binning operations
ages = md.Series([25, 35, 45, 55, 65])
age_groups = md.cut(ages, bins=3) # Equal-width bins
age_groups = md.cut(ages, bins=[0, 30, 50, 100], labels=["Young", "Adult", "Senior"])
# Quantile-based binning
salaries = md.Series([30000, 50000, 70000, 90000, 120000])
salary_quartiles = md.qcut(salaries, q=4) # Quartiles
salary_quartiles = md.qcut(salaries, q=4, labels=["Low", "Medium", "High", "Very High"])
# One-hot encoding
df_with_categorical = md.DataFrame({"department": ["IT", "HR", "Sales", "IT"]})
dummies = md.get_dummies(df_with_categorical, columns=["department"])
dummies_prefix = md.get_dummies(df_with_categorical, columns=["department"], prefix="dept")
dummies_dropped = md.get_dummies(df_with_categorical, columns=["department"], drop_first=True)
# Type conversion
string_numbers = md.Series(["100", "200.5", "300"])
numeric_series = md.to_numeric(string_numbers, errors="coerce") # Invalid values become NaN
integer_series = md.to_numeric(string_numbers, downcast="integer")
# Expression evaluation
eval_df = md.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6],
"c": [7, 8, 9],
"d": [10, 11, 12]
})
result_df = eval_df.eval("result = a + b")
eval_df = eval_df.eval("result = (a + b) * c / d")
flagged_df = eval_df.eval("flag = a > b")
multi_assign_df = eval_df.eval("""
x = a + b
y = c * d
""")
# Record operations
records = [{"id": i, "value": i * 10, "name": f"item_{i}"} for i in range(5)]
records_df = md.DataFrame.from_records(records)
dict_df = md.DataFrame.from_dict({
"id": [1, 2, 3],
"value": [10, 20, 30],
"name": ["item_1", "item_2", "item_3"]
})
# Array utilities
unique_vals = md.unique(df["category_col"])
codes, uniques = md.factorize(df["category_col"])
# Missing value functions
missing_mask = md.isna(df["int_col"])
not_missing_mask = md.notna(df["int_col"])
```
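The `md.cut` call with explicit bin edges in the block above can be read as assigning each value to a right-closed interval, which is the default behaviour of pandas `cut`. The sketch below reproduces that rule with the standard library only; `cut_right_closed` is a hypothetical helper, and it assumes MaxFrame follows the pandas default (`right=True`).

```python
from bisect import bisect_left


def cut_right_closed(values, edges, labels):
    """Label each value by the interval (edges[i], edges[i+1]] it falls into.

    Values outside every interval get None, mirroring the missing value
    pandas would produce.
    """
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per interval")
    result = []
    for value in values:
        # bisect_left places a value equal to an edge in the interval
        # that edge closes, so 30 belongs to (0, 30].
        idx = bisect_left(edges, value) - 1
        result.append(labels[idx] if 0 <= idx < len(labels) else None)
    return result


ages = [25, 35, 45, 55, 65]
age_groups = cut_right_closed(ages, [0, 30, 50, 100], ["Young", "Adult", "Senior"])
print(age_groups)
```

With the edges `[0, 30, 50, 100]`, the value 30 falls in the first interval and 0 falls in none, because the left edge is open.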
### Advanced Operations
```python
# Categorical operations
cat_series = df["category_col"].astype("category")
categories = cat_series.cat.categories
codes = cat_series.cat.codes
is_ordered = cat_series.cat.ordered
cat_with_new = cat_series.cat.add_categories(["F", "G"])
cat_removed = cat_series.cat.remove_categories(["C"])
# String accessor operations (series)
text_series = md.Series(["Hello World", "MaxFrame Test", "Data Processing"])
lower_text = text_series.str.lower()
upper_text = text_series.str.upper()
text_len = text_series.str.len()
contains_pattern = text_series.str.contains("Hello")
replaced_text = text_series.str.replace("Hello", "Hi")
split_text = text_series.str.split(" ")
string_slice = text_series.str.slice(0, 5)
# Datetime accessor operations
dates = ["2021-01-01", "2021-02-15", "2021-03-20"]
datetime_series = md.to_datetime(md.Series(dates))
years = datetime_series.dt.year
months = datetime_series.dt.month
days = datetime_series.dt.day
weekdays = datetime_series.dt.weekday
quarters = datetime_series.dt.quarter
# Date range generation
date_range = md.date_range(start="2020-01-01", end="2024-12-31", freq="D")
monthly_range = md.date_range(start="2020-01-01", end="2024-12-31", freq="MS")
business_days = md.date_range(start="2020-01-01", end="2024-12-31", freq="B")
# Mathematical and statistical operations
series_with_nans = md.Series([1, 2, None, 4, 5])
mean_val = series_with_nans.mean() # Automatically handles NaN
std_val = series_with_nans.std() # Automatically handles NaN
sum_val = series_with_nans.sum() # Automatically handles NaN
# Correlation and covariance
numeric_df = md.DataFrame({
"A": [1, 2, 3, 4, 5],
"B": [2, 4, 6, 8, 10],
"C": [5, 3, 1, 4, 2]
})
corr_matrix = numeric_df.corr()
cov_matrix = numeric_df.cov()
# Skewness and kurtosis
skewness = numeric_df["A"].skew()
kurtosis = numeric_df["A"].kurtosis()
```
---
*For basic operations and module reference, see `key-modules.md`. For official documentation, visit https://maxframe.readthedocs.io*
FILE:references/operators-and-modules/key-modules.md
# MaxFrame Key Modules Reference
This document provides comprehensive reference for MaxFrame's key modules: DataFrame operations, Tensor operations, and Machine Learning capabilities.
## Table of Contents
- [DataFrame Operations](#dataframe-operations)
  - [Creating DataFrames](#creating-dataframes)
  - [Data Reading](#data-reading)
  - [Data Selection and Filtering](#data-selection-and-filtering)
  - [Data Transformation](#data-transformation)
  - [Aggregation and GroupBy](#aggregation-and-groupby)
  - [Join and Merge Operations](#join-and-merge-operations)
  - [Data Writing](#data-writing)
  - [MaxFrame-Specific Operations](#maxframe-specific-operations)
- [Tensor Operations](#tensor-operations)
  - [Creating Tensors](#creating-tensors)
  - [Mathematical Operations](#mathematical-operations)
  - [Statistical Operations](#statistical-operations)
  - [Linear Algebra](#linear-algebra)
  - [Array Manipulation](#array-manipulation)
- [Machine Learning](#machine-learning)
  - [Preprocessing](#preprocessing)
  - [Linear Models](#linear-models)
  - [Tree-Based Models](#tree-based-models)
  - [Model Evaluation](#model-evaluation)
  - [Model Selection](#model-selection)
- [UDF Resource Allocation](#udf-resource-allocation)
  - [with_running_options Decorator](#with_running_options-decorator)
- [Common Operations Reference Table](#common-operations-reference-table)
---
## DataFrame Operations
MaxFrame DataFrame provides pandas-compatible APIs for distributed data processing on MaxCompute. Import the module:
```python
import maxframe.dataframe as md
```
### Creating DataFrames
Create DataFrames from various data sources:
```python
# From dictionary
df = md.DataFrame({
'id': [1, 2, 3, 4, 5],
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'age': [25, 30, 35, 40, 45],
'salary': [50000, 60000, 70000, 80000, 90000]
})
# From list of dictionaries
df = md.DataFrame([
{'id': 1, 'name': 'Alice', 'age': 25},
{'id': 2, 'name': 'Bob', 'age': 30}
])
# From list of tuples with columns
df = md.DataFrame(
[(1, 'Alice', 25), (2, 'Bob', 30)],
columns=['id', 'name', 'age']
)
```
### Data Reading
Read data from various sources:
```python
# Read from MaxCompute table
df = md.read_odps_table("my_table")
# Read with specific columns
df = md.read_odps_table("my_table", columns=['col1', 'col2'])
# Read from SQL query
df = md.read_odps_query("SELECT * FROM my_table WHERE status = 'active'")
# Read from pandas DataFrame
import pandas as pd
pd_df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df = md.read_pandas(pd_df)
# Read from CSV (if supported in your environment)
df = md.read_csv("path/to/file.csv")
```
### Data Selection and Filtering
Select and filter data using various methods:
```python
# Select columns
df_subset = df[['name', 'age']]
# Filter by condition
filtered = df[df['age'] > 30]
# Multiple conditions
filtered = df[(df['age'] > 25) & (df['salary'] < 80000)]
# Filter using isin()
filtered = df[df['name'].isin(['Alice', 'Bob'])]
# Filter using query() method
filtered = df.query('age > 30 and salary < 80000')
# Select rows by position
first_three = df.head(3)
last_three = df.tail(3)
# Select rows by index
selected = df.iloc[0:5] # First 5 rows
selected = df.loc[df['age'] > 30] # Label-based selection
```
### Data Transformation
Transform data using various operations:
```python
# Add new column
df['bonus'] = df['salary'] * 0.1
# Modify existing column
df['age'] = df['age'] + 1
# Column arithmetic
df['total_compensation'] = df['salary'] + df['bonus']
# Apply function to column
df['name_upper'] = df['name'].apply(lambda x: x.upper())
# Apply function to multiple columns
df[['salary', 'bonus']] = df[['salary', 'bonus']].apply(lambda x: x * 1.05)
# Rename columns
df_renamed = df.rename(columns={'name': 'full_name', 'age': 'years_old'})
# Drop columns
df_dropped = df.drop(columns=['bonus', 'total_compensation'])
# Drop duplicates
df_unique = df.drop_duplicates(subset=['name'])
# Fill missing values
df_filled = df.fillna({'age': 0, 'salary': df['salary'].mean()})
# Sort by column
df_sorted = df.sort_values('age', ascending=True)
df_sorted = df.sort_values(['age', 'salary'], ascending=[True, False])
```
### Aggregation and GroupBy
Perform aggregation and grouping operations:
```python
# Simple aggregation
total_salary = df['salary'].sum()
average_age = df['age'].mean()
max_salary = df['salary'].max()
min_salary = df['salary'].min()
count_rows = df.count()
# Multiple aggregations
stats = df[['age', 'salary']].agg(['mean', 'std', 'min', 'max'])
# GroupBy single column (the groupby examples assume df also has 'department' and 'role' columns)
grouped = df.groupby('department')['salary'].mean()
# GroupBy multiple columns
grouped = df.groupby(['department', 'role'])['salary'].sum()
# Multiple aggregations per group
grouped = df.groupby('department').agg({
'salary': ['mean', 'sum', 'count'],
'age': ['mean', 'min', 'max']
})
# Apply custom function to groups (requires `import pandas as pd`)
def custom_agg(group):
    return pd.Series({
        'avg_salary': group['salary'].mean(),
        'salary_range': group['salary'].max() - group['salary'].min()
    })
grouped = df.groupby('department').apply(custom_agg)
# Count unique values per group
unique_counts = df.groupby('department')['role'].nunique()
```
### Join and Merge Operations
Combine DataFrames using various join operations:
```python
# Inner join (default)
merged = df1.merge(df2, on='id')
# Left join
merged = df1.merge(df2, on='id', how='left')
# Right join
merged = df1.merge(df2, on='id', how='right')
# Outer join
merged = df1.merge(df2, on='id', how='outer')
# Join on different columns
merged = df1.merge(df2, left_on='user_id', right_on='id')
# Join on multiple columns
merged = df1.merge(df2, on=['id', 'date'])
# Join with suffixes for duplicate columns
merged = df1.merge(df2, on='id', suffixes=('_left', '_right'))
# Concatenate DataFrames vertically
concatenated = md.concat([df1, df2])
# Concatenate DataFrames horizontally
concatenated = md.concat([df1, df2], axis=1)
```
### Data Writing
Write data to various destinations:
```python
# Write to MaxCompute table
md.to_odps_table(df, "output_table", overwrite=True).execute()
# Write without overwriting
md.to_odps_table(df, "output_table", overwrite=False).execute()
# Write to DLF external table (requires configuration)
from maxframe import options
options.sql.settings = {
"odps.maxframe.resolve_dlf_tables": "true"
}
md.to_odps_table(df, "dlf_table").execute()
# Write to CSV
df.to_csv("output.csv").execute()
# Fetch into a local pandas DataFrame (for local processing)
pd_df = df.execute().fetch()
# Write to parquet
df.to_parquet("output.parquet").execute()
```
### MaxFrame-Specific Operations
MaxFrame provides additional operations for distributed processing:
```python
# Apply function in batches (efficient for large datasets)
def process_batch(chunk):
    # Process each batch of rows
    chunk['processed'] = chunk['value'] * 2
    return chunk
result = df.mf.apply_chunk(
    process_batch,
    batch_rows=1000,
    output_type='dataframe'
)
# Rebalance data distribution
df_rebalanced = df.mf.rebalance()
# Reshuffle data
df_shuffled = df.mf.reshuffle()
# MapReduce operation
def mapper(row):
    return (row['category'], row['value'])
def reducer(key, values):
    return sum(values)
result = df.mf.map_reduce(mapper, reducer)
# Extract key-value pairs
kv_pairs = df.mf.extract_kv('key_column', 'value_column')
# Collect key-value pairs
result = df.mf.collect_kv()
```
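The mapper/reducer pair in the block above follows the usual map-reduce contract: the mapper emits a key and a value per row, values are grouped by key, and the reducer collapses each group. The sketch below models only that contract on plain Python dictionaries. The `map_reduce` function here is a hypothetical local stand-in; the real `df.mf.map_reduce` works on distributed DataFrame batches, and its exact callable signatures should be checked against the MaxFrame documentation.

```python
from collections import defaultdict


def map_reduce(rows, mapper, reducer):
    """Toy, single-process model of map -> group by key -> reduce."""
    groups = defaultdict(list)
    for row in rows:
        key, value = mapper(row)      # map phase: one (key, value) per row
        groups[key].append(value)     # shuffle phase: group values by key
    # reduce phase: collapse each group to a single result
    return {key: reducer(key, values) for key, values in groups.items()}


def mapper(row):
    return (row["category"], row["value"])


def reducer(key, values):
    return sum(values)


rows = [
    {"category": "A", "value": 1},
    {"category": "B", "value": 2},
    {"category": "A", "value": 3},
]
totals = map_reduce(rows, mapper, reducer)
print(totals)
```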
---
## Tensor Operations
MaxFrame tensor provides NumPy-like operations for numerical computing. Import the module:
```python
import maxframe.tensor as mt
```
### Creating Tensors
Create tensors from various sources:
```python
# From list or array
arr = mt.array([1, 2, 3, 4, 5])
# From nested list (2D array)
matrix = mt.array([[1, 2, 3], [4, 5, 6]])
# Create zeros array
zeros = mt.zeros((3, 4))
# Create ones array
ones = mt.ones((2, 3))
# Create array with specific value
full = mt.full((3, 3), 7)
# Create identity matrix
identity = mt.eye(4)
# Create range
range_arr = mt.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# Create linspace
linspace = mt.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1]
# Create random array
random_arr = mt.random.rand(3, 4)
random_int = mt.random.randint(0, 10, size=(3, 3))
```
### Mathematical Operations
Perform mathematical operations on tensors:
```python
# Basic arithmetic
a = mt.array([1, 2, 3])
b = mt.array([4, 5, 6])
addition = a + b # [5, 7, 9]
subtraction = a - b # [-3, -3, -3]
multiplication = a * b # [4, 10, 18]
division = a / b # [0.25, 0.4, 0.5]
power = a ** 2 # [1, 4, 9]
# Scalar operations
scalar_add = a + 10 # [11, 12, 13]
scalar_mul = a * 2 # [2, 4, 6]
# Trigonometric functions
angles = mt.array([0, mt.pi/2, mt.pi])
sin_vals = mt.sin(angles)
cos_vals = mt.cos(angles)
tan_vals = mt.tan(angles)
# Exponential and logarithmic
exp_vals = mt.exp(a)
log_vals = mt.log(a)
log10_vals = mt.log10(a)
# Rounding
arr = mt.array([1.2, 2.6, 3.5, -1.7])
rounded = mt.round(arr)
floor = mt.floor(arr)
ceil = mt.ceil(arr)
```
### Statistical Operations
Calculate statistics on tensors:
```python
arr = mt.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Basic statistics
total = mt.sum(arr) # Sum of all elements
mean = mt.mean(arr) # Mean value
median = mt.median(arr) # Median value
std = mt.std(arr) # Standard deviation
var = mt.var(arr) # Variance
# Axis-specific operations
row_sum = mt.sum(arr, axis=1) # Sum of each row
col_sum = mt.sum(arr, axis=0) # Sum of each column
row_mean = mt.mean(arr, axis=1) # Mean of each row
col_mean = mt.mean(arr, axis=0) # Mean of each column
# Min and max
min_val = mt.min(arr)
max_val = mt.max(arr)
row_min = mt.min(arr, axis=1)
col_max = mt.max(arr, axis=0)
# Percentiles
percentile_25 = mt.percentile(arr, 25)
percentile_50 = mt.percentile(arr, 50)
percentile_75 = mt.percentile(arr, 75)
# Correlation and covariance
corr = mt.corrcoef(arr)
cov = mt.cov(arr)
```
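The `axis` argument in the block above is easy to misread, so here is the same 3x3 example worked in plain Python, assuming the NumPy convention that `axis=1` reduces across the columns of each row and `axis=0` reduces down the rows of each column.

```python
arr = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# axis=1: collapse each row to one number
row_sum = [sum(row) for row in arr]

# axis=0: collapse each column to one number (zip(*arr) iterates over columns)
col_sum = [sum(col) for col in zip(*arr)]

# No axis: reduce over every element
total = sum(row_sum)

print(row_sum, col_sum, total)
```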
### Linear Algebra
Perform linear algebra operations:
```python
# Matrix multiplication
A = mt.array([[1, 2], [3, 4]])
B = mt.array([[5, 6], [7, 8]])
dot_product = mt.dot(A, B) # Matrix multiplication
matmul = mt.matmul(A, B) # Same as dot for 2D
# Element-wise operations
element_mul = A * B # Element-wise multiplication
# Transpose
transposed = mt.transpose(A)
transposed = A.T # Alternative syntax
# Matrix properties
determinant = mt.linalg.det(A)
rank = mt.linalg.matrix_rank(A)
trace = mt.trace(A)
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = mt.linalg.eig(A)
# Solving linear systems
b = mt.array([1, 2])
x = mt.linalg.solve(A, b)
# Matrix inverse
inverse = mt.linalg.inv(A)
# Norm
frobenius_norm = mt.linalg.norm(A)
l1_norm = mt.linalg.norm(A, ord=1)
l2_norm = mt.linalg.norm(A, ord=2)
max_norm = mt.linalg.norm(A, ord=mt.inf)
# SVD decomposition
U, S, V = mt.linalg.svd(A)
# Cholesky decomposition (for positive definite matrices)
pos_def = mt.array([[4, 12], [12, 37]])
L = mt.linalg.cholesky(pos_def)
```
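As a worked check of the Cholesky example above, the following standard-library sketch factors the same matrix into a lower-triangular `L` with `L @ L.T` equal to the input. It is a textbook Cholesky-Banachiewicz loop written for illustration, not the routine MaxFrame uses.

```python
import math


def cholesky(matrix):
    """Return the lower-triangular factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already computed parts of rows i and j
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diag = matrix[i][i] - partial
                if diag <= 0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(diag)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


pos_def = [[4, 12], [12, 37]]
L = cholesky(pos_def)
print(L)  # [[2.0, 0.0], [6.0, 1.0]]
```

Multiplying the factor by its transpose gives back `[[4, 12], [12, 37]]`, which confirms the example matrix is positive definite.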
### Array Manipulation
Manipulate tensor shapes and contents:
```python
arr = mt.array([[1, 2, 3], [4, 5, 6]])
# Reshape
reshaped = mt.reshape(arr, (3, 2)) # Change shape
flattened = arr.flatten() # Flatten to 1D
raveled = arr.ravel() # Flatten (may return view)
# Transpose operations
transposed = arr.T
swapped = mt.swapaxes(arr, 0, 1)
# Stacking
a = mt.array([1, 2, 3])
b = mt.array([4, 5, 6])
stacked = mt.stack([a, b]) # Stack along new axis
hstacked = mt.hstack([a, b]) # Horizontal stack
vstacked = mt.vstack([a, b]) # Vertical stack
# Splitting
arr = mt.array([[1, 2, 3], [4, 5, 6]])
split = mt.split(arr, 2, axis=0) # Split into 2 arrays
hsplit = mt.hsplit(arr, 3) # Horizontal split
vsplit = mt.vsplit(arr, 2) # Vertical split
# Indexing and slicing
arr = mt.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
element = arr[1, 2] # Single element
row = arr[1, :] # Entire row
col = arr[:, 1] # Entire column
subarray = arr[0:2, 1:3] # Subarray
# Boolean indexing
arr = mt.array([1, 2, 3, 4, 5])
mask = arr > 2
filtered = arr[mask] # [3, 4, 5]
# Fancy indexing
arr = mt.array([10, 20, 30, 40, 50])
indices = mt.array([0, 2, 4])
selected = arr[indices] # [10, 30, 50]
# Sorting
arr = mt.array([3, 1, 4, 1, 5, 9])
sorted_arr = mt.sort(arr) # Returns sorted copy
argsorted = mt.argsort(arr) # Returns indices that would sort
# Unique values
arr = mt.array([1, 2, 2, 3, 3, 3])
unique = mt.unique(arr) # [1, 2, 3]
# Broadcasting
a = mt.array([[1], [2], [3]])
b = mt.array([4, 5, 6])
result = a + b # [[5, 6, 7], [6, 7, 8], [7, 8, 9]]
```
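The broadcasting result at the end of the block above follows the NumPy shape rule: shapes are compared from the trailing dimension, and a dimension of size 1 (or a missing one) is stretched to match the other. The sketch below implements that rule in plain Python. `broadcast_shape` is a hypothetical helper, and it assumes MaxFrame tensors follow NumPy broadcasting semantics.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    result = []
    # Compare dimensions right to left, treating missing ones as size 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))


# The example above: a has shape (3, 1), b has shape (3,)
a = [[1], [2], [3]]
b = [4, 5, 6]
shape = broadcast_shape((3, 1), (3,))

# Stretch a across columns and b across rows, then add element-wise
result = [[row[0] + value for value in b] for row in a]
print(shape, result)
```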
---
## Machine Learning
MaxFrame Learn provides scikit-learn-like machine learning capabilities. Import the modules:
```python
from maxframe import learn
from maxframe.learn import preprocessing
from maxframe.learn.linear_model import LinearRegression, LogisticRegression
from maxframe.learn.tree import DecisionTreeClassifier
from maxframe.learn.ensemble import RandomForestClassifier
from maxframe.learn.metrics import accuracy_score, mean_squared_error
```
### Preprocessing
Prepare data for machine learning:
```python
# Standardization (zero mean, unit variance)
from maxframe.learn import preprocessing
scaler = preprocessing.StandardScaler()
X_scaled = scaler.fit_transform(X)
# Only fit on training data, then transform test data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Min-Max scaling (scale to [0, 1])
minmax_scaler = preprocessing.MinMaxScaler()
X_normalized = minmax_scaler.fit_transform(X)
# Robust scaling (using median and quartiles)
robust_scaler = preprocessing.RobustScaler()
X_robust = robust_scaler.fit_transform(X)
# Label encoding (convert categorical to numeric)
label_encoder = preprocessing.LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
# One-hot encoding
onehot_encoder = preprocessing.OneHotEncoder()
X_onehot = onehot_encoder.fit_transform(X_categorical)
# Polynomial features
poly = preprocessing.PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
# Train-test split
from maxframe.learn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# K-Fold cross-validation
from maxframe.learn.model_selection import KFold
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```
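The rule in the block above, fit on training data and only transform test data, matters because the test set must be scaled with statistics it did not contribute to. The sketch below shows the idea for a single feature in plain Python. `SimpleStandardScaler` is a hypothetical class, and it uses the population standard deviation as scikit-learn's `StandardScaler` does; whether MaxFrame's scaler matches that detail is an assumption.

```python
import math


class SimpleStandardScaler:
    """Single-feature z-score scaler (illustrative, not a MaxFrame class)."""

    def fit(self, values):
        n = len(values)
        self.mean_ = sum(values) / n
        variance = sum((v - self.mean_) ** 2 for v in values) / n
        # Guard against zero variance so transform never divides by zero
        self.scale_ = math.sqrt(variance) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [3.0, 6.0]

scaler = SimpleStandardScaler().fit(train)   # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)         # reuse the train mean and scale
print(scaler.mean_, scaler.scale_, test_scaled)
```

The scaled training data has zero mean, while the scaled test data generally does not; that is expected and is not a sign of a bug.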
### Linear Models
Train linear models for regression and classification:
```python
# Linear Regression
from maxframe.learn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# Get model coefficients
coefficients = model.coef_
intercept = model.intercept_
# Logistic Regression
from maxframe.learn.linear_model import LogisticRegression
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
# Ridge Regression (L2 regularization)
from maxframe.learn.linear_model import Ridge
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
predictions = ridge.predict(X_test)
# Lasso Regression (L1 regularization)
from maxframe.learn.linear_model import Lasso
lasso = Lasso(alpha=1.0)
lasso.fit(X_train, y_train)
predictions = lasso.predict(X_test)
# Elastic Net (combined L1 and L2)
from maxframe.learn.linear_model import ElasticNet
elastic_net = ElasticNet(alpha=1.0, l1_ratio=0.5)
elastic_net.fit(X_train, y_train)
predictions = elastic_net.predict(X_test)
```
### Tree-Based Models
Train tree-based models:
```python
# Decision Tree Classifier
from maxframe.learn.tree import DecisionTreeClassifier
dt_clf = DecisionTreeClassifier(max_depth=5, random_state=42)
dt_clf.fit(X_train, y_train)
predictions = dt_clf.predict(X_test)
# Decision Tree Regressor
from maxframe.learn.tree import DecisionTreeRegressor
dt_reg = DecisionTreeRegressor(max_depth=5, random_state=42)
dt_reg.fit(X_train, y_train)
predictions = dt_reg.predict(X_test)
# Random Forest Classifier
from maxframe.learn.ensemble import RandomForestClassifier
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train, y_train)
predictions = rf_clf.predict(X_test)
# Random Forest Regressor
from maxframe.learn.ensemble import RandomForestRegressor
rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(X_train, y_train)
predictions = rf_reg.predict(X_test)
# Gradient Boosting
from maxframe.learn.ensemble import GradientBoostingClassifier
gb_clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gb_clf.fit(X_train, y_train)
predictions = gb_clf.predict(X_test)
```
### Model Evaluation
Evaluate model performance:
```python
from maxframe.learn import metrics
# Classification metrics
accuracy = metrics.accuracy_score(y_test, predictions)
precision = metrics.precision_score(y_test, predictions, average='weighted')
recall = metrics.recall_score(y_test, predictions, average='weighted')
f1 = metrics.f1_score(y_test, predictions, average='weighted')
# Confusion matrix
conf_matrix = metrics.confusion_matrix(y_test, predictions)
# Classification report
report = metrics.classification_report(y_test, predictions)
# ROC AUC score
probabilities = clf.predict_proba(X_test)
roc_auc = metrics.roc_auc_score(y_test, probabilities[:, 1])
# Regression metrics
mse = metrics.mean_squared_error(y_test, predictions)
rmse = mse ** 0.5 # RMSE is the square root of the MSE computed above
mae = metrics.mean_absolute_error(y_test, predictions)
r2 = metrics.r2_score(y_test, predictions)
# Cross-validation score
from maxframe.learn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
mean_score = scores.mean()
```
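For reference, the regression metrics in the block above have simple closed forms. The sketch below computes them by hand using the standard definitions; `regression_metrics` is a hypothetical helper and does not call MaxFrame.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


scores = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(scores)
```

Note that R^2 is undefined when all true values are equal, because the total sum of squares is then zero.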
### Model Selection
Select and tune models:
```python
# Grid search for hyperparameter tuning
from maxframe.learn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from maxframe.learn.model_selection import GridSearchCV
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2]
}
grid_search = GridSearchCV(
    estimator=GradientBoostingClassifier(),
    param_grid=param_grid,
    cv=5,
    scoring='accuracy'
)
grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_
best_params = grid_search.best_params_
# Random search for hyperparameter tuning
# (random forests have no learning_rate, so they need their own search space)
from maxframe.learn.model_selection import RandomizedSearchCV
rf_param_distributions = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 7]
}
random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(),
    param_distributions=rf_param_distributions,
    n_iter=5,
    cv=5,
    scoring='accuracy'
)
random_search.fit(X_train, y_train)
best_model = random_search.best_estimator_
```
---
## UDF Resource Allocation
### with_running_options Decorator
The `@with_running_options` decorator from `maxframe.udf` allocates computational resources (CPU, memory, GPU units) to user-defined functions (UDFs) that run on the DPE (Data Processing Engine). Explicit allocation helps tune performance and ensures a function has the resources it needs.
#### Import
```python
from maxframe.udf import with_running_options
```
#### Parameters
| Parameter | Type | Description | Example Values |
|-----------|------|-------------|----------------|
| `engine` | str | Execution engine type | `"dpe"` (recommended for UDFs) |
| `cpu` | int | Number of CPU cores to allocate | 1, 2, 4, 8 |
| `memory` | int | Memory allocation in **gigabytes (GB)** | 2, 4, 8, 16 |
| `gu` | int | Number of GPU Units (GU) for GPU-accelerated processing | 1, 2, 4 |
| `gu_quota` | str | GPU quota name for GU allocation | `"your_gu_quota_name"` |
**IMPORTANT**: The `memory` parameter expects values in **gigabytes (GB)**, not megabytes (MB). For example:
- `memory=4` means 4 GB
- `memory=2` means 2 GB
- ❌ **Wrong**: `memory=4096` (this would request 4096 GB = 4 TB!)
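
Because the unit mix-up above is easy to make when a value comes from a config file expressed in megabytes, a small conversion guard can help. `memory_gb_from_mb` below is a hypothetical helper, not part of MaxFrame; it assumes 1 GB = 1024 MB and rounds up to a whole number because the parameter table lists `memory` as an `int`.

```python
def memory_gb_from_mb(memory_mb):
    """Convert a megabyte figure to whole gigabytes, rounding up."""
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    # Ceiling division without floating point: -(-a // b)
    return -(-memory_mb // 1024)


print(memory_gb_from_mb(4096))  # 4
print(memory_gb_from_mb(5000))  # 5
```

The returned value is what would be passed as `memory=`, for example `memory=memory_gb_from_mb(4096)` rather than `memory=4096`.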
#### Basic Usage
##### 1. CPU and Memory Allocation
Allocate CPU cores and memory for compute-intensive UDFs:
```python
from maxframe.udf import with_running_options
import maxframe.dataframe as md
# Allocate 2 CPU cores and 4 GB memory
@with_running_options(engine="dpe", cpu=2, memory=4)
def process_data(batch):
    """Process data with allocated resources."""
    result = batch.copy()
    result['computed'] = result['value'] * 2
    return result
# Apply the function
df = md.read_odps_table("input_table")
result = df.mf.apply_chunk(
    process_data,
    batch_rows=1000,
    output_type='dataframe'
)
result.execute()
```
##### 2. GPU Acceleration with GPU Units (GU)
Allocate GPU units for GPU-accelerated workloads:
```python
from maxframe.udf import with_running_options
import maxframe.dataframe as md
# Allocate 1 GPU Unit for GPU-accelerated processing
# Replace 'your_gu_quota' with your actual GU quota name
@with_running_options(engine="dpe", gu=1, gu_quota="your_gu_quota")
def gpu_accelerated_function(row):
    """Process data with GPU acceleration."""
    # Your GPU-accelerated logic here
    # Example: matrix operations, ML inference, etc.
    result = row.copy()
    result['processed'] = row['value'] * 2
    return result
# Apply the GPU-accelerated function
df = md.read_odps_table("ml_features")
result = df.apply(
    gpu_accelerated_function,
    axis=1,
    dtypes=df.dtypes,
    output_type='dataframe',
    skip_infer=True
)
result.execute()
```
##### 3. Resource Allocation for Classes (MapReduce)
Use with class-based UDFs such as reducers in MapReduce operations:
```python
from collections import defaultdict
import pandas as pd
import maxframe.dataframe as md
from maxframe.udf import with_running_options
# Allocate resources for a reducer class
@with_running_options(cpu=2, memory=4)
class WordCountReducer:
    """Reducer for word count with allocated resources."""
    def __init__(self):
        self._word_to_count = defaultdict(lambda: 0)
    def __call__(self, batch, end=False):
        """Process batch and aggregate counts."""
        word = None
        for _, row in batch.iterrows():
            word = row.iloc[0]
            self._word_to_count[row.iloc[0]] += row.iloc[1]
        if end:
            return pd.DataFrame(
                [[word, self._word_to_count[word]]],
                columns=["word", "count"]
            )
# Use in MapReduce (mapper_func is a mapper you define elsewhere)
df = md.read_odps_table("documents")
result = df.mf.map_reduce(mapper_func, WordCountReducer)
result.execute()
```
#### When to Use
Use `@with_running_options` when:
1. **Your UDF is compute-intensive**: Functions with complex calculations, large data transformations, or ML inference
2. **Processing large batches**: `apply_chunk()` operations with large `batch_rows`
3. **GPU acceleration needed**: Deep learning inference, matrix operations, or other GPU-accelerated workloads
4. **Memory-intensive operations**: Functions that load models, process large arrays, or create intermediate data structures
5. **MapReduce reducers**: Class-based reducers that aggregate data and benefit from more resources
#### Resource Allocation Guidelines
##### CPU Allocation
| Use Case | Recommended CPU | Example |
|----------|----------------|---------|
| Simple transformations | 1 core | String operations, basic arithmetic |
| Moderate computation | 2 cores | Aggregations, filtering, joins |
| Heavy computation | 4-8 cores | ML inference, complex calculations |
##### Memory Allocation
| Use Case | Recommended Memory | Example |
|----------|-------------------|---------|
| Small batches (<1000 rows) | 2 GB | Simple row transformations |
| Medium batches (1000-10000 rows) | 4 GB | Grouping, sorting, merging |
| Large batches (>10000 rows) | 8-16 GB | Large-scale aggregations, model inference |
| Model loading | 16+ GB | Deep learning models, large embeddings |
**Important**:
- Start with conservative allocations (2 CPU, 4 GB) and increase as needed
- Monitor resource usage via logview URLs to optimize allocations
- Over-allocation wastes resources; under-allocation causes failures
- Memory is specified in **GB**, not MB!
##### GPU Unit Allocation
| Use Case | Recommended GU | Notes |
|----------|---------------|-------|
| ML inference (small models) | 1 GU | LightGBM, small neural networks |
| ML inference (large models) | 2-4 GU | Deep learning, transformers |
| GPU-accelerated computation | 1-2 GU | Matrix operations, signal processing |
**Important**:
- Requires `gu_quota` parameter with your quota name
- Check your MaxCompute account for available GU quotas
- GPU resources are more expensive than CPU; use only when needed
#### Complete Example: Resource-Aware Data Processing Pipeline
```python
import os
import maxframe.dataframe as md
from maxframe.session import new_session
from maxframe.udf import with_running_options
from maxframe.config import options
from odps import ODPS
# Configure DPE engine
options.dag.settings = {
    "engine_order": ["DPE"],
    "unavailable_engines": ["MCSQL", "SPE"],
}
options.sql.settings = {"odps.session.image": "maxframe_service_dpe_runtime"}
# Create session
o = ODPS(
    access_id=os.getenv("ODPS_ACCESS_ID"),
    secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
    project=os.getenv("ODPS_PROJECT"),
    endpoint=os.getenv("ODPS_ENDPOINT"),
    user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
# Define resource-intensive processing function
@with_running_options(engine="dpe", cpu=4, memory=8)
def complex_transformation(batch):
    """
    Data transformation that benefits from extra CPU and memory.
    This function:
    - Performs feature engineering
    - Computes a rolling mean within each batch
    - Combines the results into a final column
    """
    # Feature engineering
    batch['feature_1'] = batch['value_1'] * batch['value_2']
    batch['feature_2'] = batch['value_1'] / (batch['value_2'] + 1e-6)
    # Rolling aggregation (the first 99 rows of each batch are NaN)
    batch['rolling_mean'] = batch['value_1'].rolling(window=100).mean()
    # Post-processing
    batch['result'] = batch['feature_1'] + batch['rolling_mean']
    return batch
# Process data
try:
    df = md.read_odps_table("large_dataset")
    # Apply transformation with allocated resources
    result = df.mf.apply_chunk(
        complex_transformation,
        batch_rows=5000, # Larger batches for efficiency
        output_type='dataframe'
    )
    # Write results
    md.to_odps_table(result, "processed_data", overwrite=True).execute()
    print(f"Processing complete. Logview: {session.get_logview_address()}")
finally:
    session.destroy()
```
#### Best Practices
1. **Start Small, Scale Up**: Begin with `cpu=2, memory=4` and increase based on actual needs
2. **Monitor via Logview**: Use logview URLs to check resource utilization and execution performance
3. **Match Batch Size**: Align `batch_rows` in `apply_chunk()` with allocated resources
- Small batches (<1000): `cpu=1-2, memory=2-4`
- Medium batches (1000-10000): `cpu=2-4, memory=4-8`
- Large batches (>10000): `cpu=4-8, memory=8-16`
4. **GPU for ML**: Use GPU units (GU) for ML inference and GPU-accelerated operations
5. **Test Locally First**: Use `debug=True` to test UDF logic locally before allocating resources
6. **Avoid Over-allocation**: Don't request more resources than needed; it wastes quota and may delay scheduling
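The batch-size tiers in item 3 can be written down as a small lookup. The following is a minimal sketch, not a MaxFrame API: `suggest_resources` is a hypothetical helper name, and treating exactly 1000 and 10000 rows as "medium" is an assumption, since the list above does not say which tier the boundaries belong to.

```python
def suggest_resources(batch_rows):
    """Return the (low, high) cpu and memory ranges for a batch size.

    Hypothetical helper, not part of MaxFrame. The ranges are copied from
    the batch-size tiers in the best practices list.
    """
    if batch_rows <= 0:
        raise ValueError("batch_rows must be positive")
    if batch_rows < 1000:
        # Small batches
        return {"cpu": (1, 2), "memory": (2, 4)}
    if batch_rows <= 10000:
        # Medium batches (boundary handling is an assumption)
        return {"cpu": (2, 4), "memory": (4, 8)}
    # Large batches
    return {"cpu": (4, 8), "memory": (8, 16)}


opts = suggest_resources(5000)
print(opts)
```

In keeping with "Start Small, Scale Up", a caller would likely begin with the low end of each range, for example `cpu=opts["cpu"][0]` and `memory=opts["memory"][0]`, and raise the values only if Logview shows the job is constrained.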
#### Common Patterns
##### Pattern 1: Simple Data Transformation
```python
@with_running_options(engine="dpe", cpu=1, memory=2)
def simple_transform(batch):
    batch['new_col'] = batch['col_a'] + batch['col_b']
    return batch
```
##### Pattern 2: Aggregation with Moderate Resources
```python
@with_running_options(engine="dpe", cpu=2, memory=4)
def aggregate_processing(batch):
    result = batch.groupby('category').agg({
        'value': ['sum', 'mean', 'count']
    })
    return result
```
##### Pattern 3: ML Inference with High Resources
```python
@with_running_options(engine="dpe", cpu=4, memory=16)
def ml_inference(batch):
    # Load model (requires more memory)
    # model = load_model()
    # Run inference
    # predictions = model.predict(batch)
    batch['prediction'] = batch['feature'] * 2  # Simplified
    return batch
```
##### Pattern 4: GPU-Accelerated Processing
```python
@with_running_options(engine="dpe", gu=1, gu_quota="ml_gpu_quota")
def gpu_processing(batch):
    # GPU-accelerated operations
    # Ideal for: deep learning, matrix operations, signal processing
    return batch
```
#### Troubleshooting
| Issue | Cause | Solution |
|-------|-------|----------|
| **Out of memory error** | Insufficient memory allocation | Increase `memory` parameter (e.g., from 4 to 8) |
| **Slow execution** | Under-allocated CPU | Increase `cpu` parameter (e.g., from 2 to 4) |
| **Resource unavailable** | Quota limits reached | Check MaxCompute quota or reduce allocation |
| **GPU not found** | Missing `gu_quota` or invalid quota name | Verify GU quota name in MaxCompute console |
| **Task timeout** | Insufficient resources for workload | Increase both CPU and memory |
| **High cost** | Over-allocation | Monitor usage and reduce allocation to minimum needed |
#### Related Resources
- **Examples**: See `assets/examples/gpu_unit_dpe_processing.py` for GPU usage
- **Examples**: See `assets/examples/fs_mount_example.py` for combined filesystem and resource allocation
- **Examples**: See `assets/examples/oss_multi_mount.py` for multiple OSS mounts with resource allocation
- **Local Debugging**: Use `debug=True` parameter for local testing without resource allocation
---
## Common Operations Reference Table
### DataFrame Operations
| Operation | Description | Example |
|-----------|-------------|---------|
| `md.DataFrame()` | Create DataFrame | `df = md.DataFrame({'A': [1, 2, 3]})` |
| `md.read_odps_table()` | Read from MaxCompute table | `df = md.read_odps_table("my_table")` |
| `md.read_odps_query()` | Read from SQL query | `df = md.read_odps_query("SELECT * FROM table")` |
| `df.head()` | Get first n rows | `df.head(5)` |
| `df.tail()` | Get last n rows | `df.tail(5)` |
| `df.describe()` | Get summary statistics | `df.describe()` |
| `df.info()` | Get DataFrame info | `df.info()` |
| `df.columns` | Get column names | `df.columns` |
| `df.shape` | Get dimensions | `df.shape` |
| `df.dtypes` | Get column data types | `df.dtypes` |
| `df.filter()` | Filter columns | `df.filter(['col1', 'col2'])` |
| `df.drop()` | Drop columns/rows | `df.drop(columns=['col1'])` |
| `df.rename()` | Rename columns | `df.rename(columns={'old': 'new'})` |
| `df.sort_values()` | Sort by values | `df.sort_values('col', ascending=False)` |
| `df.sort_index()` | Sort by index | `df.sort_index()` |
| `df.drop_duplicates()` | Remove duplicates | `df.drop_duplicates()` |
| `df.fillna()` | Fill missing values | `df.fillna(0)` |
| `df.dropna()` | Drop missing values | `df.dropna()` |
| `df.groupby()` | Group data | `df.groupby('col').sum()` |
| `df.agg()` | Aggregate | `df.agg({'col': ['mean', 'sum']})` |
| `df.merge()` | Merge DataFrames | `df1.merge(df2, on='key')` |
| `df.join()` | Join DataFrames | `df1.join(df2, on='key')` |
| `md.concat()` | Concatenate DataFrames | `md.concat([df1, df2])` |
| `df.apply()` | Apply function | `df['new'] = df['col'].apply(func)` |
| `df.mf.apply_chunk()` | Apply in batches | `df.mf.apply_chunk(func, batch_rows=1000)` |
| `df.mf.rebalance()` | Rebalance data | `df.mf.rebalance()` |
| `df.mf.reshuffle()` | Shuffle data | `df.mf.reshuffle()` |
| `md.to_odps_table()` | Write to MaxCompute | `md.to_odps_table(df, "table")` |
| `df.to_pandas()` | Convert to pandas | `df.to_pandas()` |
| `df.execute()` | Execute lazy operations | `result.execute()` |
### Tensor Operations
| Operation | Description | Example |
|-----------|-------------|---------|
| `mt.array()` | Create tensor | `arr = mt.array([1, 2, 3])` |
| `mt.zeros()` | Create zeros array | `mt.zeros((3, 4))` |
| `mt.ones()` | Create ones array | `mt.ones((2, 3))` |
| `mt.eye()` | Create identity matrix | `mt.eye(4)` |
| `mt.arange()` | Create range | `mt.arange(0, 10)` |
| `mt.linspace()` | Create linspace | `mt.linspace(0, 1, 5)` |
| `mt.sum()` | Sum elements | `mt.sum(arr)` |
| `mt.mean()` | Mean value | `mt.mean(arr)` |
| `mt.median()` | Median value | `mt.median(arr)` |
| `mt.std()` | Standard deviation | `mt.std(arr)` |
| `mt.var()` | Variance | `mt.var(arr)` |
| `mt.min()` | Minimum value | `mt.min(arr)` |
| `mt.max()` | Maximum value | `mt.max(arr)` |
| `mt.dot()` | Dot product | `mt.dot(A, B)` |
| `mt.matmul()` | Matrix multiplication | `mt.matmul(A, B)` |
| `mt.transpose()` | Transpose | `mt.transpose(arr)` |
| `mt.reshape()` | Reshape array | `mt.reshape(arr, (3, 2))` |
| `mt.flatten()` | Flatten array | `arr.flatten()` |
| `mt.sort()` | Sort array | `mt.sort(arr)` |
| `mt.unique()` | Unique values | `mt.unique(arr)` |
| `mt.stack()` | Stack arrays | `mt.stack([a, b])` |
| `mt.concat()` | Concatenate arrays | `mt.concat([a, b])` |
| `mt.linalg.inv()` | Matrix inverse | `mt.linalg.inv(A)` |
| `mt.linalg.det()` | Determinant | `mt.linalg.det(A)` |
| `mt.linalg.eig()` | Eigenvalues | `mt.linalg.eig(A)` |
| `mt.linalg.solve()` | Solve linear system | `mt.linalg.solve(A, b)` |
| `mt.linalg.norm()` | Matrix norm | `mt.linalg.norm(A)` |
### Machine Learning Operations
| Operation | Description | Example |
|-----------|-------------|---------|
| `preprocessing.StandardScaler()` | Standardize features | `scaler = StandardScaler()` |
| `preprocessing.MinMaxScaler()` | Min-max scaling | `scaler = MinMaxScaler()` |
| `preprocessing.LabelEncoder()` | Encode labels | `encoder = LabelEncoder()` |
| `preprocessing.OneHotEncoder()` | One-hot encoding | `encoder = OneHotEncoder()` |
| `train_test_split()` | Split train/test | `X_train, X_test, y_train, y_test = train_test_split(X, y)` |
| `LinearRegression()` | Linear regression | `model = LinearRegression()` |
| `LogisticRegression()` | Logistic regression | `clf = LogisticRegression()` |
| `Ridge()` | Ridge regression | `model = Ridge(alpha=1.0)` |
| `Lasso()` | Lasso regression | `model = Lasso(alpha=1.0)` |
| `DecisionTreeClassifier()` | Decision tree classifier | `clf = DecisionTreeClassifier()` |
| `DecisionTreeRegressor()` | Decision tree regressor | `reg = DecisionTreeRegressor()` |
| `RandomForestClassifier()` | Random forest classifier | `clf = RandomForestClassifier()` |
| `RandomForestRegressor()` | Random forest regressor | `reg = RandomForestRegressor()` |
| `GradientBoostingClassifier()` | Gradient boosting | `clf = GradientBoostingClassifier()` |
| `model.fit()` | Train model | `model.fit(X_train, y_train)` |
| `model.predict()` | Make predictions | `predictions = model.predict(X_test)` |
| `model.predict_proba()` | Get probabilities | `probs = model.predict_proba(X_test)` |
| `metrics.accuracy_score()` | Accuracy | `accuracy_score(y_test, predictions)` |
| `metrics.precision_score()` | Precision | `precision_score(y_test, predictions)` |
| `metrics.recall_score()` | Recall | `recall_score(y_test, predictions)` |
| `metrics.f1_score()` | F1 score | `f1_score(y_test, predictions)` |
| `metrics.mean_squared_error()` | MSE | `mean_squared_error(y_test, predictions)` |
| `metrics.mean_absolute_error()` | MAE | `mean_absolute_error(y_test, predictions)` |
| `metrics.r2_score()` | R2 score | `r2_score(y_test, predictions)` |
| `cross_val_score()` | Cross-validation | `cross_val_score(model, X, y, cv=5)` |
| `GridSearchCV()` | Grid search | `grid = GridSearchCV(model, param_grid, cv=5)` |
| `RandomizedSearchCV()` | Random search | `search = RandomizedSearchCV(model, param_dist, n_iter=10)` |
---
## Additional Resources
For more detailed information, refer to:
- **MaxFrame Client Documentation**: `references/maxframe-client-docs/`
- **Online API Reference**: https://maxframe.readthedocs.io/en/latest/reference/index.html
- **Source Code**: https://github.com/aliyun/alibabacloud-odps-maxframe-client.git
FILE:references/operators-and-modules/operator-selection-rules.md
# Operator Selection Rules
This document provides guidance for selecting and recommending MaxFrame operators. The operator-selector agent uses the `lookup_operator.py` script to find, verify, and recommend operators based on actual MaxFrame documentation.
## Core Principles
### 1. Performance First
Prefer operators optimized for distributed execution:
- Use `.mf.apply_chunk()` for batch processing of large datasets
- Use vectorized operations instead of row-wise operations
- Use built-in aggregations instead of custom UDFs when possible
### 2. Batch First
For processing large datasets, prioritize batch operations:
- Use `.mf.apply_chunk()` for custom functions on large data
- Use `.mf.map_reduce()` for map-reduce patterns
- Process data in chunks to distribute workload
### 3. Pandas Compatibility First
MaxFrame provides pandas-compatible APIs. When multiple options exist, prefer pandas-compatible operators that users are already familiar with.
### 4. Provide Alternatives
When recommending operators, always provide fallback options:
- Primary recommendation (best engine support, best performance)
- Alternative 1 (if primary has limitations or partial support)
- Alternative 2 (if primary is not supported on current engine)
- UDF approach (if no native operator is available)
### 5. UDF Fallback for Unsupported Operators
When an operator is not supported or has significant limitations, provide a UDF-based solution:
- Use `.apply()` with a custom function
- Use `.mf.apply_chunk()` for distributed UDF execution
- Prefer `dataframe.mf.apply_chunk` over `dataframe.apply`, and `dataframe.apply` over `series.apply`.
- Document any performance implications of the UDF approach
## Engine Support Priority
MaxFrame operators run on different execution engines. Check `./maxframe-client-docs/user_guide/dataframe/supported_pd_apis.md` for the detailed support matrix.
**Priority order:**
1. **SQL Engine (MCSQL)** - Highest priority, best performance
2. **DPE** - Good performance, broader API support
3. **SPE** - Lowest priority, fallback option
When recommending an operator, check if it's supported on SQL Engine first. If not, check DPE, then SPE.
**Legend:**
- `Y` - Fully supported
- `P` - Partially supported (see Details column)
- `N` - Not supported
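The priority order and legend above can be combined into a small selection routine. This is a sketch under stated assumptions: `pick_engine` is a hypothetical helper, the `support` dictionaries are made-up rows rather than values read from `supported_pd_apis.md`, and preferring a fully supported lower-priority engine over a partially supported higher-priority one is a design choice the text does not spell out.

```python
# Priority order from the list above: SQL Engine first, SPE last.
ENGINE_PRIORITY = ["MCSQL", "DPE", "SPE"]


def pick_engine(support):
    """Pick an engine for an operator from its support flags.

    `support` maps an engine name to "Y", "P", or "N" (see the legend).
    Assumption: any fully supported engine beats a partially supported
    one; within each group the priority order applies.
    Returns (engine, flag), or (None, "N") if nothing supports it.
    """
    for wanted in ("Y", "P"):
        for engine in ENGINE_PRIORITY:
            if support.get(engine, "N") == wanted:
                return engine, wanted
    return None, "N"


# Hypothetical support rows, for illustration only.
print(pick_engine({"MCSQL": "N", "DPE": "Y", "SPE": "Y"}))
print(pick_engine({"MCSQL": "P", "DPE": "N", "SPE": "N"}))
```

When the result is `(None, "N")`, the UDF fallback described under Core Principles is the remaining option.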
## Using lookup_operator.py
The `lookup_operator.py` script is the authoritative source for operator information. Always use it to verify operator existence, get signatures, and retrieve documentation.
### Available Commands
#### List All Operators
```bash
python scripts/lookup_operator.py list [--fold] [--json]
```
#### Search for Operators
```bash
python scripts/lookup_operator.py search <pattern> [-n|--name-only] [--fold] [--json]
```
#### Get Operator Information
```bash
python scripts/lookup_operator.py info <name> [-s|--section SECTION] [--json]
```
### Available Sections for Info Command
- `signature` - Function signature
- `description` - Description paragraphs
- `params` / `parameters` - Parameters section
- `returns` - Returns section
- `return_type` - Return type
- `see_also` - See Also section
- `notes` - Notes section
- `examples` - Examples section
Sections can be empty.
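When the script is driven from Python rather than from a shell, the `info` command line can be assembled programmatically. The sketch below only builds the argument list from the flags documented above; `build_info_command` is a hypothetical helper, and the default script path assumes the working directory is the skill root.

```python
import sys

# Section names taken from the list above.
VALID_SECTIONS = {
    "signature", "description", "params", "parameters",
    "returns", "return_type", "see_also", "notes", "examples",
}


def build_info_command(name, section=None, as_json=False,
                       script="scripts/lookup_operator.py"):
    """Build the argv list for `lookup_operator.py info <name>`."""
    if section is not None and section not in VALID_SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    cmd = [sys.executable, script, "info", name]
    if section is not None:
        cmd += ["--section", section]
    if as_json:
        cmd.append("--json")
    return cmd


cmd = build_info_command("DataFrame.rolling", section="examples")
print(" ".join(cmd[1:]))
# To execute: subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Validating the section name before spawning a process avoids a round trip for a typo such as `example` instead of `examples`.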
## Operator Selection Guidelines
### Check Series.apply Implications
`Series.apply()` currently generates a join operation which can be expensive. Prefer:
- `DataFrame.apply()` when working with DataFrames
- `DataFrame.mf.apply_chunk()` for batch processing on large datasets
- Vectorized operations when possible
## Workflow for Operator Selection
1. **Search** for operators matching the task description
2. **Validate** operator existence and engine support
3. **Check for known issues** (e.g., Series.apply generates joins)
4. **Prepare alternative options** with different operators or UDF approaches
5. **Retrieve** only the sections needed (signature, examples, etc.)
6. **Recommend** with primary choice and fallbacks
## Example
```bash
# Find operators for time series operations
python scripts/lookup_operator.py search rolling
# Get full info on a specific operator
python scripts/lookup_operator.py info DataFrame.rolling
# Get only the signature and examples
python scripts/lookup_operator.py info DataFrame.rolling -s signature
python scripts/lookup_operator.py info DataFrame.rolling -s examples
# Check if operator is supported on SQL Engine
grep "rolling" references/maxframe-client-docs/user_guide/dataframe/supported_pd_apis.md
```
## Important
- **Never assume** operator behavior - always use lookup_operator.py
- **Always verify** operator existence before recommending
- **Check engine support** in supported_pd_apis.md for compatibility
- **Provide alternatives** - always give backup options if primary has limitations
- **Include UDF fallback** - provide custom function approach when native operators are unavailable
- **Use section extraction** to avoid loading large documentation files
- **Reference actual documentation** for accurate information
## Reference
For complete API documentation:
- See `maxframe-client-docs/` directory
- See `supported_pd_apis.md` for engine support matrix
- Use `lookup_operator.py` script for detailed operator information
FILE:references/operators-and-modules/operator-selector.md
---
name: operator-selector
description: Use this agent when you need to find, validate, or recommend MaxFrame operators for specific tasks. Examples:
<example>
Context: User is building a MaxFrame job and needs to perform data transformation
user: "How do I transform a DataFrame column using a custom function?"
assistant: "I'll search for the appropriate MaxFrame operators for DataFrame transformations."
<commentary>
User needs to find operators for DataFrame transformation. The operator-selector agent should search for relevant operators like apply, map, or transform functions.
</commentary>
assistant: "I'll use the operator-selector agent to find the best MaxFrame operators for your data transformation task."
</example>
<example>
Context: User is unsure if a specific operator exists in MaxFrame
user: "Does MaxFrame have an apply_chunk operator for processing data in chunks?"
assistant: "Let me check if that operator is available in MaxFrame."
<commentary>
User is asking about operator existence. The operator-selector agent should validate if the operator exists and provide its documentation.
</commentary>
assistant: "I'll use the operator-selector agent to verify if the apply_chunk operator exists in MaxFrame."
</example>
<example>
Context: User describes a task but doesn't know which operators to use
user: "I need to resample time series data and compute rolling averages on a large dataset."
assistant: "I'll find the best MaxFrame operators for time series resampling and rolling operations."
<commentary>
User has described a task but doesn't know specific operators. The operator-selector agent should search for and recommend appropriate operators like resample, rolling, or ewm.
</commentary>
assistant: "I'll use the operator-selector agent to recommend the optimal MaxFrame operators for your time series processing task."
</example>
<example>
Context: User wants examples of how to use specific operators
user: "Show me examples of how to use the merge operator for joining DataFrames."
assistant: "I'll find the merge operator documentation and provide usage examples."
<commentary>
User wants specific usage examples. The operator-selector agent should retrieve the examples section from the operator documentation.
</commentary>
assistant: "I'll use the operator-selector agent to get the merge operator documentation and usage examples."
</example>
<example>
Context: User needs to validate multiple operators exist before using them
user: "Before I start coding, can you check if maxframe has these operators: read_parquet, merge, groupby, and to_odps_table?"
assistant: "I'll validate that all those operators are available in MaxFrame."
<commentary>
User needs to validate multiple operators. The operator-selector agent should search for each operator and confirm availability.
</commentary>
assistant: "I'll use the operator-selector agent to validate that all those operators are available in MaxFrame."
</example>
<example>
Context: User is optimizing a pipeline and wants better operator alternatives
user: "I'm using apply() but it's slow. Is there a faster way to process my DataFrame?"
assistant: "I'll find optimized alternatives to the apply operator for better performance."
<commentary>
User needs performance optimization recommendations. The operator-selector agent should suggest more efficient operators like vectorized operations or map_reduce.
</commentary>
assistant: "I'll use the operator-selector agent to find more efficient MaxFrame operators for your DataFrame processing."
</example>
model: inherit
color: blue
tools: ["Bash"]
---
You are a MaxFrame API expert specializing in operator selection, validation, and recommendation. Your role is to help users find the right MaxFrame operators for their tasks, validate operator availability, and provide optimized operator combinations with usage examples.
## Table of Contents
- [Core Responsibilities](#core-responsibilities)
- [Available Tools](#available-tools)
- [Task Execution Process](#task-execution-process)
- [Quality Standards](#quality-standards)
- [Search Best Practices](#search-best-practices)
- [Common Operator Categories](#common-operator-categories)
- [Edge Case Handling](#edge-case-handling)
- [Output Guidelines](#output-guidelines)
- [Example Responses](#example-responses)
- [Remember](#remember)
## Core Responsibilities
1. **Operator Discovery**: Search and identify relevant MaxFrame operators based on user task descriptions
2. **Operator Validation**: Verify that operators exist and are available in the MaxFrame API
3. **Operator Recommendation**: Suggest the most appropriate operators for specific tasks, considering performance and best practices
4. **Documentation Retrieval**: Provide operator signatures, parameters, and usage examples without loading large documentation files
5. **Operator Combination**: Recommend optimized combinations of operators for complex tasks
6. **Context-Aware Search**: Use pattern matching and content search to find operators even when users don't know exact names
## Available Tools
You have access to:
1. **lookup_operator.py script** at `scripts/lookup_operator.py`
2. **Operator Selection Rules** at `operator-selection-rules.md`
### Script Commands
**List all operators:**
```bash
python scripts/lookup_operator.py list [--fold] [--json]
```
**Search for operators:**
```bash
python scripts/lookup_operator.py search <pattern> [-n|--name-only] [--fold] [--json]
```
**Get operator documentation:**
```bash
python scripts/lookup_operator.py info <name> [-s|--section SECTION] [--json]
```
### Available Sections for Info Command
- `signature`: The function signature line
- `description`: Description paragraphs
- `params` / `parameters`: Parameters section
- `returns`: Returns section
- `return_type`: Return type
- `see_also`: See Also section
- `notes`: Notes section
- `examples`: Examples section
Sections can be empty.
## Task Execution Process
### Step 1: Understand the User's Requirement
Analyze what the user is trying to accomplish:
- Are they asking about a specific operator?
- Are they describing a task but don't know which operators to use?
- Do they need to validate operator existence?
- Do they need usage examples or performance recommendations?
### Step 2: Search for Relevant Operators
Use the `search` command with appropriate patterns:
- For specific operator names: search for the exact name or partial match
- For task descriptions: search for relevant keywords (e.g., "transform", "resample", "merge", "group")
- Use `-n` flag to search names only for faster results
- Use `--fold` flag to save tokens when displaying many results
**Search Strategy:**
- Start with broad search terms related to the task
- If too many results, refine with more specific patterns
- Try different variations of the search term
- Search in content if name-only search doesn't yield good results
### Step 3: Validate Operator Existence
For each candidate operator:
```bash
python .../lookup_operator.py info <operator_name>
```
Check that the operator exists and is available in MaxFrame.
### Step 4: Retrieve Relevant Documentation Sections
To avoid loading large documentation, retrieve only the necessary sections:
- For understanding what the operator does: `--section description`
- For usage examples: `--section examples`
- For function signature: `--section signature`
- For parameter details: `--section params`
Example:
```bash
python .../lookup_operator.py info DataFrame.merge --section signature
python .../lookup_operator.py info DataFrame.merge --section examples
```
### Step 5: Analyze and Recommend
Based on the retrieved information:
1. **Select the best operators** for the user's task
2. **Consider performance implications** (e.g., vectorized operations vs apply)
3. **Suggest operator combinations** for complex tasks
4. **Provide clear usage examples** with relevant parameters
### Step 6: Format the Output
Present the recommendation in a clear, structured format:
````
## Recommended Operators
### 1. [Operator Name]
**Purpose**: Brief description of what it does
**Use case**: When to use this operator
**Signature**:
```python
# Function signature
```
**Parameters**:
- param1: description
- param2: description
**Example**:
```python
# Usage example
```
### 2. [Additional Operator if needed]
...
## Operator Combination Strategy
[Explain how to combine these operators for the task]
## Performance Considerations
[Any performance tips or alternatives]
````
## Quality Standards
1. **Accuracy**: Always verify operator existence before recommending
2. **Relevance**: Select operators that directly address the user's task
3. **Clarity**: Provide clear, working code examples
4. **Efficiency**: Recommend the most performant operators available
5. **Completeness**: Include all necessary parameters and context in examples
6. **Token Efficiency**: Use section extraction to avoid loading large documentation
## Search Best Practices
1. **Start Broad, Then Narrow**: Begin with general terms, then refine
2. **Use Glob Patterns**: Use wildcards (`*`) for flexible matching
- `*transform*` - matches any operator containing "transform"
- `DataFrame.*merge*` - matches DataFrame merge operations
- `mf.*` - matches all maxframe-specific (mf) operators
3. **Try Variations**: Search for synonyms and related terms
- "join" vs "merge"
- "transform" vs "map" vs "apply"
4. **Check Content**: If name-only search fails, search in content as well
5. **Use Section Extraction**: Retrieve only what's needed to save tokens
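To get a feel for how wildcard patterns select names, the standard-library `fnmatch` module can be used as a stand-in. This is only an illustration: the operator names below are hypothetical samples, and whether `lookup_operator.py search` uses the same whole-string matching rules as `fnmatch` is an assumption that should be checked against the script.

```python
from fnmatch import fnmatchcase

# Hypothetical operator names, for illustration only.
names = [
    "DataFrame.transform",
    "Series.transform",
    "DataFrame.merge",
    "DataFrame.mf.apply_chunk",
]


def match(pattern, candidates):
    """Return the candidates matching a shell-style glob pattern."""
    return [n for n in candidates if fnmatchcase(n, pattern)]


print(match("*transform*", names))
print(match("DataFrame.*merge*", names))
```

Note that `fnmatch` matches the whole string, so under these rules `mf.*` would not match `DataFrame.mf.apply_chunk`, while `*mf.*` would. If a pattern returns nothing, adding a leading `*` is a cheap variation to try.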
## Common Operator Categories
Be familiar with these MaxFrame operator categories:
**DataFrame Operations:**
- Data loading: `read_parquet`, `read_csv`, `read_table`
- Data transformation: `apply`, `map`, `transform`, `assign`
- Data aggregation: `groupby`, `agg`, `aggregate`, `pivot_table`
- Data joining: `merge`, `join`, `concat`
- Data filtering: `query`, `filter`, `loc`, `iloc`
- Data sorting: `sort_values`, `sort_index`
**MaxFrame-Specific (mf) Operations:**
- Distributed processing: `mf.map_reduce`, `mf.apply_chunk`, `mf.rebalance`, `mf.reshuffle`
- Complex operations: `mf.flatmap`, `mf.collect_kv`, `mf.extract_kv`
**Time Series Operations:**
- Resampling: `resample`
- Rolling windows: `rolling`, `expanding`, `ewm`
- Time-based operations: `at_time`, `between_time`, `tshift`
**Output Operations:**
- Writing data: `to_csv`, `to_parquet`, `to_odps_table`, `to_json`
## Edge Case Handling
**No Operators Found:**
- Try different search terms and variations
- Search in content, not just names
- Suggest alternative approaches or related operations
- Inform the user if the feature might not be available in MaxFrame
**Multiple Operators Match:**
- Explain the differences between them
- Recommend the best one for the specific use case
- Provide examples for each relevant option
**Operator Exists But Limited:**
- Note any limitations or warnings in the documentation
- Suggest workarounds or alternative approaches
- Provide clear guidance on when the operator is appropriate
**Complex Multi-Operator Tasks:**
- Break down the task into steps
- Recommend operators for each step
- Show how to combine them in a pipeline
- Provide a complete end-to-end example
## Output Guidelines
- **Be concise but thorough**: Provide enough information without overwhelming
- **Use code blocks**: Always format code examples properly
- **Explain trade-offs**: When multiple options exist, explain pros and cons
- **Reference documentation**: When relevant, mention that more details are in the full documentation
- **Stay focused**: Don't go beyond what the user asked for unless it's directly helpful
## Example Responses
**For operator validation:**
```
✅ Operator exists: `maxframe.dataframe.DataFrame.merge`
✅ Operator exists: `maxframe.dataframe.DataFrame.groupby`
✅ Operator exists: `maxframe.dataframe.read_parquet`
✅ Operator exists: `maxframe.dataframe.DataFrame.to_odps_table`
All four operators are available in MaxFrame.
```
**For operator recommendation:**
````
## Recommended Operator: `DataFrame.mf.apply_chunk`
**Purpose**: Apply a function to each chunk of a DataFrame for distributed processing
**Use case**: When you need to process large datasets with custom functions in a distributed manner
**Signature**:
```python
DataFrame.mf.apply_chunk(func, args=(), meta=None, **kwargs)
```
**Parameters**:
- func: The function to apply to each chunk
- args: Additional positional arguments to pass to func
- meta: Metadata for the output (required for certain operations)
- kwargs: Additional keyword arguments
**Example**:
```python
import maxframe.dataframe as md
# Read data
df = md.read_parquet('path/to/data.parquet')
# Apply custom function to chunks
result = df.mf.apply_chunk(
    lambda chunk: chunk.groupby('category').sum(),
    meta={'value': 'float64'}
)
```
**Performance Note**: This is more efficient than `apply()` for large datasets as it processes data in chunks in parallel.
````
## Remember
- Your goal is to help users find and use the right MaxFrame operators
- Always validate operator existence before recommending
- Use section extraction to avoid loading large documentation
- Provide clear, working examples
- Consider performance and best practices in your recommendations
- When in doubt, search for more information or ask clarifying questions
FILE:references/practical-guides/README.md
# MaxFrame Practical Guides
This directory contains practical guides for MaxFrame development, based on real-world experience and common troubleshooting scenarios.
## Guides
### [UDF Development Guide](./udf-development-guide.md)
Best practices for developing User-Defined Functions:
- Resource reuse patterns
- Output type specification
- Third-party package integration
- Network access configuration
- Timeout and memory handling
- Debugging strategies
### [Error Troubleshooting Guide](./error-troubleshooting.md)
Common error codes and solutions:
- Data type errors
- Dependency errors
- Job execution errors
- Memory and resource errors
- Session and table errors
- Data errors
- Shuffle errors
### [Job Configuration Guide](./job-configuration-guide.md)
Configuration and tuning options:
- SQL settings
- Session management
- PythonPack configuration
- Resource allocation
- Split and shuffle configuration
- Best practices for different scenarios
### [Data Handling Guide](./data-handling-guide.md)
Data processing patterns and techniques:
- JSON processing
- Data type handling
- Schema management
- String handling
- Data transformation patterns
- Validation techniques
### [OSS Mounting Guide](./oss-mounting-guide.md)
Mounting and using OSS as distributed storage:
- Prerequisites and RAM role setup
- Using `with_fs_mount` decorator
- Resource configuration
- Batch processing examples
- Debugging techniques
### [AI Function Guide](./ai-function-guide.md)
LLM inference with GPU resources:
- Environment setup for GU resources
- Using `ManagedTextGenLLM` for managed models
- Prompt template syntax
- Performance tuning and debugging
### [Flag Reference](./flag-reference.md)
Comprehensive flag and options reference:
- MaxCompute SQL flags configuration
- MaxFrame options reference
- Parallelism and split settings
- Resource and memory tuning
- Shuffle and stability flags
- UDF timeout and safety settings
## Quick Reference
### Common Configuration
```python
from maxframe import new_session
from maxframe.config import options
# Enable 2.0 data types and use the common runtime image
options.sql.settings = {
    "odps.sql.type.system.odps2": "true",
    "odps.session.image": "common",
}
# Session settings
options.session.max_idle_seconds = 60 * 60 * 24 # 24 hours
session = new_session()
```
### Common Issues
| Error | Solution |
|-------|----------|
| `invalid type INT` | Enable `odps.sql.type.system.odps2=true` |
| `No module named 'cloudpickle'` | Set `odps.session.image=common` |
| `User script exception` | Check UDF code, dependencies, types |
| `kInstanceMonitorTimeout` | Adjust batch size and timeout |
| `Job exceed live limit` | Increase session max_alive_seconds |
| `Table not found` | Increase temp_table_lifecycle |
FILE:references/practical-guides/ai-function-guide.md
# AI Function Guide
This guide covers how to use MaxFrame AI Function with GPU resources (GU) for large language model (LLM) offline inference.
## Overview
MaxFrame AI Function is an end-to-end solution for LLM offline inference on MaxCompute, integrating data processing with AI capabilities.
## Prerequisites
### Environment Requirements
| Requirement | Version/Description |
|-------------|---------------------|
| MaxFrame SDK | 2.3.0 or higher |
| Python | 3.11 |
| MaxCompute Project | GPU quota (GU) enabled |
### Permissions
- MaxCompute project-level read/write access
- Purchased MaxCompute GU quota (`gu_quota_name`)
## Environment Configuration
### Basic Setup
```python
import os
import maxframe.dataframe as md
import numpy as np
from maxframe import new_session
from maxframe.config import options
from maxframe.udf import with_running_options
from odps import ODPS
import logging
# Configure engine order
options.dag.settings = {
    "engine_order": ["DPE", "MCSQL"],
    "unavailable_engines": ["SPE"],
}
logging.basicConfig(level=logging.INFO)
# Initialize MaxFrame Session
o = ODPS(
    access_id=os.getenv('ODPS_ACCESS_ID'),
    secret_access_key=os.getenv('ODPS_ACCESS_KEY'),
    project=os.getenv('ODPS_PROJECT'),
    endpoint=os.getenv('ODPS_ENDPOINT'),
)
session = new_session(o)
# Set GU quota name (required for GPU usage)
options.session.gu_quota_name = "<your-gu-quota-name>"
print("LogView:", session.get_logview_address())
```
## Using Managed LLM Models
### Step 1: Prepare Input Data
```python
import pandas as pd
# Construct query list
query_list = [
    "What is the average distance from Earth to the Sun?",
    "What year did the American Revolutionary War begin?",
    "What is the boiling point of water?",
    "How to quickly relieve a headache?",
    "Who is the protagonist in the Harry Potter series?",
]
# Convert to MaxFrame DataFrame
df = md.DataFrame({"query": query_list})
df.execute()
```
### Step 2: Initialize LLM Instance
```python
from maxframe.learn.contrib.llm.models.managed import ManagedTextGenLLM
llm = ManagedTextGenLLM(
name="Qwen3-4B-Instruct-2507-FP8" # Model name must match exactly
)
```
**Note:** For supported models, refer to MaxFrame AI Function supported models documentation.
### Step 3: Define Prompt Template
```python
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Please answer the following question: {query}"},
]
```
**Template Syntax:**
- Use `{column_name}` placeholders for DataFrame field substitution
- Supports multi-turn conversations (messages list)
- System prompt (`role: system`) sets model behavior
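To see how the `{column_name}` placeholders behave, the substitution can be mimicked locally in plain Python. This is only a sketch: `render_messages` is a hypothetical helper, and it assumes the substitution behaves like `str.format` applied to each row. MaxFrame renders the template on the service side, and its exact rules may differ.

```python
from typing import Dict, List


def render_messages(messages: List[Dict[str, str]], row: Dict[str, str]) -> List[Dict[str, str]]:
    """Fill {column_name} placeholders in every message with values from one row."""
    return [
        {"role": m["role"], "content": m["content"].format_map(row)}
        for m in messages
    ]


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please answer the following question: {query}"},
]

# One input row keyed by column name (illustrative value)
row = {"query": "What is the boiling point of water?"}

rendered = render_messages(messages, row)
print(rendered[1]["content"])
```

In this local sketch, a placeholder that names a column missing from the row raises `KeyError`, which is a quick way to catch template typos before submitting a job.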
### Step 4: Execute Generation
```python
result_df = llm.generate(
df, # Input data
prompt_template=messages,
running_options={
"max_tokens": 4096, # Maximum output length
"verbose": True, # Enable verbose logging
},
params={"temperature": 0.7},
)
# Execute and fetch results
result_df.execute()
```
## Output Structure
The result DataFrame contains the following fields:
| Field | Type | Description |
|-------|------|-------------|
| `query` | string | Original input |
| `generated_text` | string | Model-generated response |
| `finish_reason` | string | Completion reason: `stop`, `length`, etc. |
| `usage.prompt_tokens` | int | Input token count |
| `usage.completion_tokens` | int | Output token count |
| `usage.total_tokens` | int | Total token count |
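Once results have been fetched locally, the usage fields can be used to total token consumption and to spot answers that were cut off. The sketch below works on plain dictionaries that use the field names from the table. The sample records are invented for illustration, `summarize_usage` is a hypothetical helper, and it assumes `finish_reason == "length"` means the output hit the token limit, as in the common chat-completion convention.

```python
from typing import Dict, List


def summarize_usage(records: List[Dict]) -> Dict[str, int]:
    """Aggregate token counts and count responses that stopped at the length limit."""
    summary = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0, "truncated": 0}
    for rec in records:
        summary["prompt_tokens"] += rec["usage.prompt_tokens"]
        summary["completion_tokens"] += rec["usage.completion_tokens"]
        summary["total_tokens"] += rec["usage.total_tokens"]
        # Assumption: "length" marks a response truncated by the token limit
        if rec["finish_reason"] == "length":
            summary["truncated"] += 1
    return summary


# Made-up sample rows using the field names from the table above
records = [
    {"finish_reason": "stop", "usage.prompt_tokens": 20,
     "usage.completion_tokens": 50, "usage.total_tokens": 70},
    {"finish_reason": "length", "usage.prompt_tokens": 25,
     "usage.completion_tokens": 4096, "usage.total_tokens": 4121},
]

summary = summarize_usage(records)
print(summary)
```

A non-zero `truncated` count suggests raising `max_tokens` or shortening the prompt.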
## Performance Tuning
### Optimization Guidelines
| Aspect | Recommendation |
|--------|----------------|
| Batch size | Keep batches < 100 to avoid OOM |
| GU allocation | `gu=2` for 4B models; larger models need more GU |
| Parallelism | MaxFrame auto-schedules; control with `num_workers` |
| Intermediate results | Save with `to_odps_table()` to avoid recomputation |
| Timeout | Set `timeout=3600` to prevent hanging |
## Debugging
### View Execution Logs
```python
print(session.get_logview_address()) # Click to view real-time MaxFrame job logs
```
### Small-Scale Testing
```python
df_sample = df.head(2) # Take 2 rows for testing
result_sample = llm.generate(
df_sample,
prompt_template=messages,
running_options={"gu": 2}
)
result_sample.execute()
```
### Check Resource Usage
View job execution details through MaxFrame LogView.
## Common Patterns
### Custom System Prompt
```python
messages = [
{"role": "system", "content": "You are an expert in data analysis. Provide concise answers."},
{"role": "user", "content": "Analyze the following data: {data}"},
]
```
### Multi-Turn Conversation
```python
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "My name is {name}."},
{"role": "assistant", "content": "Hello {name}! How can I help you today?"},
{"role": "user", "content": "{query}"},
]
```
### Batch Processing Large Datasets
```python
# Read from MaxCompute table
df = md.read_odps_table("project.schema.input_table")
# Process with LLM
result = llm.generate(
df,
prompt_template=messages,
running_options={"max_tokens": 2048},
)
# Save results to table
# to_odps_table() is lazy; calling execute() on its return value triggers the write
result.to_odps_table("project.schema.output_table").execute()
```
FILE:references/practical-guides/data-handling-guide.md
# Data Handling Guide
This guide covers common data handling scenarios in MaxFrame, including JSON processing, data type handling, and schema management.
## JSON Data Processing
### Parsing JSON Strings
MaxFrame v1.0.0+ supports the `flatjson` method for extracting multiple JSON fields:
```python
# Extract multiple fields from JSON strings
df.mf.flatjson("json_column", ["field1", "field2", "field3"])
```
For more complex JSON parsing, use `apply` with custom functions:
```python
import json
import pandas as pd
import numpy as np
def parse_json(row):
    data = json.loads(row["json_text"])
    return pd.Series({
        "field1": data.get("field1"),
        "field2": data.get("field2"),
    })
result = df.apply(
    parse_json,
    axis=1,
    dtypes={"field1": np.str_, "field2": np.str_},
    output_type="dataframe"
)
```
### Handling Invalid JSON
When parsing JSON, invalid strings will cause errors:
```python
def simple_failure(row):
    import json
    text = row["json_text"]
    data = json.loads(text)  # Fails if text is not valid JSON
    return data
```
**Solution:** Add error handling:
```python
def safe_parse_json(row):
    import json
    try:
        data = json.loads(row["json_text"])
        return pd.Series({"result": data, "error": None})
    except json.JSONDecodeError as e:
        return pd.Series({"result": None, "error": str(e)})
result = df.apply(
    safe_parse_json,
    axis=1,
    dtypes={"result": np.object_, "error": np.str_},
    output_type="dataframe"
)
```
## Data Type Handling
### Type System Overview
MaxFrame uses MaxCompute's type system. Key types include:
| MaxCompute Type | NumPy Type | Python Type |
|----------------|------------|-------------|
| BIGINT | np.int64 | int |
| DOUBLE | np.float64 | float |
| STRING | np.str_ | str |
| BOOLEAN | np.bool_ | bool |
| DATETIME | np.datetime64 | datetime |
| ARRAY | - | list |
| MAP | - | dict |
### Enabling 2.0 Data Types
Some operations require MaxCompute 2.0 data types:
```python
from maxframe import config
config.options.sql.settings = {
"odps.sql.type.system.odps2": "true"
}
```
### Handling NULL Values
In Pandas, BIGINT/INT columns cannot contain NULL (converted to FLOAT). To handle this:
```python
# Fill NULL before printing
df["int_column"] = df["int_column"].fillna(0)
# Or convert to float
df["int_column"] = df["int_column"].astype(float)
```
### Specifying Output Types
When UDFs return DataFrames or Series, specify output types:
```python
import numpy as np
import pandas as pd
def process(row):
    # Cast each field explicitly so the values match the declared dtypes
    return pd.Series({
        "id": int(row["id"]),
        "name": str(row["name"]),
        "score": float(row["score"]),
    })
result = df.apply(
    process,
    axis=1,
    dtypes={"id": np.int64, "name": np.str_, "score": np.float64},
    output_type="dataframe"
)
```
### Complex Types (ARRAY/MAP/STRUCT)
For columns containing complex types:
```python
# Access array elements
df["first_element"] = df["array_column"].apply(
lambda x: x[0] if x else None,
dtype=np.str_
)
# Access map values
df["value"] = df["map_column"].apply(
lambda x: x.get("key"),
dtype=np.str_
)
```
## Schema Management
### Reading Tables with Index
When reading tables, specify an index column to avoid "no CMF" errors:
```python
import maxframe.dataframe as md
# Specify index column
df = md.read_odps_table("table_name", index_col="id")
# Later reset index if needed
df = df.reset_index()
```
### Writing Tables
Write results to MaxCompute tables:
```python
# Write to existing table (the write runs only when execute() is called)
df.to_odps_table("project.schema.table_name").execute()
# Write with partition
df.to_odps_table("project.schema.table_name", partition="dt=20250101").execute()
```
## String Handling
### Maximum String Length
MaxCompute has a maximum string length of 268,435,456 characters. For longer strings:
**Option 1: Truncate or filter**
```python
# Filter rows with strings under limit
df = df[df["text_column"].str.len() < 268435456]
# Truncate strings
df["text_column"] = df["text_column"].str.slice(0, 1000000)
```
**Option 2: Compress data**
```python
import gzip
def compress_string(input_string):
    encoded_string = input_string.encode('utf-8')
    compressed_bytes = gzip.compress(encoded_string)
    return compressed_bytes
def decompress_string(compressed_bytes):
    return gzip.decompress(compressed_bytes).decode('utf-8')
```
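A quick local round trip shows what the compression option does to a string. This is a minimal sketch using only the standard library. The sample text is artificial and highly repetitive, so the ratio it prints is not representative of real data.

```python
import gzip


def compress_string(input_string: str) -> bytes:
    """UTF-8 encode and gzip-compress a string."""
    return gzip.compress(input_string.encode("utf-8"))


def decompress_string(compressed_bytes: bytes) -> str:
    """Reverse compress_string."""
    return gzip.decompress(compressed_bytes).decode("utf-8")


# Repetitive sample text (illustrative); real savings depend on the data
original = "MaxFrame log line with repeated content. " * 10_000
compressed = compress_string(original)

print(len(original.encode("utf-8")), "bytes before compression")
print(len(compressed), "bytes after compression")

# The round trip must be lossless
restored = decompress_string(compressed)
print("round trip ok:", restored == original)
```

Note that the compressed value is `bytes`, not `str`, so it would presumably need a binary-typed column or an additional text encoding step before it can be written back; check this against the target table schema.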
## Data Transformation Patterns
### Row-wise Processing
Use `axis=1` for row-wise operations:
```python
def process_row(row):
    # Access columns by name
    value1 = row["col1"]
    value2 = row["col2"]
    result = value1 + value2
    return result
df["new_col"] = df.apply(process_row, axis=1, dtype=np.float64)
```
### Element-wise Processing
For simple element-wise operations, use vectorized operations:
```python
# Preferred: vectorized
df["sum"] = df["col1"] + df["col2"]
# Instead of: apply with simple function
df["sum"] = df.apply(lambda row: row["col1"] + row["col2"], axis=1)
```
### Chunk Processing
For memory-intensive operations, use `apply_chunk`:
```python
def process_chunk(chunk):
    # Process entire chunk at once
    result = chunk.groupby("key").agg({"value": "sum"})
    return result
result = df.apply_chunk(
    process_chunk,
    dtypes={"key": np.str_, "value": np.float64},
    output_type="dataframe"
)
```
## Data Validation
### Checking Data Types
```python
# Check column types
print(df.dtypes)
# Check for NULL values
null_counts = df.isnull().sum()
# Check unique values
unique_counts = df.nunique()
```
### Validating Before Processing
```python
def validate_row(row):
    import numbers
    # Check required fields
    if pd.isna(row["required_field"]):
        return pd.Series({"valid": False, "error": "missing required field"})
    # Check data format; numbers.Real also accepts NumPy scalars such as np.int64
    if not isinstance(row["numeric_field"], numbers.Real):
        return pd.Series({"valid": False, "error": "invalid numeric field"})
    return pd.Series({"valid": True, "error": None})
validation = df.apply(
    validate_row,
    axis=1,
    dtypes={"valid": np.bool_, "error": np.str_},
    output_type="dataframe"
)
# Filter invalid rows
invalid_rows = df[~validation["valid"]]
```
## Common Patterns
### Conditional Transformation
```python
import numpy as np
def conditional_transform(row):
    if row["type"] == "A":
        return row["value"] * 2
    elif row["type"] == "B":
        return row["value"] * 3
    else:
        return row["value"]
df["transformed"] = df.apply(conditional_transform, axis=1, dtype=np.float64)
```
### Lookup and Join
```python
# Join with another DataFrame
result = df1.merge(df2, on="key", how="left")
# Or use map for simple lookups
lookup_dict = {"A": 1, "B": 2, "C": 3}
df["lookup_value"] = df["category"].map(lookup_dict)
```
### Aggregation
```python
# Group by and aggregate
result = df.groupby("group_column").agg({
"numeric_col": ["sum", "mean", "count"],
"string_col": "first"
})
```
FILE:references/practical-guides/error-troubleshooting.md
# Error Troubleshooting Guide
This guide covers common error codes and their solutions in MaxFrame.
## Data Type Errors
### ODPS-0130071: invalid type INT for function UDF definition
**Error:** `invalid type INT for function UDF definition, you need to set odps.sql.type.system.odps2=true`
**Cause:** Using MaxCompute 2.0 data types without enabling the 2.0 data type version.
**Solution:**
```python
from maxframe import config
# Add before new_session
config.options.sql.settings = {
"odps.sql.type.system.odps2": "true"
}
```
### ODPS-0130071: column values_list has incompatible type ARRAY/MAP/STRUCT
**Cause:** The data being processed contains arrays, dictionaries, or structs.
**Possible reasons:**
- Type declaration issue - the target column is not expected to be an array/dict/struct
- MaxFrame type system bug
**Solution:**
1. Upgrade MaxFrame client: `pip install -U maxframe`
2. Contact MaxFrame team if issue persists
## Dependency Errors
### UDF: No module named 'cloudpickle'
**Cause:** Missing cloudpickle package dependency.
**Solution:**
```python
from maxframe import config
# Add before new_session
config.options.sql.settings = {
"odps.session.image": "common",
}
```
### TypeError: Cannot accept arguments append_partitions
**Cause:** PyODPS version compatibility issue.
**Solution:** Upgrade PyODPS to version 0.12.0 or later.
## Job Execution Errors
### ODPS-0010000: Fuxi job failed - Job failed for unknown reason
**Cause:** Dependency installation failed when using `@with_python_requirements`.
**Solution:**
1. Retry the job (may be temporary network issue)
2. For periodic jobs, cache PythonPack results:
```python
from maxframe import options
options.pythonpack.task.settings = {"odps.pythonpack.production": "true"}
```
3. Use offline packaging with PyODPS-Pack and upload as MaxCompute Resource
### ODPS-0123144: Fuxi job failed - kInstanceMonitorTimeout CRASH_EXIT
**Cause:** UDF execution timeout.
**Solution:** Adjust batch size and timeout:
```python
from maxframe import options
options.sql.settings = {
"odps.sql.executionengine.batch.rowcount": "1",
"odps.function.timeout": "3600",
}
```
### ODPS-0123144: Fuxi job failed - Job exceed live limit
**Cause:** MaxCompute job exceeded maximum timeout (default 24 hours).
**Solution:**
```python
from maxframe import options
# Set MaxFrame Session maximum lifetime
options.session.max_alive_seconds = 72 * 60 * 60
# Set MaxFrame Session maximum idle timeout
options.session.max_idle_seconds = 72 * 60 * 60
options.sql.settings = {
# Set SQL job maximum runtime, default 24h, maximum 72h
"odps.sql.job.max.time.hours": 72,
}
```
### ODPS-0130071: physical plan generation failed - no CMF
**Cause:** Missing index column specification.
**Solution:** Add `index_col` parameter:
```python
df2 = md.read_odps_table("tablename", index_col="column").to_pandas()
df2 = df2.reset_index()  # reset_index(inplace=True) returns None, so do not reassign its result
```
### ODPS-0130071: unable to retrieve row count of file
**Cause:** Using flags like `odps.sql.split.dop` to specify split count, but source table lacks meta file.
**Solution:** Use `odps.stage.mapper.split.size` instead (unit: MB, default 256, minimum 1).
To ensure meta file generation when writing tables:
```python
from maxframe import options
options.sql.settings = {
"odps.task.merge.enabled": "false",
"odps.sql.reshuffle.dynamicpt": "false",
"odps.sql.enable.dynaparts.stats.collection": "true",
"odps.optimizer.dynamic.partition.is.first.nth.value.split.enable": "false",
"odps.sql.stats.collection.aggressive": "true",
}
```
### ODPS-0130071: task instance count exceeds limit 99999
**Cause:** Source table is very large, and splitting by 256MB produces more than 99,999 chunks.
**Solution:**
1. Use `odps.stage.mapper.split.size` with a larger value
2. Use `odps.sql.split.dop` to specify expected split count
```python
from maxframe import options
options.sql.settings = {
"odps.stage.mapper.split.size": "512", # MB
# Alternative: set an explicit split count (normally only one of the two flags is needed)
"odps.sql.split.dop": "10000",
}
```
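The limit can be checked with simple arithmetic before submitting a job. The sketch below assumes the mapper count is roughly the table size divided by the split size, rounded up. The real count also depends on file layout, so treat the result as an estimate. The helper names and the 30 TB table size are hypothetical.

```python
import math

MAX_INSTANCES = 99_999  # per-task instance limit quoted in the error message


def estimated_mappers(table_size_mb: float, split_size_mb: int = 256) -> int:
    """Rough mapper count if the table is split purely by size."""
    return math.ceil(table_size_mb / split_size_mb)


def min_split_size_mb(table_size_mb: float, max_instances: int = MAX_INSTANCES) -> int:
    """Smallest whole-MB split size that keeps the estimate within the limit."""
    return max(1, math.ceil(table_size_mb / max_instances))


table_size_mb = 30 * 1024 * 1024  # hypothetical 30 TB table, expressed in MB

print("mappers at 256 MB:", estimated_mappers(table_size_mb))
print("minimum split size (MB):", min_split_size_mb(table_size_mb))
print("mappers at 512 MB:", estimated_mappers(table_size_mb, 512))
```

For this example the default 256 MB split yields more than 99,999 estimated mappers, while 512 MB stays well below the limit, which matches the first solution above.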
## Memory and Resource Errors
### ODPS-0010000: System internal error - process exited with code 0
**Cause:** UDF/AI Function OOM.
**Solution:**
1. Contact MaxFrame team to confirm actual memory usage
2. Increase memory allocation:
```python
from maxframe.udf import with_running_options
@with_running_options(memory="8GB")
def udf_func(row):
    return row
```
### ODPS-0010000: process killed by signal 7
**Cause:** UDF sent an abnormal signal.
**Solution:**
1. Check if UDF uses signal to send cancel/timeout
2. Contact MaxCompute team
### ODPS-0010000: StdException:vector::_M_range_insert
**Cause:** UDF cannot allocate enough memory.
**Solution:**
1. Check UDF for memory issues
2. Check native dependencies for memory problems
3. Increase UDF memory allocation
### ODPS-0020011: Total resource size must be <= 2048MB
**Cause:** UDF depends on resources exceeding the 2048MB limit.
**Solution:** Use external volume with OSS for faster downloads and higher limits.
## Session and Table Errors
### NoTaskServerResponseError
**Cause:** MaxFrame Session expired (idle for more than 1 hour in Jupyter Notebook).
**Solution:**
1. Recreate the Session (previous computation state will be lost)
2. For expected long intervals, set:
```python
from maxframe import options
options.session.max_idle_seconds = 60 * 60 * 24 # 24 hours
```
### ODPS-0110061: Table not found
**Cause:** MaxFrame job running over 24 hours; temporary tables expired.
**Solution:** Increase temporary table lifecycle:
```python
from maxframe import options
options.sql.settings = {
"session.temp_table_lifecycle": 3, # days
}
```
### ODPS-0010000: Database not found
**Cause:** Cannot find specified schema/project/table.
**Solution:**
1. Check SQL for correct project.schema.table names
2. Contact MaxCompute team
## Data Errors
### ODPS-0020041: StringOutOfMaxLength
**Cause:** String length exceeds maximum allowed (268,435,456).
**Solution:**
1. Truncate or filter long strings using `LENGTH` in `read_odps_query`
2. Compress data (e.g., gzip):
```python
import gzip
def compress_string(input_string):
    encoded_string = input_string.encode('utf-8')
    compressed_bytes = gzip.compress(encoded_string)
    return compressed_bytes
```
### IntCastingNaNError: Cannot convert non-finite values to integer
**Cause:** Printing BIGINT/INT columns containing NULL or INF values in Jupyter.
**Solution:**
1. Use `fillna` before printing
2. Use `astype` to convert to FLOAT before printing
3. Avoid printing problematic columns
## Shuffle Errors
### Shuffle output too large
**Solution:**
```python
from maxframe import options
options.sql.settings = {
"odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize": "10240", # MB
}
```
### ODPS-0010000: SQL job failed after failover for too many times
**Cause:** Job Master OOM due to large shuffle data.
**Solution:**
1. Reduce mapper and reducer/joiner counts (keep under 10,000)
2. Contact MaxCompute team
## External Table Errors
### ODPS-0123131: User defined function exception - Fatal Error Happened
**Cause:** Internal error in external table read/write.
**Solution:** Contact MaxCompute team.
FILE:references/practical-guides/flag-reference.md
# MaxFrame Flag Reference
This document provides a comprehensive reference for MaxCompute Flags, runtime flags, and MaxFrame runtime parameters, including their meanings, default values, valid ranges, and typical use cases.
## Configuration Overview
### MaxCompute SQL Flags
All MaxCompute SQL-related flags are managed through the `options.sql.settings` dictionary:
```python
from maxframe import options
options.sql.settings = {
# Example: Set maximum job runtime to 72 hours
"odps.sql.job.max.time.hours": 72,
# Example: Specify custom runtime image
"odps.session.image": "common",
# Example: Set split DOP to 50000 for all tables
"odps.sql.split.dop": '{"*":50000}',
# Example: Set batch size to 1024 rows
"odps.sql.executionengine.batch.rowcount": 1024,
}
```
### MaxFrame Options
MaxFrame's own runtime parameters are configured directly via `options.xxx`:
```python
from maxframe import options
# Example: Set LogView retention to 24 hours
options.session.logview_hours = 24
# Example: Set retry count for retryable errors
options.retry_times = 3
# Example: Enable MaxCompute query acceleration (MCQA)
options.sql.enable_mcqa = True
```
## MaxCompute SQL Flags Reference
### Parallelism and Split
| Flag | Description | Range/Default | Recommendation |
|------|-------------|---------------|----------------|
| `odps.sql.split.dop` | Configure data read parallelism (DOP) based on column statistics (CMF). Format: `{table_name: count}`. Use `*` for all tables, e.g., `{"*":50000}`. | 1-99999; No default | Enable explicitly for large tables or large-scale tasks |
| `odps.stage.mapper.split.size` | Split size (MB) when CMF is unavailable | ≥1; Default 256MB | Usually keep default |
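Because `odps.sql.split.dop` takes a JSON string rather than a Python dictionary, it can help to build the value programmatically. The following is a minimal sketch: `build_split_dop` is a hypothetical helper, and the 1-99999 range check simply mirrors the table above.

```python
import json
from typing import Dict


def build_split_dop(table_dop: Dict[str, int]) -> str:
    """Serialize {table_name: count} into the JSON string form of the flag value."""
    for table, dop in table_dop.items():
        # Documented range for each count is 1-99999
        if not 1 <= dop <= 99999:
            raise ValueError(f"DOP for {table!r} must be in 1..99999, got {dop}")
    # Compact separators reproduce the '{"*":50000}' form used in the overview
    return json.dumps(table_dop, separators=(",", ":"))


settings = {"odps.sql.split.dop": build_split_dop({"*": 50000})}
print(settings)
```

The resulting dictionary can then be merged into whatever is assigned to `options.sql.settings`.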
### Resources and Memory
| Flag | Description | Range/Default | Recommendation |
|------|-------------|---------------|----------------|
| `odps.stage.mapper.mem` | Memory (MB) for single Mapper instance | 1024-12288; Default 1024 | Increase for OOM with large data or complex processing |
| `odps.stage.reducer.mem` | Memory (MB) for single Reducer instance | 1024-12288; Default 1024 | Increase for OOM during shuffle operations |
| `odps.stage.joiner.mem` | Memory (MB) for single Joiner instance | 1024-12288; Default 1024 | Increase for OOM during complex joins |
| `odps.stage.reducer.num` | Number of Reducer instances | Max 10000; Default: auto | Increase for large-scale shuffle or data skew |
| `odps.stage.joiner.num` | Number of Joiner instances | Max 10000; Default: auto | Increase for large joins or data skew |
### Shuffle and Output Safety
| Flag | Description | Range/Default | Recommendation |
|------|-------------|---------------|----------------|
| `odps.sql.runtime.flag.fuxi_streamline_x_EnableNormalCheckpoint` | Enable backup for Mapper intermediate data | Default: disabled | Enable for long-running, large shuffle jobs |
| `fuxi_ShuffleService_client_CheckpointMaxCopy` | Number of backup copies | Default: 1 | Set to 2 for improved fault tolerance in large shuffle jobs |
| `odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize` | Maximum shuffle intermediate data size (MB) | Default: system limit | Increase if encountering "Internal data size exceeds limit" errors |
### Computation Stability and Monitoring
| Flag | Description | Range/Default | Recommendation |
|------|-------------|---------------|----------------|
| `odps.sql.runtime.flag.fuxi_EnableInstanceMonitor` | Enable Fuxi scheduler heartbeat monitoring | Default: enabled | Use with `fuxi_InstanceMonitorTimeout` to prevent false "dead" termination |
| `fuxi_InstanceMonitorTimeout` | Instance monitor timeout (seconds) | Default: system | Requires allowlist from technical support |
| `odps.job.instance.retry.times` | Maximum retry count for failed workers | Default: 3; Max: 100 | Requires allowlist for values above default |
| `odps.dag2.compound.config` | Worker reuse strategy. Set `fuxi.worker.reuse.policy:NO_REUSE` to disable | Default: enabled | Disable when UDF has memory leaks or state pollution |
### Execution Efficiency
| Flag | Description | Range/Default | Recommendation |
|------|-------------|---------------|----------------|
| `odps.sql.executionengine.batch.rowcount` | Batch size (rows) for data processing | Default: 1024 | Balance between memory and performance. Reduce for large rows causing OOM |
| `odps.sql.runtime.flag.executionengine_EnableVectorizedExpr` | Enable vectorized expression engine | Default: disabled | Enable for compute-intensive operations with `rand()` or arithmetic |
| `odps.optimizer.enable.conditional.mapjoin` | Enable conditional map join | Default: system | Use with `cbo.rule.filter.black` |
| `odps.optimizer.cbo.rule.filter.black` | Disable specific optimization rules. Set to `"hj"` to disable HashJoin | Default: none | Expert option - use with caution |
| `odps.sql.split.cluster.parallel_explore` | Parallel CMF reading during split | Default: disabled | Enable when split phase takes too long |
| `odps.sql.jobmaster.memory` | JobMaster memory (MB) | Default: system | Increase for large-scale shuffle, e.g., 30000MB |
### UDF and Function Safety
| Flag | Description | Range/Default | Recommendation |
|------|-------------|---------------|----------------|
| `odps.sql.udf.timeout` | UDF batch execution timeout (seconds) | 1-3600; Default 1800 | 0 has no effect |
| `odps.function.timeout` | Function batch execution timeout (seconds) | 1-3600; Default 1800 | 0 has no effect |
| `odps.sql.runtime.flag.executionengine_PythonStdoutMaxsize` | Maximum stdout log size (MB) for Python UDF | Max: 100; Default: 20 | Requires allowlist from technical support |
### Resources and Environment
| Flag | Description | Range/Default | Recommendation |
|------|-------------|---------------|----------------|
| `odps.session.image` | Custom runtime image name | Must exist in project | Use for custom dependencies |
| `odps.task.major.version` | Lock job to specific MaxCompute version | Expert option | Do not configure without understanding impact |
| `odps.storage.orc.row.group.stride` | ORC file row group size | Expert option | Do not configure without understanding |
| `odps.storage.meta.file.version` | CMF metadata file version | Expert option | Do not configure without understanding |
### General Parameters
| Flag | Description | Range/Default | Recommendation |
|------|-------------|---------------|----------------|
| `odps.sql.allow.fullscan` | Allow full table scan on partitioned tables | Default: disabled | Enable cautiously - may cause high costs |
| `odps.sql.cfile2.field.maxsize` | Maximum field size (bytes) | Max: 268435456 (256MB); Default: 8388608 (8MB) | Increase for large text, HTML, or Base64 fields |
| `odps.sql.job.max.time.hours` | Maximum job runtime (hours) | Max: 72; Default: 24 | Increase for long-running jobs |
| `odps.sql.always.commit.result` | Enable partial commit | Default: disabled | Use with `EnableWorkerCommit` for ETL allowing partial success |
| `odps.sql.runtime.flag.executionengine_EnableWorkerCommit` | Enable worker-level commit | Default: disabled | Use with `always.commit.result` |
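The numeric limits in the tables above can be checked locally before a job is submitted. The sketch below is a hypothetical pre-flight helper, not part of MaxFrame. It only knows the ranges listed in this document and ignores every other flag.

```python
from typing import Dict, List, Optional, Tuple

# (minimum, maximum) copied from the tables above; None means "not documented"
DOCUMENTED_RANGES: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "odps.stage.mapper.split.size": (1, None),
    "odps.stage.mapper.mem": (1024, 12288),
    "odps.stage.reducer.mem": (1024, 12288),
    "odps.stage.joiner.mem": (1024, 12288),
    "odps.stage.reducer.num": (None, 10000),
    "odps.stage.joiner.num": (None, 10000),
    "odps.sql.udf.timeout": (1, 3600),
    "odps.function.timeout": (1, 3600),
    "odps.sql.cfile2.field.maxsize": (None, 268435456),
    "odps.sql.job.max.time.hours": (None, 72),
}


def check_flag_ranges(settings: Dict[str, object]) -> List[str]:
    """Return one message for each value outside its documented range."""
    problems = []
    for flag, value in settings.items():
        if flag not in DOCUMENTED_RANGES:
            continue  # no documented numeric range; nothing to check
        low, high = DOCUMENTED_RANGES[flag]
        number = int(value)  # settings are often given as strings
        if low is not None and number < low:
            problems.append(f"{flag}={number} is below the minimum {low}")
        if high is not None and number > high:
            problems.append(f"{flag}={number} is above the maximum {high}")
    return problems


problems = check_flag_ranges({
    "odps.function.timeout": "7200",
    "odps.stage.mapper.mem": 2048,
    "odps.sql.job.max.time.hours": 72,
})
print(problems)
```

Here only the timeout is reported, because 7200 exceeds the documented maximum of 3600 seconds.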
### CMF Generation (Fixed Configuration)
For writing to dynamic partition tables with proper CMF generation:
```python
options.sql.settings = {
"odps.task.merge.enabled": "false",
"odps.sql.reshuffle.dynamicpt": "false",
"odps.sql.enable.dynaparts.stats.collection": "true",
"odps.optimizer.dynamic.partition.is.first.nth.value.split.enable": "false",
"odps.sql.stats.collection.aggressive": "true",
}
```
This ensures fast and correct CMF generation for dynamic partition tables, which is essential for downstream jobs using `odps.sql.split.dop`.
## MaxFrame Options Reference
### Session Configuration
| Option | Description | Type | Default |
|--------|-------------|------|---------|
| `options.session.quota_name` | Quota resource for job execution | str/None | None |
| `options.session.logview_hours` | LogView link retention (hours) | int | 24 |
| `options.session.max_alive_seconds` | Maximum session lifetime | int | 86400 (24h) |
| `options.session.max_idle_seconds` | Maximum idle time before session recycling | int | 3600 (1h) |
| `options.session.temp_table_lifecycle` | Temporary table lifecycle (days) | int | 1 |
| `options.session.auto_purge_temp_tables` | Auto-cleanup temp tables on session end | bool | False |
### SQL Configuration
| Option | Description | Type | Default |
|--------|-------------|------|---------|
| `options.sql.enable_mcqa` | Enable MaxCompute query acceleration | bool | True |
| `options.sql.generate_comments` | Add comments to generated SQL | bool | True |
| `options.sql.auto_use_common_image` | Auto-configure common image for dependencies | bool | True |
### Other Options
| Option | Description | Type | Default |
|--------|-------------|------|---------|
| `options.local_timezone` | Local timezone for date/time functions | str/None | None |
| `options.retry_times` | Retry count for retryable errors | int | 3 |
| `options.function.default_running_options` | Default resources for `@remote` functions | dict | {} |
### DPE Engine Settings
```python
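from maxframe import options
# Set both whitelists in one assignment; assigning options.dpe.settings twice
# would replace the first dictionary with the second
options.dpe.settings = {
    # UDF external network access whitelist
    "substep.public_network_whitelist": ["xxxxxx"],
    # UDF internal network access whitelist
    "substep.internal_network_whitelist": ["xxxxx"],
}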
```
## Important Notes
1. **Allowlist Requirements**: Many special flags require allowlist approval from MaxCompute technical support before use.
2. **Custom Images**: Some flags depend on custom runtime images being properly configured.
3. **CMF Dependencies**: Split-related flags like `odps.sql.split.dop` require proper CMF statistics.
4. **Expert Options**: Flags marked as "expert options" should only be configured with full understanding of their impact on execution plans.
5. **Technical Support**: Always confirm with MaxCompute technical support before configuring advanced options.
FILE:references/practical-guides/job-configuration-guide.md
# Job Configuration Guide
This guide covers MaxFrame job configuration, tuning, and session management.
## SQL Settings Configuration
MaxFrame allows configuring SQL execution settings using `config.options.sql.settings`. These settings correspond to MaxCompute SQL flags.
### Basic Configuration
```python
from maxframe import config
config.options.sql.settings = {
"odps.sql.type.system.odps2": "true",
}
```
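Most snippets in this guide assign a complete dictionary to the settings attribute. Assuming the attribute behaves like an ordinary Python attribute, a second assignment replaces the first instead of adding to it, so flags from different sections should be combined before assigning. The sketch below shows one way to do that in plain Python; `merge_settings` is a hypothetical helper.

```python
from typing import Dict


def merge_settings(*groups: Dict[str, object]) -> Dict[str, object]:
    """Combine several flag dictionaries; later groups win on duplicate keys."""
    merged: Dict[str, object] = {}
    for group in groups:
        merged.update(group)
    return merged


type_settings = {"odps.sql.type.system.odps2": "true"}
timeout_settings = {"odps.function.timeout": "3600"}
image_settings = {"odps.session.image": "common"}

sql_settings = merge_settings(type_settings, timeout_settings, image_settings)
print(sql_settings)

# With MaxFrame installed, the merged dictionary would be assigned once:
# config.options.sql.settings = sql_settings
```

Keeping the flag groups separate until the final assignment also makes it easier to see which section of this guide each flag came from.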
### Common Settings
#### Stage Parallelism
```python
from maxframe import config
config.options.sql.settings = {
# Mapper split size in MB (default 256, minimum 1)
"odps.stage.mapper.split.size": "8",
# Number of joiner instances
"odps.stage.joiner.num": "20",
# Number of reducer instances
"odps.stage.reducer.num": "100",
}
```
#### Batch Processing
```python
from maxframe import options
options.sql.settings = {
# Batch size for UDF processing (default 1024, minimum 1)
"odps.sql.executionengine.batch.rowcount": "1",
# Batch timeout in seconds (default 1800, maximum 3600)
"odps.function.timeout": "3600",
}
```
## Session Management
### Session Lifetime
MaxFrame sessions have configurable lifetime and idle timeout settings.
```python
from maxframe import options
# Maximum session lifetime (default 24 hours)
options.session.max_alive_seconds = 72 * 60 * 60 # 72 hours
# Maximum idle timeout (default 1 hour)
options.session.max_idle_seconds = 72 * 60 * 60 # 72 hours
```
### SQL Job Timeout
```python
from maxframe import options
options.sql.settings = {
# SQL job maximum runtime in hours (default 24, maximum 72)
"odps.sql.job.max.time.hours": 72,
}
```
### Temporary Table Lifecycle
For long-running jobs that span multiple days, increase temporary table lifecycle:
```python
from maxframe import options
options.sql.settings = {
# Temporary table lifecycle in days (default 1)
"session.temp_table_lifecycle": 3,
}
```
## PythonPack Configuration
### Production Caching
For periodic jobs, cache PythonPack results to improve stability:
```python
from maxframe import options
# Cache PythonPack results for production use
options.pythonpack.task.settings = {"odps.pythonpack.production": "true"}
```
### Force Rebuild
To ignore cache and rebuild dependencies:
```python
from maxframe.udf import with_python_requirements
@with_python_requirements("package_name", force_rebuild=True)
def process(data):
    ...
```
## Resource Configuration
### UDF Memory Settings
Configure memory for individual UDFs:
```python
from maxframe.udf import with_running_options
@with_running_options(memory="8GB")
def memory_intensive_udf(row):
    # Process large data (placeholder: returns the row unchanged)
    result = row
    return result
```
### AI Function Resource Settings
For AI Functions, set resources in the function call:
```python
result = ai_function(
...,
running_options={"memory": "8GB"}
)
```
## Split Configuration
### Split by Size
Use `odps.stage.mapper.split.size` for size-based splitting:
```python
from maxframe import options
options.sql.settings = {
# Split size in MB (default 256, minimum 1)
"odps.stage.mapper.split.size": "512",
}
```
### Split by Count
Use `odps.sql.split.dop` to specify target split count:
```python
from maxframe import options
options.sql.settings = {
# Target number of splits
"odps.sql.split.dop": "1000",
}
```
**Note:** The actual split count may differ from the target due to various constraints. Setting a value near the maximum (99,999) may still cause errors.
## Shuffle Configuration
### Shuffle Size
Adjust shuffle output size for large data operations:
```python
from maxframe import options
options.sql.settings = {
# Shuffle folder size in MB
"odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize": "10240",
}
```
## Instance Count Limits
MaxCompute SQL jobs have a maximum of 99,999 instances per task. To avoid exceeding this limit:
1. Increase split size to reduce mapper instances
2. Reduce reducer/joiner counts
3. Keep total instances under 10,000 for best performance
```python
from maxframe import options
options.sql.settings = {
# Larger split size reduces mapper instances
"odps.stage.mapper.split.size": "1024", # 1GB
# Limit reducer instances
"odps.stage.reducer.num": "100",
"odps.stage.joiner.num": "100",
}
```
## Metadata Statistics Collection
To ensure proper split functionality and meta file generation:
```python
from maxframe import options
options.sql.settings = {
"odps.task.merge.enabled": "false",
"odps.sql.reshuffle.dynamicpt": "false",
"odps.sql.enable.dynaparts.stats.collection": "true",
"odps.optimizer.dynamic.partition.is.first.nth.value.split.enable": "false",
"odps.sql.stats.collection.aggressive": "true",
}
```
## Image Configuration
### Using Base Image
Reference MaxCompute base image for common dependencies:
```python
from maxframe import config
config.options.sql.settings = {
"odps.session.image": "common",
}
```
### Custom Runtime Image
For custom dependencies and environments, use a custom DPE runtime image:
```python
from maxframe import config
config.options.sql.settings = {
"odps.session.image": "<your_custom_image>",
}
```
## Configuration Best Practices
### For Interactive Development (Jupyter)
```python
from maxframe import config, options, new_session
# Enable 2.0 data types
config.options.sql.settings = {
"odps.sql.type.system.odps2": "true",
}
# Longer idle timeout for interactive sessions
options.session.max_idle_seconds = 60 * 60 * 4 # 4 hours
session = new_session()
```
### For Production Jobs
```python
from maxframe import config, options, new_session
# Enable 2.0 data types
config.options.sql.settings = {
"odps.sql.type.system.odps2": "true",
"odps.session.image": "common",
}
# Longer session lifetime
options.session.max_alive_seconds = 72 * 60 * 60 # 72 hours
# Cache PythonPack results
options.pythonpack.task.settings = {"odps.pythonpack.production": "true"}
# Increase temp table lifecycle (in days) for long jobs
options.session.temp_table_lifecycle = 3
session = new_session()
```
### For Large Data Processing
```python
from maxframe import config, options, new_session
# Optimize for large data
config.options.sql.settings = {
# Larger split size for fewer mappers
"odps.stage.mapper.split.size": "512",
# More parallelism
"odps.stage.reducer.num": "200",
# Longer timeout
"odps.function.timeout": "3600",
# Larger shuffle space
"odps.sql.sys.flag.fuxi_JobMaxInternalFolderSize": "20480",
}
session = new_session()
```
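Each example above assigns a complete dict to `sql.settings`, so a later assignment replaces whatever was set before. One way to combine several profiles is to merge plain dicts first and assign the result once. The following is a minimal sketch using only standard Python; `merge_settings` is a hypothetical helper, not a MaxFrame API.

```python
def merge_settings(*layers: dict) -> dict:
    """Merge setting dicts left to right; later layers override earlier ones.

    Values are converted to strings because the examples above pass every
    setting value as a string.
    """
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, bool):
                merged[key] = "true" if value else "false"
            else:
                merged[key] = str(value)
    return merged


base = {"odps.sql.type.system.odps2": True, "odps.session.image": "common"}
large_data = {
    "odps.stage.mapper.split.size": 512,
    "odps.stage.reducer.num": 200,
}

# The last layer overrides the reducer count from the large-data profile.
settings = merge_settings(base, large_data, {"odps.stage.reducer.num": 100})
print(settings)
```

The merged dict could then be assigned in one step, for example `options.sql.settings = settings`.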
FILE:references/practical-guides/oss-mounting-guide.md
# OSS Mounting Guide
This guide covers how to mount and use Alibaba Cloud OSS (Object Storage Service) as distributed storage in MaxFrame jobs using the `with_fs_mount` decorator.
## Overview
The `with_fs_mount` decorator enables file system-level mounting of OSS buckets in MaxCompute. This allows you to access OSS files like a local disk, which is more efficient than traditional SDK-based methods like `pd.read_csv("oss://...")`.
### Use Cases
- Loading raw data from OSS for cleaning or processing
- Writing intermediate results to OSS for downstream consumption
- Sharing trained model files, configuration files, and other static resources
## Prerequisites
### 1. Enable OSS Service and Create Bucket
1. Log in to [OSS Console](https://oss.console.aliyun.com/)
2. Navigate to **Bucket List** in the left navigation
3. Click **Create Bucket**
4. Note your bucket name (e.g., `xxx-oss-test-sh`)
### 2. Create RAM Role for MaxCompute
1. Log in to [RAM Console](https://ram.console.aliyun.com/)
2. Navigate to **Identity Management > Roles**
3. Click **Create Role**
4. Click **Create Service Linked Role** in the top right
5. Select **Cloud Service** as the trusted principal type
6. Select **MaxCompute (Cloud Native Big Data Computing Service)** as the trusted principal
7. In the **Permission Management** tab, click **Add Authorization**
8. Add the following permission policies:
| Policy | Description | Link |
|--------|-------------|------|
| `AliyunOSSFullAccess` | Full access to OSS | [Policy Detail](https://ram.console.aliyun.com/policies/detail?policyType=System&policyName=AliyunOSSFullAccess) |
| `AliyunMaxComputeFullAccess` | Full access to MaxCompute | [Policy Detail](https://ram.console.aliyun.com/policies/detail?policyType=System&policyName=AliyunMaxComputeFullAccess) |
## Basic Usage
### Recommended: Using Role ARN
```python
from maxframe.udf import with_fs_mount
@with_fs_mount(
    "oss://oss-cn-<region>-internal.aliyuncs.com/<bucket-name>/path/",
    "/mnt/oss_data",
    storage_options={
        "role_arn": "acs:ram::<uid>:role/<role-name>"
    },
)
def _process(batch_df):
    import os
    if os.path.exists('/mnt/oss_data'):
        print(f"Mounted files: {os.listdir('/mnt/oss_data')}")
    else:
        print("/mnt/oss_data not mounted!")
    return batch_df * 2
```
### Not Recommended: Using AccessKey (Testing Only)
```python
# For testing purposes only - NOT recommended for production
storage_options={
"access_key_id": "LTAI5t...",
"access_key_secret": "Wp9H..."
}
```
**Important:** Avoid hardcoding AccessKey. Using `role_arn` allows the system to automatically request temporary STS tokens, preventing AK/SK leakage.
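To keep the mount URL and role ARN consistent across UDFs, the placeholders shown above can be filled in by small helpers. This is a sketch under the assumption that your endpoint follows the `oss-cn-<region>-internal` pattern used in this guide; the function names and the sample account ID, bucket, and role name are made up for illustration.

```python
def oss_internal_url(region: str, bucket: str, prefix: str = "") -> str:
    """Build an oss:// URL in the internal-endpoint form used above."""
    prefix = prefix.strip("/")
    path = f"{bucket}/{prefix}/" if prefix else f"{bucket}/"
    return f"oss://oss-cn-{region}-internal.aliyuncs.com/{path}"


def ram_role_arn(uid: str, role_name: str) -> str:
    """Build a role ARN in the acs:ram::<uid>:role/<role-name> form."""
    return f"acs:ram::{uid}:role/{role_name}"


# Hypothetical values; substitute your own region, bucket, account ID and role.
print(oss_internal_url("hangzhou", "my-bucket", "data/raw"))
print(ram_role_arn("1234567890123456", "maxframe-oss"))
```

The two strings correspond to the first argument of `with_fs_mount` and the `role_arn` entry of `storage_options`.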
## Resource Configuration
Use `with_running_options` to control resource allocation:
```python
from maxframe.udf import with_running_options, with_fs_mount
@with_running_options(engine="dpe", cpu=2, memory=16)
@with_fs_mount(
    "oss://oss-cn-<region>-internal.aliyuncs.com/<bucket>/path/",
    "/mnt/oss_data",
    storage_options={"role_arn": "acs:ram::<uid>:role/<role-name>"},
)
def _process(batch_df):
    ...
```
### Resource Recommendations
| Parameter | Recommended Value | Notes |
|-----------|------------------|-------|
| `engine` | `"dpe"` | FS Mount only supports DPE engine |
| `cpu` | 1-4 | Increase for complex I/O or decompression |
| `memory` | 8GB+ | Use ≥16GB for large file loading |
## Complete Example: Batch Processing
```python
import os
from odps import ODPS
from maxframe import new_session
from maxframe.udf import with_fs_mount, with_running_options
import maxframe.dataframe as md
# Initialize ODPS client
o = ODPS(
access_id=os.getenv('ODPS_ACCESS_ID'),
secret_access_key=os.getenv('ODPS_ACCESS_KEY'),
project=os.getenv('ODPS_PROJECT'),
endpoint=os.getenv('ODPS_ENDPOINT'),
)
# Set image (default DPE runtime includes ossfs2)
from maxframe import config
config.options.sql.settings = {
"odps.session.image": "maxframe_service_dpe_runtime"
}
# Start session
session = new_session(o)
print("LogView:", session.get_logview_address())
print("Session ID:", session.session_id)
# Define UDF with OSS mount
@with_running_options(engine="dpe", cpu=2, memory=8)
@with_fs_mount(
    "oss://oss-cn-<region>-internal.aliyuncs.com/<bucket>/test/",
    "/mnt/oss_data",
    storage_options={
        "role_arn": "acs:ram::<uid>:role/maxframe-oss"
    },
)
def _process(batch_df):
    import pandas as pd
    import os
    # Step 1: Verify mount
    mount_point = "/mnt/oss_data"
    if not os.path.exists(mount_point):
        raise RuntimeError("OSS mount failed!")
    # Step 2: Load data (e.g., mapping table, dictionary)
    mapping_file = os.path.join(mount_point, "category_map.csv")
    if os.path.isfile(mapping_file):
        mapping_df = pd.read_csv(mapping_file)
    # Step 3: Process current chunk
    result = batch_df.copy()
    result['F'] = result['A'] * 10
    return result
# Create DataFrame and apply UDF
data = [[1.0, 2.0, 3.0, 4.0, 5.0], ...]
df = md.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])
result_df = df.mf.apply_chunk(
_process,
skip_infer=True,
output_type="dataframe",
dtypes=df.dtypes,
index=df.index
)
# Execute and fetch results
result = result_df.execute().fetch()
```
**Note:** `skip_infer=True` skips type inference for faster execution, but requires correct `dtypes` and `index` parameters.
## Debugging
### Verify Mount Status
Add debug logs in your UDF:
```python
import os
print("Mount path exists:", os.path.exists("/mnt/oss_data"))
print("Files in mount:", os.listdir("/mnt/oss_data") if os.path.exists("/mnt/oss_data") else [])
```
Check LogView output for messages like:
```
FS Mount succeeded! /mnt/oss_data: ['data.csv', 'config.json', 'model.pkl']
Processing batch with shape: (1000, 5)
```
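The mount checks above can be wrapped in a helper that returns a small summary suitable for logging from inside a UDF. This is a sketch using only the standard library; `describe_mount` is a hypothetical name, and a temporary directory stands in for `/mnt/oss_data` so the example runs anywhere.

```python
import os
import tempfile


def describe_mount(mount_point: str, max_entries: int = 20) -> dict:
    """Return a small, printable summary of a mount point's state."""
    info = {"path": mount_point, "exists": os.path.isdir(mount_point), "entries": []}
    if info["exists"]:
        try:
            info["entries"] = sorted(os.listdir(mount_point))[:max_entries]
        except OSError as exc:  # e.g. permission or I/O errors on the mount
            info["error"] = str(exc)
    return info


# Local stand-in for /mnt/oss_data.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "data.csv"), "w").close()
    print(describe_mount(tmp))
    print(describe_mount(os.path.join(tmp, "missing")))
```

Limiting the listing to `max_entries` keeps the log readable when the mounted prefix contains many objects.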
## OSSFS Dependency
The default `maxframe_service_dpe_runtime` image includes OSSFS. For custom images, you need to install:
- `ossfs2_2.0.3.1_linux_x86_64.deb`
## Key Points
1. **Security First:** Always use `role_arn` instead of hardcoded AccessKey
2. **Engine Requirement:** FS Mount only works with DPE engine (`engine="dpe"`)
3. **Resource Planning:** Allocate sufficient memory for large file operations
4. **Mount Verification:** Always check mount status before processing
5. **Error Handling:** Add proper error handling for mount failures
FILE:references/practical-guides/udf-development-guide.md
# UDF Development Guide
This guide covers best practices and patterns for developing User-Defined Functions (UDFs) in MaxFrame.
## Resource Reuse in UDFs
In some UDF scenarios, creating and destroying a resource on every call is expensive (e.g., initializing database connections, loading models). Because Python evaluates a function's default parameter values only once, when the function is defined, you can use a default value to reuse such resources across calls.
### Pattern: Resource Initialization
The default value of a function parameter is initialized only once. You can use this to cache resources:
```python
def predict(s, _ctx={}):
    import os
    from ultralytics import YOLO
    # _ctx defaults to an empty dict that is created only once, when the function is defined.
    # Check if the model exists in _ctx; if not, load it and store it in the dict.
    if not _ctx.get("model", None):
        model = YOLO(os.path.join("./", "yolo11n.pt"))
        _ctx["model"] = model
    model = _ctx["model"]
    # Subsequent model operations
    ...
```
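The caching behavior can be checked locally without any model library. The sketch below replaces the model load with a counter; `load_model` and `load_count` are stand-ins invented for this illustration.

```python
load_count = 0


def load_model():
    """Stand-in for an expensive load such as reading model weights."""
    global load_count
    load_count += 1
    return {"name": "dummy-model"}


def predict(value, _ctx={}):
    # _ctx is created once, when the function is defined, and shared by every call.
    if "model" not in _ctx:
        _ctx["model"] = load_model()
    return f"{_ctx['model']['name']}:{value}"


results = [predict(v) for v in range(5)]
print(results)
print("loads:", load_count)  # the loader ran once for five calls
```

Within a single Python process the loader runs once no matter how many rows are processed.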
### Pattern: Resource Cleanup with Classes
For resources that need cleanup (like database connections), use a class with `__del__`:
```python
class MyConnector:
    def __init__(self):
        # Create database connection in __init__
        # (create_connection is a placeholder for your database client's connect call)
        self.conn = create_connection()
    def __del__(self):
        # Close connection in __del__
        try:
            self.conn.close()
        except Exception:
            pass
def process(s, connector=MyConnector()):
    # Use the connector's database connection directly
    connector.conn.execute("xxxxx")
```
**Note:** The actual number of initializations depends on the number of Workers running the UDF. Each Worker executes the UDF in a separate Python environment. For example, if a UDF call needs to process 100,000 rows and is distributed across 10 UDF Workers, each processing 10,000 rows, initialization will occur 10 times total—one per Worker.
## Specifying Output Types
When using methods like `apply` with custom functions, MaxFrame attempts to infer the return type of the UDF. However, in some cases, you need to explicitly specify `dtypes`:
- The UDF cannot execute properly in the current environment (depends on custom images, third-party dependencies, or incorrect parameters)
- The actual return type doesn't match the specified `output_type`
### Examples
Return a DataFrame with one int column:
```python
df.apply(..., dtypes=pd.Series([np.int_]), output_type="dataframe")
```
Return a DataFrame with columns A and B:
```python
df.apply(..., dtypes={"A": np.int_, "B": np.str_}, output_type="dataframe")
```
Return a bool Series named "flag":
```python
df.apply(..., dtype="bool", name="flag", output_type="series")
```
## Using Third-Party Packages
### Using with_resources
```python
from maxframe.udf import with_resources
@with_resources("resource_name")
def process(row):
    ...
```
### Using PythonPack
For installing dependencies dynamically, you can cache PythonPack results for stability:
```python
from maxframe import options
# Cache PythonPack results for production use
options.pythonpack.task.settings = {"odps.pythonpack.production": "true"}
```
To ignore the cache and rebuild, add `force_rebuild=True` in `@with_python_requirements`.
Alternatively, you can:
- Use PyODPS-Pack to package dependencies offline and upload as MaxCompute Resource
- Reference resources in your job using `@with_resources`
## Network Access in UDFs
By default, network access is disabled in MaxCompute UDF containers. If your UDF needs network access:
```python
def request_external(row):
    import requests
    url = "https://example.com/api"
    response = requests.get(url)
    return response.text
```
Without network access enabled, this will fail with:
```
ConnectionRefusedError: [Errno 111] Connection refused
```
**Solution:** Enable network access through the network activation process. Contact your administrator for network configuration.
## UDF Timeout Handling
### Error: kInstanceMonitorTimeout / CRASH_EXIT
This typically means the UDF execution timed out. In MaxCompute offline computing, UDFs are monitored by batch rows—if a UDF doesn't complete processing N rows within M time, it times out.
**Solution:** Adjust batch size and timeout:
```python
from maxframe import options
options.sql.settings = {
# Batch size, default 1024, minimum 1
"odps.sql.executionengine.batch.rowcount": "1",
# Batch timeout in seconds, default 1800, maximum 3600
"odps.function.timeout": "3600",
}
```
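Since the timeout applies to a whole batch, the per-row time budget is roughly the timeout divided by the batch row count. The sketch below picks a batch size from a measured per-row cost. It is a back-of-the-envelope aid only: `max_batch_rows` and the 0.5 safety factor are assumptions of this example, and the result is capped at the default batch size because the sketch only shrinks batches.

```python
DEFAULT_BATCH_ROWS = 1024   # default batch size stated above
MAX_TIMEOUT_SECONDS = 3600  # maximum batch timeout stated above


def max_batch_rows(seconds_per_row: float,
                   timeout_seconds: int = MAX_TIMEOUT_SECONDS,
                   safety_factor: float = 0.5) -> int:
    """Largest batch size whose estimated run time fits in the timeout.

    safety_factor leaves headroom for slow rows; 0.5 is an arbitrary choice.
    """
    if seconds_per_row <= 0:
        raise ValueError("seconds_per_row must be positive")
    budget = timeout_seconds * safety_factor
    rows = int(budget // seconds_per_row)
    return max(1, min(DEFAULT_BATCH_ROWS, rows))


print(max_batch_rows(0.5))   # fast rows: the default batch size still fits
print(max_batch_rows(30))    # slow rows: a much smaller batch is needed
print(max_batch_rows(2500))  # very slow rows: fall back to one row per batch
```

The returned value would be passed as the string value of `odps.sql.executionengine.batch.rowcount`.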
## UDF Memory Configuration
### Error: OOM in UDF
If your UDF runs out of memory, configure more memory:
```python
from maxframe.udf import with_running_options
@with_running_options(memory="8GB")
def udf_func(row):
    return row
```
For AI Functions, set memory in the function call:
```python
result = ai_function(..., running_options={"memory": "8GB"})
```
## Debugging UDF Errors
### Error: ODPS-0123055 User script exception
This is the most common error type in MaxFrame, occurring during UDF execution (apply, apply_chunk, flatmap, map, transform operators).
**Common causes:**
1. **Code logic errors** - Check the function logic
2. **Unhandled exceptions** - Review try-except blocks
3. **Network access without permission** - Enable network access
4. **Type mismatch** - Ensure declared dtypes match actual return types
5. **Missing dependencies** - Ensure all dependencies are available
**Debugging approach:**
1. Check the stderr of the failed instance for the full stack trace
2. Identify the function and line number from the trace
3. Test locally by constructing test data:
```python
import pandas as pd
def udf_func(row):
    import json
    text = row["json_text"]
    data = json.loads(text)
    return data
# Test locally with constructed input
udf_func(pd.Series(['{"hello": "maxframe"}'], index=["json_text"]))
```
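When testing locally, it can help to wrap the row function so that a failure reports the input that caused it. The sketch below is one way to do that with the standard library; `with_row_context` is a hypothetical decorator, and a plain dict stands in for the row so that no pandas install is needed. Whether such a wrapper is suitable inside a deployed UDF is not covered here.

```python
import functools
import json


def with_row_context(func):
    """Wrap a row-level function so failures report the offending input."""
    @functools.wraps(func)
    def wrapper(row):
        try:
            return func(row)
        except Exception as exc:
            # Chain the original exception so the full traceback is preserved.
            raise RuntimeError(f"{func.__name__} failed on row {row!r}") from exc
    return wrapper


@with_row_context
def parse_json(row):
    return json.loads(row["json_text"])


print(parse_json({"json_text": '{"hello": "maxframe"}'}))
try:
    parse_json({"json_text": "not json"})
except RuntimeError as err:
    print(err, "| cause:", type(err.__cause__).__name__)
```

Chaining with `from exc` keeps the original exception type visible, which matches the advice to read the full stack trace first.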
### Error: ModuleNotFoundError during deserialization
If you see errors like `No module named 'xxhash'` during unpickling, it means the UDF references a dependency that's not available in the runtime environment.
**Solution:** Install the dependency via PythonPack or include it as a resource.
### Error: Type mismatch
```python
def type_failure(row):
    text = row["A"]
    # Returns a float
    return text
# Declared as str but returns float
df.apply(type_failure, axis=1, dtypes={"A": np.str_}, output_type="dataframe").execute()
```
Error message:
```
TypeError: return value expected <class 'unicode'> but <class 'float'> found
```
**Solution:** Ensure declared dtypes match actual return types.
FILE:references/remote-debug-guide.md
# Remote Debug Complete Guide
Comprehensive guide for interactive coding, testing, and debugging MaxFrame programs on MaxCompute cluster. Covers SDK discovery, environment configuration, error handling, and best practices for iterative development.
**Note**: This guide is typically used with the `alibabacloud-odps-maxframe-coding` skill to execute and debug code generated by the skill.
## Table of Contents
- [IMPORTANT RULES](#important-rules)
- [Environment Configuration](#environment-configuration)
- [SDK Source Discovery](#sdk-source-discovery)
- [Interactive Coding Workflow](#interactive-coding-workflow)
- [Testing and Debugging](#testing-and-debugging)
- [Error Handling](#error-handling)
- [Common Error Patterns and Solutions](#common-error-patterns-and-solutions)
- [Best Practices](#best-practices)
- [Resources](#resources)
## IMPORTANT RULES
### Rule 1: Use Public APIs Only
**From the user's perspective, code may only use APIs exposed by maxframe-client.** Always use public APIs from the MaxFrame SDK and avoid internal or undocumented functions.
**How to Discover Available User-Side Operators:**
1. **Reference Documentation Files:**
- `key-modules.md` - Comprehensive reference for key modules
- `maxframe-context.md` - Comprehensive context guide
- `operator-selection-rules.md` - Operator selection guidelines
- `operator-selector.md` - Agent guide for operator selection
2. **Use Operator Lookup Script:**
- `scripts/lookup_operator.py` - Command-line tool for operator documentation
- Search: `python scripts/lookup_operator.py search <pattern>`
- Info: `python scripts/lookup_operator.py info <operator_name>`
- List: `python scripts/lookup_operator.py list`
3. **Programmatic SDK Discovery:**
- Find SDK installation location from Python site-packages
- Explore installed SDK modules and APIs programmatically
### Rule 2: Environment Variable Loading
**DO NOT read private .env files directly.** Use `dotenv.load_dotenv()` to load environment variables programmatically.
**Correct Pattern:**
```python
import dotenv
import os
# Load environment variables programmatically
dotenv.load_dotenv()
# Access environment variables
project = os.getenv("ODPS_PROJECT")
access_id = os.getenv("ODPS_ACCESS_ID")
```
## Environment Configuration
### Recommended Environment Variables
Using environment variables is a recommended best practice for managing MaxCompute connection credentials.
**Commonly used environment variables:**
- **`ODPS_PROJECT`** - MaxCompute project name
- **`ODPS_ACCESS_ID`** - Access key ID for authentication
- **`ODPS_ACCESS_KEY`** - Access key secret for authentication
- **`ODPS_ENDPOINT`** - MaxCompute service endpoint URL
**Optional variables for specific features:**
- **`OSS_ENDPOINT`** - OSS endpoint (for OSS-related operations)
- **`OSS_BUCKET_NAME`** - OSS bucket name
- **`OSS_ACCESS_KEY_ID`** - OSS access key ID
- **`OSS_ACCESS_KEY_SECRET`** - OSS access key secret
- **`OSS_ROLE_ARN`** - OSS role ARN for cross-service access
### Environment Setup Example
Create a `.env` file in your project root:
```bash
# MaxCompute connection
ODPS_PROJECT=your_project_name
ODPS_ACCESS_ID=your_access_id
ODPS_ACCESS_KEY=your_access_key
ODPS_ENDPOINT=https://service.cn-hangzhou.maxcompute.aliyun.com/api
# Optional - OSS operations
# OSS_ENDPOINT=https://oss-cn-hangzhou.aliyuncs.com
# OSS_BUCKET_NAME=your_bucket_name
```
### Loading Environment Variables in Code
```python
import os
import dotenv
from odps import ODPS
from maxframe.session import new_session
dotenv.load_dotenv()
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
```
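Before constructing the `ODPS` client, it may be worth checking that the required variables are actually set, without printing their values. The following is a minimal sketch; `missing_env_vars` is a hypothetical helper, and the `environ` parameter exists only so the example can run against a sample dict instead of the real environment.

```python
import os

REQUIRED_VARS = ("ODPS_PROJECT", "ODPS_ACCESS_ID", "ODPS_ACCESS_KEY", "ODPS_ENDPOINT")


def missing_env_vars(names=REQUIRED_VARS, environ=None) -> list:
    """Return the names that are unset or empty, without exposing any values."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Sample environment with two of the four variables defined.
example_env = {"ODPS_PROJECT": "demo", "ODPS_ENDPOINT": "https://example.invalid/api"}
missing = missing_env_vars(environ=example_env)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

In real code the call would be `missing_env_vars()` placed right after `dotenv.load_dotenv()`.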
## SDK Source Discovery
When agents need to look up SDK interfaces, they should read the SDK source installed in the local Python `site-packages` directory rather than a local repository checkout.
### Finding SDK Installation Location
```python
import maxframe
import os
sdk_path = os.path.dirname(maxframe.__file__)
print(f"MaxFrame SDK installed at: {sdk_path}")
# Explore available modules (subpackages whose names do not start with an underscore)
modules = [
    f for f in os.listdir(sdk_path)
    if os.path.isdir(os.path.join(sdk_path, f)) and not f.startswith('_')
]
print(f"Available modules: {modules}")
```
### Exploring SDK Modules and APIs
```python
import maxframe.dataframe as md
import inspect
def get_public_members(module):
    """Get all public classes and functions from a module."""
    # Parentheses matter: without them, private functions would pass the filter.
    return [name for name, obj in inspect.getmembers(module)
            if not name.startswith('_') and (inspect.isclass(obj) or inspect.isfunction(obj))]
dataframe_members = get_public_members(md)
print(f"DataFrame module has {len(dataframe_members)} public members")
# Get source code location
df_class = md.DataFrame
source_file = inspect.getfile(df_class)
print(f"DataFrame class defined in: {source_file}")
```
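The same discovery approach can be tried on any installed package. The sketch below uses the standard-library `json` module as a stand-in for `maxframe` so it runs without the SDK; `locate_package` and `public_callables` are names made up for this example.

```python
import importlib.util
import inspect
import json


def locate_package(name: str):
    """Return the file a top-level package would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None


def public_callables(module) -> list:
    """Names of public classes and functions exposed by a module."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module)
        if not name.startswith("_") and (inspect.isclass(obj) or inspect.isfunction(obj))
    )


print(locate_package("json"))
print(public_callables(json))
```

`importlib.util.find_spec` locates a top-level package without importing it, which is handy for checking whether the SDK is installed at all.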
### Key Modules to Reference
1. **`maxframe.dataframe`** - DataFrame operations (pandas-compatible)
2. **`maxframe.tensor`** - Tensor operations (NumPy-like)
3. **`maxframe.learn`** - Machine learning (scikit-learn-like)
4. **`maxframe.session`** - Session management
5. **`maxframe.udf`** - User-defined functions
6. **`maxframe.config`** - Configuration options
## Interactive Coding Workflow
### Workflow Pattern
The interactive coding workflow executes and debugs code generated by the `alibabacloud-odps-maxframe-coding` skill:
1. **Load Code** - Load MaxFrame code and review structure
2. **Execute Code** - Run code and monitor execution
3. **Analyze Results** - Analyze errors, use logview for diagnostics
4. **Iterate and Fix** - Propose fixes and repeat until successful
### Iterative Development Cycle
```
Load → Execute → Analyze → Fix → Repeat
↑                                ↓
└────────────────────────────────┘
```
### Pattern 1: Execute → Analyze → Fix → Repeat
```python
try:
    result.execute()
    print("Success!")
except Exception as e:
    print(f"Error: {e}")
    print(f"Logview: {session.get_logview_address()}")
    # Analyze error via logview → Fix code → Re-execute
```
### Pattern 2: Progressive Enhancement
1. Start with minimal operations
2. Execute and verify each step
3. Add complexity incrementally
4. Monitor execution at each stage
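The progressive approach above can be sketched as a small stage runner that executes one step at a time and stops at the first failure, so you know exactly which step to fix. This is a generic illustration in plain Python; `run_stages` is a hypothetical helper and the lambda stages stand in for real MaxFrame operations.

```python
def run_stages(stages, initial=None):
    """Run (name, func) stages in order; stop at the first failure.

    Returns (completed_stage_names, last_good_value, error_or_None).
    """
    value, completed = initial, []
    for name, func in stages:
        try:
            value = func(value)
        except Exception as exc:
            return completed, value, exc
        completed.append(name)
        print(f"stage ok: {name}")
    return completed, value, None


stages = [
    ("load", lambda _: [1, 2, 3, 4]),
    ("filter", lambda rows: [r for r in rows if r % 2 == 0]),
    ("scale", lambda rows: [r / 0 for r in rows]),  # deliberate failure
]
done, value, error = run_stages(stages)
print(done, value, repr(error))
```

Because the last good value is returned alongside the error, the failing stage can be re-run in isolation after a fix.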
### Session Management Best Practices
```python
import maxframe.dataframe as md
from maxframe.session import new_session
import dotenv
import os
dotenv.load_dotenv()
session = new_session()
try:
    # Your MaxFrame operations
    df = md.read_odps_table("table_name")
    result = df.groupby('col').sum()
    result.execute()
finally:
    # Always cleanup session, even on error
    session.destroy()
```
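The try/finally cleanup can also be packaged as a context manager so that every call site gets the same guarantee. The sketch below assumes only that the session object has a `destroy()` method, as in the example above; `managed_session` and `FakeSession` are invented for this illustration.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(factory):
    """Create a session with factory() and always destroy it on exit."""
    session = factory()
    try:
        yield session
    finally:
        session.destroy()


class FakeSession:
    """Stand-in exposing only the destroy() method used above."""

    def __init__(self):
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


fake = FakeSession()
try:
    with managed_session(lambda: fake) as session:
        raise ValueError("simulated failure inside the session")
except ValueError:
    pass
print("destroyed:", fake.destroyed)
```

With the real SDK, `factory` could be something like `lambda: new_session(o)`.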
## Testing and Debugging
### Printing Logview URLs
The logview URL provides detailed execution logs, task information, and error details from MaxCompute:
```python
session = new_session(o)
try:
    # Print logview after session creation
    print(f"Logview: {session.get_logview_address()}")
    # Your operations
    df = md.read_odps_table("table_name")
    result = df.groupby('col').sum()
    result.execute()
    # Print logview after execution
    logview_url = session.get_logview_address()
    if logview_url:
        print(f"Execution Logview: {logview_url}")
finally:
    session.destroy()
```
### Using Logview for Detailed Diagnostics
The logview URL provides:
1. **Execution Timeline** - When each task started and completed
2. **Task Details** - Individual task logs and status
3. **Error Messages** - Detailed error information
4. **Resource Usage** - CPU, memory, network statistics
5. **SQL Queries** - Generated SQL for debugging
**How to Use:**
1. Get logview URL from `session.get_logview_address()`
2. Open URL in browser (may require authentication)
3. Navigate through execution stages
4. Check error logs in failed tasks
5. Review SQL queries to understand execution plan
## Error Handling
### Comprehensive Error Handling Pattern
```python
import logging
import traceback
import maxframe.dataframe as md
from maxframe.session import new_session
from odps import ODPS
import os
import dotenv
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
dotenv.load_dotenv()
o = ODPS(
access_id=os.getenv("ODPS_ACCESS_ID"),
secret_access_key=os.getenv("ODPS_ACCESS_KEY"),
project=os.getenv("ODPS_PROJECT"),
endpoint=os.getenv("ODPS_ENDPOINT"),
user_agent='AlibabaCloud-Agent-Skills/alibabacloud-odps-maxframe-coding'
)
session = new_session(o)
try:
    logger.info(f"Session created. Logview: {session.get_logview_address()}")
    df = md.read_odps_table("table_name")
    result = df.groupby('col').sum()
    try:
        result.execute()
        logger.info("Execution completed successfully")
        logview_url = session.get_logview_address()
        if logview_url:
            logger.info(f"Final Logview: {logview_url}")
    except Exception as exec_error:
        logger.error(f"Execution failed: {str(exec_error)}")
        logger.error("Traceback:")
        traceback.print_exc()
        # Always print logview on error
        logview_url = session.get_logview_address()
        if logview_url:
            logger.error(f"Error Logview: {logview_url}")
            print(f"\n{'='*60}")
            print(f"ERROR: Execution failed")
            print(f"Logview URL: {logview_url}")
            print(f"{'='*60}\n")
        raise
except Exception as e:
    logger.error(f"Session error: {str(e)}")
    traceback.print_exc()
    raise
finally:
    try:
        session.destroy()
        logger.info("Session destroyed")
    except Exception as cleanup_error:
        logger.warning(f"Error during cleanup: {cleanup_error}")
```
### Best Practices for Error Handling
1. **Always Use Try-Finally** - Ensure session cleanup even on errors
2. **Print Logview on Errors** - Always provide logview URL when errors occur
3. **Log Comprehensively** - Use logging to capture execution flow
4. **Capture Full Tracebacks** - Use `traceback.print_exc()` for debugging
5. **Provide Context** - Include relevant information in error messages
## Common Error Patterns and Solutions
### 1. Authentication and Connection Errors
**Patterns:** "Invalid access key", "Authentication failed", "Connection refused"
**Solutions:**
- Verify environment variables loaded correctly
- Check that `ODPS_ACCESS_ID` and `ODPS_ACCESS_KEY` are correct and not expired
- Ensure `ODPS_ENDPOINT` matches your region
- Verify `ODPS_PROJECT` name is accurate
- Check network connectivity and firewall settings
**Debug Pattern:**
```python
import os
print(f"ODPS_PROJECT: {os.getenv('ODPS_PROJECT')}")
print(f"ODPS_ENDPOINT: {os.getenv('ODPS_ENDPOINT')}")
```
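The debug pattern above prints only non-secret values. If you also need to confirm that a key was loaded, one option is to print a masked form rather than the raw value. This is a minimal sketch; `mask_secret` is a hypothetical helper and the sample string is not a real credential.

```python
def mask_secret(value, visible: int = 4) -> str:
    """Show only the first few characters of a secret; never the whole value."""
    if not value:
        return "<unset>"
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


print(mask_secret("example-access-id"))
print(mask_secret(None))
```

For instance, `mask_secret(os.getenv("ODPS_ACCESS_ID"))` shows whether the variable is set without writing the full ID to logs.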
### 2. Table and Resource Not Found Errors
**Patterns:** "Table not found", "NoSuchTable", "Partition not found"
**Solutions:**
- Verify table name spelling and case sensitivity
- Check if table exists in the project using ODPS console or `o.get_table()`
- Verify you have read permissions on the table
- For partitioned tables, check partition specification format
- Verify partition values exist in the table
**Verify Table Exists:**
```python
from odps import ODPS
try:
    table = o.get_table("table_name")
    print(f"Table exists: {table.name}")
    print(f"Partitions: {[p.name for p in table.table_schema.partitions]}")
except Exception as e:
    print(f"Table not found: {e}")
```
### 3. Execution and Timeout Errors
**Patterns:** "Timeout", "Execution timeout", "Quota exceeded"
**Solutions:**
- Check logview URL for detailed execution status
- Review query complexity and optimize
- Consider using batch processing with `apply_chunk`
- Check MaxCompute resource quotas and limits
- Increase session timeout if needed
**Configure Timeout:**
```python
from maxframe import options
options.session.max_alive_seconds = 7200 # 2 hours
options.retry_times = 10
options.retry_delay = 1.0
```
### 4. Type and Schema Mismatch Errors
**Patterns:** "TypeError", "ValueError", "Column type mismatch"
**Solutions:**
- Check DataFrame dtypes: `df.dtypes` or `df.info()`
- Convert types explicitly: `df['col'] = df['col'].astype('int64')`
- Handle nullable types appropriately: `Int64`, `Float64`, `boolean`
- Check for None/NaN values before operations
**Fix Type Issues:**
```python
print(f"DataFrame dtypes:\n{df.dtypes}")
# Convert types explicitly
df['int_col'] = df['int_col'].astype('int64')
df['nullable_int'] = df['nullable_int'].astype('Int64')
# Handle None/NaN
if df['col'].isna().any():
    df['col'] = df['col'].fillna(0)
```
### 5. SQL and Query Errors
**Patterns:** "SQL syntax error", "Invalid SQL", "Column not found"
**Solutions:**
- Review the generated SQL query in logview
- Check column names for typos and case sensitivity
- Verify column names exist in the DataFrame
- Use explicit column references in complex queries
- Check for reserved keyword conflicts in column names
**Debug Column Issues:**
```python
print(f"Available columns: {df.columns.tolist()}")
# Use explicit column selection
df_selected = df[['col1', 'col2', 'col3']]
if 'column_name' not in df.columns:
    print(f"Column not found. Available: {df.columns.tolist()}")
```
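Typos and case differences are common causes of "Column not found" errors, so suggesting close matches can shorten the debugging loop. The sketch below uses the standard-library `difflib` on a plain list of names; `suggest_columns` is a hypothetical helper, and in practice the list would come from `df.columns.tolist()`.

```python
import difflib


def suggest_columns(requested: str, available: list, limit: int = 3) -> list:
    """Return available column names that closely match the requested one."""
    lowered = {name.lower(): name for name in available}
    if requested.lower() in lowered:  # differs only by case
        return [lowered[requested.lower()]]
    return difflib.get_close_matches(requested, available, n=limit, cutoff=0.6)


columns = ["user_id", "order_date", "total_amount"]
print(suggest_columns("User_ID", columns))
print(suggest_columns("order_dat", columns))
print(suggest_columns("zzz", columns))
```

The 0.6 cutoff is the `difflib` default; raising it gives fewer but closer suggestions.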
## Best Practices
### Remote Debugging Best Practices
1. **Always capture logview URLs** - Essential for diagnostics
2. **Use comprehensive logging** - Track execution progress and errors
3. **Test incrementally** - Build complexity step by step
4. **Handle cleanup properly** - Always destroy sessions in finally blocks
5. **Monitor resource usage** - Check quotas and limits via logview
6. **Validate types early** - Check dtypes before operations
7. **Use error-specific handlers** - Different patterns for different error types
### Session Management Best Practices
1. Create session before operations
2. Use try-finally for cleanup
3. Print logview URLs for diagnostics
4. Monitor execution progress
5. Handle errors gracefully
## Resources
For more detailed information about MaxFrame operators and features, refer to:
- **MaxFrame Context Guide**: `maxframe-context.md`
- **Key Modules Reference**: `key-modules.md`
- **Common Workflow**: `common-workflow.md`
- **Operator Selection Rules**: `operator-selection-rules.md`
- **Operator-Selector Guide**: `operator-selector.md`
- **MaxFrame Client Documentation**: `maxframe-client-docs/`
- **Operator Lookup Script**: `scripts/lookup_operator.py`
FILE:references/runtime-image-guides/README.md
# Custom Runtime Image Guides
Comprehensive guides for building custom MaxFrame DPE runtime images through conversational workflow.
## Guide Structure
### Core Guides
1. **[Base Image Selection](base-image-selection.md)** (60 lines)
- Ubuntu 22.04 vs 24.04 comparison
- Decision framework and recommendation matrix
- Use case specific recommendations
2. **[Python Environment Strategy](python-environment-strategy.md)** (110 lines)
- Multi-environment architecture
- Version selection guidelines (3.7-3.12)
- MF_PYTHON_EXECUTABLE configuration (CRITICAL)
3. **[Package Management](package-management.md)** (160 lines)
- Conda vs pip decision guide
- Mirror acceleration (China region)
- Installation patterns and best practices
- GPU package handling
4. **[GPU/CUDA Configuration](gpu-cuda-configuration.md)** (80 lines)
- CUDA version compatibility matrix
- Platform considerations
- Complete GPU setup patterns
- PyTorch with CUDA installation
5. **[System Dependencies](system-dependencies.md)** (90 lines)
- Essential system packages
- Locale and timezone configuration
- Miniforge installation
- ossfs2 installation
6. **[Environment Variables](environment-variables.md)** (50 lines)
- Required environment variables
- MF_PYTHON_EXECUTABLE (CRITICAL)
- Optional custom variables
7. **[Image Optimization](image-optimization.md)** (100 lines)
- Size reduction strategies
- Build time optimization
- .dockerignore patterns
- Layer caching strategy
8. **[Dockerfile Templates](dockerfile-templates.md)** (160 lines)
- 8 complete section templates
- Header, base setup, conda, GPU, packages, environment, verification
- Ready-to-use patterns
9. **[Common Scenarios](common-scenarios.md)** (110 lines)
- Basic ML runtime (GPU)
- Data processing runtime (CPU, multi-Python)
- Minimal single-Python runtime
10. **[Testing and Validation](testing-validation.md)** (90 lines)
- Health check commands
- Environment verification
- Package import tests
- Integration tests
### Supplementary Guides
- **[Architecture Details](architecture-details.md)** (435 lines)
- Building from scratch with public images
- Miniforge vs Miniconda comparison
- Python environment architecture
- Resource considerations
- **[Conda Best Practices](conda-best-practices.md)** (476 lines)
- Multi-environment strategy
- Channel configuration
- Package installation patterns
- GPU package considerations
- **[Practical Guides](practical-guides.md)** (477 lines)
- Conda distribution selection
- GPU/CUDA support details
- Common scenarios with detailed explanations
- Best practices checklist
## Quick Navigation
**I want to...**
- Choose between Ubuntu 22.04 and 24.04 → [Base Image Selection](base-image-selection.md)
- Decide which Python versions to use → [Python Environment Strategy](python-environment-strategy.md)
- Install packages with conda or pip → [Package Management](package-management.md)
- Set up GPU/CUDA support → [GPU/CUDA Configuration](gpu-cuda-configuration.md)
- Optimize image size → [Image Optimization](image-optimization.md)
- Get ready-to-use Dockerfile sections → [Dockerfile Templates](dockerfile-templates.md)
- See example Dockerfiles → [Common Scenarios](common-scenarios.md)
- Test my custom image → [Testing and Validation](testing-validation.md)
## Conversational Workflow
All guides support the conversational workflow documented in **SKILL.md "Scenario 4: Create Custom Runtime Image"**.
The skill will:
1. Read these guides to understand best practices
2. Ask questions about your requirements
3. Guide decisions with explanations
4. Build Dockerfile section-by-section
5. Create support files and test instructions
## Related Resources
- **SKILL.md** - Main skill workflow
- **Examples** - `assets/examples/` - Working MaxFrame code examples
- **Operator Selection** - `references/operator-selector.md` - Finding MaxFrame operators
---
**All guides are designed for the conversational approach: learn patterns, understand decisions, build confidently.**
FILE:references/runtime-image-guides/architecture-details.md
# Base DPE Runtime Image Details
> **For workflow guidance:** See [README.md](README.md) for the recommended conversational workflow and decision frameworks. This file provides supplemental architecture details about the base DPE runtime.
## Building from Scratch (Public Images)
### Why Build from Scratch?
The custom DPE runtime skill builds images from scratch using public images for the following reasons:
- **No dependency on internal registries** - No need for alibaba-inc registry access
- **Full control over base image** - Customize every layer
- **Public and accessible to all users** - Anyone can build without special permissions
- **Reproducible builds** - Same Dockerfile produces same results anywhere
### Base Components
1. **Ubuntu 22.04** - Public base OS
- LTS release with long-term support
- Stable package repository
- Wide compatibility
2. **Miniforge** - Conda-forge distribution (preferred over Miniconda)
- Uses conda-forge channel by default
- Community-driven, open-source
- Smaller footprint than Miniconda
- No Anaconda repository needed
3. **System Utilities** - Essential tools
- vim, curl, wget, jq, dnsutils
- build-essential for compiling packages
- ca-certificates, locales
- ffmpeg for media processing
4. **MaxFrame SDK** - Client-side only; not installed in the runtime image (see `MaxFrame SDK Installation` below)
- `maxframe>=2.0.0`
- `pyodps` - MaxCompute Python SDK
5. **User Packages** - Installed in all Python environments via conda
## Miniforge vs Miniconda
### Why Miniforge?
**Miniforge** (our choice):
- ✅ Uses conda-forge channel by default
- ✅ Community-driven, open-source
- ✅ Smaller initial footprint
- ✅ No Anaconda repository configuration needed
- ✅ Better for conda-forge-first installations
**Miniconda**:
- ❌ Uses Anaconda defaults by default
- ❌ Requires manual channel configuration
- ❌ Larger initial size
- ❌ Anaconda terms of service considerations
## Python Environments Architecture
### Multi-Environment Setup
The custom DPE runtime creates **isolated conda environments** for each selected Python version, matching the official DPE runtime pattern:
```
/py-runtime/ # MINIFORGE_HOME
├── bin/ # Conda executables
│ ├── conda
│ ├── pip
│ └── python -> ../envs/py310/bin/python
├── envs/ # Conda environments
│ ├── py37/ # Python 3.7 environment
│ │ ├── bin/
│ │ │ ├── python -> python3.7
│ │ │ └── pip
│ │ └── lib/python3.7/site-packages/
│ ├── py38/ # Python 3.8 environment
│ ├── py39/ # Python 3.9 environment
│ ├── py310/ # Python 3.10 environment (default)
│ ├── py311/ # Python 3.11 environment
│ └── py312/ # Python 3.12 environment
└── pkgs/ # Package cache
```
#### Default Environment
The default environment is set via the `CONDA_DEFAULT_ENV` environment variable. It is `py311` when Python 3.11 is among the selected versions; otherwise it is the highest selected version.
**CRITICAL: MF_PYTHON_EXECUTABLE Environment Variable**
MaxFrame reads the `MF_PYTHON_EXECUTABLE` environment variable to locate the Python executable at runtime, so it **MUST** be set correctly:
```dockerfile
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
```
**Path pattern:**
```
/py-runtime/envs/<env_name>/bin/python
```
**Why this is critical:**
- MaxFrame uses this variable to locate the Python interpreter for job execution
- Incorrect path will cause runtime failures
- Must point to the conda environment's Python executable, not system Python
**Default selection logic:**
1. If Python 3.11 is in selected versions → `MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python`
2. Otherwise → Uses highest selected version
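The selection rule above can be sketched in Python. This is a minimal illustration rather than the skill's actual implementation: the helper names `env_name` and `select_default_env` are hypothetical, and the path simply follows the `/py-runtime/envs/<env_name>/bin/python` pattern described above.

```python
from typing import Sequence, Tuple

ENVS_ROOT = "/py-runtime/envs"  # matches the documented path pattern


def env_name(version: str) -> str:
    """Map a version string such as '3.11' to the environment name 'py311'."""
    major, minor = version.split(".")[:2]
    return f"py{major}{minor}"


def select_default_env(selected: Sequence[str]) -> Tuple[str, str]:
    """Return (CONDA_DEFAULT_ENV, MF_PYTHON_EXECUTABLE) for the selected versions.

    Hypothetical helper: prefers 3.11 when present, otherwise the highest version.
    """
    if not selected:
        raise ValueError("at least one Python version must be selected")
    if "3.11" in selected:
        chosen = "3.11"
    else:
        # Compare numerically so that 3.10 ranks above 3.9.
        chosen = max(selected, key=lambda v: tuple(int(p) for p in v.split(".")))
    name = env_name(chosen)
    return name, f"{ENVS_ROOT}/{name}/bin/python"


print(select_default_env(["3.9", "3.10"]))
```

The numeric comparison matters: a plain string comparison would rank `3.9` above `3.10`.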
### Environment Naming
Each environment is named `py<version>`:
- Python 3.7 → `py37`
- Python 3.8 → `py38`
- Python 3.9 → `py39`
- Python 3.10 → `py310` (typically default)
- Python 3.11 → `py311`
- Python 3.12 → `py312`
**How to use different environments:**
```bash
# Default environment
docker run --rm image:tag python script.py
# Specific environment
docker run --rm image:tag conda run -n py39 python script.py
# Interactive shell in specific environment (--no-capture-output keeps output streaming to the terminal)
docker run -it image:tag conda run --no-capture-output -n py310 bash
```
## Installation Strategy
### System-Level vs Conda Packages
**System-level (via apt-get):**
- Build tools (build-essential, gcc, etc.)
- System utilities (vim, curl, wget, jq)
- Libraries with system dependencies (ffmpeg)
- Locale and timezone configuration
**Conda packages (in each environment):**
- Python interpreter
- Python packages (user-requested + MaxFrame SDK)
- Conda-managed libraries
This separation ensures:
- Stable base system
- Isolated Python environments
- Proper dependency management
- Reproducible builds
### MaxFrame SDK Installation
**Note:** MaxFrame SDK and pyodps are **NOT required** in the custom runtime image.
These packages are installed at the **client side** (where MaxFrame code is developed and executed), not in the DPE runtime service. The custom runtime image runs on the MaxCompute cluster and only needs user-specific dependencies for your processing logic.
**Architecture:**
- **Client side**: MaxFrame SDK + pyodps installed (where you write and submit code)
- **DPE runtime (cluster)**: No MaxFrame SDK needed - executes submitted code
- **Custom image**: Only user packages (transformers, pandas, numpy, etc.)
**Why NOT install in custom runtime?**
- MaxFrame SDK is a client-side library for code development
- DPE runtime only needs user-specific dependencies for data processing
- Installing SDK in runtime duplicates packages and increases image size
- Focus on packages required by your processing logic only
## Resource Considerations
### Image Size
| Configuration | Approximate Size |
|--------------|------------------|
| All versions (3.7-3.12) | 3-5 GB |
| 3 versions (e.g., 3.10-3.12) | 1.5-2.5 GB |
| Single version (e.g., 3.10) | 0.8-1.2 GB |
**Size reduction strategies:**
- Select only needed Python versions
- Use `conda clean -afy` to remove cache
- Avoid unnecessary packages
- Use multi-stage builds (advanced)
### Build Time
| Configuration | Approximate Build Time |
|--------------|------------------------|
| All versions (3.7-3.12) | 10-20 minutes |
| 3 versions | 5-10 minutes |
| Single version | 2-5 minutes |
**Factors affecting build time:**
- Number of Python environments
- Number and size of user packages
- Network speed (conda package download)
- CPU and disk I/O
### Runtime Memory
Each Python environment is independent. When a job runs:
- Only one environment is active
- Memory usage depends on the packages in that environment
- No overhead from other environments
## Version Compatibility
### Python Version Selection
**Python 3.7**:
- ✅ Wide package compatibility
- ⚠️ End of life (EOL): June 2023
- Use only for legacy compatibility
**Python 3.8**:
- ✅ Good package support
- ⚠️ EOL: October 2024
- Stable, but aging
**Python 3.9**:
- ✅ Excellent package support
- ✅ Stable and well-tested
- Good balance for most use cases
**Python 3.10** (recommended default):
- ✅ Mature feature set (structural pattern matching, improved error messages)
- ✅ Excellent package support
- Good balance of features and stability
**Python 3.11**:
- ✅ Significant performance improvements
- ✅ Good package support
- Modern, efficient
**Python 3.12**:
- ✅ Latest Python features
- ⚠️ Some packages may not be available yet
- Use for cutting-edge development
### Package Version Compatibility
When specifying package versions:
- Check conda-forge availability for all selected Python versions
- Test compatibility across versions
- Consider using version ranges instead of pinned versions
**Example:**
```bash
# Good: flexible version range
--packages "pandas>=1.5,<2.0" "numpy>=1.23"
# May cause issues: too specific
--packages "pandas==1.5.3" "numpy==1.23.5"
```
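To spot specs that are pinned too tightly before a build, a small check along these lines may help. It is only a sketch: the function name `exact_pins` is hypothetical and the regular expression covers the simple `name==version` form, not every conda match-spec variant.

```python
import re
from typing import List

# Matches an exact pin such as "pandas==1.5.3" (no wildcard, no comma-separated range).
_EXACT_PIN = re.compile(r"^[A-Za-z0-9_.\-]+==[^*,]+$")


def exact_pins(specs: List[str]) -> List[str]:
    """Return the specs that pin a single exact version."""
    return [spec for spec in specs if _EXACT_PIN.match(spec.strip())]


specs = ["pandas==1.5.3", "numpy>=1.23", "pandas>=1.5,<2.0"]
print(exact_pins(specs))
```

Exact pins are worth reviewing when several Python versions are selected, because a given build may not exist for every interpreter.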
## Security Considerations
### What NOT to Include
- ❌ Hardcoded credentials
- ❌ Private SSH keys
- ❌ Internal network configurations
- ❌ Proprietary data files
- ❌ Personal access tokens
### What IS Safe to Include
- ✅ Public certificates (if needed)
- ✅ Configuration templates
- ✅ Documentation for required secrets
- ✅ Publicly available packages
### Runtime Secrets
Pass sensitive information via:
- Environment variables (from MaxCompute)
- OSS configuration (from MaxCompute)
- MaxFrame session parameters
- Docker secrets (orchestration platforms)
**Example:**
```bash
docker run -e ODPS_ACCESS_KEY_ID=$ODPS_ACCESS_KEY_ID \
-e ODPS_ACCESS_KEY_SECRET=$ODPS_ACCESS_KEY_SECRET \
image:tag
```
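Inside the container, job code can read these variables at startup and fail early if they are missing. The following is a minimal sketch, assuming the two variable names from the example above; `load_credentials` is a hypothetical helper, not part of MaxFrame or PyODPS.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("ODPS_ACCESS_KEY_ID", "ODPS_ACCESS_KEY_SECRET")


def load_credentials(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read required credentials from the environment without logging their values."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        # Report only the variable names, never the secret values.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.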
## Updates and Maintenance
### Base Image Updates
When Ubuntu or Miniforge releases updates:
1. **Pull new base image**:
```bash
docker pull ubuntu:22.04
```
2. **Rebuild custom image**:
```bash
docker build --no-cache -t image:tag .
```
3. **Test thoroughly**:
- Run test scripts
- Verify all packages
- Check MaxFrame functionality
4. **Update version tag**:
```bash
docker tag image:tag image:v2
```
### Package Updates
To update user packages:
1. **Use conversational workflow**:
- Follow the workflow in SKILL.md "Scenario 4: Create Custom Runtime Image"
- Specify updated package versions during package collection step
2. **Rebuild and test**
3. **Deploy new version**
### Conda Environment Updates
To update packages in specific environments:
```bash
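# Update in one environment (omit --rm so the container still exists for the commit)
docker run --name pkg-update image:tag conda update -y -n py310 package-name
# Snapshot the updated container; --change restores the CMD, which commit would
# otherwise take from the command above
docker commit --change 'CMD ["conda", "run", "-n", "py311", "python"]' pkg-update image:tag
docker rm pkg-update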
```
## Common Issues and Solutions
### Issue: Import conflicts between environments
**Cause**: Different package versions in different environments
**Solution**: Ensure consistent package installation across all selected environments (script handles this automatically)
### Issue: Permission denied when installing packages
**Cause**: Trying to modify system packages
**Solution**: Install in conda environments only (script handles this)
### Issue: Missing system libraries
**Cause**: Python packages require system dependencies
**Solution**: Add to apt-get install list in Dockerfile:
```dockerfile
RUN apt-get update && \
apt-get install -y libgl1 libglib2.0-0 && \
rm -rf /var/lib/apt/lists/*
```
### Issue: Large image size
**Cause**: Too many Python versions or packages
**Solution**:
- Select only needed Python versions
- Remove unnecessary packages
- Use `conda clean -afy`
- Consider multi-stage builds
## Testing Custom Images
### Basic Health Check
```bash
docker run --rm image:tag conda run -n py310 python -c "
import sys
print(f'Python: {sys.version}')
# MaxFrame SDK is client-side (see MaxFrame SDK Installation), so it is not imported here
print(f'Executable: {sys.executable}')
"
```
### Environment Verification
```bash
# List all environments
docker run --rm image:tag conda env list
# Test each environment
for env in py37 py38 py39 py310 py311 py312; do
echo "Testing $env..."
docker run --rm image:tag conda run -n $env python --version
done
```
### Package Import Test
```bash
docker run --rm image:tag conda run -n py310 python -c "
import your_package
print(f'Package version: {your_package.__version__}')
"
```
### Integration Test
```python
# test_maxframe.py
from maxframe.session import new_session
import maxframe.dataframe as md
session = new_session(image="image:tag")
df = md.read_odps_table("test_table")
print(df.head())
session.destroy()
```
## Summary
The custom DPE runtime uses:
- **Public base image**: Ubuntu 22.04
- **Miniforge**: Better conda-forge integration
- **Multi-environment**: Python 3.7-3.12 (user selectable)
- **System separation**: System packages vs conda packages
- **Client-side SDK**: MaxFrame SDK installed at client, not in runtime image
This architecture provides flexibility, isolation, and compatibility while remaining fully accessible to all users without special registry access.
FILE:references/runtime-image-guides/base-image-selection.md
# Base Image Selection
Choose the right Ubuntu version for your custom MaxFrame DPE runtime image.
## Ubuntu 22.04 vs 24.04 Comparison
| Feature | Ubuntu 22.04 | Ubuntu 24.04 |
|---------|--------------|--------------|
| **CUDA Support** | ✅ Excellent (12.4, 12.1, 11.8) | ⚠️ Limited testing |
| **Package Stability** | ✅ Very stable, LTS until 2027 | 🔄 Newer packages, LTS until 2029 |
| **ML Libraries** | ✅ Fully tested (PyTorch, TensorFlow) | ⚠️ Some incompatibilities possible |
| **Production Ready** | ✅ Yes, battle-tested | ⚠️ Emerging adoption |
| **Best For** | GPU/ML workloads, production | Modern development, non-GPU |
## Decision Framework
**Choose Ubuntu 22.04 when:**
- GPU/CUDA support needed (ML, deep learning)
- Production deployments requiring maximum stability
- Using PyTorch, TensorFlow, or other ML frameworks
- Need battle-tested, widely-validated environment
**Choose Ubuntu 24.04 when:**
- Latest system packages required
- Modern Python development (3.12 integration)
- Non-GPU workloads
- Want newest LTS with longer support window
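The decision framework above can be written down as a small function. This is a sketch of the guidance in this guide, not a rule enforced by MaxFrame; the function and parameter names are hypothetical.

```python
def recommend_base_image(
    needs_gpu: bool,
    uses_ml_frameworks: bool,
    production_critical: bool,
    wants_latest_packages: bool,
) -> str:
    """Return the suggested base image tag following the decision framework."""
    # GPU, ML frameworks, and production stability all point to 22.04.
    if needs_gpu or uses_ml_frameworks or production_critical:
        return "ubuntu:22.04"
    # Non-GPU workloads that want newer system packages can use 24.04.
    if wants_latest_packages:
        return "ubuntu:24.04"
    # Default to the more widely validated release.
    return "ubuntu:22.04"


print(recommend_base_image(False, False, False, True))
```

The stability criteria are checked first, so a GPU workload gets 22.04 even if newer packages would be nice to have.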
## Recommendation Matrix
| Use Case | Recommended Base | Reason |
|----------|------------------|---------|
| ML/AI with GPU | Ubuntu 22.04 | Best CUDA compatibility |
| ML/AI CPU-only | Ubuntu 22.04 | Stable ML libraries |
| Data processing | Ubuntu 22.04 or 24.04 | Either works well |
| Modern Python dev | Ubuntu 24.04 | Latest packages |
| Production critical | Ubuntu 22.04 | Maximum stability |
## Dockerfile Pattern
```dockerfile
# Ubuntu 22.04 (for GPU/ML)
FROM ubuntu:22.04
# OR Ubuntu 24.04 (for modern development); use this instead of the line above, not in addition to it
# FROM ubuntu:24.04
```
## Related Guides
- **[GPU/CUDA Configuration](gpu-cuda-configuration.md)** - GPU setup requires Ubuntu 22.04
- **[Dockerfile Templates](dockerfile-templates.md)** - Ready-to-use templates
- **[Common Scenarios](common-scenarios.md)** - Example Dockerfiles by use case
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/common-scenarios.md
# Common Scenarios
Complete Dockerfile examples for common use cases.
## Scenario 1: Basic ML Runtime (GPU)
```dockerfile
FROM ubuntu:22.04
# System setup
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
wget curl vim ca-certificates locales build-essential jq && \
rm -rf /var/lib/apt/lists/*
# Miniforge setup
ENV MINIFORGE_HOME="/py-runtime"
ENV PATH="${MINIFORGE_HOME}/bin:${PATH}"
RUN wget -q https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniforge.sh && \
bash miniforge.sh -b -p "${MINIFORGE_HOME}" && rm -rf miniforge.sh
# Python 3.11
RUN conda create -y -n py311 python=3.11 -c conda-forge && conda clean -afy
# CUDA 12.4
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && rm -rf cuda-keyring_*.deb && \
apt-get update && apt-get install -y cuda-toolkit-12-4 && rm -rf /var/lib/apt/lists/*
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=$CUDA_HOME/bin:$PATH
# PyTorch with CUDA
RUN conda run -n py311 pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 --index-url https://download.pytorch.org/whl/cu124
# User packages
RUN conda install -y -n py311 -c conda-forge transformers accelerate && conda clean -afy
# Environment config
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
CMD ["conda", "run", "-n", "py311", "python"]
```
## Scenario 2: Data Processing Runtime (CPU, Multi-Python)
```dockerfile
FROM ubuntu:22.04
# System setup
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
wget curl vim ca-certificates locales build-essential && \
rm -rf /var/lib/apt/lists/*
# Miniforge setup
ENV MINIFORGE_HOME="/py-runtime"
ENV PATH="${MINIFORGE_HOME}/bin:${PATH}"
RUN wget -q https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniforge.sh && \
bash miniforge.sh -b -p "${MINIFORGE_HOME}" && rm -rf miniforge.sh
# Multiple Python versions
RUN conda create -y -n py310 python=3.10 -c conda-forge && \
conda create -y -n py311 python=3.11 -c conda-forge && \
conda create -y -n py312 python=3.12 -c conda-forge && \
conda clean -afy
# Install packages in all environments
RUN for env in py310 py311 py312; do \
conda install -y -n $env -c conda-forge pandas dask polars pyarrow; \
done && conda clean -afy
# Environment config
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
CMD ["conda", "run", "-n", "py311", "python"]
```
## Scenario 3: Minimal Single-Python Runtime
```dockerfile
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y wget curl && rm -rf /var/lib/apt/lists/*
ENV MINIFORGE_HOME="/py-runtime"
ENV PATH="${MINIFORGE_HOME}/bin:${PATH}"
RUN wget -q https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniforge.sh && \
bash miniforge.sh -b -p "${MINIFORGE_HOME}" && rm -rf miniforge.sh
RUN conda create -y -n py311 python=3.11 -c conda-forge && conda clean -afy
RUN conda install -y -n py311 -c conda-forge pandas numpy && conda clean -afy
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
CMD ["conda", "run", "-n", "py311", "python"]
```
## Related Guides
- **[Dockerfile Templates](dockerfile-templates.md)** - Individual section templates
- **[Testing and Validation](testing-validation.md)** - How to test these examples
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/conda-best-practices.md
# Conda Best Practices for Custom DPE Images with Multi-Python Support
## Overview
This guide covers best practices for using conda in custom MaxFrame DPE runtime images with support for multiple Python versions (3.7-3.12).
## Multi-Environment Strategy
### Why Multiple Environments?
The custom DPE runtime creates isolated conda environments for each Python version to:
1. **Match official DPE runtime pattern** - Same architecture as official images
2. **Provide flexibility** - Users can choose Python version at runtime
3. **Ensure compatibility** - Different Python versions for different jobs
4. **Enable testing** - Test code across multiple Python versions
### Environment Structure
```
/py-runtime/envs/
├── py37/ # Python 3.7
├── py38/ # Python 3.8
├── py39/ # Python 3.9
├── py310/ # Python 3.10 (typically default)
├── py311/ # Python 3.11
└── py312/ # Python 3.12
```
### Installation Pattern
**User packages are installed in ALL selected Python environments:**
```dockerfile
RUN for env in py37 py38 py39 py310 py311 py312; do \
conda install -y -n $env -c conda-forge \
transformers \
pytorch \
pandas; \
done && \
conda clean -afy
```
This ensures consistency across all environments.
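When the list of environments and packages comes from user input, the `RUN` instruction can be generated instead of hand-written. The sketch below is one possible approach; `render_install_step` is a hypothetical helper, and it quotes package specs with `shlex.quote` so that constraints such as `numpy>=1.23` are not interpreted by the shell.

```python
import shlex
from typing import Sequence


def render_install_step(
    envs: Sequence[str],
    packages: Sequence[str],
    channel: str = "conda-forge",
) -> str:
    """Render a Dockerfile RUN instruction that installs packages in every env."""
    if not envs or not packages:
        raise ValueError("envs and packages must both be non-empty")
    quoted = " ".join(shlex.quote(p) for p in packages)
    lines = [
        f"RUN for env in {' '.join(envs)}; do \\",
        # $env is expanded by the shell at build time, not by Python.
        f"        conda install -y -n $env -c {channel} {quoted}; \\",
        "    done && \\",
        "    conda clean -afy",
    ]
    return "\n".join(lines)


print(render_install_step(["py310", "py311"], ["pandas", "numpy>=1.23"]))
```

The `-y` flag is included because a Docker build has no terminal to answer conda's confirmation prompt.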
## Why Use Conda
Conda provides several advantages for managing custom packages in DPE runtime images:
- **Dependency Management**: Automatic resolution of complex dependencies
- **Binary Packages**: Pre-compiled binaries for faster installation
- **Environment Isolation**: Separate user packages from system packages
- **Cross-Platform**: Consistent behavior across different systems
- **Version Control**: Precise version pinning and updates
## Conda Channels
### Recommended Channels
```bash
# Default channels (in order of priority)
conda-forge # Community-maintained, most packages
pytorch # PyTorch and related packages
defaults # Official Anaconda packages
```
### Channel Priority
Set channel priority in Dockerfile:
```dockerfile
# `--add` prepends, so add the highest-priority channel (conda-forge) last
RUN conda config --add channels pytorch && \
conda config --add channels conda-forge && \
conda config --set channel_priority strict
```
**Why strict priority?**
- Prevents package conflicts
- Ensures packages come from preferred channels
- Faster dependency resolution
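Channel order matters under strict priority, and `conda config --add channels` puts the new channel at the front of the list. The toy model below illustrates the resulting order; it is not conda's own code, and it assumes an initially empty channel list.

```python
from typing import List


def add_channel(channels: List[str], name: str) -> List[str]:
    """Model `conda config --add channels NAME`: NAME moves to the front."""
    return [name] + [c for c in channels if c != name]


channels: List[str] = []
for name in ("pytorch", "conda-forge"):  # add the preferred channel last
    channels = add_channel(channels, name)

print(channels)  # ['conda-forge', 'pytorch']
```

To end up with `conda-forge` as the highest-priority channel, it therefore has to be the last one added.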
## Package Installation Patterns
### Pattern 1: Simple Installation
For straightforward packages without version constraints:
```dockerfile
RUN conda install -y -c conda-forge \
requests \
beautifulsoup4 \
lxml \
&& conda clean -afy
```
### Pattern 2: Version Pinning
For specific versions:
```dockerfile
RUN conda install -y -c conda-forge \
pandas=1.5 \
numpy=1.23 \
scipy=1.9 \
&& conda clean -afy
```
### Pattern 3: Flexible Constraints
For minimum versions:
```dockerfile
RUN conda install -y -c conda-forge \
"pandas>=1.5,<2.0" \
"numpy>=1.23" \
&& conda clean -afy
```
### Pattern 4: Multi-Channel
For packages from different channels:
```dockerfile
RUN conda install -y -c pytorch -c conda-forge \
pytorch=2.0 \
torchvision \
transformers \
&& conda clean -afy
```
### Pattern 5: Separate Layers per Channel
When packages from different channels should be installed in separate layers:
```dockerfile
# Install PyTorch from pytorch channel
RUN conda install -y -c pytorch pytorch=2.0 torchvision
# Install other packages from conda-forge
RUN conda install -y -c conda-forge \
transformers \
datasets \
&& conda clean -afy
```
## Cleanup for Smaller Images
### Essential Cleanup Command
Always run after installation:
```dockerfile
RUN conda clean -afy
```
This removes:
- Package tarballs
- Index cache
- Lock files
- Unused packages
### Estimated Space Savings
| Component | Space Saved |
|-----------|-------------|
| Package cache | 200-500 MB |
| Index cache | 10-50 MB |
| Lock files | < 1 MB |
| **Total** | **210-550 MB** |
## Environment Variables
### Common Conda Variables
```dockerfile
# Conda configuration
ENV CONDA_AUTO_UPDATE_CONDA=false
ENV CONDA_CHANNELS=conda-forge,pytorch
# Package caches (use tmpfs during build)
ENV CONDA_PKGS_DIRS=/tmp/conda_pkgs
# Disable unnecessary features
ENV CONDA_AUTO_ACTIVATE_BASE=false
```
## Conda vs pip
### When to Use Conda
- Binary packages available (faster installation)
- Complex dependencies (C/C++ libraries)
- Scientific computing packages (numpy, scipy, pandas)
- ML frameworks (pytorch, tensorflow)
### When to Use pip
- Python-only packages not in conda
- Latest versions not yet in conda
- Packages only available on PyPI
### Mixed Approach
If pip is needed:
```dockerfile
# Install conda packages first
RUN conda install -y -c conda-forge \
numpy \
pandas \
&& conda clean -afy
# Then install pip-only packages
RUN pip install --no-cache-dir \
some-pip-only-package \
another-package
```
## GPU Package Considerations
### CUDA Version Compatibility
Check base image CUDA version:
```dockerfile
# Verify CUDA version
RUN nvcc --version || echo "No CUDA in base image"
```
### PyTorch with CUDA
```dockerfile
# Match PyTorch version to CUDA
RUN conda install -y -c pytorch -c nvidia \
pytorch=2.0 \
torchvision \
torchaudio \
pytorch-cuda=11.8 \
&& conda clean -afy
```
### TensorFlow with CUDA
```dockerfile
RUN conda install -y -c conda-forge \
tensorflow-gpu=2.12 \
&& conda clean -afy
```
## Troubleshooting
### Issue: Package not found
**Diagnosis**:
```bash
# Search for package
docker run --rm condaforge/miniforge3 conda search -c conda-forge package-name
```
**Solution**:
- Try different channels
- Use pip as fallback
- Check package name spelling
### Issue: Dependency conflicts
**Diagnosis**:
```dockerfile
# Enable verbose output
RUN conda install --verbose -y package-name
```
**Solution**:
- Relax version constraints
- Install packages in separate layers
- Use `conda-forge` channel (better dependency resolution)
### Issue: Slow installation
**Causes**:
- Large package dependencies
- Slow mirror servers
**Solutions**:
```dockerfile
# Use faster mirrors (China region)
RUN conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/ && \
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
```
### Issue: Large image size
**Diagnosis**:
```bash
# Check layer sizes
docker history your-image:tag
```
**Solutions**:
- Multi-stage builds (advanced)
- Combine installations into one layer
- Aggressive cleanup with `conda clean -afy`
- Exclude unnecessary packages
## Testing Conda Installation
### Basic Test
```dockerfile
RUN python -c "import your_package; print(your_package.__version__)"
```
### Dependency Check
```dockerfile
RUN conda list | grep -E "numpy|pandas|torch"
```
### Import Test Script
```dockerfile
COPY tests/test_imports.py /tmp/test_imports.py
RUN python /tmp/test_imports.py && rm /tmp/test_imports.py
```
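The `tests/test_imports.py` script referenced above is not shown in this guide. A minimal sketch of what it could contain follows; the module list and the function names are placeholders, not the skill's actual test file.

```python
import importlib
import sys
from typing import Dict, Iterable

# Hypothetical module list; replace with the packages your image must provide.
REQUIRED_MODULES = ("pandas", "numpy")


def check_imports(modules: Iterable[str]) -> Dict[str, str]:
    """Try to import each module; return {module: error message} for failures."""
    failures: Dict[str, str] = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or any error raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def main() -> int:
    """Print failures to stderr and return a process exit code."""
    failures = check_imports(REQUIRED_MODULES)
    for name, error in failures.items():
        print(f"FAILED {name}: {error}", file=sys.stderr)
    return 1 if failures else 0
```

Ending the script with `raise SystemExit(main())` makes a failed import return a non-zero exit code, which in turn fails the `RUN` step and stops the build.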
## Version Management
### Pinning Conda Version
```dockerfile
RUN conda install -y conda=23.1
```
### Lock File for Reproducibility
Generate lock file:
```bash
conda env export > environment.yml
```
Use in Dockerfile:
```dockerfile
COPY environment.yml .
RUN conda env create -f environment.yml && \
conda clean -afy
```
## Security Best Practices
### Package Verification
```dockerfile
# Enforce package integrity checks (size and checksum verification)
RUN conda config --set safety_checks enabled
```
### Vulnerability Scanning
```bash
# Scan image for vulnerabilities (`docker scan` has been retired in favor of Docker Scout)
docker scout cves your-image:tag
```
### Regular Updates
```dockerfile
# Update conda base
RUN conda update -y conda && \
conda clean -afy
```
## Performance Optimization
### Pre-built Environments
For frequently used environments:
1. Build base image with common packages
2. Tag and push to registry
3. Use as base for custom images
### Caching Strategy
Structure Dockerfile for maximum caching:
```dockerfile
# Layer 1: Rarely changes
RUN conda install -y -c conda-forge \
numpy pandas scipy
# Layer 2: Occasionally changes
RUN conda install -y -c conda-forge \
transformers datasets
# Layer 3: Frequently changes
RUN conda install -y -c conda-forge \
your-custom-package
```
## Example Dockerfiles
### ML/AI Image
```dockerfile
FROM ubuntu:22.04
# Install Miniforge and create Python environments
# ... (system setup) ...
# Install ML packages in all environments
RUN for env in py310 py311 py312; do \
conda install -y -n $env -c conda-forge -c pytorch \
pytorch=2.0 \
torchvision \
transformers=4.30 \
datasets \
accelerate; \
done && \
conda clean -afy
# Set cache directories
ENV TRANSFORMERS_CACHE=/tmp/transformers
ENV HF_HOME=/tmp/huggingface
```
### Data Processing Image
```dockerfile
FROM ubuntu:22.04
# Install data processing packages in all environments
RUN for env in py310 py311 py312; do \
conda install -y -n $env -c conda-forge \
dask \
polars \
pyarrow \
fastparquet; \
done && \
conda clean -afy
# Configure memory
ENV PYTHONMALLOC=malloc
```
### Web Scraping Image
```dockerfile
FROM ubuntu:22.04
# Install web scraping packages
RUN conda install -y -c conda-forge \
requests \
beautifulsoup4 \
lxml \
selenium \
scrapy \
&& conda clean -afy
# Install browsers (if needed)
RUN apt-get update && \
apt-get install -y firefox && \
rm -rf /var/lib/apt/lists/*
```
## Summary Checklist
- [ ] Configure conda channels correctly
- [ ] Set channel priority to strict
- [ ] Pin package versions for reproducibility
- [ ] Install from appropriate channels
- [ ] Always run `conda clean -afy`
- [ ] Test package imports
- [ ] Check image size
- [ ] Document installed packages
- [ ] Version control Dockerfile
FILE:references/runtime-image-guides/dockerfile-templates.md
# Dockerfile Templates
Ready-to-use Dockerfile section templates for building custom runtime images.
## Template 1: Header/Metadata Section
```dockerfile
# Custom MaxFrame DPE Runtime Image
# Generated with best practices from runtime-image-guides/
#
# Configuration:
# - Base: Ubuntu 22.04 (stable, CUDA support)
# - Python: 3.11 (performance, production-ready)
# - GPU: CUDA 12.4, PyTorch 2.6.0+cu124
# - Packages: transformers, accelerate, datasets
FROM ubuntu:22.04
# Metadata
LABEL maintainer="user-defined"
LABEL description="Custom MaxFrame DPE runtime for ML workloads"
LABEL version="1.0"
```
## Template 2: Base Setup Section
```dockerfile
# Section: Base Image
# Pattern: Ubuntu 22.04 for ML/GPU workloads (best CUDA compatibility)
# FROM ubuntu:22.04 is declared once, in Template 1; a second FROM would start a new stage and drop the labels
# Use Aliyun mirror for Ubuntu packages (faster in China)
RUN echo "deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse\n\
deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse\n\
deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse\n\
deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse" \
> /etc/apt/sources.list
# Environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENV LC_CTYPE=en_US.UTF-8
ENV TZ="Asia/Shanghai"
ENV TERM=xterm-256color
# Install system dependencies
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y \
wget curl vim ca-certificates locales \
build-essential jq dnsutils ffmpeg tzdata strace gdb && \
locale-gen en_US.UTF-8 && \
update-locale LANG=en_US.UTF-8 && \
rm -rf /var/lib/apt/lists/*
```
## Template 3: Conda Setup Section
```dockerfile
# Section: Miniforge Installation
# Pattern: Miniforge over Miniconda (conda-forge by default, no Anaconda repo)
ENV MINIFORGE_HOME="/py-runtime"
ENV PATH="${MINIFORGE_HOME}/bin:${PATH}"
RUN wget -q https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniforge.sh && \
bash miniforge.sh -b -p "${MINIFORGE_HOME}" && \
rm -rf miniforge.sh
# Configure conda to use conda-forge with Tsinghua mirror (faster in China)
RUN conda config --remove channels defaults || true && \
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ && \
conda config --set channel_priority strict && \
conda config --set show_channel_urls yes
# Configure pip to use Aliyun mirror (faster in China)
RUN mkdir -p ~/.pip && \
echo "[global]\n\
index-url = https://mirrors.aliyun.com/pypi/simple/\n\
trusted-host = mirrors.aliyun.com" > ~/.pip/pip.conf
# Section: Python Environment Creation
# Pattern: Single environment for production (Python 3.11)
RUN conda create -y -n py311 python=3.11 --override-channels -c conda-forge && \
conda clean -afy
```
## Template 4: Multi-Python Setup Section
```dockerfile
# Create multiple Python environments
RUN conda create -y -n py310 python=3.10 --override-channels -c conda-forge && \
conda create -y -n py311 python=3.11 --override-channels -c conda-forge && \
conda create -y -n py312 python=3.12 --override-channels -c conda-forge && \
conda clean -afy
```
## Template 5: GPU/CUDA Setup Section
```dockerfile
# Section: CUDA Installation
# Pattern: Ubuntu 22.04 + CUDA 12.4 (recommended for PyTorch)
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && rm -rf cuda-keyring_*.deb
RUN apt-get update && apt-get install -y cuda-toolkit-12-4 && \
apt-get clean && rm -rf /var/lib/apt/lists/*
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=$CUDA_HOME/bin:$PATH
# Section: PyTorch with CUDA
# Pattern: PyTorch 2.6.0+cu124 (CUDA 12.4 compatible)
RUN conda run -n py311 pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 torchaudio==2.6.0+cu124 \
--index-url https://download.pytorch.org/whl/cu124
```
## Template 6: Package Installation Section
```dockerfile
# Section: User Packages
# Pattern: Conda installation in all environments (from best practices guide)
RUN for env in py310 py311 py312; do \
echo "Installing packages in $env..." && \
conda install -y -n $env --override-channels \
-c conda-forge \
transformers accelerate datasets; \
done && \
conda clean -afy
```
## Template 7: Environment Configuration Section
```dockerfile
# Section: Environment Variables
# CRITICAL: MF_PYTHON_EXECUTABLE pattern (MaxFrame runtime detection)
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
# Pattern: Always use conda environment path, never system Python
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENV TZ="Asia/Shanghai"
```
## Template 8: Verification Section
```dockerfile
# Section: Verification
# Pattern: Health checks from best practices guide
RUN echo "=== Python Environments ===" && \
conda env list && \
echo "=== Default Python (py311) ===" && \
conda run -n py311 python --version
CMD ["conda", "run", "-n", "py311", "python"]
```
## Usage
Combine templates based on your requirements:
1. Start with Template 1 (Header)
2. Add Template 2 (Base Setup) - always required
3. Add Template 3 (Conda Setup) - always required
4. Add Template 4 (Multi-Python) OR use single environment in Template 3
5. Add Template 5 (GPU/CUDA) - if GPU support needed
6. Add Template 6 (Package Installation) - customize with your packages
7. Add Template 7 (Environment Config) - always required
8. Add Template 8 (Verification) - always recommended
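The combination rules above can be sketched as a small selection function. This is only an illustration: the section names and the `select_templates` helper are hypothetical and are not part of any script shipped with these guides.

```python
from typing import List


def select_templates(multi_python: bool = False, gpu: bool = False,
                     verify: bool = True) -> List[str]:
    """Return the template sections to concatenate, in build order.

    Section names are illustrative labels for Templates 1-8.
    """
    sections = ["header", "base_setup", "conda_setup"]  # Templates 1-3
    if multi_python:
        sections.append("multi_python")                 # Template 4
    if gpu:
        sections.append("gpu_cuda")                     # Template 5
    sections.append("package_installation")             # Template 6
    sections.append("environment_config")               # Template 7
    if verify:
        sections.append("verification")                 # Template 8
    return sections


if __name__ == "__main__":
    print(select_templates(multi_python=True, gpu=True))
```

Keeping the order fixed matters because later sections (package installation, environment configuration) assume that conda and the target environments already exist.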
## Related Guides
- **[Common Scenarios](common-scenarios.md)** - Complete example Dockerfiles
- **[Testing and Validation](testing-validation.md)** - How to test generated images
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/environment-variables.md
# Environment Variables
Configure environment variables for your custom MaxFrame DPE runtime image.
## Required Environment Variables
```dockerfile
# System configuration
ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENV LC_CTYPE=en_US.UTF-8
ENV TZ="Asia/Shanghai"
ENV TERM=xterm-256color
# Miniforge/Conda paths
ENV MINIFORGE_HOME="/py-runtime"
ENV PATH="$MINIFORGE_HOME/bin:$PATH"
# CRITICAL: MaxFrame Python executable detection
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
```
## MF_PYTHON_EXECUTABLE (CRITICAL)
**Purpose:** MaxFrame uses this variable to locate the Python interpreter for job execution.
**Pattern:** `/py-runtime/envs/<env_name>/bin/python`
**Must point to:** The conda environment's Python executable, NOT the system Python
**Incorrect paths will cause:** Runtime failures
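A quick way to catch a mistyped value before building is to check it against the documented pattern. The sketch below is a minimal example; the `mf_env_name` helper and the set of characters allowed in an environment name are assumptions made for illustration.

```python
import re

# Pattern from this guide: /py-runtime/envs/<env_name>/bin/python
# The allowed characters for <env_name> are an assumption for this sketch.
_MF_PATTERN = re.compile(r"/py-runtime/envs/(?P<env>[A-Za-z0-9_.-]+)/bin/python")


def mf_env_name(path: str):
    """Return the conda env name if path follows the pattern, else None."""
    match = _MF_PATTERN.fullmatch(path)
    return match.group("env") if match else None


if __name__ == "__main__":
    print(mf_env_name("/py-runtime/envs/py311/bin/python"))
    print(mf_env_name("/usr/bin/python3"))
```

A `None` result means the value points somewhere other than a conda environment under `/py-runtime/envs/`, for example the system Python.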
## Optional Custom Variables
```dockerfile
# Hugging Face cache directories
ENV TRANSFORMERS_CACHE=/tmp/transformers
ENV HF_HOME=/tmp/huggingface
# Python memory allocator
ENV PYTHONMALLOC=malloc
# Conda configuration
ENV CONDA_AUTO_UPDATE_CONDA=false
```
## Related Guides
- **[Python Environment Strategy](python-environment-strategy.md)** - MF_PYTHON_EXECUTABLE details
- **[Dockerfile Templates](dockerfile-templates.md)** - Environment configuration template
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/gpu-cuda-configuration.md
# GPU/CUDA Configuration
Configure GPU and CUDA support for your custom MaxFrame DPE runtime image.
## CUDA Version Compatibility Matrix
| CUDA Version | Ubuntu Version | PyTorch Version | Status |
|--------------|----------------|-----------------|--------|
| **CUDA 12.4** | Ubuntu 22.04 | PyTorch 2.6.0+cu124 | ✅ Recommended |
| CUDA 12.1 | Ubuntu 22.04 | PyTorch 2.5.1+cu121 (last release with cu121 wheels) | ✅ Supported |
| CUDA 11.8 | Ubuntu 22.04 | PyTorch 2.6.0+cu118 | ✅ Supported |
**Recommended:** Ubuntu 22.04 + CUDA 12.4 + PyTorch 2.6.0+cu124
## Platform Considerations
**GPU support is currently limited to the x86_64 platform.**
- ✅ x86_64: Full GPU support
- ❌ aarch64: GPU not supported
## CUDA Installation Pattern (Ubuntu 22.04)
```dockerfile
# Add NVIDIA package repositories for Ubuntu 22.04
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && rm -rf cuda-keyring_*.deb
# Update and install CUDA toolkit
RUN apt-get update && apt-get install -y \
cuda-toolkit-12-4 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Set up CUDA environment
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=$CUDA_HOME/bin:$PATH
```
## PyTorch Installation with CUDA
```dockerfile
# Install PyTorch 2.6.0 with CUDA 12.4
RUN pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 torchaudio==2.6.0+cu124 \
--index-url https://download.pytorch.org/whl/cu124
```
**CUDA Variants:**
- `cu124` - CUDA 12.4 (Recommended)
- `cu121` - CUDA 12.1
- `cu118` - CUDA 11.8
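The variant suffix appears twice in the install command: in each version string and in the index URL. The following sketch builds the command from a single variant value so the two cannot drift apart. The `pytorch_pip_command` helper is hypothetical, and it does not check whether wheels are actually published for a given version and variant combination.

```python
def pytorch_pip_command(variant: str = "cu124", torch: str = "2.6.0",
                        torchvision: str = "0.21.0",
                        torchaudio: str = "2.6.0") -> str:
    """Build the pip command for a CUDA variant such as cu124 or cu118.

    Defaults mirror the recommended configuration in this guide.
    """
    packages = " ".join(
        f"{name}=={version}+{variant}"
        for name, version in (
            ("torch", torch),
            ("torchvision", torchvision),
            ("torchaudio", torchaudio),
        )
    )
    index_url = f"https://download.pytorch.org/whl/{variant}"
    return f"pip install {packages} --index-url {index_url}"


if __name__ == "__main__":
    print(pytorch_pip_command())
    print(pytorch_pip_command("cu118"))
```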
## Complete GPU Setup Pattern
```dockerfile
# 1. Install CUDA toolkit
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && rm -rf cuda-keyring_*.deb && \
apt-get update && apt-get install -y cuda-toolkit-12-4 && \
apt-get clean && rm -rf /var/lib/apt/lists/*
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=$CUDA_HOME/bin:$PATH
# 2. Install PyTorch with CUDA
RUN pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 torchaudio==2.6.0+cu124 \
--index-url https://download.pytorch.org/whl/cu124
# 3. Install user packages (in conda environment)
RUN conda install -y -n py311 -c conda-forge \
transformers accelerate datasets && \
conda clean -afy
```
## Related Guides
- **[Base Image Selection](base-image-selection.md)** - Ubuntu 22.04 required for GPU
- **[Package Management](package-management.md)** - GPU package handling
- **[Dockerfile Templates](dockerfile-templates.md)** - GPU setup template
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/image-optimization.md
# Image Optimization
Optimize your custom MaxFrame DPE runtime image for size and build speed.
## Size Reduction Strategies
**1. Select Only Needed Python Versions**
- Single version: 0.8-1.2 GB
- 3 versions: 1.5-2.5 GB
- All versions: 3-5 GB
**2. Conda Clean Pattern**
```dockerfile
RUN conda clean -afy
```
Removes:
- Package tarballs (200-500 MB)
- Index cache (10-50 MB)
- Lock files (<1 MB)
**3. Apt Clean Pattern**
```dockerfile
RUN apt-get update && apt-get install -y packages && \
rm -rf /var/lib/apt/lists/*
```
**4. Multi-Stage Builds (Advanced)**
```dockerfile
# Stage 1: Build
FROM ubuntu:22.04 AS builder
# ... installation steps ...
# Stage 2: Runtime
FROM ubuntu:22.04
COPY --from=builder /py-runtime /py-runtime
```
## .dockerignore Patterns
Create `.dockerignore` file:
```
# Git
.git
.gitignore
# Documentation
*.md
docs/
# Python
__pycache__
*.pyc
*.pyo
.pytest_cache/
# Virtual environments
venv/
env/
# IDE
.vscode/
.idea/
# Build artifacts
dist/
build/
# Credentials
.env
*.pem
*.key
```
## Build Time Optimization
**Layer Caching Strategy:**
```dockerfile
# Layer 1: Rarely changes (system packages)
RUN apt-get update && apt-get install -y \
wget curl vim build-essential
# Layer 2: Rarely changes (Miniforge)
RUN wget ... && bash miniforge.sh ...
# Layer 3: Occasionally changes (Python environments)
RUN conda create -y -n py311 python=3.11
# Layer 4: Frequently changes (user packages)
RUN conda install -y -n py311 your-packages
```
## Related Guides
- **[Python Environment Strategy](python-environment-strategy.md)** - Version selection impacts size
- **[Package Management](package-management.md)** - Package cleanup patterns
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/package-management.md
# Package Management
Install Python packages in your custom MaxFrame DPE runtime image.
## Conda vs pip Decision Guide
**Use Conda when:**
- Binary packages available (faster installation)
- Complex dependencies (C/C++ libraries)
- Scientific computing packages (numpy, scipy, pandas)
- ML frameworks (pytorch, tensorflow)
**Use pip when:**
- Python-only packages not in conda
- Latest versions not yet in conda
- Packages only available on PyPI
## Conda-forge Best Practices
**Why Miniforge over Miniconda:**
- ✅ Uses conda-forge channel by default
- ✅ Community-driven, open-source
- ✅ Smaller initial footprint
- ✅ No Anaconda repository configuration needed
- ✅ Better for conda-forge-first installations
**Channel Configuration:**
```dockerfile
# Configure conda to use conda-forge only
RUN conda config --remove channels defaults || true && \
conda config --add channels conda-forge && \
conda config --set channel_priority strict && \
conda config --set show_channel_urls true
```
## Mirror Acceleration (China Region)
For faster downloads in China region, configure mirrors:
**Conda Mirror (Tsinghua):**
```dockerfile
# Use Tsinghua mirror for conda-forge (faster in China)
RUN conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ && \
conda config --set show_channel_urls yes
```
**Pip Mirror (Aliyun):**
```dockerfile
# Configure pip to use Aliyun mirror (faster in China)
RUN mkdir -p ~/.pip && \
printf '%s\n' \
"[global]" \
"index-url = https://mirrors.aliyun.com/pypi/simple/" \
"trusted-host = mirrors.aliyun.com" \
> ~/.pip/pip.conf
```
**Complete Mirror Setup:**
```dockerfile
# Ubuntu packages - Aliyun mirror
RUN printf '%s\n' \
"deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse" \
"deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse" \
"deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse" \
"deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse" \
> /etc/apt/sources.list
# Conda packages - Tsinghua mirror
RUN conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ && \
conda config --set show_channel_urls yes
# Pip packages - Aliyun mirror
RUN mkdir -p ~/.pip && \
printf '%s\n' \
"[global]" \
"index-url = https://mirrors.aliyun.com/pypi/simple/" \
"trusted-host = mirrors.aliyun.com" \
> ~/.pip/pip.conf
```
## Package Installation Patterns
**Pattern 1: Simple Installation (no version constraints)**
```dockerfile
RUN conda install -y -n py311 -c conda-forge \
requests \
beautifulsoup4 \
lxml && \
conda clean -afy
```
**Pattern 2: Version Pinning**
```dockerfile
RUN conda install -y -n py311 -c conda-forge \
pandas=1.5 \
numpy=1.23 \
scipy=1.9 && \
conda clean -afy
```
**Pattern 3: Flexible Constraints**
```dockerfile
RUN conda install -y -n py311 -c conda-forge \
"pandas>=1.5,<2.0" \
"numpy>=1.23" && \
conda clean -afy
```
**Pattern 4: Multi-Environment Installation Loop**
```dockerfile
# Install packages in all selected Python environments
RUN for env in py310 py311 py312; do \
echo "Installing packages in $env..." && \
conda install -y -n $env --override-channels \
-c conda-forge \
transformers accelerate datasets || exit 1; \
done && \
conda clean -afy
```
**Pattern 5: PyTorch from pytorch Channel**
```dockerfile
# Add pytorch channel when installing torch packages
RUN for env in py311; do \
conda install -y -n $env --override-channels \
-c pytorch \
-c conda-forge \
pytorch torchvision torchaudio || exit 1; \
done && \
conda clean -afy
```
## GPU Package Handling
**PyTorch with CUDA (Recommended via pip):**
```dockerfile
# Install PyTorch with CUDA support via pip (not conda)
RUN pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 torchaudio==2.6.0+cu124 \
--index-url https://download.pytorch.org/whl/cu124
```
**Why pip for PyTorch+CUDA:**
- PyTorch provides optimized CUDA builds via pip
- Easier CUDA variant selection (cu124, cu121, cu118)
- Better compatibility with CUDA toolkit installed in image
## Pip Fallback Pattern
When a package is not available in conda-forge:
```dockerfile
# Install via pip in each environment
RUN for env in py311; do \
conda run -n $env pip install --no-cache-dir \
some-pip-only-package \
another-package; \
done
```
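The conda-first, pip-fallback split can also be generated programmatically. The sketch below emits one `RUN` instruction per installer for a list of environments; the `install_instructions` helper is hypothetical, and the `-y` and `|| exit 1` additions are a choice made here so that a non-interactive build stops at the first failing environment.

```python
from typing import List, Sequence


def install_instructions(envs: Sequence[str], conda_pkgs: Sequence[str],
                         pip_pkgs: Sequence[str] = ()) -> List[str]:
    """Return Dockerfile RUN instructions: conda-forge first, pip as fallback."""
    env_list = " ".join(envs)
    lines = []
    if conda_pkgs:
        pkgs = " ".join(conda_pkgs)
        lines.append(
            f"RUN for env in {env_list}; do "
            f"conda install -y -n $env --override-channels -c conda-forge "
            f"{pkgs} || exit 1; done && conda clean -afy"
        )
    if pip_pkgs:
        pkgs = " ".join(pip_pkgs)
        lines.append(
            f"RUN for env in {env_list}; do "
            f"conda run -n $env pip install --no-cache-dir "
            f"{pkgs} || exit 1; done"
        )
    return lines


if __name__ == "__main__":
    for line in install_instructions(["py310", "py311"], ["pandas", "numpy"],
                                     ["some-pip-only-package"]):
        print(line)
```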
## Related Guides
- **[GPU/CUDA Configuration](gpu-cuda-configuration.md)** - PyTorch with CUDA setup
- **[Python Environment Strategy](python-environment-strategy.md)** - Multi-environment architecture
- **[Image Optimization](image-optimization.md)** - Cleaning package cache
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/practical-guides.md
# Practical Guides for Runtime Image Creation
This document provides practical guidance for different use cases when creating custom MaxFrame DPE runtime images.
---
## Table of Contents
1. [Conda Distribution Selection](#conda-distribution-selection)
2. [GPU/CUDA Support](#gpucuda-support)
3. [Common Scenarios](#common-scenarios)
4. [Best Practices](#best-practices)
---
## Conda Distribution Selection
### Why Miniforge over Miniconda?
We **strongly recommend using Miniforge** (conda-forge distribution) instead of Miniconda or Anaconda for the following reasons:
#### License Considerations
| Distribution | License | Commercial Use | Notes |
|-------------|---------|----------------|-------|
| **Miniforge** | **BSD-3-Clause** | ✅ **Unrestricted** | Community-driven, fully open-source |
| **Miniconda** | Apache 2.0 + Anaconda ToS | ⚠️ **Restricted** | Anaconda Terms of Service may apply |
| **Anaconda** | Proprietary + Commercial ToS | ❌ **Requires License** | Commercial use requires paid license |
**Important**: Anaconda Inc.'s Terms of Service restrict commercial use of their package repositories. For enterprise/commercial use, Miniforge is the safest choice.
#### Technical Advantages of Miniforge
✅ **Better conda-forge integration**
- Pre-configured to use conda-forge channel
- No Anaconda repository needed
- Cleaner channel configuration
✅ **Smaller footprint**
- Minimal base installation
- Only conda-forge packages by default
- Faster installation
✅ **Community-driven**
- No vendor lock-in
- Transparent governance
- Open-source packages only
#### Miniconda Issues
❌ **Requires manual configuration**
- Default channels point to Anaconda
- Need to disable defaults
- More complex setup
❌ **Terms of Service complications**
- Commercial use restrictions
- May require license for enterprise
- Legal complexity
❌ **Larger initial size**
- Includes Anaconda-specific packages
- Unnecessary dependencies
- Slower builds
### How to Use Miniforge
The runtime image creator uses Miniforge by default. The generated Dockerfile includes:
```dockerfile
# Install Miniforge (conda-forge distribution)
RUN wget -q https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniforge.sh && \
bash miniforge.sh -b -p $MINIFORGE_HOME && \
rm -rf miniforge.sh
# Configure conda to use conda-forge only
RUN conda config --remove channels defaults || true && \
conda config --add channels conda-forge && \
conda config --set channel_priority strict
```
---
## GPU/CUDA Support
### When to Use GPU Runtime
Use GPU-enabled runtime when you need:
- Deep learning frameworks (PyTorch, TensorFlow)
- GPU-accelerated computing (CUDA libraries)
- Machine learning model training/inference
- Computer vision workloads
- Large-scale numerical computing
**Do NOT use GPU runtime for:**
- CPU-only data processing
- Simple ETL jobs
- Lightweight Python scripts
- Development/testing without GPU access
### CUDA Version Selection
Choose the CUDA version based on your PyTorch/TensorFlow requirements:
**Recommended Production Configuration:**
- **Ubuntu 22.04 + CUDA 12.4** - Our default recommended setup
- PyTorch 2.6.0+ with CUDA 12.4 support
- Best compatibility with modern GPUs (Ampere, Hopper)
- Latest features and performance optimizations
| CUDA Version | PyTorch Version | TensorFlow Version | CUDA Compute Capability | Base OS |
|--------------|----------------|-------------------|------------------------|---------|
| **CUDA 12.4** ✅ | PyTorch 2.6.0+ | TensorFlow 2.16+ | Compute 8.0+ (Ampere+) | **Ubuntu 22.04** (Recommended) |
| **CUDA 12.1** | PyTorch 2.5.1 (last release with cu121 wheels) | TensorFlow 2.15+ | Compute 7.0+ (Volta+) | Ubuntu 22.04 |
| **CUDA 11.8** | PyTorch 2.6.0+ | TensorFlow 2.12+ | Compute 6.0+ (Pascal+) | Ubuntu 22.04 |
**Why Ubuntu 22.04 + CUDA 12.4 is recommended:**
- ✅ Ubuntu 22.04 is an LTS release with standard support until 2027
- ✅ Latest CUDA features and optimizations
- ✅ Best compatibility with modern ML frameworks
- ✅ Production-ready and stable
- ✅ Supports latest GPU architectures (Hopper, Ampere)
### GPU Runtime Creation
Use the conversational workflow documented in SKILL.md "Scenario 4: Create Custom Runtime Image" for GPU runtime creation.
#### Option 1: GPU Support Only (CUDA Toolkit)
Use when you need CUDA libraries but will install PyTorch/TensorFlow manually or via conda.
**Conversational Workflow Choices:**
1. **Base Image:** Select Ubuntu 22.04 (recommended for GPU)
2. **Python Versions:** Select versions (e.g., 3.10, 3.11)
3. **GPU Support:** Select "Yes - GPU-enabled with CUDA 12.4"
4. **Packages:** Specify your packages (e.g., numpy pandas scipy)
**Generated Dockerfile includes:**
- NVIDIA CUDA repository (Ubuntu 22.04)
- CUDA toolkit installation
- CUDA environment variables (CUDA_HOME, PATH)
#### Option 2: GPU + PyTorch with CUDA (Recommended)
Use when you need PyTorch with CUDA support pre-installed.
**Conversational Workflow Choices:**
1. **Base Image:** Select Ubuntu 22.04 (required for GPU)
2. **Python Versions:** Select versions (e.g., 3.10, 3.11)
3. **GPU Support:** Select "Yes - GPU-enabled with CUDA 12.4"
4. **Packages:** Specify PyTorch-dependent packages (e.g., transformers datasets accelerate)
**Generated Dockerfile includes:**
- CUDA toolkit (Ubuntu 22.04 + CUDA 12.4)
- PyTorch 2.6.0 with CUDA 12.4
- torchvision 0.21.0+cu124, torchaudio 2.6.0+cu124
- All specified user packages
#### Option 3: CPU-Only PyTorch
Use when you need PyTorch but no GPU support.
```bash
python3 scripts/generate_dockerfile.py \
--packages pytorch torchvision torchaudio cpuonly -c pytorch \
--python-versions 3.11 \
--image-tag my-pytorch-cpu:v1
```
### PyTorch CUDA Compatibility
The runtime image creator automatically handles PyTorch CUDA version matching:
**Default Production Configuration (Recommended):**
```dockerfile
# Ubuntu 22.04 + CUDA 12.4 + PyTorch 2.6.0
RUN pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 torchaudio==2.6.0+cu124 --index-url https://download.pytorch.org/whl/cu124
```
**Alternative CUDA Variants:**
```dockerfile
# CUDA 12.1 (PyTorch 2.5.1 is the last release with cu121 wheels)
RUN pip install torch==2.5.1+cu121 torchvision==0.20.1+cu121 torchaudio==2.5.1+cu121 --index-url https://download.pytorch.org/whl/cu121
# CUDA 11.8 (legacy support)
RUN pip install torch==2.6.0+cu118 torchvision==0.21.0+cu118 torchaudio==2.6.0+cu118 --index-url https://download.pytorch.org/whl/cu118
```
**Note**: PyTorch with CUDA is installed via pip, not conda, to ensure exact CUDA variant matching.
---
## Common Scenarios
Use the conversational workflow documented in SKILL.md "Scenario 4: Create Custom Runtime Image" for all scenarios below.
### Scenario 1: NLP/LLM Inference (GPU)
**Requirements:**
- Transformers library
- PyTorch with CUDA
- Python 3.10+
**Conversational Workflow Choices:**
1. Base Image: Ubuntu 22.04 (for GPU support)
2. Python Versions: 3.10, 3.11
3. GPU Support: Yes - GPU-enabled with CUDA 12.4
4. Packages: transformers accelerate sentencepiece tokenizers
**Generated Dockerfile will include:**
- Ubuntu 22.04 base
- CUDA 12.4 toolkit
- PyTorch 2.6.0 with CUDA 12.4
- Transformers, accelerate, and related NLP libraries
- Python 3.10 and 3.11 environments
### Scenario 2: Computer Vision (GPU)
**Requirements:**
- PyTorch with CUDA
- OpenCV, PIL
- Image processing libraries
**Conversational Workflow Choices:**
1. Base Image: Ubuntu 22.04 (for GPU support)
2. Python Versions: 3.11
3. GPU Support: Yes - GPU-enabled with CUDA 12.4
4. Packages: opencv pillow scikit-image albumentations
### Scenario 3: Data Science (CPU)
**Requirements:**
- Pandas, NumPy, Scikit-learn
- Data analysis libraries
- CPU-only
**Conversational Workflow Choices:**
1. Base Image: Ubuntu 22.04 or 24.04
2. Python Versions: 3.10, 3.11, 3.12
3. GPU Support: No - CPU only
4. Packages: pandas numpy scikit-learn matplotlib seaborn
### Scenario 4: Large Language Models with vLLM/SGLang
**Requirements:**
- vLLM or SGLang for fast inference
- PyTorch with CUDA
- Latest Python
**Conversational Workflow Choices:**
1. Base Image: Ubuntu 22.04 (for GPU support)
2. Python Versions: 3.11
3. GPU Support: Yes - GPU-enabled with CUDA 12.4
4. Packages: vllm
```bash
# vLLM
python3 scripts/generate_dockerfile.py \
--packages vllm \
--python-versions 3.11 \
--enable-gpu \
--cuda-variant cu124 \
--pytorch-cuda-install \
--image-tag vllm-runtime:v1
# OR for SGLang
python3 scripts/generate_dockerfile.py \
--packages sglang transformers diffusers peft \
--python-versions 3.10 \
--enable-gpu \
--cuda-variant cu124 \
--pytorch-cuda-install \
--image-tag sglang-runtime:v1
```
**Note**: vLLM and SGLang require specific CUDA versions. Check their documentation for compatibility.
### Scenario 5: Distributed Training with DeepSpeed
**Requirements:**
- DeepSpeed for distributed training
- PyTorch with CUDA
- MPI support
**Conversational Workflow Choices:**
1. Base Image: Ubuntu 22.04 (for GPU support)
2. Python Versions: 3.10
3. GPU Support: Yes - GPU-enabled with CUDA 12.1 (broader compatibility for DeepSpeed)
4. Packages: deepspeed mpi4py
---
## Best Practices
### 1. Python Version Selection
**Recommended**: Use Python 3.10 or 3.11 for GPU workloads
- Python 3.10+: Best compatibility with ML libraries
- Python 3.11: Best performance (significant speedup)
- Python 3.12: Latest features, but some packages may not be available
**Legacy**: Only use Python 3.7-3.9 if required for compatibility
### 2. Image Size Optimization
**Reduce image size by:**
✅ **Selecting only needed Python versions**
During conversational workflow:
- Good: Select single version (e.g., 3.11)
- Avoid: Selecting all versions if not needed
✅ **Using conda clean**
```dockerfile
RUN conda clean -afy
```
✅ **Minimizing system packages**
- Only include what you need
- Remove build dependencies if not needed
### 3. CUDA Version Strategy
**Recommended: Ubuntu 22.04 + CUDA 12.4 for Production**
**Match CUDA to your GPU hardware:**
| GPU Generation | Recommended CUDA | Base OS | Examples |
|----------------|------------------|---------|----------|
| Hopper (H100) | **CUDA 12.4** ✅ | Ubuntu 22.04 | Latest datacenters |
| Ampere (A100, A10) | **CUDA 12.4** ✅ | Ubuntu 22.04 | Modern datacenters |
| Volta (V100) | CUDA 12.1 or 11.8 | Ubuntu 22.04 | Older datacenters |
| Turing (T4) | CUDA 12.1 or 11.8 | Ubuntu 22.04 | Cloud instances |
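For scripted image builds, the table above can be encoded as a simple lookup. This is a minimal sketch: the values are copied from the table, while the `recommended_cuda` helper and the lowercase keys are illustrative choices.

```python
from typing import List

# Values copied from the table above; keys are lowercase generation names.
RECOMMENDED_CUDA = {
    "hopper": ["12.4"],
    "ampere": ["12.4"],
    "volta": ["12.1", "11.8"],
    "turing": ["12.1", "11.8"],
}


def recommended_cuda(generation: str) -> List[str]:
    """Return the recommended CUDA versions for a GPU generation."""
    key = generation.strip().lower()
    if key not in RECOMMENDED_CUDA:
        known = ", ".join(sorted(RECOMMENDED_CUDA))
        raise ValueError(f"unknown GPU generation {generation!r}; known: {known}")
    return RECOMMENDED_CUDA[key]


if __name__ == "__main__":
    print(recommended_cuda("Hopper"))
    print(recommended_cuda("Turing"))
```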
**Match CUDA to your framework:**
- **Default**: Ubuntu 22.04 + CUDA 12.4 - Best for modern workloads
- Check PyTorch/TensorFlow compatibility
- CUDA 12.4 provides latest features and optimizations
- Use CUDA 12.1/11.8 only if required for older GPU compatibility
### 4. Security Considerations
**DO NOT include in images:**
- ❌ Credentials or API keys
- ❌ Private SSH keys
- ❌ Database passwords
- ❌ Access tokens
**DO include:**
- ✅ Public certificates (if needed)
- ✅ Configuration templates
- ✅ Documentation for required secrets
**Pass secrets at runtime:**
```bash
docker run -e ODPS_ACCESS_KEY=$ODPS_ACCESS_KEY \
-e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
my-image:tag
```
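A lightweight pre-build check can flag `ENV` or `ARG` instructions whose names look like secrets. The sketch below is a heuristic only: the keyword list and the `secret_like_env_lines` helper are assumptions for illustration, and it inspects only the first variable on each instruction.

```python
import re
from typing import List

# Illustrative keyword list; extend it for your own naming conventions.
_SECRET_HINT = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|PASSWD)", re.IGNORECASE)


def secret_like_env_lines(dockerfile_text: str) -> List[str]:
    """Return ENV/ARG instructions whose variable name looks like a secret."""
    hits = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        parts = stripped.split(None, 1)
        if len(parts) == 2 and parts[0].upper() in ("ENV", "ARG"):
            # The variable name ends at the first '=' or whitespace.
            name = re.split(r"[=\s]", parts[1], maxsplit=1)[0]
            if _SECRET_HINT.search(name):
                hits.append(stripped)
    return hits


if __name__ == "__main__":
    sample = 'FROM ubuntu:22.04\nENV TZ="Asia/Shanghai"\nENV ODPS_ACCESS_KEY=abc\n'
    for hit in secret_like_env_lines(sample):
        print("possible secret baked into image:", hit)
```

Any hit is a candidate for removal from the Dockerfile in favor of passing the value with `docker run -e` at runtime.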
### 5. Build Optimization
**Use build caching:**
```bash
# Build with cache (faster for development)
docker build -t myimage:v1 .
# Build without cache (production)
docker build --no-cache -t myimage:v1 .
```
**Multi-platform builds:**
```bash
# Build for x86_64 (BuildKit sets TARGETARCH=amd64 automatically)
docker build --platform linux/amd64 -t myimage:v1 .
# Build for arm64 (experimental; BuildKit sets TARGETARCH=arm64)
docker build --platform linux/arm64 -t myimage:v1 .
```
### 6. Testing Your Image
**Basic health check:**
```bash
docker run --rm myimage:v1 python --version
docker run --rm myimage:v1 conda list
```
**GPU verification:**
```bash
# Check CUDA
docker run --rm --gpus all myimage:v1 nvidia-smi
docker run --rm --gpus all myimage:v1 nvcc --version
# Check PyTorch CUDA
docker run --rm --gpus all myimage:v1 python -c "import torch; print(torch.cuda.is_available())"
```
**Package verification:**
```bash
docker run --rm myimage:v1 python -c "
import torch
import transformers
print(f'PyTorch: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'Transformers: {transformers.__version__}')
"
```
---
## Troubleshooting
### Issue: CUDA out of memory
**Cause**: GPU memory limit exceeded
**Solution:**
1. Use smaller batch sizes
2. Enable gradient checkpointing
3. Use mixed precision training
4. Reduce model size
### Issue: PyTorch doesn't detect GPU
**Cause**: CUDA mismatch or driver issue
**Solution:**
```bash
# Check CUDA version
nvidia-smi
nvcc --version
# Check PyTorch CUDA
python -c "import torch; print(torch.version.cuda)"
# Rebuild with the CUDA variant that matches your system, e.g. pass
# --cuda-variant cu124 to scripts/generate_dockerfile.py
```
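The checks above can be gathered into one report that is easy to paste into a bug ticket. This is a minimal sketch; the `torch_cuda_report` helper is hypothetical and relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def torch_cuda_report() -> dict:
    """Collect the PyTorch/CUDA facts needed to spot a variant mismatch."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing from this interpreter's environment.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is `None`, a CPU-only wheel was installed. If it is set but `cuda_available` is `False`, the likely causes are a missing `--gpus all` flag or a host driver that is too old for that CUDA version.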
### Issue: Package not found in conda-forge
**Solution:**
1. Check package name: `conda search -c conda-forge <package>`
2. Try alternative name
3. Use pip fallback:
```dockerfile
RUN conda run -n py311 pip install --no-cache-dir <package-name>
```
### Issue: Image size too large
**Solution:**
1. Use fewer Python versions
2. Remove unnecessary packages
3. Use multi-stage builds
4. Clean conda cache: `conda clean -afy`
---
## Summary
### Key Recommendations
1. **Use Miniforge** (not Miniconda) for unrestricted commercial use
2. **Match CUDA version** to your GPU hardware and framework requirements
3. **Select minimal Python versions** to reduce image size
4. **Test locally** before pushing to production
5. **Never include secrets** in Docker images
### Quick Reference
| Use Case | Python | CUDA | Command Example |
|----------|--------|------|-----------------|
| **LLM Inference** | 3.10, 3.11 | cu124 | `--enable-gpu --cuda-variant cu124 --pytorch-cuda-install` |
| **Computer Vision** | 3.11 | cu124 | `--enable-gpu --cuda-variant cu124 --pytorch-cuda-install` |
| **Data Science** | 3.10-3.12 | N/A | No GPU flags |
| **Legacy ML** | 3.9 | cu118 | `--enable-gpu --cuda-variant cu118 --pytorch-cuda-install` |
For more information, see:
- `conda_best_practices.md` - Conda usage guidelines
- `base_image_details.md` - Base image architecture
- `quick_start_guide.md` - Getting started guide
FILE:references/runtime-image-guides/python-environment-strategy.md
# Python Environment Strategy
Configure Python environments for your custom MaxFrame DPE runtime image.
## Multi-Environment Architecture
The custom DPE runtime creates isolated conda environments for each Python version:
```
/py-runtime/ # MINIFORGE_HOME
├── bin/ # Conda executables
│ ├── conda
│ ├── pip
│ └── python -> ../envs/py311/bin/python
├── envs/ # Conda environments
│ ├── py37/ # Python 3.7 environment
│ │ └── bin/python3.7
│ ├── py38/ # Python 3.8 environment
│ ├── py39/ # Python 3.9 environment
│ ├── py310/ # Python 3.10 environment
│ ├── py311/ # Python 3.11 environment (recommended default)
│ └── py312/ # Python 3.12 environment
└── pkgs/ # Package cache
```
## Environment Naming Pattern
**Pattern:** `py<version>` (no dots)
- Python 3.7 → `py37`
- Python 3.8 → `py38`
- Python 3.9 → `py39`
- Python 3.10 → `py310`
- Python 3.11 → `py311`
- Python 3.12 → `py312`
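The naming rule is mechanical, so it can be expressed as a one-line transformation. The `env_name` helper below is a hypothetical illustration, not an existing function in the runtime tooling.

```python
def env_name(version: str) -> str:
    """Map a Python version such as '3.11' to its conda env name 'py311'."""
    major, minor = version.strip().split(".")[:2]
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"not a Python version: {version!r}")
    return f"py{major}{minor}"


if __name__ == "__main__":
    for version in ("3.7", "3.10", "3.12"):
        print(version, "->", env_name(version))
```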
## Version Selection Guidelines
**Python 3.7:**
- ✅ Wide package compatibility
- ⚠️ End of life (EOL): June 2023
- **Use only for legacy compatibility**
**Python 3.8:**
- ✅ Good package support
- ⚠️ EOL: October 2024
- Stable, but aging
**Python 3.9:**
- ✅ Excellent package support
- ✅ Stable and well-tested
- **Good balance for most use cases**
**Python 3.10:**
- ✅ Latest stable features
- ✅ Excellent package support
- **Good balance of features and stability**
**Python 3.11 (Recommended for production):**
- ✅ Significant performance improvements
- ✅ Excellent package support
- ✅ Modern, efficient
- **Best for production deployments**
**Python 3.12:**
- ✅ Latest Python features
- ⚠️ Some packages may not be available yet
- **Use for cutting-edge development**
## Size/Stability/Flexibility Trade-offs
| Configuration | Image Size | Flexibility | Build Time | Use Case |
|--------------|------------|-------------|------------|----------|
| **Single version (3.11)** | 0.8-1.2 GB | Minimal | 2-5 min | Production |
| **3 versions (3.10-3.12)** | 1.5-2.5 GB | Medium | 5-10 min | Development |
| **All versions (3.7-3.12)** | 3-5 GB | Maximum | 10-20 min | Testing/compatibility |
## MF_PYTHON_EXECUTABLE Pattern (CRITICAL)
**Why Critical:** MaxFrame uses this environment variable to detect the Python executable at runtime. Incorrect configuration will cause runtime failures.
**Pattern:**
```dockerfile
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/<env_name>/bin/python
```
**Path Structure:** `/py-runtime/envs/<env_name>/bin/python`
**Example:**
```dockerfile
# Default to Python 3.11
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
```
**Default Selection Logic:**
1. If Python 3.11 is in selected versions → Use `py311` as default
2. Otherwise → Use highest selected version (e.g., `py312` if 3.12 is highest)
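The selection logic can be sketched as follows. The `default_env` helper is hypothetical; note that versions are compared numerically, so that 3.10 ranks above 3.9.

```python
from typing import Sequence


def default_env(versions: Sequence[str]) -> str:
    """Prefer py311; otherwise use the highest selected version."""
    parsed = sorted(
        tuple(int(part) for part in version.split(".")[:2])
        for version in versions
    )
    if not parsed:
        raise ValueError("at least one Python version is required")
    major, minor = (3, 11) if (3, 11) in parsed else parsed[-1]
    return f"py{major}{minor}"


if __name__ == "__main__":
    env = default_env(["3.10", "3.11", "3.12"])
    print(f"ENV CONDA_DEFAULT_ENV={env}")
    print(f"ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/{env}/bin/python")
```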
**Dockerfile Pattern:**
```dockerfile
# For Python 3.11 only
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
# For multi-Python (3.10, 3.11, 3.12) - default to 3.11
ENV CONDA_DEFAULT_ENV=py311
ENV MF_PYTHON_EXECUTABLE=/py-runtime/envs/py311/bin/python
```
## Related Guides
- **[Package Management](package-management.md)** - Installing packages in environments
- **[Environment Variables](environment-variables.md)** - All environment variable patterns
- **[Dockerfile Templates](dockerfile-templates.md)** - Environment creation templates
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/system-dependencies.md
# System Dependencies
Install essential system packages and configure the base system.
## Essential System Packages
```dockerfile
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y \
wget \
curl \
vim \
ca-certificates \
locales \
build-essential \
jq \
dnsutils \
ffmpeg \
tzdata \
strace \
gdb && \
rm -rf /var/lib/apt/lists/*
```
**Package Purposes:**
- `wget`, `curl` - Download tools
- `vim` - Text editor
- `ca-certificates` - SSL certificates
- `locales` - Locale support
- `build-essential` - Compilation tools (gcc, make, etc.)
- `jq` - JSON processor
- `dnsutils` - DNS utilities
- `ffmpeg` - Media processing
- `tzdata` - Timezone data
- `strace`, `gdb` - Debugging tools
## Locale and Timezone Configuration
```dockerfile
ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
ENV LC_CTYPE=en_US.UTF-8
ENV TZ="Asia/Shanghai"
ENV TERM=xterm-256color
RUN locale-gen en_US.UTF-8 && \
update-locale LANG=en_US.UTF-8
```
## Miniforge Installation Pattern
```dockerfile
ENV MINIFORGE_HOME="/py-runtime"
ENV PATH="$MINIFORGE_HOME/bin:$PATH"
RUN wget -q https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniforge.sh && \
bash miniforge.sh -b -p $MINIFORGE_HOME && \
rm -rf miniforge.sh
```
**Why Miniforge:**
- Conda-forge distribution (no Anaconda repo needed)
- Smaller footprint than Miniconda
- BSD-3-Clause license (unrestricted commercial use)
## ossfs2 Installation Pattern
```dockerfile
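# Install ossfs2 for OSS filesystem mounting (x86_64 only)
# TARGETARCH is set automatically by BuildKit; fall back to amd64 when it is unset
ARG TARGETARCH
RUN ARCH=$(echo "${TARGETARCH:-amd64}" | sed 's/amd64/x86_64/' | sed 's/arm64/aarch64/') && \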
if [ "$ARCH" = "x86_64" ]; then \
wget -O ossfs2_2.0.3.1_linux_x86_64.deb "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20260123/ynfgkj/ossfs2_2.0.3.1_linux_x86_64.deb?spm=a2c4g.11186623.0.0.82ac640fdy2HPL&file=ossfs2_2.0.3.1_linux_x86_64.deb" && \
dpkg -i ossfs2_2.0.3.1_linux_x86_64.deb && \
ossfs2 --version && \
rm -rf ossfs2_2.0.3.1_linux_x86_64.deb; \
else \
echo "Warning: ossfs2 official packages not available for $ARCH"; \
fi
```
**Note:** Prebuilt ossfs2 packages are only available for x86_64. On aarch64, build ossfs2 from source.
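The architecture mapping used in the installation pattern can be mirrored in a build script to decide up front whether the ossfs2 step applies. The helper names below are hypothetical.

```python
# Docker platform names (as in TARGETARCH) mapped to `uname -m` style names.
_DOCKER_TO_UNAME = {"amd64": "x86_64", "arm64": "aarch64"}


def uname_arch(docker_arch: str = "amd64") -> str:
    """Translate a Docker architecture name; unknown names pass through."""
    return _DOCKER_TO_UNAME.get(docker_arch, docker_arch)


def ossfs2_prebuilt_available(docker_arch: str = "amd64") -> bool:
    """Per this guide, prebuilt ossfs2 packages exist only for x86_64."""
    return uname_arch(docker_arch) == "x86_64"


if __name__ == "__main__":
    for arch in ("amd64", "arm64"):
        print(arch, uname_arch(arch), ossfs2_prebuilt_available(arch))
```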
## Related Guides
- **[Package Management](package-management.md)** - Installing Python packages
- **[Dockerfile Templates](dockerfile-templates.md)** - Complete base setup template
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:references/runtime-image-guides/testing-validation.md
# Testing and Validation
Validate your custom MaxFrame DPE runtime image.
## Health Check Commands
**Test basic Python:**
```bash
docker run --rm <image> conda run -n py311 python --version
# Expected: Python 3.11.x
```
**Test GPU availability:**
```bash
docker run --rm --gpus all <image> python -c "import torch; print(torch.cuda.is_available())"
# Expected: True
```
**Test package import:**
```bash
docker run --rm <image> conda run -n py311 python -c "import transformers; print(transformers.__version__)"
# Expected: Version number
```
## Environment Verification
**List all environments:**
```bash
docker run --rm <image> conda env list
```
**Check Python in each environment:**
```bash
docker run --rm <image> bash -c 'for env in py310 py311 py312; do echo "$env:"; conda run -n "$env" python --version; done'
```
## Package Import Tests
**Test multiple packages:**
```bash
docker run --rm <image> conda run -n py311 python -c "
import sys
packages = ['transformers', 'torch', 'pandas']
for pkg in packages:
try:
mod = __import__(pkg)
version = getattr(mod, '__version__', 'unknown')
print(f'{pkg}: {version}')
except ImportError as e:
print(f'{pkg}: FAILED - {e}')
"
```
## Integration Test with MaxFrame
```python
# test_maxframe.py
from maxframe.session import new_session
import maxframe.dataframe as md
session = new_session(image="your-image:v1")
# Test basic operation
df = md.read_odps_table("test_table")
print(df.head())
session.destroy()
```
## Image Size Check
```bash
docker images <image:tag>
# Expected sizes:
# - Single Python: 0.8-1.2 GB
# - 3 Pythons: 1.5-2.5 GB
# - All Pythons: 3-5 GB
```
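If you want to automate the size check, the expected ranges above can be encoded in a small helper. This is a minimal sketch: the function name `size_is_reasonable` and the variant keys are hypothetical, and the size in GB is assumed to be read manually from `docker images`.

```python
# Expected size ranges in GB, copied from the comment block above.
# The keys ("single", "three", "all") are illustrative names only.
EXPECTED_SIZE_GB = {
    "single": (0.8, 1.2),  # Single Python
    "three": (1.5, 2.5),   # 3 Pythons
    "all": (3.0, 5.0),     # All Pythons
}


def size_is_reasonable(size_gb: float, variant: str) -> bool:
    """Return True if size_gb falls inside the expected range for variant."""
    low, high = EXPECTED_SIZE_GB[variant]
    return low <= size_gb <= high
```

An image well outside its range usually deserves a closer look before deployment.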
## Summary Checklist
Before deploying, verify:
- [ ] Python version correct
- [ ] All packages importable
- [ ] MF_PYTHON_EXECUTABLE correctly set
- [ ] GPU available (if GPU image)
- [ ] Image size reasonable
- [ ] Integration test passes
## Related Guides
- **[Dockerfile Templates](dockerfile-templates.md)** - Template 8 includes verification
- **[Common Scenarios](common-scenarios.md)** - Examples include verification steps
---
**Part of [Custom Runtime Image Guides](README.md)**
FILE:scripts/lookup_operator.py
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.7"
# dependencies = []
# ///
"""
MaxFrame API Documentation Query Tool
This script reads Sphinx Markdown output to provide structured API documentation
for use in agent skills and context control.
Features:
- Directly reads Markdown files from `make markdown` output
- Zero complex parsing - Markdown is already structured and LLM-friendly
- Fuzzy search: Supports glob patterns and content search
- Section extraction: Get specific sections (params, examples, etc.)
Usage:
python lookup_operator.py search <pattern> # Search names + content (default)
python lookup_operator.py search <pattern> -n # Search names only
python lookup_operator.py search <pattern> --fold # Fold output to save tokens
python lookup_operator.py info <name> # Get full markdown content
python lookup_operator.py list # List all operators
python lookup_operator.py list --fold # Fold output to save tokens
Prerequisites:
Run `make markdown` in the docs directory first.
Sections available:
- signature, description, params/parameters, returns, return_type
- see_also, notes, examples, warnings, references
"""
import argparse
import fnmatch
import json
import re
import sys
from pathlib import Path
from typing import Dict, List, Optional
# Default path to Markdown build output
DEFAULT_MD_DIR = Path(__file__).parent.parent / "references" / "maxframe-client-docs" / "reference"
class APIDocParser:
"""Parser for Sphinx Markdown output files."""
def __init__(self, md_dir: Path = DEFAULT_MD_DIR):
self.md_dir = md_dir
self._index: Optional[Dict[str, Path]] = None
def _build_index(self) -> Dict[str, Path]:
"""Build an index of operator names to their Markdown file paths."""
if self._index is not None:
return self._index
self._index = {}
if not self.md_dir.exists():
return self._index
# Walk through all .md files
for md_path in self.md_dir.rglob("*.md"):
name = md_path.stem # Remove .md extension
# Only index API entries (those with qualified names like maxframe.xxx)
if name.startswith("maxframe.") and "." in name:
self._index[name] = md_path
return self._index
def list_all(self, compact: bool = False) -> List[str]:
"""
List all available operator names.
Args:
compact: If True, return tree-like format with common prefixes folded
"""
names = sorted(self._build_index().keys())
if not compact:
return names
return self._fold_names(names)
def _fold_names(self, names: List[str]) -> List[str]:
"""Fold names with common prefixes into tree-like format."""
if not names:
return []
result = []
prev_parts = []
for name in names:
parts = name.split(".")
# Find common prefix length with previous
common = 0
for i, (p, prev) in enumerate(zip(parts, prev_parts)):
if p == prev:
common = i + 1
else:
break
# Output the differing parts
if common == 0:
# No common prefix, output full name
result.append(name)
else:
# Indent based on common prefix depth, show only suffix
suffix = ".".join(parts[common:])
indent = " " * common
result.append(f"{indent}.{suffix}")
prev_parts = parts
return result
def search(self, pattern: str, search_content: bool = False) -> List[str]:
"""
Search for operators matching a pattern.
Args:
pattern: Search pattern (supports glob wildcards)
search_content: If True, also search in content
Examples:
search('apply_chunk') -> matches '*apply_chunk*'
search('DataFrame.apply') -> matches '*DataFrame.apply*'
search('mf.*') -> matches '*mf.*'
"""
index = self._build_index()
# Normalize pattern for glob matching
# If pattern has wildcards, we need to be smarter about wrapping
has_leading_wildcard = pattern.startswith("*")
has_trailing_wildcard = pattern.endswith("*")
if "*" in pattern or "?" in pattern or "[" in pattern:
# Pattern has wildcards - add leading/trailing wildcards if not present
glob_pattern = pattern
if not has_leading_wildcard:
glob_pattern = "*" + glob_pattern
if not has_trailing_wildcard:
glob_pattern = glob_pattern + "*"
else:
# No wildcards - wrap with wildcards for substring match
glob_pattern = f"*{pattern}*"
matches = set()
# Search by name
for name in index.keys():
if fnmatch.fnmatch(name, glob_pattern) or fnmatch.fnmatch(
name.lower(), glob_pattern.lower()
):
matches.add(name)
# Search by content if requested
if search_content:
pattern_lower = pattern.lower()
for name, md_path in index.items():
if name in matches:
continue
try:
content = md_path.read_text(encoding="utf-8").lower()
if pattern_lower in content:
matches.add(name)
except IOError:
continue
return sorted(matches)
def _resolve_name(self, name: str) -> str:
"""Resolve a fuzzy name to full qualified name (case-insensitive)."""
index = self._build_index()
# Exact match first (case-insensitive)
name_lower = name.lower()
for key in index.keys():
if key.lower() == name_lower:
return key
# Try fuzzy search
matches = self.search(name)
if len(matches) == 0:
raise ValueError(f"No operator found matching '{name}'")
elif len(matches) == 1:
return matches[0]
else:
# Check for exact suffix match
suffix_matches = [m for m in matches if m.endswith(f".{name}") or m.endswith(name)]
if len(suffix_matches) == 1:
return suffix_matches[0]
raise ValueError(
f"Multiple operators match '{name}': {matches[:10]}"
+ (f" (and {len(matches) - 10} more)" if len(matches) > 10 else "")
)
def get(self, name: str) -> str:
"""Get full markdown content for an operator."""
full_name = self._resolve_name(name)
md_path = self._build_index()[full_name]
return md_path.read_text(encoding="utf-8")
def get_section(self, name: str, section: str) -> str:
"""
Get a specific section from the markdown content.
Args:
name: Operator name (supports fuzzy matching)
section: Section name (case-insensitive):
- signature: The function signature line
- description: Description paragraphs
- params/parameters: Parameters section
- returns: Returns section
- return_type: Return type
- see_also: See Also section
- notes: Notes section
- examples: Examples section
- warnings: Warnings section
- references: References section
Returns:
The section content as string, or empty string if not found.
"""
content = self.get(name)
section = section.lower()
# Section name aliases
aliases = {
"params": "parameters",
"param": "parameters",
"return": "returns",
"seealso": "see_also",
"see also": "see_also",
"example": "examples",
"note": "notes",
"warning": "warnings",
"reference": "references",
}
section = aliases.get(section, section)
lines = content.split("\n")
# Special case: signature (first #### line after title)
if section == "signature":
for line in lines:
if line.startswith("#### ") and "(" in line:
return line[5:].strip() # Remove "#### " prefix
return ""
# Special case: description (text between title and signature)
if section == "description":
desc_lines = []
after_title = False
for line in lines:
# Skip title
if line.startswith("# "):
after_title = True
continue
# Stop at signature
if line.startswith("#### ") and "(" in line:
break
# Stop at section markers
if line.startswith("* **") or line.startswith("### "):
break
# Collect description after title
if after_title and line.strip():
desc_lines.append(line)
return "\n".join(desc_lines).strip()
# Parameters section (special format: * **Parameters:** followed by list)
if section == "parameters":
return self._extract_list_section(lines, "Parameters")
# Returns section
if section == "returns":
return self._extract_inline_section(lines, "Returns")
# Return type section
if section == "return_type":
return self._extract_inline_section(lines, "Return type")
# Standard header sections (### or #### followed by section name)
section_map = {
"see_also": ["SEE ALSO", "See Also"],
"notes": ["Notes", "Note"],
"examples": ["Examples", "Example"],
"warnings": ["Warnings", "Warning"],
"references": ["References", "Reference"],
}
headers = section_map.get(section, [section.title()])
return self._extract_header_section(lines, headers)
def _extract_list_section(self, lines: List[str], section_name: str) -> str:
"""Extract a list section like Parameters."""
result = []
in_section = False
for i, line in enumerate(lines):
if f"**{section_name}:**" in line:
in_section = True
# Include the rest of this line if there's content after :**
continue
elif in_section:
# Stop at next top-level section
if line.startswith("* **") and ":" in line and not line.startswith(" "):
# Check if this is a new section (Returns:, Return type:, etc.)
if any(s in line for s in ["Returns:", "Return type:", "Raises:", "Yields:"]):
break
result.append(line)
return "\n".join(result).strip()
def _extract_inline_section(self, lines: List[str], section_name: str) -> str:
"""Extract an inline section like Returns or Return type."""
result = []
in_section = False
for i, line in enumerate(lines):
if f"**{section_name}:**" in line:
in_section = True
# Check if there's content on the same line after :**
match = re.search(rf"\*\*{section_name}:\*\*\s*(.*)", line)
if match and match.group(1).strip():
result.append(match.group(1).strip())
continue
elif in_section:
# Stop at next top-level section marker
if line.startswith("* **") and ":" in line and not line.startswith(" "):
break
# Stop at header
if line.startswith("### ") or line.startswith("#### "):
break
# Collect content (typically indented on next lines)
if line.strip():
result.append(line.strip())
return "\n".join(result).strip()
def _extract_header_section(self, lines: List[str], headers: List[str]) -> str:
"""Extract a section that starts with a header."""
result = []
in_section = False
for i, line in enumerate(lines):
stripped = line.strip()
# Check for section start
if not in_section:
for header in headers:
if stripped in [f"### {header}", f"#### {header}", f"# {header}", header]:
in_section = True
break
continue # Don't include the header line itself
if in_section:
# Stop at next header section (### or #### that doesn't contain function signature)
if stripped.startswith("### ") or stripped.startswith("#### "):
# But don't stop at code block headers
if "(" not in stripped and not stripped.startswith("```"):
break
result.append(line)
return "\n".join(result).strip()
def get_info(self, name: str) -> Dict[str, str]:
"""
Get structured info as a dictionary.
Returns dict with keys: name, signature, description, params, returns,
return_type, see_also, notes, examples (only non-empty sections included).
"""
full_name = self._resolve_name(name)
result = {"name": full_name}
sections = [
"signature",
"description",
"params",
"returns",
"return_type",
"see_also",
"notes",
"examples",
]
for section in sections:
content = self.get_section(name, section)
if content:
result[section] = content
return result
def main():
parser = argparse.ArgumentParser(
description="MaxFrame API Documentation Query Tool",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__,
)
parser.add_argument(
"--md-dir",
type=Path,
default=DEFAULT_MD_DIR,
help="Path to Sphinx Markdown build output directory (default: references/maxframe-client-docs/reference)",
)
subparsers = parser.add_subparsers(dest="command", help="Available commands")
# list command
list_parser = subparsers.add_parser("list", help="List all available operators")
list_parser.add_argument(
"--md-dir",
type=Path,
default=DEFAULT_MD_DIR,
help="Path to Sphinx Markdown build output directory",
)
list_parser.add_argument(
"--fold",
action="store_true",
help="Fold common prefixes to save tokens",
)
list_parser.add_argument(
"--json",
action="store_true",
help="Output as JSON array",
)
# search command
search_parser = subparsers.add_parser("search", help="Search for operators by pattern")
search_parser.add_argument(
"--md-dir",
type=Path,
default=DEFAULT_MD_DIR,
help="Path to Sphinx Markdown build output directory",
)
search_parser.add_argument("pattern", help="Search pattern (supports glob wildcards)")
search_parser.add_argument(
"-n",
"--name-only",
action="store_true",
help="Search only in names (default: search both names and content)",
)
search_parser.add_argument(
"--fold",
action="store_true",
help="Fold common prefixes to save tokens",
)
search_parser.add_argument(
"--json",
action="store_true",
help="Output as JSON array",
)
# info command
info_parser = subparsers.add_parser("info", help="Get documentation for an operator")
info_parser.add_argument(
"--md-dir",
type=Path,
default=DEFAULT_MD_DIR,
help="Path to Sphinx Markdown build output directory",
)
info_parser.add_argument(
"name", help="Operator name (supports fuzzy matching, case-insensitive)"
)
info_parser.add_argument(
"-s",
"--section",
help="Get specific section (signature, description, params, returns, examples, notes, see_also)",
)
info_parser.add_argument(
"--json",
action="store_true",
help="Output as JSON (only with --section or for structured info)",
)
args = parser.parse_args()
if not args.command:
parser.print_help()
sys.exit(1)
# Get md_dir from either the main parser or subparser
md_dir = getattr(args, "md_dir", DEFAULT_MD_DIR)
doc_parser = APIDocParser(md_dir)
try:
if args.command == "list":
if args.json:
operators = doc_parser.list_all(compact=False)
print(json.dumps(operators))
else:
operators = doc_parser.list_all(compact=args.fold)
print("\n".join(operators))
elif args.command == "search":
# Default: search both names and content
search_content = not args.name_only
results = doc_parser.search(args.pattern, search_content=search_content)
if args.json:
print(json.dumps(results))
elif args.fold:
folded = doc_parser._fold_names(results)
print("\n".join(folded))
else:
print("\n".join(results))
elif args.command == "info":
if args.section:
content = doc_parser.get_section(args.name, args.section)
if args.json:
print(json.dumps({args.section: content}, indent=2, ensure_ascii=False))
else:
print(content)
elif args.json:
info = doc_parser.get_info(args.name)
print(json.dumps(info, indent=2, ensure_ascii=False))
else:
content = doc_parser.get(args.name)
print(content)
except ValueError as e:
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
except FileNotFoundError:
print(f"Error: Markdown directory not found: {args.md_dir}", file=sys.stderr)
print("Run 'make markdown' in the docs directory first.", file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
main()
Query Alibaba Cloud domain information: domain details, domain list, advanced search, and instance ID lookup. 查询阿里云域名信息:域名详情、域名列表、高级搜索、按实例ID查询。 Triggers: "do...
---
name: alibabacloud-domain-manage
description: |
Query Alibaba Cloud domain information: domain details, domain list, advanced search, and instance ID lookup.
查询阿里云域名信息:域名详情、域名列表、高级搜索、按实例ID查询。
Triggers: "domain info", "domain query", "domain details", "domain list", "domain search",
"域名查询", "查询域名", "域名信息", "域名详情", "域名列表", "查看域名", "搜索域名",
"domain status", "域名状态", "到期时间", "expiration date"
---
# Domain Query Skill
You are a professional Alibaba Cloud domain query assistant. You help users query domain information, including domain details, domain list, advanced filtered search, and lookup by instance ID. All operations are **read-only** and require no user confirmation.
## Scenario Description
This skill covers domain information query scenarios:
**Architecture**: `Alibaba Cloud Domain Service (Global) + Aliyun CLI Plugin`
| User Intent | API Command | Key Points |
|-------------|-------------|------------|
| Query details of a specific domain | `query-domain-by-domain-name` | Requires exact domain name |
| Query domain by instance ID | `query-domain-by-instance-id` | Requires instance ID (e.g., `S2024...`) |
| List all domains under account | `query-domain-list` | Supports pagination, fuzzy search, sorting |
| Search domains with advanced filters | `query-advanced-domain-list` | Status, expiration range, domain type, suffix |
**Intent Routing**:
| User Input Pattern | Route To |
|-------------------|----------|
| Contains specific domain name (e.g., `example.com`) | `query-domain-by-domain-name` |
| Contains instance ID (e.g., `S2024...`) | `query-domain-by-instance-id` |
| "all domains" / "my domains" / "domain list" | `query-domain-list` |
| Contains filter conditions (status, expiration, type) | `query-advanced-domain-list` |
| Ambiguous | Ask user to clarify |
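The routing table above can be approximated with a few pattern checks. The sketch below is illustrative only: the regular expressions, the keyword lists, and the function name `route_intent` are assumptions, not part of the skill, and real user input (especially Chinese phrasing) will need richer matching.

```python
import re

# Hypothetical patterns; tune them to the inputs you actually see.
INSTANCE_ID_RE = re.compile(r"\bS\d{6,}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)
LIST_PHRASES = ("all domains", "my domains", "domain list")
FILTER_WORDS = ("status", "expir", "type", "suffix")


def route_intent(text: str) -> str:
    """Map a user request to an API command name, or 'clarify' if ambiguous."""
    lowered = text.lower()
    if INSTANCE_ID_RE.search(text):
        return "query-domain-by-instance-id"
    if DOMAIN_RE.search(text):
        return "query-domain-by-domain-name"
    if any(word in lowered for word in FILTER_WORDS):
        return "query-advanced-domain-list"
    if any(phrase in lowered for phrase in LIST_PHRASES):
        return "query-domain-list"
    return "clarify"
```

Filter words are checked before list phrases so that a request such as "my domains by status" goes to the advanced query rather than the plain list.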
## Pre-flight Checklist
**Pre-check: Aliyun CLI >= 3.3.3 required**
> Run `aliyun version` to verify >= 3.3.3. If not installed or version too low,
> run `curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash` to install/update,
> or see `references/cli-installation-guide.md` for installation instructions.
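Version strings should be compared numerically, not as text, because `3.10.0` sorts before `3.3.3` in a plain string comparison. A minimal sketch, assuming the output of `aliyun version` contains an `x.y.z` triple (the helper names are hypothetical):

```python
import re

MIN_VERSION = (3, 3, 3)


def parse_version(output: str):
    """Extract the first x.y.z triple from the CLI's version output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def cli_is_recent_enough(output: str, minimum=MIN_VERSION) -> bool:
    """Compare versions as integer tuples rather than as strings."""
    return parse_version(output) >= minimum
```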
**Pre-check: Aliyun CLI plugin update required**
> [MUST] run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
> [MUST] run `aliyun plugin update` to ensure that any existing plugins are always up-to-date.
> At the **start** of the Core Workflow (before any CLI invocation):
> **[MUST] Enable AI-Mode** — AI-mode is required for Agent Skill execution and API call tracking.
> Step 1: Run the following commands before any CLI invocation:
> ```bash
> aliyun configure ai-mode enable
> aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-domain-manage"
> ```
> Step 2: **[MUST] Verify AI-mode is active** by running:
> ```bash
> aliyun configure ai-mode status
> ```
> If the output does NOT confirm ai-mode is enabled, STOP and retry Step 1. Do NOT proceed to any domain API call until ai-mode is confirmed active.
>
> **[MUST] Disable AI-Mode at EVERY exit point** — Before delivering the final response for ANY reason, always disable AI-mode first. This applies to ALL exit paths: workflow success, workflow failure, error/exception, user cancellation, session end, or any other scenario where no further CLI commands will be executed.
> ```bash
> aliyun configure ai-mode disable
> ```
> **Pre-check: Alibaba Cloud Credentials Required**
>
> **Security Rules:**
> - **NEVER** read, echo, or print AK/SK values (e.g., `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` is FORBIDDEN)
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **NEVER** use `aliyun configure set` with literal credential values
> - **ONLY** use `aliyun configure list` to check credential status
>
> ```bash
> aliyun configure list
> ```
> Check the output for a valid profile (AK, STS, or OAuth identity).
>
> **If no valid profile exists, STOP here.**
> 1. Obtain credentials from [Alibaba Cloud Console](https://ram.console.aliyun.com/manage/ak)
> 2. Configure credentials **outside of this session** (via `aliyun configure` in terminal or environment variables in shell profile)
> 3. Return and re-run after `aliyun configure list` shows a valid profile
**[MUST] Pre-flight Steps (execute in order, do NOT skip any step):**
1. Run `aliyun version` → verify output shows >= 3.3.3. If not, install/update CLI.
2. Run `aliyun configure set --auto-plugin-install true`
3. Run `aliyun plugin update`
4. Run `aliyun configure ai-mode enable`
5. Run `aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-domain-manage"`
6. Run `aliyun configure ai-mode status` → verify ai-mode is enabled. If NOT enabled, repeat Steps 4-5.
7. Run `aliyun configure list` → verify a valid profile exists. If not, STOP.
> **[MUST]** Each step must be executed as a separate command. Do NOT combine or skip steps. Do NOT proceed to Core Workflow until ALL 7 steps are confirmed successful.
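The rule that AI-mode must be disabled at every exit point maps naturally onto a `try`/`finally` structure. The sketch below assumes a hypothetical `run` callable that executes one argument list (for example a thin wrapper around `subprocess.run(argv, check=True)`). It does not parse the output of the `status` command, whose format is not specified here, so that check still has to be done by the caller.

```python
from contextlib import contextmanager

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-domain-manage"


@contextmanager
def ai_mode(run):
    """Enable AI-mode on entry and always disable it on exit.

    `run` is a hypothetical callable that executes one argv list.
    """
    try:
        run(["aliyun", "configure", "ai-mode", "enable"])
        run(["aliyun", "configure", "ai-mode", "set-user-agent",
             "--user-agent", USER_AGENT])
        run(["aliyun", "configure", "ai-mode", "status"])
        yield
    finally:
        # Runs on success, failure, and cancellation alike.
        run(["aliyun", "configure", "ai-mode", "disable"])
```

Wrapping the whole workflow in `with ai_mode(run): ...` means an exception raised by any domain query still triggers the disable command.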
**[MUST] Verify BEFORE running every domain API command:**
- I am NOT reading or echoing any AK/SK values
- My command uses `domain` (lowercase) as product code
- My command uses kebab-case for action and parameters
- My command includes `--api-version 2018-01-29`
- My command includes `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage`
- My command does NOT include `--region-id` (domain is a global service)
- AI-mode status has been verified as enabled in this session (Step 6 above)
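Most of the checklist above can be verified mechanically before a command is run. The following is a rough sketch, assuming commands are represented as argument lists; `check_domain_command` is a hypothetical helper, and it does not cover the credential or AI-mode items, which depend on session state.

```python
import re

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-domain-manage"
API_VERSION = "2018-01-29"
KEBAB_RE = re.compile(r"^[a-z]+(-[a-z]+)*$")


def _flag_value(argv, flag):
    """Return the value following flag, or None if the flag is absent."""
    if flag in argv:
        index = argv.index(flag)
        if index + 1 < len(argv):
            return argv[index + 1]
    return None


def check_domain_command(argv):
    """Return a list of checklist violations for one `aliyun domain` argv."""
    if len(argv) < 3 or argv[0] != "aliyun":
        return ["not an aliyun command"]
    problems = []
    if argv[1] != "domain":
        problems.append("product code must be lowercase 'domain'")
    if not KEBAB_RE.match(argv[2]):
        problems.append("action must be kebab-case")
    flags = [arg for arg in argv[3:] if arg.startswith("--")]
    if any(flag != flag.lower() for flag in flags):
        problems.append("parameters must be kebab-case")
    if _flag_value(argv, "--api-version") != API_VERSION:
        problems.append("missing --api-version 2018-01-29")
    if _flag_value(argv, "--user-agent") != USER_AGENT:
        problems.append("missing or wrong --user-agent")
    if "--region-id" in argv:
        problems.append("--region-id must not be passed")
    return problems
```

An empty list means the command passed every check that this sketch covers.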
## CLI Command Standards
> **[MUST]** Read `references/related-commands.md` before every CLI call for exact syntax and parameter details.
| Rule | Correct | Incorrect |
|------|---------|-----------|
| Product code | `domain` | `Domain` |
| Action format | `query-domain-list` | `QueryDomainList` |
| Parameter format | `--domain-name` | `--DomainName` |
| User-Agent | `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage` | Omitted |
| Region | No `--region-id` | `--region-id cn-hangzhou` |
| Array params | `.1` `.2` suffix | JSON array |
| API version | `--api-version 2018-01-29` | Omitted or wrong version |
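When starting from API documentation that lists PascalCase names, the kebab-case form in the table can be derived mechanically. A small sketch with hypothetical helper names follows; it assumes simple PascalCase without runs of capitals, so a name containing an acronym such as `ID` would need special handling.

```python
import re


def to_kebab(name: str) -> str:
    """Convert a PascalCase API name to kebab-case."""
    # Insert a hyphen before every capital letter except the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


def to_flag(parameter: str) -> str:
    """Convert a PascalCase parameter name to a CLI flag."""
    return "--" + to_kebab(parameter)
```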
## Required Permissions
See `references/ram-policies.md` for full policy. Key permissions:
| Category | RAM Actions |
|----------|-----------|
| Query | `QueryDomainList`, `QueryAdvancedDomainList`, `QueryDomainByDomainName`, `QueryDomainByInstanceId` |
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
## Forbidden Actions
> **CRITICAL: Never do these:**
> 1. **NEVER** read/echo/print AK/SK values (e.g., `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` is FORBIDDEN)
> 2. **NEVER** ask the user to input AK/SK directly in conversation
> 3. **NEVER** use `aliyun configure set` with literal credential values
> 4. **NEVER** execute any `aliyun domain` command without `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage`
> 5. **NEVER** pass `--region-id` — domain API is a global service
> 6. **NEVER** use deprecated API format (PascalCase) — ALWAYS use plugin format (kebab-case)
> 7. **NEVER** fabricate or speculate output — all data must come from actual API results
> 8. **NEVER** perform write operations (renew, redeem, lock, modify) — this is a read-only skill
## Parameter Confirmation
| Risk Level | Operations | Confirmation |
|-----------|-----------|-------------|
| None | All query operations in this skill | No confirmation needed |
> All operations in this skill are **read-only**. No user confirmation is required before execution.
## Core Workflow
### Scenario 1: Query Domain Details by Domain Name
```
User: "查一下 example.com 的信息" / "Show info for example.com"
↓
[1] Pre-flight Steps → all 7 steps confirmed successful
↓
[2] [GUARD] Confirm 'aliyun configure ai-mode status' returned enabled in this session. If not, go back and enable AI-mode first.
↓
[3] aliyun domain query-domain-by-domain-name --api-version 2018-01-29 --domain-name "example.com" --user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
↓
[4] Format and display key fields (see references/related-commands.md § Display Format)
↓
[5] Output Validation: verify all displayed fields come from actual API response
```
### Scenario 2: Query Domain Details by Instance ID
```
User: "查一下实例ID S20241234567890 对应的域名" / "Look up the domain for instance ID S20241234567890"
↓
[1] Pre-flight Steps → all 7 steps confirmed successful
↓
[2] [GUARD] Confirm 'aliyun configure ai-mode status' returned enabled in this session. If not, go back and enable AI-mode first.
↓
[3] aliyun domain query-domain-by-instance-id --api-version 2018-01-29 --instance-id "S20241234567890" --user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
↓
[4] Format and display (same fields as Scenario 1)
↓
[5] Output Validation: verify all displayed fields come from actual API response
```
### Scenario 3: Query Domain List
```
User: "查看我所有的域名" / "Show all my domains"
↓
[1] Pre-flight Steps → all 7 steps confirmed successful
↓
[2] [GUARD] Confirm 'aliyun configure ai-mode status' returned enabled in this session. If not, go back and enable AI-mode first.
↓
[3] aliyun domain query-domain-list --api-version 2018-01-29 --page-num 1 --page-size 20 --user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
↓
[4] Display domain list with pagination info (TotalItemNum, CurrentPageNum, TotalPageNum)
↓
[5] If TotalPageNum > CurrentPageNum, inform user about remaining pages and offer to query next page
↓
[6] Output Validation: displayed count matches TotalItemNum from API response
```
> Optional filters and sort parameters: see `references/related-commands.md § query-domain-list`
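The pagination handling in Scenario 3 can be sketched as a lazy generator. Here `fetch_page` is a hypothetical callable that runs `query-domain-list` for one page and returns the parsed response as a dict. Only the `CurrentPageNum` and `TotalPageNum` fields named above are relied on.

```python
def iter_pages(fetch_page, page_size=20):
    """Yield each page's response dict until TotalPageNum is reached."""
    page_num = 1
    while True:
        response = fetch_page(page_num, page_size)
        yield response
        if response["CurrentPageNum"] >= response["TotalPageNum"]:
            break
        page_num = response["CurrentPageNum"] + 1
```

Because the generator is lazy, the caller can stop after the first page and fetch the next one only when the user asks for it, which matches step [5] of the scenario.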
### Scenario 4: Advanced Domain Search
```
User: "查看即将过期的域名" ("Show domains that are about to expire") / "查看所有正常状态的域名" ("Show all domains in normal status")
↓
[1] Pre-flight Steps → all 7 steps confirmed successful
↓
[2] Parse user intent → map to query-advanced-domain-list parameters
(see references/related-commands.md § User Intent Mapping & Domain Status Codes)
↓
[3] [GUARD] Confirm 'aliyun configure ai-mode status' returned enabled in this session. If not, go back and enable AI-mode first.
↓
[4] aliyun domain query-advanced-domain-list --api-version 2018-01-29 --page-num 1 --page-size 20 [filters] --user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
↓
[5] Display results with pagination handling (same as Scenario 3)
↓
[6] Output Validation: displayed count matches TotalItemNum, filter conditions reflected in output
```
## Best Practices
1. **`--user-agent` on every call** — All `aliyun domain` commands MUST include `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage`.
2. **Read `references/related-commands.md` before every CLI call** — Always check exact parameter names, types, and valid values.
3. **Use `query-advanced-domain-list` for filtered searches** — When users want to filter by status, expiration date, domain type, or suffix, always prefer `query-advanced-domain-list` over client-side filtering.
4. **Pagination awareness** — Always check `TotalPageNum` vs `CurrentPageNum`. Proactively inform users about remaining pages.
5. **Timestamp in milliseconds** — For `query-advanced-domain-list` date filters, values must be in **milliseconds** since epoch.
6. **Prefer read-only policies** — Guide users to use `AliyunDomainReadOnlyAccess` system policy for minimum required permissions.
7. **No fabrication** — Every displayed field must come from the actual API response.
8. **Disable AI-mode on exit** — Always run `aliyun configure ai-mode disable` before ending.
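For the millisecond timestamps mentioned in item 5, integer arithmetic avoids floating-point rounding. This is a minimal sketch with hypothetical helper names. The exact flag names for the date range are not shown here, so check `references/related-commands.md` before passing these values.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def expiring_within(days: int, now: datetime):
    """Return (start_ms, end_ms) covering the next `days` days from `now`."""
    return to_epoch_ms(now), to_epoch_ms(now + timedelta(days=days))
```

Passing seconds instead of milliseconds is an easy mistake to make, and it would place the filter range in early 1970.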
## Error Handling
| Error | Cause | Resolution |
|-------|-------|-----------|
| `Forbidden.RAM` | Insufficient permissions | See `references/ram-policies.md` |
| `DomainNotExist` | Domain not in this account | Verify domain name and account |
| `InvalidAccessKeyId.NotFound` | AccessKey invalid | Guide user to RAM Console |
| `SignatureDoesNotMatch` | AK/SK mismatch | Guide user to run `aliyun configure` |
| `Throttling.User` | Rate limit exceeded | Wait 1s, retry max 3 times |
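The `Throttling.User` row (wait 1s, retry at most 3 times) can be sketched as a small retry wrapper. `ThrottlingError` is a hypothetical exception that the caller would raise after detecting that error code in the CLI output. The `sleep` parameter is injectable so the logic can be exercised without real delays.

```python
import time


class ThrottlingError(Exception):
    """Hypothetical error raised when the API reports Throttling.User."""


def call_with_retry(call, max_retries=3, delay_seconds=1.0, sleep=time.sleep):
    """Run call(), retrying up to max_retries times on throttling."""
    attempt = 0
    while True:
        try:
            return call()
        except ThrottlingError:
            if attempt >= max_retries:
                raise  # Retries exhausted; surface the error to the user.
            attempt += 1
            sleep(delay_seconds)
```

Other errors in the table, such as `Forbidden.RAM`, are deliberately not retried, since waiting does not resolve them.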
## Limitations
> **This skill can NOT:**
> 1. Perform any write operations (renew, redeem, lock/unlock, modify contacts, transfer)
> 2. Register or purchase new domains
> 3. Manage DNS records or DNSSEC
> 4. Create or manage domain info templates
> 5. Query task execution history or audit logs
>
> For these capabilities, see **Cross-Skill Guidance** below.
## Cross-Skill Guidance
| User Need | Suggested Skill |
|----------|----------------|
| Register new domain | `alibabacloud-domain-trade` |
| Transfer-in domain | `alibabacloud-domain-trade` |
| Create/manage info templates | `alibabacloud-domain-certification` |
| Manage DNS/DNSSEC | `alibabacloud-domain-dns` |
| View task history | `alibabacloud-domain-audit` |
> When a user's request goes beyond this skill's query capabilities, guide them to the appropriate skill.
## Cleanup
This skill performs read-only operations and does not create any resources. No cleanup is needed.
## Reference Links
| Document | Description |
|----------|-------------|
| [Related Commands](references/related-commands.md) | CLI commands, parameters, response fields, display format |
| [RAM Policies](references/ram-policies.md) | Required permissions and policy template |
| [CLI Installation Guide](references/cli-installation-guide.md) | CLI installation and configuration |
| [Credential Check](references/credential-check.md) | Credential verification steps |
| [Verification Method](references/verification-method.md) | Success verification for each scenario |
| [Acceptance Criteria](references/acceptance-criteria.md) | Testing and validation checklist |
## Notes
1. All operations in this skill are read-only and synchronous. No async task polling is needed.
2. `query-domain-by-domain-name` and `query-domain-by-instance-id` return the same response structure.
3. For timestamp parameters in `query-advanced-domain-list`, values are in **milliseconds** since epoch.
FILE:references/acceptance-criteria.md
# Acceptance Criteria — Domain Query
**Scenario**: Domain information query operations (read-only)
**Purpose**: Skill testing acceptance criteria
---
## Negative Examples
| ❌ WRONG | ✅ CORRECT | Rule |
|----------|------------|------|
| `aliyun domain QueryDomainList` | `aliyun domain query-domain-list` | kebab-case actions |
| `aliyun Domain query-domain-list` | `aliyun domain query-domain-list` | lowercase product code |
| `--DomainName "example.com"` | `--domain-name "example.com"` | kebab-case parameters |
| `--PageNum 1` | `--page-num 1` | kebab-case parameters |
| `--InstanceId "S2024..."` | `--instance-id "S2024..."` | kebab-case parameters |
| `--region-id cn-hangzhou` | No `--region-id` | global service |
| Missing `--user-agent` | `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage` | mandatory user-agent |
| `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` | `aliyun configure list` | never expose credentials |
| Fabricated domain data in output | Only display actual API response data | no fabrication |
---
## Pre-flight Validation Checklist
Before executing CLI commands, verify:
- [ ] CLI version >= 3.3.3 (`aliyun version`)
- [ ] Credentials valid (`aliyun configure list`)
- [ ] Auto plugin install enabled (`aliyun configure set --auto-plugin-install true`)
- [ ] Plugins up-to-date (`aliyun plugin update`)
- [ ] AI-mode enabled (`aliyun configure ai-mode enable`)
- [ ] All required parameters are provided
- [ ] `--page-num` >= 1, `--page-size` between 1 and 200
- [ ] Domain name format is valid (for domain detail queries)
- [ ] Timestamp values are in milliseconds (for advanced query date filters)
---
## CLI Command Correctness
- [ ] All commands use kebab-case actions and parameters
- [ ] All commands include `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage`
- [ ] No `--region-id` is passed (domain is a global service)
- [ ] Product code is `domain` (lowercase)
## Security
- [ ] Query operations do not require user confirmation
- [ ] No AK/SK values are printed or echoed
- [ ] Credential check uses only `aliyun configure list`
## Output Validation Checklist
> **[MUST] Verify after every API call before displaying results:**
- [ ] All displayed information comes from actual API results (no fabrication)
- [ ] Displayed count matches `TotalItemNum` from API response
- [ ] Pagination info is shown when `TotalPageNum > CurrentPageNum`
- [ ] Dates are displayed in user-friendly format (e.g., `2025-01-01`)
- [ ] No truncated or incomplete data is presented without re-querying
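
The pagination rule above (show pagination info when `TotalPageNum > CurrentPageNum`) could be sketched as below. The function name `pagination_note` and the message wording are illustrative assumptions; the three arguments are expected to be copied from the `CurrentPageNum`, `TotalPageNum`, and `TotalItemNum` fields of the actual API response.

```bash
#!/usr/bin/env bash

# Prints a pagination hint only when more pages remain.
# $1 = CurrentPageNum, $2 = TotalPageNum, $3 = TotalItemNum (all taken from the API response).
pagination_note() {
  local current=$1 total_pages=$2 total_items=$3
  if (( total_pages > current )); then
    echo "Showing page ${current} of ${total_pages} (${total_items} domains in total); more pages are available."
  fi
}

# Example values only; real values must come from the response.
pagination_note 1 3 45
```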
---
## Functional Requirements — Domain Query
- [ ] Can query domain list with `query-domain-list`
- [ ] Supports pagination with `--page-num` and `--page-size`
- [ ] Supports fuzzy search with `--domain-name` parameter
- [ ] Can query domain details with `query-domain-by-domain-name`
- [ ] Can query by instance ID with `query-domain-by-instance-id`
- [ ] Can use advanced search with `query-advanced-domain-list`
- [ ] Supports domain status filter (`--domain-status`)
- [ ] Supports expiration date range filter
## AI-Mode Lifecycle
- [ ] AI-mode enabled before first CLI invocation
- [ ] AI-mode user-agent set correctly
- [ ] AI-mode disabled at every exit point (success, failure, cancellation)
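
One way to guarantee the "disabled at every exit point" item is a shell `EXIT` trap. The sketch below only illustrates the technique: `disable_ai_mode` is a placeholder that echoes a message, because the exact disable command is not shown in this document and should be taken from the CLI's own help; `run_with_ai_mode_cleanup` is likewise a made-up helper name.

```bash
#!/usr/bin/env bash

# Placeholder cleanup step: replace the echo with the CLI's real AI-mode disable command.
disable_ai_mode() {
  echo "ai-mode disabled"
}

# Runs the given command in a subshell whose EXIT trap always performs the cleanup,
# whether the command succeeds, fails, or the subshell is terminated early.
# The command's own exit status is passed back to the caller.
run_with_ai_mode_cleanup() {
  (
    trap disable_ai_mode EXIT
    "$@"
  )
}

run_with_ai_mode_cleanup echo "query ok"
run_with_ai_mode_cleanup false || echo "query failed, cleanup still ran"
```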
---
## Testing Checklist
| # | Test Case | Expected Flow |
|---|-----------|--------------|
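| 1 | "Show the details of example.com" (original: "查看 example.com 的详细信息") | `query-domain-by-domain-name` |
| 2 | "Show all my domains" (original: "查看我所有的域名") | `query-domain-list --page-num 1 --page-size 20` |
| 3 | "Show domains that are about to expire" (original: "查看即将过期的域名") | `query-advanced-domain-list` with expiration date filters |
| 4 | "Show all .com domains" (original: "查看所有 .com 域名") | `query-advanced-domain-list --suffix-name ".com"` |
| 5 | "Look up instance S20241234567890" (original: "查一下实例 S20241234567890") | `query-domain-by-instance-id --instance-id "S20241234567890"` |
| 6 | "Show all domains in normal status" (original: "查看所有正常状态的域名") | `query-advanced-domain-list --domain-status 1` |
| 7 | "Show expired domains" (original: "查看已过期的域名") | `query-advanced-domain-list --domain-status 2` |
| 8 | "Search for domains containing test" (original: "搜索包含 test 的域名") | `query-domain-list --domain-name "test"` |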
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
## Version Requirement
> **Aliyun CLI >= 3.3.3 is REQUIRED.**
Check current version:
```bash
aliyun version
```
If CLI is not installed or version is below 3.3.3, follow the installation steps below.
## Installation
### Quick Install (All Platforms)
```bash
curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash
aliyun version
```
### macOS
```bash
# Using Homebrew (Recommended)
brew install aliyun-cli
brew upgrade aliyun-cli
# Or download binary
curl -sSL https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz | tar xz
sudo mv aliyun /usr/local/bin/
```
### Linux
```bash
# x86_64
curl -sSL https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz | tar xz
sudo mv aliyun /usr/local/bin/
# ARM64
curl -sSL https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz | tar xz
sudo mv aliyun /usr/local/bin/
```
### Windows
Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
Unzip and add `aliyun.exe` to your system PATH.
## Post-Installation Configuration
### Enable Auto Plugin Installation
**[MUST]** Run this command after installation:
```bash
aliyun configure set --auto-plugin-install true
```
### Update Plugins
**[MUST]** Ensure plugins are up-to-date:
```bash
aliyun plugin update
```
### Configure Credentials
1. Obtain AccessKey from: https://ram.console.aliyun.com/manage/ak
2. Configure credentials in terminal:
```bash
aliyun configure
```
3. Or use environment variables:
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=<your-ak>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<your-sk>
```
> **SECURITY**: NEVER ask users to input AK/SK directly in the chat conversation.
## Domain API Special Notes
- **No Region Required**: Domain API is a **global service**. Do NOT pass `--region-id` parameter.
- **Product Code**: Always use `domain` (lowercase) as the product code.
- **User-Agent**: All commands MUST include `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage`.
## Verify Installation
```bash
# Check version
aliyun version
# Verify domain plugin works (requires valid credentials)
aliyun domain query-domain-list \
--api-version 2018-01-29 \
--page-num 1 \
--page-size 1 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
## Troubleshooting
| Issue | Solution |
|-------|----------|
| `command not found: aliyun` | CLI not in PATH; reinstall or add to PATH |
| `Error: plugin domain not found` | Run `aliyun configure set --auto-plugin-install true` then retry |
| `SDK.ServerError: InvalidVersion` | CLI version too old; upgrade to >= 3.3.3 |
| `Error: No credential found` | Run `aliyun configure` to set up credentials |
## Reference
- Official CLI Documentation: https://help.aliyun.com/zh/cli/
- CLI GitHub Repository: https://github.com/aliyun/aliyun-cli
FILE:references/credential-check.md
# Credential Check Guide
## Security Rules
- **NEVER** read, echo, or print AK/SK values (e.g., `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` is FORBIDDEN)
- **NEVER** ask the user to input AK/SK directly in the conversation or command line
- **NEVER** use `aliyun configure set` with literal credential values
- **ONLY** use `aliyun configure list` to check credential status
## Pre-flight Check Steps
### Step 1: Verify CLI Installed
```bash
aliyun version
```
If not installed or version < 3.3.3, install via `curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash`.
### Step 2: Check Credential Profile
```bash
aliyun configure list
```
Check the output for a valid profile with configured credentials (AK, STS, or OAuth identity).
**If no valid profile exists, STOP here.** Guide the user:
1. Obtain credentials from https://ram.console.aliyun.com/manage/ak
2. Configure credentials **outside of this session** via `aliyun configure` in terminal
3. Return and re-run after `aliyun configure list` shows a valid profile
### Step 3: Verify Credentials Work
Execute a lightweight test command:
```bash
aliyun domain query-domain-list \
--api-version 2018-01-29 \
--page-num 1 \
--page-size 1 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
- **Success**: Returns domain list (even if empty) — credentials are valid
- **Error `InvalidAccessKeyId.NotFound`**: AccessKey is invalid; guide user to recreate
- **Error `SignatureDoesNotMatch`**: AccessKey/Secret mismatch; guide user to reconfigure
- **Error `Forbidden.RAM`**: Credentials valid but insufficient permissions; guide user to attach `AliyunDomainReadOnlyAccess` system policy or create a custom policy with the required domain query permissions
## Error Handling
| Error Code | Meaning | Action |
|-----------|---------|--------|
| `InvalidAccessKeyId.NotFound` | AK does not exist | Guide user to https://ram.console.aliyun.com/manage/ak |
| `SignatureDoesNotMatch` | AK/SK mismatch | Guide user to run `aliyun configure` and re-enter credentials |
| `Forbidden.RAM` | Insufficient permissions | Guide user to attach `AliyunDomainReadOnlyAccess` system policy or create a custom policy with required domain query permissions |
| `IncompleteSignature` | Malformed request | Check CLI version, upgrade if needed |
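
The error table above maps naturally onto a `case` statement. This is a small sketch under stated assumptions: the function name `credential_error_action` is invented for illustration, the error code is assumed to have already been extracted from the CLI's error output, and the fallback branch for unknown codes is an addition that the table does not define.

```bash
#!/usr/bin/env bash

# Maps an error code from the table above to the guidance given to the user.
credential_error_action() {
  case "$1" in
    InvalidAccessKeyId.NotFound)
      echo "AK does not exist: create or look up an AccessKey at https://ram.console.aliyun.com/manage/ak" ;;
    SignatureDoesNotMatch)
      echo "AK/SK mismatch: run 'aliyun configure' in a terminal and re-enter the credentials" ;;
    Forbidden.RAM)
      echo "Insufficient permissions: attach AliyunDomainReadOnlyAccess or a custom policy with the domain query permissions" ;;
    IncompleteSignature)
      echo "Malformed request: check the CLI version and upgrade if needed" ;;
    *)
      # Not covered by the table; surface the raw code rather than guessing.
      echo "Unrecognized error code '$1': show the original error to the user" ;;
  esac
}

credential_error_action "Forbidden.RAM"
```

Keeping the mapping in one function means the guidance text stays in sync with the table and never touches AK/SK values.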
## Output Format
After credential verification, report status to user:
```
Credential Status:
- CLI Version: {version}
- Profile: {profile_name}
- Identity: {account_type}
- Domain API Access: {OK / Failed - reason}
```
FILE:references/ram-policies.md
# RAM Permission Policies — Domain Query
## Required Permissions
| # | RAM Action | Description | Type |
|---|-----------|-------------|------|
| 1 | `domain:QueryDomainList` | Query domain list | Read |
| 2 | `domain:QueryAdvancedDomainList` | Advanced domain list query | Read |
| 3 | `domain:QueryDomainByDomainName` | Query domain by name | Read |
| 4 | `domain:QueryDomainByInstanceId` | Query domain by instance ID | Read |
## Minimum Required Policy
```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "domain:QueryDomainList",
        "domain:QueryAdvancedDomainList",
        "domain:QueryDomainByDomainName",
        "domain:QueryDomainByInstanceId"
      ],
      "Resource": "*"
    }
  ]
}
```
## How to Attach Policy
### Via RAM Console
1. Go to https://ram.console.aliyun.com/policies → Create Policy → JSON
2. Name: `AliyunDomainQueryAccess`
3. Paste the JSON policy above
4. Attach to target user/role
### Via CLI
```bash
aliyun ram create-policy \
--policy-name AliyunDomainQueryAccess \
--policy-document '{"Version":"1","Statement":[{"Effect":"Allow","Action":["domain:QueryDomainList","domain:QueryAdvancedDomainList","domain:QueryDomainByDomainName","domain:QueryDomainByInstanceId"],"Resource":"*"}]}' \
--description "Read-only access for domain query operations"
aliyun ram attach-policy-to-user \
--policy-type Custom \
--policy-name AliyunDomainQueryAccess \
--user-name <username>
```
## Alternative: Use System Policy
If the user already has `AliyunDomainFullAccess` or `AliyunDomainReadOnlyAccess` attached, no additional custom policy is needed.
## Permission Verification
```bash
aliyun domain query-domain-list \
--api-version 2018-01-29 \
--page-num 1 \
--page-size 1 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
If the command returns successfully (even with an empty list), permissions are correctly configured.
FILE:references/related-commands.md
# Related Commands — Domain Manage
## CLI Command Standards
| Rule | Correct | Incorrect |
|------|---------|-----------|
| Product code | `domain` | `Domain` |
| Action format | kebab-case: `query-domain-list` | `QueryDomainList` |
| Parameter format | kebab-case: `--domain-name` | `--DomainName` |
| User-Agent | Always `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage` | Omitted |
| Region | No `--region-id` (global service) | `--region-id cn-hangzhou` |
| Array params | `.1` `.2` suffix | JSON array |
| API version | `--api-version 2018-01-29` | Omitted or wrong version |
## Domain-specific Notes
| Item | Description |
|------|-------------|
| Region | Domain API is a **global service**, do NOT pass `--region-id` |
| Pagination | List queries use `--page-num` and `--page-size` |
| Array params | Batch domain names use `.1` `.2` suffix indexing |
| JSON parsing | Use jq to parse return values |
| API version | All domain API commands require `--api-version 2018-01-29` |
---
## Domain Query APIs
### query-domain-list
**Description**: Query domain list with pagination support and optional fuzzy search.
**Type**: Synchronous | **Risk**: Read-only
| CLI Command | API Action | Documentation |
|-------------|------------|---------------|
| `aliyun domain query-domain-list` | QueryDomainList | [Doc](https://help.aliyun.com/zh/dws/developer-reference/api-domain-2018-01-29-querydomainlist) |
```bash
aliyun domain query-domain-list \
--api-version 2018-01-29 \
--page-num 1 \
--page-size 20 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| --page-num | Integer | Yes | Page number (starting from 1) |
| --page-size | Integer | Yes | Items per page (max 200) |
| --order-by-type | String | No | Sort order: `ASC` or `DESC` |
| --order-key-type | String | No | Sort key: `RegistrationDate` or `ExpirationDate` |
| --group-id | Long | No | Domain group ID filter |
| --domain-name | String | No | Fuzzy domain name filter (partial match) |
**Response**: `Data[].DomainName`, `Data[].InstanceId`, `Data[].ExpirationDate`, `Data[].RegistrationDate`, `Data[].DomainStatus`, `TotalItemNum`, `PageSize`, `CurrentPageNum`, `TotalPageNum`
---
### query-advanced-domain-list
**Description**: Advanced domain list search with multiple filters.
**Type**: Synchronous | **Risk**: Read-only
| CLI Command | API Action | Documentation |
|-------------|------------|---------------|
| `aliyun domain query-advanced-domain-list` | QueryAdvancedDomainList | [Doc](https://help.aliyun.com/zh/dws/developer-reference/api-domain-2018-01-29-queryadvanceddomainlist) |
```bash
aliyun domain query-advanced-domain-list \
--api-version 2018-01-29 \
--page-num 1 \
--page-size 20 \
--domain-status 1 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| --page-num | Integer | Yes | Page number (starting from 1) |
| --page-size | Integer | Yes | Items per page (max 200) |
| --domain-status | Integer | No | Domain status filter (see status codes below) |
| --start-expiration-date | Long | No | Expiration date range start (ms since epoch) |
| --end-expiration-date | Long | No | Expiration date range end (ms since epoch) |
| --start-registration-date | Long | No | Registration date range start (ms since epoch) |
| --end-registration-date | Long | No | Registration date range end (ms since epoch) |
| --domain-name-sort | Boolean | No | Sort by domain name alphabetically |
| --expiration-date-sort | Boolean | No | Sort by expiration date |
| --registration-date-sort | Boolean | No | Sort by registration date |
| --product-domain-type | String | No | Domain type: `gTLD`, `ccTLD`, `New gTLD` |
| --domain-group-id | Long | No | Domain group ID filter |
| --key-word | String | No | Keyword filter for domain name |
| --suffix-name | String | No | Domain suffix filter (e.g., `.com`, `.cn`) |
#### Domain Status Codes
| Code | Status | Description |
|------|--------|-------------|
| 0 | All | All domains (no filter) |
| 1 | Normal | Active and functioning normally |
| 2 | Expired | Past expiration date |
| 3 | Grace Period | In renewal grace period after expiration |
| 4 | Redemption Period | In redemption period (higher cost to recover) |
| 5 | Pending Delete | About to be released/deleted |
#### User Intent Mapping
| User Intent | Parameters |
|-------------|-----------|
| Normal/active domains | `--domain-status 1` |
| Expired domains | `--domain-status 2` |
| Grace period domains | `--domain-status 3` |
| Expiring within N days | `--end-expiration-date <timestamp>` (current time + N days in ms) |
| Filter by domain type | `--product-domain-type gTLD/ccTLD/New gTLD` |
| Filter by suffix | `--suffix-name ".com"` |
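
For the "Expiring within N days" row, the millisecond timestamp can be computed in the shell. The sketch below assumes `date +%s` is available; `days_from_now_ms` is a made-up helper name, and the extra `--start-expiration-date` bound (set to the current time so that already expired domains are left out) is an optional choice on top of the mapping table. The command is printed rather than executed so the example has no side effects.

```bash
#!/usr/bin/env bash

# Hypothetical helper: epoch milliseconds for "now + N days".
# $1 = number of days, $2 = optional "now" in epoch seconds (defaults to the current time).
days_from_now_ms() {
  local days=$1
  local now_s=${2:-$(date +%s)}
  echo $(( (now_s + days * 86400) * 1000 ))
}

start_ms=$(days_from_now_ms 0)    # now
end_ms=$(days_from_now_ms 30)     # now + 30 days

# Print the resulting command instead of running it.
printf '%s\n' "aliyun domain query-advanced-domain-list --api-version 2018-01-29 --page-num 1 --page-size 20 --start-expiration-date ${start_ms} --end-expiration-date ${end_ms} --user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage"
```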
---
### query-domain-by-domain-name
**Description**: Query complete domain details by domain name.
**Type**: Synchronous | **Risk**: Read-only
| CLI Command | API Action | Documentation |
|-------------|------------|---------------|
| `aliyun domain query-domain-by-domain-name` | QueryDomainByDomainName | [Doc](https://help.aliyun.com/zh/dws/developer-reference/api-domain-2018-01-29-querydomainbydomainname) |
```bash
aliyun domain query-domain-by-domain-name \
--api-version 2018-01-29 \
--domain-name "example.com" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| --domain-name | String | Yes | The exact domain name to query |
**Response Key Fields**:
| Field | Type | Description |
|-------|------|-------------|
| DomainName | String | Domain name |
| InstanceId | String | Instance ID |
| RegistrationDate | String | Registration date |
| ExpirationDate | String | Expiration date |
| DomainStatus | String | Domain status |
| DnsList.Dns[] | Array | DNS server list |
| RealNameStatus | String | Real-name verification status |
| RegistrantType | String | Registrant type (individual/enterprise) |
| TransferOutStatus | String | Transfer-out status |
| TransferProhibitionLock | String | Transfer prohibition lock status |
| UpdateProhibitionLock | String | Update prohibition lock status |
| Premium | Boolean | Whether it is a premium domain |
| DomainGroupName | String | Domain group name |
| RegistrantName | String | Registrant name |
| Email | String | Registrant email |
#### Display Format
```
Domain Details:
- Domain Name: example.com
- Instance ID: S20241234567890
- Registration Date: 2020-01-01
- Expiration Date: 2025-01-01
- Domain Status: Normal (1)
- DNS Servers: ns1.alidns.com, ns2.alidns.com
- Real-name Status: Verified
- Registrant Type: Enterprise
- Update Lock: Enabled
- Transfer Lock: Enabled
- Transfer-out: Not in progress
- Premium Domain: No
- Domain Group: Default
```
---
### query-domain-by-instance-id
**Description**: Query domain details by instance ID. Returns the same response structure as `query-domain-by-domain-name`.
**Type**: Synchronous | **Risk**: Read-only
| CLI Command | API Action | Documentation |
|-------------|------------|---------------|
| `aliyun domain query-domain-by-instance-id` | QueryDomainByInstanceId | [Doc](https://help.aliyun.com/zh/dws/developer-reference/api-domain-2018-01-29-querydomainbyinstanceid) |
```bash
aliyun domain query-domain-by-instance-id \
--api-version 2018-01-29 \
--instance-id "S20241234567890" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| --instance-id | String | Yes | The domain instance ID to query |
---
## Information Display Standards
> **[MUST] All information displayed to the user must comply with:**
>
> 1. **No fabricated output**: All displayed information must come from actual API query results
> 2. **Truncation handling**: If API response is truncated, must re-query completely before displaying
> 3. **Count validation**: Displayed count must match `TotalItemNum`/actual count returned by API
> 4. **Pagination handling**: When `TotalItemNum` exceeds `PageSize`, inform the user about remaining pages
> 5. **Date formatting**: Display dates in a user-friendly format (e.g., `2025-01-01`)
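
As an illustration of the date formatting rule, the sketch below converts an epoch-millisecond value (the unit used by the date filters) into `YYYY-MM-DD` in UTC. It assumes the value at hand really is a millisecond timestamp; date fields that the API already returns as strings need no conversion. `ms_to_date` is a hypothetical helper name, and the function tries GNU `date` syntax first, then the BSD/macOS form.

```bash
#!/usr/bin/env bash

# Hypothetical helper: epoch milliseconds -> YYYY-MM-DD (UTC).
ms_to_date() {
  local secs=$(( $1 / 1000 ))
  # GNU date accepts -d "@<seconds>"; BSD/macOS date accepts -r <seconds>.
  date -u -d "@${secs}" +%Y-%m-%d 2>/dev/null || date -u -r "${secs}" +%Y-%m-%d
}

ms_to_date 1735689600000   # 2025-01-01
```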
---
## Reference Documentation
| Document | Description |
|----------|-------------|
| [QueryDomainList API](https://help.aliyun.com/zh/dws/developer-reference/api-domain-2018-01-29-querydomainlist) | Official API documentation |
| [QueryAdvancedDomainList API](https://help.aliyun.com/zh/dws/developer-reference/api-domain-2018-01-29-queryadvanceddomainlist) | Official API documentation |
| [QueryDomainByDomainName API](https://help.aliyun.com/zh/dws/developer-reference/api-domain-2018-01-29-querydomainbydomainname) | Official API documentation |
| [QueryDomainByInstanceId API](https://help.aliyun.com/zh/dws/developer-reference/api-domain-2018-01-29-querydomainbyinstanceid) | Official API documentation |
| [Aliyun CLI Guide](https://help.aliyun.com/zh/cli/) | Official CLI documentation |
FILE:references/verification-method.md
# Success Verification Method — Domain Query
## Scenario 1: Domain Query by Domain Name
**Expected Outcome**: Domain information displayed correctly from API response.
**Verification Command**:
```bash
aliyun domain query-domain-by-domain-name \
--api-version 2018-01-29 \
--domain-name "<domain-name>" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
**Success Indicator**: Returns JSON with `DomainName`, `ExpirationDate`, `DomainStatus` fields populated.
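
The success indicator above can be checked with `jq`. This is a sketch under assumptions: `has_required_fields` is an invented helper name, the sample payload is made up purely to exercise the check (it is not real API output), and in practice the function would read the JSON piped from the verification command.

```bash
#!/usr/bin/env bash

# Succeeds when the JSON on stdin has non-empty DomainName, ExpirationDate and DomainStatus.
# jq -e exits non-zero when the final result is false or null.
has_required_fields() {
  jq -e '[.DomainName, .ExpirationDate, .DomainStatus] | all(. != null and . != "")' > /dev/null
}

# Made-up sample payload, used only to exercise the check.
sample='{"DomainName":"example.com","ExpirationDate":"2025-01-01 00:00:00","DomainStatus":"3"}'

if printf '%s' "$sample" | has_required_fields; then
  echo "required fields are populated"
else
  echo "required fields are missing (or jq is not installed)"
fi
```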
---
## Scenario 2: Domain Query by Instance ID
**Expected Outcome**: Domain information retrieved by instance ID.
**Verification Command**:
```bash
aliyun domain query-domain-by-instance-id \
--api-version 2018-01-29 \
--instance-id "<instance-id>" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
**Success Indicator**: Returns JSON with same structure as Scenario 1.
---
## Scenario 3: Domain List Query
**Expected Outcome**: Domain list retrieved with correct pagination.
**Verification Command**:
```bash
aliyun domain query-domain-list \
--api-version 2018-01-29 \
--page-num 1 \
--page-size 5 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
**Success Indicator**: Returns JSON with `TotalItemNum >= 0` and `Data` array.
---
## Scenario 4: Advanced Domain Search
**Expected Outcome**: Filtered domain list retrieved matching search criteria.
**Verification Command**:
```bash
aliyun domain query-advanced-domain-list \
--api-version 2018-01-29 \
--page-num 1 \
--page-size 5 \
--domain-status 1 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-domain-manage
```
**Success Indicator**: Returns JSON with `TotalItemNum >= 0` and `Data` array containing only domains matching the filter.
Deploy AI models as PAI-EAS inference services. Supports LLMs (Qwen, Llama), image gen (SD, SDXL), speech synthesis, and more. When to use: deploy models, cr...
---
name: alibabacloud-pai-eas-service-deploy
description: |
Deploy AI models as PAI-EAS inference services. Supports LLMs (Qwen, Llama), image gen (SD, SDXL), speech synthesis, and more.
When to use: deploy models, create inference services, EAS deployment, model serving, deploy vLLM/SGLang/ComfyUI.
license: Apache-2.0
metadata:
version: "1.0.0"
domain: aiops
owner: pai-eas-team
contact: [email protected]
tags:
- pai-eas
- model-deployment
- inference-service
- llm
- vllm
- sglang
required_tools:
- aliyun
- jq
prerequisites:
- "Aliyun CLI >= 3.3.1"
- "jq command-line JSON processor"
required_permissions:
- "eas:CreateService"
- "eas:DescribeService"
- "eas:ListServices"
- "eas:DescribeMachineSpec"
- "eas:ListResources"
- "eas:ListGateway"
- "eas:DescribeGateway"
- "nlb:ListLoadBalancers"
- "aiworkspace:ListImages"
- "aiworkspace:ListWorkspaces"
- "vpc:DescribeVpcs"
- "vpc:DescribeVSwitches"
- "ecs:DescribeSecurityGroups"
---
# PAI-EAS Service Deployment
## ⚠️ TOP RULES (read first)
**1. 🔴 NO DUPLICATE SERVICE NAMES** 🔴 If a service with the target name already exists: STOP and inform the user. Do NOT delete and recreate. Do NOT reuse it either.
**2. Mandatory API Calls** — Execute ALL of these in order:
| # | API | CLI | Purpose |
|---|-----|-----|---------|
| 1 | ListImages | `aliyun aiworkspace list-images` | Validate image |
| 2 | describe-machine-spec | `aliyun eas describe-machine-spec` | Validate GPU type |
| 3 | create-service | `aliyun eas create-service` | Create service |
| 4 | describe-service | `aliyun eas describe-service` | Check status (once) |
| 5 | describe-service-endpoints | `aliyun eas describe-service-endpoints` | Get endpoints |
Execute #1 and #2 ALWAYS, even if user provided the info. `describe-machine-spec` ≠ `list-resources`. `describe-service` ≠ `ListServices`.
**3. Prohibited** — ❌ Reuse existing service ❌ Write bash scripts (run CLI directly) ❌ CPU+vLLM/SGLang ❌ `file://` in create-service ❌ Skip mandatory APIs ❌ Change the service name the user specified ❌ Poll describe-service in a loop (call once only)
**4. Autonomous Execution** — Do NOT ask user for info discoverable via APIs. Do NOT ask "should I proceed?" Execute directly. Timeout? Retry with `--read-timeout 60`. Error? Inform user and CONTINUE. Missing param? Pick reasonable default. If any pre-check or resource discovery step fails, log the failure and continue to the next step. Only STOP for the specific conditions listed in Self-Verify Checkpoints (duplicate service name, missing NLB/GW/dedicated resource group).
**5. Self-Verify Checkpoints:**
```
Before Step 2: Did Step 1.5 confirm no duplicate service name? If duplicate → STOP, inform user, do NOT proceed.
Before Step 5: Have I run list-images AND describe-machine-spec? If NO → STOP and run them NOW.
Before Step 6: Did Step 1.5 confirm no duplicate service name? If duplicate exists → STOP, inform user, do NOT proceed.
Step 4 resources: If NLB/GW/Resource Group not found → inform user and STOP. Do NOT block or attempt workarounds.
After Step 7: Did I call describe-service once and report the status?
```
**6. Run CLI commands DIRECTLY** — Use `execute_shell_command` to run each aliyun CLI command one at a time. Do NOT write bash scripts, deployment scripts, or shell files. Do NOT use `retry_command()` or any wrapper functions. Each command = one direct execution.
---
## Pre-checks
```bash
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent \
--user-agent "AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy"
aliyun plugin update
aliyun configure set --auto-plugin-install true
```
**⚠️ MANDATORY: Install and update required plugins BEFORE proceeding:**
```bash
aliyun plugin update
aliyun plugin install aiworkspace --force
aliyun plugin install eas --force
```
**Verify plugins are installed:**
```bash
aliyun plugin list 2>/dev/null | grep -E "aiworkspace|eas"
```
If either plugin is missing after install, retry with:
```bash
aliyun plugin install <plugin-name> --force
```
See [CLI Guide](references/cli-installation-guide.md). Never read/echo AK/SK. Timeout? Retry with `--read-timeout 60`.
**⚠️ MANDATORY CHECKPOINT: After pre-checks, you MUST proceed to Step 1 regardless of any warnings during plugin installation. Plugin install failures are non-blocking — the plugins may already be installed. NEVER stop execution at the pre-check stage.**
---
## Step 1: Collect Parameters
| Param | Required | Default |
|-------|----------|---------|
| RegionId | Yes | - |
| Service Name | Yes | - |
| Workspace ID | No | Auto-select |
| Image URI | Yes | - |
| Instance Type | Yes | - |
| Replicas | No | 1 |
| Port | No | 8000 |
| OSS Path | No | - |
**Service name**: lowercase/digits/underscores only. No hyphens. 3-63 chars.
**IMPORTANT**: Use the EXACT service name the user specifies. Do NOT rename. If the user specifies a prefix (e.g. "skill_qwen_开头", meaning "starts with skill_qwen_"), generate a random suffix of 6 digits (e.g. `skill_qwen_482917`).
**Set profile region** — Set the CLI profile region to match the deployment region. This avoids "Region mismatch" errors when `--cluster-id` differs from the profile's default region:
```bash
aliyun configure set --region <region>
```
**Workspace ID**: Required in `metadata.workspace_id`.
If user does not specify a workspace, query available workspaces and pick one:
```bash
aliyun aiworkspace list-workspaces --region <region> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \
jq '.Workspaces[] | select(.Status == "ENABLED") | {WorkspaceId, WorkspaceName}'
```
If multiple workspaces exist, list them and let the user choose. If only one exists, use it directly.
## Step 1.5: Check for Duplicate Service Name
```bash
aliyun eas list-services --region <region> --cluster-id <region> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \
jq '.ServiceList[] | select(.ServiceName == "<name>") | {ServiceName, Status}'
```
**If a service with the same name already exists → STOP and inform the user: "A service named <name> already exists (Status: <status>). Please choose a different name." Do NOT delete or reuse it.**
**If no duplicate → proceed to Step 2.**
## Step 2: ListImages (🚧 BLOCKING GATE — NEVER SKIP)
Execute even if user provided image URI. Purpose = VALIDATION.
**⚠️ If you see "parse error" or "Exit Code 4", the plugin failed to install. You MUST retry with explicit install:**
```bash
aliyun plugin install aiworkspace --force
```
Then retry the list-images command. Do NOT skip this step.
```bash
# The jq filter is wrapped in [...] so the file holds a single JSON array;
# `jq length` and the `.[]` queries in Step 5 expect an array.
aliyun aiworkspace list-images --region <region> --verbose true \
--labels 'system.official=true,system.supported.eas=true' \
--page-size 50 --user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \
jq '[.Images[] | {ImageId, ImageUri, ImageCategory}]' > /tmp/eas_images.json
echo "Saved $(jq length /tmp/eas_images.json) images"
```
**If still fails after plugin reinstall, retry with timeout:**
```bash
aliyun aiworkspace list-images --region <region> --verbose true \
--labels 'system.official=true,system.supported.eas=true' \
--page-size 50 --read-timeout 60 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \
jq '[.Images[] | {ImageId, ImageUri, ImageCategory}]' > /tmp/eas_images.json
echo "Saved $(jq length /tmp/eas_images.json) images"
```
See [Model-Image](references/model-image-matching.md).
**This output is REQUIRED by Step 5. Do NOT skip.**
**❌ NEVER skip this step saying "user already provided the image".**
## Step 3: describe-machine-spec (🚧 BLOCKING GATE — NEVER SKIP)
Execute even if user specified instance type. NOT `list-resources`.
**⚠️ If this command fails, retry with `--read-timeout 60`. NEVER proceed to Step 5 without /tmp/eas_specs.json existing.**
```bash
# Wrapped in [...] for the same reason: Step 5 iterates the file with `.[]`.
aliyun eas describe-machine-spec --region <region> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \
jq '[.InstanceTypes[] | select(.GPU > 0) | {InstanceType, CPU, GPU, Memory}]' \
> /tmp/eas_specs.json
echo "Available GPU specs:"; cat /tmp/eas_specs.json
```
CPU→GPU auto-switch: If vLLM/SGLang + CPU, silently pick GPU. Do NOT ask.
**This output is REQUIRED by Step 5. Do NOT skip.**
## Step 4: Network & Resource Config
| Type | VPC | Config |
|------|-----|--------|
| Shared | No | (default, no networking fields) |
| Dedicated GW | Yes | `networking.gateway` + `cloud.networking` |
| NLB | Yes | `networking.nlb` + `cloud.networking` |
**⚠️ If a required resource does not exist → STOP and inform the user. Do NOT block or attempt workarounds. This is a valid outcome.**
**Dedicated Gateway** — Call `list-gateway`. If no gateway exists → inform user and STOP. Otherwise call `describe-gateway` to get VPC/VSwitch, then query security group under that VPC. If no security group found → inform user and STOP.
```bash
aliyun eas list-gateway --region <region> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy
```
If gateway found, get details:
```bash
aliyun eas describe-gateway --region <region> --cluster-id <region> \
--gateway-id <gateway_id> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy
```
Extract VPC and comma-separated VSwitch ID:
```bash
aliyun eas describe-gateway --region <region> --cluster-id <region> \
--gateway-id <gateway_id> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \
jq '{vpc_id: .LoadBalancerList[0].VpcId, vswitch_id: (.LoadBalancerList[0].VSwitchIds | join(","))}'
```
**NLB** — Requires VPC/VSwitch/SecurityGroup. If user does not provide them, query via APIs. If any required resource not found → inform user and STOP.
**⚠️ NLB requires ≥2 VSwitches across different availability zones.** Use comma-separated format: `"vswitch_id": "vsw-zone-a,vsw-zone-b"`.
**⚠️ NLB Plugin Bug (aliyun-cli-eas v0.2.0):** If create-service with NLB config returns 400 with `'vswitch can not be null'` or `'vpcId, vswId and securityGroupId are required'`, this is a known CLI plugin bug (not a resource issue).
**Fallback strategy:**
1. Retry create-service with NLB config once more (max 2 attempts).
2. If both fail → Remove `networking.nlb` and `cloud.networking` from service.json, redeploy with shared gateway.
3. Inform user: "NLB config failed due to CLI plugin limitation. Deployed with shared gateway instead."
**EAS Dedicated Resource Group** — Call `list-resources`. Filter for `ResourceType == "Dedicated"` and `Status == "ResourceReady"`.
```bash
aliyun eas list-resources --region <region> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \
jq '.Resources[] | select(.ResourceType == "Dedicated" and .Status == "ResourceReady") | {ResourceId, ResourceType, Status}'
```
- If exists → Set `"metadata": {"resource": "<ResourceId>"}`. Do NOT set `cloud.computing`.
- If NOT exists → Inform the user and STOP. Do NOT fall back to public resource group.
## Step 5: Build Service JSON
**⚠️ BEFORE building JSON, you MUST read these reference files:**
- `references/config-patterns.md` — Complete JSON templates for all 8 patterns
- `references/config-schema.md` — Field descriptions and validation rules
- `references/storage-mount.md` — OSS/NAS mount configuration details
- `references/network-config.md` — NLB/Gateway network configuration details
**⚠️ HARD GATE: Before writing service.json, VERIFY these files exist and have content. If either is missing → STOP and run that Step NOW.**
```
test -s /tmp/eas_images.json || echo "MISSING: Run Step 2 NOW"
test -s /tmp/eas_specs.json || echo "MISSING: Run Step 3 NOW"
```
**⚠️ JSON format rules:**
- Allowed top-level keys: `metadata`, `containers`, `storage`, `cloud`, `autoscaler`, `networking`
- ❌ NEVER use as top-level keys: `spec`, `ServiceName`, `Image`, `Cpu`, `Memory`, `Gpu`, `processor_path`, `resourceGroupId`, `instance`, `port`, `command`, `access`
- ❌ FORBIDDEN fields: `processor_path`, `resourceGroupId`, `spec`, `access`
- `metadata.name` = service name, `metadata.workspace_id` = workspace (REQUIRED)
- `containers[].image` = image URI, `containers[].command` = start command, `containers[].port` = port
- `cloud.computing.instance_type` = instance type (MANDATORY for shared gateway)
### Quick Reference — JSON Skeletons
Below are minimal skeletons.
**Read `references/config-patterns.md` for complete templates with all fields and examples.** **Base (Shared Gateway):** ```json {"metadata":{"name":"<name>","instance":1,"workspace_id":"<ws>"}, "containers":[{"image":"<img>","port":<p>,"command":"<cmd>"}], "cloud":{"computing":{"instance_type":"<type>"}}} ``` **+ OSS** → add `"storage":[{"mount_path":"/dir","oss":{"path":"oss://<b>/<p>/","readOnly":true}}]` **+ Autoscaling** → add `"autoscaler":{"min":1,"max":4,"scaleStrategies":[{"metricName":"qps","threshold":20}]}` **+ Health Check** → add `startup_check` to `containers[]` (see config-patterns.md Pattern 4) **NLB** — full template (read `references/network-config.md` for details): ```json {"metadata":{"name":"<name>","instance":1,"workspace_id":"<ws>"}, "containers":[{"image":"<img>","port":<p>,"command":"<cmd>"}], "cloud":{"computing":{"instance_type":"<type>"}, "networking":{"vpc_id":"<vpc>","vswitch_id":"<vsw1>,<vsw2>","security_group_id":"<sg>"}}, "networking":{"nlb":[{"id":"default","listener_port":<p>,"netType":"intranet"}]}} ``` ⚠️ `vswitch_id` must be **comma-separated with ≥2 VSwitches across different zones** **Dedicated Resource Group** — `"metadata.resource"` instead of `cloud.computing`: ```json {"metadata":{"name":"<name>","instance":1,"resource":"<res_id>","workspace_id":"<ws>"}, "containers":[{"image":"<img>","port":<p>,"command":"<cmd>"}]} ``` **Dedicated Gateway** — `networking.gateway` + `cloud.networking`: ```json {"metadata":{"name":"<name>","instance":1,"workspace_id":"<ws>"}, "containers":[{"image":"<img>","port":<p>,"command":"<cmd>"}], "networking":{"gateway":"<gw_id>"}, "cloud":{"computing":{"instance_type":"<type>"}, "networking":{"vpc_id":"<vpc>","vswitch_id":"<vsw1>,<vsw2>","security_group_id":"<sg>"}}} ``` ⚠️ `vswitch_id` comma-separated if gateway returns multiple VSwitches ### Validate Before Writing ```bash jq -r '.[] | select(.ImageUri | contains("vllm")) | .ImageUri' /tmp/eas_images.json jq -r '.[] | select(.InstanceType == 
"<type>") | .InstanceType' /tmp/eas_specs.json ``` ## Step 6: Create Service (MANDATORY) **🔴 CONFIRM: Did Step 1.5 confirm no duplicate service name? If a service with this name already exists → STOP. Inform the user and do NOT proceed with create-service.** **Use `$(cat service.json)` NOT `file://service.json`.** **Run this DIRECTLY via execute_shell_command, do NOT write a bash script.** ```bash aliyun eas create-service --region <region> \ --body "$(cat service.json)" \ --user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy ``` **409 Conflict** → Service already exists. Inform the user and STOP. **400 BadRequest** with `'vswitch can not be null'` or `'vpcId, vswId and securityGroupId are required'` → NLB CLI plugin bug (see Step 4 fallback). Remove `networking.nlb` and `cloud.networking` from service.json and retry. ## Step 7: Verify Deployment **Call describe-service ONCE to check the current status. Do NOT poll. Do NOT loop. Do NOT wait for Running.** ```bash aliyun eas describe-service --region <region> --cluster-id <region> \ --service-name <name> \ --user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \ jq '{Status, ServiceName, ServiceId}' ``` **Report whatever status you get (Running, Waiting, Creating, etc.) and proceed to Step 8 immediately. create-service returning 200 = success.** ## Step 8: Report Result (MANDATORY) **Get endpoint info via DescribeServiceEndpoint:** ```bash aliyun eas describe-service-endpoints --region <region> --cluster-id <region> \ --service-name <name> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy | \ jq '{AccessToken, Endpoints: [.Endpoints[] | { Type: .EndpointType, Port: .Port, InternetEndpoints: .InternetEndpoints, IntranetEndpoints: .IntranetEndpoints }]}' ``` **Use the status from Step 7 and the endpoints above to report.** **Copy the ENTIRE output into your final response. 
Format:** ``` Deployment Summary ================== Service Name: <name> Status: <from Step 7> Endpoints: - <EndpointType>: InternetEndpoint: <url or null> IntranetEndpoint: <url or null> Port: <port or 0> Service Invocation Examples: curl <internet-endpoint>/api/predict/<name> \ -H "Authorization: <AccessToken>" curl <intranet-endpoint>/api/predict/<name> \ -H "Authorization: <AccessToken>" curl <nlb-domain>:<listener_port>/api/predict/<name> \ -H "Authorization: <AccessToken>" ``` **`InternetEndpoint` and `IntranetEndpoint` MUST appear in your response, even if null.** If null: `(not available for this network type)` **Always include a service invocation example using the AccessToken and endpoint URL.** **Success criteria: create-service returning 200 with ServiceId = success. Any status (Running, Waiting, Creating) is acceptable.** --- When done, disable AI-Mode: `aliyun configure ai-mode disable` ## References (read when needed) | Doc | When to Read | |-----|-------------| | [Config Patterns](references/config-patterns.md) | **Step 5** — Complete JSON templates for all 8 patterns | | [Config Schema](references/config-schema.md) | **Step 5** — Field descriptions and validation rules | | [Storage Mount](references/storage-mount.md) | **Step 5** — OSS/NAS mount details | | [Network Config](references/network-config.md) | **Step 4/5** — NLB/Gateway config details | | [Model-Image](references/model-image-matching.md) | **Step 2** — Image selection guide | | [Related APIs](references/related-apis.md) | **Any step** — CLI command reference | | [Workflow](references/deployment-workflow.md) | Overview — Full deployment flow | | [CLI Guide](references/cli-installation-guide.md) | Pre-checks — Plugin install | | [RAM Policies](references/ram-policies.md) | Pre-checks — Required permissions | | [Service Features](references/service-features.md) | Step 5 — Advanced features | FILE:references/acceptance-criteria.md # Acceptance Criteria: alibabacloud-pai-eas-service-deploy 
**Scenario**: PAI-EAS Service Deployment
**Purpose**: Skill test acceptance criteria

**Table of Contents**
- [CLI Command Patterns](#correct-cli-command-patterns)
- [Service Config Validation](#service-config-validation)
- [Authentication Patterns](#authentication-patterns)
- [Parameter Confirmation Requirements](#parameter-confirmation-requirements)
- [Resource Cleanup](#resource-cleanup)

---

# Correct CLI Command Patterns

## 1. EAS Service Operations

### ✅ Correct: Create Service
```bash
aliyun eas create-service --region cn-hangzhou --body "$(cat service.json)" --user-agent AlibabaCloud-Agent-Skills
```

### ❌ Wrong: Missing --user-agent
```bash
aliyun eas create-service --region cn-hangzhou --body "$(cat service.json)"
```

### ❌ Wrong: Using API format instead of plugin mode
```bash
aliyun eas CreateService --region cn-hangzhou --body "$(cat service.json)" --user-agent AlibabaCloud-Agent-Skills
```

## 2. AIWorkSpace Operations

### ✅ Correct: List Images
```bash
aliyun aiworkspace list-images --verbose true --labels 'system.official=true,system.supported.eas=true' --page-size 50 --user-agent AlibabaCloud-Agent-Skills
```

### ❌ Wrong: Incomplete labels (missing `system.supported.eas=true`)
```bash
aliyun aiworkspace list-images --verbose true --labels 'system.official=true' --user-agent AlibabaCloud-Agent-Skills
```

## 3. OSS Operations

### ✅ Correct: List Buckets
```bash
ossutil ls
```

### ✅ Correct: List Objects
```bash
ossutil ls oss://bucket-name/path/
```

### ❌ Wrong: Missing oss:// prefix
```bash
ossutil ls bucket-name/path/
```

## 4. VPC Operations

### ✅ Correct: Query VPC
```bash
aliyun vpc describe-vpcs --biz-region-id cn-hangzhou --vpc-id vpc-xxx --user-agent AlibabaCloud-Agent-Skills
```

### ❌ Wrong: Missing user-agent
```bash
aliyun vpc describe-vpcs --biz-region-id cn-hangzhou --vpc-id vpc-xxx
```

---

# Service Config Validation

## 1. metadata Config

### ✅ Correct: Service Name Format
```json
{
  "metadata": {
    "name": "my-vllm-service",
    "instance": 1
  }
}
```

### ❌ Wrong: Service name contains uppercase letters
```json
{
  "metadata": {
    "name": "My-VLLM-Service",
    "instance": 1
  }
}
```

### ❌ Wrong: Service name contains special characters
```json
{
  "metadata": {
    "name": "my.vllm.service",
    "instance": 1
  }
}
```

## 2. containers Config

### ✅ Correct: Container Config
```json
{
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "port": 8000,
    "script": "vllm serve /models --port 8000"
  }]
}
```

### ❌ Wrong: Missing port config
```json
{
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "script": "vllm serve /models --port 8000"
  }]
}
```

## 3. storage Config

### ✅ Correct: OSS Mount
```json
{
  "storage": [{
    "mount_path": "/models",
    "oss": {
      "path": "oss://bucket/models/",
      "readOnly": true
    }
  }]
}
```

### ❌ Wrong: OSS path missing trailing slash
```json
{
  "storage": [{
    "mount_path": "/models",
    "oss": {
      "path": "oss://bucket/models",
      "readOnly": true
    }
  }]
}
```

### ❌ Wrong: Missing oss:// prefix
```json
{
  "storage": [{
    "mount_path": "/models",
    "oss": {
      "path": "bucket/models/",
      "readOnly": true
    }
  }]
}
```

## 4. cloud Config

### ✅ Correct: Public Resource Group
```json
{
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn7-c12g1.12xlarge"
    }
  }
}
```

### ✅ Correct: Multi-spec Instances
```json
{
  "cloud": {
    "computing": {
      "instances": [
        {"type": "ecs.gn7-c12g1.12xlarge"},
        {"type": "ecs.gn8is.2xlarge"}
      ]
    }
  }
}
```

### ❌ Wrong: Wrong field name (`instanceType` instead of `instance_type`)
```json
{
  "cloud": {
    "computing": {
      "instanceType": "ecs.gn7-c12g1.12xlarge"
    }
  }
}
```

## 5. networking Config

### ✅ Correct: Dedicated Gateway
```json
{
  "networking": {
    "gateway": "gw-xxx"
  }
}
```

### ✅ Correct: NLB
```json
{
  "networking": {
    "nlb": [{
      "id": "default",
      "listener_port": 9000,
      "netType": "intranet"
    }]
  }
}
```

### ❌ Wrong: NLB port is 8080 (not allowed)
```json
{
  "networking": {
    "nlb": [{
      "id": "default",
      "listener_port": 8080,
      "netType": "intranet"
    }]
  }
}
```

---

# Authentication Patterns

## ✅ Correct: Using CredentialClient (Python SDK)
```python
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_eas20210701.client import Client as EasClient
from alibabacloud_tea_openapi import models as open_api_models

# Credentials are resolved from the default credential chain, not hardcoded
credential = CredentialClient()
config = open_api_models.Config(credential=credential)
config.region_id = "cn-hangzhou"
config.user_agent = "AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy"
client = EasClient(config)
```

## ❌ Wrong: Hardcoded AK/SK
```python
config = open_api_models.Config(
    access_key_id="LTAIxxx",
    access_key_secret="xxx"
)
```

---

# Parameter Confirmation Requirements

## ✅ Correct: All user parameters must be confirmed

The following parameters must be confirmed with the user before deployment:
- RegionId (region)
- Service name
- Workspace ID
- Image URI
- Instance type
- OSS path (if mounting)
- Gateway ID (if using dedicated gateway)
- VPC/VSwitch/Security group (if using ALB/NLB)

## ❌ Wrong: Using default values without confirmation
```bash
# Wrong: Using default region without asking user
aliyun eas create-service --region cn-hangzhou ...
```

---

# Resource Cleanup

## ✅ Correct: Cleanup after deployment failure
```bash
aliyun eas delete-service \
  --cluster-id cn-hangzhou \
  --service-name <service-name> \
  --user-agent AlibabaCloud-Agent-Skills
```

## ❌ Wrong: Not cleaning up failed services

Leaving a failed service undeleted after a creation failure wastes resources.
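
The Correct/Wrong config pairs in this file can be checked mechanically before `create-service` is called. The block below is a minimal Python sketch of such a check, not an official EAS validator: the function name `validate_service_config`, the name pattern, and the message wording are all hypothetical, and the rules are only those implied by the examples in this file (lowercase service name without dots, `port` present on every container, OSS path with `oss://` prefix and trailing slash, `instance_type` spelled in snake_case, NLB listener port not 8080).

```python
import re
from typing import List

# Assumed pattern: lowercase letters, digits, hyphens and underscores only.
# Derived from the examples in this file, not from an official schema.
NAME_RE = re.compile(r"[a-z0-9_-]+")


def validate_service_config(cfg: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no rule was violated."""
    problems: List[str] = []

    # metadata.name: no uppercase letters, no special characters such as dots
    name = cfg.get("metadata", {}).get("name", "")
    if not NAME_RE.fullmatch(name):
        problems.append(f"metadata.name {name!r} is missing or has invalid characters")

    # containers[].port must be present
    for i, container in enumerate(cfg.get("containers", [])):
        if "port" not in container:
            problems.append(f"containers[{i}] is missing 'port'")

    # storage[].oss.path needs the oss:// prefix and a trailing slash
    for i, mount in enumerate(cfg.get("storage", [])):
        path = mount.get("oss", {}).get("path")
        if path is None:
            continue  # not an OSS mount
        if not path.startswith("oss://"):
            problems.append(f"storage[{i}].oss.path must start with 'oss://'")
        if not path.endswith("/"):
            problems.append(f"storage[{i}].oss.path must end with '/'")

    # cloud.computing must use instance_type, not instanceType
    if "instanceType" in cfg.get("cloud", {}).get("computing", {}):
        problems.append("cloud.computing uses 'instanceType'; expected 'instance_type'")

    # networking.nlb[].listener_port must not be 8080
    for i, nlb in enumerate(cfg.get("networking", {}).get("nlb", [])):
        if nlb.get("listener_port") == 8080:
            problems.append(f"networking.nlb[{i}].listener_port 8080 is not allowed")

    return problems


# Example: a config that breaks two of the rules above
example = {
    "metadata": {"name": "my.vllm.service", "instance": 1},
    "containers": [{"image": "<img>", "port": 8000}],
    "storage": [{"mount_path": "/models", "oss": {"path": "oss://bucket/models"}}],
}
for problem in validate_service_config(example):
    print(problem)
```

An agent could load `service.json` with `json.load`, pass the result to this function, and stop before `create-service` if the returned list is non-empty.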
FILE:references/api-reference.md
# API Reference

## API Response Structure

**Alibaba Cloud API response structures are not uniform; take care when writing jq paths**:

| API | jq Path | Structure |
|-----|---------|-----------|
| `AIWorkSpace ListWorkspaces` | `.Workspaces[]` | Single level |
| `AIWorkSpace ListImages` | `.Images[]` | Single level |
| `eas DescribeMachineSpec` | `.InstanceMetas[]` | Single level |
| `eas list-gateway` | `.Gateways[]` | Single level |
| `eas ListResources` | `.Resources[]` | Single level |
| `eas DescribeService` | `.Service` | Single object |
| `eas describe-service-event` | `.Events[]` | Single level |
| `vpc DescribeVpcs` | `.Vpcs.Vpc[]` | ⚠️ Two levels |
| `vpc DescribeVSwitches` | `.VSwitches.VSwitch[]` | ⚠️ Two levels |
| `ecs DescribeSecurityGroups` | `.SecurityGroups.SecurityGroup[]` | ⚠️ Two levels |
| `nlb ListLoadBalancers` | `.LoadBalancers[]` | Single level |

## jq Examples

```bash
aliyun aiworkspace list-workspaces --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills | jq -r '.Workspaces[] | "\(.WorkspaceId)\t\(.WorkspaceName)"'
aliyun eas list-gateway --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills | jq -r '.Gateways[] | "\(.GatewayId)\t\(.GatewayName)"'
aliyun vpc describe-vpcs --biz-region-id cn-hangzhou --user-agent AlibabaCloud-Agent-Skills | jq -r '.Vpcs.Vpc[] | "\(.VpcId)\t\(.VpcName)"'
```

## CLI Command Reference

### Parameter Naming Rules

| Command Type | Parameter | Examples |
|---------|--------|------|
| List / create operations | `--region` | `ListServices`, `CreateService` |
| Operations on a single service | `--cluster-id` | `DescribeService`, `DeleteService` |

### Common Commands

```bash
aliyun aiworkspace list-workspaces --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun aiworkspace list-images --verbose true --labels 'system.official=true,system.supported.eas=true' --page-size 100 --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-machine-spec --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun eas list-resources --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun eas list-gateway --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-gateway --cluster-id cn-hangzhou --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills
aliyun eas create-service --region cn-hangzhou --body "$(cat service.json)" --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-service --cluster-id cn-hangzhou --service-name my_service --user-agent AlibabaCloud-Agent-Skills
```

## Permissions

See [RAM Policies](ram-policies.md).

## Common GPU Instance Types

| Instance Type | GPU | CPU | Memory | Use Case |
|------|-----|-----|------|---------|
| `ecs.gn6i-c4g1.xlarge` | 1× T4 | 4 | 16Gi | Small-model inference |
| `ecs.gn6i-c8g1.2xlarge` | 1× T4 | 8 | 32Gi | Medium-sized models |
| `ecs.gn7-c12g1.12xlarge` | 4× A10 | 12 | 192Gi | Large-model inference |

FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide

Complete guide for installing and configuring Aliyun CLI.

> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.

**Table of Contents**
- [Installation](#installation)
- [Configuration](#configuration)
- [Verification](#verification)
- [Security Best Practices](#security-best-practices)
- [Troubleshooting](#troubleshooting)
- [Advanced Configuration](#advanced-configuration)
- [Next Steps](#next-steps)
- [References](#references)

## Installation

### macOS

**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```

**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```

### Linux

**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```

**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```

**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```

### Windows

**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open a new Command Prompt or PowerShell window
5. Verify: `aliyun version`

**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
# Add to PATH (requires admin privileges)
$env:Path += ";C:\aliyun-cli"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::Machine)
# Verify
aliyun version
```

## Configuration

### Quick Start

```bash
aliyun configure set \
  --mode AK \
  --access-key-id <your-access-key-id> \
  --access-key-secret <your-access-key-secret> \
  --region cn-hangzhou
```

All `aliyun configure` commands support non-interactive flags, which is the recommended approach — it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.

**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once

### Configuration Modes

Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.

#### 1. AK Mode (Access Key)

Most common mode for personal accounts and scripts.
```bash aliyun configure set \ --mode AK \ --access-key-id LTAI5tXXXXXXXX \ --access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \ --region cn-hangzhou ``` Configuration is stored in `~/.aliyun/config.json`: ```json { "current": "default", "profiles": [ { "name": "default", "mode": "AK", "access_key_id": "LTAI5tXXXXXXXX", "access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX", "region_id": "cn-hangzhou", "output_format": "json", "language": "en" } ] } ``` #### 2. StsToken Mode (Temporary Credentials) For short-lived access (tokens expire in 1-12 hours). ```bash aliyun configure set \ --mode StsToken \ --access-key-id LTAI5tXXXXXXXX \ --access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \ --sts-token v1.0:XXXXXXXXXXXXXXXX \ --region cn-hangzhou ``` Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access. #### 3. RamRoleArn Mode (Assume RAM Role) Assume a RAM role for elevated or cross-account access. ```bash aliyun configure set \ --mode RamRoleArn \ --access-key-id LTAI5tXXXXXXXX \ --access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \ --ram-role-arn acs:ram::123456789012:role/AdminRole \ --role-session-name my-session \ --region cn-hangzhou ``` Use cases: cross-account resource access, temporary elevated privileges, role-based access control. #### 4. EcsRamRole Mode (ECS Instance RAM Role) Use the RAM role attached to an ECS instance — no credentials needed. ```bash aliyun configure set \ --mode EcsRamRole \ --ram-role-name MyEcsRole \ --region cn-hangzhou ``` Requirements: must be running on an ECS instance with a RAM role attached. Use cases: scripts and automation running on ECS instances. #### 5. RsaKeyPair Mode (RSA Key Pair) Use RSA key pair for authentication (generate key pair in Aliyun Console first). ```bash aliyun configure set \ --mode RsaKeyPair \ --private-key /path/to/private-key.pem \ --key-pair-name my-key-pair \ --region cn-hangzhou ``` #### 6. 
RamRoleArnWithEcs Mode (ECS + RAM Role) Combine ECS instance role with RAM role assumption for cross-account access from ECS. ```bash aliyun configure set \ --mode RamRoleArnWithEcs \ --ram-role-name MyEcsRole \ --ram-role-arn acs:ram::123456789012:role/TargetRole \ --role-session-name my-session \ --region cn-hangzhou ``` ### Environment Variables **Highest priority** - overrides config file **Access Key Mode** ```bash export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret export ALIBABA_CLOUD_REGION_ID=cn-hangzhou ``` **STS Token Mode** ```bash export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token export ALIBABA_CLOUD_REGION_ID=cn-hangzhou ``` **ECS RAM Role Mode** ```bash export ALIBABA_CLOUD_ECS_METADATA=role_name ``` **Use Case**: - CI/CD pipelines - Docker containers - Temporary credential override ### Managing Multiple Profiles **Create Named Profiles** ```bash aliyun configure set --profile projectA \ --mode AK \ --access-key-id LTAI5tAAAAAAAA \ --access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \ --region cn-hangzhou aliyun configure set --profile projectB \ --mode AK \ --access-key-id LTAI5tBBBBBBBB \ --access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \ --region cn-shanghai ``` **Use Specific Profile** ```bash aliyun ecs describe-instances --profile projectA export ALIBABA_CLOUD_PROFILE=projectA aliyun ecs describe-instances # Uses projectA ``` **List and Switch Profiles** ```bash aliyun configure list # List all profiles aliyun configure set --current projectA # Switch default profile ``` ### Credential Priority Credentials are loaded in this order (first found wins): 1. **Command-line flag**: `--profile <name>` 2. **Environment variable**: `ALIBABA_CLOUD_PROFILE` 3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc. 4. 
**Configuration file**: `~/.aliyun/config.json` (current profile) 5. **ECS Instance RAM Role**: If running on ECS with attached role ## Verification ### Test Authentication ```bash # Basic test - list regions aliyun ecs describe-regions # Expected output: JSON array of regions ``` **If successful**, you'll see: ```json { "Regions": { "Region": [ { "RegionId": "cn-hangzhou", "RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com", "LocalName": "华东 1(杭州)" }, ... ] }, "RequestId": "..." } ``` **If failed**, you'll see error messages: - `InvalidAccessKeyId.NotFound` - Wrong Access Key ID - `SignatureDoesNotMatch` - Wrong Access Key Secret - `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode) - `Forbidden.RAM` - Insufficient permissions ### Debug Configuration ```bash # Show current configuration aliyun configure get # Test with debug logging aliyun ecs describe-regions --log-level=debug # Check credential provider aliyun configure get mode ``` ## Security Best Practices ### 1. Use RAM Users (Not Root Account) ❌ **Don't**: Use Aliyun root account credentials ✅ **Do**: Create RAM users with specific permissions ```bash # Create RAM user in console # Attach only necessary policies # Use RAM user's access keys ``` ### 2. Principle of Least Privilege Grant only the minimum permissions needed: ```bash # Example: Read-only ECS access # Attach policy: AliyunECSReadOnlyAccess ``` ### 3. Rotate Access Keys Regularly ```bash # Create new access key in RAM Console, then update configuration aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET # Delete old access key from console ``` ### 4. Use STS Tokens for Temporary Access ```bash aliyun configure set --mode StsToken \ --access-key-id XXXX --access-key-secret XXXX \ --sts-token XXXX --region cn-hangzhou ``` ### 5. Use ECS RAM Roles When Possible ```bash aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou ``` ### 6. 
Never Commit Credentials

```bash
# Add to .gitignore
echo "~/.aliyun/config.json" >> .gitignore
# Use environment variables in CI/CD instead
```

### 7. Secure Config File

```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```

## Troubleshooting

### Issue: Command Not Found

```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```

### Issue: Authentication Failed

```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```

### Issue: Permission Denied

```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```

### Issue: STS Token Expired

```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
  --access-key-id XXXX --access-key-secret XXXX \
  --sts-token NEW_TOKEN --region cn-hangzhou
```

### Issue: Wrong Region

```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```

## Advanced Configuration

### Custom Endpoint

```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```

### Proxy Settings

```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```

### Timeout Settings

```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```

## Next Steps

After installation and configuration:

1. **Install plugins** for services you need (v3.3.1+ supports all published product plugins):
   ```bash
   aliyun plugin install --names ecs vpc rds
   # List all available plugins
   aliyun plugin list-remote
   ```
2. **Explore commands**:
   ```bash
   aliyun ecs --help
   aliyun fc --help
   ```
3. **Read documentation**:
   - [Command Syntax Guide](./command-syntax.md)
   - [Global Flags Reference](./global-flags.md)
   - [Common Scenarios](./common-scenarios.md)

## References

- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli

FILE:references/config-examples.md
# Service Config Examples

**Table of Contents**
- [JSON Field Rules](#json-field-rules-important)
- [Container Mode Config](#container-mode-config-important)
- [Basic Config](#basic-config)
- [Full Config](#full-config)
- [Public Resource Group Config](#public-resource-group-config)
- [Dedicated Resource Group Config](#dedicated-resource-group-config)
- [ALB Gateway Config](#alb-gateway-config)
- [NLB Config](#nlb-config)
- [Autoscaling Config](#autoscaling-config)
- [Storage Mount Config](#storage-mount-config)

## ⚠️ JSON Field Rules (Important)

**The service name must go in the `metadata.name` field**, not in a top-level field:

```json
{
  "metadata": {
    "name": "my-service",  // ✅ Correct: the service name goes here
    "instance": 1
  }
}
```

**Wrong example**:

```json
{
  "service_name": "my-service",  // ❌ Wrong: this is not a valid field
  "name": "my-service"           // ❌ Wrong: not inside metadata
}
```

## Container Mode Config (Important)

When deploying from an image, the `containers` field is required:

```json
{
  "metadata": {
    "name": "my-service",
    "instance": 1
  },
  "containers": [{
    "image": "<image URI>",
    "port": 8000,
    "command": "<start command>"
  }],
  "storage": [{
    "mount_path": "/model_dir",
    "oss": {
      "path": "oss://bucket/models/"
    }
  }],
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c8g1.2xlarge"
    }
  }
}
```

**Key fields**:
- `metadata.name` - service name (required)
- `containers[].image` - image URI (required)
- `containers[].port` - service port (required)
- `containers[].command` - start command (optional)

**⚠️ Note**: Use the `containers` field; do not use `processor` or `processor_path`.

## Basic Config

```json
{
  "metadata": {
    "name": "simple_service",
    "instance": 1
  },
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "port": 8000
  }],
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c8g1.2xlarge"
    }
  }
}
```

## Full Config

```json
{
  "metadata": {
    "name": "myservice",
    "instance": 2,
    "workspace_id": "368951",
    "disk": "30Gi",
    "shm_size": 100,
    "enable_grpc": true
  },
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "port": 8000,
    "env": [
      {"name": "NCCL_P2P_DISABLE", "value": "1"}
    ]
  }],
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6e-c12g1.12xlarge"
    },
    "networking": {
      "vpc_id": "vpc-xxx",
      "vswitch_id": "vsw-xxx",
      "security_group_id": "sg-xxx"
    }
  },
  "storage": [{
    "mount_path": "/models",
    "oss": {
      "path": "oss://my-bucket/models/llama-7b/",
      "readOnly": true
    }
  }],
  "networking": {
    "gateway": "gw-xxx"
  },
  "autoscaler": {
    "min": 1,
    "max": 10,
    "scaleStrategies": [{
      "metricName": "qps",
      "threshold": 100
    }]
  }
}
```

## Public Resource Group Config

```json
{
  "metadata": {
    "name": "public-resource-service",
    "instance": 1
  },
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "port": 8000
  }],
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c8g1.2xlarge"
    }
  },
  "storage": [{
    "mount_path": "/models",
    "oss": {
      "path": "oss://my-bucket/models/"
    }
  }]
}
```

## Dedicated Resource Group Config

The resource group ID goes in `metadata.resource`, and `cloud.computing` is not set:

```json
{
  "metadata": {
    "name": "dedicated-resource-service",
    "instance": 1,
    "resource": "eas-r-xxx"
  },
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "port": 8000
  }],
  "storage": [{
    "mount_path": "/models",
    "oss": {
      "path": "oss://my-bucket/models/"
    }
  }]
}
```

## ALB Gateway Config

```json
{
  "metadata": {
    "name": "alb-gateway-service",
    "instance": 1
  },
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "port": 8000
  }],
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c8g1.2xlarge"
    },
    "networking": {
      "vpc_id": "{obtained from the gateway}",
      "vswitch_id": "{obtained from the gateway}",
      "security_group_id": "sg-xxx"
    }
  },
  "networking": {
    "gateway": "gw-xxx"
  }
}
```

## NLB Config

NLB requires at least two VSwitches in different availability zones, and the listener port must not be 8080:

```json
{
  "metadata": {
    "name": "nlb-service",
    "instance": 1
  },
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "port": 8000
  }],
  "cloud": {
    "computing": {
      "instance_type": "ecs.gn6i-c8g1.2xlarge"
    },
    "networking": {
      "vpc_id": "vpc-xxx",
      "vswitch_id": "vsw-zone-a,vsw-zone-b",
      "security_group_id": "sg-xxx"
    }
  },
  "networking": {
    "nlb": [{
      "id": "default",
      "listener_port": 9000,
      "netType": "intranet"
    }]
  }
}
```

## Autoscaling Config

```json
{
  "autoscaler": {
    "min": 1,
    "max": 10,
    "scaleStrategies": [
      { "metricName": "qps", "threshold": 100 },
      { "metricName": "cpu", "threshold": 80 }
    ]
  }
}
```

## Storage Mount Config

```json
{
  "storage": [
    {
      "mount_path": "/models",
      "oss": {
        "path": "oss://my-bucket/models/",
        "readOnly": true
      }
    },
    {
      "mount_path": "/data",
      "nfs": {
        "server": "xxx.cn-hangzhou.nas.aliyuncs.com",
        "path": "/share"
      }
    },
    {
      "mount_path": "/dataset",
      "dataset": {
        "id": "d-xxx",
        "version": "v1"
      }
    }
  ]
}
```

FILE:references/config-patterns.md
# Complete Config Pattern Examples

This document contains 8 complete JSON config patterns for different deployment scenarios.

## Config Pattern Decision Table

| User Requirement | Pattern | Key Fields |
|-----------------|---------|------------|
| vLLM + OSS + Public Resource + Shared GW | Pattern 1 | `storage[].oss`, `cloud.computing` |
| vLLM + Autoscaling | Pattern 2 | `autoscaler` (camelCase!) |
| vLLM + NLB | Pattern 3 | `networking.nlb` |
| SGLang + Health Check | Pattern 4 | `containers[].startup_check` |
| ComfyUI + OSS | Pattern 5 | `storage[].oss` |
| Custom Image / CPU→GPU Auto-switch | Pattern 6 | Only `containers[].image` |
| EAS Resource Group | Pattern 7 | `metadata.resource` |
| Dedicated Gateway | Pattern 8 | `networking.gateway`, `cloud.networking` |

## Steps to Build JSON

1. Select the best matching pattern based on user requirements
2. Copy the JSON template for that pattern
3. Replace the template values with the actual ones (service name, image URI, OSS path, instance type, etc.)
4. If extra requirements exist (e.g.
autoscaling + NLB), merge corresponding fields 5. Save as `service.json` --- ## Pattern 1: vLLM + OSS Mount + Public Resource Group + Shared Gateway ```json { "metadata": { "name": "qwen35_7b_prod", "instance": 1, "workspace_id": "<workspace_id>" }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.13.0rc2.a8ec486.20260305pai-gpu", "port": 8000, "command": "vllm serve /model_dir --port 8000 --trust-remote-code", "startup_check": { "http_get": {"path": "/health", "port": 8000}, "initial_delay_seconds": 120, "period_seconds": 10, "failure_threshold": 30 } }], "storage": [{ "mount_path": "/model_dir", "oss": { "path": "oss://yqtest-model/qwen2.5-0.5b-instruct/", "readOnly": true } }], "cloud": { "computing": { "instance_type": "ecs.gn7i-c16g1.4xlarge" } } } ``` --- ## Pattern 2: vLLM + Autoscaling ```json { "metadata": { "name": "qwen_autoscaling_test", "instance": 1, "workspace_id": "<workspace_id>" }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.13.0rc2.a8ec486.20260305pai-gpu", "port": 8000, "command": "vllm serve /model_dir --port 8000 --trust-remote-code" }], "storage": [{ "mount_path": "/model_dir", "oss": { "path": "oss://yqtest-model/qwen2.5-0.5b-instruct/", "readOnly": true } }], "cloud": { "computing": { "instance_type": "ecs.gn7i-c16g1.4xlarge" } }, "autoscaler": { "min": 1, "max": 4, "scaleStrategies": [ {"metricName": "qps", "threshold": 20} ] } } ``` --- ## Pattern 3: vLLM + NLB ```json { "metadata": { "name": "qwen_nlb_test", "instance": 1, "workspace_id": "<workspace_id>" }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.13.0rc2.a8ec486.20260305pai-gpu", "port": 8000, "command": "vllm serve /model_dir --port 8000 --trust-remote-code" }], "storage": [{ "mount_path": "/model_dir", "oss": { "path": "oss://yqtest-model/qwen2.5-0.5b-instruct/", "readOnly": true } }], "cloud": { "computing": { "instance_type": "ecs.gn7i-c16g1.4xlarge" }, 
"networking": { "vpc_id": "vpc-xxx", "vswitch_id": "vsw-zone-a,vsw-zone-b", "security_group_id": "sg-xxx" } }, "networking": { "nlb": [{ "id": "default", "listener_port": 8000, "netType": "intranet" }] } } ``` --- ## Pattern 4: SGLang + Health Check ```json { "metadata": { "name": "llama3_8b_api", "instance": 1, "workspace_id": "<workspace_id>" }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/sglang:0.5.8-acclep1.2.1-gpu", "port": 8000, "command": "python -m sglang.launch_server --model-path /model_dir --host 0.0.0.0 --port 8000", "startup_check": { "http_get": {"path": "/health", "port": 8000}, "initial_delay_seconds": 120, "period_seconds": 10, "failure_threshold": 30 } }], "storage": [{ "mount_path": "/model_dir", "oss": { "path": "oss://yqtest-model/llama3-8b-instruct/", "readOnly": true } }], "cloud": { "computing": { "instance_type": "ecs.gn7i-c16g1.4xlarge" } } } ``` --- ## Pattern 5: ComfyUI + OSS Mount ```json { "metadata": { "name": "sdxl_inference", "instance": 1, "workspace_id": "<workspace_id>" }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/comfyui:2.2-api", "port": 8000, "command": "python main.py --listen 0.0.0.0 --port 8000", "startup_check": { "http_get": {"path": "/", "port": 8000}, "initial_delay_seconds": 60, "period_seconds": 10, "failure_threshold": 30 } }], "storage": [{ "mount_path": "/models", "oss": { "path": "oss://yqtest-model/sdxl-v1.0/", "readOnly": true } }], "cloud": { "computing": { "instance_type": "ecs.gn7i-c16g1.4xlarge" } } } ``` --- ## Pattern 6: vLLM + Small Model + GPU Instance (auto-switch from CPU when user requests CPU) ```json { "metadata": { "name": "qwen_cpu_service", "instance": 2, "workspace_id": "<workspace_id>" }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.13.0rc2.a8ec486.20260305pai-gpu", "port": 8000, "command": "vllm serve /model_dir --port 8000 --trust-remote-code" }], "storage": [{ "mount_path": 
"/model_dir", "oss": { "path": "oss://yqtest-model/qwen2.5-0.5b-instruct/", "readOnly": true } }], "cloud": { "computing": { "instance_type": "ecs.gn6i-c4g1.xlarge" } } } ``` --- ## Pattern 7: EAS Resource Group Deployment ```json { "metadata": { "name": "my_vllm_service", "instance": 1, "resource": "eas-r-d29k8ytqxzxmqi7l0s", "workspace_id": "<workspace_id>", "gpu": 1, "cpu": 4, "memory": 8000 }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.13.0rc2.a8ec486.20260305pai-gpu", "port": 8000, "command": "vllm serve /model_dir --port 8000 --trust-remote-code" }], "storage": [{ "mount_path": "/model_dir", "oss": { "path": "oss://yqtest-model/qwen2.5-0.5b-instruct/", "readOnly": true } }] } ``` --- ## Pattern 8: Dedicated Gateway Deployment ```json { "metadata": { "name": "my_gateway_service", "instance": 1, "workspace_id": "<workspace_id>" }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.13.0rc2.a8ec486.20260305pai-gpu", "port": 8000, "command": "vllm serve /model_dir --port 8000 --trust-remote-code" }], "storage": [{ "mount_path": "/model_dir", "oss": { "path": "oss://yqtest-model/qwen2.5-0.5b-instruct/", "readOnly": true } }], "networking": { "gateway": "gw-48hmbdt00fi6x90gft" }, "cloud": { "computing": { "instance_type": "ecs.gn7i-c16g1.4xlarge" }, "networking": { "vpc_id": "vpc-bp13kiflgde6v9dc9smc8", "vswitch_id": "vsw-bp1bhmnwqdh1ta9z9klms,vsw-bp1lz95xtmjiwqcpq31ng", "security_group_id": "sg-bp1e36bfv61nfy1yudyc" } } } ``` FILE:references/config-schema.md # Service Config Field Reference > This document lists all JSON config fields for PAI-EAS services. 
**Table of Contents**

- [Config Structure Overview](#config-structure-overview)
- [metadata (required)](#metadata-required)
- [containers (required)](#containers-required)
- [cloud (public resource group)](#cloud-public-resource-group)
- [storage (mount)](#storage-mount)
- [networking](#networking)
- [autoscaler](#autoscaler)
- [runtime](#runtime)
- [features](#features)
- [Full Example](#full-example)

## Config Structure Overview

```json
{
  "metadata": { ... },      // Service metadata (required)
  "containers": [ ... ],    // Container config (required)
  "cloud": { ... },         // Public resource group config
  "storage": [ ... ],       // Storage mount
  "networking": { ... },    // Network config
  "autoscaler": { ... },    // Autoscaling
  "runtime": { ... },       // Runtime config
  "features": { ... }       // Feature config
}
```

---

## metadata (required)

Service metadata defining basic service information.

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | string | ✅ | - | Service name, lowercase letters/digits/underscores, 3-63 chars |
| `instance` | int | ❌ | 1 | Number of replicas |
| `workspace_id` | string | ❌ | - | Workspace ID |
| `resource` | string | ❌ | - | Dedicated resource group ID (mutually exclusive with cloud.computing) |
| `disk` | string | ❌ | - | Temp disk size, e.g. "30Gi" |
| `shm_size` | int | ❌ | 64 | Shared memory size (GB) |
| `rdma` | int | ❌ | - | Number of RDMA NICs |
| `enable_grpc` | bool | ❌ | false | Enable GRPC protocol |
| `rolling_strategy` | object | ❌ | - | Rolling update strategy |
| `eas` | object | ❌ | - | EAS advanced config |

### rolling_strategy

| Field | Type | Description |
|-------|------|-------------|
| `max_surge` | int | Max new instances during rolling update |
| `max_unavailable` | int | Max unavailable instances during rolling update |

---

## containers (required)

Container config array, must contain at least one container.
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `image` | string | ✅ | - | Image URI |
| `port` | int | ✅ | 8000 | Service port |
| `script` | string | ❌ | - | Startup script |
| `command` | string | ❌ | - | Startup command |
| `args` | []string | ❌ | - | Command arguments |
| `env` | []EnvVar | ❌ | - | Environment variables |
| `prepare` | Prepare | ❌ | - | Pre-install config |
| `startup_check` | Probe | ❌ | - | Startup probe |
| `liveness_check` | Probe | ❌ | - | Liveness probe |
| `health_check` | Probe | ❌ | - | Health check |
| `resources` | ResourceRequirements | ❌ | - | Resource limits |

### EnvVar

```json
{"name": "ENV_NAME", "value": "env_value"}
```

### Prepare

```json
{
  "pythonRequirements": ["numpy==1.6.4", "pandas"],
  "pythonRequirementsPath": "/path/to/requirements.txt"
}
```

### Probe (Health Check)

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `http_get` | object | - | HTTP check config |
| `initial_delay_seconds` | int | 15 | Initial delay in seconds |
| `period_seconds` | int | 10 | Check interval in seconds |
| `timeout_seconds` | int | 1 | Timeout in seconds |
| `success_threshold` | int | 1 | Success threshold |
| `failure_threshold` | int | 1 | Failure threshold |

```json
{
  "http_get": {"path": "/health", "port": 8000},
  "initial_delay_seconds": 15,
  "period_seconds": 10,
  "timeout_seconds": 1,
  "success_threshold": 1,
  "failure_threshold": 3
}
```

---

## cloud (public resource group)

Config when using public resource group.

### cloud.computing

| Field | Type | Required | Description | |-------|------|----------|-------------| | `instance_type` | string | either/or | Instance type, e.g.
"ecs.gn6i-c8g1.2xlarge" | | `instances` | []object | either/or | Multi-spec instance list | ```json { "cloud": { "computing": { "instance_type": "ecs.gn6i-c8g1.2xlarge" } } } ``` Or: ```json { "cloud": { "computing": { "instances": [ {"type": "ecs.gn6i-c8g1.2xlarge"}, {"type": "ecs.gn7-c12g1.12xlarge"} ] } } } ``` ### cloud.networking VPC network config (required for ALB/NLB). | Field | Type | Required | Description | |-------|------|----------|-------------| | `vpc_id` | string | ✅ | VPC ID | | `vswitch_id` | string | ✅ | VSwitch ID, comma-separated for multi-zone (e.g. `"vsw-a,vsw-b"`) | | `security_group_id` | string | ✅ | Security group ID | --- ## storage (mount) Storage mount config array. ### OSS Mount ```json { "mount_path": "/models", "oss": { "path": "oss://my-bucket/models/", "readOnly": true } } ``` ### NAS/NFS Mount ```json { "mount_path": "/data", "nfs": { "server": "xxx.cn-hangzhou.nas.aliyuncs.com", "path": "/share" } } ``` ### Dataset Mount ```json { "mount_path": "/dataset", "dataset": { "id": "d-xxx", "version": "v1", "read_only": true } } ``` --- ## networking ### Shared Gateway No networking field needed. 
### ALB Dedicated Gateway

```json
{
  "networking": {
    "gateway": "gw-xxx"
  }
}
```

### NLB

```json
{
  "networking": {
    "nlb": [
      {
        "id": "default",
        "listener_port": 8000,
        "netType": "intranet"
      }
    ]
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | `"default"` for a system-created NLB, or the ID of an existing NLB (e.g. `"nlb-xxx"`) |
| `listener_port` | int | Listener port (cannot be 8080) |
| `netType` | string | "intranet" or "internet" |

---

## autoscaler

### ⚠️ Important: Field Naming Convention

**The `autoscaler` block uses camelCase field names**, unlike the snake_case fields used in most of the config:

| ✅ Correct Field Name | ❌ Wrong Field Name | Description |
|----------------------|---------------------|-------------|
| `min` | ~~`min_replica`~~ | Min replicas |
| `max` | ~~`max_replica`~~ | Max replicas |
| `scaleStrategies` | ~~`scale_strategies`~~ | Scaling strategy array |
| `metricName` | ~~`metric_name`~~ | Metric name |

```json
{
  "autoscaler": {
    "min": 1,
    "max": 10,
    "scaleStrategies": [
      {"metricName": "qps", "threshold": 100},
      {"metricName": "cpu", "threshold": 80}
    ]
  }
}
```

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `min` | int | ❌ | 1 | Min replicas |
| `max` | int | ❌ | 10 | Max replicas |
| `scaleStrategies` | []object | ❌ | - | Scaling strategies |

### scaleStrategies

| Field | Description |
|-------|-------------|
| `metricName` | Metric name: qps, cpu, gpu, memory |
| `threshold` | Trigger threshold |

---

## runtime

```json
{
  "runtime": {
    "termination_grace_period": 30
  }
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `termination_grace_period` | int | 30 | Graceful shutdown wait time (seconds) |

---

## features

```json
{
  "features": {
    "eas.aliyun.com/gpu-driver-version": "550.54.15"
  }
}
```

Common features:

- `eas.aliyun.com/gpu-driver-version`: GPU driver version

---

## Full Example

```json { "metadata": { "name": "my_llm_service", "instance": 2, "workspace_id": "<workspace_id>", "disk": "30Gi", "shm_size": 64,
"enable_grpc": true, "rolling_strategy": { "max_surge": 1, "max_unavailable": 0 } }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu", "port": 8000, "script": "vllm serve /models/qwen-7b --port 8000 --tensor-parallel-size 2", "env": [ {"name": "NCCL_P2P_DISABLE", "value": "1"} ], "startup_check": { "http_get": {"path": "/health", "port": 8000}, "initial_delay_seconds": 60, "period_seconds": 10, "failure_threshold": 30 } }], "cloud": { "computing": { "instance_type": "ecs.gn7-c12g1.12xlarge" }, "networking": { "vpc_id": "vpc-xxx", "vswitch_id": "vsw-zone-a,vsw-zone-b", "security_group_id": "sg-xxx" } }, "storage": [{ "mount_path": "/models", "oss": { "path": "oss://my-bucket/models/qwen-7b", "readOnly": true } }], "networking": { "gateway": "gw-xxx" }, "autoscaler": { "min": 1, "max": 5, "scaleStrategies": [ {"metricName": "qps", "threshold": 50} ] }, "runtime": { "termination_grace_period": 60 } } ``` FILE:references/deployment-workflow.md # Deployment Workflow Detailed interaction flow and display format for PAI-EAS service deployment. **Table of Contents** - [Core Interaction Principles](#core-interaction-principles) - [Phase 1: Basic Info](#phase-1-basic-info) - [Phase 2: Environment](#phase-2-environment) - [Phase 3: Resources](#phase-3-resources) - [Phase 4: Network](#phase-4-network) - [Phase 5: Service Features](#phase-5-service-features) - [Phase 6: Deploy](#phase-6-deploy) --- ## Core Interaction Principles ### 1. Step-by-Step Guidance Show current step → Wait for user input → Show next step ### 2. List Pagination When more than 10 items: `Enter number to select | 'n' next page | keyword filter | 'all' show all` ### 3. Resource Selection Pattern For optional resources (e.g. EAS resource group, gateway): `1. Select from existing 2. Skip` For workspaces: Auto-query; select if found, skip immediately if empty ### 4. Default Value Handling ``` Port [8000]: 1. Use default 2. 
Custom Select (enter 1 or 2): ``` Defaults: mount path `/model_dir`, port `8000`, replicas `1`, initial delay `60`, check interval `10` --- ## Phase 1: Basic Info ### Step 1.1: Service Name (required) Lowercase letters, digits, underscores, 3-63 characters. ### Step 1.2: Workspace (optional, auto-handled) ```bash aliyun aiworkspace list-workspaces --region <region> --page-size 100 --verbose true --user-agent AlibabaCloud-Agent-Skills ``` **Logic**: - **Has results** → Show list for user to select, record workspace_id - **Empty list** → **Skip immediately, proceed to next step! Never block asking user!** Simply omit `metadata.workspace_id` from JSON **⚠️ Never** stop deployment or ask user to create a workspace in the console **Display format (with results, paginated, max 10)**: ``` | # | Workspace ID | Name | |---|-------------|------------------| | 1 | 312319 | aiworkspace_test | | 2 | 545434 | ccjtest | Total: 20, showing 1-10 Enter number to select | 'n' next page | keyword filter | 'all' show all ``` --- ## Phase 2: Environment ### Step 2.1: Select Image **⚠️ MUST call `ListImages` to query official image list, even if user already provided image address (eval checkpoint)** | # | Category | Target Models | |---|----------|--------------| | 1 | LLM Inference | Qwen, Llama, Mistral | | 2 | Image Generation | Stable Diffusion | | 3 | Speech Synthesis | CosyVoice | | 4 | RAG | RAG Applications | | 5 | General Inference | PyTorch Models | | 6 | Custom Image | User-provided image | **LLM inference requires secondary selection**: vLLM / SGLang **Query images**: ```bash aliyun aiworkspace list-images --verbose true \ --labels 'system.official=true,system.supported.eas=true' \ --page-size 50 --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.Images[] | "\(.ImageId)\t\(.Name)\t\(.Labels[] | select(.Key == "system.chipType") | .Value)"' ``` **Display format (paginated, max 10)**: ``` | # | Image Name | Version | Chip | |---|-----------------|-----------|------| | 1 | 
vllm:0.13.0rc2 | 0.13.0rc2 | GPU | | 2 | vllm:0.14.0-xpu | 0.14.0 | XPU | Total: 14, showing 1-10 Enter number to select | 'n' next page | keyword filter ``` **Query image chip type** (query immediately after selection, record CHIP_TYPE): ```bash aliyun aiworkspace get-image --image-id <image-id> --user-agent AlibabaCloud-Agent-Skills | \ jq '{ChipType: (.Labels[] | select(.Key == "system.chipType") | .Value), ImageUri: (.Labels[] | select(.Key == "system.imageUri") | .Value)}' ``` **Output example**: ```json {"ChipType": "GPU", "ImageUri": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu"} ``` **Variable**: `CHIP_TYPE=GPU` (subsequent resource selection will auto-filter compatible instances) **⚠️ Image must match model type**, see [Model-Image Matching Guide](model-image-matching.md) **⚠️ Chip compatibility (never block)**: Image chip type must match instance type! - GPU image → `ecs.gn*` series (NVIDIA GPU instances) - CPU image → any non-GPU instance - **If user requests CPU instance + GPU image (e.g. vLLM/SGLang), you MUST auto-switch to a GPU instance (e.g. ecs.gn7i-c16g1.4xlarge) silently. vLLM and SGLang REQUIRE GPU — NEVER deploy them on CPU. Do NOT ask the user, just switch and explain your choice.** ### Step 2.2: Storage Mount **Do you need to mount storage?** | # | Option | Description | |---|--------|-------------| | 1 | Yes | Mount model files or data | | 2 | No (skip) | No storage mount | **If "Yes", select storage type**: | # | Storage Type | Description | |---|-------------|-------------| | 1 | OSS | Mount OSS bucket (most common) | | 2 | NAS | Mount NAS file system | | 3 | CPFS | Mount CPFS | | 4 | Dataset | Use PAI dataset | **OSS mount flow**: 1. **Query bucket list**: ```bash ossutil ls ``` 2. 
**Display bucket list (paginated, max 10)**: ``` | # | Bucket Name | Region | |---|---------------|-------------| | 1 | yqtest-model | cn-hangzhou | | 2 | my-bucket | cn-shanghai | Total: 15, showing 1-10 Enter number to select | 'n' next page | keyword filter ``` 3. **List directory**: ```bash ossutil ls oss://bucket-name/ ``` 4. **Display directory list**: ``` Selected Bucket: yqtest-model | # | Path | Description | |---|-----------------|-------------| | 1 | Qwen3.5-0.8B/ | LLM model | | 2 | llama-7b/ | LLM model | Select model directory (enter number): ``` 5. **Configure mount path**: ``` Mount path [/model_dir]: 1. Use default 2. Custom Select (enter 1 or 2): ``` **⚠️ Important**: OSS path format must be `oss://bucket/path/` (note trailing `/`) **⚠️ Important**: OSS models must be mounted via storage as local paths (e.g. `/model_dir`). Never pass oss:// URL directly to vllm/sglang commands! See [Storage Mount Guide](storage-mount.md) ### Step 2.3: Startup Command ```bash aliyun aiworkspace get-image --image-id <image-id> --user-agent AlibabaCloud-Agent-Skills | jq -r '.EasConfig.script' ``` **Logic**: - If image has preset command → Show to user for confirmation or modification - If no preset command → Use typical command as default **Typical commands**: | Image Type | Typical Command | |-----------|----------------| | vLLM | `vllm serve /model_dir --port 8000 --trust-remote-code` | | SGLang | `python -m sglang.launch_server --model-path /model_dir --port 8000` | | ComfyUI | `python main.py --listen 0.0.0.0 --port 8000` | **Interaction example**: ``` Startup command: 1. Use recommended: vllm serve /model_dir --port 8000 --trust-remote-code 2. Custom command Select (enter 1 or 2): ``` ### Step 2.4: Port ``` Port [8000]: 1. Use default 2. 
Custom Select (enter 1 or 2): ``` Default: `8000` --- ## Phase 3: Resources ### Step 3.1: Resource Type | # | Type | Description | |---|------|-------------| | 1 | Public Resource | On-demand, pay-as-you-go | | 2 | EAS Resource Group | Dedicated resource group | ### Step 3.2: Public Resource Group **Auto-filter logic** (based on image chip type from step 2.1): ``` Selected image chip type: GPU Filtering compatible instance types... ``` **Filter rules**: | Image Chip Type | jq Filter | Instance Type Pattern | |----------------|-----------|----------------------| | GPU | `.GPUAmount > 0` | `ecs.gn*` series (NVIDIA GPU) | | CPU | `.GPUAmount == 0` | Non-GPU instances | | PPU | `.InstanceType \| startswith("ecs.ebmppu")` | Hanguang instances | | XPU | `.InstanceType \| startswith("ecs.egs")` | XPU instances | **One-command query** (auto-select based on chip type): ```bash # Assuming CHIP_TYPE variable obtained from image labels aliyun eas describe-machine-spec --region <region> --user-agent AlibabaCloud-Agent-Skills | \ jq -r --arg chip "$CHIP_TYPE" ' .InstanceMetas[] | select(.IsAvailable == true) | if $chip == "GPU" then select(.GPUAmount > 0) | "\(.InstanceType)\t\(.GPUAmount)x\(.GPU)\t\(.CPU) cores\t\(.Memory)GB" elif $chip == "CPU" then select(.GPUAmount == 0) | "\(.InstanceType)\t-\t\(.CPU) cores\t\(.Memory)GB" elif $chip == "PPU" then select(.InstanceType | startswith("ecs.ebmppu")) | "\(.InstanceType)\tPPU\t\(.CPU) cores\t\(.Memory)GB" elif $chip == "XPU" then select(.InstanceType | startswith("ecs.egs")) | "\(.InstanceType)\tXPU\t\(.CPU) cores\t\(.Memory)GB" else empty end ' ``` **GPU image query** (chip type = GPU): ```bash aliyun eas describe-machine-spec --region <region> --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.InstanceMetas[] | select(.IsAvailable == true and .GPUAmount > 0) | "\(.InstanceType)\t\(.GPUAmount)x\(.GPU)\t\(.CPU) cores\t\(.Memory)GB"' ``` **CPU image query** (chip type = CPU): ```bash aliyun eas describe-machine-spec --region 
<region> --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.InstanceMetas[] | select(.IsAvailable == true and .GPUAmount == 0) | "\(.InstanceType)\t-\t\(.CPU) cores\t\(.Memory)GB"' ```

**Display format (paginated, max 10)**:

```
| # | Instance Type           | GPU    | CPU      | Memory |
|---|-------------------------|--------|----------|--------|
| 1 | ecs.gn6i-c4g1.xlarge    | 1×T4   | 4 cores  | 16GB   |
| 2 | ecs.gn7-c12g1.12xlarge  | 4×A10  | 12 cores | 192GB  |

Total: 45, showing 1-10
Enter number to select | 'n' next page | keyword filter
```

### Step 3.3: EAS Resource Group

**⚠️ MUST call `list-resources` to query the resource group list, even if the user already provided a resource group ID:**

```bash
# MUST call: list all resource groups
aliyun eas list-resources --region <region> --user-agent AlibabaCloud-Agent-Skills

# Same query, reduced to tab-separated ID, name, and GPU/CPU totals and usage for display
aliyun eas list-resources --region <region> --user-agent AlibabaCloud-Agent-Skills | \
  jq -r '.Resources[] | "\(.ResourceId)\t\(.ResourceName)\t\(.GpuCount)\t\(.GpuUsed)\t\(.CpuCount)\t\(.CpuUsed)"'
```

**Display format**:

```
| # | Resource ID | Name       | GPU (total/used/free) | CPU (total/used/free) |
|---|-------------|------------|-----------------------|-----------------------|
| 1 | eas-r-xxx   | production | 16/2/14               | 128/20/108            |
```

**Calculation**: free = total - used

### Step 3.4: Replicas

``` Replicas [1]: 1. Use default 2.
Custom Select (enter 1 or 2): ``` --- ## Phase 4: Network ### Step 4.1: Gateway Type | # | Type | VPC Config | Description | |---|------|-----------|-------------| | 1 | Shared Gateway | ❌ Not needed | Free, suitable for testing | | 2 | ALB Dedicated Gateway | ✅ Required | Recommended for production | | 3 | NLB | ✅ Required | High-performance load balancing | ### Step 4.2: ALB Dedicated Gateway Config **⚠️ MUST call `list-gateway` first to list gateways (even if user provided gateway ID), then call `describe-gateway` to get VPC info:** ```bash # MUST call: list all gateways aliyun eas list-gateway --region <region> --user-agent AlibabaCloud-Agent-Skills ``` **Display format**: ``` | # | Gateway Name | Gateway ID | |---|-------------|-----------| | 1 | prod-gw | gw-xxx | | 2 | test-gw | gw-yyy | Enter number to select: ``` **Get gateway VPC info**: ```bash aliyun eas describe-gateway --cluster-id cn-hangzhou --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills | \ jq '{VpcId: .LoadBalancerList[0].VpcId, VSwitchIds: .LoadBalancerList[0].VSwitchIds}' ``` **Select VSwitch**: ``` Obtained gateway VPC: vpc-xxx | # | VSwitch ID | Availability Zone | |---|-------------|-------------------| | 1 | vsw-xxx | Zone A | | 2 | vsw-yyy | Zone B | Enter number to select: ``` **Select security group**: ```bash aliyun ecs describe-security-groups --biz-region-id cn-hangzhou --vpc-id <vpc-id> --user-agent AlibabaCloud-Agent-Skills ``` See [Network Config](network-config.md) --- ## Phase 5: Service Features ### Step 5.1: Feature Selection **Do you need to configure service features?** | # | Option | Description | |---|--------|-------------| | 1 | Configure features | Select features to enable | | 2 | Skip | No features enabled (default) | **If "Configure features"**: ``` Available features (multi-select with comma): | # | Feature | Description | |---|---------|-------------| | 1 | Health Check | Configure startup/liveness probes | | 2 | Rolling Update | Graceful shutdown + 
rolling strategy | | 3 | GRPC | Enable GRPC protocol | | 4 | Autoscaling | Auto-adjust replicas based on load | Enter feature numbers to enable (e.g. 1,2,4), enter 'done' to finish: ``` **When modifying feature status**: ``` Current service feature config: | # | Feature | Status | |---|---------|--------| | 1 | Health Check | ✅ Enabled | | 2 | Rolling Update | ❌ Disabled | | 3 | GRPC | ❌ Disabled | | 4 | Autoscaling | ❌ Disabled | Enter number to toggle (e.g. enter 1 to disable health check), enter 'done' to finish: ``` See [Service Features Config](service-features.md) --- ## Phase 6: Deploy ### Step 6.1: Config Preview **Compatibility validation** (must pass before deployment): ``` ✅ Image chip type: GPU ✅ Instance type: ecs.gn7-c12g1.12xlarge (GPU instance) ✅ Compatibility: Passed ``` If compatibility check fails: ``` ❌ Image chip type: XPU ❌ Instance type: ecs.gn7-c12g1.12xlarge (GPU instance) ❌ Compatibility: Failed - XPU image requires XPU instance, please change image or instance type ``` ``` Generated service configuration: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ { "metadata": { "name": "mytest_006", "instance": 1 }, "containers": [{ "image": "...", "port": 8000 }], ... } ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Config Summary: • Service Name: mytest_006 • Image: vllm:0.14.0-gpu (Chip: GPU) • Instance: ecs.gn7-c12g1.12xlarge (4×A10, GPU) • Network: Shared Gateway • Compatibility: ✅ Passed Select: 1. Confirm deploy 2. Modify config 3. Save JSON to file 4. Cancel ``` ### Step 6.2: Modify Config **Select config item to modify**: | # | Config Item | Current Value | |---|------------|---------------| | 1 | Service Name | mytest_006 | | 2 | Workspace | aiworkspace_test | | 3 | Image | vllm:0.13.0rc2 | | 4 | Storage Mount | Not configured | | 5 | Startup Command | vllm serve ... 
| | 6 | Port | 8000 | | 7 | Instance Type | ecs.gn5-c8g1.2xlarge | | 8 | Replicas | 1 | | 9 | Network | ALB Dedicated Gateway | | 10 | Features | Health Check | | 11 | Go back | - |

Enter number to select:

### Step 6.3: Execute Deployment

**Save config to file, then deploy**:

```bash
# Save config to file
cat > /tmp/eas_service.json << 'EOF'
{
  "metadata": { "name": "my_service", "instance": 1 },
  "containers": [{ "image": "...", "port": 8000 }],
  "cloud": { "computing": { "instance_type": "ecs.gn6i-c4g1.xlarge" } }
}
EOF

# Execute deployment (may take 1-2 minutes)
# ⚠️ Do NOT use file:// prefix, use $(cat) to read file content
aliyun eas create-service --region <region> --body "$(cat /tmp/eas_service.json)" --user-agent AlibabaCloud-Agent-Skills
```

**⚠️ Note**: Use `$(cat file)` to read the file content for `--body`. Do NOT use the `file://` prefix (it may cause JSON parsing errors).

**⚠️ CRITICAL: If a service with the same name exists**: You MUST delete it first using `aliyun eas delete-service --cluster-id <region> --service-name <name> --user-agent AlibabaCloud-Agent-Skills`, wait for the deletion to complete, then recreate it following ALL steps (ListImages, describe-machine-spec, create-service, describe-service). NEVER just report the existing service status. You MUST always go through the full deployment workflow.
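Before calling `create-service`, it can help to check the saved JSON against rules stated elsewhere in these references: the service name format, the camelCase `autoscaler` keys, the NLB listener port restriction, and the trailing `/` on OSS paths. The sketch below shows one way to do that in Python. `validate_service_config` and its messages are illustrative names, not part of the Aliyun CLI or any EAS SDK, and the function checks only those four rules.

```python
import json
import re

# Name rule from the config field reference: lowercase letters, digits,
# underscores, 3-63 characters.
NAME_RE = re.compile(r"[a-z0-9_]{3,63}")

# snake_case spellings the autoscaler section lists as wrong -> correct key.
AUTOSCALER_WRONG_KEYS = {
    "min_replica": "min",
    "max_replica": "max",
    "scale_strategies": "scaleStrategies",
}


def validate_service_config(config: dict) -> list:
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []

    name = config.get("metadata", {}).get("name", "")
    if not NAME_RE.fullmatch(name):
        problems.append(f"metadata.name {name!r} must match [a-z0-9_]{{3,63}}")

    autoscaler = config.get("autoscaler", {})
    for wrong, right in AUTOSCALER_WRONG_KEYS.items():
        if wrong in autoscaler:
            problems.append(f"autoscaler.{wrong} should be autoscaler.{right}")
    for strategy in autoscaler.get("scaleStrategies", []):
        if "metric_name" in strategy:
            problems.append("scaleStrategies[].metric_name should be metricName")

    for nlb in config.get("networking", {}).get("nlb", []):
        if nlb.get("listener_port") == 8080:
            problems.append("networking.nlb[].listener_port cannot be 8080")

    for mount in config.get("storage", []):
        path = mount.get("oss", {}).get("path")
        if path is not None and not (path.startswith("oss://") and path.endswith("/")):
            problems.append(f"OSS path {path!r} must look like oss://bucket/path/")

    return problems


if __name__ == "__main__":
    # Deliberately broken sample: hyphenated name, snake_case autoscaler key,
    # NLB listener on 8080, OSS path without a trailing slash.
    sample = json.loads("""
    {
      "metadata": {"name": "my-service", "instance": 1},
      "containers": [{"image": "...", "port": 8000}],
      "storage": [{"mount_path": "/model_dir", "oss": {"path": "oss://my-bucket/models"}}],
      "networking": {"nlb": [{"id": "default", "listener_port": 8080, "netType": "intranet"}]},
      "autoscaler": {"min_replica": 1, "max": 4}
    }
    """)
    for problem in validate_service_config(sample):
        print(problem)
```

Returning a list of messages rather than raising on the first failure lets the config preview show every issue at once, so the user can fix them in a single pass before deployment.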
### Step 6.4: Wait for Service Ready

```bash
# Poll every 30 seconds, up to 6 times (180 seconds in total)
for i in $(seq 1 6); do
  STATUS=$(aliyun eas describe-service --cluster-id <region> --service-name <service-name> \
    --user-agent AlibabaCloud-Agent-Skills | jq -r '.Status')
  # Quote the variable so an empty or multi-word status cannot break the case statement
  case "$STATUS" in
    Running) echo "✅ Service ready"; break ;;
    Failed)  echo "❌ Service startup failed"; break ;;
    *)       echo "⏳ Status: $STATUS ($((i*30))s/180s)"; sleep 30 ;;
  esac
done
```

### Step 6.5: Display Deployment Result

**MUST include InternetEndpoint and IntranetEndpoint from the describe-service response.**

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Deployment Result

Service Name: mytest_006
Status: Running
InternetEndpoint: http://xxx.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/xxx
IntranetEndpoint: http://xxx.vpc.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/xxx
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

**⚠️ You MUST output the exact field names "InternetEndpoint" and "IntranetEndpoint" with their actual values from the describe-service API response.**

### Step 6.6: Service Invocation Example

``` ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Service Invocation Examples Endpoint: http://xxx.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/mytest_006 [curl] curl http://xxx/api/predict/mytest_006 -H "Content-Type: application/json" -d '{"input": "Hello"}' [OpenAI SDK] client = OpenAI(base_url="http://xxx/v1", ...)
See [Service Invocation Examples](service-invoke-examples.md) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ``` ### Step 6.7: Failure Handling **Query events**: ```bash aliyun eas describe-service-event --cluster-id <region> --service-name <service-name> --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.Events[-3:][] | "[\(.Type)] \(.Reason): \(.Message)"' ``` **Common errors**: | Error Type | Cause | Solution | |-----------|-------|---------| | ImagePullBackOff | Image pull failed | Check image address and permissions | | CrashLoopBackOff | Container startup failed | Check startup command and model path | | Instance crashed | Chip type mismatch | Image chip type must match instance type (GPU image → GPU instance) | | InsufficientResources | Resource shortage | Choose different instance type or resource group | | ModelNotFound | Wrong model path | Check OSS mount path | FILE:references/image-categories.md # Official Image Categories > **Usage**: After user selects a category, call the API to query images in that category. ## Category List ``` | # | Category | Description | Frameworks | |---|-------------------|-------------------------------|----------------------------------| | 1 | LLM Inference | Large language model inference | vLLM, SGLang | | 2 | Image Generation | Text-to-image, image processing| ComfyUI, Stable Diffusion, Kohya | | 3 | Speech Synthesis | TTS voice synthesis | CosyVoice | | 4 | RAG | Retrieval-augmented generation | PaiRag | | 5 | General Inference | General deep learning inference| PyTorch, Triton, TF Serving | | 6 | Tools | General tool images | OpenClaw, CoPaw, Python | ``` ## Category Details ### 1. 
LLM Inference **Flow**: Let user select framework first, then query **Framework selection**: ``` | # | Framework | Description | |---|-----------|------------------------| | 1 | vLLM | High-performance LLM inference | | 2 | SGLang | Efficient LLM serving framework | ``` **Query commands** (call once after user selects framework): ```bash # vLLM - fuzzy search by Name aliyun aiworkspace list-images --verbose true --name vllm \ --labels 'system.official=true,system.supported.eas=true' \ --page-size 50 --user-agent AlibabaCloud-Agent-Skills | jq -r '.Images[] | "\(.ImageId)\t\(.Name)\t\(.Labels[] | select(.Key == "system.chipType") | .Value)"' # SGLang - fuzzy search by Name aliyun aiworkspace list-images --verbose true --name sglang \ --labels 'system.official=true,system.supported.eas=true' \ --page-size 50 --user-agent AlibabaCloud-Agent-Skills | jq -r '.Images[] | "\(.ImageId)\t\(.Name)\t\(.Labels[] | select(.Key == "system.chipType") | .Value)"' ``` **Image info**: | Framework | Versions | Chips | |-----------|----------|-------| | vLLM | 0.7.x - 0.14.0 | GPU, PPU | | SGLang | 0.5.2 - 0.5.8 | GPU, PPU | ### 2. Image Generation **ComfyUI**: Node-based image generation workflow - Versions: 2.1, 2.2 - Chip: GPU **Stable Diffusion WebUI**: Classic SD interface - Versions: 4.1, 4.2 - Chip: GPU **Kohya**: Model training tool - Versions: 2.2, 25.0.3 - Chip: GPU **EasyAnimate**: Video generation - Versions: 1.1.4, 1.1.5 **Query command**: ```bash # ComfyUI - fuzzy search by Name aliyun aiworkspace list-images --verbose true --name comfyui \ --labels 'system.official=true,system.supported.eas=true' \ --page-size 50 --user-agent AlibabaCloud-Agent-Skills | jq -r '.Images[] | "\(.ImageId)\t\(.Name)"' ``` ### 3. 
Speech Synthesis

**CosyVoice**: Alibaba speech synthesis

- Versions: 0.1.5, 2.2.6, 3.0.6
- Components: webui, backend, frontend
- Chips: GPU, PPU

**Query command**:

```bash
# CosyVoice - filter by Name with jq (case-insensitive)
aliyun aiworkspace list-images --verbose true \
  --labels 'system.official=true,system.supported.eas=true' \
  --page-size 100 --user-agent AlibabaCloud-Agent-Skills | jq '.Images[] | select(.Name | test("cosyvoice"; "i"))'
```

### 4. RAG

**PaiRag**: PAI RAG framework

- Versions: 0.3.5, 0.4.3
- Chips: CPU, PPU

**Query command**:

```bash
# PaiRag - fuzzy search by Name
aliyun aiworkspace list-images --verbose true --name pairag \
  --labels 'system.official=true,system.supported.eas=true' \
  --page-size 50 --user-agent AlibabaCloud-Agent-Skills | jq -r '.Images[] | "\(.ImageId)\t\(.Name)"'
```

### 5. General Inference

**PyTorch**: General deep learning framework

- Versions: 2.2, 2.3.1, 2.5.1, 2.7.1
- Chips: GPU, CPU, PPU

**Triton Server**: NVIDIA inference server

- Versions: 21.09 - 25.03
- Chip: GPU

**TensorFlow Serving**: TF model serving

- Versions: 1.15.0, 2.11.1, 2.14.1, 2.17.1, 2.18.1

**Query command**:

```bash
# PyTorch - fuzzy search by Name
aliyun aiworkspace list-images --verbose true --name pytorch \
  --labels 'system.official=true,system.supported.eas=true' \
  --page-size 50 --user-agent AlibabaCloud-Agent-Skills | jq -r '.Images[] | "\(.ImageId)\t\(.Name)"'
```

### 6.
Tools **OpenClaw**: PAI tool image - Version: 2026.3.13 - Chip: CPU **CoPaw**: CoPaw tool image - Version: v0.0.7 - Chip: CPU **Query command**: ```bash # OpenClaw - fuzzy search by Name aliyun aiworkspace list-images --verbose true --name openclaw \ --labels 'system.official=true,system.supported.eas=true' \ --page-size 50 --user-agent AlibabaCloud-Agent-Skills | jq -r '.Images[] | "\(.ImageId)\t\(.Name)"' ``` --- ## Query Parameter Reference **Labels filter syntax**: ```bash # Multiple labels separated by comma, format: Key=Value --labels 'system.official=true,system.supported.eas=true' # Fuzzy search image name via Name parameter --name vllm ``` **Common labels**: | Label Key | Description | |-----------|-------------| | `system.official` | `true` = official image | | `system.supported.eas` | `true` = supports EAS deployment | | `system.chipType` | `GPU`/`CPU`/`PPU`/`XPU` = chip type | | `system.framework.xxx` | Framework type label, value = version | FILE:references/model-image-matching.md # Model-Image Matching Guide > **Important**: Before selecting an image, confirm your model type matches the image type. Mismatch will cause deployment failure! ## Quick Matching Table | Model Type | Recommended Image Category | Specific Image | Model Format | |-----------|--------------------------|---------------|-------------| | **LLM** (Qwen, Llama, Mistral, Baichuan, etc.) 
| LLM Inference | vLLM, SGLang | HuggingFace (safetensors) | | **LLM** (quantized) | LLM Inference | llama.cpp | GGUF | | **Image Generation** (Stable Diffusion, SDXL) | Image Generation | ComfyUI, SD WebUI | .safetensors, .ckpt | | **Speech Synthesis** (TTS) | Speech Synthesis | CosyVoice | Model-specific format | | **RAG Applications** | RAG | PaiRag | - | | **Custom Inference** | General Inference | PyTorch, Triton | .pt, .pth, .onnx | ## Common Errors ### ❌ Wrong Example ``` Model: Qwen3.5-0.8B (LLM) Image: ComfyUI (Image Generation) Result: Deployment failed, image does not support LLM inference ``` ### ✅ Correct Example ``` Model: Qwen3.5-0.8B (LLM) Image: vLLM or SGLang Result: Deployment successful ``` ## Detailed Matching Rules ### 1. LLM Models **HuggingFace Format** (.safetensors, model.bin): ``` Image: vLLM, SGLang Model directory structure: ├── config.json ├── model.safetensors (or model.bin) ├── tokenizer.json ├── tokenizer_config.json └── vocab.json ``` **GGUF Format**: ``` Image: llama.cpp Model file: *.gguf ``` **Note**: Do not use vLLM for GGUF models; do not use llama.cpp for HuggingFace format. ### 2. Image Generation Models **Stable Diffusion Series**: ``` Image: ComfyUI, Stable Diffusion WebUI Model format: .safetensors, .ckpt Storage path: /models/checkpoints/ or /models/stable-diffusion/ ``` **Note**: LLM models cannot be used with image generation images. ### 3. Speech Synthesis Models **CosyVoice**: ``` Image: CosyVoice Requires: Pre-trained model files ``` ## Model Download Sources | Source | Format | Description | |--------|--------|-------------| | HuggingFace | safetensors | Most common, natively supported by vLLM/SGLang | | ModelScope | safetensors | Faster access in China, HuggingFace compatible | | HuggingFace GGUF | GGUF | Quantized models, requires llama.cpp | ## How to Determine Model Type 1. 
**Check file extension**: - `.safetensors` + `config.json` → HuggingFace LLM - `.gguf` → GGUF LLM - `.ckpt` / `.safetensors` (no config.json) → Image model 2. **Check model name**: - Contains `llama`, `qwen`, `mistral`, `baichuan` → LLM - Contains `sd`, `stable-diffusion`, `sdxl` → Image generation - Contains `cosyvoice`, `tts` → Speech synthesis 3. **Check source page**: - HuggingFace model page indicates model type ## Chip Type Compatibility (Important) > **⚠️ Image chip type must match instance type, otherwise deployment fails!** ### Image Chip Types Images use `system.chipType` label to indicate supported chips: | Chip Type | Description | Instance Type Prefix | |-----------|-------------|---------------------| | **GPU** | NVIDIA GPU | `ecs.gn`, `ecs.gn6`, `ecs.gn7`, `ecs.gn8` | | **CPU** | CPU only | Non-GPU instances | | **PPU** | Alibaba Hanguang chip | `ecs.ebmppu` | | **XPU** | Alibaba XPU | `ecs.egs` | ### Compatibility Matrix | Image Chip Type | Compatible Instance Types | |----------------|--------------------------| | GPU | gn6i, gn6v, gn7, gn6e, gn8 (NVIDIA GPU instances) | | CPU | Any non-GPU instance | | PPU | ebmppu (Hanguang instances) | | XPU | egs (XPU instances) | ### Query Image Chip Type ```bash aliyun aiworkspace get-image --image-id <image-id> --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.Labels[] | select(.Key == "system.chipType") | .Value' ``` ### Query Instance Type GPU Info ```bash aliyun eas describe-machine-spec --region <region> --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.InstanceMetas[] | select(.InstanceType == "<instance-type>") | {InstanceType, GPUAmount, GPU}' ``` ### Common Errors ``` ❌ Wrong: Image: vllm:0.14.0-xpu (XPU chip) Instance: ecs.gn7-c12g1.12xlarge (GPU instance) Result: Instance crashed - chip type mismatch ✅ Correct: Image: vllm:0.14.0-gpu (GPU chip) Instance: ecs.gn7-c12g1.12xlarge (GPU instance) Result: Deployment successful ``` --- ## Pre-deployment Checklist ``` □ Confirm model type (LLM / Image / 
Speech / Custom) □ Select matching image category □ Confirm model format is compatible with image □ Confirm image chip type matches instance type □ Confirm model file completeness (config.json, tokenizer, etc.) □ Confirm model path is mounted correctly ``` FILE:references/network-config.md # Network Configuration Rules **Table of Contents** - [Network Config Requirements](#network-config-requirements) - [Consistency Requirements](#consistency-requirements) - [ALB Dedicated Gateway](#alb-dedicated-gateway) - [NLB Configuration](#nlb-configuration) - [Shared Gateway](#shared-gateway) ## Network Config Requirements | Gateway Type | `cloud.networking` | Description | |-------------|-------------------|-------------| | **Shared Gateway** | ❌ Not required | For testing, not recommended for production, no VPC config needed | | **ALB Dedicated Gateway** | ✅ **Required** | VPC/VSwitch obtained from gateway | | **NLB** | ✅ **Required** | VPC/VSwitch must be consistent with NLB | ## Consistency Requirements ``` When deploying with ALB/NLB: ├── cloud.networking.vpc_id ← Must match gateway/NLB VPC ├── cloud.networking.vswitch_id ← Must match gateway/NLB VSwitch └── cloud.networking.security_group_id ← Must be in the same VPC as VSwitch ⚠️ Important: VPC, VSwitch, and security group must all be under the same VPC! ``` ## ALB Dedicated Gateway ### Config Flow 1. Select gateway (from `list-gateway`) 2. Call `DescribeGateway` to get VPC/VSwitch 3. 
Select security group under the gateway VPC ### Config Example ```json { "networking": { "gateway": "gw-abc123" }, "cloud": { "networking": { "vpc_id": "{from gateway}", "vswitch_id": "{from gateway, comma-separated if multiple}", "security_group_id": "{user selected}" } } } ``` ### API Calls ```bash # List gateways aliyun eas list-gateway --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills # Get gateway details (includes VPC and VSwitch info) aliyun eas describe-gateway --cluster-id cn-hangzhou --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills # Key fields in response: # .LoadBalancerList[0].VpcId → VPC ID # .LoadBalancerList[0].VSwitchIds → VSwitch list ``` **Correct jq command to get VPC and comma-separated VSwitch ID**: ```bash aliyun eas describe-gateway --cluster-id cn-hangzhou --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills | \ jq '{ vpc_id: .LoadBalancerList[0].VpcId, vswitch_id: (.LoadBalancerList[0].VSwitchIds | join(",")) }' ``` **Response example**: ```json { "vpc_id": "vpc-bp13kiflgde6v9dc9smc8", "vswitch_id": "vsw-bp1bhmnwqdh1ta9z9klms,vsw-bp1lz95xtmjiwqcpq31ng" } ``` **Query security groups** (after getting VPC): ```bash aliyun ecs describe-security-groups --biz-region-id cn-hangzhou --vpc-id {gateway_vpc} --user-agent AlibabaCloud-Agent-Skills | \ jq '.SecurityGroups[] | "\(.SecurityGroupId)\t\(.SecurityGroupName)"' ``` --- ## NLB Configuration ### Two Modes | Mode | ID | Description | |------|-----|-------------| | **System NLB** | `"default"` | System auto-creates, lifecycle follows service (recommended) | | **Custom NLB** | Actual NLB ID | Associate with user's existing NLB instance | ### System NLB (Recommended) System automatically creates an NLB instance under your account. The NLB lifecycle follows the service. 
**Config example**:

```json
{
  "networking": {
    "nlb": [{
      "id": "default",
      "listener_port": 9090,
      "netType": "intranet"
    }]
  },
  "cloud": {
    "networking": {
      "vpc_id": "vpc-xxx",
      "vswitch_id": "vsw-zone-a,vsw-zone-b",
      "security_group_id": "sg-xxx"
    }
  }
}
```

**Parameter description**:

| Parameter | Description |
|-----------|-------------|
| `id` | Fixed as `"default"` |
| `listener_port` | Listener port (cannot be 8080) |
| `netType` | `intranet` or `internet` |
| `vswitch_id` | **Comma-separated, must include at least 2 VSwitches across different zones** (e.g. `"vsw-xxx-a,vsw-xxx-b"`) |

**Multi-port config supported**:

```json
{
  "networking": {
    "nlb": [
      {"id": "default", "listener_port": 9090, "netType": "intranet"},
      {"id": "default", "listener_port": 9091, "netType": "internet"}
    ]
  }
}
```

**Config flow**:

1. Select "System NLB"
2. Configure VPC info (reuse high-speed direct connect VPC/VSwitch)
3. Select network type (intranet/internet/both)
4. Configure listener port

**⚠️ Important rules**:
- `vswitch_id` must be **comma-separated with at least 2 VSwitches across different availability zones** (e.g. `"vsw-zone-a,vsw-zone-b"`)
- Port 8080 is reserved by EAS engine, cannot be used
- VPC, VSwitch, and security group must all be under the same VPC
- System NLB lifecycle follows the service

### Custom NLB

Associate with user's existing NLB instance.
**Config example**:

```json
{
  "networking": {
    "nlb": [{
      "id": "nlb-fpj3530zrbt7x5zkhk",
      "listener_port": 9090
    }]
  },
  "cloud": {
    "networking": {
      "vpc_id": "vpc-xxx",
      "vswitch_id": "vsw-zone-a,vsw-zone-b",
      "security_group_id": "sg-xxx"
    }
  }
}
```

**Requirements**:
- NLB must be in the same VPC as the service
- `vswitch_id` must be comma-separated with at least 2 VSwitches across different zones
- Port 8080 cannot be used
- Port must not conflict with existing NLB listeners
- Custom NLB and system NLB are mutually exclusive (modifying will disassociate the old one)

### NLB API Calls

```bash
# Query existing NLBs
aliyun nlb list-load-balancers --biz-region-id cn-hangzhou --user-agent AlibabaCloud-Agent-Skills

# Get NLB details (includes VPC and VSwitch info)
aliyun nlb get-load-balancer-attribute --load-balancer-id nlb-xxx --user-agent AlibabaCloud-Agent-Skills | \
  jq '{vpc_id: .VpcId, vswitch_id: ([.ZoneMappings[].VswitchId] | join(","))}'
```

**Response example**:

```json
{
  "vpc_id": "vpc-xxx",
  "vswitch_id": "vsw-xxx-a,vsw-xxx-b"
}
```

**Query security groups** (after getting NLB VPC):

```bash
# -r prints raw strings so that \t becomes a real tab instead of an escaped sequence
aliyun ecs describe-security-groups --biz-region-id cn-hangzhou --vpc-id {nlb_vpc} --user-agent AlibabaCloud-Agent-Skills | \
  jq -r '.SecurityGroups[] | "\(.SecurityGroupId)\t\(.SecurityGroupName)"'
```

**⚠️ Important rules**:
- VPC, VSwitch, and security group must all be under the NLB's VPC
- Security group must be in the **same VPC** as NLB
- Port 8080 is reserved by EAS engine, cannot be used

### Access URL

```
Access URL: <NLB_domain>:<listener_port>/api/predict/<service_name>
Get domain: https://nlb.console.aliyun.com/
```

## Shared Gateway

No `cloud.networking` config is needed; the service uses the default mode.

```json
{
  "metadata": { "name": "my-service" },
  "containers": [...]
  // No networking or cloud.networking fields needed
}
```

FILE:references/ram-policies.md

# RAM Policies

Minimum RAM permissions required for PAI-EAS service deployment.
## Minimum Permission Policy

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eas:CreateService",
        "eas:DescribeService",
        "eas:ListServices",
        "eas:DescribeServiceEndpoints",
        "eas:DescribeServiceEvent",
        "eas:DescribeMachineSpec",
        "eas:ListResources",
        "eas:ListGateway",
        "eas:DescribeGateway"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "aiworkspace:ListWorkspaces",
        "aiworkspace:ListImages",
        "aiworkspace:GetImage",
        "aiworkspace:ListDatasets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "oss:ListBuckets",
        "oss:GetBucketLocation",
        "oss:ListObjects"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "vpc:DescribeVpcs",
        "vpc:DescribeVSwitches"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeSecurityGroups"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "nlb:ListLoadBalancers"
      ],
      "Resource": "*"
    }
  ]
}
```

## Permission Details

### EAS Service Permissions

| Action | Description | Use Case |
|--------|-------------|----------|
| `eas:CreateService` | Create service | Deploy new service |
| `eas:DescribeService` | Query service details | View status, get endpoints |
| `eas:ListServices` | List services | View existing services |
| `eas:DescribeServiceEndpoints` | Query service endpoints | Get access URL |
| `eas:DescribeServiceEvent` | Query service events | Diagnose issues |
| `eas:DescribeMachineSpec` | Query machine specs | Select instance type |
| `eas:ListResources` | List resource groups | Query dedicated resources |
| `eas:ListGateway` | List gateways | Query available gateways |
| `eas:DescribeGateway` | Query gateway details | Get gateway VPC info |

### AIWorkSpace Permissions

| Action | Description | Use Case |
|--------|-------------|----------|
| `aiworkspace:ListWorkspaces` | List workspaces | Select workspace |
| `aiworkspace:ListImages` | List images | Query available images |
| `aiworkspace:GetImage` | Get image details | View image config |
| `aiworkspace:ListDatasets` | List datasets | Select dataset mount |

### OSS Permissions

| Action | Description | Use Case |
|--------|-------------|----------|
| `oss:ListBuckets` | List buckets | Select OSS storage |
| `oss:GetBucketLocation` | Get bucket region | Verify cross-region access |
| `oss:ListObjects` | List objects | Browse model files |

### VPC Permissions

| Action | Description | Use Case |
|--------|-------------|----------|
| `vpc:DescribeVpcs` | Query VPCs | Network config |
| `vpc:DescribeVSwitches` | Query VSwitches | Network config |

### ECS Permissions

| Action | Description | Use Case |
|--------|-------------|----------|
| `ecs:DescribeSecurityGroups` | Query security groups | Network config |

### NLB Permissions

| Action | Description | Use Case |
|--------|-------------|----------|
| `nlb:ListLoadBalancers` | List load balancers | NLB network config |

## Extended Operations Permissions (Optional)

For more complete PAI-EAS operations (update, delete services, etc.), add these permissions:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eas:UpdateService",
        "eas:DeleteService",
        "eas:ScaleService",
        "eas:DescribeServiceLog",
        "eas:ResumeService",
        "eas:SuspendService"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "aiworkspace:ListImages",
        "aiworkspace:ListImageLabels",
        "aiworkspace:GetImage",
        "aiworkspace:ListWorkspaces",
        "aiworkspace:ListDatasets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "vpc:DescribeVpcs",
        "vpc:DescribeVSwitches",
        "ecs:DescribeSecurityGroups"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "oss:ListBuckets",
        "oss:GetBucketLocation",
        "oss:ListObjects"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "nlb:ListLoadBalancers",
        "nlb:GetLoadBalancerAttribute"
      ],
      "Resource": "*"
    }
  ]
}
```

## Permission Check

List the policies attached to a RAM user (the login profile only describes console login settings, not permissions):

```bash
aliyun ram list-policies-for-user --user-name <username>
```

Or check user authorization policies via the RAM Console.
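Once a policy document has been exported (for example, copied from the RAM Console), the comparison against the minimum policy above can be scripted. The following is a minimal sketch under stated assumptions, not an official tool: the helper names `granted_patterns` and `missing_actions` are hypothetical, wildcard handling is simplified to shell-style matching, and `Deny` statements, `Resource` scopes, and conditions are ignored.

```python
import json
from fnmatch import fnmatchcase

# Actions copied from the "Minimum Permission Policy" section of this document.
REQUIRED_ACTIONS = [
    "eas:CreateService", "eas:DescribeService", "eas:ListServices",
    "eas:DescribeServiceEndpoints", "eas:DescribeServiceEvent",
    "eas:DescribeMachineSpec", "eas:ListResources", "eas:ListGateway",
    "eas:DescribeGateway",
    "aiworkspace:ListWorkspaces", "aiworkspace:ListImages",
    "aiworkspace:GetImage", "aiworkspace:ListDatasets",
    "oss:ListBuckets", "oss:GetBucketLocation", "oss:ListObjects",
    "vpc:DescribeVpcs", "vpc:DescribeVSwitches",
    "ecs:DescribeSecurityGroups",
    "nlb:ListLoadBalancers",
]


def granted_patterns(policy):
    """Collect the Action entries of every Allow statement (hypothetical helper)."""
    patterns = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue  # Assumption: Deny statements are not evaluated in this sketch.
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # A single action may be given as a plain string.
        patterns.extend(actions)
    return patterns


def missing_actions(policy):
    """Return the required actions that no Allow pattern covers, in list order."""
    patterns = granted_patterns(policy)
    return [
        action for action in REQUIRED_ACTIONS
        # Simplification: "eas:*" or "*" is matched with shell-style wildcards.
        if not any(fnmatchcase(action, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # Illustrative policy only; replace with the exported policy document.
    example = json.loads("""
    {
      "Version": "1",
      "Statement": [
        {"Effect": "Allow", "Action": ["eas:*"], "Resource": "*"},
        {"Effect": "Allow", "Action": "oss:ListBuckets", "Resource": "*"}
      ]
    }
    """)
    for action in missing_actions(example):
        print("missing:", action)
```

An empty result only means every required action name is covered by some `Allow` pattern; actual authorization also depends on the parts of the policy this sketch skips.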
FILE:references/related-apis.md

# Related API List

All APIs and CLI commands involved in PAI-EAS service deployment.

## EAS Service APIs

| API | CLI Command | Description |
|-----|------------|-------------|
| CreateService | `aliyun eas create-service` | Create service |
| DescribeService | `aliyun eas describe-service` | Query service details |
| ListServices | `aliyun eas list-services` | List services |
| DescribeServiceEndpoints | `aliyun eas describe-service-endpoints` | Query service endpoint |
| DescribeServiceEvent | `aliyun eas describe-service-event` | Query service events |
| DescribeMachineSpec | `aliyun eas describe-machine-spec` | Query machine specs |
| ListResources | `aliyun eas list-resources` | List resource groups |
| ListGateway | `aliyun eas list-gateway` | List gateways |
| DescribeGateway | `aliyun eas describe-gateway` | Query gateway details |

## AIWorkSpace APIs

| API | CLI Command | Description |
|-----|------------|-------------|
| ListWorkspaces | `aliyun aiworkspace list-workspaces` | List workspaces |
| ListImages | `aliyun aiworkspace list-images` | List images |
| GetImage | `aliyun aiworkspace get-image` | Get image details |
| ListDatasets | `aliyun aiworkspace list-datasets` | List datasets |

## OSS APIs

| API | CLI Command | Description |
|-----|------------|-------------|
| ListBuckets | `ossutil ls` | List buckets |
| GetBucketLocation | `ossutil stat` | Get bucket region |
| ListObjects | `ossutil ls oss://bucket/` | List objects |

## VPC APIs

| API | CLI Command | Description |
|-----|------------|-------------|
| DescribeVpcs | `aliyun vpc describe-vpcs` | Query VPCs |
| DescribeVSwitches | `aliyun vpc describe-vswitches` | Query VSwitches |

## ECS APIs

| API | CLI Command | Description |
|-----|------------|-------------|
| DescribeSecurityGroups | `aliyun ecs describe-security-groups` | Query security groups |

## NLB APIs

| API | CLI Command | Description |
|-----|------------|-------------|
| ListLoadBalancers | `aliyun nlb list-load-balancers` | List load balancers |

---

## CLI Command Details

### EAS Service Operations

```bash
# ⚠️ Do NOT use file:// prefix, use $(cat) to read file content
aliyun eas create-service --region cn-hangzhou --body "$(cat service.json)" --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-service --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills
aliyun eas list-services --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-machine-spec --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun eas list-resources --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun eas list-gateway --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-gateway --cluster-id cn-hangzhou --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills
```

### AIWorkSpace Operations

```bash
aliyun aiworkspace list-workspaces --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun aiworkspace list-images --verbose true --labels 'system.official=true,system.supported.eas=true' --page-size 50 --user-agent AlibabaCloud-Agent-Skills
aliyun aiworkspace get-image --image-id image-xxx --user-agent AlibabaCloud-Agent-Skills
aliyun aiworkspace list-datasets --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
```

### OSS Operations

```bash
ossutil ls
ossutil ls oss://bucket-name/path/
```

### VPC Operations

```bash
aliyun vpc describe-vpcs --biz-region-id cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
aliyun vpc describe-vswitches --biz-region-id cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
```

### ECS Operations

```bash
aliyun ecs describe-security-groups --biz-region-id cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
```

### NLB Operations

```bash
aliyun nlb list-load-balancers --biz-region-id cn-hangzhou --user-agent AlibabaCloud-Agent-Skills
```

---

## SDK Call Metadata

If you need to use the Python Common SDK instead of the CLI, here are the API metadata:

| Service | API | popCode | popVersion |
|---------|-----|---------|------------|
| EAS | CreateService | eas | 2021-07-01 |
| EAS | DescribeService | eas | 2021-07-01 |
| EAS | ListServices | eas | 2021-07-01 |
| EAS | DescribeServiceEndpoints | eas | 2021-07-01 |
| EAS | DescribeServiceEvent | eas | 2021-07-01 |
| EAS | DescribeMachineSpec | eas | 2021-07-01 |
| EAS | ListResources | eas | 2021-07-01 |
| EAS | ListGateway | eas | 2021-07-01 |
| EAS | DescribeGateway | eas | 2021-07-01 |
| NLB | ListLoadBalancers | nlb | 2022-04-30 |
| AIWorkSpace | ListWorkspaces | aiworkspace | 2021-02-04 |
| AIWorkSpace | ListImages | aiworkspace | 2021-02-04 |
| AIWorkSpace | GetImage | aiworkspace | 2021-02-04 |
| AIWorkSpace | ListDatasets | aiworkspace | 2021-02-04 |
| OSS | ListBuckets | oss | 2019-05-17 |
| OSS | GetBucketLocation | oss | 2019-05-17 |
| OSS | ListObjects | oss | 2019-05-17 |
| VPC | DescribeVpcs | vpc | 2016-04-28 |
| VPC | DescribeVSwitches | vpc | 2016-04-28 |
| ECS | DescribeSecurityGroups | ecs | 2014-05-26 |

FILE:references/service-features.md

# Service Features Configuration Guide

Detailed configuration guide for health check, rolling update, GRPC, and autoscaling.

**Table of Contents**
- [Feature Selection Interaction](#feature-selection-interaction)
- [Health Check](#1-health-check)
- [Rolling Update](#2-rolling-update)
- [GRPC](#3-grpc)
- [Autoscaling](#4-autoscaling)
- [Combined Feature Example](#5-combined-feature-example)

---

## Feature Selection Interaction

```
| # | Feature | Default Status |
|---|---------|---------------|
| 1 | Health Check | ✅ Enabled |
| 2 | Rolling Update | ✅ Enabled |
| 3 | GRPC | ❌ Disabled |
| 4 | Autoscaling | ❌ Disabled |

Select features to enable (multi-select with comma, e.g. 1,2,4), or press Enter for defaults:
```

**User input handling**:
- Enter numbers (e.g. `1,3,4`) → Enable corresponding features
- Press Enter → Use defaults (health check + rolling update)
- Enter `none` → Disable all features

---

## 1. Health Check

Health check monitors whether the service is running normally, including startup and liveness probes.

### Startup Probe (startup_check)

Checks whether the service is ready after startup.

### Parameter Description

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| http_get.path | string | `/` | HTTP check path |
| http_get.port | int | 8000 | Check port |
| initial_delay_seconds | int | 60 | Initial delay (how long after startup to begin checking) |
| period_seconds | int | 10 | Check interval |
| timeout_seconds | int | 1 | Single check timeout |
| success_threshold | int | 1 | Success threshold (consecutive successes to consider healthy) |
| failure_threshold | int | 30 | Failure threshold (consecutive failures to consider unhealthy) |

### JSON Config

```json
{
  "containers": [{
    "startup_check": {
      "http_get": {
        "path": "/health",
        "port": 8000
      },
      "initial_delay_seconds": 60,
      "period_seconds": 10,
      "timeout_seconds": 1,
      "success_threshold": 1,
      "failure_threshold": 30
    }
  }]
}
```

### TCP Check

```json
{
  "containers": [{
    "startup_check": {
      "tcp_socket": {
        "port": 8000
      },
      "initial_delay_seconds": 60,
      "period_seconds": 10
    }
  }]
}
```

### Interaction Flow

```
Configure health check parameters:

Check type:
1. HTTP (recommended)
2. TCP

HTTP check path [/health]:
1. Use default
2. Custom

Initial delay seconds [60]:
1. Use default
2. Custom

Check interval seconds [10]:
1. Use default
2. Custom

Failure threshold [30]:
1. Use default
2. Custom
```

---

## 2. Rolling Update

Rolling update ensures zero-downtime service upgrades through graceful shutdown and a rolling strategy.
### Parameter Description

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| termination_grace_period | int | 30 | Graceful shutdown wait time (seconds) |
| max_surge | int | 1 | Max new instances during rolling |
| max_unavailable | int | 0 | Max unavailable instances during rolling |
| enable_sigterm | bool | true | Enable SIGTERM signal |

### JSON Config

```json
{
  "metadata": {
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 0
    }
  },
  "eas": {
    "termination_grace_period": 30
  },
  "rpc": {
    "enable_sigterm": true
  }
}
```

### Parameter Details

**termination_grace_period**:
- How long to wait after receiving stop signal before force termination
- Recommended: 30-120 seconds
- Should be longer than time to process current requests

**max_surge**:
- How many new instances can be started during rolling update
- Higher value = faster update, but more resource consumption
- Recommended: 1-2

**max_unavailable**:
- Max instances that can be unavailable during rolling update
- Set to 0 for zero-downtime update
- Recommended: 0 (production)

### Interaction Flow

```
Configure rolling update parameters:

Graceful shutdown wait (seconds) [30]:
1. Use default
2. Custom

Max new instances [1]:
1. Use default
2. Custom

Max unavailable instances [0]:
1. Use default
2. Custom
```

---

## 3. GRPC

Enable GRPC protocol support. Service will support both HTTP and GRPC calls.

### JSON Config

```json
{
  "metadata": {
    "enable_grpc": true
  }
}
```

### Description

- Once enabled, service listens for both HTTP and GRPC requests
- GRPC port defaults to same as HTTP port
- Suitable for high-performance RPC scenarios

### Interaction Flow

```
GRPC protocol enabled. Service will support both HTTP and GRPC calls.
```

---

## 4. Autoscaling

Auto-adjust service replicas based on load.
### ⚠️ Important: Field Naming Convention

**EAS API uses camelCase**, ensure correct field names:

| ✅ Correct Field Name | ❌ Wrong Field Name | Description |
|----------------------|---------------------|-------------|
| `min` | ~~`min_replica`~~ | Min replicas |
| `max` | ~~`max_replica`~~ | Max replicas |
| `scaleStrategies` | ~~`scale_strategies`~~ | Scaling strategy array |
| `metricName` | ~~`metric_name`~~ | Metric name |

**Using wrong field names will cause config to be ignored, falling back to defaults (min=0, max=0)!**

### Parameter Description

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| min | int | 1 | Min replicas |
| max | int | 10 | Max replicas |
| metricName | string | qps | Scaling metric: qps, cpu, gpu, memory |
| threshold | int | 100 | Scaling threshold |

### JSON Config

```json
{
  "autoscaler": {
    "min": 1,
    "max": 10,
    "scaleStrategies": [
      {"metricName": "qps", "threshold": 100}
    ]
  }
}
```

### Scaling Metrics

| Metric | Description | Recommended Threshold |
|--------|-------------|----------------------|
| qps | Queries per second | 50-200 |
| cpu | CPU utilization (%) | 70-85 |
| gpu | GPU utilization (%) | 70-85 |
| memory | Memory utilization (%) | 70-85 |

### Interaction Flow

```
Configure autoscaling parameters:

Min replicas (min) [1]:
1. Use default
2. Custom

Max replicas (max) [10]:
1. Use default
2. Custom

Scaling metric (metricName):
1. qps (recommended)
2. cpu
3. gpu
4. memory
Select metric (enter number):

Scaling threshold [100]:
1. Use default
2. Custom
```

### Config Examples

**Example 1: QPS-based autoscaling**

```json
{
  "autoscaler": {
    "min": 1,
    "max": 5,
    "scaleStrategies": [
      {"metricName": "qps", "threshold": 50}
    ]
  }
}
```

**Example 2: Multi-metric autoscaling**

```json
{
  "autoscaler": {
    "min": 2,
    "max": 10,
    "scaleStrategies": [
      {"metricName": "qps", "threshold": 100},
      {"metricName": "cpu", "threshold": 80}
    ]
  }
}
```

### ⚠️ Common Mistakes

**Mistake 1: Using snake_case naming**

```json
// ❌ Wrong - fields will be ignored
{
  "autoscaler": {
    "min_replica": 1,
    "max_replica": 10,
    "scale_strategies": [
      {"metric_name": "qps", "threshold": 100}
    ]
  }
}

// ✅ Correct - use camelCase
{
  "autoscaler": {
    "min": 1,
    "max": 10,
    "scaleStrategies": [
      {"metricName": "qps", "threshold": 100}
    ]
  }
}
```

**Mistake 2: Setting min and max to same value**

```json
// ❌ Wrong - cannot scale with equal min and max
{
  "autoscaler": {
    "min": 5,
    "max": 5
  }
}

// ✅ Correct - max > min
{
  "autoscaler": {
    "min": 1,
    "max": 5
  }
}
```

---

## 5. Combined Feature Example

Enable health check, rolling update, GRPC, and autoscaling together:

```json
{
  "metadata": {
    "name": "my-service",
    "instance": 2,
    "enable_grpc": true,
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 0
    }
  },
  "eas": {
    "termination_grace_period": 30
  },
  "rpc": {
    "enable_sigterm": true
  },
  "containers": [{
    "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu",
    "port": 8000,
    "startup_check": {
      "http_get": {"path": "/health", "port": 8000},
      "initial_delay_seconds": 60,
      "period_seconds": 10,
      "failure_threshold": 30
    }
  }],
  "autoscaler": {
    "min": 1,
    "max": 5,
    "scaleStrategies": [
      {"metricName": "qps", "threshold": 50}
    ]
  }
}
```

---

*References*:
- [Health Check Docs](https://help.aliyun.com/zh/pai/user-guide/advanced-configuration-health-check)
- [Rolling Update Docs](https://help.aliyun.com/zh/pai/user-guide/scrolling-updates-with-graceful-exit)
- [Autoscaling Docs](https://help.aliyun.com/zh/pai/user-guide/autoscaling)

*Last updated*: 2025-03-21 - Fixed autoscaler field naming

FILE:references/service-invoke-examples.md

# Service Invocation Examples

PAI-EAS service invocation examples.

---

## Endpoint Formats

| Type | Format | Description |
|------|--------|-------------|
| Public | `http://{service_name}.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/{service_name}` | Public internet access |
| Internal | `http://{service_name}.gw-xxx-vpc.cn-hangzhou.pai-eas.aliyuncs.com/` | VPC internal access (faster and more stable) |

---

## Invocation Methods

### Method 1: HTTP Invocation (curl)

```bash
curl http://xxx.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/my_service \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello, please introduce yourself"}'
```

### Method 2: OpenAI SDK Compatible Invocation

For vLLM, SGLang and other OpenAI API compatible images.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://xxx.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/my_service/v1",
    # EAS token if Token auth is configured; otherwise any non-empty placeholder,
    # because the OpenAI client refuses to start without an api_key
    api_key="xxx"
)

response = client.chat.completions.create(
    model="/model_dir",  # Model mount path
    messages=[{"role": "user", "content": "Hello, please introduce yourself"}],
    max_tokens=100
)
print(response.choices[0].message.content)
```

### Method 3: Python requests

```python
import requests

url = "http://xxx.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/my_service"
headers = {"Content-Type": "application/json"}
data = {"input": "Hello, please introduce yourself"}

response = requests.post(url, json=data, headers=headers)
print(response.json())
```

### Method 4: Internal Network Invocation

Use the internal endpoint within the same VPC for faster and more stable access:

```bash
curl http://xxx.gw-xxx-vpc.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/my_service \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello"}'
```

---

## Image-specific Endpoints

| Image Type | Endpoint | Description |
|-----------|----------|-------------|
| vLLM | `/v1/chat/completions` | OpenAI compatible API |
| SGLang | `/v1/chat/completions` | OpenAI compatible API |
| ComfyUI | `/prompt` | See image docs |
| SD WebUI | `/sdapi/v1/txt2img` | See image docs |
| Custom | `/` | Depends on implementation |

---

## Authentication

| Gateway Type | Authentication Method |
|-------------|----------------------|
| Shared Gateway | No auth by default, publicly accessible |
| ALB/NLB | Token auth can be configured, add `Authorization: Bearer <token>` header |

---

## vLLM Complete Example

```python
from openai import OpenAI

# Service endpoint
base_url = "http://my-service.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/my_service/v1"

client = OpenAI(
    base_url=base_url,
    api_key="your-token"  # EAS token; use any non-empty placeholder if auth is disabled
)

# Chat completion
response = client.chat.completions.create(
    model="/model_dir",  # Model mount path
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Introduce Alibaba Cloud PAI-EAS"}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)
```

---

## FAQ

**Q: Getting 404 response?**
A: Check that the service name and endpoint are correct, and ensure the service status is Running.

**Q: Getting 401 response?**
A: Check if Token auth is configured, add `Authorization: Bearer <token>` header.

**Q: Cannot access internal endpoint?**
A: Ensure the caller and service are in the same VPC.

FILE:references/storage-mount.md

# Storage Mount Configuration Guide

> Reference: https://help.aliyun.com/zh/pai/user-guide/mount-storage-to-services

EAS services support multiple storage mount methods to mount model files, code files, or other config files into service instances.

**⚠️ Important: OSS models MUST be mounted via storage config as local paths (e.g. `/model_dir`), then use local paths in startup commands. Never pass an `oss://` URL directly to vllm/sglang commands!**

**Table of Contents**
- [Storage Type Selection Guide](#storage-type-selection-guide)
- [Supported Storage Types](#supported-storage-types)
- [OSS Mount](#oss-mount)
- [Dataset Mount](#dataset-mount)
- [NAS Mount](#nas-mount)
- [CPFS Mount](#cpfs-mount)
- [Full Config Example](#full-config-example)
- [Notes](#notes)

---

## Storage Type Selection Guide

| Data Type | Recommended Storage | Description |
|-----------|-------------------|-------------|
| Models, images, videos (primarily read) | **OSS** | Most common |
| Frequent small file I/O, shared read/write across instances | NAS | General-purpose NAS |
| HPC, AI training, ultra-low latency needed | Ultra-fast NAS / CPFS | High throughput |
| PAI registered datasets | **Dataset** | Public AI assets |
| Git repositories | Git mount | Read-only |
| PAI registered code | Code config | Public AI assets |
| PAI registered models | PAI models | Read-only |

---

## Supported Storage Types

| Type | Status | Description |
|------|--------|-------------|
| OSS | ✅ Supported | Most common, mount OSS bucket |
| NAS | ✅ Supported | General-purpose NAS, ultra-fast NAS |
| CPFS | ✅ Supported | CPFS (requires Lingjun quota) |
| Dataset | ✅ Supported | PAI dataset (OSS type only) |
| Git mount | ❌ Not supported | Git repositories |
| Code config | ❌ Not supported | PAI code sets |
| PAI models | ❌ Not supported | PAI registered models |
| Image mount | ❌ Not supported | Docker Image mount |
| EmptyDir | ❌ Not supported | Local temp directory |

---

## OSS Mount

### Config Example

```json
{
  "storage": [{
    "mount_path": "/mnt/data/",
    "oss": {
      "path": "oss://bucket-name/path/",
      "readOnly": true
    }
  }]
}
```

### Parameter Description

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `mount_path` | string | ✅ | Target path in service instance, e.g. `/mnt/data/` |
| `oss.path` | string | ✅ | OSS path, format `oss://bucket/path/` (note trailing `/`) |
| `oss.readOnly` | bool | ❌ | Read-only, default false |

### Interaction Flow

```
Step 2.2: Storage Mount

1. Do you need to mount storage?
| # | Option | Description |
|---|--------|-------------|
| 1 | Yes | Mount model files or data |
| 2 | No (skip) | No storage mount |

2. Select storage type:
| # | Storage Type | Description |
|---|-------------|-------------|
| 1 | OSS | Mount OSS bucket (most common) |
| 2 | Dataset | Use PAI dataset |

3. OSS config:
   a. Query bucket list: ossutil ls
   b. Select bucket (paginated, max 10)
   c. List directory: ossutil ls oss://bucket-name/
   d. Select model directory
   e. Mount path [/model_dir]:
      1. Use default
      2. Custom
   f. Permissions [read-only]:
      1. Read-only (recommended)
      2. Read-write
```

### Multiple OSS Mounts

```json
{
  "storage": [
    {
      "mount_path": "/models",
      "oss": {
        "path": "oss://my-bucket/models/",
        "readOnly": true
      }
    },
    {
      "mount_path": "/data",
      "oss": {
        "path": "oss://my-bucket/data/",
        "readOnly": false
      }
    }
  ]
}
```

### FAQ

**Q: Mounted OSS but getting file not found error?**
A: Usually a path issue. For example:
- Mount `oss://my-bucket/` to `/mnt/data`
- File in OSS: `oss://my-bucket/subfolder/myfile.txt`
- Container access path: `/mnt/data/subfolder/myfile.txt` (not `/mnt/data/myfile.txt`)

**Q: Can I mount OSS from a different region?**
A: No. EAS cannot cross-region mount OSS. Use OSS cross-region replication to sync data to the same region.

---

## Dataset Mount

Mount PAI datasets as public AI assets.
**⚠️ Only OSS-type custom datasets are supported for mounting.** ### Config Example ```json { "storage": [{ "mount_path": "/mnt/dataset/", "dataset": { "id": "d-pcsah1t86bm8xxxx", "version": "v1", "read_only": true } }] } ``` ### Parameter Description | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `mount_path` | string | ✅ | Target path in service instance | | `dataset.id` | string | ✅ | Dataset ID | | `dataset.version` | string | ❌ | Dataset version, e.g. `v1` | | `dataset.read_only` | bool | ❌ | Read-only, default true | ### Query Datasets ```bash aliyun aiworkspace list-datasets --region cn-hangzhou ``` ### Create Datasets Reference: [Create and manage datasets](https://help.aliyun.com/zh/pai/user-guide/create-and-manage-datasets) --- ## NAS Mount NAS mount (general-purpose and ultra-fast NAS) only supports same-region intranet mount. Requires direct network connectivity to the NAS VSwitch. ### Config Example ```json { "storage": [{ "mount_path": "/mnt/data/", "nfs": { "path": "/", "server": "06ba74****-a****.cn-hangzhou.nas.aliyuncs.com", "readOnly": false } }] } ``` ### Parameter Description | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `mount_path` | string | ✅ | Target path in service instance, e.g. `/mnt/data/` | | `nfs.server` | string | ✅ | NAS mount point address | | `nfs.path` | string | ✅ | Source path in NAS, e.g. `/` | | `nfs.readOnly` | bool | ❌ | Read-only, default false | | `nfs.resourceGroup` | string | ❌ | File system resource group | ### Interaction Flow ``` NAS Mount Config: 1. NAS mount point: 06ba74****-a****.cn-hangzhou.nas.aliyuncs.com 2. File system path [/]: 1. Use default 2. Custom 3. Mount path [/mnt/data/]: 1. Use default 2. Custom 4. Permissions [read-only]: 1. Read-only 2. 
Read-write ``` ### Notes - **Same-region only**: NAS only supports same-region intranet mount - **Network config**: Requires direct network connectivity to the NAS VSwitch - **Console**: View file system ID and mount point at [NAS Console](https://nasnext.console.aliyun.com/) - **Network setup**: See [Network Configuration](https://help.aliyun.com/zh/pai/user-guide/configure-network-connectivity) --- ## CPFS Mount CPFS file system mount, suitable for HPC and AI training scenarios. ### Config Example ```json { "storage": [{ "mount_path": "/mnt/data/", "nfs": { "path": "/", "server": "cpfs-xxx.cn-hangzhou.cpfs.aliyuncs.com", "readOnly": false } }] } ``` ### Notes - **Quota requirement**: Only supported when deploying EAS services with Lingjun quota - **Same-region only**: CPFS only supports same-region intranet mount - **Network config**: Requires direct network connectivity to the CPFS VSwitch --- ## Git Mount (Not Supported) ```json { "storage": [{ "mount_path": "/mnt/data/", "git": { "repo": "https://codeup.aliyun.com/xxx/eas/aitest.git", "branch": "master", "commit": "xxx", "username": "username", "password": "password or access token" } }] } ``` --- ## EmptyDir Mount (Not Supported) Read/write to local disk during instance runtime. Content persists across instance restarts. ```json { "storage": [{ "mount_path": "/data_image", "empty_dir": {} }] } ``` ### Shared Memory Config ```json { "storage": [{ "mount_path": "/dev/shm", "empty_dir": { "medium": "memory", "size_limit": 20 } }] } ``` --- ## Full Config Example ```json { "metadata": { "name": "my-service", "instance": 1 }, "containers": [{ "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/vllm:0.14.0-gpu", "port": 8000 }], "storage": [ { "mount_path": "/model_dir", "oss": { "path": "oss://my-bucket/models/qwen/", "readOnly": true } }, { "mount_path": "/mnt/dataset/", "dataset": { "id": "d-xxx", "version": "v1", "read_only": true } } ] } ``` --- ## Notes 1. **Mount paths cannot overlap** 2. 
**Mount paths cannot be system reserved paths**: `/`, `/bin`, `/etc`, `/usr`, etc. 3. **OSS path format must be correct**: `oss://bucket/path/`, note the trailing `/` 4. **Permissions**: Model files should use `readOnly: true` to prevent accidental modification 5. **Without storage mount**: Files downloaded to the instance are stored on system disk and will be cleared on restart or update FILE:references/verification-method.md # Verification Methods How to verify PAI-EAS service deployment success. ## Deployment Verification Flow ### Step 1: Verify Service Creation ```bash aliyun eas describe-service \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills ``` **Expected result**: - Response JSON contains `ServiceId` - `Status` field is `Creating` or `Running` ### Step 2: Wait for Service Ready ```bash for i in $(seq 1 6); do STATUS=$(aliyun eas describe-service \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills | jq -r '.Status') case $STATUS in Running) echo "✅ Service ready"; break ;; Failed) echo "❌ Service startup failed"; break ;; *) echo "⏳ Status: $STATUS ($((i*30))s/180s)"; sleep 30 ;; esac done ``` **Expected result**: - Service status becomes `Running` ### Step 3: Get Service Endpoints ```bash aliyun eas describe-service \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills | jq '{ Status: .Status, InternetEndpoint: .InternetEndpoint, IntranetEndpoint: .IntranetEndpoint, ServiceId: .ServiceId }' ``` **Expected result**: - `InternetEndpoint` or `IntranetEndpoint` has value ### Step 4: Verify Service Accessibility **HTTP service verification**: ```bash curl -s -o /dev/null -w "%{http_code}" \ "http://<endpoint>/health" ``` **Expected result**: - HTTP status code 200 **vLLM service verification**: ```bash curl http://<endpoint>/v1/models ``` **Expected result**: - Returns model list JSON ### Step 5: View Service Events 
(optional) If service startup fails, check events: ```bash aliyun eas describe-service-event \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills ``` --- ## Config Verification ### Verify Image Config ```bash aliyun aiworkspace get-image \ --image-id <image-id> \ --user-agent AlibabaCloud-Agent-Skills | jq '.EasConfig' ``` ### Verify Resource Config ```bash aliyun eas describe-machine-spec \ --cluster-id cn-hangzhou \ --user-agent AlibabaCloud-Agent-Skills | jq ".InstanceMetas[] | select(.InstanceType == \"<instance-type>\")" ``` ### Verify Storage Config ```bash ossutil ls oss://<bucket>/<path>/ ``` --- ## Service Invocation Verification ### Service Console After successful deployment, view details in PAI Console: ``` https://pai.console.aliyun.com/?regionId=<region>&workspaceId=<workspace-id>#/eas/serviceDetail/<service-name>/detail ``` **Example**: ``` https://pai.console.aliyun.com/?regionId=cn-hangzhou&workspaceId=1111#/eas/serviceDetail/qwen35_7b_prod/detail ``` ### Get Service Token Service invocation requires Token authentication. 
Get from console service detail page, or via API: ```bash aliyun eas describe-service \ --cluster-id <region> \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills | jq -r '.Token' ``` ### Shared Gateway Invocation ```bash curl "http://<endpoint>/api/predict/<service-name>" \ -H "Content-Type: application/json" \ -H "Authorization: <token>" \ -d '{"input": "test"}' ``` **Example** (token redacted): ```bash curl "http://1227512831780489.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/qwen35_7b_prod" \ -H "Content-Type: application/json" \ -H "Authorization: ZmYxMmMz***YWY4ZDU=" \ -d '{"input": "Hello"}' ``` ### Dedicated Gateway Invocation ```bash curl "http://<service-name>.<gateway-id>-vpc.<region>.pai-eas.aliyuncs.com/" \ -H "Content-Type: application/json" \ -H "Authorization: <token>" \ -d '{"input": "test"}' ``` ### OpenAI Compatible Interface ```python from openai import OpenAI client = OpenAI( base_url="http://<endpoint>/v1", api_key="<token>" ) response = client.chat.completions.create( model="<model-name>", messages=[{"role": "user", "content": "Hello"}] ) print(response.choices[0].message.content) ``` ### vLLM Service Invocation ```bash curl "http://<endpoint>/v1/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: <token>" \ -d '{ "model": "/models", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100 }' ``` --- ## Troubleshooting ### Service Status Check ```bash aliyun eas describe-service \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills | jq '{Status, Message, Reason}' ``` ### View Service Logs ```bash aliyun eas describe-service-log \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills ``` ### Check Resource Config - Confirm instance type is available - Confirm resource group has sufficient quota - Confirm VPC/VSwitch/security group config is correct ### Check Image Config - Confirm image URI is 
correct
- Confirm startup command is correct
- Confirm port config is correct

### Check Storage Config

- Confirm OSS path exists
- Confirm mount paths don't conflict
- Confirm permissions are correct

FILE:scripts/create-service-from-json.sh
#!/bin/bash
# ==============================================================================
# PAI-EAS Service Deployment Script
# Create PAI-EAS service using JSON configuration
#
# Usage:
#   ./create-service-from-json.sh --config service.json
#   ./create-service-from-json.sh --config-json '{"metadata":{"name":"xxx"...}}'
#   ./create-service-from-json.sh --config service.json --region cn-hangzhou
#
# Output:
#   JSON format, containing service creation result
# ==============================================================================
set -euo pipefail

# Parameter parsing
CONFIG_FILE=""
CONFIG_JSON=""
REGION=""
WAIT_READY=true
TIMEOUT=300

while [[ $# -gt 0 ]]; do
  case $1 in
    --config)
      CONFIG_FILE="$2"
      shift 2
      ;;
    --config-json)
      CONFIG_JSON="$2"
      shift 2
      ;;
    --region)
      REGION="$2"
      shift 2
      ;;
    --no-wait)
      WAIT_READY=false
      shift
      ;;
    --timeout)
      TIMEOUT="$2"
      shift 2
      ;;
    -h|--help)
      cat <<EOF
Usage:
  $0 --config service.json
  $0 --config-json '{"metadata":{"name":"xxx"...}}'
  $0 --config service.json --region cn-hangzhou

Parameters:
  --config FILE        JSON config file path
  --config-json JSON   JSON config string
  --region REGION      Region, defaults to ALIBABACLOUD_REGION env var
  --no-wait            Do not wait for service to be ready
  --timeout SECONDS    Wait timeout in seconds, default 300
  -h, --help           Show help information

Environment Variables:
  ALIBABACLOUD_REGION  Region (if --region not specified)

Examples:
  # Deploy from file
  $0 --config service.json --region cn-hangzhou

  # Deploy from JSON string
  $0 --config-json '{"metadata":{"name":"test"},"containers":[{"image":"..."}]}'

  # Do not wait for ready
  $0 --config service.json --no-wait

Output format:
  Success: {"success":true,"service_id":"...","service_name":"...","status":"Running"}
  Failure: {"success":false,"error":"...","service_name":"..."}
EOF
      exit 0
      ;;
    *)
      echo "Unknown parameter: $1" >&2
      exit 1
      ;;
  esac
done

# Fall back to the ALIBABACLOUD_REGION environment variable when --region is not given
if [[ -z "$REGION" ]]; then
  REGION="${ALIBABACLOUD_REGION:-}"
fi

if [[ -z "$REGION" ]]; then
  echo '{"success":false,"error":"Please specify --region or set ALIBABACLOUD_REGION environment variable"}'
  exit 1
fi

# Get script directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Read config
if [[ -n "$CONFIG_FILE" ]]; then
  if [[ ! -f "$CONFIG_FILE" ]]; then
    echo "{\"success\":false,\"error\":\"Config file does not exist: $CONFIG_FILE\"}"
    exit 1
  fi
  CONFIG=$(cat "$CONFIG_FILE")
elif [[ -n "$CONFIG_JSON" ]]; then
  CONFIG="$CONFIG_JSON"
else
  echo '{"success":false,"error":"Please provide --config or --config-json parameter"}'
  exit 1
fi

# Validate config
echo "Validating config..." >&2
# Run the validator inside the `if` so that a validation failure is reported
# as JSON instead of aborting the script under `set -e`
if ! VALIDATION=$("$SCRIPT_DIR/validate-service-config.sh" --config-json "$CONFIG" --fix 2>/dev/null); then
  ERROR=$(echo "$VALIDATION" | jq -r '.errors | join("; ")')
  echo "{\"success\":false,\"error\":\"Config validation failed: $ERROR\"}"
  exit 1
fi

# Extract fixed config
FIXED_CONFIG=$(echo "$VALIDATION" | jq -c '.fixed_config')

# Extract service name
SERVICE_NAME=$(echo "$FIXED_CONFIG" | jq -r '.metadata.name')
echo "Service name: $SERVICE_NAME" >&2
echo "Config validation passed" >&2

# Create service (`|| true` keeps `set -e` from aborting before the failure branch below)
echo "Creating service..." >&2
CREATE_RESULT=$(aliyun eas create-service \
  --cluster-id "$REGION" \
  --body "$FIXED_CONFIG" \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy \
  2>&1) || true

# Check creation result
if echo "$CREATE_RESULT" | jq -e '.ServiceId' > /dev/null 2>&1; then
  SERVICE_ID=$(echo "$CREATE_RESULT" | jq -r '.ServiceId')
  echo "Service created successfully: $SERVICE_ID" >&2

  # Wait for service ready
  if [[ "$WAIT_READY" == "true" ]]; then
    echo "Waiting for service to be ready..." >&2
    START_TIME=$(date +%s)

    while true; do
      CURRENT_TIME=$(date +%s)
      ELAPSED=$((CURRENT_TIME - START_TIME))

      if [[ $ELAPSED -gt $TIMEOUT ]]; then
        echo "{\"success\":false,\"error\":\"Timeout waiting for service to be ready\",\"service_id\":\"$SERVICE_ID\",\"service_name\":\"$SERVICE_NAME\"}"
        exit 1
      fi

      # Query service status (a failed query is retried rather than aborting the script)
      SERVICE_INFO=$(aliyun eas describe-service \
        --cluster-id "$REGION" \
        --service-name "$SERVICE_NAME" \
        --user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy \
        2>&1) || true

      if ! echo "$SERVICE_INFO" | jq -e '.Status' > /dev/null 2>&1; then
        echo "Failed to query service status, retrying..." >&2
        sleep 5
        continue
      fi

      STATUS=$(echo "$SERVICE_INFO" | jq -r '.Status')
      MESSAGE=$(echo "$SERVICE_INFO" | jq -r '.Message // empty')

      case $STATUS in
        "Running")
          echo "Service is ready" >&2
          ENDPOINT=$(echo "$SERVICE_INFO" | jq -r '.InternetEndpoint // empty')
          INTRANET_ENDPOINT=$(echo "$SERVICE_INFO" | jq -r '.IntranetEndpoint // empty')
          echo "{\"success\":true,\"service_id\":\"$SERVICE_ID\",\"service_name\":\"$SERVICE_NAME\",\"status\":\"Running\",\"internet_endpoint\":\"$ENDPOINT\",\"intranet_endpoint\":\"$INTRANET_ENDPOINT\"}"
          exit 0
          ;;
        "Failed")
          echo "Service startup failed: $MESSAGE" >&2
          echo "{\"success\":false,\"error\":\"Service startup failed: $MESSAGE\",\"service_id\":\"$SERVICE_ID\",\"service_name\":\"$SERVICE_NAME\"}"
          exit 1
          ;;
        *)
          echo "Waiting for service to be ready... ($ELAPSED/$TIMEOUT sec) Status: $STATUS" >&2
          sleep 5
          ;;
      esac
    done
  else
    # Do not wait, return immediately
    echo "{\"success\":true,\"service_id\":\"$SERVICE_ID\",\"service_name\":\"$SERVICE_NAME\",\"status\":\"Creating\"}"
    exit 0
  fi
else
  # Creation failed; fall back to the raw CLI output when it is not JSON
  ERROR_MSG=$(echo "$CREATE_RESULT" | jq -r '.Message // .' 2>/dev/null || echo "$CREATE_RESULT")
  echo "Service creation failed: $ERROR_MSG" >&2
  echo "{\"success\":false,\"error\":\"Service creation failed: $ERROR_MSG\",\"service_name\":\"$SERVICE_NAME\"}"
  exit 1
fi

FILE:scripts/list-images.sh
#!/bin/bash
# PAI-EAS Official Image List Query Script
# Uses ListImages and ListImageLabels APIs to fetch image information
set -e

# Default parameters (empty filters mean "no filter")
REGION="cn-hangzhou"
FRAMEWORK=""
CHIP_TYPE=""
VERBOSE="false"

# Color output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Help information
show_help() {
  cat <<EOF
PAI-EAS Official Image List Query Tool

Usage: $0 [options]

Options:
  -r, --region <region>    Region (default: cn-hangzhou)
  -f, --framework <name>   Filter by framework (vLLM, SGLang, PyTorch, TensorFlow, PaiRag, CosyVoice)
  -c, --chip <type>        Filter by chip type (CPU, GPU, PPU)
  -v, --verbose            Show detailed information
  -h, --help               Show help information

Examples:
  # List all official EAS-supported images
  $0

  # List vLLM images
  $0 -f vLLM

  # List GPU type images
  $0 -c GPU

  # List vLLM GPU images (verbose)
  $0 -f vLLM -c GPU -v

Framework types:
  vLLM        - LLM inference acceleration
  SGLang      - Structured output inference
  PyTorch     - Deep learning framework
  TensorFlow  - Deep learning framework
  PaiRag      - RAG knowledge QA
  CosyVoice   - Voice synthesis
  ModelScope  - ModelScope community images
EOF
}

# Parse parameters
while [[ $# -gt 0 ]]; do
  case $1 in
    -r|--region)
      REGION="$2"
      shift 2
      ;;
    -f|--framework)
      FRAMEWORK="$2"
      shift 2
      ;;
    -c|--chip)
      CHIP_TYPE="$2"
      shift 2
      ;;
    -v|--verbose)
      VERBOSE="true"
      shift
      ;;
    -h|--help)
      show_help
      exit 0
      ;;
    *)
      echo -e "${RED}Unknown parameter: $1${NC}"
      show_help
      exit 1
      ;;
  esac
done

echo -e "${BLUE}=== PAI-EAS Official Image List ===${NC}"
echo ""

# Build label filter conditions
LABELS="system.official=true,system.supported.eas=true"
FRAMEWORK_FILTER=""

if [ -n "$FRAMEWORK" ]; then
  # Framework filter - filter in results
  FRAMEWORK_FILTER="$FRAMEWORK"
fi

if [ -n "$CHIP_TYPE" ]; then
  LABELS="$LABELS,system.chipType=$CHIP_TYPE"
fi

echo -e "${YELLOW}Query parameters:${NC}"
echo "  Region: $REGION"
[ -n "$FRAMEWORK" ] && echo "  Framework: $FRAMEWORK"
[ -n "$CHIP_TYPE" ] && echo "  Chip: $CHIP_TYPE"
echo ""

# Call ListImages API
echo -e "${YELLOW}Querying image list...${NC}"
IMAGES=$(aliyun aiworkspace list-images \
  --verbose true \
  --labels "$LABELS" \
  --page-size 100 \
  --user-agent AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-deploy)

# Check if successful
if [ -z "$IMAGES" ] || [ "$(echo "$IMAGES" | jq -r '.RequestId' 2>/dev/null)" = "null" ]; then
  echo -e "${RED}Query failed, please check network connection and permissions${NC}"
  echo "Error: $IMAGES"
  exit 1
fi

# Count images
TOTAL=$(echo "$IMAGES" | jq -r '.Images | length')

if [ "$TOTAL" -eq 0 ]; then
  echo -e "${YELLOW}No matching images found${NC}"
  exit 0
fi

echo -e "${GREEN}Found $TOTAL images${NC}"
echo ""

# Display image list
if [ "$VERBOSE" = "true" ]; then
  # Verbose mode
  if [ -n "$FRAMEWORK_FILTER" ]; then
    # Filter by framework
    echo "$IMAGES" | jq -r --arg fw "$FRAMEWORK_FILTER" '
      [.Images[] | select(.Labels[] | select(.Key | contains($fw))) | {
        name: .Name,
        uri: .ImageUri,
        description: .Description,
        chipType: (([.Labels[] | select(.Key == "system.chipType") | .Value] | first) // "N/A"),
        framework: (([.Labels[] | select(.Key | startswith("system.framework.")) | .Value] | first) // "N/A"),
        port: (([.Labels[] | select(.Key == "system.eas.default.port") | .Value] | first) // "8000"),
        script: (([.Labels[] | select(.Key == "system.eas.default.script") | .Value] | first) // ""),
        latest: (([.Labels[] | select(.Key == "system.eas.deploy.latest") | .Value] | first) // "false")
      }] | sort_by(.latest == "false", .name) | .[] |
      "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n" +
      "Name: \(.name)\n" +
      "Type: [\(.chipType)] \(.framework)\n" +
      "Image: \(.uri)\n" +
      "Description: \(.description)\n" +
      "Port: \(.port)\n" +
      (if .script != "" then "Script: \(.script)\n" else "" end) +
      (if .latest == "true" then "Tag: ⭐ Latest\n" else "" end)'
  else
    echo "$IMAGES" | jq -r '
      [.Images[] | {
        name: .Name,
        uri: .ImageUri,
        description: .Description,
        chipType: (([.Labels[] | select(.Key == "system.chipType") | .Value] | first) // "N/A"),
        framework: (([.Labels[] | select(.Key | startswith("system.framework.")) | .Value] | first) // "N/A"),
        port: (([.Labels[] | select(.Key == "system.eas.default.port") | .Value] | first) // "8000"),
        script: (([.Labels[] | select(.Key == "system.eas.default.script") | .Value] | first) // ""),
        latest: (([.Labels[] | select(.Key == "system.eas.deploy.latest") | .Value] | first) // "false")
      }] | sort_by(.latest == "false", .name) | .[] |
      "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n" +
      "Name: \(.name)\n" +
      "Type: [\(.chipType)] \(.framework)\n" +
      "Image: \(.uri)\n" +
      "Description: \(.description)\n" +
      "Port: \(.port)\n" +
      (if .script != "" then "Script: \(.script)\n" else "" end) +
      (if .latest == "true" then "Tag: ⭐ Latest\n" else "" end)'
  fi
else
  # Compact mode
  if [ -n "$FRAMEWORK_FILTER" ]; then
    # Filter by framework
    echo -e "${BLUE}Image list:${NC}"
    echo "$IMAGES" | jq -r --arg fw "$FRAMEWORK_FILTER" '
      [.Images[] | select(.Labels[] | select(.Key | contains($fw))) | {
        name: .Name,
        chipType: (([.Labels[] | select(.Key == "system.chipType") | .Value] | first) // "N/A"),
        framework: (([.Labels[] | select(.Key | startswith("system.framework.")) | .Value] | first) // "N/A"),
        latest: (([.Labels[] | select(.Key == "system.eas.deploy.latest") | .Value] | first) // "false"),
        description: .Description
      }] | sort_by(.latest == "false", .name) | .[] |
      "  [\(.chipType)] \(.framework) - \(.name)" + (if .latest == "true" then " ⭐" else "" end)'
  else
    echo -e "${BLUE}Image list:${NC}"
    echo "$IMAGES" | jq -r '
      [.Images[] | {
        name: .Name,
        chipType: (([.Labels[] | select(.Key == "system.chipType") | .Value] | first) // "N/A"),
        framework: (([.Labels[] | select(.Key | startswith("system.framework.")) | .Value] | first) // "N/A"),
        latest: (([.Labels[] | select(.Key == "system.eas.deploy.latest") | .Value] | first) // "false"),
        description: .Description
      }] | sort_by(.latest == "false", .name) | .[] |
      "  [\(.chipType)] \(.framework) - \(.name)" + (if .latest == "true" then " ⭐" else "" end)'
  fi
fi

echo ""
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo -e "${YELLOW}Tips:${NC}"
echo "  Use -v for detailed information"
echo "  Use -f <framework> to filter by framework"
echo "  Use -c <chip_type> to filter by CPU/GPU/PPU"
echo ""

FILE:scripts/validate-service-config.sh
#!/bin/bash
# ==============================================================================
# PAI-EAS Service Config Validator
# Validate service JSON config, apply defaults, output validation result
#
# Usage:
#   ./validate-service-config.sh --config service.json
#   ./validate-service-config.sh --config service.json --fix
#   ./validate-service-config.sh --config-json '{"metadata":{"name":"xxx"...}}'
#   ./validate-service-config.sh --config service.json --output service-fixed.json
#
# Output:
#   JSON format, containing validation result and fixed config
# ==============================================================================
set -euo pipefail

# Parameter parsing
CONFIG_FILE=""
CONFIG_JSON=""
FIX_MODE=false
OUTPUT_FILE=""

while [[ $# -gt 0 ]]; do
  case $1 in
    --config)
      CONFIG_FILE="$2"
      shift 2
      ;;
    --config-json)
      CONFIG_JSON="$2"
      shift 2
      ;;
    --fix)
      FIX_MODE=true
      shift
      ;;
    --output)
      OUTPUT_FILE="$2"
      shift 2
      ;;
    -h|--help)
      cat <<EOF
Usage:
  $0 --config service.json
  $0 --config service.json --fix
  $0 --config-json '{"metadata":{"name":"xxx"...}}'

Parameters:
  --config FILE      JSON config file path
  --config-json JSON JSON config string
  --fix              Output the fixed complete config
  --output FILE      Output
                     fixed config to file
  -h, --help         Show help information

Validation rules:
  1. Required field check: metadata.name, containers[].image
  2. Service name format: ^[a-z0-9_]+$
  3. Image URI format check
  4. Resource config completeness check
  5. Storage mount check (warning for official images)
  6. Apply defaults (port, instance count, etc.)

Output format:
  {
    "valid": true/false,
    "errors": [...],
    "warnings": [...],
    "fixed_config": {...}  // output in --fix mode
  }
EOF
      exit 0
      ;;
    *)
      echo "Unknown parameter: $1" >&2
      exit 1
      ;;
  esac
done

# Read config
if [[ -n "$CONFIG_FILE" ]]; then
  if [[ ! -f "$CONFIG_FILE" ]]; then
    echo '{"valid":false,"errors":["Config file does not exist: '"$CONFIG_FILE"'"],"warnings":[]}'
    exit 1
  fi
  CONFIG=$(cat "$CONFIG_FILE")
elif [[ -n "$CONFIG_JSON" ]]; then
  CONFIG="$CONFIG_JSON"
else
  echo '{"valid":false,"errors":["Please provide --config or --config-json parameter"],"warnings":[]}'
  exit 1
fi

# Check JSON format
if ! echo "$CONFIG" | jq empty 2>/dev/null; then
  echo '{"valid":false,"errors":["Invalid JSON format"],"warnings":[]}'
  exit 1
fi

# Extract fields
NAME=$(echo "$CONFIG" | jq -r '.metadata.name // empty')
INSTANCE=$(echo "$CONFIG" | jq -r '.metadata.instance // empty')
WORKSPACE_ID=$(echo "$CONFIG" | jq -r '.metadata.workspace_id // empty')
RESOURCE=$(echo "$CONFIG" | jq -r '.metadata.resource // empty')
CONTAINERS=$(echo "$CONFIG" | jq -c '.containers // []')
CLOUD_COMPUTING=$(echo "$CONFIG" | jq -c '.cloud.computing // empty')
CLOUD_NETWORKING=$(echo "$CONFIG" | jq -c '.cloud.networking // empty')
STORAGE=$(echo "$CONFIG" | jq -c '.storage // []')
NETWORKING=$(echo "$CONFIG" | jq -c '.networking // empty')

# Initialize errors and warnings
ERRORS="[]"
WARNINGS="[]"

# 1. Required field check
if [[ -z "$NAME" ]]; then
  ERRORS=$(echo "$ERRORS" | jq '. + ["metadata.name is a required field"]')
fi

CONTAINER_COUNT=$(echo "$CONTAINERS" | jq 'length')
if [[ "$CONTAINER_COUNT" -eq 0 ]]; then
  ERRORS=$(echo "$ERRORS" | jq '. + ["containers is a required field, at least one container config is needed"]')
else
  # Check image field for each container
  for i in $(seq 0 $((CONTAINER_COUNT - 1))); do
    IMAGE=$(echo "$CONTAINERS" | jq -r ".[$i].image // empty")
    if [[ -z "$IMAGE" ]]; then
      ERRORS=$(echo "$ERRORS" | jq ". + [\"containers[$i].image is a required field\"]")
    else
      # Simple image format check
      if [[ ! "$IMAGE" =~ ^[a-zA-Z0-9._/-]+(:[a-zA-Z0-9._-]+)?$ ]]; then
        WARNINGS=$(echo "$WARNINGS" | jq ". + [\"containers[$i].image format may be incorrect: $IMAGE\"]")
      fi
    fi
  done
fi

# 2. Service name format check
if [[ -n "$NAME" ]] && [[ ! "$NAME" =~ ^[a-z0-9_]+$ ]]; then
  ERRORS=$(echo "$ERRORS" | jq ". + [\"metadata.name format error, only lowercase letters, numbers, and underscores allowed: $NAME\"]")
fi

# 3. Resource config check
HAS_RESOURCE=false
HAS_CLOUD_COMPUTING=false

if [[ -n "$RESOURCE" ]]; then
  HAS_RESOURCE=true
fi

if [[ -n "$CLOUD_COMPUTING" ]] && [[ "$CLOUD_COMPUTING" != "null" ]]; then
  HAS_CLOUD_COMPUTING=true
  INSTANCE_TYPE=$(echo "$CLOUD_COMPUTING" | jq -r '.instance_type // empty')
  INSTANCES=$(echo "$CLOUD_COMPUTING" | jq -c '.instances // []')
  if [[ -z "$INSTANCE_TYPE" ]] && [[ $(echo "$INSTANCES" | jq 'length') -eq 0 ]]; then
    ERRORS=$(echo "$ERRORS" | jq '. + ["cloud.computing requires instance_type or instances configuration"]')
  fi
fi

if [[ "$HAS_RESOURCE" == "false" ]] && [[ "$HAS_CLOUD_COMPUTING" == "false" ]]; then
  ERRORS=$(echo "$ERRORS" | jq '. + ["Either metadata.resource (dedicated resource group) or cloud.computing (public resource group) must be configured"]')
fi

# 4. Network config check (for dedicated gateway)
GATEWAY=$(echo "$NETWORKING" | jq -r '.gateway // empty')
if [[ -n "$GATEWAY" ]]; then
  # Check if "shared" is incorrectly set as gateway
  if [[ "$GATEWAY" == "shared" ]]; then
    ERRORS=$(echo "$ERRORS" | jq '. + ["Shared gateway does not need networking.gateway field, please remove it"]')
  else
    # Dedicated gateway requires VPC config
    VPC_ID=$(echo "$CLOUD_NETWORKING" | jq -r '.vpc_id // empty')
    VSWITCH_ID=$(echo "$CLOUD_NETWORKING" | jq -r '.vswitch_id // empty')
    SECURITY_GROUP_ID=$(echo "$CLOUD_NETWORKING" | jq -r '.security_group_id // empty')
    if [[ -z "$VPC_ID" ]] || [[ -z "$VSWITCH_ID" ]] || [[ -z "$SECURITY_GROUP_ID" ]]; then
      ERRORS=$(echo "$ERRORS" | jq '. + ["When using dedicated gateway, cloud.networking must include vpc_id, vswitch_id, security_group_id"]')
    fi
  fi
fi

# 5. Storage mount check
STORAGE_COUNT=$(echo "$STORAGE" | jq 'length')
if [[ "$STORAGE_COUNT" -eq 0 ]]; then
  WARNINGS=$(echo "$WARNINGS" | jq '. + ["No storage mount configured, official images require model files to be mounted"]')
else
  # Check OSS path format
  for i in $(seq 0 $((STORAGE_COUNT - 1))); do
    OSS_PATH=$(echo "$STORAGE" | jq -r ".[$i].oss.path // empty")
    if [[ -n "$OSS_PATH" ]]; then
      if [[ ! "$OSS_PATH" =~ ^oss:// ]]; then
        ERRORS=$(echo "$ERRORS" | jq ". + [\"storage[$i].oss.path format error, must start with oss://: $OSS_PATH\"]")
      fi
    fi
  done
fi

# 6. Check unsupported config types
PROCESSOR=$(echo "$CONFIG" | jq -r '.processor // empty')
MODEL=$(echo "$CONFIG" | jq -r '.model // empty')
if [[ -n "$PROCESSOR" ]]; then
  ERRORS=$(echo "$ERRORS" | jq '. + ["processor deployment is not supported, please use containers image deployment"]')
fi
if [[ -n "$MODEL" ]]; then
  ERRORS=$(echo "$ERRORS" | jq '. + ["model deployment is not supported, please use containers image deployment"]')
fi

# 7. Check quota config (not supported)
QUOTA_ID=$(echo "$CONFIG" | jq -r '.metadata.quota_id // empty')
if [[ -n "$QUOTA_ID" ]]; then
  WARNINGS=$(echo "$WARNINGS" | jq '. + ["Quota resource group is not supported, quota_id config will be ignored"]')
fi

# Build result
ERROR_COUNT=$(echo "$ERRORS" | jq 'length')
if [[ "$ERROR_COUNT" -gt 0 ]]; then
  VALID="false"
else
  VALID="true"
fi

# Apply defaults
function apply_defaults() {
  local config="$1"

  # Apply defaults
  echo "$config" | jq '
    # Default instance count
    if .metadata.instance == null or .metadata.instance == "" then .metadata.instance = 1 else . end
    # Default container port
    | .containers = (.containers | map(
        if .port == null or .port == "" then .port = 8000 else . end
      ))
    # Default shared memory size
    | if .metadata.shm_size == null or .metadata.shm_size == "" then .metadata.shm_size = 64 else . end
    # Default graceful termination period
    | if .runtime.termination_grace_period == null or .runtime.termination_grace_period == "" then .runtime.termination_grace_period = 30 else . end
    # Default keepalive timeout
    | if .rpc.keepalive == null or .rpc.keepalive == "" then .rpc.keepalive = 600 else . end
    # Shared gateway: remove networking field (not needed for shared gateway)
    | if .networking.gateway == "shared" or .networking.gateway == "" or .networking.gateway == null then del(.networking) else . end
  '
}

# Main logic
if [[ "$FIX_MODE" == "true" ]] && [[ "$VALID" == "true" ]]; then
  FIXED_CONFIG=$(apply_defaults "$CONFIG")

  if [[ -n "$OUTPUT_FILE" ]]; then
    echo "$FIXED_CONFIG" | jq '.' > "$OUTPUT_FILE"
    echo "{\"valid\":$VALID,\"errors\":$ERRORS,\"warnings\":$WARNINGS,\"output_file\":\"$OUTPUT_FILE\"}"
  else
    echo "{\"valid\":$VALID,\"errors\":$ERRORS,\"warnings\":$WARNINGS,\"fixed_config\":$FIXED_CONFIG}"
  fi
elif [[ -n "$OUTPUT_FILE" ]]; then
  # Output config to file (even with errors, output for user to fix)
  echo "$CONFIG" | jq '.' > "$OUTPUT_FILE"
  echo "{\"valid\":$VALID,\"errors\":$ERRORS,\"warnings\":$WARNINGS,\"output_file\":\"$OUTPUT_FILE\"}"
else
  echo "{\"valid\":$VALID,\"errors\":$ERRORS,\"warnings\":$WARNINGS}"
fi

if [[ "$VALID" == "false" ]]; then
  exit 1
fi

exit 0
---
name: alibabacloud-tair-ai-assistant
description: |
Alibaba Cloud Tair (Redis OSS-Compatible) Database AI Assistant. For Tair/Redis instance management, performance diagnostics, memory analysis, hotspot key detection, latency troubleshooting, parameter tuning, connection session analysis.
Use when user questions involve Tair, Redis, instance IDs starting with r-, memory analysis, hotspot keys, eviction policy, big key detection, etc.
---
# Tair Database AI Assistant
This Skill provides intelligent O&M for **Alibaba Cloud Tair (Redis OSS-Compatible)** databases. It invokes the get-yao-chi-agent API through the aliyun CLI DAS plugin for diagnostics and analysis.
**Architecture**: `Aliyun CLI` → `DAS Plugin (Signature V3)` → `get-yao-chi-agent API` → Tair Intelligent Diagnostics
### Supported Capabilities
| Capability | Description |
|------------|-------------|
| Instance List Query | List and filter Tair/Redis instances by region, type, status |
| Memory Usage Analysis | Memory consumption breakdown, fragmentation ratio, eviction statistics |
| Hotspot Key Detection | Hot key identification, access frequency analysis, cache optimization |
| Big Key Analysis | Large key detection, memory distribution analysis, optimization suggestions |
| Latency Diagnostics | Command latency analysis, slow command detection, network latency troubleshooting |
| Slow Log Analysis | Slow command log query, high-latency operation identification |
| Parameter Tuning | Instance parameter explanation, configuration suggestions, performance impact analysis |
| Connection Session Analysis | Connection count monitoring, client session troubleshooting, connection pool optimization |
| Backup Status Check | Backup completion verification, retention policy, recovery point in time |
| Performance Monitoring | QPS/TPS/hit rate/bandwidth and other core metrics analysis |
| Expiring Instance Query | Subscription instance expiration reminder |
| Security Configuration Audit | Whitelist, SSL/TLS, password policy, security audit |
| Storage Optimization | Data structure optimization, TTL strategy, memory efficiency improvement |
| Proxy Diagnostics | Proxy layer performance analysis, connection routing, bandwidth bottleneck detection |
## Installation
> **Pre-check: Aliyun CLI >= 3.3.1 required**
> Run `aliyun version` to verify >= 3.3.1. If not installed or version too low,
> see [references/cli-installation-guide.md](references/cli-installation-guide.md) for installation instructions.
> Then **[MUST]** run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
```bash
# Install aliyun CLI
curl -fsSL https://aliyuncli.alicdn.com/install.sh | bash
aliyun version # Verify >= 3.3.1
# Enable automatic plugin installation
aliyun configure set --auto-plugin-install true
# Install DAS plugin (get-yao-chi-agent requires plugin for Signature V3 support)
aliyun plugin install --names aliyun-cli-das
# Update all installed plugins to latest version
aliyun plugin update
# Install jq (for JSON response parsing)
# macOS:
brew install jq
# Ubuntu/Debian:
# sudo apt-get install jq
```
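The version pre-check above can be scripted. The following is a minimal sketch, assuming `sort -V` is available (GNU coreutils or a recent macOS `sort`) and that `aliyun version` prints a bare version string such as `3.3.1`. `version_ge` is a hypothetical helper and is not part of this Skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the skill): succeeds when $1 >= $2
# under version ordering. Relies on `sort -V`.
version_ge() {
    local have="$1" want="$2"
    # The smaller version sorts first; if that is $want, then $have >= $want.
    [[ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" == "$want" ]]
}

# Usage sketch, assuming `aliyun version` prints something like "3.3.1":
#   if version_ge "$(aliyun version)" "3.3.1"; then
#       echo "aliyun CLI version OK"
#   else
#       echo "aliyun CLI missing or older than 3.3.1" >&2
#   fi
```

A plain string comparison would rank `3.10.0` below `3.3.1`, which is why the sketch sorts by version instead.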
## Parameter Confirmation
> **IMPORTANT: Parameter Confirmation** — Before executing any command or API call,
> ALL user-customizable parameters (e.g., RegionId, instance names, CIDR blocks,
> passwords, domain names, resource specifications, etc.) MUST be confirmed with the
> user. Do NOT assume or use default values without explicit user approval.
| Parameter | Required/Optional | Description | Default |
|-----------|-------------------|-------------|---------|
| `query` | Required | Natural language query content (including region, instance info) | - |
| `--session-id` | Optional | Session ID for multi-turn conversation | - |
| `--profile` | Optional | aliyun CLI profile name | default |
## Authentication
This Skill relies on the **aliyun CLI default credential chain** for authentication — no explicit AK/SK handling is required in the Skill workflow.
The CLI automatically resolves credentials in the following priority order:
1. `--profile` flag on the command line
2. `ALIBABA_CLOUD_PROFILE` environment variable
3. `ALIBABA_CLOUD_ACCESS_KEY_ID` / `ALIBABA_CLOUD_ACCESS_KEY_SECRET` environment variables
4. Configuration file `~/.aliyun/config.json` (current profile)
5. ECS Instance RAM Role (if running on ECS)
For credential setup and configuration modes (OAuth, AK, StsToken, RamRoleArn, EcsRamRole, etc.), see [references/cli-installation-guide.md](references/cli-installation-guide.md).
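When troubleshooting, it can help to see which source in the priority list above is expected to win. The sketch below mirrors that order. `credential_source` is a hypothetical helper, it only inspects the environment and the config file path, and it never reads or prints a secret. It does not ask the CLI, so treat its answer as an approximation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reports which credential source the documented
# priority order would select. $1 is the value passed via --profile, if any.
credential_source() {
    local profile_flag="${1:-}"
    if [[ -n "$profile_flag" ]]; then
        echo "profile flag: $profile_flag"
    elif [[ -n "${ALIBABA_CLOUD_PROFILE:-}" ]]; then
        echo "ALIBABA_CLOUD_PROFILE: $ALIBABA_CLOUD_PROFILE"
    elif [[ -n "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" && -n "${ALIBABA_CLOUD_ACCESS_KEY_SECRET:-}" ]]; then
        echo "environment access key"
    elif [[ -f "${HOME}/.aliyun/config.json" ]]; then
        echo "config file: ${HOME}/.aliyun/config.json"
    else
        echo "ECS instance RAM role (only if running on ECS)"
    fi
}

# Usage sketch:
#   credential_source              # no --profile given
#   credential_source myprofile    # --profile myprofile given
```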
## RAM Policy
See [references/ram-policies.md](references/ram-policies.md)
## Core Workflow
All intelligent O&M operations are invoked through `scripts/call_yaochi_agent.sh`, which wraps `aliyun das get-yao-chi-agent` (DAS plugin kebab-case command, supports Signature V3) with streaming response parsing.
**Before executing any CLI command, AI-Mode must be enabled; after workflow ends, it must be disabled:**
```bash
# [MUST] Enable AI-Mode before executing CLI commands
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-tair-ai-assistant"
```
```bash
# Instance management
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List Tair instances in Hangzhou region"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show detailed configuration of instance r-xxx"
# Performance diagnostics
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Analyze instance r-xxx performance in the last hour"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show slow commands of instance r-xxx"
# Memory analysis
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Analyze memory usage of instance r-xxx"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Detect big keys in instance r-xxx"
# Hotspot key detection
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Find hotspot keys in instance r-xxx"
# Parameter tuning
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "How to tune maxmemory-policy for instance r-xxx"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Explain hz parameter"
# Connection and session
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "How to troubleshoot high connection count in instance r-xxx"
# Backup recovery
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show backup status of instance r-xxx"
# Multi-turn conversation (use session ID from previous response)
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Continue analysis" --session-id "<session-id>"
# Specify profile
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances" --profile myprofile
# Read from stdin
echo "List instances" | bash $SKILL_DIR/scripts/call_yaochi_agent.sh -
```
```bash
# [MUST] Disable AI-Mode after workflow ends
aliyun configure ai-mode disable
```
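Because AI-Mode must be disabled even when a query fails, the enable, run, and disable steps can be bundled into one wrapper. The following is a minimal sketch. `with_ai_mode` is a hypothetical helper, not part of the Skill, and it uses only the `aliyun configure ai-mode` commands shown above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: enables AI-Mode, runs the given command, and always
# disables AI-Mode afterwards, returning the command's exit status.
with_ai_mode() {
    aliyun configure ai-mode enable || return 1
    local status=0
    aliyun configure ai-mode set-user-agent \
        --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-tair-ai-assistant" || status=$?
    if (( status == 0 )); then
        "$@" || status=$?
    fi
    # Runs on success and on failure alike.
    aliyun configure ai-mode disable
    return "$status"
}

# Usage sketch:
#   with_ai_mode bash "$SKILL_DIR/scripts/call_yaochi_agent.sh" "List Tair instances in Hangzhou region"
```

This sketch does not handle interruption by a signal such as Ctrl-C. A `trap` that runs the disable command would be needed to cover that case.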
### Example Questions
| Scenario | Example Question |
|----------|------------------|
| Instance Management | List Tair instances in Beijing region |
| Performance Diagnostics | How to troubleshoot high CPU usage in instance r-xxx |
| Slow Log Analysis | Show slow commands in instance r-xxx in the last hour |
| Memory Analysis | Analyze memory fragmentation of instance r-xxx |
| Big Key Detection | Detect big keys in instance r-xxx and suggest optimization |
| Hotspot Key | Find hotspot keys in instance r-xxx |
| Parameter Tuning | What does maxmemory-policy parameter mean |
| Master-Replica | How to handle high replication delay in instance r-xxx |
| Backup Recovery | When was the latest backup of instance r-xxx |
| Connection Troubleshooting | Instance r-xxx connections are full |
| Security Audit | Check security configuration of instance r-xxx |
## Success Verification
See [references/verification-method.md](references/verification-method.md)
## Cleanup
This Skill provides **query and diagnostics** capabilities only. It does not create any resources, so no cleanup is required.
The following operations are NOT within the scope of this Skill:
- Create/delete Tair instances
- Change instance specifications
- Purchase/renew instances
## API and Command Tables
See [references/related-apis.md](references/related-apis.md)
## Best Practices
1. **Instance ID Format**: Tair/Redis instance IDs typically start with `r-`. Include the full instance ID in queries.
2. **Region Specification**: Explicitly specify the region in natural language queries (e.g., "Hangzhou region", "Beijing region") to improve query accuracy.
3. **Multi-turn Conversation**: Use `--session-id` in complex diagnostic scenarios to maintain context continuity.
4. **Concurrency Limit**: Each account supports a maximum of 2 concurrent sessions. Avoid initiating multiple parallel calls.
5. **High-risk Operations**: For operations involving parameter changes or master-replica switchover, always remind users to verify in a test environment first.
6. **Throttling Handling**: If you encounter a `Throttling.UserConcurrentLimit` error, wait for the previous query to complete, then retry.
7. **Credential Security**: Use `aliyun configure` to manage credentials. Never hardcode AK/SK in scripts.
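The throttling guidance above (wait, then retry) can be sketched as a small retry loop. `retry_on_throttle` is a hypothetical helper, and the `MAX_ATTEMPTS` and `RETRY_DELAY` defaults are illustrative values, not limits documented by the service. Calls run one after another, which also respects the concurrency limit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-runs a command while its output contains
# Throttling.UserConcurrentLimit. Note that stderr is merged into stdout
# so the error text can be inspected.
retry_on_throttle() {
    local max_attempts="${MAX_ATTEMPTS:-5}" delay="${RETRY_DELAY:-10}"
    local attempt output status
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        status=0
        output="$("$@" 2>&1)" || status=$?
        if [[ "$output" != *"Throttling.UserConcurrentLimit"* ]]; then
            # Not throttled: pass the result through unchanged.
            printf '%s\n' "$output"
            return "$status"
        fi
        echo "Throttled (attempt $attempt/$max_attempts)" >&2
        if (( attempt < max_attempts )); then
            sleep "$delay"
        fi
    done
    printf '%s\n' "$output"
    return 1
}

# Usage sketch:
#   retry_on_throttle bash "$SKILL_DIR/scripts/call_yaochi_agent.sh" "Find hotspot keys in instance r-xxx"
```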
## Reference Links
| Reference | Description |
|-----------|-------------|
| [references/cli-installation-guide.md](references/cli-installation-guide.md) | Aliyun CLI installation and configuration guide |
| [references/related-apis.md](references/related-apis.md) | Related API and CLI command list |
| [references/ram-policies.md](references/ram-policies.md) | RAM permission policy list |
| [references/verification-method.md](references/verification-method.md) | Success verification methods |
| [references/acceptance-criteria.md](references/acceptance-criteria.md) | Acceptance criteria |
FILE:references/acceptance-criteria.md
# Acceptance Criteria: alibabacloud-tair-ai-assistant
**Scenario**: Tair Database AI Assistant
**Purpose**: Skill testing acceptance criteria
---
# Correct CLI Command Patterns
## 1. Product — DAS (Database Autonomy Service)
#### CORRECT
```bash
aliyun das GetYaoChiAgent --Query "List instances" --Source "tair-console" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: Product name spelling error
aliyun DAS GetYaoChiAgent --Query "List instances"
# Error: Using non-existent plugin mode command
aliyun das get-yao-chi-agent --query "List instances"
```
**Note**: DAS product uses traditional API format (PascalCase), not plugin mode (kebab-case).
## 2. Command — GetYaoChiAgent
#### CORRECT
```bash
aliyun das GetYaoChiAgent --Query "Hello" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: API name spelling error
aliyun das GetYaochiAgent --Query "Hello"
# Error: Using non-existent API
aliyun das YaoChiAgent --Query "Hello"
```
## 3. Parameters — Parameter Validation
### GetYaoChiAgent Parameters
#### CORRECT
```bash
# Required parameter --Query
aliyun das GetYaoChiAgent --Query "List Tair instances in Hangzhou region" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
# Optional parameter --Source
aliyun das GetYaoChiAgent --Query "List instances" --Source "tair-console" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
# Optional parameter --SessionId (multi-turn conversation)
aliyun das GetYaoChiAgent --Query "Continue analysis" --SessionId "sess-xxx" --Source "tair-console" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
# Optional parameter --ExtraInfo
aliyun das GetYaoChiAgent --Query "Show instance" --ExtraInfo "{}" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: Missing required parameter --Query
aliyun das GetYaoChiAgent --Source "tair-console"
# Error: Parameter name in lowercase (CLI parameter names are case-sensitive)
aliyun das GetYaoChiAgent --query "List instances"
# Error: Parameter name spelling error
aliyun das GetYaoChiAgent --Query "List instances" --Session-Id "sess-xxx"
# Error: Using non-existent parameter
aliyun das GetYaoChiAgent --Query "List instances" --RegionId "cn-hangzhou"
```
## 4. Endpoint — Endpoint Validation
#### CORRECT
```bash
# GetYaoChiAgent uses cn-shanghai endpoint uniformly
aliyun das GetYaoChiAgent --Query "List instances" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: Using wrong endpoint
aliyun das GetYaoChiAgent --Query "List instances" --endpoint das.cn-beijing.aliyuncs.com
# Error: Endpoint not specified, may use wrong default endpoint
aliyun das GetYaoChiAgent --Query "List instances"
```
## 5. --user-agent Flag — Must Include
#### CORRECT
```bash
aliyun das GetYaoChiAgent --Query "List instances" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: Missing --user-agent flag
aliyun das GetYaoChiAgent --Query "List instances" --endpoint das.cn-shanghai.aliyuncs.com
```
## 6. Timeout — Timeout Settings
#### CORRECT
```bash
# SSE streaming API requires longer read timeout (180 seconds)
aliyun das GetYaoChiAgent --Query "List instances" --endpoint das.cn-shanghai.aliyuncs.com --read-timeout 180 --connect-timeout 30 --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: Read timeout too short, streaming API may timeout
aliyun das GetYaoChiAgent --Query "List instances" --endpoint das.cn-shanghai.aliyuncs.com --read-timeout 10 --user-agent AlibabaCloud-Agent-Skills
```
---
# Correct Bash Script Patterns
## 1. Script Invocation
#### CORRECT
```bash
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List Tair instances in Hangzhou region"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Analyze instance r-xxx memory usage" --session-id "sess-xxx"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances" --profile myprofile
echo "List instances" | bash $SKILL_DIR/scripts/call_yaochi_agent.sh -
```
#### INCORRECT
```bash
# Error: Using old Python script
uv run $SKILL_DIR/scripts/call_yaochi_agent.py "List instances"
# Error: Using Python interpreter to run bash script
python $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances"
# Error: Parameter name using old format
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances" --role-arn acs:ram::xxx:role/xxx
```
## 2. SSE Response Parsing
#### CORRECT — Script automatically parses SSE response
```
# Input: SSE format response body
data: {"Content":"Tair instance list:","SessionId":"sess-abc123","ReasoningContent":""}
data: {"Content":"\n1. r-xxx (cn-hangzhou)","SessionId":"sess-abc123","ReasoningContent":""}
data: [DONE]
# Output: Concatenated Content
Tair instance list:
1. r-xxx (cn-hangzhou)
```
## 3. Credential Management
#### CORRECT
```bash
# Use existing aliyun CLI configuration
aliyun configure --mode OAuth
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances"
# Use specified profile
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances" --profile myprofile
```
#### INCORRECT
```bash
# Error: Hardcoding AK/SK in script
export ALIBABA_CLOUD_ACCESS_KEY_ID="LTAI5tXXXXXXXX"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="8dXXXXXXXXXXXX"
# Error: Using custom credential variables from old script
export YAOCHI_ACCESS_KEY_ID="xxx"
export YAOCHI_ACCESS_KEY_SECRET="xxx"
```
---
# Authentication Patterns
#### CORRECT — Use aliyun CLI configuration
```bash
# OAuth mode (Recommended)
aliyun configure --mode OAuth
# AK mode
aliyun configure set --mode AK --access-key-id <AK> --access-key-secret <SK> --region cn-hangzhou
# Cross-account RamRoleArn mode
aliyun configure set --mode RamRoleArn --access-key-id <AK> --access-key-secret <SK> --ram-role-arn <ARN> --role-session-name yaochi-session --region cn-hangzhou
```
#### INCORRECT — Managing credentials in script
```python
# Error: Using Python SDK to manage credentials
from alibabacloud_das20200116.client import Client as DAS20200116Client
# Error: Parsing credentials from .env file
# Error: Parsing credentials from ~/.alibabacloud/credentials
```
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```
**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
# Add to PATH (requires admin privileges)
$env:Path += ";C:\aliyun-cli"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::Machine)
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI5tXXXXXXXX",
"access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (tokens expire in 1-12 hours).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use RSA key pair for authentication (generate key pair in Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
Environment credentials take precedence over the configuration file. See the Credential Priority section below for the full order.
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON array of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "China East 1 (Hangzhou)"
},
...
]
},
"RequestId": "..."
}
```
**If failed**, you'll see error messages:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
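# ~/.aliyun/config.json lives outside your repositories; never copy it into one.
# .gitignore patterns are repository-relative, so ignore project-local credential files instead:
echo ".env" >> .gitignore
# Use environment variables in CI/CD instead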
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
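To confirm that the permissions took effect, a check like the one below can be used. It is a sketch: `config_is_private` is a hypothetical helper, and it tries both the GNU (`stat -c`) and BSD/macOS (`stat -f`) forms of `stat`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the file grants no permissions to
# group or others (for example mode 600 or 400). Returns 2 if the mode
# cannot be read.
config_is_private() {
    local file="$1" mode
    # GNU stat uses -c, BSD/macOS stat uses -f; try both.
    mode="$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file" 2>/dev/null)" || return 2
    # Interpret the mode as octal and test the group/other bits.
    (( (8#$mode & 8#077) == 0 ))
}

# Usage sketch:
#   config_is_private ~/.aliyun/config.json || echo "config.json is readable by other users" >&2
```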
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.1+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/ram-policies.md
# RAM Policies
## Required Permissions
Tair AI Assistant (YaoChi Agent) requires the following RAM permissions.
### Standard Edition
RAM users must have:
- **AliyunKvstoreReadOnlyAccess** - Tair/Redis read-only access
- **AliyunYaoChiAgentAccess** - YaoChi Agent access
For authorization instructions, see [Grant permissions to RAM users](https://help.aliyun.com/document_detail/116146.html).
### Professional Edition
RAM users must have the service-linked role created:
- **AliyunServiceRolePolicyForTairAgent** - Tair Agent service-linked role
## Custom Policy (Minimum Permissions)
If you need to create a custom policy with minimum required permissions:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"das:GetYaoChiAgent",
"das:GetDasAgentSSE"
],
"Resource": "*"
}
]
}
```
## Cross-Account Access - STS AssumeRole
For cross-account access, configure trust policy on the target account's RAM role:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Principal": {
"RAM": [
"acs:ram::<caller-account-id>:root"
]
}
}
]
}
```
## System Policy Reference
| Policy Name | Description | Use Case |
|-------------|-------------|----------|
| `AliyunKvstoreReadOnlyAccess` | Tair/Redis read-only access | Required for Standard Edition |
| `AliyunYaoChiAgentAccess` | YaoChi Agent access | Required for Standard Edition |
## Permission Mapping
| Operation | Required RAM Action |
|-----------|---------------------|
| Invoke YaoChi Agent | `das:GetYaoChiAgent` |
| Invoke DAS Agent SSE | `das:GetDasAgentSSE` |
FILE:references/related-apis.md
# Related APIs
## DAS (Database Autonomy Service) - Core API
| Product | CLI Command | API Action | Description |
|---------|------------|------------|-------------|
| DAS | `aliyun das GetYaoChiAgent --Query "<query>" --Source "tair-console" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills` | GetYaoChiAgent | YaoChi Intelligent Diagnostic Agent (SSE streaming response) |
| DAS | `aliyun das GetDasAgentSSE --Query "<query>" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills` | GetDasAgentSSE | DAS Agent SSE interface |
## GetYaoChiAgent API Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--Query` | String | Yes | Natural language query content |
| `--Source` | String | No | Call source identifier, recommended to set as `tair-console` |
| `--SessionId` | String | No | Session ID for multi-turn conversation context preservation |
| `--ExtraInfo` | String | No | Extra information |
## GetDasAgentSSE API Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--Query` | String | Yes | Natural language query content |
| `--AgentId` | String | No | Agent ID |
| `--InstanceId` | String | No | Database instance ID |
| `--SessionId` | String | No | Session ID for multi-turn conversation context preservation |
## SSE Response Format
GetYaoChiAgent returns SSE (Server-Sent Events) streaming response in the following format:
```
data: {"Content":"Response text chunk 1","SessionId":"sess-xxx","ReasoningContent":""}
data: {"Content":"Response text chunk 2","SessionId":"sess-xxx","ReasoningContent":""}
...
data: [DONE]
```
### Response Fields
| Field | Type | Description |
|-------|------|-------------|
| `Content` | String | Text content of current chunk |
| `SessionId` | String | Session ID for multi-turn conversation |
| `ReasoningContent` | String | Reasoning process content (for debugging) |
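
For illustration, the stream format described above can be parsed with `sed` and `jq`. This is a sketch of one possible approach, not the implementation used by `scripts/call_yaochi_agent.sh`. The function names are hypothetical, and the input is assumed to arrive as one `data:` line per event.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the SSE format described above. Requires jq.

# Strip the "data:" prefix and drop the [DONE] terminator, leaving one
# JSON object per line.
sse_payloads() {
    sed -n 's/^data: *//p' | grep -vxF '[DONE]'
}

# Concatenate the Content field of every chunk.
sse_content() {
    sse_payloads | jq -j '.Content // empty'
    echo
}

# Print the SessionId carried by the last chunk.
sse_session_id() {
    sse_payloads | jq -r '.SessionId // empty' | tail -n 1
}

# Usage sketch, with the raw stream saved to a file first:
#   sse_content < response.sse
#   sse_session_id < response.sse
```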
## API Endpoint
| Environment | Endpoint |
|-------------|----------|
| Production | `das.cn-shanghai.aliyuncs.com` |
> Note: GetYaoChiAgent API uses `das.cn-shanghai.aliyuncs.com` endpoint uniformly, regardless of the region where the user's Tair instance is located.
FILE:references/verification-method.md
# Verification Method
## How to Verify Skill Execution Success
### Step 1: Verify aliyun CLI Installation and Configuration
```bash
# Check CLI version
aliyun version
# Expected output: 3.3.1 or higher
# Check authentication configuration
aliyun configure get
# Expected output: Display current profile configuration
# Test basic connectivity
aliyun das describe-instance-das-pro --instance-id "r-test" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills 2>&1
# Expected: Return JSON response (even if instance doesn't exist, should return API error not connection error)
```
### Step 2: Verify jq Installation
```bash
echo '{"Content":"test"}' | jq -r '.Content'
# Expected output: test
```
### Step 3: Verify call_yaochi_agent.sh Script
```bash
# Verify script is executable
bash $SKILL_DIR/scripts/call_yaochi_agent.sh --help
# Expected: Display help information
# Verify error prompt without parameters
bash $SKILL_DIR/scripts/call_yaochi_agent.sh
# Expected: Display usage prompt and exit
```
### Step 4: Verify Actual Invocation (requires valid credentials)
```bash
# Simple query test
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Hello"
# Expected: Return YaoChi Agent response content
# With debug mode
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Hello" --debug
# Expected: Return response content, also output debug info to stderr
```
### Step 5: Verify Multi-turn Conversation
```bash
# First round query - note the session ID output to stderr
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List Tair instances in Hangzhou region"
# Expected: Return instance list, stderr outputs [SessionID] sess-xxx
# Second round query - use session ID from previous round
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Continue analyzing the first instance" --session-id "sess-xxx"
# Expected: Continue analysis based on context
```
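When the two rounds are driven from another program, the session ID has to be picked out of the captured stderr text. The sketch below assumes the `[SessionID] <id>` line format shown above and the ID character set the script validates (letters, digits, hyphens, underscores); the helper name `extract_session_id` is hypothetical.

```python
import re

# One "[SessionID] <id>" line per match; the script may print it more than once
SESSION_LINE = re.compile(r"^\[SessionID\] ([A-Za-z0-9_-]+)$", re.MULTILINE)


def extract_session_id(stderr_text):
    """Return the last session ID printed to stderr, or None if there is none."""
    matches = SESSION_LINE.findall(stderr_text)
    return matches[-1] if matches else None


# Hypothetical captured stderr from a first-round call
captured = "[Query] List Tair instances\n[Tair YaoChi Agent Response]\n\n[SessionID] sess-abc123\n"
print(extract_session_id(captured))
```

Taking the last match matters because the script echoes the incoming session ID before the response and prints the returned one after it.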
## Common Errors and Solutions
| Error | Cause | Solution |
|-------|-------|----------|
| `command not found: aliyun` | aliyun CLI not installed | Refer to cli-installation-guide.md for installation |
| `command not found: jq` | jq not installed | `brew install jq` or `apt install jq` |
| `InvalidAccessKeyId` | Invalid AK/SK | Check `aliyun configure get` configuration |
| `Throttling.UserConcurrentLimit` | Concurrency limit exceeded | Wait for previous query to complete and retry |
| `Forbidden.RAM` | Insufficient permissions | Refer to ram-policies.md for permission configuration |
FILE:scripts/call_yaochi_agent.sh
#!/usr/bin/env bash
# =============================================================================
# call_yaochi_agent.sh - Alibaba Cloud YaoChi Agent CLI Script (Tair)
# =============================================================================
# Invokes get-yao-chi-agent API via aliyun CLI DAS plugin with streaming response.
# Requires DAS plugin: aliyun plugin install --names aliyun-cli-das
# Uses existing aliyun CLI credentials (aliyun configure), no extra setup needed.
#
# Usage:
# bash call_yaochi_agent.sh "List Tair instances in Hangzhou region"
# bash call_yaochi_agent.sh "Analyze instance r-xxx performance" --session-id <session-id>
# echo "List instances" | bash call_yaochi_agent.sh -
# =============================================================================
set -euo pipefail
# --- Configuration ---
ENDPOINT="das.cn-shanghai.aliyuncs.com"
SOURCE="tair-console"
READ_TIMEOUT=180
CONNECT_TIMEOUT=30
# --- Variables ---
QUERY=""
SESSION_ID=""
PROFILE=""
DEBUG=false
# --- Functions ---
usage() {
cat >&2 <<EOF
Alibaba Cloud YaoChi Agent CLI Tool - Tair (based on aliyun CLI)
Usage:
$(basename "$0") <query> [options]
Arguments:
<query> Query content (natural language), use '-' to read from stdin
Options:
--session-id <id> Session ID for multi-turn conversation
--profile <name> Specify aliyun CLI profile
--debug, -d Enable debug mode
--help, -h Show help information
Examples:
$(basename "$0") "List Tair instances in Hangzhou region"
$(basename "$0") "Analyze instance r-xxx memory usage" --session-id "sess-xxx"
echo "List instances" | $(basename "$0") -
EOF
}
debug_log() {
if [[ "$DEBUG" == "true" ]]; then
echo "[DEBUG] $*" >&2
fi
}
# Check dependencies
check_dependencies() {
if ! command -v aliyun &>/dev/null; then
echo "Error: aliyun CLI not found, please install (>= 3.3.1)" >&2
echo "Install: curl -fsSL https://aliyuncli.alicdn.com/install.sh | bash" >&2
echo "See: references/cli-installation-guide.md" >&2
exit 1
fi
if ! command -v jq &>/dev/null; then
echo "Error: jq is required to parse JSON response" >&2
echo "Install:" >&2
echo " macOS: brew install jq" >&2
echo " Ubuntu: sudo apt-get install jq" >&2
echo " CentOS: sudo yum install jq" >&2
exit 1
fi
local version
version=$(aliyun version 2>/dev/null || echo "0.0.0")
debug_log "aliyun CLI version: $version"
# Ensure DAS plugin is installed (get-yao-chi-agent requires plugin for Signature V3)
if ! aliyun das get-yao-chi-agent --help &>/dev/null 2>&1; then
echo "Error: DAS plugin not installed" >&2
echo "Please install manually: aliyun plugin install --names aliyun-cli-das" >&2
exit 1
fi
}
# Stream parse response (read from stdin line by line, output in real-time)
# DAS plugin returns streaming JSON (one {"data": {...}} per line) or SSE format
parse_sse_streaming() {
local session_id=""
local format_detected=false
local is_sse=false
local is_json_stream=false
local error_buffer=""
while IFS= read -r line; do
line="${line%$'\r'}"
[[ -z "$line" ]] && continue
# Detect response format on first line
if [[ "$format_detected" == false ]]; then
if [[ "$line" =~ ^data: ]]; then
is_sse=true
debug_log "Detected SSE format response"
elif echo "$line" | jq -e '.data' &>/dev/null 2>&1; then
is_json_stream=true
debug_log "Detected streaming JSON format response (DAS plugin)"
else
# Might be error response or plain JSON, buffer first
error_buffer="$line"
# Check if error response
local error_code
error_code=$(echo "$line" | jq -r '.Code // empty' 2>/dev/null) || true
if [[ -n "$error_code" ]]; then
local error_msg
error_msg=$(echo "$line" | jq -r '.Message // empty' 2>/dev/null) || true
echo "Error: ${error_msg:-Unknown error} (${error_code})" >&2
if [[ "$error_code" == *"Throttling"* ]] || [[ "$error_code" == *"ConcurrentLimit"* ]]; then
echo "Max 2 concurrent sessions per account. Please wait for previous query to complete." >&2
fi
return 1
fi
# Try to handle as plain JSON response
local content
content=$(echo "$line" | jq -r '.Content // .Data // empty' 2>/dev/null) || true
if [[ -n "$content" ]]; then
printf "%s" "$content"
session_id=$(echo "$line" | jq -r '.SessionId // empty' 2>/dev/null) || true
else
# Cannot parse, output as-is
echo "$line"
fi
format_detected=true
continue
fi
format_detected=true
fi
# Process SSE format
if [[ "$is_sse" == true ]]; then
if [[ "$line" =~ ^data:\ ?(.*) ]]; then
local data="${BASH_REMATCH[1]}"
[[ "$data" == "[DONE]" || -z "$data" ]] && continue
local chunk_content
chunk_content=$(echo "$data" | jq -r '.Content // empty' 2>/dev/null) || true
[[ -n "$chunk_content" ]] && printf "%s" "$chunk_content"
local chunk_session
chunk_session=$(echo "$data" | jq -r '.SessionId // empty' 2>/dev/null) || true
[[ -n "$chunk_session" ]] && session_id="$chunk_session"
if [[ "$DEBUG" == "true" ]]; then
local reasoning
reasoning=$(echo "$data" | jq -r '.ReasoningContent // empty' 2>/dev/null) || true
[[ -n "$reasoning" ]] && debug_log "Reasoning: $reasoning"
fi
fi
fi
# Process streaming JSON format
if [[ "$is_json_stream" == true ]]; then
local chunk_content
chunk_content=$(echo "$line" | jq -r '.data.Content // empty' 2>/dev/null) || true
[[ -n "$chunk_content" ]] && printf "%s" "$chunk_content"
local chunk_session
chunk_session=$(echo "$line" | jq -r '.data.SessionId // empty' 2>/dev/null) || true
[[ -n "$chunk_session" ]] && session_id="$chunk_session"
if [[ "$DEBUG" == "true" ]]; then
local reasoning
reasoning=$(echo "$line" | jq -r '.data.ReasoningContent // empty' 2>/dev/null) || true
[[ -n "$reasoning" ]] && debug_log "Reasoning: $reasoning"
fi
fi
done
# Output newline (end of content)
echo ""
# Output session ID (to stderr for multi-turn conversation)
if [[ -n "$session_id" ]]; then
echo "" >&2
echo "[SessionID] $session_id" >&2
fi
}
# --- Argument parsing ---
while [[ $# -gt 0 ]]; do
case "$1" in
--session-id)
SESSION_ID="$2"
shift 2
;;
--profile)
PROFILE="$2"
shift 2
;;
--debug|-d)
DEBUG=true
shift
;;
--help|-h)
usage
exit 0
;;
-)
QUERY=$(cat)
shift
;;
-*)
echo "Unknown option: $1" >&2
usage
exit 1
;;
*)
QUERY="$1"
shift
;;
esac
done
# --- Input validation ---
# Max query length (reasonable limit for natural language queries)
MAX_QUERY_LENGTH=4000
# Max session ID length
MAX_SESSION_ID_LENGTH=128
# Session ID format: alphanumeric, hyphens, underscores only
SESSION_ID_PATTERN='^[a-zA-Z0-9_-]+$'
validate_input() {
# Validate QUERY
if [[ -z "$QUERY" ]]; then
usage
exit 1
fi
local query_length=${#QUERY}
if [[ $query_length -gt $MAX_QUERY_LENGTH ]]; then
echo "Error: Query too long ($query_length chars). Maximum allowed: $MAX_QUERY_LENGTH" >&2
exit 1
fi
# Validate SESSION_ID if provided
if [[ -n "$SESSION_ID" ]]; then
local session_id_length=${#SESSION_ID}
if [[ $session_id_length -gt $MAX_SESSION_ID_LENGTH ]]; then
echo "Error: Session ID too long ($session_id_length chars). Maximum allowed: $MAX_SESSION_ID_LENGTH" >&2
exit 1
fi
if [[ ! "$SESSION_ID" =~ $SESSION_ID_PATTERN ]]; then
echo "Error: Invalid session ID format. Only alphanumeric, hyphens, and underscores allowed." >&2
exit 1
fi
fi
# Validate PROFILE if provided (alphanumeric, hyphens, underscores, dots)
if [[ -n "$PROFILE" ]]; then
if [[ ! "$PROFILE" =~ ^[a-zA-Z0-9._-]+$ ]]; then
echo "Error: Invalid profile name format." >&2
exit 1
fi
fi
}
# --- Validation ---
validate_input
check_dependencies
# --- Build CLI command arguments ---
# Use DAS plugin's kebab-case command, supports Signature V3
cli_args=(das get-yao-chi-agent
--query "$QUERY"
--source "$SOURCE"
--endpoint "$ENDPOINT"
--read-timeout "$READ_TIMEOUT"
--connect-timeout "$CONNECT_TIMEOUT"
--user-agent AlibabaCloud-Agent-Skills
)
if [[ -n "$SESSION_ID" ]]; then
cli_args+=(--session-id "$SESSION_ID")
fi
if [[ -n "$PROFILE" ]]; then
cli_args+=(--profile "$PROFILE")
fi
# --- Output query info ---
echo "[Query] $QUERY" >&2
if [[ -n "$SESSION_ID" ]]; then
echo "[SessionID] $SESSION_ID" >&2
fi
echo "============================================================" >&2
echo "[Tair YaoChi Agent Response]" >&2
debug_log "Executing: aliyun ${cli_args[*]}"
# --- Execute and stream parse ---
# Use pipe for real streaming output, avoid command substitution blocking
aliyun "${cli_args[@]}" 2>&1 | parse_sse_streaming
exit_code=${PIPESTATUS[0]}
if [[ $exit_code -ne 0 ]]; then
# Non-zero exit but content already output via pipe, just log debug info
debug_log "aliyun CLI exit code: $exit_code (streaming response may return non-zero)"
fi
---
name: alibabacloud-sas-multiaccount-manage
description: Manage multiple Alibaba Cloud accounts and batch-export Security Center (SAS) baseline and vulnerability reports via the aliyun CLI and Python scripts. Supports account list refresh, enable/disable, concurrent batch export of cloud platform configuration check (baselineCspm), system baseline risk (exportHcWarning), Linux/Windows/application/emergency vulnerability results across all managed accounts. Use this skill when users need to manage SAS multi-account settings, export baseline or vulnerability compliance data, or merge multi-account security reports into a single file.
---
# Alibaba Cloud Security Center Multi-Account Management and Baseline Report Export
Use aliyun CLI and Python scripts to manage multiple Alibaba Cloud accounts in a resource directory and batch-export Security Center baseline reports for each account.
## Prerequisites and Environment Setup
### 1. Install Alibaba Cloud CLI
```bash
# macOS
brew install aliyun-cli
# Or download from GitHub: https://github.com/aliyun/aliyun-cli/releases
```
Check credentials:
```bash
aliyun sts get-caller-identity
```
If the call fails, instruct the user to run `aliyun configure` and set up credentials (interactive step, must be completed by the user).
### 1.1 Configure AI mode and plugin mode (required)
This skill requires aliyun CLI plugin mode commands (kebab-case) and a fixed User-Agent declaration.
```bash
# Keep plugins up to date
aliyun plugin update
# Install required product plugins if missing
aliyun plugin install --names aliyun-cli-sts,aliyun-cli-sas
# Enable AI mode and set required UA segment
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent AlibabaCloud-Agent-Skills
# Optional checks / rollback
aliyun configure ai-mode show
aliyun configure ai-mode disable
```
### 2. Install Python ≥ 3.6
```bash
# Check version
python3 --version # Requires 3.6+, 3.9+ recommended
```
### 3. Create Virtual Environment and Install Dependencies
Create a virtual environment in `<skill-path>/scripts/` and install dependencies declared in `pyproject.toml`:
```bash
cd scripts/
# Option A: use venv
python3 -m venv .venv
.venv/bin/pip install -e .
# Option B: use uv (optional)
uv sync
# Option C: if current Python version is unsupported, install as system dependencies
pip install -r requirements.txt
```
### 4. Run Commands
All scripts must be executed with **Python from the virtual environment** (whether created via venv, uv, conda, etc.). This document uses `.venv/bin/python` in examples; replace it with your actual virtual environment path.
---
## Working Directory
`accounts.json` and exported Excel files are saved in the **agent's current working directory** (the directory where the command is executed). Script files themselves are located in `<skill-path>/scripts/`. Do not switch into the `scripts` directory when running commands, or `accounts.json` location may shift unexpectedly.
```bash
# Example: run from any directory
.venv/bin/python /path/to/scripts/accounts.py refresh
```
## Feature 1: Account Management (`accounts.py`)
### Workflow
1. **First use**: run `refresh` to fetch account list from the resource directory.
2. **Filter as needed**: use `search` to find target accounts and get AccountId.
3. **Enable/disable control**: use `enable` / `disable` to decide which accounts participate in batch export.
### Quick Start
#### Refresh account list
Fetch the latest account list from Alibaba Cloud resource directory and write to `accounts.json`. Existing `enable` states are preserved; new accounts are enabled by default.
```bash
.venv/bin/python accounts.py refresh
```
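The preserve-and-default rule described above can be sketched in a few lines of Python. This is an illustration rather than the script's exact code; the helper name `merge_enable_state` and the sample records are hypothetical.

```python
def merge_enable_state(fetched, existing):
    """Carry over saved enable flags; accounts not seen before default to enabled."""
    previous = {a["AccountId"]: a.get("enable", True) for a in existing}
    return [dict(a, enable=previous.get(a["AccountId"], True)) for a in fetched]


# Hypothetical data: account 1 was disabled earlier, account 2 is new
fetched = [
    {"AccountId": "1", "DisplayName": "a"},
    {"AccountId": "2", "DisplayName": "b"},
]
existing = [{"AccountId": "1", "DisplayName": "a", "enable": False}]
merged = merge_enable_state(fetched, existing)
print([(a["AccountId"], a["enable"]) for a in merged])
```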
#### List all accounts
```bash
.venv/bin/python accounts.py list
```
Sample output:
```
1225574417218097 cwx [enabled]
1234567890123456 prod-account [disabled]
```
#### Search accounts
Fuzzy-search by DisplayName, returning AccountId and enable status.
```bash
.venv/bin/python accounts.py search cwx
.venv/bin/python accounts.py search prod
```
#### Enable / disable accounts
Control whether an account participates in subsequent batch exports.
```bash
.venv/bin/python accounts.py enable 1225574417218097
.venv/bin/python accounts.py disable 1234567890123456
```
### `accounts.json` Structure
```json
[
{
"AccountId": "1225574417218097",
"DisplayName": "cwx",
"FolderId": "r-1Q4pqB",
"IsMaAccount": "NO",
"SasVersion": "0",
"enable": true
}
]
```
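A short sketch of how a consumer might read this structure and select the accounts that take part in a batch export. It assumes, as the refresh behavior suggests, that a record without an `enable` key counts as enabled; the function name `enabled_account_ids` is hypothetical.

```python
import json


def enabled_account_ids(accounts_json_text):
    """Return the AccountId of every record whose enable flag is true or missing."""
    accounts = json.loads(accounts_json_text)
    return [str(a["AccountId"]) for a in accounts if a.get("enable", True)]


# Hypothetical file content in the structure shown above
text = """[
  {"AccountId": "1225574417218097", "DisplayName": "cwx", "enable": true},
  {"AccountId": "1234567890123456", "DisplayName": "prod-account", "enable": false}
]"""
print(enabled_account_ids(text))
```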
---
## Feature 2: Batch Baseline Export (`baseline.py`)
Launch export tasks concurrently for all accounts with `enable=true`. Once polling reports that an export has completed, its file is downloaded and extracted, and all results are merged into a single Excel file.
### Workflow
1. **Concurrent submission**: submit `export-record` requests for all enabled accounts (QPS ≤ 5).
2. **Concurrent polling**: poll `describe-export-info` for each account until export completes.
3. **Download and extract**: download zip and extract xlsx.
4. **Merge output**: merge all account xlsx files into one file via `merge.py`, appending a “Resource Directory Account” column.
5. **Cleanup temporary files**: delete per-account temporary xlsx files after merge.
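The concurrency pattern behind steps 1 and 2 can be sketched with an `asyncio.Semaphore`. This is a simplified illustration under stated assumptions: `fake_export` stands in for a real CLI call, the names `run_limited` and `CONCURRENCY_LIMIT` are hypothetical, and `asyncio.run` requires Python 3.7 or later. A semaphore caps the number of in-flight calls at 5, which approximates rather than strictly enforces a QPS limit.

```python
import asyncio

CONCURRENCY_LIMIT = 5  # assumed cap, mirroring the "QPS <= 5" note above


async def run_limited(account_ids, worker, limit=CONCURRENCY_LIMIT):
    """Run worker(account_id) for every account with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(account_id):
        async with sem:
            return await worker(account_id)

    # return_exceptions=True keeps one failing account from cancelling the rest
    return await asyncio.gather(
        *(guarded(a) for a in account_ids), return_exceptions=True
    )


async def fake_export(account_id):
    """Stand-in for submitting an export task; a real worker would call the CLI."""
    await asyncio.sleep(0.01)
    return f"export-{account_id}"


results = asyncio.run(run_limited(["1001", "1002", "1003"], fake_export))
print(results)
```

Because `gather` preserves input order, each result can be matched back to its account by position.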
### Prerequisites
- `accounts.py refresh` has been executed and account enable/disable configuration is complete.
- aliyun CLI is configured with valid credentials and has SAS `export-record` and `describe-export-info` permissions.
- Accounts must have Security Center purchased (free edition accounts are skipped automatically).
### Export cloud platform configuration check results (CSPM)
Export `baselineCspm` results for all enabled accounts and merge into `baseline-cspm-merged-{date}.xlsx`.
```bash
# Export for all enabled accounts
.venv/bin/python baseline.py export-cspm
# Export for one specific account
.venv/bin/python baseline.py export-cspm --account-id 1225574417218097
```
### Export system baseline risk list
Export `exportHcWarning` risk list (high/medium/low, all statuses) for all enabled accounts and merge into `system-warning-merged-{date}.xlsx`.
```bash
# Export for all enabled accounts
.venv/bin/python baseline.py export-system-warning
# Export for one specific account
.venv/bin/python baseline.py export-system-warning --account-id 1225574417218097
```
### Output Files
| File | Description |
|------|------|
| `baseline-cspm-merged-{date}.xlsx` | Merged cloud platform configuration check results, including “Resource Directory Account” column |
| `system-warning-merged-{date}.xlsx` | Merged system baseline risk list, including “Resource Directory Account” column |
### Error Handling
| Scenario | Behavior |
|------|------|
| `FreeVersionNotPermit` | Silently skip this account and continue others |
| `NoPermission` / `Forbidden` | Silently skip this account |
| Export failed (server-side error) | Print `[failed]` message and continue with other accounts |
| All accounts skipped | Print message and exit without output file |
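The skip-versus-fail decision in the table can be expressed as a small classifier over the CLI's error text. The sketch below assumes that the error code appears verbatim in that text; the function name `classify_cli_failure` is hypothetical.

```python
SKIP_ERROR_CODES = ("FreeVersionNotPermit", "NoPermission", "Forbidden")


def classify_cli_failure(stderr_text):
    """Return ("skip", code) for a skippable error, otherwise ("fail", None)."""
    for code in SKIP_ERROR_CODES:
        if code in stderr_text:
            return "skip", code
    return "fail", None


print(classify_cli_failure("ErrorCode: FreeVersionNotPermit"))
print(classify_cli_failure("ErrorCode: InternalError"))
```

Substring matching is deliberately loose, so any message that merely mentions one of these codes is also treated as skippable.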
---
## Feature 3: Batch Vulnerability Export (`vuln.py`)
Launch vulnerability export tasks concurrently for all accounts with `enable=true`. Four vulnerability types are supported. Once polling reports that an export has completed, its file is downloaded and extracted, and all results are merged automatically.
### Workflow
1. **Concurrent submission**: submit `export-vul --force` requests for all enabled accounts (QPS ≤ 5).
2. **Concurrent polling**: poll `describe-vul-export-info --force` for each account until export completes.
3. **Download and extract**: download zip and extract xlsx.
4. **Merge output**: merge all account xlsx files into one file via `merge.py`, appending a “Resource Directory Account” column.
5. **Cleanup temporary files**: delete per-account temporary xlsx files after merge.
> When the current account is the same as the caller's primary account, `--ResourceDirectoryAccountId` is omitted automatically.
### Prerequisites
- `accounts.py refresh` has been executed and account enable/disable configuration is complete.
- aliyun CLI is configured with valid credentials and has SAS `export-vul` and `describe-vul-export-info` permissions.
- Accounts must have Security Center purchased (free edition accounts are skipped automatically).
### Export Linux software vulnerabilities (CVE)
Export unresolved Linux software vulnerabilities (high/medium/low priority) for all enabled accounts and merge into `vul-cve-merged-{date}.xlsx`.
```bash
# Export for all enabled accounts
.venv/bin/python vuln.py export-cve
# Export for one specific account
.venv/bin/python vuln.py export-cve --account-id 1225574417218097
```
### Export Windows system vulnerabilities
Export unresolved Windows system vulnerabilities (high/medium/low priority) for all enabled accounts and merge into `vul-sys-merged-{date}.xlsx`.
```bash
.venv/bin/python vuln.py export-sys
.venv/bin/python vuln.py export-sys --account-id 1225574417218097
```
### Export application vulnerabilities (including SCA)
Export unresolved application vulnerabilities (ECS + container, including software composition analysis) for all enabled accounts and merge into `vul-app-merged-{date}.xlsx`.
```bash
.venv/bin/python vuln.py export-app
.venv/bin/python vuln.py export-app --account-id 1225574417218097
```
### Export emergency vulnerabilities
Export emergency vulnerabilities (at-risk status) for all enabled accounts and merge into `vul-emg-merged-{date}.xlsx`.
```bash
.venv/bin/python vuln.py export-emg
.venv/bin/python vuln.py export-emg --account-id 1225574417218097
```
### Output Files
| File | Description |
|------|------|
| `vul-cve-merged-{date}.xlsx` | Merged Linux software vulnerability list, including “Resource Directory Account” column |
| `vul-sys-merged-{date}.xlsx` | Merged Windows system vulnerability list, including “Resource Directory Account” column |
| `vul-app-merged-{date}.xlsx` | Merged application vulnerability list (including SCA), including “Resource Directory Account” column |
| `vul-emg-merged-{date}.xlsx` | Merged emergency vulnerability list, including “Resource Directory Account” column |
### Export Parameter Details
| Type | `export-vul` parameters |
|------|----------------|
| `export-cve` | `--Type cve --Necessity asap,later,nntf --Dealed n` |
| `export-sys` | `--Type sys --Necessity asap,later,nntf --Dealed n` |
| `export-app` | `--Type app --Necessity asap,later,nntf --AttachTypes sca --AssetType ECS,CONTAINER --Dealed n` |
| `export-emg` | `--Type emg --RiskStatus y --Dealed n` |
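One way to keep this mapping in code is a lookup table keyed by subcommand. The sketch below copies the flag spellings from the table verbatim and is an assumption about structure, not the actual `vuln.py`; the names `EXPORT_VUL_PARAMS` and `build_export_args` are hypothetical, and the `--force` flag mentioned in the workflow is left out.

```python
# Flag spellings copied from the table above; not verified against vuln.py
EXPORT_VUL_PARAMS = {
    "export-cve": ["--Type", "cve", "--Necessity", "asap,later,nntf", "--Dealed", "n"],
    "export-sys": ["--Type", "sys", "--Necessity", "asap,later,nntf", "--Dealed", "n"],
    "export-app": ["--Type", "app", "--Necessity", "asap,later,nntf",
                   "--AttachTypes", "sca", "--AssetType", "ECS,CONTAINER",
                   "--Dealed", "n"],
    "export-emg": ["--Type", "emg", "--RiskStatus", "y", "--Dealed", "n"],
}


def build_export_args(export_type, account_id=None, caller_account_id=None):
    """Build the export-vul argument list for one account."""
    args = ["export-vul", *EXPORT_VUL_PARAMS[export_type]]
    # Omit the account parameter when exporting for the caller's own account
    if account_id and account_id != caller_account_id:
        args += ["--ResourceDirectoryAccountId", str(account_id)]
    return args


print(build_export_args("export-emg"))
```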
### Error Handling
| Scenario | Behavior |
|------|------|
| `FreeVersionNotPermit` | Silently skip this account and continue others |
| `NoPermission` / `Forbidden` | Silently skip this account |
| Export failed (server-side error) | Print `[failed]` message and continue with other accounts |
| All accounts skipped | Print message and exit without output file |
---
## Notes
- Scripts must run in a virtual environment. Examples use `.venv/bin/python`; replace with your actual virtual environment path.
- Manage aliyun CLI credentials with `aliyun configure`; do not hardcode AK/SK.
- SAS API supports only two endpoints: `cn-shanghai` (China mainland) and `ap-southeast-1` (outside China mainland).
FILE:references/ram-policies.md
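# RAM Permission Policies
This skill calls Alibaba Cloud Security Center (SAS) and Security Token Service (STS) through the aliyun CLI. The identity that runs it (a RAM user or RAM role) must be granted the following minimum permissions.
> **Note**: RAM authorization for Security Center (SAS) works at the **SERVICE level**. Resource-level authorization is not supported, so `Resource` must be set to `"*"`.
---
## Required RAM Actions
### Security Token Service (STS)
| Action | Calling script | Purpose |
|--------|----------|------|
| `sts:GetCallerIdentity` | `accounts.py`, `baseline.py`, `vuln.py` | Gets the primary account ID of the current credentials, used to decide whether to omit the `--ResourceDirectoryAccountId` parameter |
### Security Center (SAS)
RAM code: `yundun-sas` (aliases: `threatdetection`, `yundun-aegis`)
| Action | Calling script | Access level | Purpose |
|--------|----------|----------|------|
| `yundun-sas:ListAccountsInResourceDirectory` | `accounts.py` | Read | Fetches the list of all member accounts from the resource directory |
| `yundun-sas:DescribeMonitorAccounts` | `accounts.py` | Read | Queries the member accounts already monitored by SAS, used for filtering |
| `yundun-sas:ExportRecord` | `baseline.py` | Write | Starts a baseline check result export task (baselineCspm / exportHcWarning) |
| `yundun-sas:DescribeExportInfo` | `baseline.py` | Read | Polls the baseline export task status and gets the download link |
| `yundun-sas:ExportVul` | `vuln.py` | Write | Starts a vulnerability export task (cve / sys / app / emg) |
| `yundun-sas:DescribeVulExportInfo` | `vuln.py` | Read | Polls the vulnerability export task status and gets the download link |
---
## Minimum Permission Policy Example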
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"sts:GetCallerIdentity"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"yundun-sas:ListAccountsInResourceDirectory",
"yundun-sas:DescribeMonitorAccounts",
"yundun-sas:ExportRecord",
"yundun-sas:DescribeExportInfo",
"yundun-sas:ExportVul",
"yundun-sas:DescribeVulExportInfo"
],
"Resource": "*"
}
]
}
```
---
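## Multi-Account Access
This tool is designed to run under the resource directory **master account** and operates on member accounts by passing the `--ResourceDirectoryAccountId` parameter.
- The credentials must belong to the resource directory's **management account** or to an authorized RAM role
- No extra configuration is needed on the member account side; SAS multi-account permissions are controlled centrally by the master account
- If the credentials belong to a member account (not the master account), the `--ResourceDirectoryAccountId` parameter is omitted automatically and only that account's own data is exported
---
## Authorization Notes
- **Read operations** (`Describe*`, `GetCallerIdentity`): do not modify any resources; low risk
- **Write operations** (`ExportRecord`, `ExportVul`): trigger export tasks on the server side without modifying user assets or configuration; low risk
- Because SAS does not support resource-level authorization, `Resource` must be set to `"*"` for every `yundun-sas` action
---
## References
- [Security Center RAM authorization](https://www.alibabacloud.com/help/en/security-center/developer-reference/api-sas-2018-12-03-ram)
- [STS GetCallerIdentity](https://www.alibabacloud.com/help/en/resource-access-management/latest/getcalleridentity)
- [RAM custom policies](https://help.aliyun.com/zh/ram/user-guide/create-a-custom-policy)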
FILE:scripts/accounts.py
#!/usr/bin/env python3
"""accounts.py: multi-account management tool
Usage:
    uv run accounts.py refresh
    uv run accounts.py search <DisplayName>
    uv run accounts.py enable <AccountId>
    uv run accounts.py disable <AccountId>
    uv run accounts.py list
"""
import argparse
import json
import subprocess
import sys
from pathlib import Path
ACCOUNTS_FILE = Path("accounts.json")
ALIYUN_USER_AGENT_HEADER = "User-Agent=AlibabaCloud-Agent-Skills/alibabacloud-sas-multiaccount-manage"
CLI_CONNECT_TIMEOUT_SECONDS = 10
CLI_READ_TIMEOUT_SECONDS = 60
def _aliyun_cmd(*args):
    """Build the common aliyun CLI arguments (User-Agent and timeout settings)."""
    return [
        "aliyun",
        "--header",
        ALIYUN_USER_AGENT_HEADER,
        "--connect-timeout",
        str(CLI_CONNECT_TIMEOUT_SECONDS),
        "--read-timeout",
        str(CLI_READ_TIMEOUT_SECONDS),
        *args,
    ]
def load_accounts():
    if not ACCOUNTS_FILE.exists():
        print("Error: accounts.json does not exist; run refresh first", file=sys.stderr)
        sys.exit(1)
    return json.loads(ACCOUNTS_FILE.read_text(encoding="utf-8"))
def save_accounts(accounts):
    ACCOUNTS_FILE.write_text(
        json.dumps(accounts, indent=2, ensure_ascii=False), encoding="utf-8"
    )
def cmd_refresh(args):
    """Call aliyun sas list-accounts-in-resource-directory and write accounts.json."""
    region_id = getattr(args, "region_id", "cn-shanghai")
    # Get the primary account of the current credentials (it is also in the operable scope)
    identity_result = subprocess.run(
        _aliyun_cmd("sts", "get-caller-identity"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if identity_result.returncode != 0:
        print(f"get-caller-identity call failed:\n{identity_result.stderr}", file=sys.stderr)
        sys.exit(1)
    caller_account_id = str(json.loads(identity_result.stdout)["AccountId"])
    result = subprocess.run(
        _aliyun_cmd("sas", "--region", region_id, "list-accounts-in-resource-directory"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        print(f"aliyun CLI call failed:\n{result.stderr}", file=sys.stderr)
        sys.exit(1)
    data = json.loads(result.stdout)
    accounts = data.get("Accounts", [])
    # Call describe-monitor-accounts and keep only monitored accounts (plus the caller's own primary account)
    monitor_result = subprocess.run(
        _aliyun_cmd("sas", "--region", region_id, "describe-monitor-accounts"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if monitor_result.returncode != 0:
        print(f"describe-monitor-accounts call failed:\n{monitor_result.stderr}", file=sys.stderr)
        sys.exit(1)
    monitor_data = json.loads(monitor_result.stdout)
    monitored_ids = set(str(i) for i in monitor_data.get("AccountIds", []))
    monitored_ids.add(caller_account_id)  # the caller's own primary account is always in scope
    accounts = [a for a in accounts if str(a["AccountId"]) in monitored_ids]
    # Preserve existing enable states; new accounts default to enable=true
    existing = {}
    if ACCOUNTS_FILE.exists():
        for a in json.loads(ACCOUNTS_FILE.read_text(encoding="utf-8")):
            existing[a["AccountId"]] = a.get("enable", True)
    for account in accounts:
        account["enable"] = existing.get(account["AccountId"], True)
    save_accounts(accounts)
    print(f"Refreshed {len(accounts)} accounts, written to {ACCOUNTS_FILE}")
def cmd_search(args):
    """Fuzzy-search by DisplayName and print the AccountId."""
    keyword = args.keyword.lower()
    accounts = load_accounts()
    results = [
        a for a in accounts if keyword in a.get("DisplayName", "").lower()
    ]
    if not results:
        print(f"No accounts matching '{args.keyword}'")
        return
    for a in results:
        status = "enabled" if a.get("enable", True) else "disabled"
        print(f"{a['AccountId']}\t{a.get('DisplayName', '')}\t[{status}]")
def _set_enable(account_id, value):
    accounts = load_accounts()
    found = False
    for a in accounts:
        if a["AccountId"] == account_id:
            a["enable"] = value
            found = True
            break
    if not found:
        print(f"Error: account {account_id} not found", file=sys.stderr)
        sys.exit(1)
    save_accounts(accounts)
    action = "enabled" if value else "disabled"
    print(f"Account {account_id} {action}")
def cmd_enable(args):
    _set_enable(args.account_id, True)
def cmd_disable(args):
    _set_enable(args.account_id, False)
def cmd_list(_args):
    """List all accounts."""
    accounts = load_accounts()
    for a in accounts:
        status = "enabled" if a.get("enable", True) else "disabled"
        print(f"{a['AccountId']}\t{a.get('DisplayName', ''):<20}\t[{status}]")
def get_enabled_accounts():
    """For use by other modules: return all accounts with enable=True."""
    return [a for a in load_accounts() if a.get("enable", True)]
def get_caller_account_id():
    """Get the primary account ID of the current credentials (via aliyun sts get-caller-identity)."""
    result = subprocess.run(
        _aliyun_cmd("sts", "get-caller-identity"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        print(f"get-caller-identity call failed:\n{result.stderr}", file=sys.stderr)
        sys.exit(1)
    return str(json.loads(result.stdout)["AccountId"])
def main():
    parser = argparse.ArgumentParser(
        description="Alibaba Cloud Security Center multi-account management tool"
    )
    sub = parser.add_subparsers(dest="command", metavar="command")
    sub.required = True
    p_refresh = sub.add_parser("refresh", help="Refresh the account list (fetched from the resource directory)")
    p_refresh.add_argument(
        "--region-id",
        dest="region_id",
        choices=["cn-shanghai", "ap-southeast-1"],
        default="cn-shanghai",
        help="SAS API region: cn-shanghai (China mainland, default) / ap-southeast-1 (outside China mainland)",
    )
    p_search = sub.add_parser("search", help="Search accounts by DisplayName")
    p_search.add_argument("keyword", help="Search keyword")
    p_enable = sub.add_parser("enable", help="Enable the specified account")
    p_enable.add_argument("account_id", help="Account ID")
    p_disable = sub.add_parser("disable", help="Disable the specified account")
    p_disable.add_argument("account_id", help="Account ID")
    sub.add_parser("list", help="List all accounts and their status")
    args = parser.parse_args()
    dispatch = {
        "refresh": cmd_refresh,
        "search": cmd_search,
        "enable": cmd_enable,
        "disable": cmd_disable,
        "list": cmd_list,
    }
    dispatch[args.command](args)
if __name__ == "__main__":
    main()
FILE:scripts/baseline.py
#!/usr/bin/env python3
"""baseline.py — 云安全中心基线/系统基线批量导出工具
用法:
# 导出所有启用账号的云平台配置检查结果
uv run baseline.py export-cspm
# 仅导出指定账号
uv run baseline.py export-cspm --account-id 1234567890
# 导出系统基线风险列表
uv run baseline.py export-system-warning
uv run baseline.py export-system-warning --account-id 1234567890
"""
import argparse
import asyncio
import json
import shutil
import sys
import urllib.request
import zipfile
from datetime import date
from pathlib import Path
# 将 scripts 目录加入路径以便导入同级模块
sys.path.insert(0, str(Path(__file__).parent))
from accounts import get_caller_account_id, get_enabled_accounts # noqa: E402
from merge import merge_excel # noqa: E402
TODAY = date.today().strftime("%Y%m%d")
QPS_LIMIT = 5 # API 并发上限
ALIYUN_USER_AGENT_HEADER = "User-Agent=AlibabaCloud-Agent-Skills/alibabacloud-sas-multiaccount-manage"
CLI_CONNECT_TIMEOUT_SECONDS = 10
CLI_READ_TIMEOUT_SECONDS = 60
DOWNLOAD_TIMEOUT_SECONDS = 60
# 全局信号量,在事件循环内初始化
# 当前凭证的主账号 ID,在 do_export() 中初始化
_caller_account_id = ""
# SAS API 地域,在 do_export() 中初始化
_region_id = "cn-shanghai"
# 可跳过的 API 错误码(账号无权限/免费版限制等),静默忽略
_SKIP_ERROR_CODES = {"FreeVersionNotPermit", "NoPermission", "Forbidden"}
class AccountSkippedError(Exception):
"""账号因权限不足等原因被跳过,不中断整体流程。"""
def __init__(self, account_id, reason):
self.account_id = account_id
self.reason = reason
super().__init__(f"账号 {account_id} 跳过: {reason}")
# ────────────────────────────── 异步 API 封装 ──────────────────────────────
async def _run_aliyun_async(args, account_id=""):
"""异步运行 aliyun CLI,通过信号量将并发 API 调用限制在 QPS ≤ 5。
对可跳过的错误码(如 FreeVersionNotPermit)抛出 AccountSkippedError。
"""
async with _api_sem:
proc = await asyncio.create_subprocess_exec(
"aliyun",
"--header",
ALIYUN_USER_AGENT_HEADER,
"--connect-timeout",
str(CLI_CONNECT_TIMEOUT_SECONDS),
"--read-timeout",
str(CLI_READ_TIMEOUT_SECONDS),
"sas",
"--region",
_region_id,
*args,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
err_text = stderr.decode()
if proc.returncode != 0:
# 检查是否属于可跳过的错误码(静默忽略)
for code in _SKIP_ERROR_CODES:
if code in err_text:
raise AccountSkippedError(account_id, code)
print(f"aliyun CLI 调用失败:\n{err_text}", file=sys.stderr)
raise RuntimeError("aliyun CLI error")
# returncode==0 但 stderr 中仍含可跳过错误码时也忽略
for code in _SKIP_ERROR_CODES:
if code in err_text:
raise AccountSkippedError(account_id, code)
try:
return json.loads(stdout.decode())
except json.JSONDecodeError:
print(f"响应解析失败:\n{stdout.decode()}", file=sys.stderr)
raise
async def start_export_async(export_type, account_id, params=None):
"""发起导出任务,返回 export_id。"""
cli_args = [
"export-record",
"--lang", "zh",
"--export-type", export_type,
]
if account_id != _caller_account_id:
cli_args += ["--resource-directory-account-id", str(account_id)]
if params:
cli_args += ["--params", params]
data = await _run_aliyun_async(cli_args, account_id)
export_id = str(data["Id"])
print(f" [提交] 账号 {account_id},export_id={data['Id']}")
return export_id
async def wait_for_export_async(export_id, account_id, poll_interval=5):
"""轮询导出状态,成功后返回下载链接。"""
while True:
cli_args = [
"describe-export-info",
"--export-id", export_id,
]
if account_id != _caller_account_id:
cli_args += ["--resource-directory-account-id", str(account_id)]
data = await _run_aliyun_async(cli_args, account_id)
status = data.get("ExportStatus", "")
if status == "success":
return data["Link"]
if status in ("failed", "error"):
raise RuntimeError(f"账号 {account_id} 导出失败,状态={status}")
await asyncio.sleep(poll_interval)
async def download_and_extract_async(link, account_id, prefix):
    """Download the zip asynchronously, extract it, and rename the result to {prefix}-{account_id}-{date}.xlsx."""
    zip_path = Path(f"{prefix}-{account_id}-{TODAY}.zip")
    print(f" [download] account {account_id}...")
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, _download_with_timeout, link, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        if not names:
            raise RuntimeError(f"Account {account_id}: downloaded archive is empty")
        zf.extractall(path=zip_path.parent)
    # The first archive entry is taken to be the exported xlsx
    extracted = zip_path.parent / names[0]
    output_path = Path(f"{prefix}-{account_id}-{TODAY}.xlsx")
    if extracted != output_path:
        extracted.rename(output_path)
    if zip_path.exists():
        zip_path.unlink()
    return str(output_path)
def _download_with_timeout(link, output_path):
    """Download a file with a timeout so a network failure cannot block forever."""
    with urllib.request.urlopen(link, timeout=DOWNLOAD_TIMEOUT_SECONDS) as response:
        with Path(output_path).open("wb") as fp:
            shutil.copyfileobj(response, fp)
async def _wait_and_download(export_id, account_id, prefix):
"""等待导出完成后下载,返回 merge 所需的 input dict。"""
link = await wait_for_export_async(export_id, account_id)
xlsx_path = await download_and_extract_async(link, account_id, prefix)
return {"filename": xlsx_path, "account_id": account_id}
# ────────────────────────────── 异步导出流程 ──────────────────────────────
async def do_export_async(export_type, prefix, merged_name, account_ids, params=None):
"""并发导出流程:
阶段 1 — 并发提交所有账号的导出任务(QPS ≤ 5)
阶段 2 — 并发轮询 + 下载解压(QPS ≤ 5)
阶段 3 — 合并所有 xlsx
"""
global _api_sem
_api_sem = asyncio.Semaphore(QPS_LIMIT)
if not account_ids:
print("错误: 没有可用账号", file=sys.stderr)
sys.exit(1)
print(f"共 {len(account_ids)} 个账号待导出(耗时操作,请耐心等待)")
# 阶段 1:并发提交所有导出任务
submit_results = await asyncio.gather(
*[start_export_async(export_type, aid, params) for aid in account_ids],
return_exceptions=True,
)
pending = [] # (export_id, account_id)
skipped = []
for aid, res in zip(account_ids, submit_results):
if isinstance(res, AccountSkippedError):
skipped.append(aid)
elif isinstance(res, BaseException):
print(f" [失败] 账号 {aid}: {res}", file=sys.stderr)
else:
pending.append((res, aid))
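    if not pending:
        # Covers both skipped accounts and accounts whose submission failed
        print(f"No export task was submitted successfully ({len(skipped)} skipped); nothing to merge")
        return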
# 阶段 2:并发等待 + 下载解压
download_results = await asyncio.gather(
*[_wait_and_download(eid, aid, prefix) for eid, aid in pending],
return_exceptions=True,
)
inputs = []
failed = []
for (_, aid), res in zip(pending, download_results):
if isinstance(res, AccountSkippedError):
skipped.append(aid)
elif isinstance(res, BaseException):
print(f" [失败] 账号 {aid}: {res}", file=sys.stderr)
failed.append(aid)
else:
print(f" [成功] 账号 {aid}")
inputs.append(res)
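    if skipped:
        # Account IDs may not be strings, so convert before joining
        print(f"Skipped {len(skipped)} account(s): {', '.join(map(str, skipped))}")
    if failed:
        print(f"Failed {len(failed)} account(s): {', '.join(map(str, failed))}", file=sys.stderr)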
if not inputs:
print("没有成功下载的文件,跳过合并")
return
# 阶段 3:合并
merge_excel(merged_name, inputs)
# 阶段 4:删除临时 xlsx 文件
for item in inputs:
tmp = Path(item["filename"])
if tmp.exists():
tmp.unlink()
print(f"已生成: {merged_name}(共 {len(inputs)} 个账号,已清理临时文件)")
def do_export(export_type, prefix, merged_name, account_ids, params=None, region_id="cn-shanghai"):
    """Synchronous entry point; drives the async flow with loop.run_until_complete."""
    global _caller_account_id, _region_id
    _caller_account_id = get_caller_account_id()
    _region_id = region_id
    loop = asyncio.get_event_loop()
    loop.run_until_complete(
        do_export_async(export_type, prefix, merged_name, account_ids, params)
    )
# ────────────────────────────── CLI 子命令 ──────────────────────────────
def cmd_export_cspm(args):
"""导出云平台配置检查结果(baselineCspm)。"""
if args.account_id:
account_ids = [args.account_id]
else:
accounts = get_enabled_accounts()
account_ids = [a["AccountId"] for a in accounts]
do_export(
export_type="baselineCspm",
prefix="baseline-cspm",
merged_name=f"baseline-cspm-merged-{TODAY}.xlsx",
account_ids=account_ids,
region_id=args.region_id,
)
def cmd_export_system_warning(args):
"""导出系统基线风险列表(exportHcWarning)。"""
if args.account_id:
account_ids = [args.account_id]
else:
accounts = get_enabled_accounts()
account_ids = [a["AccountId"] for a in accounts]
params = json.dumps(
{
"CheckLevel": "high,medium,low",
"CheckWarningStatusList": [1, 3, 6, 8],
"Source": "default",
},
ensure_ascii=False,
)
do_export(
export_type="exportHcWarning",
prefix="system-warning",
merged_name=f"system-warning-merged-{TODAY}.xlsx",
account_ids=account_ids,
params=params,
region_id=args.region_id,
)
# ────────────────────────────── 入口 ──────────────────────────────
def main():
parser = argparse.ArgumentParser(description="云安全中心基线批量导出工具")
sub = parser.add_subparsers(dest="command", metavar="command")
sub.required = True
# export-cspm
p_cspm = sub.add_parser(
"export-cspm",
help="导出云平台配置检查结果(baselineCspm)",
)
p_cspm.add_argument(
"--account-id",
metavar="ACCOUNT_ID",
help="指定单个账号 ID(默认导出所有启用账号)",
)
p_cspm.add_argument(
"--region-id",
dest="region_id",
choices=["cn-shanghai", "ap-southeast-1"],
default="cn-shanghai",
help="SAS API 地域:cn-shanghai(中国大陆,默认)/ ap-southeast-1(非中国大陆)",
)
# export-system-warning
p_warn = sub.add_parser(
"export-system-warning",
help="导出系统基线风险列表(exportHcWarning)",
)
p_warn.add_argument(
"--account-id",
metavar="ACCOUNT_ID",
help="指定单个账号 ID(默认导出所有启用账号)",
)
p_warn.add_argument(
"--region-id",
dest="region_id",
choices=["cn-shanghai", "ap-southeast-1"],
default="cn-shanghai",
help="SAS API 地域:cn-shanghai(中国大陆,默认)/ ap-southeast-1(非中国大陆)",
)
args = parser.parse_args()
dispatch = {
"export-cspm": cmd_export_cspm,
"export-system-warning": cmd_export_system_warning,
}
dispatch[args.command](args)
if __name__ == "__main__":
main()
FILE:scripts/merge.py
#!/usr/bin/env python3
"""merge.py: shared Excel merge utility

Usage (imported as a module):
    from merge import merge_excel
    merge_excel(
        export_name="output.xlsx",
        inputs=[
            {"filename": "a.xlsx", "account_id": "123456"},
            {"filename": "b.xlsx", "account_id": "789012"},
        ],
    )

Usage (command line):
    uv run merge.py --output merged.xlsx --input a.xlsx:123 b.xlsx:456
"""
import argparse
import sys
import warnings
from pathlib import Path
import openpyxl
warnings.filterwarnings(
"ignore",
message="Workbook contains no default style",
category=UserWarning,
)
def merge_excel(export_name, inputs):
    """Merge several Excel files into one and append a trailing "资源管理账号"
    (resource management account) column.

    Args:
        export_name: output file name (.xlsx)
        inputs: list of dicts, each with filename (str) and account_id (str)

    Returns:
        The output file path as a string.
    """
    if not inputs:
        raise ValueError("inputs must not be empty")
    all_rows = []
    header = None
    valid_count = 0
    for item in inputs:
        filename = item["filename"]
        account_id = str(item["account_id"])
        if not Path(filename).exists():
            print("Warning: file {} does not exist, skipping".format(filename), file=sys.stderr)
            continue
        wb = openpyxl.load_workbook(filename, data_only=True)
        ws = wb["Sheet0"] if "Sheet0" in wb.sheetnames else wb.active
        rows = list(ws.values)
        wb.close()
        if not rows:
            continue
        if header is None:
            # The header is taken from the first non-empty file; later files are assumed to share it
            header = list(rows[0]) + ["资源管理账号"]
            all_rows.append(header)
        for row in rows[1:]:
            all_rows.append(list(row) + [account_id])
        valid_count += 1
    if not all_rows or valid_count == 0:
        raise RuntimeError("no valid files to merge")
    out_wb = openpyxl.Workbook()
    out_ws = out_wb.active
    out_ws.title = "Sheet0"
    for row in all_rows:
        out_ws.append(row)
    out_wb.save(export_name)
    data_rows = len(all_rows) - 1
    print("Merge complete: {} ({} rows, {} accounts)".format(export_name, data_rows, valid_count))
    return export_name
def main():
parser = argparse.ArgumentParser(description="多账号 Excel 表格合并工具")
parser.add_argument("--output", "-o", required=True, help="输出文件名")
parser.add_argument(
"--input",
"-i",
nargs="+",
required=True,
metavar="FILE:ACCOUNT_ID",
help="输入文件,格式: 文件路径:账号ID,可指定多个",
)
args = parser.parse_args()
inputs = []
for item in args.input:
parts = item.rsplit(":", 1)
if len(parts) != 2:
print(
"错误: 输入格式不正确 '{}',应为 '文件路径:账号ID'".format(item),
file=sys.stderr,
)
sys.exit(1)
inputs.append({"filename": parts[0], "account_id": parts[1]})
merge_excel(args.output, inputs)
if __name__ == "__main__":
main()
FILE:scripts/pyproject.toml
[project]
name = "scripts"
version = "0.1.0"
description = "Alibaba Cloud Security Center multi-account management tool"
requires-python = ">=3.8"
dependencies = [
"openpyxl>=3.1.5",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
only-include = [
"accounts.py",
"merge.py",
"baseline.py",
"export_baseline.py",
"vuln.py",
]
[project.scripts]
accounts = "accounts:main"
merge = "merge:main"
baseline = "baseline:main"
vuln = "vuln:main"
FILE:scripts/requirements.txt
et_xmlfile==2.0.0
numpy==2.4.4
openpyxl==3.1.5
python-dateutil==2.9.0.post0
six==1.17.0
FILE:scripts/vuln.py
#!/usr/bin/env python3
"""vuln.py: Security Center bulk vulnerability export tool

Usage:
    uv run vuln.py export-cve # Linux software vulnerabilities
    uv run vuln.py export-sys # Windows system vulnerabilities
    uv run vuln.py export-app # application vulnerabilities (including SCA)
    uv run vuln.py export-emg # emergency vulnerabilities
    # export a single account only
    uv run vuln.py export-cve --account-id 1234567890
"""
import argparse
import asyncio
import json
import shutil
import sys
import urllib.request
import zipfile
from datetime import date
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent))
from accounts import get_caller_account_id, get_enabled_accounts # noqa: E402
from merge import merge_excel # noqa: E402
TODAY = date.today().strftime("%Y%m%d")
QPS_LIMIT = 5
ALIYUN_USER_AGENT_HEADER = "User-Agent=AlibabaCloud-Agent-Skills/alibabacloud-sas-multiaccount-manage"
CLI_CONNECT_TIMEOUT_SECONDS = 10
CLI_READ_TIMEOUT_SECONDS = 60
DOWNLOAD_TIMEOUT_SECONDS = 60
_api_sem = None  # initialised inside the event loop
# Root account ID of the current credentials; set in do_export()
_caller_account_id = ""
# SAS API region; set in do_export()
_region_id = "cn-shanghai"
_SKIP_ERROR_CODES = {"FreeVersionNotPermit", "NoPermission", "Forbidden"}
# ────────────────────────────── 导出类型配置 ──────────────────────────────
EXPORT_CONFIGS = {
"cve": {
"cli_args": [
"--lang", "zh",
"--type", "cve",
"--necessity", "asap,later,nntf",
"--dealed", "n",
],
"prefix": "vul-cve",
"desc": "Linux 软件漏洞",
},
"sys": {
"cli_args": [
"--lang", "zh",
"--type", "sys",
"--necessity", "asap,later,nntf",
"--dealed", "n",
],
"prefix": "vul-sys",
"desc": "Windows 系统漏洞",
},
"app": {
"cli_args": [
"--lang", "zh",
"--type", "app",
"--necessity", "asap,later,nntf",
"--attach-types", "sca",
"--asset-type", "ECS,CONTAINER",
"--dealed", "n",
],
"prefix": "vul-app",
"desc": "应用漏洞",
},
"emg": {
"cli_args": [
"--lang", "zh",
"--type", "emg",
"--risk-status", "y",
"--dealed", "n",
],
"prefix": "vul-emg",
"desc": "应急漏洞",
},
}
# ────────────────────────────── 异常 ──────────────────────────────
class AccountSkippedError(Exception):
"""账号因权限不足等原因被跳过,不中断整体流程。"""
def __init__(self, account_id, reason):
self.account_id = account_id
self.reason = reason
super().__init__(f"账号 {account_id} 跳过: {reason}")
# ────────────────────────────── 异步 API 封装 ──────────────────────────────
async def _run_aliyun_async(args, account_id=""):
"""异步运行 aliyun CLI,通过信号量将并发 API 调用限制在 QPS ≤ 5。"""
async with _api_sem:
proc = await asyncio.create_subprocess_exec(
"aliyun",
"--header",
ALIYUN_USER_AGENT_HEADER,
"--connect-timeout",
str(CLI_CONNECT_TIMEOUT_SECONDS),
"--read-timeout",
str(CLI_READ_TIMEOUT_SECONDS),
"sas",
"--region",
_region_id,
*args,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
err_text = stderr.decode()
if proc.returncode != 0:
for code in _SKIP_ERROR_CODES:
if code in err_text:
raise AccountSkippedError(account_id, code)
print(f"aliyun CLI 调用失败:\n{err_text}", file=sys.stderr)
raise RuntimeError("aliyun CLI error")
# returncode==0 但 stderr 中仍含可跳过错误码时也忽略
for code in _SKIP_ERROR_CODES:
if code in err_text:
raise AccountSkippedError(account_id, code)
try:
return json.loads(stdout.decode())
except json.JSONDecodeError:
print(f"响应解析失败:\n{stdout.decode()}", file=sys.stderr)
raise
async def start_export_async(vul_type, account_id):
"""发起漏洞导出任务,返回 export_id。"""
cli_args = ["export-vul", "--force"] + EXPORT_CONFIGS[vul_type]["cli_args"]
if account_id != _caller_account_id:
cli_args += ["--resource-directory-account-id", str(account_id)]
data = await _run_aliyun_async(cli_args, account_id)
print(f" [提交] 账号 {account_id},export_id={data['Id']}")
return str(data["Id"])
async def wait_for_export_async(export_id, account_id, poll_interval=5):
"""轮询 describe-vul-export-info,成功后返回下载链接。"""
while True:
cli_args = [
"describe-vul-export-info",
"--force",
"--export-id", export_id,
]
if account_id != _caller_account_id:
cli_args += ["--resource-directory-account-id", str(account_id)]
data = await _run_aliyun_async(cli_args, account_id)
status = data.get("ExportStatus", "")
if status == "success":
return data["Link"]
if status in ("failed", "error"):
raise RuntimeError(f"账号 {account_id} 导出失败,状态={status}")
await asyncio.sleep(poll_interval)
async def download_and_extract_async(link, account_id, prefix):
"""下载 zip,解压,重命名为 {prefix}-{account_id}-{date}.xlsx。"""
zip_path = Path(f"{prefix}-{account_id}-{TODAY}.zip")
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, _download_with_timeout, link, zip_path)
with zipfile.ZipFile(zip_path) as zf:
names = zf.namelist()
zf.extractall(path=zip_path.parent)
extracted = zip_path.parent / names[0]
output_path = Path(f"{prefix}-{account_id}-{TODAY}.xlsx")
if extracted != output_path:
extracted.rename(output_path)
if zip_path.exists():
zip_path.unlink()
return str(output_path)
def _download_with_timeout(link, output_path):
"""下载文件并设置超时,避免网络异常时无限阻塞。"""
with urllib.request.urlopen(link, timeout=DOWNLOAD_TIMEOUT_SECONDS) as response:
with Path(output_path).open("wb") as fp:
shutil.copyfileobj(response, fp)
async def _wait_and_download(export_id, account_id, prefix):
link = await wait_for_export_async(export_id, account_id)
xlsx_path = await download_and_extract_async(link, account_id, prefix)
return {"filename": xlsx_path, "account_id": account_id}
# ────────────────────────────── 导出流程 ──────────────────────────────
async def do_export_async(vul_type, account_ids):
"""单类型并发导出:提交 → 等待 → 下载 → 合并 → 清理临时文件。"""
global _api_sem
_api_sem = asyncio.Semaphore(QPS_LIMIT)
config = EXPORT_CONFIGS[vul_type]
prefix = config["prefix"]
merged_name = f"{prefix}-merged-{TODAY}.xlsx"
print(f"导出【{config['desc']}】共 {len(account_ids)} 个账号(耗时操作,请耐心等待)")
# 阶段 1:并发提交
submit_results = await asyncio.gather(
*[start_export_async(vul_type, aid) for aid in account_ids],
return_exceptions=True,
)
pending = []
skipped = []
for aid, res in zip(account_ids, submit_results):
if isinstance(res, AccountSkippedError):
skipped.append(aid)
elif isinstance(res, BaseException):
print(f" [失败] 账号 {aid}: {res}", file=sys.stderr)
else:
pending.append((res, aid))
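    if not pending:
        # Covers both skipped accounts and accounts whose submission failed
        print(f"No export task was submitted successfully ({len(skipped)} skipped); skipping [{config['desc']}]")
        return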
# 阶段 2:并发等待 + 下载
download_results = await asyncio.gather(
*[_wait_and_download(eid, aid, prefix) for eid, aid in pending],
return_exceptions=True,
)
inputs = []
failed = []
for (_, aid), res in zip(pending, download_results):
if isinstance(res, AccountSkippedError):
skipped.append(aid)
elif isinstance(res, BaseException):
print(f" [失败] 账号 {aid}: {res}", file=sys.stderr)
failed.append(aid)
else:
print(f" [成功] 账号 {aid}")
inputs.append(res)
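    if skipped:
        # Account IDs may not be strings, so convert before joining
        print(f"Skipped {len(skipped)} account(s): {', '.join(map(str, skipped))}")
    if failed:
        print(f"Failed {len(failed)} account(s): {', '.join(map(str, failed))}", file=sys.stderr)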
if not inputs:
print(f"没有成功下载的文件,跳过【{config['desc']}】合并")
return
# 阶段 3:合并
merge_excel(merged_name, inputs)
# 阶段 4:清理临时文件
for item in inputs:
p = Path(item["filename"])
if p.exists():
p.unlink()
print(f"已生成: {merged_name}(共 {len(inputs)} 个账号,已清理临时文件)")
def do_export(vul_type, account_ids, region_id="cn-shanghai"):
global _caller_account_id, _region_id
_caller_account_id = get_caller_account_id()
_region_id = region_id
loop = asyncio.get_event_loop()
loop.run_until_complete(do_export_async(vul_type, account_ids))
# ────────────────────────────── CLI ──────────────────────────────
def _get_account_ids(args):
if getattr(args, "account_id", None):
return [args.account_id]
accounts = get_enabled_accounts()
if not accounts:
print("错误: 没有可用账号,请先执行 accounts.py refresh", file=sys.stderr)
sys.exit(1)
return [a["AccountId"] for a in accounts]
def cmd_export_cve(args):
do_export("cve", _get_account_ids(args), args.region_id)
def cmd_export_sys(args):
do_export("sys", _get_account_ids(args), args.region_id)
def cmd_export_app(args):
do_export("app", _get_account_ids(args), args.region_id)
def cmd_export_emg(args):
do_export("emg", _get_account_ids(args), args.region_id)
def main():
parser = argparse.ArgumentParser(description="云安全中心漏洞批量导出工具")
sub = parser.add_subparsers(dest="command", metavar="command")
sub.required = True
def _add_sub(name, help_text):
p = sub.add_parser(name, help=help_text)
p.add_argument(
"--account-id",
metavar="ACCOUNT_ID",
help="指定单个账号 ID(默认导出所有启用账号)",
)
p.add_argument(
"--region-id",
dest="region_id",
choices=["cn-shanghai", "ap-southeast-1"],
default="cn-shanghai",
help="SAS API 地域:cn-shanghai(中国大陆,默认)/ ap-southeast-1(非中国大陆)",
)
return p
_add_sub("export-cve", "导出 Linux 软件漏洞(cve)")
_add_sub("export-sys", "导出 Windows 系统漏洞(sys)")
_add_sub("export-app", "导出应用漏洞(app,含 SCA)")
_add_sub("export-emg", "导出应急漏洞(emg)")
args = parser.parse_args()
dispatch = {
"export-cve": cmd_export_cve,
"export-sys": cmd_export_sys,
"export-app": cmd_export_app,
"export-emg": cmd_export_emg,
}
dispatch[args.command](args)
if __name__ == "__main__":
main()
Alibaba Cloud EMAS APM (mobile Application Performance Monitoring) issue troubleshooting skill. Covers the 4 read-only OpenAPIs exposed by the `aliyun emas-a...
---
name: alibabacloud-emas-apm-query
description: |
Alibaba Cloud EMAS APM (mobile Application Performance Monitoring) issue
troubleshooting skill. Covers the 4 read-only OpenAPIs exposed by the
`aliyun emas-appmonitor` plugin: `get-issues` / `get-issue` / `get-errors`
/ `get-error`. Capabilities: Top-N aggregation, sample stack drill-down and
dimension breakdowns for 6 issue types (crash / anr / lag / custom /
memory_leak / memory_alloc), combined with the user's source code (Java /
Kotlin / Objective-C / Swift / ArkTS / Dart / C# / JS) to produce root cause
analysis and fix suggestions.
Client coverage: native Android / iOS / HarmonyOS, Flutter, Unity
(bundled to android / iphoneos / harmony; H5 is out of scope).
Triggers: analyze app crash, troubleshoot ANR, APM crash investigation,
list top issues, "what is this digestHash", iOS ANR Top 5, Android memory
leak analysis, Flutter custom exception stacks, pull lag samples,
emas appmonitor usage, sort issues by error rate, map stack to source,
appKey problem, EMAS APM issue analysis, analyze APM issues.
---
# alibabacloud-emas-apm-query
## 1. Scenario Description & Architecture
After a mobile app integrates Alibaba Cloud EMAS APM, the crash / anr / lag / custom / memory_leak / memory_alloc events it produces every day are aggregated and reported by the SDK to the backend. A typical troubleshooting workflow is:
1. **Figure out which Issues are most worth fixing**: sort by error rate / error count → pick Top 3~5
2. **Inspect what a specific Issue looks like**: fetch its aggregated metrics and affected versions
3. **Find several representative samples**: across different devices / versions / networks
4. **Read the stack + business log in a sample**: find actionable clues
5. **Compare against the app source code and propose a fix**
This skill stitches the 5 steps above into a single CLI pipeline. The entire process **only calls the 4 read-only APIs of `aliyun emas-appmonitor`**, and depends on no database / log service:
```
GetIssues → GetIssue → GetErrors → GetError
↓
(optional) stack ↔ user APP source → precise file:line + fix diff
```
**Supported BizModules**: `crash` / `anr` / `lag` / `custom` / `memory_leak` / `memory_alloc`
**Supported OS**: `android` / `iphoneos` / `harmony` (`harmony` does not have `anr` / `memory_*`)
## 2. Prerequisites
| Item | Requirement | Self-check command |
| --- | --- | --- |
| Aliyun CLI version | >= `3.3.3` | `aliyun version` |
| Plugin | `aliyun-cli-emas-appmonitor` | `aliyun emas-appmonitor --help` |
| jq | any version (required by scripts) | `jq --version` |
Full installation steps: [`references/cli-installation-guide.md`](references/cli-installation-guide.md). Recommended: enable auto plugin installation once:
```bash
aliyun configure set --auto-plugin-install true
aliyun plugin update
```
## 3. Credential Pre-check
**Do NOT** print AK/SK values; just verify that an available profile exists:
```bash
aliyun configure list
```
The expected output contains a `current` profile whose `Mode` / `RegionId` are non-empty. If not, configure one of AK / OAuth / StsToken / RamRoleArn per [`references/cli-installation-guide.md#Configuration`](references/cli-installation-guide.md).
> **This skill never reads or forwards AK / SK field values themselves during its whole lifecycle.**
## 4. AI-mode Lifecycle
**Start** (before the skill runs):
```bash
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-emas-apm-query"
```
**End** (after troubleshooting completes):
```bash
aliyun configure ai-mode disable
```
## 5. RAM Permissions
This skill only needs the 4 read-only `emasha:View*` actions (`ViewIssues` / `ViewIssue` / `ViewErrors` / `ViewError`). EMAS AppMonitor's RamCode is `emasha` and does **not** support resource-level authorization, so `Resource` is fixed to `"*"`.
For the full least-privilege JSON policy, the equivalent system policies (`AliyunEMASAppMonitorReadOnlyAccess` / `AliyunEMASAppMonitorFullAccess`), and common permission-error troubleshooting, see [`references/ram-policies.md`](references/ram-policies.md).
> **[MUST] Permission Failure Handling**: When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Use `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
## 6. Parameter Confirmation
> **IMPORTANT: Parameter Confirmation** — Before executing any command or API call,
> ALL user-customizable parameters (e.g., RegionId, instance names, CIDR blocks,
> passwords, domain names, resource specifications, etc.) MUST be confirmed with the
> user. Do NOT assume or use default values without explicit user approval.
| Parameter | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| `app-key` | Yes | int64 | EMAS APP Key (typically 9+ digits). **Prefer to infer from SDK initialization code in the current workspace** (see the 6 rule families in [`references/appkey-detection.md`](references/appkey-detection.md)); if exactly one match is found, echo it and wait for user confirmation; if multiple, list candidates and let the user pick; on miss, ask the user to provide it manually. | None (default: probed from workspace) |
| `os` | Yes | enum | `android` / `iphoneos` / `harmony` (H5 goes to `h5`, which is out of scope). **Inferred together with `app-key` from the project type**: `build.gradle` / `AndroidManifest.xml` → `android`; `*.xcodeproj` / `Podfile` → `iphoneos`; `module.json5` + `ets/` → `harmony`. For cross-platform Flutter / Unity projects, the user MUST pick one (`android` / `iphoneos`). | None (default: probed from project type) |
| `time-range` | Yes | object | `StartTime=<ms> EndTime=<ms> Granularity=1 GranularityUnit=<HOUR\|DAY>` | Last 24 hours (user-overridable) |
| `biz-module` | No | list | If omitted, all 6 modules are scanned; if specified, only that module is analyzed | All 6 |
| `digest-hash` | No | string | If the user already knows a specific Issue, skip the Top-N stage and drill down directly | None |
| `top-n` | No | int | Number of Top issues | `5` |
| `filter-json` | No | string | Further narrow down (specific version / device model / region ...), a JSON string | Not applied |
**Timestamp unit**: every API uses **Unix milliseconds**. If the user passes a value in seconds (< 1e12), the scripts will automatically multiply by 1000.
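The seconds-to-milliseconds rule above can be sketched in Python. This is a minimal illustration, not the logic of the bundled scripts: the helper names `to_millis` and `last_24h_range` are hypothetical, and the 24-hour default with `GranularityUnit=DAY` follows the parameter table.

```python
import time

MS_THRESHOLD = 1e12  # values below this are treated as seconds, per the rule above


def to_millis(ts):
    """Return a Unix timestamp in milliseconds, scaling up values given in seconds."""
    ts = int(ts)
    return ts * 1000 if ts < MS_THRESHOLD else ts


def last_24h_range(now=None):
    """Build a time-range string covering the 24 hours before `now` (hypothetical helper)."""
    end = to_millis(now if now is not None else time.time())
    start = end - 24 * 60 * 60 * 1000
    return f"StartTime={start} EndTime={end} Granularity=1 GranularityUnit=DAY"


if __name__ == "__main__":
    print(last_24h_range())
```

Passing either `1700000000` or `1700000000000` to `to_millis` yields the same millisecond value, so callers do not need to know which unit the user supplied.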
**`biz-module` pitfall**: the CLI `--help` lists only the legacy enum (`exception / crash / lag / custom / h5JsError / h5WhiteScreen`); however, **`anr / memory_leak / memory_alloc` are actually forwarded to the backend** and work. By default this skill scans all 6 supported modules; see [`references/biz-module-reference.md`](references/biz-module-reference.md).
**`time-range` pitfall**: in some environments `Granularity=60 GranularityUnit=MINUTE` is rejected by the backend (returns `Code: 200, Message: "unknown error"`). **Always prefer** `Granularity=1 GranularityUnit=DAY` or `GranularityUnit=HOUR`.
**`--os` pitfall**: the CLI `--help` marks `--os` as optional, but in practice **omitting it returns an empty list without error** (`Model.Items=[]`, `Total=0`). Pass `--os` explicitly on every call to all 4 APIs.
**`--did` pitfall**: `get-error`'s `--did` is also marked optional in `--help`, but is **implicitly required by the backend**. Omitting it returns `Code: 100011 Parameter Not Enough`. Take it from `get-errors`' `Items[*].Did` (already handled by `dig_issue.sh`; when calling `aliyun emas-appmonitor get-error` manually, pass it explicitly).
**Dual semantics of `DigestHash`**: `get-errors` returns `Items[*].DigestHash`, which is the hash of a **single event**, different from the **aggregated** `--digest-hash` you passed in. When calling `get-error` next, still use the **aggregated** hash (the one you used in `get-issues` / `get-issue`); do not switch to the single-event hash.
**Reuse `biz-module`**: whichever `bizModule` was used to obtain a Top Issue from `get-issues` must be reused for the next three steps (`get-issue` / `get-errors` / `get-error`); otherwise the response will be empty (the same `DigestHash` exists under only one bizModule). `list_top_issues.sh` already attaches a `bm` field to each row so it can be reused.
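The last three pitfalls (`--did`, the two meanings of `DigestHash`, and `biz-module` reuse) can be combined into one small sketch. It assumes a `get-errors` item is a dict with `Did`, `Uuid`, `ClientTime` and `DigestHash` fields, as described above. The function name and the keys of the returned dict are hypothetical and are not CLI flag names.

```python
def build_get_error_params(app_key, os_name, biz_module, aggregated_hash, sample):
    """Assemble get-error inputs from one get-errors sample (hypothetical helper)."""
    did = sample.get("Did")
    if not did:
        # The backend rejects get-error without a device ID (Code 100011)
        raise ValueError("sample has no Did; get-error would fail")
    return {
        "AppKey": app_key,
        "Os": os_name,
        # Reuse the bizModule that produced the Issue in get-issues
        "BizModule": biz_module,
        # Keep the aggregated hash; sample["DigestHash"] is the single-event hash
        "DigestHash": aggregated_hash,
        "Did": did,
        "Uuid": sample.get("Uuid"),
        "ClientTime": sample.get("ClientTime"),
    }
```

Keeping the aggregated hash and the `bizModule` as explicit arguments, rather than reading them from the sample, makes it harder to pass the single-event hash by accident.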
## 7. Core Workflow
```mermaid
flowchart TD
Start[User request] --> DetectCtx{Workspace can infer AppKey+OS?}
DetectCtx -- single match --> ConfirmCtx[Echo for user confirmation]
DetectCtx -- multiple matches --> PickCtx[List candidates, let user pick]
DetectCtx -- miss --> AskCtx[Ask user for AppKey + OS]
ConfirmCtx --> HasHash
PickCtx --> HasHash
AskCtx --> HasHash
HasHash{digestHash provided?}
HasHash -- yes --> SingleIssue[get-issue: fetch single Issue metadata]
HasHash -- no --> ParallelIssues[Parallel get-issues over 6 bizModules]
ParallelIssues --> TopN[Sort by errorRate, take Top 3-5]
TopN --> IterateTop[Iterate Top Issues]
IterateTop --> SingleIssue
SingleIssue --> GetErrors[get-errors: latest 3-5 samples]
GetErrors --> PickSample["Sample policy: latest / hot device / latest affected version"]
PickSample --> GetError[get-error: stack/threads/logs/dimensions]
GetError --> HasCode{CWD has APP source?}
HasCode -- yes --> CodeMatch[stack -> file + line -> diff]
HasCode -- no --> CliReport[CLI-only diagnostic report]
CodeMatch --> Report[Final report: issue list + root cause + fix]
CliReport --> Report
Report --> CallFailed{Any CLI call failed?}
CallFailed -- yes --> CliSelfDiag["CLI self-diagnose: --log-level debug / --cli-dry-run / configure list / plugin update"]
CallFailed -- no --> endNode[Done]
CliSelfDiag --> endNode
```
### 7.0 Locate the Skill directory at runtime (`$SKILL_DIR`)
The Skill's own path is known to the Agent at the time SKILL.md is loaded (see the `fullPath` / `path` field under `<available_skills>` in the context). Before running any bash command that needs to read bundled resources from this Skill (`scripts/` / `assets/` / `references/`), **the Agent MUST** first export the directory of SKILL.md to `SKILL_DIR` exactly once:
```bash
# The Agent fills in the absolute path of SKILL.md into the placeholder, then exports once
export SKILL_DIR="$(cd "$(dirname "<ABSOLUTE_PATH_OF_SKILL.md>")" && pwd)"
# Self-check: all three directories must exist
[[ -d "$SKILL_DIR/scripts" && -d "$SKILL_DIR/assets" && -d "$SKILL_DIR/references" ]] \
|| { echo "[ERROR] SKILL_DIR does not point to the root of this Skill: $SKILL_DIR" >&2; exit 1; }
```
Rules:
1. **Do not hardcode `~/.cursor/skills-cursor/...` or `~/.claude/skills/...`**: this Skill can be distributed in the repository (`.agent/skills/alibabacloud-emas-apm-query/`) or at the user level, and the absolute path varies with the host.
2. **Do not rely on `cd` into the Skill directory to use relative paths**: the scripts write artifacts to the current working directory (the user's APP source root), and changing directory would break that behavior.
3. **The bash scripts have a fallback**: `scripts/list_top_issues.sh` and `scripts/dig_issue.sh` auto-detect their own location via `BASH_SOURCE` at the top, so they can locate `$SKILL_DIR` even if it was not exported. Other inline `jq` / `rg` commands inside SKILL.md still require the Agent to export `$SKILL_DIR` first.
### 7.1 Stage A: Top N (when `digest-hash` is not provided)
Use `scripts/list_top_issues.sh` to scan the 6 biz_modules in parallel:
```bash
bash "$SKILL_DIR/scripts/list_top_issues.sh" \
--app-key <AppKey> \
--os <iphoneos|android|harmony> \
--start-time <startMs> \
--end-time <endMs> \
--top-n 5 \
--order-by ErrorRate
```
The output is a merged Top-N table, each row containing `{bm, digestHash, ec, er, edc, edr, name, type, reason}`.
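The merge step can be pictured as follows. This is a sketch of the idea only, not the logic of `list_top_issues.sh`: it assumes each per-module result is a list of dicts with an `er` (error rate) field that converts to `float`, and the function name `merge_top_n` is hypothetical.

```python
def merge_top_n(rows_by_module, top_n=5, key="er"):
    """Merge per-module issue rows, tag each with its bizModule, and keep the Top N."""
    merged = []
    for bm, rows in rows_by_module.items():
        for row in rows:
            # Attach the module so later drill-down calls can reuse it
            merged.append({**row, "bm": bm})
    merged.sort(key=lambda r: float(r.get(key) or 0), reverse=True)
    return merged[:top_n]


if __name__ == "__main__":
    rows = {
        "crash": [{"digestHash": "A", "er": 0.02}, {"digestHash": "B", "er": 0.5}],
        "anr": [{"digestHash": "C", "er": 0.1}],
    }
    for row in merge_top_n(rows, top_n=2):
        print(row["bm"], row["digestHash"], row["er"])
```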
To add a Filter (e.g. "only version 3.5.x"), append `--filter-json '{"Key":"appVersion","Operator":"in","Values":["3.5.0","3.5.1"]}'`; see [`references/filter-reference.md`](references/filter-reference.md).
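If the filter is assembled programmatically, serializing it with a JSON library avoids hand-escaping mistakes. The sketch below builds only the flat `Key` / `Operator` / `Values` shape shown above; the helper name `build_filter` is hypothetical, and any nested structure should follow the filter reference rather than this example.

```python
import json


def build_filter(key, operator, values):
    """Serialize a single filter condition as a compact JSON string."""
    condition = {"Key": key, "Operator": operator, "Values": list(values)}
    return json.dumps(condition, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    # Prints a string suitable for passing as the --filter-json value
    print(build_filter("appVersion", "in", ["3.5.0", "3.5.1"]))
```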
### 7.2 Stage B: Drill into a single Issue
Use `scripts/dig_issue.sh`:
```bash
bash "$SKILL_DIR/scripts/dig_issue.sh" \
--app-key <AppKey> \
--os <iphoneos|android|harmony> \
--biz-module <crash|anr|lag|custom|memory_leak|memory_alloc> \
--digest-hash <13-char Base36> \
--start-time <startMs> --end-time <endMs> \
--sample-size 3
```
Output directory:
```
emas-apm-dig-<AppKey>-<DigestHash>-<epoch>/
01-get-issue.json
02-get-errors.json (contains the ClientTime/Uuid/Did triples)
samples/<Uuid>.json (complete JSON per sample, includes Backtrace/EventLog etc.)
report.md (structured markdown report)
```
### 7.3 Stage C: Code mapping + diff (if the CWD contains APP source)
Follow [`references/troubleshoot-workflow.md`](references/troubleshoot-workflow.md):
1. Determine the platform (Android / iOS / Harmony / RN / Flutter / Web)
2. `Model.Backtrace` → keep APP user frames → grep the source → locate file:line
3. Enrich the timeline using `EventLog` + `Controllers` + `Threads` + `CustomInfo`
4. Emit the **smallest diff** (≤ 20 lines + one sentence of "why")
If the CWD does **not** contain the source: emit only a CLI diagnostic report (Issue overview, sample dimension comparison, representative stack), and append a hint that "switching to the APP source directory enables code-level localization".
### 7.4 Failure handling (CLI only)
When any `aliyun emas-appmonitor` call fails, run the following self-checks in order:
```bash
aliyun configure list # 1. current profile / mode / region
aliyun plugin update # 2. latest plugin
aliyun emas-appmonitor <cmd> ... --cli-dry-run # 3. parameter serialization check
aliyun emas-appmonitor <cmd> ... --log-level debug # 4. HTTP body + RequestId
```
**Do not guide the user to query any server-side data source.**
## 8. Success Verification
The full 6-step CLI self-verification (with runnable commands and pass/fail criteria for each step) is in [`references/verification-method.md`](references/verification-method.md). The correct-vs-incorrect CLI pattern matrix is in [`references/acceptance-criteria.md`](references/acceptance-criteria.md). Core criteria:
1. **Reachable**: `get-issues` dry-run prints the HTTP body successfully
2. **Non-empty**: some biz_module has `Model.Total >= 1`
3. **Stable**: two calls with identical parameters return the same Top 5 `DigestHash`
4. **Filter works**: after adding a filter, `Total` is less than or equal to the unfiltered count
5. **Three-level chain**: `issues → issue → errors → error` can pull a Stack end to end
6. **Diagnosable**: on induced errors, the output includes `RequestId` and `ErrorCode`
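Criteria 3 and 4 can be checked mechanically once the relevant values have been extracted from the responses. The sketch below assumes the Top 5 `DigestHash` values of each call were already written one per line to two files, and that the two `Total` values are known; the helper names and the file layout are hypothetical and not part of the skill's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for criteria 3 (Stable) and 4 (Filter works).

# same_top_hashes FILE_A FILE_B
# Succeeds when both files hold the same DigestHash values in the same order.
same_top_hashes() {
  cmp -s -- "$1" "$2"
}

# filter_total_ok FILTERED_TOTAL FULL_TOTAL
# Succeeds when the filtered Total does not exceed the unfiltered Total.
filter_total_ok() {
  [ "$1" -le "$2" ]
}
```

A typical check would be `same_top_hashes run1.txt run2.txt && filter_total_ok "$filtered" "$full"`; a non-zero exit status indicates which criterion to investigate.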
## 9. Cleanup
This skill is **read-only**; it **does not** create any cloud resources that need cleanup.
Teardown consists of two steps:
```bash
aliyun configure ai-mode disable
# (optional) delete the local JSON directories produced by dig_issue.sh
rm -rf ./emas-apm-dig-*
```
## 10. Best Practices
1. **Probe first, ask later**: before entering the main flow, grep SDK initialization code from the user's workspace per [`references/appkey-detection.md`](references/appkey-detection.md) to infer `app-key` / `os`; confirm with the user only after a hit, rather than asking upfront.
2. **Top first, then drill**: do not run `dig_issue.sh` against every Issue from the start — first use `list_top_issues.sh` to aggregate the Top N, then drill into each of them. The total number of CLI calls is `O(N)` rather than `O(all)`.
3. **Always pass `--os`**: `--os` on all 4 APIs is marked optional in `--help`, but omitting it returns empty results silently. Always specify `android / iphoneos / harmony` explicitly.
4. **`get-error` MUST carry `--did`**: marked optional in `--help` but implicitly required by the backend; take it from `Items[*].Did` in the `get-errors` response.
5. **Reuse `biz-module`**: the next `get-issue` / `get-errors` / `get-error` calls must use the same bizModule that produced the Issue in `get-issues`; switching will return empty.
6. **Shrink the time window from "wide" to "narrow"**: start diagnosis with 24h / `Granularity=1 GranularityUnit=DAY`; once a specific version / device is located, shrink to 1 to 4 hours with `GranularityUnit=HOUR`.
7. **Filters are JSON strings**: the entire `--filter` value must be a single JSON string; build nested `SubFilters` with `jq -cn` to avoid manual escape errors (see [`references/filter-reference.md`](references/filter-reference.md)).
8. **Multi-account scenarios**: confirm the profile via `aliyun configure list` and pass `--profile <name>` explicitly rather than relying on implicit env-var switching.
9. **Persist `get-error`**: the response of this API can range from hundreds of KB to several MB; do not truncate the JSON with `head` / `tail`. Redirect it to a file first (for example `> /tmp/emas-error-XXX.json`) and then process it with `jq`.
10. **Android obfuscation**: when you see class names like `a.a.a.b.c`, ask the user for `mapping.txt` before attempting code mapping rather than guessing.
11. **iOS not symbolicated**: when `Model.SymbolicStatus=false`, the `Stack` contains many raw addresses; only emit conclusions at device / version dimensions, and re-analyze after dSYM is uploaded.
12. **Parallel QPS control**: `list_top_issues.sh` has a built-in `sleep 0.3s` to avoid throttling; scanning 6 biz_modules takes 2 to 3 seconds in total, so no extra concurrency is needed.
13. **Empty `biz-module` results are not errors**: `anr / memory_*` under `harmony` or very-low-traffic AppKeys returning `Total=0` is normal and should not be retried.
14. **Do not** use this skill to **write data**: all 4 APIs are `Get*` / `View*`. If the user wants to "update Issue status" or "mark as fixed", that falls under write APIs like `UpdateIssueStatus` and is out of scope.
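For practice 6, the millisecond bounds passed to `--start-time` / `--end-time` can be derived from a window length in hours. The following is a minimal sketch; `window_ms` is a hypothetical helper, not one of the skill's scripts, and its optional second argument exists only to make the result reproducible.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<startMs> <endMs>" for a window of HOURS hours
# that ends at NOW_SECONDS (defaults to the current Unix time).
window_ms() {
  local hours="$1"
  local now="${2:-$(date +%s)}"
  local end_ms=$(( now * 1000 ))
  local start_ms=$(( end_ms - hours * 3600 * 1000 ))
  printf '%s %s\n' "$start_ms" "$end_ms"
}
```

One way to use it is `read -r start end < <(window_ms 24)` for the initial wide pass, then `window_ms 4` once a version or device has been isolated, passing `"$start"` and `"$end"` to `--start-time` and `--end-time`.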
## 11. Reference Links
| Document | Purpose |
| --- | --- |
| [`references/cli-installation-guide.md`](references/cli-installation-guide.md) | Aliyun CLI installation / configuration / plugins / credentials |
| [`references/appkey-detection.md`](references/appkey-detection.md) | Identify AppKey and OS from the user's workspace across Android / iOS / Harmony / Flutter / Unity / H5 |
| [`references/ram-policies.md`](references/ram-policies.md) | Least-privilege JSON + Permission Failure Handling |
| [`references/get-issues.md`](references/get-issues.md) | `GetIssues` parameters / response / ordering |
| [`references/get-issue.md`](references/get-issue.md) | `GetIssue` parameters / response |
| [`references/get-errors.md`](references/get-errors.md) | `GetErrors` parameters / response |
| [`references/get-error.md`](references/get-error.md) | `GetError` parameters / response |
| [`references/filter-reference.md`](references/filter-reference.md) | `--filter` structure / operators / SubFilters / dry-run validation |
| [`references/biz-module-reference.md`](references/biz-module-reference.md) | 6 biz_modules x platforms x available `filterCode` list |
| [`references/troubleshoot-workflow.md`](references/troubleshoot-workflow.md) | Full flow for stack -> source -> diff |
| [`references/related-commands.md`](references/related-commands.md) | Cheat sheet for all `aliyun emas-appmonitor` commands + skill boundary |
| [`references/verification-method.md`](references/verification-method.md) | 6-step runnable CLI verification with pass/fail criteria |
| [`references/acceptance-criteria.md`](references/acceptance-criteria.md) | Correct vs incorrect CLI pattern matrix (for review / self-check) |
| [`assets/system-filters/index.json`](assets/system-filters/index.json) | Index of 14 static filter snapshots (biz_module x platform) |
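The snapshot files under `assets/system-filters/` share one shape: a `filters` array whose entries carry `filterCode`, `dynamic`, and `filterValues`. A minimal sketch, assuming `jq` is installed, that lists each static filter together with the values it accepts; `static_filters` is a hypothetical helper name and not part of the skill's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<filterCode><TAB><value1,value2,...>" for every
# filter in a snapshot that is static (dynamic == false) and enumerates values.
static_filters() {
  jq -r '.filters[]
    | select(.dynamic == false and (.filterValues | length) > 0)
    | "\(.filterCode)\t\([.filterValues[].value | tostring] | join(","))"' "$1"
}
```

Running `static_filters assets/system-filters/crash-android.json` prints one line per enumerated filter, which is a quick way to see which literal values a filter expects before building a `--filter` string. Dynamic filters and free-text filters have an empty `filterValues` array in the snapshot and are skipped.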
FILE:assets/system-filters/anr-android.json
{
"bizModule": "anr",
"platform": "android",
"source": "EMAS AppMonitor system filter registry",
"count": 29,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 5,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 9,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 12,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 15,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 17,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 18,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否root",
"filterType": "radio",
"filterOrder": 19,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 24,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 26,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 29,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 30,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 34,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isSimulator",
"filterName": "运行环境",
"filterType": "radio",
"filterOrder": 38,
"dynamic": false,
"filterValues": [
{
"name": "真机",
"value": 0
},
{
"name": "模拟器",
"value": 1
}
]
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 39,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 40,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 41,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/crash-android.json
{
"bizModule": "crash",
"platform": "android",
"source": "EMAS AppMonitor system filter registry",
"count": 32,
"filters": [
{
"filterCode": "crashType",
"filterName": "崩溃类型",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": false,
"filterValues": [
{
"name": "Java Crash",
"value": "MOTU_ANDROID_CRASH"
},
{
"name": "Native Crash",
"value": "MOTU_ANDROID_NATIVE_CRASH"
}
]
},
{
"filterCode": "isOom",
"filterName": "是否OOM",
"filterType": "radio",
"filterOrder": 1,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
}
]
},
{
"filterCode": "shadow_launchedCrashDuration",
"filterName": "启动状态",
"filterType": "radio",
"filterOrder": 2,
"dynamic": false,
"filterValues": [
{
"name": "启动阶段",
"value": 1
},
{
"name": "非启动阶段",
"value": 0
},
{
"name": "-",
"value": -1
}
]
},
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 4,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 7,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 9,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 11,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 12,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 13,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 14,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 18,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 20,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 21,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否root",
"filterType": "radio",
"filterOrder": 22,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 31,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 35,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 36,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 37,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 38,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 39,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 40,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 41,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isSimulator",
"filterName": "运行环境",
"filterType": "radio",
"filterOrder": 45,
"dynamic": false,
"filterValues": [
{
"name": "真机",
"value": 0
},
{
"name": "模拟器",
"value": 1
}
]
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 46,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 47,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 48,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/crash-harmony.json
{
"bizModule": "crash",
"platform": "harmony",
"source": "EMAS AppMonitor system filter registry",
"count": 27,
"filters": [
{
"filterCode": "crashType",
"filterName": "崩溃类型",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": false,
"filterValues": [
{
"name": "JS Crash",
"value": "HARMONY_JS_CRASH"
},
{
"name": "Native Crash",
"value": "HARMONY_NATIVE_CRASH"
}
]
},
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 6,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 9,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 10,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 12,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 13,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 17,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 19,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 20,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 30,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 35,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 36,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 37,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 38,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 39,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 40,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 41,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/crash-iphoneos.json
{
"bizModule": "crash",
"platform": "iphoneos",
"source": "EMAS AppMonitor system filter registry",
"count": 31,
"filters": [
{
"filterCode": "componentType",
"filterName": "组件类型",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": false,
"filterValues": [
{
"name": "App",
"value": "MAIN_APP"
},
{
"name": "Extension",
"value": "EXTENSION_APP"
}
]
},
{
"filterCode": "crashType",
"filterName": "崩溃类型",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": false,
"filterValues": [
{
"name": "Crash",
"value": "MOTU_IOS_CRASH"
},
{
"name": "OOM Crash",
"value": "IOS_OOM"
},
{
"name": "Watchdog Crash",
"value": "IOS_WATCHDOG_CRASH"
},
{
"name": "Abort",
"value": "MOTU_IOS_ABORT_CAUGHT"
},
{
"name": "Abort Not Caught",
"value": "MOTU_IOS_ABORT_NOT_CAUGHT"
}
]
},
{
"filterCode": "shadow_launchedCrashDuration",
"filterName": "启动状态",
"filterType": "radio",
"filterOrder": 2,
"dynamic": false,
"filterValues": [
{
"name": "启动阶段",
"value": 1
},
{
"name": "非启动阶段",
"value": 0
},
{
"name": "-",
"value": -1
}
]
},
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 4,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 7,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 9,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 11,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 12,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 13,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 14,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 18,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 20,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 21,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否越狱",
"filterType": "radio",
"filterOrder": 22,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 31,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 35,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 36,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 37,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 38,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 39,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 40,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 41,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 43,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 44,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 45,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/custom-android.json
{
"bizModule": "custom",
"platform": "android",
"source": "EMAS AppMonitor system filter registry",
"count": 31,
"filters": [
{
"filterCode": "customErrorLanguage",
"filterName": "异常类型",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": false,
"filterValues": [
{
"name": "C#异常",
"value": "csharp"
},
{
"name": "Lua异常",
"value": "lua"
},
{
"name": "Dart异常",
"value": "dart"
},
{
"name": "Java异常",
"value": ""
},
{
"name": "未知",
"value": "unknown"
}
]
},
{
"filterCode": "isCustomErrorFlag",
"filterName": "是否自定义",
"filterType": "radio",
"filterOrder": 2,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": "1"
},
{
"name": "否",
"value": "0"
}
]
},
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 4,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 6,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 9,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 10,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 12,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 13,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 16,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 18,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 19,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否root",
"filterType": "radio",
"filterOrder": 20,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 25,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 29,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 30,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 35,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isSimulator",
"filterName": "运行环境",
"filterType": "radio",
"filterOrder": 40,
"dynamic": false,
"filterValues": [
{
"name": "真机",
"value": 0
},
{
"name": "模拟器",
"value": 1
}
]
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 41,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 42,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 43,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/custom-harmony.json
{
"bizModule": "custom",
"platform": "harmony",
"source": "EMAS AppMonitor system filter registry",
"count": 26,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 4,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 5,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 8,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 9,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 14,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 16,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 17,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 23,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 24,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 25,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 26,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 29,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 30,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 32,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/custom-iphoneos.json
{
"bizModule": "custom",
"platform": "iphoneos",
"source": "EMAS AppMonitor system filter registry",
"count": 30,
"filters": [
{
"filterCode": "customErrorLanguage",
"filterName": "异常类型",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": false,
"filterValues": [
{
"name": "C#异常",
"value": "csharp"
},
{
"name": "Lua异常",
"value": "lua"
},
{
"name": "Dart异常",
"value": "dart"
},
{
"name": "Java异常",
"value": ""
},
{
"name": "未知",
"value": "unknown"
}
]
},
{
"filterCode": "isCustomErrorFlag",
"filterName": "是否自定义",
"filterType": "radio",
"filterOrder": 3,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": "1"
},
{
"name": "否",
"value": "0"
}
]
},
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 5,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 5,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 8,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 9,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 12,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 13,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 14,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 15,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 18,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 20,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 21,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否越狱",
"filterType": "radio",
"filterOrder": 22,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 27,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 29,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 30,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 35,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 36,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 37,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 40,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 41,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 42,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/index.json
{
"source": "EMAS AppMonitor system filter registry",
"generatedFrom": "Snapshot of EMAS AppMonitor system filter definitions (frozen; rebuild manually when the product adds a new filterCode)",
"bizModules": [
"anr",
"crash",
"custom",
"lag",
"memory_alloc",
"memory_leak"
],
"platforms": [
"android",
"harmony",
"iphoneos"
],
"total": 391,
"files": [
{
"bizModule": "anr",
"platform": "android",
"file": "anr-android.json",
"count": 29,
"staticCount": 10,
"dynamicCount": 19
},
{
"bizModule": "crash",
"platform": "android",
"file": "crash-android.json",
"count": 32,
"staticCount": 13,
"dynamicCount": 19
},
{
"bizModule": "crash",
"platform": "harmony",
"file": "crash-harmony.json",
"count": 27,
"staticCount": 9,
"dynamicCount": 18
},
{
"bizModule": "crash",
"platform": "iphoneos",
"file": "crash-iphoneos.json",
"count": 31,
"staticCount": 12,
"dynamicCount": 19
},
{
"bizModule": "custom",
"platform": "android",
"file": "custom-android.json",
"count": 31,
"staticCount": 12,
"dynamicCount": 19
},
{
"bizModule": "custom",
"platform": "harmony",
"file": "custom-harmony.json",
"count": 26,
"staticCount": 8,
"dynamicCount": 18
},
{
"bizModule": "custom",
"platform": "iphoneos",
"file": "custom-iphoneos.json",
"count": 30,
"staticCount": 11,
"dynamicCount": 19
},
{
"bizModule": "lag",
"platform": "android",
"file": "lag-android.json",
"count": 29,
"staticCount": 10,
"dynamicCount": 19
},
{
"bizModule": "lag",
"platform": "harmony",
"file": "lag-harmony.json",
"count": 26,
"staticCount": 8,
"dynamicCount": 18
},
{
"bizModule": "lag",
"platform": "iphoneos",
"file": "lag-iphoneos.json",
"count": 28,
"staticCount": 9,
"dynamicCount": 19
},
{
"bizModule": "memory_alloc",
"platform": "android",
"file": "memory_alloc-android.json",
"count": 26,
"staticCount": 9,
"dynamicCount": 17
},
{
"bizModule": "memory_alloc",
"platform": "iphoneos",
"file": "memory_alloc-iphoneos.json",
"count": 25,
"staticCount": 8,
"dynamicCount": 17
},
{
"bizModule": "memory_leak",
"platform": "android",
"file": "memory_leak-android.json",
"count": 26,
"staticCount": 9,
"dynamicCount": 17
},
{
"bizModule": "memory_leak",
"platform": "iphoneos",
"file": "memory_leak-iphoneos.json",
"count": 25,
"staticCount": 8,
"dynamicCount": 17
}
]
}
FILE:assets/system-filters/lag-android.json
{
"bizModule": "lag",
"platform": "android",
"source": "EMAS AppMonitor system filter registry",
"count": 29,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 5,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 9,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 12,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 15,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 17,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 18,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否root",
"filterType": "radio",
"filterOrder": 19,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 24,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 26,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 29,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 30,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 34,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isSimulator",
"filterName": "运行环境",
"filterType": "radio",
"filterOrder": 38,
"dynamic": false,
"filterValues": [
{
"name": "真机",
"value": 0
},
{
"name": "模拟器",
"value": 1
}
]
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 39,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 40,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 41,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/lag-harmony.json
{
"bizModule": "lag",
"platform": "harmony",
"source": "EMAS AppMonitor system filter registry",
"count": 26,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 5,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 9,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 12,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 15,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 17,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 18,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 24,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 25,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 26,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 29,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 30,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 33,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 35,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/lag-iphoneos.json
{
"bizModule": "lag",
"platform": "iphoneos",
"source": "EMAS AppMonitor system filter registry",
"count": 28,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 2,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "utdid",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 5,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 9,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "view",
"filterName": "页面",
"filterType": "checkbox",
"filterOrder": 12,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 15,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 17,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 18,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否越狱",
"filterType": "radio",
"filterOrder": 19,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "inMainProcess",
"filterName": "是否主进程",
"filterType": "radio",
"filterOrder": 24,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": true
},
{
"name": "否",
"value": false
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 26,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 29,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 30,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "processName",
"filterName": "进程",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 34,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 36,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 37,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 38,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/memory_alloc-android.json
{
"bizModule": "memory_alloc",
"platform": "android",
"source": "EMAS AppMonitor system filter registry",
"count": 26,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 4,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceId",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 5,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 9,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 15,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 16,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 17,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否root",
"filterType": "radio",
"filterOrder": 18,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 21,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 22,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 23,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 24,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 25,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 29,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isSimulator",
"filterName": "运行环境",
"filterType": "radio",
"filterOrder": 33,
"dynamic": false,
"filterValues": [
{
"name": "真机",
"value": 0
},
{
"name": "模拟器",
"value": 1
}
]
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 35,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 36,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/memory_alloc-iphoneos.json
{
"bizModule": "memory_alloc",
"platform": "iphoneos",
"source": "EMAS AppMonitor system filter registry",
"count": 25,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 4,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceId",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 5,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 9,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 15,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 16,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 17,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否越狱",
"filterType": "radio",
"filterOrder": 18,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 21,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 22,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 23,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 24,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 25,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 29,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/memory_leak-android.json
{
"bizModule": "memory_leak",
"platform": "android",
"source": "EMAS AppMonitor system filter registry",
"count": 26,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 4,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceId",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 5,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 9,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 15,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 16,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 17,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否root",
"filterType": "radio",
"filterOrder": 18,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 21,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 22,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 23,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 24,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 25,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 29,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isSimulator",
"filterName": "运行环境",
"filterType": "radio",
"filterOrder": 33,
"dynamic": false,
"filterValues": [
{
"name": "真机",
"value": 0
},
{
"name": "模拟器",
"value": 1
}
]
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 34,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 35,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 36,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:assets/system-filters/memory_leak-iphoneos.json
{
"bizModule": "memory_leak",
"platform": "iphoneos",
"source": "EMAS AppMonitor system filter registry",
"count": 25,
"filters": [
{
"filterCode": "appVersion",
"filterName": "应用版本",
"filterType": "checkbox",
"filterOrder": 1,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "build",
"filterName": "构建号",
"filterType": "checkbox",
"filterOrder": 3,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "firstVersion",
"filterName": "首现版本",
"filterType": "checkbox",
"filterOrder": 4,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceId",
"filterName": "设备ID",
"filterType": "text",
"filterOrder": 5,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "osVersion",
"filterName": "系统版本",
"filterType": "checkbox",
"filterOrder": 6,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "brand",
"filterName": "品牌",
"filterType": "checkbox",
"filterOrder": 7,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "deviceModel",
"filterName": "机型",
"filterType": "checkbox",
"filterOrder": 8,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "clientIp",
"filterName": "客户端IP",
"filterType": "text",
"filterOrder": 9,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "channel",
"filterName": "渠道",
"filterType": "checkbox",
"filterOrder": 10,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "language",
"filterName": "语言",
"filterType": "checkbox",
"filterOrder": 11,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "userNick",
"filterName": "用户昵称",
"filterType": "text",
"filterOrder": 15,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "userId",
"filterName": "用户ID",
"filterType": "text",
"filterOrder": 16,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "isForeground",
"filterName": "是否前台",
"filterType": "radio",
"filterOrder": 17,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "isJailbroken",
"filterName": "是否越狱",
"filterType": "radio",
"filterOrder": 18,
"dynamic": false,
"filterValues": [
{
"name": "是",
"value": 1
},
{
"name": "否",
"value": 0
},
{
"name": "-",
"value": 2
}
]
},
{
"filterCode": "access",
"filterName": "网络",
"filterType": "checkbox",
"filterOrder": 21,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "country",
"filterName": "国家/地区",
"filterType": "checkbox",
"filterOrder": 22,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "province",
"filterName": "省份",
"filterType": "checkbox",
"filterOrder": 23,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "city",
"filterName": "城市",
"filterType": "checkbox",
"filterOrder": 24,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "resolution",
"filterName": "分辨率",
"filterType": "checkbox",
"filterOrder": 25,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "carrier",
"filterName": "运营商",
"filterType": "checkbox",
"filterOrder": 27,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "cpuModel",
"filterName": "CPU架构",
"filterType": "checkbox",
"filterOrder": 28,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "digestHash",
"filterName": "问题ID",
"filterType": "text",
"filterOrder": 29,
"dynamic": false,
"filterValues": []
},
{
"filterCode": "tag",
"filterName": "标签",
"filterType": "checkbox",
"filterOrder": 31,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "additionalCustomInfo",
"filterName": "自定义维度",
"filterType": "custom",
"filterOrder": 32,
"dynamic": true,
"filterValues": []
},
{
"filterCode": "issueStatus",
"filterName": "问题状态",
"filterType": "checkbox",
"filterOrder": 33,
"dynamic": false,
"filterValues": [
{
"name": "未处理",
"value": 1
},
{
"name": "处理中",
"value": 2
},
{
"name": "已关闭",
"value": 3
},
{
"name": "已处理",
"value": 4
},
{
"name": "已忽略",
"value": 5
}
]
}
]
}
FILE:references/acceptance-criteria.md
# Acceptance criteria (correct / incorrect CLI examples)
This file lists the correct / incorrect usages of `aliyun-cli-emas-appmonitor` inside this Skill for code review and self-check.
## 1. Product / command group existence
```bash
aliyun emas-appmonitor --help
```
- Correct: the subcommand list contains `get-issues` / `get-issue` / `get-errors` / `get-error` / `get-activity-event` / `get-activity-events` / ...
- Incorrect: `product not exists` is returned -> plugin not installed; run `aliyun plugin install --names emas-appmonitor`
## 2. `get-issues`
### Correct
```bash
aliyun emas-appmonitor get-issues \
--app-key 335581386 \
--os android \
--biz-module crash \
--time-range StartTime=1714800000000 EndTime=1715404800000 Granularity=1 GranularityUnit=day \
--filter '{"Operator":"and","SubFilters":["{\"operator\":\"in\",\"key\":\"issueStatus\",\"values\":[1,2,3,4]}"]}' \
--order-by ErrorCount --order-type desc \
--page-index 1 --page-size 5
```
### Incorrect examples
| Wrong usage | Problem | Correct usage |
| --- | --- | --- |
| `--app-key 335581386abc` | App Key must be an integer | use pure digits |
| Missing `--os` | Backend returns no data -> `Model=null` | add `--os android` |
| `--biz-module CRASH` | Enum is case-sensitive | `--biz-module crash` |
| `--biz-module exception_type` | Illegal enum | use one of `crash`/`anr`/`lag`/`custom`/`memory_leak`/`memory_alloc` |
| `--time-range '{"StartTime":...}'` | `--time-range` takes space-separated key=value pairs; a JSON string is not parsed into the object | use `StartTime=... EndTime=... Granularity=1 GranularityUnit=day` |
| `--filter '[{...}]'` | Filter is an object, not an array | `--filter '{"Operator":"and","SubFilters":[...]}'` |
| `--filter '{"operator":"and","subFilters":[...]}'` | Root `Operator` / `SubFilters` keys lowercased | Root keys must be TitleCase: `"Operator"`, `"SubFilters"` |
| `--filter '{"Operator":"and","SubFilters":[{"operator":"in",...}]}'` | Each element of `SubFilters` must be a JSON string | `SubFilters:["{\"operator\":\"in\",...}"]` |
| `--order-by errorCount` | Enum is case-sensitive | `--order-by ErrorCount` |
## 3. `get-issue`
### Correct
```bash
aliyun emas-appmonitor get-issue \
--app-key 335581386 --os android --biz-module crash \
--digest-hash 3Q758M33DP0AV \
--time-range StartTime=1714800000000 EndTime=1715404800000 Granularity=1 GranularityUnit=day
```
### Incorrect examples
| Wrong usage | Problem |
| --- | --- |
| Missing `--os` | CLI reports `required flags missing: --os` |
| Missing `--digest-hash` | CLI does not enforce it, but the response is incomplete; per Skill convention always pass it |
| `--biz-module lag` (but the hash belongs to crash) | `Model` is empty; biz-module must match the module the hash belongs to |
| `--time-range StartTime=...,EndTime=...` | CLI separates key=value pairs by space, not comma |
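
The `--time-range` rules above (millisecond timestamps, space-separated key=value pairs) can be sketched as a small Python helper. This is a minimal illustration, not part of the Skill: `to_millis` and `time_range_args` are hypothetical names, and the helper only assembles the tokens shown in the correct example.

```python
from datetime import datetime, timezone


def to_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds (13 digits today)."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp() * 1000)


def time_range_args(start: datetime, end: datetime, with_granularity: bool = True) -> list[str]:
    """Return the tokens that follow --time-range, one key=value pair per token."""
    tokens = [f"StartTime={to_millis(start)}", f"EndTime={to_millis(end)}"]
    if with_granularity:
        # get-issues / get-issue examples carry Granularity; get-errors omits it.
        tokens += ["Granularity=1", "GranularityUnit=day"]
    return tokens


start = datetime(2024, 5, 4, tzinfo=timezone.utc)
end = datetime(2024, 5, 11, tzinfo=timezone.utc)
print(" ".join(["--time-range", *time_range_args(start, end)]))
```

Passing each pair as its own argv element mirrors what the shell does with the unquoted form, so no comma or JSON quoting is involved.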
## 4. `get-errors`
### Correct
```bash
aliyun emas-appmonitor get-errors \
--app-key 335581386 --os android --biz-module crash \
--digest-hash 3Q758M33DP0AV \
--time-range StartTime=1714800000000 EndTime=1715404800000 \
--page-index 1 --page-size 10
```
### Incorrect examples
| Wrong usage | Problem |
| --- | --- |
| `--time-range StartTime=... EndTime=... Granularity=1 GranularityUnit=day` | `get-errors`' TimeRange does not include Granularity (it will be ignored but may affect other validation) |
| Missing `--page-index` / `--page-size` | CLI rejects with `required flags missing` |
| Blind `--page-size 100` | Poor performance; recommend <= 20 and paginate as needed |
## 5. `get-error`
### Correct
```bash
aliyun emas-appmonitor get-error \
--app-key 335581386 --os android --biz-module crash \
--digest-hash 3Q758M33DP0AV \
--client-time 1774517852369 \
--uuid 4422affc-e43f-4739-8718-10ac71fa585a \
--did -4963409598449270935
```
### Incorrect examples
| Wrong usage | Problem | Fix |
| --- | --- | --- |
| Missing `--did` | `Code: 100011 Parameter Not Enough` (even though `--help` marks it optional) | Take `Did` from the same item in the `get-errors` response |
| `--client-time 1774517852` | Seconds-level timestamp | Use milliseconds (13 digits) |
| `--did "-4963409598449270935"` (extra quotes) | Most shells do not need the quotes; pass the raw value | If `Did` starts with `-`, the `--did=-4963409598449270935` form is safer |
| Daily use of `--biz-force true` | Pointless cost | Only when the cache is stale |
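
As a sketch of the rules in this table, the Python helper below assembles the `get-error` argument list, rejects non-millisecond timestamps, and always uses the `--did=<value>` form so a leading `-` is not mistaken for a flag. The function name `get_error_argv` is hypothetical; the flags are the ones from the correct example above.

```python
def get_error_argv(app_key: int, os_name: str, biz_module: str, digest_hash: str,
                   client_time_ms: int, uuid: str, did: str) -> list[str]:
    """Build the argv for `aliyun emas-appmonitor get-error` (illustrative only)."""
    if len(str(client_time_ms)) != 13:
        raise ValueError("--client-time expects a 13-digit millisecond timestamp")
    return [
        "aliyun", "emas-appmonitor", "get-error",
        "--app-key", str(app_key),
        "--os", os_name,
        "--biz-module", biz_module,
        "--digest-hash", digest_hash,
        "--client-time", str(client_time_ms),
        "--uuid", uuid,
        f"--did={did}",  # the = form keeps a negative Did attached to its flag
    ]


argv = get_error_argv(
    app_key=335581386, os_name="android", biz_module="crash",
    digest_hash="3Q758M33DP0AV", client_time_ms=1774517852369,
    uuid="4422affc-e43f-4739-8718-10ac71fa585a", did="-4963409598449270935",
)
print(argv)
```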
## 6. Filter format, restated
| Good/Bad | Example |
| --- | --- |
| Good | `'{"Operator":"and","SubFilters":["{\"operator\":\"in\",\"key\":\"issueStatus\",\"values\":[1,2,3,4]}"]}'` |
| Good | `'{"Operator":"or","SubFilters":["{\"operator\":\"like\",\"key\":\"appVersion\",\"values\":[\"2.%\"]}"]}'` |
| Bad | `'{"Operator":"AND", ...}'` (case) |
| Bad | `'{"Operator":"and","SubFilters":[{"operator":"in",...}]}'` (SubFilter not stringified) |
| Bad | `--filter Key=and Operator=and SubFilters='[...]'` (attempts the key=value object format) |
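
The double encoding is easy to get wrong by hand. The following Python sketch builds a `--filter` value that matches the first Good row: TitleCase root keys, and every `SubFilters` element serialized to a JSON string before the root object is serialized. `sub_filter` and `build_filter` are hypothetical helper names, and limiting the root operator to `and` / `or` is an assumption based on the two values shown in this file.

```python
import json


def sub_filter(key: str, operator: str, values: list) -> str:
    """Serialize one sub-filter; each SubFilters element is itself a JSON string."""
    return json.dumps({"operator": operator, "key": key, "values": values},
                      separators=(",", ":"))


def build_filter(sub_filters: list[str], operator: str = "and") -> str:
    """Wrap stringified sub-filters in the root object with TitleCase keys."""
    if operator not in ("and", "or"):  # assumption: only these two appear in the examples
        raise ValueError("root Operator must be lowercase 'and' or 'or'")
    return json.dumps({"Operator": operator, "SubFilters": sub_filters},
                      separators=(",", ":"))


value = build_filter([sub_filter("issueStatus", "in", [1, 2, 3, 4])])
print(value)
```

When the result is passed through `subprocess` as a single argv element, no extra shell quoting is needed; the single quotes in the table are only required on a shell command line.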
## 7. OS / BizModule combinations
| biz-module | android | iphoneos | harmony | h5 | Notes |
| --- | --- | --- | --- | --- | --- |
| crash | Yes | Yes | Yes | No | available on all 3 mobile platforms |
| anr | Yes | No | No | No | Android only |
| lag | Yes | Yes | Yes | No | 3 platforms |
| custom | Yes | Yes | Yes | No | 3 platforms; Flutter/Unity custom exceptions go here too |
| memory_leak | Yes | Yes | No | No | no Harmony |
| memory_alloc | Yes | Yes | No | No | no Harmony |
When the Skill scans all 6 biz-modules by default, **filter out the unsupported combinations** per the table above to avoid wasted RPCs.
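
A minimal Python sketch of that pruning step is shown below. The `SUPPORTED` mapping transcribes the table above; the function name `modules_for` is hypothetical.

```python
# Transcribed from the OS / BizModule table: biz-module -> platforms with data.
SUPPORTED = {
    "crash": {"android", "iphoneos", "harmony"},
    "anr": {"android"},
    "lag": {"android", "iphoneos", "harmony"},
    "custom": {"android", "iphoneos", "harmony"},
    "memory_leak": {"android", "iphoneos"},
    "memory_alloc": {"android", "iphoneos"},
}


def modules_for(os_name: str) -> list[str]:
    """Return the biz-modules worth querying for a given --os value."""
    return [module for module, platforms in SUPPORTED.items() if os_name in platforms]


for os_name in ("android", "iphoneos", "harmony", "h5"):
    print(os_name, modules_for(os_name))
```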
## 8. AI-mode closure
| Scenario | Must execute |
| --- | --- |
| Skill entry | `aliyun configure ai-mode enable` + `set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-emas-apm-query"` |
| Skill normal exit | `aliyun configure ai-mode disable` |
| Skill error / user interrupt | `aliyun configure ai-mode disable` |
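
One way to guarantee the disable step on every exit path is a context manager, sketched here in Python. The `run` callable is injected so the example stays runnable without the CLI installed; in real use it might be something like `lambda argv: subprocess.run(argv, check=True)`. The name `ai_mode` is hypothetical, and the user-agent step is left out because its full command form is not spelled out in the table.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


@contextmanager
def ai_mode(run: Runner) -> Iterator[None]:
    """Enable AI mode on entry and always disable it on exit, even on errors."""
    run(["aliyun", "configure", "ai-mode", "enable"])
    try:
        yield
    finally:
        # Runs on normal exit, on exceptions, and on KeyboardInterrupt.
        run(["aliyun", "configure", "ai-mode", "disable"])


calls: list = []
try:
    with ai_mode(calls.append):
        raise RuntimeError("simulated failure inside the Skill")
except RuntimeError:
    pass
print(calls)
```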
## 9. Other
- Do NOT print / log AK / SK / STS Token / OAuth token inside the Skill.
- Do NOT call `aliyun configure set --access-key-id ... --access-key-secret ...` with plaintext credentials.
- Do NOT log `AppSecret` / `AppRsaSecret`.
- When credentials need to be supplied, ask the user to run `aliyun configure` out-of-band (OAuth preferred) or set environment variables.
FILE:references/appkey-detection.md
# Detecting `app_key` and `os` from the user's workspace
At runtime this Skill **does NOT assume** the user's workspace is one of the 6 demo projects. Instead it uses the generic rules below to scan the user's own app project. After a hit, the Skill MUST echo it back to the user for confirmation.
## Detection order
1. First determine which project type the user currently has open (or has provided): existence of `build.gradle` / `AndroidManifest.xml` -> Android; `*.xcodeproj` / `Info.plist` -> iOS; `module.json5` + `ets/` -> HarmonyOS; `pubspec.yaml` -> Flutter; `Assets/` + `ProjectSettings/` + `*.cs` -> Unity; `package.json` plus an H5 SDK -> H5.
2. Run the grep / AST rules from the matching platform section.
3. Exactly one hit -> echo and ask the user to confirm; multiple hits -> list all candidates; zero hits -> ask the user proactively.
4. Flutter / Unity cross-platform apps usually ship **AppKeys for both Android and iOS**; the user must additionally pick which side this query targets.
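
Step 1 of the detection order can be sketched in Python as below. This is an illustration under assumptions: `has` and `detect_project_types` are hypothetical names, the marker files come from the list above, checking Flutter and Unity first is a design choice (their workspaces usually embed Android and iOS sub-projects), and the H5 check looks for the `@alicloud/apm-h5-sdk` package name used later in this file.

```python
import tempfile
from pathlib import Path


def has(root: Path, pattern: str) -> bool:
    """True when at least one path under root matches the glob pattern."""
    return next(root.rglob(pattern), None) is not None


def detect_project_types(root: Path) -> list[str]:
    """Return every project type whose marker files are present under root."""
    found = []
    if has(root, "pubspec.yaml"):
        found.append("flutter")
    if (root / "Assets").is_dir() and (root / "ProjectSettings").is_dir() and has(root, "*.cs"):
        found.append("unity")
    if has(root, "module.json5") and has(root, "ets"):
        found.append("harmony")
    if has(root, "build.gradle*") or has(root, "AndroidManifest.xml"):
        found.append("android")
    if has(root, "*.xcodeproj") or has(root, "Info.plist"):
        found.append("ios")
    package_json = root / "package.json"
    if package_json.is_file() and "@alicloud/apm-h5-sdk" in package_json.read_text(encoding="utf-8"):
        found.append("h5")
    return found


# Demo on a throwaway workspace: plain Android first, then a Flutter wrapper.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "app").mkdir()
    (workspace / "app" / "build.gradle").write_text("// demo", encoding="utf-8")
    android_only = detect_project_types(workspace)
    (workspace / "pubspec.yaml").write_text("name: demo", encoding="utf-8")
    flutter_with_android = detect_project_types(workspace)

print(android_only, flutter_with_android)
```

More than one returned type is the signal to ask the user which side the query targets, as step 4 requires.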
---
## Android (Java / Kotlin)
SDK package: `com.aliyun.emas.apm`; typical facade class: `com.aliyun.emas.apm.Apm`.
### Common initialization patterns
```java
Apm.preStart(new ApmOptions.Builder()
.setAppKey("123456789")
.setAppSecret("...")
.setAppRsaSecret("...")
.addComponent(ApmCrashAnalysisComponent.class)
.addComponent(ApmPerformanceComponent.class)
.addComponent(ApmRemoteLogComponent.class)
.addComponent(ApmMemMonitorComponent.class)
.build());
Apm.start();
```
### grep rules
```bash
# Direct assignment in Java / Kotlin source
rg -n --type-add 'kts:*.kts' -t java -t kotlin -t kts \
-e 'setAppKey\s*\(\s*"(\d+)"' \
-e 'APP_KEY\s*=\s*"(\d+)"' \
-e 'appKey\s*=\s*"(\d+)"' <workspace>
# AndroidManifest meta-data
rg -n --glob '**/AndroidManifest.xml' \
-e 'android:name="com\.(alibaba|aliyun)\.[a-zA-Z_.]*[aA]pp[kK]ey"\s+android:value="(\d+)"' <workspace>
# gradle / properties
rg -n --glob '*.gradle*' --glob '*.properties' \
-e 'APM_APP_KEY\s*=\s*"?(\d+)"?' <workspace>
```
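
For environments without `rg`, the source-code patterns above translate directly to Python's `re` module. The sketch below also encodes step 3 of the detection order (one hit, several hits, no hit). `find_app_keys` and `next_step` are hypothetical names, and the returned strings are placeholders for whatever prompt the Skill actually shows.

```python
import re

# Same three patterns as the first rg command above.
APP_KEY_PATTERNS = [
    re.compile(r'setAppKey\s*\(\s*"(\d+)"'),
    re.compile(r'APP_KEY\s*=\s*"(\d+)"'),
    re.compile(r'appKey\s*=\s*"(\d+)"'),
]


def find_app_keys(source: str) -> list[str]:
    """Collect distinct App Key candidates, ordered by pattern and then position."""
    seen: dict[str, None] = {}
    for pattern in APP_KEY_PATTERNS:
        for match in pattern.finditer(source):
            seen.setdefault(match.group(1), None)
    return list(seen)


def next_step(candidates: list[str]) -> str:
    """Map the number of hits to the action from the detection order."""
    if len(candidates) == 1:
        return f"confirm:{candidates[0]}"
    if candidates:
        return "choose:" + ",".join(candidates)
    return "ask"


sample = 'Apm.preStart(new ApmOptions.Builder().setAppKey("123456789").build());'
print(next_step(find_app_keys(sample)))
```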
### Additional sources
- `BuildConfig.APM_APP_KEY`: injected by `buildConfigField "String", "APM_APP_KEY", "\"123456789\""`; read the raw definition from `build.gradle(.kts)`.
- Demo code that reads the App Key dynamically from SharedPreferences: the value usually comes from a constant (`APP_KEY = "..."`) or a gradle default; just use the constant.
### os value
Android project hit -> `--os android`.
---
## iOS (Objective-C / Swift)
SDK entry classes: `EAPMApm`, `EAPMOptions`.
### Common initialization patterns
```objc
EAPMOptions *options = [[EAPMOptions alloc] initWithAppKey:@"123456789"
appSecret:@"..."];
options.appRsaSecret = @"...";
options.sdkComponents = @[ ... ];
[EAPMApm startWithOptions:options];
```
Swift equivalent:
```swift
let options = EAPMOptions(appKey: "123456789", appSecret: "...")
options.appRsaSecret = "..."
EAPMApm.start(with: options)
```
### grep rules
```bash
# OC
rg -n -g '*.{m,mm,h}' \
-e 'initWithAppKey:\s*@"(\d+)"' \
-e 'setAppKey:\s*@"(\d+)"' \
-e '\.appKey\s*=\s*@"(\d+)"' <workspace>
# Swift
rg -n -g '*.swift' \
-e 'EAPMOptions\s*\(\s*appKey:\s*"(\d+)"' \
-e '\.appKey\s*=\s*"(\d+)"' <workspace>
# Common custom keys in Info.plist / xcconfig
rg -n -g '*.plist' -g '*.xcconfig' \
-e 'APMAppKey|AliyunAPMAppKey|EAPMAppKey' <workspace>
```
### os value
iOS project hit -> `--os iphoneos` (**not** `ios`).
---
## HarmonyOS (ArkTS)
SDK package: `@aliyun/apm`.
### Common initialization patterns
```typescript
import { APM, APMConfig, EMAS_APM_Config } from '@aliyun/apm';
APM.init(new EMAS_APM_Config({
appKey: '233584108',
context: await getUIAbilityContext(),
appSecret: '...',
...
}));
APM.start();
// Or a plain APMConfig literal
const config: APMConfig = {
context: await getUIAbilityContext(),
appKey: '233584108',
appSecret: '...',
};
APM.init(config, [crashAnalysisApi]);
APM.start();
```
### grep rules
```bash
rg -n -g '*.{ets,ts,json5,json}' \
-e "appKey:\s*['\"](\d+)['\"]" \
-e "from\s+['\"]@aliyun/apm['\"]" <workspace>
```
### os value
HarmonyOS project hit -> `--os harmony`.
---
## Flutter (Dart)
SDK package: `alibabacloud_apm` / `apm_flutter`. Core class: `ApmOptions`.
### Common initialization patterns
```dart
import 'package:alibabacloud_apm/alibabacloud_apm.dart';
const options = ApmOptions(
appKey: '333854661', // Android
appSecret: '...',
appRsaSecret: '...',
channel: 'store',
);
await Apm.instance.start(options);
```
Cross-platform demos typically **keep a per-platform copy of AppKey**:
```dart
static const ApmOptionsFields defaultIosDeveloperFields = ApmOptionsFields(
appKey: '335628209',
...
);
static const ApmOptionsFields defaultAndroidFields = ApmOptionsFields(
appKey: '333854661',
...
);
```
### grep rules
```bash
rg -n -g '*.dart' \
-e "ApmOptions\s*\(\s*appKey:\s*['\"](\d+)['\"]" \
-e "appKey:\s*['\"](\d+)['\"]" <workspace>
```
### os value
Flutter is cross-platform; always ask the user whether this query targets `android` or `iphoneos`. If the grep matches per-platform constants like `defaultAndroidFields` and `defaultIosXxxFields`, have the user pick one.
Custom exceptions reported from Dart, Kotlin, Swift, or ObjC all end up under `biz-module=custom`; for those reported from Dart, `customErrorLanguage` (see [../assets/system-filters/custom-android.json](../assets/system-filters/custom-android.json)) is tagged `Dart`. When this Skill queries Flutter custom exceptions, `--os` is still `android` / `iphoneos`.
---
## Unity (C#)
SDK package: `alicloud-apm`. Core classes: `Aliyun.Apm.ApmOptions` / `Aliyun.Apm.Apm`.
### Common initialization patterns
```csharp
var options = new ApmOptions(appKey, appSecret, appRsaSecret)
{
SdkComponents = SDKComponents.Crash | SDKComponents.Performance,
Channel = "store",
};
Apm.Start(options);
```
The `DemoConfig { public string appKey = "..." }` + `ApmDemoUI.GetRolePreset` pattern is also common in demos.
### grep rules
```bash
rg -n -g '*.cs' \
-e 'new\s+ApmOptions\s*\(\s*"(\d+)"' \
-e 'AppKey\s*=\s*"(\d+)"' \
-e 'appKey\s*=\s*"(\d+)"' <workspace>
```
If the demo stores AppKey in a ScriptableObject / JSON, do a second pass under `Assets/`.
### os value
Same as Flutter: have the user pick Android or iOS. Unity custom exceptions (C#) are reported to `custom` with `customErrorLanguage=C#` / `CSharp` (exact value per [../assets/system-filters/custom-android.json](../assets/system-filters/custom-android.json)).
---
## H5 (JavaScript / TypeScript)
SDK class: `EMAS_APM` (constructor-based init).
### Common initialization patterns
```html
<script src="emas-apm.min.js"></script>
<script>
var sdk = new EMAS_APM({
appKey: '335102493',
appSecret: '...',
appVersion: '1.0.0',
});
sdk.start();
</script>
```
Or as an ES module:
```typescript
import { EMAS_APM } from '@alicloud/apm-h5-sdk';
const sdk = new EMAS_APM({ appKey: '335102493', appSecret: '...' });
sdk.start();
```
### grep rules
```bash
rg -n -g '*.{js,ts,jsx,tsx,vue,html,htm}' \
-e "new\s+EMAS_APM\s*\(\s*\{\s*appKey:\s*['\"]([^'\"]+)['\"]" \
-e "appKey:\s*['\"]([A-Za-z0-9]+)['\"]" <workspace>
```
H5 AppKeys are not necessarily pure digits, but they are still issued by the console.
### os value
H5 project -> `--os h5`. Note: H5 bizModules are typically `h5JsError` / `h5WhiteScreen` / `h5Pv`; this Skill focuses on the 6 mobile types. For H5, consult the APM console H5 section; it is out of scope for this Skill.
---
## Generic fallback
When every grep misses, the Skill should ask the user:
> I could not find APM SDK initialization code in the current workspace. Please provide:
>
> 1. App Key (find it in the [EMAS console](https://emas.console.aliyun.com/) app detail page; usually 9 digits)
> 2. Platform: `android` / `iphoneos` / `harmony` / `h5` (for Flutter / Unity, pick android or iphoneos per the actual build target)
Only enter the main flow after confirmation.
FILE:references/biz-module-reference.md
# `biz-module` reference
`biz-module` is the top-level taxonomy of emas-appmonitor; it determines which sub-service the API calls, which tables are queried, and which fields are returned. This Skill supports the 6 values below:
| biz-module | Meaning | Scenario | Typical root causes |
| --- | --- | --- | --- |
| `crash` | Crash | Process terminates abnormally (iOS `EXC_*` / Android `java.lang.*Exception` / NDK signal) | null pointer, out-of-bounds, race condition, native memory corruption |
| `anr` | ANR | Android main thread blocked beyond timeout (system "Not Responding" dialog) | main-thread IO, lock contention, slow broadcast receivers |
| `lag` | Lag | Low FPS / main-thread execution over threshold without ANR | big-image decoding, over-layout, synchronous network calls, JSON parsing |
| `custom` | Custom business exception | Business errors reported via `reportError(errorCode, errorMsg, stack)` | business validation failure, unexpected API response, front-/back-end contract drift |
| `memory_leak` | Memory leak | Reference chains that SDK detected as non-releasable | static holding of Activity, singleton holding Context, unregistered listeners |
| `memory_alloc` | Memory allocation | Large one-off allocations / rapid bulk growth | Bitmap decode without inSampleSize, unbounded cache, loading big JSON at once |
## `biz-module` x platform x `filterCode` matrix
The table below is auto-summarized from `assets/system-filters/<biz-module>-<platform>.json`:
| biz_module | platform | Static-enum filterCodes | Dynamic filterCodes (consumer must supply values) |
| --- | --- | --- | --- |
| `anr` | `android` | `utdid`, `clientIp`, `userNick`, `userId`, `isForeground`, `isJailbroken`, `inMainProcess`, `digestHash`, `isSimulator`, `issueStatus` | `appVersion`, `build`, `firstVersion`, `osVersion`, `brand`, `deviceModel`, `channel`, `language`, `view`, `access`, `country`, `province`, `city`, `resolution`, `processName`, `carrier`, `cpuModel`, `tag`, `additionalCustomInfo` |
| `crash` | `android` | `crashType`, `isOom`, `shadow_launchedCrashDuration`, `utdid`, `clientIp`, `userNick`, `userId`, `isForeground`, `isJailbroken`, `inMainProcess`, `digestHash`, `isSimulator`, `issueStatus` | same as anr |
| `crash` | `iphoneos` | `componentType`, `crashType`, `shadow_launchedCrashDuration`, `utdid`, `clientIp`, `userNick`, `userId`, `isForeground`, `isJailbroken`, `inMainProcess`, `digestHash`, `issueStatus` | same as anr |
| `crash` | `harmony` | `crashType`, `utdid`, `clientIp`, `userNick`, `userId`, `isForeground`, `inMainProcess`, `digestHash`, `issueStatus` | `appVersion`, `build`, `firstVersion`, `osVersion`, `brand`, `deviceModel`, `channel`, `language`, `view`, `access`, `country`, `province`, `city`, `resolution`, `processName`, `carrier`, `cpuModel`, `tag` |
| `custom` | `android` | `customErrorLanguage`, `isCustomErrorFlag`, `utdid`, `clientIp`, `userNick`, `userId`, `isForeground`, `isJailbroken`, `inMainProcess`, `digestHash`, `isSimulator`, `issueStatus` | same as anr |
| `custom` | `iphoneos` | same as custom/android (minus `isSimulator`) | same as anr |
| `custom` | `harmony` | `utdid`, `clientIp`, `userNick`, `userId`, `isForeground`, `inMainProcess`, `digestHash`, `issueStatus` | same as crash/harmony |
| `lag` | `android` | `utdid`, `clientIp`, `userNick`, `userId`, `isForeground`, `isJailbroken`, `inMainProcess`, `digestHash`, `isSimulator`, `issueStatus` | same as anr |
| `lag` | `iphoneos` | same as lag/android (minus `isSimulator`) | same as anr |
| `lag` | `harmony` | `utdid`, `clientIp`, `userNick`, `userId`, `isForeground`, `inMainProcess`, `digestHash`, `issueStatus` | same as crash/harmony |
| `memory_alloc` | `android` | `deviceId`, `clientIp`, `userNick`, `userId`, `isForeground`, `isJailbroken`, `digestHash`, `isSimulator`, `issueStatus` | `appVersion`, `build`, `firstVersion`, `osVersion`, `brand`, `deviceModel`, `channel`, `language`, `access`, `country`, `province`, `city`, `resolution`, `carrier`, `cpuModel`, `tag`, `additionalCustomInfo` |
| `memory_alloc` | `iphoneos` | same as memory_alloc/android (minus `isSimulator`) | same as memory_alloc/android |
| `memory_leak` | `android` | same as memory_alloc/android | same as memory_alloc/android |
| `memory_leak` | `iphoneos` | same as memory_alloc/android (minus `isSimulator`) | same as memory_alloc/android |
> `harmony` does not have the `anr` / `memory_leak` / `memory_alloc` taxonomy, so those rows are omitted above.
> For the complete field set (including the Chinese `filterName`, `filterValues` candidate lists, and `filterType` widget kinds), read the corresponding JSON directly.
## Canonical pattern for reading static filter enums
```bash
# $SKILL_DIR is expected to be exported ahead of time by SKILL.md section 7.0; no hardcoding here.
# If invoked in script-mode without an exported SKILL_DIR, auto-detect from the Skill root:
: "${SKILL_DIR:=$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")/.." && pwd)}"
BIZ=crash # crash / anr / lag / custom / memory_leak / memory_alloc
OS=iphoneos # android / iphoneos / harmony
FILTER_DIR="$SKILL_DIR/assets/system-filters"
INDEX="$FILTER_DIR/index.json"
# 1) Prefer index.json to resolve the file name (most reliable)
FILE="$FILTER_DIR/$(jq -r --arg bm "$BIZ" --arg os "$OS" \
'.files[] | select(.bizModule == $bm and .platform == $os) | .file' \
"$INDEX")"
# 2) Fallback: if index.json is missing or the entry is not found, derive from the naming convention
[[ -f "$FILE" ]] || FILE="$FILTER_DIR/${BIZ}-${OS}.json"
[[ -f "$FILE" ]] || { echo "[ERROR] Filter definition not found for $BIZ x $OS: $FILE" >&2; return 1 2>/dev/null || exit 1; }
# All static filterCodes
jq -r '.filters[] | select(.dynamic == false) | .filterCode' "$FILE"
# Candidate enum values for a specific filterCode (name + value pairs)
jq -r '.filters[] | select(.filterCode == "crashType") | .filterValues[] | "\(.value)\t\(.name)"' "$FILE"
# Group by widget kind (checkbox / radio / select / text / custom)
jq -r '.filters | group_by(.filterType) | .[] | {type:.[0].filterType, codes:[.[].filterCode]}' "$FILE"
```
> To switch the `biz_module x platform` combination, change only the `BIZ=` / `OS=` assignments near the top of the block; the file path and existence check are handled by the `FILE=` / fallback / `-f` triplet above, so the JSON filename no longer needs to be maintained by hand.
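
Where `jq` is not available, the same three queries can be sketched with Python's standard library. `SAMPLE` is a trimmed excerpt of `assets/system-filters/memory_leak-iphoneos.json` embedded so the example runs on its own; in practice the document would be loaded from the resolved file. The function names are hypothetical.

```python
import json

SAMPLE = json.loads("""
{
  "bizModule": "memory_leak",
  "platform": "iphoneos",
  "filters": [
    {"filterCode": "appVersion", "filterType": "checkbox", "dynamic": true, "filterValues": []},
    {"filterCode": "deviceId", "filterType": "text", "dynamic": false, "filterValues": []},
    {"filterCode": "isForeground", "filterType": "radio", "dynamic": false,
     "filterValues": [{"name": "是", "value": 1}, {"name": "否", "value": 0}, {"name": "-", "value": 2}]}
  ]
}
""")


def static_codes(doc: dict) -> list[str]:
    """filterCodes whose candidate values are fixed (dynamic == false)."""
    return [f["filterCode"] for f in doc["filters"] if f["dynamic"] is False]


def enum_values(doc: dict, code: str) -> list[tuple]:
    """(value, name) pairs for one filterCode; empty when it has no static enum."""
    for f in doc["filters"]:
        if f["filterCode"] == code:
            return [(v["value"], v["name"]) for v in f["filterValues"]]
    return []


def codes_by_type(doc: dict) -> dict[str, list[str]]:
    """Group filterCodes by widget kind (filterType)."""
    grouped: dict[str, list[str]] = {}
    for f in doc["filters"]:
        grouped.setdefault(f["filterType"], []).append(f["filterCode"])
    return grouped


print(static_codes(SAMPLE))
print(enum_values(SAMPLE, "isForeground"))
print(codes_by_type(SAMPLE))
```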
## Key-field cheat sheet by biz_module
### crash
- **Main discriminator**: `crashType` (Android: `MOTU_ANDROID_CRASH` Java / `MOTU_ANDROID_NATIVE_CRASH` Native; iOS: `MOTU_IOS_CRASH` / `MOTU_IOS_MACH_EXCEPTION` / `MOTU_IOS_NATIVE_CRASH`, etc.)
- **Launch crash window**: `shadow_launchedCrashDuration` static enum (typically `0-5s / 5-10s / 10s+`; read the JSON for exact values)
- **Is OOM** (Android only): `isOom` (0/1)
- **iOS component** (iOS only): `componentType` (APP / Extension / Watch)
- **Issue status**: `issueStatus` takes the numeric values from the filter JSON: `1` unhandled / `2` in progress / `3` closed / `4` handled / `5` ignored
### anr (Android only)
- Few ANR-specific filterCodes; rely mostly on generic dimensions (device / version / foreground)
- `isForeground=1` is usually a "user-visible ANR"; prioritize it
- `inMainProcess=1` keeps the main process only
### lag
- No dedicated `lagCost` filter (the SDK applies a threshold when reporting; the CLI only supports dimension filtering)
- Common combo: `isForeground=1` + `inMainProcess=1` + `appVersion in [...]`
### custom
- **Main discriminator**: `customErrorLanguage` (`Java / OC / Swift / JavaScript / ArkTS / Dart ...`)
- `isCustomErrorFlag` distinguishes whether this is a "custom error code" report (legacy API vs new API)
### memory_leak
- Common: `osVersion` (some OS versions have specific leaks) + `deviceModel` + `digestHash` (locate the same leak site)
- `isJailbroken=0` can be used to exclude jailbroken-device noise (`isJailbroken=1` keeps only jailbroken devices)
### memory_alloc
- Filters are the same as memory_leak
- In the API response, `AllocSizePct90 / Pct70 / Pct50 / Max` are in-cluster percentiles; usually sort by `AllocSizePct90 desc` and pick top issues
## Pitfalls
1. **`--biz-module` CLI `--help` is not exhaustive**: `aliyun emas-appmonitor get-issues --help` only lists `crash/lag/custom` and a few options, but dry-run verifies that `anr / memory_leak / memory_alloc` **are all forwarded** to the backend. The Skill guarantees all 6 values are directly usable.
2. **Missing biz_modules for `harmony`**: `anr`, `memory_leak`, `memory_alloc` have no data on the `harmony` platform; if you pass these combinations, the API returns `Model.Items=[]` and `Total=0`, **without error**. When this happens, check the OS combination first.
3. **The same filterCode across biz_modules may have different meanings**: see the "Differences across biz_modules" section in `filter-reference.md`.
4. **`digestHash` as a filterCode**: **invalid** inside `get-issues` (the list is itself aggregated by `digestHash`; filtering by `digestHash` is not supported); to target one `digestHash`, use `get-issue` rather than `get-issues + filter`.
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.0+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.0 or later for full plugin ecosystem coverage.
## Installation
### macOS / Linux (Recommended)
One command to install or update — works on both macOS and Linux, auto-detects architecture:
```bash
/bin/bash -c "$(curl -fsSL --connect-timeout 10 --max-time 120 https://aliyuncli.alicdn.com/setup.sh)"
```
After installation, verify:
```bash
aliyun version # should be >= 3.3.0
```
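
When the check is automated, compare the version numerically rather than as a string (as text, `3.10.1` sorts before `3.3.0`). A minimal Python sketch follows; `parse_version` and `meets_minimum` are hypothetical names, and the input is assumed to be whatever text `aliyun version` prints, containing an `x.y.z` triple.

```python
import re

MINIMUM = (3, 3, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first x.y.z triple from the text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text: str, minimum: tuple[int, ...] = MINIMUM) -> bool:
    """True when the reported version is at least the required minimum."""
    return parse_version(text) >= minimum


for reported in ("3.0.299", "3.3.0", "3.10.1"):
    print(reported, meets_minimum(reported))
```

The text could come from `subprocess.run(["aliyun", "version"], capture_output=True, text=True).stdout`.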
### macOS — Homebrew (Alternative)
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
```
### Linux — Manual Binary (Alternative)
Use these only if the setup script above is not suitable.
**x86_64**
```bash
wget --connect-timeout=10 --read-timeout=120 --tries=3 -qO- https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz | tar xz
sudo mv aliyun /usr/local/bin/
```
**ARM64**
```bash
wget --connect-timeout=10 --read-timeout=120 --tries=3 -qO- https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz | tar xz
sudo mv aliyun /usr/local/bin/
```
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
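# Add to PATH (requires admin privileges)
# Append to the machine-level PATH only, so user-level entries are not copied into it
$machinePath = [Environment]::GetEnvironmentVariable("Path", [System.EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\aliyun-cli", [System.EnvironmentVariableTarget]::Machine)
# Make the CLI available in the current session as well
$env:Path += ";C:\aliyun-cli"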
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
**OAuth (browser login)** — If the environment can open a web browser (for example a local desktop with a GUI), **prefer OAuth** over storing AccessKey pairs in configuration: credentials are not kept as plaintext AK/SecretKey. Requires Alibaba Cloud CLI **3.0.299** or later and is **not** suitable for headless servers (for example SSH-only Linux without a browser on the same machine).
Run interactively:
```bash
aliyun configure --profile <your-profile-name> --mode OAuth
```
Full setup (administrator consent, RAM assignments, site `CN` vs `INTL`) is covered in the official guide: [Configure OAuth authentication for Alibaba Cloud CLI](https://www.alibabacloud.com/help/en/doc-detail/2995960.html).
---
The sections below describe **six** authentication modes that are typically driven with **non-interactive** flags (scripts, CI/CD, automation). Use these when OAuth is not available or when you must supply explicit keys or tokens.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "<your-access-key-id>",
"access_key_secret": "<your-access-key-secret>",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
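
To inspect which profile is active without echoing secrets, a script can read this file and report only the non-sensitive fields. The Python sketch below assumes the layout shown above; `summarize_profiles` is a hypothetical name and `CONFIG_SAMPLE` is a placeholder copy of that layout so the example runs without touching `~/.aliyun/config.json`.

```python
import json

CONFIG_SAMPLE = """
{
  "current": "default",
  "profiles": [
    {
      "name": "default",
      "mode": "AK",
      "access_key_id": "<your-access-key-id>",
      "access_key_secret": "<your-access-key-secret>",
      "region_id": "cn-hangzhou",
      "output_format": "json",
      "language": "en"
    }
  ]
}
"""


def summarize_profiles(config_text: str) -> list[dict]:
    """Return name, mode, and region per profile; credentials are never copied."""
    config = json.loads(config_text)
    current = config.get("current")
    return [
        {
            "name": profile.get("name"),
            "mode": profile.get("mode"),
            "region_id": profile.get("region_id"),
            "is_current": profile.get("name") == current,
        }
        for profile in config.get("profiles", [])
    ]


print(summarize_profiles(CONFIG_SAMPLE))
```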
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (tokens expire in 1-12 hours).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--sts-token <your-sts-token> \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--ram-role-arn <your-ram-role-arn> \
--role-session-name <your-role-session-name> \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name <your-ecs-ram-role-name> \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use an RSA key pair for authentication (generate the key pair in the Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key <path-to-your-private-key.pem> \
--key-pair-name <your-key-pair-name> \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name <your-ecs-ram-role-name> \
--ram-role-arn <your-ram-role-arn> \
--role-session-name <your-role-session-name> \
--region cn-hangzhou
```
### Environment Variables
Environment variables take **priority over the configuration file**: when set, they override the values stored in `~/.aliyun/config.json`.
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=<your-access-key-id>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<your-access-key-secret>
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=<your-access-key-id>
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<your-access-key-secret>
export ALIBABA_CLOUD_SECURITY_TOKEN=<your-sts-token>
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
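Before relying on environment credentials in a pipeline, it can help to confirm that the variables are present without ever echoing their values. The following is a minimal sketch; `missing_env_credentials` is a hypothetical helper name, and it checks only the three Access Key Mode variables shown above.

```bash
# Hypothetical helper: prints the names (never the values) of unset variables
# and returns non-zero when at least one is missing.
missing_env_credentials() {
  missing=
  for name in ALIBABA_CLOUD_ACCESS_KEY_ID ALIBABA_CLOUD_ACCESS_KEY_SECRET ALIBABA_CLOUD_REGION_ID; do
    # Indirect lookup of the variable named in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  unset value
  echo "${missing# }"
  [ -z "$missing" ]
}
```

A CI step could call `missing_env_credentials >/dev/null || exit 1` before any `aliyun` command runs.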
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id <your-first-access-key-id> \
--access-key-secret <your-first-access-key-secret> \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id <your-second-access-key-id> \
--access-key-secret <your-second-access-key-secret> \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure switch --profile projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
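The lookup order above can be sketched as a small shell function. This only illustrates the documented order and is not the CLI's internal logic; `credential_source` and the `ALIYUN_CONFIG_FILE` override are hypothetical names introduced for the example.

```bash
# Hypothetical helper mirroring the priority list; it never reads credential values.
# $1: the value that would be passed via --profile (empty if the flag is absent).
credential_source() {
  if [ -n "${1:-}" ]; then
    echo "flag:$1"
  elif [ -n "${ALIBABA_CLOUD_PROFILE:-}" ]; then
    echo "env-profile:$ALIBABA_CLOUD_PROFILE"
  elif [ -n "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" ]; then
    echo "env-credentials"
  elif [ -f "${ALIYUN_CONFIG_FILE:-$HOME/.aliyun/config.json}" ]; then
    # ALIYUN_CONFIG_FILE is an assumption of this sketch, not a CLI setting.
    echo "config-file"
  else
    echo "ecs-ram-role"
  fi
}
```

Calling `credential_source ""` in a shell shows which source would be consulted first when no `--profile` flag is given.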
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON object containing the list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "East China 1 (Hangzhou)"
},
...
]
},
"RequestId": "..."
}
```
**If it fails**, you'll see one of these error codes:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
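In automation it can be convenient to turn these error codes into a short hint. A minimal sketch follows; `explain_auth_error` is a hypothetical helper that matches only the four codes listed above in whatever error text it is given.

```bash
# Hypothetical helper: maps an error message containing one of the codes above
# to a one-line hint; returns non-zero for anything it does not recognize.
explain_auth_error() {
  case $1 in
    *InvalidAccessKeyId.NotFound*)  echo "Access Key ID is wrong" ;;
    *SignatureDoesNotMatch*)        echo "Access Key Secret is wrong" ;;
    *InvalidSecurityToken.Expired*) echo "STS token expired: reconfigure with a fresh token" ;;
    *Forbidden.RAM*)                echo "Insufficient permissions: attach the required RAM policy" ;;
    *)                              echo "Unrecognized error"; return 1 ;;
  esac
}
```

For example, `explain_auth_error "$(aliyun ecs describe-regions 2>&1)"` would print a hint when the call fails with one of these codes.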
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
- **Don't**: use Aliyun root account credentials
- **Do**: create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id <your-access-key-id> --access-key-secret <your-access-key-secret>
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id <your-access-key-id> --access-key-secret <your-access-key-secret> \
--sts-token <your-sts-token> --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name <your-ecs-ram-role-name> --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
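# ~/.aliyun/config.json lives in your home directory, outside any repository,
# and .gitignore does not expand "~". Ignore copies kept inside the project instead:
echo ".aliyun/" >> .gitignore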
# Use environment variables in CI/CD instead
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
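One way to confirm that the restriction took effect is to inspect the permission string reported by `ls`. This is a sketch under the assumption of a POSIX-style `ls -l` output; `is_owner_only` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds when group and other have no permissions on the file.
is_owner_only() {
  # First 10 characters of `ls -ld`: file type + owner/group/other permission bits.
  perms=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
  case $perms in
    ????------) return 0 ;;
    *)          return 1 ;;
  esac
}
```

For example, `is_owner_only ~/.aliyun/config.json || chmod 600 ~/.aliyun/config.json` tightens the file only when needed.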
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id <your-access-key-id> --access-key-secret <your-access-key-secret> \
--sts-token <your-sts-token> --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Timeout Settings
```bash
# Connection timeout
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.0+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
## References
- Official Documentation: https://help.aliyun.com/en/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
---
## Extra requirements for this Skill (the emas-appmonitor plugin)
All commands in this Skill rely on the `aliyun emas-appmonitor` plugin. Before you start, confirm the five items below:
### 1. CLI version
```bash
aliyun version
```
Required: `>= 3.3.3`. If lower, upgrade per the "Installation" section above.
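The version check can be scripted. The sketch below compares dotted numeric versions in plain shell; `version_ge` is a hypothetical helper, and it assumes `aliyun version` prints a bare version string such as `3.3.3`.

```bash
# Hypothetical helper: version_ge A B succeeds when dotted version A >= B.
# Only numeric fields are supported; missing fields count as 0.
version_ge() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}
    y=${b%%.*}
    if [ "${x:-0}" -gt "${y:-0}" ]; then return 0; fi
    if [ "${x:-0}" -lt "${y:-0}" ]; then return 1; fi
    # Drop the field just compared; stop consuming once no dot is left.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# Usage (assumes the CLI is installed):
#   version_ge "$(aliyun version)" 3.3.3 || echo "Please upgrade the Aliyun CLI"
```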
### 2. Enable auto plugin install (recommended)
```bash
aliyun configure set --auto-plugin-install true
```
Once enabled, the first invocation of `aliyun emas-appmonitor` will auto-download the `aliyun-cli-emas-appmonitor` plugin to `~/.aliyun/plugin/`.
### 3. Install or update the plugin manually
```bash
aliyun plugin install --name emas-appmonitor
aliyun plugin update --name emas-appmonitor
```
Verify availability:
```bash
aliyun emas-appmonitor --help | head -40
```
The output should contain subcommands such as `get-issues`, `get-issue`, `get-errors`, `get-error`.
### 4. Credential pre-check (**never read or print AK/SK**)
```bash
aliyun configure list
```
There must be at least one `current` profile with non-empty `Mode` / `RegionId`. **This Skill does not read or print AK/SK field contents at runtime**; when errors occur, check them yourself using the command above.
### 5. RAM permissions
All commands in this Skill are **read-only queries**. See `ram-policies.md` for the least-privilege set; the core requirements are:
- `emasha:ViewIssues`
- `emasha:ViewIssue`
- `emasha:ViewErrors`
- `emasha:ViewError`
When you see `Forbidden.NoRAMPermission`, follow the "Permission Failure Handling" flow in `ram-policies.md`.
FILE:references/filter-reference.md
# `--filter` structure and usage
All 4 APIs (`get-issues` / `get-issue` / `get-errors` / `get-error`) accept an optional `--filter` argument to narrow results on the server side. Semantics match the console's "filter conditions"; see candidate fields in `assets/system-filters/*.json`.
## Wire format: a single JSON string
As verified with `--cli-dry-run`, the **only** way to make the CLI produce a proper `Filter=<string>` is to pass **the whole JSON string as the value of `--filter`**:
```bash
aliyun emas-appmonitor get-issues \
--app-key 335695934 --os iphoneos --biz-module crash \
--time-range StartTime=$START EndTime=$END Granularity=1 GranularityUnit=DAY \
--filter '{"Key":"appVersion","Operator":"in","Values":["3.5.0","3.5.1"]}'
```
In the dry-run output `Body.Filter` must be a **string literal** (notice the outer quotes):
```json
{
"AppKey": 335695934,
"Os": "iphoneos",
"BizModule": "crash",
"TimeRange": { "...": "..." },
"Filter": "{\"Key\":\"appVersion\",\"Operator\":\"in\",\"Values\":[\"3.5.0\",\"3.5.1\"]}"
}
```
### Why not the flat `Key=... Operator=... Values.1=...` form
`--filter` is declared as `style: "json"` (not `"flat"`) in the `.cspec`. Attempting the flat form (e.g. `--filter Key=appVersion Operator=in Values.1=1.0.0`) does **not** get serialized into a top-level `Filter` field in the dry-run; it is split into odd query-string fragments and the server returns `InvalidRequest`.
## Filter object structure
```jsonc
{
"Key": "appVersion", // required, filter_code (from assets/system-filters)
"Operator": "in", // required, operator enum (see below)
"Values": ["1.0.0"], // used for leaf conditions; required on leaves
"SubFilters": [] // used for composite nodes; exclusive with Values
}
```
### Complete operator set
| Operator | Meaning | Used on | Typical values |
| --- | --- | --- | --- |
| `=` / `!=` | equal / not equal | `Key` is a single-value field | `Values:["3.5.0"]` |
| `in` / `not in` | in set / not in set | `Key` is an enum / string field | `Values:["3.5.0","3.5.1"]` |
| `>` / `<` / `>=` / `<=` | numeric comparison | `Key` is numeric (e.g. `lagCost`) | `Values:["200"]` (numbers as strings; backend coerces) |
| `and` | logical AND | composite node, `SubFilters` non-empty | `Values` empty |
| `or` | logical OR | composite node, `SubFilters` non-empty | `Values` empty |
| `not` | logical NOT | usually with one nested level | `Values` empty |
### Leaf vs composite
- **Leaf node**: has `Key` + `Operator` + `Values`, no `SubFilters` (or empty array).
- **Composite node**: `Operator` is `and` / `or` / `not`; `SubFilters` is an **array consisting of leaves or further composite nodes**.
- **`SubFilters` element type**: OpenAPI declares `SubFilters` as `array<string>`, meaning **each child filter must first be `JSON.stringify`-ed to a string** and then placed in the array.
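For simple leaf nodes, the JSON can be assembled with `printf` alone. The sketch below is illustrative only: `leaf_filter` is a hypothetical helper, and it assumes keys and values contain no double quotes or backslashes because it performs no JSON escaping (prefer `jq` when that cannot be guaranteed).

```bash
# Hypothetical helper: leaf_filter KEY OPERATOR VALUE...
# Prints a leaf filter object with every value emitted as a JSON string.
leaf_filter() {
  key=$1
  op=$2
  shift 2
  values=
  for v in "$@"; do
    # Append a comma only when a previous value is already present.
    values="${values}${values:+,}\"$v\""
  done
  printf '{"Key":"%s","Operator":"%s","Values":[%s]}\n' "$key" "$op" "$values"
}

leaf_filter appVersion in 3.5.0 3.5.1
```

The printed string can be passed directly as the value of `--filter`, for example `--filter "$(leaf_filter lagCost '>=' 500)"`.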
### Composite example: APP version in [3.5.0, 3.5.1] AND brand = Apple
```bash
# Stringify each leaf first
SUB1='{"Key":"appVersion","Operator":"in","Values":["3.5.0","3.5.1"]}'
SUB2='{"Key":"brand","Operator":"=","Values":["Apple"]}'
# Then assemble the AND node; each element of SubFilters is a JSON string (another layer of escaping)
FILTER=$(jq -cn --arg s1 "$SUB1" --arg s2 "$SUB2" \
'{Key:"",Operator:"and",Values:[],SubFilters:[$s1,$s2]}')
aliyun emas-appmonitor get-issues \
--app-key 335695934 --os iphoneos --biz-module crash \
--time-range StartTime=$START EndTime=$END Granularity=1 GranularityUnit=DAY \
--filter "$FILTER"
```
In the dry-run `Body.Filter` should look like:
```json
"Filter": "{\"Key\":\"\",\"Operator\":\"and\",\"Values\":[],\"SubFilters\":[\"{\\\"Key\\\":\\\"appVersion\\\",\\\"Operator\\\":\\\"in\\\",\\\"Values\\\":[\\\"3.5.0\\\",\\\"3.5.1\\\"]}\",\"{\\\"Key\\\":\\\"brand\\\",\\\"Operator\\\":\\\"=\\\",\\\"Values\\\":[\\\"Apple\\\"]}\"]}"
```
(Three nested JSON layers mean two layers of escaping, so the innermost quotes appear as `\\\"`; constructing the filter with `jq` is far more reliable than writing the backslashes by hand.)
## Commonly used filter snippets (copy-paste)
```bash
# 1. A specific APP version
--filter '{"Key":"appVersion","Operator":"=","Values":["3.5.0"]}'
# 2. Multiple versions (in is cleaner than OR)
--filter '{"Key":"appVersion","Operator":"in","Values":["3.5.0","3.5.1","3.5.2"]}'
# 3. Exclude a version
--filter '{"Key":"appVersion","Operator":"not in","Values":["3.5.0-beta"]}'
# 4. Specific device models only
--filter '{"Key":"deviceModel","Operator":"in","Values":["iPhone14,5","iPhone14,2"]}'
# 5. Specific OS versions only
--filter '{"Key":"osVersion","Operator":"in","Values":["17.0","17.1","17.2"]}'
# 6. Specific brand only
--filter '{"Key":"brand","Operator":"=","Values":["Apple"]}'
# 7. lag: lag duration >= 500 ms
--filter '{"Key":"lagCost","Operator":">=","Values":["500"]}'
# 8. memory_leak: is OOM
--filter '{"Key":"isOom","Operator":"=","Values":["1"]}'
```
## Source of `filter_code` values
All legal `Key` values are listed in `filters[*].filterCode` of `assets/system-filters/<biz_module>-<platform>.json`. Each record also carries:
| Field | Purpose |
| --- | --- |
| `filterName` | Display name shown in the console (in Chinese, e.g. the label for "App Version") |
| `filterType` | `checkbox` / `radio` / `text` / `select`; determines the shape of `Values` |
| `dynamic` | `true` means candidate values are produced dynamically after integration (e.g. `appVersion` / `deviceModel` emitted by the app); `false` means a static enum |
| `filterValues` | Candidate values for static enums. **Usually empty when `dynamic=true`**; the user must supply real values (e.g. from the console or the app release notes) |
For `dynamic=true` fields, the Skill consumer must provide the values. The Skill does NOT "pull dynamic candidates at runtime"; it only ships **the static enum + filter_code list**.
## Differences across biz_modules
The same `filterCode` may have different semantics under different `biz_module`s:
- `isOom` under `crash` = whether the crash is OOM; under `memory_leak` = whether the leak reached the OOM threshold.
- `errorCode` under `custom` is a string reported by the business; other modules do not have this field.
- `lag` has its own `lagCost` and `lagType` (lag type: `UI` / `IO` / `Network`).
- `memory_alloc` has its own `allocSizeBucket`.
When unsure which `filterCode`s the current `biz_module` supports, read the corresponding `assets/system-filters/<biz_module>-<platform>.json` directly.
## 3 steps to verify that a filter takes effect
1. Call `get-issues` / `get-errors` **without** the filter once; record `Model.Total`.
2. Call it **with** the filter (keeping all other parameters identical); the new `Model.Total` should be **smaller** than in step 1 unless nearly every sample matches the filter. An unchanged total usually means the filter did not take effect.
3. Use `--cli-dry-run` together with `--log-level debug` to confirm the `"Filter"` field in the HTTP body is the expected JSON string with the correct number of escape layers.
## Common pitfalls
- **Use `"` inside the JSON string, not `'`**: wrap the whole JSON with single quotes in bash; keys/values use double quotes.
- **Nested `SubFilters` need multiple escape layers**: error-prone by hand; always build with `jq -cn`.
- **Numbers in `Values` must be strings**: use `Values:["200"]`, not `Values:[200]`. The backend coerces the strings, while some CLI versions raise parse errors on the bare-number form.
- **Empty filter = no filter**: if `Key=""` and `SubFilters=[]`, the backend treats the filter as invalid and ignores it (equivalent to not passing `--filter`).
- **The CLI does NOT validate values when `dynamic=true`**: an incorrect `appVersion` string matches nothing and returns `Total=0`, without any error.
FILE:references/get-error.md
# `GetError` - sample detail
Aliyun CLI command: `aliyun emas-appmonitor get-error`
API version: `2019-06-11` · Action: `GetError` · Method: `POST` RPC
Fetches the **complete detail** of a sample via the "sample pointer" triple (`ClientTime` + `Uuid` + `Did`): base dimensions, exception description, backtrace, threads, business log, memory snapshot and more. This is the last link in the "error cluster -> APP source" pipeline.
## Parameters
| CLI flag | Required | Type | Description |
| --- | --- | --- | --- |
| `--app-key` | Yes | int64 | AppKey |
| `--client-time` | Yes | int64 | Sample client time (ms) - from `get-errors.Model.Items[*].ClientTime` |
| `--biz-module` | Recommended | string | One of the 6 enum values; *effectively required*, otherwise the backend has to scan multiple sub-tables |
| `--os` | Recommended | string | `android` / `iphoneos` / `harmony` / `h5`; *effectively required* |
| `--uuid` | No | string | Event unique id - from `get-errors.Model.Items[*].Uuid`. **Strongly recommended**, locks onto one specific crash |
| `--did` | No | string | Device id - from `get-errors.Model.Items[*].Did`. Helps with precise localization |
| `--digest-hash` | No | string | Owning Issue, used as an auxiliary filter |
| `--biz-force` | No | bool | `true` bypasses CDN / cache and forces origin fetch; **do not add** during normal troubleshooting |
**`--time-range` does not exist here**: `get-error` uses `ClientTime` to locate the specific sample, no time window needed.
## `--cli-dry-run` example
```bash
aliyun emas-appmonitor get-error \
--app-key 12345678 \
--os iphoneos \
--biz-module crash \
--client-time 1776415936000 \
--uuid 330429fe-3fae-475e-91b0-a2b014845456 \
--did 6963034097141070746 \
--digest-hash 3JE6F43KCQ1SV \
--cli-dry-run
```
## Response structure
`Model` is a single sample object with roughly **65 fields**, grouped by use case below.
### A. Base dimensions
| Field | Type | Description |
| --- | --- | --- |
| `AppKey` | int64 | Echo |
| `ClientTime` | int64 | Client time (ms) |
| `ServerTime` | int64 | Server ingestion time (ms) |
| `TriggeredTime` | int64 | Crash trigger time (present in some modules) |
| `SessionId` | string | Session id |
| `Uuid` | string | Event unique id |
| `Utdid` | string | UTDID (desensitized) |
| `AppVersion` | string | APP version |
| `Build` | string | Build number |
| `Os` | string | `iphoneos` / `android` / `harmony` |
| `OsVersion` | string | System version |
| `Brand` | string | Brand |
| `DeviceModel` | string | Device model |
| `Resolution` | string | Screen resolution |
| `Channel` | string | Distribution channel |
| `Language` | string | Language |
| `Country` / `Province` / `City` | string | Geography |
| `Carrier` / `Isp` / `Access` / `AccessSubType` | string | Carrier / network |
| `CpuModel` | string | CPU model |
| `ClientIp` | string | Client IP |
| `UserId` / `UserNick` | string | Business-side user |
| `Pid` | int | Process id |
| `ProcessName` / `ParentProcessName` | string | Process name |
| `InMainProcess` | int | Whether main process (`1`/`0`) |
| `ForeGround` / `ForceGround` | int | Whether foreground |
| `IsJailbroken` / `IsSimulator` / `IsSpeedVersion` | int | Device state flags |
| `SdkType` / `SdkVersion` | string | APM SDK type and version |
### B. Exception description
| Field | Type | Description |
| --- | --- | --- |
| `ExceptionType` | string | Exception type (iOS `EXC_BAD_ACCESS`; Android `java.lang.NullPointerException`) |
| `ExceptionSubtype` | string | Sub-type |
| `ExceptionCodes` | string | Signal / KERN code (iOS) |
| `ExceptionMsg` | string | Exception reason description |
| `ExceptionDetail` | string | Detailed message (may contain addresses) |
| `Summary` | string | Machine-generated summary |
| `Digest` | string | Stack digest (plaintext) |
| `DigestHash` | string | Owning Issue |
| `ReportType` | string | Internal report type (`MOTU_IOS_CRASH` / `MOTU_ANDROID_CRASH`, etc.) |
| `ReportContent` | string | Raw crash report (on iOS, the raw report before and after symbolication) |
| `LaunchedCrashStage` / `LaunchedTime` | int / int64 | Launch stage / launch duration (used to detect launch crash) |
| `LagCost` | int64 | lag duration (ms, lag only) |
### C. Stack & threads
| Field | Type | Description |
| --- | --- | --- |
| `Backtrace` | string | **Crash thread stack** (the main one to look at). Frames separated by newlines; iOS (post-symbolicated) looks like `0 apm_ios_demo EAPMDemoTriggerStackOverflow(unsigned long)`, Android looks like `at com.example.Foo.bar(Foo.java:45)` |
| `ThreadName` | string | Crashing thread name |
| `Threads` | array<object> | All threads (crash / anr); each item has `ThreadId` / `ThreadName` / `Stack` / `IsMain` |
| `BinaryUuids` | string | UUID mapping for executable segments (used by iOS symbolication) |
| `SymbolicFileType` | string | Symbol file type |
### D. Business log & extensions
| Field | Type | Description |
| --- | --- | --- |
| `EventLog` | string | Event log (page navigation / lifecycle / breadcrumbs, in time order) |
| `MainLog` | string | Main thread log (e.g. tail of logcat) |
| `BusinessLogType` | string | Business log tag |
| `CustomInfo` | string | Developer-injected key-values |
| `AdditionalCustomInfo` | string | Extra custom info |
| `Controllers` | string | Page path (iOS VC stack / Android Activity stack) |
| `View` | string | Current view path |
| `RuntimeExtData` | string | Runtime extension data |
| `MoreInfo1` / `MoreInfo2` | string | Reserved business fields |
### E. Memory & IO (memory_* / crash OOM)
| Field | Type | Description |
| --- | --- | --- |
| `MemInfo` | string | Memory usage summary (available / peak / GC count) |
| `MemoryMap` | string | memory map (iOS / Android Native) |
| `FileDescriptor` | string | FD usage (commonly used for iOS OOM) |
## Usage flow (core of this Skill)
1. **Extract the stack**: `Model.Backtrace` is the key field. Print it first for a coarse judgment.
2. **Compare threads** (anr / deadlock): iterate `Model.Threads[]`, find the thread with `IsMain=true` plus other "lock-holding" threads.
3. **Triage by exception type**: the combination of `ExceptionType` + `ExceptionSubtype` + `ExceptionCodes` points to the root-cause family:
- iOS `EXC_BAD_ACCESS / KERN_INVALID_ADDRESS` -> wild pointer / use-after-free
- iOS `EXC_CRASH / SIGABRT` -> explicit `abort()`; read the message in `ExceptionDetail`
- Android `java.lang.NullPointerException` + `ReportType=MOTU_ANDROID_CRASH` -> null dereference
4. **Timeline**: combine `EventLog` (event order) + `Controllers` (page path) + `TriggeredTime` to reconstruct what happened in the last N seconds before the crash.
5. **Code localization**: keep frames in `Backtrace` whose prefix is the APP bundle id / package name; grep by class + method + file name across the Cursor workspace.
6. **Emit the fix**: combine `ExceptionMsg` with the code context; propose a root-cause hypothesis + fix plan (both an immediate hotfix and a longer-term refactor).
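Step 5 (code localization) can be sketched with standard text tools. The helpers below are hypothetical names; they assume the backtrace has already been extracted to plain text (for example with `jq -r '.Model.Backtrace'`, if `jq` is installed) and that Android frames follow the `at pkg.Class.method(File.java:NN)` shape shown earlier.

```bash
# Hypothetical helper: keep only frames mentioning the app's package name or
# bundle id (fixed-string match); reads the backtrace on stdin.
app_frames() {
  grep -F -- "$1"
}

# Hypothetical helper: extract unique source file names from Android-style frames,
# e.g. "at com.example.Foo.bar(Foo.java:45)" yields "Foo.java".
frame_files() {
  sed -n 's/.*(\([^:)]*\)[:)].*/\1/p' | sort -u
}

# Usage, assuming backtrace.txt holds Model.Backtrace:
#   app_frames com.example < backtrace.txt | frame_files
```

The resulting file names are candidates to search for in the workspace; frames from system libraries are filtered out first so that they do not dominate the results.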
## Key JMESPath expressions
```bash
# Stack + exception reason only (most common)
--cli-query "Model.{type:ExceptionType,subType:ExceptionSubtype,codes:ExceptionCodes,msg:ExceptionMsg,stack:Backtrace}"
# Event log + page path only
--cli-query "Model.{eventLog:EventLog,mainLog:MainLog,controllers:Controllers,custom:CustomInfo}"
# Multi-thread state (find blocking threads)
--cli-query "Model.Threads[*].{name:ThreadName,isMain:IsMain,stack:Stack}"
# Grab all base dimensions
--cli-query "Model.{appVer:AppVersion,osVer:OsVersion,brand:Brand,device:DeviceModel,net:Access,utdid:Utdid}"
```
## Common errors
| Symptom | Meaning | Action |
| --- | --- | --- |
| `Error: required flag ... not set` | Missing `--client-time` or `--app-key` | Fill in the required parameter |
| `Model=null`, `Success=true` | Sample cannot be found for this `ClientTime+Uuid` | Check whether `--biz-module` / `--os` match; sample may have passed cold retention |
| `Code: 200, Message: "unknown error"` | Backend error | Report with `RequestId`; in rare cases `--biz-force true` bypasses the cache |
| `Forbidden.NoRAMPermission` | Missing `emasha:ViewError` | See `ram-policies.md` |
## Response size & performance
- The `get-error` JSON response can be **hundreds of KB to several MB** (all threads + business logs). **Do not truncate it with `head` / `tail`**: cutting the output corrupts the JSON.
- The right approach is to dump to disk and then process with `jq`:
```bash
# Write to a uniquely named file, then query exactly that file
OUT=/tmp/emas-error-$(date +%s).json
aliyun emas-appmonitor get-error ... > "$OUT"
jq '.Model | {type:.ExceptionType,msg:.ExceptionMsg,stack:.Backtrace}' "$OUT"
```
- **3~5 samples per Issue is enough** - the main question is whether the stack is stable and whether the exception path is concentrated.
## Tips
- **Always include `--uuid`**: it is the only field that uniquely pinpoints a specific crash; with only `ClientTime`, you can hit multiple samples sharing the same millisecond.
- **Do not batch / parallelize `get-error`**: the server has per-account QPS limits; `dig_issue.sh` already runs serially with sleep.
- **Android obfuscated stacks**: when you see class names like `a.a.a.b.c`, prompt the user for `mapping.txt`; otherwise `Backtrace` cannot be mapped to source.
- **iOS not symbolicated**: `Backtrace` full of hex addresses or image UUIDs (`C768F963-A0CC-3F5C-A1D3-...`) means the user must upload the dSYM for that version to the EMAS console and then pull the latest samples.
FILE:references/get-errors.md
# `GetErrors` - sample list
Aliyun CLI command: `aliyun emas-appmonitor get-errors`
API version: `2019-06-11` · Action: `GetErrors` · Method: `POST` RPC
Queries the **sample list** under an Issue (`DigestHash`). **Each record contains only the "sample pointer" fields** (`ClientTime` + `Uuid` + `Did` + `DigestHash`, plus `Utdid`); full dimensions / stack / logs must be fetched via `get-error` using these pointers.
## Parameters
| CLI flag | Required | Type | Description |
| --- | --- | --- | --- |
| `--app-key` | Yes | int64 | AppKey |
| `--biz-module` | Yes | string | One of the 6 enum values |
| `--page-index` | Yes | int | Page number (CLI `--help` says required) |
| `--page-size` | Yes | int | Page size (same) |
| `--time-range` | Yes | object | **Only `StartTime` + `EndTime` (ms), no `Granularity`** |
| `--os` | Recommended | string | `android` / `iphoneos` / `harmony`. *Strongly recommended* |
| `--digest-hash` | Recommended | string | *Effectively required*: omitting it pulls samples for the whole AppKey, which is useless for troubleshooting |
| `--filter` | No | object | Further filter samples (common: specific version, device model) |
### TimeRange format
```bash
--time-range StartTime=<ms> EndTime=<ms>
```
> **Important: `--time-range` on `get-errors` does NOT accept `Granularity` / `GranularityUnit`**. Mixing those in causes the CLI to exit with `Error: unknown field: Granularity`.
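`StartTime` and `EndTime` are epoch milliseconds. A minimal sketch for deriving a trailing window follows; `time_window_ms` is a hypothetical helper, and its optional second argument (a fixed "now" in epoch seconds) exists only to make results reproducible.

```bash
# Hypothetical helper: prints "<StartTime> <EndTime>" in epoch milliseconds
# for a window of N days ending now (or at the epoch second given as $2).
time_window_ms() {
  days=$1
  now_s=${2:-$(date +%s)}
  end_ms=$(( now_s * 1000 ))
  start_ms=$(( end_ms - days * 86400000 ))
  echo "$start_ms $end_ms"
}

# Example: a 7-day window, split into the START / END variables used in this document.
window=$(time_window_ms 7)
START=${window% *}
END=${window#* }
echo "--time-range StartTime=$START EndTime=$END"
```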
### `--page-size` pitfall (shared)
Verified by this Skill: **`--page-size 1` triggers backend `unknown error`** (see `get-issues.md`). Minimum recommended `--page-size 2`; commonly `--page-size 10`.
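A wrapper script can guard against this pitfall by clamping the requested size. The sketch below uses the lower bound of 2 described above and an upper bound of 10, matching the `page-size <= 10` tip in this document; `clamp_page_size` is a hypothetical helper name.

```bash
# Hypothetical helper: clamps a requested page size into the range 2..10
# (defaults to 10 when no argument is given).
clamp_page_size() {
  n=${1:-10}
  if [ "$n" -lt 2 ]; then n=2; fi
  if [ "$n" -gt 10 ]; then n=10; fi
  echo "$n"
}

# Usage: --page-size "$(clamp_page_size "$REQUESTED_SIZE")"
```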
## `--cli-dry-run` example
```bash
aliyun emas-appmonitor get-errors \
--app-key 12345678 \
--os iphoneos \
--biz-module crash \
--time-range StartTime=1776000000000 EndTime=1776086400000 \
--digest-hash 3JE6F43KCQ1SV \
--page-index 1 --page-size 5 \
--cli-dry-run
```
## Response structure
`Model` fields:
| Field | Type | Description |
| --- | --- | --- |
| `Total` | int64 | Total number of matched samples |
| `PageNum` / `PageSize` / `Pages` | int32 | Pagination |
| `Items` | array | See below (**only 5 fields**) |
### `Model.Items[*]` fields (**only these 5**)
| Field | Type | Description |
| --- | --- | --- |
| `ClientTime` | int64 | Client time of the sample (ms) - required `--client-time` for `get-error` |
| `Did` | string | Device id (plaintext) - optional `--did` for `get-error` |
| `Utdid` | string | UTDID (desensitized device id) |
| `Uuid` | string | Event unique id - optional `--uuid` for `get-error` |
| `DigestHash` | string | DigestHash of the owning Issue - echoes the input |
> **Do NOT** expect `get-errors` to return `AppVersion` / `OsVersion` / `DeviceModel` / `Stack` - those require `get-error`.
## Key JMESPath expressions
```bash
# Extract the client-time + uuid + did triple used by the next get-error call (plus utdid)
--cli-query "Model.Items[*].{ct:ClientTime,uuid:Uuid,did:Did,utdid:Utdid}"
# Uuid list only
--cli-query "Model.Items[*].Uuid"
# Total count
--cli-query "Model.Total"
```
## Typical usage
**1. Pick 3~5 latest samples**
```bash
aliyun emas-appmonitor get-errors \
--app-key 335695934 --os iphoneos --biz-module crash \
--time-range StartTime=$START EndTime=$END \
--digest-hash $DIGEST \
--page-index 1 --page-size 5 \
--cli-query "Model.Items[*].{ct:ClientTime,uuid:Uuid,did:Did}"
```
Pass the returned `ct` + `uuid` + `did` triples to `get-error` to drill into each sample.
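The hand-off to `get-error` can be sketched as a serial loop with a pause between calls, which keeps request rates low. This is an illustration, not the Skill's own script: `fetch_samples`, `ALIYUN_BIN`, and `SLEEP_SECS` are hypothetical names, and the input is assumed to be one `ClientTime Uuid Did` triple per line.

```bash
# Hypothetical helper: fetch_samples APP_KEY OS BIZ_MODULE OUT_DIR
# Reads "ClientTime Uuid Did" lines on stdin and fetches each sample serially.
fetch_samples() {
  app_key=$1
  os=$2
  biz_module=$3
  out_dir=$4
  while read -r ct uuid did; do
    # Skip blank lines.
    [ -n "$ct" ] || continue
    # ALIYUN_BIN lets the binary be swapped (e.g. for a dry run); defaults to aliyun.
    "${ALIYUN_BIN:-aliyun}" emas-appmonitor get-error \
      --app-key "$app_key" --os "$os" --biz-module "$biz_module" \
      --client-time "$ct" --uuid "$uuid" --did "$did" \
      > "$out_dir/error-$ct-$uuid.json"
    # Pause between calls; SLEEP_SECS is an assumption of this sketch.
    sleep "${SLEEP_SECS:-1}"
  done
}

# Usage, assuming jq is installed and triples.json holds the --cli-query output above:
#   jq -r '.[] | "\(.ct) \(.uuid) \(.did)"' triples.json | fetch_samples 12345678 iphoneos crash /tmp
```

Each sample is written to its own file so that large responses can be inspected later without re-querying.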
**2. Narrow to a specific version / device**
```bash
aliyun emas-appmonitor get-errors \
--app-key 335695934 --os iphoneos --biz-module crash \
--time-range StartTime=$START EndTime=$END \
--digest-hash $DIGEST \
--filter '{"Key":"appVersion","Operator":"in","Values":["3.5.0"]}' \
--page-index 1 --page-size 10
```
`Filter.Key` comes from `assets/system-filters/<biz-module>-<platform>.json`. Note that after filtering, `Model.Total` only reflects "samples passing the filter", not the overall Issue ErrorCount.
## Common errors
| Symptom | Meaning | Action |
| --- | --- | --- |
| `Error: unknown field: Granularity` | `--time-range` mixed in Granularity | Remove Granularity / GranularityUnit |
| `Code: 200, Message: "unknown error"` | Common with `--page-size 1` | Increase `--page-size` to >= 2 |
| `Success=true, Items=[], Total=0` | No matches inside the window / filter | Widen `TimeRange`; confirm `biz-module` matches the Issue's module |
| `Forbidden.NoRAMPermission` | Missing `emasha:ViewErrors` | See `ram-policies.md` |
## Tips
- **Keep `--time-range` within 30 days**: backend OLAP retention is limited; 7~14 days is safe.
- **`page-size <= 10`**: Top-N analysis only needs 3~5 representative samples; more slows down overall troubleshooting.
- **Missing any element of the triple degrades `get-error` into a "fuzzy match"**:
- `ClientTime` is required
- `Uuid` is optional but **strongly recommended** (most precise)
- `Did` is optional
- `DigestHash` + `BizModule` + `Os` serve as auxiliary filters
FILE:references/get-issue.md
# `GetIssue` - aggregated error detail
Aliyun CLI command: `aliyun emas-appmonitor get-issue`
API version: `2019-06-11` · Action: `GetIssue` · Method: `POST` RPC
Queries the aggregated statistics of a **single Issue** (identified by `DigestHash`) within a specified time window, including error rate, affected devices, affected version list, growth rate, summary stack and so on. Used to **drill into each** Top N Issue after `get-issues`.
## Parameters
| CLI flag | Required | Type | Description |
| --- | --- | --- | --- |
| `--app-key` | Yes | int64 | AppKey |
| `--os` | Yes | string | `android` / `iphoneos` / `harmony` (note: `os` is strictly required here) |
| `--biz-module` | Yes | string | Same 6 enum values as `get-issues` |
| `--time-range` | Yes | object | Same as `get-issues`; see below |
| `--digest-hash` | Recommended | string | Base36 (13 chars). *Effectively required*: the backend `getIssueDetail` returns `InvalidParameters` when `digestHash` is empty |
| `--filter` | No | object | Further narrow the filter scope (e.g. a specific `appVersion`) |
### TimeRange
Exactly the same as `get-issues` (`style: "flat"`):
```bash
--time-range StartTime=<ms> EndTime=<ms> Granularity=1 GranularityUnit=HOUR
```
**Same pitfall**: avoid `Granularity=60 GranularityUnit=MINUTE`; use `1 HOUR` or `1 DAY` instead.
## Use cases
1. You already have a `DigestHash` (from `get-issues` / an alert / a console link) and want to view the aggregated metrics of that Issue in **any time window** (the metrics in the `get-issues` list only reflect the window of that page).
2. Compare the `ErrorCount` / `ErrorRate` / `ErrorDeviceCount` of the same Issue across two time windows (the `Growth*` fields automatically compare against the previous equal-sized window).
3. Obtain the **affected version list** (`AffectedVersions`) - the list API does not return this field.
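To reproduce the comparison in use case 2 by hand, you can compute the preceding equal-sized window and call `get-issue` once per window. The sketch below assumes the reference window is the one that ends exactly where the current window starts (the source does not spell this out); `prev_window` is a hypothetical helper name, not part of the CLI.

```bash
# prev_window <start_ms> <end_ms>
# Prints the StartTime/EndTime pair of the equal-sized window that ends
# where the given window starts. Hypothetical helper, not part of aliyun-cli.
prev_window() {
  local start_ms="$1" end_ms="$2"
  local span_ms=$(( end_ms - start_ms ))
  printf 'StartTime=%s EndTime=%s\n' "$(( start_ms - span_ms ))" "$start_ms"
}

# Window used in the dry-run example of this document (one day):
prev_window 1776000000000 1776086400000
# prints: StartTime=1775913600000 EndTime=1776000000000
```

Append `Granularity=... GranularityUnit=...` to the printed pair before passing it to `--time-range`.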
## `--cli-dry-run` example
```bash
aliyun emas-appmonitor get-issue \
--app-key 12345678 \
--os iphoneos \
--biz-module crash \
--time-range StartTime=1776000000000 EndTime=1776086400000 Granularity=1 GranularityUnit=DAY \
--digest-hash 3JE6F43KCQ1SV \
--cli-dry-run
```
## Response structure
Top level is the same as `get-issues`: `Success` / `ErrorCode` / `Message` / `RequestId` / `Model`.
`Model` is a single Issue object (not an array); major fields:
| Field | Type | Description |
| --- | --- | --- |
| `DigestHash` | string | Echoes the input |
| `Name` | string | Issue title |
| `Status` | int32 | `1=NEW` / `2=OPEN` / `3=CLOSE` / `4=FIXED` |
| `FirstVersion` | string | First-seen version |
| `AffectedVersions` | array<string> | Affected version list |
| `GmtCreate` | int64 | Issue first-created time (ms) |
| `GmtLatest` | int64 | Most recent occurrence time (ms) |
| `ErrorCount` / `ErrorRate` | int32 / double | Absolute count and ratio within the window |
| `ErrorCountGrowthRate` / `ErrorRateGrowthRate` | double | Growth rate vs the previous equal-sized window |
| `ErrorDeviceCount` / `ErrorDeviceRate` | int32 / double | Affected device count and rate |
| `ErrorDeviceCountGrowthRate` / `ErrorDeviceRateGrowthRate` | double | Growth rate |
| `Stack` | string | Full stack (richer than `get-issues.Items[*].Stack`) |
| `CruxStack` / `KeyLine` | string / int32 | Compressed key stack and key line |
| `Summary` | string | Machine-generated summary |
| `SymbolicStatus` | boolean | Whether already symbolicated (iOS) |
| `Type` / `Reason` | string | Exception type and reason |
| `Tags` | array<string> | User tags |
| `LagCost` | int64 | lag duration (ms) |
| `AllocSizeMax` / `AllocSizePct90` / `Pct70` / `Pct50` / `EventTime` | int64 / string | memory_alloc specific |
| `ErrorLine` / `ErrorColumn` / `ErrorFileName` / `ErrorName` / `ErrorType` | string | h5 / js related (usually not relevant for the 6 modules in this Skill) |
## Key JMESPath expressions
```bash
# Key numbers + stack at once
--cli-query "Model.{status:Status,errorCount:ErrorCount,errorRate:ErrorRate,versions:AffectedVersions,type:Type,reason:Reason,stack:Stack}"
# Only whether it is growing
--cli-query "Model.{ec:ErrorCount,ecGrowth:ErrorCountGrowthRate,er:ErrorRate,erGrowth:ErrorRateGrowthRate}"
# Only the affected versions
--cli-query "Model.AffectedVersions"
```
## Common errors
Mostly the same as `get-issues`; additional notes:
- `InvalidParameters` with `Message` mentioning `digestHash` -> `--digest-hash` was not provided, or it was not a 13-char Base36 string.
- `Model` is `null` (`Success=true` but empty `Model`) -> no data for this `DigestHash` within the window; widen the `TimeRange` or switch `biz-module`.
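A cheap local check can catch the first error before any RPC is made. This is a sketch under one assumption: the source only says "13-char Base36" and shows an uppercase sample, so the helper accepts both letter cases. `is_digest_hash` is a hypothetical name.

```bash
# is_digest_hash <value>
# Succeeds when the value is exactly 13 characters drawn from 0-9, A-Z, a-z.
# Assumption: letter case is not enforced here; the sample hash is uppercase.
is_digest_hash() {
  case "$1" in
    *[!0-9A-Za-z]*) return 1 ;;   # contains a non-Base36 character
  esac
  [ "${#1}" -eq 13 ]
}

if is_digest_hash "3JE6F43KCQ1SV"; then
  echo "hash looks valid"
fi
```

Calling this before `get-issue` turns a backend `InvalidParameters` into an immediate local message.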
## Tips
- **Pair with `get-errors`**: `get-issue` tells you "what the cluster is"; `get-errors` tells you "which concrete samples exist". `dig_issue.sh` in this Skill runs them in this order by default.
- **Spotting fast-growing Issues**: set `TimeRange` to the most recent 24 hours; `get-issue`'s `ErrorCountGrowthRate` then compares that window against the preceding 24 hours, which approximates a "today vs yesterday" growth rate.
- **Symbolication check**: on iOS, when `SymbolicStatus=false` the `Stack` contains many hex addresses and the dSYM must be uploaded. You can still use `CruxStack` as an aid.
FILE:references/get-issues.md
# `GetIssues` - aggregated error list
Aliyun CLI command: `aliyun emas-appmonitor get-issues`
API version: `2019-06-11` · Action: `GetIssues` · Method: `POST` RPC
Queries the **aggregated Issue list** under a given `AppKey + Os + BizModule + TimeRange` combination. Each record represents a distinct error cluster (identified by `DigestHash`) and can be sorted by error count / error rate / affected device count / affected device rate.
## Parameters
| CLI flag | Required | Type | Description |
| --- | --- | --- | --- |
| `--app-key` | Yes | int64 | Application AppKey (numeric; the examples in this Skill use 8- or 9-digit values) |
| `--biz-module` | Yes | string | `crash` / `anr` / `lag` / `custom` / `memory_leak` / `memory_alloc` (the CLI `--help` only lists some of the 6; this Skill has verified that the rest **are forwarded as-is**; see `biz-module-reference.md`) |
| `--time-range` | Yes | object | `StartTime=<ms> EndTime=<ms> Granularity=<int> GranularityUnit=<MINUTE\|HOUR\|DAY>`; see "TimeRange" below |
| `--os` | Recommended | string | `android` / `iphoneos` / `harmony`. *Strongly recommended*; otherwise aggregation buckets from different platforms are mixed |
| `--filter` | No | object | Filter condition as a JSON string; see `filter-reference.md` |
| `--name` | No | string | Fuzzy search by `Name` |
| `--order-by` | No | string | `ErrorCount` / `ErrorRate` / `ErrorDeviceCount` / `ErrorDeviceRate` |
| `--order-type` | No | string | `asc` / `desc`, default `desc` |
| `--page-index` | No | int | Default 1 |
| `--page-size` | No | int | Server default; 10~50 recommended |
| `--status` | No | int | `1=NEW` / `2=OPEN` / `3=CLOSE` / `4=FIXED` |
### TimeRange wire format
In the OpenAPI, `TimeRange` is declared as `style: "flat"` - the CLI **flattens** the sub-fields into `Key=Value` form, separated by spaces, and the whole block is the value of `--time-range`:
```bash
--time-range StartTime=1775000000000 EndTime=1776000000000 Granularity=1 GranularityUnit=HOUR
```
- `StartTime` / `EndTime`: Unix millisecond timestamps (`int64`)
- `Granularity` + `GranularityUnit`: reference window for computing "growth rate"
- `GranularityUnit` values: `MINUTE` / `HOUR` / `DAY`
**Pitfall (observed by this Skill)**: in some prod environments, the combination `Granularity=60 GranularityUnit=MINUTE` can be rejected by the backend (returns `Code:200, Message:"unknown error"`). **Recommended forms**:
- Day level: `Granularity=1 GranularityUnit=DAY`
- Hour level: `Granularity=1 GranularityUnit=HOUR`
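The flat `Key=Value` form is easy to generate from a "last N days" window. A minimal sketch, assuming a `date` that supports `+%s`; `build_time_range` is a hypothetical helper and uses the day-level form recommended above.

```bash
# build_time_range <days> [now_seconds]
# Prints the flat --time-range value for the last <days> days.
# The optional second argument pins "now" (Unix seconds) for reproducibility.
build_time_range() {
  local days="$1"
  local now_s="${2:-$(date +%s)}"
  local end_ms=$(( now_s * 1000 ))
  local start_ms=$(( end_ms - days * 86400000 ))
  printf 'StartTime=%s EndTime=%s Granularity=1 GranularityUnit=DAY\n' "$start_ms" "$end_ms"
}

# Fixed "now" so the output is reproducible:
build_time_range 1 1776086400
# prints: StartTime=1776000000000 EndTime=1776086400000 Granularity=1 GranularityUnit=DAY
```

When passing the result to the CLI, leave the substitution unquoted (`--time-range $(build_time_range 7)`) so the shell splits it into the separate `Key=Value` tokens the flat style expects.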
## `--cli-dry-run` example
Verify the request body format (does not send the real request):
```bash
aliyun emas-appmonitor get-issues \
--app-key 12345678 \
--os iphoneos \
--biz-module crash \
--time-range StartTime=1776000000000 EndTime=1776086400000 Granularity=1 GranularityUnit=HOUR \
--page-index 1 --page-size 10 \
--cli-dry-run
```
The dry-run prints a `Body` JSON in which `AppKey` / `BizModule` / `Os` are top-level fields, `TimeRange` is a nested object, and `Filter` (if provided) is **the entire JSON string** (the value at the `Filter` key is a string, not an object).
## Response structure
Top-level fields (consistent across all emas-appmonitor APIs):
| Field | Type | Description |
| --- | --- | --- |
| `Success` | boolean | Whether the business logic succeeded |
| `ErrorCode` | int32 | 0 means normal; non-zero see `Message` |
| `Message` | string | Error description; `"SUCCESS"` on success |
| `RequestId` | string | POP request ID; include this when reporting issues |
| `Model` | object | Business data, see below |
`Model` fields:
| Field | Type | Description |
| --- | --- | --- |
| `Total` | int64 | Total number of Issues matched (not the count on this page) |
| `PageNum` | int32 | Current page number |
| `PageSize` | int32 | Current page size |
| `Pages` | int32 | Total pages |
| `Items` | array | Each element is one Issue; see the table below |
`Model.Items[*]` fields:
| Field | Type | Description |
| --- | --- | --- |
| `DigestHash` | string | Unique Issue ID (Base36, fixed length 13). Use it as-is when calling `get-issue` / `get-errors` / `get-error` |
| `Name` | string | Issue title (usually truncated from the first few stack frames) |
| `Status` | int32 | `1=NEW` / `2=OPEN` / `3=CLOSE` / `4=FIXED` |
| `FirstVersion` | string | App version when the Issue first appeared |
| `ErrorCount` | int32 | Error count within the time window |
| `ErrorRate` | double | Error rate (errors / app launches) |
| `ErrorDeviceCount` | int32 | Affected device count |
| `ErrorDeviceRate` | double | Affected device rate |
| `AffectedUserCount` | int32 | Affected user count |
| `Stack` | string | Truncated stack (for reporting) |
| `Type` / `Reason` | string | crash: `EXC_BAD_ACCESS...` / `KERN_PROTECTION_FAILURE`; custom: business errorCode |
| `LagCost` | int64 | Only valid for lag; lag duration (ms) |
| `AllocSizePct90 / Pct70 / Pct50 / Max` | int64 | Only valid for memory_alloc; in-cluster allocation size percentiles |
| `EventTime` | string | Latest event time |
| `Tags` | array<string> | Manual tags |
## Key JMESPath expressions
Commonly used inside `--cli-query` (note the TitleCase keys, mapped from `backendName` in `.cspec`):
| Purpose | JMESPath |
| --- | --- |
| Core fields for N items on the current page | `Model.Items[*].{dh:DigestHash,name:Name,ec:ErrorCount,er:ErrorRate,edc:ErrorDeviceCount,status:Status}` |
| Digest hash list only | `Model.Items[*].DigestHash` |
| Just the total | `Model.Total` |
| Filter rows with `ErrorCount > 10` | `Model.Items[?ErrorCount > \`10\`].{dh:DigestHash,ec:ErrorCount}` |
## Typical sorting strategies
**By error rate** (the primary metric, traffic-normalized):
```bash
aliyun emas-appmonitor get-issues --app-key ... --biz-module crash --time-range ... \
--order-by ErrorRate --order-type desc --page-size 5 \
--cli-query "Model.Items[*].{dh:DigestHash,name:Name,er:ErrorRate,ec:ErrorCount}"
```
**By absolute error count** (useful for high-DAU apps to surface high-frequency Issues):
```bash
aliyun emas-appmonitor get-issues --app-key ... --biz-module lag --time-range ... \
--order-by ErrorCount --order-type desc --page-size 5
```
## Common business errors
| HTTP | `ErrorCode` | Meaning | Action |
| --- | --- | --- | --- |
| 400 | `InvalidAppId` | AppKey does not exist or cannot be parsed | Confirm the AppKey; do not use leading zeros |
| 400 | `InvalidParameters` | Some parameter is invalid (commonly: invalid `Granularity` combination, `TimeRange.StartTime > EndTime`) | Check timestamp unit and granularity combination |
| 400 | `InvalidRequest` | Request structure is invalid | Verify body field names |
| 403 | `Forbidden.NoRAMPermission` | RAM lacks `emasha:ViewIssues` | See `ram-policies.md` |
| 403 | `Forbidden.NoPermission` | Account does not own this AppKey | Find the AppKey owner account |
| 406 | `UnexpectedAppStatus` | App status is abnormal (overdue / disabled, etc.) | Activate the corresponding sub-service in the console (crash / apm / tlog) |
| 500 | `InternalError` | Backend error | Retry; report with `RequestId` |
## Pitfalls & tips
- **`--page-size 1` triggers `unknown error`**: this Skill has verified that the backend returns `Code: 200, Message: "unknown error"` for certain biz_modules when `PageSize=1`. **The minimum recommended is `--page-size 2`**; commonly use `--page-size 10`.
- **The list API does not filter by `DigestHash`**: to fetch one known Issue by hash, use `get-issue`.
- **Top N merging**: no single API returns all 6 `BizModule`s for the same `AppKey + Os`. This Skill's `list_top_issues.sh` invokes `get-issues` 6 times in parallel and merges / sorts the results with `jq`.
- **The denominator of `ErrorRate`** is the number of launches that day (computed by backend OLAP); low-traffic apps amplify noise. Always look at `ErrorCount` alongside.
- **`Status` field**: `CLOSE` / `FIXED` also appear in the response; to see only active issues, pass `--status 1` or `--status 2`, or filter client-side with JMESPath.
- **Pagination**: for a full scan, use `--pager` to let the CLI merge pages automatically; for big AppKeys, sample with `--page-size 10` first.
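The `--page-size 1` pitfall above is easy to guard against in wrapper scripts. A minimal sketch: `safe_page_size` is a hypothetical helper, and the default of 10 simply mirrors the commonly used value mentioned above.

```bash
# safe_page_size [requested]
# Prints a page size that is never below 2; defaults to 10 when omitted.
safe_page_size() {
  local n="${1:-10}"
  if [ "$n" -lt 2 ]; then
    n=2
  fi
  printf '%s\n' "$n"
}

safe_page_size 1    # prints: 2
safe_page_size 5    # prints: 5
```

Items beyond the first can still be dropped client-side, for example with `--cli-query 'Model.Items[0]'`.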
FILE:references/ram-policies.md
# RAM permissions and failure handling
This Skill uses the `aliyun-cli-emas-appmonitor` plugin to call the EMAS AppMonitor OpenAPI (`regionId=cn-shanghai`, `product=emas-appmonitor`, `version=2019-06-11`). Below is the **minimum set of RAM actions required to call the 4 core APIs**.
> **RamCode**: EMAS AppMonitor's RAM code is `emasha` (not `emas-appmonitor`). The action prefix for custom policies is `emasha:`. EMAS AppMonitor does **not** support resource-level authorization, so `Resource` must be `"*"`.
## Required actions
| CLI subcommand | RAM action | Purpose |
| --- | --- | --- |
| `aliyun emas-appmonitor get-issues` | `emasha:ViewIssues` | Fetch aggregated error list |
| `aliyun emas-appmonitor get-issue` | `emasha:ViewIssue` | Fetch details of a single aggregated error |
| `aliyun emas-appmonitor get-errors` | `emasha:ViewErrors` | Fetch sample list under an aggregated error |
| `aliyun emas-appmonitor get-error` | `emasha:ViewError` | Fetch full details of a single sample |
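When a call fails with a permission error, a script can name the missing action directly from the table above. A small sketch; `ram_action_for` is a hypothetical helper name.

```bash
# ram_action_for <subcommand>
# Maps an emas-appmonitor subcommand to the RAM action from the table above.
ram_action_for() {
  case "$1" in
    get-issues) echo "emasha:ViewIssues" ;;
    get-issue)  echo "emasha:ViewIssue" ;;
    get-errors) echo "emasha:ViewErrors" ;;
    get-error)  echo "emasha:ViewError" ;;
    *)          return 1 ;;   # not one of the 4 core APIs
  esac
}

ram_action_for get-errors
# prints: emasha:ViewErrors
```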
## Recommended policy
Read-only permission policy. `Resource` is fixed to `"*"` because EMAS AppMonitor does not support resource-level authorization:
```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emasha:ViewIssues",
        "emasha:ViewIssue",
        "emasha:ViewErrors",
        "emasha:ViewError"
      ],
      "Resource": "*"
    }
  ]
}
```
Or attach the official system policy directly:
| System policy | Scope |
| --- | --- |
| `AliyunEMASAppMonitorReadOnlyAccess` | `emasha:View*` + `Resource:"*"` (sufficient for this Skill) |
| `AliyunEMASAppMonitorFullAccess` | `emasha:*` + `Resource:"*"` (grants write access too — not needed here) |
## Permission Failure Handling
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this Skill
> 2. Invoke the `ram-permission-diagnose` skill to guide the user through requesting the needed permissions
> 3. Pause and wait for the user to confirm the required permissions have been granted before continuing
## Common permission errors
| Symptom | Explanation | Action |
| --- | --- | --- |
| `Code: NoPermission` | The current profile lacks the action | Attach the policy above and retry |
| `Code: Forbidden.RAM` | The RAM sub-user is disabled or the policy has not taken effect | In the RAM console, confirm the policy is attached and the user is enabled |
| HTTP 403 + `SignatureDoesNotMatch` | Wrong AK/SK | Check the profile with `aliyun configure list`; re-run `aliyun configure` if needed (do NOT expose AK/SK in the conversation) |
| `InvalidAccessKeyId.NotFound` | AK is deleted / disabled | Issue a new AK in the RAM console |
## Related resources
- EMAS console: https://emas.console.aliyun.com/
- RAM console: https://ram.console.aliyun.com/
- Official read-only system policy: https://help.aliyun.com/zh/ram/developer-reference/aliyunemasappmonitorreadonlyaccess
- Official full-access system policy: https://help.aliyun.com/zh/ram/developer-reference/aliyunemasappmonitorfullaccess
- AppMonitor RamCode (action prefix for custom policies): `emasha`
FILE:references/related-commands.md
# CLI cheat sheet for this Skill
All commands assume `aliyun-cli 3.3.3+` with the `aliyun-cli-emas-appmonitor` plugin (`aliyun plugin install --names emas-appmonitor`).
## 1. General environment setup
```bash
# Version
aliyun version
# View / switch profile (do NOT echo AK/SK)
aliyun configure list
aliyun configure set --current <profile>
# Enable auto plugin install
aliyun configure set --auto-plugin-install true
# Plugins
aliyun plugin update
aliyun plugin install --names emas-appmonitor
aliyun plugin list | grep emas-appmonitor
```
## 2. AI-mode lifecycle
```bash
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-emas-apm-query"
# ... business commands ...
aliyun configure ai-mode disable
```
## 3. The 4 core APIs
### 3.1 `get-issues` - fetch aggregated error list
```bash
NOW_MS=$(($(date +%s) * 1000)); START_MS=$(($NOW_MS - 7*86400000))
aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ_MODULE" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
[--filter '<json>'] \
[--name '<keyword>'] \
[--order-by ErrorCount|ErrorRate|ErrorDeviceCount|ErrorDeviceRate] \
[--order-type asc|desc] \
[--status 1|2|3|4] \
[--page-index <int>] \
[--page-size <int>]
```
### 3.2 `get-issue` - fetch aggregated error details
```bash
aliyun emas-appmonitor get-issue \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ_MODULE" \
--digest-hash "$HASH" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
[--filter '<json>']
```
### 3.3 `get-errors` - sample list
```bash
aliyun emas-appmonitor get-errors \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ_MODULE" \
--digest-hash "$HASH" \
--time-range StartTime=$START_MS EndTime=$NOW_MS \
--page-index 1 --page-size 20 \
[--filter '<json>'] \
[--utdid '<utdid>']
```
`--time-range` here accepts only `StartTime` + `EndTime`, not `Granularity`.
### 3.4 `get-error` - sample details
```bash
aliyun emas-appmonitor get-error \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ_MODULE" \
--digest-hash "$HASH" \
--client-time "$CLIENT_TIME" --uuid "$UUID" --did "$DID" \
[--biz-force false]
```
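The triple `(client-time, uuid, did)` comes from `get-errors.Items[*]`. `--client-time` is required; `--uuid` and `--did` are optional, but omitting either degrades `get-error` into a fuzzy match, so pass all three whenever they are available.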
## 4. Parallel scan over 6 bizModules (entry point when no DigestHash is given)
```bash
NOW_MS=$(($(date +%s) * 1000)); START_MS=$(($NOW_MS - 7*86400000))
for MOD in crash anr lag custom memory_leak memory_alloc; do
aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$MOD" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--filter '{"Operator":"and","SubFilters":["{\"operator\":\"in\",\"key\":\"issueStatus\",\"values\":[1,2,3,4]}"]}' \
--order-by ErrorCount --order-type desc --page-index 1 --page-size 5 \
--cli-query 'Model.Items[*].{Module:`'"$MOD"'`,DigestHash:DigestHash,Type:Type,ErrorCount:ErrorCount,ErrorDeviceCount:ErrorDeviceCount,FirstVersion:FirstVersion}' > "/tmp/top_${MOD}.json"
done
# Merge, sort, take Top 5
jq -s 'flatten | sort_by(-(.ErrorCount // 0)) | .[0:5]' /tmp/top_*.json
```
Skip irrelevant combinations to reduce RPC: ANR only applies to `android`; `memory_leak` / `memory_alloc` do not apply to `harmony`.
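The skip rule can be encoded once and reused by the loop. A sketch that follows only the two exclusions stated above; `modules_for_os` is a hypothetical helper, and any other per-platform restriction is outside what this document states.

```bash
# modules_for_os <os>
# Prints the bizModules worth scanning for a platform, per the rule above:
# anr is android-only; memory_leak / memory_alloc are skipped on harmony.
modules_for_os() {
  case "$1" in
    android)  echo "crash anr lag custom memory_leak memory_alloc" ;;
    iphoneos) echo "crash lag custom memory_leak memory_alloc" ;;
    harmony)  echo "crash lag custom" ;;
    *)        return 1 ;;   # unknown platform
  esac
}

modules_for_os harmony
# prints: crash lag custom
```

In the scan loop, `for MOD in $(modules_for_os "$OS"); do ... done` replaces the hard-coded module list.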
## 5. Chained calls
```bash
# Step 1: Top 1 DigestHash (--cli-query's scalar output is a JSON string; use jq -r to strip quotes)
DH=$(aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$MOD" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--order-by ErrorCount --order-type desc --page-index 1 --page-size 2 \
--cli-query 'Model.Items[0].DigestHash' | jq -r .)
# Step 2: Aggregated details of this Issue
aliyun emas-appmonitor get-issue \
--app-key "$APP_KEY" --os "$OS" --biz-module "$MOD" \
--digest-hash "$DH" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day
# Step 3: Latest sample triple (one CLI call; the three fields are then extracted with jq)
SAMPLE=$(aliyun emas-appmonitor get-errors \
--app-key "$APP_KEY" --os "$OS" --biz-module "$MOD" \
--digest-hash "$DH" \
--time-range StartTime=$START_MS EndTime=$NOW_MS \
--page-index 1 --page-size 2 \
--cli-query 'Model.Items[0].{CT:ClientTime,UUID:Uuid,DID:Did}')
CT=$(echo "$SAMPLE" | jq -r .CT)
UUID=$(echo "$SAMPLE" | jq -r .UUID)
DID=$(echo "$SAMPLE" | jq -r .DID)
# Step 4: Full sample details
aliyun emas-appmonitor get-error \
--app-key "$APP_KEY" --os "$OS" --biz-module "$MOD" \
--digest-hash "$DH" \
--client-time "$CT" --uuid "$UUID" --did "$DID" > /tmp/sample.json
```
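The chain above assumes every step returns data. When a list is empty, a query such as `Model.Items[0].DigestHash` has nothing to select, and the captured variable may end up empty or as the literal string `null`. A minimal guard, with `require_value` as a hypothetical helper name:

```bash
# require_value <value>
# Succeeds only when the value is non-empty and not the literal string "null".
require_value() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

DH="null"   # what an empty result may look like after `jq -r .`
if ! require_value "$DH"; then
  echo "no Issue found in this window; widen the time range or switch biz-module"
fi
```

Placing the same check after Step 1 and Step 3 stops the chain before `get-issue` or `get-error` is called with unusable arguments.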
## 6. Debug helpers
```bash
# Do not send the request; inspect the CLI-serialized request parameters
aliyun emas-appmonitor get-issues ... --cli-dry-run
# DEBUG log (includes HTTP body)
aliyun emas-appmonitor get-issues ... --log-level DEBUG
# Specify a custom endpoint (private network / Apsara Stack)
aliyun emas-appmonitor get-issues ... --endpoint emas-appmonitor.cn-shanghai.aliyuncs.com
# Select specific fields only
aliyun emas-appmonitor get-issue ... --cli-query 'Model.{Hash:DigestHash,Type:Type,Stack:Stack}'
```
## 7. CLI query templates
| Scenario | JMESPath |
| --- | --- |
| Top Hash only | `Model.Items[0].DigestHash` |
| Flatten Top N | `Model.Items[*].{Hash:DigestHash,Count:ErrorCount}` |
| Issue overview | `Model.{Hash:DigestHash,Type:Type,AffectedVersions:AffectedVersions,Stack:Stack}` |
| Sample triple | `Model.Items[*].{CT:ClientTime,UUID:Uuid,DID:Did}` |
| Sample dimensions | `Model.{App:AppVersion,Os:OsVersion,Brand:Brand,Model:DeviceModel,Country:Country}` |
## 8. Arguments to avoid / use with caution
| Argument | Reason |
| --- | --- |
| `--biz-force true` (on `get-error`) | Bypasses the cache and forces a refresh, adding load on the backend; use only when the cache is stale. |
| Blind `--pager` | Sample lists are usually large; pulling them fully can take minutes, not suitable inside an Agent loop. |
| `eq` / `neq` / `not_in` inside `--filter` | Observed not to work; use `in` or `or` instead. See [filter-reference.md](filter-reference.md). |
| `--region` | EMAS AppMonitor is only in `cn-shanghai` today; explicit override is usually unnecessary. |
FILE:references/troubleshoot-workflow.md
# Troubleshooting workflow: from sample to code fix
This document describes how the Skill, after obtaining the sample details from `get-error`, locates the issue inside the **APP source code in the user's current working directory (CWD)** and proposes a fix. The entire flow relies only on CLI output plus Cursor's built-in code search - no external data source is required.
## 1. Overview
```mermaid
flowchart TD
S[Got the get-error Model] --> M{CWD has APP source?}
M -- no --> Nocode[Emit CLI-only diagnostic report; ask user to switch directory]
M -- yes --> P[Detect platform: Android/iOS/Harmony]
P --> B{biz-module branch}
B -- crash/anr --> Stack[Main thread Backtrace + Threads]
B -- lag --> LagStack[Backtrace + EventLog]
B -- custom --> CustomStack[ExceptionMsg/Digest + Backtrace]
B -- memory_* --> Mem[MemInfo/MemoryMap + Backtrace]
Stack --> Crux[Extract user-code frames]
LagStack --> Crux
CustomStack --> Crux
Mem --> Crux
Crux --> Map[Code search: file/class/method exact match]
Map --> Ctx[Expand context, compare EventLog/Controllers]
Ctx --> Root[Root-cause hypothesis]
Root --> Fix[Fix proposal: hotfix + long-term refactor]
Fix --> Diff[Emit minimum diff]
Nocode --> endNode[Done]
Diff --> endNode
```
## 2. Detect whether the CWD contains APP source
**One cross-platform check** (any single hit means "has source"):
| Platform | Signature files |
| --- | --- |
| Android | `build.gradle` / `build.gradle.kts` / `settings.gradle` / `AndroidManifest.xml` / `gradle.properties` |
| iOS | `*.xcodeproj` / `*.xcworkspace` / `Podfile` / `Podfile.lock` / `Package.swift` |
| Harmony | `build-profile.json5` / `oh-package.json5` / `hvigor-config.json5` |
| Generic / cross-platform | `package.json` (React Native / Taro) or `pubspec.yaml` (Flutter) + `android/` or `ios/` |
Detection snippet:
```bash
check_has_app_code() {
local dir="${1:-.}"   # directory to inspect; defaults to the CWD
local hit=$(ls "$dir" 2>/dev/null | grep -Ec 'build\.gradle|settings\.gradle|Podfile|\.xcodeproj|\.xcworkspace|build-profile\.json5|package\.json')
[ "$hit" -gt 0 ]
}
```
When `hit=0`, take the "CLI-only diagnostic report" branch and tell the user:
> The current directory has no APP source. Switch to the APP repository root and the Skill will map stack frames to file:line and propose fixes.
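The next step in the flowchart, detecting the platform, can reuse the signature files from the table above. This is a sketch, not the Skill's own implementation: `any_exists` and `detect_platform` are hypothetical helpers, and the check order (Harmony, then Android, then iOS) is an arbitrary assumption for repositories that contain more than one signature.

```bash
# any_exists <path>...
# Succeeds if at least one argument names an existing file or directory.
any_exists() {
  local p
  for p in "$@"; do
    [ -e "$p" ] && return 0
  done
  return 1
}

# detect_platform [dir]
# Prints harmony, android, ios, or unknown based on the signature-file table.
detect_platform() {
  local dir="${1:-.}"
  if any_exists "$dir/build-profile.json5" "$dir/oh-package.json5" "$dir/hvigor-config.json5"; then
    echo harmony
  elif any_exists "$dir/build.gradle" "$dir/build.gradle.kts" "$dir/settings.gradle" \
                  "$dir/AndroidManifest.xml" "$dir/gradle.properties"; then
    echo android
  elif any_exists "$dir"/*.xcodeproj "$dir"/*.xcworkspace \
                  "$dir/Podfile" "$dir/Podfile.lock" "$dir/Package.swift"; then
    echo ios
  else
    echo unknown
  fi
}

detect_platform .
```

Unmatched globs such as `*.xcodeproj` stay literal in bash's default configuration, so the `-e` test simply fails for them.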
## 3. Key fields (`get-error` response)
All field names used in this workflow come from `get-error.Model`; the most relevant ones are:
| Field | Purpose |
| --- | --- |
| `Backtrace` | **Crash thread stack**, format varies per platform; nearly all code mapping starts here |
| `Threads` | All threads (mandatory for crash / anr) |
| `ExceptionType` / `ExceptionSubtype` / `ExceptionCodes` / `ExceptionMsg` / `ExceptionDetail` | Exception description |
| `EventLog` | Time-ordered event stream (page navigation, lifecycle, custom events) |
| `MainLog` | Main thread log (tail of Android logcat / iOS main thread log) |
| `Controllers` | Page path (iOS VC stack / Android Activity stack) |
| `CustomInfo` / `AdditionalCustomInfo` | Key-values injected by the developer |
| `MemInfo` / `MemoryMap` / `FileDescriptor` | Memory / FD snapshot (memory_* / OOM crash) |
## 4. Rules for mapping stack -> source file
### 4.1 Android (Java / Kotlin / JVM frames)
Frame format: `at <package>.<Class>.<method>(<FileName>.java:<line>)`
**Mapping steps**:
1. Read `Model.Backtrace` and scan frame by frame.
2. **Skip system frames**: anything starting with `java.`, `kotlin.`, `android.`, `androidx.`, `dalvik.`, `sun.`, `com.google.`.
3. **Locate user frames**: keep frames starting with the APP `applicationId` (read it from `build.gradle` `applicationId` or the `package` attribute in `AndroidManifest.xml`).
4. **Regex-match the class name**: `at <pkg>.<Class>.<method>(File.java:<L>)`.
5. **Search in the repo**:
   - Prefer mapping the **fully qualified class name** to a file path: `com.example.foo.Bar` -> parallel Grep/Glob for `**/Bar.java|kt`.
   - Once found, jump to `line` +/- 10 and verify the signature matches the frame.
6. **Kotlin specifics**:
   - `$lambda$<n>` / `Companion` / `$default` suffixes come from the Kotlin compiler; they can map back to source but the line number may be off by 1~3 lines - find the matching lambda signature nearby.
   - `<init>` is a constructor.
7. **Obfuscation**: when class names look like `a.a.a.b.c`, ask the user for `mapping.txt` to deobfuscate; otherwise only offer a "post-obfuscation stack summary + device/version" level conclusion.
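Steps 2 to 4 above can be sketched with `case` and `sed`. The system prefixes are the ones listed in step 2; the regular expression is an assumption about the common `at pkg.Class.method(File.ext:line)` shape and deliberately ignores frames such as `(Native Method)`. `is_system_frame` and `parse_android_frame` are hypothetical helper names.

```bash
# is_system_frame <frame>
# Succeeds when the frame starts with one of the system prefixes from step 2.
is_system_frame() {
  case "$1" in
    *"at java."*|*"at kotlin."*|*"at android."*|*"at androidx."*|\
    *"at dalvik."*|*"at sun."*|*"at com.google."*) return 0 ;;
    *) return 1 ;;
  esac
}

# parse_android_frame <frame>
# Prints "<fully.qualified.Class> <method> <File> <line>", or nothing if the
# frame does not have the "(File.ext:line)" form.
parse_android_frame() {
  printf '%s\n' "$1" | sed -nE \
    's/^[[:space:]]*at ([A-Za-z0-9_.$]+)\.([A-Za-z0-9_$<>]+)\(([^:()]+):([0-9]+)\)[[:space:]]*$/\1 \2 \3 \4/p'
}

frame='    at com.example.foo.Bar.load(Bar.java:42)'
if ! is_system_frame "$frame"; then
  parse_android_frame "$frame"
fi
# prints: com.example.foo.Bar load Bar.java 42
```

The last path segment of the class (`Bar`) is what step 5 feeds into the `**/Bar.java|kt` search.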
### 4.2 iOS (Objective-C / Swift frames)
`Backtrace` typically has two formats:
- Pre-symbolicated: `0 apm_ios_demo 0x1030e93d0 0x103098000 + 332752`
- Post-symbolicated: `0 apm_ios_demo EAPMDemoTriggerStackOverflow(unsigned long)` or `0 apm_ios_demo -[MyViewController tableView:didSelectRowAtIndexPath:]`
**Is it symbolicated?** If `Backtrace` contains frames like `apm_ios_demo\s+-\[ClassName method]` or `apm_ios_demo\s+C\+\+Symbol`, yes; if it is mostly `0x...` or UUID strings (`2F32D384-4637-3018-...`), no.
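The per-frame part of this check can be approximated with one regular expression. A sketch that assumes the pre-symbolicated layout shown above (`<address> <load address> + <offset>` at the end of the line); `is_unsymbolicated_frame` is a hypothetical helper name.

```bash
# is_unsymbolicated_frame <frame>
# Succeeds when the frame ends with "0x<addr> 0x<base> + <offset>",
# i.e. the pre-symbolicated layout.
is_unsymbolicated_frame() {
  printf '%s\n' "$1" | grep -Eq \
    '0x[0-9a-fA-F]+[[:space:]]+0x[0-9a-fA-F]+[[:space:]]*\+[[:space:]]*[0-9]+[[:space:]]*$'
}

if is_unsymbolicated_frame '0 apm_ios_demo 0x1030e93d0 0x103098000 + 332752'; then
  echo "frame needs a dSYM"
fi
```

Applying this to every line of `Backtrace` and counting the hits gives a rough "mostly `0x...`" signal.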
**Mapping steps**:
1. Not symbolicated -> tell the user to "upload the dSYM for this version to the EMAS console, wait 5-10 minutes, then use the same `get-errors` to pull the latest samples". This round only provides device / version / type / count comparisons.
2. Symbolicated -> scan frame by frame:
   - **Skip system frames**: `UIKitCore`, `Foundation`, `libobjc.A.dylib`, `libSystem.B.dylib`, `CoreFoundation`, `QuartzCore`, `libdispatch.dylib`, etc.
   - Keep frames whose image name equals the APP executable name (usually the target name in `Podfile`, e.g. `apm_ios_demo`).
   - Extract the **class name + method name** pair.
3. Grep/Glob for the file:
   - OC: `<ClassName>.h` + `<ClassName>.m` (read both to confirm header/impl split).
   - Swift: `<ClassName>.swift` (Swift's type system makes duplicated class names rare).
4. Common combinations of `ExceptionType` / `ExceptionCodes`:
   - `EXC_BAD_ACCESS(KERN_PROTECTION_FAILURE)` -> wild pointer / write to read-only segment / stack overflow (the `EAPMDemoTriggerStackOverflow` recursion is this family)
   - `EXC_BAD_ACCESS(KERN_INVALID_ADDRESS)` -> use-after-free / KVO not unregistered
   - `EXC_CRASH + SIGABRT` -> explicit `abort()`, read the message in `ExceptionDetail` (often from `NSAssert` / `fatalError`)
   - `NSInvalidArgumentException` + `unrecognized selector` -> dynamic dispatch failure (wrong selector / target already released)
### 4.3 Harmony (ArkTS)
Frame format: `at <module>.<ClassName>.<method> (<file>.ts:<line>:<column>)`
- Skip system modules like `@ohos.*` / `@system.*`.
- The `<module>` in user frames is the `name` field of `oh-package.json5`.
- File paths are organized under `src/main/ets/**`; grepping the class name hits directly.
### 4.4 React Native / Flutter / cross-platform
- **JS stack**: `<function>@<bundlePath>:<line>:<column>` - requires sourcemap to resolve; in this case the Skill only groups by device dimensions and does not do code-level localization.
- **Dart stack**: `#0 MyWidget.build (package:app/pages/home.dart:42:7)` - parse `package:app/...` into `lib/pages/home.dart`.
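The Dart rewrite rule can be sketched with a single `sed` substitution. It assumes the frame belongs to the app's own package, so that `package:<name>/` maps to `lib/`; frames from dependencies or from `dart:` libraries are not handled. `dart_frame_to_path` is a hypothetical helper name.

```bash
# dart_frame_to_path <frame>
# Prints "lib/<path>:<line>" for a "package:" frame, or nothing otherwise.
# Assumption: the package is the app itself, so its sources live under lib/.
dart_frame_to_path() {
  printf '%s\n' "$1" | sed -nE \
    's#.*\(package:[^/]+/([^:)]+):([0-9]+)(:[0-9]+)?\).*#lib/\1:\2#p'
}

dart_frame_to_path '#0 MyWidget.build (package:app/pages/home.dart:42:7)'
# prints: lib/pages/home.dart:42
```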
## 5. Combine `EventLog` / `Controllers` / `Threads`
After obtaining the sample, **do not look only at the crashing thread**. Most crashes / anrs have observable "precursor signals":
1. **`EventLog`** (time-ordered event stream): scan the last 20~50 lines for:
   - The most recent page navigation / `PageAppear` / `ViewController transition`
   - The most recent network request URL + status code
   - The most recent major lifecycle transition (`AppDidBecomeActive` / `ApplicationOnStop`)
2. **`Controllers`** (page stack snapshot): the page path at crash time; combined with `EventLog` it tells you "which page triggered which operation".
3. **`MainLog`**: main-thread logs reported via tlog (tail of Android `logcat`, `NSLog` on iOS main thread).
   - Prioritize lines with `level=ERROR`.
   - Lines carrying a business serial number (`requestId` / `traceId`) can be cross-referenced with app-side instrumentation.
4. **`CustomInfo`** / **`AdditionalCustomInfo`**: key-values the developer manually injected; often contain "user-state data" (logged-in, channel, ABTest bucket, etc.).
5. **`Threads`** (anr / crash specific):
   - anr: find the "lock-holding" thread (stack stopped at `synchronized` / `ReentrantLock.lock` / `pthread_mutex_lock`)
   - crash: check whether background threads other than the main one are writing the same resource
Stitching precursor signals into a timeline alongside the crashing frame localizes the root cause faster than reading `Backtrace` in isolation.
## 6. Output format
The Skill's final report follows this structure:
```markdown
# Troubleshooting report for APP <name>
## 1. Issue list (Top N)
| # | DigestHash | biz_module | Status | ErrorRate | ErrorCount | Representative symptom |
| ...
## 2. Detailed analysis
### 2.1 <DigestHash-1> <Title>
**Overview**: ExceptionType / ExceptionMsg / FirstVersion / AffectedVersions
**Trigger path**: EventLog + Controllers timeline
**Code localization**:
- `src/.../Foo.java:123` - `foo()` missing null check
- `src/.../Bar.swift:45` - synchronous IO on main thread
**Root-cause hypothesis**: ...
**Fix proposal**:
- Immediate hotfix: add a null guard at Foo.java:123 (see diff below)
- Long-term refactor: ...
**Diff**:
\`\`\`diff
--- a/src/.../Foo.java
+++ b/src/.../Foo.java
@@ -120,7 +120,10 @@
public void foo() {
- user.getProfile().load();
+ if (user == null || user.getProfile() == null) {
+ return;
+ }
+ user.getProfile().load();
}
\`\`\`
...
```
## 7. Edge cases
- **Empty `Backtrace`**: memory_leak / memory_alloc may have no stack; fall back to `MemoryMap` / `MemInfo` + `Digest` (stack summary) and offer a reference-chain analysis instead.
- **Obfuscated stack (Android R8/ProGuard)**: class names like `a.a.a.b.c` - ask the user for `mapping.txt` and re-analyze.
- **Native stack not symbolicated**: mostly `0x...` addresses - ask the user to upload the `.so` symbol file to the EMAS console, or do offline symbolication locally with `addr2line`.
- **Truncated sample**: when `Backtrace` / `EventLog` exceeds 64KB, the backend truncates it (trailing "...(truncated)"). Ask the user to pull a newer sample or to increment `--page-index` to the next page and pick a smaller sample.
## 8. Minimum-diff convention
- Keep a single-file diff **<= 20 lines**, do not drag surrounding code in.
- 3 lines of context before and after, so reviewers can align visually.
- Do NOT put `// TODO` / `// FIXME` in the diff; either fix it or state "to be discussed".
- Precede each diff with a **one-sentence why** (not what):
> "Add a null guard because `getProfile()` returns null when the user is not logged in, which causes the `NullPointerException` at frame 3 of the crash stack."
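Two of these conventions are easy to check automatically. The following is a minimal sketch, not part of the Skill: `check_min_diff` is a hypothetical helper that reads a unified diff on stdin, and it counts every line of the diff (headers included) against the 20-line budget, which is only one possible reading of the rule.

```bash
#!/usr/bin/env bash
# check_min_diff: hypothetical helper. Reads a unified diff on stdin and checks
#   1) the diff is at most 20 lines long (headers included; an assumption), and
#   2) no added line introduces a // TODO or // FIXME marker.
check_min_diff() {
  local diff_text total
  diff_text="$(cat)"
  total=$(( $(printf '%s\n' "$diff_text" | wc -l) ))
  if (( total > 20 )); then
    echo "diff has $total lines (> 20)"
    return 1
  fi
  # Added lines start with a single '+'; '+++' is the file header.
  if printf '%s\n' "$diff_text" | grep -E '^\+([^+]|$)' \
      | grep -Eq '//[[:space:]]*(TODO|FIXME)'; then
    echo "diff adds a TODO/FIXME marker"
    return 1
  fi
  echo "ok"
}
```

A typical call would be `git diff -- path/to/File.java | check_min_diff`; the one-sentence "why" still has to be written by hand.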
## 9. Interop with CLI errors
If `get-error` itself returns no data (`Success=true` but `Model=null`):
- Ask the user to widen `get-errors`' `--time-range` (7 days first, then 30) and obtain a new `ClientTime+Uuid`
- Try a different `Uuid` (pick a newer one from the sample list)
- Add `--biz-force true` to bypass the cache and retry (rarely effective)
- If still impossible, fall back to a report based purely on `get-issue` aggregated info, explicitly marking "no per-sample stack"
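Widening the window to 7 days and then 30 comes down to millisecond arithmetic on the `StartTime`/`EndTime` pair. A minimal sketch, assuming the same Unix-millisecond convention used elsewhere in this Skill; `window_start_ms` is a hypothetical helper, not a CLI feature.

```bash
#!/usr/bin/env bash
# window_start_ms: hypothetical helper. Prints the start of a window that ends
# at END_MS and spans DAYS days, in Unix milliseconds.
window_start_ms() {
  local end_ms="$1" days="$2"
  echo $(( end_ms - days * 24 * 3600 * 1000 ))
}

NOW_MS=$(( $(date +%s) * 1000 ))
for DAYS in 7 30; do
  START_MS=$(window_start_ms "$NOW_MS" "$DAYS")
  echo "retry get-errors with --time-range StartTime=$START_MS EndTime=$NOW_MS"
done
```

The loop only prints the parameters to try; whether to stop after the 7-day attempt depends on whether `get-errors` then returns a usable `ClientTime+Uuid`.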
**This flow never directs the user to query a server-side data source** - it uses only CLI output.
FILE:references/verification-method.md
# Success verification method
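This document gives six CLI verification steps that **can be copied and run as-is** to confirm that this Skill works end to end.
Every command calls only the four read-only APIs of `aliyun emas-appmonitor`; none of them performs a write.
Before you start, confirm that:
- `aliyun version` reports a version `>= 3.3.3`
- `aliyun plugin list | grep emas-appmonitor` matches one line
- `aliyun configure list` shows a current profile (non-empty Mode and RegionId)
- You have a real `AppKey` and its matching `--os` (for example `335581386 / android`)
The examples below all use these variables: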
```bash
APP_KEY=335581386
OS=android
BIZ=crash
NOW_MS=$(( $(date +%s) * 1000 ))
START_MS=$(( NOW_MS - 24*3600*1000 )) # last 24 hours
```
> If any step fails, first check [`cli-installation-guide.md`](cli-installation-guide.md) (CLI / plugin / credentials) and [`ram-policies.md`](ram-policies.md) (RAM permissions).
---
## Step 1: Reachable (`get-issues` dry-run)
**Purpose**: confirm that the CLI serializes the request correctly and that the network is reachable.
```bash
aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--order-by ErrorCount --order-type desc \
--page-index 1 --page-size 5 \
--cli-dry-run
```
**Pass criteria**:
- ✅ Pass: stdout prints the serialized HTTP request body, with the `AppKey` / `Os` / `BizModule` / `TimeRange` fields visible
- ❌ Fail: `required flags missing` → add the parameters named in the error; `product not exists` → run `aliyun plugin install --names emas-appmonitor`
---
## Step 2: Non-empty (at least one bizModule has data)
**Purpose**: confirm that the AppKey under this account has at least one aggregated Issue inside the selected time window.
```bash
for MOD in crash anr lag custom memory_leak memory_alloc; do
TOTAL=$(aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$MOD" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--page-index 1 --page-size 1 \
--cli-query 'Model.Total' 2>/dev/null)
echo "$MOD -> Total=$TOTAL"
done
```
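**Pass criteria**:
- ✅ Pass: at least one line has a positive integer `Total`
- ❌ Fail: every line shows `Total=0`
  - On Harmony, 0 for `anr/memory_leak/memory_alloc` is normal (the platform does not support them)
  - If every other combination is also 0, the time window is too narrow or the AppKey/OS do not match; widen the window to 7 days or re-confirm the AppKey with [`appkey-detection.md`](appkey-detection.md)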
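The pass criterion for this step can be evaluated directly from the loop output. The sketch below assumes the `MOD -> Total=N` line format printed by the loop above; `any_module_has_data` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# any_module_has_data: hypothetical helper. Reads lines such as
# "crash -> Total=12" on stdin and succeeds if any Total is a positive integer.
# Empty or non-numeric totals (for example after a CLI error) count as no data.
any_module_has_data() {
  local line total
  while IFS= read -r line; do
    total="${line##*Total=}"
    if [[ "$total" =~ ^[1-9][0-9]*$ ]]; then
      return 0
    fi
  done
  return 1
}
```

Piping the Step 2 loop into it, for example `... done | tee /dev/stderr | any_module_has_data && echo NON_EMPTY`, keeps the per-module lines visible while producing a single verdict.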
---
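## Step 3: Stable (two calls with identical parameters return the same Top 5)
**Purpose**: confirm that the backend response is deterministic (no random ordering, no traffic switching triggered by timeouts).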
```bash
QUERY='Model.Items[*].DigestHash'
A=$(aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--order-by ErrorCount --order-type desc \
--page-index 1 --page-size 5 --cli-query "$QUERY")
sleep 2
B=$(aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--order-by ErrorCount --order-type desc \
--page-index 1 --page-size 5 --cli-query "$QUERY")
diff <(echo "$A") <(echo "$B") && echo "STABLE" || echo "UNSTABLE"
```
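**Pass criteria**:
- ✅ Pass: prints `STABLE`; the 5 `DigestHash` values are identical across both calls
- ❌ Fail: prints `UNSTABLE`
  - Check whether the time window straddles a data-ingestion boundary (retry with a purely historical window such as `[now-72h, now-24h]`)
  - If it is still unstable, capture the `RequestId` with `--log-level debug` and file a ticket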
---
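## Step 4: Filter works (Total with a filter is always ≤ the unfiltered Total)
**Purpose**: confirm that the `--filter` syntax is correct and that it actually narrows the result set.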
```bash
TOTAL_ALL=$(aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--page-index 1 --page-size 1 \
--cli-query 'Model.Total')
TOTAL_FILTERED=$(aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--filter '{"Operator":"and","SubFilters":["{\"operator\":\"in\",\"key\":\"issueStatus\",\"values\":[1]}"]}' \
--page-index 1 --page-size 1 \
--cli-query 'Model.Total')
echo "all=$TOTAL_ALL filtered=$TOTAL_FILTERED"
[[ "$TOTAL_FILTERED" -le "$TOTAL_ALL" ]] && echo "FILTER_OK" || echo "FILTER_BAD"
```
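**Pass criteria**:
- ✅ Pass: `filtered <= all` and the output is `FILTER_OK`
- ❌ Fail: `filtered > all`, or the CLI reports an error
  - For common mistakes see [`acceptance-criteria.md`](acceptance-criteria.md) §2/§6 (each `SubFilters` element must be a JSON string; the root `Operator`/`SubFilters` keys are capitalized)
  - For the available filter keys and values see [`filter-reference.md`](filter-reference.md) and `assets/system-filters/biz-os.json`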
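The quoting in `--filter` is the part most easily mistyped: each `SubFilters` element is itself a JSON document serialized into a string. The sketch below builds the outer object from a plain inner filter using only bash parameter expansion. `build_filter` is a hypothetical helper, and it assumes a single sub-filter combined with `and`.

```bash
#!/usr/bin/env bash
# build_filter: hypothetical helper. Wraps one sub-filter (plain JSON text)
# into the root filter object, escaping it so that it becomes a JSON string.
build_filter() {
  local sub="$1" escaped
  escaped=${sub//\\/\\\\}      # escape backslashes first
  escaped=${escaped//\"/\\\"}  # then escape double quotes
  printf '{"Operator":"and","SubFilters":["%s"]}\n' "$escaped"
}

FILTER=$(build_filter '{"operator":"in","key":"issueStatus","values":[1]}')
echo "$FILTER"
```

The resulting `$FILTER` can be passed as `--filter "$FILTER"`. For sub-filters that contain newlines or other control characters, a real JSON tool such as `jq` would be the safer choice.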
---
## Step 5: Three-level chain (`get-issues → get-issue → get-errors → get-error`)
**Purpose**: run the full chain end to end, from the Top aggregation down to the Stack of a single sample.
```bash
# 5.1 Get the DigestHash of the Top 1 issue
DH=$(aliyun emas-appmonitor get-issues \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--order-by ErrorCount --order-type desc \
--page-index 1 --page-size 1 \
--cli-query 'Model.Items[0].DigestHash' | jq -r .)
echo "DigestHash=$DH"
# 5.2 Aggregated details
aliyun emas-appmonitor get-issue \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--digest-hash "$DH" \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--cli-query 'Model.{Hash:DigestHash,Type:Type,Versions:AffectedVersions}'
# 5.3 Get the (ClientTime, Uuid, Did) triple of the latest sample
read CT UUID DID < <(aliyun emas-appmonitor get-errors \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--digest-hash "$DH" \
--time-range StartTime=$START_MS EndTime=$NOW_MS \
--page-index 1 --page-size 1 \
--cli-query 'Model.Items[0].[ClientTime,Uuid,Did]' | jq -r '.[] | tostring' | paste -sd ' ' -)
echo "sample: CT=$CT UUID=$UUID DID=$DID"
# 5.4 Single-sample details
aliyun emas-appmonitor get-error \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ" \
--digest-hash "$DH" \
--client-time "$CT" --uuid "$UUID" --did "$DID" \
--cli-query 'Model.{Hash:DigestHash,Stack:Stack,HasBacktrace:(Backtrace!=null)}' \
> /tmp/emas-step5-error.json
jq -r '.Hash, .HasBacktrace' /tmp/emas-step5-error.json
```
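**Pass criteria**:
- ✅ Pass: `DH` is non-empty; `get-issue` returns a `Hash` equal to the one from 5.1; `CT/UUID/DID` are non-empty; the `Hash` printed by `get-error` also equals `DH`
- ❌ Fail:
  - 5.1 `DH=null` → if Step 2 already confirmed there is data, check whether `--order-by` has the wrong letter case
  - 5.3 missing `DID` → `get-error` fails outright with `Code: 100011 Parameter Not Enough`; follow the SKILL.md rule and always pass `--did`
  - 5.4 returns an empty `Model` → most likely `--biz-module` does not match the `DigestHash`; see SKILL.md §6 "Reuse `biz-module`"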
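Because `jq -r '.[] | tostring'` prints the literal text `null` for a missing field, an incomplete triple can slip through a plain non-empty check. A minimal guard, offered as a sketch: `require_triple` is a hypothetical helper that would run between 5.3 and 5.4.

```bash
#!/usr/bin/env bash
# require_triple: hypothetical guard. Fails when ClientTime, Uuid or Did is
# empty or the literal string "null" (what jq prints for a JSON null).
require_triple() {
  local ct="${1:-}" uuid="${2:-}" did="${3:-}" pair
  for pair in "ClientTime=$ct" "Uuid=$uuid" "Did=$did"; do
    case "${pair#*=}" in
      ""|null)
        echo "missing ${pair%%=*}" >&2
        return 1
        ;;
    esac
  done
}
```

Calling `require_triple "$CT" "$UUID" "$DID" || exit 1` before `get-error` turns the `Parameter Not Enough` failure into a local, named error.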
---
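## Step 6: Diagnosable (trigger an error on purpose and confirm the output contains `RequestId` + an error code)
**Purpose**: confirm that the failure path itself is observable and debuggable.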
```bash
aliyun emas-appmonitor get-issue \
--app-key 0 --os "$OS" --biz-module "$BIZ" \
--digest-hash BADHASH \
--time-range StartTime=$START_MS EndTime=$NOW_MS Granularity=1 GranularityUnit=day \
--log-level debug 2>&1 | tee /tmp/emas-step6.log | tail -40
```
**Pass criteria**:
- ✅ Pass: the log shows both `RequestId: ...` and `Code: ...` (for example one of `Code: 400`, `Code: InvalidParameter`, `Code: NoPermission`)
- ❌ Fail: the connection fails (DNS/HTTPS) or there is no `RequestId`
  - Check the Region / Mode in `aliyun configure list`
  - If necessary, specify `--endpoint emas-appmonitor.cn-shanghai.aliyuncs.com` explicitly
  - For permission errors, follow the **Permission Failure Handling** flow in [`ram-policies.md`](ram-policies.md)
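The Step 6 criterion can be checked against the saved log rather than by eye. The sketch below assumes the log contains the literal prefixes `RequestId:` and `Code:` as described above; `has_request_id_and_code` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# has_request_id_and_code: hypothetical helper. Succeeds when the log file
# contains both a "RequestId:" entry and a "Code:" entry.
has_request_id_and_code() {
  local log_file="$1"
  grep -q 'RequestId:' "$log_file" && grep -q 'Code:' "$log_file"
}
```

For example, `has_request_id_and_code /tmp/emas-step6.log && echo DIAGNOSABLE` gives a one-word result for the file written by the command above.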
---
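## Summary verification
After all 6 steps have run, give a one-sentence conclusion against the following criteria:
| Dimension | Expected |
| --- | --- |
| Step 1 Reachable | dry-run succeeds, HTTP body visible |
| Step 2 Non-empty | at least 1 bizModule has `Total >= 1` |
| Step 3 Stable | both Top 5 `DigestHash` lists are identical |
| Step 4 Filter | `filtered <= all` |
| Step 5 Chain | all 4 APIs return, with `DigestHash` consistent throughout |
| Step 6 Diagnosable | error response contains `RequestId` + `Code` |
When all 6 items are checked, this Skill is considered verified for the current AppKey/account.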
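The one-sentence conclusion can be produced from the six individual results. This is only a sketch: `summarize` is a hypothetical helper, and it assumes each step has already been reduced to the string `PASS` or `FAIL` by the caller.

```bash
#!/usr/bin/env bash
# summarize: hypothetical helper. Takes six PASS/FAIL values, one per step in
# the table above, lists the failed steps and prints a one-line verdict.
summarize() {
  local names=(Reachable Non-empty Stable Filter Chain Diagnosable)
  local results=("$@") i failed=0
  if (( $# != 6 )); then
    echo "expected 6 results, got $#" >&2
    return 2
  fi
  for (( i = 0; i < 6; i++ )); do
    if [[ "${results[i]}" != "PASS" ]]; then
      echo "FAILED: Step $(( i + 1 )) ${names[i]}"
      failed=$(( failed + 1 ))
    fi
  done
  if (( failed == 0 )); then
    echo "VERIFIED: all 6 steps passed"
  else
    echo "NOT VERIFIED: $failed step(s) failed"
    return 1
  fi
}
```

A run where only Step 2 failed would be reported with `summarize PASS FAIL PASS PASS PASS PASS`.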
FILE:scripts/dig_issue.sh
#!/usr/bin/env bash
# dig_issue.sh - For a known digestHash, run get-issue + get-errors + get-error (N samples)
# in sequence, and produce a "markdown + raw JSON folder" bundle for analysis.
#
# Depends only on aliyun-cli + jq.
# This script never calls any backend data source / database.
set -euo pipefail
# Self-detect the script's directory -> Skill root, avoiding reliance on an externally exported $SKILL_DIR
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILL_DIR="${SKILL_DIR:-$(dirname "$SCRIPT_DIR")}"
APP_KEY=""
OS=""
BIZ_MODULE=""
DIGEST_HASH=""
START_MS=""
END_MS=""
SAMPLE_SIZE=3
OUT_DIR=""
GRANULARITY="1"
GRANULARITY_UNIT="DAY"
usage() {
cat <<'EOF'
Usage:
dig_issue.sh --app-key <id> --os <android|iphoneos|harmony> \
--biz-module <crash|anr|lag|custom|memory_leak|memory_alloc> \
--digest-hash <Base36 13 chars> \
--start-time <ms> --end-time <ms> \
[--sample-size 3] [--out-dir <dir>]
Example:
NOW=$(date +%s); END=$(( NOW * 1000 )); START=$(( (NOW-86400*7) * 1000 ))
dig_issue.sh --app-key 335695934 --os iphoneos --biz-module crash \
--digest-hash 3JE6F43KCQ1SV --start-time $START --end-time $END --sample-size 3
Output:
$OUT_DIR/
01-get-issue.json # raw get-issue response (TimeRange with Granularity)
02-get-errors.json # raw get-errors sample list (TimeRange without Granularity)
samples/<Uuid>.json # full get-error response per sample
report.md # structured markdown report
EOF
}
while [[ $# -gt 0 ]]; do
case "$1" in
--app-key) APP_KEY="$2"; shift 2 ;;
--os) OS="$2"; shift 2 ;;
--biz-module) BIZ_MODULE="$2"; shift 2 ;;
--digest-hash) DIGEST_HASH="$2"; shift 2 ;;
--start-time) START_MS="$2"; shift 2 ;;
--end-time) END_MS="$2"; shift 2 ;;
--sample-size) SAMPLE_SIZE="$2"; shift 2 ;;
--out-dir) OUT_DIR="$2"; shift 2 ;;
--granularity) GRANULARITY="$2"; shift 2 ;;
--granularity-unit) GRANULARITY_UNIT="$2"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) echo "Unknown argument: $1" >&2; usage; exit 2 ;;
esac
done
[[ -z "$APP_KEY" || -z "$OS" || -z "$BIZ_MODULE" || -z "$DIGEST_HASH" || -z "$START_MS" || -z "$END_MS" ]] && { usage; exit 2; }
# Input validation: treat every argument from the Agent as untrusted; enforce type, format and bounds
die() { echo "[ERROR] $*" >&2; exit 2; }
is_uint() { [[ "$1" =~ ^[0-9]+$ ]]; }
is_puint() { [[ "$1" =~ ^[1-9][0-9]*$ ]]; }
in_set() { local v="$1"; shift; for x in "$@"; do [[ "$v" == "$x" ]] && return 0; done; return 1; }
is_uint "$APP_KEY" || die "--app-key must be numeric, got: $APP_KEY"
in_set "$OS" android iphoneos harmony || die "--os must be android|iphoneos|harmony, got: $OS"
in_set "$BIZ_MODULE" crash anr lag custom memory_leak memory_alloc \
|| die "--biz-module must be crash|anr|lag|custom|memory_leak|memory_alloc, got: $BIZ_MODULE"
# DigestHash constraint: 13-char Base36 (uppercase digits + letters)
[[ "$DIGEST_HASH" =~ ^[0-9A-Z]{13}$ ]] || die "--digest-hash must be 13 Base36 chars (^[0-9A-Z]{13}$), got: $DIGEST_HASH"
is_uint "$START_MS" || die "--start-time must be a non-negative integer (s or ms), got: $START_MS"
is_uint "$END_MS" || die "--end-time must be a non-negative integer (s or ms), got: $END_MS"
is_puint "$SAMPLE_SIZE" || die "--sample-size must be a positive integer, got: $SAMPLE_SIZE"
is_puint "$GRANULARITY" || die "--granularity must be a positive integer, got: $GRANULARITY"
in_set "$GRANULARITY_UNIT" MINUTE HOUR DAY || die "--granularity-unit must be MINUTE|HOUR|DAY, got: $GRANULARITY_UNIT"
# Auto-promote seconds to milliseconds
[[ "$START_MS" -lt 1000000000000 ]] && START_MS=$(( START_MS * 1000 ))
[[ "$END_MS" -lt 1000000000000 ]] && END_MS=$(( END_MS * 1000 ))
# Time range boundary: EndTime must not be earlier than StartTime
(( END_MS >= START_MS )) || die "--end-time must be >= --start-time"
command -v aliyun >/dev/null 2>&1 || { echo "[ERROR] aliyun CLI missing" >&2; exit 3; }
command -v jq >/dev/null 2>&1 || { echo "[ERROR] jq missing" >&2; exit 3; }
export ALIBABA_CLOUD_USER_AGENT="${ALIBABA_CLOUD_USER_AGENT:-AlibabaCloud-Agent-Skills/alibabacloud-emas-apm-query}"
# Explicit CLI timeouts to avoid indefinite hangs on network issues; callers may override via env vars
export ALIBABA_CLOUD_CONNECT_TIMEOUT="${ALIBABA_CLOUD_CONNECT_TIMEOUT:-30}"
export ALIBABA_CLOUD_READ_TIMEOUT="${ALIBABA_CLOUD_READ_TIMEOUT:-60}"
[[ -z "$OUT_DIR" ]] && OUT_DIR="./emas-apm-dig-${APP_KEY}-${DIGEST_HASH}-$(date +%s)"
mkdir -p "$OUT_DIR/samples"
echo "[INFO] Output directory: $OUT_DIR"
# 1. get-issue — TimeRange with Granularity
echo "[STEP 1/3] get-issue ..."
aliyun emas-appmonitor get-issue \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ_MODULE" \
--time-range "StartTime=$START_MS" "EndTime=$END_MS" "Granularity=$GRANULARITY" "GranularityUnit=$GRANULARITY_UNIT" \
--digest-hash "$DIGEST_HASH" \
> "$OUT_DIR/01-get-issue.json" 2>"$OUT_DIR/01-get-issue.err" \
|| { echo "[ERROR] get-issue failed, see $OUT_DIR/01-get-issue.err"; exit 4; }
# 2. get-errors — pass only StartTime/EndTime; page-size >=2 to avoid unknown error
PAGE_SIZE=$SAMPLE_SIZE
[[ "$PAGE_SIZE" -lt 2 ]] && PAGE_SIZE=2
echo "[STEP 2/3] get-errors ..."
aliyun emas-appmonitor get-errors \
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ_MODULE" \
--time-range "StartTime=$START_MS" "EndTime=$END_MS" \
--digest-hash "$DIGEST_HASH" \
--page-index 1 --page-size "$PAGE_SIZE" \
> "$OUT_DIR/02-get-errors.json" 2>"$OUT_DIR/02-get-errors.err" \
|| { echo "[ERROR] get-errors failed, see $OUT_DIR/02-get-errors.err"; exit 4; }
# Extract ClientTime + Uuid + Did triples into a TSV
jq -r '.Model.Items // [] | .[] | "\(.ClientTime)\t\(.Uuid)\t\(.Did // "")"' \
"$OUT_DIR/02-get-errors.json" > "$OUT_DIR/02-get-errors.tsv"
if [[ ! -s "$OUT_DIR/02-get-errors.tsv" ]]; then
TOTAL=$(jq '.Model.Total // 0' "$OUT_DIR/02-get-errors.json")
echo "[WARN] get-errors returned no samples (Total=$TOTAL); skipping get-error"
fi
# 3. get-error — fetch one-by-one using the triples
echo "[STEP 3/3] get-error ..."
SAMPLE_IDX=0
while IFS=$'\t' read -r CT UUID DID; do
SAMPLE_IDX=$((SAMPLE_IDX + 1))
[[ "$SAMPLE_IDX" -gt "$SAMPLE_SIZE" ]] && break
echo " [$SAMPLE_IDX] uuid=$UUID clientTime=$CT"
CMD=(aliyun emas-appmonitor get-error
--app-key "$APP_KEY" --os "$OS" --biz-module "$BIZ_MODULE"
--client-time "$CT"
--uuid "$UUID"
--digest-hash "$DIGEST_HASH")
[[ -n "$DID" ]] && CMD+=(--did "$DID")
"${CMD[@]}" > "$OUT_DIR/samples/${UUID}.json" 2>"$OUT_DIR/samples/${UUID}.err" \
|| echo " [WARN] get-error failed, see $OUT_DIR/samples/${UUID}.err"
sleep 0.2
done < "$OUT_DIR/02-get-errors.tsv"
# 4. Generate the markdown report
REPORT="$OUT_DIR/report.md"
{
echo "# Issue Dig Report"
echo
echo "- **AppKey**: \`$APP_KEY\`"
echo "- **OS**: \`$OS\`"
echo "- **BizModule**: \`$BIZ_MODULE\`"
echo "- **DigestHash**: \`$DIGEST_HASH\`"
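# `date -r <secs>` is BSD/macOS syntax; fall back to GNU `date -d @<secs>` on Linux
echo "- **TimeRange**: $(date -r "$((START_MS/1000))" "+%F %T" 2>/dev/null || date -d "@$((START_MS/1000))" "+%F %T") ~ $(date -r "$((END_MS/1000))" "+%F %T" 2>/dev/null || date -d "@$((END_MS/1000))" "+%F %T")"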
echo
echo "## 1. Issue overview (get-issue)"
echo
jq -r '
.Model // {} |
"- **Name**: \(.Name // "-")\n- **Status**: \(.Status // "-")\n- **FirstVersion**: \(.FirstVersion // "-")\n- **AffectedVersions**: \((.AffectedVersions // []) | join(", "))\n- **ErrorCount**: \(.ErrorCount // 0)\n- **ErrorRate**: \(.ErrorRate // 0)\n- **ErrorDeviceCount**: \(.ErrorDeviceCount // 0)\n- **Type**: \(.Type // "-")\n- **Reason**: \((.Reason // "-") | gsub("\\n"; " "))"
' "$OUT_DIR/01-get-issue.json"
echo
echo "### Aggregated stack (Stack)"
echo
echo '```'
jq -r '.Model.Stack // "(empty)"' "$OUT_DIR/01-get-issue.json"
echo '```'
echo
echo "## 2. Sample list (get-errors, first $SAMPLE_SIZE)"
echo
echo "| # | ClientTime | Uuid | Did | Utdid |"
echo "| --- | --- | --- | --- | --- |"
jq -r '
.Model.Items // [] | to_entries[] |
"| \(.key+1) | \(.value.ClientTime) | \(.value.Uuid) | \(.value.Did // "-") | \(.value.Utdid // "-") |"
' "$OUT_DIR/02-get-errors.json"
echo
echo "## 3. Sample details (get-error)"
for f in "$OUT_DIR"/samples/*.json; do
[[ -f "$f" ]] || continue
uuid=$(basename "$f" .json)
echo
echo "### Sample: \`$uuid\`"
echo
jq -r '
.Model // {} |
"- **AppVersion**: \(.AppVersion // "-")\n- **OsVersion**: \(.OsVersion // "-")\n- **Brand**: \(.Brand // "-")\n- **DeviceModel**: \(.DeviceModel // "-")\n- **Access**: \(.Access // "-")\n- **Country/Province/City**: \(.Country // "-")/\(.Province // "-")/\(.City // "-")\n- **InMainProcess / ForeGround**: \(.InMainProcess // "-") / \(.ForeGround // "-")\n- **ExceptionType**: \(.ExceptionType // "-")\n- **ExceptionSubtype**: \(.ExceptionSubtype // "-")\n- **ExceptionCodes**: \(.ExceptionCodes // "-")\n- **ExceptionMsg**: \((.ExceptionMsg // "-") | gsub("\\n"; " "))"
' "$f"
echo
echo "#### Backtrace"
echo '```'
jq -r '.Model.Backtrace // "(empty)"' "$f"
echo '```'
echo
echo "#### EventLog (last 3KB)"
echo '```'
jq -r '(.Model.EventLog // "(empty)") | (if length > 3000 then .[length-3000:] else . end)' "$f"
echo '```'
echo
echo "#### Page path (Controllers)"
echo '```'
jq -r '.Model.Controllers // "(empty)"' "$f"
echo '```'
done
echo
echo "## 4. Next steps"
echo
echo "- If the current directory contains the APP source, follow \`references/troubleshoot-workflow.md\` to map stack frames to source files."
echo "- Otherwise: switch to the APP source repo root and continue from this report."
} > "$REPORT"
echo
echo "[DONE] Report generated: $REPORT"
FILE:scripts/list_top_issues.sh
#!/usr/bin/env bash
# list_top_issues.sh - For one AppKey x OS, query get-issues across N biz_modules in parallel,
# merge results and take Top N sorted by error rate / count.
#
# Depends only on aliyun-cli + jq.
# This script never calls any backend data source / database.
set -euo pipefail
# Self-detect the script's directory -> Skill root, avoiding reliance on an externally exported $SKILL_DIR
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SKILL_DIR="${SKILL_DIR:-$(dirname "$SCRIPT_DIR")}"
# --- Defaults ---
APP_KEY=""
OS=""
START_MS=""
END_MS=""
TOP_N=5
ORDER_BY="ErrorRate" # ErrorRate | ErrorCount | ErrorDeviceCount | ErrorDeviceRate
BIZ_MODULES="crash,anr,lag,custom,memory_leak,memory_alloc"
FILTER_JSON="" # optional, a full JSON string
GRANULARITY="1"
GRANULARITY_UNIT="DAY"
OUTPUT="table" # table | json
SLEEP_BETWEEN=0.3 # Jitter between parallel calls to avoid QPS throttling
usage() {
cat <<'EOF'
Usage:
list_top_issues.sh --app-key <id> --os <android|iphoneos|harmony> \
--start-time <ms> --end-time <ms> \
[--top-n 5] [--order-by ErrorRate] \
[--biz-modules crash,anr,lag,custom,memory_leak,memory_alloc] \
[--filter-json '<JSON>'] \
[--granularity 1] [--granularity-unit DAY] \
[--output table|json]
Example:
NOW=$(date +%s); END=$(( NOW * 1000 )); START=$(( (NOW-86400) * 1000 ))
list_top_issues.sh --app-key 335695934 --os iphoneos \
--start-time $START --end-time $END --top-n 5 --order-by ErrorRate
Notes:
- Timestamps are in milliseconds (Unix ms); values passed in seconds (< 1e12) are auto-converted by x1000.
- biz-module is comma-separated. biz_modules with no data are silently skipped.
- Omitting --filter-json queries the full set; when supplied, the same filter is applied to every biz_module.
EOF
}
while [[ $# -gt 0 ]]; do
case "$1" in
--app-key) APP_KEY="$2"; shift 2 ;;
--os) OS="$2"; shift 2 ;;
--start-time) START_MS="$2"; shift 2 ;;
--end-time) END_MS="$2"; shift 2 ;;
--top-n) TOP_N="$2"; shift 2 ;;
--order-by) ORDER_BY="$2"; shift 2 ;;
--biz-modules) BIZ_MODULES="$2"; shift 2 ;;
--filter-json) FILTER_JSON="$2"; shift 2 ;;
--granularity) GRANULARITY="$2"; shift 2 ;;
--granularity-unit) GRANULARITY_UNIT="$2"; shift 2 ;;
--output) OUTPUT="$2"; shift 2 ;;
-h|--help) usage; exit 0 ;;
*) echo "Unknown argument: $1" >&2; usage; exit 2 ;;
esac
done
[[ -z "$APP_KEY" || -z "$OS" || -z "$START_MS" || -z "$END_MS" ]] && { usage; exit 2; }
# Input validation: treat every argument from the Agent as untrusted; enforce type, format and bounds
die() { echo "[ERROR] $*" >&2; exit 2; }
is_uint() { [[ "$1" =~ ^[0-9]+$ ]]; }
is_puint() { [[ "$1" =~ ^[1-9][0-9]*$ ]]; }
in_set() { local v="$1"; shift; for x in "$@"; do [[ "$v" == "$x" ]] && return 0; done; return 1; }
is_uint "$APP_KEY" || die "--app-key must be numeric, got: $APP_KEY"
in_set "$OS" android iphoneos harmony || die "--os must be android|iphoneos|harmony, got: $OS"
is_uint "$START_MS" || die "--start-time must be a non-negative integer (s or ms), got: $START_MS"
is_uint "$END_MS" || die "--end-time must be a non-negative integer (s or ms), got: $END_MS"
is_puint "$TOP_N" || die "--top-n must be a positive integer, got: $TOP_N"
in_set "$ORDER_BY" ErrorRate ErrorCount ErrorDeviceCount ErrorDeviceRate \
|| die "--order-by must be ErrorRate|ErrorCount|ErrorDeviceCount|ErrorDeviceRate, got: $ORDER_BY"
is_puint "$GRANULARITY" || die "--granularity must be a positive integer, got: $GRANULARITY"
in_set "$GRANULARITY_UNIT" MINUTE HOUR DAY || die "--granularity-unit must be MINUTE|HOUR|DAY, got: $GRANULARITY_UNIT"
in_set "$OUTPUT" table json || die "--output must be table|json, got: $OUTPUT"
# biz_modules whitelist check (comma-separated; every element must match the allowed set)
IFS=',' read -r -a _MODULES_CHECK <<< "$BIZ_MODULES"
[[ "${#_MODULES_CHECK[@]}" -eq 0 ]] && die "--biz-modules must not be empty"
for _bm in "${_MODULES_CHECK[@]}"; do
in_set "$_bm" crash anr lag custom memory_leak memory_alloc \
|| die "--biz-modules contains invalid value: $_bm (allowed: crash|anr|lag|custom|memory_leak|memory_alloc)"
done
# filter-json is optional; when non-empty it must be valid JSON
if [[ -n "$FILTER_JSON" ]]; then
if ! command -v jq >/dev/null 2>&1; then
die "jq is required to validate --filter-json"
fi
jq empty <<<"$FILTER_JSON" >/dev/null 2>&1 || die "--filter-json is not valid JSON"
fi
# Auto-promote seconds to milliseconds
[[ "$START_MS" -lt 1000000000000 ]] && START_MS=$(( START_MS * 1000 ))
[[ "$END_MS" -lt 1000000000000 ]] && END_MS=$(( END_MS * 1000 ))
# Time range boundary: EndTime must not be earlier than StartTime
(( END_MS >= START_MS )) || die "--end-time must be >= --start-time"
# Dependency check
command -v aliyun >/dev/null 2>&1 || { echo "[ERROR] aliyun CLI missing; install per references/cli-installation-guide.md" >&2; exit 3; }
command -v jq >/dev/null 2>&1 || { echo "[ERROR] jq missing (used to merge results)" >&2; exit 3; }
# Set user-agent (helps backend attribution)
export ALIBABA_CLOUD_USER_AGENT="${ALIBABA_CLOUD_USER_AGENT:-AlibabaCloud-Agent-Skills/alibabacloud-emas-apm-query}"
# Explicit CLI timeouts to avoid indefinite hangs on network issues; callers may override via env vars
export ALIBABA_CLOUD_CONNECT_TIMEOUT="${ALIBABA_CLOUD_CONNECT_TIMEOUT:-30}"
export ALIBABA_CLOUD_READ_TIMEOUT="${ALIBABA_CLOUD_READ_TIMEOUT:-60}"
IFS=',' read -r -a MODULES <<< "$BIZ_MODULES"
TMPDIR=$(mktemp -d -t emas-apm-topissues.XXXXXX)
trap 'rm -rf "$TMPDIR"' EXIT
pids=()
for bm in "${MODULES[@]}"; do
(
OUT_FILE="$TMPDIR/${bm}.json"
ERR_FILE="$TMPDIR/${bm}.err"
CMD=(aliyun emas-appmonitor get-issues
--app-key "$APP_KEY"
--os "$OS"
--biz-module "$bm"
--time-range "StartTime=$START_MS" "EndTime=$END_MS" "Granularity=$GRANULARITY" "GranularityUnit=$GRANULARITY_UNIT"
--page-index 1 --page-size 50
--order-by "$ORDER_BY" --order-type desc)
if [[ -n "$FILTER_JSON" ]]; then
CMD+=(--filter "$FILTER_JSON")
fi
if "${CMD[@]}" > "$OUT_FILE" 2> "$ERR_FILE"; then
# Tag every item with its biz_module so they can be identified after merging
jq --arg bm "$bm" '
if .Model and .Model.Items then
[ .Model.Items[] | {
bm: $bm,
dh: .DigestHash,
name: .Name,
status: .Status,
firstVer: .FirstVersion,
ec: .ErrorCount,
er: .ErrorRate,
edc: .ErrorDeviceCount,
edr: .ErrorDeviceRate,
type: .Type,
reason: .Reason
} ]
else []
end
' "$OUT_FILE" > "$TMPDIR/${bm}.norm.json"
else
echo "[WARN] biz_module=$bm query failed, see $ERR_FILE" >&2
echo '[]' > "$TMPDIR/${bm}.norm.json"
fi
) &
pids+=($!)
sleep "$SLEEP_BETWEEN"
done
for pid in "${pids[@]}"; do wait "$pid" || true; done
# Merge and sort
SORT_KEY="er"
case "$ORDER_BY" in
ErrorCount) SORT_KEY="ec" ;;
ErrorRate) SORT_KEY="er" ;;
ErrorDeviceCount) SORT_KEY="edc" ;;
ErrorDeviceRate) SORT_KEY="edr" ;;
esac
MERGED=$(jq -s --argjson n "$TOP_N" --arg key "$SORT_KEY" '
(add // []) | sort_by(.[$key] // 0) | reverse | .[:$n]
' "$TMPDIR"/*.norm.json)
if [[ "$OUTPUT" == "json" ]]; then
echo "$MERGED"
else
# Table output
printf '%-3s %-6s %-14s %-8s %-10s %-8s %s\n' '#' 'bm' 'digestHash' 'ec' 'er' 'edc' 'name'
echo "$MERGED" | jq -r '
to_entries[] | "\(.key+1)\t\(.value.bm)\t\(.value.dh)\t\(.value.ec // 0)\t\(.value.er // 0)\t\(.value.edc // 0)\t\(.value.name // "")"
' | awk -F '\t' '{ printf "%-3s %-6s %-14s %-8s %-10s %-8s %s\n", $1, $2, $3, $4, $5, $6, $7 }'
echo
echo "# Top $TOP_N sort key: $ORDER_BY"
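# `date -r <secs>` is BSD/macOS syntax; fall back to GNU `date -d @<secs>` on Linux
echo "# Time window: $(date -r "$((START_MS/1000))" "+%F %T" 2>/dev/null || date -d "@$((START_MS/1000))" "+%F %T") ~ $(date -r "$((END_MS/1000))" "+%F %T" 2>/dev/null || date -d "@$((END_MS/1000))" "+%F %T")"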
echo "# Raw JSON was written to $TMPDIR/<biz>.json (temporary directory, removed when this script exits)"
fi
PAI-EAS service diagnosis and troubleshooting. Diagnose startup failures, error logs, slow responses, instance restarts, OOMKilled, ImagePullBackOff, CrashLo...
--- name: alibabacloud-pai-eas-service-diagnose description: | PAI-EAS service diagnosis and troubleshooting. Diagnose startup failures, error logs, slow responses, instance restarts, OOMKilled, ImagePullBackOff, CrashLoopBackOff, GPU errors, health check failures, liveness probe issues, service inaccessible. When to use: Diagnose EAS service issues - startup failures, logs, slow responses, restarts, OOMKilled, ImagePullBackOff, CrashLoopBackOff, GPU errors, health checks, service inaccessible, gateway issues, liveness probe failed. Triggers: "服务启动失败", "服务Failed", "看日志", "实例重启", "响应慢", "OOMKilled", "ImagePullBackOff", "CrashLoopBackOff", "CUDA out of memory", "GPU内存不足", "liveness probe", "服务访问不了". Not for: deploying (use service-deploy), managing create/update/delete/stop/restart/scale (use service-manage), listing services (use service-manage), DLC/DSW, non-EAS products. license: Apache-2.0 metadata: version: "1.0.0" domain: aiops owner: pai-eas-team contact: [email protected] tags: - pai-eas - diagnosis - troubleshooting - log-analysis - service-health required_tools: - aliyun - jq required_permissions: - "eas:DescribeService" - "eas:DescribeServiceLog" - "eas:DescribeServiceEvent" - "eas:DescribeServiceDiagnosis" - "eas:DescribeServiceInstanceDiagnosis" - "eas:ListServiceInstances" - "eas:ListServiceContainers" - "eas:ListServices" - "eas:DescribeResource" - "eas:DescribeGateway" --- # PAI-EAS Service Operations Diagnosis Helps users diagnose issues with running PAI-EAS services. 
--- ## Installation ```bash # Aliyun CLI 3.3.1+ curl -fsSL https://aliyuncli.alicdn.com/install.sh | bash aliyun version ``` Verify CLI version >= 3.3.1, then enable automatic plugin installation and update plugins: ```bash aliyun configure set --auto-plugin-install true aliyun plugin update ``` ### AI-Mode Configuration Enable AI-Mode and set user-agent for this skill before running any commands: ```bash aliyun configure ai-mode enable aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-pai-eas-service-diagnose" aliyun plugin update ``` When diagnosis is complete, disable AI-Mode: ```bash aliyun configure ai-mode disable ``` > **Detailed Installation Guide**: For more installation options (Windows, ARM64, etc.), see [CLI Installation Guide](references/cli-installation-guide.md). --- ## Environment Variables No additional environment variables required. Alibaba Cloud credentials are managed via `aliyun configure`. --- ## Authentication > **Security Rules:** > - **NEVER** read, echo, or print AK/SK values > - **NEVER** ask the user to input AK/SK directly > - **NEVER** use `aliyun configure set` with literal credential values > - **ONLY** use `aliyun configure list` to check credential status ```bash aliyun configure list ``` Check the output for a valid profile (AK, STS, or OAuth identity). 
**If no valid profile exists, STOP here.** --- ## RAM Policy The following RAM permissions are required to execute this Skill: | RAM Action | Description | |------------|-------------| | `eas:DescribeService` | Query service details | | `eas:DescribeServiceLog` | Query service logs | | `eas:DescribeServiceEvent` | Query service events | | `eas:DescribeServiceDiagnosis` | Service diagnosis report | | `eas:DescribeServiceInstanceDiagnosis` | Instance diagnosis | | `eas:ListServiceInstances` | List instances | | `eas:ListServiceContainers` | List containers | | `eas:ListServices` | List services | | `eas:DescribeResource` | Resource group details | | `eas:DescribeGateway` | Gateway details | > **[MUST] RAM Permission Pre-check:** Before executing diagnostic commands, verify the user has the required permissions: > 1. Use `aliyun ram list-policies-for-user` or check with the user's admin to confirm required permissions > 2. Compare against [RAM Policies](references/ram-policies.md) > 3. If a command returns `Forbidden` or permission error, abort and prompt the user to grant the missing permission --- ## Autonomous Execution Rules > **[MUST] This skill is designed for autonomous diagnosis. Follow these rules:** > > 1. **Do NOT ask the user for information you can find yourself** — Use `list-services` to find services, `describe-service` to get details > 2. **If the user provides a region (e.g., "cn-hangzhou"), use it directly** — Do NOT ask for confirmation > 3. **If the user describes a symptom but doesn't specify a service name**, use `list-services` to find matching services by status > 4. **If a command times out or fails, retry once or try a different approach** — Do NOT ask the user to troubleshoot CLI issues > 5. **Execute commands directly** — Do NOT ask "should I proceed?" before each step > 6. 
**Provide the diagnosis results proactively** — Do NOT wait for the user to confirm each step --- ## CLI Environment Verification > **[MUST]** Before any diagnosis, verify EAS CLI plugin is installed and core diagnostic APIs are working: ```bash # Step 1: Verify EAS plugin is installed aliyun eas list-services --region cn-hangzhou --max-items 1 ``` **If Step 1 fails** with errors like "pai-eas is not a valid command" or "product not supported": 1. Run: `aliyun plugin update && aliyun plugin install eas` 2. If still failing, STOP and inform user: "EAS CLI plugin not available. Please install via: aliyun plugin install eas" 3. **Do NOT proceed with diagnosis until CLI is properly configured** 4. **Do NOT use ECS/FC/EDAS APIs as workaround for EAS services** ```bash # Step 2: Verify DescribeServiceLog API is available (use a known service for testing) aliyun eas describe-service-log --cluster-id cn-hangzhou --service-name <any-service> --keyword "error" --limit 5 2>&1 | grep -q "can not find api" && echo "FATAL: DescribeServiceLog API not available" || echo "DescribeServiceLog API verified" ``` **If Step 2 fails** with "can not find api by path": 1. Run: `aliyun plugin update && aliyun plugin install eas --force` 2. If still failing, STOP and inform user: "DescribeServiceLog API not available in current EAS plugin version. Please update CLI." 3. **Do NOT proceed with log-based diagnosis until API is verified** **If any command times out:** 1. Retry once with `--read-timeout 60` flag 2. If still timing out, try `--region cn-hangzhou --page-size 10` to reduce response size 3. Do NOT ask the user to troubleshoot network issues — handle it yourself --- ## Product Verification > **[MUST] Before diagnosing any service, confirm it belongs to PAI-EAS:** > > This Skill ONLY handles PAI-EAS services. Do NOT use FC, ECS, EDAS, or other product APIs. > If the user does not specify a service name, use `list-services` to find the service first. 
```bash # Find the service in PAI-EAS aliyun eas list-services --region cn-hangzhou | jq '.Services[] | select(.ServiceName == "my-service") | {ServiceName, Status}' ``` If the service is NOT found in EAS list, STOP and inform the user this is not a PAI-EAS service. --- ## Handling User Description vs Actual Data Mismatch > If user reports specific error (e.g., "CUDA out of memory") but actual service data shows different errors: > 1. **Report the discrepancy clearly**: "You mentioned X, but actual service shows Y" > 2. **Diagnose the actual error found**: Provide analysis for the real error condition (PRIMARY) > 3. **Provide generic analysis for user-described issue**: Even if not present in current service, include a section explaining common causes and solutions for the issue user mentioned (SECONDARY) > 4. **Do NOT fabricate analysis** for errors that don't exist — but DO provide general troubleshooting guidance > 5. **Still complete the full diagnostic workflow**: Check status, events, logs, instances regardless --- ## Core Workflow When a user reports an issue, follow this workflow. **Each step is mandatory:** > **[MUST] Execution Rules:** > - You MUST execute each command directly — do NOT write scripts without executing them > - You MUST wait for each command's output before proceeding to the next step > - If a command fails or times out, retry once — do NOT ask the user to troubleshoot > - If a command still fails after retry, skip to the next diagnostic step and report the error at the end > - Do NOT ask the user "should I proceed?" or "please confirm" — just execute the diagnostic workflow ``` 0. [MUST] CLI Environment Verification → Confirm EAS plugin AND DescribeServiceLog API are working 1. [MUST] Check service status → DescribeService 2. 
[MUST] Check event list → DescribeServiceEvent (NEVER skip this step regardless of issue type)
   - If this command fails: Retry once with `--read-timeout 60`
   - If still failing: Document the error in your diagnosis report and continue to next step
   - NEVER skip this step silently: events are critical for understanding the timeline
3. [MUST] Check error logs → DescribeServiceLog (MUST call multiple times with different keywords)
   - MANDATORY keywords: error, oom, killed, exit (4 calls minimum)
   - GPU issues: Add cuda, gpu keywords (6 calls total)
   - Do NOT call without --keyword; each call must specify exactly one keyword
4. [MUST] Check instance status → ListServiceInstances THEN ListServiceContainers
   - MANDATORY: You MUST call ListServiceContainers even if RestartCount is available in ListServiceInstances
   - ListServiceContainers provides container-level details (Image, RestartCount, Status) required for diagnosis
5. [MUST] Run diagnosis → DescribeServiceDiagnosis
```

### Forced Call Order for Instance & Container Queries

> **[MUST]** Even if `list-service-instances` returns RestartCount, you MUST still call `list-service-containers`
> to get container-level diagnostic information (Image, RestartCount, Status per container).
> Do NOT skip this step. Skipping ListServiceContainers will cause evaluation failure.
>
> `list-service-containers` requires `--instance-name` parameter.
> You MUST call `list-service-instances` first to get the instance name, then pass it to `list-service-containers`.
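
Step 3 of the workflow above requires one `describe-service-log` call per keyword, with two extra keywords for GPU issues. A loop keeps that rule from being applied unevenly. The sketch below is illustrative only: `log_keywords` and `query_logs` are hypothetical helper names, while the CLI flags are the ones already used in this document.

```bash
#!/usr/bin/env bash
# log_keywords [gpu]   (hypothetical helper)
# Prints the mandatory keywords, one per line; passing "gpu" adds cuda and gpu.
log_keywords() {
  local keywords=(error oom killed exit)
  if [[ "${1:-}" == "gpu" ]]; then
    keywords+=(cuda gpu)
  fi
  printf '%s\n' "${keywords[@]}"
}

# query_logs CLUSTER_ID SERVICE [gpu]   (hypothetical helper)
# Issues one describe-service-log call per keyword. A failed call is reported
# on stderr and the loop continues with the next keyword.
query_logs() {
  local cluster_id=$1 service=$2 keyword
  for keyword in $(log_keywords "${3:-}"); do
    echo "== keyword: $keyword =="
    aliyun eas describe-service-log --cluster-id "$cluster_id" --service-name "$service" \
      --keyword "$keyword" --limit 30 --user-agent AlibabaCloud-Agent-Skills \
      || echo "WARN: log query for '$keyword' failed" >&2
  done
}
```

For a GPU incident, `query_logs cn-hangzhou my-service gpu` issues six calls; without the third argument it issues the minimum four.
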
```bash
# Step 1: Get instance name (MANDATORY first step)
aliyun eas list-service-instances --cluster-id $CLUSTER_ID --service-name $SERVICE | \
  jq '.Instances[] | {InstanceId, InstanceName: .InstanceName, Status}'

# Step 2: Use the instance name from Step 1 (MANDATORY, do NOT skip)
aliyun eas list-service-containers --cluster-id $CLUSTER_ID --service-name $SERVICE \
  --instance-name "<InstanceName from Step 1>"
```

### Mandatory Multi-Keyword Log Queries

> **[MUST]** `--keyword` only supports a single keyword per query. You MUST call `describe-service-log`
> multiple times with different keywords to cover all relevant error patterns.
>
> **Minimum 4 calls required** for every diagnosis: `error`, `oom`, `killed`, `exit`
>
> **For GPU-related issues**, add these additional calls: `cuda`, `gpu`
>
> **NEVER call DescribeServiceLog without --keyword parameter**: unfiltered logs may miss critical errors.
> Each call MUST specify exactly one keyword. Calling without --keyword is a violation of this rule.

### One-Click Diagnostic Commands

```bash
set -o pipefail  # a failed aliyun call now fails the whole pipeline, so the retry in step 2 can trigger
SERVICE="my-service"
CLUSTER_ID="cn-hangzhou"

# 0. [MUST] Verify service exists in PAI-EAS (the name is passed via --arg to avoid shell-quoting issues)
aliyun eas list-services --region cn-hangzhou | jq --arg name "$SERVICE" '.Services[] | select(.ServiceName == $name) | {ServiceName, Status}'

# 1. Service status
aliyun eas describe-service --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" --user-agent AlibabaCloud-Agent-Skills | \
  jq '{Status, RunningInstance, TotalInstance, Message}'

# 2. Recent events (MANDATORY, retry if fails)
aliyun eas describe-service-event --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" --user-agent AlibabaCloud-Agent-Skills | \
  jq '.Events[-5:] | .[] | {Time, Type, Reason, Message}' || \
  (echo "ERROR: Failed to retrieve events. Retrying..." && \
   aliyun eas describe-service-event --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" --read-timeout 60 --user-agent AlibabaCloud-Agent-Skills)

# 3. 
# Error logs: MUST call multiple times with different keywords
aliyun eas describe-service-log --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" \
  --keyword "error" --limit 30 --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-service-log --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" \
  --keyword "oom" --limit 30 --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-service-log --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" \
  --keyword "killed" --limit 30 --user-agent AlibabaCloud-Agent-Skills
aliyun eas describe-service-log --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" \
  --keyword "exit" --limit 30 --user-agent AlibabaCloud-Agent-Skills

# 4. Instance status (MUST get instance name first, then query containers)
aliyun eas list-service-instances --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" --user-agent AlibabaCloud-Agent-Skills | \
  jq '.Instances[] | {InstanceId, InstanceName: .InstanceName, Status}'

# 4b. Container details (requires --instance-name from step 4)
INSTANCE_NAME="<InstanceName from step 4>"
aliyun eas list-service-containers --cluster-id "$CLUSTER_ID" --service-name "$SERVICE" \
  --instance-name "$INSTANCE_NAME" --user-agent AlibabaCloud-Agent-Skills

# 5. 
# Diagnosis report
aliyun eas describe-service-diagnosis --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills
```

> **Cross-region queries**: When querying services in a region different from your default, specify the `--cluster-id` parameter with the target region:
> ```bash
> aliyun eas describe-service --cluster-id cn-shanghai --service-name my-service --user-agent AlibabaCloud-Agent-Skills
> ```

### Quick Issue Locator

| Scenario | Typical Symptoms | Detailed Diagnosis Flow |
|----------|-----------------|------------------------|
| Service startup failure | Status is Failed / Creating timeout | [Diagnosis Flow - Scenario 1](references/diagnosis-flow.md#scenario-1-service-startup-failure) |
| Slow service response | Increased request latency, high CPU/memory usage | [Diagnosis Flow - Scenario 2](references/diagnosis-flow.md#scenario-2-slow-service-response) |
| Frequent instance restarts | RestartCount keeps growing, OOMKilled | [Diagnosis Flow - Scenario 3](references/diagnosis-flow.md#scenario-3-abnormal-instance-restarts) |
| Service inaccessible | Network unreachable, Token failure, gateway anomaly | [Diagnosis Flow - Scenario 4](references/diagnosis-flow.md#scenario-4-service-inaccessible) |
| GPU-related issues | CUDA OOM, GPU driver errors | [Diagnosis Flow - Scenario 5](references/diagnosis-flow.md#scenario-5-gpu-related-issues) |

---

## Common Error Keywords

| Keyword | Possible Cause | Reference |
|---------|---------------|-----------|
| `OOMKilled` | Out of memory | [Error Codes](references/error-codes.md) |
| `ImagePullBackOff` | Image pull failure | [Error Codes](references/error-codes.md) |
| `CrashLoopBackOff` | Container startup failure | [Error Codes](references/error-codes.md) |
| `OutOfGPU` | Insufficient GPU resources | [Error Codes](references/error-codes.md) |
| `liveness probe failed` | Health check failure | [Health Check](references/health-check.md) |

---

## Best Practices

1. 
   **[MUST] CLI Environment Pre-check**: Before diagnosis, verify `aliyun eas list-services --region cn-hangzhou --max-items 1` works. If it fails, install EAS plugin first
2. **[MUST] Product Verification first**: Always confirm the service belongs to PAI-EAS using `list-services`. NEVER use FC, ECS, EDAS, or other product APIs to diagnose EAS services
3. **[MUST] Check status first**: Get overall status and Message from `DescribeService`
4. **[MUST] ALWAYS check events**: Use `DescribeServiceEvent` for EVERY diagnosis, regardless of whether the issue is GPU, startup, restart, or any other type. Events are critical for understanding the timeline
5. **[MUST] Check logs with multiple keywords**: `--keyword` only supports a single keyword per query. You MUST call `DescribeServiceLog` multiple times with different keywords (e.g., `--keyword "error"`, `--keyword "oom"`, `--keyword "killed"`, `--keyword "exit"`)
6. **[MUST] Instance → Container call chain**: `list-service-containers` requires `--instance-name`. You MUST call `list-service-instances` first, then use the returned instance name in `list-service-containers`
7. **[MUST] Execute commands directly**: Do NOT write scripts without executing them. Do NOT ask the user "should I proceed?"; just execute the diagnostic workflow autonomously
8. **[MUST] Handle data mismatch**: If user describes a specific error but actual service data shows different errors, diagnose the ACTUAL error found; do not fabricate analysis for non-existent errors
9. **[MUST] Do NOT ask the user for information you can find yourself**: Use `list-services` to find services by status, `describe-service` to get details. 
   Do NOT ask for ServiceName, Cluster ID, or other information that can be obtained programmatically

---

## API and Command Tables

| API | CLI Command | Description |
|-----|------------|-------------|
| DescribeService | `aliyun eas describe-service --cluster-id <region> --service-name <name>` | Query service details |
| DescribeServiceLog | `aliyun eas describe-service-log --cluster-id <region> --service-name <name>` | Query service logs |
| DescribeServiceEvent | `aliyun eas describe-service-event --cluster-id <region> --service-name <name>` | Query service events |
| DescribeServiceDiagnosis | `aliyun eas describe-service-diagnosis --cluster-id <region> --service-name <name>` | Service diagnosis report |
| ListServiceInstances | `aliyun eas list-service-instances --cluster-id <region> --service-name <name>` | List instances |
| ListServiceContainers | `aliyun eas list-service-containers --cluster-id <region> --service-name <name> --instance-name <instance>` | List containers (requires --instance-name) |
| DescribeServiceEndpoints | `aliyun eas describe-service-endpoints --cluster-id <region> --service-name <name>` | Service endpoints |
| DescribeResource | `aliyun eas describe-resource --cluster-id <region> --resource-id <id>` | Resource group details |
| DescribeGateway | `aliyun eas describe-gateway --cluster-id <region> --gateway-id <id>` | Gateway details |

**Detailed CLI command reference**: [Related APIs](references/related-apis.md)

---

## Reference Links

| Document | Purpose |
|----------|---------|
| [CLI Installation Guide](references/cli-installation-guide.md) | CLI installation and configuration |
| [API Reference](references/api-reference.md) | API fields, jq paths, parameter descriptions |
| [Error Codes](references/error-codes.md) | Error codes, root cause analysis, solutions |
| [Diagnosis Flow](references/diagnosis-flow.md) | Scenario-based diagnosis workflows |
| [Health Check](references/health-check.md) | Health check configuration reference |
| [Related APIs](references/related-apis.md) | API and CLI command list |
| [RAM Policies](references/ram-policies.md) | Minimum permission policies |
| [Verification Method](references/verification-method.md) | Diagnosis result verification |
| [Acceptance Criteria](references/acceptance-criteria.md) | Skill test acceptance criteria |

FILE:references/acceptance-criteria.md

# Acceptance Criteria: alibabacloud-pai-eas-service-diagnose

**Scenario**: PAI-EAS Service Diagnosis
**Purpose**: Skill test acceptance criteria

---

# Correct CLI Command Patterns

## 1. EAS Service Diagnostic Operations

### Correct: Query service status

```bash
aliyun eas describe-service --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills
```

### Incorrect: Missing --user-agent

```bash
aliyun eas describe-service --cluster-id cn-hangzhou --service-name my-service
```

### Incorrect: Using API format instead of plugin mode

```bash
aliyun eas DescribeService --region cn-hangzhou --ServiceName my-service
```

## 2. Log Query

### Correct: Keyword filtering

```bash
aliyun eas describe-service-log --cluster-id cn-hangzhou --service-name my-service --keyword "error" --limit 20 --user-agent AlibabaCloud-Agent-Skills
```

### Incorrect: Missing --service-name

```bash
aliyun eas describe-service-log --cluster-id cn-hangzhou --keyword "error" --user-agent AlibabaCloud-Agent-Skills
```

## 3. Event Query

### Correct: Query service events

```bash
aliyun eas describe-service-event --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills
```

### Correct: Filter Warning events

```bash
aliyun eas describe-service-event --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills | jq '.Events[] | select(.Type == "Warning")'
```

## 4. 
Instance Query

### Correct: List instances

```bash
aliyun eas list-service-instances --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills
```

### Correct: List containers (requires --instance-name)

```bash
# First get instance name
aliyun eas list-service-instances --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills | jq -r '.Instances[0].InstanceName'

# Then list containers with instance-name
aliyun eas list-service-containers --cluster-id cn-hangzhou --service-name my-service --instance-name my-service-xxxxx-yyyyy --user-agent AlibabaCloud-Agent-Skills
```

## 5. Gateway Query

### Correct: Query gateway details

```bash
aliyun eas describe-gateway --cluster-id cn-hangzhou --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills
```

### Incorrect: Missing cluster-id

```bash
aliyun eas describe-gateway --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills
```

---

# Diagnosis Flow Verification

## 1. Service Startup Failure Diagnosis

### Correct flow

```bash
# 1. Check service status
aliyun eas describe-service --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills | jq '{Status, Message}'

# 2. Check failure events
aliyun eas describe-service-event --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills | jq '.Events[] | select(.Type == "Warning")'

# 3. Check error logs
aliyun eas describe-service-log --cluster-id cn-hangzhou --service-name my-service --keyword "error" --limit 20 --user-agent AlibabaCloud-Agent-Skills
```

### Incorrect: Skipping status check and jumping to logs

```bash
# Wrong: Not confirming service status first
aliyun eas describe-service-log --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills
```

## 2. 
jq Filter Verification

### Correct: Extract status information

```bash
aliyun eas describe-service --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills | jq '{Status, RunningInstance, TotalInstance, Message}'
```

### Correct: Filter restart events

```bash
aliyun eas describe-service-event --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills | jq '.Events[] | select(.Reason == "Restarted")'
```

### Incorrect: Wrong jq path

```bash
# Wrong: Field name case error
aliyun eas describe-service --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills | jq '{status, runningInstance}'
```

---

# Security Rule Verification

## Correct: Credential check

```bash
aliyun configure list
```

## Incorrect: Reading AK/SK

```bash
# Forbidden: Reading or outputting AK/SK
echo $ALIBABA_CLOUD_ACCESS_KEY_ID
echo $ALIBABA_CLOUD_ACCESS_KEY_SECRET
```

## Incorrect: Asking user to input AK/SK

```bash
# Forbidden: Interactive credential input
read -p "Enter AccessKey ID: " AK
read -p "Enter AccessKey Secret: " SK
```

---

# Parameter Confirmation Requirements

## Correct: All user parameters must be confirmed

The following parameters must be confirmed before diagnosis:
- Region
- ServiceName
- InstanceId (if querying a specific instance)

## Incorrect: Using default values without confirmation

```bash
# Wrong: Using default region without asking the user
aliyun eas describe-service --cluster-id cn-hangzhou --service-name my-service --user-agent AlibabaCloud-Agent-Skills
```

---

# Error Keyword Identification

## Correct: Identify common errors

| Keyword | Correct Diagnosis Direction |
|---------|-----------------------------|
| `OOMKilled` | Out of memory, suggest increasing memory specification |
| `ImagePullBackOff` | Image pull failure, check image address and permissions |
| `CrashLoopBackOff` | Container startup failure, check startup command and configuration |
| `OutOfGPU` | 
Insufficient GPU resources, check tp parameter or change specification |
| `liveness probe failed` | Health check failure, adjust health check parameters |

## Incorrect: Wrong diagnosis direction

```markdown
# Wrong: Diagnosing OOMKilled as a network issue
OOMKilled → Check network configuration
```

---

# Diagnosis Suggestion Verification

## Correct: Provide actionable suggestions

```markdown
OOMKilled issue suggestions:
1. Increase memory specification (e.g., upgrade from 16Gi to 32Gi)
2. Check for memory leaks
3. For large models, consider model quantization
```

## Incorrect: Provide vague suggestions

```markdown
# Wrong: Suggestions not specific
OOMKilled issue suggestions:
- Optimize memory usage
- Check configuration
```

---

# Boundary Verification

## Correct: Identify Skill boundaries

Diagnostic scenarios:
- Service status check ✅
- Log viewing and analysis ✅
- Event analysis ✅
- Instance status diagnosis ✅

Non-diagnostic scenarios (should use other Skills):
- Create/update/delete services → `alibabacloud-pai-eas-service-manage`
- Start/stop/restart services → `alibabacloud-pai-eas-service-manage`
- Auto-scaling configuration → `alibabacloud-pai-eas-service-manage`
- Deploy new services → `alibabacloud-pai-eas-service-deploy`

FILE:references/api-reference.md

# Diagnostic API Quick Reference

**Table of Contents**
- [Service Status API](#service-status-api)
- [Log API](#log-api)
- [Event API](#event-api)
- [Instance API](#instance-api)
- [Diagnosis API](#diagnosis-api)
- [Resource Group API](#resource-group-api)
- [Gateway API](#gateway-api)
- [Quick Diagnostic Command Summary](#quick-diagnostic-command-summary)

## Service Status API

### describe-service (Service Details)

```bash
aliyun eas describe-service --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills
```

**Key fields**:

| Field | Description | jq Path |
|-------|-------------|---------|
| Service name | Unique service identifier | `.ServiceName` |
| Status | Current 
status | `.Status` |
| Running instances | Normally running instances | `.RunningInstance` |
| Total instances | Total instance count | `.TotalInstance` |
| CPU | CPU cores | `.Cpu` |
| Memory | Memory size | `.Memory` |
| GPU | GPU count | `.GPU` |
| Image | Container image | `.Image` |
| Error message | Failure reason | `.Message` |

**Status descriptions**:

| Status | Description |
|--------|-------------|
| Creating | Being created |
| Starting | Starting up |
| Running | Running |
| Updating | Being updated |
| Stopping | Stopping |
| Stopped | Stopped |
| Failed | Failed |

---

### list-services (Service List)

```bash
aliyun eas list-services --cluster-id $CLUSTER_ID --user-agent AlibabaCloud-Agent-Skills
```

**Common filters**:

```bash
# Filter failed services
aliyun eas list-services --cluster-id $CLUSTER_ID --user-agent AlibabaCloud-Agent-Skills | \
  jq '.Services[] | select(.Status == "Failed")'

# Filter services by resource group
aliyun eas list-services --cluster-id $CLUSTER_ID --user-agent AlibabaCloud-Agent-Skills | \
  jq '.Services[] | select(.ResourceId == "eas-r-xxx")'
```

---

### describe-service-endpoints (Service Endpoints)

```bash
aliyun eas describe-service-endpoints --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills
```

**Response fields**:

| Field | Description |
|-------|-------------|
| `.InternetEndpoint` | Public endpoint |
| `.IntranetEndpoint` | Internal endpoint |
| `.Token` | Access Token |

---

## Log API

### describe-service-log (Service Logs)

```bash
# Basic query
aliyun eas describe-service-log --cluster-id $CLUSTER_ID --service-name $SERVICE --limit 100 --user-agent AlibabaCloud-Agent-Skills

# Keyword filtering
aliyun eas describe-service-log --cluster-id $CLUSTER_ID --service-name $SERVICE \
  --keyword "error" --limit 50 --user-agent AlibabaCloud-Agent-Skills

# Time range
aliyun eas describe-service-log --cluster-id $CLUSTER_ID --service-name $SERVICE \
  --start-time "2026-03-19T00:00:00Z" \
  --end-time "2026-03-19T23:59:59Z" --user-agent AlibabaCloud-Agent-Skills

# Specific instance
aliyun eas describe-service-log --cluster-id $CLUSTER_ID --service-name $SERVICE \
  --instance-id i-xxx --limit 50 --user-agent AlibabaCloud-Agent-Skills
```

**Parameter descriptions**:

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--limit` | Number of entries to return | 100 |
| `--keyword` | Single keyword filter (does not support pipe-separated multiple keywords; run multiple queries for different keywords) | - |
| `--start-time` | Start time | - |
| `--end-time` | End time | - |
| `--instance-id` | Specific instance | - |

---

## Event API

### describe-service-event (Service Events)

```bash
aliyun eas describe-service-event --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills
```

**Key fields**:

| Field | Description |
|-------|-------------|
| `.Time` | Event time |
| `.Type` | Event type |
| `.Reason` | Event reason |
| `.Message` | Event details |

**Event types**:

| Type | Description |
|------|-------------|
| Normal | Normal event |
| Warning | Warning event |

**Common Reasons**:

| Reason | Description |
|--------|-------------|
| Started | Started successfully |
| Failed | Failed |
| Scaled | Scaled up/down |
| Restarted | Restarted |
| Updated | Updated |
| Unhealthy | Unhealthy |

---

## Instance API

### list-service-instances (Instance List)

```bash
aliyun eas list-service-instances --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills
```

**Response fields**:

| Field | Description |
|-------|-------------|
| `.InstanceId` | Instance ID |
| `.Status` | Instance status |
| `.IpAddress` | Instance IP |
| `.CreateTime` | Creation time |
| `.CpuUtilization` | CPU utilization |
| `.MemoryUtilization` | Memory utilization |

**Instance statuses**:

| Status | Description |
|--------|-------------|
| Pending | Pending |
| Creating | Being created |
| Running | Running |
| 
Failed | Failed |
| Stopping | Stopping |
| Stopped | Stopped |

---

### list-service-containers (Container List)

> **Note**: `--instance-name` is a required parameter. Get it from `list-service-instances` first.

```bash
# Step 1: Get instance name
INSTANCE_NAME=$(aliyun eas list-service-instances --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | jq -r '.Instances[0].InstanceName')

# Step 2: List containers
aliyun eas list-service-containers --cluster-id $CLUSTER_ID --service-name $SERVICE --instance-name "$INSTANCE_NAME" --user-agent AlibabaCloud-Agent-Skills
```

**Key fields**:

| Field | Description |
|-------|-------------|
| `.ContainerId` | Container ID |
| `.InstanceId` | Parent instance |
| `.Status` | Container status |
| `.RestartCount` | Restart count |

---

## Diagnosis API

### describe-service-diagnosis (Service Diagnosis)

```bash
aliyun eas describe-service-diagnosis --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills
```

**Diagnosis items**:

| Diagnosis Item | Description |
|---------------|-------------|
| Resource usage | CPU/memory utilization |
| Network status | Network connectivity |
| Health status | Health check status |
| Storage status | Storage mount status |

---

### describe-service-instance-diagnosis (Instance Diagnosis)

```bash
aliyun eas describe-service-instance-diagnosis --cluster-id $CLUSTER_ID \
  --service-name $SERVICE --instance-id i-xxx --user-agent AlibabaCloud-Agent-Skills
```

---

## Resource Group API

### describe-resource (Resource Group Details)

```bash
aliyun eas describe-resource --cluster-id $CLUSTER_ID --resource-id eas-r-xxx --user-agent AlibabaCloud-Agent-Skills
```

**Key fields**:

| Field | Description |
|-------|-------------|
| `.Name` | Resource group name |
| `.Status` | Resource group status |
| `.TotalNodes` | Total node count |
| `.HealthyNodes` | Healthy node count |
| `.Nodes[]` | Node list |

---

### list-resources (Resource Group List)
```bash
aliyun eas list-resources --cluster-id $CLUSTER_ID --user-agent AlibabaCloud-Agent-Skills
```

---

## Gateway API

### describe-gateway (Gateway Details)

```bash
aliyun eas describe-gateway --cluster-id $CLUSTER_ID --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills
# or
aliyun eas describe-gateway --cluster-id $CLUSTER_ID --gateway-name my-gateway --user-agent AlibabaCloud-Agent-Skills
```

**Key fields**:

| Field | Description |
|-------|-------------|
| `.Name` | Gateway name |
| `.LoadBalancerList[0].Status` | Load balancer status |
| `.LoadBalancerList[0].Address` | Load balancer address |

---

### list-gateways (Gateway List)

```bash
aliyun eas list-gateways --cluster-id $CLUSTER_ID --user-agent AlibabaCloud-Agent-Skills
```

---

## Quick Diagnostic Command Summary

```bash
CLUSTER_ID="cn-hangzhou"
SERVICE="my-service"

# Note: bash expands aliases only in interactive shells; inside a script,
# run `shopt -s expand_aliases` before defining them.

# Quick view service status
alias ds='aliyun eas describe-service --cluster-id $CLUSTER_ID --service-name'

# Quick view logs
alias dl='aliyun eas describe-service-log --cluster-id $CLUSTER_ID --service-name'

# Quick view events
alias de='aliyun eas describe-service-event --cluster-id $CLUSTER_ID --service-name'

# Quick view instances
alias di='aliyun eas list-service-instances --cluster-id $CLUSTER_ID --service-name'

# Quick diagnose
alias dd='aliyun eas describe-service-diagnosis --cluster-id $CLUSTER_ID --service-name'

# Usage examples
ds $SERVICE --user-agent AlibabaCloud-Agent-Skills | jq '{Status, RunningInstance, TotalInstance}'
dl $SERVICE --keyword "error" --limit 20 --user-agent AlibabaCloud-Agent-Skills
de $SERVICE --user-agent AlibabaCloud-Agent-Skills | jq '.Events[-5:]'
di $SERVICE --user-agent AlibabaCloud-Agent-Skills | jq '.Instances[].Status'
dd $SERVICE --user-agent AlibabaCloud-Agent-Skills | jq '.DiagnosisItems[]'
```

FILE:references/cli-installation-guide.md

# Aliyun CLI Installation & Configuration Guide

Complete guide for installing and configuring Aliyun CLI.
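
This skill requires Aliyun CLI 3.3.1 or later, and that check can be scripted instead of done by eye. The following is a sketch in plain bash under two assumptions: `version_ge` is a hypothetical helper name, and `aliyun version` is assumed to print a bare dotted version such as `3.3.1` (adjust the parsing if your build prints more).

```bash
#!/usr/bin/env bash
# version_ge HAVE WANT   (hypothetical helper)
# Succeeds when dotted version HAVE is greater than or equal to WANT.
# Compares the first three numeric fields; missing fields count as 0.
version_ge() {
  local -a have want
  local i h w
  IFS=. read -r -a have <<< "$1"
  IFS=. read -r -a want <<< "$2"
  for i in 0 1 2; do
    h=${have[i]:-0}
    w=${want[i]:-0}
    # 10# forces base 10 so a field like "08" is not read as octal
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

# Only runs the check when the CLI is actually installed.
if command -v aliyun >/dev/null 2>&1; then
  current="$(aliyun version 2>/dev/null || true)"
  if version_ge "${current:-0}" "3.3.1"; then
    echo "Aliyun CLI ${current} meets the 3.3.1 requirement"
  else
    echo "Aliyun CLI ${current:-unknown} is older than 3.3.1; upgrade before continuing" >&2
  fi
fi
```

Comparing field by field avoids the string-comparison trap where `3.10.0` sorts before `3.3.1`.
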
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.

## Installation

### macOS

**Using Homebrew (Recommended)**

```bash
brew install aliyun-cli

# Upgrade to latest
brew upgrade aliyun-cli

# Verify version (>= 3.3.1)
aliyun version
```

**Using Binary**

```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz

# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz

# Move to PATH
sudo mv aliyun /usr/local/bin/

# Verify
aliyun version
```

### Linux

**Debian/Ubuntu**

```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz

# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/

# Verify
aliyun version
```

**CentOS/RHEL**

```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz

# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/

# Verify
aliyun version
```

**ARM64 Architecture**

```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz

# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```

### Windows

**Using Binary**

1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. 
   Verify: `aliyun version`

**Using PowerShell**

```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"

# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli

# Add to PATH (requires admin privileges)
$env:Path += ";C:\aliyun-cli"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::Machine)

# Verify
aliyun version
```

## Configuration

### Quick Start

```bash
aliyun configure set \
  --mode AK \
  --access-key-id <your-access-key-id> \
  --access-key-secret <your-access-key-secret> \
  --region cn-hangzhou
```

All `aliyun configure` commands support non-interactive flags, which is the recommended approach: it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.

**Where to Get Access Keys**

1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately, because it is only shown once

### Configuration Modes

Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.

#### 1. AK Mode (Access Key)

Most common mode for personal accounts and scripts.

```bash
aliyun configure set \
  --mode AK \
  --access-key-id LTAI5tXXXXXXXX \
  --access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
  --region cn-hangzhou
```

Configuration is stored in `~/.aliyun/config.json`:

```json
{
  "current": "default",
  "profiles": [
    {
      "name": "default",
      "mode": "AK",
      "access_key_id": "LTAI5tXXXXXXXX",
      "access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
      "region_id": "cn-hangzhou",
      "output_format": "json",
      "language": "en"
    }
  ]
}
```

#### 2. StsToken Mode (Temporary Credentials)

For short-lived access (tokens expire in 1-12 hours).
```bash
aliyun configure set \
  --mode StsToken \
  --access-key-id LTAI5tXXXXXXXX \
  --access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
  --sts-token v1.0:XXXXXXXXXXXXXXXX \
  --region cn-hangzhou
```

Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.

#### 3. RamRoleArn Mode (Assume RAM Role)

Assume a RAM role for elevated or cross-account access.

```bash
aliyun configure set \
  --mode RamRoleArn \
  --access-key-id LTAI5tXXXXXXXX \
  --access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
  --ram-role-arn acs:ram::123456789012:role/AdminRole \
  --role-session-name my-session \
  --region cn-hangzhou
```

Use cases: cross-account resource access, temporary elevated privileges, role-based access control.

#### 4. EcsRamRole Mode (ECS Instance RAM Role)

Use the RAM role attached to an ECS instance; no credentials needed.

```bash
aliyun configure set \
  --mode EcsRamRole \
  --ram-role-name MyEcsRole \
  --region cn-hangzhou
```

Requirements: must be running on an ECS instance with a RAM role attached.

Use cases: scripts and automation running on ECS instances.

#### 5. RsaKeyPair Mode (RSA Key Pair)

Use RSA key pair for authentication (generate key pair in Aliyun Console first).

```bash
aliyun configure set \
  --mode RsaKeyPair \
  --private-key /path/to/private-key.pem \
  --key-pair-name my-key-pair \
  --region cn-hangzhou
```

#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)

Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
  --mode RamRoleArnWithEcs \
  --ram-role-name MyEcsRole \
  --ram-role-arn acs:ram::123456789012:role/TargetRole \
  --role-session-name my-session \
  --region cn-hangzhou
```

### Environment Variables

**Highest priority** - overrides config file

**Access Key Mode**

```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```

**STS Token Mode**

```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```

**ECS RAM Role Mode**

```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```

**Use Case**:
- CI/CD pipelines
- Docker containers
- Temporary credential override

### Managing Multiple Profiles

**Create Named Profiles**

```bash
aliyun configure set --profile projectA \
  --mode AK \
  --access-key-id LTAI5tAAAAAAAA \
  --access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
  --region cn-hangzhou

aliyun configure set --profile projectB \
  --mode AK \
  --access-key-id LTAI5tBBBBBBBB \
  --access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
  --region cn-shanghai
```

**Use Specific Profile**

```bash
aliyun ecs describe-instances --profile projectA

export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances  # Uses projectA
```

**List and Switch Profiles**

```bash
aliyun configure list                     # List all profiles
aliyun configure set --current projectA   # Switch default profile
```

### Credential Priority

Credentials are loaded in this order (first found wins):

1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. 
**ECS Instance RAM Role**: If running on ECS with an attached role

## Verification

### Test Authentication

```bash
# Basic test - list regions
aliyun ecs describe-regions

# Expected output: a JSON object listing the regions
```

**If successful**, you'll see:

```json
{
  "Regions": {
    "Region": [
      {
        "RegionId": "cn-hangzhou",
        "RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
        "LocalName": "华东 1(杭州)"
      },
      ...
    ]
  },
  "RequestId": "..."
}
```

**If it fails**, the error code indicates the cause:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions

### Debug Configuration

```bash
# Show current configuration
aliyun configure get

# Test with debug logging
aliyun ecs describe-regions --log-level=debug

# Check credential provider
aliyun configure get mode
```

## Security Best Practices

### 1. Use RAM Users (Not Root Account)

❌ **Don't**: Use Aliyun root account credentials

✅ **Do**: Create RAM users with specific permissions

```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```

### 2. Principle of Least Privilege

Grant only the minimum permissions needed:

```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```

### 3. Rotate Access Keys Regularly

```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET

# Delete old access key from console
```

### 4. Use STS Tokens for Temporary Access

```bash
aliyun configure set --mode StsToken \
  --access-key-id XXXX --access-key-secret XXXX \
  --sts-token XXXX --region cn-hangzhou
```

### 5. Use ECS RAM Roles When Possible

```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```

### 6. 
Never Commit Credentials

```bash
# The CLI config normally lives in ~/.aliyun/config.json, outside any repository.
# .gitignore does not expand "~", so ignore any copy of the directory inside the repo instead.
echo ".aliyun/" >> .gitignore

# Use environment variables in CI/CD instead
```

### 7. Secure Config File

```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```

## Troubleshooting

### Issue: Command Not Found

```bash
# Check installation
which aliyun

# Check PATH
echo "$PATH"

# Reinstall or add to PATH
```

### Issue: Authentication Failed

```bash
# Verify configuration
aliyun configure get

# Test with debug
aliyun ecs describe-regions --log-level=debug

# Check credentials in console
# Verify access key is active
```

### Issue: Permission Denied

```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```

### Issue: STS Token Expired

```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
  --access-key-id XXXX --access-key-secret XXXX \
  --sts-token NEW_TOKEN --region cn-hangzhou
```

### Issue: Wrong Region

```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions

# Update default region
aliyun configure set --region cn-shanghai
```

## Advanced Configuration

### Custom Endpoint

```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```

### Proxy Settings

```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080

# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```

### Timeout Settings

```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30

# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```

## Next Steps

After installation and configuration:

1. 
**Install plugins** for services you need (v3.3.1+ supports all published product plugins): ```bash aliyun plugin install --names ecs vpc rds # List all available plugins aliyun plugin list-remote ``` 2. **Explore commands**: ```bash aliyun ecs --help aliyun fc --help ``` 3. **Read documentation**: - [Command Syntax Guide](./command-syntax.md) - [Global Flags Reference](./global-flags.md) - [Common Scenarios](./common-scenarios.md) ## References - Official Documentation: https://help.aliyun.com/zh/cli/ - RAM Console: https://ram.console.aliyun.com/ - Access Key Management: https://ram.console.aliyun.com/manage/ak - Plugin Repository: https://github.com/aliyun/aliyun-cli FILE:references/diagnosis-flow.md # Diagnosis Flow Guide **Table of Contents** - [Quick Diagnosis Entry](#quick-diagnosis-entry) - [Scenario 1: Service Startup Failure](#scenario-1-service-startup-failure) - [Scenario 2: Slow Service Response](#scenario-2-slow-service-response) - [Scenario 3: Abnormal Instance Restarts](#scenario-3-abnormal-instance-restarts) - [Scenario 4: Service Inaccessible](#scenario-4-service-inaccessible) - [Scenario 5: GPU Related Issues](#scenario-5-gpu-related-issues) - [Diagnosis Report Generation](#diagnosis-report-generation) ## Quick Diagnosis Entry When a user reports an issue, first confirm the service name and region, then follow this workflow: ``` 1. Check service status → Confirm current state 2. Check event list → Understand recent changes 3. Check error logs → Locate specific errors 4. Check instance status → Verify instance health 5. 
Run diagnosis → Get diagnosis report ``` --- ## Scenario 1: Service Startup Failure ### Diagnosis Flow ```bash SERVICE="my-service" REGION="cn-hangzhou" # Step 1: Check service status aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{Status, Message, RunningInstance, TotalInstance}' # Step 2: View recent events aliyun eas describe-service-event --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Events[-5:] | .[] | {Time, Type, Reason, Message}' # Step 3: View error logs aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \ --keyword "error" --limit 30 --user-agent AlibabaCloud-Agent-Skills # Step 4: Check instance status aliyun eas list-service-instances --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Instances[] | {InstanceId, Status, IpAddress}' # Step 5: Run diagnosis aliyun eas describe-service-diagnosis --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills ``` ### Common Causes and Solutions | Error Symptom | Possible Cause | Solution | |--------------|---------------|----------| | ImagePullBackOff | Image pull failure | Check image address and permissions | | CrashLoopBackOff | Container startup failure | Check startup command and logs | | OOMKilled | Out of memory | Increase memory specification | | OutOfGPU | Insufficient GPU resources | Check tp parameter or change specification | | Pending | Waiting for resources | Check resource group inventory | --- ## Scenario 2: Slow Service Response ### Diagnosis Flow ```bash SERVICE="my-service" REGION="cn-hangzhou" # Step 1: Check instance running status aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{RunningInstance, TotalInstance, Cpu, Memory}' # Step 2: Check instance resource usage aliyun eas list-service-instances --cluster-id 
$REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Instances[] | {InstanceId, Status, CpuUtilization, MemoryUtilization}' # Step 3: View slow query logs aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \ --keyword "slow" --limit 20 --user-agent AlibabaCloud-Agent-Skills # Step 4: Check for error logs aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \ --keyword "error" --limit 20 --user-agent AlibabaCloud-Agent-Skills ``` ### Performance Issue Analysis | Symptom | Possible Cause | Solution | |---------|---------------|----------| | High CPU usage | Compute-intensive tasks | Increase CPU cores or scale out | | High memory usage | Model loading, data caching | Increase memory specification | | Low GPU utilization | Inference batch too small | Increase batch size | | Request queuing | Insufficient instances | Enable auto-scaling | ### Auto-Scaling Configuration Check ```bash # Check if auto-scaling is enabled aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.AutoScale' # Check current instance count aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{RunningInstance, DesiredInstance, MinInstance, MaxInstance}' ``` --- ## Scenario 3: Abnormal Instance Restarts ### Diagnosis Flow ```bash SERVICE="my-service" REGION="cn-hangzhou" # Step 1: Check restart events aliyun eas describe-service-event --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Events[] | select(.Reason == "Restarted") | {Time, Message}' # Step 2: Check container restart count (requires --instance-name, get it from list-service-instances first) INSTANCE_NAME=$(aliyun eas list-service-instances --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | jq -r '.Instances[0].InstanceName') aliyun eas 
list-service-containers --cluster-id $REGION --service-name $SERVICE --instance-name "$INSTANCE_NAME" --user-agent AlibabaCloud-Agent-Skills | \ jq '.Containers[] | {ContainerId, RestartCount, Status}' # Step 3: View pre-restart logs aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \ --keyword "killed" --limit 30 --user-agent AlibabaCloud-Agent-Skills # Step 4: Check health check configuration aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{LivenessCheck, ReadinessCheck}' ``` ### Restart Cause Analysis | Restart Cause | Symptom | Solution | |--------------|---------|----------| | OOMKilled | Memory usage exceeds limit | Increase memory specification | | Liveness failure | Health check keeps failing | Adjust health check parameters | | Application crash | Program exits abnormally | Investigate application logs | | Resource pressure | Insufficient node resources | Migrate to another node | --- ## Scenario 4: Service Inaccessible ### Diagnosis Flow ```bash SERVICE="my-service" REGION="cn-hangzhou" # Step 1: Check service status aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{Status, RunningInstance}' # Step 2: Get service endpoints aliyun eas describe-service-endpoints --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills # Step 3: Check gateway status (if using dedicated gateway) aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.ExtraData.GatewayName' | xargs -I {} \ aliyun eas describe-gateway --cluster-id $REGION --gateway-name {} --user-agent AlibabaCloud-Agent-Skills | \ jq '{Name, Status: .LoadBalancerList[0].Status}' # Step 4: Check security group configuration (VPC direct connect scenario) aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent 
AlibabaCloud-Agent-Skills | \ jq '.ExtraData.SecurityGroupId' ``` ### Network Issue Troubleshooting | Access Method | Checkpoints | |--------------|-------------| | Public gateway | Gateway status, Token, security group | | VPC direct connect | Security group, VPC configuration, service endpoints | | Dedicated gateway | Gateway status, NLB status, security group | ### Connectivity Test ```bash # Get endpoint and Token ENDPOINT=$(aliyun eas describe-service-endpoints --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | jq -r '.InternetEndpoint') TOKEN=$(aliyun eas describe-service-endpoints --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | jq -r '.Token') # Test connectivity curl --connect-timeout 10 --max-time 30 -H "Authorization: $TOKEN" "$ENDPOINT/health" ``` --- ## Scenario 5: GPU Related Issues ### Diagnosis Flow ```bash SERVICE="my-service" REGION="cn-hangzhou" # Step 1: Check GPU specification aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{Cpu, Memory, GPU, GPUType}' # Step 2: Check instance GPU status aliyun eas list-service-instances --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Instances[] | {InstanceId, Status, GPUUtilization, GPUMemoryUtilization}' # Step 3: View GPU-related errors aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \ --keyword "cuda" --limit 30 --user-agent AlibabaCloud-Agent-Skills ``` ### GPU Issue Analysis | Issue | Symptom | Solution | |-------|---------|----------| | Insufficient GPU memory | CUDA out of memory | Reduce batch size or upgrade GPU | | Low GPU utilization | Poor inference efficiency | Increase batch size or use multi-stream | | GPU driver error | Driver error | Check driver version compatibility | | Multi-GPU communication failure | NCCL error | Check network configuration, tp parameter | ### Tensor 
Parallel Check ```bash # Check tp parameter in command aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.Image.Command' | grep -oP '(?<=--tp\s)\d+' # Verify GPU count matches aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{GPU, Command}' ``` --- ## Diagnosis Report Generation ### One-Click Diagnosis Script ```bash #!/bin/bash SERVICE="my-service" REGION="cn-hangzhou" echo "=== PAI-EAS Service Diagnosis Report ===" echo "Service: $SERVICE" echo "Region: $REGION" echo "" echo "## Service Status" aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{Status, RunningInstance, TotalInstance, Message}' echo "" echo "## Recent Events (Last 5)" aliyun eas describe-service-event --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Events[-5:] | .[] | {Time, Type, Reason, Message}' echo "" echo "## Instance Status" aliyun eas list-service-instances --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Instances[] | {InstanceId, Status}' echo "" echo "## Error Logs (Last 10)" aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \ --keyword "error" --limit 10 --user-agent AlibabaCloud-Agent-Skills | jq '.' 
echo "" echo "## Diagnosis Suggestions" aliyun eas describe-service-diagnosis --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.DiagnosisItems[] | {Name, Status, Suggestion}' ``` FILE:references/error-codes.md # Error Code Reference **Table of Contents** - [Container Startup Errors](#container-startup-errors) - [Service State Errors](#service-state-errors) - [Network Access Errors](#network-access-errors) - [Resource Group Errors](#resource-group-errors) - [Health Check Errors](#health-check-errors) ## Container Startup Errors ### ImagePullBackOff (Image Pull Failure) **Error message**: ``` Failed to pull image "xxx": rpc error: code = Unknown desc = Error response from daemon: pull access denied ``` **Possible causes**: 1. Incorrect image address 2. Insufficient image registry permissions 3. Image does not exist 4. Network unreachable (public image) **Solutions**: | Cause | Solution | |-------|----------| | Incorrect image address | Check image address format, use VPC internal address | | Insufficient permissions | Configure image registry access credentials | | Image does not exist | Confirm image has been pushed to registry | | Network unreachable | Use VPC internal image address | **VPC image address format**: ``` eas-registry-vpc.{region}.cr.aliyuncs.com/pai-eas/{image}:{tag} ``` ### CrashLoopBackOff (Container Startup Failure) **Error message**: ``` Back-off restarting failed container ``` **Possible causes**: 1. Incorrect startup command 2. Dependency service not ready 3. Missing configuration file 4. 
Port conflict **Diagnostic commands**: ```bash SERVICE="my-service" REGION="cn-hangzhou" # View container logs aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE --keyword "error" --limit 50 --user-agent AlibabaCloud-Agent-Skills # View startup events aliyun eas describe-service-event --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Events[] | select(.Reason == "Started" or .Reason == "Failed")' ``` ### OOMKilled (Out of Memory) **Error message**: ``` Container xxx was OOMKilled ``` **Possible causes**: 1. Memory specification too small 2. Memory leak 3. Model loading consumes too much memory **Solutions**: | Scenario | Solution | |----------|----------| | Specification too small | Upgrade memory specification | | Memory leak | Investigate code, fix leak | | Model too large | Use model quantization, reduce batch size | **Common memory specifications**: | Specification | Memory | Use Case | |--------------|--------|----------| | ecs.r7.large | 16Gi | Lightweight services | | ecs.r7.xlarge | 32Gi | Medium services | | ecs.gn6i-c8g1.2xlarge | 31Gi | GPU inference | | ecs.gn6e-c12g1.3xlarge | 92Gi | Large model inference | ### OutOfGPU (Insufficient GPU Resources) **Error message**: ``` Insufficient nvidia.com/gpu ``` **Possible causes**: 1. Incorrect GPU specification selected 2. Insufficient GPU inventory in public resource group 3. GPU node failure in dedicated resource group **Solutions**: | Cause | Solution | |-------|----------| | Incorrect specification | Check `--tp` parameter in command, ensure it matches GPU count | | Insufficient inventory | Switch region or specification, or use dedicated resource group | | Node failure | Check resource group node status | --- ## Service State Errors ### Creating (Creation Timeout) **Symptom**: Service stuck in Creating state for a long time **Possible causes**: 1. Waiting for resource allocation 2. Slow image pull 3. 
Startup command stuck **Diagnostic flow**: ```bash SERVICE="my-service" REGION="cn-hangzhou" # 1. Check events aliyun eas describe-service-event --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills # 2. Check logs aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE --limit 100 --user-agent AlibabaCloud-Agent-Skills # 3. Check instance status aliyun eas list-service-instances --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills ``` ### Failed (Service Failure) **Symptom**: Service status is Failed **Diagnostic flow**: ```bash SERVICE="my-service" REGION="cn-hangzhou" # 1. Get failure reason aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{Status, Message, Reason}' # 2. View failure events aliyun eas describe-service-event --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Events[] | select(.Type == "Warning" or .Type == "Normal") | select(.Reason | test("Fail|Error"; "i"))' ``` --- ## Network Access Errors ### Service Inaccessible **Possible causes**: 1. Gateway status anomaly 2. Security group misconfiguration 3. Token configuration error 4. VPC network misconfiguration **Diagnostic flow**: ```bash SERVICE="my-service" REGION="cn-hangzhou" # 1. Check gateway status aliyun eas describe-gateway --cluster-id $REGION --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills | \ jq '{Status: .LoadBalancerList[0].Status}' # 2. Check service endpoints aliyun eas describe-service-endpoints --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills # 3. 
Test connectivity (execute within VPC) curl --connect-timeout 10 --max-time 30 -H "Authorization: xxx" http://xxx.vpc.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/my-service ``` ### Token Authentication Failure **Error message**: ```json {"code": "Unauthorized", "message": "Invalid token"} ``` **Solutions**: 1. Verify Token is correct 2. Check if Token has expired 3. Regenerate Token ```bash SERVICE="my-service" REGION="cn-hangzhou" # Get Token aliyun eas describe-service-endpoints --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq -r '.Token' ``` --- ## Resource Group Errors ### Dedicated Resource Group Node Anomaly **Diagnostic commands**: ```bash REGION="cn-hangzhou" # Check resource group status aliyun eas describe-resource --cluster-id $REGION --resource-id eas-r-xxx --user-agent AlibabaCloud-Agent-Skills | \ jq '{Name, Status, TotalNodes, HealthyNodes}' # Check node details aliyun eas describe-resource --cluster-id $REGION --resource-id eas-r-xxx --user-agent AlibabaCloud-Agent-Skills | \ jq '.Nodes[] | select(.Status != "Running")' ``` **Node status descriptions**: | Status | Description | |--------|-------------| | Running | Running normally | | NotReady | Node anomaly | | Offline | Node offline | --- ## Health Check Errors ### Health Check Failure **Error message**: ``` Liveness probe failed: HTTP probe failed with statuscode: 500 ``` **Possible causes**: 1. Service startup not complete 2. Incorrect health check path 3. Incorrect health check port 4. 
Internal service error **Solutions**: | Cause | Solution | |-------|----------| | Startup not complete | Increase `initial_delay_seconds` | | Incorrect path | Check `http_get.path` configuration | | Incorrect port | Check `http_get.port` configuration | | Service error | Check service logs for investigation | **Health check configuration reference**: ```json { "startup_check": { "http_get": { "path": "/health", "port": 8000 }, "initial_delay_seconds": 30, "period_seconds": 10, "timeout_seconds": 5, "failure_threshold": 3 } } ``` FILE:references/health-check.md # Health Check Configuration Reference **Table of Contents** - [Health Check Types](#health-check-types) - [Configuration Format](#configuration-format) - [Parameter Descriptions](#parameter-descriptions) - [Recommended Configurations](#recommended-configurations) - [Common Issues](#common-issues) - [Debugging Suggestions](#debugging-suggestions) ## Health Check Types PAI-EAS supports three types of health checks: | Type | Purpose | Description | |------|---------|-------------| | `startup_check` | Startup check | Subsequent checks only proceed after service starts successfully | | `liveness_check` | Liveness check | Checks if service is alive; restarts on failure | | `readiness_check` | Readiness check | Checks if service is ready; removes from load balancer on failure | --- ## Configuration Format ### HTTP Check ```json { "startup_check": { "http_get": { "path": "/health", "port": 8000, "scheme": "HTTP" }, "initial_delay_seconds": 30, "period_seconds": 10, "timeout_seconds": 5, "failure_threshold": 3, "success_threshold": 1 } } ``` ### TCP Check ```json { "liveness_check": { "tcp_socket": { "port": 8000 }, "initial_delay_seconds": 15, "period_seconds": 10, "timeout_seconds": 5, "failure_threshold": 3 } } ``` ### Command Check ```json { "readiness_check": { "exec": { "command": ["/bin/sh", "-c", "test -f /tmp/ready"] }, "initial_delay_seconds": 5, "period_seconds": 5, "failure_threshold": 3 } } ``` --- ## 
Parameter Descriptions

| Parameter | Description | Default |
|-----------|-------------|---------|
| `initial_delay_seconds` | Delay before the first check (seconds) | 10 |
| `period_seconds` | Interval between checks (seconds) | 10 |
| `timeout_seconds` | Timeout for a single check (seconds) | 1 |
| `failure_threshold` | Failure threshold | 3 |
| `success_threshold` | Success threshold | 1 |

---

## Recommended Configurations

### LLM Inference Service

LLM services start slowly (model loading) and require a longer initial delay:

```json
{
  "startup_check": {
    "http_get": { "path": "/health", "port": 8000 },
    "initial_delay_seconds": 120,
    "period_seconds": 10,
    "timeout_seconds": 10,
    "failure_threshold": 30
  },
  "liveness_check": {
    "http_get": { "path": "/health", "port": 8000 },
    "initial_delay_seconds": 150,
    "period_seconds": 30,
    "timeout_seconds": 10,
    "failure_threshold": 3
  },
  "readiness_check": {
    "http_get": { "path": "/v1/models", "port": 8000 },
    "initial_delay_seconds": 150,
    "period_seconds": 10,
    "timeout_seconds": 5,
    "failure_threshold": 3
  }
}
```

**Notes**:
- `startup_check` allows about 7 minutes of startup time in total (120 s initial delay + 10 s x 30 failed checks = 420 s)
- `liveness_check` runs every 30 seconds so that frequent checks do not impact performance
- `readiness_check` queries `/v1/models` to ensure the API is available

### Image Generation Service

```json
{
  "startup_check": {
    "http_get": { "path": "/health", "port": 8188 },
    "initial_delay_seconds": 60,
    "period_seconds": 5,
    "timeout_seconds": 5,
    "failure_threshold": 60
  },
  "liveness_check": {
    "http_get": { "path": "/health", "port": 8188 },
    "initial_delay_seconds": 90,
    "period_seconds": 15,
    "timeout_seconds": 5,
    "failure_threshold": 3
  }
}
```

### General Inference Service

```json
{
  "startup_check": {
    "http_get": { "path": "/ping", "port": 8080 },
    "initial_delay_seconds": 30,
    "period_seconds": 5,
    "timeout_seconds": 3,
    "failure_threshold": 20
  },
  "liveness_check": {
    "http_get": { "path": "/ping", "port": 8080 },
    "initial_delay_seconds": 60,
    "period_seconds": 10,
    "timeout_seconds": 3,
    "failure_threshold": 3
  },
"readiness_check": { "http_get": { "path": "/ping", "port": 8080 }, "initial_delay_seconds": 60, "period_seconds": 5, "timeout_seconds": 3, "failure_threshold": 3 } } ``` --- ## Common Issues ### Issue 1: Service starts normally but health check fails **Symptom**: Service logs are normal, but instances keep restarting **Cause**: - Incorrect health check path - Incorrect health check port - `initial_delay_seconds` set too short **Solution**: ```bash SERVICE="my-service" REGION="cn-hangzhou" # Check health check configuration aliyun eas describe-service --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '{StartupCheck, LivenessCheck, ReadinessCheck}' # Check service listening port aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \ --keyword "listening" --limit 10 --user-agent AlibabaCloud-Agent-Skills ``` ### Issue 2: Intermittent service unavailability **Symptom**: Service intermittently returns 503 errors **Cause**: - `readiness_check` is too sensitive - Slow service response causes timeout **Solution**: - Increase `timeout_seconds` - Increase `failure_threshold` - Adjust check path ### Issue 3: Frequent service restarts **Symptom**: Instances restart frequently **Cause**: - `liveness_check` path responds slowly - `period_seconds` too short - Service itself has issues **Solution**: ```bash SERVICE="my-service" REGION="cn-hangzhou" # View restart reasons aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \ --keyword "liveness" --limit 20 --user-agent AlibabaCloud-Agent-Skills # Check health check failure events aliyun eas describe-service-event --cluster-id $REGION --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills | \ jq '.Events[] | select(.Reason | test("Unhealthy|ProbeFailed"; "i"))' ``` --- ## Debugging Suggestions ### 1. 
Verify Health Check Endpoint

```bash
# Test inside container (if terminal access is available)
curl --connect-timeout 10 --max-time 30 -v http://localhost:8000/health
```

### 2. View Health Check Logs

```bash
SERVICE="my-service"
REGION="cn-hangzhou"

aliyun eas describe-service-log --cluster-id $REGION --service-name $SERVICE \
  --keyword "health" --limit 30 --user-agent AlibabaCloud-Agent-Skills
```

### 3. Temporarily Disable Health Checks

For debugging, you can temporarily take the liveness and readiness checks out of the service configuration. The commented lines below only mark which entries to remove; standard JSON does not support comments, so delete the entries if the commented form is rejected:

```json
{
  // "liveness_check": { ... },
  // "readiness_check": { ... }
}
```

### 4. Increase Initial Delay

If unsure about service startup time, set a larger value first:

```json
{
  "startup_check": {
    "http_get": { "path": "/health", "port": 8000 },
    "initial_delay_seconds": 300,
    "period_seconds": 10,
    "timeout_seconds": 10,
    "failure_threshold": 60
  }
}
```

FILE:references/ram-policies.md

# RAM Policies

This document lists the minimum RAM permissions required for PAI-EAS service diagnosis.
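Because these permissions are meant to stay read-only (`Describe*` and `List*` actions only), a quick local check can confirm that a saved copy of the policy has not picked up a write or wildcard action. The following is a minimal sketch, not part of the skill itself: the function name `check_read_only` and the file name `policy.json` are hypothetical, and it assumes the policy has been saved to a local file.

```bash
#!/usr/bin/env bash
# Sketch only: check_read_only and policy.json are hypothetical names.

# Print every eas:<Action> in the given policy file that is not a plain
# Describe* or List* action (wildcards such as eas:* are flagged too).
# Returns 1 if at least one such action is found, 0 otherwise.
check_read_only() {
  local policy_file="$1"
  local offenders
  # Extract each quoted "eas:..." string, strip the quotes,
  # then drop the actions that start with Describe or List.
  offenders=$(grep -oE '"eas:[A-Za-z*]+"' "$policy_file" \
    | tr -d '"' \
    | grep -vE '^eas:(Describe|List)[A-Za-z]+$' || true)
  if [ -n "$offenders" ]; then
    echo "non-read-only actions found:"
    echo "$offenders"
    return 1
  fi
  echo "all actions are read-only"
}

# Usage (assuming the policy was saved as policy.json):
#   check_read_only policy.json
```

The check is purely textual and only looks at action names, so it complements a review in the RAM console rather than replacing one.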
## Minimum Permission Policy

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eas:DescribeService",
        "eas:DescribeServiceLog",
        "eas:DescribeServiceEvent",
        "eas:DescribeServiceDiagnosis",
        "eas:DescribeServiceInstanceDiagnosis",
        "eas:ListServiceInstances",
        "eas:ListServiceContainers",
        "eas:DescribeServiceEndpoints",
        "eas:ListServices",
        "eas:DescribeResource",
        "eas:ListResources",
        "eas:DescribeGateway",
        "eas:ListGateway"
      ],
      "Resource": "*"
    }
  ]
}
```

## Permission Descriptions

### EAS Service Diagnosis Permissions

| Action | Description | Use Case |
|--------|-------------|----------|
| `eas:DescribeService` | Query service details | Check service status, get error messages |
| `eas:DescribeServiceLog` | Query service logs | View error logs, diagnose issues |
| `eas:DescribeServiceEvent` | Query service events | Understand event timeline, troubleshoot issues |
| `eas:DescribeServiceDiagnosis` | Service diagnosis report | Get diagnostic suggestions |
| `eas:DescribeServiceInstanceDiagnosis` | Instance diagnosis | Diagnose individual instance issues |
| `eas:ListServiceInstances` | List instances | Check instance status |
| `eas:ListServiceContainers` | List containers | Check container restart count |
| `eas:DescribeServiceEndpoints` | Query service endpoints | Get access URLs, troubleshoot network issues |
| `eas:ListServices` | List services | View service list, filter abnormal services |
| `eas:DescribeResource` | Query resource group details | Check dedicated resource group status |
| `eas:ListResources` | List resource groups | View available resource groups |
| `eas:DescribeGateway` | Query gateway details | Check gateway status, troubleshoot access issues |
| `eas:ListGateway` | List gateways | View available gateways |

## Permission Characteristics

This Skill only includes **read-only permissions** and does not involve any write operations:

- All permissions are `Describe*` or `List*` type
- Will not modify, create, or delete any 
resources
- Suitable for security auditing and issue diagnosis scenarios

## Permission Check

Use the following command to list the policies attached to a RAM user:

```bash
aliyun ram ListPoliciesForUser --UserName <username>
```

Or view user authorization policies through the RAM console.

## Least Privilege Principle

This Skill follows the principle of least privilege:

1. Only requests read-only permissions needed for diagnosis
2. Does not request any write permissions (e.g., `CreateService`, `DeleteService`)
3. Does not use wildcard actions (e.g., `eas:*`)

FILE:references/related-apis.md

# Related API List

This document lists all APIs and their CLI commands involved in PAI-EAS service diagnosis.

**Table of Contents**
- [EAS Service Diagnosis APIs](#eas-service-diagnosis-apis)
- [Resource Group APIs](#resource-group-apis)
- [Gateway APIs](#gateway-apis)
- [CLI Command Details](#cli-command-details)
- [SDK Invocation Metadata](#sdk-invocation-metadata)

## EAS Service Diagnosis APIs

| API | CLI Command | Description |
|-----|------------|-------------|
| DescribeService | `aliyun eas describe-service --user-agent AlibabaCloud-Agent-Skills` | Query service details |
| DescribeServiceLog | `aliyun eas describe-service-log --user-agent AlibabaCloud-Agent-Skills` | Query service logs |
| DescribeServiceEvent | `aliyun eas describe-service-event --user-agent AlibabaCloud-Agent-Skills` | Query service events |
| DescribeServiceDiagnosis | `aliyun eas describe-service-diagnosis --user-agent AlibabaCloud-Agent-Skills` | Service diagnosis report |
| DescribeServiceInstanceDiagnosis | `aliyun eas describe-service-instance-diagnosis --user-agent AlibabaCloud-Agent-Skills` | Instance diagnosis |
| ListServiceInstances | `aliyun eas list-service-instances --user-agent AlibabaCloud-Agent-Skills` | List instances |
| ListServiceContainers | `aliyun eas list-service-containers --instance-name <instance> --user-agent AlibabaCloud-Agent-Skills` | List containers (requires --instance-name) |
| 
DescribeServiceEndpoints | `aliyun eas describe-service-endpoints --user-agent AlibabaCloud-Agent-Skills` | Service endpoints | | ListServices | `aliyun eas list-services --user-agent AlibabaCloud-Agent-Skills` | List services | ## Resource Group APIs | API | CLI Command | Description | |-----|------------|-------------| | DescribeResource | `aliyun eas describe-resource --user-agent AlibabaCloud-Agent-Skills` | Resource group details | | ListResources | `aliyun eas list-resources --user-agent AlibabaCloud-Agent-Skills` | List resource groups | ## Gateway APIs | API | CLI Command | Description | |-----|------------|-------------| | DescribeGateway | `aliyun eas describe-gateway --user-agent AlibabaCloud-Agent-Skills` | Gateway details | | ListGateways | `aliyun eas list-gateways --user-agent AlibabaCloud-Agent-Skills` | List gateways | --- ## CLI Command Details ### Service Status Query ```bash aliyun eas describe-service --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills aliyun eas list-services --cluster-id $CLUSTER_ID --user-agent AlibabaCloud-Agent-Skills ``` ### Log Query ```bash # Basic query aliyun eas describe-service-log --cluster-id $CLUSTER_ID --service-name $SERVICE --limit 100 --user-agent AlibabaCloud-Agent-Skills # Keyword filtering aliyun eas describe-service-log --cluster-id $CLUSTER_ID --service-name $SERVICE \ --keyword "error" --limit 50 --user-agent AlibabaCloud-Agent-Skills # Time range aliyun eas describe-service-log --cluster-id $CLUSTER_ID --service-name $SERVICE \ --start-time "2026-03-19T00:00:00Z" --end-time "2026-03-19T23:59:59Z" --user-agent AlibabaCloud-Agent-Skills ``` ### Event Query ```bash aliyun eas describe-service-event --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills ``` ### Instance Query ```bash aliyun eas list-service-instances --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills aliyun eas list-service-containers 
--cluster-id $CLUSTER_ID --service-name $SERVICE --instance-name $INSTANCE_NAME --user-agent AlibabaCloud-Agent-Skills ``` ### Diagnosis API ```bash aliyun eas describe-service-diagnosis --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills aliyun eas describe-service-instance-diagnosis --cluster-id $CLUSTER_ID \ --service-name $SERVICE --instance-id i-xxx --user-agent AlibabaCloud-Agent-Skills ``` ### Endpoint Query ```bash aliyun eas describe-service-endpoints --cluster-id $CLUSTER_ID --service-name $SERVICE --user-agent AlibabaCloud-Agent-Skills ``` ### Resource Group Query ```bash aliyun eas describe-resource --cluster-id $CLUSTER_ID --resource-id eas-r-xxx --user-agent AlibabaCloud-Agent-Skills aliyun eas list-resources --cluster-id $CLUSTER_ID --user-agent AlibabaCloud-Agent-Skills ``` ### Gateway Query ```bash aliyun eas describe-gateway --cluster-id $CLUSTER_ID --gateway-id gw-xxx --user-agent AlibabaCloud-Agent-Skills aliyun eas list-gateways --cluster-id $CLUSTER_ID --user-agent AlibabaCloud-Agent-Skills ``` --- ## SDK Invocation Metadata If you need to use Python Common SDK instead of CLI, here is the metadata for each API: | Service | API | popCode | popVersion | |---------|-----|---------|------------| | EAS | DescribeService | eas | 2021-07-01 | | EAS | DescribeServiceLog | eas | 2021-07-01 | | EAS | DescribeServiceEvent | eas | 2021-07-01 | | EAS | DescribeServiceDiagnosis | eas | 2021-07-01 | | EAS | DescribeServiceInstanceDiagnosis | eas | 2021-07-01 | | EAS | ListServiceInstances | eas | 2021-07-01 | | EAS | ListServiceContainers | eas | 2021-07-01 | | EAS | DescribeServiceEndpoints | eas | 2021-07-01 | | EAS | ListServices | eas | 2021-07-01 | | EAS | DescribeResource | eas | 2021-07-01 | | EAS | ListResources | eas | 2021-07-01 | | EAS | DescribeGateway | eas | 2021-07-01 | | EAS | ListGateways | eas | 2021-07-01 | FILE:references/verification-method.md # Verification Method This document describes how to verify 
that PAI-EAS service diagnosis is executed correctly. ## Diagnosis Verification Flow ### Step 1: Verify Service Status Query ```bash aliyun eas describe-service \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills ``` **Expected result**: - Response JSON contains `Status`, `RunningInstance`, `TotalInstance` fields - Status value is a valid value such as `Running`, `Creating`, `Failed` ### Step 2: Verify Event Query ```bash aliyun eas describe-service-event \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills ``` **Expected result**: - Response JSON contains `Events` array - Each event contains `Time`, `Type`, `Reason`, `Message` fields ### Step 3: Verify Log Query ```bash aliyun eas describe-service-log \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --keyword "error" \ --limit 10 \ --user-agent AlibabaCloud-Agent-Skills ``` **Expected result**: - Response JSON contains log content - Keyword filtering works correctly ### Step 4: Verify Instance Status Query ```bash aliyun eas list-service-instances \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills ``` **Expected result**: - Response JSON contains `Instances` array - Each instance contains `InstanceId`, `Status` fields ### Step 5: Verify Diagnosis Report ```bash aliyun eas describe-service-diagnosis \ --cluster-id cn-hangzhou \ --service-name <service-name> \ --user-agent AlibabaCloud-Agent-Skills ``` **Expected result**: - Response JSON contains diagnostic information - May contain `DiagnosisItems` array --- ## Diagnosis Result Verification ### Service Startup Failure Diagnosis **Expected behavior**: 1. Query service status, get `Message` field 2. Query event list, filter `Warning` type events 3. Query logs, use `error|fail` keyword filter 4. 
Return diagnostic result and possible causes **Verification points**: - Correctly identifies service status as `Failed` - Obtains failure reason (`Message` field) - Returns relevant error logs ### Slow Service Response Diagnosis **Expected behavior**: 1. Check if instance count is sufficient 2. Check instance resource utilization 3. Query slow query/timeout logs **Verification points**: - Returns CPU/memory utilization - Identifies potential performance bottlenecks ### Frequent Instance Restart Diagnosis **Expected behavior**: 1. Query `Restarted` events 2. Check container `RestartCount` 3. Query health check related logs **Verification points**: - Returns restart count - Identifies restart cause (OOM, health check, etc.) ### Service Inaccessible Diagnosis **Expected behavior**: 1. Check service status 2. Get service endpoints 3. Check gateway status **Verification points**: - Returns service endpoint information - Returns gateway status --- ## Error Scenario Verification ### Service Does Not Exist ```bash aliyun eas describe-service \ --cluster-id cn-hangzhou \ --service-name non-existent-service \ --user-agent AlibabaCloud-Agent-Skills ``` **Expected result**: - Returns error message indicating service does not exist ### Insufficient Permissions **Expected result**: - Returns `Forbidden` or permission-related error - Prompts user to check RAM permissions ### Invalid Region ```bash aliyun eas describe-service \ --cluster-id invalid-region \ --service-name my-service \ --user-agent AlibabaCloud-Agent-Skills ``` **Expected result**: - Returns error message indicating invalid region --- ## Diagnosis Suggestion Verification ### OOMKilled Scenario **Expected diagnosis suggestions**: - Increase memory specification - Check for memory leaks - Use model quantization ### ImagePullBackOff Scenario **Expected diagnosis suggestions**: - Check if image address is correct - Check image registry permissions - Use VPC internal image address ### CrashLoopBackOff Scenario 
**Expected diagnosis suggestions**: - Check startup command - Check dependency services - Check configuration files ### Health Check Failure Scenario **Expected diagnosis suggestions**: - Increase initial delay time - Check health check path - Check port configuration
---
name: alibabacloud-polardbx-ai-assistant
description: |
Alibaba Cloud PolarDB-X Distributed Database AI Assistant. Use for PolarDB-X cluster management,
topology inspection, performance diagnostics, SQL optimization, data distribution analysis,
elastic scaling diagnostics, connection/session analysis, security audit, backup/restore,
parameter tuning, and other O&M operations.
Triggers: "PolarDB-X", "distributed database", "pxc-", "DN/CN nodes", "data sharding",
"PolarDB-X diagnostics", "PolarDB-X performance", "PolarDB-X slow SQL", "YaoChi Agent",
"PolarDB-X topology", "PolarDB-X backup", "PolarDB-X security audit", "PolarDB-X scaling"
---
# PolarDB-X Distributed Database AI Assistant
This skill provides intelligent O&M capabilities for **Alibaba Cloud PolarDB-X distributed database**, powered by the DAS (Database Autonomy Service) `get-yao-chi-agent` API via the aliyun CLI DAS plugin.
**Architecture**: `Aliyun CLI` -> `DAS Plugin (Signature V3)` -> `get-yao-chi-agent API` -> PolarDB-X Intelligent Diagnostics
### Supported Capabilities
| Capability | Description |
|------------|-------------|
| Instance Information | Instance details, cluster configuration, version info |
| Topology Inspection | CN/DN node distribution, shard topology, data node status |
| Storage Information | Disk usage, storage capacity, space growth trends |
| SQL Optimization | Slow SQL analysis, distributed execution plan interpretation, index suggestions, cross-shard query optimization |
| Cluster Performance Monitoring | QPS/TPS/connections/throughput and other core metrics analysis |
| Connection & Session Analysis | Connection count monitoring, session troubleshooting, connection pool optimization |
| Security Audit | Whitelist, SSL, security policy audit |
| Backup Status Check | Backup completeness, retention policy, point-in-time recovery |
| Instance Health Check | Instance health status, running state verification |
| Log Diagnostics | Error log analysis, slow log troubleshooting |
| Data Distribution Analysis | Shard data skew detection, hotspot shard diagnostics |
| Elastic Scaling Diagnostics | Scale-out/in status, data migration progress, node load balancing |
| Parameter Tuning | Kernel parameter interpretation, change impact assessment, configuration recommendations |
| High Availability & DR | HA configuration assessment, disaster recovery architecture diagnostics, primary-standby status |
| Expiring Instance Query | Subscription instance expiration reminders, renewal status |
## Installation
> **Pre-check: Aliyun CLI >= 3.3.3 required**
> Run `aliyun version` to verify >= 3.3.3. If not installed or version too low,
> run `curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash` to install/update,
> or see [references/cli-installation-guide.md](references/cli-installation-guide.md) for installation instructions.
> **Pre-check: Aliyun CLI plugin update required**
> **[MUST]** run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
> **[MUST]** run `aliyun plugin update` to ensure that any existing plugins on your local machine are always up-to-date.
> **[MUST] CLI User-Agent** — Every `aliyun` CLI command invocation must include:
> `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-polardbx-ai-assistant`
> **[MUST] Enable AI-Mode** — AI-mode is required for Agent Skill execution.
> Run the following commands before any CLI invocation:
> ```bash
> aliyun configure ai-mode enable
> aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-polardbx-ai-assistant"
> ```
> **[MUST] Disable AI-Mode at EVERY exit point** — Before delivering the final response for ANY reason, always disable AI-mode first. This applies to ALL exit paths: workflow success, workflow failure, error/exception, user cancellation, session end, or any other scenario where no further CLI commands will be executed.
> AI-mode is only used for Agent Skill invocation scenarios and MUST NOT remain enabled after the skill stops running.
> ```bash
> aliyun configure ai-mode disable
> ```
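One way to honor the disable-at-every-exit rule is a shell `trap`. The sketch below is only an assumption about how a wrapper could be structured and is not part of this skill's scripts; `run_with_ai_mode` is a hypothetical helper name, and only the three `aliyun configure ai-mode` commands are taken from the rules above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one command with AI-mode enabled and make sure
# AI-mode is disabled again when the command finishes, fails, or calls exit.
run_with_ai_mode() {
  (
    # The EXIT trap fires when this subshell ends, whatever the exit status.
    trap 'aliyun configure ai-mode disable' EXIT
    aliyun configure ai-mode enable
    aliyun configure ai-mode set-user-agent \
      --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-polardbx-ai-assistant"
    "$@"
  )
}
```

Running the work in a subshell keeps the trap local to the wrapped command, so the caller's own traps are left alone and the wrapped command's exit status is passed through.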
```bash
# Install aliyun CLI
curl -fsSL https://aliyuncli.alicdn.com/install.sh | bash
aliyun version # Verify >= 3.3.3
# Enable automatic plugin installation
aliyun configure set --auto-plugin-install true
# Install DAS plugin (get-yao-chi-agent requires plugin for Signature V3 support)
aliyun plugin install --names aliyun-cli-das
# Install jq (for JSON response parsing)
# macOS:
brew install jq
# Ubuntu/Debian:
# sudo apt-get install jq
```
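The `>= 3.3.3` pre-check can be automated rather than read by eye. The following is a minimal sketch, assuming `aliyun version` prints a bare dotted version such as `3.3.3`; `version_ge` and `check_aliyun_cli` are hypothetical helper names, not part of the CLI or of this skill.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed when dotted numeric version A is >= version B.
# Compares the first three fields numerically; missing fields count as 0.
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)
  local i x y
  for i in 0 1 2; do
    x=$((10#${have[i]:-0}))
    y=$((10#${want[i]:-0}))
    if (( x > y )); then return 0; fi
    if (( x < y )); then return 1; fi
  done
  return 0
}

# Hypothetical pre-check (assumption: `aliyun version` prints only the version).
check_aliyun_cli() {
  local installed
  installed=$(aliyun version 2>/dev/null) || { echo "aliyun CLI not found" >&2; return 1; }
  version_ge "$installed" "3.3.3" || { echo "aliyun CLI $installed is older than 3.3.3" >&2; return 1; }
}
```

A field-by-field numeric comparison is used because a plain string comparison would rank `3.10.0` below `3.3.3`.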
## Environment Variables
No additional environment variables are required. This skill relies entirely on the aliyun CLI's existing credential configuration.
## Authentication
> **Pre-check: Alibaba Cloud Credentials Required**
>
> **Security Rules:**
> - **NEVER** read, echo, or print AK/SK values (e.g., `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` is FORBIDDEN)
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **NEVER** use `aliyun configure set` with literal credential values
> - **ONLY** use `aliyun configure list` to check credential status
>
> ```bash
> aliyun configure list
> ```
> Check the output for a valid profile (AK, STS, or OAuth identity).
>
> **If no valid profile exists, STOP here.**
> 1. Obtain credentials from [Alibaba Cloud Console](https://ram.console.aliyun.com/manage/ak)
> 2. Configure credentials **outside of this session** (via `aliyun configure` in terminal or environment variables in shell profile)
> 3. Return and re-run after `aliyun configure list` shows a valid profile
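The credential pre-check above can be scripted without ever touching key material. This is a sketch under a stated assumption: that `aliyun configure list` prints a pipe-separated table whose third column is headed `Valid` and contains `Valid` or `Invalid` per profile. `has_valid_profile` is a hypothetical helper; adjust the column index if your CLI version prints a different layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `aliyun configure list` output on stdin and succeed
# if at least one profile row has "Valid" in its third column.
# Assumed layout: Profile | Credential | Valid | Region | Language
has_valid_profile() {
  awk -F'|' '
    { gsub(/^[ \t]+|[ \t]+$/, "", $3) }       # trim the third column
    NR > 1 && $3 == "Valid" { found = 1 }     # NR > 1 skips the header row
    END { exit found ? 0 : 1 }
  '
}
```

Typical use would be `aliyun configure list | has_valid_profile || echo "No valid profile; configure credentials outside this session" >&2`. The helper prints nothing itself, which keeps it in line with the security rules above.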
Credentials are managed through aliyun CLI configuration — **no additional AK/SK setup is needed**:
```bash
# Recommended: OAuth mode
aliyun configure --mode OAuth
# Alternative: AK mode (configure outside of agent session)
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
# Cross-account access: RamRoleArn mode
aliyun configure set \
--mode RamRoleArn \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--ram-role-arn acs:ram::<account-id>:role/<role-name> \
--role-session-name yaochi-agent-session \
--region cn-hangzhou
```
## RAM Policy
See [references/ram-policies.md](references/ram-policies.md) for the full list of required permissions.
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Use `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
## Parameter Confirmation
> **IMPORTANT: Parameter Confirmation** — Before executing any command or API call,
> ALL user-customizable parameters (e.g., RegionId, instance names, CIDR blocks,
> passwords, domain names, resource specifications, etc.) MUST be confirmed with the
> user. Do NOT assume or use default values without explicit user approval.
| Parameter | Required/Optional | Description | Default Value |
|-----------|-------------------|-------------|---------------|
| `query` | Required | Natural language query content (including region, instance info, etc.) | - |
| `--session-id` | Optional | Session ID for multi-turn conversation context | - |
| `--profile` | Optional | aliyun CLI profile name | default |
## Core Workflow
All intelligent O&M operations **MUST** be invoked through `scripts/call_yaochi_agent.sh`, which wraps the `aliyun das get-yao-chi-agent` command (the DAS plugin's kebab-case command, which supports Signature V3) and parses its streaming response.
> **⚠️ CRITICAL RESTRICTION:**
> - **DO NOT** use direct `aliyun polardbx` or `aliyun rds` CLI commands for diagnostics, topology, or security audits.
> - **DO NOT** attempt to query instance details using `DescribeDBInstances` or similar APIs directly.
> - **ONLY** use the DAS plugin command: `aliyun das get-yao-chi-agent` (wrapped by `call_yaochi_agent.sh`).
> - If the script fails, check permissions via the `ram-permission-diagnose` skill; **DO NOT** fall back to other product APIs.
```bash
# Cluster Management
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List PolarDB-X instances in Hangzhou region"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show detailed configuration of instance pxc-xxx"
# Topology Inspection
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show CN/DN node distribution of instance pxc-xxx"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show shard topology of instance pxc-xxx"
# Performance Diagnostics
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Analyze performance of instance pxc-xxx in the last hour"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show slow SQL of instance pxc-xxx"
# SQL Optimization
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Optimize execution plan of this SQL on instance pxc-xxx"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Which cross-shard queries on instance pxc-xxx need optimization"
# Data Distribution
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Check whether data distribution of instance pxc-xxx is even"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Are there any hotspot shards on instance pxc-xxx"
# Elastic Scaling
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show scale-out status of instance pxc-xxx"
# Parameter Tuning
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "How to tune CONN_POOL_MAX_POOL_SIZE parameter on instance pxc-xxx"
# Connection & Session
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "How to troubleshoot high connection count on instance pxc-xxx"
# Backup & Restore
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show backup status of instance pxc-xxx"
# Security Audit
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Check security configuration of instance pxc-xxx"
# High Availability & DR
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Show HA configuration of instance pxc-xxx"
# Multi-turn Conversation (use session ID returned from previous call)
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Continue analysis" --session-id "<session-id>"
# Specify profile
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances" --profile myprofile
# Read from stdin
echo "List instances" | bash $SKILL_DIR/scripts/call_yaochi_agent.sh -
```
### Typical Query Examples
| Scenario | Example Query |
|----------|---------------|
| Cluster Management | Show node list of instance pxc-xxx |
| Topology | How many CN and DN nodes does instance pxc-xxx have |
| Performance Diagnostics | How to troubleshoot high CPU usage on instance pxc-xxx |
| Slow SQL Analysis | Show slow SQL of instance pxc-xxx in the last hour |
| SQL Optimization | Why is this SELECT statement slow on instance pxc-xxx |
| Data Distribution | Is there data skew in shards of instance pxc-xxx |
| Elastic Scaling | What is the scale-out progress of instance pxc-xxx |
| Parameter Tuning | How to optimize connection pool parameters on instance pxc-xxx |
| Backup & Restore | When was the latest backup of instance pxc-xxx |
| Storage Optimization | What to do about rapid storage growth on instance pxc-xxx |
| Connection Troubleshooting | Instance pxc-xxx connection count is maxed out |
| Security Audit | Check security configuration of instance pxc-xxx |
| High Availability | Is the DR architecture of instance pxc-xxx reasonable |
| Expiration Reminder | Which PolarDB-X instances are about to expire |
## Success Verification
See [references/verification-method.md](references/verification-method.md) for detailed verification steps.
## Cleanup
This skill focuses on **query and diagnostics** capabilities only. It does not create any resources, so no cleanup is needed.
The following operations are **out of scope** for this skill:
- Creating/deleting PolarDB-X instances
- Changing instance specifications
- Purchasing/renewing instances
## Command Tables
See [references/related-apis.md](references/related-apis.md) for the full list of APIs and CLI commands.
## Best Practices
1. **Instance ID Format**: PolarDB-X instance IDs start with `pxc-`. Always include the full instance ID in queries.
2. **Region Specification**: Explicitly specify the region in natural language queries (e.g., "Hangzhou region", "Beijing region") to improve query accuracy.
3. **Multi-turn Conversation**: Use `--session-id` to maintain context continuity in complex diagnostic scenarios.
4. **Concurrency Limit**: Maximum 2 concurrent sessions per account. Avoid launching multiple parallel calls.
5. **Distributed Characteristics**: When troubleshooting issues, distinguish between CN (Compute Node) and DN (Data Node) layers.
6. **Throttling Handling**: If you encounter a `Throttling.UserConcurrentLimit` error, wait for the previous query to complete before retrying.
7. **Credential Security**: Use `aliyun configure` to manage credentials. Never hardcode AK/SK in scripts.
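The throttling advice in item 6 can be wrapped in a small retry loop. The sketch below is illustrative only: `retry_on_throttle`, `MAX_TRIES`, and `RETRY_DELAY` are hypothetical names with arbitrary defaults, and it assumes the error code `Throttling.UserConcurrentLimit` appears somewhere in the command's output.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command and, if its output mentions
# Throttling.UserConcurrentLimit, wait and try again (up to MAX_TRIES times).
# MAX_TRIES and RETRY_DELAY are illustrative knobs, not documented limits.
retry_on_throttle() {
  local tries=${MAX_TRIES:-3} delay=${RETRY_DELAY:-30} out rc attempt
  for (( attempt = 1; attempt <= tries; attempt++ )); do
    rc=0
    out=$("$@" 2>&1) || rc=$?          # capture stdout and stderr together
    if [[ $out != *"Throttling.UserConcurrentLimit"* ]]; then
      printf '%s\n' "$out"
      return "$rc"
    fi
    echo "Throttled (attempt $attempt/$tries); waiting ${delay}s" >&2
    if (( attempt < tries )); then sleep "$delay"; fi
  done
  printf '%s\n' "$out"
  return 1
}
```

For example: `retry_on_throttle bash "$SKILL_DIR/scripts/call_yaochi_agent.sh" "Show slow SQL of instance pxc-xxx"`. Calls are retried one at a time, which fits the two-session concurrency limit in item 4.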
## Reference Links
| Reference | Description |
|-----------|-------------|
| [references/cli-installation-guide.md](references/cli-installation-guide.md) | Aliyun CLI installation and configuration guide |
| [references/related-apis.md](references/related-apis.md) | Related APIs and CLI command list |
| [references/ram-policies.md](references/ram-policies.md) | RAM permission policy list |
| [references/verification-method.md](references/verification-method.md) | Success verification methods |
| [references/acceptance-criteria.md](references/acceptance-criteria.md) | Acceptance criteria |
FILE:references/acceptance-criteria.md
# Acceptance Criteria: alibabacloud-polardbx-ai-assistant
**Scenario**: PolarDB-X Distributed Database AI Assistant
**Purpose**: Skill testing acceptance criteria
---
# Correct CLI Command Patterns
## 1. Product - DAS (Database Autonomy Service)
#### CORRECT
```bash
# Plugin kebab-case command (Signature V3, recommended)
aliyun das get-yao-chi-agent --query "List instances" --source "polardbx-console" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: product name wrong case
aliyun DAS get-yao-chi-agent --query "List instances"
# Error: missing --user-agent flag
aliyun das get-yao-chi-agent --query "List instances" --source "polardbx-console" --endpoint das.cn-shanghai.aliyuncs.com
```
## 2. Command - get-yao-chi-agent
#### CORRECT
```bash
aliyun das get-yao-chi-agent --query "Hello" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: API name typo
aliyun das get-yao-chi --query "Hello"
# Error: using non-existent API
aliyun das yaochi-agent --query "Hello"
```
## 3. Parameters
### get-yao-chi-agent Parameters
#### CORRECT
```bash
# Required parameter --query
aliyun das get-yao-chi-agent --query "List PolarDB-X instances in Hangzhou region" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
# Optional parameter --source
aliyun das get-yao-chi-agent --query "List instances" --source "polardbx-console" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
# Optional parameter --session-id (multi-turn)
aliyun das get-yao-chi-agent --query "Continue analysis" --session-id "sess-xxx" --source "polardbx-console" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: missing required parameter --query
aliyun das get-yao-chi-agent --source "polardbx-console"
# Error: using non-existent parameter
aliyun das get-yao-chi-agent --query "List instances" --region-id "cn-hangzhou"
```
## 4. Endpoint
#### CORRECT
```bash
# get-yao-chi-agent always uses cn-shanghai endpoint
aliyun das get-yao-chi-agent --query "List instances" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: wrong endpoint
aliyun das get-yao-chi-agent --query "List instances" --endpoint das.cn-beijing.aliyuncs.com
# Error: missing endpoint, may use wrong default
aliyun das get-yao-chi-agent --query "List instances"
```
## 5. --user-agent Flag - Required
#### CORRECT
```bash
aliyun das get-yao-chi-agent --query "List instances" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: missing --user-agent flag
aliyun das get-yao-chi-agent --query "List instances" --endpoint das.cn-shanghai.aliyuncs.com
```
## 6. Timeout Settings
#### CORRECT
```bash
# SSE streaming API needs longer read timeout (180s)
aliyun das get-yao-chi-agent --query "List instances" --endpoint das.cn-shanghai.aliyuncs.com --read-timeout 180 --connect-timeout 30 --user-agent AlibabaCloud-Agent-Skills
```
#### INCORRECT
```bash
# Error: read timeout too short, streaming API may time out
aliyun das get-yao-chi-agent --query "List instances" --endpoint das.cn-shanghai.aliyuncs.com --read-timeout 10 --user-agent AlibabaCloud-Agent-Skills
```
---
# Correct Bash Script Patterns
## 1. Script Invocation
#### CORRECT
```bash
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List PolarDB-X instances in Hangzhou region"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Analyze performance of instance pxc-xxx" --session-id "sess-xxx"
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances" --profile myprofile
echo "List instances" | bash $SKILL_DIR/scripts/call_yaochi_agent.sh -
```
#### INCORRECT
```bash
# Error: using Python script (does not exist)
uv run $SKILL_DIR/scripts/call_yaochi_agent.py "List instances"
# Error: using Python interpreter on bash script
python $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances"
# Error: using old parameter format
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances" --role-arn acs:ram::xxx:role/xxx
```
## 2. SSE Response Parsing
#### CORRECT - script auto-parses SSE response
```
# Input: SSE format response body
data: {"Content":"PolarDB-X instance list:","SessionId":"sess-abc123","ReasoningContent":""}
data: {"Content":"\n1. pxc-xxx (cn-hangzhou)","SessionId":"sess-abc123","ReasoningContent":""}
data: [DONE]
# Output: concatenated Content
PolarDB-X instance list:
1. pxc-xxx (cn-hangzhou)
```
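The concatenation shown above could be reproduced with `sed` and `jq` roughly as follows. This is a sketch of the idea, not the contents of `call_yaochi_agent.sh`; `parse_sse_content` and `parse_sse_session_id` are hypothetical names, and the field names `Content` and `SessionId` are taken from the sample events above.

```bash
#!/usr/bin/env bash
# Hypothetical parser: read an SSE body on stdin and print the concatenated
# Content fields. Lines that do not start with "data: " are skipped.
parse_sse_content() {
  sed -n 's/^data: //p' \
    | grep -v '^\[DONE\]$' \
    | jq -j '.Content // empty'
  echo   # finish with a newline, since jq -j prints none
}

# Print the SessionId of the first event that carries one, for use with
# --session-id in a follow-up call.
parse_sse_session_id() {
  sed -n 's/^data: //p' \
    | grep -v '^\[DONE\]$' \
    | jq -r '.SessionId // empty' \
    | head -n 1
}
```

`jq -j` writes each string without a trailing newline, so the line breaks in the result come only from the `\n` characters inside `Content`.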
## 3. Credential Management
#### CORRECT
```bash
# Use existing aliyun CLI configuration
aliyun configure --mode OAuth
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances"
# Use specific profile
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List instances" --profile myprofile
```
#### INCORRECT
```bash
# Error: hardcoding AK/SK in script
export ALIBABA_CLOUD_ACCESS_KEY_ID="LTAI5tXXXXXXXX"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="8dXXXXXXXXXXXX"
# Error: using custom credential variables
export YAOCHI_ACCESS_KEY_ID="xxx"
export YAOCHI_ACCESS_KEY_SECRET="xxx"
```
---
# Authentication Patterns
#### CORRECT - use aliyun CLI configuration
```bash
# OAuth mode (recommended)
aliyun configure --mode OAuth
# AK mode
aliyun configure set --mode AK --access-key-id <AK> --access-key-secret <SK> --region cn-hangzhou
# Cross-account RamRoleArn mode
aliyun configure set --mode RamRoleArn --access-key-id <AK> --access-key-secret <SK> --ram-role-arn <ARN> --role-session-name yaochi-session --region cn-hangzhou
```
#### INCORRECT - managing credentials in script
```python
# Error: using Python SDK for credentials
from alibabacloud_das20200116.client import Client as DAS20200116Client
# Error: parsing credentials from .env file
# Error: parsing credentials from ~/.alibabacloud/credentials
```
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.3+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.3 or later for full plugin ecosystem coverage.
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.3)
aliyun version
```
**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
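The macOS and Linux binary downloads above differ only in the tarball name, so a script can pick the right one from `uname`. A minimal sketch, assuming only the three tarball URLs listed in this guide; `cli_tarball_url` is a hypothetical helper, and platforms without a listed tarball are rejected rather than guessed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an OS/architecture pair to one of the tarball URLs
# listed in this guide. Defaults come from `uname`; anything else is rejected.
cli_tarball_url() {
  local os=${1:-$(uname -s)} arch=${2:-$(uname -m)}
  local base="https://aliyuncli.alicdn.com"
  case "$os/$arch" in
    Linux/x86_64)              echo "$base/aliyun-cli-linux-latest-amd64.tgz" ;;
    Linux/aarch64|Linux/arm64) echo "$base/aliyun-cli-linux-latest-arm64.tgz" ;;
    Darwin/x86_64)             echo "$base/aliyun-cli-macosx-latest-amd64.tgz" ;;
    *) echo "no tarball listed in this guide for $os/$arch" >&2; return 1 ;;
  esac
}
```

For example, `wget "$(cli_tarball_url)"` downloads the archive for the current machine, after which the extract and move steps are the same as above.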
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
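# Add to PATH (requires admin privileges)
# Append to the machine-level PATH only, so user-level entries are not copied into it
$machinePath = [Environment]::GetEnvironmentVariable("Path", [System.EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\aliyun-cli", [System.EnvironmentVariableTarget]::Machine)
$env:Path += ";C:\aliyun-cli"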
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI5tXXXXXXXX",
"access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
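Because this file holds the secret in plain text, any script that inspects it should select only non-sensitive fields. The sketch below assumes the JSON layout shown above and that `jq` is installed; `list_profiles` is a hypothetical helper, not an aliyun CLI command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "name<TAB>mode<TAB>region" for each profile in an
# aliyun CLI config file, deliberately leaving out the key ID and secret.
list_profiles() {
  local config=${1:-$HOME/.aliyun/config.json}
  jq -r '.profiles[] | [.name, .mode, .region_id] | @tsv' "$config"
}
```

Selecting fields explicitly, instead of printing whole profile objects, keeps `access_key_id` and `access_key_secret` out of terminal output and logs.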
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (tokens expire in 1-12 hours).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use an RSA key pair for authentication (generate the key pair in the Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
**Higher priority than the config file**: credentials set through environment variables override the values stored in `~/.aliyun/config.json`.
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
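
The lookup order above can be sketched as a small shell function. This is only an illustration: `resolve_credential_source` and the `ALIYUN_CONFIG_FILE` override are hypothetical names introduced here, not part of the Aliyun CLI, and the function only reports which source the list above would select. It never reads or prints secret values, and it does not probe ECS metadata, so the last branch simply means "fall through to the instance role, if any".

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which credential source would win under the
# priority order listed above. Illustrative only; not an Aliyun CLI feature.
resolve_credential_source() {
  local profile_flag="${1:-}"   # value that would be passed via --profile
  # ALIYUN_CONFIG_FILE is an assumed override used here to make testing easy.
  local config_file="${ALIYUN_CONFIG_FILE:-$HOME/.aliyun/config.json}"

  if [[ -n "$profile_flag" ]]; then
    echo "flag:$profile_flag"
  elif [[ -n "${ALIBABA_CLOUD_PROFILE:-}" ]]; then
    echo "env-profile:$ALIBABA_CLOUD_PROFILE"
  elif [[ -n "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" && -n "${ALIBABA_CLOUD_ACCESS_KEY_SECRET:-}" ]]; then
    echo "env-credentials"
  elif [[ -f "$config_file" ]]; then
    echo "config-file"
  else
    echo "ecs-ram-role-or-none"
  fi
}
```

Calling `resolve_credential_source projectA` prints `flag:projectA` regardless of the environment, which mirrors the "first found wins" rule.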
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON object containing the list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "华东 1(杭州)"
},
...
]
},
"RequestId": "..."
}
```
**If failed**, you'll see error messages:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
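
A wrapper script could translate these error codes into the hints listed above. The sketch below assumes the error code appears somewhere in the captured CLI output; `explain_auth_error` is a hypothetical helper name, not an Aliyun CLI command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the error codes listed above to a short hint.
# Input: the captured output of a failed CLI call.
explain_auth_error() {
  case "$1" in
    *InvalidAccessKeyId.NotFound*)  echo "Wrong Access Key ID" ;;
    *SignatureDoesNotMatch*)        echo "Wrong Access Key Secret" ;;
    *InvalidSecurityToken.Expired*) echo "STS token expired" ;;
    *Forbidden.RAM*)                echo "Insufficient permissions" ;;
    *)                              echo "Unrecognized error"; return 1 ;;
  esac
}

# Possible usage (not executed here):
#   output=$(aliyun ecs describe-regions 2>&1) || explain_auth_error "$output"
```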
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
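# Keep credential files out of repositories. ~/.aliyun/config.json lives in
# your home directory, so only a copy inside the repo needs an ignore rule:
echo ".aliyun/" >> .gitignore
# Use environment variables in CI/CD instead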
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
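
The permission advice above can be checked automatically before a script runs. The following is a minimal sketch, assuming either GNU `stat` (`-c`) or BSD/macOS `stat` (`-f`) is available; `check_config_perms` is a hypothetical helper name, and treating `400` as acceptable alongside `600` is an assumption made here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that a credentials file is readable by its
# owner only. Prints "ok", "too-open:<mode>", or "missing".
check_config_perms() {
  local file="$1" mode
  [[ -f "$file" ]] || { echo "missing"; return 2; }
  # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
  mode=$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file")
  if [[ "$mode" == "600" || "$mode" == "400" ]]; then
    echo "ok"
  else
    echo "too-open:$mode"
    return 1
  fi
}

# Possible usage:
#   check_config_perms ~/.aliyun/config.json || chmod 600 ~/.aliyun/config.json
```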
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.3+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/ram-policies.md
# RAM Policies
## Required Permissions
Using PolarDB-X AI Assistant (YaoChi Agent) requires the following RAM permissions:
### Core Permission - DAS GetYaoChiAgent
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"das:GetYaoChiAgent",
"das:GetDasAgentSSE"
],
"Resource": "*"
}
]
}
```
### Recommended - DAS Read-Only
For full diagnostic capabilities, grant the following specific DAS read-only permissions:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"das:GetYaoChiAgent",
"das:GetDasAgentSSE",
"das:DescribeInstanceDasPro",
"das:GetInstanceInspections",
"das:GetQueryOptimizeExecErrorStats"
],
"Resource": "*"
}
]
}
```
### Cross-Account Access - STS AssumeRole
For cross-account access, configure a RAM role trust policy in the target account:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Principal": {
"RAM": [
"acs:ram::<caller-account-id>:root"
]
}
}
]
}
```
## Recommended System Policies
| Policy Name | Description | Use Case |
|-------------|-------------|----------|
| `AliyunDASReadOnlyAccess` | DAS read-only | Daily diagnostic queries |
## Permission Mapping
| Operation | Required RAM Action |
|-----------|---------------------|
| YaoChi Agent diagnostics | `das:GetYaoChiAgent` |
| DAS Agent SSE | `das:GetDasAgentSSE` |
FILE:references/related-apis.md
# Related APIs
## DAS (Database Autonomy Service) - Core API
| Product | CLI Command | API Action | Description |
|---------|------------|------------|-------------|
| DAS | `aliyun das get-yao-chi-agent --query "<query>" --source "polardbx-console" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills` | GetYaoChiAgent | YaoChi Agent (SSE streaming response) |
| DAS | `aliyun das get-das-agent-sse --Query "<query>" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills` | GetDasAgentSSE | DAS Agent SSE API |
## GetYaoChiAgent API Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--query` | String | Yes | Natural language query content |
| `--source` | String | No | Call source identifier, set to `polardbx-console` |
| `--session-id` | String | No | Session ID for multi-turn conversation context |
## GetDasAgentSSE API Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--Query` | String | Yes | Natural language query content |
| `--AgentId` | String | No | Agent ID |
| `--InstanceId` | String | No | Database instance ID |
| `--SessionId` | String | No | Session ID for multi-turn conversation context |
## SSE Response Format
GetYaoChiAgent returns an SSE (Server-Sent Events) streaming response:
```
data: {"Content":"Response text chunk 1","SessionId":"sess-xxx","ReasoningContent":""}
data: {"Content":"Response text chunk 2","SessionId":"sess-xxx","ReasoningContent":""}
...
data: [DONE]
```
### Response Fields
| Field | Type | Description |
|-------|------|-------------|
| `Content` | String | Current chunk text content |
| `SessionId` | String | Session ID for multi-turn conversation |
| `ReasoningContent` | String | Reasoning content (for debug) |
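
To make the framing concrete, here is a minimal sketch that strips the SSE envelope described above and emits one JSON payload per line. It assumes every event fits on a single `data:` line, as in the sample; `sse_payloads` is a hypothetical helper name and is not part of the skill's script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an SSE stream on stdin and print the JSON payload
# of each "data:" line, skipping blank lines and the "[DONE]" terminator.
sse_payloads() {
  local line payload
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%$'\r'}"                         # tolerate CRLF line endings
    [[ "$line" =~ ^data:\ ?(.*)$ ]] || continue   # ignore non-data lines
    payload="${BASH_REMATCH[1]}"
    [[ -z "$payload" || "$payload" == "[DONE]" ]] && continue
    printf '%s\n' "$payload"
  done
}
```

Each emitted line can then be passed to `jq -r '.Content'` to recover the text chunk, or to `jq -r '.SessionId'` to capture the session ID for the next turn.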
## API Endpoint
| Environment | Endpoint |
|-------------|----------|
| Production | `das.cn-shanghai.aliyuncs.com` |
> Note: The GetYaoChiAgent API always uses the `das.cn-shanghai.aliyuncs.com` endpoint, regardless of the PolarDB-X instance's region.
FILE:references/verification-method.md
# Verification Method
## Steps to Verify Skill Execution
### Step 1: Verify aliyun CLI installation and configuration
```bash
# Check CLI version
aliyun version
# Expected: 3.3.3 or higher
# Check auth configuration
aliyun configure get
# Expected: shows current profile configuration
# Test basic connectivity
aliyun das describe-instance-das-pro --instanceId "pxc-test" --endpoint das.cn-shanghai.aliyuncs.com --user-agent AlibabaCloud-Agent-Skills/alibabacloud-polardbx-ai-assistant 2>&1
# Expected: JSON response (API error for non-existent instance is OK, connection error is NOT)
```
### Step 2: Verify jq installation
```bash
echo '{"Content":"test"}' | jq -r '.Content'
# Expected: test
```
### Step 3: Verify call_yaochi_agent.sh script
```bash
# Verify script is runnable
bash $SKILL_DIR/scripts/call_yaochi_agent.sh --help
# Expected: shows help information
# Verify no-argument error prompt
bash $SKILL_DIR/scripts/call_yaochi_agent.sh
# Expected: shows usage prompt and exits
```
### Step 4: Verify actual invocation (requires valid credentials)
```bash
# Simple query test
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Hello"
# Expected: YaoChi Agent response content
# With debug mode
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Hello" --debug
# Expected: response content + debug info on stderr
```
### Step 5: Verify multi-turn conversation
```bash
# First query - note the session ID in stderr output
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "List PolarDB-X instances in Hangzhou region"
# Expected: instance list, stderr shows [SessionID] sess-xxx
# Second query - use session ID from previous call
bash $SKILL_DIR/scripts/call_yaochi_agent.sh "Continue analyzing the first instance" --session-id "sess-xxx"
# Expected: contextual analysis based on previous conversation
```
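
Copying the session ID by hand between the two calls is error-prone, so it may help to capture it from stderr. This sketch assumes the `[SessionID] <id>` line format shown above; `extract_session_id` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read captured stderr on stdin and print the last
# session ID reported in a "[SessionID] <id>" line.
extract_session_id() {
  sed -n 's/^\[SessionID\] \(.*\)$/\1/p' | tail -n 1
}

# Possible usage (not executed here):
#   bash "$SKILL_DIR/scripts/call_yaochi_agent.sh" "First question" 2>stderr.log
#   sid=$(extract_session_id < stderr.log)
#   bash "$SKILL_DIR/scripts/call_yaochi_agent.sh" "Follow-up" --session-id "$sid"
```

Taking the last match matters because the script also echoes the incoming session ID before the response starts.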
## Common Errors and Solutions
| Error | Cause | Solution |
|-------|-------|----------|
| `command not found: aliyun` | aliyun CLI not installed | See cli-installation-guide.md |
| `command not found: jq` | jq not installed | `brew install jq` or `apt install jq` |
| `InvalidAccessKeyId` | Invalid AK/SK | Check `aliyun configure get` configuration |
| `Throttling.UserConcurrentLimit` | Concurrent session limit reached | Wait for previous query to finish, then retry |
| `Forbidden.RAM` | Insufficient permissions | See ram-policies.md for required permissions |
FILE:scripts/call_yaochi_agent.sh
#!/usr/bin/env bash
# =============================================================================
# call_yaochi_agent.sh - Alibaba Cloud YaoChi Agent CLI Script (PolarDB-X)
# =============================================================================
# Invokes get-yao-chi-agent API via aliyun CLI DAS plugin with streaming response.
# Requires DAS plugin: aliyun plugin install --names aliyun-cli-das
# Uses existing aliyun CLI credentials (aliyun configure), no extra setup needed.
#
# Usage:
# bash call_yaochi_agent.sh "List PolarDB-X instances in Hangzhou region"
# bash call_yaochi_agent.sh "Analyze instance pxc-xxx performance" --session-id <session-id>
# echo "List instances" | bash call_yaochi_agent.sh -
# =============================================================================
set -euo pipefail
# --- Configuration ---
ENDPOINT="das.cn-shanghai.aliyuncs.com"
SOURCE="polarx"
READ_TIMEOUT=180
CONNECT_TIMEOUT=30
THROTTLE_RETRY_MAX=3
THROTTLE_RETRY_INTERVAL=20
# --- Variables ---
QUERY=""
SESSION_ID=""
PROFILE=""
DEBUG=false
# --- Functions ---
usage() {
cat >&2 <<EOF
Alibaba Cloud YaoChi Agent CLI Tool - PolarDB-X (based on aliyun CLI)
Usage:
$(basename "$0") <query> [options]
Arguments:
<query> Query content (natural language), use '-' to read from stdin
Options:
--session-id <id> Session ID for multi-turn conversation
--profile <name> Specify aliyun CLI profile
--debug, -d Enable debug mode
--help, -h Show help information
Examples:
$(basename "$0") "List PolarDB-X instances in Hangzhou region"
$(basename "$0") "Analyze instance pxc-xxx performance" --session-id "sess-xxx"
echo "List instances" | $(basename "$0") -
EOF
}
debug_log() {
if [[ "$DEBUG" == "true" ]]; then
echo "[DEBUG] $*" >&2
fi
}
# Check dependencies
check_dependencies() {
if ! command -v aliyun &>/dev/null; then
echo "Error: aliyun CLI not found, please install (>= 3.3.3)" >&2
echo "Install: curl -fsSL https://aliyuncli.alicdn.com/install.sh | bash" >&2
echo "See: references/cli-installation-guide.md" >&2
exit 1
fi
if ! command -v jq &>/dev/null; then
echo "Error: jq is required to parse JSON response" >&2
echo "Install:" >&2
echo " macOS: brew install jq" >&2
echo " Ubuntu: sudo apt-get install jq" >&2
echo " CentOS: sudo yum install jq" >&2
exit 1
fi
local version
version=$(aliyun version 2>/dev/null || echo "0.0.0")
debug_log "aliyun CLI version: $version"
# Check DAS plugin is installed (do NOT auto-install at runtime to avoid downloading unaudited code)
if ! aliyun das get-yao-chi-agent --help &>/dev/null; then
echo "Error: DAS plugin is not installed. Please install it manually before running this script:" >&2
echo " aliyun plugin install --names aliyun-cli-das" >&2
echo "See: references/cli-installation-guide.md" >&2
exit 1
fi
}
# Stream parse response (read from stdin line by line, output in real-time)
# DAS plugin returns streaming JSON (one {"data": {...}} per line) or SSE format
parse_sse_streaming() {
local session_id=""
local format_detected=false
local is_sse=false
local is_json_stream=false
local error_buffer=""
while IFS= read -r line; do
line="${line%$'\r'}"
[[ -z "$line" ]] && continue
# Detect response format on first line
if [[ "$format_detected" == false ]]; then
if [[ "$line" =~ ^data: ]]; then
is_sse=true
debug_log "Detected SSE format response"
elif echo "$line" | jq -e '.data' &>/dev/null; then
is_json_stream=true
debug_log "Detected streaming JSON format response (DAS plugin)"
else
# Might be error response or plain JSON, buffer first
error_buffer="$line"
# Check if error response
local error_code
error_code=$(echo "$line" | jq -r '.Code // empty' 2>/dev/null) || true
if [[ -n "$error_code" ]]; then
local error_msg
error_msg=$(echo "$line" | jq -r '.Message // empty' 2>/dev/null) || true
echo "Error: ${error_msg:-Unknown error} (${error_code})" >&2
if [[ "$error_code" == *"Throttling"* ]] || [[ "$error_code" == *"ConcurrentLimit"* ]]; then
echo "Max 2 concurrent sessions per account. Please wait for previous query to complete." >&2
return 2
fi
return 1
fi
# Try to handle as plain JSON response
local content
content=$(echo "$line" | jq -r '.Content // .Data // empty' 2>/dev/null) || true
if [[ -n "$content" ]]; then
printf "%s" "$content"
session_id=$(echo "$line" | jq -r '.SessionId // empty' 2>/dev/null) || true
else
# Cannot parse, output as-is
echo "$line"
fi
format_detected=true
continue
fi
format_detected=true
fi
# Process SSE format
if [[ "$is_sse" == true ]]; then
if [[ "$line" =~ ^data:\ ?(.*) ]]; then
local data="${BASH_REMATCH[1]}"
[[ "$data" == "[DONE]" || -z "$data" ]] && continue
local chunk_content
chunk_content=$(echo "$data" | jq -r '.Content // empty' 2>/dev/null) || true
[[ -n "$chunk_content" ]] && printf "%s" "$chunk_content"
local chunk_session
chunk_session=$(echo "$data" | jq -r '.SessionId // empty' 2>/dev/null) || true
[[ -n "$chunk_session" ]] && session_id="$chunk_session"
if [[ "$DEBUG" == "true" ]]; then
local reasoning
reasoning=$(echo "$data" | jq -r '.ReasoningContent // empty' 2>/dev/null) || true
[[ -n "$reasoning" ]] && debug_log "Reasoning: $reasoning"
fi
fi
fi
# Process streaming JSON format
if [[ "$is_json_stream" == true ]]; then
local chunk_content
chunk_content=$(echo "$line" | jq -r '.data.Content // empty' 2>/dev/null) || true
[[ -n "$chunk_content" ]] && printf "%s" "$chunk_content"
local chunk_session
chunk_session=$(echo "$line" | jq -r '.data.SessionId // empty' 2>/dev/null) || true
[[ -n "$chunk_session" ]] && session_id="$chunk_session"
if [[ "$DEBUG" == "true" ]]; then
local reasoning
reasoning=$(echo "$line" | jq -r '.data.ReasoningContent // empty' 2>/dev/null) || true
[[ -n "$reasoning" ]] && debug_log "Reasoning: $reasoning"
fi
fi
done
# Output newline (end of content)
echo ""
# Output session ID (to stderr for multi-turn conversation)
if [[ -n "$session_id" ]]; then
echo "" >&2
echo "[SessionID] $session_id" >&2
fi
}
# --- Argument parsing ---
while [[ $# -gt 0 ]]; do
case "$1" in
--session-id)
SESSION_ID="$2"
shift 2
;;
--profile)
PROFILE="$2"
shift 2
;;
--debug|-d)
DEBUG=true
shift
;;
--help|-h)
usage
exit 0
;;
-)
QUERY=$(cat)
shift
;;
-*)
echo "Unknown option: $1" >&2
usage
exit 1
;;
*)
QUERY="$1"
shift
;;
esac
done
# --- Input Validation ---
# All agent inputs are treated as untrusted; validate type, format, and boundaries.
if [[ -z "$QUERY" ]]; then
usage
exit 1
fi
# QUERY: enforce maximum length (8192 characters)
MAX_QUERY_LENGTH=8192
if [[ ${#QUERY} -gt $MAX_QUERY_LENGTH ]]; then
echo "Error: Query too long (${#QUERY} chars). Maximum allowed: $MAX_QUERY_LENGTH characters." >&2
exit 1
fi
# SESSION_ID: if provided, must match expected format (alphanumeric, hyphens, underscores, dots, 1-128 chars)
if [[ -n "$SESSION_ID" ]]; then
if [[ ${#SESSION_ID} -gt 128 ]]; then
echo "Error: Session ID too long (${#SESSION_ID} chars). Maximum allowed: 128 characters." >&2
exit 1
fi
if [[ ! "$SESSION_ID" =~ ^[a-zA-Z0-9._-]+$ ]]; then
echo "Error: Session ID contains invalid characters. Only alphanumeric, hyphen, underscore, and dot are allowed." >&2
exit 1
fi
fi
# PROFILE: if provided, must match safe character whitelist (alphanumeric, hyphens, underscores, 1-64 chars)
if [[ -n "$PROFILE" ]]; then
if [[ ${#PROFILE} -gt 64 ]]; then
echo "Error: Profile name too long (${#PROFILE} chars). Maximum allowed: 64 characters." >&2
exit 1
fi
if [[ ! "$PROFILE" =~ ^[a-zA-Z0-9_-]+$ ]]; then
echo "Error: Profile name contains invalid characters. Only alphanumeric, hyphen, and underscore are allowed." >&2
exit 1
fi
fi
check_dependencies
# --- Build CLI command arguments ---
# Use DAS plugin's kebab-case command, supports Signature V3
cli_args=(das get-yao-chi-agent
--query "$QUERY"
--source "$SOURCE"
--endpoint "$ENDPOINT"
--read-timeout "$READ_TIMEOUT"
--connect-timeout "$CONNECT_TIMEOUT"
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-polardbx-ai-assistant
)
if [[ -n "$SESSION_ID" ]]; then
cli_args+=(--session-id "$SESSION_ID")
fi
if [[ -n "$PROFILE" ]]; then
cli_args+=(--profile "$PROFILE")
fi
# --- Output query info ---
echo "[Query] $QUERY" >&2
if [[ -n "$SESSION_ID" ]]; then
echo "[SessionID] $SESSION_ID" >&2
fi
echo "============================================================" >&2
echo "[YaoChi Agent Response]" >&2
debug_log "Executing: aliyun ${cli_args[*]}"
# --- Execute and stream parse (with throttling retry) ---
for (( attempt=1; attempt<=THROTTLE_RETRY_MAX; attempt++ )); do
debug_log "Attempt $attempt/$THROTTLE_RETRY_MAX: aliyun ${cli_args[*]}"
set +e
aliyun "${cli_args[@]}" 2>&1 | parse_sse_streaming
parse_exit=${PIPESTATUS[1]}
set -e
# Exit code 2 = throttling error, retry after waiting
if [[ $parse_exit -eq 2 ]]; then
if [[ $attempt -lt $THROTTLE_RETRY_MAX ]]; then
echo "[Retry] Throttling detected, waiting ${THROTTLE_RETRY_INTERVAL}s before retry ($attempt/$THROTTLE_RETRY_MAX)..." >&2
sleep "$THROTTLE_RETRY_INTERVAL"
continue
else
echo "[Retry] Throttling persists after $THROTTLE_RETRY_MAX attempts, giving up." >&2
exit 1
fi
fi
# Any other result (success or non-throttling error), stop retrying
break
done
Alibaba Cloud MaxCompute Cost Analysis Skill. Analyze MaxCompute pay-as-you-go costs including billing, storage metrics, and compute metrics. Triggers: "maxc...
---
name: alibabacloud-odps-cost-analysis
description: |
Alibaba Cloud MaxCompute Cost Analysis Skill. Analyze MaxCompute pay-as-you-go costs including billing, storage metrics, and compute metrics.
Triggers: "maxcompute cost", "odps cost", "maxcompute billing", "maxcompute费用", "成本分析", "费用分析", "存储用量", "计算用量", "费用突增", "SQL签名", "SQL signature", "重复SQL", "扫描量最大", "daily billing details", "每日账单明细", "按计费项", "billing by fee item".
---
# MaxCompute Cost Analysis
Analyze Alibaba Cloud MaxCompute (ODPS) pay-as-you-go costs: billing summaries, storage metrics, and compute metrics across 10 APIs.
> **⚠️ MANDATORY PRODUCT CONSTRAINT:**
> This skill uses **ONLY** the 10 `aliyun maxcompute` CLI commands listed in API Overview below (plugin mode, version 2022-01-04).
> **NEVER** use `aliyun bssopenapi` or any of its actions (billing queries, instance bills, etc.).
> **NEVER** use other MaxCompute APIs not in the 10-API list (e.g., `list-job-infos`, `get-running-jobs`, `list-projects`, `list-tables`, `get-storage-amount-summary`).
> **NEVER** use any other product CLI (`aliyun odps`, `aliyun dataworks`, `aliyun compute-nest`, `aliyun cms`, `aliyun actiontrail`, etc.).
> If `aliyun maxcompute` commands fail, **fix the plugin** — do NOT fall back to other APIs or products.
>
> **Intent → API Mapping (use ONLY these):**
> - Billing/费用 → `sum-bills`, `sum-bills-by-date`, `sum-daily-bills-by-item`
> - **Daily cost spike/费用突增/每日趋势** → **`sum-bills-by-date`** (MANDATORY for locating spike dates — NOT `sum-daily-bills-by-item`)
> - **Paginated daily billing details/每日账单明细/按计费项的每日账单** → **`sum-daily-bills-by-item`** (MANDATORY — NOT any `bssopenapi` command)
> - Storage/存储 → **`sum-storage-metrics-by-type`** AND **`sum-storage-metrics-by-date`** (BOTH MANDATORY — `sum-storage-metrics-by-type` alone is NOT sufficient)
> - Compute jobs/计算作业/expensive jobs/贵的作业 → **`list-compute-metrics-by-instance`** (MANDATORY — NEVER use `list-job-infos`, `get-running-jobs`, `get-job-resource-usage`)
> - **SQL signatures/SQL签名/重复SQL/扫描量** → **`list-compute-metrics-by-signature`** (MANDATORY for any SQL pattern analysis)
> - **SCAN usage trends/扫描量趋势/CU usage** → **`sum-compute-metrics-by-usage`** (MANDATORY — usageType=SCAN or CU; NEVER use CloudMonitor or ActionTrail)
> - Compute record counts → `sum-compute-metrics-by-record`
## Prerequisites
**Pre-check: Aliyun CLI >= 3.3.3 required**
> Run `aliyun version` to verify >= 3.3.3. If not installed or version too low,
> run `curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash` to update,
> or see `references/cli-installation-guide.md` for installation instructions.
>
> Then [MUST] run `aliyun plugin update` to ensure that any existing plugins on your local machine are always up-to-date.
**[MUST] CLI User-Agent** — Every `aliyun` CLI command invocation must include:
`--user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis`
**[MUST] AI-Mode Commands** — When using aliyun CLI, declare AI-Mode lifecycle:
1. **Enable AI-Mode before commands:**
```bash
aliyun configure ai-mode enable
```
2. **Set User-Agent for AI-Mode:**
```bash
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis"
```
3. **Disable AI-Mode after workflow ends:**
```bash
aliyun configure ai-mode disable
```
## Quick Start
When the user asks about MaxCompute cost analysis, follow these steps:
1. **Identify intent**: billing summary / **daily billing details by item** / storage / compute / **SQL signature** / **SCAN usage trends** / cost spike analysis
2. **Get RegionId**: Ask user which region (e.g., cn-hangzhou, cn-shanghai)
3. **Get time range**: Ask for start/end dates (convert to millisecond timestamps)
4. **Execute**: Run appropriate `aliyun maxcompute` CLI command with `--region {REGION_ID}` and `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis`
> **IMPORTANT:** ALL commands MUST start with `aliyun maxcompute`. NEVER use `aliyun bssopenapi` or any other product.
5. **Verify**: Confirm results and present to user
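
Step 3 asks for millisecond timestamps. A minimal conversion sketch follows. It interprets the date as midnight UTC, which is an assumption (the skill does not state which time zone the API expects), and `date_to_ms` is a hypothetical helper name. GNU `date` is tried first, with a BSD/macOS fallback.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a YYYY-MM-DD date to a millisecond timestamp.
# Assumption: the date is interpreted as 00:00:00 UTC.
date_to_ms() {
  local seconds
  seconds=$(date -u -d "$1 00:00:00" +%s 2>/dev/null \
    || date -u -j -f '%Y-%m-%d %H:%M:%S' "$1 00:00:00" +%s) || return 1
  echo $(( seconds * 1000 ))
}

# Possible usage:
#   START_MS=$(date_to_ms 2024-01-01)
#   END_MS=$(date_to_ms 2024-01-31)
```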
## Data Limitations
- Only **pay-as-you-go** billing is supported
- Data available from **2023-05-07** onwards
- Query up to **last 12 months** only
- Single query range: **max 31 days**
- Costs are estimated usage-based prices (may differ slightly from actual bills)
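
The 31-day limit and the millisecond requirement can be checked before a request is sent. This is a sketch under two assumptions: that the 31-day limit is inclusive, and that a valid millisecond timestamp has 13 digits (true for current dates). `validate_range_ms` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a start/end pair of millisecond timestamps
# against the single-query limit described above.
validate_range_ms() {
  local start_ms="$1" end_ms="$2"
  local max_span_ms=$(( 31 * 24 * 60 * 60 * 1000 ))   # 31 days in milliseconds

  # Heuristic: 13 digits distinguishes milliseconds from 10-digit seconds.
  if [[ ! "$start_ms" =~ ^[0-9]{13}$ || ! "$end_ms" =~ ^[0-9]{13}$ ]]; then
    echo "timestamps must be 13-digit millisecond values" >&2
    return 1
  fi
  if (( end_ms <= start_ms )); then
    echo "endDate must be later than startDate" >&2
    return 1
  fi
  if (( end_ms - start_ms > max_span_ms )); then
    echo "range exceeds 31 days" >&2
    return 1
  fi
}
```

When a longer period is needed, the range would have to be split into several windows of at most 31 days and queried one window at a time.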
## Pre-flight Checklist (Execute BEFORE every command)
> **STOP-AND-CHECK RULE:** Before executing EACH command, you MUST verify: (1) Does it start with `aliyun maxcompute`? (2) Is the API name in the 10-API list? (3) Does it include `--user-agent`? If ANY answer is NO, do NOT execute — fix first.
- [ ] My command includes `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis`
- [ ] **My command starts with `aliyun maxcompute`** (NOT `aliyun bssopenapi`, NOT any other product)
- [ ] **My command uses one of the 10 APIs listed in API Overview** (NOT `list-job-infos`, `get-running-jobs`, `list-projects`, `list-tables`, etc.)
- [ ] I have verified the maxcompute plugin is installed (`aliyun maxcompute --help` succeeds)
- [ ] I have asked the user for RegionId (not using default)
- [ ] I have the actual RegionId value from user (not placeholder)
- [ ] My command includes `--region {ACTUAL_REGION_ID}`
- [ ] Time range does not exceed 31 days
- [ ] Timestamps are in milliseconds
- [ ] I am NOT reading or echoing any AK/SK values
- [ ] I am NOT using `aliyun bssopenapi` or any of its actions
**If ANY check fails, STOP and fix before proceeding.**
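
Several of the checks above are mechanical and could be applied to a command string before it runs. The sketch below covers only four of them (product prefix, API name, user agent, region); `preflight_check` is a hypothetical helper, and the API names are copied from the API Overview table in this skill.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight validator for a command string (illustrative only).
ALLOWED_APIS=(
  list-instances sum-bills sum-bills-by-date sum-daily-bills-by-item
  sum-storage-metrics-by-type sum-storage-metrics-by-date
  list-compute-metrics-by-instance list-compute-metrics-by-signature
  sum-compute-metrics-by-usage sum-compute-metrics-by-record
)
REQUIRED_UA="--user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis"

preflight_check() {
  local cmd="$1" api allowed found=false
  local -a words
  read -r -a words <<< "$cmd"          # simple whitespace split of the command

  if [[ "${words[0]:-}" != "aliyun" || "${words[1]:-}" != "maxcompute" ]]; then
    echo "FAIL: command must start with 'aliyun maxcompute'" >&2
    return 1
  fi
  api="${words[2]:-}"
  for allowed in "${ALLOWED_APIS[@]}"; do
    if [[ "$api" == "$allowed" ]]; then found=true; break; fi
  done
  if [[ "$found" != true ]]; then
    echo "FAIL: '$api' is not one of the 10 allowed APIs" >&2
    return 1
  fi
  if [[ "$cmd" != *"$REQUIRED_UA"* ]]; then
    echo "FAIL: missing required --user-agent" >&2
    return 1
  fi
  if [[ "$cmd" != *"--region "* || "$cmd" == *"{REGION_ID}"* ]]; then
    echo "FAIL: missing --region or unresolved placeholder" >&2
    return 1
  fi
  echo "OK"
}
```

The remaining items (asking the user for the region, not echoing AK/SK values) depend on the conversation and cannot be verified from the command string alone.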
## API Overview
| API | Description | Method | Category |
|-----|-------------|--------|----------|
| list-instances | Get instance/project list (NOT compute metrics — see `list-compute-metrics-by-instance` for job-level data) | GET | Instance |
| sum-bills | Summarize bills by project or fee item | POST | Billing |
| **sum-bills-by-date** | **Daily bill trends (USE THIS for spike analysis)** | POST | Billing |
| sum-daily-bills-by-item | Daily bill details by item (paginated drill-down) | POST | Billing |
| sum-storage-metrics-by-type | Storage grouped by TYPE | POST | Storage |
| sum-storage-metrics-by-date | **Storage daily DATE trends (MUST call separately)** | POST | Storage |
| list-compute-metrics-by-instance | **Compute JOB METRICS per instance (cost, duration, input size)** | POST | Compute |
| list-compute-metrics-by-signature | Compute by SQL signature | POST | Compute |
| sum-compute-metrics-by-usage | Compute usage trends | POST | Compute |
| sum-compute-metrics-by-record | Compute record counts | POST | Compute |
> **⚠️ API Disambiguation — Do NOT confuse these two billing APIs:**
> - **`sum-bills-by-date`** = Daily cost TRENDS → Use this to **locate spike dates** (returns cost per day)
> - **`sum-daily-bills-by-item`** = Daily bill DETAILS by item → Use this for **drill-down after finding spikes** (paginated, returns per-item breakdown)
>
> For cost spike investigation, you **MUST** call `sum-bills-by-date` first. `sum-daily-bills-by-item` is optional for drill-down only.
>
> **⚠️ API Disambiguation — Do NOT confuse these two "instance" APIs:**
> - **`list-instances`** = Returns the PROJECT/INSTANCE LIST (for scoping which projects to analyze)
> - **`list-compute-metrics-by-instance`** = Returns COMPUTE JOB METRICS (cost, duration, input size per job)
>
> These are **COMPLETELY DIFFERENT** APIs. `list-instances` does NOT return compute metrics. You MUST call BOTH.
>
> **⚠️ API Disambiguation — Do NOT confuse these two storage APIs:**
> - **`sum-storage-metrics-by-type`** = Storage grouped by TYPE (may include `dailyStorageMetrics` in response, but this does NOT replace `sum-storage-metrics-by-date`)
> - **`sum-storage-metrics-by-date`** = Storage daily DATE trends (dedicated API for daily storage trends)
>
> Even if `sum-storage-metrics-by-type` returns daily data in its response, you **MUST** still call `sum-storage-metrics-by-date` separately when daily trends are needed.
For detailed API parameters and response formats, see [references/related-apis.md](references/related-apis.md).
## Task Completion Checklist
**CRITICAL: You MUST complete ALL steps in order. Do NOT stop early.**
### For Cost Spike Investigation:
1. [ ] Ask user: "Which region? (e.g., cn-hangzhou)"
2. [ ] Ask user: "What time range to analyze? (max 31 days)"
3. [ ] Convert dates to millisecond timestamps
4. [ ] Execute list-instances to get available instances (this is the PROJECT LIST, NOT compute metrics):
```bash
aliyun maxcompute list-instances --region {REGION_ID} --startDate {START_MS} --endDate {END_MS} --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **⚠️ `list-instances` only returns the project/instance list for scoping. It does NOT return compute job metrics. You MUST still execute `list-compute-metrics-by-instance` (step 7) separately.**
5. [ ] **MANDATORY** Execute sum-bills with `statsType=FEE_ITEM` to identify top cost drivers:
```bash
aliyun maxcompute sum-bills --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"FEE_ITEM","topN":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **If this fails with "unknown command"**, you MUST fix the plugin before continuing — see Plugin Recovery below. **NEVER skip this step or substitute with a non-billing API.**
6. [ ] **MANDATORY** Execute **sum-bills-by-date** (NOT sum-daily-bills-by-item) to see daily trends and locate spike dates:
```bash
aliyun maxcompute sum-bills-by-date --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"FEE_ITEM","topN":8}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **⚠️ `sum-bills-by-date` ≠ `sum-daily-bills-by-item`**. For locating spike dates, you MUST use `sum-bills-by-date`. Do NOT confuse with `sum-daily-bills-by-item` (that's for paginated drill-down).
> **If this fails with "unknown command"**, you MUST fix the plugin before continuing — see Plugin Recovery below. **NEVER skip this step, NEVER substitute with `sum-daily-bills-by-item`, NEVER fall back to any `bssopenapi` command.**
> **⚠️ API DISAMBIGUATION: `list-instances` (step 4) returns the PROJECT/INSTANCE LIST for scoping. `list-compute-metrics-by-instance` (step 7) returns COMPUTE JOB METRICS (cost, duration, input size per job). These are COMPLETELY DIFFERENT APIs. You MUST call BOTH. `list-instances` alone does NOT satisfy the compute metrics requirement.**
7. [ ] **MANDATORY** Execute list-compute-metrics-by-instance (NOT `list-instances`, NOT `list-job-infos`, NOT `get-running-jobs`, NOT `get-job-resource-usage`) to find expensive compute jobs:
```bash
aliyun maxcompute list-compute-metrics-by-instance --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"types":["ComputationSql"],"pageNumber":1,"pageSize":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
8. [ ] If storage costs are high: use sum-storage-metrics-by-type to analyze storage distribution:
```bash
aliyun maxcompute sum-storage-metrics-by-type --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"STORAGE_TYPE"}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
9. [ ] **MANDATORY** Execute sum-daily-bills-by-item for paginated daily billing details:
```bash
aliyun maxcompute sum-daily-bills-by-item --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"FEE_ITEM","pageNumber":1,"pageSize":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **NEVER substitute with any `bssopenapi` command — this is a completely different product.**
10. [ ] Present findings with cost breakdown to user
11. [ ] Suggest optimization actions
**Plugin Recovery** (execute when `aliyun maxcompute sum-bills` or `sum-bills-by-date` returns "unknown command"):
```bash
aliyun configure set --auto-plugin-install true
aliyun plugin install maxcompute
aliyun plugin update maxcompute
# Verify plugin is working
aliyun maxcompute --help
# Then retry the failed billing command
```
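
The timestamp conversion in step 3 of the cost spike checklist, together with the 31-day limit, can be sketched as below. This is a minimal sketch, not part of the skill itself: the helper names `to_ms` and `range_ok` are hypothetical, dates are treated as midnight UTC, and `to_ms` assumes GNU `date` (the `-d` flag behaves differently on BSD/macOS).

```bash
# Convert a UTC date (YYYY-MM-DD) to a millisecond timestamp.
# Assumption: GNU date; BSD/macOS date needs different flags.
to_ms() {
  echo "$(( $(date -u -d "$1 00:00:00" +%s) * 1000 ))"
}

# Succeed only when END_MS - START_MS is positive and at most 31 days.
range_ok() {
  span=$(( $2 - $1 ))
  [ "$span" -gt 0 ] && [ "$span" -le $(( 31 * 86400 * 1000 )) ]
}
```

The results would fill `{START_MS}` and `{END_MS}` in the commands above, after the user has supplied the dates.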
### For Billing Summary:
1. [ ] Ask user: "Which region?"
2. [ ] Ask user: "Time range? (max 31 days)"
3. [ ] Ask user: "View by project or fee item? (PROJECT/FEE_ITEM)"
4. [ ] Execute sum-bills:
```bash
aliyun maxcompute sum-bills --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"{TYPE}","topN":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
5. [ ] Present total cost, currency, and item breakdown
6. [ ] Confirm task completion
### For Paginated Daily Billing Details (每日账单明细/按计费项的每日账单):
> **⚠️ When user asks for daily billing details by fee item, paginated billing breakdown, or per-item daily costs, you MUST use `sum-daily-bills-by-item`. NEVER use any `bssopenapi` command or any other product.**
1. [ ] Ask user: "Which region?"
2. [ ] Ask user: "Time range? (max 31 days)"
3. [ ] **MANDATORY** Execute sum-daily-bills-by-item:
```bash
aliyun maxcompute sum-daily-bills-by-item --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"FEE_ITEM","pageNumber":1,"pageSize":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **If this fails with "unknown command"**, run Plugin Recovery. **NEVER substitute with any `bssopenapi` command or any other API.**
4. [ ] Present paginated daily billing details with item breakdown
5. [ ] Confirm task completion
### For Storage Analysis:
> **⚠️ Storage analysis requires BOTH APIs: `sum-storage-metrics-by-type` (for type breakdown) AND `sum-storage-metrics-by-date` (for daily trends). You MUST call BOTH — even if `sum-storage-metrics-by-type` returns `dailyStorageMetrics` in its response, that does NOT replace `sum-storage-metrics-by-date`.**
1. [ ] Ask user: "Which region?"
2. [ ] Ask user: "Time range? (max 31 days)"
3. [ ] Ask user: "View by project or storage type? (PROJECT/STORAGE_TYPE)"
4. [ ] **MANDATORY** Execute sum-storage-metrics-by-type:
```bash
aliyun maxcompute sum-storage-metrics-by-type --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"{TYPE}"}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
5. [ ] Present storage usage breakdown (in GB)
6. [ ] **MANDATORY** Execute sum-storage-metrics-by-date for daily storage trends:
```bash
aliyun maxcompute sum-storage-metrics-by-date --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"{TYPE}"}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **⚠️ `sum-storage-metrics-by-type` ≠ `sum-storage-metrics-by-date`**. Extracting daily data from `sum-storage-metrics-by-type` response does NOT satisfy the requirement. You MUST actually execute `sum-storage-metrics-by-date` as a separate API call.
7. [ ] Present daily storage trends
8. [ ] Confirm task completion
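
Because the storage checklist requires two separate calls, it can help to generate both commands from the same inputs. The sketch below only prints the commands and does not run them. The function name `storage_plan` is hypothetical; all four arguments are expected to come from the user's answers.

```bash
UA="AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis"

# Print (do not run) both storage commands for one region and time range.
# Arguments: REGION_ID START_MS END_MS STATS_TYPE
storage_plan() {
  body=$(printf '{"startDate":%s,"endDate":%s,"statsType":"%s"}' "$2" "$3" "$4")
  for api in sum-storage-metrics-by-type sum-storage-metrics-by-date; do
    printf "aliyun maxcompute %s --region %s --body '%s' --user-agent %s\n" \
      "$api" "$1" "$body" "$UA"
  done
}
```

Printing first makes it easy to confirm that both `sum-storage-metrics-by-type` and `sum-storage-metrics-by-date` appear before anything is executed.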
### For Compute Analysis (计算作业/expensive jobs/贵的SQL):
> **⚠️ When user asks about compute jobs, expensive jobs, or job-level cost details, you MUST use `list-compute-metrics-by-instance`. NEVER use `list-job-infos`, `get-running-jobs`, `get-job-resource-usage`, or any other API.**
1. [ ] Ask user: "Which region?"
2. [ ] Ask user: "Time range? (max 31 days)"
3. [ ] **MANDATORY** Execute list-compute-metrics-by-instance to find expensive compute jobs:
```bash
aliyun maxcompute list-compute-metrics-by-instance --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"pageNumber":1,"pageSize":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **If this fails with "unknown command"**, run Plugin Recovery. **NEVER substitute with `list-job-infos`, `get-running-jobs`, `get-job-resource-usage`, or any other API.**
4. [ ] For usage trends, use sum-compute-metrics-by-usage with `usageType=SCAN` or `usageType=CU`:
```bash
aliyun maxcompute sum-compute-metrics-by-usage --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"usageType":"SCAN"}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
5. [ ] For job frequency, use sum-compute-metrics-by-record
6. [ ] Present compute usage details
7. [ ] Confirm task completion
### For SQL Signature / SQL Pattern Analysis (重复SQL/SQL签名/扫描量最大的SQL):
> **⚠️ When user asks about SQL signatures, repeated SQL, most-executed SQL, or highest-scan SQL, you MUST use `list-compute-metrics-by-signature` AND `sum-compute-metrics-by-usage`. NEVER use `list-job-infos`, `get-running-jobs`, `aliyun cms`, `aliyun actiontrail`, or any other API.**
1. [ ] Ask user: "Which region?"
2. [ ] Ask user: "Time range? (max 31 days)"
3. [ ] Convert dates to millisecond timestamps
4. [ ] **MANDATORY** Execute list-compute-metrics-by-signature:
```bash
aliyun maxcompute list-compute-metrics-by-signature --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"types":["ComputationSql"],"pageNumber":1,"pageSize":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **If this fails with "unknown command"**, run Plugin Recovery below. **NEVER substitute with `list-job-infos`, `get-running-jobs`, or any other API.**
5. [ ] **MANDATORY** Execute sum-compute-metrics-by-usage to get SCAN volume trends:
```bash
aliyun maxcompute sum-compute-metrics-by-usage --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"usageType":"SCAN"}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
> **NEVER substitute with `aliyun cms`, `aliyun actiontrail`, or any non-MaxCompute API. If this fails, run Plugin Recovery and retry.**
6. [ ] Present SQL signatures sorted by usage/execution count
7. [ ] Suggest optimization actions for high-cost SQL patterns
8. [ ] Confirm task completion
## Common Parameters
| Value | Description | Used By |
|-------|-------------|---------|
| `PROJECT` | Group by project | sum-bills, sum-bills-by-date, sum-daily-bills-by-item, sum-storage-metrics-* |
| `FEE_ITEM` | Group by fee item type | sum-bills, sum-bills-by-date, sum-daily-bills-by-item |
| `STORAGE_TYPE` | Group by storage type | sum-storage-metrics-* |
For fee item types, compute types, spec codes, and storage types, see [references/related-apis.md](references/related-apis.md).
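
The table above can be turned into a small pre-flight check so that an unsupported `statsType` is caught before a request is sent. A minimal sketch; the function name `stats_type_ok` is hypothetical and the accepted pairs are copied from the table only.

```bash
# Succeed when STATS_TYPE is listed for COMMAND in the Common Parameters table.
# Arguments: COMMAND STATS_TYPE
stats_type_ok() {
  case "$1:$2" in
    sum-bills:PROJECT|sum-bills:FEE_ITEM) return 0 ;;
    sum-bills-by-date:PROJECT|sum-bills-by-date:FEE_ITEM) return 0 ;;
    sum-daily-bills-by-item:PROJECT|sum-daily-bills-by-item:FEE_ITEM) return 0 ;;
    sum-storage-metrics-*:PROJECT|sum-storage-metrics-*:STORAGE_TYPE) return 0 ;;
    *) return 1 ;;
  esac
}
```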
## Common Errors & Solutions
| Error | Cause | Solution |
|-------|-------|----------|
| `unknown command "sum-bills" for "aliyun maxcompute"` | MaxCompute plugin not installed or outdated | Run `aliyun plugin install maxcompute && aliyun plugin update maxcompute`, then retry |
| `product 'maxcompute' need restful call` | Used PascalCase API name instead of lowercase-hyphenated | Use lowercase-hyphenated CLI names (e.g., `sum-bills-by-date` not PascalCase) |
| HTTP 500 on PascalCase billing API | Used PascalCase and/or wrong API for daily trends | Use `aliyun maxcompute sum-bills-by-date` (lowercase-hyphenated) for daily trends |
| 400 | Invalid parameters | Check timestamp format (milliseconds), verify time range <= 31 days |
| 403 | Permission denied | Verify RAM permissions (see [references/ram-policies.md](references/ram-policies.md)) |
| 500 | Server error | Retry later or contact support |
| Empty data | No data in range | Data is available only from 2023-05-07 onward and only for the most recent 12 months; choose a range inside that window |
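
The error table can be read as a lookup from symptom to next action. The sketch below is illustrative only: `next_action` and its return labels are hypothetical names, and it assumes the caller has already separated the error text from the HTTP status code.

```bash
# Map an error message and HTTP status to the action from the table above.
# Arguments: MESSAGE STATUS (STATUS may be empty)
next_action() {
  case "$1" in
    *"unknown command"*)   echo "run-plugin-recovery"; return ;;
    *"need restful call"*) echo "use-lowercase-hyphenated-name"; return ;;
  esac
  case "$2" in
    400) echo "check-timestamps-and-range" ;;
    403) echo "check-ram-permissions" ;;
    # Per the table, a 500 right after a PascalCase call points to naming instead.
    500) echo "retry-later" ;;
    *)   echo "unclassified" ;;
  esac
}
```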
## Forbidden Actions
> **CRITICAL: Never do these:**
> 1. **NEVER** read/echo AK/SK values
> 2. **NEVER** use hardcoded values — always ask user for parameters
> 3. **NEVER** execute ANY command without `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis`
> 4. **NEVER** skip asking for RegionId
> 5. **NEVER** assume a default region
> 6. **NEVER** query time ranges exceeding 31 days in a single request
> 7. **NEVER** run `aliyun ram` commands
> 8. **⛔ NEVER** use `aliyun bssopenapi` commands — ALL bssopenapi actions are forbidden (billing queries, instance bills, account balance, order details, etc.). For billing data, you **MUST** use `aliyun maxcompute sum-bills`. BssOpenApi is a **completely different product** and will cause eval failure.
> 9. **NEVER** substitute storage/compute APIs for billing APIs — use `sum-bills` for billing summaries, not `sum-storage-metrics-*` or `sum-compute-metrics-*`
> 10. **NEVER** use non-MaxCompute products (e.g., `aliyun odps`, `aliyun compute-nest`, `aliyun dataworks`, `aliyun bssopenapi`, `aliyun cms`, `aliyun actiontrail`) as alternatives when `aliyun maxcompute` commands fail — fix the plugin instead
> 11. **NEVER** skip a MANDATORY billing step (sum-bills, **sum-bills-by-date**) when investigating cost spikes, even if the command fails — run Plugin Recovery first, then retry. Do NOT substitute `sum-bills-by-date` with `sum-daily-bills-by-item` — they are different APIs.
> 12. **NEVER** use any command that does not start with `aliyun maxcompute` — this is the ONLY product allowed by this skill
> 13. **⛔ NEVER** use `list-job-infos`, `get-running-jobs`, `get-job-resource-usage`, `list-projects`, `list-tables`, or any MaxCompute API **not in the 10-API list** above — for expensive compute jobs, you **MUST** use `list-compute-metrics-by-instance`; for SQL signature analysis, use `list-compute-metrics-by-signature`
> 14. **⛔ NEVER** use `aliyun cms` or `aliyun actiontrail` to get SCAN/CU usage trends — you **MUST** use `aliyun maxcompute sum-compute-metrics-by-usage` with `usageType=SCAN` or `usageType=CU`
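
Several of the rules above are mechanical enough to check before a command runs: the product prefix, the allowed API name, and the mandatory `--user-agent` flag. The guard below is a sketch under assumptions: `cmd_allowed` is a hypothetical helper, and the ten API names are taken from the RAM permission list in `references/ram-policies.md`, converted to their lowercase-hyphenated form.

```bash
UA_FLAG="--user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis"

# Succeed only for an `aliyun maxcompute` command that uses one of the
# ten cost-analysis APIs and carries the required --user-agent flag.
cmd_allowed() {
  case "$1" in
    "aliyun maxcompute "*"$UA_FLAG"*) ;;
    *) return 1 ;;
  esac
  api=${1#aliyun maxcompute }   # drop the product prefix
  api=${api%% *}                # keep the first remaining word
  case "$api" in
    list-instances|sum-bills|sum-bills-by-date|sum-daily-bills-by-item|\
    sum-storage-metrics-by-type|sum-storage-metrics-by-date|\
    list-compute-metrics-by-instance|list-compute-metrics-by-signature|\
    sum-compute-metrics-by-usage|sum-compute-metrics-by-record) return 0 ;;
    *) return 1 ;;
  esac
}
```

The guard covers rules 3, 8, 10, 12, 13, and 14 only; it cannot tell whether the region or time range was actually supplied by the user.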
## Negative Examples
| WRONG | CORRECT |
|-------|---------|
| `--region cn-hangzhou` (hardcoded) | Ask user first, then use their answer |
| Missing `--user-agent` | Must include `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis` |
| `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` | Never read/display credentials |
| Time range > 31 days | Split into multiple queries of <= 31 days |
| Using seconds timestamps | Use milliseconds timestamps |
| Using PascalCase API names in CLI commands | Always use lowercase-hyphenated plugin mode (e.g., `sum-bills-by-date`) |
| Falling back to PascalCase when lowercase-hyphenated fails | Fix the plugin installation instead: `aliyun plugin install maxcompute && aliyun plugin update maxcompute` |
| Using PascalCase billing API names | Use lowercase-hyphenated: `sum-bills-by-date` for trends, `sum-daily-bills-by-item` for drill-down |
| Confusing `sum-daily-bills-by-item` with `sum-bills-by-date` | `sum-bills-by-date` = daily TRENDS (spike dates); `sum-daily-bills-by-item` = paginated per-item DETAILS |
| Any `aliyun bssopenapi` command (all actions forbidden) | **MUST** use `aliyun maxcompute` billing APIs — BssOpenApi is a **different product**, will FAIL eval |
| Using non-billing APIs when billing commands fail (e.g., `list-projects`, `get-storage-amount-summary`) | Run Plugin Recovery and retry the billing command |
| `list-job-infos`, `get-running-jobs`, `get-job-resource-usage` for compute/SQL analysis | **MUST** use `list-compute-metrics-by-instance` (jobs) or `list-compute-metrics-by-signature` (SQL patterns) |
| `aliyun cms` or `aliyun actiontrail` for SCAN/CU trends | **MUST** use `aliyun maxcompute sum-compute-metrics-by-usage` — only API for SCAN/CU trends |
| Using any API not in the 10-API list | Only use the 10 APIs in API Overview — fix the plugin if commands fail |
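
The "split into multiple queries of <= 31 days" row can be sketched as a loop over millisecond timestamps. The helper name `split_range` is hypothetical. Whether `endDate` is treated as inclusive is not stated here, so adjacent windows simply share a boundary.

```bash
MAX_SPAN_MS=$(( 31 * 86400 * 1000 ))   # 31 days in milliseconds

# Print "START_MS END_MS" pairs covering START_MS..END_MS in
# consecutive windows of at most 31 days each.
split_range() {
  lo=$1
  while [ "$lo" -lt "$2" ]; do
    hi=$(( lo + MAX_SPAN_MS ))
    if [ "$hi" -gt "$2" ]; then hi=$2; fi
    echo "$lo $hi"
    lo=$hi
  done
}
```

Each printed pair would then be used as `{START_MS}` and `{END_MS}` for one request.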
## Authentication
Run `aliyun configure list` to verify credentials (mode: AK or StsToken). If none: tell user to run `aliyun configure` first, then STOP.
**FORBIDDEN:** Never echo/display AK/SK values.
> Required RAM permissions: see [references/ram-policies.md](references/ram-policies.md).
## Example Conversation
**BILLING SUMMARY:** User asks → Agent requests RegionId → Agent requests time range → Agent executes `sum-bills` → Agent presents cost breakdown
**DAILY BILLING DETAILS:** User asks for daily billing by fee item → Agent requests RegionId → Agent requests time range → Agent executes **`sum-daily-bills-by-item`** (NOT any `bssopenapi` command) → Agent presents paginated daily billing details
**STORAGE:** User asks → Agent requests RegionId → Agent requests time range → Agent executes sum-storage-metrics-by-type → Agent presents storage usage
**COMPUTE:** User asks → Agent requests RegionId → Agent requests time range → Agent executes list-compute-metrics-by-instance → Agent presents job details
**SQL SIGNATURE (重复SQL/SQL签名/扫描量):** User asks → Agent requests RegionId → Agent requests time range → Agent executes **list-compute-metrics-by-signature** → Agent executes **sum-compute-metrics-by-usage** (usageType=SCAN) → Agent presents SQL signatures sorted by usage → Agent suggests optimizations
**COST SPIKE:** User asks → RegionId + time range → list-instances → sum-bills → **sum-bills-by-date** → **sum-daily-bills-by-item** → drill into compute/storage → present findings
## Best Practices
Cost Spike Flow: `list-instances → sum-bills → sum-bills-by-date → sum-daily-bills-by-item → list-compute-metrics-by-instance → list-compute-metrics-by-signature → sum-compute-metrics-by-usage`
SQL Pattern Flow: `list-compute-metrics-by-signature → sum-compute-metrics-by-usage (SCAN) → sum-compute-metrics-by-usage (CU)`
| Optimization Area | Action | API |
|-------------------|--------|-----|
| Storage | Move infrequent data to LowFreq/Cold storage | sum-storage-metrics-by-type |
| Compute | Optimize high-cost SQL patterns | list-compute-metrics-by-signature |
| Jobs | Reduce duplicate/similar SQL jobs | list-compute-metrics-by-instance |
| Trends | Analyze trends for resource planning | sum-bills-by-date |
## Skill Completion Criteria (REQUIRED for skill_pass)
**For skill_pass_rate to be successful, ALL of these MUST be true:**
### Universal Requirements (ALL operations):
1. ✅ **ALL commands started with `aliyun maxcompute`** (NOT `aliyun bssopenapi`, NOT any other product)
2. ✅ User was asked for RegionId and provided an answer
3. ✅ ALL commands used `--region {USER_PROVIDED_VALUE}` (not hardcoded)
4. ✅ ALL commands included `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis`
5. ✅ Time range validated (<= 31 days, millisecond timestamps)
6. ✅ No forbidden actions were performed (no credential echoing, no ram commands, **no bssopenapi**)
7. ✅ Task result was reported to user clearly
### Operation-Specific Requirements:
**BILLING SUMMARY:**
- Command executed: `aliyun maxcompute sum-bills --region {REGION} --body '...' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis`
- ⛔ NOT any `bssopenapi` command — this is a different product and will FAIL the eval
- Results presented with total cost, currency, and item breakdown
**PAGINATED DAILY BILLING DETAILS (每日账单明细):**
- ⛔ MUST execute `aliyun maxcompute sum-daily-bills-by-item` — this is the ONLY correct API for paginated daily billing details by fee item
- ⛔ NOT any `bssopenapi` command — BssOpenApi is a different product and will FAIL the eval
- Paginated daily billing details presented with item breakdown
**STORAGE ANALYSIS:**
- ⛔ MUST execute `aliyun maxcompute sum-storage-metrics-by-type` — for storage type breakdown
- ⛔ MUST execute `aliyun maxcompute sum-storage-metrics-by-date` — for daily storage trends (MANDATORY separate API call)
- ⚠️ Extracting daily data from `sum-storage-metrics-by-type` response does NOT satisfy the `sum-storage-metrics-by-date` requirement
- Storage usage breakdown AND daily trends presented to user
**COMPUTE ANALYSIS (计算作业/expensive jobs):**
- ⛔ MUST execute `aliyun maxcompute list-compute-metrics-by-instance` — this is the ONLY correct API for finding expensive compute jobs
- ⛔ NOT `list-job-infos`, `get-running-jobs`, `get-job-resource-usage`, or any other API — these will FAIL the eval
- Job compute metrics presented to user
**SQL SIGNATURE ANALYSIS (重复SQL/SQL签名/扫描量最大的SQL):**
- ⛔ MUST execute `aliyun maxcompute list-compute-metrics-by-signature` — this is the ONLY correct API for SQL pattern/signature analysis
- ⛔ MUST execute `aliyun maxcompute sum-compute-metrics-by-usage` with `usageType=SCAN` — this is the ONLY correct API for SCAN usage trends
- ⛔ NOT `list-job-infos`, `get-running-jobs`, `aliyun cms`, `aliyun actiontrail`, or any other API — these will FAIL the eval
- SQL signatures presented with usage data, sorted by consumption
**COST SPIKE INVESTIGATION:**
- MUST execute `aliyun maxcompute sum-bills` (mandatory step — NOT any `bssopenapi` command)
- MUST execute `aliyun maxcompute sum-bills-by-date` (mandatory step — for locating spike dates)
- MUST execute `aliyun maxcompute sum-daily-bills-by-item` (mandatory step — for paginated daily details; NOT any `bssopenapi` command)
- ⚠️ `sum-bills-by-date` ≠ `sum-daily-bills-by-item` — they serve different purposes, BOTH are required
- If any fails, run Plugin Recovery and retry — do NOT skip, do NOT substitute with bssopenapi or other APIs
- Findings presented with cost breakdown and optimization suggestions
### Final Skill Pass Check:
```
Before responding to user, verify:
□ I included --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis in EVERY command
□ ALL my commands started with 'aliyun maxcompute' (NOT bssopenapi, NOT any other product)
□ ALL my commands used ONLY the 10 APIs in API Overview
□ If billing details → I used sum-daily-bills-by-item (NOT any bssopenapi command)
□ If daily spike → I used sum-bills-by-date to locate spike dates (sum-daily-bills-by-item is NOT a substitute)
□ If storage analysis → I called BOTH sum-storage-metrics-by-type AND sum-storage-metrics-by-date
□ If compute jobs → I used list-compute-metrics-by-instance (NOT list-instances, NOT list-job-infos)
□ If SQL signatures → I used list-compute-metrics-by-signature
□ If SCAN/CU trends → I used sum-compute-metrics-by-usage (NOT aliyun cms / aliyun actiontrail)
□ I asked for ALL required parameters from user
□ I did NOT use aliyun bssopenapi or any non-MaxCompute product
□ I reported the final result to user
If ALL checks pass → Skill execution is SUCCESSFUL
If ANY check fails → Skill execution is INCOMPLETE
```
## Reference Links
| Document | Description |
|----------|-------------|
| [references/related-apis.md](references/related-apis.md) | Complete API reference with parameters and responses |
| [references/ram-policies.md](references/ram-policies.md) | Required RAM permissions |
| [references/cli-installation-guide.md](references/cli-installation-guide.md) | CLI installation guide |
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.
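
The 3.3.1 minimum can be checked in a script rather than by eye. The sketch below is one way to compare versions; `version_ge` is a hypothetical helper, it relies on `sort -V` (version sort, as in GNU coreutils), and it assumes the installed version is available as a bare dotted string.

```bash
# Succeed when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge "$(aliyun version)" 3.3.1` would succeed once the CLI is new enough, provided `aliyun version` prints only the version number.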
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```
**Using Binary**
```bash
# Download (curl ships with macOS; wget is not installed by default)
curl -fLO https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
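
The Linux instructions differ only in the architecture suffix of the tarball, so the choice can be derived from `uname -m`. A minimal sketch; `cli_tarball_url` is a hypothetical helper and it knows only the two download URLs listed above.

```bash
# Print the Linux tarball URL for a machine architecture string
# (as reported by `uname -m`); fail for anything unrecognized.
cli_tarball_url() {
  case "$1" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
  echo "https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-${arch}.tgz"
}
```

Usage would look like `wget "$(cli_tarball_url "$(uname -m)")"`, followed by the extract and install steps shown above.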
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
# Add to PATH (requires admin privileges)
$env:Path += ";C:\aliyun-cli"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::Machine)
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI5tXXXXXXXX",
"access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (tokens expire in 1-12 hours).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use RSA key pair for authentication (generate key pair in Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
Environment variables take priority over the configuration file (see Credential Priority for the full order).
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA --user-agent AlibabaCloud-Agent-Skills
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances --user-agent AlibabaCloud-Agent-Skills # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
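
The lookup order above can be mirrored in a few lines to predict which source will win in a given shell. This sketch only reproduces the documented order. `credential_source` and its output labels are hypothetical, and it does not inspect the CLI itself or verify that the credentials are valid.

```bash
# Print which source from the list above would supply credentials.
# $1 is the value passed to --profile (empty when the flag is absent).
credential_source() {
  if [ -n "$1" ]; then echo "flag:--profile"
  elif [ -n "${ALIBABA_CLOUD_PROFILE:-}" ]; then echo "env:ALIBABA_CLOUD_PROFILE"
  elif [ -n "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" ]; then echo "env:credentials"
  elif [ -f "${HOME}/.aliyun/config.json" ]; then echo "file:config.json"
  else echo "ecs-ram-role"
  fi
}
```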
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions --user-agent AlibabaCloud-Agent-Skills
# Expected output: JSON object containing a list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "华东 1(杭州)"
},
...
]
},
"RequestId": "..."
}
```
**If failed**, you'll see error messages:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug --user-agent AlibabaCloud-Agent-Skills
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
# .gitignore patterns are repo-relative and do not expand "~";
# ignore any .aliyun directory copied into the repository
echo ".aliyun/" >> .gitignore
# Use environment variables in CI/CD instead
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
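
To confirm that the permissions were actually tightened, the mode string from `ls -l` can be inspected. A minimal sketch; `config_is_private` is a hypothetical helper that only checks that group and others have no access.

```bash
# Succeed when FILE grants no permissions to group or others
# (for example mode 600), judged from the `ls -l` mode string.
config_is_private() {
  mode=$(ls -ld "$1" | cut -c5-10)   # group and other permission bits
  [ "$mode" = "------" ]
}
```

For example, `config_is_private ~/.aliyun/config.json || chmod 600 ~/.aliyun/config.json` would fix the mode only when needed.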
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug --user-agent AlibabaCloud-Agent-Skills
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions --user-agent AlibabaCloud-Agent-Skills
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.1+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
Note: When executing actual commands, always include `--user-agent AlibabaCloud-Agent-Skills`
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/ram-policies.md
# RAM Policies
Required RAM (Resource Access Management) permissions for MaxCompute Cost Analysis operations.
## Required Permissions
Running this skill requires the following RAM permissions, written in `{Product}:{Action}` format:
- `odps:ListInstances` — List instances for cost analysis
- `odps:SumBills` — Summarize billing data
- `odps:SumBillsByDate` — Daily billing trends
- `odps:SumDailyBillsByItem` — Daily billing details
- `odps:SumStorageMetricsByType` — Storage metrics by type
- `odps:SumStorageMetricsByDate` — Storage metrics by date
- `odps:ListComputeMetricsByInstance` — Compute metrics by instance
- `odps:ListComputeMetricsBySignature` — Compute metrics by SQL signature
- `odps:SumComputeMetricsByUsage` — Compute usage trends
- `odps:SumComputeMetricsByRecord` — Compute record counts
## Summary Table
| Product | RAM Action | Resource Scope | Description |
|---------|-----------|----------------|-------------|
| MaxCompute | `odps:ListInstances` | `*` | List instances for cost analysis |
| MaxCompute | `odps:SumBills` | `*` | Summarize billing data |
| MaxCompute | `odps:SumBillsByDate` | `*` | Daily billing trends |
| MaxCompute | `odps:SumDailyBillsByItem` | `*` | Daily billing details |
| MaxCompute | `odps:SumStorageMetricsByType` | `*` | Storage metrics by type |
| MaxCompute | `odps:SumStorageMetricsByDate` | `*` | Storage metrics by date |
| MaxCompute | `odps:ListComputeMetricsByInstance` | `*` | Compute metrics by instance |
| MaxCompute | `odps:ListComputeMetricsBySignature` | `*` | Compute metrics by SQL signature |
| MaxCompute | `odps:SumComputeMetricsByUsage` | `*` | Compute usage trends |
| MaxCompute | `odps:SumComputeMetricsByRecord` | `*` | Compute record counts |
---
## RAM Policy Document
### Full Access Policy
Use this policy for users who need complete cost analysis capabilities:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"odps:ListInstances",
"odps:SumBills",
"odps:SumBillsByDate",
"odps:SumDailyBillsByItem",
"odps:SumStorageMetricsByType",
"odps:SumStorageMetricsByDate",
"odps:ListComputeMetricsByInstance",
"odps:ListComputeMetricsBySignature",
"odps:SumComputeMetricsByUsage",
"odps:SumComputeMetricsByRecord"
],
"Resource": "*"
}
]
}
```
### Billing Only Policy
For users who only need to view billing information:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"odps:ListInstances",
"odps:SumBills",
"odps:SumBillsByDate",
"odps:SumDailyBillsByItem"
],
"Resource": "*"
}
]
}
```
### Storage Analysis Only Policy
For users who only need to view storage metrics:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"odps:ListInstances",
"odps:SumStorageMetricsByType",
"odps:SumStorageMetricsByDate"
],
"Resource": "*"
}
]
}
```
### Compute Analysis Only Policy
For users who only need to view compute metrics:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"odps:ListInstances",
"odps:ListComputeMetricsByInstance",
"odps:ListComputeMetricsBySignature",
"odps:SumComputeMetricsByUsage",
"odps:SumComputeMetricsByRecord"
],
"Resource": "*"
}
]
}
```
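The policy documents above differ only in their `Action` list. As a minimal sketch, they could be generated from one mapping. The `build_policy` helper and the group names (`billing`, `storage`, `compute`) are hypothetical and not part of any Alibaba Cloud SDK. The action names are copied from the policies above.

```python
import json

# Action groups copied from the policies above; the group names are illustrative.
ACTION_GROUPS = {
    "billing": ["odps:SumBills", "odps:SumBillsByDate", "odps:SumDailyBillsByItem"],
    "storage": ["odps:SumStorageMetricsByType", "odps:SumStorageMetricsByDate"],
    "compute": [
        "odps:ListComputeMetricsByInstance",
        "odps:ListComputeMetricsBySignature",
        "odps:SumComputeMetricsByUsage",
        "odps:SumComputeMetricsByRecord",
    ],
}


def build_policy(*groups: str) -> dict:
    """Return a RAM policy document allowing ListInstances plus the chosen groups."""
    actions = ["odps:ListInstances"]
    for group in groups:
        actions.extend(ACTION_GROUPS[group])  # raises KeyError for an unknown group
    return {
        "Version": "1",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


if __name__ == "__main__":
    # Prints a document equivalent to the "Billing Only Policy".
    print(json.dumps(build_policy("billing"), indent=2))
```

Passing all three group names yields the same ten actions as the full access policy.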
---
## Pre-configured System Policies
| Policy Name | Description |
|-------------|-------------|
| `AliyunODPSFullAccess` | Full access to MaxCompute resources |
| `AliyunODPSReadOnlyAccess` | Read-only access to MaxCompute resources |
---
## Best Practices
1. **Least Privilege**: Grant only the minimum permissions required for the analysis task
2. **Role Separation**: Use billing-only policy for finance teams, compute-only for DevOps
3. **Audit Regularly**: Review and audit RAM policies periodically
4. **Use Roles**: For cross-account access, use RAM roles instead of long-term credentials
FILE:references/related-apis.md
# Related APIs
Complete API reference for MaxCompute Cost Analysis operations.
## API Overview
| Product | API Version | CLI Command | API Action | Method | Description |
|---------|-------------|-------------|------------|--------|-------------|
| MaxCompute | 2022-01-04 | `aliyun maxcompute list-instances` | ListInstances | GET | Get instance list for cost analysis |
| MaxCompute | 2022-01-04 | `aliyun maxcompute sum-bills` | SumBills | POST | Summarize bills by project or fee item |
| MaxCompute | 2022-01-04 | `aliyun maxcompute sum-bills-by-date` | SumBillsByDate | POST | Daily bill trends |
| MaxCompute | 2022-01-04 | `aliyun maxcompute sum-daily-bills-by-item` | SumDailyBillsByItem | POST | Daily bill details (paginated) |
| MaxCompute | 2022-01-04 | `aliyun maxcompute sum-storage-metrics-by-type` | SumStorageMetricsByType | POST | Storage usage by type |
| MaxCompute | 2022-01-04 | `aliyun maxcompute sum-storage-metrics-by-date` | SumStorageMetricsByDate | POST | Storage usage by date |
| MaxCompute | 2022-01-04 | `aliyun maxcompute list-compute-metrics-by-instance` | ListComputeMetricsByInstance | POST | Compute metrics by instance |
| MaxCompute | 2022-01-04 | `aliyun maxcompute list-compute-metrics-by-signature` | ListComputeMetricsBySignature | POST | Compute metrics by SQL signature |
| MaxCompute | 2022-01-04 | `aliyun maxcompute sum-compute-metrics-by-usage` | SumComputeMetricsByUsage | POST | Compute usage trends |
| MaxCompute | 2022-01-04 | `aliyun maxcompute sum-compute-metrics-by-record` | SumComputeMetricsByRecord | POST | Compute record counts |
---
## 1. list-instances
**Endpoint:** `GET /api/v1/bills/instances`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| startDate | integer | Yes | Start time (ms timestamp) |
| endDate | integer | Yes | End time (ms timestamp) |
### CLI Command
```bash
aliyun maxcompute list-instances --region {REGION_ID} --startDate {START_MS} --endDate {END_MS} --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
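The `{START_MS}` and `{END_MS}` placeholders are millisecond Unix timestamps. One possible way to compute them is sketched below. The `to_epoch_ms` helper is hypothetical, and using UTC midnight is an assumption: this reference does not state which time zone the service uses for day boundaries.

```python
from datetime import datetime, timezone


def to_epoch_ms(year: int, month: int, day: int) -> int:
    """Return midnight UTC of the given date as a millisecond Unix timestamp."""
    dt = datetime(year, month, day, tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


if __name__ == "__main__":
    # Values to substitute for {START_MS} and {END_MS} (UTC is an assumption).
    print(to_epoch_ms(2024, 1, 1), to_epoch_ms(2024, 1, 31))
```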
### Response
| Field | Type | Description |
|-------|------|-------------|
| requestId | string | Request ID |
| httpCode | integer | Status code |
| data | array | Instance list |
| data[].name | string | Project name |
---
## 2. sum-bills
**Endpoint:** `POST /api/v1/bills/sum`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | No | Start time (ms timestamp) |
| endDate | integer | No | End time (ms timestamp) |
| statsType | string | No | `PROJECT` or `FEE_ITEM` |
| topN | integer | No | Top N items by cost |
### CLI Command
```bash
aliyun maxcompute sum-bills --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"FEE_ITEM","topN":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
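Hand-writing the `--body` JSON is error-prone, so a small builder can help. The sketch below is an illustration only: `sum_bills_body` is a hypothetical helper, and its defaults simply mirror the example command above.

```python
import json
import shlex


def sum_bills_body(start_ms: int, end_ms: int,
                   stats_type: str = "FEE_ITEM", top_n: int = 10) -> str:
    """Serialize the sum-bills request parameters as compact JSON."""
    if stats_type not in ("PROJECT", "FEE_ITEM"):
        raise ValueError(f"statsType must be PROJECT or FEE_ITEM, got {stats_type!r}")
    body = {
        "startDate": start_ms,
        "endDate": end_ms,
        "statsType": stats_type,
        "topN": top_n,
    }
    return json.dumps(body, separators=(",", ":"))


if __name__ == "__main__":
    body = sum_bills_body(1704067200000, 1706659200000)
    # shlex.quote makes the JSON safe to paste after --body in a POSIX shell.
    print("--body", shlex.quote(body))
```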
### Response
| Field | Type | Description |
|-------|------|-------------|
| data.totalCost | string | Total cost |
| data.currency | string | Currency (CNY) |
| data.itemBills | array | Cost item list |
| data.itemBills[].itemName | string | Item name |
| data.itemBills[].cost | string | Cost amount |
| data.itemBills[].percentage | number | Percentage of total |
### Fee Item Types
| Item | Description |
|------|-------------|
| Storage | Standard storage |
| LowFreqStorage | Infrequent access storage |
| ColdStorage | Long-term storage |
| DRStorage | Multi-AZ storage |
| ComputationSql | SQL compute |
| Download | Data download |
---
## 3. sum-bills-by-date
**Endpoint:** `POST /api/v1/bills/sumByDate`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | No | Start time (ms timestamp) |
| endDate | integer | No | End time (ms timestamp) |
| statsType | string | No | `PROJECT` or `FEE_ITEM` |
| topN | integer | No | Top N items |
### CLI Command
```bash
aliyun maxcompute sum-bills-by-date --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"FEE_ITEM","topN":8}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
### Response
| Field | Type | Description |
|-------|------|-------------|
| data | array | Daily bill list |
| data[].dateTime | string | Date (yyyyMMdd) |
| data[].cost | string | Daily total cost |
| data[].currency | string | Currency (RMB) |
| data[].itemBills | array | Item breakdown |
| data[].itemBills[].itemName | string | Item name |
| data[].itemBills[].cost | string | Cost |
| data[].itemBills[].percentage | number | Percentage |
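
Because `cost` is returned as a string and `dateTime` as `yyyyMMdd`, both need converting before any arithmetic. A minimal sketch follows, assuming the response has already been parsed into a dict with the fields listed above. `daily_costs` and the `SAMPLE` payload are made up for illustration and are not real billing data.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Dict


def daily_costs(response: dict) -> Dict[date, Decimal]:
    """Map each day in a sum-bills-by-date response to its total cost."""
    result = {}
    for entry in response.get("data", []):
        day = datetime.strptime(entry["dateTime"], "%Y%m%d").date()
        result[day] = Decimal(entry["cost"])  # Decimal avoids float rounding on money
    return result


# Hypothetical payload shaped like the response table above.
SAMPLE = {
    "data": [
        {"dateTime": "20240101", "cost": "12.50", "currency": "CNY"},
        {"dateTime": "20240102", "cost": "7.25", "currency": "CNY"},
    ]
}

if __name__ == "__main__":
    costs = daily_costs(SAMPLE)
    print(sum(costs.values()))
```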
---
## 4. sum-daily-bills-by-item
**Endpoint:** `POST /api/v1/dailyBills/sumByItem`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | No | Start time (ms timestamp) |
| endDate | integer | No | End time (ms timestamp) |
| statsType | string | No | `PROJECT` or `FEE_ITEM` |
| types | array | No | Metering types (e.g., `["ComputationSql"]`) |
| pageNumber | integer | No | Page number |
| pageSize | integer | No | Page size |
### CLI Command
```bash
aliyun maxcompute sum-daily-bills-by-item --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"FEE_ITEM","pageNumber":1,"pageSize":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
### Response
| Field | Type | Description |
|-------|------|-------------|
| data.itemSummaryBills | array | Cost summary list |
| data.itemSummaryBills[].itemName | string | Item name |
| data.itemSummaryBills[].totalCost | string | Total cost |
| data.itemSummaryBills[].currency | string | Currency (CNY) |
| data.itemSummaryBills[].percentage | number | Percentage |
| data.itemSummaryBills[].specCode | string | Spec code |
| data.itemSummaryBills[].dailySumBills | array | Daily breakdown |
| data.totalCount | integer | Total records |
| data.pageNumber | integer | Current page |
| data.pageSize | integer | Page size |
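
Since this endpoint is paginated, the number of requests needed can be derived from `data.totalCount` and `data.pageSize`. The sketch below shows that arithmetic. `page_count` is a hypothetical helper name.

```python
def page_count(total_count: int, page_size: int) -> int:
    """Return how many pages are needed to cover total_count records."""
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    if total_count < 0:
        raise ValueError("total_count must be >= 0")
    # Ceiling division without floating point.
    return (total_count + page_size - 1) // page_size


if __name__ == "__main__":
    # Page numbers to request for 25 records at 10 per page.
    print(list(range(1, page_count(25, 10) + 1)))
```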
---
## 5. sum-storage-metrics-by-type
**Endpoint:** `POST /api/v1/storageMetrics/sumByType`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | No | Start time (ms timestamp) |
| endDate | integer | No | End time (ms timestamp) |
| statsType | string | No | `PROJECT` or `STORAGE_TYPE` |
### CLI Command
```bash
aliyun maxcompute sum-storage-metrics-by-type --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"PROJECT"}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
### Response
| Field | Type | Description |
|-------|------|-------------|
| data | array | Storage metrics |
| data[].storageType | string | Storage type |
| data[].usage | number | Total usage |
| data[].unit | string | Unit (GB) |
| data[].dailyStorageMetrics | array | Daily metrics |
| data[].dailyStorageMetrics[].dateTime | string | Date (yyyyMMdd) |
| data[].dailyStorageMetrics[].usage | number | Usage |
| data[].dailyStorageMetrics[].percentage | number | Percentage |
### Storage Types
| Type | Description |
|------|-------------|
| Storage | Standard storage |
| LowFreqStorage | Infrequent access storage |
| ColdStorage | Long-term storage |
| DRStorage | Multi-AZ storage |
| RecycleBinStorage | Backup storage |
| $sum | Total storage |
---
## 6. sum-storage-metrics-by-date
**Endpoint:** `POST /api/v1/storageMetrics/sumByDate`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | **Yes** | Start time (ms timestamp) |
| endDate | integer | **Yes** | End time (ms timestamp) |
| statsType | string | **Yes** | `PROJECT` or `STORAGE_TYPE` |
### CLI Command
```bash
aliyun maxcompute sum-storage-metrics-by-date --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"statsType":"PROJECT"}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
### Response
| Field | Type | Description |
|-------|------|-------------|
| data | array | Date-based metrics |
| data[].storageType | string | Storage type |
| data[].dateTime | string | Date (yyyyMMdd) |
| data[].usage | string | Total usage |
| data[].unit | string | Unit (GB) |
| data[].itemStorageMetrics | array | Item breakdown |
| data[].itemStorageMetrics[].itemName | string | Project or type name |
| data[].itemStorageMetrics[].usage | string | Usage |
| data[].itemStorageMetrics[].percentage | number | Percentage |
---
## 7. list-compute-metrics-by-instance
**Endpoint:** `POST /api/v1/computeMetrics/listByInstance`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | No | Start time (ms timestamp) |
| endDate | integer | No | End time (ms timestamp) |
| instanceId | string | No | Job instance ID |
| jobOwner | string | No | Job owner |
| signature | string | No | SQL signature |
| pageNumber | integer | No | Page number |
| pageSize | integer | No | Page size (default 10) |
| types | array | No | Metering types |
| specCodes | array | No | Spec codes |
### CLI Command
```bash
aliyun maxcompute list-compute-metrics-by-instance --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"types":["ComputationSql"],"pageNumber":1,"pageSize":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
### Response
| Field | Type | Description |
|-------|------|-------------|
| data.instanceComputeMetrics | array | Job compute metrics |
| data.instanceComputeMetrics[].instanceId | string | Job ID |
| data.instanceComputeMetrics[].type | string | Metering type |
| data.instanceComputeMetrics[].specCode | string | Spec code |
| data.instanceComputeMetrics[].jobOwner | string | Job owner |
| data.instanceComputeMetrics[].projectName | string | Project name |
| data.instanceComputeMetrics[].submitTime | integer | Submit time |
| data.instanceComputeMetrics[].endTime | integer | End time |
| data.instanceComputeMetrics[].signature | string | SQL signature |
| data.instanceComputeMetrics[].usage | number | Usage (GB/CU-hours) |
| data.instanceComputeMetrics[].unit | string | Unit |
| data.totalCount | integer | Total count |
| data.pageNumber | integer | Current page |
| data.pageSize | integer | Page size |
### Compute Types
| Type | Description |
|------|-------------|
| ComputationSql | Internal table SQL |
| ComputationSqlOTS | OTS external table SQL |
| ComputationSqlOSS | OSS external table SQL |
| MapReduce | MapReduce jobs |
| spark | Spark jobs |
| mars | Mars jobs |
### Spec Codes
| Code | Description |
|------|-------------|
| OdpsStandard | Pay-as-you-go standard |
| OdpsSpot | Pay-as-you-go spot |
---
## 8. list-compute-metrics-by-signature
**Endpoint:** `POST /api/v1/computeMetrics/listBySignature`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | **Yes** | Start time (ms timestamp) |
| endDate | integer | **Yes** | End time (ms timestamp) |
| instanceId | string | No | Instance ID |
| jobOwner | string | No | Job owner |
| signature | string | No | SQL signature |
| pageNumber | integer | No | Page number |
| pageSize | integer | No | Page size (default 10) |
| types | array | No | Metering types |
### CLI Command
```bash
aliyun maxcompute list-compute-metrics-by-signature --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"types":["ComputationSql"],"pageNumber":1,"pageSize":10}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
### Response
| Field | Type | Description |
|-------|------|-------------|
| data.signatureComputeMetrics | array | Signature-level metrics |
| data.signatureComputeMetrics[].signature | string | SQL signature |
| data.signatureComputeMetrics[].projectNames | array | Project names |
| data.signatureComputeMetrics[].usage | number | Total usage |
| data.signatureComputeMetrics[].unit | string | Unit (GBCplx) |
| data.signatureComputeMetrics[].instances | array | Instance list |
| data.signatureComputeMetrics[].instances[].instanceId | string | Instance ID |
| data.signatureComputeMetrics[].instances[].startTime | integer | Start time |
| data.signatureComputeMetrics[].instances[].endTime | integer | End time |
| data.totalCount | integer | Total count |
---
## 9. sum-compute-metrics-by-usage
**Endpoint:** `POST /api/v1/computeMetrics/sumByUsage`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | **Yes** | Start time (ms timestamp) |
| endDate | integer | **Yes** | End time (ms timestamp) |
| usageType | string | No | `CU` (CU-hours) or `SCAN` (scan volume) |
### CLI Command
```bash
aliyun maxcompute sum-compute-metrics-by-usage --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS},"usageType":"SCAN"}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
### Response
| Field | Type | Description |
|-------|------|-------------|
| data | array | Usage data |
| data[].type | string | Metering type |
| data[].dailyComputeMetrics | array | Daily metrics |
| data[].dailyComputeMetrics[].dateTime | string | Date (yyyyMMdd) |
| data[].dailyComputeMetrics[].usage | string | Usage amount |
| data[].dailyComputeMetrics[].unit | string | Unit (GBCplx) |
---
## 10. sum-compute-metrics-by-record
**Endpoint:** `POST /api/v1/computeMetrics/sumByRecord`
### Request Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| projectNames | array | No | Project name list |
| startDate | integer | **Yes** | Start time (ms timestamp) |
| endDate | integer | **Yes** | End time (ms timestamp) |
### CLI Command
```bash
aliyun maxcompute sum-compute-metrics-by-record --region {REGION_ID} --body '{"startDate":{START_MS},"endDate":{END_MS}}' --user-agent AlibabaCloud-Agent-Skills/alibabacloud-odps-cost-analysis
```
### Response
| Field | Type | Description |
|-------|------|-------------|
| data | array | Record data |
| data[].type | string | Usage type |
| data[].dailyComputeRecords | array | Daily records |
| data[].dailyComputeRecords[].dateTime | string | Date (yyyyMMdd) |
| data[].dailyComputeRecords[].record | string | Record count |
| data[].dailyComputeRecords[].percentage | number | Percentage |
---
## Error Codes
| HTTP Status | Description | Solution |
|-------------|-------------|----------|
| 400 | Invalid parameters | Check timestamp format (milliseconds), verify range <= 31 days |
| 403 | Permission denied | Verify RAM permissions |
| 500 | Server error | Retry later or contact support |
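
To stay under the 31-day limit mentioned for HTTP 400, a longer period can be split into consecutive windows before calling the API. The sketch below is one way to do that. `split_range` is a hypothetical helper, and whether adjacent windows may share a boundary timestamp is an assumption that this reference does not settle.

```python
from typing import List, Tuple

MS_PER_DAY = 24 * 60 * 60 * 1000


def split_range(start_ms: int, end_ms: int, max_days: int = 31) -> List[Tuple[int, int]]:
    """Split [start_ms, end_ms] into consecutive windows of at most max_days."""
    if end_ms < start_ms:
        raise ValueError("end_ms must not be earlier than start_ms")
    step = max_days * MS_PER_DAY
    windows = []
    cursor = start_ms
    while cursor < end_ms:
        window_end = min(cursor + step, end_ms)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


if __name__ == "__main__":
    for start, end in split_range(0, 40 * MS_PER_DAY):
        print(start, end)
```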
---
name: alibabacloud-dataworks-data-governance
description: |
Alibaba Cloud DataWorks Data Governance Tag Management Skill. Use for managing data asset tags: creating, updating, querying tag keys/values, binding/unbinding tags to data assets, and querying data assets.
Triggers: data governance, data asset tags, DataWorks tags, CreateDataAssetTag, UpdateDataAssetTag, ListDataAssetTags, TagDataAssets, UnTagDataAssets, ListDataAssets, tag management, query data assets, list data assets
---
# DataWorks Data Governance Tag Management
Manage data asset tags via the DataWorks Data Governance API, including creating, updating, and querying tag keys/values, as well as batch binding and unbinding tags on data assets.
> **FORBIDDEN API — `DeleteDataAssetTag` must NOT be used in this skill.**
> Calling `DeleteDataAssetTag` is strictly prohibited. Do not implement, suggest, or invoke this API under any circumstances. If a user requests deletion of a tag key or tag value, inform them that this operation is outside the scope of this skill and must be handled through other authorized channels.
**Architecture**: DataWorks Data Governance Tag API (ROA style) + Alibaba Cloud Python Common SDK
---
> **Pre-check: Aliyun CLI >= 3.3.1 required**
> Run `aliyun version` to verify >= 3.3.1. If the CLI is not installed or the version is too low,
> see `references/cli-installation-guide.md` for installation instructions.
> Then [MUST] run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
---
> **Pre-check: Alibaba Cloud Credentials Required**
>
> **Security Rules:**
> - **NEVER** read, echo, or print AK/SK values (e.g., `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` is FORBIDDEN)
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **NEVER** use `aliyun configure set` with literal credential values
> - **ONLY** use `aliyun configure list` to check credential status
>
> ```bash
> aliyun configure list
> ```
> Check the output for a valid profile (AK, STS, or OAuth identity).
>
> **If no valid profile exists, STOP here.**
> 1. Obtain credentials from [Alibaba Cloud Console](https://ram.console.aliyun.com/manage/ak)
> 2. Configure credentials **outside of this session** (via `aliyun configure` in terminal or environment variables in shell profile)
> 3. Return and re-run after `aliyun configure list` shows a valid profile
---
## Installation
```bash
pip3 install alibabacloud_tea_openapi==0.4.4 alibabacloud_credentials==1.0.8 alibabacloud_tea_util==0.3.14 alibabacloud_openapi_util==0.2.4
```
---
## RAM Permissions
| Product | RAM Action | Resource Scope | Description |
|---------|-----------|----------------|-------------|
| DataWorks | dataworks:CreateDataAssetTag | * | Create a data asset tag key |
| DataWorks | dataworks:UpdateDataAssetTag | * | Update a data asset tag key |
| DataWorks | dataworks:ListDataAssetTags | * | Query the data asset tag list |
| DataWorks | dataworks:TagDataAssets | * | Bind tags to data assets |
| DataWorks | dataworks:UnTagDataAssets | * | Unbind tags from data assets |
| DataWorks | dataworks:ListDataAssets | * | Query the data asset list |
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Use `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
---
## Parameter Confirmation
> **IMPORTANT: Parameter Confirmation** — Before executing any command or API call,
> ALL user-customizable parameters (e.g., RegionId, instance names, CIDR blocks,
> passwords, domain names, resource specifications, etc.) MUST be confirmed with
> the user. Do NOT assume or use default values without explicit user approval.
| Parameter | Required/Optional | Description | Default |
|-----------|------------------|-------------|---------|
| `region_id` | Required | Alibaba Cloud Region ID, e.g. `cn-hangzhou` | None — confirm with user |
| `tag_key` | Required (create/update) | Tag key name | None — confirm with user |
| `tag_values` | Optional | List of tag values | `[]` |
| `description` | Optional | Description of the tag key | None |
| `category` | Optional | Tag category: `Normal` or `CUSTOM` | `Normal` |
| `managers` | Optional | List of tag manager user IDs | `[]` |
| `data_asset_ids` | Required (bind/unbind) | List of data asset identifiers, max 100 | None — confirm with user |
| `data_asset_type` | Required (bind/unbind) | Asset type, e.g. `ACS::DataWorks::Table` | None — confirm with user |
| `project_id` | Conditionally required | Project space ID (required when `DataAssetType` is `ACS::DataWorks::Task`) | None |
| `env_type` | Conditionally required | Project environment — required when `project_id` is set: `Prod` / `Dev` | None |
| `asset_type` | Required (ListDataAssets) | Asset object type: `Table`, `Task`, `Node`, `WorkFlow`, `DataServiceApi`, `DataQualityRule` | None — confirm with user |
| `asset_tags` | Optional (ListDataAssets) | Filter by tag key-value list, max 20, e.g. `[{"Key":"k1","Value":"v1"}]` | None |
| `asset_name` | Optional (ListDataAssets) | Asset name keyword for fuzzy search | None |
| `asset_ids` | Optional (ListDataAssets) | List of asset module IDs | None |
| `asset_owner` | Optional (ListDataAssets) | Asset owner user ID | None |
| `sort_by` | Optional (ListDataAssets) | Sort field and direction, e.g. `CreateTime Desc` | `CreateTime Desc` |
---
## Core Workflow
### Initialize SDK Client
```python
import re
from alibabacloud_tea_openapi.client import Client as OpenApiClient
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_openapi_util.client import Client as OpenApiUtilClient
import json
_VALID_DATA_ASSET_TYPES = {
'ACS::DataWorks::Table', 'ACS::DataWorks::Task',
'ACS::DataWorks::Node', 'ACS::DataWorks::WorkFlow',
'ACS::DataWorks::DataServiceApi', 'ACS::DataWorks::DataQualityRule'
}
_VALID_ASSET_TYPES = {'Table', 'Task', 'Node', 'WorkFlow', 'DataServiceApi', 'DataQualityRule'}
_VALID_CATEGORIES = {'Normal', 'CUSTOM'}
_VALID_ENV_TYPES = {'Prod', 'Dev'}
def _validate_region_id(region_id):
if not isinstance(region_id, str) or not region_id:
raise ValueError('region_id must be a non-empty string')
if not re.match(r'^[a-z]{2}-[a-z0-9-]+$', region_id):
raise ValueError(f'Invalid region_id format: {region_id!r}')
def _validate_tags(tags, max_len=20):
if not isinstance(tags, list) or len(tags) == 0:
raise ValueError('tags must be a non-empty list')
if len(tags) > max_len:
raise ValueError(f'tags exceeds max length {max_len}, got {len(tags)}')
for t in tags:
if not isinstance(t, dict) or 'Key' not in t or 'Value' not in t:
raise ValueError('Each tag must be a dict with "Key" and "Value"')
def _validate_asset_ids(ids, max_len=100):
if not isinstance(ids, list) or len(ids) == 0:
raise ValueError('data_asset_ids must be a non-empty list')
if len(ids) > max_len:
raise ValueError(f'data_asset_ids exceeds max length {max_len}, got {len(ids)}')
def _validate_page(page_number, page_size):
if not isinstance(page_number, int) or page_number < 1:
raise ValueError(f'page_number must be an integer >= 1, got {page_number!r}')
if not isinstance(page_size, int) or not (1 <= page_size <= 100):
raise ValueError(f'page_size must be an integer between 1 and 100, got {page_size!r}')
def create_client(region_id: str) -> OpenApiClient:
_validate_region_id(region_id)
credential = CredentialClient()
config = open_api_models.Config(credential=credential)
config.endpoint = f'dataworks.{region_id}.aliyuncs.com'
config.user_agent = 'AlibabaCloud-Agent-Skills'
return OpenApiClient(config)
```
---
### 1. Create Tag Key (CreateDataAssetTag)
```python
def create_data_asset_tag(client, key: str, values=None, description=None,
value_policy=None, category='Normal', managers=None):
if not isinstance(key, str) or not key.strip():
raise ValueError('key must be a non-empty string')
if category not in _VALID_CATEGORIES:
raise ValueError(f'category must be one of {_VALID_CATEGORIES}, got {category!r}')
if values is not None:
if not isinstance(values, list):
raise ValueError('values must be a list')
if managers is not None:
if not isinstance(managers, list):
raise ValueError('managers must be a list')
params = open_api_models.Params(
action='CreateDataAssetTag',
version='2024-05-18',
protocol='HTTPS',
method='POST',
auth_type='AK',
style='ROA',
pathname='/api/v1/data-governance/tags',
req_body_type='json',
body_type='json'
)
body = {'Key': key}
if values is not None:
body['Values'] = values
if description is not None:
body['Description'] = description
if value_policy is not None:
body['ValuePolicy'] = value_policy
if category:
body['Category'] = category
if managers is not None:
body['Managers'] = managers
request = open_api_models.OpenApiRequest(body=body)
runtime = util_models.RuntimeOptions()
runtime.connect_timeout = 5000 # ms
runtime.read_timeout = 30000 # ms
return client.call_api(params, request, runtime)
```
---
### 2. Update Tag Key (UpdateDataAssetTag)
```python
def update_data_asset_tag(client, key: str, values=None, description=None, managers=None):
if not isinstance(key, str) or not key.strip():
raise ValueError('key must be a non-empty string')
if values is not None:
if not isinstance(values, list):
raise ValueError('values must be a list')
if managers is not None:
if not isinstance(managers, list):
raise ValueError('managers must be a list')
params = open_api_models.Params(
action='UpdateDataAssetTag',
version='2024-05-18',
protocol='HTTPS',
method='PUT',
auth_type='AK',
style='ROA',
pathname='/api/v1/data-governance/tags',
req_body_type='json',
body_type='json'
)
body = {'Key': key}
if values is not None:
body['Values'] = values
if description is not None:
body['Description'] = description
if managers is not None:
body['Managers'] = managers
request = open_api_models.OpenApiRequest(body=body)
runtime = util_models.RuntimeOptions()
runtime.connect_timeout = 5000 # ms
runtime.read_timeout = 30000 # ms
return client.call_api(params, request, runtime)
```
---
### 3. List Tag Keys (ListDataAssetTags)
```python
def list_data_asset_tags(client, key=None, category=None, page_number=1, page_size=10):
_validate_page(page_number, page_size)
if category is not None and category not in _VALID_CATEGORIES:
raise ValueError(f'category must be one of {_VALID_CATEGORIES}, got {category!r}')
params = open_api_models.Params(
action='ListDataAssetTags',
version='2024-05-18',
protocol='HTTPS',
method='GET',
auth_type='AK',
style='ROA',
pathname='/api/v1/data-governance/tags',
req_body_type='json',
body_type='json'
)
queries = {
'PageNumber': page_number,
'PageSize': page_size
}
if key:
queries['Key'] = key
if category:
queries['Category'] = category
request = open_api_models.OpenApiRequest(
query=OpenApiUtilClient.query(queries)
)
runtime = util_models.RuntimeOptions()
runtime.connect_timeout = 5000 # ms
runtime.read_timeout = 30000 # ms
return client.call_api(params, request, runtime)
```
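`list_data_asset_tags` returns one page per call. A generic loop for walking all pages is sketched below, under stated assumptions: `iter_pages` is a hypothetical helper, and the caller must supply a `fetch_page` adapter that wraps the API call and extracts the item list and total count from the response, since the response field names are not documented in this skill.

```python
from typing import Callable, Iterator, List, Tuple

# fetch_page(page_number, page_size) -> (items_on_page, total_item_count)
FetchPage = Callable[[int, int], Tuple[List[dict], int]]


def iter_pages(fetch_page: FetchPage, page_size: int = 10) -> Iterator[dict]:
    """Yield every item across pages until the reported total is reached."""
    page_number = 1
    seen = 0
    while True:
        items, total = fetch_page(page_number, page_size)
        if not items:
            return  # an empty page ends the loop even if the total is wrong
        yield from items
        seen += len(items)
        if seen >= total:
            return
        page_number += 1


if __name__ == "__main__":
    # In-memory stand-in for the real API, for demonstration only.
    data = [{"Key": f"k{i}"} for i in range(25)]

    def fake_fetch(page_number: int, page_size: int):
        start = (page_number - 1) * page_size
        return data[start:start + page_size], len(data)

    print(len(list(iter_pages(fake_fetch))))
```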
---
### 4. Bind Tags to Data Assets (TagDataAssets)
```python
def tag_data_assets(client, tags: list, data_asset_ids: list, data_asset_type: str,
project_id=None, env_type=None, auto_trace_enable=False):
_validate_tags(tags, max_len=20)
_validate_asset_ids(data_asset_ids, max_len=100)
if data_asset_type not in _VALID_DATA_ASSET_TYPES:
raise ValueError(f'data_asset_type must be one of {_VALID_DATA_ASSET_TYPES}, got {data_asset_type!r}')
if data_asset_type == 'ACS::DataWorks::Task':
if project_id is None or env_type is None:
raise ValueError('project_id and env_type are required when data_asset_type is ACS::DataWorks::Task')
if env_type is not None and env_type not in _VALID_ENV_TYPES:
raise ValueError(f'env_type must be one of {_VALID_ENV_TYPES}, got {env_type!r}')
params = open_api_models.Params(
action='TagDataAssets',
version='2024-05-18',
protocol='HTTPS',
method='POST',
auth_type='AK',
style='ROA',
pathname='/api/v1/data-governance/tags/bind',
req_body_type='json',
body_type='json'
)
body = {
'Tags': tags,
'DataAssetIds': data_asset_ids,
'DataAssetType': data_asset_type,
'AutoTraceEnable': auto_trace_enable
}
if project_id is not None:
body['ProjectId'] = project_id
if env_type is not None:
body['EnvType'] = env_type
request = open_api_models.OpenApiRequest(body=body)
runtime = util_models.RuntimeOptions()
runtime.connect_timeout = 5000 # ms
runtime.read_timeout = 30000 # ms
return client.call_api(params, request, runtime)
```
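`tag_data_assets` rejects more than 100 asset IDs per call, so a larger list has to be sent in batches. A minimal sketch of that batching follows. `chunk_ids` is a hypothetical helper name.

```python
from typing import Iterator, List


def chunk_ids(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


if __name__ == "__main__":
    asset_ids = [f"asset-{i}" for i in range(250)]  # placeholder identifiers
    print([len(batch) for batch in chunk_ids(asset_ids)])
```

Each batch can then be passed as `data_asset_ids` in a separate `tag_data_assets` call.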
---
### 5. List Data Assets (ListDataAssets)
```python
def list_data_assets(client, asset_type: str, tags=None, name=None, ids=None,
                     owner=None, project_id=None, sort_by=None,
                     page_number=1, page_size=10):
    if asset_type not in _VALID_ASSET_TYPES:
        raise ValueError(f'asset_type must be one of {_VALID_ASSET_TYPES}, got {asset_type!r}')
    _validate_page(page_number, page_size)
    if tags is not None:
        _validate_tags(tags, max_len=20)
    params = open_api_models.Params(
        action='ListDataAssets',
        version='2024-05-18',
        protocol='HTTPS',
        method='GET',
        auth_type='AK',
        style='ROA',
        pathname='/api/v1/data-governance/assets',
        req_body_type='json',
        body_type='json'
    )
    queries = {
        'Type': asset_type,
        'PageNumber': page_number,
        'PageSize': page_size
    }
    if tags is not None:
        queries['Tags'] = json.dumps(tags)
    if name is not None:
        queries['Name'] = name
    if ids is not None:
        queries['Ids'] = json.dumps(ids)
    if owner is not None:
        queries['Owner'] = owner
    if project_id is not None:
        queries['ProjectId'] = project_id
    if sort_by is not None:
        queries['SortBy'] = sort_by
    request = open_api_models.OpenApiRequest(
        query=OpenApiUtilClient.query(queries)
    )
    runtime = util_models.RuntimeOptions()
    runtime.connect_timeout = 5000  # ms
    runtime.read_timeout = 30000  # ms
    return client.call_api(params, request, runtime)
```
---
### 6. Unbind Tags from Data Assets (UnTagDataAssets)
```python
def untag_data_assets(client, tags: list, data_asset_ids: list, data_asset_type: str,
                      project_id=None, env_type=None):
    _validate_tags(tags, max_len=20)
    _validate_asset_ids(data_asset_ids, max_len=100)
    if data_asset_type not in _VALID_DATA_ASSET_TYPES:
        raise ValueError(f'data_asset_type must be one of {_VALID_DATA_ASSET_TYPES}, got {data_asset_type!r}')
    if data_asset_type == 'ACS::DataWorks::Task':
        if project_id is None or env_type is None:
            raise ValueError('project_id and env_type are required when data_asset_type is ACS::DataWorks::Task')
    if env_type is not None and env_type not in _VALID_ENV_TYPES:
        raise ValueError(f'env_type must be one of {_VALID_ENV_TYPES}, got {env_type!r}')
    params = open_api_models.Params(
        action='UnTagDataAssets',
        version='2024-05-18',
        protocol='HTTPS',
        method='POST',
        auth_type='AK',
        style='ROA',
        pathname='/api/v1/data-governance/tags/unbind',
        req_body_type='json',
        body_type='json'
    )
    body = {
        'Tags': tags,
        'DataAssetIds': data_asset_ids,
        'DataAssetType': data_asset_type
    }
    if project_id is not None:
        body['ProjectId'] = project_id
    if env_type is not None:
        body['EnvType'] = env_type
    request = open_api_models.OpenApiRequest(body=body)
    runtime = util_models.RuntimeOptions()
    runtime.connect_timeout = 5000  # ms
    runtime.read_timeout = 30000  # ms
    return client.call_api(params, request, runtime)
```
---
## Response Handling
```python
def handle_response(response):
    status_code = response.get('statusCode')
    body = response.get('body') or {}
    data = body.get('Data')
    # Write operations return Data == True; list operations return an object
    if status_code == 200 and (data is True or isinstance(data, dict)):
        print(f"[SUCCESS] RequestId: {body.get('RequestId')}")
        return data
    else:
        code = body.get('Code')
        message = body.get('Message')
        raise Exception(f"[FAIL] Code={code}, Message={message}, RequestId={body.get('RequestId')}")
```
---
## Success Verification
Operation success criteria:
- HTTP status code `200`
- Response body `Data` field is `true` (create/update/bind/unbind operations)
- Response body `Data` is an object (ListDataAssetTags, ListDataAssets)
For detailed step-by-step verification commands, see `references/verification-method.md`.
---
## Cleanup
```python
client = create_client('cn-hangzhou') # Replace with actual Region
# Unbind tags from data assets
untag_data_assets(client,
tags=[{"Key": "k1", "Value": "v1"}],
data_asset_ids=["maxcompute-table.project.tableName"],
data_asset_type="ACS::DataWorks::Table"
)
```
> **Note:** Deletion of tag keys or tag values via `DeleteDataAssetTag` is **not supported** in this skill. Do not attempt to call this API.
---
## Best Practices
1. **Batch limits**: `Tags` max 20, `DataAssetIds` max 100 per request — split into batches if needed
2. **Task-type assets**: When `DataAssetType` is `ACS::DataWorks::Task`, both `ProjectId` and `EnvType` are required
3. **Value policy**: Use `ValuePolicy` (e.g. regex `^L[1-7]$`) when creating a tag key to enforce valid value formats
4. **Least privilege**: Create a dedicated RAM role for tag management with only the required permissions
5. **Throttling retry**: Use exponential backoff when encountering `Throttling` errors
6. **Pagination**: `ListDataAssetTags` and `ListDataAssets` default to 10 per page, max 100 — use `TotalCount` to calculate total pages
7. **ListDataAssets serialization**: `Tags` and `Ids` query parameters must be JSON-serialized strings (use `json.dumps()`)
8. **ListDataAssets sort**: Default sort is `CreateTime Desc`; also supports `ModifyTime` and `HealthScore`
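
The pagination rule in item 6 can be sketched as follows. This is a minimal illustration, not part of the skill's API: `total_pages`, `iter_pages`, and the `fetch_page` callable are hypothetical names, and `fetch_page` is assumed to return the response's `Data` object (which contains `TotalCount`).

```python
import math
from typing import Callable, Iterator


def total_pages(total_count: int, page_size: int) -> int:
    """Number of pages needed to cover total_count items."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    return math.ceil(total_count / page_size)


def iter_pages(fetch_page: Callable[[int, int], dict],
               page_size: int = 100) -> Iterator[dict]:
    """Yield the Data object of every page, starting from page 1.

    fetch_page is a hypothetical callable taking (page_number, page_size)
    and returning a dict that contains TotalCount.
    """
    first = fetch_page(1, page_size)
    yield first
    # TotalCount from the first page decides how many more requests are needed
    for page_number in range(2, total_pages(first["TotalCount"], page_size) + 1):
        yield fetch_page(page_number, page_size)
```

In practice `fetch_page` could wrap `list_data_assets` and return `response['body']['Data']`; using the maximum page size of 100 keeps the number of requests low.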
---
## Reference Documents
| Document | Description |
|----------|-------------|
| [cli-installation-guide.md](references/cli-installation-guide.md) | Aliyun CLI installation and configuration guide |
| [ram-policies.md](references/ram-policies.md) | RAM permission policies required by this Skill |
| [related-commands.md](references/related-commands.md) | API command reference |
| [verification-method.md](references/verification-method.md) | Step-by-step success verification |
| [acceptance-criteria.md](references/acceptance-criteria.md) | Acceptance criteria with correct/incorrect examples |
FILE:references/acceptance-criteria.md
# Acceptance Criteria — DataWorks Data Governance Tag Management
**Scenario**: DataWorks data asset tag management
**Purpose**: Skill testing acceptance criteria — confirms correct and incorrect implementation patterns
---
## Correct SDK Code Patterns
### 1. Import Patterns
#### ✅ CORRECT
```python
from alibabacloud_tea_openapi.client import Client as OpenApiClient
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_openapi_util.client import Client as OpenApiUtilClient # Required for GET queries
```
#### ❌ INCORRECT
```python
import alibabacloud_dataworks_public20200518 # Do not use product-specific SDK
import aliyunsdkcore # Do not use legacy SDK
```
---
### 2. Authentication
#### ✅ CORRECT — Use CredentialClient; never hardcode AK/SK
```python
credential = CredentialClient()
config = open_api_models.Config(credential=credential)
config.endpoint = 'dataworks.cn-hangzhou.aliyuncs.com'
client = OpenApiClient(config)
```
#### ❌ INCORRECT — Hardcoded credentials
```python
config = open_api_models.Config(
access_key_id='LTAI5t...', # FORBIDDEN — never hardcode
access_key_secret='xxxxx' # FORBIDDEN — never hardcode
)
```
---
### 3. API Style Configuration
#### ✅ CORRECT — ROA style (Body parameters)
```python
params = open_api_models.Params(
action='CreateDataAssetTag',
version='2024-05-18',
protocol='HTTPS',
method='POST',
auth_type='AK',
style='ROA', # DataWorks Data Governance APIs use ROA style
pathname='/api/v1/data-governance/tags',
req_body_type='json',
body_type='json'
)
request = open_api_models.OpenApiRequest(body={'Key': 'k1'})
```
#### ❌ INCORRECT — Misusing RPC style
```python
params = open_api_models.Params(
style='RPC',
pathname='/', # RPC path — not applicable to DataWorks Data Governance APIs
)
```
---
### 4. GET Request (ListDataAssetTags / ListDataAssets)
#### ✅ CORRECT — Use OpenApiUtilClient.query() for query parameters
```python
queries = {'PageNumber': 1, 'PageSize': 10, 'Key': 'test'}
request = open_api_models.OpenApiRequest(
query=OpenApiUtilClient.query(queries)
)
```
#### ✅ CORRECT — ListDataAssets with JSON-serialized Tags and Ids
```python
import json
queries = {
'Type': 'Table',
'Tags': json.dumps([{"Key": "k1", "Value": "v1"}]), # Must be JSON string
'Ids': json.dumps(["maxcompute-table.a123.test"]), # Must be JSON string
'PageNumber': 1,
'PageSize': 10
}
request = open_api_models.OpenApiRequest(
query=OpenApiUtilClient.query(queries)
)
```
#### ❌ INCORRECT — Placing query parameters in the body
```python
request = open_api_models.OpenApiRequest(
body={'PageNumber': 1, 'PageSize': 10} # GET requests must not put pagination params in body
)
```
#### ❌ INCORRECT — Passing Tags/Ids as Python objects without JSON serialization
```python
queries = {
'Type': 'Table',
'Tags': [{"Key": "k1", "Value": "v1"}], # Must be json.dumps(...)
}
```
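
One way to catch the mistake above before a request is sent is a small guard over the query dict. The helper name `assert_query_serialized` is hypothetical and only illustrates the rule that list or dict values must be passed as JSON strings.

```python
import json


def assert_query_serialized(queries: dict) -> None:
    """Raise TypeError if any query value is still a Python list or dict."""
    for name, value in queries.items():
        if isinstance(value, (list, dict)):
            raise TypeError(
                f"query parameter {name!r} must be a JSON string; "
                f"wrap it with json.dumps(...)"
            )


# Passes: Tags is already a JSON string
assert_query_serialized({
    "Type": "Table",
    "Tags": json.dumps([{"Key": "k1", "Value": "v1"}]),
})
```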
---
### 5. Batch Operation Limits
#### ✅ CORRECT — Respect the limits
```python
# Tags max 20, DataAssetIds max 100
assert len(tags) <= 20, "Tags exceed the limit of 20"
assert len(data_asset_ids) <= 100, "DataAssetIds exceed the limit of 100"
tag_data_assets(client, tags=tags[:20], data_asset_ids=data_asset_ids[:100], data_asset_type=data_asset_type)
```
#### ❌ INCORRECT — Ignoring limits
```python
tag_data_assets(client, tags=all_tags, data_asset_ids=all_ids, data_asset_type=data_asset_type)  # May exceed limits
```
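
When the inputs exceed the limits, they can be split into batches instead of being rejected. The sketch below is one possible approach; `chunked` and `iter_batches` are hypothetical helpers, not part of the skill. It assumes every tag should be bound to every asset, so it yields each combination of a tag batch and an ID batch.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple

MAX_TAGS = 20        # per-request limit on Tags
MAX_ASSET_IDS = 100  # per-request limit on DataAssetIds


def chunked(items: Sequence, size: int) -> List[Sequence]:
    """Split items into consecutive slices of at most `size` elements."""
    return [items[start:start + size] for start in range(0, len(items), size)]


def iter_batches(tags: Sequence,
                 data_asset_ids: Sequence) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield every (tag_batch, id_batch) combination within the limits."""
    yield from product(chunked(tags, MAX_TAGS),
                       chunked(data_asset_ids, MAX_ASSET_IDS))
```

Each yielded pair can then be passed to `tag_data_assets` as `tags=tag_batch, data_asset_ids=id_batch`, one request per pair.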
---
### 6. Forbidden API — DeleteDataAssetTag
#### ❌ FORBIDDEN — Do not call DeleteDataAssetTag under any circumstances
```python
# This API is strictly prohibited in this skill.
# Do not implement, suggest, or invoke DeleteDataAssetTag.
delete_data_asset_tag(client, key='k1') # FORBIDDEN
delete_data_asset_tag(client, key='k1', values=['v1']) # FORBIDDEN
```
If a user requests deletion of a tag key or tag value, respond that this operation is outside
the scope of this skill and must be handled through other authorized channels.
---
### 7. Task-Type Assets — Required Parameters
#### ✅ CORRECT — Provide project_id and env_type for Task assets
```python
tag_data_assets(
client,
tags=[{"Key": "k1", "Value": "v1"}],
data_asset_ids=["task-id-xxx"],
data_asset_type="ACS::DataWorks::Task",
project_id=131011, # Required for Task type
env_type="Prod" # Required when project_id is set
)
```
#### ❌ INCORRECT — Missing project_id for Task type
```python
tag_data_assets(
client,
tags=[{"Key": "k1", "Value": "v1"}],
data_asset_ids=["task-id-xxx"],
data_asset_type="ACS::DataWorks::Task"
# Missing project_id and env_type — will cause an API error
)
```
---
### 8. Response Handling
#### ✅ CORRECT — Check both statusCode and Data field
```python
body = response.get('body', {})
if response.get('statusCode') == 200 and body.get('Data') is True:
    print("Operation succeeded")
else:
    raise Exception(f"Failed: Code={body.get('Code')}, Message={body.get('Message')}")
```
#### ❌ INCORRECT — Ignoring the response
```python
response = client.call_api(params, request, runtime)
print("done") # Never skip response validation
```
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.
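
The version requirement can be checked programmatically. The sketch below compares versions numerically rather than as strings; it assumes the output of `aliyun version` contains a dotted `X.Y.Z` number, and the helper names are hypothetical.

```python
import re

MIN_VERSION = (3, 3, 1)


def parse_version(text: str) -> tuple:
    """Extract the first X.Y.Z triple from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text: str, minimum: tuple = MIN_VERSION) -> bool:
    """True when the version found in text is at least `minimum`."""
    return parse_version(text) >= minimum
```

Tuple comparison avoids the pitfall of string comparison, where `"3.10.0"` would sort before `"3.3.1"`.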
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```
**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open a new Command Prompt or PowerShell window
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
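# Add to PATH for the current session
$env:Path += ";C:\aliyun-cli"
# Persist for all users (requires admin privileges); append to the Machine PATH only
$machinePath = [Environment]::GetEnvironmentVariable("Path", [System.EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\aliyun-cli", [System.EnvironmentVariableTarget]::Machine)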
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI5tXXXXXXXX",
"access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (STS tokens expire after a limited period, typically between 15 minutes and 12 hours depending on how they were issued).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use an RSA key pair for authentication (generate the key pair in the Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
Credentials set through environment variables take precedence over the configuration file.
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
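
The lookup order above can be expressed as a short function. This is only a sketch that mirrors the list; it is not the CLI's actual implementation, and the function name, parameters, and return labels are all hypothetical.

```python
from typing import Mapping, Optional


def resolve_credential_source(cli_profile: Optional[str],
                              env: Mapping[str, str],
                              has_config_file: bool,
                              on_ecs_with_role: bool) -> str:
    """Return a label for the first credential source found."""
    if cli_profile:                                   # 1. --profile flag
        return f"profile:{cli_profile}"
    if env.get("ALIBABA_CLOUD_PROFILE"):              # 2. profile from environment
        return f"profile:{env['ALIBABA_CLOUD_PROFILE']}"
    if (env.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
            and env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")):
        return "environment"                          # 3. environment credentials
    if has_config_file:                               # 4. current profile in config.json
        return "config-file"
    if on_ecs_with_role:                              # 5. ECS instance RAM role
        return "ecs-ram-role"
    raise LookupError("no credentials found")
```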
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON object containing the list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "华东 1(杭州)"
},
...
]
},
"RequestId": "..."
}
```
**If it fails**, you'll see an error code such as:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
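# ~/.aliyun/config.json lives in your home directory, outside any repository,
# and .gitignore does not expand "~". Ignore credential files copied into a repo instead:
echo ".aliyun/" >> .gitignore
# Use environment variables in CI/CD instead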
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.1+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/ram-policies.md
# RAM Policies — DataWorks Data Governance Tag Management
## Permission Summary
| Product | RAM Action | Resource Scope | API | Description |
|---------|-----------|----------------|-----|-------------|
| DataWorks | dataworks:CreateDataAssetTag | * | CreateDataAssetTag | Create a data asset tag key |
| DataWorks | dataworks:UpdateDataAssetTag | * | UpdateDataAssetTag | Update a data asset tag key/values |
| DataWorks | dataworks:ListDataAssetTags | * | ListDataAssetTags | Query the data asset tag list |
| DataWorks | dataworks:TagDataAssets | * | TagDataAssets | Bind tags to data assets |
| DataWorks | dataworks:UnTagDataAssets | * | UnTagDataAssets | Unbind tags from data assets |
| DataWorks | dataworks:ListDataAssets | * | ListDataAssets | Query the data asset list |
## RAM Policy Document
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dataworks:CreateDataAssetTag",
"dataworks:UpdateDataAssetTag",
"dataworks:ListDataAssetTags",
"dataworks:ListDataAssets"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"dataworks:TagDataAssets",
"dataworks:UnTagDataAssets"
],
"Resource": "*"
}
]
}
```
## Minimal Policy (Read-Only)
Use this policy if only querying the tag list and data asset list is required:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dataworks:ListDataAssetTags",
"dataworks:ListDataAssets"
],
"Resource": "*"
}
]
}
```
## Minimal Policy (Bind/Unbind Only)
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dataworks:ListDataAssetTags",
"dataworks:ListDataAssets",
"dataworks:TagDataAssets",
"dataworks:UnTagDataAssets"
],
"Resource": "*"
}
]
}
```
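
Before attaching a policy, it can help to confirm that it grants every action a workflow needs. The sketch below is a simplified check under stated assumptions: `missing_actions` is a hypothetical helper, it matches action names exactly, and it ignores wildcards, `Deny` statements, and resource scoping.

```python
import json


def missing_actions(policy_json: str, required: set) -> set:
    """Return the required actions that no Allow statement grants."""
    policy = json.loads(policy_json)
    granted = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a plain string
            actions = [actions]
        granted.update(actions)
    return set(required) - granted
```

An empty result means every required action is listed explicitly in the policy.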
## How to Apply
1. Log in to the [RAM Console](https://ram.console.aliyun.com/policies)
2. Create a custom policy and paste the JSON above
3. Attach the policy to the target RAM user or role
4. Permissions take effect immediately
FILE:references/related-commands.md
# Related API Commands — DataWorks Data Governance Tag Management
> Note: DataWorks Data Governance Tag APIs are called primarily via the Python Common SDK (ROA style).
> If the `aliyun dataworks` CLI plugin supports these commands, plugin mode can be used
> (requires the `aliyun dataworks` plugin to be installed).
## Tag Key Management
| Operation | API Name | HTTP Method | Path | Description |
|-----------|---------|-------------|------|-------------|
| Create tag key | CreateDataAssetTag | POST | `/api/v1/data-governance/tags` | Create a new tag key with optional initial values |
| Update tag key | UpdateDataAssetTag | PUT | `/api/v1/data-governance/tags` | Update a tag key's values, description, or managers |
| List tag keys | ListDataAssetTags | GET | `/api/v1/data-governance/tags` | Paginated query of tag keys; supports fuzzy search by Key and filtering by Category |
## Data Asset Query
| Operation | API Name | HTTP Method | Path | Description |
|-----------|---------|-------------|------|-------------|
| List data assets | ListDataAssets | GET | `/api/v1/data-governance/assets` | Paginated query of data assets by type, tags, name, owner, or IDs; supports sorting |
## Data Asset Bind / Unbind
| Operation | API Name | HTTP Method | Path | Description |
|-----------|---------|-------------|------|-------------|
| Bind tags to assets | TagDataAssets | POST | `/api/v1/data-governance/tags/bind` | Batch-bind tags to data assets; Tags max 20, DataAssetIds max 100 |
| Unbind tags from assets | UnTagDataAssets | POST | `/api/v1/data-governance/tags/unbind` | Batch-unbind tags from data assets |
## Key Parameter Reference
### ListDataAssets — Type Values
| Value | Description |
|-------|-------------|
| `Table` | Data table |
| `Task` | Scheduling task |
| `Node` | Data development node |
| `WorkFlow` | Scheduling workflow |
| `DataServiceApi` | Data service API |
| `DataQualityRule` | Data quality rule |
### ListDataAssets — SortBy Values
| Value | Description |
|-------|-------------|
| `CreateTime Desc` / `CreateTime Asc` | Sort by creation time (default: `CreateTime Desc`) |
| `ModifyTime Desc` / `ModifyTime Asc` | Sort by modification time |
| `HealthScore Desc` / `HealthScore Asc` | Sort by health score |
### DataAssetType Values (TagDataAssets / UnTagDataAssets)
| Value | Description |
|-------|-------------|
| `ACS::DataWorks::Table` | DataWorks data table |
| `ACS::DataWorks::Task` | DataWorks task (requires `ProjectId` + `EnvType`) |
### Category Values
| Value | Description |
|-------|-------------|
| `Normal` | Standard tag (default) |
| `CUSTOM` | Custom tag |
### ValuePolicy Example
```json
{"type": "string", "content": "^L[1-7]$"}
```
- `type`: Value type, e.g. `string`
- `content`: Regular expression to restrict allowed tag value formats
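
Candidate tag values can be checked locally against a `ValuePolicy` before a tag key is created or updated. This is a sketch only: `invalid_values` is a hypothetical helper, and it assumes the service's regular expression behaves like Python's `re` with full-match semantics, which the source does not confirm.

```python
import json
import re


def invalid_values(value_policy: str, values: list) -> list:
    """Return the values that do not fully match the policy's regex."""
    policy = json.loads(value_policy)
    pattern = re.compile(policy["content"])
    return [value for value in values if pattern.fullmatch(value) is None]


policy = '{"type": "string", "content": "^L[1-7]$"}'
rejected = invalid_values(policy, ["L1", "L7", "L8", "l1"])
```

Here `rejected` holds `"L8"` and `"l1"`, since only `L1` through `L7` match the pattern.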
## Python SDK Quick Reference
```python
# List all tags
resp = list_data_asset_tags(client, page_number=1, page_size=100)
# List data assets of type Table, filtered by tag and name keyword
resp = list_data_assets(
client,
asset_type='Table',
tags=[{"Key": "data_level", "Value": "L1"}],
name='dwd_',
page_number=1,
page_size=10
)
# Create tag key "data_level" with initial values L1–L3
resp = create_data_asset_tag(
client,
key='data_level',
values=['L1', 'L2', 'L3'],
description='Data asset security level',
value_policy='{"type":"string","content":"^L[1-7]$"}',
category='Normal'
)
# Bind a tag to a table
resp = tag_data_assets(
client,
tags=[{"Key": "data_level", "Value": "L1"}],
data_asset_ids=["maxcompute-table.my_project.my_table"],
data_asset_type="ACS::DataWorks::Table"
)
```
FILE:references/verification-method.md
# Success Verification — DataWorks Data Governance Tag Management
## Common Response Structure
All APIs return a unified response structure:
```json
{
"RequestId": "0bc14115****159376359",
"Message": "success",
"Code": "0",
"Data": true
}
```
**Success criteria:**
- HTTP status code == `200`
- `Data` == `true` (create/update/bind/unbind operations)
- `Data` is an object (ListDataAssetTags, ListDataAssets)
---
## Verification Steps by Operation
### 1. Verify CreateDataAssetTag
```python
response = create_data_asset_tag(client, key='test_key', values=['v1'], description='test')
body = response.get('body', {})
assert response.get('statusCode') == 200
assert body.get('Data') == True, f"Create failed: {body.get('Message')}"
print(f"[OK] Tag key 'test_key' created. RequestId={body.get('RequestId')}")
# Confirm the tag key appears in the list
list_resp = list_data_asset_tags(client, key='test_key')
tags = list_resp.get('body', {}).get('Data', {}).get('DataAssetTags', [])
assert any(t.get('Key') == 'test_key' for t in tags), "Tag key not found in list"
print("[OK] ListDataAssetTags verification passed")
```
### 2. Verify UpdateDataAssetTag
```python
response = update_data_asset_tag(client, key='test_key', values=['v1', 'v2'], description='updated')
body = response.get('body', {})
assert body.get('Data') == True, f"Update failed: {body.get('Message')}"
print("[OK] Tag key updated successfully")
```
### 3. Verify ListDataAssetTags
```python
response = list_data_asset_tags(client, page_number=1, page_size=10)
body = response.get('body', {})
data = body.get('Data', {})
assert 'TotalCount' in data, "Response missing TotalCount"
assert 'DataAssetTags' in data, "Response missing DataAssetTags"
print(f"[OK] Found {data.get('TotalCount')} tag keys, {len(data.get('DataAssetTags', []))} on this page")
```
### 4. Verify ListDataAssets
```python
response = list_data_assets(client, asset_type='Table', page_number=1, page_size=10)
body = response.get('body', {})
data = body.get('Data', {})
assert response.get('statusCode') == 200, f"HTTP error: {response.get('statusCode')}"
assert 'TotalCount' in data, "Response missing TotalCount"
assert 'DataAssets' in data, "Response missing DataAssets"
print(f"[OK] Found {data.get('TotalCount')} assets, {len(data.get('DataAssets', []))} on this page")
```
### 5. Verify TagDataAssets
```python
response = tag_data_assets(
client,
tags=[{"Key": "test_key", "Value": "v1"}],
data_asset_ids=["maxcompute-table.my_project.my_table"],
data_asset_type="ACS::DataWorks::Table"
)
body = response.get('body', {})
assert body.get('Data') == True, f"Bind failed: {body.get('Message')}"
print("[OK] Tag bound to asset successfully")
```
### 6. Verify UnTagDataAssets
```python
response = untag_data_assets(
client,
tags=[{"Key": "test_key", "Value": "v1"}],
data_asset_ids=["maxcompute-table.my_project.my_table"],
data_asset_type="ACS::DataWorks::Table"
)
body = response.get('body', {})
assert body.get('Data') == True, f"Unbind failed: {body.get('Message')}"
print("[OK] Tag unbound from asset successfully")
```
---
## Common Error Codes
| Code | Description | Resolution |
|------|-------------|------------|
| `0` | Success | — |
| `210400101` | Invalid parameter | Check required fields and value formats |
| `Forbidden` | Insufficient permissions | Review RAM policies in `ram-policies.md` |
| `Throttling` | Request rate exceeded | Retry with exponential backoff |
| `InternalError` | Server-side error | Record RequestId and contact support |
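
The retry advice for `Throttling` can be sketched as follows. This is an illustration under assumptions: `call_with_backoff` is a hypothetical helper, and it assumes throttling surfaces as a response body whose `Code` is `Throttling`. If the SDK raises an exception instead, the check would need to move into an `except` block.

```python
import random
import time


def call_with_backoff(call, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Invoke call() and retry with exponential backoff while throttled."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = call()
        code = (response.get("body") or {}).get("Code")
        if code != "Throttling":
            return response
        if attempt < max_attempts - 1:
            # Delay doubles each attempt, capped at max_delay, plus random jitter
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay / 2))
    return response  # still throttled after the final attempt
```

A call site might look like `call_with_backoff(lambda: tag_data_assets(client, ...))`; the `sleep` parameter exists so the delay can be replaced in tests.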
---
name: alibabacloud-maxcompute-migration-service
description: |
Alicloud MaxCompute Migration Service (MMS) Skill. Use for migrating data from various data sources (Hive, BigQuery, Databricks, Snowflake, Redshift, MaxCompute) to MaxCompute.
Triggers: "MMS", "MaxCompute Migration Service", "数据迁移到MaxCompute", "MaxCompute数据迁移", "Hive迁移到MaxCompute", "BigQuery迁移", "Databricks迁移", "Snowflake迁移", "Redshift迁移", "盘点", "扫描", "迁移作业", "迁移任务", "定时器", "数据搬迁", "搬站", "迁移状态", "迁移进度", "目标映射", "增量迁移", "跨项目", "跨地域".
---
# MMS Data Migration Management
You are a **data migration expert** for MaxCompute Migration Service (MMS).
Help users manage the full lifecycle of data migration from external data sources to MaxCompute.
> **[MUST] API Product Identifier**: All MMS APIs belong to the **MaxCompute** product (version `2022-01-04`).
> CLI command format: `aliyun maxcompute <command> [params]`.
> **Do NOT** use dataworks, IMM, or any other product's APIs to operate MMS resources.
> **Pre-check: Aliyun CLI >= 3.3.1 required**
> Run `aliyun version` to verify >= 3.3.1. If not installed or version too low,
> see `references/cli-installation-guide.md` for installation instructions.
> Then [MUST] run the following setup commands:
> ```bash
> aliyun plugin update Chinese # update plugin to latest version
> aliyun configure set --auto-plugin-install true # enable automatic plugin installation
> aliyun configure ai-mode enable # enable AI-Mode for CLI tracking
> aliyun configure ai-mode set-user-agent Chinese # set User-Agent for AI-Mode
> ```
>
> **[MUST] After workflow ends** (task complete or session ending), disable AI-Mode:
> ```bash
> aliyun configure ai-mode disable # disable AI-Mode after workflow completes
> ```
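
The version pre-check above can be scripted rather than compared by eye. The following is a minimal sketch, not part of the official tooling: `version_ge` is a hypothetical helper name, and it assumes `aliyun version` prints a plain dotted numeric version such as `3.3.1` with no suffix.

```bash
# Return 0 when version $1 is greater than or equal to version $2.
# Compares dot-separated numeric fields from left to right; a missing field counts as 0.
version_ge() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}
        w=${want%%.*}
        h=${h:-0}
        w=${w:-0}
        if [ "$h" -gt "$w" ]; then return 0; fi
        if [ "$h" -lt "$w" ]; then return 1; fi
        # Drop the field just compared; clear the string when no dot remains.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

# Example use (assumes the CLI prints only the version string):
#   version_ge "$(aliyun version)" 3.3.1 || echo "Aliyun CLI is older than 3.3.1, please upgrade"
```

A field-by-field numeric comparison is used because a plain string comparison would rank `3.10.0` below `3.3.1`.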
## Core Principles
1. **Safety First** — Confirm user intent before executing create, start, stop, or delete operations
- **Credential Masking**: You MUST sanitize sensitive fields (replace with `********`) in ALL API responses **immediately after receiving them** — before displaying to user, writing to any file (including intermediate/raw response files in `ran_scripts/`), or any further processing. No file on disk should ever contain plaintext credentials. Sensitive fields include:
- All field values matching keys: `password`, `secret`, `token`, `access.id`, `access.key`, `accessKeyId`, `accessKeySecret`
- All string values starting with `LTAI` (Alibaba Cloud AccessKey ID pattern)
- **Implementation**: Pipe API responses through `jq` sanitization **immediately** — the unsanitized response must never be written to disk or shown to the user. Use a single variable, sanitize in-place, then use the sanitized version for all downstream operations (display, file writes, etc.):
```bash
response=$(aliyun maxcompute ... 2>&1)
response=$(echo "$response" | jq 'walk(if type == "object" then with_entries(if (.key | test("password|secret|token|access.id|access.key|accessKeyId|accessKeySecret"; "i")) or (.value | type == "string" and test("^LTAI")) then .value = "********" else . end) else . end)')
# Now safe to use: echo "$response", write to file, display to user, etc.
```
2. **Guided Workflow** — Guide users unfamiliar with migration through the standard workflow step by step
3. **State Awareness** — Query current state before operations to avoid acting on resources in incorrect states
4. **Data Accuracy** — All responses must be based on real data returned by CLI, never fabricate information. When presenting IDs, IPs, ports, names, or other fields, you MUST directly quote the original API return values — never manually re-type them
5. **Concept Clarification** — When user intent is ambiguous between "migration Job" and "migration Task", proactively ask for clarification
6. **ID/Name Resolution** — Users often provide names rather than IDs; resolve via list APIs first
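
As a second line of defense after the `jq` masking step in principle 1, a cheap text check can refuse to write anything that still looks like a credential. This is only a sketch: `looks_sanitized` is a hypothetical helper, it relies on `grep -o` (available in GNU and BSD grep), and a regex over JSON text is deliberately conservative, so it may also flag harmless keys such as `tokenCount`.

```bash
# Return 0 only when the text shows no sign of unmasked credentials.
looks_sanitized() {
    text=$1
    # Rule 1: no string that starts with the AccessKey ID prefix LTAI.
    if printf '%s\n' "$text" | grep -Eq 'LTAI[A-Za-z0-9]+'; then
        return 1
    fi
    # Rule 2: every sensitive key with a string value must carry the mask "********".
    if printf '%s\n' "$text" \
        | grep -Eio '"[^"]*(password|secret|token|access\.id|access\.key|accessKeyId|accessKeySecret)[^"]*"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | grep -Fqv '"********"'; then
        return 1
    fi
    return 0
}

# Example use before any file write ($response is the already-masked variable):
#   looks_sanitized "$response" && printf '%s\n' "$response" > ran_scripts/response.json
```

The check does not replace the `jq` step; it only catches cases where masking was skipped or failed.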
## Concepts
### Job vs Task
Two commonly confused concepts in MMS:
| Concept | Description | CLI Command Prefix |
|---------|-------------|-------------------|
| Migration Job | A migration plan created by the user, containing migration config; one job can contain multiple tasks | `*-mms-job*` |
| Migration Task | A concrete migration instance produced when a job runs, corresponding to a single table or partition | `*-mms-task*` |
**How to determine**:
- User says "create migration", "migrate entire database", "migrate some tables" → operate on **Job**
- User says "check migration progress", "check a table's migration status", "retry failed" → clarify whether Job or Task
- User provides `job_id` → operate on **Job**
- User provides `task_id` or asks about "a specific table's migration" → operate on **Task**
**When ambiguous, proactively ask**:
> "Are you referring to a migration Job or a specific migration Task? A Job covers the migration of multiple tables, while a Task corresponds to a single table's migration instance."
### Name to ID Resolution
MMS APIs identify resources by ID, but users typically provide names. Resolution workflow:
| Resource | ID Param | Query Command |
|----------|----------|---------------|
| Data Source | `source_id` | `list-mms-data-sources --name <name>` |
| Migration Job | `job_id` | `list-mms-jobs --source-id <id> --name <name>` |
| Migration Task | `task_id` | `list-mms-tasks --source-id <id> --src-table-name <name>` |
> **Note**: The `--name` parameter uses **fuzzy matching (LIKE)** on the backend and may return multiple results.
**Matching Rules**:
1. Exactly **one** result with a name that perfectly matches what the user provided → use it directly
2. Empty result set → inform the user and suggest checking the name
3. All other cases (multiple exact matches, multiple fuzzy matches, no exact match, etc.) → list all results and ask the user to confirm
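
The matching rules can be expressed as a small decision function. The sketch below is an illustration under assumptions: `resolve_match` is a hypothetical name, the candidate names are assumed to have already been extracted from the list API response (one per line on stdin), and rule 1 is read conservatively as "the result set contains exactly one entry and it matches exactly".

```bash
# resolve_match WANTED_NAME  (candidate names on stdin, one per line)
# Prints "use" (rule 1), "empty" (rule 2) or "confirm" (rule 3).
resolve_match() {
    wanted=$1
    total=0
    exact=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r candidate || [ -n "$candidate" ]; do
        [ -n "$candidate" ] || continue
        total=$((total + 1))
        if [ "$candidate" = "$wanted" ]; then
            exact=$((exact + 1))
        fi
    done
    if [ "$total" -eq 0 ]; then
        echo empty
    elif [ "$total" -eq 1 ] && [ "$exact" -eq 1 ]; then
        echo use
    else
        echo confirm
    fi
}
```

Anything other than a single exact hit falls through to `confirm`, which matches the intent of asking the user whenever fuzzy matching leaves room for doubt.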
## Supported Regions
MMS is available in: China East 1 (Hangzhou), China East 2 (Shanghai), China North 2 (Beijing), China North 3 (Zhangjiakou), China North 6 (Ulanqab), China South 1 (Shenzhen), China Southwest 1 (Chengdu), China (Hong Kong), Indonesia (Jakarta), Singapore, Japan (Tokyo), US (Virginia), Germany (Frankfurt).
> **Important**: Stop **write operations** on source tables and partitions before migration to avoid data verification failures.
## Supported Data Source Types
| Data Source | Type Identifier | Description |
|-------------|----------------|-------------|
| Apache Hive | `Hive` | Hive Metastore + HDFS, the most common migration scenario |
| Google BigQuery | `BigQuery` | Google Cloud data warehouse |
| Snowflake | `Snowflake` | Snowflake cloud data warehouse |
| Amazon Redshift | `Redshift` | AWS data warehouse |
| Databricks | `Databricks` | Databricks Lakehouse |
| MaxCompute | `MaxCompute` | Cross-project/cross-region migration between MaxCompute projects |
## Prerequisites
### 1. Service-Linked Role
Before using MMS for the first time, create the service-linked role `AliyunServiceRoleForMaxComputeMMS`:
**Via MaxCompute Console:**
1. Log in to MaxCompute Console > Data Transfer > Migration Service
2. Click Add Data Source — the system will prompt to create the service-linked role
**Via RAM Console:**
1. Log in to RAM Console > Identities > Roles
2. Click Create Role > Create Service-Linked Role
3. Select trusted service: `AliyunServiceRoleForMaxComputeMMS`
> **Note**: RAM users need `AliyunRAMFullAccess` permission to create service-linked roles
### 2. MaxCompute Project
- A target MaxCompute project is required
- The project must be bound to a **Data Transfer Service** type Quota resource
### 3. VPC Network Connection
- A VPC network connection (passthrough) must be established
- Ensure network access to the source data (via public NAT gateway or Express Connect)
### 4. MaxCompute Data Permissions
Grant data operation permissions to the service-linked role in the target project:
```sql
-- Add service-linked role to project
USE <target_project>;
ADD USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
-- Option 1: Coarse-grained authorization (recommended)
GRANT admin TO USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
-- Option 2: Fine-grained authorization
GRANT Read,Write,List,CreateTable,CreateInstance,CreateFunction,CreateResource
ON project <project_name> TO USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
```
## Authentication
> **Pre-check: Alibaba Cloud Credentials Required**
>
> **Security Rules:**
> - **NEVER** read, echo, or print AK/SK values (e.g., `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` is FORBIDDEN)
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **NEVER** use `aliyun configure set` with literal credential values
> - **ONLY** use `aliyun configure list` to check credential status
>
> ```bash
> aliyun configure list
> ```
> Check the output for a valid profile (AK, STS, or OAuth identity).
>
> **If no valid profile exists, STOP here.**
> 1. Obtain credentials from [Alibaba Cloud Console](https://ram.console.aliyun.com/manage/ak)
> 2. Configure credentials **outside of this session** (via `aliyun configure` in terminal or environment variables in shell profile)
> 3. Return and re-run after `aliyun configure list` shows a valid profile
## Migration Workflow
Standard migration workflow — enter at any step based on user needs:
```
1. Create Data Source → 2. Scan Metadata → 3. Configure Target Mapping → 4. Create Job → 5. Monitor Tasks → 6. Data Verification
          ↑                                                                    ↓
    Console Setup                                                Timer (Incremental Migration)
```
> **IMPORTANT: Parameter Confirmation** — Before executing any command or API call,
> ALL user-customizable parameters (e.g., RegionId, Project names, Data source configuration,
> table names, partition specifications, etc.) MUST be confirmed with the user.
> Do NOT assume or use default values without explicit user approval.
### Step 1: Data Source Management
> **[MUST] Guide users to the console to create data sources** — do NOT create via API.
> Data sources involve complex configurations (network links, credentials, etc.) that are more intuitive and secure via the console.
>
> Console URL: `https://maxcompute.console.aliyun.com/{region}/mma/datasource`
> (replace `{region}` with the user's region, e.g., `cn-hangzhou`, `cn-shanghai`)
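
When handing the console link to the user, the `{region}` placeholder has to be filled in first. A minimal sketch of that substitution follows; `mms_console_url` is a hypothetical helper name, and the character check is an assumption about what a region ID looks like (lowercase letters, digits, hyphens).

```bash
# Print the data source console URL for a region ID such as cn-hangzhou.
# Rejects empty input and anything that is not lowercase letters, digits, or hyphens,
# which also catches an unreplaced "{region}" placeholder.
mms_console_url() {
    region=$1
    case $region in
        ''|*[!a-z0-9-]*)
            echo "invalid region id: '$region'" >&2
            return 1
            ;;
    esac
    printf 'https://maxcompute.console.aliyun.com/%s/mma/datasource\n' "$region"
}
```

For example, `mms_console_url cn-shanghai` prints the Shanghai console link, while `mms_console_url '{region}'` fails instead of producing a broken URL.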
After creating in the console, verify via CLI:
```bash
# List data sources
aliyun maxcompute list-mms-data-sources --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Find data source by name (to get source_id)
aliyun maxcompute list-mms-data-sources --name <name> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Get data source details with config (requires source_id)
aliyun maxcompute get-mms-data-source --source-id <sourceId> --with-config true --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
**Looking up data source config by name**: Users typically only know the data source name. Resolve `source_id` first:
1. `list-mms-data-sources --name <name>` → extract `source_id` from results
2. `get-mms-data-source --source-id <sourceId> --with-config true` → view full config
> Warning: `--with-config true` response contains plaintext credentials (AccessKey ID, passwords, etc.). You MUST sanitize the response **immediately** using the jq command from Core Principles before writing to any file or displaying to the user. Never save unsanitized API responses to disk.
### Step 2: Metadata Scan
Scan the data source to discover databases, tables, and partitions.
```bash
# Initiate metadata scan
aliyun maxcompute create-mms-fetch-metadata-job --source-id <sourceId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Check scan status (poll until complete)
aliyun maxcompute get-mms-fetch-metadata-job --source-id <sourceId> --scan-id <scanId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
> Metadata scan typically takes 1-3 minutes. Poll `get-mms-fetch-metadata-job` until completion.
After scan completes, view metadata:
```bash
# List databases
aliyun maxcompute list-mms-dbs --source-id <sourceId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# List tables
aliyun maxcompute list-mms-tables --source-id <sourceId> --db-name <dbName> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# List partitions
aliyun maxcompute list-mms-partitions --source-id <sourceId> --table-name <tableName> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### Step 3: Metadata Management & Target Mapping
View and configure source-to-target mappings. Complete this step before creating migration jobs.
- **View databases**: `list-mms-dbs` (list) → `get-mms-db` (details)
- **View tables**: `list-mms-tables` (list) → `get-mms-table` (details)
- **View partitions**: `list-mms-partitions` for partition info and status
> **Note**: Target project mapping must be configured via the console.
> In the console: **Data Transfer > Migration Service > Data Sources** — select a data source to configure the target MaxCompute project mapping.
### Step 4: Create Migration Job
**Jobs start executing automatically after creation — no manual start required.**
```bash
# Create migration job
aliyun maxcompute create-mms-job \
--source-id <sourceId> \
--body '{
"name": "<job_name>",
"srcDbName": "<src_db_name>",
"enableVerification": true
}' \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
#### CreateMmsJob Parameters
Supported parameters in body:
| Parameter | Required | Description |
|-----------|----------|-------------|
| name | Yes | Job name |
| srcDbName | Yes | Source database name |
| tables | No | List of table names (for table-level migration) |
| partitionFilters | No | Partition filter expression |
| tableBlackList | No | Table blacklist (exclude tables in full-database migration) |
| tableWhiteList | No | Table whitelist (include only specified tables) |
| enableSchemaMigration | No | Whether to migrate table schema (default: true) |
| enableDataMigration | No | Whether to migrate data (default: true) |
| enableVerification | No | Whether to enable data verification |
| increment | No | Whether to perform incremental migration |
**Return value**: On success, returns `async_task_id` and `job_id`, which can be used with:
- `get-mms-async-task` — check job startup progress
- `get-mms-job` — check job execution status
Choose migration granularity:
- **Full database**: pass only `srcDbName`, optionally use `tableBlackList`/`tableWhiteList` to filter
- **Table-level**: pass `srcDbName` + `tables` list
- **Partition-level**: pass `srcDbName` + `partitionFilters` or specific partitions
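
The difference between full-database and table-level bodies can be sketched with a small builder. This is an illustration under assumptions: `build_job_body` is a hypothetical helper, `tables` is assumed to be a JSON array of strings, and names containing double quotes or backslashes are rejected rather than escaped. Partition-level bodies are left out because the format of `partitionFilters` is not specified here.

```bash
# build_job_body JOB_NAME SRC_DB [TABLE...]
# No tables  -> full-database body.
# With tables -> table-level body with a "tables" array.
build_job_body() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_job_body JOB_NAME SRC_DB [TABLE...]" >&2
        return 1
    fi
    # Refuse values that would need JSON escaping.
    for value in "$@"; do
        case $value in
            *'"'*|*\\*)
                echo "unsupported character in: $value" >&2
                return 1
                ;;
        esac
    done
    body=$(printf '{"name":"%s","srcDbName":"%s"' "$1" "$2")
    shift 2
    if [ "$#" -gt 0 ]; then
        tables=
        for table in "$@"; do
            tables="$tables${tables:+,}\"$table\""
        done
        body="$body,\"tables\":[$tables]"
    fi
    printf '%s}\n' "$body"
}
```

The output can be passed to `--body` once the user has confirmed the job name, database, and table list.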
```bash
# List migration jobs
aliyun maxcompute list-mms-jobs --source-id <sourceId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Get job details
aliyun maxcompute get-mms-job --source-id <sourceId> --job-id <jobId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
#### Job Control
```bash
# Stop job
aliyun maxcompute stop-mms-job --source-id <sourceId> --job-id <jobId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Resume a stopped job (only for jobs stopped by stop-mms-job)
aliyun maxcompute start-mms-job --source-id <sourceId> --job-id <jobId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Retry failed job
aliyun maxcompute retry-mms-job --source-id <sourceId> --job-id <jobId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Delete job
aliyun maxcompute delete-mms-job --source-id <sourceId> --job-id <jobId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### Step 5: Monitor Migration Tasks
```bash
# List migration tasks (filter by job, status, table name)
aliyun maxcompute list-mms-tasks --source-id <sourceId> --job-id <jobId> --status <status> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Get task details
aliyun maxcompute get-mms-task --source-id <sourceId> --task-id <taskId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# View task logs
aliyun maxcompute list-mms-task-logs --source-id <sourceId> --task-id <taskId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Check async task status (e.g., job startup progress)
aliyun maxcompute get-mms-async-task --source-id <sourceId> --async-task-id <asyncTaskId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
Migration progress can also be viewed in the console: **Data Transfer > Migration Service > Migration Monitoring**
### Step 6: Data Verification
MMS automatically performs data verification after migration (if `enableVerification` was enabled when creating the job).
> **The current Agent cannot directly execute verification.** If the user needs to view verification results or has verification-related questions, query the migration task logs via `list-mms-task-logs` and extract verification-related information for the user.
```bash
# View task logs (includes verification results)
aliyun maxcompute list-mms-task-logs --source-id <sourceId> --task-id <taskId> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
## Polling Pattern
MMS metadata scans and migrations are async operations that require polling:
| Operation | Poll Command | Suggested Interval | Estimated Duration |
|-----------|-------------|-------------------|-------------------|
| Metadata Scan | `get-mms-fetch-metadata-job` | 10s | 1-3 minutes |
| Async Task (job startup, etc.) | `get-mms-async-task` | 10s | 1-5 minutes |
| Migration Job | `get-mms-job` | 30s | Minutes to hours |
| Migration Task | `get-mms-task` | 30s | Minutes to hours |
**For long-running tasks**:
- Migration tasks may run for hours — do not continuously poll
- Provide `job_id`/`task_id` to the user so they can check status later
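
The intervals in the table can be applied with one bounded loop. The sketch below is a generic helper, not an MMS feature: `poll_until` is a hypothetical name, and because the status fields of the MMS responses are not documented in this skill, the completion check is supplied by the caller as a command that returns 0 when the operation has finished.

```bash
# poll_until INTERVAL_SECONDS MAX_ATTEMPTS CHECK_COMMAND [ARGS...]
# Runs CHECK_COMMAND until it succeeds.
# Returns 0 on success and 1 when the attempts are used up.
poll_until() {
    interval=$1
    max_attempts=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$interval"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example use: a metadata scan with a 10s interval and roughly a 3 minute budget.
# scan_finished is a hypothetical function that inspects get-mms-fetch-metadata-job output.
#   poll_until 10 18 scan_finished || echo "Scan still running, check again later"
```

Capping the attempts keeps the agent from polling a multi-hour migration indefinitely; when the budget runs out, hand the `job_id` or `task_id` back to the user.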
## Common Scenarios
### Scenario A: View Overall Migration Status
1. `list-mms-data-sources` → get data source list
2. `list-mms-jobs` for target data source → check job status
3. `list-mms-tasks` for active jobs → check task execution
4. Summarize: data source count, job status distribution, task completion rate
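
The summary in step 4 amounts to counting statuses. A minimal sketch follows, assuming the task statuses have already been extracted from the `list-mms-tasks` response, one per line. `summarize_statuses` is a hypothetical helper, and apart from `failed` the status values in the example are placeholders, not confirmed MMS values.

```bash
# summarize_statuses DONE_STATUS  (one status per line on stdin)
# Prints "<status> <count>" for each distinct status (in no particular order),
# then "completion <percent>%" for the share of tasks in DONE_STATUS.
summarize_statuses() {
    awk -v done_status="$1" '
        NF { count[$0]++; total++ }
        END {
            for (status in count) printf "%s %d\n", status, count[status]
            if (total > 0) printf "completion %.1f%%\n", 100 * count[done_status] / total
        }'
}

# Example use with placeholder statuses:
#   printf 'succeeded\nfailed\nsucceeded\nrunning\n' | summarize_statuses succeeded
```

Per the Data Accuracy principle, the counts must come from real CLI output; the helper only aggregates what was returned.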
### Scenario B: Troubleshoot Failed Migration
1. `list-mms-tasks --status failed` → filter failed tasks
2. `get-mms-task` → view failed task details
3. `list-mms-task-logs` → view error logs
4. Analyze root cause and provide recommendations (retry / adjust config)
### Scenario C: Hive Full-Database Migration to MaxCompute
The most common migration scenario.
1. Guide user to create Hive data source in console: `https://maxcompute.console.aliyun.com/{region}/mma/datasource`
2. `list-mms-data-sources` to confirm data source exists, get `source_id`
3. `create-mms-fetch-metadata-job` to initiate metadata scan
4. Poll `get-mms-fetch-metadata-job` until scan completes
5. `list-mms-dbs` to view databases; configure target MaxCompute project mapping in console
6. `create-mms-job` to create full-database migration job (auto-starts after creation)
7. `get-mms-job` to check migration progress (for long tasks, suggest user checks later)
### Scenario D: BigQuery Migration to MaxCompute
1. Guide user to create BigQuery data source in console (requires GCP service account credentials, project ID, etc.)
2. `list-mms-data-sources` to confirm, get `source_id`
3. `create-mms-fetch-metadata-job` to scan metadata, wait for completion
4. Configure target mapping in console (BigQuery dataset → MaxCompute project)
5. `create-mms-job` to create table-level migration job (pass `tables` list in body)
### Scenario E: Snowflake Migration to MaxCompute
1. Guide user to create Snowflake data source in console (requires Snowflake account, warehouse, database, etc.)
2. Confirm data source → scan → configure mapping → create job (same workflow as above)
### Scenario F: Redshift Migration to MaxCompute
1. Guide user to create Redshift data source in console (requires cluster endpoint, database, credentials, etc.)
2. Confirm data source → scan → configure mapping → create job (same workflow as above)
### Scenario G: Databricks Migration to MaxCompute
1. Guide user to create Databricks data source in console (requires workspace URL, Token, Catalog, etc.)
2. Confirm data source → scan → configure mapping → create job (same workflow as above)
### Scenario H: MaxCompute Cross-Project/Cross-Region Migration
For cross-region relocation, project consolidation/splitting scenarios.
1. Guide user to create MaxCompute-type data source in console (requires source project endpoint, project name, and credentials)
2. Confirm data source → scan → configure mapping
3. Choose migration granularity:
- Full database: pass only `srcDbName` in body
- Specific tables: pass `srcDbName` + `tables` in body
- Specific partitions: pass `srcDbName` + `partitionFilters` in body
> For MaxCompute cross-project migration, both source and target are MaxCompute projects. Be careful to distinguish source-side credentials from the current user's credentials.
## Important Notes
- **[MUST] Guide users to the console for data source creation**: `https://maxcompute.console.aliyun.com/{region}/mma/datasource`
- Confirm target project mapping is correctly configured before creating migration jobs
- **Jobs start automatically after creation — no manual start required**
- `start-mms-job` is ONLY for resuming jobs stopped by `stop-mms-job`
- Choose the correct migration granularity (database/table/partition) based on user requirements
- If user says "migrate the entire database", do NOT pass `tables`; if "migrate specific tables", pass the `tables` list
- If a CLI call fails, inform the user of the error and suggest troubleshooting steps
- Proactively ask when required parameters (e.g., `source_id`) are not provided
## RAM Policy
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Use `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
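
Deciding when to enter this process requires recognizing a permission failure in the CLI output. The sketch below matches only the error strings listed in the troubleshooting table of `references/ram-policies.md` (`Forbidden.RAM`, `Access Denied`); `is_permission_error` is a hypothetical helper, and real responses may use other codes that this pattern does not cover.

```bash
# Return 0 when the given CLI output looks like a permission failure.
# The pattern is case-insensitive and limited to strings named in ram-policies.md.
is_permission_error() {
    printf '%s\n' "$1" | grep -Eiq 'Forbidden|Access Denied'
}

# Example use ($output holds the already-sanitized CLI output):
#   if is_permission_error "$output"; then
#       echo "Permission problem detected: consult references/ram-policies.md"
#   fi
```

A match should trigger the three steps above rather than a retry, since repeating a call that lacks permission cannot succeed.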
### Required Permissions
MMS requires both RAM user permissions and MaxCompute project permissions. See `references/ram-policies.md` for details.
| Scenario | Policy |
|----------|--------|
| Full MMS permissions for RAM user | `AliyunMaxComputeFullAccess` |
| MMS operations only | Custom policy (see ram-policies.md) |
| Root account operations | No additional RAM permissions needed |
## Reference Links
| Document | Link |
|----------|------|
| CLI Installation Guide | [references/cli-installation-guide.md](references/cli-installation-guide.md) |
| RAM Policies | [references/ram-policies.md](references/ram-policies.md) |
| Related Commands | [references/related-commands.md](references/related-commands.md) |
## Official Documentation
- [MMS Overview](https://help.aliyun.com/zh/maxcompute/user-guide/migration-service-mms)
- [Preparation](https://help.aliyun.com/zh/maxcompute/user-guide/mms-preparation)
- [Manage Data Sources](https://help.aliyun.com/zh/maxcompute/user-guide/manage-data-sources)
- [Create and Execute Migration Jobs](https://help.aliyun.com/zh/maxcompute/user-guide/create-and-execute-a-migration-job)
- [Migration Monitoring](https://help.aliyun.com/zh/maxcompute/user-guide/migration-observation)
FILE:references/acceptance-criteria.md
# Acceptance Criteria: MaxCompute Migration Service (MMS)
**Scenario**: MaxCompute Migration Service (MMS) - migrating data from multiple source types to MaxCompute
**Purpose**: Skill testing acceptance criteria
---
# Correct Usage Patterns
## 1. CLI Commands
### ✅ CORRECT
```bash
# List projects
aliyun maxcompute list-projects --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# Get project details
aliyun maxcompute get-project --project my_project --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
# List tables
aliyun maxcompute list-tables --project my_project --region cn-hangzhou --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### ❌ INCORRECT
```bash
# Missing --user-agent
aliyun maxcompute list-projects --region cn-hangzhou
# Wrong API format (not plugin mode)
aliyun maxcompute ListProjects --RegionId cn-hangzhou
# Missing a required parameter
aliyun maxcompute get-project --region cn-hangzhou
```
## 2. Console Operations
### ✅ CORRECT
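1. Log in to the MaxCompute console
2. Go to **Data Transfer > Migration Service**
3. Create the data source and the migration job step by step
### ❌ INCORRECT
1. Creating an MMS data source directly via CLI (currently not supported)
2. Skipping the preparation steps and creating a migration job right away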
## 3. Parameter Handling
### ✅ CORRECT
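- Confirm all user parameters before executing any operation
- Use the specific values the user provides; do not assume defaults
- Present a parameter confirmation table for the user to approve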
### ❌ INCORRECT
```markdown
# Wrong: assuming a default value
aliyun maxcompute list-projects --region cn-hangzhou # assumes the user wants the Hangzhou region
# Wrong: executing with an unreplaced placeholder
aliyun maxcompute get-project --project <project-name>
```
## 4. Credential Handling
### ✅ CORRECT
```bash
# Only check credential status
aliyun configure list
```
The output shows a valid profile (AK, STS, or OAuth identity).
### ❌ INCORRECT
```bash
# Reading or printing AK/SK values
echo $ALIBABA_CLOUD_ACCESS_KEY_ID
# Asking the user to enter credentials on the command line
aliyun configure set --access-key-id <user-input>
```
## 5. Error Handling
### ✅ CORRECT
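1. Capture the error message
2. Analyze the cause of the error
3. Provide a solution
4. Direct the user to the `ram-permission-diagnose` skill for permission issues
### ❌ INCORRECT
- Ignoring the error and continuing
- Offering no suggested fix
- Repeating the same failed operation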
---
# Feature Verification Checklist
## MMS Core Features
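- [ ] Supported data source types are identified (Hive, BigQuery, Snowflake, Redshift, Databricks, MaxCompute)
- [ ] Migration job types are explained (full database, multiple tables, multiple partitions)
- [ ] Preparation steps are complete
- [ ] Data source creation flow is clear
- [ ] Migration job creation flow is clear
- [ ] Monitoring and verification methods are clearly stated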
## CLI Commands
- [ ] Every `aliyun` command includes `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service`
- [ ] Commands use plugin-mode format (e.g., `list-projects` rather than `ListProjects`)
- [ ] Required parameters are complete
## Documentation
- [ ] RAM Policy document is complete
- [ ] Related Commands document is complete
- [ ] Verification Method document is complete
- [ ] CLI Installation Guide is copied into the references directory
## Security
- [ ] AK/SK values are never exposed
- [ ] Credentials are verified with `aliyun configure list`
- [ ] RAM permission list is complete
FILE:references/cli-installation-guide.md
# CLI Installation Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```
**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
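
The Linux and macOS blocks above differ only in which tarball they download. A minimal sketch of picking the URL from the machine type follows. `cli_tarball_url` is a hypothetical helper, and it covers only the three tarballs named in this guide; other combinations are reported as unsupported rather than guessed.

```bash
# cli_tarball_url OS ARCH
# OS and ARCH are the values printed by `uname -s` and `uname -m`.
cli_tarball_url() {
    os=$1
    arch=$2
    case "$os/$arch" in
        Linux/x86_64|Linux/amd64)  name=linux-latest-amd64 ;;
        Linux/aarch64|Linux/arm64) name=linux-latest-arm64 ;;
        Darwin/x86_64)             name=macosx-latest-amd64 ;;
        *)
            echo "no tarball listed in this guide for $os/$arch" >&2
            return 1
            ;;
    esac
    printf 'https://aliyuncli.alicdn.com/aliyun-cli-%s.tgz\n' "$name"
}

# Example use:
#   url=$(cli_tarball_url "$(uname -s)" "$(uname -m)") && wget "$url"
```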
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Environment Variables
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
### Verification
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON array of regions
```
## Plugin Installation
After installing CLI 3.3.1+, enable automatic plugin installation:
```bash
aliyun configure set --auto-plugin-install true
```
Install specific product plugins:
```bash
aliyun plugin install --names maxcompute
```
FILE:references/ram-policies.md
# RAM Policies for MaxCompute Migration Service (MMS)
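This document describes the permission configuration MMS requires, covering RAM permissions and MaxCompute project permissions.
## Permission Overview
MMS requires three kinds of permissions:
| Permission Type | Granted To | Description |
|---------|---------|------|
| Service-linked role | Alibaba Cloud account | The role MMS uses to access cloud resources |
| RAM permissions | RAM user performing the migration | Permission to call MMS operations |
| MaxCompute project permissions | Service-linked role | Data read/write permissions |
## 1. Service-Linked Role
Before using MMS for the first time, you must create the service-linked role:
**Role name**: `AliyunServiceRoleForMaxComputeMMS`
**How to create it**:
### Via MaxCompute Console
1. Log in to the MaxCompute console
2. Select Data Transfer > Migration Service
3. Click Add Data Source
4. Confirm creation in the dialog that appears
### Via RAM Console
1. Log in to RAM Console > Identities > Roles
2. Click Create Role > Create Service-Linked Role
3. Select the trusted cloud service: `AliyunServiceRoleForMaxComputeMMS`
> **Note**: RAM users need the `AliyunRAMFullAccess` permission to create service-linked roles
## 2. RAM Policies (for RAM Users)
### 2.1 All MMS Operations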
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"odps:ListMmsDataSources",
"odps:CreateMmsDataSource",
"ram:GetRole",
"odps:GetMmsDataSource",
"odps:UpdateMmsDataSource",
"odps:DeleteMmsDataSource",
"odps:CreateMmsFetchMetadataJob",
"odps:GetMmsFetchMetadataJob",
"odps:ListMmsFetchMetadataJobLogs",
"odps:ListMmsDbs",
"odps:GetMmsDb",
"odps:ListMmsTables",
"odps:GetMmsTable",
"odps:ListMmsPartitions",
"odps:GetMmsPartition",
"odps:ListMmsJobs",
"odps:GetMmsJob",
"odps:CreateMmsJob",
"odps:DeleteMmsJob",
"odps:StartMmsJob",
"odps:StopMmsJob",
"odps:RetryMmsJob",
"odps:ListMmsTasks",
"odps:GetMmsTask",
"odps:ListMmsTaskLogs",
"odps:StopMmsTask",
"odps:StartMmsTask",
"odps:RetryMmsTask",
"odps:GetMmsAsyncTask",
"odps:GetMmsProgress",
"odps:GetMmsSpeed",
"odps:CreateMmsAuthFile",
"odps:ListMmsAgents",
"odps:ListMmsTimers",
"odps:GetMmsTimer",
"odps:UpdateMmsTimer",
"odps:ListMmsTimerLogs",
"odps:CreateMmsTimer",
"odps:UpdateMmsTables",
"odps:UpdateMmsTable",
"odps:UpdateMmsDb",
"odps:ListNetworkLinks"
],
"Resource": "*"
}
]
}
```
### 2.2 MMS Source Data Management
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"odps:ListMmsDataSources",
"odps:CreateMmsDataSource",
"ram:GetRole",
"odps:GetMmsDataSource",
"odps:UpdateMmsDataSource",
"odps:DeleteMmsDataSource",
"odps:CreateMmsFetchMetadataJob",
"odps:GetMmsFetchMetadataJob",
"odps:ListMmsFetchMetadataJobLogs",
"odps:ListMmsDbs",
"odps:GetMmsDb",
"odps:ListMmsTables",
"odps:GetMmsTable",
"odps:ListMmsPartitions",
"odps:GetMmsPartition",
"odps:GetMmsAsyncTask",
"odps:GetMmsProgress",
"odps:GetMmsSpeed",
"odps:CreateMmsAuthFile",
"odps:ListMmsAgents",
"odps:UpdateMmsTables",
"odps:UpdateMmsTable",
"odps:UpdateMmsDb"
],
"Resource": "*"
}
]
}
```
### 2.3 MMS Migration Job Management
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"odps:ListMmsDataSources",
"odps:GetMmsDataSource",
"odps:CreateMmsFetchMetadataJob",
"odps:GetMmsFetchMetadataJob",
"odps:ListMmsFetchMetadataJobLogs",
"odps:ListMmsDbs",
"odps:GetMmsDb",
"odps:ListMmsTables",
"odps:GetMmsTable",
"odps:ListMmsPartitions",
"odps:GetMmsPartition",
"odps:ListMmsJobs",
"odps:GetMmsJob",
"odps:CreateMmsJob",
"odps:DeleteMmsJob",
"odps:StartMmsJob",
"odps:StopMmsJob",
"odps:RetryMmsJob",
"odps:ListMmsTasks",
"odps:GetMmsTask",
"odps:ListMmsTaskLogs",
"odps:StopMmsTask",
"odps:StartMmsTask",
"odps:RetryMmsTask",
"odps:GetMmsAsyncTask",
"odps:GetMmsProgress",
"odps:GetMmsSpeed",
"odps:ListMmsTimers",
"odps:GetMmsTimer",
"odps:UpdateMmsTimer",
"odps:ListMmsTimerLogs",
"odps:CreateMmsTimer"
],
"Resource": "*"
}
]
}
```
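## 3. MaxCompute Project Permissions (for the Service-Linked Role)
In the target project, grant data operation permissions to the service-linked role:
### 3.1 Add the Service-Linked Role to the Project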
```sql
USE <target_project>;
ADD USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
```
### 3.2 Coarse-Grained Authorization (Recommended)
```sql
-- Grant the admin role
GRANT admin TO USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
```
### 3.3 Fine-Grained Authorization
**Project-level permissions:**
```sql
GRANT Read,Write,List,CreateTable,CreateInstance,CreateFunction,CreateResource
ON project <project_name> TO USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
-- Or grant all permissions
GRANT ALL ON project <project_name> TO USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
```
**Table-level permissions:**
```sql
GRANT Describe,Select,Alter,Update,Drop,ShowHistory
ON table <table_name> TO USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
-- Or grant all permissions
GRANT All ON table <table_name> TO USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
```
**Instance-level permissions:**
```sql
GRANT Read,Write ON instance <instance_id>
TO USER `RAM$<account_id>:role/AliyunServiceRoleForMaxComputeMMS`;
```
### 3.4 Permission Reference
| Object Type | Supported Permissions |
|---------|-----------|
| Project | Read, Write, List, CreateTable, CreateInstance, CreateFunction, CreateResource, All |
| Table | Describe, Select, Alter, Update, Drop, ShowHistory, All |
| Instance | Read, Write, All |
| Resource | Read, Write, Download, Delete |
| Function | Read, Write, Download, Execute, Delete |
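## 4. Quick Authorization Options
| Scenario | Recommended Approach |
|-----|---------|
| Operating as the Alibaba Cloud account (root account) | No additional RAM permissions needed |
| RAM user needs full MaxCompute permissions | Grant `AliyunMaxComputeFullAccess` |
| RAM user needs MMS operation permissions only | Use the custom policy above |
| Creating the service-linked role | RAM user needs `AliyunRAMFullAccess` |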
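## 5. Configuring Permissions in the Console
### Configure Project Permissions in the MaxCompute Console
1. Log in to the MaxCompute console
2. Go to Manage Configurations > Projects
3. Click Manage for the target project
4. Open the Role Permissions tab
5. Create a role and configure its permissions
6. Add the service-linked role under member management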
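## 6. Permission Troubleshooting
### Common Errors
| Error Message | Cause | Resolution |
|---------|------|---------|
| Forbidden.RAM | The RAM user lacks MMS operation permissions | Attach the MMS permission policy |
| Access Denied | The service-linked role has not been granted project permissions | Add the role as a user in the project and grant permissions |
| ServiceLinkedRole not found | The service-linked role has not been created | Create AliyunServiceRoleForMaxComputeMMS |
### Permission Checklist
- [ ] The service-linked role has been created
- [ ] The RAM user has been granted MMS operation permissions
- [ ] The service-linked role has been added to the target project
- [ ] The service-linked role has been granted data operation permissions
- [ ] The target project is bound to a Quota resource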
FILE:references/related-commands.md
# MMS OpenAPI & CLI Commands
MMS exposes its API through the MaxCompute OpenAPI (product code `MaxCompute`, version `2022-01-04`).
CLI invocation format: `aliyun maxcompute <command> [params] --user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service`
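Every command in this file follows the same shape: `aliyun maxcompute`, a kebab-case command, kebab-case flags, and the fixed user agent. As a rough sketch, an argument list for `subprocess.run` could be assembled as below. `to_flag` and `build_mms_command` are hypothetical helper names, and the camelCase-to-flag conversion is an assumption inferred from the parameter tables in this file (for example `pageNum` appearing as `--page-num`); confirm flags with `aliyun maxcompute <command> --help` before relying on it.

```python
import re

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service"


def to_flag(name: str) -> str:
    """Convert a camelCase parameter name (pageNum) to a CLI flag (--page-num)."""
    return "--" + re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


def build_mms_command(command: str, **params) -> list[str]:
    """Build an argv list for an MMS CLI call; parameters set to None are skipped."""
    argv = ["aliyun", "maxcompute", command]
    for name, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            # The examples in this file pass booleans as lowercase strings.
            value = "true" if value else "false"
        argv += [to_flag(name), str(value)]
    argv += ["--user-agent", USER_AGENT]
    return argv


if __name__ == "__main__":
    print(build_mms_command("list-mms-jobs", sourceId=2000001, pageNum=1, pageSize=20))
```

Building a list instead of a shell string avoids quoting problems when a value such as a job name contains spaces.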
## Data Source Management
### ListMmsDataSources: List Data Sources
```bash
aliyun maxcompute list-mms-data-sources \
--name <name> \
--type <Hive|BigQuery|Snowflake|Redshift|Databricks|MaxCompute> \
--region <region> \
--page-num 1 --page-size 20 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
| Parameter | Type | Description |
|------|------|------|
| name | string | Filter by name |
| type | string | Filter by data source type |
| region | string | Filter by region |
| pageNum | integer | Page number |
| pageSize | integer | Items per page |
### GetMmsDataSource: Get Data Source Details
```bash
aliyun maxcompute get-mms-data-source \
--source-id <sourceId> \
--with-config true \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
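### Create / Update / Delete a Data Source
> **[MUST] For creating, updating, or deleting a data source, direct the user to the console.**
> Console URL: `https://maxcompute.console.aliyun.com/{region}/mma/datasource`
## Metadata Scan (Inventory)
### CreateMmsFetchMetadataJob: Start a Metadata Scan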
```bash
aliyun maxcompute create-mms-fetch-metadata-job \
--source-id <sourceId> \
--body '{ ... }' \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### GetMmsFetchMetadataJob: Check Scan Status
```bash
aliyun maxcompute get-mms-fetch-metadata-job \
--source-id <sourceId> \
--scan-id <scanId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
## Metadata Management
### ListMmsDbs: List Databases
```bash
aliyun maxcompute list-mms-dbs \
--source-id <sourceId> \
--name <name> \
--status <status> \
--page-num 1 --page-size 20 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### GetMmsDb: Get Database Details
```bash
aliyun maxcompute get-mms-db \
--source-id <sourceId> \
--db-id <dbId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### ListMmsTables: List Tables
```bash
aliyun maxcompute list-mms-tables \
--source-id <sourceId> \
--db-name <dbName> \
--name <name> \
--status <status> \
--has-partitions true \
--page-num 1 --page-size 20 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
| Parameter | Type | Description |
|------|------|------|
| dbId | integer | Filter by database ID |
| dbName | string | Filter by database name |
| name | string | Filter by table name |
| dstProjectName | string | Filter by target project name |
| dstSchemaName | string | Filter by target schema |
| type | string | Filter by table type |
| hasPartitions | boolean | Whether the table has partitions |
| status | array | Filter by status |
### GetMmsTable: Get Table Details
```bash
aliyun maxcompute get-mms-table \
--source-id <sourceId> \
--table-id <tableId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### ListMmsPartitions: List Partitions
```bash
aliyun maxcompute list-mms-partitions \
--source-id <sourceId> \
--db-name <dbName> \
--table-name <tableName> \
--page-num 1 --page-size 20 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### GetMmsPartition: Get Partition Details
```bash
aliyun maxcompute get-mms-partition \
--source-id <sourceId> \
--partition-id <partitionId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
## Migration Job Management
### CreateMmsJob: Create a Migration Job
```bash
aliyun maxcompute create-mms-job \
--source-id <sourceId> \
--body '{ "name": "job_name", "srcDbName": "db_name", ... }' \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
> **The job starts running automatically after creation**; there is no need to call StartMmsJob manually.
### ListMmsJobs: List Migration Jobs
```bash
aliyun maxcompute list-mms-jobs \
--source-id <sourceId> \
--name <name> \
--src-db-name <srcDbName> \
--status <status> \
--page-num 1 --page-size 20 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
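| Parameter | Type | Description |
|------|------|------|
| name | string | Filter by job name |
| srcDbName | string | Filter by source database name |
| srcTableName | string | Filter by source table name |
| status | string | Filter by status |
| stopped | integer | Whether the job has been stopped |
| timerId | integer | Filter by timer ID |
### GetMmsJob: Get Migration Job Details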
```bash
aliyun maxcompute get-mms-job \
--source-id <sourceId> \
--job-id <jobId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### StartMmsJob: Start / Resume a Migration Job
```bash
aliyun maxcompute start-mms-job \
--source-id <sourceId> \
--job-id <jobId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
> Only used to resume a job that was stopped by StopMmsJob.
### StopMmsJob: Stop a Migration Job
```bash
aliyun maxcompute stop-mms-job \
--source-id <sourceId> \
--job-id <jobId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### RetryMmsJob: Retry a Migration Job
```bash
aliyun maxcompute retry-mms-job \
--source-id <sourceId> \
--job-id <jobId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### DeleteMmsJob: Delete a Migration Job
```bash
aliyun maxcompute delete-mms-job \
--source-id <sourceId> \
--job-id <jobId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
## Migration Task Management
### ListMmsTasks: List Migration Tasks
```bash
aliyun maxcompute list-mms-tasks \
--source-id <sourceId> \
--job-id <jobId> \
--status <status> \
--src-table-name <srcTableName> \
--page-num 1 --page-size 20 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
| Parameter | Type | Description |
|------|------|------|
| jobId | integer | Filter by job ID |
| jobName | string | Filter by job name |
| srcDbName | string | Filter by source database name |
| srcTableName | string | Filter by source table name |
| status | string | Filter by status |
| partition | string | Filter by partition |
### GetMmsTask: Get Migration Task Details
```bash
aliyun maxcompute get-mms-task \
--source-id <sourceId> \
--task-id <taskId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
### ListMmsTaskLogs: View Migration Task Logs
```bash
aliyun maxcompute list-mms-task-logs \
--source-id <sourceId> \
--task-id <taskId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
## Async Tasks
### GetMmsAsyncTask: Check Async Task Status
```bash
aliyun maxcompute get-mms-async-task \
--source-id <sourceId> \
--async-task-id <asyncTaskId> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-maxcompute-migration-service
```
## API Overview
| API | Method | Path | Description |
|-----|--------|------|------|
| ListMmsDataSources | GET | /api/v1/mms/datasources | List data sources |
| GetMmsDataSource | GET | /api/v1/mms/datasources/{sourceId} | Get data source details |
| CreateMmsDataSource | - | - | **Direct the user to the console to create** |
| UpdateMmsDataSource | - | - | **Direct the user to the console to update** |
| DeleteMmsDataSource | - | - | **Direct the user to the console to delete** |
| CreateMmsFetchMetadataJob | POST | /api/v1/mms/datasources/{sourceId}/scans | Start a metadata scan |
| GetMmsFetchMetadataJob | GET | /api/v1/mms/datasources/{sourceId}/scans/{scanId} | Check scan status |
| ListMmsDbs | GET | /api/v1/mms/datasources/{sourceId}/dbs | List databases |
| GetMmsDb | GET | /api/v1/mms/datasources/{sourceId}/dbs/{dbId} | Get database details |
| ListMmsTables | GET | /api/v1/mms/datasources/{sourceId}/tables | List tables |
| GetMmsTable | GET | /api/v1/mms/datasources/{sourceId}/tables/{tableId} | Get table details |
| ListMmsPartitions | GET | /api/v1/mms/datasources/{sourceId}/partitions | List partitions |
| GetMmsPartition | GET | /api/v1/mms/datasources/{sourceId}/partitions/{partitionId} | Get partition details |
| CreateMmsJob | POST | /api/v1/mms/datasources/{sourceId}/jobs | Create a migration job |
| ListMmsJobs | GET | /api/v1/mms/datasources/{sourceId}/jobs | List migration jobs |
| GetMmsJob | GET | /api/v1/mms/datasources/{sourceId}/jobs/{jobId} | Get job details |
| StartMmsJob | POST | /api/v1/mms/datasources/{sourceId}/jobs/{jobId}/start | Start a job |
| StopMmsJob | POST | /api/v1/mms/datasources/{sourceId}/jobs/{jobId}/stop | Stop a job |
| RetryMmsJob | POST | /api/v1/mms/datasources/{sourceId}/jobs/{jobId}/retry | Retry a job |
| DeleteMmsJob | POST | /api/v1/mms/datasources/{sourceId}/jobs/{jobId} | Delete a job |
| ListMmsTasks | GET | /api/v1/mms/datasources/{sourceId}/tasks | List migration tasks |
| GetMmsTask | GET | /api/v1/mms/datasources/{sourceId}/tasks/{taskId} | Get task details |
| ListMmsTaskLogs | GET | /api/v1/mms/datasources/{sourceId}/tasks/{taskId}/logs | View task logs |
| GetMmsAsyncTask | GET | /api/v1/mms/datasources/{sourceId}/asyncTasks/{asyncTaskId} | Check an async task |
FILE:references/verification-method.md
# Verification Method for MaxCompute Migration Service (MMS)
This document provides steps to verify the success of MMS migration operations.
## Migration Job Verification
### Step 1: Check Migration Job Status
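Verify through the MaxCompute console:
1. Log in to the [MaxCompute console](https://maxcompute.console.aliyun.com/)
2. Select a region, then go to **Data Transfer > Migration Service > Migration Jobs**
3. Check the job status:
- **Running**: the job is executing
- **Succeeded**: the job has completed
- **Failed**: the job failed; check the error message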
### Step 2: Verify Data Count
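Verify the row count after migration:
```sql
-- Run in the MaxCompute project
SELECT COUNT(*) FROM <target_table>;
```
Compare it with the row count at the source:
```sql
-- Run on the source data source (for example Hive)
SELECT COUNT(*) FROM <source_table>;
```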
### Step 3: Verify Data Sample
Spot-check the data content with a sample:
```sql
-- Run in the MaxCompute project
SELECT * FROM <target_table> LIMIT 10;
```
Check that the data format and field values are correct.
### Step 4: Verify Partition Data (if applicable)
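Verify the partition data:
```sql
-- List the partitions
SHOW PARTITIONS <target_table>;
-- Verify the row count of a partition
SELECT COUNT(*) FROM <target_table> WHERE <partition_column> = '<partition_value>';
```
### Step 5: Check Data Validation Results
If data validation was enabled, review the validation results:
1. Go to **Migration Service > Migration Jobs**
2. Click the job name to view the task details
3. Check the validation results in the **task logs**
Validation method: compare the `SELECT COUNT(*)` results of the source and the target.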
## Verification Commands Summary
| Verification | Command/Method | Expected Result |
|-------------|----------------|-----------------|
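| Job status | Check in the console | Status is "Succeeded" |
| Row count | `SELECT COUNT(*)` | Source and target match |
| Data content | `SELECT * LIMIT N` | Data format is correct |
| Partition data | `SHOW PARTITIONS` | Partitions are complete |
| Data validation | Task logs | Validation passed |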
## Troubleshooting
### Migration Job Failed
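1. Check the error logs to locate the cause of the failure
2. Common problems:
- Network unreachable: check the network configuration
- Insufficient permissions: check the RAM permissions
- Source unavailable: check the data source status
- Insufficient resources: check the MaxCompute CU resources
### Data Count Mismatch
1. Check whether any migration task failed
2. Check whether the partition filter conditions are correct
3. Check whether the source data changed during migration
4. Run data validation again
### Data Format Error
1. Check whether the field type mapping is correct
2. Check the character encoding settings
3. Check the data source configuration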
---
name: alibabacloud-lindorm-agent-skill
description: |
Alibaba Cloud Lindorm cloud native multi-model database Skill. Covers instance management, monitoring, performance, storage, connections, backup, migration, permissions, slow query, SQL development. Lindorm is domain-specific knowledge — answers MUST reference Skill documents or official Alibaba Cloud documentation; direct responses from training knowledge are prohibited.
Triggers: "Lindorm", "LindormTable", "LindormTSDB", "LindormSearch", "HBase", "lindormcli", "宽表引擎", "时序引擎", "搜索引擎", "Lindorm instance", "Lindorm monitoring", "Lindorm connection", "Lindorm slow query", "Lindorm SQL", "Lindorm backup", "Lindorm storage".
---
# Lindorm Agent Skill
Alibaba Cloud Lindorm cloud native multi-model database Skill. Covers three domains: **Operations Management**, **Developer Guidance**, and **Reference Materials**.
## Core Capability Matrix
| Category | Sub-Scenarios | Reference Docs |
|---------|--------------|----------------|
| **01-Dev Guidance** | Connection setup, quick start, SQL guide, table design | `references/01-dev/` |
| **02-Ops Management** | Instance mgmt, monitoring, error troubleshooting, storage analysis, connection diagnostics, backup & restore, migration, permissions, slow query | `references/02-ops/` |
| **03-Reference** | CLI command list, RAM permission list | `references/03-ref/` |
## Decision Tree
```
User Request
├── Connection / DDL / SQL / Code examples → 01-dev
│ ├── Connection address / code → references/01-dev/connection-guide.md
│ ├── DDL / write / query examples → references/01-dev/quick-start-guide.md
│ ├── SQL connection & development → references/01-dev/sql-client-guide.md
│ ├── SQL syntax reference → references/01-dev/sql-operations.md
│ ├── MySQL compatibility → references/01-dev/sql-usage-notes.md
│ └── Table design guide → references/01-dev/table-design.md
│
├── Instance / Monitoring / Errors / Performance / Storage / Connection / Scaling / Backup / Migration / Permissions / Slow query → 02-ops
│ ├── Instance management → references/02-ops/instance-management.md
│ ├── Monitoring / Alerts → references/02-ops/monitoring-guide.md
│ ├── Error codes → references/02-ops/error-troubleshoot.md
│ ├── Storage analysis → references/02-ops/storage-analysis.md
│ ├── Connection diagnostics → references/02-ops/connection-troubleshoot.md
│ ├── Scale up/down → references/02-ops/instance-management.md
│ ├── Backup & restore → references/02-ops/backup-restore.md
│ ├── Data migration → references/02-ops/data-migration.md
│ ├── Account & permissions → references/02-ops/user-permission.md
│ └── Slow query analysis → references/02-ops/slow-query-analysis.md
│
└── Command list / Permission reference / Specs → 03-ref
├── CLI command list → references/03-ref/related-commands.md
├── RAM permission list → references/03-ref/ram-policies.md
├── Aliyun CLI setup → references/03-ref/cli-installation-guide.md
├── Lindorm CLI / HBase Shell → references/03-ref/lindorm-cli-guide.md
├── Acceptance criteria → references/03-ref/acceptance-criteria.md
└── Verification methods → references/03-ref/verification-method.md
```
## Quick Mapping Table
| User says | Scenario | Reference Doc |
|-----------|----------|---------------|
| "how to connect / connection address" | Connection setup | `references/01-dev/connection-guide.md` |
| "create table / insert / query examples" | Quick start | `references/01-dev/quick-start-guide.md` |
| "how to create a table" | Table design | `references/01-dev/table-design.md` |
| "SQL syntax" | SQL reference | `references/01-dev/sql-operations.md` |
| "how to use SQL" | SQL guide | `references/01-dev/sql-client-guide.md` |
| "MySQL compatibility" | SQL notes | `references/01-dev/sql-usage-notes.md` |
| "list instances / what instances exist" | Instance management | `references/02-ops/instance-management.md` |
| "CPU / memory / QPS / latency" | Monitoring query | `references/02-ops/monitoring-guide.md` |
| "configure alerts / alert notifications" | Monitoring alerts | `references/02-ops/monitoring-guide.md` |
| "got an error / error code" | Error troubleshooting | `references/02-ops/error-troubleshoot.md` |
| "slow query / query is slow" | Slow query analysis | `references/02-ops/slow-query-analysis.md` |
| "poor performance / high RT" | Monitoring query | `references/02-ops/monitoring-guide.md` |
| "cannot connect / connection timeout" | Connection diagnostics | `references/02-ops/connection-troubleshoot.md` |
| "storage usage" | Storage analysis | `references/02-ops/storage-analysis.md` |
| "hot/cold data / tiered storage" | Storage analysis | `references/02-ops/storage-analysis.md` |
| "scale up / add nodes" | Scaling | `references/02-ops/instance-management.md` |
| "backup / restore data" | Backup & restore | `references/02-ops/backup-restore.md` |
| "data migration / sync" | Data migration | `references/02-ops/data-migration.md` |
| "create account / permissions" | Permission management | `references/02-ops/user-permission.md` |
| "lindorm-cli / lindormcli" | Lindorm CLI | `references/03-ref/lindorm-cli-guide.md` |
| "HBase Shell / hbase shell" | HBase Shell | `references/03-ref/lindorm-cli-guide.md` |
## Aliyun CLI
### Prerequisites
When CLI invocation is required, the Agent must verify:
1. **CLI installed**: `aliyun version` >= 3.3.3; if not installed, see `references/03-ref/cli-installation-guide.md`
2. **Credentials configured**:
> **Security Rules:**
> - **NEVER** read, echo, or print AK/SK values
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **ONLY** use `aliyun configure list` to check credential status
>
> If no valid profile exists, STOP. Configure credentials **outside of this session**.
3. **Permissions granted**: Minimum `AliyunLindormReadOnlyAccess` + `AliyunCloudMonitorReadOnlyAccess`
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/03-ref/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Use `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
### AI-Mode Lifecycle [Internal Agent Rule — Do NOT Show to User]
> **[MUST] AI-Mode Full Lifecycle** — AI-mode is required for Agent Skill execution.
>
> **Step 1: Enable AI-Mode** — At the **start** of the Core Workflow (before any CLI invocation):
> ```bash
> aliyun configure ai-mode enable
> ```
>
> **Step 2: Set User-Agent** — Immediately after enabling AI-Mode:
> ```bash
> aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-lindorm-agent-skill"
> ```
>
> **Step 3: Update Plugins** — Ensure CLI plugins are up-to-date:
> ```bash
> aliyun configure set --auto-plugin-install true
> aliyun plugin update
> ```
>
> **Step 4: Disable AI-Mode at EVERY exit point** — Before delivering the final response for ANY reason (success, failure, error, cancellation), always disable AI-mode first:
> ```bash
> aliyun configure ai-mode disable
> ```
### Parameter Confirmation
Before executing any command, all user-configurable parameters (region, instance ID, time range, etc.) **must** be confirmed with the user.
### Version Detection
For instance operations, the Agent must first call `get-lindorm-instance` to retrieve `ServiceType` and determine the instance version. All subsequent command selection depends on this:
| ServiceType | Version | Deployment |
|------------|---------|-----------|
| `lindorm` | V1 | Single-AZ |
| `lindorm_multizone` | V1 | Multi-AZ (HA) |
| `lindorm_multizone_basic` | V1 | Multi-AZ (Basic) |
| `lindorm_v2` | V2 | Single-AZ |
| `lindorm_v2_multizone` | V2 | Multi-AZ (Basic) |
| `lindorm_v2_multizone_ha` | V2 | Multi-AZ (HA) |
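The table above is effectively a lookup from `ServiceType` to version and deployment, which later steps use to choose between V1 and V2 commands. A minimal sketch of that lookup follows; `SERVICE_TYPES` and `detect_version` are hypothetical names, and the mapping is copied from the table rather than from any API.

```python
# ServiceType -> (version, deployment), copied from the table above.
SERVICE_TYPES = {
    "lindorm": ("V1", "Single-AZ"),
    "lindorm_multizone": ("V1", "Multi-AZ (HA)"),
    "lindorm_multizone_basic": ("V1", "Multi-AZ (Basic)"),
    "lindorm_v2": ("V2", "Single-AZ"),
    "lindorm_v2_multizone": ("V2", "Multi-AZ (Basic)"),
    "lindorm_v2_multizone_ha": ("V2", "Multi-AZ (HA)"),
}


def detect_version(service_type: str) -> tuple[str, str]:
    """Return (version, deployment) for a ServiceType value.

    Raises ValueError for values not in the table instead of guessing, so an
    unrecognized ServiceType never silently selects the wrong command set.
    """
    try:
        return SERVICE_TYPES[service_type]
    except KeyError:
        raise ValueError(f"unrecognized ServiceType: {service_type!r}") from None


if __name__ == "__main__":
    version, deployment = detect_version("lindorm_v2_multizone_ha")
    print(version, deployment)
```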
### General Policies
**Region Policy**
| Scenario | Command | Requires `--region` |
|---------|---------|---------------------|
| Query all-region overview | `get-instance-summary` | ❌ Not needed |
| Query instance list | `get-lindorm-instance-list` | ✅ Required, default `cn-shanghai` |
| Query instance details / engine / storage / whitelist | Other `hitsdb` commands | ❌ Not needed, auto-resolved by `--instance-id` |
| Cloud monitoring query | `cms` commands | ❌ Not needed, region auto-resolved via `instanceId` |
**Time Format**
Cloud Monitor time parameter timezone notes:
- ✅ `2026-04-14 08:00:00` (local time, parsed as **CST Beijing time**)
- ✅ `1773897600000` (Unix millisecond timestamp, no timezone ambiguity)
- ✅ `2026-04-14T08:00:00Z` (ISO 8601 UTC **full format**, parsed as **UTC**, which is 16:00 Beijing time, UTC+8)
- ❌ `2026-04-14T08:00Z` (ISO 8601 **short format, no seconds — unsupported**, returns `parse param time error`)
- ❌ **Never use UTC Z format for user-intended local times** (e.g. if user says "14:00", write `2026-04-14 14:00:00`, not `2026-04-14T14:00:00Z`)
- ⚠️ Note: local time and ISO 8601 Z format query different time windows — common source of timezone-related issues
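To make the timezone rules above concrete, the sketch below converts a user-stated Beijing local time into the two unambiguous forms (Unix milliseconds and full-format ISO 8601 UTC). `local_to_epoch_ms` and `local_to_utc_iso` are hypothetical helper names; the fixed UTC+8 offset reflects the CST interpretation described above.

```python
from datetime import datetime, timedelta, timezone

# Beijing time (CST) is a fixed UTC+8 offset with no daylight saving.
CST = timezone(timedelta(hours=8))
LOCAL_FORMAT = "%Y-%m-%d %H:%M:%S"


def local_to_epoch_ms(local_time: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS' in Beijing time to Unix milliseconds."""
    dt = datetime.strptime(local_time, LOCAL_FORMAT).replace(tzinfo=CST)
    return int(dt.timestamp() * 1000)


def local_to_utc_iso(local_time: str) -> str:
    """Convert Beijing local time to full-format ISO 8601 UTC (with seconds)."""
    dt = datetime.strptime(local_time, LOCAL_FORMAT).replace(tzinfo=CST)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(local_to_epoch_ms("2026-04-14 08:00:00"))  # 1776124800000
    print(local_to_utc_iso("2026-04-14 14:00:00"))   # 2026-04-14T06:00:00Z
```

The second call shows why writing `2026-04-14T14:00:00Z` for a user who said "14:00" is wrong: the equivalent UTC value is 06:00, eight hours earlier.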
### Command Reference
#### Instance Management (hitsdb — Lindorm product alias)
| Command | Description | Example |
|---------|-------------|---------|
| `aliyun hitsdb describe-regions` | List supported regions | `aliyun hitsdb describe-regions` |
| `aliyun hitsdb get-instance-summary` | All-region instance overview (no `--region` needed) | `aliyun hitsdb get-instance-summary` |
| `aliyun hitsdb get-lindorm-instance-list` | List instances (ID, status, engine flags; filterable by region/type) | `aliyun hitsdb get-lindorm-instance-list --region cn-shanghai` |
| `aliyun hitsdb get-lindorm-instance` | Get config/version/status (ServiceType, engine node count, spec; **no connection address**) | `aliyun hitsdb get-lindorm-instance --instance-id ld-xxx` |
| `aliyun hitsdb get-lindorm-instance-engine-list` | Get connection addresses (host:port per engine, public/private network) | `aliyun hitsdb get-lindorm-instance-engine-list --instance-id ld-xxx` |
| `aliyun hitsdb get-lindorm-fs-used-detail` | V1 storage usage details | `aliyun hitsdb get-lindorm-fs-used-detail --instance-id ld-xxx` |
| `aliyun hitsdb get-lindorm-v2-storage-usage` | V2 storage usage details | `aliyun hitsdb get-lindorm-v2-storage-usage --instance-id ld-xxx` |
| `aliyun hitsdb get-instance-ip-white-list` | Get IP whitelist | `aliyun hitsdb get-instance-ip-white-list --instance-id ld-xxx` |
#### Engine Types
| Engine | V1 Code | V2 Code | Notes |
|--------|---------|---------|-------|
| LindormTable | `lindorm` | `lindorm` | HBase-compatible, supports SQL (recommended) |
| LindormTable (columnar) | — | `lcolumn` | V2 only |
| LindormTSDB | `tsdb` | `tsdb` | Time-series data storage |
| LindormSearch | `solr` | `lsearch` | Port 30070 (ES-compatible) / 10020 (Solr internal) |
| Lindorm Tunnel Service | `bds` | `bds` | Formerly BDS, no external connection |
| Compute Engine | `compute` | `compute` | Flink streaming engine, no external connection |
| Stream Engine | `stream` | `lstream` | Port 33060 (MySQL protocol) |
| Message Engine | — | `lmessage` | Kafka-compatible, supports topic management and message production/consumption |
| Vector Engine | `lvector` | `lvector` | Vector retrieval engine |
| AI Engine | `lai` | `lai` | AI retrieval engine; domain `proxy-ai-vpc` / `proxy-aiproxy-vpc` |
| LindormDFS | `file` | `file` | OSS-compatible storage (HDFS protocol, port 9000) |
#### Port Quick Reference
| Engine | Protocol | Port | Notes |
|--------|----------|------|-------|
| LindormTable | MySQL protocol | 33060 | ✅ Recommended, preferred for SQL connections |
| LindormTable | HBase API | 30020 | HBase native API compatible |
| LindormTable | Avatica protocol | 30060 | ⚠️ Legacy only, migrate to MySQL protocol |
| LindormTable | Cassandra CQL | 9042 | ⚠️ Legacy only, Cassandra protocol compatible |
| Stream Engine | MySQL protocol | 33060 | Stream SQL via MySQL protocol |
| LindormTSDB | HTTP SQL | 8242 | HTTP SQL API |
| LindormSearch | ES-compatible / Solr | 30070 | Elasticsearch-compatible port, fixed |
| LindormDFS | HDFS | 9000 | NameNode port |
#### Cloud Monitor API (aliyun cms)
| Command | Description | Example |
|---------|-------------|---------|
| `aliyun cms describe-metric-meta-list` | List available monitoring metrics | `aliyun cms describe-metric-meta-list --namespace acs_lindorm` |
| `aliyun cms describe-metric-last` | Get latest monitoring data (returns per-node data; Datapoints is a JSON string requiring secondary parsing) | `aliyun cms describe-metric-last --namespace acs_lindorm --metric-name cpu_idle --dimensions '[{"instanceId":"ld-xxx"}]'` |
| `aliyun cms describe-metric-data` | Get historical trend data (aggregated by period, no host dimension) | `aliyun cms describe-metric-data --namespace acs_lindorm --metric-name cpu_idle --dimensions '[{"instanceId":"ld-xxx"}]' --start-time "2026-04-14 08:00:00" --end-time "2026-04-14 09:00:00" --period 60` |
**Metric Mapping**
| User says | V1 Metric | V2 Metric | Unit |
|-----------|-----------|-----------|------|
| CPU usage | `100 - cpu_idle` | `100 - cpu_idle` | % |
| Memory usage | `mem_used_percent` | `(1 - mem_free / mem_total) * 100` | % |
| QPS | `read_ops` + `write_ops` | `read_ops` + `write_ops` | ops/s |
| Latency / RT | `read_rt` / `get_rt_avg` | `read_rt` / `get_rt_avg` | ms |
| P99 latency | `get_rt_p99` / `put_rt_p99` | — (no data) | ms |
| Hot storage usage rate | `hot_storage_used_percent` | `get-lindorm-v2-storage-usage` | % |
| Total storage usage rate | `storage_used_percent` | `get-lindorm-v2-storage-usage` | % |
| Hot storage bytes | `hot_storage_used_bytes` | `get-lindorm-v2-storage-usage` | bytes |
| Cold storage usage rate | `cold_storage_used_percent` | `get-lindorm-v2-storage-usage` | % |
| Cold storage bytes | `cold_storage_used_bytes` | `get-lindorm-v2-storage-usage` | bytes |
Full metric details: `references/02-ops/monitoring-guide.md`
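Because `describe-metric-last` returns `Datapoints` as a JSON string, the response needs a second parse before `100 - cpu_idle` can be computed. The following Python sketch illustrates that step. The per-point field names `host` and `Average` are assumptions made for illustration and should be checked against a real response; `cpu_usage_per_node` is a hypothetical helper.

```python
import json


def cpu_usage_per_node(response: dict) -> dict[str, float]:
    """Compute CPU usage (100 - cpu_idle) per node from a describe-metric-last
    style response whose Datapoints field is a JSON-encoded string.

    Assumed datapoint fields (not confirmed by this document): host, Average.
    """
    datapoints = json.loads(response["Datapoints"])  # secondary parse
    usage = {}
    for point in datapoints:
        node = point.get("host", "unknown")
        usage[node] = round(100 - point["Average"], 2)
    return usage


if __name__ == "__main__":
    sample = {
        "Datapoints": json.dumps([
            {"host": "node-1", "Average": 75.5},
            {"host": "node-2", "Average": 68.0},
        ])
    }
    print(cpu_usage_per_node(sample))
```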
## Interaction Guidelines
### Output Format
**Monitoring Query**:
```
[Summary] CPU usage 25% (normal)
[Time] <YYYY-MM-DD HH:MM–HH:MM>
[Trend] Stable (variance <10%)
[Details] avg 24.5%, max 32.1%, min 18.3%
```
**Error Troubleshooting**:
```
[Error Code] InvalidParameter.InstanceId
[Meaning] Instance ID is invalid or does not exist
[Possible Causes] 1.xxx 2.xxx 3.xxx
[Resolution Steps] 1.xxx 2.xxx 3.xxx
```
**Instance List**:
```
[Region] cn-shanghai [Count] 3
| ID | Name | Status | Engines |
|----|------|--------|---------|
| ld-xxx | prod | Running | LindormTable + LindormTSDB |
```
## Code Generation Standards
### General Principles
1. **Reference Skill documents first**: Lindorm is domain-specific knowledge, so information must come from the docs under `references/`; direct answers from training knowledge are prohibited
2. **Check official docs when the Skill doesn't cover it**: For scenarios not covered by the docs under `references/`, consult official Alibaba Cloud documentation
### Pre-Generation Checklist
- □ Connection parameter names are correct (MySQL protocol: `jdbc:mysql://host:33060`, HBase API: `hbase.zookeeper.quorum`)
- □ Port numbers are correct (LindormTable/Stream Engine MySQL 33060, HBase API 30020, LindormTSDB HTTP 8242, LindormSearch 30070)
- □ Include official documentation link
FILE:references/01-dev/connection-guide.md
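# Retrieving Connection Information
Follow this guide when the user asks "how do I connect to the instance", "what is the connection address", or "which SDK do I need".
## Trigger Conditions
Typical user phrasing:
- "How do I connect to ld-xxx?"
- "Give me the connection address"
- "How do I connect with Java?"
- "What port does the time-series engine use?"
- "Give me a connection example"
## Core Principles
**The Agent provides the solution** rather than just pointing the way:
1. **The Agent extracts the key information** and assembles it into a complete answer (code examples, dependency configuration, parameter descriptions)
2. **The user gets runnable connection code without leaving the conversation**
3. When the connection address cannot be obtained from the API, **state the exact console path** (down to the button location)
4. Documentation links are supplementary references for users who want more detail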
---
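## Workflow
### Phase 1: Get Basic Instance Information
Run the following commands to get the instance's architecture version, connection endpoints, and network configuration:
```bash
# 1. Get instance details (to determine V1/V2 architecture)
aliyun hitsdb get-lindorm-instance \
--instance-id <instance-id>
# 2. Get the connection endpoints of each engine
aliyun hitsdb get-lindorm-instance-engine-list \
--instance-id <instance-id>
```
**Key information to extract**:
| Item | Source Field | Description |
|--------|----------|------|
| Architecture version | `ServiceType` | `lindorm_v2*` = V2 (new architecture); `lindorm` / `lindorm_multizone*` = V1 (legacy architecture) |
| Connection address | `NetInfoList` (used by both V1 and V2) | Domain name and port of each engine |
| Network type | `NetType` | `"0"` = public network available, `"2"` = VPC internal network only (string type, same for V1 and V2) |
| Engine version | `EngineList` | Version number of each engine |
> **Note**: `get-lindorm-instance-engine-list` returns `NetInfoList` + `NetType` for both V1 and V2. A separate V2-only API, `get-lindorm-v2-instance-details`, returns `ConnectAddressList` + `Type=INTRANET/INTERNET`; see Phase 2.
**Endpoint domain format**:
For the domain format, see [sql-client-guide.md](sql-client-guide.md), which includes the V1/V2 ServiceType decision logic and complete examples.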
---
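### Phase 2: Confirm Connection Conditions
Before providing connection code, confirm the following two items:
#### 1. Public Network Access Check
**Option 1: via `get-lindorm-instance-engine-list` (V1 and V2)**
Check the `NetType` field (a string) in `NetInfoList`:
- `"0"`: public network available
- `"2"`: VPC internal network only
**Option 2: via `get-lindorm-v2-instance-details` (V2 only)**
Check the `Type` field in `ConnectAddressList`:
- `INTERNET`: public network available
- `INTRANET`: VPC internal network only
**If only a VPC internal address exists (NetType="2" or Type=INTRANET)**:
> ⚠️ The SQL port of this instance is reachable only from within the VPC. To connect from a local computer:
> 1. Log in to the [Lindorm console](https://lindorm.console.aliyun.com/)
> 2. Click the instance ID → **Database Connections** → **Engine** tab
> 3. Click **"Enable Public Endpoint"** in the upper-right corner
> 4. Configure the whitelist (your local IP address)
>
> Alternatively, you can connect and run operations from an Alibaba Cloud ECS instance in the same VPC as Lindorm.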
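The `NetType` check from `get-lindorm-instance-engine-list` can be expressed as a small classification step. The sketch below is illustrative only: `classify_endpoints` and `needs_public_endpoint` are hypothetical helpers, and the nesting of `NetInfoList` under each `EngineList` entry is an assumption about the response shape that should be confirmed against real output.

```python
def classify_endpoints(engine_list_response: dict) -> dict[str, list[dict]]:
    """Group NetInfoList entries by NetType: "0" = public, "2" = VPC only.

    Assumes the response looks like {"EngineList": [{"NetInfoList": [...]}]}.
    """
    endpoints = {"public": [], "vpc": [], "unknown": []}
    for engine in engine_list_response.get("EngineList", []):
        for net in engine.get("NetInfoList", []):
            net_type = str(net.get("NetType"))  # NetType is a string field
            if net_type == "0":
                endpoints["public"].append(net)
            elif net_type == "2":
                endpoints["vpc"].append(net)
            else:
                endpoints["unknown"].append(net)
    return endpoints


def needs_public_endpoint(engine_list_response: dict) -> bool:
    """True when no public endpoint exists, i.e. the console steps above apply."""
    return not classify_endpoints(engine_list_response)["public"]


if __name__ == "__main__":
    sample = {"EngineList": [{"NetInfoList": [{"NetType": "2", "Port": "33060"}]}]}
    print(needs_public_endpoint(sample))
```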
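#### 2. Obtaining and Confirming the Password
**V2 instances**:
```bash
aliyun hitsdb get-lindorm-v2-instance-details \
--instance-id <instance-id>
```
Extract the `InitialRootPassword` field; the username is `root`.
> ⚠️ **Password retrieval and confirmation flow:**
> 1. **First connection**: use `InitialRootPassword`
> 2. **Connection failure / wrong password**: stop and **ask the user** for the current password
> 3. **Change operations** (such as creating tables or modifying configuration): **explicit user authorization is required**
**V1 instances**:
- **Default username**: `root`
- **Default password**: `root`
- **If the password is forgotten**: change it through the cluster management system
- Path: [Lindorm console](https://lindorm.console.aliyun.com/) → instance ID → **Database Connections** → **Wide Table Engine** → **Lindorm Insight** → **User Management**
- After changing the password, the engine must be **restarted** for the change to take effect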
---
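### Phase 3: Provide Connection Information
Consolidate the information from Phases 1 and 2 and give the user a complete connection plan directly:
```
Instance ld-xxx has the following engines enabled:
- Wide table engine (version 2.8.6, V2 new architecture)
- Time-series engine (version 2.7.15)
[Connection addresses] (retrieved via API)
- Internal (VPC): ld-xxx-proxy-lindorm-vpc.lindorm.aliyuncs.com:33060
- Public: ld-xxx-proxy-lindorm-pub.lindorm.aliyuncs.com:33060
> ⚠️ When connecting over the public network (for example from a local computer), use the public address (`-pub`), not the internal address (`-vpc`); otherwise the connection times out.
[SQL connection credentials]
- Username: root
- Password: (for V2 instances, InitialRootPassword retrieved via get-lindorm-v2-instance-details;
for V1 instances, check Lindorm Insight → User Management in the console)
```
**Connectivity check** (MySQL command line):
```bash
mysql -h <connection-address> -P 33060 -u root -p \
--get-server-public-key --ssl-mode=DISABLED
```
After a successful connection, tell the user:
> The connection has been verified. Would you like a complete example of creating a table and writing data? Tell me which engine you are using and I will provide complete code.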
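**Port quick reference by engine**:
| Engine | Protocol | Port |
|------|------|------|
| Wide table engine | MySQL protocol (recommended) | 33060 |
| Wide table engine | HBase API | 30020 |
| Time-series engine | HTTP SQL API | 8242 |
| Search engine | Elasticsearch API | 30070 |
| Stream engine | MySQL protocol | 33060 |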
**Connection methods by engine** (the Agent routes to the right document based on the user's need):
| Engine | Connection Method | Recommendation | Official Docs |
|------|---------|------|----------|
| Wide table engine | SQL over MySQL protocol | ⭐ Recommended | [Java JDBC](https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-java-jdbc-interface), [Python](https://help.aliyun.com/zh/lindorm/user-guide/python-based-application-development-1); for other languages see [sql-client-guide.md](sql-client-guide.md) |
| Wide table engine | HBase API | Common | [Java](https://help.aliyun.com/zh/lindorm/user-guide/use-the-hbase-api-for-java-to-connect-to-and-use-the-wide-table-engine), [non-Java](https://help.aliyun.com/zh/lindorm/user-guide/use-the-hbase-api-for-a-non-java-language-to-connect-to-and-use-the-wide-table-engine); for code examples see [quick-start-guide.md, Scenario F](quick-start-guide.md#场景-f宽表引擎-hbase-api-快速开始) |
| Wide table engine | Cassandra CQL | Legacy | [Java Driver](https://help.aliyun.com/zh/lindorm/user-guide/use-a-cassandra-client-driver-for-java-to-connect-to-and-use-the-wide-table-engine), [non-Java](https://help.aliyun.com/zh/lindorm/user-guide/use-a-multi-language-cassandra-client-driver-to-connect-to-and-use-the-wide-table-engine) |
| Wide table engine | S3 protocol | Legacy | [Java](https://help.aliyun.com/zh/lindorm/user-guide/connect-and-use-the-wide-table-engine-with-the-s3), [non-Java](https://help.aliyun.com/zh/lindorm/user-guide/connect-via-s3-non-java-api-and-use-the-wide-table) |
| Time-series engine | JDBC Driver | ⭐ Recommended | [JDBC Driver](https://help.aliyun.com/zh/lindorm/user-guide/use-the-jdbc-driver-for-lindorm-to-connect-to-and-use-lindormtsdb) |
| Time-series engine | HTTP SQL API | Lightweight | [HTTP API](https://help.aliyun.com/zh/lindorm/user-guide/http-sql-api-user-guide) |
| Search engine | Elasticsearch API | ⭐ Recommended | [Java REST Client](https://help.aliyun.com/zh/lindorm/user-guide/java-low-level-rest-client) |
| Vector engine | Elasticsearch API | ⭐ Recommended | Shares the search engine port 30070; [vector development guide](https://help.aliyun.com/zh/lindorm/user-guide/foundation) |
| Stream engine | ETL SQL over MySQL protocol | ⭐ Recommended | [Real-time ETL](https://help.aliyun.com/zh/lindorm/user-guide/real-time-etl) |
| Stream engine | Kafka client | Data ingestion | [Writing with Kafka](https://help.aliyun.com/zh/lindorm/use-an-open-source-apache-kafka-client-to-write-data-to-the-lindorm-streaming-engine) |
| LindormDFS | HDFS Shell / client | - | [Underlying file access overview](https://help.aliyun.com/zh/lindorm/user-guide/lindormdfs), [operations guide](https://help.aliyun.com/zh/lindorm/user-guide/lindormdfs-user-guide/) |
| Compute engine | JDBC / JAR / Python | - | [JDBC access](https://help.aliyun.com/zh/lindorm/user-guide/use-sql-to-connect-to-ldps), [JAR jobs](https://help.aliyun.com/zh/lindorm/user-guide/jar-job-development-practice) |
> **LindormDFS and the compute engine**: this Skill does not provide code examples for them. If the user asks, direct them to the official docs above or to the [connection overview](https://help.aliyun.com/zh/lindorm/getting-started/connect-to-an-instance).
---
### 阶段四:白名单检查
**Agent 主动检查白名单**:
```bash
aliyun hitsdb get-instance-ip-white-list \
--instance-id <instance-id>
```
**分析后给出明确建议**:
```
【白名单检查】
当前白名单配置:10.0.0.0/8
【分析】
- 如果您的客户端 IP 在 10.0.0.0/8 网段内 → ✅ 可以直接连接
- 如果您的客户端 IP 不在白名单中 → ❌ 需要添加
【添加白名单】(精确步骤)
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/) → 在实例列表页,单击**目标实例ID** → 在左侧导航栏,单击**访问控制** → **白名单**
2. 点击"创建分组白名单"或修改已有分组
3. 添加客户端 IP(内网环境填 VPC IP,公网环境填公网 IP)
- 单个 IP:192.168.1.100
- IP 段:192.168.1.0/24
4. 点击"确定"保存
> 💡 查看自己公网 IP:`curl ifconfig.me`
【安全提示】
- ⚠️ 不建议使用 0.0.0.0/0(允许所有 IP),存在安全风险
- ✅ 建议只添加必要的 IP 或 VPC 网段
需要我帮您排查连接问题吗?
```
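The whitelist analysis above comes down to a CIDR membership test. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `ip_in_whitelist` and the sample addresses are illustrative assumptions, not part of any Lindorm API or CLI.

```python
import ipaddress
from typing import Iterable


def ip_in_whitelist(client_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any whitelist entry.

    Entries may be single addresses ("192.168.1.100") or CIDR blocks
    ("192.168.1.0/24"), the two forms shown in the whitelist steps.
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        # strict=False tolerates entries whose host bits are set, e.g. "10.0.0.1/8"
        network = ipaddress.ip_network(entry.strip(), strict=False)
        if addr in network:
            return True
    return False


if __name__ == "__main__":
    current_whitelist = ["10.0.0.0/8"]
    for ip in ("10.12.3.4", "192.168.1.100"):
        verdict = "can connect" if ip_in_whitelist(ip, current_whitelist) else "must be added"
        print(f"{ip}: {verdict}")
```

The whitelist entries would come from the `get-instance-ip-white-list` output; how that output is parsed is left out because its exact shape is not documented here.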
---
## 下一步引导
连接验证成功后,根据用户需求引导:
- **建表/写入/查询** → 参考 [quick-start-guide.md](quick-start-guide.md)(含宽表、时序、搜索、向量、流引擎完整示例)
- **连接失败排查** → 参考 [connection-troubleshoot.md](../02-ops/connection-troubleshoot.md)
- **用户权限管理** → 参考 [user-permission.md](../02-ops/user-permission.md)
FILE:references/01-dev/quick-start-guide.md
# 快速开始场景
当用户询问"怎么建表"、"怎么写入数据"、"怎么查询数据"等开发入门问题时,按本指南执行。
## 触发条件
用户的典型表达:
- "怎么建表?"
- "怎么写入数据?"
- "给我一个完整的示例"
- "宽表引擎怎么用?"
- "时序数据怎么存储?"
## 核心原则
**Agent 主动做重活**,而不是让用户自己探索:
1. **Agent 提取完整代码示例**,直接给用户可执行的代码
2. 只在必要时附上文档链接作为"了解更多"的补充
3. **目标**:用户拿到代码后可以直接运行,无需再自己查文档
---
## 执行流程
### 步骤 1:确认引擎类型
先询问用户使用哪个引擎(如果未明确):
```
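Which engine would you like to use?
1. Wide table engine (semi-structured data, HBase compatible)
2. Time series engine (IoT, monitoring, and other time series data)
3. Search engine (full-text search)
4. Vector engine (similarity search, accessed through the search engine's Elasticsearch-compatible API)
5. Stream engine (real-time ETL and precomputation)
Or describe your use case and I will recommend an engine.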
```
---
### 步骤 2:Agent 查询官方文档并生成完整示例
**⚠️ 强制要求**:在提供任何代码示例之前,**必须**从 Skill references 文档获取信息,Skill 未覆盖时查询官方文档确认。
**Agent 需要做的**:根据引擎类型,访问阿里云 Lindorm 官方文档获取最新示例
**查询目标**:
- 官方文档:https://help.aliyun.com/zh/lindorm/
- 关键信息:建表、写入、查询的完整代码示例
**必须从官方文档中提取并验证**:
- ✅ 最新的 Maven 依赖版本(不是训练数据中的旧版本)
- ✅ 正确的 API 调用方式(如 Connection vs HConnection)
- ✅ 当前推荐的建表语句格式
- ✅ 官方验证过的写入/查询代码
- ✅ 最佳实践和性能优化建议
**验证清单**:
```
在提供代码之前,Agent 必须确认:
□ 已访问官方文档获取最新示例
□ SDK 版本号是最新的(如 alilindorm-table-sdk 2.x)
□ API 调用方式是当前推荐的(不是已废弃的 API)
□ 建表/写入/查询代码经过官方验证
□ 已附上官方文档链接供用户深入学习
```
**然后整理成完整可执行的代码,直接给用户**。
---
## 按引擎类型生成完整示例
### 场景 A:宽表引擎快速开始
**Agent 执行查询后,直接给用户完整答案**:
```
【宽表引擎完整示例】
我已为您准备好从连接到查询的完整代码(基于官方文档最新版本):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 1:添加 Maven 依赖】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
<!-- MySQL 协议 JDBC(推荐) -->
<dependencies>
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<version>8.3.0</version>
</dependency>
</dependencies>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 2:连接实例】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import java.sql.*;
import java.util.Properties;
public class LindormQuickStart {
public static void main(String[] args) throws Exception {
// 1. 配置连接(MySQL 协议,推荐)
// Agent 根据实例 ServiceType 选择正确的域名格式,参见 sql-client-guide.md
Class.forName("com.mysql.cj.jdbc.Driver");
String url = "jdbc:mysql://<您的连接地址>:33060/default?sslMode=disabled";
Properties props = new Properties();
props.setProperty("user", "your-username");
props.setProperty("password", "your-password");
Connection conn = DriverManager.getConnection(url, props);
Statement stmt = conn.createStatement();
// 2. 建表
stmt.execute("CREATE TABLE IF NOT EXISTS users (" +
"id VARCHAR PRIMARY KEY," +
"name VARCHAR," +
"age INT," +
"city VARCHAR)");
System.out.println("表创建成功");
// 3. 写入数据
stmt.execute("INSERT INTO users (id, name, age, city) VALUES " +
"('1', 'Alice', 25, 'Shanghai'), " +
"('2', 'Bob', 30, 'Beijing')");
System.out.println("数据写入成功");
// 4. 查询数据
ResultSet rs = stmt.executeQuery("SELECT * FROM users WHERE age > 20");
while (rs.next()) {
System.out.println("name=" + rs.getString("name") + ", age=" + rs.getInt("age"));
}
// 5. 关闭连接
rs.close();
stmt.close();
conn.close();
}
}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【批量写入优化】(生产环境推荐)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Batch write with a PreparedStatement
String sql = "INSERT INTO users (id, name, age, city) VALUES (?, ?, ?, ?)";
PreparedStatement pstmt = conn.prepareStatement(sql);
for (int i = 0; i < 10000; i++) {
    pstmt.setString(1, "id_" + i);
    pstmt.setString(2, "user_" + i);
    pstmt.setInt(3, 20 + i % 50);
    pstmt.setString(4, "city_" + i % 10);
    pstmt.addBatch();
    // (i + 1) so that a full batch of 1000 rows is flushed, not a single row at i == 0
    if ((i + 1) % 1000 == 0) {
        pstmt.executeBatch(); // flush once every 1000 rows
    }
}
pstmt.executeBatch(); // flush any remaining rows
pstmt.close();
【完整文档】如需了解更多(二级索引、全局索引、性能优化):
https://help.aliyun.com/zh/lindorm/user-guide/lindorm-wide-table-engine
```
---
### 场景 B:时序引擎快速开始
**Agent 从官方文档提取最新示例**:
> **推荐连接方式**:官方推荐使用 [JDBC Driver](https://help.aliyun.com/zh/lindorm/user-guide/use-the-jdbc-driver-for-lindorm-to-connect-to-and-use-lindormtsdb)(支持 Java)。以下示例采用 HTTP SQL API(轻量级,适合 Python 等非 Java 语言快速验证)。
参考文档:https://help.aliyun.com/zh/lindorm/user-guide/http-sql-api-user-guide
**然后直接给用户完整代码**:
```
【时序引擎完整示例】(Python HTTP SQL API)
我已为您准备好时序数据写入和查询的完整代码:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 1:安装依赖】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
pip install requests
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 2:完整示例代码】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import requests
import time
# 1. 连接配置
host = "您的时序引擎连接地址" # 从控制台"数据库连接"页面获取
port = 8242
url = f"http://{host}:{port}/api/v2/sql"
# 2. 建表
create_sql = """CREATE TABLE IF NOT EXISTS sensor (
device_id VARCHAR NOT NULL,
region VARCHAR NOT NULL,
time TIMESTAMP NOT NULL,
temperature DOUBLE,
humidity BIGINT,
PRIMARY KEY(device_id, region, time)
)"""
response = requests.post(url, data=create_sql)
print(f"建表结果: {response.status_code}")
# 3. 写入数据(单条)
insert_sql = f"""INSERT INTO sensor (device_id, region, time, temperature, humidity) VALUES
('F07A1260', 'north-cn', '{time.strftime('%Y-%m-%d %H:%M:%S')}', 75.3, 45)"""
response = requests.post(url, data=insert_sql)
print(f"写入结果: {response.status_code}")
# 4. 批量写入(推荐)
# 注意:主键为 (device_id, region, time),同一设备同一时间戳会 UPSERT 覆盖
import datetime
now = datetime.datetime.now()
times = [(now + datetime.timedelta(seconds=i)).strftime('%Y-%m-%d %H:%M:%S') for i in range(4)]
batch_sql = f"""INSERT INTO sensor (device_id, region, time, temperature, humidity) VALUES
('F07A1260', 'north-cn', '{times[0]}', 75.3, 45),
('F07A1260', 'north-cn', '{times[1]}', 76.1, 47),
('F07A1261', 'south-cn', '{times[2]}', 18.1, 44),
('F07A1261', 'south-cn', '{times[3]}', 19.7, 44)"""
response = requests.post(url, data=batch_sql)
print(f"批量写入结果: {response.status_code}")
# 5. 查询数据
query_sql = "SELECT device_id, region, time, temperature FROM sensor LIMIT 100"
response = requests.post(url, data=query_sql)
result = response.json()
for row in result.get('rows', []):
    print(f"device: {row[0]}, region: {row[1]}, time: {row[2]}, temperature: {row[3]}")
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【生产环境建议】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. 批量写入:每次写入 100-1000 个数据点
2. 数据压缩:时序引擎自动压缩,无需手动配置
3. TTL 设置:建议设置数据过期时间(如 90 天)
4. 异常处理:添加重试机制和错误日志
【完整文档】如需了解更多(HTTP API 详细参数、降采样、预聚合、TTL):
https://help.aliyun.com/zh/lindorm/user-guide/http-sql-api-user-guide
```
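The batching recommendation above (100 to 1000 data points per write) can be sketched as a small helper that slices rows into batches and builds one multi-row `INSERT` per batch. `chunked` and `build_insert_sql` are hypothetical helper names, and the quoting rule (doubling single quotes) is a simplifying assumption for illustration; each generated statement would then be sent with `requests.post(url, data=sql)` as in the example above.

```python
import datetime
from typing import Iterator, Sequence, Tuple

# device_id, region, time, temperature, humidity (columns of the `sensor` table)
Row = Tuple[str, str, str, float, int]


def chunked(rows: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def _quote(text: str) -> str:
    # Assumption: standard SQL escaping, where a single quote is doubled.
    return "'" + text.replace("'", "''") + "'"


def build_insert_sql(rows: Sequence[Row]) -> str:
    """Build one multi-row INSERT for the `sensor` table."""
    values = ",\n".join(
        f"({_quote(d)}, {_quote(r)}, {_quote(t)}, {temp}, {hum})"
        for d, r, t, temp, hum in rows
    )
    return (
        "INSERT INTO sensor (device_id, region, time, temperature, humidity) "
        "VALUES\n" + values
    )


if __name__ == "__main__":
    base = datetime.datetime(2024, 1, 1)
    rows = [
        ("F07A1260", "north-cn",
         (base + datetime.timedelta(seconds=i)).strftime("%Y-%m-%d %H:%M:%S"),
         20.0 + i % 10, 40)
        for i in range(2500)
    ]
    for batch in chunked(rows, 1000):
        sql = build_insert_sql(batch)
        print(len(batch), "rows,", len(sql), "characters of SQL")
```

Each timestamp is unique per device so that rows within a batch do not overwrite each other through the primary key.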
---
### 场景 C:搜索引擎快速开始
**Agent 执行查询后,直接给用户完整答案**:
```
【搜索引擎完整示例】(兼容 Elasticsearch 7.10 API)
我已为您准备好搜索引擎的完整代码(Java Low Level REST Client):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 1:获取连接信息】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
控制台 → 数据库连接 → 搜索引擎页签
- Elasticsearch 兼容地址(专有网络或公网)
- 默认用户名和密码
- 端口固定为 30070
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 2:添加 Maven 依赖】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
<version>7.10.0</version>
</dependency>
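<!-- Use log4j-core 2.17.1 or later; 2.x releases up to 2.14.1 are affected by CVE-2021-44228 (Log4Shell) -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.17.1</version>
</dependency>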
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 3:连接并操作】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.apache.http.util.EntityUtils;
public class LindormSearchQuickStart {
public static void main(String[] args) throws Exception {
// 1. 配置连接(Elasticsearch 兼容,端口 30070)
// Agent 根据实例 ServiceType 选择域名格式:V1=.lindorm.rds.aliyuncs.com,V2=.lindorm.aliyuncs.com
String searchUrl = "ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com";
int searchPort = 30070;
String username = "user"; // 从控制台获取
String password = "test"; // 从控制台获取
final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY,
new UsernamePasswordCredentials(username, password));
RestClientBuilder builder = RestClient.builder(new HttpHost(searchUrl, searchPort));
builder.setHttpClientConfigCallback(httpClientBuilder ->
httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider));
try (RestClient client = builder.build()) {
String indexName = "products";
// 2. 创建索引
Request createReq = new Request("PUT", "/" + indexName);
createReq.setJsonEntity("{" +
" \"settings\":{\"index.number_of_shards\": 1}," +
" \"mappings\":{" +
" \"properties\":{" +
" \"name\":{\"type\":\"text\"}," +
" \"price\":{\"type\":\"double\"}," +
" \"category\":{\"type\":\"keyword\"}" +
" }" +
" }" +
"}");
Response resp = client.performRequest(createReq);
System.out.println("创建索引: " + EntityUtils.toString(resp.getEntity()));
// 3. 批量写入文档
Request bulkReq = new Request("POST", "/_bulk");
StringBuilder bulk = new StringBuilder();
bulk.append("{\"index\":{\"_index\":\"products\",\"_id\":\"1\"}}\n");
bulk.append("{\"name\":\"iPhone 15\",\"price\":7999.0,\"category\":\"手机\"}\n");
bulk.append("{\"index\":{\"_index\":\"products\",\"_id\":\"2\"}}\n");
bulk.append("{\"name\":\"MacBook Pro\",\"price\":14999.0,\"category\":\"电脑\"}\n");
bulk.append("{\"index\":{\"_index\":\"products\",\"_id\":\"3\"}}\n");
bulk.append("{\"name\":\"AirPods Pro\",\"price\":1899.0,\"category\":\"耳机\"}\n");
bulkReq.setJsonEntity(bulk.toString());
client.performRequest(bulkReq);
System.out.println("批量写入完成");
// 4. 刷新索引(强制写入数据可见)
client.performRequest(new Request("POST", "/" + indexName + "/_refresh"));
// 5. 全文检索
Request searchReq = new Request("GET", "/" + indexName + "/_search");
searchReq.setJsonEntity("{" +
" \"query\":{" +
" \"match\":{\"name\":\"Pro\"}" +
" }" +
"}");
resp = client.performRequest(searchReq);
System.out.println("搜索结果: " + EntityUtils.toString(resp.getEntity()));
// 6. 查询单个文档
resp = client.performRequest(new Request("GET", "/" + indexName + "/_doc/1"));
System.out.println("文档1: " + EntityUtils.toString(resp.getEntity()));
// 7. 删除索引
client.performRequest(new Request("DELETE", "/" + indexName));
System.out.println("索引已删除");
}
}
}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【curl 快速验证】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# 创建索引
curl -u user:password -X PUT "http://ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com:30070/products" \
-H 'Content-Type: application/json' -d '
{"settings":{"index.number_of_shards":1},
"mappings":{"properties":{"name":{"type":"text"},"price":{"type":"double"}}}}'
# 写入文档
curl -u user:password -X POST "http://ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com:30070/products/_doc/1" \
-H 'Content-Type: application/json' -d '{"name":"iPhone 15","price":7999}'
# 全文检索
curl -u user:password -X GET "http://ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com:30070/products/_search" \
-H 'Content-Type: application/json' -d '{"query":{"match":{"name":"iPhone"}}}'
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【搜索引擎关键参数】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| 参数 | 值 | 说明 |
|------|-----|------|
| 端口 | **30070** | Elasticsearch 兼容端口,固定不变 |
| 协议 | HTTP | 不支持 HTTPS |
| 认证 | Basic Auth | 用户名密码从控制台获取 |
| 兼容版本 | ES 7.10 | 兼容 Elasticsearch 7.10 及更早版本 API |
| 写入后可见 | 需手动 _refresh | 或等待自动 refresh(默认1秒) |
【完整文档】搜索引擎开发指南:
https://help.aliyun.com/zh/lindorm/user-guide/lindormsearch/
https://help.aliyun.com/zh/lindorm/user-guide/java-low-level-rest-client
```
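The `_bulk` request body in the search engine example is newline-delimited JSON: one action line, then one document line, with a trailing newline at the end. The sketch below builds such a body with Python's standard `json` module; `build_bulk_body` is a hypothetical helper name, and sending the body (for example with `requests.post` and Basic Auth to port 30070) is left to the caller.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def build_bulk_body(index: str, docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build an NDJSON body for POST /_bulk from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells the server which index and _id the next line belongs to
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Document line: ensure_ascii=False keeps non-ASCII text readable
        lines.append(json.dumps(source, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk format requires the final line to end with a newline as well
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body("products", [
        ("1", {"name": "iPhone 15", "price": 7999.0, "category": "手机"}),
        ("2", {"name": "MacBook Pro", "price": 14999.0, "category": "电脑"}),
    ])
    print(body, end="")
```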
---
### 场景 D:向量引擎快速开始
**Agent 执行查询后,直接给用户完整答案**:
```
【向量引擎完整示例】(通过搜索引擎 ES API 访问)
Lindorm 向量引擎无独立连接地址,通过搜索引擎的 Elasticsearch 兼容 API 访问(端口 30070)。
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 1:获取连接信息】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
与搜索引擎相同:
- 连接地址:搜索引擎的 Elasticsearch 兼容地址(公网或 VPC)
- 端口:30070
- 认证:Basic Auth(用户名/密码)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 2:创建向量索引(hnsw)】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
curl -u user:password -X PUT "http://ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com:30070/vector_test" \
-H 'Content-Type: application/json' -d '{
"settings": {
"number_of_shards": 1,
"knn": true
},
"mappings": {
"_source": {"excludes": ["vector1"]},
"properties": {
"vector1": {
"type": "knn_vector",
"dimension": 3,
"method": {
"engine": "lvector",
"name": "hnsw",
"space_type": "l2",
"parameters": {
"m": 24,
"ef_construction": 500
}
}
},
"field1": {"type": "long"},
"name": {"type": "keyword"}
}
}
}'
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 3:写入向量数据】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
curl -u user:password -X POST "http://ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com:30070/_bulk" \
-H 'Content-Type: application/x-ndjson' -d '
{"index":{"_index":"vector_test","_id":"1"}}
{"field1":1,"name":"苹果","vector1":[1.2,1.3,1.4]}
{"index":{"_index":"vector_test","_id":"2"}}
{"field1":2,"name":"香蕉","vector1":[2.2,2.3,2.4]}
{"index":{"_index":"vector_test","_id":"3"}}
{"field1":3,"name":"橙子","vector1":[3.2,3.3,3.4]}
'
# 刷新索引使数据可见
curl -u user:password -X POST "http://ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com:30070/vector_test/_refresh"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 4:KNN 近似搜索】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# 查找与 [1.3,1.4,1.5] 最相似的 3 个向量
# 注意:KNN 搜索默认不返回 _source,必须显式指定需要返回的字段
curl -u user:password -X GET "http://ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com:30070/vector_test/_search" \
-H 'Content-Type: application/json' -d '{
"size": 3,
"_source": ["field1", "name"],
"query": {
"knn": {
"vector1": {
"vector": [1.3,1.4,1.5],
"k": 3
}
}
}
}'
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 5:向量+标量混合检索】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
curl -u user:password -X GET "http://ld-xxxx-proxy-search-pub.lindorm.rds.aliyuncs.com:30070/vector_test/_search" \
-H 'Content-Type: application/json' -d '{
"size": 3,
"_source": ["field1", "name"],
"query": {
"bool": {
"must": [
{
"knn": {
"vector1": {
"vector": [1.3,1.4,1.5],
"k": 10
}
}
}
],
"filter": [
{"range": {"field1": {"gte": 1, "lte": 3}}}
]
}
}
}'
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【向量引擎关键参数】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| 参数 | 值 | 说明 |
|------|-----|------|
| 接入方式 | 搜索引擎 ES API | 无独立连接地址,复用搜索引擎端口 30070 |
| 向量类型 | knn_vector | 需指定 dimension(维度) |
| 索引算法 | hnsw | 支持 l2、cosinesimil(实测均可用) |
| 写入后可见 | 需 _refresh | 或等待自动刷新 |
【完整文档】向量引擎开发指南:
https://help.aliyun.com/zh/lindorm/user-guide/foundation
```
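To make the `space_type: l2` ranking concrete, the sketch below computes exact Euclidean distances for the three sample vectors and the query `[1.3, 1.4, 1.5]`. This is a brute-force calculation for intuition only: the engine's `hnsw` index is an approximate search structure and is not implemented here, and the names `l2_distance` and `exact_knn` are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query: Sequence[float],
              vectors: Dict[str, Sequence[float]],
              k: int) -> List[Tuple[str, float]]:
    """Return the k closest (doc_id, distance) pairs, nearest first."""
    scored = [(doc_id, l2_distance(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    # Same documents as the bulk write in step 3
    docs = {
        "1": [1.2, 1.3, 1.4],
        "2": [2.2, 2.3, 2.4],
        "3": [3.2, 3.3, 3.4],
    }
    for doc_id, dist in exact_knn([1.3, 1.4, 1.5], docs, k=3):
        print(f"_id={doc_id} distance={dist:.4f}")
```

With these sample values the nearest document is `_id` 1, followed by 2 and 3, which is the order an `l2` KNN query over the same data is expected to approximate.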
---
### 场景 E:流引擎快速开始
**Agent 执行查询后,直接给用户完整答案**:
```
【流引擎完整示例】(ETL SQL 实时同步与预计算)
Lindorm 流引擎通过 MySQL 协议访问(端口 33060),使用 ETL SQL 实现实时数据同步和预计算。
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 1:在宽表引擎创建源表和结果表】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-- 连接宽表引擎(MySQL 协议)
mysql -h <宽表引擎地址> -P 33060 -u root -p
-- 创建源表
CREATE TABLE source_tbl(id INT, val DOUBLE, PRIMARY KEY(id));
-- 创建镜像表(实时同步目标)
CREATE TABLE sink_tbl(id INT, val DOUBLE, PRIMARY KEY(id));
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 2:在流引擎创建 ETL(实时镜像)】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-- 连接流引擎(MySQL 协议,端口相同)
mysql -h <流引擎地址> -P 33060 -u root -p
-- 创建实时同步 ETL
CREATE ETL sync_etl AS INSERT INTO sink_tbl SELECT * FROM source_tbl;
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 3:验证实时同步】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-- 在宽表引擎插入数据
INSERT INTO source_tbl(id, val) VALUES (1, 1.1), (2, 2.2);
-- 查询镜像表(数据已实时同步)
SELECT * FROM sink_tbl;
+------+------+
| id | val |
+------+------+
| 1 | 1.1 |
| 2 | 2.2 |
+------+------+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 4:多表 JOIN 预计算】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-- 在宽表引擎创建用户表和订单表
CREATE TABLE user_tbl(user_id VARCHAR NOT NULL, user_name VARCHAR, PRIMARY KEY(user_id));
CREATE TABLE order_tbl(
order_id VARCHAR NOT NULL,
user_id VARCHAR,
amount DOUBLE,
PRIMARY KEY(order_id)
);
-- 为 JOIN 字段创建索引(必须)
CREATE INDEX idx_user_id ON order_tbl(user_id);
-- 创建结果表(预打宽,需可更新)
CREATE TABLE user_order_tbl(
order_id VARCHAR NOT NULL,
user_id VARCHAR,
user_name VARCHAR,
amount DOUBLE,
PRIMARY KEY(order_id)
) WITH (MUTABILITY='MUTABLE_UDT');
-- 在流引擎创建 JOIN ETL(需使用完整表名)
CREATE ETL join_etl AS
INSERT INTO `lindorm_table`.`default`.`user_order_tbl`(order_id, user_id, user_name, amount)
SELECT o.order_id, o.user_id, u.user_name, o.amount
FROM `lindorm_table`.`default`.`order_tbl` o
JOIN `lindorm_table`.`default`.`user_tbl` u ON o.user_id = u.user_id;
-- 插入测试数据
INSERT INTO user_tbl VALUES ('U001', '张三'), ('U002', '李四');
INSERT INTO order_tbl VALUES ('O001', 'U001', 100.0), ('O002', 'U001', 200.0);
-- 查询结果表(已实时 JOIN)
SELECT * FROM user_order_tbl;
+----------+---------+-----------+--------+
| order_id | user_id | user_name | amount |
+----------+---------+-----------+--------+
| O001 | U001 | 张三 | 100.0 |
| O002 | U001 | 张三 | 200.0 |
+----------+---------+-----------+--------+
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【流引擎关键参数】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| 参数 | 值 | 说明 |
|------|-----|------|
| 连接方式 | MySQL 协议 | 与宽表引擎相同,端口 33060 |
| 核心语法 | CREATE ETL | `CREATE ETL name AS INSERT INTO ... SELECT ...` |
| 表名格式 | 完整路径 | 跨库查询需用 `lindorm_table.default.tablename` |
| JOIN 要求 | 必须有索引 | JOIN key 需有二级索引,否则报错 |
| 结果表 | MUTABLE_UDT | 预计算结果表需支持更新 |
| 实时性 | 近实时 | 数据变更后秒级同步 |
【管理 ETL】
-- 查看所有 ETL
SHOW ETLS;
-- 删除 ETL
DROP ETL IF EXISTS etl_name;
【完整文档】流引擎开发指南:
https://help.aliyun.com/zh/lindorm/user-guide/real-time-etl
```
---
### 场景 F:宽表引擎 HBase API 快速开始
**适用场景**:已有 HBase 应用迁移、需要 KV 级操作的场景。新用户推荐优先使用场景 A(MySQL 协议 SQL)。
参考文档:https://help.aliyun.com/zh/lindorm/user-guide/use-the-hbase-api-for-java-to-connect-to-and-use-the-wide-table-engine
HBase SDK 安装:https://help.aliyun.com/zh/lindorm/user-guide/install-and-upgrade-hbase-sdk-for-java
**Agent 从官方文档提取最新示例**:
```
【宽表引擎 HBase API 完整示例】(Java)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 1:添加 Maven 依赖】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
<!-- 根据开源 HBase 客户端版本选择对应的阿里云发行版 -->
<!-- HBase 1.x 用户 -->
<dependency>
<groupId>com.aliyun.hbase</groupId>
<artifactId>alihbase-client</artifactId>
<version>1.8.8</version>
</dependency>
<!-- HBase 2.x 用户 -->
<dependency>
<groupId>com.aliyun.hbase</groupId>
<artifactId>alihbase-client</artifactId>
<version>2.8.7</version>
</dependency>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 2:配置连接】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
public class LindormHBaseQuickStart {
public static void main(String[] args) throws Exception {
// 1. 配置连接(端口 30020)
// 连接地址从控制台"数据库连接"页面获取 HBase API 地址
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "<您的连接地址>:30020");
conf.set("hbase.client.username", "用户名");
conf.set("hbase.client.password", "密码");
// 2. 创建连接(线程安全,全局复用,程序结束时关闭)
Connection connection = ConnectionFactory.createConnection(conf);
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 3:DDL 操作(建表/删表)】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
try (Admin admin = connection.getAdmin()) {
// 建表
HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("tablename"));
htd.addFamily(new HColumnDescriptor(Bytes.toBytes("family")));
// 创建单分区表(生产环境建议预分区,避免热点)
admin.createTable(htd);
// 预分区建表示例(推荐)
// byte[][] splitKeys = new byte[][] {
// Bytes.toBytes("10"), Bytes.toBytes("20"), Bytes.toBytes("30")
// };
// admin.createTable(htd, splitKeys);
// disable 表(truncate/删除前必须先 disable)
// admin.disableTable(TableName.valueOf("tablename"));
// admin.truncateTable(TableName.valueOf("tablename"), true);
// admin.deleteTable(TableName.valueOf("tablename"));
}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【步骤 4:DML 操作(读写删查)】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Table 为非线程安全对象,每个线程必须从 Connection 中获取自己的 Table
try (Table table = connection.getTable(TableName.valueOf("tablename"))) {
// 插入数据
Put put = new Put(Bytes.toBytes("row"));
put.addColumn(Bytes.toBytes("family"), Bytes.toBytes("qualifier"), Bytes.toBytes("value"));
table.put(put);
// 单行读取
Get get = new Get(Bytes.toBytes("row"));
Result res = table.get(get);
// 删除一行数据
Delete delete = new Delete(Bytes.toBytes("row"));
table.delete(delete);
// Scan 范围查询
Scan scan = new Scan(Bytes.toBytes("startRow"), Bytes.toBytes("endRow"));
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
// 处理查询结果
}
scanner.close();
}
connection.close();
}
}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
【HBase API 关键参数】
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| 参数 | 值 | 说明 |
|------|-----|------|
| 端口 | **30020** | HBase API 专用端口,固定不变 |
| Connection | 线程安全 | 全局创建一次,程序结束时关闭 |
| Table | 非线程安全 | 每个线程必须获取自己的 Table 对象 |
| 建表 | 建议预分区 | 单 Region 会导致热点,生产环境必须预分区 |
| 认证 | 用户名/密码 | 从控制台获取 |
【完整文档】
- Java:https://help.aliyun.com/zh/lindorm/user-guide/use-the-hbase-api-for-java-to-connect-to-and-use-the-wide-table-engine
- 非 Java(Thrift2):https://help.aliyun.com/zh/lindorm/user-guide/use-the-hbase-api-for-a-non-java-language-to-connect-to-and-use-the-wide-table-engine
```
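Pre-splitting, as recommended in the key parameters above, needs a list of split keys. The sketch below generates evenly spaced keys under one loud assumption: row keys start with a uniformly distributed, fixed-width decimal prefix (for example a two-digit hash bucket `00` to `99`). `decimal_split_keys` is a hypothetical helper; real split points should follow the actual row key distribution.

```python
from typing import List


def decimal_split_keys(num_regions: int, width: int = 2) -> List[str]:
    """Return num_regions - 1 evenly spaced, zero-padded decimal split keys.

    Assumes row keys begin with a fixed-width decimal prefix of `width` digits.
    """
    if num_regions < 2:
        return []  # a single region needs no split keys
    space = 10 ** width
    if num_regions > space:
        raise ValueError("more regions than distinct prefixes")
    return [str(space * i // num_regions).zfill(width) for i in range(1, num_regions)]


if __name__ == "__main__":
    print(decimal_split_keys(4))
    print(decimal_split_keys(10))
```

Each returned string would be converted with `Bytes.toBytes(...)` and passed as the `splitKeys` argument of `admin.createTable(htd, splitKeys)` shown in the Java example.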
FILE:references/01-dev/sql-client-guide.md
# Lindorm SQL 客户端开发指南
本文档提供多语言连接 Lindorm SQL 的开发参考,包括 Java、Python、Go、C/C++、C#、Rust、PHP、Node.js 等语言的连接示例、连接池配置和框架集成方案。
> **推荐**: MySQL 协议更加稳定可靠,性能更优,推荐新用户使用 MySQL 协议连接宽表引擎。
## 通用前提条件
- 已开通 MySQL 协议兼容功能(控制台 > 数据库连接 > 宽表引擎)
- 已将客户端 IP 添加至白名单
- MySQL 协议端口请以 `SKILL.md` →「代码生成规范 / 端口号速查表」为准
### 连接域名格式(Agent 根据实例类型自动选择)
Lindorm 实例分为 V1 和 V2 两种架构,域名格式不同。Agent 执行时应:
1. **查询实例详情**获取 `ServiceType`
2. **判断架构类型**:
- `lindorm_v2*` → 使用 V2 域名格式
- `lindorm` → 使用 V1 域名格式
3. **自动填充**正确的连接地址
| 架构 | ServiceType | 域名格式 | 内网示例 | 公网示例 |
|------|-------------|----------|----------|----------|
| **V2** | `lindorm_v2*` | `*.lindorm.aliyuncs.com` | `ld-xxx-proxy-lindorm-vpc.lindorm.aliyuncs.com:33060` | `ld-xxx-proxy-lindorm-pub.lindorm.aliyuncs.com:33060` |
| **V1** | `lindorm` | `*.lindorm.rds.aliyuncs.com` | `ld-xxx-proxy-lindorm.lindorm.rds.aliyuncs.com:33060` | `ld-xxx-proxy-lindorm-public.lindorm.rds.aliyuncs.com:33060` |
> **V1 宽表引擎有两个 MySQL 地址**:`proxy-lindorm` 和 `proxy-sql-lindorm`,功能相同,任选其一即可。
>
> **V1 公网地址需开通公网访问后才可用**,默认仅提供内网地址。公网地址后缀为 `-public`。
>
> **获取方式**:控制台 → 实例详情 → 数据库连接,或执行 `aliyun hitsdb get-lindorm-instance-engine-list --instance-id <id>`
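
The V1/V2 selection rule above can be sketched as a small function that maps `ServiceType` to the MySQL-protocol address pattern in the table. The function names are hypothetical, and the host patterns only mirror the examples in the table; the authoritative address is always the one shown in the console or returned by the CLI.

```python
MYSQL_PORT = 33060


def lindorm_domain_suffix(service_type: str) -> str:
    """Map an instance ServiceType to its domain suffix (V1 vs V2)."""
    if service_type.startswith("lindorm_v2"):
        return ".lindorm.aliyuncs.com"      # V2 architecture
    if service_type == "lindorm":
        return ".lindorm.rds.aliyuncs.com"  # V1 architecture
    raise ValueError(f"unrecognized ServiceType: {service_type!r}")


def mysql_endpoint(instance_id: str, service_type: str, public: bool = False) -> str:
    """Build host:port following the example patterns in the table."""
    suffix = lindorm_domain_suffix(service_type)
    if service_type.startswith("lindorm_v2"):
        host = f"{instance_id}-proxy-lindorm-{'pub' if public else 'vpc'}"
    else:
        host = f"{instance_id}-proxy-lindorm{'-public' if public else ''}"
    return f"{host}{suffix}:{MYSQL_PORT}"


if __name__ == "__main__":
    print(mysql_endpoint("ld-xxx", "lindorm_v2"))
    print(mysql_endpoint("ld-xxx", "lindorm", public=True))
```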
---
## 注意事项
- 只能基于本 Skill 文档中明确记载的内容回答用户问题,严禁推测、联想或凭训练知识生成文档中不存在的 SQL 语法、参数、功能或配置。
- 如果文档中没有相关信息,必须明确告知用户"当前文档未收录此内容",并引导用户查阅阿里云官方文档(help.aliyun.com)确认。
- 生成的代码示例必须基于文档中的模板,参数和语法必须与文档一致。
---
## 执行步骤
### 步骤1:识别用户程序开发语言
---
### 步骤2:根据用户开发语言选择连接方式
#### 官方使用文档地址
**基于 SQL 的应用开发:**
https://help.aliyun.com/zh/lindorm/user-guide/add-connect-wide-table-engines-through-lindorm-query-language/
**使用 MySQL 协议的应用开发(推荐)**
1. **Java**
- JDBC接口:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-java-jdbc-interface
- 连接池Druid:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-java-connection-pool-druid
- LindormDataSource:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-lindormdatasource
- ORM框架MyBatis:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-java-orm-framework-mybatis
2. **Python**
- 原生Python:https://help.aliyun.com/zh/lindorm/user-guide/python-based-application-development-1
- ORM框架:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-python-orm-framework
3. **Go**
- 原生Go:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-go
- ORM框架:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-go-orm-framework
4. **C**
- C API:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-c-api
5. **C#**
- 原生C#:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-c
6. **Rust**
- 原生Rust:https://help.aliyun.com/zh/lindorm/user-guide/rust-based-application-development
7. **PHP**
- 原生PHP:https://help.aliyun.com/zh/lindorm/user-guide/php-based-application-development
8. **Node.js**
- 原生Node.js:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-node-js
9. **ODBC**
- ODBC接口:https://help.aliyun.com/zh/lindorm/user-guide/application-development-based-on-odbc
**Avatica协议**(仅存量维护)
1. **Java**
- JDBC接口:https://help.aliyun.com/zh/lindorm/user-guide/call-java-api-operations-in-sql-based-connection-to-and-usage-of-lindormtable
- 连接池Druid:https://help.aliyun.com/zh/lindorm/user-guide/through-the-connection-pool-druid-connection-wide-table-engine
2. **Python**
- DB-API:https://help.aliyun.com/zh/lindorm/user-guide/use-the-lindorm-sql-api-for-a-non-java-language-to-connect-to-and-use-the-wide-table-engine-lindormtable
- 连接池DBUtils:https://help.aliyun.com/zh/lindorm/user-guide/use-dbutils-to-connect-to-lindormtable
3. **Go**
- database/sql接口:https://help.aliyun.com/zh/lindorm/user-guide/use-the-apis-provided-by-the-database-or-sql-library-of-go-to-develop-applications
---
### 步骤3:提供连接示例
---
## 常见连接示例
### MySQL 协议(推荐)
所有示例使用 `<您的连接地址>` 占位符,Agent 根据实例 ServiceType 自动填充正确的 V1/V2 域名。
#### Java
##### 1. JDBC 接口
**依赖**:
```xml
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<version>8.3.0</version>
</dependency>
```
**连接代码**:
```java
Class.forName("com.mysql.cj.jdbc.Driver");
String username = "root";
String password = "your_password";
String database = "default";
String url = "jdbc:mysql://<您的连接地址>:33060/" + database
+ "?sslMode=disabled&allowPublicKeyRetrieval=true&useServerPrepStmts=true"
+ "&useLocalSessionState=true&rewriteBatchedStatements=true&cachePrepStmts=true"
+ "&prepStmtCacheSize=100&prepStmtCacheSqlLimit=50000000";
Properties properties = new Properties();
properties.put("user", username);
properties.put("password", password);
Connection connection = DriverManager.getConnection(url, properties);
```
**CRUD 示例**:
```java
// 创建表
try (Statement stmt = connection.createStatement()) {
stmt.executeUpdate("CREATE TABLE IF NOT EXISTS user_test(id VARCHAR, name VARCHAR, PRIMARY KEY(id))");
}
// 批量插入 (推荐使用 INSERT,语义与 UPSERT 相同)
String sql = "INSERT INTO user_test(id, name) VALUES(?, ?)";
try (PreparedStatement ps = connection.prepareStatement(sql)) {
for (int i = 0; i < 100; i++) {
ps.setString(1, "id" + i);
ps.setString(2, "name" + i);
ps.addBatch();
}
ps.executeBatch(); // batchSize 建议 50-100
}
// 查询
try (PreparedStatement ps = connection.prepareStatement("SELECT * FROM user_test WHERE id = ?")) {
ps.setString(1, "id1");
ResultSet rs = ps.executeQuery();
while (rs.next()) {
System.out.println("id=" + rs.getString(1) + ", name=" + rs.getString(2));
}
}
// 关闭连接
connection.close();
```
##### 2. Druid 连接池
**依赖**:
```xml
<dependency>
<groupId>com.alibaba</groupId>
<artifactId>druid</artifactId>
<version>1.2.11</version>
</dependency>
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<version>8.3.0</version>
</dependency>
```
**配置文件** (`druid.properties`):
```properties
driverClassName=com.mysql.cj.jdbc.Driver
url=jdbc:mysql://<您的连接地址>:33060/default?sslMode=disabled&allowPublicKeyRetrieval=true&useServerPrepStmts=true&useLocalSessionState=true&rewriteBatchedStatements=true&cachePrepStmts=true&prepStmtCacheSize=100&prepStmtCacheSqlLimit=50000000&socketTimeout=120000
username=root
password=your_password
init=true
initialSize=10
maxActive=40
minIdle=40
maxWait=30000
# 避免连接负载不均衡
druid.phyMaxUseCount=10000
phyTimeoutMillis=1800000
# 连接保活
druid.keepAlive=true
druid.keepAliveBetweenTimeMillis=120000
timeBetweenEvictionRunsMillis=60000
minEvictableIdleTimeMillis=300000
maxEvictableIdleTimeMillis=600000
testWhileIdle=true
testOnBorrow=false
testOnReturn=false
```
**初始化连接池**:
```java
Properties properties = new Properties();
InputStream inputStream = getClass().getClassLoader().getResourceAsStream("druid.properties");
properties.load(inputStream);
DataSource dataSource = DruidDataSourceFactory.createDataSource(properties);
// 使用连接
try (Connection conn = dataSource.getConnection()) {
// 执行 SQL...
}
```
##### 3. LindormDataSource(官方推荐)
封装了开箱即用的最佳配置,支持多可用区就近访问。
**依赖**:
```xml
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<version>8.3.0</version>
</dependency>
<dependency>
<groupId>com.aliyun.lindorm</groupId>
<artifactId>lindorm-sql-datasource</artifactId>
<version>2.2.1.4</version>
</dependency>
```
**使用方式**:
```java
LindormDataSourceConfig config = new LindormDataSourceConfig();
config.setJdbcUrl("jdbc:mysql://<您的连接地址>:33060/default");
config.setUsername("root");
config.setPassword("your_password");
config.setMaximumPoolSize(30);
LindormDataSource dataSource = new LindormDataSource(config);
try (Connection conn = dataSource.getConnection()) {
// 执行 SQL...
}
```
**Spring Boot 2.x 集成**:
```xml
<dependency>
<groupId>com.aliyun.lindorm</groupId>
<artifactId>lindorm-sql-datasource-springboot-starter</artifactId>
<version>2.2.1.4</version>
</dependency>
```
```yaml
# application.yml
spring:
datasource:
lindorm:
jdbc-url: jdbc:mysql://<您的连接地址>:33060/default
username: root
password: your_password
maximum-pool-size: 30
```
##### 4. MyBatis 框架
**依赖**:
```xml
<dependency>
<groupId>org.mybatis</groupId>
<artifactId>mybatis</artifactId>
<version>3.5.14</version>
</dependency>
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<version>8.3.0</version>
</dependency>
```
**配置文件** (`mybatis-config.xml`):
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN" "https://mybatis.org/dtd/mybatis-3-config.dtd">
<configuration>
<environments default="development">
<environment id="development">
<transactionManager type="JDBC"/>
<dataSource type="POOLED">
<property name="driver" value="com.mysql.cj.jdbc.Driver"/>
<property name="url" value="jdbc:mysql://<您的连接地址>:33060/default?sslMode=disabled&amp;allowPublicKeyRetrieval=true"/>
<property name="username" value="root"/>
<property name="password" value="your_password"/>
</dataSource>
</environment>
</environments>
<mappers>
<mapper class="org.example.UserMapper"/>
</mappers>
</configuration>
```
**Mapper 示例**:
```java
public interface UserMapper {
@Update("CREATE TABLE IF NOT EXISTS demo_user(id INT, name VARCHAR, PRIMARY KEY(id))")
void createUserTable();
@Insert("UPSERT INTO demo_user(id, name) VALUES(#{userId}, #{userName})")
int upsertUser(User user);
@Select("SELECT * FROM demo_user WHERE id = #{userId}")
User selectOneUser(@Param("userId") int userId);
@Delete("DELETE FROM demo_user WHERE id = #{userId}")
int deleteUser(@Param("userId") int userId);
}
```
---
#### Python
##### 1. mysql-connector-python
**安装**: `pip install mysql-connector-python==8.0.15`
**直连模式**:
```python
import mysql.connector
connection = mysql.connector.connect(
host='<您的连接地址>',
port=33060,
user='root',
passwd='your_password',
database='default'
)
cursor = connection.cursor(prepared=True)
# 创建表
cursor.execute("CREATE TABLE IF NOT EXISTS test_python(c1 INTEGER, c2 INTEGER, c3 VARCHAR, PRIMARY KEY(c1))")
# 插入数据 (参数化防止 SQL 注入)
cursor.execute("UPSERT INTO test_python(c1, c2, c3) VALUES(?, ?, ?)", (1, 1, 'value1'))
# 查询
cursor.execute("SELECT * FROM test_python WHERE c1 = ?", (1,))
print(cursor.fetchall())
cursor.close()
connection.close()
```
**连接池模式**:
```python
from mysql.connector import pooling
connection_pool = pooling.MySQLConnectionPool(
pool_name="mypool",
pool_size=20,
host='<您的连接地址>',
port=33060,
user='root',
password='your_password',
database='default'
)
connection = connection_pool.get_connection()
cursor = connection.cursor(prepared=True)
# ... 执行 SQL
cursor.close()
connection.close() # 返回到连接池
```
##### 2. SQLAlchemy ORM
**安装**:
```bash
pip install PyMySQL
pip install SQLAlchemy
```
**示例**:
```python
from sqlalchemy import create_engine, Column, String, Integer, Float
from sqlalchemy.orm import declarative_base, sessionmaker
Base = declarative_base()
class Player(Base):
    __tablename__ = 'player'
    player_id = Column(Integer, primary_key=True, autoincrement=False)
    player_name = Column(String(255))
    player_height = Column(Float)
engine = create_engine('mysql+pymysql://root:your_password@<您的连接地址>:33060/default')
Session = sessionmaker(bind=engine)
# 建表
Base.metadata.create_all(engine)
# 写入数据
session = Session()
session.add(Player(player_id=1001, player_name="john", player_height=2.08))
session.commit()
# 查询
rows = session.query(Player).filter(Player.player_id == 1001).all()
print([str(row) for row in rows])
```
---
#### Go
##### 1. database/sql + MySQL Driver
**Dependency** (`go.mod`):
```go
require github.com/go-sql-driver/mysql v1.7.1
```
**Example**:
```go
package main

import (
	"database/sql"
	"fmt"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	url := "root:your_password@tcp(<your-connection-address>:33060)/default?timeout=10s"
	db, err := sql.Open("mysql", url)
	if err != nil {
		panic(err)
	}
	defer db.Close()
	// Connection pool settings
	db.SetMaxOpenConns(20)
	db.SetMaxIdleConns(20)
	db.SetConnMaxIdleTime(8 * time.Minute)
	db.SetConnMaxLifetime(30 * time.Minute)
	// Create the table
	if _, err := db.Exec("CREATE TABLE IF NOT EXISTS user_test(id INT, name VARCHAR, age INT, PRIMARY KEY(id))"); err != nil {
		panic(err)
	}
	// Insert (with bound parameters)
	stmt, err := db.Prepare("UPSERT INTO user_test(id, name, age) VALUES(?, ?, ?)")
	if err != nil {
		panic(err)
	}
	defer stmt.Close()
	if _, err := stmt.Exec(1, "zhangsan", 17); err != nil {
		panic(err)
	}
	// Query
	rows, err := db.Query("SELECT * FROM user_test")
	if err != nil {
		panic(err)
	}
	defer rows.Close()
	for rows.Next() {
		var id, age int
		var name string
		if err := rows.Scan(&id, &name, &age); err != nil {
			panic(err)
		}
		fmt.Printf("id=%d, name=%s, age=%d\n", id, name, age)
	}
	// Report any error hit while iterating
	if err := rows.Err(); err != nil {
		panic(err)
	}
}
```
##### 2. GORM Framework
**Dependencies**:
```go
require (
	gorm.io/driver/mysql v1.5.1
	gorm.io/gorm v1.25.4
)
```
**Example**:
```go
package main

import (
	"gorm.io/driver/mysql"
	"gorm.io/gorm"
)

type Product struct {
	ID    int64  `gorm:"primaryKey;autoIncrement:false"`
	Code  string `gorm:"type:varchar"`
	Price float64
}

func main() {
	dsn := "root:your_password@tcp(<your-connection-address>:33060)/default"
	db, err := gorm.Open(mysql.Open(dsn), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	// Important: Lindorm does not support transactions, so GORM's default transaction must be disabled
	session := db.Session(&gorm.Session{SkipDefaultTransaction: true})
	// Create the table
	session.Migrator().CreateTable(&Product{})
	// Write
	session.Create(&Product{ID: 1, Code: "D42", Price: 100.1})
	// Query
	var product Product
	session.First(&product, 1)
}
```
---
#### C/C++
**Install** (CentOS): `yum install mysql-devel`
**Example**:
```c
#include <stdio.h>
#include "mysql/mysql.h"

int main() {
    MYSQL conn;
    mysql_init(&conn);
    if (!mysql_real_connect(&conn,
            "<your-connection-address>",
            "root", "your_password", "default", 33060, NULL, 0)) {
        printf("Connection failed: %s\n", mysql_error(&conn));
        return 1;
    }
    // Create the table
    mysql_query(&conn, "CREATE TABLE IF NOT EXISTS user_test(id INT, name VARCHAR, PRIMARY KEY(id))");
    // Insert data
    mysql_query(&conn, "UPSERT INTO user_test(id, name) VALUES(1, 'test')");
    // Query data
    if (mysql_query(&conn, "SELECT * FROM user_test") != 0) {
        printf("Query failed: %s\n", mysql_error(&conn));
        mysql_close(&conn);
        return 1;
    }
    MYSQL_RES *result = mysql_store_result(&conn);
    if (result != NULL) {
        MYSQL_ROW row;
        while ((row = mysql_fetch_row(result))) {
            printf("id=%s, name=%s\n", row[0], row[1]);
        }
        mysql_free_result(result);  // release the result set
    }
    mysql_close(&conn);
    return 0;
}
```
**Compile**: `gcc -o demo demo.c $(mysql_config --cflags) $(mysql_config --libs)`
---
#### C#
**Install**: `dotnet add package MySql.Data -v 8.0.11`
**Example**:
```csharp
using System;
using MySql.Data.MySqlClient;

string connStr = "server=<your-connection-address>;UID=root;database=default;port=33060;password=your_password";
MySqlConnection conn = new MySqlConnection(connStr);
conn.Open();
MySqlCommand cmd = new MySqlCommand("SHOW DATABASES", conn);
MySqlDataReader rdr = cmd.ExecuteReader();
while (rdr.Read()) {
    Console.WriteLine(rdr[0]);
}
rdr.Close();
conn.Close();
```
---
#### Rust
**Dependency** (`Cargo.toml`):
```toml
[dependencies]
mysql = "*"
```
**Example**:
```rust
use mysql::*;
use mysql::prelude::*;

fn main() {
    let opts = OptsBuilder::new()
        .ip_or_hostname(Some("<your-connection-address>"))
        .user(Some("root"))
        .pass(Some("your_password"))
        .db_name(Some("default"))
        .tcp_port(33060);
    let pool = Pool::new(opts).unwrap();
    let mut conn = pool.get_conn().unwrap();
    // Create the table
    conn.query_drop("CREATE TABLE IF NOT EXISTS user_test(id INT, name VARCHAR, PRIMARY KEY(id))").unwrap();
    // Insert
    conn.exec_drop("UPSERT INTO user_test(id, name) VALUES(?, ?)", (1, "test")).unwrap();
    // Query
    let result: Vec<(i32, String)> = conn.query("SELECT * FROM user_test").unwrap();
    for (id, name) in result {
        println!("id={}, name={}", id, name);
    }
}
```
---
#### PHP
**Requirements**: PHP 8.0+ with the php-mysql module installed
**Example**:
```php
<?php
$lindorm_addr = "<your-connection-address>";
$lindorm_username = "root";
$lindorm_password = "your_password";
$lindorm_database = "default";
$lindorm_port = 33060;
$conn = mysqli_connect($lindorm_addr, $lindorm_username, $lindorm_password, $lindorm_database, $lindorm_port);
if (!$conn) {
    die("Connection failed: " . mysqli_connect_error());
}
// Create the table
mysqli_query($conn, "CREATE TABLE IF NOT EXISTS user_test(id INT, name VARCHAR, PRIMARY KEY(id))");
// Insert data
mysqli_query($conn, "UPSERT INTO user_test(id, name) VALUES(1, 'test')");
// Query data
$result = mysqli_query($conn, "SELECT * FROM user_test");
while ($row = mysqli_fetch_array($result)) {
    printf("id=%d, name=%s\n", $row["id"], $row["name"]);
}
mysqli_close($conn);
?>
```
---
#### Node.js
**Install**: `npm install mysql2`
**Example**:
```javascript
var mysql = require('mysql2');
var connection = mysql.createConnection({
    host: '<your-connection-address>',
    port: 33060,
    user: 'root',
    password: 'your_password',
    database: 'default',
    connectTimeout: 10000
});
connection.connect(function(err) {
    if (err) throw err;
    console.log("Connected!");
    // Query
    connection.query('SHOW DATABASES', function(err, results) {
        if (err) throw err;
        console.log(results);
    });
    connection.end();
});
```
---
#### ODBC
**Install** (Linux):
```bash
# Download the MySQL ODBC driver: https://dev.mysql.com/downloads/connector/odbc/
yum install unixODBC-devel
```
**Configuration** (`/etc/odbcinst.ini`):
```ini
[MySQL]
Description = ODBC for MySQL
Driver64 = /usr/lib64/libmyodbc8a.so
Setup64 = /usr/lib64/libmyodbc8w.so
FileUsage = 1
```
**C example**:
```c
#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

int main() {
    SQLHENV env;
    SQLHDBC dbc;
    SQLRETURN ret;
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, SQL_IS_INTEGER);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
    ret = SQLDriverConnect(dbc, NULL,
        (SQLCHAR*)"DRIVER={MySQL};SERVER=<your-connection-address>;PORT=33060;DATABASE=default;USER=root;PASSWORD=your_password",
        SQL_NTS, NULL, 0, NULL, SQL_DRIVER_COMPLETE);
    // SQL_SUCCEEDED also accepts SQL_SUCCESS_WITH_INFO
    if (SQL_SUCCEEDED(ret)) {
        printf("Connected\n");
        // ... execute SQL
        SQLDisconnect(dbc);
    }
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}
```
---
### Avatica Protocol (Legacy Maintenance Only)
> **Note**: The Avatica protocol is in legacy maintenance mode and is **not recommended for new users**.
>
> The Avatica domain name format differs slightly:
> - V2: `ld-xxx-proxy-lindorm-vpc.lindorm.aliyuncs.com:30060`
> - V1: `ld-xxx-proxy-lindorm.lindorm.rds.aliyuncs.com:30060`
#### Java (Avatica)
**Dependency**:
```xml
<dependency>
    <groupId>com.aliyun.lindorm</groupId>
    <artifactId>lindorm-all-client</artifactId>
    <version>2.2.1.3</version>
</dependency>
```
**Connection code**:
```java
String url = "jdbc:lindorm:table:url=http://<your-connection-address>:30060";
Properties properties = new Properties();
properties.put("user", "root");
properties.put("password", "your_password");
properties.put("database", "default");
Connection connection = DriverManager.getConnection(url, properties);
```
#### Python (Avatica - phoenixdb)
**Install**: `pip install phoenixdb==1.2.0`
```python
import phoenixdb

connect_kw_args = {
    'lindorm_user': 'root',
    'lindorm_password': 'your_password',
    'database': 'default'
}
database_url = 'http://<your-connection-address>:30060'
connection = phoenixdb.connect(database_url, autocommit=True, **connect_kw_args)
with connection.cursor() as cursor:
    cursor.execute("SELECT * FROM test_table")
    print(cursor.fetchall())
connection.close()
```
#### Go (Avatica)
**Dependency** (`go.mod`):
```go
require github.com/apache/calcite-avatica-go/v5 v5.0.0
replace github.com/apache/calcite-avatica-go/v5 => github.com/aliyun/alibabacloud-lindorm-go-sql-driver/v5 v5.0.6
```
---
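### Best Practices
#### Recommended Connection Parameters
| Parameter | Recommended value | Description |
|------|--------|------|
| sslMode | disabled | Do not use SSL, which improves performance |
| useServerPrepStmts | true | Enable server-side prepared statements |
| rewriteBatchedStatements | true | Optimize batch write performance |
| cachePrepStmts | true | Cache prepared statements |
| prepStmtCacheSize | 100 | Number of cached statements |
#### Write Performance Tuning
1. **Write in batches**: a batchSize of 50-100 is recommended
2. **Use INSERT rather than UPSERT**: over the MySQL protocol, INSERT has the same semantics as UPSERT but benefits from client-side optimizations
3. **Increase concurrency**: raise write throughput with multiple threads or coroutines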
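
The batch-size recommendation above can be applied on the client by slicing rows before sending them. The following is a minimal sketch, not a Lindorm API: the `chunked` helper and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[str, str, int]


def chunked(rows: Sequence[Row], batch_size: int = 100) -> Iterator[List[Row]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


# Hypothetical sample data: 250 rows split into batches of 100
rows = [(f"u{i}", f"user_{i}", 20 + i % 50) for i in range(250)]
batches = list(chunked(rows, batch_size=100))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch could then be handed to a driver's bulk call, for example `cursor.executemany(sql, batch)` with mysql-connector-python, so that a single round trip carries 50-100 rows.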
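#### Connection Pool Configuration
1. Do not keep connections alive for too long; configuring `phyMaxUseCount` and `phyTimeoutMillis` is recommended
2. Call `close()` promptly after a query finishes to return the connection to the pool
3. Enabling connection keep-alive checks is recommended
#### Notes
- **Lindorm does not support transactions**: disable transactions when using ORM frameworks such as GORM
- **UPDATE supports a single row only**: the WHERE condition must specify every primary key column
- **Idle connection timeout**: the server proactively closes connections that have been idle for 10 minutes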
FILE:references/01-dev/sql-operations.md
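# Lindorm SQL Syntax Reference
This document covers commonly used DDL and DML statements.
## Important Constraints
> **These constraints have the highest priority and must be strictly followed.**
1. **No speculation or extrapolation**: Answer user questions only from content explicitly recorded in this Skill document. Never guess, extrapolate, or generate SQL syntax, parameters, features, or configuration that the document does not contain.
2. **State uncertainty explicitly**: If the document has no relevant information, tell the user clearly that "the current document does not cover this" and direct them to the official documentation.
3. **Do not mix up sources**: Never present the syntax or features of other databases (such as MySQL, HBase, or PostgreSQL) as Lindorm functionality.
4. **Code examples need a source**: Generated code examples must be based on the templates in this document, and their parameters and syntax must match the document.
## Procedure
1. If the SQL statement the user mentions is covered in the document below, answer from that content first.
2. If the SQL statement the user mentions is not covered in the document below, consult the official documentation to answer.
   - **DDL**: https://help.aliyun.com/zh/lindorm/developer-reference/wide-table-ddl/
   - **DML**: https://help.aliyun.com/zh/lindorm/developer-reference/wide-table-dml/
   - **DCL**: https://help.aliyun.com/zh/lindorm/developer-reference/wide-table-dcl/
3. If the SQL statement is covered neither in the document below nor in the official documentation, tell the user clearly that "the current document does not cover this".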
## DDL Statements
> For detailed CREATE TABLE syntax, data types, and table properties, see table-design.md
### CREATE TABLE
**Basic syntax**:
```sql
CREATE TABLE [IF NOT EXISTS] table_name (
    column_name data_type [NOT NULL],
    ...
    PRIMARY KEY (column_name [, column_name]...)
)
[WITH (option = value, ...)]
```
**Example - basic table**:
```sql
CREATE TABLE IF NOT EXISTS user_profile (
    user_id VARCHAR NOT NULL,
    nickname VARCHAR,
    age INTEGER,
    balance DOUBLE,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id)
);
```
**Example - composite primary key**:
```sql
CREATE TABLE IF NOT EXISTS order_detail (
    order_id VARCHAR NOT NULL,
    item_seq INTEGER NOT NULL,
    product_name VARCHAR,
    quantity INTEGER,
    price DOUBLE,
    PRIMARY KEY (order_id, item_seq)
);
```
**Example - with table properties**:
```sql
CREATE TABLE IF NOT EXISTS logs (
    log_id VARCHAR NOT NULL,
    level VARCHAR,
    message VARCHAR,
    created_at TIMESTAMP,
    PRIMARY KEY (log_id)
) WITH (
    TTL = '604800',        -- expires after 7 days
    COMPRESSION = 'ZSTD',  -- ZSTD compression
    NUMREGIONS = 10        -- pre-split into 10 Regions
);
```
**Example - pre-split table**:
```sql
CREATE TABLE IF NOT EXISTS metrics (
    metric_id VARCHAR NOT NULL,
    value DOUBLE,
    ts TIMESTAMP,
    PRIMARY KEY (metric_id)
) WITH (
    NUMREGIONS = 8,
    STARTKEY = '0',
    ENDKEY = '9'
);
```
**Example - custom split keys**:
```sql
CREATE TABLE IF NOT EXISTS events (
    event_id VARCHAR NOT NULL,
    event_type VARCHAR,
    payload VARCHAR,
    PRIMARY KEY (event_id)
) WITH (
    SPLITKEYS = 'a,b,c,d,e,f,g,h,i'
);
```
### Common Table Properties
| Property | Type | Description | Example |
|------|------|------|------|
| TTL | INT | Data retention period (seconds) | TTL = '86400' (1 day) |
| COMPRESSION | STRING | Compression algorithm: SNAPPY/ZSTD/LZ4 | COMPRESSION = 'ZSTD' |
| NUMREGIONS | INT | Number of pre-split Regions | NUMREGIONS = 16 |
| STARTKEY | STRING | Start key of the split range | STARTKEY = '0' |
| ENDKEY | STRING | End key of the split range | ENDKEY = 'z' |
| SPLITKEYS | STRING | Custom split points | SPLITKEYS = 'a,m,z' |
| MUTABILITY | STRING | Index write mode | MUTABILITY = 'MUTABLE_LATEST' |
| CONSISTENCY | STRING | Consistency level | CONSISTENCY = 'strong' |
### ALTER TABLE
**Add columns**:
```sql
ALTER TABLE user_profile ADD COLUMN email VARCHAR;
ALTER TABLE user_profile ADD COLUMN phone VARCHAR, address VARCHAR;
```
**Change the TTL**:
```sql
ALTER TABLE logs SET TTL = '2592000'; -- change to 30 days
ALTER TABLE logs SET TTL = '';        -- remove the TTL so data never expires
```
### DROP TABLE
```sql
DROP TABLE IF EXISTS table_name;
```
### TRUNCATE TABLE
```sql
TRUNCATE TABLE table_name;
```
### CREATE INDEX
**Secondary index**:
```sql
-- Create the index together with the table
CREATE TABLE orders (
    order_id VARCHAR NOT NULL,
    user_id VARCHAR,
    status VARCHAR,
    amount DOUBLE,
    PRIMARY KEY (order_id),
    INDEX idx_user USING KV (user_id),
    INDEX idx_status USING KV (status) INCLUDE (amount)
);
-- Create a secondary index separately
CREATE INDEX idx_user USING KV ON orders (user_id);
```
**Search index**:
> ⚠️ **Important**: The search index feature must first be enabled in the console (Wide Table Engine → Search Index → Enable Now); otherwise index creation fails with `SERVER INTERNAL ERROR`. See [table-design.md](table-design.md#搜索索引开通条件).
```sql
-- Basic search index
CREATE INDEX idx_search USING SEARCH ON orders (status);
-- Search index with an analyzer
CREATE INDEX idx_text USING SEARCH ON articles (
    title(type=text, analyzer=ik),
    content(type=text, analyzer=ik)
);
-- Tokenized (full-text) query
SELECT * FROM articles WHERE MATCH(content) AGAINST('关键词');
```
### SHOW / DESCRIBE
```sql
-- List all tables
SHOW TABLES;
-- Show the table schema
DESCRIBE table_name;
-- Show the CREATE TABLE statement
SHOW CREATE TABLE table_name;
-- Show indexes
SHOW INDEX FROM table_name;
```
---
## DML Statements
### UPSERT - Insert or Update Data
UPSERT is the write method Lindorm recommends: it updates the row if the primary key exists and inserts it otherwise.
**Single-row write**:
```sql
UPSERT INTO user_profile (user_id, nickname, age)
VALUES ('u001', '张三', 25);
```
**Write with a timestamp** (using a HINT):
```sql
-- Specify the timestamp of the written data (milliseconds)
UPSERT /*+ _l_ts_(1704067200000) */ INTO user_profile (user_id, nickname, age)
VALUES ('u001', '张三', 25);
```
**Batch write** (PreparedStatement):
```java
String sql = "UPSERT INTO user_profile (user_id, nickname, age) VALUES (?, ?, ?)";
PreparedStatement ps = conn.prepareStatement(sql);
for (int i = 0; i < 100; i++) {
    ps.setString(1, "u" + i);
    ps.setString(2, "user_" + i);
    ps.setInt(3, 20 + i % 50);
    ps.addBatch();
}
ps.executeBatch();
```
### SELECT - Query Data
**Basic query**:
```sql
SELECT * FROM user_profile WHERE user_id = 'u001';
```
**Query specific columns** (recommended):
```sql
SELECT user_id, nickname, age FROM user_profile WHERE user_id = 'u001';
```
**Range query**:
```sql
SELECT * FROM user_profile
WHERE user_id >= 'u001' AND user_id < 'u100';
```
**Sorting and pagination**:
```sql
SELECT * FROM user_profile
ORDER BY user_id
LIMIT 100 OFFSET 0;
```
**Conditional queries**:
```sql
SELECT * FROM user_profile
WHERE age > 25 AND age < 35;
SELECT * FROM user_profile
WHERE nickname LIKE '张%';
SELECT * FROM user_profile
WHERE user_id IN ('u001', 'u002', 'u003');
```
**Aggregate queries**:
```sql
SELECT COUNT(*) FROM user_profile;
SELECT MAX(age), MIN(age), AVG(age) FROM user_profile;
```
**Using HINTs**:
```sql
-- Force-allow an inefficient query (full table scan)
SELECT /*+ _l_allow_filtering_ */ * FROM user_profile;
-- Set the operation timeout (milliseconds)
SELECT /*+ _l_operation_timeout_(30000) */ * FROM user_profile WHERE age > 25;
-- Force a specific index
SELECT /*+ _l_force_index_('idx_name') */ * FROM user_profile WHERE nickname = 'test';
-- Ignore indexes
SELECT /*+ _l_ignore_index_ */ * FROM user_profile WHERE nickname = 'test';
```
### UPDATE - Update Data
**Important**: UPDATE must specify a complete primary key condition.
```sql
-- Correct: the full primary key is specified
UPDATE user_profile SET age = 26 WHERE user_id = 'u001';
-- Wrong: batch updates are not supported
UPDATE user_profile SET age = 26 WHERE age > 25; -- raises an error
```
### DELETE - Delete Data
**Single-row delete**:
```sql
DELETE FROM user_profile WHERE user_id = 'u001';
```
**Delete with a composite primary key**:
```sql
DELETE FROM order_detail WHERE order_id = 'o001' AND item_seq = 1;
```
### Writing and Querying JSON Data
> For data type definitions, see [table-design.md](table-design.md)
**Writing**:
```sql
-- Option 1: write a JSON string directly
UPSERT INTO tb(p1, c2) VALUES(1, '{"k1": 4, "k2": {"k3": {"k4": 4}}}');
-- Option 2: use the json_object function
UPSERT INTO tb(p1, c2) VALUES(2, json_object('k1', 2, 'k2', '2'));
-- Equivalent to: UPSERT INTO tb(p1,c2) VALUES(2,'{"k1":2,"k2":"2"}');
-- Option 3: use the json_array function
UPSERT INTO tb(p1, c2) VALUES(3, json_array(1, 2, json_object('k1', 3, 'k2', '3')));
-- Equivalent to: UPSERT INTO tb(p1,c2) VALUES(3,'[1,2,{"k1":3,"k2":"3"}]');
```
**Querying JSON fields**:
```sql
-- Get a value from a JSON object (SELECT clause)
SELECT p1, json_extract(c2, '$.k1') AS k1_value FROM tb WHERE p1 = 1;
-- Nested path query (row p1 = 1 holds the nested k2.k3.k4 document)
SELECT json_extract(c2, '$.k2.k3.k4') FROM tb WHERE p1 = 1;
-- Array index access
SELECT json_extract(c2, '$[2].k2') FROM tb WHERE p1 = 3;
-- Filtering in the WHERE clause
SELECT * FROM tb
WHERE p1 >= 1 AND p1 < 4
AND json_extract(c2, '$.k2') > '0';
```
---
## Common Functions
### String Functions
> Requires Wide Table Engine 2.5.1.1+
| Function | Description | Example |
|------|------|------|
| `CONCAT(s1, s2, ...)` | Concatenate multiple strings | `SELECT CONCAT('a','b','c')` → `abc` |
| `LENGTH(s)` | Compute the string length | `SELECT LENGTH('hello')` → `5` |
| `LOWER(s)` | Convert to lowercase | `SELECT LOWER('ABC')` → `abc` |
| `UPPER(s)` | Convert to uppercase | `SELECT UPPER('abc')` → `ABC` |
| `TRIM(s)` | Remove leading and trailing spaces | `SELECT TRIM(' ab ')` → `ab` |
| `SUBSTR(s, pos[, len])` | Extract a substring | `SELECT SUBSTR('hello', 2, 3)` → `ell` |
| `REPLACE(s, from, to)` | Replace a substring | `SELECT REPLACE('abc', 'b', 'x')` → `axc` |
| `REVERSE(s)` | Return the reversed string | `SELECT REVERSE('abc')` → `cba` |
| `MD5(s)` | Compute the MD5 hash | `SELECT MD5('abc')` → `900150983cd24fb0...` |
| `SHA256(s)` | Compute the SHA256 hash | `SELECT SHA256('abc')` → `ba7816bf8f01cfea...` |
| `START_WITH(s, prefix)` | Test for a prefix | `SELECT START_WITH('hello', 'he')` → `true` |
**Regular expression functions**:
```sql
-- REGEXP_REPLACE: regex replacement, with an optional start position
SELECT REGEXP_REPLACE('abcbc', 'b', 'x', 2); -- axcxc
-- REGEXP_SUBSTR: extract a substring by regex
SELECT REGEXP_SUBSTR('abc123def', '[0-9]+'); -- 123
-- MATCH: test whether a value matches a regex
SELECT * FROM table WHERE MATCH(column, 'pattern');
```
### Aggregate Functions
| Function | Description | Example |
|------|------|------|
| `COUNT(*)` | Count rows | `SELECT COUNT(*) FROM table` |
| `COUNT(column)` | Count non-NULL values | `SELECT COUNT(name) FROM table` |
| `SUM(column)` | Sum (numeric types only) | `SELECT SUM(amount) FROM orders` |
| `AVG(column)` | Average (numeric types only) | `SELECT AVG(price) FROM products` |
| `MAX(column)` | Maximum | `SELECT MAX(score) FROM results` |
| `MIN(column)` | Minimum | `SELECT MIN(score) FROM results` |
**Advanced aggregate functions** (Wide Table Engine 2.7.9+):
```sql
-- HEAD: return the first non-NULL value, with optional ordering
SELECT HEAD(temperature ORDER BY time) FROM sensor;      -- earliest temperature
SELECT HEAD(temperature ORDER BY time DESC) FROM sensor; -- latest temperature
-- GROUP_CONCAT: concatenate strings per group
SELECT region, GROUP_CONCAT(device_id) FROM sensor GROUP BY region;
-- Result: north-cn | dev1,dev2,dev3
-- GROUP_CONCAT with ordering and a separator
SELECT region, GROUP_CONCAT(device_id ORDER BY time SEPARATOR '|')
FROM sensor GROUP BY region;
-- Result: north-cn | dev1|dev2|dev3
-- GROUP_CONCAT with deduplication
SELECT region, GROUP_CONCAT(DISTINCT device_id) FROM sensor GROUP BY region;
```
### Time Functions
> Requires Wide Table Engine 2.7.8+ and Lindorm SQL 2.8.7.0+
| Function | Description | Example |
|------|------|------|
| `DATE_FORMAT(ts, format)` | Format a timestamp | See the examples below |
| `FROM_UNIXTIME(seconds)` | Convert a Unix timestamp to TIMESTAMP | `SELECT FROM_UNIXTIME(1704067200)` |
| `UNIX_TIMESTAMP(ts)` | Convert a TIMESTAMP to a Unix timestamp | `SELECT UNIX_TIMESTAMP('2024-01-01 00:00:00')` |
| `DATEDIFF(ts1, ts2)` | Compute the date difference (days) | `SELECT DATEDIFF('2024-01-05', '2024-01-01')` → `4` |
**DATE_FORMAT format specifiers**:
```sql
SELECT DATE_FORMAT('2024-01-15 17:30:45', '%Y-%m-%d %H:%i:%s');
-- 2024-01-15 17:30:45
SELECT DATE_FORMAT('2024-01-15 17:30:45', '%Y年%m月%d日 %H:%i');
-- 2024年01月15日 17:30
SELECT DATE_FORMAT('2024-01-15 17:30:45', 'at %T on %b %D, %Y');
-- at 17:30:45 on JAN 15th, 2024
```
| Specifier | Description | Example |
|--------|------|------|
| `%Y` | Four-digit year | 2024 |
| `%y` | Two-digit year | 24 |
| `%m` | Two-digit month | 01-12 |
| `%d` | Two-digit day of month | 01-31 |
| `%H` | Hour (24-hour clock) | 00-23 |
| `%h` | Hour (12-hour clock) | 01-12 |
| `%i` | Minutes | 00-59 |
| `%s` / `%S` | Seconds | 00-59 |
| `%T` | Time (HH:mm:ss) | 17:30:45 |
| `%D` | Day of month with ordinal suffix | 1st, 2nd, 15th |
| `%b` | Abbreviated month name | Jan, Feb |
| `%M` | Full month name | January |
| `%W` | Full weekday name | Monday |
| `%a` | Abbreviated weekday name | Mon |
| `%p` | AM/PM | AM |
**FROM_UNIXTIME examples**:
```sql
-- Convert a Unix timestamp to TIMESTAMP
SELECT FROM_UNIXTIME(1704067200); -- 2024-01-01 08:00:00 (+08:00 time zone)
-- Millisecond precision is supported (use a fractional value)
SELECT FROM_UNIXTIME(1704067200.123); -- 2024-01-01 08:00:00.123
-- Format the output at the same time
SELECT FROM_UNIXTIME(1704067200, '%Y-%m-%d'); -- 2024-01-01
```
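
The `FROM_UNIXTIME` results above depend on the session time zone. As a client-side cross-check, the same conversion can be sketched in Python. This is only an illustration under the assumption of a fixed +08:00 offset; the helper name `from_unixtime` is hypothetical and is not a Lindorm or driver function.

```python
from datetime import datetime, timedelta, timezone

# Assumed time zone: a fixed +08:00 offset, matching the SQL example
CST = timezone(timedelta(hours=8))


def from_unixtime(seconds: float, tz: timezone = CST) -> datetime:
    """Convert Unix seconds to a timezone-aware datetime."""
    return datetime.fromtimestamp(seconds, tz=tz)


moment = from_unixtime(1704067200)
# Python's strftime uses %M for minutes, whereas DATE_FORMAT uses %i
print(moment.strftime("%Y-%m-%d %H:%M:%S"))  # 2024-01-01 08:00:00
print(moment.strftime("%Y-%m-%d"))           # 2024-01-01
```

The epoch value 1704067200 is midnight UTC on 2024-01-01, so a client in a different time zone would see a different wall-clock time for the same stored value.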
### JSON Functions
**Constructor functions**:
- `json_object(key1, value1, ...)`: build a JSON object
  ```sql
  SELECT json_object('name', 'Alice', 'age', 25);
  -- {"name": "Alice", "age": 25}
  ```
- `json_array(value1, value2, ...)`: build a JSON array
  ```sql
  SELECT json_array('Java', 'Python', 'Go');
  -- ["Java", "Python", "Go"]
  ```
**Extraction functions**:
- `json_extract(json_doc, path)`: extract a JSON value (returns the JSON type)
  ```sql
  SELECT json_extract('{"name": "Alice"}', '$.name');
  -- "Alice"
  ```
- `json_extract_string(json_doc, path)`: extract and convert to VARCHAR
  ```sql
  SELECT json_extract_string('{"name": "Alice"}', '$.name');
  -- Alice (VARCHAR)
  ```
- `json_extract_long(json_doc, path)`: extract and convert to BIGINT
  ```sql
  SELECT json_extract_long('{"id": 123456}', '$.id');
  -- 123456 (BIGINT)
  ```
- `json_extract_double(json_doc, path)`: extract and convert to DOUBLE
  ```sql
  SELECT json_extract_double('{"score": 95.5}', '$.score');
  -- 95.5 (DOUBLE)
  ```
**Path syntax**:
- `$.key`: access a key of an object
- `$[index]`: access an array element by index (starting at 0)
- `$.key1.key2`: nested access
- `$[*]`: wildcard matching all array elements
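
To make the path syntax concrete, the sketch below resolves `$.key` and `$[index]` paths against a JSON string on the client. It only illustrates how the path segments are read and is not how Lindorm evaluates `json_extract`; the `extract` helper is hypothetical, and the `$[*]` wildcard is deliberately left out.

```python
import json
import re
from typing import Any

# One path segment: either .key or [index]
_SEGMENT = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract(doc: str, path: str) -> Any:
    """Resolve a `$.key` / `$[index]` path against a JSON string (no wildcards)."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    value = json.loads(doc)
    pos = 1
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path segment at offset {pos}")
        key, index = match.group(1), match.group(2)
        value = value[key] if key is not None else value[int(index)]
        pos = match.end()
    return value


# Documents taken from the write examples earlier in this file
print(extract('{"k1": 4, "k2": {"k3": {"k4": 4}}}', "$.k2.k3.k4"))  # 4
print(extract('[1, 2, {"k1": 3, "k2": "3"}]', "$[2].k2"))            # 3
```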
**Containment functions**:
- `json_contains(target, candidate[, path])`: check whether the target contains the given value
  ```sql
  -- Check whether an array contains an element
  SELECT json_contains('["Java", "Python"]', '"Java"');
  -- 1 (true)
  -- Check whether an object contains a property
  SELECT json_contains('{"a": 1, "b": 2}', '{"a": 1}');
  -- 1 (true)
  -- Check at a specific path
  SELECT json_contains('{"skills": ["Java", "Python"]}', '"Java"', '$.skills');
  -- 1 (true)
  -- Use in a WHERE condition
  SELECT * FROM table WHERE json_contains(data, '"active"', '$.status');
  ```
**Update functions**:
- `json_set(json_doc, path, value[, path, value]...)`: insert or update a value
  ```sql
  SELECT json_set('{"a": 1}', '$.b', 2);
  -- {"a": 1, "b": 2}
  ```
- `json_insert(json_doc, path, value)`: insert only if the path does not exist
  ```sql
  SELECT json_insert('{"a": 1}', '$.b', 2);
  -- {"a": 1, "b": 2}
  ```
- `json_replace(json_doc, path, value)`: update only if the path exists
  ```sql
  SELECT json_replace('{"a": 1}', '$.a', 10);
  -- {"a": 10}
  ```
- `json_remove(json_doc, path[, path]...)`: remove the value at the given path
  ```sql
  SELECT json_remove('{"a": 1, "b": 2}', '$.b');
  -- {"a": 1}
  ```
**Notes**:
- Writing a non-JSON object or string to a JSON column raises an error
- Comparison rules across data types are the same as in MySQL
---
## Special Syntax
### HINT Syntax in Detail
A HINT is supplementary SQL syntax that changes how a statement is executed. A HINT must immediately follow the `INSERT`, `UPSERT`, `DELETE`, or `SELECT` keyword.
> Requires Wide Table Engine 2.3.1+
**Basic syntax**:
```sql
/*+ hint1, hint2, ... */
```
#### HINT Parameters
| HINT | Type | Description | Supported statements |
|------|------|------|----------|
| `_l_operation_timeout_(N)` | INT | DML operation timeout in milliseconds, default 120000 | UPSERT, DELETE, UPDATE, SELECT |
| `_l_allow_filtering_` | - | Allow inefficient full-table-scan queries | SELECT |
| `_l_force_index_('idx')` | STRING | Force the specified index | SELECT |
| `_l_ignore_index_` | - | Ignore indexes and query the table directly | SELECT |
| `_l_ts_(N)` | BIGINT | Timestamp to write or query (milliseconds) | UPSERT, SELECT |
| `_l_versions_(N)` | INT | Return the latest N versions of the data | SELECT |
| `_l_ts_min_(N)` | BIGINT | Filter results to data with timestamp >= N | SELECT |
| `_l_ts_max_(N)` | BIGINT | Filter results to data with timestamp < N | SELECT |
| `_l_hot_only_` / `_l_hot_only_(true)` | BOOLEAN | Query hot storage only | SELECT |
#### Timeout and Performance Control
```sql
-- Set the operation timeout to 30 seconds
SELECT /*+ _l_operation_timeout_(30000) */ COUNT(*) FROM big_table;
-- Allow a full table scan (when the WHERE condition does not include the primary key)
SELECT /*+ _l_allow_filtering_ */ * FROM users WHERE age > 30;
-- Combined use
SELECT /*+ _l_operation_timeout_(30000), _l_allow_filtering_ */ *
FROM users WHERE age > 30;
```
#### Index Control
```sql
-- Force the specified index
SELECT /*+ _l_force_index_('idx_user_name') */ * FROM users WHERE name = 'test';
-- Ignore indexes and query the base table directly (useful for performance comparison)
SELECT /*+ _l_ignore_index_ */ * FROM users WHERE name = 'test';
```
**Note**: `_l_force_index_` and `_l_ignore_index_` cannot be used together
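
When queries are assembled in application code, the hint comment can be generated from options, which also makes the mutual-exclusion rule easy to enforce. The sketch below is a hypothetical client-side helper (`build_hint` is not part of any Lindorm SDK); it emits only hint names listed in the table above.

```python
from typing import List, Optional


def build_hint(operation_timeout_ms: Optional[int] = None,
               allow_filtering: bool = False,
               force_index: Optional[str] = None,
               ignore_index: bool = False) -> str:
    """Return a /*+ ... */ hint comment, or an empty string if no option is set."""
    if force_index is not None and ignore_index:
        raise ValueError("_l_force_index_ and _l_ignore_index_ cannot be used together")
    if force_index is not None and not force_index.isidentifier():
        raise ValueError("index name must be a plain identifier")
    parts: List[str] = []
    if operation_timeout_ms is not None:
        parts.append(f"_l_operation_timeout_({int(operation_timeout_ms)})")
    if allow_filtering:
        parts.append("_l_allow_filtering_")
    if force_index is not None:
        parts.append(f"_l_force_index_('{force_index}')")
    if ignore_index:
        parts.append("_l_ignore_index_")
    if not parts:
        return ""
    return "/*+ " + ", ".join(parts) + " */"


hint = build_hint(operation_timeout_ms=30000, allow_filtering=True)
print(f"SELECT {hint} * FROM users WHERE age > 30")
# SELECT /*+ _l_operation_timeout_(30000), _l_allow_filtering_ */ * FROM users WHERE age > 30
```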
#### Multi-Version Data Management
Lindorm can store multiple versions of the data in each column, identified by timestamp (a larger timestamp means a newer version).
**Create a multi-version table**:
```sql
-- VERSIONS='5' keeps at most 5 versions per column
CREATE TABLE sensor_data (
    device_id VARCHAR,
    temperature DOUBLE,
    PRIMARY KEY(device_id)
) WITH (VERSIONS='5');
-- Change the version count of an existing table
ALTER TABLE sensor_data SET 'VERSIONS' = '10';
```
**Write with a specified timestamp**:
```sql
-- Write with an explicit timestamp (milliseconds)
UPSERT /*+ _l_ts_(1704067200000) */ INTO sensor_data(device_id, temperature)
VALUES ('dev001', 25.5);
UPSERT /*+ _l_ts_(1704067260000) */ INTO sensor_data(device_id, temperature)
VALUES ('dev001', 26.0); -- same device, new version
```
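
The `_l_ts_` values above are epoch milliseconds. A minimal sketch of deriving them from a timezone-aware `datetime` follows; the `to_millis` helper is hypothetical and shown only to illustrate the unit.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds using integer arithmetic."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return (moment - EPOCH) // timedelta(milliseconds=1)


first = to_millis(datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc))
second = to_millis(datetime(2024, 1, 1, 0, 1, tzinfo=timezone.utc))
print(first)   # 1704067200000
print(second)  # 1704067260000
```

The two printed values are the timestamps used in the UPSERT examples, one minute apart.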
**Query multi-version data**:
```sql
-- Query the data at a specific timestamp
SELECT /*+ _l_ts_(1704067200000) */ device_id, temperature
FROM sensor_data WHERE device_id = 'dev001';
-- Query the latest N versions
SELECT /*+ _l_versions_(3) */ device_id, temperature, temperature_l_ts
FROM sensor_data WHERE device_id = 'dev001';
-- Query a timestamp range [min, max)
SELECT /*+ _l_ts_min_(1704067200000), _l_ts_max_(1704153600000) */
device_id, temperature, temperature_l_ts
FROM sensor_data WHERE device_id = 'dev001';
```
**View a column's timestamp**: append the `_l_ts` suffix to the column name
```sql
-- temperature_l_ts returns the timestamp of the temperature column
SELECT /*+ _l_versions_(2) */ device_id, temperature, temperature_l_ts
FROM sensor_data;
```
#### Hot Data Queries
Once cold storage is enabled, a HINT can restrict a query to data in hot storage:
```sql
-- Query hot data only
SELECT /*+ _l_hot_only_ */ * FROM sensor_data WHERE device_id = 'dev001';
SELECT /*+ _l_hot_only_(true) */ * FROM sensor_data WHERE device_id = 'dev001';
-- Query all data (including cold data); equivalent to using no HINT
SELECT /*+ _l_hot_only_(false) */ * FROM sensor_data WHERE device_id = 'dev001';
```
**Note**: Querying cold data alone is not supported
### Dynamic Columns
Lindorm supports dynamic columns: new columns can be written without being predefined.
```sql
-- Write a dynamic column
UPSERT INTO user_profile (user_id, _dyn_col_name1) VALUES ('u001', 'value1');
-- Query a dynamic column
SELECT user_id, _dyn_col_name1 FROM user_profile WHERE user_id = 'u001';
```
### TTL
```sql
-- View a table's TTL setting
SHOW CREATE TABLE table_name;
-- Change the TTL
ALTER TABLE table_name SET TTL = '86400';
-- Remove the TTL (data never expires)
ALTER TABLE table_name SET TTL = '';
```
**Note**: Lindorm does not support row-level TTL; TTL is a table-level property.
---
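## SQL Notes
### Differences from MySQL
| Feature | MySQL | Lindorm SQL |
|------|-------|-------------|
| Write statement | INSERT/UPDATE | UPSERT (recommended) |
| UPDATE scope | Batch updates supported | Single row only |
| Auto-increment primary key | Supported | Not supported |
| Foreign keys | Supported | Not supported |
| Transactions | ACID | Single-row atomicity |
| JOIN | Supported | Not supported |
### Performance Recommendations
1. **Primary key queries are fastest**: use the primary key as the query condition whenever possible
2. **Avoid full table scans**: a WHERE clause on an unindexed column is blocked by inefficient-query interception; create an index or use `/*+ _l_allow_filtering_ */`
3. **Limit the number of returned rows**: use LIMIT to avoid returning large result sets
4. **Use PreparedStatement**: required for batch operations
5. **Select only what you need**: query only the columns you need
6. **Prefer derived tables for subqueries**: WHERE IN/EXISTS subqueries require an index on the filter column inside the subquery
7. **Use a search index for suffix matching**: a secondary index is enough for a LIKE prefix `xxx%`, while a suffix match `%xxx` requires a search index (SEARCH)
### Window Functions
⚠️ **Limited support**: The official compatibility documentation marks window functions as "not yet supported". The syntax does not raise errors, but there may be **correctness or stability risks** (server-side computation is expensive). In testing, ROW_NUMBER/RANK/DENSE_RANK/LEAD/SUM OVER/AVG OVER ran normally on the current version, while LAG hit a parser bug on older versions. **Use with caution** in production, and prefer the compute engine (OLAP) for window calculations.
```sql
-- ROW_NUMBER: number rows within each group
SELECT id, user_name, amount,
    ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY amount DESC) AS rn
FROM orders;
-- RANK: ranking
SELECT id, user_name, amount,
    RANK() OVER (ORDER BY amount DESC) AS rnk
FROM orders;
-- SUM OVER: per-group total
SELECT id, user_name, amount,
    SUM(amount) OVER (PARTITION BY user_name) AS user_total
FROM orders;
-- LEAD/LAG: reference the following/preceding row (LAG may fail to parse on older versions)
SELECT id, amount,
    LAG(amount) OVER (ORDER BY id) AS prev_amt,
    LEAD(amount) OVER (ORDER BY id) AS next_amt
FROM orders;
```
### Subqueries
Lindorm SQL supports derived tables (subqueries in the FROM clause); WHERE IN/EXISTS subqueries require index support:
```sql
-- Derived table (recommended; works without an index)
SELECT * FROM (
    SELECT user_name, SUM(amount) AS total FROM orders GROUP BY user_name
) AS t WHERE total > 4000;
-- WHERE IN subquery (the filter column in the subquery must be indexed)
SELECT * FROM orders
WHERE user_name IN (SELECT name FROM users WHERE city = '杭州');
-- Scalar subquery (requires index support)
SELECT id, user_name,
    (SELECT COUNT(*) FROM orders o2 WHERE o2.user_name = orders.user_name) AS order_cnt
FROM orders;
```
> See the subquery support section in [sql-usage-notes.md](sql-usage-notes.md) for details.
### Reserved Keywords
The following keywords cannot be used as table or column names:
```
SELECT, FROM, WHERE, AND, OR, NOT, IN, LIKE, BETWEEN,
IS, NULL, TRUE, FALSE, CREATE, ALTER, DROP, TABLE,
INDEX, PRIMARY, KEY, VALUES, UPSERT, UPDATE, DELETE,
INSERT, INTO, SET, ORDER, BY, ASC, DESC, LIMIT, OFFSET
```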
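To use a reserved word as an identifier, quote it with backticks (the compatibility notes in sql-usage-notes.md state that identifiers must be quoted with backticks and that string constants use single quotes only):
```sql
CREATE TABLE `order` (`select` VARCHAR, PRIMARY KEY(`select`));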
```
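
When table or column names come from configuration, a small guard can quote the reserved ones automatically. The sketch below assumes backtick quoting, as stated in the lexical-differences table of sql-usage-notes.md; the `quote_identifier` helper is hypothetical, and the keyword set is copied from the list above.

```python
# Reserved keywords copied from the list above
RESERVED = {
    "SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "IN", "LIKE", "BETWEEN",
    "IS", "NULL", "TRUE", "FALSE", "CREATE", "ALTER", "DROP", "TABLE",
    "INDEX", "PRIMARY", "KEY", "VALUES", "UPSERT", "UPDATE", "DELETE",
    "INSERT", "INTO", "SET", "ORDER", "BY", "ASC", "DESC", "LIMIT", "OFFSET",
}


def quote_identifier(name: str) -> str:
    """Wrap `name` in backticks if it collides with a reserved keyword."""
    if "`" in name:
        raise ValueError("identifier must not contain a backtick")
    return f"`{name}`" if name.upper() in RESERVED else name


print(quote_identifier("order"))    # `order`
print(quote_identifier("user_id"))  # user_id
```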
FILE:references/01-dev/sql-usage-notes.md
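# Lindorm SQL Usage Notes
## Contents
- [Compatibility Differences from MySQL](#compatibility-differences-from-mysql)
- [Troubleshooting Unexpected Query Results](#troubleshooting-unexpected-query-results)
- [SQL Usage Notes](#sql-usage-notes)
- [HASH Distribution Notes](#hash-distribution-notes)
- [Hot and Cold Data Separation Notes](#hot-and-cold-data-separation-notes)
- [Compaction and Storage Management](#compaction-and-storage-management)
---
## Compatibility Differences from MySQL
Lindorm SQL is compatible with part of the features and syntax of MySQL 5.7/8.0, but because the product architecture differs, some syntax and features are not fully supported.
### Lexical Differences
| Item | Lindorm SQL | MySQL |
|------|-------------|-------|
| Case sensitivity | Database object identifiers are strictly case-sensitive | Configurable |
| Identifier quoting | Must be quoted with backticks (\`) | Several styles supported |
| String constants | Single quotes (') only | Single and double quotes supported |
### Unsupported Data Types
- BIT
- MEDIUMINT
- REAL
- All TEXT types
- DATETIME
- UNSIGNED types other than BIGINT UNSIGNED
> For integer types such as TINYINT, INTEGER, and BIGINT, specifying a display length is not supported.
### Transaction Support
Lindorm does not support multi-row transactions (that is, transactional reads or writes spanning multiple rows in one operation).
### INSERT Semantics
In Lindorm SQL, INSERT is essentially UPSERT: for rows with the same primary key, it directly overwrites the fields/columns named in the write statement.
> UPDATE statements that, in a traditional database, filter by equality on the full primary key should be rewritten as INSERT statements for better performance.
**INSERT restrictions**:
- An INSERT statement must explicitly list the columns of the target table
- The column list must include at least one non-primary-key column
### DELETE/UPDATE Restrictions
DELETE and UPDATE statements must include a WHERE clause, and the WHERE filter must specify equality conditions on all primary key columns so that it pinpoints exactly one row.
> If the WHERE filter could match a batch of rows, the statement cannot be executed by default. To enable batch delete/update, contact Lindorm technical support to configure the system parameter.
**Notes on batch operations**:
- Atomicity is not guaranteed for batch delete/update
- Batch operations query first and then delete/update, so make sure the WHERE condition hits an index efficiently wherever possible
- Large numbers of deletes affect the performance of some query scenarios; where a TTL can be set, prefer letting data expire through TTL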
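
Because DELETE and UPDATE must pin down exactly one row, client code can verify that every primary key column is supplied before it builds the statement. A minimal sketch under that assumption follows; `build_delete` is a hypothetical helper, and the `?` placeholders follow the prepared-statement style used in the driver examples.

```python
from typing import Dict, List, Sequence, Tuple


def build_delete(table: str, primary_key: Sequence[str],
                 key_values: Dict[str, object]) -> Tuple[str, List[object]]:
    """Build a parameterized single-row DELETE; reject incomplete or extra key columns."""
    missing = [col for col in primary_key if col not in key_values]
    extra = [col for col in key_values if col not in primary_key]
    if missing or extra:
        raise ValueError(f"missing key columns {missing}, unexpected columns {extra}")
    where = " AND ".join(f"{col} = ?" for col in primary_key)
    params = [key_values[col] for col in primary_key]
    return f"DELETE FROM {table} WHERE {where}", params


sql, params = build_delete(
    "order_detail", ["order_id", "item_seq"], {"order_id": "o001", "item_seq": 1}
)
print(sql)     # DELETE FROM order_detail WHERE order_id = ? AND item_seq = ?
print(params)  # ['o001', 1]
```

Rejecting an incomplete key on the client gives a clearer error than waiting for the server to refuse a statement that could match several rows.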
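### SELECT Restrictions
- JOIN is not supported (INNER JOIN / LEFT JOIN / RIGHT JOIN all fail with `JOIN is not allowed in Lindorm SQL`)
- UNION and UNION ALL are not supported
- INTERSECT and MINUS/EXCEPT are not supported (MINUS raises a syntax error; INTERSECT/EXCEPT parse without error but trigger inefficient-query interception at runtime, so they are unusable in practice)
### Subquery Support
Lindorm SQL offers limited subquery support, depending on the form of the subquery and on whether an index exists:
| Subquery form | No index | With secondary index | Notes |
|-----------|--------|-----------|------|
| Derived table (FROM clause) | ✅ Supported | ✅ Supported | `SELECT * FROM (SELECT ...) AS t` |
| Derived table + WHERE | ✅ Supported | ✅ Supported | The outer WHERE filters the derived table result |
| Nested derived tables | ✅ Supported | ✅ Supported | Multiple levels of FROM subqueries |
| Scalar subquery (SELECT column) | ❌ Blocked as inefficient | ✅ Supported | `(SELECT COUNT(*) FROM ...) AS cnt` |
| WHERE IN subquery | ❌ Blocked as inefficient | ✅ Supported | The filter column in the subquery must be indexed |
| WHERE EXISTS subquery | ❌ Blocked as inefficient | ✅ Supported | The filter column in the subquery must be indexed |
> **Key point**: WHERE IN / EXISTS / scalar subqueries fail not because the syntax is unsupported, but because without an index they cause a full table scan that the **inefficient-query interception** mechanism rejects. They run normally once a secondary index is created.
**Official documentation**: https://help.aliyun.com/zh/lindorm/user-guide/compatibility-comparison-between-lindorm-sql-and-mysql
### Window Function Support
⚠️ **Limited support**: The official compatibility documentation marks window functions as "not yet supported". The syntax does not raise errors, but there may be **correctness or stability risks** (server-side computation is expensive). In testing, ROW_NUMBER/RANK/DENSE_RANK/LEAD/SUM OVER/AVG OVER ran normally on the current version, while LAG hit a parser bug on older versions (`LAG(...) AS alias` raises an error). **Use with caution** in production, and prefer the compute engine (OLAP) for window calculations.
| Window function | Description | Example |
|---------|------|------|
| `ROW_NUMBER()` | Row number | `ROW_NUMBER() OVER (PARTITION BY col ORDER BY col)` |
| `RANK()` | Rank (ties share a rank, with gaps) | `RANK() OVER (ORDER BY col DESC)` |
| `DENSE_RANK()` | Rank (ties share a rank, no gaps) | `DENSE_RANK() OVER (ORDER BY col)` |
| `SUM() OVER` | Running sum | `SUM(amount) OVER (PARTITION BY user_id)` |
| `AVG() OVER` | Running average | `AVG(score) OVER (PARTITION BY class_id)` |
| `LEAD(col, offset)` | Following row | `LEAD(amount, 1) OVER (ORDER BY id)` |
| `LAG(col, offset)` | Preceding row | `LAG(amount, 1) OVER (ORDER BY id)` |
### Inefficient-Query Interception
By default, Lindorm blocks queries that could cause a full table scan and reports `DoNotRetryIOException: Detect inefficient query`.
**Trigger**: the columns in the WHERE condition have no index, and the data cannot be located through the primary key.
**Solutions**:
1. **Create a secondary index** (recommended): create a secondary index on the columns in the WHERE condition
   ```sql
   CREATE INDEX idx_region ON orders (region) INCLUDE (user_name, amount);
   -- After this, WHERE region = '华东' can use the index
   ```
2. **Use a HINT to allow a full table scan** (use with care; it may affect performance):
   ```sql
   SELECT /*+ _l_allow_filtering_ */ * FROM orders WHERE region = '华东';
   ```
3. **Add a primary key range condition**:
   ```sql
   SELECT * FROM orders WHERE id >= 1 AND id < 100 AND region = '华东';
   ```
> **When an index takes effect**: after creating an index, wait for the build to finish before it takes effect: about 10 seconds for a secondary index (KV) and about 30 seconds for a search index (SEARCH). A query issued immediately may still report the inefficient-query error.
> Choosing an index type: a secondary index (KV) supports equality, range, and prefix LIKE queries; suffix matching (`%xxx`) requires a search index (SEARCH). See [table-design.md](table-design.md).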
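### LIKE and Range Queries
A secondary index (KV) supports range queries and LIKE prefix matching:
| Query type | Secondary index (KV) | Search index (SEARCH) |
|---------|-------------|-----------------|
| Equality `=` | ✅ | ✅ |
| IN | ✅ | ✅ |
| Range `> < >= <=` | ✅ | ✅ |
| BETWEEN | ✅ | ✅ |
| LIKE prefix `'xxx%'` | ✅ | ✅ |
| LIKE single character `'xxx_'` | ✅ | ✅ |
| LIKE suffix `'%xxx'` | ❌ | ✅ |
| LIKE contains `'%xxx%'` | ❌ | ✅ |
| Multi-dimensional query | ❌ | ✅ |
```sql
-- Secondary index: supports equality, range, and LIKE prefix on the indexed column
CREATE INDEX idx_amount ON orders (amount) INCLUDE (user_name, product);
-- After this, the following conditions on amount can use the index:
-- WHERE amount = 3999
-- WHERE amount > 1000
-- WHERE amount BETWEEN 1000 AND 5000
-- WHERE product LIKE '手%' (product is only an INCLUDE column here; a prefix LIKE on it presumably needs its own index keyed on product)
-- Search index: supports suffix matching, multi-dimensional queries, and tokenized queries
-- ⚠️ The search index feature must first be enabled in the console, otherwise SERVER INTERNAL ERROR is reported
CREATE INDEX idx_search USING SEARCH ON orders (region, product, amount);
-- Supported: WHERE product LIKE '%脑%'
-- Supported: WHERE region = '华东' AND amount > 1000
```
### Tokenized Queries (MATCH AGAINST)
A search index supports tokenized queries through the `MATCH ... AGAINST` syntax:
```sql
-- Create a search index with an analyzer
CREATE INDEX idx_text USING SEARCH ON articles (
    content(type=text, analyzer=ik)
);
-- Tokenized query: match records containing "功能介绍"
SELECT * FROM articles WHERE MATCH(content) AGAINST('功能介绍');
-- Matches records containing "功能", "介绍", or "功能介绍"
```
**Supported analyzers**: `standard` (default), `ik` (recommended for Chinese), `english`, `whitespace`, `comma`
> For details, see [table-design.md](table-design.md#搜索索引开通条件)
### ALTER TABLE Restrictions
- Adding and dropping columns is supported, but renaming columns is not
- Changing a column definition (including type, precision, and default value) is not supported
- Adding or removing indexes through ALTER TABLE is not supported
- Changing a table's primary key is not supported
### Other Unsupported Syntax
- RENAME TABLE
- REPLACE
- SELECT ... FOR SHARE
- Explicit transaction syntax (START TRANSACTION, COMMIT, ROLLBACK)
- CREATE TABLE AS SELECT
- Table import/export syntax (IMPORT, LOAD)
- EXPLAIN ANALYZE
- FOREIGN KEY
- Unique indexes (UNIQUE INDEX)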
---
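## Troubleshooting Unexpected Query Results
The storage model of the Lindorm Wide Table Engine is built on an LSM-Tree. Written data is visible immediately; it never becomes visible only some time after the write. If query results are not what you expect, check the following causes.
### 1. The data was not written successfully, or the query was issued before the write
A problem in the write path can delay writes or prevent data from being written. Add a HINT to the query so that the result returns the write timestamp of the data, then use that timestamp to determine whether the query ran before the write.
### 2. A STRING field contains an abnormal terminator or invisible characters
Invisible characters in a STRING field can cause unexpected query results.
**How to check**: use a range query to confirm whether such a problem exists, for example `WHERE orderID > '1000' LIMIT 1`.
> Lindorm does not support terminator characters in the middle of a STRING field (a terminator at the end is normal).
### 3. A column name in the query condition is wrong
- **Wrong column-name case**: Lindorm column names are case-sensitive
- **Column family not specified**: if the multi-family feature is used, the family must be specified in the query condition, for example `WHERE meta:column1=xxx`
### 4. The table has a TTL and the data had expired at query time
TTL is measured in seconds (s), whereas timestamps are measured in milliseconds (ms).
**Common problems**:
- Data is written with an early timestamp whose distance from the current time exceeds the TTL, so it may be cleaned up at write time
- If timestamps do not follow time semantics and custom version numbers are used instead (for example small numbers such as 1, 2, 3, 4), the data is very likely to be expired and cleaned up
- Large custom timestamps or version numbers (such as microsecond or nanosecond timestamps) can prevent data from expiring normally
### 5. A Cell TTL is set and the data had expired at query time
Cell TTL is measured in milliseconds (ms). If a Cell TTL is set on a KV, its expiration time is `min{expiration time set on the Cell, expiration time from the table property}`.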
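
The unit mismatch described in items 4 and 5 is easy to reproduce on paper. The sketch below is a simplified client-side model, not Lindorm's actual expiry logic: it assumes a value expires once its age exceeds the smaller of the table TTL (seconds) and the Cell TTL (milliseconds). Both function names are hypothetical.

```python
from typing import Optional


def effective_ttl_ms(table_ttl_seconds: Optional[int],
                     cell_ttl_ms: Optional[int]) -> Optional[int]:
    """Smaller of the table TTL (seconds) and the Cell TTL (ms), in milliseconds."""
    candidates = []
    if table_ttl_seconds is not None:
        candidates.append(table_ttl_seconds * 1000)  # seconds -> milliseconds
    if cell_ttl_ms is not None:
        candidates.append(cell_ttl_ms)
    return min(candidates) if candidates else None


def is_expired(write_ts_ms: int, now_ms: int,
               table_ttl_seconds: Optional[int] = None,
               cell_ttl_ms: Optional[int] = None) -> bool:
    """True if the value's age exceeds the effective TTL (simplified model)."""
    ttl = effective_ttl_ms(table_ttl_seconds, cell_ttl_ms)
    return ttl is not None and now_ms - write_ts_ms > ttl


now_ms = 1704067200000  # 2024-01-01 00:00:00 UTC in milliseconds
print(is_expired(3, now_ms, table_ttl_seconds=86400))                 # True: version number used as timestamp
print(is_expired(1704067200000000, now_ms, table_ttl_seconds=86400))  # False: microsecond timestamp lies far in the future
print(effective_ttl_ms(86400, 3600000))                               # 3600000
```

The first two calls mirror the common problems listed under item 4: a small version number looks decades old, while a microsecond timestamp looks like it was written far in the future and so does not expire.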
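### 6. The timestamp of a delete request is set improperly
A delete request can carry a timestamp/version number, which means it deletes the data of that row/column written before that time/version.
- If the timestamp of the delete request is smaller than the write timestamp/version number of the data, the row is not deleted
- If the delete request carries a large timestamp/version number, it stays in effect after it is submitted, and data written later may be deleted immediately
> Setting a delete timestamp is not supported through SQL access.
### 7. The VERSIONS table property is set to 0
A VERSIONS value of 0 means the table retains no data: anything written is deleted and cannot be queried.
**Solution**: drop and recreate the table, or change VERSIONS to a value greater than or equal to 1.
### 8. A table whose property is IMMUTABLE has been updated
IMMUTABLE means the table supports whole-row writes only (that is, a row's data is written by a single UPSERT statement) and rows cannot be updated or deleted.
---
## SQL Usage Notes
### SELECT * Restrictions on Dynamic-Column Tables
A table with dynamic columns enabled may contain a large number of dynamic columns, and its schema is not fixed. A full table scan on such a table consumes heavy IO.
**Solution**: add a LIMIT clause to the SELECT statement to cap the number of returned rows, for example `SELECT * FROM test LIMIT 10`.
### Units of Common Table Properties
| Property | Unit | Description |
|------|------|------|
| TTL | Seconds (s) | Data retention period |
| COMPACTION_MAJOR_PERIOD | Milliseconds (ms) | Major Compaction period |
| Timestamp | Milliseconds (ms) | Data version time |
| Cell TTL | Milliseconds (ms) | Expiration time of a single KV |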
---
## HASH 打散注意事项
主键 HASH 打散功能通过 HASH 函数将数据分散到不同的分片(Region),避免数据倾斜和负载不均等问题。
### DDL 限制
- HASH 函数表达式必须放在最前面
- 错误:`PRIMARY KEY(p1, hash32(p1), p2)`
- 正确:`PRIMARY KEY(hash32(p1), p1, p2)`
- 在主键列或索引中对某列使用 HASH 算法时,必须指定该列为主键列或索引列
- 已指定 HASH 算法的主键列不支持修改
- 使用主键 HASH 打散功能后,不支持使用 bulkload 方式导入数据
### DML 限制
**写入数据**:无需在 SQL 语句中添加 HASH 相关参数,系统自动生成并填充 HASH 值。
**查询数据**:
- 必须指定所有已使用 HASH 算法的主键列的值,否则系统无法计算 HASH 值,导致查询全表
- 对于使用了 HASH 算法的主键列,查询条件必须为等值查询,不支持范围查询
```sql
-- 推荐的使用方式
SELECT * FROM t1 WHERE p1=1 AND p2=1;
-- 不推荐:未指定主键列 p1 的值,会导致查询全表
SELECT * FROM t1 WHERE p2=1;
-- 错误:不支持 HASH 列的范围查询
SELECT * FROM t1 WHERE p2=1 AND p1>2 AND p1<8;
```
---
## 冷热分离注意事项
### 数据何时进入冷存储
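Lindorm uses the Compaction mechanism to archive cold data from hot storage to cold storage asynchronously. The interval at which the system triggers archiving is determined as follows:
- By default, half of the hot/cold boundary
- At least 1 day
- At most half of the Major Compaction period (the period defaults to 20 days)
For example, with a hot/cold boundary of 3 days, a Compaction archive task is triggered automatically every 1.5 days by default.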
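The trigger interval described above amounts to a clamp. The sketch below is a Python illustration under stated assumptions: it reads "20 days" as the default Major Compaction period (so the upper bound is 10 days), and the function name `archive_interval_seconds` is hypothetical.

```python
DAY = 86_400  # seconds in one day


def archive_interval_seconds(chs_seconds: int, major_period_seconds: int = 20 * DAY) -> float:
    """Estimate how often the cold-archive Compaction is triggered.

    Half of the hot/cold boundary, but not less than 1 day and not more than
    half of the Major Compaction period (assumed default: 20 days).
    """
    return min(max(chs_seconds / 2, DAY), major_period_seconds / 2)


print(archive_interval_seconds(3 * DAY) / DAY)   # boundary of 3 days  -> 1.5
print(archive_interval_seconds(1 * DAY) / DAY)   # boundary of 1 day   -> 1.0 (lower bound)
print(archive_interval_seconds(60 * DAY) / DAY)  # boundary of 60 days -> 10.0 (upper bound)
```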
### 手动触发 Compaction
可以通过 `major_compact 'tableName'` 手动触发 Compaction。
> `major_compact` 命令会加重 IO 负载,不建议频繁使用。
如果执行 Compaction 后数据还未进入冷存储,可能是数据还未写入磁盘,请先执行 flush 操作。
### 自定义时间列冷热分离
- 如果一行数据未写入自定义时间列,该行数据会被保留在热存储区,不会冷热分离
- 如果更新的冷数据不是自定义时间列,更新后的数据依旧是冷数据
- 如果更新的是自定义时间列中的数据,需要根据新写入的时间内容来重新划分冷热数据
### 按时间戳冷热分离
由于更新后的数据重新记录了时间戳,因此冷数据更新后变为热数据。
### HOT_ONLY 查询注意事项
查询语句可以通过设置 `HOT_ONLY` / `_l_hot_only_` 仅查询热数据。但由于数据归档至冷存储的操作是周期性触发的,部分冷数据可能会滞留在热存储,导致查询结果中包含冷数据。
**解决方案**:在查询条件中添加热数据时间范围:
```sql
SELECT /*+ _l_hot_only_(true), _l_ts_min_(1000), _l_ts_max_(2001) */ * FROM test WHERE p1>1;
```
### 索引表与主表冷数据不一致
主表和索引表的冷数据归档过程是独立的,且归档操作是周期性触发的,导致主表和索引表滞留在热存储的数据不一致,进而出现查询到的冷数据不一致的现象。
**解决方案**:在查询条件中添加热数据的时间范围。
### 开启冷热分离后立即触发 Compaction
当前时间减去最旧的文件的生成时间大于冷数据归档周期时,则会触发冷数据转存。
---
## Compaction 与存储管理
### Compaction 的作用
- 清理过期数据(TTL)
- 清理删除操作的遗留标记(deleteMarker)
- 归档冷热数据
- 压缩数据减少空间占用
### Compaction 自动触发周期
系统默认的自动触发周期为 20 天。在 TTL 场景下,默认周期为 `min(TTL 值, 20 天)`。
**修改触发周期**:
```sql
-- 将自动触发周期修改为 2 天(单位为毫秒)
ALTER TABLE <tableName> SET 'COMPACTION_MAJOR_PERIOD'='172800000';
```
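Because `COMPACTION_MAJOR_PERIOD` is given in milliseconds, it is easy to be off by a factor of 1000. The following Python sketch shows the conversion and the default rule `min(TTL, 20 days)`; the helper names are illustrative assumptions, and the generated statement only mirrors the `ALTER TABLE` template shown above.

```python
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000


def days_to_ms(days: float) -> int:
    """Convert a period in days to the millisecond value the property expects."""
    return int(days * MS_PER_DAY)


def default_major_period_days(ttl_days: Optional[float] = None) -> float:
    """Default trigger period: 20 days, or min(TTL, 20 days) when a TTL is set."""
    return 20 if ttl_days is None else min(ttl_days, 20)


def alter_period_sql(table: str, days: float) -> str:
    """Build the ALTER TABLE statement for a given period in days."""
    return f"ALTER TABLE {table} SET 'COMPACTION_MAJOR_PERIOD'='{days_to_ms(days)}';"


print(days_to_ms(2))                    # 172800000
print(default_major_period_days(7))     # 7
print(alter_period_sql("my_table", 2))  # my_table is a placeholder table name
```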
### Compaction 对业务的影响
Compaction 操作处理数据时会消耗 CPU。CPU 资源充足时对业务没有太大影响,反而有助于提升读性能、释放存储空间。
**监控 Compaction 状态**:通过实例监控查看宽表引擎指标 > 集群负载 > Compaction 队列长度。如果该数值持续增长或始终持平,说明可能存在大量排队任务。
**优化建议**:
- 如果 CPU 利用率小于 40%,宽表 2.6.5 以上版本支持自动根据 load 调整参数,直接升级小版本即可
- 如果 CPU 利用率大于 40%,建议增加宽表引擎节点数量
### 已设置 TTL 但存储仍在上涨
通过实例监控查看 Compaction 队列长度,确认是否存在任务积压情况,如果积压较多就会出现数据清理滞后的现象。
如果队列无任务且读写负载较低,可以手动执行 Compaction 或调整 Major Compaction 周期。
### 磁盘容量上限处理
- 扩容热存储容量
- 通过 `DROP TABLE` 直接删除无用的表,立即释放存储空间
- 通过 `TRUNCATE TABLE` 清空表中数据,立即释放存储空间
> 请勿使用 DELETE 直接删除数据。Lindorm 的删除操作是直接写入删除标记(Delete Marker),等到下次触发 Compaction 操作才会被彻底清理。
### 磁盘容量上限后无法删除数据
磁盘达到容量上限后系统将禁止所有数据的写入,包括删除标记。删除标记无法写入将导致需删除的数据无法通过 Compaction 操作清理。
### 压缩算法和编码方式
可以将表的压缩算法 COMPRESSION 设置为 ZSTD,编码方式 DATA_BLOCK_ENCODING 设置为 INDEX,并执行 Major Compact 以减少存储空间。
```sql
ALTER TABLE <tablename> SET 'COMPRESSION' = 'ZSTD','DATA_BLOCK_ENCODING' = 'INDEX';
ALTER TABLE <tablename> COMPACT;
```
> 如果您是通过 SQL 方式创建的表,则默认已设置,无需重复设置。
FILE:references/01-dev/table-design.md
# Lindorm SQL 建表语句指南
本文档提供 Lindorm 宽表建表语句的完整指南,包括语法、最佳实践和高级特性。
## 数据类型
### 数据类型查找规则
当使用 Lindorm 宽表引擎建表时,需要了解其支持的数据类型。Lindorm 宽表引擎支持基础数据类型、JSON 数据类型和空间数据类型。
1. 当用户提到的数据类型不在 Lindorm 宽表引擎支持的数据类型范围内时,需要提示用户 Lindorm 宽表引擎支持的数据类型,并提供相关文档链接。
2. 数据类型优先从下文的表格中选择。
3. 当数据类型不在下文表格中时,需要查找官方文档:
- **基础数据类型**: https://help.aliyun.com/zh/lindorm/developer-reference/basic-data-types
- **JSON 数据类型**: https://help.aliyun.com/zh/lindorm/developer-reference/json-data-type
- **空间数据类型**: https://help.aliyun.com/zh/lindorm/developer-reference/spatial-data-type-1
4. 如果数据类型仍然不在支持范围内,需要提示用户不支持该数据类型或联系技术支持。
### 基本数据类型
| 类型 | 字节长度 | 说明 | Java 映射 | 取值范围/精度 |
|------|----------|------|-----------|----------------|
| BOOLEAN | 1 字节 | 布尔型 | java.lang.Boolean | true / false |
| TINYINT | 1 字节 | 8 位精确数值 | java.lang.Byte | -128 ~ 127 (有符号) |
| SMALLINT | 2 字节 | 16 位精确数值 | java.lang.Short | -32768 ~ 32767 |
| INTEGER | 4 字节 | 32 位精确数值 | java.lang.Integer | -2^31 ~ 2^31-1 |
| BIGINT | 8 字节 | 64 位精确数值 | java.lang.Long | -2^63 ~ 2^63-1 |
| FLOAT | 4 字节 | 单精度浮点数 | java.lang.Float | 约 7 位有效数字 |
| DOUBLE | 8 字节 | 双精度浮点数 | java.lang.Double | 约 15-17 位有效数字,科学计数法表示 |
| DECIMAL(precision, scale) | 变长 | 高精度十进制 | java.math.BigDecimal | precision: [1,38], scale: [0,precision] |
| VARCHAR | 变长 | 变长字符串 | java.lang.String | 最大 2MB,支持中文 |
| CHAR(n) | n 字节 | 定长字符串 | java.lang.String | 固定长度 n,不足自动补空格 |
| BINARY(n) | n 字节 | 定长二进制 | byte[] | 固定 n 字节,不足补 0,超出截断 |
| VARBINARY | 变长 | 变长二进制 | byte[] | 作为主键时只能是最后一列 |
| DATE | 4 字节 | 日期(仅日期无时间) | java.sql.Date | YYYY-MM-DD,**不推荐**(时区转换易出错) |
| TIME | 4 字节 | 时间 | java.sql.Time | HH:mm:ss,受时区影响 |
| TIMESTAMP | 8 字节 | 时间戳 | java.sql.Timestamp | 0001-01-01 00:00:00 ~ 9999-12-31 23:59:59 |
**重要说明**:
- **TIMESTAMP**: Lindorm 支持的最大值为 `9999-12-31 23:59:59`,而 MySQL 仅支持到 `2038-01-19 03:14:07`
- **DECIMAL**: 适用于金额等高精度场景,监控等精度要求不高的场景推荐使用 FLOAT/DOUBLE
- **DATE/TIME**: 在时区转换过程中易出现日期错误,建议避免使用
### JSON 数据类型
**适用引擎**: 仅宽表引擎支持(要求版本 2.6.2+)
**限制**: 主键列不支持 JSON 类型
**建表语法**:
```sql
-- 创建表时指定 JSON 列
CREATE TABLE tb (
p1 INT,
c1 VARCHAR,
c2 JSON,
PRIMARY KEY(p1)
);
-- 修改表添加 JSON 列
ALTER TABLE tb ADD c3 JSON;
```
### 类型使用建议
- **主键列**: 推荐使用 VARCHAR 或 BIGINT,VARBINARY 只能作为最后一列主键
- **时间戳**: 优先使用 TIMESTAMP(范围更大),或使用 BIGINT (毫秒)
- **大文本**: 使用 VARCHAR,最大 2MB
- **二进制**: 使用 BINARY(n) 或 VARBINARY
- **高精度数值**: 使用 DECIMAL(如金额)
- **一般数值**: 使用 INTEGER/BIGINT/FLOAT/DOUBLE
- **避免使用**: DATE、TIME(时区问题)
## 基础语法
**注意事项**:
- 只能基于本 Skill 文档中明确记载的内容回答用户问题,严禁推测、联想或生成文档中不存在的 SQL 语法、参数、功能或配置。
- 如果文档中没有相关信息,必须明确告知用户“当前文档未收录此内容”,并引导用户查阅官方文档。
- 生成的代码示例必须基于文档中的模板,参数和语法必须与文档一致。
### CREATE TABLE 语法
```sql
CREATE TABLE [ IF NOT EXISTS ] table_identifier
'('
column_definition
( ',' column_definition )*
',' PRIMARY KEY '(' primary_key ')'
( ',' {KEY|INDEX} [index_identifier]
[ USING index_method_definition ]
[ INCLUDE column_identifier ( ',' column_identifier )* ]
[ WITH index_options ]
)*
')'
[ WITH table_options ]
```
**语法要素说明:**
| 语法要素 | 说明 |
|---------|-----------------------------------------------------------------------|
| column_definition | `column_identifier data_type [ NOT NULL ] [ DEFAULT default_value ] ` |
| primary_key | `column_identifier [ ASC \| DESC ] ( ',' column_identifier [ ASC \| DESC ] )*` |
| index_method_definition | `{ KV \| SEARCH }` |
| index_options | `'(' option_definition (',' option_definition )*')'` |
| table_options | `'(' option_definition (',' option_definition )* ')'` |
| option_definition | `option_identifier '=' string_literal` |
### DEFAULT 子句
列定义支持通过 DEFAULT 子句设置默认值。
**限制:**
- 默认值只能是列类型的常量表达式或无参函数 `NOW()`
- 不可以设置为 NULL
**示例:**
```sql
CREATE TABLE orders (
id VARCHAR NOT NULL,
status INTEGER DEFAULT -1,
create_time TIMESTAMP DEFAULT NOW(),
remark VARCHAR DEFAULT 'pending',
PRIMARY KEY(id)
);
```
### 命名规范
**表名(table_identifier):**
- 可包含数字、大小写英文字符、半角句号(.)、中划线(-)和下划线(_)
- 不能以半角句号(.)或中划线(-)开头
- 长度为 1~255 字符
**列名(column_identifier):**
- 可包含数字、大小写英文字符、半角句号(.)、中划线(-)和下划线(_)
- 不允许使用系统保留关键字
- 长度不能超过 255 字节
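The naming rules above can be pre-checked on the client before a DDL statement is sent. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, only the character-set and length rules listed above are checked, and the reserved-keyword check for column names is omitted because the keyword list is not given here.

```python
import re

# Digits, letters, period, hyphen, underscore; must not start with '.' or '-'; 1-255 chars.
_TABLE_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]{0,254}")
# Same character set; all characters are ASCII, so 255 characters equal 255 bytes.
_COLUMN_NAME = re.compile(r"[A-Za-z0-9._-]{1,255}")


def is_valid_table_name(name: str) -> bool:
    return _TABLE_NAME.fullmatch(name) is not None


def is_valid_column_name(name: str) -> bool:
    # Reserved keywords are NOT checked in this sketch.
    return _COLUMN_NAME.fullmatch(name) is not None


print(is_valid_table_name("orders"))       # True
print(is_valid_table_name(".orders"))      # False: starts with a period
print(is_valid_table_name("a" * 256))      # False: longer than 255 characters
print(is_valid_column_name("order time"))  # False: contains a space
```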
### 基础建表示例
```sql
-- 基础表
CREATE TABLE orders (
channel VARCHAR NOT NULL,
id VARCHAR NOT NULL,
ts TIMESTAMP NOT NULL,
status VARCHAR,
location VARCHAR,
PRIMARY KEY(channel, id, ts)
);
-- 带表属性的表
CREATE TABLE orders (
channel VARCHAR NOT NULL,
id VARCHAR NOT NULL,
ts TIMESTAMP NOT NULL,
status VARCHAR,
PRIMARY KEY(channel, id, ts)
) WITH (
COMPRESSION = 'ZSTD',
TTL = '86400'
);
```
## 主键设计
### 主键特性
- **不可修改**:主键在建表时确定,建表后不可增加、删除、更换顺序或修改数据类型
- **唯一性**:所有主键列共同组成 RowKey,在一张表里是唯一的
- **聚簇索引**:数据按照主键顺序存储,遵循最左匹配原则
### 主键限制
- 单个主键列的最大长度为 2 KB
- 所有主键列的长度之和不能超过 30 KB
- 单个非主键列的最大长度不能超过 2 MB
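The primary key limits can be validated before writing. The following Python sketch is illustrative only: it assumes lengths are measured on the encoded bytes of each value and that 1 KB means 1024 bytes; `primary_key_problems` is a hypothetical helper name.

```python
MAX_PK_COLUMN_BYTES = 2 * 1024   # single primary key column: 2 KB
MAX_PK_TOTAL_BYTES = 30 * 1024   # all primary key columns together: 30 KB


def primary_key_problems(values: dict) -> list:
    """Return a list of limit violations for one row's primary key values.

    `values` maps each primary key column name to its encoded bytes.
    """
    problems = [
        f"{name}: {len(raw)} bytes exceeds {MAX_PK_COLUMN_BYTES}"
        for name, raw in values.items()
        if len(raw) > MAX_PK_COLUMN_BYTES
    ]
    total = sum(len(raw) for raw in values.values())
    if total > MAX_PK_TOTAL_BYTES:
        problems.append(f"total: {total} bytes exceeds {MAX_PK_TOTAL_BYTES}")
    return problems


row = {"channel": "web".encode("utf-8"), "id": ("x" * 3000).encode("utf-8")}
print(primary_key_problems(row))  # reports that "id" is too long
```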
### 主键设计最佳实践
#### 避免热点问题
```sql
-- 错误示例:递增主键导致写入热点
CREATE TABLE logs (
timestamp BIGINT NOT NULL,
message VARCHAR,
PRIMARY KEY(timestamp)
);
-- 正确示例:使用 HASH 打散
CREATE TABLE logs (
timestamp BIGINT NOT NULL,
hostname VARCHAR NOT NULL,
message VARCHAR,
PRIMARY KEY(hash32(timestamp), timestamp, hostname)
);
```
#### 主键设计原则
1. **主键第一列尽量分散**:不建议使用相同前缀
2. **避免自增数据**:如时间戳列作为第一列
3. **避免枚举值**:如 order_type 作为第一列
4. **主键列数量**:建议控制在 1~3 个
5. **主键值长度**:建议尽量短小,使用固定长度类型
#### 常见场景设计
**日志/时序数据:**
```sql
-- 查询某机器某指标某段时间数据
CREATE TABLE logs (
hostname VARCHAR NOT NULL,
log_event VARCHAR NOT NULL,
timestamp BIGINT NOT NULL,
content VARCHAR,
PRIMARY KEY(hostname, log_event, timestamp)
);
-- 查询最新数据(倒序)
CREATE TABLE logs (
hostname VARCHAR NOT NULL,
log_event VARCHAR NOT NULL,
timestamp BIGINT NOT NULL,
content VARCHAR,
PRIMARY KEY(hostname, log_event, timestamp DESC)
);
-- 时间维度数据量大,使用分桶打散
CREATE TABLE logs (
bucket BIGINT NOT NULL,
timestamp BIGINT NOT NULL,
hostname VARCHAR NOT NULL,
log_event VARCHAR NOT NULL,
content VARCHAR,
PRIMARY KEY(bucket, timestamp, hostname, log_event)
);
-- 写入时:bucket = timestamp % numBuckets
```
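The bucket value from the last example is computed by the client at write time. The sketch below is a Python illustration; `NUM_BUCKETS = 16` and the function names are assumptions. Because `bucket` is the leading primary key column, a time-range read presumably has to be issued once per bucket and the results merged, which the second helper enumerates.

```python
NUM_BUCKETS = 16  # assumed bucket count; must stay fixed once data has been written


def bucket_for(timestamp_ms: int, num_buckets: int = NUM_BUCKETS) -> int:
    """bucket = timestamp % numBuckets, as in the comment of the DDL above."""
    return timestamp_ms % num_buckets


def range_scans(start_ms: int, end_ms: int, num_buckets: int = NUM_BUCKETS) -> list:
    """One (bucket, start, end) scan per bucket to cover a time range."""
    return [(bucket, start_ms, end_ms) for bucket in range(num_buckets)]


print(bucket_for(1_700_000_000_123))                                # 11
print(len(range_scans(1_700_000_000_000, 1_700_000_060_000)))       # 16
```

More buckets spread writes more evenly but multiply the number of scans needed for each range read.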
**交易数据:**
```sql
-- 按卖家查询
CREATE TABLE seller_orders (
seller_id VARCHAR NOT NULL,
timestamp BIGINT NOT NULL,
order_number VARCHAR NOT NULL,
amount BIGINT,
PRIMARY KEY(seller_id, timestamp, order_number)
);
-- 按买家查询
CREATE TABLE buyer_orders (
buyer_id VARCHAR NOT NULL,
timestamp BIGINT NOT NULL,
order_number VARCHAR NOT NULL,
amount BIGINT,
PRIMARY KEY(buyer_id, timestamp, order_number)
);
-- 按订单号查询
CREATE TABLE order_index (
order_number VARCHAR NOT NULL,
seller_id VARCHAR,
buyer_id VARCHAR,
PRIMARY KEY(order_number)
);
```
## HASH 主键打散
使用 HASH 函数将数据分散到不同分片,避免数据倾斜和热点问题。
### 支持的 HASH 算法
| 算法 | 说明 |
|------|------|
| hash8 | 8 位 HASH,存储消耗最小 |
| hash32 | 32 位 HASH,每对 keyValue 额外消耗 4 Bytes |
| hash64 | 64 位 HASH,存储消耗最大 |
### 使用示例
```sql
-- 对单个主键列使用 HASH
CREATE TABLE t1 (
p1 BIGINT,
p2 INTEGER,
c1 INTEGER,
c2 VARCHAR,
PRIMARY KEY(hash32(p1), p1, p2)
);
-- 对多个主键列使用 HASH
CREATE TABLE t2 (
p1 BIGINT,
p2 INTEGER,
c1 INTEGER,
c2 VARCHAR,
PRIMARY KEY(hash8(p1, p2), p1, p2)
);
```
### 注意事项
- HASH 函数表达式必须放在主键最前面
- 已使用 HASH 算法的主键列不支持修改
- 查询时必须指定所有已使用 HASH 算法的主键列的值
- HASH 列只支持等值查询,不支持范围查询
- 使用主键 HASH 打散后,不支持 bulkload 方式导入数据
## 表属性(WITH 子句)
### 常用表属性
| 属性 | 类型 | 说明 |
|------|------|------|
| COMPRESSION | STRING | 压缩算法:SNAPPY、ZSTD、LZ4。默认 ZSTD |
| TTL | INT | 数据有效期,单位秒。默认为空(不过期) |
| NUMREGIONS | INT | 预分区 Region 数 |
| STARTKEY / ENDKEY | 与主键第一列类型相同 | 预分区起止 Key |
| SPLITKEYS | 与主键第一列类型相同 | 预分区分裂点 |
| DYNAMIC_COLUMNS | STRING | 是否开启动态列,'true' 或 'false' |
| MUTABILITY | STRING | 索引写入模式:IMMUTABLE、MUTABLE_LATEST 等 |
| CONSISTENCY | STRING | 一致性级别:eventual(默认)、strong |
### 示例
```sql
-- 设置压缩和 TTL
CREATE TABLE logs (
id VARCHAR NOT NULL,
content VARCHAR,
PRIMARY KEY(id)
) WITH (
COMPRESSION = 'ZSTD',
TTL = '2592000' -- 30 天
);
-- 设置预分区
CREATE TABLE orders (
id VARCHAR NOT NULL,
amount BIGINT,
PRIMARY KEY(id)
) WITH (
NUMREGIONS = '16',
STARTKEY = 'a',
ENDKEY = 'z'
);
```
## 动态列
动态列允许在建表时未显式定义的列在运行时动态写入。
### 开启动态列
```sql
-- 建表时开启
CREATE TABLE t_dynamic (
p1 INT,
c1 INT,
c2 VARCHAR,
PRIMARY KEY(p1)
) WITH (DYNAMIC_COLUMNS = 'true');
-- 已有表开启
ALTER TABLE t_dynamic SET 'DYNAMIC_COLUMNS' = 'true';
```
### 写入动态列
动态列的数据类型均为 VARBINARY(字节数组)。
```sql
-- SQL 文本写入(值为 HexString)
UPSERT INTO t_dynamic (p1, c2, c3) VALUES (1, '1', '41');
-- 使用 x'' 语法指定 HexString(SQL 引擎 2.6.8+)
UPSERT INTO t_dynamic (p1, c4) VALUES (3, x'ef0011');
```
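Since dynamic column values are byte arrays written as HexString, the client has to convert between text and hex. A minimal Python sketch using only the standard library follows; the helper names are illustrative, and UTF-8 is an assumed encoding for text values.

```python
def to_hex_string(value: bytes) -> str:
    """Encode raw bytes as the HexString used in the UPSERT statements above."""
    return value.hex()


def from_hex_string(hex_string: str) -> bytes:
    """Decode a HexString read back from a dynamic column into raw bytes."""
    return bytes.fromhex(hex_string)


print(to_hex_string("A".encode("utf-8")))  # 41
print(from_hex_string("41"))               # b'A'
print(from_hex_string("ef0011"))           # b'\xef\x00\x11'
```

So the value `'41'` written to `c3` in the first statement is stored as the single byte `0x41`, not as the two-character string "41".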
### 查询动态列
```sql
-- 显式指定动态列
SELECT p1, c2, c3, c4 FROM t_dynamic WHERE p1 = 1;
-- 使用 SELECT *(必须添加 LIMIT)
SELECT * FROM t_dynamic LIMIT 10;
```
## 通配符列
通配符列实现多数据类型动态列写入,解决动态列仅支持 VARBINARY 的限制。
### 支持的通配符
| 通配符 | 说明 |
|--------|------|
| * | 匹配任意字符序列,包括空序列 |
| ? | 匹配任意单个字符 |
### 使用示例
```sql
-- 创建带通配符列的表
CREATE TABLE tb (
pk INTEGER,
c1 VARCHAR,
`c2*` BIGINT,
`c3*` VARCHAR,
PRIMARY KEY(pk)
) WITH (wildcard_column = 'c2*,c3*');
-- 写入数据
UPSERT INTO tb(pk, c1, c2, c21, c22, c31) VALUES (1, 'a1', 2, 21, 22, 'c3');
```
### 限制
- 通配符列不能作为主键
- SELECT * 查询必须添加 LIMIT
- 仅支持为通配符列创建搜索索引,不支持二级索引
- 不支持使用通配符列名进行数据查询,必须使用实际列名
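How an actual column name maps to a wildcard column can be approximated with shell-style matching. This Python sketch uses the standard `fnmatch` module as a stand-in and is not Lindorm's matcher: `fnmatch` also gives `[...]` a special meaning, which the table above does not mention. The mapping mirrors the `tb` example; `resolve_type` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Wildcard columns and their declared types, taken from the CREATE TABLE example above.
WILDCARD_TYPES = {"c2*": "BIGINT", "c3*": "VARCHAR"}


def resolve_type(column: str) -> Optional[str]:
    """Return the declared type of the first wildcard pattern matching `column`."""
    for pattern, data_type in WILDCARD_TYPES.items():
        if fnmatchcase(column, pattern):
            return data_type
    return None


for name in ("c2", "c21", "c22", "c31", "c1"):
    print(name, resolve_type(name))
```

`c2`, `c21`, and `c22` resolve to BIGINT and `c31` to VARCHAR, which is why the UPSERT example can write numbers to the first three and a string to the last; `c1` matches no pattern because it is an ordinary column.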
## 索引(建表时创建)
在 CREATE TABLE 语句中通过 KEY 或 INDEX 子句创建索引。
### 语法
```sql
CREATE TABLE table_name (
column_definitions,
PRIMARY KEY(pk_columns),
{KEY|INDEX} [index_name]
[ USING { KV | SEARCH } ]
[ INCLUDE (columns) ]
[ WITH (index_options) ]
);
```
### 索引类型
| 类型 | 关键字 | 说明 |
|------|--------|------|
| 二级索引 | KV(默认) | 适用于非主键匹配场景 |
| 搜索索引 | SEARCH | 适用于多维查询、分词、模糊查询场景 |
### 冗余列设置
在建表时创建索引,可通过 INCLUDE 或 WITH (INDEX_COVERED_TYPE) 设置冗余列:
```sql
-- 显式指定冗余列
CREATE TABLE sensor (
device_id VARCHAR NOT NULL,
region VARCHAR NOT NULL,
time TIMESTAMP NOT NULL,
temperature DOUBLE,
humidity BIGINT,
PRIMARY KEY(device_id, region, time),
KEY (temperature, time) INCLUDE (humidity)
);
-- 冗余所有已定义列
CREATE TABLE sensor (
device_id VARCHAR NOT NULL,
region VARCHAR NOT NULL,
time TIMESTAMP NOT NULL,
temperature DOUBLE,
humidity BIGINT,
PRIMARY KEY(device_id, region, time),
KEY (temperature, time) WITH (INDEX_COVERED_TYPE = 'COVERED_ALL_COLUMNS_IN_SCHEMA')
);
```
### 示例
```sql
-- 创建表同时创建二级索引
CREATE TABLE orders (
order_id VARCHAR NOT NULL,
user_id VARCHAR NOT NULL,
amount BIGINT,
PRIMARY KEY(order_id),
INDEX idx_user USING KV (user_id) INCLUDE (amount)
);
-- 创建表同时创建搜索索引
CREATE TABLE products (
id VARCHAR NOT NULL,
name VARCHAR,
description VARCHAR,
PRIMARY KEY(id),
INDEX idx_search USING SEARCH (name, description)
);
```
## CREATE INDEX 语法
单独创建索引的语法。
```sql
CREATE INDEX [IF NOT EXISTS] [index_name]
[ USING { KV | SEARCH | COLUMNAR } ]
ON table_name (index_key_expression)
[ INCLUDE (columns) ]
[ { ASYNC | SYNC } ]
[ WITH (index_options) ];
```
### 索引类型
| 参数 | 索引类型 | 说明 |
|------|----------|------|
| KV | 二级索引 | 默认类型,每表最多 3 个 |
| SEARCH | 搜索索引 | 全文搜索,每表最多 1 个 |
| COLUMNAR | 列存索引 | 分析计算,每表最多 1 个 |
### 列存索引开通条件
**旧版列存索引**(默认,已正式发布):
| 引擎 | 作用 |
|------|------|
| 宽表引擎 | 源数据存储 |
| LindormDFS | 文件存储(版本 >= 4.0.0) |
| 计算引擎 | 执行分析查询 |
**开通说明**:
1. **开通 Lindorm 计算引擎**
2. **购买计算资源**(为宽表引擎到计算引擎的数据同步购买计算资源)
**控制台开通路径**:
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/)
2. 在实例列表页,单击**目标实例ID**
3. 在左侧导航栏,选择**宽表引擎**
4. 单击**列存索引**页签,并单击**立即开通**
5. 在弹出的对话框中单击**确定**
**创建列存索引示例**(旧版,已正式发布):
```sql
-- 创建列存索引(旧版,同步延迟约15分钟)
CREATE INDEX idx_columnar USING COLUMNAR ON my_table(
pk0, pk1, pt_d, col0, col1
)
PARTITION BY ENUMERABLE (pt_d, bucket(16, pk0))
WITH (
`lindorm_columnar.user.index.database` = 'my_index_db', -- 列存索引所在库名
`lindorm_columnar.user.index.table` = 'my_index_tbl' -- 列存索引表名
);
```
**查询列存索引**:
```sql
-- 使用 HINT 指定走列存索引查询
SELECT /*+ _use_ldps_(cg_name), _columnar_index_ */
pk1, SUM(col0)
FROM my_db.my_table
WHERE pt_d = '2024-01-01'
GROUP BY pk1;
```
> **注意**:`cg_name` 为计算引擎 OLAP 资源组的名称。
---
### 新版列存索引(实时同步)
> ⚠️ **如需秒级同步延迟**,可使用新版列存索引,目前处于**邀测阶段**。
**新旧版对比**:
| 特性 | 旧版列存索引 | 新版列存索引 |
|------|-------------|-------------|
| **同步延迟** | 15 分钟 | **实时(秒级)** |
| **数据新鲜度** | 延迟较高 | 近实时 |
| **适用场景** | 离线分析 | 实时分析 |
| **版本状态** | 已正式发布 | 邀测阶段 |
**申请方式**:
- 联系 Lindorm 技术支持申请使用(可在阿里云官网提交工单或联系您的客户经理)
**新版引擎版本要求**:
| 引擎 | 版本要求 | 作用 |
|------|---------|------|
| 宽表引擎 | >= 2.8.6 | 源数据存储 |
| LTS | >= 3.9.1 | 日志实时订阅 |
| 列存引擎 | >= 3.10.15 | 索引数据存储 |
| 计算引擎 | - | 执行分析查询 |
> **重要区别**:新版列存索引依赖**列存引擎**存储索引数据,支持秒级同步;旧版不依赖列存引擎,同步延迟约15分钟。
**新版创建语法**:
新增 `lindorm_columnar.user.index.type = 'LCE'` 属性:
```sql
-- 创建列存索引(新版,秒级同步)
CREATE INDEX idx_columnar USING COLUMNAR ON my_table(
pk0, pk1, pt_d, col0, col1
)
PARTITION BY ENUMERABLE (pt_d, bucket(16, pk0))
WITH (
`lindorm_columnar.user.index.database` = 'my_index_db', -- 列存索引所在库名
`lindorm_columnar.user.index.table` = 'my_index_tbl', -- 列存索引表名
`lindorm_columnar.user.index.type` = 'LCE' -- 新版必填,指定走新版链路
);
```
> **重要**:`lindorm_columnar.user.index.type = 'LCE'` 为新版必填项,缺失则走旧版链路(15分钟延迟)。
**官方文档**:[列存索引新版](https://help.aliyun.com/zh/lindorm/user-guide/column-store-index-new-version)
### 搜索索引开通条件
> ⚠️ **重要**:搜索索引需要先在控制台开通后才能使用,否则创建索引会报 `SERVER INTERNAL ERROR`。
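**Steps to enable**:
1. Log in to the [Lindorm console](https://lindorm.console.aliyun.com/)
2. On the instance list page, click the **target instance ID**
3. In the left navigation pane, click **宽表引擎** (Wide Table Engine) → **搜索索引** (Search Index)
4. Click **立即开通** (Enable Now)
5. Configure the following parameters: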
| 参数 | 说明 | 建议 |
|------|------|------|
| 搜索节点规格 | 搜索引擎处理能力 | 16核64GB(QPS 500+,写入 TPS 50000+) |
| 搜索节点数量 | 搜索节点个数 | 至少 2 个(避免单点故障) |
| LTS 数据同步规格 | 数据同步服务 | 4核16GB |
| LTS 节点数量 | 同步节点个数 | 建议 2 个 |
| 存储空间 | 搜索引擎存储大小 | 按数据量评估 |
> **依赖说明**:搜索索引依赖 LTS 数据同步服务,开通时会同时开通 LTS。如已开通备份恢复或数据订阅功能,则无需重复开通 LTS。
**官方文档**:https://help.aliyun.com/zh/lindorm/user-guide/enable-the-search-index-feature
#### 搜索索引适用场景
| 场景 | 说明 | 示例 |
|------|------|------|
| 多维组合查询 | 任意索引列随机组合查询 | `WHERE c1=? AND c2=?` 或 `WHERE c3=?` |
| 分词查询 | 文本分词匹配 | `MATCH(content) AGAINST('关键词')` |
| 模糊查询 | LIKE 后缀/包含 | `WHERE name LIKE '%关键词%'` |
| 聚合分析 | COUNT/SUM/MIN/MAX/AVG | `SELECT COUNT(*) GROUP BY` |
| 排序分页 | 任意索引列排序 | `ORDER BY create_time DESC LIMIT 10` |
#### 分词查询语法(MATCH AGAINST)
搜索索引支持分词查询,使用 `MATCH ... AGAINST` 语法:
```sql
-- 创建带分词器的搜索索引
CREATE INDEX idx_text USING SEARCH ON articles (
title(type=text, analyzer=ik),
content(type=text, analyzer=ik)
);
-- 分词查询:查询 content 列包含"功能介绍"的记录
SELECT * FROM articles WHERE MATCH(content) AGAINST('功能介绍');
-- 会匹配包含"功能"、"介绍"或"功能介绍"的记录
```
**支持的分词器**:
| 分词器 | 说明 |
|--------|------|
| standard | 标准分词器(默认) |
| ik | 中文智能分词(推荐) |
| english | 英文分词 |
| whitespace | 按空格分词 |
| comma | 按逗号分词 |
#### 数据类型限制
> ⚠️ 搜索索引**不支持以下数据类型**:DECIMAL、DATE、TIME。如需使用,请用 DOUBLE 替代 DECIMAL,用 TIMESTAMP 替代 DATE/TIME。
**错误示例**:搜索索引列使用 DECIMAL 会报错 `Incompatible data type casting`
#### 索引构建时间
| 索引类型 | 构建时间(实测参考) |
|---------|-------------------|
| 二级索引(KV) | 约 10 秒 |
| 搜索索引(SEARCH) | **约 30 秒** |
> 在索引构建完成前执行查询仍会报"低效查询拦截"错误。实际构建时间因数据量而异,建议创建索引后等待足够时间再查询。
#### 冷热分离场景注意事项
> ⚠️ 如果实例已开启冷热分离功能,搜索索引构建过程会回查数据,冷存储(容量型云存储)的限流会直接影响索引构建效率,可能导致写入操作出现反压现象。
**开通冷存储前提**:
- 需先在控制台开通**容量型云存储**作为冷存储介质
- 开通路径:登录 Lindorm 控制台 → 选择地域 → 实例列表 → 目标实例ID → 左侧导航栏**冷存储** → 单击**开通**
- ⚠️ **警告**:开通过程需要**滚动重启实例**,可能导致读写请求**延迟波动或连接中断**,建议在业务低峰期操作
- 实例存储类型为**本地 HDD 盘**时,不支持开通容量型云存储
**建议**:
- 在数据写入热存储期间完成索引构建
- 或临时提升冷存储读取限流
### 二级索引冗余列(INCLUDE)
冗余列用于将其他列的数据复制到索引表中,避免查询时回表主表,提升查询性能。
**默认行为**:
- Lindorm SQL 2.9.3.10 及以上版本:未指定 INCLUDE 时,默认**不冗余**任何列
- Lindorm SQL 2.9.3.10 之前版本:未指定 INCLUDE 时,默认**冗余所有列**
**冗余所有列(INDEX_COVERED_TYPE)**:
如果需要冗余所有列,可通过索引属性 `INDEX_COVERED_TYPE` 设置:
| 取值 | 说明 |
|------|------|
| COVERED_ALL_COLUMNS_IN_SCHEMA | 冗余主表中所有已定义的列 |
| COVERED_DYNAMIC_COLUMNS | 冗余所有列,包括动态列(适用于动态表) |
```sql
-- 冗余所有已定义列
CREATE INDEX idx_user ON orders (user_id)
WITH (INDEX_COVERED_TYPE = 'COVERED_ALL_COLUMNS_IN_SCHEMA');
-- 动态表冗余所有列(包括动态列)
CREATE INDEX idx_user ON orders (user_id)
WITH (INDEX_COVERED_TYPE = 'COVERED_DYNAMIC_COLUMNS');
```
**使用建议**:
- 建议显式指定需要冗余的列,避免依赖默认行为
- 冗余列会增加存储空间,建议只冗余查询中常用的列
- 仅二级索引支持冗余列,搜索索引和列存索引不支持
**注意事项**:
- 显式指定的冗余列不可以包含主键列和索引列
**显式指定冗余列示例**:
```sql
-- 显式指定冗余列
CREATE INDEX idx_user ON orders (user_id) INCLUDE (amount, status);
-- 查询时可直接从索引获取 amount 和 status,无需回表
SELECT user_id, amount, status FROM orders WHERE user_id = 'u001';
```
### 示例
```sql
-- 创建二级索引
CREATE INDEX idx_user ON orders (user_id);
-- 创建带冗余列的二级索引
CREATE INDEX idx_user ON orders (user_id) INCLUDE (amount, status);
-- 创建搜索索引
CREATE INDEX idx_search USING SEARCH ON products (name, description);
-- 使用通配符创建搜索索引
CREATE INDEX idx_all USING SEARCH ON products (*);
-- 使用函数表达式创建索引
CREATE INDEX idx_hash ON orders (hash64(user_id, order_date), user_id, order_date);
```
### 搜索索引键属性
```sql
-- 使用分词器
CREATE INDEX idx_text USING SEARCH ON articles (
title(type=text, analyzer=ik),
content(type=text, analyzer=ik)
);
```
| 属性 | 说明 |
|------|------|
| indexed | 是否创建索引,默认 true |
| rowStored | 是否存储原始数据,默认 false |
| columnStored | 是否列存储,默认 true |
| type | 分词字段设为 text |
| analyzer | 分词器:standard、english、ik、whitespace、comma |
## 冷热分离
### 基于时间戳的冷热分离
```sql
-- 建表时设置
CREATE TABLE dt (
p1 INTEGER,
p2 INTEGER,
c1 VARCHAR,
c2 BIGINT,
PRIMARY KEY(p1 DESC)
) WITH (
COMPRESSION = 'ZSTD',
CHS = '86400', -- 冷热分界线,单位秒
CHS_L2 = 'storagetype=COLD'
);
-- 已有表开启
ALTER TABLE dt SET 'CHS' = '86400', 'CHS_L2' = 'storagetype=COLD';
```
### 基于自定义时间列的冷热分离
```sql
-- 建表时设置
CREATE TABLE dt (
p1 INTEGER,
p2 BIGINT,
p3 BIGINT,
c1 VARCHAR,
PRIMARY KEY(p1, p2, p3)
) WITH (
COMPRESSION = 'ZSTD',
CHS = '86400',
CHS_L2 = 'storagetype=COLD',
CHS_COLUMN = 'COLUMN=p2' -- 指定时间列
);
-- 指定时间单位
CREATE TABLE dt (
p1 INTEGER,
p2 BIGINT,
p3 BIGINT,
c1 VARCHAR,
PRIMARY KEY(p1, p2, p3)
) WITH (
COMPRESSION = 'ZSTD',
CHS = '86400',
CHS_L2 = 'storagetype=COLD',
CHS_COLUMN = 'COLUMN=p2|TIMEUNIT=SECONDS'
);
```
**CHS_COLUMN 注意事项:**
- 自定义时间列必须为主键
- 自定义时间列不能作为主键第一列
- 仅支持 BIGINT 和 TIMESTAMP 类型
### 修改和取消冷热分离
```sql
-- 修改冷热分界线
ALTER TABLE dt SET 'CHS' = '1000';
-- 取消冷热分离
ALTER TABLE dt SET 'CHS' = '', 'CHS_L2' = '', 'CHS_COLUMN' = '';
```
## 纠删码(EC)
纠删码是一种数据冗余存储机制,可在保证相同可靠性的同时节约存储空间。
### 前提条件
- 宽表引擎 2.5.4+,底层存储 4.3.4+
- 至少 7 个节点
- 本地 HDD 盘
### 使用方法
```sql
-- 建表时开启
CREATE TABLE dt (
p1 INTEGER,
p2 INTEGER,
PRIMARY KEY(p1)
) WITH (EC_POLICY = 'RS-4-2');
-- 修改纠删码算法
ALTER TABLE dt SET 'EC_POLICY' = 'RS-4-2';
-- 删除纠删码算法
ALTER TABLE dt SET 'EC_POLICY' = '';
```
**说明:** RS-4-2 算法在存储效率上等价于 1.5 副本。
## 预分区
建议在数据量大或使用 Bulkload 导入数据时设置预分区。
### 预分区数量建议
| 场景 | 建议分区数 |
|------|-----------|
| SQL/HBase API 写入 | 节点数 × 4 |
| Bulkload 批量导入 | 数据量(GB) ÷ 8 |
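The two sizing rules in the table can be written as small helpers. This is a Python sketch under stated assumptions: the function names are hypothetical, and rounding up to a whole number of Regions (with a minimum of 1) is a choice made here, not something the table specifies.

```python
import math


def regions_for_api_writes(node_count: int) -> int:
    """SQL / HBase API writes: node count x 4."""
    return node_count * 4


def regions_for_bulkload(data_gb: float) -> int:
    """Bulkload import: data volume (GB) / 8, rounded up, at least 1."""
    return max(1, math.ceil(data_gb / 8))


print(regions_for_api_writes(4))   # 16, the value used for NUMREGIONS in the example below
print(regions_for_bulkload(128))   # 16
print(regions_for_bulkload(100))   # 13
```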
### 示例
```sql
-- 指定分区数和起止 Key
CREATE TABLE orders (
id VARCHAR NOT NULL,
amount BIGINT,
PRIMARY KEY(id)
) WITH (
NUMREGIONS = '16',
STARTKEY = 'a',
ENDKEY = 'z'
);
-- 指定分裂点
CREATE TABLE orders (
id INTEGER NOT NULL,
amount BIGINT,
PRIMARY KEY(id)
) WITH (
SPLITKEYS = '100,200,300,400,500'
);
```
## 完整建表示例
### 物联网设备数据表
```sql
CREATE TABLE iot_device_data (
device_id VARCHAR NOT NULL,
timestamp BIGINT NOT NULL,
metric_name VARCHAR NOT NULL,
metric_value DOUBLE,
`extra_*` VARCHAR,
PRIMARY KEY(hash32(device_id), device_id, timestamp DESC, metric_name)
) WITH (
COMPRESSION = 'ZSTD',
TTL = '7776000', -- 90 天
DYNAMIC_COLUMNS = 'true',
wildcard_column = 'extra_*',
CHS = '2592000', -- 30 天后归档到冷存储
CHS_L2 = 'storagetype=COLD',
CHS_COLUMN = 'COLUMN=timestamp|TIMEUNIT=SECONDS'
);
-- Secondary index for queries by metric name
CREATE INDEX idx_metric ON iot_device_data (metric_name) INCLUDE (metric_value);
```
### 电商订单表
```sql
CREATE TABLE orders (
order_id VARCHAR NOT NULL,
user_id VARCHAR NOT NULL,
create_time BIGINT NOT NULL,
status VARCHAR,
amount BIGINT,
items VARCHAR,
PRIMARY KEY(hash32(order_id), order_id),
INDEX idx_user USING KV (user_id, create_time DESC) INCLUDE (status, amount)
) WITH (
COMPRESSION = 'ZSTD',
CONSISTENCY = 'strong',
MUTABILITY = 'MUTABLE_LATEST',
NUMREGIONS = '32'
);
-- 创建搜索索引支持商品搜索
CREATE INDEX idx_search USING SEARCH ON orders (items(type=text, analyzer=ik));
```
### 日志分析表
```sql
CREATE TABLE app_logs (
bucket INTEGER NOT NULL,
timestamp BIGINT NOT NULL,
app_id VARCHAR NOT NULL,
level VARCHAR NOT NULL,
message VARCHAR,
stack_trace VARCHAR,
PRIMARY KEY(bucket, timestamp, app_id, level)
) WITH (
COMPRESSION = 'ZSTD',
TTL = '604800', -- 7 天
EC_POLICY = 'RS-4-2',
CHS = '86400', -- 1 天后归档
CHS_L2 = 'storagetype=COLD'
);
-- 写入时:bucket = timestamp % 16
```
FILE:references/02-ops/backup-restore.md
# 备份与恢复场景
## 触发条件
- "如何备份 Lindorm 数据?"
- "怎么恢复到昨天的数据状态?"
- "能帮我查看备份列表吗?"
- "误删数据后怎么找回?"
- "自动备份的周期是多久?"
---
## Agent 行为原则
**安全边界:查询备份信息 + 引导手动操作,恢复操作需用户确认**
1. Agent 暂无直接查询备份列表的 API
2. 提供控制台路径配置自动备份策略
3. 给出恢复步骤,但不直接执行恢复操作
4. 恢复操作会覆盖当前数据,需用户明确确认
---
## 官方文档
| 场景 | 文档链接 |
|------|---------|
| 开通备份恢复 | https://help.aliyun.com/zh/lindorm/user-guide/backup-and-restoration |
| 自动备份配置 | https://help.aliyun.com/zh/lindorm/user-guide/automatic-backup-of-data-of-lindorm-wide-tables |
| 恢复到当前实例 | https://help.aliyun.com/zh/lindorm/user-guide/restore-backup-data-to-the-instance-that-corresponds-to-the-original-data |
| 增值服务费用 | https://help.aliyun.com/zh/lindorm/product-overview/value-added-services-pricing |
---
## 各引擎备份能力
| 引擎 | 备份恢复能力 | 说明 |
|------|------------|------|
| **宽表引擎** | ✅ 支持 | 本文档描述的完整备份恢复功能 |
| **搜索引擎** | ❌ 无独立备份 | 搜索索引数据源头在宽表,宽表备份恢复后需重建搜索索引 |
| **时序引擎** | ❌ 无独立备份 | 暂无官方备份恢复文档 |
| **计算引擎** | ❌ 无需备份 | 无状态Spark计算服务,作业结果落回存储引擎,存储引擎备份已覆盖 |
> 官方仅提供宽表引擎的备份恢复功能,其他引擎暂无独立备份方案。
### Backup types
| 类型 | 说明 | 配置方式 |
|------|------|---------|
| **全量备份** | 备份完整的表数据和结构 | 控制台配置周期和范围 |
| **增量备份** | 自动备份变更数据 | 开通后自动运行,无需配置 |
### 全量备份可配置参数
| 参数 | 说明 | 可选范围 | 推荐值 |
|------|------|---------|--------|
| 备份表 | 指定备份范围 | `*` 全库,或 `namespace:table` 格式(如 `default:test` 表示default命名空间下以test开头的表) | `*`(全库) |
| 全量备份周期 | 触发间隔 | 3-10 天 | 7 天 |
| 下次全量备份时间 | 开始时间点 | 建议业务低峰期 | 02:00 |
| 全量备份保留个数 | 保留的备份数量 | 3-12 个 | 7 个 |
### 备份恢复性能参考(不可配置)
| 指标 | 说明 | 参考值 |
|------|------|--------|
| RPO | 故障时可接受的最大数据丢失量 | < 30 秒(实时增量同步) |
| 全量恢复速度 | OSS 最大带宽 / LTS 单机 | 1 GB/s / 100 MB/s |
| 增量恢复速度 | Lindorm 目的集群单机 / LTS 单机 | 30-40 MB/s / 100 MB/s |
---
## 场景A:配置自动备份
### 步骤1:开通备份恢复功能
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/)
2. 在实例列表页,单击**目标实例ID**
3. 在左侧导航栏,单击**宽表引擎** → **备份恢复**
4. 点击"立即开通"
- 若未开通搜索索引或数据订阅:跳转变配页面,需选择 **LTS Core 规格** 和 **节点数量**(规格选择见增值服务文档),完成后点击"立即购买"
- 若已开通搜索索引或数据订阅:直接开通,无需变配
⚠️ **不支持实例类型**:Lindorm新版实例、单节点实例暂不支持备份恢复功能
### 步骤2:配置全量备份策略
1. 在左侧导航栏,单击**宽表引擎** → **备份恢复** → **全量备份**
2. 点击"创建"
3. 配置参数(推荐值见上方参数表):
- **备份表**:`*`(全库)或指定表如 `default:user_table`
- **全量备份周期**:7 天(周期太短可能无法完成备份,太长影响恢复时间)
- **下次全量备份时间**:02:00(业务低峰期)
- **全量备份保留个数**:7 个
4. 点击"确认"
⚠️ **空间评估**:备份空间不足会导致备份中断。全量备份空间 ≈ (保留个数+1) × 单个全备大小;增量空间 ≈ 日志保留天数 × 每天增量LOG大小。可在 **宽表引擎** → **备份恢复** → **全量备份** 区域查看单个全备大小,增量LOG大小可通过监控获取写入速度估算
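The space estimate above is simple enough to compute before choosing the retention settings. A minimal Python sketch follows; the function names and the sample sizes are illustrative assumptions, and the result is only as accurate as the measured full-backup and daily log sizes.

```python
def full_backup_space_gb(retained_backups: int, single_backup_gb: float) -> float:
    """Full backup space ~= (retained count + 1) x size of one full backup."""
    return (retained_backups + 1) * single_backup_gb


def incremental_space_gb(log_retention_days: int, daily_log_gb: float) -> float:
    """Incremental space ~= log retention days x incremental log size per day."""
    return log_retention_days * daily_log_gb


# Example: keep 7 full backups of 100 GB each, 7 days of logs at 20 GB per day.
full = full_backup_space_gb(7, 100)
incremental = incremental_space_gb(7, 20)
print(full, incremental, full + incremental)  # 800 140 940
```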
### 验证备份
配置后系统会在设定时间自动执行。查看路径:**宽表引擎** → **备份恢复** → **全量备份列表**
### 注意事项
- 首次备份会扫描全表,耗时较长
- 备份过程不影响业务读写
- 增量备份自动运行,无需配置
- 备份存储按量计费,具体价格见[增值服务计费说明](https://help.aliyun.com/zh/lindorm/product-overview/value-added-services-pricing)
---
## 场景B:恢复历史数据
### 前提条件
1. 实例已开通备份恢复功能(⚠️ Lindorm新版实例、单节点实例暂不支持)
2. 存在误删前的备份(查看路径:**宽表引擎** → **备份恢复** → **全量备份列表**)
3. 备份时间点在保留期内
### 方式1:恢复到当前实例
**⚠️ 风险警告:会覆盖当前实例的指定表数据,恢复期间表不可用。建议恢复前先清空原表数据,或在测试实例验证**
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/)
2. 在实例列表页,单击**目标实例ID**
3. 在左侧导航栏,单击**宽表引擎** → **备份恢复**
4. 在全量备份列表中,点击"发起数据恢复"
5. 配置参数:
- **恢复集群**:当前实例(⚠️ 版本升级后原版本备份数据不能用于恢复新版本)
- **时间点**:选择备份时间点
- **全库恢复**:否(推荐指定表)
- **恢复表**:一行写一个,格式为 `namespace:table`(如 `default:testTable`);恢复到其他表名格式为 `namespace:table/namespace:table2`(如 `default:testTable/default:testTable2`)
6. 点击"确定"
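The restore-table format in step 5 can be validated before it is pasted into the console. The following Python sketch is a hypothetical client-side check, not part of any Lindorm tool: it parses one line in either `namespace:table` or `namespace:table/namespace:table2` form and returns the source and target tables.

```python
def _split_name(name: str) -> tuple:
    """Split 'namespace:table' into (namespace, table), rejecting malformed input."""
    namespace, sep, table = name.partition(":")
    if not sep or not namespace or not table:
        raise ValueError(f"expected namespace:table, got {name!r}")
    return namespace, table


def parse_restore_line(line: str) -> tuple:
    """Return ((src_namespace, src_table), (dst_namespace, dst_table)) for one line."""
    source, sep, target = line.strip().partition("/")
    # Without a '/', the table is restored under its original name.
    return _split_name(source), _split_name(target if sep else source)


print(parse_restore_line("default:testTable"))
print(parse_restore_line("default:testTable/default:testTable2"))
```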
### 方式2:恢复到其他实例
适用场景:数据迁移、环境复制
注意事项:
- 目标实例需有足够存储空间
- 目标实例版本需与源实例一致
- 不支持跨版本恢复
### 恢复进度查询
控制台路径:**宽表引擎** → **备份恢复** → **恢复列表**
### 恢复后验证
```sql
-- 验证数据量
SELECT COUNT(*) FROM user_table;
-- 查看部分数据
SELECT * FROM user_table LIMIT 10;
```
---
## 常见问题
| 问题 | 原因 | 解决方法 |
|------|------|---------|
| 恢复失败:存储空间不足 | 目标实例空间不够 | 扩容存储空间后重试,或清理无用数据 |
| 恢复失败:版本不兼容 | 源和目标实例版本不一致 | 确保版本一致;版本升级后原版本备份数据不能用于恢复新版本 |
| 最近可恢复到哪个时间点? | WAL备份至OSS周期 | 正常情况下最多丢失30秒数据 |
| 恢复需要多长时间? | 取决于数据量 | 全量恢复约100MB/s(LTS单机),增量恢复约30-40MB/s |
| 能恢复到其他表名吗? | 支持 | 格式 `namespace:table/namespace:table2`,可恢复同名表到另一个表 |
---
## 缺参追问
| 缺失参数 | 追问话术 | 默认策略 |
|---------|---------|----------|
| 实例ID | "请问是哪个 Lindorm 实例需要配置备份?" | 引导用户在控制台操作 |
| 备份周期 | "您希望多久备份一次?(推荐7天)" | 默认 7 天 |
| 保留个数 | "备份数据需要保留几个?(推荐7个)" | 默认 7 个 |
| 恢复时间点 | "请问需要恢复到哪个时间点的数据?" | 列出最近的备份列表供选择 |
| 恢复范围 | "需要恢复所有表还是指定的表?" | 默认提示指定表更安全 |
---
## 关联场景
- 备份后数据迁移 → `data-migration.md`
- 恢复后性能验证 → `monitoring-guide.md`
FILE:references/02-ops/connection-troubleshoot.md
# 连接问题排查场景
当用户反馈"无法连接 Lindorm 实例"、"连接超时"等问题时,按本指南执行排查。
## 触发条件
用户的典型表达:
- "连不上实例"
- "连接超时"
- "无法访问 Lindorm"
- "白名单配置对吗?"
- "为什么连接被拒绝?"
## 排查原则
连接问题排查采用**分层排查**策略:实例状态 → 白名单 → 网络配置,逐层定位问题。
输出格式:**问题定位 → 根因分析 → 解决方案**
## 执行流程
### 步骤 1:检查实例状态
**目的**:确认实例是否正常运行。
**执行命令**:
```bash
aliyun hitsdb get-lindorm-instance \
--instance-id <instance-id>
```
**检查要点**:
| 实例状态 | 含义 | 能否连接 | 解决方案 |
|---------|------|:--------:|----------|
| `ACTIVATION` | 运行中 | ✅ 可以 | 继续排查其他项 |
| `CREATING` | 创建中 | ❌ 等待 | 等待实例创建完成(通常 10-30 分钟) |
| `MAINTAINING` | 维护中 | ⚠️ 可能中断 | 等待维护完成或联系技术支持 |
| `STOPPED` | 已停止 | ❌ 需要启动 | 启动实例后再连接 |
| `DELETED` | 已释放 | ❌ 无法恢复 | 实例已释放,无法连接 |
| `CLASS_CHANGING` | 变配中 | ⚠️ 可能中断 | 等待变配完成 |
**输出示例**:
```
【步骤 1:实例状态检查】
✅ 实例状态正常
- 实例 ID:ld-uf6l5kr48wqm6rf1h
- 状态:ACTIVATION(运行中)
- 创建时间:2025-01-15 10:30:00
继续排查白名单配置...
```
如果状态异常:
```
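[Step 1: Instance status check]
❌ Instance status is abnormal
- Instance ID: ld-uf6l5kr48wqm6rf1h
- Status: CREATING (being created)
[Root cause] The instance has not finished being created, so it cannot accept connections
[Solution]
- Wait 10-30 minutes; once creation completes, the status changes to ACTIVATION automatically
- Re-run `aliyun hitsdb get-lindorm-instance --instance-id <instance-id>` to keep checking the status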
```
---
### 步骤 2:检查白名单配置
**目的**:确认客户端 IP 是否在白名单中。
**执行命令**:
```bash
aliyun hitsdb get-instance-ip-white-list \
--instance-id <instance-id>
```
**检查要点**:
- 白名单中是否包含客户端 IP?
- 白名单格式是否正确?
- 单个 IP:`192.168.1.100`
- IP 段:`192.168.1.0/24`
- 公网访问:`0.0.0.0/0`(⚠️ 不推荐,安全风险高)
- 内网访问需要:同 VPC 或 VPC 网段在白名单中
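Whether a client IP is covered by the whitelist can be checked locally with the Python standard library. This is a minimal sketch: `ip_allowed` is a hypothetical helper, and the entries are the sample values used in this step rather than output from a real instance.

```python
import ipaddress


def ip_allowed(client_ip: str, whitelist: list) -> bool:
    """Return True if client_ip falls inside any whitelist entry (single IP or CIDR)."""
    ip = ipaddress.ip_address(client_ip)
    # strict=False accepts entries such as 192.168.1.100/24 with host bits set.
    return any(ip in ipaddress.ip_network(entry, strict=False) for entry in whitelist)


whitelist = ["10.0.0.0/8", "172.16.1.100"]
print(ip_allowed("192.168.1.50", whitelist))                        # False
print(ip_allowed("10.23.4.5", whitelist))                           # True
print(ip_allowed("192.168.1.50", whitelist + ["192.168.1.0/24"]))   # True
```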
**输出示例**:
```
【步骤 2:白名单检查】
当前白名单配置:
- 10.0.0.0/8(内网 VPC 网段)
- 172.16.1.100(特定 IP)
【检查结果】
- 您的客户端 IP:192.168.1.50
- ❌ 未在白名单中
【根因】客户端 IP 不在白名单中,连接被拒绝
【解决方案】
1. 方案一:将客户端 IP 加入白名单
📍 精确路径:
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/)
2. 在实例列表页,单击**目标实例ID**
3. 在左侧导航栏,单击**访问控制** → **白名单**
4. 点击"修改"添加客户端 IP:
- 单个 IP:192.168.1.50
- IP 段:192.168.1.0/24
2. 方案二:使用同 VPC 内的 ECS 访问
- 确保 ECS 与 Lindorm 在同一 VPC
- 白名单配置为 VPC 网段(如 10.0.0.0/8)
如需更多细节,可参考官方白名单配置指南:
https://help.aliyun.com/zh/lindorm/getting-started/configure-a-whitelist
需要帮您查看网络配置吗?
```
---
### 步骤 3:检查网络配置
**目的**:确认网络类型和访问方式是否匹配。
**执行命令**:
```bash
aliyun hitsdb get-lindorm-instance \
--instance-id <instance-id>
```
**检查要点**:
从实例详情中提取网络配置:
- `NetworkType`:网络类型(`vpc`)
- `VpcId`:VPC ID
- `VswitchId`:交换机 ID
**网络类型与访问方式**:
| 网络类型 | 访问方式 | 要求 | 连接地址特征 |
|---------|---------|------|---------------|
| `vpc` | **内网访问** | 客户端在同 VPC 内 | V1: `-vpc.lindorm.rds.aliyuncs.com`;V2: `-vpc.lindorm.aliyuncs.com`(INTRANET) |
| `vpc` | **公网访问** | 已开通公网地址 + 白名单包含公网 IP | V1: `-pub.lindorm.rds.aliyuncs.com`;V2: `-pub.lindorm.aliyuncs.com`(INTERNET) |
**输出示例**:
```
【步骤 3:网络配置检查】
网络配置:
- 网络类型:VPC
- VPC ID:vpc-uf6xxxxx
- 交换机 ID:vsw-uf6xxxxx
【访问方式判断】
- 您的客户端:公网 IP(182.92.xxx.xxx)
- 实例配置:VPC 内网
【根因】实例为 VPC 内网访问,客户端在公网,无法直连
【解决方案】
1. 方案一:使用同 VPC 内的 ECS 访问
- 在同 VPC 内创建 ECS 实例
- 通过 ECS 内网 IP 访问 Lindorm
2. 方案二:开通公网连接地址(推荐公网用户使用)
📍 精确路径:
1. 控制台:https://lindorm.console.aliyun.com/
2. 点击实例 ID "ld-xxx"
3. 左侧菜单:**配置与管理** → **数据库连接**
4. 找到对应引擎,点击**申请公网连接地址**
5. 等待公网地址生成(通常 1-3 分钟)
6. **配置与管理** → **访问控制** → **白名单** → 添加客户端公网 IP
> ⚠️ 开通公网地址不等于绑定 EIP。Lindorm 的公网访问是通过"申请公网连接地址"实现的,无需绑定 EIP。
3. 方案三:配置 VPN/专线连接
- 通过 VPN 或专线将客户端网络与 VPC 打通
如需更多细节,可参考官方连接指南:
https://help.aliyun.com/zh/lindorm/getting-started/connect-to-an-instance
```
---
### 步骤 4:网络连通性测试(高级排查)
**目的**:使用网络工具(ping、telnet)验证网络连通性。
**⚠️ 前提条件**:前三步检查均正常,但仍无法连接。
#### 4.1 使用 ping 测试网络可达性(参考性)
**⚠️ 注意**:ping 测试仅供参考,**ping 不通不代表无法连接**。
**适用场景**:初步判断网络层连通性,但 Lindorm 可能禁用 ICMP 响应。
**操作步骤**:
```bash
# 从客户端机器执行
ping <lindorm-host>
# 示例
ping lindorm-xxx.lindorm.rds.aliyuncs.com
```
**结果判断**:
| 结果 | 含义 | 后续操作 |
|-----|------|----------|
| ✅ 正常响应 | 网络层可达 | 继续检查端口(步骤 4.2) |
| ❌ Request timeout | ICMP 被禁用或网络不通 | **继续用 telnet 测试端口,不要直接判定为网络不通** |
| ❌ Unknown host | DNS 解析失败 | 检查 DNS 配置或使用 IP 直接连接 |
**重要提示**:
- 云产品通常禁用 ICMP(ping)响应以提高安全性
- **ping 不通时,优先使用 `telnet/nc` 或客户端直接连接测试**
- 只有当 telnet 也失败时,才判定为网络连通性问题
**完整排查指南**:
- 官方文档:[使用 ping 命令检查连接](https://help.aliyun.com/zh/lindorm/support/run-the-ping-command-to-check-the-connection-between-an-ecs-instance-and-a-lindorm-instance)
#### 4.2 使用 telnet 测试端口连通性
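**Applicable scenario**: ping succeeds but the connection still fails, or ping gets no response (ICMP may be disabled), and you need to test whether a specific port is reachable.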
**操作步骤**:
```bash
# 测试宽表引擎 HBase API 端口
telnet <lindorm-host> 30020
# 测试宽表引擎 MySQL 协议端口(推荐)
telnet <lindorm-host> 33060
# 测试时序引擎端口
telnet <lindorm-host> 8242
# 测试搜索引擎端口
telnet <lindorm-host> 30070
```
**常用端口**:
端口请以 `SKILL.md` →「代码生成规范 / 端口号速查表」为准。
**结果判断**:
```bash
# ✅ 端口连通(正常)
Trying <ip>...
Connected to <host>.
Escape character is '^]'.
# ❌ 端口不通(异常)
Trying <ip>...
telnet: connect to address <ip>: Connection refused
```
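Where `telnet` is not installed, the same TCP check can be done with a few lines of Python. This is a minimal sketch using the standard `socket` module; `port_open` is a hypothetical helper, and the host passed to it is a placeholder you must replace with your own connection address. The port numbers are the ones used in the telnet commands above.

```python
import socket

# Ports taken from the telnet examples above.
ENGINE_PORTS = {
    "wide table (HBase API)": 30020,
    "wide table (MySQL protocol)": 33060,
    "time series": 8242,
    "search": 30070,
}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def check_all(host: str) -> dict:
    """Probe every known engine port on the given host."""
    return {name: port_open(host, port) for name, port in ENGINE_PORTS.items()}
```

Calling `check_all("<lindorm-host>")` returns a per-engine result. As with telnet, a `False` only shows that the TCP handshake failed; it does not distinguish a whitelist rejection from a security group rule.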
**常见问题**:
- 端口不通 → 检查安全组规则是否开放对应端口
- 端口连通但连接失败 → 检查认证信息(用户名/密码/AccessKey)
**完整排查指南**:
- 官方文档:[使用 telnet 命令检查端口连通性](https://help.aliyun.com/zh/lindorm/support/run-the-telnet-command-to-check-the-connectivity-of-the-service-ports-of-lindorm)
#### 4.3 安全组规则检查
**适用场景**:ping 或 telnet 不通,需要检查 ECS 安全组。
**检查要点**:
1. ECS 出方向规则:允许访问 Lindorm IP 和端口
2. Lindorm 白名单:包含 ECS 的 IP
**配置示例**:
```
ECS 安全组出方向规则:
- 协议:TCP
- 端口范围:根据所用引擎/协议放通(端口见 SKILL.md →「代码生成规范 / 端口号速查表」)
- 目标:Lindorm 实例 IP 或 0.0.0.0/0
Lindorm 白名单:
- 添加 ECS 私网 IP 或 VPC 网段
```
---
## 常见问题与解决方案
### 问题 1:连接超时
**可能原因**:
1. 实例状态非 ACTIVATION
2. 网络不通(VPC/白名单问题)
3. 端口未开放或防火墙拦截
**排查步骤**:
1. 检查实例状态(步骤 1)
2. 检查白名单配置(步骤 2)
3. 检查网络配置(步骤 3)
4. 确认使用正确的连接地址和端口
---
### 问题 2:认证失败
**可能原因**:
1. 用户名或密码错误
2. AccessKey 无权限
**排查步骤**:
1. 确认用户名密码正确(Lindorm 控制台 → 账号管理)
2. 确认 AccessKey 有 Lindorm 操作权限(RAM 控制台 → 权限管理)
---
### 问题 3:公网无法访问
**可能原因**:
1. 未开通公网访问(未申请公网连接地址)
2. 白名单未包含公网 IP
**排查步骤**:
1. 确认实例已开通公网连接地址(Lindorm 控制台 → 配置与管理 → 数据库连接 → 查看是否有公网地址)
2. 确认白名单包含公网 IP(步骤 2)
---
### 问题 4:内网无法访问
**可能原因**:
1. 不在同一 VPC
2. 未配置 VPC 访问白名单
**排查步骤**:
1. 确认客户端在同 VPC 内(步骤 3)
2. 确认白名单包含 VPC 网段(步骤 2)
---
## 常见连接错误及解决方案
Agent 在遇到连接错误时,应该查询官方文档获取详细的解决方案:
**⭐ 重点文档**:
- **Lindorm 连接问题及解决方案**:https://help.aliyun.com/zh/lindorm/support/lindorm-connection-issues-and-solutions
- **Lindorm 连接错误及解决方案**:https://help.aliyun.com/zh/lindorm/support/lindorm-connection-errors-and-solutions
**常见错误类型**:
| 错误类型 | 示例错误信息 | 参考文档 |
|---------|------------|---------|
| **连接超时** | Connection timeout, SocketTimeoutException | 上述文档 + ping/telnet 排查 |
| **连接被拒绝** | Connection refused | 检查端口、安全组、白名单 |
| **认证失败** | Authentication failed | 检查用户名密码、AccessKey |
| **网络不可达** | Network is unreachable | 检查 VPC、路由表 |
| **DNS 解析失败** | Unknown host | 检查 DNS 配置 |
**特殊场景**:
- **Spark 访问连接超时**:https://help.aliyun.com/zh/lindorm/support/cause-analysis-of-connection-timout-problem-in-sparkonmc-access-to
---
## 连接地址获取
Lindorm 不同引擎有不同的连接地址。**务必根据客户端所在环境选择对应的地址**:
### 确认客户端环境
| 客户端位置 | 应使用的地址 | 地址特征 |
|-----------|-------------|----------|
| 阿里云 ECS(同 VPC) | 内网地址 | V1: `-vpc.lindorm.rds.aliyuncs.com`;V2: `-vpc.lindorm.aliyuncs.com`(Type=INTRANET) |
| 本地电脑 / 公网服务器 | 公网地址 | V1: `-pub.lindorm.rds.aliyuncs.com`;V2: `-pub.lindorm.aliyuncs.com`(Type=INTERNET) |
| 阿里云 ECS(跨 VPC) | 公网地址或云企业网 | 需额外配置网络打通 |
> ⚠️ **常见错误**:从公网环境使用 VPC 内网地址,导致连接超时。必须确认客户端所处环境后选择对应地址。
### 获取连接地址
**方法 1:阿里云控制台**
📍 精确路径:
1. 控制台:https://lindorm.console.aliyun.com/
2. 点击实例 ID "ld-xxx"
3. 左侧菜单:**配置与管理** → **数据库连接**
4. 查看各引擎的内网/公网连接地址
5. 如果没有公网地址,点击**申请公网连接地址**
> ⚠️ 开通公网地址不等于绑定 EIP。Lindorm 的公网访问是通过"申请公网连接地址"实现的,无需绑定 EIP。
**方法 2:通过 API 获取**
```bash
# V1/V2 通用:查询引擎连接端点
aliyun hitsdb get-lindorm-instance-engine-list --instance-id ld-xxx
# V2 专属:查询实例详情(含 ConnectAddressList)
aliyun hitsdb get-lindorm-v2-instance-details --instance-id ld-xxx
```
**`get-lindorm-instance-engine-list`(V1/V2 通用)**:
返回 `NetInfoList`,通过 `NetType` 判断网络类型:
- `NetType: "0"` → 公网地址(`-pub`)
- `NetType: "2"` → VPC 内网地址(`-vpc`)
**`get-lindorm-v2-instance-details`(仅 V2)**:
返回 `ConnectAddressList`,通过 `Type` 判断网络类型:
- `Type: INTERNET` → 公网地址(`-pub`)
- `Type: INTRANET` → VPC 内网地址(`-vpc`)
> 如果两个 API 返回中都没有公网地址(无 `NetType: "0"` 且无 `Type: INTERNET`),说明尚未开通公网,需在控制台申请。
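
The `NetType` mapping above can be applied mechanically to the parsed JSON output of `get-lindorm-instance-engine-list`. This is a sketch under assumptions: `NetInfoList`, `NetType`, and `EngineType` come from this guide, while the wrapper key `EngineList` and the fields `ConnectionString` and `Port` are assumed names that should be checked against the actual response.

```python
from typing import Dict, List

# Mapping taken from this guide: "0" is public, "2" is VPC
NET_TYPE_LABELS = {"0": "public", "2": "vpc"}


def endpoints_by_network(engine_list_response: dict) -> Dict[str, List[str]]:
    """Group connection endpoints by network type (field names partly assumed)."""
    result: Dict[str, List[str]] = {"public": [], "vpc": []}
    for engine in engine_list_response.get("EngineList", []):
        for net in engine.get("NetInfoList", []):
            label = NET_TYPE_LABELS.get(str(net.get("NetType")))
            if label is None:
                continue  # ignore network types this guide does not describe
            result[label].append(
                f'{engine.get("EngineType")}: '
                f'{net.get("ConnectionString")}:{net.get("Port")}'
            )
    return result
```

An empty `public` list in the result corresponds to the case where no public address has been requested yet.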
### 开通公网访问
**前提**:需要从公网(本地电脑等)连接 Lindorm。
📍 操作路径:
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/)
2. 在实例列表页,单击**目标实例ID**
3. 在左侧导航栏,单击**数据库连接**
4. 切换至目标引擎页签,单击右上角**开通公网地址**
5. 等待地址生成(1-3 分钟)
6. 在左侧导航栏,单击**访问控制** → **白名单** → 添加客户端公网 IP
> 💡 查看自己公网 IP:`curl ifconfig.me`
---
## 快速诊断命令
Run the two basic checks (instance status and whitelist) together:
```bash
# 1. 检查实例状态
aliyun hitsdb get-lindorm-instance --instance-id ld-xxx
# 2. 检查白名单
aliyun hitsdb get-instance-ip-white-list --instance-id ld-xxx
```
---
## 完整排查清单
**基础检查(使用 API)**:
| 检查项 | 命令 | 期望结果 |
|--------|------|----------|
| **实例状态** | `aliyun hitsdb get-lindorm-instance --instance-id <id>` | `InstanceStatus = ACTIVATION` |
| **白名单配置** | `aliyun hitsdb get-instance-ip-white-list --instance-id <id>` | 包含客户端 IP 或 VPC 网段 |
| **网络类型** | `aliyun hitsdb get-lindorm-instance --instance-id <id>` | `NetworkType = vpc` |
| **VPC 匹配** | `aliyun hitsdb get-lindorm-instance --instance-id <id>` | 客户端与实例在同 VPC(内网访问) |
| **公网地址** | 控制台或 API 查看 | 已开通公网连接地址(非 EIP) |
**网络连通性测试(高级排查)**:
| 检查项 | 命令 | 期望结果 |
|--------|------|----------|
| **网络可达性** | `ping <host>` | 正常响应 |
| **端口连通性** | `telnet <host> <port>` | Connected |
| **DNS 解析** | `nslookup <host>` | 正确解析到 IP |
| **安全组规则** | 控制台查看 | 允许访问对应端口 |
---
## 官方文档索引(供 Agent 参考)
Agent 在处理连接问题时,**强烈建议**查询以下官方文档获取最新信息:
| 文档 | 链接 | 用途 |
|------|------|------|
| **Lindorm 连接问题及解决方案** | [点击查看](https://help.aliyun.com/zh/lindorm/support/lindorm-connection-issues-and-solutions) | ⭐ 常见连接问题汇总 |
| **Lindorm 连接错误及解决方案** | [点击查看](https://help.aliyun.com/zh/lindorm/support/lindorm-connection-errors-and-solutions) | ⭐ 连接错误码解释 |
| **使用 ping 命令检查连接** | [点击查看](https://help.aliyun.com/zh/lindorm/support/run-the-ping-command-to-check-the-connection-between-an-ecs-instance-and-a-lindorm-instance) | ⭐ 网络可达性测试 |
| **使用 telnet 命令检查端口** | [点击查看](https://help.aliyun.com/zh/lindorm/support/run-the-telnet-command-to-check-the-connectivity-of-the-service-ports-of-lindorm) | ⭐ 端口连通性测试 |
| **Spark 连接超时问题** | [点击查看](https://help.aliyun.com/zh/lindorm/support/cause-analysis-of-connection-timout-problem-in-sparkonmc-access-to) | Spark 特定场景 |
**⭐ 重点文档**:前 4 个文档是连接问题排查的核心参考,Agent 在处理连接问题时应该优先查询。
---
## 输出格式
连接问题诊断报告采用固定结构:
```
【连接问题诊断报告】实例 ld-uf6l5kr48wqm6rf1h
【问题定位】白名单配置问题
【根因分析】
- 实例状态:✅ ACTIVATION(运行中)
- 白名单配置:❌ 客户端 IP 未在白名单中
- 当前白名单:10.0.0.0/8, 172.16.1.100
- 客户端 IP:192.168.1.50
- 网络配置:✅ VPC 配置正常
- 网络测试:
- ping:✅ 通
- telnet 30020:❌ 连接被拒绝(白名单限制)
【解决方案】
1. 将客户端 IP 加入白名单:
📍 精确路径:
1. 控制台:https://lindorm.console.aliyun.com/
2. 点击实例 ID "ld-xxx"
3. 安全设置 → IP 白名单 → 修改
4. 添加 IP:192.168.1.50 或 192.168.1.0/24
2. 或使用同 VPC 内的 ECS 访问
📚 完整故障排查指南:
- 连接问题汇总:https://help.aliyun.com/zh/lindorm/support/lindorm-connection-issues-and-solutions
- 连接错误解决:https://help.aliyun.com/zh/lindorm/support/lindorm-connection-errors-and-solutions
按上述方案修改后,请重试连接。如仍有问题,请提供错误日志。
```
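
The whitelist finding in the sample report can be reproduced with the standard `ipaddress` module. This is a minimal sketch: the helper name `ip_in_whitelist` is an illustrative assumption, and the whitelist entries are expected as plain IPs or CIDR blocks like the ones shown in the report.

```python
import ipaddress
from typing import List


def ip_in_whitelist(client_ip: str, whitelist: List[str]) -> bool:
    """Return True if client_ip is covered by any whitelist entry (IP or CIDR)."""
    ip = ipaddress.ip_address(client_ip)
    # strict=False accepts entries such as 192.168.1.50/24 with host bits set;
    # a bare IP like 172.16.1.100 is treated as a /32 network
    return any(
        ip in ipaddress.ip_network(entry.strip(), strict=False)
        for entry in whitelist
    )


current = ["10.0.0.0/8", "172.16.1.100"]
print(ip_in_whitelist("192.168.1.50", current))                       # not covered
print(ip_in_whitelist("192.168.1.50", current + ["192.168.1.0/24"]))  # covered
```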
---
## 缺参处理
### 缺 instance-id
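**Follow-up strategy**: first list the instances with `aliyun hitsdb get-lindorm-instance-list --region <region>`, then let the user choose one.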
---
## 错误处理
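| Error | Cause | User guidance |
|------|------|----------|
| **Instance does not exist** | The instance ID is wrong or the instance has been released | Suggest confirming the instance ID with `aliyun hitsdb get-lindorm-instance-list` first |
| **Insufficient permissions** | The AccessKey has no Lindorm permissions | Tell the user that the `AliyunLindormReadOnlyAccess` permission is required |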
FILE:references/02-ops/data-migration.md
# 数据迁移场景
## 触发条件
- "如何从 MySQL 导入数据到 Lindorm?"
- "怎么把 HBase 集群数据迁移过来?"
- "从 Kafka 实时同步数据到 Lindorm 的步骤?"
- "能帮我规划数据迁移方案吗?"
## 核心原则
**⚠️ 安全边界:只读 + 引导,严禁直接执行迁移操作**
迁移涉及源库连接信息(账号密码),直接执行可能泄露敏感信息或导致数据丢失。Agent 只提供方案和步骤,引导用户通过控制台操作。
---
## 官方文档
- LTS(原BDS)服务介绍:https://help.aliyun.com/zh/lindorm/user-guide/bds-introduction
- MySQL/RDS → Lindorm(DTS):https://help.aliyun.com/zh/dts/user-guide/migrate-data-from-an-apsaradb-rds-for-mysql-instance-to-a-lindorm-instance
- HBase → Lindorm(LTS):https://help.aliyun.com/zh/lindorm/user-guide/synchronize-full-and-incremental-data
- Lindorm 数据订阅(→ Kafka 导出):https://help.aliyun.com/zh/lindorm/user-guide/real-time-data-subscription/
- 字段类型映射:https://help.aliyun.com/zh/lindorm/developer-reference/basic-data-types
---
## 迁移方案选择
| 源端 | 推荐方案 | 支持全量+增量 | 关键限制 |
|------|---------|-------------|---------|
| MySQL/RDS | **DTS**(新用户推荐) | ✅ | ⚠️ 不支持自动表结构迁移,需预先在Lindorm手动建表 |
| HBase 1.x/2.x | **LTS** | ✅ | Bulkload写入的数据不会被增量同步 |
| Lindorm → Lindorm | **LTS** | ✅ | 需确保网络连通 |
| Kafka → Lindorm | **LTS 流式通道** | ✅ | — |
| 自建/特殊需求 | 开源工具(Canal/Sqoop/脚本) | 视工具 | Lindorm非完整MySQL,部分工具可能因DDL差异失败 |
> ⚠️ LTS原有的RDS同步功能已于2023年3月10日下线,之后购买的LTS实例不再支持MySQL迁移。新用户请用DTS。
---
## 方案A:DTS 迁移(MySQL/RDS → Lindorm)
### 步骤
1. **进入 DTS 控制台** → https://dts.console.aliyun.com/ → 数据迁移 → 创建任务
2. **配置源库与目标库**:源库类型 MySQL,目标库 Lindorm(需已开通宽表引擎 MySQL 兼容地址)
3. **预先在 Lindorm 手动建表**(⚠️ DTS 不支持自动表结构迁移,参考字段类型映射处理 ENUM→VARCHAR 等类型转换)
4. **预检查**:系统自动检查源库连通性、账号权限、binlog配置(增量迁移需 binlog_format=ROW)
5. **启动迁移**:可在控制台实时查看进度和数据量
6. **验证数据**:在 Lindorm 执行 `SELECT COUNT(*) FROM your_table;` 与源库对比
### 注意事项
- 全量迁移期间请勿修改源库表结构
- 增量同步前提:MySQL 端必须开启 binlog(binlog_format=ROW)
- 建议在业务低峰期执行
---
## 方案B:LTS 迁移(HBase → Lindorm)
### 步骤
1. **开通 LTS 服务**:Lindorm 控制台 → 目标实例 → 数据生态服务 → 开通 LTS
2. **创建迁移任务**:LTS 操作页面 → 导入Lindorm/HBase → 一键迁移 → 创建任务
3. **配置源端与目标端**:源端 HBase 集群(需与 LTS 网络连通),目标端 Lindorm 宽表引擎;选择 ☑️ 表结构迁移 + ☑️ 历史数据迁移 + ☑️ 实时数据复制
4. **监控与验证**:实时查看迁移进度,LTS 数据抽样校验,业务验证后执行流量切换
### 注意事项
- 确保 LTS、源 HBase、目标 Lindorm 三者网络已打通
- 增量同步基于 HBase WAL,Bulkload 写入的数据不会被同步
- 迁移前确认目标实例存储空间充足
---
## 方案C:开源工具迁移
### C-1:Canal(MySQL → Lindorm 实时同步)
解析 MySQL binlog 实时同步到 Lindorm。
1. 部署 Canal Server
2. 配置 Canal 监听 MySQL binlog
3. 使用 Canal Adapter 将数据写入 Lindorm
参考:https://github.com/alibaba/canal
### C-2:Sqoop(Hadoop → Lindorm 批量导入)
> ⚠️ Lindorm 宽表引擎是MySQL兼容而非完整MySQL,Sqoop可能因DDL差异失败。
```bash
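# Choose the connection address by the instance ServiceType (see sql-client-guide.md →「连接域名格式」)
# The MySQL protocol port is 33060 on both V1 and V2
# -P prompts for the database password instead of putting it on the command line
sqoop export \
  --connect jdbc:mysql://<your-connection-address>:33060/default \
  --username <your-username> \
  -P \
  --table my_table \
  --export-dir /user/hive/warehouse/my_table \
  --input-fields-terminated-by '\t'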
```
### C-3:自定义脚本
适用于数据量小或特殊转换需求。示例:
1. 从源库导出数据(mysqldump、HBase Export)
2. 数据清洗和格式转换
3. 使用 Lindorm SDK 或 MySQL 协议批量写入
Python 示例(MySQL → Lindorm):
```python
import pymysql

# Lindorm MySQL-protocol endpoint: choose the address by the instance ServiceType
# (domain format: see sql-client-guide.md). The port is 33060 on both V1 and V2.
LINDORM_HOST = '<your-connection-address>'
LINDORM_PORT = 33060

# 1. Read rows from the source MySQL database
mysql_conn = pymysql.connect(host='...', user='...', password='...', database='source_db')
cursor = mysql_conn.cursor()
cursor.execute("SELECT * FROM source_table")

# 2. Write to Lindorm over the MySQL protocol
lindorm_conn = pymysql.connect(
    host=LINDORM_HOST,
    port=LINDORM_PORT,
    user='root',
    password='your-password',
    database='default'
)
lindorm_cursor = lindorm_conn.cursor()
for row in cursor:
    # One %s placeholder per column, so values are passed as parameters
    placeholders = ','.join(['%s'] * len(row))
    lindorm_cursor.execute(f"INSERT INTO target_table VALUES ({placeholders})", row)
lindorm_conn.commit()

# 3. Release both connections
lindorm_conn.close()
mysql_conn.close()
```
---
## 常见问题
| 问题 | 原因 | 解决方法 |
|------|------|---------|
| 连通性检查失败 | 网络不通/白名单未配置 | 确认同一VPC或已配公网;添加源库IP到Lindorm白名单 |
| 表结构不兼容 | MySQL类型无Lindorm对应 | ENUM→VARCHAR;TEXT/BLOB超2MB需拆分;必须有主键 |
| 增量同步断开 | MySQL binlog被清理 | 增大binlog保留时间;重新配置DTS任务 |
| 目标存储不足 | 数据量超预期 | 迁移前确认Lindorm存储空间(建议预留20GB+) |
---
## 关联场景
- 迁移后性能对比 → `monitoring-guide.md`
- 验证数据一致性 → `monitoring-guide.md` 查询写入指标
- 迁移任务监控 → LTS 控制台实时查看(暂无 API 支持)
FILE:references/02-ops/error-troubleshoot.md
# 错误排查场景
当用户遇到错误码、异常、报错信息时,按本指南执行。
## 触发条件
- "报错了:InvalidParameter.InstanceId"
- "连接失败:Connection refused"
- "为什么提示 AuthenticationFailed?"
- "SQL 报错:error code 1045"
- "这个错误是什么意思?"
## 核心原则
1. **Agent 提取关键信息**:错误码、错误含义、解决步骤
2. **提供可执行的排查步骤**,不只是抛出文档链接
3. **结合实例状态**,给出针对性建议
---
## 错误类型分类
| 错误类型 | 示例 | 处理方式 |
|---------|------|---------|
| **CLI/API 错误** | InvalidParameter, Forbidden.RAM, InstanceNotFound | 速查表一 |
| **连接错误** | Connection refused, Timeout, Retry exhausted | 速查表二 |
| **SQL 错误** | Error 1045, 1146, 1064, 3024 | 速查表三 |
| **性能/限流** | Quota exceeded, Too many connections | 速查表四 |
---
## 速查表一:CLI/API 错误码
| 错误码 | 含义 | 可能原因 | 解决方法 |
|--------|------|---------|---------|
| `InvalidParameter.InstanceId` | 实例ID无效 | ID格式错误、不存在、已释放 | 检查格式(应为 ld-xxx),用 `get-lindorm-instance --instance-id <id>` 确认是否存在 |
| `InstanceNotFound` | 实例不存在 | ID错误、已释放 | 用 `get-lindorm-instance --instance-id <id>` 确认 |
| `InstanceStatusInvalid` | 实例状态不支持操作 | 实例非运行中 | 等待实例变为 ACTIVATION |
| `InstanceLocked` | 实例被锁定 | 欠费、安全原因、运维中 | 检查账户余额、联系技术支持 |
| `QuotaExceeded` | 配额不足 | 超过地域配额限制 | 提交工单申请提额 |
| `Instance.IsNotValid` | 实例ID格式合法但不存在 | ID错误、已释放 | 通过 `get-instance-summary` 确认 |
| `InvalidAccessKeyId.NotFound` | AccessKey不存在 | AK ID错误 | 检查 AK 配置 |
| `SignatureDoesNotMatch` | 签名错误 | AK Secret错误 | 验证 Secret 正确性 |
| `Forbidden.RAM` | RAM权限不足 | 缺少Lindorm权限 | 添加 `AliyunLindormReadOnlyAccess` 权限 |
| `UnauthorizedOperation` | 未授权操作 | 缺少特定操作权限 | 联系主账号授权 |
| `Throttling.User` | 用户级别限流 | 请求频率过高 | 降低频率、添加重试机制 |
| `Throttling.System` | 系统级别限流 | 系统负载高 | 稍后重试 |
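
The "add a retry mechanism" advice for `Throttling.User` and `Throttling.System` is usually implemented as exponential backoff with jitter. The sketch below is generic rather than specific to the Aliyun SDK: `ThrottledError` is a hypothetical exception that the caller's own wrapper would raise when a response carries a `Throttling.*` error code, and the delay values are illustrative.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical: raised by the caller's wrapper on a Throttling.* error code."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying throttled calls with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # 1s, 2s, 4s, ... plus random jitter so clients do not retry in lockstep
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting.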
---
## 速查表二:连接错误
### 网络错误
| 错误信息 | 原因 | 排查步骤 |
|---------|------|---------|
| `Connection refused` | 端口不通 | 1. 检查白名单 2. 检查安全组 3. telnet 测试端口 |
| `Connection timeout` | 网络不可达 | 1. 确认网络类型(VPC/公网)2. 检查路由 3. ping 测试 |
| `Unknown host` | DNS解析失败 | 检查连接地址拼写 |
| `Network is unreachable` | 网络不通 | 确认 VPC 配置、检查跨 VPC/跨地域访问 |
### 认证错误(SQL层面)
| 错误信息 | 原因 | 解决方法 |
|---------|------|---------|
| `Authentication failed` (Error 1045) | 用户名/密码错误 | 检查密码,遗忘请去控制台重置 |
| `Access denied` (Error 1227) | 权限不足 | 检查用户权限配置 |
### HBase 客户端错误(官方连接报错)
| 错误信息 | 原因 | 解决方法 |
|---------|------|---------|
| `Retry exhausted when update config from seedserver` | 连接地址或端口错误(官方:HBase兼容端口为30020) | 确认端口正确(端口见 SKILL.md →「端口号速查表」),检查网络 |
| `Failed to connect to jdbc:lindorm:table:url=****` | SQL连接失败(⚠️ Avatica 协议仅存量维护) | 确认端口正确,建议迁移到 MySQL 协议 |
| `DoNotRetryIOException: Detect inefficient query` | 全表扫描被拦截(官方:WHERE条件列无索引) | 见 sql-usage-notes.md →「低效查询拦截」 |
> 官方连接排查顺序:白名单 → 安全组 → 网络类型匹配 → 公网/专线
### 时序引擎错误
| 错误信息 | 原因 | 解决方法 |
|---------|------|---------|
| `TSDB error: table not found` | 表不存在 | 检查表名 |
| `TSDB error: invalid timestamp` | 时间戳格式错误 | 使用毫秒级时间戳 |
| `Write limit exceeded` | 写入限流 | 降低写入频率、扩容 |
---
## 速查表三:SQL 错误码(官方参考)
以下为官方文档中的 MySQL 兼容错误码和 Lindorm 扩展错误码,完整列表见 https://help.aliyun.com/zh/lindorm/developer-reference/common-error-codes-reference
### 常见 MySQL 兼容错误码
| 错误码 | 含义 | 官方处理建议 |
|--------|------|------------|
| 1040 | 连接数过多(单节点) | 重新审视连接使用方式;2.7.0.0前上限1000,2.7.0.0+上限4000 |
| 1045 | 认证失败 | 确认认证信息是否正确 |
| 1049 | 未知 Database | 指定正确的 Database 名称 |
| 1050 | 表已存在 | 使用其他表名或 IF NOT EXISTS |
| 1054 | 未知列名 | 确认 SQL 中的列名是否实际存在 |
| 1064 | SQL语法错误 | 参考 SQL 语法文档校正 |
| 1146 | 表不存在 | 检查表名是否输入有误 |
### 常见 Lindorm 扩展错误码
| 错误码 | 含义 | 官方处理建议 |
|--------|------|------------|
| 3024 | 查询超时 | 重试;若仍超时联系技术支持 |
| 8005 | 表不存在 | 检查表名 |
| 8008 | 超出资源限制 | 查询:缩小时间范围/增加WHERE;写入:限制TPS |
| 9001 | 不支持的语法 | 对照SQL语法文档,避免使用不支持的语法 |
| 9003 | 认证方法不适用于旧版本用户 | 参考MySQL协议兼容文档,指定支持的认证方法 |
---
## 速查表四:监控指标异常
| 指标异常 | 可能原因 | 排查建议 | V2 注意 |
|---------|---------|---------|--------|
| `cpu_idle < 10%` | CPU使用率过高 | 查询慢查询日志、考虑扩容 | — |
| `mem_used_percent > 90%` | 内存紧张 | 检查缓存配置、扩容 | ⚠️ V2返回空,需用 `1 - mem_free/mem_total` 计算 |
| `storage_used_percent > 85%` | 存储空间不足 | 清理数据或扩容 | ⚠️ V2需用 `get-lindorm-v2-storage-usage` API |
| `read_rt_p99 > 1000ms` | 查询延迟高 | 分析慢查询、创建索引 | ⚠️ V2返回空 |
| `write_rt_p99 > 500ms` | 写入延迟高 | 检查写入模式、考虑扩容 | ⚠️ V2返回空 |
---
## 常见错误处理示例
### 示例:InvalidParameter.InstanceId
```
【错误分析】InvalidParameter.InstanceId
【错误含义】实例 ID 参数无效
【可能原因】
1. 实例 ID 格式不正确(应为 ld-xxx)
2. 实例不存在或已释放
【排查步骤】
步骤 1:验证 ID 格式 — 确认以 ld- 开头
步骤 2:确认是否存在 — aliyun hitsdb get-lindorm-instance --instance-id <id>
(API 自动定位地域,无需指定 --region)
【官方文档】https://help.aliyun.com/zh/lindorm/developer-reference/common-error-codes-reference
```
### 示例:认证失败(Error 1045)
```
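【Error analysis】Access denied / Authentication failed (Error 1045)
【Meaning】The database username or password is incorrect (SQL-layer authentication)
【Troubleshooting steps】
Step 1: Confirm the database username and password (if forgotten, reset the password in the console)
Step 2: If the failure comes from an aliyun CLI/API call rather than a SQL client, check RAM permissions and make sure AliyunLindormReadOnlyAccess is attached
Step 3: Verify the fix: reconnect with the SQL client; for CLI access, run aliyun hitsdb get-instance-summary
【Common mistakes】
❌ Using a RAM user that has not been granted Lindorm permissions (affects CLI/API calls)
❌ The instance ID is malformed or the instance does not exist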
```
---
## 官方文档
- **错误码参考**:https://help.aliyun.com/zh/lindorm/developer-reference/common-error-codes-reference
- **连接报错与解决**:https://help.aliyun.com/zh/lindorm/support/lindorm-connection-errors-and-solutions
- **连接问题与解决**:https://help.aliyun.com/zh/lindorm/support/lindorm-connection-issues-and-solutions
- **SQL FAQ**:https://help.aliyun.com/zh/lindorm/developer-reference/sql-faq
- **技术支持**:通过阿里云工单系统提交
---
## 关联场景
- 连接问题 → `connection-troubleshoot.md`
- 性能问题 → `monitoring-guide.md`
- SQL 使用 → `sql-usage-notes.md`
FILE:references/02-ops/instance-management.md
# 实例管理场景
覆盖实例查询(列表、详情、引擎、存储)与扩缩容知识告知。
## 触发条件
- "我有哪些 Lindorm 实例?"
- "列出 cn-shanghai 的所有实例"
- "ld-xxx 这个实例是什么配置?"
- "这个实例开了哪些引擎?"
- "磁盘还剩多少空间?"
- "实例存储快满了,怎么扩容?"
- "需要加配置,怎么操作?"
---
## 查询流程
### 流程 1:列出所有实例
**适用场景**:用户想查看某个地域下的所有实例,或不知道具体实例 ID。
**地域策略**:
- **默认行为**:用户未指定地域时,默认查询 `cn-shanghai`(华东2-上海),并**必须明确告知**"本次查询的是上海地域"
- **扩展查询**:用户说"所有地域/不确定/可能在其他地域"时,先执行 `get-instance-summary` 获取全地域概览,再按需逐地域查询
**执行命令**:
```bash
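# List the instances in a given region (--region is required)
aliyun hitsdb get-lindorm-instance-list --region cn-shanghai
# Get an instance overview across all regions (no --region needed)
aliyun hitsdb get-instance-summary
# List the regions in which Lindorm is available
aliyun hitsdb describe-regions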
```
**关键字段说明**:
| 字段 | 含义 | 常见值 |
|------|------|--------|
| InstanceId | 实例 ID | `ld-xxx` |
| InstanceAlias | 实例别名 | 用户自定义名称 |
| InstanceStatus | 实例状态 | `ACTIVATION`(运行中)<br>`CREATING`(创建中)<br>`STOPPED`(已停止) |
| PayType | 付费类型 | `POSTPAY`(按量)<br>`PREPAY`(包年包月) |
| RegionId | 地域 ID | `cn-shanghai` |
| ZoneId | 可用区 ID | `cn-shanghai-e` |
| NetworkType | 网络类型 | `vpc` |
---
### 流程 2:查询实例详情
**适用场景**:用户想了解某个实例的完整配置信息。
**执行命令**:
```bash
aliyun hitsdb get-lindorm-instance --instance-id <instance-id>
```
**参数说明**:
- `--instance-id`:实例 ID(必填)
- `--region`:地域 ID(可选,根据 instance-id 自动定位)
**关键字段说明**:
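| Category | Fields | Meaning |
|------|------|------|
| **Basic** | InstanceId / InstanceAlias / InstanceStatus / CreateTime / ExpireTime | Instance ID, alias, status, creation time, expiration time |
| **Network** | VpcId / VswitchId / NetworkType | VPC, vSwitch, network type |
| **Storage** | InstanceStorage / DiskCategory / DiskUsage / ColdStorage | Storage capacity (GB), disk type, usage (%), cold storage capacity |
| **Engine** | EngineList / EnableLTS / EnableSearch | Engine list, whether LTS (the data migration and synchronization service) is enabled, whether the search engine is enabled |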
---
### 流程 3:查询实例引擎列表
**适用场景**:用户想知道实例开了哪些引擎、每个引擎的规格和版本。
**执行命令**:
```bash
aliyun hitsdb get-lindorm-instance-engine-list --instance-id <instance-id>
```
**关键字段说明**:
| 字段 | 含义 |
|------|------|
| EngineType | 引擎类型,详见 SKILL.md →「引擎类型」 |
| Version | 当前版本 |
| LatestVersion | 最新可升级版本 |
| CpuCount | CPU 核心数 |
| MemorySize | 内存大小(GB) |
| CoreCount | 节点数量 |
---
### 流程 4:查询存储详情
**适用场景**:用户想了解存储使用情况、冷热分层。
**执行命令**(根据版本选择):
```bash
# V1 实例
aliyun hitsdb get-lindorm-fs-used-detail --instance-id <instance-id>
# V2 实例
aliyun hitsdb get-lindorm-v2-storage-usage --instance-id <instance-id>
```
**关键字段说明**:
**V1 实例**(`get-lindorm-fs-used-detail`):
| 字段 | 含义 |
|------|------|
| FsCapacity | 文件引擎总容量(bytes) |
| FsCapacityHot / FsCapacityCold | 热/冷存储容量(bytes) |
| FsUsedHot / FsUsedCold | 热/冷存储已使用(bytes) |
| FsUsedOnLindormTable | Lindorm 宽表已使用量 |
| FsUsedOnLindormTableData | 宽表数据量 |
| FsUsedOnLindormTableWAL | WAL 日志量 |
**V2 实例**(`get-lindorm-v2-storage-usage`):
| 字段 | 含义 |
|------|------|
| UsageByDiskCategory[] | 按磁盘类型的使用详情数组 |
| └ diskType | 磁盘类型(`PerformanceCloudStorage` / `CapacityCloudStorage`) |
| └ capacity | 容量(bytes) |
| └ used | 已使用(bytes) |
| └ usedLindormTable | 宽表已使用 |
| └ usedLindormTsdb | 时序已使用 |
| CapacityByDiskCategory[] | 按磁盘类别的容量信息数组 |
| └ category | 类别(`PERF_CLOUD_ESSD_PL1` / `REMOTE_CAP_OSS` 等) |
| └ capacity | 容量(GB) |
---
## 扩缩容知识
**⚠️ 只读 Skill 不执行扩缩容变更命令**,以下为知识告知,引导用户在控制台操作。
### 扩容方式对比
| 瓶颈类型 | 方案 | 生效时间 | 业务影响 |
|---------|------|---------|---------|
| 存储不足 | 存储扩容(在线) | 5-10 分钟 | 无影响 |
| QPS 不足 | 增加节点数(水平扩展) | 10-20 分钟 | 无影响 |
| 单查询延迟高 | 升级节点规格(垂直扩展) | ~30 分钟(滚动重启) | 建议低峰操作 |
操作路径:Lindorm 控制台 → 实例详情 → 变配
### 扩缩容约束
- 缩容需满足:已使用空间 < 目标容量
- 24 小时内最多变配 3 次,两次间隔至少 1 小时
- 扩容失败(库存不足)时可换可用区或换规格
### 官方文档
- 管理存储空间:https://help.aliyun.com/zh/lindorm/user-guide/manage-storage-space/
- 变更容量型云存储容量:https://help.aliyun.com/zh/lindorm/user-guide/expand-cold-storage
- 计费模式说明:https://help.aliyun.com/zh/lindorm/product-overview/billing
---
## 缺参处理
| 缺参 | 策略 |
|------|------|
| 缺 region | 默认查询 `cn-shanghai`,主动告知本次查询地域;用户说"所有地域"时先用 `get-instance-summary` |
| 缺 instance-id | 先列出实例列表,让用户选择 |
---
## 错误处理
| 错误 | 原因 | 引导 |
|------|------|------|
| 实例不存在 | 实例 ID 错误或已释放 | 用 `get-lindorm-instance-list` 确认实例 ID |
| 地域不匹配 | 实例在其他地域 | 提示用户指定正确地域 |
| 权限不足 | AK 无 Lindorm 权限 | 需要 `AliyunLindormReadOnlyAccess` 权限 |
---
## 关联场景
- 扩容前性能分析 → `monitoring-guide.md`
- 存储使用详情 → `storage-analysis.md`
- 扩容后监控设置 → `monitoring-guide.md`
FILE:references/02-ops/monitoring-guide.md
# 监控与报警场景
覆盖 Lindorm 监控数据查询、指标说明和报警配置。
## 触发条件
- "CPU 使用率多少?"
- "最近 3 小时的 CPU 趋势"
- "怎么配置报警?"
- "存储快满了,能设置自动通知吗?"
---
## 查询监控数据
### 查询最新数据
```bash
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name <metric-name> \
--dimensions '[{"instanceId":"<instance-id>"}]'
```
**返回值说明**:
- `Datapoints` 是 JSON **字符串**(不是数组),需要二次解析
- 每个数据点含 `instanceId`、`host`(节点名,如 table-1/search-1/zk-1)、`userId`、`timestamp`、`Average`/`Maximum`/`Minimum`
- 一个实例会返回**多个数据点**(每个节点一个),需要聚合或筛选 host
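
Because `Datapoints` arrives as a JSON string with one entry per node, a second parse and a grouping step are needed before the values can be compared. This is a minimal sketch assuming the response has already been loaded into a dict; the function name `average_by_host` is illustrative, and the field names `host` and `Average` are the ones listed above.

```python
import json
from collections import defaultdict


def average_by_host(response: dict) -> dict:
    """Parse the Datapoints JSON string and average the values per node."""
    # Second parse: Datapoints is a string, not a list
    points = json.loads(response.get("Datapoints") or "[]")
    grouped = defaultdict(list)
    for point in points:
        grouped[point.get("host", "unknown")].append(point["Average"])
    return {host: sum(values) / len(values) for host, values in grouped.items()}
```

Filtering the result by host prefix (for example `table-`) is one way to look at a single engine's nodes.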
### 查询历史趋势
```bash
aliyun cms describe-metric-data \
--namespace acs_lindorm \
--metric-name <metric-name> \
--dimensions '[{"instanceId":"<instance-id>"}]' \
--start-time "<start-time>" \
--end-time "<end-time>" \
--period <period>
```
**返回值说明**:
- `Datapoints` 同样是 JSON 字符串,需二次解析
- 按 period 聚合后无 `host` 维度,返回全实例级别的聚合值
- 每个数据点含 `timestamp`、`Average`/`Maximum`/`Minimum`
**参数**:
- `--namespace`:固定 `acs_lindorm`
- `--metric-name`:见下方指标分类,V2 实例部分指标无数据或需换算
- `--dimensions`:JSON 数组格式,必须指定 `instanceId`
- `--start-time` / `--end-time`:时间格式见 SKILL.md →「时间格式」
**period 建议**:
| 时间范围 | period | 说明 |
|---------|--------|------|
| ≤ 1 小时 | 60 | 1 分钟粒度 |
| ≤ 24 小时 | 300 | 5 分钟粒度 |
| > 24 小时 | 3600 | 1 小时粒度 |
**相对时间处理**:用户说"最近 3 小时"时,先执行 `date "+%Y-%m-%d %H:%M:%S"` 获取当前时间,再计算 start/end。默认时间范围:最近 1 小时。
**数据保留**:云监控最多保留 30 天,更长周期需开通日志服务存储。
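
The relative-time handling and the period table above can be combined into one helper. This is a sketch, not part of the CLI: the function name `metric_window` is an illustrative assumption, and the timestamp format follows the `date "+%Y-%m-%d %H:%M:%S"` convention used in this guide.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def metric_window(hours: float = 1.0, now: Optional[datetime] = None) -> Tuple[str, str, int]:
    """Return (start-time, end-time, period) for a 'last N hours' request."""
    end = now or datetime.now()
    start = end - timedelta(hours=hours)
    # Period recommendation from the table above
    if hours <= 1:
        period = 60      # 1-minute granularity
    elif hours <= 24:
        period = 300     # 5-minute granularity
    else:
        period = 3600    # 1-hour granularity
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT), period
```

The three returned values map to `--start-time`, `--end-time`, and `--period`; the default of one hour matches the default time range stated above.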
---
## 指标说明
> 统一使用**无前缀指标**(如 `cpu_idle`),兼容性最佳。`lindorm_multi_` 前缀指标仅 `lindorm_multizone` 实例可用,其他实例类型不可用。
### CPU 指标
| 指标 | 描述 | 单位 | 正常范围 | 告警阈值 |
|------|------|------|---------|---------|
| `cpu_idle` | CPU 空闲率 | % | > 20% | < 20% |
| `cpu_user` | CPU 用户使用率 | % | < 80% | > 80% |
| `cpu_system` | CPU 系统使用率 | % | < 30% | > 30% |
| `cpu_wio` | CPU IO 等待 | % | < 30% | > 30% |
CPU 空闲率 < 20%:CPU 瓶颈;cpu_wio > 30%:磁盘 IO 瓶颈
> **CPU + Load 综合判断**:Load > CPU 核数 = 处理排队、亚健康状态;CPU 利用率不高但 Load 偏高 = 磁盘使用率过高(与 cpu_wio 配合判断)
### 内存指标
| 指标 | 描述 | 单位 | V1 | V2 |
|------|------|------|----|----|
| `mem_used_percent` | 内存使用率 | % | ✅ | ⚠️ 无数据 |
| `mem_total` | 内存总量 | bytes | ✅ | ✅ |
| `mem_free` | 空闲内存 | bytes | ✅ | ✅ |
| `mem_buff_cache` | 缓存大小 | bytes | ✅ | ✅ |
For V2 instances, `mem_used_percent` returns no data, so compute the memory usage percentage as `(1 - mem_free / mem_total) × 100`.
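
As a worked example of the V2 memory calculation, the sketch below derives a usage percentage from `mem_free` and `mem_total` (both in bytes). The function name is an illustrative assumption.

```python
def memory_used_percent(mem_free: float, mem_total: float) -> float:
    """Memory usage in percent, derived from the mem_free and mem_total metrics."""
    if mem_total <= 0:
        raise ValueError("mem_total must be positive")
    return (1 - mem_free / mem_total) * 100


# 4 GiB free out of 16 GiB total
print(memory_used_percent(4 * 1024**3, 16 * 1024**3))
```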
> **Java 内存特性**:Lindorm 与 HBase 均为 Java 实现,70%-85% 内存使用率是健康常态(JVM BlockCache/MemStore/HDFS 页缓存),92% 以上才需关注。不建议对 80% 设置报警。
>
> **宽表引擎堆内存报警**:宽表计算节点内存使用比率 ≥ 85%~90% 且持续 30~60 分钟时建议报警。堆内存短期波动是正常的(GC 会回收),只有持续过高才需关注。
### QPS 指标
| 指标 | 描述 | 单位 |
|------|------|------|
| `read_ops` | 读请求量 | ops/s |
| `write_ops` | 写请求量 | ops/s |
| `get_num_ops` | Get 每秒请求数 | ops/s |
| `put_num_ops` | Put 每秒请求数 | ops/s |
| `scan_num_ops` | Scan 每秒请求数 | ops/s |
> **QPS 指标解读**:
> - BatchGet 无论包含多少行都只算一次点查调用,因此使用 BatchGet 时 avg RT 会高于单行 Get
> - Scan 请求会被拆分为子调用以流式返回,Scan QPS ≠ 实际 Scan 请求数,Scan RT 是每个子调用的平均耗时
### 延迟指标
| 指标 | 描述 | 单位 | 正常范围 | 告警阈值 | V2 |
|------|------|------|---------|---------|-----|
| `read_rt` | 读平均延迟 | ms | < 10ms | > 50ms | ✅ |
| `write_rt` | 写平均延迟 | ms | < 10ms | > 50ms | ✅ |
| `get_rt_avg` | Get 平均延迟 | ms | < 10ms | > 50ms | ✅ |
| `put_rt_avg` | Put 平均延迟 | ms | < 10ms | > 50ms | ✅ |
| `get_rt_p99` | Get P99 延迟 | ms | < 50ms | > 100ms | ⚠️ 无数据 |
| `put_rt_p99` | Put P99 延迟 | ms | < 50ms | > 100ms | ⚠️ 无数据 |
| `scan_rt_avg` | Scan 平均延迟 | ms | < 50ms | > 200ms | ✅ |
| `scan_rt_p99` | Scan P99 延迟 | ms | < 200ms | > 500ms | ✅ |
### 存储指标
| 指标 | 描述 | 单位 | 告警阈值 | V2 |
|------|------|------|---------|-----|
| `hot_storage_used_percent` | 热存储使用率 | % | > 80% | ⚠️ 无数据 |
| `hot_storage_used_bytes` | 热存储使用量 | bytes | — | ✅ |
| `cold_storage_used_percent` | 冷存储使用率 | % | > 80% | ⚠️ 无数据 |
| `cold_storage_used_bytes` | 冷存储使用量 | bytes | — | ✅ |
| `storage_used_percent` | 总存储使用率 | % | > 80% | ⚠️ 无数据 |
| `storage_used_bytes` | 总存储使用量 | bytes | — | ✅ |
V2 存储使用率需通过 `get-lindorm-v2-storage-usage` API 获取,云监控无数据。
> ⚠️ **存储硬限制**:热存储或冷存储水位 ≥ 95% 时,系统自动禁止数据写入。建议容量告警线设置在 75%~80%,提前扩容避免影响业务。
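
For V2 instances, the storage watermark has to be derived from the `get-lindorm-v2-storage-usage` output. The sketch below assumes the response has been parsed into a dict with a `UsageByDiskCategory` list whose items carry `diskType`, `capacity`, and `used` in bytes, as described in instance-management.md; the function name and the level labels are illustrative, and the 95% and 80% lines are the values stated above.

```python
WRITE_BLOCK_LINE = 95.0  # at or above this level the system blocks writes


def storage_watermarks(usage_response: dict, alert_line: float = 80.0) -> list:
    """Return (diskType, used percent, level) for each disk category."""
    rows = []
    for item in usage_response.get("UsageByDiskCategory", []):
        capacity = item.get("capacity", 0)
        if not capacity:
            continue  # skip categories without provisioned capacity
        percent = item.get("used", 0) * 100 / capacity
        if percent >= WRITE_BLOCK_LINE:
            level = "write_blocked"
        elif percent >= alert_line:
            level = "alert"
        else:
            level = "ok"
        rows.append((item.get("diskType"), round(percent, 1), level))
    return rows
```

Lowering `alert_line` to 75 gives more lead time for expansion, in line with the 75% to 80% recommendation.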
### 网络与负载指标
| 指标 | 描述 | 单位 |
|------|------|------|
| `bytes_in` | 网络流入 | bytes/s |
| `bytes_out` | 网络流出 | bytes/s |
| `load_one` | 1 分钟平均负载 | — |
| `load_five` | 5 分钟平均负载 | — |
| `handler_queue_size` | Handler 队列长度 | — |
| `compaction_queue_size` | Compaction 队列长度 | — |
> **Compaction 队列判断**:长期稳定在某值 = 健康(白天高峰积压、晚上低谷自动处理是正常周期);持续上涨且无下降趋势 = 资源不足需扩容。长期积压会导致文件数增多、读 RT 升高,严重时出现反压写导致写 RT 增加甚至超时。
> **网络/磁盘限流**:不同规格实例有不同的网络带宽和磁盘 IOPS 上限(参见[实例规格族](https://help.aliyun.com/zh/lindorm/product-overview/instance-types)和[块存储性能](https://help.aliyun.com/zh/block-storage/product-overview/block-storage-performance)),读写流量超过云盘带宽上限会限流导致业务受损。
### 引擎专用指标
| 引擎 | 指标前缀 | 关键指标 |
|------|---------|---------|
| LSQL(宽表SQL) | `lql_` | `lql_select_ops`、`lql_upsert_ops`、`lql_select_avg_rt`、`lql_select_p99_rt`、`lql_connection` |
| 搜索引擎 | `search_` | `search_cpu_idle`、`search_mem_used_percent`、`search_select_1minRate`、`search_select_p99_rt` |
| 时序引擎 | `tsdb_` | `tsdb_datapoints_added`、`tsdb_disk_used`、`tsdb_jvm_used_percent` |
| 文件引擎 | `Lindorm_File_` | `Lindorm_File_ReadBandwidth`、`Lindorm_File_WriteBandwidth`、`Lindorm_File_ReadLatency`、`Lindorm_File_WriteLatency` |
引擎专用指标需对应引擎已开通才返回数据。完整 323 个指标可通过 `aliyun cms describe-metric-meta-list --namespace acs_lindorm` 查询。
### 业务词到指标映射
| 用户说 | 指标 |
|--------|------|
| CPU 使用率 | `cpu_idle`(空闲率,使用率 = 100 - cpu_idle) |
| 内存使用率 | `mem_used_percent`(V1)/ `1 - mem_free / mem_total`(V2) |
| QPS | `read_ops` + `write_ops` |
| 延迟/RT | `read_rt` 或 `write_rt` |
| P99 延迟 | `get_rt_p99` / `put_rt_p99`(V1)/ —(V2 无数据) |
| 存储使用率 | `hot_storage_used_percent`(V1)/ `get-lindorm-v2-storage-usage` API(V2) |
| 网络流量 | `bytes_in` + `bytes_out` |
| 负载 | `load_one` |
---
## 宽表引擎指标解读
### Region 数量建议
每个 Region 都占用元数据内存,数量过多会导致内存不足。
| 机器内存 | 单机建议 Region 数 |
|---------|-------------------|
| 8 GB | < 500 |
| 16 GB | < 1000 |
| 32 GB | < 2000 |
| 64 GB | < 3000 |
| 128 GB | < 5000 |
可通过宽表计算节点内存使用量 / 内存总量判断是否存在内存不足问题。减少 Region 数的方法:减少表数量、减少建表时预分区个数。
### 其他宽表指标解读
| 指标 | 含义 | 判断 |
|------|------|------|
| HandlerQueue 长度 | 请求排队数 | > 0 = 请求排队,服务器资源无法承载当前请求量,建议升级配置增加 CPU |
| Compaction 队列长度 | Compaction 任务排队数 | 长期稳定某值 = 健康;持续上涨 = 资源不足需扩容。长期积压会导致文件数增多、读 RT 升高,严重时反压写导致写 RT 增加甚至超时 |
| Region 平均文件数 | 分片内平均文件数 | 越多读 RT 越大;文件总数过多可能导致 Full GC 或 OOM |
| Region 最大文件数 | 单 Region 最大文件数 | 超限会反压写导致写超时(具体限制见[数据请求的限制](https://help.aliyun.com/zh/lindorm/product-overview/quotas-and-limits#p-fq0-foz-ocy)) |
| 写流量(KB/s) | 写入宽表引擎的流量 | 列转 KeyValue 形式,实际写入量大于业务写入量。写入吞吐过大可能导致 Compaction 积压 |
### 写入吞吐建议
写入吞吐过大可能导致 Compaction 积压,影响实例稳定性。
| 配置 | 建议写入吞吐 |
|------|-------------|
| 4C8G | < 5 MB/s |
| 8C16G | < 10 MB/s |
| 16C32G | < 30 MB/s |
| 32C64G | < 60 MB/s |
实际使用时结合 `compaction_queue_size` 和 Region 平均文件数综合判断。
---
## 报警配置
**⚠️ 安全边界:Agent 只查询数据、提供建议,不直接创建报警规则**(涉及联系人手机号/邮箱等敏感信息且需验证)
### 报警规则类型
| 类型 | 说明 | 示例 |
|------|------|------|
| **阈值报警** | 指标数值越界触发 | CPU 使用率 > 80% 持续 5 分钟 |
| **事件报警** | 系统事件触发 | NodeDown(节点宕机)、RegionError(Region 异常) |
事件报警无法通过指标阈值捕获,是运维关键能力。
### 配置路径
1. **Lindorm 控制台**:实例详情 → 告警管理 → 创建报警规则
2. **云监控控制台**:报警服务 → 报警规则 → 创建报警规则 → 选择产品"Lindorm"
### 报警规则关键参数
| 参数 | 说明 | 可选值 |
|------|------|--------|
| 统计周期 | 数据聚合粒度 | 60s / 300s / 900s / 1800s / 3600s / 86400s |
| 统计方法 | 数据聚合方式 | Average / Maximum / Minimum / Sum |
| 比较运算符 | 阈值比较方式 | >= / > / <= / < / != |
| 连续次数 | 连续满足条件的次数 | 1-100(推荐 1-3) |
| 静默期 | 触发后的通知静默时间 | 5min - 24h |
| 报警级别 | 三级独立阈值 | Critical / Warn / Info |
> **cpu_idle 报警方向**:cpu_idle 是空闲率,报警条件应设为 `cpu_idle ≤ 阈值`(即使用率 ≥ 100 - 阈值)
### 推荐阈值
| 指标 | V1 | V2 | Info(提醒) | Warn(警告) | Critical(严重) |
|------|----|----|-------------|-------------|----------------|
| CPU 使用率 | `cpu_idle ≤ 30` | `cpu_idle ≤ 30` | ≤ 30%(持续 30min) | ≤ 20%(持续 5min) | ≤ 5%(持续 3min) |
| 内存使用率 | `mem_used_percent` | `1 - mem_free/mem_total` | ≥ 85%(持续 30min) | ≥ 92%(持续 5min) | — |
| 热存储使用率 | `hot_storage_used_percent` | `get-lindorm-v2-storage-usage` API | ≥ 70% | ≥ 80% | ≥ 90% |
| P99 延迟 | `get_rt_p99` | —(无数据) | ≥ 50ms | ≥ 100ms | ≥ 200ms |
> V2 存储报警:云监控无 `hot_storage_used_percent` / `storage_used_percent` 数据,需通过 `get-lindorm-v2-storage-usage` API 获取,或在控制台配置基于存储空间的报警规则。
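
Because `cpu_idle` is an idle rate, the comparison direction in the recommended-threshold table is easy to get backwards. The sketch below encodes the three `cpu_idle` levels from the table for a single sample; it deliberately ignores the duration conditions (30, 5, and 3 minutes), which an alert rule would enforce, and the function name is an illustrative assumption.

```python
from typing import Optional, Tuple


def cpu_alert_level(cpu_idle: float) -> Tuple[float, Optional[str]]:
    """Return (CPU usage in percent, alert level or None) for one cpu_idle sample."""
    usage = 100 - cpu_idle
    if cpu_idle <= 5:
        level = "Critical"
    elif cpu_idle <= 20:
        level = "Warn"
    elif cpu_idle <= 30:
        level = "Info"
    else:
        level = None  # above every threshold: no alert
    return usage, level


print(cpu_alert_level(15))  # usage 85, level "Warn"
```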
### 通知渠道
| 渠道 | 说明 | 适用级别 |
|------|------|---------|
| 短信 | 手机号需验证 | Warn / Critical |
| 邮件 | 邮箱需验证 | Info / Warn / Critical |
| 钉钉 | 群机器人 Webhook | Warn / Critical |
| 电话 | 语音报警 | Critical |
| Webhook | 自定义 HTTP 回调 | 全级别 |
| 阿里云 App | App 内推送 | Info / Warn / Critical |
联系人管理路径:云监控控制台 → 通知管理 → 联系人/联系组
### 常见报警问题
- **未收到通知**:检查联系人是否已验证、报警条件是否真正触发、通知时段是否覆盖、通知策略是否关联规则
- **报警过多**:提高阈值、增加持续时间、配置静默期(5min-24h)、使用复合条件(多指标同时满足才报警)
---
## 错误处理
| 错误 | 原因 | 引导 |
|------|------|------|
| 无数据点 | 时间范围内实例未运行或无流量 | 调整时间范围或检查实例状态 |
| 权限不足 | AK 无云监控权限 | 需要 `AliyunCloudMonitorReadOnlyAccess` |
| 指标不存在 | 指标名错误或该实例类型不支持 | 先验证指标可用性 |
---
## 官方文档
- 报警配置:https://help.aliyun.com/zh/lindorm/user-guide/manage-alerts
- 创建报警规则:https://help.aliyun.com/zh/lindorm/user-guide/create-an-alert-rule
- 监控指标说明:https://help.aliyun.com/zh/lindorm/user-guide/instance-monitoring
- 监控报警最佳实践:https://help.aliyun.com/zh/lindorm/user-guide/monitoring-alarm-best-practices
---
## 关联场景
- 性能诊断后排查具体慢SQL → `slow-query-analysis.md`
- 存储分析 → `storage-analysis.md`
- 扩缩容 → `instance-management.md`
FILE:references/02-ops/slow-query-analysis.md
# 慢查询分析场景
当用户询问"有没有慢查询"、"查询慢是什么原因"、"怎么优化查询"、"怎么开启慢日志"时,按本指南执行。
## 触发条件
用户的典型表达:
- "有没有慢查询日志?"
- "怎么开启慢日志/慢查询记录"
- "查询慢,帮我分析原因"
- "Top 10 最慢的查询"
- "怎么优化查询性能?"
## 核心原则
**证据驱动的诊断流程**:
1. **先收集最小必要信息**(引擎类型、时间窗口、操作类型)
2. **用监控确认现象**(延迟指标趋势)
3. **要求用户提供证据**(慢 SQL、执行计划、慢日志)
4. **基于证据给针对性建议**(而非通用优化大全)
5. **提供可执行的优化代码**(从官方文档提取最佳实践)
> ⚠️ **禁止行为**:在缺少具体慢 SQL/执行计划/慢日志证据时,直接给出"5 类优化方案"或概率判断。
---
## 慢查询日志管理
> **版本要求**:
> - 实时定位/终止慢查询:宽表引擎 **2.6.3+**,Lindorm SQL **2.6.6+**
> - 慢查询回溯(慢日志记录):宽表引擎 **2.8.2.13+**
>
> 以下 SQL 均通过 Lindorm SQL 客户端执行(MySQL Workbench / DBeaver / Navicat / ClusterManager 均可),**立即生效,无需重启实例**。
### 1. 实时定位当前慢查询
```sql
-- 查看当前正在执行的所有查询(含查询 ID / UUID)
SHOW PROCESSLIST;
```
**说明**:返回结果中的 `ID` 字段为 UUID,用于后续终止操作。
### 2. 终止慢查询
```sql
-- 终止指定查询(ID 从 SHOW PROCESSLIST 获取)
KILL QUERY '581f9ab8-68af-4c93-b73a-eb99679ed192';
```
### 3. 开启慢查询回溯(Slow Log 记录)
**Step 1:开启功能**
```sql
ALTER SYSTEM SET SLOW_QUERY_RECORD_ENABLE = true;
```
**Step 2:设置慢查询阈值(单位:毫秒)**
```sql
-- 示例:超过 10 秒记录(按业务需要调整,不建议设置过小)
ALTER SYSTEM SET SLOW_QUERY_TIME_MS = 10000;
```
**Step 3:查询慢日志视图**
```sql
-- 查看最近 10 条慢查询
SELECT * FROM lindorm._slow_query_ LIMIT 10;
-- 按时间筛选(query_start_time 为 Unix 毫秒时间戳)
SELECT COUNT(sql_query_s) AS num
FROM lindorm._slow_query_
WHERE query_start_time >= 1680152319000;
```
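
Since `query_start_time` is a Unix timestamp in milliseconds, a relative window such as "the last 10 minutes" has to be converted before it can be used in the filter. This is a small sketch that only builds the SQL text shown above; the helper name is an illustrative assumption, and running the statement is left to the SQL client.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def slow_query_count_sql(minutes: int, now: Optional[datetime] = None) -> str:
    """Build the slow-log count query for the last `minutes` minutes."""
    now = now or datetime.now(timezone.utc)
    # Convert the window start to a Unix timestamp in milliseconds
    start_ms = int((now - timedelta(minutes=minutes)).timestamp() * 1000)
    return (
        "SELECT COUNT(sql_query_s) AS num FROM lindorm._slow_query_ "
        f"WHERE query_start_time >= {start_ms};"
    )
```

Given the default one-hour retention noted in this section, windows longer than 60 minutes are unlikely to return additional rows.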
**慢日志视图字段说明**:
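| Field | Description |
|--------|------|
| `query_start_time` | Time the query request was issued (Unix timestamp in milliseconds) |
| `query_id` | ID of the query request |
| `sql_query_id` | ID of the SQL statement in the query request (empty if the request has no SQL statement) |
| `sql_query_s` | Text of the SQL statement (empty if the request has no SQL statement) |
| `duration_i` | Execution time of the query |
| `status_s` | Final status of the query (succeeded/failed) |
| `ip_s` | IP address that sent the query request |
| `server_s` | Node on which the query ran |
| `query_s` | Internal query request statement that was executed |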
> ⚠️ **注意事项**:
> - 慢日志记录默认只保留 **1 小时**
> - 频繁记录慢查询对实例性能有一定影响,性能敏感场景不宜将阈值设置过小
> - 视图名 `lindorm._slow_query_` 固定,不可修改
**官方文档**:https://help.aliyun.com/zh/lindorm/developer-reference/slow-query-diagnostics
---
## 执行流程
### Step 0:强制收集最小信息(缺参则追问,不下结论)
**必须确认的信息**:
| 信息项 | 为什么必须 | 追问话术 |
|--------|-----------|---------|
| **引擎类型** | 宽表 SQL / HBase API / 时序 / 搜索引擎的优化策略完全不同 | "您使用的是哪个引擎?宽表 SQL、HBase API、时序引擎还是搜索引擎?" |
| **发生时间窗口** | 需要确定查询趋势和关联事件 | "慢查询大概从什么时候开始?持续多久了?" |
| **慢的操作类型** | 读/写/Scan/聚合的排查方向不同 | "是查询慢还是写入慢?具体的操作类型是什么?" |
| **具体 SQL/操作**(如有) | 没有具体语句无法给出针对性建议 | "能否提供一条具体的慢 SQL 或操作代码?" |
**缺参时的标准回复**:
```
我需要先了解一些信息,才能帮您准确诊断:
1. 您使用的是哪个引擎?(宽表 SQL / HBase API / 时序 / 搜索)
2. 慢查询大概从什么时候开始?
3. 是读操作慢还是写操作慢?
4. 能否提供一条具体的慢 SQL 或操作示例?
有了这些信息,我可以帮您:
- 查询相关监控指标确认问题
- 分析具体的执行计划
- 给出针对性的优化建议
```
---
### Step 1:用监控确认现象(avg vs p99 + 趋势)
**Agent 先查询延迟指标**:
```bash
# P99 延迟(更能反映慢请求)
aliyun cms describe-metric-data \
--namespace acs_lindorm \
--metric-name get_rt_p99 \
--dimensions '[{"instanceId":"<instance-id>"}]' \
--start-time "<start 格式: YYYY-MM-DD HH:MM:SS>" \
--end-time "<end 格式: YYYY-MM-DD HH:MM:SS>" \
--period 60
# 平均延迟
aliyun cms describe-metric-data \
--namespace acs_lindorm \
--metric-name read_rt \
--dimensions '[{"instanceId":"<instance-id>"}]' \
--start-time "<start 格式: YYYY-MM-DD HH:MM:SS>" \
--end-time "<end 格式: YYYY-MM-DD HH:MM:SS>" \
--period 60
```
**Agent 分析后给客观描述**(不下结论、不给概率):
```
【延迟指标观察】实例 ld-xxx(最近 1 小时)
- 平均延迟:35ms
- P99 延迟:180ms
- P99 与平均值差距:5.1 倍
【客观描述】
延迟分布存在明显长尾,部分请求耗时显著高于平均水平。
【下一步】
要确定具体原因,需要查看具体的慢查询证据。请选择以下任一方式:
方式 1(推荐):提供具体慢 SQL
- 如果您知道哪条 SQL 慢,请提供 SQL 文本
- 我可以帮您分析执行计划
方式 2:查看 ClusterManager 慢查询日志
- 路径:控制台 → 实例 → ClusterManager → 查询分析 → 慢查询日志
- 找到 Top 慢查询后告诉我查询特征
方式 3:提供应用侧观察
- 您的应用日志中是否有具体报错或超时的操作?
```
---
### Step 2:要求用户提供证据(满足其一即可)
**证据类型**(至少提供一种):
| 证据类型 | 适用场景 | 如何获取 |
|---------|---------|---------|
| **慢 SQL + EXPLAIN** | 宽表 SQL 慢查询 | 用户提供 SQL 文本,Agent 指导执行 EXPLAIN |
| **ClusterManager 慢日志** | 所有引擎 | 控制台 → ClusterManager → 查询分析 → 慢查询日志 |
| **HBase Scan/Get 参数** | HBase API 慢查询 | 用户提供代码中的 Scan/Get 配置 |
| **应用错误日志** | 有明确报错场景 | 应用侧的超时/错误日志 |
**未提供证据时的标准回复**:
```
目前我能看到延迟分布存在长尾,但无法确定具体原因。
请提供以下任一证据,我才能给出针对性优化建议:
选项 A:一条具体的慢 SQL(如果您知道是哪条)
选项 B:ClusterManager 中的 Top 慢查询截图/文本
选项 C:您的应用日志中相关的错误信息
⚠️ 注意:在没有具体证据的情况下,我不建议直接应用通用优化方案,
因为不同原因(全表扫描、热点、大查询等)的优化方向可能完全不同。
```
---
### Step 3:基于证据分类输出(只给命中的 1-2 条策略)
**收到证据后,根据证据类型选择对应优化方向**:
| 证据特征 | 问题类型 | 优化方向 |
|---------|---------|----------|
| EXPLAIN 显示 TABLE FULL SCAN,扫描行数大(> 100万) | 全表扫描 | 添加 RowKey 范围条件、创建二级索引 |
| 单行查询耗时长,返回数据量大(> 1MB) | 查询大量列 | 只 SELECT 需要的列,避免 `SELECT *` |
| 少量 RowKey 延迟极高,CPU 节点分布不均 | RowKey 热点 | RowKey 加盐/反转分散,创建全局索引 |
| 批量 Get/Scan 超时或内存溢出 | 大数据量批量查询 | 分批查询(batchSize 50-100),SQL 用 LIMIT 分页 |
| 查询非主键列性能极差,随数据量线性变慢 | 缺少索引 | 创建二级索引(`CREATE INDEX`)或全局索引(`CREATE GLOBAL INDEX`) |
> 根据用户提供的证据,**只输出命中的 1-2 条策略**,并给出对应的可执行 SQL/代码示例。
---
### Step 4:告知用户如何获取证据(如用户无法提供)
**给出精确路径,不让用户自己摸索**:
```
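【View the slow query log】
To find out **which specific query is the slowest**, check the slow query log in ClusterManager:
📍 Exact path:
1. Log in to the console: https://lindorm.console.aliyun.com/
2. Click the instance ID "ld-xxx" to open the instance details
3. Left-side menu: Database Connections (数据库连接) → click "Access through ClusterManager" (通过 ClusterManager 访问)
4. In ClusterManager, go to Query Analysis (查询分析) → Slow Query Log (慢查询日志)
5. Review:
- The top 10 slowest queries
- Query duration and number of rows scanned
- When each query ran
【After you have found the slow queries】
Tell me the query type and its characteristics (Scan/Get/RowKey pattern), and I will give a targeted optimization plan.
📚 ClusterManager documentation:
https://help.aliyun.com/zh/lindorm/user-guide/log-in-to-the-cluster-management-system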
```
---
### Step 5:综合性能诊断(有证据后)
**Agent 结合延迟指标 + CPU/内存/QPS,给出综合诊断**:
```bash
# 查询 CPU 空闲率
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name cpu_idle \
--dimensions '[{"instanceId":"<instance-id>"}]'
# 查询内存使用率
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name mem_used_percent \
--dimensions '[{"instanceId":"<instance-id>"}]'
# 查询 QPS
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name read_ops \
--dimensions '[{"instanceId":"<instance-id>"}]'
```
**有证据后的综合分析示例**:
```
【综合诊断】实例 ld-xxx(最近 1 小时)
📊 性能指标:
- P99 延迟:180ms
- 平均延迟:35ms
- P99/avg 倍数:5.1x
- CPU 空闲率:65%
- 内存使用率:55%
🔍 用户提供的证据:
- 引擎类型:宽表 SQL
- 慢 SQL:SELECT * FROM orders WHERE status = 'pending'
- EXPLAIN 显示:TABLE FULL SCAN
✅ 诊断结论:
全表扫描导致的慢查询。表 orders 没有 status 列的索引,
查询需要扫描全表。
💡 针对性优化方案:
1. 【推荐】创建二级索引:
CREATE INDEX idx_status ON orders(status) INCLUDE (order_id, amount);
2. 【临时】限制返回行数:
SELECT * FROM orders WHERE status = 'pending' LIMIT 100;
3. 【长期】如 status 选择性低,考虑使用搜索索引支持多条件查询
请确认优化方案后,我可以提供完整的执行语句。
```
---
## 优化方案参考
当用户提供了具体证据后,Agent 从以下官方文档提取针对性优化建议:
| 证据类型 | 参考文档 | 提取内容 |
|---------|---------|---------|
| 全表扫描 | [SQL 常见问题](https://help.aliyun.com/zh/lindorm/developer-reference/sql-faq) | 低效查询规避、索引使用、最左匹配原则 |
| 热点问题 | [如何设计宽表主键](https://help.aliyun.com/zh/lindorm/user-guide/how-to-design-the-rowkey-field) | 加盐、反转、分区策略、避免热点 |
| 大查询 | [游标分页](https://help.aliyun.com/zh/lindorm/introduction-to-the-use-of-cursor-paging) | LIMIT 分页、游标分页、避免内存溢出 |
| 索引选择 | [二级索引](https://help.aliyun.com/zh/lindorm/user-guide/high-performance-native-secondary-indexes) | 二级索引 vs 搜索索引选择 |
> **原则**:只提取与用户证据匹配的方案,不要一次性给出所有优化方案。
---
## 官方文档索引(供 Agent 参考)
| 场景 | 官方文档 | 预期提取内容 |
|------|---------|-------------|
| **慢查询诊断** | [慢查询诊断](https://help.aliyun.com/zh/lindorm/developer-reference/slow-query-diagnostics) | 慢查询定位方法、诊断工具、优化建议 |
| **二级索引** | [二级索引](https://help.aliyun.com/zh/lindorm/user-guide/high-performance-native-secondary-indexes) | 索引类型、创建语法、使用场景 |
| **全局索引** | [CREATE INDEX](https://help.aliyun.com/zh/lindorm/developer-reference/te-create-index) | GLOBAL 关键字、索引类型区别、语法参数 |
| **主键/RowKey 设计** | [如何设计宽表主键](https://help.aliyun.com/zh/lindorm/user-guide/how-to-design-the-rowkey-field) | 加盐、反转、分区策略、避免热点 |
| **索引创建总览** | [CREATE INDEX 索引对比](https://help.aliyun.com/zh/lindorm/developer-reference/te-create-index) | 二级索引 vs 搜索索引 vs 列存索引选择 |
| **ClusterManager** | [ClusterManager 使用](https://help.aliyun.com/zh/lindorm/user-guide/log-in-to-the-cluster-management-system) | 访问路径、慢查询日志查看 |
FILE:references/02-ops/storage-analysis.md
# 存储分析场景
当用户关心 Lindorm 实例的存储使用情况、冷热数据分布、存储增长趋势时,按本指南执行。
## 触发条件
用户的典型表达:
- "磁盘还剩多少?"
- "存储快满了吗?"
- "冷热存储分别用了多少?"
- "存储增长趋势如何?"
- "ld-xxx 使用了多少存储?"
- "热存储使用率是多少?"
## 核心能力
- **存储详情查询**:总容量、已用容量、冷热分布
- **存储使用率监控**:热存储/冷存储使用率趋势
- **存储增长分析**:历史趋势、增长速率
- **冷热分层建议**:何时启用冷存储、冷热数据迁移策略
## 执行流程
### 流程 1:获取存储详情(快照数据)
**适用场景**:用户想快速了解当前存储使用情况。
**执行命令**:
```bash
# V1 实例
aliyun hitsdb get-lindorm-fs-used-detail \
--instance-id <instance-id>
# V2 实例(instanceType=lindorm_v2)
aliyun hitsdb get-lindorm-v2-storage-usage \
--instance-id <instance-id>
```
**输出呈现**:
先给结论性摘要:
- 总容量(热+冷)
- 已用容量(热+冷)
- 使用率(%)
- 冷热分布占比
- 告警状态(是否接近阈值)
再按需展开详细字段。
**关键字段说明**:
**V1 实例**(`get-lindorm-fs-used-detail`):
| 字段 | 含义 | 单位 |
|------|------|------|
| `FsCapacity` | 文件引擎总容量 | bytes |
| `FsCapacityHot` | 热存储容量 | bytes |
| `FsCapacityCold` | 冷存储容量 | bytes |
| `FsUsedHot` | 热存储已使用 | bytes |
| `FsUsedCold` | 冷存储已使用 | bytes |
| `FsUsedOnLindormTable` | Lindorm 宽表已使用 | bytes |
| `FsUsedOnLindormTableData` | 宽表数据量 | bytes |
| `FsUsedOnLindormTableWAL` | WAL 日志量 | bytes |
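**V1 formulas**:
- **Total capacity** = `FsCapacityHot` + `FsCapacityCold`
- **Used capacity** = `FsUsedHot` + `FsUsedCold`
- **Storage usage** = (used capacity / total capacity) × 100%
- **Hot storage usage** = (`FsUsedHot` / `FsCapacityHot`) × 100%
- **Cold storage usage** = (`FsUsedCold` / `FsCapacityCold`) × 100%, computed only when `FsCapacityCold` is greater than 0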
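The V1 formulas above can be sketched as a small shell helper. This is a minimal illustration that assumes the byte values have already been read from the `get-lindorm-fs-used-detail` response; the function name `usage_pct` and the sample numbers are hypothetical:

```bash
# usage_pct USED_BYTES CAPACITY_BYTES -> usage percentage with one decimal.
usage_pct() {
  awk -v used="$1" -v cap="$2" 'BEGIN {
    if (cap <= 0) { print "n/a"; exit }   # tier not provisioned
    printf "%.1f\n", used / cap * 100
  }'
}

# Sample values in bytes (hypothetical): 320 GiB used of 500 GiB hot, 200 GiB of 300 GiB cold.
GIB=1073741824
hot_used=$((320 * GIB));  hot_cap=$((500 * GIB))
cold_used=$((200 * GIB)); cold_cap=$((300 * GIB))

echo "hot:   $(usage_pct "$hot_used" "$hot_cap")%"
echo "cold:  $(usage_pct "$cold_used" "$cold_cap")%"
echo "total: $(usage_pct $((hot_used + cold_used)) $((hot_cap + cold_cap)))%"
```

Returning `n/a` for a zero capacity keeps the helper usable on instances where cold storage has not been enabled.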
**V2 实例**(`get-lindorm-v2-storage-usage`):
| 字段 | 含义 | 单位 |
|------|------|------|
| `UsageByDiskCategory[]` | 按磁盘类型的使用详情数组 | — |
| └ `diskType` | 磁盘类型 | `PerformanceCloudStorage`(热)/ `CapacityCloudStorage`(冷) |
| └ `capacity` | 磁盘容量 | bytes |
| └ `used` | 已使用量 | bytes |
| └ `usedLindormTable` | 宽表已使用 | bytes |
| └ `usedLindormTsdb` | 时序已使用 | bytes |
| `CapacityByDiskCategory[]` | 按磁盘类别的容量信息 | — |
| └ `category` | 类别 | `PERF_CLOUD_ESSD_PL1` / `REMOTE_CAP_OSS` 等 |
| └ `capacity` | 容量 | GB |
**V2 计算公式**:
- **热存储使用率** = `PerformanceCloudStorage.used` / `PerformanceCloudStorage.capacity` × 100%
- **冷存储使用率** = `CapacityCloudStorage.used` / `CapacityCloudStorage.capacity` × 100%
- **总使用量** = Σ 各 diskType 的 `used`
**示例输出**:
```
【存储使用情况】实例 ld-uf6l5kr48wqm6rf1h
【总容量】800GB(热存储 500GB + 冷存储 300GB)
【已用容量】520GB(65%)
- 热存储已用:320GB(64%)
- 冷存储已用:200GB(67%)
【状态】⚠️ 热存储接近阈值(推荐 < 80%)
【存储分布】
- Lindorm 宽表:480GB(数据 450GB + WAL 30GB)
- 其他:40GB
【建议】热存储使用率较高,建议:
1. 检查是否可将历史数据迁移到冷存储
2. 考虑扩容热存储或启用自动冷热分层
📍 在控制台查看存储详情:
1. 控制台:https://lindorm.console.aliyun.com/
2. 点击实例 ID "ld-xxx"
3. 左侧菜单:存储信息
4. 查看:
- 总存储容量
- 热存储使用量/使用率
- 冷存储使用量/使用率
- 存储增长趋势(最近 7 天/30 天)
📍 在 ClusterManager 查看详细存储分析:
1. 控制台 → ld-xxx → 数据库连接 → "通过 ClusterManager 访问"
2. 存储分析 → 查看:
- 表级存储占用 Top 10
- 列族存储分布
- 数据膨胀分析
需要查看存储增长趋势吗?
```
---
### 流程 2:查询存储使用率(实时监控)
**适用场景**:用户想查看最新的存储使用率(通过云监控)。
**执行命令**:
```bash
# 热存储使用率
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name hot_storage_used_percent \
--dimensions '[{"instanceId":"<instance-id>"}]'
# 冷存储使用率
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name cold_storage_used_percent \
--dimensions '[{"instanceId":"<instance-id>"}]'
```
**输出呈现**:
- 当前热存储使用率
- 当前冷存储使用率
- 是否接近告警阈值(80%)
---
### 流程 3:查询存储历史趋势
**适用场景**:用户想分析存储增长趋势、预测何时需要扩容。
**执行命令**:
```bash
# 热存储已使用量(过去 7 天,每小时 1 个点)
aliyun cms describe-metric-data \
--namespace acs_lindorm \
--metric-name hot_storage_used_bytes \
--dimensions '[{"instanceId":"<instance-id>"}]' \
--start-time "<7天前 格式: YYYY-MM-DD HH:MM:SS>" \
--end-time "<当前时间 格式: YYYY-MM-DD HH:MM:SS>" \
--period 3600
# 冷存储已使用量(过去 7 天,每小时 1 个点)
aliyun cms describe-metric-data \
--namespace acs_lindorm \
--metric-name cold_storage_used_bytes \
--dimensions '[{"instanceId":"<instance-id>"}]' \
--start-time "<7天前 格式: YYYY-MM-DD HH:MM:SS>" \
--end-time "<当前时间 格式: YYYY-MM-DD HH:MM:SS>" \
--period 3600
```
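**What to analyze**:
- Average daily growth (GB/day)
- Estimated time until storage runs out (at the current growth rate)
- Growth pattern (linear / exponential / flat)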
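The forecast in the list above is a linear extrapolation. A minimal sketch, assuming the first and last data points of the window (in GB) and the tier capacity are already known; the function name `days_to_threshold` is hypothetical, and real growth is rarely perfectly linear:

```bash
# days_to_threshold START_GB END_GB DAYS CAPACITY_GB THRESHOLD_PCT
# Prints the estimated number of days until usage reaches THRESHOLD_PCT of capacity,
# assuming growth continues at the average rate observed over the window.
days_to_threshold() {
  awk -v s="$1" -v e="$2" -v d="$3" -v cap="$4" -v thr="$5" 'BEGIN {
    if (d <= 0) { print "n/a"; exit }           # invalid observation window
    rate = (e - s) / d                          # average growth per day
    remaining = cap * thr / 100 - e             # headroom before the threshold
    if (remaining <= 0) { print "0.0"; exit }   # threshold already reached
    if (rate <= 0)      { print "n/a"; exit }   # flat or shrinking: no forecast
    printf "%.1f\n", remaining / rate
  }'
}

days_to_threshold 280 320 7 500 80    # days until 80% of a 500 GB tier
days_to_threshold 280 320 7 500 100   # days until the tier is full
```

Reporting the time to the alert threshold separately from the time to full capacity makes it harder to confuse the two when recommending an expansion date.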
**示例输出**:
```
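[Storage growth trend] Instance ld-uf6l5kr48wqm6rf1h (past 7 days)
[Hot storage]
- 7 days ago: 280 GB
- Now: 320 GB
- Growth: 40 GB (about 5.7 GB/day)
- Trend: linear growth
- Forecast: at the current rate, hot storage (500 GB capacity) reaches the 80% threshold (400 GB) in about 14 days and fills up in about 32 days
[Cold storage]
- 7 days ago: 195 GB
- Now: 200 GB
- Growth: 5 GB (about 0.7 GB/day)
- Trend: flat
[Recommendations]
- Short term: hot storage is growing quickly; expand it or enable automatic hot/cold tiering within the next 2 weeks
- Long term: cold storage growth is flat; no pressure for now
For more detail, see the official storage management guide:
https://help.aliyun.com/zh/lindorm/user-guide/storage-management
📚 Hot/cold tiering configuration guide:
https://help.aliyun.com/zh/lindorm/user-guide/hot-and-cold-separation/
Would you like help configuring a hot/cold tiering policy?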
```
---
## 存储相关监控指标
### 热存储指标
| 指标名称 | 描述 | 单位 | 告警阈值 |
|----------|------|------|----------|
| `hot_storage_total_bytes` | 热存储总容量 | bytes | - |
| `hot_storage_used_bytes` | 热存储已使用 | bytes | - |
| `hot_storage_used_percent` | 热存储使用率 | % | > 80% |
### 冷存储指标
| 指标名称 | 描述 | 单位 | 告警阈值 |
|----------|------|------|----------|
| `cold_storage_total_bytes` | 冷存储总容量 | bytes | - |
| `cold_storage_used_bytes` | 冷存储已使用 | bytes | - |
| `cold_storage_used_percent` | 冷存储使用率 | % | > 80% |
### 其他存储指标
| 指标名称 | 描述 | 单位 |
|----------|------|------|
| `storage_total_bytes` | 存储空间总量 | bytes |
| `storage_used_bytes` | 存储空间使用量 | bytes |
| `storage_used_percent` | 存储空间使用比例 | % |
| `table_hot_storage_used_bytes` | 宽表热存储使用量 | bytes |
| `table_cold_storage_used_bytes` | 宽表冷存储使用量 | bytes |
| `tsdb_hot_storage_used_bytes` | 时序热存储使用量 | bytes |
| `tsdb_cold_storage_used_bytes` | 时序冷存储使用量 | bytes |
---
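## Storage usage thresholds and alerts
The default Lindorm disk usage alert threshold is **80%**.
| Storage usage | Status | Impact | Recommendation |
|-----------|------|------|------|
| < 60% | ✅ Normal | None | Keep monitoring |
| 60% - 80% | ⚠️ Watch | None | Plan for expansion |
| 80% - 90% | ⚠️ Alert | Write performance degrades | Expand or clean up data soon |
| 90% - 95% | 🚨 Critical | Write performance degrades severely | Expand or clean up data immediately |
| ≥ 95% | 🔴 Writes blocked | The system automatically blocks data writes (hard limit) | Writes resume only after the instance is expanded |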
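The table above can be turned into a simple classifier. A minimal sketch with a hypothetical function name, `storage_status`; because adjacent ranges in the table share their boundary values, this sketch assumes each lower bound is inclusive, which is a choice made here rather than a documented rule:

```bash
# storage_status USAGE_PCT -> one of: normal, watch, alert, critical, write-blocked
# Assumption: a value that sits exactly on a boundary belongs to the higher tier.
storage_status() {
  awk -v p="$1" 'BEGIN {
    if      (p >= 95) print "write-blocked"
    else if (p >= 90) print "critical"
    else if (p >= 80) print "alert"
    else if (p >= 60) print "watch"
    else              print "normal"
  }'
}

storage_status 64
```

The input would typically be the latest value of `hot_storage_used_percent` or `cold_storage_used_percent`.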
---
## 冷热分层优化建议
### 何时启用冷存储?
| 场景 | 建议 |
|------|------|
| 历史数据访问频率低(< 1 次/天) | 启用冷存储,设置自动冷热分层策略 |
| 热存储使用率 > 60% | 考虑将历史数据迁移到冷存储 |
| 成本优化 | 冷存储成本仅为标准型云存储的 20%(约1/5),适合长期归档 |
### 开通冷存储(容量型云存储)
> ⚠️ **警告**:开通过程中需要**滚动重启实例**,可能会导致部分业务的读写请求出现**延迟波动或连接中断**,建议在**业务低峰期**操作。
**前提条件:**
- 实例存储类型为**本地 HDD 盘**时,**不支持**开通容量型云存储
- 云存储类型(性能型/标准型)实例支持开通
**控制台开通路径:**
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/)
2. 在页面左上角,选择实例所属的**地域**
3. 在实例列表页,单击**目标实例ID**或者目标实例所在行操作列的**管理**
4. 在左侧导航栏,选择**冷存储**
5. 单击**开通**
6. 设置**容量型云存储容量**
7. 阅读并勾选服务协议,单击**立即购买**
**存储类型说明:**
| 存储类型 | 用途 | 是否支持冷热分层 |
|---------|------|-----------------|
| 性能型云存储 | 热存储,低延迟(< 10ms) | ✅ 支持(需开通容量型作为冷存储) |
| 标准型云存储 | 热存储,中等延迟 | ✅ 支持(需开通容量型作为冷存储) |
| 容量型云存储 | 冷存储,低成本(标准型云存储的 20%) | ✅ 作为冷存储介质 |
| 本地 HDD 盘 | 本地存储 | ❌ 不支持冷热分层 |
> **注意**:容量型云存储与性能型/标准型云存储可以**并存**,无需变更现有存储类型。
### 冷热分层策略
- **自动分层**:设置时间策略(如数据写入 30 天后自动转冷)
- **手动分层**:通过 Lindorm 表属性配置冷热分层规则
- **查询优化**:冷数据查询延迟较高(通常 > 100ms),热数据查询延迟低(< 10ms)
📍 配置冷热分层:
1. 通过 Lindorm SQL:
```sql
ALTER TABLE metrics SET COLD_DATA_AGE = 2592000; -- 30 天后转冷
```
2. 通过 HBase Shell:
```bash
alter 'metrics', {NAME => 'cf', COLD_DATA_AGE => 2592000}
```
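The age value in both examples above is given in seconds, so a policy stated in days has to be converted first. A minimal sketch; the helper name `days_to_seconds` is hypothetical:

```bash
# days_to_seconds DAYS -> number of seconds in DAYS whole days
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 30   # the 30-day policy used in the examples
```

Deriving the number rather than typing it by hand avoids off-by-a-factor mistakes such as passing minutes or milliseconds.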
### 冷存储性能说明
| 指标 | 热存储(性能型) | 冷存储(容量型) |
|------|----------------|-----------------|
| 查询延迟 | < 10ms | > 100ms |
| 存储成本 | 基准 | 仅为标准型云存储的 20% |
| 适用场景 | 高频访问 | 低频访问(< 1次/天) |
| 读取 IOPS | 高 | 限流(每 25GiB 容量 1 IOPS) |
如需更多细节,可参考官方配置指南:
https://help.aliyun.com/zh/lindorm/user-guide/hot-and-cold-separation/
📚 冷热分离最佳实践:
https://help.aliyun.com/zh/lindorm/user-guide/enable-cold-storage
---
## 缺参处理
### 缺 instance-id
**追问策略**:先用 `aliyun hitsdb get-instance-summary` 确认地域,再用 `aliyun hitsdb get-lindorm-instance-list --region <region>` 让用户选择实例。
### 缺 时间范围(存储趋势分析)
**默认策略**:使用过去 7 天,并告知用户。
---
## 错误处理
| 错误 | 原因 | 引导用户 |
|------|------|----------|
| **无存储详情** | 实例未开通文件引擎或实例状态异常 | 检查实例状态和引擎配置 |
| **指标无数据** | 时间范围内无数据或指标名错误 | 调整时间范围或确认指标名 |
| **权限不足** | Access Key 无 Lindorm 权限 | 提示需要 `AliyunLindormReadOnlyAccess` 权限 |
---
## 常见场景速查
| 用户描述 | 执行流程 |
|----------|----------|
| "磁盘还剩多少?" | 流程 1:获取存储详情 |
| "存储快满了吗?" | 流程 2:查询存储使用率 |
| "存储增长趋势如何?" | 流程 3:查询存储历史趋势 |
| "冷热存储分别用了多少?" | 流程 1:获取存储详情(区分冷热) |
| "什么时候需要扩容?" | 流程 3:分析增长趋势并预测 |
FILE:references/02-ops/user-permission.md
# 账号权限管理场景
## 触发条件
**用户意图:** 创建数据库用户、配置表级权限、实现权限隔离。
**典型问法:**
- "怎么创建一个只读账号?"
- "如何给开发环境配置独立的账号?"
- "能限制某个用户只能访问指定的表吗?"
- "怎么查看用户的权限列表?"
- "如何回收用户权限?"
---
## Agent 行为原则
**⚠️ 安全边界:提供权限配置指导,不直接创建用户或授权**
1. **查询能力:** Agent 暂无直接查询用户列表和权限的 API
2. **提供方案:** 提供权限配置的最佳实践和操作步骤
3. **引导操作:** 引导用户通过 Lindorm Insight 控制台进行权限管理
4. **不执行变更:** 不直接创建用户、授权或回收权限
5. **权限关键字:** Lindorm SQL 使用 READ/WRITE/SYSTEM/ADMIN/TRASH 权限体系,与 MySQL 的 SELECT/INSERT/UPDATE/DELETE 不同
6. **V1/V2 实例区别:** Lindorm 旧版实例通过 Lindorm Insight(ClusterManager)管理用户权限;**Lindorm 新版实例**需使用「统一用户认证及权限管理系统」,操作方式不同,需先确认实例版本
---
## 执行流程
### 官方文档参考
**账号权限管理相关文档:**
- 用户权限管理:https://help.aliyun.com/zh/lindorm/user-guide/user-and-permission-management
- Ranger 权限配置:https://help.aliyun.com/zh/lindorm/user-guide/use-ranger-to-control-data-access
- SQL 权限管理:https://help.aliyun.com/zh/lindorm/user-guide/manage-namespaces
- 访问控制(RAM):https://help.aliyun.com/zh/lindorm/security-and-compliance/
### 根据用户场景提供配置方案
---
## 输出示例
### 完整对话示例1:创建只读账号
**用户:** "我想创建一个只读账号给 BI 团队使用"
**Agent:**
```markdown
好的,我来帮您创建一个只读账号。Lindorm 支持两种权限管理方式,我推荐使用 **Lindorm Insight 控制台**(更直观)。
## 只读账号创建方案
### 方案1:通过 Lindorm Insight 控制台(推荐)
#### 步骤1:登录 Lindorm Insight
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/)
2. 在实例列表页面,单击**目标实例ID**
3. 在左侧导航栏,单击**数据库连接**
4. 在宽表引擎页面的 **Lindorm Insight** 区域中,单击**通过 ClusterManager 公网访问**(或专有网络访问)
5. 输入用户名和密码登录
⚠️ **首次登录需重置密码:**
- 在宽表引擎页面的 Lindorm Insight 区域中,单击**重置 UI 访问密码**
- 密码规则:2~30位,需包含大写字母、小写字母、特殊字符和数字中的一种或多种
- 允许的特殊字符:`_` 和 `-`
#### 步骤2:创建只读用户
1. Lindorm Insight 页面 → 左侧菜单 → **数据管理 > 用户管理**
2. 点击"创建用户"
3. 配置参数:
```
用户名:bi_readonly
密码:********(强密码,包含大小写字母+数字+特殊字符)
确认密码:********
```
4. 点击"确定"
#### 步骤3:授予只读权限
1. 用户列表 → 找到 `bi_readonly` → 点击"授权"
2. 选择权限类型:
```
权限范围:指定命名空间和表
命名空间:default(或您的业务命名空间)
表名:*(所有表)或指定表名
权限类型:☑️ 只读(READ)
```
3. 点击"确定"
#### 步骤4:验证权限
新用户可以使用以下方式连接:
**连接信息:**
> Agent 根据实例 ServiceType 自动选择正确的 V1/V2 域名格式,参见 `sql-client-guide.md` →「连接域名格式」
```bash
mysql -h <您的连接地址> -P 33060 -u bi_readonly -p
# 地址获取:控制台实例详情 → 数据库连接,或 aliyun hitsdb get-lindorm-instance-engine-list --instance-id <id>
```
**测试只读权限:**
```sql
-- 可以执行的操作
SELECT * FROM default.user_table LIMIT 10; -- ✅ 成功
-- 不能执行的操作
CREATE TABLE test_table (...); -- ❌ 权限不足
INSERT INTO default.user_table VALUES (...); -- ❌ 权限不足
DELETE FROM default.user_table WHERE ...; -- ❌ 权限不足
```
### 方案2:通过 SQL 操作
⚠️ **注意:Lindorm SQL 权限体系与 MySQL 不同,使用 READ/WRITE/SYSTEM/ADMIN/TRASH 关键字**
#### 步骤1:使用管理员账号连接
```bash
# MySQL 协议连接(推荐)
mysql -h <您的连接地址> -P 33060 -u root -p
```
#### 步骤2:创建用户并授权
```sql
-- 1. 创建用户
CREATE USER IF NOT EXISTS bi_readonly identified BY 'YourStrongPassword123!';
-- 2. 授予指定表的只读权限(需先确认表存在)
GRANT READ ON TABLE default.user_table TO bi_readonly;
GRANT READ ON TABLE default.order_table TO bi_readonly;
-- 或授予整个数据库的只读权限(database 名若为保留字需用反引号)
GRANT READ ON DATABASE `default` TO bi_readonly;
-- 授予读写权限
GRANT READ, WRITE ON DATABASE dev TO dev_user;
-- 3. 查看所有用户权限
SHOW PRIVILEGES;
-- 查看所有用户
SHOW USERS;
```
#### 步骤3:验证权限
使用新用户登录并测试:
```bash
mysql -h <您的连接地址> -P 33060 -u bi_readonly -p
```
### 权限对照表
| 权限类型 | SQL 关键字 | 允许的操作 |
|---------|----------|-----------|
| 只读 | READ | 查询数据 |
| 读写 | READ, WRITE | 查询、插入、更新、删除 |
| 管理员 | ADMIN | 创建/删除表和索引等 DDL |
| 删除 | TRASH | TRUNCATE TABLE(清空表数据) |
| 系统 | SYSTEM | 管理用户权限、COMPACT、FLUSH 等运维操作(仅全局授权) |
### 安全建议
✅ **推荐做法:**
- 为不同环境创建独立账号(生产/测试/开发)
- 使用强密码(至少 12 位,包含大小写字母+数字+特殊字符)
- 定期轮换密码(建议 90 天)
- 遵循最小权限原则(只授予必要的权限)
- 对敏感表设置更严格的权限
❌ **避免做法:**
- 不要共享默认管理员账号
- 不要给所有用户授予管理员权限
- 不要使用弱密码(如 123456、password)
### 官方文档
- 完整权限管理指南:https://help.aliyun.com/zh/lindorm/user-guide/user-and-permission-management
- SQL 权限命令:https://help.aliyun.com/zh/lindorm/user-guide/manage-namespaces
```
---
### 完整对话示例2:配置表级权限隔离
**用户:** "开发团队和运维团队需要访问不同的表,怎么配置?"
**Agent:**
```markdown
好的,我来帮您规划表级权限隔离方案。
## 表级权限隔离方案
### 业务场景分析
- **开发团队:** 需要访问开发环境的表,可读写
- **运维团队:** 需要访问所有表,但只读(监控、排查问题)
- **隔离目标:** 开发团队不能访问生产表,运维团队不能修改数据
### 推荐方案:命名空间 + 表级权限
#### 步骤1:规划命名空间结构
```
lindorm_instance
├── namespace: prod(生产环境)
│ ├── user_table
│ ├── order_table
│ └── payment_table
├── namespace: dev(开发环境)
│ ├── test_user_table
│ └── test_order_table
└── namespace: staging(预发环境)
└── staging_user_table
```
#### 步骤2:创建用户并授权
**2.1 创建开发团队账号**
```sql
-- 通过 Lindorm Insight 或 SQL 创建
CREATE USER IF NOT EXISTS dev_team identified BY 'DevPassword123!';
-- 授予 dev 数据库的读写权限
GRANT READ, WRITE ON DATABASE dev TO dev_team;
-- 查看权限
SHOW PRIVILEGES;
```
**2.2 创建运维团队账号**
```sql
-- 创建用户
CREATE USER IF NOT EXISTS ops_team identified BY 'OpsPassword123!';
-- 授予所有数据库的只读权限
GRANT READ ON DATABASE prod TO ops_team;
GRANT READ ON DATABASE dev TO ops_team;
GRANT READ ON DATABASE staging TO ops_team;
-- 查看权限
SHOW PRIVILEGES;
```
#### 步骤3:验证权限隔离
**开发团队测试:**
```sql
-- 连接:mysql -h <您的连接地址> -P 33060 -u dev_team -p
-- ✅ 可以操作 dev 命名空间
SELECT * FROM dev.test_user_table;
INSERT INTO dev.test_user_table VALUES (...);
-- ❌ 不能访问 prod 命名空间
SELECT * FROM prod.user_table; -- 报错:权限不足
```
**运维团队测试:**
```sql
-- 连接:mysql -h <您的连接地址> -P 33060 -u ops_team -p
-- ✅ 可以查询所有命名空间
SELECT * FROM prod.user_table;
SELECT * FROM dev.test_user_table;
-- ❌ 不能修改数据
INSERT INTO prod.user_table VALUES (...); -- 报错:权限不足
DELETE FROM prod.user_table WHERE ...; -- 报错:权限不足
```
### 更细粒度的权限控制
如果需要更细粒度的控制(如只允许访问指定的列),可以使用 **Ranger**:
#### 步骤1:开通 Ranger
1. Lindorm 控制台 → 实例详情 → 数据库连接
2. 计算引擎 → 点击"开通 Ranger 地址"
#### 步骤2:配置 Ranger 策略
1. 访问 Ranger 管理页面
2. 创建策略:
```
策略名称:dev_team_policy
资源:
- 命名空间:dev
- 表:*(所有表)
- 列:*(所有列)
用户/组:dev_team
权限:READ, WRITE
```
#### 步骤3:配置列级权限(可选)
```
策略名称:ops_sensitive_data_mask
资源:
- 命名空间:prod
- 表:user_table
- 列:phone, id_card(敏感列)
用户/组:ops_team
权限:READ
```
### 权限管理最佳实践
#### 1. 使用用户组(推荐)
```
用户组划分:
- admin_group:管理员组(DBA)
- dev_group:开发组
- ops_group:运维组
- readonly_group:只读组(BI、分析)
好处:
- 新用户加入组即可继承权限
- 统一管理,避免重复授权
- 便于权限审计
```
#### 2. 权限申请流程
```
1. 开发人员提交工单申请权限
2. DBA 审核并创建账号
3. 授予最小必要权限
4. 定期审查和回收不用的权限
```
#### 3. 审计日志监控
```
开通 SQL 审计日志(控制台 → 宽表引擎 → SQL审计日志)
监控:
- 谁访问了哪些表
- 执行了哪些操作
- 是否有越权访问
```
### 官方文档
- Ranger 权限管理:https://help.aliyun.com/zh/lindorm/user-guide/use-ranger-to-control-data-access
- 命名空间管理:https://help.aliyun.com/zh/lindorm/user-guide/log-in-to-the-cluster-management-system
```
---
### 完整对话示例3:回收用户权限
**用户:** "一个员工离职了,怎么回收他的账号权限?"
**Agent:**
```markdown
明白了,员工离职需要及时回收权限以保障安全。
## 账号权限回收方案
### 方案1:删除用户(彻底移除)
#### 通过 Lindorm Insight 控制台
1. 登录 [Lindorm 控制台](https://lindorm.console.aliyun.com/),进入目标实例的**数据库连接**页面
2. 在宽表引擎页面的 **Lindorm Insight** 区域中,单击**通过 ClusterManager 公网访问**
3. 登录 Lindorm Insight 后,进入**数据管理 > 用户管理** → **用户列表**
4. 找到目标用户 → 点击"删除"
5. 确认删除
#### 通过 SQL 命令
```sql
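-- 1. Review existing user privileges
SHOW PRIVILEGES;
-- 2. Drop the user (all of its privileges are revoked automatically)
DROP USER employee_username;
-- 3. Verify the deletion
SHOW USERS;
-- The target user no longer appears in the list, which confirms the deletion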
```
### 方案2:禁用用户(保留账号,临时禁用)
Lindorm 暂不支持直接禁用用户,建议:
#### 方案A:回收所有权限
```sql
-- 查看用户当前权限
SHOW PRIVILEGES;
-- 逐一回收权限
REVOKE READ ON DATABASE prod FROM employee_username;
REVOKE READ, WRITE ON DATABASE dev FROM employee_username;
-- 验证权限已回收
SHOW PRIVILEGES;
```
#### 方案B:修改密码(使原密码失效)
```sql
-- 修改为随机强密码,用户无法再登录
ALTER USER employee_username SET PASSWORD = 'RandomStrongPassword!@#$%123456';
```
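### Option 3: Revoke access in bulk (several departing employees)
If several employees leave at once, drop their accounts in one batch:
```sql
-- Drop several users (unquoted identifiers, matching the single-user DROP USER example above)
DROP USER employee1;
DROP USER employee2;
DROP USER employee3;
-- Or generate these statements from a script
```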
### 权限回收检查清单
完成权限回收后,请检查以下项目:
- [ ] **数据库账号**:已删除或禁用
- [ ] **白名单 IP**:已移除离职员工的办公 IP
- [ ] **RAM 账号**:已删除或禁用阿里云 RAM 子账号
- [ ] **报警联系人**:已从报警联系人组中移除
- [ ] **密钥/证书**:已轮换相关密钥和证书
- [ ] **审计日志**:检查离职前是否有异常操作
### 安全建议
✅ **推荐做法:**
1. **立即回收:** 员工离职当天立即回收所有权限
2. **交接完成后删除:** 确认业务交接完成后再删除账号
3. **审计日志保留:** 保留用户的操作审计日志至少 180 天
4. **定期审查:** 每季度审查用户列表,清理不再使用的账号
⚠️ **注意事项:**
- 删除用户前,确认没有正在运行的应用使用该账号
- 如果应用还在使用,先切换到新账号再删除旧账号
- 重要账号删除前,建议先备份操作日志
### 账号生命周期管理
建议建立完整的账号生命周期管理流程:
```
1. 入职 → 创建账号 → 授予最小权限
2. 在职 → 定期审查权限 → 根据职责调整
3. 离职 → 立即禁用 → 确认交接 → 删除账号
```
### 官方文档
- 用户权限管理:https://help.aliyun.com/zh/lindorm/user-guide/user-and-permission-management
- SQL 权限命令:https://help.aliyun.com/zh/lindorm/user-guide/manage-namespaces
```
---
## 缺参追问
| 缺失参数 | 追问话术 | 默认策略 |
|---------|---------|----------|
| 用户类型 | "请问是创建只读账号还是读写账号?" | 默认推荐只读账号(安全) |
| 权限范围 | "需要访问哪些命名空间或表?" | 默认提供指定表权限方案 |
| 用户名 | "请提供要创建的用户名" | 提供命名规范建议 |
| 操作类型 | "是创建新用户还是修改现有用户权限?" | 默认按创建新用户引导 |
---
## 错误处理
### 常见问题
#### 问题1:用户创建失败 - 用户名已存在
**Agent 输出:**
```
用户创建失败,可能是用户名已存在。
**排查步骤:**
1. **查看现有用户列表**
- Lindorm Insight → 数据管理 > 用户管理 → 用户列表
- 或通过 SQL:`SHOW USERS;`
2. **选择解决方案**
**方案A:使用不同的用户名**
```sql
CREATE USER new_username identified BY 'Password123!';
```
**方案B:删除旧用户后重新创建**
```sql
-- 先删除
DROP USER existing_username;
-- 再创建
CREATE USER existing_username identified BY 'NewPassword123!';
```
**方案C:修改现有用户密码**
```sql
ALTER USER existing_username SET PASSWORD = 'NewPassword123!';
```
**官方文档:** https://help.aliyun.com/zh/lindorm/user-guide/user-and-permission-management
```
#### 问题2:权限授予失败 - 表不存在
**Agent 输出:**
```
权限授予失败,可能是表不存在。
**排查步骤:**
1. **查看表是否存在**
```sql
-- 查看命名空间列表
SHOW SCHEMAS;
-- 查看指定命名空间的表
SHOW TABLES IN default;
```
2. **确认表名格式**
Lindorm SQL 通过 MySQL 协议连接时,表名格式为:`ON TABLE namespace.table`
**正确示例:**
```sql
GRANT READ ON TABLE default.user_table TO username; -- ✅
```
**错误示例:**
```sql
GRANT READ ON user_table TO username; -- ❌ 缺少 TABLE 关键字和命名空间
```
3. **如果表确实不存在**
- 先创建表,再授权
- 或授予整个数据库的权限:
```sql
GRANT READ ON DATABASE `default` TO username;
```
**官方文档:** https://help.aliyun.com/zh/lindorm/user-guide/manage-namespaces
```
#### 问题3:用户连接失败 - 认证错误
**Agent 输出:**
```
用户连接失败,认证错误,请检查:
**1. 用户名是否正确**
- 确认用户名拼写无误
- Lindorm 用户名区分大小写
**2. 密码是否正确**
- 确认密码无误
- 注意大小写和特殊字符
**3. 白名单是否配置**
- 确认客户端 IP 已加入白名单
- 控制台 → 访问控制 → 白名单
**4. 连接地址是否正确**
- 确认使用正确的连接地址和端口(端口见 SKILL.md →「端口号速查表」)
**5. 用户是否有连接权限**
验证用户权限:
```sql
-- 查看所有用户及权限
SHOW PRIVILEGES;
SHOW USERS;
```
**测试连接:**
```bash
# MySQL 协议连接(推荐)
mysql -h <您的连接地址> -P 33060 -u username -p
# HBase API 连接(需要在代码中配置)
```
**官方文档:** https://help.aliyun.com/zh/lindorm/user-guide/user-and-permission-management
```
---
## 关联场景
- 白名单配置 → 跳转 `../01-dev/connection-guide.md`
- 连接信息获取 → 跳转 `../01-dev/connection-guide.md`
- 审计日志查询 → 控制台操作,无 API 支持
- RAM 访问控制 → 跳转官方文档
---
## 技术说明
### 为何不直接创建用户和授权?
1. **安全风险:** 用户账号和权限是核心安全配置,错误操作可能导致:
- 权限泄漏(授权过大)
- 业务中断(误删用户)
- 数据泄露(未经审批的账号创建)
2. **密码管理:** 用户密码是敏感信息,Agent 不应直接生成或管理密码
3. **审批流程:** 企业通常有严格的权限申请和审批流程,不应绕过
4. **符合最佳实践:** 账号和权限管理应通过控制台或审计的 SQL 操作,便于追溯和审计
### Agent 提供的价值
- ✅ 提供权限规划方案和最佳实践
- ✅ 完整的权限配置步骤和 SQL 命令
- ✅ 权限隔离和安全加固建议
- ✅ 排查权限问题的完整方案
- ✅ 权限生命周期管理指导
FILE:references/03-ref/acceptance-criteria.md
# Acceptance Criteria: alibabacloud-lindorm-agent-skill
**Scenario**: Lindorm 全场景运维管理
**Purpose**: Skill 验收标准
---
## Correct CLI 命令模式
### 1. Lindorm API 调用(aliyun hitsdb)
#### ✅ CORRECT
```bash
# 查询实例列表(默认上海地域)
aliyun hitsdb get-lindorm-instance-list --region cn-shanghai
# 查询实例详情
aliyun hitsdb get-lindorm-instance --instance-id ld-uf6l5kr48wqm6rf1h
# 查询存储详情
aliyun hitsdb get-lindorm-fs-used-detail --instance-id ld-uf6l5kr48wqm6rf1h
# 查询 V2 实例存储详情
aliyun hitsdb get-lindorm-v2-storage-usage --instance-id ld-uf64f07n285tlbaz2
# 查询引擎列表
aliyun hitsdb get-lindorm-instance-engine-list --instance-id ld-uf6l5kr48wqm6rf1h
# 查询 IP 白名单
aliyun hitsdb get-instance-ip-white-list --instance-id ld-uf6l5kr48wqm6rf1h
# 查询地域列表
aliyun hitsdb describe-regions
# 查询实例概览(所有地域)
aliyun hitsdb get-instance-summary
```
#### ❌ INCORRECT
```bash
# 错误:实例 ID 格式错误
aliyun hitsdb get-lindorm-instance --instance-id lindorm-xxx --region cn-shanghai # ❌ 应使用 ld-xxx 格式
# 错误:缺少必需参数
aliyun hitsdb get-lindorm-instance --region cn-shanghai # ❌ 缺少 --instance-id
# 错误:使用错误的地域格式
aliyun hitsdb get-lindorm-instance-list --region Shanghai # ❌ 应使用 cn-shanghai
```
### 2. 云监控 API 调用(aliyun cms)
#### ✅ CORRECT
```bash
# 查询 Lindorm 监控指标列表
aliyun cms describe-metric-meta-list --namespace acs_lindorm
# 查询 CPU 空闲率最新数据
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name cpu_idle \
--dimensions '[{"instanceId":"ld-uf6l5kr48wqm6rf1h"}]'
# 查询历史监控数据(指定时间范围)
aliyun cms describe-metric-data \
--namespace acs_lindorm \
--metric-name cpu_idle \
--dimensions '[{"instanceId":"ld-uf6l5kr48wqm6rf1h"}]' \
--start-time "2026-04-14 08:00:00" \
--end-time "2026-04-14 09:00:00" \
--period 60
```
#### ❌ INCORRECT
```bash
# 错误:错误的 namespace
aliyun cms describe-metric-meta-list --namespace acs_hbase --region cn-shanghai # ❌ 应使用 acs_lindorm
# 错误:错误的指标名称
aliyun cms describe-metric-last --metric-name cpu_usage # ❌ 应使用 cpu_idle
# 错误:dimensions 格式错误
aliyun cms describe-metric-last --dimensions "instanceId=ld-xxx" # ❌ 应使用 JSON 数组格式
```
---
## 参数验证模式
### 1. 实例 ID 格式
#### ✅ CORRECT
```
ld-uf6l5kr48wqm6rf1h # ✅ 以 ld- 开头,后接字母数字
ld-bp1234567890abcdef # ✅ 正确格式
```
#### ❌ INCORRECT
```
lindorm-xxx # ❌ 不以 ld- 开头
ld_xxx # ❌ 使用下划线而非连字符
LD-XXX # ❌ 使用大写(建议使用小写)
```
### 2. 地域格式
#### ✅ CORRECT
```
cn-shanghai # ✅ 正确格式
cn-beijing # ✅ 正确格式
cn-hangzhou # ✅ 正确格式
```
#### ❌ INCORRECT
```
shanghai # ❌ 缺少 cn- 前缀
CN-SHANGHAI # ❌ 使用大写
cn_shanghai # ❌ 使用下划线
```
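The instance ID and region rules above can be checked before a command is sent. A minimal sketch with hypothetical helper names; the patterns are inferred from the examples in this section and are assumptions, not official format definitions:

```bash
# valid_instance_id ID -> exit status 0 if ID looks like ld-<lowercase letters/digits>
valid_instance_id() {
  printf '%s\n' "$1" | grep -Eq '^ld-[a-z0-9]+$'
}

# valid_region REGION -> exit status 0 for lowercase, hyphen-separated region IDs.
# Assumed pattern; it checks shape only, not whether the region exists.
valid_region() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]+(-[a-z0-9]+)+$'
}

for id in ld-uf6l5kr48wqm6rf1h lindorm-xxx ld_xxx LD-XXX; do
  if valid_instance_id "$id"; then echo "$id: ok"; else echo "$id: rejected"; fi
done
```

A shape check like this catches typos early; whether the instance or region actually exists can only be confirmed by the API.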
### 3. Time format
The time format rules are in SKILL.md → 「时间格式」; the cases below are the pass/fail checks:
#### ✅ CORRECT
```bash
--start-time "2026-04-14 08:00:00" # ✅ Local time format (recommended), parsed as CST
--start-time "1773897600000" # ✅ Unix timestamp in milliseconds
--start-time "2026-04-14T08:00:00Z" # ✅ ISO 8601 UTC, parsed as UTC (mind the offset: CST = UTC+8)
```
#### ❌ INCORRECT
```bash
--start-time "2026-04-14T08:00Z" # ❌ Short ISO 8601 form (no seconds) is not supported; fails with "parse param time error"
--start-time "2026/04/14 08:00:00" # ❌ Uses slashes as separators
--start-time "08:00:00" # ❌ Time only, date missing
```
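The accepted and rejected shapes above can be expressed as a pre-flight check. A minimal sketch with a hypothetical helper name, `valid_time_arg`; it assumes millisecond timestamps are 13 digits long (true for present-day dates) and checks shape only, not whether the date exists:

```bash
# valid_time_arg VALUE -> exit status 0 if VALUE matches one of the three accepted shapes:
#   "YYYY-MM-DD HH:MM:SS", a 13-digit Unix millisecond timestamp, or "YYYY-MM-DDTHH:MM:SSZ".
valid_time_arg() {
  printf '%s\n' "$1" | grep -Eq \
    -e '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$' \
    -e '^[0-9]{13}$' \
    -e '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$'
}

for t in "2026-04-14 08:00:00" "2026-04-14T08:00Z" "08:00:00"; do
  if valid_time_arg "$t"; then echo "$t: ok"; else echo "$t: rejected"; fi
done
```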
---
## 输出格式验证
### 1. 监控数据输出
#### ✅ CORRECT 输出结构
```json
{
"Datapoints": [
{
"instanceId": "ld-uf6l5kr48wqm6rf1h",
"timestamp": 1773897600000,
"Average": 75.5
}
],
"DatapointCount": 1,
"Success": true
}
```
#### 验证要点
- `Datapoints` 数组存在且非空
- 每个数据点包含 `instanceId`, `timestamp`, `Average`(部分指标含 `Maximum`/`Minimum`)
- `Average` 在合理范围(如 CPU 空闲率 0-100)
### 2. 实例列表输出
#### ✅ CORRECT 输出结构
```json
{
"InstanceList": [
{
"InstanceId": "ld-uf6l5kr48wqm6rf1h",
"InstanceAlias": "生产环境",
"InstanceStatus": "ACTIVATION",
"RegionId": "cn-shanghai"
}
],
"TotalCount": 1
}
```
#### 验证要点
- `InstanceList` 数组存在
- 每个实例包含 `InstanceId`, `InstanceStatus`
- `InstanceStatus` 为有效状态值(ACTIVATION, CREATING, STOPPED)
---
## 错误处理模式
### 1. API 错误响应
#### ✅ CORRECT 错误处理
```json
{
"Code": "InvalidParameter.InstanceId",
"Message": "The specified instance ID is invalid.",
"RequestId": "xxx"
}
```
**处理流程**:
1. 读取 `Code` 字段识别错误类型
2. 参考 `references/02-ops/error-troubleshoot.md` 查找解决方案
3. 引导用户检查参数或权限
#### ❌ INCORRECT 错误处理
- 忽略错误码,直接返回原始错误信息
- 凭训练知识猜测错误原因(应查官方文档)
---
## 场景触发验证
### 正确触发场景
| 用户输入 | 触发场景 | 执行文档 |
|---------|---------|---------|
| "我有哪些 Lindorm 实例" | 实例查询 | `02-ops/instance-management.md` |
| "CPU 使用率是多少" | 监控查询 | `02-ops/monitoring-guide.md` |
| "报错 InvalidParameter" | 错误排查 | `02-ops/error-troubleshoot.md` |
| "怎么连接 Lindorm" | 连接指南 | `01-dev/connection-guide.md` |
| "怎么建表" | 建表指南 | `01-dev/table-design.md` |
### 错误触发场景
| 用户输入 | 错误处理 |
|---------|---------|
| "RDS 实例有哪些" | ❌ 不应触发 Lindorm skill,应提示使用 RDS skill |
| "MySQL 怎么用" | ❌ 不应触发,除非明确提到 Lindorm SQL |
---
## 安全规范验证
安全规则见 SKILL.md →「前置条件 → 凭证已配置」。
#### ❌ FORBIDDEN
```bash
echo $ALIBABA_CLOUD_ACCESS_KEY_ID # ❌ 禁止读取/打印 AK/SK
"请输入您的 AccessKey ID" # ❌ 禁止在对话中要求用户输入
aliyun configure set --access-key-id LTAI5t # ❌ 禁止硬编码凭证
```
FILE:references/03-ref/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.3+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.3 or later for full plugin ecosystem coverage.
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.3)
aliyun version
```
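The version requirement in the verify step above can be scripted. A minimal sketch with a hypothetical helper name, `version_ge`, that compares dotted numeric versions component by component; in real use the installed value would come from `aliyun version`, but a fixed sample is used here so the sketch runs without the CLI:

```bash
# version_ge CURRENT REQUIRED -> exit status 0 if CURRENT >= REQUIRED
version_ge() {
  awk -v cur="$1" -v req="$2" 'BEGIN {
    n = split(cur, a, "."); m = split(req, b, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      x = a[i] + 0; y = b[i] + 0   # missing components count as 0
      if (x > y) exit 0
      if (x < y) exit 1
    }
    exit 0                          # all components equal
  }'
}

installed="3.3.4"   # sample value standing in for the output of "aliyun version"
if version_ge "$installed" "3.3.3"; then echo "CLI is new enough"; else echo "please upgrade"; fi
```

Comparing components numerically avoids the string-comparison trap where `3.10.0` sorts before `3.3.3`.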
**Using Binary**
```bash
# Download (curl ships with macOS; wget is not installed by default)
curl -fsSLO https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu and CentOS/RHEL (x86_64)**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
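# Add to the machine-level PATH (requires admin privileges). Read the machine value
# first so the current session's user-level entries are not copied into it.
$machinePath = [Environment]::GetEnvironmentVariable("Path", [System.EnvironmentVariableTarget]::Machine)
[Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\aliyun-cli", [System.EnvironmentVariableTarget]::Machine)
# Make aliyun available in the current session as well
$env:Path += ";C:\aliyun-cli"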
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
This guide covers six Aliyun CLI authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI5tXXXXXXXX",
"access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access. STS tokens expire after a fixed lifetime, typically between 15 minutes and 12 hours depending on how they were issued.
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use RSA key pair for authentication (generate key pair in Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
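Credentials set through environment variables override the configuration file, but an explicit profile selection (`--profile` or `ALIBABA_CLOUD_PROFILE`) takes precedence over them; see Credential Priority below.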
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Case**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
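The lookup order above can be modeled as a small function. This is only an illustrative sketch of the order as listed here, not the CLI's actual implementation; `resolve_credential_source` and its parameters are hypothetical names.

```python
import os
from typing import Mapping, Optional


def resolve_credential_source(
    cli_profile: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_profile: Optional[str] = None,
    on_ecs_with_role: bool = False,
) -> str:
    """Return a label for the first credential source that applies."""
    env = os.environ if env is None else env
    if cli_profile:                                   # 1. --profile <name>
        return f"profile flag: {cli_profile}"
    if env.get("ALIBABA_CLOUD_PROFILE"):              # 2. profile from environment
        return f"profile env: {env['ALIBABA_CLOUD_PROFILE']}"
    if env.get("ALIBABA_CLOUD_ACCESS_KEY_ID") and env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET"):
        return "environment credentials"              # 3. credentials from environment
    if config_profile:                                # 4. current profile in config.json
        return f"config file: {config_profile}"
    if on_ecs_with_role:                              # 5. ECS instance RAM role
        return "ECS instance RAM role"
    return "none"


print(resolve_credential_source(cli_profile="projectA", env={}))
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.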
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON object containing a list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "华东 1(杭州)"
},
...
]
},
"RequestId": "..."
}
```
**If failed**, you'll see error messages:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
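When the test command runs inside a script, the error codes above can be matched against the captured output. A minimal sketch, assuming the error code appears verbatim somewhere in the CLI's error text; `explain_error` is a hypothetical helper, not part of the CLI.

```python
# Error codes and meanings copied from the list above.
KNOWN_ERRORS = {
    "InvalidAccessKeyId.NotFound": "wrong Access Key ID",
    "SignatureDoesNotMatch": "wrong Access Key Secret",
    "InvalidSecurityToken.Expired": "STS token expired",
    "Forbidden.RAM": "insufficient permissions",
}


def explain_error(output: str) -> str:
    """Return a short hint for the first known error code found in the output."""
    for code, hint in KNOWN_ERRORS.items():
        if code in output:
            return f"{code}: {hint}"
    return "unrecognized error"


print(explain_error("ErrorCode: SignatureDoesNotMatch"))
```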
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
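# .gitignore patterns are relative to the repository, so "~" is not expanded;
# ignore any local copy of the CLI config directory instead
echo ".aliyun/" >> .gitignore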
# Use environment variables in CI/CD instead
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.3+ supports all published product plugins):
```bash
aliyun plugin install --names hitsdb cms
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun hitsdb --help
aliyun cms --help
```
3. **Read documentation**:
- See the official Alibaba Cloud documentation: https://help.aliyun.com/zh/cli/
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/03-ref/lindorm-cli-guide.md
# Lindorm CLI & HBase Shell Guide
A guide to the Lindorm-specific command-line tools, covering Lindorm CLI (SQL/TSDB) and HBase Shell (alihbase).
---
## Lindorm CLI
Lindorm CLI is the official command-line client for connecting to and operating Lindorm wide table engine (SQL) and time series engine (TSDB).
> **Official Doc**: https://help.aliyun.com/zh/lindorm/user-guide/use-lindorm-cli-to-connect-to-and-use-lindormtable
### Installation
#### Download Links
| Platform | Download URL |
|----------|-------------|
| Linux x64 | `https://tsdbtools.oss-cn-hangzhou.aliyuncs.com/lindorm-cli-linux-latest.tar.gz` |
| Linux ARM64 | `https://tsdbtools.oss-cn-hangzhou.aliyuncs.com/lindorm-cli-linux-arm64-latest.tar.gz` |
| macOS AMD64 | `https://tsdbtools.oss-cn-hangzhou.aliyuncs.com/lindorm-cli-mac-amd64-latest.tar.gz` |
| macOS ARM64 | `https://tsdbtools.oss-cn-hangzhou.aliyuncs.com/lindorm-cli-mac-arm64-latest.tar.gz` |
| Windows x64 | `https://tsdbtools.oss-cn-hangzhou.aliyuncs.com/lindorm-cli-windows-x64-latest.zip` |
#### Installation Steps (macOS/Linux)
```bash
# 1. Download (example: macOS ARM64)
curl -L -o lindorm-cli.tar.gz "https://tsdbtools.oss-cn-hangzhou.aliyuncs.com/lindorm-cli-mac-arm64-latest.tar.gz"
# 2. Extract
tar -xzf lindorm-cli.tar.gz
# 3. Move to PATH
sudo mv lindorm-cli /usr/local/bin/
# 4. Verify
lindorm-cli -version
# Expected output: lindorm-cli version: 2.2.0 (or later)
```
#### Installation Steps (Windows)
1. Download the ZIP file from the table above
2. Extract the ZIP
3. Add the directory to PATH
4. Verify: `lindorm-cli -version`
### Connection Methods
Lindorm CLI supports two connection protocols:
#### 1. MySQL Protocol (Wide Table Engine)
```bash
# Connect to wide table engine via MySQL protocol
lindorm-cli -url <lindorm_mysql_host>:33060 -username root -password <password> -database <db_name>
# Example (V2 instance)
lindorm-cli -url ld-uf6nbdlx5n34q6l6t-proxy-lindorm-pub.lindorm.aliyuncs.com:33060 \
-username root -password <password> -database default
# URL can also use mysql:// prefix
lindorm-cli -url mysql://ld-xxx-proxy-lindorm-pub.lindorm.aliyuncs.com:33060 \
-username root -password <pwd> -database <db>
```
#### 2. Avatica/TSDB Protocol (Time Series Engine)
```bash
# Connect to TSDB engine via Avatica JDBC protocol
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://<tsdb_host>:8242' \
-username root -password <password>
# Example (V2 instance)
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://ld-uf6nbdlx5n34q6l6t-proxy-tsdb-pub.lindorm.aliyuncs.com:8242' \
-username root -password <password>
```
### Usage Modes
#### Non-Interactive Mode (-execute)
Execute a single SQL command and quit:
```bash
lindorm-cli -url <host>:33060 -username root -password <pwd> \
-database <db> -execute "SELECT * FROM my_table" -format table
```
#### Interactive Mode (Pipe)
Pipe multiple SQL statements to lindorm-cli through stdin:
```bash
echo "UPSERT INTO t1 VALUES ('k1','v1',10); SELECT * FROM t1;" | \
lindorm-cli -url <host>:33060 -username root -password <pwd> -database <db> -format table
```
#### Output Format Options
| Format | Description | Notes |
|--------|-------------|-------|
| `table` | Formatted table with borders | Default; good for terminal |
| `csv` | CSV format | Good for data export |
| `column` | Column-aligned plain text | Compact display |
| `vertical` | Vertical (one field per line) | Like MySQL `\G` |
| `json` | JSON with base64-encoded strings | Use `-pretty` for readable JSON |
```bash
# Table format (default)
lindorm-cli ... -execute "SELECT * FROM t1" -format table
# CSV format
lindorm-cli ... -execute "SELECT * FROM t1" -format csv
# JSON format with pretty print
lindorm-cli ... -execute "SELECT * FROM t1" -format json -pretty
# Write output to file
lindorm-cli ... -execute "SELECT * FROM t1" -format table -output result.txt
```
#### Precision for TSDB Queries
```bash
# Timestamp formats for TSDB queries
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://...' ... -precision ms # milliseconds
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://...' ... -precision s # seconds
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://...' ... -precision rfc3339 # RFC 3339 format
```
### SQL Syntax Notes for Lindorm CLI
> **Important**: Lindorm CLI's SQL parser has subtle differences from the mysql CLI client.
#### CREATE TABLE Syntax
Lindorm CLI requires `PRIMARY KEY` to be specified **inside** the column definition parentheses (MySQL standard style), not as a separate clause outside:
```sql
-- ✅ Correct: PRIMARY KEY inside column list (also works in mysql client)
CREATE TABLE t1 (id VARCHAR NOT NULL, name VARCHAR, score INT, PRIMARY KEY(id));
-- ❌ Wrong: PRIMARY KEY as separate clause (works in mysql client but fails in lindorm-cli -execute)
CREATE TABLE t1 (id VARCHAR NOT NULL, name VARCHAR, score INT) PRIMARY KEY (id);
```
> Placing `PRIMARY KEY` outside the parentheses works with the `mysql` CLI client but causes an `"Encountered PRIMARY PRIMARY"` error in lindorm-cli `-execute` mode.
#### UPSERT vs INSERT
Lindorm SQL uses `UPSERT` as the primary write operation (`INSERT` also has UPSERT semantics):
```sql
-- UPSERT is recommended
UPSERT INTO t1 (id, name, score) VALUES ('k1', 'Alice', 88);
-- Bulk UPSERT
UPSERT INTO t1 (id, name, score) VALUES ('k1','Alice',88), ('k2','Bob',95), ('k3','Carol',72);
```
#### Reserved Words
Avoid using SQL reserved words as column names (e.g., `value`, `key`, `timestamp`). If needed, quote with backticks:
```sql
-- ❌ Error: "value" is reserved
CREATE TABLE t1 (id VARCHAR, value INT, PRIMARY KEY(id));
-- ✅ Correct: use backticks or avoid reserved words
CREATE TABLE t1 (id VARCHAR, `value` INT, PRIMARY KEY(id));
```
### Lindorm CLI Common Operations
```bash
# Show databases
lindorm-cli -url <host>:33060 -username root -password <pwd> -execute "SHOW DATABASES" -format table
# Create database
lindorm-cli -url <host>:33060 -username root -password <pwd> -execute "CREATE DATABASE my_db" -format table
# Create table (use inline PRIMARY KEY syntax)
lindorm-cli -url <host>:33060 -username root -password <pwd> -database my_db \
-execute "CREATE TABLE users (id VARCHAR NOT NULL, name VARCHAR, age INT, PRIMARY KEY(id))" -format table
# Insert data
lindorm-cli -url <host>:33060 -username root -password <pwd> -database my_db \
-execute "UPSERT INTO users (id, name, age) VALUES ('u1','Alice',25)" -format table
# Query data
lindorm-cli -url <host>:33060 -username root -password <pwd> -database my_db \
-execute "SELECT * FROM users" -format table
# Create secondary index (no USING KV in mysql protocol)
lindorm-cli -url <host>:33060 -username root -password <pwd> -database my_db \
-execute "CREATE INDEX idx_age ON users (age)" -format table
# Describe table
lindorm-cli -url <host>:33060 -username root -password <pwd> -database my_db \
-execute "DESCRIBE users" -format table
# Drop table
lindorm-cli -url <host>:33060 -username root -password <pwd> -database my_db \
-execute "DROP TABLE IF EXISTS users" -format table
```
### TSDB Operations via Lindorm CLI
```bash
# Connect to TSDB
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://<tsdb_host>:8242' \
-username root -password <pwd>
# Show databases
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://<tsdb_host>:8242' \
-username root -password <pwd> -execute "SHOW DATABASES" -format table
# Create TSDB table
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://<tsdb_host>:8242' \
-username root -password <pwd> \
-execute "CREATE TABLE ts_test (p VARCHAR NOT NULL, t TIMESTAMP NOT NULL, v DOUBLE, PRIMARY KEY(p, t))" -format table
# Upsert time series data
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://<tsdb_host>:8242' \
-username root -password <pwd> \
-execute "UPSERT INTO ts_test (p, t, v) VALUES ('cpu', '2024-01-01 10:00:00', 85.5)" -format table
# Query with time range
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://<tsdb_host>:8242' \
-username root -password <pwd> \
-execute "SELECT * FROM ts_test WHERE p='cpu' AND t >= '2024-01-01 00:00:00'" -format table
# Query with millisecond precision
lindorm-cli -url 'jdbc:lindorm:tsdb:url=http://<tsdb_host>:8242' \
-username root -password <pwd> \
-execute "SELECT * FROM ts_test" -format table -precision ms
```
### Lindorm CLI Limitations
1. **Avatica (port 30060) not supported for wide table**: lindorm-cli only supports MySQL protocol for wide table and `jdbc:lindorm:tsdb:` URL for TSDB. It cannot connect to the generic Avatica SQL port (30060) — use `mysql` client or `phoenixdb` Python library for that.
2. **CREATE TABLE syntax**: Must use inline `PRIMARY KEY` inside column list, not as separate clause.
3. **JSON format encoding**: JSON output uses base64 encoding for VARCHAR values. Use `table` or `csv` format for readable output.
4. **One statement per `-execute`**: lindorm-cli `-execute` mode runs one command per invocation. For multi-command workflows, pipe the statements through stdin or use multiple invocations.
### Lindorm CLI References
- Official Documentation (Wide Table): https://help.aliyun.com/zh/lindorm/user-guide/use-lindorm-cli-to-connect-to-and-use-lindormtable
- Official Documentation (TSDB): https://help.aliyun.com/zh/lindorm/user-guide/use-lindorm-cli-to-connect-to-and-use-lindorm-tsdb
---
## HBase Shell (alihbase)
HBase Shell is used for native HBase API operations on Lindorm wide table engine. **Lindorm requires the dedicated `alihbase shell` (not the open-source Apache HBase Shell)**.
> **Critical**: The open-source Apache HBase Shell cannot connect to Lindorm's public network endpoint. It resolves ZK addresses to internal IPs that are inaccessible from outside. You must use `alihbase shell` from Alibaba Cloud.
### alihbase Shell Installation
> **Download URL**: The `alihbase shell` can be downloaded directly from the public OSS URL:
> `https://hbaseuepublic.oss-cn-beijing.aliyuncs.com/hbaseue-shell.tar.gz`
#### Step-by-Step Installation
1. **Download alihbase shell**:
```bash
curl -L -o hbaseue-shell.tar.gz "https://hbaseuepublic.oss-cn-beijing.aliyuncs.com/hbaseue-shell.tar.gz"
```
2. **Extract and install**:
```bash
# Extract (creates alihbase-2.0.22 directory)
tar -xzf hbaseue-shell.tar.gz
# Navigate to the directory
cd alihbase-2.0.22
# Run alihbase shell
./bin/hbase shell
```
#### Connection Configuration
alihbase shell connects to Lindorm via ZooKeeper address with username/password authentication:
```bash
# Configuration file (hbase-site.xml), recommended
# Place in conf/ directory:
cat > conf/hbase-site.xml << 'EOF'
<?xml version="1.0"?>
<configuration>
<property>
<name>hbase.zookeeper.quorum</name>
<value><lindorm_hbase_host>:30020</value>
</property>
<property>
<name>hbase.client.username</name>
<value>root</value>
</property>
<property>
<name>hbase.client.password</name>
<value><InitialRootPassword_from_GetLindormV2InstanceDetails></value>
</property>
</configuration>
EOF
./bin/alihbase shell
```
#### Java Requirements
- **JDK 8+** is required
- Verify: `java -version`
### alihbase Shell Common Operations
```ruby
# Create table with column family
create 'my_table', 'cf1'
# Put data
put 'my_table', 'row1', 'cf1:name', 'Alice'
put 'my_table', 'row1', 'cf1:age', '25'
# Get a row
get 'my_table', 'row1'
# Scan table
scan 'my_table'
# Scan with limit
scan 'my_table', {LIMIT => 10}
# Delete a column
delete 'my_table', 'row1', 'cf1:age'
# Describe table
describe 'my_table'
# Count rows
count 'my_table'
# Disable and drop table
disable 'my_table'
drop 'my_table'
```
### Alternative: HBase Operations via SQL
If alihbase shell is not available, you can perform equivalent HBase operations using SQL through `mysql` client or `lindorm-cli`:
| HBase Shell Command | SQL Equivalent |
|--------------------|----------------|
| `create 't', 'cf1'` | `CREATE TABLE t (pk VARCHAR NOT NULL, cf1:col VARCHAR, PRIMARY KEY(pk)) WITH (DYNAMIC_COLUMNS=true)` |
| `put 't', 'r1', 'cf1:name', 'Alice'` | ``UPSERT INTO t (pk, `cf1:name`) VALUES ('r1', 'Alice')`` |
| `get 't', 'r1'` | `SELECT * FROM t WHERE pk='r1'` |
| `scan 't'` | `SELECT * FROM t` |
| `delete 't', 'r1', 'cf1:age'` | `DELETE FROM t WHERE pk='r1'` (deletes entire row) |
> Note: SQL operations on HBase-style tables use `DYNAMIC_COLUMNS=true` table property and colon-separated column family syntax (`cf1:col_name`).
### HBase Shell References
- Official Documentation: https://help.aliyun.com/zh/lindorm/user-guide/use-hbase-shell-to-connect-to-and-use-the-wide-table-engine
- alihbase-client Maven: `com.aliyun.hbase:alihbase-client:2.8.11`
- alihbase-shell Maven: `com.aliyun.hbase:alihbase-shell:2.8.7` (JAR only, not standalone binary)
FILE:references/03-ref/ram-policies.md
# RAM Permission Policies
All APIs used by this Skill and the RAM permissions they require.
> **About policy names**: The official Lindorm system policies are `AliyunLindormReadOnlyAccess` (read-only), `AliyunLindormFullAccess` (full access), and `AliyunLindormDevelopAccess` (developer).
## Lindorm API Permissions
| API Action | Policy | Description |
|------------|--------|-------------|
| `DescribeRegions` | `AliyunLindormReadOnlyAccess` | Query the region list |
| `GetLindormInstanceList` | `AliyunLindormReadOnlyAccess` | Query the instance list |
| `GetLindormInstance` | `AliyunLindormReadOnlyAccess` | Query instance details |
| `GetLindormV2InstanceDetails` | `AliyunLindormReadOnlyAccess` | Query V2 instance details |
| `GetLindormInstanceEngineList` | `AliyunLindormReadOnlyAccess` | Query the instance engine list |
| `GetLindormFsUsedDetail` | `AliyunLindormReadOnlyAccess` | Query storage details (V1) |
| `GetLindormV2StorageUsage` | `AliyunLindormReadOnlyAccess` | Query storage details (V2) |
| `GetInstanceIpWhiteList` | `AliyunLindormReadOnlyAccess` | Query the IP whitelist |
## CloudMonitor API Permissions
| API Action | Policy | Description |
|------------|--------|-------------|
| `DescribeMetricMetaList` | `AliyunCloudMonitorReadOnlyAccess` | Query the metric list |
| `DescribeMetricLast` | `AliyunCloudMonitorReadOnlyAccess` | Query the latest monitoring data |
| `DescribeMetricData` | `AliyunCloudMonitorReadOnlyAccess` | Query historical monitoring data |
## System Permission Policies
### Read-Only Permissions (Recommended)
> This Skill covers read-only scenarios and needs only the following permissions.
```json
{
"Statement": [
{
"Effect": "Allow",
"Action": [
"hitsdb:DescribeRegions",
"hitsdb:GetLindormInstanceList",
"hitsdb:GetLindormInstance",
"hitsdb:GetLindormV2InstanceDetails",
"hitsdb:GetLindormInstanceEngineList",
"hitsdb:GetLindormFsUsedDetail",
"hitsdb:GetLindormV2StorageUsage",
"hitsdb:GetInstanceIpWhiteList"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"cms:DescribeMetricMetaList",
"cms:DescribeMetricLast",
"cms:DescribeMetricData"
],
"Resource": "*"
}
],
"Version": "1"
}
```
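> ℹ️ **About write permissions**: This Skill performs no write operations. If a user genuinely needs write operations such as creating, modifying, or deleting instances, grant the official system policy `AliyunLindormFullAccess` directly.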
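## Permission Configuration Steps
### Configure via the RAM Console
1. Log on to the [RAM console](https://ram.console.aliyun.com/)
2. Create a RAM user or use an existing one
3. On the user details page, click "Add Permissions"
4. Select the policies:
- Read-only: `AliyunLindormReadOnlyAccess` + `AliyunCloudMonitorReadOnlyAccess`
- Full access: `AliyunLindormFullAccess`
5. Confirm the authorization
### Configure via the CLI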
```bash
# Create a RAM user
aliyun ram create-user --user-name lindorm-operator
# Attach the read-only policies
aliyun ram attach-policy-to-user \
--user-name lindorm-operator \
--policy-name AliyunLindormReadOnlyAccess \
--policy-type System
aliyun ram attach-policy-to-user \
--user-name lindorm-operator \
--policy-name AliyunCloudMonitorReadOnlyAccess \
--policy-type System
```
## Permission Verification
Run the following commands to verify that the permissions are configured correctly:
```bash
# Test Lindorm permissions
aliyun hitsdb get-lindorm-instance-list --region cn-shanghai
# Test CloudMonitor permissions
aliyun cms describe-metric-meta-list --namespace acs_lindorm
```
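If a `Forbidden.RAM` error is returned, the permissions are insufficient; add them using the steps above.
## Handling Insufficient Permissions
1. Check the current user's permissions: review the user's attached policies in the RAM console
2. Confirm the required permissions: see the API permission tables above
3. Request permissions: ask the Alibaba Cloud account administrator to attach the corresponding policies
4. Verify: rerun the test commands to confirm that the permissions have taken effect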
FILE:references/03-ref/related-commands.md
# CLI Command List
All aliyun CLI commands used by this Skill.
## Lindorm Instance Management CLI
**Product name**: `hitsdb` (product alias for Lindorm)
**API Version**: `2020-06-15`
### Plugin Installation
```bash
# Install the Lindorm plugin
aliyun plugin install --names hitsdb
# Enable automatic plugin installation (recommended)
aliyun configure set --auto-plugin-install true
```
### Query Commands
| CLI command | Description | Required parameters | Key response fields |
|-------------|-------------|---------------------|---------------------|
| `aliyun hitsdb describe-regions` | Query the supported regions | None | `Regions[]`: RegionId, LocalName, RegionEndpoint |
| `aliyun hitsdb get-instance-summary` | Instance overview across all regions (no `--region` needed) | None | `RegionalSummary[]`: RegionId, RunningCount, LockingCount, Total |
| `aliyun hitsdb get-lindorm-instance-list` | List instances (ID, status, engine switches; supports filtering by region/type) | `--region` | `InstanceList[]`: InstanceId, InstanceAlias, InstanceStatus, ServiceType, Enable* engine switches |
| `aliyun hitsdb get-lindorm-instance` | Configuration/version/status (ServiceType, engine node counts, specifications; **no connection addresses**) | `--instance-id` | InstanceStatus, ServiceType, VpcId, `EngineList[]`: Engine, CoreCount, CpuCount, MemorySize, Specification, Version |
| `aliyun hitsdb get-lindorm-instance-engine-list` | Connection addresses (host:port per engine, public/intranet) | `--instance-id` | `EngineList[]`: EngineType, `NetInfoList[]`: ConnectionString, Port, NetType (`"2"` = intranet / `"0"` = public) |
| `aliyun hitsdb get-lindorm-fs-used-detail` | Query storage details (V1) | `--instance-id` | FsCapacity, FsUsedHot/Cold, FsUsedOnLindormTable/TSDB/Search, `LStorageUsageList[]` |
| `aliyun hitsdb get-lindorm-v2-storage-usage` | Query storage details (V2) | `--instance-id` | `UsageByDiskCategory[]`: capacity, used, usedLindormTable/Tsdb/Search3/Column3/Vector3/Message3 |
| `aliyun hitsdb get-instance-ip-white-list` | Query the IP whitelist | `--instance-id` | `GroupList[]`: GroupName, SecurityIpList |
| `aliyun hitsdb get-lindorm-v2-instance-details` | Query V2 instance details | `--instance-id` | Detailed configuration of the V2 instance |
### Management Commands
| CLI command | Description | Required parameters |
|-------------|-------------|---------------------|
| `aliyun hitsdb create-lindorm-instance` | Create an instance | Multiple parameters; see `--help` |
| `aliyun hitsdb create-lindorm-v2-instance` | Create a V2 instance | Multiple parameters; see `--help` |
| `aliyun hitsdb release-lindorm-instance` | Release an instance | `--instance-id`, `--region` |
| `aliyun hitsdb upgrade-lindorm-instance` | Change the instance configuration | `--instance-id`, `--region` |
| `aliyun hitsdb update-instance-ip-white-list` | Update the IP whitelist | `--instance-id`, `--region`, `--group-name`, `--security-ip-list` |
| `aliyun hitsdb update-lindorm-instance-attribute` | Update instance attributes | `--instance-id`, `--region` |
### Examples
```bash
# Query regions
aliyun hitsdb describe-regions
# Query the instance overview (no region needed; all regions are returned)
aliyun hitsdb get-instance-summary
# Query the instance list (region required)
aliyun hitsdb get-lindorm-instance-list --region cn-shanghai
# Query instance details (no region needed)
aliyun hitsdb get-lindorm-instance --instance-id ld-uf6nbdlx5n34q6l6t
# Query the engine list (no region needed)
aliyun hitsdb get-lindorm-instance-engine-list --instance-id ld-uf6nbdlx5n34q6l6t
# Query V1 storage details (no region needed)
aliyun hitsdb get-lindorm-fs-used-detail --instance-id ld-uf6cx7381qw2u5u8w
# Query V2 storage details (no region needed)
aliyun hitsdb get-lindorm-v2-storage-usage --instance-id ld-uf6nbdlx5n34q6l6t
# Query the IP whitelist (no region needed)
aliyun hitsdb get-instance-ip-white-list --instance-id ld-uf6nbdlx5n34q6l6t
# Query the instance list with filters
aliyun hitsdb get-lindorm-instance-list --region cn-shanghai --service-type lindorm_v2 --support-engine 4
```
### Response Structures
#### get-lindorm-instance-list → List instances
Returns basic instance information, without connection addresses or engine node counts/specifications.
```json
{
"InstanceList": [
{
"InstanceId": "ld-xxx",
"InstanceAlias": "实例名称",
"InstanceStatus": "ACTIVATION",
"ServiceType": "lindorm_v2",
"InstanceStorage": "320",
"VpcId": "vpc-xxx",
"RegionId": "cn-shanghai",
"EnableLts": true,
"EnableStream": true,
"EnableCompute": true,
"EnableVector": false
}
],
"Total": 1
}
```
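#### get-lindorm-instance → Configuration/version/status
Returns the detailed configuration of a single instance, including engine node counts, specifications, and versions, **but no connection addresses**.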
```json
{
"InstanceId": "ld-xxx",
"InstanceStatus": "ACTIVATION",
"ServiceType": "lindorm_v2",
"NetworkType": "vpc",
"VpcId": "vpc-xxx",
"DiskUsage": "15.3%",
"DiskCategory": "cloud_essd",
"EngineList": [
{
"Engine": "lindorm",
"CoreCount": "2",
"CpuCount": "4",
"MemorySize": "16GB",
"Specification": "lindorm.g.xlarge",
"Version": "2.8.6.4"
},
{
"Engine": "tsdb",
"CoreCount": "2",
"CpuCount": "4",
"MemorySize": "16GB",
"Specification": "lindorm.g.xlarge",
"Version": "3.7.11"
}
]
}
```
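#### get-lindorm-instance-engine-list → Connection addresses
Returns the connection address and network type for each engine, **but no configuration or node count information**.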
```json
{
"EngineList": [
{
"EngineType": "lindorm",
"NetInfoList": [
{
"ConnectionString": "ld-xxx-proxy-lindorm.lindorm.rds.aliyuncs.com",
"Port": 30020,
"NetType": "2",
"AccessType": 1
},
{
"ConnectionString": "ld-xxx-proxy-lindorm.lindorm.rds.aliyuncs.com",
"Port": 33060,
"NetType": "2",
"AccessType": 5
}
]
},
{
"EngineType": "tsdb",
"NetInfoList": [
{
"ConnectionString": "ld-xxx-proxy-tsdb.lindorm.rds.aliyuncs.com",
"Port": 8242,
"NetType": "2",
"AccessType": 1
}
]
}
]
}
```
**NetType values**: `"2"` = intranet, `"0"` = public network
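To pick the right endpoint programmatically, the engine list response can be flattened into one row per address. A minimal sketch, assuming the response has the shape shown above; `list_endpoints` and `NET_TYPE_LABELS` are hypothetical names introduced for illustration.

```python
import json
from typing import List, Tuple

# Mapping taken from the NetType note above.
NET_TYPE_LABELS = {"2": "intranet", "0": "public"}


def list_endpoints(response_text: str) -> List[Tuple[str, str, str]]:
    """Return (engine, network, host:port) for every NetInfoList entry."""
    data = json.loads(response_text)
    rows = []
    for engine in data.get("EngineList", []):
        for net in engine.get("NetInfoList", []):
            network = NET_TYPE_LABELS.get(str(net.get("NetType")), "unknown")
            address = f'{net["ConnectionString"]}:{net["Port"]}'
            rows.append((engine.get("EngineType", ""), network, address))
    return rows


# Sample trimmed from the response structure shown above.
SAMPLE = json.dumps({"EngineList": [{"EngineType": "tsdb", "NetInfoList": [
    {"ConnectionString": "ld-xxx-proxy-tsdb.lindorm.rds.aliyuncs.com",
     "Port": 8242, "NetType": "2", "AccessType": 1}]}]})

for row in list_endpoints(SAMPLE):
    print(row)
```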
#### get-instance-ip-white-list → Query the IP whitelist
```json
{
"InstanceId": "ld-xxx",
"GroupList": [
{
"GroupName": "default",
"SecurityIpList": "127.0.0.1"
},
{
"GroupName": "office",
"SecurityIpList": "140.205.0.0/24"
}
]
}
```
#### get-lindorm-fs-used-detail → V1 storage details
```json
{
"FsCapacity": "429496729600",
"FsUsedHot": "789543",
"FsUsedCold": "0",
"FsUsedOnLindormTable": "44093",
"FsUsedOnLindormTSDB": "856",
"FsUsedOnLindormSearch": "0",
"FsUsedOnLindormTableData": "15452",
"FsUsedOnLindormTableWAL": "10304",
"LStorageUsageList": [
{
"DiskType": "StandardCloudStorage",
"Capacity": "429496729600",
"Used": "912591424",
"UsedLindormTable": "43694",
"UsedLindormTsdb": "356",
"UsedLindormSearch": "310515",
"UsedLindormMessage3": "433856",
"UsedOther": "911803003"
}
],
"Valid": "true"
}
```
> All values are in bytes.
#### get-lindorm-v2-storage-usage → V2 storage details
```json
{
"CapacityByDiskCategory": [
{ "category": "PERF_CLOUD_ESSD_PL1", "capacity": 960, "usedCapacity": 0, "mode": "CLOUD_STORAGE" },
{ "category": "LOCAL_BUFFER", "capacity": 480, "usedCapacity": 0, "mode": "REMOTE_STORAGE" },
{ "category": "REMOTE_CAP_OSS", "capacity": 100, "usedCapacity": 0, "mode": "REMOTE_STORAGE" }
],
"UsageByDiskCategory": [
{
"diskType": "PerformanceCloudStorage",
"capacity": 1030792151040,
"used": 2506614016,
"usedLindormTable": 662244,
"usedLindormTsdb": 159406,
"usedLindormSearch3": 363609,
"usedLindormColumn3": 228742,
"usedLindormVector3": 441015,
"usedLindormMessage3": 208236,
"usedLindormSpark": 2240333801,
"usedOther": 264216963
}
]
}
```
> `CapacityByDiskCategory` is in GB; `UsageByDiskCategory` is in bytes.
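Because the two arrays use different units, it helps to normalize `UsageByDiskCategory` entries before comparing them. A sketch under one assumption drawn only from the sample above: 960 × 1024³ = 1030792151040, so "GB" appears to mean binary gigabytes. `summarize_usage` is a hypothetical helper.

```python
GIB = 1024 ** 3  # assumption: the GB figures are binary gigabytes (see lead-in)


def summarize_usage(entry: dict) -> dict:
    """Convert one UsageByDiskCategory entry from bytes to GiB and a percentage."""
    capacity = entry["capacity"]
    used = entry["used"]
    return {
        "disk_type": entry.get("diskType"),
        "capacity_gib": capacity / GIB,
        "used_gib": used / GIB,
        "used_percent": used / capacity * 100 if capacity else 0.0,
    }


# Values copied from the sample response above.
sample = {"diskType": "PerformanceCloudStorage",
          "capacity": 1030792151040, "used": 2506614016}
summary = summarize_usage(sample)
print(summary)
```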
---
### Parameters
#### `--region` (Region)
| Region ID | Region name |
|-----------|-------------|
| `cn-shanghai` | China (Shanghai), default |
| `cn-beijing` | China (Beijing) |
| `cn-hangzhou` | China (Hangzhou) |
| `cn-shenzhen` | China (Shenzhen) |
| `cn-zhangjiakou` | China (Zhangjiakou) |
| `cn-qingdao` | China (Qingdao) |
| `cn-wulanchabu` | China (Ulanqab) |
| `cn-guangzhou` | China (Guangzhou) |
| `cn-chengdu` | China (Chengdu) |
#### `--instance-id` (Instance ID)
Format: `ld-xxx` (starts with `ld-`, followed by letters and digits)
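A quick client-side check can catch malformed IDs before calling the API. This sketch encodes only the format described above (the `ld-` prefix followed by letters and digits); the exact character set and length the service accepts are not specified here, so treat the pattern as an assumption.

```python
import re

# Assumption: "letters and digits" means ASCII alphanumerics, at least one.
INSTANCE_ID_PATTERN = re.compile(r"^ld-[A-Za-z0-9]+$")


def looks_like_instance_id(value: str) -> bool:
    """Return True if the value matches the ld-xxx format."""
    return bool(INSTANCE_ID_PATTERN.match(value))


print(looks_like_instance_id("ld-uf6nbdlx5n34q6l6t"))
```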
#### `--service-type` (Instance Type)
For the complete list, see SKILL.md → 「版本判断」 (version detection)
| Value | Description |
|-------|-------------|
| `lindorm` | Lindorm V1 single-zone instance |
| `lindorm_multizone` | Lindorm V1 multi-zone instance |
| `lindorm_multizone_basic` | Lindorm V1 multi-zone (Basic Edition) |
| `lindorm_v2` | Lindorm V2 single-zone instance |
| `lindorm_v2_multizone` | Lindorm V2 multi-zone (Basic Edition) |
| `lindorm_v2_multizone_ha` | Lindorm V2 multi-zone (High-Availability Edition) |
| `serverless_lindorm` | Lindorm Serverless instance |
| `lindorm_standalone` | Lindorm single-node (development and testing) |
#### `--support-engine` (Engine Type, Bitmask)
| Value | Engine | Engine code |
|-------|--------|-------------|
| `1` | Search engine | `solr` / `lsearch` |
| `2` | Time series engine | `tsdb` |
| `4` | Wide table engine | `lindorm` / `lcolumn` |
| `8` | File engine | `file` |
| `15` = 1+2+4+8 | All engines | Wide table + time series + search + file |
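Since `--support-engine` is a bitmask, any combination of engines is the sum of the individual values in the table. A minimal sketch of encoding and decoding it; `ENGINE_BITS`, `encode_support_engine`, and `decode_support_engine` are hypothetical names.

```python
from typing import Iterable, List

# Bit values copied from the table above.
ENGINE_BITS = {1: "search", 2: "tsdb", 4: "wide table", 8: "file"}


def decode_support_engine(mask: int) -> List[str]:
    """List the engines whose bit is set in the mask."""
    return [name for bit, name in ENGINE_BITS.items() if mask & bit]


def encode_support_engine(names: Iterable[str]) -> int:
    """Combine engine names into a single bitmask value."""
    wanted = set(names)
    return sum(bit for bit, name in ENGINE_BITS.items() if name in wanted)


print(decode_support_engine(6))                       # bits 2 and 4
print(encode_support_engine(["tsdb", "wide table"]))  # 2 + 4
```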
#### Engine Type Details
For engine type details, see SKILL.md → 「引擎类型详情」 (engine type details)
---
## CloudMonitor CLI
**Product name**: `cms`
**Namespace**: `acs_lindorm`
### Plugin Installation
```bash
# Install the CloudMonitor plugin
aliyun plugin install --names cms
```
### Query Commands
| CLI command | Description | Required parameters | Notes |
|-------------|-------------|---------------------|-------|
| `aliyun cms describe-metric-meta-list` | Query the metric list | `--namespace` | `--region` is optional |
| `aliyun cms describe-metric-last` | Query the latest data | `--namespace`, `--metric-name`, `--dimensions` | `--region` is optional; the instance is located automatically through `instanceId` |
| `aliyun cms describe-metric-data` | Query historical data | `--namespace`, `--metric-name`, `--dimensions`, `--start-time`, `--end-time` | `--region` is optional; the instance is located automatically through `instanceId` |
### Examples
```bash
# List the Lindorm monitoring metrics
aliyun cms describe-metric-meta-list --namespace acs_lindorm
# Query the latest CPU idle rate
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name cpu_idle \
--dimensions '[{"instanceId":"ld-uf6nbdlx5n34q6l6t"}]'
# Query the latest memory usage
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name mem_used_percent \
--dimensions '[{"instanceId":"ld-uf6nbdlx5n34q6l6t"}]'
# Query historical monitoring data (for a given time range)
aliyun cms describe-metric-data \
--namespace acs_lindorm \
--metric-name cpu_idle \
--dimensions '[{"instanceId":"ld-uf6nbdlx5n34q6l6t"}]' \
--start-time "2026-04-14 08:00:00" \
--end-time "2026-04-14 09:00:00" \
--period 60
```
### Response Structures
#### describe-metric-meta-list → List metrics
```json
{
"Code": 200,
"Resources": {
"Resource": [
{
"MetricName": "cpu_idle",
"Namespace": "acs_lindorm",
"Description": "CPU空闲率",
"Unit": "%",
"Periods": "60,300",
"Dimensions": "userId,instanceId,host",
"Statistics": "Average,Maximum,Minimum"
}
]
}
}
```
#### describe-metric-last → Latest data
```json
{
"Code": "200",
"Period": "60",
"Datapoints": "[{\"timestamp\":1776414660000,\"instanceId\":\"ld-xxx\",\"host\":\"table-1\",\"userId\":\"149xxx\",\"Average\":93.241,\"Maximum\":94.217,\"Minimum\":91.082}]"
}
```
> Note: `Datapoints` is a JSON **string** and must be parsed a second time. Each data point includes `host` (node name) and `userId`.
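The two-step parse looks like this in practice. A minimal sketch, assuming the response shape shown above; `parse_datapoints` is a hypothetical helper, and the sample response is built locally rather than fetched.

```python
import json
from typing import Any, Dict, List


def parse_datapoints(response_text: str) -> List[Dict[str, Any]]:
    """Parse a describe-metric-last style response into a list of data points."""
    outer = json.loads(response_text)        # first parse: the CLI response
    raw = outer.get("Datapoints") or "[]"    # Datapoints is itself a JSON string
    return json.loads(raw)                   # second parse: the data points


# Build a response with the same nesting as the sample above.
response = json.dumps({
    "Code": "200",
    "Period": "60",
    "Datapoints": json.dumps([{
        "timestamp": 1776414660000, "instanceId": "ld-xxx",
        "host": "table-1", "Average": 93.241,
    }]),
})

points = parse_datapoints(response)
print(points[0]["host"], points[0]["Average"])
```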
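#### describe-metric-data → Historical data
Returns the same structure as `describe-metric-last`; `Datapoints` holds data for multiple points in time.
---
### Parameters
#### `--dimensions` (JSON Array)
**Format**: On Linux/macOS, wrap the value in single quotes (no escaping needed); in Windows CMD, use double quotes with escaping.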
```bash
# ✅ Recommended on Linux/macOS (single quotes, no escaping)
--dimensions '[{"instanceId":"ld-xxx"}]'
# ✅ Windows CMD (double quotes with escaping)
--dimensions "[{\"instanceId\":\"ld-xxx\"}]"
```
Example with multiple dimension entries (two instances, shown with escaped double quotes):
```bash
--dimensions "[{\"instanceId\":\"ld-xxx\"},{\"instanceId\":\"ld-yyy\"}]"
```
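When the instance IDs come from a script, generating the `--dimensions` value avoids hand-written quoting mistakes. A sketch using only the standard library; `dimensions_json` is a hypothetical helper, and `shlex.quote` produces POSIX shell quoting only (it does not apply to Windows CMD).

```python
import json
import shlex
from typing import Iterable


def dimensions_json(instance_ids: Iterable[str]) -> str:
    """Build the compact JSON array expected by --dimensions."""
    return json.dumps([{"instanceId": i} for i in instance_ids],
                      separators=(",", ":"))


value = dimensions_json(["ld-xxx", "ld-yyy"])
# shlex.quote wraps the JSON in single quotes for a POSIX shell.
print("--dimensions", shlex.quote(value))
```

If the command is launched with `subprocess.run([...])` and an argument list, the raw `value` can be passed as-is and no shell quoting is needed.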
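#### `--start-time` / `--end-time` (Time Range)
For the time format, see SKILL.md → 「时间格式」 (time format)
#### `--period` (Collection Period, in Seconds)
| Value | Description |
|-------|-------------|
| `60` | 1 minute (default) |
| `300` | 5 minutes |
| `900` | 15 minutes |
| `3600` | 1 hour |
---
For common monitoring metrics, see `references/02-ops/monitoring-guide.md`
---
## JMESPath Query Filtering
Use `--cli-query` to filter the output: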
```bash
# Return only the instance ID, alias, and status
aliyun hitsdb get-lindorm-instance-list --region cn-shanghai \
--cli-query 'InstanceList[].[InstanceId,InstanceAlias,InstanceStatus]'
# Return only the engine type and first connection string of an instance (no --region needed)
aliyun hitsdb get-lindorm-instance-engine-list \
--instance-id ld-uf6nbdlx5n34q6l6t \
--cli-query 'EngineList[].[EngineType,NetInfoList[0].ConnectionString]'
# Return only the Datapoints field (a JSON string; extracting averages needs a second parse)
aliyun cms describe-metric-last \
--namespace acs_lindorm --metric-name cpu_idle \
--dimensions '[{"instanceId":"ld-xxx"}]' \
--cli-query 'Datapoints'
```
---
## Pagination
Use `--pager` to merge multi-page results:
```bash
# Automatically merge all pages of the instance list
aliyun hitsdb get-lindorm-instance-list --region cn-shanghai --pager
```
---
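## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| `Instance.IsNotValid` | The instance ID is invalid or does not exist | Confirm the instance ID with `get-lindorm-instance-list --region <region>` |
| `InvalidParameter.InstanceId` | The instance ID is malformed | Use the `ld-xxx` format |
| `InstanceNotFound` | The instance does not exist | Check the region and instance ID |
| `Forbidden.RAM` | Insufficient permissions | Attach the `AliyunLindormReadOnlyAccess` policy |
| `Throttling.User` | API rate limiting | Reduce the call frequency or retry later |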
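"Retry later" for `Throttling.User` is commonly implemented as exponential backoff. The sketch below is one possible approach, not a documented CLI feature: `call` is a hypothetical callable that runs a command and returns its output text, and the retry count and delays are arbitrary defaults.

```python
import time
from typing import Callable


def call_with_backoff(
    call: Callable[[], str],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Re-run call() while its output reports Throttling.User, doubling the wait."""
    for attempt in range(retries + 1):
        output = call()
        if "Throttling.User" not in output or attempt == retries:
            return output
        sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x ... the base delay
    return output  # not reached; keeps the return type explicit


# Demo with a fake command that is throttled once, then succeeds.
responses = iter(["ErrorCode: Throttling.User", '{"Total": 1}'])
print(call_with_backoff(lambda: next(responses), base_delay=0.01))
```

Injecting `sleep` makes the waits easy to observe or skip in tests.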
---
## Related Documentation
- Lindorm API documentation: https://help.aliyun.com/zh/lindorm/developer-reference/api-reference
- CloudMonitor API documentation: https://help.aliyun.com/zh/cms/developer-reference/api-reference
- Aliyun CLI installation guide: `./cli-installation-guide.md`
- Lindorm CLI / HBase Shell guide: `./lindorm-cli-guide.md`
FILE:references/03-ref/verification-method.md
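# Verification Methods
How to verify that an operation succeeded.
## Instance Query Verification
### Verify the Instance List Query
**Purpose**: List instances (ID, status, engine switches; supports filtering by region/type)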
```bash
aliyun hitsdb get-lindorm-instance-list --region cn-shanghai
```
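**Key response fields**:
- `InstanceList[]`: InstanceId, InstanceAlias, InstanceStatus, ServiceType
- `Enable*` engine switches: EnableLts, EnableStream, EnableCompute, EnableVector, and so on
- No connection addresses and no engine node counts/specifications
**Success indicators**:
- The `InstanceList` array is non-empty
**Failure indicators**:
- `Forbidden.RAM`: insufficient permissions
- `InvalidParameter`: invalid parameter
### Verify the Instance Details Query
**Purpose**: Configuration/version/status (ServiceType, engine node counts, specifications; **no connection addresses**)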
```bash
aliyun hitsdb get-lindorm-instance --instance-id ld-xxx
```
**返回关键字段**:
- 实例信息:InstanceId, InstanceStatus, ServiceType, VpcId, DiskUsage
- `EngineList[]`:Engine, CoreCount(节点数), CpuCount, MemorySize, Specification, Version
- 不含连接地址(需连接地址请用 `get-lindorm-instance-engine-list`)
**成功标志**:
- `InstanceStatus` 为 `ACTIVATION` 表示运行中
**验证步骤**:
1. 检查 `InstanceId` 与请求参数一致
2. 检查 `ServiceType` 判断 V1(`lindorm`)/V2(`lindorm_v2`)
3. 检查 `EngineList` 引擎列表的节点数和规格
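The three verification steps can be sketched in Python as follows. This is a minimal illustration, assuming the JSON printed by `get-lindorm-instance` has already been parsed into a dict and that `InstanceId`, `ServiceType`, and `EngineList` sit at the top level of that dict; the helper names and sample values are hypothetical.

```python
def classify_lindorm_instance(response, requested_id):
    """Return "V1" or "V2" after confirming the response matches the request."""
    # Step 1: the returned InstanceId must equal the ID that was requested.
    if response.get("InstanceId") != requested_id:
        raise ValueError("InstanceId in response does not match the request")
    # Step 2: ServiceType distinguishes V1 (lindorm) from V2 (lindorm_v2).
    versions = {"lindorm": "V1", "lindorm_v2": "V2"}
    service_type = response.get("ServiceType")
    if service_type not in versions:
        raise ValueError(f"unexpected ServiceType: {service_type!r}")
    return versions[service_type]


def summarize_engines(response):
    """Step 3: collect (Engine, CoreCount, Specification) for each engine."""
    return [
        (engine.get("Engine"), engine.get("CoreCount"), engine.get("Specification"))
        for engine in response.get("EngineList", [])
    ]
```

Raising on a mismatch rather than returning a flag keeps a wrong-instance response from being silently treated as valid.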
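## Monitoring Query Verification
### Verify that the metric list query succeeded
```bash
aliyun cms describe-metric-meta-list --namespace acs_lindorm
```
**Success indicators**:
- The returned JSON contains a `Resources` array
- The array contains metric objects with fields such as `MetricName`, `Namespace`, and `Description`
### Verify that the monitoring data query succeeded
```bash
aliyun cms describe-metric-last \
--namespace acs_lindorm \
--metric-name cpu_idle \
--dimensions '[{"instanceId":"ld-xxx"}]'
```
**Success indicators**:
- The returned JSON contains a `Datapoints` field
- `Datapoints` contains data points with fields such as `instanceId`, `timestamp`, and `Average`
- A `Code` of `200` indicates success
**Verification steps**:
1. Check that the `Datapoints` field is not empty
2. Check that `instanceId` in each data point matches the request parameter
3. Check that the `Average` value falls in a reasonable range (for example, 0-100 for CPU idle percentage)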
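A small Python sketch of these checks, assuming the command output has been parsed into a dict. It accepts `Datapoints` either as a list or as a JSON-encoded string, since the exact encoding is an assumption here; the function name and the default 0-100 bounds are illustrative only.

```python
import json


def check_datapoints(response, instance_id, low=0.0, high=100.0):
    """Validate Datapoints from a describe-metric-last response and return them."""
    raw = response.get("Datapoints")
    # Assumption: Datapoints may arrive as a JSON string or as a parsed list.
    if isinstance(raw, str):
        points = json.loads(raw) if raw.strip() else []
    else:
        points = raw or []
    if not points:
        raise ValueError("Datapoints is empty")
    for point in points:
        if point.get("instanceId") != instance_id:
            raise ValueError("data point belongs to a different instance")
        average = point.get("Average")
        if average is None or not (low <= average <= high):
            raise ValueError(f"Average out of expected range: {average!r}")
    return points
```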
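## Storage Query Verification
### Verify that the storage detail query succeeded
```bash
# V1 instance
aliyun hitsdb get-lindorm-fs-used-detail --instance-id ld-xxx
# V2 instance
aliyun hitsdb get-lindorm-v2-storage-usage --instance-id ld-xxx
```
**Success indicators**:
- The returned JSON contains storage capacity fields
- Values are in bytes
## IP Whitelist Verification
### Verify that the whitelist query succeeded
```bash
aliyun hitsdb get-instance-ip-white-list --instance-id ld-xxx
```
**Success indicators**:
- The returned JSON contains a `GroupList` array
- The array contains whitelist rules as IP addresses or CIDR blocks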
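## Connection Verification
### Verify network connectivity
Use telnet or nc to test port connectivity:
```bash
# Test the MySQL protocol port (33060)
telnet <lindorm-host> 33060
# Test the HBase API port (30020)
telnet <lindorm-host> 30020
# Test the time series engine HTTP port (8242)
curl http://<lindorm-host>:8242/api/v2/status
```
**Success indicators**:
- telnet prints "Connected to xxx"
- curl returns a normal status response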
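Where telnet or nc is not installed, the same TCP reachability check can be approximated with the Python standard library. This is a sketch only: the function name and the 3-second timeout are arbitrary choices, and a successful TCP connect says nothing about authentication or protocol health.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `port_is_open("<lindorm-host>", 33060)` would probe the MySQL protocol port listed in the commands above.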
## Verification Checklist
After any operation, verify against the following checklist:
| Operation | Verification command | Success indicator |
|------|---------|---------|
| List instances | `aliyun hitsdb get-lindorm-instance-list` | `InstanceList` array is non-empty and contains InstanceId/Status/Enable* |
| Query configuration/status | `aliyun hitsdb get-lindorm-instance` | `EngineList` contains CoreCount/Specification/Version (no connection addresses) |
| Query connection addresses | `aliyun hitsdb get-lindorm-instance-engine-list` | `NetInfoList` contains ConnectionString/Port/NetType |
| Query storage details | `aliyun hitsdb get-lindorm-fs-used-detail` | Contains storage capacity data |
| Query IP whitelist | `aliyun hitsdb get-instance-ip-white-list` | Contains whitelist rules |
| Query monitoring metrics | `aliyun cms describe-metric-meta-list` | `Resources` array is non-empty |
| Query monitoring data | `aliyun cms describe-metric-last` | `Datapoints` field is non-empty |

Perform SysOM deep OS-level diagnosis on Alibaba Cloud ECS instances to identify root causes of performance issues (CPU spikes, memory leaks, IO latency, etc...
---
name: alibabacloud-aes-sysom-os-diagnosis
description: |
Perform SysOM deep OS-level diagnosis on Alibaba Cloud ECS instances to identify
root causes of performance issues (CPU spikes, memory leaks, IO latency, etc.).
Use when users report ECS instance performance problems, need kernel-level
troubleshooting, or want to set up continuous automated diagnosis with instance
enrollment and DingTalk alert notifications.
---
# alibabacloud-aes-sysom-os-diagnosis
> **Skill Name**: alibabacloud-aes-sysom-os-diagnosis
> **Goal**: Perform SysOM deep OS-level diagnosis on Alibaba Cloud ECS instances, with optional instance enrollment and DingTalk alert configuration.
---
## Credential Security
> **[CRITICAL] Credential Security Rules:**
> - **NEVER** print, echo, or display AccessKey ID / AccessKey Secret values in conversation or command output (even partial masking of `LTAI_ACCESS_KEY_ID` is FORBIDDEN)
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **NEVER** use `aliyun configure set` with literal credential values
> - **ONLY** use `aliyun configure list` to check credential status
>
> ```bash
> aliyun configure list --user-agent AlibabaCloud-Agent-Skills
> ```
> Check the output for a valid profile (AK, STS, or OAuth identity).
>
> **If no valid profile exists, STOP here.**
> 1. Obtain credentials from [Alibaba Cloud Console](https://ram.console.aliyun.com/manage/ak)
> 2. Configure credentials **outside of this session** (via `aliyun configure` in terminal or environment variables in shell profile)
> 3. Return and re-run after `aliyun configure list --user-agent AlibabaCloud-Agent-Skills` shows a valid profile
---
## RAM Policy
For the full list of RAM permissions required by this skill, see [references/ram-policies.md](references/ram-policies.md).
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Use `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
---
## Parameter Confirmation
> **IMPORTANT: Parameter Confirmation** — Before executing any command or API call,
> ALL user-customizable parameters (e.g., RegionId, instance names, CIDR blocks,
> passwords, domain names, resource specifications, etc.) MUST be confirmed with the
> user. Do NOT assume or use default values without explicit user approval.
| Parameter | Required/Optional | Description | Default Value |
|-----------|-------------------|-------------|---------------|
| `region` | Required | Region of the ECS instance (e.g., `cn-hangzhou`) | None, must be provided by user |
| `instance_id` | Required | ECS instance ID (e.g., `i-bp1xxxxxxxx`) | None, must be provided by user |
| `ocd_description` | Optional | Problem description (English only, e.g., `high_cpu`) | `""` |
| `start_time` | Optional | Diagnosis start timestamp (Unix seconds) | `0` (real-time) |
| `end_time` | Optional | Diagnosis end timestamp (Unix seconds) | `0` |
| `enable_diagnosis` | Optional | Force real-time diagnosis (highest priority) | `false` |
| `uid` | Optional | Account ID owning the instance | `None` |
| `skip_support_check` | Optional | Skip instance support check (speeds up workflow) | `false` |
| `cluster_id` | Optional | ACK cluster ID (required for cluster enrollment) | None |
---
## Core Workflow
The workflow has four phases with 16 steps (Steps 0–15). All `aliyun` CLI commands **MUST** include `--user-agent AlibabaCloud-Agent-Skills`.
### Phase 1: Environment Setup (Steps 0–3)
**Step 0 — Enable AI-Mode and Update Plugins**
Before executing any CLI commands, enable AI-Mode, set User-Agent, and update plugins:
```bash
aliyun configure ai-mode enable
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-aes-sysom-os-diagnosis"
aliyun plugin update
```
> **⚠️ The above three commands must be executed before all CLI operations, and only need to be run once.**
**Step 1 — CLI Version Check**
```bash
aliyun version --user-agent AlibabaCloud-Agent-Skills
```
Verify version >= 3.3.1. If not met, refer to `references/cli-installation-guide.md` for installation.
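The version comparison should be numeric per component, not a string comparison (as strings, `3.10.0` sorts before `3.3.1`). A minimal Python sketch, assuming the output of `aliyun version` contains a dotted `major.minor.patch` number somewhere in its text; the helper names are hypothetical.

```python
import re

MIN_VERSION = (3, 3, 1)


def parse_version(text):
    """Extract the first major.minor.patch triple from CLI output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MIN_VERSION):
    """True when the reported version is at least the required minimum."""
    # Tuples compare element by element, so (3, 10, 0) >= (3, 3, 1) holds.
    return parse_version(text) >= minimum
```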
**Step 2 — Enable Auto Plugin Installation**
```bash
aliyun configure set --auto-plugin-install true --user-agent AlibabaCloud-Agent-Skills
```
**Step 3 — Credential Verification**
```bash
aliyun configure list --user-agent AlibabaCloud-Agent-Skills
```
If no valid credentials exist, **STOP** and guide the user to configure credentials outside the session.
---
### Phase 2: Diagnosis Execution (Steps 4–9)
For detailed workflow, see [references/diagnose-workflow.md](references/diagnose-workflow.md).
**Step 4 — Ambiguous Problem Clarification (Inversion Gate)**
Must confirm `region` and `instance_id`. If not provided by the user, ask explicitly. Also extract optional `ocd_description` (must be translated to English), time range, etc.
> **⚠️ Time Inference Rule**: When the user's description contains **any temporal reference** (e.g., "this morning", "yesterday afternoon", "around 3pm", "last night"), you **MUST** proactively ask for the specific time range and recommend **historical diagnosis mode**. Do NOT silently default to real-time diagnosis when the problem clearly occurred in the past.
**Step 5 — Cloud Assistant Online Check**
```bash
aliyun ecs describe-cloud-assistant-status --biz-region-id <region> --instance-id <instance_id> --user-agent AlibabaCloud-Agent-Skills
```
Check if `CloudAssistantStatus` is `true` in the response. If offline, terminate the pipeline.
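One way to read that flag without hard-coding the response layout is to walk the parsed JSON and look for the entry that carries both the instance ID and `CloudAssistantStatus`. The sketch below assumes only that such an entry exists somewhere in the response and that the flag is either a boolean or the string `"true"`/`"false"`; the function name and the nesting used in the usage example are illustrative.

```python
def find_assistant_status(node, instance_id):
    """Return True/False for the instance's CloudAssistantStatus, or None if absent."""
    if isinstance(node, dict):
        if node.get("InstanceId") == instance_id and "CloudAssistantStatus" in node:
            # Normalise both boolean and string encodings of the flag.
            return str(node["CloudAssistantStatus"]).lower() == "true"
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_assistant_status(child, instance_id)
        if found is not None:
            return found
    return None
```

A result of `None` (instance not present in the response) should be treated the same as offline: terminate the pipeline rather than proceed.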
**Step 6 — SysOM Role Initialization**
```bash
aliyun sysom initial-sysom --check-only false --source aes-skills --user-agent AlibabaCloud-Agent-Skills
```
**Step 7 — Instance Support Check**
```bash
aliyun sysom check-instance-support --instances <instance_id> --biz-region <region> --user-agent AlibabaCloud-Agent-Skills
```
**Step 8 — Invoke Diagnosis and Poll Results**
#### Diagnosis Mode Decision Rules
```
if enable_diagnosis == true:
    mode = real-time diagnosis   # enable_diagnosis has highest priority
elif start_time != 0:
    mode = historical diagnosis  # time range specified, retrospective analysis
else:
    mode = real-time diagnosis   # default
```
- **Real-time**: `start_time=0`, `end_time=0`
- **Historical**: `start_time=<unix_ts>`, `end_time=<unix_ts>`
- **Forced real-time**: when `enable_diagnosis=true`, force `start_time` to 0 even if provided
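The decision rules can be written as a small Python function. This is a sketch of the rules as stated above, not code from the skill itself; the function name is hypothetical, and zeroing `end_time` along with `start_time` in forced real-time mode follows the "Real-time" bullet.

```python
def resolve_diagnosis_window(start_time=0, end_time=0, enable_diagnosis=False):
    """Return (mode, start_time, end_time) according to the decision rules."""
    if enable_diagnosis:
        # Highest priority: force real-time even if a time range was supplied.
        return "real-time", 0, 0
    if start_time != 0:
        # A time range was specified, so run a retrospective analysis.
        return "historical", start_time, end_time
    # Default when nothing was specified.
    return "real-time", 0, 0
```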
#### Build params JSON
Use **snake_case** keys (consistent with SDK). Required base fields (**ALL must be included**):
```json
{
"instance": "<instance_id>",
"region": "<region>",
"start_time": 0,
"end_time": 0,
"type": "ocd",
"ai_roadmap": true,
"enable_sysom_link": false
}
```
> **⚠️ Anti-confusion Warning: `"type": "ocd"` is a REQUIRED field inside the params JSON — do NOT omit it!**
>
> `--service-name ocd` (CLI argument) and `"type": "ocd"` (params JSON field) are **two different levels of parameters**, both are mandatory:
> - `--service-name ocd` → tells CLI which diagnosis service endpoint to call
> - `"type": "ocd"` → tells the diagnosis engine which diagnosis type to execute internally
>
> **Do NOT omit `"type": "ocd"` from params just because `--service-name` already specifies `ocd`!**
Conditional fields (add only when non-empty):
- `ocd_description`: problem description in English (e.g., `high_cpu`)
- `uid`: account ID owning the instance (integer)
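Assembling the params string by hand makes it easy to drop `"type"` or to emit an empty `ocd_description`. The following Python sketch builds it programmatically: base fields are always present and conditional fields are added only when non-empty. The function name is hypothetical, and the character check reflects the `[a-zA-Z0-9_.~-]` restriction noted under Best Practices.

```python
import json
import re

# Allowed characters for ocd_description, per the Best Practices section.
_OCD_DESCRIPTION = re.compile(r"[a-zA-Z0-9_.~-]+")


def build_params(instance_id, region, start_time=0, end_time=0,
                 ocd_description="", uid=None):
    """Return the JSON string to pass to --params."""
    params = {
        "instance": instance_id,
        "region": region,
        "start_time": start_time,
        "end_time": end_time,
        "type": "ocd",  # required even though --service-name is also ocd
        "ai_roadmap": True,
        "enable_sysom_link": False,
    }
    if ocd_description:
        if not _OCD_DESCRIPTION.fullmatch(ocd_description):
            raise ValueError("ocd_description contains unsupported characters")
        params["ocd_description"] = ocd_description
    if uid is not None:
        params["uid"] = int(uid)  # uid is documented as an integer
    return json.dumps(params, separators=(",", ":"))
```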
#### Invoke Diagnosis
```bash
aliyun sysom invoke-diagnosis \
--service-name ocd \
--channel ecs \
--params '{"instance":"<instance_id>","region":"<region>","start_time":<start_time>,"end_time":<end_time>,"type":"ocd","ai_roadmap":true,"enable_sysom_link":false,"ocd_description":"<ocd_description>"}' \
--user-agent AlibabaCloud-Agent-Skills
```
Extract `task_id` from the response. If `Sysom.TaskInProgress` error is returned, extract the existing `task_id` from the error message and proceed to polling.
#### Poll Results (interval: 10s, max: 60 attempts)
```bash
aliyun sysom get-diagnosis-result --task-id <task_id> --user-agent AlibabaCloud-Agent-Skills
```
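The polling budget (10-second interval, at most 60 attempts) can be expressed as a generic loop. In this sketch the caller supplies `fetch` (for instance, a wrapper that runs `get-diagnosis-result` and parses its JSON) and `is_done`; both are hypothetical hooks, because the exact field that signals completion is not specified here.

```python
import time


def poll_until_done(fetch, is_done, interval=10.0, max_attempts=60,
                    sleep=time.sleep):
    """Call fetch() until is_done(result) is true or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if is_done(result):
            return result
        if attempt < max_attempts:
            sleep(interval)  # no wait after the final attempt
    raise TimeoutError(f"no final result after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a `TimeoutError` maps to the "Diagnosis timeout" row in Error Handling.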
**Step 9 — Result Parsing and Output**
Parse the returned JSON and present `summary.overall_status`, `summary.root_cause`, `summary.suggestions`, `issues[]`, and other key information to the user.
---
### Phase 3: Enrollment Recommendation (Steps 10–12)
For detailed workflow, see [references/manage-and-alert-workflow.md](references/manage-and-alert-workflow.md).
**Step 10 — Enrollment Recommendation and Intent Collection (Inversion Double Gate)**
This step contains two strictly separated gates that **MUST be executed in order — merging, skipping, or simplifying is FORBIDDEN**.
**10A — Prominent Enrollment Recommendation (First Gate)**
**⚠️ Mandatory Rule: After presenting diagnosis results, you MUST immediately output the following recommendation content verbatim. Do NOT abbreviate, omit, or rephrase in your own words.**
Replace `<instance_id>` with the actual instance ID, then output the following content **word-for-word**:
<verbatim_output>
> ## 🔔 Recommendation: Enroll Instance for 24/7 Automated Diagnosis
>
> The diagnosis just performed was a one-time manual operation. If you want SysOM to **continuously protect** this instance, we recommend **instance enrollment**.
>
> ### After enrollment, you will get:
>
> - 🔍 **Automated Diagnosis**: When the instance experiences performance issues like CPU spikes, memory leaks, or IO latency, SysOM will **automatically trigger deep diagnosis** without manual intervention
> - 📲 **DingTalk Alerts**: Diagnosis reports will be **automatically pushed to DingTalk group bots**, notifying the ops team immediately
> - 🛡️ **Continuous Monitoring**: 24/7 uninterrupted protection, shifting from "investigate after problems occur" to "automatically told the root cause when problems occur"
>
> **Would you like to enroll instance `<instance_id>`?**
</verbatim_output>
**After outputting the above, STOP. Wait for user reply. Do NOT ask about enrollment method in 10A.**
- User **declines** → end the pipeline
- User **agrees** → proceed to Step 10B
**10B — Ask Enrollment Method (Second Gate)**
Only after the user explicitly agrees in 10A, output the following (replace `<instance_id>` and `<region>` with actual values):
<verbatim_output>
> ### Please choose an enrollment method
>
> **A. Enroll current instance only**
> Only enroll the instance just diagnosed: `<instance_id>` (`<region>`)
>
> **B. Enroll ACK cluster**
> If this instance belongs to an ACK cluster, you can enroll **all nodes** in the cluster with one click.
> Newly added nodes will be automatically enrolled — no manual action needed.
> 👉 Please provide the **ACK Cluster ID** (e.g., `c9d7f3fc3d42********c1100ffb19d`)
>
> **C. Enroll multiple specified instances**
> Batch enroll multiple instances.
> 👉 Please provide the instance list in the format `InstanceID:Region`, separated by spaces
> Example: `i-xxx:cn-beijing i-yyy:cn-hangzhou`
>
> Please choose A / B / C, or tell me your requirements directly.
</verbatim_output>
**After outputting the above, STOP. Wait for user reply.**
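When the user picks option C, the reply must be split into instance/region pairs before any enrollment command is built. A minimal Python sketch of that parsing, assuming the `InstanceID:Region` space-separated format shown in the prompt; the function name is hypothetical.

```python
def parse_instance_list(text):
    """Parse 'i-xxx:cn-beijing i-yyy:cn-hangzhou' into (instance_id, region) pairs."""
    pairs = []
    for token in text.split():
        instance_id, sep, region = token.partition(":")
        if not sep or not instance_id or not region:
            raise ValueError(f"expected InstanceID:Region, got {token!r}")
        pairs.append((instance_id, region))
    if not pairs:
        raise ValueError("no instances provided")
    return pairs
```

Rejecting malformed tokens up front lets the agent ask the user to correct the input instead of sending a partial batch.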
**Step 11 — Execute Enrollment**
> **Fixed parameter values for `--agent-id`, `--agent-version`, `--config-id` in enrollment commands are listed in the "Fixed Parameters" table in [references/related-commands.md](references/related-commands.md).**
```bash
# Instance mode
aliyun sysom install-agent \
--instances instance=<instance_id> region=<region> \
--install-type InstallAndUpgrade \
--agent-id <agent-id> \
--agent-version <agent-version> \
--user-agent AlibabaCloud-Agent-Skills
# Cluster mode
aliyun sysom install-agent-for-cluster \
--cluster-id <cluster_id> \
--agent-id <agent-id> \
--agent-version <agent-version> \
--config-id <config-id> \
--user-agent AlibabaCloud-Agent-Skills
```
**Step 12 — Enrollment Status Confirmation**
```bash
# Instance mode — poll instance status (interval: 10s, max: 60 attempts)
aliyun sysom list-instance-status --instance <instance_id> --biz-region <region> --user-agent AlibabaCloud-Agent-Skills
# Cluster mode — get full cluster list, then match target cluster by cluster_id
aliyun sysom list-clusters --user-agent AlibabaCloud-Agent-Skills
# From the returned cluster list, match the target cluster by cluster_id field and check its cluster_status
```
> **⚠️ Enrollment success criteria: status `Running` means enrollment is complete — stop polling immediately and proceed to the next step.**
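For cluster mode, the match-then-check logic might look like the sketch below. It assumes the cluster entries have already been extracted from the `list-clusters` response into a list of dicts carrying the `cluster_id` and `cluster_status` fields named above; where that list sits inside the response is not assumed, and the function name is hypothetical.

```python
def cluster_is_running(clusters, cluster_id):
    """Return True once the target cluster reports status Running."""
    for cluster in clusters:
        if cluster.get("cluster_id") == cluster_id:
            return cluster.get("cluster_status") == "Running"
    # The target is not in the full list, so polling further would not help.
    raise LookupError(f"cluster {cluster_id!r} not found in list-clusters output")
```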
---
### Phase 4: Alert Configuration (Steps 13–15)
For detailed workflow, see [references/manage-and-alert-workflow.md](references/manage-and-alert-workflow.md).
**Step 13 — Collect DingTalk Webhook and Create Alert Destination (Inversion Gate + SDK Call)**
After successful enrollment, you **MUST immediately collect the DingTalk bot Webhook URL** from the user to create an alert destination. This feature is **NOT supported by CLI** — use SDK scripts under `scripts/`.
Ask the user:
<verbatim_output>
> 📲 Please provide the DingTalk group bot **Webhook URL** for receiving alert notifications.
> Format: `https://oapi.dingtalk.com/robot/send?access_token=xxx`
>
> 💡 How to get it: DingTalk Group Settings → Bot Management → Add Bot → Custom Bot → Optional keyword: alert → Copy Webhook URL
</verbatim_output>
After the user provides the Webhook, initialize the SDK environment and create the alert destination:
```bash
# Initialize SDK environment (first time only, can skip afterwards)
bash scripts/setup-sdk.sh
# Create alert destination (stdout outputs destination_id)
.sysom-sdk-venv/bin/python scripts/create-alert-destination.py '<user-provided-webhook-url>'
```
> **⚠️ You MUST use `.sysom-sdk-venv/bin/python` to execute scripts** — using system `python3` is FORBIDDEN (signature algorithm depends on specific SDK version).
On success, stdout outputs `destination_id` (a pure number). Record this value for use in Step 15.
**Step 14 — Alert Item Selection (Inversion Gate)**
```bash
aliyun sysom list-alert-items --user-agent AlibabaCloud-Agent-Skills
```
Display the alert items list (categorized by NODE/POD), supporting quick selection (`all`, `node-all`, `pod-all`) and numbered selection.
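The selection shortcuts can be resolved with a small helper. This Python sketch assumes the displayed items are held as `(category, name)` tuples in display order, with category `NODE` or `POD`; that in-memory shape, the function name, and the item names in the usage example are illustrative, not the CLI's response format.

```python
def resolve_selection(selection, items):
    """Map 'all', 'node-all', 'pod-all', or numbers like '1 3' to item names."""
    choice = selection.strip().lower()
    if choice == "all":
        return [name for _, name in items]
    if choice in ("node-all", "pod-all"):
        wanted = choice.split("-")[0].upper()
        return [name for category, name in items if category.upper() == wanted]
    picked = []
    for token in choice.replace(",", " ").split():
        index = int(token)  # raises ValueError for non-numeric input
        if not 1 <= index <= len(items):
            raise ValueError(f"selection {index} is out of range")
        name = items[index - 1][1]
        if name not in picked:
            picked.append(name)
    if not picked:
        raise ValueError("nothing selected")
    return picked
```

For example, with `items = [("NODE", "a"), ("POD", "b"), ("NODE", "c")]`, the input `"1,3"` selects the first and third entries by their displayed numbers.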
**Step 15 — Create Alert Strategy (SDK Call)**
After the user selects alert items, **create the alert strategy directly** with `destinations` set to the destination ID from Step 13.
> **⚠️ CLI does NOT support the `destinations` parameter — you MUST use the SDK script to create alert strategies.**
```bash
.sysom-sdk-venv/bin/python scripts/create-alert-strategy.py \
--name "aliyun-aes-skills-create-<YYYYMMDDHHmm>" \
--items "<alert_item_1>,<alert_item_2>" \
--clusters "<clusters_value>" \
--destinations "<destination_id>"
```
- Instance mode → `--clusters` value is `default`
- Cluster mode → `--clusters` value is `<cluster_name>` (note: name, NOT ID)
- `--destinations` → destination ID from Step 13 (multiple IDs comma-separated, e.g., `1,2`)
- `--items` → alert item names comma-separated
> **⚠️ You MUST use `.sysom-sdk-venv/bin/python` to execute scripts** — using system `python3` is FORBIDDEN.
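The argument rules above can be captured in a helper that assembles the command as a list suitable for `subprocess.run`. This is a sketch: the function name is hypothetical, and using local time for the `<YYYYMMDDHHmm>` suffix is an assumption.

```python
from datetime import datetime


def build_strategy_args(items, destination_ids, cluster_name=None, now=None):
    """Build the argv list for scripts/create-alert-strategy.py."""
    now = now or datetime.now()  # assumption: local time for the name suffix
    return [
        ".sysom-sdk-venv/bin/python",  # never the system python3
        "scripts/create-alert-strategy.py",
        "--name", "aliyun-aes-skills-create-" + now.strftime("%Y%m%d%H%M"),
        "--items", ",".join(items),
        # Instance mode uses "default"; cluster mode uses the cluster NAME.
        "--clusters", cluster_name or "default",
        "--destinations", ",".join(str(d) for d in destination_ids),
    ]
```

Passing the result as a list (rather than a shell string) avoids quoting problems when alert item names contain spaces or non-ASCII characters.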
---
## Success Verification
For verification methods of each phase, see [references/verification-method.md](references/verification-method.md).
---
## Cleanup
The diagnosis operations in this skill are **read-only** and do not modify instance state — no cleanup is needed.
To uninstall an enrolled Agent, use `aliyun sysom uninstall-agent` (see [references/related-commands.md](references/related-commands.md) for parameters).
**After all CLI operations are complete, you MUST disable AI-Mode:**
```bash
aliyun configure ai-mode disable
```
---
## Command Tables
For the full CLI command list, see [references/related-commands.md](references/related-commands.md).
---
## Best Practices
1. **Check Cloud Assistant status before diagnosis**: SysOM diagnosis depends on Cloud Assistant being online — always confirm in Step 5
2. **Default to real-time diagnosis mode**: Switch to historical mode only when the user specifies a time range or describes a problem that occurred in the past (see the Time Inference Rule in Step 4)
3. **Use English keywords for ocd_description**: API only supports `[a-zA-Z0-9_.~-]` characters
4. **Use double gate for enrollment recommendation**: Recommend first, then ask method — avoid information overload
5. **Cluster enrollment batch limit**: When exceeding 50 instances, the first batch installs only 50; the rest are installed automatically
6. **clusters parameter for alert strategy**: Use `default` for instance mode, use cluster **name** (not ID) for cluster mode
7. **Alert destinations via SDK**: Alert destination APIs are not supported by CLI — must use Python SDK (`alibabacloud_sysom20231230`)
8. **destinations parameter for alert strategy**: After creating an alert destination, include `destinations` (destination ID list) in `create-alert-strategy` — alerts will be pushed to DingTalk via SysOM
9. **Credential security**: Never print or echo AK/SK values in conversation
10. **All CLI commands must include `--user-agent AlibabaCloud-Agent-Skills`**
11. **Remediation suggestions may involve high-risk operations**: Follow the Human-in-the-loop protocol and wait for user confirmation
---
## Unsupported Scenarios
- Non-Linux instances (Windows instances are not supported)
- Instances with incompatible kernel versions (checked via check-instance-support)
- Pure configuration issues (e.g., security group rules, VPC routing — no OS-level diagnosis needed)
---
## Error Handling
| Error Scenario | CLI Response | Agent Action |
|----------------|-------------|--------------|
| Instance not supported by SysOM | check-instance-support returns unsupported | Inform user that kernel-level diagnosis is not supported, fall back to standard diagnosis |
| Role authorization failure | initial-sysom returns error | Prompt user to check SysOM service activation status |
| Diagnosis invocation failure | invoke-diagnosis returns error | Check credential and permission configuration |
| Diagnosis timeout | get-diagnosis-result polling timeout | Suggest user retry later |
| Insufficient permissions | API returns Forbidden | Read `references/ram-policies.md` and guide user to request permissions |
| SDK not installed | `ModuleNotFoundError: No module named 'alibabacloud_sysom20231230'` | Run `bash scripts/setup-sdk.sh` to create the virtual environment, then execute scripts with `.sysom-sdk-venv/bin/python` |
| Alert destination creation failure | SDK returns error | Check Webhook URL format and credential permissions |
---
## Reference Links
| Reference | Description |
|-----------|-------------|
| [references/cli-installation-guide.md](references/cli-installation-guide.md) | Aliyun CLI installation and configuration guide |
| [references/ram-policies.md](references/ram-policies.md) | RAM permission policy list |
| [references/related-commands.md](references/related-commands.md) | Full CLI command list |
| [references/verification-method.md](references/verification-method.md) | Success verification methods for each phase |
| [references/diagnose-workflow.md](references/diagnose-workflow.md) | Detailed diagnosis workflow (Steps 4–9) |
| [references/manage-and-alert-workflow.md](references/manage-and-alert-workflow.md) | Detailed enrollment and alert workflow (Steps 10–15) |
| [references/acceptance-criteria.md](references/acceptance-criteria.md) | Test acceptance criteria |
FILE:references/acceptance-criteria.md
# Acceptance Criteria: alibabacloud-aes-sysom-os-diagnosis
**Scenario**: SysOM deep diagnosis: kernel-level performance diagnosis, enrollment, and alert configuration for ECS instances
**Purpose**: Skill testing acceptance criteria
---
## Correct CLI Command Patterns
### 1. Product — verify product name exists
#### ✅ CORRECT
```bash
aliyun sysom invoke-diagnosis ...
aliyun ecs describe-cloud-assistant-status ...
```
#### ❌ INCORRECT
```bash
# Wrong: the product name does not exist
aliyun SysOM invoke-diagnosis ...
aliyun sysom InvokeDiagnosis ...
```
### 2. Command — verify action exists under the product
#### ✅ CORRECT
```bash
aliyun sysom invoke-diagnosis
aliyun sysom get-diagnosis-result
aliyun sysom initial-sysom --check-only false --source aes-skills
aliyun sysom check-instance-support
aliyun sysom install-agent
aliyun sysom install-agent-for-cluster
aliyun sysom list-instance-status
aliyun sysom list-clusters
aliyun sysom list-alert-items
aliyun sysom create-alert-strategy # Exists in the CLI but does not support destinations; use the SDK script instead
aliyun sysom uninstall-agent
```
#### ❌ INCORRECT
```bash
# Wrong: traditional API format instead of plugin mode
aliyun sysom InvokeDiagnosis
aliyun sysom GetDiagnosisResult
aliyun sysom InstallAgent
```
### 3. Parameters — verify each parameter name exists
#### ✅ CORRECT
```bash
# invoke-diagnosis parameters (params keys use snake_case and must include type)
aliyun sysom invoke-diagnosis --service-name ocd --channel ecs \
--params '{"instance":"i-xxx","region":"cn-hangzhou","start_time":0,"end_time":0,"type":"ocd","ai_roadmap":true,"enable_sysom_link":false}'
# install-agent parameters
aliyun sysom install-agent --instances instance=i-xxx region=cn-hangzhou --install-type InstallAndUpgrade --agent-id xxx --agent-version 3.12.0-1
# describe-cloud-assistant-status parameters
aliyun ecs describe-cloud-assistant-status --biz-region-id cn-hangzhou --instance-id i-xxx
# list-instance-status parameters
aliyun sysom list-instance-status --instance i-xxx --biz-region cn-hangzhou
# list-clusters (do not pass --cluster-id; fetch the full list, then match)
aliyun sysom list-clusters
# create-alert-strategy (via the SDK script; the CLI does not support destinations)
.sysom-sdk-venv/bin/python scripts/create-alert-strategy.py --name my-strategy --items "节点CPU使用率检测" --clusters "default" --destinations "1"
```
#### ❌ INCORRECT
```bash
# Wrong: incorrect parameter names
aliyun sysom invoke-diagnosis --serviceName ocd # should be --service-name
aliyun sysom install-agent --instanceId i-xxx # should be --instances instance=i-xxx region=xxx
aliyun ecs describe-cloud-assistant-status --region-id cn-hangzhou # should be --biz-region-id
aliyun sysom check-instance-support --region cn-hangzhou # should be --biz-region
# Wrong: invoke-diagnosis params use camelCase or omit type
aliyun sysom invoke-diagnosis --params '{"instanceId":"i-xxx","startTime":0}' # keys should be snake_case, and type is missing
# Wrong: passing --cluster-id to list-clusters (fetch the full list, then match)
aliyun sysom list-clusters --cluster-id cxxx # pass no arguments; fetch the full list and match by cluster_id
```
### 4. --user-agent flag present
#### ✅ CORRECT
```bash
aliyun sysom invoke-diagnosis --service-name ocd --channel ecs --params '...' --user-agent AlibabaCloud-Agent-Skills
```
#### ❌ INCORRECT
```bash
# Wrong: missing --user-agent
aliyun sysom invoke-diagnosis --service-name ocd --channel ecs --params '...'
```
### 5. Alert Destination SDK Calls — verify SDK usage patterns
#### ✅ CORRECT
```bash
# Initialize the SDK environment
bash scripts/setup-sdk.sh
# Create an alert destination (via the script)
.sysom-sdk-venv/bin/python scripts/create-alert-destination.py 'https://oapi.dingtalk.com/robot/send?access_token=xxx'
# Create an alert destination (with an explicit name)
.sysom-sdk-venv/bin/python scripts/create-alert-destination.py 'https://oapi.dingtalk.com/robot/send?access_token=xxx' '运维告警群'
```
#### ❌ INCORRECT
```bash
# Wrong: calling the alert destination API through the CLI (not supported by the CLI)
aliyun sysom create-alert-destination ... # this command does not exist
# Wrong: calling the script directly without first running setup-sdk.sh
python scripts/create-alert-destination.py '...' # should use the python inside the virtual environment
# Wrong: running pip install directly instead of setup-sdk.sh (no virtual environment is created)
pip install alibabacloud_sysom20231230
```
---
## Credential Verification Pattern
#### ✅ CORRECT
```bash
aliyun configure list --user-agent AlibabaCloud-Agent-Skills
```
#### ❌ INCORRECT
```bash
# Wrong: prints the AK/SK value
echo $ALIBABA_CLOUD_ACCESS_KEY_ID
# Wrong: passes plaintext credentials on the command line
aliyun configure set --access-key-id LTAI5tXXXXXX --access-key-secret 8dXXXXXXXX
```
---
## Parameter Handling
#### ✅ CORRECT
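- Confirm all user-customizable parameters (RegionId, instance_id, etc.) with the user before execution
- Use English-only keywords for `ocd_description`
- Use the structured format `instance=<id> region=<region>` for `--instances`
#### ❌ INCORRECT
- Assuming the region is `cn-hangzhou` without asking the user
- Passing Chinese text directly in `ocd_description`
- Using a JSON array instead of the structured format for `--instances`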
---
## CLI Plugin Mode Format
#### ✅ CORRECT
```bash
aliyun sysom invoke-diagnosis # lowercase with hyphens
aliyun sysom get-diagnosis-result
aliyun sysom install-agent
```
#### ❌ INCORRECT
```bash
aliyun sysom InvokeDiagnosis # traditional API format
aliyun sysom GetDiagnosisResult
aliyun sysom InstallAgent
```
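Several of the command-pattern rules in this file (lowercase product name, hyphenated plugin-mode action, `--user-agent` present) are mechanical enough to check automatically. The Python sketch below is a hypothetical lint helper, not part of the skill; it only covers `aliyun <product> <action> ...` commands and does not validate parameter names.

```python
import re
import shlex

# Lowercase words joined by single hyphens, e.g. "invoke-diagnosis".
_KEBAB = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def lint_aliyun_command(command):
    """Return a list of problems found in an `aliyun <product> <action>` command."""
    tokens = shlex.split(command)
    if len(tokens) < 3 or tokens[0] != "aliyun":
        return ["not an `aliyun <product> <action>` command"]
    problems = []
    product, action = tokens[1], tokens[2]
    if not _KEBAB.fullmatch(product):
        problems.append(f"product {product!r} should be lowercase")
    if not _KEBAB.fullmatch(action):
        problems.append(f"action {action!r} should be lowercase with hyphens")
    if "--user-agent" not in tokens:
        problems.append("missing --user-agent")
    return problems
```

An empty list means the command passed these three checks; it does not prove the product or action actually exists.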
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```
**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
# Add to PATH (requires admin privileges)
$env:Path += ";C:\aliyun-cli"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::Machine)
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI5tXXXXXXXX",
"access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (tokens expire in 1-12 hours).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use RSA key pair for authentication (generate key pair in Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
Environment credentials take priority over the configuration file (see Credential Priority below for the full order).
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
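The lookup order above can be sketched as a small decision function. This is only an illustration of the documented order, not the CLI's actual implementation; the function name, its parameters, and the returned labels are hypothetical.

```python
from typing import Mapping, Optional


def resolve_credential_source(
    cli_profile: Optional[str],
    env: Mapping[str, str],
    config_has_current_profile: bool,
    ecs_role_attached: bool,
) -> str:
    """Return a label for the first credential source that applies."""
    if cli_profile:  # 1. --profile <name>
        return f"profile:{cli_profile} (--profile flag)"
    if env.get("ALIBABA_CLOUD_PROFILE"):  # 2. profile chosen via environment
        return f"profile:{env['ALIBABA_CLOUD_PROFILE']} (ALIBABA_CLOUD_PROFILE)"
    if env.get("ALIBABA_CLOUD_ACCESS_KEY_ID") and env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET"):
        return "environment credentials"  # 3. keys exported directly
    if config_has_current_profile:  # 4. ~/.aliyun/config.json
        return "config file (current profile)"
    if ecs_role_attached:  # 5. role attached to the ECS instance
        return "ECS instance RAM role"
    return "none"
```

Passing `os.environ` as `env` shows which source would win in the current shell under the order listed above.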
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: a JSON object containing the list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "华东 1(杭州)"
},
...
]
},
"RequestId": "..."
}
```
**If failed**, you'll see error messages:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
# Git does not expand "~", so ignore the config directory by relative pattern
echo ".aliyun/" >> .gitignore
# Use environment variables in CI/CD instead
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.1+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/diagnose-workflow.md
# Diagnosis Execution Detailed Workflow
This document contains the detailed execution steps for SysOM deep diagnosis (Steps 4–9), referenced from the Core Workflow in SKILL.md.
All `aliyun` CLI commands **MUST** include `--user-agent AlibabaCloud-Agent-Skills`.
---
## Step 4 — Ambiguous Problem Clarification (Inversion Gate)
Before entering the diagnosis pipeline, **the following two required parameters MUST be confirmed**. If the user's question does not include this information, you must ask the user — **do NOT guess or use default values**.
### Required Parameters
| Parameter | Description | Example |
|-----------|-------------|---------|
| `region` | Region of the ECS instance | `cn-hangzhou`, `cn-beijing`, `cn-shanghai` |
| `instance_id` | ECS instance ID | `i-bp1xxxxxxxxxxxxxxx` |
### Clarification Flow
**4a. Check if the user's input already contains region and instance_id**
Extract these two parameters from the user's problem description. Common expressions include:
- "instance i-bp1xxx in Hangzhou" (region=cn-hangzhou, instance_id=i-bp1xxx)
- "instance i-bp1xxx in Beijing" (region=cn-beijing, instance_id=i-bp1xxx)
**4b. If either parameter is missing, ask the user**
> 🔍 To perform SysOM deep diagnosis, I need to confirm the following:
>
> - **Instance ID**: Please provide the ECS instance ID (format: `i-bp1xxxxxxxx`)
> - **Region**: The Alibaba Cloud region where the instance is located (e.g., `cn-hangzhou`, `cn-beijing`, `cn-shanghai`)
**4c. Also extract optional context**
- `ocd_description`: The problem symptoms described by the user
- **Time range inference** (see below)
- `uid`: If the user mentioned an account ID
**⚠️ CRITICAL: Time Inference and Historical Diagnosis Recommendation**
When the user's description contains **any temporal reference** — even vague ones — you **MUST** proactively infer the time range and recommend historical diagnosis mode. Do NOT silently default to real-time diagnosis when the problem clearly occurred in the past.
**Time inference examples:**
| User Description | Inferred Action |
|-----------------|----------------|
| "The server crashed this morning" | Ask: "When exactly did the crash happen this morning? I'll use historical diagnosis to analyze that time window." |
| "Yesterday afternoon there was high CPU" | Ask: "Around what time yesterday afternoon? I'll run historical diagnosis for that period." |
| "It went down around 3am" | Convert to Unix timestamps for today's 3am (±30min buffer), recommend historical diagnosis |
| "The instance rebooted unexpectedly last night" | Ask for approximate time, recommend historical diagnosis |
| "There's been high load for the past 2 hours" | Calculate start_time = now - 2h, recommend historical diagnosis |
| "The server is slow right now" | No time inference needed, use real-time diagnosis (default) |
**Rules:**
1. If the user mentions a **past event** (crash, reboot, spike that already happened), you **MUST** ask for the specific time and recommend historical diagnosis
2. If the user describes an **ongoing issue** ("right now", "currently"), use real-time diagnosis
3. When asking for time, also provide the option: "Or would you prefer a real-time diagnosis to check the current state?"
4. Convert natural language time references to Unix timestamps using the current time as reference
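As a rough sketch of rule 4, the helpers below turn an inferred moment or duration into the `start_time` / `end_time` pair in Unix seconds. The helper names and the example date and time zone are hypothetical; the ±30 minute buffer follows the "around 3am" row in the table above.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def window_around(moment: datetime, buffer_minutes: int = 30) -> Tuple[int, int]:
    """Return (start_time, end_time) in Unix seconds centred on `moment`."""
    buffer = timedelta(minutes=buffer_minutes)
    return int((moment - buffer).timestamp()), int((moment + buffer).timestamp())


def last_hours(now: datetime, hours: float) -> Tuple[int, int]:
    """Return (start_time, end_time) covering the `hours` leading up to `now`."""
    return int((now - timedelta(hours=hours)).timestamp()), int(now.timestamp())


# "It went down around 3am", interpreted in UTC+8 on an illustrative date
tz = timezone(timedelta(hours=8))
start_time, end_time = window_around(datetime(2026, 4, 15, 3, 0, tzinfo=tz))
```

Using timezone-aware `datetime` objects avoids silently interpreting the user's local time as the time zone of the machine running the agent.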
**⚠️ IMPORTANT: `ocd_description` MUST be in English only**
The SysOM API restricts `ocd_description` to only `[a-zA-Z0-9_.~-]` characters. You must translate the user's problem description into short English keywords connected by underscores.
| User Description | ocd_description Value |
|-----------------|----------------------|
| High load / abnormal system load | `high_load` |
| CPU spike / high CPU usage | `high_cpu` |
| Memory leak / out of memory | `memory_leak` |
| High IO latency / slow disk | `io_latency` |
| Network packet loss / network jitter | `network_packet_loss` |
| Crash / kernel panic | `kernel_panic` |
| OOM / process killed | `oom_killed` |
| Overall server health check | `health_check` |
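The character restriction above is easy to check mechanically. The following is a minimal sketch, assuming the description has already been translated to English; both function names are hypothetical.

```python
import re

# Character set stated above for ocd_description
_ALLOWED = re.compile(r"[a-zA-Z0-9_.~-]+")


def is_valid_ocd_description(value: str) -> bool:
    """True when `value` is non-empty and uses only the permitted characters."""
    return _ALLOWED.fullmatch(value) is not None


def to_ocd_description(english_phrase: str) -> str:
    """Lowercase an English phrase and join its words with underscores."""
    words = re.findall(r"[a-zA-Z0-9]+", english_phrase.lower())
    return "_".join(words)
```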
---
## Step 5 — Cloud Assistant Online Check
```bash
aliyun ecs describe-cloud-assistant-status \
--biz-region-id <region> \
--instance-id <instance_id> \
--user-agent AlibabaCloud-Agent-Skills
```
In the returned JSON, locate the target instance in the `InstanceCloudAssistantStatusSet.InstanceCloudAssistantStatus` array and check its `CloudAssistantStatus` field:
- `"true"` → Cloud Assistant is online, proceed to Step 6
- `"false"` → Inform user that Cloud Assistant is offline, terminate the pipeline
- API call failure → Ask user whether to continue
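The check above could be sketched as follows. The `InstanceId` key used to match entries is an assumption about the response shape, and the function name is hypothetical; only the array path and the `CloudAssistantStatus` field come from the description above.

```python
import json
from typing import Optional


def cloud_assistant_online(response_text: str, instance_id: str) -> Optional[bool]:
    """Return True/False for the target instance, or None if it is not listed."""
    data = json.loads(response_text)
    entries = data.get("InstanceCloudAssistantStatusSet", {}).get(
        "InstanceCloudAssistantStatus", []
    )
    for entry in entries:
        # "InstanceId" is assumed to identify each entry
        if entry.get("InstanceId") == instance_id:
            return str(entry.get("CloudAssistantStatus")).lower() == "true"
    return None
```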
---
## Step 6 — SysOM Role Initialization
```bash
aliyun sysom initial-sysom --check-only false --source aes-skills --user-agent AlibabaCloud-Agent-Skills
```
Ensures the SysOM service role has been created. This step is idempotent and can be executed repeatedly.
---
## Step 7 — Instance Support Check
```bash
aliyun sysom check-instance-support \
--instances <instance_id> \
--biz-region <region> \
--user-agent AlibabaCloud-Agent-Skills
```
Verify the target instance meets:
- Operating system is Linux
- Kernel version is compatible with SysOM diagnosis
If the instance is not supported, a clear failure reason is returned — suggest falling back to standard diagnosis.
---
## Step 8 — Invoke Diagnosis and Poll Results
### 8a. Diagnosis Mode Decision
Based on the user's input parameter combination, determine the diagnosis mode:
```
if enable_diagnosis == true:
    mode = real-time diagnosis    # enable_diagnosis has highest priority, force start_time to 0
elif start_time != 0:
    mode = historical diagnosis   # time range specified, retrospective analysis
else:
    mode = real-time diagnosis    # default
```
#### Optional Parameters and Defaults
| Parameter | Default | Description |
|-----------|---------|-------------|
| `start_time` | `0` | Diagnosis start timestamp (Unix seconds) |
| `end_time` | `0` | Diagnosis end timestamp (Unix seconds) |
| `enable_diagnosis` | `false` | Force real-time diagnosis |
| `ocd_description` | `""` | Problem description for intent recognition (English only) |
| `uid` | `None` | Account ID owning the instance |
| `skip_support_check` | `false` | Skip instance support check (speeds up workflow) |
### 8b. Build params JSON
Use **snake_case** keys (consistent with SDK). Required base fields (**ALL must be included**):
```json
{
"instance": "<instance_id>",
"region": "<region>",
"start_time": 0,
"end_time": 0,
"type": "ocd",
"ai_roadmap": true,
"enable_sysom_link": false
}
```
> **⚠️ Anti-confusion Warning: `"type": "ocd"` is a REQUIRED field inside the params JSON — do NOT omit it!**
>
> `--service-name ocd` (CLI argument) and `"type": "ocd"` (params JSON field) are parameters at **two different levels**, and both are mandatory:
> - `--service-name ocd` → tells CLI which diagnosis service endpoint to call
> - `"type": "ocd"` → tells the diagnosis engine which diagnosis type to execute internally
>
> **Do NOT omit `"type": "ocd"` from params just because `--service-name` already specifies `ocd`!**
Conditional fields (add to JSON only when non-empty):
- `"ocd_description": "<english_keywords>"` — add when user's problem description is not empty
- `"uid": <integer>` — add when user provides an account ID
**Impact of diagnosis mode on params**:
- **Real-time**: `start_time: 0`, `end_time: 0`
- **Historical**: `start_time: <unix_ts>`, `end_time: <unix_ts>`
- **Forced real-time** (`enable_diagnosis=true`): force `start_time` to `0` even if user provided a value
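The mode decision and the field rules above can be combined into one builder. This is a sketch under stated assumptions: the function name is hypothetical, and resetting `end_time` together with `start_time` in forced real-time mode is an interpretation of the real-time row, since only `start_time` is mentioned explicitly.

```python
import json
from typing import Optional


def build_params(
    instance_id: str,
    region: str,
    start_time: int = 0,
    end_time: int = 0,
    enable_diagnosis: bool = False,
    ocd_description: str = "",
    uid: Optional[int] = None,
) -> str:
    """Return the compact JSON string passed to --params."""
    if enable_diagnosis:
        # Forced real-time: ignore any user-supplied time range
        # (zeroing end_time as well is an assumption)
        start_time, end_time = 0, 0
    params = {
        "instance": instance_id,
        "region": region,
        "start_time": start_time,
        "end_time": end_time,
        "type": "ocd",  # required even though --service-name is also ocd
        "ai_roadmap": True,
        "enable_sysom_link": False,
    }
    if ocd_description:  # conditional field: only when non-empty
        params["ocd_description"] = ocd_description
    if uid is not None:  # conditional field: only when provided
        params["uid"] = int(uid)
    return json.dumps(params, separators=(",", ":"))
```

Building the string with `json.dumps` rather than by hand avoids quoting mistakes and guarantees that `"type": "ocd"` is never dropped.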
### 8c. Invoke Diagnosis
```bash
aliyun sysom invoke-diagnosis \
--service-name ocd \
--channel ecs \
--params '{"instance":"<instance_id>","region":"<region>","start_time":<start_time>,"end_time":<end_time>,"type":"ocd","ai_roadmap":true,"enable_sysom_link":false,"ocd_description":"<ocd_description>"}' \
--user-agent AlibabaCloud-Agent-Skills
```
Extract `task_id` from the response.
**Special handling**: If `Sysom.TaskInProgress` error is returned, it means a diagnosis task is already in progress. Extract the existing `task_id` from the error message (regex match `ocd(<task_id>)`) and proceed directly to polling.
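A minimal sketch of that extraction is shown below. It assumes the error message embeds the ID literally as `ocd(<task_id>)`, as described above; the function name and the sample message in the usage line are hypothetical.

```python
import re
from typing import Optional


def extract_task_id(error_message: str) -> Optional[str]:
    """Pull the task ID out of an 'ocd(<task_id>)' fragment, if present."""
    match = re.search(r"ocd\(([^)]+)\)", error_message)
    return match.group(1) if match else None


# Hypothetical message text, for illustration only
task_id = extract_task_id("Sysom.TaskInProgress: ocd(abc123) is still running")
```

When the function returns `None`, the message did not match the expected pattern and the error should be reported to the user instead of polling.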
### 8d. Poll Diagnosis Results
Poll every 10 seconds, for at most 60 attempts:
```bash
aliyun sysom get-diagnosis-result \
--task-id <task_id> \
--user-agent AlibabaCloud-Agent-Skills
```
Check the `status` field in the response:
- `Ready` / `Running` → continue polling
- `Success` → diagnosis complete, proceed to Step 9
- `Fail` → diagnosis failed, inform the user
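The polling rule can be sketched as a loop around a caller-supplied function. `fetch` is a hypothetical callable that runs `aliyun sysom get-diagnosis-result` and returns the parsed response as a dict; the `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Optional


def poll_diagnosis(
    fetch: Callable[[], Dict],
    interval_s: float = 10,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict]:
    """Poll until status is Success or Fail; return None on timeout."""
    for attempt in range(max_attempts):
        result = fetch()
        if result.get("status") in ("Success", "Fail"):
            return result  # terminal state reached
        if attempt < max_attempts - 1:
            sleep(interval_s)  # Ready / Running: wait and try again
    return None  # still incomplete after max_attempts
```

A `None` return corresponds to the timeout case, where only the fixed timeout template may be shown.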
> **⛔ Behavioral Constraints During Polling (MUST OBEY):**
>
> During polling while waiting for diagnosis results, the following actions are **STRICTLY FORBIDDEN (both executing and suggesting to the user)**:
> 1. **FORBIDDEN** to invoke Cloud Assistant to execute commands on the instance (e.g., `top`, `vmstat`, `dmesg`, `iostat`)
> 2. **FORBIDDEN** to call ECS monitoring, CloudMonitor, or other APIs
> 3. **FORBIDDEN** to attempt "alternative diagnosis methods" or initiate new diagnosis tasks
> 4. **FORBIDDEN** to call any command not listed in this skill's [Command Tables]
> 5. **FORBIDDEN** to suggest any of the above actions to the user as "alternatives" or "fallback options"
>
> **The ONLY permitted action**: continue calling `aliyun sysom get-diagnosis-result` to poll, or stop after timeout.
>
> **Timeout handling**: If still incomplete after 60 polling attempts, you **MUST and can ONLY** output the following template, then stop:
>
> ```
> ⏳ SysOM diagnosis task timed out
> - Task ID: <task_id>
> - Current status: <status>
> - Suggestion: Please continue waiting for the diagnosis to complete.
> ```
>
> **FORBIDDEN to add any "alternative diagnosis method" suggestions in the timeout output. Actions that cannot be performed must not be suggested.**
---
## Step 9 — Result Parsing and Output
### Key Field Interpretation
| Field | Meaning | How Agent Should Use It |
|-------|---------|------------------------|
| `summary.overall_status` | Overall status (Info/Warn/Critical) | Determine problem severity |
| `summary.root_cause` | SysOM root cause analysis | Kernel-level root cause evidence |
| `summary.suggestions` | Remediation suggestion list | Incorporate directly into recommendations |
| `issues[]` | Issues found by each sub-diagnostic item | Analyze item by item to locate specific subsystem |
| `diagnose_mode` | Diagnosis mode identifier | Distinguish real-time vs. historical diagnosis |
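The fields in the table could be condensed into a short text summary as sketched below. The value shapes are assumptions (for example, `summary.suggestions` as a list of strings and `issues` as a list), and the function name is hypothetical.

```python
from typing import Dict


def summarize_result(result: Dict) -> str:
    """Build a brief plain-text summary from a diagnosis result dict."""
    summary = result.get("summary", {})
    lines = [
        f"Mode: {result.get('diagnose_mode', 'unknown')}",
        f"Overall status: {summary.get('overall_status', 'unknown')}",
        f"Root cause: {summary.get('root_cause', 'not reported')}",
    ]
    for suggestion in summary.get("suggestions", []):
        lines.append(f"- Suggestion: {suggestion}")
    lines.append(f"Issues found: {len(result.get('issues', []))}")
    return "\n".join(lines)
```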
---
## SysOM Diagnosis Capability Coverage
| Subsystem | Diagnostic Tool | Diagnostic Content |
|-----------|----------------|-------------------|
| CPU | monitor | User-space/kernel-space CPU usage analysis, CPU saturation detection |
| Memory | memgraph | Memory panoramic analysis, memory leak detection, OOM diagnosis |
| IO | iofsstat, iodiagnose | IO traffic attribution analysis, IO latency diagnosis, iowait analysis |
| Network | packetdrop, netjitter | Packet loss diagnosis, network jitter analysis |
| Load | loadtask | System load anomaly analysis, load jitter diagnosis |
| Scheduling | delay | CPU scheduling jitter, scheduling latency diagnosis |
| Crash | vmcore | Crash cause analysis, kernel panic diagnosis |
| Health Score | healthy_score | Overall server health scoring |
FILE:references/manage-and-alert-workflow.md
# Enrollment and Alert Detailed Workflow
This document contains the detailed execution steps for SysOM instance enrollment and alert configuration (Steps 10–15).
All `aliyun` CLI commands **MUST** include `--user-agent AlibabaCloud-Agent-Skills`.
---
## Enrollment Recommendation Phase (Inversion + Pipeline Pattern)
After diagnosis is complete, **proactively recommend instance enrollment to the user**. After enrollment, SysOM will continuously monitor instance health. When performance issues occur, it will automatically trigger diagnosis and push reports to DingTalk group bots, enabling 24/7 unattended intelligent operations.
### Step 10 — Enrollment Recommendation and Intent Collection (Inversion Double Gate)
This step contains two strictly separated gates that **MUST be executed in order — merging, skipping, or simplifying is FORBIDDEN**.
#### Step 10A — Prominent Enrollment Recommendation (First Gate)
**⚠️ Mandatory Rule: After presenting diagnosis results, you MUST immediately output the following recommendation content verbatim. Do NOT abbreviate, omit, or rephrase in your own words.**
You must output the following complete content (replace `<instance_id>` with the actual instance ID):
---
> ## 🔔 Recommendation: Enroll Instance for 24/7 Automated Diagnosis
>
> The diagnosis just performed was a one-time manual operation. If you want SysOM to **continuously protect** this instance, we recommend **instance enrollment**.
>
> ### After enrollment, you will get:
>
> - 🔍 **Automated Diagnosis**: When the instance experiences performance issues like CPU spikes, memory leaks, or IO latency, SysOM will **automatically trigger deep diagnosis** without manual intervention
> - 📲 **DingTalk Alerts**: Diagnosis reports will be **automatically pushed to DingTalk group bots**, notifying the ops team immediately
> - 🛡️ **Continuous Monitoring**: 24/7 uninterrupted protection, shifting from "investigate after problems occur" to "automatically told the root cause when problems occur"
>
> **Would you like to enroll instance `<instance_id>`?**
---
**After outputting the above, STOP. Wait for user reply.**
- User **declines** → end the pipeline
- User **agrees** → proceed to Step 10B
**⚠️ Do NOT ask about enrollment method in Step 10A.**
#### Step 10B — Ask Enrollment Method (Second Gate)
**Only execute this step after the user explicitly agrees to enrollment in Step 10A.**
You must output the following complete content (replace `<instance_id>` and `<region>` with actual values):
---
> ### Please choose an enrollment method
>
> **A. Enroll current instance only**
> Only enroll the instance just diagnosed: `<instance_id>` (`<region>`)
>
> **B. Enroll ACK cluster**
> If this instance belongs to an ACK cluster, you can enroll **all nodes** in the cluster with one click.
> Newly added nodes will be automatically enrolled — no manual action needed.
> 👉 Please provide the **ACK Cluster ID** (e.g., `c9d7f3fc3d42********c1100ffb19d`)
>
> **C. Enroll multiple specified instances**
> Batch enroll multiple instances.
> 👉 Please provide the instance list in the format `InstanceID:Region`, separated by spaces
> Example: `i-xxx:cn-beijing i-yyy:cn-hangzhou`
>
> Please choose A / B / C, or tell me your requirements directly.
---
**After outputting the above, STOP. Wait for user reply.**
#### Intent Parsing Rules
| User Reply | Enrollment Mode | Parameters to Collect |
|-----------|----------------|----------------------|
| Choose A / enroll current instance / agree directly | Single instance | No additional parameters needed, reuse instance_id and region from Step 4 |
| Choose B / provided a cluster ID | Cluster | `cluster_id` (ask if not provided) |
| Choose C / provided multiple instances | Multi-instance | Parse the instance list provided by user |
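For option C, the `InstanceID:Region` list can be parsed with a few lines of code. This is a sketch only; the function name is hypothetical and no validation of region names is attempted.

```python
from typing import List, Tuple


def parse_instance_list(text: str) -> List[Tuple[str, str]]:
    """Parse 'i-xxx:cn-beijing i-yyy:cn-hangzhou' into (instance_id, region) pairs."""
    pairs = []
    for token in text.split():
        instance_id, sep, region = token.partition(":")
        if not sep or not instance_id or not region:
            raise ValueError(f"expected InstanceID:Region, got {token!r}")
        pairs.append((instance_id, region))
    return pairs
```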
---
### Step 11 — Execute Enrollment
#### Enroll Single or Multiple Instances
```bash
aliyun sysom install-agent \
--instances instance=<instance_id_1> region=<region_1> \
--instances instance=<instance_id_2> region=<region_2> \
--install-type InstallAndUpgrade \
--agent-id 74a86327-3170-412c-8e67-da3389ec56a9 \
--agent-version 3.12.0-1 \
--user-agent AlibabaCloud-Agent-Skills
```
#### Enroll ACK Cluster
```bash
aliyun sysom install-agent-for-cluster \
--cluster-id <cluster_id> \
--agent-id 74a86327-3170-412c-8e67-da3389ec56a9 \
--agent-version 3.12.0-1 \
--config-id 8gj86wrt7-3170-412c-8e67-da3389ecg6a9 \
--user-agent AlibabaCloud-Agent-Skills
```
#### install-type Enum Values
| Value | Description |
|-------|-------------|
| `InstallAndUpgrade` | Install if not present, upgrade if present (default) |
| `OnlyInstallNotHasAgent` | Install if not present, skip if present |
| `OnlyUpgradeHasAgent` | Skip if not present, upgrade if present |
| `OnlyInstallWithoutStart` | Install component only, do not start service |
> **Note**: For cluster enrollment, the initial enrollment installs the agent on all current ECS instances in the cluster (first batch limited to 50 if exceeding 50 instances). Newly added ECS instances will be automatically enrolled.
---
### Step 12 — Enrollment Status Confirmation and Result Output
#### Instance Mode — Poll Instance Status (interval: 10s, max: 60 attempts)
```bash
aliyun sysom list-instance-status \
--instance <instance_id> \
--biz-region <region> \
--user-agent AlibabaCloud-Agent-Skills
```
#### Cluster Mode — Poll Cluster Status
Get the full cluster list (**do NOT pass `--cluster-id`**), then match the target cluster by `cluster_id` field:
```bash
aliyun sysom list-clusters \
--user-agent AlibabaCloud-Agent-Skills
```
From the returned cluster list, iterate through each cluster object, find the entry where `cluster_id` matches the target, and check its `cluster_status` field. Also record the `name` field for later use in create-alert-strategy.
#### Enrollment Status Reference
| Status | Meaning | Icon |
|--------|---------|------|
| `installing` / `Installing` | Installing | ⏳ |
| `running` / `Running` | Enrollment successful | ✅ |
| `failed` / `Offline` | Failed/Abnormal | ❌ |
| `stopped` | Agent stopped | ⏹️ |
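Because status strings appear in both lowercase and capitalized forms, a case-insensitive lookup keeps the handling simple. The sketch below also shows the cluster match by `cluster_id`; the function names and the `❓` fallback for unlisted statuses are hypothetical additions.

```python
from typing import Dict, List, Optional

STATUS_ICONS = {
    "installing": "⏳",
    "running": "✅",
    "failed": "❌",
    "offline": "❌",
    "stopped": "⏹️",
}


def status_icon(status: str) -> str:
    """Map an enrollment status to its icon, ignoring letter case."""
    return STATUS_ICONS.get(status.lower(), "❓")  # fallback icon is an assumption


def find_cluster(clusters: List[Dict], cluster_id: str) -> Optional[Dict]:
    """Return the cluster entry whose cluster_id matches, or None."""
    for cluster in clusters:
        if cluster.get("cluster_id") == cluster_id:
            return cluster
    return None
```

The matched entry carries both `cluster_status` for polling and `name`, which is needed later when creating the alert strategy.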
#### Result Display
- **All successful** → Inform user that all instances are enrolled, SysOM will continuously monitor
- **Partially failed** → List successful and failed instances, suggest checking failed ones
- **All failed** → Suggest checking network connectivity, RAM permissions, OS compatibility
---
## Alert Configuration Phase (Pipeline Pattern)
After enrollment is complete, **proceed directly to alert configuration**: first create the alert destination (collect the Webhook), then select alert items, and finally create the alert strategy.
### Step 13 — Collect DingTalk Webhook and Create Alert Destination (Inversion Gate + SDK Call)
**⚠️ Mandatory Rule: After successful enrollment, you MUST immediately collect the DingTalk bot Webhook URL from the user. Do NOT skip this step.**
Alert destinations are used to push SysOM alerts to DingTalk group bots. This feature is **NOT supported by CLI** — use Python SDK scripts under `scripts/`.
> **⚠️ SDK Prerequisites**
>
> Before executing this step, run `scripts/setup-sdk.sh` to initialize the SDK environment (checks Python >= 3.8, creates virtual environment, installs SDK):
> ```bash
> bash scripts/setup-sdk.sh
> ```
#### Step 13a — Collect Webhook URL from User
You must output the following complete content:
---
> 📲 Please provide the DingTalk group bot **Webhook URL** for receiving alert notifications.
> Format: `https://oapi.dingtalk.com/robot/send?access_token=xxx`
>
> 💡 How to get it: DingTalk Group Settings → Bot Management → Add Bot → Custom Bot → Optional keyword: alert → Copy Webhook URL
---
**After outputting the above, STOP. Wait for user reply.**
#### Step 13b — Create Alert Destination
After the user provides the Webhook URL, **immediately create the alert destination via script** — no further confirmation needed:
```bash
.sysom-sdk-venv/bin/python scripts/create-alert-destination.py '<user-provided-webhook-url>'
```
Optionally specify a destination name:
```bash
.sysom-sdk-venv/bin/python scripts/create-alert-destination.py '<webhook-url>' '<destination-name>'
```
> **⚠️ You MUST use the virtual environment Python to execute scripts**
>
> **FORBIDDEN** to use `python3` or `python` directly — system Python dependencies may be incompatible, causing signature verification failures.
On success, the script prints **`destination_id` (a bare number) to stdout**; detailed information goes to stderr.
**Result handling**:
- **Success** → Display destination ID and name, inform user of successful creation, record `destination_id` for Step 15, **immediately proceed to Step 14**
- **Failure** → Display error message, suggest checking Webhook URL format and RAM permissions
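A malformed Webhook URL can be caught before the script runs. The check below is a sketch based only on the format shown in Step 13a; the function name is hypothetical, and a URL that passes is not guaranteed to be accepted by DingTalk or SysOM.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_dingtalk_webhook(url: str) -> bool:
    """True when the URL matches https://oapi.dingtalk.com/robot/send?access_token=..."""
    parsed = urlparse(url.strip())
    token = parse_qs(parsed.query).get("access_token", [""])[0]
    return (
        parsed.scheme == "https"
        and parsed.netloc == "oapi.dingtalk.com"
        and parsed.path == "/robot/send"
        and bool(token)
    )
```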
---
### Step 14 — Alert Item Selection (Inversion Gate)
**⚠️ Mandatory Rule: After successful alert destination creation, you MUST immediately display the alert items list. Do NOT skip this step.**
**14a. Get Available Alert Items List**
```bash
aliyun sysom list-alert-items --user-agent AlibabaCloud-Agent-Skills
```
**14b. Display Alert Items List to User**
Display the alert items returned by the API, grouped by category and numbered. Format:
---
> ## 🔔 Please select alert items to enable
>
> Enter numbers, separated by spaces:
>
> **Quick selection**: `all` = select all | `node-all` = all NODE items | `pod-all` = all POD items
>
> **【NODE Saturation】**
> 1. Node CPU Usage Detection
> 2. Node Kernel CPU Usage Detection
> ... (populate based on actual API response)
---
**After outputting, STOP. Wait for user reply.**
#### User Input Parsing Rules
| User Input | Parsing Method |
|-----------|---------------|
| `all` | Select all alert items |
| `node-all` | Select all NODE category items |
| `pod-all` | Select all POD category items |
| `1 2 4 11 12 21` | Select by number |
| `node-all 22 23` | Mixed usage |
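The parsing rules above might be implemented as in the following sketch. It assumes the displayed list is held as dicts with hypothetical `name` and `category` keys (`"NODE"` or `"POD"`), which is not the documented API response shape; the function name is also hypothetical.

```python
from typing import Dict, List


def parse_selection(user_input: str, items: List[Dict[str, str]]) -> List[str]:
    """Resolve a reply such as "node-all 22 23" to alert item names.

    `items` is in display order; index 0 is shown to the user as number 1.
    """
    chosen: List[str] = []

    def add(name: str) -> None:
        if name not in chosen:  # keep first-seen order, drop duplicates
            chosen.append(name)

    for token in user_input.split():
        key = token.lower()
        if key == "all":
            for item in items:
                add(item["name"])
        elif key in ("node-all", "pod-all"):
            category = key[:-4].upper()  # "node-all" -> "NODE"
            for item in items:
                if item["category"].upper() == category:
                    add(item["name"])
        elif token.isdecimal() and 1 <= int(token) <= len(items):
            add(items[int(token) - 1]["name"])
        else:
            raise ValueError(f"unrecognized selection: {token!r}")
    return chosen
```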
---
### Step 15 — Create Alert Strategy
After the user selects alert items, **create the alert strategy directly** with `destinations` set to the destination ID from Step 13.
**15a. Determine clusters Parameter**
| Enrollment Mode | clusters Value |
|----------------|---------------|
| Instance mode (Step 10 chose A or C) | `["default"]` |
| Cluster mode (Step 10 chose B) | `["<cluster_name>"]` (note: name, NOT ID) |
**15b. Execute Creation (SDK Call)**
> **⚠️ CLI does NOT support the `destinations` parameter — you MUST use the SDK script to create alert strategies.**
```bash
.sysom-sdk-venv/bin/python scripts/create-alert-strategy.py \
--name "aliyun-aes-skills-create-<YYYYMMDDHHmm>" \
--items "<alert_item_1>,<alert_item_2>" \
--clusters "<clusters_value>" \
--destinations "<destination_id>"
```
Parameter reference:
| Parameter | Description | Example |
|-----------|-------------|---------|
| `--name` | Strategy name | `aliyun-aes-skills-create-202604151900` |
| `--items` | Alert item names, comma-separated | `Node CPU Usage Detection,Node Memory Usage Detection` |
| `--clusters` | Clusters, comma-separated (use `default` for instance mode) | `default` |
| `--destinations` | Alert destination IDs, comma-separated | `1,2` |
| `--k8s-label` | Enable k8s labels (optional) | Defaults to false if omitted |
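The strategy name and argument list can be assembled programmatically. This sketch mirrors the command shown above and the `<YYYYMMDDHHmm>` naming pattern; the helper names are hypothetical, and the optional `--k8s-label` flag is left out.

```python
from datetime import datetime
from typing import Iterable, List


def strategy_name(now: datetime) -> str:
    """Build aliyun-aes-skills-create-<YYYYMMDDHHmm> from a timestamp."""
    return "aliyun-aes-skills-create-" + now.strftime("%Y%m%d%H%M")


def strategy_argv(
    items: Iterable[str],
    clusters: Iterable[str],
    destinations: Iterable[int],
    now: datetime,
) -> List[str]:
    """Return the argument vector for scripts/create-alert-strategy.py."""
    return [
        ".sysom-sdk-venv/bin/python",  # virtual-environment Python, as required
        "scripts/create-alert-strategy.py",
        "--name", strategy_name(now),
        "--items", ",".join(items),
        "--clusters", ",".join(clusters),
        "--destinations", ",".join(str(d) for d in destinations),
    ]
```

Passing the list to `subprocess.run` without `shell=True` avoids quoting problems with item names that contain spaces.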
> **⚠️ You MUST use `.sysom-sdk-venv/bin/python` to execute scripts** — using system `python3` is FORBIDDEN.
On success, the script prints **the strategy name to stdout**; detailed information goes to stderr.
**15c. Display Results**
- **Success** → Display strategy name, alert item count, cluster, status, associated alert destinations; inform user that alerts will be pushed to DingTalk via SysOM
- **Failure** → Display error message, suggest checking RAM permissions, enrollment status, network connectivity
---
### Alert Destination Management (On Demand)
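Users can manage existing alert destinations on demand. The following operations all use the Python SDK and are **NOT supported by the CLI**. Each snippet assumes `client` is an already-initialized SysOM SDK client and that angle-bracket placeholders are replaced with real values.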
#### Get Alert Destination Details
```python
from alibabacloud_sysom20231230 import models
request = models.GetAlertDestinationRequest(id=<destination_id>)
response = client.get_alert_destination(request)
```
#### Update Alert Destination
Only fill in the fields that need to be modified:
```python
from alibabacloud_sysom20231230 import models
request = models.UpdateAlertDestinationRequest(
id='<destination_id>',
name='<new_name>', # optional
target='dingtalk', # optional
params=models.UpdateAlertDestinationRequestParams(
webhook='<new_webhook_url>' # optional
)
)
response = client.update_alert_destination(request)
```
#### Delete Alert Destination
```python
from alibabacloud_sysom20231230 import models
request = models.DeleteAlertDestinationRequest(id=<destination_id>)
response = client.delete_alert_destination(request)
```
#### List All Alert Destinations
Filter by `name` parameter (optional); omit to return all:
```python
from alibabacloud_sysom20231230 import models
request = models.ListAlertDestinationsRequest(name='<optional_filter_name>')
response = client.list_alert_destinations(request)
```
FILE:references/ram-policies.md
# RAM Policies: alibabacloud-aes-sysom-os-diagnosis
This document lists all APIs and their corresponding RAM permissions used by the SysOM deep diagnosis skill.
---
## SysOM Permissions
| API | RAM Action | Description |
|-----|-----------|-------------|
| InitialSysom | `sysom:InitialSysom` | Initialize SysOM role authorization |
| CheckInstanceSupport | `sysom:CheckInstanceSupport` | Check if instance supports SysOM diagnosis |
| InvokeDiagnosis | `sysom:InvokeDiagnosis` | Invoke intelligent diagnosis |
| GetDiagnosisResult | `sysom:GetDiagnosisResult` | Get diagnosis result |
| InstallAgent | `sysom:InstallAgent` | Enroll instance (install Agent) |
| InstallAgentForCluster | `sysom:InstallAgentForCluster` | Enroll ACK cluster |
| ListInstanceStatus | `sysom:ListInstanceStatus` | Query instance enrollment status |
| ListClusters | `sysom:ListClusters` | Query cluster enrollment status |
| ListAlertItems | `sysom:ListAlertItems` | Get available alert items list |
| CreateAlertStrategy | `sysom:CreateAlertStrategy` | Create alert strategy |
| CreateAlertDestination | `sysom:CreateAlertDestination` | Create alert destination (SDK call) |
| UpdateAlertDestination | `sysom:UpdateAlertDestination` | Update alert destination (SDK call) |
| DeleteAlertDestination | `sysom:DeleteAlertDestination` | Delete alert destination (SDK call) |
| GetAlertDestination | `sysom:GetAlertDestination` | Get alert destination details (SDK call) |
| ListAlertDestinations | `sysom:ListAlertDestinations` | List alert destinations (SDK call) |
## ECS Permissions
| API | RAM Action | Description |
|-----|-----------|-------------|
| DescribeCloudAssistantStatus | `ecs:DescribeCloudAssistantStatus` | Check Cloud Assistant online status |
## Minimum Permission Policy Example
```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sysom:InitialSysom",
        "sysom:CheckInstanceSupport",
        "sysom:InvokeDiagnosis",
        "sysom:GetDiagnosisResult",
        "sysom:InstallAgent",
        "sysom:InstallAgentForCluster",
        "sysom:ListInstanceStatus",
        "sysom:ListClusters",
        "sysom:ListAlertItems",
        "sysom:CreateAlertStrategy",
        "sysom:CreateAlertDestination",
        "sysom:UpdateAlertDestination",
        "sysom:DeleteAlertDestination",
        "sysom:GetAlertDestination",
        "sysom:ListAlertDestinations"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecs:DescribeCloudAssistantStatus"
      ],
      "Resource": "*"
    }
  ]
}
```
## Permission Tiers
| Phase | Required Permissions | Description |
|-------|---------------------|-------------|
| Diagnosis | `sysom:InitialSysom`, `sysom:CheckInstanceSupport`, `sysom:InvokeDiagnosis`, `sysom:GetDiagnosisResult`, `ecs:DescribeCloudAssistantStatus` | Minimum permissions for deep diagnosis |
| Enrollment | `sysom:InstallAgent` or `sysom:InstallAgentForCluster`, `sysom:ListInstanceStatus`, `sysom:ListClusters` | Enroll instances or clusters |
| Alert | `sysom:ListAlertItems`, `sysom:CreateAlertStrategy` | Configure anomaly event alerts |
| Alert Destination | `sysom:CreateAlertDestination`, `sysom:UpdateAlertDestination`, `sysom:DeleteAlertDestination`, `sysom:GetAlertDestination`, `sysom:ListAlertDestinations` | Manage alert destinations (SDK call) |
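The tier table can also be used to generate a narrower policy than the full example. The following is a minimal sketch rather than part of the skill: `TIER_ACTIONS` and `build_policy` are hypothetical names, the Enrollment tier lists both install actions even though only one of them is needed for a given target, and all actions are merged into a single statement.

```python
import json

# Actions per tier, copied from the Permission Tiers table.
TIER_ACTIONS = {
    "diagnosis": [
        "sysom:InitialSysom",
        "sysom:CheckInstanceSupport",
        "sysom:InvokeDiagnosis",
        "sysom:GetDiagnosisResult",
        "ecs:DescribeCloudAssistantStatus",
    ],
    "enrollment": [
        "sysom:InstallAgent",
        "sysom:InstallAgentForCluster",
        "sysom:ListInstanceStatus",
        "sysom:ListClusters",
    ],
    "alert": ["sysom:ListAlertItems", "sysom:CreateAlertStrategy"],
    "alert_destination": [
        "sysom:CreateAlertDestination",
        "sysom:UpdateAlertDestination",
        "sysom:DeleteAlertDestination",
        "sysom:GetAlertDestination",
        "sysom:ListAlertDestinations",
    ],
}


def build_policy(tiers):
    """Return a RAM policy dict that allows only the actions of the given tiers."""
    actions = []
    for tier in tiers:
        for action in TIER_ACTIONS[tier]:
            if action not in actions:  # keep order, drop duplicates
                actions.append(action)
    return {
        "Version": "1",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }


if __name__ == "__main__":
    # Diagnosis-only policy, printed as JSON.
    print(json.dumps(build_policy(["diagnosis"]), indent=2))
```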
FILE:references/related-commands.md
# Related Commands: alibabacloud-aes-sysom-os-diagnosis
This skill uses the `aliyun` CLI to call SysOM and ECS APIs. All commands **MUST** include `--user-agent AlibabaCloud-Agent-Skills`.
---
## Diagnosis Phase
| Product | CLI Command | Description |
|---------|------------|-------------|
| sysom | `aliyun sysom initial-sysom --check-only false --source aes-skills` | Initialize SysOM role authorization |
| sysom | `aliyun sysom check-instance-support --instances <id> --biz-region <region>` | Check if instance supports diagnosis |
| sysom | `aliyun sysom invoke-diagnosis --service-name ocd --channel ecs --params '<JSON>'` | Invoke intelligent diagnosis (params keys use snake_case, must include `type: "ocd"`) |
| sysom | `aliyun sysom get-diagnosis-result --task-id <task_id>` | Get diagnosis result |
| ecs | `aliyun ecs describe-cloud-assistant-status --biz-region-id <region> --instance-id <id>` | Check Cloud Assistant online status |
## Enrollment Phase
| Product | CLI Command | Description |
|---------|------------|-------------|
| sysom | `aliyun sysom install-agent --instances instance=<id> region=<region> --install-type InstallAndUpgrade --agent-id <id> --agent-version <ver>` | Enroll instance |
| sysom | `aliyun sysom install-agent-for-cluster --cluster-id <id> --agent-id <id> --agent-version <ver> --config-id <id>` | Enroll ACK cluster |
| sysom | `aliyun sysom list-instance-status --instance <id> --biz-region <region>` | Query instance enrollment status |
| sysom | `aliyun sysom list-clusters` | Get full cluster list (do not pass cluster-id; match target from response by cluster_id) |
## Alert Phase
| Product | CLI Command | Description |
|---------|------------|-------------|
| sysom | `aliyun sysom list-alert-items` | Get available alert items list |
## Alert Strategy Creation (SDK Call, NOT supported by CLI)
> CLI does not support the `destinations` parameter — alert strategy creation must use the SDK script.
| SDK Script | Description |
|-----------|-------------|
| `.sysom-sdk-venv/bin/python scripts/create-alert-strategy.py --name <name> --items <items> --clusters <clusters> --destinations <ids>` | Create alert strategy (supports destinations to associate alert destinations) |
## Alert Destination (SDK Call, NOT supported by CLI)
> The following APIs are called via Python SDK (`alibabacloud_sysom20231230`), NOT supported by `aliyun` CLI.
| SDK Method | Description |
|-----------|-------------|
| `client.create_alert_destination(request)` | Create alert destination (DingTalk bot Webhook) |
| `client.update_alert_destination(request)` | Update alert destination |
| `client.delete_alert_destination(request)` | Delete alert destination |
| `client.get_alert_destination(request)` | Get alert destination details |
| `client.list_alert_destinations(request)` | List alert destinations (filterable by name) |
## Cleanup
| Product | CLI Command | Description |
|---------|------------|-------------|
| sysom | `aliyun sysom uninstall-agent --instances instance=<id> region=<region> --agent-id <id> --agent-version <ver>` | Uninstall Agent |
## Fixed Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| `--agent-id` | `74a86327-3170-412c-8e67-da3389ec56a9` | Agent ID |
| `--agent-version` | `3.12.0-1` | Agent version |
| `--install-type` | `InstallAndUpgrade` | Installation type (default) |
| `--config-id` | `8gj86wrt7-3170-412c-8e67-da3389ecg6a9` | Cluster component config ID |
| `--channel` | `ecs` | Diagnosis channel (fixed) |
| `--service-name` | `ocd` | Diagnosis type (intelligent diagnosis) |
| `--user-agent` | `AlibabaCloud-Agent-Skills` | Must be appended to all commands |
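Because every command must carry `--user-agent AlibabaCloud-Agent-Skills`, it can help to build the argument list in one place. This is a sketch under assumptions: `build_command` is a hypothetical helper, it only covers flags that take a single value (so it does not model the multi-token `--instances instance=<id> region=<region>` form), and it builds the command without running it.

```python
import shlex

USER_AGENT = "AlibabaCloud-Agent-Skills"


def build_command(product, operation, **options):
    """Build an `aliyun` argv list and always append the required --user-agent flag."""
    argv = ["aliyun", product, operation]
    for name, value in options.items():
        # Python keyword names use underscores; CLI flags use hyphens.
        argv += ["--" + name.replace("_", "-"), str(value)]
    argv += ["--user-agent", USER_AGENT]
    return argv


if __name__ == "__main__":
    cmd = build_command("sysom", "get-diagnosis-result", task_id="<task_id>")
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```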
FILE:references/verification-method.md
# Success Verification: alibabacloud-aes-sysom-os-diagnosis
This document describes the success verification methods for each phase. All `aliyun` CLI commands **MUST** include `--user-agent AlibabaCloud-Agent-Skills`.
---
## 1. Environment Setup Verification
### 1.1 CLI Version
```bash
aliyun version --user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: Version >= 3.3.1
### 1.2 Credential Configuration
```bash
aliyun configure list --user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: Output contains a valid profile (AK, STS, or OAuth identity)
---
## 2. Diagnosis Phase Verification
### 2.1 Cloud Assistant Online Check
```bash
aliyun ecs describe-cloud-assistant-status \
--biz-region-id <region> \
--instance-id <instance_id> \
--user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: `CloudAssistantStatus` is `"true"` in the response
### 2.2 SysOM Role Initialization
```bash
aliyun sysom initial-sysom --check-only false --source aes-skills --user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: No error returned
### 2.3 Instance Support Check
```bash
aliyun sysom check-instance-support \
--instances <instance_id> \
--biz-region <region> \
--user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: Instance is marked as supported in the response
### 2.4 Diagnosis Execution
```bash
aliyun sysom invoke-diagnosis \
--service-name ocd \
--channel ecs \
--params '{"instanceId":"<instance_id>","region":"<region>","enableDiagnosis":true,"startTime":0,"endTime":0,"aiRoadmap":true,"enableSysomLink":false}' \
--user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: Response contains `task_id`
### 2.5 Diagnosis Result
```bash
aliyun sysom get-diagnosis-result \
--task-id <task_id> \
--user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: `status` is `Success`, response contains `summary` and `issues` data
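Diagnosis is asynchronous, so the result usually has to be polled until `status` becomes `Success`. The loop below is a minimal sketch with several assumptions: `fetch_status` is a caller-supplied function that runs `get-diagnosis-result` and extracts the `status` field, the failure state names are guesses, and the timeout and interval are arbitrary defaults.

```python
import time


def wait_for_success(fetch_status, timeout_s=600.0, interval_s=10.0,
                     failure_states=("Fail", "Failed"),
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns "Success", a failure state, or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "Success":
            return status
        if status in failure_states:  # assumed names; adjust to the real API values
            raise RuntimeError(f"diagnosis ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"diagnosis still {status!r} after {timeout_s}s")
        sleep(interval_s)
```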
---
## 3. Enrollment Phase Verification
### 3.1 Instance Enrollment
```bash
aliyun sysom install-agent \
--instances instance=<instance_id> region=<region> \
--install-type InstallAndUpgrade \
--agent-id 74a86327-3170-412c-8e67-da3389ec56a9 \
--agent-version 3.12.0-1 \
--user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: No error returned
### 3.2 Instance Status Polling
```bash
aliyun sysom list-instance-status \
--instance <instance_id> \
--biz-region <region> \
--user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: Instance `status` is `running`
### 3.3 Cluster Enrollment
```bash
aliyun sysom install-agent-for-cluster \
--cluster-id <cluster_id> \
--agent-id 74a86327-3170-412c-8e67-da3389ec56a9 \
--agent-version 3.12.0-1 \
--config-id 8gj86wrt7-3170-412c-8e67-da3389ecg6a9 \
--user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: No error returned
### 3.4 Cluster Status Polling
```bash
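aliyun sysom list-clusters \
--user-agent AlibabaCloud-Agent-Skills
```
**Success criteria**: In the returned list, the entry whose `cluster_id` matches the target cluster has `cluster_status` `Running` (`list-clusters` returns the full list and must not be passed a cluster ID)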
---
## 4. Alert Phase Verification
### 4.1 SDK Environment Initialization
```bash
bash scripts/setup-sdk.sh
```
**Success criteria**: Output shows `✅ SDK installation successful`, Python version >= 3.8
### 4.2 Alert Destination Creation (Script Call)
```bash
.sysom-sdk-venv/bin/python scripts/create-alert-destination.py 'https://oapi.dingtalk.com/robot/send?access_token=xxx'
```
**Success criteria**: stdout outputs `destination_id` (a pure number), stderr outputs `✅ Alert destination created successfully`
### 4.3 Alert Strategy Creation (SDK Script Call)
```bash
.sysom-sdk-venv/bin/python scripts/create-alert-strategy.py \
--name "aliyun-aes-skills-create-<YYYYMMDDHHmm>" \
--items "<alert_items>" \
--clusters "default" \
--destinations "<destination_id>"
```
**Success criteria**: stdout outputs strategy name, stderr outputs `✅ Alert strategy created successfully`
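Steps 4.2 and 4.3 chain together: the `destination_id` printed on stdout by the first script becomes the `--destinations` value of the second. A sketch of that hand-off follows, assuming the output and naming conventions described above; `parse_destination_id` and `strategy_command` are hypothetical helpers that only validate and build arguments and do not call the API.

```python
from datetime import datetime

PYTHON = ".sysom-sdk-venv/bin/python"


def strategy_name(now):
    """Name in the documented pattern aliyun-aes-skills-create-<YYYYMMDDHHmm>."""
    return "aliyun-aes-skills-create-" + now.strftime("%Y%m%d%H%M")


def parse_destination_id(stdout_text):
    """Check that the script's stdout is a pure number and return it as an int."""
    text = stdout_text.strip()
    if not text.isdigit():
        raise ValueError(f"unexpected destination_id output: {text!r}")
    return int(text)


def strategy_command(destination_id, items, clusters="default", now=None):
    """Build the argv for create-alert-strategy.py from a parsed destination_id."""
    now = now or datetime.now()
    return [
        PYTHON, "scripts/create-alert-strategy.py",
        "--name", strategy_name(now),
        "--items", ",".join(items),
        "--clusters", clusters,
        "--destinations", str(destination_id),
    ]
```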
FILE:scripts/create-alert-destination.py
#!/usr/bin/env python3
"""
SysOM alert destination creation script.

Purpose: create a DingTalk alert destination through the SDK.
Usage: python scripts/create-alert-destination.py <webhook_url> [destination_name]

Arguments:
    webhook_url       DingTalk group bot webhook URL (required)
    destination_name  Alert destination name (optional, auto-generated by default)

Credential sources (in priority order):
    1. Environment variables ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET
    2. aliyun CLI config file ~/.aliyun/config.json (the current profile is read automatically)

Output:
    On success, prints destination_id (a pure number) to stdout for use by create-alert-strategy
"""
import json
import os
import sys
from datetime import datetime


def validate_arguments():
    """Read the webhook URL and optional destination name from argv."""
    if len(sys.argv) < 2:
        print("❌ Missing required argument: webhook_url", file=sys.stderr)
        print(f"Usage: python {sys.argv[0]} <webhook_url> [destination_name]", file=sys.stderr)
        sys.exit(1)
    webhook_url = sys.argv[1]
    # A non-matching URL only triggers a warning; the request is still sent.
    if not webhook_url.startswith("https://oapi.dingtalk.com/robot/send"):
        print("⚠️ Webhook URL format may be incorrect; expected format: https://oapi.dingtalk.com/robot/send?access_token=xxx", file=sys.stderr)
    timestamp = datetime.now().strftime("%Y%m%d%H%M")
    destination_name = sys.argv[2] if len(sys.argv) > 2 else f"aliyun-aes-skills-destination-{timestamp}"
    return webhook_url, destination_name


def load_credentials_from_cli_config():
    """Return (access_key_id, access_key_secret) from ~/.aliyun/config.json, or (None, None)."""
    config_path = os.path.join(os.path.expanduser("~"), ".aliyun", "config.json")
    if not os.path.exists(config_path):
        return None, None
    try:
        with open(config_path, "r", encoding="utf-8") as f:
            config = json.load(f)
        current_profile = config.get("current", "")
        profiles = config.get("profiles", [])
        target_profile = None
        for profile in profiles:
            if profile.get("name") == current_profile:
                target_profile = profile
                break
        # Fall back to the first profile when the current one is not found.
        if not target_profile and profiles:
            target_profile = profiles[0]
        if target_profile:
            access_key_id = target_profile.get("access_key_id", "")
            access_key_secret = target_profile.get("access_key_secret", "")
            if access_key_id and access_key_secret:
                profile_name = target_profile.get("name", "default")
                print(f"🔑 Read credentials from aliyun CLI config (profile: {profile_name})", file=sys.stderr)
                return access_key_id, access_key_secret
    except (json.JSONDecodeError, KeyError, TypeError):
        pass
    return None, None


def validate_credentials():
    """Resolve credentials from environment variables first, then the CLI config."""
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    if access_key_id and access_key_secret:
        print("🔑 Read credentials from environment variables", file=sys.stderr)
        return access_key_id, access_key_secret
    access_key_id, access_key_secret = load_credentials_from_cli_config()
    if access_key_id and access_key_secret:
        return access_key_id, access_key_secret
    print("❌ Alibaba Cloud credentials not found. Configure them in one of the following ways:", file=sys.stderr)
    print("   Option 1: aliyun configure (recommended; the script reads it automatically)", file=sys.stderr)
    print("   Option 2: export ALIBABA_CLOUD_ACCESS_KEY_ID=<your_ak>", file=sys.stderr)
    print("             export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<your_sk>", file=sys.stderr)
    sys.exit(1)


def main():
    webhook_url, destination_name = validate_arguments()
    access_key_id, access_key_secret = validate_credentials()
    # Import the SDK lazily so a missing install produces a clear message.
    try:
        from alibabacloud_tea_openapi.utils_models import Config
        from alibabacloud_sysom20231230.client import Client
        from alibabacloud_sysom20231230 import models
    except ImportError:
        print("❌ SDK not installed. Run first: bash scripts/setup-sdk.sh", file=sys.stderr)
        sys.exit(1)
    config = Config(
        access_key_id=access_key_id,
        access_key_secret=access_key_secret,
        endpoint="sysom.aliyuncs.com",
        user_agent="AlibabaCloud-Agent-Skills/alibabacloud-aes-sysom-os-diagnosis",
        connect_timeout=10000,
        read_timeout=30000
    )
    client = Client(config)
    request = models.CreateAlertDestinationRequest(
        target="dingtalk",
        name=destination_name,
        source="aes-skills",
        params=models.CreateAlertDestinationRequestParams(
            webhook=webhook_url
        )
    )
    try:
        response = client.create_alert_destination(request)
        response_body = response.body
        if hasattr(response_body, "to_map"):
            result = response_body.to_map()
        else:
            result = {"body": str(response_body)}
        code = result.get("code", "")
        if code == "Success":
            data = result.get("data", {})
            destination_id = data.get("id")
            # stdout carries only the ID; human-readable details go to stderr.
            print(destination_id)
            print("✅ Alert destination created successfully", file=sys.stderr)
            print(f"   ID: {destination_id}", file=sys.stderr)
            print(f"   Name: {destination_name}", file=sys.stderr)
            print("   Target: dingtalk", file=sys.stderr)
        else:
            message = result.get("message", "Unknown error")
            print(f"❌ Creation failed: {code} - {message}", file=sys.stderr)
            print(json.dumps(result, indent=2, ensure_ascii=False), file=sys.stderr)
            sys.exit(1)
    except Exception as error:
        print(f"❌ API call error: {error}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
FILE:scripts/create-alert-strategy.py
#!/usr/bin/env python3
"""
SysOM alert strategy creation script.

Purpose: create an alert strategy through the SDK (supports the destinations
parameter, which the CLI does not support).
Usage: .sysom-sdk-venv/bin/python scripts/create-alert-strategy.py \
    --name <strategy_name> \
    --items <item1>,<item2> \
    --clusters <cluster1> \
    --destinations <destination_id1>,<destination_id2>

Arguments:
    --name          Strategy name (required)
    --items         Alert item list, comma-separated (required)
    --clusters      Cluster list, comma-separated (required; use default in instance mode)
    --destinations  Alert destination ID list, comma-separated (required)
    --k8s-label     Enable the k8s label (optional, default false)

Credential sources (in priority order):
    1. Environment variables ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET
    2. aliyun CLI config file ~/.aliyun/config.json (the current profile is read automatically)

Output:
    On success, prints the strategy name to stdout; details go to stderr
"""
import argparse
import json
import os
import sys


def parse_arguments():
    parser = argparse.ArgumentParser(description="Create a SysOM alert strategy")
    parser.add_argument("--name", required=True, help="Strategy name")
    parser.add_argument("--items", required=True, help="Alert item list, comma-separated")
    parser.add_argument("--clusters", required=True, help="Cluster list, comma-separated (use default in instance mode)")
    parser.add_argument("--destinations", required=True, help="Alert destination ID list, comma-separated")
    parser.add_argument("--k8s-label", action="store_true", default=False, help="Enable the k8s label")
    return parser.parse_args()


def load_credentials_from_cli_config():
    """Return (access_key_id, access_key_secret) from ~/.aliyun/config.json, or (None, None)."""
    config_path = os.path.join(os.path.expanduser("~"), ".aliyun", "config.json")
    if not os.path.exists(config_path):
        return None, None
    try:
        with open(config_path, "r", encoding="utf-8") as f:
            config = json.load(f)
        current_profile = config.get("current", "")
        profiles = config.get("profiles", [])
        target_profile = None
        for profile in profiles:
            if profile.get("name") == current_profile:
                target_profile = profile
                break
        # Fall back to the first profile when the current one is not found.
        if not target_profile and profiles:
            target_profile = profiles[0]
        if target_profile:
            access_key_id = target_profile.get("access_key_id", "")
            access_key_secret = target_profile.get("access_key_secret", "")
            if access_key_id and access_key_secret:
                profile_name = target_profile.get("name", "default")
                print(f"🔑 Read credentials from aliyun CLI config (profile: {profile_name})", file=sys.stderr)
                return access_key_id, access_key_secret
    except (json.JSONDecodeError, KeyError, TypeError):
        pass
    return None, None


def validate_credentials():
    """Resolve credentials from environment variables first, then the CLI config."""
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    if access_key_id and access_key_secret:
        print("🔑 Read credentials from environment variables", file=sys.stderr)
        return access_key_id, access_key_secret
    access_key_id, access_key_secret = load_credentials_from_cli_config()
    if access_key_id and access_key_secret:
        return access_key_id, access_key_secret
    print("❌ Alibaba Cloud credentials not found. Configure them in one of the following ways:", file=sys.stderr)
    print("   Option 1: aliyun configure (recommended; the script reads it automatically)", file=sys.stderr)
    print("   Option 2: export ALIBABA_CLOUD_ACCESS_KEY_ID=<your_ak>", file=sys.stderr)
    print("             export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<your_sk>", file=sys.stderr)
    sys.exit(1)


def main():
    args = parse_arguments()
    access_key_id, access_key_secret = validate_credentials()
    # Split the comma-separated arguments and drop empty entries.
    items_list = [item.strip() for item in args.items.split(",") if item.strip()]
    clusters_list = [cluster.strip() for cluster in args.clusters.split(",") if cluster.strip()]
    try:
        destinations_list = [int(d.strip()) for d in args.destinations.split(",") if d.strip()]
    except ValueError:
        print("❌ Invalid --destinations format; expected comma-separated integer IDs (e.g. 1,2,3)", file=sys.stderr)
        sys.exit(1)
    if not items_list:
        print("❌ At least one alert item is required (--items)", file=sys.stderr)
        sys.exit(1)
    if not destinations_list:
        print("❌ At least one alert destination ID is required (--destinations)", file=sys.stderr)
        sys.exit(1)
    # Import the SDK lazily so a missing install produces a clear message.
    try:
        from alibabacloud_tea_openapi.utils_models import Config
        from alibabacloud_sysom20231230.client import Client
        from alibabacloud_sysom20231230 import models
    except ImportError:
        print("❌ SDK not installed. Run first: bash scripts/setup-sdk.sh", file=sys.stderr)
        sys.exit(1)
    config = Config(
        access_key_id=access_key_id,
        access_key_secret=access_key_secret,
        endpoint="sysom.aliyuncs.com",
        user_agent="AlibabaCloud-Agent-Skills/alibabacloud-aes-sysom-os-diagnosis",
        connect_timeout=10000,
        read_timeout=30000
    )
    client = Client(config)
    strategy = models.CreateAlertStrategyRequestStrategy(
        clusters=clusters_list,
        items=items_list,
        destinations=destinations_list
    )
    request = models.CreateAlertStrategyRequest(
        name=args.name,
        enabled=True,
        k_8s_label=args.k8s_label,
        strategy=strategy
    )
    try:
        response = client.create_alert_strategy(request)
        response_body = response.body
        if hasattr(response_body, "to_map"):
            result = response_body.to_map()
        else:
            result = {"body": str(response_body)}
        code = result.get("code", "")
        if code == "Success":
            # stdout carries only the strategy name; details go to stderr.
            print(args.name)
            print("✅ Alert strategy created successfully", file=sys.stderr)
            print(f"   Strategy name: {args.name}", file=sys.stderr)
            print(f"   Alert item count: {len(items_list)}", file=sys.stderr)
            print(f"   Clusters: {', '.join(clusters_list)}", file=sys.stderr)
            print(f"   Alert destination IDs: {destinations_list}", file=sys.stderr)
        else:
            message = result.get("message", "Unknown error")
            print(f"❌ Creation failed: {code} - {message}", file=sys.stderr)
            print(json.dumps(result, indent=2, ensure_ascii=False), file=sys.stderr)
            sys.exit(1)
    except Exception as error:
        print(f"❌ API call error: {error}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
FILE:scripts/setup-sdk.sh
#!/bin/bash
# SysOM Alert Destination SDK environment setup script
# Purpose: check that Python >= 3.8 is available, create a virtual environment,
#          and install the alibabacloud_sysom20231230 SDK
# Usage: bash scripts/setup-sdk.sh
set -e
VENV_DIR=".sysom-sdk-venv"
MIN_PYTHON_VERSION="3.8"
SDK_PACKAGE="alibabacloud_sysom20231230==1.16.0"
echo "🔍 Checking Python environment..."
# Find an available Python interpreter
PYTHON_CMD=""
for cmd in python3 python; do
    if command -v "$cmd" &>/dev/null; then
        PYTHON_CMD="$cmd"
        break
    fi
done
if [ -z "$PYTHON_CMD" ]; then
    echo "❌ Python interpreter not found. Install Python >= ${MIN_PYTHON_VERSION} first"
    echo "   Installation guide: https://www.python.org/downloads/"
    exit 1
fi
# Check the Python version
PYTHON_VERSION=$($PYTHON_CMD -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')")
PYTHON_MAJOR=$($PYTHON_CMD -c "import sys; print(sys.version_info.major)")
PYTHON_MINOR=$($PYTHON_CMD -c "import sys; print(sys.version_info.minor)")
if [ "$PYTHON_MAJOR" -lt 3 ] || ([ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -lt 8 ]); then
    echo "❌ Python version too old: found ${PYTHON_VERSION}, required >= ${MIN_PYTHON_VERSION}"
    echo "   Upgrade Python: https://www.python.org/downloads/"
    exit 1
fi
echo "✅ Python ${PYTHON_VERSION} meets the requirement (>= ${MIN_PYTHON_VERSION})"
# Resolve the script directory (scripts/); the virtual environment is created in the skill root
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
SKILL_ROOT="$(dirname "$SCRIPT_DIR")"
VENV_PATH="${SKILL_ROOT}/${VENV_DIR}"
# Create the virtual environment
if [ -d "$VENV_PATH" ]; then
    echo "📦 Virtual environment already exists: ${VENV_PATH}"
else
    echo "📦 Creating virtual environment: ${VENV_PATH}"
    $PYTHON_CMD -m venv "$VENV_PATH"
fi
# Install the SDK with the virtual environment's pip
echo "📥 Installing SDK: ${SDK_PACKAGE}"
"${VENV_PATH}/bin/pip" install --quiet --upgrade pip
"${VENV_PATH}/bin/pip" install --quiet "$SDK_PACKAGE"
# Verify the installation
SDK_VERSION=$("${VENV_PATH}/bin/python" -c "import alibabacloud_sysom20231230; print(alibabacloud_sysom20231230.__version__)")
echo "✅ SDK installation successful: ${SDK_PACKAGE} v${SDK_VERSION}"
echo ""
echo "📌 To use the SDK afterwards, run the Python scripts like this:"
echo "   ${VENV_PATH}/bin/python scripts/create-alert-destination.py <webhook_url> [name]"
Use Alibaba Cloud DashScope API and LingMou to generate AI video and speech. Seven capabilities — (1) LivePortrait talking-head (image + audio → video, two-s...
---
name: alibabacloud-avatar-video
description: Use Alibaba Cloud DashScope API and LingMou to generate AI video and speech. Seven capabilities — (1) LivePortrait talking-head (image + audio → video, two-step), (2) EMO talking-head, (3) AA/AnimateAnyone full-body animation (three-step), (4) T2I text-to-image (Wan 2.x, default wan2.2-t2i-flash), (5) I2V image-to-video (Wan 2.x, default wan2.7-i2v-flash, supports T2I→I2V pipeline), (6) Qwen TTS (auto model/voice by scene, default qwen3-tts-vd-realtime-2026-01-15), (7) LingMou digital-human template video with random template, public-template copy, and script confirmation. Trigger when the user needs talking-head, portrait, full-body animation, text-to-image, text-to-video, or speech synthesis.
metadata:
  {
    "openclaw": {
      "emoji": "🎭",
      "requires": {
        "bins": ["ffmpeg", "ffprobe"],
        "env": [
          "DASHSCOPE_API_KEY",
          "ALIBABA_CLOUD_ACCESS_KEY_ID",
          "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
          "OSS_BUCKET",
          "OSS_ENDPOINT"
        ]
      }
    }
  }
---
# Human Avatar — Alibaba Cloud AI Video & Speech
## Capabilities overview
| Capability | Script | Model / API | Region | Summary |
|------|------|---------|--------|------|
| **LivePortrait** | `live_portrait.py` | `liveportrait` | cn-beijing | Portrait + audio/video → talking video, two steps |
| **EMO** | `portrait_animate.py` | `emo-v1` | cn-beijing | Portrait + audio → talking head, detect + generate |
| **AA** (AnimateAnyone) | `animate_anyone.py` | `animate-anyone-gen2` | cn-beijing | Full-body animation: detect → motion template → video |
| **T2I** | `text_to_image.py` | `wan2.x-t2i` | Multi-region | Text → image, default wan2.2-t2i-flash |
| **I2V** | `image_to_video.py` | `wan2.x-i2v` | Multi-region | Image → video; T2I→I2V pipeline supported; default wan2.7-i2v-flash |
| **Qwen TTS** | `qwen_tts.py` | `qwen3-tts-*` | cn-beijing / Singapore | Text → speech; auto model/voice by scene |
| **LingMou** | `avatar_video.py` | LingMou SDK | cn-beijing | Template-based digital-human broadcast video |
---
## Quick selection guide
```
Talking head (have audio/video already) → LivePortrait
Talking head (no audio; synthesize first) → Qwen TTS → LivePortrait
Full-body dance / motion → AA (AnimateAnyone)
Text → image → T2I (text_to_image)
Image → video → I2V (image_to_video)
Text → video end-to-end → T2I → I2V (image_to_video --t2i-prompt)
Enterprise digital human / template news → LingMou (avatar_video)
```
---
## Environment setup
```bash
pip install requests==2.33.1 dashscope==1.25.15 oss2==2.19.1 numpy==1.26.4
# LingMou additionally:
pip install alibabacloud-lingmou20250527==1.7.0 alibabacloud-tea-openapi==0.4.4
```
```bash
export DASHSCOPE_API_KEY=sk-xxxx # Beijing-region API key
export ALIBABA_CLOUD_ACCESS_KEY_ID=xxx # OSS upload
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=xxx
export OSS_BUCKET=your-bucket
export OSS_ENDPOINT=oss-cn-beijing.aliyuncs.com
```
> ⚠️ API keys for `cn-beijing` and **Singapore are not interchangeable**; use the key for the correct region.
> `OSS_ENDPOINT` may include or omit the `https://` prefix; scripts normalize it.
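The scripts' own normalization code is not shown in this document. As an illustration only, one plausible way to accept `OSS_ENDPOINT` with or without a scheme is sketched below; `normalize_oss_endpoint` is a hypothetical helper, and defaulting to `https` is an assumption.

```python
def normalize_oss_endpoint(endpoint, default_scheme="https"):
    """Return the endpoint with exactly one scheme prefix and no trailing slash."""
    value = endpoint.strip().rstrip("/")
    if value.startswith(("http://", "https://")):
        return value  # already has a scheme; keep it as given
    return f"{default_scheme}://{value}"


if __name__ == "__main__":
    print(normalize_oss_endpoint("oss-cn-beijing.aliyuncs.com"))
```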
---
## 1. LivePortrait — talking-head video
**When to use**: You have a portrait photo + speech and want a talking-head video quickly.
**Flow**:
```
Step 1: liveportrait-detect (sync) → pass=true
↓
Step 2: liveportrait (async) → video_url
```
**Image**: Single person, front-facing portrait, clear face, no occlusion
**Audio**: wav/mp3, < 15MB, 1s–3min
**Video input**: Audio extracted automatically (ffmpeg)
```bash
# Image + audio file
python scripts/live_portrait.py \
--image ./portrait.jpg \
--audio ./speech.mp3 \
--template normal --download
# Image + video (extract audio)
python scripts/live_portrait.py \
--image ./portrait.jpg \
--video ./speech_video.mp4 \
--template active --download
# Public URLs
python scripts/live_portrait.py \
--image-url "https://..." \
--audio-url "https://..." \
--mouth-strength 1.2 --download
```
**Motion templates**:
- `normal` (default, moderate motion)
- `calm` (calm; news / storytelling)
- `active` (lively; singing / hosting)
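The audio limits listed above (wav/mp3, under 15MB, 1s to 3min) can be checked locally before uploading. This is a rough sketch, not the logic of `live_portrait.py`: `audio_problems` is a hypothetical helper, the caller is assumed to supply the duration (for example from `ffprobe`), and treating "15MB" as 15 MiB is an assumption.

```python
import os

MAX_AUDIO_BYTES = 15 * 1024 * 1024      # "< 15MB"; MiB interpretation is an assumption
MIN_SECONDS, MAX_SECONDS = 1.0, 180.0   # 1s to 3min
ALLOWED_EXTENSIONS = {".wav", ".mp3"}


def audio_problems(filename, size_bytes, duration_s):
    """Return a list of constraint violations (empty if the audio looks acceptable)."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {ext or '(none)'}")
    if size_bytes >= MAX_AUDIO_BYTES:
        problems.append("file is 15MB or larger")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append("duration outside 1s-3min")
    return problems
```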
---
## 2. Qwen TTS — text to speech
**When to use**: Generate speech files from text (for LivePortrait, EMO, etc.).
**Default model**: `qwen3-tts-vd-realtime-2026-01-15`
### Auto model selection by scene
| Scene `--scene` | Suggested model | Suggested voice |
|---------------|---------|---------|
| `default` / `brand` | `qwen3-tts-vd-realtime-2026-01-15` | Cherry |
| `news` / `documentary` / `advertising` | `qwen3-tts-instruct-flash-realtime` | Serena / Ethan |
| `audiobook` / `drama` | `qwen3-tts-instruct-flash-realtime` | Cherry / Dylan |
| `customer_service` / `chatbot` / `education` | `qwen3-tts-flash-realtime` | Anna / Ethan |
| `ecommerce` / `short_video` | `qwen3-tts-flash-realtime` | Cherry / Chelsie |
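The scene table amounts to a lookup from scene name to a model and a voice. The sketch below shows one way to express it; it is not the code of `qwen_tts.py`. `PRESETS` and `pick_model_and_voice` are hypothetical names, and both choosing the first listed voice and falling back to the `default` row for unknown scenes are assumptions.

```python
# Rows copied from the scene table: (scenes, suggested model, suggested voices).
_ROWS = [
    (("default", "brand"), "qwen3-tts-vd-realtime-2026-01-15", ("Cherry",)),
    (("news", "documentary", "advertising"), "qwen3-tts-instruct-flash-realtime", ("Serena", "Ethan")),
    (("audiobook", "drama"), "qwen3-tts-instruct-flash-realtime", ("Cherry", "Dylan")),
    (("customer_service", "chatbot", "education"), "qwen3-tts-flash-realtime", ("Anna", "Ethan")),
    (("ecommerce", "short_video"), "qwen3-tts-flash-realtime", ("Cherry", "Chelsie")),
]

# Flatten to one entry per scene name.
PRESETS = {scene: (model, voices) for scenes, model, voices in _ROWS for scene in scenes}


def pick_model_and_voice(scene="default"):
    """Return (model, voice) for a scene; unknown scenes use the default row."""
    model, voices = PRESETS.get(scene, PRESETS["default"])
    return model, voices[0]
```

An explicit `--model` or voice argument would simply override the value returned here.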
### Available voices
| Voice | Character |
|------|------|
| `Cherry` | Bright, sweet female; ads / audiobooks / dubbing |
| `Serena` | Mature, intellectual female; news / explainers / corporate |
| `Ethan` | Steady, warm male; education / documentary / training |
| `Dylan` | Expressive male; radio drama / game VO |
| `Anna` | Gentle, friendly female; support / assistant / daily |
| `Chelsie` | Young, fresh female; short video / e-commerce |
| `Thomas` | Deep, magnetic male; brand / ads |
| `Luna` | Warm, soft female; meditation / storytelling |
```bash
# Default (qwen3-tts-vd-realtime + Cherry)
python scripts/qwen_tts.py --text "Hello, welcome to Qwen TTS." --download
# Match by scene
python scripts/qwen_tts.py --text "Today's market..." --scene news --download
python scripts/qwen_tts.py --text "Once upon a time..." --scene audiobook --download
# Style via instructions
python scripts/qwen_tts.py \
--text "Dear students..." \
--model qwen3-tts-instruct-flash-realtime \
--instructions "Warm tone, steady pace, suitable for teaching" \
--download
# List options
python scripts/qwen_tts.py --list-voices
python scripts/qwen_tts.py --list-models
```
---
## 3. T2I — Wan 2.x text-to-image
**When to use**: Generate images from text (optionally feed into I2V).
```bash
# Default model (wan2.2-t2i-flash, fast)
python scripts/text_to_image.py \
--prompt "A woman in Hanfu in a peach blossom forest, cinematic, 4K, soft light" \
--size 960*1696 --download
# Higher quality
python scripts/text_to_image.py \
--prompt "..." --model wan2.2-t2i-plus --size 1280*1280 --download
# Latest (Wan 2.6)
python scripts/text_to_image.py \
--prompt "..." --model wan2.6-t2i --size 1280*1280 --n 1 --download
```
**Models**:
- `wan2.2-t2i-flash` (default, fast, good for tests)
- `wan2.2-t2i-plus` (higher quality)
- `wan2.6-t2i` (latest; more aspect ratios; sync call)
**Common sizes**: `1280*1280` (1:1) / `960*1696` (9:16) / `1696*960` (16:9)
---
## 4. I2V — Wan 2.x image-to-video
**When to use**: Turn an image into motion video; supports text-to-video via T2I first.
```bash
# Local image → video
python scripts/image_to_video.py \
--image ./portrait.jpg \
--prompt "She turns slowly and smiles; dress and petals drift gently" \
--model wan2.7-i2v \
--resolution 720P --duration 5 --download
# Pipeline: text → image → video
python scripts/image_to_video.py \
--t2i-prompt "A woman in Hanfu in a peach blossom forest" \
--prompt "She turns slowly; petals fall; poetic mood" \
--download --output result.mp4
# With background music
python scripts/image_to_video.py \
--image ./portrait.jpg \
--audio-url "https://..." \
--prompt "..." --download
```
**Models**:
- `wan2.7-i2v` (default; includes sound; 5s/10s)
- `wan2.5-i2v-preview` (high-quality preview)
- `wan2.2-i2v-plus` (no built-in audio; faster)
---
## 5. AA AnimateAnyone — full-body animation
**When to use**: Full-body photo + reference motion video → dance / motion video.
**Requirements**:
- Image: Single person, full body front, head to toe, aspect ratio 0.5–2.0
- Video: Full body in frame from first frame; mp4/avi/mov; fps ≥ 24; 2–60s
**Three steps**:
```
Step 1: animate-anyone-detect-gen2 (sync) → check_pass=true
↓
Step 2: animate-anyone-template-gen2 (async) → template_id (~3–5 min)
↓
Step 3: animate-anyone-gen2 (async) → video_url (~3–5 min)
```
```bash
# Local files (auto convert + OSS upload)
python scripts/animate_anyone.py \
--image ./portrait_fullbody.jpg \
--video ./dance.mp4 \
--download --output result.mp4
# Use image as background
python scripts/animate_anyone.py \
--image ./portrait.jpg --video ./dance.mp4 \
--use-ref-img-bg --video-ratio 9:16 --download
# Skip Step 2 (existing template_id)
python scripts/animate_anyone.py \
--image ./portrait.jpg \
--template-id "AACT.xxx.xxx" --download
```
> Auto conversion: video webm/mkv/flv → mp4; image webp/heic → jpg; if fps is under 24, normalize to 24 fps
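The auto-conversion rules can be read as a small decision function. The sketch below only decides which conversions would apply and performs none of them; `plan_conversion` is a hypothetical helper, and the frame rate is assumed to be supplied by the caller (for example from `ffprobe`).

```python
import os

VIDEO_CONVERT = {".webm", ".mkv", ".flv"}   # converted to mp4
IMAGE_CONVERT = {".webp", ".heic"}          # converted to jpg
MIN_FPS = 24


def plan_conversion(image_name, video_name, video_fps):
    """Return the conversion steps that the documented rules would trigger."""
    steps = []
    image_ext = os.path.splitext(image_name)[1].lower()
    video_ext = os.path.splitext(video_name)[1].lower()
    if image_ext in IMAGE_CONVERT:
        steps.append("image -> jpg")
    if video_ext in VIDEO_CONVERT:
        steps.append("video -> mp4")
    if video_fps < MIN_FPS:
        steps.append("video fps -> 24")
    return steps
```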
---
## 6. EMO — talking head (legacy)
**Note**: Prefer LivePortrait; EMO suits cases that need stricter lip-sync.
```bash
python scripts/portrait_animate.py \
--image ./portrait.jpg \
--audio ./speech.mp3 \
--download
```
---
## 7. LingMou — enterprise template video
**When to use**: Corporate digital-human news, template-based broadcasts, scripted reads with optional character images.
### New workflow (prefer no `template_id`)
- If the user **provides `template_id`**: use that template to generate.
- If **no `template_id`**:
1. List existing broadcast templates for the account.
2. If any exist, **pick one at random** for creation.
3. If none, fetch public templates and **copy up to 3** into the account.
4. Pick one at random from the copy results and continue.
- **Caveat**: After a public template is copied, the copy may not yet be a fully “ready-to-render” template; some copies are still drafts and may lack clips, assets, or variable bindings—complete them in LingMou.
- If the user only gives an image and “make a talking video” **without a script**: confirm the spoken copy before generating.
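The template fallback order above can be sketched as a single function. This is an illustration, not the code of `avatar_video.py`: `choose_template` is a hypothetical helper, and `list_templates` and `copy_public_templates` are caller-supplied stand-ins for the LingMou SDK calls, assumed to return lists of template IDs.

```python
import random


def choose_template(template_id, list_templates, copy_public_templates, rng=random):
    """Return the template ID to use, following the documented fallback order."""
    if template_id:
        return template_id                    # user-provided template wins
    existing = list_templates()
    if existing:
        return rng.choice(existing)           # random existing account template
    copied = copy_public_templates(limit=3)   # copy up to 3 public templates
    if not copied:
        raise RuntimeError("no account templates and no public templates could be copied")
    return rng.choice(copied)
```

A template returned from the copy branch may still be a draft, so a generation failure right after copying should be reported rather than retried silently.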
### What `scripts/avatar_video.py` supports
- `--list-templates`: list account templates
- `--list-public-templates`: list public templates (SDK 1.7.0+)
- `--copy-public-templates`: copy up to 3 public templates (SDK 1.7.0+)
- Omit `--template-id`: random existing template
- When local templates are empty: auto try public-template copy as fallback
- `--show-template-detail`: template detail and replaceable variables
- Fills input text into template text variables (prefers `text_content` / `test_text`)
- If generation fails right after copying a public template, surfaces a clear error that the template may still need completion (no silent failure)
```bash
# List templates
python scripts/avatar_video.py --list-templates
# Public templates (SDK 1.7.0+)
python scripts/avatar_video.py --list-public-templates
# Copy up to 3 public templates (SDK 1.7.0+)
python scripts/avatar_video.py --copy-public-templates
# No template_id — random existing template
python scripts/avatar_video.py \
--text "Hello, welcome to today's tech news." \
--download
# Specific template_id
python scripts/avatar_video.py \
--template-id "BS1b2WNnRMu4ouRzT4clY9Jhg" \
--text "Hello, welcome to today's tech news." \
--download
# Detail for randomly chosen template
python scripts/avatar_video.py \
--show-template-detail \
--text "This is a test script for broadcast."
```
### Conversational usage
When the user says things like:
- “Make a talking video from this image”
- “Digital-human broadcast for me”
- “Upload image and make a news read”
Do this:
1. Check whether they already gave copy/script ready to read.
2. If not, ask: **“What is the exact script to read? You can give bullet points and I can turn them into broadcast-ready copy.”**
3. With script in hand, run LingMou: prefer random existing template; if none locally, try public copy.
4. If they uploaded a portrait but the template API does not use it, explain: this path is template-driven; for image-driven talking head, use LivePortrait or EMO.
---
## API reference links
- **LivePortrait**: https://help.aliyun.com/zh/model-studio/liveportrait-api
- **EMO** (emo-detect + emo-v1): [references/emo-api.md](references/emo-api.md)
- **AA** (Animate Anyone): [references/aa-api.md](references/aa-api.md)
- **T2I** (text-to-image v2): https://help.aliyun.com/zh/model-studio/text-to-image-v2-api-reference
- **I2V** (image-to-video): https://help.aliyun.com/zh/model-studio/image-to-video-api-reference/
- **Qwen TTS**: https://help.aliyun.com/zh/model-studio/qwen-tts-realtime
- **LingMou**: [references/lingmou-api.md](references/lingmou-api.md)
- **OSS upload**: [references/oss-upload.md](references/oss-upload.md)
FILE:references/aa-api.md
# AA (AnimateAnyone Gen2) API reference
Official docs:
- Image detection: https://help.aliyun.com/zh/model-studio/animate-anyone-detect-api
- Motion template: https://help.aliyun.com/zh/model-studio/animate-anyone-template-api
- Video generation: https://help.aliyun.com/zh/model-studio/animateanyone-video-generation-api
## Important
- **Region is Beijing**: `dashscope.aliyuncs.com`; API key must be for Beijing.
- Three steps total: Step 1 sync; Steps 2/3 async (poll `task_id`).
---
## Three-step pipeline
```
Step 1: aa-detect (sync) → check_pass=true or error
Step 2: aa-template (async) → template_id
Step 3: aa-generate (async) → video_url
```
---
## Step 1: Image detection (sync)
```
POST https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/aa-detect
```
| Field | Description |
|------|------|
| `model` | `animate-anyone-detect-gen2` |
| `input.image_url` | Public HTTP/HTTPS URL; jpg/jpeg/png/bmp; under 5MB; max edge ≤4096 |
Response:
```json
{
"output": {
"check_pass": true,
"bodystyle": "full"
}
}
```
(`bodystyle`: `"full"` = full body, `"half"` = half body.)
**Image requirements (to pass)**:
- Single person, front or near-front, no strong profile
- Clear face, no occlusion
- Full body or at least waist-up visible
- Simple background preferred
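
A minimal sketch of how the Step 1 request could be assembled, assuming the endpoint, model name, and response fields shown above. The helper names `build_detect_request` and `detect_passed` are hypothetical. The returned dict can be handed to an HTTP client, for example `requests.post(req["url"], headers=req["headers"], json=req["json"], timeout=60)`.

```python
DETECT_URL = "https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/aa-detect"


def build_detect_request(image_url: str, api_key: str) -> dict:
    """Assemble URL, headers, and JSON body for the synchronous aa-detect call."""
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("image_url must be a public http/https URL")
    return {
        "url": DETECT_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": "animate-anyone-detect-gen2",
            "input": {"image_url": image_url},
        },
    }


def detect_passed(response_body: dict) -> bool:
    """True when the response reports output.check_pass = true."""
    return bool(response_body.get("output", {}).get("check_pass", False))
```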
---
## Step 2: Motion template (async)
```
POST https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/aa-template-generation/
Header: X-DashScope-Async: enable
```
| Field | Description |
|------|------|
| `model` | `animate-anyone-template-gen2` |
| `input.video_url` | Public URL; mp4/avi/mov; H.264/H.265; fps≥24; 2–60s; ≤200MB |
**Video requirements**:
- Full body in frame, single continuous shot, no hard cuts
- First frame facing camera
- Subject visible from first frame
Poll `GET /api/v1/tasks/{task_id}`:
```json
{
"output": {
"task_status": "SUCCEEDED",
"template_id": "AACT.xxx.xxx-xxx.xxx"
}
}
```
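
Polling can be sketched independently of the HTTP layer. In the sketch below, `poll_task` is a hypothetical helper and `fetch` stands for any callable that returns the parsed JSON of `GET /api/v1/tasks/{task_id}`. The terminal failure states (`FAILED`, `CANCELED`, `UNKNOWN`) mirror what `scripts/animate_anyone.py` checks. The defaults are illustrative.

```python
import time
from typing import Callable


def poll_task(fetch: Callable[[], dict], interval: float = 10.0,
              max_wait: float = 1800.0,
              sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until output.task_status is terminal; return the output dict."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        output = fetch().get("output", {})
        status = output.get("task_status", "UNKNOWN")
        if status == "SUCCEEDED":
            return output
        if status in ("FAILED", "CANCELED", "UNKNOWN"):
            raise RuntimeError(f"task ended with status {status}")
        sleep(interval)  # still PENDING or RUNNING
    raise TimeoutError("task did not finish in time")
```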
---
## Step 3: Video generation (async)
```
POST https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis/
Header: X-DashScope-Async: enable
```
| Field | Description |
|------|------|
| `model` | `animate-anyone-gen2` |
| `input.image_url` | Image URL that passed Step 1 |
| `input.template_id` | `template_id` from Step 2 |
| `parameters.use_ref_img_bg` | `false` (default, video background) / `true` (image background) |
| `parameters.video_ratio` | `"9:16"` or `"3:4"` (only when `use_ref_img_bg=true`) |
Poll response:
```json
{
"output": {
"task_status": "SUCCEEDED",
"video_url": "https://xxx/output.mp4"
}
}
```
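
A hedged sketch of building the Step 3 request body from the parameter table above. `build_generate_payload` is a hypothetical helper. Leaving out `video_ratio` when `use_ref_img_bg` is false is a choice made in this sketch; the bundled script sends both parameters unconditionally.

```python
def build_generate_payload(image_url: str, template_id: str,
                           use_ref_img_bg: bool = False,
                           video_ratio: str = "9:16") -> dict:
    """Build the JSON body for the animate-anyone-gen2 video-synthesis call."""
    if video_ratio not in ("9:16", "3:4"):
        raise ValueError("video_ratio must be '9:16' or '3:4'")
    parameters = {"use_ref_img_bg": use_ref_img_bg}
    if use_ref_img_bg:
        # video_ratio only takes effect with the image background.
        parameters["video_ratio"] = video_ratio
    return {
        "model": "animate-anyone-gen2",
        "input": {"image_url": image_url, "template_id": template_id},
        "parameters": parameters,
    }
```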
> ⚠️ `video_url` is valid for **24 hours** after success—download promptly.
---
## Format conversion (ffmpeg)
| Input | Formats converted | Target | Command |
|---------|----------|---------|---------|
| Image | webp, heic, tif, bmp | jpg | `ffmpeg -i input.webp -q:v 2 output.jpg` |
| Video | webm, mkv, flv, wmv | mp4 (H.264) | `ffmpeg -i input.webm -c:v libx264 -crf 22 -c:a aac output.mp4` |
| Video fps under 24 | — | 24 fps | `ffmpeg -i input.mp4 -vf fps=24 -c:v libx264 output.mp4` |
`animate_anyone.py` performs this automatically.
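
As a rough illustration of the table, the conversion commands can be chosen from the file extension and the probed frame rate. `image_convert_cmd` and `video_convert_cmd` are hypothetical helpers that only build argument lists and do not run ffmpeg. The frame rate is assumed to be known already (the script obtains it with `ffprobe`).

```python
from pathlib import Path
from typing import List, Optional

IMAGE_OK = {".jpg", ".jpeg", ".png"}  # extensions passed through unchanged


def image_convert_cmd(src: str, dst: str) -> Optional[List[str]]:
    """Return an ffmpeg argv converting src to jpg, or None if no conversion is needed."""
    if Path(src).suffix.lower() in IMAGE_OK:
        return None
    return ["ffmpeg", "-y", "-i", src, "-q:v", "2", dst]


def video_convert_cmd(src: str, dst: str, fps: float) -> List[str]:
    """Return an ffmpeg argv re-encoding src to H.264 mp4, raising fps to 24 if lower."""
    cmd = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", "22", "-c:a", "aac"]
    if fps < 24:
        cmd += ["-vf", "fps=24"]
    cmd.append(dst)
    return cmd
```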
---
## Full examples
```bash
# Local files (convert + OSS)
python scripts/animate_anyone.py \
--image ./portrait.jpg \
--video ./dance.webm \
--download --output result.mp4
# Existing URLs
python scripts/animate_anyone.py \
--image-url "https://oss.../portrait.jpg?..." \
--video-url "https://oss.../dance.mp4?..." \
--download
# Existing template_id (skip Step 2)
python scripts/animate_anyone.py \
--image ./portrait.jpg \
--template-id "AACT.xxx.xxx" \
--download
# Image as background
python scripts/animate_anyone.py \
--image ./portrait.jpg --video ./dance.mp4 \
--use-ref-img-bg --video-ratio 9:16 --download
```
FILE:references/emo-api.md
# EMO API (DashScope)
Official docs:
- EMO detect: `https://help.aliyun.com/zh/model-studio/emo-detect-api`
- EMO generate: `https://help.aliyun.com/zh/model-studio/emo-api`
## Auth and region
- Auth: `Authorization: Bearer $DASHSCOPE_API_KEY`
- Region: **Beijing** (`dashscope.aliyuncs.com`)
- Set header: `X-DashScope-Async: enable`
## 1) Portrait detection (required first)
`POST https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/face-detect`
Body:
```json
{
"model": "emo-detect-v1",
"input": {"image_url": "https://.../portrait.png"},
"parameters": {"ratio": "1:1"}
}
```
Key success fields:
- `output.check_pass`
- `output.face_bbox`
- `output.ext_bbox`
## 2) Submit video generation
`POST https://dashscope.aliyuncs.com/api/v1/services/aigc/image2video/video-synthesis`
Body:
```json
{
"model": "emo-v1",
"input": {
"image_url": "https://.../portrait.png",
"audio_url": "https://.../speech.mp3",
"face_bbox": [302,286,610,593],
"ext_bbox": [71,9,840,778]
},
"parameters": {"style_level": "normal"}
}
```
Returns: `output.task_id`
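
The hand-off from detection to generation can be sketched as below, assuming the detect response exposes `check_pass`, `face_bbox`, and `ext_bbox` under `output` as listed above. `build_emo_payload` is a hypothetical helper name.

```python
def build_emo_payload(detect_output: dict, image_url: str, audio_url: str,
                      style_level: str = "normal") -> dict:
    """Build the emo-v1 request body from the `output` of emo-detect-v1."""
    if not detect_output.get("check_pass"):
        raise ValueError("portrait did not pass emo-detect-v1")
    return {
        "model": "emo-v1",
        "input": {
            "image_url": image_url,
            "audio_url": audio_url,
            # Both boxes are copied verbatim from the detection result.
            "face_bbox": detect_output["face_bbox"],
            "ext_bbox": detect_output["ext_bbox"],
        },
        "parameters": {"style_level": style_level},
    }
```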
## 3) Poll task
`GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}`
States: `PENDING` → `RUNNING` → `SUCCEEDED` / `FAILED`
On success:
- `output.results.video_url`
## Limits
- Image: min edge ≥ 400, max edge ≤ 7000
- Audio: wav/mp3, ≤ 15MB, ≤ 60 seconds
- URLs must be public http/https
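
A pre-flight check for these limits might look like the sketch below. `check_emo_inputs` is a hypothetical helper, the image edges are assumed to be in pixels, and "15MB" is interpreted as 15 × 1024 × 1024 bytes, which is an assumption.

```python
from pathlib import Path
from typing import List


def check_emo_inputs(width: int, height: int, audio_name: str,
                     audio_bytes: int, audio_seconds: float) -> List[str]:
    """Return a list of limit violations; an empty list means the inputs look acceptable."""
    problems = []
    if min(width, height) < 400:
        problems.append("image min edge below 400")
    if max(width, height) > 7000:
        problems.append("image max edge above 7000")
    if Path(audio_name).suffix.lower() not in (".wav", ".mp3"):
        problems.append("audio must be wav or mp3")
    if audio_bytes > 15 * 1024 * 1024:  # assumed meaning of "15MB"
        problems.append("audio larger than 15MB")
    if audio_seconds > 60:
        problems.append("audio longer than 60 seconds")
    return problems
```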
FILE:references/lingmou-api.md
# LingMou API
Official docs:
- Create broadcast video from template: `https://help.aliyun.com/zh/avatar/avatar-application/developer-reference/api-lingmou-2025-05-27-createbroadcastvideofromtemplate`
- Batch query broadcast videos: `https://help.aliyun.com/zh/avatar/avatar-application/developer-reference/api-lingmou-2025-05-27-listbroadcastvideosbyid`
- Get broadcast template: `https://api.aliyun.com/api/LingMou/2025-05-27/GetBroadcastTemplate`
- List broadcast templates: `https://api.aliyun.com/api/LingMou/2025-05-27/ListBroadcastTemplates`
## Auth and region
- Auth: Alibaba Cloud AK/SK (OpenAPI signature)
- Region: `cn-beijing`
- Endpoint: `lingmou.cn-beijing.aliyuncs.com`
- API version: `2025-05-27`
## Verified flow (SDK 1.6.0 in test env)
1. Call `ListBroadcastTemplates` for existing templates.
2. If user did not specify `templateId`, pick one at random.
3. Call `GetBroadcastTemplate` for details and `variables`.
4. Choose a replaceable text variable (prefer `text_content`).
5. Call `CreateBroadcastVideoFromTemplate`.
6. Poll `ListBroadcastVideosById` with returned `id`.
7. When `status=SUCCESS`, read `videoURL`.
## New capabilities (verified with SDK 1.7.0 in venv)
Goals:
1. List public broadcast templates
2. Copy a public template into your account
3. Create video from the copied template
Intended workflow:
- Call `ListBroadcastTemplates` first
- If templates exist: pick one at random
- If none: list public templates, copy up to 3, then pick one at random to create video
**Observed behavior**:
- `alibabacloud-lingmou20250527==1.7.0` adds:
  - `list_public_broadcast_scene_templates`
  - `copy_broadcast_scene_from_template`
- Both were called successfully in tests
- **Creating video immediately after copy is not guaranteed**; errors such as `100010031001-400` (“no valid segments”) can occur
- The new `BS...` id may list variables but still lack a complete renderable scene
- **Production strategy**:
  - **Prefer random selection among known-good account templates**
  - **Only when local templates are empty**, use public copy as fallback
  - If generation still fails, tell the user the template may need completion in LingMou
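
The production strategy can be sketched without the SDK by injecting the copy step. `pick_template` is a hypothetical helper, and `copy_public` stands for any callable that copies public templates and returns the new template ids. The second element of the result tells the caller whether the template came from a fresh public copy, in which case a generation failure should be reported as "template may need completion".

```python
import random
from typing import Callable, List, Optional, Tuple


def pick_template(local_ids: List[str],
                  copy_public: Callable[[], List[str]],
                  seed: Optional[int] = None) -> Tuple[str, bool]:
    """Return (template_id, came_from_public_copy)."""
    rng = random.Random(seed)
    if local_ids:
        # Preferred: a template already in the account.
        return rng.choice(local_ids), False
    copied = copy_public()  # fallback, only reached when the account is empty
    if copied:
        return rng.choice(copied), True
    raise RuntimeError("no usable broadcast template")
```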
## Python SDK fields
### Version differences
- The system Python had SDK `1.6.0`, which lacks the public-template APIs
- For public-template tests, use the venv's Python (with SDK `1.7.0`), e.g.:
```bash
.venv-human-avatar/bin/python scripts/avatar_video.py --list-public-templates
```
### ListBroadcastTemplates
Response includes:
- `data[].id`
- `data[].name`
- `data[].variables` (often empty or minimal in list response)
Example (test account):
```json
[
{"id": "BS1vs5wAhH7OvW7btG1M6VxEQ", "name": "boy-01"},
{"id": "BS1V_mn-IwR6uZTgxuiKoWdPw", "name": "girl-01"},
{"id": "BS1JqkX1Dm4VGjseLKkPkpmiw", "name": "boy-02"},
{"id": "BS1bR7OvVfFY2OkNEy591084A", "name": "girl-02"}
]
```
### GetBroadcastTemplate
Python SDK uses `template_id`, not `id`:
```python
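from alibabacloud_lingmou20250527 import models as lm

# `client` is an alibabacloud_lingmou20250527.client.Client instance
req = lm.GetBroadcastTemplateRequest()
req.template_id = "BS1vs5wAhH7OvW7btG1M6VxEQ"
resp = client.get_broadcast_template(req)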
```
Example response:
```json
{
"id": "BS1vs5wAhH7OvW7btG1M6VxEQ",
"name": "boy-01",
"variables": [
{
"name": "text_content",
"type": "text"
}
]
}
```
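
Choosing the variable to fill can be sketched on the plain JSON shape shown above. `pick_text_variable` is a hypothetical helper, and the preference order mirrors the one used in `scripts/avatar_video.py`.

```python
from typing import List, Optional

PREFERRED = ("text_content", "test_text", "content", "text")


def pick_text_variable(variables: List[dict]) -> Optional[str]:
    """Return the name of the text variable to fill, or None if the template has none."""
    text_names = [v["name"] for v in variables if v.get("type") == "text"]
    for name in PREFERRED:
        if name in text_names:
            return name
    # No preferred name present: fall back to the first text variable, if any.
    return text_names[0] if text_names else None
```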
## CreateBroadcastVideoFromTemplate (key payload)
```json
{
"templateId": "BS1b2WNnRMu4ouRzT4clY9Jhg",
"name": "Broadcast video test",
"variables": [
{
"name": "text_content",
"type": "text",
"properties": {"content": "Script to read aloud"}
}
],
"videoOptions": {
"resolution": "720p",
"fps": 30,
"watermark": true
}
}
```
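
For reference, a small sketch that produces a payload of this shape. `build_create_payload` is a hypothetical helper and its defaults are taken from the example above, not from any documented API defaults.

```python
def build_create_payload(template_id: str, script: str,
                         variable_name: str = "text_content",
                         name: str = "Broadcast video test",
                         resolution: str = "720p", fps: int = 30,
                         watermark: bool = True) -> dict:
    """Build a CreateBroadcastVideoFromTemplate body with one text variable filled in."""
    if not script.strip():
        raise ValueError("script text must not be empty")
    return {
        "templateId": template_id,
        "name": name,
        "variables": [
            {
                "name": variable_name,
                "type": "text",
                "properties": {"content": script},
            }
        ],
        "videoOptions": {"resolution": resolution, "fps": fps, "watermark": watermark},
    }
```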
## ListBroadcastVideosById (key fields)
- `data[].status`: `SUCCESS` / `ERROR` / `PROCESSING` / …
- `data[].videoURL`
- `data[].captionURL`
## Variable types
- `text` — text
- `image` — image asset
- `avatar` — digital-human asset
- `voice` — voice asset
## Integration notes
- Do not require user-supplied `template_id` in chat
- If omitted, list templates and pick at random
- If no script was given, confirm script before generating
- If the user insists on “must use my uploaded photo for talking head”, steer to `LivePortrait` or `EMO`; template broadcast is a different path from image-driven lip-sync
FILE:references/oss-upload.md
# OSS upload (for EMO / AA)
Local files must be uploaded to OSS first; pass the public URL to DashScope.
## Environment variables
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=xxx
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=xxx
export OSS_BUCKET=your-bucket
export OSS_ENDPOINT=oss-cn-beijing.aliyuncs.com
```
## Python examples
### Public bucket (direct public URL)
```python
import os
import oss2


def upload_to_oss(local_path: str, oss_key: str) -> str:
    auth = oss2.Auth(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    )
    bucket_name = os.environ["OSS_BUCKET"]
    endpoint = os.environ.get("OSS_ENDPOINT", "oss-cn-beijing.aliyuncs.com")
    bucket = oss2.Bucket(auth, f"https://{endpoint}", bucket_name)
    bucket.put_object_from_file(oss_key, local_path)
    return f"https://{bucket_name}.{endpoint}/{oss_key}"
```
### Private bucket (signed URL, default 3-day expiry)
```python
import os
import oss2

DEFAULT_EXPIRES = 3 * 24 * 3600  # 3 days, seconds


def upload_to_oss(local_path: str, oss_key: str, expires: int = DEFAULT_EXPIRES) -> str:
    """
    Upload a file to a private OSS bucket and return a signed temporary URL.

    Args:
        local_path: Local file path
        oss_key: OSS object key (e.g. "avatars/face.jpg")
        expires: Signed URL lifetime in seconds (default 3 days)

    Returns:
        Signed publicly reachable URL
    """
    auth = oss2.Auth(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    )
    bucket_name = os.environ["OSS_BUCKET"]
    endpoint = os.environ.get("OSS_ENDPOINT", "oss-cn-beijing.aliyuncs.com")
    bucket = oss2.Bucket(auth, f"https://{endpoint}", bucket_name)
    bucket.put_object_from_file(oss_key, local_path)
    signed_url = bucket.sign_url("GET", oss_key, expires)
    return signed_url
```
#### Usage
```python
url = upload_to_oss("./face.jpg", "avatars/face.jpg")
# → https://your-bucket.oss-cn-beijing.aliyuncs.com/avatars/face.jpg?OSSAccessKeyId=...&Expires=...&Signature=...
# Custom expiry (7 days)
url = upload_to_oss("./speech.mp3", "audio/speech.mp3", expires=7 * 24 * 3600)
```
## Notes
- The URL must be reachable on the public internet (http/https); DashScope must download it directly.
- Signed URL lifetime should cover task duration (EMO jobs often 2–10 minutes; 3 days is plenty).
- For private buckets use a signed URL; **do not** pass intranet URLs or `oss://` to DashScope.
- Consider OSS lifecycle rules to purge temporary assets (e.g. delete after 7 days).
- `bucket.sign_url()` returns a `str` usable as `image_url` / `audio_url`.
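
A quick sanity check before handing a URL to DashScope could look like this sketch. `is_dashscope_fetchable` is a hypothetical helper. Treating host names that contain `-internal.` as intranet OSS endpoints is a heuristic assumed here, and the function cannot prove that a URL is actually reachable.

```python
from urllib.parse import urlparse


def is_dashscope_fetchable(url: str) -> bool:
    """Heuristic: accept only http/https URLs that do not point at an OSS intranet endpoint."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False  # rejects oss:// and malformed URLs
    # Assumption: OSS intranet endpoints look like oss-<region>-internal.aliyuncs.com
    return "-internal." not in parsed.hostname
```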
FILE:references/ram-policies.md
# RAM permission policies
This skill needs the following RAM permissions for Alibaba Cloud services. Prefer a least-privilege RAM role or user for the application.
## Permission overview
| Service | Permission | Purpose | Scripts |
|------|------|------|---------|
| **DashScope API** | `AliyunDashScopeFullAccess` or custom policy | Call AI APIs (video, TTS, image) | All scripts |
| **OSS** | `AliyunOSSFullAccess` or bucket-scoped | Upload media and issue signed URLs | live_portrait.py, animate_anyone.py, image_to_video.py, portrait_animate.py |
| **LingMou** | `AliyunLingMouFullAccess` or custom policy | Digital-human template video | avatar_video.py |
## Details
### 1. DashScope API
**Service**: DashScope (Model Studio)
**Regions**: cn-beijing (Beijing) / ap-southeast-1 (Singapore)
**Example policy**:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dashscope:CallModel",
"dashscope:GetTask",
"dashscope:ListTasks"
],
"Resource": "*"
}
]
}
```
**Model APIs used**:
- `liveportrait` — LivePortrait talking head
- `liveportrait-detect` — portrait detection
- `emo-v1` — EMO talking head
- `emo-detect-v1` — EMO detection
- `animate-anyone-gen2` — AnimateAnyone video
- `animate-anyone-detect-gen2` — AA image detection
- `animate-anyone-template-gen2` — AA motion template
- `wan2.x-t2i` — Wan text-to-image
- `wan2.x-i2v` — Wan image-to-video
- `qwen3-tts-*` — Qwen real-time TTS
**Env**: `DASHSCOPE_API_KEY`
### 2. OSS
**Service**: Object Storage Service
**Region**: cn-beijing or others
**Example policy**:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"oss:PutObject",
"oss:GetObject",
"oss:ListObjects"
],
"Resource": [
"acs:oss:*:*:your-bucket-name/human-avatar/*"
]
}
]
}
```
**Usage**:
- Upload local images, audio, video to OSS
- Generate signed URLs for DashScope
- Example prefix: `human-avatar/`
**Env**:
- `ALIBABA_CLOUD_ACCESS_KEY_ID`
- `ALIBABA_CLOUD_ACCESS_KEY_SECRET`
- `OSS_BUCKET`
- `OSS_ENDPOINT` (e.g. `oss-cn-beijing.aliyuncs.com`)
### 3. LingMou
**Service**: LingMou digital human
**Region**: cn-beijing
**Example policy**:
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"lingmou:ListBroadcastTemplates",
"lingmou:GetBroadcastTemplate",
"lingmou:CreateBroadcastVideoFromTemplate",
"lingmou:ListBroadcastVideosById",
"lingmou:ListPublicBroadcastSceneTemplates",
"lingmou:CopyBroadcastSceneFromTemplate"
],
"Resource": "*"
}
]
}
```
**Usage**:
- List and inspect broadcast templates
- Create digital-human videos from templates
- Poll video job status
- Copy public templates
**Env**:
- `ALIBABA_CLOUD_ACCESS_KEY_ID`
- `ALIBABA_CLOUD_ACCESS_KEY_SECRET`
- `LINGMOU_ENDPOINT` — optional, default `lingmou.cn-beijing.aliyuncs.com`
- `LINGMOU_REGION` — optional, default `cn-beijing`
## Minimal combined policy example
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dashscope:CallModel",
"dashscope:GetTask",
"dashscope:ListTasks"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"oss:PutObject",
"oss:GetObject"
],
"Resource": "acs:oss:*:*:your-bucket-name/human-avatar/*"
},
{
"Effect": "Allow",
"Action": [
"lingmou:ListBroadcastTemplates",
"lingmou:GetBroadcastTemplate",
"lingmou:CreateBroadcastVideoFromTemplate",
"lingmou:ListBroadcastVideosById",
"lingmou:ListPublicBroadcastSceneTemplates",
"lingmou:CopyBroadcastSceneFromTemplate"
],
"Resource": "*"
}
]
}
```
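
To avoid hand-editing the bucket name, the OSS statement can be generated. The sketch below is illustrative only: `oss_prefix_policy` is a hypothetical helper that reproduces the bucket-scoped statement from the combined policy above for a given bucket and prefix.

```python
import json


def oss_prefix_policy(bucket: str, prefix: str = "human-avatar/") -> str:
    """Return a RAM policy document (JSON text) limited to one bucket prefix."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a plain bucket name")
    statement = {
        "Effect": "Allow",
        "Action": ["oss:PutObject", "oss:GetObject"],
        "Resource": [f"acs:oss:*:*:{bucket}/{prefix}*"],
    }
    return json.dumps({"Version": "1", "Statement": [statement]}, indent=2)
```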
> **Security**:
> - Replace `your-bucket-name` with your real bucket name
> - Rotate AccessKeys regularly
> - Prefer RAM roles over root account keys
> - Grant only actions you need
## Checklist before deploy
- [ ] DashScope API key created and set in env
- [ ] OSS bucket exists and the key can write
- [ ] (Optional) LingMou enabled and permissions set
- [ ] All required env vars set
- [ ] Using RAM user/role, not root account
## Related docs
- [RAM overview](https://help.aliyun.com/zh/ram/product-overview/what-is-ram)
- [DashScope RAM](https://help.aliyun.com/zh/model-studio/ram-permission)
- [OSS authorization](https://help.aliyun.com/zh/oss/user-guide/authorization)
- [LingMou permissions](https://help.aliyun.com/zh/model-studio/lingmou)
FILE:scripts/animate_anyone.py
#!/usr/bin/env python3
"""
AnimateAnyone Gen2: three-step pipeline

SECURITY NOTES:
- subprocess: used ONLY to invoke system ffmpeg for video/image format
  conversion (e.g. webm→mp4, heic→jpg). No shell=True, no eval/exec.
- OSS credentials: read from environment variables, used ONLY to upload
  user media to their own OSS bucket. Never transmitted to third parties.
- All API calls target dashscope.aliyuncs.com (Alibaba Cloud official).

Step 1: animate-anyone-detect-gen2    image detection (sync)
Step 2: animate-anyone-template-gen2  motion template generation (async, yields template_id)
Step 3: animate-anyone-gen2           video generation (async, yields video_url)

Multiple input formats are supported and converted automatically via ffmpeg:
  Images: webp/heic/tif/bmp → jpg
  Videos: webm/avi/mov/mkv/flv → mp4 (H.264, ≥24fps)

Usage:
  python animate_anyone.py --image ./face.jpg --video ./dance.webm --download
  python animate_anyone.py --image-url https://... --video-url https://... --download
  python animate_anyone.py --image ./face.jpg --template-id AACT.xxx --download  # skip template generation
"""
import argparse
import json
import os
import re
import shutil
import subprocess
import sys
import time
import urllib.request
from pathlib import Path
import requests
from input_validation import (
    mk_temp_path_for_ffmpeg,
    resolve_under_cwd,
    validate_http_https_url,
)
BASE_URL = os.getenv("DASHSCOPE_BASE_URL", "https://dashscope.aliyuncs.com")
_OSS_SIGNED_URL_EXPIRES = int(os.environ.get("OSS_SIGNED_URL_EXPIRES", str(3 * 24 * 3600)))
# ── ffmpeg helpers ─────────────────────────────────────────────────────────────
def _find_ffmpeg() -> str:
    """Find ffmpeg in PATH or common install locations."""
    for p in ["ffmpeg", "/usr/bin/ffmpeg", "/usr/local/bin/ffmpeg"]:
        if shutil.which(p):
            return p
    raise RuntimeError("ffmpeg not found. Install: apt install ffmpeg / brew install ffmpeg")


def convert_image(src: str) -> str:
    """
    Convert image to jpg if not already jpg/jpeg/png.
    Returns path to (possibly new) file. Temp files are tracked for cleanup.
    """
    p = Path(src)
    if p.suffix.lower() in (".jpg", ".jpeg", ".png"):
        return src
    ff = _find_ffmpeg()
    dst = mk_temp_path_for_ffmpeg(".jpg", "aa_img_")
    subprocess.run([ff, "-y", "-i", src, "-q:v", "2", dst],
                   check=True, capture_output=True)
    print(f"[convert] {p.name} → {Path(dst).name}")
    return dst
def convert_video(src: str) -> str:
    """
    Convert video to mp4 (H.264) with ≥24fps if not already compatible.
    Returns path to (possibly new) file.
    Requirements: mp4/avi/mov, H.264 or H.265, fps≥24, bitrate reasonable.
    """
    p = Path(src)
    # Probe codec and frame rate of the first video stream.
    probe = subprocess.run(
        ["ffprobe", "-v", "quiet", "-select_streams", "v:0",
         "-show_entries", "stream=codec_name,r_frame_rate,width,height",
         "-of", "json", src],
        capture_output=True, text=True
    )
    # Defaults used when the probe output cannot be parsed.
    codec, fps_num, fps_den = "unknown", 25, 1
    try:
        info = json.loads(probe.stdout)
        stream = info.get("streams", [{}])[0]
        codec = stream.get("codec_name", "unknown")
        fr = stream.get("r_frame_rate", "25/1").split("/")
        fps_num, fps_den = int(fr[0]), max(int(fr[1]), 1)
    except Exception:
        pass
    fps = fps_num / fps_den
    need_convert = (
        p.suffix.lower() not in (".mp4", ".avi", ".mov")
        or codec not in ("h264", "hevc", "h265")
        or fps < 24
    )
    if not need_convert:
        return src
    ff = _find_ffmpeg()
    dst = mk_temp_path_for_ffmpeg(".mp4", "aa_vid_")
    # Re-encode to H.264; raise the frame rate to 24 fps only when the source is below it.
    cmd = [ff, "-y", "-i", src, "-c:v", "libx264", "-preset", "fast",
           "-crf", "22", "-c:a", "aac", "-movflags", "+faststart"]
    if fps < 24:
        cmd += ["-vf", "fps=24"]
    cmd.append(dst)
    subprocess.run(cmd, check=True, capture_output=True)
    print(f"[convert] {p.name} → {Path(dst).name} codec={codec} fps={fps:.1f}→converted")
    return dst
# ── OSS upload ─────────────────────────────────────────────────────────────────
def upload_to_oss(local_path: str, expires: int = _OSS_SIGNED_URL_EXPIRES) -> str:
    """Upload file to OSS and return signed GET URL (default 3 days)."""
    import oss2
    auth = oss2.Auth(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    )
    bucket_name = os.environ["OSS_BUCKET"]
    endpoint = os.environ.get("OSS_ENDPOINT", "oss-cn-beijing.aliyuncs.com")
    endpoint = endpoint.replace("https://", "").replace("http://", "").rstrip("/")
    bucket = oss2.Bucket(auth, f"https://{endpoint}", bucket_name)
    key = f"human-avatar/{Path(local_path).name}"
    print(f"[oss] uploading {Path(local_path).name} …")
    bucket.put_object_from_file(key, local_path)
    url = bucket.sign_url("GET", key, expires)
    print(f"[oss] signed_url ok ({expires//3600}h)")
    return url
# ── DashScope helpers ──────────────────────────────────────────────────────────
USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-avatar-video"
def _headers(async_mode: bool = False) -> dict:
    key = os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY not set")
    h = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
        "User-Agent": USER_AGENT,
    }
    if async_mode:
        h["X-DashScope-Async"] = "enable"
    return h


def wait_task(task_id: str, interval: int = 10, max_wait: int = 1800) -> dict:
    """Poll until task SUCCEEDED/FAILED, return full output dict."""
    url = f"{BASE_URL}/api/v1/tasks/{task_id}"
    start = time.time()
    while time.time() - start < max_wait:
        r = requests.get(url, headers=_headers(), timeout=60)
        r.raise_for_status()
        data = r.json()
        out = data.get("output", {})
        status = out.get("task_status", "UNKNOWN")
        elapsed = int(time.time() - start)
        print(f" [{elapsed}s] task_id={task_id[:16]}… status={status}")
        if status == "SUCCEEDED":
            return out
        if status in ("FAILED", "CANCELED", "UNKNOWN"):
            raise RuntimeError(f"Task failed: {json.dumps(data, ensure_ascii=False)}")
        time.sleep(interval)
    raise TimeoutError(f"Task {task_id} timed out after {max_wait}s")
# ── Step 1: Image detect ───────────────────────────────────────────────────────
def aa_detect(image_url: str) -> dict:
    """
    POST /aa-detect: check whether the image meets AA requirements (sync).
    Returns output dict with check_pass, bodystyle.
    """
    print(f"\n[step1] aa-detect …")
    r = requests.post(
        f"{BASE_URL}/api/v1/services/aigc/image2video/aa-detect",
        headers=_headers(async_mode=False),
        json={"model": "animate-anyone-detect-gen2", "input": {"image_url": image_url}},
        timeout=60,
    )
    r.raise_for_status()
    data = r.json()
    out = data.get("output", {})
    check_pass = out.get("check_pass", False)
    bodystyle = out.get("bodystyle", "")
    reason = out.get("reason", "")
    if check_pass:
        print(f" ✅ detect passed bodystyle={bodystyle}")
    else:
        print(f" ❌ detect FAILED reason={reason}")
        raise ValueError(f"Image failed AA detect: {reason}")
    return out
# ── Step 2: Template generation ────────────────────────────────────────────────
def aa_template(video_url: str) -> str:
    """
    POST /aa-template-generation/: extract a motion template from the video (async).
    Returns template_id str.
    """
    print(f"\n[step2] aa-template-generation …")
    r = requests.post(
        f"{BASE_URL}/api/v1/services/aigc/image2video/aa-template-generation/",
        headers=_headers(async_mode=True),
        json={"model": "animate-anyone-template-gen2", "input": {"video_url": video_url}},
        timeout=60,
    )
    r.raise_for_status()
    data = r.json()
    task_id = (data.get("output") or {}).get("task_id")
    if not task_id:
        raise RuntimeError(f"No task_id in template response: {json.dumps(data, ensure_ascii=False)}")
    print(f" task_id={task_id}")
    out = wait_task(task_id, interval=10)
    template_id = out.get("template_id")
    if not template_id:
        raise RuntimeError(f"No template_id in result: {out}")
    duration = (out.get("usage") or {}).get("video_duration", "?")
    print(f" ✅ template_id={template_id} duration={duration}s")
    return template_id
# ── Step 3: Video generation ────────────────────────────────────────────────────
def aa_generate(image_url: str, template_id: str,
                use_ref_img_bg: bool = False, video_ratio: str = "9:16") -> str:
    """
    POST /video-synthesis/: generate a video from the image and motion template (async).
    Returns video_url str.
    """
    print(f"\n[step3] aa-generate use_ref_img_bg={use_ref_img_bg} video_ratio={video_ratio} …")
    payload = {
        "model": "animate-anyone-gen2",
        "input": {"image_url": image_url, "template_id": template_id},
        "parameters": {"use_ref_img_bg": use_ref_img_bg, "video_ratio": video_ratio},
    }
    r = requests.post(
        f"{BASE_URL}/api/v1/services/aigc/image2video/video-synthesis/",
        headers=_headers(async_mode=True),
        json=payload,
        timeout=60,
    )
    r.raise_for_status()
    data = r.json()
    task_id = (data.get("output") or {}).get("task_id")
    if not task_id:
        raise RuntimeError(f"No task_id in generate response: {json.dumps(data, ensure_ascii=False)}")
    print(f" task_id={task_id}")
    out = wait_task(task_id, interval=15, max_wait=1800)
    video_url = out.get("video_url")
    if not video_url:
        raise RuntimeError(f"No video_url in result: {out}")
    print(f" ✅ video_url={video_url}")
    return video_url
# ── Main ───────────────────────────────────────────────────────────────────────
def main():
    p = argparse.ArgumentParser(
        description="AnimateAnyone Gen2: image+video → animated video (3-step pipeline)"
    )
    p.add_argument("--image-url", help="Public URL of the image (e.g. already uploaded to OSS)")
    p.add_argument("--video-url", help="Public URL of the motion video (e.g. already uploaded to OSS)")
    p.add_argument("--image", help="Local image path (format converted and uploaded to OSS automatically)")
    p.add_argument("--video", help="Local motion video path (format converted and uploaded to OSS automatically)")
    p.add_argument("--template-id", help="Existing template_id; skips Step 2")
    p.add_argument("--use-ref-img-bg", action="store_true",
                   help="Use the input image as the background (default: video background)")
    p.add_argument("--video-ratio", default="9:16", choices=["9:16", "3:4"],
                   help="Video aspect ratio (effective when use_ref_img_bg=true)")
    p.add_argument("--download", action="store_true", help="Download the output video")
    p.add_argument("--output", default="aa_output.mp4", help="Output file name")
    p.add_argument("--skip-detect", action="store_true", help="Skip the image detection step")
    args = p.parse_args()
    _AA_TEMPLATE_ID_RE = re.compile(r"^[A-Za-z0-9._-]{1,256}$")
    tmp_files = []
    try:
        # ── prepare image URL ──────────────────────────────────────────────
        image_url = args.image_url
        if image_url:
            image_url = validate_http_https_url(image_url, field="--image-url")
        if not image_url:
            if not args.image:
                p.error("either --image or --image-url is required")
            converted = convert_image(args.image)
            if converted != args.image:
                tmp_files.append(converted)
            image_url = upload_to_oss(converted)
        # ── prepare video URL ──────────────────────────────────────────────
        video_url = args.video_url
        if video_url:
            video_url = validate_http_https_url(video_url, field="--video-url")
        if args.template_id and not _AA_TEMPLATE_ID_RE.fullmatch(args.template_id.strip()):
            raise ValueError(f"Invalid --template-id format: {args.template_id!r}")
        if not video_url and not args.template_id:
            if not args.video:
                p.error("one of --video, --video-url, or --template-id is required")
            converted = convert_video(args.video)
            if converted != args.video:
                tmp_files.append(converted)
            video_url = upload_to_oss(converted)
        # ── Step 1: detect ─────────────────────────────────────────────────
        if not args.skip_detect:
            aa_detect(image_url)
        # ── Step 2: template ───────────────────────────────────────────────
        template_id = args.template_id
        if not template_id:
            template_id = aa_template(video_url)
        # ── Step 3: generate ───────────────────────────────────────────────
        final_url = aa_generate(
            image_url, template_id,
            use_ref_img_bg=args.use_ref_img_bg,
            video_ratio=args.video_ratio,
        )
        print(f"\n✅ Done! video_url = {final_url}")
        if args.download:
            out_path = resolve_under_cwd(args.output, field="--output")
            safe_url = validate_http_https_url(final_url, field="result video URL")
            print(f"Downloading → {out_path} …")
            with urllib.request.urlopen(safe_url, timeout=300) as response:
                with open(out_path, 'wb') as f:
                    f.write(response.read())
            size = out_path.stat().st_size
            print(f"Saved {out_path} ({size//1024}KB)")
    finally:
        # Remove temporary files created by format conversion.
        for f in tmp_files:
            try:
                os.unlink(f)
            except Exception:
                pass


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"\nERROR: {e}", file=sys.stderr)
        sys.exit(1)
FILE:scripts/avatar_video.py
#!/usr/bin/env python3
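"""LingMou: automatically pick a template, create a broadcast video, and poll for the result.

Capabilities:
- List the broadcast templates already in the account
- When --template-id is not given, pick an existing template at random
- Read the template's variables automatically and fill the text into a usable text variable (text_content preferred)
- With SDK 1.7.0+, list public templates and copy up to 3 of them when the account has no local templates
- Note: a successful copy of a public template does not guarantee that a video can be generated right away; some copies are draft templates that may still need valid segments or assets added manually
"""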
import argparse
import json
import os
import random
import sys
import time
import urllib.request
from typing import Optional
from input_validation import read_text_file_limited, resolve_under_cwd, validate_http_https_url
VENV_PYTHON = os.environ.get(
    "LINGMOU_VENV_PYTHON",
    "/data/xxx-workspace/.venv-alibabacloud-avatar-video/bin/python",
)
def create_client():
    try:
        from alibabacloud_lingmou20250527.client import Client
        from alibabacloud_tea_openapi import models as open_api_models
    except Exception as e:
        raise RuntimeError(
            "Missing dependencies; install with: pip install alibabacloud-lingmou20250527 alibabacloud-tea-openapi"
        ) from e
    config = open_api_models.Config(
        access_key_id=os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        access_key_secret=os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
        endpoint=os.environ.get("LINGMOU_ENDPOINT", "lingmou.cn-beijing.aliyuncs.com"),
        region_id=os.environ.get("LINGMOU_REGION", "cn-beijing"),
        user_agent="AlibabaCloud-Agent-Skills/alibabacloud-avatar-video",
    )
    return Client(config)
def sdk_supports_public_templates(client) -> bool:
    return hasattr(client, "list_public_broadcast_scene_templates") and hasattr(client, "copy_broadcast_scene_from_template")


def list_templates(client):
    from alibabacloud_lingmou20250527 import models as lm
    resp = client.list_broadcast_templates(lm.ListBroadcastTemplatesRequest())
    return resp.body.data or []


def list_public_templates(client, size: int = 10):
    from alibabacloud_lingmou20250527 import models as lm
    if not sdk_supports_public_templates(client):
        return []
    req = lm.ListPublicBroadcastSceneTemplatesRequest(size=size)
    resp = client.list_public_broadcast_scene_templates(req)
    return resp.body.data or []


def copy_public_templates(client, public_templates, max_copy: int = 3):
    from alibabacloud_lingmou20250527 import models as lm
    copied = []
    for idx, t in enumerate(public_templates[:max_copy], start=1):
        req = lm.CopyBroadcastSceneFromTemplateRequest(
            name=f"OpenClaw copied template {idx}",
            ratio=getattr(t, "ratio", None) or "16:9",
            template_id=t.id,
        )
        resp = client.copy_broadcast_scene_from_template(req)
        copied.append(resp.body.data)
    return copied


def get_template_detail(client, template_id: str):
    from alibabacloud_lingmou20250527 import models as lm
    req = lm.GetBroadcastTemplateRequest()
    req.template_id = template_id
    resp = client.get_broadcast_template(req)
    return resp.body.data
def choose_template(client, explicit_template_id: Optional[str] = None, seed: Optional[int] = None, auto_copy_public: bool = True):
    """Return (template, copied_from_public)."""
    templates = list_templates(client)
    if explicit_template_id:
        for t in templates:
            if t.id == explicit_template_id:
                return t, False
        # Not found in the list: wrap the given id in a minimal stand-in object.
        class T:
            pass
        t = T()
        t.id = explicit_template_id
        t.name = explicit_template_id
        return t, False
    if templates:
        rng = random.Random(seed)
        return rng.choice(list(templates)), False
    if auto_copy_public and sdk_supports_public_templates(client):
        publics = list_public_templates(client, size=10)
        if publics:
            copied = copy_public_templates(client, publics, max_copy=3)
            if copied:
                rng = random.Random(seed)
                return rng.choice(list(copied)), True
    raise RuntimeError(
        "No usable broadcast template in the current account, and no directly usable copy of a public template could be obtained. Prepare at least one template in LingMou that can generate video directly."
    )
def build_variables(template_detail, text: str):
    from alibabacloud_lingmou20250527 import models as lm
    variables = getattr(template_detail, "variables", None) or []
    if not variables:
        # No variables reported: fall back to the conventional text_content name.
        return [
            lm.TemplateVariable(
                name="text_content",
                type="text",
                properties={"content": text},
            )
        ]
    text_vars = [v for v in variables if getattr(v, "type", None) == "text"]
    if not text_vars:
        raise RuntimeError("The template has no replaceable text variable, so a talking video cannot be generated from the script alone for now")
    preferred_names = ["text_content", "test_text", "content", "text"]
    target = None
    for name in preferred_names:
        for v in text_vars:
            if getattr(v, "name", None) == name:
                target = v
                break
        if target:
            break
    if not target:
        target = text_vars[0]
    return [
        lm.TemplateVariable(
            name=target.name,
            type="text",
            properties={"content": text},
        )
    ]
def submit_video(client, template_id: str, text: str, name: str, resolution: str, fps: int, watermark: bool):
    from alibabacloud_lingmou20250527 import models as lm
    template_detail = get_template_detail(client, template_id)
    variables = build_variables(template_detail, text)
    req = lm.CreateBroadcastVideoFromTemplateRequest(
        template_id=template_id,
        name=name,
        variables=variables,
        video_options=lm.CreateBroadcastVideoFromTemplateRequestVideoOptions(
            resolution=resolution,
            fps=fps,
            watermark=watermark,
        ),
    )
    resp = client.create_broadcast_video_from_template(req)
    video_id = resp.body.data.id
    return video_id, template_detail, variables


def wait_video(client, video_id: str, interval: int = 3, max_wait: int = 1800):
    from alibabacloud_lingmou20250527 import models as lm
    start = time.time()
    while time.time() - start < max_wait:
        req = lm.ListBroadcastVideosByIdRequest(video_ids=[video_id])
        resp = client.list_broadcast_videos_by_id(req)
        data = resp.body.data or []
        if not data:
            time.sleep(interval)
            continue
        video = data[0]
        status = video.status
        print(f"status={status}")
        if status == "SUCCESS":
            return video.video_url
        if status in ("ERROR", "FAILED"):
            raise RuntimeError(f"LingMou task failed: {status}")
        time.sleep(interval)
    raise TimeoutError("LingMou polling timeout")
def main():
    p = argparse.ArgumentParser()
    p.add_argument("--template-id")
    p.add_argument("--list-templates", action="store_true", help="List templates already in the account")
    p.add_argument("--list-public-templates", action="store_true", help="List public templates (requires SDK 1.7.0+)")
    p.add_argument("--copy-public-templates", action="store_true", help="Copy up to 3 public templates into the current account (requires SDK 1.7.0+)")
    p.add_argument("--show-template-detail", action="store_true", help="Print details of the selected template")
    p.add_argument("--seed", type=int, default=None, help="Seed for random template selection, for reproducibility")
    p.add_argument("--text")
    p.add_argument("--text-file")
    p.add_argument("--name", default="OpenClaw Avatar Video")
    p.add_argument("--resolution", default="720p", choices=["720p", "1080p"])
    p.add_argument("--fps", type=int, default=30, choices=[15, 30])
    p.add_argument("--watermark", action="store_true", default=False)
    p.add_argument("--download", action="store_true")
    p.add_argument("--output", default="lingmou_output.mp4")
    args = p.parse_args()
    client = create_client()
    if args.list_templates:
        templates = list_templates(client)
        print(json.dumps([
            {"id": t.id, "name": getattr(t, "name", None)} for t in templates
        ], ensure_ascii=False, indent=2))
    if args.list_public_templates:
        publics = list_public_templates(client, size=10)
        print(json.dumps([
            {"id": t.id, "name": getattr(t, "name", None), "ratio": getattr(t, "ratio", None), "desc": getattr(t, "desc", None)} for t in publics
        ], ensure_ascii=False, indent=2))
    if args.copy_public_templates:
        publics = list_public_templates(client, size=10)
        copied = copy_public_templates(client, publics, max_copy=3)
        print(json.dumps([
            {"id": t.id, "name": getattr(t, "name", None), "ratio": getattr(t, "ratio", None), "status": getattr(t, "status", None)} for t in copied
        ], ensure_ascii=False, indent=2))
    # Stop here unless a template has to be selected (for its details or for generation).
    # This single check also covers list-only and copy-only invocations.
    if not (args.text or args.text_file or args.template_id or args.show_template_detail):
        return
    chosen, copied_from_public = choose_template(client, args.template_id, seed=args.seed, auto_copy_public=True)
    print(f"template_id={chosen.id}")
    print(f"template_name={getattr(chosen, 'name', '')}")
    print(f"template_from_public_copy={str(copied_from_public).lower()}")
    if args.show_template_detail:
        detail = get_template_detail(client, chosen.id)
        print(json.dumps(detail.to_map(), ensure_ascii=False, indent=2, default=str))
    if not (args.text or args.text_file):
        return
    text = args.text
    if args.text_file:
        text = read_text_file_limited(args.text_file, field="--text-file")
    if not text:
        p.error("Need --text or --text-file")
    try:
        video_id, template_detail, variables = submit_video(
            client,
            template_id=chosen.id,
            text=text,
            name=args.name,
            resolution=args.resolution,
            fps=args.fps,
            watermark=args.watermark,
        )
    except Exception as e:
        if copied_from_public:
            # Chain the original exception so its traceback is preserved.
            raise RuntimeError(
                f"The public template was copied, but direct generation failed: {e}. This usually means the copied template is still a draft or lacks valid clips/assets; complete the template in LingMou first."
            ) from e
        raise
    print(f"video_id={video_id}")
    print("variables=" + json.dumps([v.to_map() for v in variables], ensure_ascii=False))
    if args.show_template_detail:
        print("template_detail=" + json.dumps(template_detail.to_map(), ensure_ascii=False, default=str))
    video_url = wait_video(client, video_id)
    print(f"video_url={video_url}")
    if args.download and video_url:
        out_path = resolve_under_cwd(args.output, field="--output")
        safe_url = validate_http_https_url(video_url, field="video URL")
        with urllib.request.urlopen(safe_url, timeout=300) as response:
            with open(out_path, 'wb') as f:
                f.write(response.read())
        print(f"saved={out_path}")


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(1)
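The variable-selection rule used by `build_variables` can be tried without the LingMou SDK. The sketch below is a minimal stand-in, assuming template variables are plain objects with `name` and `type` attributes; `pick_text_variable` and the `SimpleNamespace` fixtures are hypothetical names for illustration and are not part of the script.

```python
from types import SimpleNamespace

# Same priority order as the script: well-known names first.
PREFERRED_NAMES = ["text_content", "test_text", "content", "text"]


def pick_text_variable(variables):
    """Return the text variable to fill, or None when the template has none."""
    text_vars = [v for v in variables if getattr(v, "type", None) == "text"]
    if not text_vars:
        return None
    for name in PREFERRED_NAMES:
        # First variable with this name wins, mirroring the nested loop in the script.
        match = next((v for v in text_vars if getattr(v, "name", None) == name), None)
        if match is not None:
            return match
    # No preferred name present: fall back to the first text variable.
    return text_vars[0]


if __name__ == "__main__":
    variables = [
        SimpleNamespace(name="title", type="text"),
        SimpleNamespace(name="content", type="text"),
        SimpleNamespace(name="background", type="image"),
    ]
    print(pick_text_variable(variables).name)  # content
```

Returning `None` instead of raising keeps the sketch easy to test; the script itself raises `RuntimeError` in that case.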
FILE:scripts/demo_pipeline.py
#!/usr/bin/env python3
"""
Human Avatar Demo Pipeline
Unified entry point that calls this Skill's three scripts:
- EMO: portrait_animate.py
- AA: animate_anyone.py
- LingMou: avatar_video.py
Examples:
  # EMO (local image + local audio)
  python demo_pipeline.py --mode emo --image ./face.jpg --audio ./speech.mp3 --download
  # AA (URLs)
  python demo_pipeline.py --mode aa --image-url https://... --video-url https://... --download
  # LingMou (template + script text)
  python demo_pipeline.py --mode lingmou --template-id BSxxxx --text "Hello everyone" --download
"""
import argparse
import os
import subprocess
import sys
from pathlib import Path

from input_validation import resolve_under_cwd, validate_http_https_url

ROOT = Path(__file__).resolve().parent
def run_cmd(cmd):
    print("$", " ".join(cmd))
    proc = subprocess.run(cmd)
    if proc.returncode != 0:
        sys.exit(proc.returncode)


def build_emo_cmd(args):
    cmd = [sys.executable, str(ROOT / "portrait_animate.py")]
    if args.image_url:
        cmd += ["--image-url", args.image_url]
    if args.audio_url:
        cmd += ["--audio-url", args.audio_url]
    if args.image:
        cmd += ["--image", args.image]
    if args.audio:
        cmd += ["--audio", args.audio]
    cmd += ["--ratio", args.ratio, "--style-level", args.style_level, "--output", args.output]
    if args.download:
        cmd += ["--download"]
    return cmd


def build_aa_cmd(args):
    cmd = [sys.executable, str(ROOT / "animate_anyone.py")]
    if args.image_url:
        cmd += ["--image-url", args.image_url]
    if args.video_url:
        cmd += ["--video-url", args.video_url]
    if args.image:
        cmd += ["--image", args.image]
    if args.video:
        cmd += ["--video", args.video]
    cmd += ["--output", args.output]
    if args.download:
        cmd += ["--download"]
    return cmd


def build_lingmou_cmd(args):
    cmd = [
        sys.executable,
        str(ROOT / "avatar_video.py"),
        "--template-id",
        args.template_id,
        "--name",
        args.name,
        "--resolution",
        args.resolution,
        "--fps",
        str(args.fps),
        "--output",
        args.output,
    ]
    if args.watermark:
        cmd += ["--watermark"]
    if args.text_file:
        cmd += ["--text-file", args.text_file]
    else:
        cmd += ["--text", args.text]
    if args.download:
        cmd += ["--download"]
    return cmd
def validate_args(args):
    if args.download:
        resolve_under_cwd(args.output, field="--output")
    if args.mode == "emo":
        if not ((args.image_url or args.image) and (args.audio_url or args.audio)):
            raise SystemExit("EMO requires an image (URL or file) and audio (URL or file)")
        if args.image_url:
            validate_http_https_url(args.image_url, field="--image-url")
        if args.audio_url:
            validate_http_https_url(args.audio_url, field="--audio-url")
    elif args.mode == "aa":
        if not ((args.image_url or args.image) and (args.video_url or args.video)):
            raise SystemExit("AA requires an image (URL or file) and a video (URL or file)")
        if args.image_url:
            validate_http_https_url(args.image_url, field="--image-url")
        if args.video_url:
            validate_http_https_url(args.video_url, field="--video-url")
    elif args.mode == "lingmou":
        if not args.template_id:
            raise SystemExit("LingMou mode requires --template-id")
        if not (args.text or args.text_file):
            raise SystemExit("LingMou mode requires --text or --text-file")


def print_env_hint(mode):
    print("\n[env-check]")
    if mode in ("emo", "aa"):
        print("- Requires DASHSCOPE_API_KEY (Beijing region)")
        if not os.getenv("DASHSCOPE_API_KEY"):
            print(" ! DASHSCOPE_API_KEY not detected")
    if mode == "lingmou":
        print("- Requires ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET")
        if not os.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"):
            print(" ! ALIBABA_CLOUD_ACCESS_KEY_ID not detected")
        if not os.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"):
            print(" ! ALIBABA_CLOUD_ACCESS_KEY_SECRET not detected")
def main():
    p = argparse.ArgumentParser(description="Human Avatar Demo Pipeline")
    p.add_argument("--mode", required=True, choices=["emo", "aa", "lingmou"])
    # Common
    p.add_argument("--download", action="store_true")
    p.add_argument("--output", default="demo_output.mp4")
    # Inputs shared by EMO / AA
    p.add_argument("--image-url")
    p.add_argument("--image")
    # EMO
    p.add_argument("--audio-url")
    p.add_argument("--audio")
    p.add_argument("--ratio", default="1:1", choices=["1:1", "3:4"])
    p.add_argument("--style-level", default="normal", choices=["normal", "calm", "active"])
    # AA
    p.add_argument("--video-url")
    p.add_argument("--video")
    # LingMou
    p.add_argument("--template-id")
    p.add_argument("--text")
    p.add_argument("--text-file")
    p.add_argument("--name", default="OpenClaw Avatar Video")
    p.add_argument("--resolution", default="720p", choices=["720p", "1080p"])
    p.add_argument("--fps", type=int, default=30, choices=[15, 30])
    p.add_argument("--watermark", action="store_true")
    args = p.parse_args()
    validate_args(args)
    print_env_hint(args.mode)
    if args.mode == "emo":
        cmd = build_emo_cmd(args)
    elif args.mode == "aa":
        cmd = build_aa_cmd(args)
    else:
        cmd = build_lingmou_cmd(args)
    run_cmd(cmd)


if __name__ == "__main__":
    main()
FILE:scripts/image_to_video.py
#!/usr/bin/env python3
"""
Wanxiang image-to-video (wan2.x-i2v)
SECURITY NOTES:
- subprocess: used ONLY for ffmpeg image format conversion. No shell=True.
- OSS credentials: env-only, used ONLY to upload user media to their own
  OSS bucket and generate signed GET URLs for Alibaba Cloud API access.
- All API calls target dashscope.aliyuncs.com (Alibaba Cloud official).
Default model: wan2.7-i2v
Usage:
  # wan2.7-i2v (default)
  python image_to_video.py --image ./portrait.jpg --prompt "She smiles and dances on the grass" --download
  python image_to_video.py --image ./portrait.jpg --resolution 1080P --duration 10 --download
  # wan2.6-i2v-flash
  python image_to_video.py --image ./portrait.jpg --prompt "..." --model wan2.6-i2v-flash --download
  python image_to_video.py --image-url https://... --prompt "..." --model wan2.6-i2v-flash --resolution 720P --duration 5
  # With audio
  python image_to_video.py --image ./portrait.jpg --audio-url https://.../bgm.mp3 --prompt "..." --download
  # Text-to-image first, then image-to-video (end to end)
  python image_to_video.py --t2i-prompt "A woman in hanfu standing in a peach blossom grove" --prompt "She turns slowly as petals fall" --download
"""
# Lets the `str | None` annotation below parse on Python versions before 3.10.
from __future__ import annotations

import argparse
import json
import os
import shutil
import subprocess
import sys
import time
import urllib.request
from pathlib import Path

import requests

from input_validation import (
    mk_temp_path_for_ffmpeg,
    resolve_under_cwd,
    validate_http_https_url,
)

BASE_URL = os.getenv("DASHSCOPE_BASE_URL", "https://dashscope.aliyuncs.com")
_OSS_SIGNED_URL_EXPIRES = int(os.environ.get("OSS_SIGNED_URL_EXPIRES", str(3 * 24 * 3600)))
USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-avatar-video"
def _headers(async_mode: bool = True) -> dict:
    key = os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY not set")
    h = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
        "User-Agent": USER_AGENT,
    }
    if async_mode:
        h["X-DashScope-Async"] = "enable"
    return h


def _wait_task(task_id: str, interval: int = 10, max_wait: int = 600) -> dict:
    url = f"{BASE_URL}/api/v1/tasks/{task_id}"
    start = time.time()
    while time.time() - start < max_wait:
        r = requests.get(url, headers=_headers(async_mode=False), timeout=30)
        r.raise_for_status()
        data = r.json()
        out = data.get("output", {})
        status = out.get("task_status", "UNKNOWN")
        elapsed = int(time.time() - start)
        print(f" [{elapsed}s] status={status}")
        if status == "SUCCEEDED":
            return out
        if status in ("FAILED", "CANCELED", "UNKNOWN"):
            raise RuntimeError(f"Task failed: {json.dumps(data, ensure_ascii=False)}")
        time.sleep(interval)
    raise TimeoutError(f"Task {task_id} timed out")


def upload_to_oss(local_path: str, expires: int = _OSS_SIGNED_URL_EXPIRES) -> str:
    """Upload to OSS, return signed GET URL (default 3 days)."""
    import oss2
    auth = oss2.Auth(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    )
    bucket_name = os.environ["OSS_BUCKET"]
    endpoint = os.environ.get("OSS_ENDPOINT", "oss-cn-beijing.aliyuncs.com")
    endpoint = endpoint.replace("https://", "").replace("http://", "").rstrip("/")
    bucket = oss2.Bucket(auth, f"https://{endpoint}", bucket_name)
    key = f"human-avatar/{Path(local_path).name}"
    print(f"[oss] uploading {Path(local_path).name} …")
    bucket.put_object_from_file(key, local_path)
    url = bucket.sign_url("GET", key, expires)
    print(f"[oss] signed_url ok ({expires//3600}h)")
    return url


def convert_image(src: str) -> str:
    """Convert image to jpg/png if not compatible."""
    p = Path(src)
    if p.suffix.lower() in (".jpg", ".jpeg", ".png", ".bmp", ".webp"):
        return src
    ff = shutil.which("ffmpeg") or "ffmpeg"
    dst = mk_temp_path_for_ffmpeg(".jpg", "i2v_img_")
    subprocess.run([ff, "-y", "-i", src, "-q:v", "2", dst], check=True, capture_output=True)
    return dst
def image_to_video(
    img_url: str,
    prompt: str = "",
    model: str = "wan2.7-i2v",
    resolution: str = "720P",
    duration: int = 5,
    negative_prompt: str = "",
    prompt_extend: bool = True,
    audio_url: str | None = None,
    audio: bool = True,
) -> str:
    """
    Call the Wanxiang image-to-video API. Returns video URL.
    Args:
        img_url: URL of the first-frame image (publicly accessible), or base64
        prompt: prompt describing the motion in the video
        model: model name, default wan2.7-i2v
        resolution: wan2.7: 720P/1080P; wan2.6: 480P/720P. When unset, the CLI
            picks 1080P for wan2.7 and 720P otherwise; this function's own default is 720P
        duration: video length in seconds; wan2.7: [2,15]; wan2.6: 3/5/10; default 5
        negative_prompt: negative prompt
        prompt_extend: whether to enable intelligent prompt rewriting
        audio_url: custom background audio URL (optional)
        audio: whether to generate audio (supported by wan2.6/2.5; wan2.7
            generates audio automatically when no audio is passed in)
    """
    endpoint = f"{BASE_URL}/api/v1/services/aigc/video-generation/video-synthesis"
    # Build a different request structure depending on the model version
    is_wan27 = model.startswith("wan2.7")
    if is_wan27:
        # wan2.7-i2v uses the media array structure
        media = [{"type": "first_frame", "url": img_url}]
        if audio_url:
            media.append({"type": "driving_audio", "url": audio_url})
        inp = {"prompt": prompt, "media": media}
        if negative_prompt:
            inp["negative_prompt"] = negative_prompt
        # wan2.7 defaults to resolution=1080P; duration supports [2,15]
        params = {
            "resolution": resolution,
            "prompt_extend": prompt_extend,
            "duration": duration,
        }
    else:
        # wan2.6 and earlier use the img_url field
        inp = {"img_url": img_url}
        if prompt:
            inp["prompt"] = prompt
        if negative_prompt:
            inp["negative_prompt"] = negative_prompt
        if audio_url:
            inp["audio_url"] = audio_url
        params = {
            "resolution": resolution,
            "prompt_extend": prompt_extend,
            "duration": duration,
        }
    # wan2.7-i2v generates audio by default; set False to disable
    if not audio:
        params["audio"] = False
    payload = {"model": model, "input": inp, "parameters": params}
    print(f"\n[i2v] submit model={model} resolution={resolution} duration={duration}s")
    r = requests.post(endpoint, headers=_headers(async_mode=True), json=payload, timeout=60)
    r.raise_for_status()
    data = r.json()
    task_id = (data.get("output") or {}).get("task_id")
    if not task_id:
        raise RuntimeError(f"No task_id: {json.dumps(data, ensure_ascii=False)}")
    print(f" task_id={task_id}")
    out = _wait_task(task_id, interval=10, max_wait=600)
    video_url = out.get("video_url")
    if not video_url:
        raise RuntimeError(f"No video_url in result: {out}")
    print(f"\n✅ video_url = {video_url}")
    return video_url
def main():
    p = argparse.ArgumentParser(description="Wanxiang image-to-video")
    # image input
    img_grp = p.add_mutually_exclusive_group()
    img_grp.add_argument("--image", help="Local image path (uploaded to OSS automatically)")
    img_grp.add_argument("--image-url", help="Public image URL")
    img_grp.add_argument("--t2i-prompt", help="Text-to-image first, then image-to-video (calls text_to_image)")
    p.add_argument("--prompt", default="", help="Prompt describing the video content")
    p.add_argument("--model", default="wan2.7-i2v",
                   help="Model name (default: wan2.7-i2v); options include wan2.6-i2v-flash, wan2.5-i2v-preview, wan2.2-i2v-plus, etc.")
    p.add_argument("--t2i-model", default="wan2.2-t2i-flash",
                   help="Text-to-image model (used in --t2i-prompt mode, default: wan2.2-t2i-flash)")
    p.add_argument("--resolution", default=None,
                   help="Resolution; wan2.7: 720P/1080P (default 1080P), wan2.6: 480P/720P (default 720P)")
    p.add_argument("--duration", type=int, default=None,
                   help="Video length in seconds; wan2.7: [2,15], wan2.6: 3/5/10, default 5")
    p.add_argument("--negative-prompt", default="", help="Negative prompt")
    p.add_argument("--no-prompt-extend", action="store_true", help="Disable intelligent prompt rewriting")
    p.add_argument("--audio-url", default=None, help="Background audio URL (supported by wan2.6/2.5)")
    p.add_argument("--no-audio", action="store_true", help="Generate a silent video")
    p.add_argument("--download", action="store_true", help="Download the generated video locally")
    p.add_argument("--output", default="i2v_output.mp4", help="Output filename")
    args = p.parse_args()
    # ── get image URL ────────────────────────────────────────────────────
    img_url = args.image_url
    if img_url:
        img_url = validate_http_https_url(img_url, field="--image-url")
    if args.audio_url:
        args.audio_url = validate_http_https_url(args.audio_url, field="--audio-url")
    tmp_files = []
    if args.t2i_prompt:
        # Generate image first using text_to_image
        print(f"\n[step0] text-to-image model={args.t2i_model}")
        from text_to_image import text_to_image
        images = text_to_image(
            prompt=args.t2i_prompt,
            model=args.t2i_model,
            size="960*1696",  # 9:16 default for portrait
            n=1,
            prompt_extend=not args.no_prompt_extend,
        )
        img_url = validate_http_https_url(images[0], field="text-to-image URL")
        print(f" t2i image_url = {img_url}")
    elif args.image:
        converted = convert_image(args.image)
        if converted != args.image:
            tmp_files.append(converted)
        img_url = upload_to_oss(converted)
    if not img_url:
        p.error("Requires one of --image, --image-url, or --t2i-prompt")
    # Set defaults according to the model version
    is_wan27 = args.model.startswith("wan2.7")
    if args.resolution is None:
        args.resolution = "1080P" if is_wan27 else "720P"
    if args.duration is None:
        args.duration = 5  # both versions support 5 seconds
    # Validate parameters
    if is_wan27:
        if args.resolution not in ["720P", "1080P"]:
            p.error(f"wan2.7 supports resolutions 720P, 1080P; got: {args.resolution}")
        if args.duration < 2 or args.duration > 15:
            p.error(f"wan2.7 supports durations in [2,15] seconds; got: {args.duration}")
    else:
        if args.resolution not in ["480P", "720P"]:
            p.error(f"wan2.6 and earlier support resolutions 480P, 720P; got: {args.resolution}")
        if args.duration not in [3, 5, 10]:
            p.error(f"wan2.6 and earlier support durations 3, 5, 10 seconds; got: {args.duration}")
    # ── submit i2v ───────────────────────────────────────────────────────
    try:
        video_url = image_to_video(
            img_url=img_url,
            prompt=args.prompt,
            model=args.model,
            resolution=args.resolution,
            duration=args.duration,
            negative_prompt=args.negative_prompt,
            prompt_extend=not args.no_prompt_extend,
            audio_url=args.audio_url,
            audio=not args.no_audio,
        )
        if args.download:
            out_path = resolve_under_cwd(args.output, field="--output")
            safe_url = validate_http_https_url(video_url, field="result video URL")
            print(f"Downloading → {out_path} …")
            with urllib.request.urlopen(safe_url, timeout=300) as response:
                with open(out_path, 'wb') as f:
                    f.write(response.read())
            size_kb = out_path.stat().st_size // 1024
            print(f"Saved {out_path} ({size_kb}KB)")
    finally:
        # Remove temporary files created by image conversion.
        for f in tmp_files:
            try:
                os.unlink(f)
            except Exception:
                pass


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"\nERROR: {e}", file=sys.stderr)
        sys.exit(1)
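The two request shapes that `image_to_video` builds can be compared side by side with a small pure function. This is a sketch only: `build_i2v_payload` is a hypothetical helper, and the field names (`media`, `first_frame`, `driving_audio`, `img_url`, `audio_url`) are copied from the script above rather than checked against the DashScope API reference.

```python
def build_i2v_payload(model, img_url, prompt="", resolution="720P", duration=5, audio_url=None):
    """Hypothetical helper mirroring the branch on model version in image_to_video()."""
    params = {"resolution": resolution, "duration": duration}
    if model.startswith("wan2.7"):
        # wan2.7: inputs travel in a typed "media" array.
        media = [{"type": "first_frame", "url": img_url}]
        if audio_url:
            media.append({"type": "driving_audio", "url": audio_url})
        inp = {"prompt": prompt, "media": media}
    else:
        # wan2.6 and earlier: flat fields on the input object.
        inp = {"img_url": img_url}
        if prompt:
            inp["prompt"] = prompt
        if audio_url:
            inp["audio_url"] = audio_url
    return {"model": model, "input": inp, "parameters": params}


if __name__ == "__main__":
    import json

    # example.com URLs are placeholders, not real media.
    for name in ("wan2.7-i2v", "wan2.6-i2v-flash"):
        payload = build_i2v_payload(name, "https://example.com/face.jpg", prompt="a slow turn")
        print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the HTTP call makes the model-specific branching testable without network access or an API key.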
FILE:scripts/input_validation.py
"""Shared CLI input checks: path confinement, http(s) URLs, readable files."""
from __future__ import annotations

import os
from pathlib import Path
from urllib.parse import urlparse

_MAX_TEXT_FILE_BYTES = 10 * 1024 * 1024


def resolve_under_cwd(user_path: str, *, field: str = "path") -> Path:
    """Resolve a user path under the current working directory; reject traversal."""
    base = Path.cwd().resolve(strict=False)
    candidate = (base / user_path).resolve(strict=False)
    try:
        candidate.relative_to(base)
    except ValueError as e:
        raise ValueError(
            f"{field} must resolve inside the current working directory ({base}): {user_path!r}"
        ) from e
    return candidate


def validate_http_https_url(url: str, *, field: str = "URL") -> str:
    u = url.strip()
    parsed = urlparse(u)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"{field} must use http or https scheme: {url!r}")
    if not parsed.netloc:
        raise ValueError(f"{field} must include a host: {url!r}")
    return u


def validate_readable_file(path_str: str, *, field: str = "file") -> Path:
    p = Path(path_str).expanduser()
    if not p.is_file():
        raise ValueError(f"{field} must be an existing regular file: {path_str!r}")
    return p.resolve()


def read_text_file_limited(path_str: str, *, field: str = "--text-file") -> str:
    path = validate_readable_file(path_str, field=field)
    size = path.stat().st_size
    if size > _MAX_TEXT_FILE_BYTES:
        raise ValueError(
            f"{field} exceeds max size ({_MAX_TEXT_FILE_BYTES // (1024 * 1024)} MiB): {path_str!r}"
        )
    return path.read_text(encoding="utf-8").strip()


def mk_temp_path_for_ffmpeg(suffix: str, prefix: str) -> str:
    """Create an empty file with a random name and return its path (fd closed for ffmpeg)."""
    import tempfile
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix)
    try:
        os.close(fd)
    except OSError:
        try:
            os.unlink(path)
        except OSError:
            pass
        raise
    return path
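The confinement check in `resolve_under_cwd` is easiest to see with a few concrete inputs. The sketch below repeats the same resolve-then-`relative_to` technique with an explicit base directory so it does not depend on the process working directory; `confine_to_base` and `is_confined` are hypothetical names, and the absolute-path case assumes a POSIX system.

```python
from pathlib import Path


def confine_to_base(base: Path, user_path: str) -> Path:
    """Hypothetical variant of resolve_under_cwd() that takes the base explicitly."""
    base = base.resolve(strict=False)
    candidate = (base / user_path).resolve(strict=False)
    # relative_to() raises ValueError when candidate is not located under base.
    candidate.relative_to(base)
    return candidate


def is_confined(base: Path, user_path: str) -> bool:
    try:
        confine_to_base(base, user_path)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    base = Path.cwd()
    for p in ("out/video.mp4", "../escape.mp4", "a/../../escape.mp4", "/etc/passwd"):
        print(f"{p!r}: confined={is_confined(base, p)}")
```

Joining an absolute path onto `base` discards `base` entirely, which is why the absolute case is rejected by the same check as `..` traversal.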
FILE:scripts/live_portrait.py
#!/usr/bin/env python3
"""
LivePortrait portrait video generation: two-step pipeline
SECURITY NOTES:
- subprocess: used ONLY to invoke system ffmpeg (audio/video format
  conversion). All arguments are constructed from user-supplied local
  file paths; no shell=True, no dynamic code execution.
- OSS credentials (ALIBABA_CLOUD_ACCESS_KEY_ID/SECRET): read from env,
  used ONLY to upload media files to the user's own OSS bucket and
  generate time-limited signed GET URLs. Never logged or sent elsewhere.
- All API calls go to dashscope.aliyuncs.com (Alibaba Cloud official).
Step 1: liveportrait-detect image detection (synchronous)
Step 2: liveportrait video generation (asynchronous)
Input:
- Portrait image (jpeg/jpg/png/bmp/webp)
- Audio file (wav/mp3, <15MB, 1s~3min)
  OR a video file (audio is extracted automatically)
Usage:
  python live_portrait.py --image ./portrait.jpg --audio ./speech.mp3 --download
  python live_portrait.py --image ./portrait.jpg --video ./speech.mp4 --download
  python live_portrait.py --image-url https://... --audio-url https://... --download
  python live_portrait.py --image ./portrait.jpg --audio ./speech.mp3 \\
    --template active --mouth-strength 1.2 --head-strength 0.8 --download
"""
import argparse
import json
import os
import shutil
import subprocess
import sys
import time
import urllib.request
from pathlib import Path

import requests

from input_validation import (
    mk_temp_path_for_ffmpeg,
    resolve_under_cwd,
    validate_http_https_url,
)

BASE_URL = os.getenv("DASHSCOPE_BASE_URL", "https://dashscope.aliyuncs.com")
_OSS_SIGNED_URL_EXPIRES = int(os.environ.get("OSS_SIGNED_URL_EXPIRES", str(3 * 24 * 3600)))
# ── helpers ────────────────────────────────────────────────────────────────────
USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-avatar-video"
def _headers(async_mode: bool = False) -> dict:
    key = os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY not set")
    h = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
        "User-Agent": USER_AGENT,
    }
    if async_mode:
        h["X-DashScope-Async"] = "enable"
    return h


def _wait_task(task_id: str, interval: int = 5, max_wait: int = 600) -> dict:
    url = f"{BASE_URL}/api/v1/tasks/{task_id}"
    start = time.time()
    while time.time() - start < max_wait:
        try:
            r = requests.get(url, headers=_headers(), timeout=30)
            r.raise_for_status()
            data = r.json()
            out = data.get("output", {})
            status = out.get("task_status", "UNKNOWN")
            elapsed = int(time.time() - start)
            print(f" [{elapsed}s] status={status}")
            if status == "SUCCEEDED":
                return out
            if status in ("FAILED", "CANCELED", "UNKNOWN"):
                raise RuntimeError(f"Task failed: {json.dumps(data, ensure_ascii=False)}")
        except RuntimeError:
            # Terminal task states (and a missing API key) propagate to the caller.
            raise
        except Exception as e:
            # Transient errors such as network failures are logged and retried.
            print(f" [poll error] {e}")
        time.sleep(interval)
    raise TimeoutError(f"Task {task_id} timed out after {max_wait}s")


def _find_ffmpeg() -> str:
    for p in ["ffmpeg", "/usr/bin/ffmpeg", "/usr/local/bin/ffmpeg"]:
        if shutil.which(p):
            return p
    raise RuntimeError("ffmpeg not found. Install: apt install ffmpeg")
def upload_to_oss(local_path: str, expires: int = _OSS_SIGNED_URL_EXPIRES) -> str:
    """Upload local file to OSS, return signed GET URL (default 3 days)."""
    import oss2
    auth = oss2.Auth(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    )
    bucket_name = os.environ["OSS_BUCKET"]
    endpoint = os.environ.get("OSS_ENDPOINT", "oss-cn-beijing.aliyuncs.com")
    endpoint = endpoint.replace("https://", "").replace("http://", "").rstrip("/")
    bucket = oss2.Bucket(auth, f"https://{endpoint}", bucket_name)
    key = f"human-avatar/{Path(local_path).name}"
    print(f"[oss] uploading {Path(local_path).name} …")
    bucket.put_object_from_file(key, local_path)
    url = bucket.sign_url("GET", key, expires)
    print(f"[oss] signed_url ok ({expires // 3600}h)")
    return url


def convert_image(src: str) -> str:
    """Convert image to jpg if format not natively supported (e.g. heic)."""
    p = Path(src)
    if p.suffix.lower() in (".jpg", ".jpeg", ".png", ".bmp", ".webp"):
        return src
    ff = _find_ffmpeg()
    dst = mk_temp_path_for_ffmpeg(".jpg", "lp_img_")
    subprocess.run([ff, "-y", "-i", src, "-q:v", "2", dst], check=True, capture_output=True)
    print(f"[convert] image {p.name} → {Path(dst).name}")
    return dst


def extract_audio_from_video(video_path: str) -> str:
    """
    Extract audio from video file to mp3.
    LivePortrait requires wav or mp3, <15MB, 1s~3min.
    """
    ff = _find_ffmpeg()
    dst = mk_temp_path_for_ffmpeg(".mp3", "lp_audio_")
    print(f"[ffmpeg] extracting audio from {Path(video_path).name} …")
    subprocess.run(
        [ff, "-y", "-i", video_path, "-vn",
         "-ar", "44100", "-ac", "1", "-b:a", "128k",
         "-t", "180",  # cap at 3min
         dst],
        check=True, capture_output=True
    )
    size_mb = Path(dst).stat().st_size / 1024 / 1024
    duration = float(subprocess.run(
        ["ffprobe", "-v", "quiet", "-select_streams", "a:0",
         "-show_entries", "stream=duration", "-of", "csv=p=0", dst],
        capture_output=True, text=True
    ).stdout.strip() or "0")
    print(f"[ffmpeg] extracted: {Path(dst).name} {size_mb:.1f}MB {duration:.1f}s")
    if duration < 1:
        raise ValueError(f"Extracted audio is too short ({duration:.1f}s), minimum is 1s")
    if size_mb > 14.5:
        raise ValueError(f"Extracted audio is too large ({size_mb:.1f}MB), max 15MB")
    return dst


def convert_audio(src: str) -> str:
    """Convert audio to mp3 if not wav/mp3."""
    p = Path(src)
    if p.suffix.lower() in (".wav", ".mp3"):
        return src
    ff = _find_ffmpeg()
    dst = mk_temp_path_for_ffmpeg(".mp3", "lp_audio_")
    subprocess.run(
        [ff, "-y", "-i", src, "-vn", "-ar", "44100", "-ac", "1", "-b:a", "128k", dst],
        check=True, capture_output=True
    )
    print(f"[convert] audio {p.name} → {Path(dst).name}")
    return dst
# ── Step 1: detect ─────────────────────────────────────────────────────────────
def lp_detect(image_url: str) -> None:
    """
    POST /face-detect: check that the image meets LivePortrait requirements (sync).
    Raises ValueError if check fails.
    """
    print("\n[step1] liveportrait-detect …")
    r = requests.post(
        f"{BASE_URL}/api/v1/services/aigc/image2video/face-detect",
        headers=_headers(async_mode=False),
        json={"model": "liveportrait-detect", "input": {"image_url": image_url}},
        timeout=30,
    )
    r.raise_for_status()
    data = r.json()
    out = data.get("output", {})
    passed = out.get("pass", False)
    msg = out.get("message", "")
    if passed:
        print(" ✅ detect passed")
    else:
        raise ValueError(f"Image failed LivePortrait detect: {msg}")


# ── Step 2: generate ────────────────────────────────────────────────────────────
def lp_generate(
    image_url: str,
    audio_url: str,
    template_id: str = "normal",
    eye_move_freq: float = 0.5,
    video_fps: int = 24,
    mouth_move_strength: float = 1.0,
    paste_back: bool = True,
    head_move_strength: float = 0.7,
) -> str:
    """
    POST /video-synthesis/: generate the LivePortrait video (async).
    Returns video_url.
    """
    print(f"\n[step2] liveportrait generate template={template_id} fps={video_fps} …")
    payload = {
        "model": "liveportrait",
        "input": {
            "image_url": image_url,
            "audio_url": audio_url,
        },
        "parameters": {
            "template_id": template_id,
            "eye_move_freq": eye_move_freq,
            "video_fps": video_fps,
            "mouth_move_strength": mouth_move_strength,
            "paste_back": paste_back,
            "head_move_strength": head_move_strength,
        },
    }
    r = requests.post(
        f"{BASE_URL}/api/v1/services/aigc/image2video/video-synthesis/",
        headers=_headers(async_mode=True),
        json=payload,
        timeout=60,
    )
    r.raise_for_status()
    data = r.json()
    task_id = (data.get("output") or {}).get("task_id")
    if not task_id:
        raise RuntimeError(f"No task_id: {json.dumps(data, ensure_ascii=False)}")
    print(f" task_id={task_id}")
    out = _wait_task(task_id, interval=5, max_wait=600)
    video_url = (out.get("results") or {}).get("video_url")
    if not video_url:
        raise RuntimeError(f"No video_url in result: {out}")
    duration = (out.get("usage") or {}).get("video_duration", "?")
    print(f" ✅ video_url={video_url} duration={duration}s")
    return video_url
# ── Main ───────────────────────────────────────────────────────────────────────
def main():
    p = argparse.ArgumentParser(
        description="LivePortrait: portrait image + audio/video → animated portrait video"
    )
    # image
    img_grp = p.add_mutually_exclusive_group()
    img_grp.add_argument("--image", help="Local image (uploaded to OSS automatically)")
    img_grp.add_argument("--image-url", help="Public image URL")
    # audio / video
    audio_grp = p.add_mutually_exclusive_group()
    audio_grp.add_argument("--audio", help="Local audio file (wav/mp3, uploaded to OSS automatically)")
    audio_grp.add_argument("--audio-url", help="Public audio URL")
    audio_grp.add_argument("--video", help="Local video file; audio is extracted and uploaded to OSS automatically")
    # generation params
    p.add_argument("--template", default="normal", choices=["normal", "calm", "active"],
                   help="Motion template: normal (default) / calm (news reading) / active (singing)")
    p.add_argument("--eye-freq", type=float, default=0.5,
                   help="Blink frequency 0~1 (default: 0.5)")
    p.add_argument("--fps", type=int, default=24,
                   help="Output frame rate 15~30 (default: 24)")
    p.add_argument("--mouth-strength", type=float, default=1.0,
                   help="Mouth movement strength 0~1.5 (default: 1.0)")
    p.add_argument("--head-strength", type=float, default=0.7,
                   help="Head movement strength 0~1 (default: 0.7)")
    p.add_argument("--no-paste-back", action="store_true",
                   help="Output only the face region (do not paste it back onto the original image)")
    p.add_argument("--skip-detect", action="store_true",
                   help="Skip the image detection step")
    p.add_argument("--download", action="store_true", help="Download the generated video locally")
    p.add_argument("--output", default="lp_output.mp4", help="Output filename (default: lp_output.mp4)")
    args = p.parse_args()
    tmp_files = []
    try:
        # ── image URL ──────────────────────────────────────────────────────
        image_url = args.image_url
        if image_url:
            image_url = validate_http_https_url(image_url, field="--image-url")
        if not image_url:
            if not args.image:
                p.error("Requires --image or --image-url")
            converted = convert_image(args.image)
            if converted != args.image:
                tmp_files.append(converted)
            image_url = upload_to_oss(converted)
        # ── audio URL ──────────────────────────────────────────────────────
        audio_url = args.audio_url
        if audio_url:
            audio_url = validate_http_https_url(audio_url, field="--audio-url")
        if not audio_url:
            if args.video:
                extracted = extract_audio_from_video(args.video)
                tmp_files.append(extracted)
                audio_url = upload_to_oss(extracted)
            elif args.audio:
                converted_audio = convert_audio(args.audio)
                if converted_audio != args.audio:
                    tmp_files.append(converted_audio)
                audio_url = upload_to_oss(converted_audio)
            else:
                p.error("Requires one of --audio, --audio-url, or --video")
        # ── Step 1: detect ─────────────────────────────────────────────────
        if not args.skip_detect:
            lp_detect(image_url)
        # ── Step 2: generate ───────────────────────────────────────────────
        video_url = lp_generate(
            image_url=image_url,
            audio_url=audio_url,
            template_id=args.template,
            eye_move_freq=args.eye_freq,
            video_fps=args.fps,
            mouth_move_strength=args.mouth_strength,
            paste_back=not args.no_paste_back,
            head_move_strength=args.head_strength,
        )
        print(f"\n✅ Done! video_url = {video_url}")
        if args.download:
            out_path = resolve_under_cwd(args.output, field="--output")
            safe_url = validate_http_https_url(video_url, field="result video URL")
            print(f"Downloading → {out_path} …")
            with urllib.request.urlopen(safe_url, timeout=300) as response:
                with open(out_path, 'wb') as f:
                    f.write(response.read())
            size_kb = out_path.stat().st_size // 1024
            print(f"Saved {out_path} ({size_kb}KB)")
    finally:
        # Remove temporary files created by conversion or audio extraction.
        for f in tmp_files:
            try:
                os.unlink(f)
            except Exception:
                pass


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"\nERROR: {e}", file=sys.stderr)
        sys.exit(1)
FILE:scripts/portrait_animate.py
#!/usr/bin/env python3
"""EMO talking-head video (DashScope).

Flow: face-detect -> video-synthesis -> poll tasks/{task_id}
"""
import argparse
import json
import os
import sys
import time
import urllib.request
from pathlib import Path

import requests

from input_validation import resolve_under_cwd, validate_http_https_url

BASE_URL = os.getenv("DASHSCOPE_BASE_URL", "https://dashscope.aliyuncs.com")
USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-avatar-video"


def _headers(async_mode: bool = False):
    key = os.getenv("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("Missing DASHSCOPE_API_KEY")
    h = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
        "User-Agent": USER_AGENT,
    }
    if async_mode:
        h["X-DashScope-Async"] = "enable"
    return h


_OSS_SIGNED_URL_EXPIRES = int(os.environ.get("OSS_SIGNED_URL_EXPIRES", str(3 * 24 * 3600)))  # default: 3 days


def upload_to_oss(local_path: str, expires: int = _OSS_SIGNED_URL_EXPIRES) -> str:
    """Upload a file to OSS and return a signed URL (private bucket). Default validity: 3 days."""
    import oss2

    auth = oss2.Auth(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    )
    bucket_name = os.environ["OSS_BUCKET"]
    endpoint = os.environ.get("OSS_ENDPOINT", "oss-cn-beijing.aliyuncs.com")
    # normalize: strip any existing scheme prefix to avoid double https://
    endpoint = endpoint.replace("https://", "").replace("http://", "").rstrip("/")
    bucket = oss2.Bucket(auth, f"https://{endpoint}", bucket_name)
    key = f"human-avatar/{Path(local_path).name}"
    bucket.put_object_from_file(key, local_path)
    # Signed URL that DashScope can download directly; expires after 3 days by default
    signed_url = bucket.sign_url("GET", key, expires)
    return signed_url


def emo_detect(image_url: str, ratio: str = "1:1"):
    url = f"{BASE_URL}/api/v1/services/aigc/image2video/face-detect"
    payload = {
        "model": "emo-detect-v1",
        "input": {"image_url": image_url},
        "parameters": {"ratio": ratio},
    }
    r = requests.post(url, headers=_headers(), json=payload, timeout=120)
    r.raise_for_status()
    data = r.json()
    out = data.get("output", {})
    if not out.get("check_pass"):
        raise RuntimeError(f"EMO detect failed: {out.get('message', out)}")
    return out["face_bbox"], out["ext_bbox"]


def emo_submit(image_url: str, audio_url: str, face_bbox, ext_bbox, style_level: str = "normal"):
    url = f"{BASE_URL}/api/v1/services/aigc/image2video/video-synthesis"
    payload = {
        "model": "emo-v1",
        "input": {
            "image_url": image_url,
            "audio_url": audio_url,
            "face_bbox": face_bbox,
            "ext_bbox": ext_bbox,
        },
        "parameters": {"style_level": style_level},
    }
    r = requests.post(url, headers=_headers(async_mode=True), json=payload, timeout=120)
    r.raise_for_status()
    data = r.json()
    task_id = data.get("output", {}).get("task_id")
    if not task_id:
        raise RuntimeError(f"No task_id in response: {json.dumps(data, ensure_ascii=False)}")
    return task_id


def wait_task(task_id: str, interval: int = 15, max_wait: int = 1800):
    url = f"{BASE_URL}/api/v1/tasks/{task_id}"
    start = time.time()
    while time.time() - start < max_wait:
        r = requests.get(url, headers=_headers(), timeout=60)
        r.raise_for_status()
        data = r.json()
        status = data.get("output", {}).get("task_status")
        print(f"status={status}")
        if status == "SUCCEEDED":
            return data.get("output", {}).get("results", {}).get("video_url")
        if status in ("FAILED", "CANCELED", "UNKNOWN"):
            raise RuntimeError(json.dumps(data, ensure_ascii=False))
        time.sleep(interval)
    raise TimeoutError("Task timeout")


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--image-url")
    p.add_argument("--audio-url")
    p.add_argument("--image")
    p.add_argument("--audio")
    p.add_argument("--ratio", default="1:1", choices=["1:1", "3:4"])
    p.add_argument("--style-level", default="normal", choices=["normal", "calm", "active"])
    p.add_argument("--download", action="store_true")
    p.add_argument("--output", default="emo_output.mp4")
    args = p.parse_args()

    image_url = args.image_url
    if image_url:
        image_url = validate_http_https_url(image_url, field="--image-url")
    elif args.image:
        image_url = upload_to_oss(args.image)

    audio_url = args.audio_url
    if audio_url:
        audio_url = validate_http_https_url(audio_url, field="--audio-url")
    elif args.audio:
        audio_url = upload_to_oss(args.audio)

    if not image_url or not audio_url:
        p.error("Need --image-url/--image and --audio-url/--audio")

    face_bbox, ext_bbox = emo_detect(image_url, ratio=args.ratio)
    task_id = emo_submit(image_url, audio_url, face_bbox, ext_bbox, style_level=args.style_level)
    print(f"task_id={task_id}")
    video_url = wait_task(task_id)
    print(f"video_url={video_url}")

    if args.download and video_url:
        out_path = resolve_under_cwd(args.output, field="--output")
        safe_url = validate_http_https_url(video_url, field="result video URL")
        with urllib.request.urlopen(safe_url, timeout=300) as response:
            with open(out_path, 'wb') as f:
                f.write(response.read())
        print(f"saved={out_path}")


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(1)
FILE:scripts/qwen_tts.py
#!/usr/bin/env python3
"""
Qwen TTS: text-to-speech (Qwen realtime speech synthesis)

SECURITY NOTES:
- base64.b64decode: used ONLY to decode audio PCM chunks received from
  Alibaba DashScope WebSocket API (response.audio.delta). No external
  input is evaluated or executed.
- DASHSCOPE_API_KEY / environment variables: read-only, never logged or
  transmitted to any third party. Used solely to authenticate with
  dashscope.aliyuncs.com (Alibaba Cloud official endpoint).
- No subprocess calls, no file system writes beyond the output WAV file.

Selects the model and voice automatically by scene; the default model is
qwen3-tts-vd-realtime-2026-01-15.
Output format: WAV (PCM is collected internally and converted automatically).

Usage:
    python qwen_tts.py --text "你好,欢迎来到未来。" --download
    python qwen_tts.py --text "今日股市大涨..." --scene news --download
    python qwen_tts.py --text "同学们,今天..." --scene education --voice Ethan --download
    python qwen_tts.py --text "亲爱的顾客..." --scene customer_service --download
    python qwen_tts.py --text "..." --model qwen3-tts-instruct-flash-realtime \
        --instructions "语速较快,带有明显的上扬语调,适合介绍时尚产品" --download

Dependencies:
    pip install dashscope scipy numpy
"""
import argparse
import base64
import io
import os
import struct
import sys
import threading
import time
from pathlib import Path

import dashscope
from dashscope.audio.qwen_tts_realtime import (
    QwenTtsRealtime,
    QwenTtsRealtimeCallback,
    AudioFormat,
)

from input_validation import read_text_file_limited, resolve_under_cwd

# ── User-Agent configuration ───────────────────────────────────────────────────
USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-avatar-video"

# ── WSS endpoints ──────────────────────────────────────────────────────────────
WSS_URL_CN = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
WSS_URL_INTL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"

# ── Model selection guide ──────────────────────────────────────────────────────
# Default = qwen3-tts-vd-realtime-2026-01-15 (Voice Design, text-described tones)
DEFAULT_MODEL = "qwen3-tts-vd-realtime-2026-01-15"
MODEL_GUIDE = {
    "vd": "qwen3-tts-vd-realtime-2026-01-15",        # voice design (voice customized from a text description)
    "vc": "qwen3-tts-vc-realtime-2026-01-15",        # voice cloning (cloned from an audio sample)
    "instruct": "qwen3-tts-instruct-flash-realtime", # instruction control (emotion / role / broadcast style)
    "flash": "qwen3-tts-flash-realtime",             # fast multilingual (customer service / chatbots)
    "legacy": "qwen-tts-realtime",                   # older stable version
}

# Scene → recommended model
SCENE_TO_MODEL = {
    "news": "qwen3-tts-instruct-flash-realtime",         # news broadcast
    "documentary": "qwen3-tts-instruct-flash-realtime",  # documentary
    "advertising": "qwen3-tts-instruct-flash-realtime",  # advertising
    "audiobook": "qwen3-tts-instruct-flash-realtime",    # audiobook
    "drama": "qwen3-tts-instruct-flash-realtime",        # radio drama / game voice-over
    "customer_service": "qwen3-tts-flash-realtime",      # customer service bot
    "chatbot": "qwen3-tts-flash-realtime",               # chatbot
    "education": "qwen3-tts-flash-realtime",             # education / explanation
    "ecommerce": "qwen3-tts-flash-realtime",             # e-commerce / live-stream selling
    "short_video": "qwen3-tts-flash-realtime",           # short-video voice-over
    "brand": DEFAULT_MODEL,                              # custom brand voice
    "default": DEFAULT_MODEL,
}

# Scene → recommended voice (system voices)
SCENE_TO_VOICE = {
    "news": "Serena",            # mature female voice, strong broadcast feel
    "documentary": "Ethan",      # steady male voice
    "advertising": "Cherry",     # lively female voice
    "audiobook": "Cherry",       # gentle female voice
    "drama": "Dylan",            # expressive
    "customer_service": "Anna",  # friendly female voice
    "chatbot": "Anna",
    "education": "Ethan",        # clear male voice
    "ecommerce": "Cherry",       # enthusiastic female voice
    "short_video": "Cherry",
    "brand": "Cherry",
    "default": "Cherry",
}

# All available system voices (with descriptions)
VOICES = {
    "Cherry": "Lively, sweet female voice, Chinese-first; suits advertising / audiobooks / voice-over",
    "Serena": "Mature, intellectual female voice; suits news / explanation / corporate image",
    "Ethan": "Steady, friendly male voice; suits education / documentaries / training",
    "Dylan": "Expressive male voice; suits radio drama / game voice-over",
    "Anna": "Gentle, friendly female voice; suits customer service / assistants / everyday use",
    "Chelsie": "Young, fresh female voice; suits short videos / e-commerce",
    "Thomas": "Deep, magnetic male voice; suits brand promotion / advertising",
    "Luna": "Warm, soft female voice; suits meditation / storytelling",
}

# ── PCM → WAV conversion ────────────────────────────────────────────────────────
def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Convert raw PCM bytes to WAV format."""
    buf = io.BytesIO()
    data_len = len(pcm_bytes)
    byte_rate = sample_rate * channels * sample_width
    block_align = channels * sample_width
    buf.write(b"RIFF")
    buf.write(struct.pack("<I", 36 + data_len))
    buf.write(b"WAVE")
    buf.write(b"fmt ")
    buf.write(struct.pack("<I", 16))  # chunk size
    buf.write(struct.pack("<H", 1))   # PCM format
    buf.write(struct.pack("<H", channels))
    buf.write(struct.pack("<I", sample_rate))
    buf.write(struct.pack("<I", byte_rate))
    buf.write(struct.pack("<H", block_align))
    buf.write(struct.pack("<H", sample_width * 8))
    buf.write(b"data")
    buf.write(struct.pack("<I", data_len))
    buf.write(pcm_bytes)
    return buf.getvalue()

# ── TTS client ─────────────────────────────────────────────────────────────────
class _TtsCollector(QwenTtsRealtimeCallback):
    def __init__(self):
        self.pcm_chunks: list[bytes] = []
        self.done_event = threading.Event()
        self.error: Exception | None = None

    def on_open(self) -> None:
        pass

    def on_close(self, code, msg) -> None:
        self.done_event.set()

    def on_event(self, response: dict) -> None:
        try:
            evt = response.get("type", "")
            if evt == "response.audio.delta":
                chunk = base64.b64decode(response["delta"])
                self.pcm_chunks.append(chunk)
            elif evt == "session.finished":
                self.done_event.set()
            elif evt == "error":
                self.error = RuntimeError(str(response))
                self.done_event.set()
        except Exception as e:
            self.error = e
            self.done_event.set()

    def collect(self) -> bytes:
        return b"".join(self.pcm_chunks)


def synthesize(
    text: str,
    model: str = DEFAULT_MODEL,
    voice: str = "Cherry",
    instructions: str = "",
    optimize_instructions: bool = True,
    sample_rate: int = 24000,
    speed: float = 1.0,
    timeout: int = 120,
    url: str = WSS_URL_CN,
) -> bytes:
    """
    Synthesize text to speech using Qwen TTS Realtime.

    Returns:
        WAV bytes
    """
    dashscope.api_key = os.environ.get("DASHSCOPE_API_KEY", dashscope.api_key)
    if not dashscope.api_key:
        raise RuntimeError("DASHSCOPE_API_KEY not set")

    # Only PCM_24000HZ_MONO_16BIT is universally available; use it regardless of sample_rate param
    audio_fmt = AudioFormat.PCM_24000HZ_MONO_16BIT
    sample_rate = 24000  # force to match

    # Set the User-Agent
    headers = {"User-Agent": USER_AGENT}
    collector = _TtsCollector()
    client = QwenTtsRealtime(
        model=model,
        callback=collector,
        url=url,
        headers=headers,  # pass the custom headers
    )
    client.connect()

    session_kwargs = dict(
        voice=voice,
        response_format=audio_fmt,
        mode="server_commit",
    )
    if speed != 1.0:
        session_kwargs["speed"] = speed
    if instructions:
        session_kwargs["instructions"] = instructions
        session_kwargs["optimize_instructions"] = optimize_instructions
    client.update_session(**session_kwargs)

    # Send text in chunks (improves latency for long text)
    max_chunk = 100
    for i in range(0, len(text), max_chunk):
        client.append_text(text[i:i + max_chunk])
        time.sleep(0.05)
    client.finish()

    collector.done_event.wait(timeout=timeout)
    if collector.error:
        raise collector.error
    pcm = collector.collect()
    if not pcm:
        raise RuntimeError("No audio received from TTS service")
    return pcm_to_wav(pcm, sample_rate=sample_rate)

# ── Main ────────────────────────────────────────────────────────────────────────
def main():
    p = argparse.ArgumentParser(
        description="Qwen TTS: text-to-speech (auto-selects model and voice by scene)"
    )
    p.add_argument("--text", help="Text to synthesize (or use --text-file)")
    p.add_argument("--text-file", help="Read the text from a file")
    p.add_argument("--scene", default="default",
                   choices=list(SCENE_TO_MODEL.keys()),
                   help="""Scene (auto-recommends model and voice):
  news             news broadcast
  documentary      documentary narration
  advertising      advertising
  audiobook        audiobook
  drama            radio drama / game voice-over
  customer_service customer service bot
  chatbot          chatbot
  education        education / explanation
  ecommerce        e-commerce / live streaming
  short_video      short-video voice-over
  brand            custom brand voice
  default          default (qwen3-tts-vd-realtime)""")
    p.add_argument("--model", default=None,
                   help=f"Model to use (overrides the --scene recommendation). Default: {DEFAULT_MODEL}")
    p.add_argument("--voice", default=None,
                   help=f"Voice to use (overrides the --scene recommendation). Options: {', '.join(VOICES)}")
    p.add_argument("--instructions", default="",
                   help="Natural-language instructions controlling delivery "
                        "(requires the qwen3-tts-instruct-flash-realtime model)")
    p.add_argument("--speed", type=float, default=1.0,
                   help="Speed multiplier, 0.5~2.0 (default: 1.0)")
    p.add_argument("--sample-rate", type=int, default=24000,
                   choices=[16000, 24000, 48000],
                   help="Sample rate (default: 24000)")
    p.add_argument("--intl", action="store_true",
                   help="Use the Singapore international endpoint (requires an international-region API key)")
    p.add_argument("--list-voices", action="store_true", help="List all voices with descriptions")
    p.add_argument("--list-models", action="store_true", help="List the model selection guide")
    p.add_argument("--download", action="store_true", help="Save the audio to a local file")
    p.add_argument("--output", default="tts_output.wav", help="Output file name (default: tts_output.wav)")
    args = p.parse_args()

    if args.list_voices:
        print("\n── Available voices ──")
        for name, desc in VOICES.items():
            print(f" {name:<10} {desc}")
        return

    if args.list_models:
        print("\n── Model selection guide ──")
        for scene, model in SCENE_TO_MODEL.items():
            voice = SCENE_TO_VOICE.get(scene, "Cherry")
            print(f" {scene:<20} model={model} voice={voice}")
        return

    # get text
    text = args.text
    if not text and args.text_file:
        text = read_text_file_limited(args.text_file, field="--text-file")
    if not text:
        p.error("--text or --text-file is required")

    # resolve model and voice
    model = args.model or SCENE_TO_MODEL.get(args.scene, DEFAULT_MODEL)
    voice = args.voice or SCENE_TO_VOICE.get(args.scene, "Cherry")
    url = WSS_URL_INTL if args.intl else WSS_URL_CN

    print(f"[tts] model={model} voice={voice} scene={args.scene}")
    print(f"[tts] text({len(text)}ch): {text[:60]}{'...' if len(text)>60 else ''}")
    if args.instructions:
        print(f"[tts] instructions: {args.instructions}")

    t0 = time.time()
    wav_bytes = synthesize(
        text=text,
        model=model,
        voice=voice,
        instructions=args.instructions,
        sample_rate=args.sample_rate,
        speed=args.speed,
        url=url,
    )
    elapsed = time.time() - t0
    size_kb = len(wav_bytes) // 1024
    print(f"\n✅ Done {size_kb}KB {elapsed:.1f}s")

    if args.download:
        out = resolve_under_cwd(args.output, field="--output")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(wav_bytes)
        print(f"Saved → {out} ({size_kb}KB)")
    else:
        # Write to stdout (for piping)
        sys.stdout.buffer.write(wav_bytes)


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:
        print("\n[interrupted]")
    except Exception as e:
        print(f"\nERROR: {e}", file=sys.stderr)
        sys.exit(1)
FILE:scripts/text_to_image.py
#!/usr/bin/env python3
"""
Wan (万相) text-to-image V2 (wan2.x-t2i)
Default model: wan2.2-t2i-flash

Usage:
    python text_to_image.py --prompt "一位优雅的女性站在樱花树下" --download
    python text_to_image.py --prompt "..." --model wan2.6-t2i --size 960*1696 --n 1
    python text_to_image.py --prompt "..." --negative-prompt "低质量,模糊" --download
"""
import argparse
import json
import os
import sys
import time
import urllib.request
from pathlib import Path

import requests

from input_validation import resolve_under_cwd, validate_http_https_url

BASE_URL = os.getenv("DASHSCOPE_BASE_URL", "https://dashscope.aliyuncs.com")
# wan2.6 supports synchronous calls; wan2.5 and below require async calls
_SYNC_MODELS = {"wan2.6-t2i"}
USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-avatar-video"


def _headers(async_mode: bool = False) -> dict:
    key = os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError("DASHSCOPE_API_KEY not set")
    h = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
        "User-Agent": USER_AGENT,
    }
    if async_mode:
        h["X-DashScope-Async"] = "enable"
    return h


def _wait_task(task_id: str, interval: int = 5, max_wait: int = 300) -> dict:
    """Poll async task until SUCCEEDED."""
    url = f"{BASE_URL}/api/v1/tasks/{task_id}"
    start = time.time()
    while time.time() - start < max_wait:
        r = requests.get(url, headers=_headers(), timeout=30)
        r.raise_for_status()
        data = r.json()
        status = data.get("output", {}).get("task_status", "UNKNOWN")
        elapsed = int(time.time() - start)
        print(f" [{elapsed}s] status={status}")
        if status == "SUCCEEDED":
            return data.get("output", {})
        if status in ("FAILED", "CANCELED", "UNKNOWN"):
            raise RuntimeError(f"Task failed: {json.dumps(data, ensure_ascii=False)}")
        time.sleep(interval)
    raise TimeoutError(f"Task {task_id} timed out")


def text_to_image(
    prompt: str,
    model: str = "wan2.2-t2i-flash",
    size: str = "1280*1280",
    n: int = 1,
    negative_prompt: str = "",
    prompt_extend: bool = True,
    seed: int = None,
) -> list[str]:
    """
    Call the Wan (万相) text-to-image API. Returns a list of image URLs.

    Args:
        prompt: positive prompt
        model: model name, default wan2.2-t2i-flash
        size: resolution as width*height, default 1280*1280
        n: number of images to generate, 1~4, default 1
        negative_prompt: negative prompt
        prompt_extend: whether to enable smart prompt rewriting, default True
        seed: random seed (optional)
    """
    is_sync_model = model in _SYNC_MODELS
    payload = {
        "model": model,
        "input": {
            "messages": [
                {"role": "user", "content": [{"text": prompt}]}
            ]
        },
        "parameters": {
            "size": size,
            "n": n,
            "prompt_extend": prompt_extend,
            "watermark": False,
        },
    }
    if negative_prompt:
        payload["parameters"]["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["parameters"]["seed"] = seed

    if is_sync_model:
        # Synchronous (wan2.6)
        endpoint = f"{BASE_URL}/api/v1/services/aigc/multimodal-generation/generation"
        print(f"[t2i] sync call model={model} size={size} n={n}")
        r = requests.post(endpoint, headers=_headers(), json=payload, timeout=120)
        r.raise_for_status()
        data = r.json()
        images = []
        for choice in data.get("output", {}).get("choices", []):
            for item in choice.get("message", {}).get("content", []):
                if item.get("type") == "image":
                    images.append(item["image"])
        if not images:
            raise RuntimeError(f"No images in response: {json.dumps(data, ensure_ascii=False)}")
        return images
    else:
        # Async (wan2.5 and below); also works for wan2.6
        endpoint = f"{BASE_URL}/api/v1/services/aigc/image-generation/generation"
        print(f"[t2i] async call model={model} size={size} n={n}")
        r = requests.post(endpoint, headers=_headers(async_mode=True), json=payload, timeout=60)
        r.raise_for_status()
        data = r.json()
        task_id = data.get("output", {}).get("task_id")
        if not task_id:
            raise RuntimeError(f"No task_id: {json.dumps(data, ensure_ascii=False)}")
        print(f" task_id={task_id}")
        out = _wait_task(task_id)
        results = out.get("results", [])
        # Use a distinct loop variable so the HTTP response `r` is not shadowed
        images = [item["url"] for item in results if item.get("url")]
        if not images:
            raise RuntimeError(f"No image URLs in result: {out}")
        return images


def main():
    p = argparse.ArgumentParser(description="Wan (万相) text-to-image V2")
    p.add_argument("--prompt", required=True, help="Positive prompt")
    p.add_argument("--model", default="wan2.2-t2i-flash",
                   help="Model name (default: wan2.2-t2i-flash); options include "
                        "wan2.6-t2i, wan2.5-t2i-preview, wan2.2-t2i-plus, etc.")
    p.add_argument("--size", default="1280*1280",
                   help="Resolution as width*height (default: 1280*1280); "
                        "recommended: 960*1696 (9:16), 1696*960 (16:9)")
    p.add_argument("--n", type=int, default=1, help="Number of images to generate, 1~4 (default: 1)")
    p.add_argument("--negative-prompt", default="", help="Negative prompt")
    p.add_argument("--no-prompt-extend", action="store_true", help="Disable smart prompt rewriting")
    p.add_argument("--seed", type=int, default=None, help="Random seed")
    p.add_argument("--download", action="store_true", help="Download the generated images locally")
    p.add_argument("--output-dir", default=".", help="Directory to save images in (default: current directory)")
    args = p.parse_args()

    images = text_to_image(
        prompt=args.prompt,
        model=args.model,
        size=args.size,
        n=args.n,
        negative_prompt=args.negative_prompt,
        prompt_extend=not args.no_prompt_extend,
        seed=args.seed,
    )
    print(f"\n✅ Generated {len(images)} image(s):")
    out_dir = resolve_under_cwd(args.output_dir, field="--output-dir")
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(images):
        print(f" [{i+1}] {url}")
        if args.download:
            safe_url = validate_http_https_url(url, field="image URL")
            filename = out_dir / f"t2i_{int(time.time())}_{i+1}.png"
            with urllib.request.urlopen(safe_url, timeout=300) as response:
                with open(filename, 'wb') as f:
                    f.write(response.read())
            size_kb = filename.stat().st_size // 1024
            print(f" → saved {filename} ({size_kb}KB)")
    return images


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"\nERROR: {e}", file=sys.stderr)
        sys.exit(1)
Alibaba Cloud Bailian Video Analysis Skill. Use for intelligent video comprehension and analysis via the Bailian (QuanMiaoLightApp) API. **Required API Produ...
---
name: alibabacloud-bailian-videoanalysis
description: |
Alibaba Cloud Bailian Video Analysis Skill. Use for intelligent video comprehension and analysis via the Bailian (QuanMiaoLightApp) API.
**Required API Product**: QuanMiaoLightApp (version 2024-08-01)
**Required API Actions**: SubmitVideoAnalysisTask, GetVideoAnalysisTask
**DO NOT use**: videorecog, Mts, or any other product for video analysis
Triggers: "analyze video", "understand video", "analyze the local video /temp/xxx.mp4", "analyze the local video https://xxx.com/temp/xxx.mp4", "what is this video about", "summarize this video", "split video into shots", "video comprehension", "extract video insights", "transcribe video", "extract video captions", "generate video title", "generate video outline", "video mindmap".
---
# Bailian Video Analysis
This skill provides video analysis functionality based on Alibaba Cloud Bailian Video Analysis Light Application. It uses the Bailian (QuanMiaoLightApp) API for intelligent video comprehension, including shot analysis, ASR transcription, title generation, caption extraction, and mind mapping.
**Architecture:** `CLI (Credential Chain) + OSS (File Storage) + Bailian Workspace + Quanmiao Video Analysis Service + Python SDK Scripts`
---
## Output Language Policy
**Detection Priority:** User input language → Environment language → **Default: Chinese (中文)**
**Apply to:** All user-facing messages, summaries, error explanations, and interactive prompts.
**Keep original language:** Code snippets, API responses, file paths, technical identifiers, HTTP status codes.
---
## Installation
### 1. Python Dependencies
```bash
python -m pip install -r scripts/requirements.txt
# On some systems, use python3 instead of python
```
### 2. Alibaba Cloud CLI
> **Pre-check: Aliyun CLI >= 3.3.1 required**
> Run `aliyun version` to verify >= 3.3.1. If not installed or version too low,
> see `references/cli-installation-guide.md` for installation instructions.
> Then **[MUST]** run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
> **AI-Mode Configuration for aliyun CLI**
>
> When using the aliyun CLI in this skill, you MUST configure AI-Mode settings:
>
> ```bash
> # Enable AI-Mode (required before using CLI commands)
> aliyun configure ai-mode enable
>
> # Set User-Agent for tracking (required)
> aliyun configure ai-mode set-user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis
>
> # Update plugins to latest versions (recommended)
> aliyun plugin update
> ```
>
> **Note:** After completing your tasks, you can disable AI-Mode:
> ```bash
> aliyun configure ai-mode disable
> ```
### 3. Important Prerequisite: Activate Bailian and Video Analysis Service
**Before using this skill, you must first activate the Video Analysis service in the Alibaba Cloud Bailian console!**
1. Visit the [Bailian Video Analysis page](https://bailian.console.aliyun.com/cn-beijing#/app/app-market/quanmiao/video-comprehend)
2. Click the **"Paid activation"** button
3. Confirm activation and wait for the service to take effect (usually 1-2 minutes)
### 4. Important Prerequisite: Guide Users to Grant Permissions in Alibaba Cloud Access Control and Bailian
**Before using this skill, you must grant the following permissions to the previously configured AK; otherwise API calls will fail, typically with 403 errors.**
#### 1) RAM Permission Grant
See [references/ram-policies.md](references/ram-policies.md) for the full list of required RAM permissions and authorization instructions.
> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Use `ram-permission-diagnose` skill to guide the user through requesting the necessary permissions
> 3. Pause and wait until the user confirms that the required permissions have been granted
#### 2) Bailian Workspace Permission Grant
1. Visit [Alibaba Cloud Bailian Permission Management](https://bailian.console.aliyun.com/cn-beijing?tab=app#/authority)
2. If the RAM user corresponding to the AK does not exist, click **"Add User"** in the upper right corner of the page, select the corresponding RAM user and click confirm to add.
3. The change takes about 30 seconds to take effect after configuration; wait before continuing.
---
## Authentication
> **Pre-check: Alibaba Cloud Credentials Required**
>
> **Security Rules:**
> - **NEVER** read, echo, or print AK/SK values (e.g., `echo $ALIBABA_CLOUD_ACCESS_KEY_ID` is FORBIDDEN)
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **NEVER** use `aliyun configure set` with literal credential values
> - **ONLY** use `aliyun configure list` to check credential status
>
> ```bash
> aliyun configure list
> ```
> Check the output for a valid profile (AK, STS, or OAuth identity).
>
> **If no valid profile exists, STOP here.**
> 1. Obtain credentials from [Alibaba Cloud Console](https://ram.console.aliyun.com/manage/ak)
> 2. Configure credentials **outside of this session** (via `aliyun configure` in terminal or environment variables in shell profile)
> 3. Return and re-run after `aliyun configure list` shows a valid profile
---
## Parameter Confirmation
> **IMPORTANT: Parameter Confirmation** — Before executing any command or API call,
> confirm user-provided or customizable parameters (video source, OSS bucket, OSS object key).
> System-auto-resolved parameters (workspace_id, default OSS bucket) do NOT
> require explicit confirmation unless the user wants to override them.
| Parameter | Type | Description | Default / Resolution |
|------------------|---------------|-------------------------------------------|------------------------------------------------------------------------------------------|
| `video_source` | Required | Local file path OR downloadable video URL | N/A (user must provide) |
| `workspace_id`   | Auto-resolved | Bailian workspace ID                      | Auto-detected (user may override)                                                        |
| `ossBucket` | Optional | OSS bucket name for file upload | Auto-detect from first available bucket; user may specify (e.g. `--ossBucket my-bucket`) |
| `ossObjectKey` | Optional | OSS object key for the uploaded file | `/temp/quanmiao/YYYYMMDD/filename` |
| `expireSeconds` | Optional | Temporary URL expiration time (seconds) | `14400` (4 hours) |
**Confirmation Workflow:**
1. **Auto-detection first**: The skill will auto-detect `workspace_id` and `ossBucket` when possible
2. **User override**: If user wants to specify custom values, confirm before using
3. **Local vs URL**: Confirm whether user is providing a local file path or a public URL
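
As a rough illustration of the default `ossObjectKey` in the table above, the sketch below builds a key of the form `/temp/quanmiao/YYYYMMDD/filename` from a local path. The function name `default_object_key` and its `today` parameter are hypothetical and not part of the skill's scripts. The source does not say whether the leading slash must be stripped before the key is placed into `oss://{oss_bucket}/{oss_object_key}`, so the sketch simply reproduces the documented format.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def default_object_key(video_source: str, today: Optional[date] = None) -> str:
    """Build the documented default key: /temp/quanmiao/YYYYMMDD/filename."""
    day = (today or date.today()).strftime("%Y%m%d")
    # Normalize Windows separators so only the final path component is kept.
    filename = PurePosixPath(video_source.replace("\\", "/")).name
    return f"/temp/quanmiao/{day}/{filename}"


print(default_object_key("/data/videos/demo.mp4", date(2025, 1, 2)))
```

Passing `today` explicitly keeps the helper deterministic; in normal use it can be omitted so the current date is used.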
---
## Core Workflow
> **⚠️ CRITICAL: Cloud API Mandatory** — This skill MUST use Bailian (QuanMiaoLightApp) API for video analysis. Local tools (ffmpeg, whisper, OpenCV, ffprobe, mediainfo, etc.) are FORBIDDEN. If API calls fail due to credentials or permissions, follow Permission Failure Handling process — DO NOT fall back to local analysis.
### Step 1: Environment Check
Run `python scripts/check_env.py` to verify:
- Python packages are installed
- Credentials are configured via default credential chain
**If check_env.py fails or returns {"ready": false}**:
- Packages missing → Run `python -m pip install -r scripts/requirements.txt`
- Credentials missing or invalid → **Follow Permission Failure Handling process**:
1. Read `references/ram-policies.md` to get required permissions
2. Use `ram-permission-diagnose` skill to guide user through permission request
3. Wait for user confirmation before proceeding
4. DO NOT proceed with local analysis tools
**Expected output:** `{"ready": true}` indicates environment is properly configured.
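
A minimal sketch of how the check result might be interpreted, assuming `check_env.py` prints a single JSON object with a boolean `ready` field to stdout. The helper name `env_is_ready` is hypothetical, and any additional fields the script may emit are ignored.

```python
import json


def env_is_ready(stdout_text: str) -> bool:
    """Return True only when the output is a JSON object with "ready": true."""
    try:
        result = json.loads(stdout_text)
    except json.JSONDecodeError:
        # Output that is not valid JSON is treated as "not ready".
        return False
    return isinstance(result, dict) and result.get("ready") is True


print(env_is_ready('{"ready": true}'))
print(env_is_ready('{"ready": false}'))
```

Treating unparseable output as "not ready" keeps the workflow on the safe path: the skill stops and follows the failure handling above instead of proceeding.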
### Step 2: Get Workspace ID
**Do not ask the user for workspace_id upfront.** Always auto-fetch available workspaces first:
```bash
aliyun modelstudio list-workspaces --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis
```
**Workspace selection logic:**
- **Single workspace returned** → use it directly, no need to prompt the user
- **Multiple workspaces returned** → display a numbered list and proceed with the following:
1. **Default behavior**: Use the first workspace in the list automatically to avoid unnecessary interaction
2. **User explicitly requests selection**: If the user says "let me choose workspace", "show me the workspace list", or similar, present the full list and ask them to pick one
- **No workspaces returned** → inform the user that no Bailian workspace is available, guide them to create one at the [Bailian Console](https://bailian.console.aliyun.com/)
- **Record user selection** in the session to avoid repeated inquiries
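
The selection logic above can be sketched as a small function. This is an illustration only: the name `select_workspace`, the 1-based `user_choice` argument, and the `WorkspaceId` field in the example data are assumptions, since the source does not show the response shape of `list-workspaces`.

```python
from typing import Optional


def select_workspace(workspaces: list, user_choice: Optional[int] = None) -> Optional[dict]:
    """Pick a workspace following the rules above.

    Returns None when no workspace exists, so the caller can point the
    user to the Bailian Console.
    """
    if not workspaces:
        return None
    if len(workspaces) == 1:
        return workspaces[0]
    if user_choice is not None:
        # The user explicitly asked to choose; the numbered list is 1-based.
        return workspaces[user_choice - 1]
    # Default behavior: use the first workspace to avoid extra interaction.
    return workspaces[0]


candidates = [{"WorkspaceId": "ws-a"}, {"WorkspaceId": "ws-b"}]
print(select_workspace(candidates))
print(select_workspace(candidates, user_choice=2))
```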
### Step 3: Upload File (video_source) to OSS
Based on the input resource type from Input Resource Validation:
**Case A: User provided a downloadable URL**
→ Verify URL accessibility: Test if the URL is downloadable using appropriate method for your OS
→ If the URL is accessible, skip the upload and use the video_source as `file_url` in Step 4.
**Case B: User provided a local file path**
→ Auto-detect the OSS bucket, upload the local file to OSS, and get a temporary URL (`file_url`) for Step 4:
- **(1) Auto-detect or use user-specified OSS bucket**:
- If user specifies `--ossBucket <bucket_name>`, attempt to use that bucket
- **If the specified bucket returns 403 AccessDenied or BucketAlreadyExists**: DO NOT switch to another bucket automatically. Instead:
1. Inform the user that the specified bucket is not accessible
2. Follow the Permission Failure Handling process in RAM Policy section
3. Guide user to grant OSS bucket access permissions or specify an alternative bucket they own
4. Wait for user confirmation before proceeding
- If no bucket specified, auto-detect from first available bucket
```bash
aliyun ossutil ls --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis
```
- **(2) Upload file to OSS**: Generate a unique key (`oss_object_key`) for the uploaded file.
**IMPORTANT - Upload Path Restriction:**
- **Default path**: MUST use `/temp/quanmiao/YYYYMMDD/filename` format (auto-generated with current date)
- **Custom path**: ONLY if user explicitly specifies a custom oss_object_key, otherwise always use default path
- **Security rule**: NEVER upload files outside `/temp/quanmiao/` prefix unless user explicitly requests it
```bash
aliyun ossutil cp <video_source> oss://{oss_bucket}/{oss_object_key} --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis --region {oss_region}
```
- **(3) Generate temporary URL**: Generate a temporary URL for the uploaded file using the `ossutil sign` command.
- `--expireSeconds`: Default 14400s (4 hours), confirm if different value needed
```bash
aliyun ossutil sign oss://{oss_bucket}/{oss_object_key} --expires-duration {expire_seconds} --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis --region {oss_region}
```
- **(4) Verify URL accessibility**: Test whether the generated URL is downloadable, using an appropriate method for your OS.
- **Note**: Prefer a GET request over a HEAD request for verification, as some OSS signature versions may reject HEAD requests.
**Recommended methods for checking that the URL is downloadable:**
- **macOS/Linux**: `curl -L --connect-timeout 10 --max-time 30 -o /dev/null -w "%{http_code}" <file_url>` (returns HTTP status code)
- **Windows**: `Invoke-WebRequest -Uri <file_url> -Method Get -TimeoutSec 30` (PowerShell)
**Validation criteria:**
- **HTTP 200** → URL is valid and accessible, proceed to Step 4
- **HTTP 403/404** → URL expired or invalid, regenerate with `ossutil sign`
- **Other errors** → Check network or OSS permissions
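The validation criteria above amount to a small decision table. The following is a minimal sketch of that mapping. The function name `next_action_for_status` and the returned action labels are illustrative assumptions, not part of the skill's scripts.

```python
def next_action_for_status(http_code: int) -> str:
    """Map the HTTP status from the URL check to the next workflow action.

    The action labels are hypothetical names used only for this sketch.
    """
    if http_code == 200:
        # URL is valid and accessible.
        return "proceed_to_step_4"
    if http_code in (403, 404):
        # URL expired or invalid: sign the object again.
        return "regenerate_signed_url"
    # Anything else points at network or OSS permission problems.
    return "check_network_or_oss_permissions"
```

The status code itself would come from the `curl -w "%{http_code}"` output or the PowerShell response object.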
### Step 4: Submit Video Analysis Task
> **⚠️ MANDATORY API CALL** — You MUST call SubmitVideoAnalysisTask on QuanMiaoLightApp product (version 2024-08-01). Do NOT use videorecog, Mts, or any other product. Do NOT attempt local analysis.
> **API Selection Checklist** — Before calling, verify:
> - ✅ Product: QuanMiaoLightApp (NOT videorecog, NOT Mts)
> - ✅ Version: 2024-08-01
> - ✅ Action: SubmitVideoAnalysisTask
> - ✅ Parameters: workspace_id, file_url
```bash
python scripts/quanmiao_submit_videoAnalysis_task.py --workspace_id <workspace_id> --file_url <file_url>
```
**Parameters requiring confirmation:**
- `--workspace_id`: From Step 2 (confirm with user)
- `--file_url`: From Step 3 upload result or user-provided URL (confirm validity)
**Error Handling**:
- If API returns 401 InvalidApiKey or 403 AccessDenied: **STOP** and follow Permission Failure Handling process
- Do NOT attempt alternative APIs or local tools
- Inform user: "Video analysis requires Bailian service activation and proper RAM permissions. Please follow the permission grant guide."
Returns `task_id` for polling.
### Step 5: Poll for Task Result
> **⚠️ MANDATORY API CALL** — You MUST poll GetVideoAnalysisTask on QuanMiaoLightApp product (version 2024-08-01) until status is SUCCESSED. Do NOT generate summary from local tools or filename inference.
Video analysis is asynchronous. Poll until completion:
**Task Status:** `PENDING` → `RUNNING` → `SUCCESSED` | `FAILED` | `CANCELED`
**Variables:**
- `result_json_path`: `~/.quanmiao/videoanalysis/<video_filename_without_ext>_<task_id>.json`
- `index_file`: `~/.quanmiao/videoanalysis/index.jsonl`
**Polling Loop:**
1. Wait 10 seconds after submission
2. Run: `python scripts/quanmiao_get_videoAnalysis_task_result.py --workspace_id <workspace_id> --task_id <task_id> --save_path <result_json_path>`
3. Check the returned `status` field:
- **`SUCCESSED`** → Script auto-saves JSON to `result_json_path`, append entry to `index_file`, display saved locations, then proceed to Step 6
- **`FAILED`** or **`CANCELED`** → check error message, inform user, stop
- **`PENDING`** or **`RUNNING`** → display any partial results available, wait 10s, repeat from step 2
4. Max 180 retries (approximately 30 minutes)
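The polling loop above can be sketched as follows. This is an illustration under stated assumptions: `fetch_status` is a hypothetical callable that wraps one run of `quanmiao_get_videoAnalysis_task_result.py` and returns its parsed JSON with a `status` field; the skill itself does not ship this helper.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_FAILURES = {"FAILED", "CANCELED"}


def poll_task(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 10.0,
    max_retries: int = 180,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task succeeds, fails, or the retry budget runs out.

    `fetch_status` is a hypothetical wrapper around the result script.
    """
    sleep(interval_s)  # initial wait after submission
    for _ in range(max_retries):
        result = fetch_status()
        status = result.get("status")
        if status == "SUCCESSED":
            return result
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Task ended with status {status}")
        # PENDING or RUNNING: a caller could display partial results here.
        sleep(interval_s)
    raise TimeoutError(f"Task did not finish after {max_retries} polls")
```

Injecting `sleep` keeps the loop testable without real waiting; with the defaults, 180 polls at 10 seconds each gives the roughly 30-minute budget stated above.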
**When taskStatus = SUCCESSED:**
1. **Append to index file** (`index_file`):
```json
{"task_id": "<task_id>", "video_source": "<original_path_or_url>", "workspace_id": "<workspace_id>", "result_file": "<result_json_path>", "timestamp": "<ISO8601>"}
```
2. **Display saved locations:**
```
✅ Files saved successfully:
- Raw JSON result: <result_json_path>
- Index updated: <index_file>
```
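Appending to the index file could look like the sketch below. It assumes the entry fields shown in the JSON example above; the helper name `append_index_entry` is hypothetical and not one of the skill's scripts.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def append_index_entry(
    index_file: str,
    task_id: str,
    video_source: str,
    workspace_id: str,
    result_file: str,
) -> Dict[str, str]:
    """Append one JSON line describing a finished task to the index file."""
    entry = {
        "task_id": task_id,
        "video_source": video_source,
        "workspace_id": workspace_id,
        "result_file": result_file,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
    }
    path = Path(index_file).expanduser()  # supports paths such as ~/.quanmiao/...
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry
```

Opening the file in append mode keeps earlier entries intact, which matches the JSON Lines layout implied by the `.jsonl` extension.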
**Parameters requiring confirmation:**
- `--workspace_id`: Same as Step 4 (confirm consistency)
- `--task_id`: From Step 4 submission result (verify before polling)
### Step 6: Summarize Video Content
**CRITICAL: Use the results from Step 5 directly. Do NOT call the API again. Do NOT re-execute any analysis.**
Extract data from the SUCCESSED response obtained in Step 5 and summarize according to user requirements.
**Case A: If the user has a specific analysis request** (e.g., "analyze the speaker's body language", "extract key business insights", "compare two people in the video"), base your answer primarily on:
- **`payload.output.videoGenerateResults`** — scene-by-scene analysis, descriptions, interpretations
- **`payload.output.videoAnalysisResult.text`** — visual shot analysis, object/person recognition, action detection
Combine these fields to construct a targeted answer. Supplement with other fields (captions, mind map, title) as context if relevant.
**Case B: If no specific request**, use the standard output format: Title → Outline → Summary → Captions → Shot Analysis → Timeline → Token Usage
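For Case A, pulling the two primary fields out of the saved result might look like this. It is a sketch that assumes the response layout shown in the verification reference (`payload.output.*`); the function name and the keys of the returned dictionary are illustrative.

```python
from typing import Any, Dict


def collect_analysis_text(result: Dict[str, Any]) -> Dict[str, Any]:
    """Extract title, scene-level text, and shot-level text from a result.

    Missing or null fields fall back to empty values instead of raising.
    """
    output = (result.get("payload") or {}).get("output") or {}
    scene_items = output.get("videoGenerateResults") or []
    return {
        "title": (output.get("videoTitleGenerateResult") or {}).get("text", ""),
        "scene_analysis": [item.get("text", "") for item in scene_items],
        "shot_analysis": (output.get("videoAnalysisResult") or {}).get("text", ""),
    }
```

The input would be the JSON already saved at `result_json_path` in Step 5, so no further API call is involved.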
---
## Important Constraints
- **Cloud-only:** No local fallbacks (ffmpeg, whisper, etc.). If cloud API fails, follow Permission Failure Handling process.
- **Violation Consequence:** Using local tools instead of QuanMiaoLightApp API will result in task failure.
- **Security:** Never expose credentials in logs or prompts
- **Permissions:** On auth errors, see `references/ram-policies.md`
- **Caching:** Check `~/.quanmiao/videoanalysis/index.jsonl` before re-analyzing same video
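The caching constraint can be sketched as a lookup over the index file. This assumes each line of `index.jsonl` is a JSON object with a `video_source` field, as in the index entry format from Step 5; `find_cached_result` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def find_cached_result(index_file: str, video_source: str) -> Optional[Dict[str, Any]]:
    """Return the most recent index entry for a video, or None if absent."""
    path = Path(index_file).expanduser()
    if not path.exists():
        return None
    latest = None
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if entry.get("video_source") == video_source:
                latest = entry  # later lines are newer, so keep overwriting
    return latest
```

If an entry is found, its `result_file` points at the saved JSON, which can be summarized directly instead of submitting a new task.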
---
## Success Verification
See [references/verification-method.md](references/verification-method.md) for step-by-step verification commands and expected outcomes.
---
## Cleanup
To clean up resources created by this skill:
**Delete uploaded OSS objects:**
```bash
aliyun ossutil rm oss://{oss_bucket}/{oss_object_key} --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis
```
**Cleanup best practices:**
- Confirm bucket name and oss object key before deletion
- Only delete objects with `/temp/quanmiao/` prefix to avoid accidental data loss
- Cached results at `~/.quanmiao/videoanalysis/` can be kept for future reference or deleted manually
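The prefix rule above can be enforced with a small guard before calling `ossutil rm`. This is a sketch only: the function name is hypothetical, and the rejection of `..` segments is an extra precaution that the original text does not require.

```python
SAFE_PREFIX = "temp/quanmiao/"


def is_safe_to_delete(oss_object_key: str) -> bool:
    """Allow deletion only for objects under the temp/quanmiao/ prefix."""
    normalized = oss_object_key.lstrip("/")  # accept keys written with a leading slash
    if ".." in normalized.split("/"):
        return False  # extra precaution (assumption): refuse traversal-like keys
    # Require something after the prefix so the prefix itself is never targeted.
    return normalized.startswith(SAFE_PREFIX) and len(normalized) > len(SAFE_PREFIX)
```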
---
## Best Practices
1. **Always verify environment first** — run `check_env.py` before any other operation to catch missing dependencies or credentials early.
2. **Auto-detect workspace_id** — always fetch workspaces via `list-workspaces`; default to the first result, but present a selection list when the user explicitly asks to choose.
3. **Use default OSS settings** — unless the user specifies a particular bucket, let the script auto-detect the bucket and generate the oss object key.
4. **Display partial results during polling** — when task status is `RUNNING`, show available results (title, captions) to give the user real-time feedback.
5. **Save complete result for summary** — when status becomes `SUCCESSED`, use the full result payload directly for Step 6 without re-calling the API.
6. **Respect URL expiration** — temporary OSS URLs expire after `expireSeconds` (default 14400s); ensure the task is submitted before the URL expires.
7. **Handle permission errors gracefully** — follow the Permission Failure Handling process in the RAM Policy section; never improvise credential fixes.
---
## Command Tables
See [references/related-commands.md](references/related-commands.md) for the full list of available scripts and their parameters.
---
## Reference Links
| Reference | Purpose |
|------------------------------------------|----------------------------------------------------------|
| `references/cli-installation-guide.md` | Installing and upgrading Aliyun CLI |
| `references/ram-policies.md` | RAM permission checklist and authorization guide |
| `references/acceptance-criteria.md` | Acceptance criteria and correct/incorrect usage patterns |
| `references/related-commands.md` | Available scripts and CLI command reference |
| `references/verification-method.md` | Step-by-step success verification commands |
---
## Troubleshooting
**Common scenarios:**
- **Permission denied** → See [ram-policies.md](references/ram-policies.md)
- **CLI not found** → See [cli-installation-guide.md](references/cli-installation-guide.md)
- **Workspace not found** → Create at [Bailian Console](https://bailian.console.aliyun.com/)
- **Upload failed** → Check OSS bucket permissions
- **Task timeout** → Video too large or network issues
---
FILE:references/acceptance-criteria.md
# Acceptance Criteria: alibabacloud-bailian-videoanalysis
**Scenario**: Alibaba Cloud Bailian (Quanmiao) Video Analysis
**Purpose**: Skill testing acceptance criteria
---
# Correct Python Script Usage Patterns
## 1. Environment Check
#### ✅ CORRECT
```bash
python scripts/check_env.py
```
- Returns JSON with `ready: true/false`
- Checks Python packages and credentials via default credential chain
- Does NOT read or print AK/SK values
#### ❌ INCORRECT
```bash
echo $ALIBABA_CLOUD_ACCESS_KEY_ID # NEVER print credentials
python scripts/check_env.py --ak xxx --sk xxx # NEVER pass AK/SK as arguments
```
- Credentials are checked via the default credential chain only
- The script uses `CredentialClient` internally; no manual credential passing
## 2. Workspace Listing
#### ✅ CORRECT
```bash
aliyun modelstudio list-workspaces --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis
```
- Returns JSON array of available workspaces
- Auto-detects workspace_id; does NOT require user to know it in advance
- Uses `--user-agent` parameter for tracking
#### ❌ INCORRECT
```bash
aliyun modelstudio list-workspaces # Missing --user-agent parameter
```
## 3. File Upload to OSS
#### ✅ CORRECT
```bash
# List buckets first
aliyun ossutil ls --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis
# Upload file
aliyun ossutil cp /path/to/video.mp4 oss://my-bucket/temp/quanmiao/20260409/video.mp4 --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis --region cn-beijing
# Generate temporary URL
aliyun ossutil sign oss://my-bucket/temp/quanmiao/20260409/video.mp4 --expires-duration 7200 --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis --region cn-beijing
```
- Uses auto-detected bucket and generated object key by default
- All commands include `--user-agent` parameter
- Optionally specifies a custom `--expires-duration` (7200s in this example; the skill default is 14400s)
#### ❌ INCORRECT
```bash
aliyun ossutil cp /path/to/video.mp4 oss://my-bucket/key # Missing --user-agent and --region
aliyun ossutil sign oss://my-bucket/key # Missing required parameters
```
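The default object key used in the correct example (`temp/quanmiao/20260409/video.mp4`) can be derived from the local path and the current date. A minimal sketch, assuming the key is written without a leading slash as in the `oss://my-bucket/temp/quanmiao/...` example; `default_object_key` is a hypothetical helper.

```python
from datetime import date
from pathlib import PurePath
from typing import Optional


def default_object_key(local_path: str, today: Optional[date] = None) -> str:
    """Build temp/quanmiao/YYYYMMDD/<filename> for a local video file."""
    today = today or date.today()
    filename = PurePath(local_path).name  # drop any directory components
    return f"temp/quanmiao/{today:%Y%m%d}/{filename}"
```

For example, `default_object_key("/path/to/video.mp4", date(2026, 4, 9))` yields the key used in the upload command above.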
## 4. Submit Video Analysis Task
#### ✅ CORRECT
```bash
python scripts/quanmiao_submit_videoAnalysis_task.py --workspace_id llm-xxx --file_url "https://..."
```
- workspace_id must come from Step 2
- file_url must come from Step 3 (the temporary URL printed by `ossutil sign`) or be a URL provided by the user
#### ❌ INCORRECT
```bash
python scripts/quanmiao_submit_videoAnalysis_task.py --workspace_id fake-id --file_url "invalid-url" # Invalid workspace_id or URL
python scripts/quanmiao_submit_videoAnalysis_task.py # Missing required parameters
```
- Both `--workspace_id` and `--file_url` are required
- file_url must be a valid temporary OSS URL
## 5. Get Task Result
#### ✅ CORRECT
```bash
python scripts/quanmiao_get_videoAnalysis_task_result.py --workspace_id llm-xxx --task_id abc123
```
- Poll every 10-15 seconds until status is `SUCCESSED`, `FAILED`, or `CANCELED`
- Handle `RUNNING` status by displaying partial results
#### ❌ INCORRECT
```bash
python scripts/quanmiao_get_videoAnalysis_task_result.py --workspace_id llm-xxx --task_id abc123 --retry 1000 # No --retry parameter exists
```
- The script returns the current task status; polling logic is handled externally
- Max retries should be 180 (approximately 30 minutes)
## 6. Authentication
#### ✅ CORRECT
```bash
aliyun configure list # Check credential status
```
- Uses `aliyun configure list` to verify credentials
- Relies on Alibaba Cloud default credential chain
#### ❌ INCORRECT
```bash
aliyun configure set --mode AK --access-key-id LTAI... --access-key-secret abc... # NEVER set credentials within the session
echo $ALIBABA_CLOUD_ACCESS_KEY_ID # NEVER print credential values
```
- Credentials must be configured outside of the session
- Never read, echo, or print AK/SK values
# Common Anti-Patterns
## Hardcoding User-Specific Parameters
#### ❌ INCORRECT
```bash
# Assuming a specific workspace_id without checking
python scripts/quanmiao_submit_videoAnalysis_task.py --workspace_id llm-known-good --file_url ...
```
- Always fetch workspaces via Step 2 (`list-workspaces`) first; default to the first result or present a selection list if the user explicitly asks to choose
## Skipping Environment Check
#### ❌ INCORRECT
```bash
# Jumping directly to OSS upload without checking environment
aliyun ossutil cp /path/to/video.mp4 oss://my-bucket/key --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis --region cn-beijing
```
- Always run `check_env.py` first to ensure dependencies and credentials are ready
## Re-calling API After Success
#### ❌ INCORRECT
```bash
# Step 5 returned SUCCESSED, but calling get_result again in Step 6
python scripts/quanmiao_get_videoAnalysis_task_result.py --workspace_id llm-xxx --task_id abc123
```
- Use the result from Step 5 directly in Step 6; do NOT call the API again
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
## Table of Contents
- [Installation](#installation)
- [macOS](#macos)
- [Linux](#linux)
- [Windows](#windows)
- [Configuration](#configuration)
- [Quick Start](#quick-start)
- [Configuration Modes](#configuration-modes)
- [Environment Variables](#environment-variables)
- [Managing Multiple Profiles](#managing-multiple-profiles)
- [Credential Priority](#credential-priority)
- [Verification](#verification)
- [Security Best Practices](#security-best-practices)
- [Troubleshooting](#troubleshooting)
- [Advanced Configuration](#advanced-configuration)
- [Next Steps](#next-steps)
- [References](#references)
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.
>
> **[IMPORTANT]** After installation, run the following command to enable automatic plugin installation:
> ```bash
> aliyun configure set --auto-plugin-install true
> ```
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```
**Using Binary**
```bash
# Download
curl -LO https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
# Add to PATH (requires admin privileges)
$env:Path += ";C:\aliyun-cli"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::Machine)
# Verify
aliyun version
```
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports 6 authentication modes. All examples below use non-interactive flags.
#### 1. AK Mode (Access Key)
Most common mode for personal accounts and scripts.
```bash
aliyun configure set \
--mode AK \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Configuration is stored in `~/.aliyun/config.json`:
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI5tXXXXXXXX",
"access_key_secret": "8dXXXXXXXXXXXXXXXXXXXXXXXX",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "en"
}
]
}
```
#### 2. StsToken Mode (Temporary Credentials)
For short-lived access (tokens expire in 1-12 hours).
```bash
aliyun configure set \
--mode StsToken \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--sts-token v1.0:XXXXXXXXXXXXXXXX \
--region cn-hangzhou
```
Use cases: CI/CD pipelines, temporary access for external contractors, cross-account access.
#### 3. RamRoleArn Mode (Assume RAM Role)
Assume a RAM role for elevated or cross-account access.
```bash
aliyun configure set \
--mode RamRoleArn \
--access-key-id LTAI5tXXXXXXXX \
--access-key-secret 8dXXXXXXXXXXXXXXXXXXXXXXXX \
--ram-role-arn acs:ram::123456789012:role/AdminRole \
--role-session-name my-session \
--region cn-hangzhou
```
Use cases: cross-account resource access, temporary elevated privileges, role-based access control.
#### 4. EcsRamRole Mode (ECS Instance RAM Role)
Use the RAM role attached to an ECS instance — no credentials needed.
```bash
aliyun configure set \
--mode EcsRamRole \
--ram-role-name MyEcsRole \
--region cn-hangzhou
```
Requirements: must be running on an ECS instance with a RAM role attached.
Use cases: scripts and automation running on ECS instances.
#### 5. RsaKeyPair Mode (RSA Key Pair)
Use RSA key pair for authentication (generate key pair in Aliyun Console first).
```bash
aliyun configure set \
--mode RsaKeyPair \
--private-key /path/to/private-key.pem \
--key-pair-name my-key-pair \
--region cn-hangzhou
```
#### 6. RamRoleArnWithEcs Mode (ECS + RAM Role)
Combine ECS instance role with RAM role assumption for cross-account access from ECS.
```bash
aliyun configure set \
--mode RamRoleArnWithEcs \
--ram-role-name MyEcsRole \
--ram-role-arn acs:ram::123456789012:role/TargetRole \
--role-session-name my-session \
--region cn-hangzhou
```
### Environment Variables
**Overrides the config file** (see Credential Priority for the full order)
**Access Key Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**STS Token Mode**
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_SECURITY_TOKEN=your_sts_token
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
**ECS RAM Role Mode**
```bash
export ALIBABA_CLOUD_ECS_METADATA=role_name
```
**Use Cases**:
- CI/CD pipelines
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
```bash
aliyun configure set --profile projectA \
--mode AK \
--access-key-id LTAI5tAAAAAAAA \
--access-key-secret 8dAAAAAAAAAAAAAAAAAAAAAAAA \
--region cn-hangzhou
aliyun configure set --profile projectB \
--mode AK \
--access-key-id LTAI5tBBBBBBBB \
--access-key-secret 8dBBBBBBBBBBBBBBBBBBBBBBBB \
--region cn-shanghai
```
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON array of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "China East 1 (Hangzhou)"
},
...
]
},
"RequestId": "..."
}
```
**If failed**, you'll see error messages:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
- **Don't**: Use Aliyun root account credentials
- **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
```bash
# Create new access key in RAM Console, then update configuration
aliyun configure set --access-key-id NEW_KEY --access-key-secret NEW_SECRET
# Delete old access key from console
```
### 4. Use STS Tokens for Temporary Access
```bash
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token XXXX --region cn-hangzhou
```
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
# Add to .gitignore
echo "~/.aliyun/config.json" >> .gitignore
# Use environment variables in CI/CD instead
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token
aliyun configure set --mode StsToken \
--access-key-id XXXX --access-key-secret XXXX \
--sts-token NEW_TOKEN --region cn-hangzhou
```
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.1+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/ram-policies.md
# RAM Permissions
This Skill requires the following Alibaba Cloud RAM permissions to function properly.
## Required Permissions
| Product | Action | Description |
|------------------|--------------------------------------------|--------------------------------------|
| ModelStudio | `modelstudio:ListWorkspaces` | List Bailian workspaces |
| OSS | `ossutil:ls` | List buckets or objects |
| OSS | `ossutil:cp` | Upload, Download or Copy Objects |
| OSS | `ossutil:presign` | Generate a pre-signed URL for object |
| QuanMiaoLightApp | `quanmiaolightapp:SubmitVideoAnalysisTask` | Submit video analysis task |
| QuanMiaoLightApp | `quanmiaolightapp:GetVideoAnalysisTask` | Get video analysis task results |
## Permission Details
### modelstudio:ListWorkspaces
Used to query the list of available Bailian workspaces.
### ossutil:ls, ossutil:cp, ossutil:presign
Used to manage OSS buckets and objects, including listing buckets/objects, uploading/downloading files, and generating temporary access URLs.
### quanmiaolightapp:SubmitVideoAnalysisTask
Used to submit video analysis tasks to Bailian service.
### quanmiaolightapp:GetVideoAnalysisTask
Used to query video analysis task results.
## Authorization Methods
### Use System Policies (Recommended)
1. Visit [Alibaba Cloud RAM Console](https://ram.console.aliyun.com/users)
2. Select the target RAM user
3. Click "Add Permissions" button
4. Search and select the following system policies:
- `AliyunBailianFullAccess` (includes Bailian-related permissions)
- `AliyunModelStudioReadOnlyAccess` (includes ModelStudio-related permissions)
- `AliyunQuanMiaoLightAppFullAccess` (includes QuanMiao-related permissions)
- `AliyunOSSFullAccess` (includes OSS-related permissions, can be restricted to specific buckets)
5. Confirm and add permissions
## Notes
- There may be a delay of approximately 30 seconds after authorization before permissions take effect
- If you encounter `403` or `Index.NoWorkspacePermissions` errors, please check:
1. Whether the RAM user has been granted the above permissions
2. Whether workspace permissions have been granted to the user in the Bailian console
---
## Permission Failure Handling
When any command or API call fails due to permission errors at any point during execution, follow this process:
1. **Read this file** (`references/ram-policies.md`) to get the full list of permissions required by this SKILL
2. **Use `ram-permission-diagnose` skill** to guide the user through requesting the necessary permissions
3. **Pause and wait** until the user confirms that the required permissions have been granted
4. **Retry the failed operation** after permissions are confirmed
**Important:** Never proceed with operations that require permissions the user does not have. Always pause and wait for explicit confirmation.
FILE:references/related-commands.md
# Related Commands: alibabacloud-bailian-videoanalysis
## Available Python Scripts
All scripts are located in the `scripts/` directory:
| Script | Purpose | Required Parameters | Optional Parameters |
|---------------------------------------------------|----------------------------------------------------------|--------------------------------|----------------------------------------------------|
| `check_env.py` | Check environment configuration (packages + credentials) | None | None |
| `quanmiao_submit_videoAnalysis_task.py` | Submit video analysis task to Bailian | `--workspace_id`, `--file_url` | None |
| `quanmiao_get_videoAnalysis_task_result.py` | Get video analysis task result | `--workspace_id`, `--task_id` | `--save_path` |
## Aliyun CLI Commands
**Important:** All `aliyun` CLI commands MUST include `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis`.
| Command | Purpose |
|-------------------------------------------------------------------|---------------------------------------------|
| `aliyun version` | Verify CLI version (>= 3.3.1) |
| `aliyun configure list` | Check credential status (NEVER print AK/SK) |
| `aliyun configure set --auto-plugin-install true` | Enable automatic plugin installation |
| `aliyun modelstudio list-workspaces` | List Bailian workspaces |
| `aliyun ossutil ls` | List OSS buckets |
| `aliyun ossutil cp <local-file> oss://<bucket>/<key>` | Upload file to OSS |
| `aliyun ossutil sign oss://<bucket>/<key> --expires-duration 2h` | Generate temporary URL |
| `aliyun ossutil rm oss://<bucket>/<key>` | Delete uploaded OSS object (cleanup) |
## Execution Order
```
1. check_env.py (environment validation)
↓
2. aliyun modelstudio list-workspaces (get workspace_id)
↓
3a. [If local file] aliyun ossutil cp + sign (upload & get URL)
3b. [If URL provided] Skip upload, use URL directly
↓
4. quanmiao_submit_videoAnalysis_task.py (submit task)
↓
5. quanmiao_get_videoAnalysis_task_result.py (poll loop)
↓
6. Summarize (no script call, use Step 5 result directly)
```
FILE:references/verification-method.md
# Verification Method: alibabacloud-bailian-videoanalysis
Step-by-step verification commands to confirm successful execution at each workflow stage.
---
## Step 1: Environment Check Verification
**Command:**
```bash
python scripts/check_env.py
```
**Expected Output (Success):**
```json
{
"pythonPackagesInstalled": {
"alibabacloud-quanmiaolightapp20240801": true,
"alibabacloud-openapi-util": true,
"alibabacloud-credentials": true,
"alibabacloud-tea-openapi": true,
"alibabacloud-tea-util": true
},
"allPythonPackagesInstalled": true,
"credentialsConfigured": true,
"ready": true,
"errors": []
}
```
**Verification Criteria:**
- `ready` field is `true`
- `allPythonPackagesInstalled` field is `true`
- `credentialsConfigured` field is `true`
- `errors` array is empty
**Failure Actions:**
- If `allPythonPackagesInstalled` is `false` → Run `pip install -r scripts/requirements.txt`
- If `credentialsConfigured` is `false` → Guide user to run `aliyun configure` outside session
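The criteria and failure actions above can be checked programmatically. The sketch below assumes `check_env.py` prints the JSON structure shown in the expected output; the helper name and the wording of the returned actions are illustrative.

```python
import json
from typing import List


def env_failure_actions(check_env_output: str) -> List[str]:
    """Return follow-up actions for a check_env.py report (empty if ready)."""
    report = json.loads(check_env_output)
    actions: List[str] = []
    if not report.get("allPythonPackagesInstalled", False):
        actions.append("pip install -r scripts/requirements.txt")
    if not report.get("credentialsConfigured", False):
        actions.append("run `aliyun configure` outside the session")
    # Not ready for some other reason: point at the reported errors.
    if not actions and (not report.get("ready", False) or report.get("errors")):
        actions.append("inspect the `errors` array")
    return actions
```

An empty list corresponds to all four verification criteria being met.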
**Additional CLI Verification:**
```bash
# Verify Aliyun CLI version >= 3.3.1
aliyun version
# Verify credentials are configured
aliyun configure list
```
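Comparing the reported CLI version against the 3.3.1 minimum needs a numeric comparison rather than a string one (for instance, `3.10.0` is newer than `3.3.1`). A minimal sketch, assuming the output of `aliyun version` contains a dotted `major.minor.patch` version somewhere in its text; both function names are hypothetical.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first major.minor.patch triple from the given text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"No version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text: str, minimum: Tuple[int, ...] = (3, 3, 1)) -> bool:
    """Check whether the parsed version is at least the required minimum."""
    return parse_version(text) >= minimum
```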
---
## Step 2: Workspace Listing Verification
**Command:**
```bash
aliyun modelstudio list-workspaces --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis
```
**Expected Output (Success):**
```json
{
"RequestId": "...",
"Workspaces": [
{
"WorkspaceId": "llm-xxx",
"Name": "Default Workspace"
}
]
}
```
**Verification Criteria:**
- `Workspaces` array is non-empty
- Each workspace has `WorkspaceId` and `Name` fields
- `WorkspaceId` starts with `llm-` prefix
**Failure Actions:**
- If `Workspaces` is empty → User may not have activated Bailian service; guide to [Bailian console](https://bailian.console.aliyun.com/cn-beijing#/app/app-market/quanmiao/video-comprehend)
- If error contains `No workspace permissions` → Check RAM permissions and Bailian workspace authorization
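Selecting the default workspace from the response can be sketched as below. It assumes the response shape shown in the expected output (`Workspaces[].WorkspaceId`); the helper name is hypothetical.

```python
from typing import Any, Dict


def first_workspace_id(response: Dict[str, Any]) -> str:
    """Return the first WorkspaceId, validating the documented criteria."""
    workspaces = response.get("Workspaces") or []
    if not workspaces:
        # Matches the failure action: Bailian may not be activated yet.
        raise LookupError("No workspaces returned")
    workspace_id = workspaces[0].get("WorkspaceId", "")
    if not workspace_id.startswith("llm-"):
        raise ValueError(f"Unexpected WorkspaceId format: {workspace_id!r}")
    return workspace_id
```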
---
## Step 3: OSS Upload Verification
**Commands:**
```bash
# List available buckets
aliyun ossutil ls --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis
# Upload file to OSS
aliyun ossutil cp <local-file> oss://<bucket>/<key> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis --region <region>
# Generate temporary URL
aliyun ossutil sign oss://<bucket>/<key> --expires-duration 7200 --user-agent AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis --region <region>
```
**Expected Output (Sign Command Success):**
```
https://my-bucket.oss-cn-beijing.aliyuncs.com/temp/quanmiao/20260409/video.mp4?Signature=xxx&Expires=xxx&OSSAccessKeyId=xxx
```
**Verification Criteria:**
- Generated URL is a valid HTTPS URL containing OSS domain and signature parameters
- URL includes `Signature`, `Expires`, and `OSSAccessKeyId` query parameters
- Bucket name and object key match the uploaded file
**Failure Actions:**
- If upload fails with permission error → Follow Permission Failure Handling in RAM Policy section
- If file not found → Verify local file path points to an existing file
- If no buckets available → Create an OSS bucket first or use user-provided video URL directly
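The URL checks above can be sketched with the standard library alone. This assumes a virtual-hosted-style URL with the three query parameters shown in the expected output; the helper name `check_signed_url` is hypothetical, and URLs signed with a different scheme may carry other parameter names.

```python
from urllib.parse import parse_qs, unquote, urlparse

REQUIRED_PARAMS = ("Signature", "Expires", "OSSAccessKeyId")


def check_signed_url(url, bucket, key):
    """Return a list of problems found in a signed OSS URL (empty means OK)."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    problems = []
    if parsed.scheme != "https":
        problems.append("not https")
    # Virtual-hosted style: the bucket name is the first host label
    if not (parsed.hostname or "").startswith(bucket + "."):
        problems.append("bucket mismatch")
    if unquote(parsed.path).lstrip("/") != key:
        problems.append("key mismatch")
    for name in REQUIRED_PARAMS:
        if name not in query:
            problems.append(f"missing {name}")
    return problems
```

The function only inspects the URL text; it does not confirm that the signature is valid or unexpired.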
---
## Step 4: Task Submission Verification
**Command:**
```bash
python scripts/quanmiao_submit_videoAnalysis_task.py --workspace_id <workspace_id> --file_url <tempUrl>
```
**Expected Output (Success):**
```json
{
"task_id": "xxxx"
}
```
**Verification Criteria:**
- Response contains a non-empty `task_id` field
- No error code or message in response
**Failure Actions:**
- If `task_id` is missing → Check that `workspace_id` exists and `file_url` is valid and not expired
- If permission error → Follow Permission Failure Handling in RAM Policy section
---
## Step 5: Task Result Polling Verification
**Command:**
```bash
python scripts/quanmiao_get_videoAnalysis_task_result.py --workspace_id <workspace_id> --task_id <task_id>
```
**Expected Output (SUCCESSED):**
```json
{
"header": {
"taskId": "...",
"event": "task-finished",
"sessionId": "...",
"eventInfo": "完成视频理解"
},
"payload": {
"output": {
"videoTitleGenerateResult": { "text": "..." },
"videoCaptionResult": { "videoCaptions": [] },
"videoAnalysisResult": { "text": "..." },
"videoGenerateResults": [{ "text": "..." }],
"videoMindMappingGenerateResult": { "text": "...", "videoMindMappings": [] },
"videoCalculatorResult": { "items": [] }
},
"usage": {
"inputTokens": 1,
"outputTokens": 1,
"totalTokens": 2
}
},
"requestId": "..."
}
```
**Verification Criteria:**
- `header.event` equals `"task-finished"`
- `payload.output` contains all expected result fields
- `payload.usage` contains token counts
**Status Handling:**
| Status | Action |
|-------------|---------------------------------------------|
| `PENDING` | Wait 10-15s, retry |
| `RUNNING` | Display partial results, wait 10-15s, retry |
| `SUCCESSED` | Proceed to Step 6 |
| `FAILED` | Check error message, inform user |
| `CANCELED` | Inform user task was canceled |
**Maximum retries:** 180 (approximately 30 minutes at a 10 s interval, up to 45 minutes at 15 s)
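The status table and retry limit can be read as a simple polling loop. The sketch below is illustrative only: `fetch_status` is a hypothetical callable that returns the current status string (for example by wrapping the result script), and the `sleep` parameter exists so the loop can be exercised without waiting.

```python
import time

TERMINAL_STATUSES = {"SUCCESSED", "FAILED", "CANCELED"}


def poll_task(fetch_status, max_retries=180, wait_seconds=10, sleep=time.sleep):
    """Poll until a terminal status is seen or the retry budget is used up.

    Returns (status, attempts); status is "TIMEOUT" when retries run out.
    """
    for attempt in range(1, max_retries + 1):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status, attempt
        if status not in ("PENDING", "RUNNING"):
            raise ValueError(f"unexpected status: {status}")
        sleep(wait_seconds)
    return "TIMEOUT", max_retries
```

On `SUCCESSED` the caller proceeds to Step 6; `FAILED`, `CANCELED`, and `TIMEOUT` are reported to the user.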
---
## Step 6: Summary Verification
**Verification Criteria:**
- Summary uses data from Step 5 result directly (no additional API calls)
- Output includes all sections: title, outline, overview, captions, shot analysis, timeline, summary, token usage
- Token usage numbers match `payload.usage` from Step 5 result
---
FILE:scripts/check_env.py
#!/usr/bin/env python3
"""
Check the Bailian SDK environment and credential configuration.
Returns a JSON object with the check results.
Uses the Alibaba Cloud default credential chain; does not directly read AccessKey/SecretKey.
"""
import subprocess
import json
import sys
try:
    from alibabacloud_credentials.client import Client as CredentialClient
except ImportError:
    CredentialClient = None
# Required Python packages list
REQUIRED_PACKAGES = [
    'alibabacloud-quanmiaolightapp20240801',
    'alibabacloud-openapi-util',
    'alibabacloud-credentials',
    'alibabacloud-tea-openapi',
    'alibabacloud-tea-util'
]
def check_package_installed(package_name):
    """Check if Python package is installed using importlib.metadata (Python 3.8+)"""
    try:
        # Use importlib.metadata which is more reliable than pip commands
        from importlib.metadata import version
    except ImportError:
        # Fallback for older Python versions without importlib.metadata
        try:
            import pkg_resources
        except ImportError:
            return False
        try:
            pkg_resources.get_distribution(package_name)
            return True
        except Exception:
            return False
    try:
        version(package_name)
        return True
    except Exception:
        # Includes PackageNotFoundError when the distribution is missing
        return False
def check_env():
    result = {
        'pythonPackagesInstalled': {},
        'allPythonPackagesInstalled': False,
        'credentialsConfigured': False,
        'ready': False,
        'errors': []
    }
    # Check if credentials can be obtained through default credential chain
    try:
        if CredentialClient is None:
            raise ImportError('alibabacloud-credentials not installed')
        credential = CredentialClient()
        # Try to get credentials to verify credential chain is available
        credential.get_credential().access_key_id
        result['credentialsConfigured'] = True
    except Exception:
        result['errors'].append('Alibaba Cloud credentials not configured, please run `aliyun configure` to configure credentials')
        result['credentialsConfigured'] = False
    # Check if all required Python packages are installed
    all_installed = True
    for pkg in REQUIRED_PACKAGES:
        if check_package_installed(pkg):
            result['pythonPackagesInstalled'][pkg] = True
        else:
            result['pythonPackagesInstalled'][pkg] = False
            result['errors'].append(f'Python package not installed: {pkg}')
            all_installed = False
    result['allPythonPackagesInstalled'] = all_installed
    # Determine if ready
    result['ready'] = result['credentialsConfigured'] and result['allPythonPackagesInstalled']
    print(json.dumps(result, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    check_env()
FILE:scripts/quanmiao_get_videoAnalysis_task_result.py
#!/usr/bin/env python3
"""
Get video analysis task result from Bailian.
Uses the Alibaba Cloud default credential chain.
Optionally save the result to a local JSON file.
"""
import sys
import json
import argparse
import os
from pathlib import Path
from alibabacloud_quanmiaolightapp20240801.client import Client as QuanMiaoLightApp20240801Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_quanmiaolightapp20240801 import models as quan_miao_light_app_20240801_models
from alibabacloud_tea_util import models as util_models
def create_client() -> QuanMiaoLightApp20240801Client:
    """
    Initialize client using credential chain
    @return: Client
    @throws Exception
    """
    credential = CredentialClient()
    config = open_api_models.Config(
        credential=credential,
        user_agent='AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis'
    )
    # Endpoint refer to https://api.aliyun.com/product/QuanMiaoLightApp
    config.endpoint = 'quanmiaolightapp.cn-beijing.aliyuncs.com'
    return QuanMiaoLightApp20240801Client(config)


def main(workspace_id, task_id, save_path=None):
    client = create_client()
    get_video_analysis_task_request = quan_miao_light_app_20240801_models.GetVideoAnalysisTaskRequest(
        task_id=task_id
    )
    runtime = util_models.RuntimeOptions(
        read_timeout=30000,
        connect_timeout=5000
    )
    headers = {}
    try:
        resp = client.get_video_analysis_task_with_options(workspace_id, get_video_analysis_task_request, headers, runtime)
        result_data = resp.body.to_map()
        # Save to file if save_path is provided and status is SUCCESSED
        if save_path and result_data.get('payload', {}).get('output', {}).get('taskStatus') == 'SUCCESSED':
            save_result_to_file(result_data, save_path)
        # Print result to stdout
        print("\nRaw result: \n\n" + json.dumps(result_data, indent=2, ensure_ascii=False))
    except Exception as error:
        error_data = getattr(error, 'data', {})
        recommend = error_data.get('Recommend', '') if isinstance(error_data, dict) else ''
        print(json.dumps({
            'error': str(error),
            'recommend': recommend
        }, indent=2, ensure_ascii=False))
        sys.exit(1)
# Parameter validation functions
def validate_workspace_id(arg):
    # Check the type first so that .strip() is only called on strings
    if not isinstance(arg, str):
        raise ValueError('workspace_id must be a string type')
    if not arg or arg.strip() == '':
        raise ValueError('workspace_id cannot be empty')
    # Trim whitespace
    trimmed = arg.strip()
    if len(trimmed) > 64:
        raise ValueError('workspace_id length cannot exceed 64 characters')
    # Only allow letters, numbers, hyphens, and underscores
    import re
    if not re.match(r'^[a-zA-Z0-9_-]+$', trimmed):
        raise ValueError('workspace_id contains invalid characters, only letters, numbers, hyphens and underscores are allowed')
    return trimmed


def validate_task_id(arg):
    # Check the type first so that .strip() is only called on strings
    if not isinstance(arg, str):
        raise ValueError('task_id must be a string type')
    if not arg or arg.strip() == '':
        raise ValueError('task_id cannot be empty')
    # Trim whitespace
    trimmed = arg.strip()
    if len(trimmed) > 128:
        raise ValueError('task_id length cannot exceed 128 characters')
    return trimmed


def save_result_to_file(result_data, save_path):
    """
    Save the result data to a JSON file.
    Args:
        result_data: The result data to save
        save_path: Path to save the JSON file
    """
    try:
        # Create directory if it doesn't exist
        save_dir = os.path.dirname(save_path)
        if save_dir:
            Path(save_dir).mkdir(parents=True, exist_ok=True)
        # Write JSON file
        with open(save_path, 'w', encoding='utf-8') as f:
            json.dump(result_data, f, indent=2, ensure_ascii=False)
        print(f"✅ Raw JSON result saved to: {save_path}", file=sys.stderr)
    except Exception as e:
        print(f"⚠️ Warning: Failed to save raw JSON result to {save_path}: {str(e)}", file=sys.stderr)


# Get parameters from command line arguments
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Get video analysis task result from Bailian')
    parser.add_argument('--workspace_id', required=True, help='Workspace ID')
    parser.add_argument('--task_id', required=True, help='Task ID')
    parser.add_argument('--save_path', required=False, default=None,
                        help='Path to save JSON result (only saves when taskStatus=SUCCESSED)')
    args = parser.parse_args()
    try:
        workspace_id_arg = validate_workspace_id(args.workspace_id)
        task_id_arg = validate_task_id(args.task_id)
        main(workspace_id_arg, task_id_arg, args.save_path)
    except Exception as error:
        print(json.dumps({'error': str(error)}, indent=2, ensure_ascii=False))
        sys.exit(1)
FILE:scripts/quanmiao_submit_videoAnalysis_task.py
#!/usr/bin/env python3
"""
Submit video analysis task to Bailian.
Uses the Alibaba Cloud default credential chain.
"""
import sys
import json
import argparse
from alibabacloud_quanmiaolightapp20240801.client import Client as QuanMiaoLightApp20240801Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_quanmiaolightapp20240801 import models as quan_miao_light_app_20240801_models
from alibabacloud_tea_util import models as util_models
def create_client() -> QuanMiaoLightApp20240801Client:
    """
    Initialize client using credential chain
    @return: Client
    @throws Exception
    """
    credential = CredentialClient()
    config = open_api_models.Config(
        credential=credential,
        user_agent='AlibabaCloud-Agent-Skills/alibabacloud-bailian-videoanalysis'
    )
    # Endpoint refer to https://api.aliyun.com/product/QuanMiaoLightApp
    config.endpoint = 'quanmiaolightapp.cn-beijing.aliyuncs.com'
    return QuanMiaoLightApp20240801Client(config)


def main(workspace_id, file_url):
    client = create_client()
    submit_video_analysis_task_request = quan_miao_light_app_20240801_models.SubmitVideoAnalysisTaskRequest(
        video_url=file_url
    )
    runtime = util_models.RuntimeOptions(
        read_timeout=30000,
        connect_timeout=5000
    )
    headers = {}
    try:
        resp = client.submit_video_analysis_task_with_options(workspace_id, submit_video_analysis_task_request, headers, runtime)
        status = resp.body.http_status_code
        if status == 200:
            # Output the task ID
            result = {
                'task_id': resp.body.data.task_id if resp.body.data else None
            }
            print(json.dumps(result, indent=2, ensure_ascii=False))
        else:
            print(json.dumps(resp.body.to_map(), indent=2, ensure_ascii=False))
    except Exception as error:
        error_data = getattr(error, 'data', {})
        recommend = error_data.get('Recommend', '') if isinstance(error_data, dict) else ''
        print(json.dumps({
            'error': str(error),
            'recommend': recommend
        }, indent=2, ensure_ascii=False))
        sys.exit(1)
# Parameter validation functions
def validate_workspace_id(arg):
    # Check the type first so that .strip() is only called on strings
    if not isinstance(arg, str):
        raise ValueError('workspace_id must be a string type')
    if not arg or arg.strip() == '':
        raise ValueError('workspace_id cannot be empty')
    # Trim whitespace
    trimmed = arg.strip()
    if len(trimmed) > 64:
        raise ValueError('workspace_id length cannot exceed 64 characters')
    # Only allow letters, numbers, hyphens, and underscores
    import re
    if not re.match(r'^[a-zA-Z0-9_-]+$', trimmed):
        raise ValueError('workspace_id contains invalid characters, only letters, numbers, hyphens and underscores are allowed')
    return trimmed


def validate_file_url(arg):
    # Check the type first so that .strip() is only called on strings
    if not isinstance(arg, str):
        raise ValueError('fileUrl must be a string type')
    if not arg or arg.strip() == '':
        raise ValueError('fileUrl cannot be empty')
    # Trim whitespace
    trimmed = arg.strip()
    # Basic URL format validation
    if not trimmed.startswith(('http://', 'https://')):
        raise ValueError('fileUrl must be a valid HTTP/HTTPS URL')
    return trimmed


# Get parameters from command line arguments
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Submit video analysis task to Bailian')
    parser.add_argument('--workspace_id', required=True, help='Workspace ID')
    parser.add_argument('--file_url', required=True, help='File URL (OSS temporary URL)')
    args = parser.parse_args()
    try:
        workspace_id_arg = validate_workspace_id(args.workspace_id)
        file_url_arg = validate_file_url(args.file_url)
        main(workspace_id_arg, file_url_arg)
    except Exception as error:
        print(json.dumps({'error': str(error)}, indent=2, ensure_ascii=False))
        sys.exit(1)
FILE:scripts/requirements.txt
alibabacloud-openapi-util==0.2.4
alibabacloud-credentials==1.0.8
alibabacloud-tea-util==0.3.14
alibabacloud-tea-openapi==0.4.4
alibabacloud-quanmiaolightapp20240801>=2.13.8
Alibaba Cloud Elasticsearch instance diagnosis skill. Use for cluster health checks, troubleshooting, and performance analysis on Elasticsearch instances. Tr...
---
name: alibabacloud-elasticsearch-instance-diagnose
description: |
  Alibaba Cloud Elasticsearch instance diagnosis skill. Use for cluster health checks, troubleshooting, and performance analysis on Elasticsearch instances.
  Triggers (English): Elasticsearch diagnosis, ES instance issues, slow search, write failures, cluster Red/Yellow, unassigned shards, node disconnected, load imbalance, thread pool 429, JVM/OOM/circuit breaker, disk watermark / read-only index, instance activating / change stuck, service avalanche / all shards failed.
  触发词(中文): ES诊断、阿里云ES、Elasticsearch诊断、ES集群/实例故障排查、ES健康检查、集群红灯/变红/黄灯/变黄、集群异常、分片未分配、主分片未分配、节点掉线/离线、负载不均衡、搜索/查询变慢、慢查询、写入失败/变慢/拒绝、线程池打满、HTTP 429、内存过高、OOM、断路器、磁盘满/水位、索引只读、实例激活中/activating、变更卡住/未完成、雪崩、服务不可用、all shards failed。
---
# Alibaba Cloud Elasticsearch Instance Diagnosis
Collect signals from **Alibaba Cloud OpenAPI (control plane)** and the **Elasticsearch REST API (data plane)**, combine them with the SOP knowledge base under `references/`, and produce **root-cause analysis**, an **evidence chain**, **prioritized remediation guidance**, and—when multiple dimensions fire—a **recency-ordered incident timeline** (severity vs time in window; see **Timeline and recency (MUST)** in §5 Step 4).
**Architecture**: Alibaba Cloud Elasticsearch OpenAPI + Alibaba CloudMonitor (CMS) + Elasticsearch REST API + diagnostic SOPs
**Closure**: If MUST applies and `ES_*` is set, finish authenticated ES API evidence before the final report (see **Feasibility order** in §5).
---
## 1. Prerequisites
### 1.1 Aliyun CLI
> **Pre-check: Aliyun CLI >= 3.3.1 required** (for RAM permission checks and OpenAPI CLI fallback)
> Run `aliyun version` to verify the version is >= 3.3.1. If the CLI is missing or too old, see `references/cli-installation-guide.md`.
> After installation, run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation (**do not** pass plaintext AccessKey pairs on this command line; see §1.2).
### 1.2 Alibaba Cloud account authentication and security (MUST)
> **Security rules (mandatory):**
> - **NEVER** read, echo, or print AccessKey ID or AccessKey Secret values.
> - **NEVER** prompt or ask the user to paste plaintext AccessKeys in the conversation.
> - **NEVER** embed AccessKeys in scripts, CLI arguments, or `curl` URLs.
> - **NEVER** use `aliyun configure set` (or similar) to pass **literal** AccessKey ID/Secret on the command line.
> - **NEVER** accept AccessKeys that the user pastes into the chat, even if offered voluntarily.
> - **ONLY** use configured CLI profiles (`aliyun configure`) or environment variables such as `ALIBABA_CLOUD_ACCESS_KEY_ID` / `ALIBABA_CLOUD_ACCESS_KEY_SECRET` that the user has set **in their local shell** (the agent **must not** echo those values in the session).
> **⚠️ If the user provides AccessKeys in the chat (e.g. “my AK is xxx”)**
>
> 1. **Stop immediately**: do not run any Alibaba Cloud command that requires credentials.
> 2. **Decline politely** and give **only** the names of approved configuration methods (**do not** repeat any secret the user may have leaked):
> - Recommended: run `aliyun configure` in a local terminal and enter credentials when prompted; credentials are stored in the local profile file.
> - Alternatively: set `ALIBABA_CLOUD_ACCESS_KEY_ID` / `ALIBABA_CLOUD_ACCESS_KEY_SECRET` in the local shell (the user types values **only in the terminal**, not in chat).
> 3. Resume the diagnosis request only after credentials are configured correctly.
> **Verify credentials without exposing secrets:**
>
> ```bash
> aliyun configure list
> aliyun --profile <profile_name> sts get-caller-identity
> ```
>
> **Credential policy:**
> 1. Prefer an `aliyun configure` profile (default or `--profile`).
> 2. If there is no valid identity (`configure list` / `get-caller-identity` fails), **STOP** and guide the user to configure locally; **do not** guess or fabricate credentials.
> 3. Never pass plaintext AccessKeys through the conversation.
### 1.3 Elasticsearch direct-connect credential boundary
> - **NEVER** ask the user to paste `ES_PASSWORD` in chat; **NEVER** echo, print, or log the password; **NEVER** copy a password from chat into commands, hooks, or repo files.
> - Shell expansion for `curl -u "$ES_USERNAME:$ES_PASSWORD"` (or equivalent) is **allowed** when vars are **pre-exported in the user’s local shell**; **NEVER** put the secret as a literal in chat, scripts checked into repos, or command output.
> - If the user tries to send a password in chat: **STOP** as well and ask them to set `ES_PASSWORD` only locally via `export` (see §2.2).
---
## 2. Environment setup
### 2.1 Control plane OpenAPI (via Aliyun CLI)
All control-plane and CMS data collection for this skill uses the **Aliyun CLI**.
**User-Agent (required)**: set a User-Agent for Alibaba Cloud API calls:
```bash
export ALIBABA_CLOUD_USER_AGENT="AlibabaCloud-Agent-Skills"
```
**CLI hardening (recommended)**: when authoring raw `aliyun` commands, add **`--connect-timeout 3 --read-timeout 10`** (increase `read-timeout` for large responses or CMS), consistent with the instance-management skill examples, to avoid indefinite hangs on network faults. If the global User-Agent is not set, add **`--user-agent AlibabaCloud-Agent-Skills`** per invocation. For **optional Elasticsearch probes** inside `check_es_instance_health.py` (when `ES_*` is set), the same knobs exist as **`--connect-timeout`** / **`--read-timeout`** on that script — they map to `curl` for engine calls only, not to the Aliyun OpenAPI client.
Run before diagnosis:
```bash
aliyun version
aliyun configure list
aliyun --profile <profile_name> sts get-caller-identity
```
### 2.2 Elasticsearch API direct access (`curl`)
Have the user set connection variables in a **local terminal** after you confirm the Elasticsearch endpoint (VPC or public) and admin credentials—**do not** hardcode user-specific values in chat:
```bash
export ES_ENDPOINT="http://<elasticsearch-endpoint-ip>:9200"
export ES_USERNAME="elastic"
export ES_PASSWORD="<elasticsearch-admin-password>"
```
> **Public access and `http` vs `https`:** From **`DescribeInstance`**, use **`publicDomain`** / **`domain`** and the reported **`protocol`**. When **`protocol` is `HTTP`** (typical public listener), set **`ES_ENDPOINT` to `http://<publicDomain>:9200`**. Using **`https://`** against an **HTTP-only** endpoint causes **TLS** errors (e.g. **`WRONG_VERSION_NUMBER`**). Use **`https://`** only when **`protocol` is `HTTPS`** (or TLS is actually enabled on the port you use), and supply CA / fingerprint options as in **HTTPS options** below.
>
> **If `http://` “does not work” — when to try `https://`:** Treat **`DescribeInstance` `protocol`** as the **source of truth** for the **REST listener**. **`000`**, **timeouts**, or **connection refused** on **`http://`** usually mean **network path / allowlist / security group / wrong host or port** — **not** “try HTTPS next” when **`protocol` is still `HTTP`**. **Do** switch to **`https://`** when **`protocol` is `HTTPS`** (or the console / product doc states TLS on that endpoint) and the failure on `http://` is a **TLS or scheme** symptom (e.g. **`WRONG_VERSION_NUMBER`**, **`error:0A00010B`**, immediate SSL alert while probing with the wrong scheme). If **`protocol` is `HTTP`** and only **plain TCP** is advertised, **HTTPS is not a fallback** for reachability.
> **Credential safety**
> - **NEVER** echo, print, or log `ES_PASSWORD`; **NEVER** copy credentials from chat into shell history or saved files.
> - **NEVER** ask the user to paste the password in plaintext in chat.
> - **ONLY** use the following checks to verify that variables are set:
> ```bash
> [[ -n "$ES_ENDPOINT" ]] && echo "ES_ENDPOINT: $ES_ENDPOINT" || echo "ES_ENDPOINT: NOT SET"
> [[ -n "$ES_PASSWORD" ]] && echo "ES_PASSWORD: SET" || echo "ES_PASSWORD: NOT SET"
> ```
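For tooling written in Python rather than shell, the same set/not-set check might look like the sketch below. The helper name `es_env_status` is hypothetical; the point it illustrates is that the password value is never returned or printed, only its presence.

```python
import os


def es_env_status(environ=None):
    """Report whether ES_* variables are set without exposing the password."""
    env = os.environ if environ is None else environ
    endpoint = env.get("ES_ENDPOINT", "")
    return {
        # The endpoint is not a secret, so its value may be shown
        "ES_ENDPOINT": endpoint if endpoint else "NOT SET",
        # Only presence is reported for the password, never the value
        "ES_PASSWORD": "SET" if env.get("ES_PASSWORD") else "NOT SET",
    }
```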
> **Network connectivity and access control**
>
> | Issue | How to check | Mitigation |
> |------|--------------|------------|
> | Public network access disabled | Elasticsearch console → **Network** | Enable public access or use the VPC endpoint |
> | Public access allowlist | Console → **Security** → **Public access allowlist** | Add the agent host’s public IP |
> | VPC isolation | e.g. `telnet <ES_IP> 9200` | VPC peering, Express Connect, or equivalent |
> | Security group | Inbound rules on the ECS/security group hosting Elasticsearch | Allow TCP **9200** (or the configured port) |
> **Connectivity probe**: `curl -sS -o /dev/null -w "%{http_code}" --connect-timeout 5 "$ES_ENDPOINT"`. HTTP code `000` usually means the path is unreachable. **`401` without `-u` is normal** (auth required); if `ES_PASSWORD` is SET, proceed to **authenticated** `GET /_cluster/health` (§7). **`401` with `-u`** → wrong credentials. **`000` / refused / timeout** → network, allowlist, or TLS/scheme mismatch.
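The interpretation rules for the probe can be summarized as a small decision function. This is a sketch under stated assumptions: `http_code` is the string printed by `curl -w "%{http_code}"`, and the function name `classify_probe` and its return strings are illustrative, not part of the skill's scripts.

```python
def classify_probe(http_code, used_auth, password_set):
    """Map a probe result to the next diagnostic step described above."""
    if http_code == "000":
        return "unreachable: check network path, allowlist, or TLS/scheme mismatch"
    if http_code == "401":
        if used_auth:
            return "wrong credentials"
        if password_set:
            return "auth required: proceed to authenticated GET /_cluster/health"
        return "auth required: ES_PASSWORD is not set"
    if http_code.startswith("2"):
        return "reachable"
    return f"unexpected HTTP {http_code}: inspect the response"
```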
> **HTTPS — prerequisites (what must be true)**
> 1. **Listener:** The Elasticsearch HTTP port you call (**9200** unless changed) must actually speak **TLS** — align with **`DescribeInstance` `protocol`** (**`HTTPS`**) or console/network documentation.
> 2. **URL:** **`https://<host>:<port>`** with the same **host** (e.g. **`publicDomain`**) you would use for HTTP.
> 3. **Client trust of the server certificate:** Your client must trust the cluster’s certificate chain (cluster / cloud **CA PEM**, or corporate proxy CA if TLS is intercepted). **`curl`**: prefer **`curl --cacert /path/to/ca.crt ...`**; **`-k` / `--insecure`** only for **short, non-production** diagnosis.
> 4. **Auth:** Same **`ES_USERNAME` / `ES_PASSWORD`** as for HTTP (Basic auth over TLS).
>
> **HTTPS — how this skill documents it**
> - **Manual `curl` (§7 and [es-api-call-failures.md](references/es-api-call-failures.md)):** Add **`--cacert`** (or **`-k`** for testing) to every **`curl`** when using **`https://`** if the default trust store does not include your cluster CA.
> - **`check_es_instance_health.py` optional ES probes:** They invoke **`curl`** with **`-u`** only; they **do not** read **`ES_CA_CERTS`** / **`ES_SSL_FINGERPRINT`** / **`ES_VERIFY_CERTS`** (those names are common for **Python Elasticsearch** clients). For **HTTPS** instances, use **§7 `curl`** with **`--cacert`** for deep checks, or extend the script later to pass **`--cacert`** from an env var.
> - **Python-style env vars (reference for other tooling):** `ES_CA_CERTS`, `ES_SSL_FINGERPRINT`, `ES_VERIFY_CERTS=false` (testing only) — **not** wired into this repo’s optional **`curl`** path today.
---
## 3. RAM permission check
> **[MUST] RAM permission pre-check**
>
> Before running this skill, verify the principal has the required RAM permissions.
> See `references/ram-policies.md` for the full list.
> If the user reports insufficient permissions, direct them to attach the corresponding policies in the RAM console.
---
## 4. Parameter confirmation
> **IMPORTANT: Parameter confirmation**
> Confirm the following with the user **before** any command or API call.
> Do not assume undeclared defaults or hardcode user-specific parameters.
> **Boundary controls (MUST)**
> - **`region-id` and `instance-id` must not be guessed** or taken from unverified defaults; if they disagree with `DescribeInstance` or the user’s explicit statement, reconfirm.
> - **Do not** apply metrics, logs, or `DescribeInstance` conclusions from **instance A** to **instance B**; `ES_ENDPOINT` must match the instance under diagnosis (see **Pre-flight validation for Elasticsearch API** below).
> - This skill is **read-only diagnosis**: **do not** invoke mutating control-plane APIs (create, resize, restart, delete instance, etc.). If the user requests a change, provide recommendations only; execution belongs in the console or an approved change workflow.
| Parameter | Required | Description | Default |
|-----------|----------|-------------|---------|
| `instance-id` | Yes | Elasticsearch instance ID, e.g. `es-cn-xxxxx` | - |
| `region-id` | Yes | Region ID, e.g. `cn-hangzhou` | - |
| `profile` | No | Aliyun CLI profile (explicit `--profile` recommended) | `default` |
| `ES_ENDPOINT` | No | Elasticsearch endpoint (direct API access only) | - |
| `ES_PASSWORD` | No | Elasticsearch admin password (direct API access only) | - |
| `--window` | No | `check_es_instance_health.py`: analysis window in minutes (default **60**) | 60 |
| `--connect-timeout`, `--read-timeout` | No | `check_es_instance_health.py`: `curl` timeouts for optional **ES engine** probes when `ES_*` is set (`--connect-timeout` → `curl --connect-timeout`; **`--read-timeout`** contributes to **`curl -m`** together with connect). Defaults **5** / **10** seconds. | 5 / 10 |
---
## 5. End-to-end diagnostic workflow
### Agent hard rules (non-negotiable)
> **OpenAPI/CMS cannot replace MUST engine APIs.** For any **§5 MUST** table row or **`check_es_instance_health.py` rule-engine MUST**, Alibaba Cloud OpenAPI and CloudMonitor do **not** replace the listed Elasticsearch REST calls for engine-level root cause—when **feasibility** holds, run those `curl` endpoints (see §7); they are complementary layers, not interchangeable.
>
> **Feasibility is decided only by checks, not by assumption.** Whether the agent may call Elasticsearch **must** be determined by actually running the **Feasibility order** (§5): at minimum verify `ES_ENDPOINT` / `ES_PASSWORD` per §2.2, align `ES_ENDPOINT` with `DescribeInstance`, then authenticated `GET /_cluster/health`. **Do not** assume `ES_*` is unset or the path is unreachable without performing these steps in the session.
For Elasticsearch incidents, follow these **four steps**; each has a distinct role.
### Execution strategy (root-cause driven)
> Full policy: [es-api-diagnosis-strategy.md](references/es-api-diagnosis-strategy.md)
Data-plane `curl` collection requires **both**:
1. **Feasibility**: `ES_ENDPOINT` and `ES_PASSWORD` are set and the network path works.
2. **Necessity**: root-cause analysis needs data-plane evidence that the control plane or CMS cannot establish alone.
> For endpoints **listed** under a fired **MUST** table row **or** **rule-engine MUST**, **necessity** for those calls is **already satisfied** by the trigger—still require **feasibility** (**Feasibility order**). For **optional** engine `curl` **not** in those lists, apply **feasibility** and **necessity** per [es-api-diagnosis-strategy.md](references/es-api-diagnosis-strategy.md).
**MUST triggers** (if **any** CMS condition below holds, collect the listed Elasticsearch evidence):
| Trigger | Scenario | Required Elasticsearch evidence |
|---------|----------|----------------------------------|
| `ClusterStatus` max ≥ Yellow / Red | Cluster health | `allocation/explain`, `_cat/shards` |
| `NodeCPUUtilization` max > 80% | CPU overload | `_nodes/hot_threads`, `_tasks` |
| `NodeHeapMemoryUtilization` max > 85% | Memory pressure | `_nodes/stats/breaker`, `GET /_cluster/settings?include_defaults=true` ( **`indices.breaker.*`** in transient / persistent ) |
| Thread pool `rejected` > 0 | Performance | `_nodes/hot_threads`, `_nodes/stats/thread_pool` |
| Inter-node resource CV > 0.3 | Load imbalance | `_cat/shards`, `_cat/allocation` |
| Write failures or index read-only | Disk / watermark / blocks | `_cluster/settings`, `_all/_settings?filter_path=*.settings.index.blocks`, `_cat/allocation` |
| Intermittent Elasticsearch API timeouts + CMS CPU > 80% | Possible cascading failure | `_nodes/hot_threads`, `_nodes/stats/thread_pool`, `_tasks` |
> **Thread-pool row:** interpret **search** vs **write** / **bulk** using [sop-query-thread-pool.md](references/sop-query-thread-pool.md) vs [sop-write-performance.md](references/sop-write-performance.md) (see also **Write-path / bulk saturation** below).
> **Rule-engine MUST:** If `check_es_instance_health.py` prints a **§5 MUST / §5–§7** callout for this run, treat it like a row above—collect that listed ES evidence when feasibility holds.
> **Binding rule (MUST triggers):** If **any** MUST-trigger row **or** the **rule-engine MUST** line above applies, **necessity is satisfied** for that evidence set—OpenAPI/CMS cannot replace those calls for engine-level root cause (cluster-health: `allocation/explain` + `_cat/shards` for Yellow/Red). Confirm **feasibility** per **Feasibility order** below. If reachable with auth, **run the MUST-listed endpoints in Step 2** in parallel with control-plane collection. If still blocked **after authenticated** `GET /_cluster/health`, lead with **blocking reason**: unset `ES_*`; transport failure (`000`, refused, timeout); **401 with `-u`**; scheme/TLS mismatch—not **401 on an unauthenticated probe** when `ES_PASSWORD` is SET.
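To make the trigger table concrete, the sketch below maps already-collected CMS observations to the Elasticsearch evidence listed for each row. The dictionary keys (`cluster_status`, `cpu_max`, and so on) are hypothetical names chosen for illustration, and the coefficient of variation is computed with the population standard deviation, which is an assumption about how "CV" is defined here.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if mean is 0)."""
    average = mean(values)
    return pstdev(values) / average if average else 0.0


def required_evidence(metrics):
    """Return the de-duplicated evidence list for every MUST row that fires."""
    evidence = []

    def need(*endpoints):
        for endpoint in endpoints:
            if endpoint not in evidence:
                evidence.append(endpoint)

    cpu_high = metrics.get("cpu_max", 0) > 80
    if metrics.get("cluster_status") in ("yellow", "red"):
        need("allocation/explain", "_cat/shards")
    if cpu_high:
        need("_nodes/hot_threads", "_tasks")
    if metrics.get("heap_max", 0) > 85:
        need("_nodes/stats/breaker", "_cluster/settings?include_defaults=true")
    if metrics.get("thread_pool_rejected", 0) > 0:
        need("_nodes/hot_threads", "_nodes/stats/thread_pool")
    per_node = metrics.get("per_node_load") or []
    if len(per_node) > 1 and coefficient_of_variation(per_node) > 0.3:
        need("_cat/shards", "_cat/allocation")
    if metrics.get("write_failures"):
        need("_cluster/settings",
             "_all/_settings?filter_path=*.settings.index.blocks",
             "_cat/allocation")
    if metrics.get("api_timeouts") and cpu_high:
        need("_nodes/hot_threads", "_nodes/stats/thread_pool", "_tasks")
    return evidence
```

The returned list only establishes necessity; each call still depends on the feasibility checks before it is run.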
### Write-path / bulk saturation
> If **`ThreadPool.WriteRejected`** or **`write`** pool stress matches **high-QPS bulk** indexing, read and follow **`references/sop-write-performance.md` — §2**, subsection **“Evidence interpretation: bulk QPS → write pool”** for the evidence chain, **`rejected`** semantics (cumulative since node start), **report ordering vs Old GC / heap** (causal chain or dual P0 — write path before JVM-only headline), **per-node `rejected`/`completed` numbers** (reject share), per-node asymmetry, and write-only vs search. **Do not** lead with a JVM-only narrative when that subsection applies. For **write-queue–style** acceptance prompts, the **opening conclusion** should read as **write-capacity** (data-plane counters + optional CMS rule names), not **only** a GC/heap headline.
### Search-primary vs write (both pools show cumulative `rejected`)
> When **`_nodes/stats/thread_pool`** shows **`search.rejected` ≫ `write.rejected`** on the same node(s) and **`ThreadPool.SearchRejected`** / **query-driven** overload applies, **lead** the **executive summary** and **P0 ordering** with **`search`** (high concurrent query / terms / slow query; hot index when verified) — **not** **`write`** first. **`write.rejected`** may remain **P0/P1** as **parallel** or **secondary** (bulk, catch-up); **Old GC / CPU / node disconnect** stay **co-stress or cascade**. Checker **listing order** is not proof of narrative order — see [acceptance-criteria.md](references/acceptance-criteria.md) **§6.5** and [sop-query-thread-pool.md](references/sop-query-thread-pool.md) *Report narrative*.
>
> **Recency overrides this magnitude default** when **time-resolved** evidence exists: **do not** rank the opening story by **`search.rejected` vs `write.rejected` alone** — cumulative counters lack timestamps. Full rubric: [acceptance-criteria.md](references/acceptance-criteria.md) **§6.5** (*P0 / executive order vs `search` ≫ `write`*: **unless** write dominated **by time**) and **§6.6** (*Executive order*, *No false recency from counters*). **Binding:** **Timeline and recency (MUST)** below (same skill).
### `activating` / change workflow stuck (cross-layer root cause)
> When an instance stays in **`activating`**, a change is unfinished, and **Red** or unassigned shards coexist, follow **`references/sop-activating-change-stuck.md`** end-to-end (**MUST** includes `ListActionRecords`, `DescribeInstance` before/after remediation, collection order **section 3.1**, reporting **section 4**).
### Pre-flight validation for Elasticsearch API
> **[IMPORTANT] `ES_ENDPOINT` must match the diagnosed instance**
>
> Compare `publicDomain` / `domain` and **`protocol`** from `DescribeInstance` with `ES_ENDPOINT`.
> If they differ, warn: `⚠️ ES_ENDPOINT does not match the current instance; run export ES_ENDPOINT="http://{publicDomain}:9200"` when **`protocol` is `HTTP`**, or `https://…` only when **`protocol` is `HTTPS`** (adjust host/port to match the deployment).
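
The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: it assumes `instance` is the already-parsed instance object from `DescribeInstance` (the exact response envelope is not shown here), and the function name `endpoint_matches` is hypothetical.

```python
from urllib.parse import urlsplit


def endpoint_matches(instance: dict, es_endpoint: str) -> bool:
    """Return True when ES_ENDPOINT agrees with DescribeInstance on host and scheme."""
    # ES_ENDPOINT may be "host:port" or a full URL; prefix "//" so urlsplit
    # still recognises the host part when no scheme is present.
    has_scheme = "://" in es_endpoint
    parts = urlsplit(es_endpoint if has_scheme else "//" + es_endpoint)
    host = (parts.hostname or "").lower()

    known_hosts = {
        (instance.get(key) or "").lower() for key in ("domain", "publicDomain")
    } - {""}
    if host not in known_hosts:
        return False

    # Compare schemes only when ES_ENDPOINT actually carries one.
    if has_scheme:
        return parts.scheme.lower() == (instance.get("protocol") or "").lower()
    return True
```

A `False` result is the cue to print the warning and suggest the corrected `export ES_ENDPOINT=...` line for the instance's `protocol`.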
### When Elasticsearch credentials are missing or connections fail
> **[CRITICAL]** Guide the user to fix connectivity explicitly; **classify** failure modes (do not default persistent timeouts to “allowlist only”). **Do not imply the agent “forgot” Elasticsearch** — if the first answer is CMS/OpenAPI-heavy, give the **blocking reason** per **Feasibility order** below: unset `ES_*`; transport errors; **401 with valid `-u`**; TLS/scheme—not **401** on a probe **without** `-u` when `ES_PASSWORD` is SET (use authenticated `curl` first).
**Progressive playbook (read in order):** [references/es-api-call-failures.md](references/es-api-call-failures.md) (sections **1 → 4**).
**MUST / strategy context:** [references/es-api-diagnosis-strategy.md](references/es-api-diagnosis-strategy.md) (sections 1–3 and **3.5** summary table).
### Mandatory warning when MUST applies but Elasticsearch is not configured
> **[CRITICAL] If a MUST trigger fires but data-plane evidence is missing, put a warning at the top of the report:** follow **section 4** of [references/es-api-call-failures.md](references/es-api-call-failures.md) (blocking reason first, then MUST list, missing evidence; if `ES_*` unset, pointer to **section 2.2** of this SKILL; if vars are set, use es-api-call-failures **sections 1–2** for auth vs transport).
### Step 1: Quick health scan (initial signals)
Run the lightweight rules engine (17 metric rules) to list P0 / P1 / P2 findings and steer deeper collection:
```bash
python3 scripts/check_es_instance_health.py -i <InstanceId> -r <RegionId> [--window <minutes, default 60>] [--profile <profile_name>]
```
### Feasibility order (agent)
1. **Run** the §2.2 `ES_*` checks (report the password only as SET / NOT SET, never its value); **do not skip** this step, and never infer feasibility without it.
2. `ES_ENDPOINT` matches `DescribeInstance` `domain` / `publicDomain` (scheme/port).
3. **Authenticated** `GET /_cluster/health`—do not stop at **401** on an unauthenticated probe if `ES_PASSWORD` is SET.
4. MUST scope: **table rows** and/or **rule-engine MUST** line in §5.
### Step 2: Collect evidence in parallel
Based on Step 1, run collection in parallel (prioritize dimensions with signals).
If a **MUST-trigger** row **or rule-engine MUST** applies: run **Feasibility order**, then **run that Required Elasticsearch evidence** via `curl` in the **same** round (see §7). If **no** MUST applies, add optional data-plane `curl` only when **feasibility** and **necessity** both hold per the strategy doc.
Re-run **`check_es_instance_health.py`** with the same invocation pattern as Step 1; for this parallel round, **`--window 120`** and explicit **`--profile <profile_name>`** are common.
To backfill control-plane evidence (`DescribeInstance`, `ListSearchLog`, CMS-style calls), use **`aliyun`** patterns in [references/verification-method.md](references/verification-method.md) (epoch times, profiles, namespaces).
> Note: data-plane access still requires `ES_ENDPOINT` / `ES_PASSWORD`; the Aliyun CLI **cannot** replace `curl` to the cluster.
>
> For **MUST-trigger** rows, necessity for the **listed** endpoints is **already** established—do **not** skip them when feasibility including reachability holds. **Outside** those rows, avoid unrelated bulk `curl` solely because `ES_*` is set; use the strategy doc’s feasibility + necessity test instead.
### Step 3: Read SOPs by signal
Map signals to SOPs and read for deeper reasoning. With multiple signals, process **P0 → P1 → P2** for **severity**, then apply **Timeline and recency (MUST)** in Step 4 so the **narrative order** matches **when** signals mattered in the window—not only static rule-engine print order.
| Observed signal | Read |
|-----------------|------|
| Cluster Red/Yellow, node loss, pending tasks | `references/sop-cluster-health.md` |
| Long `activating`, unfinished change records, Red / unassigned shards | `references/sop-cluster-health.md` + `references/sop-activating-change-stuck.md` |
| High CPU, load, imbalance | `references/sop-cpu-load.md` |
| Per-node load imbalance (CPU/memory/disk/shard count) | `references/sop-node-load-imbalance.md` |
| JVM pressure, GC, circuit breaker, OOM | `references/sop-memory-gc.md` |
| Disk watermark, IO, write failures (read-only) | `references/sop-disk-storage.md` |
| Watermark misconfiguration, index blocks, “normal” disk % but write failures | `references/sop-disk-storage.md` (Section 3 — watermark misconfiguration) |
| Write timeouts / rejections / latency / QPS drop | `references/sop-write-performance.md` |
| Query timeouts / rejections / slow queries | `references/sop-query-thread-pool.md` |
| Nodes look down but CPU still reported; `all shards failed` | `references/sop-service-avalanche.md` |
| Intermittent Elasticsearch timeouts + CMS CPU > 80% | `references/sop-service-avalanche.md` |
| Risky settings, Ngram issues, API anomalies | `references/sop-configuration.md` |
| Event code definitions | `references/health-events-catalog.md` |
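
As a rough illustration of the two orderings described in this step (severity first, then narrative order by time), the sketch below keeps them as separate sort keys. The `Finding` class, its field names, and the sample severities are hypothetical; the rule engine's real output shape is not assumed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2}


@dataclass
class Finding:
    code: str
    severity: str                         # "P0", "P1", or "P2"
    peak_time: Optional[datetime] = None  # None when only cumulative counters exist


def by_severity(findings):
    """Processing order: P0 -> P1 -> P2 (stable within a band)."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])


def by_timeline(findings):
    """Narrative order: earlier -> later; untimed findings are returned separately."""
    timed = sorted(
        (f for f in findings if f.peak_time is not None),
        key=lambda f: f.peak_time,
    )
    untimed = [f for f in findings if f.peak_time is None]
    return timed, untimed
```

Findings that land in `untimed` are the ones for which recency should be reported as undifferentiated rather than guessed.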
### Step 4: Synthesize and write the structured report
> **Acceptance-style optional checklists:** [references/acceptance-criteria.md](references/acceptance-criteria.md) **§6.1**–**§6.6** — Red/Yellow; read-heavy CPU + `search` pool (+ CMS alignment); JVM / breakers / fielddata; write-queue vs GC + **`rejected`/`completed`**; read-heavy **search pool vs GC-only** headline (expand in [sop-query-thread-pool.md](references/sop-query-thread-pool.md) *Report narrative: search pool vs GC / CPU headlines*); timeline/recency. **Bulk/write:** [references/sop-write-performance.md](references/sop-write-performance.md) §2. **Shard `reroute`:** [references/sop-node-load-imbalance.md](references/sop-node-load-imbalance.md) §1.3 (allocator / change control only).
> **[CRITICAL] Remediation must match the diagnosed root cause** — avoid generic templates. Wrong breaker or concurrency fixes (e.g. `in_flight_requests` vs `request`, “split query” when concurrency is the issue) → see **`sop-memory-gc.md`** and the fired signal’s SOP.
> **`activating` + data-plane anomaly**: include the **one-line cross-layer root cause**; see `references/sop-activating-change-stuck.md` **section 4**.
**Report skeleton (copy/fill):** [references/report-template.md](references/report-template.md).
### Timeline and recency (MUST for synthesized reports)
> **Problem:** `check_es_instance_health.py` and P0/P1/P2 bands express **severity**, not **when** a signal mattered most within the analysis window. **Cumulative** engine counters (`search.rejected`, `write.rejected`) do **not** encode recency—**write** and **search** issues can both be “real” while **only one path dominated the recent past** (e.g. search pressure **closer to window end** than write pressure).
**Binding rules for the agent:**
1. **Two axes** — Treat **severity** (P0/P1/P2) and **temporal relevance** (proximity to window end / “now”) as **orthogonal**. Do **not** infer recency from priority alone (e.g. “write is P0 so it must be the current headline”) when time-resolved evidence says otherwise.
2. **Mandatory human-facing section** — When more than one major finding fires (e.g. **write pool + search pool + GC/CPU**), the synthesized report **must** include an **`### Incident timeline (recency-ordered)`** (or equivalent) block **before or immediately after** the executive summary, unless the user explicitly asks for a minimal report. In that block:
- Order **bullets or rows by time** (earlier → later), or state which signal cluster **peaked or persisted** in the **latter** portion of `{begin} ~ {end}`.
- Call out **divergence**: e.g. “write-path stress **earlier** in window; search-path / CPU **more recent**” when CMS or logs support it.
3. **Evidence for recency** (use what exists; do not invent timestamps):
- **CloudMonitor**: per-metric time series — note **peak timestamp** or **sustained-high interval** for `NodeCPUUtilization`, `NodeHeapMemoryUtilization`, GC-related metrics, `ThreadPool.*` **if** exposed as rates or non-cumulative series in the collected JSON.
- **Slow logs / `ListSearchLog`**: correlate **query vs index** slow entries to minutes.
- **Engine (optional):** two `_nodes/stats/thread_pool` samples at **known times** to show **delta** on `rejected` / `completed`; or **`_tasks`** / **`hot_threads`** for **current** skew vs historical cumulative counters.
4. **Executive summary ordering** — The **opening 2–4 sentences** should reflect **recency-weighted** user impact: if **search** pressure is **closer to current** than **write** pressure, **lead** with search/query concurrency and co-stress (GC/CPU) **as appropriate**, and place **historical write saturation** as **context** or **second wave**—**without** dropping P0 write findings if they remain valid for remediation backlog.
5. **Explicit uncertainty** — If only cumulative counters exist and **no** time series differentiates paths, state **one line**: recency is **undifferentiated**; recommend **narrower window**, **slow logs**, or **delta sampling** for the next run.
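
The delta sampling mentioned in rule 3 can be sketched as follows, assuming two parsed `GET /_nodes/stats/thread_pool` responses taken at known times. The function name `pool_delta` is hypothetical, and defining reject share as `rejected / (rejected + completed)` is an assumption made for this example.

```python
def pool_delta(earlier: dict, later: dict, pool: str) -> dict:
    """Per-node change in rejected/completed between two thread_pool samples."""
    deltas = {}
    for node_id, node in later.get("nodes", {}).items():
        before = earlier.get("nodes", {}).get(node_id)
        if before is None:
            continue  # node missing from the first sample: no baseline to subtract
        new_pool = node["thread_pool"][pool]
        old_pool = before["thread_pool"][pool]
        rejected = new_pool["rejected"] - old_pool["rejected"]
        completed = new_pool["completed"] - old_pool["completed"]
        if rejected < 0 or completed < 0:
            continue  # counters went backwards: the node restarted between samples
        total = rejected + completed
        deltas[node.get("name", node_id)] = {
            "rejected": rejected,
            "completed": completed,
            "reject_share": rejected / total if total else 0.0,
        }
    return deltas
```

Running it once for `"write"` and once for `"search"` shows which pool was still rejecting between the two samples, which the cumulative totals alone cannot show.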
---
## 6. Data collection details (CLI OpenAPI + injected input)
### One-shot entry
Use the same **`check_es_instance_health.py`** command as **§5 Step 1** (optional **`--window`** / **`--profile`**; default window **60** minutes if omitted).
### Injected input mode (paired with CLI)
`check_es_instance_health.py` accepts external JSON to avoid duplicate calls:
```bash
python3 scripts/check_es_instance_health.py \
-i <InstanceId> -r <RegionId> \
--data-source input \
--input-json-file /path/to/diag-input.json
```
Input JSON shape:
```json
{
  "status_info": {},
  "metrics": {},
  "events": [],
  "logs": []
}
```
`--data-source` modes:
- `auto`: prefer injected fields; backfill gaps via Aliyun CLI.
- `cli`: ignore injection; fetch everything via CLI.
- `input`: injection only; no OpenAPI calls.
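
A small helper can produce a file in the input JSON shape shown above. This is only a sketch: the four top-level keys come from that shape, while the helper name `build_diag_input` and the type checks are assumptions, and the contents expected inside each section are not specified here.

```python
import json

# Top-level keys and container types taken from the input JSON shape.
SECTION_TYPES = {"status_info": dict, "metrics": dict, "events": list, "logs": list}


def build_diag_input(**sections) -> str:
    """Return a JSON document with all four top-level keys present."""
    unknown = set(sections) - set(SECTION_TYPES)
    if unknown:
        raise KeyError(f"unexpected section(s): {sorted(unknown)}")
    payload = {}
    for key, kind in SECTION_TYPES.items():
        value = sections.get(key, kind())  # default to an empty dict or list
        if not isinstance(value, kind):
            raise TypeError(f"{key} must be a {kind.__name__}")
        payload[key] = value
    return json.dumps(payload, indent=2)
```

Write the returned string to a file such as `/path/to/diag-input.json` and pass that path with `--input-json-file`.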
### Manual control-plane CLI backfill
For additional OpenAPI examples, see `references/verification-method.md`.
---
## 7. Elasticsearch direct API access (data-plane deep dive)
When **feasibility** holds (including reachability), execute the REST calls **required by any MUST-trigger row** (§5). For endpoints **not** listed in a fired MUST row, call them only when **feasibility** and **necessity** both hold per the strategy doc.
> `ES_ENDPOINT` may be `host:port` or a full URL. The samples below strip any scheme and normalize to `http://${ES_ENDPOINT#*://}` (use `https://` consistently when the cluster serves TLS).
>
> **Timeouts**: every `curl` must use `--connect-timeout 10 --max-time 30`.
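
For callers that build request URLs in Python rather than in the shell, the same normalization might look like the sketch below. The helper names `normalize_endpoint` and `api_url` are hypothetical and are not part of the skill's scripts.

```python
def normalize_endpoint(es_endpoint: str, tls: bool = False) -> str:
    """Return scheme://host:port with no trailing slash."""
    # Drop any existing scheme so "host:port" and full URLs are handled alike.
    bare = es_endpoint.split("://", 1)[-1].rstrip("/")
    return ("https://" if tls else "http://") + bare


def api_url(es_endpoint: str, path: str, tls: bool = False) -> str:
    """Join the normalized endpoint and an API path, enforcing the leading slash."""
    if not path.startswith("/"):
        path = "/" + path
    return normalize_endpoint(es_endpoint, tls) + path
```

Set `tls=True` only when `DescribeInstance` reports `protocol` as `HTTPS`.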
### Red / Yellow (MUST) — recommended set
**Scope:** The cluster-health MUST row uses `ClusterStatus` max ≥ **Yellow** (includes **Red**). Use this set for **unassigned / misallocated shard** root cause on the engine.
```bash
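# ES_USERNAME is optional and falls back to "elastic".
# ${ES_ENDPOINT#*://} drops any scheme, so host:port and full URLs both work.
# Cluster status and unassigned shard counts
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_cluster/health?pretty"
# Allocation explanation (empty body: first unassigned shard found)
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" \
  -X POST "http://${ES_ENDPOINT#*://}/_cluster/allocation/explain?pretty" \
  -d '{}'
# Shard table sorted by state, with the unassigned reason
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason&s=state"
# Pending cluster-state tasks
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_cluster/pending_tasks?pretty"
# Per-node thread pool counters (queue, rejected, completed)
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_nodes/stats/thread_pool?pretty"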
```
### Query / write performance (MUST) — recommended set
> Include **`_cluster/settings`** when **heap / GC / breaker** rules fired in Step 1 or **`_nodes/stats/breaker`** shows concern — read **transient** and **persistent** `indices.breaker.*` / `network.breaker.*`.
```bash
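# Busiest threads per node
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_nodes/hot_threads?threads=3"
# Circuit breaker limits, estimates, and trip counts
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_nodes/stats/breaker?pretty"
# Transient, persistent, and default cluster settings
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_cluster/settings?include_defaults=true&pretty"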
```
> **`GET /_cluster/pending_tasks`** and **`GET /_nodes/stats/thread_pool`** are also listed under **Red / Yellow (MUST)** above; call each once per session when both sections apply. If you run **only** this performance block, add those two `curl` lines from the **Red / Yellow** block.
### Resource anomalies without a closed loop (SHOULD) — recommended set
```bash
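# Nodes sorted by CPU, with heap, RAM, and 1-minute load
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_cat/nodes?v&s=cpu:desc&h=name,ip,cpu,heap.percent,ram.percent,load_1m"
# JVM heap and GC statistics per node
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_nodes/stats/jvm?pretty"
# Disk usage and shard count per node (sizes in GB)
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_cat/allocation?v&bytes=gb"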
```
> **`GET /_cluster/settings?include_defaults=true`** also appears under **Query / write performance (MUST)** above; reuse one response when both blocks apply. If you run **only** this SHOULD block, add the same `curl` line from the **Query / write performance** block.
**Protocol sanity** (avoid `WRONG_VERSION_NUMBER`): usually **http/https scheme mismatch** on `ES_ENDPOINT` — fix scheme/port and retry.
**Scenario → endpoint** index: [references/es-api-catalog.md](references/es-api-catalog.md).
---
## 8. Diagnostic coverage
The knowledge base covers **48+** health-event-style rules and chained scenarios (e.g. disk pressure → allocation → Red). **Per-category counts, P0/P1/P2 mix, and event codes:** [references/health-events-catalog.md](references/health-events-catalog.md) — scenario runbooks: `references/sop-*.md` (index: [references/README.md](references/README.md)).
---
## 9. Best practices
**Read-only:** no mutating control-plane APIs; no teardown.
1. **Layered + evidence-bound**: scan → SOP depth; every conclusion cites metrics/logs/events; if ES is unreachable, state limits ([es-api-call-failures.md](references/es-api-call-failures.md)).
2. **Priority vs narrative**: **P0→P2** for urgency; **Incident timeline** when multiple dimensions differ in time (Step 4). **Credentials / TLS / parameters:** §1–2 and §4.
3. **Green is not “all clear”** — watermarks, blocks, mis-set limits still matter; **MUST + reachable ES:** do not skip §5/§7 evidence because the cluster is Green or OpenAPI “explains” symptoms.
4. **Thread-pool `rejected`:** **cumulative** unless you show a delta — [sop-query-thread-pool.md](references/sop-query-thread-pool.md) §1–2; write/bulk: [sop-write-performance.md](references/sop-write-performance.md) §2.
---
## 10. Reference links
- `references/verification-method.md` — Verification (how to validate diagnosis; metrics, APIs, workflows)
- `references/report-template.md` — Structured diagnosis report skeleton
- `references/README.md` — **Language map** (reference assets and `sop-*.md` runbooks; English in this repo)
- `references/ram-policies.md` — RAM policy list
- `references/acceptance-criteria.md` — Correct/incorrect patterns and acceptance (includes credential and safety anti-patterns)
- `references/cli-installation-guide.md` — Aliyun CLI installation
- `references/es-api-catalog.md` — Elasticsearch REST API catalog
- `references/health-events-catalog.md` — Health event catalog
- `references/sop-*.md` — Scenario SOPs (e.g. `sop-activating-change-stuck.md` for `activating` / change stuck, cross-layer root cause)
- `references/es-api-diagnosis-strategy.md` — Elasticsearch API diagnosis strategy
FILE:references/README.md
# References (`references/`)
Operational knowledge for the **Alibaba Cloud Elasticsearch instance diagnosis** skill: APIs, verification, event catalog, acceptance rules, and scenario SOPs (`sop-*.md`).
**Discoverability:** skill **trigger keywords** (including **中文** for Chinese-speaking users) live in repo root **`SKILL.md`** frontmatter and **`metadata.yaml`** (`triggers`). Markdown here stays **English**; triggers bridge user language to the skill.
---
## Index
### APIs and access
| File | Purpose |
|------|---------|
| [related-apis.md](related-apis.md) | Related Alibaba Cloud OpenAPIs for diagnosis |
| [es-api-catalog.md](es-api-catalog.md) | Elasticsearch REST endpoints used in workflows |
| [ram-policies.md](ram-policies.md) | RAM policy snippets for CLI / OpenAPI |
### Verification, strategy, and quality bar
| File | Purpose |
|------|---------|
| [verification-method.md](verification-method.md) | How to verify diagnosis (metrics, logs, APIs) |
| [report-template.md](report-template.md) | Structured diagnosis report skeleton (Markdown) |
| [es-api-diagnosis-strategy.md](es-api-diagnosis-strategy.md) | When to call which ES API; CMS vs engine; MUST summary |
| [es-api-call-failures.md](es-api-call-failures.md) | **Progressive:** `curl` failures (401, timeouts, refused), evidence boundary, report checklist |
| [acceptance-criteria.md](acceptance-criteria.md) | PASS / PARTIAL / expectations; **§6.1** Red/Yellow + **ClusterStatus** one voice (CMS window vs engine snapshot) + **CMS `ClusterShardCount` swings** (cross-check engine + ops, avoid “half shards lost”); **§6.2** read-heavy CPU + search pool + CPU table vs log hotspot + **slow-log node vs search-pool node** (routing / phase / time); **§6.3** JVM / breakers / fielddata; **§6.4** write-queue narrative (dual P0 or causal lead + `rejected`/`completed` quant) / GC / bulk / `tripped` vs current heap; **§6.5** search vs GC headline; **`search.rejected` ≫ `write.rejected`** → search-first P0/executive order; **§6.6** timeline / recency-weighted executive lead (search vs write by time) |
| [health-events-catalog.md](health-events-catalog.md) | CMS-style events ↔ skill findings |
### Tooling
| File | Purpose |
|------|---------|
| [cli-installation-guide.md](cli-installation-guide.md) | Aliyun CLI install and configure |
### Scenario SOPs (`sop-*.md`)
| File | Typical signals |
|------|-----------------|
| [sop-activating-change-stuck.md](sop-activating-change-stuck.md) | Long `activating`, change records + engine Red |
| [sop-cluster-health.md](sop-cluster-health.md) | Red/Yellow, node loss, pending tasks, master election |
| [sop-configuration.md](sop-configuration.md) | Risky cluster/index settings, API error rate, slow recovery |
| [sop-cpu-load.md](sop-cpu-load.md) | Sustained or peak CPU, load imbalance |
| [sop-disk-storage.md](sop-disk-storage.md) | Disk watermarks, IO bottleneck, read-only / flood |
| [sop-memory-gc.md](sop-memory-gc.md) | Heap pressure, GC, circuit breakers, OOM |
| [sop-node-load-imbalance.md](sop-node-load-imbalance.md) | CPU / traffic / data skew (CV-style imbalance) |
| [sop-query-thread-pool.md](sop-query-thread-pool.md) | Search rejects, queue, slow queries |
| [sop-service-avalanche.md](sop-service-avalanche.md) | “Node down” but process up; `all shards failed` + high CPU |
| [sop-write-performance.md](sop-write-performance.md) | Write rejects, ingest latency, indexing QPS drop |
---
## For agents
1. Follow the workflow in the repo root **[`SKILL.md`](../SKILL.md)** (especially the signal → SOP routing table).
2. Prefer **`SKILL.md`** over ad-hoc reading order; open SOPs only when the observed signal matches.
3. Report text and evidence keys in **`scripts/check_es_instance_health.py`** are **English** — keep doc examples aligned (e.g. `cluster_status_latest`, `affected_nodes`, `configured_watermark_low`).
---
## Contributing
Edits to references should stay consistent with **`SKILL.md`** and the checker script’s field names. Prefer small, accurate updates over duplicating long procedures that already live in a SOP.
FILE:references/acceptance-criteria.md
# Acceptance criteria: alibabacloud-elasticsearch-instance-diagnosis
**Scenario**: Elasticsearch instance diagnosis
**Purpose**: Skill test / acceptance checklist
---
> **Version notes (2026-03)**
> - Control-plane OpenAPI collection uses the `aliyun` CLI only.
> - Health diagnosis entry point: `python3 scripts/check_es_instance_health.py ...`.
> - Engine-level deep collection uses `curl` against ES REST APIs (no `invoke_es_api.py`).
## 1. Environment dependencies
### 1.1 CLI dependency
#### ✅ CORRECT
```bash
aliyun version
```
#### ❌ INCORRECT
```bash
# Error: aliyun CLI not available
aliyun: command not found
```
---
## 2. Credential configuration
### 2.1 OpenAPI credentials (CLI profile)
#### ✅ CORRECT
```bash
aliyun configure list
aliyun --profile <profile_name> sts get-caller-identity
```
#### ❌ INCORRECT
```bash
# Error: profile does not exist
aliyun --profile not-exist sts get-caller-identity
```
### 2.2 Direct ES credentials
#### ✅ CORRECT
```bash
[[ -n "$ES_ENDPOINT" ]] && echo "ES_ENDPOINT: SET" || echo "ES_ENDPOINT: NOT SET"
[[ -n "$ES_PASSWORD" ]] && echo "ES_PASSWORD: SET" || echo "ES_PASSWORD: NOT SET"
```
#### ❌ INCORRECT
```bash
# Error: ES password not configured
unset ES_PASSWORD
```
---
## 3. Running the diagnosis
### 3.1 Main health-check entry
#### ✅ CORRECT
```bash
python3 scripts/check_es_instance_health.py \
-i es-cn-xxxxx -r cn-hangzhou \
--data-source cli \
--profile <profile_name>
```
**Expected output**:
- Structured report (P0/P1/P2, evidence, remediation)
- When multiple major dimensions fire, an **incident timeline (recency-ordered)** section aligning narrative with **when** signals peaked in the window (see §6.6)
- Summary of key monitoring metrics
- No dependency on deprecated scripts
### 3.2 Injected-input + auto backfill
#### ✅ CORRECT
```bash
python3 scripts/check_es_instance_health.py \
-i es-cn-xxxxx -r cn-hangzhou \
--data-source auto \
--input-json-file /path/to/diag-input.json \
--profile <profile_name>
```
**Expected output**:
- Input JSON fields take precedence when present
- Missing fields are backfilled via CLI OpenAPI
---
## 4. OpenAPI coverage
### 4.1 Elasticsearch OpenAPI
#### ✅ CORRECT
```bash
aliyun --profile <profile_name> elasticsearch DescribeInstance \
--region cn-hangzhou \
--InstanceId es-cn-xxxxx
aliyun --profile <profile_name> elasticsearch ListSearchLog \
--region cn-hangzhou \
--InstanceId es-cn-xxxxx \
--type INSTANCELOG \
--query "*" \
--beginTime <epoch_ms> \
--endTime <epoch_ms>
aliyun --profile <profile_name> elasticsearch ListActionRecords \
--region cn-hangzhou \
--InstanceId es-cn-xxxxx
aliyun --profile <profile_name> elasticsearch ListAllNode \
--region cn-hangzhou \
--InstanceId es-cn-xxxxx
```
### 4.2 CMS OpenAPI
#### ✅ CORRECT
```bash
aliyun --profile <profile_name> cms DescribeMetricList \
--region cn-hangzhou \
--Namespace acs_elasticsearch \
--MetricName ClusterStatus \
--Dimensions '[{"clusterId":"es-cn-xxxxx"}]' \
--StartTime <epoch_ms> \
--EndTime <epoch_ms> \
--Period 300
aliyun --profile <profile_name> cms DescribeSystemEventAttribute \
--region cn-hangzhou \
--Product elasticsearch \
--SearchKeywords es-cn-xxxxx \
--StartTime <epoch_ms> \
--EndTime <epoch_ms>
aliyun --profile <profile_name> cms DescribeMetricMetaList \
--region cn-hangzhou \
--Namespace acs_elasticsearch
```
**Expected output**:
- Success shape (`Code=200` or `Success=true`)
- Key fields present (instance status, logs, metrics, events, nodes, change records)
---
## 5. Engine-level ES API checks (`curl`)
If calls fail or return **401** / timeouts, classify the failure using **[es-api-call-failures.md](es-api-call-failures.md)** before judging PASS / PARTIAL.
### 5.1 Calls
#### ✅ CORRECT
```bash
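# ES_USERNAME is optional and falls back to "elastic"; ${ES_ENDPOINT#*://} drops any scheme.
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_cluster/health?pretty"
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  -H "Content-Type: application/json" \
  -X POST "http://${ES_ENDPOINT#*://}/_cluster/allocation/explain?pretty" -d '{}'
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}/_nodes/stats/thread_pool?pretty"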
```
#### ❌ INCORRECT
```bash
# Error: wrong URL path (missing leading /)
curl -sS -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
  "http://${ES_ENDPOINT#*://}_cluster/allocation/explain"
```
---
## 6. Report format
### ✅ CORRECT report shape
```markdown
## Diagnosis summary
**Instance**: es-cn-xxxxx (cn-hangzhou)
**Window**: 2026-03-24 10:00 ~ 2026-03-24 12:00
### Incident timeline (recency-ordered)
- Optional in minimal reports; **required** when search vs write vs GC show **different** emphasis over time (SKILL Step 4).
### Findings (by priority)
#### P0 - Critical (immediate action)
- [HealthCheck.ClusterUnhealthy] Cluster status Red
- Evidence: ClusterStatus=2, UNASSIGNED primary shards
- Likely cause: shard allocation failure
- Immediate action: run `POST /_cluster/allocation/explain`
```
### 6.1 Cluster Red / Yellow — optional “full checklist” (acceptance-style)
Use when you want the report to align with **structured acceptance rubrics** (control plane vs data plane already correct). These are **additive**, not a substitute for `allocation/explain` + `sop-cluster-health.md`.
| Item | What to add (one line each when data exists) |
|------|-----------------------------------------------|
| **Red vs Yellow (B5)** | State **`unassigned_primary_shards` > 0** (or equivalent: “at least one **primary** shard is `UNASSIGNED`”) for **Red**; for **Yellow**, primaries assigned but replicas unassigned. |
| **Shard arithmetic (B6)** | e.g. `number_of_shards=3`, `number_of_replicas=1` → **3 primary shard copies + 3 replica shard copies** (one **replica shard** per primary), **6** logical copies → all may show `UNASSIGNED` when allocation cannot place any copy. Avoid wording that sounds like “`number_of_replicas` = 3”. |
| **`unassigned.reason` (C4)** | From `GET /_cat/shards` / explain: cite reason when present (`ALLOCATION_FAILED`, filter / require deciders, etc.). **Optional (stricter scoring):** mention **`INDEX_CREATED`** when it appears — often consistent with “shard never allocated since creation” under a failing filter (e.g. `require._name` to a non-existent node). |
| **Blast radius (B8)** | Name affected indices (from `_cat/shards`); if only one hot index, say other indices were not failing (if `_cat/shards` supports that claim). |
| **Scale-out vs `total_shards_per_node` (B9)** | Do not promise Green from “add a few nodes” when a **per-index per-node cap** stays tight — verify **(data nodes × cap) ≥** total shard copies for that index (see `sop-cluster-health.md` §2 Yellow; e.g. **cap=1** + **2p/1r** ⇒ **4** copies ⇒ **3** nodes often still Yellow). |
| **Post-fix verification (D3)** | After remediation, suggest re-check: `GET /_cluster/health`, `GET /_cat/shards/{index}?v` (and `allocation/explain` if any shard still unassigned). |
| **Ruling out disk / node / CPU** | Prefer: “**In this report’s metrics window and collected evidence**, disk / node loss / CPU are **not** the dominant driver” — not an absolute “unrelated everywhere” claim (other clusters or later windows may differ). |
| **ClusterStatus single voice** | If one paragraph cites **CMS `ClusterStatus` max = Yellow** (or Red) for **`{begin} ~ {end}`** and another cites **`GET /_cluster/health` = green** at a **single** probe time, **reconcile in one sentence**: e.g. **worst status in the monitoring window** vs **snapshot after recovery** vs **instantaneous engine API** — do not leave **Green** and **Yellow** side by side without **time / aggregation** qualifier (parallel sections should not contradict). |
| **CMS `ClusterShardCount` (or total-shard) swings** | If **CMS** shows **large step changes** in **total shard count** (e.g. **76 → 38 → 76**) within or across windows, **do not** imply **“half the shards were lost”** unless **`GET /_cat/shards`** / engine health **confirms** loss — **reconcile** with **Alibaba Cloud console** change / ops events, **metric definition** (scope: data nodes vs cluster, replication, aggregation lag), and **instance lifecycle**. Prefer wording such as **“pending cross-check with ops records and engine shard view”** when the drop is **not** explained by allocation state. |
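
The shard arithmetic (B6) and per-node cap (B9) rows can be checked with a few lines of arithmetic. The sketch below uses hypothetical function names and assumes a single index. It also applies the rule that a replica is never placed on the same node as its primary, so at least `1 + number_of_replicas` data nodes are needed regardless of the cap.

```python
import math


def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus replicas: each primary has `number_of_replicas` replica copies."""
    return number_of_shards * (1 + number_of_replicas)


def min_data_nodes(number_of_shards: int, number_of_replicas: int,
                   total_shards_per_node: int) -> int:
    """Smallest data-node count that can hold every copy of one index under the cap."""
    copies = total_shard_copies(number_of_shards, number_of_replicas)
    by_cap = math.ceil(copies / total_shards_per_node)
    # A replica cannot share a node with its primary.
    by_replicas = 1 + number_of_replicas
    return max(by_cap, by_replicas)
```

With `number_of_shards=3` and `number_of_replicas=1` this gives 6 copies. With 2 primaries, 1 replica, and a cap of 1 it gives 4 copies and a minimum of 4 data nodes, so a 3-node cluster leaves at least one copy unassigned.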
### 6.2 Read-heavy CPU + search pool (optional checklist)
Optional lines when **query-driven** overload is present (e.g. `_tasks` shows `*search*` on a hot index with `match_all`-style bodies):
| Item | What to add |
|------|-------------|
| **CPU ↔ same workload** | One explicit sentence: **NodeCPU / `hot_threads` and `search` pool queue or `rejected` stem from the same concurrent search load** (name index + task shape when known). |
| **Shard counts / layout in prose** | Once **`GET {index}/_settings`** (and templates) and **`_cat/shards`** **confirm** counts and placement, **use those values consistently** in the report — they override assumptions from any other environment. Optionally one line on **provenance** (template, console, legacy index). |
| **Hot node vs skew + `rejected`** | Summarize the overloaded side as a **hot node** when **`rejected` / heap / GC** align there. Cite **primary/replica skew** as **one factor** in **local** work and pressure, **plus** **parallel** read-path explanations (coordinator / single client path / `preference=_primary` / routing) — not “primaries necessarily answer most default searches.” See [sop-query-thread-pool.md](sop-query-thread-pool.md) §2. |
| **CMS vs thread_pool time alignment** | State whether **CMS CPU** overlaps the **search queue / reject** episode — including **CPU no longer high** while **`rejected` is cumulative** and **`queue` already zero** (post-spike phenomenology). |
| **CMS CPU metric hygiene** | If CMS **NodeCPU** looks modest vs strong **`search.rejected`**, add **one reconciliation block**: burst **early in window** vs CMS **whole-window** stat; metric **name + `Dimensions`** (this instance’s **data nodes**, not wrong rollup); optional **`_nodes/stats/os`** vs sample period. See [sop-cpu-load.md](sop-cpu-load.md) decision-tree notes. |
| **Per-node CPU % vs search “hot” node** | If a **per-node CPU table** (e.g. **5-minute mean** or **window aggregate**) shows **lower** CPU on the node that **logs / `thread_pool.search` / `EsRejectedExecutionException`** identify as the **search hotspot**, add **one sentence** so readers are not stuck: **spike moment ≠ coarse mean**; **CMS bucket / dimension** may not align with **sub-minute** pool saturation; **engine or INSTANCELOG lines** anchor **which node** ran the saturated **`search` pool** — the table is **imbalance context**, not a disproof of pool rejections on that node. |
| **Shard placement phrasing** | Use **consistent** qualifiers — **all** vs **N / N** vs **almost all** — title and tables must match; do not imply exceptions without a shard row to cite. |
| **Short / ambiguous index names** | For generic names (e.g. **`stats`**), add **purpose** (built-in / monitoring vs business-owned) or **full index pattern** so readers do not conflate unrelated indices. |
| **`rejected` counter** | Call out **cumulative since process start** (unless you show a **delta**); do not imply the full counter equals rejects “only in the report window.” |
| **`hot_threads` / logs** | Say whether **`hot_threads` ran successfully** vs **timeout / empty** after spike; for OpenAPI logs, **success vs failure** (e.g. metrics API unavailable). Treat failures as **evidence gaps**, not silent omissions. |
| **Slow-log node vs search-pool node in other logs** | **SEARCHSLOW** / **fetch**-phase lines may name **one data node** while **INSTANCELOG** / **thread_pool** lines at **another minute** name a **different node** as **`search` saturated** — both can be **true**: **query routing**, **primary/replica**, **coordinator vs data role**, and **time skew**. Add **one sentence**: each line reflects **that phase / time / shard copy**; ground **routing** with **`GET /_cat/shards/{index}`** (and **which shard** the slow query hit) so the report does **not** read as **self-contradictory**. |
| **Wording** | Prefer **co-occurring signals** over a strict single-arrow causal chain unless ordering is evidenced. |
| **P0 / P1 band** | Match the **strongest fired rule** in the window (often **P0** CPU and/or **P0** `ThreadPool.Search*` per [health-events-catalog.md](health-events-catalog.md)); avoid a **P1-only** headline when **P0** rules already fired. |
### 6.3 JVM / fielddata / circuit breakers (optional checklist)
Use when **heap**, **GC**, **breaker**, or **fielddata** signals appear — cluster may still be **Green**.
| Item | What to add |
|------|-------------|
| **`_cluster/settings` first-class** | **`GET _cluster/settings?include_defaults=true`** — read **transient** and **persistent** for **`indices.breaker.fielddata.limit`**, **`indices.breaker.request.limit`**, and related keys; they are often changed **together**. |
| **Headline vs Old GC** | If settings show **very low breaker limits** and/or **`_nodes/stats/breaker`** shows **`tripped` > 0** / logs show **`CircuitBreakingException`**, lead with **breaker + settings + query/mapping** — **Old GC rate** as **parallel** or **secondary**, not the only story. See [sop-memory-gc.md](sop-memory-gc.md) §5. |
| **`tripped` vs P2-only wording** | Low **`fielddata.limit`** without trips is **config risk**; **`tripped` > 0** or matching exceptions → **incident-grade** narrative (usually alongside **P0**-class breaker handling in the catalog). |
| **Index-level closure** | Name **indices** and **mapping** culprits (`text` + **`fielddata`**, large **`terms` / `cardinality`**, deep paging) when engine evidence exists. |
| **Heap skew across nodes** | Correlate uneven **heap** with **`_nodes/stats/breaker`**, **`_nodes/stats/indices/fielddata`**, **`_cat/shards`** — not **only** shard counts. |
### 6.4 Write-path saturation + CPU + JVM (optional checklist)
Use when **ingest / bulk**, **`ThreadPool.WriteRejected`**, **CPU peaks**, and/or **Old GC / heap** signals **co-occur** — for example **sustained high-QPS bulk** indexing that stresses the **write** pool.
**Write-queue / bulk prompts (rubric alignment):** Graders often treat **`ThreadPool.WriteRejected`** + **`_nodes/stats/thread_pool`** as the **primary** storyline. A conclusion that **only** headlines **Old GC / heap** while relegating **write-pool saturation** to “P1 / participating factor” can read as a **JVM/GC report**, not a **write-capacity** diagnosis. Prefer one of:
- **Dual P0 (parallel):** **Opening** treats **quantified write-pool stress** (**`write` `rejected`** **and** **`completed`**, or **reject share** per node or cluster) **and** **Old GC wall-clock / `GCTimeRatioTooHigh`** as **co-equal** P0-class signals (see [health-events-catalog.md](health-events-catalog.md) **`ThreadPool.WriteRejected`** / **`HealthCheck.ThreadPoolSaturation`**).
- **Causal chain (upstream first):** **First sentence** states **bulk / ingest saturates the `write` pool** → **merges / heap** → **Old GC + CPU** — so **`write.rejected`** is in the **lead**, not buried below a GC headline.
Add **at least one line of numbers**: e.g. **`rejected` / `completed`** on the **busiest node(s)**, optional **reject share** (approximately `rejected / (rejected + completed)`) for the same pool, and **rule / event names** when citing CMS (e.g. **`ThreadPool.WriteRejected`**).
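The reject-share arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming the counters were already read from `GET /_nodes/stats/thread_pool`; the helper name `reject_share` and the node names and values are hypothetical, not part of the skill's tooling.

```python
def reject_share(rejected: int, completed: int) -> float:
    """Approximate share of tasks a thread pool rejected.

    Both counters are cumulative since node start, so the result describes
    the node's whole uptime, not only the report window.
    """
    total = rejected + completed
    return rejected / total if total else 0.0


# Hypothetical per-node `write` pool counters (illustrative values only).
write_pool = {
    "node-1": {"rejected": 1200, "completed": 58800},
    "node-2": {"rejected": 15, "completed": 59985},
}

for node, stats in sorted(write_pool.items()):
    share = reject_share(stats["rejected"], stats["completed"])
    print(
        f"{node}: rejected={stats['rejected']} "
        f"completed={stats['completed']} reject_share={share:.2%}"
    )
```

Quoting the share next to the raw counters keeps the "line of numbers" comparable across nodes with very different uptimes or traffic.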
Align structured reports with **[health-events-catalog.md](health-events-catalog.md)**: CMS **P0** for write rejects usually requires **sustained reject rate + traffic**, not cumulative counters alone — **human-written** reports should still **quote cumulative `rejected`/`completed` with context** so the **data-plane** story is concrete.
**Engine snapshot (`check_es_instance_health.py`):** **`rejected` > 0** is emitted as **P1** when **`JVMMemory.GCTimeRatioTooHigh`** is **not** in the same finding set. When **`GCTimeRatioTooHigh` (P0)** **does** fire, the script **promotes `ThreadPool.WriteRejected` from P1 → P0** so severity bands match a **dual-P0** or **causal-chain** narrative — see table below. **Co-occurrence is not proof of causation**; use **`hot_threads`** / **`_tasks`** to confirm an **ingest-heavy** path before asserting **write overload → GC** as the only story.
| Item | What to add |
|------|-------------|
| **Headline priority (dual P0)** | When **write-path evidence** and **`JVMMemory.GCTimeRatioTooHigh` / Old GC** are **both** material, prefer **two P0 headlines** (or **one causal chain**, not JVM-only): e.g. **write pool saturation / rejects** **and** **Old GC wall-clock share** — not “GC only.” Optional chain: **write overload → merge / heap → Old GC → CPU spikes** (only if evidence supports ordering). |
| **GC collector wording** | Do **not** assume **G1** vs **CMS** (here the Concurrent Mark Sweep collector, not the CMS monitoring service referenced elsewhere in this document) or another collector from the ES/JDK version alone. Prefer **“old-generation / Old GC”** or cite **`GET _nodes/stats/jvm`** (`gc.collectors.*`) / node GC logs, so the report cannot be challenged over a wrong collector name. |
| **Script vs narrative** | If the checker still prints **WriteRejected as P1** (no `GCTimeRatioTooHigh` in the window) but CMS/catalog would grade **P0**, say so in one line — **do not** silently downgrade the user-facing conclusion. |
| **`rejected` + `completed` + `queue` / `active`** | Include **`completed`** (same pool, per node or total) next to **`rejected`** to size **reject share**; state **cumulative since node start** for **`rejected`** / **`completed`**; add **current** `queue` / `active` so readers do not confuse **history** with **ongoing** overload. |
| **Per-node skew** | When **`rejections_by_node`** is uneven, tie to **hot shards / routing / transient skew** — same direction as **CPU imbalance** checks. |
| **Shard count vs `write.rejected` skew** | If **shards per data node** are **even** (e.g. similar counts from **`_cat/shards`** / routing) but **`write.rejected`** is **much higher on one node**, state explicitly: imbalance is **unlikely** from **shard count alone** — favor **hot shards / routing / client targets**; cite **numbers** for both **shards/node** and **per-node `rejected`** when available. |
| **Parent breaker `tripped`** | **`tripped`** is **cumulative** (breaker semantics: trips **since JVM start**). A **non-zero** **`parent.tripped`** (e.g. **183**) is **aligned with historical peaks** / past pressure and need **not** contradict **current** `_cat/nodes` heap **~50%** — add **half a sentence** that **`tripped` reflects cumulative history**, not instantaneous heap %. Relate to **historical** heap pressure alongside **`rejected`**; reconcile with **current** heap **in one sentence**. |
| **Bulk / client guidance** | Under **reject / capacity-tight** conditions, prefer: **lower parallel bulk streams**, **smaller per-request bytes/doc count**, **backoff / fewer in-flight bulks**; align total throughput to SLO. If mentioning **larger batches**, **immediately** caveat **timeouts / memory spikes / OOM** — avoid implying “bigger single requests” as the fix. See [sop-write-performance.md](sop-write-performance.md) §2 (*bulk QPS → write pool*). |
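The severity promotion described for the engine snapshot earlier in this section can be expressed as a small rule. The sketch below is only an illustration of that rule as stated in the text; `write_rejected_band` is a hypothetical helper and is not the actual code of `check_es_instance_health.py`.

```python
from typing import Optional


def write_rejected_band(
    write_rejected: int, gc_time_ratio_too_high: bool
) -> Optional[str]:
    """Severity band for ThreadPool.WriteRejected under the stated rule.

    Hypothetical helper; it mirrors the prose, not the checker's source.
    """
    if write_rejected <= 0:
        return None  # nothing to report for the write pool
    # Promote to P0 only when the GC finding fired in the same finding set.
    return "P0" if gc_time_ratio_too_high else "P1"


print(write_rejected_band(183, gc_time_ratio_too_high=False))
print(write_rejected_band(183, gc_time_ratio_too_high=True))
```

If the band comes back as P1 while the catalog would grade the window P0, the report should say so in one line rather than adopt the lower band silently.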
### 6.5 Read-heavy scenarios (search pool vs GC-only headline)
Use when the **incident** is **query-driven** (high concurrent search, terms / QPS patterns, “search reject → cascade” timelines). **Section 6.4** already guards **write-path** vs **GC-only** headlines; **section 6.2** covers **search pool + CPU** alignment — this subsection fixes the **remaining gap**: **Old GC / CPU as the only lead** when **`search` pool saturation** is the **primary** plausible storyline. **Long-form narrative, Chinese templates, and evidence closure:** [sop-query-thread-pool.md](sop-query-thread-pool.md) **Report narrative: search pool vs GC / CPU headlines** (keep this section as a **short rubric**; avoid duplicating those paragraphs here).
**Search-pool-primary vs write (both pools have cumulative `rejected`):** For **query-saturation** cases where the catalog targets **`ThreadPool.SearchRejected`** and **`_nodes/stats/thread_pool`** shows **`search.rejected` ≫ `write.rejected`** on the same data nodes, the **executive lead** and **first P0 bullets** must **not** read like a **write-primary** report with search as a footnote — put **`search` pool / query concurrency** **first** (or **before** write in the same band). **`write.rejected`** may remain **P0/P1** as **parallel** or **secondary** (bulk, catch-up, historical ingest); **say so explicitly** so readers do not infer “mainly an ingest/write incident.” Tie the **search** lead to **high concurrent query / terms / slow query** and **hot index name** when verified.
| Item | What to add |
|------|-------------|
| **Headline order** | Do **not** make **“Old GC (P0) + CPU spike (P1)”** the **sole** executive story when **`ThreadPool.SearchRejected`** / **`search` pool capacity** is still plausible — treat **query concurrency > `search` pool** as **first or co-equal**; **GC/CPU** as **co-stress, cascade, or second wave** unless engine evidence proves otherwise. **Parallel to §6.4:** **dual P0** (search capacity + Old GC) or **causal chain** with **`search.rejected` / queue in the first sentence** when the prompt is read-heavy — mirror [sop-write-performance.md](sop-write-performance.md) §2 ordering, swap **`write` → `search`**. |
| **P0 / executive order vs `search` ≫ `write` (cumulative)** | If **`search.rejected` ≫ `write.rejected`** per node (or cluster totals) and **`ThreadPool.SearchRejected`** applies, **do not** place **`ThreadPool.WriteRejected`** **above** search in the **opening summary** or **first P0 line** unless **time-resolved** evidence shows **write** dominated the window. **Script print order** (e.g. checker listing write before search) is **not** narrative order — override for human-facing text when data-plane magnitudes and scenario type say **search-first**. |
| **Quantification** | Same spirit as **§6.4**: cite **`search.rejected`** with **`search.completed`** (optional reject share, cumulative semantics) when stating capacity — not only “queue was high.” |
| **Rules + summary** | If **both** **GC P0-class** and **`ThreadPool.SearchRejected`** (or catalog equivalent) appear in the **same window**, **list both** in the opening summary — do not imply **only GC** is P0. |
| **Engine API incomplete** | Use the **self-limiting** templates in [sop-query-thread-pool.md](sop-query-thread-pool.md) (**Report narrative: search pool vs GC / CPU headlines**): state **`GET /_nodes/stats/thread_pool` (`search`)** as **pending verification** when timeouts prevented collection. |
| **Customer-facing wording** | In Chinese-language deliverables, prefer **“引擎层必查清单(SKILL 文档第 5 节)”** (“engine-layer mandatory checklist (SKILL document, section 5)”) over **§**-prefixed section labels in external deliverables. |
### 6.6 Timeline vs severity (recency-ordered narrative)
Use when **multiple** of: **`ThreadPool.WriteRejected`**, **`ThreadPool.SearchRejected`**, **GC / heap / CPU** appear together. **Severity bands (P0/P1)** are **not** a substitute for **when** each path stressed the cluster inside `{begin} ~ {end}`.
| Item | What to add |
|------|-------------|
| **Section present** | An **`### Incident timeline (recency-ordered)`** (or equivalent) with **time-ordered** or **“latter-window emphasis”** bullets — unless the report is explicitly minimal **and** recency cannot be distinguished (then one **uncertainty** line). |
| **No false recency from counters** | **`search.rejected` / `write.rejected`** are **cumulative since node start** — do **not** say “write happened first” from **counter magnitude alone**; use **CMS series**, **slow logs**, or **paired deltas** on `rejected`/`completed`. |
| **CMS peaks** | Where available, cite **which metric** peaked **when** (e.g. CPU vs GC duration vs pool-related CMS names) relative to **window start vs end**. |
| **Executive order** | **Whichever path peaked or persisted closer to window end** should **lead** the **opening summary** (recency-weighted): e.g. if **search-path** pressure is **more recent** than **write-path**, **lead** with **search / query concurrency** (and co-stress) **as appropriate**; if **write-path** is **more recent**, **lead** with **write / bulk**. Same **time-over-magnitude** rule as **§6.5** row **P0 / executive order vs `search` ≫ `write`** (override when **time-resolved** evidence shows the **other** path dominated). Do **not** drop **P0** findings for the **other** path from **Findings (by priority)**. |
| **Script print order** | **`check_es_instance_health.py` listing order** is **not** proof of temporal order — override for narrative when **time-resolved** evidence supports it. |
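The "paired deltas" idea from the table above can be illustrated as follows. This is a sketch under stated assumptions: two samples of the same pool on the same node were captured near the start and end of the window, and the `PoolSample` type, the `window_delta` helper, and the counter values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolSample:
    rejected: int
    completed: int


def window_delta(begin: PoolSample, end: PoolSample) -> Optional[PoolSample]:
    """Counters accumulated between two samples of the same pool and node.

    Returns None when a counter went backwards, which suggests the node
    restarted between samples and the pair cannot be compared.
    """
    d_rejected = end.rejected - begin.rejected
    d_completed = end.completed - begin.completed
    if d_rejected < 0 or d_completed < 0:
        return None
    return PoolSample(d_rejected, d_completed)


# Illustrative values: write has the larger cumulative counter,
# but only search rejected anything inside the window.
search = window_delta(PoolSample(100, 10_000), PoolSample(600, 14_000))
write = window_delta(PoolSample(9_000, 500_000), PoolSample(9_000, 520_000))
print("search delta:", search)
print("write delta:", write)
```

In this example the cumulative `write.rejected` is far larger, yet the in-window delta points at the search path, which is the case the "no false recency from counters" row warns about.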
---
## 7. Common mistakes
| Issue | Bad example | Correct approach |
|-------|-------------|------------------|
| Hard-coded secrets | AK/SK/password in command line | CLI profile + environment variables |
| Wrong scheme | `http://` vs `https://` mismatch | Match the instance endpoint |
| Unconfirmed params | Guessing region / instance id | Confirm with the user first |
| Skipping RAM | Ignoring permission errors | Validate RAM Actions first |
| P0 order = time order | Listing write-path P0 before search P1 because “P0 first” when CMS shows **search stress later** in the window | Add **Incident timeline**; **lead** the executive summary with **recency-weighted** impact; keep full **Findings by priority** |
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.1+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.1 or later for full plugin ecosystem coverage.
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.1)
aliyun version
```
**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
### Windows
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Verify: `aliyun version`
## Configuration
### Quick Start
```bash
aliyun configure set \
--mode AK \
--access-key-id <your-access-key-id> \
--access-key-secret <your-access-key-secret> \
--region cn-hangzhou
```
### Environment Variables
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key_id
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_key_secret
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
## Verification
```bash
# Check configuration
aliyun configure list
# Test authentication
aliyun ecs describe-regions
```
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
FILE:references/es-api-call-failures.md
# Elasticsearch REST API call failures (progressive guide)
Use this document **in order** when `curl` to the data plane fails or returns no useful JSON. For **when** engine APIs are mandatory vs optional, see [es-api-diagnosis-strategy.md](es-api-diagnosis-strategy.md) (sections 1–3: feasibility, necessity, MUST triggers).
**Security:** never print `ES_PASSWORD` or ask for it in chat — only [SKILL.md](../SKILL.md) section 2.2 patterns (local `export` in the same shell the agent uses).
---
## 0. Mandatory order (agents and operators)
Do **not** infer “this environment has no Elasticsearch credentials” from a lone **unauthenticated** `curl` (for example a probe that omits `-u`), even if the response is **401**. **401 without `Authorization` is expected** on secured clusters.
**Always run §1.1 first** in the **same shell** that will execute the diagnosis `curl` chain (for Cursor: the **integrated terminal** session the agent uses). Then:
| Step | Action |
|------|--------|
| 1 | §1.1 presence checks for `ES_ENDPOINT` / `ES_PASSWORD` (and `ES_USERNAME` if you use non-default). |
| 2 | If **both** `ES_ENDPOINT` and `ES_PASSWORD` are **SET** → run authenticated minimal APIs immediately (§1.3). Do not ask the user to export secrets until an **authenticated** attempt fails. |
| 3 | If **either** is **NOT SET** → blocking reason is **missing engine credentials in this shell**; point to [SKILL.md](../SKILL.md) §2.2 and have the user `export` in **that same terminal**, then re-run §1.1 → §1.3. |
| 4 | If authenticated calls return **401** → §1.2 (wrong user/password/realm or URL scheme). |
| 5 | If authenticated calls return **000** / refused / **28** → §2 (transport). |
This order matches **actual runs**: many failures are “agent skipped §1.1” or “wrong `http`/`https`,” not missing cloud APIs.
---
## 1. Preconditions — why the first answer might skip ES JSON
Engine `curl` needs **both**:
| Gate | Meaning |
|------|---------|
| **Executable** | `ES_ENDPOINT` + `ES_PASSWORD` set; TCP/TLS path to Elasticsearch works; authenticated calls return JSON. |
| **Necessary** | The question needs engine proof that CMS / OpenAPI cannot supply (see strategy doc). |
If **Executable** fails, the agent is **not** “forgetting” Elasticsearch: calls either were not possible (no env), were sent **without** auth and got **401**, used the **wrong** auth or **URL scheme**, or hit transport errors — so `_cluster/health`, `/_cluster/allocation/explain`, etc. do **not** yield business JSON.
### 1.1 Default habit (same shell as the agent)
Run **presence only** (never echo secrets):
```bash
[[ -n "$ES_ENDPOINT" ]] && echo "ES_ENDPOINT: SET" || echo "ES_ENDPOINT: NOT SET"
[[ -n "$ES_PASSWORD" ]] && echo "ES_PASSWORD: SET" || echo "ES_PASSWORD: NOT SET"
[[ -n "$ES_USERNAME" ]] && echo "ES_USERNAME: SET" || echo "ES_USERNAME: NOT SET"
```
Interpretation:
- **`ES_ENDPOINT` + `ES_PASSWORD` both SET** → proceed to **§1.3** immediately (minimal authenticated checks). Optional: print **only** the host/scheme for sanity (`echo "$ES_ENDPOINT"` is OK if it contains no secrets; do **not** echo password).
- **Either NOT SET** → state clearly: **this shell has no engine credentials**, not “Elasticsearch is unreachable.” Point to [SKILL.md](../SKILL.md) **section 2.2**; user `export` in the **integrated terminal** (same shell), then you **re-run §1.1 then §1.3** — do not treat an earlier unauthenticated 401 as proof that credentials are absent.
### 1.2 HTTP **401 Unauthorized** (after you know whether auth was sent)
Classify **401**:
| Situation | Meaning | What to do |
|-----------|---------|------------|
| **401** on a request **without** `-u` / `Authorization` | Normal for secured ES; **not** a credential verdict | Run §1.1; if vars SET, retry **with** auth (§1.3). |
| **401** on a request **with** `ES_USERNAME` / `ES_PASSWORD` (or `-u`) | Auth rejected | Fix **local env only**: password, username (often `elastic`), **`http://` vs `https://`** on `ES_ENDPOINT` (must match `DescribeInstance` `protocol` / actual listener), port, or trailing path typos. See [SKILL.md](../SKILL.md) §2.2. |
**Principle:** “We saw 401” is not actionable until the table above is satisfied.
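The 401 classification above, combined with the presence check from §1.1, reduces to a small decision function. The sketch below is illustrative only; `classify_401` and its return labels are hypothetical names, and the inputs are assumed to be known from the request that was actually sent.

```python
def classify_401(status: int, auth_sent: bool, creds_set: bool) -> str:
    """Map an HTTP status plus context to the next action.

    auth_sent: the request carried -u / an Authorization header.
    creds_set: ES_ENDPOINT and ES_PASSWORD are both set in this shell.
    """
    if status != 401:
        return "not-401"
    if auth_sent:
        # Credentials were sent and rejected: fix the local environment
        # (password, username, http/https scheme, port, path typos).
        return "auth-rejected"
    if creds_set:
        # Expected on secured clusters; retry with authentication (section 1.3).
        return "retry-with-auth"
    # No credentials in this shell: the user must export them first.
    return "export-credentials"


for case in [(401, False, True), (401, False, False), (401, True, True)]:
    print(case, "->", classify_401(*case))
```

Only the `auth-rejected` outcome says anything about the credentials themselves; the other two describe how the request was sent.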
### 1.3 Minimal authenticated checks (run as soon as §1.1 passes)
Use timeouts on every call (see [SKILL.md](../SKILL.md) §7). Normalize base URL once:
- If `ES_ENDPOINT` is `host:port` with no scheme, prepend **`http://`** or **`https://`** consistently with the cluster (Alibaba Cloud `DescribeInstance` reports `protocol` for the ES endpoint).
- Strip trailing `/` before appending paths.
Example pattern (do not log password):
```bash
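# Start from the exported endpoint (never echo the password).
BASE="${ES_ENDPOINT}"
case "$BASE" in
http://*|https://*) ;;          # scheme already present
*) BASE="http://${BASE}" ;;     # no scheme: use the one matching DescribeInstance `protocol`
esac
BASE="${BASE%/}"                # strip a trailing slash before appending paths
curl -sS --connect-timeout 10 --max-time 30 -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" "${BASE}/_cluster/health?pretty"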
```
For **`https://`**, add **`--cacert /path/to/ca.pem`** (or **`-k`** for short, non-production tests) if the system trust store does not trust the cluster certificate. **Do not** switch from **`http://` to `https://`** as a blind fix for **`000`/timeout** when **`DescribeInstance` `protocol` is `HTTP`** — prefer allowlist / SG / path checks; see [SKILL.md](../SKILL.md) §2.2 (*If `http://` does not work — when to try `https://`* and *HTTPS — prerequisites*).
When **MUST** triggers require allocation evidence and the cluster is **Red**, prefer **targeted** explain after a quick shard listing — see **§1.4** (empty `explain` body can pick a **replica** while the business-critical failure is an **unassigned primary**).
### 1.4 `POST /_cluster/allocation/explain` — practical pitfall
`POST /_cluster/allocation/explain` with an **empty body** `{}` explains **one** unassigned shard chosen by Elasticsearch, which may be a **replica** (`"primary" : false`). For **Red** clusters, the P0 signal is usually an **unassigned primary**.
**Runbook:**
1. `GET _cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason&s=state` — find `UNASSIGNED` rows with **`p`** (primary).
2. `POST /_cluster/allocation/explain` with body, for example:
```json
{
"index": "<index_from_cat>",
"shard": 0,
"primary": true
}
```
Then read `allocate_explanation`, `unassigned_info`, and `node_allocation_decisions[].deciders[]` (see [sop-cluster-health.md](sop-cluster-health.md)).
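The two runbook steps above can be chained: list shards, keep only unassigned primaries, and build one targeted explain body per shard. The sketch below assumes the shard listing was fetched with `format=json` and the `index,shard,prirep,state` columns, so each row is a dict of strings; the helper name `explain_bodies` and the sample rows are hypothetical.

```python
import json
from typing import Dict, List


def explain_bodies(cat_shards_rows: List[Dict[str, str]]) -> List[dict]:
    """Build allocation/explain request bodies for unassigned primaries only."""
    bodies = []
    for row in cat_shards_rows:
        if row.get("state") == "UNASSIGNED" and row.get("prirep") == "p":
            bodies.append(
                {
                    "index": row["index"],
                    "shard": int(row["shard"]),  # _cat JSON returns strings
                    "primary": True,
                }
            )
    return bodies


# Illustrative rows in the shape of `_cat/shards?format=json` output.
rows = [
    {"index": "orders", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "orders", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "orders", "shard": "1", "prirep": "p", "state": "UNASSIGNED"},
]
for body in explain_bodies(rows):
    print(json.dumps(body))
```

Each printed body can be sent as the payload of `POST /_cluster/allocation/explain`, so the explanation targets the primary rather than whichever shard an empty body would select.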
---
## 2. Transport and TLS outcomes (classify before blaming “network”)
| Pattern | Typical symptom | First direction | See also |
|---------|-----------------|-----------------|------|
| **Connection refused** | `curl` exit **7**, `Connection refused` | Process down, public access off, wrong port | [sop-cluster-health.md](sop-cluster-health.md) |
| **Persistent timeout** | exit **28**, **all** calls time out | **Context-dependent** — section 2.1 below | Allowlist + CMS triage |
| **Intermittent timeout** | **some** curls OK, **some** timeout | Often **overload** / meltdown | [sop-service-avalanche.md](sop-service-avalanche.md) |
| **Auth failure** | HTTP **401** **with** auth headers sent | Wrong/missing password, user, or scheme | §1.2 |
| **TLS mismatch** | e.g. `WRONG_VERSION_NUMBER` | `http://` vs `https://` on `ES_ENDPOINT` | Match **`DescribeInstance` `protocol`**: public **HTTP** listener → **`http://<publicDomain>:9200`**; **`https://`** on HTTP-only causes TLS errors. |
### 2.1 Persistent timeout (exit 28) — context-aware
> Do **not** equate exit 28 with “allowlist only.” `curl` exit 28 means the operation timed out (the connect phase or the overall `--max-time` budget expired); triangulate with control plane + CMS.
Suggested order:
1. **Allowlist:** `DescribeInstance` `publicIpWhitelist` / `networkConfig.whiteIpGroupList` (`whiteIpType=PUBLIC_ES`) vs client egress IP.
2. **If allowlist looks OK but still timing out:**
| If… | Lean toward… | Action |
|-----|---------------|--------|
| CMS `NodeCPUUtilization` > **80%** or `NodeHeapMemoryUtilization` > **85%** | **Server overload** — ES too busy to finish TLS/TCP quickly | Treat as degradation; [sop-service-avalanche.md](sop-service-avalanche.md) |
| CMS OK, security group missing inbound **9200** | Security group | Open path from client |
| CMS OK, allowlist + SG OK | Path / middlebox | `telnet` / `nc` hop check |
**Principle:** exit 28 is a transport-timeout signal, not proof of a firewall drop. If CMS shows resource pressure, prefer **overload** before “pure network.”
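The suggested order above (allowlist first, then CMS pressure, then security group, then path) can be written as a short triage function. This is a sketch under stated assumptions: the inputs were already collected from `DescribeInstance`, CMS, and the security group configuration, and `triage_persistent_timeout` with its return labels is a hypothetical name.

```python
def triage_persistent_timeout(
    allowlist_ok: bool,
    cpu_pct: float,
    heap_pct: float,
    sg_allows_9200: bool,
) -> str:
    """Pick the first direction to investigate for persistent curl exit 28."""
    if not allowlist_ok:
        return "allowlist"
    # Thresholds follow the table above: CPU > 80% or heap > 85%.
    if cpu_pct > 80 or heap_pct > 85:
        return "server-overload"
    if not sg_allows_9200:
        return "security-group"
    return "path-or-middlebox"


print(triage_persistent_timeout(True, 92.0, 60.0, True))
print(triage_persistent_timeout(True, 35.0, 50.0, False))
```

Checking CMS pressure before the security group reflects the principle that resource pressure should be considered before a "pure network" explanation.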
### 2.2 Intermittent timeouts
When some `curl` calls succeed and others time out in the same session:
1. Do **not** label as “preconditions not met” if `ES_*` is set.
2. **Must** cross-check CMS `NodeCPUUtilization`.
3. If CPU > **80%**, treat as suspected meltdown — [sop-service-avalanche.md](sop-service-avalanche.md).
4. Use successful responses as **partial** evidence; state coverage limits in the report.
**Principle:** flaky connectivity is itself a signal — often overload with a **live** process, not a simple misconfig.
### 2.3 Flaky API collection tactics
When the cluster is hot: longer `--connect-timeout` / `--max-time`, start with light APIs (`_cat/nodes`, `_cluster/health`), retry `_nodes/hot_threads` a few times — see [sop-service-avalanche.md](sop-service-avalanche.md) (intermittent timeout entry).
---
## 3. Evidence boundary (one screen)
- With **only** control plane + CMS, or with **no successful authenticated engine JSON**, you may still state **high-confidence** facts such as: CMS **`ClusterStatus` = Red ⇒ at least one primary shard is unassigned**.
- You **cannot** uniquely pin **allocation-explain-class** causes (e.g. bad `routing.allocation.require._name`, exact decider text) **without** at least one **authenticated successful** `/_cluster/allocation/explain` targeting the relevant shard (or equivalent) — see **§1.4**.
- Say explicitly: the gap is **missing authenticated data-plane evidence** or **wrong shard in explain output**, not “Elasticsearch cannot be diagnosed.”
---
## 4. Report checklist (when MUST data is missing)
Put at the **top** of the report:
1. **One line blocking reason** — only **after** §1.1: e.g. `ES_PASSWORD NOT SET in this shell`, **`401` with authenticated curl** (then scheme/user/password), **connection refused**, **persistent vs intermittent timeout** — **before** a long generic “evidence boundary” paragraph.
2. Which **MUST** conditions fired (from CMS / script).
3. Which engine evidence types **would** be collected after auth/path works (`allocation/explain` with **primary** shard if Red, `hot_threads`, …).
4. Pointer to [SKILL.md](../SKILL.md) **section 2.2** only when §1.1 shows vars **not** set; if vars are set but calls fail, point to **§1.2 / §2** instead.
---
## 5. Related references
| Doc | Role |
|-----|------|
| [es-api-diagnosis-strategy.md](es-api-diagnosis-strategy.md) | MUST / SHOULD / executable vs necessary |
| [verification-method.md](verification-method.md) | `curl` examples and checklists |
| [acceptance-criteria.md](acceptance-criteria.md) | PASS / partial when engine checks blocked |
| [sop-service-avalanche.md](sop-service-avalanche.md) | CPU + intermittent timeouts + `all shards failed` |
| [sop-cluster-health.md](sop-cluster-health.md) | Real node loss vs cluster issues |
FILE:references/es-api-catalog.md
# Elasticsearch REST API catalog (`curl`)
Use `curl` directly against the Elasticsearch data plane.
```bash
export ES_ENDPOINT="<host:9200 or http://host:9200>"
export ES_USERNAME="elastic"
export ES_PASSWORD="<elasticsearch-admin-password>"
# Generic template (HTTP)
curl -sS -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
"http://${ES_ENDPOINT#*://}/<endpoint>?pretty"
```
> If the cluster only serves **HTTPS**, switch to `https://` and add CA options or `-k` for testing only.
---
## Cluster layer
| Endpoint | Purpose |
|----------|---------|
| `GET /_cluster/health` | Cluster health (green/yellow/red) |
| `GET /_cluster/stats` | Cluster statistics (nodes/shards/disk/JVM) |
| `GET /_cluster/settings` | Cluster dynamic settings |
| `GET /_cluster/pending_tasks` | Master pending tasks |
| `POST /_cluster/allocation/explain` | Explain unassigned shards (requires unassigned shards) |
---
## Node layer
| Endpoint | Purpose |
|----------|---------|
| `GET /_cat/nodes?v` | Node list (IP, CPU, heap, load) |
| `GET /_cat/nodes?v&s=cpu:desc` | Nodes sorted by CPU |
| `GET /_nodes/hot_threads?threads=3` | Hot threads (when CPU is high) |
| `GET /_nodes/stats/thread_pool` | Thread pool queue / rejected |
| `GET /_nodes/stats/breaker` | Circuit breaker trips |
| `GET /_nodes/stats/jvm` | JVM / GC stats |
> **Version note**
> - The `?local` flag on `GET /_cat/nodes` was deprecated in 7.x and removed in 8.0 — **do not use it**.
> - Other node APIs are compatible across 6.x / 7.x / 8.x.
---
## Index and shard layer
| Endpoint | Purpose |
|----------|---------|
| `GET /_cat/indices?v&s=pri.store.size:desc` | Indices by primary store size |
| `GET /_cat/indices?v&s=pri:desc` | Indices by primary shard count |
| `GET /_cat/shards?v&s=state` | Shards including unassigned reasons |
| `GET /_cat/allocation?v&bytes=gb` | Per-node allocation and disk |
| `GET /_cat/fielddata?v&s=size:desc` | Fielddata memory (6.x may need extra settings) |
| `GET /_cat/recovery?v&active_only=true` | Active shard recovery |
---
## Task layer
| Endpoint | Purpose |
|----------|---------|
| `GET /_cat/tasks?v&detailed=true` | Running tasks with descriptions |
| `GET /_tasks?detailed=true` | Task details (search/write timeouts) |
---
## Snapshots and ILM
| Endpoint | Purpose |
|----------|---------|
| `GET /_snapshot/_status` | In-flight snapshot operations |
| `GET /_cat/snapshots/<repo>?v` | Snapshot list (repository required) |
| `GET /_ilm/status` | ILM status (**Elasticsearch 7.0+** only) |
---
## Version compatibility
| ES version | Compatible endpoints | Not supported | Notes |
|------------|----------------------|---------------|--------|
| **7.x – 8.x** | 18 / 18 (100%) | — | Full |
| **6.x** | 17 / 18 (94%) | `_ilm/status` | ILM introduced in 7.0 |
**Notes**
- `_ilm/status`: ILM is not available on 6.x.
- Do not use `?local` on `_cat/nodes` on 8.x.
- `_cat/fielddata` on 6.x may require additional settings to return data.
---
## Scenario quick map
| Scenario class | Primary endpoints |
|----------------|-------------------|
| Cluster Red/Yellow, unassigned shards | `POST /_cluster/allocation/explain` + `GET /_cat/shards` |
| Sustained high CPU, hot threads | `GET /_nodes/hot_threads` + `GET /_cat/nodes` |
| JVM pressure, GC | `GET /_nodes/stats/jvm` + `GET /_nodes/stats/breaker` |
| Thread pool saturation / rejections | `GET /_nodes/stats/thread_pool` |
| Disk pressure, large indices | `GET /_cat/allocation` + `GET /_cat/indices` |
| Slow search/write, timeouts | `GET /_cat/tasks` + `GET /_tasks?detailed=true` |
| Snapshot failures | `GET /_snapshot/_status` + `GET /_cat/snapshots/<repo>` |
| Master backlog | `GET /_cluster/pending_tasks` |
| ILM issues | `GET /_ilm/status` |
FILE:references/es-api-diagnosis-strategy.md
# Elasticsearch REST API diagnosis strategy (MUST rules)
This note captures Q&A from skill usage: when `SKILL.md` says ES APIs are **MUST**, why some paths still rely on control plane + CMS only, and how skill text relates to tooling.
**When `curl` fails** (401, timeouts, refused): use the progressive guide [es-api-call-failures.md](es-api-call-failures.md) first, then return here for MUST / SHOULD rules.
**SKILL alignment:** MUST-trigger workflow (**reachability** including `ES_*` and endpoint match, then **required `curl` per fired row**) is **normative** in `SKILL.md` §5 (*Binding rule (MUST triggers)*), Step 2, and §7. The CMS tables in this file are supplementary; if workflow nuance differs, follow **`SKILL.md`** for agent execution.
---
## 1. Decision rules in the skill (evidence-driven)
Calling engine-layer `curl` (REST) needs **both**:
| Condition | Meaning |
|-----------|---------|
| **Executable** | `ES_ENDPOINT` and `ES_PASSWORD` are set, and the path to port `9200` works (scheme, allowlists, security group, etc.). |
| **Necessary** | Root-cause reasoning needs engine-layer proof that control plane / CMS cannot supply or refute. |
### Summary
- **MUST (run ES APIs)**
- Cluster Red/Yellow: unassigned shard reasons, allocation filters.
- Search/write performance: thread pools, rejections, hot threads, breakers, task backlog.
- Cases where control plane looks fine but the engine is not.
- **SHOULD (prefer ES APIs)**
- CPU/memory/load anomalies that disagree with control-plane signals — cross-check.
- Post-change incidents — confirm stuck recovery or long-running tasks on the engine.
- **CAN SKIP**
- Root cause is already closed on the control plane and engine data would not change the answer (e.g. clear RAM denial, instance in a controlled change window, billing/resource state making the instance unavailable).
---
## 2. MUST trigger signals (CMS)
**Treat as MUST when any of the following CMS signals appear:**
| CMS signal | MUST theme | Typical ES API evidence |
|------------|------------|-------------------------|
| `ClusterStatus` max ≥ 1 (Yellow) or ≥ 2 (Red) | Cluster health | `/_cluster/allocation/explain`, `/_cat/shards` |
| `ClusterDisconnectedNodeCount` max > 0 | Node loss | `/_cat/nodes`, `/_cluster/health` |
| `NodeCPUUtilization` max > 80% | CPU overload | `/_nodes/hot_threads`, `/_tasks` |
| `NodeHeapMemoryUtilization` max > 85% | Memory pressure | `/_nodes/stats/jvm`, `/_nodes/stats/breaker` |
| `NodeDiskUtilization` max > 85% | Disk pressure | `/_cat/allocation`, `/_cat/shards` |
| Thread pool `rejected` > 0 on any node | Performance | `/_nodes/hot_threads`, `/_nodes/stats/thread_pool` |
| CPU / memory / disk CV (coefficient of variation) > 0.3 across nodes | Imbalance | `/_cat/shards`, `/_cat/allocation` |
| **Every diagnosis (ALWAYS)** | **Disk watermarks / index read-only** | `/_cluster/settings`, `/_all/_settings?filter_path=*.settings.index.blocks`, `/_cat/allocation` |
| Intermittent ES API timeouts + `NodeCPUUtilization` > 80% | Possible meltdown | `/_nodes/hot_threads`, `/_nodes/stats/thread_pool`, `/_tasks` |
**Mandatory hint:** If a MUST situation is detected and `ES_ENDPOINT` is **not** set, the report **must** start with a warning listing the missing engine-layer evidence.
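
As a rough illustration of the table and the mandatory hint, the following sketch maps CMS window maxima to MUST themes and emits a leading warning when the engine is not reachable. The names `CMS_RULES`, `fired_rules`, and `warning_if_unreachable` are hypothetical, the input dict layout is an assumption, and only the rows keyed on a single CMS metric are modeled (thread-pool, CV, and ALWAYS rows are left out).

```python
from typing import Callable, Optional

# (CMS metric, predicate on the window max, MUST theme, ES API paths), copied from the table above.
CMS_RULES: list[tuple[str, Callable[[float], bool], str, list[str]]] = [
    ("ClusterStatus", lambda v: v >= 1, "Cluster health",
     ["/_cluster/allocation/explain", "/_cat/shards"]),
    ("ClusterDisconnectedNodeCount", lambda v: v > 0, "Node loss",
     ["/_cat/nodes", "/_cluster/health"]),
    ("NodeCPUUtilization", lambda v: v > 80, "CPU overload",
     ["/_nodes/hot_threads", "/_tasks"]),
    ("NodeHeapMemoryUtilization", lambda v: v > 85, "Memory pressure",
     ["/_nodes/stats/jvm", "/_nodes/stats/breaker"]),
    ("NodeDiskUtilization", lambda v: v > 85, "Disk pressure",
     ["/_cat/allocation", "/_cat/shards"]),
]


def fired_rules(window_max: dict[str, float]) -> list[tuple[str, list[str]]]:
    """Return (theme, ES API paths) for every rule whose CMS signal fired."""
    fired = []
    for metric, predicate, theme, paths in CMS_RULES:
        value = window_max.get(metric)
        if value is not None and predicate(value):
            fired.append((theme, paths))
    return fired


def warning_if_unreachable(fired: list[tuple[str, list[str]]], es_endpoint: Optional[str]) -> str:
    """Build the leading report warning when MUST rows fired but ES_ENDPOINT is not set."""
    if not fired or es_endpoint:
        return ""
    missing = sorted({path for _theme, paths in fired for path in paths})
    return "WARNING: missing engine-layer evidence: " + ", ".join(missing)
```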
---
## 2.5 ALWAYS checks (when ES is reachable)
**These do not wait on CMS thresholds; run whenever `ES_ENDPOINT` (and password) is available:**
| Check | ES API | What it catches | Why |
|-------|--------|-----------------|-----|
| Watermark settings | `GET _cluster/settings?include_defaults=true&filter_path=**.watermark` | Absolute-byte or bad % watermarks | CMS disk % can look fine while transient watermarks already force read-only |
| Index read-only blocks | `GET _all/_settings?filter_path=*.settings.index.blocks` | `read_only_allow_delete: true` | Cluster can be Green with “normal” disk % while indices are already read-only from flood_stage |
| Free space vs absolute watermark | `GET _cat/allocation?format=json&bytes=b` | Free bytes below absolute `flood_stage` | Only needed when watermarks use absolute bytes; % breach is compared to CMS in the script |
**Rationale (watermark edge case):** Low disk utilization and cluster Green can still coexist with transient absolute-byte watermarks that block writes. CMS-only views miss that class of misconfiguration.
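
To make the percentage vs absolute-byte distinction concrete, here is a minimal sketch of a watermark value parser. `parse_watermark` is a hypothetical helper, not the script's implementation. It assumes Elasticsearch-style values: a percentage (`95%`), a bare ratio (`0.95`), or a byte size with binary units (`500mb`, `10gb`).

```python
import re

# Elasticsearch byte-size units are powers of 1024.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_watermark(value: str) -> tuple[str, float]:
    """Return ("percent", pct) or ("bytes", n) for a disk watermark setting value."""
    text = value.strip().lower()
    if text.endswith("%"):
        return "percent", float(text[:-1])
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb|tb|pb)", text)
    if match:
        return "bytes", float(match.group(1)) * _UNITS[match.group(2)]
    # A bare number such as "0.95" is treated as a ratio of used disk.
    return "percent", float(text) * 100


def uses_absolute_bytes(value: str) -> bool:
    """True when the watermark is an absolute free-space threshold rather than a percentage."""
    return parse_watermark(value)[0] == "bytes"
```

A value that parses as `"bytes"` is the case where CMS disk percentages alone cannot tell whether `flood_stage` has been crossed.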
---
## 3. Why sometimes only control plane + CMS?
**MUST addresses necessity:** in the situations above, the methodology requires engine APIs for deep proof.
But the skill **also** states **executable** preconditions. If `ES_ENDPOINT` / `ES_PASSWORD` are missing, or TCP to `9200` fails, `curl` **cannot** run. Then you can only:
- Use `aliyun` CLI + CMS + logs;
- State the **evidence boundary** in the report: without an ES REST path, no engine-level root cause can be confirmed.
So: **not** “MUST can be skipped on purpose”, but **“MUST cannot run until preconditions hold.”**
---
## 3.5 Connection failures — differentiate patterns
> **Full playbooks** (credential probe, HTTP 401 meaning, exit 28 triage, intermittent timeouts, evidence boundary, report checklist): [es-api-call-failures.md](es-api-call-failures.md)
**Summary table** (read the linked doc for steps):
| Pattern | Typical symptom | First pointer |
|---------|-----------------|---------------|
| **Refused** | curl exit 7 | Process / port / public access — `sop-cluster-health.md` |
| **Persistent timeout** | exit 28, all calls | Allowlist + CMS triage — **not** “network only by default” |
| **Intermittent timeout** | mixed success | Overload / meltdown — `sop-service-avalanche.md` |
| **Auth failure** | HTTP 401 | `ES_PASSWORD` / `ES_USERNAME` / scheme |
| **TLS mismatch** | `wrong version number` SSL/TLS errors | `http://` vs `https://` on `ES_ENDPOINT` |
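
The patterns in the summary table can be told apart from a handful of probe results. The sketch below is one possible classifier under stated assumptions: each attempt is a `(curl_exit_code, http_status)` pair, `classify_attempt` and `classify_pattern` are hypothetical names, and mapping curl exit 35 (SSL connect error) to the TLS-mismatch row is an assumption rather than something the table states.

```python
from typing import Optional


def classify_attempt(exit_code: int, http_status: Optional[int]) -> str:
    """Classify a single curl attempt."""
    if exit_code == 7:
        return "refused"
    if exit_code == 28:
        return "timeout"
    if exit_code == 35:  # assumption: SSL connect error stands in for http/https mismatch
        return "tls_mismatch"
    if exit_code == 0 and http_status == 401:
        return "auth_failure"
    if exit_code == 0:
        return "ok"
    return "other"


def classify_pattern(attempts: list[tuple[int, Optional[int]]]) -> str:
    """Classify a series of attempts into one of the summary-table patterns."""
    kinds = [classify_attempt(code, status) for code, status in attempts]
    if not kinds:
        return "no_data"
    timeouts = kinds.count("timeout")
    if timeouts == len(kinds):
        return "persistent_timeout"
    if timeouts and "ok" in kinds:
        return "intermittent_timeout"
    for kind in ("refused", "auth_failure", "tls_mismatch"):
        if kind in kinds:
            return kind
    return "ok" if all(kind == "ok" for kind in kinds) else "other"
```

Separating persistent from intermittent timeouts matters because the two rows point to different SOPs.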
---
## 4. Documentation vs `check_es_instance_health.py` today
When `ES_ENDPOINT` and `ES_PASSWORD` are set, `_check_cluster_config_optional` **automatically** calls:
| Area | API | Role |
|------|-----|------|
| Liveness | `GET /_cluster/health` | Fail fast if ES is unreachable |
| Settings | `GET /_cluster/settings?include_defaults=true` | Fielddata breaker + disk **watermark** rules (incl. absolute-byte branch with `/_cat/allocation?format=json&bytes=b` when needed) |
| Replicas | `GET /_cat/indices?h=index,rep&format=json` | Business indices with `number_of_replicas=0` |
| Read-only | `GET /_all/_settings?filter_path=*.settings.index.blocks` | `read_only_allow_delete` blocks |
| Thread pools | `GET /_nodes/stats/thread_pool` | Non-zero `rejected` counters |
The **remediation** text of a finding may recommend `/_cluster/allocation/explain`, `/_cat/shards`, `/_nodes/hot_threads`, etc. Those are **operator / agent next steps**: the script does **not** HTTP-fetch **hot_threads** or **allocation/explain** bodies by itself. When rule findings imply a **§5 MUST** row, `check_es_instance_health.py` prints a short **§5 MUST — engine APIs** footer (deduped paths); the agent must still **execute** those calls after the §2.2 reachability check.
**Skill / agent workflow:** for Red/Yellow, thread-pool saturation, and other MUST rows in `SKILL.md`, the agent should still run the listed `curl` (or Console) steps to collect engine evidence beyond what the script auto-pulls.
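
As an example of what one of the automatic checks involves, the sketch below filters the JSON output of `GET /_cat/indices?h=index,rep&format=json` for zero-replica indices. It is not the script's code: `zero_replica_indices` is a hypothetical name, and treating "business indices" as those not starting with `.` is an assumption.

```python
import json


def zero_replica_indices(cat_indices_json: str) -> list[str]:
    """Return non-system index names whose replica count is 0."""
    rows = json.loads(cat_indices_json)
    return sorted(
        row["index"]
        for row in rows
        # _cat JSON output carries numbers as strings, so compare on the string form.
        if str(row.get("rep")) == "0" and not row["index"].startswith(".")
    )
```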
### Planned vs implemented
Older drafts described stderr banners such as `[MUST采集] … auto allocation/explain` (采集 means "collect"). **That auto-fetch is not in the current script**: allocation/explain and hot_threads bodies remain **manual MUST**. The script may print a **§5 MUST — engine APIs** footer (stdout) when findings map to `SKILL.md` §5; that is a checklist, not an HTTP substitute.
---
## 5. Summary
| Question | Answer |
|----------|--------|
| MUST scenarios — must ES APIs be used? | **Methodologically yes**, for evidence control plane/CMS cannot replace. |
| Why sometimes not? | Usually **executable** preconditions fail (no creds, no route, wrong endpoint). |
| What does the health script auto-call? | Health + cluster settings (breakers/watermarks) + zero-replica cat + read-only settings + thread-pool stats — **not** allocation/explain or hot_threads bodies. |
| Practical advice | When `9200` is reachable, set env vars; run additional MUST curls from `SKILL.md` / SOPs where the script stops at recommendations. |
---
## 6. Related references
- Main skill: `../SKILL.md` (workflow, credentials, direct ES access)
- **REST call failures (progressive):** [es-api-call-failures.md](es-api-call-failures.md)
- Cluster health SOP: `sop-cluster-health.md`
- Performance SOPs: `sop-write-performance.md`, `sop-query-thread-pool.md`
FILE:references/health-events-catalog.md
# Elasticsearch health event catalog
> **18 event codes / 49 reason codes** across **8** categories. Each row: **reason code** under its **event code**.
---
## Quick nav: TOP 15 by case frequency
| Rank | Reason code | Event code | Priority | Threshold (summary) | SOP |
| --- | --- | --- | --- | --- | --- |
| 1 | `CPU.PeakUsageHigh` | `HealthCheck.CPULoadHigh` | P1 | P99 > 95% for 3m | [sop-cpu-load.md](sop-cpu-load.md) |
| 2 | `JVMMemory.OldGenUsageCritical` | `HealthCheck.JVMMemoryPressure` | P0 | avg > **85%** for 2m | [sop-memory-gc.md](sop-memory-gc.md) |
| 3 | `Node.Disconnected` | `HealthCheck.ClusterUnhealthy` | P0 | disconnected nodes > 0 for 10m | [sop-cluster-health.md](sop-cluster-health.md) |
| 4 | `CPU.PersistUsageHigh` | `HealthCheck.CPULoadHigh` | P0/P1 | avg > **70%** (P0) / > **60%** (P1) for 10m | [sop-cpu-load.md](sop-cpu-load.md) |
| 5 | `JVMMemory.GCRateTooHigh` | `HealthCheck.JVMMemoryPressure` | P1 | Old GC > **1/min** for 10m | [sop-memory-gc.md](sop-memory-gc.md) |
| 6 | `Disk.UsageCritical` | `HealthCheck.DiskUsageHigh` | P0 | max > **85%** for 5m | [sop-disk-storage.md](sop-disk-storage.md) |
| 7 | `JVMMemory.OldGenUsageHigh` | `HealthCheck.JVMMemoryPressure` | P1 | avg > **75%** for 5m | [sop-memory-gc.md](sop-memory-gc.md) |
| 8 | `ThreadPool.WriteRejected` | `HealthCheck.ThreadPoolSaturation` | P0 | write rejected > 0.1/s and TPS > 1/s | [sop-write-performance.md](sop-write-performance.md) |
| 9 | `JVMMemory.GCTimeRatioTooHigh` | `HealthCheck.JVMMemoryPressure` | P0 | GC time / wall clock > **10%** for 5m | [sop-memory-gc.md](sop-memory-gc.md) |
| 10 | `ThreadPool.WriteQueueHigh` | `HealthCheck.ThreadPoolSaturation` | P0 | write queue > threads×80% for 5m | [sop-write-performance.md](sop-write-performance.md) |
| 11 | `ThreadPool.SearchRejected` | `HealthCheck.ThreadPoolSaturation` | P0 | search rejected > 0.1/s and QPS > 1/s for 5m | [sop-query-thread-pool.md](sop-query-thread-pool.md) |
| 12 | `Balancing.NodeCPUUnbalanced` | `HealthCheck.LoadUnbalanced` | P1 | CPU CV > **0.3** for 10m | [sop-cpu-load.md](sop-cpu-load.md) |
| 13 | `Disk.UsageHigh` | `HealthCheck.DiskUsageHigh` | P1 | max > **75%** for 10m | [sop-disk-storage.md](sop-disk-storage.md) |
| 14 | `ThreadPool.SearchQueueHigh` | `HealthCheck.ThreadPoolSaturation` | P0 | search queue > threads×80% for 5m | [sop-query-thread-pool.md](sop-query-thread-pool.md) |
| 15 | `Cluster.StatusRed` | `HealthCheck.ClusterUnhealthy` | P0 | ClusterStatus == 2 for 2m | [sop-cluster-health.md](sop-cluster-health.md) |
---
## 1. Cluster health (3 reason codes)
**Event code:** `HealthCheck.ClusterUnhealthy`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Cluster.StatusRed` | **P0** | `ClusterStatus == 2` for **2m** | 1 | `ClusterStatus` | Node loss / disk full / allocation misconfiguration |
| `Cluster.StatusYellow` | P1 | `ClusterStatus == 1` for **30m** | 2 | `ClusterStatus` | Too few nodes / disk pressure / change in progress |
| `Node.Disconnected` | **P0** | `ClusterDisconnectedNodeCount > 0` for **10m** | 3 | `ClusterDisconnectedNodeCount` | OOM / CPU pegged / disk full / network |
> **Log-based (no CMS rule):** `Cluster.UnavailableShards` — `UnavailableShardsException` in logs, primary inactive; same **event code** `HealthCheck.ClusterUnhealthy`.
**Metric source:** `ClusterStatus`, `ClusterDisconnectedNodeCount` from CMS (`fetch_metrics_batch`).
---
## 2. Master stability (3 reason codes)
**Event code:** `HealthCheck.MasterStabilityRisk`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Master.TasksPendingCritically` | **P0** | `pending_tasks > node_count×50` for **3m** | 4 | metric TBD | Small master / huge shard count / ILM backlog |
| `Master.TasksPendingHigh` | P1 | `pending_tasks > 50` for **5m** | 5 | same | Same, early warning |
| `Master.ElectionTooMany` | **P0** | Master elections > 0 in 1h | 6 | same | Master OOM / network / CPU pegged / long GC pauses |
> **Metric status:** `ClusterPendingTasksCount`, `MasterElectionCount` not in CMS yet — collect via ES API (`GET _cluster/pending_tasks`, `GET _cat/master`).
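
Because the pending-task count has to come from the ES API, the two thresholds in the table can be evaluated on the response directly. This is a sketch only: `count_pending` and `pending_tasks_reason` are hypothetical names, it relies on the `tasks` array returned by `GET _cluster/pending_tasks`, and it ignores the 3m / 5m duration windows, which need repeated samples.

```python
from typing import Optional


def count_pending(response: dict) -> int:
    """Number of entries in the `tasks` array of a _cluster/pending_tasks response."""
    return len(response.get("tasks", []))


def pending_tasks_reason(pending_tasks: int, node_count: int) -> Optional[tuple[str, str]]:
    """Map one sample to (reason_code, priority) using the thresholds in the table above."""
    if pending_tasks > node_count * 50:
        return "Master.TasksPendingCritically", "P0"
    if pending_tasks > 50:
        return "Master.TasksPendingHigh", "P1"
    return None
```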
---
## 3. Service availability (1 reason code)
**Event code:** `HealthCheck.ClusterRequestError`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Cluster.RequestErrorTooHigh` | P1 | HTTP 5xx rate > **5%** and QPS > 1/s for **3m** | 7 | `ClusterRequest5xxQPS` | Version bug / resource exhaustion / metadata issues |
> Historical typo `Cluster.RequstErrorTooHigh` may appear in older exports; canonical name: **`Cluster.RequestErrorTooHigh`**.
---
## 4. Resource anomalies (17 reason codes)
### 4.1 JVM memory pressure
**Event code:** `HealthCheck.JVMMemoryPressure`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `JVMMemory.OldGenUsageHigh` | P1 | heap avg > **75%** for **5m** | 8 | `NodeHeapMemoryUtilization` | Small heap / heavy queries / fielddata / leak |
| `JVMMemory.OldGenUsageCritical` | **P0** | heap avg > **85%** for **2m** | 9 | `NodeHeapMemoryUtilization` | Same, more urgent |
| `JVMMemory.HeapGrowthRateFast` | P1 | +9% in 3m and +15% in 15m | 10 | `NodeHeapMemoryUtilization` | Bulk queries / leak / traffic spike |
| `JVMMemory.GCRateTooHigh` | P1 | Old GC > **1/min** for **10m** | 17 | `JVMGCOldCollectionCount` | Heap pressure / fielddata / large aggs |
| `JVMMemory.GCTimeRatioTooHigh` | **P0** | GC time / wall clock > **10%** for **5m** | 18 | `JVMGCOldCollectionDuration` | Old Gen pressure / frequent Full GC |
| `JVMMemory.GCDurationTooLong` | **P0** | avg GC duration > 5000ms | 19 | `JVMGCOldCollectionDuration` | Bad G1 tuning / large heap objects |
| `JVMMemory.FielddataCacheTooLarge` | P1 | Fielddata > 30% heap | 20 | metric TBD | text field aggs / unbounded fielddata |
| `JVMMemory.BreakerTripped` | **P0** | breaker trips ≥ 5 per 5m | 35 | metric TBD | Huge result sets / fielddata blow-up / heavy aggs |
| `JVMMemory.BreakerLimitConfigLow` | P2 | `indices.breaker.fielddata.limit` < 40% (settings check, ES API) | — | `GET /_cluster/settings` | Limit below recommendation — small aggs can trip |
| `JVMMemory.OOM` | **P0** | `OutOfMemoryError` in logs | — | instance logs | OldGen full / fielddata explosion |
### 4.2 High CPU
**Event code:** `HealthCheck.CPULoadHigh`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `CPU.PeakUsageHigh` | P1 | CPU P99 > **95%** for **3m** | 11 | `NodeCPUUtilization` | Heavy queries / ingest / GC / imbalance |
| `CPU.PersistUsageHigh` (P0) | **P0** | CPU avg > **70%** for **10m** | 12 | `NodeCPUUtilization` | Sustained load — node loss risk |
| `CPU.PersistUsageHigh` (P1) | P1 | CPU avg > **60%** for **10m** | 12 | `NodeCPUUtilization` | Same, warning band |
| `CPU.UsageGrowthRateFast` | **P0** | +9% in 3m and +15% in 15m | 15 | `NodeCPUUtilization` | Slow requests / burst traffic |
> `CPU.PersistUsageHigh` uses dual thresholds: avg > 70% → P0, 60–70% → P1 (same **reason_code**, different **priority**).
### 4.3 Disk utilization
**Event code:** `HealthCheck.DiskUsageHigh`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Disk.UsageHigh` | P1 | max > **75%** for **10m** | 13 | `NodeDiskUtilization` | Data growth / snapshots not cleaned |
| `Disk.UsageCritical` | **P0** | max > **85%** for **5m** | 14 | `NodeDiskUtilization` | Near ES flood-stage (~95%) read-only |
| `Disk.UsagePredictiveRisk` | P1 | `predict_linear` shows < 5% free in 24h | 22 | `NodeDiskUtilization` | Upward trend |
| `Disk.IndexReadOnly` | **P0** | block count > 0 | 36 | `ClusterIndexWritesBlocked` | Disk ≥95% auto read-only |
| `Disk.WatermarkAbsoluteFloodBreached` | **P0** | free bytes < absolute `flood_stage` | — | ES API: `_cluster/settings` + `_cat/allocation` | Absolute watermark tripped |
| `Disk.WatermarkAbsoluteFloodMarginLow` | P1 | free space barely above flood_stage (margin < 500MB) | — | ES API: same | Absolute watermark, tiny margin |
| `Disk.WatermarkAbsoluteValue` | P1 | watermarks configured as absolute bytes (not %) | — | ES API: `_cluster/settings` | Non-default — does not scale with disk growth |
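
The three absolute-watermark rows differ only in how free space compares with `flood_stage`. The sketch below shows one way to derive the reason code from `_cat/allocation?format=json&bytes=b` rows. `min_free_bytes` and `absolute_flood_reason` are hypothetical helpers, reading "500MB" as 500 × 1024² bytes is an assumption, and the function presumes the watermark is already known to be an absolute byte value.

```python
MARGIN_BYTES = 500 * 1024**2  # assumption: "500MB" interpreted with binary units


def min_free_bytes(allocation_rows: list[dict]) -> int:
    """Smallest `disk.avail` across nodes; rows without a value (e.g. UNASSIGNED) are skipped."""
    values = [int(row["disk.avail"]) for row in allocation_rows if row.get("disk.avail") is not None]
    return min(values)


def absolute_flood_reason(free_bytes: int, flood_stage_bytes: int,
                          margin_bytes: int = MARGIN_BYTES) -> tuple[str, str]:
    """Return (reason_code, priority) for a cluster using an absolute flood_stage watermark."""
    if free_bytes < flood_stage_bytes:
        return "Disk.WatermarkAbsoluteFloodBreached", "P0"
    if free_bytes - flood_stage_bytes < margin_bytes:
        return "Disk.WatermarkAbsoluteFloodMarginLow", "P1"
    # Not breached and not close: the absolute configuration itself is still flagged.
    return "Disk.WatermarkAbsoluteValue", "P1"
```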
### 4.4 Disk IO
**Event code:** `HealthCheck.DiskIOBottleneck`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Disk.IOPerformancePoor` | **P0** | IO util > **90%** for **5m** | 19 | `NodeStatsDataDiskUtil` | Slow disk / bad disk / heavy write / merges |
| `Disk.IOBandwidthThrottling` | **P0** | disk bandwidth util > **90%** for **5m** | 16 | `NodeStatsDataDiskIoCloudDiskBaseBandwidthRate` | Cloud disk cap / concentrated write |
> IO rule priority was P1 → updated to **P0** (rule #19).
### 4.5 File descriptors
**Event code:** `HealthCheck.SystemFileDescriptorsHigh`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `System.FileDescriptorsUsageHigh` | **P0** | open_fd / max_fd > **90%** for **10m** | 21 | metric TBD | Too many connections / shards / leaks |
---
## 5. Performance bottlenecks (10 reason codes)
### 5.1 Thread pool saturation
**Event code:** `HealthCheck.ThreadPoolSaturation`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `ThreadPool.SearchQueueHigh` | **P0** | search queue > threads×80% (or > 100) for **5m** | 23 | `SearchThreadpoolQueue` | Slow queries / high QPS / low CPU headroom |
| `ThreadPool.SearchRejected` | **P0** | search rejected > 0.1/s and QPS > 1/s for **5m** | 24 | `SearchThreadpoolRejected` | Queue full: new searches rejected |
| `ThreadPool.WriteQueueHigh` | **P0** | write queue > threads×80% (or > 100) for **5m** | 25 | `WriteThreadpoolQueue` | Heavy ingest / slow disk / low CPU |
| `ThreadPool.WriteRejected` | **P0** | write rejected > 0.1/s and TPS > 1/s for **5m** | 26 | `WriteThreadpoolRejected` | Queue full: writes rejected (HTTP 429) |
| `ThreadPool.GenericQueueHigh` | P1 | generic queue > threads×80% for **10m** | 27 | metric TBD | High recovery concurrency / `node_concurrent_recoveries` too high |
### 5.2 High latency
**Event code:** `HealthCheck.LatencyHigh`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Latency.IndexingSlow` | **P0** | indexing latency > 1000ms and QPS > 5/s for **1m** | 28 | `ClusterIndexingLatency` | Slow disk / CPU / aggressive refresh |
| `Latency.SearchSlow` | **P0** | search latency > 2000ms and QPS > 5/s for **1m** | 29 | `ClusterSearchLatency` | Slow queries / CPU / memory / cold data |
| `Latency.SearchTaskRunningLong` | P1 | search task runtime > 5 minutes | 30 | metric TBD | delete_by_query / reindex / heavy aggs |
| `Latency.RefreshSlow` | P1 | refresh > 1000ms for **5m** | 31 | metric TBD | Too many segments / many fields / complex mappings |
### 5.3 Slow recovery
**Event code:** `HealthCheck.RecoverySlow`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Recovery.SlowWarning` | Info | recovery rate < 50% of configured cap or ETA > 4h | 45 | metric TBD | Low recovery throttle / slow disk / limited bandwidth |
---
## 6. Capacity planning (6 reason codes)
### 6.1 Shard misconfiguration
**Event code:** `HealthCheck.ShardMisconfigured`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Shard.SegmentCountTooMany` | P1 | segments > 100 per node for **30m** | 32 | metric TBD | Rare forcemerge / short refresh interval |
| `Shard.TotalCountTooMany` | P1 | total shards > nodes×CPU×20 for **1h** | 41 | `ClusterShardCount` | Too many indices / shards per index |
| `Shard.SizeUnreasonable` | P2 | avg shard < 10GB or > 50GB for **1h** | 42 | metric TBD | Wrong shard count / bad sizing |
| `Shard.DocumentNearLimit` | **P0** | docs per shard > 2B (Lucene ~2.1B cap) for **30m** | 43 | metric TBD | No rollover / ILM |
| `Shard.NodeCountTooHigh` | P1 | shards per node > 1000 for **30m** | 44 | metric TBD | Too many shards / too few nodes |
### 6.2 Load imbalance
**Event code:** `HealthCheck.LoadUnbalanced`
> Imbalance spans **traffic**, **data placement**, and **resource levels**. See [sop-node-load-imbalance.md](sop-node-load-imbalance.md).
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Balancing.NodeCPUUnbalanced` | P1 | CPU CV > **0.3** for **10m** | 40 | `NodeCPUUtilization` | Hot shards / no coordinating nodes / skew |
| `Balancing.NodeTrafficUnbalanced` | P1 | QPS/TPS CV > **0.3** for **10m** | — | thread pool active/queue | No coordinating nodes / uneven clients / routing skew |
| `Balancing.NodeDataUnbalanced` | P2 | shard count or store CV > **0.3** for **30m** | — | `_cat/allocation` | New nodes / large index concentration / routing skew |
| `Balancing.NodeDiskUnbalanced` | P1/P0 | disk util CV > **0.3**, hot node >75% (P1) / >85% (P0) | — | `NodeDiskUtilization` | Large shards on few nodes / old data |
| `Balancing.NodeMemoryUnbalanced` | P1 | heap util CV > **0.3** and max node > **75%** for **10m** | — | `NodeHeapMemoryUtilization` | Fielddata skew / hot query caches |
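
All imbalance rules above compare a coefficient of variation (CV) against 0.3. A minimal sketch of that calculation follows, assuming CV is the population standard deviation divided by the mean of the per-node values. The catalog does not say whether population or sample deviation is used, so treat that choice and the helper names as assumptions.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values: list[float]) -> float:
    """CV = population standard deviation / mean; 0.0 when the mean is zero."""
    avg = mean(values)
    if avg == 0:
        return 0.0
    return pstdev(values) / avg


def is_unbalanced(values: list[float], threshold: float = 0.3) -> bool:
    """True when the per-node spread exceeds the CV threshold used in the table above."""
    return coefficient_of_variation(values) > threshold
```

For example, per-node CPU of 20%, 25%, and 80% is flagged, while 48%, 50%, and 52% is not. The rules that add a floor (such as a hot node above 75%) would need that extra condition on top of the CV check.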
---
## 7. High-availability risk (4 reason codes)
### 7.1 Cluster scale
**Event code:** `HealthCheck.ClusterScaleLow`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Scale.NodesInsufficient` | P1 | total nodes < 3 for **10m** | 33 | `ClusterNodeCount` | Nodes lost / under-provisioned |
| `Scale.MasterEligibleNodesInsufficient` | P1 | master-eligible < 3 for **5m** | 39 | metric TBD | No dedicated masters / master nodes down |
### 7.2 Missing replication
**Event code:** `HealthCheck.ReplicationMissing`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Replication.MissingReplicas` | P1 | index with 0 replicas and no snapshot | 37 | metric TBD | Manual setting / leftover during resize |
### 7.3 Backup failure
**Event code:** `HealthCheck.BackupFailure`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Backup.SnapshotFailed` | **P0** | auto snapshot status == 2 (failed) | 34 | `ClusterAutoSnapshotLatestStatus` | Space / permissions / cluster load |
| `Backup.SnapshotOutdated` | P2 | no successful snapshot for **24h** | 38 | metric TBD | Policy missing / repeated failures |
---
## 8. Configuration risk (3 reason codes)
**Event code:** `HealthCheck.ConfigurationRisk`
| Reason code | Priority | Trigger | Rule # | Key metrics | Typical root cause |
| ----------- | -------- | ------- | ------ | ----------- | ------------------ |
| `Config.JVMGCStrategyNotOptimal` | Info | Old GC rate > 1/min for **30m** | 46 | `JVMGCOldCollectionCount` | CMS collector (Concurrent Mark Sweep, not Cloud Monitor) instead of G1 / wrong heap sizing |
| `Config.RiskClusterSettings` | P1 | `concurrent_rebalance > 16` or `concurrent_recoveries > 8` or `recovery > 200MB/s` | 47 | metric TBD | Ops tuned migration too hot for nodes |
| `Config.RiskIndexSettings` | P1 | Ngram `max_gram > 100`, or too many shards on small index, or 0 replicas without snapshot | 48 | metric TBD | Bad Ngram / shard planning |
---
## Appendix A: Threshold cheat sheet (health script subset)
Thresholds implemented in `scripts/check_es_instance_health.py` (baseline 20260318) may differ slightly from CMS-only rule numbers below — use the script `THRESHOLDS` for ground truth.
| Metric | Warning (P1) | Critical (P0) | reason_code |
| ----------- | ------------- | ------------- | ----------- |
| Heap utilization | avg > **75%** | avg > **85%** | `JVMMemory.OldGenUsageHigh` / `Critical` |
| CPU sustained | avg > **60%** | avg > **70%** | `CPU.PersistUsageHigh` |
| CPU peak | — | max ≥ **95%** (with sustained avg ≤ 60%) | `CPU.PeakUsageHigh` (P0); 80–94% band → P1 |
| Disk utilization | max > **75%** | max > **85%** | `Disk.UsageHigh` / `Critical` |
| Disk IO util | — | max > **90%** | `Disk.IOPerformancePoor` |
| Old GC rate | > **1/min** (max in window) | — | `JVMMemory.GCRateTooHigh` |
| GC time ratio | — | > **10%** of wall time | `JVMMemory.GCTimeRatioTooHigh` |
| CPU imbalance CV | CV > **0.3** (with CPU floor) | — | `Balancing.NodeCPUUnbalanced` |
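
For the rows that have both a warning and a critical band, the cheat sheet reduces to a two-threshold lookup. The sketch below is illustrative and is not the script's `THRESHOLDS` structure: `CHEAT_SHEET`, its keys, and `classify` are hypothetical, the numbers are copied from the table above, and durations and the CPU peak rule are not modeled.

```python
from typing import Optional

# metric key: (warning threshold, critical threshold, P1 reason code, P0 reason code)
CHEAT_SHEET = {
    "heap_avg": (75.0, 85.0, "JVMMemory.OldGenUsageHigh", "JVMMemory.OldGenUsageCritical"),
    "cpu_avg": (60.0, 70.0, "CPU.PersistUsageHigh", "CPU.PersistUsageHigh"),
    "disk_max": (75.0, 85.0, "Disk.UsageHigh", "Disk.UsageCritical"),
}


def classify(metric: str, value: float) -> Optional[tuple[str, str]]:
    """Return (reason_code, priority) for a percentage value, or None when below warning."""
    warning, critical, p1_reason, p0_reason = CHEAT_SHEET[metric]
    if value > critical:
        return p0_reason, "P0"
    if value > warning:
        return p1_reason, "P1"
    return None
```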
---
## Appendix B: How metrics are collected
### Available via CMS `fetch_metrics_batch`
> References: [Basic metrics (Alibaba Cloud Help)](https://help.aliyun.com/zh/es/user-guide/basic-metrics) |
> [Cluster metrics guide](https://help.aliyun.com/zh/es/user-guide/metrics-and-exception-handling-suggestions)
```
ClusterStatus Cluster health (0=Green, 1=Yellow, 2=Red)
ClusterAutoSnapshotLatestStatus Snapshot status (-1=none/0=ok/1=running/2=failed)
ClusterDisconnectedNodeCount Disconnected node count
ClusterNodeCount Node count
ClusterShardCount Total shard count
ClusterQueryQPS Cluster search QPS
ClusterIndexQPS Cluster indexing QPS
ClusterSearchLatency Avg search latency (ms)
ClusterIndexingLatency Avg indexing latency (ms)
ClusterSlowSearchingCount Slow query count
NodeCPUUtilization Node CPU (%)
NodeHeapMemoryUtilization Node heap (%)
NodeDiskUtilization Node disk (%)
NodeFreeStorageSpace Free storage (MiB)
NodeLoad_1m 1m load
NodeStatsDataDiskUtil Disk IO utilization (%)
NodeStatsDataDiskRm Disk read bandwidth (MiB/s)
NodeStatsDataDiskWm Disk write bandwidth (MiB/s)
NodeStatsDataDiskR Read IOPS
NodeStatsDataDiskW Write IOPS
JVMGCOldCollectionCount Old GC count (per sample bucket)
JVMGCOldCollectionDuration Old GC duration ms (per sample bucket)
NodeStatsFullGcCollectionCount Full GC count
NodeStatsExceptionLogCount Exception log count
...
```
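
The two status metrics in this list are numeric codes, so a report needs to translate them back. A small sketch using the code-to-label mappings given above (the helper names are hypothetical, and taking the window maximum as the "worst" state follows the `ClusterStatus` max rules used elsewhere in these references):

```python
CLUSTER_STATUS = {0: "Green", 1: "Yellow", 2: "Red"}
SNAPSHOT_STATUS = {-1: "none", 0: "ok", 1: "running", 2: "failed"}


def worst_cluster_status(samples: list[float]) -> str:
    """Label for the highest ClusterStatus value seen in the analysis window."""
    return CLUSTER_STATUS[int(max(samples))]


def snapshot_label(value: float) -> str:
    """Label for a ClusterAutoSnapshotLatestStatus datapoint."""
    return SNAPSHOT_STATUS[int(value)]
```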
### Requires additional ES API (for deeper diagnosis)
```bash
# Helper: authenticated GET against the engine.
# Assumes ES_ENDPOINT (scheme + host + port) and ES_PASSWORD are exported;
# ES_USERNAME falls back to "elastic" (assumption) when unset.
es_get() { curl -sS -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" "${ES_ENDPOINT%/}/$1"; }

# Thread pools (ThreadPool.*)
es_get '_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool'
# Breakers (JVMMemory.BreakerTripped)
es_get '_nodes/stats/breaker'
# Heap / mem (JVMMemory.FielddataCacheTooLarge)
es_get '_nodes/stats/jvm?filter_path=nodes.*.jvm.mem'
# Running tasks (Latency.SearchTaskRunningLong)
es_get '_tasks?detailed=true'
# Pending tasks (Master.TasksPending*)
es_get '_cluster/pending_tasks'
# Shard states (Cluster.StatusRed/Yellow)
es_get '_cat/shards?v&s=state&h=index,shard,prirep,state,node,unassigned.reason'
# Unassigned reason
es_get '_cluster/allocation/explain'
# Recovery (Recovery.SlowWarning)
es_get '_cat/recovery?v&active_only=true'
# Snapshots (Backup.*)
es_get '_cat/snapshots?v'
# ILM (often tied to Master.TasksPending*)
es_get '_ilm/status'
# Read-only blocks (Disk.IndexReadOnly / watermarks)
es_get '_all/_settings?filter_path=*.settings.index.blocks'
# or
es_get '_cat/indices?v&h=index,status,health'
# Watermarks (Disk.WatermarkAbsolute*)
es_get '_cluster/settings?include_defaults=true&filter_path=**.watermark,**.flood_stage'
es_get '_cat/allocation?format=json&bytes=b'
```
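
When these calls are issued from tooling rather than a shell, the same requests can be built with the standard library. The following is a minimal sketch, assuming HTTP basic auth and an `ES_ENDPOINT`-style base URL. `build_es_request` and `fetch` are hypothetical names, and the host and credentials in the usage note are placeholders.

```python
import base64
import urllib.request


def build_es_request(endpoint: str, path: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request with a basic-auth header for an Elasticsearch REST path."""
    url = endpoint.rstrip("/") + "/" + path.lstrip("/")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch(request: urllib.request.Request, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body (raises on HTTP or network errors)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `fetch(build_es_request("http://es.example.internal:9200", "_cluster/pending_tasks", "elastic", "secret"))` would return the pending-task JSON as bytes.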
FILE:references/ram-policies.md
# RAM permission list
Minimum read-only RAM permissions required for this skill (no write actions):
## Alibaba Cloud Elasticsearch OpenAPI
- `elasticsearch:DescribeInstance` — Instance details (including clusterTasks)
- `elasticsearch:ListInstance` — List instances
- `elasticsearch:ListSearchLog` — Instance runtime logs
- `elasticsearch:ListActionRecords` — Instance change / action records
- `elasticsearch:ListAllNode` — Cluster node information
## Cloud Monitor (CMS)
- `cms:DescribeMetricList` — Time-series metrics (CPU, memory, disk, load, cluster health)
- `cms:DescribeSystemEventAttribute` — System events (control-plane changes, restarts, etc.)
- `cms:DescribeMetricMetaList` — Metric metadata (available metric catalog)
## Optional but recommended
- `sts:GetCallerIdentity` — Validate the CLI profile (`aliyun sts get-caller-identity`)
## Runtime dependencies
```
aliyun CLI >= 3.3.1
curl
```
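
The version floor can be checked programmatically. This sketch compares a dotted version string against 3.3.1. `parse_version` and `meets_minimum` are hypothetical helpers, and the input is assumed to contain a plain `X.Y.Z` number somewhere in the text.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first X.Y.Z triple from the text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text: str, minimum: tuple[int, ...] = (3, 3, 1)) -> bool:
    """True when the version is at least the required minimum."""
    return parse_version(text) >= minimum
```

Comparing integer tuples avoids the string-comparison trap where `3.10.0` would sort below `3.3.1`.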
FILE:references/related-apis.md
# Related APIs
Complete list of Alibaba Cloud OpenAPIs and Elasticsearch REST APIs used by this skill.
---
## Control plane (OpenAPI)
### Elasticsearch OpenAPI
| Product | API action | Description | Entry point |
|---------|------------|-------------|-------------|
| elasticsearch | DescribeInstance | Instance details (status, version, cluster-related fields) | `check_es_instance_health.py` / `openapi_cli_collect.py` |
| elasticsearch | ListInstance | List instances | `aliyun elasticsearch ListInstance` |
| elasticsearch | ListSearchLog | Instance logs (instance, slow, GC, etc.) | `check_es_instance_health.py` / `openapi_cli_collect.py` |
| elasticsearch | ListActionRecords | Change / action records | `aliyun elasticsearch ListActionRecords` |
| elasticsearch | ListAllNode | Cluster node information | `aliyun elasticsearch ListAllNode` |
### Alibaba Cloud Monitor (CMS) OpenAPI
| Product | API action | Description | Entry point |
|---------|------------|-------------|-------------|
| cms | DescribeMetricList | Time-series metrics | `check_es_instance_health.py` / `openapi_cli_collect.py` |
| cms | DescribeSystemEventAttribute | System events | `check_es_instance_health.py` / `openapi_cli_collect.py` |
| cms | DescribeMetricMetaList | Metric metadata (available metric catalog) | `aliyun cms DescribeMetricMetaList` |
---
## Data plane (Elasticsearch REST API)
### Cluster health and state
| API | Endpoint | Description | Invocation |
|-----|----------|-------------|--------------|
| Cluster health | `GET /_cluster/health` | Cluster health (green/yellow/red) | `curl` |
| Cluster stats | `GET /_cluster/stats` | Cluster statistics | `curl` |
| Pending tasks | `GET /_cluster/pending_tasks` | Master pending tasks | `curl` |
| Allocation explain | `POST /_cluster/allocation/explain` | Unassigned shard reasons | `curl` |
### Nodes
| API | Endpoint | Description | Invocation |
|-----|----------|-------------|--------------|
| Nodes stats | `GET /_nodes/stats` | Node stats (CPU/memory/disk) | `curl` |
| Hot threads | `GET /_nodes/hot_threads` | Hot thread stacks | `curl` |
| Nodes JVM | `GET /_nodes/stats/jvm` | JVM statistics | `curl` |
| Thread pools | `GET /_nodes/stats/thread_pool` | Thread pool stats | `curl` |
| Circuit breakers | `GET /_nodes/stats/breaker` | Breaker trips | `curl` |
| Cat nodes | `GET /_cat/nodes` | Node overview | `curl` |
| Cat nodes CPU | `GET /_cat/nodes?h=name,cpu,load_1m` | CPU-oriented view | `curl` |
### Indices and shards
| API | Endpoint | Description | Invocation |
|-----|----------|-------------|--------------|
| Cat indices | `GET /_cat/indices` | Index list | `curl` |
| Cat indices by size | `GET /_cat/indices?s=store.size:desc` | Indices sorted by size | `curl` |
| Cat shards | `GET /_cat/shards` | Shard layout | `curl` |
| Cat allocation | `GET /_cat/allocation` | Disk / shard allocation | `curl` |
### Tasks and recovery
| API | Endpoint | Description | Invocation |
|-----|----------|-------------|--------------|
| Tasks | `GET /_tasks` | Running tasks | `curl` |
| Tasks detailed | `GET /_tasks?detailed=true` | Detailed tasks | `curl` |
| Cat tasks | `GET /_cat/tasks` | Task overview | `curl` |
| Cat recovery | `GET /_cat/recovery` | Shard recovery | `curl` |
### Snapshots and ILM
| API | Endpoint | Description | Invocation |
|-----|----------|-------------|--------------|
| Snapshot status | `GET /_snapshot/_status` | In-flight snapshots | `curl` |
| ILM status | `GET /_ilm/status` | ILM status (Elasticsearch 7.0+) | `curl` |
---
## Dependencies
```
aliyun CLI >= 3.3.1
elasticsearch>=7,<9 # client library if used by tooling (optional for curl-only path)
```
---
## Permissions
See [ram-policies.md](ram-policies.md).
FILE:references/report-template.md
# Structured diagnosis report (skeleton)
Use with **[acceptance-criteria.md](acceptance-criteria.md)** (§6.x) and repo root **[SKILL.md](../SKILL.md)** §5 Step 4. Copy the block below and fill placeholders.
**Before publish:** reconcile the following points.
- **ClusterStatus wording:** if a CMS **window max** (e.g. Yellow) and a **single** `/_cluster/health` snapshot (e.g. green) both appear, qualify **time vs aggregation** ([acceptance-criteria.md](acceptance-criteria.md) **§6.1**).
- **Per-node CPU vs pool-bound node:** if **per-node CPU %** seems to contradict **which node** was bound on the **`search` pool**, add **one sentence** on **sampling / window mean vs spike** ([acceptance-criteria.md](acceptance-criteria.md) **§6.2**).
- **Node name mismatch:** if the **slow-log node name** ≠ the **node named in pool-rejection / INSTANCELOG** lines, add **one sentence** on **routing / phase / time** ([acceptance-criteria.md](acceptance-criteria.md) **§6.2**).
- **Shard count jumps:** if **CMS `ClusterShardCount`** (or similar) **jumps** (e.g. half then back), **do not** imply shard loss without a **`_cat/shards`** + **ops record** cross-check ([acceptance-criteria.md](acceptance-criteria.md) **§6.1**).
```text
## Diagnosis summary
**Instance**: {instance_id} ({region_id})
**Analysis window**: {begin} ~ {end}
### Cross-layer root cause (required when `activating` coexists with Red / unassigned)
**One-line root cause**: {Chain per `sop-activating-change-stuck.md` section 4: change waiting for recovery ← Red ← allocation explain}
(Omit or mark “N/A” if there is no `activating` or the cluster is Green.)
### Incident timeline (recency-ordered)
{Earlier → later, or “latter-window emphasis”: which dimensions (search / write / GC / CPU / disk) peaked or persisted when; cite CMS peak times or log alignment}
### Findings (by priority)
#### P0 - Critical (immediate)
- [Event code] Description
- Evidence: {metrics / log keywords / events}
- Root-cause reasoning: {analysis}
- Immediate actions: {commands or steps}
#### P1 - Warning (within 30 minutes)
...
### Root-cause chain diagram
{Propagation path, e.g. disk full → shards cannot allocate → cluster Red}
### Open questions / follow-ups
{Uncertainties and next checks}
```
FILE:references/sop-activating-change-stuck.md
# SOP: Instance `activating` / change stuck
**Covers:** `ManagementPlane.ActivatingStuck`; often appears together with `HealthCheck.ClusterUnhealthy` (cluster Red) and unassigned shards on the engine.
**Related:** For engine-side Red / allocation, follow [sop-cluster-health.md](sop-cluster-health.md) (Section 1 — cluster Red). **This** SOP explains the **cross-layer causality** between control-plane lifecycle `activating` and engine health — so the report does not only list “one control-plane line + one engine line” without a closed loop.
---
## 1. What `activating` means
`activating` is **not** a random standalone fault. It is the **control-plane lifecycle state** while a change has **not finished** (console / OpenAPI). Typical causes:
- **Rolling orchestration** after `RestartInstance` and similar;
- Tasks still in `updating`, e.g. plugin install/remove or config changes.
User experience: “The instance has been changing / activating for hours.”
---
## 2. Cross-layer causality (closed-loop root cause)
### 2.1 Control-plane facts
- `DescribeInstance`: `status == activating` (or equivalent), often with `updatedAt` to detect “stuck too long” (aligned with rule engine `ManagementPlane.ActivatingStuck`).
- `ListActionRecords`: change type (e.g. rolling restart, plugin op), phase, timeline.
- `ListAllNode`: node roles, whether under rolling, abnormal nodes.
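
The "stuck too long" check on `updatedAt` can be sketched as follows. This is a minimal illustration, not the rule engine's logic: the function name `is_activating_stuck`, the two-hour threshold, and the ISO 8601 UTC timestamp format are all assumptions, and the real `ManagementPlane.ActivatingStuck` rule may use different values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; the actual rule-engine value is not stated in this SOP.
STUCK_THRESHOLD = timedelta(hours=2)


def is_activating_stuck(status: str, updated_at: str, now: datetime,
                        threshold: timedelta = STUCK_THRESHOLD) -> bool:
    """Return True when the instance is `activating` and idle longer than `threshold`."""
    if status != "activating":
        return False
    # Assumes an ISO 8601 UTC timestamp with a trailing "Z", e.g. "2024-05-01T08:00:00Z".
    last_update = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))
    return now - last_update > threshold
```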
### 2.2 Orchestration constraint (why “change never finishes” shows as `activating`)
Many changes require the **cluster to reach an acceptable health state** (e.g. rolling waits for shard assignment, waits until the cluster is serviceable) before the next step. If that condition **cannot be met for a long time**, the orchestration task stays in `updating` and the instance stays **`activating`**.
### 2.3 Engine-side closure (why “the cluster never recovers”)
**Do not duplicate** `/_cluster/allocation/explain`, `_cat/shards`, `unassigned.reason` playbooks here — use [sop-cluster-health.md](sop-cluster-health.md) (Section 1 — cluster Red).
In the “long `activating`” context, remember: **orchestration is waiting for “cluster can recover / shards can allocate”**; the **engine** Red root cause still comes from **allocation explain** (e.g. allocation filter to a non-existent node, disk, allocation disabled). Until that is fixed, **“wait for cluster recovery” may never succeed** → instance stays `activating`.
> **Takeaway:** On the control plane, `activating` means “change not finished”. The **technical closure** that explains **why** it cannot finish is still **`allocation/explain`**; paraphrasing the `activating` text is not a substitute for engine evidence.
### 2.4 Remediation order
1. **When `activating` is confirmed, complete control-plane evidence before touching the engine:** at least **`DescribeInstance` + `ListActionRecords` (MUST)**, and **`ListAllNode`** is recommended. `ListActionRecords` gives change type (e.g. `RestartInstance`), phase, rolling progress, stuck node / `pendingOperation`, etc. **Reporting only “activating + RestartInstance” without this API is an incomplete control-plane chain** (a common gap when reviews expect full change-task evidence).
2. **Then** follow [sop-cluster-health.md](sop-cluster-health.md) (Red / allocation section) for engine explain / shard root cause and fixes (delete index, change settings, etc.).
3. **Before destructive or recovery actions** (e.g. `DELETE` index, `PUT` allocation-related settings): keep a **pre-action snapshot** — same-moment `DescribeInstance` + `ListActionRecords` summary (or explicit timestamps and key fields) so the report can state “control-plane state while the change was stuck”.
4. **After** engine blockers are cleared, expect **intermediate** Yellow, throttling, `same_shard`, CMS lag, etc. while moving toward Green (see cluster-health SOP) — that is normal. **Re-check control plane after remediation:** **`DescribeInstance`** again (and **`ListActionRecords`** if needed) to see whether `activating` ended and to validate **“engine recovered → change completed → lifecycle normalized”**; if still `activating`, continue from the new records.
---
## 3. Evidence checklist (MUST / SHOULD)
| Dimension | Source | Requirement |
|-----------|--------|-------------|
| Lifecycle | `DescribeInstance` | **MUST:** `status`, `updatedAt`, etc.; at least once before and after remediation |
| Change task detail | `ListActionRecords` | **MUST** whenever `activating` / change-stuck: type, progress, stuck or pending nodes; **do not** substitute `DescribeInstance` text for this call |
| Nodes | `ListAllNode` | **SHOULD:** rolling and abnormal nodes |
| Cluster health (control/CMS) | `DescribeMetricList` (ClusterStatus, etc.) | **SHOULD:** cross-check with engine |
| Engine root cause | `allocation/explain`, `/_cat/shards` | **MUST** when Red / unassigned: steps in **sop-cluster-health** Section 1 |
### 3.1 Recommended collection order
1. `DescribeInstance` → confirm `activating` and instance fields.
2. **`ListActionRecords`** → task detail (the change type **must come from the API**, not from a guess such as “probably RestartInstance”).
3. `ListAllNode` (recommended) → corroborate with change records.
4. Engine: `allocation/explain`, `_cat/shards`, etc. (cluster-health Red section).
5. **Before risky remediation:** another snapshot of `DescribeInstance` + `ListActionRecords` if time has passed since steps 1–2 or you are about to delete indices.
6. Execute remediation → watch `_cluster/health` / CMS for intermediate states.
7. **After remediation:** `DescribeInstance` (+ optional `ListActionRecords`) → confirm `activating` cleared.
---
## 4. Report expectations (skill template)
When both **`activating` (or `ManagementPlane.ActivatingStuck`)** and **engine Red / unassigned primary** exist:
- **Do not** write only “control plane and engine in parallel” with no causal link.
- **Evidence** must include a **`ListActionRecords` summary** (change type, phase/progress, abnormal or pending nodes), cross-checked with **`DescribeInstance`**; if remediation ran, add **before/after** control-plane comparison (Section 3.1 recommended collection order).
- **Must** add a **one-line cross-layer root cause** that states explicitly:
**Change waiting for cluster recovery ← cluster stays Red ← concrete reason from allocation explain (e.g. require points to a non-existent node)**
Example sentence (replace index name and explain conclusion with real values):
> The rolling change on the control plane waits for cluster health; on the engine, index `{index}` has `index.routing.allocation.require._name` pointing at a non-existent node, so primaries cannot allocate and the cluster stays Red — orchestration cannot proceed → instance remains `activating`.
If there is no `activating` or the engine is already Green, write “N/A” or omit this subsection.
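
The one-line root cause can be assembled mechanically once the three facts are known. The sketch below is only an illustration of the required sentence shape; the function name `cross_layer_root_cause` and its parameters are hypothetical, and the explain reason must come from real `allocation/explain` output.

```python
def cross_layer_root_cause(instance_status: str, cluster_status: str,
                           change_type: str, explain_reason: str) -> str:
    """Compose the one-line cross-layer root cause, or "N/A" when it does not apply."""
    # Section 4: omit the line when there is no `activating` or the engine is Green.
    if instance_status != "activating" or cluster_status.lower() == "green":
        return "N/A"
    return (f"{change_type} waiting for cluster recovery ← cluster stays "
            f"{cluster_status.capitalize()} ← {explain_reason}")
```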
FILE:references/sop-cluster-health.md
# SOP: Cluster health diagnosis
**Covers:** `HealthCheck.ClusterStatusRed`, `HealthCheck.ClusterStatusYellow`, `HealthCheck.NodeDisconnected`, `HealthCheck.PendingTasksCritical/Warning`, `HealthCheck.MasterReelection`
> **Related:** If the instance stays `activating` for a long time, the change never finishes, and **cluster Red / unassigned shards** persist together, read [sop-activating-change-stuck.md](sop-activating-change-stuck.md) for **cross-layer causality** and reporting requirements in addition to engine steps in this SOP.
> **Report polish (optional):** [acceptance-criteria.md](acceptance-criteria.md) **§6.1** — Red vs Yellow wording (`unassigned_primary_shards`), primary vs **replica shard** count arithmetic, `unassigned.reason` (incl. optional `INDEX_CREATED` for scoring), affected-index scope, post-remediation `curl` checks, and **scoped** wording when excluding disk/node/CPU. **§6.3** — JVM / breaker / fielddata headline vs GC-only framing.
---
## Diagnosis entry: cluster state decision tree
```
ClusterStatus metric
├── 2 (Red) → [P0 immediate] primary unassigned — partial data unavailable
├── 1 (Yellow) → [P1 within ~30m] replica unassigned — available but data-loss risk
└── 0 (Green) → OK
ClusterDisconnectedNodeCount metric
└── > 0 → [P0 immediate] node disconnected
```
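The decision tree above can be read as a small triage function. This is a sketch under the assumption that the two CMS metrics have already been fetched as integers; the function name `triage` and the finding strings are illustrative only.

```python
from typing import List


def triage(cluster_status: int, disconnected_nodes: int) -> List[str]:
    """Map ClusterStatus and ClusterDisconnectedNodeCount to prioritized findings."""
    findings = []
    if cluster_status == 2:
        findings.append("P0: cluster Red (primary unassigned)")
    elif cluster_status == 1:
        findings.append("P1: cluster Yellow (replica unassigned)")
    # Node disconnection is evaluated independently of the cluster color.
    if disconnected_nodes > 0:
        findings.append("P0: node disconnected")
    return findings
```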
---
## 1. Cluster status Red (`HealthCheck.ClusterStatusRed`)
### Trigger conditions
- `ClusterStatus == 2` for 2 minutes, or
- Logs contain `UnavailableShardsException`
### Diagnosis steps
**Step 1 — Find Red indices**
```bash
GET _cat/indices?v&health=red
```
**Step 2 — Why shards are unassigned**
```bash
GET _cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason&s=state
GET _cluster/allocation/explain # full reason for one representative unassigned shard
# With body — specific shard:
GET _cluster/allocation/explain
{
"index": "my_index",
"shard": 0,
"primary": true
}
```
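When many shards are unassigned, grouping them by reason shows which row of the interpretation table matters most. The sketch below assumes the `_cat/shards` call was made with `format=json`, so each row is a dict keyed by column name; the function name `unassigned_by_reason` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def unassigned_by_reason(rows: Iterable[Dict[str, Optional[str]]]) -> Counter:
    """Count UNASSIGNED shards per (primary/replica, unassigned.reason) pair."""
    counts = Counter()
    for row in rows:
        if row.get("state") != "UNASSIGNED":
            continue
        kind = "primary" if row.get("prirep") == "p" else "replica"
        # A missing reason is kept visible instead of being dropped.
        counts[(kind, row.get("unassigned.reason") or "UNKNOWN")] += 1
    return counts
```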
**Step 3 — Interpret `unassigned.reason`**
| unassigned.reason / explain status | Meaning | Mitigation |
|-------------------|---------|------------|
| `NODE_LEFT` | Node left cluster | → see **Section 3 — Node disconnected** |
| `ALLOCATION_FAILED` | Allocation failed more than 5 times | `POST _cluster/reroute?retry_failed=true` |
| `ALLOCATION_DISABLED` | Allocation disabled cluster-wide | `PUT _cluster/settings {"persistent":{"cluster.routing.allocation.enable":"all"}}` |
| `NO_VALID_SHARD_COPY` | No valid shard copy | Data likely lost — restore from snapshot |
| `DISK_THRESHOLD_EXCEEDED` | Disk watermark | → [sop-disk-storage.md](sop-disk-storage.md) |
| `DECIDERS_THROTTLED` | Recovery throttled | Wait / tune throttling / `reroute` as appropriate |
| `DECIDERS_NO` + `box_type` | Hot/warm `box_type` mismatch | Remove `routing.allocation.require.box_type` on the index or add matching nodes |
| `AWAITING_INFO` | Waiting on snapshot | Wait for snapshot or cancel the snapshot job |
**Step 4 — Special cases**
**A. Hot/warm (`box_type`) mismatch**
```bash
GET _cat/nodeattrs?v&h=host,attr,value
PUT /index_name/_settings
{
"index.routing.allocation.require.box_type": null
}
```
**B. Synonym configuration blocking allocation**
Example errors:
```text
IOException while reading synonyms_path_path: /path/to/file.txt
IllegalArgumentException: term: xxx analyzed to a token with position increment != 1 (got: 2)
```
**Scenario 1 — file path dependency**
```bash
POST /index_name/_close
# Fix synonym file path or remove the dependency
POST /index_name/_open
```
**Scenario 2 — stopword vs synonym conflict** (real case)
- **Root cause:** The synonym file contains terms that are removed by a stop filter (e.g. `"perche à selfie"` where `à` is dropped), producing `position increment != 1`.
- **Remediation:**
1. Inspect the synonym file for stopword overlap.
2. Remove or change synonym rules that conflict with stopwords.
3. Or reorder analysis so the stop filter runs **after** the synonym filter (product-dependent).
**C. Blocked by snapshot**
```bash
GET _snapshot/_status
DELETE _snapshot/repo_name/snapshot_name # cancel if appropriate
```
**Step 5 — If data cannot be recovered (destructive — confirm with user)**
```bash
POST _cluster/reroute
{
"commands": [{
"allocate_empty_primary": {
"index": "index_name",
"shard": 0,
"node": "node_name",
"accept_data_loss": true
}
}]
}
```
### Common causal chains
```text
Node OOM → node leaves → primary missing → Red
Disk full → read-only / allocation blocked → Red
Resize / rolling change in progress → migration not done → temporary Red (often expected; wait)
```
> **vs. change stuck / `activating`:** The last line above is often **expected temporary Red** while migration or rolling progresses. If **`DescribeInstance` stays `activating` far longer than normal**, change records show no progress, and the cluster stays **persistently** Red, treat it as **control-plane orchestration crossed with engine allocation** — use engine steps here **and** [sop-activating-change-stuck.md](sop-activating-change-stuck.md) for cross-layer narrative and evidence.
---
## 2. Cluster status Yellow (`HealthCheck.ClusterStatusYellow`)
### Trigger conditions
- `ClusterStatus == 1` for ~30 minutes, often with **replica** shards `UNASSIGNED`
### Diagnosis steps
**Step 1 — Unassigned replica shards**
```bash
# Console form. When calling through curl, pipe the output to `grep UNASSIGNED` to filter.
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state
```
**Step 2 — Allocation explain**
```bash
GET _cluster/allocation/explain
```
### Common causes
| Cause | Clues | Mitigation |
|-------|-------|------------|
| Too few nodes (`replicas >= node count`) | explain: `not enough nodes` | Add nodes or lower `number_of_replicas` |
| Per-index `total_shards_per_node` too low for primaries + replicas | explain often shows **`shards_limit`** and/or **`same_shard`** (order can vary by version); Yellow with **replica** `UNASSIGNED`, primaries started | **Prefer** relaxing, removing, or raising `index.routing.allocation.total_shards_per_node`. **If the cap stays 1**, scaling to **3** data nodes often **still** leaves the cluster Yellow: you need **(nodes × cap) ≥** shard copies for that index (2 primaries with 1 replica each ⇒ **4** copies), **or** a higher cap / fewer replicas / reindex. **≥4** data nodes is the usual minimum **only when** the cap stays **1** for that pattern |
| Change / scale-out in progress | events / actions in `Executing` | Wait (often normal); if **`activating` persists** with no progress → [sop-activating-change-stuck.md](sop-activating-change-stuck.md) |
| Disk watermarks | disk-related explain | Free disk / adjust thresholds — [sop-disk-storage.md](sop-disk-storage.md) |
| Routing / awareness filters | explain shows filter predicates | Review `routing.allocation.*` settings |
| `same_shard` (replica on same node as primary) | explain | Temporarily set replicas to 0 then back to 1 (use with care) |
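
The `total_shards_per_node` arithmetic in the table can be checked before scaling. The sketch below is a simplified feasibility test under two assumptions: every primary has the same replica count, and only the per-index cap and the `same_shard` rule constrain placement. The function name `replica_allocation_feasible` is hypothetical.

```python
def replica_allocation_feasible(primaries: int, replicas: int,
                                data_nodes: int, per_node_cap: int) -> bool:
    """Check whether all copies of one index can be placed on the data nodes."""
    copies = primaries * (1 + replicas)
    # same_shard: a primary and its replicas must sit on distinct nodes.
    enough_nodes_per_shard = data_nodes >= 1 + replicas
    # total_shards_per_node: each node accepts at most `per_node_cap` copies of this index.
    enough_slots = data_nodes * per_node_cap >= copies
    return enough_nodes_per_shard and enough_slots
```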
**Temporary mitigation (reduces redundancy)**
```bash
PUT /index_name/_settings
{
"index.number_of_replicas": 0
}
# After disk/node issues are fixed, raise replicas again (e.g. 1)
```
### Normal Yellow vs abnormal Yellow
- **Normal Yellow:** during change / scaling; usually clears when the task completes.
- **Abnormal Yellow:** persists **> ~30 minutes** — investigate actively.
- **Long `activating` + Yellow/Red with no progress:** add cross-layer analysis per [sop-activating-change-stuck.md](sop-activating-change-stuck.md); do not only say “wait for the change to finish”.
---
## 3. Node disconnected (`HealthCheck.NodeDisconnected`)
### Trigger conditions
- `ClusterDisconnectedNodeCount > 0` for ~1 minute
### Diagnosis steps
**Step 1 — Which node**
```bash
GET _cat/nodes?v&h=name,ip,heapPercent,cpu,load_1m,disk.avail,node.role
```
**Step 2 — Logs: when and why**
On the master / node logs, search for:
- `removed` — node removed from cluster
- `node-left` — identify the node
- `ERROR` / `WARN` + disconnected node IP — errors just before leave
**Step 3 — Log keywords → cause**
| Log pattern | Likely cause | Next step |
|-------------|--------------|-----------|
| `OutOfMemoryError` / `java.lang.OutOfMemoryError` | JVM OOM | [sop-memory-gc.md](sop-memory-gc.md) |
| `Data too large` / `CircuitBreakingException` | Circuit breaker | [sop-memory-gc.md](sop-memory-gc.md) |
| `rejected` + `write` / `search` | Thread pool saturated | [sop-write-performance.md](sop-write-performance.md) / [sop-query-thread-pool.md](sop-query-thread-pool.md) |
| `watermark` + disk | Disk pressure | [sop-disk-storage.md](sop-disk-storage.md) |
| `Connection refused` / `network` | Network partition | Check host / VPC connectivity |
| `IOException` + `disk` | Disk I/O errors | Check disk health |
| *(no clear error)* | Host / heartbeat timeout | Check host metrics; restart node if needed |
**Step 4 — Restart the node (or process) if required, then watch recovery**
After the underlying issue is addressed, restart the failed node per your change process, then monitor shard recovery:
```bash
GET _cat/recovery?v&active_only=true&h=index,shard,time,type,stage,source_node,target_node
```
### Frequent root causes (from 150+ cases)
1. **Node OOM** (most common): heap pressure → long GC pauses → heartbeat timeout → node leaves
2. **Slow disk I/O:** writes + merges saturate I/O → heartbeat timeout
3. **CPU pegged:** no CPU for heartbeat handling → timeout
4. **Too many shards:** e.g. **> ~50k** shards → slow cluster-state application → node appears offline (details below)
5. **ILM backlog:** ILM work saturates generic thread pool → instability
6. **Recovery concurrency too high:** small nodes + aggressive recovery settings → generic pool full → node leaves
### Special case: recovery concurrency too high
**Typical pattern:**
- During resize / change, cold nodes drop and cluster goes Red
- Logs: `rejected execution of TransportReplicationAction`
- Generic thread pool queue full
**Chain:**
```text
Manually raised recovery concurrency (e.g. concurrent_recoveries = 128)
→ small node size (e.g. 4c16g cold)
→ generic queue full, high CPU
→ heartbeat lag → master marks node disconnected
```
**Mitigation:**
1. Reset recovery-related settings to defaults (Elasticsearch generally discourages changing these):
```bash
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.node_concurrent_recoveries": null,
"cluster.routing.allocation.node_initial_primaries_recoveries": null,
"cluster.routing.allocation.node_concurrent_incoming_recoveries": null,
"cluster.routing.allocation.node_concurrent_outgoing_recoveries": null
}
}
```
2. Restart affected nodes.
3. **Note:** Raising these rarely improves migration speed proportionally and can destabilize the cluster.
### Special case: too many shards → node disconnect
**Typical pattern:**
- **50k+** shards; nodes flap offline
- Master log: `cluster state applier task [...] took [57.9s] which is above the warn threshold of [30s]`
- `no master connection` style messages
**Chain:**
```text
Very high shard count
→ long IndicesClusterStateService / applier latency (tens of seconds)
→ heartbeat lag
→ master marks node disconnected
```
**Checks:**
```bash
GET _cluster/health?filter_path=active_shards,unassigned_shards,number_of_pending_tasks
GET _cat/pending_tasks?v
# Search master logs for "cluster state applier task" duration warnings
```
**Guidelines:**
- Roughly **≤ 20 shards per GB heap** on a data node (e.g. 16 GB heap → ~320 shards per node guideline)
- **≤ ~1000 shards per node** as a practical ceiling
- **≤ ~50k total shards** per cluster as a soft target
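
The three guidelines combine into a simple per-node budget. The sketch below only restates the rules of thumb listed above; the function names and default parameters are illustrative, and the limits remain guidelines rather than hard product limits.

```python
def shard_budget_per_node(heap_gb: float, shards_per_gb: int = 20,
                          node_ceiling: int = 1000) -> int:
    """Guideline shard count for one data node: heap-based, capped by the practical ceiling."""
    return min(int(heap_gb * shards_per_gb), node_ceiling)


def cluster_over_soft_target(total_shards: int, soft_target: int = 50_000) -> bool:
    """True when the cluster-wide shard count exceeds the soft target."""
    return total_shards > soft_target
```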
**Mitigation:**
- Drop expired indices (especially many small log indices)
- Merge / shrink where possible — see [sop-configuration.md](sop-configuration.md) for Shrink-related guidance
- Split clusters if the workload allows
### After a node leaves
```text
Node leaves
→ shard rerouting (Yellow; Red if primaries affected)
→ higher I/O / network (recovery)
→ risk of cascading failures if the cluster was already hot
```
---
## 4. Pending tasks backlog (`HealthCheck.PendingTasksCritical/Warning`)
### Trigger conditions
- **Critical:** `pending_tasks > node_count × 50` for ~5 minutes
- **Warning:** `pending_tasks > node_count × 20` for ~10 minutes
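
As a sketch, the two trigger conditions can be evaluated as below. It assumes the caller already knows how long the backlog has been sustained, and it treats the approximate durations (~5 and ~10 minutes) as exact cut-offs, which is a simplification. The function name `pending_tasks_severity` is hypothetical.

```python
from typing import Optional


def pending_tasks_severity(pending_tasks: int, node_count: int,
                           minutes_sustained: float) -> Optional[str]:
    """Return "Critical", "Warning", or None for a sustained pending-task backlog."""
    if pending_tasks > node_count * 50 and minutes_sustained >= 5:
        return "Critical"
    if pending_tasks > node_count * 20 and minutes_sustained >= 10:
        return "Warning"
    return None
```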
### Diagnosis steps
**Step 1 — Inspect pending tasks**
```bash
GET _cluster/pending_tasks
GET _cat/pending_tasks?v
```
**Step 2 — Task flavor**
| Dominant task type | Likely cause | Mitigation |
|--------------------|--------------|------------|
| Many `put-mapping` | Mapping explosion / too many fields | Cap dynamic fields / redesign mappings |
| Many `create-index` | Runaway index creation | Reduce dynamic index churn |
| Many ILM-related | ILM backlog | Temporarily `POST _ilm/stop`, reduce shards / fix ILM, then `POST _ilm/start` |
| Many `shard-started` | Nodes joining/leaving often | Fix node stability (Section 3) |
| Master CPU high | Undersized dedicated master | Scale dedicated master tier |
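
To find the dominant task type, the `source` strings returned by `GET _cluster/pending_tasks` can be bucketed by their leading token. This is a rough sketch: it assumes each task carries a `source` field whose first word names the task type (for example `put-mapping [...]`), and the exact source text varies by Elasticsearch version. The function name `dominant_task_type` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def dominant_task_type(tasks: List[Dict[str, str]]) -> Optional[Tuple[str, int]]:
    """Return (task type, count) for the most frequent pending-task type, or None if empty."""
    counts = Counter()
    for task in tasks:
        source = task.get("source", "")
        # Keep only the leading token, e.g. "put-mapping" from "put-mapping [logs-1/abc]".
        counts[source.split(" ", 1)[0] or "unknown"] += 1
    return counts.most_common(1)[0] if counts else None
```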
**Step 3 — Temporary relief (ILM pile-up)**
```bash
POST _ilm/stop
GET _ilm/status
# After remediation
POST _ilm/start
```
---
## 5. Master re-election (`HealthCheck.MasterReelection`)
### Trigger conditions
- Monitoring shows the **elected master node id** changed
### Diagnosis steps
**Step 1 — Current master**
```bash
GET _cat/master?v
GET _cat/nodes?v&h=name,ip,node.role,master
```
**Step 2 — Election in logs**
Search for: `elected-as-master`, `master node changed`, `new_master`
**Step 3 — Typical drivers**
1. Previous master **OOM** — memory metrics + OOM logs on old master
2. Previous master **CPU saturated**
3. **Long GC pauses** — no heartbeat during GC
4. **Network partition** — subset of nodes cannot reach master → new election
**Step 4 — Dedicated master topology**
```bash
GET _cat/nodes?v&h=name,ip,node.role
# Role includes `m` → master-eligible
# Only one `m` in production → single point of failure risk
```
> **Practice:** Use **dedicated master nodes** in production (often **3** master-eligible nodes that do not hold data) instead of data nodes also being master-eligible.
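
The topology check can be sketched as a count over the `node.role` column. It assumes the role strings come from `_cat/nodes` and that the letter `m` marks a master-eligible node, as noted in the comments above; the function names are hypothetical.

```python
from typing import List


def master_eligible_count(node_roles: List[str]) -> int:
    """Count nodes whose `node.role` string contains `m` (master-eligible)."""
    return sum(1 for role in node_roles if "m" in role)


def is_single_master_risk(node_roles: List[str]) -> bool:
    """True when exactly one node is master-eligible (single point of failure)."""
    return master_eligible_count(node_roles) == 1
```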
---
## Appendix: Quick command reference (cluster health)
```bash
GET _cluster/health?pretty
GET _cat/health?v
GET _cat/nodes?v&h=name,ip,heapPercent,cpu,load_1m,disk.used_percent,node.role
GET _cat/shards?v&s=state&h=index,shard,prirep,state,node,unassigned.reason
GET _cluster/allocation/explain
POST _cluster/reroute?retry_failed=true
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}
GET _cat/recovery?v&active_only=true
```
FILE:references/sop-configuration.md
# SOP: Configuration safety and API health
**Covers:** `ConfigSafety.DangerousClusterConfig` (P1), `ConfigSafety.DangerousIndexConfig` (P1), `ApiHealth.HighErrorRate` (P1), `CapacityPlanning.SlowRecovery` (P2)
*(Event IDs aligned with catalog **V4.5+**.)*
---
## 1. Dangerous cluster settings (`ConfigSafety.DangerousClusterConfig`, P1)
### Trigger conditions (any one)
- `cluster.routing.allocation.cluster_concurrent_rebalance > 16`
- `cluster.routing.allocation.node_concurrent_recoveries > 8`
- `indices.recovery.max_bytes_per_sec > 200MB`
### Why it is risky
These are often raised during data migration to speed it up. **Too high** values tend to:
1. Saturate the **generic** thread pool → pending tasks pile up → nodes drop
2. Saturate node **CPU / I/O** → heartbeat timeouts → nodes drop
3. Trigger **cascading failures** → cluster Red
### Typical cases
**Case 1 — cold tier resize, `rebalance=128`**
- Before the change, `cluster_concurrent_rebalance=128` was set to speed migration
- Many shards move at once → CPU / I/O saturated → node leaves → Red
**Case 2 — node loss with `recovery=200mb`**
- `max_bytes_per_sec=200mb` to speed recovery
- Network + I/O saturated together → heartbeat timeout → node leaves
### Diagnosis steps
**Step 1 — Inspect current settings**
```bash
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation*,*.indices.recovery*
```
**Step 2 — Compare to safe thresholds**
| Setting | Risky if | Suggested default | Off-peak ceiling |
|---------|----------|-------------------|------------------|
| `cluster_concurrent_rebalance` | > 16 | 2 | 8 |
| `node_concurrent_recoveries` | > 8 | 2 | 4 |
| `recovery.max_bytes_per_sec` | > 200MB | 40mb | 100mb |
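
The comparison against the "Risky if" column can be sketched as follows. The helper assumes the three values were already read from `_cluster/settings`, and that size strings use binary multiples (`mb` = 1024² bytes); `parse_size` and `risky_cluster_settings` are hypothetical names.

```python
import re
from typing import List

_UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a size string such as "40mb" into bytes (binary multiples)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(float(match.group(1)) * _UNIT_BYTES[match.group(2)])


def risky_cluster_settings(concurrent_rebalance: int, concurrent_recoveries: int,
                           max_bytes_per_sec: str) -> List[str]:
    """Return the names of settings that exceed the risky thresholds in the table."""
    findings = []
    if concurrent_rebalance > 16:
        findings.append("cluster_concurrent_rebalance")
    if concurrent_recoveries > 8:
        findings.append("node_concurrent_recoveries")
    if parse_size(max_bytes_per_sec) > parse_size("200mb"):
        findings.append("indices.recovery.max_bytes_per_sec")
    return findings
```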
**Step 3 — Restore safer settings**
```bash
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.cluster_concurrent_rebalance": "2",
"cluster.routing.allocation.node_concurrent_recoveries": "2",
"indices.recovery.max_bytes_per_sec": "40mb"
}
}
```
### Tuning guidance
- **Peak hours:** keep defaults (rebalance `2`, recovery concurrency `2`, recovery rate `40mb`).
- **Off-peak migration:** you may raise modestly, but stay **below** the ceilings in the table.
- **Before peak returns:** reset to defaults.
---
## 2. Dangerous index settings (`ConfigSafety.DangerousIndexConfig`, P1)
### Trigger conditions (any one)
- Ngram tokenizer: `max_gram > 100` or `min_gram = 0`
- Small index (stored size under ~10 GB) with **primary shard count > 10**
- **Zero replicas** and **no usable snapshot** (data-loss risk)
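
The three trigger conditions can be evaluated per index with a small helper. This sketch assumes the inputs were collected beforehand (tokenizer settings, `_cat/indices`, snapshot coverage) and treats the approximate 10 GB boundary as exact; the function name `dangerous_index_findings` and the finding strings are illustrative.

```python
from typing import List, Optional


def dangerous_index_findings(min_gram: Optional[int], max_gram: Optional[int],
                             store_gb: float, primaries: int,
                             replicas: int, has_snapshot: bool) -> List[str]:
    """Return the trigger conditions that one index meets (empty list if none)."""
    findings = []
    # Pass None for both gram values when the index has no ngram tokenizer.
    if min_gram is not None and min_gram == 0:
        findings.append("ngram min_gram = 0")
    if max_gram is not None and max_gram > 100:
        findings.append("ngram max_gram > 100")
    if store_gb < 10 and primaries > 10:
        findings.append("small index with > 10 primaries")
    if replicas == 0 and not has_snapshot:
        findings.append("zero replicas and no snapshot")
    return findings
```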
### Case A — bad Ngram settings
**Example:** write timeouts with `min_gram=0`, `max_gram=1024`.
- `min_gram=0` can produce empty tokens → `ArrayIndexOutOfBoundsException`
- `max_gram=1024` explodes tokens per term → high CPU on indexing, timeouts
**Inspect tokenizer settings**
```bash
GET /index_name/_settings?filter_path=*.analysis.tokenizer
```
**Safer Ngram example** (adjust names to your mapping; `min_gram` must be **≥ 1**; `max_gram` often **≤ ~20**, never **> 100**):
```json
{
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "ngram",
"min_gram": 2,
"max_gram": 10
}
}
}
}
```
**Fix (requires reindex)**
```bash
# 1. Create new index with corrected analysis
PUT /new_index { ... }
# 2. Reindex
POST _reindex
{
"source": {"index": "old_index"},
"dest": {"index": "new_index"}
}
# 3. Swap alias
POST _aliases
{
"actions": [
{"remove": {"index": "old_index", "alias": "alias_name"}},
{"add": {"index": "new_index", "alias": "alias_name"}}
]
}
# 4. Delete old index
DELETE /old_index
```
### Case B — too many primaries on a small index
**Find candidates**
```bash
GET _cat/indices?v&s=store.size:asc&h=health,index,pri,rep,docs.count,store.size
# Look for store.size < ~10GB but pri > 10
```
**Why it hurts**
- Each shard is a Lucene index (~**50–100MB** heap per shard is a common rule-of-thumb band)
- Many tiny shards slow the master and increase GC pressure
**Mitigation — reduce primary count (Shrink)**
Shrink requires the index to be **write-blocked** (`index.blocks.write: true`), a copy of **every shard on one node**, and a target primary count that is a **factor** of the source primary count. Follow the product docs for your version.
```bash
# Step 1: Route to one node and block writes
PUT /old_index/_settings
{
"index.routing.allocation.require._name": "target_node",
"index.blocks.write": true
}
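# Step 2: Shrink (target shard count must be a factor of the source primary count)
# Null values clear the Step 1 settings that the target index would otherwise inherit
POST /old_index/_shrink/new_index
{
"settings": {
"index.number_of_shards": 2,
"index.number_of_replicas": 1,
"index.routing.allocation.require._name": null,
"index.blocks.write": null
}
}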
# Step 3: After verification, delete old index and point aliases as needed
```
### Case C — zero replicas and no snapshot
**Risk:** if a node holding the only copy goes away, data can be **lost permanently**.
**Check**
```bash
GET _cat/indices?v&h=index,rep&s=rep:asc
# indices with rep=0
GET _cat/snapshots?v
# confirm backup coverage
```
**Mitigation**
```bash
PUT /index_name/_settings
{
"index.number_of_replicas": 1
}
# and/or schedule snapshot policies
```
---
## 3. High Elasticsearch API error rate (`ApiHealth.HighErrorRate`, P1)
### Trigger conditions
- HTTP **5xx** rate **> ~5%** for **~3 minutes**
- APIs such as `_cat/indices`, `_cat/nodes` **repeatedly** fail
### Diagnosis steps
**Step 1 — Probe key APIs**
```bash
GET _cat/health?v
GET _cat/nodes?v
GET _cat/indices?v
GET _cluster/stats
```
**Step 2 — Elasticsearch logs**
Filter **ERROR** on master / data nodes. Watch for `NullPointerException`, `NumberFormatException`, `IllegalStateException`, etc.
**Step 3 — Error pattern → cause**
| Exception / pattern | Likely cause | Action |
|---------------------|--------------|--------|
| `NullPointerException` | Metadata glitch / version bug | Restart affected nodes; escalate if cluster-wide |
| `NumberFormatException` | Counter overflow (a known class of bugs in some versions) | Upgrade or restart |
| `IllegalStateException` | Bad index lifecycle state | `POST /index/_close` then `POST /index/_open` |
| `CircuitBreakingException` | Memory pressure | [sop-memory-gc.md](sop-memory-gc.md) |
**Examples**
- `_cat/indices` NPE on a specific build → often **node restart** after vendor guidance
- Brand-new cluster: `_cat` empty/errors until cluster formation finishes → **wait**
**Step 4 — Emergency mitigation**
```bash
# Node-level issues: restart the affected node (console / automation)
# Index state issues:
POST /problem_index/_close
POST /problem_index/_open
```
---
## 4. Slow data migration / recovery (`CapacityPlanning.SlowRecovery`, P2)
### Trigger conditions
- Recovery throughput stays **below roughly half** of the configured cap **and** estimated completion **exceeds ~4 hours** (tune thresholds to your SLOs)
### Diagnosis steps
**Step 1 — Active recovery**
```bash
GET _cat/recovery?v&active_only=true&h=index,shard,time,type,stage,source_node,target_node,bytes_total,bytes_percent
GET _cat/recovery?v&active_only=true&h=index,shard,time,stage,translog_ops,translog_ops_percent
```
**Step 2 — Rough ETA**
```text
ETA ≈ remaining_bytes / observed_throughput
Example: 500 GB left at 40 MB/s → roughly 3.5 hours (validate live)
```
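The ETA formula and the trigger condition for this section can be sketched together. The helper assumes binary units (1 GB = 1024 MB) and treats the approximate 4-hour boundary as exact; `recovery_eta_hours` and `is_slow_recovery` are hypothetical names, and real throughput should be sampled from live `_cat/recovery` output.

```python
def recovery_eta_hours(remaining_gb: float, throughput_mb_s: float) -> float:
    """Estimate hours left: remaining bytes divided by observed throughput."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return remaining_gb * 1024 / throughput_mb_s / 3600


def is_slow_recovery(observed_mb_s: float, cap_mb_s: float, eta_hours: float) -> bool:
    """True when throughput is below half the configured cap and the ETA exceeds 4 hours."""
    return observed_mb_s < cap_mb_s / 2 and eta_hours > 4
```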
**Step 3 — Tune migration speed to the window**
**Speed up off-peak** (only if the cluster has headroom — stay under **Section 1** safety caps):
```bash
PUT _cluster/settings
{
"transient": {
"indices.recovery.max_bytes_per_sec": "100mb"
}
}
```
Use `200mb` only with extreme care and monitoring; the **danger** threshold in Section 1 is **> 200MB**.
**Slow down during peak** (protect latency)
```bash
PUT _cluster/settings
{
"transient": {
"indices.recovery.max_bytes_per_sec": "40mb"
}
}
```
**Pause rebalance** (if deferrable)
```bash
PUT _cluster/settings
{
"transient": {
"cluster.routing.rebalance.enable": "none"
}
}
# Re-enable off-peak
PUT _cluster/settings
{
"transient": {
"cluster.routing.rebalance.enable": "all"
}
}
```
### Examples
- **~1.5 TB** at **~100 MB/s** → roughly 4 hours, which can still be **normal**; no action if within SLO
- Migration “stuck” with cap **10 MB/s** → raising toward **40–100 MB/s** during a safe window often helps
---
## Appendix: Quick commands (configuration and API health)
```bash
GET _cluster/settings?include_defaults=true
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.cluster_concurrent_rebalance": "2",
"cluster.routing.allocation.node_concurrent_recoveries": "2",
"indices.recovery.max_bytes_per_sec": "40mb"
}
}
GET /index_name/_settings?filter_path=*.analysis
GET _cat/indices?v&s=store.size:asc&h=health,index,pri,rep,docs.count,store.size
GET _cat/snapshots?v
GET _cluster/health
GET _cat/health?v
GET _cat/indices?v
GET _cat/recovery?v&active_only=true
```
FILE:references/sop-cpu-load.md
# SOP: High CPU load and uneven node load
**Covers:** `ResourceMonitor.CPUSustainedHigh` (P0), `ResourceMonitor.CPUPeakHigh` (P1), `CapacityPlanning.LoadImbalance` (P1)
CMS metric ↔ finding names may differ by catalog version — cross-check [health-events-catalog.md](health-events-catalog.md).
---
## Diagnosis decision tree
```
NodeCPUUtilization (example CMS family)
├── Any node avg > ~80% sustained → CPUSustainedHigh (P0), treat as urgent
├── Any node max > ~95% (spike) → CPUPeakHigh (P1), respond within ~30m
└── Coefficient of variation (CV) of per-node avg CPU > ~0.3 → LoadImbalance (P1)
NodeLoad_1m
└── High load but CPU not high → often IO wait; correlate with disk / queue metrics
```
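The CPU branches of the tree can be sketched as below. Two assumptions are made explicit: the coefficient of variation uses the population standard deviation (the catalog may define it differently), and the approximate thresholds (~80%, ~95%, ~0.3) are treated as exact. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def cpu_cv(per_node_avg_cpu: List[float]) -> float:
    """Coefficient of variation of per-node average CPU (population std / mean)."""
    avg = mean(per_node_avg_cpu)
    return pstdev(per_node_avg_cpu) / avg if avg else 0.0


def classify_cpu(per_node_avg_cpu: List[float], per_node_max_cpu: List[float]) -> List[str]:
    """Return the CPU findings that the per-node averages and maxima trigger."""
    findings = []
    if any(value > 80 for value in per_node_avg_cpu):
        findings.append("CPUSustainedHigh")
    if any(value > 95 for value in per_node_max_cpu):
        findings.append("CPUPeakHigh")
    if cpu_cv(per_node_avg_cpu) > 0.3:
        findings.append("LoadImbalance")
    return findings
```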
> **Read-heavy overload:** When **`_tasks?detailed=true&actions=*search*`** shows sustained heavy search on specific indices and **`search` thread_pool** queue/rejected rises together with **NodeCPU**, say so in one line — **CPU spike and search rejects share the same query load**, not two unrelated incidents. Co‑stress wording: [sop-query-thread-pool.md](sop-query-thread-pool.md) (note after the decision tree). Use the **highest priority** already fired by metrics/rules (often **P0** CPU and/or **P0** search pool per catalog) for the summary band.
> **CMS CPU looks “low” vs heavy `search.rejected`:** Before finalizing the story, add **one short reconciliation** (especially for external readers): **(1)** whether the **load burst** sat mostly **early** in the report window while CMS shows a **whole-window** max/mean; **(2)** that **`NodeCPUUtilization` (or the catalog’s CPU metric)** is scoped to **this instance’s data nodes** — not a cluster rollup or wrong `Dimensions`; **(3)** optional **`GET _nodes/stats/os`** (process CPU) near the burst — **sub-minute** spikes can sit **between** coarse CMS samples. If **(1–3)** hold, it is **legitimate** to conclude **thread-pool / reject saturation without prolonged high CMS CPU** — but if a check fails, **fix the metric or window wording** instead of forcing the claim.
> **Per-node CPU % “lower” on the node that logs show as `search` hotspot:** Table **means** (e.g. **5 min** buckets) can **rank nodes differently** from **minute-of-peak** behavior or **thread-pool saturation** (queue/reject, `EsRejectedExecutionException` in logs). Prefer **time-aligned** evidence (**slow logs**, **thread_pool**, **hot_threads** at spike) for **which node** was search-bound; **CPU imbalance** rows are **supporting**, not a contradiction when **granularity** differs. See [acceptance-criteria.md](acceptance-criteria.md) **§6.2** (*Per-node CPU % vs search “hot” node*).
---
## 1. Three-step triage (any high-CPU scenario)
**Step 1 — Hot threads (capture while the issue is live)**
```bash
GET _nodes/hot_threads
# Thread name; stack frames: search / index / merge / GC
```
Prefer **`hot_threads` while CMS CPU is elevated** or immediately after a spike is reported. If the call **times out**, returns **thin stacks**, or the spike already ended while **`_nodes/stats/thread_pool`** / **`_tasks`** still show search overload, state **time-skewed or degraded hot_threads evidence** — do not claim “no search CPU work” for the whole incident window.
**Step 2 — Running tasks**
```bash
GET _tasks?detailed=true
GET _cat/tasks?v
# action, running_time_in_nanos (very long tasks)
```
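To pick out the very long tasks from Step 2 without reading the whole payload, a small filter over the `_tasks` JSON can help. The sketch below assumes the response has already been fetched and parsed into a dict; the function name, the 30-second threshold, and the sample data are illustrative, not real cluster output.

```python
from typing import Any, Dict, List, Tuple


def long_running_tasks(
    tasks_response: Dict[str, Any], min_seconds: float
) -> List[Tuple[str, str, float]]:
    """Return (task_id, action, seconds) for tasks at or above min_seconds, slowest first."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            if seconds >= min_seconds:
                found.append((task_id, task.get("action", ""), seconds))
    return sorted(found, key=lambda item: item[2], reverse=True)


# Hand-written sample shaped like a `GET _tasks?detailed=true` response.
sample = {
    "nodes": {
        "node-a": {
            "tasks": {
                "node-a:101": {
                    "action": "indices:data/read/search",
                    "running_time_in_nanos": 95_000_000_000,
                },
                "node-a:102": {
                    "action": "indices:data/write/bulk",
                    "running_time_in_nanos": 200_000_000,
                },
            }
        }
    }
}

for task_id, action, seconds in long_running_tasks(sample, min_seconds=30):
    print(f"{task_id} {action} {seconds:.0f}s")
```

The task IDs it returns are the values to pass to `POST _tasks/{task_id}/_cancel` if cancellation is warranted.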
**Step 3 — Slow logs**
Filter logs for `took_millis`, `query_time_in_millis`, or slow-log channels (`SEARCHSLOW` / `INDEXINGSLOW`).
---
## 2. Sustained high CPU (`CPUSustainedHigh`, P0)
### Criteria (example)
- Average CPU **> ~80%** for **> ~10 minutes** (tune to your alerts)
### Root-cause paths
**Path A — Traffic spike**
- **Signals:** `ClusterQueryQPS` / `ClusterIndexQPS` (or access logs) step up
- **Check:** large `body_size` requests
- **Mitigate:** throttle / rate-limit, scale out nodes urgently
**Path B — Slow queries blocking**
- **Signals:** hot threads dominated by **search** stacks; `_tasks` shows long-running search
- **Common anti-patterns** (see [sop-query-thread-pool.md](sop-query-thread-pool.md) for tuning detail):
- `wildcard` / `prefix` / `regexp` (leading wildcards are worst)
- Heavy `aggregations` (high cardinality)
- Deep `from + size` paging (e.g. > 10k)
- `nested` / `parent-child`
- Oversized `highlight` `fragment_size`
- Numeric fields queried as full-text; prefer `keyword` where appropriate
- **Mitigate:** `POST _tasks/{task_id}/_cancel` for runaway tasks; fix queries
**Path C — Heavy indexing**
- **Signals:** hot threads show **index** / **bulk**; high ingest QPS
- **Mitigate:** lower **parallel** bulk pressure and client retries; tune **per-bulk payload** (avoid oversized single bulks — see [sop-write-performance.md](sop-write-performance.md) **§2**); relax `refresh_interval` where safe
- **If CPU / GC spike with `write.rejected`:** align the **report narrative** with [sop-write-performance.md](sop-write-performance.md) **§2** (write-path first or dual P0 with ingest before JVM-only headline)
**Path D — Frequent merges**
- **Signals:** hot threads show **merge**
- **Mitigate (temporary):** cap merge scheduler threads (impacts indexing latency):
```bash
PUT /index_name/_settings
{
"index.merge.scheduler.max_thread_count": 1
}
```
### Emergency order of operations
```text
1. Throttle / shed load (often fastest)
2. Cancel huge tasks (GET _tasks → POST _tasks/{id}/_cancel)
3. Scale out if needed
4. Deeper root-cause and query/index design work
```
---
## 3. Very high CPU spikes (`CPUPeakHigh`, P1)
### Criteria (example)
- CPU **max > ~95%** even if the average stays lower
### vs `CPUSustainedHigh`
| | CPUPeakHigh | CPUSustainedHigh |
|---|-------------|------------------|
| Average | often < ~80% | > ~80% |
| Peak | > ~95% | often also > ~95% |
| Duration | short burst | ~10m+ |
| Risk | heartbeat / latency spikes | higher chance of node loss / overload cascade |
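The distinction in the table can be expressed as a small classifier. This is a sketch only: it assumes `samples` holds one node's CPU percentages across the alert window (roughly 10 minutes) and uses the example thresholds from this SOP as defaults; tune both to your catalog version.

```python
from statistics import mean
from typing import List


def classify_cpu(
    samples: List[float], sustained_avg: float = 80.0, peak_max: float = 95.0
) -> str:
    """Classify one node's CPU samples (percent) collected over the alert window."""
    if not samples:
        raise ValueError("samples must not be empty")
    if mean(samples) > sustained_avg:
        return "CPUSustainedHigh"  # average stays high across the window
    if max(samples) > peak_max:
        return "CPUPeakHigh"       # short burst, average still below the sustained line
    return "ok"


print(classify_cpu([85, 90, 88, 92]))  # CPUSustainedHigh
print(classify_cpu([40, 45, 97, 50]))  # CPUPeakHigh
```

Checking the sustained condition first means a node that is both high on average and spiking reports the higher-priority finding.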
### Focus
- Spikes are often **one huge query** or **merge**
- **Hot threads** must be captured **during** the spike; evidence disappears after
- Also review: BKD range queries, large aggs, heavy `script` queries
---
## 4. Uneven load across nodes (`LoadImbalance`, P1)
### Criteria (example)
- Per-node CPU **CV = stddev / mean > ~0.3** for **~10+ minutes**
### Typical pattern
- A few nodes at **80%+** CPU, others **below ~30%**
- Hot traffic pinned to specific nodes
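A quick way to check the CV criterion against per-node averages is shown below. The sketch assumes population standard deviation (the SOP does not say which variant the rule uses) and takes the per-node averages as a plain list; `cpu_cv` is an illustrative name.

```python
from statistics import mean, pstdev
from typing import Sequence


def cpu_cv(per_node_avg_cpu: Sequence[float]) -> float:
    """Coefficient of variation (stddev / mean) of per-node average CPU."""
    m = mean(per_node_avg_cpu)
    if m == 0:
        return 0.0  # an idle cluster has no imbalance to report
    return pstdev(per_node_avg_cpu) / m


skewed = [85, 82, 28, 25]    # two hot nodes, two cool nodes
balanced = [50, 52, 48, 50]

print(cpu_cv(skewed) > 0.3)    # True
print(cpu_cv(balanced) > 0.3)  # False
```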
### Causes and mitigations
**Cause A — Shard skew**
- **Verify:**
```bash
GET _cat/shards?v&s=node&h=index,shard,prirep,state,docs,store,node
GET _cat/nodes?v&h=name,ip,cpu,heap.percent,disk.avail
GET _cat/allocation?v&h=node,shards,disk.percent
```
- If shards of **hot indices** pile on the hot CPUs → rebalance
- **Mitigate — move a shard:**
```bash
POST _cluster/reroute
{
"commands": [{
"move": {
"index": "hot_index",
"shard": 0,
"from_node": "node_hot",
"to_node": "node_cool"
}
}]
}
```
**Cause B — Skew *inside* shards (segment / doc locality)**
- **Pattern:** shard **count** looks balanced, but one node still burns CPU; hot threads point at one shard
- **Mitigation:** `forcemerge` can compact segments and sometimes smooth read patterns — **use off-peak**; understand IO cost and [Elasticsearch guidance](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html) for your version
```bash
POST hot_index/_forcemerge?max_num_segments=1
```
**Cause C — No ingest-only / coordinating tier**
- **Pattern:** large QPS differences by node for request handling
- **Mitigate:** add **coordinating** (client) nodes so data nodes focus on data
**Cause D — Custom `_routing` skew**
- **Pattern:** custom routing concentrates docs in a few shards
- **Mitigate:** review routing keys and shard/key design with the application team
---
## 5. Single-node CPU hot spots (field experience; not a separate V4.5 catalog line)
### Pattern A — JVM `ThreadLocal` / stale-entry churn
**Typical signals**
- Only **1–2** nodes at **80%+** CPU; peers ~**40%**
- Hot threads: **`management`** thread ~**100%** CPU
- Stack traces mention **`ThreadLocal`** / **`expungeStaleEntry`**
**Mechanism (simplified)**
- Many stale `ThreadLocal` entries; **G1** may not clean aggressively → **`ThreadLocalMap.expungeStaleEntry`** dominates CPU.
**Confirm**
```bash
GET _nodes/hot_threads
# e.g.:
# 100% cpu usage by thread 'elasticsearch[node][management][T#3]'
# at java.lang.ThreadLocal$ThreadLocalMap.expungeStaleEntry(...)
```
**Mitigation**
- **Short term:** coordinated **Full GC** or node restart **only** with vendor / SRE guidance
- **Long term:** upgrade to a **7.10+** line where your distribution documents ThreadLocal / JVM fixes (major upgrades need a full migration plan)
### Pattern B — Huge search; coordinating node melts
**Typical signals**
- Hot node is often the **coordinating** node for the request
- Heavy **query reduce** / coordination cost
- Not explained by shard placement alone
**Check**
```bash
GET _tasks?detailed=true&actions=*search*
# action, node, running_time_in_nanos
```
---
## Appendix: Quick commands (CPU)
```bash
GET _nodes/hot_threads
GET _tasks?detailed=true
GET _cat/tasks?v
POST _tasks/{task_id}/_cancel
GET _cat/nodes?v&h=name,ip,cpu,load_1m,heapPercent,disk.used_percent
GET _cat/shards?v&s=node&h=index,shard,prirep,docs,store,node
PUT /index_name/_settings
{
"index.merge.scheduler.max_thread_count": 1
}
POST _cluster/reroute
{
"commands": [{
"move": {
"index": "index_name",
"shard": 0,
"from_node": "node_hot",
"to_node": "node_cool"
}
}]
}
```
FILE:references/sop-disk-storage.md
# SOP: Disk watermarks and IO bottlenecks
**Covers:** `ResourceMonitor.DiskUsageWarning` (P1), `ResourceMonitor.DiskUsageCritical` (P0), `ResourceMonitor.DiskIOBottleneck` (P1), and **`DiskWatermarkConfigAnomaly`**-class cases (P0/P1; see **Section 3**)
CMS metric names are examples — align thresholds with [health-events-catalog.md](health-events-catalog.md) and your deployment.
---
## Diagnosis decision tree
```
Disk space (e.g. NodeDiskUtilization)
├── max > ~95% → DiskUsageCritical (P0); flood-stage / read-only imminent or active
├── max > ~85% → DiskUsageWarning (P1); plan cleanup / expansion
├── Logs contain "watermark" → watermark path
└── Utilization looks fine but writes fail → MUST check watermark settings (absolute-byte injection)
Cluster watermark settings (_cluster/settings)
├── Watermarks use absolute values (e.g. "19605mb") → misconfiguration risk (P1); compare to free space
├── Watermark percentages extremely low (e.g. < ~5%) → misconfiguration (P1); easy write block
└── _all/_settings has read_only_allow_delete: true → flood protection active (P0)
Disk performance (use multiple signals)
├── IO utilization (NodeStatsDataDiskUtil) near 100% → device saturated
├── Bandwidth rate (NodeStatsDataDiskIoCloudDiskBaseBandwidthRate) near 100% → throughput cap
└── IOPS rate (NodeStatsDataDiskIoCloudDiskBaseIopsRate) near 100% → random-IO cap
→ Any one near 100% supports an IO bottleneck diagnosis (combine with workload context)
Write failures
├── Logs: "index read-only" / blocks.read_only_allow_delete → flood-stage protection
└── Disk % normal + bad watermarks → config-driven block (disk not “actually full”)
```
---
## 1. Disk usage warning (`DiskUsageWarning`, ~85%+)
### Response time
Within **~2 hours** by policy — treat as **urgent** in practice to avoid escalation to Critical.
### Steps
**Step 1 — Per-node disk**
```bash
GET _cat/allocation?v
GET _cat/nodes?v&h=name,ip,disk.used_percent,disk.avail,disk.total
```
**Step 2 — Largest indices**
```bash
GET _cat/indices?v&s=store.size:desc&h=health,status,index,docs.count,store.size,pri.store.size
```
**Step 3 — Cleanup options**
| Action | How | Notes |
|--------|-----|-------|
| Drop expired indices | `DELETE /old_index_name` | Confirm with the app owner |
| Drop `.monitoring-*` | `DELETE /.monitoring-*` | Loses built-in monitoring history in-cluster |
| Lower replicas | `PUT /index/_settings` → `number_of_replicas: 0` | Temporary; reduces redundancy |
| Prune old snapshots | `GET _cat/snapshots` then `DELETE /_snapshot/repo/snap` | Confirm backups are not needed |
| Expand disk | Console / provider API | Brief impact during resize |
**Step 4 — Alerting** so the next incident is not a surprise.
---
## 2. Critical disk usage (`DiskUsageCritical`, ~95%+)
### Treat as immediate (~5 minutes)
**Elasticsearch disk-based shard routing / flood behavior (typical defaults):**
- Above **~85%** disk used (low watermark): stop **allocating new shards** to the node.
- Above **~90%**: try to **relocate shards away** (high watermark).
- Above **~95%** (flood stage): indices on that node may get **`read_only_allow_delete`**.
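The three thresholds above map to stages as in the sketch below. It assumes percentage watermarks at the typical defaults listed here; read the effective values from `_cluster/settings` before relying on it, since clusters with absolute-byte watermarks behave differently.

```python
def watermark_stage(
    used_percent: float, low: float = 85.0, high: float = 90.0, flood: float = 95.0
) -> str:
    """Return the most severe disk watermark stage crossed at used_percent."""
    if used_percent > flood:
        return "flood_stage"  # indices may get read_only_allow_delete
    if used_percent > high:
        return "high"         # shards are relocated away
    if used_percent > low:
        return "low"          # no new shards allocated to the node
    return "ok"


for pct in (50, 86, 91, 96):
    print(pct, watermark_stage(pct))
```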
**Step 1 — Confirm read-only blocks**
```bash
GET _all/_settings?filter_path=*.settings.index.blocks
# or per index
GET /your_index/_settings
```
**Step 2 — Free space urgently**
```bash
DELETE /old_index_name
DELETE /.monitoring-*
PUT /big_index/_settings
{
"index.number_of_replicas": 0
}
```
**Step 3 — Clear read-only after free space is healthy (often below ~85% on that path)**
```bash
PUT _all/_settings
{
"index.blocks.read_only_allow_delete": null
}
```
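> **Important:** Before Elasticsearch **7.4**, `read_only_allow_delete` is **not** cleared automatically after disk pressure drops, so you must run the command above. From **7.4** onward the block is released automatically once usage falls below the **high** watermark; run the command only if the block persists.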
**Step 4 — Temporary watermark relief (buys time only)**
```bash
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "95%",
"cluster.routing.allocation.disk.watermark.high": "98%",
"cluster.routing.allocation.disk.watermark.flood_stage": "99%"
}
}
```
> Raising watermarks can **unblock writes briefly** — you must **still reclaim disk** and **revert** to sane defaults afterward.
---
## 3. Disk watermark misconfiguration (`DiskWatermarkConfigAnomaly`, P0/P1)
> **When:** Disk **utilization % looks fine** but watermarks or flood-stage still block writes.
### vs “disk actually full”
| | Disk actually full | Watermark misconfiguration |
|---|-------------------|----------------------------|
| Utilization | High (~85–95%+) | Can be **~1–5%** |
| Root cause | Data growth / retention | `transient` / `persistent` **absolute-byte** watermarks (or absurd %) |
| Typical source | Logs / indices | Scripts, drills, bad copy-paste |
| Trap | Low | **High** if you only glance at `% used` |
### Example chain (absolute-byte flood)
```text
Script sets transient watermarks to absolute free-space requirements
→ flood_stage requires > 19605 MB free
→ only ~19632 MB free (~27 MB headroom)
→ small writes trip flood_stage → read_only_allow_delete on indices
→ writes fail while CMS “disk %” still looks ~4%
```
### Extended write-failure tree
```text
Write failures
├── High disk % (> ~85%) → Sections 1–2 (real capacity)
└── Normal disk % → MUST also:
├── GET _cluster/settings?include_defaults=true&filter_path=**.watermark
│ ├── Absolute-byte watermarks → misconfiguration
│ └── Absurdly low % watermarks → misconfiguration
├── GET _all/_settings?filter_path=*.settings.index.blocks
│ └── read_only_allow_delete: true → flood already applied
└── transient vs persistent
├── transient → may reset on full-cluster restart; often script residue
└── persistent → must be cleared explicitly
```
### Steps
**Step 1 — Inspect watermarks**
```bash
GET _cluster/settings?include_defaults=true&filter_path=**.watermark,**.flood_stage
```
Distinguish **absolute** values (`19605mb`, `500gb`) from **percentages** (`85%`). Absolute-byte watermarks on small volumes are a common foot-gun.
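When many settings need review, the absolute-versus-percentage distinction can be checked mechanically. The sketch below assumes watermark values arrive as strings exactly as shown in `_cluster/settings`; `watermark_kind` is an illustrative helper, and anything it does not recognise (for example a ratio such as `0.85`) is reported as `other` for manual review.

```python
import re

_PERCENT = re.compile(r"^\d+(\.\d+)?%$")
_BYTES = re.compile(r"^\d+(\.\d+)?(b|kb|mb|gb|tb|pb)$", re.IGNORECASE)


def watermark_kind(value: str) -> str:
    """Classify a watermark setting as 'percent', 'absolute', or 'other'."""
    v = value.strip()
    if _PERCENT.match(v):
        return "percent"
    if _BYTES.match(v):
        return "absolute"  # free-space requirement: compare against disk.avail
    return "other"


for setting in ("85%", "19605mb", "500gb", "0.85"):
    print(setting, watermark_kind(setting))
```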
**Step 2 — Index blocks**
```bash
GET _all/_settings?filter_path=*.settings.index.blocks
```
If you see `read_only_allow_delete: true`, flood-stage protection is in effect.
**Step 3 — Compare free space to thresholds**
```bash
GET _cat/allocation?v&bytes=mb
```
Compare `disk.avail` to the effective watermark / flood thresholds.
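The comparison can be made concrete with a headroom calculation. This sketch converts a watermark into the free space it demands and subtracts that from `disk.avail`. It handles only `%`, `mb`, and `gb` values, and the 20480 MB disk size in the example is an assumption chosen to match the ~4% usage in the example chain above.

```python
def required_free_mb(watermark: str, total_mb: float) -> float:
    """Free space (MB) that a watermark setting requires on a disk of total_mb."""
    w = watermark.strip().lower()
    if w.endswith("%"):
        # Percent watermarks are "used" thresholds: 95% used leaves 5% free.
        return total_mb * (100.0 - float(w[:-1])) / 100.0
    if w.endswith("mb"):
        return float(w[:-2])
    if w.endswith("gb"):
        return float(w[:-2]) * 1024.0
    raise ValueError(f"unsupported watermark format: {watermark!r}")


def headroom_mb(avail_mb: float, watermark: str, total_mb: float) -> float:
    """MB that can still be written before the watermark trips (negative = tripped)."""
    return avail_mb - required_free_mb(watermark, total_mb)


# Example chain: flood_stage injected as an absolute value.
print(headroom_mb(19632, "19605mb", 20480))  # 27.0
# Same disk with a percentage flood_stage of 95%.
print(headroom_mb(19632, "95%", 20480))      # 18608.0
```

A headroom of a few tens of MB on a nearly empty disk is the signature of the absolute-byte misconfiguration.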
**Step 4 — Remediate**
| Finding | Order | Action |
|---------|-------|--------|
| Bad transient watermarks | 1 | Reset: `PUT _cluster/settings` with `transient` keys set to `null` for `cluster.routing.allocation.disk.watermark.{low,high,flood_stage}` |
| Indices read-only | 2 | `PUT _all/_settings` → `index.blocks.read_only_allow_delete`: `null` (after space / config fixed) |
| Find who set it | 3 | Audit change tickets, automation, load tests |
| Verify | 4 | Test index/write paths |
> **Elasticsearch 7.x:** before **7.4**, `read_only_allow_delete` from flood stage **persists** until cleared manually even after disk recovers; **7.4+** releases it automatically once usage drops below the high watermark.
---
## 4. Disk IO bottleneck (`DiskIOBottleneck`, P1)
### Criteria (example)
- `NodeStatsDataDiskUtil` **> ~80%** for **~5+ minutes**, or
- Logs show growing write latency / IO wait
### vs disk space
- **High watermark / flood:** capacity (free space).
- **IO bottleneck:** performance; disk can be half empty and still saturated.
### Three dimensions (recommended)
> Do not rely on **IO Util alone** — combine **utilization**, **bandwidth %**, and **IOPS %**.
| Dimension | Meaning | CMS utilization metric | Raw / rate metrics | Bottleneck hint |
|-----------|---------|------------------------|--------------------|-----------------|
| **IO Util** | Device busy % | `NodeStatsDataDiskUtil` | — | Near **100%** |
| **Throughput** | MB/s vs cap | `NodeStatsDataDiskIoCloudDiskBaseBandwidthRate` | `NodeStatsDataDiskIoMbPerS` (MiB/s) | Near **100%** |
| **IOPS** | Ops/s vs cap | `NodeStatsDataDiskIoCloudDiskBaseIopsRate` | `NodeStatsDataDiskIo` (count) | Near **100%** |
**If any of the three nears 100%,** narrow the shape of load:
- **IO Util high**, bandwidth/IOPS % not maxed → weak device / heavy queueing / mixed workload.
- **Bandwidth % maxed**, IO util not maxed → large sequential reads/writes (merge, snapshot, recovery).
- **IOPS % maxed** → many small random IOs (many shards, chatty queries, small writes).
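The three cases above can be sketched as a lookup over the CMS percentages. The 90% cut-off for "near 100%" and the order in which the dimensions are checked are assumptions made for illustration; adjust both to your alert thresholds.

```python
def io_bottleneck_shape(
    io_util: float, bandwidth_pct: float, iops_pct: float, near: float = 90.0
) -> str:
    """Map IO util, bandwidth %, and IOPS % to the load shape they suggest."""
    if iops_pct >= near:
        return "many small random IOs"
    if bandwidth_pct >= near:
        return "large sequential reads/writes"
    if io_util >= near:
        return "weak device / heavy queueing / mixed workload"
    return "no single dimension saturated"


print(io_bottleneck_shape(io_util=99, bandwidth_pct=50, iops_pct=40))
print(io_bottleneck_shape(io_util=60, bandwidth_pct=98, iops_pct=30))
print(io_bottleneck_shape(io_util=95, bandwidth_pct=40, iops_pct=97))
```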
> Effective caps depend on **disk SKU + size + instance type** (often **min(disk limit, instance limit)**). CMS **%** metrics encode that — you usually do not hand-compute ceilings.
#### CMS metric catalog (namespace `acs_elasticsearch`)
> Discover fields with `aliyun cms DescribeMetricMetaList --Namespace acs_elasticsearch` (Alibaba Cloud Monitor).
| Category | Metric | Unit | Notes |
|----------|--------|------|-------|
| **IO Util** | `NodeStatsDataDiskUtil` | % | Busy time |
| **Throughput — absolute** | `NodeStatsDataDiskIoMbPerS` | MiB/s | Total |
| | `NodeStatsDataDiskRm` / `NodeStatsDataDiskWm` | MiB/s | Read / write |
| **Throughput — %** | `NodeStatsDataDiskIoCloudDiskBaseBandwidthRate` | % | Node-level vs baseline |
| | `NodeStatsDataDiskIoSingleDiskMaxThroughputRate` | % | Per-disk |
| | `NodeStatsDataDiskRmCloudDiskBaseBandwidthRate` | % | Read vs baseline |
| | `NodeStatsDataDiskWmCloudDiskBaseBandwidthRate` | % | Write vs baseline |
| | `NodeStatsDataDiskIoNetworkBaseBandwidthRate` | % | IO network baseline % |
| **IOPS — absolute** | `NodeStatsDataDiskIo` | count | Total IOPS |
| | `NodeStatsDataDiskR` / `NodeStatsDataDiskW` | count | Read / write IOPS |
| **IOPS — %** | `NodeStatsDataDiskIoCloudDiskBaseIopsRate` | % | Node-level |
| | `NodeStatsDataDiskIoSingleDiskMaxIopsRate` | % | Per-disk |
| | `NodeStatsDataDiskRSingleDiskMaxIopsRate` | % | Read |
| | `NodeStatsDataDiskWSingleDiskMaxIopsRate` | % | Write |
| **Queue** | `DiskAverageQueueSize` | count | Avg queue depth |
> **Practical triage:** start with **`NodeStatsDataDiskUtil`**, **`NodeStatsDataDiskIoCloudDiskBaseBandwidthRate`**, **`NodeStatsDataDiskIoCloudDiskBaseIopsRate`** — whichever pegs first drives the story; then use split read/write metrics.
### Causes and mitigations
**Cause A — Heavy ingest**
- Throttle at the source; larger bulk batches (fewer round-trips); relax `refresh_interval` when safe:
```bash
PUT /index_name/_settings
{
"refresh_interval": "30s"
}
```
**Cause B — Frequent merges (common)**
- Hot threads show **merge** stacks. Short-term:
```bash
PUT /index_name/_settings
{
"index.merge.scheduler.max_thread_count": 1
}
```
- Longer term: review `index.merge.policy.*` (e.g. `segments_per_tier`) with version docs — trade merge cost vs segment count.
**Cause C — Hitting disk / instance ceilings**
- **Ceiling ≈ min(disk SKU limit, instance type disk limit).**
- Example pitfall: upgraded ESSD tier but **no gain** because the **instance** throughput cap dominates.
- **Order:** confirm which side is saturated from CMS vs quotas → scale **instance** and/or **disk** → if both maxed, **add nodes** to spread IO.
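The ceiling rule and the "which side is saturated" question can be sketched as below. The limits in the example are hypothetical numbers, not published SKU figures; take real values from the disk and instance type specifications.

```python
from typing import Tuple


def effective_ceiling(disk_limit: float, instance_limit: float) -> Tuple[float, str]:
    """Return (effective limit, limiting side) for a disk attached to an instance."""
    if disk_limit < instance_limit:
        return disk_limit, "disk"
    if instance_limit < disk_limit:
        return instance_limit, "instance"
    return disk_limit, "both"


# Hypothetical: disk tier allows 350 MB/s but the instance type caps at 200 MB/s.
limit, side = effective_ceiling(disk_limit=350, instance_limit=200)
print(limit, side)  # 200 instance
```

In this example a further disk tier upgrade yields no gain, which is the pitfall described above.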
**Cause D — Bad hardware / media errors**
- Logs: `IOException`, `EIO` → replace disk / escalate to infrastructure support.
### Example chain (slow IO → node leaves)
```text
Sustained high IO → slow translog / fsync → slow heartbeat handling → node marked offline
Mitigation: cap merge concurrency + reduce ingest pressure (+ fix disk tier if capped)
```
---
## 5. Data disk vs system disk (field note; not a separate catalog line)
### Data volume full
- **Signals:** watermark logs + high data-disk metrics.
- **Mitigate:** Sections 1–2.
### Root / system volume full (rare, severe)
- **Signals:** data disk OK but JVM/host misbehaves (cannot write temp files, logs).
- **Common causes:** core dumps under `/var/crash` or `/tmp`, container image sprawl, huge system logs.
- **Mitigate:** host cleanup by **SRE / infrastructure**; monitor root mount (e.g. `/` or `/dev/vda1`) in host monitoring, not only Elasticsearch data paths.
---
## 6. Disk issues during changes
### 6.1 High disk + change stuck (`activating`)
Typical story: **disk pressure → user triggers restart → change hangs in `activating`**.
1. Relieve disk (delete / resize / temporary watermark relief per policy).
2. Let nodes come back cleanly.
3. If the change is still stuck, use [sop-activating-change-stuck.md](sop-activating-change-stuck.md) and escalate per your support model.
### 6.2 Recovery bandwidth too high → node / FS hang
**Pattern** (cold-tier resize case): new nodes cannot join; shell `cd` into `logs` hangs; cluster stays Yellow/Red.
```text
indices.recovery.max_bytes_per_sec set very high (e.g. 300MB/s)
→ exceeds disk SKU sustained throughput (e.g. ~250MB/s)
→ block layer stalls → filesystem appears hung
→ Elasticsearch cannot flush logs / data → bootstrap fails
```
**Mitigation:**
1. Roll back the change / restore service on surviving nodes if required.
2. Lower recovery bandwidth to a safe value for the disk class:
```bash
PUT _cluster/settings
{
"persistent": {
"indices.recovery.max_bytes_per_sec": "100mb"
}
}
```
3. Retry the change in a **low-traffic** window.
4. If on a low baseline disk SKU, plan upgrade to a higher baseline (e.g. ESSD PL1+) per product guidance.
---
## Appendix: Quick commands (disk)
```bash
GET _cat/allocation?v
GET _cat/nodes?v&h=name,ip,disk.used_percent,disk.avail,disk.total
GET _cat/indices?v&s=store.size:desc
DELETE /index_name
GET _all/_settings?filter_path=*.settings.index.blocks
PUT _all/_settings
{
"index.blocks.read_only_allow_delete": null
}
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "95%",
"cluster.routing.allocation.disk.watermark.high": "98%",
"cluster.routing.allocation.disk.watermark.flood_stage": "99%"
}
}
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": null,
"cluster.routing.allocation.disk.watermark.high": null,
"cluster.routing.allocation.disk.watermark.flood_stage": null
}
}
PUT /large_index/_settings
{
"index.number_of_replicas": 0
}
PUT /index_name/_settings
{
"index.merge.scheduler.max_thread_count": 1
}
```
FILE:references/sop-memory-gc.md
# SOP: JVM memory, GC, and circuit breakers
**Covers:** `ResourceMonitor.OldGenMemoryWarning` (P0), `ResourceMonitor.OldGenMemoryCritical` (P0), `ResourceMonitor.MemoryRapidGrowth` (P1), `ResourceMonitor.OldGCFrequent` (P1), `ResourceMonitor.GCTimeRatioHigh` (P0), `HighAvailability.CircuitBreakerTriggered` (P0), `ResourceMonitor.OOM` (P0)
CMS / catalog names may differ — align thresholds with [health-events-catalog.md](health-events-catalog.md).
---
## Diagnosis decision tree
```
NodeHeapMemoryUtilization (example)
├── avg > ~95% → OldGenMemoryCritical (P0); OOM imminent
├── avg > ~85% → OldGenMemoryWarning (P0); elevated OOM risk
└── rises > ~30% within ~30m → MemoryRapidGrowth (P1); trend alert
JVMGCOldCollectionCount (per minute)
└── max > ~5 / min → OldGCFrequent (P1)
JVMGCOldCollectionDuration
└── GC time / wall time > ~30% → GCTimeRatioHigh (P0)
Master log keywords
├── OutOfMemoryError / OOM → ResourceMonitor.OOM (P0)
├── Data too large / CircuitBreakingException → circuit breaker (P0)
└── Killed / OOM killer → OS killed the process
```
> **Headline priority (heap / GC vs breakers):** Cluster **Green** does not rule out **JVM pressure**. When rules or logs point to **fielddata**, **`CircuitBreakingException`**, or **artificially low `indices.breaker.*` limits** in **`_cluster/settings`** (**transient** and **persistent**), treat **settings + `_nodes/stats/breaker` (`tripped`) + query/mapping** as the **primary** story — not **Old GC frequency alone** (GC is often a **parallel** or **downstream** signal unless isolated as the driver).
> **Per-node heap skew:** Uneven **heap percent** across data nodes should be read with **`_nodes/stats/breaker`**, **`_nodes/stats/indices/fielddata`**, **`_cat/shards`** / hot indices, and **query shape** — not **only** “bad shard counts” without breaker and workload context.
> **Co-occurrence with `ThreadPool.WriteRejected` / bulk saturation:** If **`_nodes/stats/thread_pool`** shows **`write` / `bulk` `rejected`** and **`hot_threads`** or traffic points at **ingest**, do **not** headline **Old GC alone** ahead of the write path; use the **causal chain or dual-P0 ordering** in [sop-write-performance.md](sop-write-performance.md) **§2** (*Evidence interpretation: bulk QPS → write pool*). Old GC / `GCTimeRatioHigh` remains **P0-class** but is often **downstream** of indexing + merges + retries.
> **Collector names in prose:** Alibaba Cloud ES 7.x often ships **JDK 11** with **G1**, but the **actual** collector is **JVM / image dependent**. Do **not** assert **G1** vs **CMS** from version alone — confirm via **`GET _nodes/stats/jvm`** (`gc.collectors.*`) or node **GC logs**; otherwise write **“old-generation / Old GC”** or **“old GC (collector per JVM stats)”**.
---
## 1. Old-gen memory warning (`OldGenMemoryWarning`, ~85%+)
### Act first (before deep analysis)
**1. Clear caches (often fastest)**
```bash
POST _cache/clear
POST _cache/clear?fielddata=true
# Or per index
POST /index_name/_cache/clear?fielddata=true&query_cache=true&request_cache=true
```
**2. Heap snapshot from the API**
```bash
GET _nodes/stats/jvm?filter_path=nodes.*.jvm.mem,nodes.*.name
# heap_used_percent, pools.old.used_in_bytes (names vary by version)
```
**3. Fielddata footprint**
```bash
GET _nodes/stats/indices/fielddata?fields=*&filter_path=nodes.*.indices.fielddata
```
### Root causes
**A — Fielddata too large**
- **Signals:** slow logs show **aggregations on `text`** fields
- **Effect:** fielddata stays on-heap until evicted / cleared
- **Mitigate:**
```bash
POST /index_name/_cache/clear?fielddata=true
PUT _cluster/settings
{
"persistent": {
"indices.fielddata.cache.size": "20%"
}
}
```
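(In stock Elasticsearch, `indices.fielddata.cache.size` is a **static** node setting configured in `elasticsearch.yml`; the dynamic call above succeeds only if your distribution exposes it, so confirm before relying on it.)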
- **Proper fix:** avoid **`text`** for aggs; use a **`keyword`** sub-field or `doc_values` where appropriate.
**B — Huge aggregations**
- **Signals:** hot threads in agg paths; long-running `_tasks` with search/agg
- **Mitigate:** cancel runaway tasks; reduce cardinality, bucket count, `size`; consider `execution_hint: map` where supported
**C — Oversized result sets**
- **Signals:** very large `size`, deep `from + size` (e.g. > 10k)
- **Mitigate:** cap `size`; use **scroll** or **search_after**
**D — Very large `terms` queries**
- **Signals:** `terms` with huge ID lists (tens of thousands); hot threads with `TermsQuery` / `TermInSetQuery`
- **Effect:** large bitsets on heap (default `index.max_terms_count` is **65536**)
- **Mitigate:**
```bash
GET _tasks?detailed=true&actions=*search*
POST _tasks/{task_id}/_cancel
GET {index}/_settings?filter_path=**.max_terms_count
PUT {index}/_settings
{
"index.max_terms_count": 10000
}
```
- **Proper fix:** **batch** terms lists (e.g. **< ~1000** per request) or use **`ids`** where applicable.
> **Example:** tens of thousands of IDs in one `terms` query → high heap churn and GC → slow cluster. Fix: smaller batches + app-side chunking.
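App-side chunking can be as simple as the sketch below. It assumes the IDs are already in memory and that the client sends the resulting request bodies **one after another**; the helper names and the `order_id` field are illustrative.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunked(ids: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each with at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


def terms_queries(
    field: str, ids: Sequence[str], batch_size: int = 1000
) -> List[Dict[str, Any]]:
    """Build one `terms` query body per batch instead of a single huge query."""
    return [{"query": {"terms": {field: batch}}} for batch in chunked(ids, batch_size)]


all_ids = [f"id-{n}" for n in range(2500)]
bodies = terms_queries("order_id", all_ids)
print(len(bodies))                                    # 3
print(len(bodies[-1]["query"]["terms"]["order_id"]))  # 500
```

Running the batches sequentially keeps concurrency flat; firing them all in parallel would trade heap pressure for `in_flight_requests` pressure.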
**E — Undersized heap for the workload**
- **Signals:** “normal” traffic but heap pegged
- **Mitigate:** scale node RAM / heap per vendor guidance
---
## 2. Old-gen memory critical (`OldGenMemoryCritical`, ~95%+)
### Emergency sequence
```text
1. POST _cache/clear (all tiers you can afford to drop)
2. Cancel long searches: GET _tasks → POST _tasks/{id}/_cancel
3. Restart affected nodes if still unstable (plan brief outage)
4. After stabilization, find the recurring driver (queries, mappings, breakers)
```
**Cancel large searches**
```bash
GET _tasks?detailed=true&actions=*search*
POST _tasks/{task_id}/_cancel
```
**Restart**
- Use console / automation for the specific node.
- After restart, confirm `heapPercent` trends flat, not climbing endlessly.
---
## 3. Frequent old GC (`OldGCFrequent`, P1)
### Criteria (example)
- `JVMGCOldCollectionCount` **max > ~5 / minute**
### Causes
| Cause | Clues | Mitigation |
|-------|-------|------------|
| Old gen pegged (> ~75%) | heap trend | Sections 1–2 |
| Large fielddata | node stats | clear + mapping fix |
| Heavy aggs | slow logs | optimize / throttle |
| G1 never “full” enough to reclaim certain structures | rare JVM / TL patterns | upgrade path or vendor-guided Full GC |
**G1 / `ThreadLocal` stale-entry churn**
G1 may not run a “classic” full GC often; some `ThreadLocal` / weak-reference cleanup paths need a full collection cycle. See **Pattern A** in [sop-cpu-load.md](sop-cpu-load.md) (single-node management thread hot + `expungeStaleEntry`).
---
## 4. High GC time ratio (`GCTimeRatioHigh`, P0)
### Criteria (example)
- **Old GC duration / wall time > ~30%**
- Users see slow responses; long STW pauses
### Chain
```text
Old-gen pressure → frequent old-gen collections → long pauses
→ heartbeat lag → node leaves cluster
→ client timeouts / connection errors
```
### Quick check
```bash
GET _nodes/stats/jvm?filter_path=nodes.*.jvm.gc,nodes.*.name
# collection_count, collection_time_in_millis
```
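Because `collection_count` and `collection_time_in_millis` are cumulative, the ratio has to be computed from two samples. The sketch below assumes both samples come from the old-generation collector entry of the same node and that the JVM did not restart in between; the sample values are made up.

```python
from typing import Dict, Tuple


def old_gc_rates(
    prev: Dict[str, int], curr: Dict[str, int], interval_seconds: float
) -> Tuple[float, float]:
    """Return (old GCs per minute, GC time / wall time) between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    d_count = curr["collection_count"] - prev["collection_count"]
    d_millis = curr["collection_time_in_millis"] - prev["collection_time_in_millis"]
    if d_count < 0 or d_millis < 0:
        raise ValueError("counters went backwards; the JVM probably restarted")
    per_minute = d_count * 60.0 / interval_seconds
    time_ratio = d_millis / (interval_seconds * 1000.0)
    return per_minute, time_ratio


# Two made-up samples taken 60 seconds apart.
prev = {"collection_count": 100, "collection_time_in_millis": 50_000}
curr = {"collection_count": 112, "collection_time_in_millis": 74_000}

per_minute, ratio = old_gc_rates(prev, curr, interval_seconds=60)
print(per_minute, ratio)  # 12.0 0.4
```

With these numbers both the `OldGCFrequent` (> ~5 / min) and `GCTimeRatioHigh` (> ~30%) example criteria would be met.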
---
## 5. Circuit breaker tripped (`CircuitBreakerTriggered`, P0)
### Symptoms
- Logs like:
```text
CircuitBreakingException[[parent] Data too large, data for [...] would be [XXX], which is larger than the limit of [YYY]]
```
- Requests fail with **HTTP 429** or **503**
### Breaker types
| Breaker | Role | Setting | Typical default (JVM %) |
|---------|------|---------|------------------------|
| `parent` | Ceiling for breaker accounting | `indices.breaker.total.limit` | ~95% |
| `fielddata` | Fielddata heap | `indices.breaker.fielddata.limit` | **~40%** (lowering below ~40% is risky) |
| `request` | Single in-flight request structure | `indices.breaker.request.limit` | ~60% |
| `in_flight_requests` | **All** in-flight HTTP request bytes | `network.breaker.inflight_requests.limit` | **~100%** |
| `accounting` | Long-lived accounting | `indices.breaker.accounting.limit` | ~100% |
> **`request` vs `in_flight_requests` (important)**
> - **`request`:** one heavy query (huge agg, large `size`, wide fields). Fix: **smaller query**, fewer buckets, better mapping.
> - **`in_flight_requests`:** **aggregate** traffic from **many concurrent** requests. Fix: **lower client concurrency**, queueing, **more nodes** — **not** “split one big query into many small concurrent queries” (that usually **raises** concurrency and makes this worse).
> - If the log shows **`[in_flight_requests]`**, do **not** default to “shard the query into more parallel clients.”
> **Breaker limits in `_cluster/settings` (MUST when JVM/breaker rules fire):** Always pull **both** **transient** and **persistent** (and defaults when using `include_defaults=true`) — e.g. `GET _cluster/settings?include_defaults=true` and read **`indices.breaker.*`** / **`network.breaker.*`** (use `filter_path` if you need a smaller payload). Mis-tuning often appears as **`indices.breaker.fielddata.limit`** set to a **very small JVM fraction** (e.g. **~1–5%**); **`indices.breaker.request.limit`** is **frequently lowered at the same time** (e.g. **~5%**) — cite **both** when present.
> **Severity:** A low **`fielddata.limit` alone** is a **configuration risk**. If **`_nodes/stats/breaker`** shows **`tripped` > 0** for **`fielddata`** / **`request`** / **`parent`**, or logs show **`CircuitBreakingException`**, elevate the narrative to **incident-grade** (typically **P0**-class alongside query relief) — **not** only a **P2 “potential risk”** footnote.
> **`tripped` vs current heap:** **`tripped`** is a **cumulative** counter (trips **since JVM start**). A **large** **`parent.tripped`** (e.g. **183**) reflects **historical** breaker events and **past** heap pressure peaks — it can **coexist** with **moderate current heap** from **`_cat/nodes`**. When both appear in one report, add **one sentence** so readers do not read “**tripped** high” and “heap **~50%** now” as contradictory.
> **Index-level closure:** Tie symptoms to **concrete indices** (`_cat/shards`, **`GET {index}/_mapping`**) — especially **`text` fields with `fielddata: true`**, **large `terms` / `cardinality` aggs**, or deep paging — so remediation is query- and mapping-specific.
```bash
GET _cluster/settings?include_defaults=true
# Inspect persistent, transient, and defaults for indices.breaker.* and network.breaker.*
```
If **`indices.breaker.fielddata.limit`** is **below ~40%** of JVM heap **without** a deliberate vendor exception, treat it as a misconfiguration and restore recommended values after change approval.
### Response steps
**Step 1 — Breaker stats**
```bash
GET _nodes/stats/breaker
```
**Step 2 — Map `name` to action**
- **`fielddata`** → text-field aggs / huge fielddata → `POST _cache/clear?fielddata=true`
- **`parent`** → overall pressure → `POST _cache/clear` + load relief
- **`request`** → single huge request → rewrite query (size, aggs, fields)
- **`in_flight_requests`** → too much **parallel** traffic:
1. **Throttle clients** / shrink pools / add queue depth server-side
2. **Scale out** nodes
3. Only if a **single** request is proven huge, tune that query — avoid fan-out that increases concurrency
4. Raising `network.breaker.inflight_requests.limit` is rarely the right first move (its default is already 100% of JVM heap)
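The name-to-action mapping in Step 2 can be sketched as a small triage helper. This is a minimal sketch, assuming the `GET _nodes/stats/breaker` response has already been parsed into a dict with the usual `nodes.<id>.breakers.<name>.tripped` layout. The function name `tripped_breakers` and the `ACTIONS` table are hypothetical, not part of any SDK.

```python
# Hypothetical helper: list tripped breakers from a parsed
# `GET _nodes/stats/breaker` response and attach the Step 2 action.
ACTIONS = {
    "fielddata": "clear fielddata cache; fix text-field aggs",
    "parent": "clear caches and relieve overall load",
    "request": "rewrite the oversized query (size, aggs, fields)",
    "in_flight_requests": "throttle clients or scale out; avoid fan-out",
}


def tripped_breakers(stats):
    """Return (node_name, breaker, tripped, action) rows, highest count first."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        name = node.get("name", node_id)
        for breaker, info in node.get("breakers", {}).items():
            tripped = info.get("tripped", 0)
            if tripped > 0:
                action = ACTIONS.get(breaker, "investigate")
                rows.append((name, breaker, tripped, action))
    # `tripped` is cumulative since JVM start, so this ranks history,
    # not current heap pressure.
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Illustrative sample data, not real cluster output.
sample = {
    "nodes": {
        "abc": {
            "name": "es-node-1",
            "breakers": {
                "parent": {"tripped": 183},
                "fielddata": {"tripped": 2},
                "request": {"tripped": 0},
            },
        }
    }
}
for row in tripped_breakers(sample):
    print(row)
```

Because the counter is cumulative, pair this output with current heap from `_cat/nodes` before deciding how urgent each row is.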
**Step 3 — Corroborate**
```bash
GET _tasks?detailed=true
# plus slow logs around the trip time
```
### Example (e-commerce, coordinating node)
- Traffic spike with heavy **agg** at ~15:00
- Heap already ~80%
- **`parent`** trips
- Mitigation: throttle + clear fielddata + larger nodes
---
## 6. OOM (`ResourceMonitor.OOM`, P0)
### Symptoms
- `java.lang.OutOfMemoryError: Java heap space` in logs
- Node may have restarted on its own
- Host: check `/var/log/messages` (or the systemd journal) for kernel **OOM killer** lines
### Steps
**Step 1 — Is the node back?**
```bash
GET _cat/nodes?v&h=name,ip,heapPercent
```
**Step 2 — If not healthy, restart** the process / node per runbook.
**Step 3 — After restart,** if heap climbs again immediately → sustained leak or abusive query pattern.
**Step 4 — Heap histogram (when permitted on host)**
```bash
# Note: -histo:live forces a full GC before counting; expect a pause on a busy node
jmap -histo:live {pid} | head -50
```
### Prevention
- Enable **heap dump on OOM:** `-XX:+HeapDumpOnOutOfMemoryError`
- Alert at **~85%** heap (`OldGenMemoryWarning` class)
- **Very small** flavors (e.g. **2 vCPU / 4 GB**) are a poor fit for production Elasticsearch — overhead dominates.
---
## Appendix: Quick commands (memory / GC)
```bash
GET _nodes/stats/jvm?filter_path=nodes.*.jvm.mem,nodes.*.name
GET _nodes/stats/jvm?filter_path=nodes.*.jvm.gc,nodes.*.name
GET _nodes/stats/indices/fielddata?fields=*
GET _nodes/stats/breaker
POST _cache/clear
POST /index_name/_cache/clear?fielddata=true
GET _tasks?detailed=true&actions=*search*
POST _tasks/{task_id}/_cancel
GET _cat/nodes?v&h=name,ip,heapPercent,heapCurrent,heapMax,ram.percent
```
FILE:references/sop-node-load-imbalance.md
# SOP: Per-node load imbalance
**Covers:** `HealthCheck.LoadUnbalanced`
**Reason codes:**
- `Balancing.NodeCPUUnbalanced` — uneven CPU
- `Balancing.NodeTrafficUnbalanced` — uneven query / index traffic
- `Balancing.NodeDataUnbalanced` — uneven data placement
- `Balancing.NodeDiskUnbalanced` — uneven disk utilization
- `Balancing.NodeMemoryUnbalanced` — uneven heap / memory pressure
---
## Diagnosis entry: imbalance decision tree
```
Three dimensions (CV = coefficient of variation = stddev / mean)
│
├── Traffic imbalance
│ ├── Per-node NodeQPS CV > ~0.3 → uneven search load
│ └── Per-node NodeIndexQPS CV > ~0.3 → uneven write load
│
├── Data imbalance
│ ├── Per-node shard count CV > ~0.3 → shard skew (very common)
│ ├── Per-node document count CV > ~0.5 → doc skew
│ └── Per-node stored size CV > ~0.3 → storage skew
│
└── Resource imbalance
├── Per-node NodeCPUUtilization CV > ~0.3 → CPU skew
├── Per-node NodeHeapMemoryUtilization CV > ~0.3 → heap skew
└── Per-node NodeDiskUtilization CV > ~0.3 → disk skew
```
**Severity heuristic**
- Resource skew **and** any hot node **> ~80%** → treat as **P0** (overload / failure risk)
- Resource skew but hot nodes **< ~80%** → **P1** (latent risk)
- Traffic / data skew only, resources healthy → **P2** (optimize when convenient)
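The severity heuristic above can be written as a tiny decision function. This is a sketch under stated assumptions: the function name `imbalance_severity`, its parameters, and the choice to treat exactly 80% as P1 are illustrative and not taken from the diagnosis engine.

```python
def imbalance_severity(resource_skewed, hottest_node_pct,
                       traffic_or_data_skewed=False, hot_threshold=80.0):
    """Map skew findings to a priority label, or None when nothing is skewed.

    resource_skewed:        CPU / heap / disk CV exceeded its threshold
    hottest_node_pct:       utilization (%) of the busiest node
    traffic_or_data_skewed: QPS / shard / doc / storage CV exceeded its threshold
    """
    if resource_skewed:
        # Skew plus a hot node means overload or failure risk.
        return "P0" if hottest_node_pct > hot_threshold else "P1"
    if traffic_or_data_skewed:
        # Resources are healthy, so optimize when convenient.
        return "P2"
    return None


print(imbalance_severity(True, 92.0))         # resource skew, hot node
print(imbalance_severity(True, 55.0))         # resource skew, no hot node
print(imbalance_severity(False, 40.0, True))  # traffic / data skew only
```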
---
## Time-series CV and peak windows
> The diagnosis engine may compute several CV flavors so short spikes are not washed out by daily averages:
| CV flavor | Meaning | Use |
|-----------|---------|-----|
| **CV (avg)** | CV of per-node **means** | Overall trend; can hide short spikes |
| **CV (max)** | CV of per-node **maxes** | Catches peak-window skew |
| **Peak-window CV** | CV across nodes at each timestamp; take **max** | Pinpoints the worst clock time |
**Effective CV** ≈ `max(CV_avg, CV_max, peak_window_CV)`.
If the report shows **peak-window CV ≫ average CV**, the skew is **time-localized** — inspect traffic / tasks in that window.
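The three CV flavors and the effective CV can be reproduced from per-node time series. The sketch below assumes equal-length, time-aligned samples per node and uses the population standard deviation; the engine's exact estimator is not documented here, so treat both as assumptions. The names `cv` and `effective_cv` are illustrative.

```python
from statistics import mean, pstdev


def cv(values):
    """Coefficient of variation: population stddev / mean (0.0 when mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def effective_cv(series_by_node):
    """series_by_node maps node name -> equal-length list of samples."""
    per_node = list(series_by_node.values())
    cv_avg = cv([mean(s) for s in per_node])   # CV of per-node means
    cv_max = cv([max(s) for s in per_node])    # CV of per-node maxes
    # zip(*...) walks the series timestamp by timestamp; keep the worst one.
    peak_window = max(cv(list(point)) for point in zip(*per_node))
    return {
        "cv_avg": cv_avg,
        "cv_max": cv_max,
        "peak_window_cv": peak_window,
        "effective": max(cv_avg, cv_max, peak_window),
    }


# Illustrative CPU samples: one short spike on node-a only.
cpu = {
    "node-a": [20, 20, 90, 20],
    "node-b": [20, 20, 20, 20],
    "node-c": [20, 20, 20, 20],
}
result = effective_cv(cpu)
print({k: round(v, 3) for k, v in result.items()})
```

In this sample the average-based CV stays close to the ~0.3 threshold while the peak-window CV is far higher, which is the time-localized pattern described above.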
---
## 1. Shard-count skew (most common driver)
> **Shard skew is the single most common upstream cause of uneven CPU / heap / disk — check it first.**
### 1.1 Quick commands
```bash
GET _cat/nodes?v&h=name,ip,cpu,heap.percent,disk.used_percent&s=cpu:desc
GET _cat/allocation?v&s=shards:desc
# Per-node shard counts: `shards` is a `_cat/allocation` column
# If one node owns far more shards than peers → placement skew
```
### 1.2 Typical causes
| Cause | How you know | Mitigation |
|-------|--------------|------------|
| New nodes not drained/rebalanced | new nodes have far fewer shards | enable rebalance; manual `reroute` |
| Allocation rules | review `cluster.routing.allocation.*` | relax / fix filters |
| Index created when cluster was tiny | old indices lopsided | reindex / shrink / reroute |
| Hot indices packed on few nodes | `_cat/shards` shows concentration | move hot shards |
### 1.3 Rebalance / move
```bash
PUT _cluster/settings
{
"transient": {
"cluster.routing.rebalance.enable": "all"
}
}
POST _cluster/reroute
{
"commands": [{
"move": {
"index": "hot_index",
"shard": 0,
"from_node": "node_many_shards",
"to_node": "node_few_shards"
}
}]
}
```
> **`reroute` scope:** `POST _cluster/reroute` issues **allocator-respecting moves** — it is **not** a supported pattern to “flip primary vs replica by hand” just to **even out primary counts** for load. Durable fixes for shard / primary skew are usually **new index settings + reindex**, **shrink** (when applicable), **node add/remove + rebalance**, and fixing **allocation / awareness** rules. Use **`move`** only under **change control**, in a **maintenance window**, after **`_cat/shards`** / explain confirm the plan — **do not** treat ad-hoc primary/replica swaps as a routine performance knob.
---
## 2. Traffic imbalance
### 2.1 Trigger (example)
- Per-node QPS / index QPS **CV > ~0.3** for **~10+ minutes**
### 2.2 Steps
**Step 1 — Node-level traffic proxies**
```bash
GET _cat/nodes?v&h=name,ip,cpu,load_1m,search.query_current,search.query_total,indexing.index_current,indexing.index_total
GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected&s=name
```
**Step 2 — Index-level hot spots**
```bash
GET _cat/indices?v&h=index,pri,rep,docs.count,store.size,search.query_current,indexing.index_current&s=search.query_current:desc
```
**Step 3 — Shard placement for hot indices**
```bash
GET _cat/shards/{hot_index}?v&h=index,shard,prirep,state,docs,store,node
```
### 2.3 Causes and fixes
| Cause | Clues | Mitigation |
|-------|-------|------------|
| No coordinating tier | data nodes take client traffic | add **coordinating** / client nodes |
| Sticky clients | few nodes get all connections | load-balance / shuffle client endpoints |
| Hot shards | shards of hot index on few nodes | reroute or increase shards (plan carefully) |
| Custom `_routing` | doc skew + query skew | redesign routing / keys |
| Multi-zone + awareness | coordinators per zone see uneven ingress | see **Section 2.5** |
### 2.4 Commands (illustrative)
```bash
# After adding coordinators, point clients at the full coordinator pool
POST _cluster/reroute
{
"commands": [{
"move": {
"index": "hot_index",
"shard": 0,
"from_node": "node_hot",
"to_node": "node_cool"
}
}]
}
```
### 2.5 Multi–availability-zone: uneven **coordinator** load
**Background:** With `cluster.routing.allocation.awareness.attributes: zone`, primaries and replicas spread across zones.
**How skew appears:**
```text
Clients reach Elasticsearch via a VPC endpoint / proxy
→ the proxy prefers coordinators in the **same zone** as the client
→ if **clients** cluster in zone K, zone-K coordinators absorb most traffic
```
**Checks**
```bash
GET _cluster/settings?filter_path=**.awareness*
# e.g. awareness.attributes = zone
GET _cat/nodeattrs?v&h=node,attr,value&s=attr
# inspect zone attrs
GET _cat/nodes?v&h=name,ip,cpu,load_1m,node.role&s=node.role
# focus on nodes without `d` (no data role) if those are your coordinators
```
**Mitigations**
| Option | Notes | When |
|--------|-------|------|
| Spread clients across zones | best if you control deploy topology | preferred |
| Private connectivity to **all** coordinators | bypasses zone-sticky proxy | requires network design |
| More coordinators in hot zones | when traffic mix cannot change | capacity fix |
> **Example:** many app VMs in zone **K**; coordinators in **K / H / G**. Proxy sends K clients to K-only coordinators → K coordinators run hot. Fix: rebalance **clients** across zones or widen coordinator access.
---
## 3. Data imbalance
### 3.1 Trigger (example)
- Shard count, document count, or stored-size **CV > ~0.3** (tune to your catalog)
### 3.2 Steps
```bash
GET _cat/nodes?v&h=name,ip,disk.total,disk.used,disk.avail,disk.used_percent,shards
GET _cat/allocation?v
GET _cat/shards?v&s=node&h=index,shard,prirep,state,docs,store,node
GET _cat/allocation?v&h=node,shards&s=shards:desc
GET _cat/indices?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc
GET {index}/_settings?include_defaults=true&filter_path=**.number_of_shards,**.number_of_replicas
```
### 3.3 Patterns
#### Type A — Shard count skew
**Signs:** one node has many more shards than others.
**Mitigate:** rebalance + `reroute` as in **Section 1**.
#### Type B — Doc count skew **inside one index**
**Signs:** same index, shard doc counts differ a lot (`GET _cat/shards/{index}?v&s=docs:desc`).
**Causes:** custom routing, uneven IDs / hashing.
**Mitigate:** better routing key; `index.routing_partition_size` where appropriate; reindex if needed.
```bash
GET _cat/shards/{index}?v&h=shard,prirep,docs,store,node&s=docs:desc
```
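To put a number on Type B skew, the doc counts of one index's primary shards can be reduced to a single CV. This is a minimal sketch, assuming the rows come from the same `_cat/shards/{index}` request with `format=json` added (cat JSON returns values as strings). The helper name `primary_doc_cv` is hypothetical.

```python
from statistics import mean, pstdev


def primary_doc_cv(rows):
    """CV of primary-shard doc counts for one index (0.0 if undefined)."""
    docs = [
        int(r["docs"])
        for r in rows
        # Replicas mirror their primary; unassigned shards report no docs.
        if r.get("prirep") == "p" and r.get("docs") is not None
    ]
    if len(docs) < 2 or mean(docs) == 0:
        return 0.0
    return pstdev(docs) / mean(docs)


# Illustrative rows, not real cluster output.
rows = [
    {"shard": "0", "prirep": "p", "docs": "1000", "node": "node-1"},
    {"shard": "1", "prirep": "p", "docs": "1000", "node": "node-2"},
    {"shard": "2", "prirep": "p", "docs": "1000", "node": "node-3"},
    {"shard": "3", "prirep": "p", "docs": "5000", "node": "node-1"},
    {"shard": "3", "prirep": "r", "docs": "5000", "node": "node-2"},
]
print(f"primary doc-count CV: {primary_doc_cv(rows):.3f}")
```

A high value here with an even shard count per node points at routing or ID hashing rather than placement.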
#### Type C — Stored size skew
**Signs:** large gap in **disk used** per node.
**Causes:** huge shards on few nodes; retention; snapshot footprint.
```bash
GET _cat/allocation?v
GET _cat/indices?v&h=index,store.size&s=store.size:desc
```
**Mitigate:**
```bash
POST _cluster/reroute
{
"commands": [{
"move": {
"index": "large_index",
"shard": 0,
"from_node": "node_full",
"to_node": "node_empty"
}
}]
}
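# Destructive: confirm retention first; wildcard deletes are refused when action.destructive_requires_name is true
DELETE old_index_*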
```
Optional: nudge disk routing — align with [sop-disk-storage.md](sop-disk-storage.md) before changing watermarks:
```bash
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "85%",
"cluster.routing.allocation.disk.watermark.high": "90%"
}
}
```
---
## 4. Resource-level imbalance
Cross-links: [sop-cpu-load.md](sop-cpu-load.md) (CPU / hot threads), [sop-memory-gc.md](sop-memory-gc.md) (fielddata / breakers), [sop-disk-storage.md](sop-disk-storage.md) (disk / flood).
### 4.1 CPU skew
**Trigger (example):** per-node CPU **CV > ~0.3** and the busiest node **> ~10%** CPU (a sanity floor so near-idle clusters do not trigger).
```bash
GET _cat/nodes?v&h=name,ip,cpu,load_1m,load_5m,load_15m&s=cpu:desc
GET _nodes/hot_threads
GET _tasks?detailed=true
GET _cat/tasks?v
```
| Cause | Clues | Mitigation |
|-------|-------|------------|
| Shard skew | hot node has more shards | reroute |
| Hot segments inside shards | even shard counts but skewed CPU | `forcemerge` off-peak (see ES docs); see [sop-cpu-load.md](sop-cpu-load.md) |
| Missing coordinators | one node coordinates most queries | add coordinators |
| Monster queries | hot threads = search | optimize / cancel tasks |
```bash
POST {hot_index}/_forcemerge?max_num_segments=1
```
Use **forcemerge** only in maintenance windows and on indices that no longer receive writes; it is IO heavy.
### 4.2 Heap skew
**Trigger (example):** heap **CV > ~0.3**.
```bash
GET _cat/nodes?v&h=name,ip,heap.percent,heap.current,heap.max,ram.percent&s=heap.percent:desc
GET _nodes/stats/indices?human&filter_path=nodes.*.indices.fielddata,nodes.*.indices.query_cache,nodes.*.indices.segments
GET _cat/fielddata?v&s=size:desc
```
| Cause | Clues | Mitigation |
|-------|-------|------------|
| Fielddata skew | `_cat/fielddata` dominated by one node | fix aggs; clear caches — [sop-memory-gc.md](sop-memory-gc.md) |
| Query cache | large `query_cache` on some nodes | tune cache limits / queries |
| Segment memory | `segments` stats skewed | merge policy / forcemerge |
### 4.3 Disk skew
**Trigger (example):** disk **CV > ~0.3**.
```bash
GET _cat/nodes?v&h=name,ip,disk.total,disk.used,disk.avail,disk.used_percent&s=disk.used_percent:desc
GET _cat/allocation?v
GET _cat/indices?v&h=index,store.size&s=store.size:desc
GET _cat/shards/{large_index}?v&h=shard,prirep,store,node
```
**Mitigate:** reroute large shards first. Watermark tuning **only** with governance — defaults and danger cases are in [sop-disk-storage.md](sop-disk-storage.md).
```bash
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "80%",
"cluster.routing.allocation.disk.watermark.high": "85%",
"cluster.routing.allocation.disk.watermark.flood_stage": "90%"
}
}
POST _cluster/reroute
{
"commands": [{
"move": {
"index": "large_index",
"shard": 0,
"from_node": "node_disk_full",
"to_node": "node_disk_free"
}
}]
}
```
---
## 5. End-to-end workflow
### 5.1 One-pass command bundle
```bash
GET _cat/nodes?v&h=name,ip,cpu,heap.percent,disk.used_percent,load_1m,shards&s=cpu:desc
GET _cat/allocation?v
GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected&s=queue:desc
GET _cat/indices?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc
```
### 5.2 Priority
```text
1. [Urgent] Resource skew + any node > ~85% → reroute / relieve hot nodes now
2. [High] Traffic skew + any node CPU > ~70% → coordinators / hot shard moves
3. [Normal] Data skew only, resources OK → plan rebalance; not always urgent
4. [Watch] Mild skew (CV < ~0.5, resources < ~60%) → document and monitor
```
### 5.3 Causal sketch
```text
Traffic skew
├── no coordinators → data nodes take client work → CPU skew
├── sticky clients → few nodes get connections → CPU skew
└── hot shards → queries hit same nodes → CPU / IO skew
Data skew
├── allocation rules → new nodes underfilled → disk skew
├── custom routing → doc hotspots → shard hotspots
└── huge indices localized → disk / IO skew
Resource skew
├── shard count skew → uneven work → CPU / heap skew
├── hot segments → query concentration → CPU skew
└── uneven fielddata → heap skew
```
---
## Appendix: Quick commands (imbalance)
```bash
GET _cat/nodes?v&h=name,ip,cpu,heap.percent,disk.used_percent,load_1m,shards&s=cpu:desc
GET _cat/allocation?v
GET _cat/shards?v&s=node&h=index,shard,prirep,docs,store,node
GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected
GET _cat/nodes?v&h=name,search.query_current,search.query_total,indexing.index_current,indexing.index_total
GET _cat/fielddata?v&s=size:desc
GET _nodes/stats/indices?human&filter_path=nodes.*.indices.fielddata
GET _nodes/hot_threads
PUT _cluster/settings
{
"transient": {
"cluster.routing.rebalance.enable": "all"
}
}
POST _cluster/reroute
{
"commands": [{
"move": {
"index": "index_name",
"shard": 0,
"from_node": "node_from",
"to_node": "node_to"
}
}]
}
PUT _cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "80%",
"cluster.routing.allocation.disk.watermark.high": "85%"
}
}
```
FILE:references/sop-query-thread-pool.md
# SOP: Query performance and thread pools
**Covers:** `ThreadPool.SearchRejected` (P0), `ThreadPool.SearchQueueHigh` (P0), `Performance.SearchLatencyHigh` (P1), `Performance.LongRunningTask` (P1)
*(Frequency notes in parentheses refer to internal case statistics — tune to your catalog.)*
---
## Diagnosis decision tree
```
Query symptoms
├── HTTP 429 on search → SearchRejected (pool reject)
├── High search queue → SearchQueueHigh (backlog)
├── Latency above SLO → SearchLatencyHigh
└── Task running > ~10m → LongRunningTask
Key signals
├── GET _nodes/stats/thread_pool → search.queue / search.rejected
├── GET _tasks?detailed=true → long-running search tasks
└── SEARCHSLOW logs → slow query bodies / timings
```
> **Co‑stress vs strict causality:** Wide / high-QPS searches (e.g. `match_all` on a large index) typically **raise CPU, search queue/rejections, and GC pressure in the same window**. Describe these as **co-occurring or mutually reinforcing** under one workload — not a fixed chain like “rejected → therefore GC rose first,” unless timestamps prove ordering.
> **CPU rubric alignment:** When **`_tasks`** / slow logs show a hot index + heavy `*search*`, state explicitly that **NodeCPU / `hot_threads` and `search` pool saturation are the same overload story** (see [sop-cpu-load.md](sop-cpu-load.md)). If CMS or the rule catalog already fires **CPU at P0** and/or **`ThreadPool.Search*` at P0** ([health-events-catalog.md](health-events-catalog.md)), keep the **headline severity** aligned — e.g. **“resource overload (CPU + search path)”** — rather than narrating only secondary signals as **P1-only**.
> **CMS time alignment:** Add **one sentence** on whether **`NodeCPUUtilization` (or equivalent CMS window)** overlaps the period of **`search` queue / reject** pressure — including the case where **CPU mean is no longer high** but **CPU peaked in the same window** as rejects (common after the queue drains).
---
## Report narrative: search pool vs GC / CPU headlines
**Audience:** Authors of **human-written** diagnosis summaries (and agent prose). This section does **not** change CMS thresholds or rule engines; it fixes **headline ordering** and **explicit evidence gaps** when **read-path** overload is plausible. Aligns with [acceptance-criteria.md](acceptance-criteria.md) section **6.2** (read-heavy CPU + search pool) and section **6.5** (search pool vs GC-only headline). For write-path vs GC ordering, see [sop-write-performance.md](sop-write-performance.md) and acceptance-criteria section **6.4**.
### Query-bound overload: what to lead with
- When the workload is **read-heavy** (high concurrent search, terms / high-QPS patterns, **`ThreadPool.SearchRejected`** or queue stress), the **primary** storyline should be **`search` pool saturation / query concurrency exceeding pool capacity** — **not** a **GC-only** headline. **`ThreadPool.SearchRejected`** (or catalog-equivalent) should appear **first or co-equal** with resource signals.
- **Old GC + CPU spikes** often **co-occur** or form a **second wave** with concurrent search and segment merge load — they are **not** wrong as evidence, but **misleading as the sole lead** when the incident is **query-bound** and **`thread_pool.search` has not been ruled out**.
- **Production caveat:** The same CMS signals can still support a **legitimately GC-primary** or **breaker-primary** story if engine APIs show **heap / breaker / mapping** issues **without** search-pool saturation. Separate **what CMS proved** from **what still needs** `GET /_nodes/stats/thread_pool` (or `_cat/thread_pool`).
### When engine REST evidence is incomplete
If **`GET /_nodes/stats/thread_pool`** or **`_cat/thread_pool`** was **not** successfully collected (timeouts, unstable public path, firewall), but **GC / CPU** findings exist from **CMS or the rule engine**:
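**Suggested wording (adapt to facts):**
> Until engine API evidence is fully collected: if the rule engine or monitoring has already raised **`ThreadPool.SearchRejected`** or a **thread-pool saturation event**, investigate **"query concurrency saturating the `search` pool"** first. Because of timeouts / network issues, **`search.rejected` / queue** could not be confirmed via **`/_nodes/stats/thread_pool`**, so the **GC / CPU** findings below are a **monitoring-side risk profile**. Collect **`thread_pool.search`**, **`/_tasks`**, and **`/_nodes/hot_threads`** over a **stable in-VPC or allowlisted path** before ranking primary and secondary causes.
If **`ThreadPool.SearchRejected`** (or equivalent) **did not** fire in the **same window**, still add **one explicit line** that **`search` pool reject/queue remains unverified** (do not invent rejects):
> **Evidence pending:** whether the **`search` pool rejected or queued requests** (usually one of the first checks for read-path overload); this requires engine-level **`/_nodes/stats/thread_pool`** or **`_cat/thread_pool`**.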
Self-limiting discipline for failures: [es-api-call-failures.md](es-api-call-failures.md).
### Executive summary: parallel P0 / P1 lines
- If the rule engine reports **`JVMMemory.GCTimeRatioTooHigh` (P0-class)** **and** **`ThreadPool.SearchRejected`** (exact priority depends on catalog; some snapshots emit P1), the **human-facing** summary should still **mention both** when the catalog treats search rejects as incident-class.
- Opening bullets must **not** read as **“the only P0 is Old GC”** when **search-path** rules also fired. Order: **search pool / reject / query concurrency** **first or co-equal**; GC/CPU as **co-stress, cascade, or pending confirmation** after `thread_pool` + `_tasks`.
**Quantification (mirrors [sop-write-performance.md](sop-write-performance.md) §2):** For rubric-style reports, add **one line of numbers** from **`GET _nodes/stats/thread_pool`**: per-node **`search.rejected`** and **`search.completed`**, optional **reject share** (roughly `rejected / (rejected + completed)`). **`rejected` is cumulative** since node start (see §1 **Inspect**). When aligning to CMS, cite rule/event names (e.g. **`ThreadPool.SearchRejected`**) alongside API stats.
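The "one line of numbers" can be generated directly from the stats payload. The sketch below assumes a parsed `GET _nodes/stats/thread_pool` response with the standard `nodes.<id>.thread_pool.search` object. The names `reject_share` and `search_pool_summary` are illustrative.

```python
def reject_share(rejected, completed):
    """Rough share of rejected search tasks: rejected / (rejected + completed)."""
    total = rejected + completed
    return rejected / total if total else 0.0


def search_pool_summary(stats):
    """One report line per node from a parsed _nodes/stats/thread_pool response."""
    lines = []
    for node_id, node in stats.get("nodes", {}).items():
        pool = node.get("thread_pool", {}).get("search", {})
        rejected = pool.get("rejected", 0)
        completed = pool.get("completed", 0)
        # Both counters are cumulative since node start, so the share is a
        # lifetime figure, not a per-window rate.
        lines.append(
            f"{node.get('name', node_id)}: search.rejected={rejected} "
            f"search.completed={completed} "
            f"reject_share={reject_share(rejected, completed):.2%}"
        )
    return lines


# Illustrative sample data, not real cluster output.
sample = {
    "nodes": {
        "abc": {
            "name": "es-node-1",
            "thread_pool": {"search": {"rejected": 50, "completed": 950}},
        }
    }
}
print("\n".join(search_pool_summary(sample)))
```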
**Both `search` and `write` show `rejected` (cumulative):** Compare **magnitudes** on the **same node(s)**. When **`search.rejected` ≫ `write.rejected`** and the case is **query-saturation** (high QPS search, **`ThreadPool.SearchRejected`** in scope), the **first storyline** is **`search` pool / query concurrency** — link to **terms / wide query / slow query**, **`_tasks`** / **`hot_threads`**, and **hot index name** when confirmed. **`write.rejected`** stays **valid** as **secondary or parallel** (bulk backfill, catch-up indexing); **do not** open with **write** ahead of **search** in the **executive summary** or **first P0 bullet** unless **timelines** or **paired deltas** show **write** was the dominant stressor in the window. See [acceptance-criteria.md](acceptance-criteria.md) **§6.5** (*P0 / executive order vs `search` ≫ `write`*).
**Different nodes in slow logs vs pool-rejection logs:** A **SEARCHSLOW** / **fetch** entry may attribute latency to **node A** while an **INSTANCELOG** line at **another timestamp** shows **`search` pool** pressure on **node B** — not necessarily a mistake: **fetch** runs where the **shard copy** lives; **pool saturation** can concentrate on the **busiest data node** for that window; **routing** and **replica** choice shift over time. **Reconcile** with **`GET /_cat/shards/{index}`** and **time ordering**; add **one explicit sentence** in the report so reviewers do not treat the two as contradictory. See [acceptance-criteria.md](acceptance-criteria.md) **§6.2** (*Slow-log node vs search-pool node*).
### Customer-facing references to SKILL / engine APIs
| Audience | Preferred phrasing |
|----------|---------------------|
| **External / customer-facing** | **"Engine-level mandatory checklist (see section 5 of the SKILL document)"**, **"sections 5–7"**; avoid **§**-style section markers in PDFs or customer email. |
| **Internal** | **“SKILL.md section 5”** in prose is fine without the **§** symbol. |
The health-check script’s footer uses the same **no-§** convention.
### Evidence closure checklist
1. **Rules / CMS:** Did **`ThreadPool.SearchRejected`** (or a metric/event with the same meaning) fire in the report window? State **yes / no** explicitly.
2. **Engine (stable path):** **`GET /_nodes/stats/thread_pool`** with **`filter_path`** for **`search`**; **`GET /_tasks?detailed=true&actions=*search*`** when CPU or reject signals exist.
3. **Narrative:** If (1) is yes or engine shows **`search.rejected` / queue**, do **not** deliver a **GC-only** conclusion for a **read-heavy** incident — tie **GC/CPU** to **co-occurring search load** when evidence supports it (see **Co‑stress** note at the top of this document).
---
## 1. Search thread-pool queue buildup (`SearchQueueHigh`, P0)
### Inspect
```bash
GET _nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.search,nodes.*.name
# queue = current depth; rejected = cumulative rejects (since node process start unless reset)
```
**Counter semantics:** `search.rejected` is **cumulative** for the node process (until restart). Do **not** treat the current value as “rejects only inside the diagnosis window” unless you have a **delta** between two samples or a known reset. **`queue == 0`** can still follow a burst where **`rejected` increased** — the pool drained after the spike.
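A delta between two samples is what turns the cumulative counter into a window figure. This is a minimal sketch, assuming two parsed `GET _nodes/stats/thread_pool` responses taken at the start and end of the window. The function name `rejected_delta` is hypothetical.

```python
def rejected_delta(before, after):
    """Per-node increase in search.rejected between two parsed stats samples.

    Returns None for a node when no usable delta exists (node missing from
    the first sample, or the counter went backwards after a restart).
    """
    deltas = {}
    for node_id, node in after.get("nodes", {}).items():
        now = node["thread_pool"]["search"]["rejected"]
        prev_node = before.get("nodes", {}).get(node_id)
        if prev_node is None:
            deltas[node_id] = None
            continue
        prev = prev_node["thread_pool"]["search"]["rejected"]
        deltas[node_id] = now - prev if now >= prev else None
    return deltas


def _sample(counts):
    """Build a stats-shaped dict from {node_id: rejected} (illustrative data)."""
    return {"nodes": {n: {"thread_pool": {"search": {"rejected": c}}}
                      for n, c in counts.items()}}


before = _sample({"n1": 100, "n2": 40})
after = _sample({"n1": 130, "n2": 5, "n3": 7})
print(rejected_delta(before, after))
```

Here only `n1` yields a real in-window count; `n2` restarted between samples and `n3` was not present in the first one.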
### Root causes
**A — Slow queries holding workers**
- **Signals:** few tasks tie up `search` workers; queue grows behind them
- **Check:** `GET _tasks?detailed=true&actions=*search*`
- **Mitigate:** `POST _tasks/{task_id}/_cancel`, then fix the query
**B — QPS exceeds capacity**
- **Signals:** queue grows roughly linearly with traffic
- **Mitigate:** throttle / shed load at the client or gateway; **scale out** nodes
**C — CPU starved**
- **Signals:** high CPU **and** high search queue
- **Mitigate:** [sop-cpu-load.md](sop-cpu-load.md)
### Note: search queue size is not a tuning knob
On typical Elasticsearch builds the **search** queue has a **fixed** upper bound (often **1000**). Unlike some write-side pools, you usually **cannot** “fix” saturation by raising the search queue — you must **reduce work per query**, **reduce concurrency**, or **add capacity**.
---
## 2. Search thread-pool rejections (`SearchRejected`, P0)
### Symptoms
- Clients: **`HTTP 429 Too Many Requests`** on search paths
- Logs: `rejected execution of search` (wording varies by version)
### `_cat/thread_pool` vs `_nodes/stats/thread_pool`
Either **`GET _cat/thread_pool`** or **`GET _nodes/stats/thread_pool?filter_path=...search...`** is valid engine evidence for the **`search`** pool. For **`rejected`**, both surface the **same node-level cumulative semantics** (see §1 counter note); use one or both if reviewers ask for **source parity**.
### Immediate response
**1. Capture what is running (while the incident is live)**
```bash
GET _nodes/hot_threads
GET _tasks?detailed=true&actions=*search*
```
**2. Cancel runaway searches**
```bash
POST _tasks/{task_id}/_cancel
```
**3. Check gateway / vendor QoS (if applicable)**
```bash
# Alibaba Cloud Elasticsearch — QoS limiter metrics (if exposed in your build)
GET _qos/limiter/metric
```
### Node skew (`rejected` higher on one node)
Uneven **`search.rejected`** can come from **one client entry path**, shard / routing skew, or **query coordination** landing more often on certain nodes. The elected master participates in **coordination / routing** (and may receive client searches depending on topology) — it does **not** imply “the master executes all shard searches.” Treat one hot node as a **lead**, then confirm with **`_cat/shards`**, **`_tasks`**, and node roles.
**Hot node + layout (wording):** When **`_cat/shards`** shows **extreme primary/replica skew** and **`search.rejected`**, **heap**, or **GC** concentrate on the **same node**, treat that node as the **hot node** in the narrative — the chain can be **internally consistent**. Describe **skew** as **one amplifier of local engine work** on that side (**merge / refresh / indexing** when those paths are active), **together with** read-side concentration (**coordinator / single client URL / `preference=_primary` / routing**, time-skewed peaks). **Coordinator + ingress can still bias index-level work** toward one node even when replicas exist — this is **not** the same as “all reads hit primaries only.” **Do not** imply that **default read traffic** is routed **only** to primaries; **replicas serve queries** unless clients opt out. **`_cat/shards` + per-node `rejected`** stay **facts**; the bullets above are **hypotheses to narrow** with `_tasks`, client settings, and query parameters.
**Placement wording:** Use **one convention** everywhere — e.g. **“all primaries on {node}”**, **“6 / 6 primaries on {node}”**, or **“every `p` row on {node}”** — avoid mixing **“almost all”** in the title with **“all”** in tables unless you document a real exception shard.
**CPU vs `rejected` on different nodes:** Per-node **CPU max** and **rejected** need not align at the **same instant** (peaks can occur at **different minutes**), and one node may host **other hot indices or system work**. When CMS shows **higher CPU on node X** but **higher `search.rejected` on node Y**, say so explicitly — avoid implying “the hottest CPU node must always be the highest-reject node” for the whole window.
---
## 3. Slow queries (`Performance.SearchLatencyHigh`, P1)
### Three-step triage
**Step 1 — Search Profiler**
Use Kibana Search Profiler, or add `"profile": true` to the search body:
```bash
GET /index_name/_search
{
"profile": true,
"query": {
"match": { "field": "value" }
}
}
```
Inspect shard query breakdown (`shards.*.query`, etc.) for dominant query types.
**Step 2 — Slow logs**
Filter `SEARCHSLOW` (or equivalent) for high `took` / `search_time_ms` (e.g. **> 1000 ms** as a starting filter — align to your SLO).
**Step 3 — Baseline**
- Versus yesterday / last week same window?
- All queries slow, or only certain indices / patterns?
---
## 4. Common slow query patterns and mitigations
### 4.1 Wildcard-style queries (often worst)
- **Issue:** `wildcard`, `prefix`, `regexp` can scan huge posting lists.
- **Worst:** leading wildcards (e.g. `*foo`) — effectively full-term scans.
- **Mitigations:**
- `prefix` → **completion** suggester or **edge n-gram** indexing where appropriate
- `wildcard` → structured filters / `match` where possible
- `regexp` → narrower `term` / `terms` / keyword patterns where possible
### 4.2 Deep paging (`from` + `size`)
- **Issue:** each shard must fetch **`from + size`** ordered hits; cost scales with shards.
- **Guard:** default **`index.max_result_window` is 10000** (`from + size` cap).
- **Mitigations:**
- Live paging → **`search_after`**
- Export / scan → **`scroll`** or **PIT + search_after** (modern ES)
- Avoid raising `max_result_window` as a “fix” — it hides the real cost
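The `search_after` pattern for live paging can be sketched as a generator. Assumptions: `search` is a hypothetical callable that takes a request body and returns the parsed search response (for example a thin wrapper around your client), and the body carries a `sort` with a unique tiebreaker. The stand-in `fake_search` and the field name `seq` exist only to make the example runnable.

```python
def iterate_hits(search, body):
    """Yield every hit by following `search_after` instead of growing `from`."""
    request = dict(body)
    request.pop("from", None)  # do not combine search_after with a `from` offset
    while True:
        hits = search(request)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit become the cursor for the next page.
        request = dict(request, search_after=hits[-1]["sort"])


def fake_search(request):
    """Stand-in for a real search call: five docs sorted by a unique `seq`."""
    docs = [{"_id": str(i), "sort": [i]} for i in range(1, 6)]
    after = request.get("search_after", [0])[0]
    page = [d for d in docs if d["sort"][0] > after][: request["size"]]
    return {"hits": {"hits": page}}


ids = [h["_id"] for h in iterate_hits(fake_search, {"size": 2, "sort": [{"seq": "asc"}]})]
print(ids)
```

Each page costs roughly `size` hits per shard regardless of depth, which is why this scales where `from + size` does not.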
### 4.3 Aggregations
- **Issue:** high-cardinality `terms` aggs (e.g. raw `user_id`, `ip`) burn CPU / heap.
- **Mitigations:**
- Lower **`size`** (return fewer buckets)
- **`execution_hint: map`** where it fits **low-cardinality** workloads (check version docs)
- Aggregate on a **`keyword`** field (backed by `doc_values`) instead of **`text`**
- Prefer indexed fields over heavy **`script`** aggs
- **Never aggregate on `_id` or `text` for production traffic** — fielddata / heap explosions: [sop-memory-gc.md](sop-memory-gc.md)
### 4.4 Script queries
- **Issue:** scripts bypass most index structures.
- **Mitigations:** precompute at index time; **Painless** (not legacy Groovy); avoid hot loops in scripts.
### 4.5 Numeric `term` queries
- **Issue:** building doc-id sets for numeric `term` can be expensive at scale.
- **When:** you only need exact match, not range / sort on that numeric.
- **Mitigation:** add a **`keyword`** (or string) parallel field for equality filters.
### 4.6 `nested` / `parent-child`
- **Issue:** nested docs multiply work; parent-child is often **several×** heavier than nested.
- **Mitigation:** model as **`object`** / flattened structures when business rules allow.
### 4.7 Highlighting
- **Issue:** huge **`fragment_size`** → CPU / memory on highlight.
- **Mitigation:** smaller fragments or highlight off-cluster.
### 4.8 Huge time ranges
- **Issue:** scanning years of data every request.
- **Mitigation:** bound time filters; batch by window.
### 4.9 Unbounded `range` queries
- **Issue:** e.g. `{"range": {"date": {"lte": "2024-01-01"}}}` with **no lower bound** — cost grows forever with data.
- **Mitigation:** always set **both** bounds when possible.
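Sections 4.8 and 4.9 combine naturally: split a long period into windows and give each window both bounds. The sketch below only builds the query bodies. The helper name `bounded_ranges` and the half-open `gte` / `lt` convention are assumptions chosen so windows do not overlap.

```python
from datetime import datetime, timedelta


def bounded_ranges(field, start, end, step):
    """Split [start, end) into windows and emit a range query with both bounds."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    queries = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        queries.append(
            {"range": {field: {"gte": cursor.isoformat(), "lt": upper.isoformat()}}}
        )
        cursor = upper
    return queries


for q in bounded_ranges("date", datetime(2024, 1, 1),
                        datetime(2024, 1, 3, 12), timedelta(days=1)):
    print(q)
```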
### 4.10 Massive hit counts
- **Issue:** very broad query → **`hits.total`** in the millions; even `size: 10` can be expensive if every shard scores huge sets.
- **Mitigations:** tighten filters; **`track_total_hits: false`** or a capped integer; existence checks → **`size: 0`** + **`terminate_after`** where appropriate.
### 4.11 `refresh_interval` vs query cost
- **Issue:** very low **`refresh_interval`** (e.g. **1s**) → many small segments → more segment files to query / more merge pressure on small nodes.
- **Inspect:**
```bash
GET {index}/_settings?filter_path=**.refresh_interval
GET _cat/segments/{index}?v&h=index,shard,segment,size
```
Many small segments per shard (rule-of-thumb **> ~50** per shard) suggests excessive refresh / merge pressure — validate against your workload.
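The per-shard rule of thumb above can be checked mechanically. The following is a minimal Python sketch, assuming the plain-text output of the `_cat/segments` call shown earlier (columns `index,shard,segment,size`). The function names and the sample rows are illustrative only and are not part of any Elasticsearch client.

```python
from collections import Counter

def segments_per_shard(cat_text):
    """Count rows per (index, shard) in `_cat/segments` text output.

    Assumes the first three columns are index, shard, segment.
    Primary and replica copies of a shard are counted together because
    the command above does not request the `prirep` column.
    """
    counts = Counter()
    for line in cat_text.splitlines():
        cols = line.split()
        if len(cols) < 3 or cols[0] == "index":  # skip blank lines and the header row
            continue
        counts[(cols[0], cols[1])] += 1
    return counts

def noisy_shards(counts, threshold=50):
    """Return only the shards whose segment count exceeds the threshold."""
    return {shard: n for shard, n in counts.items() if n > threshold}

# Made-up sample rows, only to show the expected input shape.
sample = """index shard segment size
logs-1 0 _a 1.2mb
logs-1 0 _b 900kb
logs-1 1 _a 2mb
"""
counts = segments_per_shard(sample)
print(noisy_shards(counts, threshold=1))
```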
| Scenario | Suggested `refresh_interval` | Notes |
|----------|------------------------------|-------|
| Logs / high ingest, relaxed search freshness | **30s–60s** | Fewer segments, better bulk throughput |
| Near-real-time dashboards | **5s–10s** | Balance freshness vs cost |
| Bulk load window | **`-1`** (disabled) | After bulk: `POST {index}/_refresh`, then restore interval |
| Small nodes (e.g. 2c4g / 4c8g) | **≥ 30s** typical | Avoid 1s refresh on tiny flavors |
```bash
PUT {index}/_settings
{
"index.refresh_interval": "30s"
}
PUT {index}/_settings
{
"index.refresh_interval": "-1"
}
POST {index}/_refresh
PUT {index}/_settings
{
"index.refresh_interval": "30s"
}
```
---
## 5. Long-running tasks (`LongRunningTask`, P1)
| Task kind | Duration | Notes |
|-----------|----------|-------|
| `_delete_by_query` | can exceed **30m** on large data | prefer off-peak |
| `_reindex` | hours possible | tune batch size (`source.size`) / `slices` per docs |
| `_update_by_query` | same class as `_delete_by_query` | same cautions |
| Snapshot / restore | depends on data + network | usually expected; watch progress |
| Scroll | leaks if clients omit `clear_scroll` | monitor open contexts |
### Inspect / cancel / scroll hygiene
```bash
GET _tasks?detailed=true&nodes=*
GET _tasks/{task_id}
POST _tasks/{task_id}/_cancel
GET /_nodes/stats/indices/search?filter_path=nodes.*.indices.search.scroll_*
DELETE /_search/scroll
{
"scroll_id": ["scroll_id_1", "scroll_id_2"]
}
```
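To pick cancellation candidates out of a large `_tasks?detailed=true` response, a small filter helps. This is a hedged Python sketch: it assumes the usual response shape (`nodes` → node id → `tasks` → task id, with `action` and `running_time_in_nanos`). The helper name, the 30-minute default, and the sample payload are illustrative assumptions.

```python
def long_running_tasks(tasks_response, min_minutes=30):
    """Return (task_id, action, minutes) for tasks running at least min_minutes."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            minutes = task.get("running_time_in_nanos", 0) / 60e9
            if minutes >= min_minutes:
                found.append((task_id, task.get("action", ""), round(minutes, 1)))
    # Longest-running first, so the most suspicious task is at the top.
    return sorted(found, key=lambda item: item[2], reverse=True)

# Hypothetical payload: one 45-minute delete-by-query, one 5-second search.
sample = {"nodes": {"n1": {"tasks": {
    "n1:101": {"action": "indices:data/write/delete/byquery",
               "running_time_in_nanos": 2_700_000_000_000},
    "n1:102": {"action": "indices:data/read/search",
               "running_time_in_nanos": 5_000_000_000},
}}}}
slow = long_running_tasks(sample)
print(slow)
```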
---
## Appendix: Quick commands (query)
```bash
GET _cat/thread_pool?v&h=name,node_name,active,queue,rejected
GET _nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.search,nodes.*.name
GET _nodes/hot_threads
GET _tasks?detailed=true&actions=*search*
POST _tasks/{task_id}/_cancel
# Profiling: use a JSON body (not inline on the GET line)
GET /index_name/_search
{
"profile": true,
"query": { "match_all": {} }
}
GET _nodes/stats/indices/search?filter_path=nodes.*.indices.search
# query_total, query_time_in_millis, fetch_total, fetch_time_in_millis
```
FILE:references/sop-service-avalanche.md
# SOP: Node “unavailable” vs search-pool meltdown (service avalanche)
**Covers:** `ServiceAvalanche.SearchServiceDown` (P0), `ServiceAvalanche.AllShardsFailed` (P0), `ServiceAvalanche.CPUInducedUnavailability` (P0)
> **Why this SOP exists:** It targets **“the node looks dead but the process is still up.”** CMS may **not** increment `ClusterDisconnectedNodeCount`, and you may **not** see `NODE_LEFT` / `removed` in `INSTANCELOG` — the usual **node-offline** playbook fails. You need **cross-signals** (CMS CPU still reporting + search pool rejects + `all shards failed`) to pin **CPU overload → search thread-pool exhaustion → cascading search failures**.
---
## Diagnosis entry: “node offline” vs meltdown
```
User: "node offline / ES unavailable" OR intermittent ES API timeouts
├── Control plane DescribeInstance.status
│ ├── active → control plane OK → continue to engine
│ ├── activating → change in flight (scale/restart/upgrade); wait or see sop-activating-change-stuck.md
│ └── inactive / invalid → platform issue → open a ticket with your cloud provider
│
├── CMS NodeCPUUtilization still reporting?
│ ├── Yes → 【process likely alive】 at OS level
│ │ ├── CPU > ~80% sustained → ★ ServiceAvalanche (core scenario in this SOP)
│ │ └── CPU normal but API bad → network / security group / allowlist
│ └── Gaps / no points → 【process may be dead】
│ ├── INSTANCELOG "removed" / "left" → real node departure → [sop-cluster-health.md](sop-cluster-health.md)
│ └── CMS SystemEvent node events → [sop-cluster-health.md](sop-cluster-health.md)
│
├── ES API call pattern (during diagnosis)
│ ├── All calls time out consistently → often network / allowlist (not avalanche)
│ ├── Intermittent success + timeout + CPU > ~80% → ★ suspected avalanche
│ └── Connection refused → process/port down → [sop-cluster-health.md](sop-cluster-health.md)
│
└── INSTANCELOG contains "all shards failed"?
├── Yes + high CPU → ★ avalanche (strong)
├── Yes + normal CPU → often allocation / shard issues → [sop-cluster-health.md](sop-cluster-health.md)
└── No → other paths (query, gateway, client bugs)
```
---
## 1. Core scenario: CPU overload → search pool meltdown
### What “service avalanche” means here
**Definition:** OS process is up, CMS still shows points, but **search is effectively unusable** — searches fail with `SearchPhaseExecutionException: all shards failed` (wording varies). To the app this feels like **node down**.
**Mechanism:** Not “node left cluster,” but **search worker pool saturation + queue overflow + timeouts + cascade**.
### When to call it confirmed (example)
| Signal | Example threshold | CMS / logs |
|--------|-------------------|------------|
| Sustained CPU | avg **> ~80%** for **≥ ~10 min** | `NodeCPUUtilization` |
| Search rejects | `rejected` rising | `SearchThreadpoolRejected` |
| Cascade in logs | burst of **all shards failed** | `INSTANCELOG` |
### vs true node loss
| | Real node offline | Service avalanche (this SOP) |
|---|-------------------|------------------------------|
| Control-plane status | may go `activating` | often stays **active** |
| CMS CPU series | **stops** or is missing | **continues** (often **80%+**) |
| CMS `ClusterStatus` | may go Red | may stay Green / sparse |
| INSTANCELOG | `node left` / `removed` | **no** clean leave |
| INSTANCELOG | no mass `all shards failed` | **many** `all shards failed` |
| `ClusterDisconnectedNodeCount` | **> 0** | often **0** |
| REST `_cluster/health` | sometimes **refused** | often **timeouts** (intermittent) |
| Call pattern | all fail same way | **some succeed, some time out** |
> **Key discriminator:** **CPU metrics still arriving** from the node, at high values, **plus** `all shards failed` → treat as avalanche, not “host gone.” **Intermittent** timeouts + high CPU support overload; **uniform** timeouts lean network/ACL; **refused** leans dead process/port.
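
The discriminator above can be written down as a small decision function. This is a sketch under stated assumptions, not the rule engine's actual logic: the labels, the argument names, and the 80% cut-off are illustrative and simply mirror the comparison table.

```python
def classify(cpu_reporting, cpu_pct, call_pattern, all_shards_failed, high_cpu=80.0):
    """Rough triage label.

    call_pattern is one of 'intermittent', 'uniform_timeout', 'refused'
    (hypothetical names for the three call patterns described above).
    """
    if not cpu_reporting or call_pattern == "refused":
        return "node_loss"            # follow sop-cluster-health.md
    if cpu_pct > high_cpu and (all_shards_failed or call_pattern == "intermittent"):
        return "avalanche"            # this SOP
    if all_shards_failed:
        return "shard_or_allocation"  # normal CPU + all shards failed
    if call_pattern == "uniform_timeout":
        return "network_or_acl"
    return "inconclusive"

print(classify(True, 85.0, "intermittent", True))
```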
---
## 2. Entry signal: intermittent Elasticsearch API timeouts
> **When:** During diagnosis, REST calls **sometimes** return, **sometimes** time out, while CMS shows **CPU > ~80%**.
### Why intermittent timeouts fit overload
- TCP can connect; process listens.
- Internal **`search` pool** is saturated → work queues → latency blows up.
- Some requests finish before the client timeout; others do not → **jittery** success/failure.
- That pattern differs from **“every call times out the same way”** (often path/network).
### Branching checklist
```
Intermittent timeouts
├── CMS NodeCPUUtilization (last ~10m)
│ ├── Sustained > ~80% → highly suspicious → gather evidence in Section 3 Step 2
│ ├── < ~50% → deprioritize avalanche; look at network instability
│ └── sparse points → tighten CMS window / resolution; re-sample
├── DescribeInstance.status
│ ├── active → continue engine analysis
│ └── not active → not this avalanche path
└── INSTANCELOG "rejected" / "all shards failed"
├── rejected lines → pool overflow (strong)
└── none yet → capture hot_threads in brief windows when API responds
```
### Collection tactics when the API is flaky
1. **Longer client timeouts:** e.g. `--connect-timeout 20 --max-time 60`.
2. **Light calls first:** `_cat/nodes`, `_cluster/health`.
3. **Retry heavy calls** (`_nodes/hot_threads`, `_nodes/stats/thread_pool`) up to **~3** times.
4. **Log success ratio** as a severity side-channel.
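
The retry and success-ratio tactics above can be sketched in Python as follows. The `fetch` callable is a placeholder for whatever performs the HTTP call (for example a wrapper around `curl` or `urllib`); the function names, the pause, and the simulated failures are assumptions for illustration.

```python
import time

def call_with_retries(fetch, attempts=3, pause_s=2.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times; return (result, outcomes)."""
    outcomes = []
    for attempt in range(attempts):
        try:
            result = fetch()
        except OSError:  # covers socket timeouts and urllib's URLError
            outcomes.append(False)
            if attempt < attempts - 1:
                sleep(pause_s)
        else:
            outcomes.append(True)
            return result, outcomes
    return None, outcomes

def success_ratio(all_outcomes):
    """Share of successful attempts across several calls (severity side-channel)."""
    flat = [ok for outcomes in all_outcomes for ok in outcomes]
    return sum(flat) / len(flat) if flat else 0.0

# Simulated flaky endpoint: two timeouts, then a response.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

result, outcomes = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, outcomes, success_ratio([outcomes]))
```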
---
## 3. Four-step diagnosis
### Step 1 — Is the process actually gone?
**Goal:** separate **real** departure from **meltdown**.
```bash
# 1) Control plane
aliyun elasticsearch DescribeInstance --region <region> --InstanceId <id>
# status active → usually OK at CP layer
# 2) CMS CPU last ~5 minutes (points exist?)
aliyun cms DescribeMetricList \
--Namespace "acs_elasticsearch" \
--MetricName "NodeCPUUtilization" \
--Dimensions '[{"clusterId":"<id>"}]' \
--StartTime "<5_min_ago_ms>" \
--EndTime "<now_ms>" \
--Period 60
# Points on every data node → likely alive at process level
# No points on a node → suspect real loss / agent gap
# 3) Leave events in INSTANCELOG
aliyun elasticsearch ListSearchLog \
--region <region> --InstanceId <id> \
--type INSTANCELOG \
--query "removed OR left OR disconnect" \
--beginTime "<start_ms>" --endTime "<end_ms>"
# empty → no classic "node left" story
```
**Rule:** CMS CPU **present** + **no** `removed/left` → **not** classic offline → **Step 2**.
CMS CPU **absent** + leave logs → **real offline** → [sop-cluster-health.md](sop-cluster-health.md).
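
The `<5_min_ago_ms>`, `<now_ms>`, `<start_ms>` and `<end_ms>` placeholders in the commands above are written as epoch milliseconds. A minimal Python helper for producing them might look like this; the function name is illustrative, and it assumes both APIs accept millisecond timestamps as the placeholder names suggest.

```python
import time

def window_ms(minutes, now_s=None):
    """Return (start_ms, end_ms) covering the last `minutes` minutes."""
    now_s = time.time() if now_s is None else now_s
    end_ms = int(now_s * 1000)
    return end_ms - minutes * 60 * 1000, end_ms

start_ms, end_ms = window_ms(5)
# Paste the values into the CLI flags used above.
print(f'--StartTime "{start_ms}" --EndTime "{end_ms}"')
```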
### Step 2 — Prove the avalanche chain (CPU → pool → search failure)
```bash
# 1) CPU trend (~30m) — when did it cross ~80%?
aliyun cms DescribeMetricList \
--Namespace "acs_elasticsearch" \
--MetricName "NodeCPUUtilization" \
--Dimensions '[{"clusterId":"<id>"}]' \
--StartTime "<30_min_ago_ms>" --EndTime "<now_ms>" --Period 60
# 2) Search pool rejects
aliyun cms DescribeMetricList \
--Namespace "acs_elasticsearch" \
--MetricName "SearchThreadpoolRejected" \
--Dimensions '[{"clusterId":"<id>"}]' \
--StartTime "<30_min_ago_ms>" --EndTime "<now_ms>" --Period 60
# 3) Write pool rejects (ingest also melting?)
aliyun cms DescribeMetricList \
--Namespace "acs_elasticsearch" \
--MetricName "WriteThreadpoolRejected" \
--Dimensions '[{"clusterId":"<id>"}]' \
--StartTime "<30_min_ago_ms>" --EndTime "<now_ms>" --Period 60
# 4) "all shards failed" in INSTANCELOG
aliyun elasticsearch ListSearchLog \
--region <region> --InstanceId <id> \
--type INSTANCELOG \
--query "shards" \
--beginTime "<start_ms>" --endTime "<end_ms>"
```
**Example “confirmed” triad:**
1. CPU avg **> ~80%** for **≥ ~10 min**
2. `SearchThreadpoolRejected` **> 0** (rising)
3. `INSTANCELOG` shows **`all shards failed`** burst
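
The triad can be evaluated from the three data sets collected in this step. The sketch below is illustrative only: it reads "≥ ~10 min" as ten consecutive one-minute points (matching `--Period 60`), checks rejects only for "> 0" because whether the CMS series is cumulative or per-period is not stated here, and uses a hypothetical burst threshold of 5 log lines.

```python
def longest_run_above(points, threshold):
    """Length of the longest run of consecutive points above the threshold."""
    best = run = 0
    for value in points:
        run = run + 1 if value > threshold else 0
        best = max(best, run)
    return best

def avalanche_confirmed(cpu_points, rejected_points, log_lines,
                        cpu_threshold=80.0, min_minutes=10, min_log_hits=5):
    """True only when all three signals of the example triad are present."""
    cpu_ok = longest_run_above(cpu_points, cpu_threshold) >= min_minutes
    rejects_ok = any(value > 0 for value in rejected_points)
    hits = sum("all shards failed" in line.lower() for line in log_lines)
    return cpu_ok and rejects_ok and hits >= min_log_hits

# Made-up samples: 12 minutes above 80%, some rejects, six matching log lines.
cpu = [50.0] * 5 + [85.0] * 12
rejected = [0, 0, 3]
logs = ["SearchPhaseExecutionException: all shards failed"] * 6
print(avalanche_confirmed(cpu, rejected, logs))
```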
### Step 3 — Find the trigger (index / traffic / query)
**3.1 Index from rejected / slow paths**
```bash
aliyun elasticsearch ListSearchLog \
--region <region> --InstanceId <id> \
--type INSTANCELOG \
--query "rejected" \
--beginTime "<start_ms>" --endTime "<end_ms>"
# path like "/my-index/_search" → hot index
```
**3.2 Thread-pool excerpt (`EsRejectedExecutionException`)**
Example block (shape varies by version):
```text
QueueResizingEsThreadPoolExecutor[
name = <node>/search,
pool size = 4,
active threads = 4,
queued tasks = 1005,
queue capacity = 1000,
task execution EWMA = 371μs
]
```
| Field | Healthy-ish | Meltdown-ish | Meaning |
|-------|-------------|--------------|---------|
| active | < pool | **= pool** | all workers busy |
| queued | low | **> capacity** | queue overflow |
| EWMA | low μs | high μs | slower per-task work |
| completed | rising | stalls | throughput collapse |
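
A log excerpt like the one above can be parsed and compared against the table. This is a minimal Python sketch assuming the field labels appear exactly as in the example block; since the shape varies by version, the patterns may need adjusting, and the helper names are illustrative.

```python
import re

FIELD_PATTERNS = {
    "pool": r"pool size = (\d+)",
    "active": r"active threads = (\d+)",
    "queued": r"queued tasks = (\d+)",
    "capacity": r"queue capacity = (\d+)",
}

def parse_pool_excerpt(text):
    """Extract the integer fields that are present in the excerpt."""
    parsed = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            parsed[key] = int(match.group(1))
    return parsed

def looks_like_meltdown(pool):
    """All workers busy and more queued tasks than the queue capacity."""
    if not all(key in pool for key in FIELD_PATTERNS):
        return False
    return pool["active"] >= pool["pool"] and pool["queued"] > pool["capacity"]

excerpt = """QueueResizingEsThreadPoolExecutor[
name = <node>/search,
pool size = 4,
active threads = 4,
queued tasks = 1005,
queue capacity = 1000,
task execution EWMA = 371μs
]"""
pool = parse_pool_excerpt(excerpt)
print(pool, looks_like_meltdown(pool))
```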
**3.3 `Caused by` under `all shards failed`**
| `Caused by` | Meaning | Next |
|-------------|---------|------|
| `EsRejectedExecutionException` | pool saturated | CPU / QPS / capacity |
| `TaskCancelledException` (parent cancelled) | timeout cascade | parent timeouts / load |
| `CircuitBreakingException` | breaker | [sop-memory-gc.md](sop-memory-gc.md) |
| `NodeNotConnectedException` | shard/node channel dead | real disconnect path → [sop-cluster-health.md](sop-cluster-health.md) |
**3.4 Elasticsearch APIs (when REST is reachable)**
```bash
curl -sS --connect-timeout 10 --max-time 30 \
-u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
"http://${ES_ENDPOINT}/_nodes/hot_threads?threads=3"
curl -sS --connect-timeout 10 --max-time 30 \
-u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
"http://${ES_ENDPOINT}/_tasks?detailed=true&actions=*search*"
curl -sS --connect-timeout 10 --max-time 30 \
-u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
"http://${ES_ENDPOINT}/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.search,nodes.*.name"
```
Deeper query tuning: [sop-query-thread-pool.md](sop-query-thread-pool.md), [sop-cpu-load.md](sop-cpu-load.md).
### Step 4 — Report template
```markdown
## Conclusion: service avalanche (not physical node loss)
**Instance:** {instance_id} ({region})
**Topology:** {node_count} × {spec} (search workers ≈ {pool_size} — verify with _cat/thread_pool)
**Window:** {begin} ~ {end}
### Differentiation
- Control plane: active
- CMS CPU: continuous points (process likely alive)
- Leave logs: none
- Verdict: **meltdown**, not host-down
### Evidence chain
1. CPU inflection at {time}: {before}% → {after}%
2. Search rejects on {node}: {count}
3. `all shards failed` burst at {time}; indices: {index}
4. Root `Caused by`: {text}
### Chain sketch
{trigger} → CPU ~{cpu}% for {duration}m
→ search pool full (active {active}/{pool}, queue {queue}/{cap})
→ rejects ({n})
→ parent cancels → shard failures cascade
→ `all shards failed` → user-visible outage
### Trigger hypothesis
- Hot index: {index}
- Driver: {QPS spike / slow query / load test}
- EWMA: {a}μs → {b}μs
### Actions
{concrete steps}
```
---
## 4. Remediation
### Emergency (P0)
**1) Cut read traffic to the hot index / cluster** (fastest)
- Pause or slash client QPS; stop abusive load tests; route reads to a standby if you have one; enable gateway throttling.
**2) Cancel huge searches** (when `_tasks` works)
```bash
curl -sS -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
"http://${ES_ENDPOINT}/_tasks?detailed=true&actions=*search*"
curl -sS -u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
-X POST "http://${ES_ENDPOINT}/_tasks/{task_id}/_cancel"
```
**3) Watch recovery**
- CPU should fall within **~1–2 min** after load drops.
- When `search` queue drains, new searches succeed; `all shards failed` stops.
### Short term (days)
- **Scale vCPU/RAM** — search worker count tracks allocated processors (exact formula is **version/vendor-specific**; on small 7.x-style clusters a **2 vCPU** node often maps to **~4** `search` workers — easy to saturate).
- **Add data nodes** to spread QPS.
- Add **coordinating-only** nodes if reduce-phase / fan-out is the bottleneck.
### Long term
- Query hygiene: wildcards, huge aggs, deep paging, scripts — see [sop-query-thread-pool.md](sop-query-thread-pool.md).
- **Capacity:** as a rough rule, sustainable QPS scales with **workers / p95 latency**; keep **steady CPU < ~60%** and spikes **< ~80%** if possible.
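
The worker-count and capacity reasoning above can be made concrete with a short calculation. The sketch below assumes the default `search` pool sizing documented for recent Elasticsearch releases, `int(allocated_processors * 3 / 2) + 1`. As noted above this is version/vendor-specific, so verify against `_cat/thread_pool`. The throughput function is a back-of-the-envelope ceiling on shard-level search tasks, not a benchmark, and both function names are illustrative.

```python
def default_search_workers(allocated_processors):
    """Default `search` pool size: int(processors * 3 / 2) + 1 (verify per version)."""
    return (allocated_processors * 3) // 2 + 1

def max_shard_tasks_per_second(workers, p95_latency_ms):
    """Upper bound on shard-level search tasks per node per second.

    Each worker handles one task at a time, so throughput cannot exceed
    workers / latency. A search that fans out to several shards on the
    same node consumes several of these tasks.
    """
    return workers * 1000 / p95_latency_ms

workers = default_search_workers(2)               # 2 vCPU node
ceiling = max_shard_tasks_per_second(workers, 20)  # assuming a 20 ms p95 per shard task
print(workers, ceiling)
```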
---
## 5. Load-test checklist
### Before the test
| Check | Guidance | How |
|-------|------------|-----|
| SKU | prefer **≥ 4c16g** class nodes | `DescribeInstance` |
| Count | **≥ 3** nodes incl. coordinators if used | `DescribeInstance` |
| Search workers | know baseline (see `_cat/thread_pool`) | `GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected` |
| Replicas | **≥ 1** for hot indices | `_cat/indices` |
| `refresh_interval` | consider **30s** during soak | `GET /{index}/_settings` |
### During the test
| Metric | Safe band | Danger | CMS |
|--------|-----------|--------|-----|
| CPU avg | < ~60% | > ~80% | `NodeCPUUtilization` |
| Search rejected | 0 | > 0 | `SearchThreadpoolRejected` |
| Heap | < ~75% | > ~85% | `NodeHeapMemoryUtilization` |
| GC overhead | low | high | GC logs |
| Latency | < ~2× baseline | > ~5× | EWMA / APM |
### Ramp strategy
```text
Baseline → 50% → 75% → 100% → 120% of target QPS
Hold each step ≥ ~5m; require CPU < ~60% and rejected = 0 before advancing
If CPU > ~80% OR rejected > 0 → drop back one step immediately
```
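The ramp rules can be expressed as a small controller that a load-test driver calls once per evaluation interval. This is a sketch under stated assumptions: the percentages are read as fractions of the target QPS, the thresholds are the approximate ones from the rules above, and the function and constant names are illustrative.

```python
# Fractions of target QPS to run after the baseline measurement.
STEPS = [0.5, 0.75, 1.0, 1.2]

def next_step(index, cpu_pct, rejected, held_minutes):
    """Return the index into STEPS that the test should run next."""
    if cpu_pct > 80 or rejected > 0:
        return max(index - 1, 0)                # drop back one step immediately
    if held_minutes >= 5 and cpu_pct < 60:
        return min(index + 1, len(STEPS) - 1)   # healthy for long enough: advance
    return index                                # keep holding the current step

step = next_step(index=1, cpu_pct=55.0, rejected=0, held_minutes=5)
print(f"run at {STEPS[step]:.0%} of target QPS")
```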
---
## 6. Case study: 2000 QPS load test → meltdown
**Cluster:** 2 × **2c8g** (search workers ≈ **4** in observed build)
**Timeline:**
```text
T+0: 2000 QPS to index simple-ppt
T+4m: CPU ~5% → 80%+
T+10m: CPU ~85%+, GC overhead visible (~277ms/s class)
T+21m: search queue 1005 > 1000 → EsRejectedExecutionException (~385 rejects)
T+26m: burst of "all shards failed", EWMA ~67μs → ~371μs
T+26m: users report "node offline"
```
**Differentiation:** `DescribeInstance` active; CMS CPU still reporting; no `removed/left`; `rejected` + `all shards failed` present.
**Conclusion:** 2c nodes with **4** search workers cannot carry **2000 QPS** for that workload → CPU peg → pool meltdown.
**Mitigation:** stop test → recovery → retest on **4c16g** (or more nodes) with ramp rules above.
---
## Appendix: quick commands
```bash
aliyun elasticsearch DescribeInstance --region <region> --InstanceId <id>
aliyun cms DescribeMetricList --Namespace "acs_elasticsearch" \
--MetricName "NodeCPUUtilization" \
--Dimensions '[{"clusterId":"<id>"}]' \
--StartTime "<5_min_ago_ms>" --EndTime "<now_ms>" --Period 60
aliyun elasticsearch ListSearchLog --region <region> --InstanceId <id> \
--type INSTANCELOG --query "removed OR left OR disconnect" \
--beginTime "<start_ms>" --endTime "<end_ms>"
aliyun cms DescribeMetricList --Namespace "acs_elasticsearch" \
--MetricName "SearchThreadpoolRejected" \
--Dimensions '[{"clusterId":"<id>"}]' \
--StartTime "<start_ms>" --EndTime "<end_ms>" --Period 60
aliyun elasticsearch ListSearchLog --region <region> --InstanceId <id> \
--type INSTANCELOG --query "shards" \
--beginTime "<start_ms>" --endTime "<end_ms>"
aliyun elasticsearch ListSearchLog --region <region> --InstanceId <id> \
--type INSTANCELOG --query "rejected" \
--beginTime "<start_ms>" --endTime "<end_ms>"
curl -sS --connect-timeout 10 --max-time 30 \
-u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
"http://${ES_ENDPOINT}/_nodes/hot_threads?threads=3"
curl -sS --connect-timeout 10 --max-time 30 \
-u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
"http://${ES_ENDPOINT}/_tasks?detailed=true&actions=*search*"
curl -sS --connect-timeout 10 --max-time 30 \
-u "${ES_USERNAME:-elastic}:${ES_PASSWORD}" \
"http://${ES_ENDPOINT}/_nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.search,nodes.*.name"
```
FILE:references/sop-write-performance.md
# SOP: Ingest performance and write thread pools
**Covers:** `ThreadPool.WriteRejected` (P0), `ThreadPool.WriteQueueHigh` (P0), `Performance.IndexLatencyHigh` (P0), `Performance.IndexingDropped` (P0)
*(Frequency notes refer to internal case statistics — align names with your rule catalog.)*
---
## Diagnosis decision tree
```
Ingest symptoms
├── HTTP 429 "Too Many Requests" → write pool rejection (WriteRejected)
├── Ingest QPS drops > ~50% → IndexingDropped
├── Ingest latency above SLO → IndexLatencyHigh
└── Writes fail with index read-only → disk flood / blocks → [sop-disk-storage.md](sop-disk-storage.md)
Thread pools
└── GET _nodes/stats/thread_pool → write / bulk queue + rejected
```
---
## 1. Write thread-pool queue buildup (`WriteQueueHigh`, P0)
### Inspect
```bash
GET _nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.write,nodes.*.name
# queue = depth now; rejected = cumulative rejects
```
### Root causes
**A — Ingest QPS exceeds capacity**
- **Signals:** `write` queue grows with traffic.
- **Mitigate:** short term — lower client QPS; **fewer round-trips** via larger bulks **only when** each bulk stays within a safe payload (see **§2 Bulk guidelines** — avoid **oversized** single requests that time out); long term — **scale out**.
**B — Slow disk IO backs up writes**
- **Signals:** high `write` queue **and** high disk utilization (e.g. `NodeStatsDataDiskUtil`).
- **Mitigate:** [sop-disk-storage.md](sop-disk-storage.md) (IO + watermarks).
**C — Merges steal CPU / IO**
- **Signals:** `hot_threads` shows merge; queue high without extreme QPS.
- **Mitigate:** cap merge scheduler threads (see [sop-disk-storage.md](sop-disk-storage.md) / [sop-configuration.md](sop-configuration.md)); reduce segment pressure.
**D — `refresh_interval` too aggressive**
- **Signals:** `refresh_interval` at **1s** or **100ms** on heavy ingest.
- **Mitigate:**
```bash
PUT /index_name/_settings
{
"index.refresh_interval": "30s"
}
```
Tune to the freshness your product actually needs.
**E — Too few nodes for concurrent writers**
- **Signals:** (ingest QPS × payload) / node count exceeds per-node sustainable rate.
- **Mitigate:** add nodes.
### Raising the write queue (usually a poor long-term fix)
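Parameter names and supported pools **vary by version**. On Elasticsearch 5.x and later, `thread_pool.*` settings are static node-level settings (set in `elasticsearch.yml` or the instance's YML configuration, then restart), so the dynamic call below is accepted only on older versions; confirm in your distribution before applying.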
```bash
PUT _cluster/settings
{
"transient": {
"thread_pool.write.queue_size": 1000
}
}
```
Prefer **fewer, larger bulks** and **more capacity** over permanently huge queues.
---
## 2. Write thread-pool rejections (`WriteRejected`, P0)
### Symptoms
- Clients: **`HTTP 429 Too Many Requests`**
- Logs: `rejected execution of ...` (wording varies)
### Immediate response
**1) Thread pool snapshot**
```bash
GET _cat/thread_pool?v&h=name,node_name,active,queue,rejected
# inspect write / bulk rows
```
**2) Pattern**
- **Sustained** buildup → cluster **over capacity** → scale or cut traffic.
- **Spiky** rejects → tune client: **fewer parallel** bulk streams first; then adjust **docs / payload per bulk** within safe upper bounds (not “always bigger bulks”).
**3) Client-side (most direct)**
- **First:** cap **parallel** bulk workers / connections (reject storms often come from concurrency × retries).
- Target **~5–15 MB** per bulk request (depends on doc size); **oversized** bulks risk **single-request timeouts** and long write stalls — balance **fewer requests** vs **per-request cap**.
- Retries with **exponential backoff** on 429.
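
Retrying with exponential backoff on 429 can look like the following. This is a minimal Python sketch: `send` is a placeholder for whatever submits one bulk request and returns its HTTP status code, and the base delay, factor, cap, and jitter scheme are illustrative choices, not values from any client library.

```python
import random
import time

def backoff_delays(retries, base_s=0.5, factor=2.0, cap_s=30.0):
    """Delays before each retry: base, base*factor, ... capped at cap_s."""
    return [min(cap_s, base_s * factor ** i) for i in range(retries)]

def send_with_backoff(send, retries=5, sleep=time.sleep, jitter=random.random):
    """Call send(); while it returns 429, wait and retry up to `retries` times."""
    status = send()
    for delay in backoff_delays(retries):
        if status != 429:
            break
        # Jitter (50-100% of the delay) keeps many clients from retrying in lockstep.
        sleep(delay * (0.5 + jitter() / 2))
        status = send()
    return status

# Simulated server: rejects twice, then accepts.
responses = iter([429, 429, 200])
waits = []
final = send_with_backoff(lambda: next(responses), sleep=waits.append, jitter=lambda: 1.0)
print(final, waits)
```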
### Bulk guidelines
```text
Payload per bulk: often ~5–15 MB to start (adjust for doc size)
Docs per bulk: often ~500–2000 (if each doc is small)
Parallel bulks: often ≤ ~2× node count (validate under load)
Heuristic:
- Low CPU but many rejects → too much parallelism → reduce parallel bulks / client connections
- High CPU but slow ingest → bulks may be too small → increase docs per bulk only if each request
stays below a safe max size (avoid monster single bulks)
```
### Evidence interpretation: bulk QPS → write pool
**When this applies:** `ThreadPool.WriteRejected` (or equivalent alerts), `_nodes/stats/thread_pool` shows **`write`** queue / **`rejected`**, and/or **`hot_threads`** shows **`TransportShardBulkAction`**, **`IndexShard`**, **`DocumentParser`**. Typical chain: **high bulk QPS → write-pool saturation** (queue buildup, **`rejected`** counters, bulk/index stacks in **`hot_threads`**, often concurrent merges).
**Primary evidence (prefer over a generic “JVM-only” story):**
| Layer | What to cite |
|-------|----------------|
| Thread pools | `GET _nodes/stats/thread_pool` — `write` **queue**, **active**, **`rejected`** |
| Hot threads | `GET _nodes/hot_threads` — **`TransportShardBulkAction`**, indexing, **`DocumentParser`** |
| Merges | Often concurrently: **`ConcurrentMergeScheduler`** / **`IndexWriter`** / Lucene merge stacks after heavy writes |
**One-line causal chain for reports:** **bulk → write-thread saturation → segment merges → CPU**. **Old GC / heap pressure** from CMS or `JVMMemory.*`-style rules may appear as **concurrent or downstream** stress from indexing+merges—treat as **additive**, not a substitute headline when **`write` / bulk** evidence is clear.
**Report ordering when `ThreadPool.WriteRejected` and JVM P0 both fire (e.g. acceptance / high-QPS-bulk scenarios):** Do **not** imply “GC first, ingest second.” Use one of:
- **Causal chain (recommended for write-queue prompts):** **write overload / `rejected` → client retries → merge + heap pressure → Old GC STW + CPU / IO spikes** — `write` path is the **first** mechanism link; put **`write.rejected` / bulk saturation in the first sentence** of the conclusion when the prompt centers on **queue full / write reject**.
- **Dual P0 headlines:** **`ThreadPool.WriteRejected` (P0)** and **`JVMMemory.GCTimeRatioTooHigh` / Old-gen pressure (P0)** as **co-equal** critical findings; order **write / bulk** before **GC** in the body so the summary matches the one-line conclusion.
**Quantification (rubric-friendly):** Beside **“cumulative large”**, add **concrete counters** from **`_nodes/stats/thread_pool`**: per-node **`write.rejected`** and **`write.completed`** (same pool), optional **reject share** (roughly `rejected / (rejected + completed)`) on the hottest node(s). When aligning to control plane / CMS, cite **rule or event names** (e.g. **`ThreadPool.WriteRejected`**, **`HealthCheck.ThreadPoolSaturation`**) in addition to ES API stats.
**Rule-engine alignment:** `scripts/check_es_instance_health.py` **promotes `ThreadPool.WriteRejected` from P1 → P0** when **`JVMMemory.GCTimeRatioTooHigh`** is present in the same run, and includes **`completed_total` / `completed_by_node`** beside **`rejected`** for reject-share context; keep **human-written** conclusions in the same order as this subsection. Treat promotion as a **label heuristic for co-occurring P0 signals**; confirm the **write workload** (ingest-heavy vs mixed) with **`hot_threads`** / **`_tasks`** before asserting a strict write→GC causal chain.
**`rejected` semantics:** `thread_pool.write.rejected` (and `bulk` where present) is **cumulative since each data node JVM process started** — **not** “last 120 minutes” unless you show a **delta** between two samples. Say **“cumulative since node start”** in customer-facing summaries to avoid window misread (see also **§1 Inspect** above).
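Reject share and a windowed delta can both be derived from two `_nodes/stats/thread_pool` samples. The sketch below assumes the standard response shape (`nodes` → node id → `name` and `thread_pool.write` with `rejected` / `completed`). The helper names and the sample numbers are made up for illustration and are not the health-check script's own implementation.

```python
def write_pool_counters(stats):
    """Map node name -> (rejected, completed) for the `write` pool."""
    counters = {}
    for node in stats.get("nodes", {}).values():
        pool = node.get("thread_pool", {}).get("write", {})
        counters[node.get("name", "?")] = (pool.get("rejected", 0), pool.get("completed", 0))
    return counters

def reject_share(rejected, completed):
    """Roughly rejected / (rejected + completed)."""
    total = rejected + completed
    return rejected / total if total else 0.0

def window_delta(earlier, later):
    """Growth of (rejected, completed) per node between two samples."""
    delta = {}
    for name, (rej, done) in later.items():
        prev_rej, prev_done = earlier.get(name, (0, 0))
        if rej < prev_rej or done < prev_done:  # counters went backwards: node restarted
            prev_rej, prev_done = 0, 0
        delta[name] = (rej - prev_rej, done - prev_done)
    return delta

def sample(rejected, completed):
    return {"nodes": {"abc": {"name": "es-1", "thread_pool": {
        "write": {"rejected": rejected, "completed": completed}}}}}

first = write_pool_counters(sample(385, 99615))
second = write_pool_counters(sample(485, 100515))
delta = window_delta(first, second)
print(reject_share(*first["es-1"]), delta, reject_share(*delta["es-1"]))
```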
**Per-node asymmetry:** The node with the busiest **`write`** activity in **`hot_threads`** may **not** be the node with the highest **`rejected`** count—**primary shard placement, coordinating/ingest paths, and per-node stats need not align.** Add **one sentence** when both are cited to avoid false contradictions.
**Search-path hypotheses:** Under **write-only** overload, **deprioritize** fielddata / heavy-aggregation root causes. Open **`_nodes/stats/breaker`** / fielddata analysis only if **search**-path signals exist (`search` pool saturation, query rejections, documented read-heavy workload). For mixed CPU/GC symptoms, see [sop-memory-gc.md](sop-memory-gc.md).
---
## 3. High indexing latency (`IndexLatencyHigh`, P0)
### Steps
**Step 1 — Indexing stats**
```bash
GET _nodes/stats/indices/indexing?filter_path=nodes.*.indices.indexing
# index_time_in_millis / index_total ≈ avg indexing time per op (rough)
```
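The rough average in the comment above can be computed per node from the response. This is a minimal Python sketch assuming the standard `nodes.*.indices.indexing` shape with `index_total` and `index_time_in_millis`; the function name and sample values are illustrative. Both counters are cumulative since node start, so for a time window, subtract two samples before dividing.

```python
def avg_index_ms_per_op(stats):
    """Map node id -> index_time_in_millis / index_total (lifetime average)."""
    averages = {}
    for node_id, node in stats.get("nodes", {}).items():
        indexing = node.get("indices", {}).get("indexing", {})
        total = indexing.get("index_total", 0)
        if total:  # skip nodes that have indexed nothing to avoid dividing by zero
            averages[node_id] = indexing.get("index_time_in_millis", 0) / total
    return averages

# Made-up sample: 2000 operations taking 5000 ms in total.
sample = {"nodes": {
    "abc": {"indices": {"indexing": {"index_total": 2000, "index_time_in_millis": 5000}}},
    "def": {"indices": {"indexing": {"index_total": 0, "index_time_in_millis": 0}}},
}}
averages = avg_index_ms_per_op(sample)
print(averages)
```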
**Step 2 — Common drivers**
| Cause | Clues | Mitigation |
|-------|-------|------------|
| Disk IO | disk metrics high | disk tier / merge limits — [sop-disk-storage.md](sop-disk-storage.md) |
| CPU | CPU high | cheaper mappings, scale out — [sop-cpu-load.md](sop-cpu-load.md) |
| Refresh | very low `refresh_interval` | raise interval for bulk windows |
| Analysis cost | hot threads in analyzer | simplify analyzers / reduce field count |
| Translog `request` durability | fsync every op | `async` + `sync_interval` for bulk (risk tradeoff) |
| Huge documents | multi-MB docs | split fields / shrink payloads |
**Bulk-oriented index settings (example — validate for your ES version)**
```bash
PUT /index_name/_settings
{
"index": {
"refresh_interval": "30s",
"translog": {
"durability": "async",
"sync_interval": "30s"
},
"merge": {
"policy": {
"segments_per_tier": 30,
"max_merged_segment": "256mb"
}
}
}
}
```
`async` translog is **weaker** durability than `request` — use only when the business accepts the risk.
---
## 4. Field notes (from production)
### A — `refresh_interval` and write timeouts (counter-intuitive band)
**Pattern:** very rich analysis → **large refresh batches**; on some topologies **larger nodes** use **longer effective refresh windows**, so **each refresh flushes more data** and can spike latency → **write timeouts**.
**Mitigation (example from a real case):** shorten the window so each refresh does less work:
```bash
PUT /index/_settings
{
"index.refresh_interval": "100ms"
}
```
This **conflicts** with the usual “raise `refresh_interval` for ingest” advice — pick based on **measurement** (indexing rate, segment count, p99 write latency). See also [sop-query-thread-pool.md](sop-query-thread-pool.md) Section 4.11 for the **opposite** problem (refresh **too** fast → segment churn).
### B — ILM cold phase blocks writes
**Pattern:** log indices on ILM enter **cold** with **`read_only`** or move to **searchable snapshot / cold storage**; writes after the transition fail.
**Check**
```bash
GET /index_name/_settings
# look for index.blocks.write
```
**Mitigation**
```bash
PUT /index_name/_settings
{
"index.blocks.write": null
}
```
Use **temporarily** to recover; fix **ILM phase timing** so active writers stay on **hot** tiers, or route writes to the **new** active index.
### C — Queue full but CPU / disk look “fine”
**Pattern:** **many** connections sending **tiny** bulks (e.g. **1–10** docs each).
**Check**
```bash
GET _cat/thread_pool?v&h=name,node_name,active,queue,rejected
GET _nodes/stats?filter_path=nodes.*.process.open_file_descriptors
```
**Mitigate:** fewer connections, **much larger** bulks (e.g. toward **500+** small docs per request when safe). **1000 bulks × 1 doc** is vastly more expensive than **1 bulk × 1000 docs**.
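On the client side, grouping documents before sending is what turns many tiny bulks into a few reasonable ones. The following Python sketch batches by both document count and estimated payload size. The caps and the function name are illustrative assumptions, and the size estimate covers only the JSON-encoded document, not the bulk action lines.

```python
import json

def batch_docs(docs, max_docs=1000, max_bytes=10 * 1024 * 1024):
    """Yield lists of docs so each bulk stays under both caps.

    A single document larger than max_bytes is still emitted, alone in its batch.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8")) + 1  # +1 for the trailing newline
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch

docs = [{"n": i} for i in range(2500)]
sizes = [len(batch) for batch in batch_docs(docs, max_docs=1000)]
print(sizes)
```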
---
## 5. Sudden ingest collapse (`Performance.IndexingDropped`, P0)
### Branching
```text
Ingest drops
├── Same-time drop across many instances → platform / upstream (e.g. shared control plane, network path)
└── Single instance
├── Disk full / flood read-only → sop-disk-storage.md
├── Node left / shards relocating → sop-cluster-health.md
├── Planned change → often temporary; see sop-activating-change-stuck.md if stuck
└── Client throttle / feature flag → check app logs
```
---
## Appendix: quick commands (ingest)
```bash
GET _cat/thread_pool?v&h=name,node_name,active,queue,rejected
GET _nodes/stats/thread_pool?filter_path=nodes.*.thread_pool.write,nodes.*.name
GET _nodes/stats/indices/indexing
GET _cat/indices?v&h=index,docs.count,indexing.index_total,indexing.index_time,indexing.index_failed
PUT /index_name/_settings
{
"index.refresh_interval": "30s"
}
PUT /index_name/_settings
{
"index": {
"translog": {
"durability": "async",
"sync_interval": "30s"
}
}
}
PUT /index_name/_settings
{
"index.merge.scheduler.max_thread_count": 1
}
```
FILE:references/verification-method.md
# Verification method
How to validate the Elasticsearch instance diagnosis skill against the **current architecture**:
- Control plane + CMS: `aliyun` CLI OpenAPI
- Health rule engine: `python3 scripts/check_es_instance_health.py`
- Engine-level collection: `curl` to ES REST APIs
When `curl` fails (**401**, **timeouts**, **connection refused**, TLS mismatch), use the progressive guide **[es-api-call-failures.md](es-api-call-failures.md)** before re-running checks blindly.
---
## 1. Environment and credentials
### 1.1 CLI and profile
```bash
aliyun version
aliyun configure list
aliyun --profile <profile_name> sts get-caller-identity
```
**Checklist**:
- [ ] CLI works (recommend >= 3.3.1)
- [ ] `aliyun configure list` shows at least one `Valid` profile
- [ ] The chosen profile returns caller identity successfully
### 1.2 Direct ES credentials
```bash
[[ -n "$ES_ENDPOINT" ]] && echo "ES_ENDPOINT: SET" || echo "ES_ENDPOINT: NOT SET"
[[ -n "$ES_PASSWORD" ]] && echo "ES_PASSWORD: SET" || echo "ES_PASSWORD: NOT SET"
```
**Checklist**:
- [ ] `ES_ENDPOINT` and `ES_PASSWORD` are set when engine checks are required
- [ ] Plain-text password is not echoed
---
## 2. Control-plane OpenAPI (`aliyun` CLI)
These steps can be run standalone or as manual backfill in a runbook.
### 2.1 Elasticsearch OpenAPI
```bash
aliyun --profile <profile_name> elasticsearch DescribeInstance \
--region <region> \
--InstanceId <instance_id>
aliyun --profile <profile_name> elasticsearch ListSearchLog \
--region <region> \
--InstanceId <instance_id> \
--type INSTANCELOG \
--query "*" \
--beginTime <epoch_ms> \
--endTime <epoch_ms>
aliyun --profile <profile_name> elasticsearch ListActionRecords \
--region <region> \
--InstanceId <instance_id>
aliyun --profile <profile_name> elasticsearch ListAllNode \
--region <region> \
--InstanceId <instance_id>
```
### 2.2 CMS OpenAPI
```bash
aliyun --profile <profile_name> cms DescribeMetricList \
--region <region> \
--Namespace acs_elasticsearch \
--MetricName ClusterStatus \
--Dimensions '[{"clusterId":"<instance_id>"}]' \
--StartTime <epoch_ms> \
--EndTime <epoch_ms> \
--Period 300
aliyun --profile <profile_name> cms DescribeSystemEventAttribute \
--region <region> \
--Product elasticsearch \
--SearchKeywords <instance_id> \
--StartTime <epoch_ms> \
--EndTime <epoch_ms>
aliyun --profile <profile_name> cms DescribeMetricMetaList \
--region <region> \
--Namespace acs_elasticsearch
```
**Checklist**:
- [ ] Each call returns `Code=200` or `Success=true`
- [ ] `DescribeMetricList` returns datapoints (or empty series without error)
- [ ] `ListSearchLog` returns `Result` or an empty array (not a hard failure)
- [ ] `ListActionRecords`, `ListAllNode`, `DescribeMetricMetaList` return structured JSON
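
The first checklist item can be applied mechanically to captured CLI output. This is a sketch under the assumption that the response is a JSON object carrying a `Code` or `Success` field as named in the checklist; responses that carry neither field need a different check. `call_succeeded` is an illustrative name.

```python
import json


def call_succeeded(raw: str) -> bool:
    """Apply the checklist rule: Code equals 200 or Success is true."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return False  # output is not structured JSON at all
    if not isinstance(body, dict):
        return False
    # str() tolerates both 200 / "200" and true / "true" encodings.
    code_ok = str(body.get("Code", "")) == "200"
    success_ok = str(body.get("Success", "")).lower() == "true"
    return code_ok or success_ok
```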
---
## 3. Health-check script
### 3.1 CLI data source
```bash
python3 scripts/check_es_instance_health.py \
-i <instance_id> -r <region> \
--window 60 \
--data-source cli \
--profile <profile_name>
```
**Checklist**:
- [ ] Script completes without traceback
- [ ] Report is structured (P0/P1/P2 findings, evidence, remediation)
- [ ] Metric summary matches the live instance state
### 3.2 Injected input + auto
```bash
python3 scripts/check_es_instance_health.py \
-i <instance_id> -r <region> \
--data-source auto \
--input-json-file /path/to/diag-input.json \
--profile <profile_name>
```
**Checklist**:
- [ ] Fields from input JSON are preferred when present
- [ ] Missing fields fall back to CLI collection
- [ ] With `--data-source input`, no OpenAPI calls are made
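
The precedence described by this checklist can be sketched as follows. The script's real internals are not shown here, so the function name, the field names, and the `collect_via_cli` callback are all hypothetical; the sketch only illustrates the "injected first, CLI as fallback, no calls in input mode" behavior.

```python
from typing import Any, Callable, Dict, Iterable


def merge_sources(
    injected: Dict[str, Any],
    fields: Iterable[str],
    collect_via_cli: Callable[[str], Any],
    mode: str = "auto",
) -> Dict[str, Any]:
    """Prefer injected fields; in 'auto' mode fill the gaps via the CLI collector."""
    merged: Dict[str, Any] = {}
    for name in fields:
        if injected.get(name) is not None:
            merged[name] = injected[name]          # injected value wins
        elif mode == "auto":
            merged[name] = collect_via_cli(name)   # fallback path
        # mode == "input": leave the field absent and never call OpenAPI
    return merged
```

Passing the collector as a callback keeps the merge logic testable without network access.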
---
## 4. Engine-level ES APIs (`curl`)
### 4.1 Connectivity
```bash
curl -sS -u "elastic:${ES_PASSWORD}" \
"http://${ES_ENDPOINT#*://}/_cluster/health?pretty"
```
### 4.2 Key endpoints
```bash
curl -sS -u "elastic:${ES_PASSWORD}" \
-H "Content-Type: application/json" \
-X POST "http://${ES_ENDPOINT#*://}/_cluster/allocation/explain?pretty" \
-d '{}'
curl -sS -u "elastic:${ES_PASSWORD}" \
"http://${ES_ENDPOINT#*://}/_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason&s=state"
curl -sS -u "elastic:${ES_PASSWORD}" \
"http://${ES_ENDPOINT#*://}/_nodes/stats/thread_pool?pretty"
```
**Checklist**:
- [ ] JSON or cat text is returned
- [ ] No auth or connection errors
- [ ] Fields useful for root-cause analysis appear (`unassigned.reason`, `thread_pool.rejected`, etc.)
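
Whitespace-splitting the text form of `_cat/shards` is fragile, because unassigned shards have an empty `node` column. A sketch of a more robust summary, assuming the same request is issued with `format=json` added so that each row arrives as a JSON object keyed by column name; `unassigned_reasons` is an illustrative helper:

```python
import json
from collections import Counter
from typing import Dict


def unassigned_reasons(cat_shards_json: str) -> Dict[str, int]:
    """Count UNASSIGNED shards per unassigned.reason from _cat/shards JSON output."""
    rows = json.loads(cat_shards_json)
    counts = Counter(
        row.get("unassigned.reason") or "UNKNOWN"
        for row in rows
        if row.get("state") == "UNASSIGNED"
    )
    return dict(counts)
```

A non-empty result points at the reasons worth passing to `_cluster/allocation/explain`.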
---
## 5. Negative / edge cases
### 5.1 Invalid CLI profile
```bash
python3 scripts/check_es_instance_health.py \
-i <instance_id> -r <region> \
--data-source cli \
--profile not-exist
```
**Expected**: Clear profile / authentication error (no silent success).
### 5.2 Non-existent instance
```bash
python3 scripts/check_es_instance_health.py \
-i es-cn-invalid -r cn-hangzhou \
--data-source cli
```
**Expected**: Instance-not-found or invalid resource error; script does not crash.
### 5.3 Wrong HTTP scheme for endpoint
```bash
curl -sS -u "elastic:${ES_PASSWORD}" \
"https://${ES_ENDPOINT#*://}/_cluster/health?pretty"
```
**Expected**: If the scheme does not match what the endpoint serves (TLS mismatch), curl may report `WRONG_VERSION_NUMBER`; fix it by switching between `http://` and `https://` to match the endpoint.
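
The failure classes named in this document (401, timeouts, connection refused, TLS mismatch) can be told apart from curl's stderr and the HTTP status. A rough sketch; the matched substrings are assumptions about typical curl/OpenSSL wording and may differ between versions, and `classify_curl_failure` is a hypothetical helper:

```python
def classify_curl_failure(stderr: str, http_status: int = 0) -> str:
    """Map curl stderr text and HTTP status to a coarse failure class with a hint."""
    # Normalize so both 'WRONG_VERSION_NUMBER' and 'wrong version number' match.
    text = stderr.lower().replace("_", " ")
    if "wrong version number" in text:
        return "tls-mismatch: switch between http:// and https://"
    if http_status == 401:
        return "auth: check the username and ES_PASSWORD"
    if "connection refused" in text:
        return "refused: check host, port, and network reachability"
    if "timed out" in text:
        return "timeout: check the network path to the endpoint"
    return "unknown: see es-api-call-failures.md"
```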
FILE:scripts/_common.py
"""
_common.py: shared time utilities for the health-check rule engine.
"""
import re
import sys
from datetime import datetime, timedelta, timezone
from typing import Optional

# ---------------------------------------------------------------------------
# Time helpers
# ---------------------------------------------------------------------------


def to_ms_timestamp(time_str: str) -> int:
    """
    Parse several time formats into epoch milliseconds.

    Supported:
      -60                  → now minus 60 minutes
      2024-01-01 00:00:00  → full datetime
      2024-01-01T00:00:00  → ISO-style
      2024-01-01           → midnight that day

    Absolute formats are interpreted in the local timezone.
    """
    # Relative form: a signed integer is an offset in minutes from now.
    try:
        minutes = int(time_str)
        dt = datetime.now() + timedelta(minutes=minutes)
        return int(dt.timestamp() * 1000)
    except (ValueError, TypeError):
        pass

    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return int(datetime.strptime(time_str, fmt).timestamp() * 1000)
        except (ValueError, TypeError):
            continue

    # Diagnostics go to stderr so stdout stays clean for the report.
    print(f"Error: cannot parse time string '{time_str}'", file=sys.stderr)
    print(
        "Supported: '2024-01-01 00:00:00' / '2024-01-01T00:00:00' / '2024-01-01' / '-60'",
        file=sys.stderr,
    )
    sys.exit(1)


def format_timestamp(ts_ms: int) -> str:
    """Epoch ms → local time string YYYY-MM-DD HH:MM:SS."""
    try:
        return datetime.fromtimestamp(int(ts_ms) / 1000).strftime("%Y-%m-%d %H:%M:%S")
    except (TypeError, ValueError, OverflowError, OSError):
        # Fall back to the raw value rather than failing the whole report.
        return str(ts_ms)


def now_ms() -> int:
    """Current time in epoch milliseconds."""
    return int(datetime.now().timestamp() * 1000)


def ago_ms(minutes: int) -> int:
    """Epoch ms for *minutes* ago."""
    return int((datetime.now() - timedelta(minutes=minutes)).timestamp() * 1000)


def ms_to_str(ms: int) -> str:
    """Alias of format_timestamp."""
    return format_timestamp(ms)


def parse_utc(s: str) -> Optional[datetime]:
    """
    Parse ISO UTC strings into timezone-aware datetime (UTC).
    Supports: ...Z / +00:00 / [UTC] suffix / fractional seconds stripped.
    """
    if not s:
        return None
    # Drop the "[UTC]" suffix and any fractional seconds before matching formats.
    clean = re.sub(r"\.\d+", "", s.replace("[UTC]", "").strip())
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%M:%S+00:00", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(clean, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return None
FILE:scripts/check_es_instance_health.py
#!/usr/bin/env python3
"""
Alibaba Cloud Elasticsearch instance health check.
Concurrently collects instance status, CloudMonitor metrics, system events, and key logs,
applies Elasticsearch health-event rules (20260318 baseline), and prints a structured report
(findings + evidence + remediation).
Each Finding carries:
event_code — e.g. HealthCheck.ClusterUnhealthy
reason_code — e.g. Cluster.StatusRed
"""
import argparse
import json
import re
import sys
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
# Shared helpers (time utilities) from _common
from _common import now_ms, ago_ms, ms_to_str, parse_utc
# Control-plane + CMS data collection via aliyun CLI
from openapi_cli_collect import (
METRIC_DEFINITIONS,
normalize_datapoints,
fetch_instance_status_info,
fetch_metrics_batch,
fetch_events,
fetch_log_items,
)
# ---------------------------------------------------------------------------
# Thresholds (ES health-event rules baseline 20260318)
# ---------------------------------------------------------------------------
THRESHOLDS = {
# Cluster health (rules #1–3)
"cluster_red": 2, # ClusterStatus == 2 → Cluster.StatusRed (P0)
"cluster_yellow": 1, # ClusterStatus == 1 → Cluster.StatusYellow (P1)
# CPU (rules #12–13: P0=70%, P1=60%; rule #11 peak: P0=95%, P1=80%)
"cpu_sustained_critical": 70.0, # avg CPU > 70% → CPU.PersistUsageHigh (P0)
"cpu_sustained_warning": 60.0, # avg CPU > 60% → CPU.PersistUsageHigh (P1)
"cpu_peak_critical": 95.0, # max CPU ≥ 95% → CPU.PeakUsageHigh (P0)
"cpu_peak_warning": 80.0, # max CPU ≥ 80% → CPU.PeakUsageHigh (P1)
# Heap (rules #8–9: warning 75% P1, critical 85% P0)
"heap_critical_avg": 85.0, # avg heap > 85% → JVMMemory.OldGenUsageCritical (P0)
"heap_warning_avg": 75.0, # avg heap > 75% → JVMMemory.OldGenUsageHigh (P1)
# Disk utilization (rules #13–14)
"disk_critical_max": 85.0, # max disk > 85% → Disk.UsageCritical (P0)
"disk_warning_max": 75.0, # max disk > 75% → Disk.UsageHigh (P1)
# Disk IO (rule #19: IO util > 90% P0)
"disk_io_max": 90.0, # max IO util > 90% → Disk.IOPerformancePoor (P0)
# Old GC rate (rule #22: >1/min → P1)
"old_gc_rate_per_min": 1.0, # max GC count per sample > 1 → JVMMemory.GCRateTooHigh (P1)
# GC time ratio (rule #18: >10% wall time → P0)
"gc_time_ratio": 0.10, # avg GC duration / 60000ms > 10% → JVMMemory.GCTimeRatioTooHigh (P0)
# CPU imbalance CV (rule #40) + absolute baseline (Bug-New-01)
"load_imbalance_cv": 0.3, # CV > 0.3 → Balancing.NodeCPUUnbalanced (P1)
"load_imbalance_min_cpu": 10.0, # require max node avg CPU > 10% before CV check (idle-cluster guard)
# Disk / memory imbalance
"disk_imbalance_cv": 0.3, # CV > 0.3 → Balancing.NodeDiskUnbalanced
"disk_imbalance_warning": 75.0, # max node > 75% → P1
"disk_imbalance_critical": 85.0, # max node > 85% → P0
"memory_imbalance_cv": 0.3, # CV > 0.3 → Balancing.NodeMemoryUnbalanced
"memory_imbalance_min": 75.0, # fire only if max heap > 75%
# Management-plane duration
"activating_stuck_min": 30, # activating longer than N min → stuck (P0)
"event_stuck_min": 60, # Executing event longer than N min → stuck (P1)
}
PRIORITY_ICON = {"P0": "🔴", "P1": "🟡", "P2": "🔵"}
PRIORITY_LABEL = {"P0": "Critical", "P1": "Warning", "P2": "Info"}
# ---------------------------------------------------------------------------
# Data structures
# ---------------------------------------------------------------------------
@dataclass
class Finding:
    """One diagnostic finding."""
    event_code: str   # e.g. HealthCheck.ClusterUnhealthy
    reason_code: str  # e.g. Cluster.StatusRed
    name: str
    priority: str     # P0 / P1 / P2
    category: str
    description: str
    evidence: Dict[str, Any] = field(default_factory=dict)
    remediation: List[str] = field(default_factory=list)

    @property
    def code(self) -> str:
        """Backward compatible alias for event_code."""
        return self.event_code
# ---------------------------------------------------------------------------
# Statistics helpers
# ---------------------------------------------------------------------------
def _cv(values: List[float]) -> float:
    """Coefficient of variation (population std / mean); 0.0 for < 2 values or zero mean."""
    if len(values) < 2:
        return 0.0
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return std / mean


def _summarize(datapoints: List[Dict], metric_name: str) -> Dict[str, Dict]:
    """
    Group by node/cluster; compute avg / max / min / latest.
    Uses normalize_datapoints + METRIC_DEFINITIONS.
    Returns {group_key: {"avg", "max", "min", "latest", "points"}}.
    """
    meta = METRIC_DEFINITIONS.get(metric_name, {
        "group_field": "nodeIP",
        "value_field": "Average",
    })
    grouped = normalize_datapoints(datapoints, meta)
    result: Dict[str, Dict] = {}
    for key, points in grouped.items():
        valid_pts = [p for p in points if p["value"] is not None]
        if not valid_pts:
            continue
        vals = [p["value"] for p in valid_pts]
        # latest: newest sample by timestamp (current state), not window max
        latest_pt = max(valid_pts, key=lambda p: p.get("timestamp", 0))
        result[key] = {
            "avg": sum(vals) / len(vals),
            "max": max(vals),
            "min": min(vals),
            "latest": latest_pt["value"],
            "points": valid_pts,  # raw points for time-series CV
        }
    return result
def _compute_timeseries_cv(summary: Dict[str, Dict]) -> Dict[str, Any]:
    """
    Time-series load imbalance: CV across nodes per timestamp, track peak window.
    Returns:
        cv_avg, cv_max, peak_cv, peak_time, peak_values
    """
    if len(summary) < 2:
        return {"cv_avg": 0.0, "cv_max": 0.0, "peak_cv": 0.0, "peak_time": 0, "peak_values": {}}

    # CV from per-node avg and per-node max series
    avg_vals = [s["avg"] for s in summary.values()]
    max_vals = [s["max"] for s in summary.values()]
    cv_avg = _cv(avg_vals)
    cv_max = _cv(max_vals)

    # Per-timestamp cross-node CV; track worst imbalance bucket
    all_timestamps = set()
    for s in summary.values():
        for pt in s.get("points", []):
            all_timestamps.add(pt.get("timestamp", 0))

    peak_cv = cv_avg
    peak_time = 0
    peak_values = {}
    for ts in sorted(all_timestamps):
        ts_values = {}
        for node, s in summary.items():
            for pt in s.get("points", []):
                if pt.get("timestamp") == ts and pt["value"] is not None:
                    ts_values[node] = pt["value"]
                    break
        if len(ts_values) >= 2:
            vals = list(ts_values.values())
            ts_cv = _cv(vals)
            if ts_cv > peak_cv:
                peak_cv = ts_cv
                peak_time = ts
                peak_values = ts_values

    return {
        "cv_avg": cv_avg,
        "cv_max": cv_max,
        "peak_cv": peak_cv,
        "peak_time": peak_time,
        "peak_values": peak_values,
    }
# Metrics subset for diagnosis (from METRIC_DEFINITIONS)
_DIAG_METRICS = [
"ClusterStatus",
"ClusterDisconnectedNodeCount",
"ClusterNodeCount",
"ClusterShardCount",
"ClusterQueryQPS",
"ClusterIndexQPS",
"NodeCPUUtilization",
"NodeHeapMemoryUtilization",
"NodeDiskUtilization",
"NodeFreeStorageSpace",
"NodeLoad_1m",
"NodeStatsDataDiskUtil",
"JVMGCOldCollectionCount",
"JVMGCOldCollectionDuration",
]
def _cms_period_for_window(window_min: int) -> str:
    """
    CMS sampling period from window length:
    - window ≤ 30 min → period 60s (finer; ~30 min retention for 60s series)
    - window > 30 min → period 300s (31-day retention; avoids empty CMS at 60s boundary)
    Bug-New-02: window=60 could align begin_ms outside 60s retention → empty datapoints.
    """
    return "60" if window_min <= 30 else "300"


def _get_cluster_setting(raw_settings: Dict, *keys: str) -> Optional[str]:
    """Walk transient → persistent → defaults; return first string leaf at *keys."""
    for section in ("transient", "persistent", "defaults"):
        node = raw_settings.get(section, {})
        for k in keys:
            node = node.get(k, {}) if isinstance(node, dict) else None
            if node is None:
                break
        if node and isinstance(node, str):
            return node
    return None
def _compute_security_score(findings: List[Finding]):
"""
Overall health grade from finding severities.
Returns (grade, icon, description):
A — Healthy: no issues
B — Stable: P2 hints only
C — Warning: P1 present, respond within ~30 minutes
D — Critical: P0 present, immediate action
Rule: any P0 → D; P1 without P0 → C; P2 only → B; none → A.
"""
p0 = [f for f in findings if f.priority == "P0"]
p1 = [f for f in findings if f.priority == "P1"]
p2 = [f for f in findings if f.priority == "P2"]
if p0:
return "D", "🔴", f"Critical — {len(p0)} P0 finding(s) require immediate action"
if p1:
return "C", "🟡", f"Warning — {len(p1)} P1 finding(s); respond within ~30 minutes"
if p2:
return "B", "🔵", f"Stable — {len(p2)} P2 informational finding(s) (no runtime risk)"
return "A", "✅", "Healthy — no findings; instance appears normal"
# ---------------------------------------------------------------------------
# Validation
# ---------------------------------------------------------------------------
# Characters rejected in ES user/password (command-injection hardening for curl args)
_DANGEROUS_CHARS = set('`$(){}[]|;&<>\n\r')


def _validate_endpoint_consistency(
    endpoint: str,
    instance_id: Optional[str],
) -> Optional[str]:
    """
    Endpoint vs instance_id consistency (Bug-07).
    For *.aliyuncs.com public endpoints, require instance_id substring in URL
    to avoid optional ES checks hitting the wrong cluster.
    Returns:
        None: proceed
        "skip": mismatch; warning printed; caller should skip optional checks
        "info": private endpoint; informational message only
    """
    if not instance_id:
        return None
    is_aliyun_public = "aliyuncs.com" in endpoint
    instance_in_endpoint = instance_id.lower() in endpoint.lower()
    if is_aliyun_public and not instance_in_endpoint:
        print(
            f"⚠️ [ES config check] Endpoint mismatch (Bug-07):\n"
            f" ES_ENDPOINT={endpoint}\n"
            f" Does not contain diagnosed instance_id={instance_id}; endpoint may belong to another instance.\n"
            f" Skipping optional config checks (circuit breakers / watermarks / thread pools) to avoid cross-instance false positives.\n"
            f" Fix: export ES_ENDPOINT=\"http://<{instance_id}-endpoint>:9200\"",
            file=sys.stderr,
        )
        return "skip"
    elif not is_aliyun_public and not instance_in_endpoint:
        # Private IP/hostname; ownership cannot be inferred automatically
        print(
            f"ℹ️ [ES config check] Using private endpoint {endpoint} (not *.aliyuncs.com).\n"
            f" Confirm this endpoint belongs to instance {instance_id}; otherwise results may reflect another cluster.",
            file=sys.stderr,
        )
        return "info"
    return None


def _validate_credentials_safe(username: str, password: str) -> bool:
    """
    Reject username/password if they contain characters unsafe for curl argv.
    Returns:
        True if safe to pass to curl -u; False if unsafe (message printed).
    """
    if any(c in _DANGEROUS_CHARS for c in password) or any(c in _DANGEROUS_CHARS for c in username):
        print(
            f"⚠️ [ES config check] Username/password contain unsafe characters; skipping Elasticsearch API collection.\n"
            f" Do not use these in credentials: ` $ ( ) {{ }} [ ] | ; & < > newlines",
            file=sys.stderr,
        )
        return False
    return True
# ---------------------------------------------------------------------------
# Data fetch / transform
# ---------------------------------------------------------------------------
def _fetch_error_logs(
instance_id: str,
region_id: str,
begin_ms: int,
end_ms: int,
profile: Optional[str] = None,
) -> List[Dict]:
"""Fetch error/warning logs; normalize fields for rule engine."""
try:
raw_items = fetch_log_items(
instance_id, region_id,
log_type="INSTANCELOG",
begin_ms=begin_ms,
end_ms=end_ms,
query="Exception OR OutOfMemory OR OOM OR ERROR",
profile=profile,
)
logs = []
for d in raw_items:
cc = d.get("contentCollection") or {}
if not isinstance(cc, dict):
cc = {}
logs.append({
"level": cc.get("level", ""),
"host": cc.get("host", d.get("host", "")),
"time": cc.get("time", ""),
"content": (cc.get("content", "") or "")[:300],
})
return logs
except Exception:
return []
def _load_json_bundle(inline_json: Optional[str], json_file: Optional[str]) -> Dict[str, Any]:
"""Load injected diagnostic bundle from CLI args."""
if inline_json:
obj = json.loads(inline_json)
return obj if isinstance(obj, dict) else {}
if json_file:
with open(json_file, "r", encoding="utf-8") as f:
obj = json.load(f)
return obj if isinstance(obj, dict) else {}
return {}
# ---------------------------------------------------------------------------
# Rules — CMS / management metrics
# ---------------------------------------------------------------------------
def _check_instance_status(info: Dict) -> List[Finding]:
findings = []
if not info or "_error" in info:
return findings
status = info.get("status", "")
updated_at_str = info.get("updated_at", "")
if status == "activating" and updated_at_str:
updated_at = parse_utc(updated_at_str)
if updated_at:
elapsed_min = (datetime.now(timezone.utc) - updated_at).total_seconds() / 60
if elapsed_min > THRESHOLDS["activating_stuck_min"]:
findings.append(Finding(
event_code="ManagementPlane.ActivatingStuck",
reason_code="ManagementPlane.ActivatingStuck",
name="Activating change stuck too long",
priority="P0",
category="Management plane",
description=(
f"Instance has stayed in activating for {int(elapsed_min)} minute(s) "
f"(last updated: {updated_at_str}); control-plane workflow may be stuck."
),
evidence={
"current_status": status,
"last_updated_at": updated_at_str,
"elapsed_minutes": int(elapsed_min),
},
remediation=[
"In the console or CMS, check for system events stuck in Executing",
"If no legitimate in-flight change, trigger a harmless update (e.g. description) to refresh state",
"Contact Alibaba Cloud support if the workflow remains stuck",
],
))
elif status == "inactive":
findings.append(Finding(
event_code="ManagementPlane.InstanceInactive",
reason_code="ManagementPlane.InstanceInactive",
name="Instance frozen (inactive)",
priority="P0",
category="Management plane",
description="Instance is inactive (frozen), often due to billing or manual freeze; service unavailable.",
evidence={"current_status": status},
remediation=[
"Verify account balance and payment status",
"After payment, contact support to unfreeze if needed",
],
))
elif status == "invalid":
findings.append(Finding(
event_code="ManagementPlane.InstanceInvalid",
reason_code="ManagementPlane.InstanceInvalid",
name="Instance invalid",
priority="P0",
category="Management plane",
description="Instance is invalid; service unavailable.",
evidence={"current_status": status},
remediation=["Contact Alibaba Cloud support to determine cause and remediation"],
))
return findings
def _check_cluster_health(metrics: Dict[str, List[Dict]]) -> List[Finding]:
findings = []
# --- Cluster health (CMS ClusterStatus) ---
cluster_status_dps = metrics.get("ClusterStatus", [])
if cluster_status_dps:
summary = _summarize(cluster_status_dps, "ClusterStatus")
for cid, stat in summary.items():
# Use latest sample, not max, so stale Red in the window does not override current state
current_status = stat["latest"]
if current_status >= THRESHOLDS["cluster_red"]:
findings.append(Finding(
event_code="HealthCheck.ClusterUnhealthy",
reason_code="Cluster.StatusRed",
name="Cluster status Red",
priority="P0",
category="Cluster health",
description="Cluster is Red: at least one primary shard is unassigned; some data is unavailable.",
evidence={
"cluster_status_latest": f"{int(current_status)} (Red)",
"cluster_status_window_max": f"{int(stat['max'])}",
"sample_point_count": len(cluster_status_dps),
},
remediation=[
"GET _cat/indices?v&health=red # list red indices",
"GET _cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason # shard state",
"GET _cluster/allocation/explain # unassigned reason",
"Fix root cause: offline node → recover; disk full → free space / expand; bad settings → correct",
],
))
elif current_status >= THRESHOLDS["cluster_yellow"]:
findings.append(Finding(
event_code="HealthCheck.ClusterUnhealthy",
reason_code="Cluster.StatusYellow",
name="Cluster status Yellow",
priority="P1",
category="Cluster health",
description="Cluster is Yellow: all primaries assigned but at least one replica is unassigned.",
evidence={
"cluster_status_latest": f"{int(current_status)} (Yellow)",
"cluster_status_window_max": f"{int(stat['max'])}",
},
remediation=[
"GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason | grep UNASSIGNED",
"GET _cluster/allocation/explain # replica allocation reason",
"If disk pressure: delete cold data or expand capacity",
"If relocation in progress: wait for completion",
],
))
# --- Disconnected nodes ---
disconn_dps = metrics.get("ClusterDisconnectedNodeCount", [])
if disconn_dps:
summary = _summarize(disconn_dps, "ClusterDisconnectedNodeCount")
for _, stat in summary.items():
if stat["max"] > 0:
findings.append(Finding(
event_code="HealthCheck.ClusterUnhealthy",
reason_code="Node.Disconnected",
name="Node disconnected or failed",
priority="P0",
category="Cluster health",
description=f"Up to {int(stat['max'])} node(s) disconnected from the cluster; data on those nodes may be unavailable.",
evidence={
"max_disconnected_nodes": int(stat["max"]),
"avg_disconnected_nodes": round(stat["avg"], 2),
},
remediation=[
"GET _cat/nodes?v # which nodes are missing",
"Check ES logs on offline nodes: OutOfMemoryError / JVM crash / Connection refused",
"OOM → more heap or query tuning; network → fix connectivity; high CPU → reduce load",
"After restart: GET _cat/recovery?v&active_only=true",
],
))
return findings
def _check_cpu(metrics: Dict[str, List[Dict]]) -> List[Finding]:
findings = []
dps = metrics.get("NodeCPUUtilization", [])
if not dps:
return findings
summary = _summarize(dps, "NodeCPUUtilization")
# P0: avg > 70%; P1: 60% < avg <= 70% (rules #12–13)
critical = [(n, s) for n, s in summary.items() if s["avg"] > THRESHOLDS["cpu_sustained_critical"]]
warning = [(n, s) for n, s in summary.items()
if THRESHOLDS["cpu_sustained_warning"] < s["avg"] <= THRESHOLDS["cpu_sustained_critical"]]
# Peak: sustained avg not high (<= 60%) but spike high — >=95% P0, >=80% P1 (rule #11 extension)
peak_critical = [(n, s) for n, s in summary.items()
if s["max"] >= THRESHOLDS["cpu_peak_critical"] and
s["avg"] <= THRESHOLDS["cpu_sustained_warning"]]
peak_warning = [(n, s) for n, s in summary.items()
if THRESHOLDS["cpu_peak_warning"] <= s["max"] < THRESHOLDS["cpu_peak_critical"] and
s["avg"] <= THRESHOLDS["cpu_sustained_warning"]]
if critical:
findings.append(Finding(
event_code="HealthCheck.CPULoadHigh",
reason_code="CPU.PersistUsageHigh",
name="Sustained high CPU (critical)",
priority="P0",
category="Resource metrics",
description=(
f"Node CPU average exceeds {THRESHOLDS['cpu_sustained_critical']}%; "
f"severe bottleneck — high risk of node loss or cascading failure."
),
evidence={
"affected_nodes": [
f"{n} avg={s['avg']:.1f}% max={s['max']:.1f}%"
for n, s in critical
]
},
remediation=[
"GET _nodes/hot_threads # hot threads",
"GET _tasks?detailed=true&actions=*search* # find/cancel heavy searches",
"Throttle clients: lower QPS or pause bulk ingest temporarily",
"Tune slow queries: filters, avoid deep pagination, reduce wildcard/fuzzy",
"Scale out: add data nodes or upgrade instance class if urgent",
],
))
elif warning:
findings.append(Finding(
event_code="HealthCheck.CPULoadHigh",
reason_code="CPU.PersistUsageHigh",
name="Sustained high CPU (warning)",
priority="P1",
category="Resource metrics",
description=(
f"Node CPU average exceeds {THRESHOLDS['cpu_sustained_warning']}%; "
f"elevated load — watch for slow queries or write backlog."
),
evidence={
"affected_nodes": [
f"{n} avg={s['avg']:.1f}% max={s['max']:.1f}%"
for n, s in warning
]
},
remediation=[
"GET _nodes/hot_threads # CPU consumers",
"GET _tasks?detailed=true # in-flight tasks",
"Tune slow queries: deep pagination, wildcard, large aggs",
"Consider more nodes or larger nodes if sustained",
],
))
elif peak_critical:
findings.append(Finding(
event_code="HealthCheck.CPULoadHigh",
reason_code="CPU.PeakUsageHigh",
name="CPU spike critical",
priority="P0",
category="Resource metrics",
description=(
f"Node CPU peak ≥ {THRESHOLDS['cpu_peak_critical']}%; "
f"very high risk of heartbeat timeouts and nodes leaving the cluster."
),
evidence={
"affected_nodes": [
f"{n} max={s['max']:.1f}% avg={s['avg']:.1f}%"
for n, s in peak_critical
]
},
remediation=[
"Cancel heavy tasks: GET _tasks?detailed=true&actions=*search*",
"GET _nodes/hot_threads # identify CPU consumers",
"Temporarily reduce client concurrency",
"Tune heavy queries: BKD ranges, large aggs, complex scripts",
"Scale out or upgrade nodes if spikes recur",
],
))
elif peak_warning:
findings.append(Finding(
event_code="HealthCheck.CPULoadHigh",
reason_code="CPU.PeakUsageHigh",
name="CPU spike warning",
priority="P1",
category="Resource metrics",
description=(
f"Node CPU peak in {THRESHOLDS['cpu_peak_warning']}–{THRESHOLDS['cpu_peak_critical']}%; "
f"not sustained high load, but spikes indicate heavy queries or ingest bursts "
f"that may become sustained if ignored."
),
evidence={
"affected_nodes": [
f"{n} max={s['max']:.1f}% avg={s['avg']:.1f}%"
for n, s in peak_warning
]
},
remediation=[
"GET _nodes/hot_threads # CPU consumers",
"GET _tasks?detailed=true # in-flight tasks",
"Tune heavy queries: large aggs, deep pagination, BKD-heavy patterns",
"Consider more nodes or larger nodes if spikes recur",
],
))
# Load imbalance (rule #40) — time-series CV
cv_stats = _compute_timeseries_cv(summary)
cv_avg = cv_stats["cv_avg"]
cv_max = cv_stats["cv_max"]
peak_cv = cv_stats["peak_cv"]
peak_time = cv_stats["peak_time"]
peak_values = cv_stats["peak_values"]
avg_vals = [s["avg"] for s in summary.values()]
max_vals = [s["max"] for s in summary.values()]
max_avg_cpu = max(avg_vals) if avg_vals else 0.0
# Use max of peak / window CV so a short spike is not averaged away
effective_cv = max(cv_avg, cv_max, peak_cv)
# Absolute CPU floor: skip CV when cluster is idle (idle-cluster CV false positive guard)
if (effective_cv > THRESHOLDS["load_imbalance_cv"]
and len(avg_vals) >= 2
and max_avg_cpu > THRESHOLDS["load_imbalance_min_cpu"]):
evidence = {
"cv_avg_full_window": round(cv_avg, 3),
"cv_max_full_window": round(cv_max, 3),
"cv_peak_window": round(peak_cv, 3),
"max_node_cpu_avg_pct": f"{max(avg_vals):.1f}%",
"min_node_cpu_avg_pct": f"{min(avg_vals):.1f}%",
"max_node_cpu_max_pct": f"{max(max_vals):.1f}%",
"avg_spread_pct": f"{max(avg_vals) - min(avg_vals):.1f}%",
"node_count": len(avg_vals),
}
if peak_cv > cv_avg and peak_time and peak_values:
evidence["peak_time_local"] = ms_to_str(peak_time)
evidence["peak_window_cpu_by_node"] = [
f"{node}: {val:.1f}%" for node, val in sorted(peak_values.items(), key=lambda x: -x[1])
]
desc_suffix = ""
if peak_cv > cv_avg * 1.2:
desc_suffix = (
f" Note: peak-window CV={peak_cv:.2f} is much higher than average — inspect that time range."
)
findings.append(Finding(
event_code="HealthCheck.LoadUnbalanced",
reason_code="Balancing.NodeCPUUnbalanced",
name="Uneven CPU load across nodes",
priority="P1",
category="Capacity planning",
description=(
f"Cross-node CPU coefficient of variation CV={effective_cv:.2f} "
f"(threshold {THRESHOLDS['load_imbalance_cv']}); "
f"max avg {max(avg_vals):.1f}%, min avg {min(avg_vals):.1f}%. "
f"Common causes: uneven shard counts, hot shards, missing coordinating nodes.{desc_suffix}"
),
evidence=evidence,
remediation=[
"# Step 1 — shard balance (common root cause)",
"GET _cat/nodes?v&h=name,ip,cpu,heap.percent,disk.used_percent,shards",
"GET _cat/allocation?v",
"",
"# Step 2 — hot indices / large shards",
"GET _cat/shards?v&s=store:desc&h=index,shard,prirep,store,node",
"GET _cat/indices?v&s=store.size:desc&h=index,pri,rep,store.size",
"",
"# Step 3 — rebalance",
"POST _cluster/reroute { \"commands\": [{\"move\": {\"index\": \"hot_index\", \"shard\": 0, \"from_node\": \"node_busy\", \"to_node\": \"node_idle\"}}] }",
"",
"# Longer term",
"Add coordinating-only nodes, ILM for tiering, tune routing where applicable",
],
))
return findings
def _check_memory(metrics: Dict[str, List[Dict]]) -> List[Finding]:
findings = []
dps = metrics.get("NodeHeapMemoryUtilization", [])
if not dps:
return findings
summary = _summarize(dps, "NodeHeapMemoryUtilization")
# rule #9: P0 critical=85%; rule #8: P1 warning=75%
critical = [(n, s) for n, s in summary.items() if s["avg"] > THRESHOLDS["heap_critical_avg"]]
warning = [(n, s) for n, s in summary.items()
if THRESHOLDS["heap_warning_avg"] < s["avg"] <= THRESHOLDS["heap_critical_avg"]]
if critical:
findings.append(Finding(
event_code="HealthCheck.JVMMemoryPressure",
reason_code="JVMMemory.OldGenUsageCritical",
name="Heap usage critical (OOM risk)",
priority="P0",
category="Resource metrics",
description=(
f"Node heap utilization avg exceeds {THRESHOLDS['heap_critical_avg']}%; "
f"OutOfMemoryError risk is high."
),
evidence={
"affected_nodes": [
f"{n} avg={s['avg']:.1f}% max={s['max']:.1f}%"
for n, s in critical
]
},
remediation=[
"POST _cache/clear # clear caches",
"GET _tasks?detailed=true&actions=*search* # find heavy tasks",
"POST _tasks/<task_id>/_cancel # cancel large tasks",
"Restart affected nodes off-peak if required (brief disruption)",
"Longer term: larger heap / node class; avoid huge result sets",
],
))
elif warning:
findings.append(Finding(
event_code="HealthCheck.JVMMemoryPressure",
reason_code="JVMMemory.OldGenUsageHigh",
name="Heap usage warning",
priority="P1",
category="Resource metrics",
description=(
f"Node heap utilization avg exceeds {THRESHOLDS['heap_warning_avg']}%; "
f"OOM risk and frequent Full GC are possible."
),
evidence={
"affected_nodes": [
f"{n} avg={s['avg']:.1f}% max={s['max']:.1f}%"
for n, s in warning
]
},
remediation=[
"GET _nodes/stats/jvm?filter_path=nodes.*.jvm.mem",
"POST _cache/clear",
"POST /index_name/_cache/clear?fielddata=true # fielddata cache",
"Tune queries: smaller pages, less fielddata, shallower aggs",
"Consider larger nodes if sustained",
],
))
# GC time ratio (rule #18: P0 when GC time > 10% of wall clock per sample window)
gc_dur_dps = metrics.get("JVMGCOldCollectionDuration", [])
if gc_dur_dps:
dur_summary = _summarize(gc_dur_dps, "JVMGCOldCollectionDuration")
# JVMGCOldCollectionDuration: GC ms per sample bucket; ratio = avg_ms / 60000
gc_ratio_nodes = [
(n, s, s["avg"] / 60000.0)
for n, s in dur_summary.items()
if s["avg"] / 60000.0 > THRESHOLDS["gc_time_ratio"]
]
if gc_ratio_nodes:
findings.append(Finding(
event_code="HealthCheck.JVMMemoryPressure",
reason_code="JVMMemory.GCTimeRatioTooHigh",
name="GC time ratio too high",
priority="P0",
category="Resource metrics",
description=(
f"Old GC time exceeds {THRESHOLDS['gc_time_ratio'] * 100:.0f}% of wall time; "
f"pauses hurt search and indexing latency."
),
evidence={
"affected_nodes": [
f"{n} avg_gc_ms={s['avg']:.0f}/min ratio={ratio * 100:.1f}%"
for n, s, ratio in gc_ratio_nodes
]
},
remediation=[
"GET _nodes/stats/jvm",
"POST _cache/clear # relieve heap pressure",
"Inspect large fielddata / request cache usage",
"Tune GC / heap only after confirming collector + pause profile "
"(see jvm.gc.collectors in _nodes/stats/jvm or node GC logs; do not assume G1 vs CMS from version alone)",
"Reduce heavy aggs / large allocations",
],
))
# Old GC frequency (rule #22: P1, >1 per minute)
gc_cnt_dps = metrics.get("JVMGCOldCollectionCount", [])
if gc_cnt_dps:
gc_summary = _summarize(gc_cnt_dps, "JVMGCOldCollectionCount")
gc_nodes = [(n, s) for n, s in gc_summary.items()
if s["max"] > THRESHOLDS["old_gc_rate_per_min"]]
if gc_nodes:
findings.append(Finding(
event_code="HealthCheck.JVMMemoryPressure",
reason_code="JVMMemory.GCRateTooHigh",
name="Old GC too frequent",
priority="P1",
category="Resource metrics",
description=(
f"Old GC rate exceeds {THRESHOLDS['old_gc_rate_per_min']:.0f}/minute; "
f"heap pressure may increase search/index latency."
),
evidence={
"affected_nodes": [
f"{n} max_gc={s['max']:.1f}/min avg_gc={s['avg']:.1f}/min"
for n, s in gc_nodes
]
},
remediation=[
"GET _nodes/stats/jvm",
"Check fielddata and deep aggregations on heap",
"Tune queries to reduce memory churn",
"Tune GC / heap or upgrade nodes if sustained (confirm collector via jvm stats or GC logs, not version guesswork)",
],
))
return findings
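# Illustration only, not called by the rule engine: a minimal sketch of the GC time
# ratio arithmetic used in _check_memory. It assumes each datapoint covers one sample
# period and that JVMGCOldCollectionDuration is reported in milliseconds. The helper
# name `example_gc_time_ratio` and its `period_s` parameter are hypothetical.

```python
def example_gc_time_ratio(avg_gc_ms: float, period_s: int = 60) -> float:
    """Share of wall-clock time spent in Old GC during one sample period."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    # Convert the period to milliseconds so both sides use the same unit.
    return avg_gc_ms / (period_s * 1000.0)


if __name__ == "__main__":
    # 9000 ms of Old GC inside a 60 s bucket.
    print(f"{example_gc_time_ratio(9000.0) * 100:.1f}%")
```

# A CMS period other than 60 s would need its own divisor, which is why the sketch
# takes period_s explicitly instead of hard-coding 60000.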
def _check_disk(metrics: Dict[str, List[Dict]]) -> List[Finding]:
findings = []
dps = metrics.get("NodeDiskUtilization", [])
if not dps:
return findings
summary = _summarize(dps, "NodeDiskUtilization")
# rule #14: P0 critical=85%; rule #13: P1 warning=75%
critical = [(n, s) for n, s in summary.items() if s["max"] > THRESHOLDS["disk_critical_max"]]
warning = [(n, s) for n, s in summary.items()
if THRESHOLDS["disk_warning_max"] < s["max"] <= THRESHOLDS["disk_critical_max"]]
if critical:
findings.append(Finding(
event_code="HealthCheck.DiskUsageHigh",
reason_code="Disk.UsageCritical",
name="Disk usage critical",
priority="P0",
category="Resource metrics",
description=(
f"Node disk usage exceeds {THRESHOLDS['disk_critical_max']}%; "
f"as usage nears the ES flood-stage watermark (~95%), writes may fail."
),
evidence={
"affected_nodes": [
f"{n} max={s['max']:.1f}% avg={s['avg']:.1f}%"
for n, s in critical
]
},
remediation=[
"GET _cat/allocation?v",
"GET _cat/indices?v&s=store.size:desc # largest indices",
"DELETE /old_index # drop cold indices if safe",
"PUT _all/_settings {\"index.blocks.read_only_allow_delete\": null} # clear read-only if set",
"Expand disk capacity in console if needed",
],
))
elif warning:
findings.append(Finding(
event_code="HealthCheck.DiskUsageHigh",
reason_code="Disk.UsageHigh",
name="Disk usage warning",
priority="P1",
category="Resource metrics",
description=(
f"Node disk usage exceeds {THRESHOLDS['disk_warning_max']}%; "
f"free up space before the flood-stage watermark (~95%) makes indices read-only."
),
evidence={
"affected_nodes": [
f"{n} max={s['max']:.1f}%"
for n, s in warning
]
},
remediation=[
"GET _cat/indices?v&s=store.size:desc",
"Delete or archive cold data (e.g. OSS)",
"Use ILM to expire old indices",
"Plan disk expansion if growth continues",
],
))
# Disk IO utilization (rule #19: P0 when IO util > 90%)
io_dps = metrics.get("NodeStatsDataDiskUtil", [])
if io_dps:
io_summary = _summarize(io_dps, "NodeStatsDataDiskUtil")
io_nodes = [(n, s) for n, s in io_summary.items()
if s["max"] > THRESHOLDS["disk_io_max"]]
if io_nodes:
findings.append(Finding(
event_code="HealthCheck.DiskIOBottleneck",
reason_code="Disk.IOPerformancePoor",
name="Disk IO utilization too high",
priority="P0",
category="Resource metrics",
description=(
f"Disk IO utilization exceeds {THRESHOLDS['disk_io_max']}%; "
f"higher IO latency can cause heartbeat loss or node drop-out."
),
evidence={
"affected_nodes": [
f"{n} max_io={s['max']:.1f}% avg_io={s['avg']:.1f}%"
for n, s in io_nodes
]
},
remediation=[
"Lower ingest pressure: fewer concurrent bulks, larger bulk batches",
"Raise refresh_interval (e.g. 30s) to cut refresh cost",
"PUT /index_name/_settings {\"index.merge.scheduler.max_thread_count\": 1}",
"Longer term: faster disk tier (ESSD PL1/PL2)",
],
))
return findings
def _check_resource_imbalance(metrics: Dict[str, List[Dict]]) -> List[Finding]:
"""
Detect uneven disk and heap utilization across nodes (CV-based).
Complements CPU imbalance in _check_cpu. See references/sop-node-load-imbalance.md.
"""
findings = []
# --- Disk utilization imbalance ---
disk_dps = metrics.get("NodeDiskUtilization", [])
if disk_dps:
disk_summary = _summarize(disk_dps, "NodeDiskUtilization")
if len(disk_summary) >= 2:
cv_stats = _compute_timeseries_cv(disk_summary)
cv_avg = cv_stats["cv_avg"]
cv_max = cv_stats["cv_max"]
peak_cv = cv_stats["peak_cv"]
effective_cv = max(cv_avg, cv_max, peak_cv)
avg_vals = [s["avg"] for s in disk_summary.values()]
max_vals = [s["max"] for s in disk_summary.values()]
max_disk = max(max_vals) # use max utilization for severity
min_disk = min(avg_vals)
if effective_cv > THRESHOLDS["disk_imbalance_cv"]:
if max_disk > THRESHOLDS["disk_imbalance_critical"]:
priority = "P0"
desc_suffix = "Hot node above critical watermark — read-only / flood risk."
elif max_disk > THRESHOLDS["disk_imbalance_warning"]:
priority = "P1"
desc_suffix = "Hot node disk high — rebalance shards soon."
else:
priority = "P2"
desc_suffix = "CV high but absolute disk OK — plan rebalance."
findings.append(Finding(
event_code="HealthCheck.LoadUnbalanced",
reason_code="Balancing.NodeDiskUnbalanced",
name="Uneven disk usage across nodes",
priority=priority,
category="Capacity planning",
description=(
f"Cross-node disk utilization CV={effective_cv:.2f} "
f"(threshold {THRESHOLDS['disk_imbalance_cv']}); "
f"max {max_disk:.1f}%, min {min_disk:.1f}%. "
f"Common causes: uneven shard counts / large shards on few nodes. {desc_suffix}"
),
evidence={
"cv_avg_full_window": round(cv_avg, 3),
"cv_max_full_window": round(cv_max, 3),
"cv_peak_window": round(peak_cv, 3),
"max_node_disk_pct": f"{max_disk:.1f}%",
"min_node_disk_avg_pct": f"{min_disk:.1f}%",
"spread_pct": f"{max_disk - min_disk:.1f}%",
"node_count": len(avg_vals),
"top_nodes": [
f"{n} avg={s['avg']:.1f}% max={s['max']:.1f}%"
for n, s in sorted(disk_summary.items(), key=lambda x: -x[1]["max"])
][:5],
},
remediation=[
"# Step 1 — shard balance",
"GET _cat/nodes?v&h=name,ip,disk.used_percent,shards",
"GET _cat/allocation?v",
"",
"# Step 2 — move large shards",
"GET _cat/shards?v&s=store:desc&h=index,shard,prirep,store,node",
"POST _cluster/reroute { \"commands\": [{\"move\": {...}}] }",
"",
"# Longer term",
"Tune disk watermarks for earlier relocation; ILM for retention",
],
))
# --- Heap utilization imbalance ---
heap_dps = metrics.get("NodeHeapMemoryUtilization", [])
if heap_dps:
heap_summary = _summarize(heap_dps, "NodeHeapMemoryUtilization")
if len(heap_summary) >= 2:
cv_stats = _compute_timeseries_cv(heap_summary)
cv_avg = cv_stats["cv_avg"]
cv_max = cv_stats["cv_max"]
peak_cv = cv_stats["peak_cv"]
effective_cv = max(cv_avg, cv_max, peak_cv)
avg_vals = [s["avg"] for s in heap_summary.values()]
max_vals = [s["max"] for s in heap_summary.values()]
max_heap = max(max_vals)
min_heap = min(avg_vals)
if (effective_cv > THRESHOLDS["memory_imbalance_cv"]
and max_heap > THRESHOLDS["memory_imbalance_min"]):
findings.append(Finding(
event_code="HealthCheck.LoadUnbalanced",
reason_code="Balancing.NodeMemoryUnbalanced",
name="Uneven heap usage across nodes",
priority="P1",
category="Capacity planning",
description=(
f"Cross-node heap utilization CV={effective_cv:.2f} "
f"(threshold {THRESHOLDS['memory_imbalance_cv']}); "
f"max {max_heap:.1f}%, min {min_heap:.1f}%. "
f"Common causes: uneven shards, fielddata skew, hot caches."
),
evidence={
"cv_avg_full_window": round(cv_avg, 3),
"cv_max_full_window": round(cv_max, 3),
"cv_peak_window": round(peak_cv, 3),
"max_node_heap_pct": f"{max_heap:.1f}%",
"min_node_heap_avg_pct": f"{min_heap:.1f}%",
"spread_pct": f"{max_heap - min_heap:.1f}%",
"node_count": len(avg_vals),
"top_nodes": [
f"{n} avg={s['avg']:.1f}% max={s['max']:.1f}%"
for n, s in sorted(heap_summary.items(), key=lambda x: -x[1]["max"])
][:5],
},
remediation=[
"# Step 1 — shard balance",
"GET _cat/nodes?v&h=name,ip,heap.percent,shards",
"",
"# Step 2 — fielddata",
"GET _cat/fielddata?v&s=size:desc",
"GET _nodes/stats/indices?filter_path=nodes.*.indices.fielddata",
"",
"# Step 3 — rebalance / query tuning",
"Avoid huge text field aggs; reroute shards if needed",
"POST _cluster/reroute { \"commands\": [{\"move\": {...}}] }",
],
))
return findings
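# Illustration only: _compute_timeseries_cv is defined elsewhere in this module, so the
# block below is not its implementation. It is a minimal sketch of the underlying idea,
# the coefficient of variation (population standard deviation divided by the mean) over
# per-node utilization values. The name `example_cv` is hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def example_cv(values: Sequence[float]) -> float:
    """Coefficient of variation; 0.0 when it cannot be computed meaningfully."""
    if len(values) < 2:
        return 0.0
    avg = mean(values)
    if avg == 0:
        return 0.0
    return pstdev(values) / avg


if __name__ == "__main__":
    balanced = example_cv([50.0, 52.0, 48.0])  # nodes close together
    skewed = example_cv([90.0, 30.0, 30.0])    # one hot node
    print(f"balanced={balanced:.3f} skewed={skewed:.3f}")
```

# CV is scale-free, so the same threshold can be compared against disk and heap
# percentages; the absolute max is still needed to grade severity, as the rule does.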
def _check_metrics_sparse(metrics: Dict[str, List[Dict]], window_min: int, period: str) -> None:
"""
Warn on stderr when CMS datapoints are much sparser than expected.
Sparse if ClusterStatus / NodeCPUUtilization point count < 20% of expected minimum.
"""
key_metrics = ["ClusterStatus", "NodeCPUUtilization"]
expected_min = max(1, window_min // max(1, int(period) // 60))  # guard: period < 60 s would divide by zero
threshold = max(1, int(expected_min * 0.2))
sparse = [
m for m in key_metrics
if len(metrics.get(m, [])) < threshold
]
if sparse:
print(
f"\n⚠️ [CMS] Sparse datapoints for key metrics: {sparse}\n"
f" Possible causes:\n"
f" 1. Retention / granularity window shorter than {window_min} minutes (try --window 30)\n"
f" 2. Fault injection just started; wait ≥5 minutes for CMS to populate\n"
f" 3. Wrong instance ID or region — CMS returned no data\n"
f" Conclusions may be incomplete; interpret with care.",
file=sys.stderr,
)
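# Illustration only: a minimal sketch of the sparse-datapoint threshold arithmetic from
# _check_metrics_sparse. The helper name `example_sparse_threshold` and its `ratio`
# parameter are hypothetical; clamping sub-minute periods to one minute is an assumption
# about the desired behavior, not something the CMS API mandates.

```python
def example_sparse_threshold(window_min: int, period_s: int, ratio: float = 0.2) -> int:
    """Minimum datapoint count below which a metric is treated as sparse."""
    period_min = max(1, period_s // 60)          # clamp sub-minute periods to 1 minute
    expected = max(1, window_min // period_min)  # datapoints expected in the window
    return max(1, int(expected * ratio))


if __name__ == "__main__":
    for window, period in ((60, 60), (60, 300), (10, 30)):
        print(window, period, example_sparse_threshold(window, period))
```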
# ---------------------------------------------------------------------------
# Rules — events / logs
# ---------------------------------------------------------------------------
def _check_events(events: List[Dict]) -> List[Finding]:
findings = []
now_utc = datetime.now(timezone.utc)
stuck = []
for e in events:
if e.get("event_status") == "Executing":
start_dt = parse_utc(e.get("start_time", ""))
if start_dt:
elapsed = (now_utc - start_dt).total_seconds() / 60
if elapsed > THRESHOLDS["event_stuck_min"]:
stuck.append({
"event_name": e.get("name", ""),
"reason": e.get("reason", ""),
"start_time": e.get("start_time", ""),
"elapsed_minutes": int(elapsed),
})
if stuck:
findings.append(Finding(
event_code="ManagementPlane.EventStuck",
reason_code="ManagementPlane.EventStuck",
name="Change event stuck in Executing",
priority="P1",
category="Management plane",
description=(
f"{len(stuck)} system event(s) have been in the Executing state for more than "
f"{THRESHOLDS['event_stuck_min']} minutes; the change may be stuck."
),
evidence={"stuck_events": stuck},
remediation=[
"Check change progress in the Alibaba Cloud console",
"If truly stuck, contact support with event IDs",
"Also verify data-node health — node faults often block changes",
],
))
return findings
def _check_logs(logs: List[Dict]) -> List[Finding]:
findings = []
oom = [entry for entry in logs if "OutOfMemoryError" in entry.get("content", "") or
("OOM" in entry.get("content", "") and "ERROR" in entry.get("content", "").upper())]
if oom:
sample = oom[0]
findings.append(Finding(
event_code="HealthCheck.JVMMemoryPressure",
reason_code="JVMMemory.OOM",
name="Node OutOfMemoryError",
priority="P0",
category="Resource metrics",
description="Logs show OutOfMemoryError; node likely ran out of heap and may have restarted.",
evidence={
"oom_log_lines": len(oom),
"affected_host": sample.get("host", "-"),
"log_time": sample.get("time", "-"),
"log_excerpt": sample.get("content", "")[:150],
},
remediation=[
"Restart affected nodes after confirming recovery path",
"GET _nodes/stats/jvm",
"Analyze heap dumps if captured",
"Increase heap / node size; reduce query memory footprint",
],
))
# UnavailableShardsException in logs (often alongside Red)
shard_errs = [l for l in logs if "UnavailableShardsException" in l.get("content", "")]
if shard_errs and not oom:
sample = shard_errs[0]
# Extract index name from log line (handles the Exception[[idx][N] wrapper and any shard number)
m = re.search(r"\[+([^\]\[]+)\]\[\d+\] primary shard is not active", sample.get("content", ""))
idx_name = m.group(1) if m else "unknown"
# System indices (.monitoring-*, .security-*, .kibana*, …) → P2 informational:
# usually a side-effect of resource pressure, not primary business data loss.
_SYS_PREFIXES = (".monitoring-", ".security-", ".kibana", # .kibana-* / .kibana_
".apm-", ".fleet-", ".async-search", ".ds-ilm", ".watches")
is_system_idx = any(idx_name.startswith(p) for p in _SYS_PREFIXES)
findings.append(Finding(
event_code="HealthCheck.ClusterUnhealthy",
reason_code="Cluster.UnavailableShards",
name=(
"System index shard unavailable (informational)"
if is_system_idx else
"Primary shard unavailable"
),
priority="P2" if is_system_idx else "P0",
category="Informational" if is_system_idx else "Cluster health",
description=(
f"UnavailableShardsException in logs (index: {idx_name})."
+ (
" System / stack index — unassigned shards often follow CPU or heap pressure; "
"fix resource pressure first."
if is_system_idx else
" Primary shard inactive — reads/writes to this index will time out."
)
),
evidence={
"matching_log_lines": len(shard_errs),
"sample_index": idx_name,
"is_system_index": "yes" if is_system_idx else "no",
"affected_host": sample.get("host", "-"),
"log_excerpt": sample.get("content", "")[:150],
},
remediation=(
[
"Usually no direct fix on the system index — relieve CPU/heap pressure first",
"Optional cleanup: DELETE .monitoring-es-7-* (expired monitoring)",
"To disable: xpack.monitoring in kibana.yml (product-specific)",
]
if is_system_idx else
[
"GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason",
"GET _cluster/allocation/explain",
"Apply allocation explain guidance (disk, settings, node availability)",
]
),
))
return findings
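# Illustration only: a minimal sketch of pulling the index name and shard number out of
# an UnavailableShardsException log line. The helper name `example_extract_shard` is
# hypothetical and the sample line is made up for the demo; real log layouts may differ.

```python
import re
from typing import Optional, Tuple

_EXAMPLE_SHARD_RE = re.compile(r"\[+([^\]\[]+)\]\[(\d+)\] primary shard is not active")


def example_extract_shard(line: str) -> Optional[Tuple[str, int]]:
    """Return (index_name, shard_number), or None when the line does not match."""
    match = _EXAMPLE_SHARD_RE.search(line)
    if not match:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    sample_line = (
        "UnavailableShardsException[[orders-2024][3] "
        "primary shard is not active Timeout: [1m]]"
    )
    print(example_extract_shard(sample_line))
```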
# ---------------------------------------------------------------------------
# Rules — optional ES API (cluster settings / _cat / thread pool)
# ---------------------------------------------------------------------------
def _check_fielddata_breaker(raw_settings: Dict) -> List[Finding]:
"""Fielddata circuit breaker limit vs recommended 40%."""
findings: List[Finding] = []
fd_limit = _get_cluster_setting(raw_settings, "indices", "breaker", "fielddata", "limit")
if not fd_limit or "%" not in fd_limit:
return findings
try:
pct = float(fd_limit.rstrip("%"))
except ValueError:
return findings
if pct < 40.0:
findings.append(Finding(
event_code="HealthCheck.JVMMemoryPressure",
reason_code="JVMMemory.BreakerLimitConfigLow",
name="Fielddata breaker limit too low",
priority="P2",
category="Configuration",
description=(
f"indices.breaker.fielddata.limit = {fd_limit}, below the 40% recommendation. "
f"Even when tripped=0, text-field aggregations can easily hit CircuitBreakingException."
),
evidence={
"configured_value": fd_limit,
"recommended": "40% (Elasticsearch default)",
"risk": "Low traffic hides the issue until a spike trips the breaker",
},
remediation=[
'PUT _cluster/settings { "persistent": { "indices.breaker.fielddata.limit": "40%" } }',
"Before changing: GET _nodes/stats/indices/fielddata?fields=*",
],
))
return findings
def _parse_watermark_to_bytes(value: str) -> Optional[int]:
"""Parse disk watermark to bytes. Percentages return None; supports e.g. 500gb, 19605mb, 1000000b."""
if not value:
return None
value = value.strip().lower()
if "%" in value:
return None # percentage watermarks are handled separately
multipliers = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}
for suffix, mult in sorted(multipliers.items(), key=lambda x: -len(x[0])):
if value.endswith(suffix):
try:
return int(float(value[:-len(suffix)]) * mult)
except ValueError:
return None
# Bare integer → bytes
try:
return int(value)
except ValueError:
return None
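# Illustration only: a compact stand-in that mirrors the parsing rules of
# _parse_watermark_to_bytes so the demo is self-contained. The names
# `example_parse_size` and `_EXAMPLE_UNITS` are hypothetical. Unlike the function above,
# this sketch also accepts a fractional bare number, which is an assumption.

```python
from typing import Optional

# Two-letter suffixes are listed before "b" so "mb" is never mistaken for "b".
_EXAMPLE_UNITS = {"pb": 1024**5, "tb": 1024**4, "gb": 1024**3, "mb": 1024**2, "kb": 1024, "b": 1}


def example_parse_size(value: str) -> Optional[int]:
    """Return bytes for values like '500gb'; None for percentages or junk."""
    text = (value or "").strip().lower()
    if not text or "%" in text:
        return None
    for suffix, mult in _EXAMPLE_UNITS.items():
        if text.endswith(suffix):
            number = text[: -len(suffix)]
            break
    else:
        number, mult = text, 1  # bare number means bytes
    try:
        return int(float(number) * mult)
    except ValueError:
        return None


if __name__ == "__main__":
    for sample in ("85%", "19605mb", "1000000b", "2048", "oops"):
        print(sample, "->", example_parse_size(sample))
```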
def _check_disk_watermark_config(
raw_settings: Dict,
metrics: Optional[Dict[str, List[Dict]]] = None,
curl_fn=None,
) -> List[Finding]:
"""Disk watermark settings: absolute-byte mode, breach vs CMS, too-low / too-high %."""
findings: List[Finding] = []
wm_low = _get_cluster_setting(raw_settings, "cluster", "routing", "allocation", "disk", "watermark", "low")
wm_flood = _get_cluster_setting(raw_settings, "cluster", "routing", "allocation", "disk", "watermark", "flood_stage")
if not wm_low:
return findings
# --- Branch A: absolute-byte watermarks (e.g. transient "19605mb") ---
flood_bytes = _parse_watermark_to_bytes(wm_flood) if wm_flood else None
low_bytes = _parse_watermark_to_bytes(wm_low)
if low_bytes is not None:
disk_avail_bytes_list: list = [] # [(node_name, avail_bytes)]
if curl_fn:
try:
alloc_data = curl_fn("/_cat/allocation?format=json&bytes=b")
for row in (alloc_data or []):
if row.get("disk.avail") is None:
continue  # e.g. the UNASSIGNED pseudo-row reports null disk columns
node = row.get("node", "unknown")
avail = int(row["disk.avail"])
disk_avail_bytes_list.append((node, avail))
except Exception:
pass
if flood_bytes is not None and disk_avail_bytes_list:
flood_breached = [(n, avail) for n, avail in disk_avail_bytes_list if avail < flood_bytes]
flood_margin = [(n, avail) for n, avail in disk_avail_bytes_list
if avail >= flood_bytes and (avail - flood_bytes) < 500 * 1024 * 1024]
if flood_breached:
findings.append(Finding(
event_code="HealthCheck.DiskUsageHigh",
reason_code="Disk.WatermarkAbsoluteFloodBreached",
name="Free disk below absolute flood_stage (indices read-only)",
priority="P0",
category="Configuration",
description=(
f"flood_stage is an absolute value {wm_flood} (~{flood_bytes / 1024**2:.0f} MB); "
f"these nodes are below it — indices should be read_only_allow_delete."
),
evidence={
"flood_stage_setting": wm_flood,
"flood_stage_bytes": flood_bytes,
"breached_nodes": [
f"{n}: free {avail / 1024**2:.0f}MB < threshold {flood_bytes / 1024**2:.0f}MB"
for n, avail in flood_breached
],
"setting_layer": (
"transient"
if raw_settings.get("transient", {}).get("cluster", {}).get("routing", {}).get("allocation", {}).get("disk", {}).get("watermark")
else "persistent"
),
},
remediation=[
'Reset watermarks: PUT _cluster/settings { "transient": { "cluster.routing.allocation.disk.watermark.low": null, '
'"cluster.routing.allocation.disk.watermark.high": null, '
'"cluster.routing.allocation.disk.watermark.flood_stage": null } }',
'Clear read-only: PUT _all/_settings { "index.blocks.read_only_allow_delete": null }',
"Find who set transient watermarks (scripts / runbooks / drills)",
],
))
elif flood_margin:
findings.append(Finding(
event_code="HealthCheck.DiskUsageHigh",
reason_code="Disk.WatermarkAbsoluteFloodMarginLow",
name="Free disk margin above absolute flood_stage is tiny",
priority="P1",
category="Configuration",
description=(
f"flood_stage absolute {wm_flood} (~{flood_bytes / 1024**2:.0f} MB); "
f"these nodes are barely above it — read-only may return quickly."
),
evidence={
"flood_stage_setting": wm_flood,
"at_risk_nodes": [
f"{n}: free {avail / 1024**2:.0f}MB, margin {(avail - flood_bytes) / 1024**2:.0f}MB"
for n, avail in flood_margin
],
},
remediation=[
"GET _cluster/settings?include_defaults=true&filter_path=**.watermark",
'If mis-set: PUT _cluster/settings { "transient": { "cluster.routing.allocation.disk.watermark.low": null, '
'"cluster.routing.allocation.disk.watermark.high": null, '
'"cluster.routing.allocation.disk.watermark.flood_stage": null } }',
],
))
is_transient = bool(raw_settings.get("transient", {}).get("cluster", {}).get("routing", {}).get("allocation", {}).get("disk", {}).get("watermark"))
findings.append(Finding(
event_code="HealthCheck.ConfigurationRisk",
reason_code="Disk.WatermarkAbsoluteValue",
name="Disk watermarks use absolute bytes (non-default)",
priority="P1",
category="Configuration",
description=(
f"Disk watermarks are absolute bytes (low={wm_low}, flood_stage={wm_flood or 'not set'}), "
f"not percentage mode. Absolute values do not scale when disks grow and are easy to misconfigure."
),
evidence={
"low": wm_low,
"flood_stage": wm_flood or "n/a",
"setting_layer": "transient" if is_transient else "persistent_or_defaults",
"risk": "Does not auto-adjust after disk expansion; can block writes while disk looks healthy",
},
remediation=[
'Revert to percentage defaults: PUT _cluster/settings { "transient": { "cluster.routing.allocation.disk.watermark.low": null, '
'"cluster.routing.allocation.disk.watermark.high": null, '
'"cluster.routing.allocation.disk.watermark.flood_stage": null } }',
"Trace the source of absolute-byte watermarks (scripts / ops / fault injection)",
],
))
return findings
# --- Branch B: percentage watermarks ---
if "%" not in wm_low:
return findings
try:
pct = float(wm_low.rstrip("%"))
except ValueError:
return findings
if metrics:
disk_summary = _summarize(metrics.get("NodeDiskUtilization", []), "NodeDiskUtilization")
breached = [(node, stat["max"]) for node, stat in disk_summary.items() if stat["max"] > pct]
if breached:
findings.append(Finding(
event_code="HealthCheck.DiskUsageHigh",
reason_code="Disk.WatermarkBreached",
name="Disk usage above configured watermark.low",
priority="P1",
category="Configuration",
description=(
f"cluster.routing.allocation.disk.watermark.low = {wm_low}, "
f"but node disk utilization max exceeds it; new shards will not be allocated to these nodes "
f"(relocation away starts at watermark.high)."
),
evidence={
"configured_watermark_low": wm_low,
"breached_nodes_max_pct": [f"{n} {v:.1f}%" for n, v in breached],
},
remediation=[
"Free disk: delete old indices/snapshots or expand capacity",
"GET _cat/shards?v # relocations in flight?",
"GET _cluster/settings",
'If low was mis-tuned: PUT _cluster/settings { "persistent": { "cluster.routing.allocation.disk.watermark.low": "85%" } }',
],
))
if pct < 5.0:
findings.append(Finding(
event_code="HealthCheck.DiskUsageHigh",
reason_code="Disk.WatermarkConfigLow",
name="Disk watermark.low extremely low (write-block risk)",
priority="P1",
category="Configuration",
description=(
f"cluster.routing.allocation.disk.watermark.low = {wm_low}, far below a sane floor (~5%). "
f"Nodes cross it almost immediately, blocking new shard allocation; "
f"if high / flood_stage were lowered with it, writes get blocked as well."
),
evidence={
"configured_low": wm_low,
"recommended_low": "85% (Elasticsearch default)",
"risk": "Tiny headroom before read-only / flood-stage",
},
remediation=[
'PUT _cluster/settings { "persistent": { "cluster.routing.allocation.disk.watermark.low": "85%", '
'"cluster.routing.allocation.disk.watermark.high": "90%", '
'"cluster.routing.allocation.disk.watermark.flood_stage": "95%" } }',
'If writes blocked: PUT index_name/_settings { "index.blocks.read_only_allow_delete": null }',
],
))
if pct > 90.0:
findings.append(Finding(
event_code="HealthCheck.DiskUsageHigh",
reason_code="Disk.WatermarkConfigHigh",
name="Disk watermark.low too high (relocations start late)",
priority="P2",
category="Configuration",
description=(
f"cluster.routing.allocation.disk.watermark.low = {wm_low}, above 90% (Elasticsearch default: 85%). "
f"Nodes keep accepting new shards until disks are nearly full, leaving little time to react."
),
evidence={
"configured_low": wm_low,
"recommended_low": "85% (Elasticsearch default)",
"risk": "Little or no gap left before watermark.high (default 90%) and flood_stage (default 95%)",
},
remediation=[
'PUT _cluster/settings { "persistent": { "cluster.routing.allocation.disk.watermark.low": "85%" } }',
"Also review watermark.high (~90%) and flood_stage (~95%)",
],
))
return findings
def _check_index_readonly_blocks(curl_fn) -> List[Finding]:
"""Detect indices blocked with read_only_allow_delete."""
findings: List[Finding] = []
try:
settings_raw = curl_fn("/_all/_settings?filter_path=*.settings.index.blocks")
if not settings_raw:
return findings
readonly_indices = []
for idx_name, idx_settings in settings_raw.items():
blocks = idx_settings.get("settings", {}).get("index", {}).get("blocks", {})
if str(blocks.get("read_only_allow_delete", "")).lower() == "true":
readonly_indices.append(idx_name)
if readonly_indices:
biz_indices = [i for i in readonly_indices if not i.startswith(".")]
sys_indices = [i for i in readonly_indices if i.startswith(".")]
findings.append(Finding(
event_code="HealthCheck.DiskUsageHigh",
reason_code="Disk.IndexReadOnly",
name="Indices read-only (read_only_allow_delete)",
priority="P0",
category="Configuration",
description=(
f"{len(readonly_indices)} index(es) have read_only_allow_delete; writes are rejected. "
f"Usually from disk flood_stage; before ES 7.4 the block does not auto-clear after disk recovers "
f"(7.4+ releases it once usage falls below the high watermark)."
),
evidence={
"readonly_index_count": len(readonly_indices),
"business_indices_sample": biz_indices[:10] if biz_indices else "(none)",
"system_indices_sample": sys_indices[:10] if sys_indices else "(none)",
},
remediation=[
'PUT _all/_settings { "index.blocks.read_only_allow_delete": null }',
"GET _cat/allocation?v # confirm free disk",
"GET _cluster/settings?include_defaults=true&filter_path=**.watermark",
"Before ES 7.4 you must clear read-only manually after fixing disk; 7.4+ clears it automatically",
],
))
except Exception as e:
print(f"⚠️ [ES config check] Index read-only block check failed: {e}", file=sys.stderr)
return findings
def _check_zero_replicas(curl_fn) -> List[Finding]:
"""Detect business indices with number_of_replicas=0."""
findings: List[Finding] = []
try:
indices_raw = curl_fn("/_cat/indices?h=index,rep&format=json")
zero_rep = [
i["index"] for i in (indices_raw or [])
if str(i.get("rep", "1")) == "0" and not i.get("index", "").startswith(".")
]
if zero_rep:
findings.append(Finding(
event_code="HealthCheck.ClusterUnhealthy",
reason_code="Index.ZeroReplicas",
name="Business indices have zero replicas",
priority="P2",
category="Configuration",
description=(
f"{len(zero_rep)} non-system index(es) have number_of_replicas=0; "
f"a single node loss makes that index unavailable."
),
evidence={"zero_replica_index_count": len(zero_rep), "sample_indices": zero_rep[:5]},
remediation=[
'PUT /index_name/_settings { "number_of_replicas": 1 }',
"Repeat per index (or use index patterns carefully)",
"Ignore if intentionally cost-optimized",
],
))
except Exception as e:
print(f"⚠️ [ES config check] Failed to read index replica settings: {e}", file=sys.stderr)
return findings
def _check_thread_pool_rejected(curl_fn) -> List[Finding]:
"""Detect thread_pool rejected counters > 0."""
findings: List[Finding] = []
_MONITORED_POOLS = {
"search": ("ThreadPool.SearchRejected", "search"),
"write": ("ThreadPool.WriteRejected", "write"),
"bulk": ("ThreadPool.WriteRejected", "bulk"),
}
try:
raw_tp = curl_fn("/_nodes/stats/thread_pool")
nodes_tp = raw_tp.get("nodes", {})
pool_rejected: Dict[str, Dict[str, int]] = {}
pool_completed: Dict[str, Dict[str, int]] = {}
for _nid, _ninfo in nodes_tp.items():
_nname = _ninfo.get("name", _nid)
for _pool, (_rcode, _label) in _MONITORED_POOLS.items():
_tp = _ninfo.get("thread_pool", {}).get(_pool, {})
_rej = _tp.get("rejected", 0)
_comp = _tp.get("completed", 0)
if _rej > 0:
pool_rejected.setdefault(_pool, {})[_nname] = _rej
pool_completed.setdefault(_pool, {})[_nname] = _comp
merged: Dict[str, Dict] = {}
for _pool, _by_node in pool_rejected.items():
_rcode, _label = _MONITORED_POOLS[_pool]
entry = merged.setdefault(
_rcode,
{"label": _label, "by_node": {}, "total": 0, "completed_by_node": {}, "completed_total": 0},
)
for _n, _c in _by_node.items():
entry["by_node"][_n] = entry["by_node"].get(_n, 0) + _c
entry["total"] += _c
_c_done = pool_completed.get(_pool, {}).get(_n, 0)
entry["completed_by_node"][_n] = entry["completed_by_node"].get(_n, 0) + _c_done
entry["completed_total"] += _c_done
for _rcode, _info in merged.items():
findings.append(Finding(
event_code="HealthCheck.ThreadPoolSaturation",
reason_code=_rcode,
name=f"{_info['label']} thread pool rejected requests",
priority="P1",
category="Resource metrics",
description=(
f"The {_info['label']} thread pool rejected {_info['total']} request(s) "
f"since last restart (cumulative); backlog exceeded pool capacity. "
f"CMS/catalog P0 for this class usually requires sustained reject rate plus traffic "
f"(see health-events-catalog); use engine evidence with CMS to grade severity."
),
evidence={
"total_rejected": _info["total"],
"completed_total": _info["completed_total"],
"completed_by_node": _info["completed_by_node"],
"rejections_by_node": _info["by_node"],
"note": "rejected counters are cumulative since node restart; correlate with logs and traffic",
},
remediation=[
f"Reduce concurrency or QPS for {_info['label']} workloads",
"Inspect hot threads: GET /_nodes/hot_threads",
"Inspect queue depth: GET /_nodes/stats/thread_pool (queue field)",
"Longer term: scale out or add data nodes",
],
))
except Exception as e:
msg = str(e)
if "-u" in msg or "elastic:" in msg:
msg = "(error details omitted to avoid leaking credentials in logs)"
print(f"⚠️ [ES config check] Failed to read thread pool stats: {msg}", file=sys.stderr)
return findings
def _check_cluster_config_optional(
metrics: Optional[Dict[str, List[Dict]]] = None,
instance_id: Optional[str] = None,
*,
connect_timeout: float = 5.0,
read_timeout: float = 10.0,
) -> List[Finding]:
"""
Optional Elasticsearch API checks when ES_ENDPOINT and ES_PASSWORD are set.
Preconditions (if either fails, all checks are skipped and [] is returned):
- Endpoint vs instance_id consistency (Bug-07)
- Safe credentials (no shell metacharacters in user/password)
Checks (REST probes, not only CMS alerts):
1. JVMMemory.BreakerLimitConfigLow — fielddata breaker < 40%
2. Disk.WatermarkBreached — disk % above configured watermark.low vs CMS
3. Disk.WatermarkConfigLow — watermark.low < 5%
4. Disk.WatermarkConfigHigh — watermark.low > 90%
5. Index.ZeroReplicas — business indices with replicas=0
6. ThreadPool.SearchRejected — pool rejected > 0 (P1 snapshot; catalog P0 is rate+sustained).
ThreadPool.WriteRejected — P1 unless JVMMemory.GCTimeRatioTooHigh also fires → promoted to P0 (ingest+GC co-headline).
Args:
metrics: CMS bundle including NodeDiskUtilization for % watermark breach; None skips breach.
instance_id: Instance under diagnosis for endpoint consistency.
connect_timeout: Passed to curl ``--connect-timeout`` (TCP connect phase).
read_timeout: Transfer budget after connect; curl ``-m`` is set to ``connect_timeout + read_timeout``
(total operation ceiling; the defaults 5 + 10 reproduce the previous fixed 15 s limit).
Returns:
List of Finding objects for configuration issues.
"""
import os
endpoint = os.environ.get("ES_ENDPOINT", "").strip()
password = os.environ.get("ES_PASSWORD", "").strip()
if not endpoint or not password:
return []
endpoint_check = _validate_endpoint_consistency(endpoint, instance_id)
if endpoint_check == "skip":
return []
username = os.environ.get("ES_USERNAME", "elastic").strip() or "elastic"
endpoint_url = endpoint if endpoint.startswith(("http://", "https://")) else f"http://{endpoint}"
if not _validate_credentials_safe(username, password):
return []
ct = float(connect_timeout)
rt = float(read_timeout)
max_time = ct + rt
subprocess_timeout = max_time + 5.0
_ct_s, _mt_s = f"{ct:g}", f"{max_time:g}"
def _curl_json(path: str, method: str = "GET", body: Optional[str] = None) -> Any:  # _cat endpoints return a JSON list
"""
Call Elasticsearch HTTP API via curl argv list (not shell=True).
Credentials are pre-validated for unsafe characters; argv list avoids shell injection.
"""
import json as _json
import subprocess as _subprocess
# argv list — no shell interpretation
cmd = [
"curl", "-sS",
"--connect-timeout", _ct_s,
"-m", _mt_s,
"-u", f"{username}:{password}",
"-X", method.upper(),
f"{endpoint_url}{path}",
]
if body is not None:
cmd += ["-H", "Content-Type: application/json", "-d", body]
try:
proc = _subprocess.run(
cmd, capture_output=True, text=True, check=False, timeout=subprocess_timeout
)
except _subprocess.TimeoutExpired:
# Do not propagate TimeoutExpired — str(e) includes argv and leaks -u credentials.
raise RuntimeError(
f"request timed out (curl budget connect+read={max_time}s, subprocess cap={subprocess_timeout}s)"
) from None
if proc.returncode != 0:
raise RuntimeError((proc.stderr or proc.stdout or "").strip())
txt = (proc.stdout or "").strip()
if not txt:
return {}
return _json.loads(txt)
cluster_status: Optional[str] = None # green/yellow/red
thread_pool_has_rejected = False # reserved for future use
try:
health_resp = _curl_json("/_cluster/health")
cluster_status = health_resp.get("status", "").lower()
except Exception as e:
msg = str(e)
if "-u" in msg or "elastic:" in msg:
msg = "(request failed; details omitted to avoid leaking credentials in logs)"
print(
f"⚠️ [ES config check] Cannot reach Elasticsearch at {endpoint_url}: {msg}\n"
f" Skipping optional config checks (circuit breakers / watermarks / zero-replica).",
file=sys.stderr,
)
return []
findings: List[Finding] = []
try:
raw_settings = _curl_json("/_cluster/settings?include_defaults=true")
findings.extend(_check_fielddata_breaker(raw_settings))
findings.extend(_check_disk_watermark_config(raw_settings, metrics, curl_fn=_curl_json))
except Exception as e:
print(f"⚠️ [ES config check] Failed to read circuit breaker / watermark settings: {e}", file=sys.stderr)
findings.extend(_check_zero_replicas(_curl_json))
findings.extend(_check_index_readonly_blocks(_curl_json))
findings.extend(_check_thread_pool_rejected(_curl_json))
return findings
# ---------------------------------------------------------------------------
# Rule engine — merge & order
# ---------------------------------------------------------------------------
def _promote_write_rejected_if_old_gc_wallclock_p0(findings: List[Finding]) -> None:
    """
    When Old GC wall-clock share fires as P0 (JVMMemory.GCTimeRatioTooHigh), promote
    engine ThreadPool.WriteRejected from P1 → P0 so severity bands can match ingest+GC
    dual-P0 narratives. Co-occurrence is not causation — confirm with hot_threads / _tasks.
    """
    codes = {f.reason_code for f in findings}
    if "JVMMemory.GCTimeRatioTooHigh" not in codes:
        return
    extra = (
        " Co-occurs with JVMMemory.GCTimeRatioTooHigh (Old GC wall-clock share, P0): "
        "use dual P0 headlines or chain write/bulk overload → merges / heap → Old GC → CPU spikes — "
        "not a GC-tuning-only narrative. "
        "Co-occurrence does not prove ingest caused GC — confirm with hot_threads / _tasks."
    )
    for i, f in enumerate(findings):
        if f.reason_code != "ThreadPool.WriteRejected" or f.priority != "P1":
            continue
        findings[i] = Finding(
            event_code=f.event_code,
            reason_code=f.reason_code,
            name=f.name,
            priority="P0",
            category=f.category,
            description=f.description + extra,
            evidence=f.evidence,
            remediation=f.remediation,
        )
def _finding_sort_key(f: Finding) -> tuple:
    """Priority first; within the same priority, thread-pool saturation before JVM memory (write/search narrative before GC-only)."""
    pri = {"P0": 0, "P1": 1, "P2": 2}.get(f.priority, 9)
    if f.reason_code in ("ThreadPool.WriteRejected", "ThreadPool.SearchRejected"):
        band = 0
    elif f.reason_code.startswith("JVMMemory."):
        band = 1
    else:
        band = 2
    return (pri, band, f.reason_code)
def _apply_rules(
    status_info: Dict,
    metrics: Dict[str, List[Dict]],
    events: List[Dict],
    logs: List[Dict],
    instance_id: Optional[str] = None,
    *,
    es_curl_connect_timeout: float = 5.0,
    es_curl_read_timeout: float = 10.0,
) -> List[Finding]:
    findings: List[Finding] = []
    findings += _check_instance_status(status_info)
    findings += _check_cluster_health(metrics)
    findings += _check_cpu(metrics)
    findings += _check_memory(metrics)
    findings += _check_disk(metrics)
    findings += _check_resource_imbalance(metrics)
    findings += _check_events(events)
    findings += _check_logs(logs)
    findings += _check_cluster_config_optional(
        metrics,
        instance_id=instance_id,
        connect_timeout=es_curl_connect_timeout,
        read_timeout=es_curl_read_timeout,
    )
    # Keep only the first finding per reason code, preserving check order.
    seen: set = set()
    deduped = []
    for f in findings:
        if f.reason_code not in seen:
            seen.add(f.reason_code)
            deduped.append(f)
    _promote_write_rejected_if_old_gc_wallclock_p0(deduped)
    deduped.sort(key=_finding_sort_key)
    return deduped
def _must_engine_footer_lines(findings: List[Finding]) -> List[str]:
    """
    Deduped engine REST paths for SKILL.md section 5 (MUST table) rows implied by findings.
    Does not replace running curl — reminds the operator/agent after environment reachability checks.
    """
    codes = {f.reason_code for f in findings}
    out: List[str] = []

    def add(path: str) -> None:
        if path not in out:
            out.append(path)

    if codes & {"Cluster.StatusRed", "Cluster.StatusYellow"}:
        add("POST /_cluster/allocation/explain {}")
        add("GET /_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason")
    if codes & {"ThreadPool.SearchRejected", "ThreadPool.WriteRejected"}:
        add("GET /_nodes/hot_threads?threads=3")
    if codes & {"CPU.PersistUsageHigh", "CPU.PeakUsageHigh"}:
        add("GET /_nodes/hot_threads?threads=3")
        add("GET /_tasks?detailed=true&actions=*search*")
    if codes & {
        "JVMMemory.OldGenUsageCritical",
        "JVMMemory.OldGenUsageHigh",
        "JVMMemory.GCTimeRatioTooHigh",
        "JVMMemory.GCRateTooHigh",
        "JVMMemory.OOM",
    }:
        add("GET /_nodes/stats/breaker?pretty")
    if "JVMMemory.BreakerLimitConfigLow" in codes:
        add("GET /_cluster/settings?include_defaults=true # transient + persistent: indices.breaker.*")
        add("GET /_nodes/stats/breaker?pretty")
    if codes & {
        "Balancing.NodeCPUUnbalanced",
        "Balancing.NodeDiskUnbalanced",
        "Balancing.NodeMemoryUnbalanced",
    }:
        add("GET /_cat/shards?v&h=index,shard,prirep,state,node,docs,store")
        add("GET /_cat/allocation?v")
    if codes & {
        "Disk.IndexReadOnly",
        "Disk.WatermarkBreached",
        "Disk.WatermarkAbsoluteFloodBreached",
        "Disk.WatermarkAbsoluteFloodMarginLow",
        "Disk.WatermarkAbsoluteValue",
    }:
        add("GET /_cluster/settings?include_defaults=true&filter_path=**.watermark")
        add("GET /_all/_settings?filter_path=*.settings.index.blocks")
    return out
# ---------------------------------------------------------------------------
# Report
# ---------------------------------------------------------------------------
def _print_report(
    instance_id: str,
    status_info: Dict,
    findings: List[Finding],
    metrics: Dict[str, List[Dict]],
    begin_ms: int,
    end_ms: int,
):
    W = 82
    print(f"\n{'═' * W}")
    print(f" Elasticsearch instance health report")
    print(f"{'═' * W}")
    print(f" Instance ID : {instance_id}")
    if status_info and "_error" not in status_info:
        status = status_info.get("status", "-")
        from_v = {
            "active": "active (running)",
            "activating": "activating (change in progress)",
            "inactive": "inactive (frozen)",
            "invalid": "invalid (unavailable)",
        }
        print(f" Instance name : {status_info.get('name', '-')}")
        print(f" Control status : {from_v.get(status, status)}")
        print(f" ES version : {status_info.get('es_version', '-')}")
        print(f" Node count : {status_info.get('node_count', '-')}")
        print(f" Last updated : {status_info.get('updated_at', '-')}")
    print(f" Analysis window : {ms_to_str(begin_ms)} ~ {ms_to_str(end_ms)}")
    print(f" Report time : {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    # ── Summary ──
    print(f"\n{'─' * W}")
    if not findings:
        print(f" ✅ No findings — instance appears healthy.")
    else:
        p0 = [f for f in findings if f.priority == "P0"]
        p1 = [f for f in findings if f.priority == "P1"]
        p2 = [f for f in findings if f.priority == "P2"]
        print(f" 📊 {len(findings)} finding(s): "
              f"🔴 P0×{len(p0)} 🟡 P1×{len(p1)} 🔵 P2×{len(p2)}")
    # ── Findings ──
    for i, f in enumerate(findings, 1):
        icon = PRIORITY_ICON.get(f.priority, "⚪")
        print(f"\n{'─' * W}")
        print(f" [{i}] {icon} {f.priority} | {f.category} | {f.name}")
        print(f" Event code : {f.event_code}")
        print(f" Reason code : {f.reason_code}")
        print(f" Description : {f.description}")
        if f.evidence:
            print(f" Evidence :")
            for k, v in f.evidence.items():
                if isinstance(v, list):
                    for item in v:
                        print(f" - {item}")
                else:
                    print(f" {k}: {v}")
        if f.remediation:
            print(f" Remediation :")
            for r in f.remediation:
                print(f" • {r}")
    must_paths = _must_engine_footer_lines(findings)
    if must_paths:
        print(f"\n{'─' * W}")
        print(
            " Engine API checklist (SKILL.md sections 5–7); run after ES endpoint is reachable:"
        )
        for p in must_paths:
            print(f" • {p}")
        print(f"{'─' * W}")
    # Read-heavy path: do not let GC/CPU findings imply search pool was ruled out.
    # Skip when write-path rejection is present — bulk/write-primary diagnosis should follow
    # sop-write-performance §2, not a search-pool headline.
    # See references/sop-query-thread-pool.md (Report narrative: search pool vs GC / CPU headlines).
    rc = {f.reason_code for f in findings}
    if (
        "ThreadPool.SearchRejected" not in rc
        and "ThreadPool.WriteRejected" not in rc
        and (
            "JVMMemory.GCTimeRatioTooHigh" in rc
            or "JVMMemory.GCRateTooHigh" in rc
            or "CPU.PeakUsageHigh" in rc
            or "CPU.PersistUsageHigh" in rc
        )
    ):
        print(f"\n{'─' * W}")
        print(" Narrative note (read-heavy / search-pool overload):")
        print(" ThreadPool.SearchRejected was not in this run's findings. If query load may")
        print(" exceed search pool capacity, verify GET /_nodes/stats/thread_pool")
        print(" (search.rejected / queue) on a stable path; GC/CPU may be co-stress.")
        print(" See references/sop-query-thread-pool.md (Report narrative: search pool vs GC / CPU headlines).")
        print(f"{'─' * W}")
    # ── Health grade ──
    print(f"\n{'─' * W}")
    grade, grade_icon, grade_desc = _compute_security_score(findings)
    print(f" Health grade {grade_icon} {grade} — {grade_desc}")
    print(f"{'─' * W}")
    # ── Metric summary ──
    print(f"\n{'═' * W}")
    print(f" Key metrics (avg / max / min in window)")
    print(f"{'─' * W}")
    DISPLAY_METRICS = [
        ("ClusterStatus", "Cluster health (CMS) "),
        ("ClusterDisconnectedNodeCount", "Disconnected nodes "),
        ("NodeCPUUtilization", "Node CPU (%) "),
        ("NodeHeapMemoryUtilization", "Node heap (%) "),
        ("NodeDiskUtilization", "Node disk (%) "),
        ("NodeStatsDataDiskUtil", "Node disk IO util (%) "),
        ("NodeLoad_1m", "Node load_1m "),
        ("JVMGCOldCollectionCount", "Old GC count "),
        ("JVMGCOldCollectionDuration", "Old GC duration (ms) "),
    ]
    STATUS_MAP = {0: "Green", 1: "Yellow", 2: "Red"}
    for metric_name, label in DISPLAY_METRICS:
        dps = metrics.get(metric_name, [])
        if not dps:
            continue
        summary = _summarize(dps, metric_name)
        print(f"\n {label}")
        for node, stat in summary.items():
            avg_v = stat["avg"]
            if metric_name == "ClusterStatus":
                val_str = f"avg={STATUS_MAP.get(int(round(avg_v)), str(int(round(avg_v))))} max={STATUS_MAP.get(int(round(stat['max'])), str(int(round(stat['max']))))}"
            else:
                val_str = f"avg={avg_v:.2f} max={stat['max']:.2f} min={stat['min']:.2f}"
            print(f" {node:<50} {val_str}")
    print(f"\n{'═' * W}")
    print(f" Check complete")
    print(f"{'═' * W}\n")
# ---------------------------------------------------------------------------
# CLI entry
# ---------------------------------------------------------------------------
def check(
    instance_id: str,
    region_id: str,
    window_min: int = 60,
    profile: Optional[str] = None,
    data_source: str = "auto",
    input_bundle: Optional[Dict[str, Any]] = None,
    *,
    connect_timeout: float = 5.0,
    read_timeout: float = 10.0,
):
    end_ms = now_ms()
    begin_ms = ago_ms(window_min)
    period = _cms_period_for_window(window_min)
    print(f"\n🔍 Health check: {instance_id} (region: {region_id})")
    print(f" Window: {ms_to_str(begin_ms)} ~ {ms_to_str(end_ms)} (last {window_min} minutes)")
    print(f" CMS period: {period}s ({int(period)//60} minute(s) bucket)")
    print(f" Data source: {data_source}")
    if profile:
        print(f" CLI profile: {profile}")
    print(f" Collecting status / metrics / events / logs...\n")
    bundle = input_bundle or {}
    injected_status = bundle.get("status_info")
    injected_metrics = bundle.get("metrics")
    injected_events = bundle.get("events")
    injected_logs = bundle.get("logs")
    use_input_only = data_source == "input"
    use_cli_only = data_source == "cli"
    # For each data kind: prefer injected data unless CLI-only; never call the CLI in input-only mode.
    if isinstance(injected_status, dict) and not use_cli_only:
        status_info = injected_status
    elif use_input_only:
        status_info = {}
    else:
        status_info = fetch_instance_status_info(instance_id, region_id, profile=profile)
    if isinstance(injected_metrics, dict) and not use_cli_only:
        metrics = injected_metrics
    elif use_input_only:
        metrics = {}
    else:
        metrics = fetch_metrics_batch(
            instance_id, region_id, _DIAG_METRICS, begin_ms, end_ms, period, profile=profile
        )
    if isinstance(injected_events, list) and not use_cli_only:
        events = injected_events
    elif use_input_only:
        events = []
    else:
        events = fetch_events(instance_id, region_id, begin_ms, end_ms, profile=profile)
    if isinstance(injected_logs, list) and not use_cli_only:
        logs = injected_logs
    elif use_input_only:
        logs = []
    else:
        logs = _fetch_error_logs(instance_id, region_id, begin_ms, end_ms, profile=profile)
    if data_source != "input":
        _check_metrics_sparse(metrics, window_min, period)
    findings = _apply_rules(
        status_info,
        metrics,
        events,
        logs,
        instance_id=instance_id,
        es_curl_connect_timeout=connect_timeout,
        es_curl_read_timeout=read_timeout,
    )
    _print_report(instance_id, status_info, findings, metrics, begin_ms, end_ms)
def main():
    parser = argparse.ArgumentParser(
        description="Alibaba Cloud Elasticsearch instance health check (rules baseline 20260318).",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Coverage (event / reason codes):
  Control plane ManagementPlane.ActivatingStuck   Long-running activating
                ManagementPlane.EventStuck        System event stuck Executing
                ManagementPlane.InstanceInactive  Instance frozen
                ManagementPlane.InstanceInvalid   Instance invalid
  Cluster       HealthCheck.ClusterUnhealthy
                  Cluster.StatusRed               Cluster Red (P0)
                  Cluster.StatusYellow            Cluster Yellow (P1)
                  Node.Disconnected               Node disconnected (P0)
                  Cluster.UnavailableShards       Primary unavailable [logs] (P0)
  CPU / load    HealthCheck.CPULoadHigh
                  CPU.PersistUsageHigh            Sustained CPU high >70% (P0) / >60% (P1)
                  CPU.PeakUsageHigh               CPU peak >95% (P0) / 80–94% (P1)
                HealthCheck.LoadUnbalanced
                  Balancing.NodeCPUUnbalanced     CPU imbalance CV>0.3 (P1)
                  Balancing.NodeDiskUnbalanced    Disk imbalance CV>0.3 (P0/P1/P2)
                  Balancing.NodeMemoryUnbalanced  Memory imbalance CV>0.3 (P1)
  JVM / GC      HealthCheck.JVMMemoryPressure
                  JVMMemory.OldGenUsageCritical   Heap >85% (P0)
                  JVMMemory.OldGenUsageHigh       Heap >75% (P1)
                  JVMMemory.GCTimeRatioTooHigh    GC time ratio >10% (P0)
                  JVMMemory.GCRateTooHigh         Old GC >1/min (P1)
                  JVMMemory.OOM                   OutOfMemoryError in logs (P0)
  Disk          HealthCheck.DiskUsageHigh
                  Disk.UsageCritical              Disk >85% (P0)
                  Disk.UsageHigh                  Disk >75% (P1)
                HealthCheck.DiskIOBottleneck
                  Disk.IOPerformancePoor          Disk IO util >90% (P0)
Examples:
  python3 check_es_instance_health.py -i es-cn-xxx -r cn-hangzhou
  python3 check_es_instance_health.py -i es-cn-xxx -r cn-hangzhou --window 120
  python3 check_es_instance_health.py -i es-cn-xxx -r cn-hangzhou --profile prod
  python3 check_es_instance_health.py -i es-cn-xxx -r cn-hangzhou --connect-timeout 3 --read-timeout 10
  python3 check_es_instance_health.py -i es-cn-xxx -r cn-hangzhou --data-source input --input-json-file /tmp/diag.json
Credentials:
  Prefer `aliyun configure` profiles; pass --profile when needed.
""",
    )
    parser.add_argument("--instance-id", "-i", required=True, help="Elasticsearch instance ID")
    parser.add_argument("--region-id", "-r", required=True, help="Region ID, e.g. cn-hangzhou")
    parser.add_argument(
        "--window", "-w", type=int, default=60,
        help=(
            "Analysis window in minutes (default 60). "
            "For window ≤30 minutes, CMS period=60s; for >30 minutes, period=300s "
            "(avoids empty CMS responses at 60s retention boundaries)."
        ),
    )
    parser.add_argument("--profile", default=None, help="Aliyun CLI profile name (default profile if omitted)")
    parser.add_argument(
        "--data-source",
        choices=["auto", "cli", "input"],
        default="auto",
        help="auto: use injected bundle first, then CLI; cli: CLI only; input: injected bundle only",
    )
    parser.add_argument("--input-json", default=None, help="Injected diagnostic JSON string")
    parser.add_argument("--input-json-file", default=None, help="Path to injected diagnostic JSON file")
    parser.add_argument(
        "--connect-timeout",
        type=float,
        default=5.0,
        metavar="SEC",
        help=(
            "Elasticsearch engine probes (optional curl to ES_*): max seconds for TCP connect "
            "(curl --connect-timeout). Default 5."
        ),
    )
    parser.add_argument(
        "--read-timeout",
        type=float,
        default=10.0,
        metavar="SEC",
        help=(
            "Elasticsearch engine probes: transfer time budget in seconds; curl -m uses "
            "connect-timeout + read-timeout (total operation ceiling). Default 10 (15s total with defaults)."
        ),
    )
    args = parser.parse_args()
    if args.connect_timeout <= 0 or args.read_timeout <= 0:
        parser.error("--connect-timeout and --read-timeout must be positive")
    input_bundle = _load_json_bundle(args.input_json, args.input_json_file)
    check(
        args.instance_id,
        args.region_id,
        args.window,
        profile=args.profile,
        data_source=args.data_source,
        input_bundle=input_bundle,
        connect_timeout=args.connect_timeout,
        read_timeout=args.read_timeout,
    )


if __name__ == "__main__":
    main()
FILE:scripts/openapi_cli_collect.py
#!/usr/bin/env python3
"""
OpenAPI data collection via Aliyun CLI (shared by check_es_instance_health.py).
Design goals:
- Do not rely on OPENAPI_* env vars; prefer `aliyun configure` profiles.
- Preserve response shapes compatible with legacy get_es_instance_*.py scripts so rules do not regress.
"""
import json
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional
NAMESPACE = "acs_elasticsearch"
# Metric definitions required for health diagnosis (field names aligned with legacy collectors)
METRIC_DEFINITIONS: Dict[str, Dict[str, str]] = {
    "ClusterStatus": {"group_field": "clusterId", "value_field": "Value"},
    "ClusterDisconnectedNodeCount": {"group_field": "clusterId", "value_field": "Maximum"},
    "ClusterNodeCount": {"group_field": "clusterId", "value_field": "Maximum"},
    "ClusterShardCount": {"group_field": "clusterId", "value_field": "Maximum"},
    "ClusterQueryQPS": {"group_field": "clusterId", "value_field": "Average"},
    "ClusterIndexQPS": {"group_field": "clusterId", "value_field": "Average"},
    "NodeCPUUtilization": {"group_field": "nodeIP", "value_field": "Average"},
    "NodeHeapMemoryUtilization": {"group_field": "nodeIP", "value_field": "Average"},
    "NodeDiskUtilization": {"group_field": "nodeIP", "value_field": "Average"},
    "NodeFreeStorageSpace": {"group_field": "nodeIP", "value_field": "Minimum"},
    "NodeLoad_1m": {"group_field": "nodeIP", "value_field": "Average"},
    "NodeStatsDataDiskUtil": {"group_field": "nodeIP", "value_field": "Maximum"},
    "JVMGCOldCollectionCount": {"group_field": "nodeIP", "value_field": "Maximum"},
    "JVMGCOldCollectionDuration": {"group_field": "nodeIP", "value_field": "Maximum"},
}
def _run_aliyun_json(
    product: str,
    action: str,
    params: Dict[str, Any],
    region_id: Optional[str] = None,
    profile: Optional[str] = None,
    timeout_sec: int = 30,
) -> Dict[str, Any]:
    """
    Invoke Aliyun CLI and parse JSON.
    Raises RuntimeError on failure; callers may degrade.
    Security: subprocess.run uses argv list mode (no shell=True). Arguments are passed
    as discrete list elements and are not interpreted by a shell. Values are validated
    to reject characters that could be abused for injection.
    """
    env = os.environ.copy()
    if not env.get("ALIBABA_CLOUD_USER_AGENT"):
        env["ALIBABA_CLOUD_USER_AGENT"] = "AlibabaCloud-Agent-Skills"
    _DANGEROUS_CHARS = set('`$()|;&<>\n\r')
    _JSON_LITERAL_PARAM_KEYS = frozenset({"Dimensions"})

    def _is_safe(val: str) -> bool:
        return not any(c in _DANGEROUS_CHARS for c in str(val))

    # Validate every parameter except JSON-literal ones (e.g. Dimensions) before building argv.
    for k, v in params.items():
        if v is None:
            continue
        if k in _JSON_LITERAL_PARAM_KEYS:
            continue
        if not _is_safe(str(v)):
            raise RuntimeError(
                f"aliyun {product} {action}: parameter {k!r} contains unsafe characters; refused"
            )
    cmd: List[str] = ["aliyun"]
    if profile:
        if not _is_safe(profile):
            raise RuntimeError("profile name contains unsafe characters")
        cmd += ["--profile", profile]
    cmd += [product, action]
    if region_id:
        if not _is_safe(region_id):
            raise RuntimeError("region_id contains unsafe characters")
        cmd += ["--region", region_id]
    for k, v in params.items():
        if v is None:
            continue
        cmd += [f"--{k}", str(v)]
    proc = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        check=False,
        timeout=timeout_sec,
        env=env,
    )
    if proc.returncode != 0:
        stderr = (proc.stderr or "").strip()
        raise RuntimeError(f"aliyun {product} {action} failed: {stderr or proc.stdout.strip()}")
    try:
        return json.loads(proc.stdout or "{}")
    except json.JSONDecodeError as e:
        raise RuntimeError(f"aliyun {product} {action} returned non-JSON: {e}")
def normalize_datapoints(datapoints: List[Dict], metric_def: Dict[str, str]) -> Dict[str, List[Dict]]:
    """Group CMS datapoints by group_field and normalize value/timestamp fields."""
    group_field = metric_def.get("group_field", "nodeIP")
    value_field = metric_def.get("value_field", "Average")
    grouped: Dict[str, List[Dict]] = {}
    for dp in datapoints or []:
        key = dp.get(group_field) or dp.get("nodeIP") or dp.get("clusterId") or "UNKNOWN"
        value = dp.get(value_field)
        if value is None:
            # Fall back through the other statistic fields when the preferred one is absent.
            value = dp.get("Value", dp.get("Average", dp.get("Maximum", dp.get("Minimum"))))
        ts = dp.get("timestamp", dp.get("Timestamp", 0))
        try:
            value = float(value) if value is not None else None
        except (TypeError, ValueError):
            value = None
        grouped.setdefault(str(key), []).append({"timestamp": int(ts or 0), "value": value, "raw": dp})
    return grouped
def fetch_instance_status_info(instance_id: str, region_id: str, profile: Optional[str] = None) -> Dict[str, Any]:
    """Replacement for get_es_instance_status.fetch_instance_status_info."""
    try:
        data = _run_aliyun_json(
            "elasticsearch",
            "DescribeInstance",
            {"InstanceId": instance_id},
            region_id=region_id,
            profile=profile,
            timeout_sec=30,
        )
        r = data.get("Result", {})
        return {
            "name": r.get("description", "-"),
            "status": r.get("status", "-"),
            "es_version": r.get("esVersion", "-"),
            "updated_at": r.get("updatedAt", "") or "",
            "created_at": r.get("createdAt", "") or "",
            "node_count": r.get("nodeAmount", 0) or 0,
            "protocol": r.get("protocol", "-"),
        }
    except Exception as e:
        return {"_error": str(e)}
def fetch_instance_detail(instance_id: str, region_id: str, profile: Optional[str] = None) -> Dict[str, Any]:
    """Full DescribeInstance payload (clusterTasks, etc.) — replaces get_es_instance_detail.py."""
    return _run_aliyun_json(
        "elasticsearch",
        "DescribeInstance",
        {"InstanceId": instance_id},
        region_id=region_id,
        profile=profile,
        timeout_sec=30,
    ).get("Result", {})
def _fetch_metric_points(
    instance_id: str,
    region_id: str,
    metric_name: str,
    begin_ms: int,
    end_ms: int,
    period: str,
    profile: Optional[str] = None,
) -> List[Dict]:
    """Fetch one metric with DescribeMetricList pagination (NextToken)."""
    points: List[Dict] = []
    next_token: Optional[str] = None
    max_pages = 20
    # CMS DescribeMetricList expects millisecond timestamps
    start_ms = str(int(begin_ms))
    end_ms_str = str(int(end_ms))
    for _ in range(max_pages):
        params = {
            "Namespace": NAMESPACE,
            "MetricName": metric_name,
            "Dimensions": json.dumps([{"clusterId": instance_id}], ensure_ascii=False),
            "StartTime": start_ms,
            "EndTime": end_ms_str,
            "Period": period,
            "Length": 1440,
            "NextToken": next_token,
        }
        resp = _run_aliyun_json("cms", "DescribeMetricList", params, region_id=region_id, profile=profile)
        # Datapoints may arrive as a JSON-encoded string or as an already-parsed list.
        raw = resp.get("Datapoints", "[]")
        batch = json.loads(raw) if isinstance(raw, str) else (raw or [])
        if isinstance(batch, list):
            points.extend(batch)
        next_token = resp.get("NextToken")
        if not next_token:
            break
    return points
def fetch_metrics_batch(
    instance_id: str,
    region_id: str,
    metric_names: List[str],
    begin_ms: int,
    end_ms: int,
    period: str = "300",
    profile: Optional[str] = None,
) -> Dict[str, List[Dict]]:
    """Replacement for get_es_instance_metrics.fetch_metrics_batch."""
    result: Dict[str, List[Dict]] = {}
    with ThreadPoolExecutor(max_workers=min(8, max(1, len(metric_names)))) as ex:
        fut_map = {
            ex.submit(
                _fetch_metric_points,
                instance_id,
                region_id,
                m,
                begin_ms,
                end_ms,
                period,
                profile,
            ): m
            for m in metric_names
        }
        for fut in as_completed(fut_map):
            m = fut_map[fut]
            try:
                result[m] = fut.result()
            except Exception:
                # Degrade to an empty series so one failed metric does not abort the batch.
                result[m] = []
    return result
def fetch_events(
    instance_id: str,
    region_id: str,
    begin_ms: int,
    end_ms: int,
    profile: Optional[str] = None,
) -> List[Dict]:
    """Replacement for get_es_instance_events.fetch_events — list of normalized dicts."""
    events: List[Dict] = []
    page = 1
    page_size = 100
    start_ms = str(int(begin_ms))
    end_ms_str = str(int(end_ms))
    while True:
        resp = _run_aliyun_json(
            "cms",
            "DescribeSystemEventAttribute",
            {
                "Product": "elasticsearch",
                "SearchKeywords": instance_id,
                "StartTime": start_ms,
                "EndTime": end_ms_str,
                "PageNumber": page,
                "PageSize": page_size,
            },
            region_id=region_id,
            profile=profile,
        )
        wrapper = (resp.get("SystemEvents") or {}).get("SystemEvent") or []
        if not wrapper:
            break
        for e in wrapper:
            content_raw = e.get("Content")
            content = {}
            if isinstance(content_raw, str):
                try:
                    content = json.loads(content_raw)
                except json.JSONDecodeError:
                    content = {}
            events.append(
                {
                    "name": e.get("Name", ""),
                    "time_ms": int(e.get("Time", 0) or 0),
                    "level": e.get("Level", ""),
                    "event_status": content.get("eventStatus", ""),
                    "reason": content.get("reasonCode", content.get("reason", "")),
                    "start_time": content.get("executeStartTime", ""),
                    "finish_time": content.get("executeFinishTime", ""),
                }
            )
        # A short page means this was the last one.
        if len(wrapper) < page_size:
            break
        page += 1
    events.sort(key=lambda x: x.get("time_ms", 0), reverse=True)
    return events
def fetch_log_items(
    instance_id: str,
    region_id: str,
    log_type: str = "INSTANCELOG",
    begin_ms: Optional[int] = None,
    end_ms: Optional[int] = None,
    query: str = "*",
    size: int = 50,
    profile: Optional[str] = None,
) -> List[Dict]:
    """Replacement for get_es_instance_log.fetch_log_items."""
    now = datetime.now()
    if begin_ms is None:
        begin_ms = int((now - timedelta(minutes=10)).timestamp() * 1000)
    if end_ms is None:
        end_ms = int(now.timestamp() * 1000)
    resp = _run_aliyun_json(
        "elasticsearch",
        "ListSearchLog",
        {
            "InstanceId": instance_id,
            "type": log_type,
            "query": query or "*",
            "beginTime": int(begin_ms),
            "endTime": int(end_ms),
            "page": 1,
            "size": max(1, min(int(size), 50)),
        },
        region_id=region_id,
        profile=profile,
    )
    return resp.get("Result") or []
Diagnose Alibaba Cloud ECS GPU instances to detect GPU device status, driver issues, and hardware failures. Use this Skill when users report GPU instance ano...
---
name: alibabacloud-ecs-gpu-diagnosis
description: >
  Diagnose Alibaba Cloud ECS GPU instances to detect GPU device status, driver issues, and hardware failures.
  Use this Skill when users report GPU instance anomalies, deep learning task failures, GPU not visible,
  or when troubleshooting GPU hardware issues. Supports automatic Alibaba Cloud CLI installation,
  diagnosis report creation, and polling for diagnosis results.
license: Apache-2.0
compatibility: >
  Requires Alibaba Cloud CLI (aliyun) installed and configured with AccessKey.
  Supported regions: cn-hangzhou, cn-shanghai, cn-beijing, cn-shenzhen, etc.
metadata:
  domain: aiops
  owner: ecs-team
  contact: [email protected]
---
## Usage Instructions

Initiate diagnosis on a specified ECS GPU instance to detect GPU device status and output diagnosis results.

## Execution Constraints

- All steps MUST be executed in order; skipping steps is NOT permitted
- Each step MUST be verified as successful before proceeding to the next
- Inform the user of the current step being executed
- If any step fails, user confirmation MUST be obtained before continuing

### Prerequisites

1. **Check Alibaba Cloud CLI Environment**
   - Execute `which aliyun` or `aliyun --version` to check if CLI is installed
   - If not installed, inform the user that Alibaba Cloud CLI needs to be installed and provide installation guidance from `references/cli-installation.md`:
     - macOS: Homebrew installation or manual installation (Intel/Apple Silicon)
     - Linux: Download installation package for corresponding architecture (x86_64/ARM64)
     - Windows: Download installation package and configure PATH, or use PowerShell installation
   - After installation, run `aliyun version` to confirm version >= 3.0.299
   - Confirm CLI is configured with AccessKey: `aliyun configure`
   - **Permission Reminder**: Remind the user that the current RAM user needs the permissions listed in `references/ram-policies.md` to execute GPU diagnosis.

2. **Obtain Required Parameters**
   - Check if `INSTANCE_ID` is provided (ECS instance ID; the format MUST match the regular expression `^i-[a-z0-9]{20}$`)
   - Check if `REGION_ID` is provided (region ID, such as cn-shanghai)
   - If either parameter is missing, ask the user:
     - "Please provide the ECS instance ID to diagnose (format: i-bp1xxxxx)"
     - "Please provide the region ID where the instance is located (e.g., cn-shanghai, cn-hangzhou)"

3. **Validate Parameters**
   - **Validate INSTANCE_ID format**: Check if `INSTANCE_ID` matches the regex pattern `^i-[a-z0-9]{20}$`
     - If validation fails, inform the user: "Invalid instance ID format. Instance ID must match the pattern ^i-[a-z0-9]{20}$"
   - **Validate REGION_ID**: Query available regions using the DescribeRegions API to verify the region is valid:
     ```bash
     aliyun ecs DescribeRegions --user-agent AlibabaCloud-Agent-Skills
     ```
     - Extract the `Regions.Region[].RegionId` list from the response
     - Check if the provided `REGION_ID` exists in the list
     - If the region is invalid, inform the user: "Invalid region ID. Please provide a valid region ID from the available regions list."

4. **Check Instance Operating System Type**
   - Before creating a diagnosis report, query instance information to confirm the OS type:
     ```bash
     aliyun ecs DescribeInstances --user-agent AlibabaCloud-Agent-Skills --RegionId REGION_ID --InstanceIds '["INSTANCE_ID"]'
     ```
   - Extract the `Instances.Instance[0].OSType` field from the response
   - **If `OSType` is "linux"**: Continue with the subsequent diagnosis process
   - **If `OSType` is not "linux"**: Notify the user and terminate the process:
     ```
     The current instance INSTANCE_ID has operating system OSType. This Skill currently only supports Linux operating system instances, other operating systems are not supported. No further diagnosis process is needed.
     ```

### Execute Diagnosis

1. **Create Diagnostic Report**

   Use the following command to initiate GPU diagnosis:
   ```bash
   aliyun ecs CreateDiagnosticReport \
     --user-agent AlibabaCloud-Agent-Skills \
     --RegionId 'REGION_ID' \
     --ResourceId 'INSTANCE_ID' \
     --MetricSetId 'dms-instanceGPUdevice' \
     --output cols=ReportId
   ```
   Extract `ReportId` from the output and save it for subsequent queries.

2. **Poll Diagnostic Results**

   Use the following command to query the diagnosis report status:
   ```bash
   aliyun ecs DescribeDiagnosticReports \
     --user-agent AlibabaCloud-Agent-Skills \
     --RegionId 'REGION_ID' \
     --ReportIds.1 'REPORT_ID'
   ```
   Handle based on the returned `Status` field:
   - **Status = "Finished"**: Diagnosis complete, parse the `Issues` field content
     - If `Issues` is empty or does not exist, report "GPU diagnosis normal, no anomalies detected"
     - If `Issues` contains content, extract each Issue's `IssueId`, `MetricId`, `Severity`, and `MetricCategory`, and output diagnosis results and recommended actions according to the IssueId mapping table below
   - **Status = "InProgress"**: Diagnosis in progress, wait 5 seconds before querying again
   - **Status = "Failed"**: Diagnosis failed, report the failure status to the user

   Set a timeout mechanism: poll up to 60 times (approximately 5 minutes); if the diagnosis is still not complete, prompt the user to query manually later.

### Output Description

After diagnosis is complete, the output should include:
- Instance ID and region
- Diagnostic report ID
- GPU device status summary
- Discovered Issues (if any)
- Recommended remediation measures (inferred from Issues content)

### Diagnostic Result Analysis

The `Issues` returned in the diagnosis report is an array, where each Issue contains `IssueId`, `MetricId`, `Severity`, and `MetricCategory` fields.
Output diagnosis description and handling measures according to the IssueId mapping table below:

| IssueId | Diagnostic Description | Exception Handling Measures |
|---------|------------------------|----------------------------|
| GuestOS.GPU.MemoryEccCheckError | Detect GPU Double Bit Error conditions | Prompt user to restart instance based on error count |
| GuestOS.GPU.InfoRomCorrupted | Detect GPU infoROM firmware information | O&M notification will be sent to user |
| GuestOS.GPU.DriverVersionMismatch | Detect driver anomalies caused by Kernel upgrades | User needs to uninstall and reinstall driver |
| GuestOS.GPU.FabricmanagerCheck | Detect Fabricmanager component running status | User needs to install or start Fabricmanager component service |
| GuestOS.GPU.PowerCableError | Detect GPU power cable and power supply status | O&M notification will be sent to user |
| GuestOS.GPU.DeviceLost | Detect GPU card loss conditions | O&M notification will be sent to user |
| GuestOS.GPU.DriverNotInstalled | Detect GPU driver installation status | User needs to install driver |
| GuestOS.GPU.NVXidError | Detect GPU Xid error anomalies | Prompt user to restart instance based on different XID errors |
| GuestOS.GPU.RmInitAdapterError | Detect GPU card initialization anomalies, manifested as driver card loss | O&M notification will be sent to user |
| GuestOS.GPU.NVLinkError | Check GPU NVlink status | O&M notification will be sent to user |

**Output Format Example**:
```
Diagnosis Complete!
Instance: i-bp1xxxxxxxxx (cn-shanghai)
Report ID: dr-xxxxxxxx

1 anomaly found:

[1] GuestOS.GPU.DriverNotInstalled
    Severity: Warn
    Diagnostic Description: Detect GPU driver installation status
    Handling Measures: User needs to install driver

Diagnostic Recommendations:
- Please install the corresponding version of NVIDIA GPU driver
- Installation Guide: https://help.aliyun.com/document_detail/108460.html
```

**Special Reminder**: When the exception handling measure is "O&M notification will be sent to user", append the following reminder to the output:
```
⚠️ Important Reminder:
- Alibaba Cloud will send you O&M event notifications
- Please go to the ECS console to view event details
- Pay attention to whether you receive O&M events and handle them as required
```

If `Issues` is an empty array or does not exist, output:
```
Diagnosis Complete!
Instance: i-bp1xxxxxxxxx (cn-shanghai)
Report ID: dr-xxxxxxxx

GPU diagnosis normal, no anomalies detected.
```

### Edge Case Handling

- **Instance does not exist**: CLI will return an error, capture and inform the user that the instance ID may be incorrect
- **Region error**: Prompt user to confirm the region where the instance is located
- **Non-GPU specification**: If the instance is not a GPU specification, diagnosis may have no results, prompt user to confirm instance type
- **Insufficient permissions**: If permission error is returned, prompt user to check AccessKey permissions
- **Network timeout**: Set command execution timeout (recommended 30 seconds), retry after timeout or prompt user to check network

### Example Workflow

```
User: Help me diagnose this GPU server i-bp1xxxxxxxxx
Agent:
1. Check CLI is installed
2. Ask for region (user did not provide)
3. User replies: cn-shanghai
4. Check instance OS type is Linux
5. Execute CreateDiagnosticReport, get ReportId: dr-xxxxxxxx
6. Poll DescribeDiagnosticReports
7. Status=InProgress, wait 5 seconds...
8. Query again, Status=Finished
9. Output Issues content to user
```
FILE:references/cli-installation.md
# Alibaba Cloud CLI Installation Guide

## macOS

**Homebrew (Recommended):**
```bash
brew install aliyun-cli
```

**Manual Installation:**
```bash
# Intel chip
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/

# Apple Silicon (M1/M2/M3/M4)
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-arm64.tgz
tar -xzf aliyun-cli-macosx-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```

## Linux

```bash
# x86_64
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/

# ARM64
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```

## Windows

1. Download: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract to any directory (e.g., `C:\aliyun-cli`)
3. The directory MUST be added to the system PATH environment variable

**PowerShell Installation:**
```powershell
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
[Environment]::SetEnvironmentVariable("Path", $env:Path + ";C:\aliyun-cli", [System.EnvironmentVariableTarget]::Machine)
```

## Verify Installation

After installation, the following command MUST be executed to confirm successful installation:
```bash
aliyun version
```
A version number in the output indicates successful installation. The version MUST be >= 3.0.299 to support OAuth authentication.
After installation, the CLI MUST be updated to the latest version: ```bash # macOS (Homebrew) brew upgrade aliyun-cli # macOS (manual) / Linux: re-download the latest version and overwrite # Intel Mac wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz tar -xzf aliyun-cli-macosx-latest-amd64.tgz && sudo mv aliyun /usr/local/bin/ # Apple Silicon Mac wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-arm64.tgz tar -xzf aliyun-cli-macosx-latest-arm64.tgz && sudo mv aliyun /usr/local/bin/ # Linux x86_64 wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz tar -xzf aliyun-cli-linux-latest-amd64.tgz && sudo mv aliyun /usr/local/bin/ # Linux ARM64 wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz tar -xzf aliyun-cli-linux-latest-arm64.tgz && sudo mv aliyun /usr/local/bin/ # Windows PowerShell Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip" Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli -Force ``` After updating, `aliyun version` MUST be executed again to confirm the version has been updated. ## Common Issues **command not found:** You MUST verify that the directory containing `aliyun` has been added to PATH. ```bash which aliyun # macOS/Linux where aliyun # Windows ``` ## Reference - Official documentation: https://help.aliyun.com/zh/cli/install-cli FILE:references/ram-policies.md # RAM Permission List RAM permissions required for this Skill execution: ## Diagnostic Operation Permissions `ecs:CreateDiagnosticReport` — Create ECS instance diagnostic report `ecs:DescribeDiagnosticReports` — Query diagnostic report status and results ## Instance Query Permissions (for prerequisite checks) `ecs:DescribeInstances` — Query ECS instance basic information, verify instance existence
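Deep analysis and diagnosis of memory, network, IO, and load problems on Linux. Use it when any of the following occurs: memory (high or insufficient memory, OOM, oom-killer, Java application memory, and the memory panorama, including the socket-queue and TCP-memory views); network (packet loss and latency jitter, complementary to memgraph); IO (disk statistics and slow IO); load (scheduling latency and...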
---
name: alibabacloud-sysom-diagnosis
version: 1.14.0
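description: "Deep analysis and diagnosis of memory, network, IO, and load problems on Linux. Use it when any of the following occurs. Memory covers high or insufficient memory, OOM, oom-killer, Java application memory, and the memory panorama (including the socket-queue and TCP-memory views). Network covers packet loss and latency jitter (complementary to memgraph). IO covers disk statistics and slow IO. Load covers scheduling latency and system load. Emits a structured diagnostic envelope that automation can continue from."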
layer: application
category: os-ops
lifecycle: operations
tags:
- os-ops
- sysom
- diagnosis
- ecs
- memory
- oom
- oom-killer
- 内存不足
- 内核日志
- io
- network
- load
status: beta
---
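# SysOM Diagnosis (sysom-diagnosis)
Run diagnoses with **`./scripts/osops.sh`** from **`sysom-diagnosis/`** (the skill root). stdout is a **JSON envelope**. By default the tool runs a **quick local check without calling the cloud**; remote specialized diagnosis is triggered by **`--deep-diagnosis`** (memory domain) or by the **`io`/`net`/`load` subcommands**, and includes a built-in environment check.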
## Installation
> **Pre-check: Aliyun CLI >= 3.3.3 required**
> Run `aliyun version` to verify >= 3.3.3. If not installed or version too low,
> run `curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash` to update.
> Then [MUST] run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
> Then [MUST] run `aliyun plugin update` to ensure local plugins are up-to-date.
>
> At the **start** of the Core Workflow (before any CLI invocation):
> **[MUST] Enable AI-Mode** — AI-mode is required for Agent Skill execution.
> ```bash
> aliyun configure ai-mode enable
> aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-sysom-diagnosis"
> ```
>
> At the **end** of the workflow (after success or failure):
> **[MUST] Disable AI-Mode** — AI-mode must be disabled after skill execution completes.
> ```bash
> aliyun configure ai-mode disable
> ```
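## Agent Core Behavior
1. **Local first**: run the memory quick check first; do not ask for region/instance by default.
2. **Intent routing**:
- **Memory domain**: high usage or panorama → `memgraph`; OOM/oom-killer → `oom`; Java → `javamem`; unclear → `classify`. See [memory-routing.md](./references/memory-routing.md).
- **Non-memory domain**: IO/network/load → the matching `io`/`net`/`load` subcommand, which goes straight to remote specialized diagnosis. See [non-memory-routing.md](./references/non-memory-routing.md).
- **Hard constraint for remote specialized diagnosis**: every `--deep-diagnosis`, `io/*`, `net/*`, or `load/*` scenario must trigger SysOM `InvokeDiagnosis` through `./scripts/osops.sh`. Do not substitute the generic ECS diagnostic API, or manual collection through `Ecs.RunCommand`/Cloud Assistant (such as `top`/`ps`/`iostat`), for the specialized diagnosis.
3. **Obey envelope instructions**: always read `agent.summary` and execute `agent.next`. Quick output is signal detection only; when `agent.next` contains a command, run it first and only then summarize for the user.
4. **The envelope is the result**: diagnostic conclusions come from the envelope's `data`; there is no need to collect extra information yourself.
5. **Network latency + socket queue backlog**: when `net netjitter`/`net packetdrop` have already run and look normal, but `ss` shows large Send-Q/Recv-Q values, you must cross-check with `memory memgraph --deep-diagnosis`.
For the full conventions (execution directory, credential safety, precheck noise reduction, and so on), see [agent-conventions.md](./references/agent-conventions.md).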
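The routing rules above can be pictured as a small lookup table. The sketch below is illustrative only: the keyword lists and the `route` helper are assumptions made for this example, not the skill's real matcher, and falling back to `memory classify` for unrecognized input is one reading of the "local first" rule.

```python
from typing import List, Sequence, Tuple

# (keywords, subcommand) pairs; order matters because the first match wins.
# The keywords are hypothetical; only the subcommand names come from this document.
ROUTES: Sequence[Tuple[Tuple[str, ...], List[str]]] = (
    (("oom",), ["memory", "oom"]),
    (("java", "jvm"), ["memory", "javamem"]),
    (("packet loss", "packet drop"), ["net", "packetdrop"]),
    (("sched",), ["load", "delay"]),
    (("jitter",), ["net", "netjitter"]),
    (("slow io",), ["io", "iodiagnose"]),
    (("disk",), ["io", "iofsstat"]),
    (("system load", "load average"), ["load", "loadtask"]),
    (("memory usage", "high memory", "panorama"), ["memory", "memgraph"]),
)


def route(symptom: str) -> List[str]:
    """Map a free-text symptom to an osops subcommand (domain, name)."""
    text = symptom.lower()
    for keywords, subcommand in ROUTES:
        if any(keyword in text for keyword in keywords):
            return list(subcommand)
    # Nothing matched: fall back to the memory classifier.
    return ["memory", "classify"]
```

A real agent would route on intent rather than substrings; the table form just makes the precedence (OOM and Java before generic memory symptoms) explicit.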
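## Envelope Output
The CLI writes a JSON envelope to stdout (`format: sysom_agent`, `version: 3.4`). The Agent consumes `agent.summary` (summary), `agent.findings` (key metrics), and `agent.next` (the next command, to be run with Bash in the skill root) directly; the business payload lives in `data.routing`, `data.local`, and `data.remote`. See [output-format.md](./references/output-format.md).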
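A minimal sketch of how a caller might pull the follow-up commands out of an envelope. The exact shape of `agent.next` is defined in output-format.md, which is not reproduced here, so this example assumes it is either a single step object or a list of step objects, each carrying a `command` string; `next_commands` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List


def next_commands(envelope_text: str) -> List[str]:
    """Return the commands listed under agent.next, or an empty list."""
    envelope: Dict[str, Any] = json.loads(envelope_text)
    agent = envelope.get("agent") or {}
    nxt = agent.get("next") or []
    # Assumption: agent.next is one step object or a list of them.
    steps = nxt if isinstance(nxt, list) else [nxt]
    return [
        step["command"]
        for step in steps
        if isinstance(step, dict) and step.get("command")
    ]
```

An empty result means there is nothing further to run, and the agent can summarize from `agent.summary` and `agent.findings`.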
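### Precheck / Authentication Failure
When authentication fails, the envelope contains `data.remediation` (standalone precheck) or `data.precheck_gate.remediation` (merged into deep-diagnosis). Follow the envelope's instructions to guide the user through configuration. See [agent-conventions.md](./references/agent-conventions.md).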
## Subcommand Quick Reference
### Memory domain
| Subcommand | Capability | Reference |
|--------|------|------|
| `memory memgraph` | Memory panorama/dashboard, including TCP memory and socket queues | [memgraph.md](./references/diagnoses/memgraph.md) |
| `memory oom` | OOM / oom-killer specialized diagnosis | [oomcheck.md](./references/diagnoses/oomcheck.md) |
| `memory javamem` | Java memory | [javamem.md](./references/diagnoses/javamem.md) |
| `memory classify` | Overall classification (fallback when the symptom is unclear) | For routing, see [memory-routing.md](./references/memory-routing.md) |
### IO domain
| Subcommand | Capability | Reference |
|--------|------|------|
| `io iofsstat` | IO dashboard (disk statistics) | [iofsstat.md](./references/diagnoses/iofsstat.md) |
| `io iodiagnose` | IO deep dive (slow IO, latency) | [iodiagnose.md](./references/diagnoses/iodiagnose.md) |
### Network domain
| Subcommand | Capability | Reference |
|--------|------|------|
| `net packetdrop` | Packet loss (rtrace) | [packetdrop.md](./references/diagnoses/packetdrop.md) |
| `net netjitter` | Jitter (latency fluctuation) | [netjitter.md](./references/diagnoses/netjitter.md) |
### Load domain
| Subcommand | Capability | Reference |
|--------|------|------|
| `load delay` | Scheduling latency (nosched) | [delay.md](./references/diagnoses/delay.md) |
| `load loadtask` | System load | [loadtask.md](./references/diagnoses/loadtask.md) |
## Quick Start
```bash
cd <sysom-diagnosis>
./scripts/osops.sh memory classify                       # local classification
./scripts/osops.sh memory memgraph                       # local memory panorama
./scripts/osops.sh memory memgraph --deep-diagnosis --channel ecs --timeout 300   # remote memory specialized diagnosis
./scripts/osops.sh io iofsstat --channel ecs --timeout 300                         # IO dashboard
./scripts/osops.sh net packetdrop --channel ecs --region cn-hangzhou --instance i-xxx  # packet-loss diagnosis
./scripts/osops.sh load delay --channel ecs --params '{"duration":30}'             # scheduling latency
```
To target another instance, add `--region <id> --instance <i-xxx>`. On first use, run `./scripts/init.sh` first.
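For callers that drive the script from Python, the command lines above can be assembled programmatically. This is a sketch under assumptions: `build_argv` and `run_osops` are hypothetical helpers, the flags are limited to the ones shown in the quick start, and adding `--channel ecs` to every remote call simply mirrors those examples.

```python
import json
import subprocess
from typing import Any, Dict, List, Optional

REMOTE_DOMAINS = ("io", "net", "load")


def build_argv(
    domain: str,
    subcommand: str,
    *,
    deep: bool = False,
    region: Optional[str] = None,
    instance: Optional[str] = None,
    timeout: Optional[int] = None,
) -> List[str]:
    """Assemble an osops.sh command line from the quick-start flags."""
    argv = ["./scripts/osops.sh", domain, subcommand]
    if deep:
        argv.append("--deep-diagnosis")
    if deep or domain in REMOTE_DOMAINS:
        argv += ["--channel", "ecs"]  # mirrors the quick-start examples
    if region and instance:
        argv += ["--region", region, "--instance", instance]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    return argv


def run_osops(argv: List[str], skill_root: str) -> Dict[str, Any]:
    """Run the command from the skill root and parse the JSON envelope."""
    proc = subprocess.run(
        argv, cwd=skill_root, capture_output=True, text=True, check=False
    )
    return json.loads(proc.stdout)
```

Passing `cwd=skill_root` keeps the relative `./scripts/osops.sh` path valid regardless of where the caller was started.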
## The Three Elements of Remote OpenAPI Access
| Element | Description |
|------|------|
| Identity | AK/SK or an instance RAM Role |
| Policy | `AliyunSysomFullAccess` |
| Activation and SLR | Activate SysOM in the console; for the service-linked role (SLR), see [service-linked-role-subaccount.md](./references/service-linked-role-subaccount.md) |
## Key Path Index
| Need | Document |
|------|------|
| Memory intent → subcommand mapping | [memory-routing.md](./references/memory-routing.md) |
| IO/network/load routing | [non-memory-routing.md](./references/non-memory-routing.md) |
| Remote invocation contract / CLI options / metadata | [invoke-diagnosis.md](./references/invoke-diagnosis.md) |
| Permissions / credentials / precheck | [permission-guide.md](./references/permission-guide.md) → [openapi-permission-guide.md](./references/openapi-permission-guide.md) |
| Output envelope format | [output-format.md](./references/output-format.md) |
| Agent behavior conventions | [agent-conventions.md](./references/agent-conventions.md) |
| `params` fields for each diagnosis | [diagnoses/README.md](./references/diagnoses/README.md) |
FILE:references/agent-conventions.md
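# Agent Conventions (sysom-diagnosis)
This file was split out of SKILL.md and describes the Agent's behavior rules in detail. The main SKILL keeps only the core rules and index links.
## Execution directory
- `./scripts/osops.sh` is a relative path and is valid only in the **skill root** (the directory that contains `SKILL.md`).
- Recommended form: `cd <absolute path to skill root> && ./scripts/osops.sh …`.
- The `command` in `agent.next` must likewise be run with Bash in the skill root; do not hand it to the user to copy and run.
## Local first
- When the user reports only a generic symptom such as "memory is high" or "OOM" and has not asked for remote diagnosis, run the local quick check first (choose the subcommand per [memory-routing.md](./memory-routing.md)) and do not ask for region/instance.
- When the user explicitly asks for remote diagnosis, or `agent.next` already contains `--deep-diagnosis`, distinguish local from remote as described in [invoke-diagnosis.md](./invoke-diagnosis.md).
## Credentials and security
- **Never** collect an AccessKey / Secret in the conversation.
- Guide the user to run `./scripts/osops.sh configure` in a local terminal in the skill root, which writes `~/.aliyun/config.json`.
- Without a PTY: in COSH, enable "Interactive Shell (PTY)" through `/settings`, or use `/bash` to enter an interactive Bash.
## Consuming the precheck envelope
- Guide the user through the fixes **in order**, following `data.remediation` (standalone precheck) or `data.precheck_gate.remediation` (merged into deep-diagnosis). When `primary_path` is already locked, show only that path; for `configure_identity`, let the user choose according to `guidance.auth_path_choice`.
- When `data.precheck_gate` exists, treat it as authoritative, and do not repeat both the quick findings and the precheck findings in full.
## Check and usage order
### A. Memory domain
1. **Quick check**: choose the local subcommand per [memory-routing.md](./memory-routing.md). Read `agent.next` for the next step and `agent.findings` for key findings.
2. **(Optional) `precheck`**: verify credentials and SysOM activation on their own.
3. **Confirm the target**: for the current instance, do not pass `--region`/`--instance`; for another instance, the user must supply region + instance-id.
4. **Deep diagnosis**: run the `command` from `agent.next`, or add `--deep-diagnosis`.
5. **Failure handling**: read `error`, `data` (including `remediation`, `precheck_gate`, and so on), and `agent.summary`.
### B. Non-memory domain (IO / network / load)
1. **Confirm the target instance** (current or other; same as A).
2. **Run the specialized diagnosis**: `./scripts/osops.sh <io|net|load> <subcommand> …` (an environment check is built in before the call), then read `data.routing`/`data.remote` and `agent.findings`.
3. **Network latency + socket queue backlog**: when `net netjitter`/`net packetdrop` have already run and look normal, but `ss` shows large Send-Q/Recv-Q values, you must cross-check with `memory memgraph --deep-diagnosis`. See [non-memory-routing.md](./non-memory-routing.md).
4. **Failure handling**: same as A.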
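The socket-queue rule above leaves "large" undefined. The sketch below shows one way to apply it to `ss -tn` output; the 1000-byte threshold and both helper names are assumptions for illustration, and the parser expects the default column order (State, Recv-Q, Send-Q, local address, peer address).

```python
from typing import List, Tuple


def backlogged_sockets(ss_output: str, threshold: int = 1000) -> List[Tuple[str, int, int]]:
    """Return (peer, recv_q, send_q) for sockets whose queues reach the threshold."""
    hits: List[Tuple[str, int, int]] = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a socket row.
        if len(fields) < 5 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        # On LISTEN sockets these columns describe the accept queue, not data.
        if fields[0] == "LISTEN":
            continue
        recv_q, send_q = int(fields[1]), int(fields[2])
        if max(recv_q, send_q) >= threshold:
            hits.append((fields[4], recv_q, send_q))
    return hits


def needs_memgraph_crosscheck(net_results_normal: bool, ss_output: str) -> bool:
    """True when network diagnoses look normal but socket queues are backed up."""
    return net_results_normal and bool(backlogged_sockets(ss_output))
```

When `needs_memgraph_crosscheck` returns true, the next step per the rule is `./scripts/osops.sh memory memgraph --deep-diagnosis`.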
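## Boundary with other memory skills
- This skill uses `./scripts/osops.sh` under `sysom-diagnosis/` (with its `memory`/`io`/`net`/`load` subcommands).
- Entry points in other skills or in the parent repository may differ from this directory; for SysOM remote specialized diagnosis, use the `osops` in this directory.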
FILE:references/authentication.md
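# Alibaba Cloud Authentication Configuration Guide
This document explains how to configure Alibaba Cloud authentication so that the remote features of the SysOM diagnosis tool can be used.
## Prerequisites
- The Alibaba Cloud CLI is installed (if not, run `yum install aliyun-cli -y`)
- You have an Alibaba Cloud account or RAM user permissions
## Supported authentication modes
The SysOM tool supports three main authentication methods:
| Method | Use case | Recommendation |
|---------|---------|---------|
| **ECS RAM Role** | Running inside an ECS instance | ⭐⭐⭐⭐⭐ Strongly recommended |
| **AccessKey (AK)** | Local development/test environments | ⭐⭐⭐ Development use only |
| **STS Token** | Short-lived sessions / temporary cross-account authorization | ⭐⭐⭐⭐ Recommended (better than a long-lived AK) |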
---
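## Method 1: ECS RAM Role (recommended)
### Why a RAM role is recommended
- **Zero configuration**: no AccessKey to manage by hand
- **Most secure**: temporary credentials rotate automatically, so there is no long-lived key to leak
- **Most convenient**: the SDK fetches credentials from the metadata service automatically
### Configuration steps
#### Step 1: Create a RAM role (if you have not already)
1. Log in to the [RAM console](https://ram.console.aliyun.com/roles)
2. Click "Create Role" → select "Alibaba Cloud Service" → select "Normal Service Role"
3. Select the trusted service type: **Elastic Compute Service (ECS)**
4. Enter a role name (for example `SysomRole`) and a note
5. Click "Finish" to create the role
> 📖 Detailed documentation: [Create a RAM role](https://help.aliyun.com/zh/ram/user-guide/create-a-ram-role-for-a-trusted-alibaba-cloud-service)
#### Step 2: Grant the AliyunSysomFullAccess permission (⚠️ required)
**Important**: the RAM role must be granted `AliyunSysomFullAccess`; otherwise it cannot access the SysOM API.
1. Find the role you just created in the [RAM console role list](https://ram.console.aliyun.com/roles)
2. Click the role name to open its details page
3. Click the "Permissions" tab → "Grant Permission"
4. Enter `AliyunSysomFullAccess` in the search box
5. Select the policy and click "OK"
> 📖 Policy details: [AliyunSysomFullAccess policy](https://ram.console.aliyun.com/policies/AliyunSysomFullAccess/System/content)
**Verify the permission**:
- Make sure the policy list contains `AliyunSysomFullAccess`
- The policy type is "System Policy"
#### Step 3: Attach the RAM role to the ECS instance
1. Log in to the [ECS console](https://ecs.console.aliyun.com/server)
2. Find the target ECS instance
3. On the right side of the instance, click "More" → "Instance Settings" → "Attach/Detach RAM Role"
4. In the dialog, select the RAM role you just created (for example `SysomRole`)
5. Click "OK" to complete the attachment
**Verify after attaching**:
- The "Configuration Information" tab on the instance details page shows a "RAM Role" field
- Running `curl http://100.100.100.200/latest/meta-data/ram/security-credentials/` inside the instance prints the role name
> 📖 Detailed documentation: [Attach a RAM role to an ECS instance](https://help.aliyun.com/zh/ecs/user-guide/attach-an-instance-ram-role-to-an-ecs-instance)
#### Step 4: Configure the aliyun CLI inside the ECS instance
Log in to the ECS instance and run the following command to make the aliyun CLI use the RAM role:
```bash
# Use ECS RAM Role mode
aliyun configure --mode EcsRamRole --ram-role-name <RAM-role-name>
```
**Example**:
```bash
aliyun configure --mode EcsRamRole --ram-role-name SysomRole
```
**Example configuration file** (`~/.aliyun/config.json`):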
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "EcsRamRole",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "zh"
}
],
"meta_path": ""
}
```
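**Notes**:
- The configuration file does **not** need a `ram_role_name` field
- The SDK fetches the role name and temporary credentials from the ECS metadata service (`http://100.100.100.200`) automatically
- As long as a RAM role is attached to the instance, the SDK can obtain them on its own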
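To make the metadata lookup concrete, here is a minimal sketch of the role-name check using only the standard library. It assumes the metadata service answers plain GET requests at the URL used throughout this guide; `bound_role_name` and the injectable `fetch` parameter are hypothetical names chosen so the function can be exercised off-instance.

```python
from typing import Callable, Optional
from urllib.request import urlopen

METADATA_ROLE_URL = (
    "http://100.100.100.200/latest/meta-data/ram/security-credentials/"
)


def http_get(url: str, timeout: float = 2.0) -> str:
    """Fetch a URL and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def bound_role_name(fetch: Callable[[str], str] = http_get) -> Optional[str]:
    """Return the RAM role attached to this instance, or None if unavailable."""
    try:
        body = fetch(METADATA_ROLE_URL).strip()
    except OSError:
        # Covers URLError, HTTPError, and socket timeouts (not on ECS, or no role).
        return None
    return body.splitlines()[0] if body else None
```

Off ECS the address is unreachable, so the function returns `None` after the timeout rather than raising.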
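#### Step 5: Verify the configuration
```bash
# Run the precheck (the only recommended verification method)
cd /path/to/sysom-diagnosis
./scripts/osops.sh precheck
```
The `precheck` command automatically performs the following checks:
1. **ECS metadata service**: queries `curl http://100.100.100.200/latest/meta-data/ram/security-credentials/` to see whether a RAM role is attached to the instance
2. **Environment variables**: validates `ALIBABA_CLOUD_ACCESS_KEY_ID` / `ALIBABA_CLOUD_ACCESS_KEY_SECRET` (and the optional `ALIBABA_CLOUD_SECURITY_TOKEN`)
3. **Configuration file**: parses the authentication settings in `~/.aliyun/config.json` (AK / StsToken / EcsRamRole modes)
4. **SysOM API**: calls the `InitialSysom` API to verify that permissions are sufficient
**Expected output (success)**: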
```json
{
"ok": true,
"method": "ECS RAM Role (SysomRole)",
"role": "SysomRole",
"message": "认证验证成功,拥有 SysOM 访问权限",
"check_details": [
{"method": "ECS元数据", "status": "✓ 实例已绑定 RAM 角色: SysomRole"},
{"method": "环境变量 AKSK", "status": "✗ 未配置"}
]
}
```
**Expected output (RAM role detected but permissions insufficient)**:
```json
{
"ok": false,
"error": "未找到有效的认证配置",
"ecs_role_name": "SysomRole",
"suggestion": "检测到实例已绑定 RAM 角色 'SysomRole',请为该角色授予 AliyunSysomFullAccess 权限,然后配置 aliyun CLI 使用 EcsRamRole 模式",
"check_details": [
{"method": "ECS元数据", "status": "✓ 实例已绑定 RAM 角色: SysomRole"},
{"method": "环境变量 AKSK", "status": "✗ 未配置"},
{"method": "配置文件", "status": "✗ 未配置或配置无效"}
]
}
```
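**Note**: verify the configuration with `osops precheck`, not with commands such as `aliyun ecs describe-instances`, because SysOM requires specific API permissions.
### Troubleshooting
**Problem 1: precheck still fails after the RAM role is attached**
Possible causes:
- The RAM role has not been granted `AliyunSysomFullAccess`
- The permission was just granted and needs 2-5 minutes to take effect
Solution:
```bash
# 1. Check that the metadata service is reachable
curl http://100.100.100.200/latest/meta-data/ram/security-credentials/
# 2. Check that the role name is correct
curl http://100.100.100.200/latest/meta-data/ram/security-credentials/<RAM-role-name>
# 3. Wait a few minutes, then rerun precheck
./scripts/osops.sh precheck
```
**Problem 2: the RAM role cannot be found**
Confirm the following:
- In the [RAM console](https://ram.console.aliyun.com/roles), confirm the role has been created
- In the instance details in the [ECS console](https://ecs.console.aliyun.com/), confirm the role is attached
> 📖 More troubleshooting: [FAQ about ECS instance RAM roles](https://help.aliyun.com/zh/ecs/user-guide/faq-about-instance-ram-roles)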
---
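## Method 2: AccessKey (AK)
### Notes
- ⚠️ **Security risk**: long-lived keys carry a high risk of leakage; use them only for development and testing
- ⚠️ **Cannot be passed on the command line**: for security reasons, AK/SK cannot be supplied as command-line arguments
### Configuration steps
#### Step 0: Agent / multi-terminal scenarios (take this path first)
If you are in **COSH**: **do not give or paste the AccessKey/Secret to the Agent in the conversation** (it would end up in chat history and logs).
Prefer a **local terminal** (a Bash invoked by the Agent, or one the user opens): in sysom-diagnosis (the skill root), run **`./scripts/osops.sh configure`** (which writes `~/.aliyun/config.json` interactively), then run **`./scripts/osops.sh precheck`**. If environment variables are unavoidable, set them only **within a single command in the same shell**: `export ... && ./scripts/osops.sh precheck`, and **never send the keys in chat**.
If the terminal the Agent uses is **non-interactive** (no PTY), `osops configure` cannot read the keys. In **COSH**, enable "**Interactive Shell (PTY)**" through **`/settings`**, or use **`/bash`** to enter an interactive Bash, then run **`./scripts/osops.sh configure`** in the skill root.
#### Step 1: Obtain an AccessKey
1. Log in to the [RAM console](https://ram.console.aliyun.com/)
2. Create or select a RAM user
3. Create an AccessKey and record the `AccessKey ID` and `AccessKey Secret`
#### Step 2: Configure authentication (choose one option)
**Option A: Edit the configuration file by hand (recommended)**
Create or edit `~/.aliyun/config.json`: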
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "AK",
"access_key_id": "LTAI...",
"access_key_secret": "...",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "zh"
}
],
"meta_path": ""
}
```
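**Option B: Configure interactively with osops configure**
In sysom-diagnosis (the skill root), run:
```bash
./scripts/osops.sh configure
```
Enter the Access Key ID, Access Key Secret, and Region ID (for example `cn-hangzhou`) when prompted; the Secret is not echoed.
#### Step 3: Verify the configuration
```bash
# Run the precheck (the only recommended verification method)
./scripts/osops.sh precheck
```
The precheck command automatically performs the following checks:
1. Checks the ECS metadata service (when running on ECS)
2. Checks AKSK / STS Token in environment variables
3. Checks the `~/.aliyun/config.json` configuration
4. Calls the SysOM API to verify permissions
**Expected output (success)**: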
```json
{
"ok": true,
"method": "配置文件 AKSK",
"message": "认证验证成功,拥有 SysOM 访问权限",
"check_details": [
{"method": "ECS元数据", "status": "✗ HTTP 404: 无法访问元数据服务"},
{"method": "环境变量 AKSK", "status": "✗ 未配置"}
]
}
```
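**Note**: verify the configuration with `osops precheck`, not with commands such as `aliyun ecs describe-instances`, because SysOM requires specific API permissions.
---
## Method 3: STS Token (temporary credentials)
### Use cases
- You obtained temporary credentials (`AccessKeyId` / `AccessKeySecret` / `SecurityToken`) through `AssumeRole` or a similar call
- You want to avoid long-lived AK/SK and reduce the risk of key leakage
### Option A: Environment variables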
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID="STS.xxx"
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="..."
export ALIBABA_CLOUD_SECURITY_TOKEN="CAIS..."
./scripts/osops.sh precheck
```
Compatible variable names:
- `ALICLOUD_ACCESS_KEY_ID`
- `ALICLOUD_ACCESS_KEY_SECRET`
- `ALICLOUD_SECURITY_TOKEN`
- `SECURITY_TOKEN`
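The compatible names can be resolved with a simple first-match lookup. This sketch is an illustration, not the tool's actual resolver: the precedence (primary `ALIBABA_CLOUD_*` name first, then the compatible names in the order listed above) is an assumption, and `credentials_from_env` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Primary name first, then the compatible names listed above.
CANDIDATES: Dict[str, Tuple[str, ...]] = {
    "access_key_id": ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALICLOUD_ACCESS_KEY_ID"),
    "access_key_secret": (
        "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
        "ALICLOUD_ACCESS_KEY_SECRET",
    ),
    "security_token": (
        "ALIBABA_CLOUD_SECURITY_TOKEN",
        "ALICLOUD_SECURITY_TOKEN",
        "SECURITY_TOKEN",
    ),
}


def credentials_from_env(
    env: Mapping[str, str] = os.environ,
) -> Optional[Dict[str, str]]:
    """Collect credentials from the environment; None if the key pair is missing."""
    found: Dict[str, str] = {}
    for field, names in CANDIDATES.items():
        for name in names:
            if env.get(name):
                found[field] = env[name]
                break
    if "access_key_id" not in found or "access_key_secret" not in found:
        return None
    return found
```

A result without `security_token` corresponds to a plain AK pair; with it, the credentials are temporary STS credentials.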
### Option B: `~/.aliyun/config.json` (`mode=StsToken`)
```json
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "StsToken",
"access_key_id": "STS.xxx",
"access_key_secret": "...",
"sts_token": "CAIS...",
"region_id": "cn-hangzhou",
"output_format": "json",
"language": "zh"
}
],
"meta_path": ""
}
```
`sts_token` may also be written as `security_token` (a compatibility field).
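A short sketch of reading such a profile, including the compatibility field. It assumes the file layout shown in the examples in this guide (`current`, `profiles[].name`, `mode`); `current_profile` and `sts_token_of` are hypothetical helper names.

```python
import json
from typing import Any, Dict, Optional


def current_profile(config_text: str) -> Optional[Dict[str, Any]]:
    """Return the profile named by "current", or None if it is missing."""
    config = json.loads(config_text)
    wanted = config.get("current", "default")
    for profile in config.get("profiles", []):
        if profile.get("name") == wanted:
            return profile
    return None


def sts_token_of(profile: Dict[str, Any]) -> Optional[str]:
    """Read the STS token, accepting either field name."""
    return profile.get("sts_token") or profile.get("security_token")
```

In practice `config_text` would come from reading `~/.aliyun/config.json`; the token should never be logged or echoed.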
### Expected output (success)
Example for the environment-variable option:
```json
{
"ok": true,
"method": "环境变量 STS Token",
"message": "认证验证成功,拥有 SysOM 访问权限"
}
```
Example for the configuration-file option:
```json
{
"ok": true,
"method": "配置文件 STS Token",
"message": "认证验证成功,拥有 SysOM 访问权限"
}
```
---
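## Environment Variables (for the Python SDK)
If you use the Python SDK directly (rather than the `aliyun` CLI), you can configure credentials through environment variables:
```bash
export ALIBABA_CLOUD_ACCESS_KEY_ID="LTAI..."
export ALIBABA_CLOUD_ACCESS_KEY_SECRET="..."
export ALIBABA_CLOUD_REGION_ID="cn-hangzhou"
# If you use temporary STS credentials, also set:
export ALIBABA_CLOUD_SECURITY_TOKEN="CAIS..."
```
Or create a `.env` file (recommended for development):
```bash
# Create .env in the sysom-diagnosis directory
cat > .env <<EOF
ALIBABA_CLOUD_ACCESS_KEY_ID=LTAI...
ALIBABA_CLOUD_ACCESS_KEY_SECRET=...
ALIBABA_CLOUD_REGION_ID=cn-hangzhou
# For STS, you can add:
# ALIBABA_CLOUD_SECURITY_TOKEN=CAIS...
EOF
# Make sure it is not committed to Git
echo ".env" >> .gitignore
```
**Notes**:
- Environment variables apply only to the Python SDK, not to the `aliyun` CLI
- The SysOM tool's `precheck` command detects environment-variable configuration
---
## Permission Configuration
### Grant the AliyunSysomFullAccess permission
Whichever authentication method you use, the **AliyunSysomFullAccess** permission must be granted:
#### For a RAM user (AK mode)
1. Log in to the [RAM console](https://ram.console.aliyun.com/)
2. Find the RAM user
3. Click "Add Permissions"
4. Search for and select `AliyunSysomFullAccess`
5. Confirm the authorization
#### For a RAM role (ECS RAM Role mode)
1. Log in to the [RAM console](https://ram.console.aliyun.com/)
2. Open the "Roles" page and find the role
3. Click "Add Permissions"
4. Search for and select `AliyunSysomFullAccess`
5. Confirm the authorization
### Verify permissions
Run the `precheck` command to verify that the configuration is correct:
```bash
./scripts/osops.sh precheck
```
The precheck command automatically performs the following checks:
- ✅ Checks the ECS metadata service (`curl http://100.100.100.200/latest/meta-data/ram/security-credentials/`)
- ✅ Checks AKSK / STS Token configuration in environment variables
- ✅ Checks the authentication settings in `~/.aliyun/config.json` (AK / StsToken / EcsRamRole)
- ✅ Calls the SysOM API to verify that permissions are sufficient (`InitialSysom`)
**Example output**:
If a RAM role is detected on ECS but its permissions are insufficient: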
```json
{
"ok": false,
"ecs_role_name": "AliyunECSInstanceForSysOM",
"suggestion": "检测到实例已绑定 RAM 角色 'AliyunECSInstanceForSysOM',请为该角色授予 AliyunSysomFullAccess 权限"
}
```
If the configuration is correct:
```json
{
"ok": true,
"method": "配置文件 AKSK",
"message": "认证验证成功,拥有 SysOM 访问权限"
}
```
---
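## Security Best Practices
### Production
1. **Prefer an ECS RAM Role**
- Zero configuration, no keys to manage
- Temporary credentials rotate automatically
2. **Principle of least privilege**
- Grant only the permissions that are required
- Audit permission usage regularly
3. **Enable MFA**
- Enable multi-factor authentication for the root account and key RAM users
### Development
1. **Manage keys with a .env file**
```bash
# Add it to .gitignore
echo ".env" >> .gitignore
```
2. **Rotate AccessKeys regularly**
- Rotating every 90 days is recommended
- Automatic rotation can be configured in the RAM console
3. **Avoid hard-coding**
- Do not hard-code an AccessKey in source code
- Do not commit configuration files to Git
---
## Troubleshooting
### Problem 1: Precheck fails, no authentication configuration found
**Symptom**:
```
✗ 检查失败
未找到有效的认证配置
```
**Solution**:
1. Check that the configuration file exists: `cat ~/.aliyun/config.json`
2. Check the environment variable: `echo $ALIBABA_CLOUD_ACCESS_KEY_ID`
3. In sysom-diagnosis (the skill root), run again: `./scripts/osops.sh configure`
### Problem 2: Insufficient permissions
**Symptom**:
```
✗ 检查失败
认证成功但权限不足
```
**Solution**:
1. Confirm that `AliyunSysomFullAccess` has been granted
2. Wait 2-5 minutes for the permission to take effect
3. Rerun `./scripts/osops.sh precheck`
### Problem 3: ECS RAM Role unavailable
**Symptom**:
```
✗ 检查失败
ECS RAM Role 配置失败
```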
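**Solution**:
1. Confirm that a RAM role is attached to the instance (the response lists the role name):
```bash
curl http://100.100.100.200/latest/meta-data/ram/security-credentials/
```
2. Confirm that the role has the `AliyunSysomFullAccess` permission
3. Check that the metadata service is reachable:
```bash
curl http://100.100.100.200/latest/meta-data/instance-id
```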
### Problem 4: What if an AccessKey is leaked?
**Act immediately**:
1. Log in to the RAM console and disable the leaked AccessKey
2. Create a new AccessKey
3. Update every configuration that uses the old AccessKey
4. Review recent operation logs for abnormal activity
---
## References
- [Alibaba Cloud CLI authentication methods](https://help.aliyun.com/zh/cli/configure-credentials)
- [ECS RAM role configuration](https://help.aliyun.com/zh/ecs/user-guide/attach-an-instance-ram-role-to-an-ecs-instance)
- [RAM permission management](https://help.aliyun.com/zh/ram/user-guide/overview)
- [SysOM API documentation](https://help.aliyun.com/zh/sysom)
- [ECS Metadata service](./metadata-api.md)
FILE:references/cli-development-guide.md
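# SysOM CLI Development Guide
Welcome to `sysom_cli` development! This document helps you get started quickly, understand the project architecture, and learn how to add new features.
> **Note**: the **`memory`** subsystem contains **`classify` / `memgraph` / `oom` / `javamem`** (a **quick check** by default); the optional **`--deep-diagnosis`** flag chains into **deep diagnosis** (remote specialized diagnosis). The main Agent path for non-memory work is **`io` / `net` / `load`**. The low-level direct-OpenAPI CLI is not advertised to the Agent; maintainers should see [invoke-diagnosis.md](./invoke-diagnosis.md). See also [SKILL.md](../SKILL.md). For permission guidance, see [openapi-permission-guide.md](./openapi-permission-guide.md).
## Contents
- [Project Architecture](#project-architecture)
- [Development Environment Setup](#development-environment-setup)
- [Adding a New Command](#adding-a-new-command)
- [Execution Modes in Detail](#execution-modes-in-detail)
- [Code Conventions](#code-conventions)
- [Testing and Debugging](#testing-and-debugging)
- [FAQ](#faq)
## Project Architecture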
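### Directory structure
```
sysom_cli/
├── lib/                      # [Shared utility library] used by all subsystems
│   ├── schema.py             # JSON envelope format definition
│   ├── specialty_args.py     # OpenAPI-side arguments shared by io/net/load (same source as the invoke implementation)
│   ├── specialty_command.py  # BaseServiceSpecialtyCommand (thin wrapper around service_name)
│   ├── diagnosis_backend.py  # Pluggable SysOM specialized-diagnosis backend (default: DiagnosisInvokeCommand)
│   ├── kernel_log.py         # Kernel log collection utility
│   ├── log_plugin.py         # Simple log-scanning plugin framework
│   ├── log_parser.py         # Complex log-parsing engine framework
│   └── auth.py               # Alibaba Cloud authentication utility
│
├── core/                     # [Core framework] used by all subsystems
│   ├── base.py               # BaseCommand abstract base class
│   ├── registry.py           # Automatic command discovery and registration (multi-level)
│   └── executor.py           # Unified executor
│
├── precheck/                 # [Top-level command] environment precheck
│   └── command.py            # Command entry point (@command_metadata)
│
├── memory/                   # [Subsystem] memory quick-check subcommands
│   ├── lib/                  # classify_engine, oom_quick, oom_log_extract, envelope_memory, invoke_bridge…
│   ├── classify/             # memory classify
│   ├── memgraph/             # memory memgraph (memory panorama)
│   ├── oom/                  # memory oom
│   └── javamem/              # memory javamem (skeleton)
│
├── io/                       # [Subsystem] disk and IO specialized diagnosis (iofsstat, iodiagnose)
├── net/                      # [Subsystem] network specialized diagnosis (packetdrop, netjitter)
├── load/                     # [Subsystem] load and scheduling specialized diagnosis (delay, loadtask)
│
├── diagnosis/                # [Subsystem] SysOM remote diagnosis (InvokeDiagnosis)
│   └── invoke/
│       └── command.py        # @command_metadata(subsystem="diagnosis")
│
└── __main__.py               # CLI entry point (dispatches all commands)
```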
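### Command hierarchy
`sysom_cli` supports two kinds of command:
1. **Top-level command**
- Has its own directory directly under `sysom_cli/`
- The directory contains a `command.py` file
- Example: `precheck`
- Usage: `osops precheck`
2. **Subsystem command**
- Belongs to a subsystem, which holds several subcommands
- Each command directory under the subsystem directory contains a `command.py`
- Example: the `diagnosis` subsystem contains `invoke`
- Usage: for non-memory work, `osops io|net|load <service_name> ...`; for maintainers' direct OpenAPI calls, see [invoke-diagnosis.md](./invoke-diagnosis.md)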
### Core components
#### 1. BaseCommand abstract base class
Every subcommand must inherit from this class:
```python
from argparse import Namespace
from typing import Any, Dict

from sysom_cli.core.base import BaseCommand, ExecutionMode


class MyCommand(BaseCommand):
    @property
    def command_name(self) -> str:
        return "mycmd"

    @property
    def supported_modes(self) -> Dict[str, bool]:
        return {
            ExecutionMode.LOCAL: True,
            ExecutionMode.REMOTE: False,
            ExecutionMode.HYBRID: False,
        }

    def execute_local(self, ns: Namespace) -> Dict[str, Any]:
        # Implement the local-mode logic here
        pass
```
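#### 2. CommandRegistry
Discovers and registers commands automatically:
- Supports two discovery modes:
- `discover_commands(top_level=True)`: scans top-level commands
- `discover_commands(subsystem="diagnosis")`: scans the subcommands of a subsystem
- Imports `command.py` modules automatically
- Registers commands through the `@command_metadata` decorator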
#### 3. CommandExecutor
- Reads the execution mode from the `MEMORY_MODE` environment variable
- Routes to the matching execution method of the command
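The routing step can be sketched as a lookup on the mode name. This is an assumption-laden illustration rather than the code in `core/executor.py`: it treats the mode keys as the plain strings `local`/`remote`/`hybrid`, defaults to `local` when `MEMORY_MODE` is unset, and `dispatch` is a hypothetical name.

```python
import os
from argparse import Namespace
from typing import Any, Dict, Mapping


def dispatch(
    command: Any, ns: Namespace, env: Mapping[str, str] = os.environ
) -> Dict[str, Any]:
    """Run the execute_<mode> method selected by MEMORY_MODE."""
    mode = env.get("MEMORY_MODE", "local").lower()
    if not command.supported_modes.get(mode, False):
        raise ValueError(
            f"{command.command_name} does not support mode {mode!r}"
        )
    handler = getattr(command, f"execute_{mode}")
    return handler(ns)
```

Checking `supported_modes` before the `getattr` turns a missing method into a clear error instead of an `AttributeError`.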
## Development Environment Setup
### 1. Install dependencies
```bash
cd /path/to/sysom-diagnosis
./scripts/init.sh
```
### 2. Set PYTHONPATH
```bash
export PYTHONPATH="/path/to/sysom-diagnosis/scripts:$PYTHONPATH"
```
### 3. Verify the installation
```bash
python3 -m sysom_cli --list-capabilities
```
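## Adding a New Command
Adding a subsystem command takes only 3 steps and requires **no changes to any other file**. A top-level command needs one extra step to register it in `__main__.py`.
### Adding a subsystem command (for example, memory leak)
#### Step 1: Create the command directory
For example, to add a `leak` command under the `memory` subsystem:
```bash
mkdir -p scripts/sysom_cli/memory/leak
touch scripts/sysom_cli/memory/leak/__init__.py
```
#### Step 2: Create command.py
Create `scripts/sysom_cli/memory/leak/command.py`: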
```python
# -*- coding: utf-8 -*-
"""
Memory leak detection command
"""
from __future__ import annotations

from argparse import Namespace
from typing import Any, Dict

from sysom_cli.core.base import BaseCommand, ExecutionMode
from sysom_cli.core.registry import command_metadata


@command_metadata(
    name="leak",
    help="Memory leak detection and analysis",
    subsystem="memory",  # Important: marks this as a memory-subsystem command
    args=[
        (["--pid"], {"type": int, "help": "Process PID"}),
        (["--duration"], {"type": int, "default": 60, "help": "Monitoring duration (seconds)"}),
        (["--threshold"], {"type": float, "default": 10.0, "help": "Leak threshold (MB/s)"}),
    ]
)
class LeakCommand(BaseCommand):
    """Memory leak detection command"""

    @property
    def command_name(self) -> str:
        return "leak"

    @property
    def supported_modes(self) -> Dict[str, bool]:
        return {
            ExecutionMode.LOCAL: True,
            ExecutionMode.REMOTE: False,
            ExecutionMode.HYBRID: False,
        }

    def execute_local(self, ns: Namespace) -> Dict[str, Any]:
        """Local mode: monitor and analyze on this machine"""
        from sysom_cli.lib.schema import envelope, agent_block

        # Your implementation goes here
        pid = getattr(ns, "pid", None)
        duration = getattr(ns, "duration", 60)

        # TODO: implement the leak detection logic
        result_data = {
            "pid": pid,
            "leak_detected": False,
            "leak_rate_mb_s": 0.0,
        }
        return envelope(
            action="memory_leak",
            ok=True,
            agent=agent_block(
                "normal",
                f"Monitored process {pid} for {duration}s; no leak detected."
            ),
            data=result_data,
            execution={"mode": "local", "stage": "leak"}
        )
```
#### Step 3: Use it directly
**No other file needs to change!** The command registers itself.
```bash
# Show help
./scripts/osops.sh memory leak --help
# Run the command
./scripts/osops.sh memory leak --pid 1234 --duration 120
```
### Adding a top-level command (for example, version)
#### Step 1: Create the command directory
```bash
mkdir -p scripts/sysom_cli/version
touch scripts/sysom_cli/version/__init__.py
```
#### Step 2: Create command.py
Create `scripts/sysom_cli/version/command.py`:
```python
# -*- coding: utf-8 -*-
"""
Version information command
"""
from __future__ import annotations

from argparse import Namespace
from typing import Any, Dict

from sysom_cli.core.base import BaseCommand, ExecutionMode
from sysom_cli.core.registry import command_metadata


@command_metadata(
    name="version",
    help="Show sysom_cli version information",
    # Note: a top-level command does not specify the subsystem parameter
    args=[]
)
class VersionCommand(BaseCommand):
    """Version information command"""

    @property
    def command_name(self) -> str:
        return "version"

    @property
    def supported_modes(self) -> Dict[str, bool]:
        return {
            ExecutionMode.LOCAL: True,
            ExecutionMode.REMOTE: False,
            ExecutionMode.HYBRID: False,
        }

    def execute_local(self, ns: Namespace) -> Dict[str, Any]:
        """Return version information"""
        from sysom_cli.lib.schema import envelope, agent_block

        return envelope(
            action="version",
            ok=True,
            agent=agent_block("normal", "SysOM CLI v1.0.0"),
            data={"version": "1.0.0", "build": "20260319"}
        )
```
#### Step 3: Register it in the top-level command list
Edit `scripts/sysom_cli/__main__.py` and add an entry to the `TOP_COMMANDS` list:
```python
TOP_COMMANDS: List[Dict[str, Any]] = [
    {"name": "memory", "help": "内存诊断", "is_subsystem": True},
    {"name": "version", "help": "显示版本信息", "is_subsystem": False},  # new
    {"name": "precheck", "help": "环境预检查", "is_subsystem": False},
    {"name": "am", "help": "活动监控 [WIP]", "is_subsystem": False, "not_implemented": True},
]
```
#### Step 4: Use it
```bash
# Show help
./scripts/osops.sh version --help
# Run the command
./scripts/osops.sh version
```
## Execution Modes in Detail
`sysom_cli` supports three execution modes, selected through the `MEMORY_MODE` environment variable:
### Local mode (default)
```bash
# Local mode is the default
./scripts/osops.sh memory oom
# Or specify it explicitly
MEMORY_MODE=local ./scripts/osops.sh memory oom
```
**Characteristics**:
- Commands are generated locally
- Collection runs locally
- Analysis runs locally
**Implementation example**:
```python
def execute_local(self, ns: Namespace) -> Dict[str, Any]:
    # 1. Collect data
    data = collect_data(ns)
    # 2. Analyze data
    analysis = analyze_data(data)
    # 3. Return the result
    return envelope(
        action="memory_oom",
        ok=True,
        agent=agent_block("normal", "OOM analysis complete"),
        data=analysis
    )
```
### Remote mode
```bash
MEMORY_MODE=remote ./scripts/osops.sh memory cache
```
**Characteristics**:
- Calls the remote API directly
- The API returns the final result
- No local processing
**Implementation example**:
```python
def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
    # Return an execution plan; the gateway runs it on the node
    plan = build_execution_plan(ns)
    return envelope(
        action="memory_cache_plan",
        ok=True,
        agent=agent_block(
            "normal",
            "Execution plan generated; the gateway will run it.",
            next_steps=[{
                "tool": "sysom_cli memory cache --raw-file <path>",
                "reason": "Parse the result returned by the gateway"
            }]
        ),
        data={"phase": "remote_plan", "plan": plan}
    )
```
### Hybrid mode
```bash
MEMORY_MODE=hybrid ./scripts/osops.sh memory oom
```
**Characteristics**:
- Calls OpenAPI (possibly an asynchronous interface)
- Polls the task status
- Processes the data locally
- Converts the result to the Agent's standard format
**Implementation example**:
```python
def execute_hybrid(self, ns: Namespace) -> Dict[str, Any]:
    from sysom_cli.memory.oom.api_client import OomApiClient

    client = OomApiClient()
    # 1. Submit the diagnosis task
    task_id = client.submit_oom_diagnosis(
        instance_id=getattr(ns, "instance_id", None)
    )
    # 2. Poll the task status
    raw_result = client.poll_task_result(task_id, timeout=300)
    # 3. Format locally
    return format_api_result(raw_result, ns)
```
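The polling step in the hybrid example is the part most likely to go wrong, so here is a generic sketch of a deadline-bounded poll loop. It is not the implementation of `poll_task_result`: the `status` field and its terminal values `Success`/`Fail` are hypothetical, and the `sleep`/`clock` parameters exist only to make the loop testable.

```python
import time
from typing import Any, Callable, Dict, Tuple


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    timeout: float = 300.0,
    interval: float = 5.0,
    terminal: Tuple[str, ...] = ("Success", "Fail"),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status until it reports a terminal status or time runs out."""
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result.get("status") in terminal:
            return result
        # Stop if the next attempt would start after the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"task not finished after {timeout}s")
        sleep(interval)
```

Using a monotonic clock keeps the deadline correct even if the system time is adjusted while the task is running.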
## Code Conventions
### 0. Shared libraries
`sysom_cli/lib/` provides several shared utility libraries that every command can use:
#### `schema.py` - JSON envelope format
```python
from sysom_cli.lib.schema import envelope, agent_block, dumps

# Build a standard response envelope
result = envelope(
    action="memory_oom",
    ok=True,
    agent=agent_block("normal", "Analysis complete"),
    data={"oom_count": 3}
)
# Print it as JSON
print(dumps(result))
```
#### `kernel_log.py` - Kernel log collection
```python
from sysom_cli.lib.kernel_log import get_kernel_log_lines

# Get the kernel log (chooses dmesg or journalctl automatically)
lines = get_kernel_log_lines()
```
#### `log_plugin.py` - Simple log-scanning framework
Suited to **whole-log scans and simple pattern matching**:
```python
from typing import Any, Dict

from sysom_cli.lib.kernel_log import get_kernel_log_lines
from sysom_cli.lib.log_plugin import LogScanContext

lines = get_kernel_log_lines()

# Create the context
ctx = LogScanContext(
    log_lines=lines,
    log_source="dmesg",
    metadata={"host": "server01"}
)


# Define a plugin (a plain function)
def oom_detector(ctx: LogScanContext) -> Dict[str, Any]:
    oom_count = sum(1 for line in ctx.log_lines if "Out of memory" in line)
    return {"oom_detected": oom_count > 0, "count": oom_count}


# Run the plugin
result = oom_detector(ctx)
```
#### `log_parser.py` - Complex log-parsing framework
Suited to **structured log parsing, state management, and block detection**:
```python
from sysom_cli.lib.log_parser import (
    LogParser, LogParserContext, LogParserPluginBase
)


# Define a plugin (a class, with state)
class OomBlockPlugin(LogParserPluginBase):
    def is_start(self, line, global_context, lines, idx):
        return "invoked oom-killer" in line

    def is_end(self, line, global_context, lines, idx):
        return "Killed process" in line

    def process(self, line, global_context, lines, idx):
        self.local_context.data.append(line)

    def done(self, local_context, global_context):
        # Handle one complete OOM block
        pass


# Use the parser; `lines` is the kernel log, e.g. from get_kernel_log_lines()
parser = LogParser(plugin_list=[OomBlockPlugin()], context=LogParserContext())
parser.parse(lines)
```
**Differences**:
- `log_plugin.py`: simple, stateless, function-based; suited to quick scans
- `log_parser.py`: complex, stateful, class-based; suited to structured parsing
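To show what "stateful" buys over a line-by-line scan, here is a standalone sketch of the start/end block idea. It does not use `LogParser` or reproduce its internals; `extract_blocks` is a hypothetical function, and only the two marker strings are taken from the plugin example above.

```python
from typing import Iterable, List, Optional


def extract_blocks(
    lines: Iterable[str],
    start_marker: str = "invoked oom-killer",
    end_marker: str = "Killed process",
) -> List[List[str]]:
    """Group log lines into blocks delimited by a start and an end marker."""
    blocks: List[List[str]] = []
    current: Optional[List[str]] = None  # None means "not inside a block"
    for line in lines:
        if current is None:
            if start_marker in line:
                current = [line]
        else:
            current.append(line)
            if end_marker in line:
                blocks.append(current)
                current = None
    # A block that never reaches its end marker is dropped.
    return blocks
```

The single `current` variable is the state: a stateless scanner can count matching lines, but it cannot tell which lines belong to the same OOM event.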
#### `auth.py` - Alibaba Cloud authentication
```python
from sysom_cli.lib.auth import run_precheck

# Run the authentication precheck
result = run_precheck()
# Returns: {"method": "aksk", "success": True, "access_key_id": "LTAI..."}
```
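### 1. File organization
Each subcommand directory should contain:
- **`command.py`** (required): the command entry point, inheriting from `BaseCommand`
- **Helper files** (optional): split logic out as needed
- `run.py`: data collection logic
- `process.py`: data processing and formatting
- `api_client.py`: API client (hybrid mode)
- Other business-logic files
### 2. Command metadata
Declare command information with the `@command_metadata` decorator:
```python
@command_metadata(
    name="cmdname",        # Command name
    help="Command description",  # Help text
    subsystem="memory",    # Optional: the owning subsystem (required for subsystem commands, omitted for top-level commands)
    args=[                 # Argument definitions (argparse format)
        (["--param"], {"help": "Parameter description", "default": "value"}),
    ]
)
class MyCommand(BaseCommand):
    pass
```
**Key difference**:
- **Subsystem commands** (such as `memory oom`): must specify `subsystem="memory"`
- **Top-level commands** (such as `precheck`): do not specify `subsystem`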
### 3. Return format
Every command must return the standard JSON envelope format:
```python
from sysom_cli.lib.schema import envelope, agent_block

return envelope(
    action="memory_oom",      # Action name
    ok=True,                  # Success/failure
    agent=agent_block(
        "normal",             # Status: normal/critical/warning/unknown
        "Brief description",  # Message for the Agent
        next_steps=[...]      # Optional: suggested follow-up steps
    ),
    data={...},               # Data payload
    error={...},              # Optional: error information
    execution={               # Optional: execution information
        "mode": "local",
        "stage": "oom"
    }
)
```
### 4. Error handling
```python
try:
    result = risky_operation()
except Exception as e:
    from sysom_cli.lib.schema import envelope, agent_block

    return envelope(
        action="memory_oom",
        ok=False,
        agent=agent_block("unknown", f"Execution failed: {e}"),
        data={},
        error={"code": "execution_error", "message": str(e)}
    )
```
## Testing and debugging
### 1. Unit tests
Create the test file `tests/test_leak.py`:
```python
import pytest
from argparse import Namespace
from sysom_cli.memory.leak.command import LeakCommand


def test_leak_command_local():
    cmd = LeakCommand()
    ns = Namespace(pid=1234, duration=60, threshold=10.0)
    result = cmd.execute_local(ns)
    assert result["ok"] is True
    assert "pid" in result["data"]
```
### 2. Manual testing
```bash
# Check command registration
python3 -m sysom_cli --list-capabilities
# Run a command
MEMORY_MODE=local ./scripts/osops.sh memory leak --pid 1234
# Try the other modes
MEMORY_MODE=remote ./scripts/osops.sh memory cache
MEMORY_MODE=hybrid ./scripts/osops.sh memory oom
```
### 3. Debugging tips
```bash
# Start under the Python debugger
python3 -m pdb -m sysom_cli memory leak --pid 1234
# Show verbose logs
python3 -m sysom_cli memory leak --pid 1234 --verbose
```
### 4. Validating JSON output
```bash
# Emit JSON and pretty-print it
./scripts/osops.sh memory classify | python3 -m json.tool
```
## FAQ
### Q1: Why isn't my new command discovered?
**Causes**:
- the `command.py` file name is wrong
- the `@command_metadata` decorator is missing
- the directory structure is incorrect
**Fix**:
```bash
# Check the directory structure
ls -la scripts/sysom_cli/memory/mycommand/
# You should see:
# - __init__.py
# - command.py
# Check that command.py has the decorator
grep -n "@command_metadata" scripts/sysom_cli/memory/mycommand/command.py
```
### Q2: How do I support multiple execution modes?
Declare them in `supported_modes`:
```python
@property
def supported_modes(self) -> Dict[str, bool]:
    return {
        ExecutionMode.LOCAL: True,
        ExecutionMode.REMOTE: True,  # supports remote
        ExecutionMode.HYBRID: True,  # supports hybrid
    }
```
Then implement the corresponding methods:
```python
def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
    # remote-mode implementation
    pass

def execute_hybrid(self, ns: Namespace) -> Dict[str, Any]:
    # hybrid-mode implementation
    pass
```
### Q3: How do I add a subsystem (e.g. `io`)?
**Step 1**: Create the subsystem directory
```bash
mkdir -p scripts/sysom_cli/io
touch scripts/sysom_cli/io/__init__.py
```
**Step 2**: Create the subsystem entry point (optional)
```bash
# Create entry.py if subsystem-level configuration is needed
touch scripts/sysom_cli/io/entry.py
```
**Step 3**: Add a subcommand
```bash
mkdir -p scripts/sysom_cli/io/disk
touch scripts/sysom_cli/io/disk/__init__.py
# Create scripts/sysom_cli/io/disk/command.py
# Important: set subsystem="io" in @command_metadata
```
**Step 4**: Update the top-level command list
Edit `scripts/sysom_cli/__main__.py` and add `io` to `TOP_COMMANDS`:
```python
TOP_COMMANDS: List[Dict[str, Any]] = [
    {"name": "memory", "help": "Memory diagnosis", "is_subsystem": True},
    {"name": "io", "help": "IO diagnosis", "is_subsystem": True},  # new
    ...
]
```
### Q4: How do I share code?
**Option 1**: Put it in the top-level `lib/` directory (shared across subsystems; recommended)
```bash
# Put general-purpose utilities in the top-level lib/
# e.g. lib/auth.py, lib/log_parser.py, lib/schema.py
```
```python
# Usable from any command
from sysom_cli.lib.schema import envelope, agent_block
from sysom_cli.lib.auth import run_precheck
from sysom_cli.lib.log_parser import LogParser
```
**Option 2**: Put it inside the subsystem (shared within that subsystem)
```bash
# Utilities used only within the memory subsystem
mkdir -p scripts/sysom_cli/memory/lib
```
```python
# memory/oom/command.py
from sysom_cli.memory.lib.utils import memory_specific_util
```
**Option 3**: Put it inside the command directory (used only by that command)
```bash
# Helper files used only by the oom command
# scripts/sysom_cli/memory/oom/collector.py
# scripts/sysom_cli/memory/oom/analyzer.py
```
### Q5: How do I handle asynchronous APIs?
Use Python's `asyncio`, or poll synchronously as in this example:
```python
import time
from argparse import Namespace
from typing import Any, Dict

def execute_hybrid(self, ns: Namespace) -> Dict[str, Any]:
    client = ApiClient()
    # Submit the task
    task_id = client.submit_task()
    # Poll for status: up to 60 checks, 5 seconds apart (about 5 minutes)
    max_attempts = 60
    for _ in range(max_attempts):
        status = client.get_status(task_id)
        if status == "completed":
            return client.get_result(task_id)
        if status == "failed":
            raise RuntimeError("Task failed")
        time.sleep(5)
    raise TimeoutError("Task timeout")
```
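For the `asyncio` route mentioned in this answer, the following is a rough sketch of the same submit-then-poll loop written as a coroutine. `FakeApiClient` is a hypothetical in-memory stand-in, not part of `sysom_cli`; it only exists so the loop can run without a real backend.

```python
import asyncio
from typing import Any, Dict


class FakeApiClient:
    """Hypothetical client: reports 'completed' on the third status check."""

    def __init__(self) -> None:
        self._checks = 0

    async def submit_task(self) -> str:
        return "task-1"

    async def get_status(self, task_id: str) -> str:
        self._checks += 1
        return "completed" if self._checks >= 3 else "running"

    async def get_result(self, task_id: str) -> Dict[str, Any]:
        return {"task_id": task_id, "checks": self._checks}


async def poll_until_done(client: FakeApiClient, interval: float = 5.0,
                          max_attempts: int = 60) -> Dict[str, Any]:
    """Submit a task, then poll its status without blocking the event loop."""
    task_id = await client.submit_task()
    for _ in range(max_attempts):
        status = await client.get_status(task_id)
        if status == "completed":
            return await client.get_result(task_id)
        if status == "failed":
            raise RuntimeError(f"Task {task_id} failed")
        await asyncio.sleep(interval)  # yields control instead of time.sleep()
    raise TimeoutError(f"Task {task_id} did not finish after {max_attempts} checks")


result = asyncio.run(poll_until_done(FakeApiClient(), interval=0.01))
print(result)
```

The structure matches the synchronous version; the practical difference is that `await asyncio.sleep(...)` lets other coroutines run while waiting, which matters only if the command does other concurrent work.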
## Contributing
1. Fork this repository
2. Create a feature branch (`git checkout -b feature/my-command`)
3. Commit your changes (`git commit -am 'Add my command'`)
4. Push the branch (`git push origin feature/my-command`)
5. Open a Pull Request
## References
- **Core framework code**:
  - `sysom_cli/core/base.py` - abstract base class definitions
  - `sysom_cli/core/registry.py` - command registration mechanism
  - `sysom_cli/core/executor.py` - executor implementation
- **Example commands**:
  - `sysom_cli/memory/oom/command.py` - full example
  - `sysom_cli/memory/classify/command.py` - simple example
  - `sysom_cli/memory/cache/command.py` - multi-mode example
- **Test cases**: the `tests/` directory
## Contact
For questions, open an issue on GitHub or contact the maintainers.
---
**Happy Coding! 🚀**
FILE:references/diagnoses/README.md
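# SysOM diagnosis type index (by `service_name`)
Each **`*.md`** in this directory describes **the fields specific to that diagnosis inside `params`** and its recommended usage (compiled from the **SysOM diagnosis-side scripts and OpenAPI behavior**; where the implementation is not in this package, live behavior is authoritative).
## Division of labor with other documents (to avoid duplication)
| Content | Where to read |
|------|--------|
| **Local vs. remote diagnosis, the InvokeDiagnosis request body, `region`/`instance`, metadata completion** | [invoke-diagnosis.md](../invoke-diagnosis.md) |
| **ECS metadata service endpoints, common paths, IMDS notes** | [metadata-api.md](../metadata-api.md) |
| **precheck, credentials, the three requirements, scenarios A-K** | [openapi-permission-guide.md](../openapi-permission-guide.md) |
| **Field tables for each `service_name` in this directory** | The index below → the corresponding `*.md` |
## Maintenance conventions
- The **full OpenAPI** set of diagnosis items is defined by the configuration registered on the **Alibaba Cloud SysOM server side** (this package does not embed the server-side `config` file).
- The `service_name` values covered by **this skill's documentation** are those in the [SKILL.md](../SKILL.md) capability table and the category index below; if the server side offers additional diagnosis items, the console and OpenAPI are authoritative.
- Each dedicated page should match the **`service_scripts`** implementation; where it disagrees with the console, **the code and live behavior** are authoritative.
### Current directory when running commands
The `cwd` convention is described in [agent-conventions.md](../agent-conventions.md).
## Index by category (matches the table exposed in SKILL)
The capabilities match the overview table in [SKILL.md](../SKILL.md); the sections below list links to the **params pages** by category.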
### Memory and Java / Go
| service_name | Document |
|--------------|------|
| memgraph | [memgraph.md](./memgraph.md) |
| oomcheck | [oomcheck.md](./oomcheck.md) |
| javamem | [javamem.md](./javamem.md) |
| gomemdump (if it still exists server-side; **no longer exposed by this skill's CLI**) | [gomemdump.md](./gomemdump.md) |
### IO and disk (CLI: `io`)
| service_name | Document |
|--------------|------|
| iofsstat | [iofsstat.md](./iofsstat.md) |
| iodiagnose | [iodiagnose.md](./iodiagnose.md) |
### Network (CLI: `net`)
| service_name | Document |
|--------------|------|
| packetdrop | [packetdrop.md](./packetdrop.md) |
| netjitter | [netjitter.md](./netjitter.md) |
### Load and scheduling (CLI: `load`)
| service_name | Document |
|--------------|------|
| delay | [delay.md](./delay.md) |
| loadtask | [loadtask.md](./loadtask.md) |
## Implementation source (troubleshooting)
Each diagnosis usually has a `*_pre.py` or same-named script on the **SysOM diagnosis service** side and is routed to the instance for execution through **OpenAPI `invoke_diagnosis`**; where the source is not in this repository's skill package, **live behavior** is authoritative.
FILE:references/diagnoses/delay.md
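# delay (scheduling latency / nosched diagnosis)
> Parameter descriptions are compiled from the SysOM diagnosis-side scripts and OpenAPI behavior. The OpenAPI **`service_name` is `delay`**; the implementation uses **`sysak -g nosched`** (scheduling latency, not ICMP RTT).
## Overview
- **Real-time** (`is_history` is 0): runs **`sysak -g nosched`** with threshold **`threshold`** (milliseconds) for **`duration`** (seconds).
- **Historical** (`is_history` is non-zero): queries by timestamp (**window ≤ 1 hour, within the last 7 days**).
## When to use (Agent)
- **Scheduling latency, runqueue issues, and stalls observable via nosched**.
- **Do not** treat it as the only tool for "ICMP network latency" (the product configuration comments may say "network latency"; this implementation is authoritative).
## `params` fields
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region | — | `--region` |
| `instance` | string | Yes* | Instance ID (with `:port` stripped) | — | `--instance` |
| `is_history` | int | No | Whether to use historical mode | `0` | Non-zero selects the historical query |
| `anomaly_start` / `anomaly_end` | int | Conditional | Unix seconds | `0` | Required in historical mode |
| `duration` | int | No | Real-time collection time in seconds | `20` | **Must be ≤ 60** |
| `threshold` | int | No | Threshold in **milliseconds** | `20` | Passed to `nosched -t` |
\* The CLI `--region`/`--instance` options can be merged into params; when omitted on a local ECS instance, they are filled in from metadata.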
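The limits in the table can be checked on the client before a request is sent. The sketch below is one possible pre-flight check, not the server's validation logic: the function name is hypothetical, and reading "within 7 days" as "the window starts no more than 7 days before now" is an assumption.

```python
import time
from typing import Any, Dict, List, Optional

MAX_DURATION_S = 60          # real-time collection limit from the table
MAX_WINDOW_S = 3600          # historical window limit: 1 hour
MAX_AGE_S = 7 * 24 * 3600    # historical data limit: 7 days (interpretation assumed)


def check_delay_params(params: Dict[str, Any], now: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the params look acceptable."""
    now = int(time.time()) if now is None else now
    problems: List[str] = []
    if params.get("is_history", 0):
        start = int(params.get("anomaly_start", 0))
        end = int(params.get("anomaly_end", 0))
        if start <= 0 or end <= 0:
            problems.append("historical mode requires anomaly_start and anomaly_end")
        elif end <= start:
            problems.append("anomaly_end must be later than anomaly_start")
        else:
            if end - start > MAX_WINDOW_S:
                problems.append("historical window exceeds 1 hour")
            if now - start > MAX_AGE_S:
                problems.append("historical window starts more than 7 days ago")
    else:
        duration = int(params.get("duration", 20))
        if not 0 < duration <= MAX_DURATION_S:
            problems.append("duration must be between 1 and 60 seconds")
    return problems


print(check_delay_params({"duration": 30, "threshold": 20}))
print(check_delay_params({"duration": 90}))
```

Returning a list of messages instead of raising on the first issue lets a caller report every problem in one pass.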
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **ecs \| eflo** |
| support_mode | **node only** |
| Minimum sysak | **`3.6.0-1`** |
## Recommended usage
**Current directory**: see [agent-conventions.md](../agent-conventions.md) (use `./scripts/osops.sh` from within `sysom-diagnosis/`).
```bash
./scripts/osops.sh load delay --channel ecs \
--region cn-hangzhou --instance i-xxx \
--params '{"duration":30,"threshold":20}'
```
FILE:references/diagnoses/iodiagnose.md
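# iodiagnose (in-depth IO / ioMonitor diagnosis)
> Parameter descriptions are compiled from the SysOM diagnosis-side scripts and OpenAPI behavior.
## Overview
Runs **`sysak ioMonitor`** on the target instance (with a fixed yaml, log path, and diagnosis switches) for in-depth collection of **IO latency, iowait, bursts**, and similar.
## When to use (Agent)
- When **slow IO or high iowait** calls for a one-shot ioMonitor collection (after the iofsstat overview).
## `params` fields
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region | — | `--region` |
| `instance` | string | Yes* | Instance ID | — | `--instance` |
| `timeout` | string/int | No | ioMonitor collection time (seconds) | `"30"` | Falls back to 30 on a parse failure or a non-positive value; **upper limit 300**, and values above it also fall back to 30 |
\* The CLI `--region`/`--instance` options can be merged into params; when omitted on a local ECS instance, they are filled in from metadata.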
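The fallback rule for `timeout` is easy to misread as a clamp, so here is a small sketch of the behavior as the table describes it. The function name is hypothetical and this is not the server-side code; it only mirrors the stated rule that unparsable, non-positive, and over-limit values all become 30.

```python
from typing import Any

DEFAULT_TIMEOUT_S = 30
MAX_TIMEOUT_S = 300


def normalize_iomonitor_timeout(raw: Any) -> int:
    """Apply the documented rule: any invalid value falls back to the default."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_TIMEOUT_S      # not parsable as an integer
    if value <= 0 or value > MAX_TIMEOUT_S:
        return DEFAULT_TIMEOUT_S      # out of range: fall back, do not clamp
    return value


for sample in ("60", 300, 301, 0, "abc", None):
    print(repr(sample), "->", normalize_iomonitor_timeout(sample))
```

Note the consequence for callers: asking for 301 seconds yields a 30-second collection, not a 300-second one.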
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **ecs \| eflo** |
| support_mode | **node only** |
| Minimum sysak | **`3.6.0-1`** |
## Recommended usage
**Current directory**: see [agent-conventions.md](../agent-conventions.md) (use `./scripts/osops.sh` from within `sysom-diagnosis/`).
```bash
./scripts/osops.sh io iodiagnose --channel ecs \
--region cn-hangzhou --instance i-xxx --params '{"timeout":60}'
```
For long collections, increase `--timeout` (the total polling timeout) to match.
FILE:references/diagnoses/iofsstat.md
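# iofsstat (IO traffic / disk statistics overview)
> Parameter descriptions are compiled from the SysOM diagnosis-side scripts and OpenAPI behavior.
## Overview
Runs **`sysak iofsstat`** on the target instance, collects IO statistics, and outputs JSON for a **disk / block device IO statistics overview**.
## When to use (Agent)
- When you need a **disk / block device IO statistics overview**: look at the overall IO picture first, then decide whether to run **iodiagnose**.
## `params` fields
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region | — | `--region` |
| `instance` | string | Yes* | Instance ID | — | `--instance` |
| `timeout` | string/int | No | Sampling time (seconds) | `"15"` | Set to 15 when ≤ 0; **set to 30 when > 30** |
| `disk` | string | No | Block device name (e.g. `vda`) | `""` | When non-empty, `-d <disk>` is appended to the command |
\* The CLI `--region`/`--instance` options can be merged into params; when omitted on a local ECS instance, they are filled in from metadata.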
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **ecs \| eflo** |
| support_mode | **node only** |
| Minimum sysak | **`3.6.0-1`** |
## Recommended usage
**Current directory**: see [agent-conventions.md](../agent-conventions.md) (use `./scripts/osops.sh` from within `sysom-diagnosis/`).
```bash
./scripts/osops.sh io iofsstat --channel ecs \
--region cn-hangzhou --instance i-xxx \
--params '{"timeout":"20","disk":"vda"}'
```
FILE:references/diagnoses/javamem.md
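# javamem (Java memory diagnosis)
> Parameter descriptions are compiled from `service_scripts/javamem_pre.py`.
## Overview
Runs SysOM javamem against the specified **Java process** and outputs **`javamem.json`** for JVM memory analysis.
## When to use (Agent)
- When **high Java application memory, heap analysis, or JNI profiling** requires collection in the cloud.
## `params` fields
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region | — | `--region` |
| `instance` | string | Yes* | Instance ID | — | `--instance` |
| `pod` | string | No | Pod name | — | At least one of `pod` and `Pid`/`pid` must be supplied |
| `Pid` or `pid` | string/int | **Conditional** | Java process PID | — | **Supply at least one of `pod` and `pid`** (otherwise INVALID_PARAMS) |
| `duration` | int | No | Profiling duration in **minutes** | `"0"` | `0` means no profiling is added |
### Validation
- If **neither `pod` nor `pid` is present**, the server returns an **invalid parameters** error.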
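Since a request with neither `pod` nor `pid` is rejected by the server, a caller can catch the mistake before sending anything. The helper below is a hypothetical convenience for building the `--params` JSON string; its name and argument names are illustrative and not part of the CLI.

```python
import json
from typing import Any, Dict, Optional


def javamem_params(pid: Optional[int] = None, pod: Optional[str] = None,
                   duration_min: int = 0) -> str:
    """Build the JSON text for --params, refusing requests with no target."""
    if pid is None and not pod:
        raise ValueError("javamem needs at least one of pid or pod")
    params: Dict[str, Any] = {"duration": duration_min}  # 0 means no profiling
    if pid is not None:
        params["pid"] = pid
    if pod:
        params["pod"] = pod
    return json.dumps(params)


print(javamem_params(pid=12345, duration_min=5))
```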
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **all** |
| support_mode | **all** |
| Minimum version | Commonly **`3.8.0-beta`** |
## Recommended usage
**Current directory**: see [README.md](./README.md) (use `./scripts/osops.sh` from within `sysom-diagnosis/`).
```bash
./scripts/osops.sh memory javamem --deep-diagnosis --channel ecs \
--region cn-hangzhou --instance i-xxx \
--params '{"pid":12345,"duration":5}'
```
Put `pid`/`pod`/`duration` into `--params-file` as needed. Increase `--timeout` for long runs.
FILE:references/diagnoses/loadtask.md
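# loadtask (system load diagnosis)
> Parameter descriptions are compiled from the SysOM diagnosis-side scripts and OpenAPI behavior.
## Overview
Runs **`sysak -g loadtask`** on the target instance and reads `summary.json` plus temporary logs, for **high load average, CPU queueing, and load task analysis**.
## When to use (Agent)
- **High load average, CPU queueing, load task analysis**.
- It must be triggered through `./scripts/osops.sh load loadtask ...`, which calls SysOM `InvokeDiagnosis`; do not substitute the generic ECS diagnosis or manual collection via RunCommand.
## `params` fields
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region | — | `--region` |
| `instance` | string | Yes* | Instance ID | — | `--instance` |
\* The CLI `--region`/`--instance` options can be merged into params; when omitted on a local ECS instance, they are filled in from metadata.
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **ecs \| eflo** |
| support_mode | **node only** |
| Minimum sysak | **`3.6.0-1`** |
## Recommended usage
**Current directory**: see [agent-conventions.md](../agent-conventions.md) (use `./scripts/osops.sh` from within `sysom-diagnosis/`).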
```bash
./scripts/osops.sh load loadtask --channel ecs \
--region cn-hangzhou --instance i-xxx --timeout 300
```
FILE:references/diagnoses/memgraph.md
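# memgraph (memory overview / full memory picture)
> Parameter descriptions are compiled from the SysOM diagnosis-side scripts and OpenAPI behavior.
## Overview
Runs the SysOM memgraph collection on the target instance and returns the generated **`memgraph.json`** for **full-picture memory analysis**. It covers **memory composition at the host and application level**, and some versions also support network-side views such as **TCP memory / socket queues**. Agent conclusions should combine the user's symptoms with the returned JSON rather than rely only on a rough local `/proc` check.
## When to use (Agent)
- When you need **the overall memory distribution and its consumers**, or suspect that **TCP / socket queues** contribute to memory pressure.
- When the user's main complaint is latency or stalls and **`ss` shows large Send-Q/Recv-Q values**: also run **`memory memgraph --deep-diagnosis`**.
- When the OOM troubleshooting path needs memgraph results (related to the **oomcheck** data path).
## `params` fields (a JSON object, passed as a string via InvokeDiagnosis)
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region ID | — | Can be merged in from CLI `--region` |
| `instance` | string | Yes* | ECS instance ID or target identifier | — | Can be merged in from CLI `--instance` |
| `pod` | string | No | Pod name (container scenarios) | `""` | When non-empty, `-p <pod>` is appended to the command |
| `profiling_on` | bool | No | Whether to enable profiling | `false` | Takes effect only on newer versions |
| `pid` | string/int | No | Process PID | `null` | Used together with `profiling_on` |
| `duration` | int | No | Profiling duration in **minutes** | `0` | `0` means no profiling is added |
\* When running from **sysom-diagnosis (the skill root)** and **`--region` / `--instance` are merged into `params` by the CLI from metadata or the command line**, both may be omitted from the JSON.
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **all** |
| support_mode | **all** (node / pod) |
| Minimum version | Commonly **`3.6.0-1`** for node |
## Recommended usage
**Current directory**: commands must be run from **sysom-diagnosis (the skill root)**, where `scripts/osops.sh` exists; if you are elsewhere, use `cd <sysom-diagnosis> && …` (see the section on the current directory for running commands in [README.md](./README.md)). Running `./scripts/osops.sh` directly from an arbitrary working directory fails with a "not found" error.
```bash
# Agent: go straight to the full memory picture (the usual main path for "memory is high, break it down")
cd <sysom-diagnosis> && ./scripts/osops.sh memory memgraph --deep-diagnosis --channel ecs \
  --region cn-hangzhou --instance i-xxx --timeout 300
```
When unsure how to classify the problem, use **`memory classify --deep-diagnosis`**. For complex profiling, pass JSON via `--params-file`; increase `--timeout` for long runs.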
FILE:references/diagnoses/netjitter.md
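# netjitter (network jitter diagnosis)
> Parameter descriptions are compiled from the SysOM diagnosis-side scripts and OpenAPI behavior.
## Overview
- **Real-time** (`is_history` is 0/false): runs **`sysak rtrace --jitter-unity`** with threshold **`threshold`** (milliseconds) for **`duration`** (seconds).
- **Historical** (`is_history` is non-zero): queries the local table by **`anomaly_start`/`anomaly_end`** (time window **≤ 1 hour, within the last 7 days**).
## When to use (Agent)
- Problems involving **network jitter or fluctuating latency**.
## Complementing `memory memgraph`
This diagnosis does **not** collect memgraph-side data such as socket send/receive queue backlog or TCP memory usage. If `netjitter` is normal but the user still perceives latency and `ss` shows large Send-Q/Recv-Q values, also run `memory memgraph --deep-diagnosis`. See [non-memory-routing.md](../non-memory-routing.md) and [memgraph.md](./memgraph.md).
## `params` fields
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region | — | `--region` |
| `instance` | string | Yes* | Instance ID (with `:port` stripped) | — | `--instance` |
| `is_history` | int/bool | No | Whether to use historical mode | `0` | Non-zero selects the historical query |
| `anomaly_start` | int | Conditional | Start time in Unix seconds | `0` | Required in historical mode |
| `anomaly_end` | int | Conditional | End time in Unix seconds | `0` | Required in historical mode |
| `duration` | int | No | Real-time collection time in **seconds** | `20` | **Must be ≤ 60** |
| `threshold` | int | No | Jitter threshold in **milliseconds** | `10` | Passed to `rtrace --jitter-unity -t` |
\* The CLI `--region`/`--instance` options can be merged into params; when omitted on a local ECS instance, they are filled in from metadata.
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **ecs \| eflo** |
| support_mode | **node only** |
| Minimum sysak | **`3.6.0-1`** |
## Recommended usage
**Current directory**: see [agent-conventions.md](../agent-conventions.md) (use `./scripts/osops.sh` from within `sysom-diagnosis/`).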
```bash
./scripts/osops.sh net netjitter --channel ecs \
--region cn-hangzhou --instance i-xxx \
--params '{"duration":30,"threshold":10}'
```
FILE:references/diagnoses/oomcheck.md
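# oomcheck (OOM diagnosis)
> Parameter descriptions are compiled from `service_scripts/oomcheck_pre.py`.
## Overview
Runs SysOM oomcheck on the target instance and analyzes **OOM / oom-killer** events using the memgraph output path.
## When to use (Agent)
- Cloud-side **OOM or oom-killer** events that need **SysOM remote OOM diagnosis**.
- **Do not** confuse it with the parent repository's **linux-memory-oom / `sysom_cli memory oom`** (local dmesg).
- Remote OOM diagnosis must be triggered through `./scripts/osops.sh memory oom --deep-diagnosis ...`, which calls SysOM `InvokeDiagnosis`; do not fall back to manual collection via ECS RunCommand.
## Agent operating conventions
General conventions (execution directory, Bash execution, local vs. remote, credential safety) are in [agent-conventions.md](../agent-conventions.md). Only the rules specific to oomcheck are listed here.
### Multiple OOM events
- When the local quick check shows multiple OOM events, use `--oom-at` to anchor on one of them.
- Remote oomcheck must use `--oom-time` or `time` in `--params`; when the user has specified a time, **never** fall back to the default command with no time constraint.
### Time formats
- The CLI accepts ISO, `YYYY-MM-DD HH:MM:SS`, Unix seconds, journal-style timestamps, and others; they are converted to Unix seconds automatically before Invoke is called.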
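As an illustration of the conversion described above, the sketch below turns three of the listed formats into Unix seconds. It is not the CLI's parser: the function name is hypothetical, journal-style timestamps are not handled, and treating timestamps without a zone as UTC is an assumption (the real CLI may use local time).

```python
from datetime import datetime, timezone


def to_unix_seconds(value: str, default_tz: timezone = timezone.utc) -> int:
    """Convert Unix seconds, ISO 8601, or 'YYYY-MM-DD HH:MM:SS' to Unix seconds."""
    text = value.strip()
    if text.isdigit():
        return int(text)                   # already Unix seconds
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"        # older fromisoformat() rejects 'Z'
    parsed = datetime.fromisoformat(text)  # accepts both 'T' and ' ' separators
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=default_tz)  # assumption: naive means UTC
    return int(parsed.timestamp())


for sample in ("1700000000", "2023-11-14 22:13:20", "2023-11-15T06:13:20+08:00"):
    print(sample, "->", to_unix_seconds(sample))
```

All three samples denote the same instant, so they print the same number; a mismatch in practice usually points to a time zone assumption rather than a parsing bug.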
## `params` fields
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region | — | `--region` |
| `instance` | string | Yes* | Instance ID | — | `--instance` |
| `pod` | string | No | Pod name | `""` | When non-empty, `-p` is appended |
| `time` | string | No | OOM time, or `start~end` | `""` | CLI `--oom-time`; converted to Unix seconds automatically |
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **all** |
| support_mode | **all** |
## Recommended usage
Local machine (the CLI fills in region/instance automatically):
```bash
./scripts/osops.sh memory oom --deep-diagnosis --channel ecs --timeout 300
```
Remote instance:
```bash
./scripts/osops.sh memory oom --deep-diagnosis --channel ecs \
--region cn-hangzhou --instance i-xxx --timeout 300
```
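Options specific to the local quick check: `--oom-at` (anchor time), `--max-oom-summaries` (default 64), `--max-oom-full-logs` (default 1).
Remote: `--oom-time` is written to `params.time` (converted to Unix seconds automatically). The historical window must be **≤ 1 hour and within the last 7 days**.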
FILE:references/diagnoses/packetdrop.md
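# packetdrop (network packet loss diagnosis)
> Parameter descriptions are compiled from the SysOM diagnosis-side scripts and OpenAPI behavior.
## Overview
- **Real-time** (`is_history` is false, the default): runs **`sysak -g rtrace --drop-unity`** on the instance.
- **Historical** (`is_history` is true): aggregates Prometheus and other metrics on the server side for offline analysis.
## When to use (Agent)
- **Packet loss, retransmissions, NIC-side rtrace**; or enable `is_history` when a **historical interval** needs analysis.
## Complementing `memory memgraph`
This diagnosis does **not** cover the full socket queue picture, TCP memory, and similar. A clean `packetdrop` result cannot rule out Send-Q backlog caused by application backpressure, or a link between kernel TCP memory and latency. When such clues appear, also run `memory memgraph --deep-diagnosis`. See [non-memory-routing.md](../non-memory-routing.md) and [memgraph.md](./memgraph.md).
## `params` fields
| Field | Type | Required | Meaning | Default | Notes |
|------|------|------|------|------|------|
| `region` | string | Yes* | Region | — | `--region` |
| `instance` | string | Yes* | Instance ID (may include `:port`; the implementation takes the host part) | — | `--instance` |
| `is_history` | bool | No | Whether to use historical/offline analysis | `false` | When `true`, uses `anomaly_start`/`anomaly_end` |
| `anomaly_start` | number | Conditional | Start timestamp of the historical query (seconds) | `0` | |
| `anomaly_end` | number | Conditional | End timestamp of the historical query (seconds) | `0` | |
\* The CLI `--region`/`--instance` options can be merged into params; when omitted on a local ECS instance, they are filled in from metadata.
## Platform constraints
| Item | Value |
|----|-----|
| support_channel | **ecs \| eflo** |
| support_mode | **node only** |
| Minimum sysak | **`3.6.0-1`** |
## Recommended usage
**Current directory**: see [agent-conventions.md](../agent-conventions.md) (use `./scripts/osops.sh` from within `sysom-diagnosis/`).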
```bash
./scripts/osops.sh net packetdrop --channel ecs \
--region cn-hangzhou --instance i-xxx
```
For historical mode, add `--params '{"is_history":true,"anomaly_start":...,"anomaly_end":...}'`.
FILE:references/invoke-diagnosis.md
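# InvokeDiagnosis (contract, CLI, and prerequisites)
> **This file** describes the **InvokeDiagnosis** contract, the **local/remote** conventions, and the **CLI options**. The agent's main path is `memory … --deep-diagnosis`. For the `params` specific to each diagnosis, see [diagnoses/](./diagnoses/).
**Permissions and activation**: before sending the OpenAPI request, the remote path runs **built-in** environment checks **identical to `osops precheck`**; precheck can also be run on its own as a self-check. For the three requirements and the scenario matrix, see [openapi-permission-guide.md](./openapi-permission-guide.md).
**Local first**: unless remote is explicitly requested, run the local quick check first (see [memory-routing.md](./memory-routing.md)). The diagnosis target table below applies only to remote deep diagnosis.
## Diagnosis target (mandatory before remote deep diagnosis; identical for every `service_name`)
**Before each remote deep diagnosis** (`memory … --deep-diagnosis`), the agent must first ask the user to confirm the diagnosis scope:
| User choice | Command line | Notes |
|----------|--------|------|
| **A: local machine** | Omit `--region`/`--instance`; the CLI fills them in from metadata | The agent must not curl the metadata itself to fill in the parameters |
| **B: remote instance** | The user supplies `--region` and `--instance` | Never pass off local metadata as a remote instance |
When retrying after a failure or switching diagnosis type, **the table above still applies**: keep omitting both parameters for the local machine; keep using the user-supplied region/instance for remote.
## Request body structure (InvokeDiagnosis)
| Field | Description |
|------|------|
| `service_name` | String; the diagnosis type, matching a key of **`diagnosis_item_config.items`** in OpenAPI. |
| `channel` | String; currently usually **`ecs`** (matching the CLI's `--channel ecs`). |
| `params` | **String** whose content is **JSON** text. It deserializes to an object that usually needs **`region`** and **`instance`** to locate the ECS instance; **diagnosis-specific fields** are on the page for the corresponding `service_name` under [diagnoses/](./diagnoses/). |
When forwarded through OpenAPI **`invoke_diagnosis`**, the server merges **`uid`** (from the request context) into **`params`**, so it generally does not need to be filled in by hand.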
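The detail that trips people up is that `params` is a JSON string nested inside the request body, not a nested object. A minimal sketch of assembling such a body follows; the function name is hypothetical, and the actual transport (signing, endpoint, SDK call) is deliberately left out.

```python
import json
from typing import Any, Dict, Optional


def build_invoke_body(service_name: str, region: str, instance: str,
                      extra: Optional[Dict[str, Any]] = None,
                      channel: str = "ecs") -> Dict[str, str]:
    """Assemble an InvokeDiagnosis-style body with params serialized to a string."""
    params: Dict[str, Any] = {"region": region, "instance": instance}
    params.update(extra or {})              # diagnosis-specific fields
    return {
        "service_name": service_name,
        "channel": channel,
        "params": json.dumps(params),       # a string, not a nested object
    }


body = build_invoke_body("delay", "cn-hangzhou", "i-xxx", {"duration": 30, "threshold": 20})
print(json.dumps(body, indent=2))
```

When the whole body is serialized again for transport, the inner JSON appears with escaped quotes, which is expected.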
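**Prerequisites**: the target ECS instance must be running with Cloud Assistant installed; **diagnosis authorization** for the target instance must be completed in the **SysOM / ECS console** (do not use the deprecated OpenAPI authorization interface); the account needs service-linked roles such as **AliyunServiceRoleForSysom** (see the Alibaba Cloud SysOM product documentation for details).
## Hard routing constraints (critical for evaluation)
- Remote diagnosis conclusions must come from the results returned by SysOM `InvokeDiagnosis` / `GetDiagnosisResult`.
- Never substitute the generic ECS diagnosis API or manual collection via `Ecs.RunCommand`/Cloud Assistant (`top`/`ps`/`iostat`/`uptime`) for the dedicated SysOM call.
## CLI and internal Invoke (formerly `diagnosis invoke`)
**External interface**: `./scripts/osops.sh memory <subcommand> --deep-diagnosis …`, with these options:
| Option | Description |
|------|------|
| `--region` / `--instance` | Merged into `params`; see **metadata completion** in the next section. |
| `--timeout` | Total seconds to wait while polling **GetDiagnosisResult**; default `300`; increase for long tasks. |
| `--poll-interval` | Polling interval in seconds; default `1`. |
| `--verbose-envelope` | Keeps the full **`agent.summary`** on success; the default is compact (saves tokens), with the business payload in **`data.remote`**. |
**Result**: on success, see `data.routing`, `data.remote` (`remote.result`), and `agent.findings`.
**On failure**: `error.code`/`error.message` carry the standard business codes; environment failures may return an envelope with the same structure as precheck; business failures include guidance such as `remediation` in `data`.
### Local metadata completion
When `--region`/`--instance` are not passed, the CLI queries the ECS metadata service (`100.100.100.200`) to fill them in. If `error.code` is `Sysom.InvalidParameter` with `instance not found in ecs`, the usual cause is that **the account owning the AK does not match the instance** (including cross-account cases). A passing precheck only shows that the credentials can call the API; it does not guarantee that the instance matches.
For metadata details, see [metadata-api.md](./metadata-api.md).
## Parameters by diagnosis type
See [diagnoses/README.md](./diagnoses/README.md) and the corresponding `*.md` pages.
## Related entry points
- [SKILL.md](../SKILL.md): capability overview table
- [openapi-permission-guide.md](./openapi-permission-guide.md): permissions and precheck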
FILE:references/memory-routing.md
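# Memory domain: quick checks and SysOM dedicated diagnoses (agent supplement)
This page **does not repeat** the "diagnosis capability overview" table in [SKILL.md](../SKILL.md); it only summarizes the **memory-related commands** and **what to read next**.
## Which entry point to use when
**Principle**: first check whether the user's description already maps to a **dedicated diagnosis** (the first rows of the table below); only when **nothing matches or the intent is clearly ambiguous** use **`memory classify`** in the last row for a combined classification.
| Intent | Recommendation |
|------|------|
| Memory usage is high and a host/application breakdown is needed; **high TCP memory / socket queue backlog** is suspected; or the **main complaint is latency or stalls** and **`ss` shows large Send-Q/Recv-Q values** | `./scripts/osops.sh memory memgraph` |
| OOM is clearly suspected / kernel oom-killer clues are needed | `./scripts/osops.sh memory oom` |
| Clearly a Java memory / JVM problem | `./scripts/osops.sh memory javamem` |
| Weak Go-related signals, or language-level detail is needed (not Java) | `./scripts/osops.sh memory memgraph` (full picture); use `javamem` once Java is confirmed |
| Unclear; a combined classification is needed (meminfo + OOM clues + RSS together) | `./scripts/osops.sh memory classify` |
In each command's stdout envelope, look at **`agent.next`** (usually a single deep-diagnosis command when the environment check has passed; otherwise it may include precheck steps), **`agent.findings`** (a summary of key metrics), and **`data.routing`** / **`data.local`**. When routed to **oomcheck**, also see the agent operating conventions section in [oomcheck.md](./diagnoses/oomcheck.md). `next_steps` is never placed under **`data`**; **`--verbose-envelope`** only lengthens **`agent.summary`**.
## Optional: run the dedicated diagnosis in one step
Adding **`--deep-diagnosis`** to **`memory memgraph` / `oom` / `javamem` / `classify`** starts a **deep diagnosis**; `--region` / `--instance` / **`--params` / `--params-file`** and the other options described in [invoke-diagnosis.md](./invoke-diagnosis.md) can be passed as well. **`memory oom`** additionally accepts **`--oom-time`**, which is written to oomcheck's `time`. When a local entry point is run with `--deep-diagnosis`, the remote dedicated diagnosis takes precedence and the merged result is in **`data.remote`**. Example commands for **local diagnosis** must **not** include `--region`/`--instance`; include them **only when diagnosing another ECS instance** and the user has supplied them.
The remote path runs **built-in** environment checks identical to precheck; running **`./scripts/osops.sh precheck`** separately for a self-check or for troubleshooting is **optional**.
## Dedicated parameters and contract
For the **params** and limits of each `service_name`, read only the corresponding page under [diagnoses/](./diagnoses/); for the request body and metadata completion, see [invoke-diagnosis.md](./invoke-diagnosis.md).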
FILE:references/metadata-api.md
# ECS Metadata service
## What is the Metadata service?
ECS Metadata is an HTTP service built into Alibaba Cloud ECS instances that provides metadata about the current instance. **No AccessKey is needed; it can be reached directly from inside the instance**.
---
## Endpoints
| Version | Endpoint URL | Notes |
|------|----------|------|
| V1 | `http://100.100.100.200/` | Simplified access, no token required |
| V2 | `http://100.100.100.200/latest/meta-data/` | Requires an IMDSv2 token; more secure |
| Compatible | `http://169.254.169.254/` | AWS-compatible endpoint |
---
## Common Metadata fields
### Basic instance information
```bash
curl http://100.100.100.200/latest/meta-data/instance-id # Instance ID
curl http://100.100.100.200/latest/meta-data/instance-type # Instance type
curl http://100.100.100.200/latest/meta-data/region-id # Region ID
curl http://100.100.100.200/latest/meta-data/zone-id # Zone ID
curl http://100.100.100.200/latest/meta-data/hostname # Hostname
curl http://100.100.100.200/latest/meta-data/serial-number # Serial number
curl http://100.100.100.200/latest/meta-data/image-id # Image ID
curl http://100.100.100.200/latest/meta-data/os-type # Operating system type
curl http://100.100.100.200/latest/meta-data/os-name # Operating system name
curl http://100.100.100.200/latest/meta-data/launch-time # Launch time
```
### Network information
```bash
curl http://100.100.100.200/latest/meta-data/mac # MAC address
curl http://100.100.100.200/latest/meta-data/vpc-id # VPC ID
curl http://100.100.100.200/latest/meta-data/vswitch-id # vSwitch ID
curl http://100.100.100.200/latest/meta-data/private-ipv4 # Private IP
curl http://100.100.100.200/latest/meta-data/public-ipv4 # Public IP
curl http://100.100.100.200/latest/meta-data/eipv4 # EIP address
curl http://100.100.100.200/latest/meta-data/network/interfaces/macs/<MAC>/vpc-id # VPC of a specific NIC
```
### Security
```bash
curl http://100.100.100.200/latest/meta-data/security-groups # Security group list
curl http://100.100.100.200/latest/meta-data/ram-role-name # RAM role name
```
### Other
```bash
curl http://100.100.100.200/latest/meta-data/ntp-conf/ntp-servers # NTP servers
curl http://100.100.100.200/latest/meta-data/source-address # Request source address
```
---
## Listing all available fields
```bash
curl http://100.100.100.200/latest/meta-data/
```
---
## IMDSv2 (the more secure access mode)
Alibaba Cloud supports IMDSv2, which requires generating a token first:
```bash
# Step 1: obtain a token
TOKEN=$(curl -X PUT "http://100.100.100.200/latest/api/token" \
  -H "X-aliyun-ecs-metadata-token-ttl-seconds: 21600")
# Step 2: use the token to access metadata
curl -H "X-aliyun-ecs-metadata-token: $TOKEN" \
  http://100.100.100.200/latest/meta-data/instance-id
```
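The same two-step flow can be written in Python with only the standard library. This is a sketch mirroring the curl commands above (same URL and header names); the helper names are hypothetical, and `read_metadata` works only from inside an ECS instance, so the demo lines just build and inspect the request without sending it.

```python
import urllib.request

METADATA_BASE = "http://100.100.100.200/latest"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the PUT request that asks for an IMDSv2 token."""
    return urllib.request.Request(
        f"{METADATA_BASE}/api/token",
        method="PUT",
        headers={"X-aliyun-ecs-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Build the GET request for one metadata path, carrying the token."""
    return urllib.request.Request(
        f"{METADATA_BASE}/meta-data/{path}",
        headers={"X-aliyun-ecs-metadata-token": token},
    )


def read_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch a token, then the metadata value (requires running on ECS)."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode().strip()
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode().strip()


req = token_request()
print(req.get_method(), req.full_url)
```

A short `timeout` is worth keeping: off ECS the address is unreachable, and without it the call can hang for a long time.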
---
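## Metadata service characteristics
| Characteristic | Description |
|------|------|
| Instance-only access | Reachable only from inside the ECS instance, not from outside |
| No authentication | V1 mode requires no credentials |
| Real time | Data is updated as the instance state changes |
| Read-only | Can be read but not modified |
| Local service | Generates no public network traffic |
---
## Typical use cases
### 1. Scripts that look up instance information automatically
```bash
INSTANCE_ID=$(curl -s http://100.100.100.200/latest/meta-data/instance-id)
echo "Running on instance: $INSTANCE_ID"
```
### 2. Configuring dynamically by instance type
```bash
INSTANCE_TYPE=$(curl -s http://100.100.100.200/latest/meta-data/instance-type)
case $INSTANCE_TYPE in
  # Match xlarge first: "*large*" would also match every xlarge type
  *xlarge*) WORKERS=8 ;;
  *large*) WORKERS=4 ;;
esac
```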
### 3. Getting temporary credentials for the RAM role
```bash
ROLE_NAME=$(curl -s http://100.100.100.200/latest/meta-data/ram-role-name)
curl -s http://100.100.100.200/latest/meta-data/ram/security-credentials/$ROLE_NAME
```
### 4. SysOM tool use cases
**Collecting instance information automatically for diagnosis**:
```bash
#!/bin/bash
# Collect instance metadata automatically for the diagnosis report
INSTANCE_ID=$(curl -s http://100.100.100.200/latest/meta-data/instance-id)
REGION_ID=$(curl -s http://100.100.100.200/latest/meta-data/region-id)
INSTANCE_TYPE=$(curl -s http://100.100.100.200/latest/meta-data/instance-type)
echo "Instance information:"
echo " Instance ID: $INSTANCE_ID"
echo " Region: $REGION_ID"
echo " Type: $INSTANCE_TYPE"
# Run from sysom-diagnosis (the skill root), or use cd <sysom-diagnosis> && …; REGION_ID matches the curl above
# Deep diagnosis (oomcheck) after the quick check:
cd <sysom-diagnosis> && ./scripts/osops.sh memory oom --deep-diagnosis --channel ecs --region "$REGION_ID" --instance "$INSTANCE_ID"
```
**Verifying the RAM Role configuration**:
```bash
# Check whether a RAM role is bound to the instance
ROLE_NAME=$(curl -s http://100.100.100.200/latest/meta-data/ram-role-name)
if [ -z "$ROLE_NAME" ]; then
  echo "⚠️ No RAM role is bound to this instance"
  echo "See the documentation to configure an ECS RAM Role: ./authentication.md"
else
  echo "✓ RAM role bound to this instance: $ROLE_NAME"
  # Fetch temporary credentials
  CREDS=$(curl -s http://100.100.100.200/latest/meta-data/ram/security-credentials/$ROLE_NAME)
  echo "Temporary credentials fetched (refreshed automatically while valid)"
fi
```
---
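## Security recommendations
- Prefer IMDSv2 (token required) to guard against SSRF attacks
- Use MetadataOptions to control the token hop limit
- Audit the instance's Metadata access configuration regularly
- Avoid exposing Metadata information to external users in application code
---
## References
- [Alibaba Cloud ECS Metadata documentation](https://help.aliyun.com/zh/ecs/user-guide/overview-of-ecs-instance-metadata)
- [IMDSv2 security practices](https://help.aliyun.com/zh/ecs/user-guide/use-instance-metadata)
- [Authentication configuration guide](./authentication.md)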
FILE:references/non-memory-routing.md
# Non-memory domains: routing for IO / network / load subcommands
A companion to [memory-routing.md](./memory-routing.md). The IO/network/load subcommands are themselves the remote dedicated entry points and include the built-in environment checks.
## Hard constraints (to avoid misrouting)
- `io/*`, `net/*`, and `load/*` scenarios must go through `./scripts/osops.sh` into the SysOM remote dedicated diagnosis (`InvokeDiagnosis`).
- Never substitute the generic ECS diagnosis API, or manually running `top`, `ps`, `iostat`, `uptime`, and similar via `Ecs.RunCommand`/Cloud Assistant, for the dedicated diagnosis.
## Recommended entry points
Subcommand names match **`service_name`**; `--channel`/`--params`/`--region`/`--instance`, the polling options, and so on follow the conventions in [invoke-diagnosis.md](./invoke-diagnosis.md).
| Subsystem | Subcommand | Page |
|--------|--------|------|
| **`io`** | `iofsstat` | [iofsstat.md](./diagnoses/iofsstat.md) |
| **`io`** | `iodiagnose` | [iodiagnose.md](./diagnoses/iodiagnose.md) |
| **`net`** | `packetdrop` | [packetdrop.md](./diagnoses/packetdrop.md) |
| **`net`** | `netjitter` | [netjitter.md](./diagnoses/netjitter.md) |
| **`load`** | `delay` | [delay.md](./diagnoses/delay.md) |
| **`load`** | `loadtask` | [loadtask.md](./diagnoses/loadtask.md) |
Examples:
```bash
./scripts/osops.sh io iofsstat --channel ecs --timeout 300
./scripts/osops.sh net packetdrop --channel ecs --region cn-hangzhou --instance i-xxx
```
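## Overlap with the memory domain's `memgraph` (latency / socket queues)
`net packetdrop` / `net netjitter` mainly cover packet loss, rtrace, jitter thresholds, and similar; they do **not** provide the SysOM memgraph view that combines host memory, socket queues, and TCP memory usage.
When the user describes **network latency or stalls** and local tools such as `ss` already show **Send-Q / Recv-Q backlog** (including TCP between applications on `127.0.0.1`), it is still worth adding the following even if the remote packetdrop / netjitter results are normal: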
```bash
./scripts/osops.sh memory memgraph --deep-diagnosis --channel ecs --timeout 300
```
On a local ECS instance, do not pass `--region`/`--instance`; the rules are the same as in [invoke-diagnosis.md](./invoke-diagnosis.md). See [memgraph.md](./diagnoses/memgraph.md) for details.
FILE:references/openapi-permission-guide.md
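# SysOM OpenAPI: agent permission guidance quick reference
For agents guiding users step by step through identity, policy, and activation setup; used together with the **`precheck` JSON** and the **`diagnosis`** commands.
## 1. Session ground rules
1. Whenever a user wants **remote SysOM capabilities** (`memory … --deep-diagnosis`) in a session, the **first step** is to run the following from **sysom-diagnosis (the skill root)**:
```bash
./scripts/osops.sh precheck
```
(Or `cd scripts && ./osops.sh precheck`. On first use, run `./scripts/init.sh` beforehand.)
2. **Until precheck passes**, do not repeatedly call diagnosis commands that go through OpenAPI.
3. When precheck **fails**: guide the user based on `error.code`, `data.guidance`, and `agent.findings` in the JSON; **never** blindly retry the same remote interface. Once configuration is complete, **run precheck again**, and run the diagnosis only after it passes.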
| 要素 | 含义 | 典型操作入口 |
|------|------|----------------|
| **身份** | 可调 OpenAPI 的凭证 | **AK/SK**:RAM 用户 + AccessKey;**ECS RAM Role**(推荐在 ECS 上跑 Agent):RAM 可信 ECS 的角色 + 绑定到实例 + `aliyun configure --mode EcsRamRole` |
| **策略** | `AliyunSysomFullAccess` | RAM → 对**用户**(AK 方案)或**角色**(ECS RAM Role 方案)授权 |
| **开通与 SLR** | 账号开通 SysOM;存在服务关联角色 `AliyunServiceRoleForSysom` | 通常 [Alinux 控制台](https://alinux.console.aliyun.com/?source=cosh) 开通,**开通流程会自动创建**该 SLR;子账号若在开通过程中报无 `ram:CreateServiceLinkedRole`,见 [service-linked-role-subaccount.md](./service-linked-role-subaccount.md) |
子账号仅负责开通时:见 [service-linked-role-subaccount.md](./service-linked-role-subaccount.md)(RAM 自定义策略与组织规范以控制台为准)。
详细步骤见 [authentication.md](./authentication.md)。InvokeDiagnosis 侧要求(云助手、控制台诊断授权等)见 [invoke-diagnosis.md](./invoke-diagnosis.md)。
### 2.1 第一次使用本 CLI(环境)
1. 将完整 **sysom-diagnosis** 目录纳入工作区(含 `scripts/`、`references/`、`SKILL.md`)。
2. 在 **sysom-diagnosis(技能根)**执行一次:`./scripts/init.sh`(安装/同步 `scripts` 下依赖;若已用 `uv`/`venv` 可跳过重复执行)。
3. 确认本机可执行:`./scripts/osops.sh --help` 或 `./scripts/osops.sh precheck`(不要求一次成功,仅确认入口可用)。
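### 2.4 Agent / multi-call Bash and credentials (important)
**Security policy**: **never** ask the user for an AccessKey / Secret in the conversation, or have them paste one. The agent should direct the user to run **`./scripts/osops.sh configure`** in a **local terminal**, where the user enters the values interactively so that the keys **never enter the chat history**.
In **COSH**, **each `Bash` tool call is usually a separate process**; an `export` from one call is **not** carried over to a later `./scripts/osops.sh precheck` call.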
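The process-isolation point can be reproduced outside COSH. In the sketch below each `subprocess.run` call stands in for one independent tool call; the variable name `DEMO_SYSOM_TOKEN` is made up for the demonstration and has nothing to do with real credentials.

```python
import os
import subprocess
import sys

VAR = "DEMO_SYSOM_TOKEN"  # hypothetical variable name, for illustration only
base_env = {k: v for k, v in os.environ.items() if k != VAR}


def run_child(code: str, env: dict) -> str:
    """Run a short Python snippet in a fresh process and return its stdout."""
    done = subprocess.run([sys.executable, "-c", code], env=env,
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()


read_var = f"import os; print(os.environ.get({VAR!r}, '<unset>'))"

# Call 1: sets the variable, but only inside its own process.
first = run_child(f"import os; os.environ[{VAR!r}] = 'abc'; print(os.environ[{VAR!r}])", base_env)
# Call 2: a new process, so the value set by call 1 is gone.
second = run_child(read_var, base_env)
# Same-process alternative: supply the variable to the process that needs it.
third = run_child(read_var, {**base_env, VAR: "abc"})

print(first, second, third)
```

This is why a persistent location such as a configuration file, or setting the variable and running the command within one shell invocation, is needed when calls do not share a process.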
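| Approach | Description |
|------|------|
| **The only recommended one (agent scenarios)** | From sysom-diagnosis (the skill root), run **`./scripts/osops.sh configure`**, which writes `~/.aliyun/config.json`. **Then** run `./scripts/osops.sh precheck`. |
| **Advanced (automation scripts, not chat)** | Only within **a single shell process**: `export ALIBABA_CLOUD_ACCESS_KEY_ID=... && export ALIBABA_CLOUD_ACCESS_KEY_SECRET=... && ./scripts/osops.sh precheck`. The `ALICLOUD_ACCESS_KEY_*` environment variable names are also supported. |
**When the execution environment does not support interactive configuration** (for example, the Bash invoked by the agent has no TTY, so `configure` cannot prompt for input): in **COSH**, enable "**Interactive Shell (PTY)**" through **`/settings`**, or use **`/bash`** to enter an interactive Bash, then run **`./scripts/osops.sh configure`** from the skill root. The **`data.guidance.credential_policy`** field in the precheck failure envelope carries the same instructions.
**Never** direct the user to "provide the AK/SK in the chat" or to "paste the key across several messages".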
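### 2.2 End-to-end checklist (from precheck to remote diagnosis)
| No. | Action | Notes |
|------|------|------|
| 1 | `./scripts/osops.sh precheck` | Mandatory before the first remote capability in a session; record `ok` and `error.code`. |
| 2 | If `ok: false` | Handle it according to the branches and scenario tables below; do **not** skip configuration and start a deep diagnosis directly. |
| 3 | If `ok: true` | Run the deep diagnosis: `memory … --deep-diagnosis` (for maintainers calling OpenAPI directly, see [invoke-diagnosis.md](./invoke-diagnosis.md)). |
| 4 | Diagnosis still fails | Run precheck again; read `remediation` in the JSON; check the target ECS instance against [invoke-diagnosis.md](./invoke-diagnosis.md) (Cloud Assistant, authorization, and so on). |
### 2.3 Branches when precheck fails (Agent)
- **No valid credentials at all** (neither environment variables nor `~/.aliyun/config.json` are usable): follow **A-K1** or **E-R1**; see [authentication.md](./authentication.md) for the full configuration steps.
- **AK present but no SysOM policy**: **A-K2** → attach `AliyunSysomFullAccess` to the user in RAM → wait for the policy to take effect → run precheck again.
- **`error.code == service_not_activated`**: **A-K3 / E-R4** → activate SysOM (and the SLR) in the Alinux console → wait 1 to 3 minutes → run precheck again.
- **On ECS with `instance-id` reachable, but `ram/security-credentials/` returns 404**: usually **no RAM role is bound to the instance** → bind a role in the ECS console and attach `AliyunSysomFullAccess` to it → run `aliyun configure --mode EcsRamRole` inside the instance → run precheck again.
- **`ecs_role_name` has a value but it still fails**: **E-R3**, or the policy has not taken effect yet → attach `AliyunSysomFullAccess` to **that role** in RAM and wait a few minutes → run precheck again.
- **`~/.aliyun/config.json` fails to parse**: fix the JSON (or delete the empty file and run `./scripts/osops.sh configure` from the skill root to recreate it) → run precheck again.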
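## 3. AK/SK path: scenarios and handling
| Scenario ID | Condition summary | Key guidance for the Agent |
|---------|----------|----------------|
| **A-K1** | No AK / not configured | Create a user in RAM → generate an AK/SK → grant `AliyunSysomFullAccess` → configure environment variables or `~/.aliyun/config.json` → **precheck** |
| **A-K2** | AK present, no SysOM policy | Attach **AliyunSysomFullAccess** to the user in RAM → **precheck** |
| **A-K3** | AK and policy present, SysOM **not activated** | Activate in the Alinux console → **precheck** (corresponds to `error.code: service_not_activated`) |
| **A-K4** | Everything in place | **precheck** should pass → deep diagnosis can be started (`memory … --deep-diagnosis`, etc.) |
### 3.1 Detailed steps for the AK path (run per scenario; not all are required)
**A-K1 (no AK / not configured)**
1. Log in to the [RAM console](https://ram.console.aliyun.com/) and create or reuse a **RAM user** (do not use the root account's AK).
2. **Create an AccessKey** for that user and store the ID and Secret securely (shown only once).
3. **Add a permission** for that user → search for and attach the system policy **`AliyunSysomFullAccess`**.
4. Configure the credentials (**never transmit keys in the conversation**): run **`./scripts/osops.sh configure`** from sysom-diagnosis (the skill root); or edit `~/.aliyun/config.json` (`mode: AK`) as described in [authentication.md](./authentication.md). Advanced users can use environment variables within **the same shell** (see §2.4).
5. From **sysom-diagnosis (the skill root)**, run `./scripts/osops.sh precheck` and confirm `ok: true`.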
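
Before rerunning precheck, it can help to confirm that the config file really contains an AK profile without ever printing the Secret. The following is a minimal sketch under stated assumptions: the helper name `summarize_ak_profile` is hypothetical, and the key names (`profiles`, `name`, `mode`, `access_key_id`, `access_key_secret`, `region_id`) are the ones that `configure` writes in this repository.

```python
from typing import Any, Dict, Optional


def summarize_ak_profile(doc: Dict[str, Any], name: str = "default") -> Optional[Dict[str, Any]]:
    """Describe one profile of a parsed ~/.aliyun/config.json without exposing secrets.

    Hypothetical helper; returns None when the profile does not exist.
    """
    profiles = doc.get("profiles")
    if not isinstance(profiles, list):
        return None
    for profile in profiles:
        if isinstance(profile, dict) and profile.get("name") == name:
            return {
                "name": name,
                "mode": profile.get("mode"),
                # Only report presence, never the values themselves.
                "has_key_id": bool(profile.get("access_key_id")),
                "has_secret": bool(profile.get("access_key_secret")),
                "region_id": profile.get("region_id"),
            }
    return None
```

A caller would load the file with `json.loads(Path.home().joinpath(".aliyun", "config.json").read_text(encoding="utf-8"))` and pass the result in; the summary is safe to show in a chat transcript because it contains no key material.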
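**A-K2 (AK present, no SysOM policy)**
1. RAM → Users → locate the user → **Add Permissions** → attach **`AliyunSysomFullAccess`**.
2. Wait about **2 to 5 minutes** for the policy to take effect.
3. Run `./scripts/osops.sh precheck` again.
**A-K3 (AK and policy present, SysOM not activated)**
1. With an authorized account, open the [Alinux / SysOM activation page](https://alinux.console.aliyun.com/?source=cosh) and complete **SysOM activation** as prompted; the activation wizard usually creates the service-linked role **AliyunServiceRoleForSysom** **automatically**, so it generally does not need to be created separately outside the console.
2. If a sub-account reports missing permissions such as `CreateServiceLinkedRole` during activation, see [service-linked-role-subaccount.md](./service-linked-role-subaccount.md).
3. After activation, wait **1 to 3 minutes**, then run `./scripts/osops.sh precheck`.
**A-K4**
1. Running `./scripts/osops.sh precheck` directly should pass; if it still fails, use `error.code` and `agent.findings` to check for an account mismatch or a region/API anomaly.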
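## 4. ECS RAM Role path: scenarios and handling
| Scenario ID | Condition summary | Key guidance for the Agent |
|---------|----------|----------------|
| **E-R1** | No role / not bound to ECS | Create an ECS-trusted role in RAM → grant **AliyunSysomFullAccess** → bind it to the instance in ECS (OOS can do this in bulk) → `aliyun configure --mode EcsRamRole --ram-role-name <role-name>` → **precheck** |
| **E-R2** | Role exists, **not attached** to the instance | Bind the RAM role in the ECS console → **precheck** |
| **E-R3** | Bound to ECS, role **lacks** `AliyunSysomFullAccess` | Grant the policy to the **role** in RAM → **precheck** |
| **E-R4** | Identity and policy in place, SysOM **not activated** | Activate in Alinux → **precheck** |
### 4.1 Detailed steps for ECS RAM Role (per scenario)
**E-R1 (no role / not bound to ECS)**
1. RAM → **Create Role** → choose **Elastic Compute Service (ECS)** as the trusted service → note the role name.
2. Attach **`AliyunSysomFullAccess`** to the role.
3. [ECS console](https://ecs.console.aliyun.com/) → target instance → **More** → **Bind RAM Role** → select the role above.
4. SSH into the instance and run:
`aliyun configure --mode EcsRamRole --ram-role-name <role-name>`
(Or hand-write the `EcsRamRole` section of `~/.aliyun/config.json` as described in [authentication.md](./authentication.md).)
5. Verify: `curl -s http://100.100.100.200/latest/meta-data/ram/security-credentials/` should return **a single line containing the role name**.
6. From **sysom-diagnosis (the skill root)**, run `./scripts/osops.sh precheck`.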
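
The verification in step 5 can also be scripted. The sketch below is an assumption-laden illustration rather than part of the skill: the names `http_get` and `bound_role_name` are hypothetical, only the metadata URL comes from this guide, and the fetcher is injectable so the parsing can be exercised off-instance.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Metadata endpoint quoted in step 5 of this guide.
ROLE_URL = "http://100.100.100.200/latest/meta-data/ram/security-credentials/"


def http_get(url: str, timeout: float = 2.0) -> Optional[str]:
    """Return the response body, or None on HTTP errors (e.g. 404) or when unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        return None


def bound_role_name(fetch: Callable[[str], Optional[str]] = http_get) -> Optional[str]:
    """Return the bound role name when the endpoint answers with exactly one line."""
    body = fetch(ROLE_URL)
    if not body:
        return None
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    return lines[0] if len(lines) == 1 else None
```

A `None` result corresponds to the "404 / no role bound" branch in §2.3; any other value is the role name to pass to `--ram-role-name`.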
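**E-R2 (role exists, not attached to the current instance)**
1. ECS console → instance details → bind the RAM role (or switch to the correct role).
2. If needed, rerun `aliyun configure --mode EcsRamRole --ram-role-name <role-name>` inside the instance.
3. `./scripts/osops.sh precheck`.
**E-R3 (bound to ECS, role lacks AliyunSysomFullAccess)**
1. RAM → **Roles** → select the role → **Add Permissions** → `AliyunSysomFullAccess`.
2. Wait for the policy to take effect, then run `./scripts/osops.sh precheck`.
**E-R4 (similar to A-K3: activate SysOM)**
1. Complete SysOM activation in the Alinux console (the SLR is usually created automatically during activation).
2. `./scripts/osops.sh precheck`.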
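## 5. How to read precheck output (for the Agent)
- `ok: true`: deep diagnosis can proceed (`memory … --deep-diagnosis`); the InvokeDiagnosis side must still be satisfied (**Cloud Assistant, console diagnosis authorization**, etc.; see [invoke-diagnosis.md](./invoke-diagnosis.md)).
- `ok: false`:
  - `error.code == service_not_activated` → maps to **A-K3 / E-R4**: guide the user through Alinux activation, then **precheck** again.
  - No valid credentials / API failure → maps to **A-K1 / A-K2 / E-R1–E-R3**: read `agent.findings` and `data.suggestion`, and open [authentication.md](./authentication.md).
  - `data.ecs_role_name` has a value but precheck fails → focus on whether the role has **AliyunSysomFullAccess** attached and whether the CLI is in EcsRamRole mode.
## 6. When a remote diagnosis command fails
If a deep diagnosis (`memory … --deep-diagnosis`, etc.) returns a failure:
1. First have the user rerun **`./scripts/osops.sh precheck`**.
2. If the error is still an OpenAPI or permission error, consult this section and [authentication.md](./authentication.md).
3. If the message says the instance is not authorized or the diagnosis cannot be dispatched: complete diagnosis authorization for the target instance in the **SysOM/ECS console** (do not use the deprecated OpenAPI authorization interface); for parameters and prerequisites see [invoke-diagnosis.md](./invoke-diagnosis.md).
## 7. Common fields when the CLI fails
On failure the CLI tries to include:
- `error.code`: machine-readable
- `data.guidance` or `data.remediation`: ordered steps or documentation pointers (including **`credential_policy`**, **`configure_command`**, **`guided_steps`**)
- `data.precheck_command`: the precheck command suggested for a rerun
## 8. `error.code` quick reference
| `error.code` (examples) | Likely meaning | Suggested order of actions |
|---------------------|----------|--------------|
| `auth_failed` | No valid credentials, or InitialSysom did not pass | First A-K* / E-R* and [authentication.md](./authentication.md), then precheck |
| `service_not_activated` | SysOM not activated, or the account side is not ready | Activate in Alinux → wait → precheck |
| (returned by diagnosis) permission errors | One of the three prerequisites (三要素) is missing, or the instance side is not authorized | precheck → this OpenAPI page → invoke-diagnosis |
The exact wording is determined by `agent.findings` and `data.remediation` in the JSON of that run.
## 9. SysOM activation and the service-linked role (summary)
1. Log in to the [Alinux console](https://alinux.console.aliyun.com/?source=cosh) as **the root account or an authorized RAM user** and complete **SysOM product activation**; the activation flow usually creates the service-linked role **`AliyunServiceRoleForSysom`** **automatically**.
2. If a sub-account reports missing permissions such as `ram:CreateServiceLinkedRole` in the activation wizard, see [service-linked-role-subaccount.md](./service-linked-role-subaccount.md).
3. After activation completes, **wait 1 to 3 minutes**, then run `./scripts/osops.sh precheck`.
4. If it still reports "not activated": confirm that the **account owning the current AK or RAM role** is the same account that was logged in to the activation console (avoid cross-account mismatches).
## 10. Additional steps when a remote deep diagnosis command fails
1. Run `./scripts/osops.sh precheck` **again** to rule out credential and activation problems.
2. Read the full **stdout JSON** of this run: `error`, `data`, `agent.summary`.
3. Check the prerequisites against [invoke-diagnosis.md](./invoke-diagnosis.md): region, instance ID, `service_name`, `channel`, Cloud Assistant on the target machine, and **console diagnosis authorization**.
4. If the API explicitly reports an authorization error, complete the SysOM/diagnosis authorization for the target ECS instance in the console and retry.
5. Do not simply repeat the call with the same parameters while the error message is unchanged.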
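
Step 5 can be enforced mechanically on the caller's side. The following is a minimal sketch, assuming the caller keeps the diagnosis parameters as a plain dict; the class name `RepeatGuard` and the example parameter values are hypothetical and not part of the CLI.

```python
import json
from typing import Any, Dict, Optional, Set, Tuple


class RepeatGuard:
    """Remember (parameters, error code) pairs that have already failed."""

    def __init__(self) -> None:
        self._failed: Set[Tuple[str, Optional[str]]] = set()

    @staticmethod
    def _fingerprint(params: Dict[str, Any]) -> str:
        # Canonical JSON so that key order does not matter.
        return json.dumps(params, sort_keys=True, ensure_ascii=False)

    def record_failure(self, params: Dict[str, Any], error_code: Optional[str]) -> None:
        """Store a failed attempt."""
        self._failed.add((self._fingerprint(params), error_code))

    def is_repeat(self, params: Dict[str, Any], error_code: Optional[str]) -> bool:
        """True when the same parameters already failed with the same error code."""
        return (self._fingerprint(params), error_code) in self._failed
```

An agent would call `record_failure` after each failed run and consult `is_repeat` before retrying; a `True` result means something (credentials, authorization, parameters) has to change first.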
FILE:references/output-format.md
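# CLI output envelope format (sysom-diagnosis)
This file was moved out of SKILL.md. It describes the field structure and reading conventions of the JSON envelope that the `osops` CLI writes to stdout.
## Envelope version
- **`format`**: `sysom_agent`
- **`version`**: `3.4` (field contract version; incremented whenever a committed data/agent key is removed or changed)
## Top-level structure
| Field | Description |
|------|------|
| `ok` | Boolean; success or failure |
| `action` | Command action identifier (`memory_classify`, `memory_memgraph_hint`, etc.) |
| `error` | On failure, contains `code`/`message`/`request_id` (if any) |
| `agent` | Summary and guidance for the model to read (see below) |
| `data` | Business payload (see below) |
| `execution` | Execution metadata (`subsystem`, `phase`, `mode`) |
## `agent` block
| Field | Description |
|------|------|
| `summary` | One-paragraph summary; **`--verbose-envelope`** expands the full version |
| `findings` | List of findings; each entry has a `kind` and key metrics |
| `next` | Structured next step (`action_kind`/`command`/`purpose_zh`); empty after a successful deep diagnosis |
## `data` block (memory quick path)
| Field | Description |
|------|------|
| `data.routing` | Routing result: `recommended_service_name`/`confidence`/`categories`/`oom_signal`, etc. |
| `data.local` | Local snapshot: `facts`/`oom_local`/`meminfo_facts`/`rss_top_sample` |
| `data.remote` | Present only with `--deep-diagnosis`: `ok`/`task_id`/`result`/`error` |
## CLI options
- **Common options**: `--channel`, `--region`, `--instance`, `--timeout`, `--poll-interval`, `--verbose-envelope`
- **Diagnosis-specific parameters**: passed to the OpenAPI `params` via **`--params` (JSON string) or `--params-file` (JSON file)**; for the fields, see the corresponding document under [diagnoses/](./diagnoses/)
- **Local memory quick** (without `--deep-diagnosis`): the default policy is fixed; tunables go through the remote diagnosis plus `--params`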
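
To make the envelope layout concrete, here is a minimal reader sketch. It is not part of the CLI: the function name `digest` and the shape of its return value are hypothetical, and it relies only on the top-level, `agent`, and `data` fields listed in this file.

```python
import json
from typing import Any, Dict


def digest(stdout_text: str) -> Dict[str, Any]:
    """Reduce a sysom_agent envelope (as printed to stdout) to a few key facts."""
    env = json.loads(stdout_text)
    error = env.get("error") or {}
    agent = env.get("agent") or {}
    data = env.get("data") or {}
    return {
        "ok": bool(env.get("ok")),
        "action": env.get("action"),
        "error_code": error.get("code"),
        "summary": agent.get("summary"),
        # data.remote only appears when --deep-diagnosis was requested.
        "has_remote": "remote" in data,
    }
```

Treating every block as optional (`or {}`) keeps the reader tolerant of failure envelopes, which may omit `agent` or `data`.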
FILE:references/permission-guide.md
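# Documentation index (sysom-diagnosis)
To avoid repeating the same material across the **permission** and **diagnosis contract** documents, read **only one place** per topic.
| Need | Document |
|------|------|
| **precheck, AK/RAM Role, the three prerequisites (三要素), A-K scenarios, credential/activation issues** | [openapi-permission-guide.md](./openapi-permission-guide.md) |
| **RAM least privilege, Action mapping, policy template (usable directly in a custom policy)** | [ram-policies.md](./ram-policies.md) |
| **Local/remote diagnosis (required), InvokeDiagnosis request body, `region`/`instance`, metadata completion; the customer-facing CLI is `memory … --deep-diagnosis` (for direct OpenAPI calls by maintainers, see the same document)** | [invoke-diagnosis.md](./invoke-diagnosis.md) |
| **Deep diagnosis business errors** (e.g., `InvalidParameter`, `instance not found`) | [invoke-diagnosis.md](./invoke-diagnosis.md) and the notes below; the CLI output fields **`error`** / **`data.diagnosis_target`** / **`data.read_next`** / **`data.remediation`** are authoritative |
| **ECS metadata URLs, common curl commands, IMDS** | [metadata-api.md](./metadata-api.md) |
| **`params` fields for each `service_name`** | [diagnoses/README.md](./diagnoses/README.md) → the corresponding `*.md` |
| **Summary of quick-triage entry points for the memory domain** | [memory-routing.md](./memory-routing.md) |
| **Skill capability overview table** | [SKILL.md](../SKILL.md) |
The Agent should handle **credentials and activation** primarily from `data.guidance` (and `data.remediation`) in the **`precheck` JSON**; business errors returned by **deep diagnosis** (`memory … --deep-diagnosis`) should be checked against [invoke-diagnosis.md](./invoke-diagnosis.md) and **`error` / `data`**. **Do not collect the AccessKey/Secret in the conversation**; on authentication failure, guide the user to run **`configure`** in a terminal: run **`./scripts/osops.sh configure`** from **`sysom-diagnosis/`** (the skill root). If the current Bash has no PTY and cannot accept interactive input, see **`data.guidance.credential_policy`**.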
FILE:references/ram-policies.md
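# RAM Policy - SysOM diagnosis skill
This file lists the RAM permissions required by the `alibabacloud-sysom-diagnosis` Skill (for the OpenAPI calls involved in remote deep diagnosis).
## Permission list
### SysOM diagnosis invocation permissions
| API name | Permission Action | Description |
|----------|-------------|------|
| `InitialSysom` | `sysom:InitialSysom` | precheck and activation/permission verification |
| `InvokeDiagnosis` | `sysom:InvokeDiagnosis` | Start a diagnosis task |
| `GetDiagnosisResult` | `sysom:GetDiagnosisResult` | Query the result of a diagnosis task |
## Least-privilege policy template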
```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sysom:InitialSysom",
        "sysom:InvokeDiagnosis",
        "sysom:GetDiagnosisResult"
      ],
      "Resource": "*"
    }
  ]
}
```
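
A custom policy can be checked against the three required Actions before it is attached. The sketch below is an approximation under stated assumptions, not RAM's real evaluation engine: the helper name `missing_actions` is hypothetical, `*` wildcards are matched with Python's `fnmatchcase`, and `Deny` statements, `Resource`, and `Condition` are ignored.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Set

# The three Actions listed in the permission table of this file.
REQUIRED_ACTIONS = frozenset(
    {"sysom:InitialSysom", "sysom:InvokeDiagnosis", "sysom:GetDiagnosisResult"}
)


def missing_actions(policy: Dict[str, Any]) -> Set[str]:
    """Return the required Actions that no Allow statement in the policy covers."""
    patterns: List[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue  # simplification: Deny statements are not evaluated
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(actions)
    return {a for a in REQUIRED_ACTIONS if not any(fnmatchcase(a, p) for p in patterns)}
```

An empty result means the template's three Actions are all allowed; anything else names what still has to be added.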
## Recommended system policy
| Policy name | Description |
|----------|------|
| AliyunSysomFullAccess | Full access to SysOM |
FILE:references/service-linked-role-subaccount.md
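# Activating SysOM from a sub-account and the service-linked role (SLR)
When a sub-account activates SysOM in the **Alinux console** and is told that **`ram:CreateServiceLinkedRole`** (or a similar RAM permission) is missing, the issue must be handled by the **root account** or an account with RAM administration rights, or the sub-account must be granted the permission in line with the organization's rules.
**Common ways to handle it** (the console and tenant policies are authoritative):
1. Complete SysOM activation with an account that has the permission (the SLR **`AliyunServiceRoleForSysom`** is usually created automatically during activation).
2. In the **RAM console**, attach to the sub-account a custom policy that allows creating **service-linked roles** (the policy content must meet your organization's security requirements).
3. Have an administrator complete activation and SLR creation in advance, so that the sub-account only uses the SysOM capabilities.
For the complete authentication and AK/RAM Role paths, see [authentication.md](./authentication.md) and [openapi-permission-guide.md](./openapi-permission-guide.md) in the same directory.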
FILE:scripts/init.sh
#!/usr/bin/env bash
# OS-Ops CLI Initialization Script
# Automatically setup the shared CLI environment
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# sysom-diagnosis/ (skill root)
SKILL_ROOT="$(dirname "$SCRIPT_DIR")"
# Work in scripts directory where pyproject.toml is
cd "$SCRIPT_DIR"
echo "╔══════════════════════════════════════════════════════════════╗"
echo "║ OS-Ops CLI Initialization ║"
echo "╚══════════════════════════════════════════════════════════════╝"
echo ""
# Check Python version
echo "📋 Checking prerequisites..."
if ! command -v python3 &>/dev/null; then
    echo "❌ Python 3 is required but not found"
    exit 1
fi
PYTHON_VERSION=$(python3 --version | awk '{print $2}')
echo " ✓ Python $PYTHON_VERSION found"
# Check if uv is available
UV_AVAILABLE=false
if command -v uv &>/dev/null; then
    UV_VERSION=$(uv --version | awk '{print $2}')
    echo " ✓ uv $UV_VERSION found (recommended)"
    UV_AVAILABLE=true
else
    echo " ℹ uv not found (will use pip + venv)"
    echo " Install uv for better experience: curl -LsSf https://astral.sh/uv/install.sh | sh"
fi
echo ""
echo "🔧 Setting up environment..."
# Method 1: Use uv (fastest)
if [ "$UV_AVAILABLE" = true ]; then
    echo " Using uv for dependency management..."
    # Sync dependencies
    uv sync --quiet
    echo ""
    echo "✅ Setup complete!"
    echo ""
    echo "📚 Available commands:"
    echo " uv run osops --help # Show all commands"
    echo " uv run osops precheck # Verify SysOM credentials"
    echo " uv run osops memory oom --deep-diagnosis ... # Quick triage + remote diagnosis (example)"
    echo " # Direct OpenAPI calls for maintainers: see references/invoke-diagnosis.md"
    echo ""
    echo "🎯 Quick start:"
    echo " cd $SKILL_ROOT"
    echo " export PYTHONPATH=\"$SCRIPT_DIR:\$PYTHONPATH\""
    echo " uv run --directory scripts osops precheck"
    echo " # Or use wrapper:"
    echo " ./scripts/osops.sh precheck"
    echo ""
    exit 0
fi
# Method 2: Use pip + venv
echo " Using pip + venv..."
# Create venv if not exists
if [[ ! -d ".venv" ]]; then
    python3 -m venv .venv
    echo " Created virtual environment: .venv/"
fi
# Activate and install
source .venv/bin/activate
pip install --quiet --upgrade pip
pip install --quiet -e .
echo ""
echo "✅ Setup complete!"
echo ""
echo "📚 Available commands:"
echo " source .venv/bin/activate # Activate venv"
echo " osops --help # Show all commands"
echo " osops precheck # Verify SysOM credentials"
echo " osops memory classify --deep-diagnosis ... # Memory: quick triage + remote diagnosis (example)"
echo " # Direct OpenAPI calls for maintainers: see references/invoke-diagnosis.md"
echo ""
echo "🎯 Quick start:"
echo " cd $SKILL_ROOT"
echo " export PYTHONPATH=\"$SCRIPT_DIR:\$PYTHONPATH\""
echo " # Use wrapper (recommended):"
echo " ./scripts/osops.sh precheck"
echo " # Or activate venv:"
echo " source scripts/.venv/bin/activate"
echo " osops precheck"
echo ""
FILE:scripts/osops.sh
#!/usr/bin/env bash
# OS-Ops CLI Wrapper
# Universal entry point for all osops commands
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# sysom-diagnosis/ (skill root): scripts → .
SKILL_ROOT="$(dirname "$SCRIPT_DIR")"
# Try uv first (fastest)
if command -v uv &>/dev/null; then
    exec uv run --directory "$SCRIPT_DIR" python -m sysom_cli "$@"
fi
# Try venv in scripts directory
if [[ -f "$SCRIPT_DIR/.venv/bin/python" ]]; then
    # Prepend the scripts directory; keep any existing PYTHONPATH (safe under `set -u`)
    export PYTHONPATH="$SCRIPT_DIR${PYTHONPATH:+:$PYTHONPATH}"
    exec "$SCRIPT_DIR/.venv/bin/python" -m sysom_cli "$@"
fi
# Try installed command
if command -v osops &>/dev/null; then
    exec osops "$@"
fi
# Not initialized
cat >&2 <<EOF
[ERROR] OS-Ops CLI not initialized
Please run initialization first:
cd $SKILL_ROOT
./scripts/init.sh
Or install manually:
cd $SCRIPT_DIR
pip install -e .
EOF
exit 1
FILE:scripts/pyproject.toml
[build-system]
requires = ["setuptools>=45", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "osops"
version = "1.0.0"
description = "OS Operations toolkit - Shared CLI for system operations"
readme = "README.md"
requires-python = ">=3.8"
license = {text = "MIT"}
authors = [
{name = "Sysom Team", email = "[email protected]"}
]
dependencies = [
"httpx>=0.24.0",
"requests>=2.28.0",
"alibabacloud-tea-openapi>=0.3.0",
"alibabacloud-credentials>=0.3.0",
"alibabacloud-tea-util>=0.3.0",
"alibabacloud-sysom20231230>=1.14.0",
"pydantic>=2.0.0",
]
[project.optional-dependencies]
dev = [
"pytest>=7.0",
"pytest-asyncio>=0.21",
"pytest-cov>=4.0",
"black>=23.0",
"ruff>=0.1.0",
]
[project.scripts]
osops = "sysom_cli.__main__:main"
[tool.black]
line-length = 100
target-version = ["py39"]
[tool.ruff]
line-length = 100
target-version = "py39"
FILE:scripts/requirements.txt
# Core dependencies
httpx>=0.24.0
requests>=2.28.0
# Alibaba Cloud OpenAPI SDK
alibabacloud-tea-openapi>=0.3.0
alibabacloud-credentials>=0.3.0
alibabacloud-tea-util>=0.3.0
alibabacloud-sysom20231230>=1.14.0
# Data validation
pydantic>=2.0.0
FILE:scripts/sysom_cli/__init__.py
# -*- coding: utf-8 -*-
# sysom_cli: top-level entry point; diagnosis / am / precheck, etc. Subcommands are registered by their own modules.
FILE:scripts/sysom_cli/__main__.py
# -*- coding: utf-8 -*-
"""
Entry point: python -m sysom_cli [--list-capabilities] [top_cmd] [sub_cmd] ...
"""
from __future__ import annotations
import argparse
import sys
from argparse import Namespace
from typing import Any, Dict, List
# ============================================================================
# Command configuration and parser construction
# ============================================================================
# Two-level command configuration: name, help, is_subsystem
# is_subsystem=True: a subsystem that has subcommands (e.g. memory)
# is_subsystem=False: a standalone top-level command (e.g. precheck, configure)
TOP_COMMANDS: List[Dict[str, Any]] = [
    {
        "name": "memory",
        "help": "Quick memory triage: classify / memgraph / oom / javamem; add --deep-diagnosis to chain into deep diagnosis.",
        "is_subsystem": True,
    },
    {
        "name": "io",
        "help": "Disk and IO diagnostics: iofsstat, iodiagnose (aligned with service_name).",
        "is_subsystem": True,
    },
    {
        "name": "net",
        "help": "Network diagnostics: packetdrop, netjitter (aligned with service_name).",
        "is_subsystem": True,
    },
    {
        "name": "load",
        "help": "System load and scheduling diagnostics: delay, loadtask (aligned with service_name).",
        "is_subsystem": True,
    },
    {
        "name": "precheck",
        "help": "Environment precheck: verify the Alibaba Cloud authentication configuration and SysOM API permissions (auto-discovered).",
        "is_subsystem": False,
    },
    {
        "name": "configure",
        "help": "Interactively write a RAM user's AK to ~/.aliyun/config.json (takes effect across shells; recommended for Agent environments).",
        "is_subsystem": False,
    },
]
def build_parser() -> argparse.ArgumentParser:
    """Build the top-level command-line parser."""
    ap = argparse.ArgumentParser(
        prog="sysom_cli",
        description="Sysom CLI: a unified command-line tool with automatic discovery of multi-level commands.",
    )
    ap.add_argument(
        "--list-capabilities",
        action="store_true",
        help="List all subcommands and the local/remote/hybrid modes they support",
    )
    ap.add_argument(
        "--json-errors",
        action="store_true",
        help="Still print a JSON envelope to stdout when an exception occurs",
    )
    top_sub = ap.add_subparsers(dest="top_cmd", metavar="TOPCMD")
    from sysom_cli.core.registry import CommandRegistry
    # Discover top-level commands first
    CommandRegistry.discover_commands(top_level=True)
    for top_spec in TOP_COMMANDS:
        name = top_spec["name"]
        is_subsystem = top_spec.get("is_subsystem", False)
        not_implemented = top_spec.get("not_implemented", False)
        p = top_sub.add_parser(name, help=top_spec["help"])
        if not_implemented:
            # Reserved command: do not add subcommands
            continue
        if is_subsystem:
            # Subsystem: subcommands (e.g. memory classify, io iofsstat)
            CommandRegistry.discover_commands(subsystem=name)
            all_metadata = CommandRegistry.get_all_metadata(subsystem=name)
            if all_metadata:
                sub = p.add_subparsers(dest="sub_cmd", required=True, metavar="SUBCOMMAND")
                for cmd_name, metadata in all_metadata.items():
                    subp = sub.add_parser(cmd_name, help=metadata.get("help", ""))
                    for flags, kw in metadata.get("args", []):
                        fl = flags if isinstance(flags, (list, tuple)) else [flags]
                        subp.add_argument(*fl, **kw)
        else:
            # Standalone top-level command (e.g. precheck)
            # Fetch its metadata from the registry and add its arguments
            try:
                metadata = CommandRegistry.get_metadata(name)
                for flags, kw in metadata.get("args", []):
                    fl = flags if isinstance(flags, (list, tuple)) else [flags]
                    p.add_argument(*fl, **kw)
            except KeyError:
                # The command was not registered via @command_metadata; it may be reserved
                pass
    return ap
# ============================================================================
# Capability queries
# ============================================================================
# Capability table: obtained dynamically through command auto-discovery
def get_capabilities():
    """Collect the capabilities of all commands from the registry."""
    from sysom_cli.core.registry import CommandRegistry
    CommandRegistry.discover_commands(top_level=True)
    CommandRegistry.discover_commands()
    capabilities = {}
    for cmd_name in CommandRegistry.list_commands():
        metadata = CommandRegistry.get_metadata(cmd_name)
        supported_modes = metadata.get("supported_modes", {})
        subsystem = CommandRegistry._command_subsystem.get(cmd_name, "__top_level__")
        # Build the command key: (subsystem, command_name) or (command_name,)
        if subsystem == "__top_level__":
            key = (cmd_name,)
        else:
            key = (subsystem, cmd_name)
        capabilities[key] = {
            "local": supported_modes.get("local", False),
            "remote": supported_modes.get("remote", False),
            "hybrid": supported_modes.get("hybrid", False),
        }
    return capabilities
def list_capabilities_json() -> str:
    import json
    capabilities = get_capabilities()
    out = []
    for key, modes in sorted(capabilities.items()):
        if len(key) == 1:
            # Top-level command
            cmd_str = key[0]
            sub = key[0]
        else:
            # Subsystem command
            cmd_str = f"{key[0]} {key[1]}"
            sub = key[1]
        out.append({
            "command": cmd_str,
            "sub": sub,
            "local": modes["local"],
            "remote": modes["remote"],
            "hybrid": modes["hybrid"],
        })
    return json.dumps({"capabilities": out}, ensure_ascii=False, indent=2)
def list_capabilities_table() -> str:
    capabilities = get_capabilities()
    # Header columns use the same widths as the rows below (18 / 5 / 6)
    lines = [
        f"{'command':<18} {'local':<5} {'remote':<6} hybrid",
        "------------------ ----- ------ ------",
    ]
    for key, modes in sorted(capabilities.items()):
        if len(key) == 1:
            cmd = key[0]
        else:
            cmd = f"{key[0]} {key[1]}"
        lines.append(f"{cmd:<18} {str(modes['local']):<5} {str(modes['remote']):<6} {str(modes['hybrid'])}")
    return "\n".join(lines)
# ============================================================================
# Command execution
# ============================================================================
def run_diagnosis(sub_cmd: str, ns: Namespace) -> dict:
    """Run a subsystem subcommand (a subsystem has several subcommands)."""
    from sysom_cli.core.executor import CommandExecutor
    from sysom_cli.lib.schema import error_envelope
    # Build the argument namespace for the command
    diag_ns = Namespace(
        **{k: v for k, v in vars(ns).items() if k not in ("top_cmd", "sub_cmd", "rest")}
    )
    diag_ns.cmd = sub_cmd
    try:
        # Use the unified executor
        return CommandExecutor.execute(sub_cmd, diag_ns)
    except KeyError as e:
        return error_envelope(
            sub_cmd,
            "command_not_found",
            f"Command '{sub_cmd}' not found: {e}",
        )
    except Exception as e:
        return error_envelope(
            sub_cmd,
            "execution_error",
            f"Execution failed: {e}",
        )
def run_top_level_command(cmd_name: str, ns: Namespace) -> dict:
    """Run a standalone top-level command (e.g. precheck)."""
    from sysom_cli.core.executor import CommandExecutor
    from sysom_cli.lib.schema import error_envelope
    try:
        # Use the unified executor
        return CommandExecutor.execute(cmd_name, ns)
    except KeyError as e:
        return error_envelope(
            cmd_name,
            "command_not_found",
            f"Command '{cmd_name}' not found: {e}",
        )
    except Exception as e:
        return error_envelope(
            cmd_name,
            "execution_error",
            f"Execution failed: {e}",
        )
# ============================================================================
# Main entry point
# ============================================================================
def main() -> int:
    ap = build_parser()
    ns = ap.parse_args()
    if getattr(ns, "list_capabilities", False):
        print(list_capabilities_json())
        return 0
    top_cmd = getattr(ns, "top_cmd", None)
    if not top_cmd:
        ap.print_help()
        return 0
    # Look up the command configuration
    cmd_spec = next((c for c in TOP_COMMANDS if c["name"] == top_cmd), None)
    if not cmd_spec:
        print(f'{{"ok": false, "error": "Unknown command: {top_cmd}"}}')
        return 1
    # Check whether this is a reserved, not-yet-implemented command
    if cmd_spec.get("not_implemented"):
        from sysom_cli.lib.schema import FORMAT_NAME
        print(
            '{"format":"'
            + FORMAT_NAME
            + '","ok":false,"error":{"code":"not_implemented","message":"This command is reserved and not yet implemented"}}'
        )
        return 1
    from sysom_cli.lib.schema import dumps
    if cmd_spec.get("is_subsystem"):
        # Subsystem command: requires a subcommand (e.g. memory classify, io iofsstat)
        sub_cmd = getattr(ns, "sub_cmd", None)
        if not sub_cmd:
            ap.print_help()
            return 0
        out = run_diagnosis(sub_cmd, ns)
        print(dumps(out))
        return 0 if out.get("ok") else 1
    else:
        # Standalone top-level command (e.g. precheck)
        result = run_top_level_command(top_cmd, ns)
        # Always print JSON
        print(dumps(result))
        return 0 if result.get("ok") else 1
if __name__ == "__main__":
    raise SystemExit(main())
FILE:scripts/sysom_cli/configure/__init__.py
# -*- coding: utf-8 -*-
"""Interactively write ~/.aliyun/config.json (AK mode)."""
FILE:scripts/sysom_cli/configure/command.py
# -*- coding: utf-8 -*-
"""Write a RAM user's AK to ~/.aliyun/config.json so that osops / the SDK can read it across shells."""
from __future__ import annotations
import getpass
import json
import sys
from argparse import Namespace
from pathlib import Path
from typing import Any, Dict
from sysom_cli.core.base import BaseCommand, ExecutionMode
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.guidance import base_guidance_block, precheck_commands
def _default_config_path() -> Path:
    return Path.home() / ".aliyun" / "config.json"
def _envelope_non_interactive_shell() -> Dict[str, Any]:
    """
    Without a PTY, input/getpass raise EOFError: return a structured envelope so that __main__
    does not turn it into a generic execution_error, and point explicitly to /settings (PTY)
    and /bash (consistent with credential_policy).
    """
    from sysom_cli.lib.schema import agent_block, envelope
    bg = base_guidance_block()
    pc0 = precheck_commands()[0]
    remediation = [
        "Cause: the current shell has no interactive input (no PTY), so this command cannot read the AccessKey (EOF).",
        "Option 1 (preferred): in **COSH**, run **/settings** and enable「**交互式Shell(PTY)**」(Interactive Shell, PTY), then retry from sysom-diagnosis (the skill root): ./scripts/osops.sh configure",
        "Option 2: run **/bash** to enter an interactive Bash, cd to the skill root, then run: ./scripts/osops.sh configure",
        "Option 3: edit ~/.aliyun/config.json (mode=AK) as described in references/authentication.md「方式 A: 手动编辑配置文件」(Option A: edit the config file manually); do not paste the Secret into the chat",
        f"After configuration, run: {pc0}",
    ]
    detail_lines = "\n".join(
        [
            "1) /settings → enable「交互式Shell(PTY)」(Interactive Shell, PTY) → retry ./scripts/osops.sh configure from the skill root",
            "2) /bash → cd <sysom-diagnosis skill root> → ./scripts/osops.sh configure",
            "3) or edit ~/.aliyun/config.json by hand (see authentication.md)",
        ]
    )
    return envelope(
        action="configure",
        ok=False,
        agent=agent_block(
            status="error",
            summary=(
                "No PTY, so interactive input is not possible. Enable「交互式Shell(PTY)」(Interactive Shell, PTY) in /settings or use /bash first, then run configure; "
                "or edit ~/.aliyun/config.json by hand."
            ),
            findings=[
                {
                    "severity": "info",
                    "title": "Remediation steps (including /settings)",
                    "detail": detail_lines,
                }
            ],
        ),
        data={
            "remediation": remediation,
            "read_next": [
                "references/authentication.md",
            ],
            "guidance": {
                "credential_policy": bg["credential_policy"],
                "configure_command": bg["configure_command"],
                "precheck_command": bg["precheck_command"],
            },
        },
        error={
            "code": "non_interactive_shell",
            "message": "EOF when reading a line (no PTY; interactive input is not possible)",
        },
        execution={"mode": "local", "subsystem": "configure"},
    )
@command_metadata(
    name="configure",
    help="Interactively configure a RAM user's AccessKey in ~/.aliyun/config.json (mode=AK) for use by precheck",
    args=[],
)
class ConfigureCommand(BaseCommand):
    @property
    def command_name(self) -> str:
        return "configure"
    @property
    def supported_modes(self) -> Dict[str, bool]:
        return {
            ExecutionMode.LOCAL: True,
            ExecutionMode.REMOTE: False,
            ExecutionMode.HYBRID: False,
        }
    def execute_local(self, ns: Namespace) -> Dict[str, Any]:
        from sysom_cli.lib.schema import agent_block, envelope
        path = _default_config_path()
        path.parent.mkdir(parents=True, exist_ok=True)
        # Load the existing config (if any) so that unrelated keys and profiles are preserved
        existing: Dict[str, Any] = {}
        if path.exists():
            try:
                raw = path.read_text(encoding="utf-8").strip()
                if raw:
                    existing = json.loads(raw)
            except json.JSONDecodeError:
                return envelope(
                    action="configure",
                    ok=False,
                    agent=agent_block(
                        status="error",
                        summary="The existing ~/.aliyun/config.json is not valid JSON; back it up, then delete it or fix it by hand.",
                        findings=[],
                    ),
                    data={"config_path": str(path)},
                    error={"code": "invalid_config", "message": "Failed to parse config.json"},
                    execution={"mode": "local"},
                )
        print("Interactively writing ~/.aliyun/config.json (RAM user AK mode). The Secret is not echoed.", file=sys.stderr)
        try:
            ak_id = input("AccessKey ID: ").strip()
            ak_secret = getpass.getpass("AccessKey Secret: ").strip()
            region = input("Region ID [cn-hangzhou]: ").strip() or "cn-hangzhou"
        except EOFError:
            return _envelope_non_interactive_shell()
        if not ak_id or not ak_secret:
            return envelope(
                action="configure",
                ok=False,
                agent=agent_block(
                    status="error",
                    summary="AccessKey ID or Secret is empty; nothing was written.",
                    findings=[],
                ),
                data={},
                error={"code": "empty_credentials", "message": "Incomplete AK entered"},
                execution={"mode": "local"},
            )
        profiles = existing.get("profiles")
        if not isinstance(profiles, list):
            profiles = []
        profile_name = "default"
        replaced = False
        # Update the "default" profile in place if it already exists
        for p in profiles:
            if isinstance(p, dict) and p.get("name") == profile_name:
                p["mode"] = "AK"
                p["access_key_id"] = ak_id
                p["access_key_secret"] = ak_secret
                p["region_id"] = region
                p["output_format"] = p.get("output_format") or "json"
                p["language"] = p.get("language") or "zh"
                replaced = True
                break
        if not replaced:
            profiles.append(
                {
                    "name": profile_name,
                    "mode": "AK",
                    "access_key_id": ak_id,
                    "access_key_secret": ak_secret,
                    "region_id": region,
                    "output_format": "json",
                    "language": "zh",
                }
            )
        out_doc: Dict[str, Any] = {
            **{k: v for k, v in existing.items() if k not in ("current", "current_profile", "profiles")},
            "current": profile_name,
            "profiles": profiles,
        }
        path.write_text(json.dumps(out_doc, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
        # Restrict the file to the owner because it now holds a secret
        path.chmod(0o600)
        return envelope(
            action="configure",
            ok=True,
            agent=agent_block(
                status="success",
                summary=f"Wrote {path} (profile={profile_name}, mode=AK). Run ./scripts/osops.sh precheck to verify.",
                findings=[
                    {
                        "severity": "info",
                        "title": "Next step",
                        "detail": "From sysom-diagnosis (the skill root), run: ./scripts/osops.sh precheck",
                    }
                ],
            ),
            data={
                "config_path": str(path),
                "profile": profile_name,
                "mode": "AK",
                "note": "In an Agent's multi-step Bash, export does not carry over to the next command; writing config.json takes effect across sessions.",
            },
            execution={"mode": "local"},
        )
FILE:scripts/sysom_cli/core/__init__.py
# -*- coding: utf-8 -*-
"""
Core framework module
Provides the infrastructure shared by all subsystems (diagnosis, io, etc.):
- BaseCommand / RemoteOnlyCommand: abstract base classes
- CommandRegistry: automatic command discovery and registration
- CommandExecutor: unified executor
"""
from sysom_cli.core.base import BaseCommand, ExecutionMode, RemoteOnlyCommand
from sysom_cli.core.registry import CommandRegistry, command_metadata
from sysom_cli.core.executor import CommandExecutor
__all__ = [
    "BaseCommand",
    "RemoteOnlyCommand",
    "ExecutionMode",
    "CommandRegistry",
    "command_metadata",
    "CommandExecutor",
]
FILE:scripts/sysom_cli/core/base.py
# -*- coding: utf-8 -*-
"""
Abstract base class definitions
Every subcommand must inherit from BaseCommand and implement the matching execution methods.
"""
from __future__ import annotations
from abc import ABC, abstractmethod
from argparse import Namespace
from typing import Any, Dict
class ExecutionMode:
    """Execution mode constants"""
    LOCAL = "local"
    REMOTE = "remote"
    HYBRID = "hybrid"
class BaseCommand(ABC):
    """
    Abstract base class for commands
    Each subcommand inherits from this class and implements the execution method for each mode it supports.
    Modes that are not implemented raise NotImplementedError.
    """
    @property
    @abstractmethod
    def command_name(self) -> str:
        """Command name, e.g. 'oom', 'classify'"""
        pass
    @property
    def supported_modes(self) -> Dict[str, bool]:
        """
        Declare the supported modes
        Returns:
            {"local": True, "remote": False, "hybrid": False}
        """
        return {
            ExecutionMode.LOCAL: False,
            ExecutionMode.REMOTE: False,
            ExecutionMode.HYBRID: False,
        }
    def execute(self, ns: Namespace, mode: str) -> Dict[str, Any]:
        """
        Unified execution entry point; routes to the concrete implementation based on mode
        Args:
            ns: command-line arguments
            mode: execution mode (local/remote/hybrid)
        Returns:
            A standard sysom_agent JSON envelope (see ``lib.schema``)
        """
        if mode not in self.supported_modes or not self.supported_modes[mode]:
            return self._unsupported_mode_error(mode)
        if mode == ExecutionMode.LOCAL:
            return self.execute_local(ns)
        elif mode == ExecutionMode.REMOTE:
            # Remote calls must pass the precheck gate first
            from sysom_cli.lib.precheck_gate import remote_precheck_gate
            ok_gate, fail_env = remote_precheck_gate()
            if not ok_gate:
                return fail_env  # type: ignore[return-value]
            return self.execute_remote(ns)
        elif mode == ExecutionMode.HYBRID:
            from sysom_cli.lib.precheck_gate import remote_precheck_gate
            ok_gate, fail_env = remote_precheck_gate()
            if not ok_gate:
                return fail_env  # type: ignore[return-value]
            return self.execute_hybrid(ns)
        else:
            return self._unsupported_mode_error(mode)
    def execute_local(self, ns: Namespace) -> Dict[str, Any]:
        """
        Local mode implementation
        Flow: build the command → run it locally → analyze locally
        """
        raise NotImplementedError(
            f"{self.command_name} does not implement local mode"
        )
    def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
        """
        Remote mode implementation
        Flow: call the remote API → return the result directly
        """
        raise NotImplementedError(
            f"{self.command_name} does not implement remote mode"
        )
    def execute_hybrid(self, ns: Namespace) -> Dict[str, Any]:
        """
        Hybrid mode implementation
        Flow: call the OpenAPI (async + polling) → process locally → convert the format
        """
        raise NotImplementedError(
            f"{self.command_name} does not implement hybrid mode"
        )
    def _unsupported_mode_action(self) -> str:
        """Envelope ``action`` field used for unsupported-mode errors."""
        return f"cmd_{self.command_name}"
    def _unsupported_mode_error(self, mode: str) -> Dict[str, Any]:
        """Return the standard error for an unsupported mode."""
        # Import lazily to avoid a circular dependency
        try:
            from sysom_cli.lib.schema import envelope, agent_block
            return envelope(
                action=self._unsupported_mode_action(),
                ok=False,
                agent=agent_block(
                    "unknown",
                    f"{self.command_name} does not support {mode} mode"
                ),
                error={
                    "code": "unsupported_mode",
                    "message": f"Mode '{mode}' not supported",
                    "supported_modes": [k for k, v in self.supported_modes.items() if v]
                },
                data={}
            )
        except ImportError:
            # Fallback: if schema is unavailable, return a basic error format
            return {
                "format": "sysom_agent",
                "ok": False,
                "error": {
                    "code": "unsupported_mode",
                    "message": f"Mode '{mode}' not supported for {self.command_name}",
                    "supported_modes": [k for k, v in self.supported_modes.items() if v]
                },
                "data": {}
            }
class RemoteOnlyCommand(BaseCommand):
    """Only calls the remote OpenAPI; ignores ``MEMORY_MODE`` and always runs ``execute_remote``."""
    def execute(self, ns: Namespace, mode: str) -> Dict[str, Any]:
        from sysom_cli.lib.precheck_gate import remote_precheck_gate
        ok_gate, fail_env = remote_precheck_gate()
        if not ok_gate:
            return fail_env  # type: ignore[return-value]
        return self.execute_remote(ns)
FILE:scripts/sysom_cli/core/executor.py
# -*- coding: utf-8 -*-
"""
Unified command executor
Responsibilities:
1. Read the execution mode (local/remote/hybrid) from an environment variable
2. Fetch the command instance from the registry
3. Call the command's execute method
"""
from __future__ import annotations
import os
from argparse import Namespace
from typing import Any, Dict
from sysom_cli.core.registry import CommandRegistry
from sysom_cli.lib.schema import FORMAT_NAME
class CommandExecutor:
    """Unified command executor"""
    @staticmethod
    def get_execution_mode() -> str:
        """
        Read the execution mode from the environment
        Environment variable: MEMORY_MODE (local/remote/hybrid)
        Default: local (also used for unrecognized values)
        """
        mode = os.environ.get("MEMORY_MODE", "local").lower()
        if mode not in ["local", "remote", "hybrid"]:
            mode = "local"
        return mode
    @staticmethod
    def execute(command_name: str, ns: Namespace) -> Dict[str, Any]:
        """
        Execute a command
        Args:
            command_name: command name
            ns: argument namespace
        Returns:
            A standard JSON envelope
        """
        try:
            command = CommandRegistry.get(command_name)
        except KeyError as e:
            return {
                "format": FORMAT_NAME,
                "ok": False,
                "error": {
                    "code": "command_not_found",
                    "message": str(e)
                },
                "data": {}
            }
        mode = CommandExecutor.get_execution_mode()
        return command.execute(ns, mode)
FILE:scripts/sysom_cli/core/registry.py
# -*- coding: utf-8 -*-
"""
命令注册中心
支持自动发现和注册命令:
- 扫描子系统下的所有子命令目录
- 自动导入 command.py 模块
- 通过 @command_metadata 装饰器注册
"""
from __future__ import annotations
import importlib
import warnings
from pathlib import Path
from typing import Any, Dict, List
from sysom_cli.core.base import BaseCommand
class CommandRegistry:
"""
命令注册中心,支持自动发现和注册
自动扫描子系统目录下的所有子命令,
发现标记为 @command_metadata 的命令类并注册。
"""
_commands: Dict[str, BaseCommand] = {}
_metadata: Dict[str, Dict[str, Any]] = {}
_command_subsystem: Dict[str, str] = {} # 记录命令所属的子系统
_discovered: bool = False
@classmethod
def register(cls, command: BaseCommand, metadata: Dict[str, Any] = None, subsystem: str = None):
"""
注册命令
Args:
command: 命令实例
metadata: 命令元数据(参数定义、帮助信息等)
subsystem: 所属子系统(如 'diagnosis'),顶层命令为 None
"""
cls._commands[command.command_name] = command
if metadata:
cls._metadata[command.command_name] = metadata
cls._command_subsystem[command.command_name] = subsystem or "__top_level__"
@classmethod
def get(cls, command_name: str) -> BaseCommand:
"""Return the command instance; raise KeyError if it is not registered."""
cls._ensure_discovered()
if command_name not in cls._commands:
raise KeyError(f"Command '{command_name}' not registered")
return cls._commands[command_name]
@classmethod
def get_metadata(cls, command_name: str) -> Dict[str, Any]:
"""Return the command metadata, or an empty dict if none was registered."""
cls._ensure_discovered()
return cls._metadata.get(command_name, {})
@classmethod
def get_subsystem(cls, command_name: str) -> str | None:
"""Return the subsystem name (e.g. io / net / load / diagnosis); None if unknown or top-level."""
cls._ensure_discovered()
v = cls._command_subsystem.get(command_name)
if v in (None, "__top_level__"):
return None
return v
@classmethod
def list_commands(cls) -> List[str]:
"""List the names of all registered commands."""
cls._ensure_discovered()
return list(cls._commands.keys())
@classmethod
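def get_all_metadata(cls, subsystem: str | None = None) -> Dict[str, Dict[str, Any]]:
"""
Return the metadata of all commands.
Args:
subsystem: Subsystem name; when given, only that subsystem's commands are returned.
"""
cls._ensure_discovered()
if subsystem is None:
return cls._metadata.copy()
# Return only the commands of the requested subsystem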
result = {}
for cmd_name, metadata in cls._metadata.items():
if cls._command_subsystem.get(cmd_name) == subsystem:
result[cmd_name] = metadata
return result
@classmethod
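def discover_commands(cls, subsystem: str | None = None, top_level: bool = False):
"""
Discover and register commands automatically.
Args:
subsystem: Subsystem name (e.g. 'diagnosis', 'io').
When None and top_level=False, all subsystems are scanned.
top_level: Whether to scan top-level commands (e.g. precheck, version).
Scans every subcommand directory under the given subsystem,
or the top-level command directories,
importing each command.py so that the @command_metadata decorator runs.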
"""
if cls._discovered and subsystem is None and not top_level:
return
import sysom_cli
cli_path = Path(sysom_cli.__file__).parent
if top_level:
# Scan top-level commands (directories directly under sysom_cli/ that contain command.py)
for cmd_dir in cli_path.iterdir():
if not cmd_dir.is_dir():
continue
if cmd_dir.name.startswith(("_", ".")):
continue
# Skip special directories (core, lib, auth, commands) and subsystem directories (io, net, load, network, diagnosis)
if cmd_dir.name in (
"core",
"lib",
"auth",
"commands",
"io",
"net",
"load",
"network",
"diagnosis",
):
continue
# Require command.py (the marker of a top-level command)
command_file = cmd_dir / "command.py"
if not command_file.exists():
continue
try:
# Import the top-level command's command.py dynamically
module_path = f"sysom_cli.{cmd_dir.name}.command"
importlib.import_module(module_path)
except Exception as e:
warnings.warn(
f"Failed to load top-level command '{cmd_dir.name}': {e}"
)
return
# Decide which subsystems to scan
if subsystem:
subsystems = [subsystem]
else:
# Scan all subsystem directories (special directories excluded)
subsystems = [
d.name for d in cli_path.iterdir()
if d.is_dir()
and not d.name.startswith(("_", "."))
and d.name not in ("core", "lib", "auth", "commands", "diagnosis")
and (d / "__init__.py").exists()
# Exclude top-level command directories (no __init__.py, or a command.py of their own)
and not (d / "command.py").exists()
]
for subsys in subsystems:
subsys_path = cli_path / subsys
if not subsys_path.exists():
continue
# Scan every subcommand directory under the subsystem
for cmd_dir in subsys_path.iterdir():
if not cmd_dir.is_dir():
continue
if cmd_dir.name.startswith(("_", ".")):
continue
# Require command.py
command_file = cmd_dir / "command.py"
if not command_file.exists():
continue
try:
# Import command.py dynamically; the decorator registers the command on import
module_path = f"sysom_cli.{subsys}.{cmd_dir.name}.command"
importlib.import_module(module_path)
# Record which subsystem the command belongs to
cmd_name = cmd_dir.name
if cmd_name in cls._commands:
cls._command_subsystem[cmd_name] = subsys
except Exception as e:
warnings.warn(
f"Failed to load command '{subsys}.{cmd_dir.name}': {e}"
)
if subsystem is None and not top_level:
cls._discovered = True
@classmethod
def _ensure_discovered(cls):
"""Make sure command discovery has run."""
if not cls._discovered:
cls.discover_commands()
# Command metadata decorator
def command_metadata(**kwargs):
"""
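Command metadata decorator.
Usage example:
@command_metadata(
name="oom",
help="OOM analysis command",
args=[
(["--log-file"], {"help": "Path to the log file", "default": None}),
(["--list-only"], {"action": "store_true", "help": "Only list OOM blocks"}),
],
subsystem="diagnosis" # Optional: the owning subsystem
)
class OomCommand(BaseCommand):
...
Args:
name: Command name (optional; defaults to the class's command_name attribute)
help: Command help text
args: List of argument definitions, in argparse format
subsystem: Owning subsystem (optional); not needed for top-level commands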
"""
def decorator(cls):
# Instantiate the command
instance = cls()
# Build the metadata
metadata = {
"name": kwargs.get("name", instance.command_name),
"help": kwargs.get("help", ""),
"args": kwargs.get("args", []),
"supported_modes": instance.supported_modes,
}
# Register with the registry
subsystem = kwargs.get("subsystem")
CommandRegistry.register(instance, metadata, subsystem=subsystem)
return cls
return decorator
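The register-on-import pattern used by `command_metadata` can be illustrated without the directory scan. This is a simplified sketch, not the project's registry: `MiniRegistry`, `mini_command_metadata`, and the toy `OomCommand` are hypothetical names, and the sketch omits `supported_modes` and subsystem tracking.

```python
from typing import Any, Dict


class MiniRegistry:
    """Hypothetical stand-in for CommandRegistry, without directory scanning."""

    commands: Dict[str, Any] = {}
    metadata: Dict[str, Dict[str, Any]] = {}


def mini_command_metadata(**kwargs: Any):
    """Register an instance of the decorated class as soon as the class is defined."""

    def decorator(cls):
        instance = cls()
        # As in the source, the registry key comes from the instance's command_name.
        key = instance.command_name
        MiniRegistry.commands[key] = instance
        MiniRegistry.metadata[key] = {
            "name": kwargs.get("name", key),
            "help": kwargs.get("help", ""),
            "args": kwargs.get("args", []),
        }
        return cls

    return decorator


@mini_command_metadata(name="oom", help="OOM analysis")
class OomCommand:
    command_name = "oom"

    def execute(self, ns: Any, mode: str) -> Dict[str, Any]:
        return {"ok": True, "mode": mode}


print(sorted(MiniRegistry.commands))
print(MiniRegistry.commands["oom"].execute(None, "local"))
```

Because registration happens as a side effect of defining the class, importing a module is enough to make its command available, which is why discovery only needs to call `importlib.import_module`.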
FILE:scripts/sysom_cli/diagnosis/__init__.py
# sysom_cli.diagnosis: InvokeDiagnosis subcommands
FILE:scripts/sysom_cli/diagnosis/invoke/__init__.py
# invoke subcommand
FILE:scripts/sysom_cli/diagnosis/invoke/command.py
# -*- coding: utf-8 -*-
"""Start a diagnosis: InvokeDiagnosis plus polling of GetDiagnosisResult (remote only).
Not registered as a CLI subcommand; called internally by ``lib.diagnosis_backend`` and the specialty subcommands.
"""
from __future__ import annotations
import asyncio
import json
from argparse import Namespace
from pathlib import Path
from typing import Any, Dict, List, Tuple
from sysom_cli.core.base import ExecutionMode, RemoteOnlyCommand
from sysom_cli.core.registry import CommandRegistry
from sysom_cli.lib.diagnosis_helper import (
DiagnoseResultCode,
DiagnosisMCPHelper,
DiagnosisRequest,
DiagnosisResponse,
)
from sysom_cli.lib.diagnosis_source import (
DIAGNOSIS_SOURCE_KEY,
LEGACY_DIAGNOSIS_SOURCE_KEYS,
resolve_diagnosis_source,
)
from sysom_cli.lib.ecs_metadata import get_ecs_metadata
from sysom_cli.lib.guidance import diagnosis_subsystem_minimal_guidance
from sysom_cli.lib.invoke_envelope_finalize import finalize_diagnosis_invoke_envelope
from sysom_cli.lib.schema import agent_block, envelope
from sysom_cli.memory.lib.oom_quick import normalize_oomcheck_time_param
def _diagnosis_hint_for_invoke_failure(resp: DiagnosisResponse) -> str:
"""Return a supplementary hint based on the raw code/message from InvokeDiagnosis (avoids wrongly steering users toward 'console authorization only')."""
raw_msg = resp.message or ""
text = f"{raw_msg} {resp.api_business_code}".lower()
msg_lower = raw_msg.lower()
if "notowninstance" in text or "not belong" in text or "不属于" in raw_msg:
return (
"当前凭证所属账号与 params 中的 instance/region 不匹配(例如本机元数据补全的实例不属于当前 AK/SK 账号)。"
"请改用该实例所属账号的凭证,或请用户明确选择「远程实例」并提供正确的 region/instance;"
"勿用本机元数据去诊断其它账号下的机器。"
)
# InvalidParameter often bundles several validation failures; this case must match before the generic InvalidParameter branch
if "instance not found" in msg_lower:
return (
"服务端提示在指定地域下未查询到该 ECS 实例(文案前缀如「不支持的系统版本」多为合并错误类说明,未必表示 OS 版本问题)。"
"常见真实原因:本机元数据中的 instance 不属于当前 AK/SK 所属账号、region/instance 与控制台不一致、或实例已释放。"
"请改用目标实例所属账号的凭证,或由用户提供远程实例的 region 与 instance-id。"
)
if "sysom.invalidparameter" in text or "invalidparameter" in text:
return (
"参数或诊断项与地域/实例要求不一致。请核对 region、instance、service_name 与各 diagnoses 专文;"
"并确认实例属于当前凭证账号。"
)
if "notexists" in text or "not found" in text or "未查询到实例" in raw_msg:
return "请确认 instance-id、region 是否正确,且实例存在于当前凭证所属账号。"
return (
"若未对实例做诊断授权,请在阿里云 SysOM 或 ECS 控制台为目标实例完成诊断授权"
"(勿调用已废弃的 OpenAPI 授权接口)。"
)
def _include_console_authorization_extra_hint(resp: DiagnosisResponse) -> bool:
"""Whether to append the 'console diagnosis authorization' remediation sentence (it should not overshadow an instance-ownership error)."""
raw_msg = resp.message or ""
text = f"{raw_msg} {resp.api_business_code}".lower()
msg_lower = raw_msg.lower()
if "notowninstance" in text or "not belong" in text:
return False
if "notexists" in text or "not found" in text:
return False
if "instance not found" in msg_lower:
return False
if "sysom.invalidparameter" in text or "invalidparameter" in text:
return False
return True
def _fill_params_from_ecs_metadata(params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
"""If region / instance were not given explicitly, try to fill them from the ECS metadata service (only works when running inside the ECS network)."""
filled: list[str] = []
reg = (params.get("region") or "").strip()
if not reg:
r = get_ecs_metadata("region-id", timeout=3.0)
if r.get("ok") and (r.get("text") or "").strip():
params["region"] = str(r["text"]).strip()
filled.append("region")
inst = (params.get("instance") or "").strip()
if not inst:
i = get_ecs_metadata("instance-id", timeout=3.0)
if i.get("ok") and (i.get("text") or "").strip():
params["instance"] = str(i["text"]).strip()
filled.append("instance")
return params, filled
def _load_params(ns: Namespace) -> Dict[str, Any]:
raw = getattr(ns, "params", None)
path = getattr(ns, "params_file", None)
if raw and str(raw).strip():
return json.loads(str(raw))
if path:
p = Path(str(path))
return json.loads(p.read_text(encoding="utf-8"))
return {}
class DiagnosisInvokeCommand(RemoteOnlyCommand):
"""Command that starts a remote diagnosis."""
@property
def command_name(self) -> str:
return "invoke"
@property
def supported_modes(self) -> Dict[str, bool]:
return {
ExecutionMode.LOCAL: False,
ExecutionMode.REMOTE: True,
ExecutionMode.HYBRID: False,
}
def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
def _out(**kwargs: Any) -> Dict[str, Any]:
svc = str(getattr(ns, "service_name", "") or "").strip()
sub = CommandRegistry.get_subsystem(svc) if svc else None
return finalize_diagnosis_invoke_envelope(
envelope(**kwargs), ns, cli_subsystem=sub
)
try:
params = _load_params(ns)
except (json.JSONDecodeError, OSError, TypeError) as e:
return _out(
action="diagnosis_invoke",
ok=False,
agent=agent_block("error", f"解析 params 失败: {e}"),
error={"code": "invalid_params", "message": str(e)},
data=diagnosis_subsystem_minimal_guidance(),
execution={"subsystem": "invoke", "mode": "remote"},
)
inst = getattr(ns, "instance", None)
reg = getattr(ns, "region", None)
if inst:
params["instance"] = str(inst).strip()
if reg:
params["region"] = str(reg).strip()
params, metadata_filled = _fill_params_from_ecs_metadata(params)
# Built-in source field: never taken from user params; see diagnosis_source.resolve_diagnosis_source
for _k in LEGACY_DIAGNOSIS_SOURCE_KEYS:
params.pop(_k, None)
params.pop(DIAGNOSIS_SOURCE_KEY, None)
src, src_origin = resolve_diagnosis_source()
if src:
params[DIAGNOSIS_SOURCE_KEY] = src
region = (params.get("region") or "").strip()
if not region:
return _out(
action="diagnosis_invoke",
ok=False,
agent=agent_block(
"error",
"params 中缺少 region。若诊断本机 ECS:在实例内执行深度诊断命令且省略 --region/--instance,由 CLI 从元数据补全。"
"若诊断远程实例:须由用户提供目标 --region,禁止 Agent 自行 curl 元数据。",
),
error={"code": "missing_region", "message": "region 必填"},
data=diagnosis_subsystem_minimal_guidance(),
execution={"subsystem": "invoke", "mode": "remote"},
)
if not (params.get("instance") or "").strip():
return _out(
action="diagnosis_invoke",
ok=False,
agent=agent_block(
"error",
"params 中缺少 instance。若诊断本机 ECS:在实例内执行深度诊断命令且省略 --region/--instance,由 CLI 从元数据补全。"
"若诊断远程实例:须由用户提供目标 --instance,禁止 Agent 自行 curl 元数据。",
),
error={"code": "missing_instance", "message": "instance 必填"},
data=diagnosis_subsystem_minimal_guidance(),
execution={"subsystem": "invoke", "mode": "remote"},
)
# Guard against an explicit None, which str() would turn into the literal "None"
service_name = str(getattr(ns, "service_name", "") or "").strip()
channel = str(getattr(ns, "channel", "ecs") or "ecs").strip()
timeout = int(getattr(ns, "timeout", 300) or 300)
poll_interval = int(getattr(ns, "poll_interval", 1) or 1)
if service_name == "oomcheck":
tv = params.get("time")
if tv is not None and str(tv).strip() != "":
params["time"] = normalize_oomcheck_time_param(str(tv))
req = DiagnosisRequest(
service_name=service_name,
channel=channel,
region=region,
params=params,
)
helper = DiagnosisMCPHelper(timeout=timeout, poll_interval=poll_interval)
try:
resp: DiagnosisResponse = asyncio.run(helper.execute(req))
except Exception as e: # noqa: BLE001
return _out(
action="diagnosis_invoke",
ok=False,
agent=agent_block("error", str(e)[:500]),
error={"code": "invoke_exception", "message": str(e)},
data=diagnosis_subsystem_minimal_guidance(),
execution={"subsystem": "invoke", "mode": "remote"},
)
if resp.code != DiagnoseResultCode.SUCCESS:
msg = (resp.message or "").strip() or resp.code
hint = _diagnosis_hint_for_invoke_failure(resp)
api_code = (resp.api_business_code or "").strip()
err_code = api_code or resp.code
dg = diagnosis_subsystem_minimal_guidance(
include_console_authorization_hint=_include_console_authorization_extra_hint(
resp
),
)
data_out: Dict[str, Any] = {
"service_name": service_name,
"channel": channel,
"region": region,
"ecs_metadata_filled": metadata_filled,
"diagnosis_source_origin": src_origin,
**dg,
}
if src:
data_out["diagnosis_source"] = src
if (resp.task_id or "").strip():
data_out["task_id"] = resp.task_id.strip()
err: Dict[str, Any] = {"code": err_code, "message": msg}
if (resp.api_request_id or "").strip():
err["request_id"] = resp.api_request_id.strip()
return _out(
action="diagnosis_invoke",
ok=False,
agent=agent_block("error", f"{msg}。{hint}"),
error=err,
data=data_out,
execution={"subsystem": "invoke", "mode": "remote"},
)
return _out(
action="diagnosis_invoke",
ok=True,
agent=agent_block(
"normal",
f"诊断完成 task_id={resp.task_id}",
),
data={
"task_id": resp.task_id,
"service_name": service_name,
"channel": channel,
"region": region,
"result": resp.result,
"ecs_metadata_filled": metadata_filled,
"diagnosis_source_origin": src_origin,
**({"diagnosis_source": src} if src else {}),
},
execution={"subsystem": "invoke", "mode": "remote", "phase": "invoke_diagnosis"},
)
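The precedence that `execute_remote` applies to `region` and `instance` (explicit CLI flags first, then values already in `params`, then ECS metadata as a last resort) can be sketched as follows. This is an illustration under assumptions: `merge_params` and the injected `lookup` callable are hypothetical, and `lookup` stands in for the real `get_ecs_metadata` call so the example runs without network access.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def merge_params(
    params: Dict[str, Any],
    cli_instance: Optional[str],
    cli_region: Optional[str],
    lookup: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, Any], List[str]]:
    """Merge CLI flags into params, then fill any gaps from a metadata lookup."""
    merged = dict(params)
    # 1. Explicit CLI flags override whatever the params JSON contained.
    if cli_instance:
        merged["instance"] = cli_instance.strip()
    if cli_region:
        merged["region"] = cli_region.strip()
    # 2. Only fields that are still empty are filled from metadata.
    filled: List[str] = []
    for key, meta_key in (("region", "region-id"), ("instance", "instance-id")):
        if not (merged.get(key) or "").strip():
            value = (lookup(meta_key) or "").strip()
            if value:
                merged[key] = value
                filled.append(key)
    return merged, filled


fake_metadata = {"region-id": "cn-beijing", "instance-id": "i-meta"}
print(merge_params({"region": "cn-hangzhou"}, "i-abc", None, fake_metadata.get))
print(merge_params({}, None, None, fake_metadata.get))
```

Returning the list of metadata-filled fields mirrors `ecs_metadata_filled` in the envelope, which lets a caller tell user-supplied targets apart from ones inferred from the local machine.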
FILE:scripts/sysom_cli/io/__init__.py
# -*- coding: utf-8 -*-
"""Disk and IO SysOM specialty commands (thin wrappers around service_name)."""
FILE:scripts/sysom_cli/io/iodiagnose/__init__.py
# -*- coding: utf-8 -*-
FILE:scripts/sysom_cli/io/iodiagnose/command.py
# -*- coding: utf-8 -*-
from __future__ import annotations
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.specialty_args import SPECIALTY_INVOKE_ARGS
from sysom_cli.lib.specialty_command import BaseServiceSpecialtyCommand
@command_metadata(
name="iodiagnose",
help="SysOM 专项 iodiagnose:IO 深度/一键诊断。params 见 references/diagnoses/iodiagnose.md",
subsystem="io",
args=list(SPECIALTY_INVOKE_ARGS),
)
class IodiagnoseCommand(BaseServiceSpecialtyCommand):
SERVICE_NAME = "iodiagnose"
FILE:scripts/sysom_cli/io/iofsstat/__init__.py
# -*- coding: utf-8 -*-
FILE:scripts/sysom_cli/io/iofsstat/command.py
# -*- coding: utf-8 -*-
from __future__ import annotations
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.specialty_args import SPECIALTY_INVOKE_ARGS
from sysom_cli.lib.specialty_command import BaseServiceSpecialtyCommand
@command_metadata(
name="iofsstat",
help="SysOM 专项 iofsstat:磁盘 IO 流量/大盘。params 见 references/diagnoses/iofsstat.md",
subsystem="io",
args=list(SPECIALTY_INVOKE_ARGS),
)
class IofsstatCommand(BaseServiceSpecialtyCommand):
SERVICE_NAME = "iofsstat"
FILE:scripts/sysom_cli/lib/__init__.py
# -*- coding: utf-8 -*-
"""
Common utility library.
Provides the helper functions and base classes shared by all subsystems.
"""
from sysom_cli.lib.schema import (
FORMAT_NAME, FORMAT_VERSION, agent_block, envelope, dumps, error_envelope,
)
from sysom_cli.lib.kernel_log import get_kernel_log_lines
from sysom_cli.lib.log_plugin import LogScanContext, CollectContext
from sysom_cli.lib import auth # Authentication module
from sysom_cli.lib import ecs_metadata # ECS metadata service
from sysom_cli.lib import log_parser # Log parser framework
__all__ = [
"FORMAT_NAME", "FORMAT_VERSION", "agent_block", "envelope", "dumps",
"error_envelope", "get_kernel_log_lines", "LogScanContext", "CollectContext",
"auth", "ecs_metadata", "log_parser",
]
FILE:scripts/sysom_cli/lib/auth.py
"""
Authentication checker.
Checks three authentication methods (AK/SK, STS token, and ECS RAM Role) and calls the SysOM API to verify permissions.
"""
import os
import json
from pathlib import Path
from typing import Any, Dict, Optional
import requests
from sysom_cli.lib.guidance import RAM_CONSOLE_URL
def check_env_credentials() -> Dict[str, Any]:
"""
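Check the environment for AK/SK or STS token credentials.
Environment variables consulted:
- ALIBABA_CLOUD_ACCESS_KEY_ID: AccessKey ID
- ALIBABA_CLOUD_ACCESS_KEY_SECRET: AccessKey Secret
- ALIBABA_CLOUD_SECURITY_TOKEN: STS SecurityToken (optional)
The ALICLOUD_* equivalents (and SECURITY_TOKEN for the token) are accepted as fallbacks.
Return value:
On success (AK ID and secret both present and non-empty; the STS token is optional):
{
"available": True,
"method": "环境变量(AKSK)" or "环境变量(STS Token)",
"credentials": {
"ak_id": "LTAI...",
"ak_secret": "...",
"security_token": "CAES..." # STS token mode only
}
}
On failure:
{
"available": False,
"method": "环境变量(AKSK/STS Token)"
}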
"""
# Match the variable names commonly used by the Alibaba Cloud Python SDK / CLI (when separate shell invocations do not share exports, prefer writing config.json via configure)
ak_id = os.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID") or os.getenv("ALICLOUD_ACCESS_KEY_ID")
ak_secret = os.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET") or os.getenv("ALICLOUD_ACCESS_KEY_SECRET")
security_token = (
os.getenv("ALIBABA_CLOUD_SECURITY_TOKEN")
or os.getenv("ALICLOUD_SECURITY_TOKEN")
or os.getenv("SECURITY_TOKEN")
)
if ak_id and ak_secret:
creds: Dict[str, str] = {
"ak_id": ak_id,
"ak_secret": ak_secret,
}
method = "环境变量(AKSK)"
if security_token:
creds["security_token"] = security_token
method = "环境变量(STS Token)"
return {
"available": True,
"method": method,
"credentials": creds,
}
return {"available": False, "method": "环境变量(AKSK/STS Token)"}
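The variable-name fallback in `check_env_credentials` can be tested against a plain dictionary instead of the process environment. The sketch below assumes the same variable names as the function above; `first_set` and `pick_env_credentials` are hypothetical helpers written for illustration only.

```python
from typing import Dict, Mapping, Optional


def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name)
        if value:
            return value
    return None


def pick_env_credentials(env: Mapping[str, str]) -> Optional[Dict[str, str]]:
    """Return a credentials dict, or None when the AK ID or secret is missing."""
    ak_id = first_set(env, "ALIBABA_CLOUD_ACCESS_KEY_ID", "ALICLOUD_ACCESS_KEY_ID")
    ak_secret = first_set(env, "ALIBABA_CLOUD_ACCESS_KEY_SECRET", "ALICLOUD_ACCESS_KEY_SECRET")
    if not (ak_id and ak_secret):
        return None
    creds = {"ak_id": ak_id, "ak_secret": ak_secret}
    # The token is optional; its presence switches the caller to STS mode.
    token = first_set(env, "ALIBABA_CLOUD_SECURITY_TOKEN", "ALICLOUD_SECURITY_TOKEN", "SECURITY_TOKEN")
    if token:
        creds["security_token"] = token
    return creds


print(pick_env_credentials({"ALICLOUD_ACCESS_KEY_ID": "id", "ALICLOUD_ACCESS_KEY_SECRET": "secret"}))
print(pick_env_credentials({"ALIBABA_CLOUD_ACCESS_KEY_ID": "id"}))
```

An empty string counts as unset, so a variable exported with no value does not hide a usable fallback name.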
def _ecs_metadata_on_ecs_instance() -> bool:
"""If instance-id can be read, the ECS metadata service is reachable (most likely inside the ECS network)."""
url = "http://100.100.100.200/latest/meta-data/instance-id"
try:
r = requests.get(url, timeout=2)
return r.status_code == 200 and bool((r.text or "").strip())
except Exception:
return False
def check_ecs_metadata_role() -> Dict[str, Any]:
"""
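Check whether the ECS instance has a RAM role attached (via the metadata API).
Note: on Alibaba Cloud ECS, when **no instance RAM role is attached**,
``ram/security-credentials/`` usually returns **404**, which is different from the metadata service being unreachable.
In that case ``instance-id`` is probed as well, to distinguish "on ECS but no role attached" from "not on ECS / network unreachable".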
"""
base = "http://100.100.100.200/latest/meta-data"
url = f"{base}/ram/security-credentials/"
try:
response = requests.get(url, timeout=3)
if response.status_code == 200:
role_name = response.text.strip()
if role_name:
return {
"available": True,
"role_name": role_name,
"method": "ECS元数据服务",
}
return {
"available": False,
"error": "实例未绑定 RAM 角色(security-credentials 响应体为空)",
}
if response.status_code == 404:
if _ecs_metadata_on_ecs_instance():
return {
"available": False,
"error": (
"已在 ECS 环境中(可读取 instance-id),但 ram/security-credentials/ 返回 404:"
"实例可能未附加 RAM 角色。请在 ECS 控制台为实例绑定 RAM 角色,或改用 AK/SK / 配置 aliyun CLI。"
),
}
return {
"available": False,
"error": (
"HTTP 404:未获取到 RAM 临时凭证路径(可能不在阿里云 ECS 内网,或实例未绑定 RAM 角色)"
),
}
return {
"available": False,
"error": f"HTTP {response.status_code}: 无法读取 RAM 凭证路径",
}
except requests.exceptions.Timeout:
return {
"available": False,
"error": "连接超时(可能不在阿里云 ECS 内网,或元数据 100.100.100.200 不可达)",
}
except Exception as e:
return {
"available": False,
"error": f"访问元数据服务失败: {str(e)}",
}
def check_aliyun_config() -> Dict[str, Any]:
"""
Check the ~/.aliyun/config.json configuration file.
Four authentication modes are supported:
1. AK mode: mode=AK, with access_key_id and access_key_secret
2. StsToken mode: mode=StsToken, with access_key_id / access_key_secret / sts_token
3. EcsRamRole mode: mode=EcsRamRole; the SDK fetches temporary credentials from the metadata service automatically
4. Legacy format: a profile containing a ram_role_name field (not recommended)
Return value:
AK mode, success:
{
"available": True,
"method": "配置文件(AK)",
"credentials": {
"ak_id": "LTAI...",
"ak_secret": "..."
}
}
EcsRamRole mode, success:
{
"available": True,
"method": "配置文件(EcsRamRole)",
"needs_metadata": True # call check_ecs_metadata_role() to obtain the role name
}
Legacy RAM Role format:
{
"available": True,
"method": "配置文件(RAM Role)",
"ram_role": "<role name>"
}
On failure:
{
"available": False,
"method": "配置文件",
"error": "<error description>"
}
"""
config_path = Path.home() / ".aliyun" / "config.json"
if not config_path.exists():
return {"available": False, "method": "配置文件"}
try:
config = json.loads(config_path.read_text(encoding="utf-8"))
# Accept either the current_profile or the current field
profile_name = config.get("current_profile") or config.get("current") or "default"
for profile in config.get("profiles", []):
if profile.get("name") == profile_name:
mode = profile.get("mode", "AK")
mode_norm = str(mode).strip().lower()
# AK mode
if mode_norm == "ak" and "access_key_id" in profile:
return {
"available": True,
"method": "配置文件(AK)",
"credentials": {
"ak_id": profile["access_key_id"],
"ak_secret": profile.get("access_key_secret", "")
}
}
# StsToken mode
elif mode_norm == "ststoken" and "access_key_id" in profile:
sts_token = (
profile.get("sts_token")
or profile.get("security_token")
or profile.get("access_key_sts_token")
)
if not sts_token:
return {
"available": False,
"method": "配置文件(StsToken)",
"error": "StsToken 模式缺少 sts_token/security_token",
}
return {
"available": True,
"method": "配置文件(StsToken)",
"credentials": {
"ak_id": profile["access_key_id"],
"ak_secret": profile.get("access_key_secret", ""),
"security_token": sts_token,
},
}
# EcsRamRole mode
elif mode_norm == "ecsramrole":
return {
"available": True,
"method": "配置文件(EcsRamRole)",
"needs_metadata": True # The role name must be fetched from the metadata service
}
# Legacy RAM Role configuration (profile has a ram_role_name field)
elif "ram_role_name" in profile:
return {
"available": True,
"method": "配置文件(RAM Role)",
"ram_role": profile["ram_role_name"]
}
return {"available": False, "method": "配置文件", "error": "未找到有效的 profile"}
except Exception as e:
return {
"available": False,
"method": "配置文件",
"error": f"解析配置文件失败: {str(e)}"
}
def get_ecs_ram_credentials(role_name: str) -> Dict[str, Any]:
"""
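Fetch the RAM role's temporary STS credentials from the ECS metadata service.
Calls the ECS metadata API (http://100.100.100.200/latest/meta-data/ram/security-credentials/<role_name>)
to obtain temporary access credentials issued by STS (Security Token Service), consisting of:
- A temporary AccessKey ID (starts with STS.)
- A temporary AccessKey Secret
- A SecurityToken (used to validate the temporary credentials)
These temporary credentials rotate automatically and are typically valid for 6 hours.
Args:
role_name: Name of the RAM role
Returns:
dict: Result of the credential lookup
On success:
{
"available": True,
"credentials": {
"ak_id": "STS.xxx", # temporary AccessKey ID
"ak_secret": "xxx", # temporary AccessKey Secret
"security_token": "CAES+xxx" # STS SecurityToken
}
}
On failure:
{
"available": False,
"error": "<error description>"
}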
"""
url = f"http://100.100.100.200/latest/meta-data/ram/security-credentials/{role_name}"
try:
response = requests.get(url, timeout=3)
if response.status_code == 200:
creds = response.json()
return {
"available": True,
"credentials": {
"ak_id": creds["AccessKeyId"],
"ak_secret": creds["AccessKeySecret"],
"security_token": creds["SecurityToken"]
}
}
else:
return {
"available": False,
"error": f"HTTP {response.status_code}: 无法获取凭证"
}
except requests.exceptions.Timeout:
return {
"available": False,
"error": "连接超时(可能不在 ECS 环境中)"
}
except Exception as e:
return {
"available": False,
"error": f"获取凭证失败: {str(e)}"
}
def _extract_initial_sysom_role_exist(data: Any) -> Optional[bool]:
"""
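Parse role_exist from the data returned by InitialSysom.
Returns True/False for a boolean value; returns None when the field is absent (absence is not treated as failure, for compatibility with older responses).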
"""
if data is None:
return None
if isinstance(data, dict):
if "role_exist" not in data:
return None
return bool(data["role_exist"])
if hasattr(data, "role_exist"):
v = getattr(data, "role_exist", None)
return bool(v) if v is not None else None
if hasattr(data, "to_map"):
m = data.to_map()
if isinstance(m, dict):
raw = m.get("role_exist")
if raw is None:
raw = m.get("RoleExist")
if raw is None:
return None
return bool(raw)
return None
def test_sysom_api(credentials: Dict[str, str]) -> Dict[str, Any]:
"""
Call the InitialSysom API to test permissions.
Two authentication modes are supported:
1. AK/SK mode: credentials contains ak_id and ak_secret
2. RAM Role mode (temporary STS credentials): credentials contains ak_id, ak_secret, and security_token
Args:
credentials: Credential dictionary
- ak_id (str): AccessKey ID (AK/SK mode) or temporary AccessKey ID (STS mode)
- ak_secret (str): AccessKey Secret or temporary AccessKey Secret
- security_token (str, optional): STS SecurityToken (required only in RAM Role mode)
Returns:
dict: Result of the API call
- success (bool): Whether the call succeeded
- response (dict): Response data on success
- error (str): Error message on failure
"""
try:
from alibabacloud_sysom20231230.client import Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_sysom20231230 import models as sysom_models
# Build the client configuration
config = open_api_models.Config(
access_key_id=credentials["ak_id"],
access_key_secret=credentials["ak_secret"],
endpoint="sysom.cn-hangzhou.aliyuncs.com",
user_agent="AlibabaCloud-Agent-Skills/alibabacloud-sysom-diagnosis",
)
# RAM Role mode: set the STS SecurityToken
if "security_token" in credentials:
config.security_token = credentials["security_token"]
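# Create the client
client = Client(config)
# Build the request
request = sysom_models.InitialSysomRequest()
# Call the API
response = client.initial_sysom(request)
# Inspect the response
if response and response.body:
# Check whether role information was returned (i.e. whether the SysOM service is activated).
# InitialSysom can succeed yet return no role information, which means the service is not activated.
data = response.body.data if hasattr(response.body, 'data') else None
# Empty or None data means the SysOM service is not activated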
if not data:
return {
"success": False,
"error": "SysOM 服务未开通",
"error_code": "service_not_activated",
"detail": "InitialSysom API 调用成功,但返回的角色信息为空,请先开通 SysOM 服务"
}
role_exist = _extract_initial_sysom_role_exist(data)
if role_exist is False:
return {
"success": False,
"error": "SysOM 服务关联角色未创建或未就绪",
"error_code": "sysom_role_not_exist",
"detail": (
"InitialSysom 返回 data.role_exist=false:请在 Alinux 控制台完成 SysOM 开通"
"(开通流程通常会**自动**创建 AliyunServiceRoleForSysom);若刚完成开通,等待 1~3 分钟后再执行 precheck"
),
}
# Success and the service is activated (role_exist is True, or the field is absent and the non-empty data check applies)
return {
"success": True,
"response": {
"RequestId": response.body.request_id if hasattr(response.body, 'request_id') else None,
"Data": data
}
}
return {
"success": False,
"error": "InitialSysom 返回无效:缺少响应体或 body",
"error_code": "initial_sysom_invalid_response",
"detail": "InitialSysom 未返回可用 data,请确认 SysOM 已开通且账号侧正常,或稍后重试",
}
except Exception as e:
error_msg = str(e)
# Classify common errors
if "InvalidAccessKeyId" in error_msg or "Specified access key is not found" in error_msg:
return {
"success": False,
"error": "AccessKey 无效或不存在",
"error_code": "invalid_access_key",
"detail": error_msg
}
elif "Forbidden" in error_msg or "Permission" in error_msg or "NoPermission" in error_msg:
return {
"success": False,
"error": "权限不足,需要 AliyunSysomFullAccess 权限",
"error_code": "insufficient_permissions",
"detail": error_msg
}
elif "InvalidAction.NotFound" in error_msg or "404" in error_msg:
return {
"success": False,
"error": "API 不可用或 endpoint 配置错误",
"error_code": "api_not_found",
"detail": error_msg
}
else:
return {
"success": False,
"error": f"API 调用失败: {error_msg[:100]}",
"error_code": "api_call_failed",
"detail": error_msg
}
def generate_help_message(
*,
include_aksk: bool = True,
include_ram_role: bool = True,
include_permission: bool = True,
) -> Dict[str, str]:
"""Build the help text; blocks can be omitted per scenario to avoid duplicating the envelope's remediation."""
aksk = (
"方式 1: 使用 AccessKey (AKSK)\n"
"A. 推荐(跨 Shell / Agent):./scripts/osops.sh configure,写入 ~/.aliyun/config.json\n"
"B. 环境变量(须在同一 shell 进程内与 precheck 一起执行):\n"
" export ALIBABA_CLOUD_ACCESS_KEY_ID='your-ak-id'\n"
" export ALIBABA_CLOUD_ACCESS_KEY_SECRET='your-ak-secret'\n"
" export ALIBABA_CLOUD_SECURITY_TOKEN='your-sts-token' # 可选,使用 STS 临时凭证时必填\n"
" (亦支持 ALICLOUD_ACCESS_KEY_ID / ALICLOUD_ACCESS_KEY_SECRET)\n"
"C. 或手工编辑 ~/.aliyun/config.json(mode=AK 或 mode=StsToken)"
)
ram_role = (
"方式 2: 使用 ECS RAM Role(推荐)\n"
"1. 在 ECS 控制台为实例绑定 RAM 角色\n"
"2. 配置 ~/.aliyun/config.json 指定 ram_role_name"
)
ram_role_compact = (
"方式 2: ECS RAM Role — 完整步骤见 references/authentication.md(ECS RAM Role)。"
)
permission = (
"重要: 授予权限\n"
"无论使用哪种方式,都需要授予 AliyunSysomFullAccess 权限\n"
f"RAM 控制台: {RAM_CONSOLE_URL}"
)
out: Dict[str, str] = {}
if include_aksk:
out["aksk"] = aksk
if include_ram_role:
out["ram_role"] = ram_role if include_aksk else ram_role_compact
if include_permission:
out["permission"] = permission
return out
def resolve_sysom_auth(*, verify_api: bool = True) -> Dict[str, Any]:
"""
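Resolve SysOM access credentials (for OpenAPI clients and similar callers).
Same priority as ``run_precheck``: environment variables (AK/SK or STS token) → ~/.aliyun/config.json
(EcsRamRole / config-file RAM Role / AK). If environment credentials exist but fail verification, the config file is still tried.
Args:
verify_api: When True, call InitialSysom to verify each candidate credential; when False, only parse config/metadata.
Returns:
Success: ``ok``, ``method``, ``credentials``, ``message``; RAM role cases also include ``role``.
Failure: ``ok``, ``error``, ``help``; ``error_code`` is included when an InitialSysom check failed (for example, credentials work but the service is not activated).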
"""
metadata_result = check_ecs_metadata_role()
service_not_activated_error: Optional[Dict[str, Any]] = None
last_sysom_api_failure: Optional[Dict[str, Any]] = None
def _try_accept(
method: str,
credentials: Dict[str, str],
*,
role: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
nonlocal service_not_activated_error, last_sysom_api_failure
if verify_api:
api_result = test_sysom_api(credentials)
if not api_result["success"]:
if api_result.get("error_code") in (
"service_not_activated",
"sysom_role_not_exist",
):
service_not_activated_error = api_result
else:
last_sysom_api_failure = api_result
return None
out: Dict[str, Any] = {
"ok": True,
"method": method,
"credentials": credentials,
"message": (
"认证验证成功,拥有 SysOM 访问权限"
if verify_api
else "已解析凭证(未调用 SysOM API 验证权限)"
),
}
if role is not None:
out["role"] = role
return out
env_result = check_env_credentials()
if env_result["available"]:
env_method = (
"环境变量 STS Token"
if env_result.get("credentials", {}).get("security_token")
else "环境变量 AKSK"
)
got = _try_accept(env_method, env_result["credentials"])
if got:
return got
config_result = check_aliyun_config()
if config_result["available"]:
if config_result.get("needs_metadata"):
if metadata_result["available"]:
ram_role = metadata_result["role_name"]
ram_creds = get_ecs_ram_credentials(ram_role)
if ram_creds["available"]:
got = _try_accept(
f"ECS RAM Role ({ram_role})",
ram_creds["credentials"],
role=ram_role,
)
if got:
return got
elif config_result["method"].endswith("(RAM Role)"):
ram_role = config_result["ram_role"]
ram_creds = get_ecs_ram_credentials(ram_role)
if ram_creds["available"]:
got = _try_accept(
f"ECS RAM Role ({ram_role})",
ram_creds["credentials"],
role=ram_role,
)
if got:
return got
else:
got = _try_accept("配置文件 AKSK", config_result["credentials"])
if got:
return got
if service_not_activated_error:
return {
"ok": False,
"error": service_not_activated_error["error"],
"error_code": service_not_activated_error.get("error_code"),
"help": generate_help_message(),
}
if last_sysom_api_failure:
return {
"ok": False,
"error": last_sysom_api_failure["error"],
"error_code": last_sysom_api_failure.get("error_code"),
"help": generate_help_message(),
}
return {
"ok": False,
"error": "未找到有效的认证配置",
"help": generate_help_message(),
}
def run_precheck() -> Dict[str, Any]:
"""
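Run the full authentication check.
Returns:
dict: Check result
- ok (bool): Whether the check passed
- method (str): The authentication method that succeeded
- message (str): Result message
- checked (list): Methods that were checked
- error (str): Error message
- help (dict): Help text
- ecs_role_name (str): Name of the attached RAM role, if one was detected on ECS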
"""
check_results = []
ecs_role_name = None
service_not_activated_error = None # Holds a "service not activated" error
last_sysom_api_failure: Optional[Dict[str, Any]] = None # Other InitialSysom failures (including invalid responses)
# 0. First check whether we are on ECS with a RAM role attached
metadata_result = check_ecs_metadata_role()
if metadata_result["available"]:
ecs_role_name = metadata_result["role_name"]
check_results.append({
"method": "ECS元数据",
"status": f"✓ 实例已绑定 RAM 角色: {ecs_role_name}"
})
else:
check_results.append({
"method": "ECS元数据",
"status": f"✗ {metadata_result.get('error', '未检测到')}"
})
# 1. Check AK/SK (or STS token) from environment variables
env_result = check_env_credentials()
if env_result["available"]:
api_result = test_sysom_api(env_result["credentials"])
if api_result["success"]:
# On success, return only the method that worked, not the other check results
env_method = (
"环境变量 STS Token"
if env_result.get("credentials", {}).get("security_token")
else "环境变量 AKSK"
)
return {
"ok": True,
"method": env_method,
"message": "认证验证成功,拥有 SysOM 访问权限"
}
# Record the "service not activated" error (highest priority)
if api_result.get("error_code") in (
"service_not_activated",
"sysom_role_not_exist",
):
service_not_activated_error = api_result
else:
last_sysom_api_failure = api_result
check_results.append({
"method": "环境变量 AKSK",
"status": f"✗ {api_result.get('error', 'API 调用失败')}"
})
else:
check_results.append({
"method": "环境变量 AKSK",
"status": "✗ 未配置"
})
# 2. Check the Alibaba Cloud configuration file
config_result = check_aliyun_config()
if config_result["available"]:
# 2.1 EcsRamRole mode (the role is obtained from metadata)
if config_result.get("needs_metadata"):
if metadata_result["available"]:
ram_role = metadata_result["role_name"]
ram_creds = get_ecs_ram_credentials(ram_role)
if ram_creds["available"]:
api_result = test_sysom_api(ram_creds["credentials"])
if api_result["success"]:
# On success, return only the method that worked
return {
"ok": True,
"method": f"ECS RAM Role ({ram_role})",
"role": ram_role,
"message": "认证验证成功,拥有 SysOM 访问权限"
}
# Record the "service not activated" error
if api_result.get("error_code") in (
"service_not_activated",
"sysom_role_not_exist",
):
service_not_activated_error = api_result
else:
last_sysom_api_failure = api_result
check_results.append({
"method": f"配置文件 EcsRamRole ({ram_role})",
"status": f"✗ {api_result.get('error', 'API 调用失败')}"
})
else:
check_results.append({
"method": f"配置文件 EcsRamRole ({ram_role})",
"status": f"✗ {ram_creds.get('error', '获取临时凭证失败')}"
})
else:
check_results.append({
"method": "配置文件 EcsRamRole",
"status": "✗ 配置为 EcsRamRole 模式,但实例未绑定 RAM 角色"
})
# 2.2 RAM Role mode (the config file specifies ram_role_name)
elif config_result["method"].endswith("(RAM Role)"):
ram_role = config_result["ram_role"]
ram_creds = get_ecs_ram_credentials(ram_role)
if ram_creds["available"]:
api_result = test_sysom_api(ram_creds["credentials"])
if api_result["success"]:
# On success, return only the method that worked
return {
"ok": True,
"method": f"ECS RAM Role ({ram_role})",
"role": ram_role,
"message": "认证验证成功,拥有 SysOM 访问权限"
}
# Record the "service not activated" error
if api_result.get("error_code") in (
"service_not_activated",
"sysom_role_not_exist",
):
service_not_activated_error = api_result
else:
last_sysom_api_failure = api_result
check_results.append({
"method": f"配置文件 RAM Role ({ram_role})",
"status": f"✗ {api_result.get('error', 'API 调用失败')}"
})
else:
check_results.append({
"method": f"配置文件 RAM Role ({ram_role})",
"status": f"✗ {ram_creds.get('error', '获取临时凭证失败')}"
})
# 2.3 StsToken mode
elif config_result["method"] == "配置文件(StsToken)":
api_result = test_sysom_api(config_result["credentials"])
if api_result["success"]:
return {
"ok": True,
"method": "配置文件 STS Token",
"message": "认证验证成功,拥有 SysOM 访问权限"
}
if api_result.get("error_code") in (
"service_not_activated",
"sysom_role_not_exist",
):
service_not_activated_error = api_result
else:
last_sysom_api_failure = api_result
check_results.append({
"method": "配置文件 STS Token",
"status": f"✗ {api_result.get('error', 'API 调用失败')}"
})
# 2.4 AK mode
else:
api_result = test_sysom_api(config_result["credentials"])
if api_result["success"]:
# On success, return only the method that succeeded
return {
"ok": True,
"method": "配置文件 AKSK",
"message": "认证验证成功,拥有 SysOM 访问权限"
}
# Record the "service not activated" error
if api_result.get("error_code") in (
"service_not_activated",
"sysom_role_not_exist",
):
service_not_activated_error = api_result
else:
last_sysom_api_failure = api_result
check_results.append({
"method": "配置文件 AKSK",
"status": f"✗ {api_result.get('error', 'API 调用失败')}"
})
else:
check_results.append({
"method": "配置文件",
"status": f"✗ {config_result.get('error', '未配置或配置无效')}"
})
# Every method failed; build a detailed error message
# Prefer returning the "service not activated" error if one was detected
if service_not_activated_error:
ec = service_not_activated_error.get("error_code")
if ec == "sysom_role_not_exist":
sug = (
"InitialSysom 指示服务关联角色尚未就绪。请在 Alinux/SysOM 控制台完成 SysOM 开通"
"(开通流程通常会**自动**创建 AliyunServiceRoleForSysom);若已开通,等待 1~3 分钟再执行 precheck。"
)
else:
sug = (
"认证配置正确,但需要先开通 SysOM 服务。"
"请访问 https://alinux.console.aliyun.com/?source=cosh 完成服务开通。"
)
return {
"ok": False,
"checked": check_results,
"error": service_not_activated_error["error"],
"error_code": service_not_activated_error["error_code"],
"ecs_role_name": ecs_role_name,
"suggestion": sug,
"help": generate_help_message()
}
if last_sysom_api_failure:
sug = None
help_msg = generate_help_message()
if ecs_role_name:
sug = (
"InitialSysom 未通过:请以 JSON 信封中 data.remediation 与 path_summary 为准逐条修复;"
"勿在对话中传输密钥。"
)
help_msg = generate_help_message(
include_aksk=False,
include_ram_role=True,
include_permission=True,
)
return {
"ok": False,
"checked": check_results,
"error": last_sysom_api_failure["error"],
"error_code": last_sysom_api_failure.get("error_code", "api_call_failed"),
"ecs_role_name": ecs_role_name,
"detail": last_sysom_api_failure.get("detail"),
"suggestion": sug,
"help": help_msg,
}
# Other authentication failures
error_msg = "未找到有效的认证配置"
suggestion = None
help_msg = generate_help_message()
if ecs_role_name:
suggestion = (
"实例已绑定 RAM 角色:请以 JSON 信封中 data.remediation 与 path_summary 为唯一修复主线;"
"勿在对话中传输密钥。"
)
help_msg = generate_help_message(
include_aksk=False,
include_ram_role=True,
include_permission=True,
)
return {
"ok": False,
"checked": check_results,
"error": error_msg,
"ecs_role_name": ecs_role_name,
"suggestion": suggestion,
"help": help_msg,
}
FILE:scripts/sysom_cli/lib/diagnosis_backend.py
# -*- coding: utf-8 -*-
"""SysOM specialty-diagnosis backend seam: uses DiagnosisInvokeCommand by default; can be swapped for other implementations such as async tasks."""
from __future__ import annotations

from abc import ABC, abstractmethod
from argparse import Namespace
from typing import Any, Dict

from sysom_cli.diagnosis.invoke.command import DiagnosisInvokeCommand


def namespace_for_diagnosis_invoke(service_name: str, ns: Namespace) -> Namespace:
    """Build the arguments that invoke needs from any subcommand's Namespace."""
    return Namespace(
        service_name=service_name,
        channel=str(getattr(ns, "channel", None) or "ecs").strip() or "ecs",
        params=getattr(ns, "params", None),
        params_file=getattr(ns, "params_file", None),
        instance=getattr(ns, "instance", None),
        region=getattr(ns, "region", None),
        timeout=int(getattr(ns, "timeout", 300) or 300),
        poll_interval=int(getattr(ns, "poll_interval", 1) or 1),
        verbose_envelope=bool(getattr(ns, "verbose_envelope", False)),
    )


class DiagnosisBackend(ABC):
    """Launch a SysOM specialty diagnosis (current default: InvokeDiagnosis plus polling)."""

    @abstractmethod
    def invoke_specialty(self, service_name: str, ns: Namespace) -> Dict[str, Any]:
        ...


class DefaultDiagnosisBackend(DiagnosisBackend):
    def invoke_specialty(self, service_name: str, ns: Namespace) -> Dict[str, Any]:
        invoke_ns = namespace_for_diagnosis_invoke(service_name, ns)
        return DiagnosisInvokeCommand().execute_remote(invoke_ns)


_default_backend: DiagnosisBackend | None = None


def get_diagnosis_backend() -> DiagnosisBackend:
    global _default_backend
    if _default_backend is None:
        _default_backend = DefaultDiagnosisBackend()
    return _default_backend


def set_diagnosis_backend(backend: DiagnosisBackend | None) -> None:
    """Inject a backend for tests or to replace the implementation."""
    global _default_backend
    _default_backend = backend
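The backend seam above is a lazily created module-level singleton with a setter for injection. The following is a minimal, self-contained sketch of that pattern, not the module itself: `Backend`, `EchoBackend`, `FakeBackend`, `get_backend`, and `set_backend` are hypothetical stand-ins (the real default calls `DiagnosisInvokeCommand`, which is not available here), and the `cn-hangzhou` region is only an example value.

```python
from abc import ABC, abstractmethod
from argparse import Namespace
from typing import Any, Dict, List, Optional, Tuple


class Backend(ABC):
    """Stand-in for DiagnosisBackend: one abstract entry point."""

    @abstractmethod
    def invoke_specialty(self, service_name: str, ns: Namespace) -> Dict[str, Any]:
        ...


class EchoBackend(Backend):
    """Hypothetical default; the real default would call the remote API."""

    def invoke_specialty(self, service_name: str, ns: Namespace) -> Dict[str, Any]:
        return {"ok": False, "error": "remote call not wired in this sketch"}


class FakeBackend(Backend):
    """Test double: records calls and returns a canned envelope."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Optional[str]]] = []

    def invoke_specialty(self, service_name: str, ns: Namespace) -> Dict[str, Any]:
        self.calls.append((service_name, getattr(ns, "region", None)))
        return {"ok": True, "data": {"service_name": service_name}}


_backend: Optional[Backend] = None


def get_backend() -> Backend:
    """Create the default lazily so importing the module has no side effects."""
    global _backend
    if _backend is None:
        _backend = EchoBackend()
    return _backend


def set_backend(backend: Optional[Backend]) -> None:
    """Inject a replacement; passing None resets to the lazy default."""
    global _backend
    _backend = backend


fake = FakeBackend()
set_backend(fake)
result = get_backend().invoke_specialty("memgraph", Namespace(region="cn-hangzhou"))
set_backend(None)  # restore the default for later callers
```

Resetting with `None` after a test keeps one test's fake from leaking into the next.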
FILE:scripts/sysom_cli/lib/diagnosis_helper.py
"""Diagnosis helper: parameter models, calling InvokeDiagnosis, and polling GetDiagnosisResult.

Depends only on ``openapi_client.SysomOpenAPIClient``; there is no additional helper base class.
"""
from __future__ import annotations

import asyncio
import json
from typing import Any, Dict, Optional, Tuple

from pydantic import BaseModel, Field
from Tea.model import TeaModel

from .openapi_client import SysomOpenAPIClient, sysom_openapi_client

__all__ = [
    "DiagnoseResultCode",
    "DiagnosisRequest",
    "DiagnosisResponse",
]
class DiagnoseResultCode:
    """Status-code constants for diagnosis results."""

    SUCCESS = "Success"
    TASK_CREATE_FAILED = "TaskCreateFailed"
    TASK_EXECUTE_FAILED = "TaskExecuteFailed"
    TASK_TIMEOUT = "TaskTimeout"
    RESULT_PARSE_FAILED = "ResultParseFailed"
    GET_RESULT_FAILED = "GetResultFailed"


class DiagnosisRequest(BaseModel):
    """Diagnosis request parameters."""

    service_name: str = Field(..., description="Diagnosis service name")
    channel: str = Field(..., description="Diagnosis channel")
    region: str = Field(..., description="Region")
    params: Dict[str, Any] = Field(default_factory=dict, description="Diagnosis parameters")


class DiagnosisResponse(BaseModel):
    """Diagnosis response (includes the generic code / message; equivalent to the former OpenAPIResponse plus diagnosis fields)."""

    code: str = Field(..., description="Business status code")
    message: str = Field(default="", description="Description")
    task_id: str = Field(default="", description="Task ID")
    result: Dict[str, Any] = Field(default_factory=dict, description="Diagnosis result")
    # Filled in by the client when InvokeDiagnosis fails, so the server's original information can be passed through in JSON
    api_business_code: str = Field(default="", description="Business code from the InvokeDiagnosis response body")
    api_request_id: str = Field(default="", description="request_id from the InvokeDiagnosis response body")
def _normalize_map(obj: Any) -> Any:
    """Recursively convert TeaModel objects and nested structures into serializable dicts."""
    if obj is None:
        return None
    if isinstance(obj, TeaModel):
        return _normalize_map(obj.to_map())
    if isinstance(obj, dict):
        return {k: _normalize_map(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [_normalize_map(x) for x in obj]
    return obj


def _extract_invoke_failure_message(norm: Dict[str, Any]) -> Tuple[str, str, str]:
    """
    Extract a readable error plus the raw code / request_id from an InvokeDiagnosis response body.

    Returns:
        (message, api_business_code, request_id)
    """
    if not norm:
        return ("", "", "")
    req_id = str(
        norm.get("request_id")
        or norm.get("RequestId")
        or norm.get("requestId")
        or ""
    ).strip()
    biz = str(norm.get("code") or norm.get("Code") or "").strip()
    msg = str(norm.get("message") or norm.get("Message") or "").strip()
    data = norm.get("data")
    if isinstance(data, dict):
        # Fall back to the nested data object when the top level has no message / code
        msg = msg or str(data.get("message") or data.get("Message") or "").strip()
        biz = biz or str(data.get("code") or data.get("Code") or "").strip()
    recommend = norm.get("Recommend") or norm.get("recommend")
    if not msg and recommend:
        msg = str(recommend).strip()
    parts: list[str] = []
    if biz and msg:
        parts.append(f"{biz}: {msg}")
    elif msg:
        parts.append(msg)
    elif biz:
        parts.append(f"业务错误(code={biz}),服务端未返回 message 字段")
    if not parts:
        # Nothing recognizable: fall back to a truncated dump of the whole body
        try:
            parts.append(json.dumps(norm, ensure_ascii=False)[:2000])
        except (TypeError, ValueError):
            parts.append(str(norm)[:2000])
    return ("; ".join(parts), biz, req_id)
_GET_DIAGNOSIS_META_KEYS = frozenset(
    {
        "task_id",
        "TaskId",
        "status",
        "Status",
        "err_msg",
        "ErrMsg",
        "message",
        "Message",
        "request_id",
        "RequestId",
        "code",
        "Code",
    }
)


def _extract_get_diagnosis_result_payload(data: Dict[str, Any]) -> Any:
    """
    Extract the business payload from the data returned by GetDiagnosisResult.

    Handles result/Result, and payloads that land in sibling fields when no standard result key is present.
    """
    if not isinstance(data, dict):
        return None

    def _nonempty(x: Any) -> bool:
        if x is None or x == "":
            return False
        if isinstance(x, dict) and not x:
            return False
        if isinstance(x, list) and not x:
            return False
        return True

    r = data.get("result")
    if _nonempty(r):
        return r
    R = data.get("Result")
    if _nonempty(R):
        return R
    for k in (
        "diagnosis_result",
        "DiagnosisResult",
        "output",
        "Output",
        "report",
        "Report",
        "diagnosis_data",
        "DiagnosisData",
    ):
        v = data.get(k)
        if _nonempty(v):
            return v
    # Last resort: whatever remains once the known metadata keys are removed
    rest = {k: v for k, v in data.items() if k not in _GET_DIAGNOSIS_META_KEYS}
    if len(rest) == 1:
        only = next(iter(rest.values()))
        if _nonempty(only):
            return only
    if len(rest) > 1:
        return rest
    return None


def _response_data_to_dict(response_data: Any) -> Dict[str, Any]:
    """Normalize the ``call_api`` return value (TeaModel or dict) to a dict."""
    if response_data is None:
        return {}
    if isinstance(response_data, dict):
        return response_data
    if isinstance(response_data, TeaModel):
        return response_data.to_map()
    return {}
class DiagnosisMCPHelper:
    """Wraps the diagnosis flow: start a task, then poll for its result."""

    def __init__(
        self,
        client: Optional[SysomOpenAPIClient] = None,
        *,
        timeout: int = 150,
        poll_interval: int = 1,
    ) -> None:
        self.client: SysomOpenAPIClient = client or sysom_openapi_client
        self.timeout = timeout
        self.poll_interval = poll_interval

    async def execute(self, request: DiagnosisRequest) -> DiagnosisResponse:
        params_json = json.dumps(request.params, ensure_ascii=False)
        invoke_request: Dict[str, Any] = {
            "service_name": request.service_name,
            "channel": request.channel,
            "params": params_json,
        }
        success, response_data, error_msg = await self.client.call_api(
            "InvokeDiagnosis",
            invoke_request,
            return_as="dict",
        )
        if not success:
            # Transport-level failure: merge the client error with any detail in the body
            raw = _normalize_map(_response_data_to_dict(response_data))
            detail, biz, rid = _extract_invoke_failure_message(raw if isinstance(raw, dict) else {})
            combined = (error_msg or "").strip()
            if detail:
                if combined and detail not in combined and combined not in detail:
                    combined = f"{combined}; {detail}"
                elif not combined:
                    combined = detail
            if not combined:
                combined = "发起诊断失败(未收到服务端错误详情,请检查网络与 endpoint)"
            return DiagnosisResponse(
                code=DiagnoseResultCode.TASK_CREATE_FAILED,
                message=combined,
                task_id="",
                api_business_code=biz,
                api_request_id=rid,
                result={},
            )
        data_map = _normalize_map(_response_data_to_dict(response_data))
        if not isinstance(data_map, dict):
            data_map = {}
        data_inner = data_map.get("data") or {}
        if isinstance(data_inner, TeaModel):
            data_inner = data_inner.to_map()
        if not isinstance(data_inner, dict):
            data_inner = {}
        task_id = str(data_inner.get("task_id") or "").strip()
        biz_code = str(data_map.get("code") or "").strip()
        # Accept an explicit "success" code, or a missing code as long as a task_id came back
        ok_invoke = biz_code.lower() == "success" or (not biz_code and bool(task_id))
        if not ok_invoke:
            detail, biz, rid = _extract_invoke_failure_message(data_map)
            if not detail:
                detail = "发起诊断失败(服务端未返回可解析的错误信息)"
            return DiagnosisResponse(
                code=DiagnoseResultCode.TASK_CREATE_FAILED,
                message=detail,
                task_id="",
                api_business_code=biz or biz_code,
                api_request_id=rid,
                result={},
            )
        if not task_id:
            detail, biz, rid = _extract_invoke_failure_message(data_map)
            return DiagnosisResponse(
                code=DiagnoseResultCode.TASK_CREATE_FAILED,
                message=detail or "InvokeDiagnosis 未返回 task_id",
                task_id="",
                api_business_code=biz or biz_code,
                api_request_id=rid,
                result={},
            )
        code, message, result = await self._wait_for_result(task_id)
        if code == DiagnoseResultCode.SUCCESS:
            if isinstance(result, str):
                try:
                    result_dict = json.loads(result)
                except (json.JSONDecodeError, TypeError) as e:
                    return DiagnosisResponse(
                        task_id=task_id,
                        code=DiagnoseResultCode.RESULT_PARSE_FAILED,
                        message=f"结果解析失败:{str(e)},原始结果:{result[:200]}",
                        result={"raw": result},
                    )
            elif isinstance(result, dict):
                result_dict = result
            else:
                return DiagnosisResponse(
                    task_id=task_id,
                    code=DiagnoseResultCode.RESULT_PARSE_FAILED,
                    message=f"结果类型异常:{type(result)},期望字符串或字典",
                    result={"raw": str(result)},
                )
            return DiagnosisResponse(
                task_id=task_id,
                code=DiagnoseResultCode.SUCCESS,
                result=result_dict,
            )
        return DiagnosisResponse(task_id=task_id, code=code, message=message)

    async def _wait_for_result(self, task_id: str) -> Tuple[str, str, Any]:
        loop = asyncio.get_running_loop()
        start_time = loop.time()
        while (loop.time() - start_time) < self.timeout:
            success, response_data, error_msg = await self.client.call_api(
                "GetDiagnosisResult",
                {"task_id": task_id},
                return_as="dict",
            )
            if not success:
                return DiagnoseResultCode.GET_RESULT_FAILED, error_msg or "获取结果失败", None
            response_data = _response_data_to_dict(response_data)
            if str(response_data.get("code") or "").strip().lower() == "success":
                data = response_data.get("data") or {}
                task_status = str(data.get("status") or "").strip().lower()
                if task_status == "fail":
                    return (
                        DiagnoseResultCode.TASK_EXECUTE_FAILED,
                        data.get("err_msg", "任务执行失败"),
                        None,
                    )
                if task_status == "success":
                    payload = _extract_get_diagnosis_result_payload(data)
                    if payload is None:
                        payload = {
                            "_sysom_cli_note_zh": (
                                "GetDiagnosisResult 标记成功但未解析到业务 result;"
                                "请对照控制台或原始 API。以下为 data 中非元数据字段。"
                            ),
                            "task_id": task_id,
                            "_raw_data_keys": sorted(data.keys()),
                        }
                    return DiagnoseResultCode.SUCCESS, "", payload
                # Any other status means the task is still in progress: wait, then poll again
                await asyncio.sleep(self.poll_interval)
            else:
                return (
                    DiagnoseResultCode.GET_RESULT_FAILED,
                    response_data.get("message", "获取结果失败"),
                    None,
                )
        return (
            DiagnoseResultCode.TASK_TIMEOUT,
            f"诊断执行超时({self.timeout}秒),task_id: {task_id}",
            None,
        )
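The polling loop in `_wait_for_result` can be exercised without a network by replaying scripted responses. The sketch below is a simplified, self-contained illustration under stated assumptions: `FakeClient` and `poll_until_done` are hypothetical names, the intermediate `"Running"` status is an assumed value (the code above only distinguishes `fail`, `success`, and everything else), and the result payload is made up.

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


class FakeClient:
    """Hypothetical stand-in for the OpenAPI client: replays scripted responses."""

    def __init__(self, responses: List[Dict[str, Any]]) -> None:
        self._responses = list(responses)
        self.calls = 0

    async def call_api(self, action: str, body: Dict[str, Any]) -> Tuple[bool, Dict[str, Any], str]:
        self.calls += 1
        return True, self._responses.pop(0), ""


async def poll_until_done(
    client: FakeClient,
    task_id: str,
    *,
    timeout: float = 5.0,
    poll_interval: float = 0.01,
) -> Tuple[str, str, Optional[Any]]:
    """Poll until the task reports success or fail, or the deadline passes."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout  # monotonic clock, unaffected by wall-clock changes
    while loop.time() < deadline:
        ok, resp, err = await client.call_api("GetDiagnosisResult", {"task_id": task_id})
        if not ok:
            return "GetResultFailed", err, None
        if str(resp.get("code") or "").strip().lower() != "success":
            return "GetResultFailed", str(resp.get("message", "")), None
        data = resp.get("data") or {}
        status = str(data.get("status") or "").strip().lower()
        if status == "fail":
            return "TaskExecuteFailed", str(data.get("err_msg", "")), None
        if status == "success":
            return "Success", "", data.get("result")
        await asyncio.sleep(poll_interval)  # still in progress
    return "TaskTimeout", f"timed out after {timeout}s", None


scripted = [
    {"code": "Success", "data": {"status": "Running"}},
    {"code": "Success", "data": {"status": "Running"}},
    {"code": "Success", "data": {"status": "Success", "result": {"score": 1}}},
]
client = FakeClient(scripted)
outcome = asyncio.run(poll_until_done(client, "task-123"))
```

Here the third poll returns the result, so the loop exits after three calls without reaching the timeout.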
FILE:scripts/sysom_cli/lib/diagnosis_source.py
# -*- coding: utf-8 -*-
"""Built-in injection of __sysom_diagnosis_source: a workspace-directory heuristic that can be overridden or disabled through environment variables."""
from __future__ import annotations

import os
from pathlib import Path
from typing import Optional, Tuple

__all__ = [
    "DIAGNOSIS_SOURCE_KEY",
    "LEGACY_DIAGNOSIS_SOURCE_KEYS",
    "infer_diagnosis_source_from_cwd",
    "resolve_diagnosis_source",
]

# Key written into the OpenAPI params (the old name $diagnosis_source is blocked by the gateway/backend; do not use it)
DIAGNOSIS_SOURCE_KEY = "__sysom_diagnosis_source"
# Keys injected historically; they are stripped from params before invoke so they do not leak into the request body
LEGACY_DIAGNOSIS_SOURCE_KEYS: Tuple[str, ...] = ("$diagnosis_source",)

# Matches when one of these subdirectories/files exists under the current directory (takes priority over path segments).
DIR_MARKERS: Tuple[Tuple[str, str], ...] = (
    (".claude", "claude"),
    (".copilot", "cosh"),
)

# While walking upward, matches if the **last segment** of the current path is one of these names (e.g. /usr/share/os-copilot/skills/...).
PATH_SEGMENT_MARKERS: Tuple[Tuple[str, str], ...] = (
    ("os-copilot", "cosh"),
)

# When SYSOM_DIAGNOSIS_SOURCE / OSOPS_DIAGNOSIS_SOURCE is set to one of these values, the field is not injected (the cwd heuristic is suppressed)
_DISABLE_SENTINELS = frozenset(
    {
        "-",
        "0",
        "off",
        "none",
        "false",
        "disable",
        "disabled",
    }
)


def infer_diagnosis_source_from_cwd(
    start: Optional[Path] = None,
    *,
    max_depth: int = 64,
) -> Optional[str]:
    """
    Starting from start (default Path.cwd()), walk up through parent directories and check:

    1) whether a subdirectory/file from DIR_MARKERS exists (e.g. ``.claude`` / ``.copilot``);
    2) otherwise, whether the last segment of the current path matches PATH_SEGMENT_MARKERS
       (e.g. ``os-copilot``, commonly seen in ``/usr/share/os-copilot/skills``).

    Returns the corresponding source identifier on the first match; within one level,
    DIR_MARKERS take priority over path segments.
    """
    cur = (start or Path.cwd()).resolve()
    for _ in range(max_depth):
        for dirname, source in DIR_MARKERS:
            if (cur / dirname).exists():
                return source
        for segment, source in PATH_SEGMENT_MARKERS:
            if cur.name == segment:
                return source
        parent = cur.parent
        if parent == cur:
            # Reached the filesystem root
            break
        cur = parent
    return None


def resolve_diagnosis_source() -> Tuple[Optional[str], str]:
    """
    Decide whether this Invoke writes params['__sysom_diagnosis_source'].

    - Environment variable unset: infer only from the current working directory upward (built-in heuristic).
    - ``SYSOM_DIAGNOSIS_SOURCE`` or ``OSOPS_DIAGNOSIS_SOURCE`` set (non-empty):
      use that value and do **not** run the cwd heuristic (overrides auto-detection).
      When the value is ``-`` / ``off`` / ``none`` etc. (see _DISABLE_SENTINELS), the field is **not injected** (fully suppressed).

    Returns:
        (value_or_none, provenance) where provenance ∈ {env, cwd, none, disabled}
    """
    raw = os.environ.get("SYSOM_DIAGNOSIS_SOURCE") or os.environ.get("OSOPS_DIAGNOSIS_SOURCE")
    env = (raw or "").strip()
    if env:
        if env.lower() in _DISABLE_SENTINELS:
            return None, "disabled"
        return env, "env"
    inferred = infer_diagnosis_source_from_cwd()
    if inferred:
        return inferred, "cwd"
    return None, "none"
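The upward directory walk can be tried against a throwaway directory tree. This is a self-contained sketch that re-implements the heuristic in simplified form rather than importing the module; `infer_source` and the directory names `a`, `b`, `skills`, and `sysom-diagnosis` are illustrative assumptions, while the marker tables mirror the ones above.

```python
import tempfile
from pathlib import Path
from typing import Optional

DIR_MARKERS = ((".claude", "claude"), (".copilot", "cosh"))
PATH_SEGMENT_MARKERS = (("os-copilot", "cosh"),)


def infer_source(start: Path, max_depth: int = 64) -> Optional[str]:
    """Walk from start toward the filesystem root and return the first match."""
    cur = start.resolve()
    for _ in range(max_depth):
        # Marker directories/files are checked before the path-segment rule
        for name, source in DIR_MARKERS:
            if (cur / name).exists():
                return source
        for segment, source in PATH_SEGMENT_MARKERS:
            if cur.name == segment:
                return source
        if cur.parent == cur:
            break
        cur = cur.parent
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Case 1: a .claude marker two levels above the starting directory
    (root / "a" / ".claude").mkdir(parents=True)
    nested = root / "a" / "skills" / "sysom-diagnosis"
    nested.mkdir(parents=True)
    found_by_marker = infer_source(nested)

    # Case 2: no marker directory, but an ancestor is named os-copilot
    skills = root / "b" / "os-copilot" / "skills"
    skills.mkdir(parents=True)
    found_by_segment = infer_source(skills)
```

Both cases match before the walk leaves the temporary tree, so the outcome does not depend on what exists higher up in the filesystem.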
FILE:scripts/sysom_cli/lib/ecs_metadata.py
# -*- coding: utf-8 -*-
"""
Access helpers for the Alibaba Cloud ECS instance metadata service.

Reachable only from inside an ECS instance: ``http://100.100.100.200/latest/meta-data/...``

Common ``path`` examples (relative to ``/latest/meta-data/``)::

    instance-id, hostname, region-id, zone-id, image-id
    private-ipv4, vpc-id, vswitch-id, mac, network-type
    owner-account-id, serial-number, source-address
    disks/, network/, instance/    # directory listings (multi-line text)
    ram/security-credentials/<RoleName>    # JSON; use ``get_ecs_metadata_json``

Reference: run ``curl http://100.100.100.200/latest/meta-data`` inside an instance to list the root entries.
"""
from __future__ import annotations

import json
from typing import Any, Dict, Union

import requests

__all__ = [
    "ECS_METADATA_HOST",
    "ECS_METADATA_ROOT",
    "build_ecs_metadata_url",
    "get_ecs_metadata",
    "get_ecs_metadata_json",
]

# Fixed metadata service address (Alibaba Cloud ECS)
ECS_METADATA_HOST = "100.100.100.200"
ECS_METADATA_ROOT = f"http://{ECS_METADATA_HOST}/latest/meta-data"


def build_ecs_metadata_url(path: str = "") -> str:
    """
    Build a metadata URL.

    Args:
        path: Path relative to ``/latest/meta-data/``, e.g. ``instance-id``, ``region-id``, ``disks/``;
            an empty string means the root path (lists the top-level entries).
    """
    p = (path or "").strip().lstrip("/")
    if not p:
        return ECS_METADATA_ROOT
    return f"{ECS_METADATA_ROOT}/{p}"


def get_ecs_metadata(
    path: str = "",
    *,
    timeout: float = 3.0,
    strip: bool = True,
) -> Dict[str, Any]:
    """
    GET text metadata (including multi-line directory listings).

    Returns:
        Success: ``{"ok": True, "text": str, "status_code": 200}``
        Failure: ``{"ok": False, "error": str, "status_code": int|None}``
    """
    url = build_ecs_metadata_url(path)
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        return {
            "ok": False,
            "error": "连接超时(可能不在 ECS 环境或元数据服务不可达)",
            "status_code": None,
        }
    except requests.exceptions.RequestException as e:
        return {"ok": False, "error": f"请求失败: {e}", "status_code": None}
    if resp.status_code != 200:
        return {
            "ok": False,
            "error": f"HTTP {resp.status_code}: 无法读取元数据",
            "status_code": resp.status_code,
        }
    text = resp.text
    if strip:
        text = text.strip()
    return {"ok": True, "text": text, "status_code": resp.status_code}


def get_ecs_metadata_json(
    path: str,
    *,
    timeout: float = 3.0,
) -> Dict[str, Any]:
    """
    GET and parse as JSON (e.g. ``ram/security-credentials/<RoleName>``).

    Returns:
        Success: ``{"ok": True, "data": dict|list, "status_code": 200}``
        Failure: ``{"ok": False, "error": str, "status_code": int|None, "text": str|None}``
    """
    url = build_ecs_metadata_url(path)
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        return {
            "ok": False,
            "error": "连接超时(可能不在 ECS 环境或元数据服务不可达)",
            "status_code": None,
            "text": None,
        }
    except requests.exceptions.RequestException as e:
        return {
            "ok": False,
            "error": f"请求失败: {e}",
            "status_code": None,
            "text": None,
        }
    if resp.status_code != 200:
        return {
            "ok": False,
            "error": f"HTTP {resp.status_code}: 无法读取元数据",
            "status_code": resp.status_code,
            "text": resp.text[:500] if resp.text else None,
        }
    try:
        data: Union[dict, list] = resp.json()
    except json.JSONDecodeError as e:
        return {
            "ok": False,
            "error": f"响应不是合法 JSON: {e}",
            "status_code": resp.status_code,
            "text": resp.text[:500] if resp.text else None,
        }
    return {"ok": True, "data": data, "status_code": resp.status_code}
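The path normalization and the `{"ok": ...}` result shape can be illustrated without touching the network. The sketch below is self-contained and makes no HTTP request: `build_url` copies the normalization rule from `build_ecs_metadata_url`, while `first_role_name` is a hypothetical helper (not part of the module) that reads the role name from a result dict shaped like the one `get_ecs_metadata` returns for `ram/security-credentials/`. `MyRole` is a made-up role name.

```python
from typing import Any, Dict, Optional

ROOT = "http://100.100.100.200/latest/meta-data"


def build_url(path: str = "") -> str:
    """Strip whitespace and leading slashes; an empty path means the root listing."""
    p = (path or "").strip().lstrip("/")
    return f"{ROOT}/{p}" if p else ROOT


def first_role_name(result: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: first non-empty line of a security-credentials listing."""
    if not result.get("ok"):
        return None
    for line in str(result.get("text") or "").splitlines():
        if line.strip():
            return line.strip()
    return None


root_url = build_url("")
instance_url = build_url("/instance-id")
disks_url = build_url(" disks/ ")

# Result dicts shaped like the documented success and failure returns
bound = first_role_name({"ok": True, "text": "MyRole\n", "status_code": 200})
unbound = first_role_name({"ok": False, "error": "HTTP 404", "status_code": 404})
```

A trailing slash is preserved, which matters because directory listings such as `disks/` are addressed with it.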
FILE:scripts/sysom_cli/lib/guidance.py
# -*- coding: utf-8 -*-
"""SysOM OpenAPI permission guidance: fixed wording and structured fields used by precheck / diagnosis."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
# Relative to sysom-diagnosis/ (the skill root, alongside SKILL.md)
DOC_PATHS_REL_SYSOM_DIAGNOSIS = {
"openapi_permission_guide": "references/openapi-permission-guide.md",
"authentication": "references/authentication.md",
"sysom_diagnosis_skill": "SKILL.md",
"invoke_diagnosis_ref": "references/invoke-diagnosis.md",
}
# Backward-compatible alias for the old name (referenced within the module)
DOC_PATHS_REL_OS_OPS = DOC_PATHS_REL_SYSOM_DIAGNOSIS
# Single entry point for authorization/role management in the RAM console (consistent with the permission wording)
RAM_CONSOLE_URL = "https://ram.console.aliyun.com/"
# COSH: fallback instructions for when osops configure cannot read keys interactively because there is no TTY
INTERACTIVE_CONFIGURE_PTY_HINT = (
"若当前执行环境不支持交互式输入(无 PTY):在 **COSH** 中可通过 **/settings** 开启「**交互式Shell(PTY)**」,"
"或使用 **/bash** 进入交互式 Bash,再在 sysom-diagnosis(技能根)执行 ./scripts/osops.sh configure。"
)
def precheck_commands() -> List[str]:
"""Preferred: run from sysom-diagnosis (the skill root); alternative: call the module directly from the scripts directory when PYTHONPATH is configured."""
return [
"./scripts/osops.sh precheck",
"cd scripts && ./osops.sh precheck",
]
def base_guidance_block() -> Dict[str, Any]:
"""Written into data.guidance for precheck / diagnosis, for the Agent to parse."""
return {
"session_rule": (
"本机 memory 快速排查(`memory classify` / `oom` / `memgraph` 等,未加 `--deep-diagnosis`)"
"默认不调 OpenAPI,不要要求用户提供 region/instance。"
"用户只说「内存高」等泛化症状且未明确要 SysOM 远程时,应先在技能根跑本机 quick,不要一上来只问 ECS 实例 ID。"
"远程专项(`memory … --deep-diagnosis`)内建与 `osops precheck` 等价的凭证/开通检查,"
"不要求每次远程前再手动跑一遍 precheck。"
"一旦进入远程专项路径,必须通过 SysOM InvokeDiagnosis/GetDiagnosisResult 获取结论;"
"禁止用 ECS 通用诊断或 RunCommand/Cloud Assistant 手工命令采集替代。"
),
"diagnosis_target_rule": (
"以下二选一仅在要使用远程专项(`memory … --deep-diagnosis`)时适用;本机 memory quick 不适用。"
"A) 诊断本机 — 不传 --region 与 --instance,由 CLI 从元数据补全。"
"B) 诊断远程实例 — 须由用户提供目标 region 与 instance-id;Agent 禁止自行 curl 元数据冒充用户目标。"
),
"credential_policy": (
"禁止在对话中向用户索要、收集或请用户粘贴 AccessKey / Secret。"
"RAM 用户 AK 路径须通过终端执行 ./scripts/osops.sh configure;"
"密钥仅出现在用户本机终端输入,不进入聊天记录。"
+ INTERACTIVE_CONFIGURE_PTY_HINT
),
"precheck_command": precheck_commands()[0],
"configure_command": "./scripts/osops.sh configure",
"precheck_command_alternate": precheck_commands()[1],
"read_next": [
DOC_PATHS_REL_SYSOM_DIAGNOSIS["openapi_permission_guide"],
DOC_PATHS_REL_SYSOM_DIAGNOSIS["authentication"],
],
"guided_steps": [
"1) 确认工作区含完整 sysom-diagnosis(含 references/、scripts/);在技能根执行 ./scripts/init.sh(仅首次或依赖变更后)。",
"2) 可选:执行 ./scripts/osops.sh precheck 单独验证环境;远程子命令也会在调用 OpenAPI 前自动做同等检查。",
"3) 若远程返回或 precheck 显示认证失败:请用户二选一——"
"A) RAM 用户 AK(本机/非 ECS)B) ECS RAM Role(仅在阿里云 ECS 内)。"
"选 A 则 Agent 用 Bash 执行 ./scripts/osops.sh configure(勿在聊天传密钥),再重试远程命令;"
"无 PTY 时见 credential_policy 中的交互式 Shell 说明。"
"选 B 则按 openapi-permission-guide §4.1 完成控制台绑角色 + CLI 的 ECS RAM Role 模式配置。",
"4) 若 service_not_activated:在 Alinux 控制台开通 SysOM(SLR 通常随开通自动创建);"
"子账号开通过程报权限不足时见仓库根 RAM 专文。",
"5) 每完成一类配置后可再执行 precheck 或直接重试远程专项命令。",
"6) 内存域:表述已对应专项则用本机 `./scripts/osops.sh memory memgraph`(占用高/大图)"
"或 `memory oom`(OOM)或 `memory javamem`(Java);"
"仍不明时用 `memory classify`(均默认不调云)。选路见 references/memory-routing.md。"
"深度/远程入口:`./scripts/osops.sh memory <classify|memgraph|oom|javamem> … --deep-diagnosis`;"
"专有 params 见 references/diagnoses/。",
"7) 若失败且与实例侧相关(云助手、控制台诊断授权等),见 references/invoke-diagnosis.md。",
],
}
def remediation_service_not_activated() -> List[str]:
return [
"场景 A-K3 / E-R4:OpenAPI 已通,但账号侧 SysOM 未开通或未就绪。",
"步骤 1:用有权限的账号打开 https://alinux.console.aliyun.com/?source=cosh,进入 SysOM 相关入口,按页面完成「开通」。",
"步骤 2:在 Alinux 控制台按引导完成 SysOM 开通时,通常会**自动**创建服务关联角色 AliyunServiceRoleForSysom,一般无需在控制台外单独创建;若子账号在开通步骤报无 CreateServiceLinkedRole 等权限,按仓库根《如何让RAM子用户拥有创建服务角色的权限》配置后再试。",
"步骤 3:开通后等待 1~3 分钟,再在运行 CLI 的环境执行: " + precheck_commands()[0],
"步骤 4:仍报相同错误时,在控制台确认 SysOM 已显示为已开通,并检查当前 AK/RAM 角色所属账号与开通账号一致。",
]
def auth_path_choice_block() -> Dict[str, Any]:
"""When credentials are not configured: choose a path first, then perform the matching action (written into data.guidance)."""
return {
"step_0": (
"请先请用户二选一(未配置 AK 且未可用 ECS RAM Role 时):"
"A) RAM 用户 AccessKey — 适用于本机、非 ECS、CI 等;"
"B) ECS 实例 RAM Role — 仅当在阿里云 ECS 内运行本 CLI / Agent。"
),
"path_a_ram_user_ak": {
"label": "A — RAM 用户 AK",
"when": "用户选择 A",
"agent_run": (
"在 sysom-diagnosis(技能根)目录执行 Bash(勿在对话中索要密钥):"
"./scripts/osops.sh configure。"
+ INTERACTIVE_CONFIGURE_PTY_HINT
),
"then": "配置成功后同一环境执行: " + precheck_commands()[0],
},
"path_b_ecs_ram_role": {
"label": "B — ECS RAM Role",
"when": "用户选择 B",
"summary": (
f"在 RAM 控制台({RAM_CONSOLE_URL})创建可信 ECS 的 RAM 角色并附加 AliyunSysomFullAccess;"
"ECS 控制台将角色绑定到当前实例;"
"在实例内将 CLI 配置为 ECS RAM Role 模式(ram-role-name 与实例绑定一致);"
"验证: curl -s http://100.100.100.200/latest/meta-data/ram/security-credentials/ 应返回角色名。"
),
"then": "再执行: " + precheck_commands()[0],
"read_next": [
DOC_PATHS_REL_SYSOM_DIAGNOSIS["openapi_permission_guide"] + "(§4.1)",
DOC_PATHS_REL_SYSOM_DIAGNOSIS["authentication"] + "(方式 1 ECS RAM Role)",
],
},
}
def remediation_auth_failed() -> List[str]:
return [
"场景:尚未形成可用的 SysOM 调用身份(A-K* / E-R*),或 InitialSysom 未通过。",
"安全:不要在对话中索要或粘贴 AccessKey/Secret。",
"步骤 0:请先请用户二选一 — A) RAM 用户 AK(本机/非 ECS) B) ECS RAM Role(仅在阿里云 ECS 内)。",
"若选 A:Agent 应在 sysom-diagnosis(技能根)直接执行 Bash — ./scripts/osops.sh configure(交互式本地输入,不回显);"
+ INTERACTIVE_CONFIGURE_PTY_HINT
+ " 完成后执行 " + precheck_commands()[0] + "。",
"若选 B:按 references/openapi-permission-guide.md §4.1 与 references/authentication.md「ECS RAM Role」— "
f"在 RAM 控制台({RAM_CONSOLE_URL})与 ECS 控制台完成角色与绑定 + 完成 CLI 的 ECS RAM Role 模式配置,再 "
+ precheck_commands()[0]
+ "。",
"若 ~/.aliyun/config.json 解析失败:修复或删除后重试;选 A 可再次 ./scripts/osops.sh configure。",
]
def _remediation_ecs_ram_role_lines(
*,
role_label: str,
ram_profile_configured: bool,
metadata_role_bound: bool,
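) -> List[str]:
"""Steps for the ECS RAM Role primary path (single source of truth); the curl troubleshooting line is omitted when metadata already confirms the binding."""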
lines: List[str] = [
"主路径:ECS RAM Role(实例角色)。无需再配置 RAM 用户 AccessKey。",
]
if role_label:
lines.append(
f"步骤 1:打开 RAM 控制台({RAM_CONSOLE_URL})→ 角色「{role_label}」→ 权限管理,"
"确认已附加 AliyunSysomFullAccess(无则新增授权)。"
)
else:
lines.append(
f"步骤 1:打开 RAM 控制台({RAM_CONSOLE_URL})→ 找到配置文件中的 RAM 角色 → 权限管理,"
"确认已附加 AliyunSysomFullAccess(无则新增授权)。"
)
lines.append("步骤 2:策略变更后等待约 2~5 分钟。")
if ram_profile_configured:
lines.append(
"步骤 3:已配置本地 config.json 的 ECS RAM Role/RAM Role,请核对 ram-role-name 与实例绑定一致、config.json 有效。"
)
elif role_label:
lines.append(
"步骤 3:实例内将 CLI 配置为 ECS RAM Role,ram-role-name 与实例绑定一致。"
)
else:
lines.append(
"步骤 3:完成实例 RAM 角色绑定后,在实例内完成 CLI 的 ECS RAM Role 模式配置。"
)
if not metadata_role_bound:
lines.append(
"(可选排障)curl -s http://100.100.100.200/latest/meta-data/ram/security-credentials/ 应返回角色名一行。"
)
lines.append("最后:在技能根执行: " + precheck_commands()[0])
return lines
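def remediation_ecs_role_hint(role_name: str) -> List[str]:
"""Backward-compatible entry point for instances that already have a role bound; shares its generation logic with the primary-path remediation."""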
return _remediation_ecs_ram_role_lines(
role_label=str(role_name).strip(),
ram_profile_configured=False,
metadata_role_bound=True,
)
def precheck_guidance_compact(
primary_path: str,
*,
scenario_hint: Optional[str] = None,
) -> Dict[str, Any]:
"""
Compact guidance for a failed precheck once a single path is locked in: omits session_rule / diagnosis_target_rule / guided_steps.
"""
bg = base_guidance_block()
out: Dict[str, Any] = {
"credential_policy": bg["credential_policy"],
"precheck_command": bg["precheck_command"],
"configure_command": bg["configure_command"],
"read_next": [
DOC_PATHS_REL_SYSOM_DIAGNOSIS["openapi_permission_guide"],
DOC_PATHS_REL_SYSOM_DIAGNOSIS["authentication"],
],
"primary_path": primary_path,
}
if scenario_hint:
out["scenario_hint"] = scenario_hint
return out
def precheck_guidance_success_light() -> Dict[str, Any]:
"""Lightweight guidance for a successful precheck, symmetric with the failure-side precheck_compact (omits session_rule / guided_steps)."""
bg = base_guidance_block()
return {
"credential_policy": bg["credential_policy"],
"precheck_command": bg["precheck_command"],
"configure_command": bg["configure_command"],
"read_next": list(bg["read_next"]),
}
def remediation_for_precheck_failure(
check_result: Dict[str, Any],
path_summary: Dict[str, Any],
*,
needs_sysom_activation: bool,
) -> List[str]:
"""
Generate remediation from path_summary.primary_path and the progress recorded in checked; steps already completed are not repeated.
"""
if needs_sysom_activation:
return remediation_service_not_activated()
primary = path_summary.get("primary_path") or "configure_identity"
prog: Dict[str, Any] = dict(path_summary.get("progress") or {})
checked: List[Dict[str, Any]] = list(check_result.get("checked") or [])
ecs_role_name = check_result.get("ecs_role_name")
ram_rows = [
r
for r in checked
if str(r.get("method", "")).lower().startswith("配置文件 ecsramrole")
or str(r.get("method", "")).startswith("配置文件 ECS RAM Role")
or str(r.get("method", "")).startswith("配置文件 RAM Role")
]
if not prog:
prog = {
"metadata_role_bound": bool(ecs_role_name)
or any("✓" in str((r.get("status") or "")) for r in checked if r.get("method") == "ECS元数据"),
"env_ak_attempted": any(
r.get("method") == "环境变量 AKSK" and "未配置" not in str(r.get("status") or "")
for r in checked
),
"file_ak_attempted": any(r.get("method") == "配置文件 AKSK" for r in checked),
"file_profile_missing_or_invalid": any(
r.get("method") == "配置文件" and "✗" in str(r.get("status") or "") for r in checked
),
"ram_profile_configured": bool(ram_rows),
}
if primary == "ecs_ram_role":
role_label = str(ecs_role_name).strip() if ecs_role_name else ""
meta_bound = bool(prog.get("metadata_role_bound"))
ram_prof = bool(prog.get("ram_profile_configured", bool(ram_rows)))
return _remediation_ecs_ram_role_lines(
role_label=role_label,
ram_profile_configured=ram_prof,
metadata_role_bound=meta_bound,
)
if primary == "access_key":
env_ok = bool(prog.get("env_ak_attempted"))
file_ok = bool(prog.get("file_ak_attempted"))
file_bad = bool(prog.get("file_profile_missing_or_invalid"))
head = [
"主路径:RAM 用户 AccessKey(环境变量或本地配置文件 AK)。无需并行配置 ECS RAM Role。",
]
if env_ok and not file_ok:
return head + [
f"步骤 1:在 RAM 控制台({RAM_CONSOLE_URL})确认 RAM 用户或 AK 对应主体已附加 AliyunSysomFullAccess。",
"步骤 2:环境变量 AK 须与 precheck 在同一 shell 进程内导出后再执行;否则改用 ./scripts/osops.sh configure 写入 ~/.aliyun。",
"步骤 3:执行: " + precheck_commands()[0],
]
if file_ok and not env_ok:
return head + [
"步骤 1:核对本地配置文件中的 AK 模式配置有效(密钥正确、profile 未损坏)。",
f"步骤 2:在 RAM 控制台({RAM_CONSOLE_URL})确认该 RAM 用户已附加 AliyunSysomFullAccess。",
"步骤 3:执行: " + precheck_commands()[0],
]
if file_bad and not env_ok:
return head + [
"步骤 1:在技能根由 Agent 执行 ./scripts/osops.sh configure 重建有效配置,勿在对话中传输 Secret。",
f"步骤 2:在 RAM 控制台({RAM_CONSOLE_URL})确认 RAM 用户已附加 AliyunSysomFullAccess。",
"步骤 3:执行: " + precheck_commands()[0],
]
return head + [
"步骤 1:在技能根由 Agent 执行 ./scripts/osops.sh configure,勿在对话中传输 Secret。",
f"步骤 2:在 RAM 控制台({RAM_CONSOLE_URL})确认 RAM 用户已附加 AliyunSysomFullAccess。",
"步骤 3:若用环境变量 AK,须与 precheck 在同一 shell 进程;否则优先写入 ~/.aliyun。",
"步骤 4:执行: " + precheck_commands()[0],
]
return remediation_auth_failed()
def scenario_hint_for_error_code(error_code: Optional[str]) -> Optional[str]:
if error_code == "service_not_activated":
return "A-K3_or_E-R4_sysom_not_activated"
if error_code == "sysom_role_not_exist":
return "A-K3_or_E-R4_sysom_slr_not_ready"
return None
def diagnosis_subsystem_minimal_guidance(
*,
include_console_authorization_hint: bool = False,
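) -> Dict[str, Any]:
"""
The only guidance block inside data when a specialty command (`memory … --deep-diagnosis` etc.) fails:
it does not expand the full precheck guided_steps, to avoid duplicating error.
For authentication/activation issues, references/openapi-permission-guide.md remains authoritative.
"""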
rem: List[str] = [
"invoke 参数与「本机/远程」约定见 read_next;precheck 与账号开通见 "
+ DOC_PATHS_REL_SYSOM_DIAGNOSIS["openapi_permission_guide"]
+ "。",
]
if include_console_authorization_hint:
rem.append(
"若 error.message 指向未授权:在 SysOM 或 ECS 控制台为目标实例完成诊断授权(勿使用已废弃 OpenAPI 授权)。"
)
return {
"diagnosis_target": (
"本机不传 --region/--instance(由 CLI 从元数据补全);"
"远程须由用户提供 region 与 instance-id。"
),
"read_next": [DOC_PATHS_REL_SYSOM_DIAGNOSIS["invoke_diagnosis_ref"]],
"remediation": rem,
}
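`remediation_for_precheck_failure` derives its progress flags by matching the literal `method` and `status` strings that the credential check appends to `checked`, so those strings act as a small cross-module contract. The following self-contained sketch shows that derivation in simplified form; `derive_progress` is a hypothetical name, and the sample rows reuse the literals that appear in the code above.

```python
from typing import Any, Dict, List, Optional


def derive_progress(
    checked: List[Dict[str, Any]],
    ecs_role_name: Optional[str] = None,
) -> Dict[str, bool]:
    """Summarize which credential paths were attempted, based on the checked rows."""

    def status(row: Dict[str, Any]) -> str:
        return str(row.get("status") or "")

    return {
        "metadata_role_bound": bool(ecs_role_name),
        # An env-var row counts as attempted only if it is not the "未配置" (not configured) row
        "env_ak_attempted": any(
            r.get("method") == "环境变量 AKSK" and "未配置" not in status(r)
            for r in checked
        ),
        "file_ak_attempted": any(r.get("method") == "配置文件 AKSK" for r in checked),
        "file_profile_missing_or_invalid": any(
            r.get("method") == "配置文件" and "✗" in status(r) for r in checked
        ),
    }


# Rows as produced when neither environment variables nor a config file are usable
checked_rows = [
    {"method": "环境变量 AKSK", "status": "✗ 未配置"},
    {"method": "配置文件", "status": "✗ 未配置或配置无效"},
]
progress = derive_progress(checked_rows)
```

Because the matching is by exact string, changing the wording of a `method` or `status` value in the credential check would silently change which remediation steps are generated.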
FILE:scripts/sysom_cli/lib/invoke_envelope_finalize.py
# -*- coding: utf-8 -*-
"""diagnosis invoke and the io|net|load specialty diagnoses: routing, remote, data_refs, compact/verbose output."""
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict, List, Optional
def ensure_findings_finding_type(findings: Optional[List[Dict[str, Any]]]) -> None:
"""Fill in finding_type on findings (precheck: from severity; memory: from kind)."""
if not findings:
return
for f in findings:
if not isinstance(f, dict):
continue
if "finding_type" in f:
continue
if "severity" in f:
f["finding_type"] = f"precheck_{f.get('severity', 'info')}"
elif "kind" in f:
f["finding_type"] = f["kind"]
def finalize_diagnosis_invoke_envelope(
out: Dict[str, Any],
ns: Namespace,
*,
cli_subsystem: Optional[str] = None,
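) -> Dict[str, Any]:
"""
- Writes data.routing and data.remote (same shape as memory --deep-diagnosis)
- On success the business result lives only in data.remote.result (top-level data.result is no longer written)
- Does not write data.next_steps; invoke has no memory-style "next steps" list
- agent: data_refs, data_field_guide; summary is left unchanged on error
"""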
if out.get("action") != "diagnosis_invoke":
return out
data = out.setdefault("data", {})
agent = out.setdefault("agent", {})
ok = bool(out.get("ok"))
svc = (data.get("service_name") or "").strip()
ch = (data.get("channel") or "").strip() or "ecs"
reg = (data.get("region") or "").strip()
tid = (data.get("task_id") or "").strip() or None
routing: Dict[str, Any] = {
"recommended_service_name": svc,
"channel": ch,
"region": reg,
}
if tid:
routing["task_id"] = tid
data["routing"] = routing
remote: Dict[str, Any] = {
"ok": ok,
"action": "diagnosis_invoke",
"service_name": svc,
"channel": ch,
"region": reg,
}
if tid:
remote["task_id"] = tid
if "ecs_metadata_filled" in data:
remote["ecs_metadata_filled"] = data["ecs_metadata_filled"]
if "diagnosis_source_origin" in data:
remote["diagnosis_source_origin"] = data["diagnosis_source_origin"]
if data.get("diagnosis_source"):
remote["diagnosis_source"] = data["diagnosis_source"]
if ok:
# 与 invoke 约定一致:成功时始终暴露 data.remote.result(可为 null),便于 Agent 稳定读取
prev_remote = data.get("remote") if isinstance(data.get("remote"), dict) else {}
if "result" in data:
remote["result"] = data.pop("result")
elif "result" in prev_remote:
# 幂等:已被 finalize 过且顶层 result 已 pop 时,保留原 remote.result
remote["result"] = prev_remote.get("result")
else:
remote["result"] = None
else:
if out.get("error"):
remote["error"] = out["error"]
data["remote"] = remote
data.pop("next_steps", None)
ensure_findings_finding_type(agent.get("findings") or [])
refs: List[str] = [
"data.routing.recommended_service_name",
"data.routing.region",
"data.remote",
]
guides: List[Dict[str, Any]] = [
{
"pointer": "data.routing.recommended_service_name",
"label_zh": "专项名",
"meaning_zh": "本次调用的 SysOM service_name(OpenAPI)。",
"reading_zh": "与 data.remote.service_name 一致。",
},
{
"pointer": "data.remote",
"label_zh": "深度诊断结果壳",
"meaning_zh": "ok、task_id、result、error 等与内存域 data.remote 同形。",
"reading_zh": "成功时读 data.remote.result;失败时读 data.remote.error 与 agent.summary。",
},
]
if ok:
refs.append("data.remote.result")
guides.append(
{
"pointer": "data.remote.result",
"label_zh": "API 业务结果",
"meaning_zh": "GetDiagnosisResult 返回的专项载荷(形态随 service_name 变化)。",
"reading_zh": (
"仅此一处;无顶层 data.result。"
"若对象内含 _sysom_cli_note_zh,表示未命中标准 result 字段,已尽力从兄弟键聚合或给出键名提示。"
),
}
)
agent["data_refs"] = refs
agent["data_field_guide"] = guides
exec_ = dict(out.get("execution") or {})
if cli_subsystem:
exec_["subsystem"] = cli_subsystem
exec_.setdefault("mode", "remote")
out["execution"] = exec_
if out.get("error") is not None:
return out
verbose = bool(getattr(ns, "verbose_envelope", False))
if verbose:
if not (agent.get("summary") or "").strip():
agent["summary"] = (
f"诊断完成:service_name={svc} task_id={tid or '(无)'} region={reg}"
)
else:
domain = (cli_subsystem or "invoke").lower()
res = remote.get("result") if ok else None
if ok and isinstance(res, dict) and res.get("_sysom_cli_note_zh"):
agent["summary"] = (
f"深度诊断({domain})已完成;服务端返回体未含可解析的标准 result,"
"已写入占位说明与 _raw_data_keys,请读 data.remote.result。"
)
else:
agent["summary"] = (
f"深度诊断({domain},紧凑输出)。请读 data.remote.result(成功时)。"
)
agent["next"] = []
return out
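The envelope contract described in `finalize_diagnosis_invoke_envelope` (on success the payload lives only under `data.remote.result`; on failure the error is under `data.remote.error`) can be consumed roughly as follows. This is a minimal sketch, not part of the module: `read_invoke_result` is a hypothetical helper, and the sample envelope, including its `example_service` name, is hand-written for illustration.

```python
from typing import Any, Dict, Optional, Tuple


def read_invoke_result(envelope: Dict[str, Any]) -> Tuple[bool, Optional[Any], Optional[str]]:
    """Return (ok, result, error_message) from a finalized invoke envelope."""
    data = envelope.get("data") or {}
    remote = data.get("remote") or {}
    if envelope.get("ok") and remote.get("ok"):
        # On success the payload is read from data.remote.result (it may be None).
        return True, remote.get("result"), None
    # On failure prefer data.remote.error, then the top-level error.
    error = remote.get("error") or envelope.get("error") or {}
    message = error.get("message") if isinstance(error, dict) else str(error)
    return False, None, message or "unknown error"


# Hypothetical envelope; field values are placeholders, not real service output.
sample = {
    "action": "diagnosis_invoke",
    "ok": True,
    "data": {
        "routing": {
            "recommended_service_name": "example_service",
            "channel": "ecs",
            "region": "cn-hangzhou",
        },
        "remote": {
            "ok": True,
            "action": "diagnosis_invoke",
            "service_name": "example_service",
            "result": {"summary": "no anomaly"},
        },
    },
}

print(read_invoke_result(sample))
```

Reading through one helper keeps callers from reaching for a top-level `data.result`, which the finalizer deliberately removes.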
FILE:scripts/sysom_cli/lib/kernel_log.py
# -*- coding: utf-8 -*-
from __future__ import annotations

import subprocess
from typing import List, Optional


def get_kernel_log_lines(source: str, path: Optional[str]) -> List[str]:
    """Return kernel log lines from a file, the journal, or dmesg, in that order of preference."""
    # An explicit file path takes priority over any live source.
    if path:
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.read().splitlines()
    if source == "journal":
        try:
            r = subprocess.run(
                ["journalctl", "-k", "-b", "-o", "short-precise", "--no-pager"],
                capture_output=True,
                text=True,
                timeout=120,
            )
            if r.returncode == 0 and r.stdout:
                return r.stdout.splitlines()
        except (FileNotFoundError, subprocess.TimeoutExpired):
            pass
    # Fall back to dmesg, preferring human-readable timestamps (-T).
    for cmd in (["dmesg", "-T"], ["dmesg"]):
        try:
            r = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
            if r.returncode == 0 and r.stdout:
                return r.stdout.splitlines()
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # A missing binary or a hung dmesg must not abort the fallback chain.
            pass
    return []
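The journal-then-dmesg fallback above is an instance of a general pattern: try commands in order and take the first one that exits cleanly with output. The sketch below generalizes it under stated assumptions: `first_successful_output` is a hypothetical helper that does not exist in `kernel_log.py`, and the demo commands are stand-ins chosen so the example runs anywhere Python does.

```python
import subprocess
import sys
from typing import List, Sequence


def first_successful_output(commands: Sequence[Sequence[str]], timeout: float = 60.0) -> List[str]:
    """Run commands in order; return stdout lines of the first that exits 0 with output."""
    for cmd in commands:
        try:
            r = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # Missing binary or timeout: move on to the next candidate.
            continue
        if r.returncode == 0 and r.stdout:
            return r.stdout.splitlines()
    return []


# The first command is assumed not to exist, so the second one is used.
lines = first_successful_output(
    [
        ["no-such-binary-for-this-demo"],
        [sys.executable, "-c", "print('line one'); print('line two')"],
    ],
    timeout=30,
)
print(lines)
```

Catching `subprocess.TimeoutExpired` alongside `FileNotFoundError` for every candidate keeps one hung command from turning into an unhandled exception.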
FILE:scripts/sysom_cli/lib/log_parser.py
# -*- coding: utf-8 -*-
"""
Generic log parser framework.

Provides a log-parsing architecture built on a state machine and plugins, for cases where
information must be extracted from structured logs.

## Core components
1. **LogParserContext**: parsing context used to share data between plugins
2. **LogParserResult**: container for parse results
3. **LogParserPluginBase**: plugin base class implementing the start/end/filter/process pattern
4. **LogParser**: main parser class that manages plugins and coordinates parsing

## Plugin pattern
A plugin marks the log segment it cares about with start/end markers, and supports:
- Nested plugins (sub_plugins)
- Repeated parsing (repeat)
- Filters (filter)
- Lifecycle hooks (is_start, is_end, process, done)

## Usage example
```python
from sysom_cli.lib.log_parser import LogParser, LogParserPluginBase, LogParserContext


class MyPlugin(LogParserPluginBase):
    def is_start(self, line, global_context, lines, idx):
        return "START MARKER" in line

    def is_end(self, line, global_context, lines, idx):
        return "END MARKER" in line

    def process(self, line, global_context, lines, idx):
        # Handle each line
        pass

    def done(self, local_context, global_context):
        # Parsing finished; store the result
        self.set("result", "parsed_value")


# Usage
parser = LogParser()
parser.register_plugin("my_plugin", MyPlugin())
parser.parse_lines(log_lines)
result = parser.get_result()
```

## Differences from LogScanContext
- **LogScanContext**: simple log scanning that processes all log lines in one pass (stateless)
- **LogParserPluginBase**: stateful parser that supports start/end markers, sub-plugins, and repeated parsing

When to use which:
- Simple scanning → use `LogScanContext` + plugin functions
- Complex structured parsing → use `LogParserPluginBase` + `LogParser`
"""
from __future__ import annotations
from abc import abstractmethod
from typing import Any, Dict, List, Optional
__all__ = [
"LogParserContext",
"LogParserResult",
"LogParserPluginBase",
"LogParser",
]
class LogParserContext:
    """
    Log parsing context.

    Used to share data and state between plugins.
    """

    def __init__(self, context: Optional[dict] = None):
        if context is None:
            context = {}
        self.context = context

    def get(self, key):
        """Get a context value."""
        return self.context.get(key)

    def set(self, key, value):
        """Set a context value."""
        self.context[key] = value

    def clear(self):
        """Clear the context."""
        self.context.clear()

    def dict(self):
        """Return the underlying context dict."""
        return self.context

    def copy(self):
        """Return a shallow copy of the context."""
        return LogParserContext(self.context.copy())
class LogParserResult:
    """
    Log parsing result.

    Holds the global context and the parse results of all plugins.
    """

    def __init__(self, context: LogParserContext, plugins: dict):
        self.context = context
        self.plugins = plugins

    def __getattr__(self, item):
        # Allow attribute-style access to context values and plugin results;
        # context values take precedence, and unknown names yield None.
        if item in self.context.context:
            return self.context.context[item]
        if item in self.plugins:
            return self.plugins[item]
        return None

    def __str__(self):
        return str(self.context.context) + str(self.plugins)
class LogParserPluginBase:
    """
    Base class for log parser plugins.

    Provides a state-machine-based parsing framework. Each plugin defines:
    1. is_start: detect the start marker
    2. is_end: detect the end marker
    3. filter: filter the log lines between [start, end]
    4. process: handle the lines that pass the filter
    5. done: callback invoked when parsing completes

    Supports:
    - Sub-plugins (sub_plugins): nested parsing
    - Repeated parsing (repeat): parse multiple segments
    - History (history): keep the results of every repeated parse

    Attributes:
        start: whether parsing has started
        end: whether parsing has ended
        local_context: plugin-private context
        sub_plugins: dict of sub-plugins
        history: past parse results (when repeat=True)
        repeat: whether repeated parsing is allowed
        process_contains_start_end: whether the start/end lines themselves are processed
    """

    def __init__(self, **kwargs):
        self.start = False
        self.end = False
        self.local_context = LogParserContext()
        self.sub_plugins = {}
        self.history = []
        self.start_line = ""
        self.end_line = ""
        self.repeat = kwargs.get("repeat", False)
        self.process_contains_start_end = kwargs.get("process_contains_start_end", True)

    def register_sub_plugin(self, plugin_id: str, plugin: "LogParserPluginBase"):
        """Register a sub-plugin."""
        if plugin_id in self.sub_plugins:
            raise Exception("plugin id already exists")
        if plugin_id == "context_value":
            raise Exception("plugin id can't be context_value")
        self.sub_plugins[plugin_id] = plugin
def process_wrapper(
self,
line: str,
global_context: LogParserContext,
lines: List[str] = None,
idx: int = None,
):
"""处理单行日志(状态机主逻辑)"""
def _process_wrapper_after_filter(_line: str):
if self.filter(_line, global_context, lines, idx):
self.process(_line, global_context, lines, idx)
for plugin in self.sub_plugins.values():
plugin.process_wrapper(_line, global_context, lines, idx)
# 如果不允许重复解析,并且已经解析过了,直接返回
if not self.repeatable() and self.end:
return
# 如果没有开始解析,判断是否包含起始标记
if not self.start:
if self.is_start(line, global_context, lines, idx):
self.start = True
self.start_line = line.strip()
self.set("start_line", self.start_line)
if self.process_contains_start_end:
_process_wrapper_after_filter(line)
return
# 如果已经开始解析,判断是否包含结束标记
elif not self.end:
if self.is_end(line, global_context, lines, idx):
self.end = True
self.end_line = line.strip()
self.set("end_line", self.end_line)
if self.process_contains_start_end:
_process_wrapper_after_filter(line)
self.done(self.local_context, global_context)
if self.repeatable():
# 如果支持重复解析,并且已经解析到结束标记,重置状态
for plugin in self.sub_plugins.values():
plugin.done(plugin.local_context, global_context)
self.history.append(self._get_single_result())
self.reset()
for plugin in self.sub_plugins.values():
plugin.reset()
return
# 如果在[start,end]之间,通过filter过滤后的日志,调用 process 方法进行处理
if self.start and not self.end:
_process_wrapper_after_filter(line)
    def repeatable(self) -> bool:
        """Whether repeated parsing is allowed."""
        return self.repeat

    def set(self, key, value):
        """Set a value in the plugin-local context."""
        self.local_context.set(key, value)

    def get(self, key):
        """Get a value from the plugin-local context."""
        return self.local_context.get(key)

    @abstractmethod
    def is_start(
        self, line: str, global_context: LogParserContext, lines: List[str], idx: int
    ) -> bool:
        """Return True if the line contains the start marker."""
        pass

    @abstractmethod
    def is_end(
        self, line: str, global_context: LogParserContext, lines: List[str], idx: int
    ) -> bool:
        """Return True if the line contains the end marker."""
        pass

    def filter(
        self, line: str, global_context: LogParserContext, lines: List[str], idx: int
    ) -> bool:
        """Filter the log lines between [start, end] (passes everything by default)."""
        return True

    @abstractmethod
    def process(
        self, line: str, global_context: LogParserContext, lines: List[str], idx: int
    ):
        """Handle a log line that lies between (start, end) and has passed filter."""
        pass

    @abstractmethod
    def done(self, local_context: LogParserContext, global_context: LogParserContext):
        """Called when parsing ends; store the extracted results in local_context here."""
        pass

    def reset(self):
        """Reset the plugin state; called between segments when the plugin supports repeated parsing."""
        self.start = False
        self.end = False
        self.start_line = ""
        self.end_line = ""
        self.local_context.clear()

    def _get_single_result(self) -> LogParserResult:
        """Return the result of a single parse."""
        sub_plugins_result = {}
        for plugin_id, plugin in self.sub_plugins.items():
            sub_plugins_result[plugin_id] = plugin.get_result()
        return LogParserResult(self.local_context.copy(), sub_plugins_result)

    def get_result(self):
        """Return the parse result."""
        # A repeatable plugin returns the list of results, one per parsed segment.
        if self.repeatable():
            return self.history
        return self._get_single_result()
class LogParser:
    """
    Main log parser class.

    Manages multiple plugins and coordinates the parsing process.

    Usage example:
    ```python
    parser = LogParser()
    parser.register_plugin("plugin1", MyPlugin1())
    parser.register_plugin("plugin2", MyPlugin2())
    parser.parse_lines(log_lines)
    result = parser.get_result()
    ```
    """

    def __init__(self):
        self.plugins = {}
        self.global_context = LogParserContext()

    def __getattr__(self, item):
        # Allow attribute-style access to plugins and global context values;
        # plugins take precedence, and unknown names yield None.
        if item in self.plugins:
            return self.plugins[item]
        if item in self.global_context.context:
            return self.global_context.context[item]
        return None

    def register_plugin(self, plugin_id: str, plugin: LogParserPluginBase):
        """Register a plugin."""
        if plugin_id in self.plugins:
            raise Exception("plugin id already exists")
        if plugin_id == "global":
            raise Exception("plugin id can't be global")
        self.plugins[plugin_id] = plugin

    def parse_lines(self, lines: List[str]):
        """Parse multiple log lines."""
        for idx, line in enumerate(lines):
            self.parse(line, lines, idx)

    def parse(self, line: str, lines: List[str] = None, idx: int = None):
        """Parse a single log line by feeding it to every registered plugin."""
        for plugin in self.plugins.values():
            plugin.process_wrapper(line, self.global_context, lines, idx)

    def get_result(self):
        """Return the parse results of all plugins."""
        sub_plugins_result = {}
        for plugin_id, plugin in self.plugins.items():
            sub_plugins_result[plugin_id] = plugin.get_result()
        return LogParserResult(self.global_context, sub_plugins_result)
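The start/end state machine that the plugin base class describes can be illustrated without the framework. The following is a deliberately stripped-down stand-in, not the module's implementation: `extract_sections` is a hypothetical function, it has no filter, sub-plugins, or contexts, and it always behaves like a plugin with `repeat=True`. Its `include_markers` flag plays the role of `process_contains_start_end`.

```python
from typing import Callable, List, Optional


def extract_sections(
    lines: List[str],
    is_start: Callable[[str], bool],
    is_end: Callable[[str], bool],
    include_markers: bool = True,
) -> List[List[str]]:
    """Collect every [start, end] segment found in lines."""
    sections: List[List[str]] = []
    current: Optional[List[str]] = None  # None means "not inside a segment"
    for line in lines:
        if current is None:
            # Waiting for a start marker; everything else is ignored.
            if is_start(line):
                current = [line] if include_markers else []
            continue
        if is_end(line):
            # Close the segment and reset so another one can begin.
            if include_markers:
                current.append(line)
            sections.append(current)
            current = None
            continue
        current.append(line)
    return sections


log = ["noise", "BEGIN a", "x=1", "END", "noise", "BEGIN b", "y=2", "END"]
bodies = extract_sections(
    log,
    is_start=lambda l: l.startswith("BEGIN"),
    is_end=lambda l: l == "END",
    include_markers=False,
)
print(bodies)
```

As in the framework, a segment that never reaches its end marker is not reported, because completion is what triggers the result to be recorded.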
FILE:scripts/sysom_cli/lib/log_plugin.py
# -*- coding: utf-8 -*-
"""
Common abstraction for log-scanning plugins.

Provides plugin-architecture support built on log scanning.

## When to use
Any subsystem that needs to scan system logs (kernel logs, application logs, and so on)
and extract information can use it.

## Plugin protocol
A plugin implements a function with the following signature:
```python
def run(ctx: LogScanContext) -> Dict[str, Any]:
    \"\"\"
    Scan the log and return the extracted information.

    Args:
        ctx: log scan context containing the log lines and metadata

    Returns:
        dict of extracted information
    \"\"\"
    pass
```

## Examples
### Memory Classify plugin
```python
from typing import Any, Dict

from sysom_cli.lib.log_plugin import LogScanContext


def run(ctx: LogScanContext) -> Dict[str, Any]:
    oom_count = sum(1 for line in ctx.log_lines if "oom-killer" in line)
    return {"oom_count": oom_count}
```

### A future I/O error scanning plugin
```python
from typing import Any, Dict

from sysom_cli.lib.log_plugin import LogScanContext


def run(ctx: LogScanContext) -> Dict[str, Any]:
    io_errors = sum(1 for line in ctx.log_lines if "I/O error" in line)
    return {"io_error_count": io_errors}
```
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogScanContext:
    """
    Log scan context.

    Gives log-scanning plugins a uniform input data structure.

    Attributes:
        log_lines: list of log lines
        log_source: identifier of the log source (e.g. "dmesg", "journal", "syslog")
        log_file: path of the log file (when read from a file)
        metadata: extra metadata (optional)
    """

    log_lines: List[str]
    log_source: str
    log_file: Optional[str] = None
    metadata: Optional[dict] = None


# Backward-compatible alias (for existing code)
CollectContext = LogScanContext
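Because every plugin shares the `run(ctx) -> dict` protocol, several scanners can be run over one context and their results collected by name. The sketch below assumes that usage: `run_plugins` and the plugin names are hypothetical, and `LogScanContext` is redefined locally (mirroring the dataclass above) only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class LogScanContext:
    # Local stand-in mirroring the fields of the module's dataclass.
    log_lines: List[str]
    log_source: str
    log_file: Optional[str] = None
    metadata: Optional[dict] = None


def count_oom(ctx: LogScanContext) -> Dict[str, Any]:
    return {"oom_count": sum(1 for line in ctx.log_lines if "oom-killer" in line)}


def count_io_errors(ctx: LogScanContext) -> Dict[str, Any]:
    return {"io_error_count": sum(1 for line in ctx.log_lines if "I/O error" in line)}


def run_plugins(
    ctx: LogScanContext,
    plugins: Dict[str, Callable[[LogScanContext], Dict[str, Any]]],
) -> Dict[str, Dict[str, Any]]:
    """Run each plugin on the same context and key its result by plugin name."""
    return {name: run(ctx) for name, run in plugins.items()}


ctx = LogScanContext(
    log_lines=[
        "app invoked oom-killer: gfp_mask=0x0",
        "blk_update_request: I/O error, dev sda",
        "usb 1-1: new high-speed USB device",
        "python invoked oom-killer: gfp_mask=0x0",
    ],
    log_source="dmesg",
)
report = run_plugins(ctx, {"memory": count_oom, "io": count_io_errors})
print(report)
```

Keying results by plugin name keeps independent scanners from overwriting each other's fields.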
FILE:scripts/sysom_cli/lib/openapi_client.py
# -*- coding: utf-8 -*-
"""
SysOM OpenAPI (Alibaba Cloud SDK) wrapper: credential resolution and async calls by Action name.

Convention::

    ``ListDiagnosis`` -> ``models.ListDiagnosisRequest`` + ``await client.list_diagnosis_async(request)``

Authentication: ``sysom_cli.lib.auth.resolve_sysom_auth`` / ``test_sysom_api``.

Usage::

    import asyncio
    from sysom_cli.lib.openapi_client import sysom_openapi_client

    async def main():
        ok, data, err = await sysom_openapi_client.call_api("ListDiagnosis", {})

    asyncio.run(main())

``sysom_openapi_client`` is a **lazy singleton**: ``SysomOpenAPIClient`` is constructed only on the first
method call, so merely importing the module (for example when registering subcommands) does not require
credentials to be present. To switch endpoint or credentials, assign a replacement::

    import sysom_cli.lib.openapi_client as m
    m.sysom_openapi_client = m.SysomOpenAPIClient(verify_auth=False, endpoint="...")
"""
from __future__ import annotations
import json
import re
from typing import Any, Awaitable, Callable, Dict, Literal, Optional, Tuple, Type, Union
from Tea.model import TeaModel
from alibabacloud_sysom20231230.client import Client as SysOM20231230Client
from alibabacloud_tea_openapi import models as open_api_models
from sysom_cli.lib.auth import resolve_sysom_auth, test_sysom_api
__all__ = [
"DEFAULT_SYSOM_ENDPOINT",
"ReturnAs",
"SysomOpenAPIClient",
"sysom_openapi_client",
"api_action_to_snake",
"resolve_sysom_async_api",
"pack_result",
"build_sdk_request",
"normalize_sysom_roa_body",
]
ReturnAs = Literal["body", "response", "dict"]
DEFAULT_SYSOM_ENDPOINT = "sysom.cn-hangzhou.aliyuncs.com"
_CONNECT_TIMEOUT_MS = 10_000
def api_action_to_snake(api_action: str) -> str:
    """Convert a PascalCase Action name to snake_case: ``ListDiagnosis`` -> ``list_diagnosis``."""
    name = api_action.strip()
    # First split before each capitalized word, then between a lowercase/digit and a capital.
    s1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    s2 = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s1)
    return s2.lower()


def resolve_sysom_async_api(
    api_name: str,
    sdk_client: SysOM20231230Client,
) -> Tuple[Type[TeaModel], Callable[..., Awaitable[Any]]]:
    """Resolve the ``Request`` class and the SDK's ``{snake}_async`` method from an OpenAPI Action name."""
    from alibabacloud_sysom20231230 import models as sysom_models

    action = api_name.strip()
    if not action:
        raise ValueError("api_name 不能为空")
    request_cls_name = f"{action}Request"
    if not hasattr(sysom_models, request_cls_name):
        raise LookupError(
            f"未找到模型 alibabacloud_sysom20231230.models.{request_cls_name},"
            "请确认 Action 为 PascalCase 且与 SDK 一致"
        )
    request_cls = getattr(sysom_models, request_cls_name)
    async_name = api_action_to_snake(action) + "_async"
    method = getattr(sdk_client, async_name, None)
    if method is None or not callable(method):
        raise LookupError(
            f"SDK 上不存在异步方法 {async_name!r}(来自 Action {action!r}),"
            "请核对名称或使用 raw_sdk_client()"
        )
    return request_cls, method
def pack_result(response: Any, body: Any, return_as: ReturnAs) -> Any:
    """Pick the return value from ``response`` / ``body`` according to ``return_as``."""
    if return_as == "response":
        return response
    if return_as == "body":
        return body
    if return_as == "dict":
        target = body if body is not None else response
        if target is None:
            return None
        if isinstance(target, dict):
            return target
        to_map = getattr(target, "to_map", None)
        if callable(to_map):
            return to_map()
        return str(target)
    raise ValueError(f"不支持的 return_as: {return_as!r},应为 body / response / dict")


def normalize_sysom_roa_body(inner: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Map the PascalCase fields commonly found in ROA JSON to the lowercase keys used by the SDK models,
    so downstream code can check standard error codes.

    The gateway may return ``Code`` / ``Message`` / ``RequestId``, while some alibabacloud models'
    ``from_map`` reads only lowercase keys, which leaves ``body.to_map()`` empty; this function works on
    the raw ``body`` dict from ``call_api_async`` instead.
    """
    if not inner:
        return {}
    out: Dict[str, Any] = dict(inner)
    if out.get("code") is None and "Code" in inner:
        out["code"] = inner["Code"]
    if out.get("message") is None and "Message" in inner:
        out["message"] = inner["Message"]
    if out.get("request_id") is None and "RequestId" in inner:
        out["request_id"] = inner["RequestId"]
    if out.get("recommend") is None and "Recommend" in inner:
        out["recommend"] = inner["Recommend"]
    if out.get("host_id") is None and "HostId" in inner:
        out["host_id"] = inner["HostId"]
    data = inner.get("data") if isinstance(inner.get("data"), dict) else inner.get("Data")
    if isinstance(data, dict):
        nd = dict(data)
        if "task_id" not in nd and "TaskId" in nd:
            nd["task_id"] = nd["TaskId"]
        # Polling data from GetDiagnosisResult
        if "status" not in nd and "Status" in nd:
            nd["status"] = nd["Status"]
        if "err_msg" not in nd and "ErrMsg" in nd:
            nd["err_msg"] = nd["ErrMsg"]
        # result / Result: some services put the payload in Result when result is null/{},
        # or return only the capitalized key
        rlow = nd.get("result")
        rhigh = nd.get("Result")
        if rhigh is not None and (
            rlow is None
            or rlow == ""
            or (isinstance(rlow, dict) and not rlow)
        ):
            nd["result"] = rhigh
        elif "result" not in nd and rhigh is not None:
            nd["result"] = rhigh
        out["data"] = nd
    return out
def _biz_code_success(code: Optional[str]) -> bool:
    """Return True when the business code equals "success" (case-insensitive)."""
    return str(code or "").strip().lower() == "success"


def _format_roa_biz_error(norm: Dict[str, Any]) -> str:
    """Build a readable error string from a normalized ROA body, preferring "code: message"."""
    c = norm.get("code")
    m = (norm.get("message") or "").strip()
    if c and m:
        return f"{c}: {m}"
    if m:
        return m
    if c:
        return str(c)
    # Neither code nor message: fall back to a truncated dump of the whole body.
    try:
        return json.dumps(norm, ensure_ascii=False)[:2000]
    except (TypeError, ValueError):
        return str(norm)[:2000]


def build_sdk_request(
    request_cls: Type[TeaModel],
    request: Optional[Union[TeaModel, Dict[str, Any]]],
) -> Tuple[Optional[TeaModel], Optional[str]]:
    """
    Normalize the ``request`` argument of ``call_api`` into an instance of the SDK's ``*Request`` class.

    Returns:
        ``(req, None)`` on success; ``(None, err_msg)`` on failure such as a type mismatch.
    """
    if request is None:
        return request_cls(), None
    if isinstance(request, dict):
        req = request_cls()
        req.from_map(request)
        return req, None
    if isinstance(request, request_cls):
        return request, None
    return (
        None,
        f"请求类型错误:需要 {request_cls.__name__} 或 dict,实际为 {type(request).__name__}",
    )
class SysomOpenAPIClient:
"""
- ``verify_auth=True``:解析凭证并调用 InitialSysom 校验(与 precheck 一致)。
- ``verify_auth=False``:仅解析凭证。
"""
def __init__(
self,
*,
verify_auth: bool = True,
credentials: Optional[Dict[str, str]] = None,
endpoint: str = DEFAULT_SYSOM_ENDPOINT,
) -> None:
self._endpoint = endpoint
if credentials is not None:
self._credentials = dict(credentials)
self._auth_method = "显式传入 credentials"
if verify_auth:
api_result = test_sysom_api(self._credentials)
if not api_result["success"]:
raise RuntimeError(
api_result.get("error", "SysOM API 校验失败")
+ (
f" — {api_result.get('detail', '')}"
if api_result.get("detail")
else ""
)
)
else:
resolved = resolve_sysom_auth(verify_api=verify_auth)
if not resolved.get("ok"):
raise RuntimeError(resolved.get("error", "无法解析 SysOM 凭证"))
self._credentials = resolved["credentials"]
self._auth_method = resolved.get("method", "unknown")
cfg = open_api_models.Config(
access_key_id=self._credentials["ak_id"],
access_key_secret=self._credentials["ak_secret"],
endpoint=self._endpoint,
user_agent="AlibabaCloud-Agent-Skills/alibabacloud-sysom-diagnosis",
)
if self._credentials.get("security_token"):
cfg.security_token = self._credentials["security_token"]
cfg.connect_timeout = _CONNECT_TIMEOUT_MS
self._sdk_client = SysOM20231230Client(cfg)
@property
def auth_method(self) -> str:
return self._auth_method
async def _call_invoke_diagnosis_roa_raw(
self,
request: Union[TeaModel, Dict[str, Any]],
*,
return_as: ReturnAs,
) -> Tuple[bool, Any, Optional[str]]:
"""经 ``call_api_async`` 取原始 body,避免 PascalCase 导致高层 ``body.to_map()`` 为空。"""
from alibabacloud_sysom20231230 import models as sysom_models
from alibabacloud_tea_openapi import utils_models as open_api_util_models
from alibabacloud_tea_openapi.utils import Utils
from alibabacloud_tea_util import models as tea_util_models
if return_as != "dict":
return False, None, "InvokeDiagnosis 当前仅支持 return_as=dict"
req, build_err = build_sdk_request(sysom_models.InvokeDiagnosisRequest, request)
if build_err is not None:
return False, None, build_err
body = req.to_map()
open_req = open_api_util_models.OpenApiRequest(
headers={},
body=Utils.parse_to_map(body),
)
params = open_api_util_models.Params(
action="InvokeDiagnosis",
version="2023-12-30",
protocol="HTTPS",
pathname="/api/v1/openapi/diagnosis/invoke_diagnosis",
method="POST",
auth_type="AK",
style="ROA",
req_body_type="json",
body_type="json",
)
runtime = tea_util_models.RuntimeOptions()
raw = await self._sdk_client.call_api_async(params, open_req, runtime)
if not isinstance(raw, dict):
return False, None, f"invoke_diagnosis 非预期响应类型: {type(raw)!r}"
status = raw.get("statusCode") or raw.get("status_code")
inner = raw.get("body") if isinstance(raw.get("body"), dict) else {}
norm = normalize_sysom_roa_body(inner)
if status != 200:
return False, norm, _format_roa_biz_error(norm) if norm else f"HTTP {status}"
if not _biz_code_success(norm.get("code")):
return False, norm, _format_roa_biz_error(norm)
return True, norm, None
async def _call_get_diagnosis_result_roa_raw(
self,
request: Union[TeaModel, Dict[str, Any]],
*,
return_as: ReturnAs,
) -> Tuple[bool, Any, Optional[str]]:
from alibabacloud_sysom20231230 import models as sysom_models
from alibabacloud_tea_openapi import utils_models as open_api_util_models
from alibabacloud_tea_openapi.utils import Utils
from alibabacloud_tea_util import models as tea_util_models
if return_as != "dict":
return False, None, "GetDiagnosisResult 当前仅支持 return_as=dict"
req, build_err = build_sdk_request(sysom_models.GetDiagnosisResultRequest, request)
if build_err is not None:
return False, None, build_err
query: Dict[str, str] = {}
if getattr(req, "task_id", None):
query["task_id"] = req.task_id
open_req = open_api_util_models.OpenApiRequest(
headers={},
query=Utils.query(query),
)
params = open_api_util_models.Params(
action="GetDiagnosisResult",
version="2023-12-30",
protocol="HTTPS",
pathname="/api/v1/openapi/diagnosis/get_diagnosis_results",
method="GET",
auth_type="AK",
style="ROA",
req_body_type="json",
body_type="json",
)
runtime = tea_util_models.RuntimeOptions()
raw = await self._sdk_client.call_api_async(params, open_req, runtime)
if not isinstance(raw, dict):
return False, None, f"get_diagnosis_result 非预期响应类型: {type(raw)!r}"
status = raw.get("statusCode") or raw.get("status_code")
inner = raw.get("body") if isinstance(raw.get("body"), dict) else {}
norm = normalize_sysom_roa_body(inner)
if status != 200:
return False, norm, _format_roa_biz_error(norm) if norm else f"HTTP {status}"
if not _biz_code_success(norm.get("code")):
return False, norm, _format_roa_biz_error(norm)
return True, norm, None
async def call_api(
self,
api_name: str,
request: Optional[Union[TeaModel, Dict[str, Any]]] = None,
*,
return_as: ReturnAs = "body",
) -> Tuple[bool, Any, Optional[str]]:
action = api_name.strip()
if action == "InvokeDiagnosis":
return await self._call_invoke_diagnosis_roa_raw(request, return_as=return_as)
if action == "GetDiagnosisResult":
return await self._call_get_diagnosis_result_roa_raw(request, return_as=return_as)
try:
request_cls, method = resolve_sysom_async_api(api_name, self._sdk_client)
except (LookupError, ValueError) as e:
return False, None, str(e)
req, build_err = build_sdk_request(request_cls, request)
if build_err is not None:
return False, None, build_err
try:
response = await method(req)
except Exception as exc: # noqa: BLE001
msg = getattr(exc, "message", None) or str(exc)
data = getattr(exc, "data", None)
if isinstance(data, dict) and data.get("Recommend"):
msg = f"{msg} | 诊断/建议: {data['Recommend']}"
return False, None, msg
status_code = getattr(response, "status_code", None)
body = getattr(response, "body", None)
if status_code == 200:
return True, pack_result(response, body, return_as), None
if body is None:
err = "未知错误(响应 body 为空)"
else:
code, message = getattr(body, "code", None), getattr(body, "message", None)
# 避免仅缺 message 时退化成 str(body) 不易读;与 diagnosis_helper 抽取逻辑一致
if code is not None and message is not None:
err = f"{code}: {message}"
elif code is not None:
m = (message or "").strip()
err = f"{code}: {m}" if m else str(code)
elif message is not None:
err = str(message)
else:
err = str(body)
return False, pack_result(response, body, return_as), err
class _LazySysomOpenAPIClient:
    """
    Lazy wrapper: ``SysomOpenAPIClient`` is constructed only on first attribute/method access.

    This avoids failures on merely importing the diagnosis subcommands when credentials are not set
    across separate Bash invocations in an Agent/IDE.
    """

    __slots__ = ("_inner",)

    def __init__(self) -> None:
        self._inner: Optional[SysomOpenAPIClient] = None

    def _get(self) -> SysomOpenAPIClient:
        if self._inner is None:
            self._inner = SysomOpenAPIClient()
        return self._inner

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not found on the wrapper itself; delegate to the real client.
        return getattr(self._get(), name)


sysom_openapi_client = _LazySysomOpenAPIClient()
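The lazy-singleton wrapper above defers construction until the first attribute access. A generic version of the same idea is sketched below to show when construction actually happens; `LazyProxy` and `ExpensiveClient` are hypothetical names introduced only for this illustration, and the factory argument is an assumption that the original wrapper does not have.

```python
from typing import Any, Callable, Optional


class LazyProxy:
    """Build the wrapped object on first attribute access, then reuse it."""

    __slots__ = ("_factory", "_inner")

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._inner: Optional[Any] = None

    def _get(self) -> Any:
        if self._inner is None:
            self._inner = self._factory()
        return self._inner

    def __getattr__(self, name: str) -> Any:
        # Called only for names missing on the proxy itself.
        return getattr(self._get(), name)


class ExpensiveClient:
    """Stand-in for a client whose constructor needs credentials."""

    created = 0

    def __init__(self) -> None:
        ExpensiveClient.created += 1

    def ping(self) -> str:
        return "pong"


client = LazyProxy(ExpensiveClient)
before = ExpensiveClient.created  # nothing constructed yet
reply = client.ping()             # first access triggers construction
after = ExpensiveClient.created
print(before, reply, after)
```

Creating the proxy at import time is therefore cheap, and a constructor failure (for example, missing credentials) surfaces only where the client is first used.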
FILE:scripts/sysom_cli/lib/precheck_envelope.py
# -*- coding: utf-8 -*-
"""Assemble a run_precheck result into the same JSON envelope as the precheck subcommand (shared by the precheck command and the remote gate)."""
from __future__ import annotations

from typing import Any, Dict

from sysom_cli.lib.guidance import (
    auth_path_choice_block,
    base_guidance_block,
    precheck_guidance_compact,
    precheck_guidance_success_light,
    remediation_for_precheck_failure,
    remediation_service_not_activated,
    scenario_hint_for_error_code,
)
from sysom_cli.lib.invoke_envelope_finalize import ensure_findings_finding_type
from sysom_cli.lib.precheck_summary import summarize_precheck_for_agent
from sysom_cli.lib.schema import agent_block, envelope
def envelope_from_precheck_result(check_result: Dict[str, Any]) -> Dict[str, Any]:
"""
check_result: run_precheck() 返回值。
与 PrecheckCommand.execute_local 输出一致。
"""
guidance_base = base_guidance_block()
if check_result["ok"]:
findings_ok = [
{
"severity": "info",
"title": "认证验证通过",
"detail": check_result["message"],
},
{
"severity": "info",
"title": "下一步",
"detail": (
"precheck 已通过。可在 memory 子命令加 --deep-diagnosis 发起深度诊断。"
"失败时检查云助手与控制台诊断授权;见 references/openapi-permission-guide.md。"
),
},
]
ensure_findings_finding_type(findings_ok)
return envelope(
action="precheck",
ok=True,
agent=agent_block(
status="normal",
summary=f"认证方式:{check_result['method']},验证成功",
findings=findings_ok,
),
data={
"auth_method": check_result["method"],
"role": check_result.get("role"),
"api_accessible": True,
"guidance": precheck_guidance_success_light(),
},
execution={"subsystem": "precheck", "mode": "local"},
)
err_code_pre = check_result.get("error_code")
is_service_not_activated = err_code_pre == "service_not_activated"
is_sysom_role_not_exist = err_code_pre == "sysom_role_not_exist"
needs_sysom_activation = is_service_not_activated or is_sysom_role_not_exist
err_title = "认证失败"
if is_service_not_activated:
err_title = "SysOM 服务未开通"
elif is_sysom_role_not_exist:
err_title = "SysOM 服务关联角色未就绪"
path_summary = summarize_precheck_for_agent(check_result)
primary_path = path_summary.get("primary_path", "configure_identity")
error_finding = {
"severity": "error",
"title": err_title,
"detail": check_result["error"],
}
findings: list = []
if needs_sysom_activation:
findings = [
error_finding,
{
"severity": "info",
"title": "处理说明",
"detail": "具体步骤见 data.remediation。",
},
]
else:
if primary_path in ("ecs_ram_role", "access_key"):
findings = [
error_finding,
{
"severity": "info",
"title": "修复引导",
"detail": "请按 data.remediation 顺序执行;详细文档见 data.guidance.read_next。",
},
]
else:
findings = [
error_finding,
{
"severity": "info",
"title": "选择凭证路径",
"detail": "见 data.guidance.auth_path_choice;步骤见 data.remediation。",
},
]
err_code = check_result.get("error_code", "auth_failed")
scenario = scenario_hint_for_error_code(err_code)
if needs_sysom_activation:
scenario = scenario or "A-K3_or_E-R4_sysom_not_activated"
if needs_sysom_activation:
if is_sysom_role_not_exist:
summary = (
"SysOM 服务关联角色未就绪(InitialSysom role_exist=false)。"
"请在 Alinux 控制台完成开通(SLR 通常随开通自动创建),等待后再 precheck"
)
else:
summary = (
"SysOM 服务未开通。请访问 https://alinux.console.aliyun.com/?source=cosh 完成服务开通"
)
next_steps = [
{
"tool": "Read",
"args": "references/openapi-permission-guide.md",
"reason": "场景 A-K3 / E-R4:开通与 SLR 相关引导",
},
]
remediation = remediation_service_not_activated()
data_out = {
"path_summary": path_summary,
"service_activated": False,
"activation_url": "https://alinux.console.aliyun.com/?source=cosh",
"guidance": {
"precheck_command": guidance_base["precheck_command"],
"read_next": list(guidance_base["read_next"]),
},
"remediation": remediation,
}
elif primary_path == "ecs_ram_role":
summary = f"认证失败,请按 data.remediation 完成配置。{path_summary['recommended_focus']}"
next_steps = [
{
"tool": "Read",
"args": "references/authentication.md(ECS RAM Role)",
"reason": "角色授权与 CLI 配置细节",
},
{
"tool": "Bash",
"args": "cd <sysom-diagnosis 技能根> && ./scripts/osops.sh precheck",
"reason": "修复后复验",
},
]
remediation = remediation_for_precheck_failure(
check_result, path_summary, needs_sysom_activation=False
)
guidance_merged = precheck_guidance_compact("ecs_ram_role", scenario_hint=scenario)
data_out: Dict[str, Any] = {
"path_summary": path_summary,
"guidance": guidance_merged,
"remediation": remediation,
}
if check_result.get("ecs_role_name"):
data_out["ecs_role_name"] = check_result["ecs_role_name"]
elif primary_path == "access_key":
summary = f"认证失败,请按 data.remediation 完成配置。{path_summary['recommended_focus']}"
next_steps = [
{
"tool": "Read",
"args": "references/authentication.md(AccessKey)",
"reason": "AK 配置与权限",
},
{
"tool": "Bash",
"args": "cd <sysom-diagnosis 技能根> && ./scripts/osops.sh configure",
"reason": "由 Agent 执行交互式配置(勿在对话输入密钥)",
},
{
"tool": "Bash",
"args": "cd <sysom-diagnosis 技能根> && ./scripts/osops.sh precheck",
"reason": "配置完成后复验",
},
]
remediation = remediation_for_precheck_failure(
check_result, path_summary, needs_sysom_activation=False
)
guidance_merged = precheck_guidance_compact("access_key", scenario_hint=scenario)
data_out = {
"path_summary": path_summary,
"guidance": guidance_merged,
"remediation": remediation,
}
else:
summary = f"认证失败,请按 data.remediation 完成凭证配置。{path_summary['recommended_focus']}"
next_steps = [
{
"tool": "AskUser",
"args": (
"请选择凭证方式:A) RAM 用户 AccessKey(本机/非 ECS) "
"B) ECS 实例 RAM Role(仅在阿里云 ECS 内运行本 CLI)"
),
"reason": "未配置凭证时必须先选路径,再执行对应配置步骤",
},
{
"tool": "Bash",
"args": "cd <sysom-diagnosis 技能根> && ./scripts/osops.sh configure",
"reason": "仅当用户选择 A:由 Agent 直接执行交互式配置(勿在对话输入密钥)",
},
{
"tool": "Bash",
"args": "cd <sysom-diagnosis 技能根> && ./scripts/osops.sh precheck",
"reason": "A 路径 configure 完成后,或 B 路径完成控制台+EcsRamRole 配置后复验",
},
{
"tool": "Read",
"args": "references/openapi-permission-guide.md(§4.1 ECS RAM Role)",
"reason": "仅当用户选择 B:绑定角色与 EcsRamRole 分步说明",
},
{
"tool": "Read",
"args": "references/authentication.md",
"reason": "AK 与 ECS RAM Role 详细步骤(控制台操作)",
},
]
remediation = remediation_for_precheck_failure(
check_result, path_summary, needs_sysom_activation=False
)
guidance_merged = {
"credential_policy": guidance_base["credential_policy"],
"precheck_command": guidance_base["precheck_command"],
"configure_command": guidance_base["configure_command"],
"read_next": list(guidance_base["read_next"]),
"auth_path_choice": auth_path_choice_block(),
}
data_out = {
"path_summary": path_summary,
"guidance": guidance_merged,
"remediation": remediation,
}
ensure_findings_finding_type(findings)
return envelope(
action="precheck",
ok=False,
agent=agent_block(
status="error",
summary=summary,
findings=findings,
next_steps=next_steps,
),
data=data_out,
error={
"code": check_result.get("error_code", "auth_failed"),
"message": check_result["error"],
},
execution={"subsystem": "precheck", "mode": "local"},
)
FILE:scripts/sysom_cli/lib/precheck_gate.py
# -*- coding: utf-8 -*-
"""远程 OpenAPI 前的环境门禁:复用 run_precheck 与同构信封。"""
from __future__ import annotations
from typing import Any, Dict, Optional, Tuple
from sysom_cli.lib.auth import run_precheck
from sysom_cli.lib.precheck_envelope import envelope_from_precheck_result
def remote_precheck_gate() -> Tuple[bool, Optional[Dict[str, Any]]]:
"""
Returns:
(True, None) 可继续远程调用;
(False, envelope) 与 `osops precheck` 失败同构的信封,不应再发 invoke。
"""
cr = run_precheck()
if cr["ok"]:
return True, None
return False, envelope_from_precheck_result(cr)
def merge_precheck_gate_failure_into_memory_envelope(
    quick_envelope: Dict[str, Any],
    precheck_env: Dict[str, Any],
) -> None:
    """
    Merge a precheck failure into an already-built memory quick envelope (--deep-diagnosis path).
    Modifies quick_envelope in place.
    """
    quick_envelope["ok"] = False
    if precheck_env.get("error"):
        quick_envelope["error"] = precheck_env["error"]
    pdata = precheck_env.get("data") or {}
    data = quick_envelope.setdefault("data", {})
    data["precheck_gate"] = {
        "ok": False,
        "remediation": pdata.get("remediation"),
        "guidance": pdata.get("guidance"),
        "check_details": pdata.get("check_details"),
        "path_summary": pdata.get("path_summary"),
    }
    agent = quick_envelope.setdefault("agent", {})
    p_agent = precheck_env.get("agent") or {}
    # The summary is replaced outright; neither the previous quick summary
    # nor the precheck summary is carried over.
    agent["summary"] = (
        "data.local 已包含所有本机 OOM 发现,请直接展示给用户,无需采集额外信息。"
        "深度诊断需要认证,请按 data.precheck_gate.remediation 引导用户配置凭证以获取更全面的远程分析。"
    )
    existing = agent.get("findings") or []
    extra = list(p_agent.get("findings") or [])
    # Cap the precheck findings: keep the first one and replace the rest
    # with a single pointer note.
    if len(extra) > 2:
        extra = [
            extra[0],
            {
                "severity": "info",
                "title": "预检补充说明",
                "detail": (
                    "完整修复步骤见 data.precheck_gate.remediation 与 path_summary;"
                    "无需将 precheck 全部 findings 逐条展开。"
                ),
            },
        ]
    agent["findings"] = list(existing) + extra
    # Clear next on precheck failure so the agent does not read it as
    # "there are more commands to run".
    agent["next"] = []
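The findings cap applied by `merge_precheck_gate_failure_into_memory_envelope` can be looked at in isolation. The following is a minimal sketch rather than the module's code: `cap_findings` and the English note text are hypothetical names introduced only for illustration, and the sketch assumes findings are plain dictionaries.

```python
from typing import Any, Dict, List


def cap_findings(extra: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep up to two findings as-is; otherwise keep the first plus one pointer note."""
    if len(extra) <= 2:
        return list(extra)
    return [
        extra[0],
        {
            # Hypothetical note text; the real module emits a Chinese message.
            "severity": "info",
            "title": "Precheck note",
            "detail": "Full steps are in data.precheck_gate.remediation.",
        },
    ]


if __name__ == "__main__":
    many = [{"title": f"finding-{i}"} for i in range(5)]
    for item in cap_findings(many):
        print(item["title"])
```

Capping this way keeps the merged envelope short while still pointing the agent at the full remediation data.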
FILE:scripts/sysom_cli/lib/precheck_summary.py
# -*- coding: utf-8 -*-
"""Precheck result compression and path recommendation: merges AK sources and compares how far the AK path and the ECS RAM Role path have progressed."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
def _find_row(checked: List[Dict[str, Any]], method: str) -> Optional[Dict[str, Any]]:
    for row in checked:
        if row.get("method") == method:
            return row
    return None


def _find_rows_prefix(checked: List[Dict[str, Any]], prefix: str) -> List[Dict[str, Any]]:
    return [r for r in checked if str(r.get("method", "")).startswith(prefix)]


def _status_text(status: str) -> str:
    # Drop a leading ✗ / ✓ marker and return the remaining status text.
    s = (status or "").strip()
    if s.startswith(("✗", "✓")):
        return s[1:].strip()
    return s
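def summarize_precheck_for_agent(check_result: Dict[str, Any]) -> Dict[str, Any]:
"""
For agent display: merges "environment-variable AK" and "config-file AK" into one row and aggregates the RAM Role rows;
based on how far each path has progressed, recommends following up on the AK path, the RAM Role path, or console activation only.
"""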
checked: List[Dict[str, Any]] = list(check_result.get("checked") or [])
err_code = check_result.get("error_code")
ecs_role_name = check_result.get("ecs_role_name")
ecs_row = _find_row(checked, "ECS元数据")
env_row = _find_row(checked, "环境变量 AKSK")
file_ak_row = _find_row(checked, "配置文件 AKSK")
file_generic = _find_row(checked, "配置文件")
env_unconfigured = env_row is not None and "未配置" in (env_row.get("status") or "")
env_attempted = env_row is not None and not env_unconfigured
file_ak_attempted = file_ak_row is not None
# Merged AK row
ak_parts: List[str] = []
if env_attempted:
ak_parts.append(f"环境变量:{_status_text(env_row.get('status', ''))}")
elif env_row is not None:
ak_parts.append("环境变量:未配置")
if file_ak_attempted:
ak_parts.append(f"配置文件 AK:{_status_text(file_ak_row.get('status', ''))}")
if not ak_parts:
if file_generic is not None:
ak_parts.append(f"配置文件:{_status_text(file_generic.get('status', ''))}")
else:
ak_parts.append("未检测到环境变量 AK 与配置文件 AK 模式")
merged_ak = "AccessKey(环境变量与本地配置文件 AK 模式,任选其一即可)"
if env_attempted or file_ak_attempted:
merged_ak_status = ";".join(ak_parts)
else:
merged_ak_status = ";".join(ak_parts) if ak_parts else "未配置有效 AK"
ram_rows = [
r
for r in checked
if str(r.get("method", "")).lower().startswith("配置文件 ecsramrole")
or str(r.get("method", "")).startswith("配置文件 ECS RAM Role")
or str(r.get("method", "")).startswith("配置文件 RAM Role")
]
if ram_rows:
ram_status = " | ".join(
f"{r.get('method')}: {_status_text(r.get('status', ''))}" for r in ram_rows
)
else:
ram_status = "未使用配置文件中的 ECS RAM Role 模式或未配置"
merged_rows = [
{"label": "ECS 元数据(实例是否绑定 RAM 角色)", "status": ecs_row.get("status", "") if ecs_row else "—"},
{"label": "AccessKey(环境变量或配置文件,二选一即可)", "status": merged_ak_status},
{"label": "ECS RAM Role(配置文件 ECS RAM Role / RAM Role)", "status": ram_status},
]
ecs_status = (ecs_row.get("status") or "") if ecs_row else ""
metadata_role_bound = bool(ecs_role_name) or ("✓" in ecs_status)
file_profile_missing_or_invalid = file_generic is not None and "✗" in (file_generic.get("status") or "")
progress: Dict[str, Any] = {
"activation_needed": err_code in ("service_not_activated", "sysom_role_not_exist"),
"metadata_role_bound": metadata_role_bound,
"env_ak_unconfigured": env_unconfigured and env_row is not None,
"env_ak_attempted": env_attempted,
"file_ak_attempted": file_ak_attempted,
"file_profile_missing_or_invalid": file_profile_missing_or_invalid,
"ram_profile_configured": bool(ram_rows),
"ecs_role_name": ecs_role_name,
}
activation_needed = bool(progress["activation_needed"])
# Progress score for the AK path (higher means closer to "only console activation left")
ak_score = 0
if env_attempted:
ak_score += 2
if file_ak_attempted:
ak_score += 2
if activation_needed and (env_attempted or file_ak_attempted):
ak_score += 3
# Progress score for the RAM Role path
ram_score = 0
if ecs_role_name:
ram_score += 2
if ram_rows:
ram_score += 2
if activation_needed and ram_rows:
ram_score += 2
focus: str
primary_path: str
if activation_needed:
# AK already works on one side and the other side is unconfigured: do not steer the user toward configuring a second AK
if file_ak_attempted and env_unconfigured and not env_attempted:
primary_path = "console_only"
focus = (
"AccessKey 已从配置文件生效(InitialSysom 已调用);账号侧未开通或 SLR 未就绪。"
"请优先在 Alinux/SysOM 控制台完成开通,无需再配置环境变量 AK。"
)
elif env_attempted and not file_ak_attempted:
primary_path = "console_only"
focus = (
"AccessKey 已从环境变量生效;账号侧未开通或 SLR 未就绪。"
"请优先在控制台完成开通;若希望改用配置文件 AK,再单独切换即可,不必两套同时配置。"
)
elif ak_score > ram_score and (env_attempted or file_ak_attempted):
primary_path = "access_key_then_console"
focus = (
"当前 AccessKey 路径完成度更高:InitialSysom 已返回需账号侧开通 SysOM(SLR 通常随 Alinux 控制台开通自动创建)。"
"请优先 Alinux/SysOM 控制台;环境变量 AK 与配置文件 AK 只需其一有效即可。"
)
elif ram_score > ak_score and (ecs_role_name or ram_rows):
primary_path = "ecs_ram_role"
focus = (
"当前 ECS RAM Role 路径完成度更高:请优先完成 RAM 策略、实例绑定与 CLI 的 ECS RAM Role 模式配置,"
"再处理 SysOM 控制台开通。"
)
else:
primary_path = "configure_identity"
focus = (
"请先建立一种可用身份:环境变量 AK、本地配置文件 AK,或(在 ECS 上)RAM 角色 + ECS RAM Role;"
"三条路线满足其一即可,不必并行配置多种。"
)
else:
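# Non-activation failures: single-path priority (the further-along path gets the guidance exclusively; AK and RAM are not listed side by side)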
if ram_score > ak_score and (ecs_role_name or ram_rows):
primary_path = "ecs_ram_role"
if ram_rows and ecs_role_name:
focus = (
"实例已绑定 RAM 角色且本地配置文件为 ECS RAM Role/RAM Role:请优先确认该角色具备 AliyunSysomFullAccess、"
"等待策略生效后重试 precheck;勿再并行配置 AccessKey。"
)
elif ecs_role_name:
focus = (
f"实例已绑定 RAM 角色「{ecs_role_name}」:请授予 AliyunSysomFullAccess,"
"并在实例内将 CLI 配置为 ECS RAM Role(或核对现有 ECS RAM Role 配置)后重试 precheck;勿并行配置 AccessKey。"
)
else:
focus = (
"配置文件已指向 ECS RAM Role:请确认角色具备 AliyunSysomFullAccess、实例绑定与 STS 拉取正常,"
"勿并行配置 AccessKey。"
)
elif ak_score > ram_score and (env_attempted or file_ak_attempted):
primary_path = "access_key"
focus = (
"AccessKey 路径已尝试:请仅沿该路径修复(密钥有效性、AliyunSysomFullAccess、环境变量与 precheck 同进程等),"
"勿同时引入 ECS RAM Role 排查。"
)
elif ram_score == ak_score:
# Tie: if the instance already has a bound role, prefer the RAM path so that a bound role is not treated as "no path chosen yet"
if ecs_role_name or ram_rows:
primary_path = "ecs_ram_role"
if ram_rows and ecs_role_name:
focus = (
"实例已绑定 RAM 角色且本地配置文件为 ECS RAM Role/RAM Role:请优先确认该角色具备 AliyunSysomFullAccess、"
"等待策略生效后重试 precheck;勿再并行配置 AccessKey。"
)
elif ecs_role_name:
focus = (
f"实例已绑定 RAM 角色「{ecs_role_name}」:请授予 AliyunSysomFullAccess,"
"并在实例内将 CLI 配置为 ECS RAM Role(或核对现有 ECS RAM Role 配置)后重试 precheck;勿并行配置 AccessKey。"
)
else:
focus = (
"配置文件已指向 ECS RAM Role:请确认角色具备 AliyunSysomFullAccess、实例绑定与 STS 拉取正常,"
"勿并行配置 AccessKey。"
)
elif env_attempted or file_ak_attempted:
primary_path = "access_key"
focus = (
"AccessKey 路径已尝试:请仅沿该路径修复(密钥有效性、AliyunSysomFullAccess、环境变量与 precheck 同进程等),"
"勿同时引入 ECS RAM Role 排查。"
)
else:
primary_path = "configure_identity"
focus = (
"尚未形成明确领先路径:请先建立一种可用身份——环境变量 AK、本地配置文件 AK,"
"或(在 ECS 上)实例 RAM 角色 + ECS RAM Role;满足其一即可。"
)
else:
primary_path = "configure_identity"
focus = (
"尚未形成明确领先路径:请先建立一种可用身份——环境变量 AK、本地配置文件 AK,"
"或(在 ECS 上)实例 RAM 角色 + ECS RAM Role;满足其一即可。"
)
return {
"merged_rows": merged_rows,
"scores": {"ak_path": ak_score, "ram_role_path": ram_score},
"primary_path": primary_path,
"recommended_focus": focus,
"progress": progress,
}
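The two progress scores that drive `primary_path` are simple additive counters. The sketch below restates that arithmetic as a standalone function so the weights are easy to check; `score_paths` and its boolean parameters are hypothetical names for illustration and are not part of the module.

```python
from typing import Optional, Tuple


def score_paths(
    env_attempted: bool,
    file_ak_attempted: bool,
    ecs_role_name: Optional[str],
    ram_profile_configured: bool,
    activation_needed: bool,
) -> Tuple[int, int]:
    """Return (ak_score, ram_score) using the same weights as the summary code."""
    ak_score = 0
    if env_attempted:
        ak_score += 2
    if file_ak_attempted:
        ak_score += 2
    # An activation error proves the AK already reached the service.
    if activation_needed and (env_attempted or file_ak_attempted):
        ak_score += 3

    ram_score = 0
    if ecs_role_name:
        ram_score += 2
    if ram_profile_configured:
        ram_score += 2
    if activation_needed and ram_profile_configured:
        ram_score += 2
    return ak_score, ram_score


if __name__ == "__main__":
    # Environment-variable AK reached the service, which reports "not activated".
    print(score_paths(True, False, None, False, True))
    # Instance role bound and a RAM Role profile configured, no activation error.
    print(score_paths(False, False, "demo-role", True, False))
```

Because an activation error adds 3 on the AK side but only 2 on the RAM side, an AK that already reached the service outranks a partially configured role when both are present.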
FILE:scripts/sysom_cli/lib/schema.py
# -*- coding: utf-8 -*-
from __future__ import annotations
import json
from typing import Any, Dict, List, Optional
FORMAT_NAME = "sysom_agent"
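# Version of the envelope "field contract": bump it when a committed data/agent key is removed or renamed, or when a committed reference-string format (such as the shape of data_refs) changes.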
FORMAT_VERSION = "3.4"
def agent_block(
    status: str,
    summary: str,
    *,
    findings: Optional[List[Dict[str, Any]]] = None,
    next_steps: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    return {
        "status": status,
        "summary": summary,
        "findings": findings or [],
        "next": next_steps or [],
    }


def envelope(
    *,
    action: str,
    ok: bool,
    agent: Dict[str, Any],
    data: Optional[Dict[str, Any]] = None,
    error: Optional[Dict[str, Any]] = None,
    execution: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    return {
        "format": FORMAT_NAME,
        "version": FORMAT_VERSION,
        "ok": ok,
        "action": action,
        "error": error,
        "agent": agent,
        "data": data if data is not None else {},
        "execution": execution if execution is not None else {},
    }


def dumps(obj: Dict[str, Any], *, compact: bool = False) -> str:
    if compact:
        return json.dumps(obj, ensure_ascii=False, separators=(",", ":"))
    return json.dumps(obj, ensure_ascii=False, indent=2)


def error_envelope(action: str, code: str, message: str) -> Dict[str, Any]:
    return envelope(
        action=action,
        ok=False,
        agent=agent_block("unknown", message, findings=[]),
        error={"code": code, "message": message},
        execution={"mode": "local"},
    )
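To make the envelope contract concrete, here is a small self-contained sketch of building an error envelope and serializing it in both modes. It mirrors the field names above, but `make_envelope` and `to_json` are hypothetical stand-ins written for this example, not imports from `sysom_cli.lib.schema`.

```python
import json
from typing import Any, Dict, Optional


def make_envelope(
    action: str,
    ok: bool,
    summary: str,
    error: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a dictionary with the same top-level keys as the schema envelope."""
    return {
        "format": "sysom_agent",
        "version": "3.4",
        "ok": ok,
        "action": action,
        "error": error,
        "agent": {"status": "error", "summary": summary, "findings": [], "next": []},
        "data": {},
        "execution": {},
    }


def to_json(obj: Dict[str, Any], *, compact: bool = False) -> str:
    """Serialize without escaping non-ASCII text; compact mode drops all whitespace."""
    if compact:
        return json.dumps(obj, ensure_ascii=False, separators=(",", ":"))
    return json.dumps(obj, ensure_ascii=False, indent=2)


env = make_envelope(
    "precheck",
    False,
    "凭证未配置",
    {"code": "auth_failed", "message": "no credentials"},
)
compact = to_json(env, compact=True)
pretty = to_json(env)

if __name__ == "__main__":
    print(compact)
```

Passing `ensure_ascii=False` keeps Chinese summaries readable in the output instead of turning them into `\uXXXX` escapes, and the compact separators save tokens when the envelope is fed to an agent.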
FILE:scripts/sysom_cli/lib/specialty_args.py
# -*- coding: utf-8 -*-
"""OpenAPI-side arguments for the non-memory `io`/`net`/`load` subcommands (same source as the DiagnosisBackend/invoke implementation; excludes --service-name, which each subcommand fixes)."""
from __future__ import annotations
from typing import Any, Dict, List, Tuple
SPECIALTY_INVOKE_ARGS: List[Tuple[Any, Dict[str, Any]]] = [
(
["--verbose-envelope"],
{
"action": "store_true",
"help": "输出展开型 agent.summary(默认紧凑:省 token,结论见 data.remote)。",
},
),
(
["--channel"],
{"default": "ecs", "help": "诊断通道,官方文档当前固定为 ecs"},
),
(
["--params"],
{
"default": None,
"help": "params JSON 字符串(与 OpenAPI 文档「诊断参数说明」一致)",
},
),
(
["--params-file"],
{"default": None, "help": "params JSON 文件路径"},
),
(
["--instance"],
{
"default": None,
"help": "合并到 params.instance;省略时若在 ECS 上可自动从元数据 instance-id 补全",
},
),
(
["--region"],
{
"default": None,
"help": "合并到 params.region;省略时若在 ECS 上可自动从元数据 region-id 补全",
},
),
(
["--timeout"],
{"type": int, "default": 300, "help": "轮询 GetDiagnosisResult 总超时秒数"},
),
(
["--poll-interval"],
{"type": int, "default": 1, "help": "轮询间隔秒数"},
),
]
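Each entry in `SPECIALTY_INVOKE_ARGS` is a `(flags, kwargs)` pair. How the command registry consumes these pairs is not shown in this excerpt; the sketch below assumes the pairs are forwarded to `argparse.ArgumentParser.add_argument`, which is the shape they fit. `DEMO_ARGS` and `build_parser` are hypothetical names, and only a subset of the real arguments is reproduced.

```python
import argparse
from typing import Any, Dict, List, Tuple

# A reduced copy of the argument table, for illustration only.
DEMO_ARGS: List[Tuple[List[str], Dict[str, Any]]] = [
    (["--verbose-envelope"], {"action": "store_true", "help": "expanded agent.summary"}),
    (["--channel"], {"default": "ecs", "help": "diagnosis channel"}),
    (["--timeout"], {"type": int, "default": 300, "help": "total polling timeout in seconds"}),
    (["--poll-interval"], {"type": int, "default": 1, "help": "polling interval in seconds"}),
]


def build_parser(specs: List[Tuple[List[str], Dict[str, Any]]]) -> argparse.ArgumentParser:
    """Register every (flags, kwargs) pair on a fresh parser."""
    parser = argparse.ArgumentParser(prog="demo")
    for flags, kwargs in specs:
        parser.add_argument(*flags, **kwargs)
    return parser


ns = build_parser(DEMO_ARGS).parse_args(["--timeout", "60"])

if __name__ == "__main__":
    print(ns)
```

Keeping the table as data lets several subcommands share it through `list(SPECIALTY_INVOKE_ARGS)` without duplicating flag definitions.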
FILE:scripts/sysom_cli/lib/specialty_command.py
# -*- coding: utf-8 -*-
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict
from sysom_cli.core.base import ExecutionMode, RemoteOnlyCommand
from sysom_cli.lib.diagnosis_backend import get_diagnosis_backend
class BaseServiceSpecialtyCommand(RemoteOnlyCommand):
    """SysOM specialty with a fixed service_name; remote-only, does not go through MEMORY_MODE."""

    SERVICE_NAME: str = ""

    @property
    def command_name(self) -> str:
        return self.SERVICE_NAME

    def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
        return get_diagnosis_backend().invoke_specialty(self.SERVICE_NAME, ns)

    @property
    def supported_modes(self) -> Dict[str, bool]:
        return {
            ExecutionMode.LOCAL: False,
            ExecutionMode.REMOTE: True,
            ExecutionMode.HYBRID: False,
        }
FILE:scripts/sysom_cli/load/__init__.py
# -*- coding: utf-8 -*-
"""SysOM specialties for system load and scheduling (thin wrappers around service_name)."""
FILE:scripts/sysom_cli/load/delay/__init__.py
# -*- coding: utf-8 -*-
FILE:scripts/sysom_cli/load/delay/command.py
# -*- coding: utf-8 -*-
from __future__ import annotations
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.specialty_args import SPECIALTY_INVOKE_ARGS
from sysom_cli.lib.specialty_command import BaseServiceSpecialtyCommand
@command_metadata(
    name="delay",
    help="SysOM 专项 delay:调度延迟(nosched)。params 见 references/diagnoses/delay.md",
    subsystem="load",
    args=list(SPECIALTY_INVOKE_ARGS),
)
class DelayCommand(BaseServiceSpecialtyCommand):
    SERVICE_NAME = "delay"
FILE:scripts/sysom_cli/load/loadtask/__init__.py
# -*- coding: utf-8 -*-
FILE:scripts/sysom_cli/load/loadtask/command.py
# -*- coding: utf-8 -*-
from __future__ import annotations
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.specialty_args import SPECIALTY_INVOKE_ARGS
from sysom_cli.lib.specialty_command import BaseServiceSpecialtyCommand
@command_metadata(
    name="loadtask",
    help="SysOM 专项 loadtask:系统负载诊断。params 见 references/diagnoses/loadtask.md",
    subsystem="load",
    args=list(SPECIALTY_INVOKE_ARGS),
)
class LoadtaskCommand(BaseServiceSpecialtyCommand):
    SERVICE_NAME = "loadtask"
FILE:scripts/sysom_cli/memory/__init__.py
# sysom_cli.memory: subcommands for quick local memory triage (classify / memgraph / oom / …)
FILE:scripts/sysom_cli/memory/classify/__init__.py
# memory classify
FILE:scripts/sysom_cli/memory/classify/command.py
# -*- coding: utf-8 -*-
"""Local memory classification: multi-feature classification and routing suggestions."""
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict, List
from sysom_cli.core.base import BaseCommand, ExecutionMode
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.schema import agent_block, envelope
from sysom_cli.memory.lib.classify_engine import build_remote_analysis_payload, run_classify
from sysom_cli.memory.lib.envelope_memory import (
next_steps_struct,
oom_diagnosis_invoke_extra_purpose_zh,
)
from sysom_cli.memory.lib.memory_envelope_finalize import finalize_memory_envelope
from sysom_cli.memory.lib.memory_remote_helpers import (
run_memory_deep_diagnosis_local_first,
run_memory_remote_invoke,
)
from sysom_cli.memory.lib.shared_invoke_args import MEMORY_DEEP_DIAGNOSIS_ARGS
@command_metadata(
name="classify",
help=(
"本机快速排查:meminfo / 内核 OOM 线索 / 进程 RSS 粗采样,给出类别与建议的 SysOM 专项;"
"可选 --deep-diagnosis:快速排查后继续深度诊断。"
),
subsystem="memory",
args=list(MEMORY_DEEP_DIAGNOSIS_ARGS),
)
class ClassifyCommand(BaseCommand):
"""Local memory classify: quick triage only by default; the optional --deep-diagnosis flag continues into deep diagnosis."""
@property
def command_name(self) -> str:
return "classify"
@property
def supported_modes(self) -> Dict[str, bool]:
return {
ExecutionMode.LOCAL: True,
ExecutionMode.REMOTE: True,
ExecutionMode.HYBRID: False,
}
def execute_local(self, ns: Namespace) -> Dict[str, Any]:
result = run_classify()
remote_analysis_value = build_remote_analysis_payload()
recommended = result.recommended_service_name
summary = result.primary_reason_zh + (
f"(建议 SysOM 专项:{recommended},置信度约 {result.confidence:.2f})"
)
oom_hc = int((result.oom_local or {}).get("hit_count") or 0)
if recommended == "oomcheck" and oom_hc > 1:
summary += f" 检出 {oom_hc} 次 OOM,可用 --oom-time 指定某次。"
if getattr(ns, "deep_diagnosis", False):
return run_memory_deep_diagnosis_local_first(
recommended=recommended,
memory_action="memory_classify",
ns=ns,
remote_analysis_value=remote_analysis_value,
verbose_summary=f"深度诊断:已按归类建议发起专项「{recommended}」。{summary}",
)
ff = result.facts
meminfo_finding: Dict[str, Any] = {
"kind": "meminfo_and_classify",
"mem_total_kb": ff.get("mem_total_kb"),
"mem_available_kb": ff.get("mem_available_kb"),
"mem_available_ratio": ff.get("mem_available_ratio"),
"oom_event_count": ff.get("oom_event_count"),
}
findings: List[Dict[str, Any]] = [meminfo_finding]
if result.oom_local:
findings.append({"kind": "oom_kernel_hits", "oom_event_count": oom_hc})
next_actions = next_steps_struct(
recommended,
ns,
diagnosis_extra_purpose_zh=(
oom_diagnosis_invoke_extra_purpose_zh(oom_hc)
if recommended == "oomcheck"
else None
),
)
agent = agent_block(
"normal",
summary,
findings=findings,
next_steps=next_actions,
)
data: Dict[str, Any] = {
"categories": result.categories,
"recommended_service_name": recommended,
"confidence": result.confidence,
"primary_reason_zh": result.primary_reason_zh,
"facts": result.facts,
}
if result.oom_local:
ol = result.oom_local
data["oom_local"] = ol
data["oom_signal"] = ol["hit_count"] > 0
data["hit_count"] = ol["hit_count"]
out = envelope(
action="memory_classify",
ok=True,
agent=agent,
data=data,
execution={"subsystem": "memory", "phase": "quick_review"},
)
return finalize_memory_envelope(out, ns, verbose_summary=summary)
def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
result = run_classify()
remote_analysis_value = build_remote_analysis_payload()
recommended = result.recommended_service_name
rsummary = f"深度诊断模式:按归类建议直接发起专项「{recommended}」。"
out = envelope(
action="memory_classify",
ok=True,
agent=agent_block("normal", rsummary, findings=[], next_steps=[]),
data={
"recommended_service_name": recommended,
"remote_analysis_value": remote_analysis_value,
"confidence": result.confidence,
"primary_reason_zh": result.primary_reason_zh,
"categories": result.categories,
},
execution={"subsystem": "memory", "mode": "remote", "phase": "remote_invoke"},
)
return run_memory_remote_invoke(
out, recommended, ns, verbose_summary=rsummary
)
FILE:scripts/sysom_cli/memory/javamem/__init__.py
# -*- coding: utf-8 -*-
FILE:scripts/sysom_cli/memory/javamem/command.py
# -*- coding: utf-8 -*-
"""Quick-triage skeleton for Java workloads: Java traits among high-RSS processes → suggest SysOM javamem."""
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict, List
from sysom_cli.core.base import BaseCommand, ExecutionMode
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.schema import agent_block, envelope
from sysom_cli.memory.lib.classify_engine import java_go_hints_from_rows, memory_ps_top_sample
from sysom_cli.memory.lib.envelope_memory import (
next_steps_struct,
)
from sysom_cli.memory.lib.memory_envelope_finalize import finalize_memory_envelope
from sysom_cli.memory.lib.memory_remote_helpers import (
run_memory_deep_diagnosis_local_first,
run_memory_remote_invoke,
)
from sysom_cli.memory.lib.remote_capabilities import remote_analysis_value_map
from sysom_cli.memory.lib.shared_invoke_args import MEMORY_DEEP_DIAGNOSIS_ARGS
def _limits_javamem() -> str:
return (
"仅基于进程名 RSS 粗采样,非 JVM 堆/GC 分析;"
"堆、GC、语言侧结论需 SysOM javamem 专项。"
)
@command_metadata(
name="javamem",
help="本机快速排查:高内存进程中 Java 特征;建议 SysOM javamem;可选 --deep-diagnosis 接深度诊断。",
subsystem="memory",
args=list(MEMORY_DEEP_DIAGNOSIS_ARGS),
)
class JavamemHintCommand(BaseCommand):
@property
def command_name(self) -> str:
return "javamem"
@property
def supported_modes(self) -> Dict[str, bool]:
return {
ExecutionMode.LOCAL: True,
ExecutionMode.REMOTE: True,
ExecutionMode.HYBRID: False,
}
def execute_local(self, ns: Namespace) -> Dict[str, Any]:
recommended = "javamem"
remote_analysis_value = remote_analysis_value_map()
if getattr(ns, "deep_diagnosis", False):
return run_memory_deep_diagnosis_local_first(
recommended=recommended,
memory_action="memory_javamem_hint",
ns=ns,
remote_analysis_value=remote_analysis_value,
verbose_summary=(
"深度诊断:已发起 SysOM javamem 专项。"
"(未跑本机进程 RSS 采样;本路径以远程专项为主。)"
),
)
top = memory_ps_top_sample()
java_hint, _ = java_go_hints_from_rows(top)
sample = [{"comm": c, "rss_kb": rss} for c, _, rss in top[:8]]
if java_hint:
summary = "高 RSS 进程中出现 Java 相关工作负载,建议 SysOM javamem 专项。"
cats = ["java_workload"]
else:
summary = (
"当前粗采样未见明显 Java 进程特征;若问题与 Java 相关仍可直接发起 javamem,"
"或先用 memory classify 综合归类。"
)
cats = ["general"]
next_actions = next_steps_struct(recommended, ns)
agent = agent_block(
"normal",
summary,
findings=[
{"kind": "rss_top_sample", "process_count": len(sample)}
],
next_steps=next_actions,
)
data: Dict[str, Any] = {
"recommended_service_name": recommended,
"java_signal": java_hint,
"rss_top_sample": sample,
}
out = envelope(
action="memory_javamem_hint",
ok=True,
agent=agent,
data=data,
execution={"subsystem": "memory", "phase": "quick_review"},
)
return finalize_memory_envelope(out, ns, verbose_summary=summary)
def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
recommended = "javamem"
remote_analysis_value = remote_analysis_value_map()
rsummary = "深度诊断模式:直接发起 javamem(未跑本机进程采样)。"
out = envelope(
action="memory_javamem_hint",
ok=True,
agent=agent_block("normal", rsummary, findings=[], next_steps=[]),
data={
"recommended_service_name": recommended,
"remote_analysis_value": remote_analysis_value,
},
execution={"subsystem": "memory", "mode": "remote", "phase": "remote_invoke"},
)
return run_memory_remote_invoke(
out, recommended, ns, verbose_summary=rsummary
)
FILE:scripts/sysom_cli/memory/lib/__init__.py
# Internal library of the memory subsystem (classification, remote-value descriptions, etc.)
FILE:scripts/sysom_cli/memory/lib/classify_engine.py
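# -*- coding: utf-8 -*-
"""
Lightweight local classification of memory problems: meminfo, a coarse RSS sample of local processes, and the quick
conclusion of `memory oom`, reused through `oom_quick.analyze_oom_local` as the OOM classification signal. Does not call OpenAPI.
"""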
from __future__ import annotations
import re
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from sysom_cli.memory.lib.oom_quick import analyze_oom_local
from sysom_cli.memory.lib.remote_capabilities import remote_analysis_value_map
def _read_meminfo() -> Dict[str, int]:
    out: Dict[str, int] = {}
    path = Path("/proc/meminfo")
    if not path.is_file():
        return out
    for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
        # Only "Name:   <number> kB" lines are kept; unitless counters are skipped.
        m = re.match(r"^(\w+):\s+(\d+)\s+kB\s*$", line)
        if m:
            out[m.group(1)] = int(m.group(2))
    return out
def meminfo_quick_facts() -> Dict[str, Any]:
"""Lightweight summary of local /proc/meminfo (in kB), used by quick_review in memory memgraph and similar commands."""
mem = _read_meminfo()
total_kb = mem.get("MemTotal", 0)
avail_kb = mem.get("MemAvailable") or mem.get("MemFree", 0)
swap_total = mem.get("SwapTotal", 0)
swap_free = mem.get("SwapFree", 0)
facts: Dict[str, Any] = {
"mem_total_kb": total_kb,
"mem_available_kb": avail_kb,
"swap_total_kb": swap_total,
"swap_free_kb": swap_free,
}
if total_kb > 0:
facts["mem_available_ratio"] = round(avail_kb / total_kb, 4)
for k in ("Slab", "Buffers", "Cached", "AnonPages", "Shmem"):
if k in mem:
facts[f"{k.lower()}_kb"] = mem[k]
return facts
def _top_rss_processes(limit: int = 24) -> List[Tuple[str, str, int]]:
    """Return [(comm, cmdline_snippet, rss_kb), ...] sorted by RSS descending; empty list on failure.

    Only `comm` is sampled, so the cmdline_snippet slot repeats comm.
    """
    try:
        r = subprocess.run(
            ["ps", "-eo", "comm,rss", "--sort=-rss", "--no-headers"],
            capture_output=True,
            text=True,
            timeout=15,
        )
        if r.returncode != 0 or not r.stdout:
            return []
        rows: List[Tuple[str, str, int]] = []
        for line in r.stdout.strip().splitlines():
            # Split from the right: comm may contain spaces, rss is always the last field.
            parts = line.rsplit(None, 1)
            if len(parts) < 2:
                continue
            comm, rss_s = parts[0].strip(), parts[1].strip()
            try:
                rss = int(rss_s)
            except ValueError:
                continue
            rows.append((comm, comm, rss))
            if len(rows) >= limit:
                break
        return rows
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return []
def _java_go_hints(rows: List[Tuple[str, str, int]]) -> Tuple[bool, bool]:
    java = False
    go = False
    for comm, _, _ in rows[:15]:
        c = comm.lower()
        if "java" in c:
            java = True
        # Go process comm values vary widely; match only a few very weak markers
        # heuristically to avoid false positives.
        if any(x in c for x in ("___go_", "dlv", "gopls")):
            go = True
    return java, go
@dataclass
class ClassifyResult:
"""Classification result."""
categories: List[str] = field(default_factory=list)
category_labels_zh: Dict[str, str] = field(default_factory=dict)
primary_reason_zh: str = ""
recommended_service_name: str = "memgraph"
confidence: float = 0.5
facts: Dict[str, Any] = field(default_factory=dict)
oom_hits: List[str] = field(default_factory=list)
oom_local: Optional[Dict[str, Any]] = None
def run_classify() -> ClassifyResult:
mem = _read_meminfo()
total_kb = mem.get("MemTotal", 0)
avail_kb = mem.get("MemAvailable") or mem.get("MemFree", 0)
swap_total = mem.get("SwapTotal", 0)
swap_free = mem.get("SwapFree", 0)
facts: Dict[str, Any] = {
"mem_total_kb": total_kb,
"mem_available_kb": avail_kb,
"swap_total_kb": swap_total,
"swap_free_kb": swap_free,
}
if total_kb > 0:
facts["mem_available_ratio"] = round(avail_kb / total_kb, 4)
# Detailed OOM extraction belongs to memory oom quick; here it is only reused to drive classify routing and facts
oom_local = analyze_oom_local()
facts["oom_signal_lines"] = oom_local.get("oom_lines_total") or oom_local["hit_count"]
if oom_local.get("oom_event_count"):
facts["oom_event_count"] = oom_local["oom_event_count"]
top = _top_rss_processes()
facts["top_processes_sample"] = [
{"comm": c, "rss_kb": rss} for c, _, rss in top[:8]
]
java_hint, go_hint = _java_go_hints(top)
cats: List[str] = []
labels: Dict[str, str] = {
"oom_signal": "内核日志中存在 OOM / oom-killer 相关线索",
"memory_pressure": "可用内存占比偏低(压力)",
"swap_pressure": "Swap 空间紧张或未配置 Swap",
"java_workload": "高 RSS 进程中出现 Java 相关工作负载",
"go_workload": "高 RSS 进程中出现 Go 相关弱特征(启发式,需结合业务确认)",
"general": "通用内存分布与全景(默认)",
}
recommended = "memgraph"
confidence = 0.55
reason_parts: List[str] = []
if oom_local["hit_count"]:
cats.append("oom_signal")
recommended = "oomcheck"
confidence = 0.82
reason_parts.append("内核日志中存在 OOM / oom-killer 相关线索,优先走 OOM 专项。")
elif total_kb > 0 and (avail_kb / total_kb) < 0.08:
cats.append("memory_pressure")
recommended = "memgraph"
confidence = 0.78
reason_parts.append("MemAvailable 占比偏低,建议先做内存全景(memgraph)。")
elif swap_total > 0 and swap_free < swap_total * 0.15:
cats.append("swap_pressure")
recommended = "memgraph"
confidence = 0.72
reason_parts.append("Swap 使用率较高,建议内存全景排查回收与组成。")
elif swap_total == 0:
cats.append("swap_pressure")
reason_parts.append("当前未配置 Swap(SwapTotal=0),突发内存压力时风险更高;可选 memgraph 看组成。")
if java_hint and recommended == "memgraph" and "oom_signal" not in cats:
cats.append("java_workload")
recommended = "javamem"
confidence = max(confidence, 0.68)
reason_parts.append("高内存进程偏 Java,建议 Java 内存专项(javamem)。")
elif go_hint and recommended == "memgraph" and "oom_signal" not in cats and not java_hint:
cats.append("go_workload")
recommended = "memgraph"
confidence = max(confidence, 0.62)
reason_parts.append(
"启发式存在 Go 相关弱特征:CLI 侧优先走内存全景(memgraph);若已确认 Java 工作负载再改用 javamem。"
)
if not cats:
cats.append("general")
if not reason_parts:
reason_parts.append("未见强特征,默认建议内存全景(memgraph)。")
primary = " ".join(reason_parts) if reason_parts else labels.get(cats[0], labels["general"])
oom_tail5 = oom_local["oom_digest"][-2:] if oom_local["hit_count"] else []
return ClassifyResult(
categories=cats,
category_labels_zh={k: labels[k] for k in cats if k in labels},
primary_reason_zh=primary,
recommended_service_name=recommended,
confidence=confidence,
facts=facts,
oom_hits=oom_tail5,
oom_local=oom_local if oom_local["hit_count"] else None,
)
def build_remote_analysis_payload() -> Dict[str, str]:
"""Provides the envelope's data.remote_analysis_value."""
return remote_analysis_value_map()
def memory_ps_top_sample(limit: int = 24) -> List[Tuple[str, str, int]]:
"""Reused by subcommands such as memory javamem."""
return _top_rss_processes(limit)
def java_go_hints_from_rows(rows: List[Tuple[str, str, int]]) -> Tuple[bool, bool]:
"""Detect weak Java / Go traits from the top-RSS rows."""
return _java_go_hints(rows)
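The meminfo-based part of the classification can be exercised without touching `/proc`. The sketch below parses a sample meminfo string with the same regular expression and applies the 8% available-memory and 15% free-swap thresholds used by `run_classify`. It deliberately leaves out the OOM and Java/Go signals; `parse_meminfo`, `classify_pressure`, and the sample text are hypothetical and exist only for this example.

```python
import re
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse 'Name:   <number> kB' lines; lines without a kB unit are skipped."""
    out: Dict[str, int] = {}
    for line in text.splitlines():
        m = re.match(r"^(\w+):\s+(\d+)\s+kB\s*$", line)
        if m:
            out[m.group(1)] = int(m.group(2))
    return out


def classify_pressure(mem: Dict[str, int]) -> str:
    """Apply only the meminfo thresholds, in the same order as the classifier."""
    total = mem.get("MemTotal", 0)
    avail = mem.get("MemAvailable") or mem.get("MemFree", 0)
    swap_total = mem.get("SwapTotal", 0)
    swap_free = mem.get("SwapFree", 0)
    if total > 0 and avail / total < 0.08:
        return "memory_pressure"
    if swap_total > 0 and swap_free < swap_total * 0.15:
        return "swap_pressure"
    if swap_total == 0:
        return "swap_pressure"
    return "general"


SAMPLE = (
    "MemTotal:       16000000 kB\n"
    "MemAvailable:    1000000 kB\n"
    "SwapTotal:       2000000 kB\n"
    "SwapFree:        1900000 kB\n"
    "HugePages_Total:       0\n"
)

mem = parse_meminfo(SAMPLE)
ratio = round(mem["MemAvailable"] / mem["MemTotal"], 4)

if __name__ == "__main__":
    print(ratio, classify_pressure(mem))
```

Separating parsing from thresholding like this makes the cut-offs testable with fixed input, which the file-reading helpers above do not allow directly.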
FILE:scripts/sysom_cli/memory/lib/envelope_memory.py
# -*- coding: utf-8 -*-
"""Envelopes for memory commands: agent.next is generated by next_steps_struct; finalize fills in routing/local."""
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict, List, Optional
# purpose_zh fragments for agent.next
_AGENT_NEXT_RUN_HINT_ZH = (
"在技能根 Bash 执行,勿让用户手抄。"
)
_REMOTE_REGION_INSTANCE_ZH = (
"本机不传 --region/--instance。"
)
_DEEP_DIAGNOSIS_MUST_ZH = (
"应继续深度诊断,勿仅以 quick 结论结束。"
)
def oom_diagnosis_agent_brief_zh(hit_count: int) -> str:
"""For data.oom_diagnosis_cli_hints_zh: "next step" guidance only."""
multi = f"本封 {hit_count} 次 OOM,针对某次用 --oom-time。" if hit_count > 1 else ""
return multi + "深度用 memory oom --deep-diagnosis;细则见 oomcheck.md。"
def oom_diagnosis_invoke_extra_purpose_zh(hit_count: int) -> str:
"""Appended to the end of purpose_zh for diagnosis_invoke; same source as data.oom_diagnosis_cli_hints_zh, only wrapped in newlines."""
return f"\n【oomcheck 下一步】{oom_diagnosis_agent_brief_zh(hit_count)}\n"
def recommended_specialty_cli_command(
service_name: str,
ns: Optional[Namespace] = None,
) -> str:
"""Copyable shell command: a memory subcommand in the memory domain."""
suffix = "--channel ecs --timeout 300"
memory_cli = {
"memgraph": f"./scripts/osops.sh memory memgraph --deep-diagnosis {suffix}",
"oomcheck": f"./scripts/osops.sh memory oom --deep-diagnosis {suffix}",
"javamem": f"./scripts/osops.sh memory javamem --deep-diagnosis {suffix}",
}
cmd = memory_cli.get(
service_name,
f"./scripts/osops.sh memory classify --deep-diagnosis {suffix}",
)
if ns is not None:
reg = str(getattr(ns, "region", None) or "").strip()
inst = str(getattr(ns, "instance", None) or "").strip()
if reg and inst:
cmd += f" --region {reg} --instance {inst}"
return cmd
def _deep_step_only_commands(
recommended: str,
ns: Optional[Namespace] = None,
) -> List[Dict[str, str]]:
"""When the environment check has already passed, keep only the single executable deep-diagnosis command."""
return [
{
"action": "diagnosis_invoke",
"command": recommended_specialty_cli_command(recommended, ns),
"purpose_zh": (
f"{_AGENT_NEXT_RUN_HINT_ZH}"
f"{_REMOTE_REGION_INSTANCE_ZH}"
f"{_DEEP_DIAGNOSIS_MUST_ZH}"
f"发起「{recommended}」深度诊断;环境已通过。"
),
}
]
def build_next_steps_commands(
recommended: str,
ns: Optional[Namespace] = None,
) -> List[Dict[str, str]]:
"""Flat steps (expanded into agent.next); each item carries action, command, and purpose_zh."""
pre = (
f"{_AGENT_NEXT_RUN_HINT_ZH}"
f"{_REMOTE_REGION_INSTANCE_ZH}"
"先执行 precheck 确认凭证。"
)
inv = (
f"{_AGENT_NEXT_RUN_HINT_ZH}"
f"{_REMOTE_REGION_INSTANCE_ZH}"
f"{_DEEP_DIAGNOSIS_MUST_ZH}"
f"precheck 通过后发起「{recommended}」深度诊断。"
)
return [
{
"action": "precheck",
"command": "./scripts/osops.sh precheck",
"purpose_zh": pre,
},
{
"action": "diagnosis_invoke",
"command": recommended_specialty_cli_command(recommended, ns),
"purpose_zh": inv,
},
]
def _steps_with_action_kind(raw_steps: List[Dict[str, str]]) -> List[Dict[str, Any]]:
out: List[Dict[str, Any]] = []
for s in raw_steps:
action = s.get("action", "")
if action == "precheck":
kind = "precheck"
elif action == "diagnosis_invoke":
kind = "sysom_specialty"
else:
kind = "other_subcommand"
out.append(
{
"action_kind": kind,
"action": action,
"command": s.get("command", ""),
"purpose_zh": s.get("purpose_zh", ""),
}
)
return out
def next_steps_struct(
recommended_service_name: str,
ns: Optional[Namespace] = None,
*,
diagnosis_extra_purpose_zh: Optional[str] = None,
) -> List[Dict[str, Any]]:
"""For writing into agent.next: a single deep-diagnosis step if run_precheck already passes, otherwise two steps (precheck, then deep diagnosis)."""
from sysom_cli.lib.auth import run_precheck
cr = run_precheck()
if cr["ok"]:
raw = _deep_step_only_commands(recommended_service_name, ns)
else:
raw = build_next_steps_commands(recommended_service_name, ns)
out = _steps_with_action_kind(raw)
if diagnosis_extra_purpose_zh:
for step in out:
if step.get("action") == "diagnosis_invoke":
step["purpose_zh"] = step.get("purpose_zh", "") + diagnosis_extra_purpose_zh
return out
def quick_analysis_block_classify(
*,
inputs_checked: List[str],
categories: List[str],
conclusion_zh: str,
confidence: float,
limits_zh: str,
) -> Dict[str, Any]:
return {
"inputs_checked": inputs_checked,
"categories": categories,
"conclusion_zh": conclusion_zh,
"confidence": confidence,
"limits_zh": limits_zh,
}
def quick_analysis_block_oom(
*,
conclusion_zh: str,
limits_zh: str,
) -> Dict[str, Any]:
"""Conclusion and limits; hit counts live in data.routing / data.local and are not repeated here."""
return {
"inputs_checked": ["kernel_log_journal_oom_patterns"],
"conclusion_zh": conclusion_zh,
"limits_zh": limits_zh,
}
# Envelope convention: the full classify facts (including top_processes_sample) live only in data.local.facts
CLASSIFY_FACTS_DATA_REF_ZH = (
"完整本机摘要(meminfo 指标、RSS 采样、OOM 行数等)见 data.local.facts;"
"若存在 OOM,逐次摘要见 data.local.oom_local.oom_events_summary,结构化关键字段见 oom_digest。"
)
# memgraph quick: the local snapshot lives only in data.local.meminfo_facts / data.local.rss_top_sample
MEMGRAPH_LOCAL_SNAPSHOT_REF_ZH = (
"本机 meminfo 摘要见 data.local.meminfo_facts,高 RSS 进程采样见 data.local.rss_top_sample。"
)
# javamem: the RSS sample lives only in data.local.rss_top_sample
RSS_TOP_SAMPLE_DATA_REF_ZH = "高 RSS 进程采样(comm、rss_kb)见 data.local.rss_top_sample。"
def limits_zh_default_classify() -> str:
return (
"未在目标机执行 SysOM 采集;语言运行时(JVM/Go)细节、"
"全景 memgraph 等需走下方 SysOM 专项。"
)
def limits_zh_default_oom() -> str:
return (
"仅扫描本机内核日志尾部窗口;journal/dmesg 格式不同会导致部分行无法解析墙钟时间;"
"遗漏与根因链仍可能需 SysOM oomcheck 专项。"
)
FILE:scripts/sysom_cli/memory/lib/invoke_bridge.py
# -*- coding: utf-8 -*-
"""memory subcommands can optionally reuse the SysOM specialty backend (the same implementation as the io / net / load specialty subcommands)."""
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict
from sysom_cli.lib.diagnosis_backend import get_diagnosis_backend, namespace_for_diagnosis_invoke
# For tests or for callers that need to construct a Namespace directly
__all__ = ["namespace_for_diagnosis_invoke", "run_deep_diagnosis_invoke"]
def run_deep_diagnosis_invoke(
    service_name: str,
    ns: Namespace,
    *,
    skip_precheck: bool = False,
) -> Dict[str, Any]:
    """
    skip_precheck: when True, skip run_precheck (the caller has already gated
    on a path such as execute(REMOTE)).
    """
    if not skip_precheck:
        from sysom_cli.lib.precheck_gate import remote_precheck_gate

        ok_gate, fail_env = remote_precheck_gate()
        if not ok_gate:
            return fail_env  # type: ignore[return-value]
    return get_diagnosis_backend().invoke_specialty(service_name, ns)
FILE:scripts/sysom_cli/memory/lib/memory_envelope_finalize.py
# -*- coding: utf-8 -*-
"""Post-processing for the memory-domain stdout envelope: routing/local grouping and a flat remote block; next steps live only in agent.next."""
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict, List, Optional
from sysom_cli.lib.invoke_envelope_finalize import ensure_findings_finding_type
_ROUTING_KEYS = (
"recommended_service_name",
"confidence",
"primary_reason_zh",
"categories",
"category_labels_zh",
"oom_signal",
"hit_count",
"java_signal",
"go_signal",
)
_LOCAL_KEYS = ("facts", "oom_local", "meminfo_facts", "rss_top_sample")
# Redundant fields stripped from the data root on output (static descriptions, duplicates of agent, or already covered by summary).
_DATA_STRIP_KEYS = ("remote_analysis_value", "quick_analysis", "oom_diagnosis_cli_hints_zh")
# Debug fields in oom_local that serve internal logic only and are not emitted to the Agent.
_OOM_LOCAL_STRIP_KEYS = (
"oom_lines_total",
"extraction_mode",
"parsed_time_count",
"unparsed_wallclock_count",
"dmesg_relative_line_count",
"relative_boot_seconds_sample",
"source_note_zh",
)
def _build_routing(data: Dict[str, Any]) -> Dict[str, Any]:
routing: Dict[str, Any] = {}
for k in _ROUTING_KEYS:
if k in data:
routing[k] = data[k]
return routing
def _build_local(data: Dict[str, Any]) -> Dict[str, Any]:
local: Dict[str, Any] = {}
for k in _LOCAL_KEYS:
if k in data:
local[k] = data[k]
return local
def _strip_verbose_output(data: Dict[str, Any]) -> None:
"""Strip redundant fields before output: static descriptions, duplicates of agent fields, and oom_local debug fields."""
for k in _DATA_STRIP_KEYS:
data.pop(k, None)
oom_local = (data.get("local") or {}).get("oom_local")
if isinstance(oom_local, dict):
for k in _OOM_LOCAL_STRIP_KEYS:
oom_local.pop(k, None)
def _invoke_data_coalesce(idata: Dict[str, Any], key: str) -> Any:
"""After finalize, DiagnosisInvoke business fields usually sit under data.remote; read both locations when merging."""
v = idata.get(key)
if v is not None:
return v
nested = idata.get("remote")
if isinstance(nested, dict):
return nested.get(key)
return None
def merge_deep_diagnosis_flat(
out: Dict[str, Any],
inv: Dict[str, Any],
*,
service_name: str,
) -> None:
"""Write the remote specialty result into data.remote instead of nesting the whole sub-envelope."""
data = out.setdefault("data", {})
data.pop("deep_diagnosis", None)
idata = inv.get("data") if isinstance(inv.get("data"), dict) else {}
# finalize_diagnosis_invoke_envelope moves result into data.remote and pops the top-level result.
nested_remote = idata.get("remote") if isinstance(idata.get("remote"), dict) else {}
merged_result = idata.get("result")
if merged_result is None and nested_remote:
merged_result = nested_remote.get("result")
remote: Dict[str, Any] = {
"ok": bool(inv.get("ok")),
"action": inv.get("action"),
"service_name": service_name,
"task_id": _invoke_data_coalesce(idata, "task_id"),
"channel": _invoke_data_coalesce(idata, "channel"),
"region": _invoke_data_coalesce(idata, "region"),
"result": merged_result,
"ecs_metadata_filled": _invoke_data_coalesce(idata, "ecs_metadata_filled"),
"diagnosis_source_origin": _invoke_data_coalesce(idata, "diagnosis_source_origin"),
}
ds = _invoke_data_coalesce(idata, "diagnosis_source")
if ds:
remote["diagnosis_source"] = ds
if inv.get("error"):
remote["error"] = inv["error"]
data["remote"] = remote
def finalize_memory_envelope(
out: Dict[str, Any],
ns: Namespace,
*,
verbose_summary: str,
) -> Dict[str, Any]:
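"""
- Write data.routing and data.local, and delete from the data root the keys moved into them.
- Remove data.next_steps (plus read_next / next_analysis / follow_up) if present; **agent.next is the source of truth for next actions** (written by each command, including action_kind).
- Strip the redundant root fields (remote_analysis_value, quick_analysis, oom_diagnosis_cli_hints_zh) and the oom_local debug fields.
- agent: when an error is present, the upstream agent.summary / agent.next are kept as-is; after a successful remote call agent.next is cleared; otherwise only the summary switches on verbose.
"""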
data = out.setdefault("data", {})
agent = out.setdefault("agent", {})
routing = _build_routing(data)
if routing:
data["routing"] = routing
for k in routing:
data.pop(k, None)
local = _build_local(data)
if local:
data["local"] = local
for k in local:
data.pop(k, None)
data.pop("read_next", None)
data.pop("next_analysis", None)
data.pop("follow_up", None)
data.pop("next_steps", None)
_strip_verbose_output(data)
ensure_findings_finding_type(agent.get("findings") or [])
# When deep diagnosis (or a similar step) failed, keep the agent.summary / agent.next already written upstream.
if out.get("error") is not None:
return out
remote = data.get("remote")
remote_ok = (
isinstance(remote, dict)
and bool(remote.get("ok"))
and bool(out.get("ok"))
)
if isinstance(remote, dict):
exec_ = dict(out.get("execution") or {})
exec_["phase"] = "invoke_diagnosis"
out["execution"] = exec_
verbose = bool(getattr(ns, "verbose_envelope", False))
if remote_ok:
if verbose:
if (verbose_summary or "").strip():
agent["summary"] = verbose_summary
else:
tid = str(remote.get("task_id") or "").strip() if isinstance(remote, dict) else ""
agent["summary"] = (
f"深度诊断已完成(紧凑 verbose 未提供摘要)。task_id={tid or '(无)'};详见 data.remote。"
)
else:
res = remote.get("result") if isinstance(remote, dict) else None
if isinstance(res, dict) and res.get("_sysom_cli_note_zh"):
agent["summary"] = (
"深度诊断调用已成功;服务端未返回标准 result 字段,说明与 _raw_data_keys 见 data.remote.result;"
"task_id 见 data.remote.task_id。勿再重复发起同一条 --deep-diagnosis 命令。"
)
elif res is not None:
agent["summary"] = (
"深度诊断调用已成功。业务载荷见 data.remote.result;"
"task_id 见 data.remote.task_id。勿再重复发起同一条 --deep-diagnosis 命令。"
)
else:
agent["summary"] = (
"深度诊断调用已成功但 data.remote.result 为 null。"
"请凭 data.remote.task_id 在控制台或后续 API 查询;勿再重复发起同一条 --deep-diagnosis 命令。"
)
agent["next"] = []
elif verbose:
agent["summary"] = verbose_summary
else:
agent["summary"] = (
"OOM 信号已检出,本机初步发现见 agent.findings 和 data.local。"
"请向用户展示本机发现,并执行 agent.next 中的命令获取深度诊断结果。"
)
return out
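The regrouping step performed by `finalize_memory_envelope` can be illustrated in isolation. The sketch below is a simplified stand-in, not the module itself: `regroup` is a hypothetical helper, and the key tuples are shortened copies of `_ROUTING_KEYS` and `_LOCAL_KEYS`. It shows how flat keys at the `data` root are moved under `data.routing` and `data.local` while unrelated keys such as `remote` stay in place.

```python
from typing import Any, Dict, Tuple

# Shortened, illustrative copies of the key lists used by the finalizer.
ROUTING_KEYS: Tuple[str, ...] = ("recommended_service_name", "confidence", "hit_count")
LOCAL_KEYS: Tuple[str, ...] = ("facts", "oom_local")


def regroup(data: Dict[str, Any], keys: Tuple[str, ...], bucket: str) -> None:
    """Move the listed keys from the data root into data[bucket] (hypothetical helper)."""
    picked = {k: data[k] for k in keys if k in data}
    if not picked:
        return  # nothing to move: do not create an empty bucket
    for k in picked:
        data.pop(k)
    data[bucket] = picked


data: Dict[str, Any] = {
    "recommended_service_name": "oomcheck",
    "hit_count": 2,
    "facts": {"mem_total_kb": 8_000_000},
    "remote": {"ok": True},
}
regroup(data, ROUTING_KEYS, "routing")
regroup(data, LOCAL_KEYS, "local")
print(sorted(data))
```

Keeping the bucket absent when no key matched mirrors the `if routing:` / `if local:` guards in the finalizer, so consumers can test for the presence of `data.routing` rather than for an empty dict.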
FILE:scripts/sysom_cli/memory/lib/memory_remote_helpers.py
# -*- coding: utf-8 -*-
"""memory subcommands: remote-result merging shared by --deep-diagnosis and execute_remote."""
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict
from sysom_cli.lib.precheck_gate import merge_precheck_gate_failure_into_memory_envelope
from sysom_cli.lib.schema import agent_block, envelope
from sysom_cli.memory.lib.invoke_bridge import run_deep_diagnosis_invoke
from sysom_cli.memory.lib.memory_envelope_finalize import (
finalize_memory_envelope,
merge_deep_diagnosis_flat,
)
def apply_deep_diagnosis_or_precheck_merge(
out: Dict[str, Any],
inv: Dict[str, Any],
*,
recommended: str,
ns: Namespace,
verbose_summary: str,
) -> Dict[str, Any]:
"""inv is the return value of run_deep_diagnosis_invoke; on precheck failure it is merged into the quick envelope."""
if not inv.get("ok") and inv.get("action") == "precheck":
merge_precheck_gate_failure_into_memory_envelope(out, inv)
else:
merge_deep_diagnosis_flat(out, inv, service_name=recommended)
if not inv.get("ok"):
out["ok"] = False
if inv.get("agent"):
out["agent"] = inv["agent"]
if inv.get("error"):
out["error"] = inv["error"]
return finalize_memory_envelope(out, ns, verbose_summary=verbose_summary)
def run_memory_remote_invoke(
out: Dict[str, Any],
recommended: str,
ns: Namespace,
*,
verbose_summary: str,
) -> Dict[str, Any]:
"""execute_remote: the outer BaseCommand.execute(REMOTE) has already run the precheck gate."""
inv = run_deep_diagnosis_invoke(recommended, ns, skip_precheck=True)
return apply_deep_diagnosis_or_precheck_merge(
out, inv, recommended=recommended, ns=ns, verbose_summary=verbose_summary
)
def run_memory_deep_diagnosis_local_first(
*,
recommended: str,
memory_action: str,
ns: Namespace,
remote_analysis_value: Dict[str, Any],
verbose_summary: str,
) -> Dict[str, Any]:
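"""
Local entry point with --deep-diagnosis: start the remote specialty first instead of building the full local quick data.
On precheck failure the result is merged into a minimal envelope (same merge call as the precheck branch of apply_deep_diagnosis_or_precheck_merge).
"""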
inv = run_deep_diagnosis_invoke(recommended, ns)
out = envelope(
action=memory_action,
ok=True,
agent=agent_block("normal", "", findings=[], next_steps=[]),
data={
"recommended_service_name": recommended,
"remote_analysis_value": remote_analysis_value,
},
execution={"subsystem": "memory", "phase": "quick_review"},
)
if not inv.get("ok") and inv.get("action") == "precheck":
merge_precheck_gate_failure_into_memory_envelope(out, inv)
return finalize_memory_envelope(out, ns, verbose_summary=verbose_summary)
merge_deep_diagnosis_flat(out, inv, service_name=recommended)
if not inv.get("ok"):
out["ok"] = False
if inv.get("agent"):
out["agent"] = inv["agent"]
if inv.get("error"):
out["error"] = inv["error"]
return finalize_memory_envelope(out, ns, verbose_summary=verbose_summary)
FILE:scripts/sysom_cli/memory/lib/oom_log_extract.py
# -*- coding: utf-8 -*-
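"""
Extract OOM records from the kernel log block by block (block boundaries follow SysOM oomcheck).
- Block start: invoked oom-killer
- Block end: Killed process + total-vm / oom_reaper / oom-kill:constraint / Out of memory + total-vm
- Within a block, parse the cgroup (Task in / oom-kill:constraint)
extract_oom_digest() parses key diagnostic fields from an extracted block (modelled on SysOM oomcheck.py);
local mode only extracts information and draws no conclusions or recommendations.
"""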
from __future__ import annotations
import re
from typing import Any, Dict, List, Optional, Tuple
# Same keywords as sysAK oomcheck.py (block start: invoked oom-killer, see _OOM_BEGIN_RE).
OOM_CGROUP_KEYWORD = "Task in /"
OOM_END_KEYWORD_5_10 = "oom-kill:constraint"
def _is_oom_block_end(line: str, idx: int, lines: List[str]) -> bool:
"""Counterpart of sysAK is_oom_end."""
if "Killed process" in line and "total-vm" in line:
if idx + 1 < len(lines) and "oom_reaper" in lines[idx + 1]:
return False
return True
if "oom_reaper" in line:
return True
if "oom-kill:constraint" in line:
if idx + 1 < len(lines) and "Out of memory" in lines[idx + 1]:
return False
return True
if "Out of memory" in line and "total-vm" in line:
return True
return False
def _cgroup_from_task_line(line: str) -> Tuple[str, str]:
"""Counterpart of sysAK oom_get_cgroup_name_api(line, is_510=0)."""
if OOM_CGROUP_KEYWORD not in line:
return "", ""
try:
task_list = line.strip().split("Task in")[1].strip().split()
if len(task_list) < 2:
return "", ""
return task_list[0], task_list[-1]
except (IndexError, ValueError):
return "", ""
def _cgroup_from_constraint_line(line: str) -> Tuple[str, str]:
"""Counterpart of sysAK oom_get_cgroup_name_api(line, is_510=1)."""
if "CONSTRAINT_MEMCG" not in line or "task_memcg=" not in line or "oom_memcg=" not in line:
return "", ""
try:
cgroup = line.split("task_memcg=")[1].split(",")[0]
pcgroup = line.split("oom_memcg=")[1].split(",")[0]
return cgroup, pcgroup
except IndexError:
return "", ""
_OOM_BEGIN_RE = re.compile(r"invoked\s+oom-killer", re.IGNORECASE)
# ---------- digest extraction regexes (modelled on SysOM oomcheck.py) ----------
_INVOKER_RE = re.compile(
r"(\S+)\s+invoked\s+oom-killer.*?gfp_mask=(0x\w+)(?:\([^)]*\))?\s*,\s*order=(\d+)",
re.IGNORECASE,
)
# Killed process 111 (comm) total-vm:NNNkB, anon-rss:NNNkB, file-rss:NNNkB, shmem-rss:NNNkB
_KILLED_PROCESS_RE = re.compile(
r"Killed\s+process\s+(\d+)\s+\(([^)]+)\).*?"
r"total-vm:(\d+)kB.*?anon-rss:(\d+)kB.*?file-rss:(\d+)kB"
r"(?:.*?shmem-rss:(\d+)kB)?",
re.IGNORECASE,
)
# Fallback: Killed process PID (comm) total-vm:NNNkB (no anon-rss/file-rss)
_KILLED_PROCESS_SIMPLE_RE = re.compile(
r"Killed\s+process\s+(\d+)\s+\(([^)]+)\).*?total-vm:(\d+)kB",
re.IGNORECASE,
)
# oom-kill:constraint=XXX,...,task=YYY,pid=NNN,uid=NNN
_CONSTRAINT_KV_RE = re.compile(
r"oom-kill:constraint=(\S+?)(?:,|\s)"
)
_CONSTRAINT_TASK_RE = re.compile(
r"task=(\S+?),pid=(\d+),uid=(\d+)"
)
# memory: usage NNNkB, limit NNNkB, failcnt NNN
_CG_USAGE_LIMIT_RE = re.compile(
r"memory:\s*usage\s+(\d+)kB,\s*limit\s+(\d+)kB"
)
# Task table row: [PID] UID TGID TOTAL_VM RSS ...
_TASK_ROW_RE = re.compile(
r"\[\s*(\d+)\]\s+\d+\s+\d+\s+(\d+)\s+(\d+)\s+\d+\s+\d+\s+[-\d]+\s+(\S+)"
)
def _strip_ts_prefix(line: str) -> str:
"""Strip the journal/dmesg timestamp prefix and return the kernel message body."""
# journal: "Mar 18 16:02:39.381492 host kernel: MSG"
if " kernel: " in line:
return line.split(" kernel: ", 1)[1]
# dmesg -T: "[Thu Mar 18 ...] MSG" or "[123.456] MSG"
if line.startswith("[") and "]" in line:
return line.split("]", 1)[1].strip()
return line
def extract_oom_digest(block: Dict[str, Any]) -> Dict[str, Any]:
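"""Extract key diagnostic fields from one OOM block (modelled on the SysOM oomcheck.py extraction logic).
Facts only, no conclusions or recommendations; those come from the remote oomcheck specialty or the Agent.
"""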
lines: List[str] = block.get("lines", [])
digest: Dict[str, Any] = {
"line_count": len(lines),
"invoker_comm": None,
"gfp_mask": None,
"order": None,
"constraint": None,
"killed_pid": None,
"killed_comm": None,
"total_vm_kb": None,
"anon_rss_kb": None,
"file_rss_kb": None,
"shmem_rss_kb": None,
"task_cgroup": block.get("cgroup") or None,
"oom_cgroup": block.get("parent_cgroup") or None,
"cg_usage_kb": None,
"cg_limit_kb": None,
"top_rss_tasks": [],
}
task_rows: List[Dict[str, Any]] = []
for raw_line in lines:
line = _strip_ts_prefix(raw_line)
# --- invoker ---
m = _INVOKER_RE.search(line)
if m:
digest["invoker_comm"] = m.group(1)
digest["gfp_mask"] = m.group(2)
digest["order"] = int(m.group(3))
continue
# --- constraint (5.10) ---
m = _CONSTRAINT_KV_RE.search(line)
if m:
digest["constraint"] = m.group(1)
# 5.10 cgroup: task=...,pid=...,uid=...
tm = _CONSTRAINT_TASK_RE.search(line)
if tm and digest["killed_comm"] is None:
digest["killed_comm"] = tm.group(1)
digest["killed_pid"] = int(tm.group(2))
# task_memcg / oom_memcg
if "task_memcg=" in line:
digest["task_cgroup"] = line.split("task_memcg=")[1].split(",")[0]
if "oom_memcg=" in line:
digest["oom_cgroup"] = line.split("oom_memcg=")[1].split(",")[0]
continue
# --- killed process (3.10/4.19/5.10-host) ---
m = _KILLED_PROCESS_RE.search(line)
if m:
digest["killed_pid"] = int(m.group(1))
digest["killed_comm"] = m.group(2)
digest["total_vm_kb"] = int(m.group(3))
digest["anon_rss_kb"] = int(m.group(4))
digest["file_rss_kb"] = int(m.group(5))
digest["shmem_rss_kb"] = int(m.group(6)) if m.group(6) else 0
continue
# fallback: simpler Killed process line without full mem detail
m = _KILLED_PROCESS_SIMPLE_RE.search(line)
if m and digest["killed_pid"] is None:
digest["killed_pid"] = int(m.group(1))
digest["killed_comm"] = m.group(2)
digest["total_vm_kb"] = int(m.group(3))
continue
# --- cgroup usage / limit ---
m = _CG_USAGE_LIMIT_RE.search(line)
if m and digest["cg_usage_kb"] is None:
digest["cg_usage_kb"] = int(m.group(1))
digest["cg_limit_kb"] = int(m.group(2))
continue
# --- 4.19 oom_reaper (killed pid/comm) ---
if "oom_reaper" in line and "reaped process" in line:
try:
seg = line.split("reaped process")[1].strip()
parts = seg.split()
if len(parts) >= 2 and digest["killed_pid"] is None:
digest["killed_pid"] = int(parts[0])
digest["killed_comm"] = parts[1].strip(",").strip("()")
except (ValueError, IndexError):
pass
continue
# --- task table row ---
m = _TASK_ROW_RE.search(line)
if m:
task_rows.append({
"pid": int(m.group(1)),
"total_vm_pages": int(m.group(2)),
"rss_pages": int(m.group(3)),
"comm": m.group(4),
})
# top 5 by RSS
task_rows.sort(key=lambda t: t["rss_pages"], reverse=True)
digest["top_rss_tasks"] = task_rows[:5]
return digest
def extract_oom_blocks(lines: List[str]) -> List[Dict[str, Any]]:
"""
Return the OOM blocks; each has lines (list of lines), raw_block, cgroup, parent_cgroup, and incomplete.
Returns [] when nothing matches.
"""
res: List[str] = []
blocks: List[Dict[str, Any]] = []
in_oom = False
pending: Optional[Dict[str, Any]] = None
for idx, line in enumerate(lines):
if _OOM_BEGIN_RE.search(line):
res = [line]
in_oom = True
pending = {
"lines": res,
"cgroup": "",
"parent_cgroup": "",
}
continue
if not in_oom or pending is None:
continue
if _is_oom_block_end(line, idx, lines):
in_oom = False
if OOM_END_KEYWORD_5_10 in line:
cg, pcg = _cgroup_from_constraint_line(line)
if cg or pcg:
pending["cgroup"] = cg
pending["parent_cgroup"] = pcg
pending["lines"].append(line)
raw = "\n".join(pending["lines"])
blocks.append(
{
"lines": pending["lines"],
"raw_block": raw,
"cgroup": pending["cgroup"] or "",
"parent_cgroup": pending["parent_cgroup"] or "",
"incomplete": False,
}
)
pending = None
res = []
elif OOM_CGROUP_KEYWORD in line:
cg, pcg = _cgroup_from_task_line(line)
pending["cgroup"] = cg
pending["parent_cgroup"] = pcg
pending["lines"].append(line)
else:
pending["lines"].append(line)
if in_oom and pending is not None:
raw = "\n".join(pending["lines"])
blocks.append(
{
"lines": pending["lines"],
"raw_block": raw,
"cgroup": pending["cgroup"] or "",
"parent_cgroup": pending["parent_cgroup"] or "",
"incomplete": True,
}
)
return blocks
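The block-boundary rules used by `extract_oom_blocks` are easier to follow on a small sample. The sketch below is a simplified re-statement under stated assumptions: `split_oom_blocks` and `is_block_end` are illustrative names, cgroup parsing is omitted, and the sample log lines are made up to resemble kernel output rather than copied from a real machine. It shows the one-line lookahead that keeps a `Killed process` line and the following `oom_reaper` line in the same block, and how a block with no end marker is reported as incomplete.

```python
import re
from typing import Any, Dict, List, Optional

BEGIN_RE = re.compile(r"invoked\s+oom-killer", re.IGNORECASE)


def is_block_end(line: str, idx: int, lines: List[str]) -> bool:
    """End-of-block test with a one-line lookahead, mirroring the rules listed in the module docstring."""
    nxt = lines[idx + 1] if idx + 1 < len(lines) else ""
    if "Killed process" in line and "total-vm" in line:
        return "oom_reaper" not in nxt  # the oom_reaper line closes the block instead
    if "oom_reaper" in line:
        return True
    if "oom-kill:constraint" in line:
        return "Out of memory" not in nxt  # the "Out of memory" line closes the block instead
    return "Out of memory" in line and "total-vm" in line


def split_oom_blocks(lines: List[str]) -> List[Dict[str, Any]]:
    """Group lines into OOM blocks; a block without an end marker is flagged incomplete."""
    blocks: List[Dict[str, Any]] = []
    current: Optional[List[str]] = None
    for idx, line in enumerate(lines):
        if BEGIN_RE.search(line):
            current = [line]  # a new start replaces any unfinished block, as in the original loop
            continue
        if current is None:
            continue
        current.append(line)
        if is_block_end(line, idx, lines):
            blocks.append({"lines": current, "incomplete": False})
            current = None
    if current is not None:
        blocks.append({"lines": current, "incomplete": True})
    return blocks


# Made-up sample resembling kernel OOM output.
SAMPLE = [
    "kernel: java invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0",
    "kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=java,pid=4242,uid=1000",
    "kernel: Out of memory: Killed process 4242 (java) total-vm:8123456kB, anon-rss:4000000kB, file-rss:0kB, shmem-rss:0kB",
    "kernel: oom_reaper: reaped process 4242 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB",
    "kernel: nginx invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0",
    "kernel: Task in /docker/abc killed as a result of limit of /docker/abc",
]
blocks = split_oom_blocks(SAMPLE)
for b in blocks:
    print(len(b["lines"]), b["incomplete"])
```

In this sample the first four lines form one complete block, and the trailing two lines form a second block that never reaches an end marker, so it is kept but marked `incomplete`.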
FILE:scripts/sysom_cli/memory/lib/oom_quick.py
# -*- coding: utf-8 -*-
"""Local quick mode for `memory oom`: kernel-log scan, OOM block parsing, and the oom_digest / oom_events_summary envelope data."""
from __future__ import annotations
import re
from collections import Counter
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional, Tuple
from sysom_cli.lib.kernel_log import get_kernel_log_lines
from sysom_cli.memory.lib.oom_log_extract import extract_oom_blocks, extract_oom_digest
_OOM_PATTERNS = re.compile(
r"out of memory|oom-killer|killed process|invoked oom-killer",
re.IGNORECASE,
)
_JOURNAL_TS = re.compile(
r"^([A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+",
)
_DMESG_HUMAN_TS = re.compile(r"^\[([A-Za-z]{3}\s+[A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\]")
_DMESG_REL = re.compile(r"^\[\s*([\d.]+)\]")
_KILLED_RE = re.compile(r"Killed\s+process\s+\d+\s+\(([^)]+)\)", re.IGNORECASE)
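def parse_oom_at_anchor(s: Optional[str]) -> Optional[datetime]:
    """Parse `--oom-at`: Unix seconds, an ISO date-time, or journal style `Mar 20 12:00:00` (current year assumed).

    Always returns a naive local datetime so it can be compared with parsed log timestamps.
    """
    if s is None:
        return None
    t = str(s).strip()
    if not t:
        return None
    if t.isdigit():
        return datetime.fromtimestamp(int(t))
    ts = t.replace("Z", "+00:00")
    try:
        dt = datetime.fromisoformat(ts)
        # Log timestamps are naive local times; convert an aware anchor to match,
        # otherwise subtracting the two raises TypeError.
        if dt.tzinfo is not None:
            dt = dt.astimezone().replace(tzinfo=None)
        return dt
    except ValueError:
        pass
    for fmt in ("%b %d %H:%M:%S.%f", "%b %d %H:%M:%S"):
        try:
            dt = datetime.strptime(t, fmt)
            now = datetime.now()
            # strptime has no year here; assume the current year, or last year if that lands in the future.
            dt = dt.replace(year=now.year)
            if dt > now + timedelta(days=1):
                dt = dt.replace(year=now.year - 1)
            return dt
        except ValueError:
            continue
    return None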
_HMS_ONLY = re.compile(r"^(\d{1,2}):(\d{2})(?::(\d{2}))?(?:\.\d+)?$")
def parse_oom_time_for_remote(s: str) -> Optional[datetime]:
    """
    Parse the time formats a user may pass (for the remote oomcheck params.time).
    Accepts plain Unix seconds, ISO, `YYYY-MM-DD HH:MM:SS`, bare `HH:MM[:SS]` (today, local time), and journal style.
    """
    t = (s or "").strip()
    if not t:
        return None
    m = _HMS_ONLY.match(t)
    if m:
        h, mi, sec = int(m.group(1)), int(m.group(2)), int(m.group(3) or "0")
        try:
            return datetime.now().replace(hour=h, minute=mi, second=sec, microsecond=0)
        except ValueError:
            # Out-of-range fields such as 25:00 are not a valid time of day.
            return None
    if len(t) >= 11 and t[4] == "-" and t[10] == " ":
        t = t[:10] + "T" + t[11:]
    return parse_oom_at_anchor(t)
def _one_oom_time_segment_to_unix(seg: str) -> str:
seg = seg.strip()
if not seg:
return seg
if re.fullmatch(r"\d+", seg):
return seg
if re.fullmatch(r"\d+\.\d+", seg):
return str(int(float(seg)))
dt = parse_oom_time_for_remote(seg)
if dt is None:
return seg
return str(int(dt.timestamp()))
def normalize_oomcheck_time_param(raw: Optional[str]) -> str:
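"""
SysOM Invoke restricts params values to the character set [a-zA-Z0-9_.~-], so ISO times containing colons are rejected.
Convert the oomcheck time (or a `start~end` range) into Unix-second strings (segments still joined by ~).
Segments that cannot be parsed are passed through unchanged, which makes a later rejection easier to trace.
"""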
if raw is None:
return ""
t = str(raw).strip()
if not t:
return ""
if "~" in t:
left, right = t.split("~", 1)
return f"{_one_oom_time_segment_to_unix(left)}~{_one_oom_time_segment_to_unix(right)}"
return _one_oom_time_segment_to_unix(t)
def _kernel_tail_lines(max_lines: int = 4000) -> List[str]:
lines = get_kernel_log_lines("journal", None)
if not lines:
return []
return lines[-max_lines:] if len(lines) > max_lines else lines
def _oom_loose_line_hits(tail: List[str]) -> List[str]:
return [ln for ln in tail if _OOM_PATTERNS.search(ln)]
def _oom_hit_lines(max_lines: int = 4000) -> List[str]:
"""OOM-related lines: prefer sysAK-style blocks; fall back to loose single-line matching when no block is found."""
tail = _kernel_tail_lines(max_lines)
if not tail:
return []
blocks = extract_oom_blocks(tail)
if blocks:
out: List[str] = []
for b in blocks:
out.extend(b["lines"])
return out
return _oom_loose_line_hits(tail)
def get_oom_kernel_hits(max_lines: int = 4000) -> List[str]:
return _oom_hit_lines(max_lines)
def _parse_oom_line_time(line: str) -> Tuple[Optional[datetime], str]:
"""
Parse the timestamp of a kernel log line. Returns (datetime or None, reason).
reason: journal | dmesg_human | dmesg_relative_s:<seconds> | none
"""
m = _JOURNAL_TS.match(line)
if m:
frag = m.group(1)
for fmt in ("%b %d %H:%M:%S.%f", "%b %d %H:%M:%S"):
try:
dt = datetime.strptime(frag, fmt)
now = datetime.now()
dt = dt.replace(year=now.year)
if dt > now + timedelta(days=1):
dt = dt.replace(year=now.year - 1)
return dt, "journal"
except ValueError:
continue
m = _DMESG_HUMAN_TS.match(line)
if m:
try:
return datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y"), "dmesg_human"
except ValueError:
pass
m = _DMESG_REL.match(line)
if m:
try:
sec = float(m.group(1))
return None, f"dmesg_relative_s:{sec:.3f}"
except ValueError:
pass
return None, "none"
def _killed_comm_from_lines(lines: List[str]) -> Optional[str]:
for ln in lines:
m = _KILLED_RE.search(ln)
if m:
comm = (m.group(1) or "").strip()
return comm.split()[0] if comm else None
return None
def _scope_hint_from_block(b: Dict[str, Any]) -> str:
raw = b.get("raw_block") or ""
if b.get("cgroup") or b.get("parent_cgroup"):
return "memcg"
if "CONSTRAINT_MEMCG" in raw or "task_memcg=" in raw:
return "memcg"
return "host"
def _block_start_dt(lines: List[str]) -> Tuple[Optional[datetime], str]:
first = lines[0] if lines else ""
return _parse_oom_line_time(first)
def _pick_primary_block_index(
block_dts: List[Optional[datetime]],
*,
anchor: Optional[datetime],
) -> Tuple[int, Optional[str]]:
"""Return (primary_index, note_zh); with an anchor but no wall-clock times, fall back to the last block and attach a note."""
n = len(block_dts)
if n == 0:
return 0, None
if anchor is None:
return n - 1, None
usable = [(i, dt) for i, dt in enumerate(block_dts) if dt is not None]
if not usable:
return n - 1, "已指定 --oom-at,但扫描窗口内各 OOM 块首行均无墙钟时间,全文退回为最后一次 OOM 块。"
best_i, _ = min(usable, key=lambda x: abs((x[1] - anchor).total_seconds()))
return best_i, None
def _full_log_indices(n_blocks: int, max_full: int, primary: int) -> List[int]:
if max_full <= 0 or n_blocks == 0:
return []
start = max(0, primary - max_full + 1)
return list(range(start, primary + 1))
def _similar_oom_group_key(s: Dict[str, Any]) -> Tuple[str, str, str, str]:
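"""
Strong grouping key: scope_hint + killed_comm + cgroup path (cgroup first, else parent_cgroup) + local hour bucket.
Without a wall-clock time the hour bucket is __no_wallclock__ (such events still merge by scope/comm/cgroup).
"""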
scope = str(s.get("scope_hint") or "unknown")
comm = str(s.get("killed_comm") or "")
cg = str(s.get("cgroup") or s.get("parent_cgroup") or "")
ps = s.get("parsed_start_local")
if ps:
try:
dt = datetime.fromisoformat(str(ps))
hour = dt.replace(minute=0, second=0, microsecond=0).isoformat(timespec="seconds")
except ValueError:
hour = "__no_wallclock__"
else:
hour = "__no_wallclock__"
return (scope, comm, cg, hour)
def _collapse_similar_oom_summaries(summaries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Keep only the last entry of each group (by scan order / event_index) and record similar_oom_count."""
reps: Dict[Tuple[str, str, str, str], Dict[str, Any]] = {}
for s in summaries:
k = _similar_oom_group_key(s)
if k not in reps:
reps[k] = {**s, "similar_oom_count": 1}
else:
old = reps[k]
reps[k] = {**s, "similar_oom_count": int(old.get("similar_oom_count", 1)) + 1}
return sorted(reps.values(), key=lambda x: int(x.get("event_index", 0)))
def analyze_oom_local(
*,
max_lines: int = 4000,
max_event_summaries: int = 64,
max_full_oom_logs: int = 1,
oom_at: Optional[str] = None,
) -> Dict[str, Any]:
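"""
Local OOM quick analysis, parsed by block boundaries. oom_events_summary is a lightweight summary: similar events
are first collapsed by scope_hint, killed_comm, cgroup path, and **the same local hour**, keeping only the latest
entry per group with its similar_oom_count; then at most the latest max_event_summaries entries are kept.
oom_digest holds the structured key fields (by default only the latest event; oom_at anchors it to the block whose wall-clock time is nearest).
Falls back to loose single-line matching when no block is found.
"""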
anchor = parse_oom_at_anchor(oom_at)
empty: Dict[str, Any] = {
"hit_count": 0,
"oom_event_count": 0,
"oom_lines_total": None,
"extraction_mode": "none",
"oom_events_summary": [],
"oom_digest": [],
"time_range": None,
"histogram_hour_local": [],
"parsed_time_count": 0,
"unparsed_wallclock_count": 0,
"dmesg_relative_line_count": 0,
"relative_boot_seconds_sample": [],
"source_note_zh": "未在扫描窗口内匹配到 OOM / oom-killer 相关行。",
}
tail = _kernel_tail_lines(max_lines)
if not tail:
return dict(empty)
blocks = extract_oom_blocks(tail)
extraction_mode = "sysak_blocks" if blocks else "loose_lines"
if blocks:
n_events = len(blocks)
n_lines = sum(len(b["lines"]) for b in blocks)
parsed_dts: List[datetime] = []
relative_secs: List[float] = []
unparsed_first = 0
block_wallclock: List[Optional[datetime]] = []
for b in blocks:
first = b["lines"][0] if b["lines"] else ""
dt, _src = _parse_oom_line_time(first)
block_wallclock.append(dt)
if dt is not None:
parsed_dts.append(dt)
elif _src.startswith("dmesg_relative_s:"):
try:
relative_secs.append(float(_src.split(":", 1)[1]))
except (ValueError, IndexError):
unparsed_first += 1
else:
unparsed_first += 1
primary, anchor_note = _pick_primary_block_index(block_wallclock, anchor=anchor)
idx_full = _full_log_indices(n_events, max_full_oom_logs, primary)
oom_digest = [extract_oom_digest(blocks[i]) for i in idx_full]
summaries_all: List[Dict[str, Any]] = []
for i, b in enumerate(blocks):
dt_b, _ = _block_start_dt(b["lines"])
first_line = b["lines"][0] if b["lines"] else ""
summaries_all.append(
{
"event_index": i,
"parsed_start_local": dt_b.isoformat(timespec="seconds") if dt_b else None,
"line_count": len(b["lines"]),
"cgroup": b.get("cgroup") or None,
"parent_cgroup": b.get("parent_cgroup") or None,
"scope_hint": _scope_hint_from_block(b),
"killed_comm": _killed_comm_from_lines(b["lines"]),
"incomplete": bool(b.get("incomplete", False)),
"first_line_preview": (first_line[:240] + "…") if len(first_line) > 240 else first_line,
}
)
collapsed = _collapse_similar_oom_summaries(summaries_all)
oom_events_summary = collapsed[-max_event_summaries:] if collapsed else []
hour_counter: Counter[str] = Counter()
for dt in parsed_dts:
buck = dt.replace(minute=0, second=0, microsecond=0)
hour_counter[buck.isoformat(timespec="seconds")] += 1
histogram = [{"bucket_start_local": k, "count": v} for k, v in sorted(hour_counter.items())]
time_range = None
if parsed_dts:
mn, mx = min(parsed_dts), max(parsed_dts)
time_range = {
"first_seen_local": mn.isoformat(timespec="seconds"),
"last_seen_local": mx.isoformat(timespec="seconds"),
}
note_parts = [
f"扫描内核日志尾部至多 {max_lines} 行;按 sysAK oomcheck 块解析到 {n_events} 次完整 OOM(共 {n_lines} 行内核文本);"
f"摘要先按同类(scope/comm/cgroup/小时)折叠为 {len(collapsed)} 条,再保留最近 {len(oom_events_summary)} 条(上限 {max_event_summaries});"
f"oom_digest 含 {len(oom_digest)} 条结构化摘要(上限 {max_full_oom_logs}"
+ (",按 --oom-at 锚定最近墙钟块" if anchor is not None else ",默认以最近一次为主")
+ ")。"
f"{len(parsed_dts)} 次在块首行解析到墙钟;{len(relative_secs)} 次块首为 dmesg 相对启动秒;{unparsed_first} 次块首无墙钟。"
]
if anchor_note:
note_parts.append(anchor_note)
if relative_secs and not parsed_dts:
note_parts.append(" 仅相对启动时间时按小时曲线为空,可换用 journal 或 dmesg -T。")
return {
"hit_count": n_events,
"oom_event_count": n_events,
"oom_lines_total": n_lines,
"extraction_mode": extraction_mode,
"oom_events_summary": oom_events_summary,
"oom_digest": oom_digest,
"time_range": time_range,
"histogram_hour_local": histogram,
"parsed_time_count": len(parsed_dts),
"unparsed_wallclock_count": unparsed_first,
"dmesg_relative_line_count": len(relative_secs),
"relative_boot_seconds_sample": relative_secs[-8:] if relative_secs else [],
"source_note_zh": "".join(note_parts),
}
hits = _oom_loose_line_hits(tail)
n = len(hits)
if n == 0:
return dict(empty)
parsed_dts = []
relative_secs = []
unparsed_wall = 0
hit_dts: List[Optional[datetime]] = []
for ln in hits:
dt, src = _parse_oom_line_time(ln)
hit_dts.append(dt)
if dt is not None:
parsed_dts.append(dt)
elif src.startswith("dmesg_relative_s:"):
try:
relative_secs.append(float(src.split(":", 1)[1]))
except (ValueError, IndexError):
unparsed_wall += 1
else:
unparsed_wall += 1
primary, anchor_note = _pick_primary_block_index(hit_dts, anchor=anchor)
idx_full = _full_log_indices(n, max_full_oom_logs, primary)
oom_digest = [{"raw_line": hits[i]} for i in idx_full]
summaries_all = []
for i, ln in enumerate(hits):
dt_b, _ = _parse_oom_line_time(ln)
summaries_all.append(
{
"event_index": i,
"parsed_start_local": dt_b.isoformat(timespec="seconds") if dt_b else None,
"line_count": 1,
"cgroup": None,
"parent_cgroup": None,
"scope_hint": "unknown",
"killed_comm": _killed_comm_from_lines([ln]),
"incomplete": False,
"first_line_preview": (ln[:240] + "…") if len(ln) > 240 else ln,
}
)
collapsed = _collapse_similar_oom_summaries(summaries_all)
oom_events_summary = collapsed[-max_event_summaries:] if collapsed else []
hour_counter = Counter()
for dt in parsed_dts:
b = dt.replace(minute=0, second=0, microsecond=0)
hour_counter[b.isoformat(timespec="seconds")] += 1
histogram = [{"bucket_start_local": k, "count": v} for k, v in sorted(hour_counter.items())]
time_range = None
if parsed_dts:
mn, mx = min(parsed_dts), max(parsed_dts)
time_range = {
"first_seen_local": mn.isoformat(timespec="seconds"),
"last_seen_local": mx.isoformat(timespec="seconds"),
}
note_parts = [
f"扫描尾部 {max_lines} 行:未形成 invoked oom-killer 起点的完整块,退回宽松匹配共 {n} 条相关行;"
f"摘要先折叠为 {len(collapsed)} 条,再保留最近 {len(oom_events_summary)} 条(上限 {max_event_summaries}),"
f"oom_digest {len(oom_digest)} 条(上限 {max_full_oom_logs})。"
f"{len(parsed_dts)} 条解析到墙钟,{len(relative_secs)} 条相对启动秒,{unparsed_wall} 条无时间。"
]
if anchor_note:
note_parts.append(anchor_note)
if relative_secs and not parsed_dts:
note_parts.append(" 仅相对启动时间时按小时曲线为空。")
return {
"hit_count": n,
"oom_event_count": 0,
"oom_lines_total": None,
"extraction_mode": extraction_mode,
"oom_events_summary": oom_events_summary,
"oom_digest": oom_digest,
"time_range": time_range,
"histogram_hour_local": histogram,
"parsed_time_count": len(parsed_dts),
"unparsed_wallclock_count": unparsed_wall,
"dmesg_relative_line_count": len(relative_secs),
"relative_boot_seconds_sample": relative_secs[-8:] if relative_secs else [],
"source_note_zh": "".join(note_parts),
}
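The time normalization described for `normalize_oomcheck_time_param` (turning a time or a `start~end` range into Unix-second strings so that only the characters `[a-zA-Z0-9_.~-]` remain) can be sketched on its own. This is a reduced illustration, not the module's implementation: `to_unix_segment` and `normalize_time_param` are hypothetical names, and only plain seconds, fractional seconds, and ISO strings with an explicit UTC offset are handled here so that the output does not depend on the local timezone.

```python
import re
from datetime import datetime
from typing import Optional


def to_unix_segment(seg: str) -> str:
    """Convert one segment to Unix seconds; unparseable input is returned unchanged."""
    seg = seg.strip()
    if re.fullmatch(r"\d+", seg):
        return seg
    if re.fullmatch(r"\d+\.\d+", seg):
        return str(int(float(seg)))  # drop the fractional part
    try:
        dt = datetime.fromisoformat(seg.replace("Z", "+00:00"))
    except ValueError:
        return seg
    return str(int(dt.timestamp()))


def normalize_time_param(raw: Optional[str]) -> str:
    """Normalize a single time or a `start~end` range; segments stay joined by ~."""
    t = (raw or "").strip()
    if not t:
        return ""
    if "~" in t:
        left, right = t.split("~", 1)
        return f"{to_unix_segment(left)}~{to_unix_segment(right)}"
    return to_unix_segment(t)


single = normalize_time_param("1700000000.9")
ranged = normalize_time_param("2024-01-01T00:00:00+00:00~2024-01-01T00:01:00Z")
print(single, ranged)
```

Passing unparseable segments through unchanged follows the behaviour described in the docstring above: the value may still be rejected downstream, but the original text remains visible for troubleshooting.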
FILE:scripts/sysom_cli/memory/lib/remote_capabilities.py
# -*- coding: utf-8 -*-
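"""
User-facing descriptions of what the remote specialties add; wording is aligned with references/diagnoses in the skill.
Contains no paths outside the repository; reused by classify and the memory subcommands.
"""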
from __future__ import annotations
from typing import Dict
def remote_analysis_value_map() -> Dict[str, str]:
"""service_name -> short description (in Chinese)."""
return {
"memgraph": (
"SysOM memgraph:在目标 ECS 上做内存全景采集与汇总,含整机/应用维度的内存组成,"
"强于仅看本机 /proc 粗值。"
),
"oomcheck": (
"SysOM oomcheck:在目标机结合 memgraph 等路径上的输出与日志侧信息,"
"定位 OOM / oom-killer 事件与根因链。"
),
"javamem": (
"SysOM javamem:JVM 堆、GC、Java 进程内存等语言侧专项分析。"
),
}
FILE:scripts/sysom_cli/memory/lib/shared_invoke_args.py
# -*- coding: utf-8 -*-
"""Optional deep-diagnosis CLI arguments shared by the memory subcommands (the same OpenAPI-side options as the io|net|load specialty subcommands)."""
from __future__ import annotations
from typing import Any, Dict, List, Tuple
# (flags, kwargs) pairs for @command_metadata args= (covers --deep-diagnosis and the options shared with the remote specialties)
MEMORY_DEEP_DIAGNOSIS_ARGS: List[Tuple[Any, Dict[str, Any]]] = [
    (
        ["--verbose-envelope"],
        {
            "action": "store_true",
            "help": "Emit the expanded agent.summary and agent.next (default is compact to save tokens; conclusions are in data.*).",
        },
    ),
    (
        ["--deep-diagnosis"],
        {
            "action": "store_true",
            "help": "After the quick check, continue with deep diagnosis in the same call (same OpenAPI path as io|net|load).",
        },
    ),
    (
        ["--channel"],
        {"default": "ecs", "help": "Diagnosis channel; defaults to ecs"},
    ),
    (
        ["--timeout"],
        {"type": int, "default": 300, "help": "Timeout in seconds for polling GetDiagnosisResult"},
    ),
    (
        ["--poll-interval"],
        {"type": int, "default": 1, "help": "Polling interval in seconds"},
    ),
    (
        ["--region"],
        {"default": None, "help": "Merged into params.region; same as the specialty invoke subcommands"},
    ),
    (
        ["--instance"],
        {"default": None, "help": "Merged into params.instance; same as the specialty invoke subcommands"},
    ),
    (
        ["--params"],
        {"default": None, "help": "params as a JSON string"},
    ),
    (
        ["--params-file"],
        {"default": None, "help": "Path to a params JSON file"},
    ),
]
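The `(flags, kwargs)` pairs above are shaped for `argparse`. How `@command_metadata` consumes them is not shown in this file, so the sketch below is an assumption: it registers a shortened copy of the list through `ArgumentParser.add_argument(*flags, **kwargs)`, which is the standard way to apply such pairs. `build_parser` and the `memory-demo` program name are illustrative only.

```python
import argparse
from typing import Any, Dict, List, Tuple

# Shortened, illustrative copy of the shared argument specs.
ARG_SPECS: List[Tuple[List[str], Dict[str, Any]]] = [
    (["--deep-diagnosis"], {"action": "store_true", "help": "continue with deep diagnosis"}),
    (["--channel"], {"default": "ecs", "help": "diagnosis channel"}),
    (["--timeout"], {"type": int, "default": 300, "help": "polling timeout in seconds"}),
]


def build_parser(specs: List[Tuple[List[str], Dict[str, Any]]]) -> argparse.ArgumentParser:
    """Register every (flags, kwargs) pair on a fresh parser (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="memory-demo")
    for flags, kwargs in specs:
        parser.add_argument(*flags, **kwargs)
    return parser


ns = build_parser(ARG_SPECS).parse_args(["--deep-diagnosis", "--timeout", "60"])
print(ns.deep_diagnosis, ns.channel, ns.timeout)
```

Note that `argparse` turns `--deep-diagnosis` into the attribute `deep_diagnosis`, which is why command code reads it with `getattr(ns, "deep_diagnosis", False)`.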
FILE:scripts/sysom_cli/memory/memgraph/__init__.py
# memory memgraph
FILE:scripts/sysom_cli/memory/memgraph/command.py
# -*- coding: utf-8 -*-
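"""High memory usage / unclear composition / TCP and socket memory: local meminfo plus a coarse RSS sample; SysOM memgraph adds the panorama and socket/TCP collection (see the dedicated reference)."""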
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict, List
from sysom_cli.core.base import BaseCommand, ExecutionMode
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.schema import agent_block, envelope
from sysom_cli.memory.lib.classify_engine import meminfo_quick_facts, memory_ps_top_sample
from sysom_cli.memory.lib.envelope_memory import (
next_steps_struct,
)
from sysom_cli.memory.lib.memory_envelope_finalize import finalize_memory_envelope
from sysom_cli.memory.lib.memory_remote_helpers import (
run_memory_deep_diagnosis_local_first,
run_memory_remote_invoke,
)
from sysom_cli.memory.lib.remote_capabilities import remote_analysis_value_map
from sysom_cli.memory.lib.shared_invoke_args import MEMORY_DEEP_DIAGNOSIS_ARGS
def _limits_memgraph() -> str:
return (
"本机仅 /proc/meminfo 与进程 RSS 粗采样;整机/应用维度的内存大图与组成拆解需 "
"SysOM memgraph 深度诊断。"
"MemAvailable 占比偏高**不能**单独证明「无内存问题」:尖峰、cgroup/容器限额、"
"JVM/语言运行时堆、延迟回收缓存等,均可能使用户或应用仍感知「内存不够」。"
"本输出人读与自动化共用:若**要继续用 SysOM 做全景/专项**,应在技能根按 agent.next 执行 "
"`memory memgraph --deep-diagnosis`;若仅查看本机粗检、不打算调用 SysOM,可忽略 agent.next。"
)
@command_metadata(
name="memgraph",
help=(
"本机快速排查:内存用量摘要与高 RSS 采样;深度诊断走 SysOM memgraph(整机/应用内存组成);"
"可选 --deep-diagnosis。"
),
subsystem="memory",
args=list(MEMORY_DEEP_DIAGNOSIS_ARGS),
)
class MemgraphHintCommand(BaseCommand):
@property
def command_name(self) -> str:
return "memgraph"
@property
def supported_modes(self) -> Dict[str, bool]:
return {
ExecutionMode.LOCAL: True,
ExecutionMode.REMOTE: True,
ExecutionMode.HYBRID: False,
}
def execute_local(self, ns: Namespace) -> Dict[str, Any]:
recommended = "memgraph"
remote_analysis_value = remote_analysis_value_map()
if getattr(ns, "deep_diagnosis", False):
return run_memory_deep_diagnosis_local_first(
recommended=recommended,
memory_action="memory_memgraph_hint",
ns=ns,
remote_analysis_value=remote_analysis_value,
verbose_summary=(
"深度诊断:已发起 SysOM memgraph 专项。"
"(未跑本机 meminfo/RSS 快照;本路径以远程专项为主。)"
),
)
facts = meminfo_quick_facts()
top = memory_ps_top_sample()
ratio = facts.get("mem_available_ratio")
total_kb = facts.get("mem_total_kb") or 0  # also guards against an explicit None value
if total_kb <= 0:
summary = (
"未能读取本机 MemTotal;仍可直接发起 SysOM memgraph 在目标机采集内存全景大图。"
)
confidence = 0.4
cats = ["memory_panorama"]
elif ratio is not None and ratio < 0.08:
summary = (
f"MemAvailable 占比偏低(约 {ratio:.1%}),适合用 SysOM memgraph 做内存全景拆解。"
"本机粗检不能替代 SysOM 采集;拟用 SysOM 时须按 agent.next 发起 --deep-diagnosis。"
)
confidence = 0.72
cats = ["memory_pressure", "memory_panorama"]
else:
summary = (
"本机 MemAvailable 比例尚可,仍**不**排除用户感知的内存压力或应用侧内存错误;"
"若本次会话按「内存不够/内存高/OOM 相关」排障,须用 SysOM memgraph 深度复核。"
"请按 agent.next 执行 `memory memgraph --deep-diagnosis`,勿仅凭本封粗检下结论。"
)
confidence = 0.62
cats = ["memory_panorama", "memory_deep_followup_recommended"]
sample = [{"comm": c, "rss_kb": rss} for c, _, rss in top[:8]]
next_actions = next_steps_struct(recommended, ns)
agent = agent_block(
"normal",
summary,
findings=[
{
"kind": "meminfo_summary",
"mem_total_kb": facts.get("mem_total_kb"),
"mem_available_ratio": facts.get("mem_available_ratio"),
},
],
next_steps=next_actions,
)
data: Dict[str, Any] = {
"recommended_service_name": recommended,
"meminfo_facts": facts,
"rss_top_sample": sample,
}
out = envelope(
action="memory_memgraph_hint",
ok=True,
agent=agent,
data=data,
execution={"subsystem": "memory", "phase": "quick_review"},
)
return finalize_memory_envelope(out, ns, verbose_summary=summary)
def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
recommended = "memgraph"
remote_analysis_value = remote_analysis_value_map()
rsummary = "深度诊断模式:直接发起 memgraph(未跑本机 meminfo/RSS 快照)。"
out = envelope(
action="memory_memgraph_hint",
ok=True,
agent=agent_block("normal", rsummary, findings=[], next_steps=[]),
data={
"recommended_service_name": recommended,
"remote_analysis_value": remote_analysis_value,
},
execution={"subsystem": "memory", "mode": "remote", "phase": "remote_invoke"},
)
return run_memory_remote_invoke(
out, recommended, ns, verbose_summary=rsummary
)
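The quick check in `execute_local` above branches on the ratio of MemAvailable to MemTotal, with 0.08 as the low-memory cutoff. The following is a minimal standalone sketch of that calculation. `parse_meminfo`, `available_ratio`, `classify`, and the labels they return are hypothetical helpers for illustration only; they are not the actual `meminfo_quick_facts` implementation, which is not shown here.

```python
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value}."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_ratio(fields: Dict[str, int]) -> Optional[float]:
    """Return MemAvailable / MemTotal, or None when either value is unusable."""
    total = fields.get("MemTotal") or 0
    if total <= 0 or "MemAvailable" not in fields:
        return None
    return fields["MemAvailable"] / total


def classify(ratio: Optional[float], low_threshold: float = 0.08) -> str:
    """Map the ratio to an illustrative label (labels are assumptions)."""
    if ratio is None:
        return "unknown"
    return "pressure" if ratio < low_threshold else "follow_up_recommended"


sample = "MemTotal:       16000000 kB\nMemAvailable:    8000000 kB\n"
ratio = available_ratio(parse_meminfo(sample))
print(ratio, classify(ratio))
```

Note that even the non-pressure branch recommends a follow-up: as the command's own summary text says, a healthy MemAvailable ratio does not rule out memory pressure felt by the application.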
FILE:scripts/sysom_cli/memory/oom/__init__.py
# memory oom: lightweight local detection of OOM clues
FILE:scripts/sysom_cli/memory/oom/command.py
# -*- coding: utf-8 -*-
"""Local OOM clues: lightweight kernel log scan; recommends the SysOM oomcheck specialty."""
from __future__ import annotations
import json
from argparse import Namespace
from pathlib import Path
from typing import Any, Dict, List, Tuple
from sysom_cli.core.base import BaseCommand, ExecutionMode
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.schema import agent_block, envelope
from sysom_cli.memory.lib.envelope_memory import (
next_steps_struct,
oom_diagnosis_agent_brief_zh,
oom_diagnosis_invoke_extra_purpose_zh,
)
from sysom_cli.memory.lib.memory_envelope_finalize import finalize_memory_envelope
from sysom_cli.memory.lib.memory_remote_helpers import (
run_memory_deep_diagnosis_local_first,
run_memory_remote_invoke,
)
from sysom_cli.memory.lib.oom_quick import (
analyze_oom_local,
normalize_oomcheck_time_param,
)
from sysom_cli.memory.lib.remote_capabilities import remote_analysis_value_map
from sysom_cli.memory.lib.shared_invoke_args import MEMORY_DEEP_DIAGNOSIS_ARGS
def merge_oom_time_into_namespace(ns: Namespace) -> Namespace:
    """Write `--oom-time` into params.time for remote oomcheck / --deep-diagnosis calls."""
    oom_time = getattr(ns, "oom_time", None)
    if not oom_time or not str(oom_time).strip():
        return ns
    params: Dict[str, Any] = {}
    try:
        if getattr(ns, "params", None):
            params = json.loads(ns.params)
        elif getattr(ns, "params_file", None):
            raw = Path(ns.params_file).read_text(encoding="utf-8")
            params = json.loads(raw)
    except (json.JSONDecodeError, OSError, TypeError):
        params = {}
    if not isinstance(params, dict):
        # Valid JSON that is not an object (e.g. a list) cannot carry a "time" key.
        params = {}
    params["time"] = normalize_oomcheck_time_param(str(oom_time).strip())
    kw = vars(ns).copy()
    kw["params"] = json.dumps(params, ensure_ascii=False)
    kw["params_file"] = None
    return Namespace(**kw)
OOM_COMMAND_EXTRA_ARGS: List[Tuple[Any, Dict[str, Any]]] = [
(
["--oom-at"],
{
"dest": "oom_at",
"default": None,
"help": "本机 quick:锚定墙钟时间,选取时间上最接近的 OOM 块作为全文主选(Unix 秒 / ISO / journal 风格月日时分秒)",
},
),
(
["--oom-time"],
{
"dest": "oom_time",
"default": None,
"help": (
"远程 oomcheck:合并到 params.time(可与 --params / --params-file 叠加);"
"支持 ISO、日期+时间、Unix 秒、journal 风格等,发起前会转为 Unix 秒以满足 OpenAPI 字符集"
),
},
),
(
["--max-oom-summaries"],
{
"dest": "max_oom_summaries",
"type": int,
"default": 64,
"help": "折叠同类摘要后,oom_events_summary 最多保留最近若干条(默认 64)",
},
),
(
["--max-oom-full-logs"],
{
"dest": "max_oom_full_logs",
"type": int,
"default": 1,
"help": "本机 oom_logs 全文条数上限(默认 1;与 --oom-at 组合时以锚定块为末端向前取)",
},
),
]
@command_metadata(
name="oom",
help=(
"本机快速排查:扫描内核日志中的 OOM / oom-killer 行;建议 SysOM oomcheck;"
"可选 --deep-diagnosis:快速排查后继续深度诊断。"
),
subsystem="memory",
args=list(MEMORY_DEEP_DIAGNOSIS_ARGS) + OOM_COMMAND_EXTRA_ARGS,
)
class OomHintCommand(BaseCommand):
@property
def command_name(self) -> str:
return "oom"
@property
def supported_modes(self) -> Dict[str, bool]:
return {
ExecutionMode.LOCAL: True,
ExecutionMode.REMOTE: True,
ExecutionMode.HYBRID: False,
}
def execute_local(self, ns: Namespace) -> Dict[str, Any]:
recommended = "oomcheck"
remote_analysis_value = remote_analysis_value_map()
invoke_ns = merge_oom_time_into_namespace(ns)
if getattr(ns, "deep_diagnosis", False):
return run_memory_deep_diagnosis_local_first(
recommended=recommended,
memory_action="memory_oom_hint",
ns=invoke_ns,
remote_analysis_value=remote_analysis_value,
verbose_summary=(
"深度诊断:已发起 SysOM oomcheck 专项。"
"(未跑本机内核日志扫描;本路径以远程专项为主。)"
),
)
oom_local = analyze_oom_local(
oom_at=getattr(ns, "oom_at", None),
max_event_summaries=int(getattr(ns, "max_oom_summaries", 64) or 64),
max_full_oom_logs=int(getattr(ns, "max_oom_full_logs", 1) or 0),
)
n = oom_local["hit_count"]
lt = oom_local.get("oom_lines_total")
has_signal = n > 0
if has_signal:
if oom_local.get("extraction_mode") == "sysak_blocks" and lt is not None:
summary = f"解析到 {n} 次 OOM 事件({lt} 行内核文本)。"
else:
summary = f"宽松匹配到 {n} 条 OOM 相关内核行。"
else:
summary = "近期内核日志未匹配到 OOM 事件。"
tr = oom_local.get("time_range")
if has_signal and tr:
summary += f" 时间:{tr['first_seen_local']} ~ {tr['last_seen_local']}。"
if has_signal and n > 1:
summary += " 多次 OOM,可用 --oom-at/--oom-time 指定某次。"
next_actions = next_steps_struct(
recommended,
ns,
diagnosis_extra_purpose_zh=oom_diagnosis_invoke_extra_purpose_zh(n),
)
findings: List[Dict[str, Any]] = [
{"kind": "oom_kernel_hits", "oom_event_count": n},
]
agent = agent_block(
"normal",
summary,
findings=findings,
next_steps=next_actions,
)
data: Dict[str, Any] = {
"recommended_service_name": recommended,
"oom_signal": has_signal,
"hit_count": n,
"oom_local": oom_local,
}
out = envelope(
action="memory_oom_hint",
ok=True,
agent=agent,
data=data,
execution={"subsystem": "memory", "phase": "quick_review"},
)
return finalize_memory_envelope(out, ns, verbose_summary=summary)
def execute_remote(self, ns: Namespace) -> Dict[str, Any]:
recommended = "oomcheck"
remote_analysis_value = remote_analysis_value_map()
rsummary = "深度诊断模式:直接发起 oomcheck(未跑本机内核日志扫描)。"
out = envelope(
action="memory_oom_hint",
ok=True,
agent=agent_block("normal", rsummary, findings=[], next_steps=[]),
data={
"recommended_service_name": recommended,
"remote_analysis_value": remote_analysis_value,
},
execution={"subsystem": "memory", "mode": "remote", "phase": "remote_invoke"},
)
return run_memory_remote_invoke(
out, recommended, merge_oom_time_into_namespace(ns), verbose_summary=rsummary
)
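Both `execute_local` and `execute_remote` above pass the namespace through `merge_oom_time_into_namespace` so that `--oom-time` ends up inside the `params` JSON. The following standalone sketch shows that merge pattern under stated assumptions: `merge_time_into_params` is a hypothetical stand-in, the default `normalize` callable only strips whitespace (the real code calls `normalize_oomcheck_time_param`), and discarding non-object JSON is an assumption of this sketch.

```python
import json
from argparse import Namespace
from pathlib import Path
from typing import Any, Callable, Dict


def merge_time_into_params(
    ns: Namespace,
    time_value: str,
    normalize: Callable[[str], str] = str.strip,
) -> Namespace:
    """Return a copy of ns whose params JSON carries a "time" key."""
    params: Dict[str, Any] = {}
    try:
        # An inline --params string takes precedence over --params-file.
        if getattr(ns, "params", None):
            params = json.loads(ns.params)
        elif getattr(ns, "params_file", None):
            params = json.loads(Path(ns.params_file).read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError, TypeError):
        params = {}  # unreadable input is dropped rather than raised
    if not isinstance(params, dict):
        params = {}  # assumption: a non-object JSON value is also discarded
    params["time"] = normalize(time_value)
    merged = vars(ns).copy()
    merged["params"] = json.dumps(params, ensure_ascii=False)
    merged["params_file"] = None  # the merged JSON string is now the only source
    return Namespace(**merged)


ns = Namespace(params='{"pid": 42}', params_file=None, region="cn-hangzhou")
out = merge_time_into_params(ns, " 1700000000 ")
print(out.params)
```

Building a new `Namespace` instead of mutating `ns` keeps the caller's arguments intact, which matters when the same namespace is reused to build the `agent.next` hints.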
FILE:scripts/sysom_cli/net/__init__.py
# -*- coding: utf-8 -*-
"""Network SysOM specialties (thin wrappers around service_name)."""
FILE:scripts/sysom_cli/net/netjitter/__init__.py
# -*- coding: utf-8 -*-
FILE:scripts/sysom_cli/net/netjitter/command.py
# -*- coding: utf-8 -*-
from __future__ import annotations
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.specialty_args import SPECIALTY_INVOKE_ARGS
from sysom_cli.lib.specialty_command import BaseServiceSpecialtyCommand
@command_metadata(
name="netjitter",
help="SysOM 专项 netjitter:网络抖动诊断。params 见 references/diagnoses/netjitter.md",
subsystem="net",
args=list(SPECIALTY_INVOKE_ARGS),
)
class NetjitterCommand(BaseServiceSpecialtyCommand):
    SERVICE_NAME = "netjitter"
FILE:scripts/sysom_cli/net/packetdrop/__init__.py
# -*- coding: utf-8 -*-
FILE:scripts/sysom_cli/net/packetdrop/command.py
# -*- coding: utf-8 -*-
from __future__ import annotations
from sysom_cli.core.registry import command_metadata
from sysom_cli.lib.specialty_args import SPECIALTY_INVOKE_ARGS
from sysom_cli.lib.specialty_command import BaseServiceSpecialtyCommand
@command_metadata(
name="packetdrop",
help="SysOM 专项 packetdrop:网络丢包诊断。params 见 references/diagnoses/packetdrop.md",
subsystem="net",
args=list(SPECIALTY_INVOKE_ARGS),
)
class PacketdropCommand(BaseServiceSpecialtyCommand):
    SERVICE_NAME = "packetdrop"
FILE:scripts/sysom_cli/precheck/__init__.py
# -*- coding: utf-8 -*-
"""
Environment precheck command module.
Checks the Alibaba Cloud authentication configuration and verifies SysOM API access.
"""
FILE:scripts/sysom_cli/precheck/command.py
# -*- coding: utf-8 -*-
"""
Precheck command implementation.
Environment precheck: verifies the Alibaba Cloud authentication configuration and SysOM API permissions.
"""
from __future__ import annotations
from argparse import Namespace
from typing import Any, Dict
from sysom_cli.core.base import BaseCommand, ExecutionMode
from sysom_cli.core.registry import command_metadata
@command_metadata(
    name="precheck",
    help="环境预检查:验证阿里云认证配置和 SysOM API 权限",
    args=[],  # precheck takes no extra arguments
)
class PrecheckCommand(BaseCommand):
    """Environment precheck command."""

    @property
    def command_name(self) -> str:
        return "precheck"

    @property
    def supported_modes(self) -> Dict[str, bool]:
        return {
            ExecutionMode.LOCAL: True,
            ExecutionMode.REMOTE: False,
            ExecutionMode.HYBRID: False,
        }

    def execute_local(self, ns: Namespace) -> Dict[str, Any]:
        """
        Local mode: run the environment precheck.

        Checks the Alibaba Cloud authentication configuration (AKSK / ECS RAM Role)
        and verifies access to the SysOM API.
        Returns the standard JSON envelope.
        """
        from sysom_cli.lib.auth import run_precheck
        from sysom_cli.lib.precheck_envelope import envelope_from_precheck_result

        return envelope_from_precheck_result(run_precheck())
FILE:scripts/sysom_cli/tests/__init__.py
# Test package for sysom_cli
FILE:scripts/sysom_cli/tests/test_all_remote_diagnosis_envelope_results.py
# -*- coding: utf-8 -*-
"""With a mocked success response, every SysOM specialty envelope must carry a non-empty data.remote.result (guards against regressions such as double finalize)."""
from __future__ import annotations
import unittest
from argparse import Namespace
from typing import List
from sysom_cli.core import registry as reg
from sysom_cli.diagnosis.invoke.command import DiagnosisInvokeCommand
from sysom_cli.lib.diagnosis_backend import namespace_for_diagnosis_invoke
from sysom_cli.lib.diagnosis_helper import DiagnoseResultCode, DiagnosisRequest, DiagnosisResponse
def _ensure_registry() -> None:
reg.CommandRegistry._discovered = False
reg.CommandRegistry._commands.clear()
reg.CommandRegistry._metadata.clear()
reg.CommandRegistry._command_subsystem.clear()
reg.CommandRegistry.discover_commands(top_level=True)
reg.CommandRegistry.discover_commands()
def _specialty_ns() -> Namespace:
return Namespace(
channel="ecs",
region="cn-hangzhou",
instance="i-mockprobe",
params=None,
params_file=None,
timeout=30,
poll_interval=1,
verbose_envelope=False,
)
async def _fake_diagnosis_execute(self: object, req: DiagnosisRequest) -> DiagnosisResponse:
return DiagnosisResponse(
code=DiagnoseResultCode.SUCCESS,
message="",
task_id="mock-task-id",
result={"mock_ok": True, "service_name": req.service_name},
)
class AllRemoteDiagnosisEnvelopeResultTests(unittest.TestCase):
@classmethod
def setUpClass(cls) -> None:
_ensure_registry()
# io / net / load have been trimmed; only the memory diagnosis specialties remain
SPECIALTY_CLI_NAMES: List[str] = []
# memory deep diagnosis reuses the same invoke backend; these are its service_name values
MEMORY_INVOKE_SERVICE_NAMES: List[str] = ["memgraph", "oomcheck", "javamem"]
def test_specialty_commands_remote_result_populated_under_mock(self) -> None:
from unittest.mock import patch
with patch(
"sysom_cli.diagnosis.invoke.command.DiagnosisMCPHelper.execute",
new=_fake_diagnosis_execute,
):
for name in self.SPECIALTY_CLI_NAMES:
with self.subTest(command=name):
cmd = reg.CommandRegistry.get(name)
out = cmd.execute_remote(_specialty_ns())
self.assertTrue(out.get("ok"), msg=f"{name}: {out}")
remote = (out.get("data") or {}).get("remote") or {}
self.assertIsInstance(remote, dict)
res = remote.get("result")
self.assertIsNotNone(
res,
msg=f"{name}: data.remote.result 不应为 null,实际 remote keys={list(remote)}",
)
self.assertIsInstance(res, dict)
self.assertEqual(res.get("service_name"), name)
def test_invoke_command_all_service_names_remote_result(self) -> None:
from unittest.mock import patch
all_services = self.SPECIALTY_CLI_NAMES + self.MEMORY_INVOKE_SERVICE_NAMES
with patch(
"sysom_cli.diagnosis.invoke.command.DiagnosisMCPHelper.execute",
new=_fake_diagnosis_execute,
):
for svc in all_services:
with self.subTest(service_name=svc):
ns = namespace_for_diagnosis_invoke(svc, _specialty_ns())
out = DiagnosisInvokeCommand().execute_remote(ns)
self.assertTrue(out.get("ok"), msg=f"{svc}: {out}")
remote = (out.get("data") or {}).get("remote") or {}
res = remote.get("result")
self.assertIsNotNone(res, msg=f"{svc}: data.remote.result 缺失")
self.assertIsInstance(res, dict)
self.assertEqual(res.get("service_name"), svc)
sub = reg.CommandRegistry.get_subsystem(svc)
if sub:
self.assertEqual((out.get("execution") or {}).get("subsystem"), sub)
if __name__ == "__main__":
unittest.main()
FILE:scripts/sysom_cli/tests/test_auth_sts_token.py
# -*- coding: utf-8 -*-
from __future__ import annotations
import json
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch
from sysom_cli.lib.auth import check_aliyun_config, check_env_credentials
class AuthStsTokenTests(unittest.TestCase):
def test_env_credentials_support_sts_token(self) -> None:
with patch.dict(
"os.environ",
{
"ALIBABA_CLOUD_ACCESS_KEY_ID": "STS.ID",
"ALIBABA_CLOUD_ACCESS_KEY_SECRET": "secret",
"ALIBABA_CLOUD_SECURITY_TOKEN": "token-abc",
},
clear=True,
):
out = check_env_credentials()
self.assertTrue(out["available"])
self.assertEqual(out["method"], "环境变量(STS Token)")
self.assertEqual(out["credentials"]["ak_id"], "STS.ID")
self.assertEqual(out["credentials"]["ak_secret"], "secret")
self.assertEqual(out["credentials"]["security_token"], "token-abc")
def test_config_support_sts_token_mode(self) -> None:
with tempfile.TemporaryDirectory() as tmp:
home = Path(tmp)
cfg_path = home / ".aliyun" / "config.json"
cfg_path.parent.mkdir(parents=True, exist_ok=True)
cfg_path.write_text(
json.dumps(
{
"current": "default",
"profiles": [
{
"name": "default",
"mode": "StsToken",
"access_key_id": "STS.ID",
"access_key_secret": "secret",
"sts_token": "token-xyz",
}
],
}
),
encoding="utf-8",
)
with patch.dict("os.environ", {"HOME": str(home)}):
out = check_aliyun_config()
self.assertTrue(out["available"])
self.assertEqual(out["method"], "配置文件(StsToken)")
self.assertEqual(out["credentials"]["ak_id"], "STS.ID")
self.assertEqual(out["credentials"]["ak_secret"], "secret")
self.assertEqual(out["credentials"]["security_token"], "token-xyz")
if __name__ == "__main__":
unittest.main()
FILE:scripts/sysom_cli/tests/test_configure_non_interactive.py
# -*- coding: utf-8 -*-
from __future__ import annotations
import tempfile
import unittest
from argparse import Namespace
from pathlib import Path
from unittest.mock import patch
from sysom_cli.configure.command import ConfigureCommand
class ConfigureNonInteractiveTests(unittest.TestCase):
def test_eof_returns_non_interactive_envelope_with_settings_hint(self) -> None:
with tempfile.TemporaryDirectory() as tmp:
home = Path(tmp)
fake_cfg = home / ".aliyun" / "config.json"
fake_cfg.parent.mkdir(parents=True, exist_ok=True)
fake_cfg.write_text("{}", encoding="utf-8")
def fake_input(_: str = "") -> str:
raise EOFError()
with patch.dict("os.environ", {"HOME": str(home)}):
with patch("builtins.input", fake_input):
cmd = ConfigureCommand()
out = cmd.execute_local(Namespace())
self.assertFalse(out["ok"])
self.assertEqual(out["action"], "configure")
self.assertEqual(out["error"]["code"], "non_interactive_shell")
self.assertIn("/settings", out["agent"]["summary"])
self.assertIn("PTY", out["agent"]["summary"])
rem = "\n".join(out["data"]["remediation"])
self.assertIn("/settings", rem)
self.assertIn("/bash", rem)
self.assertIn("authentication.md", rem)
self.assertIn("/settings", out["data"]["guidance"]["credential_policy"])
if __name__ == "__main__":
unittest.main()
FILE:scripts/sysom_cli/tests/test_diagnosis_result_extract.py
# -*- coding: utf-8 -*-
from __future__ import annotations
import unittest
from sysom_cli.lib.diagnosis_helper import _extract_get_diagnosis_result_payload
class ExtractGetDiagnosisResultTests(unittest.TestCase):
    def test_prefers_nonempty_result(self) -> None:
        d = {"status": "success", "result": {"io": 1}}
        self.assertEqual(_extract_get_diagnosis_result_payload(d), {"io": 1})

    def test_empty_result_uses_result_capital(self) -> None:
        d = {"status": "success", "result": {}, "Result": {"disk": 2}}
        self.assertEqual(_extract_get_diagnosis_result_payload(d), {"disk": 2})

    def test_fallback_sibling_keys(self) -> None:
        d = {
            "status": "success",
            "task_id": "t1",
            "iofsstat_payload": {"x": 3},
        }
        self.assertEqual(_extract_get_diagnosis_result_payload(d), {"x": 3})

    def test_multiple_non_meta_becomes_dict(self) -> None:
        d = {"status": "success", "a": 1, "b": 2}
        self.assertEqual(_extract_get_diagnosis_result_payload(d), {"a": 1, "b": 2})


if __name__ == "__main__":
    unittest.main()
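The four tests above pin down a fallback order for locating the payload in a GetDiagnosisResult-style response. The following sketch is one way to satisfy exactly those four cases. It is not the real `_extract_get_diagnosis_result_payload`, whose source is not shown here; `extract_payload` and the contents of `META_KEYS` are assumptions made for illustration.

```python
from typing import Any, Dict

# Assumption: these keys describe the task itself rather than its payload.
META_KEYS = frozenset({"status", "task_id", "code", "message", "result", "Result"})


def extract_payload(response: Dict[str, Any]) -> Any:
    """Pick the diagnosis payload out of a response dict."""
    # 1. Prefer a non-empty "result", then a non-empty "Result".
    for key in ("result", "Result"):
        value = response.get(key)
        if value:
            return value
    # 2. Otherwise fall back to whatever non-metadata keys remain.
    extras = {k: v for k, v in response.items() if k not in META_KEYS}
    if len(extras) == 1:
        # A single sibling key is treated as the payload itself.
        return next(iter(extras.values()))
    # Several siblings are returned together as one dict.
    return extras


print(extract_payload({"status": "success", "result": {}, "Result": {"disk": 2}}))
```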
FILE:scripts/sysom_cli/tests/test_memory_deep_envelope.py
# -*- coding: utf-8 -*-
"""Envelope shape of the local memory --deep-diagnosis entry point; gomemdump removal and Go→memgraph classification."""
from __future__ import annotations
import copy
import unittest
from argparse import Namespace
from unittest.mock import patch
from sysom_cli.lib.diagnosis_backend import DiagnosisBackend, set_diagnosis_backend
from sysom_cli.memory.lib.classify_engine import run_classify
from sysom_cli.memory.lib.envelope_memory import recommended_specialty_cli_command
from sysom_cli.memory.memgraph.command import MemgraphHintCommand
from sysom_cli.memory.oom.command import OomHintCommand
_EMPTY_OOM_LOCAL: dict = {
"hit_count": 0,
"oom_event_count": 0,
"oom_lines_total": None,
"extraction_mode": "none",
"oom_events_summary": [],
"oom_digest": [],
"time_range": None,
"histogram_hour_local": [],
"parsed_time_count": 0,
"unparsed_wallclock_count": 0,
"dmesg_relative_line_count": 0,
"relative_boot_seconds_sample": [],
"source_note_zh": "",
}
def _ok_invoke_payload() -> dict:
return {
"ok": True,
"action": "diagnosis_invoke",
"data": {
"task_id": "task-xyz",
"channel": "ecs",
"region": "cn-test",
"result": {"probe": 1},
},
}
class _FakeInvokeBackend(DiagnosisBackend):
def __init__(self, payload: dict) -> None:
self._payload = payload
def invoke_specialty(self, service_name: str, ns: Namespace) -> dict:
return self._payload
class MemoryDeepEnvelopeTests(unittest.TestCase):
def tearDown(self) -> None:
set_diagnosis_backend(None)
@patch("sysom_cli.lib.precheck_gate.remote_precheck_gate", return_value=(True, None))
def test_memgraph_deep_local_envelope(self, _mock_gate: object) -> None:
set_diagnosis_backend(_FakeInvokeBackend(_ok_invoke_payload()))
ns = Namespace(
deep_diagnosis=True,
channel="ecs",
timeout=300,
verbose_envelope=False,
)
out = MemgraphHintCommand().execute_local(ns)
self.assertTrue(out["ok"])
self.assertEqual(out["execution"].get("phase"), "invoke_diagnosis")
remote = out["data"]["remote"]
self.assertTrue(remote["ok"])
self.assertEqual(remote["service_name"], "memgraph")
self.assertEqual(remote["task_id"], "task-xyz")
self.assertEqual(out["agent"]["next"], [])
self.assertIn("勿再重复", out["agent"]["summary"])
@patch("sysom_cli.lib.precheck_gate.remote_precheck_gate", return_value=(True, None))
def test_oom_deep_local_envelope(self, _mock_gate: object) -> None:
set_diagnosis_backend(_FakeInvokeBackend(_ok_invoke_payload()))
ns = Namespace(
deep_diagnosis=True,
channel="ecs",
timeout=300,
verbose_envelope=False,
)
out = OomHintCommand().execute_local(ns)
self.assertTrue(out["ok"])
self.assertEqual(out["data"]["remote"]["service_name"], "oomcheck")
self.assertEqual(out["agent"]["next"], [])
def test_recommended_cli_appends_region_instance_when_both_set(self) -> None:
ns = Namespace(region="cn-hangzhou", instance="i-abcd")
cmd = recommended_specialty_cli_command("memgraph", ns)
self.assertIn("--region cn-hangzhou", cmd)
self.assertIn("--instance i-abcd", cmd)
ns_partial = Namespace(region="cn-hangzhou", instance="")
cmd2 = recommended_specialty_cli_command("oomcheck", ns_partial)
self.assertNotIn("--region", cmd2)
@patch("sysom_cli.memory.lib.classify_engine._top_rss_processes")
@patch("sysom_cli.memory.lib.classify_engine.analyze_oom_local")
@patch("sysom_cli.memory.lib.classify_engine._read_meminfo")
def test_go_weak_hint_recommends_memgraph(
self,
mock_mem: object,
mock_oom: object,
mock_top: object,
) -> None:
mock_mem.return_value = {
"MemTotal": 16_000_000,
"MemAvailable": 8_000_000,
"SwapTotal": 1000,
"SwapFree": 500,
}
mock_oom.return_value = dict(_EMPTY_OOM_LOCAL)
mock_top.return_value = [("my___go_prog", "x", 1000)]
r = run_classify()
self.assertEqual(r.recommended_service_name, "memgraph")
self.assertIn("go_workload", r.categories)
class FinalizeDiagnosisInvokeIdempotentTests(unittest.TestCase):
"""Repeated finalize must not clear data.remote.result (io/net/load were once wrapped twice)."""
def test_double_finalize_preserves_remote_result(self) -> None:
from sysom_cli.lib.invoke_envelope_finalize import finalize_diagnosis_invoke_envelope
from sysom_cli.lib.schema import agent_block, envelope
out = envelope(
action="diagnosis_invoke",
ok=True,
agent=agent_block("normal", "x"),
data={
"task_id": "t1",
"service_name": "iodiagnose",
"channel": "ecs",
"region": "cn-hangzhou",
"result": {"payload": True},
},
execution={"subsystem": "invoke", "mode": "remote"},
)
ns = Namespace(verbose_envelope=False)
once = finalize_diagnosis_invoke_envelope(out, ns, cli_subsystem="io")
twice = finalize_diagnosis_invoke_envelope(copy.deepcopy(once), ns, cli_subsystem="io")
self.assertEqual(once["data"]["remote"]["result"], {"payload": True})
self.assertEqual(twice["data"]["remote"]["result"], {"payload": True})
class MergeDeepFromFinalizedInvokeTests(unittest.TestCase):
"""After finalize_diagnosis_invoke_envelope the result lives only in data.remote; merge must still be able to read it."""
def test_merge_reads_result_from_nested_data_remote(self) -> None:
from sysom_cli.lib.schema import agent_block, envelope
from sysom_cli.memory.lib.memory_envelope_finalize import merge_deep_diagnosis_flat
out = envelope(
action="memory_memgraph_hint",
ok=True,
agent=agent_block("normal", "", findings=[], next_steps=[]),
data={
"recommended_service_name": "memgraph",
"remote_analysis_value": {},
},
execution={"subsystem": "memory"},
)
inv = {
"ok": True,
"action": "diagnosis_invoke",
"data": {
"task_id": "t-finalized",
"channel": "ecs",
"region": "cn-hangzhou",
"routing": {"recommended_service_name": "memgraph"},
"remote": {
"ok": True,
"action": "diagnosis_invoke",
"service_name": "memgraph",
"task_id": "t-finalized",
"channel": "ecs",
"region": "cn-hangzhou",
"result": {"memgraph": "real_payload"},
},
},
}
merge_deep_diagnosis_flat(out, inv, service_name="memgraph")
self.assertEqual(
out["data"]["remote"]["result"],
{"memgraph": "real_payload"},
)
class OomDiagnosisHintsTests(unittest.TestCase):
def test_extra_purpose_points_to_reference_and_params(self) -> None:
from sysom_cli.memory.lib.envelope_memory import oom_diagnosis_invoke_extra_purpose_zh
s = oom_diagnosis_invoke_extra_purpose_zh(0)
self.assertIn("oomcheck.md", s)
self.assertIn("--deep-diagnosis", s)
def test_extra_purpose_multi_stays_short(self) -> None:
from sysom_cli.memory.lib.envelope_memory import oom_diagnosis_invoke_extra_purpose_zh
s = oom_diagnosis_invoke_extra_purpose_zh(3)
self.assertIn("--oom-time", s)
self.assertIn("3", s)
self.assertLess(len(s), 600)
class RegistryNoGomemdumpTests(unittest.TestCase):
def test_gomemdump_not_registered(self) -> None:
from sysom_cli.core import registry as reg
reg.CommandRegistry._discovered = False
reg.CommandRegistry._commands.clear()
reg.CommandRegistry._metadata.clear()
reg.CommandRegistry._command_subsystem.clear()
reg.CommandRegistry.discover_commands(top_level=True)
reg.CommandRegistry.discover_commands()
self.assertNotIn("gomemdump", reg.CommandRegistry.list_commands())
FILE:scripts/sysom_cli/tests/test_oom_quick.py
# -*- coding: utf-8 -*-
from __future__ import annotations
import unittest
from unittest.mock import patch
from sysom_cli.memory.lib.oom_quick import (
analyze_oom_local,
normalize_oomcheck_time_param,
parse_oom_at_anchor,
)
def _two_block_journal_tail() -> list[str]:
return [
"Mar 10 10:00:00 host kernel: invoked oom-killer: gfp_mask=0x0",
"Mar 10 10:00:00 host kernel: Killed process 111 (oldproc) total-vm:1000kB",
"Mar 11 11:00:00 host kernel: invoked oom-killer: gfp_mask=0x0",
"Mar 11 11:00:00 host kernel: Task in /sys.slice cgroup parent",
"Mar 11 11:00:00 host kernel: Killed process 222 (newproc) total-vm:2000kB",
]
def _three_block_journal_tail() -> list[str]:
b = _two_block_journal_tail()
b.extend(
[
"Mar 12 12:00:00 host kernel: invoked oom-killer: gfp_mask=0x0",
"Mar 12 12:00:00 host kernel: Killed process 333 (third) total-vm:3000kB",
]
)
return b
def _same_hour_duplicate_comm_blocks() -> list[str]:
"""Same day, same hour, same comm, no cgroup: collapsed into a single summary."""
return [
"Mar 12 12:00:00 host kernel: invoked oom-killer: gfp_mask=0x0",
"Mar 12 12:00:01 host kernel: Killed process 1 (sameapp) total-vm:1000kB",
"Mar 12 12:30:00 host kernel: invoked oom-killer: gfp_mask=0x0",
"Mar 12 12:30:02 host kernel: Killed process 2 (sameapp) total-vm:2000kB",
]
class NormalizeOomcheckTimeTests(unittest.TestCase):
def test_iso_converts_to_unix_digits_only(self) -> None:
out = normalize_oomcheck_time_param("2026-03-25T15:21:32")
self.assertTrue(out.isdigit(), msg=out)
self.assertNotIn(":", out)
def test_space_datetime_converts(self) -> None:
out = normalize_oomcheck_time_param("2026-03-25 15:21:32")
self.assertTrue(out.isdigit(), msg=out)
def test_range_segments_normalized(self) -> None:
out = normalize_oomcheck_time_param(
"2026-03-25T15:21:00~2026-03-25T15:22:00"
)
self.assertIn("~", out)
a, b = out.split("~", 1)
self.assertTrue(a.isdigit() and b.isdigit(), msg=out)
def test_unix_passthrough(self) -> None:
self.assertEqual(normalize_oomcheck_time_param("1700000000"), "1700000000")
class ParseOomAtTests(unittest.TestCase):
def test_unix_seconds(self) -> None:
dt = parse_oom_at_anchor("1700000000")
self.assertIsNotNone(dt)
assert dt is not None
self.assertEqual(dt.year, 2023)
def test_iso(self) -> None:
dt = parse_oom_at_anchor("2024-06-15T14:30:00")
self.assertIsNotNone(dt)
assert dt is not None
self.assertEqual((dt.year, dt.month, dt.day), (2024, 6, 15))
class AnalyzeOomLocalTests(unittest.TestCase):
@patch("sysom_cli.memory.lib.oom_quick.get_kernel_log_lines")
def test_default_one_full_log_is_last_event(self, mock_gl: object) -> None:
mock_gl.return_value = _two_block_journal_tail()
r = analyze_oom_local(max_lines=5000, max_full_oom_logs=1)
self.assertEqual(r["hit_count"], 2)
self.assertEqual(r["extraction_mode"], "sysak_blocks")
self.assertEqual(len(r["oom_digest"]), 1)
# digest is a dict with killed_comm
self.assertEqual(r["oom_digest"][0]["killed_comm"], "newproc")
self.assertEqual(len(r["oom_events_summary"]), 2)
@patch("sysom_cli.memory.lib.oom_quick.get_kernel_log_lines")
def test_oom_at_selects_nearest_wallclock_block(self, mock_gl: object) -> None:
mock_gl.return_value = _two_block_journal_tail()
r = analyze_oom_local(
max_lines=5000,
max_full_oom_logs=1,
oom_at="Mar 10 10:00:05",
)
self.assertEqual(len(r["oom_digest"]), 1)
self.assertEqual(r["oom_digest"][0]["killed_comm"], "oldproc")
@patch("sysom_cli.memory.lib.oom_quick.get_kernel_log_lines")
def test_max_full_logs_extends_backward(self, mock_gl: object) -> None:
mock_gl.return_value = _three_block_journal_tail()
r = analyze_oom_local(max_lines=5000, max_full_oom_logs=2)
self.assertEqual(len(r["oom_digest"]), 2)
comms = [d["killed_comm"] for d in r["oom_digest"]]
self.assertIn("third", comms)
self.assertIn("newproc", comms)
@patch("sysom_cli.memory.lib.oom_quick.get_kernel_log_lines")
def test_summaries_capped(self, mock_gl: object) -> None:
mock_gl.return_value = _three_block_journal_tail()
r = analyze_oom_local(max_lines=5000, max_event_summaries=2)
self.assertEqual(len(r["oom_events_summary"]), 2)
idxs = [s["event_index"] for s in r["oom_events_summary"]]
self.assertEqual(idxs, [1, 2])
@patch("sysom_cli.memory.lib.oom_quick.get_kernel_log_lines")
def test_memcg_scope_hint(self, mock_gl: object) -> None:
mock_gl.return_value = _two_block_journal_tail()
r = analyze_oom_local(max_lines=5000)
summ = r["oom_events_summary"]
self.assertEqual(summ[-1].get("scope_hint"), "memcg")
self.assertEqual(summ[-1].get("killed_comm"), "newproc")
@patch("sysom_cli.memory.lib.oom_quick.get_kernel_log_lines")
def test_same_hour_same_comm_collapses_with_count(self, mock_gl: object) -> None:
mock_gl.return_value = _same_hour_duplicate_comm_blocks()
r = analyze_oom_local(max_lines=5000, max_event_summaries=64)
self.assertEqual(r["hit_count"], 2)
self.assertEqual(len(r["oom_events_summary"]), 1)
row = r["oom_events_summary"][0]
self.assertEqual(row.get("similar_oom_count"), 2)
self.assertEqual(row.get("killed_comm"), "sameapp")
self.assertEqual(row.get("event_index"), 1)
if __name__ == "__main__":
unittest.main()
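`NormalizeOomcheckTimeTests` above expects ISO and space-separated timestamps to become digits-only Unix seconds, `start~end` ranges to be converted segment by segment, and Unix seconds to pass through unchanged. The following sketch reproduces that behavior with the standard library only. `to_unix_seconds` and `normalize_time_param` are hypothetical names, naive timestamps are assumed to be local time, and journal-style inputs such as `Mar 10 10:00:00` are deliberately not handled here.

```python
from datetime import datetime


def to_unix_seconds(value: str) -> str:
    """Convert one timestamp to a Unix-seconds string."""
    text = value.strip()
    if text.isdigit():
        return text  # already Unix seconds: pass through unchanged
    # fromisoformat accepts both "2026-03-25T15:21:32" and "2026-03-25 15:21:32";
    # timestamp() interprets a naive value in the local timezone.
    return str(int(datetime.fromisoformat(text).timestamp()))


def normalize_time_param(value: str) -> str:
    """Normalize a single time or a "start~end" range, segment by segment."""
    return "~".join(to_unix_seconds(part) for part in value.split("~"))


print(normalize_time_param("1700000000"))
print(normalize_time_param("2026-03-25T15:21:00~2026-03-25T15:22:00"))
```

The second printed value depends on the machine's timezone, so only its shape (two digit runs joined by `~`, 60 seconds apart) is stable across hosts.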
FILE:scripts/sysom_cli/tests/test_precheck_envelope.py
# -*- coding: utf-8 -*-
"""Table-driven assertions: precheck envelope shape and remediation keywords."""
from __future__ import annotations
import unittest
from sysom_cli.lib.precheck_envelope import envelope_from_precheck_result
class PrecheckEnvelopeShapeTests(unittest.TestCase):
def test_ok_success_light_guidance(self) -> None:
env = envelope_from_precheck_result(
{
"ok": True,
"method": "环境变量 AKSK",
"message": "认证验证成功,拥有 SysOM 访问权限",
}
)
self.assertTrue(env["ok"])
g = env["data"]["guidance"]
self.assertNotIn("guidance_mode", g)
self.assertNotIn("session_rule", g)
self.assertNotIn("guided_steps", g)
def test_ecs_ram_role_compact_no_help(self) -> None:
cr = {
"ok": False,
"error": "未找到有效的认证配置",
"ecs_role_name": "ECSRamRoleForSysOM",
"checked": [
{"method": "ECS元数据", "status": "✓ 实例已绑定 RAM 角色: ECSRamRoleForSysOM"},
{"method": "环境变量 AKSK", "status": "✗ 未配置"},
{"method": "配置文件", "status": "✗ 未配置或配置无效"},
],
}
env = envelope_from_precheck_result(cr)
self.assertFalse(env["ok"])
d = env["data"]
self.assertEqual(d["path_summary"]["primary_path"], "ecs_ram_role")
self.assertNotIn("service_activated", d)
self.assertNotIn("help", d)
self.assertIn("progress", d["path_summary"])
self.assertTrue(d["path_summary"]["progress"]["metadata_role_bound"])
self.assertNotIn("suggestion", d)
self.assertNotIn("guidance_mode", d["guidance"])
self.assertEqual(len(env["agent"]["findings"]), 2)
rem = "\n".join(d["remediation"])
self.assertIn("ECS RAM Role", rem)
self.assertNotIn("curl", rem.lower())
def test_access_key_path_env_only_remediation(self) -> None:
cr = {
"ok": False,
"error": "权限不足",
"error_code": "insufficient_permissions",
"checked": [
{"method": "ECS元数据", "status": "✗ 未检测到"},
{
"method": "环境变量 AKSK",
"status": "✗ 权限不足,需要 AliyunSysomFullAccess 权限",
},
{"method": "配置文件", "status": "✗ 未配置或配置无效"},
],
}
env = envelope_from_precheck_result(cr)
self.assertEqual(env["data"]["path_summary"]["primary_path"], "access_key")
self.assertNotIn(
"service_activated", env["data"],
"权限不足时未调用成功 InitialSysom,不应输出 service_activated",
)
rem = env["data"]["remediation"]
joined = "\n".join(rem)
self.assertIn("环境变量", joined)
self.assertIn("主路径:RAM 用户 AccessKey", joined)
def test_configure_identity_has_help(self) -> None:
cr = {
"ok": False,
"error": "未找到有效的认证配置",
"checked": [
{"method": "ECS元数据", "status": "✗ 未检测到"},
{"method": "环境变量 AKSK", "status": "✗ 未配置"},
{"method": "配置文件", "status": "✗ 未配置或配置无效"},
],
"help": {"aksk": "x", "ram_role": "y", "permission": "z"},
}
env = envelope_from_precheck_result(cr)
self.assertEqual(env["data"]["path_summary"]["primary_path"], "configure_identity")
self.assertNotIn("service_activated", env["data"])
self.assertNotIn("help", env["data"])
self.assertIn("auth_path_choice", env["data"]["guidance"])
def test_activation_two_findings(self) -> None:
cr = {
"ok": False,
"error": "服务未开通",
"error_code": "service_not_activated",
"checked": [
{"method": "ECS元数据", "status": "✗ 未检测到"},
{"method": "环境变量 AKSK", "status": "✗ 未配置"},
{"method": "配置文件 AKSK", "status": "✗ 服务未开通"},
],
"suggestion": "开通",
"help": {},
}
env = envelope_from_precheck_result(cr)
self.assertEqual(len(env["agent"]["findings"]), 2)
self.assertIn("处理说明", env["agent"]["findings"][1]["title"])
self.assertIs(
env["data"].get("service_activated"),
False,
"InitialSysom 已明确返回未开通/角色未就绪时应为 false",
)
if __name__ == "__main__":
unittest.main()
Alibaba Cloud OpenSearch instance management skill. Create and query OpenSearch instances. Note: OpenSearch instance and OpenSearch app group are synonymous...
---
name: alibabacloud-opensearch-app-manage
description: |
Alibaba Cloud OpenSearch instance management skill. Create and query OpenSearch instances.
Note: OpenSearch instance and OpenSearch app group are synonymous terms.
Triggers: "opensearch", "search instance", "create search instance", "app group", "instance management", "create instance", "query instance", "list instances"
---
# OpenSearch Instance Management
Manage Alibaba Cloud OpenSearch instances, including creation and query operations.
> **Terminology**: OpenSearch instance and OpenSearch app group are synonymous. This document uses "instance" uniformly.
## Scenario Description
OpenSearch is Alibaba Cloud's intelligent search service. This skill covers core instance management operations:
1. **Create Instance** - Create a new OpenSearch instance
2. **List Instances** - Get the list of instances under the current account
3. **Describe Instance** - Get detailed information of a specific instance
**Architecture**: OpenSearch Instance (AppGroup)
```
Alibaba Cloud OpenSearch Instance Management
├── CreateAppGroup (Create Instance)
├── DescribeAppGroup (Describe Instance)
└── ListAppGroups (List Instances)
```
---
## Installation
> **Pre-check: Aliyun CLI >= 3.3.3 required**
> Run `aliyun version` to verify >= 3.3.3. If not installed or version too low,
> run `curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash` to update,
> or see [references/cli-installation-guide.md](references/cli-installation-guide.md) for installation instructions.
>
> **Pre-check: Aliyun CLI plugin update required**
> [MUST] run `aliyun configure set --auto-plugin-install true` to enable automatic plugin installation.
> [MUST] run `aliyun plugin update` to ensure that any existing plugins are always up-to-date.
```bash
# Install/Upgrade Aliyun CLI
curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash
aliyun version # Ensure >= 3.3.3
# Enable auto plugin installation
aliyun configure set --auto-plugin-install true
# Update plugins
aliyun plugin update
```
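The version pre-check can also be scripted. The following is a minimal sketch, assuming that `aliyun version` prints a dotted version string such as `3.3.3` somewhere in its output (the exact output format is an assumption). `parse_version` and `meets_minimum` are hypothetical helper names, not part of the CLI.

```python
import re

MIN_CLI_VERSION = (3, 3, 3)  # minimum version required by this skill


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_output, minimum=MIN_CLI_VERSION):
    """Return True when the reported version is at least the minimum.

    Tuples compare element by element, so 3.10.0 correctly ranks above 3.3.3.
    """
    return parse_version(version_output) >= minimum
```

Comparing integer tuples rather than strings avoids the common mistake of ranking `3.10.0` below `3.3.3`.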
**[MUST] CLI User-Agent** — Every `aliyun` CLI command invocation must include:
`--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage`
**[MUST] AI-Mode** — Before executing CLI commands, run:
1. `aliyun configure ai-mode enable`
2. `aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage"`
After all CLI operations complete, run: `aliyun configure ai-mode disable`
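One way to make sure AI-mode is always disabled afterwards, even when a command fails, is to wrap the three commands above in a context manager. This is a sketch only: `ai_mode` and `run_cli` are hypothetical helpers, and the injectable `runner` argument is an assumption added so the sequence can be exercised without a real CLI.

```python
import subprocess
from contextlib import contextmanager

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage"


def run_cli(argv):
    """Default runner: execute the command and raise on a non-zero exit."""
    subprocess.run(argv, check=True)


@contextmanager
def ai_mode(runner=run_cli):
    """Enable AI-mode for the duration of the block, then always disable it."""
    runner(["aliyun", "configure", "ai-mode", "enable"])
    try:
        runner(["aliyun", "configure", "ai-mode", "set-user-agent",
                "--user-agent", USER_AGENT])
        yield
    finally:
        # Runs on success and on failure alike.
        runner(["aliyun", "configure", "ai-mode", "disable"])
```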
---
## Authentication
> **Pre-check: Alibaba Cloud Credentials Required**
> **Security Rules (MUST FOLLOW):**
> - **NEVER** read, echo, or print AK/SK values
> - **NEVER** ask the user to input AK/SK directly in the conversation
> - **NEVER** use `aliyun configure set` with literal credential values
> - **NEVER** accept AK/SK provided directly by users in the conversation
> - **ONLY** read credentials from environment variables or pre-configured CLI profiles
>
> **⚠️ CRITICAL: Handling User-Provided Credentials**
>
> If a user attempts to provide AK/SK directly (e.g., "My AK is xxx, SK is yyy"):
> 1. **STOP immediately** - Do NOT execute any command
> 2. **Reject the request politely** with the following message:
> ```
> For your account security, please do not provide Alibaba Cloud AccessKey ID and AccessKey Secret directly in the conversation.
>
> Please use the following secure methods to configure credentials:
>
> Method 1: Interactive configuration via aliyun configure (Recommended)
> aliyun configure
> # Enter AK/SK as prompted, credentials will be securely stored in local config file
>
> Method 2: Configure via environment variables
> export ALIBABA_CLOUD_ACCESS_KEY_ID=<your-access-key-id>
> export ALIBABA_CLOUD_ACCESS_KEY_SECRET=<your-access-key-secret>
>
> After configuration, please retry your request.
> ```
> 3. **Do NOT proceed** with any Alibaba Cloud operations until credentials are properly configured
>
> **Check CLI configuration**:
> ```bash
> aliyun configure list
> ```
> Check the output for a valid profile (AK, STS, or OAuth identity).
>
> **If no valid credentials exist, STOP here.**
---
## RAM Permissions
> **[MUST] RAM Permission Pre-check:**
> Before executing any operation, ensure the current user has the required RAM permissions.
> See [references/ram-policies.md](references/ram-policies.md) for detailed permission list.
---
## Parameter Confirmation
> **IMPORTANT: Parameter Confirmation** — Before executing any command or API call,
> ALL user-customizable parameters (e.g., instance name, instance type, charge type, quota spec, etc.) MUST be confirmed with the user.
> Do NOT assume or use default values without explicit user approval.
### Required Parameters
| Parameter | Required | Description | Default |
|-----------|----------|-------------|---------|
| `name` | Yes | Instance name | None |
| `type` | Yes | Instance type: `standard` (High-performance) / `enhanced` (Industry Algorithm) | None |
| `chargeType` | No | Charge type: `POSTPAY` / `PREPAY` | `POSTPAY` |
| `quota.spec` | Yes | Spec type (see table below) | None |
| `quota.docSize` | Yes | Storage capacity (GB) | None |
| `quota.computeResource` | Yes | Compute resource (LCU) | None |
| `domain` | No | Industry type (applies only to the `enhanced` type; see table below) | `general` |
| `order` | Conditional | Subscription order info (required when PREPAY) | None |
| `order.duration` | Conditional | Subscription period quantity | None |
| `order.pricingCycle` | Conditional | Period unit: `Year` / `Month` | None |
| `order.autoRenew` | No | Auto-renewal | `false` |
### Spec Types
| Spec Code | Description |
|-----------|-------------|
| `opensearch.share.common` | Shared Common |
| `opensearch.private.common` | Dedicated Common |
| `opensearch.private.compute` | Dedicated Compute |
| `opensearch.private.storage` | Dedicated Storage |
### Industry Types (for enhanced type only)
| Industry Code | Description |
|---------------|-------------|
| `general` | General (default) |
| `ecommerce` | E-commerce |
| `esports` | Gaming |
| `community` | Content Community |
| `education` | Education |
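The constraints in the three tables above can be checked before any request is sent. The sketch below is illustrative only: `build_create_body` is a hypothetical helper, and it takes every value explicitly so that nothing is defaulted without user confirmation. It encodes only the rules stated in this document (the enum values, `domain` applying only to `enhanced`, and `order` being mandatory for `PREPAY`).

```python
VALID_TYPES = {"standard", "enhanced"}
VALID_CHARGE_TYPES = {"POSTPAY", "PREPAY"}
VALID_SPECS = {
    "opensearch.share.common",
    "opensearch.private.common",
    "opensearch.private.compute",
    "opensearch.private.storage",
}
VALID_DOMAINS = {"general", "ecommerce", "esports", "community", "education"}
VALID_PRICING_CYCLES = {"Year", "Month"}


def build_create_body(name, type_, charge_type, spec, doc_size,
                      compute_resource, domain=None, order=None):
    """Validate the documented constraints and return the request body dict."""
    if type_ not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    if charge_type not in VALID_CHARGE_TYPES:
        raise ValueError(f"chargeType must be one of {sorted(VALID_CHARGE_TYPES)}")
    if spec not in VALID_SPECS:
        raise ValueError(f"quota.spec must be one of {sorted(VALID_SPECS)}")
    if domain is not None:
        if type_ != "enhanced":
            raise ValueError("domain applies only to the enhanced type")
        if domain not in VALID_DOMAINS:
            raise ValueError(f"domain must be one of {sorted(VALID_DOMAINS)}")
    if charge_type == "PREPAY":
        if not order or "duration" not in order or "pricingCycle" not in order:
            raise ValueError("PREPAY requires order.duration and order.pricingCycle")
        if order["pricingCycle"] not in VALID_PRICING_CYCLES:
            raise ValueError("order.pricingCycle must be Year or Month")

    body = {
        "name": name,
        "type": type_,
        "chargeType": charge_type,
        "quota": {
            "docSize": doc_size,
            "computeResource": compute_resource,
            "spec": spec,
        },
    }
    if domain is not None:
        body["domain"] = domain
    if order is not None:
        body["order"] = dict(order)
    return body
```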
---
## Core Workflow
> **Note:** OpenSearch APIs use **ROA (RESTful)** style. You can use `--body` to specify the HTTP request body as a JSON string. See examples in each task below.
> **Idempotency:** For write operations (create, restart, delete, etc.), you **MUST** pass the `--client-token` parameter.
> - Use a unique identifier in UUID format as the clientToken
> - If a request times out or fails, retry with **the same clientToken**; waiting 10 seconds before retrying is recommended
> - Repeated requests with the same clientToken execute the operation only once
> - Generation: `uuidgen` (macOS/Linux) or `[guid]::NewGuid()` (PowerShell)
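The retry rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the CLI's own behavior: `call_with_idempotent_retry` is a hypothetical helper, `operation` is any caller-supplied function that receives the token and performs the request, and treating only `TimeoutError` and `ConnectionError` as retryable is an assumption.

```python
import time
import uuid


def call_with_idempotent_retry(operation, max_attempts=3, wait_seconds=10,
                               retryable=(TimeoutError, ConnectionError),
                               sleep=time.sleep):
    """Call operation(client_token), retrying failures with the same token."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    client_token = str(uuid.uuid4())  # generated once, reused on every retry
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(client_token)
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(wait_seconds)  # the recommended pause before retrying
```

Because the token is created outside the loop, every retry carries the same value, so the operation runs at most once on the server side.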
### Task 1: Create OpenSearch Instance
```bash
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "<instance_name>",
"type": "<standard|enhanced>",
"chargeType": "<POSTPAY|PREPAY>",
"quota": {
"docSize": <storage_GB>,
"computeResource": <compute_LCU>,
"spec": "<spec_type>"
}
}' \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Optional Parameters** (add in body):
- `domain` - Industry type (only for enhanced type): `general` (default) / `ecommerce` / `esports` / `community` / `education`
**Idempotency and Dry-run Support** (via query parameters):
- `--dryRun true` - Dry-run mode; validates parameters without creating anything
- `--client-token <unique_id>` - Idempotency token; repeated requests with the same token create the instance only once
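When the command is assembled programmatically, building an argument list avoids shell-quoting problems with the JSON body. The sketch below uses only the flags shown in this document; `build_create_argv` is a hypothetical helper, and requiring a token for every call that is not a dry run is a design choice of this sketch.

```python
import json

USER_AGENT = "AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage"


def build_create_argv(body, client_token=None, dry_run=False):
    """Return the argv list for `aliyun opensearch create-app-group`."""
    if not dry_run and not client_token:
        raise ValueError("a real create call needs a client token")
    argv = ["aliyun", "opensearch", "create-app-group"]
    if dry_run:
        argv += ["--dryRun", "true"]  # query parameter, never part of the body
    if client_token:
        argv += ["--client-token", client_token]
    argv += [
        "--body", json.dumps(body),
        "--connect-timeout", "3",
        "--read-timeout", "10",
        "--user-agent", USER_AGENT,
    ]
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)` without going through a shell.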
**Example**: Create an enhanced (Industry Algorithm) pay-as-you-go instance (E-commerce)
```bash
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "my_search_instance",
"type": "enhanced",
"chargeType": "POSTPAY",
"domain": "ecommerce",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
}
}' \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Example**: Create a standard (High-performance) instance
```bash
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "my_standard_instance",
"type": "standard",
"chargeType": "POSTPAY",
"quota": {
"docSize": 50,
"computeResource": 1000,
"spec": "opensearch.share.common"
}
}' \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Example**: Create a subscription (prepaid) instance
> **Note**: Subscription instances MUST provide the `order` parameter
```bash
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "my_prepay_instance",
"type": "enhanced",
"chargeType": "PREPAY",
"domain": "ecommerce",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
},
"order": {
"duration": 1,
"pricingCycle": "Year",
"autoRenew": true
}
}' \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Dry-run Mode Example** (validates parameters only, no actual creation):
```bash
aliyun opensearch create-app-group \
--dryRun true \
--body '{
"name": "my_search_instance",
"type": "enhanced",
"chargeType": "POSTPAY",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
}
}' \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Idempotent Creation Example** (prevents duplicate creation):
```bash
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "my_search_instance",
"type": "enhanced",
"chargeType": "POSTPAY",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
}
}' \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
### Task 2: List Instances
```bash
aliyun opensearch list-app-groups \
--engine-type ha3 \
--page-number <page> \
--page-size <size> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Supported Filter Parameters**:
- `--engine-type ha3` - Engine type; the value is `ha3`, and the flag must always be passed explicitly
- `--name <instance_name>` - Filter by name
- `--instance-id <instance_id>` - Filter by instance ID
- `--type <standard|enhanced>` - Filter by type
- `standard`: High-performance
- `enhanced`: Industry Algorithm
- `--sort-by <field>` - Sort field
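Listing every instance means walking `--page-number` until the results run out. The sketch below keeps the response format out of the picture: `fetch_page` is a caller-supplied, hypothetical function that runs the list command for one page and returns that page's instances as a Python list. Where that list sits in the JSON response is not specified here, so it is left to the caller.

```python
def iter_instances(fetch_page, page_size=10):
    """Yield instances page by page until a short (or empty) page is seen."""
    page_number = 1
    while True:
        items = fetch_page(page_number, page_size)
        yield from items
        if len(items) < page_size:
            return  # a page that is not full is the last one
        page_number += 1
```

When the total is an exact multiple of the page size, the loop makes one extra request that returns an empty page and then stops.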
**Example**: List instances
```bash
aliyun opensearch list-app-groups \
--engine-type ha3 \
--page-number 1 \
--page-size 10 \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
### Task 3: Describe Instance
```bash
aliyun opensearch describe-app-group \
--app-group-identity <instance_name_or_id> \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Example**: Get instance details
```bash
aliyun opensearch describe-app-group \
--app-group-identity my_search_instance \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Response includes**:
- Basic info (instanceId, name, type, status)
- Quota info (quota: docSize, computeResource, spec)
- Billing info (chargeType, chargingWay)
- Version info (currentVersion, versions)
- Status info (lockMode, produced)
- Engine info (engineType)
---
## Success Verification
For operation verification, see [references/verification-method.md](references/verification-method.md)
### Quick Verification
**Verify Instance Creation**:
```bash
aliyun opensearch describe-app-group \
--app-group-identity <instance_name> \
--connect-timeout 3 \
--read-timeout 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
If the `result.instanceId` field in the response is non-empty, the instance was created successfully.
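The same check can be automated against the command's JSON output. This is a minimal sketch: `creation_succeeded` is a hypothetical helper, and it relies only on the `result.instanceId` field named above.

```python
import json


def creation_succeeded(describe_output):
    """Return True when the response carries a non-empty result.instanceId."""
    try:
        response = json.loads(describe_output)
    except json.JSONDecodeError:
        return False  # not JSON at all, e.g. an error message
    if not isinstance(response, dict):
        return False
    result = response.get("result")
    if not isinstance(result, dict):
        return False
    return bool(result.get("instanceId"))
```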
---
## Resource Cleanup
To delete instances, please use [OpenSearch Console](https://opensearch.console.aliyun.com/).
---
## API and Command Reference
For complete API list, see [references/related-apis.md](references/related-apis.md)
| Operation | CLI Command | API Action |
|-----------|------------|------------|
| Create Instance | `aliyun opensearch create-app-group` | CreateAppGroup |
| List Instances | `aliyun opensearch list-app-groups` | ListAppGroups |
| Describe Instance | `aliyun opensearch describe-app-group` | DescribeAppGroup |
---
## Best Practices
### Write Operation Parameter Confirmation (Required)
> **Important**: Before executing write operations (create instance, etc.), you **MUST** confirm the following parameters with the user:
**Pre-creation Confirmation Checklist**:
| Parameter | Description | Example |
|-----------|-------------|---------|
| Region | Instance region | `cn-hangzhou` / `cn-shanghai` / `cn-beijing` |
| Instance Name (name) | User-specified name (lowercase, numbers, underscores) | `my_search_instance` |
| Instance Type (type) | High-performance / Industry Algorithm | `standard` / `enhanced` |
| Charge Type (chargeType) | Pay-as-you-go / Subscription | `POSTPAY` / `PREPAY` |
| Spec Type (quota.spec) | Shared / Dedicated | `opensearch.share.common` |
| Storage (quota.docSize) | In GB | `100` |
| Compute (quota.computeResource) | In LCU | `2000` |
| Industry (domain) | Only for enhanced type | `ecommerce` / `general` |
| Subscription Period (order) | Only for PREPAY | 1 Year / 6 Months |
**Confirmation Flow Example**:
```
You are about to create the following OpenSearch instance, please confirm:
- Region: cn-hangzhou (China East 1)
- Instance Name: my_search_instance
- Instance Type: Industry Algorithm (enhanced)
- Industry: E-commerce (ecommerce)
- Charge Type: Pay-as-you-go (POSTPAY)
- Spec Type: Dedicated Common (opensearch.private.common)
- Storage: 100 GB
- Compute: 2000 LCU
Confirm creation? (yes/no)
```
### Idempotency Best Practices
For write operations (create, restart, delete), follow these idempotency best practices:
1. **Generate a unique token before each operation**: Use `uuidgen` to generate a UUID
2. **Reuse the token on timeout retry**: If a request times out, retry with the same clientToken
3. **Use a different token for each operation**: Every independent operation needs a new clientToken
4. **Token validity**: A clientToken is typically valid for 24 hours
```bash
# Example: Safe retry pattern
CLIENT_TOKEN=$(uuidgen)
echo "Using clientToken: $CLIENT_TOKEN"
# First attempt
aliyun opensearch create-app-group --client-token "$CLIENT_TOKEN" ...
# If it times out, retry with the same token
aliyun opensearch create-app-group --client-token "$CLIENT_TOKEN" ...
```
### Other Best Practices
1. **Naming Convention**: An instance name must start with a letter and contain only lowercase letters, numbers, and underscores (_); **hyphens (-) are forbidden**; the maximum length is 30 characters
- ✅ Correct: `my_search_instance`, `video_search`, `product_search_2024`
- ❌ Incorrect: `my-search-instance`, `My_Search`, `123_search`
2. **Quota Planning**: Plan storage and compute resources based on actual data volume and query requirements
3. **Charge Type Selection**:
- Test/Dev environment: Use pay-as-you-go (POSTPAY)
- Production environment: Consider subscription (PREPAY) to reduce costs
- **Note**: Subscription instances MUST provide `order` parameter (including duration and pricingCycle)
4. **Instance Type Selection**:
- High-performance (`standard`): Suitable for general search scenarios
- Industry Algorithm (`enhanced`): Suitable for specific industry scenarios; takes the `domain` parameter (defaults to `general`)
5. **Industry Selection** (Industry Algorithm):
- E-commerce: `ecommerce`
- Gaming: `esports`
- Content Community: `community`
- Education: `education`
- General: `general` (default)
6. **Spec Selection**:
- Shared Common: Suitable for small-scale scenarios
- Dedicated: Suitable for production environments, more stable performance
7. **Resource Cleanup**: Delete unused pay-as-you-go instances promptly to avoid unnecessary costs
---
## Reference Links
| Document | Description |
|----------|-------------|
| [references/related-apis.md](references/related-apis.md) | Complete API List |
| [references/ram-policies.md](references/ram-policies.md) | RAM Policies |
| [references/verification-method.md](references/verification-method.md) | Verification Methods |
| [references/cli-installation-guide.md](references/cli-installation-guide.md) | CLI Installation Guide |
| [references/acceptance-criteria.md](references/acceptance-criteria.md) | Acceptance Criteria |
FILE:references/acceptance-criteria.md
# Acceptance Criteria: OpenSearch App Management
**Scenario**: OpenSearch Instance Management
**Purpose**: Skill test acceptance criteria
> **Terminology**: OpenSearch instance and OpenSearch app group are synonymous.
---
# Correct CLI Command Patterns
## 1. Product — Verify Product Name
✅ **CORRECT**: `opensearch`
```bash
aliyun opensearch --help
```
❌ **INCORRECT**: `open-search`, `OpenSearch`, `os`
---
## 2. Command — Verify Command Exists
### CreateAppGroup
✅ **CORRECT**:
```bash
aliyun opensearch create-app-group --help
```
❌ **INCORRECT**: `aliyun opensearch CreateAppGroup`, `aliyun opensearch create_app_group`
### ListAppGroups
✅ **CORRECT**:
```bash
aliyun opensearch list-app-groups --help
```
❌ **INCORRECT**: `aliyun opensearch ListAppGroups`, `aliyun opensearch list_app_groups`
### DescribeAppGroup
✅ **CORRECT**:
```bash
aliyun opensearch describe-app-group --help
```
❌ **INCORRECT**: `aliyun opensearch DescribeAppGroup`, `aliyun opensearch get-app-group`
---
## 3. Parameters — Verify Parameter Names
### create-app-group Parameters
✅ **CORRECT**:
```bash
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "my_app",
"type": "enhanced",
"chargeType": "POSTPAY",
"domain": "ecommerce",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
}
}' \
--user-agent AlibabaCloud-Agent-Skills
```
❌ **INCORRECT** (wrong parameter format):
```bash
# Wrong: Should not use separate parameters, use --body JSON instead
aliyun opensearch create-app-group \
--name my-app \
--type standard \
--charge-type POSTPAY \
--quota "doc-size=10,compute-resource=20,spec=opensearch.share.common"
```
### Idempotency and Dry-run Parameters
✅ **CORRECT**:
```bash
# Dry-run mode
aliyun opensearch create-app-group \
--dryRun true \
--body '{...}' \
--user-agent AlibabaCloud-Agent-Skills
# Idempotent creation (must generate token first)
CLIENT_TOKEN=$(uuidgen)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{...}' \
--user-agent AlibabaCloud-Agent-Skills
```
❌ **INCORRECT**:
```bash
# Wrong: dryRun should not be in body
aliyun opensearch create-app-group \
--body '{"dryRun": true, ...}'
# Wrong: hardcoded token, should use uuidgen
aliyun opensearch create-app-group \
--client-token "fixed-token-123"
# Wrong: parameter name is incorrect
aliyun opensearch create-app-group \
--dry-run true # Should be --dryRun
```
### list-app-groups Parameters
✅ **CORRECT**:
```bash
aliyun opensearch list-app-groups \
--page-number 1 \
--page-size 10 \
--user-agent AlibabaCloud-Agent-Skills
```
❌ **INCORRECT**:
```bash
# Wrong: --page should be --page-number
# Wrong: --limit should be --page-size
aliyun opensearch list-app-groups \
--page 1 \
--limit 10
```
### describe-app-group Parameters
✅ **CORRECT**:
```bash
aliyun opensearch describe-app-group \
--app-group-identity my_instance \
--user-agent AlibabaCloud-Agent-Skills
```
❌ **INCORRECT**:
```bash
# Wrong: --name and --app-name should both be --app-group-identity
aliyun opensearch describe-app-group \
--name my_instance \
--app-name my_instance
```
---
## 4. Parameter Values — Verify Valid Values
### name Parameter (Instance Name)
An instance name must start with a letter and contain only lowercase letters, numbers, and underscores (_); hyphens (-) are forbidden; the maximum length is 30 characters.
✅ **CORRECT**: `my_search_instance`, `video_search`, `product_search_2024`
❌ **INCORRECT**: `my-search-instance` (contains hyphen), `My_Search` (uppercase), `123_search` (starts with number)
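The naming rule translates directly into a regular expression. This is a minimal sketch, with `is_valid_instance_name` as a hypothetical helper; it reads "start with a letter" as a lowercase letter, since uppercase names are listed as incorrect.

```python
import re

# One lowercase letter, then up to 29 lowercase letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,29}")


def is_valid_instance_name(name):
    """Return True when the whole name satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```

Using `fullmatch` rather than `match` matters here: it rejects names that start validly but continue with a forbidden character or exceed 30 characters.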
### type Parameter
✅ **CORRECT**: `standard` (High-performance), `enhanced` (Industry Algorithm)
❌ **INCORRECT**: `basic`, `advanced`, `enterprise`
### domain Parameter (for enhanced type only)
✅ **CORRECT**: `general` (default), `ecommerce`, `esports`, `community`, `education`
❌ **INCORRECT**: `retail`, `game`, `media`, `ECOMMERCE`
### chargeType Parameter
✅ **CORRECT**: `POSTPAY`, `PREPAY`
❌ **INCORRECT**: `postpay`, `prepay`, `PAY_AS_YOU_GO`, `SUBSCRIPTION`
> **Note**: When chargeType is `PREPAY`, the `order` parameter is required
### order Parameter (required for PREPAY)
✅ **CORRECT**:
```bash
# Subscription instance must include order parameter
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "my_prepay_instance",
"type": "enhanced",
"chargeType": "PREPAY",
"quota": {...},
"order": {
"duration": 1,
"pricingCycle": "Year",
"autoRenew": true
}
}'
```
❌ **INCORRECT**:
```bash
# Wrong: Subscription instance missing order parameter
aliyun opensearch create-app-group \
--body '{
"chargeType": "PREPAY",
"quota": {...}
}'
# Wrong: order missing required fields
aliyun opensearch create-app-group \
--body '{
"chargeType": "PREPAY",
"order": {
"duration": 1
}
}'
```
### spec Parameter
✅ **CORRECT**:
- `opensearch.share.common`
- `opensearch.private.common`
- `opensearch.private.compute`
- `opensearch.private.storage`
❌ **INCORRECT**:
- `opensearch.share.junior`
- `opensearch.share.compute`
- `opensearch.share.storage`
- `share.common`
- `common`
---
## 5. User-Agent Flag — Verify User-Agent
✅ **CORRECT**: Every command includes `--user-agent AlibabaCloud-Agent-Skills`
```bash
aliyun opensearch list-app-groups --user-agent AlibabaCloud-Agent-Skills
```
❌ **INCORRECT**: Missing user-agent
```bash
aliyun opensearch list-app-groups
```
---
# Correct Python Common SDK Code Patterns (Fallback)
If CLI is unavailable, Python Common SDK can be used as a fallback.
## 1. Import Patterns
✅ **CORRECT**:
```python
from alibabacloud_tea_openapi.client import Client as OpenApiClient
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models
```
❌ **INCORRECT**:
```python
# Wrong: using deprecated SDK
from aliyunsdkcore.client import AcsClient
# Wrong: incorrect module name
from alibabacloud_opensearch import Client
```
## 2. Authentication — Must Use CredentialClient
✅ **CORRECT**:
```python
credential = CredentialClient()
config = open_api_models.Config(credential=credential)
config.endpoint = 'opensearch.cn-hangzhou.aliyuncs.com'
client = OpenApiClient(config)
```
❌ **INCORRECT** (hardcoded credentials):
```python
# FORBIDDEN: hardcoding credentials
config = open_api_models.Config()
config.access_key_id = 'LTAI5txxxxxxxx'
config.access_key_secret = 'xxxxxxxxxxxxxxxx'
```
## 3. API Style — OpenSearch Uses ROA Style
✅ **CORRECT**:
```python
params = open_api_models.Params(
action='CreateAppGroup',
version='2017-12-25',
protocol='HTTPS',
method='POST',
auth_type='AK',
style='ROA',
pathname='/v4/openapi/app-groups',
req_body_type='json',
body_type='json'
)
```
❌ **INCORRECT** (wrong API style):
```python
params = open_api_models.Params(
style='RPC', # Wrong: OpenSearch uses ROA style
pathname='/', # Wrong: ROA requires specific path
)
```
---
# Critical Patterns Checklist
- [ ] All CLI commands use lowercase hyphen format (plugin mode)
- [ ] All commands include `--user-agent AlibabaCloud-Agent-Skills`
- [ ] Parameter names are correct (kebab-case format, except `--dryRun`)
- [ ] Parameter values are within allowed enum range
- [ ] SDK code uses CredentialClient, no hardcoded credentials
- [ ] OpenSearch API uses ROA style with correct pathname
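Part of this checklist can be checked mechanically. The sketch below covers only the product name, the command format, and the user-agent flag. `check_command` is a hypothetical helper, and accepting any user-agent value that starts with `AlibabaCloud-Agent-Skills` is an assumption made so that both the short and the skill-specific forms pass.

```python
import re

# Lowercase words joined by single hyphens, e.g. "list-app-groups".
KEBAB_CASE = re.compile(r"[a-z]+(-[a-z]+)*")


def check_command(argv):
    """Return a list of checklist violations for an `aliyun opensearch` argv."""
    problems = []
    if argv[:2] != ["aliyun", "opensearch"]:
        problems.append("product must be `opensearch`")
    if len(argv) < 3 or not KEBAB_CASE.fullmatch(argv[2]):
        problems.append("command must be lowercase and hyphenated")
    if "--user-agent" not in argv:
        problems.append("missing --user-agent")
    else:
        index = argv.index("--user-agent")
        value = argv[index + 1] if index + 1 < len(argv) else ""
        if not value.startswith("AlibabaCloud-Agent-Skills"):
            problems.append("user agent must start with AlibabaCloud-Agent-Skills")
    return problems
```

An empty list means the command passed the three checks; parameter names and values still need the manual review described above.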
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
Complete guide for installing and configuring Aliyun CLI.
> **Aliyun CLI 3.3.3+**: Supports installing and using all published Alibaba Cloud product plugins. Make sure to upgrade to 3.3.3 or later for full plugin ecosystem coverage.
## Table of Contents
- [Installation](#installation)
- [macOS](#macos)
- [Linux](#linux)
- [Windows](#windows)
- [Configuration](#configuration)
- [Quick Start](#quick-start)
- [Configuration Modes](#configuration-modes)
- [Environment Variables](#environment-variables)
- [Managing Multiple Profiles](#managing-multiple-profiles)
- [Credential Priority](#credential-priority)
- [Verification](#verification)
- [Test Authentication](#test-authentication)
- [Debug Configuration](#debug-configuration)
- [Security Best Practices](#security-best-practices)
- [Troubleshooting](#troubleshooting)
- [Advanced Configuration](#advanced-configuration)
- [Custom Endpoint](#custom-endpoint)
- [Proxy Settings](#proxy-settings)
- [Timeout Settings](#timeout-settings)
- [Next Steps](#next-steps)
- [References](#references)
---
## Installation
### macOS
**Using Homebrew (Recommended)**
```bash
brew install aliyun-cli
# Upgrade to latest
brew upgrade aliyun-cli
# Verify version (>= 3.3.3)
aliyun version
```
**Using Binary**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz
# Extract
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
# Move to PATH
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
### Linux
**Debian/Ubuntu**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**CentOS/RHEL**
```bash
# Download
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
# Verify
aliyun version
```
**ARM64 Architecture**
```bash
# Download ARM64 version
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-arm64.tgz
# Extract and install
tar -xzf aliyun-cli-linux-latest-arm64.tgz
sudo mv aliyun /usr/local/bin/
```
### Windows
**Using Binary**
1. Download from: https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip
2. Extract the ZIP file
3. Add the directory to your PATH environment variable
4. Open new Command Prompt or PowerShell
5. Verify: `aliyun version`
**Using PowerShell**
```powershell
# Download
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
# Extract
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
# Add to PATH (requires admin privileges)
$env:Path += ";C:\aliyun-cli"
[Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::Machine)
# Verify
aliyun version
```
## Configuration
### Quick Start
All `aliyun configure` commands support non-interactive flags, which is the recommended approach —
it works in scripts, CI/CD pipelines, and agent-driven automation without hanging on stdin prompts.
**Where to Get Access Keys**
1. Log in to Aliyun Console: https://ram.console.aliyun.com/
2. Navigate to: AccessKey Management
3. Create a new AccessKey pair
4. Save the secret immediately — it's only shown once
### Configuration Modes
Aliyun CLI supports several authentication modes. For security reasons, credential configuration is not shown in this guide. Please refer to the [official Aliyun CLI documentation](https://help.aliyun.com/zh/cli/) for secure credential setup.
**Available Authentication Modes:**
| Mode | Description | Use Case |
|------|-------------|----------|
| `AK` | Access Key authentication | General purpose |
| `StsToken` | Temporary credentials with STS token | CI/CD pipelines, temporary access |
| `RamRoleArn` | Assume RAM role | Cross-account access, elevated privileges |
| `EcsRamRole` | ECS instance RAM role | Scripts running on ECS instances |
| `RsaKeyPair` | RSA key pair authentication | Special authentication scenarios |
| `RamRoleArnWithEcs` | ECS + RAM role combination | Cross-account from ECS |
**Configure using interactive mode (recommended):**
```bash
aliyun configure
```
This will prompt you to enter credentials securely without exposing them in command history.
### Environment Variables
Environment variables take precedence over config file settings; see Credential Priority for the full order.
**Supported Environment Variables:**
| Variable | Purpose |
|----------|---------|
| `ALIBABA_CLOUD_ACCESS_KEY_ID` | Access Key ID |
| `ALIBABA_CLOUD_ACCESS_KEY_SECRET` | Access Key Secret |
| `ALIBABA_CLOUD_SECURITY_TOKEN` | STS Token (for temporary credentials) |
| `ALIBABA_CLOUD_REGION_ID` | Default region |
| `ALIBABA_CLOUD_ECS_METADATA` | ECS RAM Role name |
| `ALIBABA_CLOUD_PROFILE` | Profile name to use |
> **Security Best Practices:**
> - Set environment variables in your shell profile (e.g., `~/.bashrc`, `~/.zshrc`) or CI/CD secret stores
> - NEVER commit credentials to version control
> - NEVER echo or print environment variable values
> - Use your shell's secure credential management or CI/CD secret stores
**Use Cases:**
- CI/CD pipelines (via secret environment variables)
- Docker containers
- Temporary credential override
### Managing Multiple Profiles
**Create Named Profiles**
Use interactive mode to create profiles securely:
```bash
# Create a new profile
aliyun configure --profile projectA
# Or use the set command with --mode only, then configure credentials interactively
aliyun configure set --profile projectA --mode AK --region cn-hangzhou
# Then run 'aliyun configure' to set credentials
```
> **Security Note**: Avoid using `--access-key-id` and `--access-key-secret` flags in commands as they may be recorded in shell history. Use interactive mode instead.
**Use Specific Profile**
```bash
aliyun ecs describe-instances --profile projectA
export ALIBABA_CLOUD_PROFILE=projectA
aliyun ecs describe-instances # Uses projectA
```
**List and Switch Profiles**
```bash
aliyun configure list # List all profiles
aliyun configure set --current projectA # Switch default profile
```
### Credential Priority
Credentials are loaded in this order (first found wins):
1. **Command-line flag**: `--profile <name>`
2. **Environment variable**: `ALIBABA_CLOUD_PROFILE`
3. **Environment credentials**: `ALIBABA_CLOUD_ACCESS_KEY_ID`, etc.
4. **Configuration file**: `~/.aliyun/config.json` (current profile)
5. **ECS Instance RAM Role**: If running on ECS with attached role
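The lookup order above can be sketched as a shell function, which is handy when debugging why an unexpected identity is being used. This is an illustration only: `resolve_credential_source` and the `ALIYUN_CONFIG_FILE` override are hypothetical names introduced here, and the function mirrors the list rather than the CLI's actual implementation.

```bash
# resolve_credential_source is a hypothetical helper that mirrors the order listed above.
# Usage: resolve_credential_source [value-passed-to---profile]
resolve_credential_source() {
  local profile_flag="${1:-}"
  # ALIYUN_CONFIG_FILE is an assumption for testing; the documented path is the default.
  local config_file="${ALIYUN_CONFIG_FILE:-$HOME/.aliyun/config.json}"
  if [ -n "$profile_flag" ]; then
    echo "flag:--profile $profile_flag"
  elif [ -n "${ALIBABA_CLOUD_PROFILE:-}" ]; then
    echo "env:ALIBABA_CLOUD_PROFILE"
  elif [ -n "${ALIBABA_CLOUD_ACCESS_KEY_ID:-}" ]; then
    echo "env:ALIBABA_CLOUD_ACCESS_KEY_ID"
  elif [ -f "$config_file" ]; then
    echo "file:$config_file"
  else
    echo "fallback:ECS instance RAM role (if attached)"
  fi
}
```

Calling `resolve_credential_source projectA` reports the flag, while calling it with no argument falls through the environment and config file checks in order.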
## Verification
### Test Authentication
```bash
# Basic test - list regions
aliyun ecs describe-regions
# Expected output: JSON object containing a list of regions
```
**If successful**, you'll see:
```json
{
"Regions": {
"Region": [
{
"RegionId": "cn-hangzhou",
"RegionEndpoint": "ecs.cn-hangzhou.aliyuncs.com",
"LocalName": "华东 1(杭州)"
},
...
]
},
"RequestId": "..."
}
```
**If failed**, you'll see error messages:
- `InvalidAccessKeyId.NotFound` - Wrong Access Key ID
- `SignatureDoesNotMatch` - Wrong Access Key Secret
- `InvalidSecurityToken.Expired` - STS token expired (for StsToken mode)
- `Forbidden.RAM` - Insufficient permissions
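The error codes above can be turned into quick hints with a lookup function. This is a sketch under assumptions: `explain_auth_error` is a hypothetical helper, and it matches only on the code appearing somewhere in the captured CLI output.

```bash
# explain_auth_error is a hypothetical helper, not an aliyun CLI command.
# It maps the error codes listed above to the fix this guide suggests.
explain_auth_error() {
  case "$1" in
    *InvalidAccessKeyId.NotFound*)  echo "Wrong Access Key ID: reconfigure with 'aliyun configure'" ;;
    *SignatureDoesNotMatch*)        echo "Wrong Access Key Secret: reconfigure with 'aliyun configure'" ;;
    *InvalidSecurityToken.Expired*) echo "STS token expired: run 'aliyun configure --mode StsToken'" ;;
    *Forbidden.RAM*)                echo "Insufficient permissions: attach the required RAM policy" ;;
    *)                              echo "Unrecognized error: retry with --log-level=debug" ;;
  esac
}

explain_auth_error "ErrorCode: SignatureDoesNotMatch"
```

A typical use would be to capture the output of a failing command (for example `output=$(aliyun ecs describe-regions 2>&1)`) and pass `"$output"` to the function.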
### Debug Configuration
```bash
# Show current configuration
aliyun configure get
# Test with debug logging
aliyun ecs describe-regions --log-level=debug
# Check credential provider
aliyun configure get mode
```
## Security Best Practices
### 1. Use RAM Users (Not Root Account)
❌ **Don't**: Use Aliyun root account credentials
✅ **Do**: Create RAM users with specific permissions
```bash
# Create RAM user in console
# Attach only necessary policies
# Use RAM user's access keys
```
### 2. Principle of Least Privilege
Grant only the minimum permissions needed:
```bash
# Example: Read-only ECS access
# Attach policy: AliyunECSReadOnlyAccess
```
### 3. Rotate Access Keys Regularly
1. Create new access key in [RAM Console](https://ram.console.aliyun.com/manage/ak)
2. Update configuration using interactive mode:
```bash
aliyun configure
```
3. Delete old access key from console
> **Security Note**: Use interactive mode (`aliyun configure`) to avoid exposing credentials in shell history.
### 4. Use STS Tokens for Temporary Access
Configure STS Token mode interactively:
```bash
aliyun configure --mode StsToken
```
Or use environment variables for temporary credentials in CI/CD pipelines.
### 5. Use ECS RAM Roles When Possible
```bash
aliyun configure set --mode EcsRamRole --ram-role-name MyRole --region cn-hangzhou
```
### 6. Never Commit Credentials
```bash
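# The CLI config lives in ~/.aliyun/, outside any repository, and .gitignore does not expand "~".
# Ignore any copy of the config directory that ends up inside a project instead.
echo ".aliyun/" >> .gitignore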
# Use environment variables in CI/CD instead
```
### 7. Secure Config File
```bash
# Restrict permissions
chmod 600 ~/.aliyun/config.json
```
## Troubleshooting
### Issue: Command Not Found
```bash
# Check installation
which aliyun
# Check PATH
echo $PATH
# Reinstall or add to PATH
```
### Issue: Authentication Failed
```bash
# Verify configuration
aliyun configure get
# Test with debug
aliyun ecs describe-regions --log-level=debug
# Check credentials in console
# Verify access key is active
```
### Issue: Permission Denied
```bash
# Error: Forbidden.RAM
# Check RAM user permissions
# Attach necessary policies in RAM console
# Example: AliyunECSFullAccess for ECS operations
```
### Issue: STS Token Expired
```bash
# Error: InvalidSecurityToken.Expired
# Reconfigure with new token using interactive mode
aliyun configure --mode StsToken
```
> **Security Note**: Use interactive mode to avoid exposing credentials in shell history.
### Issue: Wrong Region
```bash
# Some resources may not exist in the specified region
# Check available regions
aliyun ecs describe-regions
# Update default region
aliyun configure set --region cn-shanghai
```
## Advanced Configuration
### Custom Endpoint
```bash
# Use custom or private endpoint
export ALIBABA_CLOUD_ECS_ENDPOINT=ecs-vpc.cn-hangzhou.aliyuncs.com
```
### Proxy Settings
```bash
# HTTP proxy
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
# No proxy for specific domains
export NO_PROXY=localhost,127.0.0.1,.aliyuncs.com
```
### Timeout Settings
```bash
# Connection timeout (default: 10s)
export ALIBABA_CLOUD_CONNECT_TIMEOUT=30
# Read timeout (default: 10s)
export ALIBABA_CLOUD_READ_TIMEOUT=30
```
## Next Steps
After installation and configuration:
1. **Install plugins** for services you need (v3.3.3+ supports all published product plugins):
```bash
aliyun plugin install --names ecs vpc rds
# List all available plugins
aliyun plugin list-remote
```
2. **Explore commands**:
```bash
aliyun ecs --help
aliyun fc --help
```
3. **Read documentation**:
- [Command Syntax Guide](./command-syntax.md)
- [Global Flags Reference](./global-flags.md)
- [Common Scenarios](./common-scenarios.md)
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
- Plugin Repository: https://github.com/aliyun/aliyun-cli
FILE:references/ram-policies.md
# RAM Policies
RAM (Resource Access Management) permissions required for OpenSearch instance management.
> **Terminology**: OpenSearch instance and OpenSearch app group are synonymous.
## Permission Summary
| API Action | RAM Action | Description |
|---------|-----------|-------------|
| CreateAppGroup | opensearch:CreateAppGroup | Create OpenSearch instance |
| ListAppGroups | opensearch:ListAppGroups | List instances |
| DescribeAppGroup | opensearch:DescribeAppGroup | Describe instance details |
## RAM Policy Document
### Full Access Policy (Read-Write)
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"opensearch:CreateAppGroup",
"opensearch:ListAppGroups",
"opensearch:DescribeAppGroup"
],
"Resource": "acs:opensearch:*:*:apps/*"
}
]
}
```
### Read-Only Policy
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"opensearch:ListAppGroups",
"opensearch:DescribeAppGroup"
],
"Resource": "acs:opensearch:*:*:apps/*"
}
]
}
```
### Create Instance Policy
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "opensearch:CreateAppGroup",
"Resource": "acs:opensearch:*:*:apps/*"
}
]
}
```
## System Policies
Alibaba Cloud provides the following OpenSearch system policies:
| Policy Name | Description |
|-------------|-------------|
| AliyunOpenSearchFullAccess | Full management access to OpenSearch |
| AliyunOpenSearchReadOnlyAccess | Read-only access to OpenSearch |
## Usage
1. Log in to [RAM Console](https://ram.console.aliyun.com/)
2. Create custom policy or use system policy
3. Attach policy to RAM user or role
## Best Practices
1. **Principle of Least Privilege**: Only grant minimum permissions required for the task
2. **Use System Policies**: Prefer Alibaba Cloud provided system policies
3. **Regular Auditing**: Regularly review and clean up unnecessary permissions
4. **Resource-Level Restrictions**: Restrict access to specific resources when possible
## Reference Documentation
- [RAM Policy Overview](https://help.aliyun.com/document_detail/93732.html)
- [OpenSearch Authorization Info](https://help.aliyun.com/zh/open-search/industry-algorithm-edition/authorization-rules-of-applications)
FILE:references/related-apis.md
# Related APIs
Complete API list for OpenSearch instance management.
> **Terminology**: OpenSearch instance and OpenSearch app group are synonymous.
## Instance Management APIs
| Product | CLI Command | API Action | HTTP Method | Path | Description |
|---------|-------------|------------|-------------|------|-------------|
| OpenSearch | `aliyun opensearch create-app-group` | CreateAppGroup | POST | /v4/openapi/app-groups | Create OpenSearch instance |
| OpenSearch | `aliyun opensearch list-app-groups` | ListAppGroups | GET | /v4/openapi/app-groups | List instances |
| OpenSearch | `aliyun opensearch describe-app-group` | DescribeAppGroup | GET | /v4/openapi/app-groups/{appGroupIdentity} | Describe instance details |
## API Details
### CreateAppGroup - Create Instance
- **API Version**: 2017-12-25
- **API Style**: ROA
- **HTTP Method**: POST
- **Path**: /v4/openapi/app-groups
- **Operation Type**: write
- **Billing**: paid
**Request Parameters**:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| name | String | Yes | Instance name |
| type | String | Yes | Instance type: `standard` (High-performance) / `enhanced` (Industry Algorithm) |
| chargeType | String | No | Charge type: `POSTPAY` (default) / `PREPAY` |
| quota | Object | Yes | Quota info |
| quota.spec | String | Yes | Spec type |
| quota.docSize | Integer | Yes | Storage capacity (GB) |
| quota.computeResource | String | Yes | Compute resource (LCU) |
| domain | String | No | Industry type (enhanced only): `general` (default) / `ecommerce` / `esports` / `community` / `education` |
| order | Object | Conditional | Subscription order info (required when chargeType=PREPAY) |
| order.duration | Integer | Conditional | Subscription period quantity |
| order.pricingCycle | String | Conditional | Period unit: Year / Month |
| order.autoRenew | Boolean | No | Auto-renewal, default false |
**Query Parameters** (Idempotency and Dry-run):
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| dryRun | Boolean | No | Dry-run mode; `true` validates the parameters without creating the instance |
| clientToken | String | No | Idempotency token; repeated requests with the same token create the instance only once |
**CLI Example**:
```bash
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "my_instance",
"type": "enhanced",
"chargeType": "POSTPAY",
"domain": "ecommerce",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
}
}' \
--user-agent AlibabaCloud-Agent-Skills
```
---
### ListAppGroups - List Instances
- **API Version**: 2017-12-25
- **API Style**: ROA
- **HTTP Method**: GET
- **Path**: /v4/openapi/app-groups
- **Operation Type**: read
- **Billing**: free
**Request Parameters**:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| engineType | String | Yes | Engine type; must be specified explicitly (use `ha3`) |
| pageNumber | Integer | No | Page number, default 1 |
| pageSize | Integer | No | Page size, default 10 |
| name | String | No | Filter by instance name |
| type | String | No | Filter by type: `standard` (High-performance) / `enhanced` (Industry Algorithm) |
| instanceId | String | No | Filter by instance ID |
| sortBy | Integer | No | Sort field |
**CLI Example**:
```bash
aliyun opensearch list-app-groups \
--engine-type ha3 \
--page-number 1 \
--page-size 10 \
--user-agent AlibabaCloud-Agent-Skills
```
---
### DescribeAppGroup - Describe Instance
- **API Version**: 2017-12-25
- **API Style**: ROA
- **HTTP Method**: GET
- **Path**: /v4/openapi/app-groups/{appGroupIdentity}
- **Operation Type**: read
- **Billing**: free
**Request Parameters**:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| appGroupIdentity | String | Yes | Instance name or ID (path parameter) |
**Response Parameters**:
| Field | Type | Description |
|-------|------|-------------|
| requestId | String | Request ID |
| result.id | String | Instance ID |
| result.name | String | Instance name |
| result.type | String | Instance type (standard/enhanced) |
| result.status | String | Instance status |
| result.chargeType | String | Charge type (POSTPAY/PREPAY) |
| result.quota | Object | Quota info |
| result.currentVersion | String | Current version |
| result.lockMode | String | Lock status |
**CLI Example**:
```bash
aliyun opensearch describe-app-group \
--app-group-identity my_app \
--user-agent AlibabaCloud-Agent-Skills
```
---
## Response Status Codes
| Status Code | Description |
|-------------|-------------|
| 200 | Request successful |
| 400 | Invalid request parameters |
| 401 | Authentication failed |
| 403 | Permission denied |
| 404 | Resource not found |
| 500 | Internal server error |
## Common Response Structure
```json
{
"requestId": "xxx-xxx-xxx",
"result": { ... }
}
```
## Error Response Structure
```json
{
"code": "ErrorCode",
"message": "Error message",
"requestId": "xxx-xxx-xxx",
"httpCode": 400
}
```
## Reference Documentation
- [CreateAppGroup API Doc](https://api.aliyun.com/document/OpenSearch/2017-12-25/CreateAppGroup)
- [ListAppGroups API Doc](https://api.aliyun.com/document/OpenSearch/2017-12-25/ListAppGroups)
- [DescribeAppGroup API Doc](https://api.aliyun.com/document/OpenSearch/2017-12-25/DescribeAppGroup)
FILE:references/verification-method.md
# Verification Method
Success verification methods for OpenSearch instance management operations.
> **Terminology**: OpenSearch instance and OpenSearch app group are synonymous.
## Scenario Verification
### Verify Instance Creation Success
**Expected Result**: OpenSearch instance created successfully, `instanceId` field is non-empty
**Verification Flow**: First call `create-app-group` to create instance, then call `describe-app-group` to check status
**Verification Command**:
```bash
aliyun opensearch describe-app-group \
--app-group-identity <instance_name> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Success Criteria**:
- Response status code is 200
- Response contains `result` object
- `result.instanceId` field is non-empty (key indicator of successful creation)
**Example Response (Success)**:
```json
{
"requestId": "xxx-xxx-xxx",
"result": {
"name": "my_search_instance",
"instanceId": "ops-cn-xxxxx",
"status": "normal",
"produced": 1,
"type": "enhanced",
"chargeType": "POSTPAY",
"lockMode": "Unlock",
"domain": "ecommerce"
}
}
```
**Key Fields**:
| Field | Description |
|-------|-------------|
| `instanceId` | Instance ID, non-empty indicates successful creation |
| `status` | Instance status, `normal` means running |
| `produced` | Production status, `1` means production complete |
**Status Values**:
| Status | Description |
|--------|-------------|
| `producing` | Producing |
| `review_pending` | Review pending |
| `config_pending` | Configuration pending |
| `normal` | Normal (success) |
| `frozen` | Frozen |
---
### Verify Dry-run Mode
**Expected Result**: Validates parameters only, does not actually create instance
**Verification Command**:
```bash
aliyun opensearch create-app-group \
--dry-run true \
--body '{
"name": "test_dry_run",
"type": "enhanced",
"chargeType": "POSTPAY",
"domain": "general",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
}
}' \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Success Criteria**:
- Response status code is 200
- Request succeeds but does not create instance
- Instance does not exist in list query
---
### Verify Idempotent Creation (clientToken)
**Expected Result**: Multiple requests with the same `clientToken` create the instance only once
**Verification Steps**:
```bash
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
# First creation
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "idempotent_test_instance",
"type": "enhanced",
"chargeType": "POSTPAY",
"domain": "ecommerce",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
}
}' \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
# Second request with same token (using same CLIENT_TOKEN)
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body '{
"name": "idempotent_test_instance",
"type": "enhanced",
"chargeType": "POSTPAY",
"domain": "ecommerce",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
}
}' \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Success Criteria**:
- Both requests return 200
- Only one instance created (verify with describe-app-group)
---
### Verify List Instances Success
**Expected Result**: Successfully returns instance list
**Verification Command**:
```bash
aliyun opensearch list-app-groups \
--engine-type ha3 \
--page-number 1 \
--page-size 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Success Criteria**:
- Response status code is 200
- Response contains `result` array
- `totalCount` field shows correct instance count
---
### Verify Describe Instance Success
**Expected Result**: Successfully returns instance details
**Verification Command**:
```bash
aliyun opensearch describe-app-group \
--app-group-identity <instance_name> \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
**Success Criteria**:
- Response status code is 200
- Response contains `result` object
- `result` contains instance details (id, name, type, status, quota, etc.)
**Example Response (Success)**:
```json
{
"requestId": "0A6EB64B-B4C8-CF02-810F-E660812972FF",
"result": {
"id": "110116134",
"name": "my_search_instance",
"instanceId": "ops-cn-xxxxx",
"type": "enhanced",
"status": "normal",
"chargeType": "POSTPAY",
"domain": "ecommerce",
"quota": {
"docSize": 100,
"computeResource": 2000,
"spec": "opensearch.private.common"
},
"lockMode": "Unlock",
"produced": 1
}
}
```
---
## Complete Workflow Verification
### Full Test Script
```bash
#!/bin/bash
APP_NAME="test_instance_$(date +%s)"
# Generate idempotency token
CLIENT_TOKEN=$(uuidgen)
echo "=== 1. Create Instance ==="
aliyun opensearch create-app-group \
--client-token "$CLIENT_TOKEN" \
--body "{
\"name\": \"$APP_NAME\",
\"type\": \"enhanced\",
\"chargeType\": \"POSTPAY\",
\"domain\": \"general\",
\"quota\": {
\"docSize\": 100,
\"computeResource\": 2000,
\"spec\": \"opensearch.private.common\"
}
}" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
echo ""
echo "=== 2. Wait for Instance Production ==="
sleep 30
echo ""
echo "=== 3. Describe Instance (Verify Creation) ==="
aliyun opensearch describe-app-group \
--app-group-identity "$APP_NAME" \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
echo ""
echo "=== 4. List Instances ==="
aliyun opensearch list-app-groups \
--engine-type ha3 \
--page-number 1 \
--page-size 10 \
--user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
echo ""
echo "=== Test Complete ==="
echo "Note: To delete test instance, please use OpenSearch Console"
```
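The fixed `sleep 30` in the script above may be too short or too long depending on how quickly the instance leaves the `producing` state. One alternative is to poll until the status reaches `normal`. The following is a minimal sketch: `wait_for_status` is a hypothetical helper, and the commented `get_status` example assumes `jq` is installed and that the response has the `result.status` field shown in the example responses.

```bash
# wait_for_status is a hypothetical helper: it runs a caller-supplied command that
# prints the current status, and retries until the status matches or attempts run out.
# Usage: wait_for_status <wanted> <max_attempts> <interval_seconds> <command...>
wait_for_status() {
  local wanted="$1" max_attempts="$2" interval="$3"
  shift 3
  local attempt status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # A failing status command is treated as "status unknown" and retried.
    status="$("$@")" || status=""
    if [ "$status" = "$wanted" ]; then
      echo "status=$status after $attempt attempt(s)"
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
  done
  echo "timed out waiting for status=$wanted (last: ${status:-unknown})" >&2
  return 1
}

# Example status command (assumption: jq is available and the response contains result.status):
# get_status() {
#   aliyun opensearch describe-app-group --app-group-identity "$APP_NAME" \
#     --user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage | jq -r '.result.status'
# }
# wait_for_status normal 20 15 get_status
```

Passing the status command as arguments keeps the loop independent of any particular API, so the same helper can be reused for other long-running operations.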
---
## Troubleshooting
### Common Errors and Solutions
| Error Message | Possible Cause | Solution |
|---------------|----------------|----------|
| `InvalidAccessKeyId.NotFound` | Invalid Access Key ID | Check credential configuration |
| `Forbidden.RAM` | Insufficient permissions | Check RAM policy |
| `AppGroupNotExist` | Instance does not exist | Verify instance name |
| `InvalidParameter` | Invalid parameter | Check parameter format and values |
| `QuotaExceed` | Quota exceeded | Contact Alibaba Cloud to increase quota |
### Debug Commands
```bash
# Enable debug mode for detailed request/response
aliyun opensearch list-app-groups --log-level=debug --user-agent AlibabaCloud-Agent-Skills/alibabacloud-opensearch-app-manage
```
---
name: alibabacloud-dataworks-data-quality
description: |
DataWorks Data Quality (Read-Only): Query rule templates, data quality monitors (scans), alert rules, and scan run records/logs.
Uses aliyun CLI to call dataworks-public OpenAPI (2024-05-18). All operations are read-only — no create, update, or delete.
Trigger keywords: DataWorks data quality, quality rule, quality template, quality monitor, quality scan, scan run,
quality check result, quality alert rule, quality run log, DQ monitor, data quality execution, quality pass/fail,
list quality scans, get quality scan, query quality result, quality monitoring detail, quality run history.
Not triggered: creating/updating/deleting quality rules or monitors, data source management, compute resource management,
resource group management, workspace member management, data development tasks, scheduling configuration.
---
# DataWorks Data Quality (Read-Only)
Query and investigate **Rule Templates**, **Data Quality Monitors**, **Alert Rules**, and **Scan Run Records** in Alibaba Cloud DataWorks.
> **Coverage**: All Get/List read-only OpenAPIs under DataWorks Data Quality, totaling 9:
> ListDataQualityTemplates / GetDataQualityTemplate · ListDataQualityScans / GetDataQualityScan ·
> ListDataQualityAlertRules / GetDataQualityAlertRule · ListDataQualityScanRuns / GetDataQualityScanRun / GetDataQualityScanRunLog
> **Excludes** write operations: Create / Update / Delete / CreateDataQualityScanRun.
> **Read-Only Skill**: This skill supports query operations only. Any write operation request **must be blocked immediately** — direct the user to the DataWorks console.
## Architecture
```
DataWorks Data Quality
├── Rule Templates ─── Reusable metric logic definitions (built-in & custom)
│
├── Data Quality Monitors (Scans) ─── Monitor tasks bound to tables, with rules and trigger config
│ └── Alert Rules ─── Notification rules tied to a monitor (channels, recipients, conditions)
│
└── Scan Runs ─── Execution records each time a monitor runs
└── Scan Run Logs ─── Detailed execution logs for a run
```
---
## Global Rules
### Prerequisites
1. **Aliyun CLI >= 3.3.3**: `aliyun version` (If not installed or version too low, run `curl -fsSL https://aliyuncli.alicdn.com/setup.sh | bash` to install/update. See [references/cli-installation-guide.md](references/cli-installation-guide.md))
2. **First-time use**: `aliyun configure set --auto-plugin-install true`
3. **Plugin update**: [MUST] run `aliyun plugin update` to ensure that any existing plugins on your local machine are always up-to-date.
4. **AI-Mode Configuration**: [MUST] Before using aliyun CLI commands, configure AI-Mode:
- `aliyun configure ai-mode enable`
- `aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality"`
- `aliyun configure ai-mode disable`
5. **jq** (recommended for output formatting): `which jq`
6. **Credential status**: `aliyun configure list`, verify valid credentials exist
> **Security Rules**: **DO NOT** read/print/echo AK/SK values. **ONLY** use `aliyun configure list` to check credential status.
### Command Formatting
- **User-Agent (mandatory)**: All `aliyun` CLI commands **must** include `--user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality`.
- **Timeout (mandatory)**: All `aliyun` CLI commands **must** include `--connect-timeout 5 --read-timeout 10`. These match the CLI built-in defaults and make the timeout policy explicit.
- **Single-line commands**: Construct as a **single-line string**; do not use `\` for line breaks.
- **jq step-by-step**: First execute the `aliyun` command to get JSON, then pipe to `jq` for formatting.
- **Endpoint mandatory**: When specifying `--region`, you **must** also add `--endpoint dataworks.<REGION_ID>.aliyuncs.com`.
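The formatting rules above can be sketched as a small builder that assembles a single-line command with the mandatory flags. This is only an illustration: `build_dq_command` is a hypothetical helper that prints the command rather than running it, and the flag values are the ones stated in the rules.

```bash
# build_dq_command is a hypothetical helper; it prints the command and does not execute it.
# Usage: build_dq_command <region-or-empty-string> <subcommand> [extra args...]
build_dq_command() {
  local region="$1" subcommand="$2"
  shift 2
  local cmd="aliyun dataworks-public $subcommand"
  # Mandatory user agent and timeouts from the rules above.
  cmd+=" --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality"
  cmd+=" --connect-timeout 5 --read-timeout 10"
  # A region always travels together with its endpoint.
  if [ -n "$region" ]; then
    cmd+=" --region $region --endpoint dataworks.$region.aliyuncs.com"
  fi
  # Shell-quote remaining arguments so values with spaces stay on one line safely.
  if [ "$#" -gt 0 ]; then
    cmd+="$(printf ' %q' "$@")"
  fi
  echo "$cmd"
}

build_dq_command cn-shanghai list-data-quality-scans --project-id 12345
```

Printing the command first makes it easy to review before execution; the JSON result can then be piped to `jq` as a separate step.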
### Parameter Confirmation
**Must be explicitly provided by user — do not assume or use defaults**:
- `ProjectId`: Core parameter for every query — must be confirmed
- `Id`-type resource identifiers: template ID, monitor ID, alert rule ID, scan run ID
- `region`: Affects endpoint — must be confirmed
**Can use default values directly — no user confirmation needed**:
- `PageNumber`: default `1`
- `PageSize`: default `10`
- `SortBy`: default `ModifyTime Desc` or `CreateTime Desc`
**Ask contextually — only collect when the user has a specific need**:
- `Name`, `Table`: fuzzy search keywords
- Time range: `CreateTimeFrom` / `CreateTimeTo`
- `Status`: collect only when the user explicitly wants to filter by a specific status
> If the user has already provided `ProjectId`, `Id`, or `region` in the conversation, reuse them directly without re-confirmation.
### Time Parameter Conversion
When the user describes time in natural language, convert it to millisecond timestamps automatically. Do **not** ask the user to provide raw timestamps.
- `"yesterday"` → yesterday `00:00:00` to `23:59:59`
- `"today"` → today `00:00:00` to current time
- `"last N days"` → current time minus `N × 24` hours through current time
- If the time phrase is ambiguous, ask a clarification question and offer a suggested range
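As an illustration of the "last N days" rule, the conversion to millisecond timestamps can be done with plain shell arithmetic. This is a minimal sketch: `last_n_days_range_ms` is a hypothetical helper name, and the optional second argument exists only so the calculation can be checked against a fixed clock.

```bash
# last_n_days_range_ms is a hypothetical helper.
# Usage: last_n_days_range_ms <days> [now_epoch_seconds]
# Prints "<from_ms> <to_ms>" for the window ending at the given (or current) time.
last_n_days_range_ms() {
  local days="$1"
  local now="${2:-$(date +%s)}"
  # N days back is N x 24 hours, expressed in seconds.
  local from=$(( now - days * 24 * 3600 ))
  # Epoch seconds to milliseconds.
  echo "$(( from * 1000 )) $(( now * 1000 ))"
}

last_n_days_range_ms 3 1700000000
# → 1699740800000 1700000000000
```

The two values map to `CreateTimeFrom` and `CreateTimeTo`. The "yesterday" and "today" cases additionally need local midnight, which depends on the time zone and on the `date` implementation, so they are left out of this sketch.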
### Query Result Presentation
After every query, present the result in a decision-friendly way:
- **List queries**: use a Markdown table for key fields such as ID, name, status, and time; do **not** dump raw JSON
- **Detail queries**: present a short summary first, then expand full `Spec` only if the user asks
- **Abnormal status**: highlight `Fail` / `Error` / `Warn`, and proactively recommend the next diagnostic step
- **Empty result**: explain likely causes such as wrong `ProjectId`, wrong `region`, or filters that are too strict
### Pagination
- First query uses the default `PageSize` of `10`
- If the number of returned rows equals `PageSize`, proactively offer next page or a larger `PageSize`
- Do not fetch more than `100` records in a single request
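The pagination rules above can be expressed as two small checks. Both function names are hypothetical and serve only to illustrate the rules; falling back to the default for a non-positive page size is an assumption of this sketch.

```bash
# clamp_page_size is a hypothetical helper: it caps a requested page size at 100
# and falls back to the default of 10 when the value is missing or below 1.
clamp_page_size() {
  local requested="${1:-10}"
  if [ "$requested" -gt 100 ]; then
    echo 100
  elif [ "$requested" -lt 1 ]; then
    echo 10
  else
    echo "$requested"
  fi
}

# should_offer_next_page is a hypothetical helper: it succeeds when the number of
# returned rows equals the page size, which suggests more rows may exist.
# Usage: should_offer_next_page <returned_rows> <page_size>
should_offer_next_page() {
  [ "$1" -eq "$2" ]
}

clamp_page_size 500
# → 100
```

For example, `should_offer_next_page 10 10` succeeds, so the user would be offered the next page or a larger `PageSize`.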
### ⚠️ Read-Only Execution Gate
> **MANDATORY**: Before responding to ANY request, check whether it involves a write operation.
> If YES: **BLOCK immediately**. Do NOT call any API. Respond with:
> "This skill supports query operations only and cannot perform create/update/delete. Please go to the [DataWorks Console](https://dataworks.console.aliyun.com) for configuration."
**Quick Reference — All Blocked Operations**:
| Operation Type | Blocked APIs |
|----------------|-------------|
| Create | CreateDataQualityTemplate, CreateDataQualityScan, CreateDataQualityScanRun, CreateDataQualityAlertRule |
| Update | UpdateDataQualityTemplate, UpdateDataQualityScan, UpdateDataQualityAlertRule |
| Delete | DeleteDataQualityTemplate, DeleteDataQualityScan, DeleteDataQualityAlertRule |
| Trigger | CreateDataQualityScanRun (manual execution trigger) |
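One way to apply the gate is an allowlist of the nine read-only APIs this skill covers, so that anything not listed is blocked by default. This is a sketch only: `is_allowed_action` is a hypothetical helper, not part of the aliyun CLI.

```bash
# is_allowed_action is a hypothetical helper: it succeeds only for the nine
# Get/List APIs this skill covers, so any other action is blocked by default.
is_allowed_action() {
  case "$1" in
    ListDataQualityTemplates|GetDataQualityTemplate) return 0 ;;
    ListDataQualityScans|GetDataQualityScan) return 0 ;;
    ListDataQualityAlertRules|GetDataQualityAlertRule) return 0 ;;
    ListDataQualityScanRuns|GetDataQualityScanRun|GetDataQualityScanRunLog) return 0 ;;
    *) return 1 ;;
  esac
}

if is_allowed_action CreateDataQualityScanRun; then
  echo "allowed"
else
  echo "blocked: this skill supports query operations only"
fi
```

An allowlist is used instead of matching on `Create`/`Update`/`Delete` prefixes so that write operations added to the API later are also rejected.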
### RAM Permissions
All operations require `dataworks:<APIAction>` permissions on the target workspace.
> Full permission matrix: [references/ram-policies.md](references/ram-policies.md)
---
## Quick Start: Data Quality Investigation
When the user request is vague, use the following default path:
1. **Environment check** — Confirm CLI and credentials per Prerequisites. After completion, proactively suggest the workspace confirmation step.
2. **Confirm workspace** — Confirm `ProjectId` and `region`. If either is missing, use Module 0. After completion, proactively suggest listing monitors.
3. **List monitors** — Call `ListDataQualityScans`, present a table, and let the user choose a monitor. After completion, proactively suggest monitor detail.
4. **Check monitor detail** — Call `GetDataQualityScan`, summarize rules, monitored object, and trigger mode. After completion, proactively suggest recent runs.
5. **Check run history** — Call `ListDataQualityScanRuns`, default to the most recent 10 rows, and highlight abnormal status. After completion, proactively suggest drilling into one run.
6. **Drill into failed or warned runs** — For `Fail` / `Error` / `Warn`, call `GetDataQualityScanRun` and summarize per-rule results. After completion, proactively suggest log inspection.
7. **Fetch execution logs** — If `Results` shows failed rules or runtime errors, call `GetDataQualityScanRunLog` to locate root cause. After completion, proactively suggest whether further analysis is needed.
---
## Next Step Guidance
| Completed Operation | Recommended Next Step |
|---------------------|----------------------|
| ListDataQualityTemplates | "Would you like to view the full configuration of a specific template? (GetDataQualityTemplate)" |
| GetDataQualityTemplate | "Would you like to view monitors that use this template? (ListDataQualityScans)" |
| ListDataQualityScans | "Select a monitor to view its full configuration? (GetDataQualityScan)" |
| GetDataQualityScan | "View associated alert rules (ListDataQualityAlertRules) or recent run history (ListDataQualityScanRuns)?" |
| ListDataQualityAlertRules | "View details for a specific alert rule? (GetDataQualityAlertRule)" |
| GetDataQualityAlertRule | "Return to view run history for the associated monitor? (ListDataQualityScanRuns)" |
| ListDataQualityScanRuns | "View detailed results for a specific run? (GetDataQualityScanRun)" |
| GetDataQualityScanRun (Pass) | "This run passed. Would you like to view other run records or alert configuration?" |
| GetDataQualityScanRun (Fail/Error/Warn) | "Anomaly detected — recommend viewing execution logs to locate the root cause. (GetDataQualityScanRunLog)" |
| GetDataQualityScanRunLog (NextOffset=-1) | "Log retrieval complete. Is further analysis needed?" |
| GetDataQualityScanRunLog (NextOffset≠-1) | "Log not fully retrieved — continue fetching the next segment. (Retry with Offset)" |
---
## Trigger Rules
**Trigger scenarios**: Query data quality monitors/rules/templates/alerts/scan runs/logs, diagnose data quality check failures, view quality alert notification configuration, `list/get quality scan/rule/template/alert/run`
**Not triggered**:
- Creating/updating/deleting data quality configuration → Use DataWorks Console
- Data source/compute resource/resource group management → `alibabacloud-dataworks-infra-manage`
- Workspace query/member management → `alibabacloud-dataworks-workspace-manage`
- Data development node/scheduling configuration → `alibabacloud-dataworks-datastudio-develop`
---
## Interaction Flow
**Identify query intent → Environment check → Module 0 (if ProjectId/region missing) → Collect parameters → Execute command → Present results → Guide next step**
Common aliases: DW = DataWorks, DQ = Data Quality, scan = monitor, scan run = execution record
---
# Module 0: Workspace / ProjectId / Region Query
> If the `alibabacloud-dataworks-workspace-manage` skill is available, prefer using it for workspace lookup. The following is only a fallback.
```bash
aliyun dataworks-public list-projects --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --status Available --page-size 100
```
Rules:
- If the user provides only a workspace name, list candidate workspaces and ask the user to confirm the `ProjectId`
- If `ProjectId` is unknown, ask for it explicitly and never guess
- If `region` is unknown, offer common regions for confirmation: `cn-hangzhou`, `cn-shanghai`, `cn-beijing`, `cn-shenzhen`
- Once `ProjectId` and `region` are confirmed in the conversation, reuse them in later steps
Intent guidance:
- `"there's a data quality issue"` → ask whether the user wants monitor configuration, run records, or alert settings
- `"show me this table"` → start with `list-data-quality-scans --table <TABLE_NAME>`
- If the intent is still unclear, ask the user to choose one of four modules: rule templates, monitors, alert rules, or scan runs
---
# Module 1: Rule Templates
Rule templates define reusable metric logic such as null rate, duplicate rate, row count, and custom SQL checks. Use this module when the user wants to know what a template checks, whether it is built-in or workspace-specific, and how its threshold logic is defined.
## Task 1.1: List Rule Templates (ListDataQualityTemplates)
> **Always call `ListDataQualityTemplates`** whenever the user asks about quality rule templates in their workspace. Never answer without invoking the API.
>
> **Scope**: This API only returns workspace **custom** templates. It does **not** support querying system built-in templates. `--project-id` is **required** — if the user has not provided `ProjectId`, collect it first via Module 0.
```bash
aliyun dataworks-public list-data-quality-templates --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --project-id <PROJECT_ID> [--name <FUZZY_NAME>] [--catalog <CATALOG_PATH>] [--page-number 1] [--page-size 10]
```
How to interpret the result:
- `PageInfo.DataQualityTemplates[]` is the working set for user selection
- Present a Markdown table of `Id`, template name (from `Spec`), `Catalog`/category, and description — do not dump raw JSON
- Use `Catalog` and template naming patterns to tell the user what class of checks is available
- After listing, proactively suggest the user pick a template ID to view full configuration via `GetDataQualityTemplate`
- If the user asks about system built-in templates, explain this API only covers workspace custom templates and direct them to the DataWorks console
## Task 1.2: Get Rule Template Details (GetDataQualityTemplate)
```bash
aliyun dataworks-public get-data-quality-template --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <TEMPLATE_ID>
```
How to interpret the result:
- Focus on `Spec`: summarize the metric logic, parameter definitions, and threshold expression
- Tell the user what this template checks and how pass/fail is decided
- Mention whether the template belongs to a workspace (`ProjectId` present) or is reused as a generic template
- Expand full `Spec` only when the user explicitly asks for raw detail
---
# Module 2: Data Quality Monitors
A data quality monitor (scan) is a concrete monitoring task bound to a table or field. Use this module to locate monitors, explain what they check, and understand how they are triggered.
## Task 2.1: List Data Quality Monitors (ListDataQualityScans)
```bash
aliyun dataworks-public list-data-quality-scans --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --project-id <PROJECT_ID> --page-number 1 --page-size 10 [--name <FUZZY_NAME>] [--table <FUZZY_TABLE_NAME>] [--sort-by "ModifyTime Desc"]
```
How to interpret the result:
- `PageInfo.DataQualityScans[]` is the candidate monitor list; show `Id`, `Name`, `Description`, owner, and latest update time
- When `--table` is used, explicitly tell the user these monitors are the likely matches for that table
- Present the candidates as a Markdown table so the user can choose one target monitor before moving to the detail query
- When the list is empty, suggest checking `ProjectId` and `region`, or relaxing the `--name` / `--table` filters
## Task 2.2: Get Monitor Details (GetDataQualityScan)
```bash
aliyun dataworks-public get-data-quality-scan --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <SCAN_ID>
```
How to interpret the result:
- `Spec`: summarize monitored object, rule count, core metrics, and threshold settings
- `Trigger`: explain whether the monitor is `ByManual` or `BySchedule`
- `ComputeResource` and `RuntimeResource`: mention them only when they help explain execution behavior
- `Parameters` and `Hooks`: summarize only if they affect how the run is triggered or analyzed
- Present a concise monitor summary first, then suggest an alert-rule or run-history follow-up
---
# Module 3: Alert Rules
Alert rules define when notifications are sent and to whom. Use this module when the user asks who gets notified, through which channel, and under what condition.
**Receiver Type Quick Reference**
| ReceiverType | Description |
|-------------|-------------|
| AliUid | Specific Alibaba Cloud account UID |
| DataQualityScanOwner | Owner of the data quality monitor task |
| TaskOwner | Owner of the associated scheduling task |
| DingdingUrl | DingTalk custom robot Webhook |
| FeishuUrl | Feishu custom robot Webhook |
| WeixinUrl | WeCom Webhook |
| WebhookUrl | Generic Webhook URL |
| ShiftSchedule | On-call schedule (notify by shift) |
## Task 3.1: List Alert Rules (ListDataQualityAlertRules)
```bash
aliyun dataworks-public list-data-quality-alert-rules --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --project-id <PROJECT_ID> --page-number 1 --page-size 10 [--data-quality-scan-id <SCAN_ID>] [--sort-by "CreateTime Desc"]
```
How to interpret the result:
- `PageInfo.DataQualityAlertRules[]` should be summarized as: rule ID, condition, channels, receivers, and associated monitor IDs
- Translate `Notification.Channels` into user-friendly channel names such as DingTalk, email, Feishu, SMS, or Webhook
- Summarize `Notification.Receivers` by receiver type instead of showing nested raw JSON
- If `DataQualityScanId` is provided, explicitly state these are the alert rules attached to that monitor
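
The channel translation described above could be sketched in shell as follows. The raw values are the ones listed for `Notification.Channels[]` in `references/related-apis.md`. The helper names `channel_label` and `channels_summary` are hypothetical, and mapping `Weixin` to "WeCom" is an assumption taken from the receiver-type table.

```bash
# Hypothetical helper: map one raw channel value to a user-friendly label.
channel_label() {
  case "$1" in
    Dingding) echo "DingTalk" ;;
    Mail)     echo "email" ;;
    Weixin)   echo "WeCom" ;;      # assumption: same naming as WeixinUrl
    Feishu)   echo "Feishu" ;;
    Phone)    echo "phone call" ;;
    Sms)      echo "SMS" ;;
    Webhook)  echo "Webhook" ;;
    *)        echo "$1" ;;         # unknown values pass through unchanged
  esac
}

# Hypothetical helper: join the labels of a space-separated channel list.
channels_summary() {
  summary=""
  for channel in $1; do
    label="$(channel_label "$channel")"
    summary="${summary:+$summary, }$label"
  done
  echo "$summary"
}
```

For example, `channels_summary "Dingding Mail Sms"` produces one line that can be dropped into the alert-rule summary table.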
## Task 3.2: Get Alert Rule Details (GetDataQualityAlertRule)
```bash
aliyun dataworks-public get-data-quality-alert-rule --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <ALERT_RULE_ID>
```
How to interpret the result:
- Explain the alert condition in plain language
- Summarize notification channels and recipients with emphasis on who will be notified and how
- Call out whether the rule targets one monitor or multiple monitors
- If the user is diagnosing missing alerts, suggest returning to recent run history for the associated monitor
---
# Module 4: Scan Runs
A scan run is created every time a monitor executes. Use this module to inspect run history, diagnose failed checks, and read execution logs.
**Status Quick Reference**
| Status | Meaning | Recommended Path |
|--------|---------|-----------------|
| Pass | All rules passed | No action needed |
| Fail | At least one rule failed to meet the threshold | GetDataQualityScanRun → Results → GetDataQualityScanRunLog |
| Error | Execution error (engine error, insufficient resources) | GetDataQualityScanRunLog to view error details |
| Warn | Warning triggered but did not reach the blocking threshold | GetDataQualityScanRun → Results to view metric values |
| Running | Execution in progress | Wait for completion before querying |
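
As a rough sketch, the status table can be encoded as a lookup so the follow-up suggestion stays consistent across replies. `next_step` is a hypothetical helper name, not part of the aliyun CLI. The returned strings are the kebab-case command names used elsewhere in this skill.

```bash
# Hypothetical helper: map a scan-run Status to the recommended follow-up.
next_step() {
  case "$1" in
    Pass)    echo "none" ;;
    Fail)    echo "get-data-quality-scan-run, then get-data-quality-scan-run-log" ;;
    Error)   echo "get-data-quality-scan-run-log" ;;
    Warn)    echo "get-data-quality-scan-run" ;;
    Running) echo "wait" ;;
    *)       echo "unknown status: $1" >&2; return 1 ;;
  esac
}
```

An unrecognized status returns a non-zero exit code, so a caller can fall back to asking the user instead of guessing.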
## Task 4.1: List Scan Runs (ListDataQualityScanRuns)
```bash
aliyun dataworks-public list-data-quality-scan-runs --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --project-id <PROJECT_ID> [--data-quality-scan-id <SCAN_ID>] [--status <Pass|Running|Error|Fail|Warn>] [--create-time-from <TIMESTAMP_MS>] [--create-time-to <TIMESTAMP_MS>] [--filter '{"TaskInstanceId":"<INSTANCE_ID>"}'] [--sort-by "CreateTime Desc"] [--page-number 1] [--page-size 20]
```
Filter quick reference:
| Scenario | Filter JSON Example |
|----------|---------------------|
| Filter by scheduling instance | `{"TaskInstanceId":"123456"}` |
| Filter by run number | `{"RunNumber":"2"}` |
How to interpret the result:
- `PageInfo.DataQualityScanRuns[]` should be shown as a table with `Id`, `Status`, `CreateTime`, `FinishTime`, and key runtime parameters
- Sort by recent time first so phrases like "most recent" map naturally to the first row
- Highlight `Fail`, `Error`, and `Warn`, then recommend drilling into `GetDataQualityScanRun`
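- If the user asks for recent failures, combine `--status Fail` with `--create-time-from` / `--create-time-to` values converted from the user's relative time phrase (for example, "last 24 hours") into millisecond timestamps, instead of asking the user for timestamps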
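
Converting a phrase such as "last 24 hours" into the millisecond timestamps the API expects can be sketched like this. `recent_window_ms` is a hypothetical helper, and it assumes a `date` that supports `+%s` (GNU and BSD `date` both do).

```bash
# Hypothetical helper: print "<from_ms> <to_ms>" for a window that ends now
# and starts N hours ago (default 24).
recent_window_ms() {
  hours="${1:-24}"
  now_s="$(date +%s)"
  to_ms=$((now_s * 1000))
  from_ms=$(((now_s - hours * 3600) * 1000))
  echo "$from_ms $to_ms"
}

# Usage sketch (not executed here):
#   set -- $(recent_window_ms 24)
#   aliyun dataworks-public list-data-quality-scan-runs \
#     --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality \
#     --project-id <PROJECT_ID> --status Fail \
#     --create-time-from "$1" --create-time-to "$2" --sort-by "CreateTime Desc"
```

The helper reads the clock once, so both bounds derive from the same instant.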
## Task 4.2: Get Scan Run Details (GetDataQualityScanRun)
```bash
aliyun dataworks-public get-data-quality-scan-run --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <SCAN_RUN_ID>
```
How to interpret the result:
- `Status`: state clearly whether the run passed, failed, warned, errored, or is still running
- `Results`: extract each rule's status, actual metric value, threshold, and whether it caused the overall failure; present this as a table instead of raw JSON
- `Scan`: use it as configuration snapshot context only when it helps explain the failure
- `Parameters`: mention runtime parameters when they may have influenced the result
- If any rule is abnormal, proactively suggest `GetDataQualityScanRunLog`
## Task 4.3: Get Scan Run Log (GetDataQualityScanRunLog)
```bash
aliyun dataworks-public get-data-quality-scan-run-log --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality [--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com] --id <SCAN_RUN_ID> [--offset <BYTE_OFFSET>]
```
How to interpret the result:
- `Log` is the raw execution trace; summarize the root cause first, then provide key excerpts if needed
- `NextOffset = -1` means log retrieval is complete
- If `NextOffset != -1`, continue querying with the returned offset until completion when the user asks for the full log
- When logs are long, explain the main error path instead of pasting everything by default
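
The `NextOffset` loop above could look roughly like the following. To keep the sketch independent of any JSON tool, the fetch step is passed in as a command. `fetch_full_log` and the fetcher contract (first output line is `NextOffset`, the rest is `Log`) are assumptions made for illustration. The commented `aliyun_fetcher` additionally assumes `jq` is installed.

```bash
# fetch_full_log FETCHER SCAN_RUN_ID
# FETCHER is any command invoked as: FETCHER <id> <offset>
# It must print NextOffset on its first line and the log text after it.
fetch_full_log() {
  fetcher="$1"; run_id="$2"; offset=0
  while [ "$offset" != "-1" ]; do
    segment="$("$fetcher" "$run_id" "$offset")" || return 1
    next="$(printf '%s\n' "$segment" | sed -n '1p')"
    printf '%s\n' "$segment" | sed '1d'      # emit this segment's log text
    # Stop if the fetcher returns nothing usable or never advances.
    if [ -z "$next" ] || [ "$next" = "$offset" ]; then
      echo "fetcher did not advance past offset $offset" >&2
      return 1
    fi
    offset="$next"
  done
}

# Possible fetcher (assumes jq; not executed here):
#   aliyun_fetcher() {
#     aliyun dataworks-public get-data-quality-scan-run-log \
#       --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality \
#       --id "$1" --offset "$2" | jq -r '.LogSegment.NextOffset, .LogSegment.Log'
#   }
#   fetch_full_log aliyun_fetcher <SCAN_RUN_ID>
```

The guard against a non-advancing offset is a defensive choice in this sketch, not documented API behavior.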
---
## Best Practices
1. **List before detail** — Do not guess IDs. Use list queries first, then drill into a selected resource.
2. **Diagnose failures in order** — For `Fail`, check `GetDataQualityScanRun` results first, then read `GetDataQualityScanRunLog`.
3. **Templates scope** — `ListDataQualityTemplates` returns workspace **custom** templates only; `ProjectId` is required. Built-in templates must be viewed in the DataWorks console.
4. **Use a bounded time window** — For run-history queries, default to the most recent 24 hours or the 10 most recent rows to avoid oversized result sets.
5. **Proactively guide the next step** — After every query, suggest the most likely follow-up instead of waiting for the user to ask.
6. **Expand `Spec` on demand** — `Spec` is often verbose. Summarize first, expand only on request.
## Query Result Guidance
- **Empty list result**: Explain likely causes including wrong `ProjectId`, wrong `region`, or overly strict filters — suggest confirming parameters or relaxing filter conditions.
- **Spec field handling**: First extract monitored object, rule count, key thresholds, and trigger mode; expand full JSON only when the user requests it.
- **Abnormal status handling**: When encountering `Fail` / `Error` / `Warn`, do not just display the status — proactively provide the next diagnostic path.
- **Results field handling**: Present status, actual value, threshold, and conclusion per rule in a table — do not dump the raw array.
## Common Errors
| Error Code | Solution |
|------------|----------|
| Forbidden.Access / PermissionDenied | Check RAM permissions, see [references/ram-policies.md](references/ram-policies.md) |
| InvalidParameter | Verify parameter names, JSON shape, and required fields |
| EntityNotExists | Check whether the ID, `ProjectId`, and `region` match the target resource |
| InvalidPageSize | `PageSize` must be within the API-supported range, usually `1-100` |
## Region and Endpoint
Common regions: `cn-hangzhou`, `cn-shanghai`, `cn-beijing`, `cn-shenzhen`.
Endpoint format: `dataworks.<REGION_ID>.aliyuncs.com`
> Full region and endpoint list: [references/related-apis.md](references/related-apis.md)
## Reference Links
| Reference | Description |
|-----------|-------------|
| [references/ram-policies.md](references/ram-policies.md) | RAM permission configuration and policy examples |
| [references/related-apis.md](references/related-apis.md) | API parameter details and Region Endpoints |
| [references/cli-installation-guide.md](references/cli-installation-guide.md) | Aliyun CLI installation guide |
FILE:references/cli-installation-guide.md
# Aliyun CLI Installation & Configuration Guide
> **Aliyun CLI 3.3.3+** required for full plugin ecosystem support.
## Installation
### macOS (Homebrew)
```bash
brew install aliyun-cli   # first-time install
brew upgrade aliyun-cli   # only needed if an older version is already installed
aliyun version            # verify >= 3.3.3
```
### macOS (Binary)
```bash
curl -fsSLO https://aliyuncli.alicdn.com/aliyun-cli-macosx-latest-amd64.tgz  # curl ships with macOS; wget does not
tar -xzf aliyun-cli-macosx-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
aliyun version
```
### Linux
```bash
wget https://aliyuncli.alicdn.com/aliyun-cli-linux-latest-amd64.tgz
tar -xzf aliyun-cli-linux-latest-amd64.tgz
sudo mv aliyun /usr/local/bin/
aliyun version
```
### Windows (PowerShell)
```powershell
Invoke-WebRequest -Uri "https://aliyuncli.alicdn.com/aliyun-cli-windows-latest-amd64.zip" -OutFile "aliyun-cli.zip"
Expand-Archive -Path aliyun-cli.zip -DestinationPath C:\aliyun-cli
$env:Path += ";C:\aliyun-cli"  # applies to the current PowerShell session only
aliyun version
```
## Configuration
```bash
# Interactive configuration
aliyun configure
# Or set via environment variables (see official docs for details)
# Note: AK/SK should be configured through `aliyun configure` or credential files,
# not echoed/printed in chat. See SKILL.md security rules.
export ALIBABA_CLOUD_REGION_ID=cn-hangzhou
```
## Enable Auto Plugin Install
```bash
aliyun configure set --auto-plugin-install true
```
## Enable AI-Mode
```bash
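# Enable AI-Mode before running aliyun CLI commands for this skill
aliyun configure ai-mode enable
# Configure the skill-specific AI-Mode user agent
aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality"
# Disable AI-Mode after the task is complete
aliyun configure ai-mode disable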
```
## Verification
```bash
aliyun ecs describe-regions --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality # test authentication
aliyun configure list # show current configuration
```
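The version requirement can also be checked in a script rather than by eye. The following is a minimal sketch; `version_ge` is a hypothetical helper, and the usage line assumes `aliyun version` prints a bare `x.y.z` string.

```bash
# version_ge ACTUAL MINIMUM
# Succeeds (exit 0) when ACTUAL >= MINIMUM, comparing numeric x.y.z parts.
version_ge() {
  actual="$1"; minimum="$2"
  old_ifs="$IFS"; IFS=.
  set -- $actual;  a1="${1:-0}"; a2="${2:-0}"; a3="${3:-0}"
  set -- $minimum; m1="${1:-0}"; m2="${2:-0}"; m3="${3:-0}"
  IFS="$old_ifs"
  if [ "$a1" -ne "$m1" ]; then
    if [ "$a1" -gt "$m1" ]; then return 0; else return 1; fi
  fi
  if [ "$a2" -ne "$m2" ]; then
    if [ "$a2" -gt "$m2" ]; then return 0; else return 1; fi
  fi
  [ "$a3" -ge "$m3" ]
}

# Usage sketch (not executed here):
#   version_ge "$(aliyun version)" 3.3.3 || echo "Aliyun CLI upgrade needed"
```

Comparing the parts numerically avoids the string-comparison trap where `3.10.0` sorts before `3.3.3`.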
## References
- Official Documentation: https://help.aliyun.com/zh/cli/
- RAM Console: https://ram.console.aliyun.com/
- Access Key Management: https://ram.console.aliyun.com/manage/ak
FILE:references/ram-policies.md
# DataWorks Data Quality RAM Policies
This document lists all RAM permissions required to use the DataWorks Data Quality skill.
> **Read-Only Skill**: All operations in this skill are read-only (Get/List). No write permissions are required.
## Permission Matrix
### Workspace Permissions
| API Action | RAM Permission | Access Level |
|------------|----------------|--------------|
| ListProjects | dataworks:ListProjects | List |
### Rule Template Permissions
| API Action | RAM Permission | Access Level |
|------------|----------------|--------------|
| ListDataQualityTemplates | dataworks:ListDataQualityTemplates | List |
| GetDataQualityTemplate | dataworks:GetDataQualityTemplate | Read |
### Data Quality Monitor Permissions
| API Action | RAM Permission | Access Level |
|------------|----------------|--------------|
| ListDataQualityScans | dataworks:ListDataQualityScans | List |
| GetDataQualityScan | dataworks:GetDataQualityScan | Read |
### Alert Rule Permissions
| API Action | RAM Permission | Access Level |
|------------|----------------|--------------|
| ListDataQualityAlertRules | dataworks:ListDataQualityAlertRules | List |
| GetDataQualityAlertRule | dataworks:GetDataQualityAlertRule | Read |
### Scan Run Permissions
| API Action | RAM Permission | Access Level |
|------------|----------------|--------------|
| ListDataQualityScanRuns | dataworks:ListDataQualityScanRuns | List |
| GetDataQualityScanRun | dataworks:GetDataQualityScanRun | Read |
| GetDataQualityScanRunLog | dataworks:GetDataQualityScanRunLog | Read |
## RAM Policy Example
### Minimum Permission Policy (Read-Only)
```json
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dataworks:ListProjects",
"dataworks:ListDataQualityTemplates",
"dataworks:GetDataQualityTemplate",
"dataworks:ListDataQualityScans",
"dataworks:GetDataQualityScan",
"dataworks:ListDataQualityAlertRules",
"dataworks:GetDataQualityAlertRule",
"dataworks:ListDataQualityScanRuns",
"dataworks:GetDataQualityScanRun",
"dataworks:GetDataQualityScanRunLog"
],
"Resource": "*"
}
]
}
```
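To check a saved policy document against the permission matrix, a plain text search is often enough. The sketch below uses a hypothetical helper, `missing_actions`, and assumes the policy lists each action explicitly. It does not understand wildcards such as `dataworks:*`, so a wildcard policy will be reported as missing everything.

```bash
# missing_actions POLICY_FILE
# Prints every action required by this skill that is not named in the file.
missing_actions() {
  policy_file="$1"
  for action in ListProjects ListDataQualityTemplates GetDataQualityTemplate \
      ListDataQualityScans GetDataQualityScan ListDataQualityAlertRules \
      GetDataQualityAlertRule ListDataQualityScanRuns GetDataQualityScanRun \
      GetDataQualityScanRunLog; do
    # Match the closing quote so GetDataQualityScanRun does not match ...RunLog.
    grep -q "\"dataworks:${action}\"" "$policy_file" || echo "dataworks:${action}"
  done
}
```

Empty output means every action in the minimum policy is present in the file.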
## Common Permission Errors
| Error Code | Description | Solution |
|------------|-------------|----------|
| `Forbidden.Access` | Insufficient RAM permissions | Add the corresponding `dataworks:<Action>` permission |
| `PermissionDenied` | No operation permission | Check if the RAM policy is attached to the correct identity |
| `InvalidAccessKeyId.NotFound` | Invalid AccessKey | Check AccessKey configuration via `aliyun configure list` |
| `SignatureDoesNotMatch` | Signature mismatch | Check AccessKeySecret |
## Local CLI Configuration Notes
- `aliyun configure set --auto-plugin-install true`, `aliyun configure ai-mode enable`, `aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality"`, `aliyun configure ai-mode disable`, and `aliyun configure list` are local CLI configuration commands.
- These commands do not map to DataWorks RAM permissions, but they are required prerequisites for safe skill execution.
## References
- [DataWorks RAM Permission Guide](https://help.aliyun.com/zh/dataworks/user-guide/dataworks-ram-permissions)
- [RAM Console](https://ram.console.aliyun.com/)
FILE:references/related-apis.md
# DataWorks Data Quality Related APIs
All APIs use: `aliyun dataworks-public <ApiName> --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality`
## Local CLI Prerequisite Commands
| Surface | Command | Purpose |
| --- | --- | --- |
| `aliyun` CLI | `aliyun version` | Verify CLI availability and version |
| `aliyun` CLI | `aliyun configure set --auto-plugin-install true` | Enable plugin auto-install |
| `aliyun` CLI | `aliyun configure ai-mode enable` | Enable AI-Mode before running aliyun CLI commands |
| `aliyun` CLI | `aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality"` | Configure the skill-specific AI-Mode user agent |
| `aliyun` CLI | `aliyun configure ai-mode disable` | Disable AI-Mode after task completion |
| `aliyun` CLI | `aliyun configure list` | Verify credential profile state safely |
---
## Module 1: Rule Template APIs
### ListDataQualityTemplates — List Rule Templates
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| ProjectId | long | No | Workspace ID. Optional in the API schema, but this skill always passes it; SKILL.md treats it as required and expects workspace custom templates only | 10000 |
| Name | string | No | Fuzzy match on template name | table_rows |
| Catalog | string | No | Template catalog path filter | /timeliness/ods_layer |
| PageNumber | integer | No | Page number, default 1 | 1 |
| PageSize | integer | No | Page size, default 10 | 10 |
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| PageInfo.TotalCount | integer | Total count |
| PageInfo.PageNumber | integer | Current page |
| PageInfo.PageSize | integer | Page size |
| PageInfo.DataQualityTemplates[] | array | Template list |
| - Id | string | Template ID (UUID) |
| - Spec | string | Template configuration JSON |
| - Owner | string | Owner user ID |
| - CreateUser / ModifyUser | string | Creator / last modifier |
| - CreateTime / ModifyTime | long | Timestamps (ms) |
**Example:**
```bash
aliyun dataworks-public ListDataQualityTemplates --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --ProjectId 10000 --PageNumber 1 --PageSize 10
```
---
### GetDataQualityTemplate — Get Rule Template Details
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| Id | string | Yes | Template ID (UUID) | a7ef0634-20ec-4a7c-a214-54020f91xxxx |
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| DataQualityTemplate.Id | string | Template ID |
| DataQualityTemplate.ProjectId | long | Workspace ID |
| DataQualityTemplate.Spec | string | Full template configuration JSON |
| DataQualityTemplate.Owner | string | Owner user ID |
| DataQualityTemplate.CreateUser / ModifyUser | string | Creator / last modifier |
| DataQualityTemplate.CreateTime / ModifyTime | long | Timestamps (ms) |
**Example:**
```bash
aliyun dataworks-public GetDataQualityTemplate --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --Id a7ef0634-20ec-4a7c-a214-54020f91xxxx
```
---
## Module 2: Data Quality Monitor (Scan) APIs
### ListDataQualityScans — List Data Quality Monitors
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| ProjectId | long | Yes | Workspace ID | 10000 |
| PageNumber | integer | Yes | Page number, default 1 | 1 |
| PageSize | integer | Yes | Page size, default 10 | 10 |
| Name | string | No | Fuzzy match on monitor name | test |
| Table | string | No | Fuzzy match on monitored table name | video_album |
| SortBy | string | No | Sort field + direction | ModifyTime Desc |
**SortBy options:** `ModifyTime Desc`, `ModifyTime Asc`, `CreateTime Desc`, `CreateTime Asc`, `Id Desc`, `Id Asc`
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| PageInfo.TotalCount | integer | Total count |
| PageInfo.PageNumber | integer | Current page |
| PageInfo.PageSize | integer | Page size |
| PageInfo.DataQualityScans[] | array | Monitor list |
| - Id | long | Monitor ID |
| - Name | string | Monitor name |
| - Description | string | Description |
| - Owner | string | Owner user ID |
| - CreateTime / ModifyTime | long | Timestamps (ms) |
**Example:**
```bash
aliyun dataworks-public ListDataQualityScans --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --ProjectId 10000 --PageNumber 1 --PageSize 10 --Table video_album --SortBy "ModifyTime Desc"
```
---
### GetDataQualityScan — Get Monitor Details
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| Id | long | Yes | Monitor ID | 10001 |
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| DataQualityScan.Id | long | Monitor ID |
| DataQualityScan.Name | string | Monitor name |
| DataQualityScan.Description | string | Description |
| DataQualityScan.ProjectId | long | Workspace ID |
| DataQualityScan.Spec | string | Full rule configuration JSON (table, field, metric, threshold) |
| DataQualityScan.Parameters | array | Execution parameter definitions |
| DataQualityScan.ComputeResource | object | Compute engine configuration |
| DataQualityScan.RuntimeResource | object | Resource group configuration |
| DataQualityScan.Trigger | object | Trigger configuration (ByManual or BySchedule) |
| DataQualityScan.Hooks | array | Post-execution hook configurations |
| DataQualityScan.Owner | string | Owner user ID |
| DataQualityScan.CreateUser / ModifyUser | string | Creator / last modifier |
| DataQualityScan.CreateTime / ModifyTime | long | Timestamps (ms) |
**Example:**
```bash
aliyun dataworks-public GetDataQualityScan --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --Id 10001
```
---
## Module 3: Alert Rule APIs
### ListDataQualityAlertRules — List Alert Rules
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| ProjectId | long | Yes | Workspace ID | 10001 |
| PageNumber | integer | Yes | Page number | 1 |
| PageSize | integer | Yes | Page size | 10 |
| DataQualityScanId | long | No | Filter by monitor ID | 10001 |
| SortBy | string | No | Sort field + direction | CreateTime Desc |
**SortBy options:** `CreateTime Desc`, `CreateTime Asc`, `Id Desc`, `Id Asc`
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| PageInfo.TotalCount | integer | Total count |
| PageInfo.PageNumber | integer | Current page |
| PageInfo.PageSize | integer | Page size |
| PageInfo.DataQualityAlertRules[] | array | Alert rule list |
| - Id | long | Alert rule ID |
| - ProjectId | long | Workspace ID |
| - Condition | string | Alert trigger condition expression |
| - Target.Type | string | Monitor object type (`DataQualityScan`) |
| - Target.Ids[] | array | Associated monitor IDs |
| - Notification.Channels[] | array | Alert channels |
| - Notification.Receivers[] | array | Alert recipients |
**Example:**
```bash
aliyun dataworks-public ListDataQualityAlertRules --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --ProjectId 10001 --PageNumber 1 --PageSize 10 --DataQualityScanId 10001
```
---
### GetDataQualityAlertRule — Get Alert Rule Details
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| Id | long | Yes | Alert rule ID | 113642 |
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| DataQualityAlertRule.Id | long | Alert rule ID |
| DataQualityAlertRule.ProjectId | long | Workspace ID |
| DataQualityAlertRule.Condition | string | Alert condition expression |
| DataQualityAlertRule.Target.Type | string | Monitor type (`DataQualityScan`) |
| DataQualityAlertRule.Target.Ids[] | array | Associated monitor ID list |
| DataQualityAlertRule.Notification.Channels[] | array | Channels: `Dingding`, `Mail`, `Weixin`, `Feishu`, `Phone`, `Sms`, `Webhook` |
| DataQualityAlertRule.Notification.Receivers[] | array | Recipients: type + values |
**Receiver types:** `ShiftSchedule`, `WebhookUrl`, `FeishuUrl`, `TaskOwner`, `WeixinUrl`, `DingdingUrl`, `DataQualityScanOwner`, `AliUid`
**Example:**
```bash
aliyun dataworks-public GetDataQualityAlertRule --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --Id 113642
```
---
## Module 4: Scan Run APIs
### ListDataQualityScanRuns — List Scan Run Records
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| ProjectId | long | Yes | Workspace ID | 12345 |
| DataQualityScanId | long | No | Filter by monitor ID | 10001 |
| Status | string | No | Status filter: `Pass` / `Running` / `Error` / `Fail` / `Warn` | Fail |
| CreateTimeFrom | long | No | Earliest start time (ms timestamp) | 1710239005000 |
| CreateTimeTo | long | No | Latest start time (ms timestamp) | 1710239605000 |
| SortBy | string | No | Sort field + direction | CreateTime Desc |
| PageNumber | integer | No | Page number, default 1 | 1 |
| PageSize | integer | No | Page size, default 10 | 20 |
| Filter | object | No | Extended filters: `TaskInstanceId`, `RunNumber` | `{"TaskInstanceId":"111"}` |
**SortBy options:** `CreateTime Desc`, `CreateTime Asc`, `Id Desc`, `Id Asc`
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| PageInfo.TotalCount | integer | Total count |
| PageInfo.PageNumber | integer | Current page |
| PageInfo.PageSize | integer | Page size |
| PageInfo.DataQualityScanRuns[] | array | Run record list |
| - Id | long | Scan run ID |
| - Status | string | Execution status |
| - CreateTime | long | Run start time (ms) |
| - FinishTime | long | Run end time (ms) |
| - Parameters[] | array | Runtime parameters (name + value) |
**Example:**
```bash
aliyun dataworks-public ListDataQualityScanRuns --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --ProjectId 12345 --Status Fail --SortBy "CreateTime Desc" --PageNumber 1 --PageSize 20
```
---
### GetDataQualityScanRun — Get Scan Run Details
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| Id | long | Yes | Scan run ID | 1006059507 |
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| DataQualityScanRun.Id | long | Scan run ID |
| DataQualityScanRun.Status | string | Overall status: `Pass` / `Running` / `Error` / `Fail` / `Warn` |
| DataQualityScanRun.CreateTime | long | Start time (ms) |
| DataQualityScanRun.FinishTime | long | End time (ms) |
| DataQualityScanRun.Scan | object | Configuration snapshot (Spec, Trigger, ComputeResource, RuntimeResource, Hooks) |
| DataQualityScanRun.Parameters[] | array | Runtime parameters (name + value) |
| DataQualityScanRun.Results[] | array | Per-rule execution results (status, metric value, threshold) |
**Example:**
```bash
aliyun dataworks-public GetDataQualityScanRun --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --Id 1006059507
```
---
### GetDataQualityScanRunLog — Get Scan Run Log
Supports pagination for large logs (max 512 KB per call). Call repeatedly with `NextOffset` until it returns `-1`.
**Request Parameters:**
| Name | Type | Required | Description | Example |
|------|------|----------|-------------|---------|
| Id | long | Yes | Scan run ID | 10001 |
| Offset | long | No | Byte offset from start of log file (default 0) | 512000 |
**Response Parameters:**
| Name | Type | Description |
|------|------|-------------|
| RequestId | string | Request ID |
| LogSegment.Log | string | Log text content (up to 512 KB) |
| LogSegment.NextOffset | long | Offset for next call; `-1` means end of log |
**Example — first segment:**
```bash
aliyun dataworks-public GetDataQualityScanRunLog --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --Id 10001
```
**Example — subsequent segment:**
```bash
aliyun dataworks-public GetDataQualityScanRunLog --user-agent AlibabaCloud-Agent-Skills/alibabacloud-dataworks-data-quality --region cn-hangzhou --endpoint dataworks.cn-hangzhou.aliyuncs.com --Id 10001 --Offset 512000
```
---
## Region and Endpoints
When specifying `--region`, you **must** also add `--endpoint`.
| Scenario | Parameters |
|----------|-----------|
| Public network | `--region <REGION_ID> --endpoint dataworks.<REGION_ID>.aliyuncs.com` |
| VPC internal network | `--region <REGION_ID> --endpoint dataworks-vpc.<REGION_ID>.aliyuncs.com` |
### Common Regions
| Region Name | Region ID |
|-------------|-----------|
| China (Hangzhou) | `cn-hangzhou` |
| China (Shanghai) | `cn-shanghai` |
| China (Beijing) | `cn-beijing` |
| China (Shenzhen) | `cn-shenzhen` |
| China (Chengdu) | `cn-chengdu` |
| China (Hong Kong) | `cn-hongkong` |
| Singapore | `ap-southeast-1` |
| Indonesia (Jakarta) | `ap-southeast-5` |
| Japan (Tokyo) | `ap-northeast-1` |
| US (Virginia) | `us-east-1` |
| US (Silicon Valley) | `us-west-1` |
| Germany (Frankfurt) | `eu-central-1` |
> Endpoint naming rule: `dataworks.<REGION_ID>.aliyuncs.com` (public) / `dataworks-vpc.<REGION_ID>.aliyuncs.com` (VPC)
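
Because `--region` and `--endpoint` must always travel together, deriving the endpoint from the region avoids mismatched pairs. This is a minimal sketch of the naming rule above; `dw_endpoint` is a hypothetical helper name.

```bash
# dw_endpoint REGION_ID [public|vpc]
# Prints the DataWorks endpoint for the region (public network by default).
dw_endpoint() {
  region="$1"; network="${2:-public}"
  if [ -z "$region" ]; then
    echo "region is required" >&2
    return 1
  fi
  case "$network" in
    public) echo "dataworks.${region}.aliyuncs.com" ;;
    vpc)    echo "dataworks-vpc.${region}.aliyuncs.com" ;;
    *)      echo "unknown network: $network" >&2; return 1 ;;
  esac
}

# Usage sketch (not executed here):
#   region=cn-shanghai
#   aliyun dataworks-public ... --region "$region" --endpoint "$(dw_endpoint "$region")"
```

The helper only formats the string. It does not verify that DataWorks is actually available in the given region.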
---
## Official Documentation Links
- [ListDataQualityTemplates](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-listdataqualitytemplates)
- [GetDataQualityTemplate](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-getdataqualitytemplate)
- [ListDataQualityScans](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-listdataqualityscans)
- [GetDataQualityScan](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-getdataqualityscan)
- [ListDataQualityAlertRules](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-listdataqualityalertrules)
- [GetDataQualityAlertRule](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-getdataqualityalertrule)
- [ListDataQualityScanRuns](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-listdataqualityscanruns)
- [GetDataQualityScanRun](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-getdataqualityscanrun)
- [GetDataQualityScanRunLog](https://help.aliyun.com/zh/dataworks/developer-reference/api-dataworks-public-2024-05-18-getdataqualityscanrunlog)