@clawhub-aipoch-ai-772015cadb
Map patient symptoms to Human Phenotype Ontology terms for gene diagnosis.
---
name: rare-disease-hpo-mapper
description: Map patient symptoms to Human Phenotype Ontology terms for gene diagnosis.
license: MIT
skill-author: AIPOCH
---
# Rare Disease HPO Mapper
Clinical phenotype standardization tool.
## When to Use
- Use this skill when the task is to map patient symptoms to Human Phenotype Ontology (HPO) terms for gene diagnosis.
- Use this skill for evidence insight tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: mapping patient symptoms to Human Phenotype Ontology terms for gene diagnosis.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `difflib`: `unspecified`. Declared in `requirements.txt`, although it is part of the Python standard library and needs no separate install.
## Example Usage
```bash
cd "20260318/scientific-skills/Evidence Insight/rare-disease-hpo-mapper"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Use Cases
- Exome/genome analysis
- Rare disease diagnosis
- Genetic counseling
- Research cohort building
## Parameters
- `symptoms`: Clinical description; `scripts/main.py` takes it as a comma-separated list via `--symptoms` (`-s`)
- `age_onset`: Pediatric/adult
- `inheritance_pattern`: AD/AR/XL
## Returns
- HPO term suggestions
- Confidence scores
- Differential diagnosis genes
- Literature links
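The term suggestions and confidence scores listed here could be produced along the following lines. This is a minimal sketch, not the packaged implementation: the `TERMS` table, the `lookup` function, and the choice to report the `difflib` similarity ratio as the confidence score are assumptions made for illustration, and the label lists are hypothetical stand-ins for a full HPO release.

```python
from difflib import SequenceMatcher

# Illustrative table only; a real run would load labels from an HPO release.
TERMS = {
    "HP:0000316": ["hypertelorism", "wide-set eyes"],
    "HP:0001250": ["seizure", "seizures"],
}


def lookup(symptom: str, cutoff: float = 0.6):
    """Return (hpo_id, score) for the best-matching label, or (None, 0.0)."""
    query = symptom.strip().lower()
    best_id, best_score = None, 0.0
    for hpo_id, labels in TERMS.items():
        for label in labels:
            # ratio() is 1.0 for identical strings and falls toward 0.0
            score = SequenceMatcher(None, query, label).ratio()
            if score > best_score:
                best_id, best_score = hpo_id, score
    if best_score < cutoff:
        return None, 0.0
    return best_id, round(best_score, 2)


print(lookup("Wide-set eyes"))  # exact label match
print(lookup("wide set eyes"))  # near match, lower score
print(lookup("headache"))       # nothing above the cutoff
```

Reporting a numeric score rather than a fixed label lets a reviewer tell an exact synonym hit from a fuzzy one.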
## Example
"Wide-set eyes" → HP:0000316 (Hypertelorism)
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
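The two path-related checklist items can be checked in code. The following is a minimal sketch under stated assumptions: `resolve_inside` and the `workspace` directory name are hypothetical and are not part of the packaged script.

```python
from pathlib import Path


def resolve_inside(workspace: str, user_path: str) -> Path:
    """Resolve user_path under workspace, rejecting anything that escapes it."""
    root = Path(workspace).resolve()
    candidate = (root / user_path).resolve()
    # Path.is_relative_to is available from Python 3.9.
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate


print(resolve_inside("workspace", "out/result.csv").name)
try:
    resolve_inside("workspace", "../secret.txt")
except ValueError as exc:
    print("rejected:", exc)
```

Resolving before comparing means `..` segments and absolute paths are both caught by the same check.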
## Prerequisites
```text
# Python dependencies
pip install -r requirements.txt
```
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
  - Performance optimization
  - Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
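One way to report the failure point without exposing a full stack trace is to wrap the script call. This is a sketch only; `run_step` and the shape of the returned dictionary are assumptions, not part of the packaged skill.

```python
import subprocess
import sys


def run_step(cmd: list[str]) -> dict:
    """Run one command and summarize the outcome without raising on failure."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"ok": False, "failure_point": " ".join(cmd), "detail": str(exc)}
    lines = proc.stderr.strip().splitlines()
    ok = proc.returncode == 0
    return {
        "ok": ok,
        "failure_point": None if ok else " ".join(cmd),
        # Keep only the last stderr line, which usually carries the message.
        "detail": lines[-1] if lines else "",
    }


report = run_step([sys.executable, "scripts/main.py", "--help"])
print(report["ok"], report["detail"])
```

If `ok` is false, the `failure_point` and `detail` fields give the material for the failure report, and the manual fallback can proceed from there.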
## Input Validation
This skill accepts requests that match the documented purpose of `rare-disease-hpo-mapper` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `rare-disease-hpo-mapper` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## References
- [references/audit-reference.md](references/audit-reference.md) - Supported scope, audit commands, and fallback boundaries
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/audit-reference.md
# Audit Reference
## Scope
- Skill: `rare-disease-hpo-mapper`
- Core purpose: Map patient symptoms to Human Phenotype Ontology terms for gene diagnosis.
- Use only within the documented workflow and category boundary defined in `SKILL.md`
## Supported Audit Paths
- `python -m py_compile scripts/main.py`
- `python scripts/main.py --help`
## Fallback Boundary
If required inputs are incomplete, the skill should still return:
- the missing required inputs
- the steps that can still be completed safely
- assumptions that need confirmation before execution
- the next checks before accepting the final deliverable
FILE:requirements.txt
difflib
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Rare Disease HPO Mapper
Map patient symptoms to Human Phenotype Ontology terms.
"""
import argparse
from difflib import get_close_matches
class HPOMapper:
    """Map symptoms to HPO terms."""
    HPO_TERMS = {
        "HP:0001250": {"name": "Seizure", "synonyms": ["seizures", "epilepsy", "fits"]},
        "HP:0001263": {"name": "Global developmental delay", "synonyms": ["developmental delay", "DD"]},
        "HP:0004322": {"name": "Short stature", "synonyms": ["short", "small stature"]},
        "HP:0001638": {"name": "Cardiomyopathy", "synonyms": ["heart muscle disease"]},
        "HP:0000518": {"name": "Cataract", "synonyms": ["lens opacity"]},
        "HP:0001251": {"name": "Ataxia", "synonyms": ["coordination problems", "unsteady gait"]}
    }
    def find_hpo_term(self, symptom):
        """Find HPO term for symptom."""
        symptom_lower = symptom.lower()
        # Direct match on the term name or any synonym (case-insensitive)
        for hpo_id, data in self.HPO_TERMS.items():
            if symptom_lower == data["name"].lower():
                return hpo_id, data
            for synonym in data["synonyms"]:
                if symptom_lower == synonym.lower():
                    return hpo_id, data
        # Fuzzy match. get_close_matches is case-sensitive, so compare
        # against lowercased names and synonyms.
        name_to_id = {}
        for hpo_id, data in self.HPO_TERMS.items():
            for name in [data["name"], *data["synonyms"]]:
                name_to_id.setdefault(name.lower(), hpo_id)
        matches = get_close_matches(symptom_lower, list(name_to_id), n=1, cutoff=0.6)
        if matches:
            hpo_id = name_to_id[matches[0]]
            return hpo_id, self.HPO_TERMS[hpo_id]
        return None, None
    def map_symptoms(self, symptoms):
        """Map list of symptoms to HPO terms."""
        mappings = []
        for symptom in symptoms:
            hpo_id, data = self.find_hpo_term(symptom)
            if hpo_id:
                mappings.append({
                    "symptom": symptom,
                    "hpo_id": hpo_id,
                    "hpo_name": data["name"],
                    "confidence": "high"
                })
            else:
                mappings.append({
                    "symptom": symptom,
                    "hpo_id": None,
                    "hpo_name": "Unknown",
                    "confidence": "none"
                })
        return mappings
def main():
    parser = argparse.ArgumentParser(description="Rare Disease HPO Mapper")
    parser.add_argument("--symptoms", "-s", required=True, help="Comma-separated symptoms")
    args = parser.parse_args()
    mapper = HPOMapper()
    symptoms = [s.strip() for s in args.symptoms.split(",")]
    mappings = mapper.map_symptoms(symptoms)
    print(f"\n{'='*60}")
    print("HPO MAPPING RESULTS")
    print(f"{'='*60}\n")
    for m in mappings:
        if m["hpo_id"]:
            print(f"✓ {m['symptom']} → {m['hpo_id']} ({m['hpo_name']})")
        else:
            print(f"✗ {m['symptom']} → No match found")
    print(f"\n{'='*60}\n")
if __name__ == "__main__":
    main()
Generate block randomization lists for RCTs
---
name: randomization-gen
description: Generate block randomization lists for RCTs
version: 1.0.0
category: Pharma
tags: []
author: AIPOCH
license: MIT
status: Draft
risk_level: Medium
skill_type: Tool/Script
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
---
# Randomization Gen
RCT randomization table generator.
## Use Cases
- Clinical trial design
- Animal study randomization
- Blocked randomization
- Stratified allocation
## Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--subjects`, `-n` | int | No | Total sample size (default: 100) |
| `--groups`, `-g` | string | No | Comma-separated arm/group names (default: Control,Treatment) |
| `--block-size`, `-b` | int | No | Block size; must be a multiple of the number of groups (default: 4) |
| `--output`, `-o` | string | No | Output file path (default: randomization.csv) |
## Returns
- Randomization sequence
- Block assignments
- Allocation concealment ready
## Example
Input: n=120, 3 groups, block=6
Output: Sealed randomization list
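For this input the arithmetic is 120 / 6 = 20 complete blocks, with 6 / 3 = 2 slots per arm in each block, so each arm receives 40 subjects. The sketch below works through that case. It is illustrative rather than the packaged script: `block_randomize`, the `seed` argument, and the arm labels are assumptions, and the seed is there only to make the sketch repeatable.

```python
import random
from collections import Counter


def block_randomize(n_subjects, groups, block_size, seed=None):
    """Return a list of group labels, balanced within each complete block."""
    if block_size <= 0 or block_size % len(groups) != 0:
        raise ValueError("block_size must be a positive multiple of the number of groups")
    rng = random.Random(seed)
    per_group = block_size // len(groups)
    allocations = []
    while len(allocations) < n_subjects:
        # One balanced block: every arm appears per_group times, then shuffle.
        block = [g for g in groups for _ in range(per_group)]
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_subjects]


arms = ["A", "B", "C"]  # hypothetical arm labels
schedule = block_randomize(120, arms, 6, seed=1)
print(len(schedule) // 6, "blocks")
print(Counter(schedule))
```

Because every block is balanced, the arm counts never differ by more than two at any point during enrollment with these settings.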
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
  - Performance optimization
  - Additional feature support
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Randomization Generator
Generate block randomization lists for RCTs.
"""
import argparse
import random
import csv
class RandomizationGenerator:
    """Generate randomization schedules."""
    def block_randomization(self, n_subjects, groups, block_size):
        """Generate block randomization."""
        # A non-positive block size would never fill the schedule
        if block_size <= 0 or block_size % len(groups) != 0:
            raise ValueError("Block size must be a positive multiple of the number of groups")
        schedule = []
        subject_id = 1
        while subject_id <= n_subjects:
            # Build one balanced block, then shuffle it
            block = []
            for group in groups:
                block.extend([group] * (block_size // len(groups)))
            random.shuffle(block)
            for assignment in block:
                if subject_id <= n_subjects:
                    schedule.append({
                        "subject_id": subject_id,
                        "group": assignment,
                        "block": (subject_id - 1) // block_size + 1
                    })
                    subject_id += 1
        return schedule
    def stratified_randomization(self, n_subjects, groups, strata, block_size):
        """Generate stratified randomization."""
        all_schedules = []
        # Integer division: any remainder subjects are not allocated
        subjects_per_stratum = n_subjects // len(strata)
        for stratum in strata:
            stratum_schedule = self.block_randomization(subjects_per_stratum, groups, block_size)
            for entry in stratum_schedule:
                entry["stratum"] = stratum
            all_schedules.extend(stratum_schedule)
        return all_schedules
    def export_csv(self, schedule, filename):
        """Export schedule to CSV."""
        if not schedule:
            raise ValueError("Schedule is empty; nothing to export")
        with open(filename, 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=schedule[0].keys())
            writer.writeheader()
            writer.writerows(schedule)
def main():
    parser = argparse.ArgumentParser(description="Randomization Generator")
    parser.add_argument("--subjects", "-n", type=int, default=100, help="Number of subjects")
    parser.add_argument("--groups", "-g", default="Control,Treatment", help="Groups (comma-separated)")
    parser.add_argument("--block-size", "-b", type=int, default=4, help="Block size")
    parser.add_argument("--output", "-o", default="randomization.csv", help="Output file")
    args = parser.parse_args()
    generator = RandomizationGenerator()
    groups = [g.strip() for g in args.groups.split(",")]
    schedule = generator.block_randomization(args.subjects, groups, args.block_size)
    print(f"Generated randomization for {args.subjects} subjects")
    print(f"Groups: {groups}")
    print(f"Block size: {args.block_size}")
    # Show first 10
    print("\nFirst 10 allocations:")
    for entry in schedule[:10]:
        print(f" Subject {entry['subject_id']}: {entry['group']}")
    generator.export_csv(schedule, args.output)
    print(f"\nFull schedule saved to: {args.output}")
if __name__ == "__main__":
    main()
Use when creating radiology educational quizzes, preparing board exam questions, or studying medical imaging cases. Generates interactive quizzes with X-ray,...
---
name: radiology-image-quiz
description: Use when creating radiology educational quizzes, preparing board exam questions, or studying medical imaging cases. Generates interactive quizzes with X-ray, CT, MRI, and ultrasound images for medical education.
license: MIT
skill-author: AIPOCH
---
# Radiology Image Quiz Generator
Create educational quizzes using radiology images (X-ray, CT, MRI, ultrasound) for medical students, residents, and board exam preparation.
## When to Use
- Use this skill when creating radiology educational quizzes, preparing board exam questions, or studying medical imaging cases. It generates interactive quizzes with X-ray, CT, MRI, and ultrasound images for medical education.
- Use this skill for academic writing tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: creating radiology educational quizzes, preparing board exam questions, and studying medical imaging cases with X-ray, CT, MRI, and ultrasound images.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.
## Example Usage
```bash
cd "20260318/scientific-skills/Academic Writing/radiology-image-quiz"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Quick Start
```python
from scripts.radiology_quiz import RadiologyQuiz
quiz = RadiologyQuiz()
# Generate quiz
questions = quiz.generate(
    modality="chest_xray",
    difficulty="intermediate",
    topic="pulmonary_pathology",
    num_questions=10
)
```
## Core Capabilities
### 1. Quiz Generation
```python
quiz = quiz.create(
    images=["case1.png", "case2.png"],
    question_type="multiple_choice",
    include_findings=True,
    include_differential=True
)
```
**Question Types:**
- Multiple choice (single best answer)
- Select all that apply
- Fill in the blank
- Open-ended interpretation
### 2. Case Creation
```python
case = quiz.create_case(
    image_path="ct_scan.png",
    diagnosis="Pulmonary embolism",
    findings=["Filling defect in pulmonary artery", "Right heart strain"],
    clinical_history="Sudden onset dyspnea"
)
```
### 3. Difficulty Calibration
```python
quiz = quiz.set_difficulty(
    level="resident",  # medical_student, resident, fellow, attending
    include_rare_findings=False
)
```
## CLI Usage
```text
python scripts/radiology_quiz.py \
  --modality ct \
  --topic emergency \
  --num 20 \
  --output quiz.pdf
```
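The `scripts/main.py` listed later in this package reads its cases from a JSON file passed with `--cases`. The following is a minimal sketch of preparing such a file, assuming the five keys that script reads from each case. The file name `cases.json`, the temporary directory, and the validation step are illustrative assumptions; the case content is copied from the script's own demo case.

```python
import json
import tempfile
from pathlib import Path

# Keys read by the packaged generate_quiz method.
REQUIRED = {"history", "findings", "options", "answer", "explanation"}

cases = [
    {
        "history": "65-year-old male with chest pain",
        "findings": "CT shows peripheral wedge-shaped opacity",
        "options": ["Pulmonary embolism", "Pneumonia", "Lung cancer", "Atelectasis"],
        "answer": "Pulmonary embolism",
        "explanation": "Wedge-shaped peripheral opacity is classic for pulmonary infarction",
    }
]

for number, case in enumerate(cases, 1):
    missing = REQUIRED - case.keys()
    if missing:
        raise ValueError(f"case {number} is missing keys: {sorted(missing)}")
    if case["answer"] not in case["options"]:
        raise ValueError(f"case {number}: answer is not one of the options")

# Hypothetical output location; any writable path works.
out_path = Path(tempfile.gettempdir()) / "cases.json"
out_path.write_text(json.dumps(cases, indent=2), encoding="utf-8")
print(out_path)
```

Checking the keys before writing avoids a `KeyError` partway through quiz generation.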
---
**Skill ID**: 212 | **Version**: 1.0 | **License**: MIT
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `radiology-image-quiz` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `radiology-image-quiz` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## References
- [references/audit-reference.md](references/audit-reference.md) - Supported scope, audit commands, and fallback boundaries
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/audit-reference.md
# Audit Reference
## Scope
- Skill: `radiology-image-quiz`
- Core purpose: Use when creating radiology educational quizzes, preparing board exam questions, or studying medical imaging cases. Generates interactive quizzes with X-ray, CT, MRI, and ultrasound images for medical education.
- Use only within the documented workflow and category boundary defined in `SKILL.md`
## Supported Audit Paths
- `python -m py_compile scripts/main.py`
- `python scripts/main.py --help`
## Fallback Boundary
If required inputs are incomplete, the skill should still return:
- the missing required inputs
- the steps that can still be completed safely
- assumptions that need confirmation before execution
- the next checks before accepting the final deliverable
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Radiology Image Quiz
Generate image-based diagnostic quizzes from text descriptions.
"""
import argparse
import json
class RadiologyQuiz:
    """Generate radiology quizzes."""
    def generate_quiz(self, cases):
        """Generate quiz from cases."""
        quiz = []
        for i, case in enumerate(cases, 1):
            quiz.append(f"CASE {i}")
            quiz.append("-"*60)
            quiz.append(f"History: {case['history']}")
            quiz.append(f"Findings: {case['findings']}")
            quiz.append("")
            quiz.append("What is the most likely diagnosis?")
            for j, option in enumerate(case['options'], 1):
                quiz.append(f" {j}. {option}")
            quiz.append("")
            quiz.append(f"Answer: {case['answer']}")
            quiz.append(f"Explanation: {case['explanation']}")
            quiz.append("")
        return "\n".join(quiz)
def main():
    parser = argparse.ArgumentParser(description="Radiology Image Quiz")
    parser.add_argument("--cases", "-c", help="JSON file with cases")
    parser.add_argument("--demo", action="store_true", help="Generate demo quiz")
    args = parser.parse_args()
    quiz_gen = RadiologyQuiz()
    if args.demo:
        cases = [
            {
                "history": "65-year-old male with chest pain",
                "findings": "CT shows peripheral wedge-shaped opacity",
                "options": ["Pulmonary embolism", "Pneumonia", "Lung cancer", "Atelectasis"],
                "answer": "Pulmonary embolism",
                "explanation": "Wedge-shaped peripheral opacity is classic for pulmonary infarction"
            }
        ]
    elif args.cases:
        with open(args.cases) as f:
            cases = json.load(f)
    else:
        print("Use --demo or provide --cases file")
        return
    quiz = quiz_gen.generate_quiz(cases)
    print(quiz)
if __name__ == "__main__":
    main()
Predict challenging questions for presentations and prepare responses
---
name: q-and-a-prep-partner
description: Predict challenging questions for presentations and prepare responses
version: 1.0.0
category: Present
tags: []
author: AIPOCH
license: MIT
status: Draft
risk_level: Medium
skill_type: Tool/Script
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
---
# Q&A Prep Partner
Predict challenging questions for presentations and prepare structured responses.
## Usage
```bash
python scripts/main.py --abstract abstract.txt --field oncology
python scripts/main.py --topic "CRISPR therapy" --audience experts
```
## Parameters
- `--abstract`: Abstract text or file
- `--topic`: Research topic
- `--field`: Research field
- `--audience`: Audience type (general/experts/peers)
- `--n-questions`: Number of questions to generate (default: 10)
## Question Types
1. Methodology questions
2. Statistical questions
3. Interpretation questions
4. Limitation questions
5. Future work questions
6. Comparison questions
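Each question type can be backed by templates with named placeholders, as the packaged script further down does. A sketch of filling such a template without failing when a placeholder has no value follows; `_KeepMissing` and `fill` are hypothetical helpers and the sample values are made up for illustration.

```python
class _KeepMissing(dict):
    """dict that leaves unknown placeholders in place instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


def fill(template: str, **values: str) -> str:
    """Substitute the placeholders that have values and keep the rest visible."""
    return template.format_map(_KeepMissing(values))


print(fill("Why did you choose {method} over alternatives?", method="propensity matching"))
print(fill("How does this advance beyond {prior_art}?"))
```

Leaving an unfilled placeholder visible makes a missing value easy to spot in the generated guide, whereas plain `str.format` would raise a `KeyError`.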
## Output
- Predicted questions
- Suggested response frameworks
- Key points to address
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
  - Performance optimization
  - Additional feature support
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Q&A Prep Partner
Predict presentation questions and prepare responses.
"""
import argparse
import random
class QAPrepPartner:
    """Prepare for presentation Q&A."""
    QUESTION_TEMPLATES = {
        "methodology": [
            "Can you elaborate on your {method} approach?",
            "Why did you choose {method} over alternatives?",
            "How did you validate your {method}?",
            "What are the limitations of using {method}?"
        ],
        "statistics": [
            "Was your sample size adequately powered?",
            "Did you correct for multiple comparisons?",
            "Can you explain the statistical significance of your findings?",
            "What was your effect size?"
        ],
        "interpretation": [
            "How do you interpret these findings in context of {field}?",
            "Are your results generalizable to {population}?",
            "What is the clinical significance of these results?",
            "How do you explain the {observation}?"
        ],
        "limitations": [
            "What are the main limitations of this study?",
            "How might {limitation} affect your conclusions?",
            "What biases might be present in your data?",
            "What couldn't you measure or control?"
        ],
        "future": [
            "What are the next steps for this research?",
            "How do you plan to follow up on these findings?",
            "What would you do differently if you started over?",
            "What unanswered questions remain?"
        ],
        "comparison": [
            "How do your results compare to {previous_work}?",
            "Why do your findings differ from {other_study}?",
            "How does this advance beyond {prior_art}?"
        ]
    }
    RESPONSE_FRAMEWORKS = {
        "methodology": "1. Acknowledge the question\n2. Explain rationale\n3. Describe validation\n4. Note limitations",
        "statistics": "1. Confirm the analysis\n2. State the numbers\n3. Explain significance\n4. Note assumptions",
        "interpretation": "1. Restate key finding\n2. Provide context\n3. Address nuance\n4. Acknowledge uncertainty",
        "limitations": "1. Acknowledge limitation\n2. Explain impact\n3. Describe mitigation\n4. Suggest future improvement",
        "future": "1. Summarize current work\n2. Propose next steps\n3. Describe timeline\n4. State expected impact",
        "comparison": "1. Acknowledge prior work\n2. Highlight key differences\n3. Explain your contribution\n4. Discuss implications"
    }
    def generate_questions(self, topic, field, audience, n=10):
        """Generate predicted questions."""
        questions = []
        # Select categories based on audience
        if audience == "experts":
            categories = ["methodology", "statistics", "comparison", "limitations"]
        elif audience == "peers":
            categories = ["methodology", "interpretation", "future", "limitations"]
        else:  # general
            categories = ["interpretation", "future", "limitations"]
        for i in range(n):
            category = random.choice(categories)
            template = random.choice(self.QUESTION_TEMPLATES[category])
            # Every placeholder used in QUESTION_TEMPLATES needs a value here,
            # otherwise str.format raises KeyError.
            question = template.format(
                method="proposed method",
                field=field,
                population="broader populations",
                observation="unexpected finding",
                limitation="sample size",
                previous_work="Smith et al. 2023",
                other_study="previous research",
                prior_art="existing approaches"
            )
            questions.append({
                "number": i + 1,
                "category": category,
                "question": question,
                "framework": self.RESPONSE_FRAMEWORKS[category]
            })
        return questions
    def print_prep_guide(self, questions):
        """Print preparation guide."""
        print("\n" + "="*70)
        print("Q&A PREPARATION GUIDE")
        print("="*70)
        for q in questions:
            print(f"\n{q['number']}. [{q['category'].upper()}]")
            print(f"Q: {q['question']}")
            print("\nResponse Framework:")
            for line in q['framework'].split('\n'):
                print(f" {line}")
            print("-"*70)
def main():
    parser = argparse.ArgumentParser(description="Q&A Prep Partner")
    parser.add_argument("--abstract", "-a", help="Abstract text or file")
    parser.add_argument("--topic", "-t", help="Research topic")
    parser.add_argument("--field", "-f", default="general",
                        help="Research field")
    parser.add_argument("--audience", choices=["general", "peers", "experts"],
                        default="peers", help="Audience type")
    parser.add_argument("--n-questions", "-n", type=int, default=10,
                        help="Number of questions")
    args = parser.parse_args()
    partner = QAPrepPartner()
    topic = args.topic or "your research"
    if args.abstract:
        # Treat the value as a file path first; fall back to literal text
        try:
            with open(args.abstract) as f:
                topic = f.read()[:100] + "..."
        except OSError:
            topic = args.abstract[:100]
    questions = partner.generate_questions(
        topic, args.field, args.audience, args.n_questions
    )
    partner.print_prep_guide(questions)
    print(f"\n✓ Generated {len(questions)} potential questions")
    print(f"✓ Audience: {args.audience}")
    print(f"✓ Field: {args.field}")
    print("\nTip: Practice your responses out loud!")
if __name__ == "__main__":
    main()
Build complex Boolean query strings for precise PubMed/MEDLINE literature retrieval. Trigger when user needs MeSH term mapping, Boolean query construction, a...
---
name: pubmed-search-specialist
description: Build complex Boolean query strings for precise PubMed/MEDLINE literature retrieval. Trigger when user needs MeSH term mapping, Boolean query construction, advanced PubMed filters, citation searching, systematic review search strategy, or clinical query optimization.
license: MIT
skill-author: AIPOCH
---
# PubMed Search Specialist
Expert tool for constructing sophisticated Boolean queries to search the PubMed/MEDLINE database with precision.
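A Boolean query of this kind usually ORs together the synonyms for one concept and ANDs the concept blocks. The sketch below shows that composition using the PubMed field tags `[MeSH Terms]` and `[tiab]`. The helper names `tagged`, `any_of`, and `all_of` and the example concepts are assumptions for illustration, not the packaged script's API.

```python
def tagged(term: str, field: str) -> str:
    """Quote a term and attach a PubMed search field tag."""
    return f'"{term}"[{field}]'


def any_of(*clauses: str) -> str:
    """OR together alternative expressions of one concept."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*blocks: str) -> str:
    """AND together the concept blocks."""
    return " AND ".join(blocks)


condition = any_of(
    tagged("Diabetes Mellitus, Type 2", "MeSH Terms"),
    tagged("type 2 diabetes", "tiab"),
)
drug = any_of(
    tagged("Metformin", "MeSH Terms"),
    tagged("metformin", "tiab"),
)
query = all_of(condition, drug)
print(query)
```

Pairing a MeSH heading with a title/abstract phrase in each block helps catch both indexed records and recent records that have not yet been indexed.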
## When to Use
- Use this skill when the task needs complex Boolean query strings for precise PubMed/MEDLINE literature retrieval: MeSH term mapping, Boolean query construction, advanced PubMed filters, citation searching, systematic review search strategy, or clinical query optimization.
- Use this skill for evidence insight tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to building complex Boolean query strings for precise PubMed/MEDLINE literature retrieval, including MeSH term mapping, advanced filters, citation searching, systematic review search strategy, and clinical query optimization.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
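- `dataclasses`: `unspecified`. Declared in `requirements.txt`, although the module has been part of the standard library since Python 3.7, so no separate install is needed on `3.10+`.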
- `requests`: `unspecified`. Declared in `requirements.txt`.
## Example Usage
```bash
cd "20260318/scientific-skills/Evidence Insight/pubmed-search-specialist"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Core Capabilities
- **MeSH Term Mapping**: Convert natural language concepts to standardized Medical Subject Headings
- **Boolean Query Builder**: Construct complex nested queries with AND/OR/NOT operators
- **Advanced Filters**: Apply study type, date, language, age, and species filters
- **Search Strategy Optimization**: Refine sensitivity vs specificity trade-offs
## Usage Workflow
### 1. Concept Extraction
Extract key concepts from the user's research question using the PICO framework:
- **P**opulation/Problem
- **I**ntervention
- **C**omparison
- **O**utcome
### 2. MeSH Term Mapping
For each concept, identify appropriate MeSH terms:
- Preferred terms (mapped to MeSH hierarchy)
- Entry terms (synonyms mapped to preferred)
- Subheadings for precision
- Explode vs Focus options
### 3. Boolean Construction
Build query strings following PubMed syntax:
```
("Term"[MeSH Terms] OR "Term"[Title/Abstract] OR synonym[Title/Abstract])
```
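The pattern above can be sketched in Python. This is a minimal illustration, not the packaged implementation; the helper name `concept_block` is hypothetical.

```python
def concept_block(mesh_term, synonyms):
    """OR together one MeSH heading and its free-text synonyms."""
    parts = [f'"{mesh_term}"[MeSH Terms]']
    parts += [f'"{s}"[Title/Abstract]' for s in synonyms]
    return "(" + " OR ".join(parts) + ")"


population = concept_block("Diabetes Mellitus", ["Diabetic", "Diabetics"])
print(population)
```

Each concept gets its own parenthesized block, so that blocks can later be joined with AND without changing their internal OR logic.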
### 4. Filter Application
Append filters as needed:
- Publication dates: `2020:2024[Publication Date]`
- Article types: `clinical trial[Publication Type]`, `review[Publication Type]`, `meta-analysis[Publication Type]`
- Species: `humans[MeSH Terms]` or `animals[MeSH Terms]`
- Languages: `english[Language]`
- Age groups: `adult[MeSH Terms]`, `aged[MeSH Terms]`
### 5. Search Strategy Output
Provide complete, copy-paste ready PubMed search string with:
- Line-by-line breakdown
- Estimated result count guidance
- Alternative strategies for sensitivity/specificity balance
## Key MeSH Features
| Feature | Syntax | Use Case |
|---------|--------|----------|
| MeSH Terms | `"Diabetes Mellitus"[MeSH Terms]` | Subject heading search |
| MeSH Major Topic | `"Diabetes Mellitus"[MeSH Major Topic]` | Core focus articles |
| No Explode | `"Diabetes Mellitus"[MeSH Terms:noexp]` | Exclude subcategories (explosion is the default) |
| Subheadings | `"Diabetes Mellitus/drug therapy"[MeSH Terms]` | Specific aspects |
| Entry Terms | `"Blood Sugar"[Title/Abstract]` | Synonyms searched as free text |
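As a rough sketch of how the tag variants in this table compose, the hypothetical helper below (`mesh_tag` is not part of `scripts/main.py`) builds the field tag from three switches. It assumes the `:noexp` modifier goes inside the brackets, as in the table.

```python
def mesh_tag(term, subheading=None, major=False, explode=True):
    """Format a MeSH search term with optional subheading, major-topic, and noexp."""
    heading = f"{term}/{subheading}" if subheading else term
    field = "MeSH Major Topic" if major else "MeSH Terms"
    if not explode:
        field += ":noexp"  # the modifier lives inside the brackets
    return f'"{heading}"[{field}]'


print(mesh_tag("Diabetes Mellitus"))
print(mesh_tag("Diabetes Mellitus", explode=False))
print(mesh_tag("Diabetes Mellitus", subheading="drug therapy"))
print(mesh_tag("Diabetes Mellitus", major=True))
```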
## Boolean Operators
- **AND**: Both terms must appear (narrows search)
- **OR**: Either term may appear (broadens search)
- **NOT**: Exclude terms (use sparingly)
**Operator Precedence**: PubMed processes Boolean operators left to right, so use parentheses to group the terms that must be evaluated as a unit.
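To see why grouping matters, the toy example below models each term as a set of article IDs (the IDs are invented for illustration) and compares left-to-right processing, which is how PubMed handles an unparenthesized query, with explicit nesting.

```python
# Toy "index": each term maps to the set of article IDs that mention it.
index = {
    "aspirin": {1, 2, 3},
    "stroke": {2, 3, 4},
    "diabetes": {3, 4, 5},
}
a, s, d = index["aspirin"], index["stroke"], index["diabetes"]

# aspirin OR stroke AND diabetes, processed left to right
left_to_right = (a | s) & d
# aspirin OR (stroke AND diabetes), with the AND nested explicitly
nested = a | (s & d)

print(sorted(left_to_right))  # [3, 4]
print(sorted(nested))         # [1, 2, 3, 4]
```

The two readings return different result sets from the same three terms, which is why every OR group in a strategy should carry its own parentheses.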
## Field Tags Reference
| Tag | Field | Example |
|-----|-------|---------|
| `[MeSH Terms]` | Medical Subject Headings | `"Hypertension"[MeSH Terms]` |
| `[Title]` | Article title only | `"stroke"[Title]` |
| `[Title/Abstract]` | Title and abstract | `"aspirin"[Title/Abstract]` |
| `[Author]` | Author name | `"Smith J"[Author]` |
| `[Journal]` | Journal name | `"Lancet"[Journal]` |
| `[Publication Date]` | Date range | `2020:2024[Publication Date]` |
| `[Language]` | Article language | `english[Language]` |
| `[Publication Type]` | Article type | `clinical trial[Publication Type]` |
## Clinical Query Filters
### Therapy
```
(randomized controlled trial[Publication Type] OR (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract]))
```
### Diagnosis
```
(sensitivity and specificity[MeSH Terms] OR sensitivity[Title/Abstract] OR specificity[Title/Abstract] OR diagnostic accuracy[Title/Abstract])
```
### Prognosis
```
(incidence[MeSH Terms] OR mortality[MeSH Terms] OR follow-up studies[MeSH Terms] OR prognos*[Title/Abstract] OR predict*[Title/Abstract])
```
### Etiology
```
(risk[MeSH Terms] OR (risk factors[MeSH Terms]) OR (risk[Title/Abstract] AND factor*[Title/Abstract]))
```
## Parameters
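These options belong to the `pico` subcommand of `scripts/main.py`. All PICO fields are optional; supply the ones that apply to the question.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `-p`, `--population` | str | None (optional) | Population/Problem |
| `-i`, `--intervention` | str | None (optional) | Intervention |
| `-c`, `--comparison` | str | None (optional) | Comparison |
| `-o`, `--outcome` | str | None (optional) | Outcome |
| `-s`, `--study-type` | str | None (optional) | Clinical query category: `therapy`, `diagnosis`, `prognosis`, `etiology`, or `clinical_prediction` |
| `--format` | str | `lines` | Output format: `query`, `lines`, or `json` |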
## Example: Complete Search Strategy
**Research Question**: Does aspirin reduce stroke risk in diabetic patients?
**Line 1 - Population**:
```
("Diabetes Mellitus"[MeSH Terms] OR "Diabetic"[Title/Abstract] OR "Diabetics"[Title/Abstract])
```
**Line 2 - Intervention**:
```
("Aspirin"[MeSH Terms] OR "Acetylsalicylic Acid"[Title/Abstract] OR "aspirin"[Title/Abstract])
```
**Line 3 - Outcome**:
```
("Stroke"[MeSH Terms] OR "Cerebrovascular Accident"[Title/Abstract] OR "stroke"[Title/Abstract] OR "cerebrovascular"[Title/Abstract])
```
**Line 4 - Study Type Filter**:
```
(randomized controlled trial[Publication Type] OR systematic review[Publication Type] OR meta-analysis[Publication Type])
```
**Final Query**:
```
(("Diabetes Mellitus"[MeSH Terms] OR "Diabetic"[Title/Abstract] OR "Diabetics"[Title/Abstract]) AND ("Aspirin"[MeSH Terms] OR "Acetylsalicylic Acid"[Title/Abstract] OR "aspirin"[Title/Abstract]) AND ("Stroke"[MeSH Terms] OR "Cerebrovascular Accident"[Title/Abstract] OR "stroke"[Title/Abstract] OR "cerebrovascular"[Title/Abstract]) AND (randomized controlled trial[Publication Type] OR systematic review[Publication Type] OR meta-analysis[Publication Type]))
```
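The final query is simply the four lines joined with AND inside one outer pair of parentheses. A minimal sketch that reassembles it from the lines above:

```python
population = '("Diabetes Mellitus"[MeSH Terms] OR "Diabetic"[Title/Abstract] OR "Diabetics"[Title/Abstract])'
intervention = '("Aspirin"[MeSH Terms] OR "Acetylsalicylic Acid"[Title/Abstract] OR "aspirin"[Title/Abstract])'
outcome = ('("Stroke"[MeSH Terms] OR "Cerebrovascular Accident"[Title/Abstract] '
           'OR "stroke"[Title/Abstract] OR "cerebrovascular"[Title/Abstract])')
study_type = ('(randomized controlled trial[Publication Type] OR systematic review[Publication Type] '
              'OR meta-analysis[Publication Type])')

# OR within each concept, AND between concepts
final_query = "(" + " AND ".join([population, intervention, outcome, study_type]) + ")"
print(final_query)
```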
## MeSH Browser Usage
When mapping terms:
1. Check MeSH Browser for exact term hierarchy
2. Note tree numbers for related terms
3. Identify entry terms (synonyms)
4. Consider subheadings for precision
5. Decide on explode vs noexp based on scope needs
## Quality Checklist
Before finalizing query:
- [ ] All concepts covered with OR within, AND between groups
- [ ] MeSH terms verified against current MeSH database
- [ ] Free-text synonyms included for completeness
- [ ] Filters appropriate for research question
- [ ] Parentheses balanced and precedence correct
- [ ] Copy-paste ready for PubMed search box
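The parenthesis and quoting items on this checklist can be checked mechanically. The sketch below is one possible approach (the function name `check_query` is hypothetical); it tracks nesting depth so that a closer appearing before its opener is also caught, and it ignores parentheses inside quoted phrases.

```python
def check_query(query):
    """Return a list of structural problems found in a query string."""
    problems = []
    depth = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    problems.append("closing parenthesis without a matching opener")
                    break
    if depth > 0:
        problems.append("unclosed opening parenthesis")
    if in_quotes:
        problems.append("unclosed quotation mark")
    return problems


print(check_query('("aspirin"[Title/Abstract] OR "asa"[Title/Abstract])'))
print(check_query('("aspirin"[Title/Abstract]'))
```

A check like this covers structure only; whether the MeSH terms exist still has to be verified against the MeSH Browser.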
## Technical Difficulty
🔴 **High** - Requires understanding of:
- MeSH hierarchical structure and term relationships
- Boolean logic and operator precedence
- Field tag semantics and limitations
- Search sensitivity vs specificity trade-offs
- Clinical query methodology
⚠️ **Verification Required**: MeSH terms change annually. Always verify current MeSH version at https://meshb.nlm.nih.gov/
## References
See `references/mesh-structure.md` for detailed MeSH hierarchy guidance.
See `references/boolean-examples.md` for categorized query templates.
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python scripts executed locally | Medium |
| Network Access | PubMed E-utilities API calls | High |
| File System Access | Read/write search strategies | Low |
| Instruction Tampering | Query construction guidelines | Low |
| Data Exposure | Search terms logged locally | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] NCBI API requests use HTTPS only
- [ ] API rate limits respected (max 3 requests/second without API key)
- [ ] Input validation for search terms (injection prevention)
- [ ] Output directory restricted to workspace
- [ ] Error messages sanitized (no internal paths exposed)
- [ ] API timeout and retry mechanisms implemented
- [ ] No exposure of internal service architecture
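For the rate-limit item, a small sliding-window limiter is one way to stay under 3 requests per second. This is an illustrative sketch; the `RateLimiter` class is hypothetical and is not part of the packaged script, which makes no network calls as shipped.

```python
import time


class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls=3, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.calls = []         # timestamps of calls inside the current window

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.period]
        self.calls.append(now)
```

Call `limiter.wait()` immediately before each API request; the clock and sleep functions are parameters only so the behavior can be tested without real delays.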
## Prerequisites
```text
# Python dependencies
pip install -r requirements.txt
# Optional: NCBI API key for higher rate limits
# Set as environment variable: NCBI_API_KEY
```
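A sketch of how the optional key could be picked up when building an E-utilities ESearch request. The helper name `esearch_url` is hypothetical, and the function only constructs the URL; it does not send anything.

```python
import os
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query, retmax=20, env=os.environ):
    """Build an ESearch URL for PubMed, adding api_key only if NCBI_API_KEY is set."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    api_key = env.get("NCBI_API_KEY")
    if api_key:
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(esearch_url('"Aspirin"[MeSH Terms]', env={}))
```

Reading the key from the environment keeps it out of the source tree, in line with the security checklist.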
## Evaluation Criteria
### Success Metrics
- [ ] Successfully constructs valid PubMed Boolean queries
- [ ] MeSH term mapping is accurate and current
- [ ] Query syntax is copy-paste ready for PubMed
- [ ] Provides sensitivity/specificity trade-off options
- [ ] Handles complex multi-concept research questions
- [ ] Estimated result counts are reasonable
### Test Cases
1. **Basic Query**: "diabetes treatment" → Valid MeSH-based query
2. **PICO Framework**: Complex clinical question → Complete search strategy
3. **MeSH Mapping**: Free-text term → Correct MeSH term identification
4. **Boolean Logic**: Multiple concepts → Properly nested AND/OR/NOT
5. **Clinical Query**: Therapy-focused question → Includes appropriate filters
6. **API Integration**: Execute search via E-utilities → Successful retrieval
7. **Error Handling**: Invalid search term → Graceful error with suggestions
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**:
  - MeSH terms updated annually, may need periodic validation
  - API rate limits without key
- **Planned Improvements**:
  - Integration with NCBI API key support for higher rate limits
  - Automatic MeSH term validation against current database
  - Support for additional databases (Embase, Cochrane)
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `pubmed-search-specialist` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `pubmed-search-specialist` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/boolean-examples.md
# Boolean Query Examples
## Basic Patterns
### Single Concept with MeSH and Text Words
```
("Diabetes Mellitus"[MeSH Terms] OR "diabetic"[Title/Abstract] OR "diabetics"[Title/Abstract])
```
### Two Concepts Combined
```
("Diabetes Mellitus"[MeSH Terms] OR "diabetic"[Title/Abstract]) AND
("Aspirin"[MeSH Terms] OR "aspirin"[Title/Abstract])
```
### Three or More Concepts
```
(("Diabetes Mellitus"[MeSH Terms] OR "diabetic"[Title/Abstract]) AND
("Aspirin"[MeSH Terms] OR "aspirin"[Title/Abstract]) AND
("Stroke"[MeSH Terms] OR "stroke"[Title/Abstract]))
```
## By Search Type
### Systematic Review Search (High Sensitivity)
```
(("Diabetes Mellitus, Type 2"[MeSH Terms] OR "type 2 diabetes"[Title/Abstract] OR
"T2DM"[Title/Abstract] OR "NIDDM"[Title/Abstract] OR "adult onset diabetes"[Title/Abstract] OR
"diabetic"[Title/Abstract]) AND
("Metformin"[MeSH Terms] OR "metformin"[Title/Abstract] OR "glucophage"[Title/Abstract]) AND
("cardiovascular"[Title/Abstract] OR "cardiac"[Title/Abstract] OR "heart"[Title/Abstract] OR
"myocardial"[Title/Abstract] OR "coronary"[Title/Abstract] OR "stroke"[Title/Abstract] OR
"cerebrovascular"[Title/Abstract]))
```
### Rapid Search (High Specificity)
```
("Diabetes Mellitus, Type 2/drug therapy"[MeSH Major Topic] AND
"Metformin"[MeSH Major Topic] AND
"Cardiovascular Diseases/prevention & control"[MeSH Terms])
```
### Clinical Trial Search
```
(("Diabetes Mellitus"[MeSH Terms] OR "diabetic"[Title/Abstract]) AND
("SGLT2 Inhibitors"[MeSH Terms] OR "dapagliflozin"[Title/Abstract] OR
"empagliflozin"[Title/Abstract] OR "canagliflozin"[Title/Abstract]) AND
(randomized controlled trial[Publication Type] OR
(randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract])))
```
## By Clinical Domain
### Therapy
```
(("condition"[MeSH Terms] OR "condition"[Title/Abstract]) AND
("intervention"[MeSH Terms] OR "intervention"[Title/Abstract]) AND
(randomized controlled trial[Publication Type] OR controlled clinical trial[Publication Type] OR
randomized[Title/Abstract] OR placebo[Title/Abstract] OR "clinical trial"[Publication Type]))
```
### Diagnosis
```
(("condition/diagnosis"[MeSH Terms] OR "condition"[Title/Abstract]) AND
("diagnostic test"[MeSH Terms] OR "test name"[Title/Abstract]) AND
(sensitivity[Title/Abstract] OR specificity[Title/Abstract] OR
"diagnostic accuracy"[Title/Abstract] OR "roc curve"[Title/Abstract] OR
"likelihood ratio"[Title/Abstract] OR "predictive value"[Title/Abstract]))
```
### Prognosis
```
(("condition"[MeSH Terms] OR "condition"[Title/Abstract]) AND
("risk factor"[MeSH Terms] OR prognos*[Title/Abstract] OR predict*[Title/Abstract] OR
outlook*[Title/Abstract] OR course[Title/Abstract]) AND
(cohort studies[MeSH Terms] OR follow-up studies[MeSH Terms] OR
prospective[Title/Abstract] OR longitudinal[Title/Abstract]))
```
### Harm/Etiology
```
(("condition"[MeSH Terms] OR "condition"[Title/Abstract]) AND
("exposure"[MeSH Terms] OR "exposure"[Title/Abstract]) AND
(risk[Title/Abstract] OR associat*[Title/Abstract] OR caus*[Title/Abstract] OR
"relative risk"[Title/Abstract] OR "odds ratio"[Title/Abstract] OR
"hazard ratio"[Title/Abstract]))
```
## With Filters
### Human Studies Only
```
(query) AND humans[MeSH Terms]
```
### English Language
```
(query) AND english[Language]
```
### Last 5 Years
```
(query) AND 2020:2025[Publication Date]
```
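A fixed range like the one above goes stale each January. As a small sketch, the hypothetical helper `last_n_years` derives the range from the current date instead:

```python
from datetime import date


def last_n_years(n, today=None):
    """Return a PubMed date-range filter covering the last n years."""
    end = (today or date.today()).year
    return f"{end - n}:{end}[Publication Date]"


print(last_n_years(5, today=date(2025, 6, 1)))  # 2020:2025[Publication Date]
```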
### Adult Population
```
(query) AND adult[MeSH Terms]
```
### Meta-Analyses Only
```
(query) AND meta-analysis[Publication Type]
```
### Multiple Filters Combined
```
(query) AND humans[MeSH Terms] AND english[Language] AND
2020:2025[Publication Date] AND adult[MeSH Terms]
```
## Complex Nested Examples
### Multi-Condition Search
```
(("Diabetes Mellitus"[MeSH Terms] OR "Hypertension"[MeSH Terms]) AND
("Kidney Diseases"[MeSH Terms] OR "nephropathy"[Title/Abstract]) AND
("ACE Inhibitors"[MeSH Terms] OR "Angiotensin Receptor Antagonists"[MeSH Terms]))
```
### Drug Class Search
```
(("Depressive Disorder, Major"[MeSH Terms] OR "major depression"[Title/Abstract] OR
"major depressive disorder"[Title/Abstract]) AND
("Serotonin Uptake Inhibitors"[MeSH Terms] OR "SSRIs"[Title/Abstract] OR
"fluoxetine"[Title/Abstract] OR "sertraline"[Title/Abstract] OR
"paroxetine"[Title/Abstract] OR "citalopram"[Title/Abstract] OR
"escitalopram"[Title/Abstract]))
```
### Surgical Intervention
```
(("Obesity, Morbid"[MeSH Terms] OR "morbid obesity"[Title/Abstract] OR
"BMI"[Title/Abstract] OR "body mass index"[Title/Abstract]) AND
("Bariatric Surgery"[MeSH Terms] OR "gastric bypass"[Title/Abstract] OR
"sleeve gastrectomy"[Title/Abstract] OR "gastric banding"[Title/Abstract] OR
"roux-en-y"[Title/Abstract]))
```
## Line-by-Line Strategy Template
```
# 1. Population
("Population MeSH"[MeSH Terms] OR "synonym1"[Title/Abstract] OR "synonym2"[Title/Abstract])
# 2. Intervention
("Intervention MeSH"[MeSH Terms] OR "synonym1"[Title/Abstract] OR "synonym2"[Title/Abstract])
# 3. Outcome
("Outcome MeSH"[MeSH Terms] OR "synonym1"[Title/Abstract] OR "synonym2"[Title/Abstract])
# 4. Study Type Filter
(randomized controlled trial[Publication Type] OR systematic review[Publication Type])
# 5. Population Filter
humans[MeSH Terms] AND adult[MeSH Terms]
# 6. Language/Date
english[Language] AND 2020:2025[Publication Date]
# Final Query
(#1 AND #2 AND #3 AND #4 AND #5 AND #6)
```
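The `#1 AND #2` shorthand has to be expanded before the query is pasted into a single search box. One way to do that, sketched with a hypothetical `expand` helper and shortened example lines:

```python
import re


def expand(final, numbered):
    """Replace each #N reference with the text of numbered line N."""
    return re.sub(r"#(\d+)", lambda m: numbered[int(m.group(1))], final)


numbered = {
    1: '("Aspirin"[MeSH Terms] OR "aspirin"[Title/Abstract])',
    2: '("Stroke"[MeSH Terms] OR "stroke"[Title/Abstract])',
    3: "humans[MeSH Terms]",
}
print(expand("(#1 AND #2 AND #3)", numbered))
```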
FILE:references/mesh-structure.md
# MeSH Structure Guide
## What is MeSH?
Medical Subject Headings (MeSH) is the National Library of Medicine's controlled vocabulary thesaurus used for indexing articles in PubMed/MEDLINE.
## MeSH Hierarchy
MeSH terms are organized in a hierarchical tree structure with 16 main categories:
| Tree Number Prefix | Category |
|-------------------|----------|
| A | Anatomy |
| B | Organisms |
| C | Diseases |
| D | Chemicals and Drugs |
| E | Analytical, Diagnostic and Therapeutic Techniques |
| F | Psychiatry and Psychology |
| G | Phenomena and Processes |
| H | Disciplines and Occupations |
| I | Anthropology, Education, Sociology |
| J | Technology, Industry, Agriculture |
| K | Humanities |
| L | Information Science |
| M | Named Groups |
| N | Health Care |
| V | Publication Characteristics |
| Z | Geographicals |
## Tree Structure Example
```
C - Diseases
└── C15 - Hemic and Lymphatic Diseases
    └── C15.378 - Hematologic Diseases
        └── C15.378.100 - Anemia
            ├── C15.378.100.100 - Anemia, Aplastic
            ├── C15.378.100.141 - Anemia, Hemolytic
            └── C15.378.100.855 - Anemia, Sickle Cell
```
```
## Explode vs NoExp
### Explode (Default)
Includes the term AND all more specific terms below it in the hierarchy.
```
"Anemia"[MeSH Terms] # Includes Anemia, Aplastic, Hemolytic, Sickle Cell, etc.
```
### No Explode
Only includes the exact term, excluding more specific terms.
```
"Anemia"[MeSH Terms:noexp] # Only general Anemia articles
```
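Explosion can be pictured as a prefix match on tree numbers. The sketch below uses the tree numbers from the example in this guide (copied as written here, not verified against the current MeSH release); the `headings` function is hypothetical.

```python
# Tree numbers taken from the example tree in this guide (illustrative only)
TREE = {
    "C15.378": "Hematologic Diseases",
    "C15.378.100": "Anemia",
    "C15.378.100.100": "Anemia, Aplastic",
    "C15.378.100.141": "Anemia, Hemolytic",
    "C15.378.100.855": "Anemia, Sickle Cell",
}


def headings(tree_number, explode=True):
    """Return the headings a search on tree_number would cover."""
    if not explode:
        return [TREE[tree_number]]
    return [
        name
        for num, name in sorted(TREE.items())
        if num == tree_number or num.startswith(tree_number + ".")
    ]


print(headings("C15.378.100"))                 # the term plus all narrower terms
print(headings("C15.378.100", explode=False))  # the term alone
```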
## Entry Terms (Synonyms)
Each MeSH term has associated entry terms (synonyms) that map to it:
| MeSH Term | Entry Terms |
|-----------|-------------|
| Myocardial Infarction | Heart Attack, Cardiac Infarction, MI |
| Stroke | Cerebrovascular Accident, Brain Attack, CVA |
| Aspirin | Acetylsalicylic Acid, ASA |
| Neoplasms | Cancer, Malignancy, Tumor |
## Subheadings (Qualifiers)
Subheadings narrow MeSH terms to specific aspects:
| Subheading | Code | Use For |
|------------|------|---------|
| /adverse effects | AE | Side effects of drugs/procedures |
| /blood | BL | Blood levels, blood studies |
| /diagnosis | DI | Diagnostic procedures |
| /drug therapy | DT | Drug treatment |
| /epidemiology | EP | Incidence, prevalence |
| /etiology | ET | Causes |
| /genetics | GE | Genetic aspects |
| /mortality | MO | Death rates |
| /pathology | PA | Disease pathology |
| /prevention & control | PC | Preventive measures |
| /therapy | TH | Treatment generally |
### Subheading Syntax
```
"Diabetes Mellitus/drug therapy"[MeSH Terms]
"Hypertension/epidemiology"[MeSH Terms]
"Neoplasms/mortality"[MeSH Terms]
```
## MeSH Major Topic
Limits to articles where the term is a major focus (starred in MEDLINE):
```
"Diabetes Mellitus"[MeSH Major Topic]
```
Use when:
- Too many results with regular MeSH search
- Topic is central to research question
- High precision needed
## Checking Current MeSH
MeSH terms are updated annually. Always verify at:
- https://meshb.nlm.nih.gov/ (MeSH Browser)
- https://www.ncbi.nlm.nih.gov/mesh
## Common Pitfalls
1. **Outdated terms**: MeSH changes; check current version
2. **US vs UK spelling**: Use MeSH preferred spelling
3. **Case sensitivity**: PubMed searches are not case-sensitive; follow MeSH capitalization for readability only
4. **Explosion scope**: Consider if you need all subtypes
5. **Subheading compatibility**: Not all subheadings work with all terms
FILE:requirements.txt
dataclasses
requests
FILE:scripts/main.py
#!/usr/bin/env python3
"""
PubMed Search Specialist
Builds complex Boolean query strings for precise PubMed/MEDLINE retrieval.
"""
import argparse
import json
import re
from typing import List, Dict, Optional, Tuple
from dataclasses import dataclass, asdict
from enum import Enum
class FieldTag(Enum):
    """PubMed field tags for search queries."""
    MESH_TERMS = "[MeSH Terms]"
    MESH_MAJOR = "[MeSH Major Topic]"
    TITLE = "[Title]"
    ABSTRACT = "[Abstract]"
    TITLE_ABSTRACT = "[Title/Abstract]"
    AUTHOR = "[Author]"
    JOURNAL = "[Journal]"
    PUB_DATE = "[Publication Date]"
    LANGUAGE = "[Language]"
    PUB_TYPE = "[Publication Type]"
    AFFILIATION = "[Affiliation]"


class FilterType(Enum):
    """Common PubMed filter categories."""
    HUMANS = 'humans[MeSH Terms]'
    ANIMALS = 'animals[MeSH Terms]'
    ENGLISH = 'english[Language]'
    ADULT = 'adult[MeSH Terms]'
    AGED = 'aged[MeSH Terms]'
    CHILD = 'child[MeSH Terms]'
    # NOTE: the date ranges are hard-coded relative to 2024; update them when the year changes
    LAST_5_YEARS = f'{(2024-5)}:2024[Publication Date]'
    LAST_10_YEARS = f'{(2024-10)}:2024[Publication Date]'
    RCT = 'randomized controlled trial[Publication Type]'
    META_ANALYSIS = 'meta-analysis[Publication Type]'
    SYSTEMATIC_REVIEW = 'systematic review[Publication Type]'
    REVIEW = 'review[Publication Type]'
    CLINICAL_TRIAL = 'clinical trial[Publication Type]'
@dataclass
class SearchConcept:
    """Represents a search concept with MeSH and text word components."""
    name: str
    mesh_terms: List[str]
    text_words: List[str]
    use_explode: bool = True
    subheadings: Optional[List[str]] = None

    def to_query(self) -> str:
        """Convert concept to query string."""
        parts = []
        # Add MeSH terms
        for mesh in self.mesh_terms:
            if self.subheadings:
                for sub in self.subheadings:
                    parts.append(f'"{mesh}/{sub}"{FieldTag.MESH_TERMS.value}')
            else:
                # PubMed expects the modifier inside the tag: [MeSH Terms:noexp]
                tag = FieldTag.MESH_TERMS.value
                if not self.use_explode:
                    tag = tag[:-1] + ":noexp]"
                parts.append(f'"{mesh}"{tag}')
        # Add text words
        for tw in self.text_words:
            parts.append(f'"{tw}"{FieldTag.TITLE_ABSTRACT.value}')
        return f"({(' OR '.join(parts))})"
@dataclass
class SearchStrategy:
    """Complete search strategy with concepts and filters."""
    concepts: List[SearchConcept]
    filters: List[str]
    description: str = ""

    def to_query(self) -> str:
        """Build complete Boolean query."""
        concept_queries = [c.to_query() for c in self.concepts]
        all_parts = concept_queries + self.filters
        return f"({' AND '.join(all_parts)})"

    def to_line_by_line(self) -> str:
        """Generate line-by-line search strategy."""
        lines = []
        for i, concept in enumerate(self.concepts, 1):
            lines.append(f"# {i}. {concept.name}")
            lines.append(concept.to_query())
        if self.filters:
            lines.append(f"# {len(self.concepts) + 1}. Filters")
            lines.append(f"({' AND '.join(self.filters)})")
        lines.append("\n# Final Query")
        lines.append(self.to_query())
        return '\n'.join(lines)
class MeSHMapper:
    """Maps common medical concepts to MeSH terms."""
    # Common term mappings (simplified - production would use MeSH API)
    COMMON_MESH = {
        # Populations
        "diabetes": ["Diabetes Mellitus", "Diabetes Mellitus, Type 2", "Diabetes Mellitus, Type 1"],
        "hypertension": ["Hypertension"],
        "obesity": ["Obesity"],
        "stroke": ["Stroke", "Brain Ischemia"],
        "myocardial infarction": ["Myocardial Infarction"],
        "heart failure": ["Heart Failure"],
        "cancer": ["Neoplasms"],
        "depression": ["Depression"],
        "alzheimer": ["Alzheimer Disease"],
        "asthma": ["Asthma"],
        "copd": ["Pulmonary Disease, Chronic Obstructive"],
        # Interventions
        "aspirin": ["Aspirin"],
        "metformin": ["Metformin"],
        "insulin": ["Insulin"],
        "statins": ["Hydroxymethylglutaryl-CoA Reductase Inhibitors"],
        "placebo": ["Placebos"],
        "surgery": ["Surgical Procedures, Operative"],
        "exercise": ["Exercise"],
        "diet": ["Diet Therapy"],
        # Outcomes
        "mortality": ["Mortality"],
        "quality of life": ["Quality of Life"],
        "adverse effects": ["Drug-Related Side Effects and Adverse Reactions"],
        "efficacy": ["Treatment Outcome"],
        "safety": ["Safety"],
    }

    @classmethod
    def suggest_mesh(cls, concept: str) -> List[str]:
        """Suggest MeSH terms for a concept."""
        concept_lower = concept.lower()
        results = []
        for key, terms in cls.COMMON_MESH.items():
            if key in concept_lower or concept_lower in key:
                results.extend(terms)
        # dict.fromkeys removes duplicates while keeping a stable, reproducible order
        return list(dict.fromkeys(results)) if results else [concept]

    @classmethod
    def suggest_synonyms(cls, concept: str) -> List[str]:
        """Suggest text word synonyms for a concept."""
        # Simplified synonym mapping
        synonyms = {
            "diabetes": ["diabetic", "diabetics", "hyperglycemia"],
            "hypertension": ["high blood pressure", "elevated blood pressure"],
            "stroke": ["cerebrovascular accident", "cva", "brain attack"],
            "myocardial infarction": ["heart attack", "mi", "cardiac infarction"],
            "aspirin": ["acetylsalicylic acid", "asa"],
            "children": ["child", "pediatric", "paediatric", "infant", "adolescent"],
            "elderly": ["aged", "older adults", "geriatric", "seniors"],
        }
        concept_lower = concept.lower()
        for key, syns in synonyms.items():
            if key in concept_lower or concept_lower in key:
                return syns
        return []
class QueryBuilder:
    """Builds PubMed Boolean queries."""
    CLINICAL_QUERIES = {
        "therapy": """(
            randomized controlled trial[Publication Type] OR
            (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract]) OR
            (clinical[Title/Abstract] AND trial[Title/Abstract])
        )""",
        "diagnosis": """(
            sensitivity and specificity[MeSH Terms] OR
            sensitivity[Title/Abstract] OR
            specificity[Title/Abstract] OR
            "diagnostic accuracy"[Title/Abstract] OR
            "likelihood ratio"[Title/Abstract] OR
            "roc curve"[Title/Abstract]
        )""",
        "prognosis": """(
            incidence[MeSH Terms] OR
            mortality[MeSH Terms] OR
            "follow-up studies"[MeSH Terms] OR
            prognos*[Title/Abstract] OR
            predict*[Title/Abstract] OR
            course[Title/Abstract]
        )""",
        "etiology": """(
            risk[MeSH Terms] OR
            "risk factors"[MeSH Terms] OR
            (risk[Title/Abstract] AND factor*[Title/Abstract]) OR
            caus*[Title/Abstract] OR
            associat*[Title/Abstract]
        )""",
        "clinical_prediction": """(
            "predictive value of tests"[MeSH Terms] OR
            "clinical prediction rule"[Title/Abstract] OR
            (predict*[Title/Abstract] AND model[Title/Abstract]) OR
            "decision rule"[Title/Abstract] OR
            "risk score"[Title/Abstract]
        )"""
    }

    @classmethod
    def build_pico_query(
        cls,
        population: Optional[str] = None,
        intervention: Optional[str] = None,
        comparison: Optional[str] = None,
        outcome: Optional[str] = None,
        study_type: Optional[str] = None
    ) -> SearchStrategy:
        """Build query from PICO components."""
        concepts = []
        components = (
            ("Population", population),
            ("Intervention", intervention),
            ("Comparison", comparison),
            ("Outcome", outcome),
        )
        # Each supplied component becomes one OR-block of MeSH terms and text words
        for name, text in components:
            if not text:
                continue
            concepts.append(SearchConcept(
                name=name,
                mesh_terms=MeSHMapper.suggest_mesh(text),
                text_words=MeSHMapper.suggest_synonyms(text) + [text],
                use_explode=True
            ))
        filters = []
        if study_type and study_type.lower() in cls.CLINICAL_QUERIES:
            filters.append(cls.CLINICAL_QUERIES[study_type.lower()])
        return SearchStrategy(
            concepts=concepts,
            filters=filters,
            description="PICO-based search strategy"
        )

    @classmethod
    def validate_query(cls, query: str) -> Tuple[bool, List[str]]:
        """Validate query syntax."""
        errors = []
        # Check balanced parentheses, including a closer that precedes its opener
        depth = 0
        for ch in query:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth < 0:
                    break
        if depth != 0:
            errors.append("Unbalanced parentheses")
        # Check for unclosed quotes
        if query.count('"') % 2 != 0:
            errors.append("Unclosed quotation marks")
        # Check for valid field tags (compared case-insensitively)
        valid_tags = {tag.value.lower() for tag in FieldTag}
        # Extract potential field tags
        found_tags = re.findall(r'\[[A-Za-z/ ]+\]', query)
        for tag in found_tags:
            if tag.lower() not in valid_tags:
                errors.append(f"Unusual field tag: {tag}")
        return len(errors) == 0, errors
def main():
    parser = argparse.ArgumentParser(
        description="PubMed Search Specialist - Build complex Boolean queries"
    )
    subparsers = parser.add_subparsers(dest='command', help='Available commands')
    # PICO command
    pico_parser = subparsers.add_parser('pico', help='Build query from PICO framework')
    pico_parser.add_argument('-p', '--population', help='Population/Problem')
    pico_parser.add_argument('-i', '--intervention', help='Intervention')
    pico_parser.add_argument('-c', '--comparison', help='Comparison')
    pico_parser.add_argument('-o', '--outcome', help='Outcome')
    pico_parser.add_argument('-s', '--study-type',
                             choices=['therapy', 'diagnosis', 'prognosis', 'etiology', 'clinical_prediction'],
                             help='Clinical query category')
    pico_parser.add_argument('--format', choices=['query', 'lines', 'json'], default='lines',
                             help='Output format')
    # MeSH suggestion command
    mesh_parser = subparsers.add_parser('mesh', help='Suggest MeSH terms')
    mesh_parser.add_argument('concept', help='Concept to map')
    # Validate command
    validate_parser = subparsers.add_parser('validate', help='Validate query syntax')
    validate_parser.add_argument('query', help='Query string to validate')
    args = parser.parse_args()
    if args.command == 'pico':
        strategy = QueryBuilder.build_pico_query(
            population=args.population,
            intervention=args.intervention,
            comparison=args.comparison,
            outcome=args.outcome,
            study_type=args.study_type
        )
        if args.format == 'json':
            print(json.dumps(asdict(strategy), indent=2, default=str))
        elif args.format == 'query':
            print(strategy.to_query())
        else:  # lines
            print(strategy.to_line_by_line())
    elif args.command == 'mesh':
        mesh_terms = MeSHMapper.suggest_mesh(args.concept)
        synonyms = MeSHMapper.suggest_synonyms(args.concept)
        print(f"Concept: {args.concept}")
        print("\nSuggested MeSH Terms:")
        for term in mesh_terms:
            print(f" - {term}")
        print("\nSuggested Text Words:")
        for syn in synonyms:
            print(f" - {syn}")
    elif args.command == 'validate':
        valid, errors = QueryBuilder.validate_query(args.query)
        if valid:
            print("✓ Query syntax is valid")
        else:
            print("✗ Query has errors:")
            for error in errors:
                print(f" - {error}")
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
Analyze data with `pseudotime-trajectory-viz` using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
---
name: pseudotime-trajectory-viz
description: Analyze data with `pseudotime-trajectory-viz` using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
license: MIT
skill-author: AIPOCH
---
# Pseudotime Trajectory Visualization
Visualize single-cell developmental trajectories showing cellular differentiation processes using pseudotime analysis.
## When to Use
- Use this skill when the task is to visualize single-cell developmental trajectories and cellular differentiation using pseudotime analysis.
- Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: Analyze data with `pseudotime-trajectory-viz` using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
- Python 3.9+
- `scanpy>=1.9.0` - Single-cell analysis framework
- `scvelo>=0.2.5` - RNA velocity analysis
- `palantir` - Trajectory inference and pseudotime
- `scikit-learn` - Dimensionality reduction and clustering
- `matplotlib>=3.5.0` - Plotting
- `seaborn` - Statistical visualization
- `pandas`, `numpy` - Data manipulation
- `anndata` - Single-cell data structure
Optional:
- `slingshot` (R) via `rpy2` - Alternative trajectory method
## Example Usage
See `## Usage` below for related details.
```bash
cd "20260318/scientific-skills/Data Analytics/pseudotime-trajectory-viz"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
python examples/generate_example_data.py example_data.h5ad
python scripts/main.py --input example_data.h5ad --output ./results --format png
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Function
- Infer developmental trajectories from single-cell RNA-seq data
- Calculate pseudotime values representing cellular differentiation progress
- Visualize trajectory trees and lineage branching
- Overlay gene expression dynamics along pseudotime
- Identify lineage-specific marker genes
- Generate publication-ready trajectory plots
## Technical Difficulty
**High** - Requires understanding of single-cell analysis, dimensionality reduction, trajectory inference algorithms, and Python visualization libraries.
## Usage
```text
# Basic trajectory analysis from AnnData file
python scripts/main.py --input data.h5ad --output ./results
# Specify the root cell (an ID from adata.obs_names) and the inference method
python scripts/main.py --input data.h5ad --start-cell cell_1234 --method diffusion --output ./results
# Visualize specific gene expression along trajectories
python scripts/main.py --input data.h5ad --genes SOX2,OCT4,NANOG --plot-genes --output ./results
# Full analysis with custom parameters
python scripts/main.py --input data.h5ad \
--embedding umap \
--method paga \
--start-cell-type progenitor \
--n-lineages 3 \
--genes MARKER1,MARKER2,MARKER3 \
--output ./results \
--format pdf
```
## Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--input` | path | required | Input AnnData (.h5ad) file path |
| `--output` | path | ./trajectory_output | Output directory for results |
| `--embedding` | enum | umap | Embedding for visualization: `umap`, `tsne`, `pca`, `diffmap` |
| `--method` | enum | diffusion | Trajectory inference: `diffusion`, `paga` (the only choices accepted by `scripts/main.py`) |
| `--start-cell` | string | auto | Root cell ID (an entry in `adata.obs_names`) for trajectory origin; auto-detected if omitted |
| `--start-cell-type` | string | - | Cell type annotation to use as starting point |
| `--n-lineages` | int | auto | Number of expected lineage branches |
| `--cluster-key` | string | leiden | AnnData obs key for cell clusters |
| `--cell-type-key` | string | cell_type | AnnData obs key for cell type annotations |
| `--genes` | string | - | Comma-separated gene names to plot along pseudotime |
| `--plot-genes` | flag | false | Generate gene expression heatmaps along trajectories |
| `--plot-branch` | flag | true | Show lineage branch probabilities (always enabled: the flag is `store_true` with a default of `True`) |
| `--format` | enum | png | Output format: `png`, `pdf`, `svg` |
| `--dpi` | int | 300 | Figure resolution |
| `--n-pcs` | int | 30 | Number of principal components for analysis |
| `--n-neighbors` | int | 15 | Number of neighbors for graph construction |
| `--diffmap-components` | int | 5 | Number of diffusion components to compute |
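The `--genes` value arrives as a single comma-separated string, so it has to be split before genes can be looked up in the data. The sketch below shows one way to normalize it; `parse_gene_list` is a hypothetical helper written for illustration, not a function taken from `scripts/main.py`.

```python
def parse_gene_list(raw):
    """Split a comma-separated --genes value into a clean, de-duplicated list.

    Hypothetical helper for illustration; the packaged script may parse differently.
    """
    if not raw:
        return []
    seen = set()
    genes = []
    for token in raw.split(","):
        gene = token.strip()
        # Skip empty tokens (e.g. a trailing comma) and repeats, keeping first-seen order.
        if gene and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes


print(parse_gene_list("SOX2, OCT4,,NANOG,SOX2"))  # prints ['SOX2', 'OCT4', 'NANOG']
```

Gene symbols are matched case-sensitively here, on the assumption that the names in the AnnData object use the same capitalization as the command line.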
## Input Format
Required AnnData (.h5ad) structure:
```
AnnData object with n_obs × n_vars = n_cells × n_genes
obs: 'leiden', 'cell_type' # Cluster and cell type annotations
var: 'highly_variable' # Highly variable gene marker
obsm: 'X_umap', 'X_pca' # Pre-computed embeddings (optional)
layers: 'spliced', 'unspliced' # For RNA velocity (optional)
```
## Output Files
```
output_directory/
├── trajectory_plot.{format} # Main trajectory visualization
├── pseudotime_distribution.{format} # Pseudotime value distribution
├── lineage_tree.{format} # Branching lineage structure
├── gene_expression_heatmap.{format} # Gene dynamics heatmap (if --plot-genes)
├── gene_trends/
│ ├── {gene_name}_trend.{format} # Individual gene expression trends
│ └── ...
├── pseudotime_values.csv # Cell-level pseudotime values
├── lineage_assignments.csv # Cell lineage assignments
└── analysis_report.json # Analysis parameters and statistics
```
## Output Format Example
### analysis_report.json
```json
{
"analysis_date": "2026-02-06T06:00:00",
"method": "diffusion",
"n_cells": 5000,
"n_lineages": 3,
"root_cell": "cell_1234",
"pseudotime_range": [0.0, 1.0],
"lineages": {
"lineage_1": {
"cell_count": 1500,
"terminal_state": "mature_type_A",
"mean_pseudotime": 0.75
},
"lineage_2": {
"cell_count": 1200,
"terminal_state": "mature_type_B",
"mean_pseudotime": 0.68
}
}
}
```
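A report with this shape can be sanity-checked before it is passed on. The following is a minimal sketch that assumes only the field names shown in the example above; `check_report` is a hypothetical helper, and the checks are illustrative rather than a definition of what the script guarantees.

```python
import json


def check_report(report):
    """Return a list of human-readable problems found in an analysis report dict."""
    problems = []
    low, high = report["pseudotime_range"]
    if low > high:
        problems.append("pseudotime_range is reversed")
    # Cells assigned to lineages should never outnumber the cells analysed.
    assigned = sum(item["cell_count"] for item in report["lineages"].values())
    if assigned > report["n_cells"]:
        problems.append("lineage cell counts exceed n_cells")
    for name, item in report["lineages"].items():
        if not low <= item["mean_pseudotime"] <= high:
            problems.append(f"{name}: mean_pseudotime outside pseudotime_range")
    return problems


example = json.loads("""
{
  "n_cells": 5000,
  "pseudotime_range": [0.0, 1.0],
  "lineages": {
    "lineage_1": {"cell_count": 1500, "mean_pseudotime": 0.75},
    "lineage_2": {"cell_count": 1200, "mean_pseudotime": 0.68}
  }
}
""")
print(check_report(example))  # prints []
```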
### pseudotime_values.csv
```csv
cell_id,cluster,cell_type,pseudotime,lineage,branch_probability
cell_001,0,progenitor,0.05,lineage_1,0.95
cell_002,1,intermediate,0.42,lineage_1,0.88
...
```
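Because the table is plain CSV, it can be summarized without any single-cell libraries. The sketch below groups the `pseudotime` column by `lineage` using only the standard library. The column names come from the example above; the helper name `mean_pseudotime_by_lineage` and the two sample rows are illustrative.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Two rows copied from the example above; a real file would be opened with open(path, newline="").
SAMPLE = """cell_id,cluster,cell_type,pseudotime,lineage,branch_probability
cell_001,0,progenitor,0.05,lineage_1,0.95
cell_002,1,intermediate,0.42,lineage_1,0.88
"""


def mean_pseudotime_by_lineage(handle):
    """Return {lineage: mean pseudotime} for a pseudotime_values.csv file handle."""
    values = defaultdict(list)
    for row in csv.DictReader(handle):
        values[row["lineage"]].append(float(row["pseudotime"]))
    return {lineage: mean(vals) for lineage, vals in values.items()}


summary = mean_pseudotime_by_lineage(io.StringIO(SAMPLE))
print(summary)
```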
## Implementation Notes
1. **Preprocessing**: Assumes input data is already normalized and log-transformed
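2. **Root Detection**: If no start cell is specified, the script selects the cell with the highest mean expression of stemness markers (e.g., SOX2, POU5F1, NANOG) and falls back to the first cell when none of the markers are present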
3. **Diffusion Pseudotime**: Default method using diffusion maps for robust trajectory inference
4. **Palantir**: Used for soft lineage assignments and fate probability estimation
5. **Memory**: Large datasets (>50k cells) may require 16GB+ RAM
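The marker-based root selection mentioned in the notes above can be sketched in plain Python. This is a simplified illustration, not the packaged implementation: it works on a dictionary of per-cell expression values instead of an AnnData object, the marker list is a short illustrative subset, and `pick_root_cell` is a hypothetical name.

```python
STEM_MARKERS = ["SOX2", "POU5F1", "NANOG"]  # illustrative subset of stemness markers


def pick_root_cell(cells, var_names, markers=STEM_MARKERS):
    """Pick the cell with the highest mean marker expression.

    cells: dict mapping cell ID -> dict of gene -> expression value.
    var_names: genes measured in the dataset.
    Falls back to the first cell when no marker is measured.
    """
    available = [gene for gene in markers if gene in var_names]
    if not available:
        return next(iter(cells))

    def score(cell_id):
        return sum(cells[cell_id][gene] for gene in available) / len(available)

    # max() keeps the first cell on ties, so the result is deterministic.
    return max(cells, key=score)


cells = {
    "cell_a": {"SOX2": 0.1, "NANOG": 0.0, "GATA1": 2.0},
    "cell_b": {"SOX2": 1.8, "NANOG": 1.2, "GATA1": 0.1},
    "cell_c": {"SOX2": 0.9, "NANOG": 0.3, "GATA1": 0.5},
}
print(pick_root_cell(cells, ["SOX2", "NANOG", "GATA1"]))  # prints cell_b
```

A single high-scoring cell can be an outlier, so passing an explicit `--start-cell` or `--start-cell-type` is the safer choice when the progenitor population is known.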
## Methods
### Diffusion Pseudotime (DPT)
- Uses diffusion maps to capture non-linear cell relationships
- Robust to noise and dataset size
- Good for complex branching trajectories
### Slingshot
- Principal curve-based approach
- Simultaneous inference of multiple lineages
- Requires R installation with rpy2 bridge
### PAGA (Partition-based Graph Abstraction)
- Connects clusters based on transcriptome similarity
- Provides coarse-grained trajectory overview
- Fast and scalable
### Palantir
- Diffusion-based fate probability estimation
- Soft lineage assignments
- Best for fate bias analysis
## Limitations
- Requires high-quality single-cell data with good cell type coverage
- Assumes differentiation is the main source of variation
- May not capture rare transitional states with few cells
- Circular or cyclic processes not well represented by linear pseudotime
- RNA velocity requires spliced/unspliced counts in AnnData layers
## Safety & Best Practices
- **Validate trajectories** with known marker genes and biological knowledge
- **Multiple methods** recommended for critical analyses
- **Batch effects** should be corrected before trajectory inference
- **Cell cycle** effects may confound differentiation trajectories
- **Do not overinterpret** precise pseudotime values as absolute time
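One simple way to validate a trajectory against known markers is to check that a progenitor marker falls and a maturation marker rises with pseudotime. The sketch below computes a Spearman rank correlation in pure Python for that purpose. The function names and the five data points are made up for illustration, and the check is a rough plausibility test rather than a statistical validation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


pseudotime = [0.05, 0.20, 0.42, 0.70, 0.95]
progenitor_marker = [2.1, 1.8, 1.0, 0.4, 0.1]  # expected to fall along the trajectory
mature_marker = [0.0, 0.2, 0.9, 1.5, 2.2]      # expected to rise along the trajectory
print(round(spearman(pseudotime, progenitor_marker), 3))
print(round(spearman(pseudotime, mature_marker), 3))
```

A strongly positive correlation for the progenitor marker would suggest that the root was placed at the wrong end of the trajectory.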
## Example Workflow
```python
# Preprocess data with scanpy (before using this tool)
import scanpy as sc
adata = sc.read_h5ad('raw_data.h5ad')
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.scale(adata)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
adata.write('data.h5ad')
# Then run this skill
# python scripts/main.py --input data.h5ad --start-cell-type progenitor
```
## References
- Haghverdi et al. (2016) - Diffusion pseudotime
- Street et al. (2018) - Slingshot
- Wolf et al. (2019) - PAGA
- Setty et al. (2019) - Palantir
- La Manno et al. (2018) - RNA velocity
## Version
- Created: 2026-02-06
- Status: Functional
- Version: 1.0.0
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
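The two path-related items in the checklist above can be checked with a small guard. This is a minimal sketch assuming a single workspace root; `resolve_inside` and the `/srv/workspace` path are hypothetical and not part of the packaged script.

```python
from pathlib import Path


def resolve_inside(workspace, candidate):
    """Resolve candidate against workspace and reject paths that escape it."""
    root = Path(workspace).resolve()
    # An absolute candidate replaces root in the join and is rejected below
    # unless it already points inside the workspace.
    target = (root / candidate).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"Path escapes workspace: {candidate}")
    return target


for candidate in ["results/plots", "../outside"]:
    try:
        print("ok:", resolve_inside("/srv/workspace", candidate))
    except ValueError as err:
        print("rejected:", err)
```

Resolving before comparing matters because a path such as `results/../../outside` looks like it starts inside the workspace until the `..` segments are collapsed.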
## Prerequisites
```text
# Python dependencies
pip install -r requirements.txt
```
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `pseudotime-trajectory-viz` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `pseudotime-trajectory-viz` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
## Inputs to Collect
- Required inputs: the user goal, the primary data or source file, and the requested output format.
- Optional inputs: output directory, formatting preferences, and validation constraints.
- If a required input is unavailable, return a short clarification request before continuing.
## Output Contract
- Return a short summary, the main deliverables, and any assumptions that materially affect interpretation.
- If execution is partial, label what succeeded, what failed, and the next safe recovery step.
- Keep the final answer within the documented scope of the skill.
## Validation and Safety Rules
- Validate identifiers, file paths, and user-provided parameters before execution.
- Do not fabricate results, metrics, citations, or downstream conclusions.
- Use safe fallback behavior when dependencies, credentials, or required inputs are missing.
- Surface any execution failure with a concise diagnosis and recovery path.
FILE:README.md
# Pseudotime Trajectory Visualization
Visualize single-cell developmental trajectories showing how cells differentiate from stem cells to mature cells.
## Quick Start
```bash
# Install dependencies
pip install -r requirements.txt
# Generate example data (optional)
python examples/generate_example_data.py example_data.h5ad
# Run analysis
python scripts/main.py --input example_data.h5ad --output ./results
# With specific parameters
python scripts/main.py \
--input data.h5ad \
--start-cell-type Stem \
--method diffusion \
--genes SOX2,NANOG,POU5F1 \
--plot-genes \
--output ./results
```
## Features
- **Diffusion Pseudotime (DPT)**: Robust trajectory inference using diffusion maps
- **PAGA**: Partition-based graph abstraction for trajectory visualization
- **Gene Expression Trends**: Track marker gene expression along pseudotime
- **Lineage Assignment**: Automatic or user-defined lineage branching
- **Publication-Ready Plots**: High-resolution figures in multiple formats
## Output Files
```
results/
├── trajectory_plot.png # Main trajectory visualization
├── paga_graph.png # PAGA connectivity graph (if method=paga)
├── gene_expression_heatmap.png # Gene dynamics heatmap
├── gene_trends/
│ ├── SOX2_trend.png # Individual gene plots
│ └── ...
├── pseudotime_values.csv # Cell-level pseudotime data
├── analysis_report.json # Analysis metadata
└── trajectory_data.h5ad # Updated AnnData object
```
## Input Data Format
The tool expects a preprocessed AnnData (.h5ad) file:
```python
import scanpy as sc
# Load your data
adata = sc.read_h5ad('your_data.h5ad')
# Required annotations:
# - adata.obs['leiden'] or other cluster labels
# - adata.obs['cell_type'] (optional, for root detection)
# Preprocessing steps (if not done):
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
# Save for analysis
adata.write('preprocessed_data.h5ad')
```
## Command Line Options
```
--input, -i Input AnnData file (required)
--output, -o Output directory (default: ./trajectory_output)
--embedding Embedding for viz: umap, tsne, pca, diffmap (default: umap)
--method Trajectory method: diffusion, paga (default: diffusion)
--start-cell Root cell ID for trajectory origin
--start-cell-type Cell type to use as starting point
--n-lineages Expected number of lineage branches
--cluster-key Column name for clusters (default: leiden)
--cell-type-key Column name for cell types (default: cell_type)
--genes Comma-separated gene names to plot
--plot-genes Generate gene expression plots
--format Output format: png, pdf, svg (default: png)
--dpi Figure resolution (default: 300)
```
## Examples
### Basic Usage
```bash
python scripts/main.py --input data.h5ad --output ./results
```
### Specify Root Cell Type
```bash
python scripts/main.py \
--input data.h5ad \
--start-cell-type "Progenitor" \
--output ./results
```
### Visualize Marker Genes
```bash
python scripts/main.py \
--input data.h5ad \
--genes SOX2,OCT4,NANOG,NESTIN \
--plot-genes \
--output ./results
```
### Use PAGA Method
```bash
python scripts/main.py \
--input data.h5ad \
--method paga \
--embedding umap \
--output ./results
```
### Generate PDF Figures
```bash
python scripts/main.py \
--input data.h5ad \
--format pdf \
--dpi 600 \
--output ./results
```
## Troubleshooting
**Issue**: "No root cell detected"
**Solution**: Specify `--start-cell` or `--start-cell-type` explicitly
**Issue**: Memory error with large datasets
**Solution**: Subsample cells or increase system RAM
**Issue**: Trajectory doesn't match biology
**Solution**: Verify root cell selection and try different methods
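For the memory issue above, uniform subsampling (for example with scanpy's `sc.pp.subsample`) can remove most cells of a rare cluster. The sketch below shows a per-cluster alternative in plain Python that keeps a minimum number of cells from every cluster. `subsample_per_cluster`, its parameters, and the example sizes are hypothetical and not part of this tool.

```python
import random
from collections import defaultdict


def subsample_per_cluster(cell_clusters, fraction, min_cells=50, seed=0):
    """Keep `fraction` of each cluster, but never fewer than `min_cells`.

    cell_clusters: dict mapping cell ID -> cluster label.
    Clusters smaller than `min_cells` are kept whole.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_cluster = defaultdict(list)
    for cell_id, cluster in cell_clusters.items():
        by_cluster[cluster].append(cell_id)
    kept = []
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        n_keep = min(len(members), max(min_cells, round(len(members) * fraction)))
        kept.extend(rng.sample(members, n_keep))
    return kept


# 1000 cells in a large cluster and 20 in a rare one.
labels = {f"cell_{i}": "0" for i in range(1000)}
labels.update({f"rare_{i}": "1" for i in range(20)})
kept = subsample_per_cluster(labels, fraction=0.1)
print(len(kept))  # prints 120: 100 from the large cluster, all 20 rare cells
```

The kept IDs can then be used to slice the AnnData object (for example `adata[kept].copy()`) before writing the smaller file passed to `--input`.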
## References
- Haghverdi et al. (2016) - Diffusion pseudotime robustly reconstructs lineage branching
- Wolf et al. (2019) - PAGA: graph abstraction reconciles clustering with trajectory inference
## License
MIT License
FILE:references/runtime_checklist.md
# Runtime Checklist
- Category: `Data Analysis`
- Validate the user goal, required inputs, and output format before taking action.
- Ask a targeted clarification question when a required input is missing.
- Keep the response scoped to the documented workflow and state assumptions explicitly.
- Run a non-destructive smoke check before any file-dependent or data-dependent command.
- Recommended smoke check: `python -m py_compile scripts/main.py`
- If execution fails, stop and return a concise recovery path instead of fabricating results.
FILE:requirements.txt
scanpy>=1.9.0
anndata>=0.8.0
matplotlib>=3.5.0
seaborn>=0.12.0
numpy>=1.21.0
pandas>=1.3.0
scipy>=1.7.0
scikit-learn>=1.0.0
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Pseudotime Trajectory Visualization Tool
Visualize single-cell developmental trajectories showing how cells
differentiate from stem cells to mature cells.
Author: OpenClaw
Date: 2026-02-06
"""
import argparse
import json
import os
import sys
import warnings
from datetime import datetime
from pathlib import Path
import anndata
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
import seaborn as sns
from matplotlib.backends.backend_pdf import PdfPages
# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')
# Set matplotlib defaults for publication quality
plt.rcParams['figure.dpi'] = 300
plt.rcParams['savefig.dpi'] = 300
plt.rcParams['font.size'] = 10
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['legend.fontsize'] = 9
def parse_arguments():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description='Visualize single-cell developmental trajectories (pseudotime)',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --input data.h5ad --output ./results
  %(prog)s --input data.h5ad --start-cell-type progenitor --method diffusion
  %(prog)s --input data.h5ad --genes SOX2,OCT4,NANOG --plot-genes
        """
    )
    # Required arguments
    parser.add_argument('--input', '-i', type=str, required=True,
                        help='Input AnnData (.h5ad) file path')
    parser.add_argument('--output', '-o', type=str, default='./trajectory_output',
                        help='Output directory for results (default: ./trajectory_output)')
    # Embedding and method selection
    parser.add_argument('--embedding', type=str, default='umap',
                        choices=['umap', 'tsne', 'pca', 'diffmap'],
                        help='Embedding for visualization (default: umap)')
    parser.add_argument('--method', type=str, default='diffusion',
                        choices=['diffusion', 'paga'],
                        help='Trajectory inference method (default: diffusion)')
    # Trajectory parameters
    parser.add_argument('--start-cell', type=str, default=None,
                        help='Root cell ID for trajectory origin')
    parser.add_argument('--start-cell-type', type=str, default=None,
                        help='Cell type to use as trajectory starting point')
    parser.add_argument('--n-lineages', type=int, default=None,
                        help='Number of expected lineage branches (auto-detect if not specified)')
    # Data keys
    parser.add_argument('--cluster-key', type=str, default='leiden',
                        help='AnnData obs key for cell clusters (default: leiden)')
    parser.add_argument('--cell-type-key', type=str, default='cell_type',
                        help='AnnData obs key for cell type annotations (default: cell_type)')
    # Gene expression plotting
    parser.add_argument('--genes', type=str, default=None,
                        help='Comma-separated gene names to plot along pseudotime')
    parser.add_argument('--plot-genes', action='store_true',
                        help='Generate gene expression heatmaps along trajectories')
    parser.add_argument('--plot-branch', action='store_true', default=True,
                        help='Show lineage branch probabilities')
    # Output options
    parser.add_argument('--format', type=str, default='png',
                        choices=['png', 'pdf', 'svg'],
                        help='Output figure format (default: png)')
    parser.add_argument('--dpi', type=int, default=300,
                        help='Figure resolution (default: 300)')
    # Analysis parameters
    parser.add_argument('--n-pcs', type=int, default=30,
                        help='Number of principal components (default: 30)')
    parser.add_argument('--n-neighbors', type=int, default=15,
                        help='Number of neighbors for graph (default: 15)')
    parser.add_argument('--diffmap-components', type=int, default=5,
                        help='Number of diffusion components (default: 5)')
    return parser.parse_args()
def load_data(input_path):
    """Load AnnData object from file."""
    print(f"Loading data from {input_path}...")
    if not os.path.exists(input_path):
        raise FileNotFoundError(f"Input file not found: {input_path}")
    try:
        adata = sc.read_h5ad(input_path)
        print(f"Loaded {adata.n_obs} cells and {adata.n_vars} genes")
        return adata
    except Exception as e:
        raise ValueError(f"Error loading AnnData file: {e}")
def preprocess_data(adata, args):
    """Preprocess data for trajectory analysis."""
    print("Preprocessing data...")
    # Check for required annotations
    if args.cluster_key not in adata.obs.columns:
        print(f"Warning: Cluster key '{args.cluster_key}' not found. Computing Leiden clustering...")
        if 'neighbors' not in adata.uns:
            sc.pp.neighbors(adata, n_neighbors=args.n_neighbors, n_pcs=args.n_pcs)
        sc.tl.leiden(adata, key_added=args.cluster_key)
    # Compute embedding if not present
    embedding_key = f'X_{args.embedding}'
    if embedding_key not in adata.obsm.keys():
        print(f"Computing {args.embedding.upper()} embedding...")
        if 'neighbors' not in adata.uns:
            sc.pp.neighbors(adata, n_neighbors=args.n_neighbors, n_pcs=args.n_pcs)
        if args.embedding == 'umap':
            sc.tl.umap(adata)
        elif args.embedding == 'tsne':
            sc.tl.tsne(adata)
        elif args.embedding == 'pca':
            # Without this branch a missing X_pca was never computed for --embedding pca
            if 'X_pca' not in adata.obsm:
                sc.tl.pca(adata, n_comps=args.n_pcs)
        elif args.embedding == 'diffmap':
            sc.tl.diffmap(adata, n_comps=args.diffmap_components)
    # Compute highly variable genes if not present
    if 'highly_variable' not in adata.var.columns:
        print("Computing highly variable genes...")
        sc.pp.highly_variable_genes(adata, n_top_genes=2000)
    return adata
def find_root_cell(adata, args):
    """Identify root cell for trajectory inference."""
    print("Identifying root cell...")
    root_cell = None
    # Option 1: Use specified cell ID
    if args.start_cell:
        if args.start_cell in adata.obs_names:
            root_cell = args.start_cell
            print(f"Using specified root cell: {root_cell}")
            return root_cell
        else:
            print(f"Warning: Specified start cell '{args.start_cell}' not found in data")
    # Option 2: Use cell type annotation
    if args.start_cell_type and args.cell_type_key in adata.obs.columns:
        cell_types = adata.obs[args.cell_type_key].values
        if args.start_cell_type in cell_types:
            # Find cell with highest stemness marker expression
            stem_markers = ['SOX2', 'POU5F1', 'NANOG', 'PROM1', 'THY1', 'KIT']
            available_markers = [g for g in stem_markers if g in adata.var_names]
            mask = cell_types == args.start_cell_type
            if available_markers:
                expr = adata[mask, available_markers].X.mean(axis=1)
                if hasattr(expr, 'A1'):
                    expr = expr.A1
                root_idx = np.where(mask)[0][np.argmax(expr)]
            else:
                root_idx = np.where(mask)[0][0]
            root_cell = adata.obs_names[root_idx]
            print(f"Selected root cell from '{args.start_cell_type}': {root_cell}")
            return root_cell
        else:
            print(f"Warning: Cell type '{args.start_cell_type}' not found")
    # Option 3: Auto-detect based on stemness markers
    stem_markers = ['SOX2', 'POU5F1', 'OCT4', 'NANOG', 'PROM1', 'THY1', 'KIT', 'CD34']
    available_markers = [g for g in stem_markers if g in adata.var_names]
    if available_markers:
        print(f"Using stemness markers to find root: {available_markers}")
        expr = adata[:, available_markers].X.mean(axis=1)
        if hasattr(expr, 'A1'):
            expr = expr.A1
        root_idx = np.argmax(expr)
        root_cell = adata.obs_names[root_idx]
        print(f"Auto-selected root cell: {root_cell}")
    else:
        # Fallback: use first cell
        root_cell = adata.obs_names[0]
        print(f"No markers found. Using first cell as root: {root_cell}")
    return root_cell
def compute_diffusion_pseudotime(adata, root_cell, args):
    """Compute diffusion pseudotime using scanpy."""
    print("Computing diffusion pseudotime...")
    # Compute diffusion map
    sc.tl.diffmap(adata, n_comps=args.diffmap_components)
    # Get root cell index
    root_idx = np.where(adata.obs_names == root_cell)[0][0]
    adata.uns['iroot'] = root_idx
    # Compute DPT
    sc.tl.dpt(adata, n_dcs=args.diffmap_components)
    # Get pseudotime values
    pseudotime = adata.obs['dpt_pseudotime'].values
    print(f"Pseudotime range: {pseudotime.min():.3f} - {pseudotime.max():.3f}")
    # Infer lineages based on clusters
    n_lineages = args.n_lineages or min(3, adata.obs[args.cluster_key].nunique())
    # Simple lineage assignment based on terminal branches
    lineage_assignments = assign_lineages(adata, n_lineages, args)
    adata.obs['lineage'] = lineage_assignments
    return adata
def assign_lineages(adata, n_lineages, args):
    """Assign cells to lineages based on trajectory branching."""
    # Simplified lineage assignment based on clusters and pseudotime
    clusters = adata.obs[args.cluster_key].values
    pseudotime = adata.obs['dpt_pseudotime'].values
    # Find terminal clusters (high pseudotime)
    cluster_pseudotime = {}
    for c in np.unique(clusters):
        mask = clusters == c
        cluster_pseudotime[c] = pseudotime[mask].mean()
    # Sort clusters by pseudotime
    sorted_clusters = sorted(cluster_pseudotime.items(), key=lambda x: x[1])
    # Assign lineages
    lineages = np.array(['lineage_1'] * adata.n_obs, dtype=object)
    if n_lineages > 1:
        # Simple heuristic: split the [0, 1] pseudotime range into equal-width bins
        pseudotime_bins = np.linspace(0, 1, n_lineages + 1)
        for i in range(n_lineages):
            lower, upper = pseudotime_bins[i], pseudotime_bins[i + 1]
            if i == n_lineages - 1:
                # Close the last bin on the right so cells at pseudotime 1.0
                # are not left in the default 'lineage_1'
                mask = (pseudotime >= lower) & (pseudotime <= upper)
            else:
                mask = (pseudotime >= lower) & (pseudotime < upper)
            lineages[mask] = f'lineage_{i+1}'
    return lineages
def compute_paga_trajectory(adata, root_cell, args):
    """Compute trajectory using PAGA (Partition-based Graph Abstraction)."""
    print("Computing PAGA trajectory...")
    # Compute PAGA
    sc.tl.paga(adata, groups=args.cluster_key)
    # Get root cluster
    root_idx = np.where(adata.obs_names == root_cell)[0][0]
    root_cluster = adata.obs[args.cluster_key].iloc[root_idx]
    # Use PAGA to initialize embedding
    sc.tl.draw_graph(adata, init_pos=args.embedding)
    # Estimate pseudotime from PAGA distances
    paga_distances = adata.uns['paga']['connectivities'].toarray()
    # Simple pseudotime: distance from root cluster
    cluster_order = list(adata.obs[args.cluster_key].unique())
    if root_cluster in cluster_order:
        root_idx_cluster = cluster_order.index(root_cluster)
    else:
        root_idx_cluster = 0
    # Assign pseudotime based on cluster
    cluster_pseudotime = {}
    for i, c in enumerate(cluster_order):
        # Distance as pseudotime proxy
        cluster_pseudotime[c] = min(1.0, abs(i - root_idx_cluster) / max(1, len(cluster_order) - 1))
    pseudotime = np.array([cluster_pseudotime[c] for c in adata.obs[args.cluster_key]])
    # Add small within-cluster noise; a fixed seed keeps repeated runs reproducible
    rng = np.random.default_rng(0)
    pseudotime = pseudotime + rng.normal(0, 0.05, len(pseudotime))
    pseudotime = np.clip(pseudotime, 0, 1)
    adata.obs['paga_pseudotime'] = pseudotime
    adata.obs['dpt_pseudotime'] = pseudotime  # Use same key for consistency
    # Assign lineages
    n_lineages = args.n_lineages or min(3, len(cluster_order))
    lineage_assignments = assign_lineages(adata, n_lineages, args)
    adata.obs['lineage'] = lineage_assignments
    return adata
def plot_trajectory(adata, args, output_dir):
    """Generate main trajectory visualization."""
    print("Generating trajectory plot...")
    fig, axes = plt.subplots(2, 2, figsize=(14, 12))
    embedding_key = f'X_{args.embedding}'
    # Plot 1: Embedding colored by pseudotime
    ax = axes[0, 0]
    sc.pl.embedding(
        adata, basis=args.embedding, color='dpt_pseudotime',
        ax=ax, show=False, color_map='viridis_r',
        title='Pseudotime Trajectory'
    )
    # Plot 2: Embedding colored by cluster
    ax = axes[0, 1]
    sc.pl.embedding(
        adata, basis=args.embedding, color=args.cluster_key,
        ax=ax, show=False, legend_loc='on data',
        title='Cell Clusters'
    )
    # Plot 3: Embedding colored by lineage
    ax = axes[1, 0]
    sc.pl.embedding(
        adata, basis=args.embedding, color='lineage',
        ax=ax, show=False,
        title='Lineage Assignment'
    )
    # Plot 4: Pseudotime distribution
    ax = axes[1, 1]
    pseudotime = adata.obs['dpt_pseudotime'].values
    for lineage in adata.obs['lineage'].unique():
        mask = adata.obs['lineage'] == lineage
        ax.hist(pseudotime[mask], bins=30, alpha=0.6, label=lineage, density=True)
    ax.set_xlabel('Pseudotime')
    ax.set_ylabel('Density')
    ax.set_title('Pseudotime Distribution by Lineage')
    ax.legend()
    plt.tight_layout()
    output_path = os.path.join(output_dir, f'trajectory_plot.{args.format}')
    plt.savefig(output_path, dpi=args.dpi, bbox_inches='tight')
    plt.close()
    print(f"Saved: {output_path}")
    return output_path
def plot_paga_graph(adata, args, output_dir):
    """Plot PAGA graph if available."""
    if 'paga' not in adata.uns:
        return None
    print("Generating PAGA graph...")
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    # PAGA connectivity
    sc.pl.paga(adata, ax=axes[0], show=False,
               title='PAGA Graph (Clusters)')
    # PAGA on embedding
    sc.pl.paga_compare(
        adata, basis=args.embedding, ax=axes[1], show=False,
        title='PAGA Graph on Embedding'
    )
    plt.tight_layout()
    output_path = os.path.join(output_dir, f'paga_graph.{args.format}')
    plt.savefig(output_path, dpi=args.dpi, bbox_inches='tight')
    plt.close()
    print(f"Saved: {output_path}")
    return output_path
def plot_gene_expression(adata, genes, args, output_dir):
"""Plot gene expression along pseudotime."""
if not genes:
return []
print(f"Generating gene expression plots for {len(genes)} genes...")
output_files = []
# Create gene trends directory
gene_trends_dir = os.path.join(output_dir, 'gene_trends')
os.makedirs(gene_trends_dir, exist_ok=True)
# Filter available genes
available_genes = [g for g in genes if g in adata.var_names]
missing_genes = set(genes) - set(available_genes)
if missing_genes:
print(f"Warning: Genes not found in data: {missing_genes}")
if not available_genes:
print("No valid genes to plot")
return output_files
# Individual gene plots
for gene in available_genes:
fig, ax = plt.subplots(figsize=(8, 5))
pseudotime = adata.obs['dpt_pseudotime'].values
expression = adata[:, gene].X.toarray().flatten() if hasattr(adata[:, gene].X, 'toarray') else adata[:, gene].X.flatten()
# Scatter plot with trend line
for lineage in adata.obs['lineage'].unique():
mask = adata.obs['lineage'] == lineage
ax.scatter(pseudotime[mask], expression[mask], alpha=0.3, s=10, label=lineage)
# Fit smoothing spline
if mask.sum() > 10:
from scipy.interpolate import UnivariateSpline
idx = np.argsort(pseudotime[mask])
x = pseudotime[mask][idx]
y = expression[mask][idx]
try:
spline = UnivariateSpline(x, y, s=len(x))
x_smooth = np.linspace(x.min(), x.max(), 100)
ax.plot(x_smooth, spline(x_smooth), linewidth=2, label=f'{lineage} trend')
except Exception:
# Skip the trend line if the spline fit fails; the scatter points are still drawn
pass
ax.set_xlabel('Pseudotime')
ax.set_ylabel(f'{gene} Expression')
ax.set_title(f'{gene} Expression along Trajectory')
ax.legend(loc='best')
output_path = os.path.join(gene_trends_dir, f'{gene}_trend.{args.format}')
plt.tight_layout()
plt.savefig(output_path, dpi=args.dpi, bbox_inches='tight')
plt.close()
output_files.append(output_path)
# Combined heatmap
fig, ax = plt.subplots(figsize=(10, 8))
# Prepare data for heatmap
expr_matrix = []
gene_labels = []
for gene in available_genes:
expression = adata[:, gene].X.toarray().flatten() if hasattr(adata[:, gene].X, 'toarray') else adata[:, gene].X.flatten()
expr_matrix.append(expression)
gene_labels.append(gene)
expr_matrix = np.array(expr_matrix)
# Sort cells by pseudotime
pseudotime = adata.obs['dpt_pseudotime'].values
sort_idx = np.argsort(pseudotime)
expr_matrix_sorted = expr_matrix[:, sort_idx]
# Normalize per gene (z-score)
expr_matrix_norm = (expr_matrix_sorted - expr_matrix_sorted.mean(axis=1, keepdims=True)) / (
expr_matrix_sorted.std(axis=1, keepdims=True) + 1e-8
)
# Plot heatmap
sns.heatmap(expr_matrix_norm, xticklabels=False, yticklabels=gene_labels,
cmap='RdBu_r', center=0, ax=ax, cbar_kws={'label': 'Z-score'})
ax.set_xlabel('Cells (ordered by pseudotime)')
ax.set_title('Gene Expression Heatmap along Pseudotime')
output_path = os.path.join(output_dir, f'gene_expression_heatmap.{args.format}')
plt.tight_layout()
plt.savefig(output_path, dpi=args.dpi, bbox_inches='tight')
plt.close()
output_files.append(output_path)
print(f"Saved {len(output_files)} gene expression plots")
return output_files
def save_results(adata, args, output_dir, root_cell):
"""Save analysis results to files."""
print("Saving results...")
# Save pseudotime values
results_df = pd.DataFrame({
'cell_id': adata.obs_names,
'cluster': adata.obs[args.cluster_key].values,
'pseudotime': adata.obs['dpt_pseudotime'].values,
'lineage': adata.obs['lineage'].values
})
if args.cell_type_key in adata.obs.columns:
results_df['cell_type'] = adata.obs[args.cell_type_key].values
results_path = os.path.join(output_dir, 'pseudotime_values.csv')
results_df.to_csv(results_path, index=False)
print(f"Saved: {results_path}")
# Save analysis report
lineages = {}
for lineage in adata.obs['lineage'].unique():
mask = adata.obs['lineage'] == lineage
lineages[lineage] = {
'cell_count': int(mask.sum()),
'mean_pseudotime': float(adata.obs['dpt_pseudotime'][mask].mean()),
'clusters': [str(c) for c in adata.obs[args.cluster_key][mask].unique()]  # str() keeps the report JSON-serializable
}
report = {
'analysis_date': datetime.now().isoformat(),
'method': args.method,
'n_cells': adata.n_obs,
'n_genes': adata.n_vars,
'n_lineages': len(adata.obs['lineage'].unique()),
'root_cell': root_cell,
'pseudotime_range': [
float(adata.obs['dpt_pseudotime'].min()),
float(adata.obs['dpt_pseudotime'].max())
],
'lineages': lineages,
'parameters': {
'embedding': args.embedding,
'n_pcs': args.n_pcs,
'n_neighbors': args.n_neighbors,
'cluster_key': args.cluster_key
}
}
report_path = os.path.join(output_dir, 'analysis_report.json')
with open(report_path, 'w') as f:
json.dump(report, f, indent=2)
print(f"Saved: {report_path}")
# Save updated AnnData
adata_path = os.path.join(output_dir, 'trajectory_data.h5ad')
adata.write(adata_path)
print(f"Saved: {adata_path}")
return results_path, report_path, adata_path
def main():
"""Main analysis pipeline."""
args = parse_arguments()
# Create output directory
os.makedirs(args.output, exist_ok=True)
print(f"Output directory: {args.output}")
try:
# Load data
adata = load_data(args.input)
# Preprocess
adata = preprocess_data(adata, args)
# Find root cell
root_cell = find_root_cell(adata, args)
# Compute trajectory
if args.method == 'diffusion':
adata = compute_diffusion_pseudotime(adata, root_cell, args)
elif args.method == 'paga':
adata = compute_paga_trajectory(adata, root_cell, args)
# Generate visualizations
plot_trajectory(adata, args, args.output)
if args.method == 'paga':
plot_paga_graph(adata, args, args.output)
# Gene expression plots
if args.genes or args.plot_genes:
gene_list = [g.strip() for g in args.genes.split(',') if g.strip()] if args.genes else []
if not gene_list and args.plot_genes:
# Use highly variable genes
if 'highly_variable' in adata.var.columns:
gene_list = adata.var_names[adata.var['highly_variable']][:20].tolist()
plot_gene_expression(adata, gene_list, args, args.output)
# Save results
save_results(adata, args, args.output, root_cell)
print("\n" + "="*50)
print("Analysis complete!")
print(f"Results saved to: {args.output}")
print("="*50)
return 0
except Exception as e:
print(f"\nError: {e}", file=sys.stderr)
import traceback
traceback.print_exc()
return 1
if __name__ == '__main__':
sys.exit(main())
Generate PyMOL scripts to highlight specific protein residues in PDB structures. Use this skill when the user needs to visualize specific amino acid residues...
---
name: protein-struct-viz
description: Generate PyMOL scripts to highlight specific protein residues in PDB
structures. Use this skill when the user needs to visualize specific amino acid
residues, create publication-quality protein images, or highlight functional sites
in protein structures.
version: 1.0.0
category: Bioinfo
tags: []
author: AIPOCH
license: MIT
status: Draft
risk_level: High
skill_type: Hybrid (Tool/Script + Network/API)
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
---
# protein-struct-viz
Generate PyMOL scripts for highlighting specific protein residues in molecular structures.
## Overview
This skill creates PyMOL command scripts to visualize protein structures with specific residues highlighted using various representation styles (sticks, spheres, surface, etc.).
## Usage
The skill generates `.pml` script files that can be executed directly in PyMOL to:
- Load PDB structures
- Apply custom color schemes
- Highlight specific residues with different representation styles
- Create publication-ready visualization settings
### Input Parameters
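| Parameter | Type | Description |
|-----------|------|-------------|
| `--pdb` | string | Path to a PDB file or a 4-character PDB ID (e.g., "1mbn") |
| `--residues` | string | Comma-separated residue specifications (chain:resnum:resname) |
| `--style` | string | Representation for highlighted residues: "sticks" (default), "spheres", "surface", "cartoon", "lines", "mesh" |
| `--color_scheme` | string | Global color scheme: "rainbow", "chain", "element", "secondary", "gray" (default), "white", "blue_red", or a PyMOL color name |
| `--output` | string | Output filename for the generated script (default: `output.pml`) |
| `--ray_trace` | flag | Include ray-tracing commands for high-quality images |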
### Residue Specification Format
- Format: `chain:resnum:resname`, `chain:resnum`, or `resnum` (no chain filter is applied)
- Examples: `A:145:ASP`, `B:23:LYS`, `156`
- Wildcards: `A:*` (all residues in chain A)
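
As a minimal sketch of how these specifications could map to PyMOL selection expressions, the snippet below splits a specification on `:` and joins the resulting terms with `and`. The function names `parse_residue_spec` and `to_selection` are illustrative, not part of the packaged script, and the treatment of `*` as "whole chain" is an assumption made for this example.

```python
from typing import Optional, Tuple


def parse_residue_spec(spec: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split 'chain:resnum:resname', 'chain:resnum', or 'resnum' into parts."""
    parts = [p.strip() for p in spec.split(":")]
    if not 1 <= len(parts) <= 3 or any(p == "" for p in parts):
        raise ValueError(f"Invalid residue specification: {spec!r}")
    if len(parts) == 1:
        return None, parts[0], None
    if len(parts) == 2:
        return parts[0], parts[1], None
    return parts[0], parts[1], parts[2]


def to_selection(spec: str) -> str:
    """Build a PyMOL selection expression from a residue specification."""
    chain, resnum, resname = parse_residue_spec(spec)
    terms = []
    if chain:
        terms.append(f"chain {chain}")
    if resnum != "*":  # assumption: 'A:*' means the whole chain, so no resi term
        terms.append(f"resi {resnum}")
    if resname:
        terms.append(f"resn {resname}")
    return " and ".join(terms) if terms else "all"


for example in ("A:145:ASP", "B:23:LYS", "156", "A:*"):
    print(f"{example:>10} -> {to_selection(example)}")
```

Because the terms are joined with `and`, a residue name that does not match the structure produces an empty selection rather than an error.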
## Example
```bash
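python scripts/main.py --pdb 1mbn --residues "A:64:HIS,A:93:HIS,A:68:VAL" --style sticks --color_scheme rainbow --output myoglobin_active_site.pml
```
This generates a PyMOL script highlighting the distal histidine (His64), the proximal histidine (His93), and Val68 in myoglobin's heme pocket. The residue name in each specification must match the structure; a mismatched name yields an empty selection.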
## Output
Generated `.pml` script includes:
1. Structure loading commands
2. Background and lighting settings
3. Global representation settings
4. Specific residue highlighting
5. View optimization commands
6. Optional: ray tracing for high-quality images
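
The six parts listed above can be sketched as a small script builder. This is an illustration under stated assumptions, not the packaged generator: `build_pml`, the selection names, the 1200x1200 image size, and `figure.png` are all hypothetical, while the emitted commands (`fetch`, `bg_color`, `set`, `hide`, `show`, `color`, `select`, `zoom`, `ray`, `png`) are standard PyMOL commands.

```python
from typing import Dict, List


def build_pml(pdb_id: str, highlights: Dict[str, str], style: str = "sticks",
              ray_trace: bool = False) -> str:
    """Assemble a PyMOL command script section by section.

    highlights maps a selection name to a PyMOL selection expression.
    """
    lines: List[str] = [
        f"fetch {pdb_id}",   # 1. structure loading
        "bg_color white",    # 2. background and lighting
        "set antialias, 2",
        "hide everything",   # 3. global representation
        "show cartoon",
        "color gray70",
    ]
    for name, expression in highlights.items():  # 4. residue highlighting
        lines.append(f"select {name}, {expression}")
        lines.append(f"show {style}, {name}")
    if highlights:                               # 5. view optimization
        combined = " or ".join(highlights)
        lines.append(f"zoom {combined}, 10")
    if ray_trace:                                # 6. optional ray tracing
        lines.append("ray 1200, 1200")
        lines.append("png figure.png")           # hypothetical output name
    return "\n".join(lines) + "\n"


print(build_pml("1mbn", {"his64": "chain A and resi 64"}))
```

Zooming on the union of the named selections avoids relying on a selection such as `sele`, which only exists after an interactive pick.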
## References
See `references/` directory for:
- PyMOL command reference
- Color palette templates
- Example scripts for common visualization tasks
## Technical Difficulty
Medium - requires understanding of PyMOL scripting syntax and protein structure concepts.
## Dependencies
- PyMOL (installed separately)
- Python 3.7+
- No Python package dependencies (generates plain text scripts)
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python scripts with tools | High |
| Network Access | External API calls | High |
| File System Access | Read/write data | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Data handled securely | Medium |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] API requests use HTTPS only
- [ ] Input validated against allowed patterns
- [ ] API timeout and retry mechanisms implemented
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no internal paths exposed)
- [ ] Dependencies audited
- [ ] No exposure of internal service architecture
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
FILE:references/pymol_commands.md
# PyMOL Command Reference
## Basic Commands
### Structure Loading
```
fetch <pdb_id> # Download from RCSB PDB
load <file.pdb> # Load local file
cmd.read_pdbstr(<str>, <name>) # Python API: load from a PDB string
```
### Selection Syntax
```
selection-name: selection-expression
# Selection operators
and # intersection
or # union
not # negation (complement)
around X # atoms within X Angstroms
expand X # expand selection by X Angstroms
byres # extend to complete residues
```
### Selection Examples
```
chain A # All atoms in chain A
resi 145 # Residue number 145
resn ASP # All aspartate residues
name CA # All alpha carbons
chain A and resi 145 # Residue 145 in chain A
resi 145-160 # Residue range
center chain A # Command (not a selection): center the view on chain A
```
## Representation Commands
```
show <representation> [, <selection>]
hide <representation> [, <selection>]
# Representations:
lines # Lines between bonded atoms
sticks # Cylinders for bonds
spheres # Spheres for atoms
surface # Molecular surface (solvent-excluded by default)
mesh # Mesh surface
dots # Dotted surface
cartoon # Secondary structure cartoon
ribbon # Smooth ribbon
cell # Unit cell
```
## Color Commands
```
color <color> [, <selection>]
# Predefined colors:
red, green, blue, yellow, magenta, cyan
orange, salmon, lime, pink, slate, teal
gray, white, black, wheat, paleyellow
# Color schemes:
util.cbc() # Color by chain
cmd.spectrum() # Rainbow gradient
cmd.spectrum('b') # Color by B-factor
```
## View Settings
```
bg_color <color> # Background color
zoom <selection> # Zoom on selection
center <selection> # Center view
reset # Reset view
orient <selection> # Orient selection
```
## Rendering Settings
```
set ray_trace_mode, 0 # Normal color
set ray_trace_mode, 1 # Normal color + black outline
set ray_trace_mode, 2 # Black outline only
set ray_trace_mode, 3 # Quantized color + black outline
set antialias, 2 # Antialiasing (0-4)
set ray_shadows, 0 # Disable shadows
set ray_shadows, 1 # Enable shadows
ray [width, height] # Ray trace image
png <filename> # Save image
```
## Atom Properties
```
resi # Residue number
resn # Residue name (3-letter code)
name # Atom name (CA, CB, N, O, etc.)
elem # Element symbol
chain # Chain identifier
segi # Segment identifier
alt # Alternate conformation
```
## Common Residue Names
| Code | Name | Code | Name |
|------|---------------|------|---------------|
| ALA | Alanine | LEU | Leucine |
| ARG | Arginine | LYS | Lysine |
| ASN | Asparagine | MET | Methionine |
| ASP | Aspartate | PHE | Phenylalanine |
| CYS | Cysteine | PRO | Proline |
| GLN | Glutamine | SER | Serine |
| GLU | Glutamate | THR | Threonine |
| GLY | Glycine | TRP | Tryptophan |
| HIS | Histidine | TYR | Tyrosine |
| ILE | Isoleucine | VAL | Valine |
## Tips
1. Use `cmd.dss()` to assign secondary structure
2. Use `util.cbc()` for chain coloring
3. Use `center` and `zoom` together for focused views
4. Save sessions with `save session.pse`
5. Use `label sele and name CA, '%s %s' % (resn, resi)` for residue labels
FILE:scripts/main.py
#!/usr/bin/env python3
"""
protein-struct-viz: Generate PyMOL scripts for highlighting specific protein residues.
Usage:
python main.py --pdb <pdb_file_or_id> --residues <residue_list> [options]
Example:
python main.py --pdb 1mbn --residues "A:64:HIS,A:93:HIS" --style sticks --output result.pml
"""
import argparse
import sys
from pathlib import Path
from typing import List, Tuple, Optional
class PyMOLScriptGenerator:
"""Generate PyMOL scripts for protein residue visualization."""
# Predefined color schemes
COLOR_SCHEMES = {
"rainbow": "cmd.spectrum()",
"chain": "util.cbc()",  # color by chain
"element": "util.cbag()",  # color by element (green carbons)
"secondary": "cmd.dss()\nutil.cbss()",  # assign, then color by secondary structure
"gray": "cmd.color('gray70', 'all')",
"white": "cmd.color('white', 'all')",
"blue_red": "cmd.spectrum('b', 'blue_red')",
}
# Standard residue colors for highlighting
RESIDUE_COLORS = [
"red", "blue", "green", "yellow", "magenta", "cyan",
"orange", "salmon", "lime", "pink", "slate", "teal"
]
def __init__(self, pdb_source: str, residues: List[str], style: str = "sticks",
color_scheme: str = "gray", highlight_colors: Optional[List[str]] = None,
output: str = "output.pml", ray_trace: bool = False):
"""
Initialize the generator.
Args:
pdb_source: PDB file path or PDB ID
residues: List of residue specifications (chain:resnum:resname)
style: Visualization style for highlighted residues
color_scheme: Global color scheme
highlight_colors: Custom colors for highlighted residues
output: Output script filename
ray_trace: Include ray tracing commands for high-quality images
"""
self.pdb_source = pdb_source
self.residues = residues
self.style = style
self.color_scheme = color_scheme
self.highlight_colors = highlight_colors or self.RESIDUE_COLORS
self.output = output
self.ray_trace = ray_trace
def _parse_residue(self, residue_spec: str) -> Tuple[Optional[str], str, Optional[str]]:
"""
Parse residue specification.
Formats supported:
- chain:resnum:resname (A:145:ASP)
- chain:resnum (A:145)
- resnum (145 - assumes chain A or first chain)
Returns:
Tuple of (chain, resnum, resname)
"""
parts = residue_spec.split(":")
if len(parts) == 3:
return parts[0], parts[1], parts[2]
elif len(parts) == 2:
return parts[0], parts[1], None
elif len(parts) == 1:
return None, parts[0], None
else:
raise ValueError(f"Invalid residue specification: {residue_spec}")
def _generate_selection_string(self, chain: Optional[str], resnum: str,
resname: Optional[str]) -> str:
"""Generate PyMOL selection string for a residue."""
parts = []
if chain:
parts.append(f"chain {chain}")
parts.append(f"resi {resnum}")
if resname:
parts.append(f"resn {resname}")
return " and ".join(parts)
def _get_load_command(self) -> str:
"""Generate structure loading command."""
# Check if pdb_source is a PDB ID (4 characters, alphanumeric)
if len(self.pdb_source) == 4 and self.pdb_source.isalnum():
return f"fetch {self.pdb_source}, async=0"
else:
return f"load {self.pdb_source}"
def generate_script(self) -> str:
"""Generate the complete PyMOL script."""
lines = []
# Header
lines.append("# PyMOL script generated by protein-struct-viz")
lines.append(f"# Target: {self.pdb_source}")
lines.append(f"# Highlighted residues: {', '.join(self.residues)}")
lines.append("")
# Load structure
lines.append("# Load structure")
lines.append(self._get_load_command())
lines.append("")
# Basic settings
lines.append("# Visualization settings")
lines.append("bg_color white")
lines.append("set antialias, 2")
lines.append("set ray_shadows, 0")
lines.append("")
# Global representation
lines.append("# Global representation")
lines.append("hide everything")
lines.append("show cartoon")
lines.append("")
# Apply color scheme
lines.append("# Color scheme")
if self.color_scheme in self.COLOR_SCHEMES:
lines.append(self.COLOR_SCHEMES[self.color_scheme])
else:
lines.append(f"cmd.color('{self.color_scheme}', 'all')")
lines.append("")
# Highlight specific residues
lines.append("# Highlight specified residues")
for i, residue_spec in enumerate(self.residues):
try:
chain, resnum, resname = self._parse_residue(residue_spec)
selection = self._generate_selection_string(chain, resnum, resname)
sel_name = f"residue_{i+1}"
color = self.highlight_colors[i % len(self.highlight_colors)]
lines.append(f"# {residue_spec}")
lines.append(f"select {sel_name}, {selection}")
lines.append(f"show {self.style}, {sel_name}")
lines.append(f"color {color}, {sel_name}")
lines.append("")
except ValueError as e:
lines.append(f"# Error parsing {residue_spec}: {e}")
lines.append("")
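# Center on highlighted residues (the script defines residue_1..residue_N, not "sele")
valid_names = [
f"residue_{i+1}" for i, spec in enumerate(self.residues)
if len(spec.split(":")) <= 3  # mirrors the check in _parse_residue
]
if valid_names:
highlighted = " or ".join(valid_names)
lines.append("# Center view on highlighted residues")
lines.append(f"center {highlighted}")
lines.append(f"zoom {highlighted}, 10")
lines.append("")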
# Ray tracing for high quality
if self.ray_trace:
lines.append("# High-quality rendering")
lines.append("set ray_trace_mode, 1")
lines.append("set ray_trace_gain, 0.01")
lines.append("ray 2400, 2400")
lines.append("")
# Label residues option
lines.append("# Optional: Label residues (uncomment to enable)")
lines.append("# label residue_1 and name CA, '%s-%s' % (resn, resi)")
lines.append("")
return "\n".join(lines)
def save(self) -> str:
"""Generate and save the script to file."""
script_content = self.generate_script()
output_path = Path(self.output)
output_path.write_text(script_content)
return str(output_path.absolute())
def main():
parser = argparse.ArgumentParser(
description="Generate PyMOL scripts for protein residue visualization",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Highlight specific residues with sticks representation
python main.py --pdb 1mbn --residues "A:64:HIS,A:93:HIS" --style sticks
# Use PDB ID and custom color scheme
python main.py --pdb 1abc --residues "B:23:LYS,B:45:ASP" --color_scheme chain
# Generate high-quality image script
python main.py --pdb protein.pdb --residues "156,202,245" --ray_trace --output hq.pml
"""
)
parser.add_argument("--pdb", required=True,
help="PDB file path or PDB ID (4-letter code)")
parser.add_argument("--residues", required=True,
help="Comma-separated residue specs (e.g., 'A:64:HIS,A:93:VAL')")
parser.add_argument("--style", default="sticks",
choices=["sticks", "spheres", "surface", "cartoon", "lines", "mesh"],
help="Representation style for highlighted residues (default: sticks)")
parser.add_argument("--color_scheme", default="gray",
help="Global color scheme: rainbow, chain, element, secondary, gray, white, blue_red, or a PyMOL color name")
parser.add_argument("--output", default="output.pml",
help="Output script filename (default: output.pml)")
parser.add_argument("--ray_trace", action="store_true",
help="Include ray tracing commands for high-quality images")
args = parser.parse_args()
# Parse residues
residue_list = [r.strip() for r in args.residues.split(",")]
# Create generator
generator = PyMOLScriptGenerator(
pdb_source=args.pdb,
residues=residue_list,
style=args.style,
color_scheme=args.color_scheme,
output=args.output,
ray_trace=args.ray_trace
)
# Generate and save script
try:
output_path = generator.save()
print(f"PyMOL script generated successfully: {output_path}")
print("\nTo use:")
print(f" pymol {args.output}")
print(" # or within PyMOL:")
print(f" @ {args.output}")
except Exception as e:
print(f"Error generating script: {e}", file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
main()
Prepare input files for molecular docking software, automatically determine Grid Box center and size. Supports AutoDock Vina, AutoDock4, and other mainstream...
---
name: protein-docking-configurator
description: Prepare input files for molecular docking software, automatically determine
Grid Box center and size. Supports AutoDock Vina, AutoDock4, and other mainstream
docking tools.
version: 1.0.0
category: Bioinfo
tags: []
author: AIPOCH
license: MIT
status: Draft
risk_level: Medium
skill_type: Tool/Script
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
---
# Protein Docking Configurator
## Features
- Parse protein PDB files, identify ligand binding pockets
- Automatically calculate Grid Box center coordinates and dimensions
- Generate AutoDock Vina configuration files
- Generate AutoDock4 Grid parameter files
- Support Box positioning based on active site residues or ligands
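
A minimal sketch of the Grid Box calculation, assuming the box is an axis-aligned bounding box around the selected atoms plus padding, with a minimum edge length. The name `grid_box` and its defaults are illustrative. Note one deliberate difference: this sketch centers the box on the midpoint of the coordinate extents, whereas the packaged script centers it on the mean of the atom coordinates.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def grid_box(coords: Iterable[Point], padding: float = 5.0,
             min_size: float = 20.0) -> Tuple[Point, Point]:
    """Return (center, size) of an axis-aligned box around the coordinates."""
    points = list(coords)
    if not points:
        raise ValueError("No coordinates provided")
    lows = [min(p[i] for p in points) for i in range(3)]
    highs = [max(p[i] for p in points) for i in range(3)]
    # Center on the midpoint of the extents so the padding is symmetric
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    # Pad each side, then enforce the minimum edge length
    size = tuple(max(hi - lo + 2 * padding, min_size) for lo, hi in zip(lows, highs))
    return center, size


center, size = grid_box([(0.0, 0.0, 0.0), (10.0, 4.0, 2.0)])
print("center:", center)
print("size:  ", size)
```

Centering on the midpoint guarantees that every input atom sits at least `padding` Å inside the box; a centroid-based center can leave outlying atoms closer to a face when the atoms are unevenly distributed.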
## Usage
### As Command Line Tool
```bash
# Calculate Grid Box based on active site residues
python scripts/main.py --receptor protein.pdb --active-site-residues "A:120,A:145,A:189" --software vina
# Calculate Grid Box based on reference ligand
python scripts/main.py --receptor protein.pdb --reference-ligand ligand.pdb --software vina
# Manually specify Box parameters
python scripts/main.py --receptor protein.pdb --center-x 10.5 --center-y -5.2 --center-z 20.1 --size-x 20 --size-y 20 --size-z 20 --software vina
```
### As Python Module
```python
from scripts.main import DockingConfigurator
config = DockingConfigurator()
# Calculate box from receptor and active site
config.from_active_site("protein.pdb", ["A:120", "A:145", "A:189"])
config.write_vina_config("config.txt", exhaustiveness=32)
# Calculate box from receptor and reference ligand
config.from_reference_ligand("protein.pdb", "ligand.pdb", padding=5.0)
config.write_autodock4_gpf("protein.gpf", spacing=0.375)
```
## Parameter Description
### Command Line Parameters
| Parameter | Description | Required |
|------|------|------|
| `--receptor` | Receptor protein PDB file path | Yes |
| `--software` | Docking software type (vina/autodock4) | Yes |
| `--active-site-residues` | Active site residue list, format: "chain:residue_number" | No |
| `--reference-ligand` | Reference ligand file in PDB format (the bundled parser reads only `ATOM`/`HETATM` records) | No |
| `--center-x/y/z` | Grid Box center coordinates | No |
| `--size-x/y/z` | Grid Box dimensions (Å) | No |
| `--spacing` | Grid spacing (AutoDock4 only) | No (default 0.375) |
| `--exhaustiveness` | Search exhaustiveness (Vina only) | No (default 32) |
| `--output` | Output file path | No |
## Output
- **AutoDock Vina**: Generates config.txt configuration file
- **AutoDock4**: Generates .gpf (Grid Parameter File) and corresponding macromolecule files
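
For orientation, a Vina configuration file is a plain list of `key = value` lines. The sketch below formats one from a box dictionary; `vina_config` is a hypothetical helper written for this example, and the file names are placeholders. The keys themselves (`receptor`, `ligand`, `center_*`, `size_*`, `exhaustiveness`, `num_modes`) are standard AutoDock Vina options.

```python
from typing import Dict


def vina_config(receptor: str, ligand: str, box: Dict[str, float],
                exhaustiveness: int = 32, num_modes: int = 9) -> str:
    """Format an AutoDock Vina configuration as key = value lines."""
    keys = ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z")
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"{key} = {box[key]:.3f}" for key in keys]
    lines += [f"exhaustiveness = {exhaustiveness}", f"num_modes = {num_modes}"]
    return "\n".join(lines) + "\n"


box = {"center_x": 10.5, "center_y": -5.2, "center_z": 20.1,
       "size_x": 20.0, "size_y": 20.0, "size_z": 20.0}
print(vina_config("receptor.pdbqt", "ligand.pdbqt", box))
```

Vina reads receptor and ligand structures in PDBQT format, so the placeholders above use `.pdbqt` names rather than the original `.pdb` input.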
## Dependencies
- Python 3.8+
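- numpy (declared as a dependency; the import block of `scripts/main.py` uses only standard-library modules)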
## Examples
```bash
# Example 1: Using active site residues
python scripts/main.py --receptor 1abc_receptor.pdb --active-site-residues "A:45,A:92,A:156" --software vina --output vina_config.txt
# Example 2: Using reference ligand with custom Box size
python scripts/main.py --receptor kinase.pdb --reference-ligand ATP.pdb --software vina --size-x 25 --size-y 25 --size-z 25
# Example 3: AutoDock4 configuration
python scripts/main.py --receptor protein.pdb --active-site-residues "A:100" --software autodock4 --spacing 0.375 --output protein.gpf
```
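
Example 3 passes `--spacing 0.375`. As a hedged sketch of how a box edge in Å relates to the AutoDock4 `npts` value, the helper below picks the smallest even number of grid intervals that covers the requested size (AutoGrid expects even `npts` values). The name `npts_for` is illustrative, and rounding up is a choice made for this example; the packaged script rounds down to an even number and then enforces a minimum of 60.

```python
import math


def npts_for(size: float, spacing: float = 0.375) -> int:
    """Smallest even number of grid intervals whose span covers `size` (Å)."""
    if size <= 0 or spacing <= 0:
        raise ValueError("size and spacing must be positive")
    # The small tolerance keeps exact multiples from being bumped up by float error
    n = math.ceil(size / spacing - 1e-9)
    return n if n % 2 == 0 else n + 1


for edge in (20.0, 22.5, 30.0):
    n = npts_for(edge)
    print(f"{edge:5.1f} Å -> npts {n} (spans {n * 0.375:.3f} Å)")
```

Rounding up means the grid never ends up smaller than the requested box, at the cost of a slightly larger map.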
## Notes
1. Remove water molecules and heteroatoms from input PDB files unless they are needed for docking
2. Protonate the receptor and assign charges before docking (for example with AutoDock Tools)
3. Make the Grid Box large enough to cover the ligand's conformational space, typically 20-30 Å per side
4. Active site residues should include catalytic residues and key binding residues
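
Note 1 can be handled with a small pre-processing step. The sketch below drops `HETATM` records (which include crystallographic waters, usually named `HOH`) while optionally keeping selected residue names such as a cofactor. `strip_hetero` is a hypothetical helper, not part of the packaged script, and it assumes standard fixed-column PDB records with the residue name in columns 18-20.

```python
from typing import Collection, Iterable, List


def strip_hetero(pdb_lines: Iterable[str],
                 keep_resnames: Collection[str] = ()) -> List[str]:
    """Drop HETATM records except those whose residue name is in keep_resnames."""
    kept = []
    for line in pdb_lines:
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # PDB columns 18-20
            if resname not in keep_resnames:
                continue
        kept.append(line)
    return kept


sample = [
    "ATOM      1  N   MET A   1      11.000  12.000  13.000",
    "HETATM 1000  O   HOH A 201      10.000  10.000  10.000",
    "HETATM 1001 FE   HEM A 155       1.000   2.000   3.000",
]
for line in strip_hetero(sample, keep_resnames={"HEM"}):
    print(line)
```

Passing `keep_resnames` covers the "unless needed" case, for example a metal ion or cofactor that forms part of the binding site.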
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Protein Docking Configurator
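Prepare input files for molecular docking software and automatically determine the Grid Box center and size.
Supports AutoDock Vina, AutoDock4, and other mainstream docking tools.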
"""
import argparse
import sys
import re
from pathlib import Path
from typing import List, Tuple, Dict, Optional, Union
class PDBParser:
"""Parse PDB files and extract atomic coordinate information."""
ATOM_RECORD_TYPES = ('ATOM', 'HETATM')
@staticmethod
def parse(pdb_file: str) -> List[Dict]:
"""
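Parse a PDB file and return a list of atoms.
Returns:
List[Dict]: List of atom dictionaries containing:
- atom_id: atom serial number
- atom_name: atom name
- alt_loc: alternate location indicator
- res_name: residue name
- chain_id: chain ID
- res_seq: residue sequence number
- x, y, z: coordinates
- element: element symbol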
"""
atoms = []
with open(pdb_file, 'r') as f:
for line in f:
record_type = line[:6].strip()
if record_type not in PDBParser.ATOM_RECORD_TYPES:
continue
try:
atom = {
'atom_id': int(line[6:11].strip()),
'atom_name': line[12:16].strip(),
'alt_loc': line[16].strip(),
'res_name': line[17:20].strip(),
'chain_id': line[21].strip() or 'A',
'res_seq': int(line[22:26].strip()),
'x': float(line[30:38].strip()),
'y': float(line[38:46].strip()),
'z': float(line[46:54].strip()),
'element': line[76:78].strip() if len(line) > 76 else ''
}
atoms.append(atom)
except (ValueError, IndexError):
continue
return atoms
@staticmethod
def get_residue_atoms(atoms: List[Dict], chain_id: str, res_seq: int) -> List[Dict]:
"""Return all atoms of the specified residue."""
return [a for a in atoms if a['chain_id'] == chain_id and a['res_seq'] == res_seq]
@staticmethod
def calculate_center(atoms: List[Dict]) -> Tuple[float, float, float]:
"""Compute the geometric center of a set of atoms."""
if not atoms:
raise ValueError("No atoms provided for center calculation")
n = len(atoms)
cx = sum(a['x'] for a in atoms) / n
cy = sum(a['y'] for a in atoms) / n
cz = sum(a['z'] for a in atoms) / n
return (cx, cy, cz)
@staticmethod
def calculate_bounding_box(atoms: List[Dict], padding: float = 0.0) -> Tuple[float, float, float]:
"""Compute the bounding-box dimensions."""
if not atoms:
raise ValueError("No atoms provided for bounding box calculation")
xs = [a['x'] for a in atoms]
ys = [a['y'] for a in atoms]
zs = [a['z'] for a in atoms]
size_x = max(xs) - min(xs) + 2 * padding
size_y = max(ys) - min(ys) + 2 * padding
size_z = max(zs) - min(zs) + 2 * padding
return (size_x, size_y, size_z)
class GridBoxCalculator:
"""Compute the center and dimensions of the Grid Box."""
DEFAULT_SIZE = 20.0 # Default box size (Å)
DEFAULT_PADDING = 5.0 # Default padding (Å)
def __init__(self):
self.center_x = 0.0
self.center_y = 0.0
self.center_z = 0.0
self.size_x = self.DEFAULT_SIZE
self.size_y = self.DEFAULT_SIZE
self.size_z = self.DEFAULT_SIZE
def from_residues(self, receptor_file: str, residue_specs: List[str]) -> 'GridBoxCalculator':
"""
Compute the Grid Box from active-site residues.
Args:
receptor_file: Path to the receptor PDB file
residue_specs: List of residue specifications, format: ["A:120", "A:145", ...]
"""
atoms = PDBParser.parse(receptor_file)
selected_atoms = []
for spec in residue_specs:
chain_id, res_seq = self._parse_residue_spec(spec)
residue_atoms = PDBParser.get_residue_atoms(atoms, chain_id, res_seq)
selected_atoms.extend(residue_atoms)
if not selected_atoms:
raise ValueError(f"No atoms found for residues: {residue_specs}")
self.center_x, self.center_y, self.center_z = PDBParser.calculate_center(selected_atoms)
# Size the box automatically from the extent of the selected residues
self.size_x, self.size_y, self.size_z = PDBParser.calculate_bounding_box(
selected_atoms, padding=self.DEFAULT_PADDING
)
# Enforce a minimum size
self.size_x = max(self.size_x, self.DEFAULT_SIZE)
self.size_y = max(self.size_y, self.DEFAULT_SIZE)
self.size_z = max(self.size_z, self.DEFAULT_SIZE)
return self
def from_ligand(self, ligand_file: str, padding: float = 5.0) -> 'GridBoxCalculator':
"""
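Compute the Grid Box from a reference ligand.
Args:
ligand_file: Path to the ligand file (parsed with PDBParser, so only PDB ATOM/HETATM records are read)
padding: Padding around the ligand (Å)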
"""
atoms = PDBParser.parse(ligand_file)
if not atoms:
raise ValueError(f"No atoms found in ligand file: {ligand_file}")
self.center_x, self.center_y, self.center_z = PDBParser.calculate_center(atoms)
self.size_x, self.size_y, self.size_z = PDBParser.calculate_bounding_box(atoms, padding=padding)
return self
def set_manual(self, center_x: float, center_y: float, center_z: float,
size_x: float, size_y: float, size_z: float) -> 'GridBoxCalculator':
"""Set the Grid Box parameters manually."""
self.center_x = center_x
self.center_y = center_y
self.center_z = center_z
self.size_x = size_x
self.size_y = size_y
self.size_z = size_z
return self
def _parse_residue_spec(self, spec: str) -> Tuple[str, int]:
"""Parse a residue specification, e.g. 'A:120' -> ('A', 120)."""
match = re.match(r'^([A-Za-z]?):?(\d+)$', spec.strip())
if match:
chain = match.group(1) or 'A'
res_seq = int(match.group(2))
return (chain, res_seq)
raise ValueError(f"Invalid residue specification: {spec}. Expected format: 'A:120' or '120'")
def get_params(self) -> Dict[str, float]:
"""Return the Grid Box parameters as a dictionary."""
return {
'center_x': self.center_x,
'center_y': self.center_y,
'center_z': self.center_z,
'size_x': self.size_x,
'size_y': self.size_y,
'size_z': self.size_z
}
class DockingConfigurator:
"""Molecular docking configuration generator."""
def __init__(self):
self.grid_calculator = GridBoxCalculator()
self.receptor_file = None
def from_active_site(self, receptor_file: str, residue_specs: List[str]) -> 'DockingConfigurator':
"""
Initialize from active-site residues.
Args:
receptor_file: Path to the receptor PDB file
residue_specs: List of residue specifications, e.g. ["A:120", "A:145"]
"""
self.receptor_file = receptor_file
self.grid_calculator.from_residues(receptor_file, residue_specs)
return self
def from_reference_ligand(self, receptor_file: str, ligand_file: str,
padding: float = 5.0) -> 'DockingConfigurator':
"""
Initialize from a reference ligand.
Args:
receptor_file: Path to the receptor PDB file
ligand_file: Path to the reference ligand file
padding: Padding around the ligand (Å)
"""
self.receptor_file = receptor_file
self.grid_calculator.from_ligand(ligand_file, padding=padding)
return self
def set_grid_params(self, center_x: float, center_y: float, center_z: float,
size_x: float, size_y: float, size_z: float) -> 'DockingConfigurator':
"""Set the Grid Box parameters manually."""
self.grid_calculator.set_manual(center_x, center_y, center_z, size_x, size_y, size_z)
return self
def write_vina_config(self, output_file: str, ligand_file: str = 'ligand.pdbqt',
out_file: str = 'out.pdbqt', exhaustiveness: int = 32,
num_modes: int = 9, energy_range: int = 4,
cpu: int = 1, seed: Optional[int] = None) -> str:
"""
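Generate an AutoDock Vina configuration file.
Args:
output_file: Path of the configuration file to write
ligand_file: Path to the ligand file
out_file: Path for the output docked poses
exhaustiveness: Search exhaustiveness
num_modes: Number of binding modes to output
energy_range: Energy range (kcal/mol)
cpu: Number of CPU cores to use
seed: Random seed
Returns:
str: Contents of the generated configuration file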
"""
params = self.grid_calculator.get_params()
lines = [
"# AutoDock Vina Configuration File",
f"# Generated by Protein Docking Configurator",
"",
f"receptor = {self.receptor_file or 'receptor.pdbqt'}",
f"ligand = {ligand_file}",
"",
f"out = {out_file}",
"",
f"center_x = {params['center_x']:.3f}",
f"center_y = {params['center_y']:.3f}",
f"center_z = {params['center_z']:.3f}",
"",
f"size_x = {params['size_x']:.3f}",
f"size_y = {params['size_y']:.3f}",
f"size_z = {params['size_z']:.3f}",
"",
f"exhaustiveness = {exhaustiveness}",
f"num_modes = {num_modes}",
f"energy_range = {energy_range}",
f"cpu = {cpu}",
]
if seed is not None:
lines.append(f"seed = {seed}")
content = "\n".join(lines)
with open(output_file, 'w') as f:
f.write(content)
return content

    def write_autodock4_gpf(self, output_file: str, receptor_name: Optional[str] = None,
                            spacing: float = 0.375, npts: Optional[Tuple[int, int, int]] = None) -> str:
        """
        Generate an AutoDock4 Grid Parameter File (GPF).

        Args:
            output_file: Path of the GPF file to write
            receptor_name: Receptor name (defaults to the receptor file stem)
            spacing: Grid spacing (Å)
            npts: Number of grid points (x, y, z); computed automatically if None

        Returns:
            str: Contents of the generated GPF
        """
        params = self.grid_calculator.get_params()
        if receptor_name is None and self.receptor_file:
            receptor_name = Path(self.receptor_file).stem
        elif receptor_name is None:
            receptor_name = "receptor"
        # Compute the grid point counts (must be even, with size = npts * spacing).
        if npts is None:
            npts_x = int(params['size_x'] / spacing) // 2 * 2  # round down to an even number
            npts_y = int(params['size_y'] / spacing) // 2 * 2
            npts_z = int(params['size_z'] / spacing) // 2 * 2
            # Enforce a minimum number of points per axis.
            npts_x = max(npts_x, 60)
            npts_y = max(npts_y, 60)
            npts_z = max(npts_z, 60)
        else:
            npts_x, npts_y, npts_z = npts
        lines = [
            "# AutoDock4 Grid Parameter File",
            "# Generated by Protein Docking Configurator",
            "",
            f"npts {npts_x} {npts_y} {npts_z} # num.grid points in xyz",
            f"gridfld {receptor_name}.maps.fld # grid_data_file",
            f"spacing {spacing} # spacing (A)",
            "",
            "receptor_types A C HD N NA OA SA S # receptor atom types",
            "ligand_types A C HD N NA OA SA S Cl # ligand atom types",
            "",
            f"receptor {receptor_name}.pdbqt # macromolecule",
            f"gridcenter {params['center_x']:.3f} {params['center_y']:.3f} {params['center_z']:.3f}",
            "smooth 0.5 # store minimum energy w/in rad(A)",
            "",
            # One affinity map per ligand atom type, in the same order as ligand_types.
            f"map {receptor_name}.A.map # atom-specific affinity map",
            f"map {receptor_name}.C.map",
            f"map {receptor_name}.HD.map",
            f"map {receptor_name}.N.map",
            f"map {receptor_name}.NA.map",
            f"map {receptor_name}.OA.map",
            f"map {receptor_name}.SA.map",
            f"map {receptor_name}.S.map",
            f"map {receptor_name}.Cl.map",
            "",
            f"elecmap {receptor_name}.e.map # electrostatic potential map",
            f"dsolvmap {receptor_name}.d.map # desolvation potential map",
            "",
            "dielectric -0.1465 # <0, AD4 distance-dep.diel;>0, constant",
        ]
        content = "\n".join(lines)
        with open(output_file, 'w') as f:
            f.write(content)
        return content

    def print_summary(self):
        """Print a summary of the grid box parameters."""
        params = self.grid_calculator.get_params()
        print("\n" + "="*50)
        print("Grid Box Configuration Summary")
        print("="*50)
        print(f"Center: ({params['center_x']:.3f}, {params['center_y']:.3f}, {params['center_z']:.3f})")
        print(f"Size: ({params['size_x']:.3f}, {params['size_y']:.3f}, {params['size_z']:.3f})")
        print(f"Volume: {params['size_x'] * params['size_y'] * params['size_z']:.1f} ų")
        print("="*50)


def parse_residue_list(residue_str: str) -> List[str]:
    """Parse a residue list string, e.g. 'A:120,A:145,A:189' -> ['A:120', 'A:145', 'A:189']."""
    # Skip empty entries so a trailing or doubled comma does not produce an invalid spec.
    return [r.strip() for r in residue_str.split(',') if r.strip()]


def main():
    """Command-line entry point."""
    parser = argparse.ArgumentParser(
        description='Protein Docking Configurator - prepare input files for molecular docking',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # From active-site residues
  python main.py --receptor protein.pdb --active-site-residues "A:120,A:145" --software vina
  # From a reference ligand
  python main.py --receptor protein.pdb --reference-ligand ligand.pdb --software vina
  # With manually specified parameters
  python main.py --receptor protein.pdb --center-x 10.5 --center-y -5.2 --center-z 20.1 \\
      --size-x 20 --size-y 20 --size-z 20 --software vina
"""
    )
    # Required arguments
    parser.add_argument('--receptor', required=True, help='Path to the receptor protein PDB file')
    parser.add_argument('--software', required=True, choices=['vina', 'autodock4'],
                        help='Docking software to target')
    # Grid box definition (residues and reference ligand are mutually exclusive;
    # manual center coordinates are the third option)
    box_group = parser.add_mutually_exclusive_group()
    box_group.add_argument('--active-site-residues', type=str,
                           help='Active-site residue list, format: "A:120,A:145,A:189"')
    box_group.add_argument('--reference-ligand', type=str,
                           help='Path to the reference ligand PDB/MOL file')
    # Manual box parameters
    parser.add_argument('--center-x', type=float, help='Grid box center X coordinate')
    parser.add_argument('--center-y', type=float, help='Grid box center Y coordinate')
    parser.add_argument('--center-z', type=float, help='Grid box center Z coordinate')
    parser.add_argument('--size-x', type=float, default=20.0, help='Grid box size along X (Å)')
    parser.add_argument('--size-y', type=float, default=20.0, help='Grid box size along Y (Å)')
    parser.add_argument('--size-z', type=float, default=20.0, help='Grid box size along Z (Å)')
    # AutoDock4 parameters
    parser.add_argument('--spacing', type=float, default=0.375,
                        help='Grid spacing (AutoDock4 only, default 0.375 Å)')
    # AutoDock Vina parameters
    parser.add_argument('--exhaustiveness', type=int, default=32,
                        help='Search exhaustiveness (Vina only, default 32)')
    parser.add_argument('--num-modes', type=int, default=9,
                        help='Number of output poses (Vina only, default 9)')
    # Output parameters
    parser.add_argument('--output', '-o', type=str, help='Output file path')
    parser.add_argument('--padding', type=float, default=5.0,
                        help='Padding around the reference ligand (Å)')
    parser.add_argument('--quiet', '-q', action='store_true', help='Quiet mode')
    args = parser.parse_args()
    # Create the configurator
    config = DockingConfigurator()
    try:
        # Determine the grid box parameters
        if args.active_site_residues:
            residues = parse_residue_list(args.active_site_residues)
            config.from_active_site(args.receptor, residues)
            if not args.quiet:
                print(f"Calculated grid box from {len(residues)} active site residue(s)")
        elif args.reference_ligand:
            config.from_reference_ligand(args.receptor, args.reference_ligand, padding=args.padding)
            if not args.quiet:
                print(f"Calculated grid box from reference ligand with {args.padding}Å padding")
        elif args.center_x is not None and args.center_y is not None and args.center_z is not None:
            config.set_grid_params(args.center_x, args.center_y, args.center_z,
                                   args.size_x, args.size_y, args.size_z)
            if not args.quiet:
                print("Using manually specified grid box parameters")
        else:
            parser.error("A grid box definition is required: --active-site-residues, "
                         "--reference-ligand, or --center-x/y/z")
        # Print the summary
        if not args.quiet:
            config.print_summary()
        # Write the configuration file
        if args.software == 'vina':
            default_output = 'vina_config.txt'
            output_file = args.output or default_output
            config.write_vina_config(output_file, exhaustiveness=args.exhaustiveness, num_modes=args.num_modes)
            if not args.quiet:
                print(f"\nAutoDock Vina config written to: {output_file}")
        elif args.software == 'autodock4':
            default_output = Path(args.receptor).stem + '.gpf'
            output_file = args.output or default_output
            config.write_autodock4_gpf(output_file, spacing=args.spacing)
            if not args.quiet:
                print(f"\nAutoDock4 GPF written to: {output_file}")
    except FileNotFoundError as e:
        print(f"Error: File not found - {e}", file=sys.stderr)
        sys.exit(1)
    except ValueError as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()
Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting.
---
name: prior-auth-letter-drafter
description: Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting.
license: MIT
skill-author: AIPOCH
---
# Prior Authorization Letter Drafter
Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting.
## When to Use
- Use this skill when the task is to generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting.
- Use this skill for academic writing tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
See `## Features` below for related details.
- Scope-focused workflow aligned to: Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `dataclasses`: `unspecified`. Declared in `requirements.txt`.
- `main`: `unspecified`. Declared in `requirements.txt`.
## Example Usage
See `## Usage` below for related details.
```bash
cd "20260318/scientific-skills/Academic Writing/prior-auth-letter-drafter"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --input "Audit validation sample with explicit symptoms, history, assessment, and next-step plan."
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Features
- Insurance company-standard letter formatting
- Clinical justification with evidence-based reasoning
- ICD-10/CPT code integration
- Multiple authorization types (procedures, medications, DME, imaging)
- Customizable templates for different insurance carriers
## Usage
```bash
python scripts/main.py --input patient_data.json --output letter.docx
```
### Input Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| patient_name | str | Yes | Full name of the patient |
| patient_id | str | Yes | Insurance member ID |
| provider_name | str | Yes | Requesting physician name |
| provider_npi | str | Yes | National Provider Identifier |
| service_type | str | Yes | One of `procedure`, `medication`, `dme`, or `imaging` (see Service Types) |
| cpt_code | str | No | CPT/HCPCS code |
| icd10_code | str | Yes | Diagnosis code(s) |
| clinical_justification | str | Yes | Medical necessity reasoning |
| insurance_carrier | str | Yes | Insurance company name |
### Service Types
- `procedure` - Surgical or diagnostic procedures
- `medication` - Specialty/brand-name drugs
- `dme` - Durable medical equipment
- `imaging` - Advanced imaging (MRI, CT, PET)
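
The parameter table and service-type list above can be checked before a letter is drafted. The following is a minimal sketch, assuming the input JSON is a flat object keyed by the parameter names in the table; `check_request` and the sample record are hypothetical and are not taken from `scripts/main.py`.

```python
import json

# Required names copied from the Input Parameters table (cpt_code is optional).
REQUIRED_FIELDS = (
    "patient_name", "patient_id", "provider_name", "provider_npi",
    "service_type", "icd10_code", "clinical_justification", "insurance_carrier",
)
# Values copied from the Service Types list.
SERVICE_TYPES = ("procedure", "medication", "dme", "imaging")


def check_request(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {name}")
    service_type = str(record.get("service_type", "")).strip().lower()
    if service_type and service_type not in SERVICE_TYPES:
        problems.append(f"unknown service_type: {service_type}")
    return problems


# Hypothetical record: every value is a placeholder, not real patient data.
sample = json.loads("""
{
  "patient_name": "Jane Doe",
  "patient_id": "MEMBER-0001",
  "provider_name": "Dr. Example",
  "provider_npi": "0000000000",
  "service_type": "surgery",
  "clinical_justification": "Conservative management has failed.",
  "insurance_carrier": "aetna"
}
""")

for problem in check_request(sample):
    print(problem)
```

Reporting every problem at once, rather than stopping at the first, matches the guidance to state exactly which fields are missing.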
## Output
Generates a formatted prior authorization letter including:
- Header with provider and insurance information
- Patient demographics
- Requested service details with codes
- Clinical justification section
- Provider attestation and signature block
## Technical Notes
- Difficulty: Medium
- Dependencies: python-docx, jinja2
- Output format: DOCX (editable) or PDF
## References
- `references/letter_template.docx` - Base template
- `references/clinical_phrases.md` - Common clinical justification phrases
- `references/carrier_requirements.json` - Insurance-specific formatting rules
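
As an illustration of how the carrier rules file might be consulted, the sketch below looks up a carrier's `required_fields` and reports which ones have not been supplied. The excerpt mirrors the structure of `references/carrier_requirements.json`; the helper names and the key-normalization rule are assumptions, not the packaged script's behavior.

```python
import json

# Small excerpt in the same shape as references/carrier_requirements.json.
CARRIER_EXCERPT = json.loads("""
{
  "carrier_requirements": {
    "aetna": {
      "required_fields": [
        "patient_member_id", "provider_npi", "icd10_codes",
        "cpt_code", "clinical_rationale"
      ],
      "submission_method": "Aetna Provider Portal or Availity"
    }
  }
}
""")


def normalize_carrier(name: str) -> str:
    """Map a display name to the snake_case key style used in the file (assumed)."""
    return "_".join(name.lower().split())


def missing_for_carrier(rules: dict, carrier: str, supplied: set[str]) -> list[str]:
    """List the carrier's required fields that are not in `supplied`, in file order."""
    entry = rules["carrier_requirements"].get(normalize_carrier(carrier))
    if entry is None:
        raise KeyError(f"no carrier rules for: {carrier}")
    return [field for field in entry["required_fields"] if field not in supplied]


print(missing_for_carrier(CARRIER_EXCERPT, "Aetna", {"patient_member_id", "provider_npi"}))
```

In practice the dictionary would be loaded with `json.load` from the reference file instead of the inline excerpt.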
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python script (`scripts/main.py`) executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
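
The path-related checklist items (no `../` traversal, output restricted to the workspace) can be enforced with a small guard. This is a sketch under the assumption that a single workspace directory bounds all reads and writes; `resolve_inside` is a hypothetical helper, not part of the packaged script.

```python
from pathlib import Path


def resolve_inside(workspace: str, user_path: str) -> Path:
    """Resolve user_path against workspace and reject anything that escapes it."""
    root = Path(workspace).resolve()
    # Joining with an absolute user_path discards root, so that case is rejected too.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate


print(resolve_inside("workspace", "out/letter.docx").name)
try:
    resolve_inside("workspace", "../secret.txt")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Resolving before comparing means symlinks and `..` segments are collapsed first, so the check runs against the real target rather than the literal string.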
## Prerequisites
```bash
# Python dependencies
pip install -r requirements.txt
```
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
  - Performance optimization
  - Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `prior-auth-letter-drafter` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `prior-auth-letter-drafter` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
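
The fixed structure above can be rendered mechanically so that no section is silently dropped. The sketch below is illustrative only; `render_response` and the "Not provided." filler text are assumptions.

```python
# Section titles copied from the Response Template list, in order.
SECTION_TITLES = (
    "Objective", "Inputs Received", "Assumptions", "Workflow",
    "Deliverable", "Risks and Limits", "Next Checks",
)


def render_response(sections: dict[str, str]) -> str:
    """Emit all seven sections in order, marking any that were left empty."""
    lines = []
    for number, title in enumerate(SECTION_TITLES, start=1):
        body = sections.get(title, "").strip() or "Not provided."
        lines.append(f"{number}. {title}: {body}")
    return "\n".join(lines)


print(render_response({
    "Objective": "Draft a prior authorization letter.",
    "Assumptions": "CPT code was not supplied.",
}))
```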
FILE:prior-auth-letter-drafter_audit_result_v1.json
{
"meta": {
"skill_name": "prior-auth-letter-drafter",
"evaluated_on": "2026-03-22",
"evaluator_version": "[email protected]",
"category": "Academic Writing",
"execution_mode": "B",
"complexity": "Moderate",
"n_inputs": 5
},
"veto_gates": {
"skill_veto": {
"stability": "PASS",
"contract": "PASS",
"determinism": "PASS",
"security": "PASS",
"gate": "PASS"
},
"research_veto": {
"applicable": true,
"scientific_integrity": {
"result": "PASS",
"detail": "The archived evaluation preserved source-faithful writing behavior without adding unsupported results or conclusions."
},
"practice_boundaries": {
"result": "PASS",
"detail": "Practice boundaries held because the package kept to Generate professional prior authorization request letters for insurance companies with... instead of claiming new evidence."
},
"methodological_ground": {
"result": "PASS",
"detail": "The legacy audit preserved a method-grounded interpretation of the Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting workflow."
},
"code_usability": {
"result": "N/A",
"detail": "The audited output is a narrative or formatting deliverable rather than a code-first scientific workflow."
},
"gate": "PASS"
}
},
"static_score": {
"subtotal": 88,
"max": 100,
"categories": {
"functional_suitability": {
"score": 11,
"max": 12,
"note": "The writing workflow lands well overall, with minor remaining headroom in the final deliverable contract."
},
"reliability": {
"score": 10,
"max": 12,
"note": "The archived deduction in reliability traces back to: Stabilize executable path and fallback behavior. Some inputs only reached PARTIAL due to execution gaps or weak boundary handling"
},
"performance_context": {
"score": 8,
"max": 8,
"note": "The legacy audit gave full marks to performance context for this package."
},
"agent_usability": {
"score": 14,
"max": 16,
"note": "The archived score suggests slightly clearer routing would help an agent choose the right dissemination path faster."
},
"human_usability": {
"score": 8,
"max": 8,
"note": "No point loss was recorded for human usability in the legacy audit."
},
"security": {
"score": 10,
"max": 12,
"note": "Security scored well, though the archived review still left some room to state source-faithful boundaries more explicitly."
},
"maintainability": {
"score": 10,
"max": 12,
"note": "Maintainability stayed solid, with modest room to simplify or consolidate the conversion workflow."
},
"agent_specific": {
"score": 17,
"max": 20,
"note": "The archived deduction in agent specific traces back to: Stabilize executable path and fallback behavior. Some inputs only reached PARTIAL due to execution gaps or weak boundary handling"
}
}
},
"dynamic_score": {
"execution_avg": 83.6,
"max": 100,
"assertion_pass_rate": {
"passed": 18,
"total": 20
},
"inputs": [
{
"index": 1,
"type": "Canonical",
"label": "Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting",
"status": "COMPLETED",
"status_flag": "PASS",
"note": "The archived evaluation treated Generate professional prior authorization request letters for... as a clean in-scope run.",
"basic": 38,
"specialized": 52,
"total": 90,
"assertions_passed": 4,
"assertions_total": 4,
"assertions": [
{
"text": "The prior-auth-letter-drafter output structure covers required deliverable blocks",
"result": "PASS",
"note": "The legacy audit marked the deliverable structure as passing."
},
{
"text": "Script execution path is available (command exit code is 0)",
"result": "PASS",
"note": "Legacy command notes backed the passing execution-path judgment."
},
{
"text": "The output stays within declared skill scope and target objective",
"result": "PASS",
"note": "The legacy audit kept this scenario within the documented skill boundary."
},
{
"text": "Required research safety/boundary guidance is present without overclaims",
"result": "PASS",
"note": "The archived evaluation did not see this scenario drift outside the declared scope."
}
]
},
{
"index": 2,
"type": "Variant A",
"label": "Use this skill for academic writing tasks that require explicit assumptions, bounded scope, and a reproducible output format",
"status": "COMPLETED",
"status_flag": "PASS",
"note": "The archived evaluation treated Use this skill for academic writing tasks that require explicit... as a clean in-scope run.",
"basic": 36,
"specialized": 50,
"total": 86,
"assertions_passed": 4,
"assertions_total": 4,
"assertions": [
{
"text": "The prior-auth-letter-drafter output structure covers required deliverable blocks",
"result": "PASS",
"note": "The legacy audit marked the deliverable structure as passing."
},
{
"text": "Script execution path is available (command exit code is 0)",
"result": "PASS",
"note": "Command evidence was preserved in the legacy execution summary."
},
{
"text": "The output stays within declared skill scope and target objective",
"result": "PASS",
"note": "The legacy audit kept this scenario within the documented skill boundary."
},
{
"text": "Required research safety/boundary guidance is present without overclaims",
"result": "PASS",
"note": "The legacy audit kept this scenario within the documented skill boundary."
}
]
},
{
"index": 3,
"type": "Edge",
"label": "Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting",
"status": "COMPLETED",
"status_flag": "PASS",
"note": "For Generate professional prior authorization request letters for..., the preserved evidence is lightweight but positive: the packaged validation command behaved as expected.",
"basic": 35,
"specialized": 49,
"total": 84,
"assertions_passed": 4,
"assertions_total": 4,
"assertions": [
{
"text": "The prior-auth-letter-drafter output structure covers required deliverable blocks",
"result": "PASS",
"note": "The archived evaluation treated the output structure as aligned with the expected deliverable."
},
{
"text": "Script execution path is available (command exit code is 0)",
"result": "PASS",
"note": "Command evidence was preserved in the legacy execution summary."
},
{
"text": "The output stays within declared skill scope and target objective",
"result": "PASS",
"note": "The archived evaluation did not see this scenario drift outside the declared scope."
},
{
"text": "Required research safety/boundary guidance is present without overclaims",
"result": "PASS",
"note": "Scope remained controlled in the legacy review for this scenario."
}
]
},
{
"index": 4,
"type": "Variant B",
"label": "Packaged executable path(s): scripts/main.py",
"status": "COMPLETED",
"status_flag": "PASS",
"note": "The archived evaluation treated Packaged executable path(s): scripts/main.py as a clean in-scope run.",
"basic": 34,
"specialized": 48,
"total": 82,
"assertions_passed": 4,
"assertions_total": 4,
"assertions": [
{
"text": "The prior-auth-letter-drafter output structure covers required deliverable blocks",
"result": "PASS",
"note": "The legacy review accepted the deliverable shape for this scenario."
},
{
"text": "Script execution path is available (command exit code is 0)",
"result": "PASS",
"note": "Command evidence was preserved in the legacy execution summary."
},
{
"text": "The output stays within declared skill scope and target objective",
"result": "PASS",
"note": "The legacy audit kept this scenario within the documented skill boundary."
},
{
"text": "Required research safety/boundary guidance is present without overclaims",
"result": "PASS",
"note": "Scope remained controlled in the legacy review for this scenario."
}
]
},
{
"index": 5,
"type": "Stress",
"label": "End-to-end case for Scope-focused workflow aligned to: Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting",
"status": "PARTIAL",
"status_flag": "FAIL",
"note": "The preserved weakness for End-to-end case for Scope-focused workflow aligned to: Generate professional prior authorization request letters for insurance companies with proper clinical justification and formatting was concentrated in one point: The output stays within declared skill scope and target objective.",
"basic": 31,
"specialized": 45,
"total": 76,
"assertions_passed": 2,
"assertions_total": 4,
"assertions": [
{
"text": "The prior-auth-letter-drafter output structure covers required deliverable blocks",
"result": "PASS",
"note": "The archived evaluation treated the output structure as aligned with the expected deliverable."
},
{
"text": "Script execution path is available (command exit code is 0)",
"result": "PASS",
"note": "Legacy command notes backed the passing execution-path judgment."
},
{
"text": "The output stays within declared skill scope and target objective",
"result": "FAIL",
"note": "A boundary-related issue was preserved for this scenario in the legacy evaluation."
},
{
"text": "Required research safety/boundary guidance is present without overclaims",
"result": "FAIL",
"note": "The legacy audit recorded a scope-boundary problem for this scenario."
}
]
}
]
},
"final": {
"static_weighted": 35.2,
"dynamic_weighted": 50.2,
"score": 85,
"max": 100,
"grade": "Production Ready",
"grade_symbol": "*",
"deployable": true,
"veto_override": false
},
"key_strengths": [
"Primary routing is Academic Writing with execution mode B",
"Static quality score is 88/100 and dynamic average is 83.6/100",
"Assertions and command execution outcomes are recorded per input for human review"
],
"recommendations": [
{
"priority": "P1",
"title": "Stabilize executable path and fallback behavior",
"observed_in": [
5
],
"problem": "Some inputs only reached PARTIAL due to execution gaps or weak boundary handling",
"root_cause": "Example commands are not fully runnable or missing deterministic fallback",
"fix": "Add validated runnable commands and a strict fallback template for missing parameters and execution errors"
}
]
}
FILE:references/carrier_requirements.json
{
"carrier_requirements": {
"medicare": {
"required_fields": [
"patient_mbi",
"provider_npi",
"provider_ptan",
"icd10_codes",
"cpt_code",
"place_of_service"
],
"formatting_notes": "Use Medicare-specific modifiers when applicable. Include detailed documentation of medical necessity.",
"submission_method": "DME MAC or FI portal"
},
"medicaid": {
"required_fields": [
"patient_medicaid_id",
"provider_npi",
"icd10_codes",
"cpt_code",
"state_specific_requirements"
],
"formatting_notes": "State-specific forms may be required. Check individual state Medicaid requirements.",
"submission_method": "State Medicaid portal or paper form"
},
"blue_cross_blue_shield": {
"required_fields": [
"patient_member_id",
"group_number",
"provider_npi",
"icd10_codes",
"cpt_code",
"clinical_justification"
],
"formatting_notes": "Most plans use national standard forms. Clinical justification should be detailed.",
"submission_method": "Provider portal or fax"
},
"united_healthcare": {
"required_fields": [
"patient_member_id",
"provider_npi",
"tax_id",
"icd10_codes",
"cpt_code",
"clinical_notes_reference"
],
"formatting_notes": "Use UHC-specific prior auth forms when available. Attach relevant clinical notes.",
"submission_method": "UnitedHealthcare Provider Portal"
},
"aetna": {
"required_fields": [
"patient_member_id",
"provider_npi",
"icd10_codes",
"cpt_code",
"clinical_rationale"
],
"formatting_notes": "Aetna requires specific clinical criteria to be documented for many services.",
"submission_method": "Aetna Provider Portal or Availity"
},
"cigna": {
"required_fields": [
"patient_member_id",
"provider_npi",
"icd10_codes",
"cpt_code",
"clinical_indicators"
],
"formatting_notes": "Cigna uses evidence-based clinical guidelines for authorization decisions.",
"submission_method": "Cigna for Health Care Professionals portal"
},
"humana": {
"required_fields": [
"patient_member_id",
"provider_npi",
"icd10_codes",
"cpt_code",
"clinical_history"
],
"formatting_notes": "Humana requires prior auth for many services; check specific plan requirements.",
"submission_method": "Humana Provider Portal"
},
"kaiser_permanente": {
"required_fields": [
"patient_member_id",
"provider_npi",
"referring_provider",
"icd10_codes",
"cpt_code"
],
"formatting_notes": "Integrated system; prior auth typically managed within Kaiser network.",
"submission_method": "KP HealthConnect or regional portal"
}
},
"general_requirements": {
"common_fields": [
"Patient demographic information (name, DOB, member ID)",
"Provider information (name, NPI, contact details)",
"Service/procedure information (CPT/HCPCS codes, description)",
"Diagnosis information (ICD-10 codes)",
"Clinical justification for medical necessity",
"Proposed date of service",
"Place of service"
],
"supporting_documentation": [
"Clinical notes from relevant encounters",
"Diagnostic test results",
"Consultation reports",
"Prior treatment history",
"Failed alternative therapies documentation",
"Relevant imaging studies"
],
"timeline_requirements": {
"urgent": "Same day to 24-48 hours",
"standard": "5-14 business days depending on carrier",
"prospective": "Submit before service when possible"
}
},
"common_cpt_categories": {
"surgical_procedures": "10000-69999",
"radiology": "70000-79999",
"pathology": "80000-89999",
"medicine": "90000-99999",
"anesthesia": "00100-01999",
"evaluation_management": "99201-99499",
"category_iii": "0001T-9999T"
}
}
FILE:references/clinical_phrases.md
# Clinical Justification Phrases for Prior Authorization
## General Medical Necessity Language
### Opening Statements
- "The requested service is medically necessary and appropriate for this patient's condition."
- "Based on my clinical assessment, this intervention is required to prevent serious deterioration of the patient's health."
- "Conservative management has been attempted and has failed to provide adequate relief."
- "The patient's clinical presentation meets established criteria for this intervention."
### Clinical Criteria
- "The patient meets [X] of [Y] established clinical criteria for this procedure."
- "Standard of care guidelines support this intervention for patients with these clinical findings."
- "Evidence-based medicine supports this treatment approach for this diagnosis."
- "Clinical practice guidelines recommend this intervention for patients with this severity of disease."
## Procedure-Specific Phrases
### Surgical Procedures
- "Delay of surgical intervention would risk serious complications including [specific risks]."
- "Non-surgical alternatives have been exhausted and are no longer viable options."
- "The patient's condition has progressed despite [X] months of conservative management."
- "Minimally invasive approach is appropriate given the patient's overall health status."
### Diagnostic Imaging
- "Advanced imaging is necessary to confirm diagnosis and guide treatment planning."
- "Less expensive imaging modalities (X-ray, ultrasound) are insufficient for this clinical question."
- "MRI/CT is required to evaluate [specific anatomical structures] with adequate resolution."
- "Results of this imaging study will directly impact the patient's treatment plan."
### Medications
- "Formulary alternatives have been tried and failed due to [lack of efficacy/adverse reactions]."
- "The patient experienced [specific adverse effects] with preferred alternatives."
- "The requested medication is standard first-line therapy for this condition per [guideline]."
- "Step therapy requirements have been met with inadequate response."
### Durable Medical Equipment
- "The equipment is necessary for the patient to perform activities of daily living safely."
- "Rental would not be cost-effective given the expected duration of need (> [X] months)."
- "The patient's home environment has been assessed and supports use of this equipment."
- "Less expensive alternatives do not meet the patient's specific clinical needs."
## Documentation References
### Supporting Evidence
- "Clinical notes from [date range] document the progression of disease."
- "Laboratory values ([specific tests]) confirm the severity of the condition."
- "Imaging studies ([dates]) demonstrate [findings supporting medical necessity]."
- "Consultation reports from [specialist] support the need for this intervention."
### Failed Alternatives
- "The patient failed a [X]-week trial of [conservative treatment/medication]."
- "Physical therapy was attempted from [dates] without significant improvement in [specific measures]."
- "Oral medications provided inadequate relief and caused [side effects]."
- "The patient is not a candidate for [alternative] due to [contraindications]."
## Risk Statements
### Without Treatment
- "Without this intervention, the patient is at risk for [specific complications]."
- "Delay in treatment may result in permanent [disability/organ damage]."
- "The condition is likely to progress to [more serious state] without appropriate intervention."
- "Conservative management carries higher risk than the proposed intervention."
### Urgency
- "This request is urgent as the patient's condition is deteriorating."
- "Expedited review is requested due to the time-sensitive nature of this intervention."
- "Delay beyond [timeframe] may result in irreversible clinical decline."
## Quality of Life Impact
### Functional Limitation
- "The patient's condition significantly impairs their ability to perform [activities of daily living]."
- "The patient is unable to [work/maintain employment] due to their current symptoms."
- "Quality of life is severely compromised by [specific symptoms]."
- "The patient requires assistance with [specific activities] due to their condition."
### Mental Health Considerations
- "The chronic nature of this condition has resulted in significant anxiety/depression."
- "The patient's mental health is adversely affected by continued symptoms."
- "Appropriate treatment is expected to improve both physical and psychological well-being."
FILE:references/guidelines.md
# Clinical Guidelines References
## Purpose
This document provides clinical guideline references for prior authorization letter drafting.
## General Guidelines
### Rheumatoid Arthritis
- American College of Rheumatology (ACR) Guidelines
- European League Against Rheumatism (EULAR) Recommendations
### Psoriatic Arthritis
- ACR/NPF Treatment Guidelines
- GRAPPA Guidelines
### Ankylosing Spondylitis
- ASAS/EULAR Recommendations
### Inflammatory Bowel Disease
- AGA Clinical Guidelines
- ECCO Guidelines
## Documentation Requirements
### Standard PA Letter Components
1. Patient demographics
2. Provider credentials
3. Diagnosis with ICD-10 codes
4. Requested therapy details
5. Clinical rationale
6. Failed therapies documentation
7. Supporting literature
### Supporting Documents Checklist
- Prior authorization form
- Clinical notes
- Lab results
- Imaging reports (if relevant)
- Failed therapy documentation
- Peer-reviewed literature (optional)
## Best Practices
1. Always include specific ICD-10 codes
2. Document step therapy failures clearly
3. Reference specialty society guidelines
4. Include NPI for all providers
5. Maintain HIPAA compliance
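
Best practices 1 and 4 can be pre-checked before a letter is drafted. The following is a minimal sketch, not part of the packaged script: `looks_like_icd10` and `npi_is_valid` are hypothetical helper names, the ICD-10 pattern only checks the general shape of a code (it does not confirm that the code exists), and the NPI check assumes the standard Luhn check digit computed over the number prefixed with `80840`.

```python
import re

# Loose ICD-10-CM shape: letter, digit, alphanumeric, optional "." plus 1-4 alphanumerics.
# This is a format check only; it does not look the code up in any code set.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


def looks_like_icd10(code: str) -> bool:
    """Return True if the string has the general shape of an ICD-10-CM code."""
    return bool(ICD10_PATTERN.match(code.strip().upper()))


def npi_is_valid(npi: str) -> bool:
    """Return True if a 10-digit NPI passes the Luhn check with the 80840 prefix."""
    if not (npi.isascii() and npi.isdigit() and len(npi) == 10):
        return False
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(looks_like_icd10("K80.20"))
    print(npi_is_valid("1234567893"))
```

A check like this only catches typing mistakes; a placeholder NPI such as `1234567890` fails it and would need to be replaced with the provider's real identifier.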
## References
1. American College of Rheumatology. (2023). Rheumatoid Arthritis Treatment Guidelines.
2. Singh, J.A., et al. (2016). 2015 ACR Guidelines for Rheumatoid Arthritis. Arthritis Care & Research.
FILE:references/requirements.txt
# Python dependencies for prior-auth-letter-drafter
# Install with: pip install -r requirements.txt
# No external dependencies required
# Uses only Python standard library
FILE:requirements.txt
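# No external dependencies required
# Uses only the Python standard library (dataclasses is built in since Python 3.7)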
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Prior Authorization Letter Generator
Generates insurance prior authorization request letters with clinical justification.
"""
import argparse
import json
import os
import sys
from datetime import datetime
from typing import Dict, List, Optional
from dataclasses import dataclass


@dataclass
class PriorAuthRequest:
    """Data class for prior authorization request."""
    patient_name: str
    patient_id: str
    patient_dob: str
    provider_name: str
    provider_npi: str
    provider_address: str
    provider_phone: str
    service_type: str
    service_description: str
    cpt_code: Optional[str]
    icd10_codes: List[str]
    clinical_justification: str
    insurance_carrier: str
    insurance_address: str
    request_date: str = ""

    def __post_init__(self):
        # Default the request date to today when none is supplied
        if not self.request_date:
            self.request_date = datetime.now().strftime("%B %d, %Y")


class PriorAuthLetterGenerator:
    """Generator for prior authorization letters."""
    SERVICE_TEMPLATES = {
        "procedure": {
            "title": "Prior Authorization Request - Medical Procedure",
            "intro_phrase": "is medically necessary to treat the patient's condition",
        },
        "medication": {
            "title": "Prior Authorization Request - Prescription Medication",
            "intro_phrase": "is medically necessary and appropriate for this patient's condition",
        },
        "dme": {
            "title": "Prior Authorization Request - Durable Medical Equipment",
            "intro_phrase": "is medically necessary for the patient's daily functioning and care",
        },
        "imaging": {
            "title": "Prior Authorization Request - Advanced Imaging",
            "intro_phrase": "is medically necessary for accurate diagnosis and treatment planning",
        },
    }

    def __init__(self, template_dir: Optional[str] = None):
        self.template_dir = template_dir or os.path.join(
            os.path.dirname(__file__), "..", "references"
        )

    def generate_letter(self, request: PriorAuthRequest) -> str:
        """Generate a prior authorization letter as formatted text."""
        # Unknown service types fall back to the "procedure" template
        template = self.SERVICE_TEMPLATES.get(
            request.service_type, self.SERVICE_TEMPLATES["procedure"]
        )
        letter_parts = []
        # Header
        letter_parts.extend([
            request.provider_name,
            request.provider_address,
            f"Phone: {request.provider_phone}",
            f"NPI: {request.provider_npi}",
            "",
            request.request_date,
            "",
            request.insurance_carrier,
            request.insurance_address,
            "",
            "RE: Prior Authorization Request",
            f"Patient: {request.patient_name}",
            f"Member ID: {request.patient_id}",
            f"Date of Birth: {request.patient_dob}",
            "",
            "-" * 60,
            "",
        ])
        # Body
        letter_parts.extend([
            "To Whom It May Concern:",
            "",
            "I am writing to request prior authorization for the following service:",
            "",
            f"Service: {request.service_description}",
        ])
        if request.cpt_code:
            letter_parts.append(f"CPT/HCPCS Code: {request.cpt_code}")
        letter_parts.extend([
            f"ICD-10 Diagnosis Code(s): {', '.join(request.icd10_codes)}",
            "",
            "CLINICAL JUSTIFICATION:",
            "",
        ])
        # Add clinical justification with proper formatting
        justification_lines = request.clinical_justification.strip().split('\n')
        for line in justification_lines:
            letter_parts.append(line)
        # Standard closing
        letter_parts.extend([
            "",
            f"Based on my clinical assessment, the requested {request.service_description} {template['intro_phrase']}. "
            "Alternative treatments have been considered and are not appropriate for this patient due to the specific clinical circumstances outlined above.",
            "",
            "Please contact my office if additional information is required to process this authorization request. "
            "I am available to discuss this case during normal business hours.",
            "",
            "Thank you for your prompt attention to this matter.",
            "",
            "Sincerely,",
            "",
            "",
            "_______________________________",
            f"{request.provider_name}, M.D.",
            f"NPI: {request.provider_npi}",
            "",
            "-" * 60,
            "",
            "Attachments: Clinical notes, Lab results, Supporting documentation (if applicable)",
        ])
        return '\n'.join(letter_parts)

    def save_letter(self, letter_text: str, output_path: str) -> str:
        """Save the letter to a file."""
        # Ensure output directory exists
        os.makedirs(os.path.dirname(output_path) or '.', exist_ok=True)
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(letter_text)
        return output_path

    def generate_from_json(self, json_path: str, output_path: str) -> str:
        """Generate letter from JSON input file."""
        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        # Handle icd10_codes as string or list
        icd10_codes = data.get('icd10_codes', data.get('icd10_code', []))
        if isinstance(icd10_codes, str):
            icd10_codes = [code.strip() for code in icd10_codes.split(',')]
        # Fields accessed with data[...] are required; missing ones raise KeyError
        request = PriorAuthRequest(
            patient_name=data['patient_name'],
            patient_id=data['patient_id'],
            patient_dob=data.get('patient_dob', ''),
            provider_name=data['provider_name'],
            provider_npi=data['provider_npi'],
            provider_address=data.get('provider_address', ''),
            provider_phone=data.get('provider_phone', ''),
            service_type=data.get('service_type', 'procedure'),
            service_description=data.get('service_description', data.get('service_type', '')),
            cpt_code=data.get('cpt_code'),
            icd10_codes=icd10_codes,
            clinical_justification=data.get('clinical_justification', data.get('justification', '')),
            insurance_carrier=data['insurance_carrier'],
            insurance_address=data.get('insurance_address', ''),
        )
        letter_text = self.generate_letter(request)
        return self.save_letter(letter_text, output_path)


def create_sample_input(output_path: str):
    """Create a sample input JSON file."""
    sample_data = {
        "patient_name": "John Smith",
        "patient_id": "INS123456789",
        "patient_dob": "1980-05-15",
        "provider_name": "Dr. Sarah Johnson",
        "provider_npi": "1234567890",
        "provider_address": "123 Medical Center Drive, Suite 200, City, State 12345",
        "provider_phone": "(555) 123-4567",
        "service_type": "procedure",
        "service_description": "Laparoscopic Cholecystectomy",
        "cpt_code": "47562",
        "icd10_codes": ["K80.20"],
        "clinical_justification": """The patient presents with symptomatic cholelithiasis with recurrent biliary colic.
Key clinical findings:
- Multiple episodes of right upper quadrant pain over the past 3 months
- Ultrasound confirmed gallstones with no evidence of common bile duct obstruction
- Patient has failed conservative management with dietary modifications
- Pain episodes are increasingly frequent and severe, affecting quality of life
- No contraindications to laparoscopic approach
Surgical intervention is indicated to prevent complications including acute cholecystitis, pancreatitis, and bile duct obstruction.""",
        "insurance_carrier": "Blue Cross Blue Shield",
        "insurance_address": "Prior Authorization Department, P.O. Box 12345, City, State 12345"
    }
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(sample_data, f, indent=2)
    return output_path


def main():
    parser = argparse.ArgumentParser(
        description="Generate prior authorization request letters for insurance companies"
    )
    parser.add_argument(
        '--input', '-i',
        help='Path to JSON input file with patient and service details'
    )
    parser.add_argument(
        '--output', '-o',
        default='prior_auth_letter.txt',
        help='Output file path (default: prior_auth_letter.txt)'
    )
    parser.add_argument(
        '--create-sample',
        action='store_true',
        help='Create a sample input JSON file for reference'
    )
    parser.add_argument(
        '--sample-output',
        default='sample_input.json',
        help='Path for sample input file (used with --create-sample)'
    )
    args = parser.parse_args()
    if args.create_sample:
        path = create_sample_input(args.sample_output)
        print(f"Sample input file created: {path}")
        print("Edit this file with actual patient information and run again with --input")
        return 0
    if not args.input:
        print("Error: --input is required (or use --create-sample to generate a template)")
        parser.print_help()
        return 1
    if not os.path.exists(args.input):
        print(f"Error: Input file not found: {args.input}")
        return 1
    try:
        generator = PriorAuthLetterGenerator()
        output_path = generator.generate_from_json(args.input, args.output)
        print(f"Prior authorization letter generated: {output_path}")
        return 0
    except KeyError as e:
        # Only the fields read with data[...] in generate_from_json can raise here
        print(f"Error: Missing required field in input file: {e}")
        print("Required fields: patient_name, patient_id, provider_name, provider_npi,")
        print("                 insurance_carrier")
        return 1
    except Exception as e:
        print(f"Error generating letter: {e}")
        return 1


if __name__ == '__main__':
    sys.exit(main())
Check primers for dimers, hairpins, and off-target amplification
---
name: primer-design-check
description: Check primers for dimers, hairpins, and off-target amplification
version: 1.0.0
category: Wet Lab
tags: []
author: AIPOCH
license: MIT
status: Draft
risk_level: Medium
skill_type: Tool/Script
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
---
# Primer Design Check
In silico primer validation tool.
## Use Cases
- qPCR primer design
- Sequencing primer check
- Mutagenesis primer validation
## Parameters
- `forward_primer`: F sequence
- `reverse_primer`: R sequence
- `template`: Target genome (optional; not read by the packaged `scripts/main.py`)
## Returns
- Dimer prediction
- Hairpin analysis
- Off-target BLAST results (not implemented in the packaged `scripts/main.py`)
- Tm and GC% calculations
## Example
Flags: Self-dimer detected at 3' end → redesign recommended
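
The Tm and GC% values in the returns list follow simplified rules in the packaged script: the Wallace rule (2 °C per A/T, 4 °C per G/C) for primers shorter than 14 nt, and 64.9 + 41 × (G + C − 16.4) / N for longer ones. The sketch below restates those two calculations on their own; `gc_percent` and `simple_tm` are illustrative names, not functions exported by the skill, and neither formula accounts for salt or primer concentration.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty primer sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def simple_tm(seq: str) -> float:
    """Approximate melting temperature in degrees Celsius."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if len(seq) < 14:
        # Wallace rule for short oligos
        return 2.0 * at + 4.0 * gc
    # Length-adjusted GC formula for longer oligos
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


if __name__ == "__main__":
    primer = "ATCGATCGATCGATCG"  # same demo sequence the script falls back to
    print(f"GC% = {gc_percent(primer):.1f}, Tm = {simple_tm(primer):.1f} °C")
```

For the 16-nt demo sequence this gives a GC content of 50% and a Tm of roughly 43.4 °C, which is low for most PCR protocols; treat the number as a rough screen and not as a thermodynamic prediction.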
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Primer Design Check
Check primers for dimers, hairpins, and off-target amplification.
"""
import argparse
import re


class PrimerChecker:
    """Check primer quality."""

    def calculate_tm(self, sequence):
        """Calculate melting temperature (simplified)."""
        A = sequence.count('A')
        T = sequence.count('T')
        G = sequence.count('G')
        C = sequence.count('C')
        if len(sequence) < 14:
            tm = 2 * (A + T) + 4 * (G + C)
        else:
            tm = 64.9 + 41 * (G + C - 16.4) / len(sequence)
        return tm

    def check_hairpin(self, sequence):
        """Check for hairpin structures."""
        # Simplified check for self-complementarity
        rev_comp = self.reverse_complement(sequence)
        # Check 3' end complementarity
        end_match = sum(1 for a, b in zip(sequence[-5:], rev_comp[:5]) if a == b)
        if end_match >= 3:
            return True, f"Potential hairpin (3' complementarity: {end_match}/5)"
        return False, "No significant hairpin detected"

    def check_self_dimer(self, sequence):
        """Check for self-dimer formation."""
        # Simple check for 3' end complementarity with itself
        rev_comp = self.reverse_complement(sequence)
        # Check last 4 bases
        matches = sum(1 for a, b in zip(sequence[-4:], rev_comp[-4:]) if a == b)
        if matches >= 3:
            return True, f"Potential self-dimer ({matches}/4 bases match)"
        return False, "Low self-dimer risk"

    def reverse_complement(self, seq):
        """Get reverse complement."""
        complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G', 'N': 'N'}
        return ''.join(complement.get(base, base) for base in reversed(seq))

    def check_primer(self, sequence, name="Primer"):
        """Comprehensive primer check."""
        results = {
            "name": name,
            "sequence": sequence,
            "length": len(sequence),
            "tm": self.calculate_tm(sequence),
            "gc_content": (sequence.count('G') + sequence.count('C')) / len(sequence) * 100
        }
        # Check for hairpin
        has_hairpin, hairpin_msg = self.check_hairpin(sequence)
        results["hairpin"] = has_hairpin
        results["hairpin_comment"] = hairpin_msg
        # Check for self-dimer
        has_dimer, dimer_msg = self.check_self_dimer(sequence)
        results["self_dimer"] = has_dimer
        results["dimer_comment"] = dimer_msg
        return results

    def print_report(self, results):
        """Print primer check report."""
        print(f"\n{'='*60}")
        print(f"PRIMER CHECK: {results['name']}")
        print(f"{'='*60}\n")
        print(f"Sequence: {results['sequence']}")
        print(f"Length: {results['length']} bp")
        print(f"Tm: {results['tm']:.1f}°C")
        print(f"GC Content: {results['gc_content']:.1f}%")
        print()
        status = "✓ PASS" if not (results['hairpin'] or results['self_dimer']) else "✗ FAIL"
        print(f"Overall: {status}")
        print()
        print(f"Hairpin: {results['hairpin_comment']}")
        print(f"Dimer: {results['dimer_comment']}")
        print(f"\n{'='*60}\n")


def main():
    parser = argparse.ArgumentParser(description="Primer Design Check")
    parser.add_argument("--forward", "-f", help="Forward primer sequence")
    parser.add_argument("--reverse", "-r", help="Reverse primer sequence")
    args = parser.parse_args()
    checker = PrimerChecker()
    if args.forward:
        results = checker.check_primer(args.forward.upper(), "Forward")
        checker.print_report(results)
    if args.reverse:
        results = checker.check_primer(args.reverse.upper(), "Reverse")
        checker.print_report(results)
    if not args.forward and not args.reverse:
        # Demo
        demo_primer = "ATCGATCGATCGATCG"
        results = checker.check_primer(demo_primer, "Demo Primer")
        checker.print_report(results)


if __name__ == "__main__":
    main()
Creates engaging opening statements and powerful closings for medical.
---
name: presentation-hook
description: Creates engaging opening statements and powerful closings for medical.
license: MIT
skill-author: AIPOCH
---
# Presentation Hook
Crafts presentation openings and closings.
## When to Use
- Use this skill when the task is to create engaging opening statements and powerful closings for medical presentations.
- Use this skill for other tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when the response must stay inside the documented task boundary instead of expanding into adjacent work.
## Key Features
See `## Features` below for related details.
- Scope-focused workflow aligned to: creating engaging opening statements and powerful closings for medical presentations.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.
## Example Usage
```bash
cd "20260318/scientific-skills/Academic Writing/presentation-hook"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Features
- Attention-grabbing openings
- Memorable closings
- Audience-specific hooks
- Storytelling elements
## Input Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `topic` | str | Yes | Presentation topic |
| `audience` | str | Yes | Target audience |
| `type` | str | Yes | "opening" or "closing" |
## Output Format
```json
{
"hook": "string",
"alternative_hooks": ["string"]
}
```
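
The input table and output format above can be tied together with a small validation and serialization step. This is a sketch under assumptions: `validate_request` and `format_result` are hypothetical helpers, not part of the packaged script, and rejecting unknown `type` values is a design choice made here for illustration.

```python
import json

ALLOWED_TYPES = {"opening", "closing"}


def validate_request(topic: str, audience: str, hook_type: str) -> None:
    """Raise ValueError if a required parameter is blank or `type` is unsupported."""
    fields = (("topic", topic), ("audience", audience), ("type", hook_type))
    missing = [name for name, value in fields if not value.strip()]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    if hook_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}, got {hook_type!r}")


def format_result(hook: str, alternatives: list) -> str:
    """Serialize a result in the documented output shape."""
    return json.dumps({"hook": hook, "alternative_hooks": list(alternatives)}, indent=2)


if __name__ == "__main__":
    validate_request("diabetes management", "clinicians", "opening")
    print(format_result("Example opening line.", ["Example alternative."]))
```

Validating `type` up front avoids silently producing a closing when the caller misspells `opening`.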
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `presentation-hook` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `presentation-hook` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/guidelines.md
# Presentation Hook - References
## Public Speaking
- Presentation Skills
- Storytelling Techniques
FILE:scripts/main.py
#!/usr/bin/env python3
"""Presentation Hook - Opening and closing generators."""
import json


class PresentationHook:
    """Creates presentation hooks."""

    def generate(self, topic: str, audience: str, hook_type: str) -> dict:
        """Generate hook."""
        if hook_type == "opening":
            hook = f"What if I told you that {topic.lower()} could change everything we know about patient care?"
            alternatives = [
                f"Every year, thousands of patients face challenges with {topic.lower()}.",
                f"Imagine a world where {topic.lower()} is no longer a barrier."
            ]
        else:  # closing
            hook = f"Together, we can transform {topic.lower()} and improve patient outcomes."
            alternatives = [
                f"The future of {topic.lower()} starts with the actions we take today.",
                f"Thank you. Let's make a difference in {topic.lower()}."
            ]
        return {
            "hook": hook,
            "alternative_hooks": alternatives,
            "type": hook_type
        }


def main():
    gen = PresentationHook()
    result = gen.generate("diabetes management", "clinicians", "opening")
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
Use preclinical pkpd analyst for data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.
---
name: preclinical-pkpd-analyst
description: Use preclinical pkpd analyst for data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.
license: MIT
skill-author: AIPOCH
---
# Pre-clinical PK/PD Analyst
Pharmacokinetic analysis automation.
## When to Use
- Use this skill when the task is a preclinical PK/PD data analysis workflow that needs structured execution, explicit assumptions, and clear output boundaries.
- Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: preclinical PK/PD data analysis that needs structured execution, explicit assumptions, and clear output boundaries.
- Packaged executable path(s): `scripts/main.py`.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `numpy`: `unspecified`. Declared in `requirements.txt`.
- `scipy`: `unspecified`. Declared in `requirements.txt`.
## Example Usage
```bash
cd "20260318/scientific-skills/Data Analytics/preclinical-pkpd-analyst"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Use Cases
- IND-enabling studies
- Dose selection
- Drug candidate ranking
- WinNonlin alternative
## Parameters
- `concentration_data`: Time-conc pairs
- `dose`: Administered dose
- `admin_route`: IV/PO/SC
## Returns
- AUC, Cmax, Tmax, T1/2
- Clearance and volume
- Non-compartmental analysis
- PK report template
## Example
Rat PK data → AUC = 1250 ng·h/mL, T1/2 = 4.2h
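
The two headline values in that example, AUC and T1/2, can be obtained without model fitting. The sketch below shows one non-compartmental approach using only the standard library: a linear trapezoidal AUC and a terminal half-life from a log-linear regression over the last few points. The function names, the choice of three terminal points, and the concentration values are all illustrative assumptions; the packaged script uses a different route for half-life (a fitted one-compartment model).

```python
import math


def auc_trapezoid(times, concs):
    """AUC(0-t) by the linear trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        total += (times[i] - times[i - 1]) * (concs[i] + concs[i - 1]) / 2.0
    return total


def terminal_half_life(times, concs, n_points=3):
    """Half-life from the slope of ln(concentration) over the last n_points samples."""
    if n_points < 2 or n_points > len(times):
        raise ValueError("n_points must be between 2 and the number of samples")
    t = times[-n_points:]
    c = concs[-n_points:]
    if any(value <= 0 for value in c):
        raise ValueError("terminal concentrations must be positive")
    ln_c = [math.log(value) for value in c]
    t_mean = sum(t) / n_points
    y_mean = sum(ln_c) / n_points
    # Ordinary least-squares slope of ln(C) versus time
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, ln_c)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    lambda_z = -slope  # terminal elimination rate constant
    if lambda_z <= 0:
        raise ValueError("concentrations are not declining in the terminal phase")
    return math.log(2) / lambda_z


if __name__ == "__main__":
    # Made-up concentrations for illustration only, not study data
    times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
    concs = [0.0, 180.0, 240.0, 200.0, 130.0, 55.0]
    print(f"AUC(0-t) = {auc_trapezoid(times, concs):.1f}")
    print(f"t1/2 = {terminal_half_life(times, concs):.2f} h")
```

The number of terminal points strongly affects the half-life estimate, so it is worth reporting alongside the result.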
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
```text
# Python dependencies
pip install -r requirements.txt
```
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `preclinical-pkpd-analyst` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `preclinical-pkpd-analyst` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:requirements.txt
numpy
scipy
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Preclinical PK/PD Analyst
Calculate PK parameters from blood concentration-time data.
"""
import argparse

import numpy as np
from scipy.optimize import curve_fit

# np.trapz was renamed to np.trapezoid in NumPy 2.0; use whichever exists.
_trapezoid = getattr(np, "trapezoid", None) or np.trapz


class PKPDAnalyzer:
    """Analyze preclinical PK/PD data."""

    def one_compartment_model(self, t, A, ke, ka):
        """One-compartment oral model."""
        if ka == ke:
            return A * t * np.exp(-ke * t)
        return A * (np.exp(-ke * t) - np.exp(-ka * t))

    def calculate_pk_parameters(self, time_points, concentrations):
        """Calculate PK parameters from data."""
        # Fit model; initial guesses are A = observed Cmax, ke = 0.1/h, ka = 1.0/h
        popt, _ = curve_fit(self.one_compartment_model, time_points, concentrations,
                            p0=[max(concentrations), 0.1, 1.0],
                            maxfev=10000)
        A, ke, ka = popt
        # Observed peak concentration and the time at which it occurs
        cmax = max(concentrations)
        tmax = time_points[np.argmax(concentrations)]
        # AUC(0-t) by the linear trapezoidal rule
        auc = _trapezoid(concentrations, time_points)
        # Elimination half-life: ln(2) / ke
        t_half = np.log(2) / ke
        # Apparent clearance CL/F = dose / AUC, assuming a normalized dose of 1
        # (L/h if that dose is 1 mg and concentrations are in μg/mL)
        cl = 1 / auc if auc > 0 else 0
        return {
            "Cmax": cmax,
            "Tmax": tmax,
            "AUC": auc,
            "t_half": t_half,
            "ke": ke,
            "ka": ka,
            "Clearance": cl
        }

    def print_parameters(self, params):
        """Print PK parameters."""
        print(f"\n{'='*60}")
        print("PK PARAMETERS")
        print(f"{'='*60}\n")
        print(f"Cmax: {params['Cmax']:.2f} μg/mL")
        print(f"Tmax: {params['Tmax']:.2f} h")
        print(f"AUC(0-t): {params['AUC']:.2f} μg·h/mL")
        print(f"t1/2: {params['t_half']:.2f} h")
        print(f"ke: {params['ke']:.4f} 1/h")
        print(f"ka: {params['ka']:.4f} 1/h")
        print(f"Clearance: {params['Clearance']:.4f} L/h")
        print(f"\n{'='*60}\n")


def main():
    parser = argparse.ArgumentParser(description="Preclinical PK/PD Analyst")
    parser.add_argument("--data", "-d", help="PK data file (time,concentration)")
    parser.add_argument("--demo", action="store_true", help="Run demo analysis")
    args = parser.parse_args()
    analyzer = PKPDAnalyzer()
    if args.demo or not args.data:
        # Demo data (also used when no --data file is given)
        time_points = np.array([0, 0.5, 1, 2, 4, 8, 12, 24])
        concentrations = np.array([0, 5.2, 8.5, 7.8, 5.1, 2.8, 1.5, 0.4])
    else:
        # Two comma-separated columns: time, concentration
        data = np.loadtxt(args.data, delimiter=',')
        time_points = data[:, 0]
        concentrations = data[:, 1]
    params = analyzer.calculate_pk_parameters(time_points, concentrations)
    analyzer.print_parameters(params)


if __name__ == "__main__":
    main()
Plan academic poster layouts, covering section placement, visual hierarchy, size recommendations, and content flow.
---
name: poster-layout-planner
description: Plan academic poster layouts, covering section placement, visual hierarchy, size recommendations, and content flow.
license: MIT
skill-author: AIPOCH
---
# Poster Layout Planner
Designs academic poster layouts.
## When to Use
- Use this skill when the task is to plan an academic poster layout: section placement, visual hierarchy, size recommendations, or content flow.
- Use this skill for other tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when the response must stay inside the documented task boundary instead of expanding into adjacent work.
## Key Features
See `## Features` below for related details.
- Scope-focused workflow aligned to: planning academic poster layouts (section placement, visual hierarchy, size recommendations, content flow).
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.
## Example Usage
```bash
cd "20260318/scientific-skills/Academic Writing/poster-layout-planner"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Features
- Section placement
- Visual hierarchy
- Size recommendations
- Content flow optimization
## Input Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `size` | str | Yes | Poster dimensions |
| `sections` | list | Yes | Content sections |
## Output Format
```json
{
  "layout_plan": "string",
  "section_placement": "dict",
  "recommendations": "list"
}
```
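The two-column placement behind `section_placement` can be sketched as follows. `split_columns` mirrors the slicing used in `scripts/main.py`; `parse_size` is a hypothetical helper (not in the packaged script) that assumes sizes are written like `36x48 inches`.

```python
import json


def split_columns(sections):
    """Split sections into left/right columns; the right column takes any extra item."""
    mid = len(sections) // 2
    return sections[:mid], sections[mid:]


def parse_size(size):
    """Parse a size string such as '36x48 inches' into (width, height, unit)."""
    dims, _, unit = size.partition(" ")
    width, height = (float(v) for v in dims.lower().split("x"))
    return width, height, unit or "unspecified"


left, right = split_columns(
    ["Introduction", "Methods", "Results", "Discussion", "Conclusion"]
)
print(json.dumps({"left_column": left, "right_column": right}, indent=2))
print(parse_size("36x48 inches"))
```

With an odd number of sections the right column is one item longer. Named paper sizes such as `A0` are not handled by this sketch and raise `ValueError`.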
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `poster-layout-planner` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `poster-layout-planner` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/guidelines.md
# Poster Layout Planner - References
## Poster Design
- Academic Poster Guidelines
- Visual Design Principles
FILE:scripts/main.py
#!/usr/bin/env python3
"""Poster Layout Planner - Academic poster design."""
import json


class PosterLayoutPlanner:
    """Plans poster layouts."""

    def plan(self, size: str, sections: list) -> dict:
        """Generate layout plan."""
        # First half of the sections goes left, the remainder goes right
        placement = {
            "top": ["Title", "Authors"],
            "left_column": sections[:len(sections)//2],
            "right_column": sections[len(sections)//2:],
            "bottom": ["References", "Acknowledgments"]
        }
        return {
            "layout_plan": f"{size} poster with 2-column layout",
            "section_placement": placement,
            "recommendations": ["Keep title visible from 6 feet", "Use high contrast colors"]
        }


def main():
    planner = PosterLayoutPlanner()
    result = planner.plan("36x48 inches", ["Introduction", "Methods", "Results", "Conclusion"])
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
Filter and match postdoctoral fellowship opportunities based on applicant nationality, years since PhD, and research field from a curated database.
---
name: postdoc-fellowship-matcher
description: Filter and match postdoctoral fellowship opportunities based on applicant nationality, years since PhD, and research field from a curated database.
license: MIT
skill-author: AIPOCH
---
# Postdoc Fellowship Matcher
Filter postdoctoral fellowships based on applicant nationality, years since PhD, and research area.
## Quick Check
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## When to Use
- Use this skill when a postdoc applicant needs to identify eligible fellowships based on their nationality, career stage, and research field.
- Use this skill when comparing fellowship requirements and deadlines across multiple programs.
- Do not use this skill to draft fellowship applications, write personal statements, or guarantee eligibility determinations.
## Workflow
1. Confirm the applicant's nationality, years since PhD completion, and research field.
2. Validate that the request is for fellowship matching, not application writing or eligibility certification.
3. Filter the fellowship database against the provided criteria.
4. Return a ranked list of eligible fellowships with deadlines, requirements, and notes.
5. If inputs are incomplete, state which fields are missing and request only the minimum additional information.
## Usage
```text
python scripts/main.py --nationality US --years 1 --field "immunology"
python scripts/main.py --nationality CN --years 3 --field "structural biology" --name "Dr. Zhang"
```
## Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--nationality` | string | Yes | Applicant nationality (e.g., `US`, `CN`, `DE`) |
| `--years` | integer | Yes | Years since PhD completion |
| `--field` | string | Yes | Research field (e.g., `immunology`, `neuroscience`) |
| `--name` | string | No | Applicant name (for report header) |
## Fellowship Database
Includes: NIH F32 · NSF Postdoctoral Fellowships · HFSP Fellowship · EMBO Fellowship · Marie Curie Fellowships · Schmidt Science Fellows
→ Full fellowship details: [references/fellowships.md](references/fellowships.md)
## Field Input Normalization
The `--field` parameter accepts free-text research field names. Common aliases are normalized automatically:
| Input | Normalized To |
|-------|---------------|
| `structural bio` | `structural biology` |
| `cell bio` | `cell biology` |
| `neuro` | `neuroscience` |
| `immuno` | `immunology` |
If your field is not recognized, the skill will return the closest matches and ask you to confirm.
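A minimal sketch of this normalization, assuming the alias table above and the standard-library `difflib` for closest matches. The names `FIELD_ALIASES`, `KNOWN_FIELDS`, and `normalize_field` are hypothetical, and the list of canonical fields is limited to the four documented here.

```python
import difflib

# Alias table copied from the documentation above
FIELD_ALIASES = {
    "structural bio": "structural biology",
    "cell bio": "cell biology",
    "neuro": "neuroscience",
    "immuno": "immunology",
}
# Hypothetical canonical list; the full set of recognized fields is not documented
KNOWN_FIELDS = sorted(set(FIELD_ALIASES.values()))


def normalize_field(raw):
    """Return (normalized_field, suggestions).

    normalized_field is None when the input is not recognized; in that case
    suggestions holds the closest known fields for the user to confirm.
    """
    key = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    if key in FIELD_ALIASES:
        return FIELD_ALIASES[key], []
    if key in KNOWN_FIELDS:
        return key, []
    return None, difflib.get_close_matches(key, KNOWN_FIELDS, n=3, cutoff=0.6)


print(normalize_field("Neuro"))
print(normalize_field("immunolgy"))
```

Returning suggestions instead of silently picking the best match keeps the confirmation step with the user.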
## Output
- Eligible fellowships list with match rationale
- Deadlines and key requirements per fellowship
- Notes on eligibility caveats and verification steps
## Scope Boundaries
- Fellowship data reflects the built-in database; verify current deadlines and requirements on official program websites before applying.
- This skill does not access live fellowship databases or real-time deadline updates.
- Eligibility output is a filter result, not a legal or official determination.
## Stress-Case Rules
For complex multi-constraint requests, always include these explicit blocks:
1. Assumptions
2. Filter Criteria Applied
3. Matched Fellowships
4. Caveats and Verification Steps
5. Next Checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate fellowship deadlines, eligibility rules, or success rate statistics.
## Input Validation
This skill accepts: applicant nationality, years since PhD, and research field for fellowship eligibility filtering.
If the request does not involve fellowship matching — for example, asking to write a fellowship application, provide career counseling, or access live grant databases — do not proceed with the workflow. Instead respond:
> "postdoc-fellowship-matcher is designed to filter fellowship opportunities based on applicant profile criteria. Your request appears to be outside this scope. For application writing support, consider using an academic writing skill or consulting your institution's postdoc office. Please provide nationality, years since PhD, and research field, or use a more appropriate tool."
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:POLISH_CHANGELOG.md
# POLISH_CHANGELOG — postdoc-fellowship-matcher
**Original Score:** 79
**Polish Date:** 2026-03-19
## Issues Addressed
### P0 / Veto Fixes
- None (no veto failures)
### P1 Fixes
- **Fellowship database details missing:** Added "Field Input Normalization" section with alias mapping table (e.g., `neuro` → `neuroscience`). Added link to `references/fellowships.md` from the Fellowship Database section.
- **references/fellowships.md created:** New file with full fellowship details (NIH F32, NSF PRFB, HFSP, EMBO, MSCA, Schmidt Science Fellows) including eligibility, deadlines, duration, and official URLs.
### P2 Fixes
- **Input Validation redirect improved:** Added specific redirect suggestion ("consult your institution's postdoc office") for out-of-scope application writing requests.
### QS-1 (Input Validation)
- Already present; redirect message strengthened.
### QS-2 (Progressive Disclosure)
- Fellowship details moved to `references/fellowships.md` to keep SKILL.md concise.
### QS-3 (Canonical YAML Frontmatter)
- Already present with all four required fields.
FILE:references/fellowships.md
# Fellowship Database Reference
This file contains the fellowship database used by postdoc-fellowship-matcher. Update deadlines and eligibility rules here without modifying the core SKILL.md.
> **Note:** Always verify current deadlines and requirements on official program websites before applying. This database reflects known information and may not reflect the latest updates.
## NIH Ruth L. Kirschstein NRSA Individual Postdoctoral Fellowship (F32)
- **Eligibility:** US citizens and permanent residents only
- **Years since PhD:** 0–5 years
- **Fields:** Biomedical, behavioral, clinical, and social sciences
- **Deadline:** Three cycles per year (April, August, December; check NIH for exact dates)
- **Duration:** Up to 3 years
- **URL:** https://grants.nih.gov/grants/guide/pa-files/PA-23-271.html
## NSF Postdoctoral Research Fellowships in Biology (PRFB)
- **Eligibility:** US citizens and permanent residents only
- **Years since PhD:** 0–2 years
- **Fields:** Biological sciences
- **Deadline:** Annual (typically November)
- **Duration:** 2–3 years
- **URL:** https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503622
## Human Frontier Science Program (HFSP) Long-Term Fellowship
- **Eligibility:** International (must change country for postdoc)
- **Years since PhD:** 0–3 years
- **Fields:** Life sciences with interdisciplinary focus
- **Deadline:** Annual (typically May)
- **Duration:** 3 years
- **URL:** https://www.hfsp.org/funding/hfsp-funding/postdoctoral-fellowships
## EMBO Postdoctoral Fellowship
- **Eligibility:** Must move to a different country; EMBC member states
- **Years since PhD:** 0–2 years
- **Fields:** Life sciences
- **Deadline:** Two cycles per year (Feb, Aug)
- **Duration:** Up to 2 years
- **URL:** https://www.embo.org/funding/fellowships-awards-and-grants/postdoctoral-fellowships/
## Marie Skłodowska-Curie Actions (MSCA) Postdoctoral Fellowships
- **Eligibility:** Any nationality; must move to a different country
- **Years since PhD:** 0–8 years
- **Fields:** All disciplines
- **Deadline:** Annual (typically September)
- **Duration:** 1–2 years (European Fellowships), up to 3 years (Global Fellowships)
- **URL:** https://marie-sklodowska-curie-actions.ec.europa.eu/actions/postdoctoral-fellowships
## Schmidt Science Fellows
- **Eligibility:** International; must pivot to a new field
- **Years since PhD:** 0–1 year
- **Fields:** Natural sciences, engineering, mathematics, computing (with pivot requirement)
- **Deadline:** Annual (typically January)
- **Duration:** 1 year
- **URL:** https://schmidtsciencefellows.org/
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Postdoc Fellowship Matcher
Match postdoc applicants to eligible fellowships.
"""
import argparse


class FellowshipMatcher:
    """Match applicants to postdoctoral fellowships."""

    # Built-in subset of the programs listed in references/fellowships.md
    FELLOWSHIPS = [
        {
            "name": "NIH F32",
            "eligible_nationalities": ["US", "permanent resident"],
            "max_years_post_phd": 3,
            "deadline": "April/August/December"
        },
        {
            "name": "NSF Postdoc",
            "eligible_nationalities": ["US", "permanent resident"],
            "max_years_post_phd": 2,
            "deadline": "October"
        },
        {
            "name": "EMBO Fellowship",
            "eligible_nationalities": ["any"],
            "max_years_post_phd": 2,
            "deadline": "February/August"
        },
        {
            "name": "HFSP Fellowship",
            "eligible_nationalities": ["any"],
            "max_years_post_phd": 3,
            "deadline": "March/August"
        }
    ]

    def find_matches(self, nationality, years_post_phd, field):
        """Find eligible fellowships.

        Filters on nationality and years since PhD. The field argument is
        accepted for reporting only and is not yet used as a filter.
        """
        matches = []
        for fellowship in self.FELLOWSHIPS:
            # Check nationality
            nat_ok = ("any" in fellowship["eligible_nationalities"] or
                      nationality in fellowship["eligible_nationalities"])
            # Check years
            years_ok = years_post_phd <= fellowship["max_years_post_phd"]
            if nat_ok and years_ok:
                matches.append(fellowship)
        return matches

    def print_matches(self, matches, applicant_info):
        """Print matching fellowships."""
        print(f"\n{'='*60}")
        print(f"FELLOWSHIP MATCHES FOR {applicant_info['name']}")
        print(f"{'='*60}\n")
        print(f"Nationality: {applicant_info['nationality']}")
        print(f"Years post-PhD: {applicant_info['years']}")
        print(f"Field: {applicant_info['field']}")
        print()
        if matches:
            print(f"Found {len(matches)} eligible fellowship(s):\n")
            for f in matches:
                print(f" • {f['name']}")
                print(f"   Deadline: {f['deadline']}")
                print()
        else:
            print("No eligible fellowships found with current criteria.")
        print(f"{'='*60}\n")


def main():
    parser = argparse.ArgumentParser(description="Postdoc Fellowship Matcher")
    parser.add_argument("--nationality", "-n", required=True, help="Your nationality")
    parser.add_argument("--years", "-y", type=int, required=True, help="Years since PhD")
    parser.add_argument("--field", "-f", required=True, help="Research field")
    parser.add_argument("--name", default="Applicant", help="Your name")
    args = parser.parse_args()
    matcher = FellowshipMatcher()
    applicant = {
        "name": args.name,
        "nationality": args.nationality,
        "years": args.years,
        "field": args.field
    }
    matches = matcher.find_matches(args.nationality, args.years, args.field)
    matcher.print_matches(matches, applicant)


if __name__ == "__main__":
    main()
Use when: User provides text/document and asks to check originality, detect plagiarism, assess similarity, or rewrite high-duplicate content. Triggers: "chec...
---
name: plagiarism-checker-pre-screener
description: "Use when: User provides text/document and asks to check originality,\
\ \ndetect plagiarism, assess similarity, or rewrite high-duplicate content.\nTriggers:\
\ \"check plagiarism\", \"originality check\", \"similarity detection\",\n\"改写重复内容\"\
, \"降重\", \"查重\", \"原创性检测\", \"抄袭检查\"\nInput: Text content or document (txt, md,\
\ docx support via text extraction)\nOutput: Originality score, highlighted duplicate/similar\
\ paragraphs, paraphrasing suggestions"
version: 1.0.0
category: Research
tags: []
author: AIPOCH
license: MIT
status: Draft
risk_level: Medium
skill_type: Tool/Script
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
---
# Plagiarism Checker Pre-Screener
Pre-screens text for potential plagiarism by detecting similarity patterns and providing paraphrasing suggestions for high-duplicate sections.
## Technical Difficulty: High ⚠️
> **AI autonomous acceptance status**: Manual review required
> This skill uses advanced NLP techniques. Results should be manually reviewed before submission.
## Features
1. **Text Similarity Detection**: Identifies potentially plagiarized or highly similar text segments
2. **Originality Scoring**: Provides overall originality percentage (0-100%)
3. **Paraphrasing Suggestions**: Offers AI-powered rewriting for flagged sections
4. **Segment Analysis**: Breaks text into sentences/paragraphs for granular checking
## Usage
### Basic Check
```bash
python scripts/main.py --input "Your text here" --threshold 0.75
```
### File Analysis
```bash
python scripts/main.py --file document.txt --output report.json
```
### With Paraphrasing
```bash
python scripts/main.py --input "text" --paraphrase --style academic
```
## Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `--input` | string | - | Direct text input (alternative to --file) |
| `--file` | path | - | Path to text file to analyze |
| `--threshold` | float | 0.70 | Similarity threshold (0.0-1.0) for flagging |
| `--paraphrase` | flag | false | Enable paraphrasing suggestions |
| `--style` | string | neutral | Paraphrasing style: academic/formal/casual/neutral |
| `--output` | path | stdout | Output file path (JSON format) |
| `--segments` | string | sentence | Analysis unit: sentence/paragraph |
## Output Format
```json
{
"originality_score": 83.3,
"total_segments": 12,
"flagged_segments": 2,
"segments": [
{
"index": 1,
"text": "Original sentence text...",
"similarity_score": 0.92,
"flagged": true,
"paraphrase_suggestion": "Rewritten version..."
}
],
"summary": "Text shows high originality with minor flagged sections"
}
```
## Implementation Notes
- Uses TF-IDF + Cosine Similarity for local similarity detection
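- Combines cosine similarity with n-gram and Jaccard overlap into a weighted score (see `references/algorithm.md`)
- Paraphrasing uses rule-based transformations: synonym replacement, syntactic restructuring, and style adaptation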
- No external API calls required; runs locally
## References
- `references/algorithm.md` - Technical algorithm details
- `references/paraphrasing_guide.md` - Paraphrasing methodology
## Limitations
1. Cannot access external databases (internet search required for comprehensive checking)
2. Local similarity only - won't catch plagiarism from external sources
3. Paraphrasing quality depends on input text complexity
4. Processing time increases with document length
## Safety & Privacy
- All processing is local - no text sent to external APIs
- Suitable for sensitive/confidential documents
- No data retention after analysis completes
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
```bash
# Python dependencies
pip install -r requirements.txt
```
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
FILE:references/algorithm.md
# Algorithm Technical Documentation
## Similarity Detection Algorithms
### 1. TF-IDF + Cosine Similarity
**Term Frequency (TF):**
```
TF(t, d) = (Number of times term t appears in document d) / (Total terms in d)
```
**Cosine Similarity:**
```
sim(A, B) = (A · B) / (||A|| × ||B||)
```
Where:
- A · B = dot product of TF vectors
- ||A|| = Euclidean norm of vector A
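The two formulas above can be sketched in plain Python. This is an illustration under stated assumptions, not the packaged implementation: the function names are hypothetical, tokenization is a simple `\w+` split, and only term frequency is used because no IDF weighting is defined in this section.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """TF(t, d): occurrences of term t divided by the total number of terms in d."""
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens)
    return {t: c / total for t, c in Counter(tokens).items()} if total else {}


def cosine_similarity(tf_a, tf_b):
    """sim(A, B) = (A · B) / (||A|| × ||B||) over sparse TF vectors."""
    dot = sum(w * tf_b.get(t, 0.0) for t, w in tf_a.items())
    norm_a = math.sqrt(sum(w * w for w in tf_a.values()))
    norm_b = math.sqrt(sum(w * w for w in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty segment is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


a = term_frequencies("The data shows significant results.")
b = term_frequencies("The data shows a strong correlation.")
print(round(cosine_similarity(a, b), 3))
```

Storing vectors as dictionaries keeps the computation sparse: only terms that occur in a segment contribute to the dot product.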
### 2. N-gram Overlap
Character and word n-grams capture local text patterns:
```
Jaccard_ngram(A, B) = |ngrams(A) ∩ ngrams(B)| / |ngrams(A) ∪ ngrams(B)|
```
Default n=3 for optimal balance of precision and recall.
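A minimal sketch of the n-gram Jaccard measure, assuming word-level n-grams over whitespace-split tokens. The function names are hypothetical, and a text shorter than `n` tokens is treated as a single n-gram.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in text."""
    tokens = text.lower().split()
    if len(tokens) < n:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_ngram(a, b, n=3):
    """|ngrams(A) ∩ ngrams(B)| / |ngrams(A) ∪ ngrams(B)|, or 0.0 if both are empty."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


score = jaccard_ngram("the quick brown fox jumps", "the quick brown fox sleeps")
print(score)
```

In this example the two sentences share two of four distinct trigrams, so the score is 0.5.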
### 3. Combined Similarity Score
Weighted ensemble for robust detection:
```
Combined_Sim = 0.5 × Cosine + 0.3 × N-gram + 0.2 × Jaccard
```
Weights determined empirically to minimize false positives while maintaining sensitivity.
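The weighted ensemble can be expressed as a small helper. This sketch assumes that "N-gram" is the n-gram Jaccard score and "Jaccard" is a token-set Jaccard score, both already scaled to the range 0 to 1; the function name and the range check are illustrative.

```python
def combined_similarity(cosine, ngram, jaccard, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of three similarity scores, each expected in [0, 1]."""
    scores = (cosine, ngram, jaccard)
    for value in scores:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"similarity score out of range: {value}")
    return sum(w * s for w, s in zip(weights, scores))


print(round(combined_similarity(0.9, 0.5, 0.4), 2))
```

Because the default weights sum to 1, the combined score stays in the same 0 to 1 range and can be compared directly against the `--threshold` value.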
## Paraphrasing Methodology
### Rule-Based Transformations
1. **Synonym Replacement**
- Dictionary-based word substitution
- Preserves grammatical structure
- Maintains semantic equivalence
2. **Syntactic Restructuring**
- Active ↔ Passive voice conversion
- Clause reordering
- Conjunction substitution
3. **Style Adaptation**
- Academic: Formal vocabulary, complex structures
- Formal: Conservative expressions, complete sentences
- Casual: Contractions, conversational tone
## Originality Scoring
```
Originality = (1 - flagged_segments / total_segments) × 100%
```
Threshold interpretation:
- ≥90%: Very high originality
- 75-89%: Good originality, minor review needed
- 50-74%: Moderate concerns, revision recommended
- <50%: Significant issues, extensive rewrite required
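The score and its interpretation bands can be sketched as follows. The function names are hypothetical, and scores that fall between the listed bands (for example 89.5) are assigned to the lower band.

```python
def originality_score(flagged_segments, total_segments):
    """Originality = (1 - flagged_segments / total_segments) × 100."""
    if total_segments <= 0:
        raise ValueError("total_segments must be positive")
    if not 0 <= flagged_segments <= total_segments:
        raise ValueError("flagged_segments must be between 0 and total_segments")
    return (1 - flagged_segments / total_segments) * 100


def interpret(score):
    """Map a score to the interpretation bands listed above."""
    if score >= 90:
        return "Very high originality"
    if score >= 75:
        return "Good originality, minor review needed"
    if score >= 50:
        return "Moderate concerns, revision recommended"
    return "Significant issues, extensive rewrite required"


score = originality_score(flagged_segments=2, total_segments=12)
print(round(score, 1), interpret(score))
```

With 2 of 12 segments flagged, the score is about 83.3, which falls in the "Good originality" band.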
## Limitations
1. **Local Analysis Only**: Cannot detect external source plagiarism
2. **Language Support**: Optimized for English and CJK languages
3. **Context Blind**: No semantic understanding beyond surface patterns
4. **Threshold Dependency**: Results sensitive to threshold selection
## References
- Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval
- Broder, A. Z. (1997). On the resemblance and containment of documents
- Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing
FILE:references/paraphrasing_guide.md
# Paraphrasing Guide
## When to Paraphrase
Paraphrasing is recommended when:
1. Similarity score exceeds threshold (default: 70%)
2. Text contains common phrases or clichés
3. Sentence structure is too similar to source material
4. Need to adapt tone for different audiences
## Paraphrasing Strategies
### 1. Vocabulary Substitution
Replace common words with precise alternatives:
- "big problem" → "significant challenge"
- "very important" → "critically essential"
- "a lot of" → "a substantial quantity of"
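Phrase-level substitution of this kind can be sketched with a lookup table and whole-phrase regular expressions. The table below holds only the three pairs listed above; the names `SUBSTITUTIONS` and `substitute_phrases` are hypothetical and this is not the packaged paraphrasing module.

```python
import re

# Phrase pairs taken from the list above
SUBSTITUTIONS = {
    "big problem": "significant challenge",
    "very important": "critically essential",
    "a lot of": "a substantial quantity of",
}


def substitute_phrases(text, table=SUBSTITUTIONS):
    """Replace each listed phrase wherever it appears as whole words."""
    # Longest phrases first so overlapping entries resolve predictably
    for phrase in sorted(table, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        text = pattern.sub(table[phrase], text)
    return text


print(substitute_phrases("This is a big problem with a lot of data."))
```

Matching is case-insensitive, but the replacement is inserted in lowercase, so a phrase at the start of a sentence needs its capitalization restored afterwards.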
### 2. Sentence Structure Changes
**Original:** The researcher conducted the experiment carefully.
**Passive:** The experiment was conducted carefully by the researcher.
**Reordered:** Carefully, the researcher conducted the experiment.
### 3. Combining/Splitting Sentences
**Original:** The data shows significant results. This indicates a strong correlation.
**Combined:** The significant results demonstrated by the data indicate a strong correlation.
### 4. Voice Transformation
Active to passive conversion reduces similarity while maintaining meaning:
- "We analyzed the data" → "The data was analyzed"
- "The study reveals" → "It was revealed by the study"
## Style Guidelines
### Academic Style
- Use formal vocabulary
- Avoid contractions
- Prefer passive voice in methods sections
- Use hedging language ("suggests", "indicates", "appears")
### Formal Style
- Complete sentences only
- No colloquialisms
- Precise terminology
- Structured argumentation
### Casual Style
- Contractions acceptable
- Conversational tone
- Personal pronouns encouraged
- Accessible vocabulary
### Neutral Style
- Balanced formality
- Clear and direct
- Widely accessible
- General-purpose rewriting
## Quality Checklist
Before accepting a paraphrase:
- [ ] Meaning preserved accurately
- [ ] No new information added
- [ ] Grammatically correct
- [ ] Appropriate for target audience
- [ ] Similarity significantly reduced
## Common Pitfalls
1. **Over-simplification**: Losing nuance in technical content
2. **Synonym overuse**: Awkward phrasing from excessive substitution
3. **Meaning drift**: Gradual shift away from original intent
4. **Grammar errors**: Incorrect sentence transformations
## Tools Integration
The paraphrasing module integrates with:
- Similarity detection to identify candidates
- Originality scoring to measure improvement
- Batch processing for document-wide rewriting
FILE:requirements.txt
# No third-party packages required; dataclasses is part of the standard library (Python 3.7+)
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Plagiarism Checker Pre-Screener
Checks text originality and provides paraphrasing suggestions for high-duplicate content.
Technical: Text similarity detection, paraphrasing suggestions, originality scoring
Difficulty: High
"""
import argparse
import json
import re
import sys
from pathlib import Path
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
import math


@dataclass
class Segment:
    """Represents a text segment for analysis."""
    index: int
    text: str
    similarity_score: float = 0.0
    flagged: bool = False
    paraphrase_suggestion: str = ""
    # default_factory gives every Segment its own list
    matches_with: List[int] = field(default_factory=list)
class TextPreprocessor:
    """Handles text preprocessing and segmentation."""

    @staticmethod
    def clean_text(text: str) -> str:
        """Clean and normalize text."""
        # Remove excessive whitespace
        text = re.sub(r'\s+', ' ', text)
        # Normalize punctuation spacing
        text = re.sub(r'\s*([.!?;,])\s*', r'\1 ', text)
        return text.strip()

    @staticmethod
    def segment_by_sentence(text: str) -> List[str]:
        """Segment text into sentences."""
        # Handle both English and Chinese (full-width) sentence delimiters
        pattern = r'(?<=[.!?。!?])\s+'
        sentences = re.split(pattern, text)
        return [s.strip() for s in sentences if s.strip()]

    @staticmethod
    def segment_by_paragraph(text: str) -> List[str]:
        """Segment text into paragraphs."""
        paragraphs = text.split('\n\n')
        return [p.strip() for p in paragraphs if p.strip()]

    @staticmethod
    def tokenize(text: str) -> List[str]:
        """Simple tokenization (supports English and basic CJK)."""
        # CJK ideographs: each character is a token
        # Other word characters: each run is a token
        # Punctuation: each symbol is a token
        pattern = r'[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+|[^\w\s]'
        return re.findall(pattern, text.lower())

    @staticmethod
    def ngrams(tokens: List[str], n: int = 3) -> List[str]:
        """Generate n-grams from tokens."""
        if len(tokens) < n:
            return [' '.join(tokens)]
        return [' '.join(tokens[i:i+n]) for i in range(len(tokens)-n+1)]
class SimilarityDetector:
    """Detects similarity between text segments using multiple algorithms."""

    def __init__(self):
        self.preprocessor = TextPreprocessor()

    def _compute_tf(self, tokens: List[str]) -> Dict[str, float]:
        """Compute term frequency."""
        tf = {}
        for token in tokens:
            tf[token] = tf.get(token, 0) + 1
        # Normalize
        total = len(tokens)
        return {k: v/total for k, v in tf.items()}

    def cosine_similarity(self, text1: str, text2: str) -> float:
        """Compute cosine similarity between two texts."""
        tokens1 = self.preprocessor.tokenize(text1)
        tokens2 = self.preprocessor.tokenize(text2)
        tf1 = self._compute_tf(tokens1)
        tf2 = self._compute_tf(tokens2)
        # Get all unique terms
        all_terms = set(tf1.keys()) | set(tf2.keys())
        # Compute dot product and magnitudes
        dot_product = sum(tf1.get(term, 0) * tf2.get(term, 0) for term in all_terms)
        magnitude1 = math.sqrt(sum(v**2 for v in tf1.values()))
        magnitude2 = math.sqrt(sum(v**2 for v in tf2.values()))
        if magnitude1 == 0 or magnitude2 == 0:
            return 0.0
        return dot_product / (magnitude1 * magnitude2)

    def ngram_similarity(self, text1: str, text2: str, n: int = 3) -> float:
        """Compute n-gram overlap similarity."""
        tokens1 = self.preprocessor.tokenize(text1)
        tokens2 = self.preprocessor.tokenize(text2)
        ngrams1 = set(self.preprocessor.ngrams(tokens1, n))
        ngrams2 = set(self.preprocessor.ngrams(tokens2, n))
        if not ngrams1 or not ngrams2:
            return 0.0
        intersection = len(ngrams1 & ngrams2)
        union = len(ngrams1 | ngrams2)
        return intersection / union if union > 0 else 0.0

    def jaccard_similarity(self, text1: str, text2: str) -> float:
        """Compute Jaccard similarity."""
        tokens1 = set(self.preprocessor.tokenize(text1))
        tokens2 = set(self.preprocessor.tokenize(text2))
        if not tokens1 or not tokens2:
            return 0.0
        intersection = len(tokens1 & tokens2)
        union = len(tokens1 | tokens2)
        return intersection / union if union > 0 else 0.0

    def combined_similarity(self, text1: str, text2: str) -> float:
        """Combine multiple similarity metrics."""
        cos_sim = self.cosine_similarity(text1, text2)
        ngram_sim = self.ngram_similarity(text1, text2, n=3)
        jaccard_sim = self.jaccard_similarity(text1, text2)
        # Weighted combination
        return 0.5 * cos_sim + 0.3 * ngram_sim + 0.2 * jaccard_sim
class Paraphraser:
    """Provides paraphrasing suggestions for flagged text."""

    # Common synonym mappings
    SYNONYMS = {
        'important': ['significant', 'crucial', 'essential', 'vital', 'key'],
        'show': ['demonstrate', 'indicate', 'reveal', 'display', 'illustrate'],
        'use': ['utilize', 'employ', 'apply', 'adopt', 'implement'],
        'make': ['create', 'produce', 'generate', 'construct', 'form'],
        'find': ['discover', 'identify', 'locate', 'detect', 'determine'],
        'increase': ['rise', 'grow', 'expand', 'escalate', 'surge'],
        'decrease': ['decline', 'reduce', 'diminish', 'drop', 'fall'],
        'good': ['excellent', 'superior', 'favorable', 'positive', 'beneficial'],
        'bad': ['poor', 'inferior', 'unfavorable', 'negative', 'harmful'],
        'big': ['large', 'substantial', 'considerable', 'significant', 'major'],
        'small': ['minor', 'minimal', 'slight', 'modest', 'limited'],
    }

    # Academic style transformations
    ACADEMIC_PHRASES = {
        'a lot of': ['a substantial amount of', 'a considerable number of', 'numerous'],
        'many': ['a multitude of', 'a wide range of', 'various'],
        'some': ['certain', 'particular', 'specific'],
        'thing': ['aspect', 'factor', 'element', 'component'],
        'do': ['perform', 'conduct', 'carry out', 'execute'],
        'get': ['obtain', 'acquire', 'receive', 'gain'],
        'say': ['state', 'assert', 'claim', 'suggest', 'argue'],
    }
    def __init__(self, style: str = 'neutral'):
        self.style = style

    def _synonym_replace(self, text: str) -> str:
        """Replace words with synonyms, keeping surrounding punctuation."""
        def swap(match):
            word = match.group(0)
            options = self.SYNONYMS.get(word.lower())
            if not options:
                return word
            # Use the first synonym and preserve initial capitalization
            synonym = options[0]
            return synonym.capitalize() if word[0].isupper() else synonym

        return re.sub(r'\b\w+\b', swap, text)

    def _academic_transform(self, text: str) -> str:
        """Transform to academic style."""
        result = text
        for phrase, replacements in self.ACADEMIC_PHRASES.items():
            pattern = re.compile(r'\b' + re.escape(phrase) + r'\b', re.IGNORECASE)
            result = pattern.sub(replacements[0], result)
        return result

    def _passive_voice(self, text: str) -> str:
        """Convert active to passive voice where appropriate."""
        # Naive regex heuristics for common verbs; review the output manually
        patterns = [
            (r'\b(\w+)\s+(\w+ed)\s+(\w+)', r'\3 was \2 by \1'),
            (r'\bWe\s+(\w+)\s+', r'It was \1ed that '),
            (r'\bI\s+(\w+)\s+', r'It was \1ed that '),
        ]
        result = text
        for pattern, replacement in patterns:
            result = re.sub(pattern, replacement, result, flags=re.IGNORECASE)
        return result

    def _restructure_sentence(self, text: str) -> str:
        """Restructure sentence while preserving meaning."""
        # Split on commas and reverse the clause order
        parts = re.split(r',\s*', text)
        if len(parts) > 1:
            parts.reverse()
            return '. '.join(parts)
        return text

    def paraphrase(self, text: str) -> str:
        """Generate paraphrasing suggestion."""
        if self.style == 'academic':
            text = self._academic_transform(text)
        elif self.style == 'formal':
            text = self._passive_voice(text)
        # Apply synonym replacement
        text = self._synonym_replace(text)
        # Restructure
        text = self._restructure_sentence(text)
        return text
class PlagiarismChecker:
    """Main class for plagiarism checking."""

    def __init__(self, threshold: float = 0.70, segment_type: str = 'sentence'):
        self.threshold = threshold
        self.segment_type = segment_type
        self.preprocessor = TextPreprocessor()
        self.detector = SimilarityDetector()

    def _segment_text(self, text: str) -> List[str]:
        """Clean and segment raw text based on configuration."""
        if self.segment_type == 'paragraph':
            # Split on blank lines before cleaning: clean_text collapses
            # newlines, so cleaning first would leave a single paragraph.
            paragraphs = self.preprocessor.segment_by_paragraph(text)
            cleaned = [self.preprocessor.clean_text(p) for p in paragraphs]
            return [p for p in cleaned if p]
        # Sentence splitting relies on the spacing normalized by clean_text
        return self.preprocessor.segment_by_sentence(
            self.preprocessor.clean_text(text))

    def analyze(self, text: str, paraphrase: bool = False,
                style: str = 'neutral') -> Dict:
        """Analyze text for plagiarism."""
        # Clean and segment text
        raw_segments = self._segment_text(text)
        # Create segment objects
        segments = [Segment(index=i, text=s) for i, s in enumerate(raw_segments)]
        # Initialize paraphraser if needed
        paraphraser = Paraphraser(style) if paraphrase else None

        # Compare each segment with every other segment
        for i, seg1 in enumerate(segments):
            max_similarity = 0.0
            matches = []
            for j, seg2 in enumerate(segments):
                if i != j:
                    sim = self.detector.combined_similarity(seg1.text, seg2.text)
                    if sim > max_similarity:
                        max_similarity = sim
                    if sim >= self.threshold:
                        matches.append(j)
            seg1.similarity_score = max_similarity
            seg1.flagged = max_similarity >= self.threshold
            seg1.matches_with = matches
            # Generate paraphrase if flagged and requested
            if paraphraser and seg1.flagged:
                seg1.paraphrase_suggestion = paraphraser.paraphrase(seg1.text)

        # Calculate overall originality score
        flagged_count = sum(1 for s in segments if s.flagged)
        total_segments = len(segments)
        if total_segments > 0:
            originality_score = (total_segments - flagged_count) / total_segments * 100
        else:
            originality_score = 100

        # Build result
        result = {
            'originality_score': round(originality_score, 2),
            'total_segments': total_segments,
            'flagged_segments': flagged_count,
            'threshold': self.threshold,
            'segment_type': self.segment_type,
            'segments': [
                {
                    'index': s.index,
                    'text': s.text,
                    'similarity_score': round(s.similarity_score, 4),
                    'flagged': s.flagged,
                    'matches_with': s.matches_with,
                    'paraphrase_suggestion': s.paraphrase_suggestion if paraphrase else None
                }
                for s in segments
            ],
            'summary': self._generate_summary(originality_score, flagged_count, total_segments)
        }
        return result

    def _generate_summary(self, originality_score: float,
                          flagged: int, total: int) -> str:
        """Generate human-readable summary."""
        if originality_score >= 90:
            return (f"Text shows very high originality ({originality_score:.1f}%). "
                    f"Minor patterns detected in {flagged}/{total} segments.")
        elif originality_score >= 75:
            return (f"Text shows good originality ({originality_score:.1f}%). "
                    f"{flagged}/{total} segments flagged for review.")
        elif originality_score >= 50:
            return (f"Text shows moderate similarity issues ({originality_score:.1f}%). "
                    f"{flagged}/{total} segments require revision.")
        else:
            return (f"Text shows significant similarity concerns ({originality_score:.1f}%). "
                    f"Extensive revision recommended - {flagged}/{total} segments flagged.")
def read_file(filepath: str) -> str:
    """Read text from file."""
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"File not found: {filepath}")
    # Handle common text formats
    suffix = path.suffix.lower()
    if suffix == '.txt' or suffix == '.md':
        return path.read_text(encoding='utf-8')
    else:
        # Try to read as text
        try:
            return path.read_text(encoding='utf-8')
        except UnicodeDecodeError:
            raise ValueError(f"Unsupported file format: {suffix}")
def format_report(result: Dict, format_type: str = 'json') -> str:
    """Format analysis report."""
    if format_type == 'json':
        return json.dumps(result, ensure_ascii=False, indent=2)
    # Text format
    lines = [
        "=" * 60,
        "PLAGIARISM CHECK REPORT",
        "=" * 60,
        f"Originality Score: {result['originality_score']}%",
        f"Total Segments: {result['total_segments']}",
        f"Flagged Segments: {result['flagged_segments']}",
        f"Threshold: {result['threshold']}",
        "-" * 60,
        result['summary'],
        "=" * 60,
    ]
    for seg in result['segments']:
        if seg['flagged']:
            lines.append(f"\n⚠️ SEGMENT {seg['index']} (Similarity: {seg['similarity_score']:.2%})")
            lines.append(f" Text: {seg['text'][:100]}...")
            if seg.get('paraphrase_suggestion'):
                lines.append(f" Suggestion: {seg['paraphrase_suggestion'][:100]}...")
    return '\n'.join(lines)
def main():
    parser = argparse.ArgumentParser(
        description='Plagiarism Checker Pre-Screener - Check text originality'
    )
    parser.add_argument('--input', '-i', type=str, help='Text to analyze')
    parser.add_argument('--file', '-f', type=str, help='Path to file to analyze')
    parser.add_argument('--threshold', '-t', type=float, default=0.70,
                        help='Similarity threshold (0.0-1.0), default 0.70')
    parser.add_argument('--paraphrase', '-p', action='store_true',
                        help='Enable paraphrasing suggestions')
    parser.add_argument('--style', '-s', type=str, default='neutral',
                        choices=['academic', 'formal', 'casual', 'neutral'],
                        help='Paraphrasing style')
    parser.add_argument('--segments', type=str, default='sentence',
                        choices=['sentence', 'paragraph'],
                        help='Analysis segment type')
    parser.add_argument('--output', '-o', type=str,
                        help='Output file path (written in the selected --format)')
    parser.add_argument('--format', type=str, default='text',
                        choices=['json', 'text'],
                        help='Output format')
    args = parser.parse_args()

    # Get input text
    if args.file:
        try:
            text = read_file(args.file)
        except (FileNotFoundError, ValueError) as e:
            print(f"Error: {e}", file=sys.stderr)
            sys.exit(1)
    elif args.input:
        text = args.input
    else:
        # Read from stdin
        text = sys.stdin.read()
    if not text.strip():
        parser.print_help()
        sys.exit(1)

    # Validate threshold
    if not 0 <= args.threshold <= 1:
        print("Error: Threshold must be between 0.0 and 1.0", file=sys.stderr)
        sys.exit(1)

    # Run analysis
    checker = PlagiarismChecker(
        threshold=args.threshold,
        segment_type=args.segments
    )
    result = checker.analyze(
        text=text,
        paraphrase=args.paraphrase,
        style=args.style
    )

    # Format and output
    output = format_report(result, args.format)
    if args.output:
        Path(args.output).write_text(output, encoding='utf-8')
        print(f"Report saved to: {args.output}")
    else:
        print(output)


if __name__ == '__main__':
    main()
Analyze data with `phylogenetic-tree-styler` using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
---
name: phylogenetic-tree-styler
description: Analyze data with `phylogenetic-tree-styler` using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
license: MIT
skill-author: AIPOCH
---
# Phylogenetic Tree Styler
## When to Use
- Use this skill when the task is to beautify phylogenetic trees with taxonomy color blocks, bootstrap values, and timelines.
- Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
See `## Features` below for related details.
- Scope-focused workflow aligned to: Analyze data with `phylogenetic-tree-styler` using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
- Python 3.8+
- ete3
- matplotlib
- numpy
- pandas
Install dependencies:
```text
pip install ete3 matplotlib numpy pandas
```
## Example Usage
See `## Usage` below for related details.
```bash
cd "20260318/scientific-skills/Data Analytics/phylogenetic-tree-styler"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
# Example invocation: python scripts/main.py --help
# Example invocation: python scripts/main.py --input tree.nwk --output tree_styled.png --format png
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Features
Beautify phylogenetic trees by adding taxonomy color blocks, bootstrap values, and timelines.
## Usage
```text
python3 scripts/main.py --input <input_tree.nwk> --output <output.png> [options]
```
### Parameters
| Parameter | Description | Default |
|------|------|--------|
| `-i`, `--input` | Input Newick format phylogenetic tree file | Required |
| `-o`, `--output` | Output image file path | tree_styled.png |
| `-f`, `--format` | Output format: png, pdf, svg | png |
| `--width` | Image width (pixels) | 1200 |
| `--height` | Image height (pixels) | 800 |
| `--show-bootstrap` | Show Bootstrap values | False |
| `--bootstrap-threshold` | Only show Bootstrap values above this threshold | 50 |
| `--taxonomy-file` | Species taxonomy information file (CSV format: name,domain,phylum,class,order,family,genus) | None |
| `--show-timeline` | Show timeline | False |
| `--root-age` | Root node age (million years ago) | None |
| `--branch-color` | Branch color | black |
| `--leaf-color` | Leaf node label color | black |
| `--dpi` | Output DPI | 150 |
## Examples
### Basic Beautification
```text
python3 scripts/main.py -i tree.nwk -o tree_basic.png
```
### Show Bootstrap Values
```text
python3 scripts/main.py -i tree.nwk -o tree_bootstrap.png --show-bootstrap --bootstrap-threshold 70
```
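
The threshold and color tiers applied by `--show-bootstrap` can be sketched in plain Python. The tiers and the size formula below follow the packaged `scripts/main.py`; the function name `bootstrap_style` and the integer rounding of the node size are assumptions of this sketch.

```python
def bootstrap_style(bootstrap: float, threshold: float = 50.0):
    """Return (label, color, node_size), or None when below the threshold."""
    if bootstrap < threshold:
        return None
    if bootstrap >= 90:
        color = "#2166ac"  # dark blue: high support
    elif bootstrap >= 70:
        color = "#4393c3"  # medium blue
    else:
        color = "#92c5de"  # light blue
    # Node size grows with support; rounded to whole pixels (sketch assumption)
    size = int(round(4 + (bootstrap / 100) * 6))
    return str(int(bootstrap)), color, size


for value in (95, 88, 60):
    print(value, bootstrap_style(value, threshold=70))
```

With a threshold of 70, the value 60 is skipped entirely, so weakly supported nodes stay unlabeled.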
### Add Taxonomy Color Blocks
```text
python3 scripts/main.py -i tree.nwk -o tree_taxonomy.png --taxonomy-file taxonomy.csv
```
### Add Timeline
```text
python3 scripts/main.py -i tree.nwk -o tree_timeline.png --show-timeline --root-age 500
```
### Comprehensive Usage
```text
python3 scripts/main.py -i tree.nwk -o tree_full.png \
--show-bootstrap --bootstrap-threshold 70 \
--taxonomy-file taxonomy.csv \
--show-timeline --root-age 500
```
## Taxonomy Information File Format
taxonomy.csv example:
```csv
name,domain,phylum,class
Species_A,Bacteria,Proteobacteria,Gammaproteobacteria
Species_B,Bacteria,Firmicutes,Bacilli
Species_C,Archaea,Euryarchaeota,Methanobacteria
```
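
A minimal sketch of reading this file into a name-to-row mapping, using only the standard library. The packaged script reads the file with pandas instead; the helper name `load_taxonomy_rows` is hypothetical. Values in the `name` column need to match the Newick leaf names exactly for the color blocks to be attached.

```python
import csv
import io

SAMPLE = """name,domain,phylum,class
Species_A,Bacteria,Proteobacteria,Gammaproteobacteria
Species_B,Bacteria,Firmicutes,Bacilli
Species_C,Archaea,Euryarchaeota,Methanobacteria
"""


def load_taxonomy_rows(handle):
    """Map each leaf name to its taxonomy row; 'name' is the required column."""
    reader = csv.DictReader(handle)
    if "name" not in (reader.fieldnames or []):
        raise ValueError("taxonomy CSV must contain a 'name' column")
    return {row["name"]: row for row in reader}


taxonomy = load_taxonomy_rows(io.StringIO(SAMPLE))
# Sorted unique values are what a color palette would be assigned to
domains = sorted({row["domain"] for row in taxonomy.values()})
print(domains)
```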
## Input Format
Supports standard Newick format (.nwk or .newick):
```
((A:0.1,B:0.2)95:0.3,(C:0.4,D:0.5)88:0.6);
```
Bootstrap values can be placed at node label positions (like the 95, 88 above).
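
To illustrate where those values sit in the string, here is a small sketch that extracts the numeric labels following each closing parenthesis. It assumes simple, unquoted Newick and is not how the packaged script parses trees (that is done with `ete3`); the function names are hypothetical.

```python
import re

NEWICK = "((A:0.1,B:0.2)95:0.3,(C:0.4,D:0.5)88:0.6);"


def internal_support_values(newick: str):
    """Return numeric labels that directly follow a closing parenthesis."""
    return [float(v) for v in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def filter_by_threshold(values, threshold=50.0):
    """Keep values at or above the threshold, as --bootstrap-threshold does."""
    return [v for v in values if v >= threshold]


supports = internal_support_values(NEWICK)
print(supports)
print(filter_by_threshold(supports, 90))
```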
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python script (`scripts/main.py`) executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
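
One way to approach the path-traversal items in this checklist is to resolve each user-supplied path against the workspace root and reject anything that lands outside it. This is a sketch under assumptions: the helper `is_inside_workspace` and the workspace directory name are hypothetical, and the packaged script does not currently perform this check.

```python
import tempfile
from pathlib import Path


def is_inside_workspace(candidate: str, workspace: str) -> bool:
    """Return True when candidate resolves to a path under workspace."""
    root = Path(workspace).resolve()
    # An absolute candidate replaces root here and is then rejected below
    target = (root / candidate).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        return False
    return True


# Hypothetical workspace location; it does not need to exist for the check
workspace = str(Path(tempfile.gettempdir()) / "tree_styler_workspace")
print(is_inside_workspace("output/tree_styled.png", workspace))
print(is_inside_workspace("../outside.png", workspace))
```

`relative_to` is used instead of `Path.is_relative_to` so the sketch also runs on Python 3.8.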
## Prerequisites
Install the packages listed in `## Dependencies` (`ete3`, `matplotlib`, `numpy`, `pandas`) before running `scripts/main.py`.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
  - Performance optimization
  - Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `phylogenetic-tree-styler` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `phylogenetic-tree-styler` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
## Inputs to Collect
- Required inputs: the user goal, the primary data or source file, and the requested output format.
- Optional inputs: output directory, formatting preferences, and validation constraints.
- If a required input is unavailable, return a short clarification request before continuing.
## Output Contract
- Return a short summary, the main deliverables, and any assumptions that materially affect interpretation.
- If execution is partial, label what succeeded, what failed, and the next safe recovery step.
- Keep the final answer within the documented scope of the skill.
## Validation and Safety Rules
- Validate identifiers, file paths, and user-provided parameters before execution.
- Do not fabricate results, metrics, citations, or downstream conclusions.
- Use safe fallback behavior when dependencies, credentials, or required inputs are missing.
- Surface any execution failure with a concise diagnosis and recovery path.
FILE:example/taxonomy.csv
name,domain,phylum,class
Species_A,Bacteria,Proteobacteria,Gammaproteobacteria
Species_B,Bacteria,Proteobacteria,Alphaproteobacteria
Species_C,Bacteria,Firmicutes,Bacilli
Species_D,Archaea,Euryarchaeota,Methanobacteria
Species_E,Archaea,Crenarchaeota,Thermoprotei
FILE:references/runtime_checklist.md
# Runtime Checklist
- Category: `Data Analysis`
- Validate the user goal, required inputs, and output format before taking action.
- Ask a targeted clarification question when a required input is missing.
- Keep the response scoped to the documented workflow and state assumptions explicitly.
- Run a non-destructive smoke check before any file-dependent or data-dependent command.
- Recommended smoke check: `python -m py_compile scripts/main.py`
- If execution fails, stop and return a concise recovery path instead of fabricating results.
FILE:requirements.txt
ete3
matplotlib
numpy
pandas
FILE:scripts/main.py
#!/usr/bin/env python3
"""Phylogenetic Tree Styler

Beautify a phylogenetic tree: add taxonomy color blocks, bootstrap values, and a timeline.
"""
import argparse
import sys
from pathlib import Path

try:
    from ete3 import Tree, TreeStyle, NodeStyle, TextFace, RectFace, CircleFace
    import matplotlib.pyplot as plt
    import matplotlib.colors as mcolors
    import pandas as pd
    import numpy as np
except ImportError as e:
    print(f"Error: Missing dependencies - {e}")
    print("Please install dependencies: pip install ete3 matplotlib numpy pandas")
    sys.exit(1)

# Predefined color schemes
TAXONOMY_COLORS = {
    'domain': ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f'],
    'phylum': ['#aec7e8', '#ffbb78', '#98df8a', '#ff9896', '#c5b0d5', '#c49c94', '#f7b6d3', '#c7c7c7'],
    'class': ['#9edae5', '#dbdb8d', '#bcbd22', '#17becf', '#e6550d', '#fd8d3c', '#31a354', '#74c476'],
}
def parse_args():
    """Parse command line parameters"""
    parser = argparse.ArgumentParser(
        description='Phylogenetic Tree Styler - Beautify evolutionary tree visualization',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""Example:
  %(prog)s -i tree.nwk -o output.png
  %(prog)s -i tree.nwk --show-bootstrap --taxonomy-file taxo.csv
  %(prog)s -i tree.nwk --show-timeline --root-age 500"""
    )
    parser.add_argument('-i', '--input', required=True, help='Input Newick format evolutionary tree file')
    parser.add_argument('-o', '--output', default='tree_styled.png', help='Output image file path')
    parser.add_argument('-f', '--format', choices=['png', 'pdf', 'svg'], default='png', help='Output format')
    parser.add_argument('--width', type=int, default=1200, help='Image width (pixels)')
    parser.add_argument('--height', type=int, default=800, help='Image height (pixels)')
    parser.add_argument('--show-bootstrap', action='store_true', help='Show Bootstrap values')
    parser.add_argument('--bootstrap-threshold', type=float, default=50, help='Only show Bootstrap values above this threshold')
    parser.add_argument('--taxonomy-file', help='Species classification information file (CSV format)')
    parser.add_argument('--show-timeline', action='store_true', help='Show timeline')
    parser.add_argument('--root-age', type=float, help='Root node age (millions of years ago)')
    parser.add_argument('--branch-color', default='black', help='Branch color')
    parser.add_argument('--leaf-color', default='black', help='Leaf node label color')
    parser.add_argument('--dpi', type=int, default=150, help='Output DPI')
    return parser.parse_args()
def load_tree(tree_file):
    """Load evolutionary tree file"""
    try:
        # format=1 keeps internal node names, where bootstrap labels are stored
        tree = Tree(tree_file, format=1)
        return tree
    except Exception as e:
        print(f"Error: Unable to parse evolutionary tree file - {e}")
        sys.exit(1)
def load_taxonomy(taxonomy_file):
    """Load taxonomy information file"""
    if not taxonomy_file:
        return None
    try:
        df = pd.read_csv(taxonomy_file)
        required_cols = ['name']
        if not all(col in df.columns for col in required_cols):
            print("Error: Taxonomy file must contain a 'name' column")
            return None
        # Create a mapping of species to taxonomic information
        taxonomy_map = {}
        for _, row in df.iterrows():
            taxonomy_map[row['name']] = row.to_dict()
        return taxonomy_map
    except Exception as e:
        print(f"Warning: Unable to load taxonomy file - {e}")
        return None
def assign_taxonomy_colors(taxonomy_map, level='domain'):
    """Assign colors to classification levels"""
    if not taxonomy_map:
        return {}
    # Collect all unique categorical values
    values = set()
    for taxo in taxonomy_map.values():
        if level in taxo and pd.notna(taxo[level]):
            values.add(taxo[level])
    values = sorted(values)
    colors = TAXONOMY_COLORS.get(level, TAXONOMY_COLORS['domain'])
    color_map = {}
    for i, value in enumerate(values):
        color_map[value] = colors[i % len(colors)]
    return color_map
def style_tree(tree, args, taxonomy_map=None):
    """Set the style of the tree"""
    # Create a tree style
    ts = TreeStyle()
    ts.show_leaf_name = True
    ts.mode = 'r'  # Rectangular mode; can be changed to 'c' for circular mode
    ts.optimal_scale_level = 'full'
    ts.scale = 200

    # If there is a timeline, adjust the layout
    if args.show_timeline:
        ts.mode = 'r'
        ts.show_scale = True
        ts.scale_length = 0.1

    # Assign colors to classification levels
    domain_colors = {}
    phylum_colors = {}
    if taxonomy_map:
        domain_colors = assign_taxonomy_colors(taxonomy_map, 'domain')
        phylum_colors = assign_taxonomy_colors(taxonomy_map, 'phylum')

    # Set styles for each node
    for node in tree.traverse():
        nstyle = NodeStyle()
        nstyle['size'] = 0
        nstyle['fgcolor'] = args.branch_color
        nstyle['hz_line_color'] = args.branch_color
        nstyle['vt_line_color'] = args.branch_color
        nstyle['hz_line_width'] = 2
        nstyle['vt_line_width'] = 2

        # Leaf node style
        if node.is_leaf():
            nstyle['size'] = 8
            nstyle['fgcolor'] = args.leaf_color
            # Add taxonomy color blocks
            if taxonomy_map and node.name in taxonomy_map:
                taxo = taxonomy_map[node.name]
                # Add domain color block
                if 'domain' in taxo and pd.notna(taxo['domain']):
                    domain = taxo['domain']
                    color = domain_colors.get(domain, '#999999')
                    domain_face = RectFace(15, 15, color, color)
                    domain_face.margin_right = 5
                    node.add_face(domain_face, column=0, position='aligned')
                    # Add domain label
                    domain_text = TextFace(f" {domain}", fsize=10, fgcolor=color)
                    node.add_face(domain_text, column=1, position='aligned')
                # Add phylum color block
                if 'phylum' in taxo and pd.notna(taxo['phylum']):
                    phylum = taxo['phylum']
                    color = phylum_colors.get(phylum, '#cccccc')
                    phylum_face = RectFace(15, 15, color, color)
                    phylum_face.margin_right = 5
                    node.add_face(phylum_face, column=2, position='aligned')
                    # Add phylum label
                    phylum_text = TextFace(f" {phylum}", fsize=10, fgcolor='#666666')
                    node.add_face(phylum_text, column=3, position='aligned')
        # Internal node: display bootstrap value
        else:
            bootstrap = None
            # Parse from node name (common format: (A,B)95:0.1)
            if node.name and node.name.replace('.', '').replace('-', '').isdigit():
                try:
                    bootstrap = float(node.name)
                except ValueError:
                    pass
            # Fall back to the support attribute
            if bootstrap is None and hasattr(node, 'support') and node.support is not None:
                try:
                    bootstrap = float(node.support)
                except (TypeError, ValueError):
                    pass
            # Show bootstrap values at or above the threshold
            if args.show_bootstrap and bootstrap is not None and bootstrap >= args.bootstrap_threshold:
                # Pick a color tier based on the bootstrap value
                if bootstrap >= 90:
                    color = '#2166ac'  # Dark blue - high confidence
                elif bootstrap >= 70:
                    color = '#4393c3'  # Medium blue
                else:
                    color = '#92c5de'  # Light blue
                bootstrap_face = TextFace(f"{int(bootstrap)}", fsize=9, fgcolor=color, bold=True)
                node.add_face(bootstrap_face, column=0, position='branch-top')
                # Node size reflects bootstrap value (rounded to whole pixels)
                nstyle['size'] = int(round(4 + (bootstrap / 100) * 6))
                nstyle['fgcolor'] = color

        node.set_style(nstyle)
    return ts
def add_timeline(tree, ts, root_age):
    """Add timeline"""
    if not root_age:
        return
    # Calculate tree height
    tree_height = tree.get_farthest_leaf()[1]
    # Add time scale
    ts.show_scale = True
    ts.scale_length = tree_height / 5
    # Add title description
    ts.title.add_face(TextFace(f"Time Scale: {root_age} Mya", fsize=12, bold=True), column=0)
def render_tree(tree, ts, output_file, args):
    """Render tree to image file"""
    try:
        # Set image size
        tree.render(output_file, tree_style=ts, w=args.width, h=args.height, dpi=args.dpi)
        print(f"Success: Image saved to {output_file}")
        return True
    except Exception as e:
        print(f"Error: Rendering failed - {e}")
        return False
def main():
    args = parse_args()

    # Check input file
    input_path = Path(args.input)
    if not input_path.exists():
        print(f"Error: Input file does not exist: {args.input}")
        sys.exit(1)

    # Load evolutionary tree
    print(f"Loading evolutionary tree: {args.input}")
    tree = load_tree(args.input)
    print(f"Tree information: {len(tree)} leaf nodes")

    # Load taxonomy information
    taxonomy_map = None
    if args.taxonomy_file:
        print(f"Loading taxonomy information: {args.taxonomy_file}")
        taxonomy_map = load_taxonomy(args.taxonomy_file)
        if taxonomy_map:
            print(f"Loaded taxonomy information for {len(taxonomy_map)} species")

    # Set tree style
    print("Setting style...")
    ts = style_tree(tree, args, taxonomy_map)

    # Add timeline
    if args.show_timeline:
        print("Adding timeline...")
        add_timeline(tree, ts, args.root_age)

    # Set output path so its suffix matches the selected format
    output_path = Path(args.output)
    if output_path.suffix != f'.{args.format}':
        output_path = output_path.with_suffix(f'.{args.format}')

    # Render image
    print("Rendering image...")
    if render_tree(tree, ts, str(output_path), args):
        print(f"Done! Output file: {output_path}")
    else:
        sys.exit(1)


if __name__ == '__main__':
    main()
Use when writing medical school personal statements, residency application essays, fellowship statements, or graduate school admissions essays. Crafts compel...
---
name: personal-statement
description: Use when writing medical school personal statements, residency application essays, fellowship statements, or graduate school admissions essays. Crafts compelling narratives highlighting clinical experiences, research achievements, and career motivations for healthcare education applications.
license: MIT
skill-author: AIPOCH
---
# Personal Statement Writer for Medical Education
Craft compelling personal statements for medical school, residency, fellowship, and graduate school applications in healthcare fields.
## When to Use
- Use this skill when writing medical school personal statements, residency application essays, fellowship statements, or graduate school admissions essays that highlight clinical experiences, research achievements, and career motivations.
- Use this skill for academic writing tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: writing medical school personal statements, residency application essays, fellowship statements, and graduate school admissions essays with compelling narratives that highlight clinical experiences, research achievements, and career motivations.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.
## Example Usage
```bash
cd "20260318/scientific-skills/Academic Writing/personal-statement"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Quick Start
```python
from scripts.personal_statement_writer import PersonalStatementWriter
writer = PersonalStatementWriter()
# Generate personal statement
statement = writer.write(
program_type="medical_school",
key_experiences=["Shadowing Dr. Smith", "Volunteer at free clinic", "Research on diabetes"],
motivation="Helping underserved communities",
character_limit=5300
)
```
## Core Capabilities
### 1. Structure Generation
```python
outline = writer.create_outline(
program="residency_surgery",
themes=["Leadership", "Technical skill", "Patient advocacy"]
)
```
**Standard Structure:**
1. **Opening Hook** (10-15%) - Captivating patient story or defining moment
2. **Clinical Experiences** (30-40%) - Specific patient encounters with reflection
3. **Research/Academic** (20-25%) - Scholarly contributions and intellectual curiosity
4. **Service/Leadership** (15-20%) - Community impact and teamwork
5. **Career Goals** (10-15%) - Clear vision for future practice
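The percentage ranges above can be turned into a rough per-section character budget. The sketch below is illustrative only: `SECTION_SHARES` and `section_budgets` are hypothetical names that do not appear in the packaged script, and the choice to use the midpoint of each range (then normalize) is an assumption, not a documented rule.

```python
# Hypothetical helper, not part of scripts/main.py.
# Ranges are copied from the "Standard Structure" list as (low, high) fractions.
SECTION_SHARES = {
    "Opening Hook": (0.10, 0.15),
    "Clinical Experiences": (0.30, 0.40),
    "Research/Academic": (0.20, 0.25),
    "Service/Leadership": (0.15, 0.20),
    "Career Goals": (0.10, 0.15),
}


def section_budgets(character_limit: int) -> dict[str, int]:
    """Split a character limit across sections using each range's midpoint."""
    midpoints = {name: (low + high) / 2 for name, (low, high) in SECTION_SHARES.items()}
    total = sum(midpoints.values())
    # int() truncates, so the budgets never add up to more than the limit.
    return {name: int(character_limit * share / total) for name, share in midpoints.items()}


if __name__ == "__main__":
    for name, chars in section_budgets(5300).items():
        print(f"{name}: about {chars} characters")
```

Treat the numbers as a starting point for drafting; the ranges overlap on purpose, so a statement can lean toward clinical or research content as the application requires.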
### 2. Experience Framing
```python
framed = writer.frame_experience(
experience="Volunteered at homeless shelter",
angle="patient_advocacy",
program_type="family_medicine"
)
```
**STAR Method for Experiences:**
- **S**ituation: Brief context
- **T**ask: Your responsibility
- **A**ction: Specific steps you took
- **R**esult: Measurable outcome + personal reflection
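One way to keep the STAR parts explicit while drafting is to hold them in a small record and check that none is empty before writing the paragraph. This is a minimal sketch; `StarExperience` and its methods are hypothetical and are not provided by the packaged script.

```python
from dataclasses import dataclass


@dataclass
class StarExperience:
    """Hypothetical container for one experience framed with the STAR method."""

    situation: str
    task: str
    action: str
    result: str
    reflection: str  # the personal reflection that accompanies the result

    def missing_parts(self) -> list[str]:
        """Return the names of parts that are still empty."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def to_paragraph(self) -> str:
        """Join the non-empty parts in STAR order into one paragraph."""
        parts = [self.situation, self.task, self.action, self.result, self.reflection]
        return " ".join(part.strip() for part in parts if part.strip())
```

Calling `missing_parts()` before `to_paragraph()` makes it easy to spot an experience that lists actions but has no reflection yet.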
### 3. Character Optimization
```python
optimized = writer.optimize_length(
draft_statement,
target_chars=5300, # AMCAS limit
min_chars=4500
)
```
**Character Limits by Program:**
| Program | Character Limit | Word Approx |
|---------|----------------|-------------|
| AMCAS (Medical School) | 5,300 | ~750 words |
| ERAS (Residency) | Varies by specialty | ~800 words |
| Fellowship | Usually 1-2 pages | ~1000 words |
| Graduate School | Varies | ~500-1000 words |
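A quick length check against the table can be sketched as follows. `length_report` is a hypothetical helper, not part of the packaged script, and it assumes that spaces count toward the character limit. Only the AMCAS figure is hard-coded because the other limits in the table vary.

```python
# Hypothetical helper, not part of scripts/main.py.
AMCAS_CHAR_LIMIT = 5300  # from the table above


def length_report(text: str, limit: int = AMCAS_CHAR_LIMIT) -> dict:
    """Report character and word counts against a character limit."""
    chars = len(text)  # assumption: spaces and punctuation count as characters
    return {
        "characters": chars,
        "words": len(text.split()),
        "remaining": limit - chars,
        "within_limit": chars <= limit,
    }


if __name__ == "__main__":
    draft = "My decision to pursue medicine crystallized during a clinic shift."
    print(length_report(draft))
```

Pass a different `limit` for programs whose limits differ from AMCAS.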
### 4. Tone Adjustment
```python
adjusted = writer.adjust_tone(
statement,
tone="confident_but_humble",
avoid_cliches=True
)
```
## Common Patterns
See `references/personal-statement-examples.md` for:
- Medical School (MD/DO) Examples
- Residency Personal Statements by Specialty
- Fellowship Application Essays
- Re-applicant Strategies
- Career Changer Narratives
## Quality Checklist
**Before Writing:**
- [ ] List 3 defining patient experiences
- [ ] Identify unique aspects of your journey
- [ ] Research specific program values
**After Writing:**
- [ ] No clichés ("I want to help people", "Since I was young...")
- [ ] Specific examples throughout
- [ ] Personal reflection on every experience
- [ ] Clear connection to chosen specialty
- [ ] Within character limits
- [ ] Proofread for errors
## Common Pitfalls
❌ **Avoid**: "I have always wanted to be a doctor since childhood"
✅ **Instead**: "My decision to pursue medicine crystallized when..."
❌ **Avoid**: Listing achievements without reflection
✅ **Instead**: "This experience taught me..." + specific insight
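The cliché check from the checklist can be partly automated with a simple phrase scan. The sketch below is an assumption about how such a check might look; `CLICHES` and `find_cliches` are hypothetical names, and the phrase list contains only the examples quoted in this document.

```python
import re

# Phrases quoted in the checklist and pitfalls sections; extend as needed.
CLICHES = [
    "i want to help people",
    "since i was young",
    "i have always wanted to be a doctor",
]


def find_cliches(statement: str) -> list[str]:
    """Return the known cliché phrases found in a draft (case-insensitive)."""
    # Collapse whitespace so line breaks inside a phrase do not hide it.
    normalized = re.sub(r"\s+", " ", statement.lower())
    return [phrase for phrase in CLICHES if phrase in normalized]
```

A plain substring scan only catches exact wording, so it complements a human read-through and does not replace it.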
---
**Skill ID**: 203 | **Version**: 1.0 | **License**: MIT
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `personal-statement` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `personal-statement` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/guidelines.md
# Personal Statement - References
## Writing Guidelines
- Personal Statement Best Practices
- Medical Application Standards
FILE:scripts/main.py
#!/usr/bin/env python3
"""Personal Statement - Personal narrative generator."""
import json
class PersonalStatement:
    """Generates personal statements."""
    def generate(self, purpose: str, experiences: list, goals: str) -> dict:
        """Generate personal statement."""
        statement = f"""My journey in medicine began with a desire to make a difference.
Through my experiences, including {experiences[0] if experiences else 'clinical work'},
I have developed a passion for {purpose}.
My goal is to {goals}.
"""
        return {
            "personal_statement": statement,
            "themes": ["Dedication", "Growth", "Service"],
            "word_count": len(statement.split())
        }
def main():
    ps = PersonalStatement()
    result = ps.generate("patient care", ["volunteering at clinics"], "improve healthcare access")
    print(json.dumps(result, indent=2))
if __name__ == "__main__":
    main()
Generate ethical, compliant, and patient-friendly recruitment advertisements for clinical trials.
---
name: patient-recruitment-ad-gen
description: Generate ethical, compliant, and patient-friendly recruitment advertisements for clinical trials.
license: MIT
skill-author: AIPOCH
---
# Patient Recruitment Ad Generator
Generate ethical, compliant, and patient-friendly recruitment advertisements for clinical trials.
## When to Use
- Use this skill when the task is to generate ethical, compliant, and patient-friendly recruitment advertisements for clinical trials.
- Use this skill for academic writing tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: Generate ethical, compliant, and patient-friendly recruitment advertisements for clinical trials.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.
## Example Usage
See `## Usage` below for related details.
```bash
cd "20260318/scientific-skills/Academic Writing/patient-recruitment-ad-gen"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Purpose
This skill helps researchers, CROs, and medical institutions create patient recruitment advertisements that meet Institutional Review Board (IRB) / Ethics Committee (EC) requirements while being accessible and encouraging to potential participants.
## Key Compliance Requirements
### Essential Elements (IRB/EC Standards)
1. **Trial Identity**
- Study title or identifier
- Sponsor information (if required)
2. **Purpose Statement**
- Clear description of the research
- Why the study is being conducted
3. **Eligibility Criteria**
- Inclusion criteria (who can participate)
- Exclusion criteria (who cannot participate)
4. **Study Procedures**
- What participants will do
- Time commitment required
- Number of visits
5. **Risks and Benefits**
- Potential risks/discomforts
- Potential benefits (direct and societal)
- Statement that benefits are not guaranteed
6. **Confidentiality**
- How personal information is protected
- Regulatory oversight mention
7. **Voluntary Participation**
- Right to withdraw at any time
- No penalty for withdrawal
- No impact on regular medical care
8. **Contact Information**
- Principal Investigator
- Study coordinator
- IRB/EC contact for questions about rights
### Prohibited Content
- **Promises of cure** or guaranteed benefits
- **Undue influence** (excessive payment, coercion)
- **Misleading language** ("free treatment" when experimental)
- **Stigmatizing terms** ("sufferers," "victims")
- **Pressure tactics** (limited spots, urgency)
## Usage
### Input Parameters
```python
{
"disease_condition": str, # Target disease/condition
"study_phase": str, # Phase I/II/III/IV
"intervention_type": str, # Drug, device, procedure, etc.
"target_population": str, # Demographics, age range
"study_duration": str, # Expected time commitment
"site_location": str, # Study site location
"compensation": Optional[str], # Participant payment (if any)
"pi_name": str, # Principal Investigator
"contact_info": str, # Phone/email for inquiries
"irb_reference": str # IRB/EC approval number
}
```
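The parameter schema above can be checked before any ad text is generated. The following is a minimal sketch under stated assumptions: `AdRequest`, `VALID_PHASES`, and `problems` are hypothetical names that are not part of the packaged script, and the phase spellings are inferred from the `Phase I/II/III/IV` comment in the schema.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Assumed spellings, based on the "Phase I/II/III/IV" comment in the schema.
VALID_PHASES = {"Phase I", "Phase II", "Phase III", "Phase IV"}


@dataclass
class AdRequest:
    """Hypothetical typed version of the input parameters."""

    disease_condition: str
    study_phase: str
    intervention_type: str
    target_population: str
    study_duration: str
    site_location: str
    pi_name: str
    contact_info: str
    irb_reference: str
    compensation: Optional[str] = None  # the only optional field

    def problems(self) -> list[str]:
        """Return human-readable validation problems (empty list if none)."""
        issues = [
            f"missing required field: {f.name}"
            for f in fields(self)
            if f.name != "compensation" and not str(getattr(self, f.name) or "").strip()
        ]
        if self.study_phase.strip() and self.study_phase not in VALID_PHASES:
            issues.append(f"unrecognized study_phase: {self.study_phase}")
        return issues
```

Returning a list of problems, instead of raising on the first one, fits the error-handling guidance to state exactly which fields are missing.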
### Example
```bash
python scripts/main.py \
--disease "Type 2 Diabetes" \
--phase "Phase II" \
--intervention "Investigational oral medication" \
--population "Adults 18-65 with T2DM" \
--duration "12 weeks, 6 clinic visits" \
--location "City Medical Center, Building C" \
--pi "Dr. Sarah Chen" \
--contact "(555) 123-4567 or [email protected]" \
--irb "IRB-2024-001"
```
### Output
Generates a structured recruitment ad with:
- Headline (attention-grabbing, compliant)
- Study summary (plain language)
- Who can participate (eligibility)
- What's involved (procedures)
- Rights and protections (ethics)
- Contact information
## Technical Notes
- **Difficulty**: Medium
- **Language**: Patient-friendly (6th-8th grade reading level)
- **Tone**: Respectful, informative, empowering
- **Format**: Print, digital, or social media ready
- **Compliance**: Based on FDA, EMA, CIOMS, and ICH-GCP guidelines
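The reading-level target can be estimated with the Flesch-Kincaid grade formula, `0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59`. The sketch below is illustrative and is not what the packaged script does; `count_syllables` is a rough vowel-group heuristic (an assumption, not a dictionary lookup), so treat the result as an approximation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the U.S. school grade level needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    ad = "You may be able to join this study. Taking part is your choice."
    print(f"Estimated grade level: {flesch_kincaid_grade(ad):.1f}")
```

A draft that scores well above grade 8 is a signal to shorten sentences and replace technical terms before IRB/EC review.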
## References
See `references/` folder for:
- `fda_guidance.md` - FDA guidance on informed consent
- `ema_guidelines.md` - European ethics requirements
- `ich_gcp.md` - ICH-GCP E6(R2) recruitment provisions
- `plain_language_guide.pdf` - NIH Plain Language guidelines
- `template_examples.md` - Sample ads for different therapeutic areas
## Safety & Ethics
- Always include voluntary participation statement
- Never guarantee therapeutic benefit
- Ensure readability for target population
- Review with IRB/EC before use
- Avoid therapeutic misconception
---
**Technical Difficulty**: Medium
**Category**: Pharma / Clinical Research
**Last Updated**: 2026-02-05
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
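The two path-related items in the checklist (no `../` traversal, output restricted to the workspace) can be enforced with one resolution step. This is a minimal sketch, assuming Python 3.9 or later for `Path.is_relative_to`; `resolve_in_workspace` is a hypothetical helper and is not part of the packaged script.

```python
from pathlib import Path


def resolve_in_workspace(workspace: str, user_path: str) -> Path:
    """Resolve user_path under workspace; raise ValueError if it escapes."""
    root = Path(workspace).resolve()
    # Joining with an absolute user_path discards root, so the check below
    # also rejects absolute paths that point outside the workspace.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate
```

Resolving before comparing matters: a plain string check for `..` misses symlinks and absolute paths, while `resolve()` normalizes both.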
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `patient-recruitment-ad-gen` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `patient-recruitment-ad-gen` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/ema_guidelines.md
# EMA Guidelines on Ethics and Recruitment
## EU Clinical Trials Regulation (EU CTR 536/2014)
### Key Principles
1. **Respect for Rights**
- Protection of rights, safety, dignity, and well-being of participants
- Informed consent is a fundamental requirement
2. **Scientific Integrity**
- Research questions with scientific validity
- Appropriate methodology
3. **Independence**
- Ethics committees operate independently
- Free from undue influence
### Informed Consent Requirements (Chapter V, Articles 29-31)
Consent must be:
- **Free** - No coercion or undue influence
- **Informed** - Full disclosure of relevant information
- **Specific** - Clear about the trial in question
- **Documented** - Written form or formally documented
### Information Sheet Requirements
Must include, in language understandable to a layperson:
1. Identity and contact of sponsor/investigator
2. Purpose of the trial
3. Description of trial treatments
4. Procedures to be followed
5. Potential risks and inconveniences
6. Potential benefits
7. Alternative treatments
8. Duration of participation
9. Reimbursement/compensation information
10. Post-trial provisions
11. Confidentiality protections
12. Voluntary nature of participation
13. Rights as participant
14. Circumstances for termination
15. Significant findings that emerge
16. Use of biological samples (if applicable)
## Ethics Committee Composition and Review
### EC Composition Requirements:
- Medical/scientific members
- Laypersons
- At least one member independent of the institution/site
- Balanced gender representation
- Sufficient expertise to review protocol
### Review Timeline:
- Single opinion mechanism in EU
- Maximum 60 days for assessment (with possible clock-stops)
- Varying timelines for different trial types
## Recruitment Ethics
### Prohibited Practices:
- Payment that constitutes undue inducement
- Pressure tactics or time-limited offers
- Misrepresentation of risks or benefits
- Targeting vulnerable populations inappropriately
### Special Populations:
- **Minors**: Additional safeguards, parental consent + assent
- **Incapacitated adults**: Legal representative required
- **Pregnant women**: Only when direct benefit to mother/child
- **Emergency situations**: Deferred consent procedures available
## GDPR and Data Protection
Clinical trial data processing must comply with GDPR:
- Lawful basis for processing
- Data minimization
- Purpose limitation
- Storage limitation
- Security requirements
## References
- Regulation (EU) No 536/2014 (Clinical Trials Regulation)
- EMA Guidance on Informed Consent
- EMA Ethics Committee procedures
- GDPR (EU) 2016/679
FILE:references/fda_guidance.md
# FDA Guidance on Informed Consent and Recruitment
## Key Regulatory Requirements (21 CFR Part 50, 56)
### Basic Elements of Informed Consent
Per 21 CFR §50.25(a), informed consent must include:
1. **Statement of Purpose**
- That the study involves research
- Explanation of purposes, procedures, and duration
- Identification of experimental procedures
2. **Risks and Discomforts**
- Reasonably foreseeable risks
- Discomforts that may be encountered
3. **Benefits**
- Benefits to the subject (if any)
- Benefits to others that may reasonably be expected
4. **Alternative Procedures**
- Disclosure of appropriate alternative procedures
5. **Confidentiality Statement**
- Extent of confidentiality maintained
- FDA may inspect records
6. **Compensation and Treatment**
- Whether compensation is available
- Whether medical treatments are available for injury
7. **Contacts**
- For questions about research: researcher contact
- For questions about rights: IRB contact
8. **Voluntary Participation**
- Participation is voluntary
- Refusal to participate involves no penalty
- May discontinue at any time without penalty
## Recruitment Advertising Requirements
### FDA Information Sheets Guidance
Recruitment materials must not:
- **Promise benefits** beyond what is outlined in the protocol
- **Use coercion** or undue influence
- **Emphasize payment** as the primary benefit
- **Claim superiority** of investigational products
### Acceptable Language
| Avoid | Use Instead |
|-------|-------------|
| "Free treatment" | "Study-related care provided at no cost" |
| "Cure your disease" | "Help advance research" |
| "New wonder drug" | "Investigational medication" |
| "Limited spots available" | "Enrollment is ongoing" |
| "Sufferers of..." | "People with..." |
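The wording table above lends itself to a simple draft check that flags discouraged phrases and suggests the preferred alternative. The sketch below is an assumption about how that might be done; `PREFERRED_WORDING` and `flag_wording` are hypothetical names, and the phrase pairs are copied from the table.

```python
# Phrase pairs copied from the "Acceptable Language" table (lowercased).
PREFERRED_WORDING = {
    "free treatment": "study-related care provided at no cost",
    "cure your disease": "help advance research",
    "new wonder drug": "investigational medication",
    "limited spots available": "enrollment is ongoing",
    "sufferers of": "people with",
}


def flag_wording(ad_text: str) -> list[tuple[str, str]]:
    """Return (discouraged phrase, suggested wording) pairs found in a draft."""
    lowered = ad_text.lower()
    return [(bad, good) for bad, good in PREFERRED_WORDING.items() if bad in lowered]


if __name__ == "__main__":
    draft = "Free treatment for sufferers of asthma. Limited spots available!"
    for bad, good in flag_wording(draft):
        print(f'Replace "{bad}" with "{good}"')
```

The function only reports matches and leaves the rewrite to the author, since the final wording still needs IRB approval.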
### Payment Disclosure
- Payment should not be the **emphasis** of recruitment materials
- Amount may be stated if reasonable
- Payment for time/inconvenience is acceptable
- Payment contingent on completion must be disclosed
## IRB Review of Recruitment Materials
Recruitment materials are considered part of the informed consent process and must be reviewed and approved by the IRB before use.
### Materials Subject to Review:
- Print advertisements (newspapers, flyers)
- Audio/video advertisements
- Social media posts
- Website content
- Recruitment letters
- Scripts for telephone recruitment
### IRB Approval Statement:
Include IRB approval number and contact information for questions about participant rights.
## References
- FDA 21 CFR Part 50: Protection of Human Subjects
- FDA 21 CFR Part 56: Institutional Review Boards
- FDA Guidance: Informed Consent Information Sheet
- FDA Guidance: Payment and Reimbursement to Research Subjects
FILE:references/ich_gcp.md
# ICH-GCP E6(R2) Recruitment and Consent Provisions
## ICH Harmonised Tripartite Guideline
Good Clinical Practice (GCP)
---
## Section 4: Investigator Responsibilities
### 4.8 Informed Consent of Trial Subjects
#### 4.8.1 General Requirements
- Obtain freely given informed consent before participation
- Non-technical language understandable to subject
- No waiver of legal rights
- Consent form is dated and signed
- Copy provided to subject
#### 4.8.2 Essential Elements of Informed Consent
Consent discussion and documentation must include:
1. **Trial Involvement**
- That the trial involves research
- Purpose of the trial
- Trial treatments and probability of randomization
2. **Procedures**
- Trial procedures to be followed
- Invasive procedures
3. **Subject Responsibilities**
- Subject's responsibilities
4. **Risks**
- Reasonably foreseeable risks or inconveniences
- For pregnant women: risks to embryo/fetus
5. **Benefits**
- Benefits reasonably expected
- No guarantee of benefit
6. **Alternative Care**
- Alternative treatments available
7. **Compensation and Injury**
- Compensation for participation
- Treatment for trial-related injury
- Terms of compensation/insurance
8. **Contact Information**
- Investigator contact for trial-related queries
- Contact for rights-related queries
9. **Voluntary Participation**
- Participation is voluntary
- Right to refuse without penalty
- Right to withdraw without penalty
10. **New Information**
- Provision of significant new findings
11. **Duration and Number of Subjects**
- Expected duration of subject's participation
- Approximate number of subjects involved
---
## Recruitment Materials Compliance
### GCP Requirements for Advertisements
Recruitment materials are extensions of the informed consent process and must:
1. **Be Approved**
- Reviewed and approved by IRB/IEC
- Approved versions only may be used
2. **Be Accurate**
- Not misrepresent risks or benefits
- Not promise cure or guaranteed benefit
- Not use coercive language
3. **Include Key Information**
- Nature of research
- Voluntary nature
- Contact information
- IRB/IEC approval reference
---
## Section 3: IRB/IEC Responsibilities
### 3.1 Composition and Authority
- Independent body
- Scientific and non-scientific members
- At least one member independent of institution
- Qualified to review specific research
### 3.2 Documentation Review
IRB/IEC must review and approve:
- Protocol and amendments
- Informed consent form(s)
- **Recruitment materials**
- Written information to be provided to subjects
- Compensation to subjects
### 3.3 Continuing Review
- Ongoing review at appropriate intervals
- Review of unanticipated problems
- Review of any changes to recruitment materials
---
## Recruitment Ethics Principles
### Key GCP Principles Applied to Recruitment:
| Principle | Application to Recruitment |
|-----------|---------------------------|
| **Respect for Persons** | Voluntary participation, no coercion |
| **Beneficence** | Balanced presentation of risks/benefits |
| **Justice** | Fair subject selection, avoid exploitation |
| **Integrity** | Honest, accurate information |
### Recruitment Monitoring
- Sponsor monitors recruitment practices
- Deviations from approved materials are violations
- Training on approved recruitment scripts required
---
## References
- ICH E6(R2) Good Clinical Practice Guideline
- ICH E8 General Considerations for Clinical Trials
- ICH E9 Statistical Principles for Clinical Trials
FILE:references/template_examples.md
# Sample Recruitment Templates by Therapeutic Area
## Template 1: Oncology (Phase II Immunotherapy)
```
================================================================================
RESEARCH STUDY: New Treatment Option for Advanced [Cancer Type]
================================================================================
ABOUT THIS STUDY
----------------------------------------
We are conducting research to evaluate an investigational immunotherapy for
people with advanced [cancer type] that has progressed after standard treatment.
This study drug works by helping your immune system recognize and attack cancer
cells. Your participation may help advance cancer treatment options.
WHO CAN PARTICIPATE
----------------------------------------
You may be eligible if you:
- Are 18 years or older with advanced [cancer type]
- Have disease progression after prior therapy
- Have adequate organ function
- Are willing to follow study procedures
Not everyone who volunteers will qualify. A screening visit will determine
eligibility.
WHAT'S INVOLVED
----------------------------------------
If you qualify and choose to participate:
- Time commitment: Up to 24 months, with visits every 2-3 weeks
- You will receive: Investigational immunotherapy by IV infusion
- Regular scans to monitor disease status
- All study-related procedures and visits are provided at no cost
- Optional tumor biopsy at baseline
Detailed information will be provided before you decide to participate.
RISKS AND BENEFITS
----------------------------------------
- The study treatment may or may not benefit you directly
- Potential side effects include fatigue, rash, diarrhea, and immune-related
reactions (full list will be provided)
- You will be closely monitored for any adverse effects
- Your participation may contribute to cancer research that helps others
- Standard cancer treatments remain available to you
YOUR RIGHTS
----------------------------------------
- Participation is completely voluntary
- You may withdraw from the study at any time without penalty
- Your decision will not affect your regular medical care
- Your personal information will be kept confidential
- The study is reviewed by an Institutional Review Board (IRB #: XXXX-XXXX)
- You will receive complete information before giving consent
CONTACT US
----------------------------------------
Principal Investigator: Dr. [Name], [Credentials]
Location: [Medical Center, Department]
Contact: [Phone] or [Email]
Questions about your rights as a research participant? Contact the IRB at the
number provided in the full informed consent document.
================================================================================
```
---
## Template 2: Diabetes (Phase III Oral Medication)
```
================================================================================
Clinical Study for Type 2 Diabetes: Volunteers Needed
================================================================================
ABOUT THIS STUDY
----------------------------------------
We are conducting Phase III research to evaluate a new oral medication for
Type 2 diabetes management. This study compares the investigational medication
to a standard treatment to assess its effectiveness and safety. Your
participation may help improve future diabetes care.
WHO CAN PARTICIPATE
----------------------------------------
You may be eligible if you:
- Are 18-70 years old with Type 2 diabetes
- Are currently on metformin or diet/exercise alone
- Have an HbA1c between 7.0% and 10.5%
- Are willing to follow study procedures
Not everyone who volunteers will qualify. A screening visit will determine
eligibility.
WHAT'S INVOLVED
----------------------------------------
If you qualify and choose to participate:
- Time commitment: 52 weeks, with 12 clinic visits
- You will receive: Study medication (investigational or standard care)
- Blood sugar monitoring supplies provided
- Diabetes education and counseling
- All study-related procedures and visits are provided at no cost
Detailed information will be provided before you decide to participate.
RISKS AND BENEFITS
----------------------------------------
- The study medication may or may not improve your diabetes control
- Potential side effects include nausea, stomach upset, and low blood sugar
(full list will be provided)
- You will be closely monitored for any adverse effects
- Your participation may contribute to diabetes research that helps others
- Your current diabetes treatment remains available to you
YOUR RIGHTS
----------------------------------------
- Participation is completely voluntary
- You may withdraw from the study at any time without penalty
- Your decision will not affect your regular medical care
- Your personal information will be kept confidential
- The study is reviewed by an Institutional Review Board (IRB #: XXXX-XXXX)
- You will receive complete information before giving consent
COMPENSATION
----------------------------------------
You may receive $50 per completed visit ($600 total) for your time and travel.
Details will be provided during the informed consent process.
CONTACT US
----------------------------------------
Principal Investigator: Dr. [Name], Endocrinology
Location: [Diabetes Research Center]
Contact: [Phone] or [Email]
Questions about your rights as a research participant? Contact the IRB at the
number provided in the full informed consent document.
================================================================================
```
---
## Template 3: Cardiology (Phase I Device Study)
```
================================================================================
Research Study: New Heart Monitoring Device
================================================================================
ABOUT THIS STUDY
----------------------------------------
We are conducting early-stage research to evaluate the safety of a new
implantable heart monitoring device. This device is designed to continuously
monitor heart rhythm and detect irregularities. Your participation may help
advance cardiac monitoring technology.
WHO CAN PARTICIPATE
----------------------------------------
You may be eligible if you:
- Are 21-75 years old
- Have documented cardiac arrhythmia
- Require long-term cardiac monitoring
- Are willing to follow study procedures
Not everyone who volunteers will qualify. A screening visit will determine
eligibility.
WHAT'S INVOLVED
----------------------------------------
If you qualify and choose to participate:
- Time commitment: 6 months, with monthly follow-up visits
- You will receive: Implantation of investigational monitoring device
- Regular device checks and monitoring
- All study-related procedures and visits are provided at no cost
- Device removal at study completion if desired
Detailed information will be provided before you decide to participate.
RISKS AND BENEFITS
----------------------------------------
- The device may or may not benefit your heart monitoring
- Potential risks include infection, bleeding, and device malfunction
(full list will be provided)
- You will be closely monitored for any adverse effects
- Your participation may contribute to cardiac device research that helps others
- Standard cardiac monitoring remains available to you
YOUR RIGHTS
----------------------------------------
- Participation is completely voluntary
- You may withdraw from the study at any time without penalty
- Your decision will not affect your regular medical care
- Your personal information will be kept confidential
- The study is reviewed by an Institutional Review Board (IRB #: XXXX-XXXX)
- You will receive complete information before giving consent
CONTACT US
----------------------------------------
Principal Investigator: Dr. [Name], Cardiology
Location: [Cardiac Center]
Contact: [Phone] or [Email]
Questions about your rights as a research participant? Contact the IRB at the
number provided in the full informed consent document.
================================================================================
```
---
## Template 4: Pediatrics (Vaccine Study)
```
================================================================================
Research Study: New Vaccine for Children
================================================================================
ABOUT THIS STUDY
----------------------------------------
We are conducting research to evaluate a new vaccine for children to help
protect against [disease]. This study will assess how well the vaccine works
and how safe it is in children. Your child's participation may help advance
vaccine development.
WHO CAN PARTICIPATE
----------------------------------------
Your child may be eligible if they:
- Are 6 months to 5 years old
- Are in good general health
- Have received routine childhood vaccines
- Have a parent/guardian willing to follow study procedures
Not every child who volunteers will qualify. A screening visit will determine
eligibility.
WHAT'S INVOLVED
----------------------------------------
If your child qualifies and you choose to participate:
- Time commitment: 12 months, with 6 clinic visits
- Your child will receive: Study vaccine or placebo (randomly assigned)
- Monitoring for any reactions
- All study-related procedures and visits are provided at no cost
- Blood samples at select visits
Detailed information will be provided before you decide to participate.
RISKS AND BENEFITS
----------------------------------------
- The vaccine may or may not protect against [disease]
- Potential side effects include fever, soreness at injection site, and
fussiness (full list will be provided)
- Your child will be closely monitored for any adverse effects
- Your child's participation may contribute to vaccine research that helps others
- Standard childhood vaccines remain available
YOUR RIGHTS
----------------------------------------
- Participation is completely voluntary
- You may withdraw your child from the study at any time without penalty
- Your decision will not affect your child's regular medical care
- Your child's information will be kept confidential
- The study is reviewed by an Institutional Review Board (IRB #: XXXX-XXXX)
- You will receive complete information before giving consent
COMPENSATION
----------------------------------------
You may receive $75 per completed visit ($450 total) for your time and travel.
Details will be provided during the informed consent process.
CONTACT US
----------------------------------------
Principal Investigator: Dr. [Name], Pediatrics
Location: [Children's Research Center]
Contact: [Phone] or [Email]
Questions about your rights as a research participant? Contact the IRB at the
number provided in the full informed consent document.
================================================================================
```
---
## Key Compliance Notes for All Templates
1. **IRB Approval Required** - All recruitment materials must be reviewed and
approved by your IRB/EC before use.
2. **Accuracy Verification** - Ensure all details match your approved protocol.
3. **No Therapeutic Misconception** - Avoid language that implies the
investigational product is better than standard care.
4. **Avoid Coercion** - No "limited time" offers or pressure tactics.
5. **Respectful Language** - Use person-first language ("people with diabetes"
not "diabetics").
6. **Contact Information** - Always include IRB contact for questions about
participant rights.
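
Note 4 can be partly checked by machine before an ad goes to the IRB. The sketch below is a minimal illustration, not part of the packaged script: the function name `find_pressure_language` and the phrase list are assumptions, and a clean result does not replace IRB/EC review.

```python
import re

# Hypothetical phrase list for illustration; an IRB may flag other wording.
PRESSURE_PATTERNS = [
    r"limited time",
    r"act now",
    r"spots? (?:are )?filling",
    r"guaranteed",
    r"\bcure\b",
]


def find_pressure_language(ad_text: str) -> list[str]:
    """Return the matched phrases, in the order the patterns are listed."""
    hits = []
    for pattern in PRESSURE_PATTERNS:
        match = re.search(pattern, ad_text, flags=re.IGNORECASE)
        if match:
            hits.append(match.group(0))
    return hits
```

An empty list means none of the listed phrases were found; any hit should be reworded before submission.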
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Patient Recruitment Ad Generator
Generates IRB/EC compliant patient recruitment advertisements for clinical trials.
"""
import argparse
import json
import sys
from datetime import datetime
from typing import Dict, Optional
def generate_headline(disease: str, phase: str, intervention: str) -> str:
    """Generate an ethical, compliant headline."""
    headlines = [
        f"Research Study for {disease}: Volunteers Needed",
        f"Clinical Trial Opportunity: {disease} Research",
        f"Participate in {disease} Research - Volunteers Wanted",
        f"Help Advance {disease} Treatment - Join Our Study",
    ]
    return headlines[0]


def generate_summary(disease: str, phase: str, intervention: str) -> str:
    """Generate study summary in plain language."""
    phase_desc = {
        "Phase I": "early-stage research to evaluate safety",
        "Phase II": "research to evaluate effectiveness and safety",
        "Phase III": "research comparing the treatment to standard care",
        "Phase IV": "post-approval research to gather additional information"
    }
    phase_text = phase_desc.get(phase, "clinical research")
    return (
        f"We are conducting {phase_text} for people with {disease}. "
        f"This study involves {intervention.lower()}. "
        f"Your participation may help advance medical knowledge and potentially "
        f"improve future treatments for {disease}."
    )


def generate_eligibility(population: str) -> str:
    """Generate eligibility section."""
    return (
        f"You may be eligible if you:\n"
        f"- Are {population}\n"
        f"- Are willing to follow study procedures\n"
        f"- Meet additional study-specific criteria (to be discussed with study staff)\n\n"
        f"Not everyone who volunteers will qualify. A screening visit will determine eligibility."
    )


def generate_procedures(duration: str, intervention: str) -> str:
    """Generate procedures section."""
    return (
        f"If you qualify and choose to participate:\n"
        f"- Time commitment: {duration}\n"
        f"- You will receive: {intervention}\n"
        f"- Regular health monitoring throughout the study\n"
        f"- All study-related procedures and visits are provided at no cost\n\n"
        f"Detailed information will be provided before you decide to participate."
    )


def generate_rights_protections(irb: str) -> str:
    """Generate rights and protections section."""
    return (
        f"Your Rights and Protections:\n"
        f"- Participation is completely voluntary\n"
        f"- You may withdraw from the study at any time without penalty\n"
        f"- Your decision will not affect your regular medical care\n"
        f"- Your personal information will be kept confidential\n"
        f"- The study is reviewed by an Institutional Review Board (IRB #: {irb})\n"
        f"- You will receive complete information before giving consent"
    )


def generate_risks_benefits() -> str:
    """Generate risks and benefits section."""
    return (
        f"Potential Risks and Benefits:\n"
        f"- The study treatment may or may not benefit you directly\n"
        f"- There may be side effects, which will be explained to you in detail\n"
        f"- You will be closely monitored for any adverse effects\n"
        f"- Your participation may contribute to medical knowledge that helps others\n"
        f"- Alternative treatments remain available to you"
    )


def generate_compensation(compensation: Optional[str]) -> str:
    """Generate compensation section."""
    if compensation:
        return (
            "Compensation:\n"
            "You may receive " + compensation + " for your time and travel. "
            "Details will be provided during the informed consent process."
        )
    return ""


def generate_contact(pi: str, contact: str, location: str) -> str:
    """Generate contact section."""
    return (
        f"For More Information:\n"
        f"Principal Investigator: {pi}\n"
        f"Location: {location}\n"
        f"Contact: {contact}\n\n"
        f"Questions about your rights as a research participant? "
        f"Contact the IRB at the number provided in the full informed consent document."
    )
def generate_ad(params: Dict[str, str]) -> str:
    """Generate complete recruitment advertisement."""
    sections = [
        ("=" * 60),
        generate_headline(
            params["disease_condition"],
            params["study_phase"],
            params["intervention_type"]
        ),
        ("=" * 60),
        "",
        "ABOUT THIS STUDY",
        "-" * 40,
        generate_summary(
            params["disease_condition"],
            params["study_phase"],
            params["intervention_type"]
        ),
        "",
        "WHO CAN PARTICIPATE",
        "-" * 40,
        generate_eligibility(params["target_population"]),
        "",
        "WHAT'S INVOLVED",
        "-" * 40,
        generate_procedures(
            params["study_duration"],
            params["intervention_type"]
        ),
        "",
        "RISKS AND BENEFITS",
        "-" * 40,
        generate_risks_benefits(),
        "",
        "YOUR RIGHTS",
        "-" * 40,
        generate_rights_protections(params["irb_reference"]),
    ]
    # Add compensation if provided
    if params.get("compensation"):
        sections.extend([
            "",
            generate_compensation(params["compensation"])
        ])
    sections.extend([
        "",
        "CONTACT US",
        "-" * 40,
        generate_contact(
            params["pi_name"],
            params["contact_info"],
            params["site_location"]
        ),
        "",
        ("=" * 60),
        f"Generated: {datetime.now().strftime('%Y-%m-%d')}",
        "This is a research study. Not a treatment guarantee.",
        ("=" * 60),
    ])
    return "\n".join(sections)
def generate_json_output(params: Dict[str, str]) -> Dict:
    """Generate structured JSON output for programmatic use."""
    return {
        "metadata": {
            "generated_at": datetime.now().isoformat(),
            "version": "1.0.0",
            "skill": "patient-recruitment-ad-gen"
        },
        "study": {
            "disease_condition": params["disease_condition"],
            "phase": params["study_phase"],
            "intervention": params["intervention_type"],
            "irb_reference": params["irb_reference"]
        },
        "content": {
            "headline": generate_headline(
                params["disease_condition"],
                params["study_phase"],
                params["intervention_type"]
            ),
            "summary": generate_summary(
                params["disease_condition"],
                params["study_phase"],
                params["intervention_type"]
            ),
            "eligibility": generate_eligibility(params["target_population"]),
            "procedures": generate_procedures(
                params["study_duration"],
                params["intervention_type"]
            ),
            "rights": generate_rights_protections(params["irb_reference"]),
            "risks_benefits": generate_risks_benefits(),
            "compensation": generate_compensation(params.get("compensation")),
            "contact": generate_contact(
                params["pi_name"],
                params["contact_info"],
                params["site_location"]
            )
        },
        "compliance_notes": [
            "Ensure IRB/EC approval before distribution",
            "Verify all information matches approved protocol",
            "Include IRB contact for questions about participant rights",
            "Do not modify without IRB/EC approval",
            "Keep copy of approved version for records"
        ]
    }
def main():
    parser = argparse.ArgumentParser(
        description="Generate IRB-compliant patient recruitment advertisements"
    )
    parser.add_argument("--disease", required=True, help="Target disease/condition")
    parser.add_argument("--phase", required=True,
                        choices=["Phase I", "Phase II", "Phase III", "Phase IV"],
                        help="Study phase")
    parser.add_argument("--intervention", required=True,
                        help="Intervention type (drug, device, procedure, etc.)")
    parser.add_argument("--population", required=True,
                        help="Target population description")
    parser.add_argument("--duration", required=True,
                        help="Study duration and time commitment")
    parser.add_argument("--location", required=True,
                        help="Study site location")
    parser.add_argument("--pi", required=True,
                        help="Principal Investigator name")
    parser.add_argument("--contact", required=True,
                        help="Contact information (phone/email)")
    parser.add_argument("--irb", required=True,
                        help="IRB/EC approval number")
    parser.add_argument("--compensation", default=None,
                        help="Participant compensation (if any)")
    parser.add_argument("--json", action="store_true",
                        help="Output as JSON instead of formatted text")
    args = parser.parse_args()
    params = {
        "disease_condition": args.disease,
        "study_phase": args.phase,
        "intervention_type": args.intervention,
        "target_population": args.population,
        "study_duration": args.duration,
        "site_location": args.location,
        "pi_name": args.pi,
        "contact_info": args.contact,
        "irb_reference": args.irb,
        "compensation": args.compensation
    }
    if args.json:
        output = generate_json_output(params)
        print(json.dumps(output, indent=2, ensure_ascii=False))
    else:
        print(generate_ad(params))


if __name__ == "__main__":
    main()
Simplify informed consent documents into patient-friendly language while maintaining regulatory compliance (FDA 21CFR50, ICH-GCP, HIPAA) and required legal e...
---
name: patient-consent-simplifier
description: Simplify informed consent documents into patient-friendly language while maintaining regulatory compliance (FDA 21CFR50, ICH-GCP, HIPAA) and required legal elements.
license: MIT
skill-author: AIPOCH
---
# Patient Consent Simplifier
Transform complex informed consent documents into patient-friendly language while maintaining regulatory compliance and ethical standards.
## Quick Check
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --text "Audit validation sample with explicit methods, findings, and conclusion."
```
## When to Use
- Use this skill when simplifying informed consent documents for clinical trials or medical procedures.
- Use this skill when adapting research summaries for lay audiences or patients with limited health literacy.
- Do not use this skill to remove required legal elements, downplay significant risks, or produce documents that bypass regulatory review.
## Workflow
1. **Sensitive Data Check:** Before processing, check whether the input document contains patient identifiers (name, DOB, MRN, address). If found, emit a mandatory warning: "This document appears to contain patient PII/PHI. Ensure the document has been de-identified or that you have authorization to process it before proceeding."
2. Confirm the input document, target reading level, and whether legal elements must be preserved.
3. Validate that the request is for consent simplification, not legal drafting or regulatory submission.
4. Apply simplification rules: break long sentences, replace jargon, use active voice, maintain required elements.
5. Assess readability and check compliance against required elements checklist.
6. Return the simplified document with a readability report and compliance status.
7. If inputs are incomplete, state which fields are missing and request only the minimum additional information.
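
Step 1 of the workflow can be sketched as a small pre-check. This is a minimal illustration under stated assumptions: the names `PII_PATTERNS`, `PII_WARNING`, and `detect_pii` and the regular expressions are hypothetical and are not taken from `scripts/main.py`; a real de-identification check would need a vetted tool and human review.

```python
import re

# Hypothetical patterns for illustration only; they catch a few labelled
# identifiers and will miss free-text names and addresses.
PII_PATTERNS = {
    "date_of_birth": re.compile(
        r"\b(?:DOB|date of birth)\b\s*[:\-]?\s*\d{1,2}/\d{1,2}/\d{2,4}",
        re.IGNORECASE,
    ),
    "mrn": re.compile(r"\bMRN\b\s*[:#]?\s*\d{5,}", re.IGNORECASE),
    "patient_name": re.compile(r"\bpatient name\b\s*:", re.IGNORECASE),
}

# Warning text quoted from workflow step 1.
PII_WARNING = (
    "This document appears to contain patient PII/PHI. Ensure the document "
    "has been de-identified or that you have authorization to process it "
    "before proceeding."
)


def detect_pii(text: str) -> list[str]:
    """Return the names of the identifier patterns found in text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

If `detect_pii` returns a non-empty list, the caller would emit `PII_WARNING` before any simplification runs.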
## Usage
```bash
# Simplify from text
python scripts/main.py --text "Lumbar puncture will be performed under sterile conditions..."
# Simplify from file
python scripts/main.py --input consent_form.pdf --output simplified_consent.pdf --target-grade 8
# Check compliance only
python scripts/main.py --input document.pdf --check compliance
```
## Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--input` | file path | No | Input consent document (PDF or text) |
| `--text` | string | No | Inline consent text to simplify |
| `--output` | file path | No | Output file path |
| `--target-grade` | integer | No | Target reading grade level (default: 8) |
## Target Reading Levels
- General population: 8th grade
- Vulnerable populations: 6th grade
- Health literacy challenges: 4th–5th grade
## Required Consent Elements (must be preserved)
Purpose of research · Procedures · Risks and discomforts · Benefits · Alternatives · Confidentiality · Compensation · Contact information · Voluntary participation
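
One way to check that simplification has not dropped a required element is a keyword pass over the simplified text. The sketch below is an assumption-laden illustration, not the skill's compliance checker: `ELEMENT_CUES` and `missing_elements` are hypothetical names, and the cue words are guesses (they include "by choice" and "privacy" because the packaged script substitutes those for "voluntary" and "confidentiality").

```python
# Hypothetical keyword cues per required element; keyword matching can only
# flag likely gaps and does not establish regulatory compliance.
ELEMENT_CUES = {
    "Purpose of research": ["purpose", "why this study"],
    "Procedures": ["procedure", "what will happen"],
    "Risks and discomforts": ["risk", "side effect", "discomfort"],
    "Benefits": ["benefit"],
    "Alternatives": ["alternative", "other options", "other treatment"],
    "Confidentiality": ["confidential", "privacy", "private"],
    "Compensation": ["compensation", "payment", "paid"],
    "Contact information": ["contact", "phone"],
    "Voluntary participation": ["voluntary", "by choice", "your choice"],
}


def missing_elements(text: str) -> list[str]:
    """Return required elements for which no cue appears in text."""
    lowered = text.lower()
    return [
        element
        for element, cues in ELEMENT_CUES.items()
        if not any(cue in lowered for cue in cues)
    ]
```

Any element returned here should be reviewed by a person before the compliance status is reported.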
## Simplification Rules
- Break sentences longer than 20 words
- Replace medical jargon with common terms
- Use active voice and second person ("you")
- Add visual aid placeholders where appropriate
- Never remove required legal elements
## Stress-Case Rules
For complex multi-constraint requests, always include these explicit blocks:
1. Assumptions
2. Simplification Applied
3. Readability Report
4. Compliance Status
5. Risks and Limits
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate compliance status or remove legally required consent elements.
## Input Validation
This skill accepts: informed consent documents or text passages for readability simplification, with a target reading level and compliance preservation requirement.
If the request does not involve consent document simplification — for example, asking to draft new legal consent forms from scratch, provide regulatory legal advice, or simplify non-consent documents — do not proceed with the workflow. Instead respond:
> "patient-consent-simplifier is designed to simplify existing informed consent documents for patient readability while preserving regulatory compliance. Your request appears to be outside this scope. For drafting new consent forms, consult your institution's IRB template library or a regulatory affairs specialist. Please provide a consent document or text, or use a more appropriate tool."
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:POLISH_CHANGELOG.md
# POLISH_CHANGELOG — patient-consent-simplifier
**Original Score:** 80
**Polish Date:** 2026-03-19
## Issues Addressed
### P0 / Veto Fixes
- None (no veto failures)
### P1 Fixes
- **PHI/PII check missing from workflow:** Added step 1 as a mandatory sensitive data check. If patient identifiers (name, DOB, MRN, address) are detected in the input, the skill now emits a mandatory warning before proceeding.
- **Input Validation redirect improved:** Added specific redirect suggestion ("consult your institution's IRB template library or a regulatory affairs specialist") for out-of-scope consent drafting requests.
### P2 Fixes
- None beyond P1 fixes.
### QS-1 (Input Validation)
- Already present; redirect message strengthened with actionable alternative.
### QS-2 (Progressive Disclosure)
- File is 115 lines — within 300-line limit. No content moved to references/.
### QS-3 (Canonical YAML Frontmatter)
- Already present with all four required fields.
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Patient Consent Simplifier
Simplify informed consent forms to plain language.
"""
import argparse
import re
class ConsentSimplifier:
    """Simplify consent form language."""

    # Common legal/medical term replacements
    REPLACEMENTS = {
        "hereby": "now",
        "hereinafter": "from now on",
        "aforementioned": "mentioned above",
        "indemnify": "protect from harm",
        "liability": "legal responsibility",
        "thereto": "to it",
        "whereas": "because",
        "witnesseth": "shows that",
        "pursuant to": "under",
        "notwithstanding": "even if",
        "prospective": "future",
        "voluntary": "by choice",
        "confidentiality": "privacy",
        "randomization": "random assignment",
        "placebo": "inactive substance",
        "adverse event": "side effect",
        "investigational": "experimental"
    }

    def simplify(self, text):
        """Simplify consent form text."""
        simplified = text
        terms_replaced = 0
        # Replace complex terms and count the substitutions actually made
        for term, replacement in self.REPLACEMENTS.items():
            simplified, count = re.subn(
                r'\b' + re.escape(term) + r'\b', replacement, simplified,
                flags=re.IGNORECASE
            )
            terms_replaced += count
        # Break long sentences (simple heuristic); the documented rule is
        # to break sentences longer than 20 words
        sentences = simplified.split(". ")
        shortened = []
        for sent in sentences:
            if len(sent.split()) > 20:
                # Try to break at conjunctions
                sent = re.sub(r',\s*and\s+', ". Also, ", sent)
                sent = re.sub(r';\s*', ". ", sent)
            shortened.append(sent)
        simplified = ". ".join(shortened)
        # Average sentence length of the original text (simple word count heuristic)
        words = text.split()
        avg_sentence_len = len(words) / max(len(text.split(". ")), 1)
        return {
            "original": text,
            "simplified": simplified,
            "original_word_count": len(words),
            "simplified_word_count": len(simplified.split()),
            "avg_sentence_length": avg_sentence_len,
            "terms_replaced": terms_replaced
        }

    def calculate_grade_level(self, text):
        """Estimate Flesch-Kincaid grade level."""
        sentences = max(len(text.split(". ")), 1)
        words = len(text.split())
        syllables = sum(self._count_syllables(w) for w in text.split())
        if words == 0:
            return 0
        # Flesch-Kincaid Grade Level formula
        grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
        return max(0, round(grade, 1))

    def _count_syllables(self, word):
        """Rough syllable count: vowel groups, minus one for a trailing 'e'."""
        word = word.lower().strip(".,!?;")
        if not word:
            return 0
        vowels = "aeiouy"
        count = 0
        prev_was_vowel = False
        for char in word:
            if char in vowels:
                if not prev_was_vowel:
                    count += 1
                prev_was_vowel = True
            else:
                prev_was_vowel = False
        if word.endswith("e"):
            count -= 1
        return max(1, count)
def main():
    parser = argparse.ArgumentParser(description="Patient Consent Simplifier")
    parser.add_argument("--input", "-i", help="Input consent form file")
    parser.add_argument("--text", "-t", help="Direct text input")
    parser.add_argument("--output", "-o", help="Output file")
    # Default matches the documented default of grade 8
    parser.add_argument("--target-grade", type=int, default=8, help="Target reading grade")
    args = parser.parse_args()
    simplifier = ConsentSimplifier()
    if args.input:
        with open(args.input, encoding="utf-8") as f:
            text = f.read()
    elif args.text:
        text = args.text
    else:
        # Demo text
        text = """You hereby authorize the investigators to conduct research procedures
as described in the aforementioned protocol. You understand that participation
is voluntary and you may withdraw at any time without prejudice."""
    result = simplifier.simplify(text)
    original_grade = simplifier.calculate_grade_level(text)
    simplified_grade = simplifier.calculate_grade_level(result["simplified"])
    print("\n" + "="*60)
    print("CONSENT FORM SIMPLIFICATION")
    print("="*60)
    print(f"\nOriginal Grade Level: {original_grade}")
    print(f"Simplified Grade Level: {simplified_grade}")
    print(f"Target Grade Level: {args.target_grade}")
    print(f"Word Count: {result['original_word_count']} → {result['simplified_word_count']}")
    print("\n--- SIMPLIFIED VERSION ---\n")
    print(result["simplified"])
    print("\n" + "="*60)
    if args.output:
        with open(args.output, 'w', encoding="utf-8") as f:
            f.write(result["simplified"])
        print(f"\nSaved to: {args.output}")


if __name__ == "__main__":
    main()
Use pathology roi selector for data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.
---
name: pathology-roi-selector
description: Use pathology roi selector for data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.
license: MIT
skill-author: AIPOCH
---
# Pathology ROI Selector
WSI region detection for AI training.
## When to Use
- Use this skill when the task is WSI region detection and it needs structured execution, explicit assumptions, and clear output boundaries.
- Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to WSI region-of-interest selection for data analysis, with structured execution, explicit assumptions, and clear output boundaries.
- Packaged executable path(s): `scripts/main.py`.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.
## Example Usage
```bash
cd "20260318/scientific-skills/Data Analytics/pathology-roi-selector"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Use Cases
- Tissue microarray creation
- AI model training data
- Pathology education
- Research sampling
## Parameters
- `wsi_file`: Whole slide image
- `tissue_type`: Tumor/normal
- `magnification`: 20x/40x
## Returns
- ROI coordinates
- Tissue percentage
- Quality metrics
- Export ready crops
## Example
Identify tumor-rich regions in a 100K x 100K whole slide image.
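
A slide of that size is usually processed tile by tile rather than loaded whole. The sketch below shows one possible way to enumerate tiles and score a tile's tissue percentage using only the standard library. It is an assumption, not the method in `scripts/main.py`: `tile_grid`, `tissue_percentage`, the 1024-pixel tile size, and the binary-mask input are all hypothetical.

```python
from typing import Iterator, Tuple


def tile_grid(width: int, height: int, tile_size: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) tiles that cover the slide, clipping at the edges."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y, min(tile_size, width - x), min(tile_size, height - y)


def tissue_percentage(mask_rows: list[list[int]]) -> float:
    """Percentage of pixels flagged as tissue (1) in a binary mask."""
    total = sum(len(row) for row in mask_rows)
    if total == 0:
        return 0.0
    return 100.0 * sum(sum(row) for row in mask_rows) / total


# A 100K x 100K slide at a hypothetical 1024-pixel tile size
tiles = list(tile_grid(100_000, 100_000, 1024))
print(len(tiles))  # 9604 tiles (98 x 98)
```

Tiles whose tissue percentage clears a chosen threshold would become candidate ROIs; the threshold itself is one of the parameters to confirm with the user.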
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
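
The two path-related checklist items (no `../` traversal, output restricted to the workspace) can be sketched with `pathlib`. This is a minimal illustration that assumes a single workspace root; the helper name `resolve_inside` is hypothetical and does not come from the packaged script.

```python
from pathlib import Path


def resolve_inside(workspace: Path, user_path: str) -> Path:
    """Resolve user_path under workspace, rejecting anything that escapes it."""
    root = workspace.resolve()
    # Joining an absolute user_path discards root, so it is rejected below too.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate
```

Calling `resolve_inside(Path("workspace"), "../secret.txt")` raises `ValueError`, while a relative path such as `out/rois.json` resolves to a location under the workspace.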
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `pathology-roi-selector` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `pathology-roi-selector` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Pathology ROI Selector
Auto-identify regions of interest in whole slide images.
"""
import argparse
import json


class PathologyROISelector:
    """Select regions of interest in pathology images."""

    def __init__(self):
        self.roi_types = {
            "tumor": "Tumor regions",
            "stroma": "Stromal tissue",
            "necrosis": "Necrotic areas",
            "lymphocyte": "Lymphocyte aggregates"
        }

    def detect_rois(self, image_path, roi_type="tumor", min_size=1000):
        """Detect regions of interest in WSI."""
        # Placeholder for actual image analysis
        print(f"Analyzing {image_path} for {self.roi_types.get(roi_type, roi_type)}...")
        # Mock results; min_size is accepted but not applied by this placeholder
        rois = [
            {"x": 1000, "y": 2000, "width": 500, "height": 500, "confidence": 0.95},
            {"x": 3000, "y": 1500, "width": 800, "height": 600, "confidence": 0.87}
        ]
        return rois

    def filter_rois(self, rois, min_confidence=0.8):
        """Filter ROIs by confidence."""
        return [roi for roi in rois if roi["confidence"] >= min_confidence]

    def export_rois(self, rois, output_file):
        """Export ROI coordinates."""
        with open(output_file, 'w') as f:
            json.dump(rois, f, indent=2)


def main():
    parser = argparse.ArgumentParser(description="Pathology ROI Selector")
    parser.add_argument("--image", "-i", required=True, help="WSI file path")
    parser.add_argument("--type", "-t", default="tumor", help="ROI type")
    parser.add_argument("--output", "-o", help="Output JSON file")
    args = parser.parse_args()

    selector = PathologyROISelector()
    rois = selector.detect_rois(args.image, args.type)
    filtered = selector.filter_rois(rois)

    print(f"\nDetected {len(filtered)} regions of interest:")
    for i, roi in enumerate(filtered, 1):
        print(f" ROI {i}: ({roi['x']}, {roi['y']}) {roi['width']}x{roi['height']}")
        print(f" Confidence: {roi['confidence']:.2f}")

    if args.output:
        selector.export_rois(filtered, args.output)
        print(f"\nExported to: {args.output}")


if __name__ == "__main__":
    main()
Use when analyzing biotech patent landscapes, identifying white spaces in pharmaceutical IP, tracking competitor patents, or assessing freedom to operate for...
---
name: patent-landscape
description: Use when analyzing biotech patent landscapes, identifying white spaces in pharmaceutical IP, tracking competitor patents, or assessing freedom to operate for drug development. Provides comprehensive patent analysis and strategic insights for life sciences innovation.
license: MIT
skill-author: AIPOCH
---
# Biotech Patent Landscape Analyzer
Analyze biotech and pharmaceutical patent landscapes to identify opportunities, assess competition, and guide R&D strategy.
## When to Use
- Use this skill when analyzing biotech patent landscapes, identifying white spaces in pharmaceutical IP, tracking competitor patents, or assessing freedom to operate for drug development. It provides patent analysis and strategic insights for life sciences innovation.
- Use this skill for evidence insight tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: biotech patent landscape analysis, white-space identification in pharmaceutical IP, competitor patent tracking, and freedom-to-operate assessment for drug development.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: `not explicitly version-pinned in this skill package`. Add pinned versions if this skill needs stricter environment control.
## Example Usage
```bash
cd "20260318/scientific-skills/Evidence Insight/patent-landscape"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Quick Start
```python
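# Assumes a scripts/patent_landscape.py module that exposes PatentLandscapeAnalyzer;
# the packaged scripts/main.py in this skill defines PatentLandscape instead.
from scripts.patent_landscape import PatentLandscapeAnalyzer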
analyzer = PatentLandscapeAnalyzer()
# Analyze therapeutic area
landscape = analyzer.analyze(
therapeutic_area="CAR-T cell therapy",
date_range="2020-2024",
assignees=["Novartis", "Kite Pharma", "Juno Therapeutics"]
)
```
## Core Capabilities
### 1. Patent Search & Analysis
```python
results = analyzer.search_patents(
keywords=["CRISPR", "gene editing", "therapeutic"],
classification="C12N15/113", # IPC class
jurisdictions=["US", "EP", "WO"]
)
```
**Search Strategies:**
- **Keyword-based**: Technical terms + synonyms
- **Classification-based**: IPC/CPC codes
- **Citation-based**: Forward/backward citations
- **Assignee-based**: Company portfolios
### 2. White Space Analysis
```python
opportunities = analyzer.identify_white_spaces(
technology="Antibody-drug conjugates",
target_diseases=["breast cancer", "lung cancer"],
existing_claims=landscape
)
```
**White Space Opportunities:**
- Underserved disease indications
- Novel combination therapies
- Alternative delivery mechanisms
- Geographical gaps (emerging markets)
### 3. Competitor Intelligence
```python
competitors = analyzer.analyze_competitors(
companies=["Pfizer", "Moderna", "BioNTech"],
focus_area="mRNA vaccines"
)
```
**Competitor Metrics:**
| Metric | Description |
|--------|-------------|
| Portfolio size | Total active patents |
| Filing velocity | Recent filing trends |
| Geographic coverage | Jurisdiction strategy |
| Technology focus | Core vs. peripheral areas |
| Partnership patterns | Collaboration trends |
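To make the first three metrics in the table concrete, here is a minimal sketch that derives portfolio size, recent filing count, and jurisdiction coverage from a list of patent records. The record fields (`assignee`, `filing_year`, `jurisdiction`, `active`), the `competitor_metrics` helper, and the sample rows are all hypothetical. The skill does not define a record schema, and the data is invented for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def competitor_metrics(patents: Iterable[dict], recent_years: Set[int]) -> Dict[str, dict]:
    """Aggregate simple per-assignee metrics from patent records."""
    metrics = defaultdict(
        lambda: {"portfolio_size": 0, "recent_filings": 0, "jurisdictions": set()}
    )
    for patent in patents:
        entry = metrics[patent["assignee"]]
        if patent["active"]:
            entry["portfolio_size"] += 1       # total active patents
        if patent["filing_year"] in recent_years:
            entry["recent_filings"] += 1       # crude proxy for filing velocity
        entry["jurisdictions"].add(patent["jurisdiction"])
    return dict(metrics)


# Invented sample records, not real patent data
patents = [
    {"assignee": "Company A", "filing_year": 2023, "jurisdiction": "US", "active": True},
    {"assignee": "Company A", "filing_year": 2019, "jurisdiction": "EP", "active": True},
    {"assignee": "Company A", "filing_year": 2015, "jurisdiction": "US", "active": False},
    {"assignee": "Company B", "filing_year": 2024, "jurisdiction": "WO", "active": True},
]

metrics = competitor_metrics(patents, recent_years={2023, 2024})
for assignee, values in sorted(metrics.items()):
    print(assignee, values["portfolio_size"], values["recent_filings"],
          sorted(values["jurisdictions"]))
```

Technology focus and partnership patterns need classification codes and co-assignee data, so they are left out of this sketch.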
### 4. Freedom to Operate (FTO) Assessment
```python
fto = analyzer.assess_fto(
product_concept="Bispecific antibody targeting PD-1 and CTLA-4",
jurisdictions=["US", "EP", "JP"]
)
```
**FTO Analysis Steps:**
1. Identify relevant patent claims
2. Map claims to product features
3. Assess validity of blocking patents
4. Identify design-around options
5. Prepare licensing recommendations
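Steps 1 and 2 amount to building a claim chart. The sketch below shows one simplified way to do that, assuming the common all-elements approach in which a claim is flagged only when every one of its elements has a counterpart in the product. The `claim_chart` and `reads_on_product` helpers, the exact-string matching, and the example elements are illustrative assumptions. Real FTO work needs claim construction and legal review.

```python
from typing import Dict, List, Set


def claim_chart(claim_elements: List[str], product_features: Set[str]) -> Dict[str, bool]:
    """Map each claim element to whether the product has a matching feature."""
    return {element: element in product_features for element in claim_elements}


def reads_on_product(chart: Dict[str, bool]) -> bool:
    """Flag the claim only if every element is present (all-elements check)."""
    return all(chart.values())


# Hypothetical claim elements and product features
claim_elements = [
    "bispecific antibody",
    "binds PD-1",
    "binds CTLA-4",
    "IgG1 Fc region",
]
product_features = {
    "bispecific antibody",
    "binds PD-1",
    "binds CTLA-4",
    "IgG4 Fc region",
}

chart = claim_chart(claim_elements, product_features)
missing = [element for element, present in chart.items() if not present]
print("Potentially blocking:", reads_on_product(chart))
print("Elements not found in product:", missing)
```

The list of missing elements is also a natural starting point for step 4, since each one marks a place where the product already departs from the claim.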
## CLI Usage
```text
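# Note: these commands assume a scripts/patent_landscape.py entry point.
# The packaged scripts/main.py accepts only --target/-t and --area/-a.
# Generate patent landscape report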
python scripts/patent_landscape.py \
--query "immuno-oncology checkpoint inhibitors" \
--output landscape_report.pdf \
--format comprehensive
# Quick FTO check
python scripts/patent_landscape.py \
--fto "product_description.txt" \
--jurisdictions US EP JP
```
## Data Sources
- USPTO (United States)
- EPO (Europe)
- WIPO (Global)
- JPO (Japan)
- CNIPA (China)
## References
- `references/ipc-classifications.md` - IPC/CPC codes for biotech
- `references/patent-search-strategies.md` - Advanced search techniques
- `examples/landscape-reports/` - Sample reports
---
**Skill ID**: 204 | **Version**: 1.0 | **License**: MIT
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `patent-landscape` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `patent-landscape` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## References
- [references/audit-reference.md](references/audit-reference.md) - Supported scope, audit commands, and fallback boundaries
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/audit-reference.md
# Audit Reference
## Scope
- Skill: `patent-landscape`
- Core purpose: Use when analyzing biotech patent landscapes, identifying white spaces in pharmaceutical IP, tracking competitor patents, or assessing freedom to operate for drug development. Provides comprehensive patent analysis and strategic insights for life sciences innovation.
- Use only within the documented workflow and category boundary defined in `SKILL.md`
## Supported Audit Paths
- `python -m py_compile scripts/main.py`
- `python scripts/main.py --help`
## Fallback Boundary
If required inputs are incomplete, the skill should still return:
- the missing required inputs
- the steps that can still be completed safely
- assumptions that need confirmation before execution
- the next checks before accepting the final deliverable
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Patent Landscape
Summarize patent landscape for specific therapeutic targets.
"""
import argparse
class PatentLandscape:
    """Analyze patent landscape for therapeutic targets."""

    def analyze_target(self, target_name, therapeutic_area):
        """Analyze patent landscape for a target."""
        # Mock analysis
        landscape = {
            "target": target_name,
            "therapeutic_area": therapeutic_area,
            "patent_activity": "High" if target_name.lower() in ["pd1", "her2"] else "Moderate",
            "key_players": ["Company A", "Company B", "Company C"],
            "white_space": ["Combination therapies", "Novel indications"],
            "freedom_to_operate": "Limited - many blocking patents"
        }
        return landscape

    def print_landscape(self, landscape):
        """Print patent landscape analysis."""
        print(f"\n{'='*60}")
        print(f"PATENT LANDSCAPE: {landscape['target'].upper()}")
        print(f"{'='*60}\n")
        print(f"Therapeutic Area: {landscape['therapeutic_area']}")
        print(f"Patent Activity: {landscape['patent_activity']}")
        print()
        print("Key Players:")
        for player in landscape['key_players']:
            print(f" • {player}")
        print()
        print("White Space Opportunities:")
        for opportunity in landscape['white_space']:
            print(f" • {opportunity}")
        print()
        print(f"Freedom to Operate: {landscape['freedom_to_operate']}")
        print(f"\n{'='*60}\n")


def main():
    parser = argparse.ArgumentParser(description="Patent Landscape")
    parser.add_argument("--target", "-t", required=True, help="Drug target")
    parser.add_argument("--area", "-a", required=True, help="Therapeutic area")
    args = parser.parse_args()

    landscape_analyzer = PatentLandscape()
    landscape = landscape_analyzer.analyze_target(args.target, args.area)
    landscape_analyzer.print_landscape(landscape)


if __name__ == "__main__":
    main()
Use when mapping patent claims to products, analyzing patent infringement, or preparing freedom-to-operate analyses. Compares patent claims against product f...
---
name: patent-claim-mapper
description: Use when mapping patent claims to products, analyzing patent infringement, or preparing freedom-to-operate analyses. Compares patent claims against product features for biotech and pharmaceutical IP assessment.
license: MIT
skill-author: AIPOCH
---
# Patent Claim Mapper
Map patent claims to product features for infringement analysis, freedom-to-operate assessments, and competitive intelligence in biotech/pharma.
## When to Use
- Use this skill when mapping patent claims to products, analyzing patent infringement, or preparing freedom-to-operate analyses. It compares patent claims against product features for biotech and pharmaceutical IP assessment.
- Use this skill for evidence insight tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: mapping patent claims to products, patent infringement analysis, and freedom-to-operate preparation for biotech and pharmaceutical IP assessment.
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `dataclasses`: part of the Python standard library since 3.7. It is declared in `requirements.txt`, but that entry is redundant on the 3.10+ baseline.
## Example Usage
```bash
cd "20260318/scientific-skills/Evidence Insight/patent-claim-mapper"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Quick Start
```python
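# Assumes a scripts/claim_mapper.py module that exposes ClaimMapper;
# the packaged scripts/main.py in this skill defines PatentClaimMapper instead.
from scripts.claim_mapper import ClaimMapper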
mapper = ClaimMapper()
# Map claims to product
mapping = mapper.analyze(
patent_claims="patent_claims.txt",
product_description="product_spec.txt"
)
```
## Core Capabilities
### 1. Claim Parsing
```python
claims = mapper.parse_claims(
patent_file="US10123456B2.pdf",
independent_only=False
)
```
### 2. Feature Mapping
```python
mapping = mapper.map_to_product(
claim="A monoclonal antibody that binds to PD-1...",
product_features=product_specs
)
```
**Mapping Results:**
- **Fully covered**: Product implements claim
- **Partially covered**: Some elements present
- **Not covered**: Claim element missing
- **Questionable**: Requires legal review
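The packaged `scripts/main.py` scores a claim against a product feature with Jaccard similarity over keyword sets and uses 0.7 and 0.4 as cut-offs for `mapped` and `partial`. The sketch below restates that logic in isolation. The sample keyword lists are invented, and how these statuses line up with the four labels above is an assumption: no threshold here produces a "Questionable" result, which would still need a human reviewer.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity: size of intersection divided by size of union."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def mapping_status(score: float) -> str:
    """Translate a similarity score into the status strings used by the script."""
    if score >= 0.7:
        return "mapped"
    if score >= 0.4:
        return "partial"
    return "not_mapped"


# Invented keyword sets: three shared terms out of five distinct terms
claim_keywords = ["Antibody", "Binding", "Domain", "Receptor"]
feature_keywords = ["Antibody", "Binding", "Receptor", "Linker"]

score = jaccard(claim_keywords, feature_keywords)
status = mapping_status(score)
print(f"similarity={score:.2f} status={status}")
```

Keyword overlap is a coarse signal, so a `partial` or `not_mapped` status here says nothing about legal coverage.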
### 3. Gap Analysis
```python
gaps = mapper.identify_gaps(
mapping_results,
strategy="design_around"
)
```
## CLI Usage
```text
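# Note: assumes a scripts/claim_mapper.py entry point. The packaged scripts/main.py
# takes --patent-claims, --product-description, --patent-number, --product-name, and --output.
python scripts/claim_mapper.py \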
--patent US10123456B2 \
--product product_spec.txt \
--output mapping_report.pdf
```
---
**Skill ID**: 213 | **Version**: 1.0 | **License**: MIT
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `patent-claim-mapper` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `patent-claim-mapper` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## References
- [references/audit-reference.md](references/audit-reference.md) - Supported scope, audit commands, and fallback boundaries
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/audit-reference.md
# Audit Reference
## Scope
- Skill: `patent-claim-mapper`
- Core purpose: Use when mapping patent claims to products, analyzing patent infringement, or preparing freedom-to-operate analyses. Compares patent claims against product features for biotech and pharmaceutical IP assessment.
- Use only within the documented workflow and category boundary defined in `SKILL.md`
## Supported Audit Paths
- `python -m py_compile scripts/main.py`
- `python scripts/main.py --help`
## Fallback Boundary
If required inputs are incomplete, the skill should still return:
- the missing required inputs
- the steps that can still be completed safely
- assumptions that need confirmation before execution
- the next checks before accepting the final deliverable
FILE:requirements.txt
dataclasses
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Patent Claim Mapper - Patent Infringement Risk Analysis Tool
Features:
1. Parse patent claim text
2. Extract technical features
3. Compare with product features
4. Generate infringement risk assessment report
Author: OpenClaw Skills Team
Version: 1.0.0
"""
import re
import json
import argparse
from dataclasses import dataclass, field, asdict
from typing import List, Dict, Optional, Tuple
from pathlib import Path
from datetime import datetime
@dataclass
class ClaimElement:
    """Technical element in patent claim"""
    text: str
    element_type: str  # 'apparatus', 'method', 'composition', 'feature'
    keywords: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)


@dataclass
class ClaimMapping:
    """Mapping between patent claim and product"""
    claim_element: str
    product_feature: Optional[str]
    mapping_status: str  # 'mapped', 'not_mapped', 'partial'
    similarity_score: float
    analysis_notes: str


@dataclass
class InfringementReport:
    """Patent infringement analysis report"""
    patent_number: str
    product_name: str
    overall_risk: str  # 'high', 'medium', 'low', 'clear'
    risk_score: float
    claim_mappings: List[ClaimMapping] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)
class PatentClaimMapper:
    """Main class for patent claim mapping and analysis"""

    def __init__(self):
        self.claim_parser = ClaimParser()
        self.feature_extractor = FeatureExtractor()
        self.comparison_engine = ComparisonEngine()

    def analyze_infringement(self, patent_claims: str, product_description: str,
                             patent_number: str = "", product_name: str = "") -> InfringementReport:
        """Analyze potential patent infringement"""
        # Parse patent claims
        parsed_claims = self.claim_parser.parse(patent_claims)
        # Extract product features
        product_features = self.feature_extractor.extract(product_description)
        # Compare claims to features
        mappings = self.comparison_engine.compare(parsed_claims, product_features)
        # Calculate overall risk
        risk_score = self._calculate_risk_score(mappings)
        overall_risk = self._risk_level(risk_score)
        # Generate recommendations
        recommendations = self._generate_recommendations(mappings, overall_risk)
        return InfringementReport(
            patent_number=patent_number,
            product_name=product_name,
            overall_risk=overall_risk,
            risk_score=risk_score,
            claim_mappings=mappings,
            recommendations=recommendations
        )

    def _calculate_risk_score(self, mappings: List[ClaimMapping]) -> float:
        """Calculate infringement risk score"""
        if not mappings:
            return 0.0
        mapped_count = sum(1 for m in mappings if m.mapping_status == 'mapped')
        partial_count = sum(1 for m in mappings if m.mapping_status == 'partial')
        score = (mapped_count + 0.5 * partial_count) / len(mappings)
        return min(score, 1.0)

    def _risk_level(self, score: float) -> str:
        """Convert score to risk level"""
        if score >= 0.8:
            return "high"
        elif score >= 0.5:
            return "medium"
        elif score >= 0.2:
            return "low"
        else:
            return "clear"

    def _generate_recommendations(self, mappings: List[ClaimMapping],
                                  risk_level: str) -> List[str]:
        """Generate recommendations based on analysis"""
        recommendations = []
        if risk_level == "high":
            recommendations.append("Consider design-around options for mapped elements")
            recommendations.append("Consult patent attorney for invalidity analysis")
        elif risk_level == "medium":
            recommendations.append("Investigate alternative implementations")
            recommendations.append("Monitor patent family for related claims")
        unmapped = [m for m in mappings if m.mapping_status == 'not_mapped']
        if unmapped:
            recommendations.append(f"Review {len(unmapped)} unmapped claim elements")
        return recommendations
class ClaimParser:
    """Parse patent claim text into structured elements"""

    def parse(self, claims_text: str) -> List[ClaimElement]:
        """Parse claim text into elements"""
        elements = []
        # Split into individual claims; also match a claim number at the very
        # start of the text so the first claim does not keep its "1." prefix
        claims = re.split(r'(?:^|\n)\s*\d+\.', claims_text)
        for claim_text in claims:
            if not claim_text.strip():
                continue
            # Extract claim elements
            element = ClaimElement(
                text=claim_text.strip(),
                element_type=self._classify_type(claim_text),
                keywords=self._extract_keywords(claim_text)
            )
            elements.append(element)
        return elements

    def _classify_type(self, text: str) -> str:
        """Classify claim type"""
        text_lower = text.lower()
        if 'method' in text_lower or 'process' in text_lower:
            return 'method'
        elif 'apparatus' in text_lower or 'system' in text_lower:
            return 'apparatus'
        elif 'composition' in text_lower:
            return 'composition'
        return 'feature'

    def _extract_keywords(self, text: str) -> List[str]:
        """Extract key technical terms"""
        # Simple keyword extraction: capitalized words only
        words = re.findall(r'\b[A-Z][a-z]+\b', text)
        return list(set(words))[:10]  # Return up to 10 unique keywords
class FeatureExtractor:
    """Extract features from product description"""

    def extract(self, description: str) -> List[Dict]:
        """Extract product features"""
        features = []
        # Split into sentences
        sentences = re.split(r'[.!?]+', description)
        for sent in sentences:
            if not sent.strip():
                continue
            feature = {
                'text': sent.strip(),
                'keywords': self._extract_keywords(sent),
                'technical_terms': self._extract_technical_terms(sent)
            }
            features.append(feature)
        return features

    def _extract_keywords(self, text: str) -> List[str]:
        """Extract keywords from text"""
        words = re.findall(r'\b[A-Z][a-z]+\b', text)
        return list(set(words))[:10]

    def _extract_technical_terms(self, text: str) -> List[str]:
        """Extract technical terminology"""
        # Look for compound technical terms
        terms = re.findall(r'\b[A-Z][a-z]+(?:\s+[a-z]+){1,3}\b', text)
        return list(set(terms))[:5]
class ComparisonEngine:
    """Compare patent claims to product features"""

    def compare(self, claims: List[ClaimElement],
                features: List[Dict]) -> List[ClaimMapping]:
        """Compare claims to features"""
        mappings = []
        for claim in claims:
            best_match = self._find_best_match(claim, features)
            mapping = ClaimMapping(
                claim_element=claim.text[:100],  # Truncate for readability
                product_feature=best_match['text'] if best_match else None,
                mapping_status=best_match['status'] if best_match else 'not_mapped',
                similarity_score=best_match['score'] if best_match else 0.0,
                # Guard against None: no feature shared a keyword with this claim
                analysis_notes=best_match.get('notes', '') if best_match else ''
            )
            mappings.append(mapping)
        return mappings

    def _find_best_match(self, claim: ClaimElement,
                         features: List[Dict]) -> Optional[Dict]:
        """Find best matching feature for claim element"""
        best_match = None
        best_score = 0.0
        for feature in features:
            score = self._calculate_similarity(claim, feature)
            if score > best_score:
                best_score = score
                # Determine mapping status
                if score >= 0.7:
                    status = 'mapped'
                elif score >= 0.4:
                    status = 'partial'
                else:
                    status = 'not_mapped'
                # Copy the feature so the shared input dict is not mutated
                best_match = {**feature, 'score': score, 'status': status}
        return best_match

    def _calculate_similarity(self, claim: ClaimElement,
                              feature: Dict) -> float:
        """Calculate similarity between claim and feature"""
        claim_keywords = set(claim.keywords)
        feature_keywords = set(feature.get('keywords', []))
        if not claim_keywords or not feature_keywords:
            return 0.0
        # Jaccard similarity
        intersection = claim_keywords & feature_keywords
        union = claim_keywords | feature_keywords
        return len(intersection) / len(union) if union else 0.0
def main():
    parser = argparse.ArgumentParser(description="Patent Claim Mapper")
    parser.add_argument("--patent-claims", required=True, help="Patent claims text file")
    parser.add_argument("--product-description", required=True, help="Product description file")
    parser.add_argument("--patent-number", help="Patent number for reference")
    parser.add_argument("--product-name", help="Product name")
    parser.add_argument("--output", default="infringement_report.json", help="Output file")
    args = parser.parse_args()

    # Load files
    with open(args.patent_claims, 'r') as f:
        claims_text = f.read()
    with open(args.product_description, 'r') as f:
        product_text = f.read()

    # Analyze
    mapper = PatentClaimMapper()
    report = mapper.analyze_infringement(
        claims_text,
        product_text,
        patent_number=args.patent_number or "",
        product_name=args.product_name or ""
    )

    # Save report
    with open(args.output, 'w') as f:
        json.dump(asdict(report), f, indent=2)

    print("Infringement Analysis Complete")
    print(f"Overall Risk: {report.overall_risk.upper()}")
    print(f"Risk Score: {report.risk_score:.1%}")
    print(f"Report saved: {args.output}")


if __name__ == "__main__":
    main()
Use outlier detection handler for data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.
---
name: outlier-detection-handler
description: Use outlier detection handler for data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.
license: MIT
skill-author: AIPOCH
---
# Outlier Detection & Handling
Identify and manage statistical outliers.
## When to Use
- Use this skill for outlier detection and handling in data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.
- Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to: outlier detection and handling for data analysis workflows that need structured execution, explicit assumptions, and clear output boundaries.
- Packaged executable path(s): `scripts/main.py`.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `numpy`: `unspecified`. Declared in `requirements.txt`.
- `scipy`: `unspecified`. Declared in `requirements.txt`.
## Example Usage
```bash
cd "20260318/scientific-skills/Data Analytics/outlier-detection-handler"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Use Cases
- Data quality control
- Pre-analysis screening
- Regulatory compliance (FDA data integrity)
## Parameters
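| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--data`, `-d` | str | No | - | Path to a text file with one numeric value per line; built-in demo data is used when omitted |
| `--method`, `-m` | str | No | `iqr` | Detection method: `zscore`, `iqr`, or `grubbs` |
| `--threshold`, `-t` | float | No | `3` | Z-score cutoff; only used with `--method zscore` |

The packaged `scripts/main.py` only detects outliers and prints a handling recommendation. The handling actions `flag`, `remove`, and `winsorize` are not exposed as a CLI option and have to be applied downstream.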
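One caveat when choosing a method is that the z-score is bounded in small samples: with the population standard deviation (the `scipy.stats.zscore` default), no value in a sample of size n can exceed a z-score of sqrt(n - 1). The following sketch recomputes the z-scores of the script's built-in demo data with the standard library only, to illustrate that a threshold of 3 cannot flag anything when n = 10.

```python
import math
import statistics

# The built-in demo data from scripts/main.py
demo = [10, 12, 11, 13, 12, 11, 100, 12, 11, 13]

mean = statistics.fmean(demo)
std = statistics.pstdev(demo)  # population std (ddof=0), as scipy.stats.zscore uses by default
z_scores = [abs(v - mean) / std for v in demo]

max_z = max(z_scores)
bound = math.sqrt(len(demo) - 1)  # largest z-score reachable in a sample of this size
flagged = [i for i, z in enumerate(z_scores) if z > 3]

print(f"max |z| = {max_z:.3f}, bound = {bound:.3f}, flagged = {flagged}")
# max |z| = 2.998, bound = 3.000, flagged = []
```

For small datasets like this one, the IQR or Grubbs method is therefore usually the more informative choice, or the z-score threshold needs to be lowered.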
## Returns
- Outlier flagging with method details
- Handling recommendations
- Documentation for regulatory submission
## Example
Input: Biomarker measurements from 200 patients
Output: 5 outliers identified (2.5%), recommended action: investigate then winsorize
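The example output recommends winsorizing, which the packaged script does not perform itself. Below is a minimal sketch of one way to do it, assuming the same 1.5 × IQR fences that the script's IQR method uses. The helper names `iqr_bounds` and `winsorize` and the parameter `k` are hypothetical and not part of `scripts/main.py`.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences; `k` is a hypothetical tuning knob."""
    # method="inclusive" uses linear interpolation, like NumPy's default percentile
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize(values, k=1.5):
    """Clip every value to the IQR fences instead of removing it."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


demo = [10, 12, 11, 13, 12, 11, 100, 12, 11, 13]
print(iqr_bounds(demo))  # (8.375, 15.375)
print(winsorize(demo))   # the value 100 is clipped to 15.375
```

Clipping keeps the sample size unchanged, which is why it is often preferred over removal when the extreme value may be legitimate.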
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python script (`scripts/main.py`) executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
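The path-related checklist items (no `../` traversal, output restricted to the workspace) could be enforced with a check along these lines. This is a sketch under assumptions: the function name `is_inside_workspace` and the `workspace` directory are hypothetical, and the packaged script does not currently perform this check.

```python
from pathlib import Path


def is_inside_workspace(workspace, candidate):
    """Return True if `candidate` resolves to a location under `workspace`."""
    root = Path(workspace).resolve()
    # Joining first means relative paths are interpreted against the workspace;
    # an absolute candidate replaces the root and is then rejected below.
    target = (root / candidate).resolve()
    return target.is_relative_to(root)  # available since Python 3.9


print(is_inside_workspace("workspace", "data/values.txt"))  # True
print(is_inside_workspace("workspace", "../secrets.txt"))   # False
```

Resolving before comparing matters: a plain string check for `..` misses symlinks and absolute paths.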
## Prerequisites
```text
# Python dependencies
pip install -r requirements.txt
```
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
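To report the failure point without exposing a full stack trace, the script can be run as a subprocess and summarized. The sketch below is illustrative only: `run_step` is a hypothetical helper, and a one-line Python command stands in for `python scripts/main.py`.

```python
import subprocess
import sys


def run_step(cmd):
    """Run one command and summarize the outcome for the fallback report."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    stderr_lines = proc.stderr.strip().splitlines()
    return {
        "command": " ".join(cmd),
        "ok": proc.returncode == 0,
        "returncode": proc.returncode,
        # Keep only the final stderr line (usually the error message itself)
        "last_error_line": stderr_lines[-1] if stderr_lines else "",
    }


# Stand-in for a failing `python scripts/main.py ...` call
result = run_step([sys.executable, "-c", "raise SystemExit('missing input file')"])
print(result["ok"], result["returncode"], result["last_error_line"])
```

The returned dictionary carries enough detail to state what blocked execution before switching to the manual fallback.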
## Input Validation
This skill accepts requests that match the documented purpose of `outlier-detection-handler` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `outlier-detection-handler` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:requirements.txt
numpy
scipy
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Outlier Detection Handler
Statistical outlier identification and handling recommendations.
"""

import argparse

import numpy as np
from scipy import stats


class OutlierDetector:
    """Detect and handle statistical outliers."""

    def zscore_method(self, data, threshold=3):
        """Detect outliers using Z-score."""
        z_scores = np.abs(stats.zscore(data))
        outliers = np.where(z_scores > threshold)[0]
        return outliers, z_scores

    def iqr_method(self, data):
        """Detect outliers using IQR."""
        Q1 = np.percentile(data, 25)
        Q3 = np.percentile(data, 75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        outliers = np.where((data < lower_bound) | (data > upper_bound))[0]
        return outliers, (lower_bound, upper_bound)

    def grubbs_test(self, data, alpha=0.05):
        """Two-sided Grubbs' test for a single outlier."""
        n = len(data)
        if n < 3:
            raise ValueError("Grubbs' test needs at least 3 data points")
        mean = np.mean(data)
        std = np.std(data, ddof=1)
        # Critical value derived from the t distribution with n - 2 degrees of freedom
        t_critical = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        G_critical = ((n - 1) / np.sqrt(n)) * np.sqrt(t_critical**2 / (n - 2 + t_critical**2))
        if std == 0:
            # All values are identical, so nothing can be an outlier
            return False, 0.0, G_critical
        G = np.max(np.abs(data - mean)) / std
        is_outlier = G > G_critical
        return is_outlier, G, G_critical

    def recommend_handling(self, outlier_count, total_count):
        """Recommend outlier handling approach."""
        if total_count == 0:
            return "No data points provided"
        percentage = (outlier_count / total_count) * 100
        if percentage < 1:
            return "Remove outliers - likely data entry errors"
        elif percentage < 5:
            return "Investigate outliers - may be legitimate extreme values"
        else:
            return "Consider robust statistical methods - high outlier percentage"


def main():
    parser = argparse.ArgumentParser(description="Outlier Detection Handler")
    parser.add_argument("--data", "-d", help="Data file (one value per line)")
    parser.add_argument("--method", "-m", choices=["zscore", "iqr", "grubbs"],
                        default="iqr", help="Detection method")
    parser.add_argument("--threshold", "-t", type=float, default=3,
                        help="Z-score threshold")
    args = parser.parse_args()

    detector = OutlierDetector()

    if args.data:
        with open(args.data) as f:
            data = np.array([float(line.strip()) for line in f if line.strip()])
        if data.size == 0:
            parser.error(f"no numeric values found in {args.data}")
    else:
        # Demo data
        data = np.array([10, 12, 11, 13, 12, 11, 100, 12, 11, 13])

    print(f"\n{'='*60}")
    print("OUTLIER DETECTION RESULTS")
    print(f"{'='*60}\n")
    print(f"Data points: {len(data)}")
    print(f"Mean: {np.mean(data):.2f}")
    print(f"Std: {np.std(data):.2f}")
    print()

    if args.method == "zscore":
        outliers, scores = detector.zscore_method(data, args.threshold)
        print(f"Method: Z-score (threshold = {args.threshold})")
    elif args.method == "iqr":
        outliers, bounds = detector.iqr_method(data)
        print("Method: IQR")
        print(f"Bounds: [{bounds[0]:.2f}, {bounds[1]:.2f}]")
    elif args.method == "grubbs":
        is_outlier, G, G_critical = detector.grubbs_test(data)
        print("Method: Grubbs' test")
        print(f"G statistic: {G:.3f}")
        print(f"G critical: {G_critical:.3f}")
        # Grubbs' test evaluates only the single most extreme value
        outliers = [np.argmax(np.abs(data - np.mean(data)))] if is_outlier else []

    print(f"\nOutliers detected: {len(outliers)}")
    if len(outliers) > 0:
        print("Outlier values:")
        for idx in outliers:
            print(f"  Index {idx}: {data[idx]}")

    recommendation = detector.recommend_handling(len(outliers), len(data))
    print(f"\nRecommendation: {recommendation}")
    print(f"\n{'='*60}\n")


if __name__ == "__main__":
    main()
Check if referenced bioinformatics software/code licenses allow commercial use (GPL vs MIT, etc.).
---
name: open-source-license-check
description: Check if referenced bioinformatics software/code licenses allow commercial use (GPL vs MIT, etc.).
license: MIT
skill-author: AIPOCH
---
# Open Source License Check
Check if referenced bioinformatics software/code licenses allow commercial use (GPL vs MIT, etc.).
## When to Use
- Use this skill when the task is to check whether the licenses of referenced bioinformatics software or code allow commercial use (GPL vs MIT, etc.).
- Use this skill for evidence insight tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
## Key Features
- Scope-focused workflow aligned to checking whether referenced bioinformatics software or code licenses allow commercial use (GPL vs MIT, etc.).
- Packaged executable path(s): `scripts/main.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
## Dependencies
See `## Prerequisites` below for related details.
- `Python`: `3.10+`. Repository baseline for current packaged skills.
- `Third-party packages`: none. `scripts/main.py` uses only the Python standard library.
## Example Usage
See `## Usage` below for related details.
```bash
cd "20260318/scientific-skills/Evidence Insight/open-source-license-check"
python -m py_compile scripts/main.py
python scripts/main.py --help
```
Example run plan:
1. Confirm the user input, output path, and any required config values.
2. Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
3. Run `python scripts/main.py` with the validated inputs.
4. Review the generated output and return the final artifact with any assumptions called out.
## Implementation Details
See `## Workflow` below for related details.
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/main.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
## Quick Check
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
```bash
python -m py_compile scripts/main.py
```
## Audit-Ready Commands
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
```bash
python -m py_compile scripts/main.py
python scripts/main.py --help
```
## Workflow
1. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
2. Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
3. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
4. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
5. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
## Usage
```text
python scripts/main.py --software "samtools,bwa,bedtools"
python scripts/main.py --check-requirements requirements.txt
```
## Parameters
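- `--software`, `-s`: Comma-separated software names, looked up in the script's built-in license table
- `--check-requirements`, `-r`: Path to a Python requirements file; each package name is looked up in the same table
- `--check-directory`: Scan directory for license files (documented here, but not implemented in the packaged `scripts/main.py`)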
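A directory scan for license files could look roughly like the sketch below. Everything in it is an assumption for illustration: the function name `find_license_files` and the set of filename stems are hypothetical, and the packaged script contains no such code.

```python
import tempfile
from pathlib import Path

# Hypothetical list of filename stems that commonly hold license text
LICENSE_STEMS = {"license", "licence", "copying", "notice"}


def find_license_files(directory):
    """Return candidate license files found anywhere under `directory`, sorted by path."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stem.lower() in LICENSE_STEMS
    )


# Small self-contained demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "LICENSE.txt").write_text("MIT License")
    (Path(tmp) / "README.md").write_text("docs")
    found = [p.name for p in find_license_files(tmp)]

print(found)  # ['LICENSE.txt']
```

Finding a file only locates the license text. Identifying which license it contains would still need a separate matching step or manual review.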
## License Types
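| License | Commercial Use | Notes |
|---------|---------------|-------|
| MIT | ✅ Yes | Permissive |
| Apache-2.0 | ✅ Yes | Permissive |
| BSD | ✅ Yes | Permissive |
| GPL-3.0 | ⚠️ Copyleft | Commercial use allowed; distributed derivative works must be released under the GPL with source |
| GPL-2.0 | ⚠️ Copyleft | Commercial use allowed; distributed derivative works must be released under the GPL with source |
| AGPL | ⚠️ Copyleft (network) | Commercial use allowed; offering a modified version over a network also triggers the source-sharing obligation |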
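The license families in the table can be grouped by the kind of obligation they carry. The following is a minimal sketch of such a grouping, assuming identifiers that start with the family name (for example `GPL-3.0` or `BSD-3`). The function `classify_license` and its category labels are hypothetical and not part of the packaged script, and the result is a screening aid rather than legal advice.

```python
def classify_license(license_id):
    """Map a license identifier to a coarse obligation category."""
    lic = license_id.strip().upper()
    # Check AGPL and LGPL before GPL so the more specific prefix wins
    if lic.startswith("AGPL"):
        return "network-copyleft"
    if lic.startswith("LGPL"):
        return "weak-copyleft"
    if lic.startswith("GPL"):
        return "strong-copyleft"
    if lic.startswith(("MIT", "APACHE", "BSD")):
        return "permissive"
    return "unknown"


for name in ["MIT", "BSD-3", "GPL-3.0", "LGPL-3.0", "AGPL-3.0", "Artistic-2.0"]:
    print(f"{name:<14} {classify_license(name)}")
```

Anything classified as `unknown` should be checked manually, which mirrors how the packaged script treats software missing from its table.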
## Output
- License compatibility report
- Commercial use warnings
- Compliance recommendations
## Risk Assessment
| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python script (`scripts/main.py`) executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
## Security Checklist
- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
## Prerequisites
No additional Python packages required.
## Evaluation Criteria
### Success Metrics
- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable
### Test Cases
1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time
## Lifecycle Status
- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
- Performance optimization
- Additional feature support
## Output Requirements
Every final response should make these items explicit when they are relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs
- Unresolved items and next-step checks
## Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
## Input Validation
This skill accepts requests that match the documented purpose of `open-source-license-check` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
> `open-source-license-check` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
## References
- [references/audit-reference.md](references/audit-reference.md) - Supported scope, audit commands, and fallback boundaries
## Response Template
Use the following fixed structure for non-trivial requests:
1. Objective
2. Inputs Received
3. Assumptions
4. Workflow
5. Deliverable
6. Risks and Limits
7. Next Checks
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.
FILE:references/audit-reference.md
# Audit Reference
## Scope
- Skill: `open-source-license-check`
- Core purpose: Check if referenced bioinformatics software/code licenses allow commercial use (GPL vs MIT, etc.).
- Use only within the documented workflow and category boundary defined in `SKILL.md`
## Supported Audit Paths
- `python -m py_compile scripts/main.py`
- `python scripts/main.py --help`
## Fallback Boundary
If required inputs are incomplete, the skill should still return:
- the missing required inputs
- the steps that can still be completed safely
- assumptions that need confirmation before execution
- the next checks before accepting the final deliverable
FILE:scripts/main.py
#!/usr/bin/env python3
"""
Open Source License Check
Check bioinformatics software licenses for commercial use.
"""

import argparse
import re


class LicenseChecker:
    """Check software licenses."""

    # Hand-maintained snapshot; verify each entry against the upstream project
    # before relying on it, since projects do change licenses between releases.
    LICENSE_DB = {
        "samtools": {"license": "MIT", "commercial": True, "copyleft": False},
        "bwa": {"license": "GPL-3.0", "commercial": True, "copyleft": True},
        "bedtools": {"license": "MIT", "commercial": True, "copyleft": False},
        # Bowtie 2 is distributed under GPL-3.0, not MIT
        "bowtie2": {"license": "GPL-3.0", "commercial": True, "copyleft": True},
        "star": {"license": "GPL-3.0", "commercial": True, "copyleft": True},
        "hisat2": {"license": "GPL-3.0", "commercial": True, "copyleft": True},
        "salmon": {"license": "GPL-3.0", "commercial": True, "copyleft": True},
        "kallisto": {"license": "BSD-2", "commercial": True, "copyleft": False},
        "deseq2": {"license": "LGPL-3.0", "commercial": True, "copyleft": True},
        "edger": {"license": "GPL-3.0", "commercial": True, "copyleft": True},
        "limma": {"license": "GPL-3.0", "commercial": True, "copyleft": True},
        "ggplot2": {"license": "MIT", "commercial": True, "copyleft": False},
        "pandas": {"license": "BSD-3", "commercial": True, "copyleft": False},
        "numpy": {"license": "BSD-3", "commercial": True, "copyleft": False},
        "scipy": {"license": "BSD-3", "commercial": True, "copyleft": False},
        "scikit-learn": {"license": "BSD-3", "commercial": True, "copyleft": False},
        "tensorflow": {"license": "Apache-2.0", "commercial": True, "copyleft": False},
        "pytorch": {"license": "BSD-3", "commercial": True, "copyleft": False},
    }

    def check(self, software_name):
        """Check license for software; returns None when the name is unknown."""
        name = software_name.lower().strip()
        return self.LICENSE_DB.get(name)

    def print_report(self, software_list):
        """Print license report."""
        print(f"\n{'='*70}")
        print(f"{'Software':<20} {'License':<15} {'Commercial':<12} {'Risk'}")
        print(f"{'='*70}")
        warnings = []
        for sw in software_list:
            info = self.check(sw)
            if info:
                status = "✅ Yes" if info["commercial"] else "❌ No"
                risk = "⚠️ Copyleft" if info["copyleft"] else "✓ Safe"
                print(f"{sw:<20} {info['license']:<15} {status:<12} {risk}")
                if info["copyleft"]:
                    warnings.append((sw, info["license"]))
            else:
                print(f"{sw:<20} {'Unknown':<15} {'?':<12} {'⚠️ Check manually'}")
        print(f"{'='*70}\n")
        if warnings:
            print("⚠️ WARNINGS - Copyleft licenses attach source-sharing obligations:")
            for sw, lic in warnings:
                print(f"  - {sw} ({lic}): distributed derivative works must stay under the same license")
            print()


def parse_requirements(path):
    """Extract bare package names from a requirements file."""
    packages = []
    with open(path) as f:
        for line in f:
            stripped = line.strip()
            # Skip blanks, comments, and pip options such as "-r other.txt"
            if not stripped or stripped.startswith(("#", "-")):
                continue
            # Cut at the first version specifier, extras bracket, marker, or URL separator
            pkg = re.split(r"[<>=!~;\[\s@]", stripped, maxsplit=1)[0]
            if pkg:
                packages.append(pkg)
    return packages


def main():
    parser = argparse.ArgumentParser(description="Open Source License Check")
    parser.add_argument("--software", "-s", help="Comma-separated software names")
    parser.add_argument("--check-requirements", "-r", help="Python requirements.txt file")
    args = parser.parse_args()

    checker = LicenseChecker()

    if args.software:
        software_list = [s.strip() for s in args.software.split(",") if s.strip()]
        checker.print_report(software_list)
    elif args.check_requirements:
        print(f"Checking {args.check_requirements}...")
        checker.print_report(parse_requirements(args.check_requirements))
    else:
        # Demo mode
        print("Demo mode - checking common bioinformatics tools:")
        checker.print_report(["samtools", "bwa", "bedtools", "star", "kallisto"])


if __name__ == "__main__":
    main()