@clawhub-yxr191202-3b79015c08
---
name: literature-review
description: Conduct comprehensive, systematic literature reviews using multiple academic databases (PubMed, arXiv, bioRxiv, Semantic Scholar, etc.). This skill should be used when conducting systematic literature reviews, meta-analyses, research synthesis, or comprehensive literature searches across biomedical, scientific, and technical domains. Creates professionally formatted markdown documents and PDFs with verified citations in multiple citation styles (APA, Nature, Vancouver, etc.).
allowed-tools: Read Write Edit Bash
license: MIT license
metadata:
  skill-author: K-Dense Inc.
---
# Literature Review
## Overview
Conduct systematic, comprehensive literature reviews following rigorous academic methodology. Search multiple literature databases, synthesize findings thematically, verify all citations for accuracy, and generate professional output documents in markdown and PDF formats.
This skill integrates with multiple scientific skills for database access (gget, bioservices, datacommons-client) and provides specialized tools for citation verification, result aggregation, and document generation.
## When to Use This Skill
Use this skill when:
- Conducting a systematic literature review for research or publication
- Synthesizing current knowledge on a specific topic across multiple sources
- Performing meta-analysis or scoping reviews
- Writing the literature review section of a research paper or thesis
- Investigating the state of the art in a research domain
- Identifying research gaps and future directions
- Requiring verified citations and professional formatting
## Visual Enhancement with Scientific Schematics
**⚠️ MANDATORY: Every literature review MUST include at least one AI-generated figure (ideally two or three) created with the scientific-schematics skill.**
This is not optional. Literature reviews without visual elements are incomplete. Before finalizing any document:
1. Generate at minimum ONE schematic or diagram (e.g., PRISMA flow diagram for systematic reviews)
2. Prefer 2-3 figures for comprehensive reviews (search strategy flowchart, thematic synthesis diagram, conceptual framework)
**How to generate figures:**
- Use the **scientific-schematics** skill to generate AI-powered publication-quality diagrams
- Describe the desired diagram in natural language
- Nano Banana Pro will automatically generate, review, and refine the schematic
- Run the generator from the command line:
```bash
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
```
The AI will automatically:
- Create publication-quality images with proper formatting
- Review and refine through multiple iterations
- Ensure accessibility (colorblind-friendly, high contrast)
- Save outputs in the figures/ directory
**When to add schematics:**
- PRISMA flow diagrams for systematic reviews
- Literature search strategy flowcharts
- Thematic synthesis diagrams
- Research gap visualization maps
- Citation network diagrams
- Conceptual framework illustrations
- Any complex concept that benefits from visualization
For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.
---
## Core Workflow
Literature reviews follow a structured, multi-phase workflow:
### Phase 1: Planning and Scoping
1. **Define Research Question**: Use PICO framework (Population, Intervention, Comparison, Outcome) for clinical/biomedical reviews
- Example: "What is the efficacy (O) of CRISPR-Cas9 (I) for treating sickle cell disease (P) compared to standard care (C)?"
2. **Establish Scope and Objectives**:
- Define clear, specific research questions
- Determine review type (narrative, systematic, scoping, meta-analysis)
- Set boundaries (time period, geographic scope, study types)
3. **Develop Search Strategy**:
- Identify 2-4 main concepts from research question
- List synonyms, abbreviations, and related terms for each concept
- Plan Boolean operators (AND, OR, NOT) to combine terms
- Select minimum 3 complementary databases
4. **Set Inclusion/Exclusion Criteria**:
- Date range (e.g., last 10 years: 2015-2024)
- Language (typically English, or specify multilingual)
- Publication types (peer-reviewed, preprints, reviews)
- Study designs (RCTs, observational, in vitro, etc.)
- Document all criteria clearly
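The search-strategy step above (synonyms joined with OR inside a concept, concepts joined with AND) can be sketched in a few lines of Python. This is a minimal illustration, not part of the bundled scripts; `build_boolean_query` and the `concepts` dictionary are hypothetical names, and database-specific field tags such as `[MeSH]` would still need to be added by hand.

```python
def build_boolean_query(concepts):
    """Combine synonym groups: OR within a concept, AND across concepts."""
    groups = []
    for synonyms in concepts.values():
        # Quote every term so multi-word phrases are searched as phrases.
        quoted = [f'"{term}"' for term in synonyms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical concept map derived from the PICO example above.
concepts = {
    "intervention": ["CRISPR", "Cas9", "gene editing"],
    "population": ["sickle cell disease", "SCD"],
}
query = build_boolean_query(concepts)
```

Keeping the concept map as data makes it easy to record the exact synonyms used, which helps when documenting the search for reproducibility.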
### Phase 2: Systematic Literature Search
1. **Multi-Database Search**:
Select databases appropriate for the domain:
**Biomedical & Life Sciences:**
- Use `gget` skill: `gget search pubmed "search terms"` for PubMed/PMC
- Use `gget` skill: `gget search biorxiv "search terms"` for preprints
- Use `bioservices` skill for ChEMBL, KEGG, UniProt, etc.
**General Scientific Literature:**
- Search arXiv via direct API (preprints in physics, math, CS, q-bio)
- Search Semantic Scholar via API (200M+ papers, cross-disciplinary)
- Use Google Scholar for comprehensive coverage (manual or careful scraping)
**Specialized Databases:**
- Use `gget alphafold` for protein structures
- Use `gget cosmic` for cancer genomics
- Use `datacommons-client` for demographic/statistical data
- Use specialized databases as appropriate for the domain
2. **Document Search Parameters**:
````markdown
## Search Strategy
### Database: PubMed
- **Date searched**: 2024-10-25
- **Date range**: 2015-01-01 to 2024-10-25
- **Search string**:
  ```
  ("CRISPR"[Title] OR "Cas9"[Title])
  AND ("sickle cell"[MeSH] OR "SCD"[Title/Abstract])
  AND 2015:2024[Publication Date]
  ```
- **Results**: 247 articles
````
Repeat for each database searched.
3. **Export and Aggregate Results**:
- Export results in JSON format from each database
- Combine all results into a single file
- Use `scripts/search_databases.py` for post-processing:
```bash
python scripts/search_databases.py combined_results.json \
--deduplicate \
--format markdown \
--output aggregated_results.md
```
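For orientation, the aggregation step can be pictured as merging per-database result lists and dropping duplicates by DOI, falling back to the title when no DOI is present. The sketch below is an assumption about how such a step might look, not the actual logic of `scripts/search_databases.py`; the record fields `doi` and `title` and both function names are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_and_deduplicate(results_by_source):
    """Merge records from several databases, keeping the first copy of each paper."""
    seen = set()
    unique = []
    for source, records in results_by_source.items():
        for record in records:
            doi = (record.get("doi") or "").lower().strip()
            # DOI is the primary key; the normalized title is the fallback.
            key = ("doi", doi) if doi else ("title", normalize_title(record.get("title", "")))
            if key in seen:
                continue
            seen.add(key)
            unique.append({**record, "source": source})
    return unique
```

One limitation of this simple keying: a record with a DOI and a copy of the same paper without one will not be matched, so a manual pass over title-only records is still worthwhile.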
### Phase 3: Screening and Selection
1. **Deduplication**:
```bash
python scripts/search_databases.py results.json --deduplicate --output unique_results.json
```
- Removes duplicates by DOI (primary) or title (fallback)
- Document number of duplicates removed
2. **Title Screening**:
- Review all titles against inclusion/exclusion criteria
- Exclude obviously irrelevant studies
- Document number excluded at this stage
3. **Abstract Screening**:
- Read abstracts of remaining studies
- Apply inclusion/exclusion criteria rigorously
- Document reasons for exclusion
4. **Full-Text Screening**:
- Obtain full texts of remaining studies
- Conduct detailed review against all criteria
- Document specific reasons for exclusion
- Record final number of included studies
5. **Create PRISMA Flow Diagram**:
```
Initial search: n = X
├─ After deduplication: n = Y
├─ After title screening: n = Z
├─ After abstract screening: n = A
└─ Included in review: n = B
```
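The counts in the flow diagram are easier to keep consistent if they are recorded as data and the per-stage exclusions are derived from them. A minimal sketch follows; `prisma_summary` is a hypothetical helper and the counts are illustrative placeholders, not results from any real search.

```python
def prisma_summary(stages):
    """Return (stage, remaining, removed_at_this_stage) rows for ordered stage counts."""
    rows = []
    previous = None
    for name, count in stages:
        if previous is not None and count > previous:
            raise ValueError(f"{name}: count {count} exceeds previous stage ({previous})")
        removed = 0 if previous is None else previous - count
        rows.append((name, count, removed))
        previous = count
    return rows


# Illustrative placeholder counts only.
stages = [
    ("Initial search", 400),
    ("After deduplication", 320),
    ("After title screening", 150),
    ("After abstract screening", 60),
    ("Included in review", 25),
]
for name, remaining, removed in prisma_summary(stages):
    print(f"{name}: n = {remaining} (removed {removed})")
```

The check that a count never grows between stages catches transcription errors before they reach the published diagram.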
### Phase 4: Data Extraction and Quality Assessment
1. **Extract Key Data** from each included study:
- Study metadata (authors, year, journal, DOI)
- Study design and methods
- Sample size and population characteristics
- Key findings and results
- Limitations noted by authors
- Funding sources and conflicts of interest
2. **Assess Study Quality**:
- **For RCTs**: Use Cochrane Risk of Bias tool
- **For observational studies**: Use Newcastle-Ottawa Scale
- **For systematic reviews**: Use AMSTAR 2
- Rate each study: High, Moderate, Low, or Very Low quality
- Consider excluding very low-quality studies
3. **Organize by Themes**:
- Identify 3-5 major themes across studies
- Group studies by theme (studies may appear in multiple themes)
- Note patterns, consensus, and controversies
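One way to keep extraction consistent is a fixed record per study plus a helper that groups studies by theme. The sketch below is an assumption about a convenient structure, not a format required by this skill; `StudyRecord`, its fields, and `group_by_theme` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StudyRecord:
    authors: str
    year: int
    doi: str
    design: str
    sample_size: int
    key_findings: str
    quality: str  # "High", "Moderate", "Low", or "Very Low"
    themes: list = field(default_factory=list)


def group_by_theme(studies):
    """Map each theme to its studies; a study may appear under several themes."""
    grouped = defaultdict(list)
    for study in studies:
        for theme in study.themes:
            grouped[theme].append(study)
    return dict(grouped)
```

Because a study can carry several themes, the same record may show up in more than one group, which matches the guidance that studies may appear in multiple themes.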
### Phase 5: Synthesis and Analysis
1. **Create Review Document** from template:
```bash
cp assets/review_template.md my_literature_review.md
```
2. **Write Thematic Synthesis** (NOT study-by-study summaries):
- Organize Results section by themes or research questions
- Synthesize findings across multiple studies within each theme
- Compare and contrast different approaches and results
- Identify consensus areas and points of controversy
- Highlight the strongest evidence
Example structure:
```markdown
#### 3.3.1 Theme: CRISPR Delivery Methods
Multiple delivery approaches have been investigated for therapeutic
gene editing. Viral vectors (AAV) were used in 15 studies^1-15^ and
showed high transduction efficiency (65-85%) but raised immunogenicity
concerns^3,7,12^. In contrast, lipid nanoparticles demonstrated lower
efficiency (40-60%) but improved safety profiles^16-23^.
```
3. **Critical Analysis**:
- Evaluate methodological strengths and limitations across studies
- Assess quality and consistency of evidence
- Identify knowledge gaps and methodological gaps
- Note areas requiring future research
4. **Write Discussion**:
- Interpret findings in broader context
- Discuss clinical, practical, or research implications
- Acknowledge limitations of the review itself
- Compare with previous reviews if applicable
- Propose specific future research directions
### Phase 6: Citation Verification
**CRITICAL**: All citations must be verified for accuracy before final submission.
1. **Verify All DOIs**:
```bash
python scripts/verify_citations.py my_literature_review.md
```
This script:
- Extracts all DOIs from the document
- Verifies each DOI resolves correctly
- Retrieves metadata from CrossRef
- Generates verification report
- Outputs properly formatted citations
2. **Review Verification Report**:
- Check for any failed DOIs
- Verify author names, titles, and publication details match
- Correct any errors in the original document
- Re-run verification until all citations pass
3. **Format Citations Consistently**:
- Choose one citation style and use throughout (see `references/citation_styles.md`)
- Common styles: APA, Nature, Vancouver, Chicago, IEEE
- Use verification script output to format citations correctly
- Ensure in-text citations match reference list format
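To make the verification step concrete, the sketch below extracts DOIs from markdown text and builds the Crossref REST lookup URL for each one. It is an illustration under stated assumptions, not the implementation of `verify_citations.py`: the regular expression is a common heuristic that can miss unusual DOIs, and `extract_dois` and `crossref_url` are hypothetical names. The `https://api.crossref.org/works/{doi}` endpoint is Crossref's public metadata API.

```python
import re
from urllib.parse import quote

# Heuristic DOI pattern: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>\]]+')


def extract_dois(markdown_text):
    """Return unique DOIs in order of first appearance."""
    found = []
    for match in DOI_PATTERN.findall(markdown_text):
        doi = match.rstrip(".,;)")  # drop trailing sentence punctuation
        if doi not in found:
            found.append(doi)
    return found


def crossref_url(doi):
    """Build the Crossref metadata lookup URL for a DOI."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")
```

Fetching each URL (for example with `requests.get`) and comparing the returned title, authors, and year against the reference list is the part that still needs network access and human review.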
### Phase 7: Document Generation
1. **Generate PDF**:
```bash
python scripts/generate_pdf.py my_literature_review.md \
--citation-style apa \
--output my_review.pdf
```
Options:
- `--citation-style`: apa, nature, chicago, vancouver, ieee
- `--no-toc`: Disable table of contents
- `--no-numbers`: Disable section numbering
- `--check-deps`: Check if pandoc/xelatex are installed
2. **Review Final Output**:
- Check PDF formatting and layout
- Verify all sections are present
- Ensure citations render correctly
- Check that figures/tables appear properly
- Verify table of contents is accurate
3. **Quality Checklist**:
- [ ] All DOIs verified with verify_citations.py
- [ ] Citations formatted consistently
- [ ] PRISMA flow diagram included (for systematic reviews)
- [ ] Search methodology fully documented
- [ ] Inclusion/exclusion criteria clearly stated
- [ ] Results organized thematically (not study-by-study)
- [ ] Quality assessment completed
- [ ] Limitations acknowledged
- [ ] References complete and accurate
- [ ] PDF generates without errors
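If PDF generation fails, it can help to reproduce the conversion by hand. The sketch below shows how a dependency check and a pandoc invocation might be assembled; it is an assumption about a reasonable equivalent, not the internals of `scripts/generate_pdf.py`. The pandoc options used (`-o`, `--pdf-engine`, `--toc`, `--number-sections`) are standard pandoc flags; `missing_tools` and `pandoc_command` are hypothetical names.

```python
import shutil


def missing_tools(tools=("pandoc", "xelatex")):
    """Return the tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def pandoc_command(source, output, toc=True, numbered=True):
    """Build a pandoc argument list; pass it to subprocess.run to execute."""
    command = ["pandoc", source, "-o", output, "--pdf-engine=xelatex"]
    if toc:
        command.append("--toc")
    if numbered:
        command.append("--number-sections")
    return command
```

Running `subprocess.run(pandoc_command("my_literature_review.md", "my_review.pdf"), check=True)` only after `missing_tools()` returns an empty list gives a clearer error than a failed LaTeX run.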
## Database-Specific Search Guidance
### PubMed / PubMed Central
Access via `gget` skill:
```bash
# Search PubMed
gget search pubmed "CRISPR gene editing" -l 100
# Search with filters
# Use PubMed Advanced Search Builder to construct complex queries
# Then execute via gget or direct Entrez API
```
**Search tips**:
- Use MeSH terms: `"sickle cell disease"[MeSH]`
- Field tags: `[Title]`, `[Title/Abstract]`, `[Author]`
- Date filters: `2020:2024[Publication Date]`
- Boolean operators: AND, OR, NOT
- See MeSH browser: https://meshb.nlm.nih.gov/search
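For the "direct Entrez API" route mentioned above, the NCBI E-utilities `esearch` endpoint accepts the same query syntax as the PubMed search box. The following is a minimal sketch that only builds the request URL and parses a JSON response; `esearch_url` and `parse_esearch` are hypothetical helper names, and production use should also follow NCBI's rate-limit guidance.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=100):
    """Build an E-utilities esearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def parse_esearch(payload):
    """Return (total_hits, list_of_pmids) from an esearch JSON response."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


url = esearch_url('"sickle cell disease"[MeSH] AND 2020:2024[Publication Date]')
```

Saving the exact `term` string alongside the returned count gives the search string and result number that the search log requires.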
### bioRxiv / medRxiv
Access via `gget` skill:
```bash
gget search biorxiv "CRISPR sickle cell" -l 50
```
**Important considerations**:
- Preprints are not peer-reviewed
- Interpret findings with caution
- Check if preprint has been published (CrossRef)
- Note preprint version and date
### arXiv
Access via direct API or WebFetch:
```python
# Example search categories:
# q-bio.QM (Quantitative Methods)
# q-bio.GN (Genomics)
# q-bio.MN (Molecular Networks)
# cs.LG (Machine Learning)
# stat.ML (Machine Learning, in Statistics)
# Search format: category AND terms
search_query = "cat:q-bio.QM AND ti:\"single cell sequencing\""
```
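A `search_query` string like the one above can be sent to the public arXiv API, which returns an Atom XML feed. The sketch below builds the request URL and parses a feed using only the standard library; `arxiv_url` and `parse_feed` are hypothetical helper names, and only a few entry fields are extracted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_url(search_query, start=0, max_results=25):
    """Build an arXiv API query URL."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return ARXIV_API + "?" + urlencode(params)


def parse_feed(xml_text):
    """Extract id, title, and publication date from an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM).strip(),
            "title": " ".join(title.split()),  # titles may wrap across lines
            "published": entry.findtext("atom:published", default="", namespaces=ATOM).strip(),
        })
    return entries
```

The response body can be fetched with `urllib.request.urlopen(arxiv_url(search_query))` and passed to `parse_feed` after decoding.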
### Semantic Scholar
Access via the Semantic Scholar API (an API key raises rate limits; unauthenticated requests are accepted at a lower, shared rate):
- 200M+ papers across all fields
- Excellent for cross-disciplinary searches
- Provides citation graphs and paper recommendations
- Use for finding highly influential papers
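As a rough illustration of finding influential papers, the sketch below builds a request for the Semantic Scholar Graph API paper search endpoint and ranks the returned records by `citationCount`. The helper names are hypothetical, and the set of requested fields is one reasonable choice rather than a requirement.

```python
import json
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"


def s2_search_url(query, limit=50):
    """Build a paper-search URL requesting citation counts and external IDs."""
    params = {"query": query, "limit": limit, "fields": "title,year,citationCount,externalIds"}
    return S2_SEARCH + "?" + urlencode(params)


def rank_by_citations(payload):
    """Sort the papers in a search response by citation count, highest first."""
    papers = json.loads(payload).get("data", [])
    return sorted(papers, key=lambda paper: paper.get("citationCount") or 0, reverse=True)
```

When an API key is available, it is sent in the `x-api-key` request header.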
### Specialized Biomedical Databases
Use appropriate skills:
- **ChEMBL**: `bioservices` skill for chemical bioactivity
- **UniProt**: `gget` or `bioservices` skill for protein information
- **KEGG**: `bioservices` skill for pathways and genes
- **COSMIC**: `gget` skill for cancer mutations
- **AlphaFold**: `gget alphafold` for protein structures
- **PDB**: `gget` or direct API for experimental structures
### Citation Chaining
Expand search via citation networks:
1. **Forward citations** (papers citing key papers):
- Use Google Scholar "Cited by"
- Use Semantic Scholar or OpenAlex APIs
- Identifies newer research building on seminal work
2. **Backward citations** (references from key papers):
- Extract references from included papers
- Identify highly cited foundational work
- Find papers cited by multiple included studies
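Backward chaining reduces to counting how many included papers cite each reference. A minimal sketch, assuming the reference lists have already been extracted as DOIs (the input mapping and `shared_references` are hypothetical):

```python
from collections import Counter


def shared_references(reference_lists, min_citing=2):
    """Return (reference, n_citing_papers) pairs cited by at least min_citing papers."""
    counts = Counter()
    for refs in reference_lists.values():
        counts.update(set(refs))  # count each reference once per citing paper
    return [(ref, n) for ref, n in counts.most_common() if n >= min_citing]
```

References that clear the threshold but are missing from the included set are good candidates for a targeted follow-up search.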
## Citation Style Guide
Detailed formatting guidelines are in `references/citation_styles.md`. Quick reference:
### APA (7th Edition)
- In-text: (Smith et al., 2023)
- Reference: Smith, J. D., Johnson, M. L., & Williams, K. R. (2023). Title. *Journal*, *22*(4), 301-318. https://doi.org/10.xxx/yyy
### Nature
- In-text: Superscript numbers^1,2^
- Reference: Smith, J. D., Johnson, M. L. & Williams, K. R. Title. *Nat. Rev. Drug Discov.* **22**, 301-318 (2023).
### Vancouver
- In-text: Superscript numbers^1,2^
- Reference: Smith JD, Johnson ML, Williams KR. Title. Nat Rev Drug Discov. 2023;22(4):301-18.
**Always verify citations** with verify_citations.py before finalizing.
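The three quick-reference patterns above can be expressed as a small formatter, which is handy for spot-checking a reference list by eye. This is a simplified sketch that covers only journal articles and reproduces only the example layouts shown above; `format_reference`, `abbreviate_pages`, and the `ref` dictionary layout are hypothetical, and `references/citation_styles.md` remains the authority on edge cases.

```python
def abbreviate_pages(pages):
    """Vancouver-style page range: drop repeated leading digits (301-318 -> 301-18)."""
    start, _, end = pages.partition("-")
    if end and len(start) == len(end):
        i = 0
        while i < len(end) - 1 and start[i] == end[i]:
            i += 1
        end = end[i:]
    return f"{start}-{end}" if end else start


def format_reference(ref, style):
    """Format a journal article; ref['authors'] is a list like [("Smith", "JD")]."""
    authors = ref["authors"]
    if style == "vancouver":
        names = ", ".join(f"{family} {initials}" for family, initials in authors)
        journal = ref["journal_abbrev"].replace(".", "")
        return (f"{names}. {ref['title']}. {journal}. "
                f"{ref['year']};{ref['volume']}({ref['issue']}):{abbreviate_pages(ref['pages'])}.")
    dotted = [f"{family}, {' '.join(c + '.' for c in initials)}" for family, initials in authors]
    if style == "apa":
        names = dotted[0] if len(dotted) == 1 else ", ".join(dotted[:-1]) + ", & " + dotted[-1]
        return (f"{names} ({ref['year']}). {ref['title']}. *{ref['journal']}*, "
                f"*{ref['volume']}*({ref['issue']}), {ref['pages']}. https://doi.org/{ref['doi']}")
    if style == "nature":
        names = dotted[0] if len(dotted) == 1 else ", ".join(dotted[:-1]) + " & " + dotted[-1]
        return (f"{names} {ref['title']}. *{ref['journal_abbrev']}* "
                f"**{ref['volume']}**, {ref['pages']} ({ref['year']}).")
    raise ValueError(f"unsupported style: {style}")
```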
### Prioritizing High-Impact Papers (CRITICAL)
**Always prioritize influential, highly-cited papers from reputable authors and top venues.** Quality matters more than quantity in literature reviews.
#### Citation Count Thresholds
Use citation counts to identify the most impactful papers:
| Paper Age | Citation Threshold | Classification |
|-----------|-------------------|----------------|
| 0-3 years | 20+ citations | Noteworthy |
| 0-3 years | 100+ citations | Highly Influential |
| 3-7 years | 100+ citations | Significant |
| 3-7 years | 500+ citations | Landmark Paper |
| 7+ years | 500+ citations | Seminal Work |
| 7+ years | 1000+ citations | Foundational |
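The table translates directly into a lookup. The sketch below is one possible encoding; because the age bands in the table overlap at 3 and 7 years, it assumes a paper exactly 3 years old falls in the 3-7 band and one exactly 7 years old falls in the 7+ band. `classify_paper` is a hypothetical helper.

```python
# (upper age bound in years, [(minimum citations, label), ...] highest tier first)
THRESHOLDS = [
    (3, [(100, "Highly Influential"), (20, "Noteworthy")]),
    (7, [(500, "Landmark Paper"), (100, "Significant")]),
    (float("inf"), [(1000, "Foundational"), (500, "Seminal Work")]),
]


def classify_paper(age_years, citations):
    """Return the table's classification, or None if no threshold is met."""
    for max_age, tiers in THRESHOLDS:
        if age_years < max_age:  # assumption: band boundaries belong to the older band
            for minimum, label in tiers:
                if citations >= minimum:
                    return label
            return None
    return None
```

A `None` result means only that the paper does not meet a citation threshold for its age; it can still be included on relevance grounds.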
#### Journal and Venue Tiers
Prioritize papers from higher-tier venues:
- **Tier 1 (Always Prefer):** Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS, Nature Medicine, Nature Biotechnology
- **Tier 2 (Strong Preference):** High-impact specialized journals (IF>10), top conferences (NeurIPS, ICML for ML/AI)
- **Tier 3 (Include When Relevant):** Respected specialized journals (IF 5-10)
- **Tier 4 (Use Sparingly):** Lower-impact peer-reviewed venues
#### Author Reputation Assessment
Prefer papers from:
- **Senior researchers** with high h-index (>40 in established fields)
- **Leading research groups** at recognized institutions (Harvard, Stanford, MIT, Oxford, etc.)
- **Authors with multiple Tier-1 publications** in the relevant field
- **Researchers with recognized expertise** (awards, editorial positions, society fellows)
#### Identifying Seminal Papers
For any topic, identify foundational work by:
1. **High citation count** (typically 500+ for papers 5+ years old)
2. **Frequently cited by other included studies** (appears in many reference lists)
3. **Published in Tier-1 venues** (Nature, Science, Cell family)
4. **Written by field pioneers** (often cited as establishing concepts)
## Best Practices
### Search Strategy
1. **Use multiple databases** (minimum 3): Ensures comprehensive coverage
2. **Include preprint servers**: Captures latest unpublished findings
3. **Document everything**: Search strings, dates, result counts for reproducibility
4. **Test and refine**: Run pilot searches, review results, adjust search terms
5. **Sort by citations**: When available, sort search results by citation count to surface influential work first
### Screening and Selection
1. **Use clear criteria**: Document inclusion/exclusion criteria before screening
2. **Screen systematically**: Title → Abstract → Full text
3. **Document exclusions**: Record reasons for excluding studies
4. **Consider dual screening**: For systematic reviews, have two reviewers screen independently
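When two reviewers screen independently, their agreement is commonly summarized with Cohen's kappa, $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. A minimal sketch, assuming each reviewer's decisions are stored as equal-length lists of labels (`cohens_kappa` is a hypothetical helper):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportion per label.
    expected = sum((rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this toy example the reviewers agree on three of four records ($p_o = 0.75$) and chance agreement is $p_e = 0.5$, so $\kappa = 0.5$.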
### Synthesis
1. **Organize thematically**: Group by themes, NOT by individual studies
2. **Synthesize across studies**: Compare, contrast, identify patterns
3. **Be critical**: Evaluate quality and consistency of evidence
4. **Identify gaps**: Note what's missing or understudied
### Quality and Reproducibility
1. **Assess study quality**: Use appropriate quality assessment tools
2. **Verify all citations**: Run verify_citations.py script
3. **Document methodology**: Provide enough detail for others to reproduce
4. **Follow guidelines**: Use PRISMA for systematic reviews
### Writing
1. **Be objective**: Present evidence fairly, acknowledge limitations
2. **Be systematic**: Follow structured template
3. **Be specific**: Include numbers, statistics, effect sizes where available
4. **Be clear**: Use clear headings, logical flow, thematic organization
## Common Pitfalls to Avoid
1. **Single database search**: Misses relevant papers; always search multiple databases
2. **No search documentation**: Makes review irreproducible; document all searches
3. **Study-by-study summary**: Lacks synthesis; organize thematically instead
4. **Unverified citations**: Leads to errors; always run verify_citations.py
5. **Too broad search**: Yields thousands of irrelevant results; refine with specific terms
6. **Too narrow search**: Misses relevant papers; include synonyms and related terms
7. **Ignoring preprints**: Misses latest findings; include bioRxiv, medRxiv, arXiv
8. **No quality assessment**: Treats all evidence equally; assess and report quality
9. **Ignoring publication bias**: Positive results are more likely to be published; note the potential bias
10. **Outdated search**: Field evolves rapidly; clearly state search date
## Example Workflow
Complete workflow for a biomedical literature review:
```bash
# 1. Create review document from template
cp assets/review_template.md crispr_sickle_cell_review.md
# 2. Search multiple databases using appropriate skills
# - Use gget skill for PubMed, bioRxiv
# - Use direct API access for arXiv, Semantic Scholar
# - Export results in JSON format
# 3. Aggregate and process results
python scripts/search_databases.py combined_results.json \
--deduplicate \
--rank citations \
--year-start 2015 \
--year-end 2024 \
--format markdown \
--output search_results.md \
--summary
# 4. Screen results and extract data
# - Manually screen titles, abstracts, full texts
# - Extract key data into the review document
# - Organize by themes
# 5. Write the review following template structure
# - Introduction with clear objectives
# - Detailed methodology section
# - Results organized thematically
# - Critical discussion
# - Clear conclusions
# 6. Verify all citations
python scripts/verify_citations.py crispr_sickle_cell_review.md
# Review the citation report
cat crispr_sickle_cell_review_citation_report.json
# Fix any failed citations and re-verify
python scripts/verify_citations.py crispr_sickle_cell_review.md
# 7. Generate professional PDF
python scripts/generate_pdf.py crispr_sickle_cell_review.md \
--citation-style nature \
--output crispr_sickle_cell_review.pdf
# 8. Review final PDF and markdown outputs
```
## Integration with Other Skills
This skill works seamlessly with other scientific skills:
### Database Access Skills
- **gget**: PubMed, bioRxiv, COSMIC, AlphaFold, Ensembl, UniProt
- **bioservices**: ChEMBL, KEGG, Reactome, UniProt, PubChem
- **datacommons-client**: Demographics, economics, health statistics
### Analysis Skills
- **pydeseq2**: RNA-seq differential expression (for methods sections)
- **scanpy**: Single-cell analysis (for methods sections)
- **anndata**: Single-cell data (for methods sections)
- **biopython**: Sequence analysis (for background sections)
### Visualization Skills
- **matplotlib**: Generate figures and plots for review
- **seaborn**: Statistical visualizations
### Writing Skills
- **brand-guidelines**: Apply institutional branding to PDF
- **internal-comms**: Adapt review for different audiences
## Resources
### Bundled Resources
**Scripts:**
- `scripts/verify_citations.py`: Verify DOIs and generate formatted citations
- `scripts/generate_pdf.py`: Convert markdown to professional PDF
- `scripts/search_databases.py`: Process, deduplicate, and format search results
**References:**
- `references/citation_styles.md`: Detailed citation formatting guide (APA, Nature, Vancouver, Chicago, IEEE)
- `references/database_strategies.md`: Comprehensive database search strategies
**Assets:**
- `assets/review_template.md`: Complete literature review template with all sections
### External Resources
**Guidelines:**
- PRISMA (Systematic Reviews): http://www.prisma-statement.org/
- Cochrane Handbook: https://training.cochrane.org/handbook
- AMSTAR 2 (Review Quality): https://amstar.ca/
**Tools:**
- MeSH Browser: https://meshb.nlm.nih.gov/search
- PubMed Advanced Search: https://pubmed.ncbi.nlm.nih.gov/advanced/
- Boolean Search Guide: https://www.ncbi.nlm.nih.gov/books/NBK3827/
**Citation Styles:**
- APA Style: https://apastyle.apa.org/
- Nature Portfolio: https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards
- NLM/Vancouver: https://www.nlm.nih.gov/bsd/uniform_requirements.html
## Dependencies
### Required Python Packages
```bash
pip install requests # For citation verification
```
### Required System Tools
```bash
# For PDF generation
brew install pandoc # macOS
apt-get install pandoc # Linux
# For LaTeX (PDF generation)
brew install --cask mactex # macOS
apt-get install texlive-xetex # Linux
```
Check dependencies:
```bash
python scripts/generate_pdf.py --check-deps
```
## Summary
This literature-review skill provides:
1. **Systematic methodology** following academic best practices
2. **Multi-database integration** via existing scientific skills
3. **Citation verification** ensuring accuracy and credibility
4. **Professional output** in markdown and PDF formats
5. **Comprehensive guidance** covering the entire review process
6. **Quality assurance** with verification and validation tools
7. **Reproducibility** through detailed documentation requirements
Conduct thorough, rigorous literature reviews that meet academic standards and provide comprehensive synthesis of current knowledge in any domain.
FILE:assets/review_template.md
# [Literature Review Title]
**Authors**: [Author Names and Affiliations]
**Date**: [Date]
**Review Type**: [Narrative / Systematic / Scoping / Meta-Analysis / Umbrella Review]
**Review Protocol**: [PROSPERO ID if registered, or state "Not registered"]
**PRISMA Compliance**: [Yes/No/Partial - specify which guidelines]
---
## Abstract
**Background**: [Context and rationale]
**Objectives**: [Primary and secondary objectives]
**Methods**: [Databases, dates, selection criteria, quality assessment]
**Results**: [n studies included; key findings by theme]
**Conclusions**: [Main conclusions and implications]
**Registration**: [PROSPERO ID or "Not registered"]
**Keywords**: [5-8 keywords]
---
## 1. Introduction
### 1.1 Background and Context
[Provide background information on the topic. Establish why this literature review is important and timely. Discuss the broader context and current state of knowledge.]
### 1.2 Scope and Objectives
[Clearly define the scope of the review and state the specific objectives. What questions will this review address?]
**Primary Research Questions:**
1. [Research question 1]
2. [Research question 2]
3. [Research question 3]
### 1.3 Significance
[Explain the significance of this review. Why is it important to synthesize this literature now? What gaps does it fill?]
---
## 2. Methodology
### 2.1 Protocol and Registration
**Protocol**: [PROSPERO ID / OSF link / Not registered]
**Deviations**: [Document any protocol deviations]
**PRISMA**: [Checklist in Appendix B]
### 2.2 Search Strategy
**Databases:** [PubMed, Scopus, Web of Science, bioRxiv, etc.]
**Supplementary:** [Citation chaining, grey literature, trial registries]
**Search String Example:**
```
("CRISPR"[Title/Abstract] OR "Cas9"[Title/Abstract]) AND
("disease"[MeSH Terms]) AND ("2015/01/01"[Date] : "2024/12/31"[Date])
```
**Dates:** [YYYY-MM-DD to YYYY-MM-DD] | **Executed:** [Date]
**Validation:** [Key papers used to test search strategy]
### 2.3 Tools and Software
**Screening:** [Rayyan, Covidence, ASReview]
**Analysis:** [VOSviewer, R, Python]
**Citation Management:** [Zotero, Mendeley, EndNote]
**AI Tools:** [Any AI-assisted tools used; document validation approach]
### 2.4 Inclusion and Exclusion Criteria
**Inclusion Criteria:**
- [Criterion 1: e.g., Published between 2015-2024]
- [Criterion 2: e.g., Peer-reviewed articles and preprints]
- [Criterion 3: e.g., English language]
- [Criterion 4: e.g., Human or animal studies]
- [Criterion 5: e.g., Original research or systematic reviews]
**Exclusion Criteria:**
- [Criterion 1: e.g., Case reports with n<5]
- [Criterion 2: e.g., Conference abstracts without full text]
- [Criterion 3: e.g., Editorials and commentaries]
- [Criterion 4: e.g., Duplicate publications]
- [Criterion 5: e.g., Retracted articles]
- [Criterion 6: e.g., Studies with unavailable full text after author contact]
### 2.5 Study Selection
**Reviewers:** [n independent reviewers] | **Conflict resolution:** [Method]
**Inter-rater reliability:** [Cohen's kappa = X]
**PRISMA Flow:**
```
Records identified: n=[X] → Deduplicated: n=[Y] →
Title/abstract screened: n=[Y] → Full-text assessed: n=[Z] → Included: n=[N]
```
**Exclusion reasons:** [List with counts]
### 2.6 Data Extraction
**Method:** [Standardized form (Appendix E); pilot-tested on n studies]
**Extractors:** [n independent] | **Verification:** [Double-checked]
**Items:** Study ID, design, population, interventions/exposures, outcomes, statistics, funding, COI, bias domains
**Missing data:** [Author contact protocol]
### 2.7 Quality Assessment
**Tool:** [Cochrane RoB 2.0 / ROBINS-I / Newcastle-Ottawa / AMSTAR 2 / JBI]
**Method:** [2 independent reviewers; third for conflicts]
**Rating:** [Low/Moderate/High risk of bias]
**Publication bias:** [Funnel plots, Egger's test - if meta-analysis]
### 2.8 Synthesis and Analysis
**Approach:** [Narrative / Meta-analysis / Both]
**Statistics** (if meta-analysis): Effect measures, heterogeneity (I², τ²), sensitivity analyses, subgroups
**Software:** [RevMan, R, Stata]
**Certainty:** [GRADE framework; factors: bias, inconsistency, indirectness, imprecision]
---
## 3. Results
### 3.1 Study Selection
**Summary:** [X records → Y deduplicated → Z full-text → N included (M in meta-analysis)]
**Study types:** [RCTs: n=X, Observational: n=Y, Reviews: n=Z]
**Years:** [Range; peak year]
**Geography:** [Countries represented]
**Source:** [Peer-reviewed: n=X, Preprints: n=Y]
### 3.2 Bibliometric Overview
[Optional: Trends, journal distribution, author networks, citations, keywords - if analyzed with VOSviewer or similar]
### 3.3 Study Characteristics
| Study | Year | Design | Sample Size | Key Methods | Main Findings | Quality |
|-------|------|--------|-------------|-------------|---------------|---------|
| First Author et al. | 2023 | [Type] | n=[X] | [Methods] | [Brief findings] | [Low/Mod/High RoB] |
**Quality:** Low RoB: n=X ([%]); Moderate: n=Y ([%]); High: n=Z ([%])
### 3.4 Thematic Synthesis
[Organize by themes, NOT study-by-study. Synthesize across studies to identify consensus, controversies, and gaps.]
#### 3.4.1 Theme 1: [Title]
**Findings:** [Synthesis of key findings from multiple studies]
**Supporting studies:** [X, Y, Z]
**Contradictory evidence:** [If any]
**Certainty:** [GRADE rating if applicable]
### 3.5 Methodological Approaches
**Common methods:** [Method 1 (n studies), Method 2 (n studies)]
**Emerging techniques:** [New approaches observed]
**Methodological quality:** [Overall assessment]
### 3.6 Meta-Analysis Results
[Include only if conducting meta-analysis]
**Effect estimates:** [Primary/secondary outcomes with 95% CI, p-values]
**Heterogeneity:** [I²=X%, τ²=Y, interpretation]
**Subgroups & sensitivity:** [Key findings from analyses]
**Publication bias:** [Funnel plot, Egger's p=X]
**Forest plots:** [Include for primary outcomes]
### 3.7 Knowledge Gaps
**Knowledge:** [Unanswered research questions]
**Methodological:** [Study design/measurement issues]
**Translational:** [Research-to-practice gaps]
**Populations:** [Underrepresented groups/contexts]
---
## 4. Discussion
### 4.1 Main Findings
[Synthesize key findings by research question]
**Principal findings:** [Top 3-5 takeaways]
**Consensus:** [Where studies agree]
**Controversy:** [Conflicting results]
### 4.2 Interpretation and Implications
**Context:** [How findings advance/challenge current understanding]
**Mechanisms:** [Potential explanations for observed patterns]
**Implications for:**
- **Practice:** [Actionable recommendations]
- **Policy:** [If relevant]
- **Research:** [Theoretical, methodological, priority directions]
### 4.3 Strengths and Limitations
**Strengths:** [Comprehensive search, rigorous methods, large evidence base, transparency]
**Limitations:**
- Search/selection: [Language bias, database coverage, grey literature, publication bias]
- Methodological: [Heterogeneity, study quality]
- Temporal: [Rapid evolution, search cutoff date]
**Impact:** [How limitations affect conclusions]
### 4.4 Comparison with Previous Reviews
[If relevant: How does this review update/differ from prior reviews?]
### 4.5 Future Research
**Priority questions:**
1. [Question] - Rationale, suggested approach, expected impact
2. [Question] - Rationale, suggested approach, expected impact
3. [Question] - Rationale, suggested approach, expected impact
**Recommendations:** [Methodological improvements, understudied populations, emerging technologies]
---
## 5. Conclusions
[Concise conclusions addressing research questions]
1. [Conclusion directly addressing primary research question]
2. [Key finding conclusion]
3. [Gap/future direction conclusion]
**Evidence certainty:** [High/Moderate/Low/Very Low]
**Translation readiness:** [Ready / Needs more research / Preliminary]
---
## 6. Declarations
### Author Contributions
[CRediT taxonomy: Author 1 - Conceptualization, Methodology, Writing; Author 2 - Analysis, Review; etc.]
### Funding
[Grant details with numbers] OR [No funding received]
### Conflicts of Interest
[Author-specific declarations] OR [None]
### Data Availability
**Protocol:** [PROSPERO/OSF ID or "Not registered"]
**Data/Code:** [Repository URL/DOI or "Available upon request"]
**Materials:** [Search strategies (Appendix A), PRISMA checklist (Appendix B), extraction form (Appendix E)]
### Acknowledgments
[Contributors not meeting authorship criteria, librarians, patient involvement]
---
## 7. References
[Use consistent style: APA / Nature / Vancouver]
**Format examples:**
APA: Author, A. A., & Author, B. B. (Year). Title. *Journal*, *volume*(issue), pages. https://doi.org/xx.xxxx
Nature: Author, A. A. & Author, B. B. Title. *J. Name* **volume**, pages (year).
Vancouver: Author AA, Author BB. Title. J Abbrev. Year;volume(issue):pages. doi:xx.xxxx
1. [First reference]
2. [Second reference]
3. [Continue...]
---
## 8. Appendices
### Appendix A: Search Strings
**PubMed** (Date: YYYY-MM-DD; Results: n)
```
[Complete search string with operators and MeSH terms]
```
[Repeat for each database: Scopus, Web of Science, bioRxiv, etc.]
### Appendix B: PRISMA Checklist
| Section | Item | Reported? | Page |
|---------|------|-----------|------|
| Title | Identify as systematic review | Yes/No | # |
| Abstract | Structured summary | Yes/No | # |
| Methods | Eligibility, sources, search, selection, data, quality | Yes/No | # |
| Results | Selection, characteristics, risk of bias, syntheses | Yes/No | # |
| Discussion | Interpretation, limitations, conclusions | Yes/No | # |
| Other | Registration, support, conflicts, availability | Yes/No | # |
### Appendix C: Excluded Studies
| Study | Year | Reason | Category |
|-------|------|--------|----------|
| Author et al. | Year | [Reason] | [Wrong population/outcome/design/etc.] |
**Summary:** Wrong population (n=X), Wrong outcome (n=Y), etc.
### Appendix D: Quality Assessment
**Tool:** [Cochrane RoB 2.0 / ROBINS-I / Newcastle-Ottawa / etc.]
| Study | Domain 1 | Domain 2 | Domain 3 | Overall |
|-------|----------|----------|----------|---------|
| Study 1 | Low | Low | Some concerns | Low |
| Study 2 | [Score] | [Score] | [Score] | [Overall] |
### Appendix E: Data Extraction Form
```
STUDY: Author______ Year______ DOI______
DESIGN: □RCT □Cohort □Case-Control □Cross-sectional □Other______
POPULATION: n=_____ Age_____ Setting_____
INTERVENTION/EXPOSURE: _____
OUTCOMES: Primary_____ Secondary_____
RESULTS: Effect size_____ 95%CI_____ p=_____
QUALITY: □Low □Moderate □High RoB
FUNDING/COI: _____
```
### Appendix F: Meta-Analysis Details
[Only if meta-analysis performed]
**Software:** [R 4.x.x with meta/metafor packages / RevMan / Stata]
**Model:** [Random-effects; justification]
**Code:** [Link to repository]
**Sensitivity analyses:** [Details]
### Appendix G: Author Contacts
| Study | Contact Date | Response | Data Received |
|-------|--------------|----------|---------------|
| Author et al. | YYYY-MM-DD | Yes/No | Yes/No/Partial |
---
## 9. Supplementary Materials
[If applicable]
**Tables:** S1 (Full study characteristics), S2 (Quality scores), S3 (Subgroups), S4 (Sensitivity)
**Figures:** S1 (PRISMA diagram), S2 (Risk of bias), S3 (Funnel plot), S4 (Forest plots), S5 (Networks)
**Data:** S1 (Extraction file), S2 (Search results), S3 (Analysis code), S4 (PRISMA checklist)
**Repository:** [OSF/GitHub/Zenodo URL with DOI]
---
## Review Metadata
**Registration:** [Registry] ID: [Number] (Date: YYYY-MM-DD)
**Search dates:** Initial: [Date]; Updated: [Date]
**Version:** [1.0] | **Last updated:** [Date]
**Quality checks:**
- [ ] Citations verified with verify_citations.py
- [ ] PRISMA checklist completed
- [ ] Search reproducible
- [ ] Independent data verification
- [ ] Code peer-reviewed
- [ ] All authors approved
---
## Usage Notes
**Review type adaptations:**
- Systematic Review: Use all sections
- Meta-Analysis: Include Section 3.6 and Appendix F
- Narrative Review: May omit some methodology detail
- Scoping Review: Follow PRISMA-ScR, may omit quality assessment
**Key principles:**
1. Remove all [bracketed placeholders]
2. Follow PRISMA 2020 guidelines
3. Pre-register when feasible (PROSPERO/OSF)
4. Use thematic synthesis, not study-by-study
5. Be transparent and reproducible
6. Verify all DOIs before submission
7. Make data/code openly available
**Common pitfalls to avoid:**
- Don't list studies - synthesize them
- Don't cherry-pick results
- Don't ignore limitations
- Don't overstate conclusions
- Don't skip publication bias assessment
**Resources:**
- PRISMA 2020: http://prisma-statement.org/
- PROSPERO: https://www.crd.york.ac.uk/prospero/
- Cochrane Handbook: https://training.cochrane.org/handbook
- GRADE: https://www.gradeworkinggroup.org/
**DELETE THIS SECTION FROM YOUR FINAL REVIEW**
---
FILE:references/citation_styles.md
# Citation Styles Reference
This document provides detailed guidelines for formatting citations in various academic styles commonly used in literature reviews.
## APA Style (7th Edition)
### Journal Articles
**Format**: Author, A. A., Author, B. B., & Author, C. C. (Year). Title of article. *Title of Periodical*, *volume*(issue), page range. https://doi.org/xx.xxx/yyyy
**Example**: Smith, J. D., Johnson, M. L., & Williams, K. R. (2023). Machine learning approaches in drug discovery. *Nature Reviews Drug Discovery*, *22*(4), 301-318. https://doi.org/10.1038/nrd.2023.001
### Books
**Format**: Author, A. A. (Year). *Title of work: Capital letter also for subtitle*. Publisher Name. https://doi.org/xxxx
**Example**: Kumar, V., Abbas, A. K., & Aster, J. C. (2021). *Robbins and Cotran pathologic basis of disease* (10th ed.). Elsevier.
### Book Chapters
**Format**: Author, A. A., & Author, B. B. (Year). Title of chapter. In E. E. Editor & F. F. Editor (Eds.), *Title of book* (pp. xx-xx). Publisher.
**Example**: Brown, P. O., & Botstein, D. (2020). Exploring the new world of the genome with DNA microarrays. In M. B. Eisen & P. O. Brown (Eds.), *DNA microarrays: A molecular cloning manual* (pp. 1-45). Cold Spring Harbor Laboratory Press.
### Preprints
**Format**: Author, A. A., & Author, B. B. (Year). Title of preprint. *Repository Name*. https://doi.org/xxxx
**Example**: Zhang, Y., Chen, L., & Wang, H. (2024). Novel therapeutic targets in Alzheimer's disease. *bioRxiv*. https://doi.org/10.1101/2024.01.001
### Conference Papers
**Format**: Author, A. A. (Year, Month day-day). Title of paper. In E. E. Editor (Ed.), *Title of conference proceedings* (pp. xx-xx). Publisher. https://doi.org/xxxx
---
## Nature Style
### Journal Articles
**Format**: Author, A. A., Author, B. B. & Author, C. C. Title of article. *J. Name* **volume**, page range (year).
**Example**: Smith, J. D., Johnson, M. L. & Williams, K. R. Machine learning approaches in drug discovery. *Nat. Rev. Drug Discov.* **22**, 301-318 (2023).
### Books
**Format**: Author, A. A. & Author, B. B. *Book Title* (Publisher, Year).
**Example**: Kumar, V., Abbas, A. K. & Aster, J. C. *Robbins and Cotran Pathologic Basis of Disease* 10th edn (Elsevier, 2021).
### Multiple Authors
- 1-5 authors: List all
- 6+ authors: List first author followed by "et al."
**Example**: Zhang, Y. et al. Novel therapeutic targets in Alzheimer's disease. *bioRxiv* https://doi.org/10.1101/2024.01.001 (2024).
---
## Chicago Style (Author-Date)
### Journal Articles
**Format**: Author, First Name Middle Initial. Year. "Article Title." *Journal Title* volume, no. issue (Month): page range. https://doi.org/xxxx.
**Example**: Smith, John D., Mary L. Johnson, and Karen R. Williams. 2023. "Machine Learning Approaches in Drug Discovery." *Nature Reviews Drug Discovery* 22, no. 4 (April): 301-318. https://doi.org/10.1038/nrd.2023.001.
### Books
**Format**: Author, First Name Middle Initial. Year. *Book Title: Subtitle*. Edition. Place: Publisher.
**Example**: Kumar, Vinay, Abul K. Abbas, and Jon C. Aster. 2021. *Robbins and Cotran Pathologic Basis of Disease*. 10th ed. Philadelphia: Elsevier.
---
## Vancouver Style (Numbered)
### Journal Articles
**Format**: Author AA, Author BB, Author CC. Title of article. Abbreviated Journal Name. Year;volume(issue):page range.
**Example**: Smith JD, Johnson ML, Williams KR. Machine learning approaches in drug discovery. Nat Rev Drug Discov. 2023;22(4):301-18.
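
The journal-article pattern above can be sketched as a small formatter. This is a minimal illustration, not part of the skill's scripts: `format_vancouver` and its parameter names are hypothetical, and it assumes authors arrive as (surname, initials) pairs and that the page range is already abbreviated.

```python
def format_vancouver(authors, title, journal_abbrev, year, volume, issue, pages):
    """Assemble a Vancouver-style journal reference.

    authors: list of (surname, initials) pairs, e.g. ("Smith", "JD").
    All names here are illustrative assumptions, not an existing API.
    """
    names = ", ".join(f"{surname} {initials}" for surname, initials in authors)
    return f"{names}. {title}. {journal_abbrev}. {year};{volume}({issue}):{pages}."


reference = format_vancouver(
    authors=[("Smith", "JD"), ("Johnson", "ML"), ("Williams", "KR")],
    title="Machine learning approaches in drug discovery",
    journal_abbrev="Nat Rev Drug Discov",
    year=2023,
    volume=22,
    issue=4,
    pages="301-18",
)
print(reference)
```

Called with the fields of the example above, the sketch reproduces that example string.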
### Books
**Format**: Author AA, Author BB. Title of book. Edition. Place: Publisher; Year.
**Example**: Kumar V, Abbas AK, Aster JC. Robbins and Cotran pathologic basis of disease. 10th ed. Philadelphia: Elsevier; 2021.
### Citation in Text
Number citations in order of first appearance, using superscripts, parentheses, or square brackets as the target journal requires: "Recent studies^1,2^ have shown..."
---
## IEEE Style
### Journal Articles
**Format**: [#] A. A. Author, B. B. Author, and C. C. Author, "Title of article," *Abbreviated Journal Name*, vol. x, no. x, pp. xxx-xxx, Month Year.
**Example**: [1] J. D. Smith, M. L. Johnson, and K. R. Williams, "Machine learning approaches in drug discovery," *Nat. Rev. Drug Discov.*, vol. 22, no. 4, pp. 301-318, Apr. 2023.
### Books
**Format**: [#] A. A. Author, *Title of Book*, xth ed. City, State: Publisher, Year.
**Example**: [2] V. Kumar, A. K. Abbas, and J. C. Aster, *Robbins and Cotran Pathologic Basis of Disease*, 10th ed. Philadelphia, PA: Elsevier, 2021.
---
## Common Abbreviations for Journal Names
- Nature: Nat.
- Science: Science
- Cell: Cell
- Nature Reviews Drug Discovery: Nat. Rev. Drug Discov.
- Journal of the American Chemical Society: J. Am. Chem. Soc.
- Proceedings of the National Academy of Sciences: Proc. Natl. Acad. Sci. U.S.A.
- PLOS ONE: PLoS ONE
- Bioinformatics: Bioinformatics
- Nucleic Acids Research: Nucleic Acids Res.
---
## DOI Best Practices
1. **Always verify DOIs**: Use the verify_citations.py script to check all DOIs
2. **Format as URLs**: https://doi.org/10.xxxx/yyyy (preferred over doi:10.xxxx/yyyy)
3. **Trailing punctuation**: In APA, the DOI is the last element and takes no period after it; Chicago closes the entry with a period, which is not part of the DOI
4. **Resolve redirects**: Check that DOIs resolve to the correct article
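
A minimal sketch of practices 2 and 3, assuming DOIs arrive in mixed forms (`doi:` prefix, `dx.doi.org` links, trailing sentence punctuation). `normalize_doi` is a hypothetical helper, not a function of verify_citations.py, and its shape check is deliberately loose. It does not contact doi.org, so resolution still has to be checked separately.

```python
import re

# Prefixes commonly seen in front of a bare DOI (an assumed, non-exhaustive list).
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Loose shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Return the DOI as an https://doi.org/ URL, or raise ValueError."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    # Drop trailing sentence punctuation picked up from a reference list.
    doi = doi.rstrip(".,;")
    if not _DOI_SHAPE.match(doi):
        raise ValueError(f"Not a recognizable DOI: {raw!r}")
    return f"https://doi.org/{doi}"


print(normalize_doi("doi:10.1038/nrd.2023.001."))
```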
---
## In-Text Citation Guidelines
### APA Style
- (Smith et al., 2023)
- Smith et al. (2023) demonstrated...
- Multiple citations: (Brown, 2022; Smith et al., 2023; Zhang, 2024)
### Nature Style
- Superscript numbers: Recent studies^1,2^ have shown...
- Or: Recent studies (refs 1,2) have shown...
### Chicago Style
- (Smith, Johnson, and Williams 2023)
- Smith, Johnson, and Williams (2023) found...
---
## Reference List Organization
### By Citation Style
- **APA, Chicago**: Alphabetical by first author's last name
- **Nature, Vancouver, IEEE**: Numerical order of first appearance in text
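
The two ordering rules can be sketched as follows. The record keys (`key`, `last_name`, `year`) and the `cited_keys` argument are assumptions made for illustration; uncited references are placed last in the numeric styles.

```python
ALPHABETICAL = {"apa", "chicago"}
NUMERIC = {"nature", "vancouver", "ieee"}


def order_references(refs, style, cited_keys=()):
    """Order a reference list according to the citation style.

    refs: list of dicts with hypothetical keys "key", "last_name", "year".
    cited_keys: citation keys in the order they occur in the text (repeats allowed).
    """
    style = style.lower()
    if style in ALPHABETICAL:
        # Alphabetical by first author's last name, then by year.
        return sorted(refs, key=lambda r: (r["last_name"].casefold(), r["year"]))
    if style in NUMERIC:
        first_seen = {}
        for position, key in enumerate(cited_keys):
            first_seen.setdefault(key, position)  # keep the first appearance only
        return sorted(refs, key=lambda r: first_seen.get(r["key"], len(cited_keys)))
    raise ValueError(f"Unknown style: {style}")


refs = [
    {"key": "smith2023", "last_name": "Smith", "year": 2023},
    {"key": "brown2022", "last_name": "Brown", "year": 2022},
    {"key": "zhang2024", "last_name": "Zhang", "year": 2024},
]
print([r["key"] for r in order_references(refs, "apa")])
```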
### Hanging Indents
Most styles use hanging indents where the first line is flush left and subsequent lines are indented.
### Consistency
Maintain consistent formatting throughout:
- Capitalization (title case vs. sentence case)
- Journal name abbreviations
- DOI presentation
- Author name format
FILE:references/database_strategies.md
# Literature Database Search Strategies
This document provides comprehensive guidance for searching multiple literature databases systematically and effectively.
## Available Databases and Skills
### Biomedical & Life Sciences
#### PubMed / PubMed Central
- **Access**: Use `gget` skill or WebFetch tool
- **Coverage**: 35M+ citations in biomedical literature
- **Best for**: Clinical studies, biomedical research, genetics, molecular biology
- **Search tips**: Use MeSH terms, Boolean operators (AND, OR, NOT), field tags [Title], [Author]
- **Example**: `"CRISPR"[Title] AND "gene editing"[Title/Abstract] AND 2020:2024[Publication Date]`
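
Where a direct HTTP request is preferred, a query like the example can be sent to the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and performs no network call; `pubmed_esearch_url` is a hypothetical helper name, and the default `retmax` is an arbitrary choice.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_esearch_url(term: str, retmax: int = 100) -> str:
    """Build an E-utilities esearch URL for a PubMed query string."""
    params = {
        "db": "pubmed",      # database to search
        "term": term,        # the query, including field tags
        "retmax": retmax,    # maximum number of PMIDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = pubmed_esearch_url('"CRISPR"[Title] AND "gene editing"[Title/Abstract]')
print(url)
```

Fetching the URL is left to whichever HTTP tool is available; record the query string and search date for the review's documentation.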
#### bioRxiv / medRxiv
- **Access**: Use `gget` skill or direct API
- **Coverage**: Preprints in biology and medicine
- **Best for**: Latest unpublished research, cutting-edge findings
- **Note**: Not peer-reviewed; interpret findings with caution and check whether a peer-reviewed version has since been published
- **Search tips**: Search by category (bioinformatics, genomics, etc.)
### General Scientific Literature
#### arXiv
- **Access**: Direct API access
- **Coverage**: Preprints in physics, mathematics, computer science, quantitative biology
- **Best for**: Computational methods, bioinformatics algorithms, theoretical work
- **Categories**: q-bio (Quantitative Biology), cs.LG (Machine Learning), stat.ML (Machine Learning, Statistics)
- **Search format**: `cat:q-bio.QM AND ti:"single cell"`
#### Semantic Scholar
- **Access**: Direct API (requires API key)
- **Coverage**: 200M+ papers across all fields
- **Best for**: Cross-disciplinary searches, citation graphs, paper recommendations
- **Features**: Influential citations, paper summaries, related papers
- **Rate limits**: 100 requests/5 minutes with API key
#### Google Scholar
- **Access**: Web scraping (use cautiously) or manual search
- **Coverage**: Comprehensive across all fields
- **Best for**: Finding highly cited papers, conference proceedings, theses
- **Limitations**: No official API, rate limiting
- **Export**: Use "Cite" feature for formatted citations
### Specialized Databases
#### ChEMBL / PubChem
- **Access**: Use `gget` skill or `bioservices` skill
- **Coverage**: Chemical compounds, bioactivity data, drug molecules
- **Best for**: Drug discovery, chemical biology, medicinal chemistry
- **ChEMBL**: 2M+ compounds, bioactivity data
- **PubChem**: 110M+ compounds, assay data
#### UniProt
- **Access**: Use `gget` skill or `bioservices` skill
- **Coverage**: Protein sequence and functional information
- **Best for**: Protein research, sequence analysis, functional annotations
- **Search by**: Protein name, gene name, organism, function
#### KEGG (Kyoto Encyclopedia of Genes and Genomes)
- **Access**: Use `bioservices` skill
- **Coverage**: Pathways, diseases, drugs, genes
- **Best for**: Pathway analysis, systems biology, metabolic research
#### COSMIC (Catalogue of Somatic Mutations in Cancer)
- **Access**: Use `gget` skill or direct download
- **Coverage**: Cancer genomics, somatic mutations
- **Best for**: Cancer research, mutation analysis
#### AlphaFold Database
- **Access**: Use `gget` skill with `alphafold` command
- **Coverage**: 200M+ protein structure predictions
- **Best for**: Structural biology, protein modeling
#### PDB (Protein Data Bank)
- **Access**: Use `gget` or direct API
- **Coverage**: Experimental 3D structures of proteins, nucleic acids
- **Best for**: Structural biology, drug design, molecular modeling
### Citation & Reference Management
#### OpenAlex
- **Access**: Direct API (free, no key required)
- **Coverage**: 250M+ works, comprehensive metadata
- **Best for**: Citation analysis, author disambiguation, institutional research
- **Features**: Open access, excellent for bibliometrics
#### Dimensions
- **Access**: Free tier available
- **Coverage**: Publications, grants, patents, clinical trials
- **Best for**: Research impact, funding analysis, translational research
---
## Search Strategy Framework
### 1. Define Research Question (PICO Framework)
For clinical/biomedical reviews:
- **P**opulation: Who is the study about?
- **I**ntervention: What is being tested?
- **C**omparison: What is it compared to?
- **O**utcome: What are the results?
**Example**: "What is the efficacy of CRISPR-Cas9 gene therapy (I) for treating sickle cell disease (P) compared to standard care (C) in improving patient outcomes (O)?"
### 2. Develop Search Terms
#### Primary Concepts
Identify 2-4 main concepts from your research question.
**Example**:
- Concept 1: CRISPR, Cas9, gene editing
- Concept 2: sickle cell disease, SCD, hemoglobin disorders
- Concept 3: gene therapy, therapeutic editing
#### Synonyms & Related Terms
List alternative terms, abbreviations, and related concepts.
**Tool**: Use MeSH (Medical Subject Headings) browser for standardized terms
#### Boolean Operators
- **AND**: Narrows search (must include both terms)
- **OR**: Broadens search (includes either term)
- **NOT**: Excludes terms
**Example**: `(CRISPR OR Cas9 OR "gene editing") AND ("sickle cell" OR SCD) AND therapy`
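
A minimal sketch of how concept groups turn into a Boolean string: synonyms within a concept are joined with OR, and the concepts are joined with AND. `build_boolean_query` is a hypothetical helper; it quotes multi-word phrases and does not add database-specific field tags.

```python
def build_boolean_query(concepts):
    """OR the synonyms inside each concept, then AND the concepts together.

    concepts: list of lists of search terms, one inner list per concept.
    """
    groups = []
    for synonyms in concepts:
        # Quote multi-word phrases so they are searched as exact phrases.
        terms = [f'"{term}"' if " " in term else term for term in synonyms]
        if len(terms) == 1:
            groups.append(terms[0])
        else:
            groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


query = build_boolean_query([
    ["CRISPR", "Cas9", "gene editing"],
    ["sickle cell", "SCD"],
    ["therapy"],
])
print(query)
```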
#### Wildcards & Truncation
- `*` or `%`: Matches any characters
- `?`: Matches single character
**Example**: `genom*` matches genomic, genomics, genome
### 3. Set Inclusion/Exclusion Criteria
#### Inclusion Criteria
- **Date range**: e.g., 2015-2024 (last 10 years)
- **Language**: English (or specify multilingual)
- **Publication type**: Peer-reviewed articles, reviews, preprints
- **Study design**: RCTs, cohort studies, meta-analyses
- **Population**: Human, animal models, in vitro
#### Exclusion Criteria
- Case reports (n<5)
- Conference abstracts without full text
- Non-original research (editorials, commentaries)
- Duplicate publications
- Retracted articles
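
Some of these criteria can be applied mechanically while recording the reason for each exclusion. The sketch below is one possible arrangement under stated assumptions: the record keys (`year`, `language`, `type`, `retracted`) are hypothetical, and criteria that need judgment (case-report size, full-text availability, duplicate publications) are left to manual screening.

```python
from typing import Optional

# Non-original research types; an assumed vocabulary for the "type" field.
EXCLUDED_TYPES = {"editorial", "commentary"}


def exclusion_reason(record: dict, start_year: int = 2015, end_year: int = 2024,
                     languages=("english",)) -> Optional[str]:
    """Return the first reason to exclude a record, or None to keep it."""
    if record.get("retracted"):
        return "Retracted article"
    year = record.get("year")
    if year is None or not (start_year <= int(year) <= end_year):
        return "Outside date range"
    if str(record.get("language", "")).lower() not in languages:
        return "Language"
    if str(record.get("type", "")).lower() in EXCLUDED_TYPES:
        return "Non-original research"
    return None


print(exclusion_reason({"year": 2012, "language": "English", "type": "article"}))
```

Returning a reason string rather than a boolean keeps the exclusion log needed for the PRISMA flow and the excluded-studies table.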
### 4. Database Selection Strategy
#### Multi-Database Approach
Search at least 3 complementary databases:
1. **Primary database**: PubMed (biomedical) or arXiv (computational)
2. **Preprint server**: bioRxiv/medRxiv or arXiv
3. **Comprehensive database**: Semantic Scholar or Google Scholar
4. **Specialized database**: ChEMBL, UniProt, or field-specific
#### Database-Specific Syntax
| Database | Field Tags | Example |
|----------|-----------|---------|
| PubMed | [Title], [Author], [MeSH] | "CRISPR"[Title] AND 2020:2024[DP] |
| arXiv | ti:, au:, cat: | ti:"machine learning" AND cat:q-bio.QM |
| Semantic Scholar | title:, author:, year: | title:"deep learning" year:2020-2024 |
---
## Search Execution Workflow
### Phase 1: Pilot Search
1. Run initial search with broad terms
2. Review first 50 results for relevance
3. Note common keywords and MeSH terms
4. Refine search strategy
### Phase 2: Comprehensive Search
1. Execute refined searches across all selected databases
2. Export results in standard format (RIS, BibTeX, JSON)
3. Document search strings and date for each database
4. Record number of results per database
### Phase 3: Deduplication
1. Import all results into a single file
2. Use `search_databases.py --deduplicate` to remove duplicates
3. Identify duplicates by DOI (primary) or title (fallback)
4. Keep the version with most complete metadata
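
Steps 3 and 4 can be sketched as below. This is an illustration of the idea, not the implementation behind `search_databases.py --deduplicate`; the helper names are hypothetical, and "most complete" is approximated as the record with the most non-empty fields.

```python
import re


def _match_key(record):
    """DOI if present (primary), otherwise a normalized title (fallback)."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return ("title", title)


def _completeness(record):
    """Number of fields that carry a non-empty value."""
    return sum(1 for value in record.values() if value not in (None, "", [], {}))


def dedupe_keep_most_complete(records):
    best, unkeyed = {}, []
    for record in records:
        key = _match_key(record)
        if not key[1]:
            unkeyed.append(record)  # nothing to match on, keep as-is
            continue
        if key not in best or _completeness(record) > _completeness(best[key]):
            best[key] = record
    return list(best.values()) + unkeyed


records = [
    {"doi": "10.1/abc", "title": "A study"},
    {"doi": "10.1/ABC", "title": "A study", "abstract": "text", "year": 2020},
    {"title": "Other Paper!"},
    {"title": "other paper"},
]
unique = dedupe_keep_most_complete(records)
print(len(unique))
```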
### Phase 4: Screening
1. **Title screening**: Review titles, exclude obviously irrelevant
2. **Abstract screening**: Read abstracts, apply inclusion/exclusion criteria
3. **Full-text screening**: Obtain and review full texts
4. Document reasons for exclusion at each stage
### Phase 5: Quality Assessment
1. Assess study quality using appropriate tools:
   - **RCTs**: Cochrane Risk of Bias tool
   - **Observational**: Newcastle-Ottawa Scale
   - **Systematic reviews**: AMSTAR 2
2. Grade quality of evidence (high, moderate, low, very low)
3. Consider excluding very low-quality studies
---
## Search Documentation Template
### Required Documentation
All searches must be documented for reproducibility:
````markdown
## Search Strategy
### Database: PubMed
- **Date searched**: 2024-10-25
- **Date range**: 2015-01-01 to 2024-10-25
- **Search string**:
```
("CRISPR"[Title] OR "Cas9"[Title] OR "gene editing"[Title/Abstract])
AND ("sickle cell disease"[MeSH] OR "SCD"[Title/Abstract])
AND ("gene therapy"[MeSH] OR "therapeutic editing"[Title/Abstract])
AND 2015:2024[Publication Date]
AND English[Language]
```
- **Results**: 247 articles
- **After deduplication**: 189 articles
### Database: bioRxiv
- **Date searched**: 2024-10-25
- **Date range**: 2015-01-01 to 2024-10-25
- **Search string**: "CRISPR" AND "sickle cell" (in title/abstract)
- **Results**: 34 preprints
- **After deduplication**: 28 preprints
### Total Unique Articles
- **Combined results**: 217 unique articles
- **After title screening**: 156 articles
- **After abstract screening**: 89 articles
- **After full-text screening**: 52 articles included in review
````
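
The stage counts in the template also determine how many records were removed at each screening step, which the PRISMA flow diagram reports. A minimal sketch of that arithmetic, using the illustrative numbers from the template (`screening_flow` is a hypothetical helper):

```python
def screening_flow(stages):
    """Return the number of records removed at each step.

    stages: ordered (label, count) pairs, from the combined set to the final set.
    """
    removed = []
    for (prev_label, prev_n), (label, n) in zip(stages, stages[1:]):
        if n > prev_n:
            raise ValueError(f"{label!r} has more records than {prev_label!r}")
        removed.append((label, prev_n - n))
    return removed


stages = [
    ("Combined results", 217),
    ("Title screening", 156),
    ("Abstract screening", 89),
    ("Full-text screening", 52),
]
for label, count in screening_flow(stages):
    print(f"{label}: {count} excluded")
```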
---
## Advanced Search Techniques
### Prioritizing High-Impact Papers (CRITICAL)
**Always prioritize papers based on citation count, venue quality, and author reputation.** Quality matters more than quantity.
#### Citation Metrics in Database Searches
Use citation counts to identify influential work:
| Paper Age | Citations | Classification |
|-----------|-----------|----------------|
| 0-3 years | 20+ | Noteworthy |
| 0-3 years | 100+ | Highly Influential |
| 3-7 years | 100+ | Significant |
| 3-7 years | 500+ | Landmark |
| 7+ years | 500+ | Seminal |
| 7+ years | 1000+ | Foundational |
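
The table can be expressed as a lookup. The age bands in the table share their endpoints, so this sketch assumes that a paper aged exactly 3 years falls in the 3-7 band and one aged exactly 7 years falls in the 7+ band; `classify_paper` is a hypothetical helper, and papers below every threshold in their band return `None`.

```python
from typing import Optional

# (exclusive upper age bound, [(minimum citations, label), ...] highest first)
BANDS = [
    (3, [(100, "Highly Influential"), (20, "Noteworthy")]),
    (7, [(500, "Landmark"), (100, "Significant")]),
    (float("inf"), [(1000, "Foundational"), (500, "Seminal")]),
]


def classify_paper(age_years: float, citations: int) -> Optional[str]:
    """Map paper age and citation count to the table's classification."""
    for max_age, thresholds in BANDS:
        if age_years < max_age:
            for minimum, label in thresholds:
                if citations >= minimum:
                    return label
            return None  # in this age band, but below every threshold
    return None


print(classify_paper(2, 150))
```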
**Database-Specific Citation Features:**
- **Google Scholar:** Sort by citation count, use "Cited by" feature
- **Semantic Scholar:** "Highly Influential Citations" metric, citation velocity
- **OpenAlex:** Citation counts, citation context analysis
- **PubMed:** Use "Cited by" in PMC, check citation counts via Google Scholar
#### Filtering by Journal Quality
Prioritize papers from higher-tier venues:
**Tier 1 (Always Prefer):**
- Nature, Science, Cell, NEJM, Lancet, JAMA, PNAS
- Nature Medicine, Nature Biotechnology, Nature Methods
- Search tip: `source:Nature` in Google Scholar restricts results to a publication (the same filter appears as "Return articles published in" under Advanced search)
**Tier 2 (High Priority):**
- High-impact specialized journals (Impact Factor >10)
- Top conferences: NeurIPS, ICML, ICLR, CVPR, ACL
**Tier 3 (Include When Relevant):**
- Respected field-specific journals (IF 5-10)
**PubMed Journal Filtering:**
```
"Nature"[Journal] OR "Science"[Journal] OR "Cell"[Journal]
```
**Google Scholar Journal Filtering:**
```
source:Nature OR source:Science OR source:Cell
```
#### Leveraging "Cited by" Features
**Finding Influential Work:**
1. Start with a known key paper
2. Click "Cited by" to find papers that cite it
3. Sort citing papers by their citation count
4. Highly-cited citing papers indicate important follow-up work
**Identifying Seminal Papers:**
1. Search your topic broadly
2. Note which papers appear repeatedly in reference lists
3. Papers cited by many of your results are likely seminal
4. Check citation counts to confirm influence
**Semantic Scholar Features:**
- "Highly Influential Citations" shows citations that significantly built on the paper
- "Citation Velocity" shows recent citation growth
- Paper recommendations based on citation networks
### Citation Chaining
#### Forward Citation Search
Find papers that cite a key paper:
- Use Google Scholar "Cited by" feature
- Use OpenAlex or Semantic Scholar APIs
- Identifies newer research building on seminal work
- **Tip:** Sort by citation count to find the most influential follow-up work
#### Backward Citation Search
Review references in key papers:
- Extract references from included papers
- Search for highly cited references (500+ citations for older papers)
- Identifies foundational research
- **Tip:** Focus on references that appear in multiple papers' bibliographies
### Snowball Sampling
1. Start with 3-5 highly relevant papers **from Tier-1 venues**
2. Extract all their references
3. Check which references are cited by multiple papers
4. Review those high-overlap references - these are likely seminal
5. Repeat for newly identified key papers
6. **Prioritize papers with high citation counts** at each step
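
Steps 2-4 amount to counting how many seed papers cite each reference. A minimal sketch, assuming each seed paper's bibliography is already available as a list of identifiers such as DOIs (`shared_references` and its input layout are hypothetical):

```python
from collections import Counter


def shared_references(bibliographies, min_papers=2):
    """Find references cited by at least `min_papers` of the seed papers.

    bibliographies: {seed_paper_id: iterable of reference identifiers}.
    Returns (reference, number of seed papers citing it), most shared first.
    """
    counts = Counter()
    for refs in bibliographies.values():
        counts.update(set(refs))  # count each reference once per seed paper
    return [(ref, n) for ref, n in counts.most_common() if n >= min_papers]


bibliographies = {
    "seed_A": ["r1", "r2", "r3"],
    "seed_B": ["r2", "r3", "r3"],
    "seed_C": ["r3", "r4"],
}
print(shared_references(bibliographies))
```

The high-overlap references returned here are the candidates to review in step 4 and to use as new seeds in step 5.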
### Author Search
Follow prolific and reputable authors in the field:
- Search by author name across databases
- Check author profiles (ORCID, Google Scholar) for h-index and publication venues
- Review recent publications and preprints
- **Prefer authors with multiple Tier-1 publications** and high h-index (>40)
- Look for senior authors who are recognized field leaders
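
For reference, the h-index used above is the largest h such that an author has h papers with at least h citations each. A minimal sketch of the calculation from a list of per-paper citation counts:

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    ranked = sorted(citation_counts, reverse=True)
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))
```

Values differ between sources such as Google Scholar and Scopus because each indexes a different set of citing documents, so compare authors within a single source.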
### Related Article Features
Many databases suggest related articles:
- PubMed "Similar articles"
- Semantic Scholar "Recommended papers"
- Use to discover papers missed by keyword search
- **Filter recommendations by citation count and venue quality**
---
## Quality Control Checklist
### Before Searching
- [ ] Research question clearly defined
- [ ] PICO criteria established (if applicable)
- [ ] Search terms and synonyms listed
- [ ] Inclusion/exclusion criteria documented
- [ ] Target databases selected (minimum 3)
- [ ] Date range determined
### During Searching
- [ ] Search string tested and refined
- [ ] Results exported with complete metadata
- [ ] Search parameters documented
- [ ] Number of results recorded per database
- [ ] Search date recorded
### After Searching
- [ ] Duplicates removed
- [ ] Screening protocol followed
- [ ] Reasons for exclusion documented
- [ ] Quality assessment completed
- [ ] All citations verified with verify_citations.py
- [ ] Search methodology documented in review
---
## Common Pitfalls to Avoid
1. **Too narrow search**: Missing relevant papers
   - Solution: Include synonyms, related terms, broader concepts
2. **Too broad search**: Thousands of irrelevant results
   - Solution: Add specific concepts with AND, use field tags
3. **Single database**: Incomplete coverage
   - Solution: Search minimum 3 complementary databases
4. **Ignoring preprints**: Missing latest findings
   - Solution: Include bioRxiv, medRxiv, or arXiv
5. **No documentation**: Irreproducible search
   - Solution: Document every search string, date, and result count
6. **Manual deduplication**: Time-consuming and error-prone
   - Solution: Use search_databases.py script
7. **Unverified citations**: Broken DOIs, incorrect metadata
   - Solution: Run verify_citations.py on final reference list
8. **Publication bias**: Only including published positive results
   - Solution: Search trial registries, contact authors for unpublished data
---
## Example Multi-Database Search Workflow
```python
# Example workflow using available skills
# 1. Search PubMed via gget
search_term = "CRISPR AND sickle cell disease"
# Use gget search pubmed search_term
# 2. Search bioRxiv
# Use gget search biorxiv search_term
# 3. Search arXiv for computational papers
# Search arXiv with: cat:q-bio AND "CRISPR" AND "sickle cell"
# 4. Search Semantic Scholar via API
# Use semantic scholar API with search query
# 5. Aggregate and deduplicate results
# python search_databases.py combined_results.json --deduplicate --format markdown --output review_papers.md
# 6. Verify all citations
# python verify_citations.py review_papers.md
# 7. Generate final PDF
# python generate_pdf.py review_papers.md --citation-style nature
```
---
## Resources
### MeSH Browser
https://meshb.nlm.nih.gov/search
### Boolean Search Tutorial
https://www.ncbi.nlm.nih.gov/books/NBK3827/
### Citation Style Guides
See references/citation_styles.md in this skill
### PRISMA Guidelines
Preferred Reporting Items for Systematic Reviews and Meta-Analyses:
http://www.prisma-statement.org/
FILE:scripts/generate_pdf.py
#!/usr/bin/env python3
"""
PDF Generation Script for Literature Reviews
Converts markdown files to professionally formatted PDFs with proper styling.
"""
import subprocess
import sys
import os
from pathlib import Path
from typing import Optional


def generate_pdf(
    markdown_file: str,
    output_pdf: Optional[str] = None,
    citation_style: str = "apa",
    template: Optional[str] = None,
    toc: bool = True,
    number_sections: bool = True
) -> bool:
    """
    Generate a PDF from a markdown file using pandoc.

    Args:
        markdown_file: Path to the markdown file
        output_pdf: Path for output PDF (defaults to same name as markdown)
        citation_style: Citation style (apa, nature, chicago, etc.)
        template: Path to custom LaTeX template
        toc: Include table of contents
        number_sections: Number the sections

    Returns:
        True if successful, False otherwise
    """
    # Verify markdown file exists
    if not os.path.exists(markdown_file):
        print(f"Error: Markdown file not found: {markdown_file}")
        return False

    # Set default output path
    if output_pdf is None:
        output_pdf = Path(markdown_file).with_suffix('.pdf')

    # Check if pandoc is installed
    try:
        subprocess.run(['pandoc', '--version'], capture_output=True, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("Error: pandoc is not installed.")
        print("Install with: brew install pandoc (macOS) or apt-get install pandoc (Linux)")
        return False

    # Build pandoc command
    cmd = [
        'pandoc',
        markdown_file,
        '-o', str(output_pdf),
        '--pdf-engine=xelatex',  # Better Unicode support
        '-V', 'geometry:margin=1in',
        '-V', 'fontsize=11pt',
        '-V', 'colorlinks=true',
        '-V', 'linkcolor=blue',
        '-V', 'urlcolor=blue',
        '-V', 'citecolor=blue',
    ]

    # Add table of contents
    if toc:
        cmd.extend(['--toc', '--toc-depth=3'])

    # Add section numbering
    if number_sections:
        cmd.append('--number-sections')

    # Add citation processing if a .bib file sits next to the markdown file.
    # The CSL file itself (e.g. apa.csl) must be somewhere pandoc can find it.
    bib_file = Path(markdown_file).with_suffix('.bib')
    if bib_file.exists():
        cmd.extend([
            '--citeproc',
            '--bibliography', str(bib_file),
            '--csl', f'{citation_style}.csl' if not citation_style.endswith('.csl') else citation_style
        ])

    # Add custom template if provided
    if template and os.path.exists(template):
        cmd.extend(['--template', template])

    # Execute pandoc
    try:
        print(f"Generating PDF: {output_pdf}")
        print(f"Command: {' '.join(cmd)}")
        subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(f"✓ PDF generated successfully: {output_pdf}")
        return True
    except subprocess.CalledProcessError as e:
        print("Error generating PDF:")
        print(f"STDOUT: {e.stdout}")
        print(f"STDERR: {e.stderr}")
        return False

def check_dependencies():
    """Check if required dependencies are installed."""
    dependencies = {
        'pandoc': 'pandoc --version',
        'xelatex': 'xelatex --version'
    }
    missing = []
    for name, cmd in dependencies.items():
        try:
            subprocess.run(cmd.split(), capture_output=True, check=True)
            print(f"✓ {name} is installed")
        except (subprocess.CalledProcessError, FileNotFoundError):
            print(f"✗ {name} is NOT installed")
            missing.append(name)

    if missing:
        print("\n" + "="*60)
        print("Missing dependencies:")
        for dep in missing:
            if dep == 'pandoc':
                print(" - pandoc: brew install pandoc (macOS) or apt-get install pandoc (Linux)")
            elif dep == 'xelatex':
                print(" - xelatex: brew install --cask mactex (macOS) or apt-get install texlive-xetex (Linux)")
        return False
    return True

def main():
    """Command-line interface."""
    if len(sys.argv) < 2:
        print("Usage: python generate_pdf.py <markdown_file> [output_pdf] [--citation-style STYLE]")
        print("\nOptions:")
        print(" --citation-style STYLE Citation style (default: apa)")
        print(" --no-toc Disable table of contents")
        print(" --no-numbers Disable section numbering")
        print(" --check-deps Check if dependencies are installed")
        sys.exit(1)

    # Check dependencies mode: exit non-zero if anything is missing
    if '--check-deps' in sys.argv:
        sys.exit(0 if check_dependencies() else 1)

    # Parse arguments
    markdown_file = sys.argv[1]
    output_pdf = sys.argv[2] if len(sys.argv) > 2 and not sys.argv[2].startswith('--') else None
    citation_style = 'apa'
    toc = True
    number_sections = True

    # Parse optional flags
    if '--citation-style' in sys.argv:
        idx = sys.argv.index('--citation-style')
        if idx + 1 < len(sys.argv):
            citation_style = sys.argv[idx + 1]
    if '--no-toc' in sys.argv:
        toc = False
    if '--no-numbers' in sys.argv:
        number_sections = False

    # Generate PDF
    success = generate_pdf(
        markdown_file,
        output_pdf,
        citation_style=citation_style,
        toc=toc,
        number_sections=number_sections
    )
    sys.exit(0 if success else 1)


if __name__ == "__main__":
    main()
FILE:scripts/search_databases.py
#!/usr/bin/env python3
"""
Literature Database Search Script
Searches multiple literature databases and aggregates results.
"""
import json
import sys
from typing import Dict, List
from datetime import datetime
def format_search_results(results: List[Dict], output_format: str = 'json') -> str:
    """
    Format search results for output.

    Args:
        results: List of search results
        output_format: Format (json, markdown, or bibtex)

    Returns:
        Formatted string
    """
    if output_format == 'json':
        return json.dumps(results, indent=2)

    elif output_format == 'markdown':
        md = "# Literature Search Results\n\n"
        md += f"**Search Date**: {datetime.now().strftime('%Y-%m-%d %H:%M')}\n"
        md += f"**Total Results**: {len(results)}\n\n"
        for i, result in enumerate(results, 1):
            md += f"## {i}. {result.get('title', 'Untitled')}\n\n"
            md += f"**Authors**: {result.get('authors', 'Unknown')}\n\n"
            md += f"**Year**: {result.get('year', 'N/A')}\n\n"
            md += f"**Source**: {result.get('source', 'Unknown')}\n\n"
            if result.get('abstract'):
                md += f"**Abstract**: {result['abstract']}\n\n"
            if result.get('doi'):
                md += f"**DOI**: [{result['doi']}](https://doi.org/{result['doi']})\n\n"
            if result.get('url'):
                md += f"**URL**: {result['url']}\n\n"
            if result.get('citations'):
                md += f"**Citations**: {result['citations']}\n\n"
            md += "---\n\n"
        return md

    elif output_format == 'bibtex':
        bibtex = ""
        for result in results:
            entry_type = result.get('type', 'article')
            # Cite key is first author + year; keys are not checked for collisions
            cite_key = f"{result.get('first_author', 'unknown')}{result.get('year', '0000')}"
            bibtex += f"@{entry_type}{{{cite_key},\n"
            bibtex += f" title = {{{result.get('title', '')}}},\n"
            bibtex += f" author = {{{result.get('authors', '')}}},\n"
            bibtex += f" year = {{{result.get('year', '')}}},\n"
            if result.get('journal'):
                bibtex += f" journal = {{{result['journal']}}},\n"
            if result.get('volume'):
                bibtex += f" volume = {{{result['volume']}}},\n"
            if result.get('pages'):
                bibtex += f" pages = {{{result['pages']}}},\n"
            if result.get('doi'):
                bibtex += f" doi = {{{result['doi']}}},\n"
            bibtex += "}\n\n"
        return bibtex

    else:
        raise ValueError(f"Unknown format: {output_format}")

def deduplicate_results(results: List[Dict]) -> List[Dict]:
    """
    Remove duplicate results based on DOI or title.

    The first occurrence of each record is kept.

    Args:
        results: List of search results

    Returns:
        Deduplicated list
    """
    seen_dois = set()
    seen_titles = set()
    unique_results = []
    for result in results:
        # `or ''` guards against keys that are present but set to None
        doi = (result.get('doi') or '').lower().strip()
        title = (result.get('title') or '').lower().strip()

        # Check DOI first (more reliable)
        if doi and doi in seen_dois:
            continue

        # Check title as fallback when the record has no DOI
        if not doi and title in seen_titles:
            continue

        # Add to results
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique_results.append(result)
    return unique_results

def rank_results(results: List[Dict], criteria: str = 'citations') -> List[Dict]:
    """
    Rank results by specified criteria.
    Args:
        results: List of search results
        criteria: Ranking criteria (citations, year, relevance)
    Returns:
        Ranked list
    """
    def as_int(value) -> int:
        # Missing, None, or non-numeric values sort as 0, so mixed
        # str/int fields (e.g. year '2021' vs 2021) compare safely
        try:
            return int(value)
        except (ValueError, TypeError):
            return 0
    if criteria == 'citations':
        return sorted(results, key=lambda x: as_int(x.get('citations')), reverse=True)
    elif criteria == 'year':
        return sorted(results, key=lambda x: as_int(x.get('year')), reverse=True)
    elif criteria == 'relevance':
        return sorted(results, key=lambda x: x.get('relevance_score') or 0, reverse=True)
    else:
        return results
def filter_by_year(results: List[Dict], start_year: int = None, end_year: int = None) -> List[Dict]:
"""
Filter results by publication year range.
Args:
results: List of search results
start_year: Minimum year (inclusive)
end_year: Maximum year (inclusive)
Returns:
Filtered list
"""
filtered = []
for result in results:
try:
year = int(result.get('year', 0))
if start_year and year < start_year:
continue
if end_year and year > end_year:
continue
filtered.append(result)
except (ValueError, TypeError):
# Include if year parsing fails
filtered.append(result)
return filtered
def generate_search_summary(results: List[Dict]) -> Dict:
"""
Generate summary statistics for search results.
Args:
results: List of search results
Returns:
Summary dictionary
"""
summary = {
'total_results': len(results),
'sources': {},
'year_distribution': {},
'avg_citations': 0,
'total_citations': 0
}
citations = []
for result in results:
# Count by source
source = result.get('source', 'Unknown')
summary['sources'][source] = summary['sources'].get(source, 0) + 1
# Count by year
year = result.get('year', 'Unknown')
summary['year_distribution'][year] = summary['year_distribution'].get(year, 0) + 1
# Collect citations
if result.get('citations'):
try:
citations.append(int(result['citations']))
except (ValueError, TypeError):
pass
if citations:
summary['avg_citations'] = sum(citations) / len(citations)
summary['total_citations'] = sum(citations)
return summary
def main():
"""Command-line interface for search result processing."""
if len(sys.argv) < 2:
print("Usage: python search_databases.py <results.json> [options]")
print("\nOptions:")
print(" --format FORMAT Output format (json, markdown, bibtex)")
print(" --output FILE Output file (default: stdout)")
print(" --rank CRITERIA Rank by (citations, year, relevance)")
print(" --year-start YEAR Filter by start year")
print(" --year-end YEAR Filter by end year")
print(" --deduplicate Remove duplicates")
print(" --summary Show summary statistics")
sys.exit(1)
# Load results
results_file = sys.argv[1]
try:
with open(results_file, 'r', encoding='utf-8') as f:
results = json.load(f)
except Exception as e:
print(f"Error loading results: {e}")
sys.exit(1)
# Parse options
output_format = 'markdown'
output_file = None
rank_criteria = None
year_start = None
year_end = None
do_dedup = False
show_summary = False
i = 2
while i < len(sys.argv):
arg = sys.argv[i]
if arg == '--format' and i + 1 < len(sys.argv):
output_format = sys.argv[i + 1]
i += 2
elif arg == '--output' and i + 1 < len(sys.argv):
output_file = sys.argv[i + 1]
i += 2
elif arg == '--rank' and i + 1 < len(sys.argv):
rank_criteria = sys.argv[i + 1]
i += 2
elif arg == '--year-start' and i + 1 < len(sys.argv):
year_start = int(sys.argv[i + 1])
i += 2
elif arg == '--year-end' and i + 1 < len(sys.argv):
year_end = int(sys.argv[i + 1])
i += 2
elif arg == '--deduplicate':
do_dedup = True
i += 1
elif arg == '--summary':
show_summary = True
i += 1
else:
i += 1
# Process results
if do_dedup:
results = deduplicate_results(results)
print(f"After deduplication: {len(results)} results")
if year_start or year_end:
results = filter_by_year(results, year_start, year_end)
print(f"After year filter: {len(results)} results")
if rank_criteria:
results = rank_results(results, rank_criteria)
print(f"Ranked by: {rank_criteria}")
# Show summary
if show_summary:
summary = generate_search_summary(results)
print("\n" + "="*60)
print("SEARCH SUMMARY")
print("="*60)
print(json.dumps(summary, indent=2))
print()
# Format output
output = format_search_results(results, output_format)
# Write output
if output_file:
with open(output_file, 'w', encoding='utf-8') as f:
f.write(output)
print(f"✓ Results saved to: {output_file}")
else:
print(output)
if __name__ == "__main__":
main()
FILE:scripts/verify_citations.py
#!/usr/bin/env python3
"""
Citation Verification Script
Verifies DOIs, URLs, and citation metadata for accuracy.
"""
import re
import requests
import json
from typing import Dict, List, Tuple
from urllib.parse import urlparse
import time
class CitationVerifier:
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
'User-Agent': 'CitationVerifier/1.0 (Literature Review Tool)'
})
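    def extract_dois(self, text: str) -> List[str]:
        """Extract unique DOIs from text, in order of first appearance."""
        doi_pattern = r'10\.\d{4,}/[^\s\]\)"]+'
        # Drop sentence punctuation that the pattern captures at the end of a DOI
        dois = [match.rstrip('.,;:') for match in re.findall(doi_pattern, text)]
        # Markdown links such as [doi](https://doi.org/doi) repeat the DOI; verify each once
        return list(dict.fromkeys(dois))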
def verify_doi(self, doi: str) -> Tuple[bool, Dict]:
"""
Verify a DOI and retrieve metadata.
Returns (is_valid, metadata)
"""
try:
url = f"https://doi.org/api/handles/{doi}"
response = self.session.get(url, timeout=10)
if response.status_code == 200:
# DOI exists, now get metadata from CrossRef
metadata = self._get_crossref_metadata(doi)
return True, metadata
else:
return False, {}
except Exception as e:
return False, {"error": str(e)}
def _get_crossref_metadata(self, doi: str) -> Dict:
"""Get metadata from CrossRef API."""
try:
url = f"https://api.crossref.org/works/{doi}"
response = self.session.get(url, timeout=10)
if response.status_code == 200:
data = response.json()
message = data.get('message', {})
# Extract key metadata
metadata = {
'title': (message.get('title') or [''])[0],
'authors': self._format_authors(message.get('author', [])),
'year': self._extract_year(message),
'journal': (message.get('container-title') or [''])[0],
'volume': message.get('volume', ''),
'pages': message.get('page', ''),
'doi': doi
}
return metadata
return {}
except Exception as e:
return {"error": str(e)}
def _format_authors(self, authors: List[Dict]) -> str:
"""Format author list."""
if not authors:
return ""
formatted = []
for author in authors[:3]: # First 3 authors
given = author.get('given', '')
family = author.get('family', '')
if family:
formatted.append(f"{family}, {given[0]}." if given else family)
if len(authors) > 3:
formatted.append("et al.")
return ", ".join(formatted)
def _extract_year(self, message: Dict) -> str:
"""Extract publication year."""
date_parts = message.get('published-print', {}).get('date-parts', [[]])
if not date_parts or not date_parts[0]:
date_parts = message.get('published-online', {}).get('date-parts', [[]])
if date_parts and date_parts[0]:
return str(date_parts[0][0])
return ""
def verify_url(self, url: str) -> Tuple[bool, int]:
"""
Verify a URL is accessible.
Returns (is_accessible, status_code)
"""
try:
response = self.session.head(url, timeout=10, allow_redirects=True)
is_accessible = response.status_code < 400
return is_accessible, response.status_code
except Exception as e:
return False, 0
def verify_citations_in_file(self, filepath: str) -> Dict:
"""
Verify all citations in a markdown file.
Returns a report of verification results.
"""
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read()
dois = self.extract_dois(content)
report = {
'total_dois': len(dois),
'verified': [],
'failed': [],
'metadata': {}
}
for doi in dois:
print(f"Verifying DOI: {doi}")
is_valid, metadata = self.verify_doi(doi)
if is_valid:
report['verified'].append(doi)
report['metadata'][doi] = metadata
else:
report['failed'].append(doi)
time.sleep(0.5) # Rate limiting
return report
def format_citation_apa(self, metadata: Dict) -> str:
"""Format citation in APA style."""
authors = metadata.get('authors', '')
year = metadata.get('year', 'n.d.')
title = metadata.get('title', '')
journal = metadata.get('journal', '')
volume = metadata.get('volume', '')
pages = metadata.get('pages', '')
doi = metadata.get('doi', '')
citation = f"{authors} ({year}). {title}. "
if journal:
citation += f"*{journal}*"
if volume:
citation += f", *{volume}*"
if pages:
citation += f", {pages}"
if doi:
citation += f". https://doi.org/{doi}"
return citation
def format_citation_nature(self, metadata: Dict) -> str:
"""Format citation in Nature style."""
authors = metadata.get('authors', '')
title = metadata.get('title', '')
journal = metadata.get('journal', '')
volume = metadata.get('volume', '')
pages = metadata.get('pages', '')
year = metadata.get('year', '')
citation = f"{authors} {title}. "
if journal:
citation += f"*{journal}* "
if volume:
citation += f"**{volume}**, "
if pages:
citation += f"{pages} "
if year:
citation += f"({year})"
return citation
def main():
"""Example usage."""
import sys
if len(sys.argv) < 2:
print("Usage: python verify_citations.py <markdown_file>")
sys.exit(1)
filepath = sys.argv[1]
verifier = CitationVerifier()
print(f"Verifying citations in: {filepath}")
report = verifier.verify_citations_in_file(filepath)
print("\n" + "="*60)
print("CITATION VERIFICATION REPORT")
print("="*60)
print(f"\nTotal DOIs found: {report['total_dois']}")
print(f"Verified: {len(report['verified'])}")
print(f"Failed: {len(report['failed'])}")
if report['failed']:
print("\nFailed DOIs:")
for doi in report['failed']:
print(f" - {doi}")
if report['metadata']:
print("\n\nVerified Citations (APA format):")
for doi, metadata in report['metadata'].items():
citation = verifier.format_citation_apa(metadata)
print(f"\n{citation}")
# Save detailed report; splitext avoids overwriting inputs that do not end in .md
import os
base, _ = os.path.splitext(filepath)
output_file = f"{base}_citation_report.json"
with open(output_file, 'w', encoding='utf-8') as f:
json.dump(report, f, indent=2)
print(f"\n\nDetailed report saved to: {output_file}")
if __name__ == "__main__":
main()
---
name: scientific-visualization
description: Meta-skill for publication-ready figures. Use when creating journal submission figures requiring multi-panel layouts, significance annotations, error bars, colorblind-safe palettes, and specific journal formatting (Nature, Science, Cell). Orchestrates matplotlib/seaborn/plotly with publication styles. For quick exploration use seaborn or plotly directly.
license: MIT license
metadata:
skill-author: K-Dense Inc.
---
# Scientific Visualization
## Overview
Scientific visualization transforms data into clear, accurate figures for publication. Create journal-ready plots with multi-panel layouts, error bars, significance markers, and colorblind-safe palettes. Export as PDF/EPS/TIFF using matplotlib, seaborn, and plotly for manuscripts.
## When to Use This Skill
This skill should be used when:
- Creating plots or visualizations for scientific manuscripts
- Preparing figures for journal submission (Nature, Science, Cell, PLOS, etc.)
- Ensuring figures are colorblind-friendly and accessible
- Making multi-panel figures with consistent styling
- Exporting figures at correct resolution and format
- Following specific publication guidelines
- Improving existing figures to meet publication standards
- Creating figures that need to work in both color and grayscale
## Quick Start Guide
### Basic Publication-Quality Figure
```python
import matplotlib.pyplot as plt
import numpy as np
# Apply publication style (from scripts/style_presets.py)
from style_presets import apply_publication_style
apply_publication_style('default')
# Create figure with appropriate size (single column = 3.5 inches)
fig, ax = plt.subplots(figsize=(3.5, 2.5))
# Plot data
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')
# Proper labeling with units
ax.set_xlabel('Time (seconds)')
ax.set_ylabel('Amplitude (mV)')
ax.legend(frameon=False)
# Remove unnecessary spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Save in publication formats (from scripts/figure_export.py)
from figure_export import save_publication_figure
save_publication_figure(fig, 'figure1', formats=['pdf', 'png'], dpi=300)
```
### Using Pre-configured Styles
Apply journal-specific styles using the matplotlib style files in `assets/`:
```python
import matplotlib.pyplot as plt
# Option 1: Use style file directly
plt.style.use('assets/nature.mplstyle')
# Option 2: Use style_presets.py helper
from style_presets import configure_for_journal
configure_for_journal('nature', figure_width='single')
# Now create figures - they'll automatically match Nature specifications
fig, ax = plt.subplots()
# ... your plotting code ...
```
### Quick Start with Seaborn
For statistical plots, use seaborn with publication styling:
```python
import seaborn as sns
import matplotlib.pyplot as plt
from style_presets import apply_publication_style
# Apply publication style
apply_publication_style('default')
sns.set_theme(style='ticks', context='paper', font_scale=1.1)
sns.set_palette('colorblind')
# Create statistical comparison figure
fig, ax = plt.subplots(figsize=(3.5, 3))
sns.boxplot(data=df, x='treatment', y='response',
            order=['Control', 'Low', 'High'], ax=ax)  # colors come from the active palette
sns.stripplot(data=df, x='treatment', y='response',
order=['Control', 'Low', 'High'],
color='black', alpha=0.3, size=3, ax=ax)
ax.set_ylabel('Response (μM)')
sns.despine()
# Save figure
from figure_export import save_publication_figure
save_publication_figure(fig, 'treatment_comparison', formats=['pdf', 'png'], dpi=300)
```
## Core Principles and Best Practices
### 1. Resolution and File Format
**Critical requirements** (detailed in `references/publication_guidelines.md`):
- **Raster images** (photos, microscopy): 300-600 DPI
- **Line art** (graphs, plots): 600-1200 DPI or vector format
- **Vector formats** (preferred): PDF, EPS, SVG
- **Raster formats**: TIFF, PNG (never JPEG for scientific data)
**Implementation:**
```python
# Use the figure_export.py script for correct settings
from figure_export import save_publication_figure
# Saves in multiple formats with proper DPI
save_publication_figure(fig, 'myfigure', formats=['pdf', 'png'], dpi=300)
# Or save for specific journal requirements
from figure_export import save_for_journal
save_for_journal(fig, 'figure1', journal='nature', figure_type='combination')
```
### 2. Color Selection - Colorblind Accessibility
**Always use colorblind-friendly palettes** (detailed in `references/color_palettes.md`):
**Recommended: Okabe-Ito palette** (designed to stay distinguishable under the common forms of color vision deficiency):
```python
# Option 1: Use assets/color_palettes.py
from color_palettes import OKABE_ITO_LIST, apply_palette
apply_palette('okabe_ito')
# Option 2: Manual specification
okabe_ito = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7', '#000000']
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=okabe_ito)
```
**For heatmaps/continuous data:**
- Use perceptually uniform colormaps: `viridis`, `plasma`, `cividis`
- Avoid red-green diverging maps (use `PuOr`, `RdBu`, `BrBG` instead)
- Never use `jet` or `rainbow` colormaps
**Always test figures in grayscale** to ensure interpretability.
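A quick way to anticipate the grayscale test is to estimate each color's gray level numerically. The sketch below is illustrative only: `gray_level` and `closest_gray_pair` are hypothetical helpers (not part of `assets/color_palettes.py`), and the Rec. 601 luma weights are an assumed approximation of how a color converts to gray.

```python
# Sketch: estimate how palette colors separate once converted to grayscale.
# Assumption: Rec. 601 luma weights; printers and viewers may convert differently.
OKABE_ITO = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
             '#0072B2', '#D55E00', '#CC79A7', '#000000']


def gray_level(hex_color: str) -> float:
    """Return an approximate gray level in [0, 1] for a '#RRGGBB' color."""
    h = hex_color.lstrip('#')
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def closest_gray_pair(colors):
    """Return (gap, color_a, color_b) for the two colors nearest in gray level."""
    ranked = sorted(colors, key=gray_level)
    # After sorting, the smallest gap is always between two neighbors
    gaps = [(gray_level(b) - gray_level(a), a, b)
            for a, b in zip(ranked, ranked[1:])]
    return min(gaps)


if __name__ == '__main__':
    for color in sorted(OKABE_ITO, key=gray_level):
        print(f'{color}: {gray_level(color):.2f}')
    gap, a, b = closest_gray_pair(OKABE_ITO)
    print(f'Closest pair in grayscale: {a} and {b} (gap {gap:.2f})')
```

A small gap between two colors suggests they may merge in a grayscale print, which is where redundant encoding (markers, line styles) helps.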
### 3. Typography and Text
**Font guidelines** (detailed in `references/publication_guidelines.md`):
- Sans-serif fonts: Arial, Helvetica, Calibri
- Minimum sizes at **final print size**:
- Axis labels: 7-9 pt
- Tick labels: 6-8 pt
- Panel labels: 8-12 pt (bold)
- Sentence case for labels: "Time (hours)" not "TIME (HOURS)"
- Always include units in parentheses
**Implementation:**
```python
# Set fonts globally
import matplotlib as mpl
mpl.rcParams['font.family'] = 'sans-serif'
mpl.rcParams['font.sans-serif'] = ['Arial', 'Helvetica']
mpl.rcParams['font.size'] = 8
mpl.rcParams['axes.labelsize'] = 9
mpl.rcParams['xtick.labelsize'] = 7
mpl.rcParams['ytick.labelsize'] = 7
```
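Because the minimum sizes apply at final print size, a figure drawn wider than the column and then shrunk will print with smaller text than its `rcParams` suggest. The following sketch illustrates the arithmetic, assuming the figure is scaled uniformly; `effective_font_size` and `undersized_labels` are hypothetical helper names, and the 6 pt default is taken from the lower bound of the tick-label range above.

```python
def effective_font_size(font_pt: float, drawn_width_in: float,
                        final_width_in: float) -> float:
    """Font size in points after uniformly scaling a figure to its printed width."""
    if drawn_width_in <= 0 or final_width_in <= 0:
        raise ValueError('widths must be positive')
    return font_pt * final_width_in / drawn_width_in


def undersized_labels(font_sizes: dict, drawn_width_in: float,
                      final_width_in: float, minimum_pt: float = 6.0) -> dict:
    """Return the labels whose printed size falls below minimum_pt."""
    printed = {name: effective_font_size(pt, drawn_width_in, final_width_in)
               for name, pt in font_sizes.items()}
    return {name: pt for name, pt in printed.items() if pt < minimum_pt}


if __name__ == '__main__':
    sizes = {'axis label': 9, 'tick label': 7}
    # Drawn 7 in wide, printed in a 3.5 in column: every size is halved
    print(undersized_labels(sizes, drawn_width_in=7.0, final_width_in=3.5))
```

Drawing the figure at its final width from the start avoids this rescaling entirely.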
### 4. Figure Dimensions
**Journal-specific widths** (detailed in `references/journal_requirements.md`):
- **Nature**: Single 89 mm, Double 183 mm
- **Science**: Single 55 mm, Double 175 mm
- **Cell**: Single 85 mm, Double 178 mm
**Check figure size compliance:**
```python
from figure_export import check_figure_size
fig = plt.figure(figsize=(3.5, 3)) # 89 mm for Nature
check_figure_size(fig, journal='nature')
```
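Matplotlib sizes figures in inches while journals specify widths in millimetres, so a small conversion step is useful. The sketch below is an illustration under stated assumptions, not the implementation of `check_figure_size`: the helper names, the 0.75 aspect ratio, and the 0.5 mm tolerance are hypothetical, and the widths are copied from the list above.

```python
MM_PER_INCH = 25.4

# Column widths in mm, copied from the list above
JOURNAL_WIDTHS_MM = {
    'nature': {'single': 89, 'double': 183},
    'science': {'single': 55, 'double': 175},
    'cell': {'single': 85, 'double': 178},
}


def mm_to_inches(mm: float) -> float:
    """Convert millimetres to inches."""
    return mm / MM_PER_INCH


def figsize_for(journal: str, column: str = 'single', aspect: float = 0.75):
    """Return a (width, height) tuple in inches for plt.subplots(figsize=...)."""
    width = mm_to_inches(JOURNAL_WIDTHS_MM[journal][column])
    return (width, width * aspect)


def fits_column(width_in: float, journal: str, column: str = 'single',
                tolerance_mm: float = 0.5) -> bool:
    """Check whether a figure width in inches fits the journal column."""
    limit_mm = JOURNAL_WIDTHS_MM[journal][column]
    return width_in * MM_PER_INCH <= limit_mm + tolerance_mm


if __name__ == '__main__':
    print(figsize_for('nature'))          # width is 89 mm expressed in inches
    print(fits_column(3.5, 'nature'))     # 3.5 in is 88.9 mm
    print(fits_column(3.5, 'science'))
```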
### 5. Multi-Panel Figures
**Best practices:**
- Label panels with bold letters: **A**, **B**, **C** (uppercase for most journals, lowercase for Nature)
- Maintain consistent styling across all panels
- Align panels along edges where possible
- Use adequate white space between panels
**Example implementation** (see `references/matplotlib_examples.md` for complete code):
```python
from string import ascii_uppercase
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(7, 4))
gs = fig.add_gridspec(2, 2, hspace=0.4, wspace=0.4)
ax1 = fig.add_subplot(gs[0, 0])
ax2 = fig.add_subplot(gs[0, 1])
# ... create other panels ...
# Add panel labels to every axes, in creation order
for label, ax in zip(ascii_uppercase, fig.axes):
    ax.text(-0.15, 1.05, label, transform=ax.transAxes,
            fontsize=10, fontweight='bold', va='top')
```
## Common Tasks
### Task 1: Create a Publication-Ready Line Plot
See `references/matplotlib_examples.md` Example 1 for complete code.
**Key steps:**
1. Apply publication style
2. Set appropriate figure size for target journal
3. Use colorblind-friendly colors
4. Add error bars with correct representation (SEM, SD, or CI)
5. Label axes with units
6. Remove unnecessary spines
7. Save in vector format
**Using seaborn for automatic confidence intervals:**
```python
import seaborn as sns
fig, ax = plt.subplots(figsize=(5, 3))
sns.lineplot(data=timeseries, x='time', y='measurement',
hue='treatment', errorbar=('ci', 95),
markers=True, ax=ax)
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Measurement (AU)')
sns.despine()
```
### Task 2: Create a Multi-Panel Figure
See `references/matplotlib_examples.md` Example 2 for complete code.
**Key steps:**
1. Use `GridSpec` for flexible layout
2. Ensure consistent styling across panels
3. Add bold panel labels (A, B, C, etc.)
4. Align related panels
5. Verify all text is readable at final size
### Task 3: Create a Heatmap with Proper Colormap
See `references/matplotlib_examples.md` Example 4 for complete code.
**Key steps:**
1. Use perceptually uniform colormap (`viridis`, `plasma`, `cividis`)
2. Include labeled colorbar
3. For diverging data, use colorblind-safe diverging map (`RdBu_r`, `PuOr`)
4. Set appropriate center value for diverging maps
5. Test appearance in grayscale
**Using seaborn for correlation matrices:**
```python
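import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5, 4))
corr = df.corr(numeric_only=True)  # ignore non-numeric columns such as 'treatment'
mask = np.triu(np.ones_like(corr, dtype=bool))  # hide the upper triangle and diagonal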
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
cmap='RdBu_r', center=0, square=True,
linewidths=1, cbar_kws={'shrink': 0.8}, ax=ax)
```
### Task 4: Prepare Figure for Specific Journal
**Workflow:**
1. Check journal requirements: `references/journal_requirements.md`
2. Configure matplotlib for journal:
```python
from style_presets import configure_for_journal
configure_for_journal('nature', figure_width='single')
```
3. Create figure (will auto-size correctly)
4. Export with journal specifications:
```python
from figure_export import save_for_journal
save_for_journal(fig, 'figure1', journal='nature', figure_type='line_art')
```
### Task 5: Fix an Existing Figure to Meet Publication Standards
**Checklist approach** (full checklist in `references/publication_guidelines.md`):
1. **Check resolution**: Verify DPI meets journal requirements
2. **Check file format**: Use vector for plots, TIFF/PNG for images
3. **Check colors**: Ensure colorblind-friendly
4. **Check fonts**: Minimum 6-7 pt at final size, sans-serif
5. **Check labels**: All axes labeled with units
6. **Check size**: Matches journal column width
7. **Test grayscale**: Figure interpretable without color
8. **Remove chart junk**: No unnecessary grids, 3D effects, shadows
### Task 6: Create Colorblind-Friendly Visualizations
**Strategy:**
1. Use approved palettes from `assets/color_palettes.py`
2. Add redundant encoding (line styles, markers, patterns)
3. Test with colorblind simulator
4. Ensure grayscale compatibility
**Example:**
```python
from color_palettes import apply_palette
import matplotlib.pyplot as plt
apply_palette('okabe_ito')
# Add redundant encoding beyond color
line_styles = ['-', '--', '-.', ':']
markers = ['o', 's', '^', 'v']
# datasets is assumed to be an iterable of (y_values, label) pairs sharing the same x
for i, (data, label) in enumerate(datasets):
    plt.plot(x, data, linestyle=line_styles[i % 4],
             marker=markers[i % 4], label=label)
```
## Statistical Rigor
**Always include:**
- Error bars (SD, SEM, or CI - specify which in caption)
- Sample size (n) in figure or caption
- Statistical significance markers (`*`, `**`, `***`)
- Individual data points when possible (not just summary statistics)
**Example with statistics:**
```python
# Show individual points with summary statistics
ax.scatter(x_jittered, individual_points, alpha=0.4, s=8)
ax.errorbar(x, means, yerr=sems, fmt='o', capsize=3)
# Mark significance
ax.text(1.5, max_y * 1.1, '***', ha='center', fontsize=8)
```
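The quantities plotted above (`means`, `sems`) and the significance marker can be derived as in the following sketch. It uses only the standard library; `summarize` and `significance_marker` are hypothetical helper names, and the thresholds 0.05 / 0.01 / 0.001 are a common convention assumed here, so state the ones actually used in the caption.

```python
import math
import statistics


def summarize(values):
    """Return n, mean, sample SD, and SEM for one group of observations."""
    n = len(values)
    if n < 2:
        raise ValueError('need at least two observations')
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)           # sample SD (n - 1 denominator)
    return {'n': n, 'mean': mean, 'sd': sd, 'sem': sd / math.sqrt(n)}


def significance_marker(p_value: float,
                        thresholds=((0.001, '***'), (0.01, '**'), (0.05, '*'))) -> str:
    """Map a p-value to an asterisk marker; 'ns' if above every threshold."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError('p_value must be between 0 and 1')
    for cutoff, marker in thresholds:       # strictest threshold first
        if p_value < cutoff:
            return marker
    return 'ns'


if __name__ == '__main__':
    control = [2.0, 4.0, 4.0, 6.0]
    stats = summarize(control)
    print(f"n={stats['n']}, mean={stats['mean']:.2f}, "
          f"SD={stats['sd']:.2f}, SEM={stats['sem']:.2f}")
    print(significance_marker(0.004))
```

Reporting `n` alongside the chosen error measure lets readers convert between SD and SEM themselves.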
## Working with Different Plotting Libraries
### Matplotlib
- Most control over publication details
- Best for complex multi-panel figures
- Use provided style files for consistent formatting
- See `references/matplotlib_examples.md` for extensive examples
### Seaborn
Seaborn provides a high-level, dataset-oriented interface for statistical graphics, built on matplotlib. It excels at creating publication-quality statistical visualizations with minimal code while maintaining full compatibility with matplotlib customization.
**Key advantages for scientific visualization:**
- Automatic statistical estimation and confidence intervals
- Built-in support for multi-panel figures (faceting)
- Colorblind-friendly palettes by default
- Dataset-oriented API using pandas DataFrames
- Semantic mapping of variables to visual properties
#### Quick Start with Publication Style
Always apply matplotlib publication styles first, then configure seaborn:
```python
import seaborn as sns
import matplotlib.pyplot as plt
from style_presets import apply_publication_style
# Apply publication style
apply_publication_style('default')
# Configure seaborn for publication
sns.set_theme(style='ticks', context='paper', font_scale=1.1)
sns.set_palette('colorblind') # Use colorblind-safe palette
# Create figure
fig, ax = plt.subplots(figsize=(3.5, 2.5))
sns.scatterplot(data=df, x='time', y='response',
hue='treatment', style='condition', ax=ax)
sns.despine() # Remove top and right spines
```
#### Common Plot Types for Publications
**Statistical comparisons:**
```python
# Box plot with individual points for transparency
fig, ax = plt.subplots(figsize=(3.5, 3))
sns.boxplot(data=df, x='treatment', y='response',
            order=['Control', 'Low', 'High'], ax=ax)  # colors come from the active palette
sns.stripplot(data=df, x='treatment', y='response',
order=['Control', 'Low', 'High'],
color='black', alpha=0.3, size=3, ax=ax)
ax.set_ylabel('Response (μM)')
sns.despine()
```
**Distribution analysis:**
```python
# Violin plot with split comparison
fig, ax = plt.subplots(figsize=(4, 3))
sns.violinplot(data=df, x='timepoint', y='expression',
hue='treatment', split=True, inner='quartile', ax=ax)
ax.set_ylabel('Gene Expression (AU)')
sns.despine()
```
**Correlation matrices:**
```python
# Heatmap with proper colormap and annotations
fig, ax = plt.subplots(figsize=(5, 4))
corr = df.corr(numeric_only=True)  # ignore non-numeric columns such as 'treatment'
mask = np.triu(np.ones_like(corr, dtype=bool)) # Show only lower triangle
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
cmap='RdBu_r', center=0, square=True,
linewidths=1, cbar_kws={'shrink': 0.8}, ax=ax)
plt.tight_layout()
```
**Time series with confidence bands:**
```python
# Line plot with automatic CI calculation
fig, ax = plt.subplots(figsize=(5, 3))
sns.lineplot(data=timeseries, x='time', y='measurement',
hue='treatment', style='replicate',
errorbar=('ci', 95), markers=True, dashes=False, ax=ax)
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Measurement (AU)')
sns.despine()
```
#### Multi-Panel Figures with Seaborn
**Using `relplot` (which returns a `FacetGrid`) for automatic faceting:**
```python
# Create faceted plot
g = sns.relplot(data=df, x='dose', y='response',
hue='treatment', col='cell_line', row='timepoint',
kind='line', height=2.5, aspect=1.2,
errorbar=('ci', 95), markers=True)
g.set_axis_labels('Dose (μM)', 'Response (AU)')
g.set_titles('{row_name} | {col_name}')
sns.despine()
# Save with correct DPI
from figure_export import save_publication_figure
save_publication_figure(g.figure, 'figure_facets',
formats=['pdf', 'png'], dpi=300)
```
**Combining seaborn with matplotlib subplots:**
```python
# Create custom multi-panel layout
fig, axes = plt.subplots(2, 2, figsize=(7, 6))
# Panel A: Scatter with regression
sns.regplot(data=df, x='predictor', y='response', ax=axes[0, 0])
axes[0, 0].text(-0.15, 1.05, 'A', transform=axes[0, 0].transAxes,
fontsize=10, fontweight='bold')
# Panel B: Distribution comparison
sns.violinplot(data=df, x='group', y='value', ax=axes[0, 1])
axes[0, 1].text(-0.15, 1.05, 'B', transform=axes[0, 1].transAxes,
fontsize=10, fontweight='bold')
# Panel C: Heatmap
sns.heatmap(correlation_data, cmap='viridis', ax=axes[1, 0])
axes[1, 0].text(-0.15, 1.05, 'C', transform=axes[1, 0].transAxes,
fontsize=10, fontweight='bold')
# Panel D: Time series
sns.lineplot(data=timeseries, x='time', y='signal',
hue='condition', ax=axes[1, 1])
axes[1, 1].text(-0.15, 1.05, 'D', transform=axes[1, 1].transAxes,
fontsize=10, fontweight='bold')
plt.tight_layout()
sns.despine()
```
#### Color Palettes for Publications
Seaborn includes several colorblind-safe palettes:
```python
# Use built-in colorblind palette (recommended)
sns.set_palette('colorblind')
# Or specify custom colorblind-safe colors (Okabe-Ito)
okabe_ito = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7', '#000000']
sns.set_palette(okabe_ito)
# For heatmaps and continuous data
sns.heatmap(data, cmap='viridis') # Perceptually uniform
sns.heatmap(corr, cmap='RdBu_r', center=0) # Diverging, centered
```
#### Choosing Between Axes-Level and Figure-Level Functions
**Axes-level functions** (e.g., `scatterplot`, `boxplot`, `heatmap`):
- Use when building custom multi-panel layouts
- Accept `ax=` parameter for precise placement
- Better integration with matplotlib subplots
- More control over figure composition
```python
fig, ax = plt.subplots(figsize=(3.5, 2.5))
sns.scatterplot(data=df, x='x', y='y', hue='group', ax=ax)
```
**Figure-level functions** (e.g., `relplot`, `catplot`, `displot`):
- Use for automatic faceting by categorical variables
- Create complete figures with consistent styling
- Great for exploratory analysis
- Use `height` and `aspect` for sizing
```python
g = sns.relplot(data=df, x='x', y='y', col='category', kind='scatter')
```
#### Statistical Rigor with Seaborn
Seaborn automatically computes and displays uncertainty:
```python
# Line plot: shows mean ± 95% CI by default
sns.lineplot(data=df, x='time', y='value', hue='treatment',
errorbar=('ci', 95)) # Can change to 'sd', 'se', etc.
# Bar plot: shows mean with bootstrapped CI
sns.barplot(data=df, x='treatment', y='response',
errorbar=('ci', 95), capsize=0.1)
# Always specify error type in figure caption:
# "Error bars represent 95% confidence intervals"
```
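To make the caption statement concrete, the sketch below shows the idea behind a bootstrapped confidence interval for a mean, using a plain percentile bootstrap. It is a simplified illustration and not seaborn's implementation, which may differ in details such as interpolation; `bootstrap_ci`, its defaults, and the sample data are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, level=95, n_boot=1000, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    if not values:
        raise ValueError('values must not be empty')
    rng = random.Random(seed)               # fixed seed for reproducible figures
    # Resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    tail = (100 - level) / 200              # fraction cut from each tail
    lower = means[int(tail * (n_boot - 1))]
    upper = means[int((1 - tail) * (n_boot - 1))]
    return lower, upper


if __name__ == '__main__':
    response = [4.1, 5.0, 5.6, 4.8, 5.3, 4.9, 5.1, 4.7]   # made-up example values
    low, high = bootstrap_ci(response)
    print(f'mean={statistics.fmean(response):.2f}, 95% CI=({low:.2f}, {high:.2f})')
```

With few observations the interval can be unstable, which is one more reason to show individual data points alongside it.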
#### Best Practices for Publication-Ready Seaborn Figures
1. **Always set publication theme first:**
```python
sns.set_theme(style='ticks', context='paper', font_scale=1.1)
```
2. **Use colorblind-safe palettes:**
```python
sns.set_palette('colorblind')
```
3. **Remove unnecessary elements:**
```python
sns.despine() # Remove top and right spines
```
4. **Control figure size appropriately:**
```python
# Axes-level: use matplotlib figsize
fig, ax = plt.subplots(figsize=(3.5, 2.5))
# Figure-level: use height and aspect
g = sns.relplot(..., height=3, aspect=1.2)
```
5. **Show individual data points when possible:**
```python
sns.boxplot(...) # Summary statistics
sns.stripplot(..., alpha=0.3) # Individual points
```
6. **Include proper labels with units:**
```python
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Expression (AU)')
```
7. **Export at correct resolution:**
```python
from figure_export import save_publication_figure
save_publication_figure(fig, 'figure_name',
formats=['pdf', 'png'], dpi=300)
```
#### Advanced Seaborn Techniques
**Pairwise relationships for exploratory analysis:**
```python
# Quick overview of all relationships
g = sns.pairplot(data=df, hue='condition',
vars=['gene1', 'gene2', 'gene3'],
corner=True, diag_kind='kde', height=2)
```
**Hierarchical clustering heatmap:**
```python
# Cluster samples and features
g = sns.clustermap(expression_data, method='ward',
metric='euclidean', z_score=0,
cmap='RdBu_r', center=0,
figsize=(10, 8),
row_colors=condition_colors,
cbar_kws={'label': 'Z-score'})
```
**Joint distributions with marginals:**
```python
# Bivariate distribution with context
# With hue set, the marginal axes show density curves, so no kde option is passed
g = sns.jointplot(data=df, x='gene1', y='gene2',
                  hue='treatment', kind='scatter',
                  height=6, ratio=4)
```
#### Common Seaborn Issues and Solutions
**Issue: Legend outside plot area**
```python
g = sns.relplot(...)
sns.move_legend(g, 'center right', bbox_to_anchor=(0.9, 0.5))  # public API, seaborn >= 0.11.2
```
**Issue: Overlapping labels**
```python
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
```
**Issue: Text too small at final size**
```python
sns.set_context('paper', font_scale=1.2) # Increase if needed
```
#### Additional Resources
For more detailed seaborn information, see:
- `scientific-packages/seaborn/SKILL.md` - Comprehensive seaborn documentation
- `scientific-packages/seaborn/references/examples.md` - Practical use cases
- `scientific-packages/seaborn/references/function_reference.md` - Complete API reference
- `scientific-packages/seaborn/references/objects_interface.md` - Modern declarative API
### Plotly
- Interactive figures for exploration
- Export static images for publication
- Configure for publication quality:
```python
fig.update_layout(
font=dict(family='Arial, sans-serif', size=10),
plot_bgcolor='white',
# ... see matplotlib_examples.md Example 8
)
fig.write_image('figure.png', scale=3)  # needs kaleido; scale=3 triples pixel dimensions (~300 DPI at 100 px per inch)
```
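The `scale` argument multiplies pixel dimensions, so the resulting DPI depends on the printed width. A rough way to choose it is sketched below; `export_pixels` and `plotly_scale` are hypothetical helpers, and the 700 px default layout width is an assumption about the figure being exported, so pass the actual `layout.width` if it has been set.

```python
MM_PER_INCH = 25.4


def export_pixels(width_mm: float, dpi: int) -> int:
    """Pixel width needed to print width_mm at the given DPI."""
    return round(width_mm / MM_PER_INCH * dpi)


def plotly_scale(width_mm: float, dpi: int, layout_width_px: int = 700) -> float:
    """Scale factor that brings layout_width_px up to the target pixel width."""
    if layout_width_px <= 0:
        raise ValueError('layout_width_px must be positive')
    return export_pixels(width_mm, dpi) / layout_width_px


if __name__ == '__main__':
    # Double-column width of 183 mm at 300 DPI
    print(export_pixels(183, 300))
    print(round(plotly_scale(183, 300), 2))
```

The returned value can be passed as `scale=` to `fig.write_image`; text and line widths scale with it, so check legibility at the final print size.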
## Resources
### References Directory
**Load these as needed for detailed information:**
- **`publication_guidelines.md`**: Comprehensive best practices
- Resolution and file format requirements
- Typography guidelines
- Layout and composition rules
- Statistical rigor requirements
- Complete publication checklist
- **`color_palettes.md`**: Color usage guide
- Colorblind-friendly palette specifications with RGB values
- Sequential and diverging colormap recommendations
- Testing procedures for accessibility
- Domain-specific palettes (genomics, microscopy)
- **`journal_requirements.md`**: Journal-specific specifications
- Technical requirements by publisher
- File format and DPI specifications
- Figure dimension requirements
- Quick reference table
- **`matplotlib_examples.md`**: Practical code examples
- 10 complete working examples
- Line plots, bar plots, heatmaps, multi-panel figures
- Journal-specific figure examples
- Tips for each library (matplotlib, seaborn, plotly)
### Scripts Directory
**Use these helper scripts for automation:**
- **`figure_export.py`**: Export utilities
- `save_publication_figure()`: Save in multiple formats with correct DPI
- `save_for_journal()`: Use journal-specific requirements automatically
- `check_figure_size()`: Verify dimensions meet journal specs
- Run directly: `python scripts/figure_export.py` for examples
- **`style_presets.py`**: Pre-configured styles
- `apply_publication_style()`: Apply preset styles (default, nature, science, cell)
- `set_color_palette()`: Quick palette switching
- `configure_for_journal()`: One-command journal configuration
- Run directly: `python scripts/style_presets.py` to see examples
### Assets Directory
**Use these files in figures:**
- **`color_palettes.py`**: Importable color definitions
- All recommended palettes as Python constants
- `apply_palette()` helper function
- Can be imported directly into notebooks/scripts
- **Matplotlib style files**: Use with `plt.style.use()`
- `publication.mplstyle`: General publication quality
- `nature.mplstyle`: Nature journal specifications
- `presentation.mplstyle`: Larger fonts for posters/slides
## Workflow Summary
**Recommended workflow for creating publication figures:**
1. **Plan**: Determine target journal, figure type, and content
2. **Configure**: Apply appropriate style for journal
```python
from style_presets import configure_for_journal
configure_for_journal('nature', 'single')
```
3. **Create**: Build figure with proper labels, colors, statistics
4. **Verify**: Check size, fonts, colors, accessibility
```python
from figure_export import check_figure_size
check_figure_size(fig, journal='nature')
```
5. **Export**: Save in required formats
```python
from figure_export import save_for_journal
save_for_journal(fig, 'figure1', 'nature', 'combination')
```
6. **Review**: View at final size in manuscript context
## Common Pitfalls to Avoid
1. **Font too small**: Text unreadable when printed at final size
2. **JPEG format**: Never use JPEG for graphs/plots (creates artifacts)
3. **Red-green colors**: ~8% of males have a red-green color vision deficiency and may not tell them apart
4. **Low resolution**: Pixelated figures in publication
5. **Missing units**: Always label axes with units
6. **3D effects**: Distorts perception, avoid completely
7. **Chart junk**: Remove unnecessary gridlines, decorations
8. **Truncated axes**: Start bar charts at zero unless scientifically justified
9. **Inconsistent styling**: Different fonts/colors across figures in same manuscript
10. **No error bars**: Always show uncertainty
## Final Checklist
Before submitting figures, verify:
- [ ] Resolution meets journal requirements (300+ DPI)
- [ ] File format is correct (vector for plots, TIFF for images)
- [ ] Figure size matches journal specifications
- [ ] All text readable at final size (≥6 pt)
- [ ] Colors are colorblind-friendly
- [ ] Figure works in grayscale
- [ ] All axes labeled with units
- [ ] Error bars present with definition in caption
- [ ] Panel labels present and consistent
- [ ] No chart junk or 3D effects
- [ ] Fonts consistent across all figures
- [ ] Statistical significance clearly marked
- [ ] Legend is clear and complete
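The "text readable at final size" item can be partly automated. The sketch below walks every text artist in a Matplotlib figure and reports those below a minimum point size. The function name `find_small_text` and the 6 pt default are illustrative choices taken from the checklist, not part of the bundled scripts.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.text import Text

def find_small_text(fig, min_pt=6.0):
    """Return (text, size) pairs for visible, non-empty text below min_pt."""
    small = []
    for artist in fig.findobj(Text):
        label = artist.get_text().strip()
        if label and artist.get_visible() and artist.get_fontsize() < min_pt:
            small.append((label, artist.get_fontsize()))
    return small

fig, ax = plt.subplots(figsize=(3.5, 2.5))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("Time (s)", fontsize=8)
ax.set_ylabel("Signal (a.u.)", fontsize=5)  # deliberately too small
fig.canvas.draw()  # populate tick labels before inspecting them
problems = find_small_text(fig)
print(problems)
```

This only checks nominal font sizes. If the figure is later rescaled in the manuscript, the effective size changes, so the check is most meaningful when `figsize` already matches the final printed dimensions.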
Use this skill to ensure scientific figures meet the highest publication standards while remaining accessible to all readers.
FILE:assets/color_palettes.py
"""
Colorblind-Friendly Color Palettes for Scientific Visualization
This module provides carefully curated color palettes optimized for
scientific publications and accessibility.
Usage:
from color_palettes import OKABE_ITO, apply_palette
import matplotlib.pyplot as plt
apply_palette('okabe_ito')
plt.plot([1, 2, 3], [1, 4, 9])
"""
# Okabe-Ito Palette (2008)
# The most widely recommended colorblind-friendly palette
OKABE_ITO = {
'orange': '#E69F00',
'sky_blue': '#56B4E9',
'bluish_green': '#009E73',
'yellow': '#F0E442',
'blue': '#0072B2',
'vermillion': '#D55E00',
'reddish_purple': '#CC79A7',
'black': '#000000'
}
OKABE_ITO_LIST = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7', '#000000']
# Wong Palette (Nature Methods)
WONG = ['#000000', '#E69F00', '#56B4E9', '#009E73',
'#F0E442', '#0072B2', '#D55E00', '#CC79A7']
# Paul Tol Palettes (https://personal.sron.nl/~pault/)
TOL_BRIGHT = ['#4477AA', '#EE6677', '#228833', '#CCBB44',
'#66CCEE', '#AA3377', '#BBBBBB']
TOL_MUTED = ['#332288', '#88CCEE', '#44AA99', '#117733',
'#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499']
TOL_LIGHT = ['#77AADD', '#EE8866', '#EEDD88', '#FFAABB',
'#99DDFF', '#44BB99', '#BBCC33', '#AAAA00', '#DDDDDD']
TOL_HIGH_CONTRAST = ['#004488', '#DDAA33', '#BB5566']
# Sequential colormaps (for continuous data)
SEQUENTIAL_COLORMAPS = [
'viridis', # Default, perceptually uniform
'plasma', # Perceptually uniform
'inferno', # Perceptually uniform
'magma', # Perceptually uniform
'cividis', # Optimized for colorblind viewers
'YlOrRd', # Yellow-Orange-Red
'YlGnBu', # Yellow-Green-Blue
'Blues', # Single hue
'Greens', # Single hue
'Purples', # Single hue
]
# Diverging colormaps (for data with meaningful center)
DIVERGING_COLORMAPS_SAFE = [
'RdYlBu', # Red-Yellow-Blue (reversed is common)
'RdBu', # Red-Blue
'PuOr', # Purple-Orange (excellent for colorblind)
'BrBG', # Brown-Blue-Green (good for colorblind)
'PRGn', # Purple-Green (use with caution)
'PiYG', # Pink-Yellow-Green (use with caution)
]
# Diverging colormaps to AVOID (red-green combinations)
DIVERGING_COLORMAPS_AVOID = [
'RdGn',  # Red-Green (problematic; not a built-in Matplotlib colormap, listed as a warning for custom maps)
'RdYlGn', # Red-Yellow-Green (problematic!)
]
# Fluorophore colors (traditional - use with caution)
FLUOROPHORES_TRADITIONAL = {
'DAPI': '#0000FF', # Blue
'GFP': '#00FF00', # Green (problematic for colorblind)
'RFP': '#FF0000', # Red
'Cy5': '#FF00FF', # Magenta
'YFP': '#FFFF00', # Yellow
}
# Fluorophore colors (colorblind-friendly alternatives)
FLUOROPHORES_ACCESSIBLE = {
'Channel1': '#0072B2', # Blue
'Channel2': '#E69F00', # Orange (instead of green)
'Channel3': '#D55E00', # Vermillion (instead of red)
'Channel4': '#CC79A7', # Magenta
'Channel5': '#F0E442', # Yellow
}
# Genomics/Bioinformatics
DNA_BASES = {
'A': '#00CC00', # Green
'C': '#0000CC', # Blue
'G': '#FFB300', # Orange
'T': '#CC0000', # Red
}
DNA_BASES_ACCESSIBLE = {
'A': '#009E73', # Bluish Green
'C': '#0072B2', # Blue
'G': '#E69F00', # Orange
'T': '#D55E00', # Vermillion
}
def apply_palette(palette_name='okabe_ito'):
    """
    Apply a color palette to matplotlib's default color cycle.
    Parameters
    ----------
    palette_name : str
        Name of the palette to apply. Options:
        'okabe_ito', 'wong', 'tol_bright', 'tol_muted',
        'tol_light', 'tol_high_contrast'
    Returns
    -------
    list
        List of colors in the palette
    Examples
    --------
    >>> apply_palette('okabe_ito')
    >>> plt.plot([1, 2, 3], [1, 4, 9])  # Uses Okabe-Ito colors
    """
    try:
        import matplotlib.pyplot as plt
    except ImportError:
        print("matplotlib not installed")
        return None
    palettes = {
        'okabe_ito': OKABE_ITO_LIST,
        'wong': WONG,
        'tol_bright': TOL_BRIGHT,
        'tol_muted': TOL_MUTED,
        'tol_light': TOL_LIGHT,
        'tol_high_contrast': TOL_HIGH_CONTRAST,
    }
    if palette_name not in palettes:
        available = ', '.join(palettes.keys())
        raise ValueError(f"Palette '{palette_name}' not found. Available: {available}")
    colors = palettes[palette_name]
    plt.rcParams['axes.prop_cycle'] = plt.cycler(color=colors)
    return colors
def get_palette(palette_name='okabe_ito'):
    """
    Get a color palette as a list.
    Parameters
    ----------
    palette_name : str
        Name of the palette
    Returns
    -------
    list
        List of color hex codes
    """
    palettes = {
        'okabe_ito': OKABE_ITO_LIST,
        'wong': WONG,
        'tol_bright': TOL_BRIGHT,
        'tol_muted': TOL_MUTED,
        'tol_light': TOL_LIGHT,
        'tol_high_contrast': TOL_HIGH_CONTRAST,
    }
    if palette_name not in palettes:
        available = ', '.join(palettes.keys())
        raise ValueError(f"Palette '{palette_name}' not found. Available: {available}")
    return palettes[palette_name]
if __name__ == "__main__":
    print("Available colorblind-friendly palettes:")
    print(f" - Okabe-Ito: {len(OKABE_ITO_LIST)} colors")
    print(f" - Wong: {len(WONG)} colors")
    print(f" - Tol Bright: {len(TOL_BRIGHT)} colors")
    print(f" - Tol Muted: {len(TOL_MUTED)} colors")
    print(f" - Tol Light: {len(TOL_LIGHT)} colors")
    print(f" - Tol High Contrast: {len(TOL_HIGH_CONTRAST)} colors")
    print("\nOkabe-Ito palette (most recommended):")
    for name, color in OKABE_ITO.items():
        print(f" {name:15s}: {color}")
FILE:references/color_palettes.md
# Scientific Color Palettes and Guidelines
## Overview
Color choice in scientific visualization is critical for accessibility, clarity, and accurate data representation. This reference provides colorblind-friendly palettes and best practices for color usage.
## Colorblind-Friendly Palettes
### Okabe-Ito Palette (Recommended for Categories)
The Okabe-Ito palette is designed so that its colors remain distinguishable for people with the common forms of color vision deficiency.
```python
# Okabe-Ito colors (RGB values)
okabe_ito = {
'orange': '#E69F00', # RGB: (230, 159, 0)
'sky_blue': '#56B4E9', # RGB: (86, 180, 233)
'bluish_green': '#009E73', # RGB: (0, 158, 115)
'yellow': '#F0E442', # RGB: (240, 228, 66)
'blue': '#0072B2', # RGB: (0, 114, 178)
'vermillion': '#D55E00', # RGB: (213, 94, 0)
'reddish_purple': '#CC79A7', # RGB: (204, 121, 167)
'black': '#000000' # RGB: (0, 0, 0)
}
```
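The RGB values in the comments above follow directly from the hex codes. A small sketch of the conversion, useful when a tool expects RGB tuples instead of hex strings (the helper names are illustrative, not part of `color_palettes.py`):

```python
def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    value = hex_color.lstrip('#')
    if len(value) != 6:
        raise ValueError(f"Expected '#RRGGBB', got {hex_color!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))

def hex_to_rgb_float(hex_color):
    """Same color as 0-1 floats, the form Matplotlib accepts as an RGB tuple."""
    return tuple(channel / 255 for channel in hex_to_rgb(hex_color))

print(hex_to_rgb('#E69F00'))  # (230, 159, 0), matching the orange entry above
```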
**Usage in Matplotlib:**
```python
import matplotlib.pyplot as plt
colors = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7', '#000000']
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=colors)
```
**Usage in Seaborn:**
```python
import seaborn as sns
okabe_ito_palette = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7']
sns.set_palette(okabe_ito_palette)
```
**Usage in Plotly:**
```python
import plotly.graph_objects as go
okabe_ito_plotly = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7']
fig = go.Figure()
# Apply as the default discrete color sequence for all traces in this figure
fig.update_layout(colorway=okabe_ito_plotly)
```
### Wong Palette (Alternative for Categories)
The same eight colors as the Okabe-Ito palette, ordered with black first, as presented by Bang Wong (Nature Methods).
```python
wong_palette = {
'black': '#000000',
'orange': '#E69F00',
'sky_blue': '#56B4E9',
'green': '#009E73',
'yellow': '#F0E442',
'blue': '#0072B2',
'vermillion': '#D55E00',
'purple': '#CC79A7'
}
```
### Paul Tol Palettes
Paul Tol has designed multiple scientifically-optimized palettes for different use cases.
**Bright Palette (up to 7 categories):**
```python
tol_bright = ['#4477AA', '#EE6677', '#228833', '#CCBB44',
'#66CCEE', '#AA3377', '#BBBBBB']
```
**Muted Palette (up to 9 categories):**
```python
tol_muted = ['#332288', '#88CCEE', '#44AA99', '#117733',
'#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499']
```
**High Contrast (3 categories only):**
```python
tol_high_contrast = ['#004488', '#DDAA33', '#BB5566']
```
## Sequential Colormaps (Continuous Data)
Sequential colormaps represent data from low to high values through a steady change in lightness, using either a single hue (e.g., Blues) or several hues (e.g., viridis).
### Perceptually Uniform Colormaps
These colormaps have uniform perceptual change across the color scale.
**Viridis (default in Matplotlib):**
- Colorblind-friendly
- Prints well in grayscale
- Perceptually uniform
```python
plt.imshow(data, cmap='viridis')
```
**Cividis:**
- Optimized for colorblind viewers
- Designed specifically for deuteranopia/protanopia
```python
plt.imshow(data, cmap='cividis')
```
**Plasma, Inferno, Magma:**
- Perceptually uniform alternatives to viridis
- Good for different aesthetic preferences
```python
plt.imshow(data, cmap='plasma')
```
### When to Use Sequential Maps
- Heatmaps showing intensity
- Geographic elevation data
- Probability distributions
- Any single-variable continuous data (low → high)
## Diverging Colormaps (Negative to Positive)
Diverging colormaps have a neutral middle color with two contrasting colors at extremes.
### Colorblind-Safe Diverging Maps
**RdYlBu (Red-Yellow-Blue):**
```python
plt.imshow(data, cmap='RdYlBu_r') # _r reverses: blue (low) to red (high)
```
**PuOr (Purple-Orange):**
- Excellent for colorblind viewers
```python
plt.imshow(data, cmap='PuOr')
```
**BrBG (Brown-Blue-Green):**
- Good colorblind accessibility
```python
plt.imshow(data, cmap='BrBG')
```
### Avoid These Diverging Maps
- **RdGn (Red-Green)**: Problematic for red-green colorblindness; not a built-in Matplotlib colormap, but the same applies to any custom red-green map
- **RdYlGn (Red-Yellow-Green)**: Same issue
### When to Use Diverging Maps
- Correlation matrices
- Change/difference data (positive vs. negative)
- Deviation from a central value
- Temperature anomalies
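A diverging map only reads correctly when its neutral midpoint sits on the meaningful center of the data. One simple way to guarantee that is to use color limits that are symmetric around the center. A minimal sketch, with a hypothetical helper name and made-up fold-change values:

```python
def symmetric_limits(values, center=0.0):
    """Return (vmin, vmax) equally spaced around center and covering all values."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    half_range = max(abs(v - center) for v in values)
    return center - half_range, center + half_range

# Hypothetical log2 fold-change values, skewed toward positive changes
fold_changes = [-0.8, 0.1, 1.5, 2.4]
vmin, vmax = symmetric_limits(fold_changes)
print(vmin, vmax)  # -2.4 2.4
```

The result can be passed to Matplotlib as `plt.imshow(data, cmap='RdBu_r', vmin=vmin, vmax=vmax)`; for a 2D NumPy array, pass `data.ravel()` to the helper.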
## Special Purpose Palettes
### For Genomics/Bioinformatics
**Sequence type identification:**
```python
# DNA/RNA bases
nucleotide_colors = {
'A': '#00CC00', # Green
'C': '#0000CC', # Blue
'G': '#FFB300', # Orange
'T': '#CC0000', # Red
'U': '#CC0000' # Red (RNA)
}
```
**Gene expression:**
- Use sequential colormaps (viridis, YlOrRd) for expression levels
- Use diverging colormaps (RdBu) for log2 fold change
### For Microscopy
**Fluorescence channels:**
```python
# Traditional fluorophore colors (use with caution)
fluorophore_colors = {
'DAPI': '#0000FF', # Blue - DNA
'GFP': '#00FF00', # Green (problematic for colorblind)
'RFP': '#FF0000', # Red
'Cy5': '#FF00FF' # Magenta
}
# Colorblind-friendly alternatives
fluorophore_alt = {
'Channel1': '#0072B2', # Blue
'Channel2': '#E69F00', # Orange (instead of green)
'Channel3': '#D55E00', # Vermillion
'Channel4': '#CC79A7' # Magenta
}
```
## Color Usage Best Practices
### Categorical Data (Qualitative Color Schemes)
**Do:**
- Use distinct, saturated colors from Okabe-Ito or Wong palette
- Limit to 7-8 categories max in one plot
- Use consistent colors for same categories across figures
- Add patterns/markers when colors alone might be insufficient
**Don't:**
- Use red/green combinations
- Use rainbow (jet) colormap for categories
- Use similar hues that are hard to distinguish
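The advice to add markers when color alone might be insufficient can be sketched as a fixed mapping from category to a color, marker, and line style, so every category is encoded redundantly. The function name `category_styles` and the particular marker and line-style lists are illustrative assumptions; the colors are the Okabe-Ito values from this reference.

```python
from itertools import cycle

OKABE_ITO = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
             '#0072B2', '#D55E00', '#CC79A7', '#000000']
MARKERS = ['o', 's', '^', 'D', 'v', 'P', 'X', '*']   # Matplotlib marker codes
LINESTYLES = ['-', '--', ':', '-.']                   # Matplotlib line styles

def category_styles(categories):
    """Map each category to a distinct color/marker/linestyle combination."""
    if len(categories) > len(OKABE_ITO):
        raise ValueError("More categories than palette colors; consider grouping")
    return {
        name: {'color': color, 'marker': marker, 'linestyle': linestyle}
        for name, color, marker, linestyle
        in zip(categories, OKABE_ITO, MARKERS, cycle(LINESTYLES))
    }

styles = category_styles(['Control', 'Treatment A', 'Treatment B'])
# Usage with Matplotlib: ax.plot(x, y, label=name, **styles[name])
```

Building the mapping once and reusing it across figures also keeps the same category in the same color throughout a manuscript.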
### Continuous Data (Sequential/Diverging Schemes)
**Do:**
- Use perceptually uniform colormaps (viridis, plasma, cividis)
- Choose diverging maps when data has meaningful center point
- Include colorbar with labeled ticks
- Test appearance in grayscale
**Don't:**
- Use rainbow (jet) colormap - not perceptually uniform
- Use red-green diverging maps
- Omit colorbar on heatmaps
## Testing for Colorblind Accessibility
### Simulation Tools
- **Coblis**: https://www.color-blindness.com/coblis-color-blindness-simulator/
- **Color Oracle**: Free downloadable tool for Windows/Mac/Linux
- **Sim Daltonism**: Mac application
### Types of Color Vision Deficiency
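- **Deuteranopia/deuteranomaly** (~6% of males, mostly the milder deuteranomaly): Missing or altered green-sensitive cones; reds and greens are confused
- **Protanopia/protanomaly** (~2% of males): Missing or altered red-sensitive cones; reds look dim and are confused with greens
- **Tritanopia** (<1%): Blue-yellow confusion (rare)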
### Python Tools
```python
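# Using colorspacious to simulate colorblind vision
import numpy as np
from colorspacious import cspace_convert
def simulate_deuteranopia(image_rgb, severity=100):
    """Simulate deuteranopia on an sRGB image with float values in [0, 1]."""
    # 'sRGB1+CVD' is the colorspacious color-vision-deficiency model;
    # severity=100 approximates full deuteranopia
    cvd_space = {'name': 'sRGB1+CVD', 'cvd_type': 'deuteranomaly',
                 'severity': severity}
    simulated = cspace_convert(image_rgb, cvd_space, 'sRGB1')
    # The conversion can overshoot slightly, so clip back into the valid range
    return np.clip(simulated, 0, 1)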
```
## Implementation Examples
### Setting Global Matplotlib Style
```python
import matplotlib.pyplot as plt
import matplotlib as mpl
# Set Okabe-Ito as default color cycle
okabe_ito_colors = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7']
mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=okabe_ito_colors)
# Set default colormap to viridis
mpl.rcParams['image.cmap'] = 'viridis'
```
### Seaborn with Custom Palette
```python
import seaborn as sns
# Set Paul Tol muted palette
tol_muted = ['#332288', '#88CCEE', '#44AA99', '#117733',
'#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499']
sns.set_palette(tol_muted)
# For heatmaps
sns.heatmap(data, cmap='viridis', annot=True)
```
### Plotly with Discrete Colors
```python
import plotly.express as px
# Use Okabe-Ito for categorical data
okabe_ito_plotly = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7']
fig = px.scatter(df, x='x', y='y', color='category',
color_discrete_sequence=okabe_ito_plotly)
```
## Grayscale Compatibility
All figures should remain interpretable in grayscale. Test by converting to grayscale:
```python
# savefig has no grayscale option: save normally, then convert with Pillow
from PIL import Image
fig.savefig('figure.png', dpi=300)
Image.open('figure.png').convert('L').save('figure_gray.png')
```
**Strategies for grayscale compatibility:**
1. Use different line styles (solid, dashed, dotted)
2. Use different marker shapes (circles, squares, triangles)
3. Add hatching patterns to bars
4. Ensure sufficient luminance contrast between colors
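Luminance contrast can be estimated before plotting anything. The sketch below computes the relative luminance of sRGB colors with the standard sRGB linearization and flags pairs that would look similar in grayscale. The helper names and the 0.1 threshold are illustrative assumptions, not established cutoffs.

```python
def relative_luminance(hex_color):
    """Relative luminance (0 = black, 1 = white) of an sRGB hex color."""
    value = hex_color.lstrip('#')
    channels = [int(value[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def low_contrast_pairs(colors, min_difference=0.1):
    """Return color pairs whose luminance differs by less than min_difference."""
    lum = {c: relative_luminance(c) for c in colors}
    return [(a, b) for i, a in enumerate(colors) for b in colors[i + 1:]
            if abs(lum[a] - lum[b]) < min_difference]

okabe_ito = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
             '#0072B2', '#D55E00', '#CC79A7', '#000000']
for pair in low_contrast_pairs(okabe_ito):
    print("Similar in grayscale:", pair)
```

Pairs flagged here are candidates for the line-style, marker, or hatching strategies listed above.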
## Color Spaces
### RGB vs CMYK
- **RGB** (Red, Green, Blue): For digital/screen display
- **CMYK** (Cyan, Magenta, Yellow, Black): For print
**Important:** Colors appear different in print vs. screen. When preparing for print:
1. Convert to CMYK color space
2. Check color appearance in CMYK preview
3. Ensure sufficient contrast remains
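For a rough sense of how a screen color maps to ink percentages, the simple profile-free RGB-to-CMYK formula can be sketched as below. This is an approximation for orientation only: real print conversion depends on ICC profiles, so the publisher's conversion remains authoritative. The function name is hypothetical.

```python
def rgb_to_cmyk_naive(r, g, b):
    """Convert 0-255 RGB to 0-1 CMYK with the simple profile-free formula."""
    r, g, b = (channel / 255 for channel in (r, g, b))
    k = 1 - max(r, g, b)            # black ink replaces the shared darkness
    if k == 1:                      # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

# Okabe-Ito blue (#0072B2) as approximate ink fractions
print([round(v, 2) for v in rgb_to_cmyk_naive(0, 114, 178)])
```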
### Matplotlib Color Spaces
```python
# Save for print (CMYK)
# Note: Direct CMYK support limited; use PDF and let publisher convert
fig.savefig('figure.pdf', dpi=300)
# For RGB (digital)
fig.savefig('figure.png', dpi=300)
```
## Common Mistakes
1. **Using jet/rainbow colormap**: Not perceptually uniform; avoid
2. **Red-green combinations**: ~8% of males have a red-green color vision deficiency and may not tell them apart
3. **Too many colors**: More than 7-8 becomes difficult to distinguish
4. **Inconsistent color meaning**: Same color should mean same thing across figures
5. **Missing colorbar**: Always include for continuous data
6. **Low contrast**: Ensure colors differ sufficiently
7. **Relying solely on color**: Add texture, patterns, or markers
## Resources
- **ColorBrewer**: http://colorbrewer2.org/ - Choose palettes by colorblind-safe option
- **Paul Tol's palettes**: https://personal.sron.nl/~pault/
- **Okabe-Ito palette origin**: "Color Universal Design" (Okabe & Ito, 2008)
- **Matplotlib colormaps**: https://matplotlib.org/stable/tutorials/colors/colormaps.html
- **Seaborn palettes**: https://seaborn.pydata.org/tutorial/color_palettes.html
FILE:references/journal_requirements.md
# Journal-Specific Figure Requirements
## Overview
Different journals have specific technical requirements for figures. This reference compiles common requirements from major scientific publishers. **Always check the specific journal's author guidelines for the most current requirements.**
## Nature Portfolio (Nature, Nature Methods, etc.)
### Technical Specifications
- **File formats**:
- Vector: PDF, EPS, AI (preferred for graphs)
- Raster: TIFF, PNG (for images)
- Never: PowerPoint, Word, JPEG
- **Resolution**:
- Line art: 1000-1200 DPI
- Combination (line art + images): 600 DPI
- Photographs/microscopy: 300 DPI minimum
- **Color space**: RGB (Nature is digital-first)
- **Dimensions**:
- Single column: 89 mm (3.5 inches)
- 1.5 column: 120 mm (4.7 inches)
- Double column: 183 mm (7.2 inches)
- Maximum height: 247 mm (9.7 inches)
- **Fonts**:
- Arial or Helvetica (or similar sans-serif)
- Minimum 5-7 pt at final size
- Embed all fonts in PDF/EPS
### Nature Specific Guidelines
- Panel labels: a, b, c (lowercase, bold) in top-left corner
- Scale bars required for microscopy images
- Gel images: Include molecular weight markers
- Cropping: Indicate with line breaks
- Statistics: Mark significance; define symbols in legend
- Source data: Required for all graphs
### File Naming
Format: `FirstAuthorLastName_FigureNumber.ext`
Example: `Smith_Fig1.pdf`
## Science (AAAS)
### Technical Specifications
- **File formats**:
- Vector: EPS, PDF (preferred)
- Raster: TIFF
- Acceptable: AI, PSD (Photoshop)
- **Resolution**:
- Line art: 1000 DPI minimum
- Photographs: 300 DPI minimum
- Combination: 600 DPI minimum
- **Color space**: RGB
- **Dimensions**:
- Single column: 5.5 cm (2.17 inches)
- 1.5 column: 12 cm (4.72 inches)
- Full width: 17.5 cm (6.89 inches)
- Maximum height: 23.3 cm (9.17 inches)
- **Fonts**:
- Helvetica (or Arial)
- 6-8 pt minimum at final size
- Consistent across all figures
### Science Specific Guidelines
- Panel labels: (A), (B), (C) in parentheses
- Minimal text within figures (details in caption)
- High contrast for web and print
- Error bars required; define in caption
- Avoid excessive whitespace
### File Naming
Format: `Manuscript#_Fig#.ext`
Example: `abn1234_Fig1.eps`
## Cell Press (Cell, Neuron, Molecular Cell, etc.)
### Technical Specifications
- **File formats**:
- Vector: PDF, EPS (preferred for graphs/diagrams)
- Raster: TIFF (for photographs)
- **Resolution**:
- Line art: 1000 DPI
- Photographs: 300 DPI
- Combination: 600 DPI
- **Color space**: RGB
- **Dimensions**:
- Single column: 85 mm (3.35 inches)
- Double column: 178 mm (7.01 inches)
- Maximum height: 230 mm (9.06 inches)
- **Fonts**:
- Arial or Helvetica only
- 8-12 pt for axis labels
- 6-8 pt for tick labels
### Cell Press Specific Guidelines
- Panel labels: (A), (B), (C) or A, B, C in top-left
- Related panels should match in size
- Scale bars mandatory for microscopy
- Western blots: Include molecular weight markers
- Arrows/arrowheads: 2 pt minimum width
- Line widths: 1-2 pt for data
## PLOS (Public Library of Science)
### Technical Specifications
- **File formats**:
- Vector: EPS, PDF (preferred)
- Raster: TIFF, PNG
- TIFF with LZW compression acceptable
- **Resolution**:
- Minimum 300 DPI at final size (all figure types)
- 600 DPI preferred for line art
- **Color space**: RGB
- **Dimensions**:
- Single column: 8.3 cm (3.27 inches)
- 1.5 column: 11.4 cm (4.49 inches)
- Double column: 17.3 cm (6.81 inches)
- Maximum height: 23.3 cm (9.17 inches)
- **Fonts**:
- Sans-serif preferred (Arial, Helvetica)
- 8-12 pt for labels at final size
### PLOS Specific Guidelines
- Figures should be understandable without caption
- Color required only if adding information
- All figures convertible to grayscale
- Panel labels optional but recommended
- Open access: Figures must be CC-BY licensed
- Source data files encouraged
## ACS (American Chemical Society)
### Technical Specifications
- **File formats**:
- Preferred: TIFF, PDF, EPS
- Application files: AI, CDX (ChemDraw), CDL
- Acceptable: PNG (not for publication)
- **Resolution**:
- Minimum 300 DPI at final size
- 600 DPI for line art and chemical structures
- 1200 DPI for detailed structures
- **Color space**: RGB or CMYK (check specific journal)
- **Dimensions**:
- Single column: 3.25 inches (8.25 cm)
- Double column: 7 inches (17.78 cm)
- **Fonts**:
- Embedded fonts required
- Consistent sizing across figures
### ACS Specific Guidelines
- Chemical structures: Use ChemDraw or equivalent
- Atom labels: 10-12 pt
- Bond thickness: 2 pt
- Panel labels: Lowercase bold (a, b, c)
- High contrast required (many ACS journals grayscale print)
## Elsevier Journals (varies by journal)
### Technical Specifications
- **File formats**:
- Vector: EPS, PDF
- Raster: TIFF, JPEG (only for photographs)
- **Resolution**:
- Line art: 1000 DPI minimum
- Photographs: 300 DPI minimum
- Combination: 600 DPI minimum
- **Color space**: RGB (for online); CMYK (for print journals)
- **Dimensions**: Vary by journal
- Common single column: 90 mm
- Common double column: 190 mm
- **Fonts**:
- Preferred: Arial, Times, Symbol
- Minimum 6 pt at final size
### Elsevier Specific Guidelines
- Check individual journal guidelines (highly variable)
- Some journals charge for color in print
- Panel labels typically (A), (B), (C) or A, B, C
- Graphical abstract often required (separate from figures)
## IEEE (Engineering/Computer Science)
### Technical Specifications
- **File formats**:
- Vector: PDF, EPS (preferred)
- Raster: TIFF, PNG
- **Resolution**:
- Photographs/graphics: 300 DPI minimum at final size
- Line art: 600 DPI minimum
- **Color space**: RGB (online); CMYK (print)
- **Dimensions**:
- Single column: 3.5 inches (8.9 cm)
- Double column: 7.16 inches (18.2 cm)
- **Fonts**:
- Sans-serif preferred
- Minimum 8-10 pt at final size
### IEEE Specific Guidelines
- Figures should be readable in black and white
- Color figures incur no charge (online publication)
- Panel labels: (a), (b), (c) in lowercase
- Captions below figures (not on separate page)
- Use IEEE graphics checker tool before submission
## BMC (BioMed Central) - Open Access
### Technical Specifications
- **File formats**:
- Any standard format accepted
- Preferred: TIFF, PDF, EPS, PNG
- **Resolution**:
- Minimum 600 DPI for line art
- Minimum 300 DPI for photographs
- **Color space**: RGB
- **Dimensions**:
- Flexible, but consider readability
- Maximum width typically 140 mm
- **Fonts**:
- Embedded and readable
### BMC Specific Guidelines
- Open access: CC-BY license required
- Figure files uploaded separately
- Panel labels as appropriate for field
- Source data encouraged
- Accessibility important (colorblind-friendly)
## Common Requirements Across Journals
### Universal Best Practices
1. **Never use JPEG for graphs/plots**: Compression artifacts
2. **Embed all fonts**: In PDF/EPS files
3. **Layer structure**: Flatten images (merge layers in Photoshop)
4. **RGB vs CMYK**: Most journals now RGB (digital-first)
5. **High resolution**: Always better to start high, reduce if needed
6. **Consistency**: Same style across all figures in manuscript
7. **File size**: Balance quality with reasonable file sizes (typically <10 MB per figure)
### Submitting Figures
- **Initial submission**: Lower resolution often acceptable (for review)
- **Revision/acceptance**: High-resolution required
- **Separate files**: Each figure as separate file
- **File naming**: Clear, systematic naming
- **Supporting information**: May have different requirements
## Quick Reference Table
| Publisher | Single Column | Double Column | Min DPI (photos) | Min DPI (line art) | Preferred Format |
|-----------|---------------|---------------|------------------|-------------------|------------------|
| Nature | 89 mm | 183 mm | 300 | 1000 | EPS, PDF |
| Science | 5.5 cm | 17.5 cm | 300 | 1000 | EPS, PDF |
| Cell Press | 85 mm | 178 mm | 300 | 1000 | EPS, PDF |
| PLOS | 8.3 cm | 17.3 cm | 300 | 600 | EPS, TIFF |
| ACS | 3.25 in | 7 in | 300 | 600 | TIFF, EPS |
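Matplotlib takes figure sizes in inches, while most of the widths above are metric. A minimal sketch of turning the table into `figsize` values follows; the dictionary copies the widths from the table, while the function name `figsize_for` and the default 0.75 aspect ratio are illustrative assumptions.

```python
MM_PER_INCH = 25.4

# Column widths in millimetres, copied from the quick reference table above
COLUMN_WIDTHS_MM = {
    'nature': {'single': 89.0, 'double': 183.0},
    'science': {'single': 55.0, 'double': 175.0},
    'cell': {'single': 85.0, 'double': 178.0},
    'plos': {'single': 83.0, 'double': 173.0},
    'acs': {'single': 3.25 * MM_PER_INCH, 'double': 7.0 * MM_PER_INCH},
}

def figsize_for(journal, columns='single', aspect=0.75):
    """Return (width, height) in inches for plt.subplots(figsize=...)."""
    try:
        width_mm = COLUMN_WIDTHS_MM[journal.lower()][columns]
    except KeyError:
        raise ValueError(f"No width on file for {journal!r} / {columns!r}") from None
    width_in = width_mm / MM_PER_INCH
    return round(width_in, 2), round(width_in * aspect, 2)

print(figsize_for('nature'))  # (3.5, 2.63)
```

Usage: `fig, ax = plt.subplots(figsize=figsize_for('cell', 'double'))`. The widths should still be confirmed against the journal's current author guidelines.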
## Checking Requirements
### Before Submission Checklist
1. Read journal's author guidelines (figure section)
2. Check file format requirements
3. Verify resolution requirements
4. Confirm size specifications (width × height)
5. Check font requirements
6. Verify color space (RGB vs CMYK)
7. Check panel labeling style
8. Review supplementary materials requirements
9. Confirm file naming conventions
10. Check file size limits
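The resolution item in the checklist comes down to simple arithmetic: a raster image's effective DPI is its pixel width divided by its printed width in inches. A sketch using the minimums that Nature, Science, and Cell Press share in this reference (the helper names are hypothetical):

```python
MM_PER_INCH = 25.4

# Minimum DPI by content type, following the Nature/Science/Cell Press values above
MIN_DPI = {'photo': 300, 'combination': 600, 'line_art': 1000}

def effective_dpi(pixel_width, print_width_mm):
    """Resolution a raster image actually has when printed at the given width."""
    return pixel_width / (print_width_mm / MM_PER_INCH)

def meets_resolution(pixel_width, print_width_mm, kind='photo'):
    """True if the image has at least the minimum DPI for its content type."""
    return effective_dpi(pixel_width, print_width_mm) >= MIN_DPI[kind]

# A 1200-pixel-wide micrograph placed in an 89 mm column
print(round(effective_dpi(1200, 89)))          # 342
print(meets_resolution(1200, 89, 'photo'))     # True
print(meets_resolution(1200, 89, 'line_art'))  # False
```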
### Useful Tools
- **ImageJ/Fiji**: Check/adjust DPI
- **Adobe Acrobat**: Verify embedded fonts, check PDF properties
- **GIMP**: Free alternative to Photoshop for raster editing
- **Inkscape**: Free vector graphics editor
## Resources
- **Journal websites**: Always check "Author Guidelines" or "Instructions for Authors"
- **Publisher resources**: Many provide templates and tools
- **Format conversion**: Use reputable tools; check output quality
- **Help desks**: Contact journal staff if unclear
## Notes
- Requirements change periodically - always verify current guidelines
- Preprint servers (bioRxiv, arXiv) often have different requirements
- Conference proceedings may have separate requirements
- Some journals offer figure preparation services (often paid)
- Supplementary figures may have relaxed requirements compared to main text figures
FILE:references/matplotlib_examples.md
# Publication-Ready Matplotlib Examples
## Overview
This reference provides practical code examples for creating publication-ready scientific figures using Matplotlib, Seaborn, and Plotly. All examples follow best practices from `publication_guidelines.md` and use colorblind-friendly palettes from `color_palettes.md`.
## Setup and Configuration
### Publication-Quality Matplotlib Configuration
```python
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
# Set publication quality parameters
mpl.rcParams['figure.dpi'] = 300
mpl.rcParams['savefig.dpi'] = 300
mpl.rcParams['font.size'] = 8
mpl.rcParams['font.family'] = 'sans-serif'
mpl.rcParams['font.sans-serif'] = ['Arial', 'Helvetica']
mpl.rcParams['axes.labelsize'] = 9
mpl.rcParams['axes.titlesize'] = 9
mpl.rcParams['xtick.labelsize'] = 7
mpl.rcParams['ytick.labelsize'] = 7
mpl.rcParams['legend.fontsize'] = 7
mpl.rcParams['axes.linewidth'] = 0.5
mpl.rcParams['xtick.major.width'] = 0.5
mpl.rcParams['ytick.major.width'] = 0.5
mpl.rcParams['lines.linewidth'] = 1.5
# Use colorblind-friendly colors (Okabe-Ito palette)
okabe_ito = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7', '#000000']
mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=okabe_ito)
# Use perceptually uniform colormap
mpl.rcParams['image.cmap'] = 'viridis'
```
### Helper Function for Saving
```python
def save_publication_figure(fig, filename, formats=('pdf', 'png'), dpi=300):
    """
    Save figure in multiple formats for publication.
    Parameters:
    -----------
    fig : matplotlib.figure.Figure
        Figure to save
    filename : str
        Base filename (without extension)
    formats : sequence of str
        File formats to save, e.g. ('pdf', 'png', 'eps', 'svg')
    dpi : int
        Resolution for raster formats
    """
    # A tuple default avoids the shared-mutable-default pitfall of a list
    for fmt in formats:
        output_file = f"{filename}.{fmt}"
        fig.savefig(output_file, dpi=dpi, bbox_inches='tight',
                    facecolor='white', edgecolor='none',
                    transparent=False, format=fmt)
        print(f"Saved: {output_file}")
```
## Example 1: Line Plot with Error Bars
```python
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data
x = np.linspace(0, 10, 50)
y1 = 2 * x + 1 + np.random.normal(0, 1, 50)
y2 = 1.5 * x + 2 + np.random.normal(0, 1.2, 50)
# Calculate means and standard errors for binned data
bins = np.linspace(0, 10, 11)
y1_mean = [y1[(x >= bins[i]) & (x < bins[i+1])].mean() for i in range(len(bins)-1)]
y1_sem = [y1[(x >= bins[i]) & (x < bins[i+1])].std(ddof=1) /
np.sqrt(len(y1[(x >= bins[i]) & (x < bins[i+1])]))
for i in range(len(bins)-1)]
x_binned = (bins[:-1] + bins[1:]) / 2
# Create figure with appropriate size (single column width = 3.5 inches)
fig, ax = plt.subplots(figsize=(3.5, 2.5))
# Plot with error bars
ax.errorbar(x_binned, y1_mean, yerr=y1_sem,
marker='o', markersize=4, capsize=3, capthick=0.5,
label='Condition A', linewidth=1.5)
# Add labels with units
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Fluorescence intensity (a.u.)')
# Add legend
ax.legend(frameon=False, loc='upper left')
# Remove top and right spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
# Tight layout
fig.tight_layout()
# Save
save_publication_figure(fig, 'line_plot_with_errors')
plt.show()
```
## Example 2: Multi-Panel Figure
```python
import matplotlib.pyplot as plt
import numpy as np
from string import ascii_uppercase
# Create figure with multiple panels (double column width = 7 inches)
fig = plt.figure(figsize=(7, 4))
# Define grid for panels
gs = fig.add_gridspec(2, 3, hspace=0.4, wspace=0.4,
left=0.08, right=0.98, top=0.95, bottom=0.08)
# Panel A: Line plot
ax_a = fig.add_subplot(gs[0, :2])
x = np.linspace(0, 10, 100)
for i, offset in enumerate([0, 0.5, 1.0]):
    ax_a.plot(x, np.sin(x) + offset, label=f'Dataset {i+1}')
ax_a.set_xlabel('Time (s)')
ax_a.set_ylabel('Amplitude (V)')
ax_a.legend(frameon=False, fontsize=6)
ax_a.spines['top'].set_visible(False)
ax_a.spines['right'].set_visible(False)
# Panel B: Bar plot
ax_b = fig.add_subplot(gs[0, 2])
categories = ['Control', 'Treatment\nA', 'Treatment\nB']
values = [100, 125, 140]
errors = [5, 8, 6]
ax_b.bar(categories, values, yerr=errors, capsize=3,
color=['#0072B2', '#E69F00', '#009E73'], alpha=0.8)
ax_b.set_ylabel('Response (%)')
ax_b.spines['top'].set_visible(False)
ax_b.spines['right'].set_visible(False)
ax_b.set_ylim(0, 160)
# Panel C: Scatter plot
ax_c = fig.add_subplot(gs[1, 0])
x = np.random.randn(100)
y = 2*x + np.random.randn(100)
ax_c.scatter(x, y, s=10, alpha=0.6, color='#0072B2')
ax_c.set_xlabel('Variable X')
ax_c.set_ylabel('Variable Y')
ax_c.spines['top'].set_visible(False)
ax_c.spines['right'].set_visible(False)
# Panel D: Heatmap
ax_d = fig.add_subplot(gs[1, 1:])
data = np.random.randn(10, 20)
im = ax_d.imshow(data, cmap='viridis', aspect='auto')
ax_d.set_xlabel('Sample number')
ax_d.set_ylabel('Feature')
cbar = plt.colorbar(im, ax=ax_d, fraction=0.046, pad=0.04)
cbar.set_label('Intensity (a.u.)', rotation=270, labelpad=12)
# Add panel labels
panels = [ax_a, ax_b, ax_c, ax_d]
for i, ax in enumerate(panels):
    ax.text(-0.15, 1.05, ascii_uppercase[i], transform=ax.transAxes,
            fontsize=10, fontweight='bold', va='top')
save_publication_figure(fig, 'multi_panel_figure')
plt.show()
```
## Example 3: Box Plot with Individual Points
```python
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data
np.random.seed(42)
data = [np.random.normal(100, 15, 30),
np.random.normal(120, 20, 30),
np.random.normal(140, 18, 30),
np.random.normal(110, 22, 30)]
fig, ax = plt.subplots(figsize=(3.5, 3))
# Create box plot
bp = ax.boxplot(data, widths=0.5, patch_artist=True,
showfliers=False, # We'll add points manually
boxprops=dict(facecolor='lightgray', edgecolor='black', linewidth=0.8),
medianprops=dict(color='black', linewidth=1.5),
whiskerprops=dict(linewidth=0.8),
capprops=dict(linewidth=0.8))
# Overlay individual points
colors = ['#0072B2', '#E69F00', '#009E73', '#D55E00']
for i, (d, color) in enumerate(zip(data, colors)):
    # Add jitter to x positions
    x = np.random.normal(i+1, 0.04, size=len(d))
    ax.scatter(x, d, alpha=0.4, s=8, color=color)
# Customize
ax.set_xticklabels(['Control', 'Treatment A', 'Treatment B', 'Treatment C'])
ax.set_ylabel('Cell count')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_ylim(50, 200)
fig.tight_layout()
save_publication_figure(fig, 'boxplot_with_points')
plt.show()
```
## Example 4: Heatmap with Colorbar
```python
import matplotlib.pyplot as plt
import numpy as np
# Generate correlation matrix
np.random.seed(42)
n = 10
A = np.random.randn(n, n)
corr_matrix = np.corrcoef(A)
# Create figure
fig, ax = plt.subplots(figsize=(4, 3.5))
# Plot heatmap
im = ax.imshow(corr_matrix, cmap='RdBu_r', vmin=-1, vmax=1, aspect='auto')
# Add colorbar
cbar = plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
cbar.set_label('Correlation coefficient', rotation=270, labelpad=15)
# Set ticks and labels
gene_names = [f'Gene{i+1}' for i in range(n)]
ax.set_xticks(np.arange(n))
ax.set_yticks(np.arange(n))
ax.set_xticklabels(gene_names, rotation=45, ha='right')
ax.set_yticklabels(gene_names)
# Add grid
ax.set_xticks(np.arange(n)-.5, minor=True)
ax.set_yticks(np.arange(n)-.5, minor=True)
ax.grid(which='minor', color='white', linestyle='-', linewidth=0.5)
fig.tight_layout()
save_publication_figure(fig, 'correlation_heatmap')
plt.show()
```
## Example 5: Seaborn Violin Plot
```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Generate sample data
np.random.seed(42)
data = pd.DataFrame({
'condition': np.repeat(['Control', 'Drug A', 'Drug B'], 50),
'value': np.concatenate([
np.random.normal(100, 15, 50),
np.random.normal(120, 20, 50),
np.random.normal(140, 18, 50)
])
})
# Set style
sns.set_style('ticks')
sns.set_palette(['#0072B2', '#E69F00', '#009E73'])
fig, ax = plt.subplots(figsize=(3.5, 3))
# Create violin plot
sns.violinplot(data=data, x='condition', y='value', ax=ax,
inner='box', linewidth=0.8)
# Add strip plot
sns.stripplot(data=data, x='condition', y='value', ax=ax,
size=2, alpha=0.3, color='black')
# Customize
ax.set_xlabel('')
ax.set_ylabel('Expression level (AU)')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
save_publication_figure(fig, 'violin_plot')
plt.show()
```
## Example 6: Scientific Scatter with Regression
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
# Generate data with correlation
np.random.seed(42)
x = np.random.randn(100)
y = 2.5 * x + np.random.randn(100) * 0.8
# Calculate regression
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
# Create figure
fig, ax = plt.subplots(figsize=(3.5, 3.5))
# Scatter plot
ax.scatter(x, y, s=15, alpha=0.6, color='#0072B2', edgecolors='none')
# Regression line
x_line = np.array([x.min(), x.max()])
y_line = slope * x_line + intercept
ax.plot(x_line, y_line, 'r-', linewidth=1.5, label=f'y = {slope:.2f}x + {intercept:.2f}')
# Add statistics text
stats_text = f'$R^2$ = {r_value**2:.3f}\n$p$ < 0.001' if p_value < 0.001 else f'$R^2$ = {r_value**2:.3f}\n$p$ = {p_value:.3f}'
ax.text(0.05, 0.95, stats_text, transform=ax.transAxes,
verticalalignment='top', fontsize=7,
bbox=dict(boxstyle='round', facecolor='white', alpha=0.8, edgecolor='gray', linewidth=0.5))
# Customize
ax.set_xlabel('Predictor variable')
ax.set_ylabel('Response variable')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
save_publication_figure(fig, 'scatter_regression')
plt.show()
```
## Example 7: Time Series with Shaded Error
```python
import matplotlib.pyplot as plt
import numpy as np
# Generate time series data
np.random.seed(42)
time = np.linspace(0, 24, 100)
n_replicates = 5
# Simulate multiple replicates
data = np.array([10 * np.exp(-time/10) + np.random.normal(0, 0.5, 100)
for _ in range(n_replicates)])
# Calculate mean and SEM (sample standard deviation, ddof=1)
mean = data.mean(axis=0)
sem = data.std(axis=0, ddof=1) / np.sqrt(n_replicates)
# Create figure
fig, ax = plt.subplots(figsize=(4, 2.5))
# Plot mean line
ax.plot(time, mean, linewidth=1.5, color='#0072B2', label='Mean ± SEM')
# Add shaded error region
ax.fill_between(time, mean - sem, mean + sem,
alpha=0.3, color='#0072B2', linewidth=0)
# Customize
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Concentration (μM)')
ax.legend(frameon=False, loc='upper right')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlim(0, 24)
ax.set_ylim(0, 12)
fig.tight_layout()
save_publication_figure(fig, 'timeseries_shaded')
plt.show()
```
## Example 8: Plotly Interactive Figure
```python
import plotly.graph_objects as go
import numpy as np
# Generate data
np.random.seed(42)
x = np.random.randn(100)
y = 2*x + np.random.randn(100)
colors = np.random.choice(['Group A', 'Group B'], 100)
# Okabe-Ito colors for Plotly
okabe_ito_plotly = ['#E69F00', '#56B4E9']
# Create figure
fig = go.Figure()
for group, color in zip(['Group A', 'Group B'], okabe_ito_plotly):
    mask = colors == group
    fig.add_trace(go.Scatter(
        x=x[mask], y=y[mask],
        mode='markers',
        name=group,
        marker=dict(size=6, color=color, opacity=0.6)
    ))
# Update layout for publication quality
fig.update_layout(
width=500,
height=400,
font=dict(family='Arial, sans-serif', size=10),
plot_bgcolor='white',
xaxis=dict(
title='Variable X',
showgrid=False,
showline=True,
linewidth=1,
linecolor='black',
mirror=False
),
yaxis=dict(
title='Variable Y',
showgrid=False,
showline=True,
linewidth=1,
linecolor='black',
mirror=False
),
legend=dict(
x=0.02,
y=0.98,
bgcolor='rgba(255,255,255,0.8)',
bordercolor='gray',
borderwidth=0.5
)
)
# Save as static image (requires kaleido)
fig.write_image('plotly_scatter.png', width=500, height=400, scale=3)  # scale=3 exports 1500 x 1200 px (300 DPI if printed 5 in wide)
fig.write_html('plotly_scatter.html') # Interactive version
fig.show()
```
## Example 9: Grouped Bar Plot with Significance
```python
import matplotlib.pyplot as plt
import numpy as np
# Data
categories = ['WT', 'Mutant A', 'Mutant B']
control_means = [100, 85, 70]
control_sem = [5, 6, 5]
treatment_means = [100, 120, 140]
treatment_sem = [6, 8, 9]
x = np.arange(len(categories))
width = 0.35
fig, ax = plt.subplots(figsize=(3.5, 3))
# Create bars
bars1 = ax.bar(x - width/2, control_means, width, yerr=control_sem,
capsize=3, label='Control', color='#0072B2', alpha=0.8)
bars2 = ax.bar(x + width/2, treatment_means, width, yerr=treatment_sem,
capsize=3, label='Treatment', color='#E69F00', alpha=0.8)
# Add significance markers
def add_significance_bar(ax, x1, x2, y, h, text):
    """Add significance bar between two bars"""
    ax.plot([x1, x1, x2, x2], [y, y+h, y+h, y], linewidth=0.8, c='black')
    ax.text((x1+x2)/2, y+h, text, ha='center', va='bottom', fontsize=7)
# Mark significant differences
add_significance_bar(ax, x[1]-width/2, x[1]+width/2, 135, 3, '***')
add_significance_bar(ax, x[2]-width/2, x[2]+width/2, 155, 3, '***')
# Customize
ax.set_ylabel('Activity (% of WT control)')
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend(frameon=False, loc='upper left')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_ylim(0, 180)
# Add note about significance
ax.text(0.98, 0.02, '*** p < 0.001', transform=ax.transAxes,
ha='right', va='bottom', fontsize=6)
fig.tight_layout()
save_publication_figure(fig, 'grouped_bar_significance')
plt.show()
```
## Example 10: Publication-Ready Figure for Nature
```python
import matplotlib.pyplot as plt
import numpy as np
from string import ascii_lowercase
# Nature specifications: 89mm single column
inch_per_mm = 0.0393701
width_mm = 89
height_mm = 110
figsize = (width_mm * inch_per_mm, height_mm * inch_per_mm)
fig = plt.figure(figsize=figsize)
gs = fig.add_gridspec(3, 2, hspace=0.5, wspace=0.4,
left=0.12, right=0.95, top=0.96, bottom=0.08)
# Panel a: Time course
ax_a = fig.add_subplot(gs[0, :])
time = np.linspace(0, 48, 100)
for i, label in enumerate(['Control', 'Treatment']):
    y = (1 + i*0.5) * np.exp(-time/20) * (1 + 0.3*np.sin(time/5))
    ax_a.plot(time, y, linewidth=1.2, label=label)
ax_a.set_xlabel('Time (h)', fontsize=7)
ax_a.set_ylabel('Growth (OD$_{600}$)', fontsize=7)
ax_a.legend(frameon=False, fontsize=6)
ax_a.tick_params(labelsize=6)
ax_a.spines['top'].set_visible(False)
ax_a.spines['right'].set_visible(False)
# Panel b: Bar plot
ax_b = fig.add_subplot(gs[1, 0])
categories = ['A', 'B', 'C']
values = [1.0, 1.5, 2.2]
errors = [0.1, 0.15, 0.2]
ax_b.bar(categories, values, yerr=errors, capsize=2, width=0.6,
color='#0072B2', alpha=0.8)
ax_b.set_ylabel('Fold change', fontsize=7)
ax_b.tick_params(labelsize=6)
ax_b.spines['top'].set_visible(False)
ax_b.spines['right'].set_visible(False)
# Panel c: Heatmap
ax_c = fig.add_subplot(gs[1, 1])
data = np.random.randn(8, 6)
im = ax_c.imshow(data, cmap='viridis', aspect='auto')
ax_c.set_xlabel('Sample', fontsize=7)
ax_c.set_ylabel('Gene', fontsize=7)
ax_c.tick_params(labelsize=6)
# Panel d: Scatter
ax_d = fig.add_subplot(gs[2, :])
x = np.random.randn(50)
y = 2*x + np.random.randn(50)*0.5
ax_d.scatter(x, y, s=8, alpha=0.6, color='#E69F00')
ax_d.set_xlabel('Expression gene X', fontsize=7)
ax_d.set_ylabel('Expression gene Y', fontsize=7)
ax_d.tick_params(labelsize=6)
ax_d.spines['top'].set_visible(False)
ax_d.spines['right'].set_visible(False)
# Add lowercase panel labels (Nature style)
for i, ax in enumerate([ax_a, ax_b, ax_c, ax_d]):
    ax.text(-0.2, 1.1, f'{ascii_lowercase[i]}', transform=ax.transAxes,
            fontsize=9, fontweight='bold', va='top')
# Save in Nature-preferred format
fig.savefig('nature_figure.pdf', dpi=1000, bbox_inches='tight',
facecolor='white', edgecolor='none')
fig.savefig('nature_figure.png', dpi=300, bbox_inches='tight',
facecolor='white', edgecolor='none')
plt.show()
```
## Tips for Each Library
### Matplotlib
- Use `fig.tight_layout()` or `constrained_layout=True` to prevent overlapping
- Set DPI to 300-600 for publication
- Use vector formats (PDF, EPS) for line plots
- Embed fonts in PDF/EPS files
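One common way to act on the font-embedding tip is to switch Matplotlib's `pdf.fonttype` and `ps.fonttype` settings to 42 (TrueType) before saving. The sketch below assumes a headless environment (hence the `Agg` backend) and writes to an in-memory buffer instead of a file; the axis labels are placeholders.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Type 42 embeds TrueType fonts instead of the default Type 3 outlines,
# which keeps text editable and is accepted by more submission systems.
plt.rcParams["pdf.fonttype"] = 42
plt.rcParams["ps.fonttype"] = 42

fig, ax = plt.subplots(figsize=(3.5, 2.5), constrained_layout=True)
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("Time (s)")         # placeholder label
ax.set_ylabel("Signal (a.u.)")    # placeholder label

# Save to memory here; pass a filename instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf")
pdf_bytes = buffer.getvalue()
plt.close(fig)
```

Setting the rcParams once at the top of a script applies them to every figure saved afterwards.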
### Seaborn
- Built on matplotlib, so all matplotlib customizations work
- Use `sns.set_style('ticks')` or `'whitegrid'` for clean looks
- `sns.despine()` removes top and right spines
- Set custom palette with `sns.set_palette()`
### Plotly
- Great for interactive exploratory analysis
- Export static images with `fig.write_image()` (requires kaleido package)
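- Use the `scale` parameter to multiply the exported pixel dimensions; the resulting DPI depends on print size (a 500 px wide figure at `scale=3` is 1500 px, or 300 DPI when printed 5 inches wide)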
- Update layout extensively for publication quality
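The relationship between Plotly's `scale` and print resolution is plain arithmetic, sketched below. The helper names (`export_pixels`, `effective_dpi`, `scale_for_dpi`) are hypothetical and not part of Plotly; they only assume that `scale` multiplies the layout width and height in pixels.

```python
from typing import Tuple


def export_pixels(width_px: int, height_px: int, scale: float) -> Tuple[int, int]:
    """Pixel dimensions of the exported image for a given scale factor."""
    return round(width_px * scale), round(height_px * scale)


def effective_dpi(exported_width_px: int, print_width_in: float) -> float:
    """DPI obtained when the exported image is printed at the given width."""
    return exported_width_px / print_width_in


def scale_for_dpi(width_px: int, print_width_in: float, target_dpi: float) -> float:
    """Scale factor needed to reach target_dpi at the given print width."""
    return target_dpi * print_width_in / width_px


width_px, height_px = export_pixels(500, 400, scale=3)
print(width_px, height_px)                      # 1500 1200
print(effective_dpi(width_px, 5.0))             # 300.0
print(round(scale_for_dpi(500, 7.0, 300), 2))   # 4.2
```

The same 500 px layout therefore needs a larger `scale` for a double-column figure than for a single-column one.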
## Common Workflow
1. **Explore with default settings**
2. **Apply publication configuration** (see Setup section)
3. **Create plot with appropriate size** (check journal requirements)
4. **Customize colors** (use colorblind-friendly palettes)
5. **Adjust fonts and line widths** (readable at final size)
6. **Remove chart junk** (top/right spines, excessive grid)
7. **Add clear labels with units**
8. **Test in grayscale**
9. **Save in multiple formats** (PDF for vector, PNG for raster)
10. **Verify in final context** (import into manuscript to check size)
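Step 8 (testing in grayscale) can be approximated numerically before exporting anything. The sketch below converts hex colors to a gray level using Rec. 601 luma weights, which is only one of several possible grayscale conversions, and flags pairs that would look similar in print. The helper names and the 0.1 threshold are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import List, Tuple


def hex_to_gray(hex_color: str) -> float:
    """Approximate gray level in [0, 1] using Rec. 601 luma weights."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return (0.299 * r + 0.587 * g + 0.114 * b) / 255


def similar_in_grayscale(colors: List[str], min_gap: float = 0.1) -> List[Tuple[str, str]]:
    """Return color pairs whose gray levels differ by less than min_gap."""
    return [
        (a, b)
        for a, b in combinations(colors, 2)
        if abs(hex_to_gray(a) - hex_to_gray(b)) < min_gap
    ]


# Colors used in the box plot example
palette = ["#0072B2", "#E69F00", "#009E73", "#D55E00"]
for first, second in similar_in_grayscale(palette):
    print(f"{first} and {second} may be hard to tell apart in grayscale")
```

Pairs flagged this way are candidates for distinct markers, line styles, or hatching rather than a reason to drop the palette.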
## Resources
- Matplotlib documentation: https://matplotlib.org/
- Seaborn gallery: https://seaborn.pydata.org/examples/index.html
- Plotly documentation: https://plotly.com/python/
- Nature Methods Points of View: Data visualization column archive
FILE:references/publication_guidelines.md
# Publication-Ready Figure Guidelines
## Core Principles
Scientific figures must be clear, accurate, and accessible. Publication-ready figures follow these fundamental principles:
1. **Clarity**: Information should be immediately understandable
2. **Accuracy**: Data representation must be truthful and unmanipulated
3. **Accessibility**: Figures should be interpretable by all readers, including those with visual impairments
4. **Professional**: Clean, polished appearance suitable for peer-reviewed journals
## Resolution and File Format
### Resolution Requirements
- **Raster images (photos, microscopy)**: 300-600 DPI at final print size
- **Line art and graphs**: 600-1200 DPI (or vector format)
- **Combined figures**: 300-600 DPI
### File Formats
- **Vector formats (preferred for graphs/plots)**: PDF, EPS, SVG
  - Infinitely scalable without quality loss
  - Smaller file sizes for line art
  - Best for: plots, diagrams, schematics
- **Raster formats**: TIFF, PNG (never JPEG for scientific data)
  - Use for: photographs, microscopy, images with continuous tone
  - TIFF: Lossless, widely accepted
  - PNG: Lossless, good for web and supplementary materials
- **Never use JPEG**: Lossy compression introduces artifacts
### Size Specifications
- **Single column**: 85-90 mm (3.35-3.54 inches) width
- **1.5 column**: 114-120 mm (4.49-4.72 inches) width
- **Double column**: 174-180 mm (6.85-7.09 inches) width
- **Maximum height**: Usually 230-240 mm (9-9.5 inches)
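Because Matplotlib's `figsize` is given in inches while journals specify millimetres, a small conversion helper avoids rounding mistakes. This is a minimal sketch; `mm_to_inches` and `figsize_from_mm` are hypothetical names, and the 89 × 60 mm size is only an example.

```python
from typing import Tuple

MM_PER_INCH = 25.4


def mm_to_inches(mm: float) -> float:
    """Convert millimetres to inches."""
    return mm / MM_PER_INCH


def figsize_from_mm(width_mm: float, height_mm: float) -> Tuple[float, float]:
    """Return a (width, height) tuple in inches, suitable for figsize=..."""
    return mm_to_inches(width_mm), mm_to_inches(height_mm)


# Single-column figure, 89 mm wide and 60 mm tall
size = figsize_from_mm(89, 60)
print(tuple(round(v, 2) for v in size))  # (3.5, 2.36)
```

The result can be passed directly as `plt.subplots(figsize=figsize_from_mm(89, 60))`.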
## Typography
### Font Guidelines
- **Font family**: Sans-serif fonts (Arial, Helvetica, Calibri) for most journals
  - Some journals prefer specific fonts (check guidelines)
  - Consistency across all figures in manuscript
- **Font sizes at final print size**:
  - Axis labels: 7-9 pt minimum
  - Tick labels: 6-8 pt minimum
  - Legends: 6-8 pt
  - Panel labels (A, B, C): 8-12 pt, bold
  - Title: Generally avoided in multi-panel figures
- **Font weight**: Regular weight for most text; bold for panel labels only
### Text Best Practices
- Use sentence case for axis labels ("Time (hours)" not "TIME (HOURS)")
- Include units in parentheses
- Avoid abbreviations unless space-constrained (define in caption)
- No text smaller than 5-6 pt at final size
## Color Usage
### Color Selection Principles
1. **Colorblind-friendly**: ~8% of males have color vision deficiency
   - Avoid red/green combinations
   - Use blue/orange, blue/yellow, or add texture/pattern
   - Test with colorblindness simulators
2. **Purposeful color**: Color should convey meaning, not just aesthetics
   - Use color to distinguish categories or highlight key data
   - Maintain consistency across figures (same treatment = same color)
3. **Print considerations**:
   - Colors may appear different in print vs. screen
   - Use CMYK color space for print, RGB for digital
   - Ensure sufficient contrast (especially for grayscale conversion)
### Recommended Color Palettes
- **Qualitative (categories)**: ColorBrewer, Okabe-Ito palette
- **Sequential (low to high)**: Viridis, Cividis, Blues, Oranges
- **Diverging (negative to positive)**: RdBu, PuOr, BrBG (ensure colorblind-safe)
### Grayscale Compatibility
- All figures should be interpretable in grayscale
- Use different line styles (solid, dashed, dotted) and markers
- Add patterns/hatching to bars and areas
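One way to apply the line-style and hatching advice is to generate a distinct style per series up front. The sketch below builds keyword dictionaries for `ax.plot` and hatch strings for `ax.bar`; the helper names and the particular style lists are assumptions, although the style codes themselves are standard Matplotlib values.

```python
from typing import Dict, List

LINE_STYLES = ["-", "--", ":", "-."]   # solid, dashed, dotted, dash-dot
MARKERS = ["o", "s", "^", "D"]         # circle, square, triangle, diamond
HATCHES = ["", "//", "..", "xx"]       # no hatch, diagonal, dots, crosses


def line_style_kwargs(n_series: int) -> List[Dict[str, str]]:
    """Return one dict of plot kwargs per series, distinct for up to 16 series."""
    k = len(LINE_STYLES)
    return [
        {
            "linestyle": LINE_STYLES[i % k],
            # Shift the marker on each pass so combinations do not repeat
            "marker": MARKERS[(i + i // k) % len(MARKERS)],
        }
        for i in range(n_series)
    ]


def bar_hatches(n_bars: int) -> List[str]:
    """Return one hatch pattern per bar group, cycling through HATCHES."""
    return [HATCHES[i % len(HATCHES)] for i in range(n_bars)]


styles = line_style_kwargs(3)
print(styles[1])  # {'linestyle': '--', 'marker': 's'}
```

Each dictionary can be unpacked into a plotting call, for example `ax.plot(x, y, **styles[0])`, and a hatch string passed as `ax.bar(..., hatch=bar_hatches(3)[1])`.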
## Layout and Composition
### Multi-Panel Figures
- **Panel labels**: Use bold letters in the top-left corner: uppercase (A, B, C) for most journals, lowercase (a, b, c) for Nature-style figures
- **Spacing**: Adequate white space between panels
- **Alignment**: Align panels along edges or axes where possible
- **Sizing**: Related panels should have consistent sizes
- **Arrangement**: Logical flow (left-to-right, top-to-bottom)
### Plot Elements
#### Axes
- **Axis lines**: 0.5-1 pt thickness
- **Tick marks**: Point inward or outward consistently
- **Tick frequency**: Enough to read values, not cluttered (typically 4-7 major ticks)
- **Axis labels**: Required on all plots; state units
- **Axis ranges**: Start from zero for bar charts (unless scientifically inappropriate)
#### Lines and Markers
- **Line width**: 1-2 pt for data lines; 0.5-1 pt for reference lines
- **Marker size**: 3-6 pt, larger than line width
- **Marker types**: Differentiate when multiple series (circles, squares, triangles)
- **Error bars**: 0.5-1 pt width; include caps if appropriate
#### Legends
- **Position**: Inside plot area if space permits, outside otherwise
- **Frame**: Optional; if used, thin line (0.5 pt)
- **Order**: Match order of data appearance (top to bottom or left to right)
- **Content**: Concise descriptions; full details in caption
### White Space and Margins
- Remove unnecessary white space around plots
- Maintain consistent margins
- Use `tight_layout()` or `constrained_layout=True` in matplotlib to trim and balance margins automatically
## Data Representation Best Practices
### Statistical Rigor
- **Error bars**: Always show uncertainty (SD, SEM, CI) and state which in caption
- **Sample size**: Indicate n in figure or caption
- **Significance**: Mark statistical significance clearly (*, **, ***)
- **Replicates**: Show individual data points when possible, not just summary statistics
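The three uncertainty measures named above differ in what they describe, so it helps to compute them side by side. The sketch below uses only the standard library; `summarize` is a hypothetical helper, the sample values are made up, and the confidence interval uses a normal approximation (for small n, a t-distribution critical value would give a wider interval).

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float], confidence: float = 0.95) -> Dict[str, float]:
    """Return n, mean, SD, SEM and an approximate confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)        # sample SD (divides by n - 1)
    sem = sd / math.sqrt(n)              # standard error of the mean
    # Normal approximation: z is about 1.96 for a 95% interval
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "sem": sem,
        "ci_low": mean - z * sem,
        "ci_high": mean + z * sem,
    }


# Illustrative measurements only
result = summarize([98, 102, 100, 104, 96])
print(f"n = {result['n']}, mean = {result['mean']:.1f}, "
      f"SD = {result['sd']:.2f}, SEM = {result['sem']:.2f}")
```

Whichever measure is plotted as the error bar, the caption should name it and report n.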
### Appropriate Chart Types
- **Bar plots**: Comparing discrete categories; always start y-axis at zero
- **Line plots**: Time series or continuous relationships
- **Scatter plots**: Correlation between variables; add regression line if appropriate
- **Box plots**: Distribution comparisons; show outliers
- **Heatmaps**: Matrix data, correlations, expression patterns
- **Violin plots**: Distribution shape comparison (better than box plots for bimodal data)
### Avoiding Distortion
- **No 3D effects**: Distorts perception of values
- **No unnecessary decorations**: No gradients, shadows, or chart junk
- **Consistent scales**: Use same scale for comparable panels
- **No truncated axes**: Unless clearly indicated and scientifically justified
- **Linear vs. log scales**: Choose appropriate scale; always label clearly
## Accessibility
### Colorblind Considerations
- Test with colorblindness simulators (e.g., Coblis online or the Color Oracle desktop app)
- Use patterns/textures in addition to color
- Provide alternative representations in supplementary materials if needed
### Visual Impairment
- High contrast between elements
- Thick enough lines (minimum 0.5 pt)
- Clear, uncluttered layouts
### Data Availability
- Include data tables in supplementary materials
- Provide source data files for graphs
- Consider interactive figures for online supplementary materials
## Common Mistakes to Avoid
1. **Font too small**: Text unreadable at final print size
2. **Low resolution**: Pixelated or blurry images
3. **Chart junk**: Unnecessary grid lines, 3D effects, decorations
4. **Poor color choices**: Red/green combinations, low contrast
5. **Missing elements**: No axis labels, no units, no error bars
6. **Inconsistent styling**: Different fonts/sizes within figure or between figures
7. **Data distortion**: Truncated axes, inappropriate scales, 3D effects
8. **JPEG compression**: Artifacts around text and lines
9. **Too much information**: Cramming too many data series into one plot
10. **Inaccessible legends**: Legends outside the figure boundary after export
## Figure Checklist
Before submission, verify:
- [ ] Resolution meets journal requirements (300+ DPI for raster)
- [ ] File format is acceptable (vector for plots, TIFF/PNG for images)
- [ ] Figure dimensions match journal specifications
- [ ] All text is readable at final size (minimum 6-7 pt)
- [ ] Fonts are consistent and embedded (for PDF/EPS)
- [ ] Colors are colorblind-friendly
- [ ] Figure is interpretable in grayscale
- [ ] All axes are labeled with units
- [ ] Error bars or uncertainty indicators are present
- [ ] Statistical significance is marked if applicable
- [ ] Panel labels are present and consistent (A, B, C)
- [ ] Legend is clear and complete
- [ ] No chart junk or unnecessary elements
- [ ] File naming follows journal conventions
- [ ] Figure caption is comprehensive
- [ ] Source data is available
## Journal-Specific Considerations
Always consult the specific journal's author guidelines. Common variations include:
- **Nature journals**: RGB, 300 DPI minimum, specific size requirements
- **Science**: EPS or high-res TIFF, specific font requirements
- **Cell Press**: PDF or EPS preferred, Arial or Helvetica fonts
- **PLOS**: TIFF or EPS, specific color space requirements
- **ACS journals**: Application files (AI, EPS) or high-res TIFF
See `journal_requirements.md` for detailed specifications from major publishers.
FILE:scripts/figure_export.py
#!/usr/bin/env python3
"""
Figure Export Utilities for Publication-Ready Scientific Figures
This module provides utilities to export matplotlib figures in publication-ready
formats with appropriate settings for various journals.
"""
import matplotlib.pyplot as plt
from pathlib import Path
from typing import List, Optional, Union
def save_publication_figure(
    fig: plt.Figure,
    filename: Union[str, Path],
    formats: Optional[List[str]] = None,
    dpi: int = 300,
    transparent: bool = False,
    bbox_inches: str = 'tight',
    pad_inches: float = 0.1,
    facecolor: str = 'white',
    **kwargs
) -> List[Path]:
    """
    Save a matplotlib figure in multiple formats with publication-quality settings.
    Parameters
    ----------
    fig : matplotlib.figure.Figure
        The figure to save
    filename : str or Path
        Base filename (without extension)
    formats : list of str, optional
        List of file formats to save. Options: 'pdf', 'png', 'eps', 'svg', 'tiff'.
        Defaults to ['pdf', 'png']
    dpi : int, default 300
        Resolution for raster formats (png, tiff). 300 DPI is minimum for most journals
    transparent : bool, default False
        If True, save with transparent background
    bbox_inches : str, default 'tight'
        Bounding box specification. 'tight' removes excess whitespace
    pad_inches : float, default 0.1
        Padding around the figure when bbox_inches='tight'
    facecolor : str, default 'white'
        Background color (ignored if transparent=True)
    **kwargs
        Additional keyword arguments passed to fig.savefig()
    Returns
    -------
    list of Path
        List of paths to saved files
    Examples
    --------
    >>> fig, ax = plt.subplots()
    >>> ax.plot([1, 2, 3], [1, 4, 9])
    >>> paths = save_publication_figure(fig, 'my_plot', formats=['pdf', 'png'], dpi=600)
    >>> [p.name for p in paths]
    ['my_plot.pdf', 'my_plot.png']
    """
    # Avoid a mutable default argument; fall back to PDF + PNG
    if formats is None:
        formats = ['pdf', 'png']
    filename = Path(filename)
    base_name = filename.stem
    # Fall back to the current directory if the target directory does not exist
    output_dir = filename.parent if filename.parent.exists() else Path.cwd()
    saved_files = []
    for fmt in formats:
        output_file = output_dir / f"{base_name}.{fmt}"
        # Set format-specific parameters
        save_kwargs = {
            'dpi': dpi,
            'bbox_inches': bbox_inches,
            'pad_inches': pad_inches,
            'facecolor': facecolor if not transparent else 'none',
            'edgecolor': 'none',
            'transparent': transparent,
            'format': fmt,
        }
        # Update with user-provided kwargs
        save_kwargs.update(kwargs)
        # Adjust DPI for vector formats (DPI less relevant)
        if fmt in ['pdf', 'eps', 'svg']:
            save_kwargs['dpi'] = min(dpi, 300)  # Lower DPI for embedded rasters in vector
        try:
            fig.savefig(output_file, **save_kwargs)
            saved_files.append(output_file)
            print(f"✓ Saved: {output_file}")
        except Exception as e:
            print(f"✗ Failed to save {output_file}: {e}")
    return saved_files
def save_for_journal(
    fig: plt.Figure,
    filename: Union[str, Path],
    journal: str,
    figure_type: str = 'combination'
) -> List[Path]:
    """
    Save figure with journal-specific requirements.
    Parameters
    ----------
    fig : matplotlib.figure.Figure
        The figure to save
    filename : str or Path
        Base filename (without extension)
    journal : str
        Journal name. Options: 'nature', 'science', 'cell', 'plos', 'acs', 'ieee'
    figure_type : str, default 'combination'
        Type of figure. Options: 'line_art', 'photo', 'combination'
    Returns
    -------
    list of Path
        List of paths to saved files
    Examples
    --------
    >>> fig, ax = plt.subplots()
    >>> ax.plot([1, 2, 3], [1, 4, 9])
    >>> save_for_journal(fig, 'figure1', journal='nature', figure_type='line_art')
    """
    journal = journal.lower()
    # Define journal-specific requirements
    journal_specs = {
        'nature': {
            'line_art': {'formats': ['pdf', 'eps'], 'dpi': 1000},
            'photo': {'formats': ['tiff'], 'dpi': 300},
            'combination': {'formats': ['pdf'], 'dpi': 600},
        },
        'science': {
            'line_art': {'formats': ['eps', 'pdf'], 'dpi': 1000},
            'photo': {'formats': ['tiff'], 'dpi': 300},
            'combination': {'formats': ['eps'], 'dpi': 600},
        },
        'cell': {
            'line_art': {'formats': ['pdf', 'eps'], 'dpi': 1000},
            'photo': {'formats': ['tiff'], 'dpi': 300},
            'combination': {'formats': ['pdf'], 'dpi': 600},
        },
        'plos': {
            'line_art': {'formats': ['pdf', 'eps'], 'dpi': 600},
            'photo': {'formats': ['tiff', 'png'], 'dpi': 300},
            'combination': {'formats': ['tiff'], 'dpi': 300},
        },
        'acs': {
            'line_art': {'formats': ['tiff', 'pdf'], 'dpi': 600},
            'photo': {'formats': ['tiff'], 'dpi': 300},
            'combination': {'formats': ['tiff'], 'dpi': 600},
        },
        'ieee': {
            'line_art': {'formats': ['pdf', 'eps'], 'dpi': 600},
            'photo': {'formats': ['tiff'], 'dpi': 300},
            'combination': {'formats': ['pdf'], 'dpi': 300},
        },
    }
    if journal not in journal_specs:
        available = ', '.join(journal_specs.keys())
        raise ValueError(f"Journal '{journal}' not recognized. Available: {available}")
    if figure_type not in journal_specs[journal]:
        available = ', '.join(journal_specs[journal].keys())
        raise ValueError(f"Figure type '{figure_type}' not valid. Available: {available}")
    specs = journal_specs[journal][figure_type]
    print(f"Saving for {journal.upper()} ({figure_type}):")
    print(f" Formats: {', '.join(specs['formats'])}")
    print(f" DPI: {specs['dpi']}")
    return save_publication_figure(
        fig=fig,
        filename=filename,
        formats=specs['formats'],
        dpi=specs['dpi']
    )
def check_figure_size(fig: plt.Figure, journal: str = 'nature') -> dict:
    """
    Check if figure dimensions are appropriate for journal requirements.
    Parameters
    ----------
    fig : matplotlib.figure.Figure
        The figure to check
    journal : str, default 'nature'
        Journal name
    Returns
    -------
    dict
        Dictionary with figure dimensions and compliance status
    Examples
    --------
    >>> fig = plt.figure(figsize=(3.5, 3))
    >>> info = check_figure_size(fig, journal='nature')
    >>> print(info)
    """
    journal = journal.lower()
    # Get figure dimensions in inches
    width_inches, height_inches = fig.get_size_inches()
    width_mm = width_inches * 25.4
    height_mm = height_inches * 25.4
    # Journal specifications (widths in mm)
    specs = {
        'nature': {'single': 89, 'double': 183, 'max_height': 247},
        'science': {'single': 55, 'double': 175, 'max_height': 233},
        'cell': {'single': 85, 'double': 178, 'max_height': 230},
        'plos': {'single': 83, 'double': 173, 'max_height': 233},
        'acs': {'single': 82.5, 'double': 178, 'max_height': 247},
    }
    if journal not in specs:
        journal_spec = specs['nature']
        print(f"Warning: Journal '{journal}' not found, using Nature specifications")
    else:
        journal_spec = specs[journal]
    # Determine column type
    column_type = None
    width_ok = False
    tolerance = 5  # mm tolerance
    if abs(width_mm - journal_spec['single']) < tolerance:
        column_type = 'single'
        width_ok = True
    elif abs(width_mm - journal_spec['double']) < tolerance:
        column_type = 'double'
        width_ok = True
    height_ok = height_mm <= journal_spec['max_height']
    result = {
        'width_inches': width_inches,
        'height_inches': height_inches,
        'width_mm': width_mm,
        'height_mm': height_mm,
        'journal': journal,
        'column_type': column_type,
        'width_ok': width_ok,
        'height_ok': height_ok,
        'compliant': width_ok and height_ok,
        'recommendations': {
            'single_column_mm': journal_spec['single'],
            'double_column_mm': journal_spec['double'],
            'max_height_mm': journal_spec['max_height'],
        }
    }
    # Print report
    print(f"\n{'='*60}")
    print(f"Figure Size Check for {journal.upper()}")
    print(f"{'='*60}")
    print(f"Current size: {width_mm:.1f} × {height_mm:.1f} mm")
    print(f" ({width_inches:.2f} × {height_inches:.2f} inches)")
    print(f"\n{journal.upper()} specifications:")
    print(f" Single column: {journal_spec['single']} mm")
    print(f" Double column: {journal_spec['double']} mm")
    print(f" Max height: {journal_spec['max_height']} mm")
    print(f"\nCompliance:")
    print(f" Width: {'✓ OK' if width_ok else '✗ Non-standard'} ({column_type or 'custom'})")
    print(f" Height: {'✓ OK' if height_ok else '✗ Too tall'}")
    print(f" Overall: {'✓ COMPLIANT' if result['compliant'] else '✗ NEEDS ADJUSTMENT'}")
    print(f"{'='*60}\n")
    return result
def verify_font_embedding(pdf_path: Union[str, Path]) -> Optional[bool]:
    """
    Check that a PDF file can be opened, as a first step toward verifying font embedding.
    Note: This requires PyPDF2 or a similar library to be installed. The check is
    simplified: it confirms the PDF is readable but does not inspect font objects.
    Parameters
    ----------
    pdf_path : str or Path
        Path to PDF file
    Returns
    -------
    bool or None
        True if the PDF was read successfully, False if reading failed,
        None if PyPDF2 is not installed
    """
    try:
        from PyPDF2 import PdfReader
    except ImportError:
        print("Warning: PyPDF2 not installed. Cannot verify font embedding.")
        print("Install with: pip install PyPDF2")
        return None
    pdf_path = Path(pdf_path)
    try:
        reader = PdfReader(pdf_path)
        # This is a simplified check; full verification is complex
        print(f"PDF has {len(reader.pages)} page(s)")
        print("Note: Full font embedding verification requires detailed PDF inspection.")
        return True
    except Exception as e:
        print(f"Error reading PDF: {e}")
        return False
if __name__ == "__main__":
    # Example usage
    import numpy as np
    # Create example figure
    fig, ax = plt.subplots(figsize=(3.5, 2.5))
    x = np.linspace(0, 10, 100)
    ax.plot(x, np.sin(x), label='sin(x)')
    ax.plot(x, np.cos(x), label='cos(x)')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    # Check size
    check_figure_size(fig, journal='nature')
    # Save in multiple formats
    print("\nSaving figure...")
    save_publication_figure(fig, 'example_figure', formats=['pdf', 'png'], dpi=300)
    # Save with journal-specific requirements
    print("\nSaving for Nature...")
    save_for_journal(fig, 'example_figure_nature', journal='nature', figure_type='line_art')
    plt.close(fig)
FILE:scripts/style_presets.py
#!/usr/bin/env python3
"""
Matplotlib Style Presets for Publication-Ready Scientific Figures
This module provides pre-configured matplotlib styles optimized for
different journals and use cases.
"""
import matplotlib.pyplot as plt
import matplotlib as mpl
from typing import Optional, Dict, Any
# Okabe-Ito colorblind-friendly palette
OKABE_ITO_COLORS = [
'#E69F00', # Orange
'#56B4E9', # Sky Blue
'#009E73', # Bluish Green
'#F0E442', # Yellow
'#0072B2', # Blue
'#D55E00', # Vermillion
'#CC79A7', # Reddish Purple
'#000000' # Black
]
# Paul Tol palettes
TOL_BRIGHT = ['#4477AA', '#EE6677', '#228833', '#CCBB44', '#66CCEE', '#AA3377', '#BBBBBB']
TOL_MUTED = ['#332288', '#88CCEE', '#44AA99', '#117733', '#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499']
TOL_HIGH_CONTRAST = ['#004488', '#DDAA33', '#BB5566']
# Wong palette
WONG_COLORS = ['#000000', '#E69F00', '#56B4E9', '#009E73', '#F0E442', '#0072B2', '#D55E00', '#CC79A7']
def get_base_style() -> Dict[str, Any]:
    """
    Get base publication-quality style settings.
    Returns
    -------
    dict
        Dictionary of matplotlib rcParams
    """
    return {
# Figure
'figure.dpi': 100, # Display DPI (changed on save)
'figure.facecolor': 'white',
'figure.autolayout': False,
'figure.constrained_layout.use': True,
# Font
'font.size': 8,
'font.family': 'sans-serif',
'font.sans-serif': ['Arial', 'Helvetica', 'DejaVu Sans'],
# Axes
'axes.linewidth': 0.5,
'axes.labelsize': 9,
'axes.titlesize': 9,
'axes.labelweight': 'normal',
'axes.spines.top': False,
'axes.spines.right': False,
'axes.spines.left': True,
'axes.spines.bottom': True,
'axes.edgecolor': 'black',
'axes.labelcolor': 'black',
'axes.axisbelow': True,
'axes.prop_cycle': mpl.cycler(color=OKABE_ITO_COLORS),
# Grid
'axes.grid': False,
# Ticks
'xtick.major.size': 3,
'xtick.minor.size': 2,
'xtick.major.width': 0.5,
'xtick.minor.width': 0.5,
'xtick.labelsize': 7,
'xtick.direction': 'out',
'ytick.major.size': 3,
'ytick.minor.size': 2,
'ytick.major.width': 0.5,
'ytick.minor.width': 0.5,
'ytick.labelsize': 7,
'ytick.direction': 'out',
# Lines
'lines.linewidth': 1.5,
'lines.markersize': 4,
'lines.markeredgewidth': 0.5,
# Legend
'legend.fontsize': 7,
'legend.frameon': False,
'legend.loc': 'best',
# Savefig
'savefig.dpi': 300,
'savefig.format': 'pdf',
'savefig.bbox': 'tight',
'savefig.pad_inches': 0.05,
'savefig.transparent': False,
'savefig.facecolor': 'white',
# Image
'image.cmap': 'viridis',
'image.aspect': 'auto',
}
def apply_publication_style(style_name: str = 'default') -> None:
    """
    Apply a pre-configured publication style.
    Parameters
    ----------
    style_name : str, default 'default'
        Name of the style to apply. Options:
        - 'default': General publication style
        - 'nature': Nature journal style
        - 'science': Science journal style
        - 'cell': Cell Press style
        - 'minimal': Minimal clean style
        - 'presentation': Larger fonts for presentations
    Examples
    --------
    >>> apply_publication_style('nature')
    >>> fig, ax = plt.subplots()
    >>> ax.plot([1, 2, 3], [1, 4, 9])
    """
    base_style = get_base_style()
    # Style-specific modifications
    if style_name == 'nature':
        base_style.update({
            'font.size': 7,
            'axes.labelsize': 8,
            'axes.titlesize': 8,
            'xtick.labelsize': 6,
            'ytick.labelsize': 6,
            'legend.fontsize': 6,
            'savefig.dpi': 600,
        })
    elif style_name == 'science':
        base_style.update({
            'font.size': 7,
            'axes.labelsize': 8,
            'xtick.labelsize': 6,
            'ytick.labelsize': 6,
            'legend.fontsize': 6,
            'savefig.dpi': 600,
        })
    elif style_name == 'cell':
        base_style.update({
            'font.size': 8,
            'axes.labelsize': 9,
            'xtick.labelsize': 7,
            'ytick.labelsize': 7,
            'legend.fontsize': 7,
            'savefig.dpi': 600,
        })
    elif style_name == 'minimal':
        base_style.update({
            'axes.linewidth': 0.8,
            'xtick.major.width': 0.8,
            'ytick.major.width': 0.8,
            'lines.linewidth': 2,
        })
    elif style_name == 'presentation':
        base_style.update({
            'font.size': 14,
            'axes.labelsize': 16,
            'axes.titlesize': 18,
            'xtick.labelsize': 12,
            'ytick.labelsize': 12,
            'legend.fontsize': 12,
            'axes.linewidth': 1.5,
            'lines.linewidth': 2.5,
            'lines.markersize': 8,
        })
    elif style_name != 'default':
        print(f"Warning: Style '{style_name}' not recognized. Using 'default'.")
    # Apply the style
    plt.rcParams.update(base_style)
    print(f"✓ Applied '{style_name}' publication style")
def set_color_palette(palette_name: str = 'okabe_ito') -> None:
    """
    Set a colorblind-friendly color palette.
    Parameters
    ----------
    palette_name : str, default 'okabe_ito'
        Name of the palette. Options:
        - 'okabe_ito': Okabe-Ito palette (8 colors)
        - 'wong': Wong palette (8 colors)
        - 'tol_bright': Paul Tol bright palette (7 colors)
        - 'tol_muted': Paul Tol muted palette (9 colors)
        - 'tol_high_contrast': Paul Tol high contrast (3 colors)
    Examples
    --------
    >>> set_color_palette('tol_muted')
    >>> fig, ax = plt.subplots()
    >>> for i in range(5):
    ...     ax.plot([1, 2, 3], [i, i+1, i+2])
    """
    palettes = {
        'okabe_ito': OKABE_ITO_COLORS,
        'wong': WONG_COLORS,
        'tol_bright': TOL_BRIGHT,
        'tol_muted': TOL_MUTED,
        'tol_high_contrast': TOL_HIGH_CONTRAST,
    }
    if palette_name not in palettes:
        available = ', '.join(palettes.keys())
        print(f"Warning: Palette '{palette_name}' not found. Available: {available}")
        palette_name = 'okabe_ito'
    colors = palettes[palette_name]
    plt.rcParams['axes.prop_cycle'] = plt.cycler(color=colors)
    print(f"✓ Applied '{palette_name}' color palette ({len(colors)} colors)")
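# Illustrative sketch, not part of the original module: matplotlib's property
# cycle wraps around once a palette runs out of colors, so a short palette such
# as TOL_HIGH_CONTRAST (3 colors) repeats from the fourth series onward. The
# helper name `colors_for_series` is hypothetical and only mirrors that behavior
# in plain Python; it does not touch rcParams.
from itertools import cycle, islice
from typing import List


def colors_for_series(palette: List[str], n_series: int) -> List[str]:
    """Return the colors that n_series consecutive plot calls would receive."""
    return list(islice(cycle(palette), n_series))
# Usage: colors_for_series(['#004488', '#DDAA33', '#BB5566'], 4) ends with
# '#004488' again, a hint to pick a larger palette when plotting more than 3 series.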
def configure_for_journal(journal: str, figure_width: str = 'single') -> None:
    """
    Configure matplotlib for a specific journal.
    Parameters
    ----------
    journal : str
        Journal name: 'nature', 'science', 'cell', 'plos', 'acs', 'ieee'
    figure_width : str, default 'single'
        Figure width: 'single' or 'double' column
    Raises
    ------
    ValueError
        If `journal` or `figure_width` is not one of the supported values
    Examples
    --------
    >>> configure_for_journal('nature', figure_width='single')
    >>> fig, ax = plt.subplots()  # Will have correct size for Nature
    """
    journal = journal.lower()
    # Journal specifications
    journal_configs = {
        'nature': {
            'single_width': 89,  # mm
            'double_width': 183,
            'style': 'nature',
        },
        'science': {
            'single_width': 55,
            'double_width': 175,
            'style': 'science',
        },
        'cell': {
            'single_width': 85,
            'double_width': 178,
            'style': 'cell',
        },
        'plos': {
            'single_width': 83,
            'double_width': 173,
            'style': 'default',
        },
        'acs': {
            'single_width': 82.5,
            'double_width': 178,
            'style': 'default',
        },
        'ieee': {
            'single_width': 89,
            'double_width': 182,
            'style': 'default',
        },
    }
    if journal not in journal_configs:
        available = ', '.join(journal_configs.keys())
        raise ValueError(f"Journal '{journal}' not recognized. Available: {available}")
    # Reject unknown widths instead of silently treating them as 'double'
    if figure_width not in ('single', 'double'):
        raise ValueError(f"figure_width must be 'single' or 'double', got '{figure_width}'")
    config = journal_configs[journal]
    # Apply style
    apply_publication_style(config['style'])
    # Set default figure size
    width_mm = config['single_width'] if figure_width == 'single' else config['double_width']
    width_inches = width_mm / 25.4
    plt.rcParams['figure.figsize'] = (width_inches, width_inches * 0.75)  # 4:3 aspect ratio
    print(f"✓ Configured for {journal.upper()} ({figure_width} column: {width_mm} mm)")
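# Illustrative sketch, not part of the original module: the figure-size
# arithmetic used by configure_for_journal, pulled out as a standalone function.
# The name `figure_size_inches` and its `aspect` parameter are hypothetical;
# the 25.4 mm-per-inch conversion and the 0.75 height/width ratio are the ones
# configure_for_journal applies.
from typing import Tuple


def figure_size_inches(width_mm: float, aspect: float = 0.75) -> Tuple[float, float]:
    """Convert a column width in millimetres to a (width, height) size in inches."""
    width_inches = width_mm / 25.4
    return (width_inches, width_inches * aspect)
# Usage: figure_size_inches(89) gives roughly a 3.50 x 2.63 inch figure, which
# could be passed as figsize= when one figure needs a size other than the rcParams default.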
def create_style_template(output_file: str = 'publication.mplstyle') -> None:
    """
    Create a matplotlib style file that can be used with plt.style.use().
    Parameters
    ----------
    output_file : str, default 'publication.mplstyle'
        Output filename for the style file
    Examples
    --------
    >>> create_style_template('my_style.mplstyle')
    >>> plt.style.use('my_style.mplstyle')
    """
    style = get_base_style()
    with open(output_file, 'w') as f:
        f.write("# Publication-quality matplotlib style\n")
        f.write(f"# Usage: plt.style.use('{output_file}')\n\n")
        for key, value in style.items():
            if key == 'axes.prop_cycle':
                # mpl.cycler is a function, not a class, so isinstance() cannot test for it;
                # match on the key instead. Strip the leading '#' from hex codes because
                # '#' starts a comment in style files.
                colors = [c['color'].lstrip('#') for c in value]
                f.write(f"axes.prop_cycle : cycler('color', {colors})\n")
            elif isinstance(value, (list, tuple)):
                # Style files expect comma-separated lists, not Python list reprs
                f.write(f"{key} : {', '.join(str(v) for v in value)}\n")
            else:
                f.write(f"{key} : {value}\n")
    print(f"✓ Created style template: {output_file}")
    print(f"  Use with: plt.style.use('{output_file}')")
def show_color_palettes() -> None:
    """
    Display available color palettes for visual inspection.
    """
    palettes = {
        'Okabe-Ito': OKABE_ITO_COLORS,
        'Wong': WONG_COLORS,
        'Tol Bright': TOL_BRIGHT,
        'Tol Muted': TOL_MUTED,
        'Tol High Contrast': TOL_HIGH_CONTRAST,
    }
    # Use constrained layout here rather than plt.tight_layout(), which conflicts with
    # the constrained layout enabled by the publication styles
    fig, axes = plt.subplots(len(palettes), 1, figsize=(8, len(palettes) * 0.5),
                             constrained_layout=True)
    for ax, (name, colors) in zip(axes, palettes.items()):
        ax.set_xlim(0, len(colors))
        ax.set_ylim(0, 1)
        ax.set_yticks([])
        ax.set_xticks([])
        ax.set_ylabel(name, fontsize=10)
        for i, color in enumerate(colors):
            ax.add_patch(plt.Rectangle((i, 0), 1, 1, facecolor=color, edgecolor='black', linewidth=0.5))
            # Add hex code; use white text on dark swatches (black is not always the
            # last color, e.g. it comes first in the Wong palette)
            r, g, b = mpl.colors.to_rgb(color)
            luminance = 0.299 * r + 0.587 * g + 0.114 * b
            ax.text(i + 0.5, 0.5, color, ha='center', va='center',
                    fontsize=7, color='white' if luminance < 0.5 else 'black')
    fig.suptitle('Colorblind-Friendly Palettes', fontsize=12, fontweight='bold')
    plt.show()
def reset_to_default() -> None:
    """
    Reset matplotlib to default settings.
    """
    mpl.rcdefaults()
    print("✓ Reset to matplotlib defaults")
if __name__ == "__main__":
    print("Matplotlib Style Presets for Scientific Figures")
    print("=" * 50)
    # Show available styles
    print("\nAvailable publication styles:")
    print("  - default")
    print("  - nature")
    print("  - science")
    print("  - cell")
    print("  - minimal")
    print("  - presentation")
    print("\nAvailable color palettes:")
    print("  - okabe_ito (recommended)")
    print("  - wong")
    print("  - tol_bright")
    print("  - tol_muted")
    print("  - tol_high_contrast")
    print("\nExample usage:")
    print("  from style_presets import apply_publication_style, set_color_palette")
    print("  apply_publication_style('nature')")
    print("  set_color_palette('okabe_ito')")
    # Create example figure
    print("\nGenerating example figure with 'default' style...")
    apply_publication_style('default')
    fig, ax = plt.subplots(figsize=(3.5, 2.5))
    for i in range(5):
        ax.plot([1, 2, 3, 4], [i, i+1, i+0.5, i+2], marker='o', label=f'Series {i+1}')
    ax.set_xlabel('Time (hours)')
    ax.set_ylabel('Response (AU)')
    ax.legend()
    fig.suptitle('Example with Publication Style')
    # No plt.tight_layout() here: the applied style already enables constrained layout,
    # and mixing the two layout engines triggers a matplotlib warning
    plt.show()
    # Show color palettes
    print("\nDisplaying color palettes...")
    show_color_palettes()