IMAP Snakemake workflow series

General Overview

IMAP stands for Integrated Microbiome Analysis Pipelines. IMAP consists of several parts, each of which is a standalone GitHub repository. Used sequentially, the parts provide a systematic microbiome data analysis that goes beyond traditional methods.



IMAP tentative parts: each part forms a standalone Git repository with a similar project structure.


IMAP approach

  • We use the Snakemake workflow management system [1,2] to:
    • Maintain reproducibility, so that results can be technically validated and regenerated.
    • Build scalable data analyses that run on a server, grid, or cloud environment.
    • Foster sustainable improvement of microbiome data analysis.
  • We break complex workflows into small, contiguous, related chunks, with each major step implemented as a separate executable Snakemake rule.
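To make the idea of chained rules concrete, here is a toy Python sketch (not Snakemake code) of how a workflow manager can derive the order of steps from files alone: each rule declares the files it consumes and produces, and a rule runs only after the rules that produce its inputs. Snakemake builds a comparable dependency graph (DAG) from rule inputs and outputs; the rule names and file paths below are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules: each rule lists the files it consumes and produces.
RULES = {
    "download_reads": {"input": [], "output": ["data/reads/sample1.fastq.gz"]},
    "quality_control": {
        "input": ["data/reads/sample1.fastq.gz"],
        "output": ["results/qc/sample1_trimmed.fastq.gz"],
    },
    "classify": {
        "input": ["results/qc/sample1_trimmed.fastq.gz"],
        "output": ["results/taxonomy/sample1.tsv"],
    },
}


def execution_order(rules):
    """Return rule names so that every rule comes after the rules producing its inputs."""
    # Map each output file to the rule that produces it.
    producer = {out: name for name, rule in rules.items() for out in rule["output"]}
    # For each rule, its predecessors are the producers of its input files
    # (inputs with no producer, e.g. raw data already on disk, add no edge).
    graph = {
        name: {producer[f] for f in rule["input"] if f in producer}
        for name, rule in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(RULES))  # → ['download_reads', 'quality_control', 'classify']
```

Because each step only names files, a step can be replaced or rerun independently as long as it keeps producing the files that downstream steps expect.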


Mission and Vision

We envision fostering continuous integration and improvement of highly reproducible and sustainable workflows for microbiome data analysis.

IMAP Project Structure

Note: This structure shows the basic folders and their contents. Folders or files may be added or removed as each project requires.

IMAP_Project_Directories
├── LICENSE.md
├── README.md
├── config
│   ├── config.yaml
│   ├── samples.tsv
│   └── units.tsv
├── data
│   ├── metadata
│   │   └── metadata.csv
│   └── reads
├── figures
│   ├── fig.pdf
│   ├── fig.png
│   └── fig.svg
├── images
│   ├── img.pdf
│   ├── img.png
│   └── img.svg
├── index.Rmd
├── library
│   ├── apa.csl
│   ├── imap.bib
│   └── references.bib
├── resources
├── results
└── workflow
    ├── Snakefile
    ├── envs
    │   ├── pipeline.yml
    │   └── tool.yml
    ├── notebooks
    │   ├── jnb.py.ipynb
    │   └── jnb.r.ipynb
    ├── reports
    │   ├── plot1.rst
    │   └── plot2.rst
    ├── rules
    │   ├── rule1.smk
    │   └── rule2.smk
    ├── schemas
    │   ├── schm1.yml
    │   └── schm2.yml
    └── scripts
        ├── Rmd.Rmd
        ├── bash.sh
        ├── python.py
        └── rscript.R

16 directories, 31 files
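A new project can be bootstrapped from this layout. The sketch below assumes Python 3 and uses only the standard library; it creates the 16 directories shown above plus a few empty placeholder files. The function name `scaffold` and the choice of placeholder files are illustrative assumptions, not part of IMAP itself.

```python
from pathlib import Path

# Leaf directories from the IMAP project structure; parent folders
# ("data", "workflow") are created automatically by mkdir(parents=True).
IMAP_DIRS = [
    "config",
    "data/metadata",
    "data/reads",
    "figures",
    "images",
    "library",
    "resources",
    "results",
    "workflow/envs",
    "workflow/notebooks",
    "workflow/reports",
    "workflow/rules",
    "workflow/schemas",
    "workflow/scripts",
]

# Hypothetical selection of empty placeholder files; real content is project-specific.
IMAP_FILES = [
    "README.md",
    "config/config.yaml",
    "config/samples.tsv",
    "config/units.tsv",
    "workflow/Snakefile",
]


def scaffold(root):
    """Create the IMAP directory skeleton under `root` and return it as a Path."""
    root = Path(root)
    for d in IMAP_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    for f in IMAP_FILES:
        (root / f).touch(exist_ok=True)  # leaves existing files untouched
    return root
```

Calling `scaffold("IMAP_Project_Directories")` is safe to repeat, since existing directories and files are left in place.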

Potential Workflows

Repo | Description | Status
IMAP-GLIMPSE | IMAP project overview | In progress
IMAP-PART 01 | Software requirement for microbiome data analysis with Snakemake workflows | In progress
IMAP-PART 02 | Downloading and exploring microbiome sample metadata from SRA Database | In progress
IMAP-PART 03 | Downloading and filtering microbiome sequencing data from SRA database | In progress
IMAP-PART 04 | Quality Control of Microbiome Next Generation Sequencing Reads | In progress
IMAP-PART 05 | Bioinformatics & classification of preprocessed microbiome sequencing data | In progress
IMAP-PART 06 | | In progress
IMAP-PART 07 | | In progress
IMAP-PART 08 | | In progress

Citation

Please consider citing the iMAP article [3] if you find any part of the IMAP practical user guides helpful in your microbiome data analysis.


References

[1] Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., … Nahnsen, S. (2021). Sustainable data analysis with Snakemake. F1000Research, 10. https://doi.org/10.12688/f1000research.29032.2
[2] Snakemake. (2023). Snakemake. Retrieved from https://snakemake.readthedocs.io/en/stable
[3] Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C., Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics, 20. https://doi.org/10.1186/S12859-019-2965-4