Latest version

01.09.2023 - v1.0.0, get it here

Download links for the new release have been updated; the rest of the user guide is being updated gradually. Check what has changed on the Releases page.

Manual for PipeCraft2

PipeCraft2 is a Graphical User Interface (GUI) software that implements various popular tools for metabarcoding data analyses.
It provides various ready-to-run (pre-defined) pipelines as well as an option to run a variety of individual steps outside of a full pipeline.


Software settings for pipeline processes contain key options for metabarcoding sequence data analyses, but all options of any implemented program may be accessed via the PipeCraft console (command line).
Default settings in the panels represent commonly used options for amplicon sequence data analyses, which may be tailored according to user experience or needs. Custom-designed pipeline settings can be saved, so the exact same pipeline may be easily re-run on other sequencing data (and, for reproducibility, may be included as supplementary material in a manuscript). PipeCraft can execute the full pipeline (the user specifies the input, and the output will be e.g. an OTU/ASV table with taxonomic annotations of the generated features), but it also supports a single-step mode (Quick Tools panel) where analyses may be performed step by step (e.g. perform quality filtering, then examine the output and decide whether to adjust the quality filtering options or to proceed with the next step, e.g. chimera filtering).

Glossary

List of terms that you may encounter in this user guide.

working directory

the directory (folder) that contains the files for the analyses.
The outputs will be written into this directory

paired-end data

obtained by sequencing two ends of the same DNA fragment,
which results in read 1 (R1) and read 2 (R2) files per library or per sample.
Note that PipeCraft expects that read 1 file contains the string R1
and read 2 contains R2
(not e.g. my_sample_L001_1.fastq / my_sample_L001_2.fastq)
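
The R1/R2 naming expectation can be checked before selecting a working directory. The following is a minimal Python sketch and not part of PipeCraft2: the function name pair_reads and the pairing rule (replace the first occurrence of R1 or R2 in the file name and match the remainder) are assumptions made for illustration, and PipeCraft's own matching logic may differ.

```python
def pair_reads(file_names):
    """Group file names into (R1, R2) pairs.

    Returns (pairs, unpaired). Hypothetical helper: a file counts as
    read 1 if its name contains "R1", and as read 2 if it contains "R2".
    """
    r1_files = {}
    r2_files = {}
    unpaired = []
    for name in file_names:
        if "R1" in name:
            # Mask the read tag so both mates share the same key.
            r1_files[name.replace("R1", "R?", 1)] = name
        elif "R2" in name:
            r2_files[name.replace("R2", "R?", 1)] = name
        else:
            unpaired.append(name)

    pairs = []
    for key in sorted(r1_files):
        if key in r2_files:
            pairs.append((r1_files[key], r2_files[key]))
        else:
            unpaired.append(r1_files[key])
    # R2 files whose R1 mate is missing are also reported as unpaired.
    unpaired.extend(r2_files[k] for k in sorted(r2_files) if k not in r1_files)
    return pairs, unpaired
```

With this sketch, my_sample_R1.fastq and my_sample_R2.fastq are returned as a pair, whereas my_sample_L001_1.fastq and my_sample_L001_2.fastq end up in the unpaired list and would need renaming.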

single-end data

only one sequencing file per library or per sample.
Herein, this may also mean assembled paired-end data.

demultiplexed data

sequences are sorted into separate files, representing individual samples

multiplexed data

file(s) that represent a pool of sequences from different samples

read/sequence

DNA sequence; herein, reads and sequences are used interchangeably

Docker images

All processes are run through Docker; the PipeCraft GUI simply mediates the information exchange. Therefore, whenever a process is initiated for the first time, the relevant Docker image (which contains the software required for that analysis step) will be pulled from Docker Hub. The initial PipeCraft2 installation does not contain any software for sequence data processing.

Example: when running DEMULTIPLEXING for the first time (screenshot)

Thus, a working Internet connection is initially required. Once the Docker images are pulled, PipeCraft2 can work without an Internet connection.

Docker images vary in size, and the first run of a process takes longer because it includes the Docker image download time.
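
Before working offline, it may be useful to check whether a given image is already present locally. The sketch below is an illustration only, under stated assumptions: it calls the standard `docker image inspect` command through Python's subprocess module, the function name image_is_local is hypothetical, and the image name in the usage comment is a placeholder, not an actual PipeCraft2 image name.

```python
import subprocess


def image_is_local(image, run=subprocess.run):
    """Return True if `docker image inspect <image>` exits with code 0.

    `run` is injectable so the function can be exercised without Docker.
    """
    try:
        result = run(
            ["docker", "image", "inspect", image],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # The docker executable itself was not found on this machine.
        return False
    return result.returncode == 0


# Usage (placeholder image name, replace with the image you need):
# print(image_is_local("some_repository/some_image:tag"))
```

If the function returns False, the image will be pulled from Docker Hub the first time the corresponding process is started, which requires an Internet connection.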


Save workflow

Once the workflow settings are selected, save the workflow by pressing the SAVE WORKFLOW button on the right-ribbon.

Note

Starting from version 0.1.4, PipeCraft2 automatically saves the settings into the selected WORKDIR prior to starting the analyses (file name = “pipecraft2_config.json”).

Important

When saving workflow settings in Linux, specify the file extension as json (e.g. my_16S_ASVs_pipe.json). When loading a workflow, only .JSON files are permitted as input. Windows and Mac OS automatically add the json extension (so you may just save “my_16S_ASVs_pipe”).
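
The extension requirement can be illustrated with a short Python sketch. It is not PipeCraft2 code: the helper names with_json_extension and read_workflow are hypothetical, the extension check is assumed to be case-insensitive, and the saved file is treated as generic JSON because its internal layout is not described here.

```python
import json
from pathlib import Path


def with_json_extension(file_name):
    """Return the path with a .json suffix appended if it is missing."""
    path = Path(file_name)
    if path.suffix.lower() == ".json":
        return path
    return path.with_name(path.name + ".json")


def read_workflow(file_name):
    """Parse a saved workflow file, refusing anything that is not .json."""
    path = Path(file_name)
    if path.suffix.lower() != ".json":
        raise ValueError(f"expected a .json file, got: {path.name}")
    with path.open(encoding="utf-8") as handle:
        # Assumption: the file is plain JSON; no specific keys are expected.
        return json.load(handle)
```

For example, with_json_extension("my_16S_ASVs_pipe") gives my_16S_ASVs_pipe.json, which mirrors what Windows and Mac OS do automatically when saving.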


Load workflow

Note

Before loading a workflow, make sure that the saved workflow configuration has a .json extension. Note also that workflows saved in an older PipeCraft2 version might not run in a newer version, but the selected options will still be visible.

Press the LOAD WORKFLOW button on the right-ribbon and select the appropriate JSON file. The configuration will be loaded; then SELECT WORKDIR and run the analyses.


Manual may contain some typos! Fixing those on the way.


Currently implemented software

See software version on the ‘Releases’ page

==============  ===========================  ========================================================================================
Software        Reference                    Task
==============  ===========================  ========================================================================================
docker          https://www.docker.com       building, sharing and running applications
DADA2           Callahan et al. 2016         full pipeline operations
vsearch         Rognes et al. 2016           quality filtering, assemble paired-end reads, chimera filtering, clustering
NextITS         Mikryukov et al.             pipeline for fungal full-ITS (PacBio); not available in Mac version of PipeCraft2
trimmomatic     Bolger et al. 2014           quality filtering
fastp           Chen et al. 2018             quality filtering
seqkit          Shen et al. 2016             multiple sequence manipulation operations
cutadapt        Martin 2011                  demultiplexing, cut primers
biopython       Cock et al. 2009             multiple sequence manipulation operations
GNU Parallel    Tange 2021                   executing jobs in parallel
mothur          Schloss et al. 2009          submodule in ITSx to make unique and deunique seqs
ITS Extractor   Bengtsson-Palme et al. 2013  extract ITS regions
fqgrep          Indraniel Das 2011           core for reorient reads
BLAST           Camacho et al. 2009          assign taxonomy
RDP classifier  Wang et al. 2007             assign taxonomy
ORFfinder       NCBI Tool                    finding open reading frames of protein coding genes (filtering pseudogenes/off-targets)
HMMER           Web site                     HMM-based filtering of the sequences (filtering pseudogenes/off-targets)
FastQC          Andrews 2019                 QualityCheck module
MultiQC         Ewels et al. 2016            QualityCheck module
LULU            Frøslev et al. 2017          post-clustering curation
DEICODE         Martino et al. 2019          dissimilarity analysis
==============  ===========================  ========================================================================================

Let us know if you would like to have a specific software implemented in PipeCraft (contacts), or create an issue in the main repository.