
Latest version

01.09.2023 - v1.0.0, get it here

Download links for the new release have been updated; the rest of the user guide is being updated gradually. Check what has changed on the Releases page.

Manual for PipeCraft2

PipeCraft2 is a Graphical User Interface (GUI) software that implements various popular tools for metabarcoding data analyses.

It implements various ready-to-run (pre-defined) pipelines and also offers the option to run a variety of individual steps outside of a full pipeline.

Software settings for pipeline processes contain key options for metabarcoding sequence data analyses, but all options of any implemented program may be accessed via PipeCraft console (command line).

Default settings in the panels represent commonly used options for amplicon sequence data analyses and may be tailored to user experience or needs. Custom-designed pipeline settings can be saved, so the exact same pipeline can easily be re-run on other sequencing data (and, for reproducibility, provided as supplementary material with a manuscript). PipeCraft can execute a full pipeline (the user specifies the input, and the output is e.g. an OTU/ASV table with taxonomic annotations of the generated features), and it also supports a single-step mode (Quick Tools panel) in which analyses are performed step by step (e.g. perform quality filtering, examine the output, and then decide whether to adjust the quality filtering options or to proceed with the next step, e.g. chimera filtering).

Glossary

List of terms that you may encounter in this user guide.

working directory	the directory (folder) that contains the files for the analyses. The outputs will be written into this directory
paired-end data	obtained by sequencing two ends of the same DNA fragment, which results in read 1 (R1) and read 2 (R2) files per library or per sample. Note that PipeCraft expects that read 1 file contains the string R1 and read 2 contains R2 (not e.g. my_sample_L001_1.fastq / my_sample_L001_2.fastq)
single-end data	only one sequencing file per library or per sample. Herein, this may also refer to assembled paired-end data.
demultiplexed data	sequences are sorted into separate files, representing individual samples
multiplexed data	file(s) that represent a pool of sequences from different samples
read/sequence	DNA sequence; herein, reads and sequences are used interchangeably
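
The R1/R2 naming expectation described for paired-end data can be checked before starting an analysis. The following is a minimal sketch, not part of PipeCraft2: the function name `pair_reads` is hypothetical, and it assumes (more strictly than the glossary's "contains the string R1") that the read tag appears as a delimited token such as `_R1_` or `_R1.` in the file name.

```python
import re
from pathlib import Path

# Assumption: the read tag is "R1" or "R2" and is not embedded in a longer
# alphanumeric word (e.g. "sample_R1_001.fastq" or "sample.R2.fastq").
READ_TAG = re.compile(r"(?<![A-Za-z0-9])R([12])(?![A-Za-z0-9])")


def pair_reads(filenames):
    """Group file names by sample and collect names without an R1/R2 tag.

    Returns (pairs, unmatched), where pairs maps a sample key to a dict
    such as {"R1": "s1_R1.fastq", "R2": "s1_R2.fastq"}.
    """
    pairs = {}
    unmatched = []
    for name in filenames:
        base = Path(name).name
        match = READ_TAG.search(base)
        if match is None:
            # e.g. my_sample_L001_1.fastq: no R1/R2 string in the name
            unmatched.append(name)
            continue
        # Replace the read tag with a placeholder so both mates share a key.
        key = base[:match.start()] + "R?" + base[match.end():]
        pairs.setdefault(key, {})["R" + match.group(1)] = name
    return pairs, unmatched


if __name__ == "__main__":
    files = ["s1_R1.fastq", "s1_R2.fastq", "my_sample_L001_1.fastq"]
    pairs, unmatched = pair_reads(files)
    for key, mates in pairs.items():
        print(key, "complete" if len(mates) == 2 else "missing a mate")
    print("no R1/R2 tag:", unmatched)
```

Files reported as lacking a tag would need to be renamed so that read 1 contains R1 and read 2 contains R2.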

Docker images

All processes are run through Docker; the PipeCraft GUI simply mediates the information exchange. Therefore, whenever a process is initiated for the first time, the relevant Docker image (which contains the software required for that analysis step) is pulled from Docker Hub. The initial PipeCraft2 installation does not contain any software for sequence data processing.

Example: when running DEMULTIPLEXING for the first time

Thus, a working Internet connection is initially required. Once the Docker images are pulled, PipeCraft2 can work without an Internet connection.

Docker images vary in size, and the run time of the first process is extended by the Docker image download time.
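
To see which Docker images are already available locally (and therefore usable without an Internet connection), the standard Docker command `docker image ls` can be queried. The sketch below is only an illustration and not part of PipeCraft2: the function names are hypothetical, and because this guide does not list the image names that PipeCraft2 pulls, the code simply reports whatever images are present.

```python
import shutil
import subprocess


def parse_image_list(output):
    """Turn 'repository:tag' lines into a sorted list without duplicates."""
    return sorted({line.strip() for line in output.splitlines() if line.strip()})


def local_docker_images():
    """Return locally stored images, or an empty list if Docker is unavailable."""
    if shutil.which("docker") is None:
        return []
    result = subprocess.run(
        ["docker", "image", "ls", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # e.g. the Docker daemon is not running
        return []
    return parse_image_list(result.stdout)


if __name__ == "__main__":
    for image in local_docker_images():
        print(image)
```

An image that does not appear in this list would still have to be downloaded the first time the corresponding process is run.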

Save workflow

Once the workflow settings are selected, save the workflow by pressing the SAVE WORKFLOW button on the right-ribbon.

Note

Starting from version 0.1.4, PipeCraft2 automatically saves the settings into the selected WORKDIR prior to starting the analyses (file name = “pipecraft2_config.json”).

Important

When saving workflow settings in Linux, specify the file extension as json (e.g. my_16S_ASVs_pipe.json). When loading a workflow, only .JSON files are permitted as input. Windows and Mac OS automatically add the json extension (so you may just save “my_16S_ASVs_pipe”).
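
If workflow files are named or renamed by a script, the extension rule can be applied automatically. This is a minimal sketch with a hypothetical helper name (`ensure_json_suffix`); it assumes the extension check is case-insensitive, which this guide does not state explicitly.

```python
from pathlib import Path


def ensure_json_suffix(filename):
    """Return the file name unchanged if it ends in .json, else append .json."""
    path = Path(filename)
    if path.suffix.lower() == ".json":
        return str(path)
    # Append rather than replace, so "my.pipe" becomes "my.pipe.json".
    return str(path.with_name(path.name + ".json"))


if __name__ == "__main__":
    print(ensure_json_suffix("my_16S_ASVs_pipe"))
    print(ensure_json_suffix("my_16S_ASVs_pipe.json"))
```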

Load workflow

Note

Prior to loading the workflow, make sure that the saved workflow configuration has a .json extension. Note also that workflows saved in an older PipeCraft2 version might not run in a newer version, but the selected options will still be visible.

Press the LOAD WORKFLOW button on the right-ribbon and select the appropriate JSON file. The configuration will be loaded; then SELECT WORKDIR and run the analyses.
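
Before loading a saved workflow, it can be useful to confirm that the file has the expected extension and parses as JSON. The following is a rough sketch under stated assumptions: the function name `check_workflow_file` is hypothetical, and because the internal structure of the saved configuration is not documented here, the code only parses the file and makes no claims about its keys.

```python
import json
from pathlib import Path


def check_workflow_file(path):
    """Verify the .json extension and return the parsed content of the file."""
    path = Path(path)
    if path.suffix.lower() != ".json":
        raise ValueError(f"{path.name} does not have a .json extension")
    with path.open(encoding="utf-8") as handle:
        # json.load raises json.JSONDecodeError if the file is not valid JSON.
        return json.load(handle)


if __name__ == "__main__":
    import sys

    # Usage: python check_workflow.py pipecraft2_config.json
    if len(sys.argv) > 1:
        settings = check_workflow_file(sys.argv[1])
        print("Parsed workflow file of type:", type(settings).__name__)
```

A file that passes this check is well-formed, but a workflow saved in an older PipeCraft2 version may still fail to run in a newer one.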

Contents of this user guide

Manual may contain some typos! Fixing those on the way.

Currently implemented software

See software version on the ‘Releases’ page

Software	Reference	Task
docker	https://www.docker.com	building, sharing and running applications
DADA2	Callahan et al. 2016	full pipeline operations
vsearch	Rognes et al. 2016	quality filtering, assemble paired-end reads, chimera filtering, clustering
NextITS	Mikryukov et. al	pipeline for fungal full-ITS (PacBio); not available in Mac version of PipeCraft2
trimmomatic	Bolger et al. 2014	quality filtering
fastp	Chen et al. 2018	quality filtering
seqkit	Shen et al. 2016	multiple sequence manipulation operations
cutadapt	Martin 2011	demultiplexing, cut primers
biopython	Cock et al. 2009	multiple sequence manipulation operations
GNU Parallel	Tange 2021	executing jobs in parallel
mothur	Schloss et al. 2009	submodule in ITSx to make unique and deunique seqs
ITS Extractor	Bengtsson-Palme et al. 2013	extract ITS regions
fqgrep	Indraniel Das 2011	core for reorient reads
BLAST	Camacho et al. 2009	assign taxonomy
RDP classifier	Wang et al. 2007	assign taxonomy
ORFfinder	NCBI Tool	finding open reading frames of protein coding genes (filtering pseudogenes/off-targets)
HMMER	Web site	HMM-based filtering of the sequences (filtering pseudogenes/off-targets)
FastQC	Andrews 2019	QualityCheck module
MultiQC	Ewels et al. 2016	QualityCheck module
LULU	Frøslev et al. 2017	post-clustering curation
DEICODE	Martino et al. 2019	dissimilarity analysis

Let us know if you would like to have a specific software implemented in PipeCraft (contacts), or create an issue in the main repository.