How to create a workflow

Snakevir allows you to build a workflow using a simple config.yaml configuration file.

To create this file, just run:

snakevir make_config

The make_config command creates the configuration file for snakevir in YAML format. You have two choices: pass every required value as an argument, or pass only some arguments (-o is mandatory) and fill in the missing information in the file afterwards.

snakevir make_config [OPTIONS]

Options

-o, --output <output>: Required. Path of the output file, with a ‘.yaml’ extension (the config file needed by snakevir).

-d, --output_directory <output_directory>: Path of the output directory for Snakevir’s results.

-n, --name <name>: Name of the run (e.g. HNXXXXXX)

-f, --fastq <fastq>: Path to the fastq directory

-g, --host_genome <host_genome>: Path to the host genome in FASTA format

--r1 <r1>: Suffix that identifies your R1 fastq files in the FASTQ directory (for example: ‘_R1’ or ‘_1’). Default: _1

--r2 <r2>: Suffix that identifies your R2 fastq files in the FASTQ directory (for example: ‘_R2’ or ‘_2’). Default: _2

--ext <ext>: Extension of your read files in the FASTQ directory (for example: ‘.fastq.gz’ or ‘.fq’). Default: .fastq.gz

--path_diamond_nr <path_diamond_nr>: Path to the diamond nr database

--path_blast_nt <path_blast_nt>: Path to the blast nt database

--A3 <a3>: Sequence of the 3’ adapter. Default: CAGCGGACGCCTATGTGATG

--A5 <a5>: Sequence of the 5’ adapter. Default: CATCACATAGGCGTCCGCTG
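As an illustration of the first choice (passing everything as arguments), here is a minimal Python sketch that assembles a make_config command line from the options listed above. The helper name build_make_config_command and every path are placeholders chosen for this example; only the flag names come from the option list. The adapter options are left out, so their defaults would apply.

```python
import shlex

# Placeholder values for illustration only; replace them with your own.
options = {
    "--output": "config.yaml",
    "--output_directory": "results/",
    "--name": "HNXXXXXX",
    "--fastq": "/path/to/fastq/directory/",
    "--host_genome": "/path/to/host_genome.fasta",
    "--r1": "_R1",
    "--r2": "_R2",
    "--ext": ".fastq.gz",
    "--path_diamond_nr": "/PATH/TO/DIAMOND/NR/DATABASE",
    "--path_blast_nt": "/PATH/TO/NT/DATABASE",
}


def build_make_config_command(opts):
    """Return the argument list for `snakevir make_config` (hypothetical helper)."""
    cmd = ["snakevir", "make_config"]
    for flag, value in opts.items():
        cmd += [flag, value]
    return cmd


command = build_make_config_command(options)

# Print the command so it can be reviewed or pasted into a terminal.
print(shlex.join(command))
```

To execute the command from Python rather than copying it, the list can be handed to subprocess.run(command, check=True), provided snakevir is installed and on the PATH.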

If you did not provide every value on the command line, you can complete the config file by editing it directly:

Edit config file

1. Analysis name

First, give your analysis a name:

DATA:
    run: 'HNXXXXXX'

2. Fastq params

Then, indicate the path to the fastq data, the R1/R2 suffixes and the file extension:

DATA:
    fastq: '/path/to/fastq/directory/'
    ext_R1: "_1"
    ext_R2: "_2"
    ext: ".fastq.gz"

3. Database path

Indicate the database path for diamond and blast:

DATA:
    base_nr: "/PATH/TO/DIAMOND/NR/DATABASE"
    base_nt: "/PATH/TO/NT/DATABASE"

Summary table

The table below describes each value needed to run snakevir:

Input	Description
run	A name for the run.
fastq	Path to the fastq directory which contains the paired fastq for each sample.
ext_R1	Suffix that identifies your R1 fastq files in the FASTQ directory (for example: ‘_R1’ or ‘_1’).
ext_R2	Suffix that identifies your R2 fastq files in the FASTQ directory (for example: ‘_R2’ or ‘_2’).
ext	Extension of your read files in the FASTQ directory (for example: ‘.fastq.gz’ or ‘.fq’).
base_nr	Path of the Diamond-specific protein database built from the NCBI nr database.
base_nt	Path of the BLAST-specific nucleotide database built from the NCBI nt database.
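Before launching a run, it can help to check that the fastq, ext_R1, ext_R2 and ext values really match the files on disk. The sketch below is not part of snakevir. It assumes that file names follow the pattern sample name + R1/R2 suffix + extension (for example sampleA_1.fastq.gz and sampleA_2.fastq.gz), which is how the defaults above read but is not confirmed here. The function name find_pairs is hypothetical.

```python
from pathlib import Path


def find_pairs(fastq_dir, ext_r1="_1", ext_r2="_2", ext=".fastq.gz"):
    """Return (paired, orphans): sample names with both mates, and sample
    names whose R1 file has no matching R2 file.

    Assumes files are named <sample><ext_r1><ext> and <sample><ext_r2><ext>.
    """
    fastq_dir = Path(fastq_dir)
    suffix_r1 = ext_r1 + ext
    paired, orphans = [], []
    for r1 in sorted(fastq_dir.glob("*" + suffix_r1)):
        # Strip the R1 suffix and extension to recover the sample name.
        sample = r1.name[: -len(suffix_r1)]
        r2 = fastq_dir / (sample + ext_r2 + ext)
        if r2.exists():
            paired.append(sample)
        else:
            orphans.append(sample)
    return paired, orphans


# Example call, using the placeholder path from the config snippets:
# paired, orphans = find_pairs("/path/to/fastq/directory/", "_1", "_2", ".fastq.gz")
```

An empty paired list usually means that one of the suffix or extension values does not match the actual file names.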

How to run the workflow

Before attempting to run snakevir, please verify that you have already modified the config.yaml file as explained in Edit config file.

Once snakevir is installed and the config file is created, you can run:

snakevir run

Run the snakevir workflow.

snakevir run [OPTIONS] [OTHER_SNAKEMAKE_OPTION]...

Options

-c, --config <config>: Required. Path of the config file

Arguments

OTHER_SNAKEMAKE_OPTION: Optional argument(s)
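The usage line above can be sketched as follows. The example assumes that anything given after the snakevir options is forwarded unchanged to Snakemake, as the OTHER_SNAKEMAKE_OPTION argument suggests. --cores and --dry-run are standard Snakemake options used here purely as examples, and build_run_command is a hypothetical helper.

```python
import shlex


def build_run_command(config, snakemake_options=()):
    """Return the argument list for `snakevir run` (hypothetical helper).

    Extra Snakemake options are appended after the config path; that they
    are passed through as-is is an assumption.
    """
    return ["snakevir", "run", "--config", config, *snakemake_options]


# A dry run on 4 cores: Snakemake lists the planned jobs without executing them.
command = build_run_command("config.yaml", ["--cores", "4", "--dry-run"])
print(shlex.join(command))
# prints: snakevir run --config config.yaml --cores 4 --dry-run
```

Starting with a dry run is a cheap way to confirm that the config file is found and the inputs are detected before committing to a full analysis.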

Snakevir output

The Snakevir output is organized as follows: