How to create a workflow

Snakevir allows you to build a workflow using a simple config.yaml configuration file.

To create this file, just run:

snakevir make_config

The make_config command creates the snakevir configuration file in YAML format. You have two choices: pass every piece of information needed in the config as arguments, or pass only some arguments (-o is mandatory) and write the missing information into the file afterwards.

snakevir make_config [OPTIONS]

Options

-o, --output <output>

Required. Path of the output file, with the ‘.yaml’ extension (the config.yaml file needed by snakevir).

-d, --output_directory <output_directory>

Path of the output directory for Snakevir’s results.

-n, --name <name>

Name of the run (e.g. HNXXXXXX)

-f, --fastq <fastq>

Path to the fastq directory

-g, --host_genome <host_genome>

Path to the host genome in FASTA format

--r1 <r1>

Suffix of your R1 fastq files in the FASTQ directory (for example: ‘_R1’ or ‘_1’)

Default:

_1

--r2 <r2>

Suffix of your R2 fastq files in the FASTQ directory (for example: ‘_R2’ or ‘_2’)

Default:

_2

--ext <ext>

Extension of your read files in the FASTQ directory (for example: ‘.fastq.gz’ or ‘.fq’)

Default:

.fastq.gz

--path_diamond_nr <path_diamond_nr>

Path to the diamond nr database

--path_blast_nt <path_blast_nt>

Path to the blast nt database

--A3 <a3>

Sequence of the 3’ adapter

Default:

CAGCGGACGCCTATGTGATG

--A5 <a5>

Sequence of the 5’ adapter

Default:

CATCACATAGGCGTCCGCTG
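
As an illustration of the first choice (passing the information as arguments), here is a minimal Python sketch that assembles a make_config command line. The helper name build_make_config_cmd is hypothetical and the values are placeholders; only the option names come from the list above.

```python
import shlex


def build_make_config_cmd(output, **options):
    """Return the argument list for `snakevir make_config`.

    `output` maps to the mandatory --output option; every other keyword
    becomes `--<keyword> <value>` (for example name="X" -> --name X).
    """
    cmd = ["snakevir", "make_config", "--output", output]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Placeholder values; replace them with your own paths and run name.
cmd = build_make_config_cmd(
    "config.yaml",
    name="HNXXXXXX",
    fastq="/path/to/fastq/directory/",
    r1="_R1",
    r2="_R2",
    ext=".fastq.gz",
)

# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal, or the list can be handed to subprocess.run(cmd, check=True) on a machine where snakevir is installed.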

If you did not use the command to fill in your config file completely, you can edit the file by hand:

Edit config file

1. Analysis name

First, give your analysis a name:

DATA:
    run: 'HNXXXXXX'

2. Fastq params

Then, indicate the path, the R1/R2 suffixes and the extension of the fastq data:

DATA:
    fastq: '/path/to/fastq/directory/'
    ext_R1: "_1"
    ext_R2: "_2"
    ext: ".fastq.gz"
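
To see how these three values fit together, the sketch below pairs files by name. It assumes each file is named <sample><ext_R1 or ext_R2><ext> (for example sampleA_1.fastq.gz); that naming scheme and the pair_fastq helper are illustrative assumptions, not snakevir's own code.

```python
def pair_fastq(filenames, ext_r1="_1", ext_r2="_2", ext=".fastq.gz"):
    """Group file names into {sample: (R1 file, R2 file)}.

    Assumes each file is named <sample><ext_r1 or ext_r2><ext>.
    Samples that lack one of the two mates are left out of the result.
    """
    r1_files, r2_files = {}, {}
    for name in filenames:
        if name.endswith(ext_r1 + ext):
            r1_files[name[: -len(ext_r1 + ext)]] = name
        elif name.endswith(ext_r2 + ext):
            r2_files[name[: -len(ext_r2 + ext)]] = name
    return {
        sample: (r1_files[sample], r2_files[sample])
        for sample in sorted(r1_files)
        if sample in r2_files
    }


# Hypothetical directory listing: sampleB has no R2 file, notes.txt is not a read file.
files = ["sampleA_1.fastq.gz", "sampleA_2.fastq.gz", "sampleB_1.fastq.gz", "notes.txt"]
pairs = pair_fastq(files)
print(pairs)
```

A quick check like this, run on the names in your fastq directory, can reveal a wrong suffix or extension before the workflow is launched.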

3. Database path

Indicate the database paths for Diamond and BLAST:

DATA:
    base_nr: "/PATH/TO/DIAMOND/NR/DATABASE"
    base_nt: "/PATH/TO/NT/DATABASE"

Summary table

The summary table below describes each input needed to run snakevir:

Input     Description
run       A name for the run.
fastq     Path to the fastq directory that contains the paired fastq files for each sample.
ext_R1    Suffix of your R1 fastq files in the FASTQ directory (for example: ‘_R1’ or ‘_1’).
ext_R2    Suffix of your R2 fastq files in the FASTQ directory (for example: ‘_R2’ or ‘_2’).
ext       Extension of your read files in the FASTQ directory (for example: ‘.fastq.gz’ or ‘.fq’).
base_nr   Path of the Diamond protein database built from the NCBI nr database.
base_nt   Path of the BLAST nucleotide database built from the NCBI nt database.
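
Before launching a run, it can help to confirm that none of these inputs is missing. The sketch below is one possible check, not part of snakevir: missing_keys is a hypothetical helper, and the config dictionary stands in for the result of parsing config.yaml (for example with PyYAML's yaml.safe_load).

```python
# Inputs listed in the summary table, all expected under the DATA section.
REQUIRED_KEYS = ["run", "fastq", "ext_R1", "ext_R2", "ext", "base_nr", "base_nt"]


def missing_keys(config):
    """Return the DATA entries that are absent or empty in a parsed config."""
    data = config.get("DATA") or {}
    return [key for key in REQUIRED_KEYS if not data.get(key)]


# Stand-in for a parsed config.yaml; base_nt is deliberately left out.
config = {
    "DATA": {
        "run": "HNXXXXXX",
        "fastq": "/path/to/fastq/directory/",
        "ext_R1": "_1",
        "ext_R2": "_2",
        "ext": ".fastq.gz",
        "base_nr": "/PATH/TO/DIAMOND/NR/DATABASE",
    }
}

print(missing_keys(config))
```

An empty list means every input from the table is present; any name that is returned still has to be written into the file.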


How to run the workflow

Before attempting to run snakevir, please verify that you have already modified the config.yaml file as explained in Edit config file.

Once you have installed snakevir and created the config file, you can run:

snakevir run

Run the snakevir workflow.

snakevir run [OPTIONS] [OTHER_SNAKEMAKE_OPTION]...

Options

-c, --config <config>

Required. Path of the config file.

Arguments

OTHER_SNAKEMAKE_OPTION

Optional argument(s)
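
The usage line above can be assembled the same way from Python. In this sketch, build_run_cmd is a hypothetical helper; --cores is a real Snakemake option, but that snakevir forwards it unchanged is an assumption based on the OTHER_SNAKEMAKE_OPTION argument.

```python
def build_run_cmd(config_path, *snakemake_options):
    """Return the argument list for `snakevir run`.

    `config_path` maps to the mandatory --config option; any further
    arguments are appended as-is, in the OTHER_SNAKEMAKE_OPTION position.
    """
    cmd = ["snakevir", "run", "--config", config_path]
    cmd.extend(snakemake_options)
    return cmd


# Assumption: extra arguments such as Snakemake's --cores are passed through.
cmd = build_run_cmd("config.yaml", "--cores", "4")
print(" ".join(cmd))
```

On a machine where snakevir is installed, the list could be executed with subprocess.run(cmd, check=True).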


Output of Snakevir

The Snakevir output is organized as follows:

Tree output