How to create a workflow
Snakevir allows you to build a workflow using a simple config.yaml configuration file.
To create this file, just run:
snakevir make_config
The make_config command creates the config file for snakevir in YAML format. You have two choices: supply all the information the config needs as arguments, or supply only some arguments (-o is mandatory) and fill in the missing information in the file afterwards.
snakevir make_config [OPTIONS]
Options
- -o, --output <output>
Required Path of the output file, with a ‘.yaml’ extension (the config.yaml needed by snakevir).
- -d, --output_directory <output_directory>
Path of the output directory for Snakevir’s results.
- -n, --name <name>
Name of the run (e.g. HNXXXXXX)
- -f, --fastq <fastq>
Path to the fastq directory
- -g, --host_genome <host_genome>
Path to the host genome in FASTA format
- --r1 <r1>
Type of the R1 fastq files contained in the FASTQ directory (for example: ‘_R1’ or ‘_1’, etc.)
- Default:
_1
- --r2 <r2>
Type of the R2 fastq files contained in the FASTQ directory (for example: ‘_R2’ or ‘_2’, etc.)
- Default:
_2
- --ext <ext>
Extension of the read files in the FASTQ directory (for example: ‘.fastq.gz’ or ‘.fq’, etc.)
- Default:
.fastq.gz
- --path_diamond_nr <path_diamond_nr>
Path to the diamond nr database
- --path_blast_nt <path_blast_nt>
Path to the blast nt database
- --A3 <a3>
Sequence of the 3’ adapter
- Default:
CAGCGGACGCCTATGTGATG
- --A5 <a5>
Sequence of the 5’ adapter
- Default:
CATCACATAGGCGTCCGCTG
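As an illustration of the options listed above, here is a minimal sketch of a fuller make_config call. Every flag comes from the option list; the paths, the run name, and the host genome file name are placeholders to replace with your own values. The script prints the command for review and only runs it if snakevir is found on the PATH.

```shell
# Placeholder values: replace each /path/to/... and the run name with your own.
# Only -o is mandatory; anything left out can be added to the file afterwards.
set -- \
  -o config.yaml \
  -d /path/to/output/directory \
  -n HNXXXXXX \
  -f /path/to/fastq/directory/ \
  -g /path/to/host_genome.fasta \
  --r1 _R1 \
  --r2 _R2 \
  --ext .fastq.gz \
  --path_diamond_nr /PATH/TO/DIAMOND/NR/DATABASE \
  --path_blast_nt /PATH/TO/NT/DATABASE

# Keep the full command in a variable so it can be reviewed or logged.
cmd="snakevir make_config $*"
echo "$cmd"

# Run it only when snakevir is available on PATH.
if command -v snakevir >/dev/null 2>&1; then
  snakevir make_config "$@"
fi
```

The adapter options (--A3, --A5) are left out here so that their defaults apply.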
If you did not supply everything through the command, you can complete the config file by editing it directly:
Edit config file
1. Analysis name
First, give your analysis a name:
DATA:
  run: 'HNXXXXXX'
2. Fastq params
Then, indicate the path to the fastq data and the file name extensions:
DATA:
  fastq: '/path/to/fastq/directory/'
  ext_R1: "_1"
  ext_R2: "_2"
  ext: ".fastq.gz"
3. Database path
Indicate the database path for diamond and blast:
DATA:
  base_nr: "/PATH/TO/DIAMOND/NR/DATABASE"
  base_nt: "/PATH/TO/NT/DATABASE"
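The three fragments above can be combined into one file. The sketch below writes them to a hypothetical config_example.yaml and checks that each key is present. It assumes all keys sit under a single DATA mapping with two-space indentation. Settings not shown in this section (host genome, adapters, output directory) are left out because their key names are not given here, so the result is not a complete snakevir config.

```shell
# Hypothetical file name, chosen so an existing config.yaml is not overwritten.
config="config_example.yaml"

# Write the keys shown in steps 1 to 3 under a single DATA mapping (assumed layout).
cat > "$config" <<'EOF'
DATA:
  run: 'HNXXXXXX'
  fastq: '/path/to/fastq/directory/'
  ext_R1: "_1"
  ext_R2: "_2"
  ext: ".fastq.gz"
  base_nr: "/PATH/TO/DIAMOND/NR/DATABASE"
  base_nt: "/PATH/TO/NT/DATABASE"
EOF

# Report any key from this section that is absent from the file.
for key in run fastq ext_R1 ext_R2 ext base_nr base_nt; do
  grep -q "^  $key:" "$config" || echo "missing key: $key"
done
```

A simpler route is to let snakevir make_config generate the file and then edit only the values that are still missing.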
Summary table
The table below describes each input needed to run snakevir:
| Input | Description |
|---|---|
| run | A name for the run. |
| fastq | Path to the fastq directory which contains the paired fastq files for each sample. |
| ext_R1 | Type of the R1 fastq files contained in the FASTQ directory (for example: ‘_R1’ or ‘_1’, etc.). |
| ext_R2 | Type of the R2 fastq files contained in the FASTQ directory (for example: ‘_R2’ or ‘_2’, etc.). |
| ext | Extension of the read files in the FASTQ directory (for example: ‘.fastq.gz’ or ‘.fq’, etc.). |
| base_nr | Path to the diamond nr database. |
| base_nt | Path to the BLAST database built from the NCBI nt (nucleotide) database. |
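Because the workflow expects paired fastq files, it can help to check the pairing before a run. The helper below is a hypothetical sketch, not part of snakevir. It assumes files are named sample name + ext_R1 (or ext_R2) + ext, for example sampleA_1.fastq.gz and sampleA_2.fastq.gz, and reports every R1 file whose R2 mate is missing.

```shell
# check_pairs DIR EXT_R1 EXT_R2 EXT
# Hypothetical helper: returns 0 if every R1 file has an R2 mate,
# 1 if at least one mate is missing, 2 if DIR is not a directory.
check_pairs() {
  dir=$1; r1=$2; r2=$3; ext=$4
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 2
  fi
  missing=0
  for f in "$dir"/*"$r1$ext"; do
    # When the glob matches nothing it stays literal; skip that case.
    [ -e "$f" ] || continue
    # Strip the R1 suffix and build the expected R2 file name.
    base=${f%"$r1$ext"}
    mate="$base$r2$ext"
    if [ ! -e "$mate" ]; then
      echo "missing R2 mate for: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

With the values from the config fragments, a call would look like `check_pairs /path/to/fastq/directory/ _1 _2 .fastq.gz`.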
How to run the workflow
Before attempting to run snakevir, please verify that you have already modified the config.yaml file as explained in Edit config file.
Once snakevir is installed and the config file is created, you can run:
snakevir run
Run the snakevir workflow.
snakevir run [OPTIONS] [OTHER_SNAKEMAKE_OPTION]...
Options
- -c, --config <config>
Required Path of the config file
Arguments
- OTHER_SNAKEMAKE_OPTION
Optional argument(s)
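As a sketch of how the usage line above might be applied, the script below runs the workflow with two extra Snakemake options. It assumes that the extra arguments are forwarded to Snakemake unchanged, as the argument name suggests. --cores and --dry-run are standard Snakemake options, but whether snakevir accepts them in this position has not been verified here. The command is printed first and only executed if snakevir is installed and the config file exists.

```shell
# Path of the config file created earlier.
config="config.yaml"

# Extra Snakemake options (assumed to be passed through as-is):
# use 4 cores and only list the jobs without executing them.
set -- --cores 4 --dry-run

# Keep the full command in a variable so it can be reviewed or logged.
cmd="snakevir run -c $config $*"
echo "$cmd"

# Run only when snakevir is on PATH and the config file is present.
if command -v snakevir >/dev/null 2>&1 && [ -f "$config" ]; then
  snakevir run -c "$config" "$@"
fi
```

Dropping --dry-run from the list starts the actual run.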
Output on Snakevir
The Snakevir output is organized as follows: