Tutorials


The tutorial requires the mOTU profiler to be correctly installed as described in the Installation section.

General Workflow

1. Generating taxonomic profiles

Standard example

Taxonomic profiles can be generated using the profile command and one or more sequencing files in fastq format:
# Using the mOTU profiler with test files contained in the installation directory
$ motus profile -s test/test1_single.fastq -o test1.motus -n test1
$ motus profile -f test/test2_for.fastq -r test/test2_rev.fastq -o test2.motus -n test2
The resulting profile reports the relative abundance of each mOTU (ref-mOTUs and meta-mOTUs):
$ head -n 10 test1.motus
# git tag version 2.0.1 | motus version 2.0.1 | map_tax 2.0.1 | gene database: nr2.0.1 | calc_mgc 2.0.1 -y insert.scaled_counts -l 75 | calc_motu 2.0.1 -k mOTU -g 3 | taxonomy: ref_mOTU_2.0.1 meta_mOTU_2.0.1
# call: motus profile -s test/test1_single.fastq -o test1.motus
# consensus_taxonomy test1
Kandleria vitulina [ref_mOTU_v2_0001] 0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002] 0.0000000000
Megasphaera genomosp. [ref_mOTU_v2_0003] 0.0234955832
Streptococcus anginosus [ref_mOTU_v2_0004] 0.4156119038
Streptococcus anginosus [ref_mOTU_v2_0005] 0.0000000000
Streptococcus dysgalactiae [ref_mOTU_v2_0006] 0.0149170416
Staphylococcus epidermidis [ref_mOTU_v2_0007] 0.0000000000
...
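For downstream analysis, a profile like the one above can be read with a few lines of Python. This is a minimal sketch and not part of the mOTU profiler: the function name read_profile is hypothetical, and the code assumes that header lines start with # and that the abundance is the last whitespace-separated field of each row.

```python
def read_profile(lines):
    """Return {taxon: value} from the lines of a single-sample profile."""
    profile = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' header lines
        if not line or line.startswith("#"):
            continue
        # Assumption: the value is the last field, the rest is the taxon name
        taxon, value = line.rsplit(None, 1)
        profile[taxon] = float(value)
    return profile


# A few rows copied from the example output above
example = """\
# consensus_taxonomy test1
Kandleria vitulina [ref_mOTU_v2_0001] 0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002] 0.0000000000
Streptococcus anginosus [ref_mOTU_v2_0004] 0.4156119038
"""

profile = read_profile(example.splitlines())

# Keep only the mOTUs that were detected, most abundant first
detected = sorted(
    ((taxon, value) for taxon, value in profile.items() if value > 0),
    key=lambda item: item[1],
    reverse=True,
)
for taxon, value in detected:
    print(f"{value:.4f}\t{taxon}")
```

To read a real file, pass an open file handle instead of example.splitlines(), for instance read_profile(open("test1.motus")).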

Read counting

The -c flag changes the output from relative abundance to number of assigned reads:
$ motus profile -s test/test1_single.fastq -c -o test1.motus -n test1
$ head -n 10 test1.motus
# git tag version 2.0.1 | motus version 2.0.1 | map_tax 2.0.1 | gene database: nr2.0.1 | calc_mgc 2.0.1 -y insert.scaled_counts -l 75 | calc_motu 2.0.1 -k mOTU -g 3 -c | taxonomy: ref_mOTU_2.0.1 meta_mOTU_2.0.1
# call: motus profile -s test/test1_single.fastq -o test1.motus -c -n test1
# consensus_taxonomy test1
Kandleria vitulina [ref_mOTU_v2_0001] 36
Methyloversatilis universalis [ref_mOTU_v2_0002] 0
Megasphaera genomosp. [ref_mOTU_v2_0003] 12
Streptococcus anginosus [ref_mOTU_v2_0004] 220
Streptococcus anginosus [ref_mOTU_v2_0005] 0
Streptococcus dysgalactiae [ref_mOTU_v2_0006] 8
Staphylococcus epidermidis [ref_mOTU_v2_0007] 0
...
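Read counts can be turned into fractions by dividing each count by the column total. The sketch below only illustrates this normalization on the seven rows printed above; it is an assumption that this approximates the relative abundances reported without -c, and the numbers will not match them exactly because the full profile contains more rows than are shown here. The function name to_fractions is hypothetical.

```python
def to_fractions(counts):
    """Divide every count by the total so that the values sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # Nothing was assigned: avoid dividing by zero
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Counts copied from the example output above (first seven rows only)
counts = {
    "Kandleria vitulina [ref_mOTU_v2_0001]": 36,
    "Methyloversatilis universalis [ref_mOTU_v2_0002]": 0,
    "Megasphaera genomosp. [ref_mOTU_v2_0003]": 12,
    "Streptococcus anginosus [ref_mOTU_v2_0004]": 220,
    "Streptococcus anginosus [ref_mOTU_v2_0005]": 0,
    "Streptococcus dysgalactiae [ref_mOTU_v2_0006]": 8,
    "Staphylococcus epidermidis [ref_mOTU_v2_0007]": 0,
}

fractions = to_fractions(counts)
for taxon, fraction in fractions.items():
    print(f"{fraction:.4f}\t{taxon}")
```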

Taxonomic level

The -k flag sets the taxonomic level at which abundances are summarized (here: phylum):
$ motus profile -s test/test1_single.fastq -o test1.motus -k phylum -n test1
$ head -n 10 test1.motus
# git tag version 2.0.1 | motus version 2.0.1 | map_tax 2.0.1 | gene database: nr2.0.1 | calc_mgc 2.0.1 -y insert.scaled_counts -l 75 | calc_motu 2.0.1 -k phylum -g 3 | taxonomy: ref_mOTU_2.0.1 meta_mOTU_2.0.1
# call: motus profile -s test1_single.fastq -o test1.motus -k phylum -n test1
# consensus_taxonomy test1
candidatus Calescamantes 0.0000000000
Chlorobi 0.0000000000
Cyanobacteria 0.0000000000
Ignavibacteriae 0.0000000000
Proteobacteria 0.0732714431
Firmicutes 0.6479060231
Nitrospinae 0.0000000000
...

Threading

You can assign multiple cores (-t flag) to accelerate the alignment process:
$ motus profile -f sample_R1.fq.gz -r sample_R2.fq.gz -t 8 -o test1.motus

Database selection

Add the -e flag to perform taxonomic profiling using only the ref-mOTUs database:
$ motus profile -f sample_R1.fq.gz -r sample_R2.fq.gz -e -o test1.motus

Merging profiles

Multiple profiles can be merged into one file, with one column per sample. The input profiles are passed to -i as a comma-separated list:
$ motus merge -i test1.motus,test2.motus -o test.motus
$ head -n 10 test.motus
# motus version 2.0.1 | merge 2.0.1 | info merged profiles: # git tag version 2.0.1 | motus version 2.0.1 | map_tax 2.0.1 | gene database: nr2.0.1 | calc_mgc 2.0.1 -y insert.scaled_counts -l 75 | calc_motu 2.0.1 -k mOTU -g 3 | taxonomy: ref_mOTU_2.0.1 meta_mOTU_2.0.1
# call: python $motus merge -i test1.motus,test2.motus -o test.motus
# consensus_taxonomy test1 test2
Kandleria vitulina [ref_mOTU_v2_0001] 0.0688211617 0.0000000000
Methyloversatilis universalis [ref_mOTU_v2_0002] 0.0000000000 0.2772086762
Megasphaera genomosp. [ref_mOTU_v2_0003] 0.0234955832 0.0853385707
Streptococcus anginosus [ref_mOTU_v2_0004] 0.4156119038 0.0000000000
Streptococcus anginosus [ref_mOTU_v2_0005] 0.0000000000 0.1462455344
Streptococcus dysgalactiae [ref_mOTU_v2_0006] 0.0149170416 0.0000000000
Staphylococcus epidermidis [ref_mOTU_v2_0007] 0.0000000000 0.0603188994
...
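A merged profile can be read into one dictionary per sample. This is a hedged sketch rather than a tool shipped with the mOTU profiler: the function name read_merged_profile is hypothetical, and the code assumes that the sample names are listed on the "# consensus_taxonomy" header line, that they contain no whitespace, and that each row ends with one value per sample.

```python
def read_merged_profile(lines):
    """Return (samples, table) where table[sample][taxon] is the value."""
    samples = []
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            # Assumption: this header line carries the sample names
            if fields and fields[0] == "consensus_taxonomy":
                samples = fields[1:]
                table = {sample: {} for sample in samples}
            continue
        if not samples:
            raise ValueError("sample header line not found before data rows")
        # Split off one trailing value per sample; the rest is the taxon name
        parts = line.rsplit(None, len(samples))
        taxon, values = parts[0], parts[1:]
        for sample, value in zip(samples, values):
            table[sample][taxon] = float(value)
    return samples, table


# Rows copied from the merged example above
example = """\
# call: python $motus merge -i test1.motus,test2.motus -o test.motus
# consensus_taxonomy test1 test2
Kandleria vitulina [ref_mOTU_v2_0001] 0.0688211617 0.0000000000
Methyloversatilis universalis [ref_mOTU_v2_0002] 0.0000000000 0.2772086762
"""

samples, table = read_merged_profile(example.splitlines())
for sample in samples:
    # List the mOTUs detected in each sample
    present = [taxon for taxon, value in table[sample].items() if value > 0]
    print(sample, present)
```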

More options

Further options control the alignment filtering (minimum alignment length, percent identity) or change the output format and the reported quantities (NCBI taxID, full rank, summarizing at a specific taxonomic rank). They are listed when the profile command is run without any arguments:
$ motus profile

2. Generating metatranscriptomic profiles

Standard example

Metatranscriptomic profiles can be generated using the profile command and one or more sequencing files in fastq format:
# Using the mOTU profiler with test files contained in the installation directory
$ motus profile -s test/test1_single.fastq -o test1.motus -n test1
$ motus profile -f test/test2_for.fastq -r test/test2_rev.fastq -o test2.motus -n test2
The resulting profile reports the relative abundance of each mOTU (ref-mOTUs and meta-mOTUs):
$ head -n 10 test1.motus
# git tag version 2.0.1 | motus version 2.0.1 | map_tax 2.0.1 | gene database: nr2.0.1 | calc_mgc 2.0.1 -y insert.scaled_counts -l 75 | calc_motu 2.0.1 -k mOTU -g 3 | taxonomy: ref_mOTU_2.0.1 meta_mOTU_2.0.1
# call: motus profile -s test/test1_single.fastq -o test1.motus
# consensus_taxonomy test1
Kandleria vitulina [ref_mOTU_v2_0001] 0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002] 0.0000000000
Megasphaera genomosp. [ref_mOTU_v2_0003] 0.0234955832
Streptococcus anginosus [ref_mOTU_v2_0004] 0.4156119038
Streptococcus anginosus [ref_mOTU_v2_0005] 0.0000000000
Streptococcus dysgalactiae [ref_mOTU_v2_0006] 0.0149170416
Staphylococcus epidermidis [ref_mOTU_v2_0007] 0.0000000000
...

Read counting

The -c flag changes the output from relative abundance to number of assigned reads:
$ motus profile -s test/test1_single.fastq -c -o test1.motus -n test1
$ head -n 10 test1.motus
# git tag version 2.0.1 | motus version 2.0.1 | map_tax 2.0.1 | gene database: nr2.0.1 | calc_mgc 2.0.1 -y insert.scaled_counts -l 75 | calc_motu 2.0.1 -k mOTU -g 3 -c | taxonomy: ref_mOTU_2.0.1 meta_mOTU_2.0.1
# call: motus profile -s test/test1_single.fastq -o test1.motus -c -n test1
# consensus_taxonomy test1
Kandleria vitulina [ref_mOTU_v2_0001] 36
Methyloversatilis universalis [ref_mOTU_v2_0002] 0
Megasphaera genomosp. [ref_mOTU_v2_0003] 12
Streptococcus anginosus [ref_mOTU_v2_0004] 220
Streptococcus anginosus [ref_mOTU_v2_0005] 0
Streptococcus dysgalactiae [ref_mOTU_v2_0006] 8
Staphylococcus epidermidis [ref_mOTU_v2_0007] 0
...

Taxonomic level

The -k flag sets the taxonomic level at which abundances are summarized (here: phylum):
$ motus profile -s test/test1_single.fastq -o test1.motus -k phylum -n test1
$ head -n 10 test1.motus
# git tag version 2.0.1 | motus version 2.0.1 | map_tax 2.0.1 | gene database: nr2.0.1 | calc_mgc 2.0.1 -y insert.scaled_counts -l 75 | calc_motu 2.0.1 -k phylum -g 3 | taxonomy: ref_mOTU_2.0.1 meta_mOTU_2.0.1
# call: motus profile -s test1_single.fastq -o test1.motus -k phylum -n test1
# consensus_taxonomy test1
candidatus Calescamantes 0.0000000000
Chlorobi 0.0000000000
Cyanobacteria 0.0000000000
Ignavibacteriae 0.0000000000
Proteobacteria 0.0732714431
Firmicutes 0.6479060231
Nitrospinae 0.0000000000
...

Threading

You can assign multiple cores (-t flag) to accelerate the alignment process:
$ motus profile -f sample_R1.fq.gz -r sample_R2.fq.gz -t 8 -o test1.motus

Database selection

Add the -e flag to perform taxonomic profiling using only the ref-mOTUs database:
$ motus profile -f sample_R1.fq.gz -r sample_R2.fq.gz -e -o test1.motus

Merging profiles

Multiple profiles can be merged into one file, with one column per sample. The input profiles are passed to -i as a comma-separated list:
$ motus merge -i test1.motus,test2.motus -o test.motus
$ head -n 10 test.motus
# motus version 2.0.1 | merge 2.0.1 | info merged profiles: # git tag version 2.0.1 | motus version 2.0.1 | map_tax 2.0.1 | gene database: nr2.0.1 | calc_mgc 2.0.1 -y insert.scaled_counts -l 75 | calc_motu 2.0.1 -k mOTU -g 3 | taxonomy: ref_mOTU_2.0.1 meta_mOTU_2.0.1
# call: python $motus merge -i test1.motus,test2.motus -o test.motus
# consensus_taxonomy test1 test2
Kandleria vitulina [ref_mOTU_v2_0001] 0.0688211617 0.0000000000
Methyloversatilis universalis [ref_mOTU_v2_0002] 0.0000000000 0.2772086762
Megasphaera genomosp. [ref_mOTU_v2_0003] 0.0234955832 0.0853385707
Streptococcus anginosus [ref_mOTU_v2_0004] 0.4156119038 0.0000000000
Streptococcus anginosus [ref_mOTU_v2_0005] 0.0000000000 0.1462455344
Streptococcus dysgalactiae [ref_mOTU_v2_0006] 0.0149170416 0.0000000000
Staphylococcus epidermidis [ref_mOTU_v2_0007] 0.0000000000 0.0603188994
...

More options

Further options control the alignment filtering (minimum alignment length, percent identity) or change the output format and the reported quantities (NCBI taxID, full rank, summarizing at a specific taxonomic rank). They are listed when the profile command is run without any arguments:
$ motus profile

3. Generating single nucleotide variant (SNV) profiles using MGs


Calling variants using marker genes (MGs) is divided into two subroutines: alignment (map_snv) and variant calling (snv_call). map_snv aligns sequencing reads against the mOTU profiler database. snv_call uses the metaSNV package to call variants on these marker genes.
map_snv takes one or multiple sequencing files and aligns reads against the mOTU profiler database:
motus map_snv -s sample.fq.gz > sample.bam
motus map_snv -f sample_R1.fq.gz -r sample_R2.fq.gz > sample.bam
The minimum alignment length can be changed with the -l flag, and the -t flag speeds up the alignment step through multithreading:
motus map_snv -f sample_R1.fq.gz -r sample_R2.fq.gz -l 100 -t 8 > sample.bam
snv_call takes the bam files created in the map_snv step as input and calls variants using the metaSNV package. This information is then used to create a distance matrix between samples. The input for snv_call is a directory with bam files. Each bam file is treated as an individual sample:
motus snv_call -d DIRECTORY -o OUTPUT_DIRECTORY
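When several samples are processed, the two steps can be scripted. The following sketch only assembles the command lines and prints them; it uses no flags beyond those shown in this tutorial. The sample names, file names, directory names and helper functions are hypothetical, and the code assumes that the motus executable is on the PATH.

```python
import subprocess
from pathlib import Path


def map_snv_command(forward, reverse, threads=1):
    """Build the map_snv call for one paired-end sample."""
    return ["motus", "map_snv", "-f", str(forward), "-r", str(reverse),
            "-t", str(threads)]


def snv_call_command(bam_dir, out_dir):
    """Build the snv_call call for a directory of bam files."""
    return ["motus", "snv_call", "-d", str(bam_dir), "-o", str(out_dir)]


def run_to_file(cmd, out_path):
    """Run cmd and write its standard output to out_path (like '> file')."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Hypothetical input files, one pair of read files per sample
samples = {
    "sample_1": ("sample_1_R1.fq.gz", "sample_1_R2.fq.gz"),
    "sample_2": ("sample_2_R1.fq.gz", "sample_2_R2.fq.gz"),
}
bam_dir = Path("bam_files")

for name, (forward, reverse) in samples.items():
    cmd = map_snv_command(forward, reverse, threads=8)
    bam_path = bam_dir / f"{name}.bam"
    print(" ".join(cmd), ">", bam_path)
    # To execute instead of print: run_to_file(cmd, bam_path)

# One snv_call run over the directory that holds all bam files
print(" ".join(snv_call_command(bam_dir, "snv_output")))
```

Writing each sample to its own bam file in a shared directory matches the input that snv_call expects, since every bam file in the directory is treated as one sample.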
An example distance matrix for the comparison of 3 samples is shown below.
-------- sample_1  sample_2  sample_3
sample_1 0.0000   0.0012   0.1430
sample_2 0.0012   0.0000   0.1392
sample_3 0.1430   0.1392   0.0000
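A matrix in this layout can be parsed to find the most similar pair of samples. The sketch assumes exactly the whitespace-separated layout printed above, including a placeholder in the first header field; the layout of a real output file may differ, and the function name read_distance_matrix is hypothetical.

```python
from itertools import combinations


def read_distance_matrix(lines):
    """Return (names, matrix) where matrix[a][b] is the distance a to b."""
    rows = [line.split() for line in lines if line.strip()]
    # Assumption: the first header field is a placeholder, not a sample name
    names = rows[0][1:]
    matrix = {}
    for row in rows[1:]:
        matrix[row[0]] = dict(zip(names, map(float, row[1:])))
    return names, matrix


# The example matrix from above
example = """\
-------- sample_1  sample_2  sample_3
sample_1 0.0000   0.0012   0.1430
sample_2 0.0012   0.0000   0.1392
sample_3 0.1430   0.1392   0.0000
"""

names, matrix = read_distance_matrix(example.splitlines())

# Compare every unordered pair once and keep the smallest distance
closest = min(combinations(names, 2), key=lambda pair: matrix[pair[0]][pair[1]])
print(closest, matrix[closest[0]][closest[1]])
```

In this example sample_1 and sample_2 are the closest pair, while sample_3 is clearly separated from both.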
Several filtering parameters influence whether variants are called, such as coverage depth (-fd), coverage breadth (-fb) or the minimum number of samples that report a variant (-fm). A list of all parameters is printed when the snv_call command is run without any arguments:
motus snv_call