These extension scripts work only with mOTUs >v2.5.0. The database versions v2.1.x and v2.0.x are no longer supported.

extendmOTUsDB

1. Installation

#download extender archive
wget https://motu-tool.org/data/extend_mOTUs_DBv2.5.tar.gz
#decompress
tar -xzvf extend_mOTUs_DBv2.5.tar.gz
cd extend_mOTUs_DBv2
#create a conda environment with dependencies
conda env create -f env/update_mOTUs_v2.yaml
cd ..
source activate update_mOTUs_v2
#mOTUs should be installed without conda
git clone https://github.com/motu-tool/mOTUs_v2.git
cd mOTUs_v2
python setup.py
python test.py
cd ..
MOTUS_DIR=`pwd`/mOTUs_v2/

2. Running

2.1. Required input files

The extend_mOTUs_DB/TEST/ directory shows what you have to prepare as input:

- a file named genomes.list with the names of the files containing the genomic sequences
- a directory named genomes with the genome sequences (one genome per file)
- a file named taxonomy_file.txt with the taxonomy of the genomes (the format is NCBI_taxonomy_id and taxonomy_name, separated by a tab)
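Before starting the pipeline it can help to check that the three inputs agree with each other. The function below is a minimal sketch, not part of the extender scripts. It assumes that each entry in genomes.list is a file name without the .fasta extension (as the loop in section 2.2 suggests) and that every line of taxonomy_file.txt has at least two tab-separated fields. The name check_inputs is hypothetical.

```shell
# Sketch: sanity-check the input directory before running the pipeline.
# Assumptions (not confirmed by the extender scripts): entries in genomes.list
# carry no .fasta extension, and each taxonomy line has >= 2 tab-separated fields.
check_inputs() {
    input_dir=$1
    status=0

    # Every genome named in the list must exist as a non-empty <name>.fasta
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue
        if [ ! -s "$input_dir/genomes/$name.fasta" ]; then
            echo "missing or empty genome: $name.fasta" >&2
            status=1
        fi
    done < "$input_dir/genomes.list"

    # Every taxonomy line must contain at least two tab-separated columns
    if ! awk -F '\t' 'NF < 2 { bad = 1 } END { exit bad + 0 }' \
            "$input_dir/taxonomy_file.txt"; then
        echo "taxonomy_file.txt: expected tab-separated id and name" >&2
        status=1
    fi

    return "$status"
}
```

For the bundled example this would be called as check_inputs extend_mOTUs_DB/TEST, and a non-zero exit status points to a missing genome or a malformed taxonomy line.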

2.2. Run MG extraction

for i in $(cat extend_mOTUs_DB/TEST/genomes.list); do ./extend_mOTUs_DB/SCRIPTS/extend_mOTUs_addGenome.sh extend_mOTUs_DB/TEST/genomes/$i.fasta $i newdbfolder extend_mOTUs_DB/SCRIPTS/ $MOTUS_DIR; done

This call will extract the marker genes from the genome sequences.
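With many genomes, a single failed extraction is easy to miss in a one-line loop. The following is a hedged sketch of the same loop wrapped in a function that reports which genomes failed. It passes the arguments to extend_mOTUs_addGenome.sh in the same order as the command above. It assumes that the script returns a non-zero exit status on failure, which is not confirmed here. The name extract_all is hypothetical.

```shell
# Sketch: run the marker gene extraction for every genome in a list and
# report failures. Assumes extend_mOTUs_addGenome.sh exits non-zero on error.
extract_all() {
    list=$1          # e.g. extend_mOTUs_DB/TEST/genomes.list
    genomes_dir=$2   # e.g. extend_mOTUs_DB/TEST/genomes
    out_dir=$3       # e.g. newdbfolder
    scripts_dir=$4   # e.g. extend_mOTUs_DB/SCRIPTS/
    motus_dir=$5     # e.g. $MOTUS_DIR
    failed=0

    # Read the list on file descriptor 3 so the called script keeps its stdin
    while IFS= read -r name <&3 || [ -n "$name" ]; do
        [ -n "$name" ] || continue
        if ! "$scripts_dir/extend_mOTUs_addGenome.sh" \
                "$genomes_dir/$name.fasta" "$name" "$out_dir" \
                "$scripts_dir" "$motus_dir"; then
            echo "marker gene extraction failed for: $name" >&2
            failed=$((failed + 1))
        fi
    done 3< "$list"

    [ "$failed" -eq 0 ]
}
```

A call equivalent to the loop above would be extract_all extend_mOTUs_DB/TEST/genomes.list extend_mOTUs_DB/TEST/genomes newdbfolder extend_mOTUs_DB/SCRIPTS/ $MOTUS_DIR.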

2.3. Run DB generation

./extend_mOTUs_DB/SCRIPTS/extend_mOTUs_generateDB.sh extend_mOTUs_DB/TEST/genomes.list newdbname extend_mOTUs_DB/TEST/taxonomy_file.txt newdbfolder extend_mOTUs_DB/SCRIPTS/ $MOTUS_DIR

This call performs the clustering and creates a new database, which can be found in newdbfolder/newdbname/db_mOTU.

2.4. Make DB available for mOTUs

The new database can either be passed to the motus command as its input database or copied to the default database folder inside the mOTUs folder.
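For the copy option, it is safer to keep the original database so that the change can be undone. The function below is only a sketch under one important assumption: that the default database folder is named db_mOTU and sits directly inside the mOTUs folder. Check your installation before using it. The name install_db and the .orig backup suffix are hypothetical.

```shell
# Sketch: replace the default database with the new one, keeping a backup.
# ASSUMPTION: the default database folder is <motus_dir>/db_mOTU.
install_db() {
    new_db=$1      # e.g. newdbfolder/newdbname/db_mOTU
    motus_dir=$2   # e.g. $MOTUS_DIR

    [ -n "$motus_dir" ] || { echo "mOTUs folder not given" >&2; return 1; }
    [ -d "$new_db" ] || { echo "no such database folder: $new_db" >&2; return 1; }

    default_db=${motus_dir%/}/db_mOTU

    # Keep the first original database; later calls do not overwrite the backup
    if [ -d "$default_db" ] && [ ! -e "$default_db.orig" ]; then
        mv "$default_db" "$default_db.orig" || return 1
    fi

    rm -rf "$default_db"
    cp -R "$new_db" "$default_db"
}
```

To restore the original database, remove the copied db_mOTU folder and rename db_mOTU.orig back to db_mOTU.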

3. Testing

Test that the database is updated. In the extend_mOTUs_DB/TEST/ directory there is a fastq file to test with (test1_single.fastq). Run:

$MOTUS_DIR/motus profile -g 1 -c -s extend_mOTUs_DB/TEST/test1_single.fastq

If the database was updated correctly you will see:

unknown Roseburia [meta_mOTU_v2_7798]	0
unknown Firmicutes [meta_mOTU_v2_7799]	0
unknown Clostridiales [meta_mOTU_v2_7800]	0
-1	0
Chryseobacterium indologenes [newdbname_1]	0
unknown Sphingobacterium [newdbname_2]	2
unknown Leadbetterella [newdbname_3]	0

The last three rows are the new mOTUs. The file extend_mOTUs_DB/TEST/test1_single.fastq contains reads simulated from the genomes added in newdbname, so these new species can now be profiled. Note that those mOTUs are specific to the genomes that were selected in section 2.1.
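The check can also be scripted instead of read by eye. The sketch below counts the rows of a saved profile whose mOTU identifier has the form [newdbname_N]. It assumes the profile was saved to a file, for example by redirecting the output of the motus profile command. The name count_new_motus is hypothetical, and the database name is used as a plain pattern, so it should not contain regular expression metacharacters.

```shell
# Sketch: count profile rows that belong to the newly added database,
# i.e. rows whose identifier looks like [<db_name>_<number>].
count_new_motus() {
    profile=$1   # file holding the motus profile output
    db_name=$2   # e.g. newdbname
    grep -c "\[${db_name}_[0-9][0-9]*\]" "$profile"
}
```

For the test data described above, count_new_motus profile.txt newdbname should print 3 if the three new mOTUs are present. grep exits with a non-zero status when no row matches, which makes the function usable directly in an if statement.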