#download extender archive
wget https://motu-tool.org/data/extend_mOTUs_DBv3.tar.gz
#decompress
tar -xzvf extend_mOTUs_DBv3.tar.gz
#enter the extracted directory (adjust the name if the archive unpacks to a different one)
cd extend_mOTUs_DBv2
#create a conda environment with dependencies
conda env create -f env/update_mOTUs_v2.yaml
cd ..
source activate update_mOTUs_v2
#mOTUs should be installed without conda
git clone https://github.com/motu-tool/mOTUs_v2.git
cd mOTUs_v2
python setup.py
python test.py
cd ..
MOTUS_DIR=`pwd`/mOTUs_v2/
You can see in the extend_mOTUs_DB/TEST/ directory what you have to prepare as input:
genomes.list, with the names of the files with the genomic sequences (without the .fasta extension, which the loop below appends)
genomes, a directory with the genome sequences (one per file)
taxonomy_file.txt, with the taxonomy of the genomes (the format is NCBI_taxonomy_id and taxonomy_name, separated by a tab)
for i in $(cat extend_mOTUs_DB/TEST/genomes.list); do ./extend_mOTUs_DB/SCRIPTS/extend_mOTUs_addGenome.sh extend_mOTUs_DB/TEST/genomes/$i.fasta $i newdbfolder extend_mOTUs_DB/SCRIPTS/ $MOTUS_DIR; done
This call will extract the marker genes from the genome sequences.
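Before running the loop above, it can help to check that the three inputs are consistent. The following is only a minimal sketch: check_inputs is a hypothetical helper that is not part of the extender scripts, and it assumes that every genome file is named <name>.fasta and that every taxonomy line has exactly two tab-separated fields.

```shell
# Hypothetical helper (not part of the extender scripts).
# Usage: check_inputs <genomes.list> <genomes_dir> <taxonomy_file>
# Returns 0 if the inputs look consistent, 1 otherwise.
check_inputs() {
    local list="$1" genomes_dir="$2" taxonomy="$3"
    local status=0 name

    # Every name in the list needs a non-empty <name>.fasta file
    # (assumption: the same naming scheme as in the loop above).
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue
        if [ ! -s "${genomes_dir}/${name}.fasta" ]; then
            echo "missing or empty: ${genomes_dir}/${name}.fasta" >&2
            status=1
        fi
    done < "$list"

    # Every taxonomy line should have exactly two tab-separated fields
    # (assumption: NCBI_taxonomy_id <TAB> taxonomy_name).
    if ! awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad + 0 }' "$taxonomy"; then
        echo "taxonomy file has lines without exactly two tab-separated fields" >&2
        status=1
    fi

    return "$status"
}

# Example call with the test data (left commented out, paths as used above):
# check_inputs extend_mOTUs_DB/TEST/genomes.list extend_mOTUs_DB/TEST/genomes extend_mOTUs_DB/TEST/taxonomy_file.txt
```

The check only looks at file names and field counts; it does not validate the FASTA content or the taxonomy identifiers.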
./extend_mOTUs_DB/SCRIPTS/extend_mOTUs_generateDB.sh extend_mOTUs_DB/TEST/genomes.list newdbname extend_mOTUs_DB/TEST/taxonomy_file.txt newdbfolder extend_mOTUs_DB/SCRIPTS/ $MOTUS_DIR
This call will do the clustering and create a new database that can be found in newdbfolder/newdbname/db_mOTU.
Copy the new database into the mOTUs_v2 directory:
cp -r newdbfolder/newdbname/db_mOTU $MOTUS_DIR
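If $MOTUS_DIR already contains a db_mOTU directory, the copy above writes into it, so it may be worth keeping a copy of the original database before running it. A minimal sketch, where backup_dir is a hypothetical helper and the .bak.<timestamp> suffix is an arbitrary choice:

```shell
# Hypothetical helper (not part of mOTUs or the extender scripts).
# Copies <dir> to <dir>.bak.<timestamp> and prints the backup path.
# Does nothing (and prints nothing) if <dir> does not exist.
backup_dir() {
    local dir="${1%/}"   # drop a trailing slash, if any
    [ -d "$dir" ] || return 0
    local backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
    cp -r "$dir" "$backup" && echo "$backup"
}

# Example call, to be run before the cp command above (left commented out):
# backup_dir "${MOTUS_DIR}db_mOTU"
```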
Test that the database is updated. In the extend_mOTUs_DB/TEST/ directory there is a fastq file to test (test1_single.fastq). Run:
$MOTUS_DIR/motus profile -g 1 -c -s extend_mOTUs_DB/TEST/test1_single.fastq
If the database was updated correctly you will see:
unknown Roseburia [meta_mOTU_v2_7798] 0
unknown Firmicutes [meta_mOTU_v2_7799] 0
unknown Clostridiales [meta_mOTU_v2_7800] 0
-1 0
Chryseobacterium indologenes [newdbname_1] 0
unknown Sphingobacterium [newdbname_2] 2
unknown Leadbetterella [newdbname_3] 0
Where the last three rows are the new mOTUs.
The file extend_mOTUs_DB/TEST/test1_single.fastq contains some reads that we simulated from newdbname, so we are now able to profile this new species (see the non-zero count for newdbname_2 above).
Note that those mOTUs are specific to the genomes that were selected in point 2.1.
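Since the full profile is long, the new rows can be pulled out of it with a small filter. This is only a sketch: new_motus_rows is a hypothetical helper, and it assumes that the new mOTUs are labelled [<dbname>_<number>] as in the output shown above and that the database name contains no regular-expression metacharacters.

```shell
# Hypothetical helper: read a profile from standard input and print only
# the rows whose mOTU identifier has the form [<dbname>_<number>].
new_motus_rows() {
    local dbname="$1"
    # "|| true" keeps the exit status at 0 when no row matches
    grep -E "\[${dbname}_[0-9]+\]" || true
}

# Example call with the test data (left commented out):
# "$MOTUS_DIR"/motus profile -g 1 -c -s extend_mOTUs_DB/TEST/test1_single.fastq | new_motus_rows newdbname
```

If the filter prints nothing, the profile was most likely computed with the original database rather than the extended one.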