Extract single copy marker genes


Phylogenetic marker genes are used to reconstruct the evolutionary history of organisms. One example is the 16S ribosomal RNA gene, which is powerful and therefore widely used, but it is also known to have limited phylogenetic resolution, that is, a limited ability to delineate closely related organisms. Efforts to find an alternative set of marker genes with favorable properties have led to the identification of protein-coding, rarely horizontally transferred, single-copy phylogenetic marker genes (MGs). A set of 40 such marker genes, present in the vast majority of known organisms, was used to reconstruct a universal tree of life [1].

The tool fetchMGs extracts these 40 MGs from genomes and metagenomes in a fast and accurate manner. It does so using Hidden Markov Models (HMMs) trained on protein alignments of known members of the 40 MGs, together with individually calibrated cutoffs. Please note that these cutoffs are only accurate when complete protein sequences are used as input. The program outputs the protein sequences of the identified proteins, as well as their nucleotide sequences if the corresponding nucleotide sequences are provided as an additional input.
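
The idea of individually calibrated cutoffs can be illustrated with a minimal Python sketch. It is not the fetchMGs implementation: it assumes that each cutoff is a per-marker bit-score threshold and that each protein is assigned to at most one MG, namely its best-scoring hit that passes the threshold. The names Hit and assign_markers, the marker labels MG_A and MG_B, and all scores and cutoff values are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One HMM search hit (hypothetical record layout)."""
    protein_id: str
    marker_id: str
    bitscore: float


def assign_markers(hits, cutoffs):
    """Return {protein_id: marker_id}.

    A hit is kept only if its score reaches the cutoff of its own marker.
    If a protein passes for several markers, the highest-scoring hit wins.
    """
    best = {}
    for hit in hits:
        cutoff = cutoffs.get(hit.marker_id)
        # Skip markers without a cutoff and hits below their marker's cutoff.
        if cutoff is None or hit.bitscore < cutoff:
            continue
        current = best.get(hit.protein_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.protein_id] = hit
    return {pid: h.marker_id for pid, h in best.items()}


# Hypothetical cutoffs and hits, for illustration only.
cutoffs = {"MG_A": 210.0, "MG_B": 340.0}
hits = [
    Hit("prot1", "MG_A", 250.0),  # passes the MG_A cutoff
    Hit("prot1", "MG_B", 120.0),  # below the MG_B cutoff
    Hit("prot2", "MG_B", 300.0),  # below the MG_B cutoff
    Hit("prot3", "MG_B", 410.0),  # passes the MG_B cutoff
]
assignments = assign_markers(hits, cutoffs)
print(assignments)
```

In this sketch a score of 300 is rejected for MG_B but would be accepted for MG_A, which is the point of calibrating a cutoff per marker rather than using one global threshold. It also suggests why truncated input proteins are problematic: a fragment of a true marker gene would typically score lower than the full-length protein and could fall below a cutoff calibrated on complete sequences.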

The most current version (v1.2) of fetchMGs can be downloaded from GitHub.

The legacy version (v1.0) can be downloaded from here.

A list of the 40 MGs is available here.