
    Automatic identification of optimal marker genes for phenotypic and taxonomic groups of microorganisms

    No full text
    <div><p>Finding optimal markers for microorganisms important in the medical, agricultural, environmental or ecological fields is of great importance. Thousands of complete microbial genomes now available allow us, for the first time, to exhaustively identify marker proteins for groups of microbial organisms. In this work, we model the biological task as the well-known mathematical “hitting set” problem, solving it based on both greedy and randomized approximation algorithms. We identify unique markers for 17 phenotypic and taxonomic microbial groups, including proteins related to the nitrite reductase enzyme as markers for the non-anammox nitrifying bacteria group, and two transcription regulation proteins, <i>nusG</i> and <i>yhiF</i>, as markers for the Archaea and <i>Escherichia/Shigella</i> taxonomic groups, respectively. Additionally, we identify marker proteins for three subtypes of pathogenic <i>E</i>. <i>coli</i>, which previously had no known optimal markers. Practically, depending on the completeness of the database, this algorithm can be used to identify marker genes for any microbial group. These marker genes may be prime candidates for understanding the genetic basis of the group's phenotype or for discovering novel functions that are uniquely shared among a group of microbes. We show that our method is both theoretically and practically efficient, and we establish upper bounds on its time complexity and approximation ratio; thus, it promises to remain efficient and permit the identification of marker proteins that are specific to phenotypic or taxonomic groups, even as more and more bacterial genomes are sequenced.</p></div>
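The hitting-set formulation can be illustrated with a minimal greedy sketch. It assumes one plausible reading of the model, not necessarily the authors' exact definition: a candidate marker is a protein found in no genome outside the group, and the chosen markers must "hit" every group genome, i.e., each group genome carries at least one chosen marker. The function name `greedy_hitting_set` and the toy genomes are hypothetical, and the randomized variant mentioned in the abstract is not shown.

```python
from typing import Dict, Optional, Set


def greedy_hitting_set(
    genomes: Dict[str, Set[str]], group: Set[str]
) -> Optional[Set[str]]:
    """Greedy approximation of a smallest marker set for `group`.

    Assumed (hypothetical) formulation: candidates are proteins absent from
    every genome outside the group; every group genome must contain at least
    one chosen candidate. Returns None if no such set exists.
    """
    # Proteins seen anywhere outside the group cannot be markers.
    outside: Set[str] = set().union(
        *(prots for g, prots in genomes.items() if g not in group)
    )
    candidates = {g: genomes[g] - outside for g in group}
    if any(not c for c in candidates.values()):
        return None  # some group genome has no group-specific protein

    unhit = set(group)
    chosen: Set[str] = set()
    while unhit:
        # Count how many not-yet-hit group genomes each candidate covers.
        counts: Dict[str, int] = {}
        for g in unhit:
            for p in candidates[g]:
                counts[p] = counts.get(p, 0) + 1
        # Pick the protein covering the most genomes (ties: alphabetical).
        best = max(sorted(counts), key=counts.__getitem__)
        chosen.add(best)
        unhit = {g for g in unhit if best not in candidates[g]}
    return chosen


genomes = {
    "B1": {"P1", "P3"},
    "B2": {"P2", "P3"},
    "B3": {"P3", "P4"},
    "B4": {"P4", "P5"},
}
print(sorted(greedy_hitting_set(genomes, {"B1", "B2"})))  # ['P1', 'P2']
```

This is the classic greedy set-cover heuristic, whose approximation ratio grows logarithmically with the number of group genomes; the specific bounds proven in the paper are not reproduced here.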

    A New Comparative-Genomics Approach for Defining Phenotype-Specific Indicators Reveals Specific Genetic Markers in Predatory Bacteria

    No full text
    <div><p>Predatory bacteria seek and consume other live bacteria. Although they belong to taxonomically diverse groups, relatively few bacterial predator species are known. Consequently, it is difficult to assess the impact of predation within the bacterial realm. Because no genetic signatures distinguishing them from non-predatory bacteria are known, genomic resources cannot be exploited to uncover novel predators. To identify genes specific to predatory bacteria, we developed a bioinformatic tool called DiffGene. This tool automatically identifies marker genes that are specific to phenotypic or taxonomic groups by mapping the presence or absence of each gene in each of the available fully sequenced genomes. A putative ‘predator region’ of ~60 amino acids in the tryptophan 2,3-dioxygenase (TDO) protein was identified as a probable predator-specific marker. This region is found in all known obligate predator genomes and in a few facultative predator genomes, and is absent from most facultative predators and from all non-predatory bacteria. We designed PCR primers that uniquely amplify a ~180-bp sequence within the predators’ TDO gene, and validated them in monocultures as well as in metagenetic analysis of environmental wastewater samples. In addition to its use in predator identification and phylogenetics, this marker may finally permit reliable enumeration and cataloguing of predatory bacteria from environmental samples, as well as the discovery of novel predators.</p></div>
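The presence/absence mapping that DiffGene is described as performing can be sketched as follows. This is not DiffGene's code or interface: the function names, the `min_in`/`max_out` thresholds, and the toy gene names (including `tdo_insert`, standing in for the ~60-amino-acid predator region) are hypothetical. The thresholds only illustrate how a marker criterion could be relaxed, e.g., to tolerate facultative predators that lack the region.

```python
from typing import Dict, List, Set, Tuple


def presence_absence(
    genomes: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a genome-by-gene 0/1 presence/absence matrix."""
    genome_ids = sorted(genomes)
    gene_ids = sorted(set().union(*genomes.values()))
    matrix = [[int(gene in genomes[g]) for gene in gene_ids] for g in genome_ids]
    return genome_ids, gene_ids, matrix


def group_specific_genes(
    genomes: Dict[str, Set[str]],
    group: Set[str],
    min_in: float = 1.0,   # fraction of group genomes that must carry the gene
    max_out: float = 0.0,  # fraction of other genomes allowed to carry it
) -> List[str]:
    genome_ids, gene_ids, matrix = presence_absence(genomes)
    in_rows = [row for g, row in zip(genome_ids, matrix) if g in group]
    out_rows = [row for g, row in zip(genome_ids, matrix) if g not in group]
    if not in_rows:
        raise ValueError("group contains no known genome")
    hits = []
    for j, gene in enumerate(gene_ids):
        frac_in = sum(row[j] for row in in_rows) / len(in_rows)
        frac_out = sum(row[j] for row in out_rows) / len(out_rows) if out_rows else 0.0
        if frac_in >= min_in and frac_out <= max_out:
            hits.append(gene)
    return hits


genomes = {
    "pred1": {"tdo_insert", "gyrA"},
    "pred2": {"tdo_insert", "gyrA", "geneX"},
    "nonpred1": {"tdo", "gyrA"},
    "nonpred2": {"gyrA", "geneX"},
}
print(group_specific_genes(genomes, {"pred1", "pred2"}))  # ['tdo_insert']
```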

    Maximum-likelihood phylogenetic tree of the tryptophan 2,3-dioxygenase protein.

    No full text
    <p>The percentage of trees in which the associated taxa clustered together (out of 100 bootstrap replicates) is shown next to the branches; branches with &lt;50% support were collapsed. Obligate bacterial predators are marked in orange, facultative predators in yellow. The red line indicates genomes with the ~60-amino-acid insert.</p>
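The collapsing rule in this caption (branches with less than 50% bootstrap support are merged into their parent) can be shown on a toy tree. The `Node` class and helper functions are hypothetical stand-ins, not the phylogenetics software actually used; a real analysis would normally rely on an established tree library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap % of the branch above this node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float = 50.0) -> Node:
    """Collapse internal branches with support below `threshold` into polytomies."""
    kept: List[Node] = []
    for child in node.children:
        collapse_low_support(child, threshold)  # post-order: fix subtrees first
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            kept.extend(child.children)  # splice grandchildren into this node
        else:
            kept.append(child)
    node.children = kept
    return node


def to_newick(node: Node) -> str:
    """Serialize to Newick, writing support values as internal-node labels."""
    if not node.children:
        return node.name
    label = "" if node.support is None else f"{node.support:g}"
    return "(" + ",".join(to_newick(c) for c in node.children) + ")" + label


tree = Node(children=[
    Node(support=40, children=[Node("A"), Node("B")]),
    Node(support=95, children=[Node("C"), Node("D")]),
])
print(to_newick(collapse_low_support(tree)) + ";")  # (A,B,(C,D)95);
```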

    Example comparison of an abstract excerpt and the corresponding lay summary.

    No full text
    <p>Part I (abstract) consists of 57% high-frequency words, 16% mid-frequency words, and 27% jargon. Part II (summary) uses 71% high-frequency words, 20% mid-frequency words, and 8% jargon. Excerpts taken from [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0181742#pone.0181742.ref053" target="_blank">53</a>].</p>

    Hitting sets (marker proteins) of 17 microorganism groups.

    No full text
    <p>HS, hitting set. Min., minimal. Greedy and random refer to the algorithm type. Phen., phenotypic. Tax., taxonomic. AIEC, adherent-invasive <i>E</i>. <i>coli</i>. EPEC, enteropathogenic <i>E</i>. <i>coli</i>. UPEC, uropathogenic <i>E</i>. <i>coli</i>. STEC, Shiga toxin-producing <i>E</i>. <i>coli</i>. NMEC, neonatal meningitis-associated <i>E</i>. <i>coli</i>. ExPEC, extra-intestinal pathogenic <i>E</i>. <i>coli</i>. ETEC, enterotoxigenic <i>E</i>. <i>coli</i>. EIEC, enteroinvasive <i>E</i>. <i>coli</i>. EHEC, enterohemorrhagic <i>E</i>. <i>coli</i>. EAEC, enteroaggregative <i>E</i>. <i>coli</i>. APEC, avian pathogenic <i>E</i>. <i>coli</i>. EAHEC, enteroaggregative hemorrhagic <i>E</i>. <i>coli</i>.</p>

    Automatic identification of optimal marker genes for phenotypic and taxonomic groups of microorganisms - Fig 1

    No full text
    <p>Graphical representation of the proteins (denoted P<sub>1</sub>, P<sub>2</sub>, P<sub>3</sub>, P<sub>4</sub>, P<sub>5</sub>) that can serve as markers for a group of interest consisting of bacteria B<sub>1</sub> and B<sub>2</sub>, out of four bacteria (denoted B<sub>1</sub>, B<sub>2</sub>, B<sub>3</sub>, B<sub>4</sub>): (A) P<sub>1</sub> and P<sub>2</sub> together form a minimal set of markers for the group of interest; (B) P<sub>1</sub> alone can serve as a marker for the group of interest; and (C) there are no markers for the group of interest.</p>
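The three situations in Fig 1 can be reproduced with an exact search on toy data. The protein memberships below are hypothetical, chosen only so that panels A, B and C come out as described, and the sketch assumes that a candidate marker must be absent from every genome outside the group of interest. The function name `minimal_marker_set` is illustrative. Exhaustive search is exponential in the number of candidate proteins, which is why approximation algorithms are needed at genome scale.

```python
from itertools import combinations
from typing import Dict, Optional, Set, Tuple


def minimal_marker_set(
    genomes: Dict[str, Set[str]], group: Set[str]
) -> Optional[Tuple[str, ...]]:
    """Exhaustively find a smallest marker set (exponential; toy sizes only).

    Assumes candidates are proteins absent from every genome outside the
    group, and that each group genome must carry at least one chosen marker.
    """
    outside = set().union(*(p for g, p in genomes.items() if g not in group))
    candidates = sorted(set().union(*(genomes[g] for g in group)) - outside)
    # Try all candidate subsets in order of increasing size.
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            if all(set(combo) & genomes[g] for g in group):
                return combo
    return None  # no marker set exists


group = {"B1", "B2"}
panel_a = {"B1": {"P1", "P3"}, "B2": {"P2", "P3"}, "B3": {"P3", "P4"}, "B4": {"P4", "P5"}}
panel_b = {"B1": {"P1", "P2"}, "B2": {"P1", "P3"}, "B3": {"P2", "P4"}, "B4": {"P3", "P5"}}
panel_c = {"B1": {"P1", "P3"}, "B2": {"P2", "P4"}, "B3": {"P1", "P2"}, "B4": {"P3", "P4", "P5"}}
for name, genomes in [("A", panel_a), ("B", panel_b), ("C", panel_c)]:
    print(name, minimal_marker_set(genomes, group))
# A ('P1', 'P2')
# B ('P1',)
# C None
```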

    Maximum-likelihood phylogenetic tree of the representative sequences of the 100 most abundant OTUs in the metagenetic analysis, including representative TDO sequences from known predatory and non-predatory bacteria.

    No full text
    <p>The bootstrap consensus tree inferred from 100 replicates is taken to represent the evolutionary history of the taxa analyzed. Wastewater OTUs are named by abundance rank; i.e., OTU1 is the most abundant OTU in the environmental samples, OTU2 the second most abundant, and so on. Names of known sequences include the GI accession number, the coordinates within the genome, and the species name.</p>
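The abundance-based naming convention, together with the selection of the 100 most abundant OTUs for the tree, amounts to a sort by read count. This sketch assumes a simple mapping from hypothetical representative-sequence IDs to read counts; the function name `rank_otus` and the tie-breaking rule are illustrative choices that the caption does not specify.

```python
from typing import Dict, List, Tuple


def rank_otus(abundance: Dict[str, int], top_n: int = 100) -> List[Tuple[str, str, int]]:
    """Name OTUs by abundance rank (OTU1 = most abundant) and keep the top_n.

    `abundance` maps a representative-sequence ID to its read count.
    Ties are broken by sequence ID so the naming is deterministic.
    """
    ranked = sorted(abundance.items(), key=lambda item: (-item[1], item[0]))
    return [
        (f"OTU{rank}", seq_id, count)
        for rank, (seq_id, count) in enumerate(ranked[:top_n], start=1)
    ]


print(rank_otus({"seqA": 5, "seqB": 12, "seqC": 7}, top_n=2))
# [('OTU1', 'seqB', 12), ('OTU2', 'seqC', 7)]
```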

    Maximum-likelihood phylogenetic tree of the NusG protein.

    No full text
    <p>All archaeal NusG sequences were taken from the GenBank database, along with their most similar bacterial and eukaryotic homologs, for a total of 500 protein sequences. The bootstrap consensus tree inferred from 100 replicates was taken to represent the evolutionary history of the taxa analyzed. Branches were merged at the domain level.</p>