39 research outputs found

    The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences

    Metagenome sequencing is becoming common and there is an increasing need for easily accessible tools for data analysis. An essential step is the taxonomic classification of sequence fragments. We describe a web server for the taxonomic assignment of metagenome sequences with PhyloPythiaS. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. Taxonomic assignments with the web server can be made with a generic model, or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments. Here, we demonstrate usage of our web server by taxonomic assignment of metagenome samples from an acidophilic biofilm community of an acid mine and of a microbial community from cow rumen.

    Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes.

    INTRODUCTION: Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding, even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. RESULTS: We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. CONCLUSION: Genomic base composition has a substantial effect on both amino acid and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study.
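
    The relative entropy mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Kullback-Leibler divergence of observed amino acid frequencies from a uniform background; the abstract does not state the exact formulation or reference distribution used in the study, and the function names below are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def amino_acid_frequencies(protein: str) -> dict:
    """Relative frequency of each standard amino acid in a protein string."""
    counts = Counter(c for c in protein.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


def relative_entropy(p: dict, q: dict) -> float:
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms where p is zero contribute nothing; q must be non-zero
    wherever p is non-zero.
    """
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


# Assumption: a uniform background, so a value of 0 means no usage bias.
UNIFORM = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
```

    Under this assumed definition, a proteome that uses all 20 amino acids equally scores 0, and the score grows as usage concentrates on fewer residues.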

    TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

    Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW. TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics. 2009;10(1):56. Background: Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomic data sets, is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results: Our novel strategy was extensively evaluated using the leave-one-out cross validation strategy on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy down to rank class. For longer fragments ≥ 3 Kbp accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, classifying such fragments to their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier PhyloPythia using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to rank class, and low false negative rates were also obtained.
Conclusion: An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved to be competitive when compared to the most current classifier PhyloPythia and has the advantage that it can be locally installed and the reference set can be kept up-to-date.
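
    The general idea of combining k-nearest neighbors with a kernel can be sketched as follows. This is not TACOA's published algorithm: the tetranucleotide profiles, the Gaussian kernel, the `width` parameter and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list:
    """Normalized k-mer frequency vector over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made purely of A, C, G, T are counted.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def classify(query: str, references: list, k_neighbors: int = 3,
             width: float = 0.1, k: int = 4) -> str:
    """Kernel-weighted k-nearest-neighbor vote.

    references is a list of (label, sequence) pairs. Each of the
    k_neighbors closest references votes for its label with a
    Gaussian weight that decays with squared Euclidean distance.
    """
    q = kmer_profile(query, k)
    scored = []
    for label, seq in references:
        r = kmer_profile(seq, k)
        dist2 = sum((a - b) ** 2 for a, b in zip(q, r))
        scored.append((dist2, label))
    scored.sort(key=lambda t: t[0])

    votes = defaultdict(float)
    for dist2, label in scored[:k_neighbors]:
        votes[label] += math.exp(-dist2 / (2 * width ** 2))
    return max(votes, key=votes.get)
```

    A real classifier would also need a rejection rule for the "unknown" case described in the abstract, for example a minimum vote weight; that is omitted here.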

    An Environment-Sensitive Synthetic Microbial Ecosystem

    Microbial ecosystems have been widely used in industrial production, but the inter-relationships of the organisms within them have not been completely clarified because of the complex composition and structure of natural microbial ecosystems. It is therefore challenging for ecologists to gain deep insight into how ecosystems function and interact with their surrounding environments. Recent progress in synthetic biology shows that constructing artificial ecosystems, in which the relationships between species are comparatively clear, could help us better understand those tiny societies. Using two quorum-sensing signal transduction circuits, this research designed, simulated and constructed a synthetic ecosystem in which various population dynamics form as environmental factors change. Coherent experimental data and mathematical simulation in our study show that different antibiotic levels and initial cell densities can result in correlated population dynamics such as extinction, obligatory mutualism, facultative mutualism and commensalism. This synthetic ecosystem provides valuable information for addressing questions in ecology and may act as a chassis for construction of more complex microbial ecosystems.

    Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes

    BACKGROUND: Lineage-specific, or taxonomically restricted genes (TRGs), especially those that are species and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein. METHODOLOGY/PRINCIPAL FINDINGS: We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores. CONCLUSIONS: The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine

    A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

    Background: The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. Results: Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. Conclusion: The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.

    Secondary metabolite gene expression and interplay of bacterial functions in a tropical freshwater cyanobacterial bloom

    Cyanobacterial harmful algal blooms (cyanoHABs) appear to be increasing in frequency on a global scale. The Cyanobacteria in blooms can produce toxic secondary metabolites that make freshwater dangerous for drinking and recreation. To characterize microbial activities in a cyanoHAB, transcripts from a eutrophic freshwater reservoir in Singapore were sequenced for six samples collected over one day-night period. Transcripts from the Cyanobacterium Microcystis dominated all samples and were accompanied by at least 533 genera primarily from the Cyanobacteria, Proteobacteria, Bacteroidetes and Actinobacteria. Within the Microcystis population, abundant transcripts were from genes for buoyancy, photosynthesis and synthesis of the toxin microviridin, suggesting that these are necessary for competitive dominance in the reservoir. During the day, Microcystis transcripts were enriched in photosynthesis and energy metabolism while at night enriched pathways included DNA replication and repair and toxin biosynthesis. Microcystis was the dominant source of transcripts from polyketide and non-ribosomal peptide synthase (PKS and NRPS, respectively) gene clusters. Unexpectedly, expression of all PKS/NRPS gene clusters, including for the toxins microcystin and aeruginosin, occurred throughout the day-night cycle. The most highly expressed PKS/NRPS gene cluster from Microcystis is not associated with any known product. The four most abundant phyla in the reservoir were enriched in different functions, including photosynthesis (Cyanobacteria), breakdown of complex organic molecules (Proteobacteria), glycan metabolism (Bacteroidetes) and breakdown of plant carbohydrates, such as cellobiose (Actinobacteria). These results provide the first estimate of secondary metabolite gene expression, functional partitioning and functional interplay in a freshwater cyanoHAB.
Singapore. National Research Foundation (Singapore MIT Alliance for Research and Technology (SMART), Center for Environmental Sensing and Modeling (CENSAM) research program); National Science Foundation (U.S.) (Postdoctoral Research Fellowship in Biology, Grant No. DBI-1202865); National Institute of Environmental Health Sciences (NIEHS Grant P30-ES002109 to the MIT Center for Environmental Health Sciences); MIT International Science and Technology Initiatives (MISTI-Hayashi fund)

    Differential preservation of endogenous human and microbial DNA in dental calculus and dentin

    Dental calculus (calcified dental plaque) is prevalent in archaeological skeletal collections and is a rich source of oral microbiome and host-derived ancient biomolecules. Recently, it has been proposed that dental calculus may provide a more robust environment for DNA preservation than other skeletal remains, but this has not been systematically tested. In this study, shotgun-sequenced data from paired dental calculus and dentin samples from 48 globally distributed individuals are compared using a metagenomic approach. Overall, we find DNA from dental calculus is consistently more abundant and less contaminated than DNA from dentin. The majority of DNA in dental calculus is microbial and originates from the oral microbiome; however, a small but consistent proportion of DNA (mean 0.08 ± 0.08%, range 0.007–0.47%) derives from the host genome. Host DNA content within dentin is variable (mean 13.70 ± 18.62%, range 0.003–70.14%), and for a subset of dentin samples (15.21%), oral bacteria contribute > 20% of total DNA. Human DNA in dental calculus is highly fragmented, and is consistently shorter than both microbial DNA in dental calculus and human DNA in paired dentin samples. Finally, we find that microbial DNA fragmentation patterns are associated with guanine-cytosine (GC) content, but not aspects of cellular structure.
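
    As a small illustration of the GC content measure referred to in this abstract, the sketch below computes the GC fraction of a read and the mean read length on either side of a GC threshold. The 0.5 threshold, the two-way split and the function names are assumptions for illustration and are not taken from the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a read."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def mean_length_by_gc(reads: list, threshold: float = 0.5) -> dict:
    """Mean read length for reads below and at-or-above a GC threshold.

    A group with no reads is reported as None.
    """
    groups = {"low_gc": [], "high_gc": []}
    for read in reads:
        key = "high_gc" if gc_content(read) >= threshold else "low_gc"
        groups[key].append(len(read))
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}
```

    Comparing the two means is only a crude first look; an actual analysis of fragmentation would work with full length distributions per taxon.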

    The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity

    New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.

    Relative amino acid composition signatures of organisms and environments

    BACKGROUND: Identifying organism-environment interactions at the molecular level is crucial to understanding how organisms adapt to and change the chemical and molecular landscape of their habitats. In this work we investigated whether relative amino acid compositions could be used as a molecular signature of an environment and whether such a signature could also be observed at the level of the cellular amino acid composition of the microorganisms that inhabit that environment. METHODOLOGIES/PRINCIPAL FINDINGS: To address these questions we collected and analyzed environmental amino acid determinations from the literature, and estimated from complete genomic sequences the global relative amino acid abundances of organisms that are cognate to the different types of environment. Environmental relative amino acid abundances clustered into broad groups (ocean waters, host-associated environments, grassland environments, sandy soils and sediments, and forest soils), indicating the presence of amino acid signatures specific for each environment. These signatures correlate to those found in organisms. Nevertheless, relative amino acid abundance of organisms was more influenced by GC content than habitat or phylogeny. CONCLUSIONS: Our results suggest that relative amino acid composition can be used as a signature of an environment. In addition, we observed that the relative amino acid composition of organisms is not highly determined by environment, reinforcing previous studies that find GC content to be the major factor correlating to amino acid composition in living organisms.
AM was supported by Fundação para a Ciência e a Tecnologia, Portugal, through the postdoctoral grant SFRH/BPD/72256/2010. RA was partially supported by the Ministerio de Ciencia e Innovación (Spain) through grant BFU2010-17704, and by the Generalitat de Catalunya through a grant for research group 2009SGR809. MAS was supported in part by a grant from the US Public Health Service (RO1-GM30054). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Authors wish to thank Albert Sorribas, Enrique Herrero and Ester Vilaprinyo for critical reading of the manuscript and Ester Vilaprinyo for assistance with Wolfram Mathematica software.