12 research outputs found

    Phylogenetic classification of short environmental DNA fragments

    Metagenomics is providing striking insights into the ecology of microbial communities. The recently developed massively parallel 454 pyrosequencing technique gives the opportunity to rapidly obtain metagenomic sequences at a low cost and without cloning bias. However, the phylogenetic analysis of the short reads produced represents a significant computational challenge. The phylogenetic algorithm CARMA for predicting the source organisms of environmental 454 reads is described. The algorithm searches for conserved Pfam domains and protein families in the unassembled reads of a sample. These gene fragments (environmental gene tags, EGTs) are classified into a higher-order taxonomy based on the reconstruction of a phylogenetic tree of each matching Pfam family. The method exhibits high accuracy for a wide range of taxonomic groups, and EGTs as short as 27 amino acids can be phylogenetically classified up to the rank of genus. The algorithm was applied in a comparative study of three aquatic microbial samples obtained by 454 pyrosequencing. Profound differences in the taxonomic composition of these samples could be clearly revealed.

    TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

    Diaz NN, Krause L, Goesmann A, Niehaus K, Nattkemper TW. TACOA - Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach. BMC Bioinformatics. 2009;10(1):56. Background: Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification is an essential task in the analysis of metagenomic data sets that is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results: Our novel strategy was extensively evaluated using the leave-one-out cross-validation strategy on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy up to the rank of class. For longer fragments (≥ 3 Kbp), accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, classifying such fragments to their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier PhyloPythia using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp, the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to the rank of class, and low false negative rates were also obtained.
Conclusion: An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved to be competitive when compared to the most current classifier PhyloPythia and has the advantage that it can be locally installed and the reference set can be kept up-to-date.
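
The idea of combining k-nearest neighbor voting with a kernel, as described in the TACOA abstract above, can be illustrated with a minimal sketch. This is not the TACOA implementation: the feature choice (normalized tetranucleotide frequencies), the Gaussian kernel, the kernel width, the number of neighbors and the "unknown" threshold are all assumptions made for illustration, and the names `kmer_profile`, `gaussian_kernel` and `classify` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

K = 4  # oligonucleotide length; an assumption, not taken from the abstract
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq, k=K):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers over A, C, G, T are counted; avoid division by zero.
    total = sum(counts[m] for m in KMERS) or 1
    return [counts[m] / total for m in KMERS]


def gaussian_kernel(x, y, width=0.05):
    """Gaussian (RBF) similarity between two frequency vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * width ** 2))


def classify(fragment, reference, k_neighbors=3, min_score=0.5, width=0.05):
    """Kernel-weighted k-nearest-neighbor vote.

    reference: list of (sequence, taxon) pairs (hypothetical toy format).
    Returns the winning taxon, or "unknown" if its summed kernel
    weight stays below min_score (an assumed rejection rule).
    """
    query = kmer_profile(fragment)
    scored = sorted(
        ((gaussian_kernel(query, kmer_profile(s), width), t) for s, t in reference),
        reverse=True,
    )[:k_neighbors]
    votes = Counter()
    for similarity, taxon in scored:
        votes[taxon] += similarity
    taxon, score = votes.most_common(1)[0]
    return taxon if score >= min_score else "unknown"


if __name__ == "__main__":
    toy_reference = [("AT" * 20, "taxonA"), ("GC" * 20, "taxonB")]
    print(classify("AT" * 10, toy_reference))
    print(classify("ACGT" * 10, toy_reference))
```

The rejection threshold mirrors, in a very simplified way, the behaviour reported in the abstract of labelling fragments as "unknown" when nothing similar is present in the reference set.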

    Photoreceptors: unconventional ways of seeing

    Animals typically perceive light through photoreceptor neurons assembled in eyes, but some also use non-eye photosensory neurons. Multidendritic neurons in the body wall of Drosophila larvae have now been shown to use an unconventional phototransduction mechanism to sense light.

    Hyperbolic SOM-based clustering of DNA fragment features for taxonomic visualization and classification

    Martin C, Diaz NN, Ontrup J, Nattkemper TW. Hyperbolic SOM-based clustering of DNA fragment features for taxonomic visualization and classification. Bioinformatics. 2008;24(14):1568-1574

    Finding novel genes in bacterial communities isolated from the environment

    Krause L, Diaz NN, Bartels D, et al. Finding novel genes in bacterial communities isolated from the environment. Bioinformatics. 2006;22(14):e281-e289. Motivation: Novel sequencing techniques can give access to organisms that are difficult to cultivate using conventional methods. When applied to environmental samples, the data generated have some drawbacks, e.g. short length of assembled contigs, in-frame stop codons and frame shifts. Unfortunately, current gene finders cannot circumvent these difficulties. At the same time, the automated prediction of genes is a prerequisite for handling the increasing amount of genomic sequences and ensuring progress in metagenomics. Results: We introduce a novel gene finding algorithm that incorporates features overcoming the short length of the assembled contigs from environmental data, as well as in-frame stop codons and frame shifts contained in bacterial sequences. The results show that by searching for sequence similarities in an environmental sample our algorithm is capable of detecting a high fraction of its gene content, depending on the species composition and the overall size of the sample. The method is valuable for hunting novel unknown genes that may be specific to the habitat where the sample is taken. Finally, we show that our algorithm can even exploit the limited information contained in the short reads generated by 454 technology for the prediction of protein coding genes.

    Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing

    Kroeber M, Bekel T, Diaz NN, et al. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing. Journal of Biotechnology. 2009;142(1):38-49. The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result was derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene encodes the α-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences, thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales.
Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown. © 2009 Elsevier B.V. All rights reserved.

    Taxonomic composition and gene content of a methane-producing microbial community isolated from a biogas reactor

    Krause L, Diaz NN, Edwards RA, et al. Taxonomic composition and gene content of a methane-producing microbial community isolated from a biogas reactor. Journal of Biotechnology. 2008;136(1-2):91-101. A total community DNA sample from an agricultural biogas reactor continuously fed with maize silage, green rye, and small proportions of chicken manure has recently been sequenced using massively parallel pyrosequencing. In this study, the sample was computationally characterized without a prior assembly step, providing quantitative insights into the taxonomic composition and gene content of the underlying microbial community. Clostridiales from the phylum Firmicutes is the most prevalent phylogenetic order, while Methanomicrobiales are dominant among methanogenic archaea. An analysis of Operational Taxonomic Units (OTUs) revealed that the entire microbial community is only partially covered by the sequenced sample, even though estimates suggest only a moderate overall diversity of the community. Furthermore, the results strongly indicate that archaea related to the genus Methanoculleus, using CO2 as electron acceptor and H2 as electron donor, are the main producers of methane in the analyzed biogas reactor sample. A phylogenetic analysis of glycosyl hydrolase protein families suggests that Clostridia play an important role in the digestion of polysaccharides and oligosaccharides. Finally, the results unveiled that most of the organisms constituting the sample are still unexplored. © 2008 Elsevier B.V. All rights reserved.