3 research outputs found

    Xander: employing a novel method for efficient gene-targeted metagenomic assembly

    BACKGROUND: Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. RESULTS: We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found that our method produced more, longer, and higher-quality gene sequences. CONCLUSION: Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines. This method is implemented as open source software and is available at https://github.com/rdpstaff/Xander_assembler. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40168-015-0093-6) contains supplementary material, which is available to authorized users.
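    To make the idea of a weighted assembly graph more concrete, here is a minimal Python sketch. It is not Xander's implementation. It builds a plain de Bruijn graph from reads and then walks it greedily, choosing at each branch the edge with the highest score. The names `build_de_bruijn`, `extend_path`, and `score_edge` are hypothetical; `score_edge` is only a stand-in for whatever weight a profile HMM would contribute, and the greedy walk is a simplification chosen for brevity.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it leads to."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_path(graph, start, score_edge, max_steps):
    """Greedily follow the best-scoring outgoing edge, starting at `start`.

    `score_edge(node, successor)` is a hypothetical placeholder for an
    HMM-derived edge weight; here it can be any function returning a number.
    """
    path = start
    node = start
    seen = {start}
    for _ in range(max_steps):
        successors = graph.get(node)
        if not successors:
            break  # dead end: no outgoing edges
        # Sort first so that ties are broken deterministically.
        best = max(sorted(successors), key=lambda nxt: score_edge(node, nxt))
        if best in seen:
            break  # stop rather than loop around a cycle
        path += best[-1]
        seen.add(best)
        node = best
    return path


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTA"]
    graph = build_de_bruijn(reads, k=3)
    # With a constant score the walk simply follows the only available path.
    print(extend_path(graph, "AT", lambda a, b: 0.0, max_steps=10))
```

    In this toy setting the scoring function decides which branch is followed when a node has several successors, which is the role the abstract assigns to the HMM information in the combined graph.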

    Sequence Homology Search Based on Database Indexing Using the Profile Hidden Markov Model

    Abstract: The Profile Hidden Markov Model (PHMM) has received increasing attention in the field of protein homology detection, since profile-based methods are much more sensitive in detecting distant homologous relationships than pairwise methods. Pure dynamic-programming-based systems are often used for PHMM searches. However, these dynamic-programming-based systems are very time-consuming for a large database. For instance, it may take approximately 15 minutes to search a short model of length 12 in the GenBank protein sequence database. Instead of searching the database sequentially, we search the database based on a tree-structured database indexing, called the HD-tree. The HD-tree is able to reduce the PHMM search time significantly without reducing the quality of search results. Performance of search using the HD-tree is compared with that of HMMER [1], a popular implementation of PHMM for protein sequence analysis. It is shown that the HD-tree approach is orders of magnitude faster than HMMER for short queries.