3 research outputs found

    GraphBin2: Refined and Overlapped Binning of Metagenomic Contigs Using Assembly Graphs

    Metagenomic sequencing allows us to study the structure, diversity and ecology of microbial communities without the need to obtain pure cultures. In many metagenomics studies, the reads obtained from sequencing are first assembled into longer contigs, and these contigs are then binned into clusters in which all contigs are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for contig binning only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species). In this paper, we introduce GraphBin2, which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves the binning results of existing tools but also supports assigning contigs to multiple bins.
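The idea of using connectivity and coverage together to find shared contigs can be illustrated with a toy sketch. This is not the GraphBin2 algorithm: the function name `infer_shared_contigs`, the `tolerance` parameter and the rule "a contig is shared if it touches two bins and its coverage is close to the sum of their mean coverages" are all assumptions made for illustration only.

```python
from itertools import combinations


def infer_shared_contigs(edges, coverage, bins, tolerance=0.2):
    """Flag contigs that may belong to two bins (hypothetical heuristic).

    edges     : list of (contig, contig) links from an assembly graph
    coverage  : dict contig -> read coverage
    bins      : dict contig -> bin label from an existing binning tool
    tolerance : allowed relative deviation from the summed bin coverage
    """
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    def bin_mean(label, exclude):
        # Mean coverage of a bin, leaving out the contig under test.
        values = [coverage[c] for c, b in bins.items()
                  if b == label and c != exclude]
        return sum(values) / len(values) if values else None

    shared = {}
    for contig, cov in coverage.items():
        # Bins that the contig touches through graph connectivity.
        adjacent = sorted({bins[n] for n in neighbours.get(contig, ())
                           if n in bins})
        for a, b in combinations(adjacent, 2):
            mean_a, mean_b = bin_mean(a, contig), bin_mean(b, contig)
            if mean_a is None or mean_b is None:
                continue
            # A shared sequence is sequenced from both genomes, so its
            # coverage should be roughly the sum of the two bin coverages.
            if abs(cov - (mean_a + mean_b)) <= tolerance * cov:
                shared[contig] = (a, b)
                break
    return shared


edges = [("a1", "a2"), ("a2", "s"), ("s", "b1"), ("b1", "b2")]
coverage = {"a1": 10, "a2": 11, "b1": 30, "b2": 29, "s": 40}
bins = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "s": "A"}
print(infer_shared_contigs(edges, coverage, bins))
```

In this toy input, contig `s` sits between the two bins and its coverage matches the sum of their mean coverages, so it is the only contig reported as shared.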

    Metagenomics Binning Using Assembly Graphs

    Metagenomics is the study of genetic material obtained directly from communities of microorganisms living in natural environments. The field has provided valuable insights into the structure, diversity and ecology of microbial communities. Recent developments in high-throughput sequencing technologies have enabled metagenomics to analyse environmental samples without having to rely on culture-based methods. Once an environmental sample is sequenced, a process called metagenomics binning is used to cluster the sequences into bins that represent different taxonomic groups such as species, genera or higher levels. Various approaches to binning metagenomic sequences have been proposed. One approach is to bin raw sequencing reads prior to assembly. However, reads are considered too short to produce accurate and reliable binning results for downstream analysis. Hence, the standard approach in metagenomics analysis is to assemble short reads into longer sequences called contigs and then bin the resulting contigs. Existing metagenomic contig-binning methods rely on the composition and abundance information of the contigs, and face challenges when binning short contigs and contigs with similar composition and abundance. Contigs are derived from the underlying assembly graph, which contains valuable connectivity information among contigs. However, existing metagenomic contig-binning methods do not consider the assembly graph in the binning process. Firstly, this thesis describes a bin refinement tool named GraphBin that improves existing metagenomic binning results using assembly graphs. GraphBin makes use of the assembly graph and a label propagation method to refine the binning results of existing contig-binning tools by correcting mis-binned contigs and recovering short contigs that are discarded. Secondly, this thesis explains how to enable the detection of shared sequences among multiple species from assembly graphs and introduces a tool named GraphBin2, which can perform overlapped binning. GraphBin2 makes use of the assembly graph and the coverage information of contigs, which enables the detection of contigs that may belong to multiple species. Thirdly, this thesis introduces a stand-alone approach named MetaCoAG to bridge metagenomics binning and assembly by incorporating composition, coverage and assembly graphs. MetaCoAG uses single-copy marker genes to estimate the number of initial bins, assigns contigs to bins iteratively and adjusts the number of bins dynamically throughout the binning process. In summary, this thesis discusses the challenges in binning metagenomic contigs and the shortcomings of existing metagenomic contig-binning tools, and presents how the assembly graph can be incorporated to improve metagenomics binning.
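The label propagation step mentioned for GraphBin can be pictured with a small sketch. This is a generic majority-vote propagation over an undirected graph, not GraphBin's actual procedure: the function name `propagate_labels`, the synchronous update rule and the decision to leave tied contigs unlabelled are assumptions for illustration.

```python
from collections import Counter


def propagate_labels(edges, initial_bins, max_rounds=100):
    """Spread bin labels to unlabelled contigs along assembly-graph edges.

    edges        : list of (contig, contig) links
    initial_bins : dict contig -> bin label for contigs already binned
    max_rounds   : safety cap on the number of propagation rounds
    """
    # Build an undirected adjacency map from the edge list.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    labels = dict(initial_bins)
    for _ in range(max_rounds):
        updates = {}
        for contig, adjacent in neighbours.items():
            if contig in labels:
                continue  # existing labels are kept in this sketch
            votes = Counter(labels[n] for n in adjacent if n in labels)
            if not votes:
                continue
            ranked = votes.most_common(2)
            # Assign only when one bin has a strict majority of votes.
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                updates[contig] = ranked[0][0]
        if not updates:
            break  # converged: nothing new was labelled this round
        labels.update(updates)
    return labels


edges = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c5", "c6")]
initial_bins = {"c1": "A", "c5": "B"}
print(propagate_labels(edges, initial_bins))
```

Short contigs that a binning tool discarded would enter this sketch as unlabelled nodes and pick up a label from their labelled neighbours. Correcting mis-binned contigs would need an extra step that removes suspicious labels first, which is not shown here.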

    The Whole is greater than the sum of the parts: piecing together microbial methylated amine metabolism

    2020 Fall. Includes bibliographical references. To view the abstract, please see the full text of the document.