
    Extreme Scale De Novo Metagenome Assembly

    Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiome's genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and therefore can run seamlessly on shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, which consists of 7.5 billion reads (2.6 TBytes in size). Comment: Accepted to SC1

    Recovering complete and draft population genomes from metagenome datasets.

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying them to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

    Comparative metagenomic analysis reveals mechanisms for stress response in hypoliths from extreme hyperarid deserts

    Understanding microbial adaptation to environmental stressors is crucial for interpreting broader ecological patterns. In the most extreme hot and cold deserts, cryptic niche communities are thought to play key roles in ecosystem processes and represent excellent model systems for investigating microbial responses to environmental stressors. However, relatively little is known about the genetic diversity underlying such functional processes in climatically extreme desert systems. This study presents the first comparative metagenome analysis of cyanobacteria-dominated hypolithic communities in hot (Namib Desert, Namibia) and cold (Miers Valley, Antarctica) hyperarid deserts. The most abundant phyla in both hypolith metagenomes were Actinobacteria, Proteobacteria, Cyanobacteria and Bacteroidetes, with Cyanobacteria dominating in Antarctic hypoliths. However, no significant differences between the two metagenomes were identified. The Antarctic hypolithic metagenome displayed a high number of sequences assigned to sigma factors, replication, recombination and repair, translation, ribosomal structure, and biogenesis. In contrast, the Namib Desert metagenome showed a high abundance of sequences assigned to carbohydrate transport and metabolism. Metagenome data analysis also revealed significant divergence in the genetic determinants of amino acid and nucleotide metabolism between these two metagenomes and those of soil from other polar deserts, hot deserts, and non-desert soils. Our results suggest extensive niche differentiation in hypolithic microbial communities from these two extreme environments and a high genetic capacity for survival under environmental extremes.
    Affiliations: Le, Phuong Thi. University of Pretoria; South Africa. Vlaams Instituut voor Biotechnologie; Belgium. University of Ghent; Belgium. Makhalanyane, Thulani P. University of Pretoria; South Africa. Guerrero, Leandro Demián. University of Pretoria; South Africa. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Investigaciones en Ingeniería Genética y Biología Molecular "Dr. Héctor N. Torres"; Argentina. Vikram, Surendra. University of Pretoria; South Africa. Van De Peer, Yves. University of Pretoria; South Africa. Vlaams Instituut voor Biotechnologie; Belgium. University of Ghent; Belgium. Cowan, Don A. University of Pretoria; South Africa

    De Novo sequences of Haloquadratum walsbyi from Lake Tyrrell, Australia, reveal a variable genomic landscape

    Hypersaline systems near salt saturation levels represent an extreme environment, in which organisms grow and survive near the limits of life. One of the abundant members of the microbial communities in hypersaline systems is the square archaeon, Haloquadratum walsbyi. Utilizing a short-read metagenome from Lake Tyrrell, a hypersaline ecosystem in Victoria, Australia, we performed a comparative genomic analysis of H. walsbyi to better understand the extent of variation between strains/subspecies. Results revealed that previously isolated strains/subspecies do not fully describe the complete repertoire of the genomic landscape present in H. walsbyi. Rearrangements, insertions, and deletions were observed for the Lake Tyrrell derived Haloquadratum genomes and were supported by environmental de novo sequences, including shifts in the dominant genomic landscape of the two most abundant strains. Analysis of halomucins indicated that homologs of this large protein are not a feature common to all species of Haloquadratum. Further, we analyzed ATP-binding cassette transporters (ABC-type transporters) for evidence of niche partitioning between different strains/subspecies. We were able to identify unique and variable transporter subunits from all five genomes analyzed and the de novo environmental sequences, suggesting that differences in nutrient and carbon source acquisition may play a role in maintaining distinct strains/subspecies. Funding for this work was provided by the National Science Foundation (NSF) MCB Award no. 0626526 to J. Banfield, E. Allen, and K. Heidelberg

    Metatranscriptome of human faecal microbial communities in a cohort of adult men

    The gut microbiome is intimately related to human health, but it is not yet known which functional activities are driven by specific microorganisms' ecological configurations or transcription. We report a large-scale investigation of 372 human faecal metatranscriptomes and 929 metagenomes from a subset of 308 men in the Health Professionals Follow-Up Study. We identified a metatranscriptomic 'core' universally transcribed over time and across participants, often by different microorganisms. In contrast to the housekeeping functions enriched in this core, a 'variable' metatranscriptome included specialized pathways that were differentially expressed both across participants and among microorganisms. Finally, longitudinal metagenomic profiles allowed ecological interaction network reconstruction, which remained stable over the six-month timespan, as did strain tracking within and between participants. These results provide an initial characterization of human faecal microbial ecology into core, subject-specific, microorganism-specific and temporally variable transcription, and they differentiate metagenomically versus metatranscriptomically informative aspects of the human faecal microbiome

    Essential guidelines for computational method benchmarking

    In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology. Comment: Minor update