Search CORE

717 research outputs found

Recommended from our members

Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons

Author: Bomhoff Matthew
Choi Illyoung
Hartman John H
Hurwitz Bonnie L
Ponsero Alise J
Youens-Clark Ken
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2019
Field of study

Background Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. Results We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. Conclusions A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes.National Science Foundation [1640775]Open access journalThis item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at [email protected]

The University of Arizona

Spaced seeds improve k-mer-based metagenomic classification

Author: Brinda Karel
Kucherov Gregory
Sykulski Maciej
Publication venue: 'Oxford University Press (OUP)'
Publication date: 09/07/2015
Field of study

Metagenomics is a powerful approach to study genetic content of environmental samples that has been strongly promoted by NGS technologies. To cope with massive data involved in modern metagenomic projects, recent tools [4, 39] rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. Within this general framework, we show in this work that spaced seeds provide a significant improvement of classification accuracy as opposed to traditional contiguous k-mers. We support this thesis through a series a different computational experiments, including simulations of large-scale metagenomic projects. Scripts and programs used in this study, as well as supplementary material, are available from http://github.com/gregorykucherov/spaced-seeds-for-metagenomics.Comment: 23 page

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Data integration for marine ecological genomics

Author: Kottmann R.
Publication venue: Jacobs University
Publication date: 28/05/2009
Field of study

MPG.PuRe

Statistical models for large-scale comparative metagenome analysis

Author: Aßhauer Kathrin Petra
Publication venue
Publication date: 19/02/2015
Field of study

Georg-August-University Göttingen

Picoeukaryotic sequences in the Sargasso Sea metagenome

Author: Derelle Evelyne
Desdevises Yves
Moreau Herve
Piganeau Gwenael
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Many sequences from picoeukaryotes were found in DNA sequence data assembled from Sargasso seawater

Crossref

Springer - Publisher Connector

PubMed Central

Multiple Comparative Metagenomics using Multiset k-mer Counting

Author: Benoit Gaëtan
Drezen Erwan
Lavenier Dominique
Lemaitre Claire
Mariadassou Mahendra
Peterlongo Pierre
Schbath Sophie
Publication venue
Publication date: 28/04/2016
Field of study

Background. Large scale metagenomic projects aim to extract biodiversity knowledge between different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomical or functional assignation rely on a small subset of the sequences that can be associated to known organisms. On the other hand, de novo methods, that compare the whole sets of sequences, either do not scale up on ambitious metagenomic projects or do not provide precise and exhaustive results. Methods. These limitations motivated the development of a new de novo metagenomic comparative method, called Simka. This method computes a large collection of standard ecological distances by replacing species counts by k-mer counts. Simka scales-up today's metagenomic projects thanks to a new parallel k-mer counting strategy on multiple datasets. Results. Experiments on public Human Microbiome Project datasets demonstrate that Simka captures the essential underlying biological structure. Simka was able to compute in a few hours both qualitative and quantitative ecological distances on hundreds of metagenomic samples (690 samples, 32 billions of reads). We also demonstrate that analyzing metagenomes at the k-mer level is highly correlated with extremely precise de novo comparison techniques which rely on all-versus-all sequences alignment strategy or which are based on taxonomic profiling

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Directory of Open Access Journals