623 research outputs found

The impact of sequence database choice on metaproteomic results in gut microbiota studies

Author: Addis Maria Filippa
Deligios Massimo
Fraumene Cristina
Manghina Valeria
Martens Lennart
Muth Thilo
Pagnozzi Daniela
Palomba Antonio
Rapp Erdmann
Tanca Alessandro
Uzzau Sergio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Background: Elucidating the role of gut microbiota in physiological and pathological processes has recently emerged as a key research aim in life sciences. In this respect, metaproteomics, the study of the whole protein complement of a microbial community, can provide a unique contribution by revealing which functions are actually being expressed by specific microbial taxa. However, its wide application to gut microbiota research has been hindered by challenges in data analysis, especially related to the choice of the proper sequence databases for protein identification. Results: Here, we present a systematic investigation of variables concerning database construction and annotation and evaluate their impact on human and mouse gut metaproteomic results. We found that both publicly available and experimental metagenomic databases lead to the identification of unique peptide assortments, suggesting parallel database searches as a means to gain more complete information. In particular, the contribution of experimental metagenomic databases proved to be mandatory when dealing with mouse samples. Moreover, the use of a "merged" database, containing all metagenomic sequences from the population under study, was found to be generally preferable to the use of sample-matched databases. We also observed that taxonomic and functional results are strongly database-dependent, in particular when analyzing the mouse gut microbiota. As a striking example, the Firmicutes/Bacteroidetes ratio varied up to tenfold depending on the database used. Finally, assembling reads into longer contigs provided significant advantages in terms of functional annotation yields. Conclusions: This study helps to identify host- and database-specific biases which need to be taken into account in a metaproteomic experiment, providing meaningful insights on how to design gut microbiota studies and to perform metaproteomic data analysis. In particular, the use of multiple databases and annotation tools has to be encouraged, even though this requires appropriate bioinformatic resources.

AIR Universita degli studi di Milano

Ghent University Academic Bibliography

PubMed Central

MPG.PuRe

Statistical models for large-scale comparative metagenome analysis

Author: Aßhauer Kathrin Petra
Publication date: 19/02/2015

Georg-August-University Göttingen

Novel NGS Pipeline for Virus Discovery from a Wide Spectrum of Hosts and Sample Types

Author: Holm Liisa
Jääskeläinen Anne J.
Kant Ravi
Pljusnin Ilja
Sironen Tarja
Smura Teemu
Vapalahti Olli
Publication date: 01/01/2020

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Libra: scalable k-mer-based tool for massive all-vs-all metagenome comparisons

Author: Bomhoff Matthew
Choi Illyoung
Hartman John H
Hurwitz Bonnie L
Ponsero Alise J
Youens-Clark Ken
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/02/2019

Background: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. Results: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. Conclusions: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes. Funding: National Science Foundation [1640775]. Open access journal.
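The abstract above describes comparing metagenomes by the cosine similarity of their k-mer abundance profiles. The following is a minimal single-machine sketch of that idea, not Libra's implementation: it uses no Hadoop and no abundance weighting, and the function names `kmer_counts` and `cosine_similarity` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two k-mer count profiles.

    Scaling all counts in one profile by a constant leaves the result
    unchanged, which is why this measure is insensitive to sequencing depth.
    """
    # Only k-mers present in both profiles contribute to the dot product.
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    shallow = kmer_counts(["ACGTACGT"], k=3)
    deep = kmer_counts(["ACGTACGT", "ACGTACGT"], k=3)  # same sample, twice the depth
    print(cosine_similarity(shallow, deep))
```

In this toy example the second sample holds the same reads at twice the depth, so the two profiles point in the same direction and the similarity stays at 1 up to floating-point rounding.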

The University of Arizona

Data integration for marine ecological genomics

Author: Kottmann R.
Publication venue: Jacobs University
Publication date: 28/05/2009

MPG.PuRe

Streaming histogram sketching for rapid microbiome analytics

Author: A Sczyrba
AG Shaw
AL Greninger
AP Carrieri
B Grüning
BD Ondov
C Alcon-Giner
C Kakkanatt
D Yang
DB Rusch
F Pedregosa
G Benoit
G Cormode
H Mulcahy-O’Grady
Human Microbiome Project Consortium
I Koychev
JD Forbes
K Sim
LP Coelho
LR Thompson
M Bawa
MW Libbrecht
Q Zhang
R Bovee
S Ioffe
S Seth
SY Anvar
T Brown
T Haveliwala
VB Dubinkina
W Wu
XC Morgan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2019

Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space.
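The abstract above relies on fixed-size sketches from which pairwise Jaccard similarity can be estimated. As a rough illustration of that general principle, the sketch below swaps in plain MinHash over k-mer sets; it is not the histosketch algorithm and not HULK's code, and all function names here are hypothetical.

```python
import hashlib


def kmers(seq, k):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(item, seed):
    """Deterministic 64-bit hash of a string, varied by an integer seed."""
    digest = hashlib.blake2b(
        item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(items, size=64):
    """Keep the minimum hash value under each of `size` seeded hash functions."""
    if not items:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(size)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch slots that agree; an estimate of Jaccard similarity."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same size")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


if __name__ == "__main__":
    a = minhash_sketch(kmers("ACGTACGTGGCA", 4))
    b = minhash_sketch(kmers("ACGTACGTGGTT", 4))
    print(estimate_jaccard(a, b))
```

The sketch size stays constant regardless of input size, which is the property that makes catalogue searching and indexing cheap; larger sketches reduce the variance of the estimate.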

University of Liverpool Repository

Crossref

University of Birmingham Research Portal

Directory of Open Access Journals

Spiral - Imperial College Digital Repository

University of East Anglia digital repository

Optimization of Spaced K-mer Frequency Feature Extraction using Genetic Algorithms for Metagenome Fragment Classification

Author: Buono Agus
Kusuma Wisnu Ananta
Pekuwali Arini
Publication venue: LPPM ITBis Lembah Dempo
Publication date: 01/09/2018

K-mer frequencies are commonly used in extracting features from metagenome fragments. In spite of this, researchers have found that their use is still inefficient. In this research, a genetic algorithm was employed to find optimally spaced k-mers. These were obtained by generating the possible combinations of match positions and don't care positions (written as *). This approach was adopted from the concept of spaced seeds in PatternHunter. The use of spaced k-mers could reduce the size of the k-mer frequency feature's dimension. To measure the accuracy of the proposed method we used the naïve Bayesian classifier (NBC). The result showed that the chromosome 111111110001, representing spaced k-mer model [111 1111 10001], was the best chromosome, with a higher fitness (85.42) than that of the k-mer frequency feature. Moreover, the proposed approach also reduced the feature extraction time.
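To make the spaced k-mer idea in the abstract above concrete, here is a minimal sketch of feature extraction with a binary mask, assuming that 1 marks a match position and 0 a don't care position. The function name `spaced_kmer_counts` is hypothetical, and neither the genetic algorithm search nor the classifier is reproduced.

```python
from collections import Counter


def spaced_kmer_counts(seq, mask):
    """Count spaced k-mers: slide a window of len(mask) over seq and keep
    only the characters at positions where the mask holds '1'."""
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    span = len(mask)
    counts = Counter()
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        counts["".join(window[i] for i in keep)] += 1
    return counts


if __name__ == "__main__":
    mask = "111111110001"  # the best chromosome reported in the abstract
    weight = mask.count("1")
    # The feature space shrinks from 4**len(mask) to 4**weight possible keys.
    print(weight, 4 ** len(mask), 4 ** weight)
    print(spaced_kmer_counts("ACGTACGTACGTAC", mask))
```

With this mask, each 12-base window contributes a 9-character key, so the feature vector has 4^9 rather than 4^12 possible dimensions.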

Journal of ICT Research and Applications

Directory of Open Access Journals

ITB Journal

The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences

Author: A Brady
A Valouev
AC McHardy
Alice Carolyn McHardy
C Burge
DH Huson
F Meyer
F Sanger
F Warnecke
GL Rosen
GW Tyson
H Teeling
I Tsochantaridis
J Handelsman
K Mavromatis
Kaustubh Raosaheb Patil
KR Patil
KU Foerstner
Linus Roune
M Hess
M Margulies
ML Metzker
N Adams
P Hugenholtz
PB Pope
PJ Turnbaugh
R Sandberg
R Tewhey
S Karlin
Sarah K. Highlander
SF Altschul
W Gerlach
Publication venue: Public Library of Science
Publication date: 01/01/2012

Metagenome sequencing is becoming common and there is an increasing need for easily accessible tools for data analysis. An essential step is the taxonomic classification of sequence fragments. We describe a web server for the taxonomic assignment of metagenome sequences with PhyloPythiaS. PhyloPythiaS is a fast and accurate sequence composition-based classifier that utilizes the hierarchical relationships between clades. Taxonomic assignments with the web server can be made with a generic model, or with sample-specific models that users can specify and create. Several interactive visualization modes and multiple download formats allow quick and convenient analysis and downstream processing of taxonomic assignments. Here, we demonstrate usage of our web server by taxonomic assignment of metagenome samples from an acidophilic biofilm community of an acid mine and of a microbial community from cow rumen.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MPG.PuRe