Search CORE

36 research outputs found

Analysis and comparison of very large metagenomes with fast clustering and functional annotation

Author: AC McHardy
AR Quinlan
B Rodriguez-Brito
D Sheskin
DB Rusch
DC Richter
DH Huson
E Portugaly
EA Dinsdale
EF DeLong
FE Angly
GW Tyson
H Noguchi
H Noguchi
H Teeling
H Teeling
J Shendure
JC Venter
K Mavromatis
KJ Hoff
L Krause
PD Schloss
R Seshadri
RK Aziz
S Yooseph
S Yooseph
SF Altschul
SG Tringe
SR Eddy
SR Gill
W Li
W Li
W Li
W Li
Weizhong Li
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

Author: A Bateman
A Nekrutenko
AC McHardy
AG Murzin
CA Orengo
D Fischer
DB Rusch
DH Haft
DL Wheeler
E Birney
ED Harrington
EF DeLong
EF DeLong
F Corpet
F Sanger
FMDL Vega
Granger Sutton
GW Tyson
H Noguchi
H Ochman
J Besemer
J Quackenbush
JA Eisen
JC Venter
K Chen
K Mavromatis
L Krause
L Rychlewski
M Margulies
M Sait
N Siew
R Seshadri
R Unger
RC Edgar
S Yooseph
SF Altschul
SF Altschul
SG Tringe
Shibu Yooseph
SJ Giovannoni
SR Gill
W Li
W Li
W Li
Weizhong Li
Z Yang
Z Yang
Publication venue: BioMed Central
Publication date: 01/04/2008

Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases, etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein-coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein-coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
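
The incremental idea in this abstract (new sequences are compared against existing clusters instead of repeating an all-against-all comparison) can be illustrated with a minimal Python sketch. This is not the authors' method: the k-mer Jaccard similarity, the 0.5 threshold, and the names `kmers`, `similarity`, and `add_incrementally` are all assumptions chosen for illustration, and a real pipeline would use an alignment-based or profile-based score.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (an assumed stand-in for a real alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def add_incrementally(clusters, new_seqs, threshold=0.5):
    """Add new sequences to existing clusters without re-comparing old members.

    Each cluster is a dict with a representative ("rep") and its "members".
    A new sequence joins the first cluster whose representative is similar
    enough; otherwise it seeds a new cluster (a candidate novel family).
    """
    for seq in new_seqs:
        for cluster in clusters:
            if similarity(seq, cluster["rep"]) >= threshold:
                cluster["members"].append(seq)
                break
        else:
            # No existing representative matched: start a new cluster.
            clusters.append({"rep": seq, "members": [seq]})
    return clusters


# Hypothetical toy sequences: the first two are near-identical, the third is unrelated.
clusters = add_incrementally([], ["MKTAYIAKQR", "MKTAYIAKQL"])
clusters = add_incrementally(clusters, ["GGGGGGGGGG"])
```

Because each new sequence is compared only with one representative per cluster, the cost of an update grows with the number of clusters rather than with the number of sequences already clustered.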

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

Author: A Krogh
A Lupas
A Sali
AC McHardy
Adam Godzik
AJ Enright
AJ Enright
B Rodriguez-Brito
BE Suzek
David Jones
DB Rusch
DH Huson
EF DeLong
FE Angly
G Yona
GW Tyson
J Park
JA Cuff
JC Venter
JD Bendtsen
JD Thompson
John C. Wooley
K Mavromatis
L Holm
L Krause
L Rychlewski
ML Tress
O Sasson
P Pipenbacher
PD Schloss
R Apweiler
RL Tatusov
S Mika
S Yooseph
SF Altschul
SG Tringe
SR Eddy
SR Gill
U Hobohm
W Li
W Li
W Li
W Li
Weizhong Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

MetaMine – A tool to detect and analyse gene patterns in their environmental context

Author: A Alexeyenko
A Bateman
A Enright
A Meyerdierks
B Jørgensen
B Snel
C von Mering
ED Harrington
Frank O Glöckner
GW Tyson
I Jonassen
I Jonassen
I Mandoiu
I Rigoutsos
J Boekhorst
JC Venter
M Hu
MA Moran
MA Moran
MA Moran
MPP Béal
N Luc
R Finn
R Overbeek
R Overbeek
Renzo Kottmann
RK Aziz
RL Tatusov
S Altschul
S Giovannoni
S Hallam
S Yooseph
SG Tringe
SJH Kim
T Lombardot
T Lombardot
Thierry Lombardot
Uta Bohnebeck
V Markowitz
V Markowitz
VM Markowitz
X He
Y Ye
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Modern sequencing technologies allow rapid sequencing and bioinformatic analysis of genomes and metagenomes. With every new sequencing project a vast number of new proteins become available, with many genes remaining functionally unclassified based on evidence from sequence similarity alone. Extending similarity searches with gene pattern approaches, in which gene patterns are defined as genes sharing a distinct genomic neighbourhood, has been shown to significantly improve the number of functional assignments. Further functional evidence can be gained by correlating these gene patterns with prevailing environmental parameters. MetaMine was developed to approach the large pool of unclassified proteins by searching for recurrent gene patterns across habitats based on key genes. Results: MetaMine is an interactive data mining tool which enables the detection of gene patterns in an environmental context. The gene pattern search starts with a user-defined, environmentally interesting key gene. With this gene a BLAST search is carried out against the Microbial Ecological Genomics DataBase (MEGDB) containing marine genomic and metagenomic sequences. This is followed by the determination of all neighbouring genes within a given distance and a search for functionally equivalent genes. In the final step a set of common genes present in a defined number of distinct genomes is determined. The gene patterns found are associated with their individual pattern instances describing gene order and directions. They are presented together with information about the sample and the habitat. MetaMine is implemented in Java and provided as a client/server application with a user-friendly graphical user interface. The system was evaluated with environmentally relevant genes related to the methane cycle and carbon monoxide oxidation. Conclusion: MetaMine offers a targeted, semi-automatic search for gene patterns based on expert input. The graphical user interface of MetaMine provides a user-friendly overview of the computed gene patterns for further inspection in an ecological context. Prevailing biological processes associated with a key gene can be used to infer new annotations and shape hypotheses to guide further analyses. The use cases demonstrate that meaningful gene patterns can be quickly detected using MetaMine.
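
The search steps listed in this abstract (find key-gene hits, collect neighbouring genes within a given distance, keep the genes common to a defined number of distinct genomes) can be sketched roughly as follows. This is a simplified illustration, not MetaMine's implementation (which the abstract describes as a Java client/server application): the BLAST search and the functional-equivalence search are replaced here by matching an assumed precomputed `family` label, and the tuple layout, the function name `common_neighbours`, and the toy data are hypothetical.

```python
from collections import defaultdict


def common_neighbours(genes, key_family, max_distance, min_genomes):
    """Return gene families found near the key gene in at least `min_genomes` genomes.

    genes: iterable of (genome, position, family) tuples, where `family` is an
    assumed label grouping functionally equivalent genes.
    """
    by_genome = defaultdict(list)
    for genome, pos, family in genes:
        by_genome[genome].append((pos, family))

    genomes_with_family = defaultdict(set)
    for genome, entries in by_genome.items():
        # Stand-in for the BLAST step: positions of key-gene hits in this genome.
        key_positions = [pos for pos, fam in entries if fam == key_family]
        for pos, fam in entries:
            if fam == key_family:
                continue
            # Keep genes lying within the given distance of any key-gene hit.
            if any(abs(pos - kp) <= max_distance for kp in key_positions):
                genomes_with_family[fam].add(genome)

    # Final step: families present in the required number of distinct genomes.
    return {fam for fam, gs in genomes_with_family.items() if len(gs) >= min_genomes}


# Hypothetical toy data: family "A" sits next to the key gene in genomes g1 and g2.
genes = [
    ("g1", 100, "key"), ("g1", 150, "A"), ("g1", 900, "B"),
    ("g2", 500, "key"), ("g2", 450, "A"), ("g2", 520, "C"),
    ("g3", 10, "A"),
]
pattern = common_neighbours(genes, "key", max_distance=100, min_genomes=2)
```

In this toy input only family "A" qualifies: "B" is too far from the key gene, "C" occurs in a single genome, and genome g3 has no key-gene hit. The sketch ignores gene order and strand direction, which the abstract says are recorded for each pattern instance.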

Crossref

Springer - Publisher Connector

PubMed Central

MPG.PuRe

New Detection Systems of Bacteria Using Highly Selective Media Designed by SMART: Selective Medium-Design Algorithm Restricted by Two Constraints

Culturing is an indispensable technique in microbiological research, and culturing with selective media has played a crucial role in the detection of pathogenic microorganisms and the isolation of commercially useful microorganisms from environmental samples. Although numerous selective media have been developed in empirical studies, unintended microorganisms often grow on such media probably due to the enormous numbers of microorganisms in the environment. Here, we present a novel strategy for designing highly selective media based on two selective agents, a carbon source and antimicrobials. We named our strategy SMART for highly Selective Medium-design Algorithm Restricted by Two constraints. To test whether the SMART method is applicable to a wide range of microorganisms, we developed selective media for Burkholderia glumae, Acidovorax avenae, Pectobacterium carotovorum, Ralstonia solanacearum, and Xanthomonas campestris. The series of media developed by SMART specifically allowed growth of the targeted bacteria. Because these selective media exhibited high specificity for growth of the target bacteria compared to established selective media, we applied three notable detection technologies: paper-based, flow cytometry-based, and color change-based detection systems for target bacteria species. SMART facilitates not only the development of novel techniques for detecting specific bacteria, but also our understanding of the ecology and epidemiology of the targeted bacteria

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

Author: A Chao
A Chao
A Chao
A Chao
A Chao
AC McHardy
AE Magurran
AP Martin
B Rodriguez-Brito
C von Mering
CS Riesenfeld
DA Rasko
DB Rusch
DH Huson
DR Singleton
E Lerat
EF DeLong
EF Delong
GW Tyson
H Garcia Martin
H Teeling
H Teeling
JC Venter
JC Yue
JL Stein
Jo Handelsman
JP Wang
K Mavromatis
KP Burnham
KU Foerstner
L Excoffier
M Breitbart
M Breitbart
M Margulies
M Strous
MJ Anderson
MR Rondon
P Legendre
Patrick D Schloss
PD Schloss
PD Schloss
PD Schloss
PD Schloss
PD Schloss
PL Johnson
S Yooseph
SG Tringe
SJ Hallam
SJ Hallam
SR Gill
T Woyke
TD Read
TM Schmidt
VM Markowitz
W Ludwig
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. Results: Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities, these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. Conclusion: The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Contrasting Microbial Community Assembly Hypotheses: A Reconciling Tale from the Río Tinto

The Río Tinto (RT) is distinguished from other acid mine drainage systems by its natural and ancient origins. Microbial life from all three domains flourishes in this ecosystem, but bacteria dominate metabolic processes that perpetuate environmental extremes. While the patchy geochemistry of the RT likely influences the dynamics of bacterial populations, demonstrating which environmental variables shape microbial diversity and unveiling the mechanisms underlying observed patterns remain major challenges in microbial ecology whose answers rely upon detailed assessments of community structures coupled with fine-scale measurements of physico-chemical parameters. By using high-throughput environmental tag sequencing we achieved saturation of richness estimators for the first time in the RT. We found that environmental factors dictate the distribution of the most abundant taxa in this system, but stochastic niche differentiation processes, such as mutation and dispersal, also contribute to observed diversity patterns. We predict that studies providing clues to the evolutionary and ecological processes underlying microbial distributions will reconcile the ongoing debate between the Baas Becking vs. Hubbell community assembly hypotheses.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

HAL Descartes

Global diversity and biogeography of deep-sea pelagic prokaryotes

Author: A López-López
A Quaiser
A-B Martín-Cuadrado
BC Crump
C Pedrós-Alió
C Winter
CA Hanson
Carlos M Duarte
D Wilkins
DB Rusch
E Ivars-Martinez
E Pruesse
E Stackebrandt
E Stackebrandt
E Teira
EA Eloe
EF DeLong
EF DeLong
EF DeLong
Eugenio Fraile-Nuez
F Smedile
FM Cohan
Francisco M Cornejo-Castillo
GE Fox
GJ Herndl
GJ Herndl
Guillem Salazar
H Agogué
J Arístegui
J-F Ghiglione
JA Fuhrman
JA Gilbert
JA Gilbert
JF Ghiglione
JG Caporaso
Josep M Gasol
JR Cole
K Clarke
KD Pruitt
L Zinger
LJ Hamdan
LZ Allen
MB Karner
MJ Anderson
MJ Anderson
ML Sogin
MV Brown
MV Brown
NL Oden
P López-García
PE Galand
PR Minchin
R Massana
R Massana
R Schauer
S Ganesh
S Sunagawa
S Yooseph
SF Altschul
SG Acinas
Silvia G Acinas
SJ Giovannoni
T Nunoura
Verónica Benítez-Barrios
WJ Jenkins
X Antón Álvarez-Salgado
X Irigoien
Y Moalic
Y Wang
Å Hagström
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2015

The deep sea is the largest biome of the biosphere, and contains more than half of the whole ocean's microbes. Uncovering their general patterns of diversity and community structure at a global scale remains a great challenge, as only fragmentary information on deep-sea microbial diversity exists based on regional-scale studies. Here we report the first globally comprehensive survey of the prokaryotic communities inhabiting the bathypelagic ocean using high-throughput sequencing of the 16S rRNA gene. This work identifies the dominant prokaryotes in the pelagic deep ocean and reveals that 50% of the operational taxonomic units (OTUs) belong to previously unknown prokaryotic taxa, most of which are rare and appear in just a few samples. We show that whereas the local richness of communities is comparable to that observed in previous regional studies, the global pool of prokaryotic taxa detected is modest (~3600 OTUs), as a high proportion of OTUs are shared among samples. The water masses appear to act as clear drivers of the geographical distribution of both particle-attached and free-living prokaryotes. In addition, we show that the deep-oceanic basins into which the bathypelagic realm is divided contain different particle-attached (but not free-living) microbial communities. The combination of the aging of the water masses and a lack of complete dispersal are identified as the main drivers for this biogeographical pattern. Altogether, we identify the potential of the deep ocean as a reservoir of still unknown biological diversity with a higher degree of spatial complexity than hitherto considered.

Crossref

PubMed Central

Digital.CSIC

Repositorio Institucional Digital del IEO

Achievements and new knowledge unraveled by metagenomic approaches

Author: A Henne
A Knietsch
A Knietsch
A Knietsch
A Majernik
AC McHardy
C Heath
C Jogler
C Manichanh
C Meilleur
C Simon
C Von Mering
C Wang
C Wu
Carola Simon
CB Abulencia
CC Lee
CJ Duan
CR Woese
CS Riesenfeld
CS Riesenfeld
DA Benson
DB Rusch
DC Richter
DG Lee
DH Haft
DH Huson
DH Huson
DL Cox-Foster
DT Pride
EA Dinsdale
EA Dinsdale
EF DeLong
EJ Biers
EM Gabor
F Hårdeman
F Meyer
FE Angly
G Li
G. P. Pathak
GR LeCleir
GW Tyson
H Teeling
H Teeling
H Yokouchi
HC Rees
HM Monzoorul
HN Poinar
I-C. Chen
J Bailly
J Frias-Lopez
J Handelsman
J Handelsman
J Pottkämper
J Wuyts
JA Gilbert
JA Gilbert
JC Venter
JF Biddle
JJ Banik
JK Rhee
JR Cole
JS Song
L Krause
L Wegley
LJ Jensen
LK McNeil
LL Williamson
M Ferrer
M Ferrer
M Kanehisa
M Strous
MH Lee
NN Diaz
O Béjà
P Lopez-Garcia
P Lorenz
P Lorenz
P Wei
PJ Turnbaugh
PW Van der Wielen
QC Meyer
R Daniel
R Overbeek
RA Edwards
RD Finn
RD Sleator
Rebecca Vega Thurber
RL Tatusov
Rolf Daniel
S Grant
S Karlin
S Morimoto
S Sjöling
S Voget
S Voget
S Yooseph
SF Altschul
SF Brady
SG Tringe
SJ Hallam
T Abe
T Abe
T Uchiyama
T Urich
T Waschkowitz
TC Galvao
TZ DeSantis
W Ludwig
Y Feng
Publication venue: Springer-Verlag
Publication date: 01/01/2009

Metagenomics has paved the way for cultivation-independent assessment and exploitation of microbial communities present in complex ecosystems. In recent years, significant progress has been made in this research area. A major breakthrough was the improvement and development of high-throughput next-generation sequencing technologies. The application of these technologies resulted in the generation of large datasets derived from various environments such as soil and ocean water. The analyses of these datasets opened a window into the enormous phylogenetic and metabolic diversity of microbial communities living in a variety of ecosystems. In this way, structure, functions, and interactions of microbial communities were elucidated. Metagenomics has proven to be a powerful tool for the recovery of novel biomolecules. In most cases, functional metagenomics comprising construction and screening of complex metagenomic DNA libraries has been applied to isolate new enzymes and drugs of industrial importance. For this purpose, several novel and improved screening strategies that allow efficient screening of large collections of clones harboring metagenomes have been introduced

Crossref

Springer - Publisher Connector

PubMed Central

Genome Atlases, Potential Applications in Study of Metagenomes

Author: A Bolshoy
AG Pedersen
EF DeLong
ES Shpigelman
GW Tyson
HJ Tripp
J García-Martínez
LJ Jensen
MG Kalyuzhnaya
MV Brown
P Baldi
PF Hallin
RL Ornstein
S Sun
S Yooseph
SC Satchwell
SF Altschul
SG Tringe
SJ Giovannoni
SJ Giovannoni
SL Strom
VM Markowitz
Y-Y Huo
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref