1,442 research outputs found

PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data.

Author: Eisen Jonathan A
Green Jessica L
Kembel Steven W
Ladau Joshua
O'Dwyer James P
Pollard Katherine S
Riesenfeld Samantha J
Sharpton Thomas J
Publication venue: eScholarship, University of California
Publication date: 01/01/2011

Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California
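
As a rough illustration of what "clustering sequences into OTUs" means in the PhylOTU abstract above, the sketch below groups reads whose pairwise distance falls at or below a cutoff. It is not the PhylOTU workflow: the single-linkage rule, the 0.03 cutoff, the function name `cluster_otus` and the toy distances are all assumptions made for the example, and the abstract does not say how PhylOTU derives or thresholds its distances.

```python
def cluster_otus(names, distances, threshold):
    """Single-linkage clustering: reads joined by any chain of close pairs share an OTU."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), dist in distances.items():
        if dist <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy pairwise distances (hypothetical values, not real data).
reads = ["r1", "r2", "r3", "r4"]
toy_distances = {
    ("r1", "r2"): 0.01,
    ("r2", "r3"): 0.02,
    ("r1", "r3"): 0.04,
    ("r1", "r4"): 0.30,
    ("r2", "r4"): 0.28,
    ("r3", "r4"): 0.31,
}
otus = cluster_otus(reads, toy_distances, threshold=0.03)
```

With these toy values, r1, r2 and r3 end up in one OTU because single linkage chains r1 to r3 through r2, while r4 stays on its own. Other linkage rules would draw the boundaries differently.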

Motifs from the deep

Author: Codrea Vlad
Ellington Andrew D
Hwang Tony W
Publication venue: BioMed Central
Publication date: 01/01/2009

Because of the increasing recognition of the importance of non-coding RNAs in gene regulation, there is considerable interest in identifying RNA motifs in genomic data. In a recent report in BMC Genomics, Breaker and colleagues describe a new algorithm for identifying functional noncoding RNAs in metagenomic sequences of marine organisms, a strategy that may be particularly effective for discovering new and unique riboswitches.

Crossref

PubMed Central

Identification of candidate structured RNAs in the marine organism 'Candidatus Pelagibacter ubique'

Author: Ames Tyler D
Breaker Ronald R
Giovannoni Stephen J
Meyer Michelle M
Schwalbach Michael S
Smith Daniel P
Weinberg Zasha
Publication venue: BioMed Central
Publication date: 01/06/2009

Background: Metagenomic sequence data are proving to be a vast resource for the discovery of biological components. Yet analysis of these data to identify functional RNAs lags behind efforts to characterize protein diversity. The genome of 'Candidatus Pelagibacter ubique' HTCC 1062 is the closest match for approximately 20% of marine metagenomic sequence reads. It is also small, contains little non-coding DNA, and has strikingly low GC content. Results: To aid the discovery of RNA motifs within the marine metagenome, we exploited the genomic properties of 'Cand. P. ubique' by targeting our search to long intergenic regions (IGRs) with relatively high GC content. Analysis of known RNAs (rRNA, tRNA, riboswitches, etc.) shows that structured RNAs are significantly enriched in such IGRs. To identify additional candidate structured RNAs, we examined other IGRs with similar characteristics from 'Cand. P. ubique' using comparative genomics approaches in conjunction with marine metagenomic data. Employing this strategy, we discovered four candidate structured RNAs including a new riboswitch class as well as three additional likely cis-regulatory elements that precede genes encoding ribosomal proteins S2 and S12, and the cytoplasmic protein component of the signal recognition particle. We also describe four additional potential RNA motifs with few or no examples occurring outside the metagenomic data. Conclusion: This work begins the process of identifying functional RNA motifs present in the metagenomic data and illustrates how existing completed genomes may be used to aid in this task.

Directory of Open Access Journals

PubMed Central
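
The 'Cand. P. ubique' abstract above describes targeting long intergenic regions (IGRs) with relatively high GC content. A minimal sketch of such a filter follows. The function names, the length and GC cutoffs and the toy sequences are assumptions for illustration only; the abstract does not state the thresholds the authors used.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_igrs(igrs: dict[str, str], min_length: int, min_gc: float) -> list[str]:
    """Return the names of IGRs that are both long enough and GC-rich enough."""
    return [
        name
        for name, seq in igrs.items()
        if len(seq) >= min_length and gc_content(seq) >= min_gc
    ]


# Toy input: made-up sequences, not real genome data.
toy_igrs = {
    "igr_a": "ATATATATATAT",      # long enough but AT-rich
    "igr_b": "GCGCATGCGGCCATGC",  # long and GC-rich
    "igr_c": "GCGC",              # GC-rich but too short
}
selected = candidate_igrs(toy_igrs, min_length=10, min_gc=0.5)
```

Only `igr_b` passes both cutoffs here. In a genome with low overall GC content, a cutoff defined relative to the genome-wide average might be a more natural choice than a fixed value.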

A Primer on Metagenomics

Author: Bourne Philip E.
Friedberg Iddo
Godzik Adam
Wooley John C.
Publication venue: Public Library of Science
Publication date: 01/01/2010

Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

eScholarship - University of California

Expansion of the Protein Repertoire in Newly Explored Environments: Human Gut Microbiome Specific Protein Families

Author: Adam Godzik
AJ Enright
AM Cerdeno-Tarraga
BH Dessailly
C Zmasek
David T. Jones
DC Savage
EC Martens
H Noguchi
J Xu
JA Shipman
John C. Wooley
K Kurokawa
Kyle Ellrott
Lukasz Jaroszewski
N Siew
NC Verberkmoes
PB Eckburg
PJ Turnbaugh
RD Finn
RL Tatusov
S Hunter
S Yooseph
SF Altschul
SR Eddy
SR Gill
TZ DeSantis
W Li
Weizhong Li
Publication venue: Public Library of Science
Publication date: 01/06/2010

The microbes that inhabit particular environments must be able to perform molecular functions that provide them with a competitive advantage to thrive in those environments. As most molecular functions are performed by proteins and are conserved between related proteins, we can expect that organisms successful in a given environmental niche would contain protein families that are specific for functions that are important in that environment. For instance, the human gut is rich in polysaccharides from the diet or secreted by the host, and is dominated by Bacteroides, whose genomes contain a highly expanded repertoire of protein families involved in carbohydrate metabolism. To identify other protein families that are specific to this environment, we investigated the distribution of protein families in the currently available human gut genomic and metagenomic data. Using an automated procedure, we identified a group of protein families strongly overrepresented in the human gut. These not only include many families described previously but also, interestingly, a large group of previously unrecognized protein families, which suggests that we still have much to discover about this environment. The identification and analysis of these families could provide us with new information about an environment critical to our health and well-being.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees

Author: Eisen Jonathan A.
Frazier Marvin
Halpern Aaron
Rusch Douglas B.
Venter J. Craig
Wu Dongying
Wu Martin
Yooseph Shibu
Publication venue: Public Library of Science
Publication date: 01/03/2011

BACKGROUND: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. METHODOLOGY/PRINCIPAL FINDINGS: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. CONCLUSIONS/SIGNIFICANCE: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Systematic identification of gene families for use as markers for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups

Author: Eisen Jonathan A.
Jospin Guillaume
Wu Dongying
Publication venue: Public Library of Science (PLoS)
Publication date: 02/07/2013

With the astonishing rate at which genomic and metagenomic sequence data sets are accumulating, there are many reasons to constrain the data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited for the tasks at hand. Such gene families have generally been referred to as marker genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities. We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers means that we needed to develop and apply a set of somewhat novel criteria for identification of the best candidates for such markers. The criteria we focused on included universality across the taxa of interest, ability to be used to produce robust phylogenetic trees that reflect as much as possible the evolution of the species from which the genes come, and low variation in copy number across taxa. We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels including 40 for all bacteria and archaea, 114 for all bacteria, and many more for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than possible previously. Comment: 24 pages, 3 figures

arXiv.org e-Print Archive

FigShare
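
Two of the PhyEco criteria named in the abstract above, universality across the taxa of interest and low variation in copy number, can be checked directly from a table of per-genome copy counts. The sketch below covers only those two; the third criterion (robust phylogenetic trees) is not modeled. The function name `select_markers`, both thresholds and the toy counts are assumptions for illustration and are not taken from the paper's protocol.

```python
def select_markers(copy_counts, genomes, min_universality=0.9, max_mean_copies=1.1):
    """Keep families present in most genomes and close to single-copy where present."""
    markers = []
    for family, counts in copy_counts.items():
        present = [g for g in genomes if counts.get(g, 0) > 0]
        if not present:
            continue
        # Universality: share of genomes that carry at least one copy.
        universality = len(present) / len(genomes)
        # Copy number: average number of copies among genomes that have the family.
        mean_copies = sum(counts[g] for g in present) / len(present)
        if universality >= min_universality and mean_copies <= max_mean_copies:
            markers.append(family)
    return markers


# Toy copy-number table (hypothetical families and genomes).
genomes = ["g1", "g2", "g3", "g4"]
toy_counts = {
    "fam_universal": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},
    "fam_patchy": {"g1": 1, "g2": 1},
    "fam_multicopy": {"g1": 1, "g2": 3, "g3": 2, "g4": 4},
}
markers = select_markers(toy_counts, genomes)
```

Here only `fam_universal` qualifies: `fam_patchy` is found in half of the genomes, and `fam_multicopy` averages 2.5 copies per genome.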

Extensive horizontal gene transfer in cheese-associated bacteria.

Author: Bonham Kevin S
Dutton Rachel J
Wolfe Benjamin E
Publication venue: eScholarship, University of California
Publication date: 01/06/2017

Acquisition of genes through horizontal gene transfer (HGT) allows microbes to rapidly gain new capabilities and adapt to new or changing environments. Identifying widespread HGT regions within multispecies microbiomes can pinpoint the molecular mechanisms that play key roles in microbiome assembly. We sought to identify horizontally transferred genes within a model microbiome, the cheese rind. Comparing 31 newly sequenced and 134 previously sequenced bacterial isolates from cheese rinds, we identified over 200 putative horizontally transferred genomic regions containing 4733 protein coding genes. The largest of these regions are enriched for genes involved in siderophore acquisition, and are widely distributed in cheese rinds in both Europe and the US. These results suggest that HGT is prevalent in cheese rind microbiomes, and that identification of genes that are frequently transferred in a particular environment may provide insight into the selective forces shaping microbial communities.

eScholarship - University of California

PhyloSift: Phylogenetic analysis of genomes and metagenomes

Author: Bik HM
Darling AE
Eisen JA
Jospin G
Lowe E
Matsen FA
Publication venue: PeerJ
Publication date: 01/01/2014

Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454). © 2014 Darling et al.

OPUS - University of Technology Sydney

Directory of Open Access Journals

PubMed Central

eScholarship - University of California