Novel Bacterial Taxa in the Human Microbiome
The human gut harbors thousands of bacterial taxa. A profusion of metagenomic sequence data has been generated from human stool samples in the last few years, raising the question of whether more taxa remain to be identified. We assessed metagenomic data generated by the Human Microbiome Project Consortium to determine if novel taxa remain to be discovered in stool samples from healthy individuals. To do this, we established a rigorous bioinformatics pipeline that uses sequence data from multiple platforms (Illumina GAIIX and Roche 454 FLX Titanium) and approaches (whole-genome shotgun and 16S rDNA amplicons) to validate novel taxa. We applied this approach to stool samples from 11 healthy subjects collected as part of the Human Microbiome Project. We discovered several low-abundance, novel bacterial taxa, which span three major phyla in the bacterial tree of life. We determined that these taxa are present in a larger set of Human Microbiome Project subjects and are found in two sampling sites (Houston and St. Louis). We show that the number of false-positive novel sequences (primarily chimeric sequences) would have been two orders of magnitude higher than the true number of novel taxa without validation using multiple datasets, highlighting the importance of establishing rigorous standards for the identification of novel taxa in metagenomic data. The majority of novel sequences are related to the recently discovered genus Barnesiella, further encouraging efforts to characterize the members of this genus and to study their roles in the microbial communities of the gut. A better understanding of the effects of less-abundant bacteria is important as we seek to understand the complex gut microbiome in healthy individuals and link changes in the microbiome to disease
Integrating Diverse Datasets Improves Developmental Enhancer Prediction
Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. 
Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology. © 2014 Erwin et al
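The integration step described above can be illustrated with a minimal sketch: each data type gets its own kernel (similarity) matrix, and the matrices are merged as a weighted sum. This is not EnhancerFinder's implementation. In multiple kernel learning the weights are learned together with the classifier, whereas here they are fixed by hand; the function names, feature values, and weights are all hypothetical.

```python
def linear_kernel(features):
    """Gram matrix of dot products between the rows of `features`."""
    return [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Toy feature blocks for three candidate regions (all values are made up).
motif_counts = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # hypothetical motif features
conservation = [[0.5], [0.25], [1.0]]                # hypothetical conservation scores

# One kernel per data type, then a fixed-weight combination.
# A real multiple kernel learning method would learn these weights.
kernels = [linear_kernel(motif_counts), linear_kernel(conservation)]
combined = combine_kernels(kernels, weights=[0.5, 0.5])

for row in combined:
    print(row)
```

The combined matrix could then be passed to any kernel classifier that accepts a precomputed kernel.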
A framework for human microbiome research
A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies
Structure, function and diversity of the healthy human microbiome
Author Posting. © The Authors, 2012. This article is posted here by permission of Nature Publishing Group. The definitive version was published in Nature 486 (2012): 207-214, doi:10.1038/nature11234. Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome. This research was supported in
part by National Institutes of Health grants U54HG004969 to B.W.B.; U54HG003273
to R.A.G.; U54HG004973 to R.A.G., S.K.H. and J.F.P.; U54HG003067 to E.S.Lander;
U54AI084844 to K.E.N.; N01AI30071 to R.L.Strausberg; U54HG004968 to G.M.W.;
U01HG004866 to O.R.W.; U54HG003079 to R.K.W.; R01HG005969 to C.H.;
R01HG004872 to R.K.; R01HG004885 to M.P.; R01HG005975 to P.D.S.;
R01HG004908 to Y.Y.; R01HG004900 to M.K.Cho and P. Sankar; R01HG005171 to
D.E.H.; R01HG004853 to A.L.M.; R01HG004856 to R.R.; R01HG004877 to R.R.S. and
R.F.; R01HG005172 to P. Spicer; R01HG004857 to M.P.; R01HG004906 to T.M.S.;
R21HG005811 to E.A.V.; M.J.B. was supported by UH2AR057506; G.A.B. was
supported by UH2AI083263 and UH3AI083263 (G.A.B., C. N. Cornelissen, L. K. Eaves
and J. F. Strauss); S.M.H. was supported by UH3DK083993 (V. B. Young, E. B. Chang,
F. Meyer, T. M. S., M. L. Sogin, J. M. Tiedje); K.P.R. was supported by UH2DK083990 (J.
V.); J.A.S. and H.H.K. were supported by UH2AR057504 and UH3AR057504 (J.A.S.);
DP2OD001500 to K.M.A.; N01HG62088 to the Coriell Institute for Medical Research;
U01DE016937 to F.E.D.; S.K.H. was supported by RC1DE0202098 and
R01DE021574 (S.K.H. and H. Li); J.I. was supported by R21CA139193 (J.I. and
D. S. Michaud); K.P.L. was supported by P30DE020751 (D. J. Smith); Army Research
Office grant W911NF-11-1-0473 to C.H.; National Science Foundation grants NSF
DBI-1053486 to C.H. and NSF IIS-0812111 to M.P.; The Office of Science of the US
Department of Energy under Contract No. DE-AC02-05CH11231 for P.S.C.; LANL
Laboratory-Directed Research and Development grant 20100034DR and the US
Defense Threat Reduction Agency grants B104153I and B084531I to P.S.C.; Research
Foundation - Flanders (FWO) grant to K.F. and J.Raes; R.K. is an HHMI Early Career
Scientist; Gordon & Betty Moore Foundation funding and institutional funding from the
J. David Gladstone Institutes to K.S.P.; A.M.S. was supported by fellowships provided by
the Rackham Graduate School and the NIH Molecular Mechanisms in Microbial
Pathogenesis Training Grant T32AI007528; a Crohn’s and Colitis Foundation of
Canada Grant in Aid of Research to E.A.V.; 2010 IBM Faculty Award to K.C.W.; analysis
of the HMP data was performed using National Energy Research Scientific Computing
resources, the BluBioU Computational Resource at Rice University
Extensive sequencing of seven human genomes to characterize benchmark reference materials
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Distribution and abundance of novel OTUs.
<p>(A) The number of reads assigned to each OTU (x-axis) across all the samples (y-axis) is represented by color on a scale from light yellow (few sequences) to dark red (many sequences). Within each dataset (“Roche V1–3”, “Roche V3–5”, and “WGS”), samples are organized by their source: the three smaller groups of samples labeled “11” contain the original reads (from 11 subjects) used to define the OTUs. Reads recruited to the OTUs from the extended Roche variable region datasets (“Roche V3–5” and “Roche V1–3”) and Illumina WGS dataset (additional samples, “AS”) are also shown. (B) The total number of reads assigned to each OTU (x-axis) across all subjects (y-axis) is represented by color on the same scale. These data include the original reads as well as reads from both extended variable region datasets and the Roche and Illumina WGS datasets. The subjects in the lower portion of the figure were sampled in St. Louis while those in the upper portion of the figure were sampled in Houston.</p>
Flow chart describing the pipeline for the identification of novel OTUs.
<p>Green circles represent the data collected by the HMP, with the darkest green circles representing WGS reads, and successively lighter green circles associated with 16S reads that show progressively more evidence of novelty. Blue squares represent external databases used to determine novelty. Major filtering steps in the pipeline are shown as yellow triangles.</p>
Four novel developmental enhancers near <i>FOXC2</i>.
<p>This UCSC Genome Browser (<a href="http://genome.ucsc.edu" target="_blank">http://genome.ucsc.edu</a>) snapshot shows the genomic context of four candidate human enhancers tested in transgenic zebrafish. For each enhancer, we show a zebrafish image that is representative of the reproducible expression patterns. <i>FOXC2</i> Enhancer Candidate 1 (F2EC-1) drives expression at 48 hpf in the eye and epidermis (arrows). F2EC-2 shows expression at 24 hpf in the forebrain, midbrain, and nerve. F2EC-3 drives expression at 48 hpf in the epidermis and heart. F2EC-4 shows expression at 48 hpf in the notochord, spinal cord, and heart. See <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003677#pcbi.1003677.s017" target="_blank">Table S6</a> for full list of expressed tissues seen in each candidate enhancer and <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003677#pcbi.1003677.s010" target="_blank">Figure S10</a> for results on candidate enhancers near <i>FOXC1</i>.</p>
Novelty at different taxonomic levels.
<p>(A) Distribution of novel OTUs as a function of the maximum percent identity of a constituent read to a nearest neighbor for (black) all reads in the RDP and NT databases and (red) cultured reads. (B) The distribution of novel OTUs as a function of the lowest taxonomic rank that was confidently (with a bootstrap value of >0.5) assigned to a constituent read.</p>
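The criterion in panel (B) can be sketched in a few lines, assuming each read comes with a per-rank list of taxon assignments and bootstrap values ordered from domain down to genus. The function name, the input layout, and the example values below are hypothetical; only the >0.5 cutoff comes from the caption.

```python
def lowest_confident_rank(assignments, threshold=0.5):
    """Return the deepest (rank, taxon) whose bootstrap exceeds `threshold`.

    `assignments` lists (rank, taxon, bootstrap) from domain down to genus.
    The walk stops at the first rank that fails the cutoff, on the assumption
    that a deeper assignment is not trusted once its parent is uncertain.
    Returns None if even the first rank fails.
    """
    best = None
    for rank, taxon, bootstrap in assignments:
        if bootstrap > threshold:
            best = (rank, taxon)
        else:
            break
    return best


# Made-up bootstrap values for a single read, for illustration only.
read = [
    ("domain", "Bacteria", 1.00),
    ("phylum", "Bacteroidetes", 0.98),
    ("class", "Bacteroidia", 0.95),
    ("order", "Bacteroidales", 0.91),
    ("family", "Porphyromonadaceae", 0.62),
    ("genus", "Barnesiella", 0.31),
]

print(lowest_confident_rank(read))  # ('family', 'Porphyromonadaceae')
```

Under these assumptions the read counts as confidently classified to family but not to genus, so it would contribute to the family bin of the distribution.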