
    Novel Bacterial Taxa in the Human Microbiome

    The human gut harbors thousands of bacterial taxa. A profusion of metagenomic sequence data has been generated from human stool samples in the last few years, raising the question of whether more taxa remain to be identified. We assessed metagenomic data generated by the Human Microbiome Project Consortium to determine if novel taxa remain to be discovered in stool samples from healthy individuals. To do this, we established a rigorous bioinformatics pipeline that uses sequence data from multiple platforms (Illumina GAIIX and Roche 454 FLX Titanium) and approaches (whole-genome shotgun and 16S rDNA amplicons) to validate novel taxa. We applied this approach to stool samples from 11 healthy subjects collected as part of the Human Microbiome Project. We discovered several low-abundance, novel bacterial taxa, which span three major phyla in the bacterial tree of life. We determined that these taxa are present in a larger set of Human Microbiome Project subjects and are found in two sampling sites (Houston and St. Louis). We show that the number of false-positive novel sequences (primarily chimeric sequences) would have been two orders of magnitude higher than the true number of novel taxa without validation using multiple datasets, highlighting the importance of establishing rigorous standards for the identification of novel taxa in metagenomic data. The majority of novel sequences are related to the recently discovered genus Barnesiella, further encouraging efforts to characterize the members of this genus and to study their roles in the microbial communities of the gut. A better understanding of the effects of less-abundant bacteria is important as we seek to understand the complex gut microbiome in healthy individuals and link changes in the microbiome to disease

    Integrating Diverse Datasets Improves Developmental Enhancer Prediction

    Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. 
    Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope will enable researchers to further investigate questions in developmental biology. © 2014 Erwin et al.

    A framework for human microbiome research

    A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies

    Structure, function and diversity of the healthy human microbiome

    Author Posting. © The Authors, 2012. This article is posted here by permission of Nature Publishing Group. The definitive version was published in Nature 486 (2012): 207-214, doi:10.1038/nature11234.
    Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome.
    This research was supported in part by National Institutes of Health grants U54HG004969 to B.W.B.; U54HG003273 to R.A.G.; U54HG004973 to R.A.G., S.K.H. and J.F.P.; U54HG003067 to E.S.Lander; U54AI084844 to K.E.N.; N01AI30071 to R.L.Strausberg; U54HG004968 to G.M.W.; U01HG004866 to O.R.W.; U54HG003079 to R.K.W.; R01HG005969 to C.H.; R01HG004872 to R.K.; R01HG004885 to M.P.; R01HG005975 to P.D.S.; R01HG004908 to Y.Y.; R01HG004900 to M.K.Cho and P. Sankar; R01HG005171 to D.E.H.; R01HG004853 to A.L.M.; R01HG004856 to R.R.; R01HG004877 to R.R.S. and R.F.; R01HG005172 to P. Spicer; R01HG004857 to M.P.; R01HG004906 to T.M.S.; R21HG005811 to E.A.V.; M.J.B. was supported by UH2AR057506; G.A.B. was supported by UH2AI083263 and UH3AI083263 (G.A.B., C. N. Cornelissen, L. K. Eaves and J. F. Strauss); S.M.H. was supported by UH3DK083993 (V. B. Young, E. B. Chang, F. Meyer, T. M. S., M. L. Sogin, J. M. Tiedje); K.P.R. was supported by UH2DK083990 (J. V.); J.A.S. and H.H.K. were supported by UH2AR057504 and UH3AR057504 (J.A.S.); DP2OD001500 to K.M.A.; N01HG62088 to the Coriell Institute for Medical Research; U01DE016937 to F.E.D.; S.K.H. was supported by RC1DE0202098 and R01DE021574 (S.K.H. and H. Li); J.I. was supported by R21CA139193 (J.I. and D. S. Michaud); K.P.L. was supported by P30DE020751 (D. J. Smith); Army Research Office grant W911NF-11-1-0473 to C.H.; National Science Foundation grants NSF DBI-1053486 to C.H. and NSF IIS-0812111 to M.P.; The Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231 for P.S.C.; LANL Laboratory-Directed Research and Development grant 20100034DR and the US Defense Threat Reduction Agency grants B104153I and B084531I to P.S.C.; Research Foundation - Flanders (FWO) grant to K.F. and J.Raes; R.K. is an HHMI Early Career Scientist; Gordon & Betty Moore Foundation funding and institutional funding from the J. David Gladstone Institutes to K.S.P.; A.M.S. was supported by fellowships provided by the Rackham Graduate School and the NIH Molecular Mechanisms in Microbial Pathogenesis Training Grant T32AI007528; a Crohn’s and Colitis Foundation of Canada Grant in Aid of Research to E.A.V.; 2010 IBM Faculty Award to K.C.W.; analysis of the HMP data was performed using National Energy Research Scientific Computing resources, the BluBioU Computational Resource at Rice University

    The global prevalence and ethnic heterogeneity of primary ciliary dyskinesia gene variants: a genetic database analysis

    Background: Primary ciliary dyskinesia (PCD) is a motile ciliopathy characterised by otosinopulmonary infections. Inheritance is commonly autosomal recessive, with extensive locus and allelic heterogeneity. The prevalence is uncertain. Most genetic studies have been done in North America or Europe. The aim of the study was to estimate the worldwide prevalence and ethnic heterogeneity of PCD. Methods: We calculated the allele frequency of disease-causing variants in 29 PCD genes associated with autosomal recessive inheritance in 182 681 unique individuals to estimate the global prevalence of PCD in seven ethnicities (African or African American, Latino, Ashkenazi Jewish, Finnish, non-Finnish European, east Asian, and south Asian). We began by aggregating variants that had been interpreted by Invitae, San Francisco, CA, USA, a genetics laboratory with PCD expertise. We then determined the allele frequency of each variant (pathogenic, likely pathogenic, or variant of uncertain significance [VUS]) in the Genome Aggregation Database (gnomAD), a publicly available next-generation sequencing database that aggregates exome and genome sequencing information from a wide variety of large-scale projects and stratifies allele counts by ethnicity. Using the Hardy-Weinberg equilibrium equation, we were able to calculate a lower-end prevalence of PCD for each ethnicity by including only pathogenic and likely pathogenic variants; and upper-end prevalence by also including VUS. This approach was similar to previous work on Li-Fraumeni (TP53 variants) prevalence. We were not diagnosing PCD, but rather estimating prevalence based on known variants. Findings: The overall minimum global prevalence of PCD is calculated to be at least one in 7554 individuals, although this is likely to be an underestimate because some variants currently classified as VUS might be disease-causing and some pathogenic variants might not be detected by our methods. 
In the overall cohort, Invitae data could be included for variants without gnomAD data for a primary ethnicity. When using only gnomAD allele frequencies to calculate prevalence in individual ethnicities, the estimated prevalence of PCD was lower in each ethnicity compared with the overall cohort. This is because the overall cohort includes additional data from the Invitae database such as copy number variants and other variants not present in gnomAD. With gnomAD we found the expected PCD frequency to be higher in individuals of African ancestry than in most other populations (excluding VUS: 1 in 9906 in African or African American vs 1 in 10 388 in non-Finnish European vs 1 in 14 606 in east Asian vs 1 in 16 309 in Latino; including VUS: 1 in 106 in African or African American vs 1 in 178 in non-Finnish European vs 1 in 196 in Latino vs 1 in 188 in east Asian). In addition, we found that the top 5 genes most commonly implicated in PCD differed across ethnic ancestries and contrasted commonly published findings. Interpretation: PCD appears to be more common than has been recognised, particularly in individuals of African ancestry. We identified gene distributions that differ from those in previous European and North American studies. These results could have an international impact on case identification. Our analytic approach can be expanded as more PCD loci are identified, and could be adapted to study the prevalence of other inherited diseases
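
The prevalence estimate described in the abstract above can be illustrated with a small calculation. The sketch below is not the authors' pipeline. It assumes the common approach for autosomal recessive disease: within each gene, the frequencies of disease-causing alleles are summed to a combined frequency q, the Hardy-Weinberg term q² gives the expected fraction of affected individuals for that gene, and the per-gene terms are added. The gene names and allele frequencies are hypothetical placeholders, not values from the study.

```python
def recessive_prevalence(allele_freqs_by_gene):
    """Estimate the prevalence of an autosomal recessive disease.

    allele_freqs_by_gene maps a gene name to a list of allele
    frequencies of its disease-causing variants. Assumes
    Hardy-Weinberg equilibrium and that cases arise from two
    disease alleles in the same gene.
    """
    total = 0.0
    for freqs in allele_freqs_by_gene.values():
        q = sum(freqs)      # combined disease-allele frequency for this gene
        total += q * q      # expected affected fraction (q squared)
    return total


def one_in(prevalence):
    """Express a prevalence as the N in '1 in N', rounded to an integer."""
    return round(1 / prevalence)


# Hypothetical frequencies for illustration only.
pathogenic_only = {"GENE_A": [0.002, 0.001], "GENE_B": [0.004]}
with_vus = {"GENE_A": [0.002, 0.001, 0.003], "GENE_B": [0.004, 0.002]}

lower = recessive_prevalence(pathogenic_only)   # pathogenic and likely pathogenic only
upper = recessive_prevalence(with_vus)          # also counting VUS
print(f"lower-end estimate: 1 in {one_in(lower)}")
print(f"upper-end estimate: 1 in {one_in(upper)}")
```

Adding variants of uncertain significance can only raise q for a gene, so the estimate that includes them is never below the one that excludes them. This matches the lower-end and upper-end framing in the abstract.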

    Four novel developmental enhancers near FOXC2.

    This UCSC Genome Browser (http://genome.ucsc.edu) snapshot shows the genomic context of four candidate human enhancers tested in transgenic zebrafish. For each enhancer, we show a zebrafish image that is representative of the reproducible expression patterns. FOXC2 Enhancer Candidate 1 (F2EC-1) drives expression at 48 hpf in the eye and epidermis (arrows). F2EC-2 shows expression at 24 hpf in the forebrain, midbrain, and nerve. F2EC-3 drives expression at 48 hpf in the epidermis and heart. F2EC-4 shows expression at 48 hpf in the notochord, spinal cord, and heart. See Table S6 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003677#pcbi.1003677.s017) for the full list of expressed tissues seen in each candidate enhancer and Figure S10 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003677#pcbi.1003677.s010) for results on candidate enhancers near FOXC1.

    EnhancerFinder's two-step approach captures tissue-specific attributes of enhancers.

    (A) The true overlap of human enhancers of brain, heart, and limb in the VISTA database. The vast majority of characterized enhancers are unique to one of these tissues at this stage. For example, of the 84 validated heart enhancers, 71 are unique to heart, five are shared with brain, four with limb, and four with both. (B) The predicted overlap of VISTA enhancers based on predictions made with a single training step using MKL with only enhancers of that tissue considered positives and the genomic background as negatives. This approach overestimates the number of enhancers active in multiple tissues. Each classifier mainly learns general attributes of enhancers, rather than tissue-specific attributes. (C) The predicted overlap based on EnhancerFinder's two-step approach. These predictions are much more tissue-specific and exhibit overlaps between tissues similar to the true values (A). Predicted tissue distributions are similar when the methods are applied to other genomic regions, as illustrated in our genome-wide predictions, but only predictions on VISTA enhancers are shown here to enable comparisons to the distribution for validated enhancers (A).