78 research outputs found

    Mapping genetic variations to three-dimensional protein structures to enhance variant interpretation: a proposed framework

    The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods

    A Unique Signal Distorts the Perception of Species Richness and Composition in High-Throughput Sequencing Surveys of Microbial Communities: a Case Study of Fungi in Indoor Dust

    Sequence-based surveys of microorganisms in varied environments have found extremely diverse assemblages. A standard practice in current high-throughput sequence (HTS) approaches in microbial ecology is to sequence the composition of many environmental samples at once by pooling amplicon libraries at a common concentration before processing on one run of a sequencing platform. Biomass of the target taxa, however, is not typically determined prior to HTS, and here, we show that when abundances of the samples differ to a large degree, this standard practice can lead to a perceived bias in community richness and composition. Fungal signal in settled dust of five university teaching laboratory classrooms, one of which was used for a mycology course, was surveyed. The fungal richness and composition in the dust of the nonmycology classrooms were remarkably similar to each other, while the mycology classroom was dominated by abundantly sporulating specimen fungi, particularly puffballs, and appeared to have a lower overall richness based on rarefaction curves and richness estimators. The fungal biomass was three to five times higher in the mycology classroom than the other classrooms, indicating that fungi added to the mycology classroom swamped the background fungi present in indoor air. Thus, the high abundance of a few taxa can skew the perception of richness and composition when samples are sequenced to an even depth. Next, we used in silico manipulations of the observed data to confirm that a unique signature can be identified with HTS approaches when the source is abundant, whether or not the taxon identity is distinct. Lastly, aerobiology of indoor fungi is discussed. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s00248-013-0266-4) contains supplementary material, which is available to authorized users
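The even-depth effect described in this abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis: the community sizes, the read counts, the sequencing depth, and the helper name `rarefied_richness` are all hypothetical values chosen for illustration.

```python
import random


def rarefied_richness(counts, depth, seed=0):
    """Subsample `depth` reads without replacement and count distinct taxa."""
    # Expand per-taxon read counts into a pool of individual reads.
    pool = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    rng = random.Random(seed)
    return len(set(rng.sample(pool, depth)))


# Hypothetical background community: 200 taxa with 50 reads each.
background = [50] * 200
# Same background plus one dominant taxon (assumed 4x the background total).
swamped = background + [40_000]

depth = 1_000  # assumed even sequencing depth for both samples
r_bg = rarefied_richness(background, depth)
r_sw = rarefied_richness(swamped, depth)
print("background richness:", r_bg)
print("swamped richness:", r_sw)
```

Both simulated samples contain the same 200 background taxa, yet the swamped sample shows fewer of them at the same depth because most reads are spent on the dominant taxon.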

    Identification of six new susceptibility loci for invasive epithelial ovarian cancer.

    Genome-wide association studies (GWAS) have identified 12 epithelial ovarian cancer (EOC) susceptibility alleles. The pattern of association at these loci is consistent in BRCA1 and BRCA2 mutation carriers who are at high risk of EOC. After imputation to 1000 Genomes Project data, we assessed associations of 11 million genetic variants with EOC risk from 15,437 cases unselected for family history and 30,845 controls and from 15,252 BRCA1 mutation carriers and 8,211 BRCA2 mutation carriers (3,096 with ovarian cancer), and we combined the results in a meta-analysis. This new study design yielded increased statistical power, leading to the discovery of six new EOC susceptibility loci. Variants at 1p36 (nearest gene, WNT4), 4q26 (SYNPO2), 9q34.2 (ABO) and 17q11.2 (ATAD5) were associated with EOC risk, and at 1p34.3 (RSPO1) and 6p22.1 (GPX6) variants were specifically associated with the serous EOC subtype, all with P < 5 × 10⁻⁸. Incorporating these variants into risk assessment tools will improve clinical risk predictions for BRCA1 and BRCA2 mutation carriers. The COGS project is funded through a European Commission's Seventh Framework Programme grant (agreement number 223175 - HEALTH-F2-2009-223175). The CIMBA data management and data analysis were supported by Cancer Research UK grants 12292/A11174 and C1287/A10118. The Ovarian Cancer Association Consortium is supported by a grant from the Ovarian Cancer Research Fund thanks to donations by the family and friends of Kathryn Sladek Smith (PPD/RPCI.07). The scientific development and funding for this project were in part supported by the US National Cancer Institute GAME-ON Post-GWAS Initiative (U19-CA148112). This study made use of data generated by the Wellcome Trust Case Control consortium. Funding for the project was provided by the Wellcome Trust under award 076113.
The results published here are in part based upon data generated by The Cancer Genome Atlas Pilot Project established by the National Cancer Institute and National Human Genome Research Institute (dbGaP accession number phs000178.v8.p7). The cBio portal is developed and maintained by the Computational Biology Center at Memorial Sloan-Kettering Cancer Center. SH is supported by an NHMRC Program Grant to GCT. Details of the funding of individual investigators and studies are provided in the Supplementary Note. http://dx.doi.org/10.1038/ng.3185 This is the Author Accepted Manuscript of 'Identification of six new susceptibility loci for invasive epithelial ovarian cancer' which was published in Nature Genetics 47, 164–171 (2015) © Nature Publishing Group - content may only be used for academic research

    A framework for human microbiome research

    A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies

    Structure, function and diversity of the healthy human microbiome

    Author Posting. © The Authors, 2012. This article is posted here by permission of Nature Publishing Group. The definitive version was published in Nature 486 (2012): 207-214, doi:10.1038/nature11234. Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome. This research was supported in part by National Institutes of Health grants U54HG004969 to B.W.B.; U54HG003273 to R.A.G.; U54HG004973 to R.A.G., S.K.H. 
and J.F.P.; U54HG003067 to E.S. Lander; U54AI084844 to K.E.N.; N01AI30071 to R.L. Strausberg; U54HG004968 to G.M.W.; U01HG004866 to O.R.W.; U54HG003079 to R.K.W.; R01HG005969 to C.H.; R01HG004872 to R.K.; R01HG004885 to M.P.; R01HG005975 to P.D.S.; R01HG004908 to Y.Y.; R01HG004900 to M.K. Cho and P. Sankar; R01HG005171 to D.E.H.; R01HG004853 to A.L.M.; R01HG004856 to R.R.; R01HG004877 to R.R.S. and R.F.; R01HG005172 to P. Spicer; R01HG004857 to M.P.; R01HG004906 to T.M.S.; R21HG005811 to E.A.V.; M.J.B. was supported by UH2AR057506; G.A.B. was supported by UH2AI083263 and UH3AI083263 (G.A.B., C. N. Cornelissen, L. K. Eaves and J. F. Strauss); S.M.H. was supported by UH3DK083993 (V. B. Young, E. B. Chang, F. Meyer, T. M. S., M. L. Sogin, J. M. Tiedje); K.P.R. was supported by UH2DK083990 (J. V.); J.A.S. and H.H.K. were supported by UH2AR057504 and UH3AR057504 (J.A.S.); DP2OD001500 to K.M.A.; N01HG62088 to the Coriell Institute for Medical Research; U01DE016937 to F.E.D.; S.K.H. was supported by RC1DE0202098 and R01DE021574 (S.K.H. and H. Li); J.I. was supported by R21CA139193 (J.I. and D. S. Michaud); K.P.L. was supported by P30DE020751 (D. J. Smith); Army Research Office grant W911NF-11-1-0473 to C.H.; National Science Foundation grants NSF DBI-1053486 to C.H. and NSF IIS-0812111 to M.P.; The Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231 for P.S.C.; LANL Laboratory-Directed Research and Development grant 20100034DR and the US Defense Threat Reduction Agency grants B104153I and B084531I to P.S.C.; Research Foundation - Flanders (FWO) grant to K.F. and J. Raes; R.K. is an HHMI Early Career Scientist; Gordon and Betty Moore Foundation funding and institutional funding from the J. David Gladstone Institutes to K.S.P.; A.M.S. 
was supported by fellowships provided by the Rackham Graduate School and the NIH Molecular Mechanisms in Microbial Pathogenesis Training Grant T32AI007528; a Crohn’s and Colitis Foundation of Canada Grant in Aid of Research to E.A.V.; 2010 IBM Faculty Award to K.C.W.; analysis of the HMP data was performed using National Energy Research Scientific Computing resources, the BluBioU Computational Resource at Rice University

    Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-mediated innate immunity in Alzheimer's disease

    We identified rare coding variants associated with Alzheimer’s disease (AD) in a 3-stage case-control study of 85,133 subjects. In stage 1, 34,174 samples were genotyped using a whole-exome microarray. In stage 2, we tested associated variants (P < 1×10⁻⁴) in 35,962 independent samples using de novo genotyping and imputed genotypes. In stage 3, an additional 14,997 samples were used to test the most significant stage 2 associations (P < 5×10⁻⁸) using imputed genotypes. We observed 3 novel genome-wide significant (GWS) AD-associated non-synonymous variants: a protective variant in PLCG2 (rs72824905/p.P522R, P=5.38×10⁻¹⁰, OR=0.68, MAFcases=0.0059, MAFcontrols=0.0093), a risk variant in ABI3 (rs616338/p.S209F, P=4.56×10⁻¹⁰, OR=1.43, MAFcases=0.011, MAFcontrols=0.008), and a novel GWS variant in TREM2 (rs143332484/p.R62H, P=1.55×10⁻¹⁴, OR=1.67, MAFcases=0.0143, MAFcontrols=0.0089), a known AD susceptibility gene. These protein-coding changes are in genes highly expressed in microglia and highlight an immune-related protein-protein interaction network enriched for previously identified AD risk genes. These genetic findings provide additional evidence that the microglia-mediated innate immune response contributes directly to AD development

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead