68 research outputs found

    Network inference in matrix-variate Gaussian models with non-independent noise

    Inferring a graphical model or network from observational data from a large number of variables is a well-studied problem in machine learning and computational statistics. In this paper we consider a version of this problem that is relevant to the analysis of multiple phenotypes collected in genetic studies. In such datasets we expect correlations between phenotypes and between individuals. We model observations as a sum of two matrix normal variates such that the joint covariance function is a sum of Kronecker products. This model, which generalizes the Graphical Lasso, assumes observations are correlated due to known genetic relationships and corrupted with non-independent noise. We have developed a computationally efficient EM algorithm to fit this model. On simulated datasets we illustrate substantially improved performance in network reconstruction by allowing for a general noise distribution.

    GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals.

    Loci discovered by genome-wide association studies predominantly map outside protein-coding genes. The interpretation of the functional consequences of non-coding variants can be greatly enhanced by catalogs of regulatory genomic regions in cell lines and primary tissues. However, robust and readily applicable methods are still lacking by which to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits. Here we propose a novel approach that leverages genome-wide association studies' findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Within our framework, we account for major sources of confounding not offered by current methods. We further assess enrichment of genome-wide association studies for 19 traits within Encyclopedia of DNA Elements- and Roadmap-derived regulatory regions. We characterize unique enrichment patterns for traits and annotations driving novel biological insights. The method is implemented in standalone software and an R package, to facilitate its application by the research community

    Genetic perturbation of PU.1 binding and chromatin looping at neutrophil enhancers associates with autoimmune disease.

    Neutrophils play fundamental roles in innate immune response, shape adaptive immunity, and are a potentially causal cell type underpinning genetic associations with immune system traits and diseases. Here, we profile the binding of myeloid master regulator PU.1 in primary neutrophils across nearly a hundred volunteers. We show that variants associated with differential PU.1 binding underlie genetically-driven differences in cell count and susceptibility to autoimmune and inflammatory diseases. We integrate these results with other multi-individual genomic readouts, revealing coordinated effects of PU.1 binding variants on the local chromatin state, enhancer-promoter contacts and downstream gene expression, and providing a functional interpretation for 27 genes underlying immune traits. Collectively, these results demonstrate the functional role of PU.1 and its target enhancers in neutrophil transcriptional control and immune disease susceptibility

    The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease

    Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease and evidence suggesting previously reported population associations between blood cell indices and cardiovascular disease may be non-causal. We thank members of the Cambridge BioResource Scientific Advisory Board and Management Committee for their support of our study and the National Institute for Health Research Cambridge Biomedical Research Centre for funding. K.D. is funded as a HSST trainee by NHS Health Education England. M.F. is funded from the BLUEPRINT Grant Code HEALTH-F5-2011-282510 and the BHF Cambridge Centre of Excellence [RE/13/6/30180]. J.R.S. is funded by a MRC CASE Industrial studentship, co-funded by Pfizer. J.D. is a British Heart Foundation Professor, European Research Council Senior Investigator, and National Institute for Health Research (NIHR) Senior Investigator. S.M., S.T., M.H., K.M. and L.D. are supported by the NIHR BioResource-Rare Diseases, which is funded by NIHR.
Research in the Ouwehand laboratory is supported by program grants from the NIHR to W.H.O., the European Commission (HEALTH-F2-2012-279233), the British Heart Foundation (BHF) to W.J.A. and D.R. under numbers RP-PG-0310-1002 and RG/09/12/28096 and Bristol Myers-Squibb; the laboratory also receives funding from NHSBT. W.H.O. is a NIHR Senior Investigator. The INTERVAL academic coordinating centre receives core support from the UK Medical Research Council (G0800270), the BHF (SP/09/002), the NIHR and Cambridge Biomedical Research Centre, as well as grants from the European Research Council (268834), the European Commission Framework Programme 7 (HEALTH-F2-2012-279233), Merck and Pfizer. DJR and DA were supported by the NIHR Programme ‘Erythropoiesis in Health and Disease’ (Ref. NIHR-RP-PG-0310-1004). N.S. is supported by the Wellcome Trust (Grant Codes WT098051 and WT091310), the EU FP7 (EPIGENESYS Grant Code 257082 and BLUEPRINT Grant Code HEALTH-F5-2011-282510). The INTERVAL study is funded by NHSBT and has been supported by the NIHR-BTRU in Donor Health and Genomics at the University of Cambridge in partnership with NHSBT. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health of England or NHSBT. D.G. is supported by a “la Caixa”-Severo Ochoa pre-doctoral fellowship.

    Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation

    Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15–17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype–phenotype map than previously anticipated

    Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader allelic architecture of 12 anthropometric traits associated with height, body mass, and fat distribution in up to 267,616 individuals. We report 106 genome-wide significant signals that have not been previously identified, including 9 low-frequency variants pointing to functional candidates. Of the 106 signals, 6 are in genomic regions that have not been implicated with related traits before, 28 are independent signals at previously reported regions, and 72 represent previously reported signals for a different anthropometric trait. 71% of signals reside within genes and fine mapping resolves 23 signals to one or two likely causal variants. We confirm genetic overlap between human monogenic and polygenic anthropometric traits and find signal enrichment in cis expression QTLs in relevant tissues. Our results highlight the potential of WGS strategies to enhance biologically relevant discoveries across the frequency spectrum.

    eFORGE: A Tool for Identifying Cell Type-Specific Signal in Epigenomic Data

    Epigenome-wide association studies (EWAS) provide an alternative approach for studying human disease through consideration of non-genetic variants such as altered DNA methylation. To advance the complex interpretation of EWAS, we developed eFORGE (http://eforge.cs.ucl.ac.uk/), a new standalone and web-based tool for the analysis and interpretation of EWAS data. eFORGE determines the cell type-specific regulatory component of a set of EWAS-identified differentially methylated positions. This is achieved by detecting enrichment of overlap with DNase I hypersensitive sites across 454 samples (tissues, primary cell types, and cell lines) from the ENCODE, Roadmap Epigenomics, and BLUEPRINT projects. Application of eFORGE to 20 publicly available EWAS datasets identified disease-relevant cell types for several common diseases, a stem cell-like signature in cancer, and demonstrated the ability to detect cell-composition effects for EWAS performed on heterogeneous tissues. Our approach bridges the gap between large-scale epigenomics data and EWAS-derived target selection to yield insight into disease etiology. C.E.B. was supported by a PhD fellowship from the EU-FP7 project EpiTrain (316758). J.H. was supported by the UCL Cancer Institute Research Trust. V.K.R. was supported by BLUEPRINT (282510). K.D. was funded as a HSST trainee by NHS Health Education England. M.F. was supported by the BHF Cambridge Centre of Excellence (RE/13/6/30180). Research in W.H.O.’s laboratory was supported by EU-FP7 project BLUEPRINT (282510) and by program grants from the National Institute for Health Research (NIHR, http://www.nihr.ac.uk) and the British Heart Foundation under numbers RP-PG-0310-1002 and RG/09/12/28096 (https://www.bhf.org.uk/). W.H.O.’s laboratory receives funding from NHS Blood and Transplant for facilities. We gratefully acknowledge the participation of all NIHR Cambridge BioResource volunteers.
We thank the Cambridge BioResource staff for their help with volunteer recruitment. We thank members of the Cambridge BioResource SAB and Management Committee for their support of our study and the National Institute for Health Research Cambridge Biomedical Research Centre for funding. R.S. and his group were supported by the European Union in the framework of the BLUEPRINT Project (HEALTH-F5-2011-282510) and the German Ministry of Science and Education (BMBF) in the framework of the MMML-MYC-SYS project (036166B). We thank Deborah Winter (Weizmann Institute) for supplying a set of microglial enhancers from Lavin et al. (2014). Research in S.B.’s laboratory was supported by the Wellcome Trust (99148), Royal Society Wolfson Research Merit Award (WM100023), and EU-FP7 projects EpiTrain (316758), EPIGENESYS (257082), and BLUEPRINT (282510).

    A structured approach to evaluating life course hypotheses: Moving beyond analyses of exposed versus unexposed in the omics context

    The structured life course modeling approach (SLCMA) is a theory-driven analytic method that empirically compares multiple prespecified life course hypotheses characterizing time-dependent exposure-outcome relationships to determine which theory best fits the observed data. In this study, we performed simulations and empirical analyses to evaluate the performance of the SLCMA when applied to genome-wide DNA methylation (DNAm). Using simulations, we compared five statistical inference tests used with SLCMA (n=700), assessing the familywise error rate, statistical power, and confidence interval coverage to determine whether inference based on these tests was valid in the presence of substantial multiple testing and small effects, two hallmark challenges of inference from omics data. In the empirical analyses, we evaluated the time-dependent relationship of childhood abuse with genome-wide DNAm (n=703). In simulations, selective inference and max-|t|-test performed best: both controlled family-wise error rate and yielded moderate statistical power. Empirical analyses using SLCMA revealed time-dependent effects of childhood abuse on DNAm. Our findings show that SLCMA, applied and interpreted appropriately, can be used in high-throughput settings to examine time-dependent effects underlying exposure-outcome relationships over the life course. We provide recommendations for applying the SLCMA in omics settings and encourage researchers to move beyond analyses of exposed versus unexposed

    Steroid receptor coactivator-1 modulates the function of Pomc neurons and energy homeostasis

    Hypothalamic neurons expressing the anorectic peptide Pro-opiomelanocortin (Pomc) regulate food intake and body weight. Here, we show that Steroid Receptor Coactivator-1 (SRC-1) interacts with a target of leptin receptor activation, phosphorylated STAT3, to potentiate Pomc transcription. Deletion of SRC-1 in Pomc neurons in mice attenuates their depolarization by leptin, decreases Pomc expression and increases food intake leading to high-fat diet-induced obesity. In humans, fifteen rare heterozygous variants in SRC-1 found in severely obese individuals impair leptin-mediated Pomc reporter activity in cells, whilst four variants found in non-obese controls do not. In a knock-in mouse model of a loss of function human variant (SRC-1L1376P), leptin-induced depolarization of Pomc neurons and Pomc expression are significantly reduced, and food intake and body weight are increased. In summary, we demonstrate that SRC-1 modulates the function of hypothalamic Pomc neurons, and suggest that targeting SRC-1 may represent a useful therapeutic strategy for weight loss.

    Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.