29 research outputs found

    Network inference in matrix-variate Gaussian models with non-independent noise

    Inferring a graphical model or network from observational data from a large number of variables is a well-studied problem in machine learning and computational statistics. In this paper we consider a version of this problem that is relevant to the analysis of multiple phenotypes collected in genetic studies. In such datasets we expect correlations between phenotypes and between individuals. We model observations as a sum of two matrix normal variates such that the joint covariance function is a sum of Kronecker products. This model, which generalizes the Graphical Lasso, assumes observations are correlated due to known genetic relationships and corrupted with non-independent noise. We have developed a computationally efficient EM algorithm to fit this model. On simulated datasets we illustrate substantially improved performance in network reconstruction by allowing for a general noise distribution.

    Genetic perturbation of PU.1 binding and chromatin looping at neutrophil enhancers associates with autoimmune disease.

    Neutrophils play fundamental roles in innate immune response, shape adaptive immunity, and are a potentially causal cell type underpinning genetic associations with immune system traits and diseases. Here, we profile the binding of myeloid master regulator PU.1 in primary neutrophils across nearly a hundred volunteers. We show that variants associated with differential PU.1 binding underlie genetically-driven differences in cell count and susceptibility to autoimmune and inflammatory diseases. We integrate these results with other multi-individual genomic readouts, revealing coordinated effects of PU.1 binding variants on the local chromatin state, enhancer-promoter contacts and downstream gene expression, and providing a functional interpretation for 27 genes underlying immune traits. Collectively, these results demonstrate the functional role of PU.1 and its target enhancers in neutrophil transcriptional control and immune disease susceptibility

    The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease

    Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease and evidence suggesting previously reported population associations between blood cell indices and cardiovascular disease may be non-causal. We thank members of the Cambridge BioResource Scientific Advisory Board and Management Committee for their support of our study and the National Institute for Health Research Cambridge Biomedical Research Centre for funding. K.D. is funded as a HSST trainee by NHS Health Education England. M.F. is funded from the BLUEPRINT Grant Code HEALTH-F5-2011-282510 and the BHF Cambridge Centre of Excellence [RE/13/6/30180]. J.R.S. is funded by a MRC CASE Industrial studentship, co-funded by Pfizer. J.D. is a British Heart Foundation Professor, European Research Council Senior Investigator, and National Institute for Health Research (NIHR) Senior Investigator. S.M., S.T., M.H., K.M. and L.D. are supported by the NIHR BioResource-Rare Diseases, which is funded by NIHR.
Research in the Ouwehand laboratory is supported by program grants from the NIHR to W.H.O., the European Commission (HEALTH-F2-2012-279233), the British Heart Foundation (BHF) to W.J.A. and D.R. under numbers RP-PG-0310-1002 and RG/09/12/28096 and Bristol Myers-Squibb; the laboratory also receives funding from NHSBT. W.H.O. is a NIHR Senior Investigator. The INTERVAL academic coordinating centre receives core support from the UK Medical Research Council (G0800270), the BHF (SP/09/002), the NIHR and Cambridge Biomedical Research Centre, as well as grants from the European Research Council (268834), the European Commission Framework Programme 7 (HEALTH-F2-2012-279233), Merck and Pfizer. DJR and DA were supported by the NIHR Programme ‘Erythropoiesis in Health and Disease’ (Ref. NIHR-RP-PG-0310-1004). N.S. is supported by the Wellcome Trust (Grant Codes WT098051 and WT091310), the EU FP7 (EPIGENESYS Grant Code 257082 and BLUEPRINT Grant Code HEALTH-F5-2011-282510). The INTERVAL study is funded by NHSBT and has been supported by the NIHR-BTRU in Donor Health and Genomics at the University of Cambridge in partnership with NHSBT. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, the Department of Health of England or NHSBT. D.G. is supported by a “la Caixa”-Severo Ochoa pre-doctoral fellowship.

    Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation

    Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15–17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype–phenotype map than previously anticipated

    Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants

    A multiple-phenotype imputation method for genetic studies

    Genetic association studies have yielded a wealth of biologic discoveries. However, these have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of these datasets. Joint genotype-phenotype analyses of complex, high-dimensional datasets represent an important way to move beyond simple GWAS with great potential. The move to high-dimensional phenotypes will raise many new statistical problems. In this paper we address the central issue of missing phenotypes in studies with any level of relatedness between samples. We propose a multiple phenotype mixed model and use a computationally efficient variational Bayesian algorithm to fit the model. On a variety of simulated and real datasets from a range of organisms and trait types, we show that our method outperforms existing state-of-the-art methods from the statistics and machine learning literature and can boost signals of association.

    Bi-allelic Loss-of-Function CACNA1B Mutations in Progressive Epilepsy-Dyskinesia.

    The occurrence of non-epileptic hyperkinetic movements in the context of developmental epileptic encephalopathies is an increasingly recognized phenomenon. Identification of causative mutations provides an important insight into common pathogenic mechanisms that cause both seizures and abnormal motor control. We report bi-allelic loss-of-function CACNA1B variants in six children from three unrelated families whose affected members present with a complex and progressive neurological syndrome. All affected individuals presented with epileptic encephalopathy, severe neurodevelopmental delay (often with regression), and a hyperkinetic movement disorder. Additional neurological features included postnatal microcephaly and hypotonia. Five children died in childhood or adolescence (mean age of death: 9 years), mainly as a result of secondary respiratory complications. CACNA1B encodes the pore-forming subunit of the pre-synaptic neuronal voltage-gated calcium channel Cav2.2/N-type, crucial for SNARE-mediated neurotransmission, particularly in the early postnatal period. Bi-allelic loss-of-function variants in CACNA1B are predicted to cause disruption of Ca2+ influx, leading to impaired synaptic neurotransmission. The resultant effect on neuronal function is likely to be important in the development of involuntary movements and epilepsy. Overall, our findings provide further evidence for the key role of Cav2.2 in normal human neurodevelopment. MAK is funded by an NIHR Research Professorship and receives funding from the Wellcome Trust, Great Ormond Street Children's Hospital Charity, and Rosetrees Trust. E.M. received funding from the Rosetrees Trust (CD-A53) and Great Ormond Street Hospital Children's Charity. K.G. received funding from Temple Street Foundation. A.M. is funded by Great Ormond Street Hospital, the National Institute for Health Research (NIHR), and Biomedical Research Centre. F.L.R. and D.G. are funded by Cambridge Biomedical Research Centre. K.C.
and A.S.J. are funded by NIHR Bioresource for Rare Diseases. The DDD Study presents independent research commissioned by the Health Innovation Challenge Fund (grant number HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the Department of Health, and the Wellcome Trust Sanger Institute (grant number WT098051). We acknowledge support from the UK Department of Health via the NIHR comprehensive Biomedical Research Centre award to Guy's and St. Thomas' National Health Service (NHS) Foundation Trust in partnership with King's College London. This research was also supported by the NIHR Great Ormond Street Hospital Biomedical Research Centre. J.H.C. is in receipt of an NIHR Senior Investigator Award. The research team acknowledges the support of the NIHR through the Comprehensive Clinical Research Network. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, Department of Health, or Wellcome Trust. E.R.M. acknowledges support from NIHR Cambridge Biomedical Research Centre, an NIHR Senior Investigator Award, and the University of Cambridge has received salary support in respect of E.R.M. from the NHS in the East of England through the Clinical Academic Reserve. I.E.S. is supported by the National Health and Medical Research Council of Australia (Program Grant and Practitioner Fellowship)

    Bayesian methods for multivariate phenotype analysis in genome-wide association studies

    Most genome-wide association studies search for genetic variants associated to a single trait of interest, despite the main interest usually being the understanding of a complex genotype-phenotype network. Furthermore, many studies collect data on multiple phenotypes, each measuring a different aspect of the biological system under consideration, therefore it can often make sense to jointly analyze the phenotypes. However this is rarely the case and there is a lack of well developed methods for multiple phenotype analysis. Here we propose novel approaches for genome-wide association analysis, which scan the genome one SNP at a time for association with multivariate traits. The first half of this thesis focuses on an analytic model averaging approach which bi-partitions traits into associated and unassociated, fits all such models and measures evidence of association using a Bayes factor. The discrete nature of the model allows very fine control of prior beliefs about which sets of traits are more likely to be jointly associated. Using simulated data we show that this method can have much greater power than simpler approaches that do not explicitly model residual correlation between traits. On real data of six hematological parameters in 3 population cohorts (KORA, UKNBS and TwinsUK) from the HaemGen consortium, this model allows us to uncover an association at the RCL locus that was not identified in the original analysis but has been validated in a much larger study. In the second half of the thesis we propose and explore the properties of models that use priors encouraging sparse solutions, in the sense that genetic effects of phenotypes are shrunk towards zero when there is little evidence of association. To do this we explore and use spike and slab (SAS) priors. All methods combine both hypothesis testing, via calculation of a Bayes factor, and model selection, which occurs implicitly via the sparsity priors. 
We have successfully implemented a Variational Bayesian approach to fit this model, which provides a tractable approximation to the posterior distribution, and allows us to approximate the very high-dimensional integral required for the Bayes factor calculation. This approach has a number of desirable properties. It can handle missing phenotype data, which is a real feature of most studies. It allows for both correlation due to relatedness between subjects or population structure and residual phenotype correlation. It can be viewed as a sparse Bayesian multivariate generalization of the mixed model approaches that have become popular recently in the GWAS literature. In addition, the method is computationally fast and can be applied to millions of SNPs for a large number of phenotypes. Furthermore we apply our method to 15 glycans from 3 isolated population cohorts (ORCADES, KORCULA and VIS), where we uncover association at a known locus, not identified in the original study but discovered later in a larger one. We conclude by discussing future directions.

    Data of GWAS associated SNPs in regulatory regions

    Data file 1: DNase I hotspot samples included in FORGE analysis. A list of DNase I hotspot samples included in FORGE analysis is shown. The table lists details of the 125 ENCODE and 299 Roadmap Epigenome samples. The fields are File: file name; Lab: data-generating lab, one of UW (University of Washington, John Stamatoyannopoulos lab), Duke:UNC:UTA (Duke University, Greg Crawford lab) or combined, representing a merged dataset from both labs; Experiment type: always DNase-seq in the current implementation; Project: either ENCODE or Roadmap; Cell: cell type; Tissue: tissue name derived as described above; Data type: always hotspots in the current implementation; Short name: a short sample name used for plotting; Individual: either the code for the individual sample as described in Biosamples or NA if not available; GEO accession: the GEO accession where found, or “Not found” if it could not be deconvoluted.
    Data file 2: List of phenotypes analysed. The list of phenotypes analysed, with non-redundant SNP counts and Pubmed identifiers for the studies involved, is shown. This table is also available in the data directory of the GitHub release, https://github.com/iandunham/Forge.