37 research outputs found

    Phenome-Wide Association Study (PheWAS) for Detection of Pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network

    Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results had significant associations for two or more PAGE study sites with consistent direction of effect with a significance threshold of p<0.01 for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype-phenotype associations, 26 represented phenotypes closely related to previously known genotype-phenotype associations, and 33 represented potentially novel genotype-phenotype associations with pleiotropic effects. The majority of the potentially novel results were for single PheWAS phenotype-classes, for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA) a PheWAS association was identified for hemoglobin levels in AA. 
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension-related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and exploration of the genetic architecture of complex traits.
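The cross-site criterion described in this abstract (p<0.01 at two or more sites, same racial/ethnic group, SNP, and phenotype-class, consistent direction of effect) can be illustrated with a minimal sketch. The record layout, the function name `replicated_hits`, and the toy site and SNP labels are assumptions made for illustration; they are not taken from the PAGE analysis pipeline.

```python
from collections import defaultdict


def replicated_hits(results, alpha=0.01, min_sites=2):
    """Return groups significant at >= min_sites sites with one effect direction.

    Each result is a dict with hypothetical keys: site, ancestry, snp,
    phenotype_class, beta, p. Groups are keyed by
    (ancestry, snp, phenotype_class).
    """
    # For every group, collect the sites that pass the threshold, split by sign.
    by_group = defaultdict(lambda: defaultdict(set))
    for r in results:
        if r["p"] < alpha and r["beta"] != 0:
            key = (r["ancestry"], r["snp"], r["phenotype_class"])
            sign = 1 if r["beta"] > 0 else -1
            by_group[key][sign].add(r["site"])

    hits = {}
    for key, by_sign in by_group.items():
        for sign, sites in by_sign.items():
            # Consistent direction: enough sites agree on the same sign.
            if len(sites) >= min_sites:
                hits[key] = (sign, sorted(sites))
    return hits


# Toy records (made-up values, not PAGE data).
toy_results = [
    {"site": "site_A", "ancestry": "AA", "snp": "rs_demo1",
     "phenotype_class": "hemoglobin", "beta": 0.20, "p": 0.004},
    {"site": "site_B", "ancestry": "AA", "snp": "rs_demo1",
     "phenotype_class": "hemoglobin", "beta": 0.10, "p": 0.009},
    # Significant at two sites, but the directions disagree: not a hit.
    {"site": "site_A", "ancestry": "EA", "snp": "rs_demo1",
     "phenotype_class": "hemoglobin", "beta": 0.20, "p": 0.004},
    {"site": "site_B", "ancestry": "EA", "snp": "rs_demo1",
     "phenotype_class": "hemoglobin", "beta": -0.10, "p": 0.002},
    # Not significant: ignored.
    {"site": "site_C", "ancestry": "EA", "snp": "rs_demo2",
     "phenotype_class": "calcium", "beta": 0.30, "p": 0.2},
]

toy_hits = replicated_hits(toy_results)
```

In this toy input only the AA group passes, because the EA group is significant at two sites but with opposite signs.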

    Statistical Methods for Analyzing Multivariate Phenotypes and Detecting Rare Variant Associations

    This dissertation comprises four papers, one per chapter. In chapter 1, I compared the performance of eight multivariate phenotype association tests. The motivation for this power comparison is as follows. For nearly 15 years, genome-wide association studies (GWAS) have been widely used to identify genetic variants associated with human diseases and traits. GWAS typically investigate genetic variants for a predefined phenotype and thus fail to identify weak but important effects. In recent years, many multivariate association tests have been developed. However, a comprehensive summary of such approaches is lacking. To fill this gap, I carried out this power comparison. The results show that none of the methods is consistently more powerful than the others. More powerful methods are still in great demand. In chapter 2, I proposed a Weighted Combination of multiple Phenotypes approach (WCmulP) for testing the association between multiple correlated phenotypes and one genetic variant of interest. WCmulP linearly combines the multiple phenotypes with optimal weights such that the score test statistic is maximized. I compared WCmulP with other widely used tests and conducted extensive simulation studies as well as real data analysis to evaluate the performance of these methods. The results show that WCmulP outperforms the compared methods in most of the simulation scenarios and in the real data analysis. With the availability of electronic health records (EHR), thousands of clinical phenotypes can be measured and collected systematically. As a result, phenome-wide association studies (PheWAS) emerged to detect variants associated with a broad spectrum of phenotypes. However, current PheWAS are intrinsically univariate tests, which investigate phenotypes one at a time. Genuine PheWAS methods that simultaneously test a wide range of phenotypes need to be developed. 
In chapter 3, I proposed a novel PheWAS approach, referred to as PheCLC (PheWAS using clustering linear combination), to examine genetic variation associated with up to thousands of phenotypes. PheCLC jointly analyzes a wide spectrum of human phenotypes and classifies them into different categories based on International Classification of Diseases (ICD) codes. The simulation results show that PheCLC controls type I error rates and is much more powerful than the traditional multivariate approaches. To date, GWAS have reported thousands of common variants associated with human diseases. However, these common variants explain only a small portion of the phenotypic variance. Many studies have shown that rare variants could explain a substantial part of the missing heritability. In chapter 4, I developed a rare variant association method for family-based designs, in which rare variants can be enriched compared to population-based designs. I applied the proposed method and two other family-based tests to the Genetic Analysis Workshop 19 (GAW19) dataset, and the results show that our method identifies more genes with power greater than 40% than the other two methods.

    Phenome wide association study of vitamin D genetic variants in the UK Biobank cohort

    Introduction: Vitamin D status is an important public health issue due to the high prevalence of vitamin D insufficiency and deficiency, especially in high latitude areas. Furthermore, it has been reported to be associated with a number of diseases. In a previous umbrella review of meta-analyses of randomized clinical trials (RCTs) and of observational studies, it was found that plasma/serum 25-hydroxyvitamin D (25(OH)D) or supplemental vitamin D has been linked to more than 130 unique health outcomes. However, the majority of the studies yielded conflicting results and no association was convincing. Aim and Objectives: The aim of my PhD was to comprehensively explore the association between vitamin D and multiple outcomes. The specific objectives were to: 1) update the umbrella review of meta-analyses of observational studies or randomized controlled trials on associations between vitamin D and health outcomes published between 2014 and 2018; 2) conduct a systematic literature review of previous Mendelian Randomization studies on causal associations between vitamin D and all outcomes; 3) conduct a systematic literature review of published phenome wide association studies, summarizing the methods, results and predictors; 4) create a polygenic risk score of vitamin D related genetic variants, weighted by their effect estimates from the most recent genome wide association study; 5) encode phenotype groups based on electronic medical records of participants; 6) study the associations between vitamin D related SNPs and the whole spectrum of health outcomes, defined by electronic medical records utilising the UK Biobank study; 7) explore the causal effect of 25-hydroxyvitamin D level on health outcomes by applying novel instrumental variable methods. Methods: First I updated the vitamin D umbrella review published in 2015, by summarizing the evidence from meta-analyses of observational studies and meta-analyses of RCTs published between 2014 and 2018. 
I also performed a systematic literature review of all previous Mendelian Randomization studies on the effect of vitamin D on all health outcomes, as well as a systematic review of all published PheWAS studies and the methodology they applied. Then I conducted original data analysis in a large prospective population-based cohort, the UK Biobank, which includes more than 500,000 participants. A 25(OH)D genetic risk score (weighted sum score of 6 serum 25(OH)D-related SNPs: rs3755967, rs12785878, rs10741657, rs17216707, rs10745742 and rs8018720, as identified by the largest genome wide association study of 25(OH)D levels) was constructed to be used as the instrumental variable. I used a phenotyping algorithm to code the electronic medical records (EMR) of UK Biobank participants into 1853 distinct disease categories and I then ran the PheWAS analysis to test the associations between the 25(OH)D genetic risk score and 920 disease outcome groups (i.e. outcomes with more than 200 cases). For phenotypes found to show a statistically significant association with 25(OH)D levels in the PheWAS or phenotypes which were found to be convincing or highly suggestive in previous studies, I developed an extended case definition by incorporating self-reported data collected by the UK Biobank baseline questionnaire and interview. The possible causal effect of vitamin D on those outcomes was then explored by the MR two-stage method, inverse variance weighted MR and Egger’s regression, followed by sensitivity analyses. Results: In the updated systematic literature review of meta-analyses of observational studies or RCTs, only studies on new outcomes which had not been covered by the previous umbrella review were included. A total of 95 meta-analyses met the inclusion criteria. Among the included studies there were 66 meta-analyses of observational studies, and 29 meta-analyses of RCTs. 
Eighty-five new outcomes were explored by meta-analyses of observational studies, and 59 new outcomes were covered by meta-analyses of RCTs. In the systematic review of published Mendelian Randomization studies on vitamin D, a total of 29 studies were included. A causal role of 25(OH)D level was supported by MR analysis for the following outcomes: type 2 diabetes, total adiponectin, diastolic blood pressure, risk of hypertension, multiple sclerosis, Alzheimer’s disease, all-cause mortality, cancer mortality, mortality excluding cancer and cardiovascular events, ovarian cancer, HDL-cholesterol, triglycerides and cognitive functions. For the systematic literature review of published PheWAS studies and their methodology, a total of 45 studies were included. The processes for implementing a PheWAS study include the following steps: sample selection, predictor selection, phenotyping, statistical analysis and result interpretation. One of the main challenges is the definition of the phenotypes (i.e., the method of binning participants into different phenotype groups). In the phenotyping step, an ICD curated phenotyping was widely used by previous PheWAS, which I also used in my own analysis. By applying the ICD curated phenotyping, 1853 phenotype groups were defined in the participants I used. In PheWAS, only phenotype groups with more than 200 cases were analysed (920 phenotypes). In the PheWAS, only associations between rs17216707 (CYP24A1) and “calculus of ureter” (beta = -0.219, se = 0.045, P = 1.14 × 10^-6), “urinary calculus” (beta = -0.129, se = 0.027, P = 1.31 × 10^-6), “alveolar and parietoalveolar pneumonopathy” (beta = 0.418, se = 0.101, P = 3.53 × 10^-5) survived Bonferroni correction. Nine outcomes, including systolic blood pressure, diastolic blood pressure, body mass index, risk of hypertension, type 2 diabetes, ischemic heart disease, depression, non-vertebral fracture and all-cause mortality were explored in MR analyses. 
The MR analysis had more than 80% power for detecting a true odds ratio of 1.2 or larger for binary outcomes. None of the explored outcomes were statistically significant. Results from multiple MR methods and sensitivity analyses were consistent. Discussion: Vitamin D and its association with multiple outcomes has been widely studied. More than 230 outcomes have been linked with vitamin D by meta-analyses of observational studies and RCTs. In contrast, evidence from Mendelian Randomization studies is lacking. In particular I identified only 29 existing MR studies and only 13 outcomes were suggested to be causally related to vitamin D. In the systematic literature review of previous PheWAS studies, I summarized the applied methods, predictors and results. Although phenotyping based on ICD codes provided good performance and was widely applied by previous PheWAS studies, phenotyping can be improved if lab data, imaging data and medical notes can be incorporated. Alternative algorithms, which take advantage of deep learning and thus enable high precision phenotyping, need to be developed. From the PheWAS analysis, the score of vitamin D related genetic variants was not statistically significantly associated with any of the 920 phenotypes tested. In the single variant analysis, only rs17216707 (CYP24A1) was shown to be statistically significantly associated with calculus outcomes. Previous studies reported associations between vitamin D and hypercalcemia, hypercalciuria, nephrolithiasis and nephrocalcinosis, which may be due to the role of vitamin D in calcium homeostasis. In the MR analysis, I found no evidence of large to moderate (OR>1.2) causal associations of vitamin D with a very wide range of health outcomes. These included SBP, DBP, hypertension, T2D, IHD, BMI, depression, non-vertebral fracture and all-cause mortality, which have previously been proposed to be influenced by low vitamin D levels. 
Further, even larger studies, probably involving the joint analysis of data from several large biobanks with future IVs that explain a higher proportion of the trait variance, will be required to exclude smaller causal effects which could have public health importance because of the high population prevalence of low vitamin D levels in some populations.
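Two steps described in this abstract lend themselves to a short worked sketch: the weighted sum genetic risk score and the Bonferroni check on the reported P-values. The threshold below assumes a family-wise alpha of 0.05 divided by the 920 phenotypes tested, which the abstract does not state explicitly. The per-SNP weights and allele dosages are placeholders invented for illustration, not the GWAS effect estimates used in the thesis, and the function names are hypothetical.

```python
def weighted_score(dosages, weights):
    """Weighted sum score: sum over SNPs of weight * effect-allele dosage."""
    return sum(weights[snp] * dosages[snp] for snp in weights)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Placeholder weights (NOT real effect estimates) for the six listed SNPs.
placeholder_weights = {
    "rs3755967": 0.09, "rs12785878": 0.04, "rs10741657": 0.03,
    "rs17216707": 0.03, "rs10745742": 0.02, "rs8018720": 0.02,
}
# Hypothetical effect-allele counts (0, 1 or 2) for one participant.
example_dosages = {
    "rs3755967": 2, "rs12785878": 1, "rs10741657": 0,
    "rs17216707": 1, "rs10745742": 2, "rs8018720": 0,
}
example_score = weighted_score(example_dosages, placeholder_weights)

# P-values as reported in the abstract for rs17216707 (CYP24A1).
reported_p = {
    "calculus of ureter": 1.14e-6,
    "urinary calculus": 1.31e-6,
    "alveolar and parietoalveolar pneumonopathy": 3.53e-5,
}
# Assumed correction: alpha = 0.05 over 920 phenotype groups.
threshold = bonferroni_threshold(0.05, 920)
survivors = [name for name, p in reported_p.items() if p < threshold]
```

Under that assumption the threshold is roughly 5.4 × 10^-5, and all three reported associations fall below it, consistent with the statement that they survived correction.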

    Statistical methods for gene selection and genetic association studies

    This dissertation includes five chapters. A brief description of each chapter follows. In Chapter One, we propose a signed bipartite genotype and phenotype network (GPN) by linking phenotypes and genotypes based on their statistical associations. It offers a new way to investigate the genetic architecture among multiple correlated phenotypes and to explore where phenotypes might be related at a higher level of cellular and organismal organization. We show that multiple-phenotype association studies that use the proposed network are improved by incorporating genetic information into the phenotype clustering. In Chapter Two, we first apply the proposed GPN to GWAS summary statistics. Then, we assess contributions to constructing a well-defined GPN with a clear representation of genetic associations by comparing its network properties, including connectivity, centrality, and community structure, with those of a random network. The network topology annotations based on the sparse representations of GPN can be used to understand the disease heritability for highly correlated phenotypes. In applications to phenome-wide association studies, the proposed GPN can identify more significant pairs of genetic variants and phenotype categories. In Chapter Three, a powerful and computationally efficient gene-based association test is proposed, aggregating information from different gene-based association tests and also incorporating expression quantitative trait locus information. We show that the proposed method controls the type I error rates very well, has higher power in the simulation studies, and can identify more significant genes in the real data analyses. In Chapter Four, we develop six statistical selection methods based on penalized regression for inferring target genes of a transcription factor (TF). 
In this study, the proposed selection methods combine statistics, machine learning, and convex optimization approaches, and are highly effective in identifying the true target genes. The methods fill the gap left by the lack of appropriate methods for predicting target genes of a TF, and are instrumental for validating experimental results from ChIP-seq and DAP-seq and, conversely, for selecting and annotating TFs based on their target genes. In Chapter Five, we propose a gene selection approach that captures gene-level signals in network-based regression for case-control association studies with DNA sequence data or DNA methylation data, inspired by the popular gene-based association tests that use a weighted combination of genetic variants to capture the combined effect of individual genetic variants within a gene. We show that the proposed gene selection approach has higher true positive rates than traditional dimension reduction techniques in the simulation studies and selects potentially rheumatoid arthritis related genes that are missed by existing methods.

    Hypothesis exploration with visualization of variance.

    Background: The Consortium for Neuropsychiatric Phenomics (CNP) at UCLA was an investigation into the biological bases of traits such as memory and response inhibition phenotypes, to explore whether they are linked to syndromes including ADHD, bipolar disorder, and schizophrenia. An aim of the consortium was to move from traditional categorical approaches for psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. It represented an application of phenomics (the wide-scale, systematic study of phenotypes) to neuropsychiatry research. Results: This paper reports on a system for exploration of hypotheses in data obtained from the LA2K, LA3C, and LA5C studies in CNP. ViVA is a system for exploratory data analysis using novel mathematical models and methods for visualization of variance. An example of these methods is called VISOVA, a combination of visualization and analysis of variance, with the flavor of exploration associated with ANOVA in biomedical hypothesis generation. It permits visual identification of phenotype profiles (patterns of values across phenotypes) that characterize groups. Visualization enables screening and refinement of hypotheses about the variance structure of sets of phenotypes. Conclusions: The ViVA system was designed for exploration of neuropsychiatric hypotheses by interdisciplinary teams. Automated visualization in ViVA supports 'natural selection' on a pool of hypotheses, and permits deeper understanding of the statistical architecture of the data. A large-scale perspective of this kind could lead to better neuropsychiatric diagnostics.

    ANALYSIS OF CHROMOSOME SPATIAL ORGANIZATION DATA AND INTEGRATION WITH GENE MAPPING FOR COMPLEX TRAITS

    Studying the 3D chromosomal organization is crucial to understanding processes of transcription, histone modifications, and DNA repair and replication. Chromatin conformation shapes molecular functions beyond genetic variation at the sequence level and epigenetic footprints along the one-dimensional genome. DNA spatial organization features can influence molecular and organism-level phenotypes, from regulation of the expression of target genes (which can be megabases [Mb] away), to the development of various diseases including autoimmune diseases, neurological diseases, and cancer. The genome-wide chromosome conformation capture technology Hi-C captures genomic interactions of all loci, genome wide. Hi-C data allow us to investigate chromatin organization at various levels and resolutions, including the Mb resolution chromosome compartments and topologically associated domains (TADs), 10-40Kb resolution frequently interacting regions (FIREs), and 1-40Kb resolution chromatin loops and long-range chromatin interactions. FIREs have been demonstrated to provide valuable information for tissue or cell type-specific transcriptional regulation, a characteristic that distinguishes them from other domain features observed in the 3D genome. Until now, there has been no stand-alone software package for the detection of FIREs. To fill this gap, I first present a user-friendly R package to identify FIREs and clusters of FIREs (super-FIREs), accessible to the general scientific community. Next, I further explore the 3D genome and analyze brain tissue Hi-C data from 3 fetal and 3 adult human cortex samples with a total of 10.4 billion raw reads, the most deeply sequenced human brain tissue Hi-C datasets we are aware of to date. 
My analysis of these Hi-C data (identifying compartments, TAD boundaries, FIREs, and long-range chromatin interactions) generated mechanistic insights at GWAS loci for psychiatric disorders, brain-based traits, and neurological conditions, particularly schizophrenia. Lastly, as incorporating annotation can provide insights at GWAS loci, I annotate 148,019 variants identified in a recent trans-ethnic analysis for hematological traits in 746,667 participants. I present my findings in an R Shiny app, ABCx: Annotator for Blood Cell Traits, which highlights variants' 1D epigenomic signatures, impact on gene expression, and chromatin conformation information to aid in further functional follow-up. Doctor of Public Health

    Aurallinen migreeni – geneettiset alttiusvariantit [Migraine with aura: genetic susceptibility variants]

    Migraine is a complex headache disorder affecting approximately 15% of the adult population worldwide. It has a great impact on both individual patients and society. According to the Global Burden of Disease Study, migraine is one of the most costly and disabling neurological diseases. There are two main subtypes of migraine: migraine without aura and migraine with aura. Migraine without aura is the most common subtype of migraine. However, one-third of migraine patients experience neurological aura symptoms. In most cases, aura is visual, including scintillating scotoma and loss of vision type symptoms, but it can also be sensory, motor or result in speech disturbance. In hemiplegic migraine, a rare form of migraine with aura, the aura is characterized by motor weakness. The exact pathophysiological mechanisms underlying migraine are still unknown. Both family and twin studies have shown that migraine is hereditary. Recent genome-wide association studies (GWAS) have revealed the polygenic nature of common forms of migraine, while high-impact mutations have been found mainly in familial hemiplegic migraine (FHM). FHM is suggested to be a monogenic disorder with three major causative genes: CACNA1A, ATP1A2 and SCN1A. Genetic variants in these ion-transport/channel genes have also been associated with rare monogenic forms of epilepsy. The aim of this doctoral thesis was to identify genetic susceptibility factors for migraine with aura and migraine-epilepsy phenotype. We applied targeted and genome-wide approaches in a large and well-characterized Finnish migraine family sample (1,967 families with 8,937 family members). The first part of the thesis defined hemiplegic migraine as a clinically and genetically heterogeneous disease. In terms of headache characteristics and neurological aura symptoms, hemiplegic migraine patients appeared at the extreme end of the migraine with aura symptom spectrum. 
Our study also showed that mutations in CACNA1A, ATP1A2 and SCN1A are not the major cause of hemiplegic migraine in Finnish patients, as only 9% (4/45) of the studied FHM families and none of the sporadic cases (n=201) carried pathogenic exonic variants in these genes. These data suggest that there are additional genetic factors contributing to the hemiplegic migraine phenotype. In the second part of this thesis, we utilized data obtained from a previously published migraine GWAS to calculate polygenic risk scores (PRS) for 8,319 participants from the Finnish migraine family collection and 14,470 FINRISK population-based samples. Results showed that common polygenic variation significantly contributes to the familial aggregation of migraine. The polygenic burden was higher in familial migraine cases than in population cases. Furthermore, the polygenic burden was increased across all of the studied migraine with aura and migraine without aura subtypes in the family dataset compared with the population controls. Patients with typical migraine aura or hemiplegic migraine carried a higher load compared with patients having migraine without aura. Our findings are especially interesting considering that FHM has been suggested to be a monogenic disorder primarily driven by rare, high-impact variants. The third part of this thesis focused on a previously identified migraine-epilepsy susceptibility locus on chromosome 12q24.2-q24.3 identified in a large multi-generational Finnish migraine-epilepsy family including 120 individuals. We defined a 450 kbp haplotype that was shared among 12 out of 13 epilepsy patients. This segment covers almost the entire NCOR2 gene, which plays an important regulatory role during brain development. Interestingly, one of the 123 migraine risk loci recently reported by the International Headache Genetics Consortium also co-localized with this region. 
Our results suggest that NCOR2 could potentially have a role in both migraine and epilepsy and could thus contribute to the susceptibility to both of these paroxysmal brain diseases. However, further studies are needed to identify the actual causal variants. Overall, the results of this doctoral thesis highlight migraine as a clinically and genetically heterogeneous disease. Our results suggest that migraine with typical aura and hemiplegic migraine share a similar genetic background with a high polygenic load. Even FHM may not be a true monogenic disease, but rather a disease in which common risk variants, together with rare pathogenic variants and environmental risk factors, contribute to the disease outcome. Furthermore, our results provide genetic evidence from a large multi-generational Finnish family for potentially shared pathophysiology underlying both epilepsy and migraine. Migraine is a common paroxysmal headache disorder. In one-third of migraine patients, attacks are accompanied by an aura symptom, which can be a visual, speech, or sensory disturbance. In rare hemiplegic migraine, the aura appears as a symptom causing numbness and weakness on one side of the body. The pathophysiology of migraine is not yet fully understood. Large genetic studies have identified more than a hundred variable sites in the genome (genetic variants) that slightly increase migraine risk. Variants that on their own substantially affect disease risk have been identified only for hemiplegic migraine, which is regarded as a monogenic disease. This doctoral study investigated the genetic background of migraine with aura and of the co-occurrence of migraine and epilepsy using a large Finnish migraine family collection (1,967 families, 8,937 individuals). The results of the first substudy showed that hemiplegic migraine is part of the symptom continuum of migraine with aura, although the symptoms of hemiplegic migraine patients are on average more severe than those of patients with typical migraine with aura. 
The results also showed that the three known hemiplegic migraine susceptibility genes (CACNA1A, ATP1A2 and SCN1A) alone are not sufficient to explain the occurrence of hemiplegic migraine in Finnish families. A likely disease-causing variant in these genes was found in only 9% of the studied families (4/45). The results of the second substudy showed that the combined effect of many genetic risk factors explains the occurrence of migraine in families. Differences between migraine types were also observed. The burden formed by common genetic risk factors was greater in migraine with aura, including individuals with hemiplegic migraine, than in migraine without aura. Previously, hemiplegic migraine was thought to be caused solely by rare variants. The final substudy focused on one large family and on the chromosomal region 12q24.31 identified in it that predisposes to migraine and epilepsy. Follow-up analyses indicated that NCOR2, a gene involved in brain development, is the most likely epilepsy susceptibility gene in this family. One of the migraine risk loci lies in the same genomic region, which supports a role for the region in migraine as well. Overall, the results of this thesis suggest that migraine is a clinically and genetically heterogeneous disease whose development is influenced by multiple genetic variants together with external factors. The clinical symptoms of migraine with aura form a continuum in which the symptoms of hemiplegic migraine are the most severe in duration, number, and type. Surprisingly, the combined effect of genetic risk factors is also greatest in patients with migraine with aura and hemiplegic migraine.