Phenome-Wide Association Study (PheWAS) for Detection of Pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network
Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results had significant associations for two or more PAGE study sites with consistent direction of effect with a significance threshold of p<0.01 for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype-phenotype associations, 26 represented phenotypes closely related to previously known genotype-phenotype associations, and 33 represented potentially novel genotype-phenotype associations with pleiotropic effects. The majority of the potentially novel results were for single PheWAS phenotype-classes, for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA) a PheWAS association was identified for hemoglobin levels in AA. 
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension-related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and exploration of the genetic architecture of complex traits.
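The cross-site criterion described in this abstract (p<0.01 at two or more study sites, with the same direction of effect, for the same racial/ethnic group, SNP, and phenotype-class) can be sketched as a simple filter. This is a minimal illustration and not the PAGE pipeline: the record layout, the function name `replicated_results`, and every value in `example_results` are hypothetical.

```python
from collections import defaultdict


def replicated_results(results, p_threshold=0.01, min_sites=2):
    """Return (group, snp, phenotype_class) keys meeting the replication rule.

    A key passes when at least `min_sites` distinct sites report p < p_threshold
    with the same sign of effect. The record fields are assumptions.
    """
    # For each key, collect the sites with a significant positive / negative effect.
    sites_by_key = defaultdict(lambda: {1: set(), -1: set()})
    for r in results:
        if r["p"] < p_threshold and r["beta"] != 0:
            key = (r["group"], r["snp"], r["phenotype_class"])
            sign = 1 if r["beta"] > 0 else -1
            sites_by_key[key][sign].add(r["site"])
    return sorted(
        key
        for key, by_sign in sites_by_key.items()
        if max(len(by_sign[1]), len(by_sign[-1])) >= min_sites
    )


# Made-up records for illustration only (not PAGE data).
example_results = [
    {"site": "site1", "group": "AA", "snp": "rsA", "phenotype_class": "classX", "beta": 0.12, "p": 0.004},
    {"site": "site2", "group": "AA", "snp": "rsA", "phenotype_class": "classX", "beta": 0.08, "p": 0.009},
    # Significant at two sites, but with opposite directions of effect: rejected.
    {"site": "site1", "group": "EA", "snp": "rsB", "phenotype_class": "classY", "beta": 0.05, "p": 0.003},
    {"site": "site2", "group": "EA", "snp": "rsB", "phenotype_class": "classY", "beta": -0.04, "p": 0.002},
    # Consistent direction, but one site misses the threshold: rejected.
    {"site": "site1", "group": "EA", "snp": "rsC", "phenotype_class": "classZ", "beta": 0.07, "p": 0.001},
    {"site": "site2", "group": "EA", "snp": "rsC", "phenotype_class": "classZ", "beta": 0.06, "p": 0.020},
]

print(replicated_results(example_results))
```

Grouping by racial/ethnic group before checking direction mirrors the stratified design described above; pooling across groups would be a different analysis.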
Statistical Methods for Analyzing Multivariate Phenotypes and Detecting Rare Variant Associations
This dissertation includes four papers, one per chapter.
In chapter 1, I compared the performance of eight multivariate phenotype association tests. The motivation for this power comparison is as follows. For nearly 15 years, genome-wide association studies (GWAS) have been widely used to identify genetic variants associated with human diseases and traits. GWAS typically investigate genetic variants for a predefined phenotype and thus fail to identify weak but important effects. In recent years, many multivariate association tests have been developed. However, a comprehensive summary of such approaches is lacking. To fill this gap, I carried out this power comparison. The results show that no method is consistently more powerful than the others. More powerful methods are still in great demand.
In chapter 2, I proposed a Weighted Combination of multiple Phenotypes approach (WCmulP) for testing multiple correlated phenotypes and one genetic variant of interest. WCmulP linearly combines the multiple phenotypes with optimal weights such that the score test statistic is maximized. I compare WCmulP with other widely used tests and conduct extensive simulation studies as well as real data analysis to evaluate the performance of these methods. The results show that WCmulP outperforms the compared methods in most of the simulation scenarios and real data analysis.
With the availability of electronic health records (EHR), thousands of clinical phenotypes can be measured and collected systematically. As a result, phenome-wide association studies (PheWAS) have emerged to detect variants associated with a broad spectrum of phenotypes. However, current PheWAS are intrinsically univariate tests, which investigate phenotypes one at a time. Genuine PheWAS that simultaneously test a wide range of phenotypes need to be developed. In chapter 3, I proposed a novel PheWAS approach, referred to as PheCLC (PheWAS using clustering linear combination), to examine genetic variation associated with up to thousands of phenotypes. PheCLC jointly analyzes a wide spectrum of human phenotypes and classifies them into different categories based on the International Classification of Diseases (ICD) codes. The simulation results show that PheCLC controls type I error rates and is much more powerful than the traditional multivariate approaches.
To date, GWAS have identified thousands of common variants associated with human diseases. However, these common variants explain only a small portion of the phenotypic variance. Many studies have shown that rare variants could explain a substantial part of the missing heritability. In chapter 4, I developed a rare variant association method for family-based designs, where rare variants can be enriched compared to population-based designs. I applied the proposed method and two other family-based tests to the Genetic Analysis Workshop 19 (GAW19) dataset, and the results show that our method can identify more genes with power greater than 40% than the other two methods.
Phenome wide association study of vitamin D genetic variants in the UK Biobank cohort
Introduction
Vitamin D status is an important public health issue due to the high prevalence of
vitamin D insufficiency and deficiency, especially in high latitude areas. Furthermore,
it has been reported to be associated with a number of diseases. In a previous umbrella
review of meta-analyses of randomized clinical trials (RCTs) and of observational
studies, it was found that plasma/serum 25-hydroxyvitamin D (25(OH)D) or
supplemental vitamin D has been linked to more than 130 unique health outcomes.
However, the majority of the studies yielded conflicting results and no association was
convincing.
Aim and Objectives
The aim of my PhD was to comprehensively explore the association between vitamin
D and multiple outcomes. The specific objectives were to: 1) update the umbrella
review of meta-analysis of observational studies or randomized controlled trials on
associations between vitamin D and health outcomes published between 2014 and
2018; 2) conduct a systematic literature review of previous Mendelian Randomization
studies on causal associations between vitamin D and all outcomes; 3) conduct a
systematic literature review of published phenome wide association studies,
summarizing the methods, results and predictors; 4) create a polygenic risk score of
vitamin D related genetic variants, weighted by their effect estimates from the most
recent genome wide association study; 5) encode phenotype groups based on
electronic medical records of participants; 6) study the associations between vitamin
D related SNPs and the whole spectrum of health outcomes, defined by electronic
medical records utilising the UK Biobank study; 7) explore the causal effect of 25-
hydroxyvitamin D level on health outcomes by applying novel instrumental variable
methods.
Methods
First I updated the vitamin D umbrella review published in 2015, by summarizing the
evidence from meta-analyses of observational studies and meta-analyses of RCTs
published between 2014 and 2018. I also performed a systematic literature review of
all previous Mendelian Randomization studies on the effect of vitamin D on all health
outcomes, as well as a systematic review of all published PheWAS studies and the
methodology they applied. Then I conducted original data analysis in a large
prospective population-based cohort, the UK Biobank, which includes more than
500,000 participants. A 25(OH)D genetic risk score (weighted sum score of 6 serum
25(OH)D-related SNPs: rs3755967, rs12785878, rs10741657, rs17216707,
rs10745742 and rs8018720, as identified by the largest genome wide association study
of 25(OH)D levels) was constructed to be used as the instrumental variable. I used a
phenotyping algorithm to code the electronic medical records (EMR) of UK Biobank
participants into 1853 distinct disease categories and I then ran the PheWAS analysis
to test the associations between the 25(OH)D genetic risk score and 950 disease
outcome groups (i.e. outcomes with more than 200 cases). For phenotypes found to
show a statistically significant association with 25(OH)D levels in the PheWAS or
phenotypes which were found to be convincing or highly suggestive in previous
studies, I developed an extended case definition by incorporating self-reported data
collected by UK Biobank baseline questionnaire and interview. The possible causal
effect of vitamin D on those outcomes was then explored by the MR two-stage method,
inverse variance weighted MR and Egger's regression, followed by sensitivity
analyses.
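As a rough illustration of the weighted sum score described in these Methods, the sketch below multiplies each SNP's effect-allele count by a per-SNP weight and adds the products. The six rsIDs are the ones named above; the weights and the genotype are made-up placeholders (not the published GWAS effect estimates), and the function name `weighted_risk_score` is hypothetical.

```python
def weighted_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 per SNP).

    `weights` maps SNP id -> effect estimate; `dosages` maps SNP id -> allele count.
    Raises ValueError if a weighted SNP has no genotype.
    """
    missing = sorted(set(weights) - set(dosages))
    if missing:
        raise ValueError(f"missing genotypes for: {', '.join(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Placeholder weights for illustration only; real weights would come from the
# GWAS effect estimates, which are not given in this abstract.
example_weights = {
    "rs3755967": 0.1,
    "rs12785878": 0.2,
    "rs10741657": 0.3,
    "rs17216707": 0.4,
    "rs10745742": 0.5,
    "rs8018720": 0.6,
}

# Made-up genotype for one hypothetical participant.
example_dosages = {
    "rs3755967": 0,
    "rs12785878": 1,
    "rs10741657": 2,
    "rs17216707": 1,
    "rs10745742": 0,
    "rs8018720": 2,
}

score = weighted_risk_score(example_dosages, example_weights)
print(round(score, 6))
```

A score of this form is what would then be tested against each phenotype group in the PheWAS and used as the instrument in the MR analyses.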
Results
In the updated systematic literature review of meta-analyses of observational studies
or RCTs, only studies on new outcomes which had not been covered by the previous
umbrella review were included. A total of 95 meta-analyses met the inclusion criteria.
Among the included studies there were 66 meta-analyses of observational studies, and
29 meta-analyses of RCTs. Eighty-five new outcomes were explored by meta-analyses
of observational studies, and 59 new outcomes were covered by meta-analyses of
RCTs.
In the systematic review of published Mendelian Randomization studies on vitamin D,
a total of 29 studies were included. A causal role of 25(OH)D level was supported by
MR analysis for the following outcomes: type 2 diabetes, total adiponectin, diastolic
blood pressure, risk of hypertension, multiple sclerosis, Alzheimer's disease, all-cause
mortality, cancer mortality, mortality excluding cancer and cardiovascular events,
ovarian cancer, HDL-cholesterol, triglycerides and cognitive functions.
For the systematic literature review of published PheWAS studies and their
methodology, a total of 45 studies were included. The processes for implementing a
PheWAS study include the following steps: sample selection, predictor selection,
phenotyping, statistical analysis and result interpretation. One of the main challenges
is the definitions of the phenotypes (i.e., the method of binning participants into
different phenotype groups). In the phenotyping step, an ICD curated phenotyping was
widely used by previous PheWAS, which I also used in my own analysis.
By applying the ICD curated phenotyping, 1853 phenotype groups were defined in the
participants I used. In the PheWAS, only phenotype groups with more than 200 cases were
analysed (920 phenotypes). Only the associations between rs17216707
(CYP24A1) and "calculus of ureter" (beta = -0.219, se = 0.045, P = 1.14 × 10^-6), "urinary
calculus" (beta = -0.129, se = 0.027, P = 1.31 × 10^-6) and "alveolar and parietoalveolar
pneumonopathy" (beta = 0.418, se = 0.101, P = 3.53 × 10^-5) survived Bonferroni
correction.
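The Bonferroni step can be checked with a few lines of arithmetic. The sketch assumes a family-wise alpha of 0.05 divided by the 920 phenotypes tested for a variant; the abstract states neither the alpha nor the exact number of tests used in the correction, so both are assumptions. The three P values are the ones reported above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumptions: alpha = 0.05 and 920 tests (one per phenotype group analysed).
threshold = bonferroni_threshold(0.05, 920)

# P values reported for rs17216707 (CYP24A1) in the abstract.
reported_p = {
    "calculus of ureter": 1.14e-6,
    "urinary calculus": 1.31e-6,
    "alveolar and parietoalveolar pneumonopathy": 3.53e-5,
}

survivors = [name for name, p in reported_p.items() if p < threshold]
print(f"threshold = {threshold:.3g}")
print(survivors)
```

Under these assumptions the threshold is about 5.4 × 10^-5, so all three reported associations fall below it. A correction over every SNP-phenotype pair would be stricter.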
Nine outcomes, including systolic blood pressure, diastolic blood pressure, body mass
index, risk of hypertension, type 2 diabetes, ischemic heart disease, depression, non-vertebral
fracture and all-cause mortality were explored in MR analyses. The MR
analysis had more than 80% power for detecting a true odds ratio of 1.2 or larger for
binary outcomes. None of the explored outcomes were statistically significant. Results
from multiple MR methods and sensitivity analyses were consistent.
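For readers unfamiliar with inverse variance weighted MR, one of the methods named in this abstract, the sketch below shows the standard fixed-effect estimator built from per-SNP summary statistics. It is not the thesis code: the function name `ivw_estimate` and all numbers are made up for illustration, and the standard error uses the usual first-order approximation that ignores uncertainty in the SNP-exposure estimates.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted MR estimate and its standard error.

    Equivalent to a weighted regression of SNP-outcome effects on SNP-exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    numerator = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    if denominator == 0:
        raise ValueError("SNP-exposure effects are all zero")
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Made-up summary statistics: each outcome effect is exactly half the exposure effect.
example_bx = [0.1, 0.2, 0.3]
example_by = [0.05, 0.10, 0.15]
example_se = [0.01, 0.02, 0.01]

estimate, std_error = ivw_estimate(example_bx, example_by, example_se)
print(round(estimate, 6), round(std_error, 6))
```

Because every made-up outcome effect is half the corresponding exposure effect, the estimate recovers a causal effect of 0.5 per unit of exposure.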
Discussion
Vitamin D and its association with multiple outcomes has been widely studied. More
than 230 outcomes have been linked with vitamin D by meta-analyses of observational
studies and RCTs. In contrast, evidence from Mendelian Randomization studies
is lacking. In particular I identified only 20 existing MR studies and only 13 outcomes
were suggested to be causally related to vitamin D. In the systematic literature review
of previous PheWAS studies, I summarized the applied methods, predictors and results.
Although phenotyping based on ICD codes provided good performance and was
widely applied by previous PheWAS studies, phenotyping can be improved if lab data,
imaging data and medical notes can be incorporated. Alternative algorithms that
take advantage of deep learning and thus enable high-precision phenotyping need to
be developed.
From the PheWAS analysis, the score of vitamin D related genetic variants was not
statistically significantly associated with any of the 920 phenotypes tested. In the
single variant analysis, only rs17216707 (CYP24A1) showed a statistically significant
association with calculus outcomes. Previous studies reported associations
between vitamin D and hypercalcemia, hypercalciuria, nephrolithiasis and
nephrocalcinosis, which may be due to the role of vitamin D in calcium homeostasis.
In the MR analysis, I found no evidence of large to moderate (OR>1.2) causal
effects of vitamin D on a very wide range of health outcomes. These included
SBP, DBP, hypertension, T2D, IHD, BMI, depression, non-vertebral fracture and all-cause
mortality, which have previously been proposed to be influenced by low vitamin
D levels. Further, even larger studies, probably involving the joint analysis of data
from several large biobanks with future IVs that explain a higher proportion of the trait
variance, will be required to exclude smaller causal effects which could have public
health importance because of the high population prevalence of low vitamin D levels
in some populations.
Statistical methods for gene selection and genetic association studies
This dissertation includes five chapters, briefly described as follows.
In Chapter One, we propose a signed bipartite genotype and phenotype network (GPN) that links phenotypes and genotypes based on their statistical associations. It provides new insight into the genetic architecture of multiple correlated phenotypes and into where phenotypes might be related at a higher level of cellular and organismal organization. We show that multiple-phenotype association studies based on the proposed network are improved by incorporating genetic information into the phenotype clustering.
In Chapter Two, we first apply the proposed GPN to GWAS summary statistics. Then, we assess what contributes to constructing a well-defined GPN with a clear representation of genetic associations by comparing its network properties, including connectivity, centrality, and community structure, with those of a random network. The network topology annotations based on the sparse representations of GPN can be used to understand the disease heritability for highly correlated phenotypes. In applications to phenome-wide association studies, the proposed GPN can identify more significant pairs of genetic variants and phenotype categories.
In Chapter Three, a powerful and computationally efficient gene-based association test is proposed, aggregating information from different gene-based association tests and also incorporating expression quantitative trait locus information. We show that the proposed method controls the type I error rates very well and has higher power in the simulation studies and can identify more significant genes in the real data analyses.
In Chapter Four, we develop six statistical selection methods based on penalized regression for inferring target genes of a transcription factor (TF). The proposed selection methods combine statistics, machine learning, and convex optimization approaches, and have great efficacy in identifying the true target genes. The methods fill the gap left by the lack of appropriate methods for predicting target genes of a TF, and are instrumental for validating experimental results from ChIP-seq and DAP-seq and, conversely, for selecting and annotating TFs based on their target genes.
In Chapter Five, we propose a gene selection approach that captures gene-level signals in network-based regression for case-control association studies with DNA sequence data or DNA methylation data. It is inspired by the popular gene-based association tests that use a weighted combination of genetic variants to capture the combined effect of individual genetic variants within a gene. We show that the proposed gene selection approach has higher true positive rates than traditional dimension reduction techniques in the simulation studies and selects potentially rheumatoid arthritis-related genes that are missed by existing methods.
Hypothesis exploration with visualization of variance.
Background: The Consortium for Neuropsychiatric Phenomics (CNP) at UCLA was an investigation into the biological bases of traits such as memory and response inhibition phenotypes, to explore whether they are linked to syndromes including ADHD, bipolar disorder, and schizophrenia. An aim of the consortium was to move from traditional categorical approaches for psychiatric syndromes towards more quantitative approaches based on large-scale analysis of the space of human variation. It represented an application of phenomics (the wide-scale, systematic study of phenotypes) to neuropsychiatry research.
Results: This paper reports on a system for exploration of hypotheses in data obtained from the LA2K, LA3C, and LA5C studies in CNP. ViVA is a system for exploratory data analysis using novel mathematical models and methods for visualization of variance. An example of these methods is called VISOVA, a combination of visualization and analysis of variance, with the flavor of exploration associated with ANOVA in biomedical hypothesis generation. It permits visual identification of phenotype profiles (patterns of values across phenotypes) that characterize groups. Visualization enables screening and refinement of hypotheses about the variance structure of sets of phenotypes.
Conclusions: The ViVA system was designed for exploration of neuropsychiatric hypotheses by interdisciplinary teams. Automated visualization in ViVA supports 'natural selection' on a pool of hypotheses, and permits deeper understanding of the statistical architecture of the data. A large-scale perspective of this kind could lead to better neuropsychiatric diagnostics.
ANALYSIS OF CHROMOSOME SPATIAL ORGANIZATION DATA AND INTEGRATION WITH GENE MAPPING FOR COMPLEX TRAITS
Studying the 3D chromosomal organization is crucial to understanding processes of transcription, histone modifications, and DNA repair and replication. Chromatin conformation shapes molecular functions beyond genetic variation at the sequence level and epigenetic footprints along the one-dimensional genome. DNA spatial organization features can influence molecular and organism-level phenotypes, from regulation of the expression of target genes (which can be megabases [Mb] away), to the development of various diseases including autoimmune diseases, neurological diseases, and cancer.
The genome-wide chromosome conformation capture technology Hi-C captures genomic interactions of all loci, genome wide. Hi-C data allows us to investigate chromatin organization at various levels and resolutions, including the Mb resolution chromosome compartments and topologically associated domains (TADs), 10-40Kb resolution frequently interacting regions (FIREs), and 1-40Kb resolution chromatin loops and long-range chromatin interactions. FIREs have been demonstrated to provide valuable information for tissue or cell type-specific transcriptional regulation, characteristics unique from other domain features observed in the 3D genome. Until now, there has been no stand-alone software package for the detection of FIREs. To fill this gap, I first present a user-friendly R package to identify FIREs and clusters of FIREs (super-FIREs), accessible to the general scientific community.
Next, I further explore the 3D genome and analyze brain tissue Hi-C data from 3 fetal and 3 adult human cortex samples with a total of 10.4 billion raw reads, the most deeply sequenced human brain tissue Hi-C datasets we are aware of to date. My analysis of this Hi-C data (identifying compartments, TAD boundaries, FIREs, and long-range chromatin interactions) generated mechanistic insights at GWAS loci for psychiatric disorders, brain-based traits, and neurological conditions, particularly schizophrenia.
Lastly, as incorporating annotation can provide insights at GWAS loci, I annotate 148,019 variants identified in a recent trans-ethnic analysis for hematological traits in 746,667 participants. I present my findings in an R Shiny app, ABCx: Annotator for Blood Cell Traits, which highlights variants' 1D epigenomic signatures, impact on gene expression, and chromatin conformation information to aid in further functional follow up.
Doctor of Public Healt
Aurallinen migreeni - geneettiset alttiusvariantit (Migraine with aura: genetic susceptibility variants)
Migraine is a complex headache disorder affecting approximately 15% of the adult population worldwide. It has a great impact on both individual patients and society. According to the Global Burden of Disease Study, migraine is one of the most costly and disabling neurological diseases.
There are two main subtypes of migraine: migraine without aura and migraine with aura. Migraine without aura is the most common subtype of migraine. However, one-third of migraine patients experience neurological aura symptoms. In most cases, aura is visual, including scintillating scotoma and loss of vision type symptoms, but it can also be sensory, motor or result in speech disturbance. In hemiplegic migraine, a rare form of migraine with aura, the aura is characterized by motor weakness.
The exact pathophysiological mechanisms underlying migraine are still unknown. Both family and twin studies have shown that migraine is hereditary. Recent genome-wide association studies (GWAS) have revealed the polygenic nature of common forms of migraine, while high-impact mutations have been found mainly in familial hemiplegic migraine (FHM). FHM is suggested to be a monogenic disorder with three major causative genes: CACNA1A, ATP1A2 and SCN1A. Genetic variants in these ion-transport/channel genes have also been associated with rare monogenic forms of epilepsy.
The aim of this doctoral thesis was to identify genetic susceptibility factors for migraine with aura and migraine-epilepsy phenotype. We applied targeted and genome-wide approaches in a large and well-characterized Finnish migraine family sample (1,967 families with 8,937 family members).
The first part of the thesis defined hemiplegic migraine as a clinically and genetically heterogeneous disease. In terms of headache characteristics and neurological aura symptoms, hemiplegic migraine patients appeared at the extreme end of the migraine with aura symptom spectrum. Our study also showed that mutations in CACNA1A, ATP1A2 and SCN1A are not the major cause of hemiplegic migraine in Finnish patients, as only 9% (4/45) of the studied FHM families and none of the sporadic cases (n=201) carried pathogenic exonic variants in these genes. These data suggest that there are additional genetic factors contributing to the hemiplegic migraine phenotype.
In the second part of this thesis, we utilized data obtained from a previously published migraine GWAS to calculate polygenic risk scores (PRS) for 8,319 participants from the Finnish migraine family collection and 14,470 FINRISK population-based samples. Results showed that common polygenic variation significantly contributes to the familial aggregation of migraine. The polygenic burden was higher in familial migraine cases than in population cases. Furthermore, the polygenic burden was increased across all of the studied migraine with aura and migraine without aura subtypes in the family dataset compared with the population controls. Patients with typical migraine aura or hemiplegic migraine carried a higher load compared with patients having migraine without aura. Our findings are especially interesting considering that FHM has been suggested to be a monogenic disorder primarily driven by rare, high-impact variants.
The third part of this thesis focused on a migraine-epilepsy susceptibility locus on chromosome 12q24.2-q24.3, previously identified in a large multi-generational Finnish migraine-epilepsy family including 120 individuals. We defined a 450 kbp haplotype that was shared among 12 out of 13 epilepsy patients. This segment covers almost the entire NCOR2 gene, which plays an important regulatory role during brain development. Interestingly, one of the 123 migraine risk loci recently reported by the International Headache Genetics Consortium also co-localized with this region. Our results suggest that NCOR2 could potentially have a role in both migraine and epilepsy and could thus contribute to the susceptibility of both of these paroxysmal brain diseases. However, further studies are needed to identify the actual causal variants.
Overall, the results of this doctoral thesis highlight migraine as a clinically and genetically heterogeneous disease. Our results suggest that migraine with typical aura and hemiplegic migraine share a similar genetic background with a high polygenic load. Even FHM may not be a true monogenic disease, but rather a disease in which common risk variants, together with rare pathogenic variants and environmental risk factors, contribute to the disease outcome. Furthermore, our results provide genetic evidence from a large multi-generational Finnish family for potentially shared pathophysiology underlying both epilepsy and migraine.
Migraine is a common paroxysmal headache disorder. In one-third of migraine patients, attacks are accompanied by an aura symptom, which can be a visual, speech or sensory disturbance. In the rare hemiplegic migraine, the aura appears as numbness and weakness on one side of the body. The pathophysiology of migraine is not yet fully understood. Large genetic studies have identified more than one hundred variable sites in the genome (gene variants) that each slightly increase migraine risk. Variants that on their own substantially affect disease risk have been identified only for hemiplegic migraine, which is regarded as a monogenic disease.
This doctoral study investigated the genetic background of migraine with aura and of the co-occurrence of migraine and epilepsy, using a large Finnish migraine family collection (1,967 families, 8,937 individuals).
The results of the first sub-study showed that hemiplegic migraine is part of the symptom continuum of migraine with aura, although the symptoms of patients with hemiplegic migraine are on average more severe than those of patients with typical migraine with aura. The results also showed that the three known hemiplegic migraine susceptibility genes (CACNA1A, ATP1A2 and SCN1A) are not sufficient on their own to explain the occurrence of hemiplegic migraine in Finnish families. A likely pathogenic variant in these genes was found in only 9% of the studied families (4/45).
The results of the second sub-study showed that the combined effect of many genetic risk factors explains the occurrence of migraine in families. Differences between migraine types were also observed. The burden formed by common genetic risk factors was greater in migraine with aura, including individuals with hemiplegic migraine, than in migraine without aura. Hemiplegic migraine was previously thought to be caused solely by rare variants.
The final sub-study focused on one large family and on the chromosomal region 12q24.31 identified in it as predisposing to migraine and epilepsy. Follow-up analyses pointed to NCOR2, a gene that affects brain development, as the most likely epilepsy susceptibility gene in this family. One of the migraine risk loci lies in the same genomic region, which supports a role for the region in migraine as well.
Overall, the results of this doctoral study suggest that migraine is a clinically and genetically heterogeneous disease whose development is influenced by several gene variants together with external factors. The clinical symptoms of migraine with aura form a continuum in which the symptoms of hemiplegic migraine are the most severe in duration, number and type. Surprisingly, the combined effect of genetic risk factors is also greatest in patients with migraine with aura and hemiplegic migraine.