19 research outputs found

    Phenome-wide association study (PheWAS) on the genetic determinants of serum urate level and disease outcomes in UK Biobank

    Introduction: Elevated serum uric acid (SUA) concentration, known as hyperuricaemia, is a common abnormality in individuals with metabolic disorders. There is increasing evidence linking high SUA level to an increased risk of a wide range of clinical disorders, including hypertension, cardiovascular diseases (CVD), chronic renal diseases and metabolic syndrome. Although considerable research effort has gone into understanding the pathogenic pathways of high SUA level and the related clinical consequences, causal relationships have not been established except for gout. As with other complex traits, genetic determinants play a substantial role (an estimated heritability of 40-70%) in the regulation of SUA level. Investigating the role of SUA-related genetic variants in various diseases might therefore provide evidence for the hypothesis that links uric acid to clinical disorders. Methods: An umbrella review was carried out first to provide a comprehensive overview of the range of health outcomes related to SUA level by incorporating evidence from systematic reviews and meta-analyses of observational studies, meta-analyses of randomised controlled trials (RCTs), and Mendelian randomisation (MR) studies. The umbrella review summarised the range of related health outcomes and the magnitude, direction and significance of the identified associations and effects, and classified the evidence into four categories (class I [convincing], II [highly suggestive], III [suggestive], and IV [weak]) with assessment of multiple sources of bias. Then, an MR-PheWAS (phenome-wide association study combined with Mendelian randomisation) was performed to investigate the associations between the 31 SUA genetic risk variants and a very wide range of disease outcomes using the interim release data of UK Biobank (n=120,091). The SUA genetic risk loci were employed as instruments individually.
The phenome framework was defined by the PheCODE schema using the International Classification of Diseases (ICD) diagnosis codes documented in the health records of UK Biobank. A phenome-wide association test was performed first to identify any association between the SUA genetic risk loci and the phenome; MR design and HEIDI (heterogeneity in dependent instruments) tests were then applied to distinguish PheWAS associations that were due to causality, pleiotropy or genetic linkage. To validate the MR-PheWAS findings, an enlarged phenome-wide Mendelian randomisation (PWMR) analysis was performed using data from the full UK Biobank cohort (n=339,256). A weighted polygenic risk score (GRS), incorporating effect estimates of multiple genetic risk loci, was employed as a proxy for SUA level. The phenome framework was defined by both the PheCODE schema and an alternative tree-structured phenotypic model (TreeWAS). Significant associations from these analyses were taken forward for replication in different populations by analysing data from various GWAS consortia documented in the MR-Base database. Sensitivity analyses examining the pleiotropic effects of urate genetic risk loci on a set of metabolic traits were performed to explore any causal effects and pleiotropic associations. Results: The umbrella review included 101 articles and comprised 144 meta-analyses of observational studies, 31 meta-analyses of randomised controlled trials and 107 Mendelian randomisation studies. This body of evidence explored 136 unique health outcomes and reported convincing (class I) evidence for the causal role of SUA in gout and nephrolithiasis. Furthermore, highly suggestive (class II) evidence was reported for five health outcomes, in which high SUA level was associated with increased risk of heart failure, hypertension, impaired fasting glucose or diabetes, chronic kidney disease, and coronary heart disease mortality in the general population.
The remaining 129 associations were classified as either suggestive or weak. The MR-PheWAS (using the interim release cohort) identified 25 disease groups/outcomes associated with SUA genetic risk loci after multiple testing correction (p<8.6×10^-5). The MR IVW (inverse variance weighted) analysis implicated a causal role of SUA level in three disease groups: inflammatory polyarthropathies (OR=1.22, 95% CI: 1.11 to 1.34), hypertensive disease (OR=1.08, 95% CI: 1.03 to 1.14) and disorders of metabolism (OR=1.07, 95% CI: 1.01 to 1.14); and four disease outcomes: gout (OR=4.88, 95% CI: 3.91 to 6.09), essential hypertension (OR=1.08, 95% CI: 1.03 to 1.14), myocardial infarction (OR=1.16, 95% CI: 1.03 to 1.30) and coeliac disease (OR=1.41, 95% CI: 1.05 to 1.89). After balancing pleiotropic effects in MR Egger analysis, only gout and its encompassing disease group of inflammatory polyarthropathies were considered to be causally associated with SUA level. The analysis also highlighted a locus (ATXN2/SH2B3) that may influence SUA level and multiple cardiovascular and autoimmune diseases via pleiotropy. The PWMR analysis, using data from the full UK Biobank cohort (n=339,256) and examining the association with 1,431 disease outcomes, identified 13 phecodes that were associated with the weighted GRS of SUA level at the PheWAS significance threshold (p<3.4×10^-4).
These phecodes represent 4 disease groups: inflammatory polyarthropathies (OR=1.28; 95% CI: 1.21 to 1.35; p=4.97×10^-19), hypertensive disease (OR=1.08; 95% CI: 1.05 to 1.11; p=6.02×10^-7), circulatory disease (OR=1.05; 95% CI: 1.02 to 1.07; p=3.29×10^-4) and metabolic disorders (OR=1.07; 95% CI: 1.03 to 1.11; p=3.33×10^-4), and 9 disease outcomes: gout (OR=5.37; 95% CI: 4.67 to 6.18; p=4.27×10^-123), gouty arthropathy (OR=5.11; 95% CI: 2.45 to 10.66; p=1.39×10^-5), pyogenic arthritis (OR=2.10; 95% CI: 1.41 to 3.14; p=2.87×10^-4), essential hypertension (OR=1.08; 95% CI: 1.05 to 1.11; p=6.62×10^-7), coronary atherosclerosis (OR=1.10; 95% CI: 1.05 to 1.15; p=1.17×10^-5), ischaemic heart disease (OR=1.10; 95% CI: 1.05 to 1.15; p=1.73×10^-5), chronic ischaemic heart disease (OR=1.10; 95% CI: 1.05 to 1.15; p=1.52×10^-5), myocardial infarction (OR=1.15; 95% CI: 1.07 to 1.23; p=5.23×10^-5), and hypercholesterolaemia (OR=1.08; 95% CI: 1.04 to 1.13; p=3.34×10^-4). Findings from the TreeWAS analysis were generally consistent with those of PheWAS, with a number of additional sub-phenotypes identified. Results from IVW MR suggested that genetically determined high serum urate level was associated with increased risk of gout (OR=4.53; 95% CI: 3.64 to 5.64; p=9.66×10^-42), CHD (OR=1.10; 95% CI: 1.02 to 1.19; p=0.009), myocardial infarction (OR=1.11; 95% CI: 1.02 to 1.20; p=0.011) and decreased level of HDL-c (OR=0.93; 95% CI: 0.88 to 0.98; p=0.004), but had no effect on RA (OR=0.92; 95% CI: 0.84 to 1.01; p=0.085) and ischaemic stroke (OR=1.03; 95% CI: 0.93 to 1.14; p=0.582). Egger MR indicated pleiotropic effects on the causal estimates of DBP (P_pleiotropy=0.014), SBP (P_pleiotropy=0.003), CHD (P_pleiotropy=0.008), myocardial infarction (P_pleiotropy=0.008) and HDL-c (P_pleiotropy=0.016). When balancing out the potential pleiotropic effects in Egger MR, a causal effect could only be verified for gout (OR=4.17; 95% CI: 3.03 to 5.74; P_effect=1.27×10^-9; P_pleiotropy=0.485).
Sensitivity analyses on the GRSs of different groups of pleiotropic loci support the inference that pleiotropic effects of genetic variants on urate and metabolic traits contribute to the observed associations with cardiovascular/metabolic diseases. Conclusions: This thesis presents a comprehensive investigation of the health outcomes related to SUA level. The causal relationship between high SUA level and gout is robustly verified, with consistent evidence from the umbrella review, the MR-PheWAS and the PWMR. The association of high SUA level with hypertension and heart diseases is supported by both the umbrella review and the analyses conducted in this thesis. However, given the caveat of pleiotropy in the causal inference, the current findings are not robust enough to conclude causality for hypertension and heart diseases. Furthermore, the epidemiological evidence from the umbrella review indicated that high SUA level was associated with several components of metabolic disorders, and the analyses of the UK Biobank data identified a significant association with metabolic disorders and a sub-phenotype (hypercholesterolaemia). The causal inference in this study is limited by the common difficulty of pleiotropy caused by the use of multiple genetic instruments. Although we performed sensitivity analyses excluding the key pleiotropic locus, unmeasured pleiotropy and biases are still possible. In particular, unbalanced pleiotropy is recognised as an issue for causal inference on the association between SUA level and hypertension. The potential causal relevance of SUA level to respiratory diseases and ocular diseases is also worthy of further investigation.
Overall, taking together the findings from the umbrella review, MR-PheWAS, PheWAS/TreeWAS analysis, MR replication and sensitivity analysis conducted in this thesis, I conclude that there are robust associations between urate and several disease groups, including gout, hypertensive diseases, heart diseases and metabolic disorders, but that urate plays a causal role only in gout. This study indicates that the observed associations between urate and cardiovascular/metabolic diseases probably derive from the pleiotropic effects of genetic variants on urate and metabolic traits. Further investigation of therapies targeting the biological pathways shared between urate and metabolic traits may be beneficial for the treatment of gout and the primary prevention of cardiovascular/metabolic diseases.
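
For readers unfamiliar with the inverse variance weighted (IVW) estimator named in this abstract, the following is a minimal sketch of the standard fixed-effect IVW calculation from per-variant summary statistics. It is not the thesis's code: the function names and the toy inputs are assumptions made for illustration, and the odds-ratio helper simply exponentiates a log-odds estimate with a normal-approximation 95% interval.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: variant effects on the exposure (e.g. serum urate)
    beta_outcome:  variant effects on the outcome (log-odds for a disease)
    se_outcome:    standard errors of the outcome effects
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / (se * se)          # inverse variance of the outcome effect
        numerator += weight * bx * by
        denominator += weight * bx * bx
    beta = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return beta, standard_error


def to_odds_ratio(beta, standard_error, z=1.96):
    """Convert a log-odds estimate into (OR, lower 95% bound, upper 95% bound)."""
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )


# Toy inputs (hypothetical numbers, not taken from the thesis).
beta, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
odds_ratio, lower, upper = to_odds_ratio(beta, se)
```

The estimator assumes that every variant is a valid instrument; the Egger regression mentioned in the abstract relaxes this by adding an intercept that absorbs directional pleiotropy.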

    Phenome-wide association studies across large population cohorts support drug target validation

    Phenome-wide association studies (PheWAS) have been proposed as a possible aid in drug development through elucidating mechanisms of action, identifying alternative indications, or predicting adverse drug events (ADEs). Here, we select 25 single nucleotide polymorphisms (SNPs) linked through genome-wide association studies (GWAS) to 19 candidate drug targets for common disease indications. We interrogate these SNPs by PheWAS in four large cohorts with extensive health information (23andMe, UK Biobank, FINRISK, CHOP) for association with 1683 binary endpoints in up to 697,815 individuals and conduct meta-analyses for 145 mapped disease endpoints. Our analyses replicate 75% of known GWAS associations (P<0.05) and identify nine study-wide significant novel associations (of 71 with FDR <0.1). We describe associations that may predict ADEs, e.g., acne, high cholesterol, gout, and gallstones with rs738409 (p.I148M) in PNPLA3 and asthma with rs1990760 (p.T946A) in IFIH1. Our results demonstrate PheWAS as a powerful addition to the toolkit for drug discovery. Peer reviewed.

    Investigating genetic determinants of liver disease and its associations with cardiovascular diseases

    Background: Dramatic modifications in lifestyle have given rise to an epidemic in chronic liver diseases, predominantly driven by non-alcoholic fatty liver disease (NAFLD). The more severe NAFLD phenotypes are associated with elevated liver iron, inflammation (steatohepatitis), scarring and liver failure (fibrosis, cirrhosis), and possibly with cardiovascular diseases (CVDs); genetic and population studies of these phenotypes and their links to CVDs have been limited. Aims: 1) Investigate the genetic susceptibility underlying liver MRI phenotypes (iron and corrected T1 (cT1), a steatohepatitis proxy) and explore associations with other cardiometabolic traits. 2) Investigate whether liver fibrosis is an independent risk factor for CVDs. Methods: We carried out genome-wide association studies (GWASs) of liver MRI phenotypes (iron (N = 8,289), and corrected T1 (a steatohepatitis proxy, N = 14,440)) in UK Biobank. We used genetics to investigate causality with other traits. We calculated a FIB-4 score (a validated non-invasive scoring system that predicts liver fibrosis) in 44,956 individuals in the UK and investigated its association with the incidence of five CVDs (ischaemic stroke, myocardial infarction, heart failure, peripheral arterial disease, atrial fibrillation (AF)). Results: Three genetic variants known to influence hepcidin regulation (rs1800562 (C282Y) and rs1799945 (H63D) in HFE, rs855791 (V736A) in TMPRSS6) were associated with liver iron (p < 5 × 10^-8). Mendelian randomisation provided evidence that central obesity causes higher liver iron. Four variants (rs75935921 in SLC30A10, rs13107325 in SLC39A8, rs58542926 in TM6SF2, rs738409 in PNPLA3) were associated with elevated cT1 (p < 5 × 10^-8). Insulin resistance, type 2 diabetes, fatty liver, and BMI were causally associated with elevated cT1 whilst favourable adiposity was protective.
In 44,956 individuals followed over a median of 5.4 years, adjusted models demonstrated strong associations of “suspected liver fibrosis” (FIB-4 ≥ 1.3) with cirrhosis (hazard ratio (HR) 13.64 [10.79 – 17.26], p < 2 × 10^-16) and hepatocellular carcinoma (HR 11.64 [5.15 – 26.31], p = 3.5 × 10^-9), but no association with the incidence of most CVDs, apart from a modest increase in AF risk (HR 1.18 [1.01 – 1.37]), when compared with individuals with a FIB-4 < 1.3. Conclusions: This thesis provides genetic evidence that the mechanisms underlying higher liver iron content are likely systemic rather than organ specific. The association between two metal ion transporters and cT1 indicates a new mechanism in steatohepatitis. There is little evidence to suggest that liver fibrosis is an independent risk factor for most CVDs, except possibly a small increase in the risk of incident AF. The findings of this thesis can be used to investigate causality, generate hypotheses for drug development and inform health policies.
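
The FIB-4 score used in this abstract is conventionally computed from age, AST, ALT and platelet count. The sketch below shows that standard formula together with the 1.3 cut-off quoted in the abstract. The function and parameter names are assumptions for illustration, and treating the cut-off as inclusive (score at or above 1.3) is also an assumption, since the abstract only states that the comparison group had FIB-4 < 1.3.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """Standard FIB-4 index: (age x AST) / (platelets x sqrt(ALT)).

    AST and ALT are in U/L, platelets in 10^9 per litre.
    """
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


def suspected_fibrosis(score, cutoff=1.3):
    """Flag 'suspected liver fibrosis'; the inclusive comparison is an assumption."""
    return score >= cutoff


# Hypothetical participant, not drawn from the study data.
example_score = fib4(age_years=50, ast_u_per_l=40, alt_u_per_l=25,
                     platelets_1e9_per_l=200)
example_flag = suspected_fibrosis(example_score)
```

Because age enters the numerator directly, the score rises with age even when the laboratory values are unchanged.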

    Phenome wide association study of vitamin D genetic variants in the UK Biobank cohort

    Introduction: Vitamin D status is an important public health issue due to the high prevalence of vitamin D insufficiency and deficiency, especially in high latitude areas. Furthermore, it has been reported to be associated with a number of diseases. A previous umbrella review of meta-analyses of randomized clinical trials (RCTs) and of observational studies found that plasma/serum 25-hydroxyvitamin D (25(OH)D) or supplemental vitamin D has been linked to more than 130 unique health outcomes. However, the majority of the studies yielded conflicting results and no association was convincing. Aim and Objectives: The aim of my PhD was to comprehensively explore the association between vitamin D and multiple outcomes. The specific objectives were to: 1) update the umbrella review of meta-analyses of observational studies or randomized controlled trials on associations between vitamin D and health outcomes published between 2014 and 2018; 2) conduct a systematic literature review of previous Mendelian Randomization studies on causal associations between vitamin D and all outcomes; 3) conduct a systematic literature review of published phenome wide association studies, summarizing the methods, results and predictors; 4) create a polygenic risk score of vitamin D related genetic variants, weighted by their effect estimates from the most recent genome wide association study; 5) encode phenotype groups based on electronic medical records of participants; 6) study the associations between vitamin D related SNPs and the whole spectrum of health outcomes, defined by electronic medical records utilising the UK Biobank study; 7) explore the causal effect of 25-hydroxyvitamin D level on health outcomes by applying novel instrumental variable methods. Methods: First I updated the vitamin D umbrella review published in 2015, by summarizing the evidence from meta-analyses of observational studies and meta-analyses of RCTs published between 2014 and 2018.
I also performed a systematic literature review of all previous Mendelian Randomization studies on the effect of vitamin D on all health outcomes, as well as a systematic review of all published PheWAS studies and the methodology they applied. Then I conducted original data analysis in a large prospective population-based cohort, the UK Biobank, which includes more than 500,000 participants. A 25(OH)D genetic risk score (weighted sum score of 6 serum 25(OH)D-related SNPs: rs3755967, rs12785878, rs10741657, rs17216707, rs10745742 and rs8018720, as identified by the largest genome wide association study of 25(OH)D levels) was constructed to be used as the instrumental variable. I used a phenotyping algorithm to code the electronic medical records (EMR) of UK Biobank participants into 1853 distinct disease categories and then ran the PheWAS analysis to test the associations between the 25(OH)D genetic risk score and 950 disease outcome groups (i.e. outcomes with more than 200 cases). For phenotypes that showed a statistically significant association with 25(OH)D levels in the PheWAS, or that were found to be convincing or highly suggestive in previous studies, I developed an extended case definition by incorporating self-reported data collected by the UK Biobank baseline questionnaire and interview. The possible causal effect of vitamin D on those outcomes was then explored by the MR two-stage method, inverse variance weighted MR and Egger’s regression, followed by sensitivity analyses. Results: In the updated systematic literature review of meta-analyses of observational studies or RCTs, only studies on new outcomes which had not been covered by the previous umbrella review were included. A total of 95 meta-analyses met the inclusion criteria. Among the included studies there were 66 meta-analyses of observational studies, and 29 meta-analyses of RCTs.
Eighty-five new outcomes were explored by meta-analyses of observational studies, and 59 new outcomes were covered by meta-analyses of RCTs. In the systematic review of published Mendelian Randomization studies on vitamin D, a total of 29 studies were included. A causal role of 25(OH)D level was supported by MR analysis for the following outcomes: type 2 diabetes, total adiponectin, diastolic blood pressure, risk of hypertension, multiple sclerosis, Alzheimer’s disease, all-cause mortality, cancer mortality, mortality excluding cancer and cardiovascular events, ovarian cancer, HDL-cholesterol, triglycerides and cognitive functions. For the systematic literature review of published PheWAS studies and their methodology, a total of 45 studies were included. Implementing a PheWAS study involves the following steps: sample selection, predictor selection, phenotyping, statistical analysis and result interpretation. One of the main challenges is the definition of the phenotypes (i.e., the method of binning participants into different phenotype groups). In the phenotyping step, ICD curated phenotyping was widely used by previous PheWAS, and I also used it in my own analysis. Applying the ICD curated phenotyping defined 1853 phenotype groups in the participants I used. In the PheWAS, only phenotype groups with more than 200 cases were analysed (920 phenotypes). Only the associations between rs17216707 (CYP24A1) and “calculus of ureter” (beta = -0.219, se = 0.045, P = 1.14×10^-6), “urinary calculus” (beta = -0.129, se = 0.027, P = 1.31×10^-6) and “alveolar and parietoalveolar pneumonopathy” (beta = 0.418, se = 0.101, P = 3.53×10^-5) survived Bonferroni correction. Nine outcomes, including systolic blood pressure, diastolic blood pressure, body mass index, risk of hypertension, type 2 diabetes, ischemic heart disease, depression, non-vertebral fracture and all-cause mortality, were explored in MR analyses.
The MR analysis had more than 80% power for detecting a true odds ratio of 1.2 or larger for binary outcomes. None of the explored outcomes were statistically significant. Results from multiple MR methods and sensitivity analyses were consistent. Discussion: Vitamin D and its association with multiple outcomes have been widely studied. More than 230 outcomes have been linked with vitamin D by meta-analyses of observational studies and RCTs. In contrast, evidence from Mendelian Randomization studies is lacking. In particular I identified only 20 existing MR studies and only 13 outcomes were suggested to be causally related to vitamin D. In the systematic literature review of previous PheWAS studies, I summarized the applied methods, predictors and results. Although phenotyping based on ICD codes provided good performance and was widely applied by previous PheWAS studies, phenotyping can be improved if lab data, imaging data and medical notes can be incorporated. Alternative algorithms, which take advantage of deep learning and thus enable high precision phenotyping, need to be developed. In the PheWAS analysis, the score of vitamin D related genetic variants was not statistically significantly associated with any of the 920 phenotypes tested. In the single variant analysis, only rs17216707 (CYP24A1) showed a statistically significant association with calculus outcomes. Previous studies reported associations between vitamin D and hypercalcemia, hypercalciuria, nephrolithiasis and nephrocalcinosis, which may be due to the role of vitamin D in calcium homeostasis. In the MR analysis, I found no evidence of large to moderate (OR>1.2) causal effects of vitamin D on a very wide range of health outcomes. These included SBP, DBP, hypertension, T2D, IHD, BMI, depression, non-vertebral fracture and all-cause mortality, which have previously been proposed to be influenced by low vitamin D levels.
Further, even larger studies, probably involving the joint analysis of data from several large biobanks with future IVs that explain a higher proportion of the trait variance, will be required to exclude smaller causal effects, which could have public health importance because of the high prevalence of low vitamin D levels in some populations.
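
The weighted genetic risk score described in this abstract is a weighted sum of effect-allele dosages across the six listed SNPs. The sketch below illustrates that calculation only. The rsIDs come from the abstract, but the weights are placeholder values invented for the example (the abstract does not report the GWAS effect estimates), and the function name is an assumption.

```python
# Placeholder per-allele weights: NOT the published GWAS effect estimates.
EXAMPLE_WEIGHTS = {
    "rs3755967": 0.5,
    "rs12785878": 0.25,
    "rs10741657": 0.25,
    "rs17216707": 0.125,
    "rs10745742": 0.125,
    "rs8018720": 0.125,
}


def weighted_grs(dosages, weights):
    """Weighted genetic risk score: sum over SNPs of weight x dosage.

    dosages maps each SNP to the number of effect alleles carried (0 to 2,
    possibly fractional for imputed genotypes). Every SNP in `weights`
    must be present in `dosages`.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError("missing dosages for: " + ", ".join(sorted(missing)))
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical participant carrying one effect allele at every SNP.
example_dosages = {snp: 1 for snp in EXAMPLE_WEIGHTS}
example_score = weighted_grs(example_dosages, EXAMPLE_WEIGHTS)
```

Raising an error on missing genotypes, rather than silently skipping them, keeps scores comparable between participants; an analysis might instead impute or exclude such participants.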

    Functional Analysis of Genomic Variation and Impact on Molecular and Higher Order Phenotypes

    Reverse genetics methods, particularly the production of gene knockouts and knockins, have revolutionized the understanding of gene function. High throughput sequencing now makes it practical to exploit reverse genetics to simultaneously study functions of thousands of normal sequence variants and spontaneous mutations that segregate in intercross and backcross progeny generated by mating completely sequenced parental lines. To evaluate this new reverse genetic method we resequenced the genome of one of the oldest inbred strains of mice, DBA/2J, the father of the large family of BXD recombinant inbred strains. We analyzed ~100X whole-genome sequence data for the DBA/2J strain, relative to C57BL/6J, the reference strain for all mouse genomics and the mother of the BXD family. We generated the most detailed picture of molecular variation between the two mouse strains to date and identified 5.4 million sequence polymorphisms, including 4.46 million single nucleotide polymorphisms (SNPs), 0.94 million insertions/deletions (indels), and 20,000 structural variants. We systematically scanned massive databases of molecular phenotypes and ~4,000 classical phenotypes to detect linked functional consequences of sequence variants. In the majority of cases we successfully recovered known genotype-to-phenotype associations and in several cases we linked sequence variants to novel phenotypes (Ahr, Fh1, Entpd2, and Col6a5). However, our most striking and consistent finding is that apparently deleterious homozygous SNPs, indels, and structural variants have undetectable or very modest additive effects on phenotypes.

    Statistical and Computational Methods for Genome-Wide Association Analysis

    Technological and scientific advances in recent years have revolutionized genomics. For example, decreases in whole genome sequencing (WGS) costs have enabled larger WGS studies as well as larger imputation reference panels, which in turn provide more comprehensive genomic coverage from lower-cost genotyping methods. In addition, new technologies and large collaborative efforts such as ENCODE and GTEx have shed new light on regulatory genomics and the function of non-coding variation, and produced expansive publicly available data sets. These advances have introduced data of unprecedented size and dimension, unique statistical and computational challenges, and numerous opportunities for innovation. In this dissertation, we develop methods to leverage functional genomics data in post-GWAS analysis, to expedite routine computations with increasingly large genetic data sets, and to address limitations of current imputation reference panels for understudied populations. In Chapter 2, we propose strategies to improve imputation and increase power in GWAS of understudied populations. Genotype imputation is instrumental in GWAS, providing increased genomic coverage from low-cost genotyping arrays. Imputation quality depends crucially on reference panel size and the genetic distance between reference and target haplotypes. Current reference panels provide excellent imputation quality in many European populations, but lower quality in non-European, admixed, and isolate populations. We consider a GWAS strategy in which a subset of participants is sequenced and the rest are imputed using a reference panel that comprises the sequenced participants together with individuals from an external reference panel. Using empirical data from the HRC and TOPMed WGS Project, simulations, and asymptotic analysis, we identify powerful and cost-effective study designs for GWAS of non-European, admixed, and isolated populations. 
In Chapter 3, we develop efficient methods to estimate linkage disequilibrium (LD) with large data sets. Motivated by practical and logistical constraints, a variety of statistical methods and tools have been developed for analysis of GWAS summary statistics rather than individual-level data. These methods often rely on LD estimates from an external reference panel, which are ideally calculated on-the-fly rather than precomputed and stored. We develop efficient algorithms to estimate LD exploiting sparsity and haplotype structure and implement our methods in an open-source C++ tool, emeraLD. We benchmark performance using genotype data from the 1KGP, HRC, and UK Biobank, and find that emeraLD is up to two orders of magnitude faster than existing tools while using comparable or less memory. In Chapter 4, we develop methods to identify causative genes and biological mechanisms underlying associations in post-GWAS analysis by leveraging regulatory and functional genomics databases. Many gene-based association tests can be viewed as instrumental variable methods in which intermediate phenotypes, e.g. tissue-specific expression or protein alteration, are hypothesized to mediate the association between genotype and GWAS trait. However, LD and pleiotropy can confound these statistics, which complicates their mechanistic interpretation. We develop a hierarchical Bayesian model that accounts for multiple potential mechanisms underlying associations using functional genomic annotations derived from GTEx, Roadmap/ENCODE, and other sources. We apply our method to analyze twenty-five complex traits using GWAS summary statistics from UK Biobank, and provide an open-source implementation of our methods. In Chapter 5, we review our work, discuss its relevance and prospects as new resources emerge, and suggest directions for future research. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147697/1/corbinq_1.pd
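
To illustrate why sparsity helps when estimating LD, the sketch below computes the squared correlation r² between two biallelic variants from sets of carrier haplotype indices, so the work scales with the number of carriers rather than the panel size. This is a generic illustration, not emeraLD's algorithm or code (the tool itself is written in C++), and the function name and data layout are assumptions.

```python
def ld_r2_sparse(carriers_a, carriers_b, n_haplotypes):
    """Squared correlation (r^2) between two biallelic variants.

    carriers_a, carriers_b: sets of haplotype indices carrying the
    alternate allele at each variant (a sparse representation).
    Returns NaN when either variant is monomorphic in the panel.
    """
    n_a = len(carriers_a)
    n_b = len(carriers_b)
    if n_a in (0, n_haplotypes) or n_b in (0, n_haplotypes):
        return float("nan")
    # Only haplotypes carrying both alternate alleles contribute to p_AB.
    n_ab = len(carriers_a & carriers_b)
    p_a = n_a / n_haplotypes
    p_b = n_b / n_haplotypes
    p_ab = n_ab / n_haplotypes
    d = p_ab - p_a * p_b                  # classical LD coefficient D
    return (d * d) / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Toy panel of four haplotypes (hypothetical data).
perfect_ld = ld_r2_sparse({0, 1}, {0, 1}, 4)
no_ld = ld_r2_sparse({0, 1}, {0, 2}, 4)
```

For rare variants the carrier sets are tiny compared with the number of haplotypes, which is where a sparse representation saves the most work.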

    Statistical Methods for Large Scale Genetic Analyses

    Population scale genomic analyses have informed the development of novel therapeutics, diagnostics, and understanding of disease etiology. Among the recent developments in human genetic association analyses, electronic health record (EHR) linked biobanks and population scale whole genome sequencing (WGS) have provided fertile ground for association discovery. In tandem with the emergence of these approaches, novel computational and statistical approaches are needed to address the methodological challenges of working with these data. In Chapter 2, I present study design recommendations and meta-analysis results for genetic association studies applied to clinical laboratory data in EHR linked biobanks. We conducted genome-wide association studies (GWAS) of 70 clinical lab traits from both the Michigan Genomics Initiative (MGI) and BioVU from the Vanderbilt University health system. In addition to the discovery of novel association results, we conducted systematic study design analyses in parallel across the two biobanks to inform recommendations for association studies of lab traits. In Chapter 3, I present a novel sparse Mendelian randomization (MR) method for causal inference. MR methods are an instrumental variable approach for inferring the causal effect of an exposure on an outcome using genetic variants as instruments. In settings where the proportion of genetic variants that are causal is low, current approaches that assume dense genetic architectures may have poor statistical power. Here, we present a novel Bayesian MR method using a horseshoe prior which can be applied to summary statistics. The horseshoe prior is a continuous-scale shrinkage prior which facilitates variable selection. We use simulations to evaluate the performance of the method across genetic architectures. We apply the method to lab trait GWAS summary statistics.
In Chapter 4, I present a novel method for estimating the rate at which somatic clones are expanding in clonal hematopoiesis. Clonal hematopoiesis refers to a state of mosaicism in blood defined by the acquisition of oncogenic driver mutations at an appreciable clone size and can be identified using WGS. Previous approaches for describing the growth of these mutations have relied on longitudinal sequencing methods. Here, we develop a Bayesian hierarchical model for estimating the parameters that describe the expansion of driver variants. In contrast to previous reports, our method only requires a single draw of blood. We validate the method using simulations and longitudinal amplicon sequencing. We apply our method to ~5,000 samples with clonal hematopoiesis from the Trans-Omics for Precision Medicine (TOPMed) sequencing initiative, enabling association studies of the molecular determinants of clonal expansion. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169713/1/jweinstk_1.pd

    The value of genetic information in cardiovascular disease risk assessment (Geeniinfo väärtus südame-veresoonkonnahaiguste riski hindamisel)

    The electronic version of the thesis does not include the publications. The fact that cardiovascular diseases are the leading cause of mortality worldwide underscores the need to advance and improve existing strategies for disease prevention and prediction. In today's clinical practice, cardiovascular disease risk assessment relies on risk models that take classical phenotypic risk factors into account. Although this strategy identifies high-risk individuals relatively well, the risk estimate for nearly a third remains inaccurate and the treatment decision unclear. In addition, the limited usefulness of the models is reflected in the fact that enumerating risk factors in reality assesses changes that have already taken place at the molecular level. The current strategy therefore alleviates the progressed course of the pathology rather than suppressing or preventing the disruption of molecular mechanisms at an early stage. One proposed way forward is to take the genetic information of the disease into account, primarily because genetic association studies of cardiovascular diseases have now produced estimates with the potential to make both early risk assessment in healthy individuals and the clinical management of patients considerably more accurate. The main aim of this doctoral thesis is to give an overview of current methods of cardiovascular disease risk assessment and of whether and how incorporating genetic information into everyday clinical decisions could improve them. In addition, I give examples of how high-resolution genome sequence data would allow trait-associated causal gene variants to be identified more precisely, and how using data from a population-based biobank would improve the clinical management of high-risk individuals. Cardiovascular diseases are the main cause of morbidity and mortality worldwide, underscoring the need for improved strategies for disease prevention and risk prediction.
The main approach applied in today's clinical practice to identify those at increased cardiovascular risk relies on phenotypic risk models that estimate an individual's disease risk from traditional risk factors. While this strategy helps to prevent disease and, on the whole, targets high-risk individuals for treatment sufficiently well, a third of the individuals who experience an adverse event are misclassified into a lower risk category, so the treatment recommendation for them is ambiguous. Importantly, the current approach fails to provide accurate estimation for primordial prevention, that is, estimating risk before risk factors emerge. To overcome this issue and enhance risk estimation, attention has now turned to genetics, with the aim of incorporating genetic information into established risk prediction strategies. The scrutiny of the genetic architecture of cardiovascular diseases conducted in recent decades has resulted in estimates that can be of clinical utility and value. This doctoral thesis aims to give an overview of the status quo of genomic research on cardiovascular diseases and to consider what the advances in molecular technology, computational capacities and large-scale initiatives have enabled, what the progress of these endeavours entails and whether they add incremental value for clinical utility. Furthermore, I give examples of how high-coverage sequencing data can enhance the search for the genetic underpinnings of cardiovascular disease-associated phenotypes, and how large-scale cohorts and population-based biobanks can enable the anticipated improvement in disease risk estimation, especially when integrated into a national healthcare system. https://www.ester.ee/record=b522706