Statistical Methods and Models for Modern Genetic Analysis
The Genome-Wide Association Study (GWAS) is the predominant tool to search for genetic risk variants that contribute to complex human disease. Despite the large number of GWAS findings, variants implicated by GWAS are themselves unlikely to fully explain the heritability of many diseases. In this dissertation, we propose statistical methods to augment GWAS and further our understanding of the genetic causes of complex disease.
In the first project, we consider the challenges of a gene-environment analysis performed as a follow-up to a significant initial GWAS result. Effect estimates computed from the same data that produced the significant GWAS result are known to suffer from an upward bias called the “Winner's Curse.” We show that the initial GWAS testing strategy can induce bias in both follow-up hypothesis testing and estimation of gene-environment interaction. We propose a novel bias-correction method based on a partial-likelihood Markov chain Monte Carlo (MCMC) algorithm.
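The upward bias described above can be seen in a small simulation. The sketch below is a hypothetical illustration, not the partial-likelihood MCMC correction proposed in this project: it draws many noisy effect estimates around a fixed true effect and averages only those that pass a genome-wide significance threshold. The true effect, standard error, threshold, and replicate count are all assumed values.

```python
import numpy as np

def winners_curse_demo(true_beta=0.05, se=0.02, z_crit=5.45, n_rep=200_000, seed=1):
    """Compare the mean estimate over all replicates with the mean over 'significant' ones.

    All parameter values are illustrative assumptions; z_crit of about 5.45
    corresponds to a two-sided p-value of 5e-8.
    """
    rng = np.random.default_rng(seed)
    # Each replicate is one effect estimate drawn from its sampling distribution
    est = rng.normal(true_beta, se, size=n_rep)
    significant = np.abs(est / se) > z_crit
    # Returns: overall mean, mean among significant estimates, fraction significant
    return est.mean(), est[significant].mean(), significant.mean()

overall, sig_mean, power = winners_curse_demo()
print(overall, sig_mean, power)
```

Under these assumptions the overall mean stays close to the true effect of 0.05, while the mean among significant replicates is more than twice as large, because only estimates that happened to land far above the truth clear the threshold.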
In the second project, we shift attention to rare genetic variants, which GWAS has low power to detect. We propose the Cumulative Minor Allele Test (CMAT), which pools multiple rare variants from the same gene and tests for an excess burden of rare variants in either cases or controls. We show that the CMAT performs favorably across a range of study designs. Notably, the CMAT accommodates probabilistic genotypes, extending its applicability to low-coverage and imputed sequence data. We use simulations to validate study designs that combine sequenced and imputed samples as a means to improve power to detect rare risk variants.
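The pooling idea can be sketched as a generic burden test on expected minor-allele counts. This is a simplified, hypothetical stand-in and not the exact CMAT statistic. It sums each individual's minor-allele dosages across a gene's variants, compares the mean burden in cases and controls, and assesses significance by permuting case/control labels. Dosages between 0 and 2 stand in for probabilistic genotypes from low-coverage or imputed data.

```python
import numpy as np

def burden_permutation_test(dosages, is_case, n_perm=10_000, seed=0):
    """Two-sided permutation test for a case/control difference in pooled rare-allele burden.

    dosages: (n_individuals, n_variants) expected minor-allele counts in [0, 2]
    is_case: boolean array of length n_individuals
    """
    rng = np.random.default_rng(seed)
    burden = np.asarray(dosages, float).sum(axis=1)  # pooled count per individual
    labels = np.asarray(is_case, bool)

    def stat(lab):
        # Difference in mean burden, cases minus controls
        return burden[lab].mean() - burden[~lab].mean()

    observed = stat(labels)
    perm = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    # Add-one correction keeps the permutation p-value strictly positive
    p = (1 + np.sum(np.abs(perm) >= abs(observed))) / (n_perm + 1)
    return observed, p
```

With imputed data, each row would hold fractional expected counts (for example 0.3 or 1.7) rather than integers, and the same code applies unchanged.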
Determining the conditions that optimize imputation accuracy is important for applying genotype imputation successfully. In the final project, we propose a coalescent model of genotype imputation that allows fast, analytical estimates of imputation accuracy across complex population genetic models. We use our model to compare the performance of custom-made reference panels drawn from the same source population as the imputation targets with publicly available reference panels (i.e., the 1000 Genomes Project) that may differ in ancestry from the targets.
Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/89761/1/mattz_1.pd
Geometric Framework for Evaluating Rare Variant Tests of Association
The wave of next-generation sequencing data has arrived. However, many questions remain about how best to analyze sequence data, particularly regarding the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relationship between these tests is often not well understood. We present a geometric representation of rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or the angles of the two vectors. We demonstrate that the genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests that are robust to noncausal and protective variants. The geometric framework offers a new way to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers.
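The vector representation can be made concrete in a few lines of code. The sketch below is an illustration on assumed toy counts, not any of the tests classified in the paper. It treats per-site rare-allele counts in cases and in controls as two vectors and reports their Euclidean lengths and the angle between them, the two quantities that distinguish length tests from joint tests. Equal case and control sample sizes are assumed so that raw counts are comparable.

```python
import numpy as np

def vector_summary(case_counts, control_counts):
    """Lengths of the case and control rare-allele count vectors and the angle between them.

    Each vector has one entry per rare variant site; both vectors are assumed
    nonzero, and equal case/control sample sizes are assumed.
    """
    a = np.asarray(case_counts, float)
    b = np.asarray(control_counts, float)
    len_a, len_b = np.linalg.norm(a), np.linalg.norm(b)
    cos_theta = a @ b / (len_a * len_b)
    # Clip guards against rounding slightly outside [-1, 1] before arccos
    angle = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    return len_a, len_b, angle

print(vector_summary([2, 0], [0, 2]))  # equal lengths, 90-degree angle
```

In the usage line, the two vectors have the same length, so a comparison of lengths alone would register no difference, while the 90-degree angle shows that the counts fall at entirely different sites.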
Leveraging Summary Statistics to Make Inferences about Complex Phenotypes in Large Biobanks
As genetic sequencing becomes less expensive and data sets linking genetic data with medical records (e.g., biobanks) become larger and more common, data privacy and computational challenges must be addressed in order to realize the benefits of these data sets. One way to alleviate these issues is to use already-computed summary statistics (e.g., slopes and standard errors from a regression of a phenotype on a genotype). If groups share summary statistics from their analyses of biobanks, many of the privacy issues and computational challenges of accessing these data could be bypassed. In this paper, we explore the possibility of using summary statistics from simple linear models of phenotype on genotype to make inferences about more complex phenotypes (those derived from two or more simple phenotypes). We provide exact formulas for the slope, intercept, and standard error of the slope of linear regressions on combined phenotypes. The derived equations are validated via simulation and tested on a real data set exploring the genetics of fatty acids.
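For the special case of a linear combination Z = w1·Y1 + w2·Y2 of two phenotypes regressed on the same genotype, the combined slope and intercept follow directly because least squares is linear in the response. The standard error also requires the residual covariance of Y1 and Y2: SE(Z)² = w1²·SE1² + w2²·SE2² + 2·w1·w2·σ12/Sxx, where Sxx is the genotype's centered sum of squares and σ12 = (S12 - β1·β2·Sxx)/(n - 2) is obtained from the phenotypes' centered cross-product S12. The sketch below assumes that S12 and Sxx are shared alongside the per-phenotype fits. It illustrates only this one case and is not a reproduction of the paper's formulas, which may cover other derived phenotypes.

```python
import numpy as np

def ols(x, y):
    """Simple linear regression of y on x: (intercept, slope, SE of slope)."""
    n = len(x)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    sigma2 = np.sum(resid ** 2) / (n - 2)
    return intercept, slope, np.sqrt(sigma2 / sxx)

def combine_fits(w1, w2, fit1, fit2, s12, sxx, n):
    """Summary statistics for Z = w1*Y1 + w2*Y2 from two simple fits.

    s12: sum((y1 - mean(y1)) * (y2 - mean(y2)))  (assumed to be shared)
    sxx: sum((g - mean(g))**2) for the genotype    (assumed to be shared)
    """
    (a1, b1, se1), (a2, b2, se2) = fit1, fit2
    intercept = w1 * a1 + w2 * a2
    slope = w1 * b1 + w2 * b2
    # Residual covariance of Y1 and Y2 after each is regressed on the genotype
    resid_cov = (s12 - b1 * b2 * sxx) / (n - 2)
    var = w1**2 * se1**2 + w2**2 * se2**2 + 2 * w1 * w2 * resid_cov / sxx
    return intercept, slope, np.sqrt(var)

# Simulated check: combining summary statistics matches a direct fit on Z = Y1 - Y2
rng = np.random.default_rng(0)
n = 500
g = rng.binomial(2, 0.3, size=n).astype(float)
y1 = 0.5 * g + rng.normal(size=n)
y2 = -0.2 * g + 0.6 * y1 + rng.normal(size=n)
s12 = np.sum((y1 - y1.mean()) * (y2 - y2.mean()))
sxx = np.sum((g - g.mean()) ** 2)
combined = combine_fits(1.0, -1.0, ols(g, y1), ols(g, y2), s12, sxx, n)
direct = ols(g, y1 - y2)
print(combined, direct)  # equal up to floating-point error
```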
Meta-Analysis of Gene Level Tests for Rare Variant Association
The vast majority of connections between complex disease and common genetic variants were identified through meta-analysis, a powerful approach that enables large sample sizes while protecting against common artifacts due to population structure, repeated small-sample analyses, and/or limitations on sharing individual-level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the unit of analysis. Here, we propose and evaluate new approaches for performing meta-analysis of rare variant association tests, including burden tests, weighted burden tests, variable threshold tests and tests that allow variants with opposite effects to be grouped together. We show that our approach retains useful features of single-variant meta-analytic approaches and demonstrate its utility in a study of blood lipid levels in ∼18,500 individuals genotyped with exome arrays.
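One common way to meta-analyze gene-level tests is for each study to share per-variant score statistics and their covariance matrix, sum both across studies, and then form a gene-level statistic from the pooled quantities. The sketch below assumes that setup and shows only a weighted burden test, T = (wᵀU)² / (wᵀVw), compared against a chi-square distribution with one degree of freedom. The weights are assumed inputs, and the variable threshold and bidirectional tests mentioned above are not covered.

```python
import math
import numpy as np

def meta_burden_test(scores, covariances, weights):
    """Weighted burden test from per-study score statistics.

    scores: list of length-m arrays, one per study (per-variant scores U_k)
    covariances: list of m x m arrays, one per study (covariance V_k of U_k)
    weights: length-m array of per-variant weights (an assumed input)
    """
    U = np.sum(scores, axis=0)        # pooled score vector
    V = np.sum(covariances, axis=0)   # pooled covariance matrix
    w = np.asarray(weights, float)
    stat = float(w @ U) ** 2 / float(w @ V @ w)
    # Upper tail of a 1-df chi-square: P(X > t) = erfc(sqrt(t / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Two toy studies, two variants each, equal weights
print(meta_burden_test([np.array([1.0, 2.0]), np.array([0.5, 0.5])],
                       [np.eye(2), np.eye(2)], [1, 1]))
```

Summing U and V before forming the statistic lets studies contribute without sharing individual-level genotypes, which mirrors the motivation given above for meta-analysis.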
Genome-wide analysis of 53,400 people with irritable bowel syndrome highlights shared genetic pathways with mood and anxiety disorders.
Irritable bowel syndrome (IBS) results from disordered brain-gut interactions. Identifying susceptibility genes could highlight the underlying pathophysiological mechanisms. We designed a digestive health questionnaire for UK Biobank and combined the IBS cases it identified with independent cohorts. We conducted a genome-wide association study with 53,400 cases and 433,201 controls and replicated significant associations in a 23andMe panel (205,252 cases and 1,384,055 controls). Our study identified and confirmed six genetic susceptibility loci for IBS. Implicated genes included NCAM1, CADM2, PHF2/FAM120A, DOCK9, CKAP2/TPTE2P3 and BAG6. The first four are associated with mood and anxiety disorders, expressed in the nervous system, or both. Mirroring this, we also found strong genome-wide correlation between the risk of IBS and anxiety, neuroticism and depression (rg > 0.5). Additional analyses suggested that this arises from shared pathogenic pathways rather than, for example, anxiety causing abdominal symptoms. The implicated mechanisms require further exploration to help understand the altered brain-gut interactions underlying IBS.
Large scale meta-analysis characterizes genetic architecture for common psoriasis associated variants
Psoriasis is a complex disease of the skin with a prevalence of about 2%. We conducted the largest meta-analysis of genome-wide association studies (GWAS) for psoriasis to date, including data from eight different Caucasian cohorts, with a combined effective sample size of >39,000 individuals. We identified 16 additional psoriasis susceptibility loci achieving genome-wide significance, increasing the number of identified loci to 63 for European-origin individuals. Functional analysis highlighted the roles of interferon signalling and the NF-κB cascade, and we showed that the psoriasis signals are enriched in regulatory elements from different T cells (CD8+ T cells and CD4+ T cells, including TH0, TH1 and TH17). The identified loci explain ~28% of the genetic heritability and generate a discriminatory genetic risk score (AUC = 0.76 in our sample) that is significantly correlated with age at onset (p = 2 × 10^-89). This study provides a comprehensive layout of the genetic architecture of common variants for psoriasis.
Funding agencies: National Institutes of Health [R01AR042742, R01AR050511, R01AR054966, R01AR063611, R01AR065183]; Foundation for the National Institutes of Health; Dermatology Foundation; National Psoriasis Foundation; Arthritis National Research Foundation; Ann Arbor Veterans Affairs Hospital; Dawn and Dudley Holmes Foundation; Babcock Memorial Trust; Medical Research Council [MR/L011808/1]; German Ministry of Education and Research (BMBF); Doris Duke Foundation [2013106]; National Institute of Health [K08AR060802, R01AR06907]; Taubman Medical Research Institute; Department of Health via the NIHR comprehensive Biomedical Research Center; Kings College London; KCH NHS Foundation Trust; Barbara and Neal Henschel Charitable Foundation; Heinz Nixdorf Foundation; Estonian Ministry of Education and Research [IUT20-46]; Centre of Translational Genomics of University of Tartu (SP1GVARENG); European Regional Development Fund (Centre of Translational Medicine, University of Tartu); German Federal Ministry of Education and Research (BMBF); National Human Genome Research Institute of the National Institutes of Health [R44HG006981]; International Psoriasis Council.
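The risk score and AUC reported above can be illustrated with a generic sketch. A weighted genetic risk score sums each person's risk-allele counts multiplied by per-variant log odds ratios, and the AUC is the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The weights and genotypes below are toy assumptions, and the study's own score construction may differ.

```python
import numpy as np

def genetic_risk_score(genotypes, log_odds_ratios):
    """Weighted sum of risk-allele counts (0, 1, 2) per individual."""
    return np.asarray(genotypes, float) @ np.asarray(log_odds_ratios, float)

def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control; ties count 1/2."""
    cases = np.asarray(case_scores, float)[:, None]
    controls = np.asarray(control_scores, float)[None, :]
    # Compare every case with every control via broadcasting
    return float(np.mean((cases > controls) + 0.5 * (cases == controls)))

log_or = [0.5, 1.0]  # hypothetical per-variant weights
case_grs = genetic_risk_score([[2, 1], [0, 2]], log_or)     # [2.0, 2.0]
control_grs = genetic_risk_score([[1, 0], [0, 1]], log_or)  # [0.5, 1.0]
print(auc(case_grs, control_grs))  # 1.0
```

The pairwise comparison is quadratic in sample size. For biobank-scale data, a rank-based (Mann-Whitney) computation gives the same value more efficiently.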
GWAS of thyroid stimulating hormone highlights pleiotropic effects and inverse association with thyroid cancer
Thyroid stimulating hormone (TSH) is critical for normal development and metabolism. To better understand the genetic contribution to TSH levels, we conduct a GWAS meta-analysis at 22.4 million genetic markers in up to 119,715 individuals and identify 74 genome-wide significant loci for TSH, of which 28 are previously unreported. Functional experiments show that the thyroglobulin protein-altering variants P118L and G67S impact thyroglobulin secretion. Phenome-wide association analysis in the UK Biobank demonstrates the pleiotropic effects of TSH-associated variants, and a polygenic score for higher TSH levels is associated with a reduced risk of thyroid cancer in the UK Biobank and three other independent studies. Two-sample Mendelian randomization using TSH index variants as instrumental variables suggests a protective effect of higher TSH levels (indicating lower thyroid function) on risk of thyroid cancer and goiter. Our findings highlight the pleiotropic effects of TSH-associated variants on thyroid function and growth of malignant and benign thyroid tumors. Here, the authors conduct a GWAS and suggest a protective effect of higher TSH on risk of thyroid cancer and goitre.
Correction: Volume 12, Issue 1, Article Number 7354, DOI 10.1038/s41467-021-27675-w, published Dec 16, 2021. Peer reviewed.
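Two-sample Mendelian randomization combines per-variant associations with the exposure (here, TSH) and with the outcome, estimated in separate samples. A widely used estimator is the fixed-effect inverse-variance weighted (IVW) estimate, sketched below under the assumptions that the instruments are independent and that only outcome standard errors are used as weights. The study may have used this or other MR estimators; which one is not stated here.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each index is one instrument (variant); instruments are assumed independent.
    """
    bx = np.asarray(beta_exposure, float)
    by = np.asarray(beta_outcome, float)
    w = 1.0 / np.asarray(se_outcome, float) ** 2  # inverse-variance weights
    denom = np.sum(w * bx**2)
    estimate = np.sum(w * bx * by) / denom
    return float(estimate), float(np.sqrt(1.0 / denom))

# Toy instruments whose outcome effects are exactly half their exposure effects
print(ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01]))
```

The estimate is a weighted regression of outcome effects on exposure effects through the origin, so a negative value would correspond to the kind of protective effect reported above.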
Author Correction: GWAS of thyroid stimulating hormone highlights the pleiotropic effects and inverse association with thyroid cancer
The original version of this article contained an error in the Results, in the second paragraph of the subsection entitled “Fine-mapping for potentially causal variants among TSH loci”, in which the effect sizes for two variants were incorrectly reported.
Evaluating the contribution of rare variants to type 2 diabetes and related traits using pedigrees
Significance
Contributions of rare variants to common and complex traits such as type 2 diabetes (T2D) are difficult to measure. This paper describes our results from deep whole-genome analysis of large Mexican-American pedigrees to understand the role of rare sequence variation in T2D and related traits, leveraging the enrichment of rare allele counts in pedigrees. Our study design was well powered to detect association of rare variants if rare variants with large effects collectively accounted for large portions of risk variability, but our results did not identify such variants in this sample. We further quantified the contributions of common and rare variants to gene expression profiles and concluded that rare expression quantitative trait loci explain a substantive, but minor, portion of expression heritability.
- …