    A comprehensive SNP and indel imputability database

    Motivation: Genotype imputation has become an indispensable step in genome-wide association studies (GWAS). Imputation accuracy, which directly influences downstream analysis, has been shown to improve with re-sequencing-based reference panels; however, this comes at the cost of a high computational burden due to the huge number of potentially imputable markers (tens of millions) discovered through sequencing a large number of individuals. Therefore, there is an increasing need for access to imputation quality information without actually conducting imputation. To facilitate this process, we have established a publicly available SNP and indel imputability database, aiming to provide direct access to imputation accuracy information for markers identified by the 1000 Genomes Project across four major populations and covering multiple GWAS genotyping platforms.

    Imputation-based Genetic Association Analysis of Complex Traits in Admixed Populations

    Genetic association studies in admixed populations have drawn increasing attention from the genetic community, as performing association analysis in diverse populations allows us to gain deeper understanding of the genetic architecture of human diseases and traits. However, population stratification due to admixture poses special challenges. To address the challenges, I conducted the following studies from the perspectives of enhancing genotype imputation quality and providing proper treatment of local ancestry in the association analysis. First, I provided a new resource of marker imputability information with commonly used reference panels to guide the choice of reference and genotyping platforms. To be specific, I systematically evaluated marker imputation quality using sequencing-based reference panels from the 1000 Genomes Project and released the information through a user-friendly and publicly available data portal. This is the first resource providing variant imputability information specific to each continental group and to each genotyping platform. Second, I established a paradigm for better imputation in African Americans using study-specific sequencing-based reference panels. I built an internal reference panel consisting of variants derived from the NHLBI Exome Sequencing Project for African American subjects, which significantly increased effective sample size compared with that from the 1000 Genomes Project. No loss of imputation quality was observed using a panel built from phenotypic extremes. In addition, I recommended using haplotypes from the Exome Sequencing Project alone or concatenation of the two panels over quality score-based post-imputation selection or IMPUTE2’s two-panel combination. Finally, I proposed a robust and powerful two-step testing procedure for association analysis in admixed populations. Through extensive numeric simulations, I demonstrated that our testing procedure robustly captures and pinpoints associations due to allele effect, ancestry effect or the existence of effect heterogeneity between the two ancestral populations. In particular, our testing procedure is more powerful in identifying the presence of effect heterogeneity than the traditional cross-product interaction model. I further illustrated its usefulness by applying the two-step testing procedure to test for the association between genetic variants and hemoglobin trait in African American participants from CARe. Taken together, the above studies guide genotype imputation practice and substantially improve the power of imputation-based genetic association studies in admixed populations, leading to more accurate discovery of disease-associated variants and ultimately better therapeutic strategies in admixed populations. Doctor of Philosophy

    Genetic Imputation: Accuracy to Application

    Genotype imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. This body of work focuses on assessing imputation accuracy and uses imputed data to identify genetic contributors to mentholated cigarette preference. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen's kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation were conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants. This work directly impacts the interpretation of association studies by improving our understanding of accuracy assessments of imputed variants. Mentholated cigarettes are addictive, widely available, and commonly used, particularly by African American smokers. We aim to identify genetic variants that increase susceptibility to mentholated cigarette use in hopes of gaining biological insights into risk that may ultimately improve cessation efforts. We begin by pursuing hypothesis-driven candidate genes and regions (TAS2R38, CHRNA5/A3/B4, CHRNB3/A6, and CYP2A6/A7) and extend to a genome-wide approach. This study involves 1,365 African Americans and 2,206 European Americans (3,571 combined ancestry) nicotine-dependent current smokers from The Collaborative Genetic Study of Nicotine Dependence (COGEND) and Transdisciplinary Tobacco Use Research Center (UW-TTURC). Analyses were conducted within each cohort, and meta-analysis was used to combine results across studies and across ancestral groups. We identified some suggestively associated variants, although none meet genome-wide significance. This study represents a new, important aspect to understanding menthol cigarette preference. Further work is necessary to better understand this smoking behavior in efforts to improve cessation.
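The contrast this abstract draws between concordance rate and IQS can be illustrated with a minimal sketch. It assumes the usual kappa-style form IQS = (Po - Pc) / (1 - Pc), computed from a 3x3 table of true genotype against summed imputed genotype probabilities. The function name `concordance_and_iqs` and the toy data are hypothetical and do not come from the thesis.

```python
import numpy as np

def concordance_and_iqs(probs, truth):
    """Return (concordance rate, IQS) for one variant.

    probs: (n, 3) imputed probabilities for genotypes 0, 1, 2 (assumed layout).
    truth: (n,) true genotypes coded 0, 1, 2.
    """
    probs = np.asarray(probs, dtype=float)
    truth = np.asarray(truth, dtype=int)
    n = truth.size

    # Concordance rate: share of best-guess genotypes matching the truth.
    best_guess = probs.argmax(axis=1)
    concordance = float((best_guess == truth).mean())

    # 3x3 table: rows = true genotype, columns = imputed probability mass.
    table = np.zeros((3, 3))
    for g in range(3):
        table[g] = probs[truth == g].sum(axis=0)

    p_obs = np.trace(table) / n  # observed agreement
    # Agreement expected by chance, from the table margins.
    p_chance = float((table.sum(axis=1) * table.sum(axis=0)).sum()) / n**2
    iqs = (p_obs - p_chance) / (1.0 - p_chance) if p_chance < 1.0 else float("nan")
    return concordance, float(iqs)

# Toy rare variant: 2 carriers in 100 samples, all imputed as non-carriers.
truth = np.array([0] * 98 + [1] * 2)
probs = np.tile([1.0, 0.0, 0.0], (100, 1))
concordance, iqs = concordance_and_iqs(probs, truth)
print(concordance, iqs)
```

In this toy case the concordance rate is 0.98 even though no carrier is recovered, while the chance-corrected IQS is 0, which is the kind of inflation for rare variants that the abstract describes.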

    STATISTICAL METHODS FOR GENETIC AND EPIGENETIC ASSOCIATION STUDIES

    First, in genome-wide association studies, few methods have been developed for rare variants, which are one of the natural places to explain some missing heritability left over from common variants. Therefore, we propose EM-LRT that incorporates imputation uncertainty for downstream association analysis, with improved power and/or computational efficiency. We consider two scenarios: I) when posterior probabilities of all possible genotypes are estimated; and II) when only the one-dimensional summary statistic, imputed dosage, is available. Our methods show enhanced statistical power over existing methods and are computationally more efficient than the best existing method for association analysis of variants with low frequency or imputation quality. Second, although genome-wide association studies have identified a large number of loci associated with complex traits, a substantial proportion of the heritability remains unexplained. Thanks to advanced technology, we may now conduct large-scale epigenome-wide association studies. DNA methylation is of particular interest because it is highly dynamic and has been shown to be associated with many complex human traits, including immune dysfunctions, cardiovascular diseases, multiple cancers, and aging. We propose FunMethyl, a penalized functional regression framework to perform association testing between multiple DNA methylation sites in a region and a quantitative outcome. Our results from both real-data-based simulations and real data clearly show that FunMethyl outperforms single-site analysis across a wide spectrum of realistic scenarios. Finally, large studies may have a mixture of old and new arrays, or a mixture of old and new technologies, on the large number of samples they investigate. These different arrays or technologies usually measure different sets of methylation sites, making data analysis challenging. We propose a method to predict site-specific DNA methylation level from one array to another - a penalized functional regression model that uses functional predictors to capture non-local correlation from non-neighboring sites and covariates to capture local correlation. Application to real data shows promising results: the proposed model can predict methylation level at sites on a new array reasonably well from those on an old array. Doctor of Philosophy
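The two imputation scenarios named in this abstract (full genotype posterior probabilities versus a single imputed dosage) can be related with a short sketch. It assumes the common convention that dosage is the expected count of the alternate allele, p(1) + 2 p(2). The function name `dosage` and the example probabilities are hypothetical, and this is not the EM-LRT method itself.

```python
import numpy as np

def dosage(probs):
    """Expected alternate-allele count from (n, 3) genotype posteriors.

    Columns are assumed to hold probabilities of carrying 0, 1, and 2 copies.
    """
    probs = np.asarray(probs, dtype=float)
    return probs @ np.array([0.0, 1.0, 2.0])

# Two different posterior distributions that collapse to the same dosage.
confident_het = [0.0, 1.0, 0.0]   # certain heterozygote
uncertain = [0.5, 0.0, 0.5]       # either homozygote, equally likely
d = dosage([confident_het, uncertain])
print(d)
```

Both rows give a dosage of 1.0, so the one-dimensional summary no longer distinguishes a confident call from an uncertain one. That loss of information is why the two scenarios are treated separately.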

    Improving the Reporting of Pharmacogenetic Studies to Facilitate Evidence Synthesis: Anti-Tuberculosis Drug-Related Toxicity as an Example

    Background: In pharmacogenetic studies, researchers explore how genetic variants impact individuals’ responses to drugs. Implementation of pharmacogenetic tests in clinical practice can improve treatment efficacy and reduce toxicity. For health service providers to implement pharmacogenetic testing in clinical practice, the pharmacogenetic association of interest must be supported by strong evidence. Performing meta-analyses of pharmacogenetic studies increases sample size and power, and is therefore an indispensable tool to researchers striving to improve the strength of evidence for pharmacogenetic associations. The aim of this thesis is to identify and resolve challenges that reviewers might encounter when synthesising evidence from primary pharmacogenetic studies. Methods: We explored methods of evidence synthesis for pharmacogenetic studies and applied them to undertake a systematic review and meta-analysis of associations between genetic variants and anti-tuberculosis drug-related toxicity. We applied both standard methods of meta-analysis, and more complex methods of meta-analysis that account for correlation between related effect sizes for each genetic variant. Conducting this systematic review and meta-analysis enabled us to identify that key information was often poorly reported in the primary pharmacogenetic studies. In order to improve the reporting of pharmacogenetic studies with a view to facilitating the evidence synthesis process, we used consensus methodology to develop a reporting guideline for pharmacogenetic studies, known as the STROPS (Strengthening The Reporting Of Pharmacogenetic Studies) guideline. Results: Our systematic review of the association between genetic variants and anti-tuberculosis drug-related toxicity included 70 studies. Slow acetylators are more likely to experience anti-TB drug-induced hepatotoxicity than intermediate/rapid acetylators. We also observed associations between the CYP2E1 RsaI and GSTM1 null polymorphisms and hepatotoxicity. Key information, such as the ethnicity of included patients, methodological quality, and patient cohort overlap, was poorly reported. We also found that improvements in the reporting of outcome data would give systematic reviewers greater freedom in terms of their analysis approach. As part of the development of the STROPS guideline, 52 individuals from key stakeholder groups participated in two rounds of a Delphi survey. A total of eight individuals participated in a consensus meeting, before the 54-item STROPS guideline was finalised. Conclusions: Our systematic review showed that pharmacogenetic testing may be useful in clinical practice in terms of risk stratification for hepatotoxicity during TB treatment. More studies are needed to overcome methodological limitations of the existing studies and to assess the feasibility and cost-effectiveness of a stratified medicine approach. It is currently challenging to synthesise pharmacogenetic evidence, due to poor reporting of primary studies. We encourage authors to adhere to the STROPS guideline when publishing pharmacogenetic studies. The STROPS guideline will not only improve the transparency of reporting of pharmacogenetic studies, but will also facilitate the conduct of high-quality systematic reviews and meta-analyses, and thus improve the power to detect pharmacogenetic associations.
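The abstract's point that meta-analysis increases power can be illustrated with a minimal sketch of fixed-effect, inverse-variance pooling. This is an assumption about what a "standard" method might look like, not a description of the thesis's own analysis. The function name `fixed_effect_meta` and the effect sizes are hypothetical.

```python
import math

def fixed_effect_meta(betas, ses):
    """Pool per-study effect estimates (e.g. log odds ratios) by inverse variance."""
    weights = [1.0 / se**2 for se in ses]   # more precise studies get more weight
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)      # standard error of the pooled estimate
    return pooled, pooled_se

# Two hypothetical studies with equal precision.
pooled, pooled_se = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
print(pooled, pooled_se)
```

With two equally precise studies the pooled estimate is their average, and the pooled standard error is smaller than either study's own standard error, which is the gain in precision that motivates evidence synthesis.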