    Replication in Genome-Wide Association Studies

    Replication helps ensure that a genotype-phenotype association observed in a genome-wide association (GWA) study represents a credible association and is not a chance finding or an artifact due to uncontrolled biases. We discuss prerequisites for exact replication, issues of heterogeneity, advantages and disadvantages of different methods of data synthesis across multiple studies, frequentist vs. Bayesian inferences for replication, and challenges that arise from multi-team collaborations. While consistent replication can greatly improve the credibility of a genotype-phenotype association, it may not eliminate spurious associations due to biases shared by many studies. Conversely, lack of replication in well-powered follow-up studies usually invalidates the initially proposed association, although occasionally it may point to differences in linkage disequilibrium or effect modifiers across studies. Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-STS290.

    An evaluation of different meta-analysis approaches in the presence of allelic heterogeneity

    Meta-analysis has proven a useful tool in genetic association studies. Allelic heterogeneity can arise from ethnic background differences across populations being meta-analyzed (for example, in search of common frequency variants through genome-wide association studies), and through the presence of multiple low frequency and rare associated variants in the same functional unit of interest (for example, within a gene or a regulatory region). The latter challenge will be increasingly relevant in whole-genome and whole-exome sequencing studies investigating association with complex traits. Here, we evaluate the performance of different approaches to meta-analysis in the presence of allelic heterogeneity. We simulate allelic heterogeneity scenarios in three populations and examine the performance of current approaches to the analysis of these data. We show that current approaches can detect only a small fraction of common frequency causal variants. We also find that for low-frequency variants with large effects (odds ratios 2–3), single-point tests have high power, but also high false-positive rates. P-value based meta-analysis of summary results from allele-matching locus-wide tests outperforms collapsing approaches. We conclude that current strategies for the combination of genetic association data in the presence of allelic heterogeneity are insufficiently powered.
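As a rough illustration of what "P-value based meta-analysis" can mean, the sketch below combines per-population p-values with Fisher's method. This is an assumption made for illustration: the abstract does not say which combination rule the authors evaluated, and the function name `fisher_combined_p` is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method (illustrative only)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Half of Fisher's statistic X = -2 * sum(ln p_i); under the null,
    # X follows a chi-square distribution with 2k degrees of freedom.
    half_x = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


if __name__ == "__main__":
    # Hypothetical locus-wide p-values from three populations.
    print(fisher_combined_p([0.04, 0.20, 0.01]))
```

Fisher's method assumes the combined tests are independent and ignores effect direction and sample size, so it is only one of several possible p-value based schemes.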

    Statistical and Computational Methods for Analyzing and Visualizing Large-Scale Genomic Datasets

    Advances in large-scale genomic data production have led to a need for better methods to process, interpret, and organize these data. Starting with raw sequencing data, generating results requires many complex data processing steps, from quality control, alignment, and variant calling to genome-wide association studies (GWAS) and characterization of expression quantitative trait loci (eQTL). In this dissertation, I present methods to address issues faced when working with large-scale genomic datasets. In Chapter 2, I present an analysis of 4,787 whole genomes sequenced for the study of age-related macular degeneration (AMD) as a follow-up fine-mapping study to previous work from the International AMD Genomics Consortium (IAMDGC). Through whole genome sequencing, we comprehensively characterized genetic variants associated with AMD in known loci to provide additional insights on the variants potentially responsible for the disease by leveraging 60,706 additional controls. Our study improved the understanding of loci associated with AMD and demonstrated the advantages and disadvantages of different approaches for fine-mapping studies with sequence-based genotypes. In Chapter 3, I describe a novel method and a software tool to perform Hardy-Weinberg equilibrium (HWE) tests for structured populations. In sequence-based genetic studies, HWE test statistics are important quality metrics to distinguish true genetic variants from artifactual ones, but they become much less informative when applied to a heterogeneous and/or structured population. As next generation sequencing studies contain samples from increasingly diverse ancestries, we developed a new HWE test which addresses both the statistical and computational challenges of modern large-scale sequencing data and implemented the method in a publicly available software tool. Moreover, we extensively evaluated our proposed method against alternative methods to test HWE in both simulated and real datasets. Our method has been successfully applied to the latest variant calling QC pipeline in the TOPMed project. In Chapter 4, I describe PheGET, a web application to interactively visualize Expression Quantitative Trait Loci (eQTLs) across tissues, genes, and regions to aid functional interpretations of regulatory variants. Tissue-specific expression has become increasingly important for understanding the links between genetic variation and disease. To address this need, the Genotype-Tissue Expression (GTEx) project collected and analyzed a treasure trove of expression data. However, effectively navigating this wealth of data to find signals relevant to researchers has become a major challenge. I demonstrate the functionalities of PheGET using the newest GTEx data on our eQTL browser website at https://eqtl.pheweb.org/, allowing the user to 1) view all cis-eQTLs for a single variant; 2) view and compare single-tissue, single-gene associations within any genomic region; 3) find the best eQTL signal in any given genomic region or gene; and 4) customize the plotted data in real time. PheGET is designed to handle and display the kind of complex multidimensional data often seen in our post-GWAS era, such as multi-tissue expression data, in an intuitive and convenient interface, giving researchers an additional tool to better understand the links between genetics and disease. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/162918/1/amkwong_1.pd
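To see why a standard HWE test loses its value in a structured population, the sketch below applies the textbook one-degree-of-freedom chi-square HWE test (not the dissertation's new method, which the abstract does not detail) to two hypothetical subpopulations and to their pooled sample. The function name `hwe_chi_square` and the genotype counts are made up for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Classical chi-square HWE test for one biallelic variant.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (monomorphic variant).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    pop1 = (81, 18, 1)   # hypothetical subpopulation, allele A frequency 0.9
    pop2 = (1, 18, 81)   # hypothetical subpopulation, allele A frequency 0.1
    pooled = tuple(a + b for a, b in zip(pop1, pop2))
    for name, counts in (("pop1", pop1), ("pop2", pop2), ("pooled", pooled)):
        print(name, hwe_chi_square(*counts))
```

Each subpopulation is in equilibrium on its own, yet the pooled sample shows a large heterozygote deficit (the Wahlund effect), so a genuine variant could be flagged as an artifact.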

    Population Structure and Cryptic Relatedness in Genetic Association Studies

    We review the problem of confounding in genetic association studies, which arises principally because of population structure and cryptic relatedness. Many treatments of the problem consider only a simple "island" model of population structure. We take a broader approach, which views population structure and cryptic relatedness as different aspects of a single confounder: the unobserved pedigree defining the (often distant) relationships among the study subjects. Kinship is therefore a central concept, and we review methods of defining and estimating kinship coefficients, both pedigree-based and marker-based. In this unified framework we review solutions to the problem of population structure, including family-based study designs, genomic control, structured association, regression control, principal components adjustment and linear mixed models. The last solution makes the most explicit use of the kinships among the study subjects, and has an established role in the analysis of animal and plant breeding studies. Recent computational developments mean that analyses of human genetic association data are beginning to benefit from its powerful tests for association, which protect against population structure and cryptic kinship, as well as intermediate levels of confounding by the pedigree. Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-STS307.

    Models and Methods for Genome-Wide Association Studies.

    Genome-wide association (GWA) studies provide an extensive assessment of common genetic variants across the human genome for disease association. However, due to variation in allele frequencies and disease prevalence across populations, combining samples from different geographic or ethnic groups may lead to spurious evidence for association or diminish the true association signals. In part one of this dissertation, I propose a novel approach to correct for population stratification that makes use of the large amount of genetic information available in a GWA study. Based on allele-sharing identity-by-state (IBS) measures, I develop similarity scores that can describe genetic similarity between individuals, and match cases and controls accordingly. Association tests can then be performed conditional on the matched case-control groups. I apply our approach to the Pritzker bipolar GWA study. In part two, I extend our matching approach to families of arbitrary structure. I first apply similarity score-based matching to selected members from each family and then assign other family members to the same matched group. I modify a corrected chi-square test [Bourgain et al., 2003] following the Mantel-Haenszel procedure to account for correlations both between the family samples and between the matched cases and controls. The rapid advance in next-generation sequencing technologies allows a near-complete survey of genomic regions of interest and even whole genomes, enabling more extensive genetic association studies of rare variants. As we plan such re-sequencing studies of a complex disease, it is useful to consider the range of plausible genetic models, e.g., risk allele frequency (RAF) and genotype relative risk (GRR) of rare or less common causal variants, based on results of previous genetic linkage and association studies for the trait. 
In part three, I compute the power to detect linkage and/or association as a function of genetic model, and summarize the range of models likely to yield results that are consistent with existing GWA and/or linkage studies. Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/77921/1/wguan_1.pd
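The identity-by-state (IBS) measure mentioned in part one can be sketched as the average proportion of alleles two individuals share across markers. This is a generic IBS proportion, not the dissertation's similarity score, whose exact definition the abstract does not give. The function name `ibs_similarity` and the 0/1/2 genotype coding (copies of a reference allele, `None` for missing) are assumptions.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Mean proportion of alleles shared identical-by-state by two individuals.

    Genotypes are coded as 0, 1 or 2 copies of a reference allele;
    markers where either genotype is None (missing) are skipped.
    """
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype vectors must have the same length")
    shared = 0
    typed = 0
    for a, b in zip(genotypes_a, genotypes_b):
        if a is None or b is None:
            continue
        # Two individuals share 2, 1 or 0 alleles IBS at a biallelic marker.
        shared += 2 - abs(a - b)
        typed += 1
    if typed == 0:
        raise ValueError("no marker is typed in both individuals")
    return shared / (2 * typed)


if __name__ == "__main__":
    case = [0, 1, 2, 2, None]     # hypothetical genotypes
    control = [2, 1, 1, 2, 0]
    print(ibs_similarity(case, control))  # 0.625
```

A matching step could then pair each case with the controls that have the highest pairwise score, though how the dissertation forms its matched groups is not described here.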

    Breast Cancer Risk, Fungicide Exposure and CYP1A1*2A Gene-Environment Interactions in a Province-Wide Case Control Study in Prince Edward Island, Canada

    Scientific certainty regarding environmental toxin-related etiologies of breast cancer, particularly among women with genetic polymorphisms in estrogen metabolizing enzymes, is lacking. Fungicides have been recognized for their carcinogenic potential, yet there is a paucity of epidemiological studies examining the health risks of these agents. The association between agricultural fungicide exposure and breast cancer risk was examined in a secondary analysis of a province-wide breast cancer case-control study in Prince Edward Island (PEI), Canada. Specific objectives were: (1) to derive and examine the level of association between estimated fungicide exposures and breast cancer risk among women in PEI; and (2) to assess the potential for gene-environment interactions between fungicide exposure and a CYP1A1 polymorphism in cases versus controls. After 1:3 matching of 207 cases to 621 controls by age, family history of breast cancer and menopausal status, fungicide exposure was not significantly associated with an increased risk of breast cancer (OR = 0.74; 95% CI: 0.46–1.17). Moreover, no statistically significant interactions between fungicide exposure and CYP1A1*2A were observed. Gene-environment interactions were identified. Though interpretations of findings are challenged by uncertainty of exposure assignment and small sample sizes, this study does provide grounds for further research.