Replication in Genome-Wide Association Studies
Replication helps ensure that a genotype-phenotype association observed in a
genome-wide association (GWA) study represents a credible association and is
not a chance finding or an artifact due to uncontrolled biases. We discuss
prerequisites for exact replication, issues of heterogeneity, advantages and
disadvantages of different methods of data synthesis across multiple studies,
frequentist vs. Bayesian inferences for replication, and challenges that arise
from multi-team collaborations. While consistent replication can greatly
improve the credibility of a genotype-phenotype association, it may not
eliminate spurious associations due to biases shared by many studies.
Conversely, lack of replication in well-powered follow-up studies usually
invalidates the initially proposed association, although occasionally it may
point to differences in linkage disequilibrium or effect modifiers across
studies.
Comment: Published at http://dx.doi.org/10.1214/09-STS290 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
An evaluation of different meta-analysis approaches in the presence of allelic heterogeneity
Meta-analysis has proven a useful tool in genetic association studies. Allelic heterogeneity can arise from ethnic background differences across populations being meta-analyzed (for example, in search of common frequency variants through genome-wide association studies), and through the presence of multiple low frequency and rare associated variants in the same functional unit of interest (for example, within a gene or a regulatory region). The latter challenge will be increasingly relevant in whole-genome and whole-exome sequencing studies investigating association with complex traits. Here, we evaluate the performance of different approaches to meta-analysis in the presence of allelic heterogeneity. We simulate allelic heterogeneity scenarios in three populations and examine the performance of current approaches to the analysis of these data. We show that current approaches can detect only a small fraction of common frequency causal variants. We also find that for low-frequency variants with large effects (odds ratios 2–3), single-point tests have high power, but also high false-positive rates. P-value based meta-analysis of summary results from allele-matching locus-wide tests outperforms collapsing approaches. We conclude that current strategies for the combination of genetic association data in the presence of allelic heterogeneity are insufficiently powered
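As a rough illustration of the p-value based meta-analysis mentioned in this abstract, the sketch below combines per-study p-values with Stouffer's weighted Z method. The abstract does not say which combination rule was used, so the choice of Stouffer's method, the function name stouffer_combined_p, and the example p-values and weights are all assumptions made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def stouffer_combined_p(p_values, weights=None):
    """Combine one-sided p-values with Stouffer's weighted Z method.

    p_values: per-study one-sided p-values, each strictly between 0 and 1.
    weights:  optional per-study weights (hypothetical choice: the square
              root of each study's sample size); defaults to equal weights.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    # Convert each p-value to a Z score (small p gives large positive Z).
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]

    # Weighted sum of Z scores, rescaled so it is again standard normal
    # under the null hypothesis.
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator

    return 1.0 - _STD_NORMAL.cdf(z_combined)


if __name__ == "__main__":
    # Hypothetical locus-wide p-values from three populations.
    print(stouffer_combined_p([0.04, 0.20, 0.01], weights=[30.0, 20.0, 40.0]))
```

Because only p-values are combined, this kind of rule does not require the same variant to be causal in every population, which is one reason it is often considered when allelic heterogeneity is expected.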
Statistical and Computational Methods for Analyzing and Visualizing Large-Scale Genomic Datasets
Advances in large-scale genomic data production have led to a need for better methods to process, interpret, and organize these data. Starting with raw sequencing data, generating results requires many complex data processing steps, from quality control, alignment, and variant calling to genome-wide association studies (GWAS) and characterization of expression quantitative trait loci (eQTL). In this dissertation, I present methods to address issues faced when working with large-scale genomic datasets. In Chapter 2, I present an analysis of 4,787 whole genomes sequenced for the study of age-related macular degeneration (AMD) as a follow-up fine-mapping study to previous work from the International AMD Genomics Consortium (IAMDGC). Through whole genome sequencing, we comprehensively characterized genetic variants associated with AMD in known loci to provide additional insights on the variants potentially responsible for the disease by leveraging 60,706 additional controls. Our study improved the understanding of loci associated with AMD and demonstrated the advantages and disadvantages of different approaches for fine-mapping studies with sequence-based genotypes. In Chapter 3, I describe a novel method and a software tool to perform Hardy-Weinberg equilibrium (HWE) tests for structured populations. In sequence-based genetic studies, HWE test statistics are important quality metrics to distinguish true genetic variants from artifactual ones, but they become much less informative when applied to a heterogeneous and/or structured population. As next generation sequencing studies contain samples from increasingly diverse ancestries, we developed a new HWE test which addresses both the statistical and computational challenges of modern large-scale sequencing data and implemented the method in a publicly available software tool. Moreover, we extensively evaluated our proposed method against alternative methods for testing HWE in both simulated and real datasets. Our method has been successfully applied to the latest variant calling QC pipeline in the TOPMed project. In Chapter 4, I describe PheGET, a web application to interactively visualize Expression Quantitative Trait Loci (eQTLs) across tissues, genes, and regions to aid functional interpretations of regulatory variants. Tissue-specific expression has become increasingly important for understanding the links between genetic variation and disease. To address this need, the Genotype-Tissue Expression (GTEx) project collected and analyzed a treasure trove of expression data. However, effectively navigating this wealth of data to find signals relevant to researchers has become a major challenge. I demonstrate the functionalities of PheGET using the newest GTEx data on our eQTL browser website at https://eqtl.pheweb.org/, allowing the user to 1) view all cis-eQTLs for a single variant; 2) view and compare single-tissue, single-gene associations within any genomic region; 3) find the best eQTL signal in any given genomic region or gene; and 4) customize the plotted data in real time. PheGET is designed to handle and display the kind of complex multidimensional data often seen in our post-GWAS era, such as multi-tissue expression data, in an intuitive and convenient interface, giving researchers an additional tool to better understand the links between genetics and disease.
PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/162918/1/amkwong_1.pd
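For readers unfamiliar with HWE testing, the sketch below shows the classical one-degree-of-freedom chi-square test for a single biallelic variant in one unstructured population. This is the textbook baseline only, not the structured-population method developed in the dissertation, and the function name hwe_chi_square and the genotype counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Classical chi-square HWE test from genotype counts AA, AB, BB.

    Returns (chi2, p_value). Assumes one homogeneous population; with
    population structure this test is known to be miscalibrated.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequency of A estimated from the observed genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # A monomorphic site carries no information about HWE.
    if min(expected) == 0:
        return 0.0, 1.0

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x/2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts with a deficit of heterozygotes.
    print(hwe_chi_square(60, 20, 20))
```

A heterozygote deficit like the one in the example can arise either from genotyping artifacts or from mixing subpopulations with different allele frequencies, which is why a plain test becomes hard to interpret in structured samples.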
Population Structure and Cryptic Relatedness in Genetic Association Studies
We review the problem of confounding in genetic association studies, which
arises principally because of population structure and cryptic relatedness.
Many treatments of the problem consider only a simple "island" model of
population structure. We take a broader approach, which views population
structure and cryptic relatedness as different aspects of a single confounder:
the unobserved pedigree defining the (often distant) relationships among the
study subjects. Kinship is therefore a central concept, and we review methods
of defining and estimating kinship coefficients, both pedigree-based and
marker-based. In this unified framework we review solutions to the problem of
population structure, including family-based study designs, genomic control,
structured association, regression control, principal components adjustment and
linear mixed models. The last solution makes the most explicit use of the
kinships among the study subjects, and has an established role in the analysis
of animal and plant breeding studies. Recent computational developments mean
that analyses of human genetic association data are beginning to benefit from
its powerful tests for association, which protect against population structure
and cryptic kinship, as well as intermediate levels of confounding by the
pedigree.
Comment: Published at http://dx.doi.org/10.1214/09-STS307 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
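Of the corrections listed in this abstract, genomic control is the simplest to write down. The sketch below estimates the inflation factor as the median of the observed 1-df chi-square statistics divided by the theoretical null median (about 0.4549) and rescales every statistic by it. The function name genomic_control and the input values are hypothetical, and flooring the factor at 1 is a common convention assumed here rather than something stated in the abstract.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda_gc, adjusted_stats) for 1-df association statistics.

    lambda_gc is the ratio of the observed median statistic to the
    median expected under the null; statistics are divided by it.
    """
    if not chi2_stats:
        raise ValueError("at least one test statistic is required")

    lambda_gc = median(chi2_stats) / CHI2_1DF_MEDIAN

    # Assumed convention: never inflate statistics when lambda < 1.
    lambda_gc = max(lambda_gc, 1.0)

    adjusted = [s / lambda_gc for s in chi2_stats]
    return lambda_gc, adjusted


if __name__ == "__main__":
    # Hypothetical genome-wide statistics with mild inflation.
    lam, adj = genomic_control([0.2, 0.5, 0.6, 1.1, 3.9, 0.7, 0.05])
    print(lam, adj)
```

Genomic control applies one uniform deflation to all markers, whereas the linear mixed models the review emphasizes adjust for the kinship among subjects marker by marker.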
Developing Statistical Methods for Incorporating Complexity in Association Studies
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with hundreds of human traits. Yet the common variant model tested by traditional GWAS only provides an incomplete explanation for the known genetic heritability of many traits. Many divergent methods have been proposed to address the shortcomings of GWAS, including most notably the extension of association methods into rarer variants through whole exome and whole genome sequencing. GWAS methods feature numerous simplifications designed for feasibility and ease of use, as opposed to statistical rigor. Furthermore, no systematic quantification of the performance of GWAS across all traits exists. Beyond improving the utility of data that already exist, a more thorough understanding of the performance of GWAS on common variants may elucidate flaws not in the method but rather in its implementation, which may pose a continued or growing threat to the utility of rare variant association studies now underway.
This thesis focuses on systematic evaluation and incremental improvement of GWAS modeling. We collect a rich dataset containing standardized association results from all GWAS conducted on quantitative human traits, finding that while the majority of published significant results in the field do not disclose sufficient information to determine whether the results are actually valid, those that do replicate precisely in concordance with their statistical power when conducted in samples of similar ancestry and reporting accurate per-locus sample sizes. We then look to the inability of effectively all existing association methods to handle missingness in genetic data, and show that adapting missingness theory from statistics can both increase power and provide a flexible framework for extending most existing tools with minimal effort. We finally undertake novel variant association in a schizophrenia cohort from a bottleneck population. We find that the study itself is confounded by nonrandom population sampling and identity-by-descent, manifesting as batch effects correlated with outcome that remain in novel variants after all sample-wide quality control. On the whole, these results emphasize both the past and present utility and reliability of the GWAS model, as well as the extent to which lessons from the GWAS era must inform genetic studies moving forward
Models and Methods for Genome-Wide Association Studies.
Genome-wide association (GWA) studies provide an extensive assessment of common genetic variants across the human genome for disease association. However, due to variation in allele frequencies and disease prevalence across populations, combining samples from different geographic or ethnic groups may lead to spurious evidence for association or diminish the true association signals. In part one of this dissertation, I propose a novel approach to correct for population stratification that makes use of the large amount of genetic information available in a GWA study. Based on allele-sharing identity-by-state (IBS) measures, I develop similarity scores that can describe genetic similarity between individuals, and match cases and controls accordingly. Association tests can then be performed conditional on the matched case-control groups. I apply our approach to the Pritzker bipolar GWA study.
In part two, I extend our matching approach to families of arbitrary structure. I first apply similarity score-based matching to selected members from each family and then assign other family members to the same matched group. I modify a corrected chi-square test [Bourgain et al., 2003] following the Mantel-Haenszel procedure to account for correlations both between the family samples and between the matched cases and controls.
The rapid advance in next-generation sequencing technologies allows a near-complete survey of genomic regions of interest and even whole genomes, enabling more extensive genetic association studies of rare variants. As we plan such re-sequencing studies of a complex disease, it is useful to consider the range of plausible genetic models, e.g., risk allele frequency (RAF) and genotype relative risk (GRR) of rare or less common causal variants, based on results of previous genetic linkage and association studies for the trait. In part three, I compute the power to detect linkage and/or association as a function of genetic model, and summarize the range of models likely to yield results that are consistent with existing GWA and/or linkage studies.
Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/77921/1/wguan_1.pd
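Part one of the abstract above builds similarity scores from allele-sharing identity-by-state (IBS) measures. The sketch below computes one common IBS summary, the average proportion of alleles shared across markers, for two individuals with genotypes coded as 0, 1 or 2 copies of a reference allele. The dissertation's actual score may be defined differently; the function name ibs_similarity, the genotype coding, and the handling of missing calls as None are assumptions for illustration.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Mean proportion of alleles shared identical-by-state.

    Each genotype is 0, 1 or 2 (allele count) or None if missing.
    At one marker two individuals share 2 - |g_a - g_b| alleles IBS.
    """
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype vectors must have the same length")

    shared = 0
    markers_used = 0
    for g_a, g_b in zip(genotypes_a, genotypes_b):
        # Skip markers where either individual has a missing call.
        if g_a is None or g_b is None:
            continue
        shared += 2 - abs(g_a - g_b)
        markers_used += 1

    if markers_used == 0:
        raise ValueError("no markers with genotypes in both individuals")

    # Two alleles can be shared at each marker.
    return shared / (2 * markers_used)


if __name__ == "__main__":
    # Hypothetical genotypes for a case and a candidate matched control.
    case = [0, 1, 2, 2, None, 1]
    control = [0, 2, 2, 1, 1, 1]
    print(ibs_similarity(case, control))
```

With pairwise scores of this kind, cases and controls can be grouped by genetic similarity before performing association tests conditional on the matched groups, as the abstract describes.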
Breast Cancer Risk, Fungicide Exposure and CYP1A1*2A Gene-Environment Interactions in a Province-Wide Case Control Study in Prince Edward Island, Canada
Scientific certainty regarding environmental toxin-related etiologies of breast cancer, particularly among women with genetic polymorphisms in estrogen metabolizing enzymes, is lacking. Fungicides have been recognized for their carcinogenic potential, yet there is a paucity of epidemiological studies examining the health risks of these agents. The association between agricultural fungicide exposure and breast cancer risk was examined in a secondary analysis of a province-wide breast cancer case-control study in Prince Edward Island (PEI), Canada. Specific objectives were: (1) to derive and examine the level of association between estimated fungicide exposures and breast cancer risk among women in PEI; and (2) to assess the potential for gene-environment interactions between fungicide exposure and a CYP1A1 polymorphism in cases versus controls. After 1:3 matching of 207 cases to 621 controls by age, family history of breast cancer and menopausal status, fungicide exposure was not significantly associated with an increased risk of breast cancer (OR = 0.74; 95% CI: 0.46–1.17). Moreover, no statistically significant interactions between fungicide exposure and CYP1A1*2A were observed. Gene-environment interactions were identified. Though interpretations of findings are challenged by uncertainty of exposure assignment and small sample sizes, this study does provide grounds for further research.
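To make the reported effect measure concrete, the sketch below computes an odds ratio and a Wald 95% confidence interval from a plain 2x2 exposure table. It ignores the 1:3 matching used in the study, which would normally call for a conditional analysis, so it is a simplified illustration only. The function name odds_ratio_wald_ci and the cell counts are hypothetical and are not taken from the study.

```python
from math import exp, log, sqrt


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval. Returns (or, lower, upper).
    """
    cells = (exposed_cases, unexposed_cases,
             exposed_controls, unexposed_controls)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cells must be positive")

    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)

    # Standard error of log(OR) from the four cell counts.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not data from the PEI study.
    print(odds_ratio_wald_ci(30, 170, 110, 500))
```

An interval that contains 1, like the one reported in the abstract, is what "not significantly associated" means at the corresponding significance level.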
Genome-wide association study of primary open-angle glaucoma in continental and admixed African populations.
Primary open-angle glaucoma (POAG) is a complex disease with a major genetic contribution. Its prevalence varies greatly among ethnic groups, and is up to five times more frequent in black African populations compared to Europeans. So far, worldwide efforts to elucidate the genetic complexity of POAG in African populations have been limited. We conducted a genome-wide association study in 1113 POAG cases and 1826 controls from Tanzanian, South African and African American study samples. Apart from confirming evidence of association at TXNRD2 (rs16984299; OR[T] 1.20; P = 0.003), we found that a genetic risk score combining the effects of the 15 previously reported POAG loci was significantly associated with POAG in our samples (OR 1.56; 95% CI 1.26-1.93; P = 4.79 × 10^-5). By genome-wide association testing we identified a novel candidate locus, rs141186647, harboring EXOC4 (OR[A] 0.48; P = 3.75 × 10^-8), a gene transcribing a component of the exocyst complex involved in vesicle transport. The low frequency and high degree of genetic heterogeneity at this region hampered validation of this finding in predominantly West-African replication sets. Our results suggest that established genetic risk factors play a role in African POAG; however, they do not explain the higher disease load. The high heterogeneity within Africans remains a challenge to identify the genetic commonalities for POAG in this ethnicity, and demands studies of extremely large size.
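The genetic risk score mentioned in this abstract combines the effects of previously reported loci. One common construction, assumed here because the abstract does not give the formula, is a weighted sum of risk-allele counts with each locus weighted by its log odds ratio. The function name weighted_grs and the dosages and odds ratios below are hypothetical.

```python
from math import log


def weighted_grs(risk_allele_counts, odds_ratios):
    """Weighted genetic risk score for one individual.

    risk_allele_counts: copies (0, 1 or 2) of the risk allele per locus.
    odds_ratios:        per-allele odds ratios from earlier studies,
                        used as weights on the log scale.
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per locus")
    if any(o <= 0 for o in odds_ratios):
        raise ValueError("odds ratios must be positive")

    # Sum of log(OR) times the number of risk alleles carried.
    return sum(count * log(o)
               for count, o in zip(risk_allele_counts, odds_ratios))


if __name__ == "__main__":
    # Hypothetical individual genotyped at three loci.
    print(weighted_grs([2, 0, 1], [1.20, 1.35, 1.10]))
```

The resulting score can then be used as a single predictor in a logistic regression of case status, which is one way an overall odds ratio for the score can be obtained.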