
    Quantifying and correcting for the winner's curse in quantitative-trait association studies

    Quantitative traits (QTs) are an important focus of human genetic studies, both because of interest in the traits themselves and because of their role as risk factors for many human diseases. In large-scale QT association studies, including genome-wide association studies, investigators usually focus on genetic loci that show significant evidence of SNP-QT association, so genetic effect sizes tend to be overestimated as a consequence of the winner's curse. In this paper, we study the impact of the winner's curse on QT association studies in which the genetic effect size is parameterized as the slope in a linear regression model. We demonstrate by analytical calculation that the overestimation of the regression slope decreases as power increases. To reduce this ascertainment bias, we propose a three-parameter maximum likelihood method and then simplify it to a one-parameter method by excluding nuisance parameters. We show that both methods reduce the bias when power to detect association is low or moderate, and that the one-parameter method generally yields a smaller variance of the estimate. Genet. Epidemiol. 35:133-138, 2011. © 2011 Wiley-Liss, Inc.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/83456/1/20551_ftp.pd
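    Below is a minimal sketch of a one-parameter conditional-likelihood correction. It assumes the slope estimate is approximately normal with a known standard error, so that z = beta_hat / SE ~ N(mu, 1), and that a SNP is reported only when |z| > c. The threshold c, the function names, and the bisection solver are illustrative assumptions. This is a generic truncated-normal estimator in the spirit of the abstract, not necessarily the authors' exact method.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x: float) -> float:
    """Upper tail P(Z > x); erfc keeps precision far out in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def selection_prob(mu: float, c: float) -> float:
    """P(|Z| > c) when Z ~ N(mu, 1): the chance a SNP passes the threshold."""
    return norm_sf(c - mu) + norm_sf(c + mu)


def conditional_score(mu: float, z: float, c: float) -> float:
    """d/dmu of log[phi(z - mu) / P(|Z| > c)], the selection-adjusted log-likelihood."""
    d_select = norm_pdf(c - mu) - norm_pdf(c + mu)
    return (z - mu) - d_select / selection_prob(mu, c)


def corrected_z(z: float, c: float, tol: float = 1e-10) -> float:
    """Conditional MLE of mu given that |z| > c was observed."""
    if abs(z) <= c:
        raise ValueError("|z| must exceed the threshold c")
    if z < 0:
        return -corrected_z(-z, c, tol)  # the problem is symmetric in the sign of z
    # The score equals z > 0 at mu = 0, is negative at mu = z, and decreases in mu,
    # so bisection on [0, z] finds the unique root.
    lo, hi = 0.0, z
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_score(mid, z, c) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def corrected_slope(beta_hat: float, se: float, c: float) -> float:
    """Rescale the corrected z back to the slope scale (SE treated as known)."""
    return corrected_z(beta_hat / se, c) * se
```

    Take a SNP just past a genome-wide threshold (c of about 5.45 for two-sided p < 5e-8). Its corrected z is pulled noticeably below the observed value. For a very large z the correction nearly vanishes, which matches the abstract's point that overestimation shrinks as power grows.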

    Using Ontology Fingerprints to evaluate genome-wide association study results

    We describe an approach to characterize genes or phenotypes via ontology fingerprints, which are composed of Gene Ontology (GO) terms overrepresented among the PubMed abstracts linked to those genes or phenotypes. We then quantify the biological relevance between genes and phenotypes by comparing their ontology fingerprints to calculate a similarity score. We validated this approach by identifying the genes that belong to their biological pathways with high accuracy. We then applied it to evaluate a GWA study by ranking genes associated with plasma lipid concentrations, and to prioritize genes within a linkage disequilibrium (LD) block. The genes with the highest scores were ABCA1, LPL, and CETP for HDL; LDLR, APOE, and APOB for LDL; and LPL, APOA1, and APOB for triglycerides. In addition, we identified some top-ranked genes linked to lipid metabolism in the literature even where this knowledge was not reflected in the current annotation of these genes. These results demonstrate that ontology fingerprints can be used effectively to prioritize genes from GWA studies for experimental validation.
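    The abstract does not specify the enrichment test or the similarity score. The sketch below makes three assumptions:

    - a one-sided hypergeometric test for term overrepresentation;
    - -log10(p) as each term's weight;
    - cosine similarity between fingerprints.

    The function names, the alpha cutoff, and any term counts used with it are hypothetical.

```python
import math
from typing import Dict


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K term-tagged abstracts, n draws)."""
    upper = min(n, K)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


def fingerprint(term_counts: Dict[str, int], n_abstracts: int,
                background_counts: Dict[str, int], n_background: int,
                alpha: float = 0.01) -> Dict[str, float]:
    """Keep GO terms overrepresented in one gene's abstracts, weighted by -log10(p)."""
    fp = {}
    for term, k in term_counts.items():
        p = hypergeom_sf(k, n_background, background_counts[term], n_abstracts)
        if p < alpha:
            fp[term] = -math.log10(p)
    return fp


def similarity(fp_a: Dict[str, float], fp_b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse fingerprints (0.0 if either is empty)."""
    dot = sum(w * fp_b[t] for t, w in fp_a.items() if t in fp_b)
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

    With this kind of score, two genes whose abstracts enrich the same GO terms score close to 1. Genes with no enriched terms in common score 0, which is what lets candidate genes be ranked against a phenotype's fingerprint.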

    Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results

    Individual sequencing studies often have limited sample sizes and therefore limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For rare variants, jointly calling all samples together is the gold-standard strategy, but it can be difficult to implement because of privacy restrictions and computational burden. Here, we compare joint calling with the alternative of single‐study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage. We also assess their impact on downstream association analysis. To do so, we analyze deep‐coverage (~82×) exome and low‐coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study, both jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs), we find that:

    (a) ≥97% of discovered SNVs are found by both calling strategies;
    (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies;
    (c) meta‐analysis has power similar to joint analysis in deep‐coverage sequence data but can be less powerful in low‐coverage sequence data.

    Given similar data processing and quality control steps, we recommend single‐study calling as a viable alternative to joint calling for analyzing SNVs across the full range of minor allele frequencies in deep‐coverage data.
    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/153654/1/gepi22261_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/153654/2/gepi22261.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/153654/3/gepi22261-sup-0002-final_revised_supp_figures_7_19_2019.pd
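    Nonreference concordance has several definitions in practice. Below is a minimal sketch of one common version. It assumes genotypes coded as alternate-allele counts (0, 1, 2), with None for a missing call. Sites where both genotype sets are homozygous reference are skipped. They are abundant for rare variants and would push concordance toward 1 regardless of calling quality. The function name is hypothetical.

```python
from typing import Optional, Sequence


def nonreference_concordance(calls: Sequence[Optional[int]],
                             truth: Sequence[Optional[int]]) -> float:
    """Fraction of agreeing genotypes among sites where either set is non-reference."""
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")
    informative = 0
    matches = 0
    for c, t in zip(calls, truth):
        if c is None or t is None:
            continue  # a site missing in either set cannot be compared
        if c == 0 and t == 0:
            continue  # hom-ref agreements are excluded by definition
        informative += 1
        if c == t:
            matches += 1
    if informative == 0:
        raise ValueError("no non-reference genotypes to compare")
    return matches / informative
```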

    Evaluating the Calibration and Power of Three Gene‐Based Association Tests of Rare Variants for the X Chromosome

    Although genome‐wide association studies (GWAS) have identified thousands of trait‐associated genetic variants, there are relatively few findings on the X chromosome. For analysis of low‐frequency variants (minor allele frequency <5%), investigators can use region‐ or gene‐based tests, in which multiple variants are analyzed jointly to increase power. To date, no gene‐based tests have been designed for association testing of low‐frequency variants on the X chromosome. Here we propose three gene‐based tests for the X chromosome: burden, the sequence kernel association test (SKAT), and optimal unified SKAT (SKAT‐O). Using simulated case‐control and quantitative trait (QT) data, we evaluate the calibration and power of these tests as a function of (1) the male:female sample size ratio and (2) the coding of haploid male genotypes for variants under X‐inactivation. For case‐control studies, all three tests are reasonably well calibrated in all scenarios we evaluated. As expected, power for gene‐based tests depends on the underlying genetic architecture of the genomic region analyzed. Studies with more (haploid) males are generally less powerful because they contain fewer X chromosomes. Power is generally slightly greater when the coding scheme for male genotypes matches the true underlying model, but the power loss from misspecifying the (generally unknown) model is small. For QT studies, type I error and power results largely mirror those for binary traits. We demonstrate the use of these three gene‐based tests for X‐chromosome association analysis in simulated data and in sequencing data from the Genetics of Type 2 Diabetes (GoT2D) study.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/115905/1/gepi21935.pd
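    Below is a minimal sketch of a burden score test for a quantitative trait with no covariates, where male genotypes are coded either 0/1 or 0/2. The 0/2 coding treats a hemizygous male carrier like a homozygous female, a common convention under X‐inactivation. The function names, the coding labels, and the covariate-free null model are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple


def code_genotype(count: int, is_male: bool, male_coding: str) -> int:
    """Dosage for one variant; males carry one X, so their raw count is 0 or 1."""
    if not is_male:
        return count
    if count not in (0, 1):
        raise ValueError("male X-chromosome counts must be 0 or 1")
    if male_coding == "0/2":
        return 2 * count  # X-inactivation: male carrier ~ homozygous female
    if male_coding == "0/1":
        return count
    raise ValueError("male_coding must be '0/1' or '0/2'")


def burden_scores(genotypes: Sequence[Sequence[int]], is_male: Sequence[bool],
                  male_coding: str, weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted sum of coded minor-allele dosages per individual."""
    w = weights if weights is not None else [1.0] * len(genotypes[0])
    return [sum(wj * code_genotype(g, male, male_coding) for wj, g in zip(w, row))
            for row, male in zip(genotypes, is_male)]


def burden_score_test(y: Sequence[float], burden: Sequence[float]) -> Tuple[float, float]:
    """Score test of the burden slope in y ~ 1 + burden; returns (chi2_1 stat, p)."""
    n = len(y)
    y_bar = sum(y) / n
    b_bar = sum(burden) / n
    s_bb = sum((b - b_bar) ** 2 for b in burden)
    if s_bb == 0.0:
        raise ValueError("burden is constant (no carriers or all carriers)")
    u = sum((yi - y_bar) * (b - b_bar) for yi, b in zip(y, burden))
    sigma2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)  # residual variance under the null
    stat = u * u / (sigma2 * s_bb)
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p
```

    In an all-male sample the 0/1 and 0/2 codings differ only by a factor of 2, so the statistic is unchanged. The two codings diverge only when males and females are analyzed together.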

    Multi‐SKAT: General framework to test for rare‐variant association with multiple phenotypes

    In genetic association analysis, a joint test of multiple distinct phenotypes can increase power to identify sets of trait‐associated variants within genes or regions of interest. Existing multiphenotype tests for rare variants make specific assumptions about the patterns of association with underlying causal variants, and violations of these assumptions can reduce power to detect association. Here, we develop a general framework for testing pleiotropic effects of rare variants on multiple continuous phenotypes using multivariate kernel regression (Multi‐SKAT). Multi‐SKAT models the effect sizes of variants on the phenotypes through a kernel matrix and performs a variance component test of association. We show that many existing tests are equivalent to specific choices of kernel matrices within the Multi‐SKAT framework. To increase power to detect association across tests with different kernel matrices, we develop a fast and accurate approximation to the significance of the minimum observed P value across tests. To account for related individuals, our framework uses random effects for the kinship matrix. Using simulated data, together with amino acid and exome‐array data from the METabolic Syndrome In Men (METSIM) study, we show that Multi‐SKAT can improve power over the single‐phenotype SKAT‐O test and existing multiple‐phenotype tests while maintaining the Type I error rate.
    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/147759/1/gepi22156.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/147759/2/gepi22156_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/147759/3/gepi22156-sup-0001-Supplementary_GenEpi_Revision_Final.pd
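    A multivariate kernel statistic can be written as Q = vec(R)ᵀ(Σ_P ⊗ G W W Gᵀ)vec(R). Here R holds the null-model residuals, G the genotypes, W the diagonal variant weights, and Σ_P a phenotype kernel. This is one standard form, and its exact match to the paper is an assumption. The sketch below makes three further simplifications:

    - an intercept-only null model with unrelated individuals, so there are no covariates and no kinship random effect;
    - a user-supplied Σ_P;
    - a permutation p-value in place of the authors' analytic approximation, which this sketch does not implement.

    All names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def center(y: Sequence[float]) -> List[float]:
    """Residuals under an intercept-only null model."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]


def score_vector(resid: Sequence[float], genos: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """s = W G^T r: one weighted score per variant."""
    m = len(genos[0])
    return [weights[j] * sum(row[j] * r for row, r in zip(genos, resid)) for j in range(m)]


def multi_skat_q(phenos: Sequence[Sequence[float]], genos: Sequence[Sequence[float]],
                 sigma_p: Sequence[Sequence[float]],
                 weights: Optional[Sequence[float]] = None) -> float:
    """Q = sum_{k,l} Sigma_P[k][l] * (s_k . s_l), equal to the Kronecker form above."""
    w = weights if weights is not None else [1.0] * len(genos[0])
    scores = [score_vector(center(y), genos, w) for y in phenos]
    q = 0.0
    for a in range(len(phenos)):
        for b in range(len(phenos)):
            q += sigma_p[a][b] * sum(x * z for x, z in zip(scores[a], scores[b]))
    return q


def permutation_pvalue(phenos, genos, sigma_p, n_perm: int = 999, seed: int = 1,
                       weights=None) -> float:
    """Permute genotype rows so each person's phenotypes stay together."""
    observed = multi_skat_q(phenos, genos, sigma_p, weights)
    rng = random.Random(seed)
    idx = list(range(len(genos)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        permuted = [genos[i] for i in idx]
        if multi_skat_q(phenos, permuted, sigma_p, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

    With Σ_P set to the identity, Q reduces to the sum of per-phenotype SKAT statistics. This illustrates the abstract's point that existing tests correspond to particular kernel choices.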

    Recommended Joint and Meta‐Analysis Strategies for Case‐Control Association Testing of Single Low‐Count Variants

    In genome‐wide association studies of binary traits, investigators typically use logistic regression to test common variants for disease association within studies, and combine association results across studies using meta‐analysis. For common variants, logistic regression tests are well calibrated, and meta‐analysis of study‐specific association results is only slightly less powerful than joint analysis of the combined individual‐level data. In recent sequencing and dense‐chip‐based association studies, investigators increasingly test low‐frequency variants for disease association. In this paper, we seek to (1) identify the association test with maximal power among tests with a well‐controlled type I error rate and (2) compare the relative power of joint and meta‐analysis tests. We use analytic calculation and simulation to compare the empirical type I error rate and power of four logistic‐regression‐based tests: Wald, score, likelihood ratio, and Firth bias‐corrected. We demonstrate the following for low‐count variants (roughly minor allele count [MAC] < 400):

    (1) for joint analysis, the Firth test has the best combination of type I error and power;
    (2) for meta‐analysis of balanced studies (equal numbers of cases and controls), the score test is best, but it is less powerful than Firth‐test‐based joint analysis;
    (3) for meta‐analysis of sufficiently unbalanced studies, all four tests can be anti‐conservative, particularly the score test.

    We also establish MAC as the key parameter determining test calibration for joint and meta‐analysis.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/99692/1/gepi21742.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/99692/2/gepi21742-sup-0010-figureS1.pd
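    Below is a minimal sketch of one common way to meta-analyze score tests for a single variant: compute U and V per study, sum them across studies, and form Z = ΣU / √ΣV. It assumes an intercept-only logistic null model in each study with no covariates. It also computes MAC, which the abstract identifies as the key calibration parameter. It does not implement the Firth test. The function names are hypothetical. Per the abstract, a score-based Z like this can be anti-conservative for low-count variants in unbalanced studies, so it should not be assumed well calibrated in that regime.

```python
import math
from typing import Sequence, Tuple


def minor_allele_count(genotypes: Sequence[int]) -> int:
    """MAC from 0/1/2 alternate-allele counts."""
    alt = sum(genotypes)
    return min(alt, 2 * len(genotypes) - alt)


def logistic_score(genotypes: Sequence[int], status: Sequence[int]) -> Tuple[float, float]:
    """Score U and its variance V for the variant slope at beta = 0 (intercept-only null)."""
    n = len(status)
    p_hat = sum(status) / n  # case fraction under the null
    g_bar = sum(genotypes) / n
    u = sum(g * (y - p_hat) for g, y in zip(genotypes, status))
    v = p_hat * (1.0 - p_hat) * sum((g - g_bar) ** 2 for g in genotypes)
    return u, v


def score_meta_z(studies: Sequence[Tuple[float, float]]) -> float:
    """Combine per-study (U, V) pairs: Z = sum(U) / sqrt(sum(V))."""
    u_tot = sum(u for u, _ in studies)
    v_tot = sum(v for _, v in studies)
    return u_tot / math.sqrt(v_tot)


def two_sided_p(z: float) -> float:
    return math.erfc(abs(z) / math.sqrt(2.0))
```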

    Unplanned pregnancies: against the ideal of mastery over nature?

    "Processes of generally increasing informalization and technicization of society do not stop short of reproductive behavior. Mediated by medical and technical progress, the ideal of planning fertility behavior has displaced an originally predominant naturalness of reproduction. Current media and scientific discourses point to two sides of planning and fertility: on the one hand, planning against a child, secured through contraception, and on the other hand, planning toward a child, with medical assistance if necessary. Empirical material, however, tells a different story, pointing to processes that run counter to this ideal of planning. This contribution focuses on one of these two aspects: unplanned pregnancies. How 'natural', or unplanned, are unplanned pregnancies? The very few studies that report on unplanned pregnancies (e.g. DESIS, Frauen leben, SOEP) put their share in the Federal Republic of Germany at between 30 and 40 percent. Internationally, the results vary considerably. In an era of reliable contraception, and given that pregnancy is biologically possible only within fairly narrow time windows ('fertile days'), the question arises of how this high share of unplanned pregnancies can be explained. The usual explanations point to low education, young age, unreliable use of and little knowledge about contraceptives, or an ambiguous understanding of what planning means. In light of, for example, more recent findings on contraceptive behavior, these explanations are not really convincing. It is also conceivable that ambivalence in the desire for children influences reproductive behavior and instead points to inconsistent contraceptive practices.
    This contribution attempts to examine these aspects more closely using existing national and, where available, international studies on unplanned pregnancies. Current empirical material is provided by the data set of the so-called mini-panel of the PAIRFAM project (Panel Analysis of Intimate Relationships and Family Dynamics). It is one of the few data sets that not only explicitly surveys reproductive intentions but also prospectively collects the proximal factors (such as contraceptive behavior and attitudes toward pregnancy). It therefore promises more detailed answers about how fertility behavior is planned." (Author's abstract)