Estimating the number and size of the main effects in genome-wide case-control association studies

Abstract

It has recently become possible to screen thousands of markers to detect genetic causes of common diseases. Along with this potential come analytical challenges, and it is important to develop new statistical tools to identify markers with causal effects and to estimate their effect sizes accurately. Knowledge of the proportion of markers without true effects (p0) and of the effect sizes of markers with effects provides information to control for false discoveries and to design follow-up studies. We apply newly developed methods to simulated Genetic Analysis Workshop 15 genome-wide case-control data sets. These include a maximum likelihood (ML) approach and a quasi-ML (QML) approach, both of which incorporate the test statistic distribution and estimate effect size simultaneously with p0, and two conservative estimators of p0 that do not rely on the test statistic distribution under the alternative. Compared with four existing, commonly used estimators of p0, all of our estimators had favorable properties in terms of the standard deviation with which p0 is estimated. On average, the ML method performed slightly better than the QML method. The conservative method performed well, was even slightly more precise than the ML estimators, and can be more robust in less optimal conditions (small sample sizes and small numbers of markers). Further improvements and extensions of the proposed methods are conceivable, such as estimating the distribution of effect sizes and taking population stratification into account when obtaining estimates of p0 and effect size.
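
As a rough illustration of what estimating p0 from marker-level p-values involves, the sketch below uses a widely known threshold-based (Storey-type) estimator. It is not one of the ML, QML, or conservative estimators described above, whose details are not given here. The function names, the threshold lam = 0.5, and the toy simulation (including the way non-null p-values are generated) are assumptions made purely for illustration.

```python
import random


def estimate_p0(p_values, lam=0.5):
    """Threshold-based (Storey-type) estimate of the proportion of null markers.

    Under the null, p-values are uniform on [0, 1], so a fraction (1 - lam)
    of them is expected to exceed lam. Markers with true effects rarely have
    large p-values, so the count above lam mostly reflects null markers.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    n_above = sum(1 for p in p_values if p > lam)
    # Cap at 1 because a proportion cannot exceed 1.
    return min(1.0, n_above / ((1.0 - lam) * m))


def simulate_p_values(m, p0, rng):
    """Hypothetical toy data, not the Genetic Analysis Workshop 15 simulation."""
    out = []
    for _ in range(m):
        if rng.random() < p0:
            out.append(rng.random())       # null marker: uniform p-value
        else:
            out.append(rng.random() ** 4)  # assumption: effect skews p toward 0
    return out


rng = random.Random(15)
p_values = simulate_p_values(10000, 0.9, rng)
p0_hat = estimate_p0(p_values)
print(round(p0_hat, 3))
```

Because some markers with true effects still produce p-values above the threshold, an estimator of this kind tends to overestimate p0 slightly, which is the sense in which such estimators are usually called conservative.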
