
    Unbiased estimation of odds ratios: combining genomewide association scans with replication studies

    Odds ratios or other effect sizes estimated from genome scans are upwardly biased, because only the top-ranking associations are reported, and moreover only if they reach a defined level of significance. No unbiased estimate exists based on data selected in this fashion, but replication studies are routinely performed that allow unbiased estimation of the effect sizes. Estimation based on replication data alone is inefficient in the sense that the initial scan could, in principle, contribute information on the effect size. We propose an unbiased estimator combining information from both the initial scan and the replication study, which is more efficient than that based just on the replication. Specifically, we adjust the standard combined estimate to allow for selection by rank and significance in the initial scan. Our approach explicitly allows for multiple associations arising from a scan, and is robust to mis-specification of a significance threshold. We require replication data to be available but argue that, in most applications, estimates of effect sizes are only useful when associations have been replicated. We illustrate our approach on some recently completed scans and explore its efficiency by simulation. Genet. Epidemiol. 33:406–418, 2009. © 2009 Wiley-Liss, Inc.
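
    A minimal sketch of the "standard combined estimate" that the paper then adjusts: an inverse-variance weighted combination of the scan and replication log odds ratios. The function and argument names below are illustrative assumptions, and the selection correction that is the paper's actual contribution is not implemented here.

    import numpy as np

    def combined_log_or(beta_scan, se_scan, beta_rep, se_rep):
        # Naive inverse-variance weighted combination of the two log odds-ratio
        # estimates. beta_scan is upwardly biased because the marker was selected
        # by rank and significance in the initial scan, which the paper's
        # estimator corrects for; this sketch shows only the uncorrected step.
        w_scan, w_rep = 1.0 / se_scan**2, 1.0 / se_rep**2
        beta = (w_scan * beta_scan + w_rep * beta_rep) / (w_scan + w_rep)
        se = (w_scan + w_rep) ** -0.5
        return beta, se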

    Nonparametric and semiparametric inference on quantile lost lifespan

    A new summary measure for time-to-event data, termed the lost lifespan, is proposed; it recasts the existing concept of reversed percentile residual life, or percentile inactivity time, to show that it can be used in routine analysis to summarize life lost. The lost lifespan describes the distribution of time lost due to experiencing an event of interest before some specified time point. An estimating equation approach is adopted so that the variance of the quantile estimator can be obtained without estimating the probability density function of the underlying time-to-event distribution. A K-sample test statistic is proposed to test the ratio of quantile lost lifespans. Simulation studies are performed to assess finite-sample properties of the proposed statistic in terms of coverage probability and power. The concept of life lost is then extended to a regression setting to analyze covariate effects on the quantiles of the distribution of the lost lifespan under right censoring. An estimating equation, variance estimator, and minimum dispersion statistic for testing the significance of regression parameters are proposed and evaluated via simulation studies. The proposed approach offers several advantages over existing methods for analyzing time-to-event data and is illustrated with a breast cancer dataset from a Phase III clinical trial conducted by the National Surgical Adjuvant Breast and Bowel Project. Public Health Significance: The analysis of time-to-event data can provide important information about new treatments and therapies, particularly in clinical trial settings. The methods provided in this dissertation will allow public health researchers to analyze the effectiveness of new treatments in terms of a new summary measure, life loss. In addition to providing statistical advantages over existing methods, analyzing time-to-event data in terms of the lost lifespan provides a more straightforward interpretation beneficial to clinicians, patients, and other stakeholders.
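
    As a toy illustration of the summary measure itself (not of the dissertation's estimating-equation machinery), the q-th quantile of the lost lifespan can be computed directly under the simplifying assumption of complete, uncensored data; the function and variable names are assumptions.

    import numpy as np

    def quantile_lost_lifespan(event_times, tau, q):
        # Lost lifespan = tau - T for subjects who experience the event before
        # the time horizon tau; return its empirical q-th quantile.
        # Uncensored data only: the proposed method handles right censoring
        # through estimating equations that avoid density estimation.
        event_times = np.asarray(event_times, dtype=float)
        lost = tau - event_times[event_times <= tau]
        return np.quantile(lost, q)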

    Bayesian methods to overcome the winner's curse in genetic studies

    Parameter estimates for associated genetic variants, reported in the initial discovery samples, are often grossly inflated compared to the values observed in the follow-up replication samples. This type of bias is a consequence of the sequential procedure in which the estimated effect of an associated genetic marker must first pass a stringent significance threshold. We propose a hierarchical Bayes method in which a spike-and-slab prior is used to account for the possibility that the significant test result may be due to chance. We examine the robustness of the method using different priors corresponding to different degrees of confidence in the testing results and propose a Bayesian model averaging procedure to combine estimates produced by different models. The Bayesian estimators yield smaller variance compared to the conditional likelihood estimator and outperform the latter in studies with low power. We investigate the performance of the method with simulations and applications to four real data examples. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/), http://dx.doi.org/10.1214/10-AOAS373, by the Institute of Mathematical Statistics (http://www.imstat.org).
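
    A minimal sketch of the spike-and-slab shrinkage idea described above, assuming an illustrative normal slab N(0, tau^2) and ignoring the explicit modelling of the significance-selection step that the paper adds; all names and default values are assumptions, not the paper's specification.

    import numpy as np
    from scipy.stats import norm

    def spike_slab_posterior_mean(beta_hat, se, p0=0.5, tau=0.2):
        # Marginal likelihood of the observed estimate under the spike (beta = 0)
        # and under the slab (beta ~ N(0, tau^2)); the usual normal shrinkage
        # factor is then weighted by the posterior probability that the effect
        # is real, pulling chance findings strongly toward zero.
        m_spike = norm.pdf(beta_hat, loc=0.0, scale=se)
        m_slab = norm.pdf(beta_hat, loc=0.0, scale=np.sqrt(se**2 + tau**2))
        w_slab = (1 - p0) * m_slab / (p0 * m_spike + (1 - p0) * m_slab)
        return w_slab * (tau**2 / (tau**2 + se**2)) * beta_hat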

    Measuring reproducibility of high-throughput experiments

    Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and to identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. The curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call the "irreproducible discovery rate" (IDR), analogous to the FDR. This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds both for assessing reproducibility and for combining replicates. Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures in a wide variety of situations. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment. Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/), http://dx.doi.org/10.1214/11-AOAS466, by the Institute of Mathematical Statistics (http://www.imstat.org).
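
    A rank-based consistency curve of the kind described above can be sketched as follows; this is only the descriptive ingredient, not the copula mixture fit that yields the IDR, and the function name and grid size are assumptions.

    import numpy as np

    def correspondence_curve(score1, score2, n_grid=100):
        # For each fraction t, the proportion of all findings that rank in the
        # top t-fraction of *both* replicates; a breakdown of consistency shows
        # up as the curve flattening away from the diagonal.
        score1, score2 = np.asarray(score1), np.asarray(score2)
        n = len(score1)
        r1 = np.argsort(np.argsort(-score1))  # rank 0 = strongest signal
        r2 = np.argsort(np.argsort(-score2))
        ts = np.linspace(1.0 / n_grid, 1.0, n_grid)
        psi = np.array([np.mean((r1 < t * n) & (r2 < t * n)) for t in ts])
        return ts, psi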

    An adaptive significance threshold criterion for massive multiple hypotheses testing

    This research deals with massive multiple hypothesis testing. First, regarding multiple tests as an estimation problem under a proper population model, an error measurement called the Erroneous Rejection Ratio (ERR) is introduced and related to the False Discovery Rate (FDR). ERR is an error measurement similar in spirit to FDR, and it greatly simplifies the analytical study of error properties of multiple test procedures. Next, an improved estimator of the proportion of true null hypotheses and a data-adaptive significance threshold criterion are developed. Some asymptotic error properties of the significance threshold criterion are established in terms of ERR under distributional assumptions widely satisfied in recent applications. A simulation study provides clear evidence that the proposed estimator of the proportion of true null hypotheses outperforms the existing estimators of this important parameter in massive multiple tests. Both analytical and simulation studies indicate that the proposed significance threshold criterion can provide a reasonable balance between the amounts of false positive and false negative errors, thereby complementing and extending the various FDR control procedures. S-Plus/R code is available from the author upon request. Published at http://dx.doi.org/10.1214/074921706000000392 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
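
    For orientation only, a standard Storey-type estimator of the proportion of true null hypotheses and an adaptive Benjamini-Hochberg-style cutoff built on it are sketched below. These are stand-ins, not the paper's improved estimator or its ERR-based threshold criterion, and the tuning values (lambda = 0.5, alpha = 0.05) are assumptions.

    import numpy as np

    def storey_pi0(pvalues, lam=0.5):
        # Standard Storey-type estimate: pi0 ~= #{p > lambda} / ((1 - lambda) * m).
        return np.mean(np.asarray(pvalues) > lam) / (1.0 - lam)

    def adaptive_threshold(pvalues, alpha=0.05, lam=0.5):
        # Adaptive BH cutoff using the pi0 estimate: reject all p-values at or
        # below the largest p_(i) satisfying p_(i) <= i * alpha / (m * pi0).
        p = np.sort(np.asarray(pvalues))
        m = len(p)
        pi0 = min(1.0, storey_pi0(p, lam))
        ok = p <= alpha * np.arange(1, m + 1) / (m * pi0)
        return p[ok].max() if ok.any() else 0.0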

    A review of applied methods in Europe for flood-frequency analysis in a changing environment

    The report presents a review of methods used in Europe for trend analysis, climate change projections and non-stationary analysis of extreme precipitation and flood frequency. In addition, the main findings of the analyses are presented, including a comparison of trend analysis results and climate change projections. Existing guidelines in Europe on design flood and design rainfall estimation that incorporate climate change are reviewed. The report concludes with a discussion of research needs on non-stationary frequency analysis for considering the effects of climate change and inclusion in design guidelines.

    Trend analyses are reported for 21 countries in Europe with results for extreme precipitation, extreme streamflow or both. A large number of national and regional trend studies have been carried out. Most studies are based on statistical methods applied to individual time series of extreme precipitation or extreme streamflow using the non-parametric Mann-Kendall trend test or regression analysis. Some studies use field significance or regional consistency tests to analyse trends over larger areas, and some also include analysis of trend attribution. The studies reviewed indicate that there is some evidence of a general increase in extreme precipitation, whereas there are no clear indications of significant increasing trends in extreme streamflow at regional or national level. For some smaller regions increases in extreme streamflow are reported. Several studies from regions dominated by snowmelt-induced peak flows report decreases in extreme streamflow and earlier spring snowmelt peak flows.

    Climate change projections have been reported for 14 countries in Europe with results for extreme precipitation, extreme streamflow or both. The review shows various approaches for producing climate projections of extreme precipitation and flood frequency based on alternative climate forcing scenarios, climate projections from available global and regional climate models, methods for statistical downscaling and bias correction, and alternative hydrological models. A large number of the reported studies are based on an ensemble modelling approach that uses several climate forcing scenarios and climate model projections in order to address the uncertainty in the projections of extreme precipitation and flood frequency. Some studies also include alternative statistical downscaling and bias correction methods and hydrological modelling approaches. Most studies reviewed indicate an increase in extreme precipitation under a future climate, which is consistent with the observed trend in extreme precipitation. Hydrological projections of peak flows and flood frequency show both positive and negative changes. Large increases in peak flows are reported for some catchments with rainfall-dominated peak flows, whereas a general decrease in flood magnitude and earlier spring floods are reported for catchments with snowmelt-dominated peak flows. The latter is consistent with the observed trends.

    The review of existing guidelines in Europe on design floods and design rainfalls shows that only a few countries explicitly address climate change. These design guidelines are based on climate change adjustment factors to be applied to current design estimates and may depend on the design return period and projection horizon. The review indicates a gap between the need for considering climate change impacts in design and the published guidelines that actually incorporate climate change in extreme precipitation and flood frequency. Most of the studies reported are based on frequency analysis assuming stationary conditions in a certain time window (typically 30 years) representing the current or future climate. There is a need for developing more consistent non-stationary frequency analysis methods that can account for the transient nature of a changing climate.
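
    Since most of the national studies reviewed rely on the non-parametric Mann-Kendall test applied to individual series, a minimal version is sketched below; it omits the tie and serial-correlation corrections that many hydrological applications require, and the function name is an assumption.

    import numpy as np
    from scipy.stats import norm

    def mann_kendall(x):
        # S statistic: number of increasing minus decreasing pairs in the series
        # (e.g. annual maximum streamflow or precipitation), with the normal
        # approximation for the two-sided p-value; no correction for ties.
        x = np.asarray(x, dtype=float)
        n = len(x)
        s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18.0
        z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
        p = 2.0 * (1.0 - norm.cdf(abs(z)))
        return s, z, p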