Unbiased estimation of odds ratios: combining genomewide association scans with replication studies
Odds ratios or other effect sizes estimated from genome scans are upwardly biased, because only the top-ranking associations are reported, and moreover only if they reach a defined level of significance. No unbiased estimate exists based on data selected in this fashion, but replication studies are routinely performed that allow unbiased estimation of the effect sizes. Estimation based on replication data alone is inefficient in the sense that the initial scan could, in principle, contribute information on the effect size. We propose an unbiased estimator combining information from both the initial scan and the replication study, which is more efficient than one based on the replication alone. Specifically, we adjust the standard combined estimate to allow for selection by rank and significance in the initial scan. Our approach explicitly allows for multiple associations arising from a scan, and is robust to mis-specification of a significance threshold. We require replication data to be available but argue that, in most applications, estimates of effect sizes are only useful when associations have been replicated. We illustrate our approach on some recently completed scans and explore its efficiency by simulation. Genet. Epidemiol. 33:406–418, 2009. © 2009 Wiley-Liss, Inc.
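The selection bias described above is easy to see in a toy simulation. The sketch below (hypothetical numbers, not from the paper) gives every marker the same small true effect, then "reports" only the estimates that pass a genome-wide significance threshold: the reported estimates are grossly inflated even though each individual estimate is unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values, not taken from the paper:
true_beta = 0.10      # true log-odds ratio, identical for every marker
se = 0.05             # standard error of each scan estimate
n_markers = 100_000
z_crit = 5.45         # roughly genome-wide significance (two-sided p < 5e-8)

beta_hat = rng.normal(true_beta, se, n_markers)
selected = beta_hat / se > z_crit   # only significant hits get reported

print("true effect:          ", true_beta)
print("mean over all markers:", round(float(beta_hat.mean()), 3))
print("mean over selected:   ", round(float(beta_hat[selected].mean()), 3))
```

The full set of estimates averages to the truth, but conditioning on significance selects estimates whose noise happened to push them upward, which is exactly why unadjusted scan estimates overstate replication effect sizes.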
Nonparametric and semiparametric inference on quantile lost lifespan
A new summary measure for time-to-event data, termed lost lifespan, is proposed in which the existing concept of reversed percentile residual life, or percentile inactivity time, is recast to show that it can be used in routine analysis to summarize life lost. The lost lifespan describes the distribution of time lost due to experiencing an event of interest before some specified time point. An estimating equation approach is adopted to estimate the variance of the quantile estimator while avoiding estimation of the probability density function of the underlying time-to-event distribution. A K-sample test statistic is proposed to test the ratio of quantile lost lifespans. Simulation studies are performed to assess finite-sample properties of the proposed statistic in terms of coverage probability and power. The concept of life lost is then extended to a regression setting to analyze covariate effects on the quantiles of the distribution of the lost lifespan under right censoring. An estimating equation, variance estimator, and minimum dispersion statistic for testing the significance of regression parameters are proposed and evaluated via simulation studies. The proposed approach reveals several advantages over existing methods for analyzing time-to-event data, which is illustrated with a breast cancer dataset from a Phase III clinical trial conducted by the National Surgical Adjuvant Breast and Bowel Project.
Public Health Significance: The analysis of time-to-event data can provide important information about new treatments and therapies, particularly in clinical trial settings. The methods provided in this dissertation will allow public health researchers to analyze the effectiveness of new treatments in terms of a new summary measure, life lost. In addition to providing statistical advantages over existing methods, analyzing time-to-event data in terms of the lost lifespan provides a more straightforward interpretation beneficial to clinicians, patients, and other stakeholders.
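The basic quantity is simple to compute when there is no censoring. The hypothetical sketch below fixes a horizon t_star and treats t_star − T as the lifespan lost by a subject whose event occurs at time T before the horizon; the paper's estimating-equation machinery exists precisely to handle the right-censored case that this toy version ignores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical illustration of the "lost lifespan" summary for UNCENSORED
# data: a subject with event time T < t_star loses t_star - T time units
# before the horizon.  (The paper's estimating-equation approach handles
# right censoring; this toy version does not.)
t_star = 10.0
event_times = rng.exponential(scale=8.0, size=5000)   # synthetic event times

lost = np.clip(t_star - event_times, 0.0, None)       # zero for survivors past t_star
median_lost = np.quantile(lost[lost > 0], 0.5)        # median lost lifespan among events

print("P(event before t*):", round(float((event_times < t_star).mean()), 2))
print("median lost lifespan among events:", round(float(median_lost), 2))
```

Quantiles of the lost-lifespan distribution read directly in time units ("half of the patients who die before year 10 lose more than m years"), which is the interpretability advantage the dissertation emphasizes.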
Bayesian methods to overcome the winner's curse in genetic studies
Parameter estimates for associated genetic variants, reported in the initial discovery samples, are often grossly inflated compared to the values observed in the follow-up replication samples. This type of bias is a consequence of the sequential procedure in which the estimated effect of an associated genetic marker must first pass a stringent significance threshold. We propose a hierarchical Bayes method in which a spike-and-slab prior is used to account for the possibility that the significant test result may be due to chance. We examine the robustness of the method using different priors corresponding to different degrees of confidence in the testing results and propose a Bayesian model averaging procedure to combine estimates produced by different models. The Bayesian estimators yield smaller variance compared to the conditional likelihood estimator and outperform the latter in studies with low power. We investigate the performance of the method with simulations and applications to four real data examples.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS373 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
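A minimal sketch of the spike-and-slab idea, with illustrative prior settings that are not the paper's: the effect is zero with prior probability p_spike (the hit is a chance finding), otherwise drawn from a normal slab, and the posterior mean both downweights and shrinks the observed estimate.

```python
import math

def normal_pdf(x, sd):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def spike_slab_posterior_mean(beta_hat, se, p_spike=0.5, tau=0.2):
    """Posterior-mean effect under a hypothetical spike-and-slab prior:
    beta = 0 with probability p_spike, otherwise beta ~ N(0, tau^2);
    the scan reports beta_hat ~ N(beta, se^2).  p_spike and tau are
    illustrative choices, not values from the paper."""
    m_spike = normal_pdf(beta_hat, se)                        # marginal under the spike
    m_slab = normal_pdf(beta_hat, math.sqrt(se**2 + tau**2))  # marginal under the slab
    w_slab = ((1 - p_spike) * m_slab /
              ((1 - p_spike) * m_slab + p_spike * m_spike))   # P(effect is real | data)
    shrink = tau**2 / (tau**2 + se**2)                        # normal-normal shrinkage
    return w_slab * shrink * beta_hat

# A strong, precise hit is barely discounted; a borderline one is pulled
# hard toward zero, which is the winner's-curse correction in miniature.
strong = spike_slab_posterior_mean(0.40, 0.05)
borderline = spike_slab_posterior_mean(0.15, 0.07)
print(round(strong, 3), round(borderline, 3))
```

Averaging such posterior means over several (p_spike, tau) settings is the flavor of the Bayesian model averaging step described in the abstract.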
Measuring reproducibility of high-throughput experiments
Reproducibility is essential to reliable scientific discovery in high-throughput experiments. In this work we propose a unified approach to measure the reproducibility of findings identified from replicate experiments and to identify putative discoveries using reproducibility. Unlike the usual scalar measures of reproducibility, our approach creates a curve, which quantitatively assesses when the findings are no longer consistent across replicates. Our curve is fitted by a copula mixture model, from which we derive a quantitative reproducibility score, which we call the "irreproducible discovery rate" (IDR), analogous to the FDR. This score can be computed at each set of paired replicate ranks and permits the principled setting of thresholds both for assessing reproducibility and for combining replicates. Since our approach permits an arbitrary scale for each replicate, it provides useful descriptive measures in a wide variety of situations. We study the performance of the algorithm using simulations and give a heuristic analysis of its theoretical properties. We demonstrate the effectiveness of our method in a ChIP-seq experiment.

Comment: Published at http://dx.doi.org/10.1214/11-AOAS466 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
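The rank-correspondence intuition behind the curve can be sketched without the copula mixture model. In the hypothetical example below (synthetic scores, not the paper's estimator), genuine signals rank highly in both replicates while noise ranks inconsistently, so the fraction of shared top-n candidates degrades once the cutoff passes the reproducible signal.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic two-replicate experiment: 200 genuine signals share an
# underlying strength (plus small replicate noise); 1800 noise candidates
# score independently in each replicate.
n_signal, n_noise = 200, 1800
base = rng.normal(3.0, 1.0, n_signal)                  # shared signal strength
rep1 = np.concatenate([base + rng.normal(0, 0.3, n_signal),
                       rng.normal(0, 1, n_noise)])
rep2 = np.concatenate([base + rng.normal(0, 0.3, n_signal),
                       rng.normal(0, 1, n_noise)])

def correspondence(a, b, n_top):
    """Fraction of candidates that both replicates place in their top n.
    Uses only ranks, so each replicate may be on an arbitrary scale."""
    top_a = set(np.argsort(-a)[:n_top].tolist())
    top_b = set(np.argsort(-b)[:n_top].tolist())
    return len(top_a & top_b) / n_top

for n_top in (100, 200, 500, 1000):
    print(n_top, round(correspondence(rep1, rep2, n_top), 2))
```

The IDR method goes further: instead of a raw overlap fraction, it fits a two-component copula mixture to the paired ranks and reports, per threshold, the expected fraction of irreproducible calls.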
An adaptive significance threshold criterion for massive multiple hypotheses testing
This research deals with massive multiple hypothesis testing. First, regarding multiple tests as an estimation problem under a proper population model, an error measure called the Erroneous Rejection Ratio (ERR) is introduced and related to the False Discovery Rate (FDR). ERR is an error measure similar in spirit to FDR, and it greatly simplifies the analytical study of the error properties of multiple test procedures. Next, an improved estimator of the proportion of true null hypotheses and a data-adaptive significance threshold criterion are developed. Some asymptotic error properties of the significance threshold criterion are established in terms of ERR under distributional assumptions widely satisfied in recent applications. A simulation study provides clear evidence that the proposed estimator of the proportion of true null hypotheses outperforms the existing estimators of this important parameter in massive multiple tests. Both analytical and simulation studies indicate that the proposed significance threshold criterion can provide a reasonable balance between the amounts of false positive and false negative errors, thereby complementing and extending the various FDR control procedures. S-plus/R code is available from the author upon request.

Comment: Published at http://dx.doi.org/10.1214/074921706000000392 in the IMS Lecture Notes--Monograph Series (http://www.imstat.org/publications/lecnotes.htm) by the Institute of Mathematical Statistics (http://www.imstat.org).
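The two ingredients above can be sketched together. The toy example below uses the classic Storey-type tail estimator of the null proportion and an adaptive BH-style cutoff; these stand in for, and are not, the paper's improved estimator and criterion.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(3)

# Synthetic batch: 9000 true nulls (uniform p-values) and 1000 alternatives
# (p-values from shifted normal test statistics).
n_null, n_alt = 9000, 1000
p_null = rng.uniform(0, 1, n_null)
z_alt = rng.normal(2.5, 1.0, n_alt)
p_alt = np.array([0.5 * erfc(z / sqrt(2)) for z in z_alt])  # one-sided p-values
p = np.concatenate([p_null, p_alt])

# Storey-type estimate of pi0: beyond lambda the p-value tail is nearly all
# null, and nulls are uniform, so the tail count scales up to an estimate.
lam = 0.5
pi0_hat = float((p > lam).mean()) / (1 - lam)
print("true pi0:", n_null / len(p), " estimated:", round(pi0_hat, 3))

# Adaptive BH-type rule: reject the k smallest p-values, with k the largest
# i such that p_(i) <= i * alpha / (pi0_hat * m).
alpha, m = 0.05, len(p)
p_sorted = np.sort(p)
hits = np.nonzero(p_sorted <= np.arange(1, m + 1) * alpha / (pi0_hat * m))[0]
k = int(hits.max()) + 1 if hits.size else 0
print("rejections at nominal FDR", alpha, ":", k)
```

Plugging a good pi0 estimate into the threshold recovers the power that plain BH gives away when many hypotheses are non-null, which is the balance between false positives and false negatives the abstract refers to.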
A review of applied methods in Europe for flood-frequency analysis in a changing environment
The report presents a review of methods used in Europe for trend analysis, climate change projections and non-stationary analysis of extreme precipitation and flood frequency. In addition, the main findings of the analyses are presented, including a comparison of trend analysis results and climate change projections. Existing guidelines in Europe on design flood and design rainfall estimation that incorporate climate change are reviewed. The report concludes with a discussion of research needs on non-stationary frequency analysis for considering the effects of climate change and for inclusion in design guidelines.

Trend analyses are reported for 21 countries in Europe, with results for extreme precipitation, extreme streamflow or both. A large number of national and regional trend studies have been carried out. Most studies are based on statistical methods applied to individual time series of extreme precipitation or extreme streamflow, using the non-parametric Mann-Kendall trend test or regression analysis. Some studies use field significance or regional consistency tests to analyse trends over larger areas, and some also include analysis of trend attribution. The studies reviewed indicate some evidence of a general increase in extreme precipitation, whereas there are no clear indications of significant increasing trends in extreme streamflow at regional or national level. For some smaller regions, increases in extreme streamflow are reported. Several studies from regions dominated by snowmelt-induced peak flows report decreases in extreme streamflow and earlier spring snowmelt peak flows.

Climate change projections have been reported for 14 countries in Europe, with results for extreme precipitation, extreme streamflow or both. The review shows various approaches for producing climate projections of extreme precipitation and flood frequency, based on alternative climate forcing scenarios, climate projections from available global and regional climate models, methods for statistical downscaling and bias correction, and alternative hydrological models. Many of the reported studies use an ensemble modelling approach with several climate forcing scenarios and climate model projections in order to address the uncertainty in the projections of extreme precipitation and flood frequency. Some studies also include alternative statistical downscaling and bias correction methods and hydrological modelling approaches. Most studies reviewed indicate an increase in extreme precipitation under a future climate, which is consistent with the observed trend in extreme precipitation. Hydrological projections of peak flows and flood frequency show both positive and negative changes. Large increases in peak flows are reported for some catchments with rainfall-dominated peak flows, whereas a general decrease in flood magnitude and earlier spring floods are reported for catchments with snowmelt-dominated peak flows; the latter is consistent with the observed trends.

The review of existing guidelines in Europe on design floods and design rainfalls shows that only a few countries explicitly address climate change. These guidelines are based on climate change adjustment factors applied to current design estimates and may depend on the design return period and projection horizon. The review indicates a gap between the need to consider climate change impacts in design and the actual published guidelines that incorporate climate change in extreme precipitation and flood frequency. Most of the studies reported are based on frequency analysis assuming stationary conditions within a certain time window (typically 30 years) representing the current or future climate. There is a need to develop more consistent non-stationary frequency analysis methods that can account for the transient nature of a changing climate.
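The Mann-Kendall test mentioned above is straightforward to implement. The sketch below is a minimal version applied to a synthetic annual-maximum series with an imposed upward trend; it omits the tie-adjusted variance that real hydrological series usually require.

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def mann_kendall(x):
    """Minimal Mann-Kendall trend test: S statistic, normal approximation
    Z with continuity correction, and two-sided p-value.  No tie
    correction, so suitable only for series without repeated values."""
    n = len(x)
    s = int(sum(np.sign(x[j] - x[i])
                for i in range(n) for j in range(i + 1, n)))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    z = (s - np.sign(s)) / math.sqrt(var_s)       # continuity correction
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided p-value
    return s, z, p

# Synthetic 50-year annual-maximum series: upward trend plus noise.
years = np.arange(50)
series = 100 + 0.8 * years + rng.normal(0, 10, 50)

s, z, p = mann_kendall(series)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}")
```

Because the test uses only the signs of pairwise differences, it is insensitive to the skewed distributions typical of flood peaks; under non-stationary climate, however, a detected trend is exactly the situation where the stationary frequency-analysis assumption discussed above breaks down.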