The Interval Property in Multiple Testing of Pairwise Differences
The usual step-down and step-up multiple testing procedures most often lack
an important intuitive, practical, and theoretical property called the interval
property. In short, the interval property is simply that for an individual
hypothesis, among the several to be tested, the acceptance sections of relevant
statistics are intervals. Lack of the interval property is a serious
shortcoming. This shortcoming is demonstrated for testing various pairwise
comparisons in multinomial models, multivariate normal models and in
nonparametric models. Residual-based stepwise multiple testing procedures that
do have the interval property are offered in all these cases.
Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/11-STS372
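As a rough formalization of the interval property described in this abstract (the notation is mine, offered only to fix ideas): for an individual hypothesis H_i tested with a relevant statistic T_i, the property requires that the acceptance section
A_i(t_{-i}) = \{ t_i : H_i \text{ is accepted when } T_i = t_i \text{ and the remaining statistics equal } t_{-i} \}
be an interval of the real line for every fixed t_{-i}; the abstract's point is that the usual step-down and step-up procedures typically fail this requirement.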
Characterization of Bayes procedures for multiple endpoint problems and inadmissibility of the step-up procedure
The problem of multiple endpoint testing for k endpoints is treated as a 2^k
finite action problem. The loss function chosen is a vector loss function
consisting of two components. The two components lead to a vector risk. One
component of the vector risk is the false rejection rate (FRR), that is, the
expected number of false rejections. The other component is the false
acceptance rate (FAR), that is, the expected number of acceptances for which
the corresponding null hypothesis is false. This loss function is more
stringent than the positive linear combination loss function of Lehmann [Ann.
Math. Statist. 28 (1957) 1-25] and Cohen and Sackrowitz [Ann. Statist. (2005)
33 126-144] in the sense that the class of admissible rules is larger for this
vector risk formulation than for the linear combination risk function. In other
words, fewer procedures are inadmissible for the vector risk formulation. The
statistical model assumed is that the vector of variables Z is multivariate
normal with mean vector \mu and known intraclass covariance matrix \Sigma. The
endpoint hypotheses are H_i:\mu_i=0 vs K_i:\mu_i>0, i=1,...,k. A
characterization of all symmetric Bayes procedures and their limits is
obtained. The characterization leads to a complete class theorem. The complete
class theorem is used to provide a useful necessary condition for admissibility
of a procedure. The main result is that the step-up multiple endpoint procedure
is shown to be inadmissible.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/009053604000000986
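To make the two components of the vector risk above concrete, here is a minimal formal sketch (the indicator notation is mine, not quoted from the paper): write \delta_i = 1 when H_i is rejected and \theta_i = 1 when the alternative K_i is true. Then
\mathrm{FRR}(\theta,\delta) = E_\theta\left[\sum_{i=1}^{k} \delta_i \, 1\{\theta_i = 0\}\right], \qquad \mathrm{FAR}(\theta,\delta) = E_\theta\left[\sum_{i=1}^{k} (1-\delta_i) \, 1\{\theta_i = 1\}\right],
and a procedure is admissible for this vector risk if no competitor is at least as good in both components and strictly better in at least one.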
Decision theory results for one-sided multiple comparison procedures
A resurgence of interest in multiple hypothesis testing has occurred in the
last decade. Motivated by studies in genomics, microarrays, DNA sequencing,
drug screening, clinical trials, bioassays, education and psychology,
statisticians have been devoting considerable research energy in an effort to
properly analyze multiple endpoint data. In response to new applications, new
criteria and new methodology, many ad hoc procedures have emerged. The
classical requirement has been to use procedures which control the strong
familywise error rate (FWE) at some predetermined level \alpha. That is, the
probability of any false rejection of a true null hypothesis should be less
than or equal to \alpha. Finding desirable and powerful multiple test
procedures is difficult under this requirement. One of the more recent ideas is
concerned with controlling the false discovery rate (FDR), that is, the
expected proportion of rejected hypotheses which are, in fact, true. Many
multiple test procedures do control the FDR. A much earlier approach to
multiple testing was formulated by Lehmann [Ann. Math. Statist. 23 (1952)
541-552 and 28 (1957) 1-25]. Lehmann's approach is decision theoretic and he
treats the multiple endpoints problem as a 2^k finite action problem when there
are k endpoints. This approach is appealing since unlike the FWE and FDR
criteria, the finite action approach pays attention to false acceptances as
well as false rejections.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/009053604000000968
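For reference, the two error criteria contrasted above are usually written as follows (standard definitions, stated here for convenience rather than quoted from the abstract): let V be the number of true null hypotheses that are rejected and R the total number of rejections. Then
\mathrm{FWE} = P(V \ge 1) \le \alpha, \qquad \mathrm{FDR} = E\left[\frac{V}{\max(R,1)}\right] \le \alpha,
where the \max(R,1) in the denominator makes the ratio zero when no hypothesis is rejected.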
An Alternative to Student's t-Test for Problems With Indifference Zones
Consider a sample from a normal population with mean, ÎŒ, and variance unknown. Suppose it is desired to test H0: ÎŒ ≀ ÎŒ0 versus H1: ÎŒ ≄ ÎŒ1, with the region HI: ÎŒ0 < ÎŒ < ÎŒ1 being a (nonempty) indifference zone. It is shown that the usual Student's t-test is inadmissible for this problem. An alternative test is proposed.
The two-sided problem with indifference region is also discussed. By contrast with the above result, the usual Student's t-test is admissible here. However, the two-sided version of the alternative test mentioned above does offer some practical advantages relative to the two-sided t-test.
A 3-decision version of the two-sided problem is also discussed. Here the t-test is inadmissible, and is dominated by the appropriate version of the alternative test.
The results concerning tests are also reformulated as results about confidence procedures.
A new multiple testing method in the dependent case
The most popular multiple testing procedures are stepwise procedures based on
p-values for individual test statistics. Included among these are the false
discovery rate (FDR) controlling procedures of Benjamini--Hochberg [J. Roy.
Statist. Soc. Ser. B 57 (1995) 289--300] and their offsprings. Even for models
that entail dependent data, p-values based on marginal distributions are
used. Unlike such methods, the new method takes dependency into account at all
stages. Furthermore, the p-value procedures often lack an intuitive convexity
property, which is needed for admissibility. Still further, the new methodology
is computationally feasible. If the number of tests is large and the proportion
of true alternatives is less than say 25 percent, simulations demonstrate a
clear preference for the new methodology. Applications are detailed for models
such as testing treatments against control (or any intraclass correlation
model), testing for change points and testing means when correlation is
successive.
Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: http://dx.doi.org/10.1214/08-AOS616
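Since the abstract contrasts the new method with p-value-based step-up procedures, a minimal sketch of the Benjamini-Hochberg FDR step-up procedure it refers to may help fix ideas (the function name and interface are mine, for illustration only, not the paper's method):

import numpy as np

def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.

    Returns a boolean array marking which hypotheses are rejected.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort the p-values ascending
    sorted_p = p[order]
    thresholds = q * np.arange(1, m + 1) / m   # (i/m) * q for i = 1, ..., m
    below = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        k = below.max()                        # largest i with p_(i) <= (i/m) * q
        reject[order[:k + 1]] = True           # step up: reject the k+1 smallest p-values
    return reject

# Example with five marginal p-values at FDR level q = 0.05:
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.62]))
# -> [ True  True False False False]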
The earth is flat (p < 0.05): significance thresholds and the crisis of unreplicable research
The widespread use of "statistical significance" as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into "significant" and "nonsignificant" contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Also significance (p ≀ 0.05) is hardly replicable: at a good statistical power of 80%, two studies will be "conflicting", meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Also larger p-values offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis, falsely concluding that "there is no effect". Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., from a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should rather be more stringent, that sample sizes could decrease, or that p-values should better be completely abandoned. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.
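The "one third conflicting" figure above follows from simple arithmetic; a small Python sketch (illustrative only, not from the paper) checks it both analytically and by simulation:

import numpy as np

# At 80% power, two independent studies of a true effect "conflict" (one significant,
# one not) with probability 2 * 0.8 * 0.2 = 0.32, i.e. roughly one third of the time.
power = 0.80
print(2 * power * (1 - power))                # analytic answer: 0.32

# Monte Carlo confirmation: each replication is significant with probability `power`.
rng = np.random.default_rng(0)
n_sim = 1_000_000
significant = rng.random((n_sim, 2)) < power  # True where a study reaches p <= 0.05
conflicting = significant[:, 0] != significant[:, 1]
print(conflicting.mean())                     # ~0.32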
Effect of angiotensin-converting enzyme inhibitor and angiotensin receptor blocker initiation on organ support-free days in patients hospitalized with COVID-19
IMPORTANCE Overactivation of the renin-angiotensin system (RAS) may contribute to poor clinical outcomes in patients with COVID-19.
OBJECTIVE To determine whether angiotensin-converting enzyme (ACE) inhibitor or angiotensin receptor blocker (ARB) initiation improves outcomes in patients hospitalized for COVID-19.
DESIGN, SETTING, AND PARTICIPANTS In an ongoing, adaptive platform randomized clinical trial, 721 critically ill and 58 non-critically ill hospitalized adults were randomized to receive an RAS inhibitor or control between March 16, 2021, and February 25, 2022, at 69 sites in 7 countries (final follow-up on June 1, 2022).
INTERVENTIONS Patients were randomized to receive open-label initiation of an ACE inhibitor (n = 257), ARB (n = 248), ARB in combination with DMX-200 (a chemokine receptor-2 inhibitor; n = 10), or no RAS inhibitor (control; n = 264) for up to 10 days.
MAIN OUTCOMES AND MEASURES The primary outcome was organ support-free days, a composite of hospital survival and days alive without cardiovascular or respiratory organ support through 21 days. The primary analysis was a bayesian cumulative logistic model. Odds ratios (ORs) greater than 1 represent improved outcomes.
RESULTS On February 25, 2022, enrollment was discontinued due to safety concerns. Among 679 critically ill patients with available primary outcome data, the median age was 56 years and 239 participants (35.2%) were women. Median (IQR) organ support-free days among critically ill patients was 10 (-1 to 16) in the ACE inhibitor group (n = 231), 8 (-1 to 17) in the ARB group (n = 217), and 12 (0 to 17) in the control group (n = 231) (median adjusted odds ratios of 0.77 [95% bayesian credible interval, 0.58-1.06] for improvement for ACE inhibitor and 0.76 [95% credible interval, 0.56-1.05] for ARB compared with control). The posterior probabilities that ACE inhibitors and ARBs worsened organ support-free days compared with control were 94.9% and 95.4%, respectively. Hospital survival occurred in 166 of 231 critically ill participants (71.9%) in the ACE inhibitor group, 152 of 217 (70.0%) in the ARB group, and 182 of 231 (78.8%) in the control group (posterior probabilities that ACE inhibitor and ARB worsened hospital survival compared with control were 95.3% and 98.1%, respectively).
CONCLUSIONS AND RELEVANCE In this trial, among critically ill adults with COVID-19, initiation of an ACE inhibitor or ARB did not improve, and likely worsened, clinical outcomes.
TRIAL REGISTRATION ClinicalTrials.gov Identifier: NCT0273570
The influence of the nonrecent past in prediction for stochastic processes
Consider the stochastic processes X_1, X_2, ... and \Lambda_1, \Lambda_2, ..., where the X process can be thought of as observations on the \Lambda process. We investigate the asymptotic behavior of the conditional distributions of X_{t+v} given X_1, ..., X_t and of \Lambda_{t+v} given X_1, ..., X_t with regard to their dependency on the "early" part of the X process. These distributions arise in various time series and sequential decision theory problems. The results support the intuitively reasonable and often used (as a basic tenet of model building) assumption that only the more recent past is needed for near-optimal prediction.
Keywords: stochastic process; prediction; martingale; Markov process; stationary process