On asymptotically optimal tests under loss of identifiability in semiparametric models
We consider tests of hypotheses when the parameters are not identifiable
under the null in semiparametric models, where regularity conditions for
profile likelihood theory fail. Exponential average tests based on integrated
profile likelihood are constructed and shown to be asymptotically optimal under
a weighted average power criterion with respect to a prior on the
nonidentifiable aspect of the model. These results extend existing results for
parametric models, which involve more restrictive assumptions on the form of
the alternative than do our results. Moreover, the proposed tests accommodate
models with infinite dimensional nuisance parameters which either may not be
identifiable or may not be estimable at the usual parametric rate. Examples
include tests of the presence of a change-point in the Cox model with current
status data and tests of regression parameters in odds-rate models with right
censored data. Optimal tests have not previously been studied for these
scenarios. We study the asymptotic distribution of the proposed tests under the
null, fixed contiguous alternatives and random contiguous alternatives. We also
propose a weighted bootstrap procedure for computing the critical values of the
test statistics. The optimal tests perform well in simulation studies, where
they may exhibit improved power over alternative tests.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-AOS643
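The weighted bootstrap for critical values can be sketched generically as follows. This is a minimal illustration, not the authors' procedure: the exponential(1) multiplier weights, the placeholder statistic callable, and all names are assumptions introduced here.

import numpy as np

def weighted_bootstrap_critical_value(data, statistic, n_boot=1000, alpha=0.05, seed=None):
    # `statistic(data, weights)` must recompute the test statistic with one
    # positive, mean-1 weight per subject; exponential(1) weights are used here.
    rng = np.random.default_rng(seed)
    n = len(data)
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.exponential(scale=1.0, size=n)      # i.i.d. mean-1 multipliers
        boot_stats[b] = statistic(data, w)
    return np.quantile(boot_stats, 1.0 - alpha)     # upper-alpha critical value

The test then rejects when the observed statistic exceeds the returned quantile; in practice the bootstrap statistic would also be recentred under the null, a detail omitted from this sketch.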
Robust Inference for Univariate Proportional Hazards Frailty Regression Models
We consider a class of semiparametric regression models which are
one-parameter extensions of the Cox [J. Roy. Statist. Soc. Ser. B 34 (1972)
187-220] model for right-censored univariate failure times. These models assume
that the hazard given the covariates and a random frailty unique to each
individual has the proportional hazards form multiplied by the frailty.
The frailty is assumed to have mean 1 within a known one-parameter family of
distributions. Inference is based on a nonparametric likelihood. The behavior
of the likelihood maximizer is studied under general conditions where the
fitted model may be misspecified. The joint estimator of the regression and
frailty parameters as well as the baseline hazard is shown to be uniformly
consistent for the pseudo-value maximizing the asymptotic limit of the
likelihood. Appropriately standardized, the estimator converges weakly to a
Gaussian process. When the model is correctly specified, the procedure is
semiparametric efficient, achieving the semiparametric information bound for
all parameter components. It is also proved that the bootstrap gives valid
inferences for all parameters, even under misspecification.
We demonstrate analytically the importance of robust inference in several
examples. In a randomized clinical trial, a valid test of the treatment effect
is possible when other prognostic factors and the frailty distribution are both
misspecified. Under certain conditions on the covariates, the ratios of the
regression parameters are still identifiable. The practical utility of the
procedure is illustrated on a non-Hodgkin's lymphoma dataset.
Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000053
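In notation introduced here for clarity (the abstract itself displays no formula), the assumed conditional hazard for a subject with covariates Z and frailty \xi is

\lambda(t \mid Z, \xi) = \xi \, \lambda_0(t) \, \exp(\beta^\top Z), \qquad \mathrm{E}[\xi] = 1, \quad \xi \sim G_\gamma,

where \lambda_0 is the unspecified baseline hazard, \beta the regression parameters, and G_\gamma a known one-parameter family of mean-1 frailty distributions indexed by \gamma.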
Nonparametric Bounds and Sensitivity Analysis of Treatment Effects
This paper considers conducting inference about the effect of a treatment (or
exposure) on an outcome of interest. In the ideal setting where treatment is
assigned randomly, under certain assumptions the treatment effect is
identifiable from the observable data and inference is straightforward.
However, in other settings such as observational studies or randomized trials
with noncompliance, the treatment effect is no longer identifiable without
relying on untestable assumptions. Nonetheless, the observable data often do
provide some information about the effect of treatment, that is, the parameter
of interest is partially identifiable. Two approaches are often employed in
this setting: (i) bounds are derived for the treatment effect under minimal
assumptions, or (ii) additional untestable assumptions are invoked that render
the treatment effect identifiable and then sensitivity analysis is conducted to
assess how inference about the treatment effect changes as the untestable
assumptions are varied. Approaches (i) and (ii) are considered in various
settings, including assessing principal strata effects, direct and indirect
effects and effects of time-varying exposures. Methods for drawing formal
inference about partially identified parameters are also discussed.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/14-STS499
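As a concrete instance of approach (i), the sketch below computes classical worst-case bounds on the average treatment effect for a binary outcome with no assumption on how treatment was selected; it illustrates the general idea rather than a method from the article, and all variable names are placeholders.

import numpy as np

def worst_case_ate_bounds(y, t):
    # Bounds on E[Y(1)] - E[Y(0)] for a binary outcome y and treatment indicator t,
    # obtained by imputing the unobserved potential outcomes at 0 and at 1.
    y, t = np.asarray(y, float), np.asarray(t, int)
    p1 = t.mean()                     # P(T = 1)
    p0 = 1.0 - p1                     # P(T = 0)
    m1 = y[t == 1].mean()             # E[Y | T = 1]
    m0 = y[t == 0].mean()             # E[Y | T = 0]
    ey1 = (p1 * m1, p1 * m1 + p0)     # bounds on E[Y(1)]
    ey0 = (p0 * m0, p0 * m0 + p1)     # bounds on E[Y(0)]
    return ey1[0] - ey0[1], ey1[1] - ey0[0]

For a binary outcome these bounds always have width 1 and therefore contain 0; narrowing them is exactly what the additional untestable assumptions in approach (ii) buy.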
Accounting for competing risks in randomized controlled trials: a review and recommendations for improvement
In studies with survival or time-to-event outcomes, a competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. Specialized statistical methods must be used to analyze survival data in the presence of competing risks. We conducted a review of randomized controlled trials with survival outcomes that were published in high-impact general medical journals. Of 40 studies that we identified, 31 (77.5%) were potentially susceptible to competing risks. However, in the majority of these studies, the potential presence of competing risks was not accounted for in the statistical analyses that were described. Of the 31 studies potentially susceptible to competing risks, 24 (77.4%) reported the results of a Kaplan-Meier survival analysis, while only five (16.1%) reported using cumulative incidence functions to estimate the incidence of the outcome over time in the presence of competing risks. The former approach will tend to result in an overestimate of the incidence of the outcome over time, while the latter approach will result in unbiased estimation of the incidence of the primary outcome over time. We provide recommendations on the analysis and reporting of randomized controlled trials with survival outcomes in the presence of competing risks. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd
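The overestimation described above is easy to reproduce numerically. The sketch below is a minimal, self-contained illustration (the event codes 0 = censored, 1 = primary event, 2 = competing event are assumptions of the example): it contrasts the complement of a Kaplan-Meier estimator that treats competing events as censoring with the Aalen-Johansen cumulative incidence function.

import numpy as np

def km_complement_and_cif(time, event, event_of_interest=1):
    # Returns event times, 1 - KM (competing events treated as censoring),
    # and the Aalen-Johansen cumulative incidence for the event of interest.
    time, event = np.asarray(time, float), np.asarray(event, int)
    times = np.unique(time[event > 0])
    surv_all, km, cif = 1.0, 1.0, 0.0
    one_minus_km, aj_cif = [], []
    for tj in times:
        at_risk = np.sum(time >= tj)
        d_primary = np.sum((time == tj) & (event == event_of_interest))
        d_any = np.sum((time == tj) & (event > 0))
        cif += surv_all * d_primary / at_risk    # Aalen-Johansen increment
        surv_all *= 1.0 - d_any / at_risk        # all-cause event-free survival
        km *= 1.0 - d_primary / at_risk          # naive KM for the primary event
        one_minus_km.append(1.0 - km)
        aj_cif.append(cif)
    return times, np.array(one_minus_km), np.array(aj_cif)

# 1 - KM is never below the cumulative incidence and strictly exceeds it
# once competing events occur, which is the overestimation noted above.
times, one_minus_km, cif = km_complement_and_cif(
    time=[2, 3, 3, 5, 6, 7, 8], event=[1, 2, 1, 0, 2, 1, 0])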
Practical recommendations for reporting Fine-Gray model analyses for competing risk data
In survival analysis, a competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. Outcomes in medical research are frequently subject to competing risks. Two key questions can be addressed using competing risk regression models: first, which covariates affect the rate at which events occur, and second, which covariates affect the probability of an event occurring over time. The cause-specific hazard model estimates the effect of covariates on the rate at which events occur in subjects who are currently event-free. Subdistribution hazard ratios obtained from the Fine-Gray model describe the relative effect of covariates on the subdistribution hazard function. Hence, the covariates in this model can also be interpreted as having an effect on the cumulative incidence function, that is, on the probability of events occurring over time. We conducted a review of the use and interpretation of the Fine-Gray subdistribution hazard model in articles published in the medical literature in 2015. We found that many authors provided an unclear or incorrect interpretation of the regression coefficients associated with this model. An incorrect and inconsistent interpretation of regression coefficients may lead to confusion when comparing results across different studies. Furthermore, an incorrect interpretation of estimated regression coefficients can result in an incorrect understanding of the magnitude of the association between exposure and the incidence of the outcome. The objective of this article is to clarify how these regression coefficients should be reported and to propose suggestions for interpreting them.
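In standard notation (introduced here, not spelled out in the abstract), the Fine-Gray model places covariates on the subdistribution hazard \lambda_k^{sd} of event type k, and therefore directly on its cumulative incidence function F_k:

\lambda_k^{sd}(t \mid Z) = -\frac{d}{dt}\log\{1 - F_k(t \mid Z)\} = \lambda_{k,0}^{sd}(t)\,\exp(\beta^\top Z),
\qquad
1 - F_k(t \mid Z) = \{1 - F_{k,0}(t)\}^{\exp(\beta^\top Z)}.

A subdistribution hazard ratio \exp(\beta) above 1 thus implies a higher cumulative incidence of the event at every time point, but it should not be reported as a ratio of event rates among subjects who are currently event-free; that interpretation belongs to the cause-specific hazard model.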
Designing penalty functions in high dimensional problems: The role of tuning parameters
Various forms of penalty functions have been developed for regularized estimation and variable selection. Screening approaches are often used to reduce the number of covariates before penalized estimation. However, in certain problems, the number of covariates remains large after screening. For example, in genome-wide association (GWA) studies, the purpose is to identify single nucleotide polymorphisms (SNPs) that are associated with certain traits, and typically there are millions of SNPs and thousands of samples. Because of the strong correlation of nearby SNPs, screening can only reduce the number of SNPs from millions to tens of thousands, and the variable selection problem remains very challenging. Several penalty functions have been proposed for such high dimensional data. However, it is unclear which class of penalty functions is the appropriate choice for a particular application. In this paper, we conduct a theoretical analysis to relate the ranges of tuning parameters of various penalty functions to the dimensionality of the problem and the minimum effect size. We exemplify our theoretical results for several penalty functions. The results suggest that a class of penalty functions that bridges the L0 and L1 penalties requires less restrictive conditions on dimensionality and minimum effect sizes in order to attain the two fundamental goals of penalized estimation: to shrink all of the noise to zero and to obtain unbiased estimation of the true signals. Penalties such as SICA and Log belong to this class, but they have not been used often in applications. The simulations and real data analysis using GWAS data suggest the promising applicability of this class of penalties.
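As one concrete member of the bridging class mentioned above, the sketch below evaluates the SICA penalty in a common parameterization, rho_a(t) = lambda * (a + 1) * |t| / (a + |t|); the function name, grid, and tuning values are illustrative assumptions, not taken from the paper.

import numpy as np

def sica_penalty(t, a, lam=1.0):
    # SICA penalty: approaches the L0 indicator as a -> 0+ and lam * |t| (L1) as a -> infinity.
    t = np.abs(np.asarray(t, float))
    return lam * (a + 1.0) * t / (a + t)

grid = np.linspace(0.0, 2.0, 5)
print(sica_penalty(grid, a=1e-3))   # nearly 0/1: behaves like an L0 penalty
print(sica_penalty(grid, a=1e3))    # nearly equal to the grid: behaves like L1

The tuning parameter a controls where the penalty sits between the two extremes, which is the kind of tuning-parameter range that the theoretical analysis above relates to dimensionality and minimum effect size.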
The number of primary events per variable affects estimation of the subdistribution hazard competing risks model
Objectives: To examine the effect of the number of events per variable (EPV) on the accuracy of estimated regression coefficients, standard errors, empirical coverage rates of estimated confidence intervals, and empirical estimates of statistical power when using the Fine-Gray subdistribution hazard regression model to assess the effect of covariates on the incidence of events that occur over time in the presence of competing risks.
Study Design and Setting: Monte Carlo simulations were used. We considered two different definitions of the number of EPV. One included events of any type that occurred (both primary events and competing events), whereas the other included only the number of primary events that occurred.
Results: The definition of EPV that included only the number of primary events was preferable to the alternative definition, as the number of competing events had minimal impact on estimation. In general, 40-50 EPV were necessary to ensure accurate estimation of regression coefficients and associated quantities. However, if all of the covariates are continuous or are binary with moderate prevalence, then 10 EPV are sufficient to ensure accurate estimation.
Conclusion: Analysts must base the number of EPV on the number of primary events that occurred.
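The conclusion above amounts to a simple planning check; the numbers in the sketch below are made up for illustration, not taken from the study.

# Count EPV using only primary events, per the recommendation above.
n_primary_events = 180     # events of the type being modeled
n_competing_events = 95    # not counted toward EPV
n_covariates = 5           # candidate predictors in the Fine-Gray model

epv = n_primary_events / n_covariates
print(f"EPV = {epv:.0f}; 40-50 guideline met: {epv >= 40}")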