On asymptotically optimal tests under loss of identifiability in semiparametric models
We consider tests of hypotheses when the parameters are not identifiable
under the null in semiparametric models, where regularity conditions for
profile likelihood theory fail. Exponential average tests based on integrated
profile likelihood are constructed and shown to be asymptotically optimal under
a weighted average power criterion with respect to a prior on the
nonidentifiable aspect of the model. These results extend existing results for
parametric models, which involve more restrictive assumptions on the form of
the alternative than do our results. Moreover, the proposed tests accommodate
models with infinite dimensional nuisance parameters which either may not be
identifiable or may not be estimable at the usual parametric rate. Examples
include tests of the presence of a change-point in the Cox model with current
status data and tests of regression parameters in odds-rate models with right
censored data. Optimal tests have not previously been studied for these
scenarios. We study the asymptotic distribution of the proposed tests under the
null, fixed contiguous alternatives and random contiguous alternatives. We also
propose a weighted bootstrap procedure for computing the critical values of the
test statistics. The optimal tests perform well in simulation studies, where
they may exhibit improved power over alternative tests.
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/08-AOS643
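The weighted bootstrap for critical values can be sketched generically as follows. This is a minimal illustration, not the authors' procedure: the exponential(1) multiplier weights, the placeholder statistic callable, and all names are assumptions introduced here.

import numpy as np

def weighted_bootstrap_critical_value(data, statistic, n_boot=1000, alpha=0.05, seed=None):
    # `statistic(data, weights)` must recompute the test statistic with one
    # positive, mean-1 weight per subject; exponential(1) weights are used here.
    rng = np.random.default_rng(seed)
    n = len(data)
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.exponential(scale=1.0, size=n)      # i.i.d. mean-1 multipliers
        boot_stats[b] = statistic(data, w)
    return np.quantile(boot_stats, 1.0 - alpha)     # upper-alpha critical value

The test then rejects when the observed statistic exceeds the returned quantile; in practice the bootstrap statistic would also be recentred under the null, a detail omitted from this sketch.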
Robust Inference for Univariate Proportional Hazards Frailty Regression Models
We consider a class of semiparametric regression models which are
one-parameter extensions of the Cox [J. Roy. Statist. Soc. Ser. B 34 (1972)
187-220] model for right-censored univariate failure times. These models assume
that the hazard given the covariates and a random frailty unique to each
individual has the proportional hazards form multiplied by the frailty.
The frailty is assumed to have mean 1 within a known one-parameter family of
distributions. Inference is based on a nonparametric likelihood. The behavior
of the likelihood maximizer is studied under general conditions where the
fitted model may be misspecified. The joint estimator of the regression and
frailty parameters as well as the baseline hazard is shown to be uniformly
consistent for the pseudo-value maximizing the asymptotic limit of the
likelihood. Appropriately standardized, the estimator converges weakly to a
Gaussian process. When the model is correctly specified, the procedure is
semiparametric efficient, achieving the semiparametric information bound for
all parameter components. It is also proved that the bootstrap gives valid
inferences for all parameters, even under misspecification.
We demonstrate analytically the importance of robust inference in several
examples. In a randomized clinical trial, a valid test of the treatment effect
is possible when other prognostic factors and the frailty distribution are both
misspecified. Under certain conditions on the covariates, the ratios of the
regression parameters are still identifiable. The practical utility of the
procedure is illustrated on a non-Hodgkin's lymphoma dataset.
Comment: Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000053
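In notation introduced here for clarity (the abstract itself displays no formula), the assumed conditional hazard for a subject with covariates Z and frailty \xi is

\lambda(t \mid Z, \xi) = \xi \, \lambda_0(t) \, \exp(\beta^\top Z), \qquad \mathrm{E}[\xi] = 1, \quad \xi \sim G_\gamma,

where \lambda_0 is the unspecified baseline hazard, \beta the regression parameters, and G_\gamma a known one-parameter family of mean-1 frailty distributions indexed by \gamma.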
Nonparametric Bounds and Sensitivity Analysis of Treatment Effects
This paper considers conducting inference about the effect of a treatment (or
exposure) on an outcome of interest. In the ideal setting where treatment is
assigned randomly, under certain assumptions the treatment effect is
identifiable from the observable data and inference is straightforward.
However, in other settings such as observational studies or randomized trials
with noncompliance, the treatment effect is no longer identifiable without
relying on untestable assumptions. Nonetheless, the observable data often do
provide some information about the effect of treatment, that is, the parameter
of interest is partially identifiable. Two approaches are often employed in
this setting: (i) bounds are derived for the treatment effect under minimal
assumptions, or (ii) additional untestable assumptions are invoked that render
the treatment effect identifiable and then sensitivity analysis is conducted to
assess how inference about the treatment effect changes as the untestable
assumptions are varied. Approaches (i) and (ii) are considered in various
settings, including assessing principal strata effects, direct and indirect
effects and effects of time-varying exposures. Methods for drawing formal
inference about partially identified parameters are also discussed.
Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/14-STS499
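As a concrete instance of approach (i), the sketch below computes classical worst-case bounds on the average treatment effect for a binary outcome with no assumption on how treatment was selected; it illustrates the general idea rather than a method from the article, and all variable names are placeholders.

import numpy as np

def worst_case_ate_bounds(y, t):
    # Bounds on E[Y(1)] - E[Y(0)] for a binary outcome y and treatment indicator t,
    # obtained by imputing the unobserved potential outcomes at 0 and at 1.
    y, t = np.asarray(y, float), np.asarray(t, int)
    p1 = t.mean()                     # P(T = 1)
    p0 = 1.0 - p1                     # P(T = 0)
    m1 = y[t == 1].mean()             # E[Y | T = 1]
    m0 = y[t == 0].mean()             # E[Y | T = 0]
    ey1 = (p1 * m1, p1 * m1 + p0)     # bounds on E[Y(1)]
    ey0 = (p0 * m0, p0 * m0 + p1)     # bounds on E[Y(0)]
    return ey1[0] - ey0[1], ey1[1] - ey0[0]

For a binary outcome these bounds always have width 1 and therefore contain 0; narrowing them is exactly what the additional untestable assumptions in approach (ii) buy.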
Accounting for competing risks in randomized controlled trials: a review and recommendations for improvement
In studies with survival or time-to-event outcomes, a competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. Specialized statistical methods must be used to analyze survival data in the presence of competing risks. We conducted a review of randomized controlled trials with survival outcomes that were published in high-impact general medical journals. Of 40 studies that we identified, 31 (77.5%) were potentially susceptible to competing risks. However, in the majority of these studies, the potential presence of competing risks was not accounted for in the statistical analyses that were described. Of the 31 studies potentially susceptible to competing risks, 24 (77.4%) reported the results of a Kaplan-Meier survival analysis, while only five (16.1%) reported using cumulative incidence functions to estimate the incidence of the outcome over time in the presence of competing risks. The former approach will tend to result in an overestimate of the incidence of the outcome over time, while the latter approach will result in unbiased estimation of the incidence of the primary outcome over time. We provide recommendations on the analysis and reporting of randomized controlled trials with survival outcomes in the presence of competing risks. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd
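The overestimation described above is easy to reproduce numerically. The sketch below is a minimal, self-contained illustration (the event codes 0 = censored, 1 = primary event, 2 = competing event are assumptions of the example): it contrasts the complement of a Kaplan-Meier estimator that treats competing events as censoring with the Aalen-Johansen cumulative incidence function.

import numpy as np

def km_complement_and_cif(time, event, event_of_interest=1):
    # Returns event times, 1 - KM (competing events treated as censoring),
    # and the Aalen-Johansen cumulative incidence for the event of interest.
    time, event = np.asarray(time, float), np.asarray(event, int)
    times = np.unique(time[event > 0])
    surv_all, km, cif = 1.0, 1.0, 0.0
    one_minus_km, aj_cif = [], []
    for tj in times:
        at_risk = np.sum(time >= tj)
        d_primary = np.sum((time == tj) & (event == event_of_interest))
        d_any = np.sum((time == tj) & (event > 0))
        cif += surv_all * d_primary / at_risk    # Aalen-Johansen increment
        surv_all *= 1.0 - d_any / at_risk        # all-cause event-free survival
        km *= 1.0 - d_primary / at_risk          # naive KM for the primary event
        one_minus_km.append(1.0 - km)
        aj_cif.append(cif)
    return times, np.array(one_minus_km), np.array(aj_cif)

# 1 - KM is never below the cumulative incidence and strictly exceeds it
# once competing events occur, which is the overestimation noted above.
times, one_minus_km, cif = km_complement_and_cif(
    time=[2, 3, 3, 5, 6, 7, 8], event=[1, 2, 1, 0, 2, 1, 0])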
Practical recommendations for reporting Fine-Gray model analyses for competing risk data
In survival analysis, a competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. Outcomes in medical research are frequently subject to competing risks. Two key questions can be addressed using competing risk regression models: first, which covariates affect the rate at which events occur, and second, which covariates affect the probability of an event occurring over time. The cause-specific hazard model estimates the effect of covariates on the rate at which events occur in subjects who are currently event-free. Subdistribution hazard ratios obtained from the Fine-Gray model describe the relative effect of covariates on the subdistribution hazard function. Hence, the covariates in this model can also be interpreted as having an effect on the cumulative incidence function, that is, on the probability of events occurring over time. We conducted a review of the use and interpretation of the Fine-Gray subdistribution hazard model in articles published in the medical literature in 2015. We found that many authors provided an unclear or incorrect interpretation of the regression coefficients associated with this model. An incorrect and inconsistent interpretation of regression coefficients may lead to confusion when comparing results across different studies. Furthermore, an incorrect interpretation of estimated regression coefficients can result in an incorrect understanding of the magnitude of the association between exposure and the incidence of the outcome. The objective of this article is to clarify how these regression coefficients should be reported and to propose suggestions for interpreting them.
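In standard notation (introduced here, not spelled out in the abstract), the Fine-Gray model places covariates on the subdistribution hazard \lambda_k^{sd} of event type k, and therefore directly on its cumulative incidence function F_k:

\lambda_k^{sd}(t \mid Z) = -\frac{d}{dt}\log\{1 - F_k(t \mid Z)\} = \lambda_{k,0}^{sd}(t)\,\exp(\beta^\top Z),
\qquad
1 - F_k(t \mid Z) = \{1 - F_{k,0}(t)\}^{\exp(\beta^\top Z)}.

A subdistribution hazard ratio \exp(\beta) above 1 thus implies a higher cumulative incidence of the event at every time point, but it should not be reported as a ratio of event rates among subjects who are currently event-free; that interpretation belongs to the cause-specific hazard model.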
Designing penalty functions in high dimensional problems: The role of tuning parameters
Various forms of penalty functions have been developed for regularized estimation and variable selection. Screening approaches are often used to reduce the number of covariates before penalized estimation. However, in certain problems, the number of covariates remains large after screening. For example, in genome-wide association (GWA) studies, the purpose is to identify single nucleotide polymorphisms (SNPs) that are associated with certain traits, and typically there are millions of SNPs and thousands of samples. Because of the strong correlation of nearby SNPs, screening can only reduce the number of SNPs from millions to tens of thousands, and the variable selection problem remains very challenging. Several penalty functions have been proposed for such high dimensional data. However, it is unclear which class of penalty functions is the appropriate choice for a particular application. In this paper, we conduct a theoretical analysis to relate the ranges of tuning parameters of various penalty functions to the dimensionality of the problem and the minimum effect size. We exemplify our theoretical results for several penalty functions. The results suggest that a class of penalty functions that bridges the L0 and L1 penalties requires less restrictive conditions on dimensionality and minimum effect sizes in order to attain the two fundamental goals of penalized estimation: to shrink all of the noise to zero and to obtain unbiased estimation of the true signals. Penalties such as SICA and Log belong to this class, but they have not been used often in applications. The simulations and real data analysis using GWAS data suggest the promising applicability of this class of penalties.
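As one concrete member of the bridging class mentioned above, the sketch below evaluates the SICA penalty in a common parameterization, rho_a(t) = lambda * (a + 1) * |t| / (a + |t|); the function name, grid, and tuning values are illustrative assumptions, not taken from the paper.

import numpy as np

def sica_penalty(t, a, lam=1.0):
    # SICA penalty: approaches the L0 indicator as a -> 0+ and lam * |t| (L1) as a -> infinity.
    t = np.abs(np.asarray(t, float))
    return lam * (a + 1.0) * t / (a + t)

grid = np.linspace(0.0, 2.0, 5)
print(sica_penalty(grid, a=1e-3))   # nearly 0/1: behaves like an L0 penalty
print(sica_penalty(grid, a=1e3))    # nearly equal to the grid: behaves like L1

The tuning parameter a controls where the penalty sits between the two extremes, which is the kind of tuning-parameter range that the theoretical analysis above relates to dimensionality and minimum effect size.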
The number of primary events per variable affects estimation of the subdistribution hazard competing risks model
Objectives: To examine the effect of the number of events per variable (EPV) on the accuracy of estimated regression coefficients, standard errors, empirical coverage rates of estimated confidence intervals, and empirical estimates of statistical power when using the Fine-Gray subdistribution hazard regression model to assess the effect of covariates on the incidence of events that occur over time in the presence of competing risks.
Study Design and Setting: Monte Carlo simulations were used. We considered two different definitions of the number of EPV. One included events of any type that occurred (both primary events and competing events), whereas the other included only the number of primary events that occurred.
Results: The definition of EPV that included only the number of primary events was preferable to the alternative definition, as the number of competing events had minimal impact on estimation. In general, 40-50 EPV were necessary to ensure accurate estimation of regression coefficients and associated quantities. However, if all of the covariates are continuous or are binary with moderate prevalence, then 10 EPV are sufficient to ensure accurate estimation.
Conclusion: Analysts must base the number of EPV on the number of primary events that occurred.
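The conclusion above amounts to a simple planning check; the numbers in the sketch below are made up for illustration, not taken from the study.

# Count EPV using only primary events, per the recommendation above.
n_primary_events = 180     # events of the type being modeled
n_competing_events = 95    # not counted toward EPV
n_covariates = 5           # candidate predictors in the Fine-Gray model

epv = n_primary_events / n_covariates
print(f"EPV = {epv:.0f}; 40-50 guideline met: {epv >= 40}")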