Conditional Screening for Ultra-high Dimensional Covariates with Survival Outcomes
Identifying important biomarkers that are predictive of cancer patients'
prognosis is key to gaining better insight into the biological influences on
the disease and has become a critical component of precision medicine. The
emergence of large-scale biomedical survival studies, which typically involve
an excessive number of biomarkers, has created a high demand for efficient
screening tools that select predictive biomarkers. The vast number of
biomarkers defies any existing variable selection method via regularization.
The recently developed variable screening methods, though powerful in many
practical settings, fail to incorporate prior information on the importance of
each biomarker and are less powerful in detecting marginally weak but jointly
important signals. We propose a new conditional screening method for survival
outcome data that computes the marginal contribution of each biomarker given
previously known biological information. This is based on the premise that some
biomarkers are known a priori to be associated with disease outcomes. Our
method possesses the sure screening property and a vanishing false selection
rate. The utility of the proposal is further confirmed with extensive
simulation studies and an analysis of a diffuse large B-cell lymphoma (DLBCL)
dataset.

Comment: 34 pages, 3 figures
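The core idea above, ranking each candidate biomarker by its contribution after adjusting for covariates known a priori to matter, can be sketched with a minimal linear-model stand-in. The paper itself works with survival outcomes and a Cox-type model; the function and variable names below are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def conditional_screen(X_known, X_cand, y, k):
    """Rank candidate covariates by their marginal contribution after
    adjusting for a-priori known covariates (linear-model stand-in for
    the survival-outcome setting described in the abstract)."""
    n = len(y)
    base = np.column_stack([np.ones(n), X_known])  # intercept + known set
    scores = np.empty(X_cand.shape[1])
    for j in range(X_cand.shape[1]):
        Z = np.column_stack([base, X_cand[:, j]])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        scores[j] = abs(coef[-1])  # size of the candidate's adjusted effect
    return np.argsort(scores)[::-1][:k]  # indices of the top-k candidates

rng = np.random.default_rng(3)
xk = rng.normal(size=300)            # biomarker known a priori
C = rng.normal(size=(300, 30))       # candidate biomarkers
y = 2.0 * xk + 1.5 * C[:, 0] + rng.normal(size=300)
top = conditional_screen(xk[:, None], C, y, k=3)
```

Because each candidate is scored jointly with the known set, a candidate that is marginally weak but conditionally informative can still rank highly, which is the motivation given in the abstract.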
Model-free screening procedure for ultrahigh-dimensional survival data based on Hilbert-Schmidt independence criterion
Selecting the active variables that have a significant impact on the event
of interest is an important and meaningful problem in the statistical
analysis of ultrahigh-dimensional data. The sure independence screening
procedure has been demonstrated to be an effective method for reducing the
dimensionality of data from a large scale to a relatively moderate scale. For
censored survival data, existing screening methods mainly adopt the
Kaplan--Meier estimator to handle censoring, which may not perform well in
scenarios with a heavy censoring rate. In this article, we propose a
model-free screening procedure based on the Hilbert--Schmidt independence
criterion (HSIC). The proposed method avoids the complication of specifying an
actual model from a large number of covariates. Compared with existing
screening procedures, this new approach has several advantages. First, it does
not involve the Kaplan--Meier estimator, so its performance is much more
robust in cases with a heavy censoring rate. Second, the empirical estimate of
HSIC is very simple, as it depends only on the trace of a product of Gram
matrices. In addition, the proposed procedure does not require any complicated
numerical optimization, so the corresponding calculation is simple and fast.
Finally, because it employs the kernel method, the proposed procedure is
substantially more resistant to outliers. Extensive simulation studies
demonstrate that the proposed method performs favorably compared with existing
methods. As an illustration, we apply the proposed method to analyze the
diffuse large B-cell lymphoma (DLBCL) data and the ovarian cancer data.
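The "trace of a product of Gram matrices" form of the empirical HSIC mentioned above can be sketched as follows. This is the generic biased HSIC estimate with Gaussian kernels; the bandwidth choice and names are illustrative, and the paper's survival-specific construction for censored outcomes is not reproduced here.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    # Gram matrix K[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2))
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    # Empirical HSIC: (1/n^2) * trace(K H L H),
    # with H = I - (1/n) 11^T the centering matrix.
    n = len(x)
    K = rbf_gram(x, sigma)
    L = rbf_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
hsic_dep = hsic(x, x + 0.1 * rng.normal(size=200))  # strongly dependent pair
hsic_ind = hsic(x, rng.normal(size=200))            # independent pair
```

As the abstract notes, no numerical optimization is involved: one kernel evaluation per pair of observations and a trace suffice, and screening amounts to ranking covariates by their HSIC with the (censoring-adjusted) outcome.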
Scalable Sparse Cox's Regression for Large-Scale Survival Data via Broken Adaptive Ridge
This paper develops a new scalable sparse Cox regression tool for sparse
high-dimensional massive sample size (sHDMSS) survival data. The method is a
local L0-penalized Cox regression obtained by repeatedly performing reweighted
L2-penalized Cox regression. We show that the resulting estimator enjoys the
best of L0- and L2-penalized Cox regressions while overcoming their
limitations. Specifically, the estimator is selection consistent, oracle for
parameter estimation, and possesses a grouping property for highly correlated
covariates. Simulation results suggest that when the sample size is large, the
proposed method with pre-specified tuning parameters has comparable or better
performance than some popular penalized regression methods. More importantly,
because the method naturally enables adaptation of efficient algorithms for
massive L2-penalized optimization and does not require costly data-driven
tuning parameter selection, it has a significant computational advantage for
sHDMSS data, offering an average 5-fold speedup over its closest competitor
in empirical studies.
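The "repeatedly reweighted L2" iteration behind the broken adaptive ridge can be sketched on a least-squares loss standing in for the Cox partial likelihood. Each step is a ridge solve whose per-coefficient weights come from the previous iterate, so coefficients near zero are penalized ever more heavily and collapse, approximating an L0 penalty. The function name, tuning value `lam`, and thresholding tolerance are assumptions for illustration.

```python
import numpy as np

def broken_adaptive_ridge(X, y, lam=1.0, n_iter=50, eps=1e-8):
    """Iteratively reweighted ridge regression: each step solves a
    weighted L2 problem with weights lam / beta_j^2 taken from the
    previous iterate (least-squares stand-in for the Cox loss)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalized initial fit
    for _ in range(n_iter):
        D = np.diag(lam / (beta ** 2 + eps))     # adaptive ridge weights
        beta = np.linalg.solve(X.T @ X + D, X.T @ y)
    beta[np.abs(beta) < 1e-6] = 0.0              # iterates shrink to ~0
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_beta = np.zeros(10)
true_beta[:2] = [2.0, -1.5]                      # two active covariates
y = X @ true_beta + 0.1 * rng.normal(size=200)
beta_hat = broken_adaptive_ridge(X, y, lam=1.0)
```

Because every iteration is an L2-penalized solve, any fast ridge solver for massive data can be reused unchanged, which is the computational point the abstract emphasizes.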
Marginal empirical likelihood and sure independence feature screening
We study a marginal empirical likelihood approach in scenarios where the
number of variables grows exponentially with the sample size. The marginal
empirical likelihood ratios, as functions of the parameters of interest, are
systematically examined, and we find that the marginal empirical likelihood
ratio evaluated at zero can be used to determine whether an explanatory
variable contributes to a response variable. Based on this finding, we propose
a unified feature screening procedure for linear models and generalized linear
models. Unlike most existing feature screening approaches, which rely on the
magnitudes of marginal estimators to identify true signals, the proposed
screening approach can further incorporate the level of uncertainty of such
estimators. This merit inherits the self-studentization property of the
empirical likelihood approach and extends the insights of existing feature
screening methods. Moreover, we show that our screening approach is less
restrictive in its distributional assumptions and can be conveniently adapted
to a broad range of scenarios, such as models specified using general moment
conditions. Our theoretical results and extensive numerical examples from
simulations and data analysis demonstrate the merits of the marginal
empirical likelihood approach.

Comment: Published at http://dx.doi.org/10.1214/13-AOS1139 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
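The key ingredient above, the marginal empirical likelihood ratio evaluated at zero, can be sketched for a linear model: for each covariate, form the estimating-equation values z_i = x_ij * y_i and compute the standard one-dimensional empirical likelihood ratio for their mean being zero (in the style of Owen's EL for a mean). The function names, Newton safeguard, and tuning values below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def el_ratio_at_zero(z, n_iter=50):
    """-2 log empirical likelihood ratio for H0: E[z] = 0, via Newton's
    method on the dual Lagrange multiplier with step-halving to keep
    all implied weights 1/(n(1 + lam*z_i)) positive."""
    lam = 0.0
    for _ in range(n_iter):
        denom = 1.0 + lam * z
        grad = np.sum(z / denom)            # solve sum z_i/(1+lam z_i) = 0
        hess = -np.sum(z ** 2 / denom ** 2)
        step = grad / hess
        lam -= step
        while np.any(1.0 + lam * z <= 0):   # backtrack into feasible region
            step /= 2.0
            lam += step
    return 2.0 * np.sum(np.log1p(lam * z))

def mel_screen(X, y, k):
    """Rank covariates by the marginal EL ratio of x_ij * y_i at zero;
    large values flag covariates contributing to the response."""
    Xs = (X - X.mean(0)) / X.std(0)
    ys = y - y.mean()
    stats = np.array([el_ratio_at_zero(Xs[:, j] * ys)
                      for j in range(X.shape[1])])
    return np.argsort(stats)[::-1][:k]

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 50))
y = 1.5 * X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200)
top = mel_screen(X, y, k=10)
```

Because the statistic is a likelihood ratio rather than a raw estimate, it is self-studentized: covariates with noisy marginal estimates are automatically down-weighted, which is the uncertainty-incorporation merit the abstract describes.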