Model-free screening procedure for ultrahigh-dimensional survival data based on Hilbert-Schmidt independence criterion
Selecting the active variables that have a significant impact on the event
of interest is an important problem in the statistical
analysis of ultrahigh-dimensional data. The sure independence screening procedure
has been demonstrated to be an effective method for reducing the dimensionality of
data from a large scale to a relatively moderate scale. For censored survival
data, the existing screening methods mainly adopt the Kaplan--Meier estimator
to handle censoring, which may not perform well in scenarios with a heavy
censoring rate. In this article, we propose a model-free screening procedure
based on the Hilbert-Schmidt independence criterion (HSIC). The proposed method
avoids the complication of specifying an actual model from a large number of
covariates. Compared with existing screening procedures, this new approach has
several advantages. First, it does not involve the Kaplan--Meier estimator,
so its performance is much more robust in cases with a heavy censoring
rate. Second, the empirical estimate of HSIC is very simple, as it depends only
on the trace of a product of Gram matrices. In addition, the proposed procedure
does not require any complicated numerical optimization, so the corresponding
calculation is very simple and fast. Finally, the proposed procedure, which
employs the kernel method, is substantially more resistant to outliers.
Extensive simulation studies demonstrate that the proposed method performs
favorably compared with the existing methods. As an illustration, we apply the proposed
method to analyze the diffuse large-B-cell lymphoma (DLBCL) data and the
ovarian cancer data.
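As a rough illustration of the "trace of a product of Gram matrices" estimate mentioned in the abstract above, the following Python sketch ranks covariates by a biased empirical HSIC. It rests on several assumptions that are not taken from the paper: a Gaussian kernel with a median-heuristic bandwidth, normalization by (n-1)^2 (one common convention), and a fully observed response. It does not reproduce the paper's treatment of censoring, and the names `gaussian_gram`, `hsic`, and `hsic_screen` are hypothetical.

```python
import numpy as np


def gaussian_gram(v, bandwidth=None):
    """Gaussian-kernel Gram matrix for a sample (1-D or 2-D array)."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(v.shape[0], -1)
    # Pairwise squared Euclidean distances.
    sq = np.sum((v[:, None, :] - v[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:
        # Median heuristic (an assumption, not taken from the paper).
        positive = sq[sq > 0]
        bandwidth = np.sqrt(np.median(positive) / 2.0) if positive.size else 1.0
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def hsic_screen(X, y, keep):
    """Rank the columns of X by HSIC with y and keep the top `keep` indices."""
    X = np.asarray(X, dtype=float)
    scores = np.array([hsic(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]
    return order[:keep], scores
```

Calling `hsic_screen(X, y, keep=d)` returns the indices of the `d` highest-scoring covariates together with all scores. Because the score is a kernel dependence measure rather than a linear correlation, it can pick up nonlinear relationships such as a quadratic effect.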
Variable Screening for High Dimensional Time Series
Variable selection is a widely studied problem in high dimensional
statistics, primarily since estimating the precise relationship between the
covariates and the response is of great importance in many scientific
disciplines. However, most of the theory and methods developed towards this goal
for the linear model invoke the assumption of iid sub-Gaussian covariates and
errors. This paper analyzes the theoretical properties of Sure Independence
Screening (SIS) (Fan and Lv [J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2008)
849-911]) for high dimensional linear models with dependent and/or heavy tailed
covariates and errors. We also introduce a generalized least squares screening
(GLSS) procedure which utilizes the serial correlation present in the data. By
utilizing this serial correlation when estimating our marginal effects, GLSS is
shown to outperform SIS in many cases. For both procedures we prove sure
screening properties, which depend on the moment conditions, and the strength
of dependence in the error and covariate processes, amongst other factors.
Additionally, combining these screening procedures with the adaptive Lasso is
analyzed. Dependence is quantified by functional dependence measures (Wu [Proc.
Natl. Acad. Sci. USA 102 (2005) 14150-14154]), and the results rely on the use
of Nagaev-type and exponential inequalities for dependent random variables. We
also conduct simulations to demonstrate the finite sample performance of these
procedures, and include a real data application of forecasting the US inflation
rate.
Comment: Published in the Electronic Journal of Statistics
(https://projecteuclid.org/euclid.ejs/1519700498)
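For readers unfamiliar with the SIS procedure analyzed in the abstract above, here is a minimal Python sketch of marginal correlation screening for a linear model. It is only an illustration under stated assumptions: the columns are standardized, the default number of retained covariates is n / log(n) (a choice often associated with Fan and Lv), and the function name `sis` is hypothetical. It does not implement GLSS or any of the paper's adjustments for serial dependence.

```python
import numpy as np


def sis(X, y, keep=None):
    """Keep the covariates with the largest absolute marginal correlation with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if keep is None:
        # Default model size n / log(n); treat this as an assumption.
        keep = int(n / np.log(n))
    # Standardize so that X_j' y / n is the sample correlation.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    omega = np.abs(Xs.T @ ys) / n  # marginal utilities
    order = np.argsort(omega)[::-1]
    return order[:keep], omega
```

The screened index set would typically be passed to a second-stage estimator; the abstract mentions the adaptive Lasso for that step.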
Independent screening for single-index hazard rate models with ultra-high dimensional features
In data sets with many more features than observations, independent screening
based on all univariate regression models leads to a computationally convenient
variable selection method. Recent efforts have shown that in the case of
generalized linear models, independent screening may suffice to capture all
relevant features with high probability, even in ultra-high dimension. It is
unclear whether this formal sure screening property is attainable when the
response is a right-censored survival time. We propose a computationally very
efficient independent screening method for survival data which can be viewed as
the natural survival equivalent of correlation screening. We state conditions
under which the method admits the sure screening property within a general
class of single-index hazard rate models with ultra-high dimensional features.
An iterative variant is also described which combines screening with penalized
regression in order to handle more complex feature covariance structures. The
methods are evaluated through simulation studies and through application to a
real gene expression dataset.
Comment: 32 pages, 3 figures