
    Conditional Screening for Ultra-high Dimensional Covariates with Survival Outcomes

    Identifying important biomarkers that are predictive of cancer patients' prognosis is key to gaining better insight into the biological influences on the disease and has become a critical component of precision medicine. The emergence of large-scale biomedical survival studies, which typically involve an excessive number of biomarkers, has created high demand for efficient screening tools that select predictive biomarkers. The vast number of biomarkers defies any existing variable selection method via regularization. Recently developed variable screening methods, though powerful in many practical settings, fail to incorporate prior information on the importance of each biomarker and are less powerful at detecting marginally weak but jointly important signals. We propose a new conditional screening method for survival outcome data that computes the marginal contribution of each biomarker given previously known biological information. This is based on the premise that some biomarkers are known a priori to be associated with disease outcomes. Our method possesses the sure screening property and a vanishing false selection rate. The utility of the proposal is further confirmed with extensive simulation studies and an analysis of a diffuse large B-cell lymphoma (DLBCL) dataset.
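    A minimal sketch of the conditional screening idea described above, assuming a Cox working model: for each candidate biomarker, a Cox model containing the a-priori known biomarkers plus that single candidate is fitted, and candidates are ranked by the size of the candidate's fitted coefficient. The `lifelines` dependency and the function `conditional_screen` are illustrative choices, not the authors' implementation.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter  # assumed dependency for the Cox fits

def conditional_screen(X, time, event, known_idx, keep=50):
    """Rank candidate biomarkers by their contribution to a Cox model
    that already contains the a-priori known biomarkers."""
    known = list(known_idx)
    scores = {}
    for j in range(X.shape[1]):
        if j in known:
            continue
        cols = known + [j]
        df = pd.DataFrame(X[:, cols], columns=[f"x{k}" for k in cols])
        df["T"], df["E"] = time, event
        cph = CoxPHFitter(penalizer=1e-4).fit(df, duration_col="T", event_col="E")
        scores[j] = abs(cph.params_[f"x{j}"])      # conditional marginal utility
    ranked = sorted(scores, key=scores.get, reverse=True)
    return known + ranked[:keep]                   # retained set after screening
```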

    Model-free screening procedure for ultrahigh-dimensional survival data based on Hilbert-Schmidt independence criterion

    How to select the active variables that have a significant impact on the event of interest is an important and meaningful problem in the statistical analysis of ultrahigh-dimensional data. The sure independence screening procedure has been demonstrated to be an effective method for reducing the dimensionality of data from a large scale to a relatively moderate one. For censored survival data, existing screening methods mainly adopt the Kaplan--Meier estimator to handle censoring, which may not perform well in scenarios with a heavy censoring rate. In this article, we propose a model-free screening procedure based on the Hilbert-Schmidt independence criterion (HSIC). The proposed method avoids the complication of specifying an actual model from a large number of covariates. Compared with existing screening procedures, this new approach has several advantages. First, it does not involve the Kaplan--Meier estimator, so its performance is much more robust in cases with a heavy censoring rate. Second, the empirical estimate of HSIC is very simple, as it depends only on the trace of a product of Gram matrices. In addition, the proposed procedure does not require any complicated numerical optimization, so the corresponding calculation is simple and fast. Finally, because the proposed procedure employs the kernel method, it is substantially more resistant to outliers. Extensive simulation studies demonstrate that the proposed method performs favorably compared with existing methods. As an illustration, we apply the proposed method to the diffuse large B-cell lymphoma (DLBCL) data and the ovarian cancer data.
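    A minimal sketch of the empirical HSIC statistic mentioned above: with Gram matrices K and L for a covariate and a response, the statistic is trace(KHLH)/(n-1)^2, where H is the centring matrix. The Gaussian kernel and the use of a surrogate response for the censored outcome are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def gaussian_gram(v, sigma=1.0):
    """Gram matrix of a 1-D sample under a Gaussian kernel."""
    d = v.reshape(-1, 1) - v.reshape(1, -1)
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def hsic(x, y):
    """Empirical HSIC: trace of a product of centred Gram matrices."""
    n = len(x)
    K, L = gaussian_gram(x), gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n            # centring matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Screening: compute hsic(X[:, j], response_surrogate) for every covariate j
# and keep the covariates with the largest values.
```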

    Scalable Sparse Cox's Regression for Large-Scale Survival Data via Broken Adaptive Ridge

    This paper develops a new scalable sparse Cox regression tool for sparse, high-dimensional, massive sample size (sHDMSS) survival data. The method is a local L_0-penalized Cox regression obtained by repeatedly performing reweighted L_2-penalized Cox regression. We show that the resulting estimator enjoys the best of L_0- and L_2-penalized Cox regression while overcoming their limitations. Specifically, the estimator is selection consistent, oracle for parameter estimation, and possesses a grouping property for highly correlated covariates. Simulation results suggest that when the sample size is large, the proposed method with pre-specified tuning parameters performs comparably to or better than some popular penalized regression methods. More importantly, because the method naturally enables the adaptation of efficient algorithms for massive L_2-penalized optimization and does not require costly data-driven tuning parameter selection, it has a significant computational advantage for sHDMSS data, offering an average 5-fold speedup over its closest competitor in empirical studies.
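    A minimal sketch of the broken adaptive ridge iteration, shown here on a least-squares loss only to illustrate the reweighting scheme; the paper applies the same repeatedly reweighted L_2 penalty to the Cox partial likelihood, which is not reproduced here.

```python
import numpy as np

def broken_adaptive_ridge(X, y, lam=1.0, xi=1.0, n_iter=50, eps=1e-8):
    """Approximate L_0-penalized fit via repeated reweighted ridge solves."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + xi * np.eye(p), X.T @ y)   # initial ridge fit
    for _ in range(n_iter):
        w = lam / np.maximum(beta ** 2, eps)       # weight lam / beta_j^2 mimics an L_0 penalty
        beta = np.linalg.solve(X.T @ X + np.diag(w), X.T @ y)
    beta[np.abs(beta) < 1e-6] = 0.0                # coefficients shrunk to zero drop out
    return beta
```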

    Marginal empirical likelihood and sure independence feature screening

    We study a marginal empirical likelihood approach for scenarios in which the number of variables grows exponentially with the sample size. The marginal empirical likelihood ratios, as functions of the parameters of interest, are systematically examined, and we find that the marginal empirical likelihood ratio evaluated at zero can be used to determine whether an explanatory variable contributes to a response variable. Based on this finding, we propose a unified feature screening procedure for linear models and generalized linear models. Unlike most existing feature screening approaches, which rely on the magnitudes of some marginal estimators to identify true signals, the proposed screening approach is able to further incorporate the level of uncertainty of such estimators. This merit inherits the self-studentization property of the empirical likelihood approach and extends the insights of existing feature screening methods. Moreover, we show that our screening approach is less restrictive in its distributional assumptions and can be conveniently adapted to a broad range of scenarios, such as models specified using general moment conditions. Our theoretical results and extensive numerical examples from simulations and data analysis demonstrate the merits of the marginal empirical likelihood approach. Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/13-AOS1139.
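    A minimal sketch of the screening statistic described above for a linear model with centred data: for covariate j the moment function is g_i = x_ij * y_i, and the marginal empirical likelihood ratio evaluated at zero, -2 log R_j(0), serves as the screening utility. The 1-D Newton search for the Lagrange multiplier is an illustrative implementation choice, not the authors' code.

```python
import numpy as np

def el_ratio_at_zero(g, n_iter=50):
    """-2 log empirical likelihood ratio for testing E[g] = 0 (1-D moment)."""
    lam = 0.0
    for _ in range(n_iter):
        d = 1.0 + lam * g
        step = np.sum(g / d) / np.sum(g ** 2 / d ** 2)   # Newton step on the dual problem
        while np.any(1.0 + (lam + step) * g <= 1e-8):    # halve the step to stay feasible
            step /= 2.0
        lam += step
    return 2.0 * np.sum(np.log(1.0 + lam * g))

def el_screen(X, y, keep=50):
    """Rank covariates by the marginal EL ratio evaluated at zero."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    stats = [el_ratio_at_zero(Xc[:, j] * yc) for j in range(X.shape[1])]
    return np.argsort(stats)[::-1][:keep]                # indices of retained covariates
```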