Adaptive kernel estimation of the baseline function in the Cox model, with high-dimensional covariates

Authors: Guilloux Agathe; Lemler Sarah; Taupin Marie-Luce
Publication date: 06/07/2015

The aim of this article is to propose a novel kernel estimator of the baseline function in a general high-dimensional Cox model, for which we derive non-asymptotic rates of convergence. To construct our estimator, we first estimate the regression parameter in the Cox model via a Lasso procedure. We then plug this estimator into the classical kernel estimator of the baseline function, obtained by smoothing the so-called Breslow estimator of the cumulative baseline function. We propose and study an adaptive procedure for selecting the bandwidth, in the spirit of Goldenshluger and Lepski (2011). We state non-asymptotic oracle inequalities for the final estimator, which reveal the reduction of the rates of convergence when the dimension of the covariates grows.
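The plug-in construction described in this abstract can be illustrated with a minimal sketch. It assumes a regression vector `beta` is already available (the Lasso step is not shown), uses a fixed bandwidth rather than the adaptive Goldenshluger-Lepski selection, and picks an Epanechnikov kernel purely for illustration. All function names are hypothetical and not taken from the paper.

```python
import math


def breslow_increments(times, events, covariates, beta):
    """Jumps of the Breslow estimator of the cumulative baseline hazard.

    Returns sorted (event_time, jump) pairs, where each jump equals
    1 / sum over subjects still at risk of exp(beta . z_j).
    `beta` is assumed to come from a separate (e.g. Lasso) fit.
    """
    risk = [math.exp(sum(b * z for b, z in zip(beta, zs))) for zs in covariates]
    jumps = []
    for t_i, d_i in zip(times, events):
        if d_i:  # censored observations contribute no jump
            at_risk = sum(r for t_j, r in zip(times, risk) if t_j >= t_i)
            jumps.append((t_i, 1.0 / at_risk))
    return sorted(jumps)


def epanechnikov(u):
    """Epanechnikov kernel (an illustrative choice, supported on [-1, 1])."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_baseline_hazard(t, jumps, bandwidth):
    """Smooth the Breslow jumps: (1/h) * sum_i K((t - T_i) / h) * jump_i."""
    total = sum(epanechnikov((t - t_i) / bandwidth) * dj for t_i, dj in jumps)
    return total / bandwidth


# Toy data: four subjects, one censored, a single covariate, beta = 0.
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [1, 0, 1, 1]
toy_covariates = [[0.2], [-0.1], [0.4], [0.0]]
toy_jumps = breslow_increments(toy_times, toy_events, toy_covariates, [0.0])
toy_hazard = kernel_baseline_hazard(3.0, toy_jumps, bandwidth=1.0)
```

With `beta = 0` the jumps reduce to the Nelson-Aalen increments, which gives a quick sanity check on the risk-set bookkeeping.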

arXiv.org e-Print Archive

Regularization for Cox's proportional hazards model with NP-dimensionality

Authors: Bradic Jelena; Fan Jianqing; Jiang Jiancheng
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 25/05/2012

High-throughput genetic sequencing arrays with thousands of measurements per sample, together with a large amount of related censored clinical data, have increased the need for better measurement-specific model selection. In this paper we establish strong oracle properties of nonconcave penalized methods for nonpolynomial (NP) dimensional data with censoring in the framework of Cox's proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of under which dimensionality and correlation restrictions an oracle estimator can be constructed and attained. It is demonstrated that nonconcave penalties lead to a significant reduction of the "irrepresentable condition" needed for LASSO model selection consistency. The large deviation result for martingales, of interest in its own right, is developed for characterizing the strong oracle property. Moreover, the nonconcave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples. Comment: Published at http://dx.doi.org/10.1214/11-AOS911 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
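The two penalties named in this abstract can be compared numerically. The sketch below assumes the standard SCAD form of Fan and Li with the conventional default a = 3.7. The function names are hypothetical, and this is not the paper's coordinate-wise algorithm.

```python
def lasso_penalty(t, lam):
    """L1 penalty: grows linearly in |t| without bound."""
    return lam * abs(t)


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (standard form, a > 2): linear near zero, then flat.

    Matches the Lasso for |t| <= lam, bends quadratically for
    lam < |t| <= a * lam, and is constant beyond a * lam.
    """
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


# Compare the penalties on a small, a moderate, and a large coefficient.
for coef in (0.5, 2.0, 10.0):
    print(coef, lasso_penalty(coef, 1.0), scad_penalty(coef, 1.0))
```

Because the SCAD penalty is constant for large coefficients, it does not keep shrinking strong signals the way the L1 penalty does. This is the usual intuition for why folded-concave penalties need weaker conditions for selection consistency.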

arXiv.org e-Print Archive

Crossref

Oracle inequalities for the Lasso in the high-dimensional Aalen multiplicative intensity model

Author: Lemler Sarah
Publication date: 01/01/2013

In a general counting process setting, we consider the problem of obtaining a prognosis of the survival time adjusted for high-dimensional covariates. To this end, we construct an estimator of the whole conditional intensity. We estimate it by the best Cox proportional hazards model given two dictionaries of functions. The first dictionary is used to construct an approximation of the logarithm of the baseline hazard function and the second to approximate the relative risk. We introduce a new data-driven weighted Lasso procedure to estimate the unknown parameters of the best Cox model approximating the intensity. We provide non-asymptotic oracle inequalities for our procedure in terms of an appropriate empirical Kullback divergence. Our results rely on an empirical Bernstein's inequality for martingales with jumps and properties of modified self-concordant functions.

arXiv.org e-Print Archive

HAL Evry

CiteSeerX

HAL Descartes

Variance Estimation Using Refitted Cross-validation in Ultrahigh Dimensional Regression

Authors: Fan Jianqing; Guo Shaojun; Hao Ning
Publication date: 24/12/2010

Variance estimation is a fundamental problem in statistical modeling. In ultrahigh dimensional linear regressions where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regressions make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the noise level. In this paper, we propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation (RCV), to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two-stage estimator, which fits the selected variables in the first stage, and the plug-in one-stage estimators using LASSO and SCAD are also studied and compared. Their performance can be improved by the proposed RCV method.
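The data-splitting idea can be written compactly. The abstract does not state the formula, so the notation below is an assumption. The sample is split into two halves $(y^{(1)}, X^{(1)})$ and $(y^{(2)}, X^{(2)})$ of size $n/2$. A variable set $\hat M_1$ is selected on the first half and refitted by least squares on the second, and the roles are then swapped.

```latex
\hat\sigma_1^2
  = \frac{ y^{(2)\top} \bigl( I_{n/2} - P^{(2)}_{\hat M_1} \bigr) y^{(2)} }
         { n/2 - |\hat M_1| },
\qquad
\hat\sigma_2^2
  = \frac{ y^{(1)\top} \bigl( I_{n/2} - P^{(1)}_{\hat M_2} \bigr) y^{(1)} }
         { n/2 - |\hat M_2| },
\qquad
\hat\sigma^2_{\mathrm{RCV}} = \frac{ \hat\sigma_1^2 + \hat\sigma_2^2 }{ 2 },
```

Here $P^{(k)}_{M} = X^{(k)}_{M} \bigl( X^{(k)\top}_{M} X^{(k)}_{M} \bigr)^{-1} X^{(k)\top}_{M}$ is the projection onto the columns in $M$ within half $k$. Variables that were selected only because of spurious correlation with the noise in one half are unlikely to predict the independent noise in the other half, so the refitted residuals are not artificially small.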

arXiv.org e-Print Archive

Princeton University Open Access Repository

Scalable Sparse Cox's Regression for Large-Scale Survival Data via Broken Adaptive Ridge

Authors: Kawaguchi Eric S.; Li Gang; Liu Zhenqiu; Suchard Marc A.
Publication venue: 'Wiley'
Publication date: 25/07/2018

This paper develops a new scalable sparse Cox regression tool for sparse high-dimensional massive sample size (sHDMSS) survival data. The method is a local $L_0$-penalized Cox regression via repeatedly performing reweighted $L_2$-penalized Cox regression. We show that the resulting estimator enjoys the best of $L_0$- and $L_2$-penalized Cox regressions while overcoming their limitations. Specifically, the estimator is selection consistent, oracle for parameter estimation, and possesses a grouping property for highly correlated covariates. Simulation results suggest that when the sample size is large, the proposed method with pre-specified tuning parameters has a comparable or better performance than some popular penalized regression methods. More importantly, because the method naturally enables adaptation of efficient algorithms for massive $L_2$-penalized optimization and does not require costly data driven tuning parameter selection, it has a significant computational advantage for sHDMSS data, offering an average of 5-fold speedup over its closest competitor in empirical studies.
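The reweighting scheme can be sketched as an iteration. The abstract does not state it, so the notation is an assumption: $\ell_n(\beta)$ is the Cox log partial likelihood, and $\xi_n$ and $\lambda_n$ are tuning parameters for the initial ridge fit and the reweighted steps.

```latex
\hat\beta^{(0)} = \arg\min_{\beta}
  \Bigl\{ -2\,\ell_n(\beta) + \xi_n \sum_{j=1}^{p} \beta_j^2 \Bigr\},
\qquad
\hat\beta^{(k)} = \arg\min_{\beta}
  \Bigl\{ -2\,\ell_n(\beta)
          + \lambda_n \sum_{j=1}^{p}
            \frac{\beta_j^2}{\bigl(\hat\beta_j^{(k-1)}\bigr)^2} \Bigr\},
\quad k \ge 1 .
```

At a fixed point the weighted term $\beta_j^2 / \hat\beta_j^2$ equals 1 for nonzero coefficients and vanishes for coefficients driven to zero, so the penalty behaves like a count of nonzero coefficients, that is, an $L_0$ penalty. Each individual step is still a smooth ridge-type problem.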

arXiv.org e-Print Archive

eScholarship - University of California