213 research outputs found

    Regularization for Cox's proportional hazards model with NP-dimensionality

    High-throughput genetic sequencing arrays, with thousands of measurements per sample and a large amount of related censored clinical data, have increased the demand for better measurement-specific model selection. In this paper we establish strong oracle properties of nonconcave penalized methods for nonpolynomial (NP) dimensional data with censoring in the framework of Cox's proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of under which dimensionality and correlation restrictions an oracle estimator can be constructed and grasped. It is demonstrated that nonconcave penalties lead to a significant reduction of the "irrepresentable condition" needed for LASSO model selection consistency. A large deviation result for martingales, of interest in its own right, is developed for characterizing the strong oracle property. Moreover, the nonconcave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples. Comment: Published at http://dx.doi.org/10.1214/11-AOS911 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
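The abstract names LASSO and SCAD as penalties without stating them. The following is a minimal sketch, assuming the standard SCAD form with the conventional shape parameter a = 3.7; neither the formula nor the constant appears in the listing, and the function names are hypothetical. It shows why SCAD is called folded-concave: it matches the LASSO penalty near zero and then flattens to a constant, so large coefficients are not penalized further.

```python
def lasso_penalty(t, lam):
    """L1 penalty: grows linearly in |t| without bound."""
    return lam * abs(t)


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (assumed standard form, a > 2).

    Linear for |t| <= lam, quadratic for lam < |t| <= a*lam,
    constant for |t| > a*lam.
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for value in (0.5, 2.0, 10.0):
        print(value, lasso_penalty(value, 1.0), scad_penalty(value, 1.0))
```

For small arguments the two penalties coincide; beyond a*lam the SCAD value stays fixed while the LASSO value keeps growing.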

    Nonparametric tests of the Markov hypothesis in continuous-time models

    We propose several statistics to test the Markov hypothesis for β-mixing stationary processes sampled at discrete time intervals. Our tests are based on the Chapman–Kolmogorov equation. We establish the asymptotic null distributions of the proposed test statistics, showing that Wilks's phenomenon holds. We compute the power of the test and provide simulations to investigate the finite sample performance of the test statistics when the null model is a diffusion process, with alternatives consisting of models with a stochastic mean reversion level, stochastic volatility and jumps. Comment: Published at http://dx.doi.org/10.1214/09-AOS763 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
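For reference, the Chapman–Kolmogorov equation that the tests rely on can be written as follows for a stationary process sampled at interval Δ. The notation is an assumption made here for illustration, not taken from the abstract: p_Δ(y | x) denotes the transition density from state x to state y over one sampling interval.

```latex
\[
  p_{2\Delta}(y \mid x) \;=\; \int p_{\Delta}(y \mid z)\, p_{\Delta}(z \mid x)\, dz .
\]
```

A Markov process must satisfy this identity, so a statistically significant gap between an estimate of the left-hand side and an estimate of the right-hand side is evidence against the Markov hypothesis.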

    Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. Comment: 30 pages, 2 figures
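The feature-augmentation step described in this abstract can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the fixed bandwidth of 0.5, the small `eps` guard, and all function names are hypothetical, and the densities are estimated on the same data that is transformed, which is a simplification. The augmented features would then be passed to a penalized logistic regression, which is omitted here.

```python
import math


def kde(x, sample, bandwidth):
    """Gaussian kernel density estimate at x (kernel and bandwidth are assumptions)."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm


def log_density_ratio(x, sample_pos, sample_neg, bandwidth=0.5, eps=1e-12):
    """Estimated log ratio of class-1 to class-0 marginal densities at x."""
    f = kde(x, sample_pos, bandwidth)
    g = kde(x, sample_neg, bandwidth)
    # eps guards against log(0) when a density estimate underflows.
    return math.log(f + eps) - math.log(g + eps)


def augment(rows, labels, bandwidth=0.5):
    """Replace every feature value by its estimated log marginal density ratio."""
    n_features = len(rows[0])
    out = [[0.0] * n_features for _ in rows]
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        for i, r in enumerate(rows):
            out[i][j] = log_density_ratio(r[j], pos, neg, bandwidth)
    return out


if __name__ == "__main__":
    rows = [[2.0], [2.2], [1.8], [-2.0], [-2.1], [-1.9]]
    labels = [1, 1, 1, 0, 0, 0]
    for row, z in zip(rows, augment(rows, labels)):
        print(row, z)
```

Each transformed feature is positive where class 1 is more likely and negative where class 0 is more likely, so a linear model on the transformed features can express a nonlinear boundary in the original ones.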

    Modeling Nonlinear Vector Time Series Data

    In this chapter, we review nonlinear models for vector time series data and develop new nonparametric estimation and inference for them. Vector time series data exist widely in practice. In financial markets, multiple time series are usually correlated. When analyzing several interdependent time series, in general one should consider them as a single vector time series fitted by multivariate models, which provides a useful tool for modeling interdependencies among multiple time series and for simultaneously analyzing feedback and Granger causality effects. Since nonlinear features are widely observed in time series, we consider nonlinear methodology for modeling nonlinear vector time series data, which allows flexibility in the model structure and avoids the curse of dimensionality.

    The global mean sea surface model WHU2013

    The mean sea surface (MSS) model is an important reference for the study of charting datum and sea level change. A global MSS model named WHU2013, with 2′ × 2′ spatial resolution between 80°S and 84°N, is established in this paper by combining nearly 20 years of multi-satellite altimetric data that include Topex/Poseidon (T/P), Jason-1, Jason-2, ERS-2, ENVISAT and GFO Exact Repeat Mission (ERM) data, ERS-1/168, Jason-1/C geodetic mission data and Cryosat-2 low resolution mode (LRM) data. All the ERM data are adjusted by the collinear method to achieve the mean along-track sea surface height (SSH), and the combined dataset of T/P, Jason-1 and Jason-2 from 1993 to 2012 after collinear adjustment is used as the reference data. The sea level variations in the non-ERM data (geodetic mission data and LRM data) are mainly investigated, and a combined method is proposed to correct the sea level variations between 66°S and 66°N by along-track sea level variation time series and beyond 66°S or 66°N by seasonal sea level variations. In the crossover adjustment between multi-altimetric data, a stepwise method is used to solve the problem of inconsistency in the reference data between the high and low latitude regions. The proposed model is compared with the CNES-CLS2011 and DTU13 MSS models, and the standard deviation (STD) of the differences between the models is about 5 cm between 80°S and 84°N, less than 3 cm between 66°S and 66°N, and less than 4 cm in the China Sea and its adjacent sea. Furthermore, the three models exhibit a good agreement in the SSH differences and the along-track gradient of SSH following comparisons with satellite altimetry data.