31 research outputs found

    Independent screening for single-index hazard rate models with ultra-high dimensional features

    In data sets with many more features than observations, independent screening based on all univariate regression models leads to a computationally convenient variable selection method. Recent efforts have shown that in the case of generalized linear models, independent screening may suffice to capture all relevant features with high probability, even in ultra-high dimension. It is unclear whether this formal sure screening property is attainable when the response is a right-censored survival time. We propose a computationally very efficient independent screening method for survival data which can be viewed as the natural survival equivalent of correlation screening. We state conditions under which the method admits the sure screening property within a general class of single-index hazard rate models with ultra-high dimensional features. An iterative variant is also described which combines screening with penalized regression in order to handle more complex feature covariance structures. The methods are evaluated through simulation studies and through application to a real gene expression dataset. Comment: 32 pages, 3 figures
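    The abstract above describes its method as the survival equivalent of correlation screening. As a point of reference only, here is a minimal sketch of plain correlation screening for an uncensored response. It is not the survival method proposed in the paper; the function name `correlation_screen`, the simulated data, and the cutoff `d=20` are all assumptions made for illustration.

```python
import numpy as np

def correlation_screen(X, y, d):
    """Return indices of the d features with the largest absolute
    marginal (Pearson) correlation with y. Illustrative helper only."""
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the response
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Constant columns have zero norm; map them to correlation 0.
    denom = np.where(denom == 0, np.inf, denom)
    corr = (Xc.T @ yc) / denom
    return np.argsort(-np.abs(corr))[:d]

# Simulated example (assumed setup): n = 100 observations, p = 2000
# features, of which only the first two influence the response.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)

selected = correlation_screen(X, y, d=20)
print(sorted(selected.tolist()))
```

    Each feature is ranked using only its own univariate association with the response, which is why the procedure stays cheap even when the number of features far exceeds the number of observations.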

    Some statistical models for high-dimensional data

    Feature Selection in High-Dimensional Studies

    Ph.D., Doctor of Philosophy

    SIS: An R Package for Sure Independence Screening in Ultrahigh-Dimensional Statistical Models

    We revisit sure independence screening procedures for variable selection in generalized linear models and the Cox proportional hazards model. Through the publicly available R package SIS, we provide a unified environment to carry out variable selection using iterative sure independence screening (ISIS) and all of its variants. For the regularization steps in the ISIS recruiting process, available penalties include the LASSO, SCAD, and MCP, while the implemented variants for the screening steps are sample splitting, data-driven thresholding, and combinations thereof. Performance of these feature selection techniques is investigated by means of real and simulated data sets, where we find considerable improvements in terms of model selection and computational time between our algorithms and traditional penalized pseudo-likelihood methods applied directly to the full set of covariates.

    Penalized Profiled Semiparametric Estimating Functions

    In this paper, we propose a general class of penalized profiled semiparametric estimating functions which is applicable to a wide range of statistical models, including quantile regression, survival analysis, and missing data, among others. It is noteworthy that the estimating function can be non-smooth in the parametric and/or nonparametric components. Without imposing a specific functional structure on the nonparametric component or assuming a conditional distribution of the response variable for the given covariates, we establish a unified theory which demonstrates that the resulting estimator for the parametric component possesses the oracle property. Monte Carlo studies indicate that the proposed estimator performs well. An empirical example is also presented to illustrate the usefulness of the new method.

    Statistical Methods on Survival Data with Measurement Error

    In survival data analysis, covariates are often subject to measurement error. A naive analysis that ignores measurement error commonly leads to biased estimation of the parameters of survival models. Measurement error also causes efficiency loss in detecting possible associations between risk factors and time to event. Furthermore, it complicates model building and model checking, because the presence of measurement error frequently masks the true underlying patterns of the data. Although a large body of literature on error-prone survival data has appeared since the paper by Prentice (1982), many important issues remain unexplored in this area. This thesis focuses on several important issues in survival analysis with covariate measurement error. One problem that has received little attention is misspecification of measurement error models. In this thesis, we investigate this problem with particular attention to error-contaminated survival data under the Cox model. In particular, we conduct a bias analysis which offers a way to unify many existing methods for survival data with measurement error, and study the impact of misspecifying the error models in survival data analysis. A simple expression is obtained to quantify the bias of "working" estimators derived under misspecified error models. Consistent estimators under general error models are derived based on this simple expression. Furthermore, we study hypothesis testing with both model misspecification and measurement error present. A second problem of interest is the validity of survival model assumptions when measurement error is involved. In the literature, a large number of methods have been developed to correct for measurement error effects, and these methods generally assume the survival model to be the Cox model. When the Cox model or the error model assumptions fail to hold, existing methods would break down. In this thesis, we address the issue of checking the Cox model assumptions with measurement error. We propose valid goodness-of-fit tests for survival data with covariate measurement error. This research is an important addition to the literature on survival data with measurement error. Our third topic concerns survival data analysis under additive hazards models with covariate measurement error. The additive hazards model is a useful and important alternative to the Cox model. However, this model is relatively less studied for situations where covariates are measured with error. In this thesis, we make important contributions to this topic. Specifically, we explore the asymptotic bias induced by ignoring measurement error. A number of inference methods are developed to correct for error effects. The validity of the proposed methods is justified both theoretically and empirically. We investigate issues of model checking and model misspecification as well. In many studies, the collected data include a large number of variables, many of which are unimportant in explaining the survival of an individual. An important task is thus to identify relevant risk factors which are linked to the hazards of subjects. Although there is work on variable selection for survival data analysis, the available methods typically require that all variables be precisely measured. This requirement is, however, often infeasible. More challengingly, in some studies, the dimension of the risk factors can be quite large or even much larger than the number of subjects. Our fourth topic concerns estimation and variable selection for survival data with high-dimensional mismeasured covariates. We propose corrected penalized methods. Our methods can adjust for measurement error effects, and perform estimation and variable selection simultaneously. Our research on this topic closes multiple gaps among the areas of survival analysis, measurement error and variable selection.
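    The bias that a naive analysis incurs when covariate measurement error is ignored can be seen in a much simpler setting than the Cox or additive hazards models treated in the thesis. The sketch below uses ordinary linear regression with classical additive error, chosen purely for illustration; the simulated data, variable names, and the assumption that the error variance is known are all hypothetical and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
beta = 2.0                      # true slope (assumed for the simulation)
sigma_x2, sigma_u2 = 1.0, 1.0   # variances of true covariate and of the error

x = rng.normal(0.0, np.sqrt(sigma_x2), n)       # true, unobserved covariate
w = x + rng.normal(0.0, np.sqrt(sigma_u2), n)   # error-prone surrogate
y = beta * x + rng.standard_normal(n)           # response depends on x, not w

# Naive least-squares slope: regress y on the surrogate w.
naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)

# Under classical additive error the naive slope shrinks toward zero by the
# reliability ratio sigma_x^2 / (sigma_x^2 + sigma_u^2).
reliability = sigma_x2 / (sigma_x2 + sigma_u2)

# Dividing by the reliability ratio (treated as known here) undoes the shrinkage.
corrected = naive / reliability

print(f"naive slope:     {naive:.3f}")
print(f"corrected slope: {corrected:.3f}")
```

    In this linear analogue the naive slope concentrates near half the true value, because the error variance equals the covariate variance. The survival models studied in the thesis do not admit such a simple closed-form correction, which is part of what motivates the dedicated methods it develops.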

    Novel Methods for Estimation and Inference in Varying Coefficient Models

    Function type parameters relax many model assumptions because of the flexibility and the size of the parameter space. However, the curse of dimensionality has been the biggest challenge in the nonparametric regression area. An advantageous approach to dimension reduction is using basis expansion to approximate infinite parameter space. An even more challenging problem is estimating functions with unique structures, such as functions with zero-effect regions. The main part of this dissertation is working on varying coefficients with zero-effect regions. We propose a novel model that can detect zero-effect regions and estimate the non-zero effects simultaneously. We provide theoretical support for the inference of our proposed estimators. Simulation studies and real data analyses demonstrate the advantage of our models. This dissertation also introduces a new model that considers the additive effects from a novel aspect: estimating the dynamic effect changes. Simulations and real data applications provide comparisons between our model and the existing model. PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163251/1/yuanyang_1.pd