
    Regression methods for survival and multistate models.

    A common interest in medical, biological, and engineering research is determining whether certain independent variables are correlated with survival or failure times. Standard statistical techniques usually cannot be applied to failure-time data because the data are incomplete, in other words, because of censoring. From a statistical perspective, the study of time-to-event data is even more challenging when further complexities such as high dimensionality or multivariate structure are added to the model. In this dissertation, we consider predicting patient survival from the proteomic profile of patient serum using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) data from non-small cell lung cancer patients. Because the number of features in a mass spectrum is much larger than the study sample size, traditional linear regression modeling of survival times with a large number of proteomic features is not feasible. Hence, we consider latent factor and regularized/penalized methods for fitting such models in order to predict patient survival from the mass spectrometry features. Extensive numerical studies involving both simulated and real mass spectrometry data are used to compare four popular regression methods, namely partial least squares (PLS), sparse partial least squares (SPLS), the least absolute shrinkage and selection operator (LASSO), and elastic net regularization, on processed spectra. Right censoring is handled through residual-based multiple imputation. Overall, more complex methods such as the elastic net and SPLS perform better, provided the operational parameters are chosen carefully via cross-validation. For survival time prediction, we recommend using the elastic net based on a selected set of features. As a type of multivariate survival data, multistate models have a wide range of applications.
Most existing regression approaches for analyzing such data are based on parametric and semi-parametric procedures, which rely on specific model structures. In this dissertation, we construct non-parametric regression estimators of a number of temporal functions in a multistate system based on a univariate continuous baseline covariate. These estimators include state occupation probabilities and the distribution functions of state entry, exit, and waiting (sojourn) times of a general progressive (i.e., acyclic) multistate model. The data are subject to right censoring, and the censoring mechanism is explainable by observable covariates that could be time dependent. The resulting estimators are valid even if the multistate process is non-Markov. The performance of the estimators is studied using a detailed simulation. We illustrate our estimators using a data set on bone marrow transplant patients. Finally, some extensions of the proposed methods to the more general case with multivariate covariates are presented, along with plans for future developments.

    Bayesian Conditional Transformation Models

    Recent developments in statistical regression methodology establish flexible relationships between all parameters of the response distribution and the covariates. This shift away from pure mean regression is further intensified by conditional transformation models (CTMs). They aim to infer the entire conditional distribution directly by applying a transformation function that transforms the response, conditionally on a set of covariates, towards a simple log-concave reference distribution. Thus, CTMs allow not only variance, kurtosis and skewness but the complete conditional distribution function to depend on the explanatory variables. In this article, we propose a Bayesian notion of conditional transformation models (BCTMs) for discrete and continuous responses in the presence of random censoring. Rather than relying on simple polynomials, we implement a spline-based parametrization for monotonic effects that is supplemented with smoothness penalties. Furthermore, we are able to benefit from the Bayesian paradigm directly via easily obtainable credible intervals and other quantities without relying on large-sample approximations. A simulation study demonstrates the competitiveness of our approach against its likelihood-based counterpart, most likely transformations (MLTs), and against Bayesian additive models of location, scale and shape (BAMLSS). Three applications illustrate the versatility of BCTMs in problems involving real-world data.
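
As a rough illustration of the transformation idea in the abstract above, a conditional transformation model can be written in the following generic form. The symbols $h$, $F_Z$ and $f_Z$ are notation assumed here for the sketch; they are not taken from the abstract.

```latex
\begin{align*}
  P(Y \le y \mid X = x) &= F_Z\bigl(h(y \mid x)\bigr), \\
  f_{Y \mid X = x}(y)   &= f_Z\bigl(h(y \mid x)\bigr)\,
                           \frac{\partial h(y \mid x)}{\partial y}
                           \quad \text{(continuous responses)}.
\end{align*}
```

Here $F_Z$ and $f_Z$ denote the distribution function and density of the reference distribution (for example, the standard normal), and $h(y \mid x)$ is the transformation function, which must be monotonically increasing in $y$ so that the right-hand side is a valid distribution function. Because $h$ may depend on $x$ in an unrestricted way, the whole conditional distribution, not only its mean or variance, can change with the covariates.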

    Focused information criterion and model averaging for generalized additive partial linear models

    We study model selection and model averaging in generalized additive partial linear models (GAPLMs). Polynomial splines are used to approximate the nonparametric functions. The corresponding estimators of the linear parameters are shown to be asymptotically normal. We then develop a focused information criterion (FIC) and a frequentist model average (FMA) estimator on the basis of the quasi-likelihood principle and examine the theoretical properties of the FIC and FMA. The major advantages of the proposed procedures over existing ones are their computational expediency and theoretical reliability. Simulation experiments provide evidence of the superiority of the proposed procedures. The approach is further applied to a real-world data example. Comment: Published at http://dx.doi.org/10.1214/10-AOS832 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Nonparametric Independence Screening via Favored Smoothing Bandwidth

    We propose a flexible nonparametric regression method for ultrahigh-dimensional data. As a first step, we propose a fast screening method based on the favored smoothing bandwidth of the marginal local constant regression. Then, an iterative procedure is developed to recover both the important covariates and the regression function. Theoretically, we prove that screening based on the favored smoothing bandwidth possesses the model selection consistency property. Simulation studies as well as a real data analysis show the competitive performance of the new procedure. Comment: 22 pages
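
The screening step described above can be sketched on toy data. This is only one plausible reading, not the authors' procedure: it assumes that the "favored" bandwidth of a covariate is the one minimizing the leave-one-out cross-validation error of a marginal Nadaraya-Watson (local constant) fit with a Gaussian kernel, and that a large favored bandwidth suggests an irrelevant covariate. The function names, the bandwidth grid, and the simulated data are all hypothetical.

```python
import math
import random


def loo_cv_error(x, y, h):
    """Leave-one-out CV error of a local constant (Nadaraya-Watson) fit."""
    n = len(x)
    y_mean = sum(y) / n
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j != i:
                # Gaussian kernel weight of observation j at the point x[i]
                w = math.exp(-0.5 * ((x[i] - x[j]) / h) ** 2)
                num += w * y[j]
                den += w
        # Fall back to the sample mean if no neighbour carries any weight
        fit = num / den if den > 1e-12 else y_mean
        total += (y[i] - fit) ** 2
    return total / n


def favored_bandwidths(columns, y, grid):
    """For each covariate, return the grid bandwidth with the smallest CV error."""
    return [min(grid, key=lambda h: loo_cv_error(x, y, h)) for x in columns]


# Toy data (assumed): only the first of four covariates drives the response.
random.seed(0)
n = 150
columns = [[random.uniform(-2.0, 2.0) for _ in range(n)] for _ in range(4)]
y = [math.sin(2.0 * x) + random.gauss(0.0, 0.1) for x in columns[0]]

# The last grid value stands in for an "infinite" bandwidth (a constant fit).
grid = [0.1, 0.3, 1.0, 3.0, 1000.0]
favored = favored_bandwidths(columns, y, grid)

# Screening: rank covariates from smallest to largest favored bandwidth.
ranking = sorted(range(len(columns)), key=lambda k: favored[k])
print(favored, ranking)
```

In this toy setup one would expect the first covariate to favor a small bandwidth, because its marginal regression function is strongly nonlinear, while the pure-noise covariates should tend towards the large end of the grid, where the fit is nearly constant.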

    Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models

    Structured additive regression provides a general framework for complex Gaussian and non-Gaussian regression models, with predictors comprising arbitrary combinations of nonlinear functions and surfaces, spatial effects, varying coefficients, random effects and further regression terms. The large flexibility of structured additive regression makes function selection a challenging and important task, aiming at (1) selecting the relevant covariates, (2) choosing an appropriate and parsimonious representation of the impact of covariates on the predictor and (3) determining the required interactions. We propose a spike-and-slab prior structure for function selection that allows single coefficients, as well as blocks of coefficients representing specific model terms, to be included or excluded. A novel multiplicative parameter expansion is required to obtain good mixing and convergence properties in a Markov chain Monte Carlo simulation approach and is shown to induce desirable shrinkage properties. In simulation studies and with (real) benchmark classification data, we investigate sensitivity to hyperparameter settings and compare performance to competitors. The flexibility and applicability of our approach are demonstrated in an additive piecewise exponential model with time-varying effects for right-censored survival times of intensive care patients with sepsis. Geoadditive and additive mixed logit model applications are discussed in an extensive appendix.