Regression methods for survival and multistate models.
A common interest in medical, biological, and engineering research is determining whether certain independent variables are correlated with survival or failure times. Standard statistical techniques usually cannot be applied to failure-time data because the data are incomplete, in other words, because of censoring. From a statistical perspective, the study of time-to-event data is even more challenging when further complexities such as high dimensionality or multivariability are added to the model. In this dissertation, we consider predicting patient survival from the proteomic profile of patient serum using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) data from non-small cell lung cancer patients. Because the number of features in a mass spectrum is much larger than the study sample size, traditional linear regression modeling of survival times on a large number of proteomic features is not feasible. Hence, we consider latent factor and regularized/penalized methods for fitting such models in order to predict patient survival from the mass spectrometry features. Extensive numerical studies involving both simulated and real mass spectrometry data are used to compare four popular regression methods, namely partial least squares (PLS), sparse partial least squares (SPLS), the least absolute shrinkage and selection operator (LASSO), and elastic net regularization, on processed spectra. Right censoring is handled through residual-based multiple imputation. Overall, more complex methods such as the elastic net and SPLS perform better, provided the operational parameters are chosen carefully via cross-validation. For survival time prediction, we recommend using the elastic net based on a selected set of features. As a type of multivariate survival data, multistate models have a wide range of applications.
Most of the existing regression approaches for analyzing such data are based on parametric and semi-parametric procedures, which rely on specific model structures. In this dissertation, we construct non-parametric regression estimators of a number of temporal functions in a multistate system based on a univariate continuous baseline covariate. These estimators include state occupation probabilities and the state entry, exit, and waiting (sojourn) time distribution functions of a general progressive (i.e., acyclic) multistate model. The data are subject to right censoring, and the censoring mechanism is explainable by observable covariates that could be time dependent. The resulting estimators are valid even if the multistate process is non-Markov. The performance of the estimators is studied using a detailed simulation. We illustrate our estimators using a data set on bone marrow transplant patients. Finally, some extensions of the proposed methods to the more general case with multivariate covariates are presented, along with plans for future developments.
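The elastic net recommended in the survival-prediction part of the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the data are simulated, the response stands in for fully observed (or already imputed) log survival times, no intercept is fitted, and the function names `elastic_net` and `cv_error` and the penalty grid are hypothetical. The sketch fits the penalized model by coordinate descent and picks the penalty strength by k-fold cross-validation.

```python
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator; it produces the exact zeros of the L1 part."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the objective
    (1/2n)*||y - X b||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1-l1_ratio)*||b||^2.
    Assumption: no intercept, so X and y should be (roughly) centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j removed).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio))
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def cv_error(X, y, alpha, l1_ratio=0.5, k=4):
    """Mean squared prediction error from k-fold cross-validation."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        beta = elastic_net([X[i] for i in train], [y[i] for i in train],
                           alpha, l1_ratio)
        for i in test:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


# Simulated "more features than samples" data (hypothetical, not MALDI-TOF spectra).
random.seed(0)
n, p = 24, 40
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * r[0] - 1.5 * r[1] + r[2] + random.gauss(0.0, 0.5) for r in X]

grid = [0.05, 0.2, 0.8]  # hypothetical penalty grid
best_alpha = min(grid, key=lambda a: cv_error(X, y, a))
beta = elastic_net(X, y, best_alpha)
print("chosen alpha:", best_alpha)
print("nonzero coefficients:", sum(1 for b in beta if b != 0.0))
```

With censored data, a sketch like this would only apply after the censored times have been replaced, for example by the imputation step the abstract mentions.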
Bayesian Conditional Transformation Models
Recent developments in statistical regression methodology establish flexible
relationships between all parameters of the response distribution and the
covariates. This shift away from pure mean regression is just one example and
is further intensified by conditional transformation models (CTMs). They aim to
infer the entire conditional distribution directly by applying a transformation
function that transforms the response conditionally on a set of covariates
towards a simple log-concave reference distribution. Thus, CTMs allow not only
variance, kurtosis and skewness but the complete conditional distribution
function to depend on the explanatory variables. In this article, we propose a
Bayesian notion of conditional transformation models (BCTM) for discrete and
continuous responses in the presence of random censoring. Rather than relying
on simple polynomials, we implement a spline-based parametrization for
monotonic effects that are supplemented with smoothness penalties. Furthermore,
we are able to benefit from the Bayesian paradigm directly via easily
obtainable credible intervals and other quantities without relying on large
sample approximations. A simulation study demonstrates the competitiveness of
our approach against its likelihood-based counterpart, most likely
transformations (MLTs) and Bayesian additive models of location, scale and
shape (BAMLSS). Three applications illustrate the versatility of the BCTMs in
problems involving real-world data.
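The core idea of a conditional transformation model described above can be written compactly. The notation below is an assumption introduced for illustration (the abstract does not fix symbols): $F_Z$ denotes the distribution function of the log-concave reference distribution and $h(\cdot \mid x)$ the conditional transformation function.

```latex
\[
  \mathbb{P}(Y \le y \mid X = x) \;=\; F_Z\bigl(h(y \mid x)\bigr),
  \qquad h(\cdot \mid x)\ \text{monotonically increasing in } y .
\]
```

Because the whole conditional distribution function is modeled through $h$, all of its features, not only the mean or variance, may change with the covariates $x$.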
Focused information criterion and model averaging for generalized additive partial linear models
We study model selection and model averaging in generalized additive partial
linear models (GAPLMs). Polynomial splines are used to approximate nonparametric
functions. The corresponding estimators of the linear parameters are shown to
be asymptotically normal. We then develop a focused information criterion (FIC)
and a frequentist model average (FMA) estimator on the basis of the
quasi-likelihood principle and examine theoretical properties of the FIC and
FMA. The major advantages of the proposed procedures over the existing ones are
their computational expediency and theoretical reliability. Simulation
experiments have provided evidence of the superiority of the proposed
procedures. The approach is further applied to a real-world data example. Comment: Published at http://dx.doi.org/10.1214/10-AOS832 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Nonparametric Independence Screening via Favored Smoothing Bandwidth
We propose a flexible nonparametric regression method for
ultrahigh-dimensional data. As a first step, we propose a fast screening method
based on the favored smoothing bandwidth of the marginal local constant
regression. Then, an iterative procedure is developed to recover both the
important covariates and the regression function. Theoretically, we prove that
the favored smoothing bandwidth based screening possesses the model selection
consistency property. Simulation studies as well as real data analysis show the
competitive performance of the new procedure. Comment: 22 pages
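The screening step described above can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' procedure: the "favored" bandwidth is taken here to be the leave-one-out cross-validated bandwidth of a marginal local constant (Nadaraya-Watson) fit with a Gaussian kernel, and a covariate is kept when that bandwidth falls below a cutoff. The function names, the bandwidth grid, and the cutoff are all hypothetical. The intuition is that a covariate unrelated to the response is best "fitted" by heavy oversmoothing, so it favors a very large bandwidth.

```python
import math
import random


def loo_cv_error(x, y, h):
    """Leave-one-out squared error of the local constant (Nadaraya-Watson)
    estimator with a Gaussian kernel and bandwidth h."""
    n = len(x)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue
            w = math.exp(-0.5 * ((x[i] - x[j]) / h) ** 2)
            num += w * y[j]
            den += w
        if den > 0.0:
            pred = num / den
        else:
            # All kernel weights underflowed: fall back to the leave-one-out mean.
            pred = (sum(y) - y[i]) / (n - 1)
        total += (y[i] - pred) ** 2
    return total / n


def favored_bandwidth(x, y, grid):
    """Bandwidth on the grid with the smallest leave-one-out error."""
    return min(grid, key=lambda h: loo_cv_error(x, y, h))


def screen(columns, y, grid, cutoff):
    """Indices of covariates whose favored bandwidth is below the cutoff
    (the cutoff rule is an assumption of this sketch)."""
    return [j for j, col in enumerate(columns)
            if favored_bandwidth(col, y, grid) < cutoff]


# Simulated data: only covariate 0 drives the response (hypothetical example).
random.seed(1)
n, p = 60, 5
columns = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [math.sin(3.0 * columns[0][i]) + random.gauss(0.0, 0.2) for i in range(n)]

grid = [0.05, 0.1, 0.2, 0.5, 1.0, 5.0, 50.0]
for j, col in enumerate(columns):
    print("covariate", j, "favored bandwidth:", favored_bandwidth(col, y, grid))
print("kept covariates:", screen(columns, y, grid, cutoff=5.0))
```

The sketch covers only the marginal screening; the iterative recovery of the regression function mentioned in the abstract is not attempted here.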
Spike-and-Slab Priors for Function Selection in Structured Additive Regression Models
Structured additive regression provides a general framework for complex
Gaussian and non-Gaussian regression models, with predictors comprising
arbitrary combinations of nonlinear functions and surfaces, spatial effects,
varying coefficients, random effects and further regression terms. The large
flexibility of structured additive regression makes function selection a
challenging and important task, aiming at (1) selecting the relevant
covariates, (2) choosing an appropriate and parsimonious representation of the
impact of covariates on the predictor and (3) determining the required
interactions. We propose a spike-and-slab prior structure for function
selection that allows single coefficients, as well as blocks of coefficients
representing specific model terms, to be included or excluded. A novel
multiplicative parameter expansion is required to obtain good mixing and
convergence properties in a Markov chain Monte Carlo simulation approach and is
shown to induce desirable shrinkage properties. In simulation studies and with
(real) benchmark classification data, we investigate sensitivity to
hyperparameter settings and compare performance to competitors. The flexibility
and applicability of our approach are demonstrated in an additive piecewise
exponential model with time-varying effects for right-censored survival times
of intensive care patients with sepsis. Geoadditive and additive mixed logit
model applications are discussed in an extensive appendix.
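For orientation, a generic spike-and-slab prior on a single coefficient can be written as below. This is a textbook-style form given under assumptions, not necessarily the exact prior structure or parameter expansion of the paper; the symbols $\gamma_j$, $w$, $\tau^2$ and $v_0$ are introduced here for illustration only.

```latex
\[
  \beta_j \mid \gamma_j \;\sim\;
  (1-\gamma_j)\,\mathcal{N}\bigl(0,\, v_0\tau^2\bigr)
  \;+\; \gamma_j\,\mathcal{N}\bigl(0,\, \tau^2\bigr),
  \qquad
  \gamma_j \sim \mathrm{Bernoulli}(w),
  \quad 0 < v_0 \ll 1 .
\]
```

The narrow component (the spike) shrinks a coefficient towards zero when $\gamma_j = 0$, while the wide component (the slab) leaves it essentially unpenalized when $\gamma_j = 1$; the posterior mean of $\gamma_j$ can then be read as an inclusion probability for the corresponding term.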