
    Kernel based methods for accelerated failure time model with ultra-high dimensional data

    Background: Most genomic data have ultra-high dimensions with more than 10,000 genes (probes). Regularization methods with L1 and Lp penalties have been extensively studied in survival analysis with high-dimensional genomic data. However, when the sample size n ≪ m (the number of genes), directly identifying a small subset of genes from ultra-high (m > 10,000) dimensional data is time-consuming and computationally inefficient. In current microarray analysis, the common practice is to select a few thousand (or a few hundred) genes using univariate analysis or statistical tests, and then apply a LASSO-type penalty to further reduce the number of disease-associated genes. This two-step procedure may introduce bias and inaccuracy and cause biologically important genes to be missed.

    Results: The accelerated failure time (AFT) model is a linear regression model and a useful alternative to the Cox model for survival analysis. In this paper, we propose a nonlinear kernel-based AFT model and an efficient variable selection method with adaptive kernel ridge regression. Our proposed variable selection method works on the kernel matrix and the dual problem, which involves a much smaller n × n matrix. It is very efficient when the number of unknown variables (genes) is much larger than the number of samples. Moreover, the primal variables are updated explicitly and the sparsity of the solution is exploited.

    Conclusions: Our proposed methods can simultaneously identify survival-associated prognostic factors and predict survival outcomes with ultra-high dimensional genomic data. We have demonstrated the performance of our methods on both simulated and real data. The proposed method performs well in limited computational studies.
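
    The key computational point in this abstract is that kernel ridge regression can be solved entirely in the dual, where only an n × n kernel matrix appears, so the cost does not grow with the number of genes m. The paper's adaptive kernel weights and AFT-specific handling of censored outcomes are not reproduced here; the minimal Python/NumPy sketch below, with an assumed RBF kernel and made-up toy data, only illustrates why the dual formulation stays cheap when m ≫ n.

    ```python
    import numpy as np

    def rbf_kernel(X, Z, gamma=1.0):
        """Gaussian (RBF) kernel matrix between the rows of X and the rows of Z."""
        sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
        return np.exp(-gamma * sq)

    def kernel_ridge_fit(X, y, lam=1.0, gamma=1.0):
        """Solve the dual problem (K + lam*I) alpha = y: an n x n system, regardless of m."""
        K = rbf_kernel(X, X, gamma)
        n = X.shape[0]
        return np.linalg.solve(K + lam * np.eye(n), y)

    def kernel_ridge_predict(X_train, X_new, alpha, gamma=1.0):
        """Predict via f(x) = sum_i alpha_i k(x_i, x)."""
        return rbf_kernel(X_new, X_train, gamma) @ alpha

    # Toy illustration: n = 100 samples, m = 10,000 "genes" (synthetic, not from the paper).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10_000))
    y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)  # stand-in for log survival times
    alpha = kernel_ridge_fit(X, y, lam=1.0, gamma=1.0 / 10_000)
    pred = kernel_ridge_predict(X, X[:5], alpha, gamma=1.0 / 10_000)
    ```

    Only the 100 × 100 kernel system is ever factorized; the 10,000-dimensional design matrix enters solely through inner products.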

    Novel Computational Methods for Censored Data and Regression

    This dissertation can be divided into three topics. In the first topic, we derived a recursive algorithm for the constrained Kaplan-Meier estimator, which speeds up computation by up to fifty times compared with the current method based on the EM algorithm. We also showed how this leads to a vast improvement in empirical likelihood analysis with right-censored data. After a brief review of regularized regression, we investigated the computational problems in parametric/non-parametric hybrid accelerated failure time models and their regularization in a high-dimensional setting. We also illustrated that, as the number of pieces increases, the discussed models approach a nonparametric one. In the last topic, we discussed a semi-parametric approach to a hypothesis testing problem in the binary choice model. The major tools used are a Buckley-James-like algorithm and empirical likelihood. The essential idea, similar to the first topic, is to iteratively compute the linearly constrained empirical likelihood using optimization algorithms, including the EM algorithm and the iterative convex minorant algorithm.
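
    For reference, the unconstrained Kaplan-Meier estimator that the dissertation's constrained, recursive algorithm builds on can be computed directly from right-censored data; a minimal Python sketch is below. The constrained recursion and the empirical likelihood machinery described above are not reproduced, and the example data are made up for illustration.

    ```python
    import numpy as np

    def kaplan_meier(times, events):
        """Standard (unconstrained) Kaplan-Meier survival curve.

        times  : observed times (event or censoring)
        events : 1 if the event was observed, 0 if right-censored
        Returns a list of (time, estimated survival probability) pairs.
        """
        times = np.asarray(times, dtype=float)
        events = np.asarray(events, dtype=int)

        surv = 1.0
        curve = []
        for t in np.unique(times[events == 1]):          # distinct event times
            at_risk = np.sum(times >= t)                  # n_i: subjects still at risk at t
            died = np.sum((times == t) & (events == 1))   # d_i: events observed at t
            surv *= 1.0 - died / at_risk                  # product-limit update
            curve.append((t, surv))
        return curve

    # Toy example: 8 subjects, events == 0 marks right-censored observations.
    print(kaplan_meier([2, 3, 3, 5, 6, 7, 8, 9],
                       [1, 1, 0, 1, 0, 1, 1, 0]))
    ```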

    Advanced Bayesian Models for High-Dimensional Biomedical Data

    Alzheimer's Disease (AD) is a neurodegenerative and currently incurable disease, and the total number of AD patients is predicted to reach 13.8 million by 2050. Our motivation comes from the need to unravel the missing link between AD and biomedical information for a better understanding of AD. With the advent of modern data acquisition techniques, we can obtain more biomedical data with massive and complex structure. Classical statistical models, however, often fail to address these unique structures, which hinders rigorous analysis. A fundamental question this dissertation asks is how to use such data in a better way. Bayesian methods for high-dimensional data have been successfully employed by using novel priors, MCMC algorithms, and hierarchical modeling. This dissertation proposes novel Bayesian approaches to address statistical challenges arising in biomedical data, including brain imaging and genetic data. The first and second projects aim to quantify the effects of hippocampal morphology and genetic variants on the time to conversion to AD among mild cognitive impairment (MCI) patients; we propose Bayesian survival models with functional/high-dimensional covariates. The third project discusses a Bayesian matrix decomposition method applicable to brain functional connectivity; it facilitates the estimation of clinical covariate effects and the examination of whether functional connectivity differs among normal, MCI, and AD subjects.