SUPERVISED LEARNING FOR COMPLEX DATA
Doctor of Philosophy

Abstract

Supervised learning problems arise in a wide range of scientific fields, such as medicine and neuroscience. Given data consisting of predictors and responses, an important goal of supervised learning is to uncover the underlying relationship between them so that future responses can be predicted. In this dissertation, we propose three new supervised learning approaches for the analysis of complex data. The first two projects focus on block-wise missing multi-modal data, in which different groups of samples have different subsets of modalities observed. In the first project, we study regression problems with multiple responses. We propose a new penalized method that predicts multiple correlated responses jointly, using not only the information from the block-wise missing predictors but also the correlation among the responses. In the second project, we study regression problems with censored outcomes. We propose a penalized Buckley-James method that simultaneously handles block-wise missing covariates and censored outcomes. In the third project, we analyze data streams in reproducing kernel Hilbert spaces. Specifically, we develop a new supervised learning method that learns the underlying model, which may be non-stationary, using limited storage space. We use a shrinkage parameter and a data sparsity constraint to balance the bias-variance tradeoff, and use random feature approximation to control the storage required.
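The storage-control idea in the third project can be illustrated with random Fourier features. The Python sketch below is hypothetical and is not the dissertation's estimator. It assumes a Gaussian kernel and approximates it with `n_features` random cosine features, so memory stays fixed no matter how long the stream runs. It then applies a simple leaky stochastic-gradient update in which an assumed parameter `shrink` pulls old weights toward zero, so recent samples dominate when the model drifts. The class name `StreamingRFFRegressor`, its parameters, and the update rule are all illustrative assumptions, and the data sparsity constraint is not modeled.

```python
import numpy as np


class StreamingRFFRegressor:
    """Hypothetical online kernel regression with random Fourier features.

    Memory is O(n_features * dim) for the random features plus O(n_features)
    for the weights, independent of the number of streamed samples.
    """

    def __init__(self, dim, n_features=200, bandwidth=1.0,
                 step=0.1, shrink=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Frequencies sampled from the Fourier transform of the (assumed)
        # Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
        self.W = rng.normal(scale=1.0 / bandwidth, size=(n_features, dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.theta = np.zeros(n_features)
        self.step = step      # assumed constant learning rate
        self.shrink = shrink  # assumed shrinkage; larger values forget faster

    def features(self, x):
        # z(x) satisfies E[z(x) . z(y)] = k(x, y) over the random draws.
        n_features = self.W.shape[0]
        return np.sqrt(2.0 / n_features) * np.cos(self.W @ x + self.b)

    def predict(self, x):
        return float(self.theta @ self.features(x))

    def update(self, x, y):
        z = self.features(x)
        residual = self.theta @ z - y
        # Shrink the old weights, then take a gradient step on squared loss.
        self.theta = (1.0 - self.shrink) * self.theta - self.step * residual * z


# Usage: learn y = sin(x) from a stream of 5000 samples with fixed memory.
rng = np.random.default_rng(1)
model = StreamingRFFRegressor(dim=1, n_features=200, bandwidth=1.0,
                              step=0.5, shrink=0.001)
for _ in range(5000):
    x = rng.uniform(-3.0, 3.0, size=1)
    model.update(x, np.sin(x[0]))

grid = np.linspace(-3.0, 3.0, 61)
preds = np.array([model.predict(np.array([g])) for g in grid])
mse = float(np.mean((preds - np.sin(grid)) ** 2))
```

With a constant step size, the leaky update behaves in expectation like ridge regression on the random features with penalty `shrink / step`. A larger `shrink` therefore tracks a non-stationary model more quickly at the cost of more bias, which is one simple way to trade bias against variance.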