3 research outputs found

    Scaling Up Large-scale Sparse Learning and Its Application to Medical Imaging

    Get PDF
    abstract: Large-scale â„“1\ell_1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. In many applications, it remains challenging to apply the sparse learning model to large-scale problems that have massive data samples with high-dimensional features. One popular and promising strategy is to scaling up the optimization problem in parallel. Parallel solvers run multiple cores on a shared memory system or a distributed environment to speed up the computation, while the practical usage is limited by the huge dimension in the feature space and synchronization problems. In this dissertation, I carry out the research along the direction with particular focuses on scaling up the optimization of sparse learning for supervised and unsupervised learning problems. For the supervised learning, I firstly propose an asynchronous parallel solver to optimize the large-scale sparse learning model in a multithreading environment. Moreover, I propose a distributed framework to conduct the learning process when the dataset is distributed stored among different machines. Then the proposed model is further extended to the studies of risk genetic factors for Alzheimer's Disease (AD) among different research institutions, integrating a group feature selection framework to rank the top risk SNPs for AD. For the unsupervised learning problem, I propose a highly efficient solver, termed Stochastic Coordinate Coding (SCC), scaling up the optimization of dictionary learning and sparse coding problems. The common issue for the medical imaging research is that the longitudinal features of patients among different time points are beneficial to study together. To further improve the dictionary learning model, I propose a multi-task dictionary learning method, learning the different task simultaneously and utilizing shared and individual dictionary to encode both consistent and changing imaging features.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Machine Learning Methods for Diagnosis, Prognosis and Prediction of Long-term Treatment Outcome of Major Depression

    Get PDF
    abstract: Major Depression, clinically called Major Depressive Disorder, is a mood disorder that affects about one eighth of population in US and is projected to be the second leading cause of disability in the world by the year 2020. Recent advances in biotechnology have enabled us to collect a great variety of data which could potentially offer us a deeper understanding of the disorder as well as advancing personalized medicine. This dissertation focuses on developing methods for three different aspects of predictive analytics related to the disorder: automatic diagnosis, prognosis, and prediction of long-term treatment outcome. The data used for each task have their specific characteristics and demonstrate unique problems. Automatic diagnosis of melancholic depression is made on the basis of metabolic profiles and micro-array gene expression profiles where the presence of missing values and strong empirical correlation between the variables is not unusual. To deal with these problems, a method of generating a representative set of features is proposed. Prognosis is made on data collected from rating scales and questionnaires which consist mainly of categorical and ordinal variables and thus favor decision tree based predictive models. Decision tree models are known for the notorious problem of overfitting. A decision tree pruning method that overcomes the shortcomings of a greedy nature and reliance on heuristics inherent in traditional decision tree pruning approaches is proposed. The method is further extended to prune Gradient Boosting Decision Tree and tested on the task of prognosis of treatment outcome. Follow-up studies evaluating the long-term effect of the treatments on patients usually measure patients' depressive symptom severity monthly, resulting in the actual time of relapse upper bounded by the observed time of relapse. To resolve such uncertainty in response, a general loss function where the hypothesis could take different forms is proposed to predict the risk of relapse in situations where only an interval for time of relapse can be derived from the observed data.Dissertation/ThesisDoctoral Dissertation Computer Science 201
    corecore