A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing
Recent years have witnessed many dedicated open-source projects that build
and maintain implementations of Support Vector Machines (SVM), parallelized for
GPUs, multi-core CPUs, and distributed systems. To date, no comparable
effort has been made to parallelize the Elastic Net, despite its popularity in
many high-impact applications, including genetics, neuroscience, and systems
biology. The first contribution of this paper is theoretical. We
establish a tight link between two seemingly different algorithms and prove
that Elastic Net regression can be reduced to SVM with squared hinge loss
classification. Our second contribution is to derive a practical algorithm
based on this reduction. The reduction enables us to utilize prior efforts in
speeding up and parallelizing SVMs to obtain a highly optimized and parallel
solver for the Elastic Net and Lasso. With a simple wrapper, consisting of only
11 lines of MATLAB code, we obtain an Elastic Net implementation that naturally
utilizes GPUs and multi-core CPUs. We demonstrate on twelve real-world data
sets that our algorithm yields results identical to the popular (and highly
optimized) glmnet implementation, but is one to several orders of magnitude
faster.
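For reference, the glmnet baseline mentioned above minimizes the Elastic Net objective by cyclic coordinate descent with soft-thresholding. The following is a minimal NumPy sketch of that baseline (not the authors' SVM reduction; all names and the iteration count are illustrative):

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def elastic_net_cd(X, y, lam1, lam2, n_iter=500):
    """Cyclic coordinate descent for
    min_w 0.5*||y - Xw||^2 + lam1*||w||_1 + 0.5*lam2*||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                       # running residual
    col_sq = (X ** 2).sum(axis=0)       # per-feature squared norms
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual correlation for coordinate j.
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            r += X[:, j] * (w[j] - w_new)   # keep residual in sync
            w[j] = w_new
    return w
```

With both penalties set to zero the iterates recover the least-squares solution, which gives a quick sanity check of the update.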
Krylov Solvers for Interior Point Methods with Applications in Radiation Therapy and Support Vector Machines
Interior point methods are widely used for different types of mathematical
optimization problems. Many implementations of interior point methods in use
today rely on direct linear solvers to solve systems of equations in each
iteration. The need to solve ever larger optimization problems more efficiently
and the rise of hardware accelerators for general purpose computing has led to
a large interest in using iterative linear solvers instead, with the major
issue being inevitable ill-conditioning of the linear systems arising as the
optimization progresses. We investigate the use of Krylov solvers for interior
point methods in solving optimization problems from radiation therapy and
support vector machines. We implement a prototype interior point method using a
so-called doubly augmented formulation of the Karush-Kuhn-Tucker linear system
of equations, originally proposed by Forsgren and Gill, and evaluate its
performance on real optimization problems from radiation therapy and support
vector machines. Crucially, our implementation uses a preconditioned conjugate
gradient method with Jacobi preconditioning internally. Our measurements of the
conditioning of the linear systems indicate that the Jacobi preconditioner
improves the conditioning of the systems to a degree that they can be solved
iteratively, but there is room for further improvement in that regard.
Furthermore, profiling of our prototype code shows that it is suitable for GPU
acceleration, which may further improve its performance in practice. Overall,
our results indicate that our method can find solutions of acceptable accuracy
in reasonable time, even with a simple Jacobi preconditioner.
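The preconditioned conjugate gradient method with Jacobi (diagonal) preconditioning used internally can be sketched as follows. This is a minimal dense NumPy illustration; a real interior point solver would exploit sparsity and the doubly augmented structure:

```python
import numpy as np

def pcg_jacobi(A, b, tol=1e-8, max_iter=1000):
    """Preconditioned CG for SPD A x = b with M = diag(A)."""
    M_inv = 1.0 / np.diag(A)        # Jacobi preconditioner, applied elementwise
    x = np.zeros_like(b)
    r = b - A @ x                   # residual
    z = M_inv * r                   # preconditioned residual
    p = z.copy()                    # search direction
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # conjugate direction update
        rz = rz_new
    return x
```

Because the preconditioner is just the diagonal, each application costs one elementwise product, which is what makes it attractive on GPUs despite its limited power on badly conditioned systems.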
GPU Acceleration of ADMM for Large-Scale Quadratic Programming
The alternating direction method of multipliers (ADMM) is a powerful operator
splitting technique for solving structured convex optimization problems. Due to
its relatively low per-iteration computational cost and ability to exploit
sparsity in the problem data, it is particularly suitable for large-scale
optimization. However, the method may still take prohibitively long to compute
solutions to very large problem instances. Although ADMM is known to be
parallelizable, this feature is rarely exploited in real implementations. In
this paper we exploit the parallel computing architecture of a graphics
processing unit (GPU) to accelerate ADMM. We build our solver on top of OSQP, a
state-of-the-art implementation of ADMM for quadratic programming. Our
open-source CUDA C implementation has been tested on many large-scale problems
and was shown to be up to two orders of magnitude faster than the CPU
implementation.
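The ADMM iteration that OSQP applies to a QP of the form min ½xᵀPx + qᵀx subject to l ≤ Ax ≤ u alternates a fixed linear solve with a projection onto the box. A simplified dense NumPy sketch, without OSQP's scaling, ρ adaptation, factorization caching details, or solution polishing:

```python
import numpy as np

def admm_qp(P, q, A, l, u, rho=1.0, sigma=1e-6, n_iter=500):
    """Simplified OSQP-style ADMM for min 0.5 x'Px + q'x  s.t. l <= Ax <= u."""
    n, m = P.shape[0], A.shape[0]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    # The coefficient matrix is iteration-independent, so it is formed once;
    # a real solver would factor it once and reuse the factorization.
    K = P + sigma * np.eye(n) + rho * A.T @ A
    for _ in range(n_iter):
        rhs = sigma * x - q + A.T @ (rho * z - y)
        x = np.linalg.solve(K, rhs)       # x-update: linear solve
        Ax = A @ x
        z = np.clip(Ax + y / rho, l, u)   # z-update: projection onto [l, u]
        y = y + rho * (Ax - z)            # dual update
    return x
```

The linear solve dominates each iteration, and since K is fixed, the per-iteration work reduces to triangular solves and matrix-vector products, both of which map well onto a GPU.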
Regularized Optimal Transport and the Rot Mover's Distance
This paper presents a unified framework for smooth convex regularization of
discrete optimal transport problems. In this context, the regularized optimal
transport turns out to be equivalent to a matrix nearness problem with respect
to Bregman divergences. Our framework thus naturally generalizes a previously
proposed regularization based on the Boltzmann-Shannon entropy related to the
Kullback-Leibler divergence, and solved with the Sinkhorn-Knopp algorithm. We
call the regularized optimal transport distance the rot mover's distance in
reference to the classical earth mover's distance. We develop two generic
schemes, which we respectively call the alternate scaling algorithm and the
non-negative alternate scaling algorithm, to efficiently compute the
regularized optimal plans depending on whether or not the domain of the
regularizer lies within the non-negative orthant. These schemes are based on
Dykstra's algorithm with alternate Bregman projections, and further exploit the
Newton-Raphson method when applied to separable divergences. We enhance the
separable case with a sparse extension to deal with high data dimensions. We
also instantiate our proposed framework and discuss the inherent specificities
for well-known regularizers and statistical divergences in the machine learning
and information geometry communities. Finally, we demonstrate the merits of our
methods with experiments using synthetic data to illustrate the effect of
different regularizers and penalties on the solutions, as well as real-world
data for a pattern recognition application to audio scene classification.
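The Sinkhorn-Knopp baseline that this framework generalizes alternately rescales the rows and columns of the Gibbs kernel until both marginals are matched. A minimal NumPy sketch for the entropic (Boltzmann-Shannon) case, with illustrative parameter names:

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=500):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp.

    a, b : source/target marginals (each summing to 1)
    C    : cost matrix, shape (len(a), len(b))
    eps  : entropic regularization strength
    Returns the regularized transport plan.
    """
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)         # scale to match column marginals
        u = a / (K @ v)           # scale to match row marginals
    return u[:, None] * K * v[None, :]
```

In the paper's framework this corresponds to alternating Bregman projections for the Kullback-Leibler divergence; other regularizers replace the closed-form scaling with Dykstra or Newton-Raphson steps.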
Fast Machine Learning Algorithms for Massive Datasets with Applications in the Biomedical Domain
The continuous increase in the size of datasets introduces computational challenges for machine learning algorithms. In this dissertation, we cover machine learning algorithms and applications for large-scale data analysis in manufacturing and healthcare.

We begin by introducing a multilevel framework to scale the support vector machine (SVM), a popular supervised learning algorithm with few tunable hyperparameters and highly accurate predictions. The computational complexity of the nonlinear SVM is prohibitive on large-scale datasets compared to the linear SVM, which is more scalable for massive datasets. However, the nonlinear SVM has been shown to produce significantly higher classification quality on complex and highly imbalanced datasets, at the cost of a computationally expensive quadratic programming solver and extra kernel parameters for model selection. We introduce a generalized fast multilevel framework for regular, weighted, and instance-weighted SVM that achieves similar or better classification quality than state-of-the-art SVM libraries such as LIBSVM. Our framework improves the runtime by more than two orders of magnitude on some well-known benchmark datasets. We cover multiple versions of the proposed framework and its implementation in detail. The framework is implemented using the PETSc library, which allows easy integration with scientific computing tasks.

Next, we propose an adaptive multilevel learning framework for SVM to reduce the variance in prediction quality across levels, improve the overall prediction accuracy, and boost the runtime. We implement multi-threaded support to speed up parameter fitting, yielding more than an order of magnitude speed-up. We design an early stopping criterion to avoid extra computation once the expected prediction quality is reached. This approach provides significant speed-up, especially for massive datasets.
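The early-stopping idea in the parameter search can be sketched generically: stop scanning the hyperparameter grid as soon as a model reaches the expected prediction quality. The function and threshold names below are illustrative, not the dissertation's code:

```python
def fit_with_early_stopping(param_grid, train_and_score, target_quality):
    """Scan hyperparameter settings, stopping once the target quality is met.

    param_grid      : iterable of candidate parameter settings
    train_and_score : callable mapping a setting to a validation score
    target_quality  : score at which remaining settings are skipped
    """
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
        if best_score >= target_quality:
            break  # early stop: expected quality reached
    return best_params, best_score
```

Because each `train_and_score` call is an independent SVM fit, the remaining settings can also be evaluated in parallel threads, which is where the reported order-of-magnitude speed-up in parameter fitting comes from.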
Finally, we propose an efficient low-dimensional feature extraction method over massive knowledge networks. Knowledge networks are becoming more popular in the biomedical domain for knowledge representation. Each layer in a knowledge network can store information from one or multiple sources of data, and the relationships between concepts or between layers represent valuable information. The proposed feature engineering approach provides efficient and highly accurate prediction of relationships between biomedical concepts on massive datasets. Our approach uses semantics and probabilities to reduce the potential search space that machine learning algorithms must explore, and the calculation of the probabilities scales well with the size of the knowledge network. The number of features is fixed and equal to the number of relationship types or classes in the data. A comprehensive comparison of well-known classifiers such as random forest, SVM, and deep learning over various features extracted from the same dataset provides an overview of the performance and computational trade-offs. Our source code, documentation and parameters will be available at https://github.com/esadr/
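One way to read the fixed-size probability features is as one empirical probability per relationship type, estimated from counts observed in the network around a concept pair. This is a speculative sketch; the counting scheme and all names are assumptions, not the dissertation's method:

```python
from collections import Counter

def relation_probability_features(observed_relations, relation_types):
    """Build a fixed-size feature vector for a concept pair.

    observed_relations : relationship labels observed around the pair
                         (e.g. on paths or shared neighbors) -- assumed input
    relation_types     : the fixed set of relationship types in the network
    Returns one empirical probability per relationship type, so the feature
    dimension equals the number of relationship types, regardless of network size.
    """
    counts = Counter(observed_relations)
    total = sum(counts.values()) or 1   # avoid division by zero for isolated pairs
    return [counts[r] / total for r in relation_types]
```

Because the features are plain frequency ratios, computing them scales with the number of observed relations rather than with the size of the whole network.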