    Faster SVM training via conjugate SMO

    We propose an improved version of the SMO algorithm for training classification and regression SVMs, based on a Conjugate Descent procedure. This new approach involves only a modest increase in the computational cost of each iteration but, in turn, usually results in a substantial decrease in the number of iterations required to converge to a given precision. In addition, we prove convergence of the iterates of this new Conjugate SMO, as well as a linear rate when the kernel matrix is positive definite. We have implemented Conjugate SMO within the LIBSVM library and show experimentally that it is faster for many hyper-parameter configurations, often making it a better option than second-order SMO when performing a grid search for SVM tuning.
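
For orientation, the following is a minimal sketch of plain first-order SMO (maximal-violating-pair selection) on the SVM dual, i.e. the baseline that the abstract improves on. It is not the Conjugate SMO of the paper and not LIBSVM's code; the function name `smo_first_order`, its parameters, and the precomputed kernel matrix `K` are assumptions made for illustration.

```python
import numpy as np

def smo_first_order(K, y, C=1.0, tol=1e-6, max_iter=10000):
    """Hypothetical sketch: minimize 0.5*a'Qa - sum(a) with Q_ij = y_i*y_j*K_ij,
    subject to 0 <= a <= C and y'a = 0, by updating two coefficients at a time."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    alpha = np.zeros(n)
    G = -np.ones(n)  # gradient Q @ alpha - 1 at alpha = 0
    for _ in range(max_iter):
        # Indices whose coefficient can still move "up" or "down" along y.
        up = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
        low = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
        if not up.any() or not low.any():
            break
        score = -y * G
        i = np.flatnonzero(up)[np.argmax(score[up])]
        j = np.flatnonzero(low)[np.argmin(score[low])]
        gap = score[i] - score[j]  # KKT violation of the selected pair
        if gap < tol:
            break
        # Exact line search along d_i = y_i, d_j = -y_j, then clip to the box.
        curv = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        max_i = C - alpha[i] if y[i] > 0 else alpha[i]
        max_j = alpha[j] if y[j] > 0 else C - alpha[j]
        step = min(gap / curv, max_i, max_j)
        alpha[i] += step * y[i]
        alpha[j] -= step * y[j]
        G += step * y * (K[:, i] - K[:, j])  # rank-two gradient update
    return alpha

# Usage on a two-point toy problem with a linear kernel.
X = np.array([[1.0], [-1.0]])
labels = np.array([1, -1])
alpha = smo_first_order(X @ X.T, labels, C=10.0)
```

Each iteration touches only two kernel columns, which is why per-iteration cost is low and iteration count dominates training time.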

    Acceleration Methods for Classic Convex Optimization Algorithms

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 12-09-2017.

    Most Machine Learning models are defined in terms of a convex optimization problem. Thus, developing algorithms to quickly solve such problems is of great interest to the field. We focus in this thesis on two of the most widely used models, the Lasso and Support Vector Machines. The former belongs to the family of regularization methods, and it was introduced in 1996 to perform both variable selection and regression at the same time. This is accomplished by adding an ℓ1-regularization term to the least squares model, achieving interpretability and also a good generalization error. Support Vector Machines were originally formulated to solve a classification problem by finding the maximum-margin hyperplane, that is, the hyperplane which separates two sets of points and is at equal distance from both of them. SVMs were later extended to handle non-separable classes and non-linear classification problems by applying the kernel trick. A first contribution of this work is to carefully analyze all the existing algorithms to solve both problems, describing not only the theory behind them but also pointing out possible advantages and disadvantages of each one. Although the Lasso and SVMs solve very different problems, we show in this thesis that they are equivalent. Following a recent result by Jaggi, given an instance of one model we can construct an instance of the other having the same solution, and vice versa. This equivalence allows us to translate theoretical and practical results, such as algorithms, between two fields that have otherwise been developed independently. We give in this thesis not only the theoretical result but also a practical application, which consists of solving the Lasso problem using the SMO algorithm, the state-of-the-art solver for non-linear SVMs. We also perform experiments comparing SMO to GLMNet, one of the most popular solvers for the Lasso. The results obtained show that SMO is competitive with GLMNet, and sometimes even faster. Furthermore, motivated by a recent trend where classical optimization methods are being re-discovered in improved forms and successfully applied to many problems, we have also analyzed two classical momentum-based methods: the Heavy Ball algorithm, introduced by Polyak in 1963, and Nesterov’s Accelerated Gradient, discovered by Nesterov in 1983. In this thesis we develop practical versions of Conjugate Gradient, which is essentially equivalent to the Heavy Ball method, and Nesterov’s Acceleration for the SMO algorithm.

    A generic coordinate descent solver for nonsmooth convex optimization

    We present a generic coordinate descent solver for the minimization of a nonsmooth convex objective with structure. The method can deal in particular with problems with linear constraints. The implementation makes use of efficient residual updates and automatically determines which dual variables should be duplicated. A list of basic functional atoms is pre-compiled for efficiency, and a modelling language in Python allows the user to combine them at run time. As a result, the algorithm can be used to solve a large variety of problems, including Lasso, sparse multinomial logistic regression, and linear and quadratic programs.
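
To make the idea of residual updates concrete, here is a minimal sketch of cyclic coordinate descent for the Lasso, one of the problems listed above. It is not the solver described in the abstract and does not use its modelling language; the names `soft_threshold` and `lasso_cd` and the objective scaling 0.5*||y - Xw||^2 + lam*||w||_1 are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Hypothetical sketch: minimize 0.5*||y - X w||^2 + lam*||w||_1
    by cyclic coordinate descent, keeping the residual up to date."""
    n, d = X.shape
    w = np.zeros(d)
    r = np.asarray(y, dtype=float).copy()  # residual y - X w for w = 0
    col_sq = (X ** 2).sum(axis=0)          # squared column norms
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue                   # an all-zero column never enters the model
            # Correlation of column j with the residual that excludes w[j].
            rho = X[:, j] @ r + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            # Residual update costs O(n) instead of recomputing X @ w.
            r -= X[:, j] * (w_new - w[j])
            w[j] = w_new
    return w

# Usage: with an orthonormal design the solution is the soft-thresholded target.
w = lasso_cd(np.eye(3), np.array([3.0, -0.5, 1.0]), lam=1.0)
```

Maintaining `r` incrementally is what keeps each coordinate step linear in the number of samples.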

    Accelerating greedy coordinate descent methods

    We introduce and study two algorithms to accelerate greedy coordinate descent in theory and in practice: Accelerated Semi-Greedy Coordinate Descent (ASCD) and Accelerated Greedy Coordinate Descent (AGCD). On the theory side, our main results are for ASCD: we show that ASCD achieves O(1/k^2) convergence, and it also achieves accelerated linear convergence for strongly convex functions. On the empirical side, while both AGCD and ASCD outperform Accelerated Randomized Coordinate Descent on most instances in our numerical experiments, we note that AGCD significantly outperforms the other two methods in our experiments, in spite of a lack of theoretical guarantees for this method. To complement this empirical finding for AGCD, we present an explanation of why standard proof techniques for acceleration cannot work for AGCD, and we introduce a technical condition under which AGCD is guaranteed to have accelerated convergence. Finally, we confirm that this technical condition holds in our numerical experiments.
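
As background, the sketch below shows the unaccelerated greedy (Gauss-Southwell) coordinate rule that ASCD and AGCD build on, applied to a strongly convex quadratic. It is neither ASCD nor AGCD, whose update rules are not given in the abstract; the function name `greedy_cd` and the quadratic test objective are assumptions for illustration.

```python
import numpy as np

def greedy_cd(A, b, n_iters=500, tol=1e-10):
    """Hypothetical sketch: minimize 0.5*x'Ax - b'x (A symmetric positive
    definite) by always updating the coordinate with the largest gradient entry."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    g = -b.copy()  # gradient A @ x - b at x = 0
    for _ in range(n_iters):
        j = int(np.argmax(np.abs(g)))  # greedy (Gauss-Southwell) selection
        if abs(g[j]) < tol:
            break
        delta = -g[j] / A[j, j]        # exact minimization along coordinate j
        x[j] += delta
        g += delta * A[:, j]           # keep the full gradient up to date
    return x

# Usage on a 2x2 system whose exact minimizer is [0.2, 0.6].
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = greedy_cd(A, b)
```

The greedy rule needs the full gradient at every step, which is the cost that randomized selection avoids.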