221 research outputs found
ν-SVM solutions of constrained lasso and elastic net
Many important linear sparse models have at their core the Lasso problem, for which the GLMNet algorithm is often considered the current state of the art. Recently, M. Jaggi observed that the Constrained Lasso (CL) can be reduced to an SVM-like problem, for which the LIBSVM library provides very efficient algorithms; this suggests that LIBSVM could also be used advantageously to solve CL. In this work we refine Jaggi's arguments to reduce CL, as well as the constrained Elastic Net, to a Nearest Point Problem, which in turn can be rewritten as an appropriate ν-SVM problem solvable by LIBSVM. We also show experimentally that the well-known LIBSVM library converges faster than GLMNet for small problems and, if properly adapted, for larger ones as well. Screening is another ingredient used to speed up Lasso solvers; shrinking can be seen as the SVM counterpart of screening, and we discuss how it too may in some cases reduce the cost of an SVM-based CL solution.
With partial support from Spanish government grants TIN2013-42351-P, TIN2016-76406-P, TIN2015-70308-REDT and S2013/ICE-2845 CASI-CAM-CM; work also supported by project FACIL–Ayudas Fundación BBVA a Equipos de Investigación Científica 2016 and the UAM–ADIC Chair for Data Science and Machine Learning. The first author is also supported by the FPU–MEC grant AP-2012-5163. We gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at UAM and thank Red Eléctrica de España for kindly supplying wind energy data.
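For concreteness, a minimal sketch of the reduction in our own notation (it assumes, as a simplification, that the ℓ1 constraint is active at the solution; the symbols A, y, t, θ are ours, not the paper's):

```latex
% Constrained Lasso (CL), with design matrix A = [a_1, ..., a_n] and target y:
\[
  \min_{x}\ \|Ax - y\|_2^2 \quad \text{s.t.} \quad \|x\|_1 \le t .
\]
% Write x = t \sum_j \theta_j s_j e_j with \theta in the simplex and signs s_j = \pm 1.
% Since \sum_j \theta_j = 1, the residual becomes a convex combination:
\[
  Ax - y \;=\; \sum_j \theta_j \left( t\, s_j\, a_j - y \right).
\]
% Hence CL is a Nearest Point Problem: find the point of
% C = conv{ +t a_j - y, -t a_j - y : j = 1, ..., n } closest to the origin,
\[
  \min_{z \in C}\ \|z\|_2^2 ,
\]
% which is the geometric form of a (\nu-)SVM training problem.
```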
Solution Path Algorithm for Twin Multi-class Support Vector Machine
The twin support vector machine and its extensions have achieved great success on binary classification problems; however, they still face difficulties such as model selection and the fast solution of multi-class problems. This paper is devoted to a fast regularization-parameter tuning algorithm for the twin multi-class support vector machine. A new division of the sample dataset is adopted, and by combining linear equations with block-matrix theory the Lagrange multipliers are proved to be piecewise linear with respect to the regularization parameters. Eight kinds of events are defined to locate the starting event, and a solution path algorithm is then designed, which greatly reduces the computational cost. In addition, only a few points are needed to complete the initialization, and the Lagrange multipliers are proved to equal 1 as the regularization parameter tends to infinity. Simulation results on UCI datasets show that the proposed method achieves good classification performance while reducing the computational cost of the grid search method from an exponential to a constant level.
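As a rough sketch of why such a path exists (a generic parametric-programming argument in our notation, not the paper's exact system):

```latex
% While no event occurs, the active set is fixed and the KKT conditions reduce
% to a linear system in the multipliers \alpha whose right-hand side is affine
% in the regularization parameter C:
\[
  M\, \alpha(C) = q + C\, r
  \quad \Longrightarrow \quad
  \alpha(C) = \alpha(C_k) + (C - C_k)\, d_k ,
\]
% so \alpha is linear in C on each segment. An event (a multiplier hitting a
% bound, or a sample entering or leaving the active set) changes M, q, r and
% starts a new segment with a new direction d_{k+1}: a piecewise-linear path.
```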
An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification
Sparsity regularized loss minimization problems play an important role in various fields, including machine learning, data mining, and modern statistics. The proximal gradient descent and coordinate descent methods are the most popular approaches to solving such problems. Although existing methods can achieve implicit model identification, i.e., support set identification, in a finite number of iterations, they still suffer from large computational costs and memory burdens in high-dimensional scenarios. The reason is that the support set identification in these methods is implicit and thus cannot explicitly identify the low-complexity structure in practice; namely, they cannot discard useless coefficients of the associated features to achieve algorithmic acceleration via dimension reduction. To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization, which reduces the number of block iterations by eliminating inactive coefficients during optimization, thereby achieving faster explicit model identification and improved algorithmic efficiency. Theoretically, we first prove that ADSGD achieves a linear convergence rate with lower overall computational complexity. More importantly, we prove that ADSGD achieves a linear rate of explicit model identification. Numerically, experimental results on benchmark datasets confirm the efficiency of the proposed method.
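To make explicit elimination concrete, here is a minimal sketch (not the authors' ADSGD): a plain coordinate-descent Lasso solver that heuristically drops coordinates stuck at zero, so later epochs only touch the surviving block:

```python
# Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1 with heuristic
# elimination of inactive coordinates (illustration only; ADSGD combines
# this idea with doubly stochastic gradients).
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_lasso_with_elimination(X, y, lam, n_epochs=50):
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)      # precomputed squared column norms
    active = list(range(d))            # coordinates still being updated
    r = y - X @ w                      # running residual
    for epoch in range(n_epochs):
        survivors = []
        for j in active:
            r += X[:, j] * w[j]                        # remove j's contribution
            w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
            r -= X[:, j] * w[j]                        # add it back
            if w[j] != 0.0 or epoch < 2:               # heuristic: drop persistent zeros
                survivors.append(j)
        active = survivors             # explicit identification: smaller block
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))
w_true = np.zeros(500); w_true[:5] = 3.0
y = X @ w_true + 0.1 * rng.standard_normal(100)
print("nonzeros:", np.flatnonzero(cd_lasso_with_elimination(X, y, lam=10.0)))
```

Once `active` shrinks to (a superset of) the true support, each epoch costs O(n·|active|) instead of O(n·d), which is the acceleration by dimension reduction the abstract refers to.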
Safe Screening With Variational Inequalities and Its Application to LASSO
Sparse learning techniques are routinely used for feature selection, as the resulting model usually has a small number of non-zero entries. Safe screening, which eliminates features that are guaranteed to have zero coefficients for a given value of the regularization parameter, is a technique for improving computational efficiency. Safe screening is gaining increasing attention since 1) solving sparse learning formulations usually has a high computational cost, especially when the number of features is large, and 2) one needs to try several regularization parameters to select a suitable model. In this paper, we propose an approach called "Sasvi" (Safe screening with variational inequalities). Sasvi makes use of the variational inequality that provides the sufficient and necessary optimality condition for the dual problem. Several existing approaches for Lasso screening can be cast as relaxed versions of Sasvi; thus, Sasvi provides a stronger safe screening rule. We further study the monotone properties of Sasvi for the Lasso, based on which a sure-removal regularization parameter can be identified for each feature. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the proposed Sasvi for Lasso
screening.
Comment: Accepted by the International Conference on Machine Learning 2014.
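For context, the kind of rule that Sasvi strengthens can be written as a simple sphere test (standard notation, not the paper's; Sasvi replaces the ball by a tighter region derived from the variational inequality):

```latex
% With the dual scaled so that |x_j^T \theta^*| < 1 forces the j-th Lasso
% coefficient to zero: if the dual optimum \theta^* is known to lie in a ball
% B(\theta_c, r), then feature x_j can be safely discarded whenever
\[
  |x_j^\top \theta_c| + r\, \|x_j\|_2 \;<\; 1 ,
\]
% since this bounds |x_j^T \theta| below 1 over the whole ball.
```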
Acceleration Methods for Classic Convex Optimization Algorithms
Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 12-09-2017.
Most Machine Learning models are defined in terms of a convex optimization problem. Thus,
developing algorithms to quickly solve such problems is of great interest to the field. We focus
in this thesis on two of the most widely used models, the Lasso and Support Vector Machines.
The former belongs to the family of regularization methods, and it was introduced in 1996 to
perform both variable selection and regression at the same time. This is accomplished by adding
an ℓ1-regularization term to the least squares model, achieving interpretability and also a good
generalization error.
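In its standard penalized form (textbook notation, not quoted from the thesis):

```latex
% Lasso: least squares plus an l1 penalty,
\[
  \min_{\beta}\ \tfrac{1}{2} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 ,
\]
% where the l1 term drives some coefficients exactly to zero, so variable
% selection and regression happen simultaneously.
```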
Support Vector Machines were originally formulated to solve a classification problem by
finding the maximum-margin hyperplane, that is, the hyperplane which separates two sets
of points and is at equal distance from both of them. SVMs were later extended to handle
non-separable classes and non-linear classification problems, applying the kernel-trick. A first
contribution of this work is to carefully analyze all the existing algorithms to solve both problems,
describing not only the theory behind them but also pointing out possible advantages and
disadvantages of each one.
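For reference, the hard-margin primal that the maximum-margin description corresponds to (standard notation):

```latex
% Maximum-margin hyperplane for separable data:
\[
  \min_{w,\, b}\ \tfrac{1}{2} \|w\|_2^2
  \quad \text{s.t.} \quad y_i \left( w^\top x_i + b \right) \ge 1, \quad i = 1, \dots, m ;
\]
% soft-margin and kernelized variants add slack variables and replace x_i by a
% feature map \phi(x_i), the "kernel trick" mentioned above.
```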
Although the Lasso and SVMs solve very different problems, we show in this thesis that they
are equivalent. Following a recent result by Jaggi, given an instance of one model we can
construct an instance of the other having the same solution, and vice versa. This equivalence
allows us to translate theoretical and practical results, such as algorithms, from one field to the
other, which have otherwise been developed independently. We give in this thesis not
only the theoretical result but also a practical application, which consists of solving the Lasso
problem using the SMO algorithm, the state-of-the-art solver for non-linear SVMs. We also
perform experiments comparing SMO to GLMNet, one of the most popular solvers for the Lasso.
The results obtained show that SMO is competitive with GLMNet, and sometimes even faster.
Furthermore, motivated by a recent trend where classical optimization methods are being
re-discovered in improved forms and successfully applied to many problems, we have also analyzed
two classical momentum-based methods: the Heavy Ball algorithm, introduced by Polyak in
1963 and Nesterov’s Accelerated Gradient, discovered by Nesterov in 1983. In this thesis we
develop practical versions of Conjugate Gradient, which is essentially equivalent to the Heavy
Ball method, and Nesterov’s Acceleration for the SMO algorithm. Experiments comparing
the convergence of all the methods are also carried out. The results show that the proposed
algorithms can achieve a faster convergence both in terms of iterations and execution time.
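For reference, the standard forms of the two momentum updates discussed above (our notation):

```latex
% Heavy Ball (Polyak): add a momentum term to the gradient step,
\[
  x_{k+1} = x_k - \alpha\, \nabla f(x_k) + \beta\, (x_k - x_{k-1}) .
\]
% Nesterov's Accelerated Gradient: look ahead before differentiating,
\[
  y_k = x_k + \beta_k\, (x_k - x_{k-1}), \qquad
  x_{k+1} = y_k - \alpha\, \nabla f(y_k) .
\]
```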
Screening for Sparse Online Learning
Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g., the ℓ1-norm for sparsity) on the regression coefficients of supervised learning. In the realm of deterministic optimization, the sequences generated by iterative algorithms (such as proximal gradient descent) exhibit "finite activity identification": they identify the low-complexity structure in a finite number of iterations. However, most online algorithms (such as proximal stochastic gradient descent) do not have this property, owing to their vanishing step size and non-vanishing variance. In this paper, by combining online algorithms with a screening rule, we show how to eliminate useless features from the generated iterates and thereby enforce finite activity identification. One consequence is that, when combined with any convergent online algorithm, the sparsity imposed by the regularizer can be exploited for computational gains. Numerically, significant acceleration can be obtained.
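A minimal sketch of the combination described above (not the paper's exact construction): proximal SGD for the Lasso with a periodic gap-safe sphere test, in the style of Fercoq et al., that permanently discards screened features:

```python
# Proximal SGD for 0.5*||y - Xw||^2 + lam*||w||_1, with periodic safe screening.
import numpy as np

def gap_safe_drop_mask(X, y, w, lam):
    """Features certain to be zero at the optimum (gap-safe sphere test)."""
    res = y - X @ w
    theta = res / max(lam, np.max(np.abs(X.T @ res)))       # dual-feasible point
    primal = 0.5 * res @ res + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    radius = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam   # gap-safe radius
    return np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0) < 1.0

def prox_sgd_lasso_screened(X, y, lam, n_iter=20000, screen_every=2000, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    active = np.ones(d, dtype=bool)
    for t in range(1, n_iter + 1):
        i = rng.integers(n)
        step = 1.0 / (n * np.sqrt(t))                       # vanishing step size (untuned)
        g = n * (X[i] @ w - y[i]) * X[i]                    # unbiased gradient of the full loss
        w[active] -= step * g[active]
        w[active] = np.sign(w[active]) * np.maximum(
            np.abs(w[active]) - step * lam, 0.0)            # prox of lam*||.||_1
        if t % screen_every == 0:
            active &= ~gap_safe_drop_mask(X, y, w, lam)     # screen: shrink the problem
            w[~active] = 0.0
    return w
```

Because the sphere test holds at any primal-dual feasible pair, a feature it removes is removed for good; this is what restores the finite activity identification that the vanishing step size and gradient noise otherwise destroy.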