The Support Vector Machine and Mixed Integer Linear Programming: Ramp Loss SVM with L1-Norm Regularization
The support vector machine (SVM) is a flexible classification method that accommodates a kernel trick to learn nonlinear decision rules. The traditional formulation as an optimization problem is a quadratic program. In efforts to reduce computational complexity, some have proposed using L1-norm regularization to create a linear program (LP). In other efforts aimed at increasing robustness to outliers, investigators have proposed using the ramp loss, which results in what may be expressed as a quadratic integer programming problem (QIP). In this paper, we consider combining these ideas for ramp loss SVM with L1-norm regularization. The result is four formulations for SVM that each may be expressed as a mixed integer linear program (MILP). We observe that ramp loss SVM with L1-norm regularization provides robustness to outliers with the linear kernel. We investigate the time required to find good solutions to the various formulations using a branch-and-bound solver.
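For concreteness, here is a sketch of one such MILP formulation, with the L1 norm linearized via w = w+ - w-, hinge variables xi_i capped at 2 by the ramp, binary outlier indicators z_i, and a big-M constant M (illustrative notation; the paper's four formulations differ in such details):

\[
\begin{aligned}
\min_{w^{\pm},\, b,\, \xi,\, z} \quad & \sum_j \big(w_j^{+} + w_j^{-}\big) + C \sum_i \big(\xi_i + 2 z_i\big) \\
\text{s.t.} \quad & y_i \big((w^{+} - w^{-})^{\top} x_i + b\big) \ge 1 - \xi_i - M z_i, \quad i = 1, \dots, n, \\
& 0 \le \xi_i \le 2, \quad z_i \in \{0, 1\}, \quad w^{+}, w^{-} \ge 0.
\end{aligned}
\]

When z_i = 1, the margin constraint is switched off and point i pays the fixed ramp cap of 2, which is how the model limits the influence of any single outlier.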
Ramp Loss SVM with L1-Norm Regularization
The Support Vector Machine (SVM) classification method has recently gained popularity due to the ease of implementing non-linear separating surfaces. SVM is an optimization problem with two competing goals: minimizing misclassification on training data and maximizing the margin defined by the normal vector of the learned separating surface. We develop and implement new SVM models based on the previously conceived SVM with L_1-norm regularization and ramp loss error terms. The goal is a new SVM model that is robust to outliers due to the ramp loss, easy to implement in open-source and off-the-shelf mathematical programming solvers, and relatively efficient to solve due to the mixed integer-linear form of the model. To show the effectiveness of the models, we compare results of ramp loss SVM with L_1-norm and L_2-norm regularization on human organ microbial data and simulated data sets with outliers.
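To illustrate how such a model can be posed in an off-the-shelf solver, here is a minimal sketch assuming the open-source PuLP modeling library with its bundled CBC solver and binary labels y in {-1, +1}; all names and the big-M constant are illustrative, not the paper's exact formulation.

# A minimal sketch of ramp-loss SVM with L1-norm regularization as a MILP,
# assuming the PuLP library; names and big_M are illustrative.
import numpy as np
import pulp

def ramp_loss_svm_l1(X, y, C=1.0, big_M=10.0):
    n, d = X.shape
    prob = pulp.LpProblem("ramp_loss_svm_l1", pulp.LpMinimize)

    # Split w into nonnegative parts to linearize the L1 norm: w = wp - wm.
    wp = [pulp.LpVariable(f"wp_{j}", lowBound=0) for j in range(d)]
    wm = [pulp.LpVariable(f"wm_{j}", lowBound=0) for j in range(d)]
    b = pulp.LpVariable("b")
    # xi_i: hinge part of the ramp loss, capped at 2 by the ramp.
    xi = [pulp.LpVariable(f"xi_{i}", lowBound=0, upBound=2) for i in range(n)]
    # z_i: binary outlier flag; a flagged point pays the fixed ramp cap of 2.
    z = [pulp.LpVariable(f"z_{i}", cat="Binary") for i in range(n)]

    # Objective: ||w||_1 plus C times the total ramp loss.
    prob += (pulp.lpSum(wp) + pulp.lpSum(wm)
             + C * pulp.lpSum(xi[i] + 2 * z[i] for i in range(n)))

    # Margin constraints, deactivated by big-M when z_i = 1.
    for i in range(n):
        score = pulp.lpSum((wp[j] - wm[j]) * float(X[i, j]) for j in range(d)) + b
        prob += float(y[i]) * score >= 1 - xi[i] - big_M * z[i]

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    w = np.array([wp[j].value() - wm[j].value() for j in range(d)])
    return w, b.value()

On data with a few flipped labels, the binary z_i absorb the outliers so the learned hyperplane is not dragged toward them, which is the robustness behavior described above.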
SimpleMKL
Multiple kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning algorithm, based on semi-infinite linear programming, has been recently proposed. This approach has opened new perspectives since it makes MKL tractable for large-scale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs numerous iterations for converging towards a reasonable solution. In this paper, we address the MKL problem through a weighted 2-norm regularization formulation with an additional constraint on the weights that encourages sparse kernel combinations. Apart from learning the combination, we solve a standard SVM optimization problem, where the kernel is defined as a linear combination of multiple kernels. We propose an algorithm, named SimpleMKL, for solving this MKL problem and provide a new insight on MKL algorithms based on mixed-norm regularization by showing that the two approaches are equivalent. We show how SimpleMKL can be applied beyond binary classification, for problems like regression, clustering (one-class classification) or multiclass classification. Experimental results show that the proposed algorithm converges rapidly and that its efficiency compares favorably to other MKL algorithms. Finally, we illustrate the usefulness of MKL for some regressors based on wavelet kernels and on some model selection problems related to multiclass classification problems.
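A minimal sketch of this alternation, assuming scikit-learn's SVC with a precomputed kernel and binary labels; the plain projected-gradient step and crude simplex renormalization below are simplifications of the paper's reduced-gradient method with line search.

# A SimpleMKL-style alternation sketch: solve a standard SVM on the combined
# kernel, then take a gradient step on the kernel weights over the simplex.
import numpy as np
from sklearn.svm import SVC

def simple_mkl_sketch(kernels, y, C=1.0, lr=0.1, iters=50):
    M = len(kernels)
    d = np.full(M, 1.0 / M)                  # kernel weights on the simplex
    for _ in range(iters):
        # Solve a standard SVM with the current combined kernel.
        K = sum(dm * Km for dm, Km in zip(d, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        ay = np.zeros(len(y))                # alpha_i * y_i, zero off-support
        ay[svm.support_] = svm.dual_coef_.ravel()
        # Gradient of the dual objective with respect to each kernel weight.
        grad = np.array([-0.5 * ay @ Km @ ay for Km in kernels])
        d = np.clip(d - lr * grad, 0.0, None)
        d /= d.sum()                         # renormalize onto the simplex
    return d

Each outer iteration reuses a stock SVM solver on the current combined kernel and then updates the weights, mirroring the alternation the abstract describes.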
SimpleMKL
Multiple kernel learning aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efficient and general multiple kernel learning (MKL) algorithm, based on semi-infinite linear programming, has been recently proposed. This approach has opened new perspectives since it makes the MKL approach tractable for large-scale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs numerous iterations for converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation that encourages sparse kernel combinations. Apart from learning the combination, we solve a standard SVM optimization problem, where the kernel is defined as a linear combination of multiple kernels. We propose an algorithm, named SimpleMKL, for solving this MKL problem and provide a new insight on MKL algorithms based on mixed-norm regularization by showing that the two approaches are equivalent. Furthermore, we show how SimpleMKL can be applied beyond binary classification, for problems like regression, clustering (one-class classification) or multiclass classification. Experimental results show that the proposed algorithm converges rapidly and that its efficiency compares favorably to other MKL algorithms. Finally, we illustrate the usefulness of MKL for some regressors based on wavelet kernels and on some model selection problems related to multiclass classification problems. A SimpleMKL Toolbox is available at http://asi.insa-rouen.fr/enseignants/~arakotom/code/mklindex.htm
Optimistic Robust Optimization With Applications To Machine Learning
Robust optimization has traditionally taken a pessimistic, or worst-case, viewpoint of uncertainty, motivated by a desire to find sets of optimal policies that maintain feasibility under a variety of operating conditions. In this paper, we explore an optimistic, or best-case, view of uncertainty and show that it can be a fruitful approach. We show that these techniques can be used to address a wide variety of problems. First, we apply our methods in the context of robust linear programming, providing a method for reducing conservatism in intuitive ways that encode economically realistic modeling assumptions. Second, we look at problems in machine learning and find that this approach is strongly connected to the existing literature. Specifically, we provide a new interpretation for popular sparsity-inducing non-convex regularization schemes. Additionally, we show that successful approaches for dealing with outliers and noise can be interpreted as optimistic robust optimization problems. Although many of the problems resulting from our approach are non-convex, we find that DCA (difference-of-convex algorithm) or DCA-like optimization approaches can be intuitive and efficient.
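For reference, the basic DCA step for minimizing f(x) = g(x) - h(x) with g, h convex replaces the concave part by its linearization at the current iterate and solves the resulting convex problem (a standard sketch, not the paper's specific instantiation):

\[
x^{k+1} \in \arg\min_{x} \; g(x) - \big( h(x^k) + \langle \nabla h(x^k),\, x - x^k \rangle \big).
\]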
Sparse Support Vector Infinite Push
In this paper, we address the problem of embedded feature selection for ranking on top of the list problems. We pose this problem as a regularized empirical risk minimization with a p-norm push loss function (p = ∞) and sparsity-inducing regularizers. We address the issues related to this challenging optimization problem by considering an alternating direction method of multipliers (ADMM) algorithm which is built upon proximal operators of the loss function and the regularizer. Our main technical contribution is thus to provide a numerical scheme for computing the infinite push loss function proximal operator. Experimental results on toy, DNA microarray and BCI problems show how our novel algorithm compares favorably to competitors for ranking on top while using fewer variables in the scoring function. Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
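For orientation, the generic ADMM iteration for min_w f(w) + Ω(w), split as f(w) + Ω(v) subject to w = v with penalty parameter ρ, alternates the two proximal operators the abstract mentions (a standard sketch; the paper's contribution is the proximal operator of the infinite push loss itself):

\[
w^{k+1} = \operatorname{prox}_{f/\rho}\big(v^k - u^k\big), \qquad
v^{k+1} = \operatorname{prox}_{\Omega/\rho}\big(w^{k+1} + u^k\big), \qquad
u^{k+1} = u^k + w^{k+1} - v^{k+1}.
\]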
Interior Point Methods for Massive Support Vector Machines
We investigate the use of interior point methods for solving quadratic programming problems with a small number of linear constraints, where the quadratic term consists of a low-rank update to a positive semi-definite matrix. Several formulations of the support vector machine fit into this category. An interesting feature of these particular problems is the volume of data, which can lead to quadratic programs with between 10 and 100 million variables and a dense Q matrix. We use OOQP, an object-oriented interior point code, to solve these problems because it allows us to easily tailor the required linear algebra to the application. Our linear algebra implementation uses a proximal point modification to the underlying algorithm, and exploits the Sherman-Morrison-Woodbury formula and the Schur complement to facilitate efficient linear system solution. Since we target massive problems, the data is stored out-of-core and we overlap computation and I/O to reduce overhead. Results are reported for several linear support vector machine formulations, demonstrating the reliability and scalability of the method.
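To illustrate the kind of linear algebra this exploits, here is a minimal sketch of a Sherman-Morrison-Woodbury solve for (D + V Vᵀ)x = b with diagonal D and an n-by-k factor V with small k, costing O(nk²) rather than a dense factorization; the function name and interface are illustrative, not OOQP's.

# Sherman-Morrison-Woodbury solve for a diagonal-plus-low-rank system:
# (D + V V^T)^{-1} = D^{-1} - D^{-1} V (I + V^T D^{-1} V)^{-1} V^T D^{-1}
import numpy as np

def smw_solve(diag_d, V, b):
    Dinv_b = b / diag_d                      # D^{-1} b
    Dinv_V = V / diag_d[:, None]             # D^{-1} V
    S = np.eye(V.shape[1]) + V.T @ Dinv_V    # small k-by-k capacitance matrix
    return Dinv_b - Dinv_V @ np.linalg.solve(S, V.T @ Dinv_b)

The small k-by-k matrix S is (up to sign) the Schur complement of the corresponding augmented system, and factoring it is the only dense work required, so the cost stays linear in the number of data points.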