Gaussian processes and SVM: Mean field results and leave-one-out
In this chapter, we first elaborate on the well-known relationship between Gaussian processes (GP) and Support Vector Machines (SVM). We then present approximate solutions for two computational problems arising in GP and SVM. The first is the calculation of the posterior mean for GP classifiers using a `naive' mean field approach. The second is a leave-one-out estimator for the generalization error of SVM based on a linear response method. Simulation results on a benchmark dataset show similar performance for the GP mean field algorithm and the SVM algorithm. The approximate leave-one-out estimator is found to be in very good agreement with the exact leave-one-out error.
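For context, the quantity the approximate estimator is compared against can be computed directly. The sketch below (not the chapter's linear-response method) computes the exact leave-one-out error of an SVM by brute-force retraining; it assumes scikit-learn, and the dataset and hyperparameters are placeholders.

```python
# Exact leave-one-out error of an SVM via n retrainings; this is the
# reference value an approximate LOO estimator is checked against.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=5, random_state=0)

errors = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = SVC(kernel="rbf", C=1.0).fit(X[train_idx], y[train_idx])
    errors += int(clf.predict(X[test_idx])[0] != y[test_idx][0])

print("exact leave-one-out error:", errors / len(y))
```

A linear-response estimator avoids the n retrainings by predicting, from the single full-data solution, how each output would shift if its own example were removed.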
Separable Convex Optimization with Nested Lower and Upper Constraints
We study a convex resource allocation problem in which lower and upper bounds
are imposed on partial sums of allocations. This model is linked to a large
range of applications, including production planning, speed optimization,
stratified sampling, support vector machines, portfolio management, and
telecommunications. We propose an efficient gradient-free divide-and-conquer
algorithm, which uses monotonicity arguments to generate valid bounds from the
recursive calls, and eliminate linking constraints based on the information
from sub-problems. This algorithm does not need strict convexity or
differentiability. It produces an $\epsilon$-approximate solution for the continuous problem in $O(n \log m \log(B/\epsilon))$ time and an integer solution in $O(n \log m \log B)$ time, where $n$ is the number of decision variables, $m$ is the number of constraints, and $B$ is the resource bound. A complexity of $O(n \log m)$ is also achieved for the linear and quadratic cases. These are the best complexities known to
date for this important problem class. Our experimental analyses confirm the
good performance of the method, which produces optimal solutions for problems
with up to 1,000,000 variables in a few seconds. Promising applications to the
support vector ordinal regression problem are also investigated.
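For intuition, the sketch below solves only the classical single-constraint special case, min $\sum_i f_i(x_i)$ s.t. $\sum_i x_i = B$, $0 \le x_i \le u_i$, by bisection on the Lagrange multiplier. It is a standard derivative-based baseline, not the paper's algorithm, which is gradient-free and handles nested lower and upper bounds on partial sums.

```python
# Bisection on the multiplier for the unnested resource allocation problem.
# dfs[i] is f_i' (increasing, since f_i is convex); u[i] bounds x_i.
import numpy as np

def allocate(dfs, u, B, lo=-1e6, hi=1e6):
    def x_of(lam):
        # KKT: x_i is the point in [0, u_i] where f_i'(x_i) = lam (clipped),
        # found here by an inner bisection on the increasing scalar f_i'.
        xs = []
        for df, ui in zip(dfs, u):
            a, b = 0.0, ui
            for _ in range(60):
                m = 0.5 * (a + b)
                if df(m) < lam:
                    a = m
                else:
                    b = m
            xs.append(0.5 * (a + b))
        return np.array(xs)

    # Outer bisection: the total allocation sum_i x_i(lam) is nondecreasing in lam.
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if x_of(lam).sum() < B:
            lo = lam
        else:
            hi = lam
    return x_of(0.5 * (lo + hi))

# Example: quadratic costs f_i(x) = 0.5 * w_i * x**2, so f_i'(x) = w_i * x.
w = [1.0, 2.0, 4.0]
x = allocate([lambda t, wi=wi: wi * t for wi in w], u=[3.0, 3.0, 3.0], B=4.0)
print(x, x.sum())  # cheaper items (smaller w_i) receive larger allocations
```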
Training Support Vector Machines Using Frank-Wolfe Optimization Methods
Training a Support Vector Machine (SVM) requires the solution of a quadratic
programming problem (QP) whose computational complexity becomes prohibitively
expensive for large scale datasets. Traditional optimization methods cannot be
directly applied in these cases, mainly due to memory restrictions.
By adopting a slightly different objective function and under mild conditions
on the kernel used within the model, efficient algorithms to train SVMs have
been devised under the name of Core Vector Machines (CVMs). This framework
exploits the equivalence of the resulting learning problem with the task of
computing a Minimal Enclosing Ball (MEB) in a feature space, where data
is implicitly embedded by a kernel function.
In this paper, we improve on the CVM approach by proposing two novel methods
to build SVMs based on the Frank-Wolfe algorithm, recently revisited as a fast
method to approximate the solution of a MEB problem. In contrast to CVMs, our
algorithms do not require computing the solutions of a sequence of
increasingly complex QPs and are defined using only analytic optimization
steps. Experiments on a large collection of datasets show that our methods
scale better than CVMs in most cases, sometimes at the price of a slightly
lower accuracy. Like CVMs, the proposed methods can be easily extended to machine learning problems other than binary classification. Moreover, effective classifiers are obtained even with kernels that do not satisfy the condition required by CVMs, so our methods can be applied to a wider set of problems.
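The core update the paper builds on is simple to state. The sketch below shows the standard Frank-Wolfe (Badoiu-Clarkson) iteration for the MEB in plain input space: move the center toward the current farthest point with the analytic step size $1/(k+1)$. The paper applies this kind of update in the kernel-induced feature space to train SVMs; the plain-space version is shown only for clarity.

```python
# Frank-Wolfe iteration for the Minimal Enclosing Ball of a point set.
import numpy as np

def meb_frank_wolfe(X, n_iter=1000):
    c = X.mean(axis=0)                      # any feasible starting center
    for k in range(1, n_iter + 1):
        d = np.linalg.norm(X - c, axis=1)   # distances to the current center
        p = X[d.argmax()]                   # farthest point = FW vertex
        c = c + (p - c) / (k + 1)           # analytic step size, no line search
    return c, np.linalg.norm(X - c, axis=1).max()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
center, radius = meb_frank_wolfe(X)
print(center, radius)
```

Each iteration needs only one farthest-point search and one vector update, which is what makes the scheme attractive compared to solving a sequence of QPs.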
Fast SVM training using approximate extreme points
Application of non-linear kernel Support Vector Machines (SVMs) to large
datasets is seriously hampered by excessive training time. We propose a
modification, called the approximate extreme points support vector machine
(AESVM), that is aimed at overcoming this burden. Our approach relies on
conducting the SVM optimization over a carefully selected subset, called the
representative set, of the training dataset. We present analytical results that
indicate the similarity of AESVM and SVM solutions. A linear time algorithm
based on convex hulls and extreme points is used to compute the representative
set in kernel space. Extensive computational experiments on nine datasets
compared AESVM to LIBSVM \citep{LIBSVM}, CVM \citep{Tsang05}, BVM
\citep{Tsang07}, LASVM \citep{Bordes05}, SVM$^{\mathrm{perf}}$
\citep{Joachims09}, and the random features method \citep{rahimi07}. Our AESVM
implementation was found to train much faster than the other methods, while its
classification accuracy was similar to that of LIBSVM in all cases. In
particular, for a seizure detection dataset, AESVM training was almost $10^3$
times faster than LIBSVM and LASVM, and more than forty times faster than CVM
and BVM. Additionally, AESVM also gave competitively fast classification times.
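The idea of training only on a representative set can be illustrated in low dimension. The sketch below trains an SVM on the convex-hull vertices of each class; it is only a 2-D analogue of the idea, since the paper computes an approximate representative set in kernel space with a linear-time algorithm rather than an exact hull.

```python
# Train an SVM on class-wise extreme points instead of the full dataset.
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
y = np.array([0] * 500 + [1] * 500)

keep = []
for cls in (0, 1):
    idx = np.where(y == cls)[0]
    keep.extend(idx[ConvexHull(X[idx]).vertices])  # extreme points of this class

clf = SVC(kernel="linear").fit(X[keep], y[keep])
print(f"trained on {len(keep)} of {len(y)} points,",
      f"accuracy on full set: {clf.score(X, y):.3f}")
```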
The Case for Approximate Intermittent Computing
We present the concept of approximate intermittent computing and concretely demonstrate its application. Intermittent computations stem from the erratic energy patterns caused by energy harvesting: computations unpredictably terminate whenever energy is insufficient, and the application state is lost. Existing solutions maintain equivalence to continuous executions by creating persistent state on non-volatile memory, enabling stateful computations to cross power failures. The performance penalty is massive: system throughput drops while energy consumption increases. In contrast, approximate intermittent computations trade the accuracy of the results for eliminating the overhead of maintaining equivalence to a continuous execution. This is possible because we use approximation to limit the extent of stateful computations to a single power cycle, enabling the system to shift the entire energy budget for managing persistent state towards useful computations that produce an immediate approximate result. To this end, we effectively reverse the regular formulation of approximate computing problems. First, we apply approximate intermittent computing to human activity recognition. We design an anytime variation of support vector machines that improves the accuracy of the classification as energy becomes available. We build a hw/sw prototype using kinetic energy and show a 7x improvement in system throughput compared to state-of-the-art system support for intermittent computing, while retaining 83% accuracy in a setting where the best attainable accuracy is 88%. Next, we apply approximate intermittent computing in a sharply different scenario, namely embedded image processing, using loop perforation. Using a different hw/sw prototype and diverse energy traces, we show a 5x improvement in system throughput compared to state-of-the-art system support for intermittent computing, while providing an equivalent output in 84% of the cases.
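One hypothetical way an SVM decision can be made anytime, shown below purely as an illustration: accumulate the kernel expansion $\sum_i \alpha_i y_i k(x_i, x)$ one support vector at a time, most influential terms first, and stop whenever the energy budget is exhausted. The paper's actual design and energy model are more involved; every name and parameter here is assumed.

```python
# Anytime kernel SVM decision: a partial sum over support vectors that can
# be cut off at any point, trading accuracy for an immediate result.
import numpy as np

def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def anytime_decision(x, sv, coef, budget):
    """sv: support vectors; coef: alpha_i * y_i; budget: kernel evals allowed."""
    order = np.argsort(-np.abs(coef))   # most influential terms first
    score = 0.0
    for i in order[:budget]:            # stop when the energy budget is spent
        score += coef[i] * rbf(sv[i], x)
    return np.sign(score)

rng = np.random.default_rng(0)
sv, coef, x = rng.normal(size=(200, 4)), rng.normal(size=200), rng.normal(size=4)
for budget in (10, 50, 200):
    print(budget, anytime_decision(x, sv, coef, budget))
```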
A Divide-and-Conquer Solver for Kernel Support Vector Machines
The kernel support vector machine (SVM) is one of the most widely used
classification methods; however, the amount of computation required becomes the
bottleneck when facing millions of samples. In this paper, we propose and
analyze a novel divide-and-conquer solver for kernel SVMs (DC-SVM). In the
division step, we partition the kernel SVM problem into smaller subproblems by
clustering the data, so that each subproblem can be solved independently and
efficiently. We show theoretically that the support vectors identified by the
subproblem solution are likely to be support vectors of the entire kernel SVM
problem, provided that the problem is partitioned appropriately by kernel
clustering. In the conquer step, the local solutions from the subproblems are
used to initialize a global coordinate descent solver, which converges quickly
as suggested by our analysis. By extending this idea, we develop a multilevel
Divide-and-Conquer SVM algorithm with adaptive clustering and early prediction
strategy, which outperforms state-of-the-art methods in terms of training
speed, testing accuracy, and memory usage. As an example, on the covtype
dataset with half-a-million samples, DC-SVM is 7 times faster than LIBSVM in
obtaining the exact SVM solution (to within $10^{-6}$ relative error), which
achieves 96.15% prediction accuracy. Moreover, with our proposed early
prediction strategy, DC-SVM achieves about 96% accuracy in only 12 minutes,
which is more than 100 times faster than LIBSVM.
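A stripped-down, single-level version of the divide-and-conquer idea is sketched below. The actual DC-SVM is multilevel, uses kernel k-means, and warm-starts a coordinate descent solver; scikit-learn's SVC exposes no warm start, so here the subproblem support vectors simply seed a final global solve.

```python
# Divide: cluster the data and train an SVM per cluster independently.
# Conquer: retrain on the union of the subproblems' support vectors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=10, random_state=0)

labels = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(X)
sv_idx = []
for c in range(4):
    idx = np.where(labels == c)[0]
    if np.unique(y[idx]).size < 2:     # degenerate one-class cluster: keep all
        sv_idx.extend(idx)
        continue
    local = SVC(kernel="rbf", C=1.0).fit(X[idx], y[idx])  # independent subproblem
    sv_idx.extend(idx[local.support_])                    # collect its support vectors

final = SVC(kernel="rbf", C=1.0).fit(X[sv_idx], y[sv_idx])
print(f"{len(sv_idx)} candidate SVs of {len(y)}; accuracy: {final.score(X, y):.3f}")
```

The theoretical point in the paper is that, with an appropriate kernel clustering, the local support vectors are likely support vectors of the full problem, so the conquer step starts close to the global solution.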
Quadratic convergence of smoothing Newton's method for 0/1 loss optimization
It has been widely recognized that the 0/1 loss function is one of the most natural choices for modelling classification errors, and it has a wide range of applications including support vector machines and 1-bit compressed sensing. Due to the combinatorial nature of the 0/1 loss, methods based on convex relaxations or smoothing approximations have dominated the existing research and are often able to provide approximate solutions of good quality. However, those methods do not optimize the 0/1 loss function directly, and hence no optimality has been established for the original problem. This paper studies the optimality conditions of 0/1-loss minimization and, for the first time, develops a Newton's method that directly optimizes the 0/1 loss with local quadratic convergence under reasonable conditions. Extensive numerical experiments demonstrate its superior performance, as one would expect from Newton-type methods.
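To make the smoothing idea concrete, the sketch below replaces the indicator $1[y_i\, w^\top x_i < 0]$ with a sigmoid of temperature $\tau$ and applies damped Newton steps to the smoothed empirical risk. This only illustrates the smoothing approach the abstract contrasts against; the paper's method, its optimality conditions, and its update rule are different.

```python
# Damped Newton on a sigmoid-smoothed 0/1 loss for a linear classifier.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def smoothed_01_newton(X, y, tau=0.5, lam=1e-3, n_iter=50):
    d = X.shape[1]
    w = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares init: Newton is local
    for _ in range(n_iter):
        z = y * (X @ w)                        # margins; loss_i = sigmoid(-z_i / tau)
        s = sigmoid(-z / tau)
        g = X.T @ (-y * s * (1 - s) / tau) + lam * w            # gradient
        curv = s * (1 - s) * (1 - 2 * s) / tau**2               # sigma'' per sample
        H = (X * curv[:, None]).T @ X + (lam + 1e-2) * np.eye(d)  # damped Hessian
        w = w - np.linalg.solve(H, g)          # damping: smoothed risk is nonconvex
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.sign(X @ rng.normal(size=5) + 0.3 * rng.normal(size=400))
w = smoothed_01_newton(X, y)
print("training 0/1 error:", np.mean(np.sign(X @ w) != y))
```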