Global Convergence of the (1+1) Evolution Strategy
We establish global convergence of the (1+1) evolution strategy, i.e.,
convergence to a critical point independent of the initial state. More
precisely, we show the existence of a critical limit point, using a suitable
extension of the notion of a critical point to measurable functions. At its
core, the analysis is based on a novel progress guarantee for elitist,
rank-based evolutionary algorithms. By applying it to the (1+1) evolution
strategy, we provide an accurate characterization of whether global
convergence is guaranteed with full probability or whether premature
convergence is possible. We illustrate our results on a number of example
applications, ranging from smooth (non-convex) cases through different types
of saddle points and ridge functions to discontinuous and extremely rugged
problems.
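
As context for the analysis, here is a minimal sketch of the algorithm under study: a (1+1)-ES with Gaussian mutation, elitist acceptance, and a 1/5 success rule for step size control. The sphere objective and all constants are illustrative choices, not taken from the paper.

    import numpy as np

    def sphere(x):
        # Illustrative smooth objective; the analysis covers far more
        # general (even discontinuous) measurable functions.
        return float(x @ x)

    def one_plus_one_es(f, x0, sigma=1.0, steps=10000, seed=0):
        # Minimal (1+1)-ES: mutate, accept if not worse, adapt step size.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        fx = f(x)
        for _ in range(steps):
            y = x + sigma * rng.standard_normal(x.size)  # Gaussian mutation
            fy = f(y)
            if fy <= fx:                # elitist, rank-based acceptance
                x, fx = y, fy
                sigma *= 1.5            # success: enlarge the step
            else:
                sigma *= 1.5 ** -0.25   # failure: shrink (equilibrium at 1/5)
        return x, fx

    x_best, f_best = one_plus_one_es(sphere, np.ones(10))
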
The Planning-ahead SMO Algorithm
The sequential minimal optimization (SMO) algorithm and variants thereof are
the de facto standard method for solving large quadratic programs for support
vector machine (SVM) training. In this paper we propose a simple yet powerful
modification. The main emphasis is on an algorithm that improves the SMO step
size by planning ahead. A theoretical analysis establishes its convergence to
the optimum. Experiments on a large number of datasets demonstrate the
superiority of the new algorithm.
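
The planning-ahead rule itself is only described at a high level here, but the baseline it modifies is the classical analytic two-variable SMO step. A minimal sketch of that baseline step (bias term omitted for brevity; variable names are mine, not the paper's):

    import numpy as np

    def smo_pair_update(K, y, alpha, C, i, j):
        # One analytic SMO step on the pair (i, j) for the SVM dual;
        # the planning-ahead method then adjusts the step based on the
        # anticipated next iteration.
        E = K @ (alpha * y) - y                   # prediction errors
        eta = K[i, i] + K[j, j] - 2.0 * K[i, j]   # curvature along the pair
        if eta <= 0:
            return alpha                          # degenerate direction, skip
        # Box induced by 0 <= alpha <= C and the equality constraint.
        if y[i] != y[j]:
            L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
        else:
            L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
        a_j = np.clip(alpha[j] + y[j] * (E[i] - E[j]) / eta, L, H)
        alpha = alpha.copy()
        alpha[i] += y[i] * y[j] * (alpha[j] - a_j)  # keep sum(y*alpha) fixed
        alpha[j] = a_j
        return alpha
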
Challenges of Convex Quadratic Bi-objective Benchmark Problems
Convex quadratic objective functions are an important base case in
state-of-the-art benchmark collections for single-objective optimization on
continuous domains. Although often considered rather simple, they represent the
highly relevant challenges of non-separability and ill-conditioning. In the
multi-objective case, quadratic benchmark problems are under-represented. In
this paper we analyze the specific challenges that can be posed by quadratic
functions in the bi-objective case. Our construction yields a full factorial
design of 54 different problem classes. We perform experiments with
well-established algorithms to demonstrate the insights that this function
class can support. We find huge performance differences, which can be clearly
attributed to two root causes: non-separability and the alignment of the
Pareto set with the coordinate system.
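
As an illustration of the ingredients named above, a bi-objective convex quadratic problem can be instantiated from two SPD matrices with controlled conditioning and rotation (rotation induces non-separability) and two distinct optima. The concrete values below are illustrative and do not reproduce the paper's full factorial design of 54 classes.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5

    def random_spd(cond, rotated):
        # SPD matrix with a prescribed condition number; a random rotation
        # makes the resulting quadratic non-separable.
        d = np.logspace(0, np.log10(cond), n)     # eigenvalue spectrum
        if not rotated:
            return np.diag(d)
        Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
        return Q @ np.diag(d) @ Q.T

    # Two convex quadratics f_i(x) = (x - x_i*)^T H_i (x - x_i*):
    H1 = random_spd(1e3, rotated=True)
    H2 = random_spd(1e3, rotated=True)
    x1_opt, x2_opt = np.zeros(n), np.ones(n)
    f1 = lambda x: (x - x1_opt) @ H1 @ (x - x1_opt)
    f2 = lambda x: (x - x2_opt) @ H2 @ (x - x2_opt)

For two such convex quadratics the Pareto set is a curve connecting x1_opt and x2_opt; how well it aligns with the coordinate system is controlled by the choice of H1 and H2.
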
Limits of End-to-End Learning
End-to-end learning refers to training a possibly complex learning system by
applying gradient-based learning to the system as a whole. An end-to-end
learning system is specifically designed so that all modules are
differentiable. In
effect, not only a central learning machine, but also all "peripheral" modules
like representation learning and memory formation are covered by a holistic
learning process. The power of end-to-end learning has been demonstrated on
many tasks, like playing a whole array of Atari video games with a single
architecture. While pushing for solutions to more challenging tasks, network
architectures keep growing more and more complex.
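
To make the notion concrete, the sketch below (PyTorch, with dummy module sizes and data chosen by me for illustration) shows what "applying gradient-based learning to the system as a whole" means: a single loss and a single backward pass update every module jointly, rather than training the modules separately.

    import torch
    import torch.nn as nn

    # A toy "system" made of distinct modules: representation learning, a
    # memory-like recurrent component, and a task head. End-to-end training
    # means one loss and one backward pass drive ALL parameters jointly.
    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # representation
    memory = nn.GRU(64, 64, batch_first=True)              # "memory formation"
    head = nn.Linear(64, 10)                               # task-specific output

    params = (list(encoder.parameters()) + list(memory.parameters())
              + list(head.parameters()))
    opt = torch.optim.Adam(params, lr=1e-3)

    x = torch.randn(8, 5, 32)              # dummy batch: 8 sequences, length 5
    target = torch.randint(0, 10, (8,))
    h, _ = memory(encoder(x))              # gradients flow through every module
    loss = nn.functional.cross_entropy(head(h[:, -1]), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
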
In this paper we ask whether, and to what extent, end-to-end learning is a
future-proof technique in the sense of scaling to complex and
diverse data processing architectures. We point out potential inefficiencies,
and we argue in particular that end-to-end learning does not make optimal use
of the modular design of present neural networks. Our surprisingly simple
experiments demonstrate these inefficiencies, up to the complete breakdown of
learning.
Accelerated Linear SVM Training with Adaptive Variable Selection Frequencies
Support vector machine (SVM) training has been an active research area since
the dawn of the method. In recent years there has been increasing interest in
specialized solvers for the important case of linear models. The algorithm
presented by Hsieh et al., probably best known under the name of the
"liblinear" implementation, marks a major breakthrough. The method is analog to
established dual decomposition algorithms for training of non-linear SVMs, but
with greatly reduced computational complexity per update step. This comes at
the cost of not keeping track of the gradient of the objective any more, which
excludes the application of highly developed working set selection algorithms.
We present an algorithmic improvement to this method. We replace uniform
working set selection with an online adaptation of selection frequencies. The
adaptation criterion is inspired by modern second order working set selection
methods. The same mechanism replaces the shrinking heuristic. This novel
technique speeds up training, in some cases by more than an order of magnitude.
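
For context, here is a minimal sketch of the liblinear-style dual coordinate descent update that both this paper and the uniform baseline build on (my own simplification, without bias term or shrinking):

    import numpy as np

    def dual_cd_linear_svm(X, y, C=1.0, epochs=10, seed=0):
        # Dual coordinate descent for a linear SVM (hinge loss), in the
        # spirit of Hsieh et al.: maintaining w = sum_i alpha_i y_i x_i
        # makes each coordinate update O(#features), which is exactly why
        # the full gradient (and classical working set selection) is
        # given up.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        alpha, w = np.zeros(n), np.zeros(d)
        Qii = np.einsum('ij,ij->i', X, X)      # diagonal of the Gram matrix
        for _ in range(epochs):
            for i in rng.permutation(n):       # uniform selection frequencies
                if Qii[i] == 0.0:
                    continue
                g = y[i] * (X[i] @ w) - 1.0    # partial derivative for alpha_i
                a_new = min(max(alpha[i] - g / Qii[i], 0.0), C)
                w += (a_new - alpha[i]) * y[i] * X[i]
                alpha[i] = a_new
        return w

The improvement proposed in the paper replaces the uniform coordinate sweep in the inner loop with online-adapted selection frequencies.
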
Coordinate Descent with Online Adaptation of Coordinate Frequencies
Coordinate descent (CD) algorithms have become the method of choice for
solving a number of optimization problems in machine learning. They are
particularly popular for training linear models, including linear support
vector machine classification, LASSO regression, and logistic regression.
We consider general CD with non-uniform selection of coordinates. Instead of
fixing selection frequencies beforehand we propose an online adaptation
mechanism for this important parameter, called the adaptive coordinate
frequencies (ACF) method. This mechanism removes the need to estimate optimal
coordinate frequencies beforehand, and it automatically reacts to changing
requirements during an optimization run.
We demonstrate the usefulness of our ACF-CD approach for a variety of
optimization problems arising in machine learning contexts. Our algorithm
offers significant speed-ups over state-of-the-art training methods.
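
The mechanism can be sketched in a few lines; everything below (the constants, the exponential progress average, the clipping bounds) is an illustrative reconstruction of the idea, not the paper's exact ACF rule.

    import numpy as np

    def acf_cd(step_coord, x, iters=100000, seed=0):
        # Schematic coordinate descent with Adaptive Coordinate Frequencies:
        # coordinates whose updates recently made above-average progress are
        # sampled more often, so frequencies adapt during the run.
        rng = np.random.default_rng(seed)
        pref = np.ones(x.size)                 # relative selection preferences
        avg_gain = None
        for _ in range(iters):
            i = rng.choice(x.size, p=pref / pref.sum())
            gain = step_coord(x, i)            # update coordinate i, return progress
            avg_gain = gain if avg_gain is None else 0.95 * avg_gain + 0.05 * gain
            factor = 1.05 if gain > avg_gain else 1.0 / 1.05
            pref[i] = np.clip(pref[i] * factor, 0.05, 20.0)  # keep bounded
        return x
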
Anytime Bi-Objective Optimization with a Hybrid Multi-Objective CMA-ES (HMO-CMA-ES)
We propose a multi-objective optimization algorithm aimed at achieving good
anytime performance over a wide range of problems. Performance is assessed in
terms of the hypervolume metric. The algorithm, called HMO-CMA-ES, is a
hybrid of several old and new variants of CMA-ES, complemented by BOBYQA as a
warm start. We benchmark HMO-CMA-ES on the recently introduced bi-objective
problem suite of the COCO framework (COmparing Continuous Optimizers),
consisting of 55 scalable continuous optimization problems, which is used by
the Black-Box Optimization Benchmarking (BBOB) Workshop 2016.
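
Since anytime performance is assessed via the hypervolume metric, a short worked sketch of how bi-objective hypervolume can be computed (for minimization, against an arbitrary reference point of my choosing) may help:

    def hypervolume_2d(points, ref):
        # Hypervolume dominated by a bi-objective point set (minimization),
        # measured against the reference point `ref`: a simple sweep over
        # the points sorted by the first objective.
        pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
        hv, best_f2 = 0.0, ref[1]
        for f1, f2 in pts:
            if f2 < best_f2:                 # point is non-dominated so far
                hv += (ref[0] - f1) * (best_f2 - f2)
                best_f2 = f2
        return hv

    print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], ref=(4, 4)))  # prints 6.0
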
Vehicle Shape and Color Classification Using Convolutional Neural Network
This paper presents a vehicle re-identification module based on make/model
and color classification. It can be used for Automated Vehicular Surveillance
(AVS) or for the fast analysis of video data. Many problems related to this
topic had to be addressed. To facilitate and accelerate progress on this
subject, we present our approach to collecting and labeling a large-scale
data set. We trained deep neural networks, which showed good classification
accuracy. We show the results of make/model and color classification on a
controlled data set and on video data. With the help of a developed
application, we demonstrate the re-identification of vehicles in video images
based on make/model and color classification. This work was partially funded
under the grant
Dual SVM Training on a Budget
We present a dual subspace ascent algorithm for support vector machine
training that respects a budget constraint limiting the number of support
vectors. Budget methods are effective for reducing the training time of kernel
SVMs while retaining high accuracy. To date, budget training is available only
for primal (SGD-based) solvers. Dual subspace ascent methods like sequential
minimal optimization are attractive for their good adaptation to the problem
structure, their fast convergence rate, and their practical speed. By
incorporating a budget constraint into a dual algorithm, our method enjoys the
best of both worlds. We demonstrate considerable speed-ups over primal budget
training methods.
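
The abstract does not detail the budget maintenance step, so the sketch below uses weighted-midpoint merging of the two closest support vectors, a common strategy from the budgeted-SVM literature; it stands in here for the paper's own method, which may differ.

    import numpy as np

    def enforce_budget(SV, coef, budget):
        # Keep at most `budget` support vectors by repeatedly merging the
        # two closest ones (illustrative budget-maintenance strategy).
        # SV: (m, d) array of support vectors, coef: (m,) coefficients.
        while len(SV) > budget:
            diff = SV[:, None, :] - SV[None, :, :]
            dist = np.einsum('ijk,ijk->ij', diff, diff)   # squared distances
            np.fill_diagonal(dist, np.inf)
            i, j = np.unravel_index(np.argmin(dist), dist.shape)
            wi = abs(coef[i]) / (abs(coef[i]) + abs(coef[j]) + 1e-12)
            merged = wi * SV[i] + (1.0 - wi) * SV[j]      # weighted midpoint
            keep = [k for k in range(len(SV)) if k not in (i, j)]
            SV = np.vstack([SV[keep], merged])
            coef = np.append(coef[keep], coef[i] + coef[j])
        return SV, coef
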
Limited-Memory Matrix Adaptation for Large Scale Black-box Optimization
The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a popular
method for dealing with nonconvex and/or stochastic optimization problems when
gradient information is not available. The recently proposed Matrix Adaptation
Evolution Strategy (MA-ES), which is based on the CMA-ES, provides the rather
surprising result that the covariance matrix and all associated operations
(e.g., the potentially unstable eigendecomposition) can be replaced in the
CMA-ES by an updated transformation matrix without any loss of performance. In
order to further simplify MA-ES and reduce its O(n^2) time and storage
complexity to O(n log n), we present the
Limited-Memory Matrix Adaptation Evolution Strategy (LM-MA-ES) for efficient
zeroth-order large-scale optimization. The algorithm demonstrates
state-of-the-art performance on a set of established large-scale benchmarks. We
explore the algorithm on the problem of generating adversarial inputs for a
(non-smooth) random forest classifier, demonstrating a surprising vulnerability
of the classifier.
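
The core sampling idea can be sketched as follows; this follows the structure (not necessarily the exact constants) of the published algorithm, and the direction vectors M here are random placeholders, whereas the real algorithm learns them from the search history.

    import numpy as np

    def lmmaes_sample(mean, sigma, M, c_d, rng):
        # Sample one offspring LM-MA-ES-style: instead of storing an n x n
        # transformation matrix, apply m stored direction vectors (rows of
        # M) to a standard normal draw, for O(m*n) time and memory.
        d = rng.standard_normal(mean.size)        # z ~ N(0, I)
        for j in range(M.shape[0]):               # low-rank transform, m << n
            d = (1.0 - c_d[j]) * d + c_d[j] * (M[j] @ d) * M[j]
        return mean + sigma * d

    rng = np.random.default_rng(0)
    n, m = 1000, 10                               # memory is O(m*n), not O(n^2)
    M = rng.standard_normal((m, n)) / np.sqrt(n)  # placeholder directions
    c_d = 1.0 / (1.5 ** np.arange(m) * n)
    x = lmmaes_sample(np.zeros(n), 0.5, M, c_d, rng)
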