Runtime Distributions and Criteria for Restarts
Randomized algorithms sometimes employ a restart strategy. After a certain
number of steps, the current computation is aborted and restarted with a new,
independent random seed. In some cases, this results in an improved overall
expected runtime. This work introduces properties of the underlying runtime
distribution which determine whether restarts are advantageous. The most
commonly used probability distributions admit the use of a scale and a location
parameter. Location parameters shift the density function to the right, while
scale parameters affect the spread of the distribution. It is shown that, for
all such distributions, scale parameters do not influence the usefulness of
restarts and that location parameters have only a limited influence. This result
simplifies the analysis of the usefulness of restarts. The most important
runtime probability distributions are the log-normal, the Weibull, and the
Pareto distribution. In this work, these distributions are first analyzed for
the usefulness of restarts. Second, a condition for the optimal restart time (if
one exists) is provided, and the log-normal, the Weibull, and the generalized
Pareto distribution are analyzed in this respect. Moreover, it is shown that the
optimal restart time is likewise not influenced by scale parameters and that the
influence of location parameters is only linear.
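To make the fixed-cutoff analysis concrete, the following minimal Python sketch evaluates the expected total runtime under a restart time t via the standard formula E[T_t] = (1/F(t)) * integral_0^t (1 - F(s)) ds, where F is the runtime CDF; the Pareto-type distribution and the candidate restart times are illustrative choices, not taken from the paper.

    from scipy import stats
    from scipy.integrate import quad

    def expected_runtime_with_restarts(dist, t):
        """E[T_t] = (1/F(t)) * integral_0^t (1 - F(s)) ds for a fixed cutoff t."""
        tail_integral, _ = quad(dist.sf, 0, t)  # integral of the survival function
        return tail_integral / dist.cdf(t)

    # Heavy-tailed runtimes (Pareto type II): a finite cutoff beats the raw mean.
    dist = stats.lomax(c=1.5)                   # shape 1.5, so E[T] = 2 without restarts
    for t in (0.5, 1.0, 2.0, 5.0):
        print(f"restart at t={t}: {expected_runtime_with_restarts(dist, t):.3f}")
    print(f"no restarts: {dist.mean():.3f}")

Rescaling the distribution rescales t and E[T_t] by the same factor, consistent with the scale-parameter result above.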
From Understanding Genetic Drift to a Smart-Restart Mechanism for Estimation-of-Distribution Algorithms
Estimation-of-distribution algorithms (EDAs) are optimization algorithms that
learn a distribution on the search space from which good solutions can be
sampled easily. A key parameter of most EDAs is the sample size (population
size). If the population size is too small, the update of the probabilistic
model builds on few samples, leading to the undesired effect of genetic drift.
Too large population sizes avoid genetic drift, but slow down the process.
Building on a recent quantitative analysis of how the population size leads
to genetic drift, we design a smart-restart mechanism for EDAs. By stopping
runs when the risk of genetic drift is high, it automatically runs the EDA in
good parameter regimes.
Via a mathematical runtime analysis, we prove a general performance guarantee
for this smart-restart scheme. In particular, this shows that in many situations
where the optimal (problem-specific) parameter values are known, the restart
scheme automatically finds them, leading to asymptotically optimal performance.
We also conduct an extensive experimental analysis. On four classic benchmark
problems, we clearly observe the critical influence of the population size on
the performance, and we find that the smart-restart scheme leads to a
performance close to the one obtainable with optimal parameter values. Our
results also show that previous theory-based suggestions for the population
size can be far from optimal, leading to a performance clearly inferior to that
obtained via the smart-restart scheme. We also
conduct experiments with PBIL (cross-entropy algorithm) on two combinatorial
optimization problems from the literature, the max-cut problem and the
bipartition problem. Again, we observe that the smart-restart mechanism finds
much better values for the population size than those suggested in the
literature, leading to a much better performance.
Comment: Accepted for publication in "Journal of Machine Learning Research".
Extended version of our GECCO 2020 paper. This article supersedes
arXiv:2004.0714
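The smart-restart idea can be sketched as a small wrapper around any EDA. Everything below is illustrative: run_eda, the drift-based budget rule, and the doubling schedule are stand-ins for the paper's exact mechanism and constants, sketched here in Python.

    def smart_restart(run_eda, n, pop_size=8, budget_factor=4, max_restarts=20):
        """Hedged sketch of a smart-restart wrapper.

        run_eda(pop_size, budget) is assumed to return a solution, or None once
        the budget is exhausted. The budget grows with the population size so a
        run is cut off roughly when genetic drift would start to dominate; the
        rule and constants are placeholders, not the paper's.
        """
        for _ in range(max_restarts):
            budget = budget_factor * pop_size * n  # stop before drift sets in
            solution = run_eda(pop_size, budget)
            if solution is not None:
                return solution, pop_size
            pop_size *= 2                          # retry in a safer regime
        return None, pop_size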
A Smooth Primal-Dual Optimization Framework for Nonsmooth Composite Convex Minimization
We propose a new first-order primal-dual optimization framework for a convex
optimization template with broad applications. Our optimization algorithms
feature optimal convergence guarantees under a variety of common structure
assumptions on the problem template. Our analysis relies on a novel combination
of three classic ideas applied to the primal-dual gap function: smoothing,
acceleration, and homotopy. The algorithms resulting from the new approach
achieve the best known convergence rates, in particular when the template
consists only of non-smooth functions. We also outline a restart strategy for the
acceleration to significantly enhance the practical performance. We demonstrate
relations with the augmented Lagrangian method and show how to exploit
strongly convex objectives with rigorous convergence rate guarantees. We
provide numerical evidence with two examples and illustrate that the new
methods can outperform the state-of-the-art, including the Chambolle-Pock and
alternating direction method of multipliers algorithms.
Comment: 35 pages, accepted for publication in SIAM J. Optimization. Tech.
Report, Oct. 2015 (last update Sept. 2016).
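The restart of the acceleration mentioned above can be illustrated generically: the sketch below resets the momentum of an accelerated gradient method whenever the objective increases. This is the standard function-value restart heuristic, not the paper's homotopy-based scheme; f, grad, and the Lipschitz constant L are supplied by the user.

    import numpy as np

    def accelerated_gradient_with_restart(f, grad, x0, L, iters=500):
        """Accelerated gradient descent with a function-value restart: reset
        the momentum whenever the objective goes up, which in practice
        restores fast convergence without knowing strong-convexity constants."""
        x, y, t = x0.copy(), x0.copy(), 1.0
        for _ in range(iters):
            x_new = y - grad(y) / L         # gradient step from the extrapolated point
            if f(x_new) > f(x):             # progress stalled: restart the momentum
                y, t = x.copy(), 1.0
                continue
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # Nesterov extrapolation
            x, t = x_new, t_new
        return x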
The Early Restart Algorithm
Consider an algorithm whose time to convergence is unknown (because of some random element in the algorithm, such as a random initial weight choice for neural network training). Consider the following strategy. Run the algorithm for a specific time T. If it has not converged by time T, cut the run short and rerun it from the start (repeat the same strategy for every run). This so-called restart mechanism was proposed by Fahlman (1988) in the context of backpropagation training. It is advantageous in problems that are prone to local minima or when there is large variability in convergence time from run to run, and it may lead to a speed-up in such cases. In this article, we theoretically analyze the restart mechanism and obtain conditions on the probability density of the convergence time under which a restart improves the expected convergence time. We also derive the optimal restart time. We apply the derived formulas to several cases, including steepest-descent algorithms.
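A quick Monte Carlo experiment makes the trade-off tangible; the log-normal convergence times below are purely illustrative, the point being only that with enough run-to-run variability a finite restart time T can beat running to completion.

    import numpy as np

    rng = np.random.default_rng(0)

    def expected_time_with_restart(sample_runtime, T, n_trials=10_000):
        """Monte Carlo estimate of the expected total time when every run is
        cut off after T and restarted from scratch until one converges."""
        totals = []
        for _ in range(n_trials):
            total = 0.0
            while (run := sample_runtime()) > T:  # failed run: pay T and retry
                total += T
            totals.append(total + run)
        return float(np.mean(totals))

    # Illustrative high-variance convergence times (log-normal, sigma = 2).
    sample = lambda: rng.lognormal(mean=0.0, sigma=2.0)
    print("restart at T=3:", expected_time_with_restart(sample, T=3.0))
    print("no restarts:   ", np.mean([sample() for _ in range(10_000)]))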
Computational Complexity versus Statistical Performance on Sparse Recovery Problems
We show that several classical quantities controlling compressed sensing
performance directly match classical parameters controlling algorithmic
complexity. We first describe linearly convergent restart schemes for
first-order methods solving a broad range of compressed sensing problems, where
sharpness at the optimum controls the convergence speed. We show that for sparse
recovery problems, this sharpness can be written as a condition number, given
by the ratio between true signal sparsity and the largest signal size that can
be recovered by the observation matrix. In a similar vein, Renegar's condition
number is a data-driven complexity measure for convex programs, generalizing
classical condition numbers for linear systems. We show that for a broad class
of compressed sensing problems, the worst case value of this algorithmic
complexity measure taken over all signals matches the restricted singular value
of the observation matrix which controls robust recovery performance. Overall,
this means in both cases that, in compressed sensing problems, a single
parameter directly controls both computational complexity and recovery
performance. Numerical experiments illustrate these points using several
classical algorithms.
Comment: Final version, to appear in Information and Inference.
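To illustrate the linearly convergent restart schemes referred to above: under a quadratic-growth (sharpness) bound f(x) - f* >= mu * dist(x, X*)^2, restarting an accelerated method on a fixed schedule halves the objective gap per cycle. The Python sketch below is generic; inner_agd stands for any accelerated first-order method with the usual O(L * dist^2 / k^2) guarantee.

    import math

    def scheduled_restarts(inner_agd, x0, L, mu, n_restarts=10):
        """Restart an accelerated method every k* iterations. With
        f(x_k) - f* <= 4 L dist(x0, X*)^2 / k^2 and quadratic growth
        mu dist(x, X*)^2 <= f(x) - f*, any k* >= 2 sqrt(2 L / mu)
        halves the gap per cycle, i.e. linear convergence overall."""
        k_star = math.ceil(2.0 * math.sqrt(2.0 * L / mu))
        x = x0
        for _ in range(n_restarts):
            x = inner_agd(x, k_star)  # warm-start each cycle from the last iterate
        return x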
The Potential of Restarts for ProbSAT
This work analyses the potential of restarts for probSAT, a quite successful
algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT
instances that are close to the phase transition. We estimate an optimal
restart time from empirical data, reaching a potential speedup factor of 1.39.
Calculating restart times from fitted probability distributions reduces this
factor to a maximum of 1.30. A spin-off result is that the Weibull distribution
approximates the runtime distribution well for over 93% of the instances used.
A machine learning pipeline is presented to compute a restart time for a
fixed-cutoff strategy to exploit this potential. The main components of the
pipeline are a random forest for determining the distribution type and a neural
network for the distribution's parameters. With the presented approach, probSAT
performs statistically significantly better than with Luby's restart strategy
or with no restarts at all. This strategy is particularly advantageous on hard
problems.
Comment: Eurocast 201
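The two learned components of the pipeline can be sketched with off-the-shelf models. The feature extraction, training labels, and the optimal_restart callback below are hypothetical placeholders; only the model choices (a random forest for the distribution family, a neural network for its parameters) mirror the text.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPRegressor

    family_clf = RandomForestClassifier(n_estimators=200)    # distribution type
    param_reg = MLPRegressor(hidden_layer_sizes=(64, 64),
                             max_iter=2000)                  # distribution parameters

    def fit_pipeline(X, family_labels, dist_params):
        """Train both stages on instance features X (placeholder data)."""
        family_clf.fit(X, family_labels)
        param_reg.fit(X, dist_params)

    def predict_restart_time(x, optimal_restart):
        """Predict a fixed-cutoff restart time for one instance's features x."""
        family = family_clf.predict([x])[0]
        params = param_reg.predict([x])[0]
        return optimal_restart(family, params)  # e.g. minimize E[T_t] over t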
Templates for Convex Cone Problems with Applications to Sparse Signal Recovery
This paper develops a general framework for solving a variety of convex cone
problems that frequently arise in signal processing, machine learning,
statistics, and other fields. The approach works as follows: first, determine a
conic formulation of the problem; second, determine its dual; third, apply
smoothing; and fourth, solve using an optimal first-order method. A merit of
this approach is its flexibility: for example, all compressed sensing problems
can be solved via this approach. These include models with objective
functionals such as the total-variation norm, ||Wx||_1 where W is arbitrary, or
a combination thereof. In addition, the paper also introduces a number of
technical contributions such as a novel continuation scheme, a novel approach
for controlling the step size, and some new results showing that the smoothed
and unsmoothed problems are sometimes formally equivalent. Combined with our
framework, these lead to novel, stable and computationally efficient
algorithms. For instance, our general implementation is competitive with
state-of-the-art methods for solving intensively studied problems such as the
LASSO. Further, numerical experiments show that one can solve the Dantzig
selector problem, for which no efficient large-scale solvers exist, in a few
hundred iterations. Finally, the paper is accompanied with a software release.
This software is not a single, monolithic solver; rather, it is a suite of
programs and routines designed to serve as building blocks for constructing
complete algorithms.
Comment: The TFOCS software is available at http://tfocs.stanford.edu. This
version has updated references.
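The four-step recipe (conic formulation, dual, smoothing, optimal first-order method) can be illustrated on basis pursuit, min ||x||_1 s.t. Ax = b. The Python sketch below adds a strongly convex proximity term so that the dual becomes smooth, then runs accelerated ascent on it; the smoothing level mu and the iteration count are illustrative, and this is a simplification of the actual TFOCS templates.

    import numpy as np

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def smoothed_basis_pursuit(A, b, mu=1e-2, iters=500):
        """Smooth the primal with (mu/2)||x||^2, then maximize the resulting
        smooth dual with an accelerated gradient (Nesterov) scheme."""
        m, _ = A.shape
        L = np.linalg.norm(A, 2) ** 2 / mu      # Lipschitz constant of the dual gradient
        lam = np.zeros(m)
        y, t = lam.copy(), 1.0
        for _ in range(iters):
            x = soft_threshold(A.T @ y / mu, 1.0 / mu)  # primal minimizer at dual point y
            lam_new = y + (b - A @ x) / L               # ascent step on the smooth dual
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = lam_new + ((t - 1.0) / t_new) * (lam_new - lam)
            lam, t = lam_new, t_new
        return soft_threshold(A.T @ lam / mu, 1.0 / mu)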