Learning sequential and parallel runtime distributions for randomized algorithms
In cloud systems, computation time can be rented by the hour and for a given number of processors. Thus, accurate predictions of the behaviour of both sequential and parallel algorithms have become an important issue, in particular in the case of costly methods such as randomized combinatorial optimization tools. In this work, our objective is to use machine learning to predict the performance of sequential and parallel local search algorithms. In addition to classical features of the instances used by other machine learning tools, we consider data on the sequential runtime distributions of a local search method. This allows us to predict with high accuracy the parallel computation time of a large class of instances, by learning the behaviour of the sequential version of the algorithm on a small number of instances. Experiments with three solvers on SAT and TSP instances indicate that our method works well, with a correlation coefficient of up to 0.85 for SAT instances and up to 0.95 for TSP instances.
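The quantitative link exploited here is that, for a Las Vegas algorithm, running k independent copies in parallel yields a runtime equal to the minimum of k draws from the sequential runtime distribution, so an empirical sequential RTD already determines the expected parallel runtime. The following minimal Python sketch (hypothetical helper name and synthetic data, not the authors' pipeline) estimates that expectation by Monte Carlo.

    # Minimal sketch (hypothetical data, not the authors' pipeline): with k
    # independent parallel copies, the parallel runtime is the minimum of k
    # draws from the sequential runtime distribution.
    import numpy as np

    def parallel_runtime_estimate(seq_runtimes, k, n_sim=100_000, seed=None):
        """Monte Carlo estimate of E[min of k i.i.d. sequential runtimes]."""
        rng = np.random.default_rng(seed)
        seq_runtimes = np.asarray(seq_runtimes, dtype=float)
        draws = rng.choice(seq_runtimes, size=(n_sim, k), replace=True)
        return draws.min(axis=1).mean()

    # Sequential runtimes measured on one instance (synthetic stand-in here).
    seq = np.random.default_rng(0).lognormal(mean=3.0, sigma=1.0, size=200)
    print(parallel_runtime_estimate(seq, k=16, seed=1))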
The Potential of Restarts for ProbSAT
This work analyses the potential of restarts for probSAT, a quite successful
algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT
instances that are close to the phase transition. We estimate an optimal
restart time from empirical data, reaching a potential speedup factor of 1.39.
Calculating restart times from fitted probability distributions reduces this
factor to a maximum of 1.30. A spin-off result is that the Weibull distribution
approximates the runtime distribution well for over 93% of the instances used.
A machine learning pipeline is presented to compute a restart time for a
fixed-cutoff strategy to exploit this potential. The main components of the
pipeline are a random forest for determining the distribution type and a neural
network for the distribution's parameters. With the presented approach, probSAT
performs statistically significantly better than with Luby's restart strategy or
with no restarts at all. The strategy is particularly advantageous on hard problems.
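The speedup estimate reported above can be sketched with the standard expected runtime of a fixed-cutoff restart strategy, E[T_t] = t(1 - F(t))/F(t) + E[X | X <= t]: evaluate it on the empirical runtime sample, pick the best cutoff, and compare against the mean runtime without restarts. The Python sketch below uses hypothetical data and is not the paper's pipeline.

    # Minimal sketch: optimal fixed cutoff and potential speedup from an
    # empirical runtime sample, using
    #   E[T_t] = t * (1 - F(t)) / F(t) + E[X | X <= t].
    import numpy as np

    def fixed_cutoff_expected_runtimes(runtimes):
        """Expected restart runtime E[T_t] evaluated at every observed cutoff."""
        x = np.sort(np.asarray(runtimes, dtype=float))
        n = len(x)
        F = np.arange(1, n + 1) / n                      # empirical CDF at x_i
        cond_mean = np.cumsum(x) / np.arange(1, n + 1)   # E[X | X <= x_i]
        return x, x * (1.0 - F) / F + cond_mean

    # Hypothetical heavy-tailed runtimes of a randomized solver on one instance.
    runtimes = np.random.default_rng(42).lognormal(mean=4.0, sigma=1.5, size=500)

    cutoffs, expected = fixed_cutoff_expected_runtimes(runtimes)
    best = expected.argmin()
    print("optimal cutoff:", cutoffs[best])
    print("potential speedup factor:", runtimes.mean() / expected[best])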
Runtime Distributions and Criteria for Restarts
Randomized algorithms sometimes employ a restart strategy. After a certain
number of steps, the current computation is aborted and restarted with a new,
independent random seed. In some cases, this results in an improved overall
expected runtime. This work introduces properties of the underlying runtime
distribution which determine whether restarts are advantageous. The most
commonly used probability distributions admit a scale and a location parameter.
Location parameters shift the density function to the right, while scale
parameters affect the spread of the distribution. It is shown that, for all such
distributions, scale parameters do not influence the usefulness of restarts
and that location parameters only have a limited influence. This result
simplifies the analysis of the usefulness of restarts. The most important
runtime probability distributions are the log-normal, the Weibull, and the
Pareto distribution. In this work, these distributions are analyzed for the
usefulness of restarts. Secondly, a condition for the optimal restart time (if
it exists) is provided. The log-normal, the Weibull, and the generalized Pareto
distribution are analyzed in this respect. Moreover, it is shown that the
optimal restart time is also not influenced by scale parameters and that the
influence of location parameters is only linear.
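For the fixed-cutoff strategy with expected runtime E[T_t] = (t - ∫_0^t F(x) dx) / F(t), a standard form of the optimality condition is E[T_{t*}] = (1 - F(t*)) / f(t*): at the optimal cutoff, the expected total runtime equals the reciprocal of the hazard rate. The scale-invariance statement can also be checked numerically; the Python sketch below (hypothetical parameter choices) computes the achievable speedup E[X] / min_t E[T_t] for log-normal distributions that differ only in their scale parameter and obtains the same value for both.

    # Minimal sketch: the speedup achievable by an optimal fixed-cutoff restart
    # for a log-normal RTD does not depend on the scale parameter.
    import numpy as np
    from scipy import stats, integrate, optimize

    def expected_runtime_with_restarts(t, dist):
        """E[T_t] = (t - int_0^t F(x) dx) / F(t) for a fixed cutoff t."""
        F_t = dist.cdf(t)
        if F_t <= 0.0:
            return np.inf
        area, _ = integrate.quad(dist.cdf, 0.0, t)
        return (t - area) / F_t

    def best_speedup(dist, upper):
        """E[X] divided by the minimal expected restart runtime."""
        res = optimize.minimize_scalar(
            lambda t: expected_runtime_with_restarts(t, dist),
            bounds=(1e-6 * upper, upper), method="bounded")
        return dist.mean() / res.fun

    sigma = 1.5                                # shape parameter, kept fixed
    for scale in (1.0, 100.0):                 # scale parameter exp(mu)
        dist = stats.lognorm(s=sigma, scale=scale)
        print(scale, best_speedup(dist, upper=50.0 * scale))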
Neural Networks for Predicting Algorithm Runtime Distributions
Many state-of-the-art algorithms for solving hard combinatorial problems in
artificial intelligence (AI) include elements of stochasticity that lead to
high variations in runtime, even for a fixed problem instance. Knowledge about
the resulting runtime distributions (RTDs) of algorithms on given problem
instances can be exploited in various meta-algorithmic procedures, such as
algorithm selection, portfolios, and randomized restarts. Previous work has
shown that machine learning can be used to predict the mean, median, and
variance of RTDs individually. To establish a new state-of-the-art in predicting RTDs,
we demonstrate that the parameters of an RTD should be learned jointly and that
neural networks can do this well by directly optimizing the likelihood of an
RTD given runtime observations. In an empirical study involving five algorithms
for SAT solving and AI planning, we show that neural networks predict the true
RTDs of unseen instances better than previous methods, and can even do so when
only a few runtime observations are available per training instance.
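The joint-prediction idea can be made concrete with a small sketch: a network maps instance features to all parameters of a parametric RTD at once and is trained by minimizing the negative log-likelihood of the observed runtimes. The Python sketch below (assumed architecture and synthetic data, not the paper's model) does this for a log-normal RTD.

    # Minimal sketch (assumed architecture, synthetic data): predict the
    # parameters (mu, sigma) of a log-normal RTD jointly by maximizing the
    # likelihood of observed runtimes.
    import torch
    import torch.nn as nn

    class RTDNet(nn.Module):
        def __init__(self, n_features, hidden=32):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 2))

        def forward(self, x):
            out = self.body(x)
            mu = out[:, 0]
            sigma = nn.functional.softplus(out[:, 1]) + 1e-3  # keep sigma > 0
            return mu, sigma

    def lognormal_nll(runtimes, mu, sigma):
        """Negative log-likelihood of runtimes under LogNormal(mu, sigma)."""
        return -torch.distributions.LogNormal(mu, sigma).log_prob(runtimes).mean()

    torch.manual_seed(0)
    X = torch.randn(256, 10)                         # instance features
    y = torch.exp(X[:, 0] + 0.5 * torch.randn(256))  # synthetic runtimes

    model = RTDNet(n_features=10)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for step in range(200):
        mu, sigma = model(X)
        loss = lognormal_nll(y, mu, sigma)
        opt.zero_grad()
        loss.backward()
        opt.step()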
Efficient Benchmarking of Algorithm Configuration Procedures via Model-Based Surrogates
The optimization of algorithm (hyper-)parameters is crucial for achieving
peak performance across a wide range of domains, from deep neural networks to
solvers for hard combinatorial problems. The resulting algorithm
configuration (AC) problem has attracted much attention from the machine
learning community. However, the proper evaluation of new AC procedures is
hindered by two key hurdles. First, AC benchmarks are hard to set up. Second,
and even more significantly, they are computationally expensive: a single run
of an AC procedure involves many costly runs of the target algorithm whose
performance is to be optimized in a given AC benchmark scenario. One common
workaround is to optimize cheap-to-evaluate artificial benchmark functions
(e.g., Branin) instead of actual algorithms; however, these have different
properties than realistic AC problems. Here, we propose an alternative
benchmarking approach that is similarly cheap to evaluate but much closer to
the original AC problem: replacing expensive benchmarks by surrogate benchmarks
constructed from AC benchmarks. These surrogate benchmarks approximate the
response surface corresponding to true target algorithm performance using a
regression model, and the original and surrogate benchmark share the same
(hyper-)parameter space. In our experiments, we construct and evaluate
surrogate benchmarks for hyperparameter optimization as well as for AC problems
that involve performance optimization of solvers for hard combinatorial
problems, drawing training data from the runs of existing AC procedures. We
show that our surrogate benchmarks capture the important overall characteristics
of the AC scenarios from which they were derived, such as high- and low-performing
regions, while being much easier to use and orders of magnitude cheaper to evaluate.
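The core mechanism is straightforward to sketch: a regression model fitted on logged (configuration, performance) pairs from earlier configurator runs stands in for the expensive target-algorithm call. The Python sketch below (hypothetical data and names, not the authors' tooling) builds such a surrogate with a random forest and exposes it as a cheap objective function.

    # Minimal sketch (hypothetical data): a random-forest surrogate replaces
    # expensive target-algorithm runs when benchmarking AC procedures.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)

    # Logged data from earlier configurator runs: evaluated configurations
    # (four numeric parameters here) and their measured runtimes.
    configs = rng.uniform(0.0, 1.0, size=(2000, 4))
    runtimes = np.exp(2.0 * configs[:, 0] - configs[:, 1]
                      + 0.3 * rng.normal(size=2000))

    # Model runtimes in log space, a common choice for AC scenarios.
    surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
    surrogate.fit(configs, np.log(runtimes))

    def surrogate_benchmark(config):
        """Cheap objective: predicted runtime instead of a real algorithm run."""
        pred = surrogate.predict(np.asarray(config, dtype=float).reshape(1, -1))
        return float(np.exp(pred[0]))

    # Any AC or hyperparameter-optimization procedure can now be evaluated
    # against surrogate_benchmark at negligible cost.
    print(surrogate_benchmark([0.1, 0.9, 0.5, 0.5]))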
Structure-Aware Dynamic Scheduler for Parallel Machine Learning
Training large machine learning (ML) models with many variables or parameters
can take a long time if one employs sequential procedures even with stochastic
updates. A natural solution is to turn to distributed computing on a cluster;
however, naive, unstructured parallelization of ML algorithms does not usually
lead to a proportional speedup and can even result in divergence, because
dependencies between model elements can attenuate the computational gains from
parallelization and compromise correctness of inference. Recent efforts toward
this issue have benefited from exploiting the static, a priori block structures
residing in ML algorithms. In this paper, we take this path further by
exploring the dynamic block structures and workloads that arise during ML
program execution, which offers new opportunities for improving convergence,
correctness, and load balancing in distributed ML. We propose and showcase a
general-purpose scheduler, STRADS, for coordinating distributed updates in ML
algorithms, which harnesses the aforementioned opportunities in a systematic
way. We provide theoretical guarantees for our scheduler, and demonstrate its
efficacy versus static block structures on Lasso and Matrix Factorization.
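One way to picture the dynamic, dependency-aware scheduling idea is parallel coordinate descent for Lasso: a coordinate is only admitted into the current parallel update round if its feature column is weakly correlated with every coordinate already scheduled. The Python sketch below (hypothetical threshold and synthetic data, not the STRADS implementation) shows such a greedy round construction.

    # Minimal sketch (not the STRADS implementation): greedily build a round of
    # coordinates that are safe to update in parallel for Lasso, admitting a
    # coordinate only if its column is weakly correlated with those already chosen.
    import numpy as np

    def schedule_round(X, candidates, max_parallel, threshold=0.1):
        """Pick up to max_parallel weakly-dependent coordinates."""
        cols = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12)
        chosen = []
        for j in candidates:
            if len(chosen) >= max_parallel:
                break
            if all(abs(cols[:, j] @ cols[:, k]) < threshold for k in chosen):
                chosen.append(j)
        return chosen

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 100))            # design matrix (synthetic)
    priority = rng.permutation(X.shape[1])     # e.g. ordered by expected progress
    print(schedule_round(X, priority, max_parallel=8))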