Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with
unknown reward models. At each time, a player selects one arm to play, aiming
to maximize the total expected reward over a horizon of length T. An approach
based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is
developed for constructing sequential arm selection policies. It is shown that
for all light-tailed reward distributions, DSEE achieves the optimal
logarithmic order of the regret, where regret is defined as the total expected
reward loss against the ideal case with known reward models. For heavy-tailed
reward distributions, DSEE achieves O(T^(1/p)) regret when the moments of the
reward distributions exist up to the pth order for 1<p<=2 and O(T^(1/(1+p/2)))
for p>2. With the knowledge of an upper bound on a finite moment of the
heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret
order. The proposed DSEE approach complements existing work on MAB by providing
corresponding results for general reward distributions. Furthermore, with a
clearly defined tunable parameter, the cardinality of the exploration sequence,
the DSEE approach is easily extendable to variations of MAB, including MAB with
various objectives, decentralized MAB with multiple players and incomplete
reward observations under collisions, MAB with unknown Markov dynamics, and
combinatorial MAB with dependent arms that often arise in network optimization
problems such as the shortest path, the minimum spanning tree, and the dominating
set problems under unknown random weights.
Comment: 22 pages, 2 figures
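The idea of splitting time into a deterministic exploration sequence and an exploitation sequence can be illustrated with a minimal sketch. This is not the authors' construction: the function name `dsee`, the constant `w`, and the schedule "explore while fewer than n * ceil(w * log(t + 1)) exploration plays have happened" are assumptions chosen to give an exploration sequence of logarithmic cardinality, and the sketch estimates means from exploration plays only.

```python
import math
import random


def dsee(arms, horizon, w=2.0, seed=0):
    """Play `horizon` rounds; `arms` is a list of callables rng -> reward.

    Hypothetical schedule: keep the number of exploration plays at
    n * ceil(w * log(t + 1)), i.e. logarithmic in t (requires w > 0).
    Returns (total_reward, plays_per_arm).
    """
    rng = random.Random(seed)
    n = len(arms)
    sums = [0.0] * n    # rewards collected during exploration, per arm
    counts = [0] * n    # exploration plays, per arm
    plays = [0] * n     # all plays, per arm
    explored = 0        # cardinality of the exploration sequence so far
    total = 0.0
    for t in range(1, horizon + 1):
        if explored < n * math.ceil(w * math.log(t + 1)):
            # Exploration: round-robin over the arms.
            arm = explored % n
            reward = arms[arm](rng)
            sums[arm] += reward
            counts[arm] += 1
            explored += 1
        else:
            # Exploitation: play the arm with the largest sample mean.
            arm = max(range(n), key=lambda i: sums[i] / counts[i])
            reward = arms[arm](rng)
        plays[arm] += 1
        total += reward
    return total, plays


if __name__ == "__main__":
    bernoulli = lambda p: (lambda rng: float(rng.random() < p))
    print(dsee([bernoulli(0.1), bernoulli(0.9)], horizon=5000))
```

Because the schedule forces at least one full round-robin pass before the first exploitation step, every sample mean is defined when it is first used. Changing `w` changes the cardinality of the exploration sequence, which is the tunable parameter the abstract refers to.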
The Potential of Restarts for ProbSAT
This work analyses the potential of restarts for probSAT, a quite successful
algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT
instances that are close to the phase transition. We estimate an optimal
restart time from empirical data, reaching a potential speedup factor of 1.39.
Calculating restart times from fitted probability distributions reduces this
factor to a maximum of 1.30. A spin-off result is that the Weibull distribution
approximates the runtime distribution well for over 93% of the instances used.
A machine learning pipeline is presented to compute a restart time for a
fixed-cutoff strategy to exploit this potential. The main components of the
pipeline are a random forest for determining the distribution type and a neural
network for the distribution's parameters. ProbSAT performs statistically
significantly better than Luby's restart strategy and the policy without
restarts when using the presented approach. The structure is particularly
advantageous on hard problems.
Comment: Eurocast 201
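Estimating a restart time from empirical data can be sketched as follows. This is an illustration under stated assumptions, not the pipeline from the paper: it uses the standard fixed-cutoff estimate (mean of min(x, t) over the observed runtimes, divided by the fraction of runs finishing within t), and the function names and toy data are hypothetical.

```python
def expected_runtime_with_restarts(samples, cutoff):
    """Empirical expected total runtime when every run is aborted at `cutoff`.

    Each attempt costs min(x, cutoff); an attempt succeeds if x <= cutoff.
    """
    successes = sum(1 for x in samples if x <= cutoff)
    if successes == 0:
        return float("inf")  # no observed run finishes within the cutoff
    return sum(min(x, cutoff) for x in samples) / successes


def best_cutoff(samples):
    """Return (cutoff, expected_runtime, speedup) for the best empirical cutoff.

    Only observed runtimes need to be tried: between two consecutive
    observations the numerator grows while the success count stays fixed.
    """
    candidates = sorted(set(samples))
    cutoff = min(candidates,
                 key=lambda t: expected_runtime_with_restarts(samples, t))
    with_restarts = expected_runtime_with_restarts(samples, cutoff)
    without_restarts = sum(samples) / len(samples)
    return cutoff, with_restarts, without_restarts / with_restarts


if __name__ == "__main__":
    # Hypothetical runtimes: mostly short runs plus one very long run.
    print(best_cutoff([1, 1, 1, 100]))
```

A speedup above 1 indicates that, on the observed sample, restarting at the returned cutoff beats running without restarts. A speedup of exactly 1 means the largest observed runtime was chosen, which amounts to never restarting.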
Runtime Distributions and Criteria for Restarts
Randomized algorithms sometimes employ a restart strategy. After a certain
number of steps, the current computation is aborted and restarted with a new,
independent random seed. In some cases, this results in an improved overall
expected runtime. This work introduces properties of the underlying runtime
distribution which determine whether restarts are advantageous. The most
commonly used probability distributions admit the use of a scale and a location
parameter. Location parameters shift the density function to the right, while
scale parameters affect the spread of the distribution. It is shown that for
all distributions scale parameters do not influence the usefulness of restarts
and that location parameters only have a limited influence. This result
simplifies the analysis of the usefulness of restarts. The most important
runtime probability distributions are the log-normal, the Weibull, and the
Pareto distribution. In this work, these distributions are analyzed for the
usefulness of restarts. Secondly, a condition for the optimal restart time (if
it exists) is provided. The log-normal, the Weibull, and the generalized Pareto
distribution are analyzed in this respect. Moreover, it is shown that the
optimal restart time is also not influenced by scale parameters and that the
influence of location parameters is only linear.
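The scale-parameter statement can be made concrete with a short derivation. The notation is an assumption and not taken from the paper: X is the runtime of a single run with distribution function F, t is a fixed restart cutoff, T_t is the total runtime under that cutoff, and s > 0 is a scale factor.

```latex
% Expected total runtime when every run is aborted after t steps
% (each attempt costs min(X, t) and succeeds with probability F(t)):
\[
  \mathbb{E}[T_t] \;=\; \frac{\mathbb{E}[\min(X,t)]}{F(t)}
  \;=\; \frac{t - \int_0^t F(x)\,dx}{F(t)} .
\]
% Scale family: X_s = sX, so F_s(x) = F(x/s). With cutoff st and u = x/s:
\[
  \mathbb{E}\bigl[T^{(s)}_{st}\bigr]
  \;=\; \frac{st - \int_0^{st} F(x/s)\,dx}{F_s(st)}
  \;=\; \frac{st - s\int_0^{t} F(u)\,du}{F(t)}
  \;=\; s\,\mathbb{E}[T_t] .
\]
```

Since E[X_s] = s E[X] as well, the ratio between the runtime without restarts and the runtime with restarts does not depend on s, and a minimizing cutoff t* for F corresponds to the cutoff s t* for F_s.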
- …