3,791 research outputs found

    Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

    Full text link
    In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T^1/p) regret when the moments of the reward distributions exist up to the pth order for 1<p<=2 and O(T^1/(1+p/2)) for p>2. With the knowledge of an upperbound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter-the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, MAB with unknown Markov dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning, and the dominating set problems under unknown random weights.Comment: 22 pages, 2 figure

    The Potential of Restarts for ProbSAT

    Full text link
    This work analyses the potential of restarts for probSAT, a quite successful algorithm for k-SAT, by estimating its runtime distributions on random 3-SAT instances that are close to the phase transition. We estimate an optimal restart time from empirical data, reaching a potential speedup factor of 1.39. Calculating restart times from fitted probability distributions reduces this factor to a maximum of 1.30. A spin-off result is that the Weibull distribution approximates the runtime distribution for over 93% of the used instances well. A machine learning pipeline is presented to compute a restart time for a fixed-cutoff strategy to exploit this potential. The main components of the pipeline are a random forest for determining the distribution type and a neural network for the distribution's parameters. ProbSAT performs statistically significantly better than Luby's restart strategy and the policy without restarts when using the presented approach. The structure is particularly advantageous on hard problems.Comment: Eurocast 201

    Runtime Distributions and Criteria for Restarts

    Full text link
    Randomized algorithms sometimes employ a restart strategy. After a certain number of steps, the current computation is aborted and restarted with a new, independent random seed. In some cases, this results in an improved overall expected runtime. This work introduces properties of the underlying runtime distribution which determine whether restarts are advantageous. The most commonly used probability distributions admit the use of a scale and a location parameter. Location parameters shift the density function to the right, while scale parameters affect the spread of the distribution. It is shown that for all distributions scale parameters do not influence the usefulness of restarts and that location parameters only have a limited influence. This result simplifies the analysis of the usefulness of restarts. The most important runtime probability distributions are the log-normal, the Weibull, and the Pareto distribution. In this work, these distributions are analyzed for the usefulness of restarts. Secondly, a condition for the optimal restart time (if it exists) is provided. The log-normal, the Weibull, and the generalized Pareto distribution are analyzed in this respect. Moreover, it is shown that the optimal restart time is also not influenced by scale parameters and that the influence of location parameters is only linear
    • …
    corecore