
    Online combinatorial optimization with stochastic decision sets and adversarial losses

    Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

    Cakewalk Sampling

    We study the task of finding good local optima in combinatorial optimization problems. Although combinatorial optimization is NP-hard in general, locally optimal solutions are frequently used in practice. Local search methods, however, typically converge to a limited set of optima that depend on their initialization. Sampling methods, on the other hand, can access any valid solution, and thus can be used either directly or alongside methods of the former type as a way of finding good local optima. Since the effectiveness of this strategy depends on the sampling distribution, we derive a robust learning algorithm that adapts sampling distributions towards good local optima of arbitrary objective functions. As a first use case, we empirically study the efficiency with which sampling methods can recover locally maximal cliques in undirected graphs. Not only do we show how our adaptive sampler outperforms related methods, we also show how it can even approach the performance of established clique algorithms. As a second use case, we consider how greedy algorithms can be combined with our adaptive sampler, and we demonstrate how this leads to superior performance in k-medoid clustering. Together, these findings suggest that our adaptive sampler can provide an effective strategy for combinatorial optimization problems that arise in practice. Comment: Accepted as a conference paper by AAAI-2020 (oral presentation)

    First-order regret bounds for combinatorial semi-bandits

    We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algorithms that guarantee that the learner's expected regret grows as $\widetilde{O}(\sqrt{T})$ with the number of rounds $T$. In this paper, we propose an algorithm that improves this scaling to $\widetilde{O}(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting. Comment: To appear at COLT 201

    An efficient algorithm for learning with semi-bandit feedback

    We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m non-zero entries, we show that the expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the full information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m) over previous bounds for this algorithm. Comment: submitted to ALT 201
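
    The combination of FPL with Geometric Resampling described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the decision set is all size-m subsets of d items (so the offline oracle is just a sort), exponential perturbations, and a cap on the number of resampling draws. The function names and the parameters eta and cap are hypothetical.

```python
import random

def fpl_action(cum_est, eta, m, rng):
    """Draw one FPL action: the m indices with the smallest perturbed loss.

    Assumption: the decision set is every size-m subset of the d items,
    so the offline optimization oracle reduces to sorting.
    """
    scores = [eta * c - rng.expovariate(1.0) for c in cum_est]
    order = sorted(range(len(cum_est)), key=lambda i: scores[i])
    return set(order[:m])

def geometric_resampling(cum_est, eta, m, played, cap, rng):
    """For each played index, count fresh FPL draws until it reappears.

    The count (capped at `cap`) stands in for 1 / P(index is played),
    which the learner cannot compute directly.
    """
    counts = {i: cap for i in played}
    waiting = set(played)
    for n in range(1, cap + 1):
        if not waiting:
            break
        redraw = fpl_action(cum_est, eta, m, rng)
        for i in waiting & redraw:
            counts[i] = n
        waiting -= redraw
    return counts

def run_fpl_gr(losses, m, eta, cap, seed=0):
    """Run the sketch on a T x d loss table; only played losses are used."""
    rng = random.Random(seed)
    d = len(losses[0])
    cum_est = [0.0] * d  # cumulative loss estimates per item
    total = 0.0
    for loss in losses:
        played = fpl_action(cum_est, eta, m, rng)
        total += sum(loss[i] for i in played)  # semi-bandit feedback
        counts = geometric_resampling(cum_est, eta, m, played, cap, rng)
        for i in played:
            cum_est[i] += counts[i] * loss[i]
    return total, cum_est
```

    For example, run_fpl_gr(losses, m=2, eta=0.1, cap=50) on a table of per-round item losses returns the learner's total loss and its final loss estimates. Reusing one sequence of redraws for all played items keeps the number of oracle calls per round bounded by the cap.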

    Stochastic Combinatorial Optimization via Poisson Approximation

    We study several stochastic combinatorial problems, including the expected utility maximization problem, the stochastic knapsack problem and the stochastic bin packing problem. A common technical challenge in these problems is to optimize some function of the sum of a set of random variables. The difficulty is mainly due to the fact that the probability distribution of the sum is the convolution of a set of distributions, which is not an easy objective function to work with. To tackle this difficulty, we introduce the Poisson approximation technique. The technique is based on the Poisson approximation theorem discovered by Le Cam, which enables us to approximate the distribution of the sum of a set of random variables using a compound Poisson distribution. We first study the expected utility maximization problem introduced recently [Li and Deshpande, FOCS11]. For monotone and Lipschitz utility functions, we obtain an additive PTAS if there is a multidimensional PTAS for the multi-objective version of the problem, strictly generalizing the previous result. For the stochastic bin packing problem (introduced in [Kleinberg, Rabani and Tardos, STOC97]), we show there is a polynomial time algorithm which uses at most the optimal number of bins, if we relax the size of each bin and the overflow probability by eps. For stochastic knapsack, we show a 1+eps-approximation using eps extra capacity, even when the size and reward of each item may be correlated and cancelations of items are allowed. This generalizes the previous work [Bhalgat, Goel and Khanna, SODA11] for the case without correlation and cancelation. Our algorithm is also simpler. We also present a factor 2+eps approximation algorithm for stochastic knapsack with cancelations, improving the current known approximation factor of 8 [Gupta, Krishnaswamy, Molinaro and Ravi, FOCS11]. Comment: 42 pages, 1 figure, Preliminary version appears in the Proceedings of the 45th ACM Symposium on the Theory of Computing (STOC13)
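
    The idea of replacing a convolution with a Poisson distribution can be illustrated on the simplest case of Le Cam's theorem: a sum of independent Bernoulli variables. The sketch below is an illustration only and does not reproduce the compound Poisson construction used in the paper. It computes the exact distribution of the sum by convolution and compares it with a Poisson distribution of the same mean. The function names are hypothetical.

```python
import math

def bernoulli_sum_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_i), by repeated convolution."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this variable is 0
            nxt[k + 1] += mass * p       # this variable is 1
        pmf = nxt
    return pmf

def poisson_pmf(lam, k):
    """Poisson(lam) probability of the value k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def total_variation_to_poisson(probs):
    """Total variation distance between the Bernoulli sum and Poisson(sum p_i)."""
    lam = sum(probs)
    pmf = bernoulli_sum_pmf(probs)
    n = len(probs)
    diff = sum(abs(pmf[k] - poisson_pmf(lam, k)) for k in range(n + 1))
    # Poisson mass above n, where the Bernoulli sum has no mass at all.
    tail = max(1.0 - sum(poisson_pmf(lam, k) for k in range(n + 1)), 0.0)
    return 0.5 * (diff + tail)
```

    Le Cam's bound states that this distance is at most the sum of the squared probabilities, so the approximation is tight when every p_i is small. Calling total_variation_to_poisson([0.05, 0.1, 0.02, 0.08]) and comparing the result with sum(p * p for p in probs) checks the bound numerically.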