Online combinatorial optimization with stochastic decision sets and adversarial losses
Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.
Cakewalk Sampling
We study the task of finding good local optima in combinatorial optimization
problems. Although combinatorial optimization is NP-hard in general, locally
optimal solutions are frequently used in practice. Local search methods however
typically converge to a limited set of optima that depend on their
initialization. Sampling methods on the other hand can access any valid
solution, and thus can be used either directly or alongside methods of the
former type as a way for finding good local optima. Since the effectiveness of
this strategy depends on the sampling distribution, we derive a robust learning
algorithm that adapts sampling distributions towards good local optima of
arbitrary objective functions. As a first use case, we empirically study the
efficiency with which sampling methods can recover locally maximal cliques in
undirected graphs. Not only do we show how our adaptive sampler outperforms
related methods, we also show how it can even approach the performance of
established clique algorithms. As a second use case, we consider how greedy
algorithms can be combined with our adaptive sampler, and we demonstrate how
this leads to superior performance in k-medoid clustering. Together, these
findings suggest that our adaptive sampler can provide an effective strategy for
combinatorial optimization problems that arise in practice. Comment: Accepted as a conference paper by AAAI-2020 (oral presentation)
First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under
semi-bandit feedback, where a learner has to repeatedly pick actions from a
combinatorial decision set in order to minimize the total losses associated
with its decisions. After making each decision, the learner observes the losses
associated with its action, but not other losses. For this problem, there are
several learning algorithms that guarantee that the learner's expected regret
grows as $\widetilde{O}(\sqrt{T})$ with the number of rounds $T$. In this
paper, we propose an algorithm that improves this scaling to
$\widetilde{O}(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best
action. Our algorithm is among the first to achieve such guarantees in a
partial-feedback scheme, and the first one to do so in a combinatorial setting. Comment: To appear at COLT 201
An efficient algorithm for learning with semi-bandit feedback
We consider the problem of online combinatorial optimization under
semi-bandit feedback. The goal of the learner is to sequentially select its
actions from a combinatorial decision set so as to minimize its cumulative
loss. We propose a learning algorithm for this problem based on combining the
Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss
estimation procedure called Geometric Resampling (GR). Contrary to previous
solutions, the resulting algorithm can be efficiently implemented for any
decision set where efficient offline combinatorial optimization is possible at
all. Assuming that the elements of the decision set can be described with
d-dimensional binary vectors with at most m non-zero entries, we show that the
expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a
side result, we also improve the best known regret bounds for FPL in the full
information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m)
over previous bounds for this algorithm. Comment: submitted to ALT 201
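The combination of FPL and Geometric Resampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the decision set (all m-subsets of d items), the helper names `oracle`, `fpl_draw`, `geometric_resampling` and `run_fpl_gr`, the exponential perturbations, and the parameters `eta` (learning rate) and `M` (resampling cap) are assumptions chosen for illustration.

```python
import random

def oracle(scores, m):
    """Hypothetical offline optimizer for m-subsets: the m smallest scores."""
    return set(sorted(range(len(scores)), key=lambda i: scores[i])[:m])

def fpl_draw(L_hat, eta, m, rng):
    """Follow the perturbed leader: minimize eta * L_hat[i] - Z[i], Z[i] ~ Exp(1)."""
    scores = [eta * L - rng.expovariate(1.0) for L in L_hat]
    return oracle(scores, m)

def geometric_resampling(L_hat, eta, m, played, M, rng):
    """For each played coordinate, count fresh FPL draws until that coordinate
    is selected again, capped at M. The count stands in for 1 / P(i is played)."""
    K = {}
    waiting = set(played)
    for k in range(1, M + 1):
        if not waiting:
            break
        sample = fpl_draw(L_hat, eta, m, rng)
        for i in list(waiting):
            if i in sample:
                K[i] = k
                waiting.remove(i)
    for i in waiting:          # never reappeared within M draws
        K[i] = M
    return K

def run_fpl_gr(losses, m, eta, M, seed=0):
    """Run the sketch on a list of per-round loss vectors (semi-bandit feedback)."""
    rng = random.Random(seed)
    d = len(losses[0])
    L_hat = [0.0] * d          # cumulative loss estimates
    total = 0.0
    for loss in losses:
        played = fpl_draw(L_hat, eta, m, rng)
        total += sum(loss[i] for i in played)
        K = geometric_resampling(L_hat, eta, m, played, M, rng)
        for i in played:       # only losses of played coordinates are observed
            L_hat[i] += K[i] * loss[i]
    return total, L_hat

# Usage: two loss-free coordinates among five, choosing m = 2 per round.
losses = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(200)]
total, L_hat = run_fpl_gr(losses, m=2, eta=0.1, M=50, seed=1)
```

The sketch only ever calls the offline oracle, which mirrors the abstract's point that the method is implementable wherever offline optimization is efficient; swapping `oracle` for a shortest-path or spanning-tree solver would change the decision set without touching the rest.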
Stochastic Combinatorial Optimization via Poisson Approximation
We study several stochastic combinatorial problems, including the expected
utility maximization problem, the stochastic knapsack problem and the
stochastic bin packing problem. A common technical challenge in these problems
is to optimize some function of the sum of a set of random variables. The
difficulty is mainly due to the fact that the probability distribution of the
sum is the convolution of a set of distributions, which is not an easy
objective function to work with. To tackle this difficulty, we introduce the
Poisson approximation technique. The technique is based on the Poisson
approximation theorem discovered by Le Cam, which enables us to approximate the
distribution of the sum of a set of random variables using a compound Poisson
distribution.
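As a small numerical illustration of the idea, the sketch below covers only the simplest special case, a sum of independent Bernoulli variables approximated by an ordinary Poisson distribution, and not the compound Poisson construction the abstract refers to. It computes the exact distribution of the sum by convolution and compares it with Poisson(λ), λ = Σ p_i, against Le Cam's bound Σ_k |P(S = k) − Poisson_λ(k)| < 2 Σ p_i². The function names and the example probabilities are assumptions made for illustration.

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i), built by convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)   # this variable contributes 0
            nxt[k + 1] += q * p       # this variable contributes 1
        pmf = nxt
    return pmf

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def l1_distance(ps):
    """L1 distance between the exact pmf of the sum and Poisson(sum(ps))."""
    exact = bernoulli_sum_pmf(ps)
    lam = sum(ps)
    n = len(ps)
    head = sum(abs(exact[k] - poisson_pmf(lam, k)) for k in range(n + 1))
    # Poisson mass above n, where the exact pmf is zero
    tail = max(1.0 - sum(poisson_pmf(lam, k) for k in range(n + 1)), 0.0)
    return head + tail

# Usage: small success probabilities make the Poisson approximation tight.
ps = [0.05] * 20 + [0.1] * 5
distance = l1_distance(ps)
le_cam_bound = 2 * sum(p * p for p in ps)
```

The convolution step is what makes the exact distribution awkward as an optimization objective, while the Poisson side depends on the inputs only through λ.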
We first study the expected utility maximization problem introduced recently
[Li and Deshpande, FOCS11]. For monotone and Lipschitz utility functions, we
obtain an additive PTAS if there is a multidimensional PTAS for the
multi-objective version of the problem, strictly generalizing the previous
result.
For the stochastic bin packing problem (introduced in [Kleinberg, Rabani and
Tardos, STOC97]), we show there is a polynomial time algorithm which uses at
most the optimal number of bins, if we relax the size of each bin and the
overflow probability by eps.
For stochastic knapsack, we show a 1+eps-approximation using eps extra
capacity, even when the size and reward of each item may be correlated and
cancelations of items are allowed. This generalizes the previous work [Bhalgat,
Goel and Khanna, SODA11] for the case without correlation and cancelation. Our
algorithm is also simpler. We also present a factor 2+eps approximation
algorithm for stochastic knapsack with cancelations, improving the currently known
approximation factor of 8 [Gupta, Krishnaswamy, Molinaro and Ravi, FOCS11]. Comment: 42 pages, 1 figure. Preliminary version appears in the Proceedings of
the 45th ACM Symposium on the Theory of Computing (STOC13)