Label optimal regret bounds for online local learning
We resolve an open question from (Christiano, 2014b) posed in COLT'14
regarding the optimal dependency of the regret achievable for online local
learning on the size of the label set. In this framework the algorithm is shown
a pair of items at each step, chosen from a set of $n$ items. The learner then
predicts a label for each item from a label set of size $L$ and receives a
real-valued payoff. This is a natural framework which captures many interesting
scenarios such as collaborative filtering, online gambling, and online max cut,
among others. (Christiano, 2014a) designed an efficient online learning
algorithm for this problem achieving a regret of $O(\sqrt{nL^3T})$, where $T$
is the number of rounds. Information theoretically, one can achieve a regret of
$O(\sqrt{n \log L \cdot T})$. One of the main open questions left in this framework
concerns closing the above gap.
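A standard way to see why a bound growing only with the logarithm of the label-set size is possible information theoretically is to run exponential weights (Hedge) with one expert per labeling of the items. There are $N = L^n$ labelings, so the usual Hedge guarantee $\sqrt{(T/2)\ln N}$ for payoffs in $[0,1]$ grows as $\sqrt{n \ln L \cdot T}$, but the running time is exponential in $n$. The sketch below is a hypothetical illustration of that inefficient baseline, not Christiano's SDP-based algorithm; the function name `hedge_local_learning`, the payoff-table format, full-information feedback, and payoffs in [0, 1] are all assumptions made for this example.

```python
import itertools
import math
import random


def hedge_local_learning(n, L, rounds, eta):
    """Hypothetical sketch: exponential weights over all L**n labelings.

    rounds: list of (i, j, payoff) where payoff[a][b] in [0, 1] is the reward
    for giving item i label a and item j label b (full information assumed).
    Returns the learner's expected regret against the best fixed labeling.
    """
    labelings = list(itertools.product(range(L), repeat=n))  # N = L**n experts
    cum = [0.0] * len(labelings)  # cumulative payoff of each labeling
    learner = 0.0                 # expected cumulative payoff of the learner
    for i, j, payoff in rounds:
        top = max(cum)  # shift exponents for numerical stability
        w = [math.exp(eta * (c - top)) for c in cum]
        z = sum(w)
        gains = [payoff[lab[i]][lab[j]] for lab in labelings]
        learner += sum(wk * g for wk, g in zip(w, gains)) / z
        cum = [c + g for c, g in zip(cum, gains)]
    return max(cum) - learner


# Usage: a small random instance with n = 4 items and L = 3 labels.
rng = random.Random(0)
n, L, T = 4, 3, 200
rounds = []
for _ in range(T):
    i, j = rng.sample(range(n), 2)
    payoff = [[rng.random() for _ in range(L)] for _ in range(L)]
    rounds.append((i, j, payoff))

N = L ** n
eta = math.sqrt(8 * math.log(N) / T)   # standard tuning for a known horizon T
regret = hedge_local_learning(n, L, rounds, eta)
bound = math.sqrt(T * math.log(N) / 2)
print(f"regret={regret:.2f}  bound={bound:.2f}")
```

The learning rate $\eta = \sqrt{8\ln N / T}$ balances the $\ln N/\eta$ and $\eta T/8$ terms of the Hedge analysis, which is where the $\sqrt{(T/2)\ln N}$ bound comes from.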
In this work, we provide a complete answer to the question above via two main
results. First, we show, via a tighter analysis, that the semi-definite programming
based algorithm of (Christiano, 2014a) in fact achieves a regret of
$O(\sqrt{nLT})$. Second, we show a matching computational lower bound. Namely,
we show that a polynomial time algorithm for online local learning with lower
regret would imply a polynomial time algorithm for the planted clique problem
which is widely believed to be hard. We prove a similar hardness result under a
related conjecture concerning planted dense subgraphs that we put forth. Unlike
planted clique, the planted dense subgraph problem does not have any known
quasi-polynomial time algorithms.
Computational lower bounds for online learning are relatively rare, and we
hope that the ideas developed in this work will lead to lower bounds for other
online learning scenarios as well.
Comment: 13 pages; changes from the previous version: small changes to the proofs of
Theorems 1 & 2 and a small rewrite of the introduction (this version is the
same as the camera-ready copy in COLT '15).
One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits
We address the problem of 'Internal Regret' in Sleeping Bandits
in the fully adversarial setup, and draw connections between different
existing notions of sleeping regret in the multi-armed bandits (MAB) literature,
analyzing their implications. Our first contribution is to propose
the new notion of Internal Regret for sleeping MAB. We then propose an
algorithm that yields sublinear regret in that measure, even for a completely
adversarial sequence of losses and availabilities. We further show that a low
sleeping internal regret always implies a low external regret, as well as a
low policy regret for i.i.d. sequences of losses. The main contribution of this
work lies precisely in unifying different existing notions of regret in
sleeping bandits and understanding how one implies another. Finally, we
also extend our results to the setting of Dueling Bandits (DB), a
preference feedback variant of MAB, and propose a reduction-to-MAB approach to
design a low-regret algorithm for sleeping dueling bandits with stochastic
preferences and adversarial availabilities. The efficacy of our algorithms is
demonstrated through empirical evaluations.
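The abstract does not spell out the regret definitions. One common way to formalize them on a realized trajectory, assumed here and possibly differing from the paper's exact definitions (which may, for example, take expectations over the learner's randomization), is: the sleeping external regret against arm $j$ sums $\ell_t(i_t) - \ell_t(j)$ over the rounds where $j$ is available, and the sleeping internal regret for a pair $(i, j)$ sums $\ell_t(i) - \ell_t(j)$ over the rounds where the learner played $i$ while $j$ was available. Under these assumed definitions the external regret against $j$ is the sum over $i$ of the internal terms, so it is at most $K$ times the largest internal term, one elementary sense in which low internal regret implies low external regret. The function name `sleeping_regrets` and the uniform-random learner are hypothetical.

```python
import random


def sleeping_regrets(plays, availability, losses, K):
    """Realized sleeping external and internal regret (assumed definitions).

    plays[t]        -- arm played at round t (must be in availability[t])
    availability[t] -- set of arms awake at round t
    losses[t][k]    -- loss of arm k at round t
    """
    external = [0.0] * K                      # external[j]
    internal = [[0.0] * K for _ in range(K)]  # internal[i][j]
    for i_t, awake, loss in zip(plays, availability, losses):
        for j in awake:
            diff = loss[i_t] - loss[j]
            external[j] += diff         # compare against j whenever j is awake
            internal[i_t][j] += diff    # ... attributed to the arm actually played
    return external, internal


# Usage: a learner that plays uniformly at random among the awake arms.
rng = random.Random(1)
K, T = 4, 500
availability, plays, losses = [], [], []
for _ in range(T):
    awake = {k for k in range(K) if rng.random() < 0.6} or {rng.randrange(K)}
    availability.append(awake)
    plays.append(rng.choice(sorted(awake)))
    losses.append([rng.random() for _ in range(K)])

ext, intl = sleeping_regrets(plays, availability, losses, K)
max_internal = max(max(row) for row in intl)
# External regret against any arm j is a sum of K internal terms.
print(max(ext), K * max_internal)
```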
Near-Optimal Algorithms for Online Matrix Prediction
In several online prediction problems of recent interest the comparison class
is composed of matrices with bounded entries. For example, in the online
max-cut problem, the comparison class is matrices which represent cuts of a
given graph and in online gambling the comparison class is matrices which
represent permutations over n teams. Another important example is online
collaborative filtering in which a widely used comparison class is the set of
matrices with a small trace norm. In this paper we isolate a property of
matrices, which we call $(\beta,\tau)$-decomposability, and derive an efficient
online learning algorithm that enjoys a regret bound of $O^*(\sqrt{\beta \tau T})$
for all problems in which the comparison class is composed of
$(\beta,\tau)$-decomposable matrices. By analyzing the decomposability of cut
matrices, triangular matrices, and low trace-norm matrices, we derive near-optimal
regret bounds for online max-cut, online gambling, and online
collaborative filtering. In particular, this resolves (in the affirmative) an
open problem posed by Abernethy (2010) and Kleinberg et al. (2010). Finally, we
derive lower bounds for the three problems and show that our upper bounds are
optimal up to logarithmic factors. In particular, our lower bound for the
online collaborative filtering problem resolves another open problem posed by
Shamir and Srebro (2011).
Comment: 25 pages
Online Optimization of Smoothed Piecewise Constant Functions
We study online optimization of smoothed piecewise constant functions over
the domain [0, 1). This is motivated by the problem of adaptively picking
parameters of learning algorithms as in the recently introduced framework by
Gupta and Roughgarden (2016). The majority of the machine learning literature has
focused on Lipschitz-continuous functions or functions with bounded gradients.
This is with good reason: any learning algorithm suffers linear regret even
against piecewise constant functions that are chosen adversarially, arguably
the simplest of non-Lipschitz continuous functions. The smoothed setting we
consider is inspired by the seminal work of Spielman and Teng (2004) and the
recent work of Gupta and Roughgarden. In this setting, the sequence of
functions may be chosen by an adversary, but with some uncertainty in the
location of discontinuities. We give algorithms that achieve sublinear regret
in the full information and bandit settings.
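In the full-information setting, one natural mechanism for piecewise constant payoffs on [0, 1) is continuous exponential weights: sample $x$ with density proportional to $\exp(\eta \sum_s f_s(x))$. Because the cumulative payoff is itself piecewise constant, this density can be computed exactly by merging breakpoints. The sketch below illustrates only that mechanism; it is an assumption that it resembles the authors' algorithm, it does not reproduce their smoothed regret analysis or the bandit case, and the function names and the (breakpoints, values) representation are hypothetical.

```python
import bisect
import math
import random


def piece_value(breaks, values, x):
    """Value at x of a piecewise constant function on [0, 1).

    breaks[k] is the left endpoint of piece k (breaks[0] must be 0.0)."""
    return values[bisect.bisect_right(breaks, x) - 1]


def exp_weights_distribution(functions, eta):
    """Return [(left, right, prob)] for density proportional to exp(eta * sum f)."""
    points = sorted({b for breaks, _ in functions for b in breaks} | {0.0, 1.0})
    intervals = []
    for left, right in zip(points, points[1:]):
        mid = (left + right) / 2  # cumulative payoff is constant on [left, right)
        total = sum(piece_value(b, v, mid) for b, v in functions)
        intervals.append((left, right, total))
    top = max(t for _, _, t in intervals)  # shift exponents for stability
    masses = [(r - l) * math.exp(eta * (t - top)) for l, r, t in intervals]
    z = sum(masses)
    return [(l, r, m / z) for (l, r, _), m in zip(intervals, masses)]


def sample(dist, rng):
    """Pick an interval by probability, then a uniform point inside it."""
    u, acc = rng.random(), 0.0
    for left, right, p in dist:
        acc += p
        if u < acc:
            return rng.uniform(left, right)
    left, right, _ = dist[-1]  # guard against floating-point round-off
    return rng.uniform(left, right)


# Usage: two payoff functions seen so far.
f1 = ([0.0, 0.3, 0.7], [0.0, 1.0, 0.0])  # payoff 1 on [0.3, 0.7)
f2 = ([0.0, 0.5], [1.0, 0.0])            # payoff 1 on [0, 0.5)
dist = exp_weights_distribution([f1, f2], eta=1.0)
x = sample(dist, random.Random(0))
print(dist, x)
```

Here the interval [0.3, 0.5), where both payoffs are 1, receives the largest probability mass.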
Online combinatorial optimization with stochastic decision sets and adversarial losses
Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.
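As a rough illustration of the Follow-The-Perturbed-Leader idea under stochastic availability, the sketch below assumes full information, components that are each available independently with a fixed probability every round, and an action consisting of the $m$ available components with the smallest perturbed cumulative loss (a simple stand-in for a general combinatorial decision set). It deliberately omits the paper's Counting Asleep Times loss estimation, which targets the harder (semi-)bandit and restricted information settings. The names `ftpl_choose` and `simulate`, the exponential perturbations, and all parameter values are hypothetical.

```python
import random


def ftpl_choose(cum_loss, available, m, rate, rng):
    """Pick the m available components with the smallest perturbed cumulative loss.

    Perturbations are i.i.d. exponential with the given rate (an assumed choice)."""
    perturbed = {i: cum_loss[i] - rng.expovariate(rate) for i in available}
    return sorted(sorted(available, key=lambda i: perturbed[i])[:m])


def simulate(d=6, m=2, T=1000, p_awake=0.7, rate=0.1, seed=0):
    """Run FTPL with stochastic availability and full-information losses."""
    rng = random.Random(seed)
    cum_loss = [0.0] * d
    total = 0.0
    for _ in range(T):
        available = [i for i in range(d) if rng.random() < p_awake]
        losses = [rng.random() for _ in range(d)]
        chosen = ftpl_choose(cum_loss, available, m, rate, rng)
        total += sum(losses[i] for i in chosen)
        for i in range(d):  # full information: every component's loss is revealed
            cum_loss[i] += losses[i]
    return total


print(simulate())
```

If fewer than $m$ components are awake, the sketch simply plays all of them; a real combinatorial decision set would need its own feasibility rule.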
Open Problem: Online Sabotaged Shortest Path
There has been much work on extending the prediction with expert advice methodology to the case when experts are composed of components and there are combinatorially many such experts. One of the core examples is the Online Shortest Path problem where the components are edges and the experts are paths. In this note we revisit this online routing problem in the case where in each trial some of the edges or components are sabotaged (blocked). In the vanilla expert setting a known method can solve this extension where experts are now awake or asleep in each trial. We ask whether this technology can be upgraded efficiently to the case when at each trial every component can be awake or asleep. It is easy to get an initial regret bound by using combinatorially many experts. However, it is open whether there are efficient algorithms achieving the same regret.
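The note does not name the known method for the vanilla expert setting; presumably it is the sleeping-experts (specialists) variant of exponential weights, in which only awake experts are used for prediction and updated, and the awake experts' weights are rescaled so that their total mass is unchanged. The sketch below applies that update naively with one expert per s-t path, where a path is awake if none of its edges is blocked; this is the inefficient "combinatorially many experts" route the note mentions. The toy graph, the loss normalization by the longest path length, and the function name `specialists_hedge` are assumptions for illustration.

```python
import math


def specialists_hedge(n_experts, rounds, eta):
    """Sleeping-experts exponential weights.

    rounds: list of (awake, losses) with awake a non-empty set of expert
    indices and losses[k] in [0, 1] for every awake expert k.
    Returns the final weights and the learner's expected cumulative loss.
    """
    w = [1.0 / n_experts] * n_experts
    learner_loss = 0.0
    for awake, losses in rounds:
        mass = sum(w[k] for k in awake)
        # Predict with the awake experts only, proportionally to their weights.
        learner_loss += sum(w[k] * losses[k] for k in awake) / mass
        # Multiplicative update for awake experts, then rescale so their total
        # mass is preserved; asleep experts keep their weights untouched.
        updated = {k: w[k] * math.exp(-eta * losses[k]) for k in awake}
        scale = mass / sum(updated.values())
        for k, v in updated.items():
            w[k] = v * scale
    return w, learner_loss


# Usage: experts are the s-t paths of a toy graph; a path sleeps if any edge is blocked.
paths = [("s-a", "a-t"), ("s-b", "b-t"), ("s-a", "a-b", "b-t")]
trials = [  # (blocked edges, edge losses in [0, 1])
    ({"a-t"}, {"s-a": 0.2, "a-t": 0.9, "s-b": 0.5, "b-t": 0.1, "a-b": 0.3}),
    (set(), {"s-a": 0.1, "a-t": 0.1, "s-b": 0.8, "b-t": 0.4, "a-b": 0.2}),
    ({"s-b"}, {"s-a": 0.3, "a-t": 0.6, "s-b": 0.2, "b-t": 0.2, "a-b": 0.1}),
]
max_len = max(len(p) for p in paths)
rounds = []
for blocked, edge_loss in trials:
    awake = {k for k, p in enumerate(paths) if not blocked.intersection(p)}
    # Path loss = total edge loss, scaled into [0, 1] by the longest path length.
    losses = {k: sum(edge_loss[e] for e in paths[k]) / max_len for k in awake}
    rounds.append((awake, losses))

weights, loss = specialists_hedge(len(paths), rounds, eta=0.5)
print(weights, loss)
```

Enumerating paths is exponential in general graphs, which is exactly why the note asks for an efficient component-level algorithm.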