An efficient algorithm for learning with semi-bandit feedback
We consider the problem of online combinatorial optimization under
semi-bandit feedback. The goal of the learner is to sequentially select its
actions from a combinatorial decision set so as to minimize its cumulative
loss. We propose a learning algorithm for this problem based on combining the
Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss
estimation procedure called Geometric Resampling (GR). Contrary to previous
solutions, the resulting algorithm can be efficiently implemented for any
decision set where efficient offline combinatorial optimization is possible at
all. Assuming that the elements of the decision set can be described with
d-dimensional binary vectors with at most m non-zero entries, we show that the
expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a
side result, we also improve the best known regret bounds for FPL in the full
information setting to O(m^(3/2) sqrt(T log d)), gaining a factor of sqrt(d/m)
over previous bounds for this algorithm.
Comment: submitted to ALT 201
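As a concrete illustration of FPL combined with Geometric Resampling, here is a minimal sketch for the simplest decision set, the m-subsets of {1,...,d} (so the offline oracle is just an argpartition). The learning rate eta and the resampling cap are illustrative choices, not the tuned values from the paper.

```python
import numpy as np

def fpl_action(loss_est, eta, m, rng):
    """Follow-the-Perturbed-Leader: perturb the cumulative loss estimates
    with i.i.d. exponential noise and call the offline oracle, which for
    the 'pick the m best of d coordinates' decision set is an argpartition."""
    z = rng.exponential(size=loss_est.shape[0])
    chosen = np.argpartition(eta * loss_est - z, m)[:m]
    v = np.zeros(loss_est.shape[0])
    v[chosen] = 1.0
    return v

def geometric_resampling(loss_est, eta, m, i, cap, rng):
    """Geometric Resampling: estimate 1/p_i (the inverse probability that
    coordinate i appears in the FPL action) by redrawing the perturbed
    leader until i shows up; the count is geometric with mean 1/p_i.
    The cap keeps the running time and the estimate bounded."""
    for k in range(1, cap + 1):
        if fpl_action(loss_est, eta, m, rng)[i] == 1.0:
            return k
    return cap

# Illustrative parameters (not the tuned values from the paper).
d, m, T, eta, cap = 10, 3, 200, 0.1, 50
rng = np.random.default_rng(0)
loss_est = np.zeros(d)
total_loss = 0.0
for t in range(T):
    losses = rng.uniform(size=d)          # stand-in for an arbitrary loss sequence
    v = fpl_action(loss_est, eta, m, rng)
    total_loss += losses @ v              # semi-bandit: only chosen coords observed
    ks = {i: geometric_resampling(loss_est, eta, m, i, cap, rng)
          for i in np.flatnonzero(v)}
    for i, k in ks.items():
        loss_est[i] += k * losses[i]      # GR loss estimate, unbiased up to the cap
```

Note the key property claimed in the abstract: the only access to the decision set is through the offline optimization call inside `fpl_action`.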
Best-of-Both-Worlds Algorithms for Partial Monitoring
This study considers the partial monitoring problem with k actions and d outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is O(m^2 k^4 log(T) log(k_Pi T) / Delta_min) in the stochastic regime and O(m k^(2/3) sqrt(T log(T) log k_Pi)) in the adversarial regime, where T is the number of rounds, m is the maximum number of distinct observations per action, Delta_min is the minimum suboptimality gap, and k_Pi is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is O(c_G^2 m^2 log(T) log(k_Pi T) / Delta_min^2) in the stochastic regime and O((c_G^2 m^2 log(T) log(k_Pi T))^(1/3) T^(2/3)) in the adversarial regime, where c_G is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
Comment: 31 pages
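The follow-the-regularized-leader framework the abstract builds on can be shown in its simplest form: FTRL on the probability simplex with a negative-entropy regularizer and a decreasing (adaptive) learning rate, whose closed-form iterate is exponential weights on the cumulative loss. This is only the generic template, not the paper's partial-monitoring algorithm, which additionally handles the feedback structure via exploration by optimization.

```python
import numpy as np

def ftrl_neg_entropy(loss_matrix, c=1.0):
    """Follow-the-Regularized-Leader on the probability simplex with the
    negative-entropy regularizer and adaptive learning rate eta_t ~ 1/sqrt(t).
    With this regularizer the FTRL optimum has a closed form: exponential
    weights on the cumulative loss.  Returns the realized regret against
    the best fixed action in hindsight."""
    T, k = loss_matrix.shape
    cum = np.zeros(k)
    total = 0.0
    for t in range(T):
        eta = c / np.sqrt(t + 1.0)               # adaptive learning rate
        w = np.exp(-eta * (cum - cum.min()))     # shift for numerical stability
        p = w / w.sum()                          # FTRL iterate on the simplex
        total += p @ loss_matrix[t]
        cum += loss_matrix[t]
    return total - loss_matrix.sum(axis=0).min()
```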
Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method
We give a new approach to the dictionary learning (also known as "sparse coding") problem of recovering an unknown n x m matrix A (for m >= n) from examples of the form y = Ax + e, where x is a random vector in R^m with at most tau*m nonzero coordinates, and e is a random noise vector in R^n with bounded magnitude. For the case m = O(n), our algorithm recovers every column of A within arbitrarily good constant accuracy in time m^O(log m / log(1/tau)), in particular achieving polynomial time if tau = m^(-delta) for any delta > 0, and time m^O(log m) if tau is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector x to be much sparser -- at most sqrt(n) nonzero coordinates -- and there were intrinsic barriers preventing these algorithms from applying for denser x.
We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor T, given access to a tensor T' that is tau-close to T in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of T and T' have similar structures.
Our algorithm is based on a novel approach to using and analyzing the Sum of
Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and
it can be viewed as an indication of the utility of this very general and
powerful tool for unsupervised learning problems.
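For intuition about "tensor decomposition under spectral-norm noise", here is the easy exact-rank-one warm-up, not the SoS algorithm (which works under far more general conditions): unfold the noisy tensor into a matrix and read off the top singular vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
a = rng.standard_normal(n)
a /= np.linalg.norm(a)

T3 = np.einsum('i,j,k->ijk', a, a, a)             # exact rank-one tensor a (x) a (x) a
E = rng.standard_normal((n, n, n))
E *= 0.01 / np.linalg.norm(E.reshape(n, -1), 2)   # noise with spectral norm 0.01
M = (T3 + E).reshape(n, n * n)                    # the tensor "considered as a matrix"

u = np.linalg.svd(M, full_matrices=False)[0][:, 0]
a_hat = u * np.sign(u @ a)                        # top singular vector, sign fixed
err = np.linalg.norm(a_hat - a)                   # small, by matrix perturbation bounds
```

In the regime the abstract targets (constant spectral-norm noise, overcomplete dictionaries) this naive unfolding breaks down, which is exactly where the Sum-of-Squares machinery comes in.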
Approximations for Throughput Maximization
In this paper we study the classical problem of throughput maximization. In this problem we have a collection of n jobs, each having a release time r_j, deadline d_j, and processing time p_j. They have to be scheduled non-preemptively on m identical parallel machines. The goal is to find a schedule which maximizes the number of jobs scheduled entirely in their [r_j, d_j] window. This problem has been studied extensively (even for the case of m = 1). Several special cases of the problem remain open. Bar-Noy et al. [STOC1999] presented an algorithm with ratio 1 - 1/(1 + 1/m)^m for m machines, which approaches 1 - 1/e as m increases. For m = 1, Chuzhoy-Ostrovsky-Rabani [FOCS2001] presented an algorithm with approximation ratio 1 - 1/e - eps (for any eps > 0). Recently Im-Li-Moseley [IPCO2017] presented an algorithm with ratio 1 - 1/e + eps_0 for some absolute constant eps_0 > 0 for any fixed m. They also presented an algorithm with ratio 1 - O(sqrt(log m / m)) - eps for general m which approaches 1 as m grows. The approximability of the problem for m = 1 remains a major open question. Even for the case of m = 1 and c = 3 distinct processing times the problem is open (Sgall [ESA2012]). In this paper we study the case of m = 1 and show that if there are c distinct processing times, i.e. the p_j's come from a set of size c, then there is a (1 - eps)-approximation that runs in time O(n^poly(c/eps) log T), where T is the largest deadline. Therefore, for constant c and constant eps this yields a PTAS. Our algorithm is based on proving structural properties for a near optimum solution that allow one to use dynamic programming with pruning.
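To make the scheduling model concrete, here is a simple deadline-ordered greedy baseline for the m = 1 case. It is only a heuristic illustrating the input model, not the paper's structured dynamic program, and it is not a (1 - eps)-approximation in general.

```python
from typing import List, Tuple

def edf_greedy(jobs: List[Tuple[float, float, float]]) -> int:
    """Heuristic baseline for single-machine throughput maximization:
    consider jobs in deadline order and start each at the earliest
    feasible time, keeping it only if it fits entirely inside its
    window.  Each job j is a triple (release r_j, deadline d_j,
    processing time p_j) and runs non-preemptively."""
    free_at = 0.0            # time the machine becomes free
    scheduled = 0
    for r, d, p in sorted(jobs, key=lambda j: j[1]):
        start = max(free_at, r)
        if start + p <= d:   # job fits entirely in its [r, d] window
            free_at = start + p
            scheduled += 1
    return scheduled
```

For example, `edf_greedy([(0, 2, 2), (0, 3, 1), (2, 5, 2)])` schedules all three jobs back to back, while a job with p_j > d_j - r_j can never be scheduled.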
Stackelberg Network Pricing Games
We study a multi-player one-round game termed Stackelberg Network Pricing
Game, in which a leader can set prices for a subset of priceable edges in a
graph. The other edges have a fixed cost. Based on the leader's decision one or
more followers optimize a polynomial-time solvable combinatorial minimization
problem and choose a minimum cost solution satisfying their requirements based
on the fixed costs and the leader's prices. The leader receives as revenue the
total amount of prices paid by the followers for priceable edges in their
solutions, and the problem is to find revenue maximizing prices. Our model
extends several known pricing problems, including single-minded and unit-demand
pricing, as well as Stackelberg pricing for certain follower problems like
shortest path or minimum spanning tree. Our first main result is a tight analysis of a single-price algorithm for the single follower game, which provides a (1 + eps) log m-approximation for any eps > 0, where m is the number of priceable edges. This can be extended to provide a (1 + eps)(log k + log m)-approximation for the general problem and k followers. The latter result is essentially best possible, as the problem is shown to be hard to approximate within O(log^eps k + log^eps m). If followers have demands, the single-price algorithm provides a (1 + eps) m^2-approximation, and the problem is hard to approximate within O(m^eps) for some eps > 0. Our second main result is a polynomial time algorithm for revenue maximization in the special case of Stackelberg bipartite vertex cover, which is based on non-trivial max-flow and LP-duality techniques. Our results can be extended to provide constant-factor approximations for any constant number of followers.
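The single-price idea can be sketched for a shortest-path follower: put one uniform price on every priceable edge, let the follower re-optimize, and keep the price with the highest revenue. In the actual analysis the candidate prices come from the follower's cost structure; the candidate list and the tiny graph below are purely illustrative.

```python
import heapq
from itertools import count

def follower_best_response(adj, src, dst, price):
    """Follower: Dijkstra over edge costs, where every priceable edge
    costs `price` and fixed edges keep their own cost.  Returns the
    min path cost and the number of priceable edges on that path.
    adj maps node -> list of (neighbor, fixed_cost, is_priceable)."""
    tie = count()                            # tie-breaker for heap entries
    heap = [(0.0, next(tie), src, 0)]
    done = set()
    while heap:
        cost, _, u, k = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            return cost, k
        for v, w, is_priceable in adj[u]:
            c = price if is_priceable else w
            heapq.heappush(heap, (cost + c, next(tie), v, k + is_priceable))
    return float('inf'), 0

def single_price(adj, src, dst, candidates):
    """Single-price algorithm: charge the same price on all priceable
    edges, try each candidate price, keep the revenue maximizer
    (revenue = price times the number of priceable edges bought)."""
    best_price, best_rev = 0.0, 0.0
    for p in candidates:
        _, k = follower_best_response(adj, src, dst, p)
        if p * k > best_rev:
            best_price, best_rev = p, p * k
    return best_price, best_rev
```

On a graph with a fixed s-t edge of cost 5 and a two-edge priceable s-a-t path, the best uniform price is the largest candidate keeping the priceable path strictly cheaper than 5.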
An Efficient Interior-Point Method for Online Convex Optimization
A new algorithm for regret minimization in online convex optimization is described. The regret of the algorithm after T time periods is O(sqrt(T log T)), which is the minimum possible up to a logarithmic term. In addition, the new algorithm is adaptive, in the sense that the regret bounds hold not only for the T time periods but also for every sub-interval [s, t]. The running time of the algorithm matches that of newly introduced interior point algorithms for regret minimization: in n-dimensional space, during each iteration the new algorithm essentially solves a system of linear equations of order n, rather than solving some constrained convex optimization problem in n dimensions and possibly many constraints.
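To illustrate the "one linear system of order n per iteration" cost model, here is a sketch of an online-Newton-style update on a toy quadratic loss over the unit ball. This is not the paper's interior-point method, only a minimal example with the same per-round computational footprint.

```python
import numpy as np

def ons_quadratic(c, rounds=50, gamma=1.0, eps=1.0):
    """Online-Newton-style updates on the toy quadratic loss
    f(x) = ||x - c||^2 over the unit ball.  Each round does a rank-one
    update of the curvature matrix A and one linear solve of order n;
    no constrained convex program in n dimensions is solved."""
    n = c.shape[0]
    A = eps * np.eye(n)                         # regularized curvature estimate
    x = np.zeros(n)
    for _ in range(rounds):
        g = 2.0 * (x - c)                       # gradient of this round's loss
        A += np.outer(g, g)                     # rank-one curvature update
        x = x - np.linalg.solve(A, g) / gamma   # the order-n linear system
        nrm = np.linalg.norm(x)
        if nrm > 1.0:                           # projection back onto the unit ball
            x /= nrm
    return x
```

The contrast the abstract draws is exactly this: `np.linalg.solve` on an n-by-n system per round, versus invoking a general constrained convex solver per round.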