A Wide Range No-Regret Theorem
In a sequential decision problem, at each stage a decision maker takes a decision based on the history and receives a payoff that also depends on the realized state of nature. A strategy f is said to be as good as an alternative strategy g at a sequence of states if, in the long run, f does on average at least as well as g does. It is shown that for any distribution P over the alternative strategies there is a strategy f which is, at any sequence of states, as good as P-almost any alternative g. Keywords: no-regret, approachability, large spaces
Fighting Bandits with a New Kind of Smoothness
We define a novel family of algorithms for the adversarial multi-armed bandit
problem, and provide a simple analysis technique based on convex smoothing. We
prove two main results. First, we show that regularization via the
Tsallis entropy, which includes EXP3 as a special case, achieves the
minimax regret. Second, we show that a wide class of
perturbation methods achieve near-optimal regret if the perturbation distribution has a bounded hazard rate. For example,
the Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this
key property. Comment: In Proceedings of NIPS, 201
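As a rough illustration of the EXP3 special case named in the abstract above, the following is a minimal Python sketch of exponential weights with importance-weighted loss estimates. It is not the paper's Tsallis-entropy or perturbation algorithm. The function names, the loss sequence `demo_losses`, and the learning-rate choice are assumptions made for the example.

```python
import math
import random

def exp3(loss_fn, n_arms, horizon, eta, rng):
    """Run EXP3 with importance-weighted loss estimates; return the arms pulled."""
    cum_est = [0.0] * n_arms  # cumulative estimated loss of each arm
    pulls = []
    for t in range(horizon):
        # Exponential weights over estimated losses (shifted for numerical stability).
        m = min(cum_est)
        weights = [math.exp(-eta * (c - m)) for c in cum_est]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)  # bandit feedback: only the pulled arm's loss is seen
        cum_est[arm] += loss / probs[arm]  # unbiased estimate of the full loss vector
        pulls.append(arm)
    return pulls

def demo_losses(t, arm):
    # Hypothetical loss sequence in [0, 1]: arm 2 is always the best arm.
    return 0.1 if arm == 2 else 0.9

rng = random.Random(0)
T, N = 2000, 4
eta = math.sqrt(2 * math.log(N) / (T * N))  # a common tuning, assumed here
pulls = exp3(demo_losses, N, T, eta, rng)
best_fraction = pulls.count(2) / T
```

On this easy loss sequence the sampling distribution should concentrate on arm 2, so `best_fraction` ends up well above the uniform share of 1/4.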
Adversarially Robust Optimization with Gaussian Processes
In this paper, we consider the problem of Gaussian process (GP) optimization
with an added robustness requirement: The returned point may be perturbed by an
adversary, and we require the function value to remain as high as possible even
after this perturbation. This problem is motivated by settings in which the
underlying functions during optimization and implementation stages are
different, or when one is interested in finding an entire region of good inputs
rather than only a single point. We show that standard GP optimization
algorithms do not exhibit the desired robustness properties, and provide a
novel confidence-bound based algorithm StableOpt for this purpose. We
rigorously establish the required number of samples for StableOpt to find a
near-optimal point, and we complement this guarantee with an
algorithm-independent lower bound. We experimentally demonstrate several
potential applications of interest using real-world data sets, and we show that
StableOpt consistently succeeds in finding a stable maximizer where several
baseline methods fail. Comment: Corrected typo
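The robustness requirement described in the abstract above can be illustrated with a small Python sketch that compares the ordinary maximizer of a known function with the point whose worst-case value under a bounded perturbation is highest. This is a brute-force grid search, not StableOpt: there is no Gaussian process or confidence bound here. The objective `f`, the grid, and the radius `eps` are all hypothetical.

```python
import math

def f(x):
    # Hypothetical objective: a tall narrow peak near 0.2, a lower broad peak near 0.7.
    narrow = 1.0 * math.exp(-((x - 0.2) / 0.02) ** 2)
    broad = 0.8 * math.exp(-((x - 0.7) / 0.2) ** 2)
    return narrow + broad

def worst_case(x, eps, grid):
    # Lowest value an adversary can force by moving x to a grid point within eps.
    return min(f(z) for z in grid if abs(z - x) <= eps + 1e-12)

grid = [i / 200 for i in range(201)]  # candidate inputs in [0, 1]
eps = 0.05                            # assumed perturbation radius

nominal_best = max(grid, key=f)
robust_best = max(grid, key=lambda x: worst_case(x, eps, grid))
```

Here the nominal maximizer sits on the narrow peak, where a small perturbation destroys almost all of the value, while the robust choice sits on the broad peak.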
Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with
unknown reward models. At each time, a player selects one arm to play, aiming
to maximize the total expected reward over a horizon of length T. An approach
based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is
developed for constructing sequential arm selection policies. It is shown that
for all light-tailed reward distributions, DSEE achieves the optimal
logarithmic order of the regret, where regret is defined as the total expected
reward loss against the ideal case with known reward models. For heavy-tailed
reward distributions, DSEE achieves O(T^{1/p}) regret when the moments of the
reward distributions exist up to the pth order for 1 < p <= 2 and O(T^{1/(1+p/2)})
for p > 2. With the knowledge of an upper bound on a finite moment of the
heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret
order. The proposed DSEE approach complements existing work on MAB by providing
corresponding results for general reward distributions. Furthermore, with a
clearly defined tunable parameter, the cardinality of the exploration sequence,
the DSEE approach is easily extendable to variations of MAB, including MAB with
various objectives, decentralized MAB with multiple players and incomplete
reward observations under collisions, MAB with unknown Markov dynamics, and
combinatorial MAB with dependent arms that often arise in network optimization
problems such as the shortest path, the minimum spanning tree, and the dominating
set problems under unknown random weights. Comment: 22 pages, 2 figures
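The idea of separating exploration and exploitation on a deterministic schedule, as described in the DSEE abstract above, might be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' policy. The logarithmic exploration budget `n_arms * ceil(w * log(t + 1))`, the use of exploration samples only for the sample means, the Bernoulli arms, and all names are hypothetical.

```python
import math
import random

def dsee(pull, n_arms, horizon, w):
    """Explore round-robin while the exploration count is below the budget,
    otherwise play the arm with the best sample mean from exploration."""
    counts = [0] * n_arms   # exploration pulls per arm
    sums = [0.0] * n_arms   # reward sums from exploration pulls
    n_explore = 0           # cardinality of the exploration sequence so far
    total_reward = 0.0
    for t in range(1, horizon + 1):
        budget = n_arms * math.ceil(w * math.log(t + 1))  # assumed schedule
        if n_explore < budget:
            arm = n_explore % n_arms  # deterministic round-robin exploration
            r = pull(arm)
            counts[arm] += 1
            sums[arm] += r
            n_explore += 1
        else:
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
            r = pull(arm)
        total_reward += r
    return total_reward, n_explore

rng = random.Random(1)
means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arms

def pull(arm):
    return 1.0 if rng.random() < means[arm] else 0.0

horizon, w = 5000, 20
total_reward, n_explore = dsee(pull, len(means), horizon, w)
```

The single parameter `w` controls how many rounds are spent exploring: the exploration count grows only logarithmically in the horizon, so most rounds are spent on the empirically best arm.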
Stochastic Subgradient Algorithms for Strongly Convex Optimization over Distributed Networks
We study diffusion and consensus based optimization of a sum of unknown
convex objective functions over distributed networks. The only access to these
functions is through stochastic gradient oracles, each of which is only
available at a different node, and a limited number of gradient oracle calls is
allowed at each node. In this framework, we introduce a convex optimization
algorithm based on the stochastic gradient descent (SGD) updates. Particularly,
we use a carefully designed time-dependent weighted averaging of the SGD
iterates, which yields a guaranteed convergence rate as a function of the
number of gradient updates at each node and the number of nodes in the
network. We then bound the mean square deviation (MSD) achieved by the
average SGD iterate after a given number of gradient oracle calls. This
rate of convergence is optimal, as it matches the performance lower bound
up to constant terms. Similar to the SGD
algorithm, the computational complexity of the proposed algorithm also scales
linearly with the dimensionality of the data. Furthermore, the communication
load of the proposed method is the same as the communication load of the SGD
algorithm. Thus, the proposed algorithm is highly efficient in terms of
complexity and communication load. We illustrate the merits of the algorithm
with respect to the state-of-the-art methods over benchmark real-life data sets and
widely studied network topologies.
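To make the combination of diffusion updates and time-dependent weighted averaging in the abstract above more concrete, here is a minimal Python sketch on a toy problem. It is not the authors' algorithm. The ring topology, the uniform 1/3 combination weights, the scalar quadratic costs, the 1/t step size, and averaging weights proportional to t are all assumptions chosen for illustration.

```python
import random

def diffusion_sgd(targets, steps, noise, rng):
    """Adapt-then-combine SGD on a ring. Node i only sees noisy gradients of
    f_i(w) = 0.5 * (w - targets[i])**2. Returns each node's weighted average."""
    n = len(targets)
    w = [0.0] * n      # current iterate at each node
    avg = [0.0] * n    # time-weighted average of the iterates at each node
    weight_sum = 0.0
    for t in range(1, steps + 1):
        mu = 1.0 / t  # step size 1/(lambda * t) with strong-convexity constant 1
        # Adapt: every node takes a local stochastic gradient step.
        psi = [w[i] - mu * ((w[i] - targets[i]) + rng.gauss(0.0, noise))
               for i in range(n)]
        # Combine: average with the two ring neighbours (doubly stochastic weights).
        w = [(psi[i - 1] + psi[i] + psi[(i + 1) % n]) / 3.0 for i in range(n)]
        # Weighted averaging: iterate t receives weight proportional to t.
        weight_sum += t
        avg = [a + (t / weight_sum) * (w[i] - a) for i, a in enumerate(avg)]
    return avg

rng = random.Random(0)
targets = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical local minimizers
averaged = diffusion_sgd(targets, steps=2000, noise=0.5, rng=rng)
optimum = sum(targets) / len(targets)  # minimizer of the sum of the local costs
```

Each node only ever queries its own gradient oracle, yet every entry of `averaged` should end up close to `optimum`, the minimizer of the sum of all local costs.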