Reducing Dueling Bandits to Cardinal Bandits
We present algorithms for reducing the Dueling Bandits problem to the
conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits
problem is an online model of learning with ordinal feedback of the form "A is
preferred to B" (as opposed to cardinal feedback like "A has value 2.5"),
giving it wide applicability in learning from implicit user feedback and
revealed and stated preferences. In contrast to existing algorithms for the
Dueling Bandits problem, our reductions -- named Doubler, MultiSbm and
DoubleSbm -- provide a generic schema for translating the extensive body of
known results about conventional Multi-Armed Bandit algorithms to the Dueling
Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in
both finite and infinite settings, and conjecture about the performance of
DoubleSbm, which empirically outperforms the other two as well as previous
algorithms in our experiments. In addition, we provide the first almost optimal
regret bound in terms of second order terms, such as the differences between
the values of the arms
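To make the ordinal versus cardinal distinction concrete, the following is a minimal sketch of simulated dueling feedback. The linear link P(A preferred to B) = 1/2 + (mu_A - mu_B)/2 and the names preference_probability and duel are assumptions made for illustration; the abstract does not specify them, and this is not an implementation of any of the reductions.

```python
import random


def preference_probability(mu_a, mu_b):
    """Probability that arm A is preferred to arm B.

    Assumes arm values in [0, 1] and a linear link (an illustrative choice).
    """
    return 0.5 + (mu_a - mu_b) / 2.0


def duel(mu_a, mu_b, rng):
    """Return True if A wins the comparison; the learner sees only this bit."""
    return rng.random() < preference_probability(mu_a, mu_b)


rng = random.Random(0)
# Ordinal feedback: 20000 noisy comparisons between two hypothetical arms.
wins = sum(duel(0.8, 0.2, rng) for _ in range(20000))
win_rate = wins / 20000
```

A cardinal bandit would instead observe a noisy numeric reward for each pulled arm; here the learner only sees which of the two arms won.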
Boosting Variational Inference: an Optimization Perspective
Variational inference is a popular technique to approximate a possibly
intractable Bayesian posterior with a more tractable one. Recently, boosting
variational inference has been proposed as a new paradigm to approximate the
posterior by a mixture of densities by greedily adding components to the
mixture. However, as is the case with many other variational inference
algorithms, its theoretical properties have not been studied. In the present
work, we study the convergence properties of this approach from a modern
optimization viewpoint by establishing connections to the classic Frank-Wolfe
algorithm. Our analysis yields novel theoretical insights regarding the
sufficient conditions for convergence, explicit rates, and algorithmic
simplifications. Since a lot of focus in previous works for variational
inference has been on tractability, our work is especially important as a much
needed attempt to bridge the gap between probabilistic models and their
corresponding theoretical properties
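For readers unfamiliar with the classic Frank-Wolfe algorithm mentioned above, here is a minimal sketch on a toy problem: minimizing a quadratic over the probability simplex, where each step moves toward a single vertex, loosely analogous to adding one component to a mixture. The objective, the target vector, and the step size 2/(k+2) are assumptions chosen for illustration; this is not the boosting variational inference procedure itself.

```python
def frank_wolfe_simplex(grad, dim, steps=2000):
    """Classic Frank-Wolfe over the probability simplex."""
    w = [1.0 / dim] * dim
    for k in range(steps):
        g = grad(w)
        # Linear minimization oracle: on the simplex the minimizer of <g, s>
        # is the vertex with the smallest gradient coordinate.
        i = min(range(dim), key=lambda j: g[j])
        gamma = 2.0 / (k + 2)  # standard open-loop step size
        # Convex combination of the current iterate and the chosen vertex.
        w = [(1.0 - gamma) * wj for wj in w]
        w[i] += gamma
    return w


# Toy objective f(w) = 0.5 * ||w - target||^2 with a hypothetical target.
target = [0.7, 0.2, 0.1]
grad = lambda w: [wj - tj for wj, tj in zip(w, target)]
weights = frank_wolfe_simplex(grad, dim=3)
```

Every iterate stays on the simplex because it is a convex combination of simplex points, so no projection step is needed.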
Pure Exploration for Multi-Armed Bandit Problems
We consider the framework of stochastic multi-armed bandit problems and study
the possibilities and limitations of forecasters that perform an on-line
exploration of the arms. These forecasters are assessed in terms of their
simple regret, a regret notion that captures the fact that exploration is only
constrained by the number of available rounds (not necessarily known in
advance), in contrast to the case when the cumulative regret is considered and
when exploitation needs to be performed at the same time. We believe that this
performance criterion is suited to situations when the cost of pulling an arm
is expressed in terms of resources rather than rewards. We discuss the links
between the simple and the cumulative regret. One of the main results in the
case of a finite number of arms is a general lower bound on the simple regret
of a forecaster in terms of its cumulative regret: the smaller the latter, the
larger the former. Keeping this result in mind, we then exhibit upper bounds on
the simple regret of some forecasters. The paper ends with a study devoted to
continuous-armed bandit problems; we show that the simple regret can be
minimized with respect to a family of probability distributions if and only if
the cumulative regret can be minimized for it. Based on this equivalence, we
are able to prove that the separable metric spaces are exactly the metric
spaces on which these regrets can be minimized with respect to the family of
all probability distributions with continuous mean-payoff functions
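The difference between the two regret notions can be illustrated with a small simulation. The sketch below uses uniform (round-robin) exploration followed by recommending the empirically best arm; the Bernoulli arms, their means, and the function name uniform_exploration are assumptions for illustration, not the forecasters analyzed in the paper.

```python
import random


def uniform_exploration(means, rounds, seed=0):
    """Pull Bernoulli arms round-robin, then recommend the empirical best."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    best_mean = max(means)
    cumulative_regret = 0.0
    for t in range(rounds):
        arm = t % k  # pure exploration: ignore past rewards when choosing
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        # Cumulative regret charges every pull of a suboptimal arm.
        cumulative_regret += best_mean - means[arm]
    recommended = max(range(k), key=lambda a: totals[a] / pulls[a])
    # Simple regret charges only the final recommendation.
    simple_regret = best_mean - means[recommended]
    return recommended, simple_regret, cumulative_regret


recommended, simple_regret, cumulative_regret = uniform_exploration(
    [0.2, 0.5, 0.9], rounds=3000
)
```

Uniform exploration accumulates cumulative regret linearly in the number of rounds, yet its final recommendation is very likely to be the best arm, which matches the trade-off described in the abstract.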
Multi-agent pathfinding for unmanned aerial vehicles
Unmanned aerial vehicles (UAVs), commonly known as drones, have become more and
more prevalent in recent years. In particular, governmental organizations and companies
around the world are starting to research how UAVs can be used to perform tasks such
as package delivery, disaster investigation and surveillance of key assets such as pipelines,
railroads and bridges. NASA is currently in the early stages of developing an air traffic
control system specifically designed to manage UAV operations in low-altitude airspace.
Companies such as Amazon and Rakuten are testing large-scale drone delivery services in
the USA and Japan.
To perform these tasks, safe and conflict-free routes for concurrently operating UAVs must
be found. This can be done using multi-agent pathfinding (mapf) algorithms, although
the correct choice of algorithms is not clear. This is because many state-of-the-art mapf
algorithms have only been tested in 2D space in maps with many obstacles, while UAVs
operate in 3D space in open maps with few obstacles. In addition, when an unexpected
event occurs in the airspace and UAVs are forced to deviate from their original routes
while in flight, new conflict-free routes must be found. Planning for these unexpected
events is commonly known as contingency planning. With manned aircraft, contingency
plans can be created in advance or on a case-by-case basis while in flight. The scale at
which UAVs operate, combined with the fact that unexpected events may occur anywhere
at any time, makes both advance planning and planning on a case-by-case basis impossible.
Thus, a new approach is needed. Online multi-agent pathfinding (online mapf) looks to
be a promising solution. Online mapf utilizes traditional mapf algorithms to perform path
planning in real time. That is, new routes are found while the UAVs are in flight.
The primary contribution of this thesis is to present one possible approach to UAV
contingency planning using online multi-agent pathfinding algorithms, which can be used
as a baseline for future research and development. It also provides an in-depth overview
and analysis of offline mapf algorithms with the goal of determining which ones are likely
to perform best when applied to UAVs. Finally, to further this same goal, a few different
mapf algorithms are experimentally tested and analyzed
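As an illustration of what "conflict-free" means for concurrently operating agents, the sketch below checks two timed paths on a 3D grid for the two conflict types commonly used in multi-agent pathfinding: vertex conflicts (same cell at the same time step) and edge conflicts (two agents swapping cells). The grid representation, the assumption that an agent waits at its goal, and the name find_conflicts are illustrative choices and are not taken from the thesis.

```python
def find_conflicts(path_a, path_b):
    """Return vertex and edge (swap) conflicts between two timed 3D paths.

    Each path is a list of (x, y, z) cells, one per time step.
    """
    def at(path, t):
        # Assumption: an agent that has finished stays at its last cell.
        return path[min(t, len(path) - 1)]

    conflicts = []
    horizon = max(len(path_a), len(path_b))
    for t in range(horizon):
        a_now, b_now = at(path_a, t), at(path_b, t)
        if a_now == b_now:
            conflicts.append(("vertex", t, a_now))
        if t + 1 < horizon:
            a_next, b_next = at(path_a, t + 1), at(path_b, t + 1)
            # Swap: the agents exchange cells between t and t + 1.
            if a_now == b_next and a_next == b_now and a_now != a_next:
                conflicts.append(("edge", t, (a_now, a_next)))
    return conflicts


head_on = find_conflicts(
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],
    [(2, 0, 0), (1, 0, 0), (0, 0, 0)],
)
swap = find_conflicts([(0, 0, 0), (1, 0, 0)], [(1, 0, 0), (0, 0, 0)])
separated = find_conflicts([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```

The third call shows why 3D airspace differs from 2D maps: two routes with identical horizontal tracks are conflict-free when flown at different altitudes.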
On the Complexity of Exact Maximum-Likelihood Decoding for Asymptotically Good Low Density Parity Check Codes: A New Perspective
The problem of exact maximum-likelihood (ML) decoding of general linear codes is well known to be NP-hard. In this paper, we show that exact ML decoding of a class of asymptotically good low-density parity-check codes (expander codes) over binary symmetric channels (BSCs) is possible with average-case polynomial complexity. This offers a new way of looking at the complexity of exact ML decoding for communication systems, where the randomness in the channel plays a fundamental role. More precisely, for any bit-flipping probability p in a nontrivial range, there exists a rate region of non-zero support and a family of asymptotically good codes which achieve error probability decaying exponentially in the coding length n while admitting exact ML decoding in average-case polynomial time. As p approaches zero, this rate region approaches the Shannon channel capacity region. Similar results can be extended to AWGN channels, suggesting it may be feasible to eliminate the error floor phenomenon associated with belief-propagation decoding of LDPC codes in the high SNR regime. The derivations are based on a hierarchy of ML certificate decoding algorithms adaptive to the channel realization. In this process, we propose a new efficient O(n^2) ML certificate algorithm based on the max-flow algorithm. Moreover, exact ML decoding of the class of codes constructed from LDPC codes with regular left degree, of which the considered expander codes are a special case, remains NP-hard, which gives an interesting contrast between the worst-case and average-case complexities
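For context on why exact ML decoding is expensive in the worst case: over a BSC with bit-flipping probability p < 1/2, the ML codeword is the one closest to the received word in Hamming distance, and a naive decoder must search all 2^k codewords. The sketch below does this by brute force for a small systematic [7,4] Hamming code; the code choice and the names encode and ml_decode_bsc are assumptions for illustration, and this is not the certificate-based algorithm from the paper.

```python
from itertools import product

# Generator matrix of a systematic [7,4] Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(message):
    """Multiply the message by G over GF(2)."""
    return [sum(m * g for m, g in zip(message, column)) % 2 for column in zip(*G)]


def ml_decode_bsc(received):
    """Exhaustive minimum-Hamming-distance decoding (ML on a BSC, p < 1/2)."""
    best = None
    for message in product([0, 1], repeat=len(G)):  # 2^k candidates
        codeword = encode(message)
        distance = sum(c != r for c, r in zip(codeword, received))
        if best is None or distance < best[0]:
            best = (distance, codeword)
    return best[1]


codeword = encode([1, 0, 1, 1])
received = list(codeword)
received[2] ^= 1  # the channel flips one bit
decoded = ml_decode_bsc(received)
```

The loop visits every message, so the running time grows exponentially in the code dimension k; this is the cost that average-case polynomial-time decoding avoids for typical channel realizations.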
Multiple-Play Bandits in the Position-Based Model
Sequentially learning to place items in multi-position displays or lists is a
task that can be cast into the multiple-play semi-bandit setting. However, a
major concern in this context is that the system cannot decide whether the user
feedback for each item is actually exploitable. Indeed, much of the content may
have been simply ignored by the user. The present work proposes to exploit
available information regarding the display position bias under the so-called
Position-based click model (PBM). We first discuss how this model differs from
the Cascade model and its variants considered in several recent works on
multiple-play bandits. We then provide a novel regret lower bound for this
model as well as computationally efficient algorithms that display good
empirical and theoretical performance
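Under the position-based model, the probability of a click is usually written as the product of an item attractiveness and a position examination probability. The sketch below simulates such clicks and finds the ranking with the highest expected number of clicks by enumeration; the symbols theta and kappa, their numeric values, and the function names are assumptions for illustration rather than the algorithms proposed in the paper.

```python
import random
from itertools import permutations


def expected_clicks(ranking, theta, kappa):
    """Expected clicks when ranking[pos] is the item shown at position pos."""
    return sum(theta[item] * kappa[pos] for pos, item in enumerate(ranking))


def simulate_clicks(ranking, theta, kappa, rng):
    """Draw one click vector: item clicked with probability theta * kappa."""
    return [
        1 if rng.random() < theta[item] * kappa[pos] else 0
        for pos, item in enumerate(ranking)
    ]


theta = [0.9, 0.5, 0.2]  # hypothetical item attractiveness
kappa = [1.0, 0.6, 0.3]  # hypothetical examination probability per position
best_ranking = max(
    permutations(range(len(theta))),
    key=lambda r: expected_clicks(r, theta, kappa),
)
clicks = simulate_clicks(best_ranking, theta, kappa, random.Random(0))
```

A missing click at a low position is ambiguous: the item may be unattractive or simply unexamined, which is why knowledge of the position bias helps the learner interpret feedback.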
- …