12,400 research outputs found

    Reducing Dueling Bandits to Cardinal Bandits

    We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and from revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions, named Doubler, MultiSbm and DoubleSbm, provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in both finite and infinite settings, and we conjecture about the performance of DoubleSbm, which empirically outperforms the other two as well as previous algorithms in our experiments. In addition, we provide the first almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.
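
To make the contrast between ordinal and cardinal feedback concrete, the sketch below simulates a single noisy comparison between two arms. The linear link `0.5 + (mu[i] - mu[j]) / 2` and the helper names `duel` and `estimate_preference` are illustrative assumptions, not taken from the abstract, and the reductions themselves are not implemented here.

```python
import random


def duel(mu, i, j, rng):
    """Return True if arm i is preferred to arm j in one noisy comparison.

    Assumption: arm values mu lie in [0, 1] and the preference probability
    follows a linear link, P(i beats j) = 0.5 + (mu[i] - mu[j]) / 2.
    """
    p_i_wins = 0.5 + (mu[i] - mu[j]) / 2.0
    return rng.random() < p_i_wins


def estimate_preference(mu, i, j, n, seed=0):
    """Estimate P(i beats j) from n duels; only ordinal outcomes are observed."""
    rng = random.Random(seed)
    wins = sum(duel(mu, i, j, rng) for _ in range(n))
    return wins / n
```

The learner never sees the values in `mu`; it only sees which arm won each duel, which is what separates this setting from a conventional bandit.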

    Boosting Variational Inference: an Optimization Perspective

    Variational inference is a popular technique to approximate a possibly intractable Bayesian posterior with a more tractable one. Recently, boosting variational inference has been proposed as a new paradigm to approximate the posterior by a mixture of densities by greedily adding components to the mixture. However, as is the case with many other variational inference algorithms, its theoretical properties have not been studied. In the present work, we study the convergence properties of this approach from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm. Our analysis yields novel theoretical insights regarding the sufficient conditions for convergence, explicit rates, and algorithmic simplifications. Since much of the focus in previous work on variational inference has been on tractability, our work is especially important as a much-needed attempt to bridge the gap between probabilistic models and their corresponding theoretical properties.
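
As a rough illustration of the Frank-Wolfe connection mentioned above, here is a generic Frank-Wolfe loop on a toy problem: minimizing a squared distance over the probability simplex. The objective, the function name `frank_wolfe_simplex`, and the step size `2 / (t + 2)` are assumptions chosen for illustration. This is not the boosting variational inference procedure itself, only the classic algorithm it is related to.

```python
def frank_wolfe_simplex(b, n_iters=2000):
    """Minimize f(x) = sum((x_i - b_i)^2) over the probability simplex.

    Each iteration moves toward a single vertex of the simplex, loosely
    analogous to greedily adding one component to a mixture.
    """
    k = len(b)
    x = [0.0] * k
    x[0] = 1.0  # start from a single vertex
    for t in range(n_iters):
        # Gradient of the toy quadratic objective.
        grad = [2.0 * (xi - bi) for xi, bi in zip(x, b)]
        # Linear minimization oracle: on the simplex, the minimizer of a
        # linear function is the vertex with the smallest gradient entry.
        s = min(range(k), key=lambda i: grad[i])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        # Convex combination of the current iterate and the chosen vertex.
        x = [(1.0 - gamma) * xi for xi in x]
        x[s] += gamma
    return x
```

Because every iterate is a convex combination of vertices, the result stays inside the simplex without any projection step.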

    Pure Exploration for Multi-Armed Bandit Problems

    We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case when the cumulative regret is considered and when exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations when the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. One of the main results in the case of a finite number of arms is a general lower bound on the simple regret of a forecaster in terms of its cumulative regret: the smaller the latter, the larger the former. Keeping this result in mind, we then exhibit upper bounds on the simple regret of some forecasters. The paper ends with a study devoted to continuous-armed bandit problems; we show that the simple regret can be minimized with respect to a family of probability distributions if and only if the cumulative regret can be minimized for it. Based on this equivalence, we are able to prove that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions
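
The difference between the two regret notions can be sketched with a simple forecaster that explores arms in round-robin order and then recommends the empirically best arm. Bernoulli rewards, the round-robin allocation, and the name `uniform_exploration` are assumptions for illustration; the abstract does not specify which forecasters the paper analyzes.

```python
import random


def uniform_exploration(mu, n_rounds, seed=0):
    """Pull Bernoulli arms round-robin, then recommend the empirical best.

    Returns (simple_regret, cumulative_regret), both computed from the true
    means mu, which the forecaster itself never observes.
    """
    rng = random.Random(seed)
    k = len(mu)
    pulls = [0] * k
    sums = [0.0] * k
    best = max(mu)
    cumulative_regret = 0.0
    for t in range(n_rounds):
        arm = t % k  # pure exploration: ignore past rewards when choosing
        reward = 1.0 if rng.random() < mu[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        cumulative_regret += best - mu[arm]  # paid on every pull
    means = [sums[a] / pulls[a] if pulls[a] else 0.0 for a in range(k)]
    recommended = max(range(k), key=lambda a: means[a])
    simple_regret = best - mu[recommended]  # paid only on the final choice
    return simple_regret, cumulative_regret
```

With this allocation the cumulative regret grows linearly in the number of rounds while the simple regret is typically small, which matches the trade-off the abstract describes.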

    Multi-agent pathfinding for unmanned aerial vehicles

    Unmanned aerial vehicles (UAVs), commonly known as drones, have become increasingly prevalent in recent years. In particular, governmental organizations and companies around the world are starting to research how UAVs can be used to perform tasks such as package delivery, disaster investigation and surveillance of key assets such as pipelines, railroads and bridges. NASA is currently in the early stages of developing an air traffic control system specifically designed to manage UAV operations in low-altitude airspace. Companies such as Amazon and Rakuten are testing large-scale drone delivery services in the USA and Japan. To perform these tasks, safe and conflict-free routes for concurrently operating UAVs must be found. This can be done using multi-agent pathfinding (MAPF) algorithms, although the correct choice of algorithm is not clear. This is because many state-of-the-art MAPF algorithms have only been tested in 2D space on maps with many obstacles, while UAVs operate in 3D space on open maps with few obstacles. In addition, when an unexpected event occurs in the airspace and UAVs are forced to deviate from their original routes while in flight, new conflict-free routes must be found. Planning for these unexpected events is commonly known as contingency planning. With manned aircraft, contingency plans can be created in advance or on a case-by-case basis while in flight. The scale at which UAVs operate, combined with the fact that unexpected events may occur anywhere at any time, makes both advance planning and planning on a case-by-case basis impossible. Thus, a new approach is needed. Online multi-agent pathfinding (online MAPF) looks to be a promising solution. Online MAPF uses traditional MAPF algorithms to perform path planning in real time. That is, new routes for UAVs are found while in flight.
The primary contribution of this thesis is to present one possible approach to UAV contingency planning using online multi-agent pathfinding algorithms, which can be used as a baseline for future research and development. It also provides an in-depth overview and analysis of offline MAPF algorithms with the goal of determining which ones are likely to perform best when applied to UAVs. Finally, to further this same goal, a few different MAPF algorithms are experimentally tested and analyzed.

    On the Complexity of Exact Maximum-Likelihood Decoding for Asymptotically Good Low Density Parity Check Codes: A New Perspective

    The problem of exact maximum-likelihood (ML) decoding of general linear codes is well-known to be NP-hard. In this paper, we show that exact ML decoding of a class of asymptotically good low density parity check codes (expander codes) over binary symmetric channels (BSCs) is possible with an average-case polynomial complexity. This offers a new way of looking at the complexity issue of exact ML decoding for communication systems where the randomness in the channel plays a central role. More precisely, for any bit-flipping probability p in a nontrivial range, there exists a rate region of non-zero support and a family of asymptotically good codes which achieve error probability exponentially decaying in coding length n while admitting exact ML decoding in average-case polynomial time. As p approaches zero, this rate region approaches the Shannon channel capacity region. Similar results can be extended to AWGN channels, suggesting it may be feasible to eliminate the error floor phenomenon associated with belief-propagation decoding of LDPC codes in the high SNR regime. The derivations are based on a hierarchy of ML certificate decoding algorithms adaptive to the channel realization. In this process, we propose a new, efficient O(n^2) ML certificate algorithm based on the max-flow algorithm. Moreover, exact ML decoding of the considered class of codes constructed from LDPC codes with regular left degree, of which the considered expander codes are a special case, remains NP-hard, thus giving an interesting contrast between the worst-case and average-case complexities.
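
For reference, exact ML decoding over a BSC with flip probability p < 1/2 amounts to finding the codeword closest to the received word in Hamming distance. The brute-force sketch below does this for the (7,4) Hamming code, chosen here only as a small illustrative example; the names `is_codeword` and `ml_decode_bsc` are hypothetical. It enumerates all 2^n words, which is the exponential baseline, and is not the certificate-based algorithm the abstract proposes.

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code (illustrative choice).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def is_codeword(x, H):
    """A word is a codeword iff every parity check sums to 0 modulo 2."""
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)


def ml_decode_bsc(y, H):
    """Exhaustive ML decoding over a BSC with p < 1/2.

    Returns the codeword nearest to y in Hamming distance and that distance.
    Cost is exponential in the code length n.
    """
    n = len(y)
    best, best_dist = None, n + 1
    for x in product((0, 1), repeat=n):
        if is_codeword(x, H):
            dist = sum(a != b for a, b in zip(x, y))
            if dist < best_dist:
                best, best_dist = x, dist
    return best, best_dist
```

For example, `ml_decode_bsc((0, 0, 1, 0, 0, 0, 0), H)` recovers the all-zero codeword at distance 1, since this code corrects any single bit flip.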

    Multiple-Play Bandits in the Position-Based Model

    Sequentially learning to place items in multi-position displays or lists is a task that can be cast into the multiple-play semi-bandit setting. However, a major concern in this context is when the system cannot decide whether the user feedback for each item is actually exploitable. Indeed, much of the content may have been simply ignored by the user. The present work proposes to exploit available information regarding the display position bias under the so-called Position-based click model (PBM). We first discuss how this model differs from the Cascade model and its variants considered in several recent works on multiple-play bandits. We then provide a novel regret lower bound for this model as well as computationally efficient algorithms that display good empirical and theoretical performance
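
A minimal simulation of position-based click feedback is sketched below, assuming the usual factorization in which the click probability is the product of an item attractiveness and a position examination probability. The parameter names `theta` and `kappa`, the helper names, and the numeric values in the usage note are illustrative assumptions; no bandit algorithm from the paper is implemented.

```python
import random


def simulate_clicks(theta, kappa, ranking, rng):
    """Draw one round of clicks under a position-based model.

    theta[item] is the assumed attractiveness of an item, kappa[pos] the
    assumed probability that position pos is examined. An item is clicked
    with probability theta[item] * kappa[pos].
    """
    return [
        1 if rng.random() < theta[item] * kappa[pos] else 0
        for pos, item in enumerate(ranking)
    ]


def empirical_ctr(theta, kappa, ranking, n, seed=0):
    """Average click rate per position over n simulated rounds."""
    rng = random.Random(seed)
    totals = [0] * len(ranking)
    for _ in range(n):
        for pos, click in enumerate(simulate_clicks(theta, kappa, ranking, rng)):
            totals[pos] += click
    return [total / n for total in totals]
```

With `theta = [0.8, 0.4, 0.2]`, `kappa = [1.0, 0.5]` and `ranking = [0, 1]`, the second position shows a click rate near 0.2 even though its item has attractiveness 0.4. A missing click is therefore ambiguous: the item may have been unattractive or simply not examined.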